dc.contributor
Universidad Nacional de Educación a Distancia
dc.contributor
Open University
dc.contributor
New York University
dc.contributor.author
Lastra Díaz, Juan José
dc.contributor.author
García Serrano, Ana
dc.contributor.author
Batet Sanromà, Montserrat
dc.contributor.author
Fernández, Miriam
dc.contributor.author
Chirigati, Fernando
dc.date
2019-04-11T07:54:00Z
dc.identifier.citation
Lastra Díaz, J.J., García Serrano, A., Batet Sanromà, M., Fernández, M. & Chirigati, F. (2017). HESML: A scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset. Information Systems, 66, 97-118. doi: 10.1016/j.is.2017.02.002
dc.identifier.citation
0306-4379
dc.identifier.citation
10.1016/j.is.2017.02.002
dc.identifier.uri
http://hdl.handle.net/10609/93058
dc.description.abstract
This work is a detailed companion reproducibility paper for the methods and experiments proposed by Lastra-Díaz and García-Serrano (2015, 2016) [56-58]. It introduces the following contributions: (1) a new and efficient representation model for taxonomies, called PosetHERep, which adapts the half-edge data structure commonly used to represent discrete manifolds and planar graphs; (2) a new Java software library called the Half-Edge Semantic Measures Library (HESML), based on PosetHERep, which implements most ontology-based semantic similarity measures and Information Content (IC) models reported in the literature; (3) a set of reproducible experiments on word similarity, based on HESML and ReproZip, which aim to reproduce exactly the experimental surveys in the three aforementioned works; (4) a replication framework and dataset, called WNSimRep v1, whose aim is to assist the exact replication of most methods reported in the literature; and finally, (5) a set of scalability and performance benchmarks for semantic measures libraries. PosetHERep and HESML are motivated by several drawbacks of current semantic measures libraries, especially in performance and scalability, as well as in the evaluation of new methods and the replication of most previous methods. The reproducible experiments introduced herein are motivated by the lack of a set of large, self-contained and easily reproducible experiments for replicating and confirming previously reported results. Likewise, the WNSimRep v1 dataset is motivated by the discovery of several contradictory results and difficulties in reproducing previously reported methods and experiments.
PosetHERep provides a memory-efficient representation for taxonomies that scales linearly with the size of the taxonomy and supports an efficient implementation of most taxonomy-based algorithms used by semantic measures and IC models, whilst HESML offers an open framework to aid research in the area through a simpler and more efficient software architecture than current software libraries. Finally, we show that HESML outperforms the state-of-the-art libraries, and that their performance and scalability could be significantly improved, without caching, by using PosetHERep.
dc.format
application/pdf
dc.publisher
Information Systems
dc.relation
Information Systems, 2017, 66
dc.relation
https://doi.org/10.1016/j.is.2017.02.002
dc.relation
info:eu-repo/grantAgreement/TIN2015-71785-R
dc.relation
info:eu-repo/grantAgreement/S2015/HUM3494
dc.rights
http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject
intrinsic and corpus-based Information Content models
dc.subject
ontology-based semantic similarity
dc.subject
WNSimRep v1 dataset
dc.subject
reproducible experiments on word
dc.subject
WordNet-based semantic similarity
dc.subject
mesures semàntiques bibliotecàries
dc.subject
models de contingut
dc.subject
WNSimRep v1 dataset
dc.subject
experiments reproduïbles amb paraules
dc.subject
WordNet-basat en similitud semàntica
dc.subject
informació intrínseca basada en corpus
dc.subject
medidas semánticas bibliotecarias
dc.subject
modelos de contenido
dc.subject
WNSimRep v1 dataset
dc.subject
experimentos reproducibles con palabras
dc.subject
WordNet-basado en similitud semántica
dc.subject
información intrínseca basada en corpus
dc.subject
Ontologies (Information retrieval)
dc.subject
Ontologies (Informàtica)
dc.subject
Ontologías (Informática)
dc.title
HESML: A scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset
dc.type
info:eu-repo/semantics/article
dc.type
info:eu-repo/semantics/publishedVersion