Deduplication of Universitat de Lleida scholarly data

dc.contributor
García González, Roberto
dc.contributor
Universitat de Lleida. Escola Politècnica Superior
dc.contributor.author
Berga Gatius, Albert
dc.date.accessioned
2024-12-05T23:07:43Z
dc.date.available
2024-12-05T23:07:43Z
dc.date.issued
2017-07-24T07:41:13Z
dc.date.issued
2017-07
dc.identifier
http://hdl.handle.net/10459.1/60159
dc.identifier.uri
http://hdl.handle.net/10459.1/60159
dc.description.abstract
In this project we used data science tools and techniques to detect duplicated data in the GREC repository, which contains information about the articles published by University of Lleida staff. We used locality-sensitive hashing (LSH) to group articles so that those more likely to be duplicates fall into the same group. We then compared the articles within each group pairwise to determine which pairs refer to the same article.
dc.language
eng
dc.rights
cc-by-nc-nd
dc.rights
info:eu-repo/semantics/openAccess
dc.rights
http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject
Spark
dc.subject
Big data
dc.subject
Data mining
dc.subject
Data science
dc.subject
Macrodades
dc.subject
Mineria de dades
dc.title
Deduplication of Universitat de Lleida scholarly data
dc.type
masterThesis
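
The two-stage approach described in the abstract (LSH grouping followed by pairwise comparison within each group) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: character 3-gram shingles over titles, MinHash signatures with banding as the LSH scheme, Jaccard similarity for the pairwise check, the 0.7 threshold, and all function names. The record lists Spark as a subject, while this sketch uses plain Python only to keep it self-contained.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of character k-grams of a lowercased, whitespace-normalized title."""
    s = " ".join(text.lower().split())
    return {s[i:i + k] for i in range(max(1, len(s) - k + 1))}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(tokens, num_hashes):
    """MinHash signature: the minimum hash of the token set under each seed."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_hashes))


def jaccard(a, b):
    """Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def find_duplicates(titles, bands=16, rows=2, threshold=0.7):
    """Return sorted index pairs (i, j), i < j, judged to be duplicates.

    Stage 1 (LSH): titles sharing any band of their signature land in the
    same bucket. Stage 2: only pairs within a bucket are compared exactly.
    The parameter values are hypothetical defaults, not the thesis settings.
    """
    sets = [shingles(t) for t in titles]
    sigs = [minhash(s, bands * rows) for s in sets]

    buckets = defaultdict(list)
    for idx, sig in enumerate(sigs):
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].append(idx)

    candidates = set()
    for members in buckets.values():
        candidates.update(combinations(members, 2))

    return sorted(
        (i, j) for i, j in candidates if jaccard(sets[i], sets[j]) >= threshold
    )
```

Calling find_duplicates on a list of titles returns the index pairs whose shingle sets meet the threshold. Titles that differ only in case or spacing normalize to the same shingle set, so they always share a bucket and are reported as a pair.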

