dc.contributor
García González, Roberto
dc.contributor
Universitat de Lleida. Escola Politècnica Superior
dc.contributor.author
Berga Gatius, Albert
dc.date.accessioned
2024-12-05T23:07:43Z
dc.date.available
2024-12-05T23:07:43Z
dc.date.issued
2017-07-24T07:41:13Z
dc.identifier
http://hdl.handle.net/10459.1/60159
dc.identifier.uri
http://hdl.handle.net/10459.1/60159
dc.description.abstract
In this project we used data science tools and techniques to detect duplicate records in the GREC repository, which contains information about the articles published by University of Lleida staff. We used locality-sensitive hashing (LSH) to group articles so that those most likely to be duplicates fall into the same group. We then compared the articles within each group pairwise to determine which pairs refer to the same article.
dc.rights
info:eu-repo/semantics/openAccess
dc.rights
http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject
Data mining
dc.title
Deduplication of Universitat de Lleida scholarly data
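
The two-stage approach described in the abstract (LSH grouping followed by pairwise comparison within each group) could be sketched roughly as follows. This is an illustrative sketch only, not the project's actual implementation: it assumes records are compared by title alone, uses MinHash with banding as the LSH scheme, and verifies candidates with Jaccard similarity over character 3-grams. The function names, the `num_hashes`, `bands`, and `threshold` values, and the sample records are all hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of character k-grams of a case/whitespace-normalized title."""
    s = " ".join(text.lower().split())
    return {s[i:i + k] for i in range(max(1, len(s) - k + 1))}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token, parameterized by a seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes=32):
    """MinHash signature: the minimum hash value per seeded hash function."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_hashes))


def jaccard(a, b):
    """Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def candidate_pairs(records, num_hashes=32, bands=16):
    """Stage 1 (LSH): records sharing any signature band land in the same bucket."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for rec_id, title in records.items():
        sig = minhash_signature(shingles(title), num_hashes)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(rec_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def find_duplicates(records, threshold=0.8):
    """Stage 2: compare only the candidate pairs and keep those above the threshold."""
    result = []
    for a, b in sorted(candidate_pairs(records)):
        if jaccard(shingles(records[a]), shingles(records[b])) >= threshold:
            result.append((a, b))
    return result


# Hypothetical sample records (id -> title), not taken from GREC.
records = {
    "a": "Deduplication of scholarly data",
    "b": "deduplication  of Scholarly Data",
    "c": "Quantum chromodynamics on the lattice",
}
duplicates = find_duplicates(records)
```

The point of the first stage is that only records sharing a bucket are ever compared, which avoids the quadratic cost of comparing every pair in the repository. A real deduplication pipeline would likely also use authors, year, and venue, which this sketch leaves out.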