To access the full text documents, please follow this link: http://hdl.handle.net/10459.1/60159

Deduplication of Universitat de Lleida scholarly data
Berga Gatius, Albert
García González, Roberto; Universitat de Lleida. Escola Politècnica Superior
In this project we used data science tools and techniques to detect duplicate records in the GREC repository, which contains information about the articles published by University of Lleida staff. We used locality-sensitive hashing (LSH) to group articles so that those most likely to be duplicates are assigned to the same group. We then compared the articles within each group pairwise to determine which pairs refer to the same article.
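The two-stage approach described in the abstract (LSH grouping, then pairwise comparison within each group) can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not say which LSH family, record fields, or similarity threshold were used, and the keywords indicate the actual work ran on Spark. The sketch below assumes MinHash over character 3-grams of article titles, a banding scheme of 16 bands with 2 rows each, and a Jaccard threshold of 0.7; all function names and parameters are hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of character k-grams of a case- and space-normalized title."""
    t = " ".join(text.lower().split())
    return {t[i:i + k] for i in range(max(1, len(t) - k + 1))}


def _hash(seed, shingle):
    """Deterministic 64-bit hash of a shingle, salted with a seed (assumed scheme)."""
    digest = hashlib.md5(f"{seed}:{shingle}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash(shingle_set, num_hashes):
    """MinHash signature: the minimum hash value under each seeded hash function."""
    return [min(_hash(seed, s) for s in shingle_set) for seed in range(num_hashes)]


def jaccard(a, b):
    """Exact Jaccard similarity between two non-empty shingle sets."""
    return len(a & b) / len(a | b)


def find_duplicates(titles, bands=16, rows=2, threshold=0.7):
    """Stage 1: bucket titles by LSH band. Stage 2: compare pairs within buckets."""
    sets = [shingles(t) for t in titles]
    signatures = [minhash(s, bands * rows) for s in sets]

    # Titles whose signatures agree on every row of a band share a bucket.
    buckets = defaultdict(set)
    for idx, sig in enumerate(signatures):
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(idx)

    # Only pairs that share at least one bucket are compared exactly.
    candidates = set()
    for members in buckets.values():
        candidates.update(combinations(sorted(members), 2))

    return sorted(
        (i, j) for i, j in candidates if jaccard(sets[i], sets[j]) >= threshold
    )


if __name__ == "__main__":
    # Hypothetical titles; the first two differ only in case and spacing.
    sample = [
        "Deduplication of scholarly data",
        "deduplication  of Scholarly Data",
        "Quantum chromodynamics",
    ]
    print(find_duplicates(sample))
```

The point of the bucketing step is to avoid comparing every pair of records: the exact comparison runs only on pairs that collide in at least one band, so the number of bands and rows trades recall against the number of candidate pairs.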
-Spark
-Big data
-Data mining
-Data science
cc-by-nc-nd
http://creativecommons.org/licenses/by-nc-nd/4.0/
masterThesis

Full text files in this document

Files        Size      Format
abergag.pdf  1.914 MB  application/pdf
