DS-Prox : dataset proximity mining for governing the data lake

To access the full-text documents, please follow this link: http://hdl.handle.net/2117/117036

dc.contributor	Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació
dc.contributor	Universitat Politècnica de Catalunya. inSSIDE - integrated Software, Service, Information and Data Engineering
dc.contributor	Universitat Politècnica de Catalunya. IMP - Information Modeling and Processing
dc.contributor.author	Al-serafi, Ayman Mounir Mohamed
dc.contributor.author	Calders, Toon
dc.contributor.author	Abelló Gamazo, Alberto
dc.contributor.author	Romero Moral, Óscar
dc.date	2017
dc.identifier.citation	Al-serafi, A., Calders, T., Abello, A., Romero, O. DS-Prox : dataset proximity mining for governing the data lake. In: The International Conference on Similarity Search and Applications. "Similarity Search and Applications: 10th International Conference, SISAP 2017: Munich, Germany, October 4-6, 2017: proceedings". Berlin: Springer, 2017, p. 284-299.
dc.identifier.citation	978-3-319-68474-1
dc.identifier.citation	10.1007/978-3-319-68474-1_20
dc.identifier.uri	http://hdl.handle.net/2117/117036
dc.description.abstract	With the arrival of Data Lakes (DL) there is an increasing need for efficient dataset classification to support data analysis and information retrieval. Our goal is to use meta-features describing datasets to detect whether they are similar. We utilise a novel proximity mining approach to assess the similarity of datasets. The proximity scores are used as an efficient first step, where pairs of datasets with high proximity are selected for further time-consuming schema matching and deduplication. The proposed approach helps in early-pruning unnecessary computations, thus improving the efficiency of similar-schema search. We evaluate our approach in experiments using the OpenML online DL, which show significant efficiency gains above 25% compared to matching without early-pruning, and recall rates reaching higher than 90% under certain scenarios.
dc.description.abstract	Peer Reviewed
dc.language.iso	eng
dc.publisher	Springer
dc.relation	https://link.springer.com/chapter/10.1007/978-3-319-68474-1_20
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	UPC subject areas::Computer science::Software engineering
dc.subject	Data mining
dc.subject	Proximity Mining
dc.subject	Data Lakes
dc.subject	Data Governance
dc.subject	Similarity Search
dc.subject	Mineria de dades
dc.title	DS-Prox : dataset proximity mining for governing the data lake
dc.type	info:eu-repo/semantics/submittedVersion
dc.type	info:eu-repo/semantics/conferenceObject
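
The early-pruning idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say which meta-features are used or how proximity is computed, so everything below is an assumption made for illustration: the meta-feature names (`n_attributes`, `numeric_ratio`, `avg_distinct`), their values, the averaged relative-difference score, and the `0.8` threshold are all hypothetical and do not reproduce the authors' proximity mining method.

```python
from itertools import combinations

# Hypothetical meta-features per dataset (illustrative values, not from the paper).
META = {
    "census_a": {"n_attributes": 14, "numeric_ratio": 0.43, "avg_distinct": 120.0},
    "census_b": {"n_attributes": 15, "numeric_ratio": 0.40, "avg_distinct": 110.0},
    "iris": {"n_attributes": 5, "numeric_ratio": 0.80, "avg_distinct": 30.0},
}


def proximity(a, b):
    """Mean per-feature similarity; assumes non-negative meta-features,
    so the result lies in [0, 1] and 1.0 means identical meta-features."""
    sims = []
    for key in a:
        hi = max(abs(a[key]), abs(b[key]))
        # Relative difference, so features on different scales are comparable.
        sims.append(1.0 if hi == 0 else 1.0 - abs(a[key] - b[key]) / hi)
    return sum(sims) / len(sims)


def candidate_pairs(meta, threshold):
    """Keep only pairs whose proximity reaches the threshold (early pruning)."""
    kept = []
    for x, y in combinations(sorted(meta), 2):
        score = proximity(meta[x], meta[y])
        if score >= threshold:
            kept.append((x, y, score))
    return kept


# Assumed threshold; only surviving pairs would go on to schema matching.
pairs = candidate_pairs(META, threshold=0.8)
for x, y, score in pairs:
    print(f"{x} ~ {y}: {score:.2f}")
```

With these made-up values, only the `census_a`/`census_b` pair passes the threshold, so the two pairs involving `iris` would be skipped before any expensive schema matching or deduplication step.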

Related documents

Other documents by the same author

Keeping the data lake in form: DS-kNN datasets categorization using proximity mining

Al-serafi, Ayman Mounir Mohamed; Abelló Gamazo, Alberto; Romero Moral, Óscar; Calders, Toon

Towards information profiling: data lake content metadata management

Al-serafi, Ayman Mounir Mohamed; Abelló Gamazo, Alberto; Romero Moral, Óscar; Calders, Toon

H-word: Supporting job scheduling in Hadoop with workload-driven data redistribution

Jovanovic, Petar; Romero Moral, Óscar; Calders, Toon; Abelló Gamazo, Alberto

An integration-oriented ontology to govern evolution in big data ecosystems

Nadal Francesch, Sergi; Romero Moral, Óscar; Abelló Gamazo, Alberto; Vassiliadis, Panos; Vansummeren, Stijn

Quarry

Abelló Gamazo, Alberto; Romero Moral, Óscar; Jovanovic, Petar; Nadal Francesch, Sergi; Bilalli, Besim; Candón Arenas, Héctor; Mayorova, Daria; Thavornun, Varunya; Gil González, Daniel
