Title:
|
On the use of semantic blocking techniques for data cleansing and integration
|
Author:
|
Nin Guerrero, Jordi; Muntés Mulero, Víctor; Martínez Bazán, Norbert; Larriba Pey, Josep
|
Other authors:
|
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors; Universitat Politècnica de Catalunya. DAMA-UPC - Data Management Group |
Abstract:
|
Record Linkage (RL) is an important component of data cleansing and integration. For years, many efforts have focused on improving the performance of the RL process, either by reducing the number of record comparisons or by reducing the number of attribute comparisons. Both approaches reduce the computational time but very often decrease the quality of the results. However, the real bottleneck of RL is the post-process, where the results have to be reviewed by experts who decide which pairs or groups of records are real links and which are false hits. In this paper, we show that exploiting the relationships (e.g. foreign keys) established between one or more data sources makes it possible to find a new sort of semantic blocking method that improves the number of hits and reduces the amount of review effort. |
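As a rough illustration of the idea in the abstract, and not the authors' algorithm, the sketch below contrasts a conventional blocking key with a relationship-based one. The toy records, the `paper_id` foreign key, and the helper `block_pairs` are all hypothetical; the abstract does not specify how the semantic blocks are actually built.

```python
from collections import defaultdict
from itertools import combinations


def block_pairs(records, key):
    """Group records by a blocking key and return the candidate pairs
    (as sorted id tuples) that fall inside the same block."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[key(rec)].append(rec["id"])
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical toy data: author records that reference a paper
# through a foreign key (paper_id).
authors = [
    {"id": 1, "name": "A. Smith", "paper_id": 10},
    {"id": 2, "name": "Anna Smith", "paper_id": 10},
    {"id": 3, "name": "A. Smyth", "paper_id": 11},
    {"id": 4, "name": "B. Jones", "paper_id": 10},
]

# Conventional blocking on an attribute: first letter of the name.
syntactic = block_pairs(authors, key=lambda r: r["name"][0])

# Relationship-based blocking (assumption): records that point to
# the same related entity share a block.
semantic = block_pairs(authors, key=lambda r: r["paper_id"])

# One possible way to combine them: keep only pairs proposed by both,
# which leaves fewer candidates for an expert to review.
to_review = syntactic & semantic
print(sorted(to_review))
```

In this toy case the attribute key alone proposes three pairs, while intersecting with the relationship-based blocks leaves a single pair for review. Whether the paper combines blocks this way is not stated in the abstract.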
Abstract:
|
Peer Reviewed |
Subject(s):
|
- UPC subject areas::Computer science::Information systems::Information storage and retrieval
- Data integration (Computer science)
- Electronic data processing -- Data preparation
- Electronic data processing -- Quality control
- Semantic information
- Blocking algorithms
- Record linkage
- Data integration
- Data cleansing |
Document type:
|
Article - Published version; Conference Object |
Published by:
|
IEEE Computer Society
|