Bijoux : data generator for evaluating ETL process quality

dc.contributor
Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació
dc.contributor
Universitat Politècnica de Catalunya. MPI - Modelització i Processament de la Informació
dc.contributor.author
Nakuçi, Emona
dc.contributor.author
Theodorou, Vasileios
dc.contributor.author
Jovanovic, Petar
dc.contributor.author
Abelló Gamazo, Alberto
dc.date.issued
2014
dc.identifier
Nakuçi, E. [et al.]. Bijoux : data generator for evaluating ETL process quality. A: International Workshop on Data Warehousing and OLAP. "Proceedings of the 17th International Workshop on Data Warehousing and OLAP". Shanghai: 2014, p. 23-32.
dc.identifier
978-1-4503-0999-8
dc.identifier
https://hdl.handle.net/2117/26130
dc.identifier
10.1145/2666158.2666183
dc.description.abstract
Obtaining the right set of data for evaluating the fulfillment of different quality standards in the extract-transform-load (ETL) process design is rather challenging. First, the real data might be out of reach due to different privacy constraints, while providing a synthetic set of data is known as a labor-intensive task that needs to take various combinations of process parameters into account. Additionally, having a single dataset usually does not represent the evolution of data throughout the complete process lifespan, hence missing the plethora of possible test cases. To facilitate such a demanding task, in this paper we propose an automatic data generator (i.e., Bijoux). Starting from a given ETL process model, Bijoux extracts the semantics of data transformations, analyzes the constraints they imply over data, and automatically generates testing datasets. At the same time, it considers different dataset and transformation characteristics (e.g., size, distribution, selectivity) in order to cover a variety of test scenarios. We report our experimental findings showing the effectiveness and scalability of our approach.
dc.description.abstract
Peer Reviewed
dc.description.abstract
Postprint (published version)
dc.format
10 p.
dc.format
application/pdf
dc.language
eng
dc.relation
http://dl.acm.org/citation.cfm?doid=2666158.2666183
dc.rights
Restricted access - publisher's policy
dc.subject
UPC subject areas::Computer science::Information systems
dc.subject
Data warehousing
dc.subject
Process quality
dc.subject
Data generator
dc.subject
ETL
dc.subject
Repositories
dc.subject
Data managers
dc.title
Bijoux : data generator for evaluating ETL process quality
dc.type
Conference report


Files in this item

There are no files associated with this item.

This item appears in the following collection(s)

E-prints [73032]