To access the full-text documents, please follow this link: http://hdl.handle.net/2117/125127

Data generator for evaluating ETL process quality
Theodorou, Vasileios; Jovanovic, Petar; Abelló Gamazo, Alberto; Nakuçi, Emona
Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació; Universitat Politècnica de Catalunya. IMP - Information Modeling and Processing; Universitat Politècnica de Catalunya. inSSIDE - integrated Software, Service, Information and Data Engineering
Obtaining the right set of data for evaluating the fulfillment of different quality factors in extract-transform-load (ETL) process design is challenging. First, real data might be out of reach due to privacy constraints, while manually providing a synthetic dataset is a labor-intensive task that must take various combinations of process parameters into account. More importantly, a single dataset usually does not represent the evolution of data throughout the complete process lifespan, and hence misses many possible test cases. To facilitate this demanding task, in this paper we propose an automatic data generator, Bijoux. Starting from a given ETL process model, Bijoux extracts the semantics of data transformations, analyzes the constraints they imply over input data, and automatically generates testing datasets. Bijoux is highly modular and configurable, enabling end-users to generate datasets for a variety of test scenarios (e.g., evaluating specific parts of an input ETL process design, with different input dataset sizes, different distributions of data, and different operation selectivities). We have developed a running prototype that implements the functionality of our data generation framework, and here we report experimental findings showing the effectiveness and scalability of our approach.
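As a rough illustration of what generating input data under an operation's constraints and a target selectivity can mean, the following minimal Python sketch builds rows for a single filter operation. It is not Bijoux's algorithm or API: the function names, the `amount` field, and the predicate `amount > threshold` are all hypothetical and chosen only for this example.

```python
import random


def generate_filter_input(n_rows, threshold, selectivity, seed=0):
    """Build n_rows rows so that round(n_rows * selectivity) of them
    satisfy the (hypothetical) filter predicate amount > threshold."""
    rng = random.Random(seed)
    n_pass = round(n_rows * selectivity)
    rows = []
    for i in range(n_rows):
        if i < n_pass:
            # Value constrained to satisfy the predicate.
            amount = threshold + rng.randint(1, 1000)
        else:
            # Value constrained to violate the predicate.
            amount = threshold - rng.randint(0, 1000)
        rows.append({"id": i, "amount": amount})
    # Shuffle so passing and failing rows are interleaved.
    rng.shuffle(rows)
    return rows


def filter_op(rows, threshold):
    """The ETL filter under test (an assumption for this sketch)."""
    return [r for r in rows if r["amount"] > threshold]


rows = generate_filter_input(n_rows=1000, threshold=500, selectivity=0.3)
passed = filter_op(rows, threshold=500)
print(len(passed) / len(rows))
```

Changing `n_rows` or `selectivity` gives datasets of different sizes and pass rates for the same operation, which is the kind of configurability the abstract describes at a much larger scale.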
Peer Reviewed
-UPC subject areas::Computer science::Information systems
-Computer systems - Quality control
-Data generator
-ETL
-Process quality
-Computer systems -- Management and control
Article - Submitted version
Article
Elsevier

Other documents by the same author

Nakuçi, Emona; Theodorou, Vasileios; Jovanovic, Petar; Abelló Gamazo, Alberto
Theodorou, Vasileios; Abelló Gamazo, Alberto; Thiele, Maik; Lehner, Wolfgang
Theodorou, Vasileios; Abelló Gamazo, Alberto; Lehner, Wolfgang
Theodorou, Vasileios; Abelló Gamazo, Alberto; Lehner, Wolfgang; Thiele, Maik
Theodorou, Vasileios; Abelló Gamazo, Alberto; Thiele, Maik; Lehner, Wolfgang