To access the full text documents, please follow this link: http://hdl.handle.net/2117/112967

Towards a comprehensive Data LifeCycle model for big data environments
Sinaeepourfard, Amir; García Almiñana, Jordi; Masip Bruin, Xavier; Marín Tordera, Eva
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors; Universitat Politècnica de Catalunya. CRAAX - Centre de Recerca d'Arquitectures Avançades de Xarxes
A huge amount of data is constantly being produced worldwide. Data coming from the IoT, from scientific simulations, or from any other field of eScience accumulate on top of historical data sets and seed future Big Data processing, whose final goal is to generate added value and discover knowledge. In such computing processes data are the main resource; however, organizing and managing data throughout their entire life cycle is a complex research topic. To address this, Data LifeCycle (DLC) models have been proposed to efficiently organize large and complex data sets, from creation to consumption, in any field and at any scale, for effective data usage and big data exploitation. Several DLC frameworks can be found in the literature, each one defined for specific environments and scenarios. However, we found no global and comprehensive DLC model that can be easily adapted to different scientific areas. For this reason, in this paper we describe the Comprehensive Scenario Agnostic Data LifeCycle (COSA-DLC) model, a DLC model which: i) is proved to be comprehensive, as it addresses the 6Vs challenges (namely Value, Volume, Variety, Velocity, Variability and Veracity); and ii) can be easily adapted to any particular scenario and, therefore, fit the requirements of a specific scientific field. We also include two use cases to illustrate the ease of adaptation to different scenarios. We conclude that the comprehensive scenario agnostic DLC model provides several advantages, such as facilitating global data management, organization and integration, easing adaptation to any kind of scenario, guaranteeing good data quality levels and, therefore, saving design time and effort for the scientific and industrial communities.
Peer Reviewed
-UPC subject areas::Computer science::Information systems
-Internet of things
-Data mining
-Big data
-Data LifeCycle
-Data management
-Data organization
-Data complexity
-Vs Challenges
Article - Submitted version
Conference Object
Institute of Electrical and Electronics Engineers (IEEE)
         
