MapReduce performance models for Hadoop 2.x

dc.contributor
Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació
dc.contributor
Universitat Politècnica de Catalunya. inSSIDE - integrated Software, Service, Information and Data Engineering
dc.contributor.author
Glushkova, Daria
dc.contributor.author
Jovanovic, Petar
dc.contributor.author
Abelló Gamazo, Alberto
dc.date.issued
2017
dc.identifier
Glushkova, D., Jovanovic, P., Abelló, A. MapReduce performance models for Hadoop 2.x. A: International Workshop On Design, Optimization, Languages and Analytical Processing of Big Data. "Proceedings of the Workshops of the EDBT/ICDT 2017 Joint Conference (EDBT/ICDT 2017): Venice, Italy, March 21-24, 2017". Venice: CEUR-WS.org, 2017, p. 1-10.
dc.identifier
1613-0073
dc.identifier
https://hdl.handle.net/2117/113535
dc.description.abstract
MapReduce is a popular programming model for distributed processing of large data sets. Apache Hadoop is one of the most common open-source implementations of this paradigm. Performance analysis of concurrent job executions has been recognized as a challenging problem; at the same time, it may provide reasonably accurate estimates of job response time at a significantly lower cost than experimental evaluation of real setups. In this paper, we tackle the challenge of defining MapReduce performance models for Hadoop 2.x. While there are several efficient approaches for modeling the performance of MapReduce workloads in Hadoop 1.x, the fundamental architectural changes of Hadoop 2.x require that the cost models be reconsidered as well. The proposed solution is based on an existing performance model for Hadoop 1.x, but it takes into consideration the architectural changes of Hadoop 2.x and captures the execution flow of a MapReduce job by using a queuing network model. This way, the cost model adheres to the intra-job synchronization constraints that occur due to contention at shared resources. The accuracy of our solution is validated by comparing our model estimates against measurements in a real Hadoop 2.x setup. According to our evaluation results, the proposed model produces estimates of average job response time with an error in the range of 11% to 13.5%.
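The abstract describes capturing a MapReduce job's execution flow with a queuing network model, and "Mean value analysis" appears among the subject keywords. As a rough illustration only (not the paper's actual model), the following sketch runs exact Mean Value Analysis on a closed queuing network; the per-station service demands and the mapping of stations to CPU, disk, and shuffle are hypothetical:

```python
# Illustrative sketch, not the paper's model: exact Mean Value Analysis (MVA)
# for a closed product-form queuing network of FCFS stations.

def mva(demands, n_jobs):
    """Exact MVA recursion.

    demands: per-station service demands (seconds per job visit) -- hypothetical
    n_jobs:  number of concurrent jobs circulating in the network
    Returns (system throughput, per-station mean queue lengths).
    """
    queues = [0.0] * len(demands)            # Q_k(0) = 0 for every station k
    throughput = 0.0
    for n in range(1, n_jobs + 1):
        # Residence time: service demand inflated by jobs already queued there
        resid = [d * (1 + q) for d, q in zip(demands, queues)]
        throughput = n / sum(resid)          # X(n) = n / sum_k R_k(n)
        queues = [throughput * r for r in resid]  # Little's law per station
    return throughput, queues

# Hypothetical demands: CPU-bound map work, local disk I/O, shuffle/network
x, q = mva([0.4, 0.2, 0.1], n_jobs=5)
response_time = 5 / x                        # overall R(N) = N / X(N)
```

With five concurrent jobs, the throughput is bounded by the bottleneck station (here the 0.4 s demand), and the per-station queue lengths sum to the job population, as Little's law requires.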
dc.description
Peer Reviewed
dc.description
Postprint (published version)
dc.format
10 p.
dc.format
application/pdf
dc.language
eng
dc.publisher
CEUR-WS.org
dc.relation
http://ceur-ws.org/Vol-1810/DOLAP_paper_28.pdf
dc.rights
http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.rights
Open Access
dc.rights
Attribution-NonCommercial-NoDerivs 3.0 Spain
dc.subject
UPC subject areas::Computing::Information systems
dc.subject
Electronic data processing -- Distributed processing
dc.subject
Cost effectiveness
dc.subject
Open source software
dc.subject
MapReduce performance models
dc.subject
Hadoop 2.x
dc.subject
Queuing theory
dc.subject
Mean value analysis
dc.subject
Distributed data processing
dc.subject
Cost effectiveness
dc.subject
Free software
dc.title
MapReduce performance models for Hadoop 2.x
dc.type
Conference report

