Abstract
MapReduce is an emerging programming paradigm for data-parallel applications. We discuss common strategies for implementing a MapReduce runtime and propose an optimized implementation on top of MPI. Our implementation combines the redistribution and reduce steps and moves them into the network. This approach especially benefits applications whose map phase emits a limited number of output keys. We also show how anticipated MPI-2.2 and MPI-3 features, such as MPI_Reduce_local and nonblocking collective operations, can be used to implement and optimize MapReduce, with a performance improvement of up to 25% on 127 cluster nodes. Finally, we discuss additional features that would enable MPI to support all MapReduce applications more efficiently. © 2009 Springer Berlin Heidelberg.
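The idea of folding redistribution and reduce into a single tree-shaped reduction can be illustrated with a small simulation. The sketch below is not the authors' implementation and makes no MPI calls; it only mimics, in plain Python, how per-rank map outputs might be merged pairwise up a binomial tree. The names `map_phase`, `reduce_local`, and `tree_reduce` are hypothetical, and word counting is an assumed example workload.

```python
from collections import Counter


def map_phase(chunk):
    """Hypothetical map step: count the words in one rank's input chunk."""
    return Counter(chunk.split())


def reduce_local(a, b):
    """Merge two partial results (the role a local reduction would play)."""
    merged = Counter(a)
    merged.update(b)  # adds counts key by key
    return merged


def tree_reduce(partials):
    """Simulate a binomial-tree reduction over per-rank partial results."""
    partials = list(partials)
    n = len(partials)
    step = 1
    while step < n:
        # In each round, rank r absorbs the result held by rank r + step.
        for rank in range(0, n, 2 * step):
            partner = rank + step
            if partner < n:
                partials[rank] = reduce_local(partials[rank], partials[partner])
        step *= 2
    return partials[0]  # simulated rank 0 ends up with the full result


# Four simulated ranks, each holding one chunk of input text.
chunks = ["a b a", "b c", "a c c", "a"]
result = tree_reduce(map_phase(c) for c in chunks)
print(dict(result))
```

Under these assumptions, each intermediate result holds at most one entry per distinct key, so the data passed up the tree stays small when the map phase emits few keys, which is the case the abstract singles out as benefiting most.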
Original language | English |
---|---|
Title of host publication | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Publisher | Springer Nature |
Pages | 240-249 |
Number of pages | 9 |
Volume | 5759 |
ISBN (Print) | 3642037690, 9783642037696 |
Publication status | Published - 2009 |
Event | 16th European Parallel Virtual Machine and Message Passing Interface Users' Group Meeting, EuroPVM/MPI - Espoo. Duration: 1 Jul 2009 → … http://dblp.uni-trier.de/db/conf/pvm/pvm2009.html#HoeflerLD09 http://dblp.uni-trier.de/rec/bibtex/conf/pvm/HoeflerLD09.xml http://dblp.uni-trier.de/rec/bibtex/conf/pvm/HoeflerLD09 |
Publication series
Name | Lecture Notes in Computer Science |
---|
Conference
Conference | 16th European Parallel Virtual Machine and Message Passing Interface Users' Group Meeting, EuroPVM/MPI |
---|---|
City | Espoo |
Period | 1/07/09 → … |