Indiana University

Hierarchical MapReduce

Overview

The MapReduce programming model provides an easy way to execute pleasantly parallel applications. Many data-intensive applications fit this model and benefit from the scalability it delivers. Although commercial clouds can provide virtually unlimited computation and storage resources on demand, financial, security, and possibly other concerns lead many researchers to still run experiments on a number of small clusters, each with a limited number of nodes, that cannot unleash the full power of MapReduce. We present a hierarchical MapReduce framework that gathers computation resources from different clusters and runs MapReduce jobs across them. The global controller in our framework splits the data set and dispatches the partitions to multiple "local" MapReduce clusters, balancing the workload by assigning tasks in accordance with the capabilities of each cluster and of each node. The local results are then returned to the global controller for global reduction.
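The flow described above can be illustrated with a minimal Python sketch. It is not the framework's actual implementation: the function names, the use of a single capacity number per cluster (for example, a core count), and word counting as the stand-in workload are all assumptions made for illustration. The sketch splits the input in proportion to each cluster's capacity, runs a local map and reduce on each partition, and then combines the local results in a global reduction.

```python
from collections import Counter
from functools import reduce


def partition_by_capacity(records, capacities):
    """Split records into contiguous slices sized in proportion to capacity.

    `capacities` maps a hypothetical cluster name to a relative capacity
    (assumed here to be something like a core count).
    """
    total = sum(capacities.values())
    slices, start, cumulative = {}, 0, 0
    for name, capacity in capacities.items():
        cumulative += capacity
        end = round(len(records) * cumulative / total)
        slices[name] = records[start:end]
        start = end
    return slices


def local_mapreduce(records):
    """Stand-in for one local cluster's job: count words, reduce locally."""
    mapped = (Counter(line.split()) for line in records)
    return reduce(lambda a, b: a + b, mapped, Counter())


def hierarchical_mapreduce(records, capacities):
    """Global controller: partition, run local jobs, then reduce globally."""
    slices = partition_by_capacity(records, capacities)
    local_results = [local_mapreduce(part) for part in slices.values()]
    return reduce(lambda a, b: a + b, local_results, Counter())


if __name__ == "__main__":
    data = ["a b", "b c", "c c", "a", "b", "d"]
    clusters = {"cluster1": 2, "cluster2": 1}  # hypothetical capacities
    print(hierarchical_mapreduce(data, clusters))
```

In this sketch the local jobs run sequentially in one process; in a real deployment each partition would be shipped to a separate cluster, and only the much smaller local results would travel back to the global controller.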

Hierarchical MapReduce Architecture

Contact

Project Contributors

  • Yuan Luo 
  • Yiming Sun
  • Zhenhua Guo
  • Beth Plale

Publications

Presentations

  • A Hierarchical MapReduce Framework, Invited talk at IBM Student Workshop for Frontiers of Cloud Computing 2012, IBM Thomas J. Watson Research Center, Hawthorne, New York, July 30-31, 2012.
  • Hierarchical MapReduce: Towards Simplified Cross-Domain Data Processing, Invited talk at Cloud Computing Lecture, Indiana University, Oct 12, 2011.
  • A Hierarchical Framework for Cross-Domain MapReduce Execution, Presented at ECMLS 2011 Workshop, co-located with HPDC 2011, San Jose, CA, Jun 8, 2011.

Posters

  • A Hierarchical MapReduce Framework, PRAGMA 22 Workshop, Monash University, Melbourne, Australia, April 17-19, 2012.
Related News, Events and Publications:

Hierarchical MapReduce: towards simplified cross-domain data processing
Hierarchical MapReduce Programming Model and Scheduling Algorithms
Middleware alternatives for storm surge predictions in Windows Azure: Prof. Plale introduces research D2I will carry forward with a new project in conjunction with Craig Mattocks from University of Miami. Attendees included representatives from the National Hurricane Center and the National Weather Service. May 2, 2012.
A Hierarchical MapReduce Framework: PRAGMA 22 Workshop, Monash University, Melbourne, Australia, April 17-19, 2012. Poster presented by Yuan Luo.
Karma Provenance Collection Tool: The Karma tool is a standalone tool that can be added to existing cyberinfrastructure for purposes of collection and representation of provenance data. Karma utilizes a modular architecture that permits support for multiple instrumentation plugins that make it usable in different architectural settings.
Provenance and Metadata: The Data to Insight Center has a strong presence in provenance and metadata for scientific data through numerous funded projects and efforts with collaborators both at Indiana University and at other institutions.
Provenance: As research digital data collections become more accessible, it becomes increasingly important to address the issues of data validity and quality: To record and manage information about where each data object originated, the processes applied to...
A Hierarchical Framework for Cross-Domain MapReduce Execution: The MapReduce programming model provides an easy way to execute pleasantly parallel applications. Many data-intensive life science applications fit this programming model and benefit from the scalability that can be delivered using this model. One such application is AutoDock, which consists of a suite of automated tools for predicting the bound conformations of flexible ligands to macromolecular targets. However, researchers also need sufficient computation and storage resources to fully enjoy...
Report to the Lilly Endowment, Inc., 24 Month Program Report, Jun 1, 2010 - Nov 30, 2010: Bi-annual report to the Lilly Endowment, Inc. Search for "Lilly Report" to find all reports.

Attachment: architecture_diagram.jpg (872.21 KB)