Indiana University

Hierarchical MapReduce

Overview

The MapReduce programming model provides an easy way to execute pleasantly parallel applications. Many data-intensive applications fit this model and benefit from the scalability it delivers. Although commercial clouds can provide virtually unlimited computation and storage resources on demand, financial, security, and possibly other concerns lead many researchers to run experiments on a number of small clusters, each with a limited number of nodes, that cannot unleash the full power of MapReduce. We present a hierarchical MapReduce framework that gathers computation resources from different clusters and runs MapReduce jobs across them. The global controller in our framework splits the data set, dispatches the partitions to multiple "local" MapReduce clusters, and balances the workload by assigning tasks in accordance with the capabilities of each cluster and of each node. The local results are then returned to the global controller for global reduction.
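
The flow described above (split, dispatch to local clusters, local MapReduce, global reduction) can be illustrated with a small single-process sketch. This is not the framework's implementation or its scheduling algorithm: the function names, the use of a core count as the "capability" of a cluster, and the word-count job are all assumptions chosen to keep the example short.

```python
from collections import Counter
from typing import Dict, List


def split_by_capacity(items: List[str], capacities: Dict[str, int]) -> Dict[str, List[str]]:
    """Give each cluster a contiguous share of items proportional to its capacity.

    `capacities` maps a hypothetical cluster name to a hypothetical capacity
    figure (for example, a core count). Cumulative integer boundaries ensure
    every item is assigned exactly once.
    """
    total = sum(capacities.values())
    shares: Dict[str, List[str]] = {}
    start = 0
    cumulative = 0
    for name, capacity in capacities.items():
        cumulative += capacity
        end = len(items) * cumulative // total
        shares[name] = items[start:end]
        start = end
    return shares


def local_mapreduce(records: List[str]) -> Counter:
    """Stand-in for a job on one local cluster: map records to words, reduce by counting."""
    counts: Counter = Counter()
    for record in records:
        counts.update(record.split())
    return counts


def global_reduce(partials: List[Counter]) -> Counter:
    """Stand-in for the global controller's reduction: merge the local results."""
    merged: Counter = Counter()
    for partial in partials:
        merged.update(partial)
    return merged


def run_hierarchical(items: List[str], capacities: Dict[str, int]) -> Counter:
    """Split the input, run each share 'locally', then reduce globally."""
    shares = split_by_capacity(items, capacities)
    partials = [local_mapreduce(share) for share in shares.values()]
    return global_reduce(partials)
```

For example, calling `run_hierarchical(["a b", "b c", "c c", "a", "b", "c"], {"big": 4, "small": 2})` assigns the first four records to the hypothetical cluster `big` and the remaining two to `small`, then merges the two local word counts into one result. A real deployment would dispatch each share to a separate cluster and would also account for per-node capabilities, which this sketch leaves out.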

Hierarchical MapReduce Architecture

Contact

Project Contributors

  • Yuan Luo 
  • Yiming Sun
  • Zhenhua Guo
  • Beth Plale

Publications

  • Yuan Luo and Beth Plale. Hierarchical MapReduce Programming Model and Scheduling Algorithms, Doctoral Symposium of the 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Ottawa, Canada, May 13-16, 2012, to appear.
  • Yuan Luo, Beth Plale, Zhenhua Guo, Wilfred W. Li, Judy Qiu, Yiming Sun. Hierarchical MapReduce: Towards Simplified Cross-Domain Data Processing, Concurrency and Computation: Practice and Experience, accepted.
  • Yuan Luo, Zhenhua Guo, Yiming Sun, Beth Plale, Judy Qiu, Wilfred W. Li. A Hierarchical Framework for Cross-Domain MapReduce Execution, in Proceedings of Emerging Computational Methods for the Life Sciences Workshop (ECMLS2011) of The 20th ACM High Performance Distributed Computing Conference (HPDC 2011), San Jose, California, June 8-10, 2011.

Presentations

  • Hierarchical MapReduce: Towards Simplified Cross-Domain Data Processing, Invited talk at Cloud Computing Lecture, Indiana University, Oct 12, 2011.
  • A Hierarchical Framework for Cross-Domain MapReduce Execution, Presented at ECMLS 2011 Workshop, co-located with HPDC 2011, San Jose, CA, June 8, 2011. [Slides]

Posters

  • A Hierarchical MapReduce Framework, PRAGMA 22 Workshop, Monash University, Melbourne, Australia, April 17-19, 2012. [Slides]

Related News, Events and Publications

  • Middleware alternatives for storm surge predictions in Windows Azure. Prof. Plale introduces research D2I will carry forward with a new project in conjunction with Craig Mattocks from University of Miami. Attendees included representatives from the National Hurricane Center and the National Weather Service. May 2, 2012.
  • A Hierarchical MapReduce Framework. PRAGMA 22 Workshop, Monash University, Melbourne, Australia, April 17-19, 2012. Poster presented by Yuan Luo.
  • A Hierarchical Framework for Cross-Domain MapReduce Execution. The MapReduce programming model provides an easy way to execute pleasantly parallel applications. Many data-intensive life science applications fit this programming model and benefit from the scalability that can be delivered using this model. One such application is AutoDock, which consists of a suite of automated tools for predicting the bound conformations of flexible ligands to macromolecular targets. However, researchers also need sufficient computation and storage resources to fully enjoy...
  • Report to the Lilly Endowment, Inc., Grant Number 2008 1639-00. 24 Month Program Report, June 1, 2010 - November 30, 2010. Bi-annual report to the Lilly Endowment, Inc. Search for "Lilly Report" to find all reports.

Attachment: architecture_diagram.jpg (872.21 KB)