Overview
The MapReduce programming model provides an easy way to execute pleasantly
parallel applications. Many data-intensive applications fit this
programming model and benefit from the scalability it delivers. Although
commercial clouds can provide virtually unlimited computation and storage
resources on demand, financial, security, and possibly other concerns lead
many researchers to keep running experiments on a number of small clusters,
each with a limited number of nodes, that cannot unleash the full power of
MapReduce. We present a hierarchical MapReduce framework that gathers
computation resources from different clusters and runs MapReduce jobs
across them. The global controller in our framework splits the data set,
dispatches the partitions to multiple "local" MapReduce clusters, and
balances the workload by assigning tasks in accordance with the
capabilities of each cluster and of each node. The local results are then
returned to the global controller for global reduction.
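The flow described above (split, dispatch to local clusters, then reduce globally) can be illustrated with a minimal Python sketch. This is not the framework's actual code or scheduling algorithm: the function names, the use of a single integer "capacity" per cluster, the proportional contiguous split, and the word-count job standing in for a local MapReduce run are all assumptions made for illustration.

```python
from collections import Counter
from functools import reduce


def split_by_capacity(items, capacities):
    """Split items into contiguous chunks sized in proportion to capacity.

    `capacities` maps a (hypothetical) cluster name to a positive number,
    e.g. its count of available cores. Cumulative rounding guarantees that
    every item is assigned to exactly one cluster.
    """
    total = sum(capacities.values())
    partitions = {}
    start = 0
    cumulative = 0
    for name, capacity in capacities.items():
        cumulative += capacity
        end = round(len(items) * cumulative / total)
        partitions[name] = items[start:end]
        start = end
    return partitions


def local_mapreduce(records):
    """Stand-in for a job on one local cluster: count words in its partition."""
    counts = Counter()
    for line in records:
        counts.update(line.split())
    return counts


def global_reduce(local_results):
    """Merge the per-cluster results on the global controller."""
    return reduce(lambda left, right: left + right, local_results, Counter())


def run_hierarchical(items, capacities):
    """Split the data set, run each partition locally, then reduce globally."""
    partitions = split_by_capacity(items, capacities)
    local_results = [local_mapreduce(part) for part in partitions.values()]
    return partitions, global_reduce(local_results)
```

For example, calling `run_hierarchical(lines, {"cluster_a": 4, "cluster_b": 2})` on six input lines would hand four lines to `cluster_a` and two to `cluster_b` before merging the two local word counts. A real deployment would also have to account for per-node capabilities and data transfer cost, which this sketch ignores.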
Hierarchical MapReduce Architecture
Contact
Project Contributors
- Yuan Luo
- Yiming Sun
- Zhenhua Guo
- Beth Plale
Publications
- Yuan Luo, Beth Plale, Zhenhua Guo, Wilfred W. Li, Judy Qiu, Yiming Sun. (2012), Hierarchical MapReduce: Towards Simplified Cross-Domain Data Processing, Concurrency and Computation: Practice and Experience, doi: 10.1002/cpe.2929
- Yuan Luo and Beth Plale. Hierarchical MapReduce Programming Model and Scheduling Algorithms, in Proceedings of the 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Ottawa, Canada, May 13-16, 2012
- Yuan Luo, Zhenhua Guo, Yiming Sun, Beth Plale, Judy Qiu, Wilfred W. Li, A Hierarchical Framework for Cross-Domain MapReduce Execution, in Proceedings of Emerging Computational Methods for the Life Sciences Workshop (ECMLS2011) of The 20th ACM High Performance Distributed Computing Conference (HPDC 2011), San Jose, California, June 8-10, 2011
Presentations
- A Hierarchical MapReduce Framework, Invited talk at IBM Student Workshop for Frontiers of Cloud Computing 2012, IBM Thomas J. Watson Research Center, Hawthorne, New York, July 30-31, 2012
- Hierarchical MapReduce: Towards Simplified Cross-Domain Data Processing, Invited talk at Cloud Computing Lecture, Indiana University, Oct 12, 2011.
- A Hierarchical Framework for Cross-Domain MapReduce Execution, Presented at ECMLS 2011 Workshop, co-located with HPDC 2011, San Jose, CA, Jun 8th, 2011. [Slides]
Posters
- A Hierarchical MapReduce Framework, PRAGMA 22 Workshop, Monash University, Melbourne, Australia, April 17-19, 2012 [Slides]
Related News, Events and Publications:
- Hierarchical MapReduce: towards simplified cross-domain data processing
- Hierarchical MapReduce Programming Model and Scheduling Algorithms
- Middleware alternatives for storm surge predictions in Windows Azure
  Prof. Plale introduces research D2I will carry forward with a new project in conjunction with Craig Mattocks from University of Miami. Attendees included representatives from the National Hurricane Center and the National Weather Service. May 2, 2012.
- A Hierarchical MapReduce Framework
  PRAGMA 22 Workshop, Monash University, Melbourne, Australia, April 17-19, 2012. Poster presented by Yuan Luo
- Karma Provenance Collection Tool
  The Karma tool is a standalone tool that can be added to existing cyberinfrastructure for purposes of collection and representation of provenance data. Karma utilizes a modular architecture that permits support for multiple instrumentation plugins that make it usable in different architectural settings.
- Provenance and Metadata
  The Data to Insight Center has a strong presence in provenance and metadata for scientific data through numerous funded projects and efforts with collaborators both at Indiana University and at other institutions.
  Provenance: As research digital data collections become more accessible, it becomes increasingly important to address the issues of data validity and quality: To record and manage information about where each data object originated, the processes applied to...
- A Hierarchical Framework for Cross-Domain MapReduce Execution
  The MapReduce programming model provides an easy way to execute pleasantly parallel applications. Many data-intensive life science applications fit this programming model and benefit from the scalability that can be delivered using this model. One such application is AutoDock, which consists of a suite of automated tools for predicting the bound conformations of flexible ligands to macromolecular targets. However, researchers also need sufficient computation and storage resources to fully enjoy...
- Report to the Lilly Endowment, Inc. 24 Month Program Report Jun 1, 2010 - Nov 30, 2010
  Bi-Annual report to the Lilly Endowment, Inc. Search for "Lilly Report" to find all reports.