This project is devoted to using the MapReduce programming paradigm on clouds and hybrid infrastructures. Partners: Argonne National Lab (USA), the University of Illinois at Urbana-Champaign (USA), the UIUC-INRIA Joint Lab on Petascale Computing, IBM France, IBCP, MEDIT (SME) and the GRAAL/AVALON INRIA project-team.
This project aims to overcome the limitations of current Map-Reduce frameworks such as Hadoop, thereby enabling highly scalable Map-Reduce-based data processing on various physical platforms such as clouds, desktop grids, or hybrid infrastructures that combine the two. To meet this global goal, several critical aspects will be investigated.
Data storage and sharing architecture. First, we will explore advanced techniques for scalable, high-throughput, concurrency-optimized data and metadata management, based on recent preliminary contributions of the partners.
Scheduling. Second, we will investigate various scheduling issues related to large executions of Map-Reduce instances. In particular, we will study how the scheduler of the Hadoop implementation of Map-Reduce can scale over heterogeneous platforms; other issues include dynamic data replication and fair scheduling of multiple parallel jobs.
Fault tolerance and security. Finally, we intend to explore techniques to improve the execution of Map-Reduce applications on large-scale infrastructures with respect to fault tolerance and security.
Our global goal is to explore how combining these techniques can improve the behavior of Map-Reduce-based applications on the target large-scale infrastructures. To this end, we will rely on recent preliminary contributions of the project partners, illustrated through the following main building blocks.
BlobSeer, a new approach to distributed data management being designed by the KerData team from INRIA Rennes – Bretagne Atlantique to enable scalable, efficient, fine-grain access to massive, distributed data under heavy concurrency.
BitDew, a data-sharing platform currently being designed by the GRAAL team from INRIA Grenoble – Rhône-Alpes at ENS Lyon, with the goal of exploring the specificities of desktop grid infrastructures.
Nimbus, a reference open-source cloud management toolkit developed at the University of Chicago and Argonne National Laboratory (USA) with the goal of facilitating the operation of clusters as Infrastructure-as-a-Service (IaaS) clouds.
More information is available on the MapReduce web site.