WG – Laércio LIMA PILLA: Current Efforts in Global Scheduling and Fault Tolerance for HPC Systems


Title: Current Efforts in Global Scheduling and Fault Tolerance for HPC Systems

Speaker: Laércio LIMA PILLA

Abstract: Performance, energy efficiency, and reliability have been important objectives and challenges in current and future computing systems. In this context, our approach has been based on understanding the details of the computing system architecture and the behavior of applications, in order to combine this information, identify issues and propose new solutions. In this presentation, I will discuss our experience with the development of new architecture-aware global scheduling algorithms for multiprocessor and multicomputer systems, and with fault tolerance mechanisms for radiation-induced errors in parallel accelerators. I will also present some future global scheduling plans to handle the inclusion of non-volatile random-access memories (NVRAMs) in computing systems.

WG – Victor Allombert: Programming Multi-BSP Algorithms in ML


Title: Programming Multi-BSP Algorithms in ML

Speaker: Victor Allombert

Abstract: From personal computers using an increasing number of cores to supercomputers having millions of computing units, parallel architectures are the current standard. High-performance architectures are usually referred to as hierarchical, as they are composed of clusters of multi-processors of multi-cores. Programming such architectures is notoriously difficult: writing parallel programs is, most of the time, hard in both the algorithmic and the implementation phases. To address those concerns, many structured models and languages have been proposed in order to increase both expressiveness and efficiency. Among other models, Multi-BSP is a bridging model dedicated to hierarchical architectures that ensures efficiency, execution safety, scalability and cost prediction. It is an extension of the well-known BSP model, which handles flat architectures. We introduce the Multi-ML language, which allows programming Multi-BSP algorithms “à la ML” and thus guarantees the properties of the Multi-BSP model and execution safety, thanks to an ML type system. To deal with the multi-level execution model of Multi-ML, we defined formal semantics that describe the valid evaluation of an expression. To ensure the execution safety of Multi-ML programs, we also propose a type system that preserves replicated coherence. An abstract machine is defined to formally describe the evaluation of a Multi-ML program on a Multi-BSP architecture. An implementation of the language is available as a compilation toolchain. It is thus possible to generate efficient parallel code from a program written in Multi-ML and execute it on any hierarchical machine.
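As a reminder of what "cost prediction" means in this family of models, the sketch below computes the cost of one superstep in the classic flat BSP model: the largest local computation time, plus the largest communication volume of any processor (h) times the per-word cost g, plus the synchronization latency l. This is only the flat BSP formula; Multi-BSP applies comparable parameters at each level of the hierarchy, and the speaker's own cost formulas are not reproduced here. The names bsp_params and superstep_cost are hypothetical and chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical parameter bundle for a flat BSP machine (illustration only).
struct bsp_params {
    double g;  // communication cost per word
    double l;  // cost of one barrier synchronization
};

// Cost of one flat BSP superstep: max_i w_i + (max_i h_i) * g + l.
// work[i] is the local computation time of processor i,
// h[i] is the largest number of words processor i sends or receives.
double superstep_cost(const std::vector<double>& work,
                      const std::vector<double>& h,
                      bsp_params p) {
    assert(!work.empty() && work.size() == h.size());
    const double w_max = *std::max_element(work.begin(), work.end());
    const double h_max = *std::max_element(h.begin(), h.end());
    return w_max + h_max * p.g + p.l;
}
```

For instance, with two processors doing 100 and 120 units of work, exchanging at most 10 words, and with g = 2 and l = 50, the predicted cost is 120 + 10 * 2 + 50 = 190 units.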

GrPPI: A Generic Parallel Pattern Interface for Stream and Data Processing

The next Avalon working group will be held tomorrow (3/10/2016) at 15h in amphi L. Dr. Manuel F. Dolz, currently visiting us from Madrid, will talk about his work.

Current parallel programming frameworks aid developers to a great extent in implementing applications that exploit parallel hardware resources. Nevertheless, developers require additional expertise to properly use and tune them to operate efficiently on specific parallel platforms. To address the lack of high-level parallel pattern abstractions, we present GrPPI, a generic and reusable parallel pattern interface for both stream processing and data-intensive C++ applications (https://github.com/arcosuc3m/grppi). GrPPI provides a layer between developers and existing parallel programming frameworks targeting multi-core processors, such as C++ threads, OpenMP and Intel TBB. To achieve this goal, the interface leverages modern C++ features, metaprogramming techniques, and template-based programming to act as a switch among those frameworks. All in all, thanks to its high-level API and compact design, GrPPI allows users to easily expose parallelism while hiding the complexity of the underlying concurrency mechanisms. We evaluate this interface using an image processing use case and demonstrate its benefits from the usability, flexibility, and performance points of view.
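To give a rough idea of how templates can act as a switch among back-ends, here is a minimal sketch of a "map" pattern selected by a tag type passed as its first argument. It is not GrPPI's actual API: the names sequential_backend, thread_backend and map_pattern are hypothetical, and only a sequential and a plain C++ threads back-end are shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical back-end tags (illustration only, not GrPPI types).
struct sequential_backend {};
struct thread_backend {
    explicit thread_backend(unsigned w = 4) : workers(w) {}
    unsigned workers;
};

// Sequential version of the map pattern: out[i] = f(in[i]).
template <typename T, typename F>
void map_pattern(sequential_backend, const std::vector<T>& in,
                 std::vector<T>& out, F f) {
    out.resize(in.size());
    std::transform(in.begin(), in.end(), out.begin(), f);
}

// Threaded version: same call shape, the input is split into contiguous chunks.
template <typename T, typename F>
void map_pattern(thread_backend be, const std::vector<T>& in,
                 std::vector<T>& out, F f) {
    assert(be.workers > 0);
    out.resize(in.size());
    const std::size_t n = in.size();
    const std::size_t chunk = (n + be.workers - 1) / be.workers;
    std::vector<std::thread> pool;
    for (std::size_t w = 0; w < be.workers; ++w) {
        const std::size_t begin = std::min(n, w * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;  // no elements left for further workers
        pool.emplace_back([&in, &out, f, begin, end] {
            for (std::size_t i = begin; i < end; ++i) out[i] = f(in[i]);
        });
    }
    for (auto& t : pool) t.join();
}
```

With this shape, user code such as map_pattern(thread_backend{2}, pixels, result, op) changes back-end by changing only the first argument, and overload resolution picks the implementation at compile time.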