Exa-SofT: HPC software and tools

A NumPEx PEPR project

Although significant effort has gone into implementing and optimizing several crucial parts of a typical HPC software stack, most HPC experts agree that exascale supercomputers will raise new challenges, mostly because exascale compute-node hardware is trending toward heterogeneity and scalability: compute nodes of future systems will combine regular CPUs with accelerators (typically GPUs), and the GPU architectures themselves will be diverse.

Meeting the needs of complex parallel applications and the requirements of exascale architectures raises numerous challenges that remain unaddressed.
As a result, several parts of the software stack must evolve to better support these architectures. More importantly, the links between these parts must be strengthened to form a coherent, tightly integrated software suite.

Our project aims to consolidate the exascale software ecosystem by providing a coherent, exascale-ready software stack featuring breakthrough research advances enabled by multidisciplinary collaborations between researchers.

The main scientific challenges we intend to address are:

  • productivity,
  • performance portability,
  • heterogeneity,
  • scalability and resilience,
  • performance and energy efficiency.

AVALON coordinates WP1 and participates in WP1 and WP2.

Project Information

  • URL: Not available yet
  • Starting date: 2023
  • End date: 2028

Taranis: Model, Deploy, Orchestrate, and Optimize Cloud

A PEPR Cloud project

New infrastructures, such as Edge Computing or the Cloud-Edge-IoT computing continuum, make cloud issues more complex: they add new challenges related to resource diversity and heterogeneity (from small sensors to data centers/HPC, from low-power networks to core networks), geographical distribution, and increased dynamicity and security needs, all under energy consumption and regulatory constraints.

To exploit these new infrastructures efficiently, we propose a strategy based on a significant abstraction of the application structure description, in order to further automate application and infrastructure management. This will make it possible to globally optimize the resources used with respect to multi-criteria objectives (price, deadline, performance, energy, etc.) on both the user side (applications) and the provider side (infrastructures). This abstraction also raises challenges related to abstracting application reconfiguration and to automatically adapting the use of resources.
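As a purely illustrative sketch of what optimizing against multi-criteria objectives can mean, the Python snippet below keeps only the Pareto-optimal candidates among a few deployment options. Nothing here comes from the Taranis project itself: the `Placement` class, the three criteria chosen, the candidate names, and all numeric values are hypothetical, and Pareto filtering is just one common way to handle several objectives at once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Hypothetical deployment option; every criterion is 'lower is better'."""
    name: str
    price: float
    energy: float
    latency: float

    def criteria(self):
        return (self.price, self.energy, self.latency)


def dominates(a, b):
    """True if a is no worse than b on every criterion and strictly better on one."""
    pairs = list(zip(a.criteria(), b.criteria()))
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Made-up values, for illustration only.
candidates = [
    Placement("edge-only", price=2.0, energy=5.0, latency=10.0),
    Placement("cloud-only", price=5.0, energy=9.0, latency=80.0),
    Placement("hybrid", price=3.0, energy=4.0, latency=25.0),
    Placement("hybrid-wasteful", price=4.0, energy=6.0, latency=30.0),
]

front = pareto_front(candidates)
print([p.name for p in front])
```

With these invented numbers, the remaining options are trade-offs: neither is better than the other on every criterion, so choosing between them still requires a user-side or provider-side preference.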

The Taranis project addresses these issues through four scientific work packages, each focusing on a phase of the application lifecycle: application and infrastructure description models, deployment and reconfiguration, orchestration, and optimization.

The first work package “Modeling” addresses the complexity of cloud-edge application and infrastructure models: formal verification and optimization of these models, multi-layer variability, the relationship between model expressiveness and efficient solution computation, lock-ins of proprietary models, and heterogeneity of cloud application and infrastructure modeling languages.

The second work package “Deployment and Reconfiguration” studies deployment- and reconfiguration-related issues to reduce management complexity and increase support for provisioning and configuration languages, while improving operations certification and increasing operations concurrency. The work package also aims to reduce the complexity of the bootstrapping problem on geo-distributed and heterogeneous resources.

The third work package “Orchestration of services and resources” aims to extend orchestrators for the Cloud-Edge-IoT continuum, while making them more autonomous with respect to dynamic, functional and/or non-functional needs, in particular with respect to the network partitioning problem specific to Cloud-Edge-IoT infrastructures.

Finally, the fourth work package “Optimization” aims to revisit the optimization problems associated with the use of Cloud-Edge-IoT infrastructures and the execution of an application when a large number of decision variables need to be considered jointly. It also aims to make optimization techniques aware of the Cloud-Edge-IoT continuum, the heterogeneous distributed platforms and the wide range of application configurations involved.

AVALON coordinates the project and participates in the first two work packages.

Project Information

  • URL: Not available yet
  • Starting date: 2023, September 1st
  • End date: 2030, August 31st