WG Avalon (public session), June 10, 2024

Sébastien Valat (Inria AirSea) gave the following seminar.

Title: Memory profiling, a presentation of MALT and NUMAPROF

Abstract:
Memory has often become a major problem in applications.

This is due both to access performance and to the sheer volumes to be
managed in applications that are increasingly dynamic, complex, and
developed by many contributors over decades.

This raises the following questions:

  – How can I spot my mistakes, and problematic patterns that are easy to fix?
  – How can I find where memory is consumed when I reach the limits of my machine (malloc, global variables, TLS)?
  – For NUMA, how do I know whether I got it wrong, and where?

After my PhD on memory management in an HPC context (malloc, kernel,
NUMA, multi-threading…) for large numerical simulations, I had the
opportunity to develop two memory profilers, MALT (MALloc Tracker) and
NUMAPROF. With these tools, I tried to capture what I came to understand
along the way and to make visible what I had trouble visualizing at the
time in the unfamiliar target codes I was working with. The tools are
now open source and target C / C++ / Fortran (and Rust).
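
As a rough illustration of where memory can hide (malloc, global variables, TLS), the C++ sketch below shows the kind of allocation behaviour a heap profiler such as MALT is meant to surface; the code, names, and sizes are purely illustrative and are not taken from the talk.

    #include <cstdlib>
    #include <cstring>

    // Illustrative only: three places where memory "hides" in a C/C++ code,
    // matching the categories mentioned in the abstract (malloc, globals, TLS).
    static double lookup_table[1 << 20];   // global: ~8 MiB reserved at load time
    thread_local char scratch[1 << 16];    // TLS: 64 KiB replicated in every thread

    // A pattern an allocation profiler typically flags: many small, short-lived
    // allocations inside a hot loop instead of one buffer reused across iterations.
    double process(std::size_t n) {
        double sum = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            double* tmp = static_cast<double*>(std::malloc(1024 * sizeof(double)));
            std::memset(tmp, 0, 1024 * sizeof(double));
            sum += tmp[0] + lookup_table[i % (1 << 20)] + scratch[0];
            std::free(tmp);                // freed right away: churn, not a leak
        }
        return sum;
    }

    int main() { return process(100000) < 0.0; }

Run under an allocation tracker, the short-lived buffers would typically show up as allocation churn attributed to the call site in process, while the global array and the thread-local buffer account for memory that never appears in any malloc call.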

I will therefore present these two tools, along with, in principle, a
few examples of the observations obtained with them.

Website: https://memtt.github.io/

About the speaker:
After a background in particle physics, I changed direction to pursue my
questions about computing in science with a PhD at CEA on memory
management for supercomputers, then moved on to CERN and, in part, to
the HPC industry, working on I/O. I am currently at Inria, working on
ocean simulation.
https://svalat.github.io/

ECLAT

ECLAT is a joint CNRS laboratory in partnership with Inria, Eviden, Observatoire de la Côte d’Azur, and Observatoire de Paris.

Built as a genuine center of excellence for high-performance computing and artificial intelligence technologies and techniques in the service of astronomical instrumentation, ECLAT brings together the strengths of its 14 associated laboratories and teams around a common roadmap. This roadmap aims to foster the R&D partnerships needed to design and build the future cyber-physical systems for astronomy, capable of ingesting, processing, and reducing very large volumes of data.

More information is available on the ECLAT website.

Exa-SofT: HPC software and tools

A NumPEx PEPR project

Though significant efforts have been devoted to the implementation and optimization of several crucial parts of a typical HPC software stack, most HPC experts agree that exascale supercomputers will raise new challenges, mostly because the trend in exascale compute-node hardware is toward heterogeneity and scalability: Compute nodes of future systems will have a combination of regular CPUs and accelerators (typically GPUs), along with a diversity of GPU architectures.

Meeting the needs of complex parallel applications and the requirements of exascale architectures raises numerous challenges that remain unaddressed.
As a result, several parts of the software stack must evolve to better support these architectures. More importantly, the links between these parts must be strengthened to form a coherent, tightly integrated software suite.

Our project aims at consolidating the exascale software ecosystem by providing a coherent, exascale-ready software stack featuring breakthrough research advances enabled by multidisciplinary collaborations between researchers.

The main scientific challenges we intend to address are:

  • productivity,
  • performance portability,
  • heterogeneity,
  • scalability and resilience,
  • performance and energy efficiency.
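
Two of the listed challenges, heterogeneity and performance portability, can be illustrated at the source level with standard C++ parallel algorithms: the same kernel can be compiled for multi-core CPUs or, with some toolchains, offloaded to GPUs. The sketch below only illustrates the programming-model side of the problem; it is not part of the Exa-SofT software stack.

    #include <algorithm>
    #include <cstdio>
    #include <execution>
    #include <numeric>
    #include <vector>

    // The kernel is written once; the execution policy lets the toolchain
    // choose the parallel (and possibly vectorized or offloaded) backend.
    int main() {
        std::vector<double> x(1 << 20), y(1 << 20);
        std::iota(x.begin(), x.end(), 0.0);   // fill x with 0, 1, 2, ...

        std::transform(std::execution::par_unseq, x.begin(), x.end(), y.begin(),
                       [](double v) { return 2.0 * v + 1.0; });   // y = 2x + 1

        std::printf("y[42] = %f\n", y[42]);
        return 0;
    }

Which backend actually runs this code depends on the compiler (for instance, NVIDIA's nvc++ with -stdpar can map such algorithms to a GPU), which is precisely where the portability and heterogeneity challenges appear.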

AVALON coordinates WP1 and participates in WP1 and WP2.

Project Information

  • URL: Not available yet
  • Starting date: 2023
  • End date: 2028

Taranis: Model, Deploy, Orchestrate, and Optimize Cloud

A PEPR Cloud project

New infrastructures, such as Edge Computing or the Cloud-Edge-IoT computing continuum, make cloud issues more complex, as they add new challenges related to resource diversity and heterogeneity (from small sensors to data centers/HPC, from low-power networks to core networks), geographical distribution, and increased dynamicity and security needs, all under energy-consumption and regulatory constraints.

In order to exploit these new infrastructures efficiently, we propose a strategy based on a significant abstraction of the application structure description, so as to further automate application and infrastructure management. It then becomes possible to globally optimize the resources used with respect to multi-criteria objectives (price, deadline, performance, energy, etc.) on both the user side (applications) and the provider side (infrastructures). This abstraction also covers the challenges of abstracting application reconfiguration and of automatically adapting resource usage.
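
As a toy illustration of such multi-criteria optimization (this is not Taranis code, and every name, number, and weight below is hypothetical), the C++ sketch below scores candidate placements of an application component against weighted price, latency, and energy objectives and picks the best one overall.

    #include <algorithm>
    #include <cstdio>
    #include <string>
    #include <vector>

    // Toy multi-criteria placement: lower is better for every criterion, and the
    // weights encode the trade-offs chosen by the user and/or the provider.
    struct Placement {
        std::string site;   // e.g. an edge node or a cloud region (hypothetical)
        double price;       // cost per hour (arbitrary units)
        double latency_ms;  // expected latency to the data source
        double energy_wh;   // estimated energy per request
    };

    double score(const Placement& p, double w_price, double w_lat, double w_energy) {
        return w_price * p.price + w_lat * p.latency_ms + w_energy * p.energy_wh;
    }

    int main() {
        std::vector<Placement> candidates = {
            {"edge-gateway", 0.08,  4.0, 1.2},
            {"regional-dc",  0.05, 18.0, 0.9},
            {"central-hpc",  0.02, 45.0, 0.7},
        };
        auto best = std::min_element(candidates.begin(), candidates.end(),
            [](const Placement& a, const Placement& b) {
                return score(a, 1.0, 0.05, 0.5) < score(b, 1.0, 0.05, 0.5);
            });
        std::printf("best placement: %s\n", best->site.c_str());
        return 0;
    }

In practice the decision space is far larger (many components, many sites, dynamic conditions), which is what makes the joint optimization problems studied in the project hard.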

The Taranis project addresses these issues through four scientific work packages, each focusing on a phase of the application lifecycle: application and infrastructure description models, deployment and reconfiguration, orchestration, and optimization.

The first work package “Modeling” addresses the complexity of cloud-edge application and infrastructure models: formal verification and optimization of these models, multi-layer variability, the relationship between model expressiveness and efficient solution computation, lock-ins of proprietary models, and heterogeneity of cloud application and infrastructure modeling languages.

The second work package “Deployment and Reconfiguration” studies deployment- and reconfiguration-related issues in order to reduce management complexity and improve support for provisioning and configuration languages, while improving the certification of operations and increasing their concurrency. The work package also aims to reduce the complexity of the bootstrapping problem on geo-distributed and heterogeneous resources.

The third work package “Orchestration of services and resources” aims to extend orchestrators to the Cloud-Edge-IoT continuum, while making them more autonomous with respect to dynamic, functional, and/or non-functional needs, in particular regarding the network-partitioning problem specific to Cloud-Edge-IoT infrastructures.

Finally, the fourth work package “Optimization” aims to revisit the optimization problems associated with the use of Cloud-Edge-IoT infrastructures and the execution of an application when a large number of decision variables need to be considered jointly. It also aims to make optimization techniques aware of the Cloud-Edge-IoT continuum, the heterogeneous distributed platforms and the wide range of application configurations involved.

AVALON coordinates the project and participates in the first two work packages.

Project Information

  • URL: Not available yet
  • Starting date: 2023, September 1st
  • End date: 2030, August 31st

SLICES-PP – Preparatory Phase

The digital infrastructures research community continues to face numerous new challenges in the design of the Next Generation Internet. This is an extremely complex ecosystem encompassing communication, networking, data-management, and data-intelligence issues, supported by established and emerging technologies such as IoT, 5/6G, and cloud-to-edge computing. Coupled with the enormous amount of data generated and exchanged over the network, this calls for incremental as well as radically new design paradigms. Experimentally driven research is becoming a de facto standard worldwide, and it has to be supported by large-scale research infrastructures to make results trusted, repeatable, and accessible to the research communities.

SLICES-RI (Research Infrastructure), which was recently included in the 2021 ESFRI roadmap, aims to address these problems by building the large-scale infrastructure needed for experimental research on various aspects of distributed computing, networking, IoT, and 5/6G networks. It will provide the resources needed to continuously design, experiment with, operate, and automate the full lifecycle management of digital infrastructures, data, applications, and services.

Based on the two preceding projects within SLICES-RI, SLICES-DS (Design Study) and SLICES-SC (Starting Community), the SLICES-PP (Preparatory Phase) project will validate the requirements for engaging in the implementation phase of the RI lifecycle. It will set the policies and decision processes for the governance of SLICES-RI, i.e., the legal and financial frameworks, the business model, the required human-resource capacities, and the training programme. It will also settle the final technical architecture design for implementation. It will engage member states and stakeholders to secure the commitment and funding needed for the platform to operate. It will position SLICES as an impactful instrument to support European advanced research, industrial competitiveness, and societal impact in the digital era.

Project Information

WG Avalon, May 3, 2022

Élise Jeanneau gave the following talk.

Title: SkyData, a new paradigm for data management

Abstract

Traditional data management systems are centered on applications rather
than on data. The SkyData project proposes to turn this around by
managing autonomous data, capable of deciding their own migrations and
replications. The resulting system is distributed, dynamic, and
fundamentally different from existing data management systems. This new
paradigm allows users to integrate their data into the system without
having to hand over control to a third-party data manager.

SkyData is a project submitted to the ANR. This presentation aims to
introduce the basics of the SkyData architecture and to discuss some
possible uses of these autonomous data.
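
To make the idea of autonomous data more concrete, the following C++ sketch is a purely hypothetical illustration (it does not describe the actual SkyData design): each datum carries its own replication policy and decides where to replicate based on observed access statistics, instead of delegating that decision to an external data manager.

    #include <cstdio>
    #include <string>
    #include <vector>

    // Hypothetical "autonomous datum": the replication policy belongs to the
    // datum itself rather than to a centralized data management system.
    struct NodeStats {
        std::string name;
        double read_rate;     // accesses per second observed from this node
        bool holds_replica;
    };

    struct AutonomousDatum {
        std::string id;
        double replicate_threshold;   // policy parameter owned by the datum

        // The datum inspects access statistics and chooses its own replications.
        std::vector<std::string> decide_replications(const std::vector<NodeStats>& nodes) const {
            std::vector<std::string> targets;
            for (const auto& n : nodes)
                if (!n.holds_replica && n.read_rate > replicate_threshold)
                    targets.push_back(n.name);
            return targets;
        }
    };

    int main() {
        AutonomousDatum d{"sensor-log-17", 50.0};
        std::vector<NodeStats> stats = {
            {"lyon", 120.0, false}, {"grenoble", 10.0, false}, {"paris", 80.0, true}};
        for (const auto& target : d.decide_replications(stats))
            std::printf("%s: replicate to %s\n", d.id.c_str(), target.c_str());
        return 0;
    }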

Pierre Jacquot: Evaluating unikernels for HPC applications

Pierre Jacquot
June 29th 2020, 14:30–15:30

Title: Evaluating unikernels for HPC applications

Abstract: Unikernels are lightweight single-application operating systems. They are designed to run as virtual machines, but some can also run on bare metal. They are quite popular in the systems research community because of the performance gains they can provide. By reducing system-call overhead and OS noise, they may be a good alternative to containers for HPC applications. This report evaluates the suitability of unikernels for HPC applications by conducting stability and performance studies with the BOTS and Rodinia benchmark suites, run on multi-core architectures on a single node.
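
As a rough illustration of the system-call overhead mentioned in the abstract (and not part of the evaluation itself), the C++ micro-benchmark below times a cheap system call in a tight loop; the per-call cost and its jitter are what a unikernel, which runs the application and kernel in a single address space, aims to reduce.

    #include <chrono>
    #include <cstdio>
    #include <unistd.h>

    // Time a minimal system call in a loop and report the average cost per call.
    int main() {
        constexpr long iters = 1000000;
        auto start = std::chrono::steady_clock::now();
        for (long i = 0; i < iters; ++i)
            (void)getpid();   // cheap syscall (its cost may vary across libc versions)
        auto stop = std::chrono::steady_clock::now();
        auto total_ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
        std::printf("average per-call cost: %.1f ns\n",
                    static_cast<double>(total_ns) / iters);
        return 0;
    }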

Arthur Chevalier: Optimization of software license placement in the Cloud for economical and efficient deployment

Arthur Chevalier
November 17th 2020, 14:30–15:30

Title: Optimization of software license placement in the Cloud for economical and efficient deployment

Abstract: Today, the use of software is generally governed by licenses, whether free or paid, and with or without access to the source code. The world of licensing is vast and poorly understood. Often, only the model most familiar to the general public is known (one software purchase equals one license). The reality is much more complex, especially with large publishers. In this presentation I will discuss the impact and importance of managing these licenses when using software in a cloud architecture. I will show a case study to demonstrate the impact of dynamic license management and the need to propose new ways to manage and optimize software assets.

SLICES-DS: Slices – Design Study