Hadrien Croubois : Detecting Silent Data Corruption Using an Auxiliary Method and External Observer

Title: Detecting Silent Data Corruption Using an Auxiliary Method and External Observer

Speaker: Hadrien Croubois

Abstract: HPC platforms and applications are becoming increasingly complex. Consequently, protecting results against all forms of corruption and ensuring trustworthiness are becoming more important. While previous work focuses on application-specific detectors, the dataflow manager developed in our current work in the Decaf project aims to provide an efficient, generic mechanism. We address these issues using new replication patterns that rely on an auxiliary method and an external learning observer. In this talk, we present both the theoretical validation mechanisms and different use cases where our mechanism can be applied to detect silent data corruption.

2015-09-08_Hadrien

Sonia Ben Mokhtar : Building Selfish-Resilient Distributed Systems

Title : Building Selfish-Resilient Distributed Systems

Speaker: Sonia Ben Mokhtar

Abstract: Collaborative systems (e.g., peer-to-peer instant messaging, file sharing, live streaming applications) generate some of the largest amounts of traffic on today’s Internet. Common to all these systems is the assumption that, in return for the service offered by the collaborative system, users are willing to participate by sharing their resources with others. However, in practice, these systems suffer from selfish users who strategically free-ride on the system whenever it is convenient for them. Although a number of solutions have been devised in the literature to deal with this problem, most of them are tailored to specific systems and thus lack flexibility and reusability. During this seminar I will discuss methods for building selfish-resilient distributed systems and future directions towards the automatic transformation of a given collaborative system into a system resilient to selfish behaviors.

Sara Bouchenak : Service Level Agreement for Cloud Computing: Towards a Control-Theoretic Approach

Title : Service Level Agreement for Cloud Computing: Towards a Control-Theoretic Approach

Speaker: Sara Bouchenak

Abstract: Cloud Computing is a paradigm for enabling remote, on-demand access to a set of configurable computing resources. This model aims to provide hardware and software services to customers, while minimizing human effort in terms of service installation, configuration and maintenance, for both cloud provider and cloud customer. A cloud may take the form of an Infrastructure as a Service (IaaS), a Platform as a Service (PaaS) or a Software as a Service (SaaS). However, the ad hoc management of clouds in terms of quality-of-service and service level agreement (SLA) poses significant challenges to the performance, availability, energy consumption and economic costs of the cloud. We believe that a differentiating element between Cloud Computing environments will be the quality-of-service and the service level agreement (SLA) provided by the cloud. In this talk, we will discuss the definition and implementation of a novel cloud model: SLAaaS (SLA aware Service). The SLAaaS model enriches the general paradigm of Cloud Computing, and enables systematic and transparent integration of service levels and SLA into the cloud. SLAaaS is orthogonal to IaaS, PaaS and SaaS clouds and may apply to any of them. Both the cloud provider and cloud customer points of view are taken into account. From the cloud provider’s point of view, we present autonomic SLA management to handle performance, availability, energy and cost issues in the cloud. An innovative approach combines control theory techniques with distributed algorithms and language support in order to build autonomic elastic clouds. Novel models, control laws, distributed algorithms and languages will be proposed for automated provisioning, configuration and deployment of cloud services to meet SLA requirements, while tackling scalability and dynamics issues. On the other hand, from the cloud customer’s point of view, we discuss SLA governance. It allows cloud customers to be part of the loop and to be automatically notified about the state of the cloud, such as SLA violations and cloud energy consumption. The former provides more transparency about SLA guarantees, and the latter aims to raise customers’ awareness of the cloud’s energy footprint.

Frédéric Prost : Category Theory 101, Graph Transformation and Social Data anonymisation.

Title : Category Theory 101, Graph Transformation and Social Data anonymisation.

Speaker: Frédéric Prost

Abstract: We will briefly introduce the basics of category theory in order to have a self-contained talk on Graph Transformation and an application to social data anonymisation. We will present the research field of social data anonymization: huge network data sets, like social networks (describing personal relationships and cultural preferences) or communication networks (the graph of phone calls or email correspondents), are becoming more and more common. These data sets are analyzed in many ways, ranging from the study of disease transmission to targeted advertising. Selling network data sets to third parties is a significant part of the business model of major internet companies. Usually, in order to preserve the confidentiality of the sold data set, only “anonymized” data are released: the original social network is modified in order to avoid re-identification. The aim is to anonymize the data while preserving its usefulness for analysis. We will review the most important results in this field, and we will show how graph rewriting techniques based on category theory can be used to design a more formal approach to tackle these issues.

2015-06-18-Prost

Vincent Lanore : A Reconfigurable Component Model for HPC

Title : A Reconfigurable Component Model for HPC

Speaker: Vincent Lanore

Abstract: High-performance applications whose structure changes dynamically during execution are extremely complex to develop, maintain and adapt to new hardware. Such applications would greatly benefit from easy reuse and separation of concerns which are typical advantages of component models. Unfortunately, no existing component model is both HPC-ready (in terms of scalability and overhead) and able to easily handle dynamic reconfiguration. We aim at addressing performance, scalability and programmability by separating locking and synchronization concerns from reconfiguration code. To this end, we propose directMOD, a component model which provides on one hand a flexible mechanism to lock subassemblies with a very small overhead and high scalability, and on the other hand a set of well-defined mechanisms to easily plug various independently-written reconfiguration components to lockable subassemblies. We evaluate both the model itself and a C++/MPI implementation called directL2C.

WG_2015-04-29_Vincent

Brigitte Jaumard: Design of Survivable VPN Topologies over a Service Provider Network

Title : Design of Survivable VPN Topologies over a Service Provider Network

Speaker: Brigitte Jaumard

Abstract: In the context of multi-hop working routing for IP layer traffic requests, the design problem is composed of two subproblems that are solved simultaneously: (i) finding the most efficient or economical multi-hop routing of the IP traffic flows with different bandwidth granularities over the logical topology, which involves some traffic grooming; (ii) ensuring that the logical topology is survivable through an appropriate mapping of the logical links over the physical topology, if such a mapping exists. In order to solve such a complex multi-layer resilient network design problem, we propose a column generation ILP model. It exploits the natural decomposition of the problem and helps devise a scalable solution scheme. We conducted numerical experiments on a German network with 50 nodes and 88 physical links. Not only could we solve much larger data instances than those published in the literature, but we also observed that multi-hop routing saves up to 10% of the lightpaths, depending on the traffic load.

Jérome Richard: Towards a Component Model Supporting Task Scheduling for High-Performance Computing

Title : Towards a Component Model Supporting Task Scheduling for High-Performance Computing

Speaker: Jérome Richard

Abstract: High-performance applications often outlive the platforms they run on. Adapting these applications to different platforms is a necessary, long and costly process. Software components offer many software-engineering benefits that simplify the adaptation of applications. At the same time, one would like to maintain good performance across adaptations. Task-graph scheduling models make it possible to exploit heterogeneous architectures efficiently while providing portable performance. This talk proposes and evaluates a component model with task scheduling that aims to combine the advantages of both approaches on SMPs. The results show that the proposed model retains the advantages of both component-based approaches (separation of concerns) and task-based approaches (load balancing).

WG Julien Bigot: Gysela5D, Adapting a GYrokinetic SEmi-LAgrangian code for current architectures and towards Exascale

Title : Gysela5D, Adapting a GYrokinetic SEmi-LAgrangian code for current architectures and towards Exascale

Speaker: Julien Bigot

Abstract: In order to design and operate future nuclear fusion reactors such as ITER (tokamaks), physicists need to better understand the various types of instabilities that develop in the plasma and impact the confinement of heat. Simulation of Ion Temperature Gradient (ITG) instabilities based on the Vlasov equations requires huge amounts of computational power, with a discretization of both the spatial and velocity spaces (6D). The gyrokinetic approximation makes this kind of simulation possible by reducing this to “only” 5D. Up to now, the semi-Lagrangian code Gysela5D has been used to perform large simulations using a few thousand cores (8k to 16k cores typically). These simulations make the hypothesis that electrons are adiabatic, but recent advances seem to indicate that some instabilities could only be explained by simulating kinetic electrons. In order to do that, the spatial mesh would have to be refined by a factor of 60³ and the time step by a factor of 60. Such simulations would require Exascale-capable machines. In this talk, I present some of the challenges identified on the way to an Exascale-ready code, as well as solutions recently implemented and work in progress to tackle them. I especially focus on three such pieces of work:

  • memory scalability optimization;
  • I/O optimizations for both checkpoints and result writing;
  • communication pattern optimization for large numbers of cores (Blue Gene/Q).

I will also present recent results that show that the code scales with good performance up to 1,835,008 threads (the complete Juqueen Blue Gene/Q at Jülich).

PDF: WG_150224_jbigot-gysela

WG Salem Harrache: Reconstructable Software Appliances with Kameleon

Title: Reconstructable Software Appliances with Kameleon

Speaker: Salem Harrache

Abstract: A software appliance builder bundles together an application with its required middleware and an operating system to allow easy deployment on Infrastructure as a Service (IaaS) providers. These builders have the potential to address a key need in the computer science community: the ability to reproduce an experiment. This talk presents a software appliance builder called Kameleon that automates the construction of complex software appliances targeted at research on operating systems, HPC and distributed computing, DevOps, etc. It does so by proposing a highly modular description format that encourages sharing and reuse of procedures. Moreover, it provides debugging mechanisms for improving experimenters’ productivity.

PDF: WG_150127_Salem