Title: A Throughput Model for Data Stream Processing on Fog Computing
Speaker: Felipe Rodrigo de Souza (LIP, AVALON team)
Location: LIP, Meeting room M7 3rd floor
Today’s society faces an unprecedented deluge of data that requires processing and analysis. Data Stream Processing (DSP) applications are often employed to extract valuable information in a timely manner, as they can handle data as it is generated. The typical approach for deploying these applications relies on the Cloud computing paradigm, which has limitations when data sources are geographically distributed: it introduces high latency and achieves low processing throughput. To address these problems, current work attempts to bring the computation closer to the edges of the Internet, exploring Fog computing. Adopting this approach effectively requires proper throughput modeling that accounts for the characteristics of both the DSP application and the Fog infrastructure, including the location of devices, the processing and bandwidth requirements of the application, and the selectivity and parallelism level of operators. In this work, we propose a throughput model for DSP applications that embraces these characteristics. Results show that the model estimates the application throughput with less than 1% error.
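To make the modeled quantities concrete, here is a minimal bottleneck sketch, assuming a linear chain of operators in which each operator has a per-replica service rate, a parallelism level, a selectivity, and an output tuple size, and each hop has a fixed link bandwidth. It is not the model proposed in the talk: it only shows how selectivity scales the load reaching downstream operators and links, and how the slowest resource caps the admitted source rate. The `Operator` class and all numbers are hypothetical.

```python
from dataclasses import dataclass
from math import inf, prod


@dataclass
class Operator:
    service_rate: float     # tuples/s that one replica can process
    parallelism: int        # number of replicas of the operator
    selectivity: float      # output tuples emitted per input tuple
    out_tuple_bytes: float  # size of each emitted tuple, in bytes
    link_bandwidth: float   # bytes/s of the link to the next operator


def max_source_rate(ops):
    """Largest source rate (tuples/s) the chain sustains: the tightest bottleneck."""
    limits = []
    factor = 1.0  # tuples reaching the current operator per source tuple
    for op in ops:
        if factor > 0:
            limits.append(op.parallelism * op.service_rate / factor)  # CPU bound
        factor *= op.selectivity
        traffic = factor * op.out_tuple_bytes  # bytes sent downstream per source tuple
        if traffic > 0:
            limits.append(op.link_bandwidth / traffic)  # network bound
    return min(limits, default=inf)


def throughput(source_rate, ops):
    """Output rate (tuples/s) at the sink for a given source rate."""
    admitted = min(source_rate, max_source_rate(ops))
    return admitted * prod(op.selectivity for op in ops)


ops = [
    Operator(3000, 2, 0.5, 100, 200_000),  # hypothetical filter on a Fog node
    Operator(4000, 1, 1.0, 50, inf),       # hypothetical sink in the Cloud
]
print(max_source_rate(ops), throughput(10_000, ops))  # 4000.0 2000.0
```

Here the Fog-to-Cloud link, not CPU, is the bottleneck: it carries 50 bytes per source tuple, so 200,000 bytes/s admits at most 4,000 source tuples/s.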
Title: Online scheduling in magnetic tapes
Speaker: Carlos Cardonha (IBM Research Brazil)
Location: LIP, meeting room M7 3rd floor
Abstract: Magnetic tapes have played a key role in the storage of digital data for decades, and their unsurpassed cost-effectiveness still makes them the technology of choice in several industries, such as media and entertainment. Tapes are mostly used for cold storage nowadays, and therefore the study of scheduling algorithms for read requests tailored to these devices has been largely neglected in the literature. In this article, we investigate the Linear Tape Scheduling Problem (LTSP), in which read requests associated with files stored on a single-tracked magnetic tape should be scheduled so that the sum of all response times is minimized. LTSP has many similarities with classical combinatorial optimization problems such as the Traveling Repairmen Problem and the Dial-a-Ride Problem restricted to the real line; nevertheless, significant differences in structural properties and the strict time-limit constraints of real-world scenarios make LTSP challenging and interesting in its own right. In this work, we investigate several properties and algorithms for LTSP and some of its extensions. These results led to the identification of 3-approximation algorithms for LTSP and efficient exact algorithms for some of its special cases. We also show that LTSPR, the version of the problem with heterogeneous release times for requests, is NP-complete. OLTSP, the online extension of LTSPR, does not admit c-competitive algorithms for any constant factor c; we nevertheless introduce an algorithm for the problem and show, through extensive computational experiments on synthetic and real-world datasets, that different variants of the proposed strategy are computationally efficient and outperform by orders of magnitude an algorithm currently used by real-world tape file systems.
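The LTSP objective can be made concrete with a small evaluator. This is a minimal sketch under a simplified, hypothetical cost model: the tape is a line, the head moves at unit speed in both directions, each requested file occupies an interval `(start, end)` and must be read forward, and a request's response time is the moment its file has been fully read. The exhaustive `optimal_order` baseline only works for a handful of requests; it is not one of the 3-approximation or exact algorithms from the talk.

```python
from itertools import permutations


def total_response_time(files, order, head=0.0):
    """Sum of completion times when requested files are read in `order`.

    files[i] = (start, end) tape positions; the head moves at unit speed
    and a file can only be read forward, from start to end.
    """
    t, total = 0.0, 0.0
    for i in order:
        start, end = files[i]
        t += abs(head - start)  # seek (wind/rewind) to the beginning of the file
        t += end - start        # read the file forward
        head = end
        total += t              # request i is answered at time t
    return total


def optimal_order(files, head=0.0):
    """Exhaustive search: only usable for a handful of requests."""
    best = min(permutations(range(len(files))),
               key=lambda order: total_response_time(files, order, head))
    return best, total_response_time(files, best, head)


files = [(8, 10), (0, 1), (4, 5)]
print(total_response_time(files, (0, 1, 2)))  # arrival (FIFO) order: 56.0
print(optimal_order(files))                   # ((1, 2, 0), 16.0)
```

With the head at position 0, a single left-to-right sweep answers every request exactly when the head first reaches the end of its file, which is why it beats serving requests in arrival order here.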
Mini-bio: Carlos Cardonha is a Research Staff Member in the Natural Resources Optimization group at IBM Research Brazil, with a Ph.D. in Mathematics (T.U. Berlin) and a Bachelor’s and a Master’s degree in Computer Science (Universidade de São Paulo). His primary research interests are mathematical programming and theoretical computer science, with a focus on applying mixed integer linear programming, combinatorial optimization, and algorithm design and analysis to real-world and/or operations research problems.
Title: SeeDep: Deploying Reproducible Application Topologies on Cloud Platform
Speaker: Cyril Seguin (LIP, AVALON team)
Location: LIP, Salle du conseil du LIP 3rd floor
As part of the scientific method, any researcher should be able to reproduce an experiment, not only to verify its results but also to evaluate and compare it with other approaches. Hence the need for a standard tool that allows researchers to easily generate, share, and reproduce experiment set-ups. In this talk, I’ll present SeeDep, a framework that aims to be such a standard tool. By associating a generation key with a network experiment set-up, SeeDep makes it possible to reproduce network experiments independently of the underlying infrastructure.
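The idea of a generation key can be illustrated by deriving every random choice of a set-up from a seed computed from the key. The sketch below is hypothetical and does not reflect SeeDep's actual key format or API: it hashes an arbitrary key string into a seed for a private `random.Random` generator and builds a tree topology with link latencies, so that anyone holding the same key regenerates the same set-up.

```python
import hashlib
import random


def generate_topology(key: str, n_nodes: int):
    """Deterministically derive a tree-shaped network set-up from a key.

    Returns (parent, child, latency_ms) links; the same key and size
    always yield the same links on a given Python version.
    """
    seed = int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)  # private generator: no global random state involved
    links = []
    for node in range(1, n_nodes):
        parent = rng.randrange(node)     # attach to an already existing node
        latency_ms = rng.randint(1, 50)  # hypothetical latency range
        links.append((parent, node, latency_ms))
    return links


print(generate_topology("my-experiment-2019", 5))
```

One caveat for such a design: Python only guarantees that `random()` reproduces the same sequence for a given seed across versions; methods such as `randrange` and `randint` may change, so a real tool would need to pin or re-implement its generator.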
Title: Study and design of data-driven services/microservices discovery mechanisms
Speaker: Houmani Zeina (LIP, Avalon team)
Location: LIP, council room 394 nord 3rd floor
Classic microservice discovery mechanisms are based on user needs (goal-based approaches). However, in today’s frequently evolving architectures, many new microservices can be created, which makes the classic approach alone insufficient for discovering the available microservices. Customers therefore need to discover the functionalities they can benefit from before searching their domain for the available microservices. We will present a data-driven microservice architecture that allows customers to discover, from specific objects, the functionalities that can be applied to these objects as well as all the microservices dedicated to them. This architecture, based on the main components of classic microservice architectures, adopts a particular communication strategy between clients and registries to achieve this objective.
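The two-step discovery described above (first the functionalities available for an object, then the microservices providing them) can be sketched as a registry indexed by object type. This is a minimal, hypothetical sketch: `DataDrivenRegistry`, its methods, and the service names are illustrative and do not reproduce the architecture or the client/registry communication strategy presented in the talk.

```python
from collections import defaultdict


class DataDrivenRegistry:
    """Hypothetical registry indexed by the objects microservices act on."""

    def __init__(self):
        self._functionalities = defaultdict(set)  # object type -> functionalities
        self._services = defaultdict(set)         # (object type, functionality) -> services

    def register(self, service, object_type, functionality):
        self._functionalities[object_type].add(functionality)
        self._services[(object_type, functionality)].add(service)

    def functionalities(self, object_type):
        """Step 1: what can be done with this kind of object?"""
        return sorted(self._functionalities.get(object_type, set()))

    def services(self, object_type, functionality=None):
        """Step 2: which microservices provide it (all of them if none is given)?"""
        wanted = ([functionality] if functionality is not None
                  else self.functionalities(object_type))
        found = set()
        for f in wanted:
            found |= self._services.get((object_type, f), set())
        return sorted(found)


registry = DataDrivenRegistry()
registry.register("img-resize-svc", "image", "resize")
registry.register("thumb-svc", "image", "resize")
registry.register("ocr-svc", "image", "extract_text")
registry.register("pdf-merge-svc", "pdf", "merge")
print(registry.functionalities("image"))     # ['extract_text', 'resize']
print(registry.services("image", "resize"))  # ['img-resize-svc', 'thumb-svc']
```

Indexing by object rather than by goal lets a client learn that `extract_text` exists for images without knowing in advance to ask for it.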
Title: Improving power-efficiency through fine-grain monitoring in HPC clusters
Speaker: Mathieu Stoffel (LIG, CORSE team)
Location: LIP, meeting room M7 3rd floor
Nowadays, power and energy consumption are of paramount importance. Further, reaching the Exascale target will not be possible in the short term without major breakthroughs in software and hardware technologies to meet power consumption constraints.
In this context, this paper discusses the design and implementation of a system-wide tool to monitor, analyze, and control power/energy consumption in HPC clusters.
We developed a lightweight tool that relies on fine-grain sampling of two CPU performance metrics: instruction throughput (instructions per cycle, IPC) and last-level cache bandwidth.
Using the information these metrics provide about hardware resource activity, and using DVFS to control the power/performance trade-off, we show that it is possible to achieve up to 16% energy savings at the cost of less than 3% performance degradation on real HPC applications.
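A sketch of how the two sampled metrics might drive DVFS decisions, plus the arithmetic behind the reported trade-off. The frequency policy and its thresholds (`ipc_low`, `bw_high`) are hypothetical, not the tool's actual controller. The second function uses only the identity energy = average power × time: if energy drops by 16% while runtime grows by 3%, average power must fall to 0.84 / 1.03 ≈ 0.816 of the baseline.

```python
def choose_frequency(ipc, llc_bw_gbs, freqs_ghz, ipc_low=0.8, bw_high=20.0):
    """Pick a CPU frequency from one sample of the two metrics.

    Low IPC together with high last-level-cache traffic suggests a
    memory-bound phase, where lowering the frequency costs little time.
    Thresholds are hypothetical and would need per-machine calibration.
    """
    freqs = sorted(freqs_ghz)
    low_ipc, high_bw = ipc < ipc_low, llc_bw_gbs > bw_high
    if low_ipc and high_bw:
        return freqs[0]                # memory-bound: lowest frequency
    if low_ipc or high_bw:
        return freqs[len(freqs) // 2]  # ambiguous: middle frequency
    return freqs[-1]                   # compute-bound: highest frequency


def required_power_ratio(energy_ratio, time_ratio):
    """Energy = average power x time, so P_new / P_old = (E_new / E_old) / (t_new / t_old)."""
    return energy_ratio / time_ratio


print(choose_frequency(0.5, 30.0, [1.2, 2.0, 2.6]))  # 1.2
# 16% less energy with a 3% longer run needs about 18% lower average power.
print(round(required_power_ratio(0.84, 1.03), 3))    # 0.816
```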
Title: Deployment of services in Fog Computing
Speaker: Farah Ait Salaht
Fog Computing, complementary to Cloud Computing, has recently emerged as a new paradigm that extends the computing infrastructure from the center to the edge of the network. Motivated by the rapidly increasing number of devices and Internet of Things (IoT) applications at the extreme edge of the network, which require timely and local processing, Fog Computing offers a promising solution to move computational capabilities closer to the end devices. Deploying applications to Fog nodes in a QoS- and context-aware manner is a challenging task due to the heterogeneity and scale of Fog infrastructures. This talk discusses what Fog is, provides an up-to-date review of the service placement problem in such an environment (problem statement, problem formulation, optimization metrics, and optimization strategies), and depicts the variants of the problem and the current proposals from the research community.
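To illustrate one simple placement strategy of the kind covered by such a review, here is a hypothetical greedy heuristic: services are handled from the tightest latency bound to the loosest, and each is put on the farthest node that still satisfies its bound and has enough CPU left, keeping scarce near-edge capacity for latency-critical services. Node names, capacities, and latencies are made up, and the heuristic ignores bandwidth, inter-service communication, and mobility, which many formulations also model.

```python
def greedy_placement(services, nodes):
    """services: name -> (cpu_demand, max_latency_ms)
    nodes:    name -> (cpu_capacity, latency_ms to the end devices)
    Returns name -> node, or None when no feasible node is left.
    """
    remaining = {name: cap for name, (cap, _) in nodes.items()}
    placement = {}
    # Most latency-critical services first.
    for svc, (demand, max_lat) in sorted(services.items(), key=lambda kv: kv[1][1]):
        feasible = [n for n, (_, lat) in nodes.items()
                    if lat <= max_lat and remaining[n] >= demand]
        if not feasible:
            placement[svc] = None
            continue
        # Farthest node that still meets the bound.
        chosen = max(feasible, key=lambda n: nodes[n][1])
        remaining[chosen] -= demand
        placement[svc] = chosen
    return placement


nodes = {"edge-1": (2, 5), "fog-1": (4, 20), "cloud": (100, 80)}
services = {"video-analytics": (2, 10), "alerting": (1, 30), "archival": (3, 200)}
print(greedy_placement(services, nodes))
# {'video-analytics': 'edge-1', 'alerting': 'fog-1', 'archival': 'cloud'}
```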
Title: Software licenses for fun and profit
Speaker: Arthur Chevalier
Today, the use of software is generally governed by licenses, whether the software is free or paid, and with or without access to its source code. The world of licensing is vast and poorly understood. Most people know only the model most familiar to the general public (one software purchase equals one license). The reality is much more complex, especially with large vendors. In this presentation, I will discuss the impact and importance of managing these licenses when using software in a Cloud architecture. I will show a case study demonstrating the impact of dynamic license management and the need for new ways to manage software assets. This case study will focus on software sold by four major vendors (Microsoft, Red Hat, Software AG, and Oracle).
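As a toy illustration of why dynamic license management matters in the Cloud, the sketch below compares licenses sized once for peak usage with licenses that follow the cores actually deployed each hour. Prices, the per-core-hour metric, and the load profile are hypothetical; the actual terms of the four vendors differ and may not allow hourly reassignment at all.

```python
def static_license_cost(cores_per_hour, price_per_core_hour):
    """Licenses sized once for the peak and paid for every hour."""
    return max(cores_per_hour) * len(cores_per_hour) * price_per_core_hour


def dynamic_license_cost(cores_per_hour, price_per_core_hour):
    """Licenses follow the cores actually deployed each hour."""
    return sum(cores_per_hour) * price_per_core_hour


load = [4, 4, 16, 16, 8, 4]  # hypothetical cores in use, hour by hour
print(static_license_cost(load, 0.5), dynamic_license_cost(load, 0.5))  # 48.0 26.0
```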
Title: Toward an autonomic engine for scientific workflows and elastic Cloud infrastructure
Speaker: Hadrien Croubois
Abstract: The constant development of scientific and industrial computation infrastructures requires the concurrent development of scheduling and deployment mechanisms to manage such infrastructures. Throughout the last decade, the emergence of the Cloud paradigm raised many hopes, but achieving full platform autonomicity is still an ongoing challenge.
Work undertaken during this Ph.D. aimed at building a workflow engine that integrates the logic needed to manage workflow execution and Cloud deployment on its own. More precisely, we focus on Cloud solutions with a dedicated Data as a Service (DaaS) data management component. Our objective was to automate the execution of workflows submitted by many users on elastic Cloud resources.
This contribution proposes a modular middleware infrastructure and details the implementation of the underlying modules:
– A workflow clustering algorithm that optimises data locality in the context of DaaS-centered communications;
– A dynamic scheduler that executes clustered workflows on Cloud resources;
– A deployment manager that handles the allocation and deallocation of Cloud resources according to the workload characteristics and users’ requirements.
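The clustering module's goal, keeping intermediate data where it is produced, can be illustrated with vertical clustering, a standard workflow-clustering technique that merges a task into its parent's cluster when the parent has a single child and the task a single parent. This is a swapped-in, minimal sketch, not the DaaS-aware algorithm developed in the thesis; task names are hypothetical.

```python
from collections import defaultdict


def vertical_clustering(tasks, edges):
    """Group tasks into chains; `tasks` must be in topological order."""
    children, parents = defaultdict(list), defaultdict(list)
    for u, v in edges:
        children[u].append(v)
        parents[v].append(u)
    cluster_of, clusters = {}, []
    for t in tasks:
        ps = parents[t]
        if len(ps) == 1 and len(children[ps[0]]) == 1:
            # Single producer feeding a single consumer: keep the data local.
            cid = cluster_of[ps[0]]
            clusters[cid].append(t)
        else:
            cid = len(clusters)
            clusters.append([t])
        cluster_of[t] = cid
    return clusters


tasks = ["fetch", "clean", "align", "stats_a", "stats_b", "merge"]
edges = [("fetch", "clean"), ("clean", "align"), ("align", "stats_a"),
         ("align", "stats_b"), ("stats_a", "merge"), ("stats_b", "merge")]
print(vertical_clustering(tasks, edges))
# [['fetch', 'clean', 'align'], ['stats_a'], ['stats_b'], ['merge']]
```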
All these modules have been implemented in a simulator to analyse their behaviour and measure their effectiveness when running both synthetic and real scientific workflows. We also implemented these modules in the DIET middleware to give it new features and demonstrate the versatility of this approach. Simulations running the WASABI workflow (waves analysis based inference, a framework for the reconstruction of gene regulatory networks) showed that our approach can decrease the deployment cost by up to 44% while meeting the required deadlines.
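The deployment manager's trade-off between cost and deadlines can be sketched as a simple elasticity rule: provision the fewest VMs that can finish the remaining work in time, and release the rest. This assumes identical VMs, perfectly divisible work, and no boot or data-transfer delays; `vms_needed` and `scaling_action` are hypothetical helpers, not DIET APIs or the thesis's algorithm.

```python
import math


def vms_needed(remaining_work_s, time_left_s, max_vms):
    """Fewest VMs that finish `remaining_work_s` (single-VM seconds) before the deadline."""
    if remaining_work_s <= 0:
        return 0
    if time_left_s <= 0:
        return max_vms  # already late: use everything allowed
    return min(max_vms, math.ceil(remaining_work_s / time_left_s))


def scaling_action(current_vms, remaining_work_s, time_left_s, max_vms):
    """Positive: VMs to allocate; negative: VMs to release."""
    return vms_needed(remaining_work_s, time_left_s, max_vms) - current_vms


print(vms_needed(7200, 3600, 10))         # 2
print(scaling_action(5, 7200, 3600, 10))  # -3: release three VMs
```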
Title: Network Models for Multi-Objective Discrete Optimization
Speaker: Carlos Cardonha
Abstract: This work provides a novel framework for solving multi-objective discrete optimization problems with an arbitrary number of objectives. Our framework formulates these problems as network models, in which enumerating the Pareto frontier amounts to solving a multi-criteria shortest-path problem in an auxiliary network. We design tools and techniques for exploiting the network model to accelerate the identification of the Pareto frontier, most notably a number of operations that simplify the network by removing nodes and arcs while preserving the set of nondominated solutions. We show that the proposed framework yields orders-of-magnitude performance improvements over existing state-of-the-art algorithms on four problem classes containing both linear and nonlinear objective functions.
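The reduction to a multi-criteria shortest-path problem can be illustrated on a tiny acyclic network: each node keeps the nondominated cost vectors of partial paths from the source, and the labels at the target form the Pareto frontier. This minimal sketch assumes minimization of additive arc costs and a topological order given as input; it omits the network-reduction operations described in the abstract, and the example network is made up.

```python
from collections import defaultdict


def dominates(a, b):
    """a dominates b when it is no worse in every objective and differs (minimization)."""
    return a != b and all(x <= y for x, y in zip(a, b))


def nondominated(vectors):
    unique = set(vectors)
    return sorted(v for v in unique if not any(dominates(u, v) for u in unique))


def pareto_frontier(order, arcs, source, target):
    """order: nodes in topological order; arcs: (tail, head, cost vector) triples."""
    out = defaultdict(list)
    for u, v, cost in arcs:
        out[u].append((v, cost))
    labels = defaultdict(list)
    labels[source] = [tuple(0 for _ in arcs[0][2])]
    for u in order:
        labels[u] = nondominated(labels[u])  # prune dominated partial paths
        for v, cost in out[u]:
            labels[v].extend(tuple(a + b for a, b in zip(lab, cost)) for lab in labels[u])
    return nondominated(labels[target])


arcs = [("s", "a", (1, 5)), ("s", "b", (3, 2)),
        ("a", "t", (1, 1)), ("a", "t", (4, 0)), ("b", "t", (1, 1))]
print(pareto_frontier(["s", "a", "b", "t"], arcs, "s", "t"))  # [(2, 6), (4, 3)]
```

Pruning at intermediate nodes is safe because costs are additive: if one partial path dominates another at node `u`, every extension of the first dominates the same extension of the second. The path with cost (5, 5) is discarded because (4, 3) dominates it.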
This is a joint work with David Bergman, Merve Bodur, and André Ciré.
Mini-bio: Carlos Cardonha is a Research Staff Member of the Optimization under Uncertainty Group at IBM Research Brazil, with a Ph.D. in Mathematics (T.U. Berlin) and a Bachelor’s and a Master’s degree in Computer Science (Universidade de São Paulo). His primary research interests are mathematical programming and theoretical computer science, with a focus on applying mixed integer linear programming, combinatorial optimization, and algorithm design and analysis to real-world and/or operations research problems.
Title: Latency-Aware Placement of Data Stream Analytics on Edge Computing
Speaker: Alexandre da Silva Veith
Abstract: The interest in processing data events under stringent time constraints as they arrive has led to the emergence of architectures and engines for data stream processing. Edge computing, initially designed to minimize the latency of content delivered to mobile devices, can be used for executing certain stream processing operations. Moving operators from cloud to edge, however, is challenging, as operator-placement decisions must consider both the application requirements and the network capabilities. In this work, we introduce strategies to create placement configurations for data stream processing applications whose operator topologies follow series-parallel graphs. We consider the operator characteristics and requirements to improve the response time of such applications. Results show that our strategies can improve the response time by up to 50% for application graphs comprising multiple forks and joins, while transferring less data and making better use of resources.
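For series-parallel operator graphs, a simple response-time estimate composes naturally: stages in series add their latencies, and parallel branches (fork/join) wait for the slowest one. The sketch below uses that rule to compare edge/cloud placements by exhaustive search under an edge capacity limit. It assumes each operator's latency at a site already includes its processing and incoming transfer time (a strong simplification), the cost table is hypothetical, and the search is exponential, so it is not one of the strategies proposed in the talk.

```python
from itertools import product


def response_time(graph, placement, cost):
    """graph: ("op", name) | ("series", [graphs]) | ("parallel", [graphs])."""
    kind = graph[0]
    if kind == "op":
        name = graph[1]
        return cost[name][placement[name]]
    parts = [response_time(g, placement, cost) for g in graph[1]]
    # Stages in series add up; parallel branches wait for the slowest one.
    return sum(parts) if kind == "series" else max(parts)


def operators(graph):
    if graph[0] == "op":
        return [graph[1]]
    return [name for g in graph[1] for name in operators(g)]


def best_placement(graph, cost, edge_slots):
    """Exhaustive search over edge/cloud placements with at most `edge_slots` edge operators."""
    ops = operators(graph)
    best = None
    for sites in product(("edge", "cloud"), repeat=len(ops)):
        if sites.count("edge") > edge_slots:
            continue
        placement = dict(zip(ops, sites))
        rt = response_time(graph, placement, cost)
        if best is None or rt < best[0]:
            best = (rt, placement)
    return best


app = ("series", [("op", "source"),
                  ("parallel", [("op", "filter"), ("op", "aggregate")]),
                  ("op", "sink")])
cost_ms = {"source": {"edge": 2, "cloud": 40}, "filter": {"edge": 5, "cloud": 42},
           "aggregate": {"edge": 30, "cloud": 45}, "sink": {"edge": 10, "cloud": 3}}
print(best_placement(app, cost_ms, edge_slots=2))
# (47, {'source': 'edge', 'filter': 'cloud', 'aggregate': 'edge', 'sink': 'cloud'})
```

With two edge slots, the search places `aggregate` at the edge rather than `filter`: after the fork, only the slower branch determines the join time.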