WG Issam Raïs: An analysis of the feasibility of energy harvesting with thermoelectric generators on petascale and exascale systems

2016-04-16 Issam Raïs

Title: An analysis of the feasibility of energy harvesting with thermoelectric generators on petascale and exascale systems

Speaker: Issam Raïs

Abstract: The heat induced by computing resources is generally a waste of energy in supercomputers. This is especially true in very large scale supercomputers, where the produced heat has to be compensated with expensive and energy-consuming cooling systems. Energy is a critical point for future supercomputing trends that currently try to achieve exascale without having energy consumption reach a significant fraction of a nuclear power plant's output. Thus, new ways of generating or recovering energy have to be explored. Energy harvesting consists of recovering wasted energy. ThermoElectric Generators (TEGs) aim to recover energy by converting wasted dissipated energy into usable electricity. By combining computing units (CUs) and TEGs at very large scale, we spotted a potential way to recover energy from wasted heat generated by computations on supercomputers. In this paper, we study the potential gains in combining TEGs with computational units at petascale and exascale. We present the technology behind TEGs, the study of a typical supercomputer environment, and finally our results concerning binding TEGs and computational units in a petascale and exascale system. With the available technology, we demonstrate that the use of TEGs in a supercomputer environment could be realistic and quickly profitable, and hence have a positive environmental impact.

PDF: Thermoelectricity

Hadrien Croubois: Detecting Silent Data Corruption Using an Auxiliary Method and External Observer

Title: Detecting Silent Data Corruption Using an Auxiliary Method and External Observer

Speaker: Hadrien Croubois

Abstract: HPC platforms and applications are becoming increasingly complex. Consequently, protecting results against all forms of corruption and ensuring trustworthiness are becoming more important. While previous work focuses on application-specific detectors, the dataflow manager in our current work in the Decaf project aims to provide an efficient generic mechanism. We address those issues using new replication patterns that rely on the use of an auxiliary method and an external learning observer. In this talk, we present both the theoretical validation mechanisms and different use cases where our mechanism can be applied to detect silent data corruption.

2015-09-08_Hadrien

Sonia Ben Mokhtar: Building Selfish-Resilient Distributed Systems

Title: Building Selfish-Resilient Distributed Systems

Speaker: Sonia Ben Mokhtar

Abstract: Collaborative systems (e.g., peer-to-peer instant messaging, file sharing, live streaming applications) generate some of the largest volumes of traffic on today’s Internet. Common to all these systems is the assumption that, in return for the service offered by the collaborative system, users are willing to participate by sharing their resources with others. However, in practice, these systems suffer from selfish users who strategically free-ride on the system whenever it is convenient for them. Although a number of solutions have been devised in the literature to deal with this problem, most of them are tailored to specific systems and thus lack flexibility and reusability. During this seminar I will discuss methods for building selfish-resilient distributed systems and future directions towards the automatic transformation of a given collaborative system into a system resilient to selfish behaviors.