Performance, Maintainability and Scalability of In-Silico Experimental Evolution Simulation (PMSISEE)

Overview

The goal of PMSISEE is to support the collaboration between the Avalon (LIP) and Beagle (LIRIS) teams through research activities on programming models and tools for HPC applied to the Aevol/R-Aevol simulator of in silico evolution of bacteria.

Scientific objective

A population of organisms adapting to a new environment is a dynamic system changing over time at many levels (molecules, networks, individuals, ecosystems). A large amount of empirical and theoretical evidence indicates that in real populations all these levels interact, making the dynamics of adaptation a highly complex phenomenon. In order to understand bacterial evolution, we need large-scale integrative models in which all relevant levels from the molecule to the ecology are simulated. The Aevol/R-Aevol simulator (http://www.aevol.fr) has been developed by the Beagle team to address such questions. Aevol integrates the molecular and cellular levels to address the evolution of genomic complexity. R-Aevol adds the network level to investigate the evolution of network complexity.

Challenges

In this project we consider the Aevol/R-Aevol simulator, or equivalent code, as the object of study. At first glance, it is characterized by several properties: the code is complex due to the models it integrates; the amount of computational resources required for simulations is huge given the size of the systems (millions of base pairs in the genome, thousands of genes in the genetic network, billions of individuals in the population, billions of generations); and load imbalance occurs when running the models under different conditions (i.e., different parameters). Any gain in performance will make these simulations very valuable for understanding bacterial evolution and for providing feedback on the biological models in order to improve them.

The research during the PMSISEE project will be restricted to two main issues related to the software and its algorithms: 1/ analysis and design of specialized models to tackle software complexity in the context of HPC on the next generation of parallel supercomputers. This point builds on advances in software engineering over the last twenty years, in particular with respect to code composability and reuse through component models; 2/ performance analysis and design of new algorithms, or improvement of existing ones, for scalable and efficient simulation of evolving bacterial populations on modern parallel architectures. This axis will deal with scheduling heuristics that balance the workload and reduce communication.
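
As an illustration of the kind of scheduling heuristic the second axis refers to, the sketch below uses the classic longest-processing-time-first (LPT) greedy rule to distribute tasks of unequal cost over a fixed number of workers. It is a minimal sketch under stated assumptions, not the method used in Aevol/R-Aevol: the function name `lpt_schedule`, the per-task cost estimates and the worker count are all hypothetical, and communication costs are ignored.

```python
import heapq


def lpt_schedule(costs, n_workers):
    """Greedy LPT heuristic: give each task, heaviest first, to the least loaded worker.

    `costs` holds one estimated cost per task (hypothetical values, e.g. a
    per-individual evaluation time); `n_workers` is the number of workers.
    Returns (assignment, loads): task indices per worker and total load per worker.
    """
    # Min-heap of (current load, worker id) so the least loaded worker pops first.
    heap = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_workers)]

    # Visit tasks in decreasing order of estimated cost.
    for task in sorted(range(len(costs)), key=lambda t: costs[t], reverse=True):
        load, worker = heapq.heappop(heap)
        assignment[worker].append(task)
        heapq.heappush(heap, (load + costs[task], worker))

    loads = [sum(costs[t] for t in tasks) for tasks in assignment]
    return assignment, loads


if __name__ == "__main__":
    # Hypothetical cost estimates for five tasks, spread over two workers.
    assignment, loads = lpt_schedule([8, 7, 6, 5, 4], n_workers=2)
    print(assignment, loads)
```

LPT is only a baseline: it needs cost estimates up front and does not account for data movement, which is why heuristics that also reduce communication are of interest here.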