Task 5 – CliMAF: a framework for climate model evaluation and analysis

Leader: J. Servonnat (LSCE-CEA) and S. Sénési (CNRS-GAME)

Contributors: IPSL-LSCE, CNRS-GAME, CERFACS, IPSL-LOCEAN, IPSL-LMD, IPSL-LATMOS

Objectives:

CliMAF (Climate Model Assessment Framework) is the operational solution of the French climate community, to be developed in this task, for efficient evaluation and monitoring of climate model outputs. It is a framework providing simplified (from the user's point of view) and efficient access to both model simulations and reference datasets, data pre-processing (subsetting, period selection, regridding), application of a set of diagnostics and metrics for model evaluation and, finally, a web-oriented visualization solution to explore the evaluation results through an “atlas”, i.e. an organized set of figures. CliMAF is primarily a framework for climate model developers. Its flexibility also makes it suitable for researchers who want to develop more specialized diagnostics while benefiting from its data access and visualization tools. Our goal is that CliMAF can be used on a workstation as well as on the computing centres and servers used at GAME, IPSL and CERFACS. An option will allow parallel computation of the diagnostics to cope with the huge amounts of data generated by ensembles and high-resolution models. We plan close interactions with EMBRACE WP4, IS-ENES and ExArch so that these parallel projects can benefit from each other and to favour a community approach to evaluation and analysis tools for climate simulations.

  • Task 5.1: General driver and upstream user interface

The general driver of CliMAF is the core of the framework. It launches the scripts implementing the diagnostics and metrics (see e.g. Gettelman et al., 2012) that generate the various atlases. An upstream user interface (UI) will allow the user to describe the detailed content of each atlas in a flexible way. A subset of “standard” evaluation and monitoring atlases will be available to the user. We consider two levels of sophistication for the UI: a text file and a point-and-click interface. The general driver will be supported by a thorough design analysis of the framework to ensure genericity and evolvability. The driver will make it possible to plug in additional scripts written in Ncl, R and Python, notably those developed within Task 5.4. The UI will allow batch submission of a CliMAF job. It will thus be possible to submit CliMAF jobs automatically during an ongoing simulation to update a monitoring atlas, or to submit other jobs, manually or automatically, when a simulation is completed, for a more comprehensive evaluation atlas. For both the general driver and the UI, interactions with EMBRACE WP4 are emphasized, both to use their framework as a basis and, in return, to provide them with the improvements developed in this task. The general driver will also describe the expected monitoring and evaluation data (produced with CliMAF) with structured metadata to be published in catalogs for later searches. To reduce CliMAF execution time, the driver will prepare a parallel implementation of a subset of the core diagnostics from Task 5.4, to be run on a cluster or an HPC system. We plan to take advantage of the Swift (or a similar) library as a data-intensive, task-oriented workflow engine. A minimal sketch of what a text-style atlas description and its dispatch by the driver might look like is given below.
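The following sketch illustrates the idea of an upstream description of an atlas and a driver loop dispatching diagnostic scripts. The structure, script names and options are assumptions for illustration only and do not reflect the actual CliMAF interface.

```python
# Hypothetical sketch of a text-file style atlas description and a driver loop
# dispatching diagnostic scripts; names and structure are illustrative only.
import subprocess

atlas_request = {
    "simulation": {"model": "IPSL-CM", "experiment": "historical", "period": "1980-2005"},
    "reference": "ERA-Interim",           # illustrative reference dataset label
    "diagnostics": [
        {"script": "zonal_mean_bias.py", "variable": "tas", "season": "DJF"},
        {"script": "zonal_mean_bias.py", "variable": "pr",  "season": "JJA"},
    ],
}

def run_atlas(request):
    """Launch each requested diagnostic script as a separate process."""
    for diag in request["diagnostics"]:
        cmd = ["python", diag["script"],
               "--variable", diag["variable"],
               "--season", diag["season"],
               "--period", request["simulation"]["period"]]
        subprocess.run(cmd, check=True)   # in batch mode this could be a job submission

run_atlas(atlas_request)
```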

  • Task 5.2: Services layer

Easing the development of diagnostic scripts is a basic requirement of CliMAF; this is achieved by providing ‘services’ to these scripts, in the form of a library of high-level functions which include and combine the following functionalities: locating the data (including on ESGF data nodes), fetching the data (if requested), selecting the data in the space-time domain, caching the data on local disk, computing derived variables (using compute-node facilities when available), re-gridding the data and forwarding all the metadata available in the original data. Typically, upon a function call for getting data, the Services Layer will check, step by step, whether the required derived quantity or data file already exists in a local cache. If not, it will carry out the required operations, from data access to the calculation of derived quantities or regridding, to provide the calling script with data ready for analysis. This upstream check will avoid computing the same derived variable multiple times, particularly when using CliMAF for monitoring. A minimal sketch of such a cache-aware access function is shown below.
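The sketch below shows the cache-then-compute pattern described above. The cache location, naming convention and the `compute_derived` callback are assumptions, not the actual CliMAF Services Layer API.

```python
# Minimal sketch of a cache-aware data access service; the cache directory,
# key convention and compute_derived callback are illustrative assumptions.
import os
import hashlib

CACHE_DIR = "/tmp/climaf_cache"          # illustrative local cache location

def cache_key(variable, simulation, period, grid):
    """Deterministic file name derived from the request parameters."""
    tag = f"{variable}_{simulation}_{period}_{grid}"
    return os.path.join(CACHE_DIR, hashlib.md5(tag.encode()).hexdigest() + ".nc")

def get_field(variable, simulation, period, grid, compute_derived):
    """Return a path to the requested field, reusing the cache when possible."""
    path = cache_key(variable, simulation, period, grid)
    if os.path.exists(path):             # step 1: already computed earlier?
        return path
    os.makedirs(CACHE_DIR, exist_ok=True)
    compute_derived(variable, simulation, period, grid, path)  # step 2: locate, subset, regrid, derive
    return path
```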

  • Task 5.3: Visualization tools

CliMAF will produce extensive data through evaluation and monitoring diagnostics, both qualitative (maps, figures) and quantitative (scores, metrics), for every model run. Thorough analysis of these results using standard hands-on approaches is tedious for individual scientists and eventually becomes impossible as the number of simulations increases. Task 5.3 will provide a semi-automated solution to facilitate the comparison of diagnostic and validation data across dozens of simulations. It will develop the means to search, organize and represent these numerous post-processing data. Among other fundamental tools, Task 5.3 will develop a web application to steer the search and organize results so as to facilitate subsequent, more synthetic analysis, notably in response to the standard versions of the CliMAF evaluation and monitoring atlases (see Task 5.4). Furthermore, diagnostic and validation data will be brought together and used in high-level joint graphical and statistical representations, exploiting the most recent web graphics technologies such as the Data-Driven Documents library (D3.js). An illustrative sketch of how figures could be indexed for such a web application is given below.
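As an illustration of the "search and organize" idea, the sketch below builds a searchable JSON index of atlas figures that a web front-end could query. The file naming convention and metadata fields are assumptions, not a defined CliMAF schema.

```python
# Illustrative sketch of building a searchable JSON index of atlas figures;
# the assumed naming convention (variable_diagnostic_season.png) is hypothetical.
import json
import glob
import os

def build_figure_index(atlas_dir, index_file="figure_index.json"):
    """Scan an atlas directory and record one searchable entry per figure."""
    entries = []
    for path in glob.glob(os.path.join(atlas_dir, "*.png")):
        name = os.path.splitext(os.path.basename(path))[0]
        parts = name.split("_")          # assumed naming: variable_diagnostic_season
        entries.append({
            "file": path,
            "variable": parts[0] if parts else name,
            "diagnostic": parts[1] if len(parts) > 1 else "unknown",
            "season": parts[2] if len(parts) > 2 else "ANN",
        })
    with open(index_file, "w") as f:
        json.dump(entries, f, indent=2)
    return entries
```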

  • Task 5.4: Evaluation and monitoring diagnostics

This sub-task will consist in implementing in CliMAF a collection of evaluation and monitoring diagnostics used at GAME, IPSL and CERFACS, using the basic features of the general driver and the functions of the Services Layer. We will also collaborate with the DRAKKAR community to implement their ocean model monitoring and evaluation tools. Evaluation packages from the IS-ENES project may also be integrated. A core set of diagnostics and metrics will define the initial “standard” versions of the CliMAF evaluation and monitoring atlases. It will be based on the classical atlases already in use in the partners’ centres for model development. In addition to this core set, the contributors will feed the standard collection with diagnostics that their expertise identifies as relevant for model evaluation or monitoring.

Success criteria:

A success criterion is that the climate model developers of the French community use CliMAF routinely during model development phases. The initial objectives will be exceeded if climate scientists also use CliMAF for climate research.

Risks and envisaged solutions:

  • Dealing with the variety of languages that potential script developers may wish to use. A fall-back solution should be designed, which could rely on using files to deliver the data to unsupported languages.
  • Difficulty in putting few constraints on the language of diagnostic scripts while providing advanced utilities to the developer. A trade-off has to be reached. Also, shortcomings in the overall design would hinder extensibility. This risk will be addressed by requiring validation by a large number of engineers and scientists in the partner institutes.
  • Contextual descriptions of the diagnostic and validation data, and a catalogue indexing them, are prerequisites for searches. ESGF publishing or alternatives will be tested for that purpose.
  • There is no particular risk linked to the development of the evaluation and monitoring diagnostics themselves. The main risk of Task 5.4 lies in actually implementing the diagnostics in the driver in a sustainable way, notably for the DRAKKAR tools and the IS-ENES packages. From this point of view, the success of Task 5.4 is closely linked to the success of Tasks 5.1 and 5.2.

Task 4 – Big data management and analytics of climate simulations

Leader: S. Denvil (IPSL)

Contributors: IPSL, LSCE, IDRIS, CNRS-GAME, CERFACS, MDLS

Objectives:

Many challenges remain for climate scientists, and new ones will emerge as more and more data pour in from climate simulations run on ever-larger high-performance computing (HPC) platforms and from increasingly high-resolution satellites and instruments. From these current and future challenges, we envision a scientific data universe in which data creation, collection, documentation, analysis and dissemination are expanded. To prepare for such an expansion with continual delivery of a comprehensive end-to-end solution, we will tackle specific challenges in this task. Bringing together large volumes of diverse data to generate new insights raises three challenges:

  • Variety: managing complex data from multiple types and schemas (model and observations),
  • Velocity: ingesting and distributing live data streams and large data volume quickly and efficiently,
  • Volume: analyzing large data volume (from terabytes to exabytes) in place for big data analytics.

Recognizing that large-scale scientific endeavours are becoming ever more data-centric, we have begun to move toward a data services model. As part of this transition, over the past year, we successfully operated the ESGF stack as a means of integrating and delivering scientific data services to the climate community. We will build upon our existing capabilities to efficiently produce, manage, discover and analyse distributed climate data. The French community will strengthen its position in the international community by contributing to this effort. By leveraging the METAFOR CIM, we can output simulation and model documentation in a standardized format. By leveraging the ES-DOC client tools, we have a clear pathway for creating, publishing and exploiting model run documentation inherently conformant to an internationally recognized standard.

  • Task 4.1: XIOS implemented within project models (IPSL, MDLS, CERFACS, CNRM)

Without XIOS, output files are written partially, process per process, during the run. The time needed for the reconstruction of the files, done sequentially in the post-processing phase, grows steeply with the resolution, becoming in some very-high-resolution cases even longer than the simulation time. This task will implement XIOS within the IPSL-CM model components (LMDz, ORCHIDEE, INCA) and the CNRM-CM model components (ARPEGE, GELATO, SURFEX). NEMO, used by both IPSL-CM and CNRM-CM, already implements XIOS but will have to integrate new functionalities (Tasks 2.2 and 4.2). This work will impact the source code of the cited components and will cover the ability to switch off the previous I/O system, to integrate the new one and to validate the numerous fields generated by climate models (more than a thousand quantities). Furthermore, the developments done in Task 2.2 and Task 4.2 will be readily available to the IPSL-CM and CNRM-CM families.

  • Task 4.2: XIOS a bridge towards standardisation (IPSL, MDLS, CERFACS, CNRM)

The XIOS output format, structure and description will (1) conform to the Climate and Forecast (CF) convention when describing the spatio-temporal distribution of data, (2) conform to the CMIP controlled-vocabulary requirements and (3) conform to the CIM ontology describing simulation and model documentation. This will enable easier and faster systematic ingestion of outputs by the data services developed in Task 4.3, and ensure a high level of documentation, provenance, standardization and reuse. Steps 1 and 2 were until now achieved through costly post-processing, and step 3 by time-consuming manual intervention using an on-line questionnaire. The CF and CMIP controlled-vocabulary requirements will be fulfilled by adapting the XIOS code so as to precisely define and fill the variables describing space and time discretization, and by generating XIOS configuration files embedding the project controlled vocabulary (CMIP and followers). The XIOS project-specific configuration files will be generated from the information system operated in Task 4.3. The innovative ES-DOC meta-programming approach will forward-engineer C++ and Python CIM client tools, ensuring that the client tools are speedily updated in response to changes in the CIM standard. These client tools will be implemented within XIOS to create CIM-compliant documents and within the running environment to publish those documents. CIM instances will be created on the local file system in either XML or JSON format and then published to the remote ES-DOC CIM API, as sketched below.
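The sketch below illustrates the first half of this workflow: writing a CIM-like simulation document to the local file system as JSON before publication. The field names and values are placeholders for illustration, not the actual CIM/ES-DOC schema.

```python
# Sketch of creating a CIM-like simulation document on the local file system
# as JSON; the field names below are placeholders, not the real CIM schema.
import json

doc = {
    "type": "simulation",                       # illustrative document type
    "name": "historical-r1i1p1",
    "model": "IPSL-CM",
    "experiment": "historical",
    "responsible_party": "IPSL",
    "outputs_conform_to": ["CF convention", "CMIP controlled vocabulary"],
}

with open("simulation_cim.json", "w") as f:     # created locally during the run,
    json.dump(doc, f, indent=2)                 # then published to the remote API
```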

  • Task 4.3: Data and metadata services (IPSL, IDRIS, CNRS-GAME)

Part of the intellectual content forming the basis of current and future climate research is being assembled in massive digital collections scattered across different places. As the size and complexity of these holdings increase, so do the complexities arising from interactions over the data, including use, reuse, and repackaging for unanticipated uses, as well as managing over time the historical metadata that keep the data relevant. For scientific data centres there is also a need for infrastructures that support a comprehensive, end-to-end approach to managing climate data: full information life-cycle management. Ultimately, we have to be able to manage diverse collections within a uniform, coordinated environment. We will further develop the ESGF and ES-DOC environment capabilities to that end. This environment will be our communication bus, accessible through RESTful interfaces, and will be an integrative middle layer on which various data services, tailored to the needs of a diverse and growing user base, will be built. So far, ESGF ingestion capabilities have been limited to parsing NetCDF files organized in THREDDS catalogs, which was enough to support the global CMIP5 effort and related climate projects (CORDEX, PMIP3, Obs4MIPs, etc.). We will develop services to greatly enhance the variety of resources that can be published and discovered through the system. The targets are the simulation data and CIM documents produced by the climate modelling platform (standardized in Task 4.1), the reference datasets needed to sustain the climate evaluation framework, and the diagnostic images produced in Task 5. The architecture will support publication through both a “pull” mechanism, whereby clients request the service to harvest a remote metadata repository, and a “push” mechanism, whereby clients send complete metadata records directly to the service for ingestion (see the sketch below). The system will support the ability to plug in new harvesting algorithms. The new publishing client will be used within the runtime environment developed in Task 3 and, to that end, will have to comply with the requirements of the GENCI HPC centres and of the CNRS-GAME computing resources.
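The two publication modes can be sketched as follows, against a hypothetical RESTful publishing service. The base URL, endpoint paths and payload fields are placeholders, not an existing ESGF or ES-DOC API.

```python
# Sketch of "pull" and "push" publication against a hypothetical REST service;
# the URLs and payload structure are placeholders.
import json
import urllib.request

SERVICE = "https://example.org/publish"          # placeholder service base URL

def _post(path, payload):
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(SERVICE + path, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.status

def publish_pull(catalog_url):
    """'Pull': ask the service to harvest a remote metadata repository."""
    return _post("/harvest", {"catalog": catalog_url})

def publish_push(record):
    """'Push': send a complete metadata record directly for ingestion."""
    return _post("/records", record)
```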

  • Task 4.4: Big Data Analytics (IPSL, CNRS-GAME, IDRIS, CERFACS)

We will focus on specific developments to enable server-side computation capabilities at the data location, balanced and orchestrated by a local cluster with a high-capacity link to the data location. This will greatly expand the capability of the ESGF Compute Node software stack, continuing the effort done in the G8 ExArch project (finishing June 2014) and contributing to the international development of the ESGF stack. Some core/generic functions have to be available at the data location. Primitive functions needed by typical climate analysis processes (average, interpolation, Taylor diagram) will be implemented on the ESGF compute node at the data location, primarily chosen to serve the needs of Task 5.

The ESGF compute node will allow for large-scale manipulation and analysis of data. These computations will be coordinated within an organizational unit as well as across organizational units, directed by data locality. Institution clusters will be used to orchestrate the analysis by querying the compute node’s core functions and performing “non-core” computations directly, as illustrated below. This capability, coupled with the existing distributed ‘data-space’ capability, will give us the ability to take advantage of local and remote computing resources to efficiently analyse data wherever they reside. Each participating organization will be able to provide policies to govern its local resource utilization. The ESGF authentication and authorization system will be used to restrict access to the computational capabilities when necessary.
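The sketch below illustrates the orchestration pattern: a script running on the institution cluster requests a core function (here a spatial average) from a remote compute node located next to the data, then performs a "non-core" step locally. The endpoint and request format are placeholders, not an actual ESGF compute-node API.

```python
# Orchestration sketch: remote "core" function near the data, local "non-core"
# step on the institution cluster. Endpoint and parameters are placeholders.
import json
import urllib.request

COMPUTE_NODE = "https://example.org/compute"     # placeholder compute-node URL

def remote_average(dataset_id, variable):
    """Request a server-side spatial average, computed next to the data."""
    query = json.dumps({"operation": "average", "dataset": dataset_id,
                        "variable": variable}).encode("utf-8")
    req = urllib.request.Request(COMPUTE_NODE, data=query,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())           # e.g. a time series of area means

def local_anomaly(series):
    """'Non-core' step run on the institution cluster: remove the time mean."""
    mean = sum(series) / len(series)
    return [x - mean for x in series]
```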

Success criteria:

Standardized data and metadata can be seamlessly ingested during simulation runtime. The ESGF and ES-DOC APIs are widely used to discover, reuse and analyse simulations, reference datasets and images.

Identified risks:

Not all climate outputs, metadata or images may be ingestible at run time, either because XIOS has not been implemented in all components or because the ingestion process is not fast enough for near-real-time ingestion. Manual post-processing will then have to be implemented to overcome this issue for a subset of simulations. Server-side core functions may not be generic or efficient enough to suit the Task 5 needs. This has to be identified as early as possible so as to define a fall-back position, which could take the form of additional post-processing procedures at the HPC centre, triggered manually for subsets of simulations.

Task 3 – Runtime Environment

Leader: A. Caubel (LSCE), M.-A. Foujols (IPSL)

Contributors: IPSL, CERFACS, IDRIS, CNRS-GAME

Objectives:

The climate modelling community is targeting larger ensembles of simulations performed with higher-resolution models and more advanced Earth system components, both in terms of physical processes and of parallel-programming complexity. A robust and reliable runtime environment is needed to run such simulations while making efficient use of computing resources. The libIGCM runtime environment handles the production of climate simulations – i.e. the orchestration of hundreds of batch-scheduled jobs per simulation (both model execution tasks and output data post-processing tasks) – on different computing centres. The aim of this task is to carry out developments in the libIGCM workflow in order to have a runtime environment that handles the complexity and load balancing of the parallel programming of coupled models (MPMD mode, hybrid MPI-OpenMP parallelization, I/O tasks), ensuring robustness and reliability for all Earth System Model users as well as portability across different high-performance computing centres.

  • Task 3.1: Process assignment (IPSL, IDRIS, CNRS-GAME)

The components of a climate model are codes that use either pure MPI or hybrid MPI-OpenMP parallelism to manage parallel computations and data communications. In most of the European ESMs, among which IPSL-CM and CNRM-CM, these components are coupled together with OASIS in a so-called MPMD (Multiple Program Multiple Data) application. MPMD applications with different levels of parallelism require specific system support, which is not standard at computing centres. The aim of this sub-task is to work with the computing centres to find the best way to assign computing processes and tasks to the cores and nodes of the supercomputer. The appropriate method will be implemented in the libIGCM runtime environment; a small illustrative helper is sketched below.
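As a minimal illustration of process assignment in an MPMD application, the helper below computes the MPI rank range of each component from the requested core counts, which could then feed a machine file or rank-placement map. Component names and core counts are illustrative and not tied to any specific batch system.

```python
# Illustrative helper: compute the MPI rank range assigned to each component
# of an MPMD coupled application, given its requested number of ranks.
def mpmd_rank_layout(components):
    """components: list of (name, n_mpi_ranks) in launch order."""
    layout, first = {}, 0
    for name, n in components:
        layout[name] = range(first, first + n)
        first += n
    return layout

# Example: atmosphere on ranks 0-191, ocean on 192-319, I/O servers on 320-327.
print(mpmd_rank_layout([("ARPEGE", 192), ("NEMO", 128), ("XIOS", 8)]))
```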

  • Task 3.2: Optimization, Load balancing (IPSL, CERFACS, CNRS-GAME)

As in any coupled system, the complexity of a coupled climate application requires us to pay attention to the placement of the tasks on the cores and nodes of the supercomputers, and to analyse in depth the performance of every model task and every independent component, in order to use computing resources efficiently.

Every independent component has to be analysed in terms of computing tasks and I/O tasks. Using XIOS in every component (see Task 4) will reduce the elapsed time of the simulation thanks to asynchronous parallel writing. We propose to develop an analysis tool to find the optimal number of XIOS servers needed to keep every component model as well balanced as possible between computing tasks and I/O tasks.

Load balancing between components that run in parallel also strongly influences the performance of a coupled model. An analysis tool, easily extendable to any coupler, has been developed by CERFACS (Maisonnave, 2012) to measure the performance of every component in the whole coupled model, to evaluate their scalability and to find the optimal number of processes (1) to balance the durations of the independent components and (2) to speed up the whole system. We propose to extend its use to any IPSL-CM and CNRM-CM configuration and to provide additional information such as the OASIS coupling cost (interpolation, communications). A simplified sketch of the underlying balancing idea is given below.
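The sketch below is a deliberately simplified illustration of the load-balancing problem, not the CERFACS tool: given measured runtimes of each component at a few core counts, it picks by brute force the distribution that minimises the slowest component under a total core budget. The timing numbers are hypothetical.

```python
# Toy load-balancing sketch: choose core counts per component so that the
# concurrently running components finish as close together as possible.
import itertools

# Hypothetical timing data: component -> {cores: seconds per simulated day}
timings = {
    "atmosphere": {128: 520, 256: 280, 512: 160},
    "ocean":      {64: 300, 128: 170, 256: 110},
}

def best_distribution(timings, max_cores):
    best = None
    for combo in itertools.product(*[t.items() for t in timings.values()]):
        cores = sum(c for c, _ in combo)
        if cores > max_cores:
            continue
        slowest = max(t for _, t in combo)      # components run concurrently
        if best is None or slowest < best[0]:
            best = (slowest, dict(zip(timings, (c for c, _ in combo))))
    return best

print(best_distribution(timings, max_cores=640))
# -> (170, {'atmosphere': 512, 'ocean': 128}) with the hypothetical data above
```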

In addition, a study will be carried out on the effect of the placement of computing (and I/O) processes on the different cores and nodes of the supercomputer, to understand the best way to map the different tasks of the whole Earth System Model. This study will be done on machines of different HPC centres (IDRIS and TGCC). The use of these analysis tools and the conclusions of the study on the optimization of process assignment, which will be useful for any coupled model on high-end supercomputers, will find a concrete realization in the libIGCM runtime environment.

  • Task 3.3: Climate Simulations Supervision (IPSL, IDRIS, CNRS-GAME)

A typical climate simulation keeps running for weeks within HPC centres; in particular, it must be able to span HPC maintenance periods. During its running time of three weeks, one single typical simulation will produce and manage around 100,000 files representing 25 TB of data. The orchestration of hundreds of batch-scheduled jobs (model and post-processing tasks) is necessary to complete one simulation. Climate simulation workflows, such as the IPSL simulation control environment (libIGCM), have an execution model that is extremely data-flow oriented and dynamically generated from a set of user-defined rules.

While statically configured workflows are sufficient for many applications, the notion of dynamic, event-driven workflows enables a fundamentally different class of applications. Climate simulations are part of this class: events that occur after the workflow has been started can alter the workflow and what it is intended to accomplish. We may wish to change a workflow for a variety of reasons, and the events that trigger these changes may come from any possible source. Events may occur in external, physical systems, in the application processes themselves, or in the computational infrastructure. Clearly, the ability to dynamically manage workflows is an important capability that will enable dynamic application systems and also improve reliability.

Beyond dynamic behaviour, an autonomic aspect is strongly needed in a climate simulation control environment. At the most general level, autonomic computing is intended to be “self-defining and self-healing”. To realize these behaviours at a practical level, we will develop a “supervisor agent” that accomplishes precisely the behaviours required for dynamic workflow management: (1) detect and understand failure events, (2) understand the ultimate goals of the workflow, and (3) be able to re-plan, reschedule, and remap the workflow. Fault detection is essential and, in this case, we will rely on the service call tree to propagate cancellation events. This service call tree may be “short-circuited” and rescheduled in specific cases. This must be done even though the agent may have imperfect knowledge of the environment and limited control over it. Under these conditions, we must accept that dynamic workflows may only be capable of approximately accomplishing their goals. That is why tight integration and cooperation with the HPC centres are essential to ensure portability. The supervisor agent will provide the following features and will be included in the IPSL simulation control environment (libIGCM):

  • All events logged in a comprehensive call tree (job submission, work to be done, each copy, etc.)
  • Reliable lightweight communication channel between client agents and server agents (very likely based on the RabbitMQ implementation of the Advanced Message Queuing Protocol, AMQP; a minimal sketch is given after this list)
  • Call tree traversal capabilities so as to determine checkpoint restart
  • Autonomous rescheduling of necessary jobs
  • Monitoring capabilities, for instance colored graph with all jobs and their status
  • Regression tests handling capabilities
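The following sketch illustrates the event channel between client and server agents using RabbitMQ and the `pika` Python client. The queue name, message fields and rescheduling logic are illustrative assumptions, not the libIGCM design.

```python
# Minimal sketch of the client/server event channel over RabbitMQ (pika);
# queue name and message format are illustrative only.
import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="libigcm_events", durable=True)

def log_event(job_id, status, parent=None):
    """Client side: publish one node of the call tree (job, status, parent job)."""
    event = {"job": job_id, "status": status, "parent": parent}
    channel.basic_publish(exchange="", routing_key="libigcm_events",
                          body=json.dumps(event))

def on_event(ch, method, properties, body):
    """Server side: a real supervisor would update the call tree and reschedule."""
    event = json.loads(body)
    if event["status"] == "failed":
        print(f"job {event['job']} failed; rescheduling from the last checkpoint")

# To run the server-side consumer:
# channel.basic_consume(queue="libigcm_events", on_message_callback=on_event,
#                       auto_ack=True)
# channel.start_consuming()
```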

Success criteria:

The success criteria are the number of developments from Tasks 2.1 to 2.4 included in the reference versions of the IPSL and CNRM-CM coupled models, OASIS3-MCT implemented and used in both coupled models, and a throughput of 2 simulated years per day for the IPSL coupled model at 1/3° resolution and 4 simulated years per day for the high-resolution version of CNRM-CM (~50 km for the atmosphere, 1/4° for the ocean).

Identified risks:

Difficulty in hiring trained computer scientists with HPC skills may increase the time needed for some developments; as a consequence, the target throughput may not be reached. Parallel weight generation and interpolation may not be operational (low risk, as the previous methods will still be usable). For the use of OASIS3-MCT, the risk is low, as it is already used in a few coupled models and has already demonstrated ease of use and good performance.

Task 2 – Towards high-resolution coupled models

Leader: Y. Meurdesoif (IPSL), S. Valcke (CERFACS)

Contributors: IPSL, CERFACS, IDRIS, MDLS, CNRS-GAME

Objectives:

The experience gained during high-resolution experiments (the CINES Grand Challenge for IPSL and current PRACE projects for CNRM-CERFACS) allows us to identify potential technical show-stoppers in the setup of high-resolution models. These include the coupled model parallelism in terms of computing and memory, the management of input and restart files, the interpolation algorithms used to generate the initial states, and the component coupling, as well as some problems related to the adaptation of the parametrization of cumulonimbus convection when going from meshes of a few hundred to a few tens of km. In this task, we propose solutions to the problems encountered, targeting production runs of CNRM-CM-HR, the high-resolution version of CNRM-CM developed mainly by CERFACS, and of the high-resolution version of IPSL-CM (see deliverables 2.2 and 2.3).

  • Task 2.1 : Model improvements for high-resolution simulations

IPSL coupled model parallelism

The implementation of mixed MPI/OpenMP parallelization in all components of the IPSL coupled model started during the CICLE ANR project. On the one hand, we propose here to finalise this work and validate it by checking the reproducibility of the results of our high-resolution coupled model on multicore architectures, such as the IBM Ada Sandy Bridge machine. On the other hand, after a performance study, we propose to identify the bottlenecks and optimise the hybrid MPI/OpenMP parallelism to achieve suitable performance for high-resolution modelling. Performed in close collaboration with IDRIS and Maison de la Simulation, this task will culminate in a series of new training sessions targeting Master's and PhD students as well as climate model developers.

CNRM-CM-HR coupled model parallelism

As detailed in Sect. 1.2, the efficiency of our coupled model at ~50 km resolution flattens dramatically for numbers of cores greater than O(1000). We will test whether increasing the resolution even further (~25 km), which increases the workload per core, results in better scalability at higher core counts. We will also study the applicability of OpenMP parallelism in ARPEGE and SURFEX to improve CNRM-CM scalability at all resolutions. The MPI parallelism of the GELATO component will be optimized and OpenMP will be considered for this component.

IPSL coupled model memory scalability

The Grand Challenge has shown that the memory footprint of the LMDz atmospheric component becomes a real problem at high resolution. Given the current tendency towards less memory per computing node, this problem will certainly become a show-stopper for high-resolution models, especially on new architectures of the Blue Gene type. For historical reasons, each MPI process running the dynamical part of the LMDz atmospheric code holds the full global fields in its memory, even though it performs calculations only on a local part of the field. This part of the code, about 60 files gathering about 10,000 lines of code, will be rewritten to declare only the local part of each field, to manage index loops on the local domain and to implement MPI exchanges between the processes, in the spirit of the sketch below.
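The following minimal sketch, written with `mpi4py` and a 1-D toy field (not LMDz code), illustrates the target memory layout: each process allocates only its local slice plus halo cells and exchanges halos with its neighbours instead of holding the global field.

```python
# Illustrative 1-D domain decomposition with halo exchange (not LMDz code):
# each rank stores only its local block plus one halo cell on each side.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

nglobal = 1000                      # global number of grid points (illustrative)
nlocal = nglobal // size            # local block size (assume exact division)
field = np.zeros(nlocal + 2)        # local block plus 2 halo cells
field[1:-1] = rank                  # fill the interior with dummy data

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

# send the rightmost interior cell to the right neighbour, receive the left halo
comm.Sendrecv(sendbuf=field[-2:-1], dest=right, recvbuf=field[0:1], source=left)
# send the leftmost interior cell to the left neighbour, receive the right halo
comm.Sendrecv(sendbuf=field[1:2], dest=left, recvbuf=field[-1:], source=right)
```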

Scale aware parametrizations of deep convection

In climate models, a number of important processes (clouds, turbulence) are subgrid-scale and must be parametrized. When changing the grid resolution from a few hundred to a few tens of km, most parametrizations (radiation, fair-weather cumulus) can be kept unchanged. This is not the case for deep cumulonimbus convection, which organises at scales from a few tens to a few hundred km (squall lines, meso-scale convective systems). One practical consequence is the tendency of models to develop so-called grid-point storms, i.e. strong numerical convective rainfall associated with a strong ascending motion in one particular column of the model. A very recent development in the LMDZ model concerns “convection triggering”, for which a stochastic approach was developed to account for the dependency on the grid-cell size: the larger the grid cell, the higher the probability of triggering convection at a given time (a toy illustration is given below). This introduction of a stochastic approach into a deterministic Eulerian set of equations opens a path to so-called scale-aware parametrizations of deep convection. This new parametrization framework will be explored further and tested in the full climate model with an atmospheric resolution ranging from 200 to 25 km.
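The toy sketch below only illustrates the stated scaling principle (larger cell, higher triggering probability per time step); it is not the actual LMDZ scheme, and the rate constant and time step are arbitrary.

```python
# Purely illustrative toy model of grid-size-dependent stochastic triggering:
# the probability of at least one triggering event in a cell during one time
# step increases with the cell area (not the actual LMDZ formulation).
import math
import random

def trigger_probability(cell_area_km2, rate_per_km2_per_s, dt_s):
    """Probability of at least one triggering event in the cell during dt."""
    return 1.0 - math.exp(-rate_per_km2_per_s * cell_area_km2 * dt_s)

def convection_triggered(cell_area_km2, rate_per_km2_per_s=1e-9, dt_s=900):
    return random.random() < trigger_probability(cell_area_km2, rate_per_km2_per_s, dt_s)

# A 200 km cell is far more likely to trigger during a time step than a 25 km cell:
print(trigger_probability(200**2, 1e-9, 900), trigger_probability(25**2, 1e-9, 900))
```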

  • Task 2.2 : Managing efficiently input and restart files

In many components, the global restart fields are read from the restart files simultaneously by all processes. When the number of cores gets above O(100), this induces an over-consumption of memory and an overload of the file system, which considerably slows down the initialization. Also, the writing of the restart files at the end of the run is done in each component by the master process, causing an important bottleneck in the simulation. XIOS is an I/O server being developed to ensure smooth and efficient parallel management of very-high-resolution data output. XIOS integration in all components of the coupled system is planned in Task 4. Here we will extend the use of XIOS to the management of the input (ancillary or forcing) and restart files, for both reading and writing, to solve the problems described above. Asynchronous reading will avoid the latency linked to accessing the disk file itself: the I/O server processes will read the input data in advance and make them available to the client model processes, which will receive the data only when they need them. The idea is sketched below.
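The sketch below illustrates the asynchronous-reading idea in a single process with a reader thread (not XIOS code): the next input file is prefetched while the model works on the current one, so the read latency overlaps the computation.

```python
# Illustrative prefetch sketch (not XIOS): overlap reading of the next input
# file with the processing of the current one.
from concurrent.futures import ThreadPoolExecutor

def read_input(path):
    """Stand-in for reading a forcing or restart file from disk."""
    with open(path, "rb") as f:
        return f.read()

def run(model_step, input_files):
    with ThreadPoolExecutor(max_workers=1) as reader:
        future = reader.submit(read_input, input_files[0])     # prefetch the first file
        for i, _ in enumerate(input_files):
            data = future.result()                             # block only if not ready yet
            if i + 1 < len(input_files):
                future = reader.submit(read_input, input_files[i + 1])  # prefetch the next one
            model_step(data)                                   # computation overlaps the next read
```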

  • Task 2.3 : Integrating parallel interpolation mechanisms in XIOS

The generation of initial states is currently based on sequential interpolation algorithms whose cost is proportional to N², where N is the grid dimension. At high resolution, the cost and memory consumption of these sequential algorithms become simply unacceptable. For example, the creation of the initial states for LMDz for the Grand Challenge simulation required a full day and 500 GB of memory on the NEC SX-9 vector computer. For ORCHIDEE, we simply failed to produce the initial states following the usual method. In this task, the algorithm developed in the G8 ICOMEX project for second-order conservative remapping between our icosahedral geodesic and regular latitude-longitude grids will be integrated in the XIOS server. This will allow efficient conservative remapping of the restart, input and output data in our high-resolution coupled system. With this algorithm, the weight calculation is done in parallel and its cost is proportional to N log N (where N is the grid dimension). Once the weights are available, applying them reduces to a sparse matrix operation, as illustrated below.
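The sketch below illustrates the application step only, with toy weights assumed to be already computed: conservative regridding then amounts to a sparse matrix-vector product, which is cheap and easy to distribute.

```python
# Illustrative application of precomputed remapping weights as a sparse
# matrix-vector product; the weights below are toy values, not real ones.
import numpy as np
from scipy.sparse import csr_matrix

# Toy weights mapping a 4-cell source grid onto a 2-cell target grid:
# each target cell is the equal-area mean of two source cells.
rows = np.array([0, 0, 1, 1])
cols = np.array([0, 1, 2, 3])
vals = np.array([0.5, 0.5, 0.5, 0.5])
weights = csr_matrix((vals, (rows, cols)), shape=(2, 4))

source_field = np.array([1.0, 3.0, 2.0, 4.0])
target_field = weights @ source_field          # -> array([2., 3.])
```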

  • Task 2.4 : Parallel component coupling

Tests performed on the Curie platform with a higher-resolution version of ARPEGE-NEMO (the atmosphere at ~22 km resolution coupled to the ocean at ~25 km resolution by the previous, non-parallel OASIS coupler) showed a coupling overhead of up to ~20% when the coupled system was run on O(1000) cores. This demonstrates the necessity of interfacing all components of our high-resolution coupled models with the new parallel version of the coupler, OASIS3-MCT, and of ensuring its efficiency at core counts even greater than O(1000). In CONVERGENCE, we plan to implement the OASIS3-MCT interface in all ocean and atmosphere components of our coupled models to ensure fully parallel interpolation and exchange of the coupling data. To perform parallel interpolation of the coupling data, OASIS3-MCT reads in and uses any set of weights predefined offline by other software or libraries. The coupler currently includes sequential algorithms for generating remapping weights based on the SCRIP library originally developed at the Los Alamos laboratory. The cost of these sequential algorithms becomes unaffordable at high resolution. For example, at the Grand Challenge resolution, the generation of conservative remapping weights was simply impossible. It is proposed here to evaluate the different parallel methods currently used to generate interpolation weights: those developed in ICOMEX for XIOS, in ESMF and in the Open-PALM coupler. Depending on their efficiency, robustness, precision and usability, one or several of them will be made available to OASIS3-MCT users for offline pre-generation of high-quality regridding weights for the coupling exchanges.

Success criteria:

The success criteria are the number of developments from Tasks 2.1 to 2.4 included in the reference versions of the IPSL and CNRM-CM coupled models, OASIS3-MCT implemented and used in both coupled models, and a throughput of 2 simulated years per day for the IPSL coupled model at 1/3° resolution and 4 simulated years per day for the high-resolution version of CNRM-CM (~50 km for the atmosphere, 1/4° for the ocean).

Risks and envisaged solutions:

Difficulty in hiring trained computer scientists with HPC skills may increase the time needed for some developments; as a consequence, the target throughput may not be reached. Parallel weight generation and interpolation may not be operational (low risk, as the previous methods will still be usable). For the use of OASIS3-MCT, the risk is low, as it is already used in a few coupled models and has already demonstrated ease of use and good performance.

Task 1 – National platform for climate modeling

Leader: J-L Dufresne (IPSL), D. Salas (CNRS-GAME)

Contributors: IPSL, CNRS-GAME, CERFACS, IDRIS, MDLS

Objectives:

The overall objective of this task is to ensure that the platform is operational and reaches its objectives, and to ensure and promote the dissemination and wide use of the platform. First, within two years, a first version of the platform will be achieved and tested through an implementation of a standard version of IPSL-CM. Second, within the two last years, the platform will be used to implement three new climate model configurations, and all components of the platform will continue to evolve to reach their final objectives. At the end of the project, the platform will be fully assessed through a challenging test. The platform is composed of a set of tools. Implementing the platform consists of assembling tools and models into different configurations, with different resolutions, allowing a set of simulations, producing a set of diagnostics, and running efficiently and reliably on different supercomputers. All the tools developed will follow the usual best practices: versioning, source code management, issue tracking, documentation, examples, assembled prototypes, and regular releases associated with regression tests. Since platform components have cross-dependencies, each version of the platform will include implementation tests to ensure the compatibility and consistency of the new developments.

Training sessions will be organized and training material will be made available. We have already organized training sessions on specific tools: two to three training sessions on libIGCM have been organized each year for the last three years, with 10 to 20 participants each, and a three-day training session on OASIS is organized every year.

  • Task 1.1. : Platform release, documentation and training

Each version of the platform will be released with its associated documentation. Codes, documentation and training materials will be made available via the project web site. To ensure the consistency and effectiveness of all the tools, each version of the platform will be evaluated with at least the reference version of the IPSL-CM model. During the project, different training sessions will be organized for different audiences: (1) platform tools for climate model developers (mainly between M1 and M24), (2) how to use the implemented platform, for regular climate model users (mainly between M24 and M48), (3) how to use the implemented platform, for new users (M36 and M48). Our past experience teaches us that the writing of documentation is often difficult to complete, and that dedicated seminars of a few days with a small writing team are very effective for doing it. We will organize such seminars within the project.

  • Task 1.2 : Model implementations using the platform

In addition to the implementation with the standard IPSL-CM model, the platform will be used to implement three new climate model configurations: a version of the global climate model with the CNRM-CM model, a version of the global climate model IPSL-CM with a zoomed version of the atmospheric model, and a version of a regional climate model using the LMDZ atmospheric model. The variety of these implementations will illustrate the flexibility and suitability of the platform. They are typical of the simulations performed by the climate community and will illustrate the benefits brought by the platform compared with current practices. The implementations with CNRM-CM and with the standard IPSL-CM will be relevant for usual climate experiments such as those of CMIP. The implementation with the zoomed version, both coupled and uncoupled with the ocean, will be relevant for studying regional climates and, for the latter, for contributing to the CORDEX project.

  • Task 1.3: Our “grand challenge”: a multi-step, multi-criteria procedure to define the next version of the IPSL-CM model

GCMs include many parameterizations, which are approximate descriptions of sub-grid processes. These parameterizations are formulated via a series of parameters that are usually not directly observable and must be tuned so that the parametrizations fit the statistical behaviour of the physical processes as well as possible. The tuning process is an important aspect of climate model development and is usually performed at different stages: for individual parameterizations, for individual model components (atmosphere, ocean, land surface, …) and for the full coupled climate model. However, this tuning process is non-linear, includes iterations among these three stages and is very time consuming. This slows down the development of the model, and the overall evaluation at the end of this process is not always satisfactory.

The goal of this task is to design and carry out a procedure that optimizes the parameter values using a suite of IPSL-CM tests (from simple, well-documented tests over short periods to multi-decadal tests with the coupled climate model) and a wide range of evaluation diagnostics. This task, which uses the platform as a whole, will be a nice illustration of its added value. The first step will be to design the overall procedure, to specify the configurations and, for each of them, the most relevant simulations and diagnostics. A first challenge will be to run in a short time a few hundred simulations, from a few years to a few centuries long, with very different configurations (individual parameterizations, individual model components, coupled climate model with all or part of its components, …). As an additional test, we will perform these very time-consuming runs across two tier-1 national centres: IDRIS and TGCC. A second challenge will be to automatically analyse the huge amount of resulting data and to extract a small set of indicators that will help us choose the model version most relevant to our goals; a minimal sketch of such an indicator-based ranking is given below.
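The sketch below illustrates the final indicator-extraction step only: candidate model versions are ranked by a weighted distance between their diagnostics and reference values. The metric names, weights and numbers are placeholders, not the actual tuning targets.

```python
# Illustrative ranking of candidate model versions by a weighted distance to
# reference indicators; all names, weights and values are placeholders.
def score(candidate, reference, weights):
    """Weighted sum of squared normalised errors over the chosen indicators."""
    return sum(w * ((candidate[k] - reference[k]) / reference[k]) ** 2
               for k, w in weights.items())

reference = {"global_mean_tas": 287.5, "net_toa_flux": 0.8, "amoc_strength": 17.0}
weights = {"global_mean_tas": 1.0, "net_toa_flux": 2.0, "amoc_strength": 1.0}

candidates = {
    "v1": {"global_mean_tas": 288.6, "net_toa_flux": 1.5, "amoc_strength": 15.0},
    "v2": {"global_mean_tas": 287.1, "net_toa_flux": 0.6, "amoc_strength": 18.5},
}

ranking = sorted(candidates, key=lambda name: score(candidates[name], reference, weights))
print(ranking)   # candidate versions ordered from closest to farthest from the targets
```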

Success criteria:

  • The number of persons who participate in our training sessions and who use the platform.
  • The ease of implementing the platform with a new model, configuration or experiment.
  • The length of the “tuning” phase and the quality of the model obtained at the end.

Risks and envisaged solutions:

Some tools or diagnostics may be missing for the final tuning. However, we anticipate that enough of them will be available to strongly facilitate and improve the quality of the model compared to current versions.

Task 0 – Management, valorization and sustainability

Leader: J.-L. Dufresne (IPSL)

Contributors: All partners

Objectives:

The most important goal of the management task is to ensure and enhance communication and synergy between the partners, and to ensure that the project progresses according to the plan and the initial objectives, verifying compliance with the schedule and the delivery of the deliverables. A steering committee will be established with all task leaders. The steering committee will meet five times a year, keep track of the overall progress, identify possible bottlenecks and problems, and organize and coordinate meetings. We will start with the definition of a detailed roadmap for the specifications of the platform and the analytic toolkits, including the design of the component-based model and the specifications of the post-processing and analysis components for Earth system model applications. TGCC and IDRIS experts will be included in strategic discussions to keep the solutions as portable as possible.

The main developers of global climate models are involved in this project, which will facilitate the effective use of the platform within the national community. A possible dissemination at the European level of the platform, or of part of it, will be facilitated by the participation of IPSL, CNRS-GAME and CERFACS in the IS-ENES (EU FP7) project. Developing climate models and performing climate runs is a long-term effort in which the partners have been involved for more than fifteen years. In this context, the sustainability of the platform will be a priority, not to say a fundamental rationale of this project. This task also includes the writing of the reports and presentations requested by the ANR.

Success criteria:

The successful completion of the project.

Risks and envisaged solutions:

No particular risk is foreseen, as most of the partners have a long experience of cooperation. During this past joint work, various practical difficulties have arisen and have always been solved.

Position of the project

Several international projects, currently running or recently ended, address part of these issues, and CONVERGENCE partners have contributed to them. Also, owing to its multidisciplinary nature, CONVERGENCE is one of the very few initiatives to date in a position to fill the gaps in our ability to effectively use the models we develop, to analyse their outputs and associated observations, and ultimately to answer the guiding scientific questions.

The Earth System Grid Federation (ESGF) is a group of archive service providers and software developers, developing and deploying a robust, distributed data and computation platform using open-source software.

ESGF grid

ESGF is an unfunded, federated international effort coordinated by multiple agencies and partners including the Institut Pierre Simon Laplace (IPSL). METAFOR (2009-2011) was a European Union FP7 project that provided detailed metadata for climate models, developing the CIM ontology (Common Information Model) to describe climate data and the models that produce them in a standard way. ES-DOC is also an unfunded, federated international effort towards climate model and simulation documentation. IS-ENES1 (2009-2013) and IS-ENES2 (2014-2018) are European Union FP7 projects providing integrative research and service activities for the European Earth System Modelling community, including support for the distribution of the CMIP5 archive through ESGF. The G8 initiative’s support of ExArch has provided a critical opportunity to develop a long-range, internationally coordinated strategy to address the goals and requirements of exascale climate analysis.

IPSL, being the coordinator of METAFOR and IS-ENES 1 and 2, and a principal investigator of ExArch and ES-DOC, plays a prominent role in those projects. International coordination of scientific climate data management issues is carried out through those projects. They are the link with projects or initiatives like CURATOR (NSF, NASA, NOAA), a capable infrastructure for Earth system research and operations, and GO-ESSP (Global Organization for Earth System Science Portals), a federation of frameworks that can work together using agreed-upon standards.

This international collaboration culminates every five to seven years during the highly visible CMIP phases. It is mandatory for the French climate community to remain at the forefront of these topics in the future. CONVERGENCE will give us the opportunity to reduce the delay between simulations, distribution and analysis of high-quality, well-documented and referenced data for a wide variety of users. Research collaborators and partners will benefit from better access to climate data. This has the potential to give a strong competitive advantage to the national research teams. Private companies making use of our data, such as insurance companies, energy suppliers or companies working on climate change impacts, will also benefit from better access to highly traceable climate data and will thus be keen to further exploit the partners’ data. These companies currently restrict their studies to a very limited sample of simulations and datasets (the sample to which they have easy access).

CONVERGENCE principally addresses axis 3 of the project call by engaging with classes of problems where data volume and complexity are a major obstacle at every stage of climate simulation: pre-processing, runtime, data management and data analysis. CONVERGENCE will also address axis 1 of the project call by engaging with classes of problems that need innovative algorithms and approaches to evaluate model and parameter uncertainties in a multi-physics and multi-scale context. The CONVERGENCE platform will favour the development of the climate modelling ecosystem through a multidisciplinary consortium, wide adoption and training.

IDRIS experts will bring their knowledge of HPC use and environments; they have been involved in DEISA and PRACE. IDRIS has been a work package leader and a member of the executive committee of the DEISA project, and is now a PRACE work package leader. They will be involved in scientific data management. MDLS seeks to accompany, support and stimulate scientific communities to make the best use of supercomputers, especially those deployed as part of the European PRACE project and of the French national centres under GENCI. MDLS organizes training sessions in the following fields: distributed parallel programming, hybrid programming (OpenMP/MPI), GPU computing, debugging and parallel sparse linear algebra. MDLS hosts and supports several HPC-oriented Masters programmes and will extend this through CONVERGENCE platform training.

Strategy

Increasing computational capabilities is mandatory in climate modelling to achieve higher spatial and temporal resolution, better physical process representation, explicit modelling of more biogeochemical processes, longer runs and larger ensembles. Adequate strategies and software environments integrated into specialised platforms are absolutely mandatory to sustain the development of the next generation of French climate models and the analysis of climate simulations as we are entering the Big Data era and facing the Exascale horizon.

The new comprehensive software platform will be built on ongoing efforts to homogenise and unify generic elements of the software environment of the French ESMs. These elements include data structure definition and I/O, code coupling and interpolation, as well as the runtime and pre/post-processing environments. Because it will provide the fabric through which we connect our models to state-of-the-art HPC systems, the development of a comprehensive platform demands expertise in HPC and informatics that cuts across the individual partners and the broader HPC community. Our next-generation platform will be based on modern and open practices in software engineering, and will be readily usable by climate scientists. The ability to vary the configuration and the resolution of the model, to perform large ensembles of runs, and to provide a precise evaluation and validation of the model will constitute a unique framework to help model development. It will provide considerable support to the French community for its contribution to demanding and highly visible climate projects such as CMIP.

Challenges

The platform will be built on ongoing efforts to homogenize and unify generic elements of the French climate models. To unify and homogenize the platform’s components, a dedicated communication bus will be implemented. The French community has adopted a flexible approach to climate modelling based on “families” of models, within which a suite of configurations addressing different aspects of our guiding scientific questions is defined. The model families may differ in various ways: resolution, vertical extent, horizontal domain (e.g. global or regional), active parameterizations, complexity (e.g. atmosphere-only, coupled atmosphere-ocean, carbon cycle, ocean biogeochemistry, etc.). But the different members of one family must share the same basic physical properties, and any changes in their configuration must be limited to those required to address specific scientific questions. Such nesting of model families and configurations poses considerable challenges that we propose to address in CONVERGENCE.

The U.S. NRC Panel Report “A National Strategy for Advancing Climate Modelling” identifies the challenge we plan to address as an urgent priority. This view is shared by DOE, NOAA, NASA, EC FP7, WMO, and other agencies (http://dels.nas.edu/Report/National-Strategy-Climate-Modeling/13430).

Key objectives of the climate modelling platform

The primary purpose of this project is to develop a platform capable of running large ensembles of simulations with a suite of models, of handling the complex and voluminous datasets generated, of facilitating the evaluation and validation of the models, and of easing the use of higher-resolution models. The goal is to help model development and to facilitate and reinforce the French contribution to demanding and highly visible projects such as CMIP and CORDEX (the Coordinated Regional Climate Downscaling Experiment).