Contributors: IPSL, CERFACS, IDRIS, MDLS, CNRS-GAME
The experience gained during high-resolution experiments (the CINES Grand Challenge for IPSL and current PRACE projects for CNRM-CERFACS) allows us to identify potential technical show-stoppers in the setup of high-resolution models. These include the parallelism of the coupled models in terms of computing and memory, the management of input and restart files, the interpolation algorithms used to generate the initial states, the component coupling, as well as problems related to the adaptation of the parametrization of cumulonimbus convection when going from meshes of a few hundred to a few tens of km. In this task, we propose solutions to the problems encountered, targeting production runs of CNRM-CM-HR, the high-resolution version of CNRM-CM developed mainly by CERFACS, and of the high-resolution version of IPSL-CM (see deliverables 2.2 and 2.3).
Task 2.1 : Model improvements for high-resolution simulations
IPSL coupled model parallelism
The implementation of mixed MPI/OpenMP parallelization in all components of the IPSL coupled model started during the CICLE ANR project. On the one hand, we propose here to finalise this work and validate it by checking the reproducibility of the results of our high-resolution coupled model on multicore architectures, such as the IBM Ada Sandy Bridge. On the other hand, after a performance study, we propose to identify the bottlenecks and optimise the hybrid MPI/OpenMP parallelism to achieve suitable performance for high-resolution modelling. Performed in close collaboration with IDRIS and Maison de la Simulation, this task will culminate in a series of new training sessions targeting Master's and PhD students as well as climate model developers.
CNRM-CM-HR coupled model parallelism
As detailed in Sect. 1.2, the efficiency of our coupled model at ~50 km resolution flattens dramatically for numbers of cores greater than O(1000). We will test whether increasing the resolution even further (~25 km), which increases the workload per core, results in better scalability for a higher number of cores. We will also study the applicability of OpenMP parallelism in ARPEGE and SURFEX for improving CNRM-CM scalability at all resolutions. The MPI parallelism of the GELATO component will be optimized, and OpenMP will be considered for this component as well.
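The flattening of efficiency at O(1000) cores is consistent with a simple strong-scaling argument: any residual serial (or otherwise non-scalable) fraction of the run time caps the achievable speedup. The toy model below is only a sketch; the serial fractions used are illustrative assumptions, not measured values for ARPEGE, SURFEX or GELATO. It shows why increasing the workload per core, e.g. by raising the resolution, can push this limit out.

```python
def amdahl_speedup(serial_fraction, n_cores):
    """Amdahl's law: speedup on n_cores when a fixed fraction of the
    run time is serial (or otherwise does not scale with core count)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

# With 1% serial work, 1000 cores deliver a speedup of only ~91.
# Halving the effective serial fraction (more parallel work per core
# at higher resolution) markedly raises the achievable speedup.
```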
IPSL coupled model memory scalability
The Grand Challenge has shown that the memory footprint of the LMDz atmospheric component becomes a real problem at high resolution. Given the current tendency towards less memory per computing node, this problem will certainly become a show-stopper for high-resolution models, especially on new architectures of the Blue Gene type. For historical reasons, each MPI process running the dynamical part of the LMDz atmospheric code holds the full global fields in memory, even though it performs calculations only on a local part of the field. This part of the code, about 60 files gathering about 10,000 lines of code, will be rewritten to declare only the local part of each field, to manage index loops over the local domain, and to implement MPI exchanges between the processes.
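The essence of the planned refactoring can be sketched as follows. This is only an illustrative Python toy (the actual LMDz code is Fortran, and the function names and 1-D latitude-band decomposition are assumptions for the sake of the example): each rank computes the index bounds of the rows it owns and allocates only that slice, plus halo rows for the MPI exchanges, instead of the full global field.

```python
def local_bounds(nb_lat, nb_procs, rank):
    """Return (start, end) latitude indices owned by `rank` (end exclusive),
    distributing any remainder rows to the first ranks."""
    base, extra = divmod(nb_lat, nb_procs)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end

def local_field(nb_lat, nb_lon, nb_procs, rank, halo=1):
    """Allocate only the local part of a (nb_lat, nb_lon) field,
    with `halo` extra rows on each side for MPI neighbour exchanges."""
    start, end = local_bounds(nb_lat, nb_procs, rank)
    rows = (end - start) + 2 * halo
    return [[0.0] * nb_lon for _ in range(rows)]

# Each rank's memory footprint now scales as nb_lat/nb_procs + 2*halo
# rows instead of nb_lat, while every latitude stays owned by exactly
# one rank.
```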
Scale aware parametrizations of deep convection
In climate models, a number of important processes (clouds, turbulence) are subgrid-scale and must be parametrized. When refining the grid resolution from a few hundred to a few tens of km, most parametrizations (radiation, fair-weather cumulus) can be kept unchanged. This is not the case for deep cumulonimbus convection, which organises at scales from a few tens to a few hundred km (squall lines, meso-scale convective systems). One practical consequence is the tendency of models to develop so-called grid-point storms, i.e. strong numerical convective rainfall associated with a strong ascending vertical motion in one particular column of the model. A very recent development in the LMDZ model concerns "convection triggering", for which a stochastic approach was developed to account for the dependency on the grid cell size: the larger the grid cell, the higher the probability of triggering convection at a given time. This introduction of a stochastic approach into a deterministic Eulerian set of equations opens a path towards so-called scale-aware parametrizations of deep convection. This new parametrization framework will be explored further and tested in the full climate model with atmospheric resolutions ranging from 200 to 25 km.
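The grid-size dependency of the triggering probability can be illustrated with a simple Poisson-process argument (a sketch only, not the actual LMDZ scheme; the event rate is a purely illustrative parameter): if sub-grid triggering events occur randomly with a fixed rate per unit area and time, the probability that at least one falls inside a cell during a timestep grows with the cell area.

```python
import math

def trigger_probability(cell_area_km2, dt_s, rate_per_km2_per_s):
    """Probability that at least one sub-grid triggering event occurs
    in the cell during one timestep, assuming a Poisson process with
    the given event rate per km^2 per second."""
    return 1.0 - math.exp(-rate_per_km2_per_s * cell_area_km2 * dt_s)

# A larger grid cell has a higher probability of hosting at least one
# triggering event: convection is then started in that column if a
# uniform random draw falls below this probability.
```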
Task 2.2 : Managing efficiently input and restart files
In many components, the global restart fields are read from the restart files simultaneously by all processes. When the number of cores gets above O(100), this induces an over-consumption of memory and an overload of the file system, which considerably slows down the initialization. Moreover, the writing of the restart files at the end of the run is done in each component by the master process, causing an important bottleneck in the simulation. XIOS is an I/O server being developed to ensure smooth and efficient parallel management of very-high-resolution data output. XIOS integration in all components of the coupled system is planned in Task 4. Here we will extend the use of XIOS to the management of the input (ancillary or forcing) and restart files (for both reading and writing) to solve the problems described above. Asynchronous reading will avoid the latency linked to accessing the disk file per se: the I/O server processes will read the input data in advance and make them available to the client model processes, which will in fact receive the data only when they need them.
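The read-ahead pattern can be sketched as follows, with a background thread and a bounded queue standing in for the XIOS server processes (an illustrative Python sketch; names and the prefetch depth are assumptions, and XIOS itself uses dedicated MPI server processes rather than threads):

```python
import queue
import threading

def prefetcher(read_record, n_records, depth=2):
    """Yield input records while a background worker reads ahead,
    keeping up to `depth` records buffered: the consumer (the model)
    never waits on the disk unless it outruns the reader."""
    q = queue.Queue(maxsize=depth)

    def worker():
        for i in range(n_records):
            q.put(read_record(i))  # blocks once `depth` records are buffered
        q.put(None)                # end-of-stream sentinel

    threading.Thread(target=worker, daemon=True).start()
    while True:
        record = q.get()
        if record is None:
            break
        yield record
```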
Task 2.3 : Integrating parallel interpolation mechanisms in XIOS
The generation of initial states is based on sequential interpolation algorithms whose cost is proportional to N², where N is the grid dimension. At high resolution, the cost and memory consumption of these sequential algorithms become simply unacceptable. For example, the creation of the initial states of LMDz for the Grand Challenge simulation required a full day and 500 GB of memory on the NEC SX9 vector computer. For ORCHIDEE, we simply failed to produce the initial states following the usual method. In this task, the algorithm developed in the G8 project ICOMEX for second-order conservative remapping between our icosahedral geodesic and regular latitude-longitude grids will be integrated into the XIOS server. This will allow efficient conservative remapping of the restart, input and output data in our high-resolution coupled system. With this algorithm, the weight calculation is done in parallel and its cost is proportional to N log N.
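To illustrate what the remapping weights are, the sketch below computes first-order conservative weights between two 1-D grids as cell-overlap fractions. This is a deliberately naive O(N²) toy with illustrative names, not the ICOMEX algorithm, which is second order, works on spherical 2-D grids, and reaches N log N by searching overlapping cells with tree structures in parallel.

```python
def conservative_weights_1d(src_edges, dst_edges):
    """First-order conservative remapping weights between two 1-D grids
    given by their cell-edge coordinates. Returns one sparse row per
    target cell: weights[j][i] = overlap(dst_j, src_i) / length(dst_j)."""
    weights = []
    for j in range(len(dst_edges) - 1):
        d0, d1 = dst_edges[j], dst_edges[j + 1]
        row = {}
        for i in range(len(src_edges) - 1):  # naive O(N^2) search
            s0, s1 = src_edges[i], src_edges[i + 1]
            overlap = min(d1, s1) - max(d0, s0)
            if overlap > 0:
                row[i] = overlap / (d1 - d0)
        weights.append(row)
    return weights
```

When the target grid is fully covered by the source grid, each row sums to 1, which is what makes the remapping conservative.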
Task 2.4 : Parallel component coupling
Tests performed on the Curie platform with a higher-resolution version of ARPEGE-NEMO, with the atmosphere at ~22 km resolution coupled to the ocean at ~25 km resolution by the previous non-parallel OASIS coupler, showed a coupling overhead of up to ~20% when the coupled system was run on O(1000) cores. This demonstrates the necessity of interfacing all components of our high-resolution coupled models with the new parallel version of the coupler, OASIS3-MCT, and of ensuring its efficiency at numbers of cores even greater than O(1000). In CONVERGENCE, we plan to implement the OASIS3-MCT interface in all ocean and atmosphere components of our coupled models to ensure fully parallel interpolation and exchange of the coupling data. To perform parallel interpolation of the coupling data, OASIS3-MCT reads in and uses any set of weights predefined offline by other software or libraries. The coupler currently includes sequential algorithms for generating remapping weights based on the SCRIP library originally developed at the Los Alamos laboratory. The cost of these sequential algorithms becomes unaffordable at high resolution; for example, at the Grand Challenge resolution, the generation of conservative remapping weights was simply impossible. It is proposed here to evaluate the different parallel methods currently used to generate interpolation weights: the ones developed in ICOMEX for XIOS, in ESMF and in the Open-PALM coupler. Depending on their efficiency, robustness, precision and usability, one or more of them will be made available to OASIS3-MCT users for the offline pregeneration of high-quality regridding weights for the coupling exchanges.
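Once the weights are pregenerated offline, applying them at each coupling exchange amounts to a sparse matrix-vector product over the target cells, which is why the runtime interpolation parallelises naturally. The sketch below is illustrative only, not the OASIS3-MCT implementation; the sparse-row format is an assumption for the example.

```python
def apply_weights(weights, src_field):
    """Apply precomputed remapping weights (one sparse row of
    source-index -> weight per target cell) to a source field: a
    sparse matrix-vector product, trivially parallel over targets."""
    return [sum(w * src_field[i] for i, w in row.items()) for row in weights]

# Because each row of a conservative weight matrix sums to 1, a
# constant source field is remapped to the same constant.
```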
A number of developments from Tasks 2.1 to 2.4 included in the reference versions of the IPSL and CNRM-CM coupled models. OASIS3-MCT implemented and used in both coupled models. Throughput of 2 simulated years per day for the IPSL coupled model at 1/3° resolution and 4 simulated years per day for the high-resolution version of CNRM-CM (~50 km for the atmosphere, 1/4° for the ocean).
Risks and envisaged solutions:
Difficulty in hiring trained computer scientists with HPC skills may increase the time needed for some developments. Consequences: the target throughput may not be reached, and parallel weight generation and interpolation may not be operational (low risk, as the previous methods will still be usable). For the use of OASIS3-MCT, the risk is low, as it is already used in a few coupled models and has already demonstrated ease of use and good performance.