Nowadays, developers lack tools that enable the development of complex workflows involving HPC simulations and modelling with data analytics and machine learning.
eFlows4HPC aims to deliver a workflow software stack and an additional set of services to enable the integration of HPC simulations and modelling with big data analytics and machine learning in scientific and industrial applications. The software stack will allow creating innovative adaptive workflows that efficiently use the computing resources considering novel storage solutions.
38 months from 01/01/2021 to 29/02/2024
eFlows4HPC aims to propose workflow technologies and services that enable the integration of HPC simulation and modelling with big data analytics and machine learning. The project aims to demonstrate the workflow software stack through the use cases of three application Pillars with high industrial and social relevance, i.e., manufacturing, climate, and urgent computing for natural hazards.
The project concerns the design and implementation of a European workflow platform to enable complex applications that integrate HPC processes, data analytics and artificial intelligence. The platform is intended to let HPC resources be used in an easy, efficient, and responsible way, and to make applications more accessible and reusable so as to reduce the time to solution.
- Scientific and Technological objectives, focused on the delivery of a workflows software stack and added value services, to be used as the basis for exploitation by the HPC centres in Europe.
- Pillar-specific Scientific objectives, focused on the delivery of application workflows and workflow templates that can be exploited by the current application stakeholders involved in the project and by the corresponding communities in their use of HPC.
- Societal and Industrial objectives focused on the pre-commercial evaluation and validation of the project solutions and on the exploitation by communities of the project results.
CMCC leads WP5 Pillar II: Dynamic and adaptive workflows for climate modelling, which develops innovative adaptive workflows for climate and for the study of Tropical Cyclones (TC) in the context of the CMIP6 experiment, including in-situ analytics. This Work Package (Pillar II) aims to demonstrate the applicability and effectiveness of the eFlows4HPC approach and technologies with a set of key use cases in the climate modelling domain.
More specifically, WP5 sets the following objectives:
- Development of intelligent and novel end-to-end ensemble Earth System Modelling (ESM) workflows able to (i) rapidly adapt and evolve according to the dynamic conditions of the climate simulations, and (ii) make better use of computational and storage resources by performing a smart (AI-driven) pruning of ensemble members (and releasing resources accordingly) at runtime.
- Seamless, transparent, and efficient integration of different components of the ESM workflow (from simulation to post-processing, HPDA and learning) into the same experiment, overcoming current gaps and barriers.
- Development of data-driven ML/DL models to help understand key features and to produce added-value products (e.g., Tropical Cyclone tracks) from the climate simulations.
- Investigation of post-processing versus in-situ approaches to study how they respond to the scientific needs, and to the use of computational and storage resources.
- Evaluation of data-intensive versus data-driven approaches with respect to key features (such as TC tracks) by performing a scientific multidimensional analysis of (i) accuracy, (ii) uncertainty, (iii) time-to-solution, and (iv) energy consumption.
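The runtime pruning of ensemble members mentioned in the first objective can be illustrated with a minimal sketch. It is not taken from the eFlows4HPC software stack: the `Member` class, the `prune_ensemble` function, the `score` field (standing in for whatever skill estimate an AI model would produce) and the threshold value are all hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """One ensemble member (hypothetical structure for this sketch)."""
    name: str
    score: float  # assumed skill estimate in [0, 1], e.g. from an ML model
    cores: int    # cores currently allocated to this member


def prune_ensemble(members, threshold, min_keep=1):
    """Split members into (kept, pruned) and report the cores released.

    Members scoring below `threshold` are pruned, except that the
    `min_keep` best-scoring members are always retained.
    """
    ranked = sorted(members, key=lambda m: m.score, reverse=True)
    kept = [m for i, m in enumerate(ranked)
            if i < min_keep or m.score >= threshold]
    pruned = [m for m in ranked if m not in kept]
    released = sum(m.cores for m in pruned)
    return kept, pruned, released


# Illustrative data only; scores and core counts are made up.
members = [
    Member("m01", 0.91, 128),
    Member("m02", 0.42, 128),
    Member("m03", 0.77, 128),
    Member("m04", 0.18, 128),
]
kept, pruned, released = prune_ensemble(members, threshold=0.5)
print([m.name for m in kept], [m.name for m in pruned], released)
```

In a real workflow the released cores would be handed back to the resource manager or reassigned to the remaining members; how that is done depends on the runtime in use and is outside the scope of this sketch.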
An agile methodology will be adopted to ensure the right level of flexibility during the entire software development process.
CMCC also contributes to:
- WP1 (Workflow interfaces for the integration of HPC, data analytics and ML)
- WP2 (Optimization of runtime and libraries, workflow deployment, resource management, and data management)
- WP3 (Co-design aspects between applications, software stack and actual hardware)
The workflow platform will have the following characteristics:
- support for the integration of HPC simulation and modelling, data analytics and machine learning;
- support for dynamic workflows that can change their behaviour during execution;
- support for dynamic resource management depending on the actual workload needs;
- support for data streaming;
- support for persistent storage beyond traditional file systems.
The computing platforms considered will centre on large HPC systems (PRACE Tier-0 and Tier-1 and EuroHPC pre-exascale systems), but will also cover connections to external devices or instruments for data acquisition, including cloud-based solutions such as a data logistics service for the on-demand, self-service, automatic movement and pre-processing of data. The project will also devote effort to the heterogeneity of the systems and to how specialized architectures can be used in each of the Pillars’ use cases. A special case will be the study of the requirements of the Pillars’ use cases in the context of the new European Processor Initiative (EPI), by optimizing specific kernels or parts of applications for its architecture.
The project also aims to propose and develop the HPC Workflows as a Service (HPCWaaS) concept as a means of widening access to HPC for user communities. The goal is to provide methodologies and tools that enable sharing and reuse of existing workflows, and that assist when adapting workflow templates to create new workflow instances.
As its main outcome, the project will deliver the eFlows4HPC software stack, which integrates different components to provide an overall workflow management system. One of the core functionalities of the software stack is the definition of complex workflows that combine HPC, HPDA and ML frameworks and integrate large volumes of data from different sources and locations.
On top of this software stack, the project will build an HPC Workflows as a Service (HPCWaaS) platform to facilitate the reusability of these complex workflows in federated HPC infrastructures.
The HPCWaaS platform and the eFlows4HPC software stack will be validated by use cases organised in three pillars which represent the main sectors that the project targets.
- CENTRE INTERNACIONAL DE METODES NUMERICS EN ENGINYERIA
- FORSCHUNGSZENTRUM JULICH GMBH
- UNIVERSITAT POLITECNICA DE VALENCIA
- BULL SAS
- DtoK Lab S.r.l.
- FONDAZIONE CENTRO EURO-MEDITERRANEO SUI CAMBIAMENTI CLIMATICI
- INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET AUTOMATIQUE
- SCUOLA INTERNAZIONALE SUPERIORE DI STUDI AVANZATI DI TRIESTE
- INSTYTUT CHEMII BIOORGANICZNEJ POLSKIEJ AKADEMII NAUK
- UNIVERSIDAD DE MALAGA
- ISTITUTO NAZIONALE DI GEOFISICA E VULCANOLOGIA
- ALFRED-WEGENER-INSTITUT HELMHOLTZ-ZENTRUM FUR POLAR- UND MEERESFORSCHUNG
- EIDGENOESSISCHE TECHNISCHE HOCHSCHULE ZUERICH
- SIEMENS AKTIENGESELLSCHAFT
- STIFTELSEN NORGES GEOTEKNISKE INSTITUTT