The Advanced Scientific Computing (ASC) division carries out R&D activities on Computational Science applied to the Climate Change domain. In particular, it focuses on (i) the optimization of numerical models on HPC architectures (High End Computing – HEC), (ii) the management, analysis and mining of large volumes of scientific data, looking ahead to exascale scenarios (Data Science and Learning – DSL), (iii) user-friendly interfaces, workflows and applications (Usable Software and Systems – USS), and (iv) research on innovative digital platforms and tools for the delivery of new services in different sectors, such as agriculture, climate, disaster risk reduction, oceanography and water management (Production Platforms for Operational Services – PPOS).
- Optimization and parallelization of numerical models for climate change simulations (both climate and impacts models);
- Design and implementation of open source Data Science and Learning solutions addressing efficient access, analysis and mining of scientific data in the climate change domain;
- Design and development of innovative digital platforms and tools, based on the integration of state-of-the-art and cutting-edge ICT technologies;
- Design and development of applications and visual analytics tools coupling interactive graphical representations with underlying analytical processes and workflows.
IS-ENES is a key research infrastructure for climate modelling. It is the distributed infrastructure of the European Network for Earth System modelling (ENES) that serves the European...
The project EXDCI-2 builds upon the success of EXDCI and will continue the coordination of the HPC ecosystem with important enhancements to better address the convergence of big data, cloud and...
INDIGO-DataCloud: a Platform to Facilitate Seamless Access to E-Infrastructures
Salomoni D., Campos I., Gaido L., Marco de Lucas J., Solagna P., Gomes J., Matyska L., Fuhrman L., Hardt M., Donvito G., Dutka L., Plociennik M., Barbera R., Blanquer I., Ceccanti A., Cetinic E., David M., Duma C., López-García A., Moltó G., Orviz P., Sustr Z., Viljoen M., Aguilar F., Alves L., Antonacci M., Antonelli L. A., Bagnasco S., Bonvin A. M. J. J., Bruno R., Chen Y., Costa A., Davidovic D., Ertl B., Fargetta M., Fiore S., Gallozzi S., Kurkcuoglu Z., Lloret L., Martins J., Nuzzo A., Nassisi P., Palazzo C., Pina J., Sciacca E., Spiga D., Tangaro M., Urbaniak M., Vallero S., Wegh B., Zaccolo V., Zambelli F., Zok T.
2018, Journal of Grid Computing, Volume 16, Issue 3, pp 381–408, DOI: https://doi.org/10.1007/s10723-018-9453-3
Towards an Open (Data) Science Analytics-Hub for Reproducible Multi-Model Climate Analysis at Scale
Fiore S., Elia D., Palazzo C., D'Anca A., Antonio F., Williams D.N., Foster I., Aloisio G.
2018, Proceedings of 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 10-13 December 2018, pp. 3226-3234, DOI: https://doi.org/10.1109/BigData.2018.8622205
GIS and Data: Three applications to enhance Mobility
Alic A.S., Almeida J.M., Meira Jr. W., Guedes D.O., dos Santos W., Blanquer I., Fiore S., Kozievitch N.P., Andrade N., Braz T., Brito A., Pires C.E.S., Antunes N., Vieira M., Silva P., Ardagna D., Fonseca K.V.O., Lezzi D., Elia D., Moraes R., Basso T., Cavassin W.H.
2018, Proceedings of the XIX Brazilian Symposium on Geoinformatics, pp. 1-12, DOI: AlicAMGSBFKABBP18
Via Augusto Imperatore, 16 – 73100 LECCE, Italy
Research Unit Leader: Italo Epicoco
The objectives of this research unit are:
– the optimization and parallelization of numerical models for climate change simulations (both climate and impacts models);
– the evaluation of how emerging technologies affect the design and implementation of current climate models;
– the innovation of the main computational kernels by means of a re-design at numerical, algorithmic and software level.
– Model optimization and parallelization
This activity supports all CMCC divisions and applies “co-design” techniques to optimize the climate models used at CMCC. It also analyzes the main computational kernels of those models, with the goal of optimizing them on multi-core hybrid architectures by considering the performance impact of optimizing compilers, numerical libraries and new parallel paradigms.
– Re-design of computational kernels towards Exascale
This activity studies the impact of exascale computing architectures on the numerical algorithms used in the main climate models studied at CMCC. In particular, the following aspects are analyzed in exascale terms: (I) the optimization of coupled Earth System Models; (II) the use of advanced parallel algorithmic structures to reduce the communication overhead of current “dynamical cores”; (III) the optimal management of “multi-model” and “multi-emission” ensemble experiments; (IV) the analysis of the “super-parameterization” approaches used to solve CRMs (Cloud Resolving Models) at the highest resolution; (V) the optimal management of the memory system and its hierarchies; (VI) the rationalization of I/O operations; (VII) fault-tolerance management; (VIII) the adoption of new parallel communication paradigms and parallel task synchronization mechanisms. This activity is carried out under the IESP (International Exascale Software Programme) and EESI (European Exascale Software Initiative) international initiatives.
Research Unit Leader: Sandro Fiore
The main goal of this Research Unit concerns the design and implementation of Data Science and Learning open source solutions addressing efficient access, analysis and mining of scientific data in the climate change domain. In particular, the activities focus on (i) the management of scientific data in major international contexts/initiatives like the ENES Climate Data Infrastructure, the Earth System Grid Federation, and the European Open Science Cloud, (ii) the definition of new storage models to enable efficient access to climate data (also including parallel I/O approaches), and (iii) the development of advanced Data Science environments for climate scientists leveraging High Performance Data Analytics solutions, workflow tools as well as machine/deep learning frameworks to accelerate scientific discovery.
– Scientific data management in large-scale environments.
The main goal of this activity is the transparent, secure and efficient management of large volumes of scientific data on a geographical scale. In particular, the activity focuses on the design and implementation of software components that can represent core building blocks in several major international contexts like the ENES Climate Data Infrastructure, the Earth System Grid Federation and the European Open Science Cloud.
– Storage models and parallel I/O applied to scientific data.
This activity covers the study, analysis and design of novel storage models for scientific data in the climate change context. By defining advanced storage models for the management of climate change data (to be implemented on HPC platforms using parallel paradigms such as MPI and OpenMP), it aims to make both data access (parallel I/O) and storage management more efficient.
– High Performance Data Analytics.
Data at the peta- to exascale requires a different workflow, based on High Performance Data Analytics facilities located close to the data storage and on server-side analysis capabilities. Such an approach reduces the volume of downloaded data, the makespan of the analysis task, and the complexity of the analysis software to be installed on client machines. This activity will look into the convergence of big data and HPC, exploring innovative and scalable approaches to large-scale data analysis experiments.
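As a rough illustration of why server-side analysis reduces the downloaded data, the following minimal Python sketch compares the number of values a client would have to fetch in order to compute a time mean locally with the number of values returned when the reduction runs next to the data. The dataset, its size and the function name `server_side_time_mean` are hypothetical and chosen only for this example; they do not describe any CMCC service.

```python
from statistics import fmean
from typing import List


def server_side_time_mean(dataset: List[List[float]]) -> List[float]:
    """Reduce each grid point's time series to its mean, next to the data."""
    return [fmean(series) for series in dataset]


# Hypothetical dataset: 4 grid points, each with a 1000-step time series.
dataset = [[float(t + p) for t in range(1000)] for p in range(4)]

# Client-side analysis would require downloading every value first.
values_if_downloaded = sum(len(series) for series in dataset)

# Server-side analysis ships only the reduced result to the client.
result = server_side_time_mean(dataset)
values_transferred = len(result)

print(values_if_downloaded, values_transferred)
```

In this toy case the client receives one value per grid point instead of the full time series; the same reasoning applies, at a much larger scale, to reductions over real climate datasets.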
– Delivering Petabyte data handling power to scientists’ desks.
The exponential growth in the amount of data produced and available is opening very challenging scenarios for scientific investigation. However, scientists currently exploit only a fraction of the power of this data, because even simple data manipulation tasks can be cumbersome at the data volumes involved. This research activity will address the delivery of a high-productivity Data Science environment for researchers that hides the complexity of the HPC and big data back-ends and targets multiple aspects of data management, including interactive and exploratory data analysis as well as visualization.
– Knowledge Discovery and Learning from Data
Knowledge discovery is a major challenge faced by scientists today, due to the vast amount of data available from climate simulations, sensors, satellites, etc. Distilling knowledge from this data is a demanding computational problem that calls for new approaches and opens scenarios where AI applied to large amounts of data can support simulations, analysis and workflows. To this end, data mining and machine/deep learning techniques and algorithms will be developed and integrated into novel AI-enabled data platforms to support the discovery of previously unseen patterns and relationships in very large climate datasets.
Research Unit Leader: Marco Mancini
The Research Unit on “Production Platforms for Operational Services” (PPOS) focuses on the research and development of innovative ICT digital platforms, based on the integration of state-of-the-art information and communication technologies (Cloud computing, container orchestration, Internet of Things, advanced data acquisition/management and analytics, blockchain, AI).
The main aim of the research group is the development and deployment of digital platforms and tools to support production-level operational environments and eco-systems at CMCC in different sectors (agriculture, climate, disaster risk reduction, oceanography, water management, etc.). The development of advanced ICT assets will allow a faster deployment of operational services, making their maintenance easier and addressing at the same time common issues regarding operational services.
Research activities are carried out by considering interoperability, international standards, and open source solutions to provide production-ready platforms and tools that take high availability, fault tolerance, business continuity, and disaster recovery into account.
PPOS has been actively contributing to, collaborating with and integrating state-of-the-art open-source ICT projects such as OpenNebula (https://opennebula.org/), Rancher (https://rancher.com), iRODS (https://irods.org) and ThingsBoard (https://thingsboard.io), which represent production-ready platforms for Cloud computing, container orchestration, data management and the Internet of Things, respectively.
This research unit focuses on the development of user-driven applications and software components in the context of climate change, with particular reference to portals, desktop and mobile applications, workflows and notebooks.
The main activities are related to: i) visual analytics tools and services, coupling interactive graphical representations with underlying analytical processes (e.g. statistical procedures, data mining techniques); ii) distributed information systems within the Earth System Grid Federation (ESGF); iii) Geographical Information Systems (GISs) for landscape structure, diversity and land-use analysis; iv) user-centered scientific workflows and gateways.
– Visual and analytical tools, user interfaces and applications.
This activity focuses on the design and implementation of i) user-oriented tools, portals and applications that provide end users with better evaluation and analysis capabilities and a strong user experience, ii) mobile-based crowd-sensing applications for large-scale user engagement in applications with a high societal impact, iii) Geographical Information Systems (GIS) applications to tackle advanced spatial data analysis challenges and iv) advanced dashboard-based user interfaces for monitoring, reporting, etc.
– Distributed information systems.
This activity concerns the management of geographically distributed data within the international Earth System Grid Federation. It includes the leadership of the Dashboard Working Team and collaboration with LLNL (Lawrence Livermore National Laboratory) and the ENES community to collect and validate the requirements for, and the development of, data usage metrics in the federation.
– End-to-end workflow automation and management.
The main aim of this activity is the development of fault-tolerant workflow automation tools and applications for the optimized scheduling of large numbers of tasks on HPC infrastructures. In particular, specific analytics workflows are designed to support weather and climate use cases, also using AI-enabled end-to-end approaches to support numerical simulations.
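One common ingredient of fault tolerance in workflow tools is retrying tasks that fail transiently. The minimal Python sketch below illustrates only that idea, running tasks sequentially; it is not the unit's actual tooling, and the names `run_with_retries` and `make_flaky`, as well as the task names, are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def run_with_retries(
    tasks: Dict[str, Callable[[], object]], max_attempts: int = 3
) -> Tuple[Dict[str, object], List[str]]:
    """Run each task, retrying on failure up to max_attempts times."""
    results: Dict[str, object] = {}
    failed: List[str] = []
    for name, task in tasks.items():
        for attempt in range(1, max_attempts + 1):
            try:
                results[name] = task()
                break  # task succeeded, move on to the next one
            except Exception:
                if attempt == max_attempts:
                    failed.append(name)  # give up after the last attempt
    return results, failed


def make_flaky(fail_times: int, value: object) -> Callable[[], object]:
    """Build a task that fails fail_times times, then returns value."""
    state = {"calls": 0}

    def task() -> object:
        state["calls"] += 1
        if state["calls"] <= fail_times:
            raise RuntimeError("simulated transient failure")
        return value

    return task


tasks = {
    "regrid": make_flaky(2, "ok"),     # succeeds on the third attempt
    "stats": make_flaky(0, 42),        # succeeds immediately
    "broken": make_flaky(5, None),     # never succeeds within 3 attempts
}
results, failed = run_with_retries(tasks, max_attempts=3)
print(results, failed)
```

A production workflow manager would typically add checkpointing, dependency tracking and parallel scheduling on top of this kind of retry logic.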
High Performance Data Mining & Analytics for eScience
Ophidia is a CMCC Foundation research project addressing big data challenges for eScience. It supports data-intensive analysis through advanced parallel computing techniques and smart data distribution methods, using an array-based storage model and a hierarchical storage organisation to partition and distribute multidimensional scientific datasets over multiple nodes. The Ophidia analytics framework can be used in different scientific domains (e.g. Climate Change, Earth Sciences, Life Sciences) and with very heterogeneous sets of data.
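The general idea of partitioning a dataset into fragments and distributing them over nodes can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Ophidia's implementation or API: the functions `partition_rows` and `assign_round_robin`, the contiguous-range fragmentation and the round-robin placement are all hypothetical choices made for the example.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # half-open [start, stop) range of row indices


def partition_rows(n_rows: int, n_fragments: int) -> List[Range]:
    """Split n_rows into n_fragments contiguous ranges of near-equal size."""
    base, extra = divmod(n_rows, n_fragments)
    ranges: List[Range] = []
    start = 0
    for i in range(n_fragments):
        # The first `extra` fragments take one additional row each.
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def assign_round_robin(ranges: List[Range], n_nodes: int) -> Dict[int, List[Range]]:
    """Place fragments on nodes in round-robin order."""
    placement: Dict[int, List[Range]] = {node: [] for node in range(n_nodes)}
    for i, fragment in enumerate(ranges):
        placement[i % n_nodes].append(fragment)
    return placement


# Hypothetical dataset: 10 rows (e.g. grid points), each holding an array
# of values (e.g. a time series), split into 4 fragments over 2 nodes.
fragments = partition_rows(10, 4)
placement = assign_round_robin(fragments, 2)
print(fragments)
print(placement)
```

Each node can then process its own fragments independently, which is what makes this kind of layout attractive for parallel analysis of multidimensional data.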