Research Unit Leader:
Objective
The main goal of this macro-activity is the design and implementation of open source solutions for efficient access, analysis and mining of scientific data in the climate change domain. In particular, the activities focus on three areas: the management of geographically distributed data in the context of the international ESGF (Earth System Grid Federation) and CMIP5 initiatives; the management of scientific data banks, to identify novel storage models and efficient parallel I/O libraries; and knowledge discovery from data, that is, inferring new knowledge from large volumes of scientific data.
Activities
- Distributed scientific data management in large scale environments
The main goal of this activity is the transparent, secure and efficient distributed management of large volumes of data on a geographical scale. In particular, the activity focuses on the management of distributed data in the context of the ESGF (Earth System Grid Federation) initiative. In this regard, part of the work concerns relevant extensions to the data node component, related to the proactive distributed monitoring system foreseen in the ESGF P2P system architecture.
- Storage models and parallel I/O applied to scientific data
This activity studies, analyzes and designs novel storage models for scientific data in the climate change context, with special regard to the NetCDF format. By defining these new storage models for the management of climate change data (to be implemented on HPC platforms by adopting parallel paradigms such as MPI and OpenMP), this research activity aims to optimize both the efficiency of data access (through new I/O primitives) and the allocation of storage space.
- Knowledge Discovery from Data (KDD) applied to scientific data
This activity aims at inferring knowledge from large volumes of data. Starting from the access primitives defined in the activity “Storage models and parallel I/O applied to scientific data”, it defines and implements new interfaces (“data operators”) to carry out analysis and mining of multidimensional data in the climate change context. The design of the KDD platform takes into account the evolution of the ESGF architecture, in order to study convergences and possible integrations.
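To make the relationship between a storage model, an access primitive and a data operator more concrete, the following is a minimal, purely illustrative sketch. It is not the project's implementation and it does not use the NetCDF API: the class `ChunkedArray`, the primitive `read_slab` and the operator `time_mean` are hypothetical names, the array is a small in-memory (time, lat, lon) grid, and everything runs serially in plain Python rather than with MPI or OpenMP. It only shows how the choice of chunk shape changes the number of chunks a time-series read has to touch, and how an operator can be layered on the read primitive.

```python
from itertools import product


class ChunkedArray:
    """Hypothetical multidimensional array stored as fixed-shape chunks."""

    def __init__(self, shape, chunk_shape):
        self.shape = tuple(shape)
        self.chunk_shape = tuple(chunk_shape)
        self.chunks = {}  # chunk id -> {local index -> value}

    def _locate(self, index):
        # Split a global index into (chunk id, index inside that chunk).
        chunk_id = tuple(i // c for i, c in zip(index, self.chunk_shape))
        local = tuple(i % c for i, c in zip(index, self.chunk_shape))
        return chunk_id, local

    def write(self, index, value):
        chunk_id, local = self._locate(index)
        self.chunks.setdefault(chunk_id, {})[local] = value

    def chunks_for_slab(self, start, count):
        # Ids of all chunks intersected by the hyperslab [start, start + count).
        ranges = [
            range(s // c, (s + n - 1) // c + 1)
            for s, n, c in zip(start, count, self.chunk_shape)
        ]
        return list(product(*ranges))

    def read_slab(self, start, count):
        # Access primitive: values of the hyperslab in row-major order.
        values = []
        for index in product(*(range(s, s + n) for s, n in zip(start, count))):
            chunk_id, local = self._locate(index)
            values.append(self.chunks[chunk_id][local])
        return values


def time_mean(array, lat, lon):
    """Hypothetical data operator: mean over time at one grid point."""
    n_time = array.shape[0]
    series = array.read_slab((0, lat, lon), (n_time, 1, 1))
    return sum(series) / len(series)


def build_demo(chunk_shape, shape=(8, 4, 4)):
    # Fill a small grid with synthetic values: value = t + 10 * lat + lon.
    array = ChunkedArray(shape, chunk_shape)
    for t, lat, lon in product(*(range(n) for n in shape)):
        array.write((t, lat, lon), float(t + 10 * lat + lon))
    return array


if __name__ == "__main__":
    for chunk_shape in [(1, 4, 4), (8, 1, 1)]:
        array = build_demo(chunk_shape)
        touched = array.chunks_for_slab((0, 2, 3), (8, 1, 1))
        print(chunk_shape, "chunks touched:", len(touched),
              "time mean:", time_mean(array, 2, 3))
```

In this toy setting, one chunk per time step means a time-series read at a single grid point touches every chunk, whereas chunks aligned with the time axis serve the same read from a single chunk. The operator gives the same result under either layout, since it depends only on the read primitive.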