The Earth System Grid Federation (ESGF) peer-to-peer (P2P) enterprise system is one of the largest collaborative data efforts ever undertaken in Earth system science. It develops, deploys, and maintains software to facilitate advancements in geophysical science.
With its collection of independently funded national and international projects, ESGF manages the first-ever decentralized database for accessing geophysical data at dozens of federated sites. The ESGF archive currently manages over 7 petabytes of Earth system science datasets from about 40 projects and supports over 1,200,000 datasets from laboratories and universities worldwide.
The federation works across multiple worldwide data centers and spans several international network organizations to provide users with the ability to access, analyze and visualize data through a globally federated collection of networks, computers, and software. Its architecture employs a series of geographically distributed peer nodes that are independently administered and united by common federation protocols and application programming interfaces (APIs). The full ESGF infrastructure has been adopted by multiple Earth science projects and allows access to petabytes of geophysical data. These projects include the Coupled Model Intercomparison Project (CMIP), whose output is to be used in the upcoming Intergovernmental Panel on Climate Change’s (IPCC) Assessment Reports; multiple model intercomparison projects (MIPs) endorsed by the World Climate Research Programme (WCRP); and the Accelerated Climate Modeling for Energy (ACME) project, which leverages ESGF in its overarching workflow process to store model output. ESGF is a successful example of integrating disparate open-source technologies into a cohesive functional system that serves the needs of the global climate science community.
With the advent of server-side computing to reduce the amount of data transmitted, a central question emerged: how does ESGF capture metrics that reveal the true worth of its infrastructure and of the data and projects it supports? The ESGF Dashboard automatically captures several metrics that are useful by today's standards. It provides a distributed, scalable monitoring framework that collects usage metrics both at the single-site level and at the global ESGF level. The Dashboard collects and stores a high volume of heterogeneous metrics, covering aggregated cross-project and project-specific download statistics as well as the status of the federated archive in terms of published datasets, models, and institutes involved.
The system offers an analytics web interface with a set of simple graphical widgets (e.g. charts, maps, reports) that give users a comprehensive view of data usage statistics.
The ESGF Dashboard UI is deployed on a collector node at the CMCC Supercomputing Center, where it is reachable online.
More metrics may be added as projects determine which measurements are needed for their community, sponsors, and stakeholders.
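The two collection levels described above (single site and whole federation), together with the cross-project and project-specific download statistics, can be pictured with a small roll-up. The Python sketch below is only an illustration under stated assumptions: the `DownloadRecord` schema, the `aggregate` function, and the node names are hypothetical and are not taken from the esgf-dashboard code base.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DownloadRecord:
    """One download event as a data node might log it (hypothetical schema)."""
    node: str        # data node that served the file
    project: str     # project the dataset belongs to, e.g. "CMIP6"
    size_bytes: int  # size of the downloaded file


def aggregate(records):
    """Roll download records up to node, project, and federation level."""
    per_node = defaultdict(lambda: {"downloads": 0, "bytes": 0})
    per_project = defaultdict(lambda: {"downloads": 0, "bytes": 0})
    federation = {"downloads": 0, "bytes": 0}
    for rec in records:
        # Each record counts once at every level of the hierarchy.
        for bucket in (per_node[rec.node], per_project[rec.project], federation):
            bucket["downloads"] += 1
            bucket["bytes"] += rec.size_bytes
    return dict(per_node), dict(per_project), federation


# Made-up sample data, for illustration only.
records = [
    DownloadRecord("node-a", "CMIP6", 2_000),
    DownloadRecord("node-a", "CMIP5", 500),
    DownloadRecord("node-b", "CMIP6", 1_500),
]
per_node, per_project, federation = aggregate(records)
```

In this sketch, `per_node` corresponds to the single-site view, `per_project` to the project-specific statistics, and `federation` to the cross-project total. A real deployment would presumably gather the per-node figures remotely rather than from one in-memory list.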
CMIP Data Node Operation Team (CDNOT) activities
The Coupled Model Intercomparison Project (CMIP), which completed phase 5 a few years ago and has since entered phase 6, is expected to produce around 20 petabytes of scientific data for analysis by climate scientists.
To face this challenge, starting in January 2018, the Dashboard team took part in the CDNOT activities, a sequence of data challenges aimed at verifying the operational readiness of the federation and providing clear, concise, and well-exercised instructions for CMIP6 data providers.
The CDNOT involved a restricted number of international institutes to make sure that every participant was running an up-to-date, secure, and homogeneous software stack. As part of the Dashboard activities, the team supported the node administrators in testing the esgf-dashboard and esgf-stats-api modules, to ensure that information was collected appropriately and that usage statistics on CMIP6 data were accurate at both the single-node and the federation level.
Team Award, San Francisco 2017
Every year, the ESGF community recognizes those who have performed exceptional or outstanding work in developing community tools that accelerate climate science in the ESGF data science domain.
During the 2017 Face-to-Face meeting in San Francisco, “Alessandra Nuzzo, Maria Mirto, Paola Nassisi, and Sandro Fiore (CMCC, Italy) won a Team Award for developing the new ESGF Dashboard. […] The CMCC group addressed key challenges such as communicating the most important information in a straightforward way and allowing different users to view specific details simultaneously. Without their critical work in displaying automatic real-time data usage, scientists would have no clear way to determine the importance of their projects’ data contributions to the community.” (from the Face-to-Face Conference Report)