Biodiversity data: Data storage

Hot versus cold data storage

November 2025

Yvan Le Bras, Animation coordinator, @PNDB, @DataTerra

Olivier Norvez, Data engineer, @GBIF-France

Nicolas Casajus, Data scientist, @FRB-CESAB

Data lifecycle

Table of contents

  • Reminder of the context and the issues
  • Hot versus cold storage
  • Carbon impact
  • Resources

Data storage: reminder of the context and the issues

  • Heterogeneity (data types, origins, standards) and diversity of “objects” to be linked together [1]
  • Loss of information over time [2]
  • Toward better open science and reproducibility [3] [4]

Data storage: hot versus cold data storage

Table 1: Differences and complementarities between hot and cold data storage. Source: GeeksforGeeks

| Aspect | Hot data | Cold data |
|---|---|---|
| Access frequency | Frequently accessed | Infrequently accessed |
| Storage type | High-performance storage (e.g., SSDs, in-memory databases) | Cost-effective storage (e.g., HDDs, magnetic tapes, cloud archives) |
| Retrieval speed | Fast retrieval required | Slower retrieval acceptable |
| Use case | Real-time transactions, active user sessions, real-time analytics | Historical records, backups, archived files |
| Cost | Generally higher due to performance requirements | Generally lower due to slower access speeds |
| Data management | Requires optimization for speed and efficiency | Focused on long-term storage and cost-efficiency |
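As a rough illustration of the "access frequency" row, the sketch below assigns a dataset to a hot or cold tier. The `Dataset` class, the `storage_tier` function and both thresholds are hypothetical and chosen only for the example; real tiering policies depend on the infrastructure and on the project's needs.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Dataset:
    name: str
    reads_per_month: int  # how often the data is accessed
    last_access: date     # date of the most recent access


# Hypothetical thresholds, for illustration only
HOT_MIN_READS_PER_MONTH = 10
COLD_AFTER_IDLE_DAYS = 180


def storage_tier(dataset: Dataset, today: date) -> str:
    """Return 'hot' for frequently and recently accessed data, 'cold' otherwise."""
    idle_days = (today - dataset.last_access).days
    is_frequent = dataset.reads_per_month >= HOT_MIN_READS_PER_MONTH
    is_recent = idle_days < COLD_AFTER_IDLE_DAYS
    return "hot" if is_frequent and is_recent else "cold"
```

With these assumed thresholds, a dataset read daily during an ongoing analysis would land in the hot tier, while a field campaign archive untouched for several years would be a candidate for cold storage.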

Data storage: hot storage and computation

The GAIA Data project aims to develop and implement an integrated and distributed platform of services and data for the observation, modeling and understanding of the Earth system, biodiversity and the environment.

Carbon footprint: ideally, bring the computation closer to the {meta}data, not the other way around.
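To make the "bring the computation to the data" idea concrete, here is a minimal sketch comparing how many bytes would travel if the raw records were moved versus only an aggregated result. The records are made up, the function names are hypothetical, and JSON size is used only as a crude proxy for network transfer; it says nothing about actual energy use.

```python
import json


def count_by_species(records):
    """Aggregate occurrence records into a small per-species count."""
    counts = {}
    for record in records:
        counts[record["species"]] = counts.get(record["species"], 0) + 1
    return counts


def payload_bytes(obj):
    """Size of the JSON-encoded object, a rough proxy for data transferred."""
    return len(json.dumps(obj).encode("utf-8"))


# Made-up occurrence records standing in for a large remote dataset
records = [
    {
        "species": "Parus major" if i % 3 else "Erithacus rubecula",
        "latitude": 48.0 + i * 0.001,
        "longitude": -1.7,
    }
    for i in range(3000)
]

summary = count_by_species(records)  # computed where the data lives
moved_if_data_travels = payload_bytes(records)
moved_if_result_travels = payload_bytes(summary)
```

In this toy case the summary is a two-entry dictionary, so shipping the result instead of the records reduces the transferred payload by several orders of magnitude.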

Data storage: cold storage

Table 2: Examples of biodiversity and environmental data repositories, adapted from Ouvrir la science (2024)

| Repository name | Supported by | Type (thematic, institutional, generic) | Disciplinary fields | Accepted data (keywords) | Embargo | Persistent identifier | Volume limit |
|---|---|---|---|---|---|---|---|
| InDoRES | CNRS-Ecology, MNHN | thematic (and institutional) | Ecology, Environment, Bio-archaeology | Environmental, ecological and geographical data | yes | DOI | 2 GB per data set, planned to increase to 4 or 5 GB soon |
| EaSy Data | Data Terra, BRGM | thematic | Earth and Environmental Sciences | Long tail data from the Earth system and environment (example: data from projects) | yes (2 years max.) | DOI | 5 GB per file, 100 GB per deposit; larger volumes possible on request |
| SEANOE | Ifremer | thematic (and institutional) | Oceanography | Georeferenced marine data | yes (2 years max.) | DOI | 100 GB |
| Data SUD | IRD | institutional | all fields covered by IRD agents | ??? | ??? | DOI | ??? |
| GBIF | the international GBIF community | thematic | Life sciences, Biodiversity, Animal biology, Plant biology, Ecology, Environment, Ecosystems | Taxa, occurrence data, sampling data, all standardized according to the Darwin Core or ABCD standards | yes | DOI | no |
| Recherche Data Gouv | Recherche Data Gouv | generic | all fields | all | yes | DOI | ??? |
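The volume limits in the table above can be turned into a quick pre-deposit check. The sketch below is only illustrative: the `LIMITS` dictionary and function names are hypothetical, it covers only the three repositories whose limits are stated, it assumes the SEANOE figure applies to a whole deposit, and it uses decimal gigabytes because the table does not say which unit is meant. Always confirm current limits with the repository itself.

```python
GB = 10**9  # assumption: decimal gigabytes

# Limits copied from the table; None means no limit of that kind is stated there.
LIMITS = {
    "InDoRES": {"per_file": None, "per_deposit": 2 * GB},
    "EaSy Data": {"per_file": 5 * GB, "per_deposit": 100 * GB},
    "SEANOE": {"per_file": None, "per_deposit": 100 * GB},  # assumed per deposit
}


def fits(repository: str, file_sizes: list[int]) -> bool:
    """Check a list of file sizes (in bytes) against a repository's stated limits."""
    limits = LIMITS[repository]
    per_file = limits["per_file"]
    per_deposit = limits["per_deposit"]
    if per_file is not None and any(size > per_file for size in file_sizes):
        return False
    if per_deposit is not None and sum(file_sizes) > per_deposit:
        return False
    return True


def candidate_repositories(file_sizes: list[int]) -> list[str]:
    """Repositories from LIMITS whose stated volume limits the deposit respects."""
    return [name for name in LIMITS if fits(name, file_sizes)]
```

For example, `candidate_repositories([3 * GB, 4 * GB])` keeps EaSy Data and SEANOE but drops InDoRES, whose stated limit is 2 GB per data set. Volume is only one criterion; thematic scope, embargo policy and accepted data types from the same table matter just as much.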

Data storage: carbon impact

“How much energy is used in saving to the cloud? That’s a complicated question. It takes energy to get the data to the data center - miles of fiber optic cables, studded with other fixtures of internet infrastructure that all require power along the way. At the center, your data is stored multiple times on hard disks, and the constant activity of all those disks creates a lot of heat, which necessitates energy-intensive air conditioners to protect the equipment from overheating.” Question from Mark Williams of Cambridge, U.K., to Justin Adamson, SAGE (Sound Advice for a Green Earth). Source: Stanford Magazine

“Various studies estimate them [digital emissions] to be between 2.3 and 3.7 percent of global CO₂ emissions, which is equivalent to the emissions of the entire aviation industry.” Source: MyClimate.Org

“The ecological dynamics we find ourselves in are not entirely a consequence of design limits, but of human practices and choices - among individuals, communities, corporations, and governments - combined with a deficit of will and imagination to bring about a sustainable Cloud. The Cloud is both cultural and technological. Like any aspect of culture, the Cloud’s trajectory - and its ecological impacts - are not predetermined or unchangeable. Like any aspect of culture, they are mutable.” Steven Gonzalez Monserrate, anthropologist and PhD candidate at MIT. Source: The MIT Press Reader

Data storage: resources