Biodiversity data: Data storage

Hot versus cold data storage

November 2024

Yvan Le Bras
Animation coordinator
@PNDB
@DataTerra

Olivier Norvez
Data engineer
@GBIF-France

Nicolas Casajus
Data scientist
@FRB-CESAB

Table of contents

  • Data storage: reminder of the context and the issues
  • Data storage: hot versus cold storage
  • Data storage: resources

Data storage: reminder of the context and the issues

  • Heterogeneity (data types, origins, standards) and diversity of “objects” to be linked together
  • Loss of information over time
  • Toward better open science and reproducibility

Data storage: hot versus cold data storage

Table 1: Differences and complementarities between hot and cold data storage. Source: GeeksforGeeks

| Aspect | Hot data | Cold data |
|---|---|---|
| Access frequency | Frequently accessed | Infrequently accessed |
| Storage type | High-performance storage (e.g., SSDs, in-memory databases) | Cost-effective storage (e.g., HDDs, magnetic tapes, cloud archives) |
| Retrieval speed | Fast retrieval required | Slower retrieval acceptable |
| Use case | Real-time transactions, active user sessions, real-time analytics | Historical records, backups, archived files |
| Cost | Generally higher due to performance requirements | Generally lower due to slower access speeds |
| Data management | Requires optimization for speed and efficiency | Focused on long-term storage and cost-efficiency |
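As a rough illustration of the "access frequency" row, a dataset can be assigned to a tier from the time elapsed since it was last accessed. This is only a minimal sketch: the `Dataset` class, the `storage_tier` function, the dataset names and the 180-day threshold are all hypothetical, and recency of access is used here as a simple proxy for access frequency.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical threshold: datasets left untouched for more than this many
# days are treated as "cold". The value is an assumption, not a standard.
COLD_AFTER_DAYS = 180


@dataclass
class Dataset:
    name: str
    last_accessed: date


def storage_tier(dataset: Dataset, today: date,
                 cold_after_days: int = COLD_AFTER_DAYS) -> str:
    """Return "hot" or "cold" from the number of days since the last access."""
    idle_days = (today - dataset.last_accessed).days
    return "cold" if idle_days > cold_after_days else "hot"


# Illustrative datasets (names and dates are made up for the example).
today = date(2024, 11, 1)
datasets = [
    Dataset("occurrences_2024", date(2024, 10, 28)),
    Dataset("survey_archive_1998", date(2021, 3, 15)),
]

tiers = {d.name: storage_tier(d, today) for d in datasets}
print(tiers)
```

In practice the threshold would depend on the costs and retrieval delays of the storage actually available, so it is left as a parameter.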

Data storage: carbon impact

“How much energy is used in saving to the cloud? That’s a complicated question. It takes energy to get the data to the data center—miles of fiber optic cables, studded with other fixtures of internet infrastructure that all require power along the way. At the center, your data is stored multiple times on hard disks, and the constant activity of all those disks creates a lot of heat, which necessitates energy-intensive air conditioners to protect the equipment from overheating.” Question asked by Mark Williams of Cambridge, U.K., to Justin Adamson, SAGE (Sound Advice for a Green Earth). Source: Stanford Magazine

“Various studies estimate them [digital CO₂ emissions] to be between 2.3 – 3.7 percent of global CO₂ emissions, which is equivalent to the emissions of the entire aviation industry.” Source: myclimate.org

“The ecological dynamics we find ourselves in are not entirely a consequence of design limits, but of human practices and choices — among individuals, communities, corporations, and governments — combined with a deficit of will and imagination to bring about a sustainable Cloud. The Cloud is both cultural and technological. Like any aspect of culture, the Cloud’s trajectory — and its ecological impacts — are not predetermined or unchangeable. Like any aspect of culture, they are mutable.” Steven Gonzalez Monserrate, anthropologist and PhD candidate at MIT. Source: The MIT Press Reader

Data storage: carbon impact

  • WP2: Infrastructure
    • Carbon footprint

Ideally: bring the computation to the (meta)data, rather than moving the (meta)data to the computation.
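The idea can be sketched by comparing how much has to be transferred in each case. The example below is illustrative only: the records are synthetic, the `summarize` function is hypothetical, and the size of a JSON serialization is used as a crude stand-in for network transfer.

```python
import json

# Synthetic records standing in for a dataset held on a remote server.
records = [{"species": f"sp_{i % 50}", "count": i % 7} for i in range(10_000)]


def summarize(rows):
    """Computation that could run next to the data: total count per species."""
    totals = {}
    for row in rows:
        totals[row["species"]] = totals.get(row["species"], 0) + row["count"]
    return totals


# Option A: move the data to the computation (every record is transferred).
bytes_moving_data = len(json.dumps(records).encode("utf-8"))

# Option B: move the computation to the data (only the summary is transferred).
summary = summarize(records)
bytes_moving_result = len(json.dumps(summary).encode("utf-8"))

print(f"moving the data:   {bytes_moving_data} bytes")
print(f"moving the result: {bytes_moving_result} bytes")
```

The gap between the two figures grows with the size of the dataset, while the summary stays bounded by the number of species.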

Data storage: resources