Biodiversity data: Data Management Plan

Data Management Plan : a tool for working with data throughout the whole data life cycle

November 2024

Yvan Le Bras
Scientific and technical coordinator
@PNDB
@DataTerra

Olivier Norvez
Animation coordinator
@PNDB
@DataTerra

Table of contents

Data Management Plan : reminder of the context and the issues


Process


Resources

The Data Management Plan (DMP) : reminder of the context and the issues

Heterogeneity (data types, origin, standards) and diversity of “objects” to be linked together [1]

Loss of information over time [2]

The DMP : reminder of the context and the issues

Computational reproducibility frequently refers to the ability to generate equivalent analytical outcomes from the same data set using the same code and software [1].

[…] all raw data and metadata, code, programming scripts, and bespoke software necessary for fully replicating any analyses that lead to inferences made in a published study [2].

The DMP : process

The Data Management Plan (DMP) is a management tool. It takes the form of a document structured into sections (or steps). Its objective is to summarize the description and evolution of the “objects” (data sets, metadata, source codes, etc.) that your research project will generate over time.

It prepares the sharing, reuse and sustainability of the data. It is written at the beginning of the research project.

Adapted from DoRANum

The DMP : process

All projects funded by the ANR (or by the EU, …) must have a DMP (in French, PGD: plan de gestion de données).

A first version of the DMP is expected within 6 months of the scientific start of the project (T0+6 months). This version is then updated at mid-term (for projects longer than 30 months) and at the end of the project.

The DMP OPIDoR tool is aligned with the ANR.
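The schedule above can be turned into concrete dates. The sketch below is only an illustration: the function names are hypothetical, and "mid-term" is assumed to mean half of the project duration (rounded down to whole months), which the slide does not define.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` months after `start`, clamping the day to the month length."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def dmp_milestones(t0: date, duration_months: int) -> dict:
    """Expected DMP versions for a project starting at `t0` (hypothetical helper)."""
    milestones = {"first version": add_months(t0, 6)}
    if duration_months > 30:
        # Assumption: mid-term is taken as half the project duration.
        milestones["mid-term update"] = add_months(t0, duration_months // 2)
    milestones["final version"] = add_months(t0, duration_months)
    return milestones


if __name__ == "__main__":
    for label, due in dmp_milestones(date(2024, 11, 1), 48).items():
        print(f"{label}: {due.isoformat()}")
```

A 24-month project would only get the first and final versions, since the mid-term update applies to projects longer than 30 months.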

The DMP : process

A DMP must provide the following types of information :

  • High-level information (title of the DMP, date of modification, etc.) => Note that the DMP is a living document, with versions evolving over time from pre-project to post-project.

  • Research project description

  • Project funding description

  • DMP contact

  • DMP contributors

  • Data management costs

  • Datasets description

  • Datasets distribution

  • Data & metadata licences

  • Data hosting

  • Data sensitivity & security

  • Technical resources required for data production, storage and management

  • Metadata standard(s) used to describe the data
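The list above can serve as a completeness checklist for a draft DMP. The following is a minimal sketch, assuming the draft is held as a plain dictionary; the key names are hypothetical labels for the items listed above, not fields of DMP OPIDoR or of any official template.

```python
# Hypothetical keys, one per item of the list above.
REQUIRED_ITEMS = [
    "high_level_information",
    "project_description",
    "funding_description",
    "dmp_contact",
    "dmp_contributors",
    "data_management_costs",
    "datasets_description",
    "datasets_distribution",
    "licences",
    "data_hosting",
    "sensitivity_and_security",
    "technical_resources",
    "metadata_standards",
]


def missing_items(dmp: dict) -> list:
    """Return the required items that are absent or left empty in a draft DMP."""
    missing = []
    for key in REQUIRED_ITEMS:
        value = dmp.get(key)
        if value is None or not str(value).strip():
            missing.append(key)
    return missing


if __name__ == "__main__":
    draft = {"high_level_information": "My project DMP, v0.1", "dmp_contact": "  "}
    print(missing_items(draft))
```

Because the DMP is a living document, such a check can be rerun on each new version to see which items still need to be filled in.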

The DMP : process

General recommendations :

  • Define the licenses (for data in particular, and also for source code) as early as possible, ideally in the first months of the project: if this is left too long, it will be too late
    • We often see the statement “Creative Commons will be used”, which in fact says very little. It therefore seems essential to me to specify whether the license is CC-BY 4.0 (compatible with Etalab v2.0, and therefore open data) or another CC-BY-XX variant (therefore not open data), and above all not CC0, which is not authorized for research data
    • When a non-open license is proposed, I ask whether the choice is deliberate (which first requires being well informed about what is considered open and not open). When the choice is deliberate ;) I wonder whether we can highlight a public funding share above 50%, as a gentle reminder that openness of public data should normally be the rule
  • Specify deadlines / embargo periods as soon as possible whenever a sharing deadline is mentioned
  • Specify the proposed/expected storage “guarantees” (project duration/5/10/>10 years)

The DMP : process

INTEROPERABILITY

  • Specify, where the information exists and for each type of data, exactly which catalogs and/or repositories will be used
  • Present the links between the “in-house” standards mentioned and international standards

SOURCE CODE

  • For source code, if known, list the libraries / packages / dependencies and the access / usage modes
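For Python source code, one possible way to produce the list of dependencies is to read it from the environment in which the code runs. This is a minimal sketch using the standard library module `importlib.metadata`; the function name `installed_packages` is hypothetical, and the approach assumes the project is run from a dedicated environment.

```python
from importlib.metadata import distributions


def installed_packages() -> list:
    """Return (name, version) pairs for the packages installed in the current environment."""
    packages = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        version = dist.version
        if name and version:  # skip broken installs lacking a name or version
            packages.add((name, version))
    return sorted(packages, key=lambda item: item[0].lower())


if __name__ == "__main__":
    # Print one "name==version" line per package, ready to paste into a DMP annex.
    for name, version in installed_packages():
        print(f"{name}=={version}")
```

The output lists everything installed, not only what the code imports, so it is best treated as a starting point to be trimmed by hand.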

PEOPLE

  • Specify the role and organization of each person mentioned (if possible with their ORCID)
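ORCID iDs entered in a DMP can be checked for typing errors, because the last character is a check digit computed with the ISO 7064 MOD 11-2 algorithm. The sketch below only validates the format and the checksum; it does not confirm that the iD is registered or that it belongs to the person named. The function names are hypothetical.

```python
import re

ORCID_PATTERN = re.compile(r"(\d{4}-){3}\d{3}[\dX]", flags=re.ASCII)


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for char in base_digits:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if `orcid` looks like 0000-0000-0000-0000 and its checksum matches."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_digit(compact[:15]) == compact[15]


if __name__ == "__main__":
    print(is_valid_orcid("0000-0002-1825-0097"))  # a commonly cited sample iD
    print(is_valid_orcid("0000-0002-1825-0098"))  # same iD with a wrong last digit
```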

The DMP : minimal information from the “Science Europe DMP template”

6 descriptive sections, according to the ANR template

Data description and collection, or re-use of existing data

Documentation and data quality

Storage and backup during the research process

Legal and ethical requirements

Data sharing and long-term preservation

Data management responsibilities and resources

The DMP : minimal information from the “Science Europe DMP template”

Data description and collection, or re-use of existing data

  • How will new data be collected or produced and/or how will existing data be re-used? (methodology, software, data sources …)

  • What data (for example the kind, formats, and volumes) will be collected or produced? (databases, spreadsheets, text, images, sounds, …)
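When existing data are re-used or a pilot data set is already available, the question about formats and volumes can be answered from the files themselves. This is a minimal sketch, assuming the data sit in a single local directory tree; the function name `summarize_formats` and the `"(no extension)"` label are hypothetical choices.

```python
from collections import defaultdict
from pathlib import Path


def summarize_formats(root: str) -> dict:
    """Count files and total bytes per file extension under `root`."""
    summary = defaultdict(lambda: {"files": 0, "bytes": 0})
    for path in Path(root).rglob("*"):
        if path.is_file():
            extension = path.suffix.lower() or "(no extension)"
            summary[extension]["files"] += 1
            summary[extension]["bytes"] += path.stat().st_size
    return dict(summary)


if __name__ == "__main__":
    # Summarize the current directory as an example.
    for extension, stats in sorted(summarize_formats(".").items()):
        print(f"{extension}: {stats['files']} file(s), {stats['bytes']} bytes")
```

File extensions are only a rough proxy for formats, so the result should be reviewed before it goes into the DMP.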

From ANR template

The DMP : minimal information from the “Science Europe DMP template”

Documentation and data quality (process, calibration, standardization, …)

From ANR template

The DMP : minimal information from the “Science Europe DMP template”

Storage and backup during the research process

  • How will data and metadata be stored and backed up during the research? (automatic backup, external storage, cloud, frequency, …)

  • How will data security and protection of sensitive data be taken care of during the research?
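One way to document that backups are reliable is to verify each copy against the original with a checksum. The sketch below uses SHA-256 from the Python standard library; the function names are hypothetical, and the choice of SHA-256 is an assumption rather than a requirement of the template.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original: str, backup: str) -> bool:
    """Return True if the backup copy has the same content as the original file."""
    return sha256_of(original) == sha256_of(backup)
```

Storing the digests alongside the data also lets later checks detect silent corruption, even when the original file is no longer at hand.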

From ANR template

The DMP : minimal information from the “Science Europe DMP template”

Legal and ethical requirements

  • If personal data are processed, how will compliance with legislation on personal data and on security be ensured? (RGPD / GDPR, anonymisation, pseudonymisation, …)

  • How will other legal issues, such as intellectual property rights and ownership, be managed? What legislation is applicable? (Open Data, restriction, data owner(s), …)

  • What ethical issues and codes of conduct are there, and how will they be taken into account? (CARE principles, ethical council, …)
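As an illustration of the pseudonymisation mentioned above, the sketch below replaces a direct identifier with a keyed hash (HMAC-SHA256). This is one possible technique among others, not a method prescribed by the template; the function name and the example key are hypothetical, and in practice the key must be stored separately from the data.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for `identifier`, derived with HMAC-SHA256."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    key = b"replace-with-a-long-random-secret"  # hypothetical key, for illustration only
    print(pseudonymise("observer-042", key))
```

The same identifier always maps to the same pseudonym, so records can still be linked. This is pseudonymisation, not anonymisation: whoever holds the key can re-link records, so the output remains personal data under the GDPR.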

From ANR template

The DMP : minimal information from the “Science Europe DMP template”

Data sharing and long-term preservation

  • How and when will data be shared? (data access, restrictions, embargo, archives, …)

  • How will data for preservation be selected, and where will data be preserved long-term? (repository, metadata standards, data repository or archive …)

  • What methods or software tools are needed to access and use data? (request handle, …)

  • How will the application of a unique and persistent identifier (such as a Digital Object Identifier (DOI)) to each dataset be ensured? (PID, …)

From ANR template

The DMP : minimal information from the “Science Europe DMP template”

Data management responsibilities and resources

  • Who (for example role, position, and institution) will be responsible for data management? (DMP responsible, data stewardship, …)

  • What resources will be dedicated to data management and ensuring that data will be FAIR? (data curation, data storage, financial and time cost, …)

From ANR template

The DMP : process

DMP OPIDoR is an online tool for creating data management plans (DMPs), made available to the higher education and research community. It is hosted and managed by Inist-CNRS. Based on the open source code DMPRoadmap, it has been adapted to the needs of the French scientific community.

Customizable : it supports the implementation of the data policies of institutions or disciplinary communities by adding adapted DMP templates, and it facilitates the harmonization of good community practices. => Coming soon : an ecoinformatics DMP template ! ;)

Machine-actionable : it facilitates data entry and interactions with the services involved in data management.

Templates : search for the templates of your funder and/or your institution.

From DMP OPIDoR

The DMP : process

Before writing your DMP, check the existing DMP templates !

Table 1: Examples of DMP templates, from DMP OPIDoR

Model name | Name of the organization | Type of organization | Download
Science Europe machine actionable DMP | Europe | Funders | PDF
ANR - DMP template | Agence Nationale de la Recherche | Funders | Word, PDF
AgroParisTech - DMP template for Project | AgroParisTech - Institut des sciences et industries du vivant et de l’environnement | Institution | Word, PDF
CIRAD-TEMPLATE-ENG | CIRAD | Institution | Word, PDF
Horizon 2020 FAIR DMP | European Commission | Funders | Word, PDF
ERC DMP | European Research Council, ERC | Funders | Word, PDF
INRAE - Project template | INRAE - Institut national de recherche pour l’agriculture l’alimentation et l’environnement | Institution | Word, PDF
Sorbonne Université - Modèle Plateforme (in French) | Sorbonne Université | Institution | Word, PDF
University of Montpellier - DMP template (english) 2 | Université de Montpellier | Institution | Word, PDF

Adapted from the DMP OPIDoR template sections

The DMP : resources
