Reproducibility

Definition, concepts & tools

October 2024

Nicolas Casajus

Senior data scientist
@FRB-CESAB

What is reproducibility?


Reproducibility means that someone else (or you in the future) can obtain the same results given the same data and the same code. This is a technical problem.


Here we are talking about computational reproducibility.
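The definition above can be made concrete with a small check: run the same analysis twice on the same data with the same code, and compare the outputs exactly. Below is a minimal Python sketch that assumes a toy bootstrap analysis. `analysis` and `fingerprint` are hypothetical helpers and are not part of any tool presented here. Fixing the random seed is what makes the stochastic step repeatable.

```python
import hashlib
import json
import random


def analysis(data, seed):
    """Toy analysis (hypothetical): bootstrap mean of `data` with a fixed seed."""
    rng = random.Random(seed)  # local generator, isolated from global random state
    means = []
    for _ in range(1000):
        sample = [rng.choice(data) for _ in data]  # resample with replacement
        means.append(sum(sample) / len(sample))
    return {"boot_mean": round(sum(means) / len(means), 6)}


def fingerprint(result):
    """Hash a JSON-serialised result so two runs can be compared exactly."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3]

run_1 = fingerprint(analysis(data, seed=42))  # original run
run_2 = fingerprint(analysis(data, seed=42))  # "someone else": same data, same code
print(run_1 == run_2)
```

Getting identical results on another machine also depends on the software environment, such as the Python version. For that reason, the environment is part of what needs to be shared.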

Why does it matter?

An article about computational results is advertising, not scholarship. The actual scholarship is the full software environment, code and data, that produced the result.

Claerbout & Karrenbach (1992)


Reproducibility has the potential to serve as a minimum standard for judging scientific claims (…).

Peng (2011)

Sharing the code and the data is now a prerequisite for publishing in many journals.

Reproducibility spectrum


Figure: the reproducibility spectrum. Source: Peng (2011)


Each degree of reproducibility requires additional skills and time. Some of these skills (e.g. literate programming, version control, setting up environments) pay off in the long run, but they can require a high up-front investment.
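"Setting up environments" starts with a simpler step: recording which interpreter and package versions produced a result. Below is a minimal Python sketch that uses only the standard library (`platform`, `importlib.metadata`). `environment_snapshot` is a hypothetical helper. Its `name==version` lines resemble the output of `pip freeze` but do not reproduce it exactly. Dedicated tools such as virtual environments, lock files, and containers go further: they let someone recreate the environment, not just describe it.

```python
import platform
from importlib import metadata


def environment_snapshot():
    """Return interpreter, OS and installed-package versions as text lines."""
    lines = [
        f"# python {platform.python_version()}",
        f"# platform {platform.platform()}",
    ]
    packages = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that have no name
            packages.add((name, dist.version or "unknown"))
    # One "name==version" pin per line, sorted for stable, diff-friendly output
    lines.extend(f"{name}=={version}" for name, version in sorted(packages))
    return lines


snapshot = environment_snapshot()
print("\n".join(snapshot[:2]))  # header lines; the full list could be saved next to the results
```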

Concepts

According to Wilson et al. (2017), good practices for better reproducibility can be organized into the following six topics:

Data management
Project organization
Tracking changes
Collaboration
Manuscript
Code & software

Tools





Website available at: https://rdatatoolbox.github.io/