Sharing code & tools

Research compendium, R Package & Shiny App

October 2024

Nicolas Casajus

Senior data scientist
@FRB-CESAB    

Table of contents

Introduction

Research compendium

R package



Shiny App

  Introduction

Introduction

  • You released a new database (with metadata)
  • You published a data paper
  • And you wrote a lot of code…

  Why not share your code?

Share your code to reproduce your pipeline

Share your code to access data

Share your code to add new data

Code hosting platforms

GitHub and similar platforms are cloud-based hosting services for Git repositories

  Well suited to hosting projects (code) tracked with Git


Services

  • Full integration of version control (commits, history, differences)
  • Easy collaboration with branches, forks, and pull requests
  • Issue tracking system
  • Enhanced documentation rendering (README, Wiki)
  • Static website hosting
  • Automation & monitoring (CI/CD)



Main platforms


  • GitHub (Microsoft)
  • GitLab (open source)

GitHub account: https://github.com/ahasverus/
GitHub organization: https://github.com/frbcesab/

  Research compendium

Research compendium

The goal of a research compendium is to provide a standard and easily recognisable way for organizing the digital materials of a project to enable others to inspect, reproduce, and extend the research.

Marwick B, Boettiger C & Mullen L (2018)



Three generic principles


  • Files organized according to the conventions of the community
  • Clear separation of data, method, and output
  • Specification of the computational environment that was used

 A research compendium should be self-contained

Research compendium

 Strong flexibility in the structure of a compendium

Small compendium

.
├─ .git/
├─ .gitignore
│
├─ project.Rproj
│ 
├─ data/ 🔒
│ 
├─ code/
│  └─ script.R
│ 
├─ outputs/
│ 
├─ LICENSE
└─ README.md

Medium compendium

.
├─ .git/
├─ .gitignore
│
├─ project.Rproj
│
├─ data/
│  ├─ raw-data/ 🔒
│  └─ derived-data/
│
├─ R/
│  ├─ function-x.R
│  └─ function-y.R
│
├─ analyses/
│  ├─ script-1.R
│  └─ script-n.R
│
├─ outputs/
│
├─ make.R
│
├─ DESCRIPTION
├─ LICENSE
└─ README.md

Large compendium

.
├─ .git/
├─ .gitignore
├─ .github/
│  └─ workflows/
│     ├─ workflow-1.yaml
│     └─ workflow-n.yaml
│
├─ project.Rproj
│
├─ renv/
├─ renv.lock
│
├─ Dockerfile
├─ .dockerignore
│
├─ data/
│  ├─ raw-data/ 🔒
│  └─ derived-data/
│
├─ R/
│  ├─ function-x.R
│  └─ function-y.R
│
├─ analyses/
│  ├─ script-1.R
│  └─ script-n.R
│
├─ outputs/
│
├─ figures/
│
├─ paper/
│  ├─ references.bib
│  ├─ style.csl
│  └─ paper.Rmd
│
├─ make.R
│
├─ DESCRIPTION
├─ CITATION.cff
├─ CODE_OF_CONDUCT.md
├─ CONTRIBUTING.md
├─ LICENSE
└─ README.md
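
As a rough illustration, the skeleton of a medium compendium could be created from R with base functions only. This is a minimal sketch: `create_compendium()` is a hypothetical helper written for this example (not a function from an existing package), and the folder and file names simply follow the tree shown above.

```r
## Create the skeleton of a medium compendium (illustrative sketch) ----

# 'create_compendium' is a hypothetical helper, not part of any package
create_compendium <- function(path) {

  # Folders taken from the 'Medium compendium' tree
  folders <- c("data/raw-data", "data/derived-data", "R", "analyses",
               "outputs")

  for (folder in folders) {
    dir.create(file.path(path, folder), recursive = TRUE,
               showWarnings = FALSE)
  }

  # Top-level files (created empty, and only if they do not exist yet,
  # because file.create() would truncate existing files)
  files     <- file.path(path, c("make.R", "DESCRIPTION", "LICENSE",
                                 "README.md"))
  new_files <- files[!file.exists(files)]

  if (length(new_files) > 0) {
    file.create(new_files)
  }

  invisible(path)
}


## Usage (in a temporary directory) ----

project_dir <- file.path(tempdir(), "my-compendium")

create_compendium(project_dir)

list.files(project_dir, recursive = TRUE, include.dirs = TRUE)
```

The helper never overwrites an existing file, so it can be run again on a project that already contains some of these files.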

README please

A README is a text file that introduces and explains your project

  • each research compendium should contain a README
  • you can write several READMEs (project, data, etc.)

 GitHub and other code hosting platforms recognize and render READMEs written in Markdown (README.md)

A good README should answer the following questions:

  • Why should I use it?
  • How do I get it?
  • How do I use it?

Main sections (for a research compendium)

  • Title
  • Description
  • Content (file organization)
  • Prerequisites
  • Installation
  • Usage
  • License
  • Citation
  • Acknowledgements
  • References
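
One possible way to start from the section list above is to generate an empty README.md skeleton from R. This is only a sketch: the placeholder text and the choice to write the file in a temporary directory are assumptions made for the example.

```r
## Write a README.md skeleton (illustrative sketch) ----

# Section names taken from the list above (the title is handled separately)
readme_sections <- c("Description", "Content", "Prerequisites",
                     "Installation", "Usage", "License", "Citation",
                     "Acknowledgements", "References")

# One level-2 Markdown heading per section, followed by a placeholder
section_lines <- unlist(lapply(readme_sections, function(section) {
  c(paste("##", section), "", "_To be completed._", "")
}))

readme_lines <- c("# Project title", "", section_lines)

# Written to a temporary directory here; use the project root in practice
readme_path <- file.path(tempdir(), "README.md")

writeLines(readme_lines, con = readme_path)
```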

  R package

What’s an R Package?

In R, the fundamental unit of shareable code is the package. A package bundles together code, data, documentation, and tests, and is easy to share with others.

Hadley Wickham - R packages (1st ed.)


An R package:

  • is a collection of well-documented functions
  • makes your work more reproducible
  • makes your code useful for you and for others


As of today (2024-10-31):

  • 21,585 packages are available on CRAN
  • 2,289 packages are available on Bioconductor

Development workflow

Package structure

A package contains two main components:

  • a DESCRIPTION file with package metadata
  • a folder R/ with documented functions


.
├─ R/
│  └─ fun.R
│ 
└─ DESCRIPTION

devtools::document()

.
├─ R/
│  └─ fun.R
│ 
├─ man/
│  └─ fun.Rd
│ 
├─ NAMESPACE
│ 
└─ DESCRIPTION


The function devtools::document() automatically generates a folder man/ (function documentation) and the NAMESPACE file.

What’s a function?

A function is a block of code organized together to perform a specific task and only runs when it is called. It can have parameters and can return a result.


 Automate common and repetitive tasks


Advantages

  • You can give a function an evocative name that makes your code easier to understand.
  • As requirements change, you only need to update code in one place, instead of many.
  • You eliminate the chance of making incidental mistakes when you copy and paste.
  • It makes it easier to reuse work from project-to-project, increasing your productivity over time.

Creating a function

## Function definition ----

function_name <- function(input) {
  
  # Code block
  # Code block
  # Code block
  
  return(output)
}

  • A function is defined by calling function()
  • A function should have an explicit name
  • A function can have 0, 1 or many parameters (inputs)
  • A function can return a value (output)


Defining a function

## Arithmetic mean ----

arithmetic_mean <- function(x) {
  
  y <- sum(x) / length(x)
  
  return(y)
}

## Simplification ----

arithmetic_mean <- function(x) {
  
  sum(x) / length(x)
}


Calling the function

## Arithmetic mean ----

arithmetic_mean(x = c(4, 6, 5, 10))
[1] 6.25

## Comparison ----

mean(x = c(4, 6, 5, 10))
[1] 6.25
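
The function above takes a single parameter. To illustrate a function with several parameters, including one with a default value, here is a possible variant that checks its input and can ignore missing values. It is a sketch, not part of the original example: the name `arithmetic_mean_checked()` and the `na_rm` parameter are hypothetical.

```r
## Arithmetic mean with input checks (illustrative variant) ----

# 'arithmetic_mean_checked' and 'na_rm' are hypothetical names
arithmetic_mean_checked <- function(x, na_rm = FALSE) {

  # Stop early with an informative message if the input is not numeric
  if (!is.numeric(x)) {
    stop("Argument 'x' must be a numeric vector", call. = FALSE)
  }

  # Optionally drop missing values before computing the mean
  if (na_rm) {
    x <- x[!is.na(x)]
  }

  sum(x) / length(x)
}


## Calling the function ----

arithmetic_mean_checked(c(4, 6, 5, 10))

arithmetic_mean_checked(c(4, 6, NA, 10), na_rm = TRUE)
```

With the default `na_rm = FALSE`, a vector containing `NA` yields `NA`, which matches the behaviour of `mean()`.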

Documenting a function

  • Specially-structured comments preceding each function definition
  • Lightweight syntax easy to write and to read
  • Syntax: #' @field value
  • Keep function definition and documentation in the same file
  • Automatically write .Rd files (in man/) and NAMESPACE

 Get started with roxygen2: see the roxygen2 package documentation


#' Compute the arithmetic mean
#'
#' This function computes the arithmetic mean of a numeric variable.
#'
#' @param x a `numeric` vector
#'
#' @return A `numeric` value representing the arithmetic mean of `x`.
#'
#' @export
#'
#' @examples
#' x <- 1:10
#' arithmetic_mean(x)

arithmetic_mean <- function(x) {
  
  sum(x) / length(x)
}

Then, run devtools::document() to automatically generate .Rd files in man/ and the NAMESPACE file

The DESCRIPTION file

The DESCRIPTION file is the main component of an R package: it holds the package metadata.


Package: nameofthepackage
Type: Package
Title: The Title of the Package
Authors@R: c(
    person(given   = "John",
           family  = "Doe",
           role    = c("aut", "cre", "cph"),
           email   = "john.doe@domain.com",
           comment = c(ORCID = "9999-9999-9999-9999")))
Description: A paragraph providing a full description of 
    the package.
License: GPL (>= 2)


 External packages required by the package are also listed in this file (for example in the Imports field).
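
To make the dependency listing concrete, the sketch below writes a small DESCRIPTION file and reads it back with the base function `read.dcf()`. The packages named under `Imports` and `Suggests` are made up for the example, and `parse_field()` is a hypothetical helper.

```r
## Read package dependencies from a DESCRIPTION file (sketch) ----

# Write an example DESCRIPTION file to a temporary directory
# (the 'Imports' and 'Suggests' values are invented for this example)
desc_path <- file.path(tempdir(), "DESCRIPTION")

writeLines(
  c("Package: nameofthepackage",
    "Type: Package",
    "Title: The Title of the Package",
    "License: GPL (>= 2)",
    "Imports:",
    "    ggplot2,",
    "    sf",
    "Suggests:",
    "    testthat"),
  con = desc_path
)

# DESCRIPTION uses the Debian Control File format, read by read.dcf()
desc <- read.dcf(desc_path)

# 'parse_field' is a hypothetical helper: it splits a comma-separated
# field into a character vector of package names
parse_field <- function(desc, field) {
  trimws(strsplit(desc[1, field], ",")[[1]])
}

imports  <- parse_field(desc, "Imports")
suggests <- parse_field(desc, "Suggests")
```

Packages under `Imports` are required for the package to work, while those under `Suggests` are optional (for example, packages only used for tests).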

Example


  • Database hosted on Zenodo
  • Data paper published in Scientific Data
  • Package hosted on GitHub (coming soon on CRAN)
  • Software paper submitted to Methods in Ecology and Evolution

Must-read resources

  Shiny App

shiny package

Shiny is an R package that makes it easy to build interactive web applications (apps) straight from R.

Source: Mastering Shiny


Features

  • Provides a curated set of user interface (UI) functions that generate the HTML, CSS, and JavaScript needed for common tasks.

     No knowledge of HTML, CSS, or JavaScript required


  • Introduces a new style of programming called reactive programming which automatically tracks the dependencies of pieces of code.

     Automatically update output if input changes

Available at: https://github.com/rstudio/shiny/

Structure of a Shiny App

A Shiny app is contained in a single script called app.R and has three components:

  • a ui (user interface) object
  • a server() function
  • a call to the shinyApp() function


## Required package ----

library(shiny)


## User interface ----

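# fluidPage() is one possible layout function among others
ui <- fluidPage(
  # Inputs and outputs go here (e.g. sliderInput(), plotOutput())
)


## Server component ----

server <- function(input, output) {
  # Code that computes the outputs from the inputs goes here
}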


## Create Shiny app object ----

shinyApp(ui = ui, server = server)

## Launch the Shiny app ----

runApp()

UI Components



UI Layouts



Reactive programming

Graph of dependencies


  • User interacts with UI inputs (click a button, enter text, select an option, etc.)
  • The server handles input changes and modifies the output value
  • The server updates the UI output
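
The three steps above can be sketched with a very small app in which an output depends on an input through a reactive expression. The identifiers `n`, `avg` and `values` are chosen for this example only. The `session` argument is optional when running an app, but it is declared here so that the server function can also be exercised with `shiny::testServer()`.

```r
## Reactive dependency between an input and an output (sketch) ----

library(shiny)

# User interface: one numeric input and one text output
ui <- fluidPage(
  numericInput("n", "Upper bound:", value = 10, min = 1),
  textOutput("avg")
)

server <- function(input, output, session) {

  # Reactive expression: recomputed when input$n changes, cached otherwise
  values <- reactive({
    seq_len(input$n)
  })

  # Output: depends on values(), hence indirectly on input$n
  output$avg <- renderText({
    paste("Mean:", mean(values()))
  })
}

# Create the app object (use runApp(app) to launch it interactively)
app <- shinyApp(ui = ui, server = server)
```

Here the dependency graph is `input$n` → `values()` → `output$avg`: changing the numeric input invalidates the reactive expression, which in turn triggers a new rendering of the text output.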

Minimal Shiny app

## Required package ----

library(shiny)


## User interface ----

ui <- fluidPage(

    # Application title
    titlePanel("Old Faithful Geyser Data"),

    # Sidebar with a slider input for number of bins 
    sidebarLayout(
        sidebarPanel(
            sliderInput("bins",
                        "Number of bins:",
                        min   = 1,
                        max   = 50,
                        value = 30
            )
        ),

        # Show a plot of the generated distribution
        mainPanel(
           plotOutput("distPlot")
        )
    )
)


## Server logic ----

server <- function(input, output) {

    output$distPlot <- renderPlot({
      
        # Generate bins based on input$bins
        x    <- faithful[, 2]
        bins <- seq(min(x), max(x), length.out = input$bins + 1)

        # Draw the histogram with the specified number of bins
        hist(x, breaks = bins, col = 'darkgray', border = 'white',
             xlab = 'Waiting time to next eruption (in mins)',
             main = 'Histogram of waiting times')
    })
}


## Create the application ----

shinyApp(ui = ui, server = server)


 RStudio IDE: New Project > New Directory > Shiny Application

Examples

Resources

 Shiny website


Thanks