Data formats

Structure, formats, files & extensions

November 2024

Nicolas Casajus

Senior data scientist
@FRB-CESAB    

 File formats

Reading data files

 Plain text-based files


Comma-separated values (.csv)

Suggested functions:

  • read.csv() or read.csv2()
  • read.table()
  • read.delim()

 See also the package readr

Tab-separated values (.tsv & .txt)

Suggested functions:

  • read.table()
  • read.delim()

 See also the package readr
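
As a minimal sketch of the base R readers listed above, the following example writes two small temporary files and reads them back. The file content and object names are hypothetical and only serve the illustration. Note that read.csv2() is intended for files that use a semicolon as field separator and a comma as decimal mark.

```r
## Create two small example files (hypothetical data) ----
csv_file <- tempfile(fileext = ".csv")
tsv_file <- tempfile(fileext = ".tsv")

writeLines(c("species,count", "Parus major,12", "Turdus merula,7"), csv_file)
writeLines(c("species\tcount", "Parus major\t12", "Turdus merula\t7"), tsv_file)

## Read the comma-separated file ----
tab_csv <- read.csv(csv_file)

## Read the tab-separated file (read.delim() uses sep = "\t" by default) ----
tab_tsv <- read.delim(tsv_file)

## Same file with read.table(): header and separator must be specified ----
tab_tbl <- read.table(tsv_file, header = TRUE, sep = "\t")
```

The three calls return a data.frame with the same two columns. read.csv() and read.delim() are wrappers around read.table() with different default arguments.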


JavaScript Object Notation (.json)

Suggested function:

  • jsonlite::fromJSON()

YAML Ain't Markup Language (.yml & .yaml)

Suggested function:

  • yaml::read_yaml()
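
The sketch below shows how the two functions listed above might be used on files that store the same information. It assumes that the packages jsonlite and yaml are installed. The file content is hypothetical.

```r
## Write two small example files (hypothetical content) ----
json_file <- tempfile(fileext = ".json")
yaml_file <- tempfile(fileext = ".yml")

writeLines('{"site": "A", "depth": [1, 5, 10]}', json_file)
writeLines(c("site: A", "depth:", "  - 1", "  - 5", "  - 10"), yaml_file)

## Both functions return a named list ----
from_json <- jsonlite::fromJSON(json_file)
from_yaml <- yaml::read_yaml(yaml_file)

## Access elements by name ----
from_json$depth
from_yaml$depth
```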


Extensible Markup Language (.xml)

Suggested function:

  • xml2::read_xml()

Hypertext Markup Language (.html)

Suggested function:

  • xml2::read_html()
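
Reading an XML or HTML document only parses it. The values still have to be extracted, typically with XPath expressions. The following sketch assumes that the package xml2 is installed and uses hypothetical inline content instead of a file.

```r
## Parse an XML string (hypothetical content) ----
doc <- xml2::read_xml(
  '<sites><site id="A">Lake</site><site id="B">Forest</site></sites>'
)

## Select nodes with XPath, then extract text and attributes ----
nodes      <- xml2::xml_find_all(doc, "//site")
site_names <- xml2::xml_text(nodes)
site_ids   <- xml2::xml_attr(nodes, "id")

## The same functions work on HTML parsed with read_html() ----
page    <- xml2::read_html("<html><body><h1>Title</h1><p>Text</p></body></html>")
heading <- xml2::xml_text(xml2::xml_find_first(page, "//h1"))
```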


Other plain text files (.txt, etc.)

Suggested function:

  • readLines()
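
readLines() makes no assumption about the structure of the file: it returns one character string per line, and any parsing is left to the user. A minimal sketch with a hypothetical notes file:

```r
## Create a small unstructured text file (hypothetical content) ----
txt_file <- tempfile(fileext = ".txt")
writeLines(
  c("# Field notes", "2024-11-04: 3 nests found", "2024-11-05: no survey"),
  txt_file
)

## Each line becomes one element of a character vector ----
content <- readLines(txt_file)

## Read only the first line (e.g. to inspect a header) ----
first_line <- readLines(txt_file, n = 1)

## Keep the lines that start with a date ----
records <- grep("^[0-9]{4}-[0-9]{2}-[0-9]{2}", content, value = TRUE)
```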

 Spreadsheets


Microsoft Excel (.xls & .xlsx)

Suggested functions:

  • readxl::read_excel()
  • readxl::read_xls()
  • readxl::read_xlsx()

Google Sheets (online)

Suggested function:

  • gsheet::gsheet2tbl()


OpenDocument Spreadsheet (.ods)

Suggested function:

  • readODS::read_ods()
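
For the Excel readers listed above, a workbook can contain several worksheets, so it is usually worth listing them before reading. The sketch below assumes that the package readxl is installed and relies on the example workbook shipped with it.

```r
## Path to an example workbook shipped with readxl ----
xlsx_file <- readxl::readxl_example("datasets.xlsx")

## List the worksheets available in the workbook ----
sheets <- readxl::excel_sheets(xlsx_file)

## Read one worksheet (the first one is read by default) ----
tab <- readxl::read_excel(xlsx_file, sheet = sheets[1])

## Read only a range of cells (header row + 3 rows, 3 columns) ----
sub <- readxl::read_excel(xlsx_file, sheet = sheets[1], range = "A1:C4")
```

read_excel() guesses the format from the file extension, whereas read_xls() and read_xlsx() can be called directly when the format is known.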

 R binary files


R saved objects (.RData & .rds)

Suggested functions:

  • load()
  • readRDS()

Quick Serialization of R Objects (.qs)

Suggested function:

  • qs::qread()
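
The two base R formats behave differently when read: readRDS() returns a single object that must be assigned, whereas load() recreates one or several objects under the names they had when saved. A minimal sketch with hypothetical objects:

```r
## Example objects (hypothetical data) ----
dat       <- data.frame(site = c("A", "B"), count = c(12, 7))
threshold <- 10

rds_file   <- tempfile(fileext = ".rds")
rdata_file <- tempfile(fileext = ".RData")

## .rds stores a single object, without its name ----
saveRDS(dat, rds_file)
dat_rds <- readRDS(rds_file)

## .RData can store several objects, with their names ----
save(dat, threshold, file = rdata_file)

## load() recreates the objects under their original names
## (here in a separate environment to avoid overwriting anything) ----
env    <- new.env()
loaded <- load(rdata_file, envir = env)
```

load() invisibly returns the names of the restored objects, which helps to check what a file contains.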

 Images (rasters)


Joint Photographic Experts Group (.jpg & .jpeg)

Suggested function:

  • jpeg::readJPEG()

Portable Network Graphics (.png)

Suggested function:

  • png::readPNG()


Tagged Image File Format (.tif & .tiff)

Suggested function:

  • tiff::readTIFF()
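
The image readers listed above return the pixel values as a numeric array. The sketch below illustrates this with a PNG file. It assumes that the package png is installed and first writes a tiny hypothetical image so that the example is self-contained.

```r
## Create a small 2 x 3 RGB image (hypothetical data) ----
img <- array(0, dim = c(2, 3, 3))
img[ , , 1] <- 1   # red channel set to 1: a fully red image

png_file <- tempfile(fileext = ".png")
png::writePNG(img, png_file)

## Read it back: an array with dimensions height x width x channels ----
img_in <- png::readPNG(png_file)
dim(img_in)
```

Pixel values are stored as numbers between 0 and 1, one matrix per color channel.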

 Spatial data files


Vector layers (.shp, .geojson, .gpkg, etc.)

Suggested functions:

  • sf::st_read()
  • terra::vect()

Raster layers (.tif, .asc, .grd, etc.)

Suggested function:

  • terra::rast()


Network Common Data Form - NetCDF (.nc)

Suggested functions:

  • ncdf4::nc_open()
  • terra::rast()
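
As a sketch of how vector and raster layers might be read, the example below uses the demonstration files shipped with sf and terra. It assumes that both packages are installed. Object names are hypothetical.

```r
## Vector layer: example shapefile shipped with sf ----
shp_file <- system.file("shape/nc.shp", package = "sf")
counties <- sf::st_read(shp_file, quiet = TRUE)

## Raster layer: example GeoTIFF shipped with terra ----
tif_file <- system.file("ex/elev.tif", package = "terra")
elev     <- terra::rast(tif_file)

## Inspect the coordinate reference system and the number of layers ----
sf::st_crs(counties)
terra::nlyr(elev)
```

Checking the coordinate reference system right after import is a good habit, since layers from different sources often use different systems.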

 SQL databases


MySQL

Suggested packages: DBI & RMySQL


## Connect to MySQL database ----
con <- DBI::dbConnect(RMySQL::MySQL(), dbname = ...)

PostgreSQL

Suggested packages: DBI & RPostgreSQL


## Connect to PostgreSQL database ----
con <- DBI::dbConnect(RPostgreSQL::PostgreSQL(), dbname = ...)


SQLite (.sqlite & .db)

Suggested packages: DBI & RSQLite


## Connect to SQLite database ----
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ...)
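
Once the connection is open, the same DBI functions are used whatever the database backend. The sketch below assumes that the packages DBI and RSQLite are installed. It uses a temporary in-memory SQLite database and a hypothetical table, so that no server or file is required.

```r
## Connect to a temporary in-memory SQLite database ----
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")

## Store a data frame as a table (hypothetical data) ----
surveys <- data.frame(site = c("A", "B", "A"), count = c(12, 7, 3))
DBI::dbWriteTable(con, "surveys", surveys)

## List the tables and send an SQL query ----
tables <- DBI::dbListTables(con)
res    <- DBI::dbGetQuery(
  con,
  "SELECT site, SUM(count) AS total FROM surveys GROUP BY site ORDER BY site"
)

## Close the connection when done ----
DBI::dbDisconnect(con)
```

dbGetQuery() returns the result of the query as a data.frame, so only the rows and columns that are needed have to be loaded in memory.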

 Other formats


Portable Document Format (.pdf)

Suggested function:

  • pdftools::pdf_text()

BibTeX (.bib)

Suggested function:

  • bib2df::bib2df()

 Compressed archives


ZIP files (.zip)

Suggested function:

  • unzip()

TAR files (.tar)

Suggested function:

  • untar()
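
An archive is usually inspected first, then extracted into a dedicated directory. The sketch below illustrates this with a TAR file created on the fly from a hypothetical file. The argument tar = "internal" is set here so that the example does not depend on an external program.

```r
## Create a small file and archive it (hypothetical content) ----
src_dir <- tempfile("src_")
dir.create(src_dir)
writeLines(c("site,count", "A,12"), file.path(src_dir, "data.csv"))

tar_file <- tempfile(fileext = ".tar")
old_wd   <- setwd(src_dir)
utils::tar(tar_file, files = "data.csv", tar = "internal")
setwd(old_wd)

## List the content of the archive without extracting it ----
content <- utils::untar(tar_file, list = TRUE, tar = "internal")

## Extract the archive into a chosen directory ----
out_dir <- tempfile("out_")
utils::untar(tar_file, exdir = out_dir, tar = "internal")

tab <- read.csv(file.path(out_dir, "data.csv"))
```

unzip() provides the same list and exdir arguments for ZIP files.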

Encoding



 Use readr::guess_encoding() to detect the most likely encoding of a file before reading it.
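
The following sketch shows how a guessed encoding might be used. It assumes that the package readr is installed. The file is hypothetical and is written byte by byte as Latin-1 so that the example does not depend on the locale of the session. With such a short file the guess may be unreliable, so the encoding is declared explicitly when reading.

```r
## Write a small Latin-1 encoded file (hypothetical content):
## the byte 0xe9 is an "e" with acute accent in Latin-1 ----
txt_file <- tempfile(fileext = ".txt")
bytes <- c(
  charToRaw("caf"), as.raw(0xe9),
  charToRaw(" et th"), as.raw(0xe9),
  as.raw(0x0a)
)
writeBin(bytes, txt_file)

## Candidate encodings with a confidence score ----
guess <- readr::guess_encoding(txt_file)

## Declare the encoding when reading, then convert to UTF-8 ----
x      <- readLines(txt_file, encoding = "latin1")
x_utf8 <- enc2utf8(x)
```

For tabular files, the argument fileEncoding of read.csv() and read.table() plays a similar role.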