A data portal is any online platform (website) that gives users access to collections of open data. Portals can be governmental (e.g. data.gouv.fr), run by organizations or NGOs, or stem from individual initiatives.
You access the data through a web browser, by clicking:
Click on a ready-to-download file (e.g. GADM)
Fill in a form to download a user-specific file (e.g. GBIF)
Fill in a form to receive the data through a URL (e.g. BirdLife)
Sometimes, you need to register first (e.g. TRY)
Case study
You are building species distribution models for metropolitan France. You already have a list of species and their occurrences in space and time.
Now, you need to retrieve:
Spatial boundaries of the French regions, to map species occurrences
Climate data (temperature and precipitation) to fit models
Access data - France regions
The GADM data portal is a good option for retrieving the spatial boundaries of any country in the world, at several administrative levels. The French GeoPackage (gadm41_FRA.gpkg) contains the following layers:
## Driver: GPKG
## Available layers:
## layer_name geometry_type features fields crs_name
## 1 ADM_ADM_0 Multi Polygon 1 2 WGS 84
## 2 ADM_ADM_1 Multi Polygon 13 11 WGS 84
## 3 ADM_ADM_2 Multi Polygon 96 13 WGS 84
## 4 ADM_ADM_3 Multi Polygon 350 16 WGS 84
## 5 ADM_ADM_4 Multi Polygon 3728 14 WGS 84
## 6 ADM_ADM_5 Multi Polygon 36611 15 WGS 84
Each layer corresponds to a different level of administrative subdivision:
ADM_ADM_0: France contours
ADM_ADM_1: Region contours
ADM_ADM_2: Department contours
ADM_ADM_3: Commune contours
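The layer listing above has the format printed by sf::st_layers(). The sketch below reproduces that workflow on a hypothetical toy GeoPackage written to a temporary file (a single square polygon stored under two GADM-style layer names), so it runs without the real GADM download: list the layers first, then read the one you need by name. With the real data, you would pass here::here("data", "gadm41_FRA.gpkg") as the path instead.

```r
# Hypothetical toy data: one square polygon standing in for real boundaries
square <- sf::st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
toy <- sf::st_sf(name = "toy", geom = sf::st_sfc(square, crs = 4326))

# Write it into one temporary GeoPackage, under two layer names
gpkg <- tempfile(fileext = ".gpkg")
sf::st_write(toy, dsn = gpkg, layer = "ADM_ADM_0", quiet = TRUE)
sf::st_write(toy, dsn = gpkg, layer = "ADM_ADM_1", quiet = TRUE)

# List the layers stored in the file (same kind of listing as above)
layers <- sf::st_layers(gpkg)
layers

# Read a single layer by its name
level1 <- sf::st_read(dsn = gpkg, layer = "ADM_ADM_1", quiet = TRUE)
level1
```

Listing the layers first avoids guessing layer names, which differ between data providers.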
Let's import the regional subdivisions (layer ADM_ADM_1):
# Read the region layer from the GADM GeoPackage ----
regions <- sf::st_read(
  dsn = here::here("data", "gadm41_FRA.gpkg"),
  layer = "ADM_ADM_1"
)
head(regions)
## Simple feature collection with 6 features and 11 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -5.143751 ymin: 41.33375 xmax: 9.560416 ymax: 50.16764
## Geodetic CRS: WGS 84
## GID_1 GID_0 COUNTRY NAME_1 VARNAME_1 NL_NAME_1 TYPE_1
## 1 FRA.1_1 FRA France Auvergne-Rhône-Alpes NA NA Région
## 2 FRA.2_1 FRA France Bourgogne-Franche-Comté NA NA Région
## 3 FRA.3_1 FRA France Bretagne NA NA Région
## 4 FRA.4_1 FRA France Centre-Val de Loire NA NA Région
## 5 FRA.5_1 FRA France Corse Corsica NA Région
## 6 FRA.6_1 FRA France Grand Est NA NA Région
## ENGTYPE_1 CC_1 HASC_1 ISO_1 geom
## 1 Region NA FR.AR NA MULTIPOLYGON (((5.415834 44...
## 2 Region NA FR.BF NA MULTIPOLYGON (((5.256271 46...
## 3 Region NA FR.BT FR-BRE MULTIPOLYGON (((-3.248194 4...
## 4 Region NA FR.CN FR-CVL MULTIPOLYGON (((2.063459 46...
## 5 Region NA FR.CE FR-20R MULTIPOLYGON (((9.102084 41...
## 6 Region NA FR.AO NA MULTIPOLYGON (((7.178251 47...
In the previous section, we saw how to download data manually from a web browser. Whenever possible, however, we recommend performing this task with code (scripting).
Reproducibility & Automation
In R, the function download.file() (from the utils package, loaded by default) downloads a file from the Internet.
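A minimal sketch of a reproducible download step, assuming you want to skip the download when the file is already on disk. The helper name download_if_missing() is hypothetical, and the GADM URL in the commented example is an assumption to check against the GADM website.

```r
# Hypothetical helper: download `url` to `destfile` unless the file already exists,
# so that re-running the script does not fetch the same data again.
download_if_missing <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # Create the destination folder (e.g. "data/") if needed
    dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
    # mode = "wb" writes in binary mode, which keeps files such as .gpkg or .zip
    # intact on Windows
    utils::download.file(url = url, destfile = destfile, mode = "wb", quiet = TRUE)
  }
  invisible(destfile)
}

# Example call (the GADM 4.1 URL is an assumption; check it on the GADM website):
# download_if_missing(
#   url      = "https://geodata.ucdavis.edu/gadm/gadm4.1/gpkg/gadm41_FRA.gpkg",
#   destfile = here::here("data", "gadm41_FRA.gpkg")
# )
```

For large files, the default timeout of 60 seconds may be too short; it can be raised with options(timeout = 300) before calling the helper.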
# Retrieve coordinates ----
content <- get_coords_from_location(city = "Montpellier", country = "France")
content
## name lon lat
## 1 Montpellier 3.876734 43.61124
# Map France boundary ----
maps::map(regions = "France", fill = TRUE, col = "black")
# Add retrieved coordinates ----
points(x = content$"lon", y = content$"lat", pch = 19, cex = 1, col = "red")
# Add retrieved name ----
text(x = content$"lon", y = content$"lat", labels = content$"name",
     pos = 2, col = "white", family = "serif")
Automation
# List of cities ----
cities <- c("Montpellier", "Paris", "Strasbourg", "Grenoble", "Bourges")

# Retrieve coordinates ----
coords <- data.frame()
for (city in cities) {
  coord <- get_coords_from_location(city = city, country = "France")
  coords <- rbind(coords, coord)
}
coords
## name lon lat
## 1 Montpellier 3.876734 43.61124
## 2 Paris 2.348391 48.85350
## 3 Strasbourg 7.750713 48.58461
## 4 Grenoble 5.735782 45.18756
## 5 Bourges 2.399125 47.08117
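The loop above grows coords with rbind() at every iteration, which is fine for five cities. For longer lists, a common R idiom is to store each result in a preallocated list and bind once at the end. If the service behind get_coords_from_location() imposes a rate limit (OpenStreetMap's Nominatim, for instance, asks for at most one request per second), pausing between calls is also polite. A sketch with a hypothetical helper name:

```r
# Hypothetical helper: apply `fun` to each input, pausing between calls,
# then bind all data frame results into one data frame.
collect_with_pause <- function(inputs, fun, pause = 1) {
  results <- vector("list", length(inputs))  # preallocated list of results
  for (i in seq_along(inputs)) {
    results[[i]] <- fun(inputs[[i]])
    # Wait before the next request, but not after the last one
    if (i < length(inputs)) Sys.sleep(pause)
  }
  do.call(rbind, results)
}

# Example call with the function used in this section:
# coords <- collect_with_pause(
#   cities,
#   function(city) get_coords_from_location(city = city, country = "France")
# )
```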
Exercise (40 min)
Exercise - Accessing data
Part 1: Download the PanTHERIA database, a species-level database of life history, ecology, and geography of extant and recently extinct mammals available here.
# Convert 'pop2021' to numeric ----
top10$"pop2021" <- gsub(" ", "", top10$"pop2021")
top10$"pop2021" <- as.numeric(top10$"pop2021")
top10
## # A tibble: 10 × 4
## rang2022 commune departement pop2021
## <int> <chr> <chr> <dbl>
## 1 1 Paris Paris 2113705
## 2 2 Marseille Bouches-du-Rhône 877215
## 3 3 Lyon Métropole de Lyon 520774
## 4 4 Toulouse Haute-Garonne 511684
## 5 5 Nice Alpes-Maritimes 353701
## 6 6 Nantes Loire-Atlantique 325070
## 7 7 Montpellier Hérault 307101
## 8 8 Strasbourg Bas-Rhin 291709
## 9 9 Bordeaux Gironde 265328
## 10 10 Lille Nord 238695
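gsub(" ", "", ...) only removes the ordinary space character. French web pages may use non-breaking (U+00A0) or narrow non-breaking (U+202F) spaces as thousands separators (an assumption to check on the page you scrape); if these survive the table extraction, as.numeric() returns NA with a warning. A slightly more defensive sketch, with a hypothetical helper name:

```r
# Hypothetical helper: strip the usual thousands separators, then convert
to_number <- function(x) {
  # Regular space, non-breaking space (U+00A0), narrow non-breaking space (U+202F)
  for (sep in c(" ", "\u00a0", "\u202f")) {
    x <- gsub(sep, "", x, fixed = TRUE)
  }
  as.numeric(x)
}

# Example call on the scraped table:
# top10$"pop2021" <- to_number(top10$"pop2021")
```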
Scrape other elements
Detect HTML element by tag
# Extract content of h1 element ----
rvest::html_element(content, css = "h1") |> rvest::html_text2()
## [1] "Liste des communes de France les plus peuplées"
Detect HTML elements by tag
# Extract content of the first h2 element ----
rvest::html_element(content, css = "h2") |> rvest::html_text2()
## [1] "Sommaire"
# Extract content of all h2 elements ----
rvest::html_elements(content, css = "h2") |> rvest::html_text2()
## [1] "Sommaire"
## [2] "Cadre des données"
## [3] "Vue d'ensemble"
## [4] "Communes de plus de 30 000 habitants"
## [5] "Communes ayant compté plus de 30 000 habitants avant 2025"
## [6] "Notes et références"
## [7] "Voir aussi"
Detect HTML element by ID
# Extract content of the h2 element detected by its id ----
rvest::html_element(content, css = "#Cadre_des_données") |> rvest::html_text2()
## [1] "Cadre des données"
Extract attribute
# Extract URL of the first image ----
image_url <- rvest::html_element(content, css = "img") |> rvest::html_attr(name = "src")
image_url
## [1] "/static/images/icons/wikipedia.png"
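To experiment with these selectors without depending on the live Wikipedia page, which can change at any time, the sketch below parses a small hypothetical HTML string. It shows that html_element() keeps only the first match, html_elements() keeps all matches, "#" selects by id, a missing match yields NA, and html_attr() reads an attribute instead of the text.

```r
# Hypothetical toy page, used instead of the live Wikipedia page
page <- rvest::read_html('
<html>
  <body>
    <h1>Page title</h1>
    <h2 id="first">First section</h2>
    <h2 id="second">Second section</h2>
    <img src="/images/logo.png" alt="logo">
  </body>
</html>')

# html_element() returns only the first match
first_h2 <- rvest::html_element(page, css = "h2") |> rvest::html_text2()
# html_elements() returns every match
all_h2 <- rvest::html_elements(page, css = "h2") |> rvest::html_text2()
# "#" selects an element by its id attribute
by_id <- rvest::html_element(page, css = "#second") |> rvest::html_text2()
# No match: html_element() gives a missing node, and its text is NA
no_match <- rvest::html_element(page, css = "h3") |> rvest::html_text2()
# html_attr() reads an attribute value rather than the element's text
img_src <- rvest::html_element(page, css = "img") |> rvest::html_attr(name = "src")
```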
# Build image full URL ----
image_url <- paste0(base_url, image_url)
image_url
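paste0() gives the right result when base_url is exactly the site root without a trailing slash. An alternative is xml2::url_absolute() (xml2 is installed with rvest), which resolves a relative link against the address of the page it came from, following standard URL resolution rules. A sketch, where the page address is an assumption derived from the page title:

```r
# Relative src extracted from the page (value shown in the output above)
relative_src <- "/static/images/icons/wikipedia.png"

# Assumed address of the scraped page (French Wikipedia); replace with the real one
page_url <- "https://fr.wikipedia.org/wiki/Liste_des_communes_de_France_les_plus_peupl%C3%A9es"

# Resolve the relative path against the page address
image_url_abs <- xml2::url_absolute(relative_src, base = page_url)
image_url_abs
```

The resolved URL can then be passed to download.file(), with mode = "wb" since an image is a binary file.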