Static Predictors - Read Me

Note that I (@FriedaRosa) no longer maintain this repository. I hold no responsibility for, and was not involved in, any changes made after July 2025. The Czech University of Life Sciences Prague and the ERC hold all rights to and ownership of this study. For any questions regarding this project, please contact @petrkeil (keil@fzp.czu.cz), who now holds full responsibility for it.

0. Meta information:

Project title: Static Predictors
Author: Friederike Johanna Rosa Wölke, MSc
Date: 2025-05-28
Location: Prague, Czech Republic
License: CC BY-NC-ND 4.0 (until publication) (https://creativecommons.org/licenses/by-nc-nd/4.0/)
R Package versions: registered in file renv.lock
Computational demands:
- Estimated total run time: approx. 50 h locally on a laptop without parallelization;
  can be reduced to approx. 6 h by running the predictor scripts in parallel or enabling multiple cores
Manual to the folder: Folder_metadata.xlsx
- lists A) scripts, input and output files, figure locations, run times, etc. and B) files and their sources

How to use this folder:

Install the renv and here packages if they are not already installed.
Open Git.Rproj in RStudio or VS Code so that here::here() resolves to the root folder (“Git”), and use that folder as the working directory.
Run renv::restore() to restore packages and dependencies from the lockfile (this triggers a large package download).
Rendered .html files are available for all code (see the links below).
See Project_Description.html for the detailed project description (incl. figures, results, predictor table).
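
The setup steps above can be sketched as the following R session. This is a minimal sketch, assuming the project was opened via Git.Rproj and that a CRAN mirror is configured; it only uses the renv and here calls already named in this README.

```r
# Install the two helper packages only if they are missing.
for (pkg in c("renv", "here")) {
  if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)
}

# With Git.Rproj open, this should print the path of the project root ("Git").
here::here()

# Restore the package versions recorded in renv.lock (large download).
renv::restore()
```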

Rendered code in .html

Example: New York State

All code (Japan, New York, Czechia, Europe)

A: Prepare data

B: Machine learning

C: Validation

D: Figures

1. Project/File structure:

fs::dir_tree(here::here(), recurse = FALSE)

C:/Users/wolke/OneDrive - CZU v Praze/Frieda_PhD_files/02_StaticPatterns/Git
├── Code
├── Data
├── Demo_NewYork
├── Figures
├── Folder_metadata.xlsx
├── Git.Rproj
├── Project_Description.html
├── Project_Description.qmd
├── README.md
├── README.qmd
├── README.rmarkdown
├── README_files
├── renv
├── renv.lock
└── StaticPatterns_Results_all.xlsx

There are three sections in this project. The first part (A) produces the predictors and data needed for modelling: it retrieves data from the database, cleans and filters it, and then produces the predictors. The second part (B) uses the predictors to train a random forest model and evaluates it using xAI (explainable AI), interaction effects and variable importance. In the last part (C), the model predictions are checked against the latest replication of the empirical data.

Additionally, the project contains several sensitivity analyses and robustness checks, which are not part of the main analysis but were used to aid interpretation of the results and determine patterns of stochasticity in the data.

Script nomenclature:

Each R script is labelled by part (A,B,C) and script sequence (1-14).
The 00_Configuration.R script is needed for almost all other scripts. It ensures that packages are installed and defines the file paths, global variables and lookup tables needed for many steps.

2. Methods summary

A) Description of steps

Get data from the MOBI database for the first two atlas replications (Czechia, New York, Japan, Europe)
Remove cells and species that were not sampled twice; filter species based on expert knowledge and introduced status
Prepare predictors for H1 and H2; use datasetID as H3 to determine the effect of the atlas in the ‘full model’
H1: Body mass, Habitat_5, Threatened_01, Generalism_01, Phylodistinct, Migration_123, Global range size
H2: Fractal dimension, Lacunarity, Spatial autocorrelation, circularity, AOO, minimum distance to the border from the centroid
H3: datasetID
Calculate responses:
1. Jaccard_dissimilarity,
2. log Ratio AOO,
3. log Ratio AOO per year
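
The three responses can be illustrated with a small base-R sketch. It assumes that AOO is measured as the number of occupied cells and that the natural logarithm is used; the function names, cell IDs and the 20-year interval are hypothetical and not taken from the project scripts.

```r
# Jaccard dissimilarity between the sets of cells occupied in two periods.
jaccard_dissimilarity <- function(cells_t1, cells_t2) {
  shared <- length(intersect(cells_t1, cells_t2))
  total  <- length(union(cells_t1, cells_t2))
  1 - shared / total
}

# Log ratio of AOO between sampling period 2 and sampling period 1.
log_ratio_aoo <- function(aoo_t1, aoo_t2) log(aoo_t2 / aoo_t1)

# Hypothetical species: IDs of the cells occupied in each period.
cells_t1 <- c("c01", "c02", "c03", "c04")
cells_t2 <- c("c03", "c04", "c05", "c06", "c07", "c08")
years_between <- 20  # hypothetical number of years between the two atlases

j       <- jaccard_dissimilarity(cells_t1, cells_t2)             # 2 shared of 8 cells
lr      <- log_ratio_aoo(length(cells_t1), length(cells_t2))     # log(6 / 4)
lr_year <- lr / years_between                                    # log ratio per year
```

Here the species shares 2 of 8 cells across periods, so its Jaccard dissimilarity is 0.75, while its AOO grew from 4 to 6 cells.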
Simulate Jaccard_dissimilarity for different combinations of parameters and evaluate the effect of these parameters on the Jaccard values. Certain combinations of parameters restrict Jaccard_dissimilarity to a range of values, which can be used to determine the effect of mathematical constraints on the Jaccard_dissimilarity values.
Parameters:
1. initially occupied cells,
2. total number of cells possible,
3. number of changes
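
How these three parameters constrain the Jaccard values can be sketched as follows. The simulation design here is an assumption (a "change" is taken to be one cell flipping between occupied and unoccupied, chosen at random); the function name and parameter values are hypothetical and the project's own simulation may differ.

```r
# Simulate Jaccard dissimilarity for one parameter combination.
# n_occupied: initially occupied cells; n_total: total number of cells possible;
# n_changes: number of cells whose occupancy state flips between periods.
simulate_jaccard <- function(n_occupied, n_total, n_changes, n_sim = 1000) {
  stopifnot(n_occupied <= n_total, n_changes <= n_total)
  replicate(n_sim, {
    t1 <- c(rep(TRUE, n_occupied), rep(FALSE, n_total - n_occupied))
    t2 <- t1
    flipped <- sample.int(n_total, n_changes)  # cells that change state
    t2[flipped] <- !t2[flipped]
    union_size <- sum(t1 | t2)
    if (union_size == 0) NA_real_ else 1 - sum(t1 & t2) / union_size
  })
}

set.seed(1)
# Same number of changes, but a sparse versus a widespread species.
sparse <- simulate_jaccard(n_occupied = 10, n_total = 100, n_changes = 10)
common <- simulate_jaccard(n_occupied = 90, n_total = 100, n_changes = 10)
range(sparse)
range(common)
```

Under this flipping scheme, with g newly gained cells, the dissimilarity equals n_changes / (n_occupied + g). The sparse species can therefore only take values between 0.5 and 1, while the widespread species is confined to values between 0.1 and about 0.11.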
Train model with
1. ‘all data’ and
2. subsets for each datasetID (‘split data’) using random forest
Extract for all three responses:
1. rsq, rmse,
2. hyper-parameters,
3. predictions,
4. variable importance,
5. interactions,
6. partial dependence plots
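
As a reminder of what the first two extracted metrics measure, here is a minimal base-R sketch. It assumes rsq is the squared Pearson correlation between observed and predicted values; the README does not state which definition is used, and the example values are made up.

```r
# Root mean squared error between observed and predicted responses.
rmse <- function(observed, predicted) sqrt(mean((observed - predicted)^2))

# R squared, assumed here to be the squared correlation of observed and predicted.
rsq <- function(observed, predicted) cor(observed, predicted)^2

# Hypothetical observed and predicted values for four species.
observed  <- c(0.2, 0.5, 0.9, 0.4)
predicted <- c(0.3, 0.4, 0.8, 0.5)

model_rmse <- rmse(observed, predicted)  # every error has magnitude 0.1
model_rsq  <- rsq(observed, predicted)
```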
Test for phylogenetic autocorrelation for each datasetID in the model residuals (and the raw data)
Calculate responses from the third atlas replication (Czechia, Japan); use predictors calculated from the second period to predict responses for the third period, then compute the residuals

B) Modeling settings:

80/20 split (80% training, 20% testing)
3x repeated 10-fold cross validation
permutation importance (not impurity)
always split variables: datasetID
respect unordered factors = TRUE
Bayesian hyperparameter tuning:
- mtry = 2-10
- min_n = 5-15
- trees = 1000-5000
- initial values = 5
- iterations = 50
- no improve = 10
- set a seed.
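
The settings above could be expressed roughly as follows. This is a sketch, assuming the tidymodels packages with the ranger engine (the names mtry, min_n and trees suggest this, but the README does not confirm it). It also assumes that cross validation is run on the training set; `dat`, `fit_tuned_forest` and the seed value are hypothetical. The function is only defined, not run, because tuning is slow.

```r
library(tidymodels)  # assumed; attaches rsample, parsnip, workflows, tune, dials, yardstick

# `dat`: hypothetical data frame holding one response (here Jaccard_dissimilarity),
# the predictors, and the factor datasetID.
fit_tuned_forest <- function(dat, seed = 123) {
  set.seed(seed)
  data_split <- initial_split(dat, prop = 0.8)                  # 80/20 split
  folds <- vfold_cv(training(data_split), v = 10, repeats = 3)  # 3x repeated 10-fold CV

  rf_spec <- rand_forest(mtry = tune(), min_n = tune(), trees = tune()) |>
    set_engine(
      "ranger",
      importance = "permutation",            # permutation, not impurity
      always.split.variables = "datasetID",
      respect.unordered.factors = TRUE
    ) |>
    set_mode("regression")

  wf <- workflow() |>
    add_formula(Jaccard_dissimilarity ~ .) |>
    add_model(rf_spec)

  # Tuning ranges as listed in the settings above.
  params <- extract_parameter_set_dials(wf) |>
    update(
      mtry  = mtry(c(2L, 10L)),
      min_n = min_n(c(5L, 15L)),
      trees = trees(c(1000L, 5000L))
    )

  tune_bayes(
    wf,
    resamples  = folds,
    param_info = params,
    initial    = 5,    # initial values
    iter       = 50,   # iterations
    metrics    = metric_set(rmse, rsq),
    control    = control_bayes(no_improve = 10, seed = seed)
  )
}
```

For the ‘split data’ models, the same function could be applied to each datasetID subset, with the always-split setting dropped since datasetID is then constant.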

C) Data overview

Responses:

Jaccard_dissimilarity
log_R2_1 (log ratio between sampling period 2 and 1)
log_R2_1_per_year (log ratio between sampling period 2 and 1 divided by the number of years between sampling)

Notes:

The higher Jaccard_dissimilarity, the more variable log_R2_1 and log_R2_1_per_year
The smaller AOO, the more variable log_R2_1 and log_R2_1_per_year
The lower D (fractal dimension), the more variable (and more positive?) log_R2_1
The higher mean_lnLac (lacunarity), the more variable (and more positive?) log_R2_1
The higher mean_lnLac, the lower Jaccard_dissimilarity
Species in New York and Japan have higher Jaccard_dissimilarity than species in Czechia and Europe
