amRml is the machine learning (ML) engine of the amR suite (https://github.com/JRaviLab/amR) for antimicrobial resistance (AMR) prediction. It trains interpretable ML models to predict AMR from the multi‑scale genomic features prepared by amRdata. The package includes streamlined pipelines for:
- Single‑drug AMR prediction
- Multi‑drug resistance (MDR) prediction
- Stratified and cross‑testing workflows (year, country)
- Leave‑one‑out (LOO) generalization tests
- Baseline shuffled‑label controls
- PCA and variable‑importance‑based feature reduction
All pipelines support reproducible, parallelized model execution.
amRml produces ML‑ready tibbles, trains logistic regression models, and exports clean performance summaries, feature rankings, predictions, and tuned model objects.
amRml provides functions to:
- Convert multi‑scale genomic feature Parquets into ML‑ready tibbles
- Train interpretable AMR classifiers with tidymodels
- Perform robust evaluation using MCC, F1, AuPRC, balanced accuracy, and confusion matrices
- Extract top predictive features to connect model performance with the underlying biology
The models are optimized for reproducibility, interpretability, and benchmarking across species and antibiotic classes. See the introductory article (https://jravilab.github.io/amRml/articles/intro.html) for full examples.
# Install from GitHub
if (!requireNamespace("remotes", quietly = TRUE))
  install.packages("remotes")
remotes::install_github("JRaviLab/amRml")

generateMLInputs() converts the Parquet‑backed DuckDB file created by
amRdata into a standard set of ready-to-model matrices.
library(amRml)
generateMLInputs(
  parquet_duckdb_path = "data/Shigella_flexneri/Sfl_parquet.duckdb",
  out_path = "results/Shigella_flexneri/",
  n_fold = 5,
  verbosity = "minimal"
)

This analyzes the Parquet tables attached to the
{Bug}_parquet.duckdb database created by the amRdata::cleanData() step
at the end of the amRdata::runDataProcessing() workflow. For each drug
or drug class, it evaluates how many isolates have paired genotype and
AMR phenotype data, and produces subsetted ML-ready matrices for each
feature scale.
The result is every modeling matrix that can be created for a given dataset, ready for input into the ML modeling workflow.
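The pairing step described above can be pictured with a toy example. This is only a sketch in base R, not amRml's internal code: the tables, isolate IDs, and the `min_isolates` cutoff are all made up for illustration.

```r
# Toy genotype table: one row per isolate that has feature data (hypothetical IDs).
genotype <- data.frame(
  isolate = c("iso1", "iso2", "iso3", "iso4", "iso5"),
  stringsAsFactors = FALSE
)

# Toy phenotype table: one row per isolate-drug test result.
phenotype <- data.frame(
  isolate = c("iso1", "iso2", "iso3", "iso1", "iso6"),
  drug    = c("ampicillin", "ampicillin", "ampicillin",
              "ciprofloxacin", "ciprofloxacin"),
  label   = c("R", "S", "R", "S", "R"),
  stringsAsFactors = FALSE
)

# Keep only phenotype rows whose isolate also has genotype data.
paired <- merge(phenotype, genotype, by = "isolate")

# Count paired isolates per drug.
paired_counts <- table(paired$drug)

# ASSUMPTION: a hypothetical cutoff; drugs below it would get no modeling matrix.
min_isolates <- 2
modelable_drugs <- names(paired_counts)[as.vector(paired_counts) >= min_isolates]

print(paired_counts)
print(modelable_drugs)
```

Here iso6 has a phenotype but no genotype, so it is dropped before counting.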
runMLmodels() executes logistic regression ML models across all
prepared matrices.
runMLmodels(
  path = "results/Shigella_flexneri/",
  threads = 16
)

This produces four output directories:
- performance: performance metrics per bug-drug-feature_scale model
- pred: the specific predictions made per model
- top_features: the specific features that drove predictions per model
- models: the model fits themselves in .Rds format
These models can also be adjusted with parameters like
shuffle_labels = TRUE (creates random baseline models to compare
against), use_pca = TRUE (to reduce feature space by using PCs as
features), stratify_by = "country" (to stratify samples into countries
of origin to identify regional trends in AMR and how well models
generalize), and many other options.
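To make these options more concrete, the sketch below shows in base R what a shuffled-label baseline, PCA-based reduction, and a country-wise split could look like on toy data. It is an illustration under stated assumptions, not how amRml implements them: the data are random, the choice of three components is arbitrary, and the leave-one-country-out split is just one plausible reading of stratification by country.

```r
set.seed(42)  # make the toy example reproducible

# Toy data: 12 isolates, 6 numeric features, a label, and a country of origin.
n <- 12
features <- matrix(rnorm(n * 6), nrow = n,
                   dimnames = list(NULL, paste0("feature_", 1:6)))
labels  <- rep(c("R", "S"), each = n / 2)
country <- rep(c("India", "Kenya", "Peru"), times = n / 3)

# Shuffled-label baseline: permute the labels so any feature-label link is
# broken while the class balance stays the same.
shuffled_labels <- sample(labels)

# PCA-based reduction: replace the 6 features with the first 3 principal
# components (3 is an arbitrary choice for this sketch).
pca <- prcomp(features, center = TRUE, scale. = FALSE)
pc_features <- pca$x[, 1:3]

# ASSUMPTION: hold out each country in turn as the test set.
splits <- lapply(unique(country), function(held_out) {
  list(
    held_out = held_out,
    train    = which(country != held_out),
    test     = which(country == held_out)
  )
})

table(shuffled_labels)
dim(pc_features)
```

A model trained on `shuffled_labels` should perform near chance, which gives a floor against which real models can be compared.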
runMDRmodels() trains models that predict aggregated multi‑drug
resistance phenotypes.
runMDRmodels(
  path = "results/Shigella_flexneri/",
  threads = 16
)

This uses specific matrices to test whether ML models can predict resistance against multiple drug classes, and to identify any features associated with MDR.
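As an illustration of what an aggregated MDR phenotype might look like, the sketch below derives a binary MDR label from per-class resistance calls. The drug classes, isolate IDs, and the threshold of three classes are assumptions for this example; the definition amRml actually uses may differ.

```r
# Toy per-isolate resistance calls by drug class (TRUE = resistant).
resistance <- data.frame(
  isolate         = c("iso1", "iso2", "iso3", "iso4"),
  beta_lactam     = c(TRUE,  TRUE,  FALSE, TRUE),
  fluoroquinolone = c(TRUE,  FALSE, FALSE, TRUE),
  aminoglycoside  = c(TRUE,  FALSE, FALSE, FALSE),
  tetracycline    = c(FALSE, TRUE,  FALSE, TRUE)
)

# Count the number of drug classes each isolate is resistant to.
class_cols <- setdiff(names(resistance), "isolate")
n_resistant_classes <- rowSums(as.matrix(resistance[, class_cols]))

# ASSUMPTION: call an isolate MDR if it is resistant to at least 3 classes.
mdr_threshold <- 3
resistance$mdr <- unname(n_resistant_classes >= mdr_threshold)

print(resistance[, c("isolate", "mdr")])
```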
- Data preparation: Load Parquet files and prepare ML-ready datasets
- Model training: User-customizable logistic regression via tidymodels
- Evaluation: nMCC, F1, balanced accuracy, AuPRC, and confusion matrices
- Feature importance: Extract and rank predictive features
See the package vignette for detailed usage.
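For readers unfamiliar with the evaluation metrics listed above, here is a minimal base R sketch that computes several of them from confusion-matrix counts. The function name `amr_metrics` is hypothetical and not part of amRml, and nMCC is assumed here to mean MCC rescaled to the 0 to 1 range. AuPRC is omitted because it needs predicted probabilities rather than counts.

```r
# Compute evaluation metrics from binary confusion-matrix counts.
# tp/tn/fp/fn = true positives, true negatives, false positives, false negatives.
amr_metrics <- function(tp, tn, fp, fn) {
  # Convert to double to avoid integer overflow in the MCC denominator.
  tp <- as.numeric(tp); tn <- as.numeric(tn)
  fp <- as.numeric(fp); fn <- as.numeric(fn)

  denom <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  # By convention, MCC is set to 0 when any marginal total is zero.
  mcc <- if (denom == 0) 0 else (tp * tn - fp * fn) / denom

  sensitivity <- tp / (tp + fn)
  specificity <- tn / (tn + fp)

  c(
    mcc               = mcc,
    nmcc              = (mcc + 1) / 2,  # ASSUMPTION: MCC rescaled from [-1, 1] to [0, 1]
    f1                = 2 * tp / (2 * tp + fp + fn),
    balanced_accuracy = (sensitivity + specificity) / 2
  )
}

# Example with made-up counts.
amr_metrics(tp = 40, tn = 45, fp = 5, fn = 10)
```

MCC and balanced accuracy are often preferred over plain accuracy for AMR data because resistant and susceptible isolates are frequently imbalanced.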
amRml is designed to work seamlessly with other amR packages:
library(amRdata)
library(amRml)
library(amRshiny)
# 1. Curate data
prepareGenomes("Shigella flexneri")
runDataProcessing("amRdata/data/Shigella_flexneri/Sfl.duckdb")
# 2. Train models
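generateMLInputs(
  parquet_duckdb_path = "amRdata/data/Shigella_flexneri/Sfl_parquet.duckdb",
  out_path = "results/Shigella_flexneri/",
  n_fold = 5,
  verbosity = "minimal"
)
runMLmodels(path = "results/Shigella_flexneri/")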
# 3. Visualize
launchAMRDashboard()

If you use amRml in your research, please cite:
Brenner E, Ghosh A, Wolfe E, Boyer E, Vang C, Lesiyon R, Mayer D, Ravi J. (2026).
amR: an R package suite to predict antimicrobial resistance in bacterial pathogens.
R package version 0.99.0.
https://github.com/JRaviLab/amR
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
Report bugs and request features at: https://github.com/JRaviLab/amRml/issues
BSD 3-Clause License. See LICENSE for details.
Corresponding author: Janani Ravi (janani.ravi@cuanschutz.edu)
Lab website: https://jravilab.github.io
Please note that amRml is released with a Contributor Code of
Conduct.
By contributing to this project, you agree to abide by its terms.