JeffersonConza/microplastics-ml-research

🌊 Microplastics ML Research: Rusanda Peloid ATR-FTIR

Machine Learning pipeline for the identification and classification of microplastics using ATR-FTIR spectral data.

This repository contains a research-oriented machine learning pipeline for analyzing, classifying, and interpreting ATR-FTIR spectra of microplastic candidates, combining environmental science with data-driven methods.
The project is designed to be modular, reproducible, and extensible, and it supports both exploratory analysis and interpretable ML models.

📌 Project Motivation

Microplastics (<5 mm plastic particles) are an emerging global environmental threat due to their persistence, widespread distribution, and potential impacts on ecosystems and human health.

This project aims to:

  • Apply machine learning models to microplastics datasets
  • Improve classification accuracy and interpretability
  • Support environmental decision-making through explainable AI
  • Provide a research-grade codebase suitable for extension and publication

🎯 Objectives

  • Detect and classify microplastics based on measured features
  • Explore environmental and chemical patterns in the data
  • Train and evaluate ML models with rigorous metrics
  • Interpret predictions using SHAP and other explainability tools
  • Assess annotation reliability via inter-annotator agreement (Cohen's Kappa)

🗂 Repository Structure

microplastics-ml-research/
│
├── data/                    # Raw and processed datasets
├── models/                  # Trained models and serialized artifacts
├── notebooks/               # Jupyter notebooks (EDA, experiments)
├── reports/                 # Figures, tables, and generated reports
├── src/                     # Core source code (features, utils, models)
│
├── config.py                # Global configuration (paths, parameters)
├── main.py                  # Main execution entry point
│
├── run_training.py          # Model training pipeline
├── run_exploration.py       # Exploratory data analysis
├── run_shap.py              # SHAP explainability analysis
├── run_interpretability.py  # Model interpretation tools
├── run_kappa_evaluation.py  # Cohen's Kappa agreement analysis
├── run_chemical_interpretation.py  # Chemical/environmental analysis
│
├── requirements.txt         # Python dependencies
└── README.md                # Project documentation

📊 Dataset

Source

This project uses ATR-FTIR spectroscopy data from the Rusanda Lake peloid study (Serbia):

Prosenc, F. (2024). Identification of microplastics isolated from Rusanda peloid (Serbia) with ATR-FTIR [Data set]. Zenodo.
📍 DOI: 10.5281/zenodo.13850936

The dataset contains:

  • Unprocessed FTIR spectra from isolated microplastic candidates
  • Spectra processed with OpenSpecy 1.0
  • Library matches for polymer identification
  • Microplastics categorized by morphology: fibers, films, and fragments

Data Structure

ATR-FTIR spectra_Rusanda sample/
├── OpenSpecy_Rusanda sample library matches/
│   ├── Fibres top matches.csv
│   ├── Films top matches.csv
│   └── Fragments top matches.csv
└── Processed FTIR spectra with OpenSpecy_Rusanda sample/
    ├── BLATO TOPLICE_fiber_1_processed.csv
    ├── BLATO TOPLICE_fiber_2_processed.csv
    └── ...
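
The two processed file names listed above suggest a `<prefix>_<morphology>_<index>_processed.csv` naming pattern. The following is a minimal sketch of recovering labels from such names, assuming the remaining files follow the same pattern. `parse_processed_name` and `SpectrumLabel` are hypothetical names for illustration and are not taken from the repository.

```python
import re
from typing import NamedTuple, Optional


class SpectrumLabel(NamedTuple):
    prefix: str       # text before the morphology, e.g. "BLATO TOPLICE"
    morphology: str   # e.g. "fiber"
    index: int        # running number of the particle


# Pattern inferred from the two example file names only (an assumption).
_PATTERN = re.compile(
    r"^(?P<prefix>.+)_(?P<morphology>[A-Za-z]+)_(?P<index>\d+)_processed\.csv$"
)


def parse_processed_name(filename: str) -> Optional[SpectrumLabel]:
    """Return the parsed label, or None if the name does not fit the pattern."""
    match = _PATTERN.match(filename)
    if match is None:
        return None
    return SpectrumLabel(
        prefix=match["prefix"],
        morphology=match["morphology"].lower(),
        index=int(match["index"]),
    )


label = parse_processed_name("BLATO TOPLICE_fiber_1_processed.csv")
print(label)
```

Returning `None` for names that do not match keeps unexpected files visible to the caller instead of silently mislabeling them.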

Features Analyzed

  • Spectral signatures (wavenumber vs. absorbance)
  • Particle morphology (fiber, fragment, film)
  • Polymer type (via library matching: PE, PP, PET, PS, etc.)
  • Match confidence scores from OpenSpecy
  • Chemical functional groups from FTIR peaks
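
As a rough illustration of how spectral peaks can be mapped to functional groups, the sketch below reads a spectrum, finds simple local maxima, and labels them by band. It rests on several assumptions: the column names `wavenumber` and `intensity`, the approximate textbook band ranges, the peak threshold, and the synthetic data are illustrative and may not match the repository's processed files or its own feature extraction.

```python
import csv
import io

# Approximate textbook mid-IR ranges in cm^-1 (illustrative assumption).
FUNCTIONAL_GROUP_BANDS = {
    "O-H / N-H stretch": (3200.0, 3600.0),
    "C-H stretch": (2800.0, 3000.0),
    "C=O stretch": (1650.0, 1800.0),
}


def read_spectrum(handle):
    """Read (wavenumbers, intensities) from a CSV with the assumed header."""
    wavenumbers, intensities = [], []
    for row in csv.DictReader(handle):
        wavenumbers.append(float(row["wavenumber"]))
        intensities.append(float(row["intensity"]))
    return wavenumbers, intensities


def find_peaks(wavenumbers, intensities, min_height=0.5):
    """Return wavenumbers of points higher than both neighbours and >= min_height."""
    peaks = []
    for i in range(1, len(intensities) - 1):
        is_local_max = intensities[i - 1] < intensities[i] > intensities[i + 1]
        if is_local_max and intensities[i] >= min_height:
            peaks.append(wavenumbers[i])
    return peaks


def assign_groups(peaks):
    """Return the sorted names of all bands that contain at least one peak."""
    found = set()
    for peak in peaks:
        for name, (low, high) in FUNCTIONAL_GROUP_BANDS.items():
            if low <= peak <= high:
                found.add(name)
    return sorted(found)


# Synthetic spectrum for demonstration; not taken from the dataset.
SYNTHETIC_CSV = """wavenumber,intensity
2800,0.10
2850,0.60
2900,0.30
2915,1.00
2950,0.20
1700,0.05
1715,0.80
1730,0.10
1000,0.02
"""

wavenumbers, intensities = read_spectrum(io.StringIO(SYNTHETIC_CSV))
peaks = find_peaks(wavenumbers, intensities)
groups = assign_groups(peaks)
print(peaks, groups)
```

Real spectra are noisy, so a production pipeline would normally smooth the signal and use a more robust peak detector before assigning bands.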

License

The dataset is available under Creative Commons Attribution 4.0 International and Community Data License Agreement Permissive 1.0, allowing research and educational use with proper attribution.


🤖 Machine Learning Models

The project supports classical and interpretable ML approaches, such as:

  • Random Forest
  • Gradient Boosting
  • Logistic Regression
  • Other tree-based classifiers

Why these models?

  • Robust performance on tabular scientific data
  • Compatibility with explainability methods
  • Lower risk of overfitting on limited datasets
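
To make the small-dataset setting concrete, here is a minimal sketch of training a Random Forest with stratified cross-validation using scikit-learn. The data is synthetic (two classes with a Gaussian band at different positions), and `RANDOM_SEED`, the class sizes, and the hyperparameters are assumptions for illustration. It does not reproduce `run_training.py`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

RANDOM_SEED = 42  # hypothetical value; the project's seed may differ
rng = np.random.default_rng(RANDOM_SEED)

n_per_class, n_points = 30, 50
grid = np.linspace(0.0, 1.0, n_points)  # stand-in for a wavenumber axis


def make_class(center):
    """Noisy synthetic 'spectra' with one Gaussian band at the given position."""
    band = np.exp(-((grid - center) ** 2) / 0.005)
    return band + 0.1 * rng.normal(size=(n_per_class, n_points))


X = np.vstack([make_class(0.3), make_class(0.7)])
y = np.array([0] * n_per_class + [1] * n_per_class)

model = RandomForestClassifier(n_estimators=100, random_state=RANDOM_SEED)
# Stratified folds keep the class balance in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=RANDOM_SEED)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"mean accuracy: {scores.mean():.3f}")
```

Cross-validation is used here instead of a single train/test split because a single split gives a high-variance estimate when only a few dozen spectra are available.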

📈 Evaluation Metrics

  • Accuracy
  • Precision / Recall / F1-score
  • Confusion Matrix
  • Cohen's Kappa (for label agreement and reliability)
  • Feature importance and SHAP values
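
For reference, Cohen's Kappa compares the observed agreement p_o between two label sets with the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below implements this and a per-class precision/recall/F1 in plain Python on made-up labels. The function names are hypothetical, and the repository may instead rely on a library routine such as scikit-learn's `cohen_kappa_score`.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: sum over classes of the product of marginal frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (observed - expected) / (1.0 - expected)


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class treated as the positive label."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up polymer labels from two sources (for illustration only).
source_a = ["PE"] * 5 + ["PP"] * 5
source_b = ["PE"] * 4 + ["PP"] * 1 + ["PP"] * 4 + ["PE"] * 1

kappa = cohen_kappa(source_a, source_b)
precision, recall, f1 = precision_recall_f1(source_a, source_b, positive="PE")
print(round(kappa, 3), round(precision, 3), round(recall, 3), round(f1, 3))
```

In this example the two sources agree on 8 of 10 labels (p_o = 0.8) while chance agreement is 0.5, so kappa is about 0.6, which is lower than the raw agreement.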

🔍 Model Interpretability

To ensure scientific transparency, the project includes:

  • SHAP (SHapley Additive exPlanations)
  • Feature importance analysis
  • Chemical and environmental interpretation of predictions

This allows linking model behavior to physical and chemical meaning, which is critical for environmental research.
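
SHAP is built on Shapley values, which split the difference between a prediction and a baseline prediction across the input features. The sketch below computes exact Shapley values by enumerating all feature coalitions for a toy linear model. It does not use the `shap` package and is not the code in `run_shap.py`. The model, weights, and baseline are hypothetical, and replacing "absent" features with baseline values is one common convention among several.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values; absent features take their baseline value.

    The cost grows exponentially with the number of features, so this is
    only practical for small illustrations.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


# Hypothetical linear model over three features (e.g. three band intensities).
WEIGHTS = [2.0, -1.0, 0.5]
BIAS = 0.1


def predict(features):
    return BIAS + sum(w * v for w, v in zip(WEIGHTS, features))


x = [1.0, 3.0, 2.0]
baseline = [0.0, 1.0, 0.0]
phi = shapley_values(predict, x, baseline)
print(phi)
```

For a linear model each value reduces to `weight * (x_i - baseline_i)`, and the values sum to `predict(x) - predict(baseline)`, which makes this a convenient sanity check. Practical SHAP explainers rely on faster model-specific or sampling-based algorithms instead of full enumeration.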


🚀 How to Run

1️⃣ Clone the repository

git clone https://github.com/JeffersonConza/microplastics-ml-research.git
cd microplastics-ml-research

2️⃣ Create environment & install dependencies

python -m venv .venv
source .venv/bin/activate    # on Windows: .venv\Scripts\activate
pip install -r requirements.txt

3️⃣ Run main workflows

Exploratory Data Analysis

python run_exploration.py

Train models

python run_training.py

Evaluate inter-annotator agreement

python run_kappa_evaluation.py

Model interpretability (SHAP)

python run_shap.py

Chemical interpretation

python run_chemical_interpretation.py

🧪 Reproducibility

  • Fixed random seeds (where applicable)
  • Config-driven parameters via config.py
  • Modular scripts for independent execution
  • Clear separation between data, models, and analysis
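
One way these practices can fit together is sketched below: a frozen configuration object holds paths and parameters, and a seeded random generator makes a data split repeatable. The field names, default values, and `seeded_shuffle_split` are hypothetical. Only the `data/`, `models/`, and `reports/` directory names come from the repository layout, and the real `config.py` may be organized differently.

```python
import random
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Config:
    """Hypothetical settings; the real config.py may use other names."""
    data_dir: Path = Path("data")
    models_dir: Path = Path("models")
    reports_dir: Path = Path("reports")
    random_seed: int = 42
    test_size: float = 0.2


def seeded_shuffle_split(items, config):
    """Shuffle with a local seeded RNG and split into (train, test) lists."""
    rng = random.Random(config.random_seed)  # does not touch global RNG state
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * config.test_size))
    return shuffled[n_test:], shuffled[:n_test]


config = Config()
train_a, test_a = seeded_shuffle_split(range(10), config)
train_b, test_b = seeded_shuffle_split(range(10), config)
print(train_a == train_b and test_a == test_b)
```

Using a local `random.Random` instance instead of seeding the global generator means that independently executed scripts cannot disturb each other's random state.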

📚 Scientific Relevance

This project is suitable for:

  • Environmental data science research
  • Microplastics monitoring studies
  • Explainable AI applications in ecology
  • Academic theses and peer-reviewed publications

🔮 Future Work

  • Deep learning models for spectral data
  • Automated microplastics image classification
  • Integration with GIS and spatial analysis
  • Longitudinal environmental trend analysis
  • Risk assessment and policy-oriented outputs

👤 Author

Jefferson Conza
Mathematics Student | Machine Learning & Data Science
GitHub: @JeffersonConza


📜 License

This project is intended for research and educational purposes.
Please cite appropriately if used in academic work.


⭐ Acknowledgments

  • Environmental microplastics research community
  • Open-source Python scientific ecosystem
  • Contributors to interpretable machine learning tools

About

Research project for the detection, classification, and environmental modelling of microplastics.
