Ovarian Cancer Immune Repertoire Analysis

This project analyzes immune repertoire data obtained from blood samples of healthy donors and ovarian cancer patients. The data include T-cell receptor alpha (TRA) and beta (TRB) sequences obtained through Repertoire Sequencing (Rep-Seq). The analysis covers data preprocessing, feature filtering, and machine learning for classification.

This repository accompanies the publication:

Zuckerbrot-Schuldenfrei, M., et al.
"Ovarian cancer is detectable from peripheral blood using machine learning over T-cell receptor repertoires"
Briefings in Bioinformatics, 2024.
https://doi.org/10.1093/bib/bbae075

Data Collection and Processing

Blood Collection:
- Blood samples were collected from healthy donors and ovarian cancer patients.
Repertoire Sequencing (Rep-Seq):
- Repertoire Sequencing was performed on the collected blood samples to obtain TRA and TRB sequences.
Data Processing with MiXCR:
- The raw sequencing data were processed with the MiXCR pipeline.
Concatenation of TRA and TRB Files:
- The processed TRA and TRB files were concatenated for further analysis.
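The concatenation step can be illustrated with a minimal pandas sketch. It is not the code used in this project: the column names (cdr3_aa, clone_count, chain) and the toy rows are hypothetical stand-ins for per-sample MiXCR exports, and the actual files may be laid out differently.

```python
import pandas as pd


def concat_chains(tra: pd.DataFrame, trb: pd.DataFrame) -> pd.DataFrame:
    """Stack TRA and TRB clonotype tables, tagging each row with its chain."""
    tra = tra.assign(chain="TRA")  # assign returns a copy, inputs stay untouched
    trb = trb.assign(chain="TRB")
    return pd.concat([tra, trb], ignore_index=True)


# Toy tables standing in for processed per-sample files (hypothetical columns).
tra = pd.DataFrame(
    {"cdr3_aa": ["CAVRDSNYQLIW", "CAASGGSYIPTF"], "clone_count": [12, 3]}
)
trb = pd.DataFrame({"cdr3_aa": ["CASSLGQAYEQYF"], "clone_count": [7]})

combined = concat_chains(tra, trb)
print(combined)
```

Keeping an explicit chain column means the two chains can still be separated after concatenation.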

Data Analysis

Immunarch Analysis

immunarch_analysis.Rmd:
- This R Markdown document performs subsampling on the concatenated data and conducts basic analyses using the immunarch package.
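For readers unfamiliar with repertoire subsampling, the idea can be sketched in Python as drawing a fixed number of reads without replacement from a sample's clone counts, so that samples of different sequencing depth become comparable. This is an illustration under assumptions, not the immunarch implementation used in immunarch_analysis.Rmd; the function name downsample and the example counts are hypothetical.

```python
import numpy as np


def downsample(clone_counts, n_reads, seed=0):
    """Draw n_reads reads without replacement from a vector of clone counts.

    Returns the per-clone counts of the subsample, in the input order.
    """
    counts = np.asarray(clone_counts, dtype=np.int64)
    if n_reads > counts.sum():
        raise ValueError("cannot draw more reads than the sample contains")
    rng = np.random.default_rng(seed)
    # Sampling reads without replacement follows a multivariate
    # hypergeometric distribution over the clones.
    return rng.multivariate_hypergeometric(counts, n_reads)


original = [500, 120, 30, 5, 1]  # hypothetical clone counts for one sample
sub = downsample(original, n_reads=100)
print(sub, sub.sum())
```

Small clones may drop to zero in the subsample, which is expected when depth is reduced.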

Feature Filtering

feature_filtering.Rmd:
- This R Markdown document implements two different methods for feature filtering on the concatenated data.

Machine Learning

ML_atom_SFM_600f.ipynb and ML_atom_SFM_16f.ipynb:
- These Jupyter notebooks contain the Python code for machine learning.
- They take the data produced by feature_filtering.Rmd as input.
- Machine learning algorithms are applied for classification using the atom package.
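The general shape of such a classification workflow can be sketched as follows. Several things here are assumptions rather than facts about the notebooks: the sketch uses scikit-learn directly instead of atom, it runs on synthetic data instead of the repertoire features, it reads "SFM" in the notebook names as model-based feature selection (scikit-learn's SelectFromModel), and the choice of estimators and of 600 input and 16 selected features is only suggested by the file names.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for a samples-by-features matrix (not the project data).
X, y = make_classification(
    n_samples=120, n_features=600, n_informative=16, random_state=0
)

# Keep the 16 features ranked highest by random forest importance.
# threshold=-np.inf makes max_features the only selection criterion.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=200, random_state=0),
    max_features=16,
    threshold=-np.inf,
)
pipeline = make_pipeline(selector, LogisticRegression(max_iter=1000))

# Selection sits inside the pipeline, so it is refit on each training fold
# and never sees the held-out samples.
scores = cross_val_score(pipeline, X, y, cv=5, scoring="roc_auc")
print("mean ROC AUC:", scores.mean())
```

Placing the selector inside the pipeline is a deliberate choice in this sketch: selecting features on the full dataset before cross-validation would leak information from the test folds.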

Usage

The analysis can be reproduced by following the steps outlined in each analysis script.
Ensure that the required dependencies (such as R packages and Python libraries) are installed.
The project was developed and tested with the following package versions:
- Python: 3.8.13
- NumPy: 1.21.2
- Pandas: 1.4.1
- Scikit-learn: 1.0.2
- atom-ml: 4.12.0
To reproduce the Python environment, run: conda env create -f environment.yml
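After creating the environment, the installed versions can be compared with the ones listed above. The following is a small optional helper, not part of the repository; the function name check_versions is hypothetical, and the package names are assumed to match their PyPI distribution names.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this README.
EXPECTED = {
    "numpy": "1.21.2",
    "pandas": "1.4.1",
    "scikit-learn": "1.0.2",
    "atom-ml": "4.12.0",
}


def check_versions(expected, lookup=version):
    """Map each package to (wanted, found, matches); found is None if absent."""
    report = {}
    for name, wanted in expected.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            found = None
        report[name] = (wanted, found, found == wanted)
    return report


for name, (wanted, found, ok) in check_versions(EXPECTED).items():
    status = "OK" if ok else "MISMATCH"
    print(f"{name}: expected {wanted}, found {found} [{status}]")
```

A mismatch does not necessarily break the analysis, but matching versions are the safest way to reproduce the results.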

Data Availability

The Rep-Seq data were deposited in the NCBI BioProject database under accession number PRJNA1152888.
The data produced by the second feature filtering method in feature_filtering.Rmd are available in the data_16_ab.csv file. All analyses of these data can be reproduced with ML_atom_SFM_16f.ipynb.
