munaberhe/README.md

Hi, I’m Muna 👋

I’m an MSc Bioinformatics student interested in human genetics, rare disease and machine learning for genomics. Long term, I’d like to work as a bioinformatician in the pharmaceutical or biotech industry, building tools that support target discovery, biomarker development and patient stratification.

What I’m into

  • Human genetics and rare disease
  • Single-cell and bulk RNA-seq analysis
  • Variant interpretation and clinical genomics
  • Applying machine learning / NLP to biological data

What I work with

  • Languages / tools: Python, R, Git, Linux
  • Python: pandas, scikit-learn, Scanpy, matplotlib
  • R / Bioconductor: DESeq2, clusterProfiler
  • Domains: scRNA-seq, RNA-seq, variant analysis, phenotype–disease matching

At a glance

On GitHub I tend to build small, reproducible analysis pipelines around:

  • single-cell RNA-seq (cell type classification / label transfer),
  • bulk RNA-seq differential expression and pathway analysis,
  • phenotype–disease matching and LLM-style reasoning over clinical text,
  • variant-level modelling and pathogenicity prediction.

As my MSc project grows, I’ll keep using this space to collect the workflows and experiments I enjoy most, especially those at the intersection of bioinformatics, statistics and translational genomics in pharma.

Pinned

  1. scRNA_label_transfer_benchmark (Public)

    Benchmarking kNN, RandomForest and Scanpy ingest for cell type label transfer on PBMC single-cell RNA-seq data.

    Python

  2. phenotype_disease_matching (Public)

    Toy benchmark of BM25, TF-IDF and an LLM baseline for phenotype–disease matching, supporting an MSc project on LLMs for genomic diagnosis.

    Python

  3. rnaseq_deseq2_pathway (Public)

    DESeq2 and clusterProfiler pipeline for differential expression and GO enrichment on the airway RNA-seq dataset.

    R

  4. variant_pathogenicity_classifier (Public)

    ClinVar-style toy project that trains and evaluates a RandomForest classifier to predict variant pathogenicity from gene, consequence, impact, PolyPhen-like score, and allele frequency.

    Python

  5. exomiser_llm_benchmark (Public)

    Experimental benchmark for rare disease prioritisation tools, contrasting algorithmic (Exomiser-like) and LLM-based approaches.

    Python

  6. rare_disease_genomics_toolkit (Public)

    Modular Python toolkit for rare disease case prioritisation, integrating variant pathogenicity models and phenotype–disease similarity scoring for translational genomics applications. This toolkit …

    Python