Organizations

@open-thought

zafstojano/README.md

👨🏻‍🔬 Research Interests

Reinforcement Learning

Continual Learning

  • Worked on mitigating catastrophic forgetting in foundation models using continual weight interpolation. Our NeurIPS workshop publication shows performance close to the upper bound set by jointly training on all data.
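
To make the idea concrete, here is a minimal sketch of generic linear weight interpolation applied sequentially across tasks. It is an illustration under stated assumptions, not the method from the publication: the function names, the plain-dict weight format, and the single fixed coefficient `alpha` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical weight format: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def interpolate_weights(old: Weights, new: Weights, alpha: float) -> Weights:
    """Return (1 - alpha) * old + alpha * new, parameter by parameter."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if old.keys() != new.keys():
        raise ValueError("both weight sets must share the same parameter names")
    return {
        name: [(1.0 - alpha) * o + alpha * n for o, n in zip(old[name], new[name])]
        for name in old
    }


def continual_interpolation(
    initial: Weights, task_weights: List[Weights], alpha: float
) -> Weights:
    """Fold each task's fine-tuned weights into a running interpolated model.

    Assumption: every task contributes with the same coefficient alpha.
    """
    merged = initial
    for weights in task_weights:
        merged = interpolate_weights(merged, weights, alpha)
    return merged
```

With `alpha` close to 0 the merged model stays near the earlier weights (less forgetting, less adaptation). With `alpha` close to 1 it follows the most recent task.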

Evaluation

Healthcare and Life Sciences

  • Led a team to automate glomerular sclerosis classification from gigapixel kidney biopsies, deployed in a system serving over half of the Organ Procurement Organizations in the US.
  • Part of a team developing models to predict protein-ligand binding affinity from DNA-Encoded Library (DEL) data for drug discovery, resulting in numerous experimentally confirmed binders in the lab!

📄 Publications

My work is used by AI labs such as DeepMind [1, 2, 3, 4], Meta [5, 6, 7], NVIDIA [8, 9], Mila [10, 11, 12], and Prime Intellect [13].

Pinned

  1. open-thought/reasoning-gym (Public)

    [NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards

    Python · 1.3k stars · 113 forks

  2. natolambert/rlhf-book (Public)

    Textbook on reinforcement learning from human feedback

    Python · 1.5k stars · 130 forks

  3. EleutherAI/lm-evaluation-harness (Public)

    A framework for few-shot evaluation of language models.

    Python · 11.3k stars · 3k forks

  4. policy-gradients (Public)

    A minimal, hackable implementation of policy gradient methods (GRPO, PPO, REINFORCE)

    Python · 9 stars