Reinforcement Learning
- Core contributor to Reasoning Gym, where I built dozens of RL environments and ran the zero-shot, external-benchmark, and curriculum-learning experiments for our NeurIPS publication.
- Wrote several sections of the RLHF Book, deriving the policy gradient objective and the Bradley-Terry loss, and providing intuitions for the gradient dynamics of PPO.
Continual Learning
- Worked on mitigating catastrophic forgetting in foundation models via continual weight interpolation; our NeurIPS workshop publication demonstrates performance close to the upper bound of jointly training on all data.
Evaluation
- Contributed several datasets to EleutherAI’s Evaluation Harness (e.g. Lambada Translations, Paloma, LegalBench), and implemented higher-is-better indicators and tests for output-table consistency.
Healthcare and Life Sciences
- Led a team automating glomerular sclerosis classification from gigapixel kidney biopsies; the system is deployed and serves over half of the Organ Procurement Organizations in the US.
- Part of a team developing models that predict protein-ligand binding affinity from DNA-Encoded Library (DEL) data for drug discovery, resulting in numerous experimentally confirmed binders in the lab.
My work is used by AI labs such as DeepMind [1, 2, 3, 4], Meta [5, 6, 7], NVIDIA [8, 9], Mila [10, 11, 12], and Prime Intellect [13]:
- "Reasoning Gym: Reasoning Environments for Reinforcement Learning with Verifiable Rewards." Zafir Stojanovski*, Oliver Stanley*, Joe Sharratt*, Richard Jones*, Abdulhakeem Adefioye, Jean Kaddour, Andreas Köpf. NeurIPS 2025 (Spotlight)
- "Momentum-based Weight Interpolation of Strong Zero-Shot Models for Continual Learning." Zafir Stojanovski*, Karsten Roth*, Zeynep Akata. NeurIPS 2022 Interpolate Workshop (Best Paper Award)