
Hi 👋, I'm Ramshankar


Research Interests

  • GPU kernel optimization and low-level performance engineering
  • Model quantization and precision-efficient inference

Selected Work

ModelOpt

An automated neural network optimization framework that treats model compression as a sequential decision-making problem. Uses Monte Carlo Tree Search to navigate the combinatorial space of compression configurations, discovering optimal quantization and pruning strategies without requiring gradient-based fine-tuning.
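A minimal sketch of the idea, assuming a toy per-layer search space and a placeholder evaluate() reward (standing in for measured accuracy and compression on a calibration set); the node structure, action names, and UCB constant here are illustrative, not ModelOpt's actual API.

```python
import math
import random

# Illustrative search space: per-layer compression choices (hypothetical names and values).
ACTIONS = ["fp16", "int8", "int4", "prune_25", "prune_50"]
NUM_LAYERS = 4

def evaluate(config):
    # Placeholder reward standing in for "compression gain minus accuracy loss",
    # which the real framework would measure on the compressed model.
    gain = {"fp16": 0.0, "int8": 0.5, "int4": 0.75, "prune_25": 0.25, "prune_50": 0.5}
    loss = {"fp16": 0.0, "int8": 0.05, "int4": 0.3, "prune_25": 0.05, "prune_50": 0.25}
    g = sum(gain[a] for a in config) / len(config)
    l = sum(loss[a] for a in config) / len(config)
    return g - 2.0 * l + 0.05 * random.random()

class Node:
    def __init__(self, config=()):
        self.config, self.children = config, {}
        self.visits, self.value = 0, 0.0

def ucb(parent, child, c=1.4):
    # Upper-confidence bound: balance exploiting good branches and exploring rare ones.
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(math.log(parent.visits) / child.visits)

def search(iterations=500):
    root = Node()
    for _ in range(iterations):
        node, path = root, [root]
        # Selection / expansion: walk the tree until a new child is added
        # or a full per-layer configuration has been assembled.
        while len(node.config) < NUM_LAYERS:
            untried = [a for a in ACTIONS if a not in node.children]
            if untried:
                action = random.choice(untried)
                node.children[action] = Node(node.config + (action,))
                node = node.children[action]
                path.append(node)
                break
            node = max(node.children.values(), key=lambda ch: ucb(node, ch))
            path.append(node)
        # Rollout: complete the configuration randomly and score it.
        config = list(node.config) + random.choices(ACTIONS, k=NUM_LAYERS - len(node.config))
        reward = evaluate(config)
        # Backpropagation: update statistics along the visited path.
        for n in path:
            n.visits += 1
            n.value += reward
    # Read out the most-visited configuration.
    node, best = root, []
    while node.children:
        node = max(node.children.values(), key=lambda ch: ch.visits)
        best = list(node.config)
    return best

if __name__ == "__main__":
    print(search())
```

Because each rollout only requires evaluating a candidate configuration, the search needs no gradients; the tree statistics steer it toward layer-wise combinations that keep accuracy while compressing aggressively.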

BitSkip: Quantization × Early Exit Composition

Empirical analysis of composing quantization with early exit strategies for efficient inference. Investigates how reduced numerical precision interacts with adaptive computation depth—whether aggressive quantization degrades the confidence estimates that early exit relies on, and how to jointly optimize both techniques. Explores the Pareto frontier of latency, memory, and accuracy trade-offs.
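A minimal sketch of the interaction being studied, assuming a toy PyTorch MLP with a classifier head after every block and PyTorch's built-in dynamic int8 quantization; the architecture, threshold, and quantization choice are illustrative stand-ins, not BitSkip's actual setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitMLP(nn.Module):
    """Toy model with an exit head after each block (illustrative only)."""
    def __init__(self, dim=128, num_classes=10, depth=4):
        super().__init__()
        self.blocks = nn.ModuleList(nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(depth))
        self.heads = nn.ModuleList(nn.Linear(dim, num_classes) for _ in range(depth))

    def forward(self, x, threshold=0.9):
        for i, (block, head) in enumerate(zip(self.blocks, self.heads)):
            x = block(x)
            probs = F.softmax(head(x), dim=-1)
            conf, pred = probs.max(dim=-1)
            # Exit as soon as the head is confident enough; quantization error that
            # flattens these probabilities delays or prevents the exit.
            if bool((conf > threshold).all()):
                return pred, i
        return pred, len(self.blocks) - 1

model = EarlyExitMLP().eval()
# Dynamic int8 quantization of the linear layers, to probe how reduced precision
# shifts the exit depth for the same inputs.
qmodel = torch.ao.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 128)
with torch.no_grad():
    print("fp32 exit layer:", model(x)[1], "| int8 exit layer:", qmodel(x)[1])
```

Comparing exit depths between the fp32 and int8 models over a dataset is one way to trace the latency/memory/accuracy frontier the project maps out.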

Cross-Platform GPU Optimization

Custom kernel implementations achieving an 8.8x RMSNorm speedup and 79-83% memory bandwidth utilization on distributed 8-GPU A100 systems. Experience porting optimizations across CUDA, ROCm/HIP, and Metal backends, with a focus on memory coalescing, warp-level primitives, and minimizing kernel launch overhead.
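A minimal Triton sketch of the fusion pattern this work targets: one program per row, a single coalesced pass over the data, with the reduction and scaling fused into one kernel. The block size, eps, and host wrapper here are illustrative; the production kernels referred to above are hand-tuned CUDA/ROCm/Metal code rather than this exact implementation.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def rmsnorm_kernel(x_ptr, w_ptr, y_ptr, n_cols, eps, BLOCK: tl.constexpr):
    # One program instance normalizes one row; the row is read once in a
    # coalesced block, reduced in registers, scaled, and written back.
    row = tl.program_id(0)
    cols = tl.arange(0, BLOCK)
    mask = cols < n_cols
    x = tl.load(x_ptr + row * n_cols + cols, mask=mask, other=0.0).to(tl.float32)
    rms = tl.sqrt(tl.sum(x * x, axis=0) / n_cols + eps)
    w = tl.load(w_ptr + cols, mask=mask, other=0.0).to(tl.float32)
    y = x / rms * w
    tl.store(y_ptr + row * n_cols + cols, y.to(y_ptr.dtype.element_ty), mask=mask)

def rmsnorm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Host-side wrapper: launch one program per row of a contiguous 2D input.
    assert x.is_cuda and x.is_contiguous()
    n_rows, n_cols = x.shape
    y = torch.empty_like(x)
    BLOCK = triton.next_power_of_2(n_cols)
    rmsnorm_kernel[(n_rows,)](x, weight, y, n_cols, eps, BLOCK=BLOCK)
    return y

# Example usage (assumes a CUDA device is available):
# x = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
# y = rmsnorm(x, torch.ones(4096, device="cuda", dtype=torch.float16))
```

Fusing the square-sum reduction and the scale into a single pass avoids extra round trips to global memory, which is the main lever for pushing a bandwidth-bound op like RMSNorm toward peak memory throughput.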

Education

M.S. Information Systems — Northeastern University (December 2025)
Thesis: ModelOpt: Research Framework for Zero-Shot Computer Vision Model Optimization with Tree Search and Federated Knowledge Sharing
Advisor: Professor Handan Liu

Technical Skills

GPU Programming: CUDA, ROCm/HIP, Metal, Triton
ML Frameworks: PyTorch, DeepSpeed, FSDP, Hugging Face Transformers
Quantization Tools: bitsandbytes, GPTQ, AWQ
Languages: Python, C++, CUDA C
Infrastructure: Distributed training, SLURM, multi-node clusters

💻 Tech Stack

Languages: Python, MySQL, R, C++

Libraries & Frameworks: NumPy, Pandas, scikit-learn, Keras, TensorFlow, PyTorch, LangChain, Hugging Face

Cloud & Tools: AWS, Google Cloud, CUDA, Docker


Pinned Repositories

  1. Enhancing-Retrieval-via-GraphRAG (HTML)

     Graph RAG provides a robust solution for medical professionals and researchers seeking efficient access to complex medical information.

  2. CUDA-llama3.1-inference (CUDA)

     A CUDA inference implementation for the Llama 3.1 open models.

  3. qwen600-ROCm-inference (C++, forked from yassa9/qwen600)

     A static, suckless, single-batch Qwen3-0.6B mini inference engine.