Python lab for exploring memory bandwidth, cache effects, and locality in accelerator workloads
performance-engineering hpc cuda memory-bandwidth gpu-performance roofline-model cache-locality tiling-optimization
Updated Dec 24, 2025 - Python