Understanding Memory Hierarchy and Data Movement Energy in AI Computing
This project simulates and analyzes the energy consumption patterns in AI inference, focusing on the critical relationship between compute energy and memory access energy. It demonstrates why "moving bits" often consumes more energy than "computing with them" in modern AI systems.
- Memory Hierarchy Understanding: Explore the energy costs of different memory levels (SRAM vs DRAM)
- Cache Optimization: See how data reuse dramatically reduces energy consumption
- The Memory Wall: Understand why data movement dominates AI chip power consumption
- Architecture Impact: Compare energy patterns across different matrix sizes
Install the dependencies and run the simulator:

```bash
pip install numpy matplotlib
python memory_traffic_simulator.py
```

The simulator reveals that:
- With caching: Memory energy ≈ 1–2× compute energy
- Without caching: Memory energy > 10× compute energy
- Scale impact: Larger matrices amplify the memory wall problem
The tool generates two key visualizations:
- Energy Comparison Chart: Shows compute vs memory energy for a single matrix size (a minimal plotting sketch follows this list)
- Scaling Analysis: Demonstrates how energy consumption grows with matrix size
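As a rough idea of the first chart, here is a minimal matplotlib sketch; the bar values are taken from the sample output further down, and the actual plotting code in memory_traffic_simulator.py may differ:

```python
import matplotlib.pyplot as plt

# Values in microjoules, copied from the sample output below for illustration.
labels = ["Compute", "Memory (with reuse)", "Memory (no reuse)"]
energies_uj = [83.89, 109.23, 1073.74]

plt.bar(labels, energies_uj)
plt.ylabel("Energy (µJ)")
plt.title("Compute vs. memory energy, 256x256 matmul")
plt.yscale("log")  # log scale keeps the smaller bars visible next to the no-reuse case
plt.tight_layout()
plt.show()
```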
- DRAM Access: 100 pJ per access (slow, high energy)
- SRAM Access: 10 pJ per access (fast, low energy)
- MAC Operation: 5 pJ per multiply-accumulate
- With Reuse: 90% of reads from SRAM, 10% from DRAM
- No Reuse: 100% of reads from DRAM (worst case)
Simulates N×N matrix multiplication with N³ operations and 2×N² memory reads.
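A minimal sketch of this kind of energy accounting, using the parameters listed above. The access count here assumes two operand fetches per MAC, a deliberately crude worst-case model; the real memory_traffic_simulator.py may count reads differently, so its printed ratios will not match these numbers exactly:

```python
# Illustrative sketch of the energy model described above (not the simulator's exact code).
E_MAC_PJ, E_SRAM_PJ, E_DRAM_PJ = 5, 10, 100   # pJ per MAC / SRAM read / DRAM read

def matmul_energy_uj(n, sram_hit_ratio):
    """Return (compute_energy, memory_energy) in microjoules for an n x n matmul."""
    macs = n ** 3                              # N^3 multiply-accumulates
    reads = 2 * n ** 3                         # assumption: 2 operand fetches per MAC
    compute_uj = macs * E_MAC_PJ * 1e-6        # pJ -> µJ
    blended_pj = sram_hit_ratio * E_SRAM_PJ + (1 - sram_hit_ratio) * E_DRAM_PJ
    memory_uj = reads * blended_pj * 1e-6
    return compute_uj, memory_uj

compute, mem_reuse = matmul_energy_uj(256, sram_hit_ratio=0.9)   # 90% of reads hit SRAM
_, mem_no_reuse = matmul_energy_uj(256, sram_hit_ratio=0.0)      # every read goes to DRAM
print(f"compute {compute:.1f} uJ | with reuse {mem_reuse:.1f} uJ | no reuse {mem_no_reuse:.1f} uJ")
```

Even with this crude accounting, the qualitative conclusion matches the findings above: once reads spill to DRAM, memory energy dwarfs compute energy.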
Example output from the simulator:

```
AI Model Memory-Traffic Simulator
========================================
Matrix size: 256x256
Energy with cache reuse: 1.3x compute energy
Energy without cache reuse: 12.8x compute energy

Detailed Results:
Compute energy: 83.89 µJ
Memory energy (with reuse): 109.23 µJ
Memory energy (no reuse): 1073.74 µJ
```
This simulator is part of Week 3: Memory Systems and Data Movement in AI Computing - a study of how memory architecture limits AI performance and how engineers optimize bandwidth, latency, and energy.
- Memory Hierarchy: registers → cache → DRAM → storage
- Bandwidth vs Latency: Trading speed for capacity
- Power Scaling: P ∝ C × V² × f for data movement (a worked example follows this list)
- Architectural Solutions: Why we need systolic arrays and tensor cores
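A quick worked instance of that power relation; all values below are illustrative assumptions, not figures for any particular chip:

```python
# Worked example of dynamic switching power, P = C * V^2 * f.
C = 10e-12   # effective switched capacitance of a data bus, in farads (assumed)
V = 1.0      # supply voltage, in volts (assumed)
f = 1e9      # toggle frequency, in hertz (assumed)

power_w = C * V**2 * f           # ~10 mW to keep this bus switching
energy_per_toggle_j = C * V**2   # ~10 pJ per transition, independent of frequency
print(f"power: {power_w * 1e3:.1f} mW, energy per toggle: {energy_per_toggle_j * 1e12:.1f} pJ")
```

The C × V² term is the energy paid every time a wire is charged and discharged, which is one reason the per-access energies above are quoted in picojoules.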
- Modify energy parameters for different hardware
- Adjust cache hit ratios for various algorithms
- Add more memory levels such as L1, L2, and L3 cache (a sketch of one approach follows this list)
- Implement different matrix algorithms
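For the multi-level idea, one possible sketch; the level names, access fractions, and per-access energies are assumptions chosen for illustration:

```python
# Multi-level memory model: each level is (name, fraction_of_accesses, pJ_per_access).
# The fractions must sum to 1.0; all numbers here are illustrative assumptions.
LEVELS = [
    ("L1 SRAM", 0.70, 5),
    ("L2 SRAM", 0.20, 15),
    ("L3 SRAM", 0.08, 40),
    ("DRAM",    0.02, 100),
]

def memory_energy_uj(total_accesses):
    """Blend per-access energy across levels, weighted by where accesses are served."""
    blended_pj = sum(frac * pj for _, frac, pj in LEVELS)
    return total_accesses * blended_pj * 1e-6   # pJ -> µJ

print(f"blended energy for 1M accesses: {memory_energy_uj(1_000_000):.1f} uJ")
```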
- Memory bandwidth limitations
- Parallel processing simulation
- Quantization impact on memory usage (a back-of-the-envelope example follows this list)
- Different neural network layer types
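For the quantization item, a back-of-the-envelope sketch of how numeric precision changes the bytes that have to move; the parameter count is an arbitrary assumption:

```python
# Rough estimate of weight-storage size at different numeric precisions.
num_params = 7_000_000_000                                   # e.g. a 7B-parameter model (assumed)
bytes_per_value = {"FP32": 4, "FP16": 2, "INT8": 1, "INT4": 0.5}

for fmt, nbytes in bytes_per_value.items():
    gib = num_params * nbytes / 2**30
    print(f"{fmt}: {gib:.1f} GiB of weights")
```

Fewer bytes per weight means less data pulled from DRAM per inference, which shrinks exactly the memory-energy term this simulator highlights.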
- AI Accelerators: TPUs, GPUs, and specialized chips
- Memory Technologies: HBM, GDDR, processing-in-memory
- Optimization Techniques: Tiling, data layout, compression
- Architecture Design: Von Neumann bottleneck solutions
Feel free to:
- Add new memory models
- Implement different AI workloads
- Improve visualization
- Add more realistic energy models
Built to demonstrate the fundamental energy tradeoffs in AI computing systems.