
RAGPerf: An End-to-End Benchmarking Framework for Retrieval-Augmented Generation Systems

RAGPerf is an open-source framework designed to benchmark the end-to-end system performance of Retrieval-Augmented Generation (RAG) applications. Built on a fully modular architecture, it is user-friendly and highly customizable, allowing precise measurement of throughput, latency, and scalability across different RAG configurations.


Key Features

🚀 Holistic System-Centric Benchmarking: RAGPerf moves beyond simple accuracy metrics to profile the performance of RAG systems. It measures end-to-end throughput (QPS), latency breakdowns, and hardware efficiency. This helps developers identify potential bottlenecks throughout the entire pipeline.

🧩 Modular Architecture: RAGPerf uses a modular design that abstracts the stages of the RAG pipeline (Embedding, Vector Database, Reranking, and Generation) behind uniform interfaces. Users can seamlessly switch components (e.g., switching the underlying vector database from Milvus to LanceDB, or changing the underlying generative model from GPT to Qwen) without rewriting code. This enables detailed performance comparisons between different system settings.

📊 Detailed Full-Stack Profiling: RAGPerf integrates a lightweight profiler that runs as a background daemon. It captures fine-grained hardware metrics with minimal overhead, including GPU/CPU utilization, memory consumption (host RAM & GPU VRAM), PCIe throughput, and disk I/O utilization. This allows detailed analysis of resource utilization across RAG components and helps identify potential system bottlenecks.

🔄 Simulating Real-World Scenarios: RAGPerf can simulate the evolution of real-world knowledge bases by synthesizing updates with a custom, configurable workload generator. The workload generator supports insert, update, and delete requests at different frequencies and patterns, allowing users to estimate how data freshness and system performance vary in real systems.

πŸ–ΌοΈ Multi-Modal Capabilities: RAGPerf supports diverse data modalities beyond plain text. It provides specialized pipelines including Visual RAG (PDFs, Images) using OCR or ColPali visual embeddings, and Audio RAG using ASR models like Whisper. This enables benchmarking of complex, unstructured RAG pipelines.



Installation

Create a Virtual Environment

To run RAGPerf, we highly recommend using an isolated Python environment. You can use a Python virtual environment manager (e.g., venv, conda) to avoid package conflicts. We use conda for demonstration purposes throughout the documentation.

Conda (recommended)

# Install Miniconda/Mambaforge from the official site if you don't have Conda
conda create -n RAGPerf python=3.10
conda activate RAGPerf
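
If you prefer not to use conda, Python's built-in venv module achieves the same isolation (a minimal sketch, assuming python3.10 is on your PATH):

# alternative: create and activate an isolated environment with venv
python3.10 -m venv .venv
source .venv/bin/activate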

Install Dependencies

Execute the following instructions to install all the dependencies for the project. We use pip-tools to ensure reproducible dependency resolution.

# install pip-compile for python package dependency resolution
python3 -m pip install pip-tools

# generate list of all required python packages
mkdir build && cd build
cmake ..
make generate_py3_requirements

# install the dependencies
python3 -m pip install -r ../requirement.txt
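
As an optional sanity check, pip can verify that the installed packages have no conflicting dependencies:

# confirm the resolved dependencies are mutually compatible
python3 -m pip check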

Install Monitoring System

RAGPerf uses a custom, low-overhead monitoring daemon. Here is a stripped-down version of the installation procedure (please refer to MonitoringSystem README for detailed instructions and explanations).

C++20-Compatible Compiler Installation

Install a C++20-compatible compiler in the virtual environment. For example, to install gcc=12.1.0, run

conda install -c conda-forge gcc=12.1.0
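
To confirm that the compiler from the environment is the one being picked up and that it accepts C++20, a quick smoke test helps (a sketch; adjust if your compiler binaries are named differently):

# check that the conda-provided gcc is first on the PATH
which gcc && gcc --version

# compile an empty program under the C++20 standard
echo 'int main(){}' | gcc -std=c++20 -x c++ - -o /dev/null && echo "C++20 OK"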

Build the MSys Shared Library and Place the Output in src/monitoring_sys

Run the following commands in the project's build folder.

# make sure the python virtual environment is active
cmake -DCMAKE_BUILD_TYPE=Release ..
make libmsys_pymod -j

Make sure you see the file libmsys.cpython-310-x86_64-linux-gnu.so (the exact name depends on your Python version and architecture); this is the CPython module for the monitoring system.
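
The heading above says the output belongs in src/monitoring_sys; assuming you are still in the build folder, the placement step would look roughly like this (the relative paths are assumptions based on this README's layout):

# locate the built module and place it where RAGPerf expects it
ls libmsys.cpython-*.so
cp libmsys.cpython-*.so ../src/monitoring_sys/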

Running RAGPerf

RAGPerf provides an interactive Web UI for ease of use, as well as a command-line interface (CLI) for automation.

Quick Start with Web UI

Preparation

Set these once in your shell rc file (e.g., ~/.bashrc or ~/.zshrc) or export them in every new shell.

# Make local "src" importable (set REPO_ROOT to the repository root first)
export PYTHONPATH="$REPO_ROOT/src${PYTHONPATH+:$PYTHONPATH}"

# Where to cache Hugging Face models (optional, adjust path as needed)
export HF_HOME="/mnt/data/hf_home"

Install streamlit and run the RAGPerf client.

# install streamlit
python3 -m pip install streamlit
# run RAGPerf
streamlit run ui_client.py

Open the reported URL in your web browser; the default is http://localhost:8501.
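
If port 8501 is already taken on your machine, Streamlit's standard --server.port flag selects another one:

# run the UI on an alternative port
streamlit run ui_client.py --server.port 8502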

Configuring the Benchmark

To run the benchmark, we first need to set up the vector database (see vectordb for more details). Then, customize your own workload settings using the available options on the webpage.

[Screenshot: benchmark configuration page]

Running the Benchmark

In the Execute page, click the START BENCHMARK button to run the workload you configured. You may also want to check that all the configs are set correctly; see here for a detailed explanation of the entries in the config file.

[Screenshot: benchmark execution page]

Run with Command Line Interface

Preparation

Set these environment variables once in your shell rc file (e.g., ~/.bashrc or ~/.zshrc) or export them in every new shell.

# Make local `src` module importable
# (set REPO_ROOT to the path of the repository root first)
export PYTHONPATH="$REPO_ROOT/src${PYTHONPATH+:$PYTHONPATH}"

# Where to cache Hugging Face models (optional, adjust path as needed)
export HF_HOME="/mnt/data/hf_home"

Running the Benchmark

To run the benchmark, you first need to set up the vector database as the retriever. See vectordb for the list of supported databases and a quick setup guide. Change db_path in the config file to your local vector database storage path.

vector_db:
    db_path: /mnt/data/vectordb
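
Before the first run, it can help to make sure the storage path exists and is writable (shown here with the example path above):

# create the vector database storage directory if it does not exist
mkdir -p /mnt/data/vectordb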

First, run the preprocess/insert phase to load the dataset into the vector store.

# 1) Build/insert into the vector store (LanceDB example)
python3 src/run_new.py \
  --config config/lance_insert.yaml \
  --msys-config config/monitor/example_config.yaml

After the insertion stage, proceed to the query/evaluate stage.

# 2) Retrieval and Query
python3 src/run_new.py \
  --config config/lance_query.yaml \
  --msys-config config/monitor/example_config.yaml
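
For unattended runs, the two phases can be chained so that the query stage only starts if insertion succeeds (same commands as above):

# optional: run both phases back to back; && stops the chain on failure
python3 src/run_new.py --config config/lance_insert.yaml --msys-config config/monitor/example_config.yaml && \
python3 src/run_new.py --config config/lance_query.yaml --msys-config config/monitor/example_config.yaml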

To customize your own workload settings, you may refer to the provided config files within the config folder. The detailed parameters are listed here.

Performing Analysis

You can check the output results within the output folder. To visualize them, run the parser script below; the generated figures will also be written to the output folder.
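
# parse the monitoring output and generate the figures
python3 example/monitoring_sys_lib/test_parser.py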

Supported RAG Pipeline Modules

Vector Databases

RAGPerf supports many popular vector databases. For setup, check the detailed documentation in the VectorDB README.

Want to add a new DB? Check our RAGPerf API at VectorDB API. Once you implement these APIs, this benchmark suite can automatically perform profiling and analysis on your vector database.

Monitoring System

Examples of how to use the monitoring system are documented in example/monitoring_sys_lib. Detailed documentation is available in the MonitoringSystem README.
