RAGPerf is an open-source framework for benchmarking the end-to-end system performance of Retrieval-Augmented Generation (RAG) applications. Built on a fully modular architecture, it provides a user-friendly, highly customizable way to precisely measure throughput, latency, and scalability across different RAG configurations.
Holistic System-Centric Benchmarking: RAGPerf moves beyond simple accuracy metrics to profile the performance of RAG systems. It measures end-to-end throughput (QPS), latency breakdowns, and hardware efficiency, helping developers identify potential bottlenecks throughout the entire pipeline.
Modular Architecture: RAGPerf uses a modular design that abstracts the stages of the RAG pipeline (embedding, vector database, reranking, and generation) behind uniform interfaces. Users can seamlessly switch components (e.g., swap the underlying vector database from Milvus to LanceDB, or change the underlying generative model from GPT to Qwen) without rewriting code, enabling detailed performance comparisons between different system settings (see the interface sketch below).
Detailed Full-Stack Profiling: RAGPerf integrates a lightweight profiler that runs as a background daemon. It captures fine-grained hardware metrics with minimal overhead, including GPU/CPU utilization, memory consumption (host RAM and GPU VRAM), PCIe throughput, and disk I/O utilization. This allows detailed analysis of resource utilization across RAG components and helps identify potential system bottlenecks.
Simulating Real-World Scenarios: RAGPerf can simulate the evolution of real-world knowledge bases by synthesizing updates with a custom, configurable workload generator. The workload generator supports insert, update, and delete requests at different frequencies and patterns, allowing users to estimate how data freshness and system performance vary in real systems.
Multi-Modal Capabilities: RAGPerf supports diverse data modalities beyond plain text. It provides specialized pipelines including Visual RAG (PDFs, images) using OCR or ColPali visual embeddings, and Audio RAG using ASR models such as Whisper. This enables benchmarking of complex, unstructured RAG pipelines.
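To make the uniform-interface idea concrete, below is a minimal, purely illustrative Python sketch of how a benchmark driver can swap vector-database backends without code changes. The class and method names (VectorDB, insert, search, LanceDBBackend, MilvusBackend) are hypothetical placeholders, not RAGPerf's actual API; see the VectorDB API documentation for the real interfaces.
# Illustrative sketch only: these class and method names are hypothetical
# placeholders, not RAGPerf's real interfaces.
from abc import ABC, abstractmethod
from typing import List

class VectorDB(ABC):
    """Uniform interface that every vector-database backend implements."""
    @abstractmethod
    def insert(self, ids: List[str], embeddings: List[List[float]]) -> None: ...
    @abstractmethod
    def search(self, query_embedding: List[float], top_k: int) -> List[str]: ...

class LanceDBBackend(VectorDB):
    def insert(self, ids, embeddings):
        print(f"LanceDB: inserting {len(ids)} vectors")
    def search(self, query_embedding, top_k):
        return [f"doc-{i}" for i in range(top_k)]  # placeholder results

class MilvusBackend(VectorDB):
    def insert(self, ids, embeddings):
        print(f"Milvus: inserting {len(ids)} vectors")
    def search(self, query_embedding, top_k):
        return [f"doc-{i}" for i in range(top_k)]  # placeholder results

def run_retrieval(db: VectorDB, query_embedding, top_k=5):
    # The driver depends only on the abstract interface, so swapping
    # Milvus for LanceDB requires no changes here.
    return db.search(query_embedding, top_k)

if __name__ == "__main__":
    for backend in (LanceDBBackend(), MilvusBackend()):
        print(run_retrieval(backend, [0.1, 0.2, 0.3]))
The same pattern applies to the embedding, reranking, and generation stages.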
To run RAGPerf, we highly recommend using an isolated Python environment. You can use a Python virtual environment manager (e.g., venv, conda) to avoid package conflicts. We use conda for demonstration purposes throughout the documentation.
Conda (recommended)
# Install Miniconda/Mambaforge from the official site if you don't have Conda
conda create -n RAGPerf python=3.10
conda activate RAGPerf
Execute the following instructions to install all the dependencies for the project. We use pip-tools to ensure reproducible dependency resolution.
# install pip-compile for python package dependency resolution
python3 -m pip install pip-tools
# generate list of all required python packages
mkdir build && cd build
cmake ..
make generate_py3_requirements
# install the dependencies
python3 -m pip install -r ../requirement.txt
RAGPerf uses a custom, low-overhead monitoring daemon. Here is a stripped-down version of the installation procedure (please refer to the MonitoringSystem README for detailed instructions and explanations).
Install a C++20-compatible compiler in the virtual environment. For example, to install gcc=12.1.0, run
conda install -c conda-forge gcc=12.1.0
Run the following commands in the project's build folder.
# make sure the python virtual environment is activated, then configure and build
cmake -DCMAKE_BUILD_TYPE=Release ..
make libmsys_pymod -j
Make sure you see the file libmsys.cpython-310-x86_64-linux-gnu.so (the exact name depends on your Python version and architecture); this is the CPython module for the monitoring system executable.
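As a quick sanity check, you can try importing the module from Python. This is just a convenience snippet; it assumes the built .so lives in the build folder, so adjust the path as needed.
# Quick check that the monitoring-system module imports correctly.
# Assumes the built .so lives in ./build; adjust the path if needed.
import sys
sys.path.insert(0, "./build")

import libmsys  # raises ImportError if the module was not built properly
print("loaded monitoring module from:", libmsys.__file__)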
RAGPerf provides an interactive Web UI for ease of use, as well as a command-line interface (CLI) for automation.
Set these once in your shell rc file (e.g., ~/.bashrc or ~/.zshrc) or export them in every new shell.
# Make local "src" importable
export PYTHONPATH="$REPO_ROOT/src${PYTHONPATH+:$PYTHONPATH}"
# Where to cache Hugging Face models (optional, adjust path as needed)
export HF_HOME="/mnt/data/hf_home"
Install streamlit and run the RAGPerf client.
# install streamlit
python3 -m pip install streamlit
# run RAGPerf
streamlit run ui_client.py
Open the UI at the reported URL in your web browser; the default URL is http://localhost:8501.
To run the benchmark, we first need to set up the vector database (see vectordb for more details). Then, customize your workload settings using the available options on the webpage.
On the Execute page, click the START BENCHMARK button to run the configured workload. You may also want to check that all the configs are set correctly; see here for a detailed explanation of the entries in the config file.
Set these environment variables once in your shell rc file (e.g., ~/.bashrc or ~/.zshrc) or export them in every new shell.
# Make local `src` module importable
# set variable REPO_ROOT to correct path to the repo
export PYTHONPATH="$REPO_ROOT/src${PYTHONPATH+:$PYTHONPATH}"
# Where to cache Hugging Face models (optional, adjust path as needed)
export HF_HOME="/mnt/data/hf_home"
To run the benchmark, you first need to set up the vector database as the retriever. See vectordb for the list of supported databases and a quick setup guide. Change db_path in the config file to your local vector database storage path.
vector_db:
  db_path: /mnt/data/vectordb
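Before kicking off a long insert run, it can help to sanity-check that db_path points where you expect. A minimal sketch, assuming PyYAML is installed and that the lance_insert.yaml config used below follows the layout shown above:
# Verify the vector database path configured in the YAML file.
# Assumes PyYAML (pip install pyyaml) and a vector_db/db_path layout as shown above.
from pathlib import Path
import yaml

with open("config/lance_insert.yaml") as f:
    cfg = yaml.safe_load(f)

db_path = Path(cfg["vector_db"]["db_path"])
print(f"db_path = {db_path} (exists: {db_path.exists()})")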
# 1) Build/insert into the vector store (LanceDB example)
python3 src/run_new.py \
--config config/lance_insert.yaml \
--msys-config config/monitor/example_config.yaml
After the insertion stage, proceed to the query/evaluate stage.
# 2) Retrieval and Query
python3 src/run_new.py \
--config config/lance_query.yaml \
--msys-config config/monitor/example_config.yaml
To customize your own workload settings, you can refer to the provided config files in the config folder. The detailed parameters are listed here.
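If you want to automate several configurations, a small wrapper that loops over config files and invokes the same CLI works well. A sketch using the commands shown above; the list of configs is just an example.
# Run several benchmark configs back to back via the CLI shown above.
# The config filenames here are examples; substitute your own.
import subprocess
import time

MSYS_CONFIG = "config/monitor/example_config.yaml"
CONFIGS = ["config/lance_insert.yaml", "config/lance_query.yaml"]

for cfg in CONFIGS:
    start = time.time()
    subprocess.run(
        ["python3", "src/run_new.py", "--config", cfg, "--msys-config", MSYS_CONFIG],
        check=True,
    )
    print(f"{cfg}: finished in {time.time() - start:.1f} s")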
You can check the output results in the output folder. To visualize them, run python3 example/monitoring_sys_lib/test_parser.py; the generated figures will be located in the output folder.
RAGPerf supports many popular vector databases. For setup instructions, see the detailed documentation in the VectorDB README.
Want to add a new DB? Check the RAGPerf API at VectorDB API. Once these APIs are implemented, the benchmark suite can automatically profile and analyze your vector database.
Examples of how to use the monitoring system are provided in example/monitoring_sys_lib. Detailed documentation is available in the MonitoringSystem README.

