A comprehensive GraphRAG (Graph Retrieval-Augmented Generation) implementation that transforms documents into knowledge graphs and enables intelligent querying through natural language processing.
Common Use Cases: Research paper analysis, legal document review, medical documentation Q&A, educational content exploration, and enterprise document intelligence.
GraphRAG Studio is a complete solution for creating, storing, and querying knowledge graphs from your documents. It combines the power of Large Language Models (LLMs) with Neo4j graph databases to provide an interactive platform for document-based question answering with enhanced context understanding.
- Multi-LLM Support: Compatible with OpenAI GPT-4 and Anthropic Claude models
- Document Processing: PDF document parsing and intelligent chunking
- Knowledge Graph Creation: Automatic entity and relationship extraction using an LLM
- Neo4j Integration: Persistent graph storage with enhanced schema support
- Interactive Web Interface: Streamlit-based user-friendly application
- Graph Visualization: Interactive network visualization with PyVis
- Natural Language Querying: Cypher query generation from natural language
- GraphTransformer: Converts documents to graph structures
- Neo4jGraph: Database interface and query execution
- GraphCypherQAChain: Natural language to Cypher translation
- Visualize: Graph rendering and visualization utilities
GraphRAG/
├── graphrag_studio_app.py      # Custom GraphRAG-studio-ready application
├── requirements.txt            # Python dependencies
├── app/
│   └── main.py                 # Query-ready GraphRAG app
├── backend/
│   ├── graph_transformer.py    # Graph transformation logic
│   └── graph_query.py          # Graph querying logic
├── test_notebooks/
│   ├── Graphrag_pdf.ipynb      # PDF data processing notebook
│   └── Graphrag_table.ipynb    # Table data processing notebook
├── tests/
│   └── test_viz.py             # Visualization tests
└── utils/
    ├── visualizer.py           # Graph visualization utilities
    └── retriver_visualizer.py  # Retrieval visualization
- Launch the application:
  streamlit run graphrag_studio_app.py
- Configure your setup:
  - Select your preferred LLM provider (OpenAI or Anthropic)
  - Enter your API key
  - Connect to your Neo4j database
- Process your documents:
  - Upload a PDF document
  - Create and store graph documents
  - Start querying your knowledge graph!
- Upload PDF documents through the web interface
- Documents are automatically chunked using RecursiveCharacterTextSplitter
- Optimal chunk size: 1200 characters with 40-character overlap
- Choose between OpenAI GPT-4 or Anthropic Claude models
- Graph transformer extracts entities and relationships
- Nodes and relationships are automatically identified and structured
- Graphs are stored in Neo4j with enhanced schema support
- Persistent storage enables complex queries across entities and relationships
- Source document information is preserved
- Ask questions in plain English
- System generates appropriate Cypher queries
- Returns contextual answers based on graph relationships
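The chunking and storage steps above can be illustrated with a small standard-library sketch. It is not the project's implementation: `chunk_text` is a plain fixed-window splitter that only mimics the 1200/40 settings (RecursiveCharacterTextSplitter additionally prefers to break on separators such as paragraphs and sentences), and `Relationship`, `to_cypher`, and the `Entity` label are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


def chunk_text(text: str, chunk_size: int = 1200, overlap: int = 40) -> list[str]:
    """Split text into fixed-size character windows sharing `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # each window starts `step` characters after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


@dataclass(frozen=True)
class Relationship:
    """Hypothetical container for one extracted (source, type, target) triple."""
    source: str
    rel_type: str
    target: str


def to_cypher(rel: Relationship) -> tuple[str, dict[str, str]]:
    """Build a parameterized MERGE statement for one relationship."""
    # The relationship type is interpolated into the query text rather than
    # passed as a parameter, so it is validated first.
    if not rel.rel_type.isidentifier():
        raise ValueError(f"unsafe relationship type: {rel.rel_type!r}")
    query = (
        "MERGE (a:Entity {name: $source}) "   # 'Entity' label is an assumption
        "MERGE (b:Entity {name: $target}) "
        f"MERGE (a)-[:{rel.rel_type}]->(b)"
    )
    return query, {"source": rel.source, "target": rel.target}
```

Each chunk would be sent to the LLM for entity and relationship extraction, and the resulting query and parameter pairs could then be executed against Neo4j. Using MERGE rather than CREATE keeps repeated uploads of the same document from duplicating nodes.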
- Clone the repository:
  git clone <repository-url>
  cd GraphRAG
- Install Python dependencies:
  pip install -r requirements.txt
- Set up the Neo4j database:
  - Create a Neo4j account at Neo4j AuraDB, or run a local Neo4j instance
  - Create a new database instance
  - Note down the connection URI, username, and password
- Set up environment variables. Create a .env file in the root directory:
  ANTHROPIC_API_KEY=your_anthropic_api_key_here
  OPENAI_API_KEY=your_openai_api_key_here
  NEO4J_URI=neo4j+s://your-neo4j-url
  NEO4J_USERNAME=neo4j
  NEO4J_PASSWORD=your_neo4j_password
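Before launching the app, it can help to confirm that the environment file contains everything needed. The following is a minimal sketch using only the standard library; `parse_env` and `check_config` are hypothetical helpers, not part of this repository, and the rule that at least one LLM key must be present is an assumption based on the provider choice described earlier. In practice a library such as python-dotenv is the usual way to load the file.

```python
# Variable names are taken from the .env example; the grouping is an assumption.
REQUIRED_KEYS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")
LLM_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_config(values: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the config looks usable."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not values.get(key)]
    if not any(values.get(key) for key in LLM_KEYS):
        problems.append("set at least one of " + ", ".join(LLM_KEYS))
    return problems
```

For example, `check_config(parse_env(open(".env").read()))` returns an empty list when the Neo4j settings and at least one API key are filled in.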