- Context-Aware: Uses vector embeddings and semantic search to understand your codebase
- Fast & Local: Runs locally using Ollama - no API keys, no data leaves your machine
- Smart File Inference: Automatically detects which files you want to edit from natural language
- Tree-Sitter Parsing: Function-level code understanding, not just file-level
- Interactive Chat: Streaming responses with syntax highlighting
- File Operations: Create, edit, and modify files with confirmation
- Beautiful UI: Rich terminal interface with syntax highlighting and themes
- Install Ollama (for running models locally)

# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows
# Download from https://ollama.com

- Pull a coding model

# Recommended: Balanced speed and quality
ollama pull qwen2.5-coder:3b

# Alternative options:
# ollama pull qwen2.5-coder:1.5b   # Faster, lower quality
# ollama pull deepseek-coder:6.7b  # Slower, higher quality
# Clone the repository
git clone https://github.com/yttrium400/sage.git
cd sage
# Install dependencies
cd cli
python3 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
# Run setup wizard
python3 main.py setup

# Start interactive chat
python3 main.py chat
# Ask a single question
python3 main.py ask "how does streaming work?"
# Index your codebase
python3 main.py index
# List available models
python3 main.py models

You: create a calculator with add, subtract, multiply, divide
Sage: [Creates calc.py with proper functions]
You: add authentication to the project
Sage: [Creates auth.py with authentication logic]
You: edit model.py to add error handling
Sage: [Finds model.py and adds error handling]
You: How does streaming work?
Sage: [Finds generate_response_streaming in model.py]
You: Show me all file operations
Sage: [Finds write_file, read_file functions]
You: What files use the model?
Sage: [Identifies chat.py imports model.py]
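The file inference shown in the examples above does not need the LLM at all: a couple of regex passes over the request are enough to spot explicit paths like model.py, with a keyword fallback against already-indexed files. A minimal sketch of the idea (the pattern, helper name, and fallback rule here are illustrative, not Sage's exact implementation):

```python
import re
from pathlib import Path

# Hypothetical helper: pull explicit file references like "model.py" out of a
# request such as "edit model.py to add error handling", then fall back to
# matching query keywords against the stems of indexed files.
FILE_PATTERN = re.compile(r"\b[\w./-]+\.(?:py|md|txt|json|toml|yaml)\b")

def infer_target_files(query: str, indexed_files: list[str]) -> list[str]:
    # 1. Explicit paths mentioned in the query
    explicit = FILE_PATTERN.findall(query)
    hits = [f for f in indexed_files if Path(f).name in explicit]
    if hits:
        return hits
    # 2. Keyword fallback: match query words against file stems ("auth" -> auth.py)
    words = set(re.findall(r"[a-z_]+", query.lower()))
    return [f for f in indexed_files if Path(f).stem.lower() in words]

print(infer_target_files("edit model.py to add error handling",
                         ["cli/model.py", "cli/chat.py"]))
# -> ['cli/model.py']
```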
- Vector Search: ChromaDB + sentence-transformers for semantic code search
- Code Parsing: Tree-sitter for AST-based Python parsing
- LLM Backend: Ollama for local model inference
- UI: Rich library for beautiful terminal output
- CLI Framework: Click for command-line interface
- Indexing: Tree-sitter parses your Python files into function/class chunks
- Embedding: Each chunk is converted to a 384-dim vector using sentence-transformers
- Storage: Vectors stored in ChromaDB for fast similarity search
- Query: Your question is embedded and matched against code chunks
- Context: Top-K relevant chunks sent to LLM with your query
- Response: LLM generates answer with full codebase understanding
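In code, the indexing half of this pipeline is essentially "embed each chunk, store it with its metadata". A minimal sketch, assuming the function/class chunks have already been extracted by tree-sitter (the embedding model, collection name, and index path are assumptions, not Sage's actual configuration):

```python
import chromadb
from sentence_transformers import SentenceTransformer

# all-MiniLM-L6-v2 produces the 384-dim vectors mentioned above (assumed model choice).
embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path=".sage_index")  # illustrative index location
collection = client.get_or_create_collection("code_chunks")

# In Sage, tree-sitter yields one chunk per function/class; two hard-coded stand-ins here.
chunks = [
    {"id": "calc.py::add", "file": "calc.py",
     "code": "def add(a, b):\n    return a + b"},
    {"id": "model.py::generate_response_streaming", "file": "model.py",
     "code": "def generate_response_streaming(prompt):\n    ..."},
]

collection.add(
    ids=[c["id"] for c in chunks],
    documents=[c["code"] for c in chunks],
    embeddings=embedder.encode([c["code"] for c in chunks]).tolist(),
    metadatas=[{"file": c["file"]} for c in chunks],
)
```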
┌───────────────────┐
│    Your Query     │
└─────────┬─────────┘
          │
          ├──► File Inference (regex patterns)
          │
          ├──► Embedding (sentence-transformers)
          │
          ├──► Vector Search (ChromaDB)
          │
          └──► Context Assembly
                 ├──► Top-K Code Chunks
                 ├──► Import Dependencies
                 └──► File Metadata
                          │
                          ▼
                ┌───────────────────┐
                │   LLM + Context   │
                └───────────────────┘
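The query half of the diagram can be sketched the same way: embed the question, pull the top-K chunks from ChromaDB, and hand the assembled context to the local model through Ollama. The prompt wording and top-K value below are illustrative only, not Sage's actual prompts:

```python
import chromadb
import ollama
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
collection = chromadb.PersistentClient(path=".sage_index").get_or_create_collection("code_chunks")

query = "How does streaming work?"

# Embed the question and fetch the top-K most similar code chunks.
results = collection.query(
    query_embeddings=embedder.encode([query]).tolist(),
    n_results=5,
)
context = "\n\n".join(results["documents"][0])

# Send the assembled context plus the question to the local model via Ollama.
stream = ollama.chat(
    model="qwen2.5-coder:3b",
    messages=[
        {"role": "system", "content": f"Relevant code from the user's project:\n\n{context}"},
        {"role": "user", "content": query},
    ],
    stream=True,
)
for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)
```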
- Indexing: ~5 seconds for a typical project (7 files, 40 chunks)
- Query Speed: <300ms per semantic search
- Memory: ~100MB for embedding model + vectors
- Model Size: 1.9GB (qwen2.5-coder:3b)
# Run automated test suite
python3 run_tests.py
# Run manual tests
python3 test_context_awareness.py
# Quick smoke test (5 minutes)
# See docs/QUICK_TEST_REFERENCE.md

Test Coverage:
- Semantic search accuracy: 91.7%
- File inference: Keyword and explicit path detection
- Cross-file understanding: Import dependency tracking
- Multiple file operations
- Context Awareness - Technical implementation details
- Testing Guide - Comprehensive test scenarios
- Quick Test Reference - Fast 5-minute smoke tests
- Architecture - System design and components
sage/
├── cli/
│   ├── main.py             # Entry point (Click commands)
│   ├── chat.py             # Interactive chat interface
│   ├── model.py            # Ollama integration
│   ├── context.py          # Vector search & code parsing
│   ├── file_ops.py         # File read/write/edit operations
│   ├── theme.py            # UI themes and styling
│   ├── run_tests.py        # Automated test runner
│   └── requirements.txt    # Python dependencies
└── docs/                   # Documentation
- context.py: Core intelligence - semantic search, file inference, tree-sitter parsing
- chat.py: User interaction - streaming, syntax highlighting, file proposals
- model.py: LLM integration - prompts, streaming, model management
- file_ops.py: File operations - smart code extraction, diff preview, confirmation
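As one example of the file_ops.py flow, a diff-preview-then-confirm step can be built from the standard library plus Rich. The function below is a hypothetical sketch under those assumptions, not Sage's actual API:

```python
import difflib
from pathlib import Path

from rich.console import Console
from rich.prompt import Confirm
from rich.syntax import Syntax

console = Console()

def propose_edit(path: str, new_content: str) -> bool:
    """Show a unified diff of the proposed change and apply it only on confirmation."""
    old_content = Path(path).read_text() if Path(path).exists() else ""
    diff = "".join(difflib.unified_diff(
        old_content.splitlines(keepends=True),
        new_content.splitlines(keepends=True),
        fromfile=f"{path} (current)",
        tofile=f"{path} (proposed)",
    ))
    console.print(Syntax(diff or "(new file)", "diff", theme="ansi_dark"))
    if Confirm.ask(f"Apply changes to {path}?"):
        Path(path).write_text(new_content)
        return True
    return False
```

Writing to disk only after Confirm.ask keeps the "with confirmation" guarantee from the feature list.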
- Context-aware semantic search
- Smart file inference from natural language
- Tree-sitter AST parsing
- Interactive chat with streaming
- File creation/editing with confirmation
- Multi-language support (JavaScript, TypeScript, Go)
- VSCode extension
- Incremental indexing (watch mode)
- Custom model fine-tuning for Python
Contributions welcome! Please feel free to submit a Pull Request.
MIT License - see LICENSE file for details
Technologies Used:
- Ollama - Local LLM inference
- ChromaDB - Vector database
- sentence-transformers - Text embeddings
- tree-sitter - Code parsing
- Rich - Terminal UI
Built with ❤️ for Python developers