OpenContracts (Demo)

Open source document intelligence. Self-hosted, AI-powered, and built for teams who need to own their data.



What is OpenContracts?

OpenContracts is an AGPL-3.0 licensed platform for document analysis, annotation, and collaboration. It combines document management with AI-powered analysis tools, discussion threads, and structured data extraction.

Core Capabilities

Document Processing — Upload PDFs and text files, automatically extract structure with ML-based parsers
Annotation & Analysis — Highlight, label, and analyze documents with custom annotation schemas
AI Agents — Chat with documents using configurable AI assistants that can search and analyze content
Collaboration — Threaded discussions with @mentions, voting, and moderation at corpus and document levels
Data Extraction — Extract structured data from hundreds of documents using agent-powered queries
Version Control — Track document changes, restore previous versions, soft delete with recovery

Quick Look

Screenshots: Document Annotation, Text Format Support, Structured Data Extraction, Custom Analytics.

Features

Document Management

Organize documents into collections (Corpuses) with folder hierarchies
Fine-grained permissions with public/private visibility controls
Document versioning with full history and restore capability
Bulk upload and batch operations

Parsing & Processing

Pluggable parser architecture supporting multiple backends:
- Docling — ML-based structure extraction
- NLM-Ingest — Layout-aware parsing
- Text/Markdown — Simple text extraction
Automatic vector embeddings for semantic search (powered by pgvector)
Structural annotation extraction (headers, paragraphs, tables)
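
Semantic search over embeddings comes down to ranking stored vectors by their similarity to a query vector. The sketch below illustrates that idea in plain Python with cosine similarity on made-up 3-dimensional vectors. It is an illustration only: in OpenContracts the search is powered by pgvector inside the database, and the function names, labels, and vectors here are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, embeddings, top_k=2):
    """Return the top_k (label, score) pairs, most similar first."""
    scored = [(label, cosine_similarity(query, vec)) for label, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings; real embedders produce vectors with hundreds of dimensions.
toy_embeddings = {
    "indemnification clause": [0.9, 0.1, 0.0],
    "payment terms": [0.1, 0.9, 0.0],
    "governing law": [0.0, 0.2, 0.9],
}

results = rank_by_similarity([1.0, 0.0, 0.0], toy_embeddings)
for label, score in results:
    print(f"{label}: {score:.3f}")
```

A database-backed vector store does the same ranking with an index, so the query does not have to score every stored vector.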

Annotation Tools

Multi-page annotation support
Custom label schemas with validation
Relationship mapping between annotations
Import/export in standard formats

AI & LLM Integration

Built on PydanticAI for structured LLM interactions
Configurable AI agents with tool access (search, document loading, annotation queries)
Real-time streaming responses via WebSocket
Conversation history with context management

Collaboration (New in v3.0.0.b3)

Threaded discussions at global, corpus, and document levels
@mentions for documents, corpuses, and AI agents
Upvoting/downvoting with reputation tracking
Thread pinning, locking, and moderation controls
User profiles with activity feeds and statistics
Badges and achievements for community engagement
Leaderboards showing top contributors

Data Extraction

Define extraction schemas with multiple question types
Run extractions across document collections
Review and validate extracted data in grid view
Export results in structured formats

Documentation

Browse the full documentation at jsv4.github.io/OpenContracts or in the repo:

Guide	Description
Quick Start	Get running with Docker in minutes
Key Concepts	Core workflows and terminology
PDF Data Format	How text maps to PDF coordinates
LLM Framework	PydanticAI integration and agents
Vector Stores	Semantic search architecture
Pipeline Overview	Parser and embedder system
Custom Extractors	Build your own data extraction tasks
v3.0.0.b3 Release Notes	Latest features and migration guide

Architecture

Data Format

OpenContracts uses a standardized format for representing text and layout on PDF pages, enabling portable annotations across tools.
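
To make "text mapped to PDF coordinates" concrete, here is a minimal sketch assuming a PAWLS-style layout in which each page records its dimensions and a list of tokens with bounding boxes. The class names, field names, and the `tokens_in_box` helper are illustrative assumptions, not the actual OpenContracts schema; the PDF Data Format guide is the authoritative description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    """One word on a page with its bounding box (assumed units: PDF points)."""
    text: str
    x: float
    y: float
    width: float
    height: float


@dataclass
class Page:
    """A page's dimensions plus its tokens in reading order."""
    index: int
    width: float
    height: float
    tokens: List[Token]


def tokens_in_box(page, left, top, right, bottom):
    """Return the tokens whose boxes lie fully inside the given rectangle."""
    return [
        t for t in page.tokens
        if t.x >= left and t.y >= top
        and t.x + t.width <= right and t.y + t.height <= bottom
    ]


page = Page(
    index=0,
    width=612.0,
    height=792.0,
    tokens=[
        Token("Governing", 72.0, 100.0, 50.0, 12.0),
        Token("Law", 126.0, 100.0, 20.0, 12.0),
        Token("Payment", 72.0, 300.0, 45.0, 12.0),
    ],
)

# An annotation drawn as a rectangle resolves to the tokens it encloses.
selected = tokens_in_box(page, 70.0, 95.0, 200.0, 115.0)
annotation_text = " ".join(t.text for t in selected)
print(annotation_text)
```

Because an annotation refers to tokens and coordinates rather than to pixels in a rendered image, it can be carried to any tool that reads the same page layout.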

Processing Pipeline

The modular pipeline supports custom parsers, embedders, and thumbnail generators.

Each component inherits from a base class with a defined interface:

Parsers — Extract text and structure from documents
Embedders — Generate vector embeddings for search
Thumbnailers — Create document previews

See the pipeline documentation for details on creating custom components.
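
The base-class pattern described above can be sketched as follows. This is a simplified illustration, not the real OpenContracts interface: `BaseParser`, `BaseEmbedder`, `PlainTextParser`, `CharCountEmbedder`, and `run_pipeline` are hypothetical names, and the actual base classes and method signatures are described in the pipeline documentation.

```python
from abc import ABC, abstractmethod


class BaseParser(ABC):
    """Hypothetical parser interface: raw bytes in, text and structure out."""

    @abstractmethod
    def parse(self, raw: bytes) -> dict:
        ...


class BaseEmbedder(ABC):
    """Hypothetical embedder interface: text in, vector out."""

    @abstractmethod
    def embed(self, text: str) -> list:
        ...


class PlainTextParser(BaseParser):
    """Splits UTF-8 text into paragraphs on blank lines."""

    def parse(self, raw: bytes) -> dict:
        text = raw.decode("utf-8")
        paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
        return {"text": text, "paragraphs": paragraphs}


class CharCountEmbedder(BaseEmbedder):
    """Toy embedder: [vowel count, consonant count, digit count]."""

    def embed(self, text: str) -> list:
        lowered = text.lower()
        vowels = sum(1 for c in lowered if c in "aeiou")
        consonants = sum(1 for c in lowered if c.isalpha() and c not in "aeiou")
        digits = sum(1 for c in lowered if c.isdigit())
        return [float(vowels), float(consonants), float(digits)]


def run_pipeline(raw: bytes, parser: BaseParser, embedder: BaseEmbedder) -> list:
    """Parse a document, then embed each paragraph."""
    parsed = parser.parse(raw)
    return [(p, embedder.embed(p)) for p in parsed["paragraphs"]]


output = run_pipeline(b"Hello world\n\nSecond para 42", PlainTextParser(), CharCountEmbedder())
for paragraph, vector in output:
    print(paragraph, vector)
```

Since `run_pipeline` depends only on the abstract interfaces, swapping in a different parser or embedder requires no change to the calling code, which is the point of a pluggable design.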

Deployment

Quick Start (Development)

git clone https://github.com/JSv4/OpenContracts.git
cd OpenContracts
docker compose -f local.yml up

Production

Run migrations before starting services:

# Apply database migrations
docker compose -f production.yml --profile migrate up migrate

# Start services
docker compose -f production.yml up -d

The migration service runs once to avoid race conditions and ensures all tables are created before dependent services start.

Telemetry

OpenContracts collects anonymous usage data to guide development priorities. We collect:

Installation events (unique installation ID)
Feature usage statistics (analyzer runs, extracts created)
Aggregate counts (documents, users, queries)

We do not collect document contents, extracted data, user identities, or query contents.

Disable with TELEMETRY_ENABLED=False in your settings.
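
As an illustration of how such a switch typically behaves, here is a minimal sketch that reads a boolean flag from the environment and skips event recording when it is off. The `env_flag` and `maybe_record_event` helpers are hypothetical, and the default of enabled-unless-disabled is an assumption based on the description above; this is not how OpenContracts itself reads the setting.

```python
import os

# String values treated as "off" (an assumed convention, not taken from OpenContracts).
_FALSE_VALUES = {"0", "false", "no", "off", ""}


def env_flag(name, default=True, environ=None):
    """Read a boolean flag from a mapping of environment variables."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() not in _FALSE_VALUES


def maybe_record_event(event, environ=None):
    """Return an event payload, or None when telemetry is disabled."""
    if not env_flag("TELEMETRY_ENABLED", environ=environ):
        return None
    return {"event": event}


print(maybe_record_event("installation", environ={"TELEMETRY_ENABLED": "False"}))
print(maybe_record_event("installation", environ={}))
```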

Supported Formats

Currently supported:

PDF (full layout and annotation support)
Text-based formats (plaintext, Markdown)

Coming soon: DOCX viewing and annotation powered by Docxodus, an open source in-browser Word document viewer. This will enable the same annotation and analysis workflows for Word documents that currently exist for PDFs.

Acknowledgements

This project builds on work from:

AllenAI PAWLS — PDF annotation data format and concepts
NLMatics nlm-ingestor — Document parsing pipeline

The data extraction grid UI draws inspiration from NLMatics' innovative approach to document querying.

License

AGPL-3.0 — See LICENSE for details.
