ThinkMark

It's an open-source take on Cursor docs: scrape documentation sites, convert them to an LLM-friendly format, and inject them into your LLM system of choice as an MCP tool!

Scrape documentation sites with intelligent crawling
Convert HTML to clean, structured Markdown
Annotate content using LLMs for enhanced metadata
Index documents with vector search capabilities
Query via MCP server for Claude Desktop integration

Quick Start

Installation

From PyPI (Recommended)

# Install stable version from PyPI
pip install thinkmark

# Install with optional dependencies
pip install "thinkmark[mcp,dev]"

From Source

# Clone and install from source
git clone <repo-url>
cd ThinkMark
uv sync

# Install with optional dependency groups
uv sync --group mcp --group dev

Basic Usage

# Run the init command to set your storage path
thinkmark init

# Process documentation site with full pipeline
thinkmark pipeline https://docs.example.com --vector-index

# Start MCP server for Claude Desktop
thinkmark-mcp stdio
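
To let Claude Desktop launch the server itself, an entry is typically added under the mcpServers key of its claude_desktop_config.json. The snippet below is a minimal sketch that prints such an entry. It assumes thinkmark-mcp is on your PATH; the server label "thinkmark" and the helper name claude_desktop_entry are illustrative choices, not part of ThinkMark.

```python
import json


def claude_desktop_entry(server_name="thinkmark"):
    """Build an mcpServers entry that runs `thinkmark-mcp stdio`.

    Assumption: `thinkmark-mcp` is resolvable on PATH. The server_name
    label is arbitrary and only identifies the server inside Claude Desktop.
    """
    return {
        "mcpServers": {
            server_name: {
                "command": "thinkmark-mcp",
                "args": ["stdio"],
            }
        }
    }


if __name__ == "__main__":
    # Print JSON that can be merged into claude_desktop_config.json
    print(json.dumps(claude_desktop_entry(), indent=2))
```

If thinkmark-mcp lives in a virtual environment, replacing "command" with the absolute path to the executable is usually the safer choice.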

Pipeline Stages

Scrape - Crawl documentation sites and extract HTML
Markify - Convert HTML to clean Markdown
Annotate - Enhance content with LLM-generated metadata
Vector - Create searchable vector indexes
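
To give a feel for what the Markify stage does conceptually, here is a toy HTML-to-Markdown converter built on Python's standard html.parser. It is not ThinkMark's implementation: the names TinyMarkify and markify are hypothetical, and it only handles h1-h3 headings, paragraphs, and inline code.

```python
from html.parser import HTMLParser


class TinyMarkify(HTMLParser):
    """Toy converter (illustrative only): headings, paragraphs, inline code."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            # h1 -> "# ", h2 -> "## ", h3 -> "### "
            self.parts.append("#" * int(tag[1]) + " ")
        elif tag == "code":
            self.parts.append("`")

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "h3", "p"):
            # Close block-level elements with a blank line
            self.parts.append("\n\n")
        elif tag == "code":
            self.parts.append("`")

    def handle_data(self, data):
        self.parts.append(data)


def markify(html):
    """Convert a small HTML fragment to Markdown text."""
    parser = TinyMarkify()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip() + "\n"


if __name__ == "__main__":
    print(markify("<h1>Title</h1><p>Run <code>thinkmark init</code> first.</p>"))
```

Real documentation pages also need navigation stripping, link rewriting, tables, and code blocks, which this sketch leaves out.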

Output Structure

output/{site_name}/
├── _temp_html/        # Raw HTML files
├── content/           # Clean markdown files
├── annotated/         # LLM-annotated markdown
└── vector_index/      # Vector search index
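
The layout above can be checked from a script, for example to see which stages have produced output for a site. The sketch below is an assumption-laden helper, not part of ThinkMark: the stage-to-directory mapping is inferred from the directory comments, and site_layout, missing_stages, and the site name "example" are hypothetical.

```python
from pathlib import Path

# Assumed mapping from pipeline stage to its output directory,
# inferred from the directory comments in the layout above.
STAGE_DIRS = {
    "scrape": "_temp_html",
    "markify": "content",
    "annotate": "annotated",
    "vector": "vector_index",
}


def site_layout(site_name, output_root="output"):
    """Return the expected directory for each stage of one site."""
    base = Path(output_root) / site_name
    return {stage: base / sub for stage, sub in STAGE_DIRS.items()}


def missing_stages(site_name, output_root="output"):
    """List stages whose output directory does not exist yet."""
    layout = site_layout(site_name, output_root)
    return [stage for stage, path in layout.items() if not path.is_dir()]


if __name__ == "__main__":
    for stage, path in site_layout("example").items():
        print(f"{stage:<9} {path}")
    print("missing:", missing_stages("example"))
```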

Configuration

Copy example_config.yaml and customize it for your needs. Key settings:

Crawling parameters (depth, delays, filters)
LLM provider configuration
Vector index settings

Development

See docs/DEVELOPMENT.md for a detailed development guide.

Testing

uv run pytest
uv run ruff check .

License

MIT License

256thFission/ThinkMark
