Cross-platform desktop application for extracting content from PDF documents using IBM Docling's AI-powered document understanding.
- Drag & Drop Interface: Simply drag PDF files into the application
- Multiple Export Formats:
  - JSON (structured document data)
  - Markdown (clean text output)
  - CSV/Excel (extracted tables)
  - HTML (web-viewable format)
- AI-Powered Extraction: Uses IBM Docling for intelligent layout detection and table extraction
- Cross-Platform: Native apps for Windows and macOS
- GPU Acceleration: Automatic CUDA/MPS support for faster processing
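
The automatic CUDA/MPS selection listed above is not shown in this README. As a minimal sketch of how such a check is commonly written with PyTorch, the snippet below uses a hypothetical helper, `pick_device`; it is an illustration, not the application's actual code, and it falls back to the CPU when PyTorch is not installed.

```python
def pick_device():
    """Return "cuda", "mps", or "cpu" depending on what PyTorch reports."""
    try:
        # Imported lazily so the check degrades to CPU when PyTorch is missing.
        import torch
    except ImportError:
        return "cpu"

    # NVIDIA GPUs are reported through the CUDA backend.
    if torch.cuda.is_available():
        return "cuda"

    # Apple Silicon GPUs are reported through the MPS backend,
    # which older PyTorch builds may not expose at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


if __name__ == "__main__":
    print(f"Selected device: {pick_device()}")
```

Running this inside the project's virtual environment is a quick way to see which backend PyTorch can reach on a given machine.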
Download the latest release for your platform from the Releases page.
| Platform | File | Description |
|---|---|---|
| Windows (Installer) | `PDF_Extractor_Setup_x.x.x.exe` | Standard Windows installer |
| Windows (Portable) | `PDF_Extractor_Portable.zip` | No installation required |
| macOS (Intel) | `PDF_Extractor_macOS.dmg` | For Intel-based Macs |
| macOS (Apple Silicon) | `PDF_Extractor_macOS_ARM.dmg` | For M1/M2/M3/M4 Macs |
Note: On first run, the app downloads AI models (~300MB). This only happens once.
- Download `PDF_Extractor_Setup_x.x.x.exe` from the latest release
- Run the installer (if Windows SmartScreen appears, click "More info" → "Run anyway")
- Follow the setup wizard
- Launch PDF Extractor from the Start Menu or desktop shortcut
The portable version requires no installation and can run from any folder or USB drive.
- Download `PDF_Extractor_Portable.zip` from the latest release
- Extract the zip file to any folder (e.g., `C:\Apps\PDF Extractor\`)
- Double-click `PDF Extractor.exe` to run the application
- First run only: If Windows SmartScreen shows "Windows protected your PC":
  - Click "More info"
  - Click "Run anyway"

Important: Keep the `_internal` folder in the same location as `PDF Extractor.exe`; the application needs it to run.
- Download the appropriate DMG for your Mac:
  - Intel Macs: `PDF_Extractor_macOS.dmg`
  - Apple Silicon (M1/M2/M3/M4): `PDF_Extractor_macOS_ARM.dmg`
- Open the DMG file
- Drag PDF Extractor to your Applications folder
- Launch from Applications or Spotlight
Note: If you see an "App is damaged" or "unidentified developer" warning, see Troubleshooting below.
- Python 3.10 or higher
- Git
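
The two prerequisites above can be verified before cloning. The following is a small sketch, assuming only the Python standard library; `check_prerequisites` is a hypothetical helper and not part of the repository.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the prerequisites


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []

    # Compare only (major, minor) against the required minimum.
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version_info[0]}.{version_info[1]}"
        )

    # shutil.which returns None when the executable is not on PATH.
    if which("git") is None:
        problems.append("Git was not found on PATH")

    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("All prerequisites met" if not issues else "\n".join(issues))
```

The two parameters exist only so the check can be exercised without changing the real environment.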
# Clone the repository
git clone https://github.com/danribes/pdf_xtractor.git
cd pdf_xtractor
# Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Run the application
python src/main.py

# Build executable only
scripts\build_windows.bat
# Build executable + installer
scripts\build_windows.bat installer

Output:
- `dist\PDF Extractor.exe` - Standalone executable
- `dist\PDF_Extractor_Setup_1.0.0.exe` - Installer (requires Inno Setup)
# Build .app bundle
./scripts/build_mac.sh
# Build .app + DMG installer
./scripts/build_mac.sh dmg
# Build Universal binary (Intel + Apple Silicon)
./scripts/build_mac.sh universal dmg

Output:
- `dist/PDF Extractor.app` - Application bundle
- `dist/PDF_Extractor_1.0.0.dmg` - DMG installer
For distribution without requiring internet on first run:
# Download models to local directory
python scripts/download_models.py
# Then build normally - models will be included
./scripts/build_mac.sh dmg # or build_windows.bat

This increases the app size by ~300MB but allows fully offline usage.
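
For offline use, the process also has to be told where the pre-downloaded models live. The sketch below assumes the models are resolved through the Hugging Face cache, so that `HF_HOME` (used elsewhere in this README) applies. `use_local_models` is a hypothetical helper, and setting `HF_HUB_OFFLINE` is an additional assumption about how the models are loaded, not something the project documents.

```python
import os
from pathlib import Path


def use_local_models(models_dir):
    """Point the Hugging Face cache at a local directory and disable network lookups.

    Returns True if the directory exists and the environment was updated,
    False otherwise (the environment is left untouched in that case).
    """
    path = Path(models_dir).expanduser().resolve()
    if not path.is_dir():
        return False

    # HF_HOME is the cache root used by huggingface_hub.
    os.environ["HF_HOME"] = str(path)
    # Assumption: the models load through huggingface_hub, which honors this flag.
    os.environ["HF_HUB_OFFLINE"] = "1"
    return True
```

A helper like this would need to run before any library that reads these variables is imported, since such libraries may read them once at import time.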
pdf_xtractor/
├── src/
│   ├── main.py                 # Application entry point
│   ├── gui.py                  # PySide6 desktop interface
│   ├── converter.py            # Docling processing logic
│   └── config.py               # Configuration management
├── build/
│   ├── pdfextractor.spec       # PyInstaller configuration
│   ├── installer_windows.iss   # Inno Setup script
│   └── version_info.txt        # Windows version metadata
├── scripts/
│   ├── build_windows.bat       # Windows build script
│   ├── build_mac.sh            # macOS build script
│   ├── download_models.py      # Pre-download AI models
│   └── create_icons.py         # Generate app icons
├── assets/
│   ├── icon.ico                # Windows icon
│   ├── icon.icns               # macOS icon
│   └── icon.png                # Reference icon
├── .github/
│   └── workflows/
│       └── build.yml           # CI/CD for automated builds
├── requirements.txt
└── README.md
| Format | Method | Use Case |
|---|---|---|
| JSON | `export_to_dict()` | Full document hierarchy for developers |
| Markdown | `export_to_markdown()` | Clean text for LLMs or documentation |
| CSV/Excel | `table.export_to_dataframe()` | Structured data for analysis |
| HTML | `export_to_html()` | Visualizing the document in a browser |
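
The methods in the table can be combined into a single export step. The following is a minimal sketch, not the code in `src/converter.py`: `export_document` and `convert_pdf` are hypothetical helpers, the output file names are illustrative, and `convert_pdf` assumes Docling's `DocumentConverter` API is available in the environment.

```python
import json
from pathlib import Path


def export_document(doc, out_dir, stem):
    """Write JSON, Markdown, HTML, and one CSV per table; return the written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []

    # JSON: full document hierarchy.
    json_path = out / f"{stem}.json"
    json_path.write_text(json.dumps(doc.export_to_dict(), indent=2), encoding="utf-8")
    written.append(json_path)

    # Markdown: clean text output.
    md_path = out / f"{stem}.md"
    md_path.write_text(doc.export_to_markdown(), encoding="utf-8")
    written.append(md_path)

    # HTML: browser-viewable rendering.
    html_path = out / f"{stem}.html"
    html_path.write_text(doc.export_to_html(), encoding="utf-8")
    written.append(html_path)

    # CSV: each extracted table becomes a DataFrame, then a file.
    for i, table in enumerate(doc.tables, start=1):
        csv_path = out / f"{stem}_table_{i}.csv"
        table.export_to_dataframe().to_csv(csv_path, index=False)
        written.append(csv_path)

    return written


def convert_pdf(pdf_path, out_dir):
    """Convert one PDF with Docling and export it in every format."""
    # Imported here so export_document stays usable without Docling installed.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(pdf_path)
    return export_document(result.document, out_dir, Path(pdf_path).stem)
```

Calling `convert_pdf("report.pdf", "output")` with a hypothetical `report.pdf` would write `report.json`, `report.md`, `report.html`, and one `report_table_N.csv` per detected table into `output/`.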
For distribution, sign your executable with a code signing certificate:
signtool sign /f certificate.pfx /p password /t http://timestamp.digicert.com "dist\PDF Extractor.exe"

For distribution outside the App Store:
# Sign the app
codesign --deep --force --sign "Developer ID Application: Your Name (TEAM_ID)" "dist/PDF Extractor.app"
# Create signed DMG
codesign --sign "Developer ID Application: Your Name (TEAM_ID)" "dist/PDF_Extractor_1.0.0.dmg"
# Notarize
xcrun notarytool submit dist/PDF_Extractor_1.0.0.dmg \
--apple-id "your@email.com" \
--team-id "TEAM_ID" \
--password "app-specific-password" \
--wait
# Staple the notarization
xcrun stapler staple "dist/PDF_Extractor_1.0.0.dmg"

The project includes automated builds via GitHub Actions. To create a release:
- Tag a version: `git tag v1.0.0`
- Push the tag: `git push origin v1.0.0`
- GitHub Actions will build for all platforms
- Download artifacts from the draft release
You can also manually trigger a build from the Actions tab.
The "App is damaged" or "unidentified developer" warning on macOS happens with unsigned apps. Remove the quarantine attribute:
xattr -cr "/Applications/PDF Extractor.app"

If you are behind a firewall, pre-download the models and set the HF_HOME environment variable:
export HF_HOME=/path/to/models
python scripts/download_models.py

Ensure you have the correct PyTorch version for your GPU:
# For NVIDIA CUDA
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
# For Apple Silicon (MPS)
pip install torch torchvision # MPS support is automatic

- Python 3.10+
- docling >= 2.5.0
- PySide6 >= 6.6.0
- pandas >= 2.0.0
- PyInstaller >= 6.0.0 (for building)
MIT License
- IBM Docling - Document understanding AI
- PySide6 - Qt for Python
- PyInstaller - Python application bundling