@ESultanik

Summary

  • Defer importing the pdf module (and pdfminer) until a PDF file is actually matched
  • Use a lazy parser wrapper (_LazyPDFParser) that registers immediately but imports pdf.py only on first use
  • The pdfminer library imports cryptography and other heavy modules, which added roughly 150ms to a ~0.5s import time (measured 527ms → 380ms in cached runs)

Implementation

  • Add _LazyPDFParser class in __init__.py that wraps the actual PDF parser
  • Remove @register_parser decorator from pdf.py (registration is now done lazily in __init__.py)
  • Parser is registered at import time, but the actual pdf module import is deferred
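
For readers unfamiliar with the pattern, here is a minimal sketch of a lazy wrapper that registers at import time and defers the real import to the first call. It is not the code from this PR: the `PARSERS` registry and the `LazyParser` class are hypothetical names, and `json.loads` stands in for the expensive PDF parser so the example runs with the standard library alone.

```python
import importlib
from typing import Any, Callable, Dict, Optional

# Hypothetical registry mapping a MIME type to a parser callable.
# polyfile's real registry may look different.
PARSERS: Dict[str, Callable[..., Any]] = {}


class LazyParser:
    """Registers immediately; imports the real implementation on first call."""

    def __init__(self, module_name: str, attr_name: str) -> None:
        self.module_name = module_name
        self.attr_name = attr_name
        self._target: Optional[Callable[..., Any]] = None

    @property
    def loaded(self) -> bool:
        """True once the wrapped module has been imported through this wrapper."""
        return self._target is not None

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        if self._target is None:
            # The costly import happens here, not when the wrapper is registered.
            module = importlib.import_module(self.module_name)
            self._target = getattr(module, self.attr_name)
        return self._target(*args, **kwargs)


# Registration is cheap: nothing is imported yet.
# json.loads is only a stand-in for a heavy parser such as the one in pdf.py.
PARSERS["application/json"] = LazyParser("json", "loads")

if __name__ == "__main__":
    parser = PARSERS["application/json"]
    print(parser.loaded)       # wrapper has not imported its target yet
    print(parser('{"a": 1}'))  # first call triggers the import
    print(parser.loaded)
```

The trade-off is that the first matching file pays the import cost instead of every process start, which suits a workload where most inputs never reach the wrapped parser.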

Performance Results

Metric                     Before  After  Improvement
Import time                527ms   380ms  28% faster
pdfminer loaded at import  Yes     No     Deferred
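
The PR does not say how the import time was measured. One way to get a comparable number is to time the import in a fresh interpreter, so that modules already cached in the current process do not skew the result. The helper name `measure_import_time` is hypothetical, and `json` is used only so the sketch runs anywhere.

```python
import subprocess
import sys


def measure_import_time(module_name: str) -> float:
    """Return the seconds a fresh interpreter spends importing module_name."""
    snippet = (
        "import importlib, time\n"
        "start = time.perf_counter()\n"
        f"importlib.import_module({module_name!r})\n"
        "print(time.perf_counter() - start)\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        check=True,
    )
    # Take the last line in case the imported module prints on import.
    return float(result.stdout.strip().splitlines()[-1])


if __name__ == "__main__":
    # Replace "json" with "polyfile" in an environment where it is installed.
    print(f"{measure_import_time('json') * 1000:.1f}ms")
```

Single runs are noisy, so averaging several measurements is advisable. For a per-module breakdown, CPython's `-X importtime` option is another route.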

Test plan

  • Run pytest tests/test_magic.py tests/test_pdf.py and confirm the results match the baseline
  • Verify pdfminer is not loaded after import polyfile
  • Verify pdfminer is loaded when matching PDF files
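
The "not loaded after import" check can be sketched as a lookup in `sys.modules` inside a fresh interpreter. This is an illustration rather than the test used in this PR: `loaded_after_import` is a hypothetical helper, and the demonstration uses standard-library modules so it runs without polyfile installed.

```python
import subprocess
import sys


def loaded_after_import(imported: str, candidate: str) -> bool:
    """Report whether `candidate` is in sys.modules after importing `imported`.

    The check runs in a fresh interpreter so earlier imports in the current
    process cannot influence the answer.
    """
    snippet = (
        "import importlib, sys\n"
        f"importlib.import_module({imported!r})\n"
        f"print({candidate!r} in sys.modules)\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip().splitlines()[-1] == "True"


if __name__ == "__main__":
    # json imports its decoder submodule eagerly, so this reports True.
    print(loaded_after_import("json", "json.decoder"))
    # With polyfile installed, the analogous check for this PR would be:
    #   loaded_after_import("polyfile", "pdfminer")
```

The second item of the test plan corresponds to the commented call returning False. The third item needs a run that actually matches a PDF file before `sys.modules` is inspected.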

🤖 Generated with Claude Code

@ESultanik force-pushed the perf/lazy-pdfminer-import branch 2 times, most recently from 9a6022b to 795e25a on January 20, 2026 22:02

Defer importing the pdf module (and pdfminer) until a PDF file is
actually matched. This is done via a lazy parser wrapper that registers
immediately but only imports the actual pdf module on first use.

The pdfminer library imports many modules (cryptography, etc.), which
accounted for roughly 150ms of a ~0.5s import time. Most files aren't
PDFs, so deferring this import improves startup time for the common case.

Performance improvement:
- pdfminer no longer loaded at import time
- Import time reduced by ~28% (measured 527ms → 380ms in cached runs)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@ESultanik force-pushed the perf/lazy-pdfminer-import branch from 795e25a to 8db8471 on January 20, 2026 22:15
@ESultanik merged commit f8d1e2b into master on Jan 20, 2026
10 checks passed
@ESultanik deleted the perf/lazy-pdfminer-import branch on January 20, 2026 22:21