Pyllo ingests clay-science literature, builds a local vector store, and answers questions with citations.
pip install -e .
pyllo ingest # expects PDFs in data/literature/
export OPENAI_API_KEY="sk-..." # or configure another backend
pyllo query "How do montmorillonite layers swell?"

- Default (litellm) – works with any provider litellm supports (OpenAI, Anthropic, Azure, local vLLM, etc.). Export the provider’s API key(s) before running queries.
- CBORG (Berkeley Lab) – OpenAI-compatible gateway for lab-hosted models:
export CBORG_API_KEY="cborg-..."
# Optional overrides (defaults already match these values):
export PYLLO_MODEL__PROVIDER=cborg
export PYLLO_MODEL__MODEL="gpt-5"
export PYLLO_MODEL__API_KEY_ENV=CBORG_API_KEY
export PYLLO_MODEL__API_BASE="https://api.cborg.lbl.gov"
export PYLLO_MODEL__MAX_TOKENS=128000

Discover current model identifiers:
pyllo cborg-models --show-details

- pyllo ingest – process PDFs and update storage/vectorstore/.
- pyllo query "…" – ask the clay expert, printing an answer plus retrieved context.
- pyllo cborg-models --show-details – list CBORG models and their API names.
- pyllo minerals-download --mineral montmorillonite – fetch Crossref manuscripts for minerals in data/minerals/.
- pyllo structures-download --mineral Quartz --limit 1 – pull experimental (RRUFF) and simulated (Materials Project) CIF files into data/structure/ (simulated files include the MP material id in the filename).
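
The double-underscore names in the CBORG overrides above (PYLLO_MODEL__PROVIDER, PYLLO_MODEL__API_KEY_ENV, and so on) follow a common convention in which a prefix plus a nested delimiter maps environment variables onto a nested settings object. The following is a minimal sketch of that convention using only the standard library. How Pyllo actually loads its settings is not shown in this README, so the function names `nested_settings` and `resolve_api_key` are hypothetical and exist only for illustration.

```python
from __future__ import annotations

import os
from typing import Any, Mapping, Optional


def nested_settings(
    environ: Mapping[str, str],
    prefix: str = "PYLLO_",
    delimiter: str = "__",
) -> dict[str, Any]:
    """Fold PREFIX_A__B=value variables into {"a": {"b": "value"}}.

    Hypothetical helper: it illustrates the naming convention only and is
    not Pyllo's own loader. Values stay strings (e.g. "128000").
    """
    settings: dict[str, Any] = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue
        # "PYLLO_MODEL__API_BASE" -> ["model", "api_base"]
        path = name[len(prefix):].lower().split(delimiter)
        node = settings
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return settings


def resolve_api_key(
    settings: Mapping[str, Any], environ: Mapping[str, str]
) -> Optional[str]:
    """Read the secret from the variable that model.api_key_env names."""
    key_var = settings.get("model", {}).get("api_key_env")
    return environ.get(key_var) if key_var else None


if __name__ == "__main__":
    current = nested_settings(os.environ)
    print(current.get("model", {}))
    print("API key found:", resolve_api_key(current, os.environ) is not None)
```

Keeping the key itself in a separately named variable (here CBORG_API_KEY) and storing only that variable's name in the settings means the configuration can be printed or logged without exposing the secret.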

- data/literature/ – source PDFs (with optional data/literature_metadata.jsonl).
- data/minerals/ – mineral datasets; downloads land in data/minerals/manuscripts/.
- docs/ – design notes and user guide.
- pyllo/ – Python package (ingestion, retrieval, CBORG utilities, CLI).
- storage/ – generated vector store (created after ingestion).
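
Because the metadata file is optional, it can be useful to check which PDFs in data/literature/ lack a metadata record before ingesting. The sketch below assumes the JSONL file holds one JSON object per line and that each record identifies its PDF through a `filename` field. That field name is an assumption, as this README does not document the schema, and both helper functions are hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_metadata(jsonl_path: Path, key: str = "filename") -> dict[str, dict]:
    """Read one JSON object per line, indexed by the assumed `key` field."""
    records: dict[str, dict] = {}
    if not jsonl_path.exists():  # the metadata file is optional
        return records
    with jsonl_path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:  # tolerate blank lines
                continue
            record = json.loads(line)
            if key in record:
                records[record[key]] = record
    return records


def pdfs_without_metadata(
    literature_dir: Path, metadata: dict[str, dict]
) -> list[str]:
    """Return the sorted names of PDFs that have no metadata record."""
    return sorted(
        pdf.name
        for pdf in literature_dir.glob("*.pdf")
        if pdf.name not in metadata
    )


if __name__ == "__main__":
    meta = load_metadata(Path("data/literature_metadata.jsonl"))
    for name in pdfs_without_metadata(Path("data/literature"), meta):
        print("no metadata:", name)
```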

- Expand data/literature/ and metadata coverage for better recall.
- Add evaluation notebooks to benchmark retrieval + generation quality.
- Plug in additional ingestion transforms or alternative embedding models.
- Explore local LLM endpoints via litellm for fully disconnected workflows.
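
As a possible starting point for the retrieval side of the evaluation item above, recall@k measures what fraction of the documents known to be relevant to a question appear among the top k retrieved results. This is a minimal sketch, not part of Pyllo: the function names are hypothetical, and the document identifiers and relevance labels would have to come from a hand-built question set.

```python
from __future__ import annotations

from typing import Iterable, Sequence


def recall_at_k(retrieved: Sequence[str], relevant: Iterable[str], k: int) -> float:
    """Fraction of relevant document ids found in the top-k retrieved ids."""
    relevant_set = set(relevant)
    if not relevant_set:
        raise ValueError("need at least one relevant document")
    hits = relevant_set.intersection(retrieved[:k])
    return len(hits) / len(relevant_set)


def mean_recall_at_k(
    runs: Sequence[tuple[Sequence[str], Iterable[str]]], k: int
) -> float:
    """Average recall@k over (retrieved, relevant) pairs, one per question."""
    if not runs:
        raise ValueError("need at least one evaluated question")
    scores = [recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Made-up ids for illustration only; not real evaluation data.
    runs = [
        (["doc1", "doc7", "doc3"], {"doc1", "doc3"}),
        (["doc9", "doc2", "doc4"], {"doc5"}),
    ]
    print(mean_recall_at_k(runs, k=3))
```

Tracking this number while expanding the literature set or swapping embedding models gives a quick signal of whether retrieval improved, independent of the generation step.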