dchiueh/image-search

Semantic Image Search

This repository contains a prototype for captioning images, embedding text, and serving semantic image search results via an AWS Lambda function. It includes helper scripts for data ingestion and model training, plus a minimal Chrome extension.

Repository Layout

  • backend/ – Data preparation and training scripts.
    • alttext-embed.py – Generates captions for images with GPT-4-Vision and writes image embeddings.
    • ingest.py – Fetches ArcXP story revisions to build story–image pairs.
    • copy_to_rds.py – Bulk loads embeddings into a PostgreSQL database using pgvector.
    • train.py – Trains a LightGBM ranking model from the collected data.
  • lambda/ – AWS Lambda code used by the search API.
    • handler.py – Entry point that scores candidate images against a query.
    • ranker.py – Downloads the trained model and manages database connections.
  • extension/ – Simple Chrome extension that queries /suggest for lead-art recommendations.
  • tests/ – Unit tests for the handler and ranking helper.
  • data/ – Placeholder directory for generated embeddings and the trained model.
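
As a rough illustration of the bulk-load step, the sketch below serializes embeddings into pgvector's bracketed text form and builds CSV text suitable for a PostgreSQL COPY with CSV format. It is not taken from copy_to_rds.py: the function names and the (image_id, embedding) row shape are assumptions made for this example.

```python
import csv
import io
import math


def to_pgvector_literal(embedding):
    """Format a sequence of numbers as pgvector's text input form, e.g. [0.1,0.2]."""
    values = [float(x) for x in embedding]
    if not values:
        raise ValueError("embedding must not be empty")
    if any(math.isnan(v) or math.isinf(v) for v in values):
        raise ValueError("embedding contains NaN or infinity")
    return "[" + ",".join(repr(v) for v in values) + "]"


def rows_to_copy_csv(rows):
    """Serialize (image_id, embedding) pairs as CSV text (hypothetical row shape)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for image_id, embedding in rows:
        # The vector literal contains commas, so csv.writer quotes that field.
        writer.writerow([image_id, to_pgvector_literal(embedding)])
    return buffer.getvalue()
```

The resulting text could be streamed to the database by whichever PostgreSQL driver the script uses. Building the payload in memory first keeps the formatting logic testable without a database connection.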

Getting Started

  1. Install dependencies

    pip install -r backend/requirements.txt

  2. Ingest story–image pairs

    scripts/run_ingest.sh --since 2024-01-01T00:00:00Z

    Set ARC_API_URL and ARC_TOKEN to connect to the Arc API.

  3. Caption and embed images

    scripts/run_alttext.sh <image-id> [...]

    Requires PHOTO_URL and PHOTO_TOKEN for fetching image bytes. Results are written to artifacts/images.csv.

  4. Load embeddings into PostgreSQL

    PG_DSN=postgresql://user:pass@host/db python backend/copy_to_rds.py

  5. Train the ranking model

    scripts/run_train.sh

    Produces artifacts/leadart_ranker.lgb.

  6. Run tests

    pytest
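
Several of the steps above depend on environment variables, so a small pre-flight check can catch a missing value before a script runs. The sketch below is not part of the repository. The variable names come from the steps above, while the step labels and the missing_env helper are assumptions made for this example.

```python
import os

# Variables named in the steps above, grouped under illustrative step labels.
REQUIRED_ENV = {
    "ingest": ["ARC_API_URL", "ARC_TOKEN"],
    "alttext": ["PHOTO_URL", "PHOTO_TOKEN"],
    "copy_to_rds": ["PG_DSN"],
}


def missing_env(step, environ=None):
    """Return the variables required by `step` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV[step] if not env.get(name)]


if __name__ == "__main__":
    for step in REQUIRED_ENV:
        missing = missing_env(step)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{step}: {status}")
```

Passing a plain dictionary as environ makes the check easy to exercise in tests without touching the real process environment.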

The Lambda function (lambda/handler.py) uses the trained model to rank candidates stored in the database. The Chrome extension demonstrates how search suggestions can be integrated into Arc Composer.
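
To make the ranking flow concrete, here is a minimal sketch of a Lambda-style handler that scores candidates against a query and returns the top results. It is an illustration only. It substitutes plain cosine similarity for the trained LightGBM model and reads candidates from the request instead of the database. The event shape, field names, and function names are all assumptions and do not describe lambda/handler.py.

```python
import json
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query_embedding, candidates, top_k=5):
    """Score (image_id, embedding) pairs and return the best top_k, highest first."""
    scored = [
        {"image_id": image_id, "score": cosine_similarity(query_embedding, emb)}
        for image_id, emb in candidates
    ]
    scored.sort(key=lambda item: item["score"], reverse=True)
    return scored[:top_k]


def handler(event, context=None):
    """Hypothetical entry point: expects a JSON body with a query and candidates."""
    body = json.loads(event["body"])
    results = rank_candidates(
        body["query_embedding"], body["candidates"], body.get("top_k", 5)
    )
    return {"statusCode": 200, "body": json.dumps({"results": results})}
```

In a setup like the one described above, the similarity score would more plausibly serve as one input feature to the ranking model than as the final ordering.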
