This repository contains a prototype for captioning images, embedding text, and serving semantic image search results via an AWS Lambda function. It includes helper scripts for data ingestion, model training, and a minimal Chrome extension.
- `backend/` – Data preparation and training scripts.
  - `alttext-embed.py` – Generates captions for images with GPT-4-Vision and writes image embeddings.
  - `ingest.py` – Fetches ArcXP story revisions to build story–image pairs.
  - `copy_to_rds.py` – Bulk loads embeddings into a PostgreSQL database using pgvector.
  - `train.py` – Trains a LightGBM ranking model from the collected data.
- `lambda/` – AWS Lambda code used by the search API.
  - `handler.py` – Entry point that scores candidate images against a query.
  - `ranker.py` – Downloads the trained model and manages database connections.
- `extension/` – Simple Chrome extension that queries `/suggest` for lead-art recommendations.
- `tests/` – Unit tests for the handler and ranking helper.
- `data/` – Placeholder directory for generated embeddings and the trained model.
- Install dependencies

  ```bash
  pip install -r backend/requirements.txt
  ```
- Ingest story–image pairs

  ```bash
  scripts/run_ingest.sh --since 2024-01-01T00:00:00Z
  ```

  Set `ARC_API_URL` and `ARC_TOKEN` to connect to the Arc API.
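  As a rough sketch, `ingest.py` presumably issues authenticated requests along these lines; the endpoint path, query syntax, and response fields below are assumptions, not the script's actual values.

  ```python
  # Illustrative only: the real ingest.py may use different endpoints and
  # pagination. ARC_API_URL and ARC_TOKEN are the two environment
  # variables the step above requires.
  import os
  import requests

  def fetch_revisions(since: str) -> list[dict]:
      """Fetch story revisions updated after `since` (ISO-8601 timestamp)."""
      resp = requests.get(
          f"{os.environ['ARC_API_URL']}/content/v4/search/published",  # hypothetical path
          headers={"Authorization": f"Bearer {os.environ['ARC_TOKEN']}"},
          params={"q": f"last_updated_date:[{since} TO *]"},
          timeout=30,
      )
      resp.raise_for_status()
      return resp.json().get("content_elements", [])
  ```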
- Caption and embed images

  ```bash
  scripts/run_alttext.sh <image-id> [...]
  ```

  Requires `PHOTO_URL` and `PHOTO_TOKEN` for fetching image bytes. Results are written to `artifacts/images.csv`.
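  The caption-then-embed flow in `alttext-embed.py` is roughly the following, assuming the OpenAI Python SDK; the model names and prompt are assumptions, not the script's actual values.

  ```python
  # Sketch of the caption-then-embed flow. Model names and the prompt are
  # assumptions; see backend/alttext-embed.py for the real implementation.
  from openai import OpenAI

  client = OpenAI()

  def caption_and_embed(image_url: str) -> tuple[str, list[float]]:
      """Caption one image with GPT-4-Vision, then embed the caption text."""
      caption = client.chat.completions.create(
          model="gpt-4-vision-preview",
          messages=[{
              "role": "user",
              "content": [
                  {"type": "text", "text": "Write one sentence of alt text for this image."},
                  {"type": "image_url", "image_url": {"url": image_url}},
              ],
          }],
      ).choices[0].message.content
      embedding = client.embeddings.create(
          model="text-embedding-3-small", input=caption
      ).data[0].embedding
      return caption, embedding
  ```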
- Load embeddings into PostgreSQL

  ```bash
  PG_DSN=postgresql://user:pass@host/db python backend/copy_to_rds.py
  ```
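  pgvector accepts vectors written as bracketed literals, so a minimal load loop might look like the sketch below; the table name and columns are assumptions, and the real script likely uses a bulk `COPY` rather than row-by-row inserts.

  ```python
  # Sketch of a pgvector load, assuming psycopg2 and a hypothetical
  # image_embeddings table; copy_to_rds.py may differ in schema and method.
  import os
  import psycopg2

  rows = [("img-1", "A city skyline at dusk", [0.12, -0.03, 0.88])]  # placeholder data

  with psycopg2.connect(os.environ["PG_DSN"]) as conn, conn.cursor() as cur:
      for image_id, caption, vec in rows:
          cur.execute(
              "INSERT INTO image_embeddings (image_id, caption, embedding) "
              "VALUES (%s, %s, %s::vector)",
              (image_id, caption, str(vec)),  # pgvector parses '[0.12, -0.03, 0.88]'
          )
  ```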
- Train the ranking model

  ```bash
  scripts/run_train.sh
  ```

  Produces `artifacts/leadart_ranker.lgb`.
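  The core of `train.py` is presumably a LightGBM ranking objective; here is a sketch with placeholder data (the feature layout, labels, and parameters are assumptions).

  ```python
  # Sketch of LambdaMART-style training with LightGBM. Features, labels,
  # and per-story group sizes are placeholders, not the real pipeline's.
  import numpy as np
  import lightgbm as lgb

  X = np.random.rand(100, 8)             # candidate-image feature vectors
  y = np.random.randint(0, 2, size=100)  # 1 = image was chosen as lead art
  group = [10] * 10                      # ten candidate images per story (query)

  train_set = lgb.Dataset(X, label=y, group=group)
  booster = lgb.train(
      {"objective": "lambdarank", "metric": "ndcg"},
      train_set,
      num_boost_round=100,
  )
  booster.save_model("artifacts/leadart_ranker.lgb")
  ```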
- Run tests

  ```bash
  pytest
  ```
The Lambda function (`lambda/handler.py`) uses the trained model to rank candidates stored in the database. The Chrome extension demonstrates how search suggestions can be integrated into Arc Composer.
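In outline, the request path through the Lambda looks something like the sketch below; the event shape and the candidate lookup are assumptions standing in for the real `ranker.py` helpers.

```python
# Sketch of the Lambda scoring path. Event shape and helper behaviour are
# assumptions; see lambda/handler.py and lambda/ranker.py for the real code.
import json
import lightgbm as lgb

booster = lgb.Booster(model_file="/tmp/leadart_ranker.lgb")  # fetched at cold start

def fetch_candidates(query: str) -> list[dict]:
    """Stand-in for ranker.py's pgvector nearest-neighbour lookup."""
    return [{"image_id": "img-1", "features": [0.0] * 8}]

def handler(event, context):
    query = event["queryStringParameters"]["q"]
    candidates = fetch_candidates(query)
    scores = booster.predict([c["features"] for c in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: -pair[1])
    return {
        "statusCode": 200,
        "body": json.dumps([c["image_id"] for c, _ in ranked[:10]]),
    }
```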