
HESPI Docker

License: MIT

This repository provides a Dockerized environment for running HESPI (HErbarium Specimen sheet PIpeline), a Python application for processing herbarium specimen images. The setup makes HESPI easy to deploy and run, with optional GPU acceleration. The original HESPI repository can be found at https://github.com/rbturnbull/hespi.

Features

  • Containerized HESPI: Run HESPI in an isolated and reproducible Docker container.
  • CPU and GPU Support: Easily switch between CPU and GPU configurations using Docker Compose profiles.
  • Robust Job Submission: A Slurm submission script with error handling, pre-flight checks, and flexible configuration.
  • Volume Mounting: Seamlessly mount local input, output, and model directories for data processing.
  • Reproducible Builds: Pinned dependencies to ensure that builds are reproducible.

Building Images with the Makefile

This project uses a Makefile to simplify the build process for both Docker and Singularity/Apptainer images.

Makefile Targets

  • make all: Build both CPU and GPU Docker images and download models (default).
  • make build-cpu: Build the CPU Docker image.
  • make build-gpu: Build the GPU Docker image.
  • make sif-cpu: Build the CPU Singularity/Apptainer image.
  • make sif-gpu: Build the GPU Singularity/Apptainer image.
  • make models: Download and prepare the models.
  • make clean: Remove all build artifacts and models.
  • make help: Show the help message.
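Under the hood, the build targets typically wrap docker build and apptainer build. A minimal sketch of what they might look like (the actual Makefile in this repository is authoritative; the Dockerfile names and tag here are assumptions):

```makefile
# Hypothetical sketch only; see the repository's Makefile for the real targets.
# Note: recipe lines must be indented with tabs.
IMAGE_TAG ?= 0.6.1

build-cpu:
	docker build -f Dockerfile.cpu -t hespi-cpu:$(IMAGE_TAG) .

build-gpu:
	docker build -f Dockerfile.gpu -t hespi-gpu:$(IMAGE_TAG) .

sif-gpu: build-gpu
	apptainer build hespi-gpu.sif docker-daemon://hespi-gpu:$(IMAGE_TAG)
```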

Examples

1. Build a CPU-only image:

make build-cpu

2. Build a GPU-enabled image:

make build-gpu

3. Build a GPU-enabled Singularity image:

make sif-gpu

Usage with Docker Compose

The docker-compose.yml file is configured to run HESPI using an image that can be specified with the HESPI_IMAGE environment variable.
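A compose file along these lines would support that pattern (this is a sketch, not the file itself; the service name and container-side mount paths are assumptions — consult the docker-compose.yml in this repository for the authoritative version):

```yaml
# Hypothetical sketch; the repository's docker-compose.yml is authoritative.
services:
  hespi:
    image: ${HESPI_IMAGE:-hespi-cpu:0.6.1}   # overridden via the HESPI_IMAGE env var
    volumes:
      - ./hespi-input:/app/hespi-input      # container paths are assumptions
      - ./hespi-output:/app/hespi-output
      - ./hespi-models:/app/hespi-models
```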

Running HESPI

  1. Place Input Files: Add your image files into the hespi-input/ directory.

  2. Start HESPI:

    • For GPU:
      HESPI_IMAGE=hespi-gpu:0.6.1 docker compose run --rm hespi bash
    • For CPU:
      HESPI_IMAGE=hespi-cpu:0.6.1 docker compose run --rm hespi bash
  3. Execute HESPI command: Once inside the container's shell, you can run hespi with the desired arguments. For example:

    hespi hespi-input/your_image.tif --output-dir hespi-output/

Singularity / Apptainer Support

For HPC environments, you can use the Makefile to build Singularity/Apptainer images.

Building the Singularity Image

# Build the GPU-enabled Singularity image
make sif-gpu

# Or, for the CPU version
make sif-cpu
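Once built, the image can be invoked directly with apptainer. A sketch of a one-off interactive check (assumes hespi-gpu.sif is in the current directory; --nv exposes the host GPU driver to the container):

```shell
# Illustrative only: invoke hespi from the built image on a GPU node.
# Guarded so it degrades gracefully where apptainer or the .sif is absent.
if command -v apptainer >/dev/null 2>&1 && [ -f hespi-gpu.sif ]; then
  apptainer exec --nv hespi-gpu.sif hespi --help
else
  echo "apptainer or hespi-gpu.sif not found; run this on the HPC system"
fi
```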

Running on HPC (Slurm)

The submit_hespi_job.sh script simplifies running HESPI jobs on Slurm-based HPC clusters.

Prerequisites:

  1. A hespi-gpu.sif image file.
  2. A hespi-models/ directory with the model weights.
  3. An image_list.txt file containing the absolute paths of the images to process (one per line).

image_list.txt example:

/path/to/your/images/image1.tif
/path/to/your/images/image2.jpg
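One way to generate such a list with find (the directory path and extensions below are placeholders — adjust to your data):

```shell
# Write the absolute paths of all .tif and .jpg files under IMAGE_DIR,
# one per line, into image_list.txt.
IMAGE_DIR="/path/to/your/images"   # replace with your image directory
if [ -d "$IMAGE_DIR" ]; then
  find "$IMAGE_DIR" -type f \( -name '*.tif' -o -name '*.jpg' \) > image_list.txt
fi
```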

Submitting the Job:

The submit_hespi_job.sh script accepts several command-line arguments to configure the job.

./submit_hespi_job.sh --account YOUR_SLURM_ACCOUNT [OPTIONS] [HESPI_EXTRA_ARGS...]

Command-line arguments:

  • -i, --input-dir DIR: Host input directory (default: ./hespi-input).
  • -o, --output-dir DIR: Host output directory (default: ./hespi-output).
  • -m, --model-dir DIR: Host model directory (default: ./hespi-models).
  • -l, --image-list FILE: Image list file (default: image_list.txt).
  • -s, --sif-file FILE: Singularity/Apptainer image file (default: hespi-gpu.sif).
  • -a, --account STR: Your Slurm account (required).
  • -g, --gpu-type STR: GPU type (default: A100).
  • --llm-model STR: LLM model name to use with hespi.
  • -h, --help: Display the help message.
  • HESPI_EXTRA_ARGS: Additional arguments passed through to the hespi command; place them after a -- separator, as shown in the example below.

Example:

# Export your OpenAI API key if you are using an LLM model
export OPENAI_API_KEY="sk-..."

# Submit the job
./submit_hespi_job.sh \
    --account naiss2023-22-123 \
    --llm-model gpt-4 \
    -- --some-hespi-flag --another-hespi-arg

Configuration

The HESPI container is configured using environment variables in the docker-compose.yml file. These variables point to the locations of the model weights.

  • HESPI_SHEET_COMPONENT_WEIGHTS
  • HESPI_LABEL_FIELD_WEIGHTS
  • HESPI_PRIMARY_SPECIMEN_LABEL_CLASSIFIER_WEIGHTS
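If you need to override these for a one-off run, you could export them in the container's shell before invoking hespi. A sketch (the file names below are illustrative, not the actual weight files — point them at the files in your hespi-models/ directory):

```shell
# Illustrative values only; substitute the real weight files from hespi-models/.
export HESPI_SHEET_COMPONENT_WEIGHTS="/app/hespi-models/sheet-component.pt"
export HESPI_LABEL_FIELD_WEIGHTS="/app/hespi-models/label-field.pt"
export HESPI_PRIMARY_SPECIMEN_LABEL_CLASSIFIER_WEIGHTS="/app/hespi-models/psl-classifier.pt"
```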
