This repository provides a Dockerized environment for running HESPI (HErbarium Specimen sheet PIpeline), a Python application designed for processing herbarium specimen images. This setup facilitates easy deployment and execution of HESPI, including support for GPU acceleration. The original HESPI repository can be found at https://github.com/rbturnbull/hespi.
- Containerized HESPI: Run HESPI in an isolated and reproducible Docker container.
- CPU and GPU Support: Easily switch between CPU and GPU configurations using Docker Compose profiles.
- Robust Job Submission: A powerful Slurm submission script with robust error handling, pre-flight checks, and flexible configuration.
- Volume Mounting: Seamlessly mount local input, output, and model directories for data processing.
- Reproducible Builds: Pinned dependencies to ensure that builds are reproducible.
This project uses a Makefile to simplify the build process for both Docker and Singularity/Apptainer images.
- `make all`: Build both CPU and GPU Docker images and download models (default).
- `make build-cpu`: Build the CPU Docker image.
- `make build-gpu`: Build the GPU Docker image.
- `make sif-cpu`: Build the CPU Singularity/Apptainer image.
- `make sif-gpu`: Build the GPU Singularity/Apptainer image.
- `make models`: Download and prepare the models.
- `make clean`: Remove all build artifacts and models.
- `make help`: Show the help message.
1. Build a CPU-only image:

   ```bash
   make build-cpu
   ```

2. Build a GPU-enabled image:

   ```bash
   make build-gpu
   ```

3. Build a GPU-enabled Singularity image:

   ```bash
   make sif-gpu
   ```

The docker-compose.yml file is configured to run HESPI using an image that can be specified with the HESPI_IMAGE environment variable.
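As a rough sketch of what such a compose file can look like (the service name, container mount paths, and default image tag below are assumptions for illustration, not the repository's actual docker-compose.yml):

```yaml
# Illustrative sketch only -- see the repository's docker-compose.yml for the real file.
services:
  hespi:
    image: ${HESPI_IMAGE:-hespi-cpu:0.6.1}  # overridden via the HESPI_IMAGE variable
    volumes:
      - ./hespi-input:/app/hespi-input      # local input images (container path assumed)
      - ./hespi-output:/app/hespi-output    # results written back to the host
      - ./hespi-models:/app/hespi-models    # downloaded model weights
```

The `${HESPI_IMAGE:-...}` interpolation is what lets the same compose file serve both the CPU and GPU images.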
1. **Place input files:** Add your image files to the `hespi-input/` directory.
2. **Start HESPI:**
   - For GPU:
     ```bash
     HESPI_IMAGE=hespi-gpu:0.6.1 docker compose run --rm hespi bash
     ```
   - For CPU:
     ```bash
     HESPI_IMAGE=hespi-cpu:0.6.1 docker compose run --rm hespi bash
     ```
3. **Run HESPI:** Once inside the container's shell, run `hespi` with the desired arguments. For example:
   ```bash
   hespi hespi-input/your_image.tif --output-dir hespi-output/
   ```
For HPC environments, you can use the Makefile to build Singularity/Apptainer images.
```bash
# Build the GPU-enabled Singularity image
make sif-gpu

# Or, for the CPU version
make sif-cpu
```

The submit_hespi_job.sh script simplifies running HESPI jobs on Slurm-based HPC clusters.
Prerequisites:
- A `hespi-gpu.sif` image file.
- A `hespi-models/` directory with the model weights.
- An `image_list.txt` file containing the absolute paths of the images to process (one per line).
`image_list.txt` example:

```
/path/to/your/images/image1.tif
/path/to/your/images/image2.jpg
```
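One way to build such a list is with `find` (a sketch assuming your images use `.tif`/`.jpg` extensions; here a throwaway directory stands in for your real image directory):

```shell
# Create a temporary directory with sample files to stand in for a real image folder.
imgdir=$(mktemp -d)
touch "$imgdir/image1.tif" "$imgdir/image2.jpg" "$imgdir/notes.txt"

# Write the absolute path of every .tif/.jpg file, one per line, to image_list.txt.
find "$imgdir" -type f \( -name '*.tif' -o -name '*.jpg' \) > image_list.txt

cat image_list.txt
```

In practice, point `find` at your real image directory instead of the temporary one, and extend the `-name` patterns to cover whatever formats your collection uses.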
Submitting the Job:
The submit_hespi_job.sh script accepts several command-line arguments to configure the job.
```bash
./submit_hespi_job.sh --account YOUR_SLURM_ACCOUNT [OPTIONS] [HESPI_EXTRA_ARGS...]
```

Command-line arguments:
- `-i, --input-dir DIR`: Host input directory (default: `./hespi-input`).
- `-o, --output-dir DIR`: Host output directory (default: `./hespi-output`).
- `-m, --model-dir DIR`: Host model directory (default: `./hespi-models`).
- `-l, --image-list FILE`: Image list file (default: `image_list.txt`).
- `-s, --sif-file FILE`: Singularity/Apptainer image file (default: `hespi-gpu.sif`).
- `-a, --account STR`: Your Slurm account (required).
- `-g, --gpu-type STR`: GPU type (default: `A100`).
- `--llm-model STR`: LLM model name to use with `hespi`.
- `-h, --help`: Display the help message.
- `HESPI_EXTRA_ARGS`: Additional arguments passed through to the `hespi` command.
Example:
```bash
# Export your OpenAI API key if you are using an LLM model
export OPENAI_API_KEY="sk-..."

# Submit the job
./submit_hespi_job.sh \
  --account naiss2023-22-123 \
  --llm-model gpt-4 \
  -- --some-hespi-flag --another-hespi-arg
```

The HESPI container is configured using environment variables in the docker-compose.yml file. These variables point to the locations of the model weights.
- `HESPI_SHEET_COMPONENT_WEIGHTS`
- `HESPI_LABEL_FIELD_WEIGHTS`
- `HESPI_PRIMARY_SPECIMEN_LABEL_CLASSIFIER_WEIGHTS`
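For example, they could be exported before starting the container (the weight filenames and the `/app/hespi-models` container path below are hypothetical placeholders; use the actual paths of your mounted model directory):

```shell
# Hypothetical weight paths -- substitute the real files in your mounted hespi-models/ directory.
export HESPI_SHEET_COMPONENT_WEIGHTS=/app/hespi-models/sheet-component.pt
export HESPI_LABEL_FIELD_WEIGHTS=/app/hespi-models/label-field.pt
export HESPI_PRIMARY_SPECIMEN_LABEL_CLASSIFIER_WEIGHTS=/app/hespi-models/classifier.pkl
```

With these exported, a subsequent `docker compose run --rm hespi bash` will use the overridden weight locations, assuming docker-compose.yml forwards these variables into the container's environment.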