parallelworks/hpc_status

HPC Status Monitor

Real-time dashboard for monitoring HPC fleet status, queue health, quota usage, and storage across multiple systems.

Quick Start

./scripts/run.sh

Open http://localhost:8080 to view the dashboard.

Local Mode with PW CLI

Monitor your own clusters locally by authenticating with the Parallel Works CLI:

# Install PW CLI (if not already installed)
pip install pw-client

# Authenticate with your API key
pw auth

# Run the dashboard
./scripts/run.sh

The dashboard will automatically detect your authenticated PW CLI session and collect data from all connected clusters, including:

  • HPC clusters with PBS/Slurm schedulers (queue depth, quota usage)
  • GPU servers (nvidia-smi metrics, utilization, memory)
  • Compute nodes (CPU, memory, load averages)

Your API key can be found in your ACTIVATE account under "API Keys".
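As an illustration of the GPU metrics listed above, the following is a minimal sketch of how nvidia-smi output could be turned into structured samples. It assumes the data comes from `nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits`; the `GpuSample` class and `parse_nvidia_smi_csv` function are hypothetical names used for illustration and are not taken from this repository. Fields reported as `[N/A]` by some devices are not handled here.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class GpuSample:
    """One row of nvidia-smi CSV output (hypothetical structure)."""
    index: int
    utilization_pct: float
    memory_used_mib: float
    memory_total_mib: float

    @property
    def memory_used_fraction(self) -> float:
        # Guard against a zero total so the property never divides by zero.
        if self.memory_total_mib == 0:
            return 0.0
        return self.memory_used_mib / self.memory_total_mib


def parse_nvidia_smi_csv(text: str) -> List[GpuSample]:
    """Parse 'index, util, mem_used, mem_total' rows (noheader, nounits)."""
    samples = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        index, util, used, total = (field.strip() for field in row)
        samples.append(GpuSample(int(index), float(util), float(used), float(total)))
    return samples


if __name__ == "__main__":
    # Sample text standing in for real nvidia-smi output.
    sample_output = "0, 87, 30512, 40960\n1, 3, 512, 40960\n"
    for gpu in parse_nvidia_smi_csv(sample_output):
        print(gpu.index, gpu.utilization_pct, round(gpu.memory_used_fraction, 2))
```

Keeping the parser separate from the command invocation makes it testable without a GPU: the text can come from a local subprocess or from a remote cluster session.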

Features

  • Fleet Status - Real-time system availability across sites
  • Queue Health - PBS/Slurm queue depth and wait times
  • Quota Usage - Allocation tracking with warnings
  • Storage Monitoring - Disk capacity for home/work/scratch
  • Recommendations - Queue scoring and load balancing suggestions
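To make the idea of queue scoring concrete, here is a minimal sketch of one way a recommendation score might be computed from queue depth and wait time. The formula, the `QueueStats` fields, and the function names are assumptions chosen for illustration; the scoring actually used by the dashboard is not described in this README.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class QueueStats:
    """Hypothetical per-queue snapshot."""
    name: str
    running_jobs: int
    queued_jobs: int
    avg_wait_minutes: float


def score_queue(q: QueueStats) -> float:
    """Return a score in (0, 1]; higher means the queue looks less congested.

    Assumed formula: penalize the backlog ratio (queued per running job)
    and the average wait expressed in hours.
    """
    backlog = q.queued_jobs / max(q.running_jobs, 1)
    wait_hours = q.avg_wait_minutes / 60.0
    return 1.0 / ((1.0 + backlog) * (1.0 + wait_hours))


def rank_queues(queues: Iterable[QueueStats]) -> List[QueueStats]:
    """Sort queues from most to least attractive by score."""
    return sorted(queues, key=score_queue, reverse=True)


if __name__ == "__main__":
    # Made-up numbers, purely to exercise the functions.
    queues = [
        QueueStats("standard", running_jobs=100, queued_jobs=50, avg_wait_minutes=30),
        QueueStats("debug", running_jobs=4, queued_jobs=0, avg_wait_minutes=0),
        QueueStats("big", running_jobs=10, queued_jobs=40, avg_wait_minutes=120),
    ]
    for q in rank_queues(queues):
        print(q.name, round(score_queue(q), 3))
```

A multiplicative score like this one drops quickly when either the backlog or the wait time grows, which suits a "pick the least congested queue" suggestion; a real deployment would likely also weigh quota and node availability.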

Configuration

Use a custom config file:

CONFIG_FILE=configs/my-config.yaml ./scripts/run.sh

See docs/configuration.md for all options.
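The sketch below shows one plausible way a server could decide which config file to load, assuming an explicit `--config` value takes precedence over the `CONFIG_FILE` environment variable, which in turn takes precedence over a default. The precedence order and the `resolve_config_path` helper are assumptions for illustration; how `scripts/run.sh` actually forwards `CONFIG_FILE` is not shown in this README.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Default path borrowed from the direct-invocation example in this README.
DEFAULT_CONFIG = "configs/config.yaml"


def resolve_config_path(
    cli_value: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> Path:
    """Pick the config path: CLI value, then CONFIG_FILE, then the default.

    The precedence order is an assumption, not confirmed by the project.
    """
    candidate = cli_value or environ.get("CONFIG_FILE") or DEFAULT_CONFIG
    return Path(candidate)


if __name__ == "__main__":
    print(resolve_config_path())
    print(resolve_config_path(environ={"CONFIG_FILE": "configs/my-config.yaml"}))
```

Passing the environment as a parameter keeps the helper easy to test without modifying the real process environment.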

Documentation

Guide          | Description
Deployment     | Installation, Docker, ACTIVATE workflow
Configuration  | YAML options, environment variables
API Reference  | REST endpoints and responses

Development

# Run tests
./scripts/test.sh

# Direct invocation
python -m src.server.main --config configs/config.yaml

License

See LICENSE file for details.

About

HPCMP Status Site
