A universal web interface for real-time Vision Language Model interaction and benchmarking.
Stream your webcam to any VLM and get live AI-powered analysis - perfect for testing models, benchmarking performance, and exploring vision AI capabilities across multiple domains and hardware platforms.
🎥 Install + Demo Video (Click to expand)
Watch the demo: See Live VLM WebUI in action, streaming a webcam feed with real-time AI analysis
2025-11-13.19-00-53.mp4
Tip
⭐ If you find this project useful, please consider giving it a star! It helps others discover this tool and motivates us to keep improving it. Thank you for your support! 🙏
For PC, Mac, DGX, and Jetson systems:
pip install live-vlm-webui
live-vlm-webui

Access the WebUI: Open https://localhost:8090 in your browser
Platforms supported:
- ✅ Linux PC (x86_64)
- ✅ DGX Spark (ARM64)
- ✅ macOS (Apple Silicon)
- ✅ Windows (via WSL2) - Ollama must also run inside WSL. See Windows WSL Setup Guide
- ⚠️ Jetson (Orin, Thor) - pip works, but Docker is simpler. See Jetson Quick Start below
Important
Requires JetPack 6.x (Python 3.10+) or JetPack 7.0 (Python 3.12). JetPack 5.x has Python 3.8 which is not supported - use Docker or upgrade.
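To confirm what your board is running before choosing pip or Docker, two generic checks (the `nv_tegra_release` file is standard on Jetson L4T images):

```bash
# Check the Python version (needs 3.10+ for JetPack 6.x, 3.12 for JetPack 7.0)
python3 --version

# Check the L4T / JetPack release on Jetson
cat /etc/nv_tegra_release
```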
For all Jetson platforms (Orin, Thor):
# Clone the repository
git clone https://github.com/nvidia-ai-iot/live-vlm-webui.git
cd live-vlm-webui
# Run the auto-detection script (interactive mode)
./scripts/start_container.sh
# Or specify a version
./scripts/start_container.sh --version 0.2.0

The script auto-detects your platform, lets you choose a version, and starts the appropriate Docker container.
Access the WebUI: Open https://localhost:8090 in your browser
📘 Full Docker Guide: docs/setup/docker.md Includes manual commands, troubleshooting, network modes, and more.
Platforms supported:
- ✅ Linux PC (x86_64)
- ✅ DGX Spark (ARM64)
- ⚠️ macOS (Docker can't access localhost - use pip install instead)
- ❓ Windows WSL2 (Docker container not tested)
- ✅ Jetson (Orin, Thor) - works great
For Jetson AGX Orin and Jetson Orin Nano (JetPack 6.x / r36.x):
# Install dependencies
sudo apt install openssl python3-pip
# Install jetson-stats for GPU monitoring (optional but recommended)
# Note: Use --break-system-packages if on newer JetPack with Python 3.12
sudo pip3 install -U jetson-stats
# Install the package
python3 -m pip install --user live-vlm-webui
# Add to PATH (one-time setup)
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
# Run it
live-vlm-webui

For Jetson Thor (JetPack 7.0 / r38.2+):
# Install dependencies
sudo apt install openssl pipx
# Ensure PATH for pipx
pipx ensurepath
source ~/.bashrc
# Install the package using pipx (required for Python 3.12)
pipx install live-vlm-webui
# Install jetson-stats for GPU monitoring (from GitHub - PyPI version doesn't support Thor yet)
# Step 1: Install system-wide for the jtop service
sudo pip3 install --break-system-packages git+https://github.com/rbonghi/jetson_stats.git
sudo jtop --install-service
# Step 2: Inject into pipx environment so live-vlm-webui can use it
pipx inject live-vlm-webui git+https://github.com/rbonghi/jetson_stats.git
# Step 3: Reboot for jtop service permissions to take effect
sudo reboot
# After reboot, run it
live-vlm-webui

Warning
Ollama 0.12.10 incompatible with Jetson Thor (JetPack 7.0)
Ollama version 0.12.10 does not work on Thor - GPU inference will fail.
Solution: Use Ollama 0.12.9 or earlier:
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.12.9 sh

See the troubleshooting guide for details and alternatives.
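After installing, it is worth double-checking that the pinned release is what actually ended up on the system:

```bash
# Confirm the pinned Ollama release is in use (should report 0.12.9)
ollama --version
```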
Note
Jetson Thor GPU Monitoring: Thor support is in the latest jetson-stats on GitHub but not yet released to PyPI.
Why two installations?
- System-wide (`sudo pip3`) - Runs the jtop background service
- Pipx environment (`pipx inject`) - Allows live-vlm-webui to access jtop data
The reboot ensures proper socket permissions for jtop access.
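A quick sanity check after the reboot, assuming a default jetson-stats install (service name and socket path may differ on older releases):

```bash
# The jtop background service should be active
systemctl status jtop.service

# The socket that live-vlm-webui reads GPU/VRAM stats from
ls -l /run/jtop.sock
```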
Tip
Jetson Thor (Python 3.12): Use pipx instead of pip due to PEP 668 protection.
pipx automatically creates isolated environments for applications.
GPU Monitoring: Installing jetson-stats enables proper hardware detection and GPU/VRAM monitoring.
Access the WebUI: Open https://localhost:8090 in your browser
Benefits of pip install:
- ✅ Editable code for development
- ✅ Direct access to logs and debugging
- ✅ No container overhead
- ✅ Fine-grained control
Note: pip installation requires platform-specific setup steps. For production or simpler setup, use Docker (Option 1).
Once the server is running, access the web interface at https://localhost:8090
| 1️⃣ Click "Advanced" button | 2️⃣ Click "Proceed to localhost (unsafe)" | 3️⃣ Allow camera access when prompted |
|---|---|---|
Left Sidebar Controls:
- Set API Base URL, API Key, and Model
- 🔄 Refresh models button - Auto-detect available models
- ➕ Download button (coming soon)
- Dropdown menu lists all detected cameras
- Switch cameras on-the-fly without restarting
- START/STOP buttons for analysis control
- Frame Interval: Process every N frames (1-3600)
- Lower (5-30) = more frequent, higher GPU usage
- Higher (60-300) = less frequent, power saving
- 10+ preset prompts (scene description, object detection, safety, OCR, etc.)
- Write custom prompts
- Adjust Max Tokens for response length (1-4096)
Main Content Area:
- Model name and inference latency metrics ⏱️
- Current prompt display (gray box)
- Generated text output
- Mirror toggle button 🔄
- System info: hardware name, hostname, and GPU info
- GPU utilization and VRAM with progress bars
- CPU and RAM stats
- Sparkline graphs
Header:
- Connection Status - WebSocket connectivity indicator
- ⚙️ Settings - Advanced configuration modal (WebRTC, latency thresholds, debugging)
- 🌙/☀️ Theme Toggle - Switch between Light/Dark modes
For developers, contributors, and those who want full control:
# 1. Clone the repository
git clone https://github.com/nvidia-ai-iot/live-vlm-webui.git
cd live-vlm-webui
# 2. Create virtual environment
python3 -m venv .venv
source .venv/bin/activate
# 3. Upgrade pip and install in editable mode
pip install --upgrade pip setuptools wheel
pip install -e .
# 4. Start the server (SSL certs auto-generate)
./scripts/start_server.sh

Access the WebUI: Open https://localhost:8090
Benefits of source installation:
- ✅ Make code changes that take effect immediately (editable install)
- ✅ Access to development tools and scripts
- ✅ Works on macOS (unlike Docker which doesn't support webcam)
- ✅ Full debugging capabilities
Platforms tested:
- ✅ Linux (x86_64) - fully tested
- ✅ DGX Spark (ARM64) - fully tested
- ✅ Jetson Thor - fully tested
- ✅ Jetson Orin - fully tested
- ✅ macOS (Apple Silicon) - fully tested
- ⚠️ Windows - WSL2 recommended, native Windows requires additional setup (FFmpeg, build tools)
Tip
For Jetson, we recommend Docker for production use. Source installation works but requires `sudo apt install python3.10-venv` and careful pip management to avoid JetPack conflicts.
Choose the VLM backend that fits your needs:
📖 Looking for specific models? See the complete List of Vision-Language Models across all providers.
| Backend | Setup Difficulty | Model Coverage | GPU Required |
|---|---|---|---|
| Ollama ✅ | 🟢 Easy | 14+ vision models (link) | 🏠 Yes (local) |
| vLLM ⚠️ | 🔴 Varies (works best on PC) | Widest HF model support | 🏠 Yes (local) |
| NVIDIA NIM ⚠️ | 🟡 Medium | Limited VLM selection (improving) | 🏠 Yes (local) |
| NVIDIA API Catalog ✅ | 🟢 Easy | 12+ hosted VLMs | ☁️ No (cloud) |
| OpenAI API ⚠️ | 🟢 Easy | GPT-4o, GPT-4o-mini | ☁️ No (cloud) |

Legend: ✅ Tested | ⚠️ Has auto-detection but not fully validated
# Install from https://ollama.ai/download
# Pull a vision model
ollama pull llama3.2-vision:11b
# Start server
ollama serve

Best for: Quick start, easy model management
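Before pointing the WebUI at it, you can confirm Ollama's OpenAI-compatible endpoint is reachable on the default port:

```bash
# List available models through the OpenAI-compatible API (default Ollama port)
curl http://localhost:11434/v1/models
```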
# Install vLLM
pip install vllm
# Start server
python -m vllm.entrypoints.openai.api_server \
--model meta-llama/Llama-3.2-11B-Vision-Instruct \
--port 8000

Best for: Production deployments, high throughput
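Once the server is up, you can smoke-test it with a minimal OpenAI-style vision request like the sketch below (the image URL is just a placeholder; swap in your own):

```bash
# Minimal OpenAI-compatible vision request against the local vLLM server
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.2-11B-Vision-Instruct",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image in one sentence."},
        {"type": "image_url", "image_url": {"url": "https://example.com/frame.jpg"}}
      ]
    }],
    "max_tokens": 100
  }'
```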
- Visit NVIDIA API Catalog
- Get an API key on the build.nvidia.com page
- Configure in WebUI:
  - API Base: `https://integrate.api.nvidia.com/v1`
  - API Key: `nvapi-YOUR_KEY`
  - Model: `meta/llama-3.2-90b-vision-instruct`
Best for: Cloud-based inference, instant access, free API trial usage
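To verify the key outside the WebUI, a models listing should work, assuming the hosted endpoint exposes the standard OpenAI listing route:

```bash
# Replace nvapi-YOUR_KEY with your actual key
curl -H "Authorization: Bearer nvapi-YOUR_KEY" \
  https://integrate.api.nvidia.com/v1/models
```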
📘 Detailed Guide: VLM Backend Setup
For PC, DGX Spark, and Jetson users who want containerized deployment:
# 1. Clone the repository
git clone https://github.com/nvidia-ai-iot/live-vlm-webui.git
cd live-vlm-webui
# 2. Run the auto-detection script
./scripts/start_container.sh
# Or specify a version
./scripts/start_container.sh --version 0.2.0
# List available versions
./scripts/start_container.sh --list-versions

Benefits:
- ✅ No dependency management
- ✅ Isolated environment
- ✅ Works across all platforms (x86_64, ARM64, Jetson)
- ✅ Production-ready
- ✅ Version pinning support
Version Selection:
The script supports multiple ways to select a version:
- Interactive mode: Shows available versions and lets you pick (default)
- Specific version: `--version 0.2.0` to pin to a specific release
- Latest version: `--version latest` or `--skip-version-pick` for the newest release
- List versions: `--list-versions` to see all available tags
Available pre-built images:
| Platform | Latest Tag | Versioned Tag Example |
|---|---|---|
| PC (x86_64) / DGX Spark | `latest` | `0.2.0` |
| Jetson Orin | `latest-jetson-orin` | `0.2.0-jetson-orin` |
| Jetson Thor | `latest-jetson-thor` | `0.2.0-jetson-thor` |
| macOS (testing) | `latest-mac` | `0.2.0-mac` |
Tip
The base tags (latest, 0.2.0) are multi-arch images that automatically select the correct architecture:
- `linux/amd64` for x86_64 PC and DGX systems
- `linux/arm64` for DGX Spark (ARM64 SBSA server)
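For example, pulling the base tag lets Docker resolve the right architecture automatically (the GHCR image path below is assumed from the repository and build docs; adjust it if your images live elsewhere):

```bash
# Multi-arch pull - Docker selects linux/amd64 or linux/arm64 for you
docker pull ghcr.io/nvidia-ai-iot/live-vlm-webui:latest

# Or force a specific architecture explicitly
docker pull --platform linux/arm64 ghcr.io/nvidia-ai-iot/live-vlm-webui:latest
```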
📘 Detailed Guide: Docker Deployment Guide
For PC and DGX Spark users who want VLM + WebUI in one command:
Tip
start_docker_compose.sh automatically detects your platform, checks Docker installation, and selects the correct profile. Just run it!
Using the launcher script (recommended):
./scripts/start_docker_compose.sh ollama
# Pull a vision model after startup
docker exec ollama ollama pull llama3.2-vision:11b

Or manually with docker compose:
docker compose --profile ollama up
# Pull a vision model
docker exec ollama ollama pull llama3.2-vision:11b

Tip
Backend-centric profiles make it easy: --profile ollama, --profile vllm (future), etc.
Includes:
- ✅ Ollama for easy model management
- ✅ Live VLM WebUI for real-time interaction
- ✅ No API keys required
Tip
Cosmos-Reason1-7B is the default NIM model because it's the only NVIDIA VLM NIM that supports both x86_64 (PC) and ARM64 (DGX Spark, Jetson Thor) architectures. Other NIM models like Llama-3.2-90B-Vision and Nemotron are x86_64-only.
Using the launcher script (recommended):
# Get NGC API Key from https://org.ngc.nvidia.com/setup/api-key
export NGC_API_KEY=<your-key>
./scripts/start_docker_compose.sh nim

Or manually with docker compose:
export NGC_API_KEY=<your-key>
docker compose --profile nim up

Includes:
- ✅ NVIDIA NIM serving Cosmos-Reason1-7B with reasoning capabilities
- ✅ Production-grade inference
- ✅ Advanced VLM with planning and anomaly detection
Important
NIM requires NGC API Key and downloads ~10-15GB on first run. Requires NVIDIA driver 565+ (CUDA 12.9 support).
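A quick way to confirm the driver requirement before the first (large) NIM pull:

```bash
# Should print 565 or newer
nvidia-smi --query-gpu=driver_version --format=csv,noheader
```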
📘 Detailed Guide: Docker Compose Setup Details
- 📖 VLM Backend Setup - Detailed guide for Ollama, vLLM, SGLang, NVIDIA API
- 🤖 List of Vision-Language Models - Comprehensive catalog of VLMs across Ollama, NVIDIA, OpenAI, Anthropic
- 📹 RTSP IP Camera Setup - 🧪 Beta feature for continuous monitoring (tested: Reolink RLC-811A)
- 🐋 Docker Compose Details - Complete stack setup with Ollama or NIM
- 🛠️ Docker Deployment Guide - Complete Docker setup and troubleshooting
- ⚙️ Advanced Configuration - Performance tuning, custom prompts, API compatibility
- 🔨 Building Docker Images - Build platform-specific images for GHCR
- 🧑💻 Contributing Guide - How to contribute to the project
- 🚑 Troubleshooting Guide - Common issues and solutions
- 💬 GitHub Issues - Bug reports and feature requests
- 🌐 NVIDIA Developer Forums - Community support
- 🎥 Multi-source video input
- WebRTC webcam streaming (stable)
- 🧪 RTSP IP camera support (Beta - tested with Reolink RLC-811A)
- 🔌 OpenAI-compatible API - Works with vLLM, SGLang, Ollama, TGI, or any vision API
- 📝 Interactive prompt editor - 10+ preset prompts + custom prompts
- ⚡ Async processing - Smooth video while VLM processes frames in background
- 🔧 Flexible deployment - Local inference or cloud APIs
- 🎨 Modern NVIDIA-themed UI - Professional design with NVIDIA green accents
- 🌓 Light/Dark theme toggle - Automatic preference persistence
- 📊 Live system monitoring - Real-time GPU, VRAM, CPU, RAM stats with sparkline charts
- ⏱️ Inference metrics - Live latency tracking (last, average, total count)
- 🪞 Video mirroring - Toggle button overlay on camera view
- 📱 Compact layout - Single-screen design
- 💻 Cross-platform monitoring - Auto-detects NVIDIA GPUs (NVML), Apple Silicon
- 🖥️ Dynamic system detection - CPU model name and hostname
- 🔒 HTTPS support - Self-signed certificates for secure webcam access
- 🌐 Universal compatibility - PC (x86_64), DGX Spark (ARM64 SBSA), Jetson (Orin, Thor), Mac
- 🏗️ Multi-arch Docker images - Single image works across x86_64 and ARM64 architectures
- 🔒 Security - Real-time monitoring and alert generation
- 🤖 Robotics - Visual feedback for robot control
- 🏭 Industrial - Quality control, safety monitoring, automation
- 🏥 Healthcare - Activity monitoring, fall detection
- ♿ Accessibility - Visual assistance for visually impaired users
- 📚 Education - Interactive learning experiences
- 🎬 Content Creation - Live scene analysis for video production
- 🎮 Gaming - AI game master or interactive experiences
Camera not accessible?
- Use HTTPS (not HTTP): `./scripts/start_server.sh` or `--ssl-cert cert.pem --ssl-key key.pem`
- Accept the self-signed certificate warning (Advanced → Proceed)
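If you need to (re)generate the self-signed certificate yourself, the repo ships a generator under `scripts/`; a plain openssl equivalent is shown as well:

```bash
# Helper script from the repo (start_server.sh auto-generates certs if missing)
./scripts/generate_cert.sh

# Or generate cert.pem / key.pem manually with openssl
openssl req -x509 -newkey rsa:4096 -nodes -days 365 \
  -keyout key.pem -out cert.pem -subj "/CN=localhost"
```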
Can't connect to VLM?
- Check the VLM is running: `curl http://localhost:8000/v1/models` (vLLM) or `curl http://localhost:11434/v1/models` (Ollama)
- Use `--network host` in Docker for local VLM services
GPU stats show "N/A"?
- PC: Add `--gpus all` when running Docker
- Jetson: Add `--privileged -v /run/jtop.sock:/run/jtop.sock:ro` (see the combined example below)
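Put together, a full run command looks roughly like this (image tags assumed from the table above; adjust tags and networking to your setup):

```bash
# PC with NVIDIA GPU
docker run --gpus all --network host \
  ghcr.io/nvidia-ai-iot/live-vlm-webui:latest

# Jetson (Orin shown) with the jtop socket mounted for GPU/VRAM stats
docker run --privileged --network host \
  -v /run/jtop.sock:/run/jtop.sock:ro \
  ghcr.io/nvidia-ai-iot/live-vlm-webui:latest-jetson-orin
```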
Slow performance?
- Use a smaller model (e.g., gemma3:4b instead of gemma3:12b)
- Increase Frame Processing Interval (60+ frames)
- Reduce Max Tokens (50-100 instead of 512)
We ❤️ contributions from the community! This project is built with passion and we'd love your help making it even better.
How you can help:
- ⭐ Star this repo - It really helps us and takes just 1 second!
- 🐛 Report bugs - Found an issue? Let us know
- 💡 Suggest features - Have an idea? Create a feature request
- 🔧 Submit PRs - Code contributions are always welcome!
- 📢 Share it - Tell others about this project
- 📝 Improve docs - Help us make the documentation better
Areas for improvement:
- 📏 Jetson VRAM utilization - Workaround for measuring GPU memory consumption
- ⚡ Hardware-accelerated video processing on Jetson - Use NVENC/NVDEC
- ➕ Model download UI - Ability to initiate the backend's model download from the Web UI
- 📜 Log functionality - Keep the past analysis results viewable
- 🏆 Benchmark mode - Side-by-side model comparison
- 👥 Multi-session support - Serve multiple concurrent sessions when hosting
See Contributing Guide for details.
Important
⭐ Don't forget to star the repository if you found it helpful! Your support means the world to us and helps demonstrate the value of this work to the community and our organization.
live-vlm-webui/
├── src/
│ └── live_vlm_webui/ # Main Python package
│ ├── __init__.py # Package initialization
│ ├── server.py # Main WebRTC server with WebSocket support
│ ├── video_processor.py # Video frame processing and VLM integration
│ ├── gpu_monitor.py # Cross-platform GPU/system monitoring
│ ├── vlm_service.py # VLM API integration
│ └── static/
│ └── index.html # Frontend web UI
│
├── scripts/ # Bash scripts & utilities
│ ├── start_server.sh # Quick start script with SSL
│ ├── stop_server.sh # Stop the server
│ ├── start_container.sh # Auto-detection Docker launcher
│ ├── stop_container.sh # Stop Docker container
│ ├── start_docker_compose.sh # Docker Compose launcher
│ ├── generate_cert.sh # SSL certificate generation
│ ├── build_multiarch.sh # Multi-arch Docker build
│ └── build_multiarch_cuda.sh
│
├── docker/ # Docker configuration
│ ├── Dockerfile # x86_64 PC / DGX Spark (multi-arch)
│ ├── Dockerfile.jetson-orin # Jetson Orin
│ ├── Dockerfile.jetson-thor # Jetson Thor
│ ├── Dockerfile.jetson # Generic Jetson
│ ├── Dockerfile.mac # macOS (testing)
│ └── docker-compose.yml # Unified stack (Ollama + NIM)
│
├── tests/ # Unit tests
│ └── __init__.py
│
├── prototypes/ # Experimental/prototype scripts (not production)
│ ├── examples.sh
│ ├── test_mac_docker.sh
│ └── test_gpu_monitor_mac.py
│
├── docs/ # Detailed documentation
│ ├── setup/ # Setup guides
│ ├── usage/ # Usage guides
│ ├── development/ # Developer guides
│ └── troubleshooting.md
│
├── pyproject.toml # Modern Python packaging (PEP 621)
├── requirements.txt # Python dependencies
├── requirements-dev.txt # Development dependencies
├── MANIFEST.in # Package data includes
├── README.md # This file
├── CONTRIBUTING.md # Contribution guidelines
└── LICENSE # Apache 2.0 license
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
- Built with aiortc - Python WebRTC implementation
- Compatible with vLLM, SGLang, and Ollama
- Inspired by the growing ecosystem of open-source vision language models, including NanoVLM
If you use this in your research or project, please cite:
@software{live_vlm_webui,
title = {Live VLM WebUI: Real-time Vision AI Interaction},
year = {2025},
url = {https://github.com/nvidia-ai-iot/live-vlm-webui}
}

Thank you to everyone who has starred this project! Your support drives us to keep improving and innovating. 🚀
Haven't starred yet? Click here to give us a ⭐ — it takes just a second and helps us tremendously!





