- Contributors: Tony Wang
- Date: 09/25/2024
This tool is designed for use in the PAL lab to annotate video, particularly for robotics demos.
- **Object Detection with Text**: Collaborating with IDEA-Research, this library uses the SOTA open-set detection models, the **Grounding DINO series**, to detect objects in the video and overlay text annotations. With API access, this tool is lightweight, easy to use, and built solely for PAL Lab members.
- **Masking with Grounded SAM**: With API access to IDEA-Research, this tool can render masks with **Grounded SAM** on specific objects or regions within the video.
- **Video Masking**: With **Segment Anything 2**, the tool can render masks on specific objects or regions within the video.
- **Fancy Visualization**: With **Supervision**, more advanced visualization techniques to enhance the presentation of video content (to be developed; a preview sketch follows this list).
- **Depth Estimation**: With **ml-depth-pro**, the tool can estimate the depth of a given image sequence and analyze the results.
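As a preview of the planned Supervision-based visualization, here is a minimal sketch that draws a single hypothetical detection box on one extracted frame. The file names and box coordinates are placeholders, and annotator class names can vary across `supervision` releases, so treat this as illustrative rather than the tool's actual pipeline:

```python
import cv2
import numpy as np
import supervision as sv

# Load one extracted video frame (placeholder path).
frame = cv2.imread("frame_0001.jpg")

# A single hypothetical detection in xyxy pixel coordinates.
detections = sv.Detections(
    xyxy=np.array([[50.0, 60.0, 220.0, 300.0]]),
    class_id=np.array([0]),
)

# Draw the box and save the annotated frame.
annotator = sv.BoxAnnotator()
annotated = annotator.annotate(scene=frame.copy(), detections=detections)
cv2.imwrite("frame_0001_annotated.jpg", annotated)
```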
To install the PAL Video Processing Tool, follow these steps:
- Clone the repository:

  ```bash
  git clone https://github.com/Everloom-129/pal_video_tool.git
  ```

- Navigate to the project directory:

  ```bash
  cd pal_video_tool
  ```

- Install the required dependencies in a conda environment:

  ```bash
  conda create -n pal_video_tool python=3.9
  conda activate pal_video_tool
  pip install -r requirements.txt
  # cd idea-research-api  # IDEA-Research's SDK repo; removed now that the PR is fixed
  cd ml-depth-pro
  pip install -e .
  ```
- Obtain an API token from IDEA-Research and set it in an environment variable.

  **Option 1: Temporarily set the API key.** This method sets the API key only for the current terminal session.

  ```bash
  export DDS_CLOUDAPI_TEST_TOKEN='YOUR_API_KEY'
  ```

  **Option 2: Permanently add the API key to your `.bashrc` file.** This method ensures the API key is set every time you open a new terminal session.

  ```bash
  echo "export DDS_CLOUDAPI_TEST_TOKEN='YOUR_API_KEY'" >> ~/.bashrc
  ```

  Reload the terminal to apply the changes:

  ```bash
  source ~/.bashrc
  ```

  Note: Remember to replace `'YOUR_API_KEY'` with your actual API key.
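For reference, here is a minimal sketch of how the token can be read and validated at runtime. The variable name matches the export above; the error message is illustrative:

```python
import os

# Read the IDEA-Research API token set in the steps above; fail early if missing.
token = os.environ.get("DDS_CLOUDAPI_TEST_TOKEN")
if not token:
    raise RuntimeError("DDS_CLOUDAPI_TEST_TOKEN is not set; export it as shown above.")
```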
To use the video processing tool, run the following command:

```bash
python main.py --input <input_video> --output <output_video> --prompts <detection_prompts>
# --output is optional; the default output video will be <input_video_name>_pal.mp4
```

For depth estimation, first set `input_dir` and `output_dir` in the main function, then run:

```bash
python generate_depth.py
python generate_depth.py --input_dir <images_dir> --output_dir <your_dir> --image_name <for_rename>
# You can also directly modify the main function for quick updates
```

Please read the functions' docstrings for more information. A sketch of the underlying depth inference follows.
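The depth script builds on the ml-depth-pro inference API; the following sketch mirrors the usage shown in the ml-depth-pro README (the image path is a placeholder, and the exact API may differ in the pinned submodule version):

```python
import depth_pro

# Build the Depth Pro model and its preprocessing transform.
model, transform = depth_pro.create_model_and_transforms()
model.eval()

# Load an RGB image; f_px is the focal length in pixels when EXIF provides it.
image, _, f_px = depth_pro.load_rgb("images/frame_0001.jpg")

# Run inference: returns metric depth in meters plus a focal length estimate.
prediction = model.infer(transform(image), f_px=f_px)
depth = prediction["depth"]                    # depth map in meters
focallength_px = prediction["focallength_px"]  # estimated focal length in pixels
```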
Suggestions are welcome! Please raise an issue / PR if you are interested.
- Add more scripts for detection
- Add more scripts for segmentation
- Add more scripts for visualization
This project is licensed under the MIT License. See the LICENSE file for more information.
This project uses ml-depth-pro and Segment Anything 2 as submodules. After cloning, initialize the submodules:

```bash
git submodule update --init --recursive
```

- DepthPro: Sharp Monocular Metric Depth in Less Than a Second
- DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding
- Segment Anything 2
For any questions or feedback, please contact the PAL lab team at tonyw3@seas.upenn.edu