
PAL Video Processing Tool

  • Contributors: Tony Wang
  • 09/25/2024

This tool is designed for use in the PAL lab to annotate video, particularly for robotics demos.

Features

  • Object Detection with Text: Collaborating with IDEA-Research, this library uses the SOTA open-set detection model, the Grounding DINO series, to detect objects in the video and overlay text annotations. With API access, this tool is lightweight, easy to use, and built solely for PAL Lab members.

  • Masking with Grounded SAM: With API access to IDEA-Research, this tool can render masks with Grounded SAM over specific objects or regions within the video.

  • Video Masking: With Segment Anything 2, the tool can render masks over specific objects or regions within the video.

  • Fancy Visualization: With Supervision, more advanced visualization techniques to enhance the presentation of video content (in development).

  • Depth Estimation: With ml-depth-pro, the tool can estimate depth for a given image sequence and analyze the results.

Installation

To install the PAL Video Processing Tool, follow these steps:

  1. Clone the repository:

    git clone https://github.com/Everloom-129/pal_video_tool.git
  2. Navigate to the project directory:

    cd pal_video_tool
  3. Install the required dependencies in a conda environment:

    conda create -n pal_video_tool python=3.9
    conda activate pal_video_tool
    pip install -r requirements.txt
    cd ml-depth-pro
    pip install -e .
  4. Obtain an API token from IDEA-Research and set it in an environment variable.

    Option 1: Temporarily set the API key

    This method sets the API key only for the current terminal session.

    export DDS_CLOUDAPI_TEST_TOKEN='YOUR_API_KEY'

    Option 2: Permanently add the API key to your .bashrc file

    This method ensures the API key is set every time you open a new terminal session.

    echo "export DDS_CLOUDAPI_TEST_TOKEN='YOUR_API_KEY'" >> ~/.bashrc

    Reload your shell configuration to apply the changes:

    source ~/.bashrc

    Note: Remember to replace 'YOUR_API_KEY' with your actual API key.
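
    Once the token is exported, a quick sanity check in Python can confirm it is visible to the process before running the tool. This is an illustrative sketch; the helper `check_token` is not part of the tool's codebase.

    ```python
    import os

    def check_token(env=os.environ) -> str:
        """Return the IDEA-Research API token, or raise if it is not set."""
        token = env.get("DDS_CLOUDAPI_TEST_TOKEN")
        if not token:
            raise RuntimeError("DDS_CLOUDAPI_TEST_TOKEN is not set; see the steps above.")
        return token
    ```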

Usage

Grounding DINO for video

To use the video processing tool, run the following command:

python main.py --input <input_video> --output <output_video> --prompts <detection_prompts>  # --output is optional; default is <input_video_name>_pal.mp4
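
The default output naming described above (`<input_video_name>_pal.mp4`) can be sketched as follows. `default_output_path` is a hypothetical helper for illustration, not a function in `main.py`:

```python
from pathlib import Path

def default_output_path(input_video: str) -> str:
    """Derive the default output name: the input stem with a _pal.mp4 suffix."""
    p = Path(input_video)
    return str(p.with_name(p.stem + "_pal.mp4"))

print(default_output_path("robot_demo.mov"))  # robot_demo_pal.mp4
```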

DepthPro for images

Run DepthPro on a directory of images:

# first set input_dir and output_dir in the main function
python generate_depth.py 

Compare depth maps from RGBD and DepthPro

python generate_depth.py --input_dir <images_dir> --output_dir <your_dir> --image_name <for rename>
# you can also modify the main function directly for quick changes
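
As a sketch of what comparing the two depth sources might involve, the following computes mean absolute and mean relative error between an RGBD sensor depth map and a DepthPro prediction, skipping invalid (zero-depth) sensor pixels. The function name and metrics are illustrative assumptions, not the tool's actual API:

```python
import numpy as np

def depth_error_stats(rgbd_depth: np.ndarray, pred_depth: np.ndarray):
    """Compare a sensor depth map against a predicted one.

    Returns (mean absolute error, mean absolute relative error),
    computed only over pixels where the sensor reported valid depth.
    """
    valid = rgbd_depth > 0  # RGBD sensors report 0 for missing depth
    diff = np.abs(pred_depth[valid] - rgbd_depth[valid])
    mae = float(diff.mean())
    abs_rel = float((diff / rgbd_depth[valid]).mean())
    return mae, abs_rel
```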

Please read the functions' docstrings for more information.

Contributing

Suggestions are welcome! Please open an issue or PR if you are interested:

  • Add more scripts for detection
  • Add more scripts for segmentation
  • Add more scripts for visualization

License

This project is licensed under the MIT License. See the LICENSE file for more information.

Dependencies

This project uses ml-depth-pro and Segment Anything 2 as submodules. After cloning, initialize the submodules:

git submodule update --init --recursive

Contact

For any questions or feedback, please contact the PAL lab team at tonyw3@seas.upenn.edu
