
Feature Request: New Image Format for Computer Vision Datasets #1

@VoxleOne

Description


Motivation

The goal is to create a new image format, or adapt an existing one, to improve archiving and data management for computer vision datasets. The core idea is to embed rich metadata directly within the image file, tailored specifically to the needs of computer vision applications.

Key Features

The proposed image format should support:

  1. Standard Image Information: Basic image metadata comparable to EXIF data in JPEGs (e.g., resolution, color depth, creation date).
  2. Custom Tags: Flexible, user-defined tags for general-purpose annotation or categorization.
  3. Dedicated Computer Vision Dataset Metadata Storage:
    • Object Classes: Ability to store a list of defined object classes relevant to the image.
    • Bounding Box Coordinates: Storage for bounding box coordinates (e.g., x, y, width, height) for multiple objects within the image. The design should handle a large number of bounding boxes or complex annotation scenarios, ideally offering greater capacity or more efficient storage than existing sidecar-file approaches.
  4. Extensibility for Other Metadata (To Be Explored):
    • Author/Creator information.
    • Camera settings or image acquisition parameters.
    • AI model details (e.g., model used for pre-annotation, version).
    • GPS coordinates or other location data.
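To make the "dedicated computer vision metadata" item more concrete, here is a minimal sketch of what the stored payload could look like, assuming boxes are kept as fixed-size binary records rather than text. All names (`BoundingBox`, `CVMetadata`, `BOX_RECORD`) and the record layout (uint16 class index plus four float32 values, little-endian) are hypothetical choices for illustration, not part of any existing format.

```python
import struct
from dataclasses import dataclass, field

# Hypothetical record layout: one little-endian record per box,
# a uint16 class index followed by x, y, width, height as float32.
BOX_RECORD = struct.Struct("<Hffff")


@dataclass
class BoundingBox:
    class_id: int  # index into CVMetadata.classes
    x: float
    y: float
    width: float
    height: float


@dataclass
class CVMetadata:
    classes: list[str] = field(default_factory=list)     # object classes
    boxes: list[BoundingBox] = field(default_factory=list)
    tags: dict[str, str] = field(default_factory=dict)   # free-form custom tags

    def pack_boxes(self) -> bytes:
        """Serialize all boxes as a uint32 count followed by fixed-size records."""
        out = bytearray(struct.pack("<I", len(self.boxes)))
        for b in self.boxes:
            out += BOX_RECORD.pack(b.class_id, b.x, b.y, b.width, b.height)
        return bytes(out)

    @staticmethod
    def unpack_boxes(data: bytes) -> list[BoundingBox]:
        """Inverse of pack_boxes: read the count, then each fixed-size record."""
        (count,) = struct.unpack_from("<I", data, 0)
        return [
            BoundingBox(*BOX_RECORD.unpack_from(data, 4 + i * BOX_RECORD.size))
            for i in range(count)
        ]
```

With this layout each box costs 18 bytes regardless of class-name length, which is one possible answer to the "higher capacity or more efficient storage" point; the trade-off is that the payload is no longer human-readable, and float32 limits coordinate precision.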

Benefits

  • Streamlined Workflows: Simplifies the management of image data and its annotations by keeping them together, reducing both the need for separate annotation files (such as XML, JSON, or CSV) and the risk of those files falling out of sync with the images.
  • Improved Data Integrity: Ensures that metadata, especially crucial bounding box and class information, is intrinsically linked to the image.
  • Enhanced Portability: Makes datasets easier to share and transfer, as each image is a single self-contained file carrying all of its annotations.
  • Potential for Advanced Functionality: Could support more complex annotation types, or a higher density of information, than is currently practical with separate metadata files.

Use Case Example

For a dataset used to train an object detection model, each image file would directly contain:

  • The pixel data.
  • A list of all objects present (e.g., "car", "pedestrian", "cyclist").
  • The bounding box coordinates for every instance of each object in the image.

Metadata


Assignees

No one assigned

Labels

enhancement: New feature or request

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
