WSDS

wsds combines SQL querying with native support for multimodal data (speech and video) in a single data format with a unified API. Data is stored in shards, which keeps access efficient and enables highly scalable parallel processing.
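
Why shards help with parallelism can be shown with a minimal sketch that does not use wsds. Each shard is self-contained, so a pool of workers can process shards independently and the per-shard results can be combined at the end. The shard names, the in-memory records, and the `process_shard`/`total_stats` helpers are hypothetical stand-ins, not the wsds API. Threads keep the example simple; CPU-heavy work would typically use `ProcessPoolExecutor` instead.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dataset: each shard is a list of (key, duration_s) records.
# Real wsds shards live on disk; plain lists stand in for them here.
SHARDS = {
    "shard-000": [("a/001", 3.2), ("a/002", 4.1)],
    "shard-001": [("b/001", 2.5), ("b/002", 2.5), ("b/003", 6.0)],
    "shard-002": [("c/001", 1.0)],
}


def process_shard(name):
    """Compute per-shard stats; shards share no state, so they can run in parallel."""
    records = SHARDS[name]
    return name, len(records), sum(duration for _, duration in records)


def total_stats(shard_names, max_workers=4):
    # pool.map preserves input order, so results line up with shard_names.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(process_shard, shard_names))
    n = sum(r[1] for r in results)
    seconds = sum(r[2] for r in results)
    return results, n, seconds


results, n_utts, total_s = total_stats(sorted(SHARDS))
print(n_utts, round(total_s, 1))
```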

wsds has a database query engine, built on top of Polars, integrated directly into it. This makes database-style operations such as duplicate detection, group-by operations, and aggregations fast and easy to write. Because of this tight integration, you can run both SQL queries and efficient data loaders directly on your data without any conversion or import step.

Getting Started

# create environment
conda create -n wsds python=3.10
conda activate wsds

# install hume_wsds
pip install git+https://github.com/HumeAI/wsds.git
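
To confirm the install worked, a small check like the following may help. It assumes the import name is `wsds` (the package folder in the repository) and the distribution name is `hume_wsds` (from the install comment); both are assumptions, and `wsds_install_status` is a hypothetical helper that uses only the standard library.

```python
import importlib.metadata
import importlib.util


def wsds_install_status(module="wsds", dist="hume_wsds"):
    """Return (importable, version) for the installed package.

    Both default names are assumptions, not confirmed by the package itself.
    """
    # find_spec returns None when no top-level module with this name exists.
    importable = importlib.util.find_spec(module) is not None
    try:
        version = importlib.metadata.version(dist)
    except importlib.metadata.PackageNotFoundError:
        version = None
    return importable, version


print(wsds_install_status())
```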

Tests

The tests currently require a local copy of the librilight dataset. Run them with:

WSDS_DATASET_PATH=/path/to/the/librilight/folder python tests.py
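
When launching the tests from a script or CI job rather than a shell, the same command can be assembled in Python. This is a sketch: `build_test_command` and `run_tests` are hypothetical helpers that check the dataset folder exists, set `WSDS_DATASET_PATH`, and run `tests.py` with the current interpreter.

```python
import os
import subprocess
import sys
from pathlib import Path


def build_test_command(dataset_path):
    """Return (argv, env) for running tests.py against a local librilight copy."""
    path = Path(dataset_path).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"dataset folder not found: {path}")
    # Copy the current environment and add the dataset location.
    env = dict(os.environ, WSDS_DATASET_PATH=str(path))
    return [sys.executable, "tests.py"], env


def run_tests(dataset_path):
    # Equivalent to: WSDS_DATASET_PATH=/path/to/... python tests.py
    argv, env = build_test_command(dataset_path)
    return subprocess.run(argv, env=env, check=False).returncode
```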
