This script imports a Hugging Face LeRobot dataset into Roboto, using one of two modes:
- `import`: Mirrors the dataset files from Hugging Face to your configured S3 bucket, then imports them to Roboto. This is ideal for large datasets or when using your own cloud storage.
- `upload`: Downloads the dataset files from Hugging Face and uploads them directly to Roboto. Use this for quick uploads without S3 involvement.
When using import mode, Roboto does not copy the source files; it simply records their location in your S3 bucket so they can be ingested in place.
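Both modes begin the same way: the dataset files are fetched from the Hugging Face Hub. As a rough sketch of that step (this uses `huggingface_hub.snapshot_download` to illustrate the approach; it is not the script's verbatim code):

```python
import os

from huggingface_hub import snapshot_download

# Fetch every file in the dataset repo into a local directory.
# HF_TOKEN is only required for private datasets.
local_dir = snapshot_download(
    repo_id="lerobot/utokyo_xarm_bimanual",
    repo_type="dataset",
    token=os.getenv("HF_TOKEN"),
)
print(f"Downloaded dataset files to {local_dir}")
```

From there the two modes diverge: `import` syncs the local files to S3 and registers their S3 locations with Roboto, while `upload` pushes the files to Roboto directly.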
To use this script, you'll need:

- Python ≥ 3.10
- A Roboto account
- A Hugging Face account (if using private datasets)
- Environment variables set via a `.env.local` file (see below)
Additional requirements for import mode:
- AWS CLI installed and available in your path
- AWS S3 bucket connected to your Roboto account
  - Contact us to configure this
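Since import mode drives the AWS CLI, it's worth confirming the `aws` executable is actually resolvable before running the script. A quick illustrative check from Python:

```python
import shutil

# shutil.which returns the executable's full path, or None if it's not on PATH.
if shutil.which("aws") is None:
    raise SystemExit("AWS CLI not found on PATH; install it before using import mode.")
```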
To set up your environment:

- Clone this package:

  ```
  git clone https://github.com/roboto-ai/leroboto.git
  cd leroboto
  ```

- Create a virtual environment:

  ```
  python3 -m venv .venv
  source .venv/bin/activate
  ```

- Install dependencies:

  ```
  pip install roboto python-dotenv
  ```

- Install `lerobot` from source (required for `LeRobotDataset`):

  ```
  git clone https://github.com/huggingface/lerobot.git
  cd lerobot
  pip install -e .
  ```
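To verify the source install worked, you can try loading a small dataset. This is just a sanity check; the import path below matches the lerobot source tree at the time of writing and may move in future releases:

```python
# Sanity check: construct a LeRobotDataset straight from the Hub.
# Note this downloads the dataset's files into the local lerobot cache.
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

ds = LeRobotDataset("lerobot/utokyo_xarm_bimanual")
print(f"Loaded {len(ds)} frames")
```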
Create a `.env.local` file in the root directory of this repo. Populate it by copying the provided example in `.env.example`:
```
# =============================
# Required for both modes
# =============================

# Hugging Face token
# Create one here: https://huggingface.co/settings/tokens
HF_TOKEN=hf_...

# Roboto token
# Create one here: https://app.roboto.ai/settings/tokens
ROBOTO_API_KEY=roboto_pat_...  # Optional if you already have ~/.roboto/config.json
ROBOTO_ORG_ID=og_...           # Optional if you belong to only one org

# =============================
# Required for import mode only
# =============================

# S3 bucket to mirror the HF dataset to
S3_BUCKET=your-s3-bucket
S3_PREFIX=hf-mirror  # Optional, defaults to 'hf-mirror'

# Read/write credentials for the S3 bucket
AWS_ACCESS_KEY=...
AWS_SECRET_KEY=...

# Optional session token (if using temporary credentials)
AWS_SESSION_TOKEN=...
```

This file configures:
- Hugging Face access for downloading datasets
- Roboto access for importing or uploading datasets
- AWS S3 access for uploading mirrored content
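Since `python-dotenv` is one of the installed dependencies, the script can be expected to read this file at startup. A minimal sketch of how that typically looks (assuming `.env.local` sits in the working directory):

```python
import os

from dotenv import load_dotenv

# Read key=value pairs from .env.local into the process environment.
load_dotenv(".env.local")

hf_token = os.environ["HF_TOKEN"]          # required in both modes
s3_bucket = os.getenv("S3_BUCKET")         # import mode only
s3_prefix = os.getenv("S3_PREFIX", "hf-mirror")
```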
For import mode, your AWS credentials will need read/write permissions to sync files to the S3 bucket. Here's an example policy you can use for constrained access:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::your-s3-bucket"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:GetObjectAcl",
        "s3:PutObject",
        "s3:PutObjectAcl",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::your-s3-bucket/*"
    }
  ]
}
```
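With that policy attached, the mirror step boils down to an `aws s3 sync` from the local download directory into your bucket. Here is a hedged sketch of the equivalent call; the exclude patterns echo the paths listed in the notes at the end of this README, but the script's exact flags may differ:

```python
import subprocess

bucket = "your-s3-bucket"
prefix = "hf-mirror"                       # default S3_PREFIX
dataset_id = "lerobot/utokyo_xarm_bimanual"
local_dir = "/path/to/downloaded/dataset"  # output of the Hugging Face download step

# `aws s3 sync` only copies new or changed files, so re-running after a
# partial failure picks up where it left off.
subprocess.run(
    [
        "aws", "s3", "sync",
        local_dir, f"s3://{bucket}/{prefix}/{dataset_id}",
        "--exclude", ".cache/*",
        "--exclude", "*.lock",
    ],
    check=True,
)
```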
Run the script with:

```
python leroboto.py {import,upload} <huggingface_dataset_id> [<roboto_dataset_id>]
```

If a Roboto dataset ID is not provided, a new dataset will be created automatically.
- `import`: Mirrors the dataset files from Hugging Face to your configured S3 bucket, then imports them to Roboto. This is ideal for large datasets or when using your own cloud storage.
- `upload`: Downloads the dataset files from Hugging Face and uploads them directly to Roboto. Use this for quick uploads without S3 involvement.
Create a new Roboto dataset and import files into it:

```
python leroboto.py import lerobot/utokyo_xarm_bimanual
```

Import files into an existing Roboto dataset:

```
python leroboto.py import lerobot/utokyo_xarm_bimanual ds_abc123xyz
```

Upload files directly to Roboto without using your own S3 bucket:

```
python leroboto.py upload lerobot/utokyo_xarm_bimanual
```
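For reference, the CLI signature above maps directly onto argparse. A hypothetical equivalent of the argument handling (not necessarily how `leroboto.py` implements it):

```python
import argparse

parser = argparse.ArgumentParser(
    description="Import a Hugging Face LeRobot dataset into Roboto."
)
parser.add_argument("mode", choices=["import", "upload"],
                    help="import: mirror via S3; upload: send files directly to Roboto")
parser.add_argument("huggingface_dataset_id",
                    help="e.g. lerobot/utokyo_xarm_bimanual")
parser.add_argument("roboto_dataset_id", nargs="?", default=None,
                    help="existing Roboto dataset ID; a new dataset is created if omitted")
args = parser.parse_args()
```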
Example output from an import run:

```
✈️ Starting mirror and import process:
   HF Dataset: lerobot/utokyo_xarm_bimanual
   Roboto Dataset: Will create new dataset
📁 Creating new Roboto dataset for lerobot/utokyo_xarm_bimanual...
✅ Created dataset: ds_tdnowr3uirfa
⬇️ Downloading lerobot/utokyo_xarm_bimanual...
✅ Downloaded files from Hugging Face in 0.4s
🔄 Syncing data to s3://your-s3-bucket/hf-mirror/lerobot/utokyo_xarm_bimanual
✅ Synced files to S3 in 0.8s
🧾 Preparing Roboto import batch...
🚀 Importing 145 files to Roboto: ds_tdnowr3uirfa
📦 Processing chunk 1/1 (145 files)...
✅ Chunk 1 imported successfully (145 files)
✅ Process complete: 145 total files in Roboto.
⬆️ Uploading manifest.txt to Roboto.
✅ Uploaded manifest.txt to ds_tdnowr3uirfa.
🌐 View dataset: https://app.roboto.ai/datasets/ds_tdnowr3uirfa
```
- Paths like `.cache/`, `.lock`, `.metadata`, and `.gitignore` are excluded from upload.
- Roboto import occurs in chunks of 500 files (see the sketch below).
- A `manifest.txt` is uploaded to Roboto to signal that the upload is complete so ingestion can begin.
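The 500-file chunking mentioned above amounts to slicing the file list into fixed-size batches. A minimal illustration with a hypothetical helper (not the script's actual code):

```python
def chunked(items: list, size: int = 500):
    """Yield successive slices of `items`, each at most `size` long."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# A 145-file dataset fits in a single chunk, matching the example output above.
files = [f"file_{n}.parquet" for n in range(145)]
for n, batch in enumerate(chunked(files), start=1):
    print(f"Processing chunk {n} ({len(batch)} files)")
```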