Forge - Robotics Data Toolkit

Features

Everything you need for robotics data

One toolkit to convert, inspect, score, filter, segment, and browse robotics datasets.

Format Conversion

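Convert between RLDS, LeRobot, Zarr, HDF5, Rosbag, and RoboDM with a single command. Every format converts through one shared intermediate representation (hub-and-spoke), so the number of converters grows as O(n) in the number of formats, not O(n²).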

$ forge convert hf://lerobot/pusht ./output --format rlds
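
To make the O(n) claim concrete, here is a minimal Python sketch of a hub-and-spoke converter. It is not Forge's implementation: the `Episode` type, the `READERS` and `WRITERS` registries, and the `convert` and `adapters_needed` helpers are hypothetical names chosen for illustration. Each format contributes one reader and one writer against a shared intermediate form, so n formats need 2n adapters instead of n(n-1) direct converters.

```python
from typing import Callable, Dict, List

# Hypothetical intermediate representation: one dict per frame.
Episode = List[dict]

# Hypothetical registries; a real toolkit would register one pair per format.
READERS: Dict[str, Callable[[str], Episode]] = {}
WRITERS: Dict[str, Callable[[Episode, str], None]] = {}


def convert(src: str, src_fmt: str, dst: str, dst_fmt: str) -> None:
    """Convert by reading into the hub form, then writing out of it."""
    episode = READERS[src_fmt](src)
    WRITERS[dst_fmt](episode, dst)


def adapters_needed(n_formats: int) -> Dict[str, int]:
    """Count the adapters each design requires for n formats."""
    return {
        "hub_and_spoke": 2 * n_formats,           # one reader + one writer each
        "pairwise": n_formats * (n_formats - 1),  # one per ordered format pair
    }


# Toy in-memory "format" so the sketch runs without any real dataset.
STORE: Dict[str, Episode] = {"a.toy": [{"action": [0.1, 0.2]}]}
READERS["toy"] = lambda path: STORE[path]
WRITERS["toy"] = lambda episode, path: STORE.__setitem__(path, list(episode))

convert("a.toy", "toy", "b.toy", "toy")
print(STORE["b.toy"])      # the copied episode
print(adapters_needed(6))  # six formats, as listed above
```

Adding a seventh format in this design means writing one more reader and one more writer; none of the existing adapters change.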

Dataset Inspection

Auto-detect format, list episodes, cameras, action/state dimensions, FPS, and schema. Works with local paths and HuggingFace URIs.

$ forge inspect hf://lerobot/aloha_sim_cube
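
How Forge auto-detects formats is not described here. As a rough sketch of the idea, the hypothetical `guess_format` helper below checks a local directory for marker files and extensions that these formats conventionally use. The specific rules and their order are assumptions for illustration only.

```python
import tempfile
from pathlib import Path
from typing import Optional


def guess_format(root: Path) -> Optional[str]:
    """Guess a dataset's on-disk format from marker files and extensions."""
    files = [p for p in root.rglob("*") if p.is_file()]
    names = {p.name for p in files}
    suffixes = {p.suffix.lower() for p in files}

    if (root / "meta" / "info.json").is_file():
        return "lerobot"   # LeRobot keeps dataset metadata in meta/info.json
    if "dataset_info.json" in names:
        return "rlds"      # TensorFlow Datasets builder metadata
    if names & {".zgroup", ".zarray", "zarr.json"}:
        return "zarr"      # Zarr v2 / v3 metadata files
    if suffixes & {".hdf5", ".h5"}:
        return "hdf5"
    if suffixes & {".bag", ".mcap", ".db3"}:
        return "rosbag"    # ROS1 bags and ROS2 storage files
    if ".vla" in suffixes:
        return "robodm"
    return None


# Tiny demo on throwaway directories.
with tempfile.TemporaryDirectory() as tmp:
    lerobot_dir = Path(tmp) / "lerobot_ds"
    (lerobot_dir / "meta").mkdir(parents=True)
    (lerobot_dir / "meta" / "info.json").write_text("{}")

    hdf5_dir = Path(tmp) / "hdf5_ds"
    hdf5_dir.mkdir()
    (hdf5_dir / "demo.hdf5").write_bytes(b"")

    empty_dir = Path(tmp) / "empty"
    empty_dir.mkdir()

    detected = [guess_format(d) for d in (lerobot_dir, hdf5_dir, empty_dir)]

print(detected)
```

A production detector would also need to open the files, since an extension alone cannot distinguish, say, a robomimic HDF5 file from an unrelated one.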

Quality Scoring

Score every episode 0-10 with 8 research-backed metrics. Detect jerky demos, dead actions, gripper chatter, and idle periods from proprioception alone.

$ forge quality ./my_dataset --export report.json

Episode Filtering

Filter datasets by quality score, flags, or episode IDs. Supports dry-run previews and pre-computed quality reports.

$ forge filter ./dataset ./filtered --min-quality 6.0
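
The selection logic behind a dry-run preview can be sketched in a few lines of Python. The `plan_filter` function, its arguments, and the score/flag dictionaries are hypothetical and do not reflect Forge's report schema. The function only decides which episodes would be kept and writes nothing, which is what a dry run amounts to.

```python
from typing import Dict, Iterable, List, Tuple


def plan_filter(
    scores: Dict[str, float],
    flags: Dict[str, List[str]],
    min_quality: float,
    exclude_flags: Iterable[str] = (),
) -> Tuple[List[str], List[str]]:
    """Return (kept, excluded) episode IDs without touching any files."""
    banned = set(exclude_flags)
    kept: List[str] = []
    excluded: List[str] = []
    for episode_id in sorted(scores):
        too_low = scores[episode_id] < min_quality
        flagged = bool(banned & set(flags.get(episode_id, [])))
        (excluded if too_low or flagged else kept).append(episode_id)
    return kept, excluded


# Illustrative inputs, not real Forge output.
scores = {"episode_000000": 8.7, "episode_000001": 5.2, "episode_000002": 6.4}
flags = {"episode_000001": ["jerky"], "episode_000002": ["mostly_static"]}

kept, excluded = plan_filter(scores, flags, min_quality=6.0)
print(kept, excluded)

kept_strict, _ = plan_filter(scores, flags, 6.0, exclude_flags=["mostly_static"])
print(kept_strict)
```

Separating the plan from the copy step also makes it straightforward to reuse a pre-computed quality report: the scores are loaded once and the threshold can be tuned without rescoring.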

Episode Segmentation

PELT changepoint detection on proprioception signals. Automatically split episodes into sub-skills, regime changes, and idle periods.

$ forge segment ./dataset --label --plot timeline.png
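
For readers unfamiliar with PELT (Pruned Exact Linear Time), the following is a minimal pure-Python sketch for a 1-D signal. It assumes a mean-shift (squared-error) segment cost and a user-supplied penalty. The cost model and penalty Forge uses are not specified here, and `pelt_mean_shift` is a hypothetical name. Real proprioception signals are multi-dimensional, which this sketch does not handle.

```python
from typing import List, Sequence


def pelt_mean_shift(signal: Sequence[float], penalty: float) -> List[int]:
    """Return changepoint indices minimising squared error plus a per-segment penalty."""
    n = len(signal)
    # Prefix sums give the cost of any segment in O(1).
    s1, s2 = [0.0], [0.0]
    for x in signal:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(a: int, b: int) -> float:
        """Sum of squared deviations from the mean over signal[a:b]."""
        total = s1[b] - s1[a]
        return max(0.0, (s2[b] - s2[a]) - total * total / (b - a))

    best = [-penalty] + [0.0] * n   # best[t]: optimal cost of signal[:t]
    prev = [0] * (n + 1)            # prev[t]: start of the last segment ending at t
    candidates = [0]
    for t in range(1, n + 1):
        scored = [(best[a] + cost(a, t) + penalty, a) for a in candidates]
        best[t], prev[t] = min(scored)
        # PELT pruning: drop segment starts that can no longer be optimal.
        candidates = [a for value, a in scored if value - penalty <= best[t]]
        candidates.append(t)

    # Walk back through the stored segment starts to recover the changepoints.
    changepoints = []
    t = n
    while prev[t] > 0:
        t = prev[t]
        changepoints.append(t)
    return sorted(changepoints)


# Synthetic signal with level shifts at indices 20 and 40.
signal = [0.0] * 20 + [5.0] * 20 + [1.0] * 20
print(pelt_mean_shift(signal, penalty=10.0))
```

A larger penalty yields fewer, longer segments; a smaller one splits more aggressively. That trade-off is what a penalty setting controls.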

Dataset Registry

Curated catalog of 23+ prominent robotics datasets. Search, filter, and download by name. Use dataset IDs directly in any command.

$ forge inspect droid # resolves via registry

See it in action

Real output from real datasets

Every output below was generated by running Forge on the pusht dataset.

forge inspect

$ forge inspect hf://lerobot/pusht
Dataset: lerobot/pusht
Format: lerobot-v3 (v3.0)
Episodes: 206
Total frames: 25,650

Observation Schema
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┓
┃ Field             ┃ Type    ┃ Shape ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━┩
│ observation.state │ float32 │ (2,)  │
│ next.success      │ bool    │ (1,)  │
│ next.reward       │ float32 │ ()    │
│ next.done         │ bool    │ ()    │
└───────────────────┴─────────┴───────┘

Action: float32 (2,)
Cameras: image: 96x96 (rgb)
FPS: 10
Language: yes (100% coverage)
Sample: "Push the T-shaped block onto the T-shaped target."

forge quality

$ forge quality hf://lerobot/pusht
Analyzing episodes... ━━━━━━━━━━━━━━━━━━ 206/206

╭────────── Quality Report: pusht (206 episodes) ──────────╮
│                                                          │
│  Overall Quality Score: 8.5 / 10                         │
│                                                          │
│  Smoothness (LDLJ)     ███████░░░  0.75  OK              │
│  Dead Actions          █████████░  0.99  OK              │
│  Gripper Health        ██████████  1.00  OK              │
│  Static Detection      ██████████  1.00  OK              │
│  Timestamp Regularity  ██████████  1.00  OK              │
│  Action Saturation     ████████░░  0.87  OK              │
│  Action Diversity      ███░░░░░░░  0.30  OK              │
│                                                          │
╰──────────────────────────────────────────────────────────╯
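
The Smoothness row in the report above refers to LDLJ, the log dimensionless jerk. One common definition is LDLJ = -ln((T³ / v_peak²) ∫ |d²v/dt²|² dt), where T is the movement duration and v_peak is the peak speed. The sketch below computes it for a 1-D velocity profile with finite differences. The `ldlj` function name is hypothetical, and how Forge maps the raw value onto the 0-1 score shown above is not documented here.

```python
import math
from typing import Sequence


def ldlj(velocity: Sequence[float], dt: float) -> float:
    """Log dimensionless jerk of a 1-D velocity profile sampled every dt seconds.

    Larger values indicate smoother motion.
    """
    n = len(velocity)
    if n < 3:
        raise ValueError("need at least three samples")
    peak = max(abs(v) for v in velocity)
    if peak == 0.0:
        raise ValueError("velocity is zero everywhere")
    # Jerk is the second derivative of velocity (central second difference).
    jerk = [
        (velocity[i + 1] - 2.0 * velocity[i] + velocity[i - 1]) / dt ** 2
        for i in range(1, n - 1)
    ]
    squared_jerk_integral = sum(j * j for j in jerk) * dt
    duration = (n - 1) * dt
    dimensionless_jerk = duration ** 3 / peak ** 2 * squared_jerk_integral
    if dimensionless_jerk == 0.0:
        return math.inf  # no jerk at all
    return -math.log(dimensionless_jerk)


dt = 0.1  # 10 FPS, matching the pusht inspect output
smooth = [math.sin(math.pi * i / 50) ** 2 for i in range(51)]
# Same profile with an alternating offset to imitate a jerky demonstration.
jerky = [v + (0.05 if i % 2 else -0.05) for i, v in enumerate(smooth)]
print(ldlj(smooth, dt) > ldlj(jerky, dt))
```

Because the measure is normalised by duration and peak speed, it compares the shape of a movement rather than how fast or how long it was.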

forge segment

$ forge segment pusht --sample 8 --label --penalty aic
Resolved from registry: PushT (lerobot)
Format: lerobot-v3 | Signal: observation.state | Penalty: aic

Segmentation Results
┏━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Episode        ┃ Frames ┃ Segments ┃ Changepoints    ┃ Labels                ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━┩
│ episode_000000 │    161 │        6 │ 20, 33, 63...   │ moving -> fine_m...   │
│ episode_000001 │    118 │        6 │ 13, 34, 49...   │ fine_m -> fine_m...   │
│ episode_000002 │    141 │        7 │ 12, 27, 56...   │ moving -> fine_m...   │
│ episode_000003 │    159 │        7 │ 28, 42, 60...   │ fine_m -> fine_m...   │
│ episode_000004 │    159 │        8 │ 12, 22, 45...   │ moving -> fine_m...   │
│ episode_000005 │    157 │        6 │ 30, 47, 83...   │ moving -> moving...   │
│ episode_000006 │     69 │        4 │ 14, 46, 57      │ fine_m -> moving...   │
│ episode_000007 │    169 │        7 │ 12, 43, 59...   │ moving -> fine_m...   │
└────────────────┴────────┴──────────┴─────────────────┴───────────────────────┘

╭────────────────── Summary ──────────────────╮
│ Episodes: 8                                 │
│ Mean segments/episode: 6.38                 │
│ Range: 4 — 8                                │
│ Total changepoints: 43                      │
╰─────────────────────────────────────────────╯

forge filter

$ forge filter ./dataset ./filtered --min-quality 6.0
Filtering episodes... ━━━━━━━━━━━━━━━━━━ 206/206

┏━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Episode        ┃ Score ┃ Flags              ┃ Status ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ episode_000000 │   8.7 │                    │ KEEP   │
│ episode_000001 │   9.1 │                    │ KEEP   │
│ episode_000002 │   8.4 │                    │ KEEP   │
│ episode_000003 │   5.2 │ jerky, hesitant    │ EXCL   │
│ episode_000004 │   7.8 │                    │ KEEP   │
│ episode_000005 │   3.1 │ mostly_static      │ EXCL   │
│ episode_000006 │   8.9 │                    │ KEEP   │
│ ... 199 more episodes ...                            │
└────────────────┴───────┴────────────────────┴────────┘

╭────────────── Filter Results ──────────────╮
│                                            │
│  Episodes kept: 189 / 206                  │
│  Episodes excluded: 17                     │
│  Written to: ./filtered/                   │
│                                            │
╰────────────────────────────────────────────╯

forge segment pusht --label --plot timeline.png: semantic phase labels derived from proprioception

Format support matrix

Format	Read	Write	Visualize	Notes
RLDS	✓	✓	✓	Open-X, TensorFlow Datasets
LeRobot v2/v3	✓	✓	✓	HuggingFace, Parquet + MP4
GR00T	✓	—	✓	NVIDIA Isaac, LeRobot v2 with embodiment metadata
RoboDM	✓	✓	✓	Berkeley's .vla format, up to 70x compression
Zarr	✓	—	✓	Diffusion Policy, UMI
HDF5	✓	—	✓	robomimic, ACT/ALOHA
Rosbag	✓	—	✓	ROS1 .bag, ROS2 MCAP
