The normalization layer for robotics data. Convert, inspect, visualize, score, and discover datasets across every major format.

RLDS  ===\                 /===> LeRobot
Zarr  ===|  Episode/Frame  |===> RoboDM
HDF5  ===/                 \===> RLDS
$ pip install -e ".[all]"

Everything you need for robotics data

One toolkit to convert, inspect, score, filter, segment, and browse robotics datasets.

Format Conversion
Convert between RLDS, LeRobot, Zarr, HDF5, Rosbag, and RoboDM with a single command. The hub-and-spoke architecture means the number of converters grows as O(n) with the number of formats, not O(n²).
$ forge convert hf://lerobot/pusht ./output --format rlds
Dataset Inspection
Auto-detect format, list episodes, cameras, action/state dimensions, FPS, and schema. Works with local paths and HuggingFace URIs.
$ forge inspect hf://lerobot/aloha_sim_cube
Quality Scoring
Score every episode 0-10 with 8 research-backed metrics. Detect jerky demos, dead actions, gripper chatter, and idle periods from proprioception alone.
$ forge quality ./my_dataset --export report.json
Episode Filtering
Filter datasets by quality score, flags, or episode IDs. Supports dry-run previews and pre-computed quality reports.
$ forge filter ./dataset ./filtered --min-quality 6.0
Episode Segmentation
Runs PELT changepoint detection on proprioception signals to split episodes automatically into sub-skills, regime changes, and idle periods.
$ forge segment ./dataset --label --plot timeline.png
Dataset Registry
Curated catalog of 23+ prominent robotics datasets. Search, filter, and download by name. Use dataset IDs directly in any command.
$ forge inspect droid # resolves via registry
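
As an illustration of how format auto-detection can work on a local path, here is a minimal sketch that guesses the format from marker files. It is not Forge's actual logic: the function name detect_format and the specific heuristics are assumptions, based on how these formats are commonly laid out on disk (LeRobot's meta/info.json, TensorFlow Datasets' dataset_info.json, Zarr metadata files, and the file suffixes named on this page).

```python
from pathlib import Path

# File suffixes commonly associated with each format.
# These heuristics are assumptions for illustration, not Forge's real rules.
SUFFIX_FORMATS = {
    ".hdf5": "hdf5",
    ".h5": "hdf5",
    ".bag": "rosbag",
    ".mcap": "rosbag",
    ".vla": "robodm",
}


def detect_format(root: Path) -> str:
    """Guess the dataset format of a local directory from its contents."""
    if (root / "meta" / "info.json").is_file():
        return "lerobot"  # LeRobot datasets keep metadata under meta/
    if any(root.rglob("dataset_info.json")):
        return "rlds"  # TensorFlow Datasets builders write dataset_info.json
    for path in root.rglob("*"):
        if path.suffix == ".zarr" or path.name in {".zgroup", "zarr.json"}:
            return "zarr"
    for path in root.rglob("*"):
        fmt = SUFFIX_FORMATS.get(path.suffix.lower())
        if fmt:
            return fmt
    return "unknown"
```

Checking directory-level markers before file suffixes keeps a LeRobot dataset that happens to contain stray files from being misclassified.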

Real output from real datasets

Every output below was generated by running Forge on the pusht dataset.

forge inspect
$ forge inspect hf://lerobot/pusht
Dataset: lerobot/pusht
Format: lerobot-v3 (v3.0)
Episodes: 206
Total frames: 25,650

Observation Schema
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┓
┃ Field             ┃ Type    ┃ Shape ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━┩
│ observation.state │ float32 │ (2,)  │
│ next.success      │ bool    │ (1,)  │
│ next.reward       │ float32 │ ()    │
│ next.done         │ bool    │ ()    │
└───────────────────┴─────────┴───────┘

Action: float32 (2,)
Cameras:
  image: 96x96 (rgb)
FPS: 10
Language: yes (100% coverage)
Sample: "Push the T-shaped block onto the T-shaped target."
forge quality
$ forge quality hf://lerobot/pusht
Analyzing episodes... ━━━━━━━━━━━━━━━━━━ 206/206

╭────────── Quality Report: pusht (206 episodes) ──────────╮
│ Overall Quality Score: 8.5 / 10                          │
│                                                          │
│ Smoothness (LDLJ)     ███████░░░  0.75  OK               │
│ Dead Actions          █████████░  0.99  OK               │
│ Gripper Health        ██████████  1.00  OK               │
│ Static Detection      ██████████  1.00  OK               │
│ Timestamp Regularity  ██████████  1.00  OK               │
│ Action Saturation     ████████░░  0.87  OK               │
│ Action Diversity      ███░░░░░░░  0.30  OK               │
╰──────────────────────────────────────────────────────────╯
forge segment
$ forge segment pusht --sample 8 --label --penalty aic
Resolved from registry: PushT (lerobot)
Format: lerobot-v3 | Signal: observation.state | Penalty: aic

Segmentation Results
┏━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Episode        ┃ Frames ┃ Segments ┃ Changepoints    ┃ Labels                ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━┩
│ episode_000000 │ 161    │ 6        │ 20, 33, 63...   │ moving -> fine_m...   │
│ episode_000001 │ 118    │ 6        │ 13, 34, 49...   │ fine_m -> fine_m...   │
│ episode_000002 │ 141    │ 7        │ 12, 27, 56...   │ moving -> fine_m...   │
│ episode_000003 │ 159    │ 7        │ 28, 42, 60...   │ fine_m -> fine_m...   │
│ episode_000004 │ 159    │ 8        │ 12, 22, 45...   │ moving -> fine_m...   │
│ episode_000005 │ 157    │ 6        │ 30, 47, 83...   │ moving -> moving...   │
│ episode_000006 │ 69     │ 4        │ 14, 46, 57      │ fine_m -> moving...   │
│ episode_000007 │ 169    │ 7        │ 12, 43, 59...   │ moving -> fine_m...   │
└────────────────┴────────┴──────────┴─────────────────┴───────────────────────┘

╭────────────────── Summary ──────────────────╮
│ Episodes: 8                                 │
│ Mean segments/episode: 6.38                 │
│ Range: 4 - 8                                │
│ Total changepoints: 43                      │
╰─────────────────────────────────────────────╯
forge filter
$ forge filter ./dataset ./filtered --min-quality 6.0
Filtering episodes... ━━━━━━━━━━━━━━━━━━ 206/206

┏━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Episode        ┃ Score ┃ Flags              ┃ Status ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ episode_000000 │ 8.7   │                    │ KEEP   │
│ episode_000001 │ 9.1   │                    │ KEEP   │
│ episode_000002 │ 8.4   │                    │ KEEP   │
│ episode_000003 │ 5.2   │ jerky, hesitant    │ EXCL   │
│ episode_000004 │ 7.8   │                    │ KEEP   │
│ episode_000005 │ 3.1   │ mostly_static      │ EXCL   │
│ episode_000006 │ 8.9   │                    │ KEEP   │
│ ... 199 more episodes ...                            │
└────────────────┴───────┴────────────────────┴────────┘

╭────────────── Filter Results ──────────────╮
│ Episodes kept: 189 / 206                   │
│ Episodes excluded: 17                      │
│ Written to: ./filtered/                    │
╰────────────────────────────────────────────╯
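
The KEEP/EXCL decision in the table above amounts to a threshold on the per-episode score. The sketch below shows that rule in isolation. The EpisodeReport record, the filter_episodes function, and the choice of an inclusive threshold are all assumptions for illustration; Forge's real report schema and comparison may differ.

```python
from dataclasses import dataclass, field


@dataclass
class EpisodeReport:
    """Hypothetical per-episode record; Forge's real report schema may differ."""
    episode_id: str
    score: float
    flags: list[str] = field(default_factory=list)


def filter_episodes(reports, min_quality, exclude_flags=frozenset()):
    """Split reports into (kept, excluded) by score threshold and flags."""
    kept, excluded = [], []
    for report in reports:
        flagged = any(flag in exclude_flags for flag in report.flags)
        # Assumption: the threshold is inclusive (score >= min_quality is kept).
        if report.score >= min_quality and not flagged:
            kept.append(report)
        else:
            excluded.append(report)
    return kept, excluded


# Scores and flags copied from the first seven rows of the table above.
reports = [
    EpisodeReport("episode_000000", 8.7),
    EpisodeReport("episode_000001", 9.1),
    EpisodeReport("episode_000002", 8.4),
    EpisodeReport("episode_000003", 5.2, ["jerky", "hesitant"]),
    EpisodeReport("episode_000004", 7.8),
    EpisodeReport("episode_000005", 3.1, ["mostly_static"]),
    EpisodeReport("episode_000006", 8.9),
]
kept, excluded = filter_episodes(reports, min_quality=6.0)
print([r.episode_id for r in excluded])  # ['episode_000003', 'episode_000005']
```

Returning both lists rather than only the kept episodes is what makes a dry-run preview cheap: nothing has to be written to see what would be dropped.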
Figure: Segmentation timeline visualization produced by forge segment pusht --label --plot timeline.png, with semantic phase labels derived from proprioception.
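
For readers unfamiliar with PELT (Pruned Exact Linear Time), the following is a minimal pure-Python sketch of the algorithm on a 1-D signal with a squared-error segment cost. It is meant to show the idea only: the function name pelt is illustrative, the numeric penalty is hand-picked rather than derived from an option such as --penalty aic, and nothing here is taken from Forge's implementation, which may use a library and a different cost.

```python
def pelt(signal, penalty):
    """Return changepoint indices for a 1-D signal using PELT with an L2 cost."""
    n = len(signal)
    # Prefix sums let us evaluate any segment cost in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(signal):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(start, end):
        """Sum of squared deviations from the mean over signal[start:end]."""
        total = s1[end] - s1[start]
        return (s2[end] - s2[start]) - total * total / (end - start)

    best = [0.0] * (n + 1)  # best[t]: minimal penalized cost of signal[:t]
    best[0] = -penalty      # so the first segment is not charged a penalty
    last = [0] * (n + 1)    # last[t]: start of the final segment in that optimum
    candidates = [0]
    for t in range(1, n + 1):
        scores = [(best[s] + cost(s, t) + penalty, s) for s in candidates]
        best[t], last[t] = min(scores)
        # Pruning step: drop segment starts that can never be optimal again.
        candidates = [s for s in candidates if best[s] + cost(s, t) <= best[t]]
        candidates.append(t)

    # Walk back through the stored segment starts to recover the changepoints.
    changepoints = []
    t = n
    while last[t] > 0:
        t = last[t]
        changepoints.append(t)
    return sorted(changepoints)


# A step signal with level changes at indices 20 and 40.
signal = [0.0] * 20 + [5.0] * 20 + [0.0] * 20
print(pelt(signal, penalty=1.0))  # [20, 40]
```

A larger penalty yields fewer, longer segments; a smaller one splits more aggressively.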

Hub-and-spoke, not N×M

Add a reader, get all writers for free. Add a writer, get all readers for free.

Readers: RLDS / Open-X, LeRobot v2/v3, GR00T, Zarr, HDF5, Rosbag / MCAP, RoboDM
Intermediate representation: Episode / Frame
Writers: RLDS, LeRobot v2/v3, RoboDM
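
To make the hub-and-spoke idea concrete, here is a small sketch in which every reader produces Episode objects and every writer consumes them. With the seven readers and three writers listed above, 10 adapters cover 7 × 3 = 21 conversion pairs. The class fields, the READERS and WRITERS registries, and the convert function are hypothetical names for illustration; Forge's actual Episode/Frame types are not shown on this page.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator


@dataclass
class Frame:
    """One timestep in the (hypothetical) intermediate representation."""
    state: list[float]
    action: list[float]


@dataclass
class Episode:
    episode_id: str
    frames: list[Frame] = field(default_factory=list)


Reader = Callable[[str], Iterator[Episode]]
Writer = Callable[[Iterable[Episode], str], None]

# One entry per format on each side; no reader knows about any writer.
READERS: dict[str, Reader] = {}
WRITERS: dict[str, Writer] = {}


def convert(src: str, dst: str, src_format: str, dst_format: str) -> None:
    """Route any registered reader through Episode objects to any writer."""
    WRITERS[dst_format](READERS[src_format](src), dst)


def supported_pairs() -> int:
    """Number of conversions available from the registered adapters."""
    return len(READERS) * len(WRITERS)


# Toy adapters standing in for real format code.
def read_toy(path: str) -> Iterator[Episode]:
    yield Episode("episode_000000", [Frame([0.0, 0.0], [1.0, 0.0])])


written: list[Episode] = []


def write_toy(episodes: Iterable[Episode], path: str) -> None:
    written.extend(episodes)


READERS["toy-in"] = read_toy
WRITERS["toy-out"] = write_toy
convert("in", "out", "toy-in", "toy-out")
print(len(written), supported_pairs())  # 1 1
```

Registering one more reader in this sketch raises supported_pairs() by the number of writers without touching any existing adapter.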

Format support matrix

Read, write, and visualize across every major robotics data format.

Format          Notes
RLDS            Open-X, TensorFlow Datasets
LeRobot v2/v3   HuggingFace, Parquet + MP4
GR00T           NVIDIA Isaac, LeRobot v2 with embodiment metadata
RoboDM          Berkeley's .vla format, up to 70x compression
Zarr            Diffusion Policy, UMI
HDF5            robomimic, ACT/ALOHA
Rosbag          ROS1 .bag, ROS2 MCAP

8 research-backed metrics

Score every episode 0-10 from proprioception data alone. No video processing needed.

Smoothness (LDLJ): Jerk-based smoothness
Dead Actions: Zero/constant detection
Gripper Chatter: Rapid open/close cycles
Static Detection: Idle period detection
Timestamp Regularity: Dropped frames & jitter
Action Saturation: Time at hardware limits
Action Entropy: Diversity vs repetition
Path Length: Wandering & hesitation
$ forge quality hf://lerobot/aloha_sim_cube --export report.json
$ forge filter ./dataset ./filtered --min-quality 6.0 # remove bad demos
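
To give a feel for what proprioception-only metrics look like, here is a rough sketch of two of them: a dead-action fraction and a timestamp jitter measure. The function names, the tolerance eps, and the exact definitions are assumptions for illustration; this page does not give Forge's formulas or how each metric is mapped onto the 0-10 score.

```python
import statistics


def dead_action_fraction(actions, eps=1e-6):
    """Fraction of steps whose action is unchanged from the previous step."""
    if len(actions) < 2:
        return 0.0
    dead = sum(
        1
        for prev, curr in zip(actions, actions[1:])
        if all(abs(c - p) <= eps for p, c in zip(prev, curr))
    )
    return dead / (len(actions) - 1)


def timestamp_jitter(timestamps):
    """Coefficient of variation of inter-frame intervals (0 = perfectly regular)."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(deltas) < 2:
        return 0.0
    mean = statistics.mean(deltas)
    return statistics.pstdev(deltas) / mean if mean > 0 else float("inf")


actions = [[0.0, 0.0], [0.0, 0.0], [0.1, 0.0], [0.1, 0.0], [0.2, 0.1]]
print(dead_action_fraction(actions))             # 0.5
print(timestamp_jitter([0.0, 0.5, 1.0, 1.5]))    # 0.0
print(timestamp_jitter([0.0, 0.5, 1.0, 2.0]) > 0)  # True (one dropped frame)
```

Both functions need only the action and timestamp arrays, which is why this kind of scoring can skip video entirely.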

23+ curated robotics datasets

Browse, search, and download by name. Use dataset IDs directly in any command.

Dataset             Size                  Format
DROID               ~76,000 episodes      rlds
Bridge V2           ~60,096 episodes      rlds
Open-X Embodiment   ~2,200,000 episodes   rlds
AgiBot World        ~1,000,000 episodes   lerobot-v2
RoboSet             ~100,000 episodes     hdf5
ALOHA               Bi-manual demos       lerobot
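
Using a dataset ID directly in a command implies a lookup step before any reading happens. The sketch below shows one plausible shape for that step. The REGISTRY contents and the resolve function are hypothetical: the mapping from pusht to hf://lerobot/pusht is inferred from the examples on this page, and Forge's registry format is not shown here.

```python
REGISTRY = {
    # Hypothetical entry: inferred from the examples on this page,
    # not copied from Forge's registry.
    "pusht": {"uri": "hf://lerobot/pusht", "format": "lerobot"},
}


def resolve(dataset: str) -> str:
    """Return a URI or path for a dataset argument given on the command line."""
    if "://" in dataset or dataset.startswith((".", "/", "~")):
        return dataset  # already a URI or a filesystem path
    entry = REGISTRY.get(dataset.lower())
    if entry is None:
        raise KeyError(f"unknown dataset ID: {dataset!r}")
    return entry["uri"]


print(resolve("pusht"))         # hf://lerobot/pusht
print(resolve("./my_dataset"))  # ./my_dataset
```

Passing URIs and paths through untouched means every command can accept an ID, a local path, or a hub URI through the same argument.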

Up and running in 60 seconds

1. Install
Clone the repo and install with pip.
$ git clone https://github.com/arpitg1304/forge.git
$ cd forge
$ pip install -e ".[all]"
2. Try a demo dataset
Download, inspect, and score a small dataset in one command.
$ forge demo
# Downloads pusht, runs inspect + quality
3. Convert and go
Convert to any format you need for training.
$ forge convert droid ./output --format lerobot-v3