The normalization layer for robotics data. Convert, inspect, score, tokenize, visualize, and discover datasets across every major format.

RLDS  ===\                 /===> LeRobot
HDF5  ===|                 |===> MCAP
Zarr  ===|  Episode/Frame  |===> RoboDM
MCAP  ===/                 \===> RLDS
$ pip install "forge-robotics[all]"

Everything you need for robotics data

One toolkit to convert, inspect, score, filter, segment, tokenize, and browse robotics datasets.

Format Conversion
Convert between RLDS, LeRobot, Zarr, HDF5, MCAP, Rosbag, and RoboDM with a single command. The hub-and-spoke architecture means the number of adapters grows as O(n) in the number of formats, instead of O(n²) pairwise converters.
$ forge convert hf://lerobot/pusht ./output --format rlds
Dataset Inspection
Auto-detect format, list episodes, cameras, action/state dimensions, FPS, and schema. Works with local paths and HuggingFace URIs.
$ forge inspect hf://lerobot/aloha_sim_cube
Quality Scoring
Score every episode 0-10 with 8 research-backed metrics. Detect jerky demos, dead actions, gripper chatter, and idle periods from proprioception alone.
$ forge quality ./my_dataset --export report.json
Video Quality
Opt-in camera-stream scoring: blur, exposure, frozen frames, and colorfulness (Tier 0), plus optical-flow motion, smoothness, and shot-cut detection (Tier 1). Composes with filtering.
$ forge quality ./dataset --video --video-level motion
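A rough sketch of what Tier 0 style checks could look like in NumPy. This is not Forge's implementation: the function names, the Laplacian-variance blur measure, and the thresholds (`low`, `high`, `tol`) are assumptions chosen for illustration.

```python
import numpy as np

GRAY_WEIGHTS = np.array([0.299, 0.587, 0.114])  # ITU-R BT.601 luma weights


def to_gray(frames: np.ndarray) -> np.ndarray:
    """Convert (..., H, W, 3) RGB arrays to float grayscale in [0, 255]."""
    return frames[..., :3].astype(np.float64) @ GRAY_WEIGHTS


def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of a 4-neighbour Laplacian; low values suggest a blurry frame."""
    lap = (
        gray[:-2, 1:-1] + gray[2:, 1:-1] + gray[1:-1, :-2] + gray[1:-1, 2:]
        - 4.0 * gray[1:-1, 1:-1]
    )
    return float(lap.var())


def clipped_fraction(gray: np.ndarray, low: float = 5.0, high: float = 250.0) -> float:
    """Fraction of pixels near black or near white (hypothetical exposure check)."""
    return float(((gray <= low) | (gray >= high)).mean())


def frozen_fraction(frames: np.ndarray, tol: float = 0.5) -> float:
    """Fraction of consecutive frame pairs that barely change (needs >= 2 frames)."""
    gray = to_gray(frames)
    step = np.abs(np.diff(gray, axis=0)).mean(axis=(1, 2))
    return float((step < tol).mean())


# Synthetic demo: a noisy "sharp" frame and a crudely blurred copy of it.
rng = np.random.default_rng(0)
sharp = rng.integers(0, 256, size=(96, 96, 3), dtype=np.uint8)
s = sharp.astype(np.float64)
blurred = (s[:-1, :-1] + s[1:, :-1] + s[:-1, 1:] + s[1:, 1:]) / 4.0

print("sharp  :", laplacian_variance(to_gray(sharp)))
print("blurred:", laplacian_variance(to_gray(blurred)))
print("frozen :", frozen_fraction(np.stack([sharp] * 5)))
```

Per-frame scores like these would still need aggregating per episode before they can feed a filter; how Forge does that is not shown here.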
Dataset Linting
Catch hygiene defects before you train: missing or placeholder task strings, ambiguous camera names, low-res or single-view setups, missing action fields. Enforces Hugging Face's recording guidelines and returns CI-friendly exit codes.
$ forge lint ./my_dataset --strict
Episode Filtering
Filter datasets by quality score, flags, or episode IDs. Supports dry-run previews and pre-computed quality reports.
$ forge filter ./dataset ./filtered --min-quality 6.0
Deduplication
Find and remove near-duplicate episodes (exact copies, re-encodes, near-identical takes) by perceptual hashing of camera keyframes. NumPy only, no model.
$ forge dedup ./dataset ./deduped
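To make the idea of perceptual hashing concrete, here is a minimal NumPy sketch using an average hash and Hamming distance. The specific hash, the 8x8 grid, and the `max_distance` threshold are assumptions for illustration; the hash that `forge dedup` actually uses is not documented here.

```python
import numpy as np


def shrink(gray: np.ndarray, size: int = 8) -> np.ndarray:
    """Downsample to size x size by block averaging (assumes H, W >= size)."""
    h, w = gray.shape
    rows = np.linspace(0, h, size + 1).astype(int)
    cols = np.linspace(0, w, size + 1).astype(int)
    out = np.empty((size, size))
    for i in range(size):
        for j in range(size):
            out[i, j] = gray[rows[i]:rows[i + 1], cols[j]:cols[j + 1]].mean()
    return out


def average_hash(frame: np.ndarray, size: int = 8) -> np.ndarray:
    """64-bit average hash: one bit per block, set when the block is brighter than the mean."""
    gray = frame[..., :3].astype(np.float64).mean(axis=-1)
    small = shrink(gray, size)
    return (small > small.mean()).ravel()


def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing hash bits."""
    return int(np.count_nonzero(a != b))


def near_duplicates(keyframes, max_distance: int = 4):
    """Return index pairs whose hashes differ by at most max_distance bits."""
    hashes = [average_hash(f) for f in keyframes]
    return [
        (i, j)
        for i in range(len(hashes))
        for j in range(i + 1, len(hashes))
        if hamming(hashes[i], hashes[j]) <= max_distance
    ]


# Synthetic keyframes: a gradient, a slightly noisy copy, and a different image.
rng = np.random.default_rng(0)
ramp = np.tile(np.linspace(0, 255, 96), (96, 1))
base = np.repeat(ramp[..., None], 3, axis=-1).astype(np.uint8)
noise = rng.integers(-2, 3, size=base.shape)
re_encoded = np.clip(base.astype(int) + noise, 0, 255).astype(np.uint8)
other = np.repeat(ramp.T[..., None], 3, axis=-1).astype(np.uint8)

print(near_duplicates([base, re_encoded, other]))
```

Comparing one keyframe per episode keeps the pairwise loop small; a larger dataset would likely want bucketing by hash prefix instead of all-pairs comparison.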
Episode Segmentation
PELT changepoint detection on proprioception signals. Automatically split episodes into sub-skills, regime changes, and idle periods.
$ forge segment ./dataset --label --plot timeline.png
Web Visualizer
Browser-based viewer with multi-camera support, action/state charts, timeline scrubber, and segment overlay. Zero extra dependencies.
$ forge visualize pusht --segment
Action Tokenization
Turn continuous actions into the discrete tokens VLA models train on. Four built-in strategies (RT-1, OpenVLA, quantile, mu-law) and a comparator that benchmarks them on your data — measure, don't guess.
$ forge tokenize compare ./my_dataset --export report.json
Dataset Registry
Curated catalog of 23+ prominent robotics datasets. Search, filter, and download by name. Use dataset IDs directly in any command.
$ forge inspect droid # resolves via registry

Real output from real datasets

Every output below was generated by running Forge on the pusht dataset.

forge inspect
$ forge inspect hf://lerobot/pusht
Dataset: lerobot/pusht
Format: lerobot-v3 (v3.0)
Episodes: 206
Total frames: 25,650

Observation Schema
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┓
  Field               Type      Shape
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━┩
  observation.state   float32   (2,)
  next.success        bool      (1,)
  next.reward         float32   ()
  next.done           bool      ()
└───────────────────┴─────────┴───────┘

Action: float32 (2,)
Cameras: image: 96x96 (rgb)
FPS: 10
Language: yes (100% coverage)
Sample: "Push the T-shaped block onto the T-shaped target."
forge quality
$ forge quality hf://lerobot/pusht
Analyzing episodes... ━━━━━━━━━━━━━━━━━━ 206/206
╭────────── Quality Report: pusht (206 episodes) ──────────╮
  Overall Quality Score: 8.5 / 10

  Smoothness (LDLJ)      ███████░░░  0.75  OK
  Dead Actions           █████████   0.99  OK
  Gripper Health         ██████████  1.00  OK
  Static Detection       ██████████  1.00  OK
  Timestamp Regularity   ██████████  1.00  OK
  Action Saturation      ████████░░  0.87  OK
  Action Diversity       ███░░░░░░░  0.30  OK
╰───────────────────────────────────────────────────────╯
forge lint
$ forge lint hf://lerobot/pusht
WARN  camera.ambiguous_name
      Camera 'image' doesn't say where it is (third-person vs wrist?).
      → Use <modality>.<location>, e.g. 'observation.images.wrist'.
WARN  task.missing
      Dataset has no language / task instructions.
      → VLA training needs per-episode instructions; add them.
INFO  camera.low_resolution
      Camera 'image' is 96x96; below 640x480.
INFO  camera.too_few_views
      Only 1 camera view(s); HF recommends >= 2.
PASS  206 episodes 0 errors, 2 warnings, 2 info
forge segment
$ forge segment pusht --sample 8 --label --penalty aic
Resolved from registry: PushT (lerobot)
Format: lerobot-v3 | Signal: observation.state | Penalty: aic

Segmentation Results
┏━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓
  Episode          Frames   Segments   Changepoints      Labels
┡━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━┩
  episode_000000   161      6          20, 33, 63...     moving -> fine_m...
  episode_000001   118      6          13, 34, 49...     fine_m -> fine_m...
  episode_000002   141      7          12, 27, 56...     moving -> fine_m...
  episode_000003   159      7          28, 42, 60...     fine_m -> fine_m...
  episode_000004   159      8          12, 22, 45...     moving -> fine_m...
  episode_000005   157      6          30, 47, 83...     moving -> moving...
  episode_000006   69       4          14, 46, 57        fine_m -> moving...
  episode_000007   169      7          12, 43, 59...     moving -> fine_m...
└────────────────┴────────┴──────────┴─────────────────┴───────────────────────┘
╭────────────────── Summary ──────────────────╮
  Episodes: 8
  Mean segments/episode: 6.38
  Range: 4 — 8
  Total changepoints: 43
╰───────────────────────────────────────────╯
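For readers unfamiliar with PELT, the sketch below shows the core recursion on a multi-dimensional signal with a squared-error segment cost. It is an illustrative implementation under stated assumptions, not Forge's code: the function name `pelt_l2`, the L2 cost, and the numeric `penalty` (in place of the `aic` option shown above) are all chosen for the example.

```python
from typing import List

import numpy as np


def pelt_l2(signal, penalty: float) -> List[int]:
    """Return changepoint indices (start of each new segment) via PELT with an L2 cost."""
    x = np.asarray(signal, dtype=np.float64)
    if x.ndim == 1:
        x = x[:, None]
    n = len(x)
    # Prefix sums give the cost of any segment x[a:b] in O(1).
    s1 = np.vstack([np.zeros(x.shape[1]), np.cumsum(x, axis=0)])
    s2 = np.concatenate([[0.0], np.cumsum((x ** 2).sum(axis=1))])

    def cost(a: int, b: int) -> float:
        seg_sum = s1[b] - s1[a]
        return (s2[b] - s2[a]) - float(seg_sum @ seg_sum) / (b - a)

    best = np.full(n + 1, np.inf)  # best[t]: optimal penalised cost of x[:t]
    best[0] = -penalty
    last = np.zeros(n + 1, dtype=int)  # last[t]: start of the final segment of x[:t]
    candidates = [0]
    for t in range(1, n + 1):
        totals = [best[s] + cost(s, t) + penalty for s in candidates]
        k = int(np.argmin(totals))
        best[t], last[t] = totals[k], candidates[k]
        # Pruning step: drop starts that can never be optimal for any later t.
        candidates = [s for s, tot in zip(candidates, totals) if tot - penalty <= best[t]]
        candidates.append(t)

    changepoints = []
    t = n
    while last[t] > 0:
        changepoints.append(int(last[t]))
        t = int(last[t])
    return changepoints[::-1]


# Synthetic 1-D signal with level shifts at frames 50 and 100.
rng = np.random.default_rng(0)
levels = np.concatenate([np.zeros(50), np.full(50, 5.0), np.zeros(50)])
demo_signal = levels + rng.normal(0.0, 0.1, size=levels.size)
print(pelt_l2(demo_signal, penalty=10.0))
```

A larger penalty yields fewer, longer segments; labels such as moving or idle would be assigned in a separate step from per-segment statistics.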
forge filter
$ forge filter ./dataset ./filtered --min-quality 6.0
Filtering episodes... ━━━━━━━━━━━━━━━━━━ 206/206
┏━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
  Episode          Score   Flags                Status
┡━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
  episode_000000   8.7                          KEEP
  episode_000001   9.1                          KEEP
  episode_000002   8.4                          KEEP
  episode_000003   5.2     jerky, hesitant      EXCL
  episode_000004   7.8                          KEEP
  episode_000005   3.1     mostly_static        EXCL
  episode_000006   8.9                          KEEP
  ... 199 more episodes ...
└────────────────┴───────┴────────────────────┴────────┘
╭────────────── Filter Results ──────────────╮
  Episodes kept: 189 / 206
  Episodes excluded: 17
  Written to: ./filtered/
╰─────────────────────────────────────────────╯
forge tokenize
$ forge tokenize compare pusht --sample 2000
Tokenizer comparison: pusht — 25,650 frames, action_dim=2, scored on 2000

Tokenizer Comparison
┏━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┓
  Strategy         Vocab   MAE      MSE       Max-abs   Vocab util
┡━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━┩
  uniform-bins ✓   256     0.4829   0.3098    0.9746    96%
  quantile-bins    256     0.6442   1.6471    23.5000   100%
  openvla-bins     256     0.6828   7.6404    61.7969   100%
  mu-law           256     2.8620   12.5675   10.9926   27%
└────────────────┴───────┴────────┴─────────┴─────────┴────────────┘
Lowest MAE: uniform-bins
# pusht actions are raw pixel coords; RT-1 min/max bins beat OpenVLA's
# percentile clipping on this dataset. The right tokenizer depends on your data.
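The comparison above can be approximated with a few lines of NumPy: quantize, decode, and measure the reconstruction error. The sketch below is an assumption-laden illustration, not Forge's tokenizers: the function names, the 1st/99th percentile clip for the OpenVLA-style variant, and `mu=255` are choices made for the example.

```python
import numpy as np


def _uniform_roundtrip(x, lo, hi, n_bins):
    """Quantize x into n_bins equal-width bins on [lo, hi], then decode to bin centres."""
    span = np.where(hi > lo, hi - lo, 1.0)
    idx = np.clip(np.floor((x - lo) / span * n_bins), 0, n_bins - 1)
    return lo + (idx + 0.5) / n_bins * span


def uniform_bins(a, n_bins=256):
    """Min/max uniform bins per action dimension (RT-1 style)."""
    return _uniform_roundtrip(a, a.min(axis=0), a.max(axis=0), n_bins)


def clipped_bins(a, n_bins=256, q=(1.0, 99.0)):
    """Uniform bins after percentile clipping (assumed OpenVLA-style variant)."""
    lo, hi = np.percentile(a, q[0], axis=0), np.percentile(a, q[1], axis=0)
    return _uniform_roundtrip(np.clip(a, lo, hi), lo, hi, n_bins)


def quantile_bins(a, n_bins=256):
    """Equal-population bins per dimension, decoded to the midpoint of each bin."""
    out = np.empty(a.shape, dtype=np.float64)
    for d in range(a.shape[1]):
        edges = np.quantile(a[:, d], np.linspace(0.0, 1.0, n_bins + 1))
        centers = 0.5 * (edges[:-1] + edges[1:])
        out[:, d] = centers[np.searchsorted(edges[1:-1], a[:, d], side="right")]
    return out


def mu_law_bins(a, n_bins=256, mu=255.0):
    """Mu-law companding on peak-normalised actions, then uniform bins."""
    peak = np.abs(a).max(axis=0)
    peak = np.where(peak > 0, peak, 1.0)
    x = a / peak
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    y_hat = _uniform_roundtrip(y, -1.0, 1.0, n_bins)
    return np.sign(y_hat) * np.expm1(np.abs(y_hat) * np.log1p(mu)) / mu * peak


def compare(a, n_bins=256):
    """Mean absolute reconstruction error per strategy."""
    strategies = {
        "uniform-bins": uniform_bins,
        "quantile-bins": quantile_bins,
        "clipped-bins": clipped_bins,
        "mu-law": mu_law_bins,
    }
    return {name: float(np.abs(fn(a, n_bins) - a).mean()) for name, fn in strategies.items()}


# Synthetic stand-in for pixel-coordinate actions (not the real pusht data).
rng = np.random.default_rng(0)
demo_actions = rng.uniform(0.0, 512.0, size=(2000, 2))
for name, mae in compare(demo_actions).items():
    print(f"{name:14s} MAE={mae:.4f}")
```

The ranking depends heavily on the action distribution, which is the point of benchmarking on your own data instead of picking a scheme up front.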
Figure: segmentation timeline from forge segment pusht --label --plot timeline.png (semantic phase labels via proprioception)
Figure: Forge Viewer from forge visualize pusht --segment (browser-based viewer with segment overlay). Shown: lerobot-v3, 206 episodes · 25,650 frames · 10 fps, camera image (96×96), episode 0 at frame 56 / 161 labeled fine_manipulation, segment legend moving / fine_manipulation / idle, Actions and States charts.
Keyboard shortcuts: Space: Play/Pause   ←→: Frame   ↑↓: Episode   [/]: Speed
Figure: Rerun viewer from forge visualize droid_100 --backend rerun --samples 2 (cameras, time-series, and segment labels on one timeline)
Install: pip install "forge-robotics[rerun]"

Hub-and-spoke, not N×M

Add a reader, get all writers for free. Add a writer, get all readers for free.

Readers: RLDS / Open-X, LeRobot v2/v3, GR00T, Zarr, HDF5, MCAP, Rosbag, RoboDM
Hub: Episode / Frame (intermediate representation)
Writers: RLDS, LeRobot v2/v3, RoboDM
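The hub-and-spoke idea can be sketched with two registries and a shared intermediate type. Everything below is hypothetical: the `Episode` and `Frame` fields, the `READERS`/`WRITERS` registries, and the two toy formats ("rows" and "columns") stand in for real formats and are not Forge's actual classes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass
class Frame:
    action: List[float]
    state: List[float]


@dataclass
class Episode:
    episode_id: str
    frames: List[Frame] = field(default_factory=list)


# One reader and one writer per format: n + n adapters instead of n * (n - 1) converters.
READERS: Dict[str, Callable[[object], Iterator[Episode]]] = {}
WRITERS: Dict[str, Callable[[Iterable[Episode]], object]] = {}


def read_rows(rows) -> Iterator[Episode]:
    """Toy 'rows' format: a flat list of per-frame dicts."""
    episodes: Dict[str, Episode] = {}
    for row in rows:
        ep = episodes.setdefault(row["episode"], Episode(row["episode"]))
        ep.frames.append(Frame(row["action"], row["state"]))
    yield from episodes.values()


def write_rows(episodes: Iterable[Episode]):
    return [
        {"episode": ep.episode_id, "action": f.action, "state": f.state}
        for ep in episodes
        for f in ep.frames
    ]


def read_columns(table) -> Iterator[Episode]:
    """Toy 'columns' format: per-episode lists of actions and states."""
    for ep_id, cols in table.items():
        yield Episode(ep_id, [Frame(a, s) for a, s in zip(cols["actions"], cols["states"])])


def write_columns(episodes: Iterable[Episode]):
    return {
        ep.episode_id: {
            "actions": [f.action for f in ep.frames],
            "states": [f.state for f in ep.frames],
        }
        for ep in episodes
    }


READERS.update({"rows": read_rows, "columns": read_columns})
WRITERS.update({"rows": write_rows, "columns": write_columns})


def convert(source, src_format: str, dst_format: str):
    """Any registered reader can feed any registered writer through Episode/Frame."""
    return WRITERS[dst_format](READERS[src_format](source))


demo_rows = [
    {"episode": "ep0", "action": [0.1, 0.2], "state": [1.0, 2.0]},
    {"episode": "ep0", "action": [0.3, 0.4], "state": [1.5, 2.5]},
    {"episode": "ep1", "action": [0.5, 0.6], "state": [3.0, 4.0]},
]
as_columns = convert(demo_rows, "rows", "columns")
print(convert(as_columns, "columns", "rows") == demo_rows)
```

Registering a third reader in this sketch immediately makes it convertible to both existing writers, which is the property the section title refers to.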

Format support matrix

Read, write, and visualize across every major robotics data format.

Format          Read   Write   Visualize   Notes
RLDS                                       Open-X, TensorFlow Datasets
LeRobot v2/v3                              HuggingFace, Parquet + MP4
GR00T                                      NVIDIA Isaac, LeRobot v2 with embodiment metadata
RoboDM                                     Berkeley's .vla format, up to 70x compression
Zarr                                       Diffusion Policy, UMI
HDF5                                       robomimic, ACT/ALOHA
MCAP                                       ROS2 CDR + Foxglove Protobuf, no ROS install required
Rosbag                                     ROS1 .bag, ROS2 SQLite3

8 research-backed metrics

Score every episode 0-10 from proprioception data alone. No video processing needed.

Smoothness (LDLJ): Jerk-based smoothness
Dead Actions: Zero/constant detection
Gripper Chatter: Rapid open/close cycles
Static Detection: Idle period detection
Timestamp Regularity: Dropped frames & jitter
Action Saturation: Time at hardware limits
Action Entropy: Diversity vs repetition
Path Length: Wandering & hesitation
$ forge quality hf://lerobot/aloha_sim_cube --export report.json
$ forge filter ./dataset ./filtered --min-quality 6.0 # remove bad demos
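As an illustration of how two of these metrics can be computed from proprioception alone, the sketch below implements one common formulation of log dimensionless jerk (LDLJ) and a simple dead-action fraction. It is not Forge's scoring code: the function names, the finite-difference derivatives, and the `eps` tolerance are assumptions, and Forge's mapping from raw values to a 0-10 score is not shown.

```python
import math

import numpy as np


def ldlj(positions, fps: float) -> float:
    """Log dimensionless jerk: -ln(T^3 / v_peak^2 * integral of |jerk|^2 dt).

    Values closer to zero indicate smoother motion. Needs at least 2 samples.
    """
    p = np.asarray(positions, dtype=np.float64)
    if p.ndim == 1:
        p = p[:, None]
    dt = 1.0 / fps
    vel = np.gradient(p, dt, axis=0)
    acc = np.gradient(vel, dt, axis=0)
    jerk = np.gradient(acc, dt, axis=0)
    peak_speed = np.linalg.norm(vel, axis=1).max()
    jerk_integral = float((jerk ** 2).sum() * dt)
    if peak_speed == 0.0:
        return math.nan  # no motion: smoothness is undefined
    if jerk_integral == 0.0:
        return math.inf  # constant velocity or acceleration: no jerk at all
    duration = (len(p) - 1) * dt
    return -math.log(duration ** 3 / peak_speed ** 2 * jerk_integral)


def dead_action_fraction(actions, eps: float = 1e-6) -> float:
    """Fraction of steps whose action is all-zero or unchanged from the previous step."""
    a = np.asarray(actions, dtype=np.float64)
    if a.ndim == 1:
        a = a[:, None]
    zero = np.all(np.abs(a) < eps, axis=1)
    constant = np.zeros(len(a), dtype=bool)
    constant[1:] = np.all(np.abs(np.diff(a, axis=0)) < eps, axis=1)
    return float((zero | constant).mean())


# Synthetic demo: a minimum-jerk reach versus the same reach with sensor noise.
tau = np.linspace(0.0, 1.0, 101)
smooth_reach = 10 * tau**3 - 15 * tau**4 + 6 * tau**5
rng = np.random.default_rng(0)
jerky_reach = smooth_reach + rng.normal(0.0, 0.01, size=tau.size)

print("smooth:", ldlj(smooth_reach, fps=100))
print("jerky :", ldlj(jerky_reach, fps=100))
print("dead  :", dead_action_fraction([[0, 0], [0, 0], [1, 2], [1, 2], [3, 4]]))
```

Third-order finite differences amplify noise, so a real pipeline would probably low-pass filter the signal before differentiating.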

23+ curated robotics datasets

Browse, search, and download by name. Use dataset IDs directly in any command.

DROID               ~76,000 episodes      rlds
Bridge V2           ~60,096 episodes      rlds
Open-X Embodiment   ~2,200,000 episodes   rlds
AgiBot World        ~1,000,000 episodes   lerobot-v2
RoboSet             ~100,000 episodes     hdf5
ALOHA               Bi-manual demos       lerobot

Up and running in 60 seconds

1. Install
One pip install — pick the format extras you need (or grab everything).
$ pip install "forge-robotics[mcap]"
# or for every supported format:
$ pip install "forge-robotics[all]"
2. Try a demo dataset
Download, inspect, and score a small dataset in one command.
$ forge demo
# Downloads pusht, runs inspect + quality
3. Convert and go
Convert to any format you need for training — tokenized actions included.
$ forge convert droid ./output --format lerobot-v3
$ forge tokenize write ./output ./tokenized