Forge - Robotics Data Toolkit & Data Engine

The Toolkit

Everything you need for a single dataset

Convert, inspect, score, lint, filter, dedup, segment, tokenize, and visualize — one dataset at a time, across every format.

Format Conversion

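Convert between RLDS, LeRobot, Zarr, HDF5, MCAP, Rosbag, and RoboDM with a single command (write support varies by format; see the format support matrix). The hub-and-spoke architecture routes every conversion through one shared intermediate representation, so supporting n formats takes O(n) adapters instead of O(n²) pairwise converters.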

$ forge convert hf://lerobot/pusht ./output --format rlds
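
The adapter-count argument can be sketched in a few lines. This is an illustration only, not Forge's code: `Episode`, `READERS`, `WRITERS`, `convert`, and the two toy formats are hypothetical names. Each format contributes one reader into the shared representation and one writer out of it, so a conversion is always source to hub to destination.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Episode:
    """Hypothetical shared ("hub") representation of one episode."""
    actions: List[List[float]] = field(default_factory=list)
    states: List[List[float]] = field(default_factory=list)


# One reader and one writer per format (the "spokes"). These are toy
# stand-ins; real adapters would parse and serialize files.
READERS: Dict[str, Callable[[dict], Episode]] = {
    "toy_a": lambda raw: Episode(actions=raw["act"], states=raw["obs"]),
    "toy_b": lambda raw: Episode(actions=raw["actions"], states=raw["states"]),
}
WRITERS: Dict[str, Callable[[Episode], dict]] = {
    "toy_a": lambda ep: {"act": ep.actions, "obs": ep.states},
    "toy_b": lambda ep: {"actions": ep.actions, "states": ep.states},
}


def convert(raw: dict, src: str, dst: str) -> dict:
    """Convert by going source -> hub -> destination."""
    return WRITERS[dst](READERS[src](raw))


def adapter_counts(n_formats: int) -> Tuple[int, int]:
    """Adapters needed: (hub-and-spoke, direct pairwise)."""
    return 2 * n_formats, n_formats * (n_formats - 1)


converted = convert({"act": [[0.1, 0.2]], "obs": [[1.0, 2.0]]}, "toy_a", "toy_b")
print(converted)
print(adapter_counts(7))  # seven formats: 14 adapters vs 42 pairwise converters
```

Adding an eighth format in this scheme means writing one more reader and writer pair rather than touching every existing format.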

Dataset Inspection

Auto-detect format, list episodes, cameras, action/state dimensions, FPS, and schema. Works with local paths and Hugging Face URIs.

$ forge inspect hf://lerobot/aloha_sim_cube

Quality Scoring

Score every episode from 0 to 10 using 8 research-backed metrics. Detect jerky demos, dead actions, gripper chatter, and idle periods from proprioception alone.

$ forge quality ./my_dataset --export report.json
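
The sample report further down this page labels its smoothness metric LDLJ (log dimensionless jerk). As a rough sketch of how such a metric can be computed from proprioception alone, the function below uses the velocity-normalized form, -ln(T³ / v_peak² · ∫|jerk|² dt). This is an assumed formulation for illustration; `ldlj` is a hypothetical name, and how Forge scales or thresholds the value is not specified here.

```python
import numpy as np


def ldlj(positions: np.ndarray, fps: float) -> float:
    """Log dimensionless jerk of a (T, D) position trajectory; higher is smoother."""
    dt = 1.0 / fps
    vel = np.gradient(positions, dt, axis=0)
    jerk = np.gradient(np.gradient(vel, dt, axis=0), dt, axis=0)
    v_peak = np.linalg.norm(vel, axis=1).max()
    duration = (len(positions) - 1) * dt
    # Rectangle-rule approximation of the integral of squared jerk magnitude.
    jerk_integral = float(np.sum(jerk ** 2) * dt)
    if v_peak == 0.0 or jerk_integral == 0.0:
        return float("nan")  # assumption: undefined for static or jerk-free input
    return float(-np.log(duration ** 3 / v_peak ** 2 * jerk_integral))


# Synthetic data: a minimum-jerk reach, and the same reach with sensor-like noise.
t = np.linspace(0.0, 1.0, 101)[:, None]
smooth = 10 * t**3 - 15 * t**4 + 6 * t**5
rng = np.random.default_rng(0)
jerky = smooth + rng.normal(scale=0.01, size=smooth.shape)

score_smooth = ldlj(smooth, fps=100)
score_jerky = ldlj(jerky, fps=100)
print(score_smooth, score_jerky)  # the noisy trajectory scores lower
```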

Video Quality

Opt-in camera-stream scoring: blur, exposure, frozen frames, and colorfulness (Tier 0), plus optical-flow motion, smoothness, and shot-cut detection (Tier 1). Composes with filtering.

$ forge quality ./dataset --video --video-level motion
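
Two of the Tier 0 checks, blur and frozen frames, can be approximated with plain NumPy. The sketch below is an assumption about one reasonable way to do it (variance of a Laplacian for sharpness, mean absolute difference between consecutive frames for freezes); the function names and the `tol` threshold are hypothetical, not Forge's.

```python
import numpy as np


def laplacian_variance(gray: np.ndarray) -> float:
    """Sharpness proxy: variance of a 4-neighbour Laplacian; low values suggest blur."""
    g = gray.astype(np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())


def frozen_frame_fraction(frames: np.ndarray, tol: float = 0.5) -> float:
    """Fraction of consecutive (T, H, W) frame pairs that are nearly identical."""
    diffs = np.abs(np.diff(frames.astype(np.float64), axis=0))
    per_pair = diffs.reshape(diffs.shape[0], -1).mean(axis=1)
    return float(np.mean(per_pair < tol))


# Synthetic 96x96 frame, plus a 3x3 box-blurred copy built from shifted images.
rng = np.random.default_rng(0)
sharp = rng.integers(0, 256, size=(96, 96)).astype(np.float64)
blurred = sum(np.roll(np.roll(sharp, dy, 0), dx, 1)
              for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
clip = np.stack([sharp, sharp, blurred, blurred, sharp])

print(laplacian_variance(sharp), laplacian_variance(blurred))
print(frozen_frame_fraction(clip))  # two of the four frame pairs are repeats
```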

Dataset Linting

Catch hygiene defects before you train: missing or placeholder task strings, ambiguous camera names, low-res or single-view setups, missing action fields. Enforces Hugging Face's recording guidelines and returns CI-friendly exit codes.

$ forge lint ./my_dataset --strict

Episode Filtering

Filter datasets by quality score, flags, or episode IDs. Supports dry-run previews and pre-computed quality reports.

$ forge filter ./dataset ./filtered --min-quality 6.0

Deduplication

Find and remove near-duplicate episodes (exact copies, re-encodes, near-identical takes) by perceptual hashing of camera keyframes. Uses NumPy only; no learned model is required.

$ forge dedup ./dataset ./deduped
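
To make the idea concrete, here is a minimal average-hash sketch in NumPy. Which perceptual hash Forge uses is not stated on this page, so treat the hash choice, the function names, and the 8x8 grid as assumptions. Near-duplicate keyframes end up a small Hamming distance apart; unrelated frames land far apart.

```python
import numpy as np


def average_hash(gray: np.ndarray, size: int = 8) -> np.ndarray:
    """Boolean hash of size*size bits: block-mean downsample, threshold at the mean."""
    h, w = gray.shape
    h, w = h - h % size, w - w % size  # crop so blocks divide evenly
    blocks = gray[:h, :w].astype(np.float64).reshape(size, h // size, size, w // size)
    small = blocks.mean(axis=(1, 3))
    return (small > small.mean()).ravel()


def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing hash bits."""
    return int(np.count_nonzero(a != b))


# Synthetic keyframes: a gradient, a noisy "re-encode" of it, and a different gradient.
yy, xx = np.mgrid[0:96, 0:96]
original = (xx + yy).astype(np.float64)
rng = np.random.default_rng(0)
reencoded = original + rng.normal(scale=1.0, size=original.shape)
different = (xx - yy).astype(np.float64)

d_near = hamming(average_hash(original), average_hash(reencoded))
d_far = hamming(average_hash(original), average_hash(different))
print(d_near, d_far)  # the re-encode stays much closer than the unrelated frame
```

A dedup pass would then flag episode pairs whose keyframe hashes fall under a chosen distance threshold.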

Episode Segmentation

Runs PELT changepoint detection on proprioception signals to split episodes automatically at sub-skill boundaries, regime changes, and idle periods.

$ forge segment ./dataset --label --plot timeline.png
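
For readers unfamiliar with PELT, the sketch below shows the core of the algorithm: dynamic programming over segment start points, with pruning of starts that can no longer be optimal. It assumes a piecewise-constant mean with squared-error cost and a fixed numeric penalty; the cost model Forge uses and how it maps its penalty options onto a number are not covered here, and `pelt` is a hypothetical name.

```python
import numpy as np


def pelt(signal: np.ndarray, penalty: float) -> list:
    """Changepoint indices for a (T,) or (T, D) signal, piecewise-constant mean model."""
    x = np.asarray(signal, dtype=np.float64).reshape(len(signal), -1)
    n = len(x)
    # Prefix sums make the squared-error cost of any segment cheap to evaluate.
    s1 = np.vstack([np.zeros(x.shape[1]), np.cumsum(x, axis=0)])
    s2 = np.concatenate([[0.0], np.cumsum((x ** 2).sum(axis=1))])

    def cost(a: int, b: int) -> float:
        seg_sum = s1[b] - s1[a]
        return float(s2[b] - s2[a]) - float(seg_sum @ seg_sum) / (b - a)

    best = np.full(n + 1, np.inf)  # best[t]: optimal penalized cost of x[:t]
    best[0] = -penalty
    prev = np.zeros(n + 1, dtype=int)  # prev[t]: start of the last segment
    candidates = [0]
    for t in range(1, n + 1):
        totals = [best[a] + cost(a, t) + penalty for a in candidates]
        k = int(np.argmin(totals))
        best[t], prev[t] = totals[k], candidates[k]
        # Pruning step: keep only starts that could still be optimal later.
        candidates = [a for a, v in zip(candidates, totals) if v - penalty <= best[t]]
        candidates.append(t)

    changepoints, t = [], n
    while prev[t] > 0:  # walk back through the stored segment starts
        t = int(prev[t])
        changepoints.append(t)
    return changepoints[::-1]


# Synthetic one-dimensional signal with a single step at index 50.
step = np.concatenate([np.zeros(50), np.full(50, 5.0)])
found = pelt(step, penalty=1.0)
print(found)
```

A larger penalty yields fewer, longer segments; a smaller one splits more aggressively.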

Web Visualizer

Browser-based viewer with multi-camera support, action/state charts, timeline scrubber, and segment overlay. Zero extra dependencies.

$ forge visualize pusht --segment

Action Tokenization

Turn continuous actions into the discrete tokens VLA models train on. Four built-in strategies (RT-1, OpenVLA, quantile, mu-law) and a comparator that benchmarks them on your data — measure, don't guess.

$ forge tokenize compare ./my_dataset --export report.json
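
The kind of comparison described above can be sketched with two of the binning schemes: uniform min/max bins and quantile bins, each decoded back to bin centers and scored by mean absolute error. This is a simplified illustration on synthetic data, not the Forge comparator; the function names are hypothetical and the data is a random stand-in for 2-D pixel-coordinate actions.

```python
import numpy as np


def uniform_tokenize(actions: np.ndarray, n_bins: int = 256):
    """Equal-width bins between each dimension's min and max; decode to bin centers."""
    lo, hi = actions.min(axis=0), actions.max(axis=0)
    width = np.where(hi > lo, (hi - lo) / n_bins, 1.0)  # guard constant dimensions
    tokens = np.clip(((actions - lo) / width).astype(int), 0, n_bins - 1)
    return tokens, lo + (tokens + 0.5) * width


def quantile_tokenize(actions: np.ndarray, n_bins: int = 256):
    """Equal-population bins per dimension; decode to the midpoint of each bin."""
    tokens = np.empty(actions.shape, dtype=int)
    decoded = np.empty(actions.shape, dtype=np.float64)
    for d in range(actions.shape[1]):
        edges = np.quantile(actions[:, d], np.linspace(0.0, 1.0, n_bins + 1))
        tok = np.searchsorted(edges, actions[:, d], side="right") - 1
        tok = np.clip(tok, 0, n_bins - 1)
        tokens[:, d] = tok
        decoded[:, d] = 0.5 * (edges[tok] + edges[tok + 1])
    return tokens, decoded


rng = np.random.default_rng(0)
actions = rng.uniform(0.0, 512.0, size=(2000, 2))  # synthetic, not pusht

uni_tokens, uni_decoded = uniform_tokenize(actions)
qua_tokens, qua_decoded = quantile_tokenize(actions)
mae_uniform = float(np.abs(uni_decoded - actions).mean())
mae_quantile = float(np.abs(qua_decoded - actions).mean())
print(f"uniform MAE={mae_uniform:.4f}  quantile MAE={mae_quantile:.4f}")
```

Which scheme reconstructs best depends on the action distribution, which is the reason to benchmark on your own data.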

Dataset Registry

Curated catalog of 23+ prominent robotics datasets. Search, filter, and download by name. Use dataset IDs directly in any command.

$ forge inspect droid # resolves via registry

The Data Engine

From a pile of datasets to a queryable corpus

The catalog registers every episode you collect in an append-only, SQL-queryable store that you can then embed, search, dedup, curate, and explore. Zero-server, cloud-native.

ingest → query → embed → search → dedup → curate → Studio

The Catalog

Register every episode into an append-only set of Parquet tables, annotated with quality scores. Query it with SQL through embedded DuckDB, with no server to run. The catalog can live in a local directory or a cloud bucket, and pandas or Polars can read the tables without Forge.

$ forge ingest s3://lab/raw/ -c ./catalog

Semantic Search

Embed episodes with SigLIP (a shared image–text model) and search by natural language. Text queries match episode video, not just metadata. The GPU is auto-detected, and cosine-similarity search runs in DuckDB in under a second.

$ forge search "picks up the red cup" --top 10

Dedup & Curation

Find near-duplicate episodes across the whole corpus from the embeddings, then curate a clean, labeled training set by policy (keep-higher-quality, keep-longer…). Curation decisions are appended to a log, so nothing is ever deleted.

$ forge curate --dedup 0.97 --label approved
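
A keep-higher-quality policy can be sketched as a greedy pass over episodes sorted by quality. The sketch assumes that the dedup value (0.97 in the command above) is a cosine-similarity threshold on the embeddings; that reading, the greedy strategy, and the function name are illustrative assumptions rather than Forge's documented behavior.

```python
import numpy as np


def dedup_keep_higher_quality(embeddings: np.ndarray, quality: np.ndarray,
                              threshold: float = 0.97) -> list:
    """Indices of episodes kept after greedy near-duplicate removal."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = unit @ unit.T  # pairwise cosine similarity
    kept = []
    # Visit the best-scoring episodes first so they win against their duplicates.
    for i in np.argsort(-np.asarray(quality), kind="stable"):
        if all(sim[i, j] < threshold for j in kept):
            kept.append(int(i))
    return sorted(kept)


# Episodes 0 and 1 are near-duplicates; episode 1 has the higher quality score.
embeddings = np.array([[1.0, 0.0], [0.999, 0.01], [0.0, 1.0]])
quality = np.array([6.0, 9.0, 7.5])
kept = dedup_keep_higher_quality(embeddings, quality)
print(kept)
```

In an append-only design, the dropped index would be recorded as a rejection rather than removed from the store.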

Forge Studio

A self-contained, themed HTML app to explore the catalog — Overview, Corpus (thumbnails + quality rings), Dedup review (keep/reject pairs), and Snapshot. Real data + video thumbnails embedded; one shareable file, no server.

$ forge studio -c ./catalog -o studio.html

Cloud Native

Every command that takes a path also accepts s3:// and gs:// URIs — datasets and catalogs alike. Auth uses your standard AWS/GCP credential chain; Forge never handles credentials itself.

$ forge quality s3://my-bucket/droid

Idempotent & Versioned

Re-running ingest or embed is a no-op — episodes are keyed by content hash. Quality scores and embeddings are versioned; commits are atomic (manifest-last), so a crash never leaves a partial batch.

$ forge query "SELECT task, count(*) ..."
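
The two properties above, content-hash keys and a manifest written last, can be illustrated with the standard library alone. This is a simplified sketch under assumed names (`ingest`, `manifest.json`, one `.bin` file per episode); Forge's actual catalog uses Parquet tables and its own layout.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path


def content_key(payload: bytes) -> str:
    """Key an episode by the SHA-256 of its bytes."""
    return hashlib.sha256(payload).hexdigest()


def ingest(catalog: Path, episodes: list) -> int:
    """Add unseen episodes and return how many were new."""
    catalog.mkdir(parents=True, exist_ok=True)
    manifest_path = catalog / "manifest.json"
    known = set(json.loads(manifest_path.read_text())) if manifest_path.exists() else set()
    added = 0
    for payload in episodes:
        key = content_key(payload)
        if key in known:
            continue  # already registered: re-ingesting is a no-op
        (catalog / f"{key}.bin").write_bytes(payload)  # data files go first
        known.add(key)
        added += 1
    # Commit step: the manifest is replaced atomically, after all data is written.
    # A crash before this point leaves the old manifest, so no partial batch is visible.
    tmp_path = manifest_path.with_suffix(".tmp")
    tmp_path.write_text(json.dumps(sorted(known)))
    os.replace(tmp_path, manifest_path)
    return added


with tempfile.TemporaryDirectory() as root:
    catalog = Path(root) / "catalog"
    first = ingest(catalog, [b"episode-a", b"episode-b"])
    second = ingest(catalog, [b"episode-a", b"episode-b"])
    print(first, second)  # the second run adds nothing
```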

See it in action

Real output from real datasets

Every output below was generated by running Forge on the pusht dataset.

forge inspect

$ forge inspect hf://lerobot/pusht
Dataset: lerobot/pusht
Format: lerobot-v3 (v3.0)
Episodes: 206
Total frames: 25,650
Observation Schema
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━┓
┃ Field             ┃ Type    ┃ Shape ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━┩
│ observation.state │ float32 │ (2,)  │
│ next.success      │ bool    │ (1,)  │
│ next.reward       │ float32 │ ()    │
│ next.done         │ bool    │ ()    │
└───────────────────┴─────────┴───────┘
Action: float32 (2,)
Cameras: image: 96x96 (rgb)
FPS: 10
Language: yes (100% coverage)
Sample: "Push the T-shaped block onto the T-shaped target."

forge quality

$ forge quality hf://lerobot/pusht
Analyzing episodes... ━━━━━━━━━━━━━━━━━━ 206/206
╭────────── Quality Report: pusht (206 episodes) ──────────╮
│                                                          │
│  Overall Quality Score: 8.5 / 10                         │
│                                                          │
│  Smoothness (LDLJ)     ███████░░░  0.75  OK              │
│  Dead Actions          █████████░  0.99  OK              │
│  Gripper Health        ██████████  1.00  OK              │
│  Static Detection      ██████████  1.00  OK              │
│  Timestamp Regularity  ██████████  1.00  OK              │
│  Action Saturation     ████████░░  0.87  OK              │
│  Action Diversity      ███░░░░░░░  0.30  OK              │
│                                                          │
╰──────────────────────────────────────────────────────────╯

forge lint

$ forge lint hf://lerobot/pusht
WARN camera.ambiguous_name  Camera 'image' doesn't say where it is (third-person vs wrist?).
     → Use <modality>.<location>, e.g. 'observation.images.wrist'.
WARN task.missing  Dataset has no language / task instructions.
     → VLA training needs per-episode instructions; add them.
INFO camera.low_resolution  Camera 'image' is 96x96; below 640x480.
INFO camera.too_few_views  Only 1 camera view(s); HF recommends >= 2.
PASS 206 episodes 0 errors, 2 warnings, 2 info

forge segment

$ forge segment pusht --sample 8 --label --penalty aic
Resolved from registry: PushT (lerobot)
Format: lerobot-v3 | Signal: observation.state | Penalty: aic
Segmentation Results
┏━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Episode        ┃ Frames ┃ Segments ┃ Changepoints    ┃ Labels                ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━┩
│ episode_000000 │ 161    │ 6        │ 20, 33, 63...   │ moving -> fine_m...   │
│ episode_000001 │ 118    │ 6        │ 13, 34, 49...   │ fine_m -> fine_m...   │
│ episode_000002 │ 141    │ 7        │ 12, 27, 56...   │ moving -> fine_m...   │
│ episode_000003 │ 159    │ 7        │ 28, 42, 60...   │ fine_m -> fine_m...   │
│ episode_000004 │ 159    │ 8        │ 12, 22, 45...   │ moving -> fine_m...   │
│ episode_000005 │ 157    │ 6        │ 30, 47, 83...   │ moving -> moving...   │
│ episode_000006 │ 69     │ 4        │ 14, 46, 57      │ fine_m -> moving...   │
│ episode_000007 │ 169    │ 7        │ 12, 43, 59...   │ moving -> fine_m...   │
└────────────────┴────────┴──────────┴─────────────────┴───────────────────────┘
╭────────────────── Summary ──────────────────╮
│ Episodes: 8                                 │
│ Mean segments/episode: 6.38                 │
│ Range: 4 — 8                                │
│ Total changepoints: 43                      │
╰─────────────────────────────────────────────╯

forge filter

$ forge filter ./dataset ./filtered --min-quality 6.0
Filtering episodes... ━━━━━━━━━━━━━━━━━━ 206/206
┏━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Episode        ┃ Score ┃ Flags              ┃ Status ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ episode_000000 │ 8.7   │                    │ KEEP   │
│ episode_000001 │ 9.1   │                    │ KEEP   │
│ episode_000002 │ 8.4   │                    │ KEEP   │
│ episode_000003 │ 5.2   │ jerky, hesitant    │ EXCL   │
│ episode_000004 │ 7.8   │                    │ KEEP   │
│ episode_000005 │ 3.1   │ mostly_static      │ EXCL   │
│ episode_000006 │ 8.9   │                    │ KEEP   │
│ ... 199 more episodes ...                            │
└────────────────┴───────┴────────────────────┴────────┘
╭────────────── Filter Results ──────────────╮
│                                            │
│  Episodes kept: 189 / 206                  │
│  Episodes excluded: 17                     │
│  Written to: ./filtered/                   │
│                                            │
╰────────────────────────────────────────────╯

forge tokenize

$ forge tokenize compare pusht --sample 2000
Tokenizer comparison: pusht — 25,650 frames, action_dim=2, scored on 2000
Tokenizer Comparison
┏━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Strategy       ┃ Vocab ┃ MAE    ┃ MSE     ┃ Max-abs ┃ Vocab util ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━┩
│ uniform-bins ✓ │ 256   │ 0.4829 │ 0.3098  │ 0.9746  │ 96%        │
│ quantile-bins  │ 256   │ 0.6442 │ 1.6471  │ 23.5000 │ 100%       │
│ openvla-bins   │ 256   │ 0.6828 │ 7.6404  │ 61.7969 │ 100%       │
│ mu-law         │ 256   │ 2.8620 │ 12.5675 │ 10.9926 │ 27%        │
└────────────────┴───────┴────────┴─────────┴─────────┴────────────┘
Lowest MAE: uniform-bins
# pusht actions are raw pixel coords — RT-1 min/max bins beat OpenVLA's
# percentile clipping on this dataset. The right tokenizer depends on your data.

forge segment pusht --label --plot timeline.png — semantic phase labels via proprioception

forge visualize pusht --segment — browser-based viewer with segment overlay

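[Screenshot: Forge Viewer (lerobot-v3) showing 206 episodes · 25,650 frames · 10 fps. Camera "image" (96×96), episode 0 at frame 56 / 161 in a fine_manipulation segment, with a segment legend (moving, fine_manipulation, idle) and Actions and States charts. Keyboard shortcuts: Space: Play/Pause, ←→: Frame, ↑↓: Episode, [/]: Speed.]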

forge visualize droid_100 --backend rerun --samples 2 — Rerun viewer: cameras, time-series, and segment labels on one timeline

Rerun viewer showing camera stream, action and state time series

Install: pip install forge-robotics[rerun]

Format support matrix

Format	Read	Write	Visualize	Notes
RLDS	✓	✓	✓	Open-X, TensorFlow Datasets
LeRobot v2/v3	✓	✓	✓	Hugging Face, Parquet + MP4
GR00T	✓	—	✓	NVIDIA Isaac, LeRobot v2 with embodiment metadata
RoboDM	✓	✓	✓	Berkeley's .vla format, up to 70x compression
Zarr	✓	—	✓	Diffusion Policy, UMI
HDF5	✓	—	✓	robomimic, ACT/ALOHA
MCAP	✓	✓	✓	ROS2 CDR + Foxglove Protobuf, no ROS install required
Rosbag	✓	—	✓	ROS1 .bag, ROS2 SQLite3