Skip to content

liveplex-cpu/robogate-failure-dictionary

Repository files navigation

RoboGate Failure Dictionary

50,000+ Physics-Validated Pick & Place Failure Patterns across 4 Robots (Franka Panda, UR5e, UR3e, UR10e)

Experiments Robots AUC Simulator License

A structured database of robot AI failure patterns collected from NVIDIA Isaac Sim physical simulations using Two-Stage Adaptive Sampling. Each experiment records the exact conditions under which a robot succeeded or failed at Pick & Place tasks, enabling pre-deployment risk assessment for industrial robotics.


Quick Stats

Franka Uniform Franka Boundary UR5e UR3e UR10e Combined
Experiments 10,000 10,000 10,000 10,000 10,000 50,000+
Success Rate 33.3% 63.8% 74.3% 10.0% 0.0%
Franka Combined 48.6%
Danger Zones 7,808 2,570 9,000+ 10,000 10,378+
Risk Model AUC 0.65 0.777 0.777
Parameters 8 8 11 11 11 11
Sampling Uniform LHS Boundary LHS Uniform LHS Uniform LHS Uniform LHS Two-Stage

Two-Stage Adaptive Sampling

Stage 1 — Uniform Exploration (40,000)

  • Franka Panda 10K + UR5e 10K + UR3e 10K + UR10e 10K via Latin Hypercube Sampling
  • Uniform parameter space coverage — 2-3× better than random
  • Identified boundary regions and initial risk model (AUC 0.65)

Stage 2 — Boundary-Focused (10,000)

  • Franka Panda only, targeting boundary/transition regions
  • Concentrated sampling near friction threshold μ* = 0.492
  • Revealed failure mode transitions invisible to uniform sampling
  • Boosted Risk Model AUC to 0.777 (+19.5%)

Result

  • Boundary equation: μ(m) = (1.469 + 0.419·m) / (3.691 - 1.400·m)*
  • Failure mode transition discovered: friction↓ → timeout → collision → grasp_miss

Key Findings

  • friction × mass interaction z = -10.00 — strongest predictor of failure across all 4 robots
  • Friction threshold: μ = 0.492 ± 0.031* — below this, failure cascades through modes
  • Mass > 0.93 kg → Both robots fail at < 40% SR (universal danger zone)
  • UR5e never drops → SurfaceGripper (suction) with breakForce=MAX; all failures are grasp_miss
  • 2.2× success gap → UR5e 74.3% vs Franka 33.3% (z = -58.15, p < 0.001)
  • AUC 0.65 → 0.777 (+19.5%) with boundary-focused sampling

Universal Danger Zones (mass > 0.93 kg)

Mass Range Franka SR UR5e SR
0.93 – 1.23 kg 21.4% 30.9%
1.23 – 1.52 kg 14.9% 25.3%
1.52 – 1.82 kg 12.5% 28.9%
1.82 – 2.11 kg 6.6% 28.1%

4-Robot Comparison

Franka Panda UR5e UR3e UR10e
Experiments 20,000 10,000 10,000 10,000
Success Rate 48.6% (20K combined) 74.3% 10.0% 0.0%
Failure Modes grasp_miss, drop, collision grasp_miss only grasp_miss, collision grasp_miss, collision
Gripper Finger (parallel jaw) SurfaceGripper (suction) Finger (parallel jaw) Finger (parallel jaw)
DOF 7 6 6 6
Payload 3 kg 5 kg 3 kg 12.5 kg
Collision Count 1,124 (uniform 10K) ~0

Top Failure Correlations

Franka Panda

Parameter Correlation (r) Interpretation
friction +0.36 Higher friction → higher success (strongest factor)
mass -0.20 Heavier objects → more drops
ik_noise -0.11 Control noise → approach errors

UR5e

Parameter Correlation (r) Interpretation
mass -0.35 Heavier objects → suction failure (strongest factor)
friction +0.18 Moderate effect (suction less friction-dependent)
ik_noise -0.12 Control noise → approach miss

Cross-Robot Interactions

Interaction z-score Interpretation
friction × mass -10.00 Strongest predictor — low friction + high mass = catastrophic
friction threshold 0.492 ± 0.031 Decision boundary for success/failure

Research Foundations

Design Choice Paper Venue/Year How We Used It
Two-Stage Adaptive Sampling ALEAS RSS Workshop 2025 Stage 1 uniform 20K + Stage 2 boundary 10K
friction × mass interaction SIMPLER CoRL 2024 Joint sampling, interaction z-test
Failure taxonomy RoboFAC NeurIPS 2025 6 failure type classification
Cross-robot validation RoboMIND RSS 2025 Franka + UR5e simultaneous comparison
UR-specific failures Guardian ICRA 2025 UR robot singularity/reach categories
Confidence intervals SureSim Badithela et al. 2025 Wilson Score 95% CI (50K+ → ±0.4%)
GPU simulation Isaac Lab NVIDIA 2025 Newton Physics + 60Hz physics

Data Files

File Robot Experiments Sampling Description
failure_dictionary_large.json Franka 10,000 Uniform LHS Stage 1 uniform exploration
franka_boundary_10k.json Franka 10,000 Boundary LHS Stage 2 boundary-focused
ur5e_failure_dictionary.json UR5e 10,000 Uniform LHS Stage 1 uniform exploration
ur3e_failure_dictionary.json UR3e 10,000 Uniform LHS Stage 1 uniform exploration
ur10e_failure_dictionary.json UR10e 10,000 Uniform LHS Stage 1 uniform exploration

Data Schema

{
  "friction": 0.603,
  "mass": 0.085,
  "com_offset": 0.081,
  "size": 0.074,
  "ik_noise": 0.037,
  "obstacles": 2,
  "shape": "box",
  "placement": "rotated_135",
  "success": true,
  "failure_type": "none",
  "cycle_time": 1.437,
  "collision": false,
  "drop": false,
  "grasp_miss": false,
  "zone": "boundary"
}

Zone Classification:

  • safe: fail_prob < 0.30
  • boundary: 0.30 ≤ fail_prob < 0.70
  • danger: fail_prob ≥ 0.70

Usage

import json

# Load all Franka data (uniform + boundary)
with open("failure_dictionary_large.json") as f:
    franka_uniform = json.load(f)["experiments"]
with open("franka_boundary_10k.json") as f:
    franka_boundary = json.load(f)["experiments"]
franka_all = franka_uniform + franka_boundary
print(f"Franka total: {len(franka_all)}")  # 20,000

# Find mass danger zones
import numpy as np
heavy = [e for e in franka_all if e["mass"] > 0.93]
sr = sum(1 for e in heavy if e["success"]) / len(heavy)
print(f"Mass > 0.93kg SR: {sr:.1%}")  # ~21%

Or via HuggingFace:

from datasets import load_dataset
ds = load_dataset("liveplex/robogate-failure-dictionary")
# Splits: franka, franka_boundary, ur5e, ur3e, ur10e, train (all 50K+ combined)

Citation

@dataset{robogate_failure_dictionary_2026,
  title={RoboGate Failure Dictionary: 50K+ Physics-Validated Pick & Place Failure Patterns},
  author={RoboGate Team},
  year={2026},
  url={https://github.com/liveplex-cpu/robogate-failure-dictionary},
  note={Franka Panda + UR5e + UR3e + UR10e, Two-Stage Adaptive Sampling, AUC 0.777}
}

VLA Benchmark — 9-Model Leaderboard

Seven models (6 VLA + scripted baseline) evaluated on RoboGate's 68-scenario adversarial suite. All VLAs scored 0% SR — including Physical Intelligence's PI0, NVIDIA's GR00T, and HuggingFace's SmolVLA.

Model Params SR Confidence Failure Pattern
Scripted Controller 100% (68/68) 76/100
GR00T N1.6 (LIBERO-finetuned) 3B 0% (0/68) 49/100 LIBERO 97.65% → RoboGate 0%
PI0 Base (Physical Intelligence) 3.5B 0% (0/68) 27/100 grasp_miss dominant, OpenPI inference
GR00T N1.6 (NVIDIA) 3B 0% (0/68) 1/100 grasp_miss + collision
OpenVLA (Stanford + TRI) 7B 0% (0/68) 27/100 grasp_miss dominant, 0 collision
SmolVLA Base (HuggingFace) 450M 0% (0/68) 1/100 grasp_miss dominant, avg 18ms inference
Octo-Base (UC Berkeley) 93M 0% (0/68) 1/100 grasp_miss 79%, collision 21%
Octo-Small (UC Berkeley) 27M 0% (0/68) 1/100 grasp_miss 79.4%, collision 20.6%

Cross-Simulator Gap: GR00T N1.6, fine-tuned on LIBERO-Spatial (97.65% SR on MuJoCo), scores 0% on RoboGate's Isaac Sim scenarios. Model size is not the bottleneck — from 27M to 7B, no VLA bridges the training-deployment distribution gap without fine-tuning on the target environment.

Leaderboard: robogate.io/vla · Paper: arXiv:2603.22126


Ecosystem Integrations

This failure dictionary is used across the NVIDIA Physical AI ecosystem:

Integration Description Status
Cosmos Evaluator Plugin 50K+ failure patterns power the failure_pattern checker in our Cosmos Evaluator plugin. Danger zones and boundary equations from this dataset drive risk scoring. Live
Azure Physical AI Toolchain RoboGate evaluation step in Microsoft's Physical AI pipeline. This dataset validates policy safety before Azure ML deployment. Live
Isaac Lab-Arena Benchmark 68-scenario adversarial Pick & Place suite contributed as a benchmark task, derived from failure patterns in this dataset. PR #506

Pipeline position: Cosmos Curator → Cosmos Transfer → RoboGate Validation (this data) → Cosmos Evaluator → Deployment


Links


License: MIT

Built with NVIDIA Isaac Sim · Newton Physics · Franka Panda · UR5e · UR3e · UR10e · Two-Stage Adaptive Sampling

About

20,000 Pick & Place failure patterns from NVIDIA Isaac Sim. Franka Panda + UR5e. Latin Hypercube Sampling, 11 physical parameters. MIT License.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors