Module 8 — Continuous control & robotics

Sim-to-real & domain randomization

Reality gap, randomizing physics/visuals, and system identification preview.

~60 min read + exercises

Before we begin

Policies trained in simulation often fail on real robots — friction, latency, mass, and sensor noise differ. Sim-to-real transfer closes the gap with better simulation, domain randomization, system identification, and careful deployment pipelines. RL makes this harder because policies exploit simulator quirks that do not exist in reality.

Reality gap — performance drop when moving from sim to physical system.
Domain randomization — train on a distribution of sim parameters.
System ID — measure real dynamics to calibrate the simulator.


What you will learn

  • List major sources of sim-to-real error (dynamics, perception, control).
  • Apply domain randomization over physics and visuals.
  • Contrast randomization, adaptation, and fine-tuning on real data.
  • Design evaluation protocols before touching hardware.
  • Connect to offline RL and safety (Module 9) for deployment.

Sources of the reality gap

| Category | Sim assumption | Real world |
| --- | --- | --- |
| Dynamics | Fixed friction, mass | Wear, temperature, payload |
| Actuation | Instant torque | Motor delay, backlash, limits |
| Sensing | Clean state vector | Noise, bias, dropout, latency |
| Contact | Soft constraints | Stiction, bounce, deformation |
| Visual | Perfect textures | Lighting, dirt, motion blur |

RL policies overfit sim idiosyncrasies — a legged robot may learn twitchy motions that work only with MuJoCo's contact model.


Domain randomization (DR)

Randomize simulation parameters each episode so the policy sees a family of environments. The hope is that the real world falls inside the training distribution.

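```python
import random

def reset_env_randomized(env, nominal_mass):
    # Resample physics and sensor parameters once per episode.
    # The env.set_* methods stand for whatever setters your simulator exposes.
    env.set_friction(random.uniform(0.5, 1.5))
    env.set_link_mass("base", random.uniform(0.8, 1.2) * nominal_mass)
    env.set_sensor_noise(std=random.uniform(0.0, 0.05))
    return env.reset()
```

| Parameter | Typical range strategy |
| --- | --- |
| Friction | ±30–50% of nominal |
| Mass / inertia | ±20% |
| Motor strength | ±15% |
| Sensor noise | Increase gradually until the sim policy degrades slightly |
| Push disturbances | Random external forces |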

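If DR is too narrow, the policy does not transfer. If it is too wide, the policy cannot learn even in simulation. Robustness and learnability trade off against each other, so look for the widest ranges at which sim training still converges.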
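
One common way to search for that trade-off is to start with narrow ranges and widen them as the policy improves. The sketch below is illustrative only: the parameter names, the nominal values, the `width` knob, and the 0.8 success threshold are all assumptions, with maximum half-widths loosely taken from the range table above.

```python
import random

# Illustrative nominal values and maximum half-widths (fractions of nominal).
NOMINAL = {"friction": 1.0, "mass_scale": 1.0, "motor_scale": 1.0}
MAX_HALF_WIDTH = {"friction": 0.5, "mass_scale": 0.2, "motor_scale": 0.15}


def sample_params(width, rng):
    """Sample each parameter uniformly from nominal * (1 +/- width * max half-width)."""
    params = {}
    for name, nominal in NOMINAL.items():
        half = width * MAX_HALF_WIDTH[name]
        params[name] = nominal * rng.uniform(1.0 - half, 1.0 + half)
    return params


def update_width(width, success_rate, target=0.8, step=0.1):
    """Widen the ranges when the policy copes, narrow them when it does not."""
    if success_rate >= target:
        return min(1.0, width + step)
    return max(0.0, width - step)


rng = random.Random(0)
width = 0.2
for success_rate in [0.9, 0.85, 0.5, 0.95]:  # made-up evaluation results
    width = update_width(width, success_rate)
print(round(width, 2), sample_params(width, rng))
```

With `width=0.0` every episode uses nominal physics, and with `width=1.0` the sampler covers the full ranges. Narrowing again after a bad evaluation is a design choice here, not a requirement; some curricula only ever widen.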


Worked example: grasping with DR

OpenAI's dexterous manipulation work randomized object mass, friction, and geometry. A policy trained only in simulation with sufficient DR transferred to the real robot without any training on real-world data, but it required massive sim compute and careful reward design.

Checkpoint: Why not identify exact real parameters and use a single accurate sim?

Answer

System ID is hard for contact-rich tasks — small errors compound. Multiple real conditions (wear, payloads) need one policy anyway. DR trains robust policies; identification helps narrow randomization ranges rather than eliminating DR entirely.
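
To make "identification narrows the ranges" concrete, here is a minimal sketch under a deliberately simple assumption: a block sliding to rest on a flat surface decelerates at roughly `mu * g`, so each measured deceleration gives one friction estimate. The measurements and the `k = 3` spread factor are made up for illustration.

```python
import statistics

G = 9.81  # gravitational acceleration, m/s^2


def friction_range_from_decelerations(decels, k=3.0):
    """Estimate mu = a / g per trial and return (low, high) = mean -/+ k * stdev."""
    mus = [a / G for a in decels]
    mean = statistics.mean(mus)
    std = statistics.stdev(mus)
    return max(0.0, mean - k * std), mean + k * std


# Made-up decelerations (m/s^2) from five real sliding trials.
measured = [3.8, 4.1, 3.9, 4.3, 4.0]
low, high = friction_range_from_decelerations(measured)
print(f"randomize friction in [{low:.2f}, {high:.2f}]")
```

The result is still a range, not a single value: measurement scatter and condition changes such as wear remain, which is why DR is narrowed rather than removed.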


Other transfer strategies

| Method | Idea | When |
| --- | --- | --- |
| Domain adaptation | Align sim/real features | Visual policies |
| Real fine-tuning | Few real episodes with RL or BC | Budget for on-robot data |
| Offline RL on logs | Learn from historical robot data | Module 9 |
| Digital twin | Continuous calibration from sensors | Industrial setups |
| Conservative objectives | Penalize OOD actions (CQL, etc.) | Safety-critical |

Deployment checklist (before real robot)

  1. Sim baselines — seed variance, eval over 50+ episodes.
  2. Randomization sweep — policy still works at wide ranges?
  3. Action rate limits — low-pass filter torques; clip jerk.
  4. Emergency stop — hardware e-stop independent of policy.
  5. Sandbox — harness, reduced payload, limited workspace.
  6. Logging — states, actions, contacts at full rate for replay.
  7. Human oversight — first N episodes with kill switch.
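
Items 1 and 2 can be combined into one report: the same seeds evaluated at several randomization widths. The sketch below assumes a hypothetical `run_episode(width, seed)` callable that rolls out your policy and returns an episode return; `fake_run_episode` is only a stand-in so the example runs.

```python
import random
import statistics


def sweep(run_episode, widths, seeds):
    """Return {width: (mean_return, stdev_return)} over the given seeds."""
    results = {}
    for width in widths:
        returns = [run_episode(width, seed) for seed in seeds]
        results[width] = (statistics.mean(returns), statistics.stdev(returns))
    return results


def fake_run_episode(width, seed):
    """Stand-in rollout: return drops and gets noisier as DR widens."""
    rng = random.Random(seed)
    return 100.0 - 40.0 * width + rng.gauss(0.0, 5.0 + 10.0 * width)


report = sweep(fake_run_episode, widths=[0.0, 0.5, 1.0], seeds=range(50))
for width, (mean, std) in report.items():
    print(f"width={width:.1f}  return={mean:.1f} +/- {std:.1f}")
```

A sharp drop in mean return or a jump in spread at some width is a hint about which conditions the policy does not yet handle.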

Never skip hardware limits because the sim allowed impossible torques.
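
A minimal sketch of the smoothing, rate-limiting, and clipping from item 3, for a single scalar torque. The class name and the limit values are illustrative assumptions; real limits come from the actuator datasheet, and this software filter complements the hardware e-stop rather than replacing it.

```python
class ActionFilter:
    """Low-pass filter, rate limit, then clip a scalar torque command."""

    def __init__(self, max_torque, max_delta, alpha):
        self.max_torque = max_torque  # hardware torque limit (assumed known)
        self.max_delta = max_delta    # largest allowed change per control step
        self.alpha = alpha            # smoothing factor in (0, 1]; 1 disables smoothing
        self.prev = 0.0               # last command actually sent

    def __call__(self, raw):
        # Exponential smoothing toward the raw policy output.
        smoothed = self.alpha * raw + (1.0 - self.alpha) * self.prev
        # Limit how far the command may move in one step.
        delta = max(-self.max_delta, min(self.max_delta, smoothed - self.prev))
        # Enforce the absolute torque limit.
        out = max(-self.max_torque, min(self.max_torque, self.prev + delta))
        self.prev = out
        return out


filt = ActionFilter(max_torque=2.0, max_delta=0.5, alpha=0.5)
commands = [10.0, 10.0, 10.0, 10.0, 10.0, -10.0]
print([filt(u) for u in commands])  # [0.5, 1.0, 1.5, 2.0, 2.0, 1.5]
```

Training the policy with the same filter in the loop keeps the sim and real action pipelines identical, so the filter itself does not become a new source of reality gap.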


Perception sim-to-real

For camera-based policies, also randomize backgrounds, lighting, camera pose, noise, and blur, and (if affordable) add real-to-sim domain adaptation networks. Pure DR on pixels needs a large GPU budget, so consider a state-based policy (joint encoders) for the first hardware iteration.
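
As a rough sketch, the visual parameters can be sampled once per episode and handed to the renderer. Everything here is hypothetical: the `VisualParams` fields, the ranges, and the idea of a numbered background pool are illustrations, and how each value is applied depends on your simulator.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class VisualParams:
    background_id: int        # index into a pool of background textures
    light_intensity: float    # multiplier on nominal lighting
    camera_offset_m: tuple    # (x, y, z) jitter on the camera position, metres
    noise_std: float          # additive pixel noise, image values in [0, 1]
    blur_sigma_px: float      # Gaussian blur width in pixels


def sample_visual_params(rng, num_backgrounds=100):
    """Draw one set of rendering parameters; all ranges are illustrative."""
    return VisualParams(
        background_id=rng.randrange(num_backgrounds),
        light_intensity=rng.uniform(0.5, 1.5),
        camera_offset_m=tuple(rng.uniform(-0.02, 0.02) for _ in range(3)),
        noise_std=rng.uniform(0.0, 0.05),
        blur_sigma_px=rng.uniform(0.0, 2.0),
    )


rng = random.Random(0)
print(sample_visual_params(rng))
```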


Common mistakes

| Mistake | Symptom | Fix |
| --- | --- | --- |
| Zero DR, perfect sim | Instant real failure | Randomize physics |
| DR too aggressive | Sim never learns | Start narrow, widen with a curriculum |
| No latency modeling | Oscillation on hardware | Add action delay in sim |
| Skipping safety limits | Hardware damage | Clip + rate limit |
| One sim seed | Lucky contact exploit | Multi-seed eval |
| No real logging | Cannot debug transfer | Mirror sim log schema |
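
For the "No latency modeling" row, one simple option is to delay actions by a fixed number of control steps before they reach the simulator. This is a crude sketch, not a motor model; the class name and the choice of a fixed integer delay are assumptions.

```python
from collections import deque


class ActionDelay:
    """Delay actions by a fixed number of control steps."""

    def __init__(self, delay_steps, initial_action=0.0):
        # Pre-fill so the first `delay_steps` outputs are the initial action.
        self.buffer = deque([initial_action] * delay_steps)

    def __call__(self, action):
        self.buffer.append(action)
        return self.buffer.popleft()


delay = ActionDelay(delay_steps=2)
print([delay(a) for a in [1.0, 2.0, 3.0, 4.0]])  # [0.0, 0.0, 1.0, 2.0]
```

Resampling `delay_steps` at each episode reset turns latency into one more randomized parameter, and `delay_steps=0` passes actions through unchanged.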

Closing

Sim-to-real is engineering plus RL: randomize what you cannot measure, identify what you can, and deploy with safety guardrails. The robotics case studies that follow show how these pieces were combined in published systems. After that, you will train SAC in simulation on Pendulum as a lightweight continuous-control lab.

