Sim-to-real & domain randomization

Before we begin

Policies trained in simulation often fail on real robots because friction, latency, mass, and sensor noise differ from their simulated values. Sim-to-real transfer closes this gap through better simulation, domain randomization, system identification, and careful deployment pipelines. RL makes the problem harder because policies learn to exploit simulator quirks that do not exist in reality.

Reality gap — performance drop when moving from sim to physical system.
Domain randomization — train on a distribution of sim parameters.
System ID — measure real dynamics to calibrate the simulator.

What you will learn

List major sources of sim-to-real error (dynamics, perception, control).
Apply domain randomization over physics and visuals.
Contrast randomization, adaptation, and fine-tuning on real data.
Design evaluation protocols before touching hardware.
Connect to offline RL and safety (Module 9) for deployment.

Sources of the reality gap

Category	Sim assumption	Real world
Dynamics	Fixed friction, mass	Wear, temperature, payload
Actuation	Instant torque	Motor delay, backlash, limits
Sensing	Clean state vector	Noise, bias, dropout, latency
Contact	Soft constraints	Stiction, bounce, deformation
Visual	Perfect textures	Lighting, dirt, motion blur

RL policies overfit to simulator idiosyncrasies: a legged robot may learn twitchy motions that work only under MuJoCo's contact model.

Domain randomization (DR)

Randomize simulation parameters each episode so the policy sees a family of environments rather than a single one. The hope is that the real world falls inside this training distribution.

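```python
import random


def reset_env_randomized(env, nominal_mass):
    # env.set_friction / set_link_mass / set_sensor_noise are placeholders for
    # your simulator's parameter setters (not a specific library's API).
    env.set_friction(random.uniform(0.5, 1.5))                          # friction coefficient, ±50% around 1.0
    env.set_link_mass("base", random.uniform(0.8, 1.2) * nominal_mass)  # base mass, ±20%
    env.set_sensor_noise(std=random.uniform(0.0, 0.05))                 # observation noise std
    return env.reset()
```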

Parameter	Typical range strategy
Friction	±30–50% of nominal
Mass / inertia	±20%
Motor strength	±15%
Sensor noise	Increase until sim performance degrades slightly
Push disturbances	random external forces
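The table can be encoded as a dictionary of relative ranges and sampled once per episode. The sketch below is a minimal illustration under stated assumptions: the parameter names, nominal values, noise cap, and push-force cap are hypothetical placeholders, and the 40% friction half-width is just one point inside the ±30–50% band.

```python
import math
import random

# Hypothetical nominal values; replace with your robot's measured numbers.
NOMINAL = {"friction": 1.0, "mass": 12.0, "motor_strength": 1.0}

# Relative half-widths chosen from the typical ranges in the table.
REL_RANGE = {"friction": 0.40, "mass": 0.20, "motor_strength": 0.15}

MAX_NOISE_STD = 0.05  # hypothetical; tune until sim performance degrades slightly
MAX_PUSH_N = 50.0     # hypothetical cap on the random external push (newtons)


def sample_episode_params(rng: random.Random) -> dict:
    """Draw one set of randomized physics parameters for the next episode."""
    params = {}
    for name, nominal in NOMINAL.items():
        half = REL_RANGE[name]
        # Uniform in [nominal * (1 - half), nominal * (1 + half)].
        params[name] = nominal * rng.uniform(1.0 - half, 1.0 + half)
    params["sensor_noise_std"] = rng.uniform(0.0, MAX_NOISE_STD)
    # Push disturbance: random magnitude in a random horizontal direction.
    magnitude = rng.uniform(0.0, MAX_PUSH_N)
    angle = rng.uniform(0.0, 2.0 * math.pi)
    params["push_force_xy"] = (magnitude * math.cos(angle), magnitude * math.sin(angle))
    return params


rng = random.Random(0)  # seeded so training runs are reproducible
episode_params = [sample_episode_params(rng) for _ in range(1000)]
```

Keeping the ranges in one table-like structure makes it easy to widen or narrow them later without touching the environment code.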

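Too-narrow DR leaves the real robot outside the training distribution, so the policy does not transfer. Too-wide DR makes the task so hard that the policy cannot learn even in sim. Look for the trade-off point: the widest ranges at which sim performance is still acceptable.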
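One way to search for that trade-off is a curriculum: start narrow and widen the ranges only while the policy keeps succeeding. The sketch below is a simplified, hypothetical version that scales every half-width by one global factor based on the recent success rate; the thresholds and step size are assumptions. OpenAI's Automatic Domain Randomization (ADR) adapts each parameter's bounds separately, which this sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass
class DRCurriculum:
    """Scales all randomization half-widths by a single factor in [0, 1]."""
    scale: float = 0.1         # start narrow so the policy can learn at all
    step: float = 0.05         # how much to widen or narrow per update
    widen_above: float = 0.8   # hypothetical success-rate threshold to widen
    narrow_below: float = 0.5  # hypothetical success-rate threshold to narrow

    def update(self, success_rate: float) -> float:
        if success_rate >= self.widen_above:
            self.scale = min(1.0, self.scale + self.step)
        elif success_rate < self.narrow_below:
            self.scale = max(0.0, self.scale - self.step)
        # Between the thresholds: keep the current width.
        return self.scale

    def half_width(self, full_half_width: float) -> float:
        """Effective relative half-width for one parameter at the current scale."""
        return self.scale * full_half_width


curriculum = DRCurriculum()
# Success rates from four successive evaluation batches (illustrative inputs).
history = [curriculum.update(rate) for rate in [0.9, 0.9, 0.6, 0.3]]
# Friction at full width is ±50%; this is the currently active half-width.
friction_half = curriculum.half_width(0.5)
```

Narrowing again after a bad batch keeps the policy from being pushed into ranges it cannot yet handle.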

Worked example: dexterous manipulation with DR

OpenAI's dexterous in-hand manipulation work randomized object mass, friction, and geometry. A policy trained only in simulation, with sufficient DR, transferred to the real robot hand without any real-world training data, but this required massive sim compute and careful reward design.

Checkpoint: Why not identify exact real parameters and use a single accurate sim?

Answer

System ID is hard for contact-rich tasks: small parameter errors compound over a trajectory. A single policy must also cope with varying real conditions (wear, payloads) anyway. DR trains robust policies; identification helps narrow the randomization ranges rather than eliminating DR entirely.
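As a toy illustration of using identification to narrow (not remove) randomization, the sketch below assumes a hypothetical experiment: slide a test block across the robot's floor material several times and measure its deceleration. Under a simple Coulomb model on flat ground, deceleration a = μg, so each trial gives μ = a/g. The measurements are made-up inputs, and contact-rich robots need richer models; the point is only that the resulting DR range is centered on the estimate and much narrower than a blind 0.5 to 1.5.

```python
import statistics

G = 9.81  # gravitational acceleration, m/s^2


def estimate_friction(decelerations):
    """Coulomb model on flat ground: a = mu * g, so mu = a / g for each trial."""
    mus = [a / G for a in decelerations]
    return statistics.mean(mus), statistics.stdev(mus)


def narrowed_range(mu_mean, mu_std, k=3.0, floor=0.05):
    """Center DR on the estimate, keeping k per-trial standard deviations of
    margin so the range still covers trial-to-trial variation (wear, dust)."""
    return max(floor, mu_mean - k * mu_std), mu_mean + k * mu_std


# Hypothetical measured decelerations (m/s^2) from five slide trials.
measured = [7.80, 7.95, 7.70, 7.85, 7.90]
mu_mean, mu_std = estimate_friction(measured)
low, high = narrowed_range(mu_mean, mu_std)
```

Here the identified range spans well under 0.1 in μ, compared with a width of 1.0 for the uninformed range in the earlier reset function.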

Other transfer strategies

Method	Idea	When
Domain adaptation	Align sim/real features	Visual policies
Real fine-tuning	Few real episodes with RL or BC	Budget for on-robot data
Offline RL on logs	Learn from historical robot data	Large logged datasets exist (Module 9)
Digital twin	Continuous calibration from sensors	Industrial setups
Conservative objectives	Penalize out-of-distribution (OOD) actions (CQL, etc.)	Safety-critical
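To make "penalize OOD actions" concrete, the sketch below computes a discrete-action version of the CQL regularizer: for each logged state, the log-sum-exp of Q over all actions minus Q of the action actually taken in the data. Scaled by a weight alpha, this term would be added to the usual TD loss. It is a pure-Python illustration on made-up Q-values; a real implementation operates on network outputs inside an autodiff framework, and continuous-action CQL estimates the log-sum-exp by sampling actions.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_penalty(q_values_batch, data_actions, alpha=1.0):
    """Discrete-action CQL-style regularizer, averaged over a batch.

    Minimizing it pushes down a soft maximum over all actions while pushing
    up the Q-value of the action that appears in the logged data.
    """
    total = 0.0
    for q_values, a in zip(q_values_batch, data_actions):
        total += logsumexp(q_values) - q_values[a]
    return alpha * total / len(data_actions)


# Illustrative batch: two states, two actions each.
q_batch = [[0.0, 0.0], [5.0, 0.0]]
penalty = cql_penalty(q_batch, data_actions=[0, 1])
```

The penalty is large when the critic assigns a high value to an action the data never took (the second state), which is exactly the over-optimism conservative methods try to suppress.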

Deployment checklist (before real robot)

Sim baselines: train several seeds, report the variance across them, and evaluate over 50+ episodes.
Randomization sweep: does the policy still work near the edges of, and slightly beyond, the training ranges?
Action rate limits: low-pass filter torques; clip jerk.
Emergency stop: hardware e-stop independent of the policy.
Sandbox: harness, reduced payload, limited workspace.
Logging: states, actions, and contacts at full rate for replay.
Human oversight: run the first N episodes with an operator on the kill switch.
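A minimal sketch of the "Sim baselines" item, assuming evaluation returns have already been collected per training seed; the function name and dictionary layout are hypothetical. Reporting the spread across seeds, and the worst seed, guards against deploying a single lucky run.

```python
import statistics

MIN_EPISODES = 50  # per seed, following the checklist


def summarize(returns_by_seed):
    """returns_by_seed maps training seed -> list of evaluation episode returns."""
    for seed, returns in returns_by_seed.items():
        if len(returns) < MIN_EPISODES:
            raise ValueError(f"seed {seed}: only {len(returns)} eval episodes")
    per_seed = {s: statistics.mean(r) for s, r in returns_by_seed.items()}
    means = list(per_seed.values())
    return {
        "per_seed_mean": per_seed,
        "mean_over_seeds": statistics.mean(means),
        # Spread across training seeds, not across episodes of one seed.
        "std_over_seeds": statistics.stdev(means) if len(means) > 1 else 0.0,
        "worst_seed": min(per_seed, key=per_seed.get),
    }


# Illustrative inputs: three seeds, 50 evaluation episodes each.
summary = summarize({0: [100.0] * 50, 1: [90.0] * 50, 2: [110.0] * 50})
```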

Never skip hardware limits just because the sim allowed torques the real motors cannot produce.
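A minimal sketch of the "Action rate limits" item combined with hard torque clipping, assuming a torque-controlled robot whose policy emits one torque per joint at a fixed control rate. The class name, smoothing coefficient, and limits are hypothetical; take the real bounds from the actuator datasheet, not from the simulator. This filter complements, and never replaces, the independent hardware e-stop.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ActionFilter:
    """Smooth and bound policy torques before they reach the motors."""
    torque_limit: float  # absolute bound per joint (N·m), from the datasheet
    max_delta: float     # max torque change per control step (N·m per step)
    alpha: float = 0.3   # low-pass coefficient: 1.0 means no smoothing
    prev: List[float] = field(default_factory=list)

    def __call__(self, raw: List[float]) -> List[float]:
        if not self.prev:
            self.prev = [0.0] * len(raw)  # start from zero torque
        out = []
        for r, p in zip(raw, self.prev):
            # 1) First-order low-pass (exponential smoothing) toward the raw command.
            smoothed = p + self.alpha * (r - p)
            # 2) Rate limit: bound how fast torque can change between steps.
            delta = max(-self.max_delta, min(self.max_delta, smoothed - p))
            # 3) Absolute clip to what the actuator can actually deliver.
            out.append(max(-self.torque_limit, min(self.torque_limit, p + delta)))
        self.prev = out
        return out


# A policy suddenly commands ±10 N·m; the filter ramps toward it and saturates at ±2.
f = ActionFilter(torque_limit=2.0, max_delta=0.5, alpha=0.5)
outputs = [f([10.0, -10.0]) for _ in range(5)]
```

Applying the same filter in sim during training keeps the policy from relying on torque jumps the hardware will never execute.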

Perception sim-to-real

For camera-based policies, add random backgrounds, lighting, camera pose jitter, image noise, and blur, plus (if affordable) real-to-sim domain adaptation networks. Pure DR on pixels needs a large GPU budget, so consider state-based policies (for example, inputs from joint encoders) for the first hardware iteration.
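Backgrounds, lighting rigs, and camera poses are usually randomized inside the renderer, but the image-level part of visual DR can also be applied to rendered frames. The sketch below is a pure-Python illustration on a small grayscale image (nested lists of values in [0, 1]) so it runs without dependencies; the function names and jitter magnitudes are hypothetical, and a real pipeline would use an image library or GPU augmentation.

```python
import random
from typing import List

Image = List[List[float]]  # grayscale, values in [0, 1]


def box_blur(img: Image) -> Image:
    """3x3 mean filter (edges average only in-bounds neighbours); crude blur proxy."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            vals = [img[y][x]
                    for y in range(max(0, i - 1), min(h, i + 2))
                    for x in range(max(0, j - 1), min(w, j + 2))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def augment(img: Image, rng: random.Random, brightness=0.2, contrast=0.2,
            noise_std=0.02, blur_prob=0.3) -> Image:
    """Random blur, brightness/contrast jitter, and per-pixel Gaussian noise."""
    if rng.random() < blur_prob:
        img = box_blur(img)
    b = rng.uniform(-brightness, brightness)     # additive lighting shift
    c = rng.uniform(1 - contrast, 1 + contrast)  # contrast scale around mid-grey
    out = []
    for row in img:
        new_row = []
        for px in row:
            v = (px - 0.5) * c + 0.5 + b + rng.gauss(0.0, noise_std)
            new_row.append(min(1.0, max(0.0, v)))  # keep a valid pixel range
        out.append(new_row)
    return out


rng = random.Random(1)
frame = [[0.0, 0.5, 1.0], [0.2, 0.4, 0.6]]
augmented = augment(frame, rng)
```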

Common mistakes

Mistake	Symptom	Fix
Zero DR, perfect sim	Instant real failure	Randomize physics
DR too aggressive	Sim never learns	Narrow ranges, then widen with a curriculum
No latency modeling	Oscillation on hardware	Add action delay in sim
Skipping safety limits	Hardware damage	Clip + rate limit
One sim seed	Lucky contact exploit	Multi-seed eval
No real logging	Cannot debug transfer	Log real runs with the same schema as sim
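For the "No latency modeling" row, one common approach is to hold each commanded action in a FIFO queue for a few control steps before the simulator applies it, and to randomize that delay per episode. A minimal sketch, assuming a step-based environment loop; the class name and the 0–3 step range are hypothetical, so measure the real command-to-actuation latency on your hardware and convert it to control steps.

```python
import random
from collections import deque


class ActionDelay:
    """FIFO buffer that applies each action `delay_steps` control steps late."""

    def __init__(self, default_action):
        self.default_action = default_action  # e.g. zero torque before any command arrives
        self.queue = deque()

    def reset(self, delay_steps: int):
        # Pre-fill so the first `delay_steps` outputs are the default action.
        self.queue = deque([self.default_action] * delay_steps)

    def step(self, action):
        # Newest command goes in; the command issued `delay_steps` ago comes out.
        self.queue.append(action)
        return self.queue.popleft()


# Fixed delay of 2 steps: the policy's commands reach the plant two steps late.
delay = ActionDelay(default_action=0.0)
delay.reset(delay_steps=2)
applied = [delay.step(a) for a in [1.0, 2.0, 3.0, 4.0]]  # → [0.0, 0.0, 1.0, 2.0]

# Randomized latency: hypothetical 0-3 step range, resampled at each episode reset.
rng = random.Random(0)
delay.reset(delay_steps=rng.randint(0, 3))
```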

Closing

Sim-to-real is engineering plus RL: randomize what you cannot measure, identify what you can, and deploy with safety guardrails. The robotics case studies in the next lesson show how these pieces combine in published systems; after that, you train SAC on Pendulum in sim as a lightweight continuous-control lab.
