Sim-to-real & domain randomization
Before we begin
Policies trained in simulation often fail on real robots — friction, latency, mass, and sensor noise differ. Sim-to-real transfer closes the gap with better simulation, domain randomization, system identification, and careful deployment pipelines. RL makes this harder because policies exploit simulator quirks that do not exist in reality.
Reality gap — performance drop when moving from sim to physical system.
Domain randomization — train on a distribution of sim parameters.
System ID — measure real dynamics to calibrate the simulator.
What you will learn
- List major sources of sim-to-real error (dynamics, perception, control).
- Apply domain randomization over physics and visuals.
- Contrast randomization, adaptation, and fine-tuning on real data.
- Design evaluation protocols before touching hardware.
- Connect to offline RL and safety (Module 9) for deployment.
Sources of the reality gap
| Category | Sim assumption | Real world |
|---|---|---|
| Dynamics | Fixed friction, mass | Wear, temperature, payload |
| Actuation | Instant torque | Motor delay, backlash, limits |
| Sensing | Clean state vector | Noise, bias, dropout, latency |
| Contact | Soft constraints | Stiction, bounce, deformation |
| Visual | Perfect textures | Lighting, dirt, motion blur |
RL policies overfit sim idiosyncrasies — a legged robot may learn twitchy motions that work only with MuJoCo's contact model.
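Two rows of the table, actuation delay and sensor noise, can be approximated in simulation with a thin wrapper. The sketch below is a minimal illustration, assuming a hypothetical environment that exposes `reset()` and `step(action)`, takes a scalar action, and returns observations as lists of floats. `DelayedNoisyEnv` and its parameters are illustrative names, not part of any particular simulator.

```python
import random
from collections import deque


class DelayedNoisyEnv:
    """Wraps a simulator so actions arrive late and observations are noisy.

    Assumed (hypothetical) interface of `env`:
      reset() -> obs, step(action) -> (obs, reward, done)
    with obs a list of floats and action a single float.
    """

    def __init__(self, env, delay_steps=2, noise_std=0.01, seed=None):
        self.env = env
        self.delay_steps = delay_steps
        self.noise_std = noise_std
        self.rng = random.Random(seed)
        self.pending = deque()

    def _noisy(self, obs):
        # Independent zero-mean Gaussian noise on every observation entry.
        return [x + self.rng.gauss(0.0, self.noise_std) for x in obs]

    def reset(self):
        # Pre-fill the queue with zero actions so the first commands are delayed too.
        self.pending = deque([0.0] * self.delay_steps)
        return self._noisy(self.env.reset())

    def step(self, action):
        self.pending.append(action)
        applied = self.pending.popleft()  # the action issued delay_steps ago
        obs, reward, done = self.env.step(applied)
        return self._noisy(obs), reward, done
```

Randomizing `delay_steps` and `noise_std` per episode, rather than fixing them, keeps the policy from tuning itself to one specific latency.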
Domain randomization (DR)
Randomize simulation parameters each episode so the policy sees a family of environments. The hope is that the real world falls inside the training distribution.
```python
import random


def reset_env_randomized(env, nominal_mass):
    # Resample physics and sensor parameters at the start of every episode.
    env.set_friction(random.uniform(0.5, 1.5))
    env.set_link_mass("base", random.uniform(0.8, 1.2) * nominal_mass)
    env.set_sensor_noise(std=random.uniform(0.0, 0.05))
    return env.reset()
```

| Parameter | Typical range strategy |
|---|---|
| Friction | ±30–50% of nominal |
| Mass / inertia | ±20% |
| Motor strength | ±15% |
| Sensor noise | Increase gradually until the sim policy degrades slightly |
| Push disturbances | random external forces |
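If the DR ranges are too narrow, the policy does not transfer. If they are too wide, the sim policy cannot learn. Look for the Pareto frontier between robustness and sim performance: the widest ranges at which training still converges.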
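One common way to search for workable ranges is to start narrow and widen as the policy copes. The following is a minimal sketch under stated assumptions: the nominal values, the 0.8 success target, and the names `DRRange`, `sample_params`, and `update_scale` are all hypothetical, and the relative widths are taken from the table above.

```python
import random
from dataclasses import dataclass


@dataclass
class DRRange:
    nominal: float
    max_rel_width: float  # e.g. 0.5 means up to +/-50% of nominal

    def sample(self, rng, scale):
        # scale in [0, 1] shrinks the range: 0 -> nominal only, 1 -> full width.
        half = self.max_rel_width * scale
        return self.nominal * rng.uniform(1.0 - half, 1.0 + half)


# Hypothetical nominal values; relative widths follow the table above.
RANGES = {
    "friction": DRRange(nominal=1.0, max_rel_width=0.5),
    "mass": DRRange(nominal=2.0, max_rel_width=0.2),
    "motor_strength": DRRange(nominal=1.0, max_rel_width=0.15),
}


def sample_params(rng, scale):
    """Draw one set of simulator parameters for the next episode."""
    return {name: r.sample(rng, scale) for name, r in RANGES.items()}


def update_scale(scale, success_rate, target=0.8, step=0.05):
    # Widen when the policy copes with the current ranges, narrow when it struggles.
    if success_rate >= target:
        return min(1.0, scale + step)
    return max(0.0, scale - step)
```

In a training loop, `update_scale` would be called after each evaluation round and `sample_params` at every episode reset.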
Worked example: grasping with DR
OpenAI's dexterous manipulation work randomized object mass, friction, and geometry. A policy trained only in simulation with sufficient DR transferred to the real robot without any real-world training data, but it required massive sim compute and careful reward design.
Checkpoint: Why not identify exact real parameters and use a single accurate sim?
Answer
System ID is hard for contact-rich tasks — small errors compound. Multiple real conditions (wear, payloads) need one policy anyway. DR trains robust policies; identification helps narrow randomization ranges rather than eliminating DR entirely.
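As an illustration of identification narrowing the ranges, repeated real measurements of one parameter can set the center and width of its DR interval. This is a minimal sketch, not a full system-ID procedure: the function name, the choice of `k` standard deviations, the minimum width, and the friction values are all hypothetical.

```python
import statistics


def dr_range_from_measurements(measurements, k=3.0, min_rel_width=0.05):
    """Turn repeated real-world estimates of one parameter into a DR interval.

    Returns (low, high) centred on the sample mean. The half-width is k sample
    standard deviations, but never less than min_rel_width * |mean|, so the
    range does not collapse to a point when the measurements happen to agree.
    """
    if len(measurements) < 2:
        raise ValueError("need at least two measurements to estimate spread")
    mean = statistics.mean(measurements)
    half = max(k * statistics.stdev(measurements), min_rel_width * abs(mean))
    return mean - half, mean + half


# Hypothetical friction coefficients estimated from five sliding tests.
friction_estimates = [0.62, 0.58, 0.65, 0.60, 0.61]
low, high = dr_range_from_measurements(friction_estimates)
```

The interval only reflects spread in the measurements taken; conditions that were never measured (wear, new payloads) still argue for some extra margin.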
Other transfer strategies
| Method | Idea | When |
|---|---|---|
| Domain adaptation | Align sim/real features | Visual policies |
| Real fine-tuning | Few real episodes with RL or BC | Budget for on-robot data |
| Offline RL on logs | Learn from historical robot data | Module 9 |
| Digital twin | Continuous calibration from sensors | Industrial setups |
| Conservative objectives | Penalize OOD actions (CQL, etc.) | Safety-critical |
Deployment checklist (before real robot)
- Sim baselines — seed variance, eval over 50+ episodes.
- Randomization sweep — policy still works at wide ranges?
- Action rate limits — low-pass filter torques; clip jerk.
- Emergency stop — hardware e-stop independent of policy.
- Sandbox — harness, reduced payload, limited workspace.
- Logging — states, actions, contacts at full rate for replay.
- Human oversight — first N episodes with kill switch.
Never skip hardware limits because the sim allowed impossible torques.
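The action-limit item in the checklist can be sketched as a small filter placed between the policy and the motor driver. This is a minimal single-joint illustration with hypothetical names and parameters. It bounds the per-step change in torque, a simpler stand-in for a true jerk limit, and it complements a hardware e-stop rather than replacing one.

```python
class ActionFilter:
    """Low-pass filters, rate-limits, and clips a scalar torque command."""

    def __init__(self, max_torque, max_delta, alpha):
        self.max_torque = max_torque  # hardware torque limit (absolute value)
        self.max_delta = max_delta    # largest allowed change per control step
        self.alpha = alpha            # smoothing factor in (0, 1]; 1 disables smoothing
        self.prev = 0.0

    def reset(self):
        self.prev = 0.0

    def __call__(self, raw):
        # 1. Exponential low-pass filter toward the requested command.
        target = self.alpha * raw + (1.0 - self.alpha) * self.prev
        # 2. Rate limit: bound the change relative to the last command sent.
        delta = max(-self.max_delta, min(self.max_delta, target - self.prev))
        # 3. Hard clip to the actuator limit, applied last so it always holds.
        out = max(-self.max_torque, min(self.max_torque, self.prev + delta))
        self.prev = out
        return out
```

Running the same filter inside the simulator during training means the policy learns with the limits it will face on hardware.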
Perception sim-to-real
For camera policies add: random backgrounds, lighting, camera pose jitter, noise, blur, and (if affordable) real-to-sim domain adaptation networks. Pure DR on pixels needs a large GPU budget, so consider state-based policies (joint encoders) for the first hardware iteration.
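A few of these visual randomizations can be sketched without any rendering stack. The example below is a dependency-free illustration with hypothetical ranges and names: it samples per-episode parameters and applies brightness and pixel noise to a grayscale image stored as rows of floats in [0, 1]. The camera jitter value is only sampled here, since it would be applied to the simulator's camera pose rather than to the pixels.

```python
import random


def sample_visual_params(rng):
    # Hypothetical ranges; tune them against your own camera and scene.
    return {
        "brightness_gain": rng.uniform(0.6, 1.4),
        "noise_std": rng.uniform(0.0, 0.05),
        "camera_yaw_jitter_deg": rng.uniform(-3.0, 3.0),
    }


def randomize_image(image, params, rng):
    """Apply brightness gain and Gaussian pixel noise to a grayscale image.

    `image` is a list of rows of floats in [0, 1]. Pure Python keeps the sketch
    dependency-free; a real pipeline would use an array library instead.
    """
    gain, std = params["brightness_gain"], params["noise_std"]
    return [
        [min(1.0, max(0.0, gain * px + rng.gauss(0.0, std))) for px in row]
        for row in image
    ]
```

Sampling the parameters once per episode and the pixel noise once per frame mirrors how lighting changes slowly while sensor noise changes every frame.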
Common mistakes
| Mistake | Symptom | Fix |
|---|---|---|
| Zero DR, perfect sim | Instant real failure | Randomize physics |
| DR too aggressive | Sim never learns | Start narrow, widen with a curriculum |
| No latency modeling | Oscillation on hardware | Add action delay in sim |
| Skipping safety limits | Hardware damage | Clip + rate limit |
| One sim seed | Lucky contact exploit | Multi-seed eval |
| No real logging | Cannot debug transfer | Mirror sim log schema |
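The multi-seed fix in the table can be sketched as a small aggregation helper. This is a minimal illustration: `run_episode(seed, episode_index)` is a hypothetical callable that builds the environment and policy for that seed and returns one episode return.

```python
import statistics


def evaluate(run_episode, seeds, episodes_per_seed):
    """Aggregate episode returns across several independent seeds.

    Reports the mean of per-seed means, the spread across seeds, and the
    worst seed, so one lucky seed cannot hide a fragile policy.
    """
    per_seed_means = []
    for seed in seeds:
        returns = [run_episode(seed, i) for i in range(episodes_per_seed)]
        per_seed_means.append(statistics.mean(returns))
    spread = statistics.stdev(per_seed_means) if len(per_seed_means) > 1 else 0.0
    return {
        "mean": statistics.mean(per_seed_means),
        "std_across_seeds": spread,
        "worst_seed": min(per_seed_means),
    }
```

A large gap between `mean` and `worst_seed` is a hint that some seeds rely on a contact exploit rather than a robust behavior.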
Closing
Sim-to-real is engineering plus RL: randomize what you cannot measure, identify what you can, and deploy with safety guardrails. The robotics case studies that follow show how these pieces were combined in published systems. After that, you train SAC in simulation on Pendulum as a lightweight continuous-control lab.