
Perception & closed-loop control

From images to motions (intro)

Hand–eye calibration, visual servoing at a high level, and how CV lessons connect to robot motion.

~60 min read + exercises


This lesson connects the vision track to robot motion: how images become errors that drive controllers, and what must be true for closed-loop behavior to be stable.

Figure: The closed perception–action loop. Capture image(s) → perceive task error → plan desired motion → actuate (velocity / torque); repeat. Latency hurts between each hop.
Sensing turns into motion, which changes what the sensors see next. Stability requires bounded latency and a well-calibrated error definition at every hop.

Learning objectives

  • Define hand–eye calibration and why it matters.
  • Contrast position-based vs image-based visual servoing at a high level.
  • List prerequisites (known models, controllability) for closing the loop safely.

Prerequisites

  • Camera projection lesson (ideal).
  • IK / Jacobian intuition (previous robotics lessons).

Step 1 — The perception–action loop

A minimal loop (sketched in code after the list):

  1. Capture image(s).
  2. Extract features or run a network to estimate a task error in some space.
  3. Map error to desired motion (velocity or torque command).
  4. Actuate; repeat.
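
A minimal sketch of this loop in Python. The callables capture_image, estimate_task_error, send_velocity, and should_stop are placeholders for whatever your camera, perception stack, and robot interface actually provide:

    import time

    CONTROL_RATE_HZ = 30   # assumed loop rate; must cover capture + inference time
    GAIN = 0.5             # proportional gain on the task error (tune per task)

    def control_loop(capture_image, estimate_task_error, send_velocity, should_stop):
        """Run a proportional perception-action loop until should_stop() is True."""
        period = 1.0 / CONTROL_RATE_HZ
        while not should_stop():
            t0 = time.monotonic()
            image = capture_image()                # 1. capture image(s)
            error = estimate_task_error(image)     # 2. perceive: task error in some space
            command = -GAIN * error                # 3. map error to a velocity command
            send_velocity(command)                 # 4. actuate
            # Sleep whatever is left of the period; overruns add latency,
            # which is exactly where stability degrades.
            time.sleep(max(0.0, period - (time.monotonic() - t0)))

The loop is only as good as its slowest stage: a slow perception step stretches the effective sample time and lowers the gain you can safely use.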

Checkpoint: Where does latency hurt stability the most?


Step 2 — Hand–eye: align what the camera sees with what the arm does

You need a consistent transform chain between:

  • the camera frame,
  • the robot base / wrist, and
  • the tool.

Eye-in-hand vs eye-to-hand setups change which transforms are time-varying.

Figure: Eye-in-hand vs eye-to-hand. Where the camera sits decides which transform becomes time-varying: eye-in-hand mounts the camera on the wrist, eye-to-hand fixes it in the workspace.
Wrist-mounted cameras move with the tool (great close-up resolution, but the base→camera transform changes with every arm motion); fixed cameras keep constant extrinsics but lose detail far from the lens.
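
A sketch of the bookkeeping with 4x4 homogeneous transforms in numpy, under the assumption that calibration has already given you the fixed extrinsic (wrist→camera for eye-in-hand, base→camera for eye-to-hand); frame and variable names are illustrative:

    import numpy as np

    def compose(*transforms):
        """Chain 4x4 homogeneous transforms left to right: T_a_c = T_a_b @ T_b_c."""
        out = np.eye(4)
        for T in transforms:
            out = out @ T
        return out

    # Eye-in-hand: wrist->camera is fixed by calibration, but base->camera
    # changes every time the arm moves (it rides on the forward kinematics).
    def camera_in_base_eye_in_hand(T_base_wrist, T_wrist_cam):
        return compose(T_base_wrist, T_wrist_cam)

    # Eye-to-hand: base->camera is fixed; the quantity that varies is the
    # tool pose expressed in the camera frame.
    def tool_in_camera_eye_to_hand(T_base_cam, T_base_wrist, T_wrist_tool):
        T_base_tool = compose(T_base_wrist, T_wrist_tool)
        return np.linalg.inv(T_base_cam) @ T_base_tool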

Exercise: In eye-in-hand, which transform changes as the wrist moves?


Step 3 — Position-based visual servoing (PBVS)

Pipeline intuition:

  • Estimate pose of target relative to camera (often via calibrated geometry + features).
  • Compute desired end-effector pose.
  • Use IK / Jacobian control to drive the arm.

Pros: can leverage classical geometry. Cons: pose estimation errors can destabilize control; may need good initialization.
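
A minimal PBVS step, assuming the current and desired end-effector poses are already expressed in the base frame (via the hand-eye chain above) as 4x4 matrices; the rotation error uses scipy's rotation-vector conversion and the gain is illustrative:

    import numpy as np
    from scipy.spatial.transform import Rotation

    LAMBDA = 0.8   # proportional gain on the pose error

    def pbvs_twist(T_base_ee, T_base_ee_desired):
        """Return a 6-vector twist [vx, vy, vz, wx, wy, wz] in the base frame
        that drives the end-effector pose toward the desired pose."""
        # Translation error: straight line toward the desired position.
        p_err = T_base_ee_desired[:3, 3] - T_base_ee[:3, 3]
        # Rotation error as a rotation vector (axis * angle) of R_des @ R_cur^T.
        R_err = T_base_ee_desired[:3, :3] @ T_base_ee[:3, :3].T
        w_err = Rotation.from_matrix(R_err).as_rotvec()
        return LAMBDA * np.concatenate([p_err, w_err])

    # The twist is then mapped to joint velocities with the arm Jacobian,
    # e.g. qdot = pinv(J(q)) @ twist, and rate-limited before being sent.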


Step 4 — Image-based visual servoing (IBVS)

Drive image feature errors directly toward desired image coordinates using the image Jacobian (interaction matrix) relating image-plane velocities to camera motion.

Pros: robust to some calibration errors in certain configurations. Cons: singularities and local minima exist; camera retreat motions need handling.
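
For a single point feature at normalized image coordinates (x, y) with depth Z, the classical interaction matrix and the standard control law v = -λ L⁺ e can be sketched as follows; note how Z appears in the translational columns even though no 3D model is ever built:

    import numpy as np

    def interaction_matrix(x, y, Z):
        """2x6 interaction matrix for one point feature at normalized image
        coordinates (x, y) and depth Z (classical IBVS formulation)."""
        return np.array([
            [-1.0 / Z, 0.0,      x / Z, x * y,       -(1.0 + x * x),  y],
            [0.0,     -1.0 / Z,  y / Z, 1.0 + y * y, -x * y,         -x],
        ])

    def ibvs_camera_twist(features, desired, depths, gain=0.5):
        """Stack per-feature interaction matrices and return the camera twist
        v = -gain * pinv(L) @ e, with features and desired as (N, 2) arrays."""
        L = np.vstack([interaction_matrix(x, y, Z)
                       for (x, y), Z in zip(features, depths)])
        e = (np.asarray(features) - np.asarray(desired)).reshape(-1)
        return -gain * np.linalg.pinv(L) @ e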

Checkpoint: Why is “depth” a recurring nuisance in IBVS even if you never build a full 3D model?


Step 5 — Safety and reality

Before running fast visual servoing on hardware, check (a minimal command guard sketch follows the list):

  • joint limits, self-collisions, and workspace bounds,
  • maximum velocities / torque limits,
  • behavior on feature loss (occlusion).
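
A minimal command guard that could sit between the servo law and the robot; the velocity limits and the feature-loss timeout below are placeholder numbers, and this does not replace joint-limit or collision checking:

    import time
    import numpy as np

    class CommandGuard:
        """Clamp twists and stop on prolonged feature loss (placeholder limits)."""

        def __init__(self, max_linear=0.10, max_angular=0.50, feature_timeout=0.2):
            self.max_linear = max_linear            # m/s
            self.max_angular = max_angular          # rad/s
            self.feature_timeout = feature_timeout  # s without valid features
            self._last_valid = time.monotonic()

        def filter(self, twist, features_valid):
            now = time.monotonic()
            if features_valid:
                self._last_valid = now
            if now - self._last_valid > self.feature_timeout:
                return np.zeros(6)                  # feature loss: stop, don't extrapolate
            out = np.asarray(twist, dtype=float).copy()
            out[:3] = np.clip(out[:3], -self.max_linear, self.max_linear)
            out[3:] = np.clip(out[3:], -self.max_angular, self.max_angular)
            return out

    # Usage: safe_twist = guard.filter(raw_twist, features_valid=tracker_ok)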

Exercise: Write a short “estop policy” checklist for demo day.


Check your understanding

  1. What problem does hand–eye calibration solve in one sentence?
  2. Why is a high frame rate camera not sufficient for stable servoing by itself?
  3. Name one failure mode unique to IBVS compared to PBVS.

Lab-style stretch goal (optional)

In simulation, track a colored blob centroid in the image and command differential drive velocities proportional to centroid error — observe oscillation as gains increase.
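
One way to sketch it with OpenCV-style color thresholding; the HSV range and gains are assumptions to adapt, and the returned (linear, angular) velocities are left for you to map to wheel speeds in your simulator:

    import cv2
    import numpy as np

    K_TURN = 0.004    # rad/s per pixel of horizontal error (raise it and watch it oscillate)
    FORWARD = 0.15    # constant forward speed, m/s
    HSV_LO, HSV_HI = (100, 120, 70), (130, 255, 255)   # assumed blue-ish blob

    def blob_centroid(bgr_image):
        """Return the (u, v) centroid of the thresholded blob, or None if absent."""
        hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, HSV_LO, HSV_HI)
        m = cv2.moments(mask)
        if m["m00"] < 1e-3:
            return None
        return m["m10"] / m["m00"], m["m01"] / m["m00"]

    def drive_command(bgr_image):
        """Proportional steering: turn toward the blob, stop if it is not visible."""
        c = blob_centroid(bgr_image)
        if c is None:
            return 0.0, 0.0                        # (linear, angular) velocities
        u_err = c[0] - bgr_image.shape[1] / 2.0    # pixels right of image center
        return FORWARD, -K_TURN * u_err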