
Module 7 — CV production & deployment

Project: deploy a vision inference API

Export a classifier to ONNX, serve it with FastAPI + ONNX Runtime, and add health checks, a batch endpoint, and a simple latency dashboard.

~300 min read + exercises


Before we begin

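Export your Module 3 classifier to ONNX and serve it with FastAPI + ONNX Runtime. Add a health check, a batch endpoint, and simple latency logging. Export with a dynamic batch dimension so the same model file accepts both single images and batches; if you export from PyTorch, torch.onnx.export's dynamic_axes argument (for example dynamic_axes={"input": {0: "batch"}} together with input_names=["input"]) does this.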

Time: ~5 hours.


Goals

  1. POST /predict: classify one uploaded image and return the prediction as JSON.
  2. POST /predict_batch: classify up to N images in one request.
  3. GET /health: report the app version and whether the model is loaded.
  4. Log p50/p95 latency over the last 100 requests (in-memory ring buffer).
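To make goal 1 concrete, here is a minimal sketch of turning one row of the model's raw logits into a /predict response. The labels list, the top_k parameter, and the field names class_id, label, top_k and score are hypothetical choices, not part of the spec above; it also assumes the exported model outputs unnormalized logits, as the batch sketch's naming suggests.

```python
import numpy as np


def to_prediction(logits: np.ndarray, labels: list[str], top_k: int = 3) -> dict:
    """Convert one row of raw logits into a JSON-friendly /predict payload (hypothetical schema)."""
    z = logits - logits.max()  # shift by the max so exp() cannot overflow
    probs = np.exp(z) / np.exp(z).sum()  # softmax over classes
    order = np.argsort(probs)[::-1][:top_k]  # class indices, highest probability first
    best = int(order[0])
    return {
        "class_id": best,  # plain int: numpy integer types are not JSON-serializable
        "label": labels[best],
        "top_k": [{"label": labels[int(i)], "score": float(probs[i])} for i in order],
    }
```

Casting to plain int and float keeps the response serializable regardless of the numpy dtype the session returns.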

Project structure

```text
vision-api/
  app.py
  model/classifier.onnx
  requirements.txt
```

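requirements.txt: fastapi uvicorn onnxruntime pillow numpy python-multipart (FastAPI needs python-multipart to parse the multipart form uploads behind File(...) and UploadFile).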


Batch endpoint sketch

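```python
import io
import numpy as np
from fastapi import File, HTTPException, UploadFile
from PIL import Image

# Assumes app = FastAPI(), sess = onnxruntime.InferenceSession(...) and preprocess() exist earlier in app.py.
@app.post("/predict_batch")
async def predict_batch(files: list[UploadFile] = File(...)):
    if len(files) > 16:  # reject oversized requests instead of silently dropping files[16:]
        raise HTTPException(status_code=413, detail="at most 16 images per request")
    tensors = []
    for f in files:
        img = Image.open(io.BytesIO(await f.read())).convert("RGB")  # RGBA/grayscale -> 3 channels
        tensors.append(preprocess(img))  # each tensor: (1, C, H, W) float32
    batch = np.concatenate(tensors, axis=0)  # (B, C, H, W); needs a dynamic batch axis in the model
    input_name = sess.get_inputs()[0].name  # read the real input name instead of assuming "input"
    logits = sess.run(None, {input_name: batch})[0]
    return {"predictions": logits.argmax(axis=1).tolist()}
```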
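The batch sketch calls preprocess() without defining it. One possible version is below, assuming a plain 224×224 resize, ImageNet mean/std normalization, and NCHW layout. Those constants are placeholders: replace them with whatever transforms your Module 3 classifier was trained with (for example resize plus center crop), or accuracy will likely drop.

```python
import numpy as np
from PIL import Image

# Assumptions (match these to your Module 3 training transforms):
IMAGE_SIZE = 224  # square input resolution the model was exported with
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)  # ImageNet RGB mean
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)  # ImageNet RGB std


def preprocess(img: Image.Image) -> np.ndarray:
    """PIL image -> contiguous (1, 3, H, W) float32 array for sess.run()."""
    img = img.convert("RGB").resize((IMAGE_SIZE, IMAGE_SIZE), Image.BILINEAR)
    x = np.asarray(img, dtype=np.float32) / 255.0  # (H, W, 3), values in [0, 1]
    x = (x - MEAN) / STD  # per-channel normalization, broadcast over H and W
    x = x.transpose(2, 0, 1)[np.newaxis]  # HWC -> CHW, then add batch axis -> (1, 3, H, W)
    return np.ascontiguousarray(x)  # transpose returns a strided view; hand ORT a dense buffer
```

Because every call returns a leading batch axis of 1, np.concatenate(tensors, axis=0) in the batch endpoint yields a (B, 3, H, W) array directly.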

Latency dashboard (minimal)

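For each request, append (timestamp_ms, elapsed_ms) to a collections.deque(maxlen=100); once the deque is full, every append evicts the oldest entry, so it behaves as a ring buffer.
GET /metrics returns {"p50": ..., "p95": ..., "count": ...}, computed over the buffered elapsed_ms values.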
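A minimal ring-buffer tracker along these lines could back GET /metrics. It uses the nearest-rank percentile definition, which always returns an observed sample; numpy.percentile interpolates by default and would give slightly different values. The names LatencyTracker and nearest_rank are illustrative.

```python
import math
import time
from collections import deque


def nearest_rank(sorted_values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    k = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[k - 1]


class LatencyTracker:
    """In-memory ring buffer of the last `maxlen` requests."""

    def __init__(self, maxlen: int = 100) -> None:
        # Once full, each append silently drops the oldest (timestamp_ms, elapsed_ms) pair.
        self.samples = deque(maxlen=maxlen)

    def record(self, elapsed_ms: float) -> None:
        self.samples.append((time.time() * 1000.0, elapsed_ms))

    def summary(self) -> dict:
        """Payload for GET /metrics."""
        snapshot = list(self.samples)  # copy before iterating so concurrent appends can't interfere
        values = sorted(elapsed for _, elapsed in snapshot)
        if not values:
            return {"p50": None, "p95": None, "count": 0}
        return {"p50": nearest_rank(values, 50), "p95": nearest_rank(values, 95), "count": len(values)}
```

In app.py you could create one tracker at module level, call tracker.record() from an @app.middleware("http") function that wraps await call_next(request) with time.perf_counter(), and return tracker.summary() from GET /metrics.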


Deliverables

  1. README with run instructions (uvicorn app:app --reload).
  2. Screenshot or curl example for /predict.
  3. Table: latency for the same 8 images sent as 8 single requests vs. one batch request.
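For the latency table, a small harness like the one below times the model in-process, so HTTP upload and image decoding are excluded; for end-to-end numbers, time requests against the running server instead. compare_single_vs_batch and its run callable are hypothetical names; run is assumed to take an (N, C, H, W) float32 array and return logits.

```python
import statistics
import time
from typing import Callable

import numpy as np


def compare_single_vs_batch(
    run: Callable[[np.ndarray], np.ndarray], images: np.ndarray, repeats: int = 20
) -> dict:
    """Median wall-clock ms: len(images) batch-of-1 calls vs one batched call."""
    run(images[:1])  # warm-up calls, excluded from timing (first-call setup costs)
    run(images)
    single_ms, batch_ms = [], []
    for _ in range(repeats):
        t0 = time.perf_counter()
        for i in range(len(images)):
            run(images[i : i + 1])  # slice keeps the batch axis: (1, C, H, W)
        single_ms.append((time.perf_counter() - t0) * 1000.0)
        t0 = time.perf_counter()
        run(images)  # all images in one (N, C, H, W) call
        batch_ms.append((time.perf_counter() - t0) * 1000.0)
    return {
        "n_images": len(images),
        "single_ms": statistics.median(single_ms),
        "batch_ms": statistics.median(batch_ms),
    }
```

With ONNX Runtime, run could be lambda x: sess.run(None, {sess.get_inputs()[0].name: x})[0], and images the np.concatenate of 8 preprocessed tensors. Batching only helps if the exported model has a dynamic batch dimension; with a fixed batch of 1, ONNX Runtime rejects the (8, C, H, W) input.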

Extension

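  • Dockerize the app with a Dockerfile.
  • Add a Prometheus /metrics exporter (Prometheus conventionally scrapes /metrics, so move the JSON summary endpoint to another path).
  • Quantize the ONNX model to INT8 and compare its latency with the FP32 model.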

Course complete

You finished Computer Vision Foundations — imaging through production. Revisit weak modules via quiz links, or explore the AI track for transformers and GenAI depth.