Project: deploy a vision inference API

Before we begin

Export your Module 3 classifier to ONNX and serve it with FastAPI and ONNX Runtime. Add a health check, a batch endpoint, and simple latency logging.

Time: ~5 hours.

Goals

POST /predict: one image in, one JSON prediction out.
POST /predict_batch: up to N images per request (the sketch below uses N = 16).
GET /health: reports the API version and whether the model is loaded.
Log p50/p95 latency over the last 100 requests (in-memory ring buffer).

Project structure

text

vision-api/
  app.py
  model/classifier.onnx
  requirements.txt

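requirements.txt: fastapi uvicorn onnxruntime pillow numpy python-multipart (FastAPI needs python-multipart to parse the file uploads used by both predict endpoints)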

Batch endpoint sketch

python

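import io
import numpy as np
from fastapi import File, UploadFile
from PIL import Image

MAX_BATCH = 16  # the N from the goals; files beyond this are ignored

# Assumes app (FastAPI), sess (onnxruntime.InferenceSession), and preprocess() exist in app.py.
@app.post("/predict_batch")
async def predict_batch(files: list[UploadFile] = File(...)):
    tensors = []
    for f in files[:MAX_BATCH]:
        img = Image.open(io.BytesIO(await f.read())).convert("RGB")  # handle grayscale/RGBA uploads
        tensors.append(preprocess(img))  # each tensor must have shape (1, C, H, W)
    batch = np.concatenate(tensors, axis=0)  # stacked to (B, C, H, W)
    # "input" must match the input name chosen at export; the model needs a dynamic batch axis.
    logits = sess.run(None, {"input": batch})[0]
    return {"predictions": logits.argmax(axis=1).tolist()}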
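The batch sketch above calls preprocess without defining it. A minimal version might look like the following. The 224x224 input size, the ImageNet mean and standard deviation, and the NCHW layout are all assumptions here; replace them with whatever your Module 3 training pipeline actually used, since a mismatch silently degrades accuracy.

```python
import numpy as np

# Assumed values: replace with the size and statistics used during training.
INPUT_SIZE = 224
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)  # ImageNet mean (assumption)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)   # ImageNet std (assumption)


def preprocess(img):
    """Turn a PIL image into a float32 array of shape (1, 3, H, W)."""
    img = img.convert("RGB").resize((INPUT_SIZE, INPUT_SIZE))
    arr = np.asarray(img).astype(np.float32) / 255.0   # (H, W, 3), values in [0, 1]
    arr = (arr - MEAN) / STD                            # per-channel normalization
    arr = arr.transpose(2, 0, 1)                        # HWC -> CHW
    return np.ascontiguousarray(arr[np.newaxis, ...])   # add the batch axis
```

Because every output has a leading batch axis of 1, np.concatenate(tensors, axis=0) in the batch endpoint yields one (B, 3, H, W) array.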

Latency dashboard (minimal)

Store (timestamp_ms, elapsed_ms) in collections.deque(maxlen=100).
GET /metrics returns {"p50": ..., "p95": ..., "count": ...}.
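One way to sketch the ring buffer and the percentile summary with only the standard library is shown below. The names LATENCIES, record_latency, percentile, and latency_summary are hypothetical, and the nearest-rank percentile is one reasonable definition among several, not the only valid one.

```python
import math
import time
from collections import deque

# Ring buffer of (timestamp_ms, elapsed_ms); the oldest sample drops off after 100.
LATENCIES = deque(maxlen=100)


def record_latency(elapsed_ms, now_ms=None):
    """Append one sample; now_ms can be injected to make testing easy."""
    if now_ms is None:
        now_ms = time.time() * 1000.0
    LATENCIES.append((now_ms, elapsed_ms))


def percentile(sorted_values, q):
    """Nearest-rank percentile for q in (0, 100] on an already sorted list."""
    rank = max(1, math.ceil(q * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def latency_summary():
    """Response body for GET /metrics: p50, p95, and the sample count."""
    values = sorted(elapsed for _, elapsed in LATENCIES)
    if not values:
        return {"p50": None, "p95": None, "count": 0}
    return {
        "p50": percentile(values, 50),
        "p95": percentile(values, 95),
        "count": len(values),
    }
```

In app.py you would time each request with time.perf_counter(), pass the elapsed milliseconds to record_latency, and return latency_summary() from the /metrics handler.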

Deliverables

README with uvicorn app:app --reload instructions.
Screenshot or curl example for /predict.
Table: single vs batch latency on 8 images.
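For the latency table, a small timing harness along these lines may help. It is a sketch, not a rigorous benchmark: single_call and batch_call are hypothetical callables that you supply (for example, thin wrappers around an HTTP client that post one image to /predict or all eight to /predict_batch), and the median over a few repeats is an assumed choice of summary statistic.

```python
import statistics
import time


def time_ms(fn, repeats=20):
    """Median wall-clock time of fn() in milliseconds over `repeats` runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def latency_table(single_call, batch_call, n_images=8, repeats=20):
    """Return text rows comparing n single requests with one batch request.

    single_call() should send one image to /predict.
    batch_call() should send all n_images to /predict_batch in one request.
    """
    single_total = time_ms(lambda: [single_call() for _ in range(n_images)], repeats)
    batch_total = time_ms(batch_call, repeats)
    rows = [f"{'mode':<8}{'total ms':>12}{'ms/image':>12}"]
    for name, total in (("single", single_total), ("batch", batch_total)):
        rows.append(f"{name:<8}{total:>12.2f}{total / n_images:>12.2f}")
    return rows
```

Send one warm-up request to each endpoint before timing so that model loading and first-call overhead do not skew the numbers, then print the result with print("\n".join(latency_table(single_call, batch_call))).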

Extension

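Dockerize the service with a Dockerfile.
Add a Prometheus exporter. Prometheus scrapes /metrics by default, so move the JSON latency summary to a different path if you add one.
Quantize the ONNX model to INT8 and compare its latency with the FP32 model.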

Course complete

You have finished Computer Vision Foundations, from imaging through production. Revisit weak modules via the quiz links, or explore the AI track for more depth on transformers and GenAI.