Project: deploy a vision inference API
Before we begin
Export your Module 3 classifier to ONNX and serve it with FastAPI + ONNX Runtime. Add a health check, a batch endpoint, and simple latency logging.
Time: ~5 hours.
Goals
- POST /predict: single image, JSON response.
- POST /predict_batch: up to N images.
- GET /health: version + model loaded.
- Log p50/p95 latency over last 100 requests (in-memory ring buffer).
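As a rough illustration of the health check goal, the response body could be built by a small helper like the one below. This is only a sketch: `APP_VERSION`, `health_payload`, and the `status` field are hypothetical names chosen for the example, and the FastAPI wiring is shown in comments because it depends on how your `app.py` creates `app` and `sess`.

```python
APP_VERSION = "0.1.0"  # hypothetical version string; set it however your project tracks versions


def health_payload(session, version=APP_VERSION):
    """Build the GET /health body: the version plus whether a model session exists."""
    loaded = session is not None
    return {
        "status": "ok" if loaded else "loading",
        "version": version,
        "model_loaded": loaded,
    }


# Possible wiring in app.py (assumes `app` is the FastAPI instance and `sess`
# is the ONNX Runtime session, or None if loading failed):
#
# @app.get("/health")
# def health():
#     return health_payload(sess)
```

Keeping the payload in a plain function makes it easy to unit test without starting the server.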
Project structure
```text
vision-api/
  app.py
  model/classifier.onnx
  requirements.txt
```
requirements.txt: fastapi uvicorn onnxruntime pillow numpy python-multipart (FastAPI needs python-multipart to parse uploaded files)
Batch endpoint sketch
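The endpoint in this section calls a `preprocess` helper that is not defined on this page. One possible version is sketched here, mainly to show the output shape the batch code relies on: each image must become a `(1, C, H, W)` array so that concatenating along axis 0 yields a `(B, C, H, W)` batch. The 224x224 input size and the ImageNet mean/std are assumptions; substitute whatever your Module 3 training pipeline used. `to_tensor` is a hypothetical helper name.

```python
import numpy as np

# Assumed values, not taken from the course: replace with your training settings.
INPUT_SIZE = (224, 224)
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def to_tensor(pixels):
    """Convert an (H, W, 3) uint8 array into a normalized (1, 3, H, W) float32 array."""
    x = pixels.astype(np.float32) / 255.0
    x = (x - MEAN) / STD            # broadcasts over the last (channel) axis
    x = np.transpose(x, (2, 0, 1))  # HWC -> CHW
    return np.ascontiguousarray(x[np.newaxis, ...])  # add the batch axis


def preprocess(img):
    """img is expected to be a PIL image; returns a (1, 3, H, W) float32 array."""
    rgb = img.convert("RGB").resize(INPUT_SIZE)
    return to_tensor(np.asarray(rgb))
```

Converting to RGB first means grayscale or RGBA uploads still produce three channels, which is what this sketch assumes the classifier expects.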
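```python
import io
import numpy as np
from fastapi import File, UploadFile
from PIL import Image

# Assumes app, sess, and preprocess are defined earlier in app.py, that
# preprocess returns a (1, C, H, W) array, and that the model was exported
# with a dynamic batch axis.
@app.post("/predict_batch")
async def predict_batch(files: list[UploadFile] = File(...)):
    tensors = []
    for f in files[:16]:  # cap the batch at 16 images; extras are ignored
        img = Image.open(io.BytesIO(await f.read()))
        tensors.append(preprocess(img))
    batch = np.concatenate(tensors, axis=0)  # (B, C, H, W)
    logits = sess.run(None, {"input": batch})[0]  # "input" must match the model's input name
    return {"predictions": logits.argmax(axis=1).tolist()}
```
Latency dashboard (minimal)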
Store (timestamp_ms, elapsed_ms) in collections.deque(maxlen=100).
GET /metrics returns {"p50": ..., "p95": ..., "count": ...}.
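The ring buffer and percentile summary described above might look like the following. It is a minimal sketch using only the standard library; `LATENCIES`, `record_latency`, `nearest_rank`, and `metrics_snapshot` are hypothetical names, and the nearest-rank percentile definition is one choice among several.

```python
import time
from collections import deque

# Ring buffer of (timestamp_ms, elapsed_ms); once full, the oldest entry is dropped.
LATENCIES = deque(maxlen=100)


def record_latency(elapsed_ms, now_ms=None):
    """Append one request's latency; now_ms defaults to the current wall clock."""
    if now_ms is None:
        now_ms = time.time() * 1000.0
    LATENCIES.append((now_ms, elapsed_ms))


def nearest_rank(sorted_values, q):
    """Nearest-rank percentile for an integer q in 1..100."""
    rank = max(1, (q * len(sorted_values) + 99) // 100)  # integer ceil(q * n / 100)
    return sorted_values[rank - 1]


def metrics_snapshot():
    """Build the body for GET /metrics from the current buffer contents."""
    values = sorted(elapsed for _, elapsed in LATENCIES)
    if not values:
        return {"p50": None, "p95": None, "count": 0}
    return {
        "p50": nearest_rank(values, 50),
        "p95": nearest_rank(values, 95),
        "count": len(values),
    }
```

One way to feed the buffer is an HTTP middleware registered with `@app.middleware("http")` that reads `time.perf_counter()` before and after `call_next(request)` and passes the difference, in milliseconds, to `record_latency`. The `/metrics` handler can then return `metrics_snapshot()` directly.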
Deliverables
- README with uvicorn app:app --reload instructions.
- Screenshot or curl example for /predict.
- Table: single vs batch latency on 8 images.
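For the latency table, a small timing harness along these lines may help. It is a sketch under assumptions: `single_call` and `batch_call` are placeholders for whatever you use to hit `/predict` once and `/predict_batch` with all 8 images (an HTTP client call or a direct function call), and the single warm-up call is there because the first request is often slower than later ones.

```python
import statistics
import time


def median_ms(fn, repeats=20):
    """Median wall-clock time of fn() in milliseconds, after one untimed warm-up call."""
    fn()  # warm-up, not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def compare_latency(single_call, batch_call, n_images=8, repeats=20):
    """single_call() handles one image; batch_call() handles all n_images in one request."""
    def sequential():
        for _ in range(n_images):
            single_call()

    return [
        (f"{n_images} x single", median_ms(sequential, repeats)),
        (f"1 x batch of {n_images}", median_ms(batch_call, repeats)),
    ]


def format_table(rows):
    """Render (mode, median_ms) rows as a plain-text table for the README."""
    lines = [f"{'mode':<20}{'median_ms':>12}"]
    for name, ms in rows:
        lines.append(f"{name:<20}{ms:>12.2f}")
    return "\n".join(lines)
```

Calling `print(format_table(compare_latency(single_call, batch_call)))` produces a two-row table you can paste into the README. Reporting the median rather than the mean keeps one slow outlier from dominating the comparison.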
Extension
- Dockerize with a Dockerfile.
- Add a Prometheus /metrics exporter.
- INT8 quantized ONNX: compare latency.
Course complete
You finished Computer Vision Foundations — imaging through production. Revisit weak modules via quiz links, or explore the AI track for transformers and GenAI depth.