AI Engineering
How this portfolio works — RAG pipeline, tech choices, and AWS architecture.
RAG Pipeline
Why Each Tech
Claude (Anthropic): Best-in-class instruction following and multilingual (EN/ZH) capability. Streaming via the Anthropic SDK integrates cleanly with FastAPI SSE.
ChromaDB: Lightweight, embeddable vector DB that needs no separate service. Persisted on AWS EFS so the index survives container restarts without re-embedding on every deploy.
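Wiring the vector store to the EFS mount amounts to one configuration choice: a persistent client pointed at the mounted path. A minimal sketch, assuming chromadb 0.4+ and a hypothetical mount point `/mnt/efs/chroma` (embeddings are passed explicitly here to keep the example self-contained):

```python
import chromadb

# PersistentClient writes the index to disk; pointing it at the EFS mount
# (path below is hypothetical) keeps embeddings across container restarts,
# so redeploys don't trigger re-embedding.
client = chromadb.PersistentClient(path="/mnt/efs/chroma")

collection = client.get_or_create_collection("portfolio")
collection.add(
    ids=["resume-1"],
    documents=["Built a RAG chatbot on AWS Fargate."],
    embeddings=[[0.8, 0.2, 0.1]],  # normally produced by the embedding model
)
results = collection.query(query_embeddings=[[0.9, 0.1, 0.0]], n_results=1)
```

On an ephemeral Fargate task the same call against local disk would lose the index on every restart; EFS is what makes the persistence claim above hold.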
Embedding model: Fast, small (80MB), and runs CPU-only in Fargate. Good semantic similarity for resume-style factual retrieval without needing a paid embedding API.
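The retrieval step behind this is just nearest-neighbor search over embedding vectors, typically by cosine similarity. A toy sketch with hypothetical 3-d vectors standing in for real sentence embeddings:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical embeddings; a real model produces ~384-d vectors.
query = [0.9, 0.1, 0.0]
chunks = {
    "worked at Acme as a backend engineer": [0.8, 0.2, 0.1],
    "enjoys hiking and photography": [0.1, 0.9, 0.3],
}
# Rank stored chunks by similarity to the query embedding.
best = max(chunks, key=lambda text: cosine(query, chunks[text]))
```

The vector DB performs this ranking internally; the point is that retrieval quality depends entirely on how well the embedding model places related sentences near each other.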
Orchestration framework: Handles chunking, retrieval-chain wiring, and chat-history management. Swappable components make it easy to upgrade models or vector stores without rewriting retrieval logic.
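The chunking step can be sketched as a sliding window with overlap, so adjacent chunks share context across their boundary. Sizes below are illustrative; a production splitter also respects sentence and paragraph boundaries:

```python
def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    # Slide a fixed-size window across the text, stepping by (size - overlap)
    # so each chunk repeats the last `overlap` characters of the previous one.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

pieces = chunk("".join(chr(65 + i % 26) for i in range(100)))
```

The overlap matters for retrieval: a fact that straddles a chunk boundary still appears intact in at least one chunk.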
FastAPI: Async-native Python with minimal overhead. Server-Sent Events keep the connection simple (no WebSocket handshake) and stream tokens to the browser as they arrive.
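The SSE wire format itself is trivial: each event is a "data:" line followed by a blank line. A minimal sketch of wrapping a token stream into SSE frames, assuming tokens contain no newlines; the hardcoded list stands in for the model's stream, and the "[DONE]" sentinel is a common convention, not part of the SSE spec:

```python
from typing import Iterable, Iterator

def sse_frames(tokens: Iterable[str]) -> Iterator[str]:
    # Each Server-Sent Event is a "data:" line plus a blank line; the
    # browser's EventSource reassembles these into message events.
    for token in tokens:
        yield f"data: {token}\n\n"
    yield "data: [DONE]\n\n"  # sentinel so the client knows to close

# In FastAPI this generator would back a StreamingResponse with
# media_type="text/event-stream"; a hardcoded list stands in for
# the model's token stream here.
frames = list(sse_frames(["Hel", "lo"]))
```

Because each frame is flushed as it is produced, the browser renders tokens incrementally instead of waiting for the full completion.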
AWS Fargate: Serverless containers with no EC2 to manage. Scales to zero when idle and spins up in seconds. Paired with EFS for persistent ChromaDB storage across deployments.