Designing an LLM Serving Architecture: Batching, Caching & Autoscaling