KServe Deep Dive: Evolving Model Serving for the Generative AI Era
Abstract:
As generative AI reshapes the industry, the demand for scalable, efficient, and interoperable model serving infrastructure is growing rapidly. This session explores the evolution from early, custom-built deployment patterns to today’s Kubernetes-native model serving solutions. We’ll unpack the unique challenges of serving large language models (LLMs), including inference efficiency, distributed execution, KV-cache management, and cost optimization.
We’re excited to announce the release of KServe v0.17, a significant milestone that brings native support for generative AI workloads. The release introduces a purpose-built LLMInferenceService CRD designed for advanced LLM-serving patterns such as disaggregated serving and enhanced model and KV caching, along with seamless integration with Envoy AI Gateway.
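For a flavor of the new resource, a minimal LLMInferenceService manifest might look like the sketch below. The field names follow the v1alpha1 API as we understand it and should be read as illustrative assumptions rather than the authoritative v0.17 schema; the model name is a placeholder.

    # Illustrative sketch only: field names assume the v1alpha1
    # LLMInferenceService API and may differ in the final v0.17 schema.
    apiVersion: serving.kserve.io/v1alpha1
    kind: LLMInferenceService
    metadata:
      name: llama-3-8b
    spec:
      model:
        uri: hf://meta-llama/Llama-3.1-8B-Instruct   # placeholder Hugging Face model
        name: meta-llama/Llama-3.1-8B-Instruct
      replicas: 2          # decode workers
      prefill:
        replicas: 1        # separate prefill pool, i.e. disaggregated serving
      router:
        gateway: {}        # expose the service through an Envoy AI Gateway route
        scheduler: {}      # scheduler for cache-aware request routing (assumed)

The prefill and router stanzas map to the patterns named above: splitting prefill and decode into separate pools, and routing traffic through a gateway whose scheduler can take cache state into account.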
Attendees will walk away with a clear understanding of the technologies driving the next generation of AI applications, and of how to architect infrastructure that is ready to meet the demands of generative AI at scale.