KServe Deep Dive: Evolving Model Serving for the Generative AI Era
Abstract:
As generative AI reshapes the industry, the demand for scalable, efficient, and interoperable model serving infrastructure is growing rapidly. This session explores the evolution from early, custom-built deployment patterns to today’s Kubernetes-native model serving solutions. We’ll unpack the unique challenges of serving large language models (LLMs), including inference efficiency, distributed execution, KV-cache management, and cost optimization.
We’re excited to announce the release of KServe v0.17, a significant milestone that brings native support for generative AI workloads. The release introduces a purpose-built LLMInferenceService CRD designed for advanced LLM-serving patterns such as disaggregated serving and enhanced model and KV caching, along with seamless integration with Envoy AI Gateway.
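For a flavor of the new resource, a minimal LLMInferenceService manifest might look like the sketch below. The field names follow the v1alpha1 API as we understand it and should be read as illustrative assumptions rather than the authoritative v0.17 schema; the model name is a placeholder.

    # Illustrative sketch only: field names assume the v1alpha1
    # LLMInferenceService API and may differ in the final v0.17 schema.
    apiVersion: serving.kserve.io/v1alpha1
    kind: LLMInferenceService
    metadata:
      name: llama-3-8b
    spec:
      model:
        uri: hf://meta-llama/Llama-3.1-8B-Instruct   # placeholder Hugging Face model
        name: meta-llama/Llama-3.1-8B-Instruct
      replicas: 2          # decode workers
      prefill:
        replicas: 1        # separate prefill pool, i.e. disaggregated serving
      router:
        gateway: {}        # expose the service through an Envoy AI Gateway route
        scheduler: {}      # scheduler for cache-aware request routing (assumed)

The prefill and router stanzas map to the patterns named above: splitting prefill and decode into separate pools, and routing traffic through a gateway whose scheduler can take cache state into account.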
Attendees will walk away with a clear understanding of the technologies driving the next generation of AI applications, and of how to architect infrastructure that is ready to meet the demands of generative AI at scale.