public-talks

Slides, videos, and supporting files for my public talks


KubeCon North America 2025 and Co-located Events in Atlanta

[Keynote] Anchoring Trust in the Age of AI: Identities Across Humans, Machines, and Models

Abstract:

Every revolution in computing has been defined by trust. Firewalls secured the internet, while API keys and IAM roles helped shape the cloud. Today, in the Kubernetes era of ephemeral workloads and Agentic AI, trust is no longer just about people—human and machine identities now stand on equal footing. The challenge is ensuring auditability: knowing who or what is calling, and being able to trace every interaction.

This keynote shows how projects like Kubernetes can anchor a new trust fabric. With SPIFFE and SPIRE providing cryptographic workload identities, and Keycloak enabling another layer of identity and access control, we establish an auditable chain of trust. Paired with KServe, this fabric extends into AI serving so that every model, explainer, and pipeline step runs with verifiable identity. Together, they make Kubernetes a secure, accountable platform for the age of AI.
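The "auditable chain of trust" idea above, where every interaction carries a verifiable workload identity, can be illustrated in miniature. The following is a hypothetical toy sketch, not SPIRE or KServe code: it uses symmetric HMAC keys in place of SPIFFE's X.509 SVIDs, and the registry and identity names are invented for illustration.

```python
import hashlib
import hmac

# Hypothetical identity registry: workload identity -> signing key.
# In a real deployment, SPIRE would issue short-lived cryptographic
# identities instead of static shared keys.
IDENTITY_KEYS = {
    "spiffe://example.org/model-server": b"key-issued-to-model-server",
    "spiffe://example.org/explainer": b"key-issued-to-explainer",
}

def record_call(identity: str, action: str) -> dict:
    """Create an audit-log entry tagged with the caller's identity key."""
    key = IDENTITY_KEYS[identity]
    tag = hmac.new(key, f"{identity}|{action}".encode(), hashlib.sha256).hexdigest()
    return {"identity": identity, "action": action, "tag": tag}

def verify_entry(entry: dict) -> bool:
    """Auditor side: re-derive the tag to confirm who or what made the call."""
    key = IDENTITY_KEYS[entry["identity"]]
    expected = hmac.new(
        key,
        f"{entry['identity']}|{entry['action']}".encode(),
        hashlib.sha256,
    ).hexdigest()
    return hmac.compare_digest(expected, entry["tag"])

entry = record_call("spiffe://example.org/model-server", "predict:v1")
print(verify_entry(entry))  # True: the call is attributable to that workload
```

The point of the sketch is the audit property: any later reader of the log can check that each recorded action was performed by the identity it claims, and a tampered entry fails verification.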


[Keynote] Welcome + Opening Remarks, KubeCon North America


KServe Next: Advancing Generative AI Model Serving

Abstract:

As generative AI rapidly reshapes the AI landscape, the need for scalable, efficient, and interoperable model serving infrastructure has never been greater. In this session, we’ll trace the journey from bespoke model deployment patterns to modern, Kubernetes-native serving platforms. We’ll dive into the latest challenges in deploying and scaling large language models (LLMs) — including inference performance, KV-cache management, distributed execution, and cost optimization.

We are thrilled to announce the release of KServe v0.17, a major milestone introducing enhanced support for generative AI workloads: a dedicated LLMInferenceService CRD tailored for LLM serving (including capabilities such as disaggregated serving), model and KV caching, and integration with the open-source Envoy AI Gateway.

Attendees will gain insights into the technologies powering the next wave of AI applications and learn how to prepare their infrastructure for a generative AI future.


Introducing TAG Workloads Foundation: Advancing the Core of Cloud Native Execution

Abstract:

The CNCF Technical Advisory Groups (TAGs) play a vital role in shaping the future of cloud native. We’re excited to introduce a new addition: the TAG Workloads Foundation. This session will present the mission, scope, and early initiatives of TAG Workloads Foundation, which focuses on defining and advancing practices and standards for cloud native workload execution environments and lifecycle management. Attendees will learn how this TAG supports the CNCF’s technical vision, why workload execution is critical for adopters, and how community members can get involved to help solve real-world challenges across container platforms, schedulers, and orchestration systems. Join us to help shape the next phase of cloud native maturity, from fundamental runtime environments to future-forward workload patterns.


Abstract:

Large model inference is evolving rapidly: model and expert parallelism, prefill/decode disaggregation, multi-LoRA serving, and KV-cache offloading push the limits of traditional serving. As infrastructure teams, we must decide what belongs in Kubernetes core primitives versus engines versus ecosystem projects. In this session, WG-Serving chairs and industry leaders will share real-world lessons on managing these blurry boundaries. We’ll discuss how to evaluate new patterns, balance control and observability, and adapt infrastructure to stay ahead in this dynamic landscape. Attendees will gain practical frameworks for deciding when to extend Kubernetes versus offloading to runtimes, along with insights into the top emerging demands of large-scale LLM workloads.


Cloud Native & Kubernetes AI Day Closing Remarks