Distributed AI Inference Best Practices & Gotchas

A deep dive into 5D parallelism, P/D disaggregation, KV cache optimization, speculative decoding, and deployment blueprints for distributed AI inference at scale.

Production-Grade LLM Inference at Scale with KServe, llm-d, and vLLM

How KServe and llm-d solve the operational challenges of deploying hundreds of LLMs at scale, from storage performance to intelligent KV-cache-aware load balancing.

Best of Both Worlds: Cloud-Native AI Inference at Scale using KServe and llm-d

How KServe and llm-d combine to deliver efficient GPU utilization, intelligent request routing, and cost-aware autoscaling for enterprise AI inference on Kubernetes.

Kubernetes Serving Working Group Has Succeeded and Will Be Disbanded

Kubernetes WG Serving accomplished its goals for AI inference on Kubernetes and is disbanding, with ongoing work continuing in llm-d, AIBrix, and relevant SIGs.

Feeling Thankful Today and Reflecting on Two Incredible Years at Red Hat

Reflecting on two years at Red Hat: a promotion to Senior Principal Engineer, leading KServe and open source AI initiatives, and gratitude for mentors and community.

KubeCon North America 2025: Red Hat AI Model Serving Highlights

Highlights from KubeCon NA 2025 in Atlanta — KServe joining CNCF, K8s AI Conformance keynote, Cloud Native AI Day, and model serving demos.

KServe Joins CNCF as an Incubating Project

KServe has been accepted as a CNCF incubating project, validating its maturity as the standardized AI inference platform on Kubernetes.

The Best Choice for AI Inference: vLLM

Why vLLM's open-source architecture, advanced KV-cache management, and parallelization strategies make it the best choice for production LLM inference.

Eight Lessons on Open Source Leadership and Community

What contributing, mentoring, and stewarding have taught me about OSS

Lessons on stewardship, trust, mentoring, and community building from years of contributing to and leading open source projects.

PyTorch on Kubernetes: Kubeflow Trainer Joins PyTorch Ecosystem

Kubeflow Trainer joins the PyTorch ecosystem, providing Kubernetes-native distributed training and LLM fine-tuning with simplified APIs.