Yuan's Blog

Deploying Distributed AI Inference: Blueprints & Troubleshooting

Part 3 of the distributed AI inference series covering six deployment blueprints by traffic shape, troubleshooting recipes, and a scaling roadmap.

Posted on June 26, 2026

Optimizing Distributed AI Inference: Advanced Deployment Patterns

Part 2 of the distributed AI inference series covering P/D disaggregation, KV cache tiering and sharing, and speculative decoding techniques.

Posted on June 24, 2026

Designing Distributed AI Inference: Core Concepts and Scaling Dimensions

Part 1 of the distributed AI inference series covering prefill/decode phases and 5D parallelism for deploying large language models at scale.

Posted on June 22, 2026

Distributed AI Inference Best Practices & Gotchas

A deep dive into 5D parallelism, P/D disaggregation, KV cache optimization, speculative decoding, and deployment blueprints for distributed AI inference at scale.

Posted on June 5, 2026

Production-Grade LLM Inference at Scale with KServe, llm-d, and vLLM

How KServe and llm-d solve the operational challenges of deploying hundreds of LLMs at scale, from storage performance to intelligent KV-cache-aware load balancing.

Posted on April 21, 2026

Best of Both Worlds: Cloud-Native AI Inference at Scale using KServe and llm-d

How KServe and llm-d combine to deliver efficient GPU utilization, intelligent request routing, and cost-aware autoscaling for enterprise AI inference on Kubernetes.

Posted on March 5, 2026

Kubernetes Serving Working Group Has Succeeded and Will Be Disbanded

Kubernetes WG Serving accomplished its goals for AI inference on Kubernetes and is disbanding, with ongoing work continuing in llm-d, AIBrix, and relevant SIGs.

Posted on February 13, 2026

Feeling Thankful Today and Reflecting on Two Incredible Years at Red Hat

Reflecting on two years at Red Hat: a promotion to Senior Principal Engineer, leading KServe and open source AI initiatives, and gratitude for mentors and community.

Posted on November 28, 2025

KubeCon North America 2025: Red Hat AI Model Serving Highlights

Highlights from KubeCon NA 2025 in Atlanta: KServe joining CNCF, the K8s AI Conformance keynote, Cloud Native AI Day, and model serving demos.

Posted on November 18, 2025

KServe Joins CNCF as an Incubating Project

KServe has been accepted as a CNCF incubating project, validating its maturity as the standardized AI inference platform on Kubernetes.

Posted on November 11, 2025
