Yuan Tang
Yuan Tang is a Senior Principal Software Engineer at Red Hat AI. Previously, he led AI infrastructure and platform teams at various companies. He holds leadership roles in open source communities — including Argo, Kubeflow, KServe, Kubernetes, and CNCF — and maintains many popular open source projects. Yuan has authored award-winning books and numerous highly cited papers and patents (13k+ citations), and is a frequent keynote speaker, technical advisor, and mentor at various organizations.
Work Experience
Senior Principal Software Engineer Dec 2023 - current
- Served as Staff Engineer in the technical leadership team, shaping technical strategy and product alignment across the AI Engineering organization of 500+ engineers.
- Delivered sponsored keynotes to audiences of up to 9,000 and 20+ breakout sessions to showcase our work; served as co-chair at major industry conferences.
- Co-founded and co-chaired Kubernetes AI Conformance, WG Serving, and CNCF TAG Workloads Foundation; served on the Kubeflow Steering Committee and the LF AI & Data Foundation TAC and Governing Board.
- Built scalable model serving platform on Kubernetes for OpenShift AI, led KServe, and collaborated closely with vLLM and llm-d.
- First external maintainer of Open GenAI Stack (originally Meta Llama Stack), driving org-wide adoption as the standard agentic API server.
- Started the State of Model Serving Communities newsletter and grew it to 1,500+ subscribers, fostering cross-team collaboration; filed 5 patent applications and published 10+ blog posts.
Blogs - Model Serving & Inference:
- Distributed AI Inference Best Practices & Gotchas
- Production-Grade LLM Inference at Scale with KServe, llm-d, and vLLM | llm-d Blog
- Best of Both Worlds: Cloud-Native AI Inference at Scale using KServe and llm-d | KServe Blog
- Why vLLM is the best choice for AI inference today | Red Hat Developer
- Introducing vLLM Inference Provider in Llama Stack | vLLM Blog
- KServe Providers Dish Up NIMble Inference in Clouds and Data Centers | NVIDIA Blog
- Empower conversational AI at scale with KServe | Red Hat Developer

Blogs - Kubernetes & Cloud Native AI:
- Kubernetes WG Serving concludes following successful advancement of AI inference support
- KubeCon North America 2025: Red Hat AI Model Serving Highlights
- PyTorch on Kubernetes: Kubeflow Trainer Joins the PyTorch Ecosystem | PyTorch Blog
- AI/ML Innovation in the Kubernetes Ecosystem | DZone
- Open source AI at Red Hat: Our journey in the Kubeflow community | Red Hat Blog
- Cloud Native AI Day deep dive | CNCF Blog

Media & Announcements:
- CNCF Nearly Doubles Certified Kubernetes AI Platforms
- KServe becomes a CNCF incubating project (CNCF, Red Hat, CloudNativeNow)
- CNCF Launches Certified Kubernetes AI Conformance Program
- Kubeflow Project Steering Committee Announced | Kubeflow Blog
- Announcing Charmed Kubeflow 1.10 | Canonical Blog
- Kubeflow brings MLOps to the CNCF Incubator | CNCF Blog
- KubeCon + CloudNativeCon NA 2022 | SiliconANGLE theCUBE

Short Videos:
- Why is open, transparent data important to LLMs? | Red Hat
- Using llm-d to Serve Large Models | Red Hat Community
- The Power of Community | IBM TechXchange Community Day
Various Roles in Software Engineering and AI/ML Systems Sep 2015 - Nov 2023
- Built and led AI infrastructure, DevOps platforms, and machine learning systems on Kubernetes across startups and large organizations as founding engineer and tech lead.
Akuity - Founding Engineer (Sep 2021 - Nov 2023)
- First founding engineer: bootstrapped engineering infrastructure, drove early sales and marketing initiatives, and recruited the majority of the engineering team.
- Led the development of Argo Workflows, maintained Argo CD, and provided enterprise support and architectural reviews for customers.
- Led development of the AI Assistant Extension to help developers analyze issues with managed Kubernetes resources and applications.
- Designed and implemented major components of the Akuity Platform, an enterprise-ready, fully-managed DevOps platform.
- Led efforts to achieve SOC 2 Type 2 compliance certification and hardened engineering operations and security best practices.
- Co-authored CNCF Certified Argo Project Associate exam.

Alibaba Group - Senior Software Engineer / Tech Lead (Jun 2018 - Aug 2021)
- Built AI infrastructure and AutoML platform on Kubernetes; co-chaired Kubeflow and led distributed training operators.
- Designed major components of ElasticDL for fault-tolerant deep learning; co-inventor of the associated patent; integrated with SQLFlow for ML via extended SQL.
- Led Couler and Argo Workflows for cloud-native workflow orchestration.

H2O.ai - Senior AI Platform Engineer (Dec 2017 - May 2018)
- Contributed to H2O and built model management for Driverless AI.

Uptake (acquired by Bosch) - Data Science Lead (Sep 2015 - Nov 2017)
- Led a team building a data science platform for industrial asset monitoring (trains, airplanes, wind turbines).
- Built real-time microservice monitoring platform; lead inventor of the anomaly detection patent.
- Led all open source initiatives within the data science team.
Services and Positions
Open Source Community Leadership 2019 - current
- Co-Chair and Founding Member, CNCF Technical Advisory Group on Workloads Foundation, 2025 - current
This is a Technical Advisory Group under the CNCF Technical Oversight Committee (TOC) with a focus on Workloads Foundation.

Mission: define and advance practices and standards for fundamental cloud native workload execution environments and their related lifecycle management within cloud native systems, applications, and architectures. This supports the CNCF's technical vision by addressing critical problems faced by adopters and contributing to a robust cloud native ecosystem.

Talks:
- Introducing TAG Workloads Foundation: Advancing the Core of Cloud Native Execution - KubeCon NA 2025
- Explore TAG Workloads Foundation: Advancing Cloud Native Execution From Core Runtime To Applications - KubeCon EU 2026
- Co-Chair, Tech Lead, and Founding Member, Kubernetes AI Conformance, 2025 - current
Defines a standardized set of capabilities, APIs, and configurations that a Kubernetes cluster must offer to reliably and efficiently run AI/ML workloads.

KubeCon North America 2025 - Initial Launch:
Featured on the day 1 opening keynote stage and quoted as a co-chair in the CNCF announcement. Led Red Hat's initial certification, making Red Hat one of the first certified vendors.
Coverage: Forbes, CloudNativeNow, AKS, GKE

KubeCon Europe 2026 - Doubled Adoption:
Presented on the keynote stage and in the maintainer session; quoted as a program leader in the CNCF announcement.
Coverage: Forbes
- Technical Advisory Council Member and Alternate Governing Board Member, LF AI & Data Foundation, 2024 - 2026
- Project Lead, Co-Chair, and Steering Committee Member, Kubeflow, 2020 - 2026
Steering Committee:
Member of the Kubeflow Steering Committee (KSC), the governing body overseeing project policies, sub-organizations, financial planning, and community structure.
- Kubeflow Project Steering Committee Announced

Distributed Training:
Co-chair of Distributed Training Working Group and maintainer of Kubernetes operators for TensorFlow, PyTorch, MXNet, and XGBoost.
- PyTorch on Kubernetes: Kubeflow Trainer Joins the PyTorch Ecosystem | PyTorch Blog
- Introduction to Kubeflow MPI Operator and Industry Adoption

AutoML:
Co-author of the technical whitepaper for Kubeflow Katib, a Kubernetes-native project for automated machine learning.
- A Scalable and Cloud-Native Hyperparameter Tuning System

Talks:
- Kubeflow Ecosystem: What’s Next for Cloud Native AI/ML and LLMOps - KubeCon EU 2025
- Engaging the Kubeflow Community: Building an Enterprise-Ready AI/ML Platform - Cloud Native & Kubernetes AI Day EU 2025
- Large Scale Distributed Deep Learning with Kubernetes Operators - KubeCon EU 2019
- Project Management Committee Member and Committer, XGBoost, 2020 - current
- Project Management Committee Member and Committer, Apache MXNet, 2017 - 2023
Conferences and Journals 2018 - current
- Program Chair of KubeCon and Co-Located Events, 2023 - current
Chair:
- Cloud Native + Kubernetes AI Day (Europe, North America, China), 2024 - 2026
- Data on Kubernetes Day at KubeCon North America, 2023

Program Committee:
- KubeCon AI/ML Track (Europe, North America, China, Japan, India), 2024 - 2026
- PyTorch Conference (North America and China), 2026
- Agentics Day at KubeCon Europe, 2026

Talks:
- Cloud Native AI + Kubeflow Day: Welcome + Opening Remarks - KubeCon EU 2026
- Cloud Native & Kubernetes AI Day: Closing Remarks - KubeCon NA 2025
- Cloud Native & Kubernetes AI Day Welcome + Opening Remarks - KubeCon EU 2025
- Cloud Native AI Day: Welcome + Opening Remarks - KubeCon EU 2024
- Editor of Journal of Open Source Software, 2018 - 2022
- Insight Partner on AI Technology of Synced Review, 2019
Advisor and Mentor 2016 - current
- Purdue University Department of Computer Science, Advisory Board Member, National Science Foundation (NSF)-sponsored projects
- Carnegie Mellon University School of Computer Science, Industry Mentor, Catalyst Research Lab, 2025 - 2026
- Google Summer of Code, Project Mentor, 2016 - 2024
Mentored students across 5 programs over 8 years:
- 2024: Kubeflow
- 2022: Argo
- 2020: Kubeflow
- 2019: TensorFlow
- 2016: R Project for Statistical Computing
- Technical Advisor at various exited startups and large organizations
Active:
- TensorChord - reproducible AI/ML dev environments
- Metabit Trading - cloud-native infrastructure for quantitative trading

Exited / Acquired:
- Chaintool (now Codatta) - distributed graph database for Web3 risk management
- Moises (now Music.AI) - AI-powered music and audio platform
- Maven Wave (now Eviden/Atos) - ML, data visualization, open source strategy
- CSPA (acquired by AngelList) - technical steering and open source strategy
Selected Projects [Full List]
Kubernetes Co-chair & Project Lead
Production-Grade Container Orchestration. Co-Chair, Tech Lead, and Founding Member of Kubernetes AI Conformance and Serving Working Group
Kubeflow Project Lead & Steering Committee Member
Machine learning toolkits on Kubernetes
Argo Project Lead
Project lead of Argo Workflows, the container-native workflow engine; maintainer of Argo CD, declarative continuous delivery for Kubernetes
KServe Project Lead & Technical Steering Committee Member
Standardized AI inference platform on Kubernetes
TensorFlow Co-author & Maintainer
Co-author of TensorFlow Estimators (KDD'17) and TensorFlow in R. First non-Google maintainer; Google Open Source Peer Bonus recipient.
XGBoost Project Management Committee Member & Committer
General-purpose gradient boosting library
Selected Talks [Full List]
Frequent speaker at KubeCon, PyData Global, PlatformCon, and other major venues, with single-session audiences of up to 9,000. Roles include keynote speaker, lecturer, panelist, and moderator.
Anchoring Trust in the Age of AI: Identities Across Humans, Machines, and Models
Sponsored Keynote Speaker, KubeCon North America 2025
Advancing Cloud Native AI Innovation Through Open Collaboration
Sponsored Keynote Speaker, Cloud Native & Kubernetes AI Day North America 2024
Building for the Road Ahead: An Ode to Maintainers, the Life Blood of Our Ecosystem
Invited Keynote Speaker, KubeCon North America 2022
Publications [Google Scholar]
13k+ citations across conference papers (ICDE, KDD), journals (JMLR, JOSS, The R Journal), award-winning books, and patents.
Books
Distributed Machine Learning Patterns 《分布式机器学习模式》 2023
Manning Publications, ISBN 9781617299025. Available in English, Korean, Russian, and Chinese.
Dive into Deep Learning (with TensorFlow) 《动手学深度学习》 2020
Available in English and Chinese.
TensorFlow in Practice 《TensorFlow实战》 2017
Beijing Publishing House of Electronics Industry. Available in Chinese. Award-winning.
Conference Papers
Couler: Unified Machine Learning Workflow Optimization in Cloud 2024
40th IEEE International Conference on Data Engineering (ICDE)
TensorFlow Estimators: Managing Simplicity vs. Flexibility in High-Level Machine Learning Frameworks 2017
Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD)
Journal Articles
metric-learn: Metric Learning Algorithms in Python 2020
Journal of Machine Learning Research (JMLR)
lfda, dml, autoplotly: R Packages for Metric Learning and Visualization 2018 - 2019
Journal of Open Source Software (JOSS) lfda · dml · autoplotly
ggfortify: Unified Interface to Visualize Statistical Result of Popular R Packages 2016
The R Journal
Patents
4 US Patent Applications on AI Model Optimization, Inference, and Distributed Workloads 2025 - 2026
System and Method for Distributed Task Execution 2023
China Patent CN110609749B
Systems and Methods for Detecting and Remedying Software Anomalies 2020
US Patent US10635519B1
Preprints
A Scalable and Cloud-Native Hyperparameter Tuning System 2020
arXiv preprint arXiv:2006.02085
SQLFlow: A Bridge between SQL and Machine Learning 2020
arXiv preprint arXiv:2001.06846
TF.Learn: TensorFlow's High-level Module for Distributed Machine Learning 2016
arXiv preprint arXiv:1612.04251
Incorporating Hierarchical Structure into Dynamic Systems: An Application of Estimating HIV Epidemics at Sub-National and Sub-Population Level 2016
arXiv preprint arXiv:1602.05665
Education
Georgia Institute of Technology
Graduate-level Coursework in Computer Science, Specializing in Computing Systems
Schreyer Honors College at Pennsylvania State University 2012 - 2015
Bachelor of Science in Mathematics with Honors
Schreyer Honors Scholar (top 2%) with scholarships, research grants, and honors thesis. Dean's List (all semesters). National Statistics Honorary Society Inductee.
Changjun High School (湖南省长沙市长郡中学) 2009 - 2012
High School Diploma, Science
High School Dual Diploma (via Exchange Program), Grattan Academy High School, 2011 - 2012
Excellence in Action, International Scholar Athlete Award. Student Committee Member.
Awards
Awards by Teams at Red Hat and IBM 2023 - current
- IBM Tech Award, Dec 10th, 2024
- Red Hat AI Engineering Jedi Award, Red Hat Multiplier and Influence, Oct 18th, 2024
- Numerous internal awards and recognitions (available upon request)
Awards by Teams at Alibaba Group 2020 - 2021
- Inner Source Pioneer, Apr 17th, 2021
- Top Open Source Contributor of the Year, Jan 20th, 2020
- Best Pull Request of the Week, May 3rd, 2020
Publishing Awards 2017 - 2018
- Outstanding China Mainland Books Copyright Exported to Taiwan, The Publishers Association of China, 2018 (for TensorFlow in Practice)
- Outstanding Author, Beijing Publishing House of Electronics Industry, 2017 (for TensorFlow in Practice)
Open Source Peer Bonus Award 2016
Google Inc.
Miscellaneous Awards during College 2014 - 2016
DataNovo Startup Team, HackRPI, Schreyer Honors College at Penn State
DataNovo Startup Team (2015 - 2016)
- Top 3 Finalist, SXSW Interactive · Top Startup Winner, TiE50 · Trial Support Software Innovation Award, Legaltech News · B2B Finalist, Launch Festival

Hackathon and Programming Competition (2014)
- Best Virtual Reality Hack, HackRPI at Rensselaer Polytechnic Institute
- MindSumo Programming Challenge Winner, Sponsored by Capital One

Scholarships & Grants (2014)
- Pre-Eminence in Honors Education Fund ($5,000) · Summer Research Grants ($1,200) · NSF MCTP Grant and PMASS Fellowship ($12,800) · John K. Tsui Honors Scholarship ($5,300)