Yuan Tang
Yuan Tang is a Senior Principal Software Engineer at Red Hat AI. Previously, he led AI infrastructure and platform teams at various companies. He holds leadership positions in open source communities, including Argo, Kubeflow, KServe, Kubernetes, and CNCF. He's also a maintainer and author of many popular open source projects. In addition, Yuan has authored three technical books as well as numerous papers and patents with 12,000+ citations. He's a frequent keynote speaker, technical advisor, leader, and mentor at various organizations.
Work Experience
Senior Principal Software Engineer Dec 2023 - current
- Served as Staff Engineer in the technical leadership team, shaping technical strategy and product alignment across the AI Engineering organization of 500+ engineers.
- Delivered sponsored keynotes to audiences of up to 9,000 and 20+ breakout sessions to showcase our work; served as co-chair at major industry conferences.
- Co-founded and co-chaired Kubernetes AI Conformance, the Kubernetes Serving Working Group, and the CNCF Technical Advisory Group on Workloads Foundation.
- Represented Red Hat as a primary voting member of the Technical Advisory Council and an alternate member of the Governing Board at LF AI & Data Foundation.
- Built a scalable model serving platform on Kubernetes for OpenShift AI, led KServe, and collaborated closely with vLLM and llm-d.
- Contributed to Llama Stack in collaboration with Meta, which led to organization-wide adoption as the GenAI API standard.
- Served as a member of the Kubeflow Steering Committee and oversaw all community governance activities.
- Published the State of Model Serving Communities newsletter, fostering collaborations across internal teams and reaching 1,500+ external subscribers.
- Filed 5 patent applications and published 10+ blog posts.
- Served as co-chair of AI Engineering SIG Serving to lead internal cross-team discussions and alignment on technical strategy for AI products.
- Received company-wide awards including IBM Tech Leadership Award and Red Hat AI Engineering Jedi Award, as well as numerous awards and internal recognitions from colleagues.
- Mentored team members at various levels and established partnerships with external organizations and communities.
Blogs - Model Serving & Inference:
- Best of Both Worlds: Cloud-Native AI Inference at Scale using KServe and llm-d | KServe Blog
- Why vLLM is the best choice for AI inference today | Red Hat Developer
- Introducing vLLM Inference Provider in Llama Stack | vLLM Blog
- KServe Providers Dish Up NIMble Inference in Clouds and Data Centers | NVIDIA Blog
- Empower conversational AI at scale with KServe | Red Hat Developer

Blogs - Kubernetes & Cloud Native AI:
- Kubernetes WG Serving concludes following successful advancement of AI inference support
- KubeCon North America 2025: Red Hat AI Model Serving Highlights
- PyTorch on Kubernetes: Kubeflow Trainer Joins the PyTorch Ecosystem | PyTorch Blog
- AI/ML Innovation in the Kubernetes Ecosystem | DZone
- Open source AI at Red Hat: Our journey in the Kubeflow community | Red Hat Blog
- Cloud Native AI Day deep dive | CNCF Blog

Media & Announcements:
- CNCF Nearly Doubles Certified Kubernetes AI Platforms
- KServe becomes a CNCF incubating project (CNCF, Red Hat, CloudNativeNow)
- CNCF Launches Certified Kubernetes AI Conformance Program
- Kubeflow Project Steering Committee Announced | Kubeflow Blog
- Announcing Charmed Kubeflow 1.10 | Canonical Blog
- Kubeflow brings MLOps to the CNCF Incubator | CNCF Blog
- KubeCon + CloudNativeCon NA 2022 | SiliconANGLE theCUBE

Short Videos:
- Why is open, transparent data important to LLMs? | Red Hat
- Using llm-d to Serve Large Models | Red Hat Community
- The Power of Community | IBM TechXchange Community Day
Founding Engineer Sep 2021 - Nov 2023
- First founding engineer: bootstrapped engineering infrastructure, drove early sales and marketing initiatives, and recruited the majority of the engineering team.
- Led the development of Argo Workflows, maintained Argo CD, and provided enterprise support and architectural reviews for customers.
- Led development of the AI Assistant Extension to help developers analyze issues with managed Kubernetes resources and applications.
- Designed and implemented major components of the Akuity Platform, an enterprise-ready, fully-managed DevOps platform.
- Led efforts to achieve SOC 2 Type 2 compliance certification and hardened engineering operations and security best practices.
Various Roles in Software Engineering and AI/ML Systems Sep 2015 - Aug 2021
- Built AI infrastructure, AutoML platforms, and data science platforms on Kubernetes across multiple organizations.
Alibaba Group - Senior Software Engineer / Tech Lead (June 2018 - Aug 2021)
- Built AI infrastructure and AutoML platform on Kubernetes; co-chaired Kubeflow and led distributed training operators.
- Designed major components of ElasticDL for fault-tolerant deep learning; co-inventor of the patent; integrated with SQLFlow for ML via extended SQL.
- Led Couler and Argo Workflows for cloud-native workflow orchestration.

H2O.ai - Senior AI Platform Engineer (Dec 2017 - May 2018)
- Contributed to H2O and built model management for Driverless AI.

Uptake - Data Science Lead (Sep 2015 - Nov 2017)
- Led a team building a data science platform for industrial asset monitoring (trains, airplanes, wind turbines).
- Built a real-time microservice monitoring platform; lead inventor of the anomaly detection patent.
- Led all open source initiatives within the data science team.
Services and Positions
Open Source Community Leadership 2019 - current
- Co-Chair and Founding Member, CNCF Technical Advisory Group on Workloads Foundation, 2025 - current
This is a Technical Advisory Group under the CNCF Technical Oversight Committee (TOC) with a focus on Workloads Foundation.

Mission: define and advance practices and standards for fundamental cloud native workload execution environments and their related lifecycle management within cloud native systems, applications, and architectures. This supports the CNCF's technical vision by addressing critical problems faced by adopters and contributing to a robust cloud native ecosystem.

Talks:
- Introducing TAG Workloads Foundation: Advancing the Core of Cloud Native Execution - KubeCon NA 2025
- Explore TAG Workloads Foundation: Advancing Cloud Native Execution From Core Runtime To Applications - KubeCon EU 2026
- Co-Chair, Tech Lead, and Founding Member, Kubernetes AI Conformance, 2025 - current
Defines a standardized set of capabilities, APIs, and configurations that a Kubernetes cluster must offer to reliably and efficiently run AI/ML workloads.

KubeCon North America 2025 - Initial Launch:
Featured on the Day 1 opening keynote stage and quoted as a co-chair in the CNCF announcement. Led Red Hat's initial certification, making it one of the first certified vendors.
Coverage: Forbes, CloudNativeNow, AKS, GKE

KubeCon Europe 2026 - Doubled Adoption:
Presented on the keynote stage and in a maintainer session; quoted as a program leader in the CNCF announcement.
Coverage: Forbes
- Technical Advisory Council Member and Alternate Governing Board Member, LF AI & Data Foundation, 2024 - current
Serving as a primary voting member of the Technical Advisory Council for the LF AI & Data Foundation, representing Red Hat.
- Project Lead, Co-Chair, and Steering Committee Member, Kubeflow, 2020 - 2026
Steering Committee:
Member of the Kubeflow Steering Committee (KSC), the governing body overseeing project policies, sub-organizations, financial planning, and community structure.
- Kubeflow Project Steering Committee Announced

Distributed Training:
Co-chair of Distributed Training Working Group and maintainer of Kubernetes operators for TensorFlow, PyTorch, MXNet, and XGBoost.
- PyTorch on Kubernetes: Kubeflow Trainer Joins the PyTorch Ecosystem | PyTorch Blog
- Introduction to Kubeflow MPI Operator and Industry Adoption

AutoML:
Co-author of the technical whitepaper for Kubeflow Katib, a Kubernetes-native project for automated machine learning.
- A Scalable and Cloud-Native Hyperparameter Tuning System

Talks:
- Kubeflow Ecosystem: What’s Next for Cloud Native AI/ML and LLMOps - KubeCon EU 2025
- Engaging the Kubeflow Community: Building an Enterprise-Ready AI/ML Platform - Cloud Native & Kubernetes AI Day EU 2025
- Large Scale Distributed Deep Learning with Kubernetes Operators - KubeCon EU 2019
- Project Management Committee Member and Committer, XGBoost, 2020 - current
- Project Management Committee Member and Committer, Apache MXNet, 2017 - 2023
Conferences and Journals 2018 - current
- Program Chair of KubeCon and Co-Located Events, 2023 - current
Chair:
- Cloud Native + Kubernetes AI Day (Europe, North America, China), 2024 - 2026 [announcement]
- Data on Kubernetes Day at KubeCon North America, 2023

Program Committee:
- KubeCon AI/ML Track (Europe, North America, China, Japan, India), 2024 - 2026
- Agentics Day at KubeCon Europe, 2026

Talks:
- Cloud Native AI + Kubeflow Day: Welcome + Opening Remarks - KubeCon EU 2026
- Cloud Native & Kubernetes AI Day: Closing Remarks - KubeCon NA 2025
- Cloud Native & Kubernetes AI Day Welcome + Opening Remarks - KubeCon EU 2025
- Cloud Native AI Day: Welcome + Opening Remarks - KubeCon EU 2024
- Editor of Journal of Open Source Software, 2018 - 2022
- Insight Partner on AI Technology of Synced Review, 2019
Advisor and Mentor 2016 - current
- Purdue University Department of Computer Science, Advisory Board Member, National Science Foundation (NSF)-sponsored projects
- Carnegie Mellon University School of Computer Science, Industry Mentor, Catalyst Research Lab, 2025 - 2026 [projects]
- Google Summer of Code, Project Mentor, 2016 - 2024
Mentored students across 5 programs over 8 years:
- 2024: Kubeflow [project] [certificate]
- 2022: Argo [project]
- 2020: Kubeflow [project] [certificate]
- 2019: TensorFlow [project] [certificate]
- 2016: R Project for Statistical Computing [project] [certificate]
- Technical Advisor at various exited startups and large organizations
Active:
- TensorChord - reproducible AI/ML dev environments [project]
- Metabit Trading - cloud-native infrastructure for quantitative trading

Exited / Acquired:
- Chaintool (now Codatta) - distributed graph database for Web3 risk management [project]
- Moises (now Music.AI) - AI-powered music and audio platform
- Maven Wave (now Eviden/Atos) - ML, data visualization, open source strategy [project]
- CSPA (acquired by AngelList) - technical steering and open source strategy
Investor 2024 - current
xAI (Series B, acquired by SpaceX), Figure AI (Series C), Blok (Pre-seed and Seed), Probabl (Seed)
Selected Projects [Full List]
Kubernetes Co-chair & Project Lead
Production-Grade Container Orchestration. Co-Chair, Tech Lead, and Founding Member of Kubernetes AI Conformance and Serving Working Group
Kubeflow Project Lead & Steering Committee Member
Machine learning toolkits on Kubernetes
Argo Project Lead
- Project lead of Argo Workflows, the container-native workflow engine
- Maintainer of Argo CD, declarative continuous delivery for Kubernetes
KServe Project Lead & Technical Steering Committee Member
Standardized distributed generative and predictive AI inference platform for scalable, multi-framework deployment on Kubernetes
TensorFlow Co-author & Maintainer
- Co-author of TensorFlow Estimators and maintainer of TensorFlow I/O.
- Co-author of TensorFlow in R.
- Recipient of the Google Open Source Peer Bonus in 2016 for contributions to TensorFlow. First non-Google maintainer.
XGBoost Project Management Committee Member & Committer
General-purpose gradient boosting library
Llama Stack Maintainer
Standardized and composable building blocks for AI applications
metric-learn Co-author
Python package for state-of-the-art metric learning algorithms
Selected Talks [Full List]
Delivered keynotes to audiences of up to 9,000 attendees, along with regular presentations, panels, podcasts, and university lectures, at major venues including KubeCon (North America, Europe, China), Open Data Science Conference, PlatformCon, PyData Global, IBM TechXchange, Red Hat Summit, ArgoCon, Cloud Native AI Day, and Purdue University. Served in roles ranging from keynote speaker and lecturer to panel moderator and participant.
Anchoring Trust in the Age of AI: Identities Across Humans, Machines, and Models
Sponsored Keynote Speaker, KubeCon North America 2025 [link]
Advancing Cloud Native AI Innovation Through Open Collaboration
Sponsored Keynote Speaker, Cloud Native & Kubernetes AI Day North America 2024 [link]
Building for the Road Ahead: An Ode to Maintainers, the Life Blood of Our Ecosystem
Invited Keynote Speaker, KubeCon North America 2022 [link]
Publications [Google Scholar]
Authored research publications with 12,000+ citations spanning conference papers (ICDE, KDD), peer-reviewed journals (JMLR, JOSS, The R Journal), books (Manning Publications and Beijing Publishing House of Electronics Industry), and patents (US and China). Work encompasses distributed systems, machine learning, cloud computing, data visualization, and open source software.
Books
Distributed Machine Learning Patterns 2023
Manning Publications, ISBN 9781617299025
[Link] [GitHub]
Dive into Deep Learning (with TensorFlow)《动手学深度学习》 2020
  • Aston Zhang,
  • Zachary C. Lipton,
  • Mu Li,
  • Alexander J. Smola,
  • Anirudh Dagar,
  • Yuan Tang
[Link] [GitHub]
TensorFlow in Practice《TensorFlow实战》 2017
  • Wenjian Huang,
  • Yuan Tang
Beijing Publishing House of Electronics Industry
[Link] [GitHub]
Conference Papers
Couler: Unified Machine Learning Workflow Optimization in Cloud 2024
  • Xiaoda Wang,
  • Yuan Tang,
  • Tengda Guo,
  • Bo Sang,
  • Jingji Wu,
  • Jian Sha,
  • Ke Zhang,
  • Jiang Qian,
  • Mingjie Tang
40th IEEE International Conference on Data Engineering (ICDE) [PDF] [GitHub]
TensorFlow Estimators: Managing Simplicity vs. Flexibility in High-Level Machine Learning Frameworks 2017
  • Heng-Tze Cheng,
  • Lichan Hong,
  • Mustafa Ispir,
  • Clemens Mewald,
  • Zakaria Haque,
  • Illia Polosukhin,
  • Georgios Roumpos,
  • D Sculley,
  • Jamie Smith,
  • David Soergel,
  • Yuan Tang,
  • Philipp Tucker,
  • Martin Wicke,
  • Cassandra Xia,
  • Jianwei Xie
Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) [PDF] [GitHub]
Journal Articles
metric-learn: Metric Learning Algorithms in Python 2020
  • William de Vazelhes,
  • CJ Carey,
  • Yuan Tang,
  • Nathalie Vauquier,
  • Aurélien Bellet
Journal of Machine Learning Research (JMLR) [PDF] [GitHub]
lfda: Local Fisher Discriminant Analysis in R 2019
  • Yuan Tang,
  • Wenxuan Li
Journal of Open Source Software (JOSS) [PDF] [GitHub]
dml: Distance Metric Learning in R 2018
  • Yuan Tang,
  • Tao Gao,
  • Nan Xiao
Journal of Open Source Software (JOSS) [PDF] [GitHub]
autoplotly: An R Package for Automatic Generation of Interactive Visualizations for Statistical Results 2018
Journal of Open Source Software (JOSS) [PDF] [GitHub]
ggfortify: Unified Interface to Visualize Statistical Result of Popular R Packages 2016
  • Yuan Tang,
  • Masaaki Horikoshi,
  • Wenxuan Li
The R Journal [PDF] [GitHub]
Patents
Container Image-Based Storage and Dynamic Image Layer Reordering for Machine-Learned Models 2026
US Patent App. US20260044318A1 [Link]
Topology-Aware Multi-Host Model Serving System with Mirrored Local Image Registries 2026
US Patent App. US20260044362A1 [Link]
System and Method for Constructing Container Image Layers Based on Neural Network Model Layers 2026
US Patent App. US20260017938A1 [Link]
Machine-Learned Models for Colocating Distributed Workloads via Metric Learning 2025
US Patent App. US20250371407A1 [Link]
System and Method for Distributed Task Execution 2023
  • Yi Wang,
  • Wei Yan,
  • Yuan Tang,
  • Haitao Zhang,
  • Chunyang Wen,
  • Minghao Li,
  • Jun Qi,
  • Yongfeng Liu
China Patent CN110609749B [PDF]
Systems and Methods for Detecting and Remedying Software Anomalies 2020
  • Yuan Tang,
  • Tuo Li,
  • James Herzog
US Patent US10635519B1 [PDF]
Preprints
A Scalable and Cloud-Native Hyperparameter Tuning System 2020
  • Johnu George,
  • Ce Gao,
  • Richard Liu,
  • Hou Gang Liu,
  • Yuan Tang,
  • Ramdoot Pydipaty,
  • Amit Kumar Saha
arXiv preprint arXiv:2006.02085 [PDF] [GitHub]
SQLFlow: A Bridge between SQL and Machine Learning 2020
  • Yi Wang,
  • Yang Yang,
  • Weiguo Zhu,
  • Yi Wu,
  • Xu Yan,
  • Yongfeng Liu,
  • Yu Wang,
  • Liang Xie,
  • Ziyao Gao,
  • Wenjing Zhu,
  • Xiang Chen,
  • Wei Yan,
  • Mingjie Tang,
  • Yuan Tang
arXiv preprint arXiv:2001.06846 [PDF] [GitHub]
Incorporating Hierarchical Structure into Dynamic Systems: An Application of Estimating HIV Epidemics at Sub-National and Sub-Population Level 2016
  • Le Bao,
  • Ben Sheng,
  • Xiaoyue Niu,
  • Yuan Tang,
  • Tim Brown,
  • Peter D. Ghys,
  • Jeff W. Eaton
arXiv preprint arXiv:1602.05665 [PDF]
Education
Georgia Institute of Technology
Master of Science in Computer Science (coursework only)
Completed courses: Software Development Process, Databases, Computer Networks, Software Architecture and Design, Artificial Intelligence for Robotics, Data & Visual Analytics, Entrepreneurship, and Computer Law.
Schreyer Honors College at Pennsylvania State University 2012 - 2015
Bachelor of Science in Mathematics with Honors [thesis]
Changjun High School (湖南省长沙市长郡中学) 2009 - 2012
High School Diploma, Science
Awards
Awards by Teams at Red Hat and IBM 2023 - current
- IBM Tech Award, Dec 10th, 2024 [certificate]
- Red Hat AI Engineering Jedi Award, Red Hat Multiplier and Influence, Oct 18th, 2024
- Numerous internal awards and recognitions from Red Hat colleagues [details]
Awards by Teams at Alibaba Group 2020 - 2021
- Inner Source Pioneer, April 17th, 2021 [certificate]
- Top Open Source Contributor of the Year, Jan 20th, 2020
- Best Pull Request of the Week, May 3rd, 2020
Publishing Awards 2017 - 2018
- Outstanding China Mainland Books Copyright Exported to Taiwan, The Publishers Association of China, 2018
- Outstanding Author, Beijing Publishing House of Electronics Industry, 2017 [certificate]
Open Source Peer Bonus Award 2016
Google Inc. [announcement] [letter]
Miscellaneous Awards during College 2014 - 2016
DataNovo Startup Team, HackRPI, Schreyer Honors College at Penn State
DataNovo Startup Team (2015 - 2016)
- Top 3 Finalist, SXSW Interactive · Top Startup Winner, TiE50 · Trial Support Software Innovation Award, Legaltech News · B2B Finalist, Launch Festival

Hackathon (2014)
- Best Virtual Reality Hack, HackRPI at Rensselaer Polytechnic Institute [announcement] [project]

Scholarships & Grants (2014)
- Pre-Eminence in Honors Education Fund ($5,000) · Summer Research Grants ($1,200) · NSF MCTP Grant and PMASS Fellowship ($12,800) · John K. Tsui Honors Scholarship ($5,300)