Yuan Tang
Education
Georgia Institute of Technology Aug 2019
Master of Science in Computer Science
The Pennsylvania State University Aug 2012 - May 2015
Bachelor of Science in Mathematics with Honors [thesis]
Work Experience
Founding Engineer Sep 2021 - current
- Maintained Argo Workflows and Argo CD, the container-native workflow engine and the declarative continuous delivery tool for Kubernetes, respectively;
- Designed and implemented major components of the Akuity Platform, an enterprise-ready, fully managed DevOps platform that is scalable, reliable, and secure;
- Led the effort to achieve SOC 2 Type II compliance and hardened our engineering operations and security best practices.
Senior Software Engineer / Engineering Manager Jun 2018 - Aug 2021
- Built AI infrastructure and an AutoML platform on Kubernetes; served as co-chair of Kubeflow and led the development of various distributed training operators;
- Designed and implemented major components of ElasticDL to support fault tolerance and elastic scheduling for deep learning workloads; co-inventor of the patent for its underlying system and method for distributed task execution; integrated ElasticDL with SQLFlow to enable machine learning with an extended SQL dialect;
- Led the design and development of Couler and Argo Workflows to provide scalable cloud-native workflow orchestration for data science teams.
Senior Platform Engineer Dec 2017 - May 2018
- Contributed to the open source machine learning platform H2O;
- Built the model management component of Driverless AI, which automates end-to-end data science workflows.
Data Science Lead Sep 2015 - Nov 2017
- Led a team that built our scalable data science platform to monitor the condition of industrial assets such as trains, airplanes, and wind turbines, helping to prevent machine failures and reduce downtime;
- Built an intelligent monitoring platform from the ground up that automatically monitors internal microservices in real time at scale; lead inventor of the patent for its underlying anomaly detection algorithm;
- Led all open source initiatives within the data science team.
Selected Projects [full list]
TensorFlow [link] Committer
End-to-end open source platform for machine learning. Co-author of TensorFlow Estimators and maintainer of TensorFlow I/O
XGBoost [link] Committer & PMC
General-purpose gradient boosting library
Apache MXNet [link] Committer & PMC
Flexible and efficient deep learning library on heterogeneous distributed systems
Kubeflow [link] Co-chair and Technical Lead
Machine learning toolkit for Kubernetes. Co-chair of the Distributed Training Working Group and maintainer of various Kubernetes operators
Argo [link] Maintainer
Maintainer of Argo Workflows and Argo CD, the container-native workflow engine and the declarative continuous delivery tool for Kubernetes, respectively
Couler [link] Technical Lead
Unified interface for constructing and managing workflows on various cloud-native workflow orchestration engines
ElasticDL [link] Maintainer
Kubernetes-native deep learning framework with fault-tolerance and elastic scheduling
TensorFlow in R [link] Author
R interfaces to core TensorFlow components, including the Estimators, Keras, and Datasets APIs
reticulate [link] Author
R interface to Python: a comprehensive set of tools for interoperability between Python and R
ggfortify [link] Author
Unified interface to visualize popular statistical results in ggplot2 style
metric-learn [link] Author
Python package for state-of-the-art metric learning algorithms
Selected Services
Chaintool [link] 2022 - current
Technical Advisor on Distributed Graph Database and Machine Learning Systems [project]
Moises [link] 2021
Technical Advisor on Machine Learning Infrastructure
Maven Wave [link] 2020
Technical Advisor on MLOps and Open Source Strategy [project]
Kubeflow [link] 2020 - current
Co-chair of Distributed Training Working Group
XGBoost [link] 2019 - current
Project Management Committee Member
Synced 机器之心 [link] 2019 - 2020
Insight Partner on AI Technology & Industry Review
Computer Science Proficiency Assessment [link] 2019
Technical Steering Committee Member, Open Source Representative
Journal of Open Source Software [link] 2018 - 2022
- Editor on Machine Learning, Distributed Systems, and Cloud Computing, 2019 - 2022
- Reviewer, 2018
Apache MXNet [link] 2017 - current
Project Management Committee Member
Google Summer of Code [link] 2016 - 2022
- Mentor, Argo, 2022 [project]
- Mentor, Kubeflow, 2020 [project] [certificate]
- Mentor, TensorFlow, 2019 [project] [certificate]
- Mentor, R Project for Statistical Computing, 2016 [project] [certificate]
Selected Talks [full list]
[Keynote] Building for the Road Ahead: An Ode to Maintainers, the Life Blood of Our Ecosystem Oct 26th, 2022
Invited Keynote Speaker, KubeCon North America [link]
Data Science in the Cloud-Native Era Apr 19th, 2022
Invited Speaker, Open Data Science Conference [link]
[Keynote] When Machine Learning Toolkit for Kubernetes Meets PaddlePaddle Dec 12th, 2021
Invited Keynote Speaker, Wave Summit [link]
Unveil The Secret Ingredients for Argo CD at Enterprise Scale Dec 9th, 2021
Invited Speaker, KubeCon China [link]
Bridging into Python Ecosystem with Cloud-Native Distributed Machine Learning Pipelines Dec 8th, 2021
Speaker, ArgoCon [link]
Towards Cloud-Native Distributed Machine Learning Pipelines at Scale Oct 29th, 2021
Speaker, PyData Global [link]
Large Scale Distributed Deep Learning with Kubernetes Operators May 22nd, 2019
Invited Speaker and AI Media Roundtable Panelist, KubeCon Europe [link]
Considerations for Large Scale Analytics in Production Nov 28th, 2018
Invited Lecturer, Production Scale Implementation of Data Analytics, School of Management, Purdue University [link]
TensorFlow Overview - Why Should Statisticians Care? Oct 18th, 2017
Invited Lecturer, Advanced Machine Learning, Department of Statistics, Purdue University [link]
Introduction to TensorFlow Mar 3rd, 2017
Invited Speaker, American Statistical Association Conference on Recent Advances in Machine Learning [link]
Publications and Patents [Google Scholar]
[Book] Distributed Machine Learning Patterns 2021
  • Yuan Tang
Manning Publications
[Link] [GitHub]
[Book] Dive into Deep Learning (with TensorFlow)《动手学深度学习》 2020
  • Aston Zhang,
  • Zachary C. Lipton,
  • Mu Li,
  • Alexander J. Smola,
  • Anirudh Dagar,
  • Yuan Tang
[Link] [GitHub]
[Patent] System and Method for Distributed Task Execution 2020
  • Yi Wang,
  • Wei Yan,
  • Yuan Tang,
  • Haitao Zhang,
  • Chunyang Wen,
  • Minghao Li,
  • Jun Qi,
  • Yongfeng Liu
China Patent CN110609749A and Hong Kong Patent HK40019539
[PDF]
[Journal] metric-learn: Metric Learning Algorithms in Python 2020
  • William de Vazelhes,
  • CJ Carey,
  • Yuan Tang,
  • Nathalie Vauquier,
  • Aurélien Bellet
Journal of Machine Learning Research
[PDF] [bibtex] [GitHub]
[Preprint] A Scalable and Cloud-Native Hyperparameter Tuning System 2020
  • Johnu George,
  • Ce Gao,
  • Richard Liu,
  • Hou Gang Liu,
  • Yuan Tang,
  • Ramdoot Pydipaty,
  • Amit Kumar Saha
arXiv preprint arXiv:2006.02085
[PDF] [bibtex] [GitHub]
[Patent] Systems and Methods for Detecting and Remedying Software Anomalies 2020
  • Yuan Tang,
  • Tuo Li,
  • James Herzog
United States Patent US10635519B1
[PDF]
[Preprint] SQLFlow: A Bridge between SQL and Machine Learning 2020
  • Yi Wang,
  • Yang Yang,
  • Weiguo Zhu,
  • Yi Wu,
  • Xu Yan,
  • Yongfeng Liu,
  • Yu Wang,
  • Liang Xie,
  • Ziyao Gao,
  • Wenjing Zhu,
  • Xiang Chen,
  • Wei Yan,
  • Mingjie Tang,
  • Yuan Tang
arXiv preprint arXiv:2001.06846
[PDF] [bibtex] [GitHub]
[Journal] lfda: Local Fisher Discriminant Analysis in R 2019
  • Yuan Tang,
  • Wenxuan Li
Journal of Open Source Software
[PDF] [bibtex] [GitHub]
[Journal] dml: Distance Metric Learning in R 2018
  • Yuan Tang,
  • Tao Gao,
  • Nan Xiao
Journal of Open Source Software
[PDF] [bibtex] [GitHub]
[Journal] autoplotly: An R Package for Automatic Generation of Interactive Visualizations for Statistical Results 2018
  • Yuan Tang
Journal of Open Source Software
[PDF] [bibtex] [GitHub]
[Conference] TensorFlow Estimators: Managing Simplicity vs. Flexibility in High-Level Machine Learning Frameworks 2017
  • Heng-Tze Cheng,
  • Lichan Hong,
  • Mustafa Ispir,
  • Clemens Mewald,
  • Zakaria Haque,
  • Illia Polosukhin,
  • Georgios Roumpos,
  • D Sculley,
  • Jamie Smith,
  • David Soergel,
  • Yuan Tang,
  • Philipp Tucker,
  • Martin Wicke,
  • Cassandra Xia,
  • Jianwei Xie
Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
[PDF] [bibtex] [GitHub]
[Book] TensorFlow in Practice《TensorFlow实战》 2017
  • Wenjian Huang,
  • Yuan Tang
Beijing Publishing House of Electronics Industry
[Journal] ggfortify: Unified Interface to Visualize Statistical Result of Popular R Packages 2016
  • Yuan Tang,
  • Masaaki Horikoshi,
  • Wenxuan Li
The R Journal
[PDF] [bibtex] [GitHub]
[Preprint] Incorporating Hierarchical Structure into Dynamic Systems: An Application of Estimating HIV Epidemics at Sub-National and Sub-Population Level 2016
  • Le Bao,
  • Ben Sheng,
  • Xiaoyue Niu,
  • Yuan Tang,
  • Tim Brown,
  • Peter D. Ghys,
  • Jeff W. Eaton
arXiv preprint arXiv:1602.05665
[PDF] [bibtex]
Awards
Multiple Awards by Teams at Alibaba Group 2020 - 2021
- Inner Source Pioneer, Apr 17th, 2021 [certificate]
- Best Pull Request of the Week, May 3rd, 2020
- Top Open Source Contributor of the Year, Jan 20th, 2020
Outstanding China Mainland Books Copyright Exported to Taiwan 2018
The Publishers Association of China
Outstanding Author 2017
Beijing Publishing House of Electronics Industry
Open Source Peer Bonus Award 2016
Google Inc.
Multiple Awards to DataNovo Startup Team 2015 - 2016
- Top 3 Finalist by SXSW Interactive, 2016
- Top Startup Winner by TiE50, 2016
- Trial Support Software Innovation Award by Legaltech News, The Recorder, 2016
- B2B Finalist by Launch Festival, 2015
Best Virtual Reality Hack at HackRPI 2014
Rensselaer Polytechnic Institute
Multiple Awards by Schreyer Honors College at Penn State 2014
- Pre-Eminence in Honors Education Fund ($5,000)
- Summer Research Grants ($1,200)
- NSF MCTP Grant and PMASS Fellowship ($12,800)
- John K. Tsui Honors Scholarship ($5,300)