Awesome Kubeflow
🔔 NEWS:
- Join us at Kubeflow Summit, colocated with KubeCon/CloudNativeCon Europe 2025. Register now!
- Kubeflow v1.9 released.
- Videos from Kubeflow Summit Europe 2024 are available.
- Kubeflow Steering Committee announced.
- The book Distributed Machine Learning Patterns from Manning Publications (uses Kubeflow) is officially published.
- Kubeflow is now an incubating project in CNCF.
A curated list of awesome projects and resources related to Kubeflow, a Cloud Native Computing Foundation (CNCF) incubating project (announcement).
What is Kubeflow?
The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable.
Ecosystem Projects
Main projects in Kubeflow:
- Kubeflow Main Repository provides the front-end for accessing the major components of Kubeflow.
- Katib is a Kubernetes-native project for automated machine learning (AutoML).
- Pipelines is a platform for building and deploying portable, scalable machine learning workflows on Kubernetes.
- Training Operator provides Kubernetes custom resources that make it easy to run distributed or non-distributed TensorFlow/PyTorch/Apache MXNet/XGBoost/MPI jobs on Kubernetes.
- Arena is a CLI for Kubeflow.
Other open source projects that use or integrate with Kubeflow:
- Argo Workflows is a container-native workflow engine for orchestrating parallel jobs on Kubernetes.
- Couler provides a unified interface for constructing and managing workflows on different workflow engines.
- deployKF effortlessly integrates Kubeflow and leading MLOps tools on Kubernetes into open ML platforms.
- Kale aims to simplify the data science experience of deploying Kubeflow Pipelines workflows.
- Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code.
- KServe is a standardized serverless ML inference platform on Kubernetes.
- MLRun is an open MLOps platform for quickly building and managing continuous ML applications across their lifecycle.
- ModelDB is an open-source system to version machine learning models, including their ingredients (code, data, config, and environment), and to track ML metadata across the model lifecycle.
- Polyaxon is a platform for building, training, and monitoring large-scale deep learning applications.
- Seldon is an MLOps framework to package, deploy, monitor and manage thousands of production machine learning models.
- SQLFlow extends SQL to support AI and compiles the SQL program to a workflow that runs on Kubernetes.
- ZenML is a framework to build portable, production-ready MLOps pipelines.
- Elyra is a set of AI-centric extensions to JupyterLab Notebooks that includes a visual pipeline editor.
- Pipeline Editor is a web app that allows users to build and run machine learning pipelines using drag and drop. A VS Code extension is also available.
- WizStudio is a web-based tool that allows users to build Kubeflow pipelines using a drag-and-drop interface.
Books
- Continuous Machine Learning with Kubeflow introduces modern machine learning infrastructure, including Kubernetes and the Kubeflow architecture. The book explains the fundamentals of deploying various AI/ML use cases with TensorFlow training and serving on Kubernetes, and how Kubernetes can help with specific projects from start to finish.
- 🔥[New!] Distributed Machine Learning Patterns teaches you how to take machine learning models from your personal laptop to large distributed clusters. You’ll explore key concepts and patterns behind successful distributed machine learning systems, and learn technologies like TensorFlow, Kubernetes, Kubeflow, and Argo Workflows with real-world scenarios and hands-on projects.
- Kubeflow for Machine Learning: From Lab to Production helps data scientists build production-grade machine learning implementations with Kubeflow and shows data engineers how to make models scalable and reliable.
- Kubeflow in Action: End-to-End Machine Learning is an authoritative hands-on guide to deploying machine learning to production using the Kubeflow MLOps platform.
- Kubeflow Operations Guide: Managing Cloud and On-Premise Deployment shows data scientists, data engineers, and platform architects how to plan and execute a Kubeflow project to make their Kubernetes workflows portable and scalable.
Blog Posts
Please check out the official Kubeflow Project blog. Additional blog posts:
- Data Science Meets Devops: MLOps with Jupyter, Git, & Kubernetes
- Elastic Training with MPI Operator and Practice
- Enabling Kubeflow with Enterprise-Grade Auth for On-Premise Deployments
- GitOps for Kubeflow using Argo CD
- Hardening Kubeflow Security for Enterprise Environments
- Humans of Cloud Native: From Argo to Mentoring and Everything In Between
- Introduction to Kubeflow MPI Operator and Industry Adoption
- KServe: The Next Generation of KFServing
- Kubeflow & Kale Simplify Building Better ML Pipelines With Automatic Hyperparameter Tuning
- Kubeflow’s 1.4 Release Lays the Foundation for Advanced ML Metadata Workflows
- Kubeflow 1.0 - Cloud Native ML for Everyone
- Kubeflow 1.1 Improves ML Workflow Productivity, Isolation & Security, and GitOps
- Kubeflow Continues to Move into Production
- Kubeflow Has Applied To Become a CNCF Incubating Project
- Kubeflow Katib: Scalable, Portable and Cloud Native System for AutoML
- Kubeflow v1.5 Improves ML Model Accuracy, Reduces Infrastructure Costs and Optimizes MLOps
- Kubeflow v1.6 Delivers Support for Kubernetes v1.22 and Introduces an Alpha Release of the Kubeflow Pipeline v2 Functionality
- Kubeflow 1.9: New Tools for Model Management and Training Optimization
- Kubeflow Welcomes Two Google Summer of Code Students
- Kubeflow’s 2nd Doc Sprint: 10+ New Docs & Samples Ahead of Kubeflow 1.0
- Kubeflow is More Accessible than Ever
- Operationalize, Scale and Infuse Trust in AI Models using KFServing
- Open Source AI at Red Hat: Our Journey in the Kubeflow Community
- Record Metadata on Kubeflow from Notebooks
- Running Kubeflow at Intuit: Enmeshed in the Service Mesh
- Scalable and Cloud-Native Hyperparameter Tuning System
- The Kubeflow 1.3 Release Streamlines ML Workflows and Simplifies ML Platform Operations
- Unified Training Operator Release Announcement
- ZenML + Kubernetes + Kubeflow: Leveraging your MLOps infrastructure
Videos
Please check out the official Kubeflow YouTube channel.
Kubeflow Summit talks playlist
Additional videos
- Best Practices for Deploying LLM Inference, RAG and Fine Tuning Pipelines
- Optimizing Load Balancing and Autoscaling for Large Language Model (LLM) Inference on Kubernetes
- Engaging the KServe Community, The Impact of Integrating a Solutions with Standardized CNCF Projects
- Advancing Cloud Native AI Innovation Through Open Collaboration
- Unlocking Potential of Large Models in Production
- WG Serving: Accelerating AI/ML Inference Workloads on Kubernetes
- Kubernetes Working Group Serving, with Yuan Tang and Eduardo Arango
- A 10 Minute Introduction to Kubeflow: Basics, Architecture & Components
- Accelerate ML Model Development for Autonomous Vehicles in Aurora
- Accelerating Machine Learning App Development with Kubeflow Pipelines
- A Simple, NVIDIA-accelerated Kubeflow Pipeline
- A Tour of Katib’s new UI for Kubeflow 1.3
- AutoML and Training WG Summit 2021
- Bridging into Python Ecosystem with Cloud-Native Distributed Machine Learning Pipelines
- Building a Machine Learning Pipeline with Kubeflow
- Building and Managing a Centralized Kubeflow Platform at Spotify
- Building an ML Application Platform from the Ground Up
- Building AutoML Pipelines With Argo Workflows and Katib
- Building end-to-end ML workflows with Kubeflow Pipelines
- Building Real Time Image Classification with Kubeflow Orchestrator
- Building Together: Community in Kubeflow
- Charmed for Kubeflow: A Distribution for Everybody
- Cloud Native AutoML with Argo Workflows and Katib
- Converting Kaggle Competitions into Kubeflow Pipelines
- DGL Operator and Graph Training
- Distributed Training and HPO Deep Dive
- Engineering Cloud Native AI Platform
- Enterprise MLOps using Kubeflow with DKube
- Experiment Tracking with Kubeflow
- Feast: Feature Storage for Machine Learning
- From Notebook to Kubeflow Pipelines to KFServing: the Data Science Odyssey
- From Notebook to Kubeflow Pipelines with HP Tuning
- From Notebook to Kubeflow Pipelines with MiniKF & Kale
- From Zero to Kubeflow
- Hiding Kubernetes Complexity for ML Engineers Using Kubeflow
- Hyperparameter Tuning Using Kubeflow
- Hyperparameter Tuning with Katib
- Introducing Couler: Unified Interface for Constructing and Managing Workflows
- Katib and Training Operator
- Katib User Journey
- KFServing: Enabling Serverless Workloads Across Model Frameworks
- KServe: The State and Future of Cloud Native Model Serving
- Kubeflow & Alibaba Arena
- Kubeflow & TFX
- Kubeflow 101 from Google Cloud
- Kubeflow: Machine Learning on Kubernetes
- Kubeflow and the ML Landscape
- Kubeflow Experiments at LinkedIn
- Kubeflow Fairing
- Kubeflow for Enterprise – Samsung Case
- Kubeflow inference on knative
- Kubeflow Katib & Hyperparameter Tuning
- Kubeflow Pipelines 2.0: Introduction & Roadmap
- Kubeflow Universal Training Operator
- Kubeflow vs SageMaker in Machine Learning
- Machine Learning as Code: GitOps for ML with Kubeflow and ArgoCD
- Managing Thousands of Automatic Machine Learning Experiments with Argo and Katib
- MiniKF: The Fastest and Easiest Way to a Local Kubeflow
- MLOps and AutoML in Cloud-Native Way with Kubeflow and Katib
- ModelDB: Open-source Model Management
- Model Monitoring for Model Trained and Served on Kubeflow
- Multi-user Kubeflow Environments
- Nested Workflows in Kubeflow Pipelines
- Neural Architecture Search System on Kubeflow
- New UI for Kubeflow components
- Orchestrating Apache Spark with Kubeflow on Kubernetes
- Paddle Operator and EDL Introduction
- Production-Ready AI Platform on Kubernetes
- Roblox User Story
- Serverless Magic for ML Orchestration using Kubeflow
- Taming Your AI/ML Workloads with Kubeflow
- Tour of New Katib UI
- Towards Cloud-Native Distributed Machine Learning Pipelines at Scale
- Training and Serving ML Model using Kubeflow
- Understanding the Earth: Machine Learning with Kubeflow Pipelines
- Using Pipelines in Katib
- When Machine Learning Toolkit for Kubernetes Meets PaddlePaddle
Community
- Community Calendar
- Kubeflow Steering Committee (KSC)
- Working Groups
- GitHub Organization
- Community Governance
- Community User Surveys (2024, 2023, 2022, 2019 Fall, 2019 Spring)
Social media accounts: