There are a number of deployment options for installing Kubeflow with AWS service integrations.
The following installation guides assume that you have an existing Kubernetes cluster. To get started with creating an Amazon Elastic Kubernetes Service (EKS) cluster, see Getting started with Amazon EKS - eksctl. To verify compatibility between EKS Kubernetes and Kubeflow versions during setup, see Amazon EKS and Kubeflow Compatibility.
Note: Use a Kubernetes cluster with compatible tool versions and sufficient compute power. For more information, see the specific prerequisites for your chosen deployment option.
If you experience any issues with installation, see Troubleshooting Kubeflow on AWS.
Read on to explore the AWS-integrated deployment options.
Components configured for Cognito, RDS and S3
There is a single guide for deploying Kubeflow on AWS with RDS, S3, and Cognito.
Vanilla version with Dex for auth and EBS volumes as PV
Components configured for RDS and S3
Components configured for Cognito
Additional component integrations
Using EFS with Kubeflow
Amazon EFS supports the ReadWriteMany access mode, which means the volume can be mounted as read-write by many nodes. This is useful for creating a shared filesystem that can be mounted into multiple pods, as you may have with Jupyter. For example, one group can share datasets or models across an entire team.
Refer to the Amazon EFS example for more information.
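As a rough illustration of the ReadWriteMany access mode described above, the sketch below writes a PersistentVolumeClaim manifest to a local file. The claim name `shared-datasets`, the storage class name `efs-sc`, and the requested size are assumptions made for illustration and are not taken from this guide. The storage class must match one that is backed by Amazon EFS in your own cluster.

```shell
# Write a PVC manifest requesting a volume that many nodes can mount read-write.
# NOTE: the claim name and storageClassName below are hypothetical placeholders.
cat > efs-claim.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-datasets
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi
EOF

# Once a matching storage class exists, the claim could be applied with, e.g.:
# kubectl apply -n <your-profile-namespace> -f efs-claim.yaml
```

Several notebook pods in the same namespace could then reference this one claim to share the same files.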
Using FSx for Lustre with Kubeflow
Amazon FSx for Lustre provides a high-performance file system optimized for fast processing of machine learning and high performance computing (HPC) workloads. Lustre also supports ReadWriteMany. One difference between Amazon EFS and Lustre is that Lustre can be used to cache training data with direct connectivity to Amazon S3 as the backing store. With this configuration, you don’t need to transfer data to the file system before using the volume.
Refer to the Amazon FSx for Lustre example for more details.
AWS uses customer feedback and usage information to improve the quality of the services and software we offer to customers. We have added usage data collection to the AWS Kubeflow distribution in order to better understand customer usage and guide future improvements. Usage tracking for Kubeflow is activated by default, but is entirely voluntary and can be deactivated at any time.
Usage tracking for Kubeflow on AWS collects the instance ID used by one of the worker nodes in a customer’s cluster. This data is sent back to AWS once per day. Usage tracking only collects the EC2 instance ID where Kubeflow is running and does not collect or export any other data to AWS. If you wish to deactivate this tracking, instructions are below.
How to activate usage tracking
Usage tracking is activated by default. If you deactivated usage tracking for your Kubeflow deployment and would like to activate it after the fact, you can do so at any time with the following command:
kustomize build distributions/aws/aws-telemetry | kubectl apply -f -
How to deactivate usage tracking
Before deploying Kubeflow:
You can deactivate usage tracking by skipping the telemetry component installation in one of two ways:
- For single line installation, comment out the aws-telemetry line in the kustomization.yaml file, for example in the cognito-rds-s3 kustomization.yaml file.
- For individual component installation, do not install the aws-telemetry component. That is, skip the following step:
# AWS Telemetry - This is an optional component. See usage tracking documentation for more information
kustomize build distributions/aws/aws-telemetry | kubectl apply -f -
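For the single line installation option, commenting out the telemetry entry can also be scripted. The sketch below is only an illustration under stated assumptions: it runs against a made-up sample file, because the real kustomization.yaml for your deployment option lists different resources, and the file name and resource paths here are hypothetical. Review the result before building.

```shell
# Hypothetical sample file for illustration only; the resource paths are
# placeholders and do not reflect the real kustomization.yaml contents.
cat > kustomization.sample.yaml <<'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../some-component
  - ../aws-telemetry
EOF

# Prefix every not-yet-commented line that mentions aws-telemetry with "# ",
# preserving indentation and keeping a .bak copy of the original file.
sed -i.bak '/aws-telemetry/ s/^\([[:space:]]*\)\([^#[:space:]]\)/\1# \2/' kustomization.sample.yaml

# Show the affected line so the change can be checked by eye.
grep -n 'aws-telemetry' kustomization.sample.yaml
```

Lines that are already commented out are left unchanged, so the command can be run more than once.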
After deploying Kubeflow:
To deactivate usage tracking on an existing deployment, delete the aws-kubeflow-telemetry cronjob with the following command:
kubectl delete cronjob -n kubeflow aws-kubeflow-telemetry
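To confirm whether the telemetry cronjob is present, before or after running the command above, a small check along the following lines may help. It uses the cronjob name and namespace given in this section; the helper name `telemetry_status` is hypothetical and not part of the distribution.

```shell
# Report whether the telemetry cronjob can be found in the kubeflow namespace.
# NOTE: telemetry_status is a hypothetical helper name used for illustration.
telemetry_status() {
  if kubectl get cronjob -n kubeflow aws-kubeflow-telemetry >/dev/null 2>&1; then
    echo "telemetry cronjob present"
  else
    # kubectl also fails when the cluster is unreachable or access is denied,
    # so this result means "not found", not a guarantee that tracking is off.
    echo "telemetry cronjob not found"
  fi
}
```

Call `telemetry_status` from a shell that has access to the cluster.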
Information collected by usage tracking
- Instance ID - We collect the instance ID used by one of the worker nodes in the customer’s EKS cluster. This collection occurs once per day.
The telemetry data we collect is in accordance with AWS data privacy policies. For more information, see the following:
Kubeflow provides multi-tenancy support, and users are not able to create notebooks in either the kubeflow or default namespaces. For more information, see Multi-Tenancy.
Automatic profile creation is not enabled by default. To create profiles as an administrator, see Manual profile creation.