Introduction

Kubeflow Operator introduction

This guide describes the Kubeflow Operator and its currently supported releases.

Kubeflow Operator

The Kubeflow Operator helps deploy, monitor, and manage the lifecycle of Kubeflow. It is built using the Operator Framework, an open source toolkit for building, testing, and packaging operators and managing their lifecycle.

The operator is currently in the incubation phase and is based on this design doc. It is built on top of the KfDef CR and uses kfctl as the nucleus of its controller. The current roadmap for the operator is listed here. The operator is also published on OperatorHub.

Applications and components to be deployed as part of the Kubeflow platform are defined in the KfDef configuration manifest. Each application has a kustomize configuration with all its resource manifests. The KfDef spec includes the applications field, and each application is specified in a kustomizeConfig field. The parameters and overlays fields may be used to provide custom settings for the application. The repoRef field specifies the path from which to retrieve the application’s kustomize configuration.
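As a minimal sketch of how parameters and overlays sit alongside repoRef inside a single application entry, consider the fragment below. The overlay name, the parameter name and value, and the path are illustrative assumptions, not values prescribed by this guide; use the overlays and parameters that actually exist in the kustomize configuration you reference.

```yaml
spec:
  applications:
  - kustomizeConfig:
      # Overlay names are illustrative; they must exist under the path below.
      overlays:
      - application
      # Name/value pairs are illustrative placeholders.
      parameters:
      - name: namespace
        value: istio-system
      repoRef:
        name: manifests              # must match an entry under spec.repos
        path: istio/istio-install    # hypothetical path within the manifests repo
    name: istio-install
```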

The KfDef spec may also include a plugins field for certain cloud platforms, including AWS and GCP. These platforms use it to run preprocessing tasks before Kubeflow is deployed.
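The fragment below is a rough sketch of what a plugins entry might look like for AWS. It assumes that the plugin kind is named KfAwsPlugin and that its spec accepts a region field, as recalled from the AWS KfDef examples in the manifests repo. Treat both as assumptions and check the KfDef example for your platform before relying on them.

```yaml
spec:
  plugins:
  - kind: KfAwsPlugin      # assumed plugin kind for AWS
    metadata:
      name: aws
    spec:
      region: us-west-2    # placeholder region
```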

An example KfDef is as follows:

apiVersion: kfdef.apps.kubeflow.org/v1
kind: KfDef
metadata:
  namespace: kubeflow
spec:
  applications:
  # Install Istio
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: stacks/ibm/application/istio-stack
    name: istio-stack
  # Install Kubeflow applications.
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: stacks/ibm
    name: kubeflow-apps
  # Other applications
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: stacks/ibm/application/spark-operator
    name: spark-operator
  # Model Serving applications
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: knative/installs/generic
    name: knative
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: kfserving/installs/generic
    name: kfserving
  repos:
  - name: manifests
    uri: https://github.com/kubeflow/manifests/archive/master.tar.gz
  version: master

More KfDef examples may be found in the Kubeflow manifests repo. Users can pick one and modify it to fit their requirements. The OpenDataHub project also maintains a KfDef manifest for Kubeflow deployment on OpenShift Container Platform.

The operator watches all KfDef configuration instances in the cluster as custom resources (CRs) and manages them. It handles reconcile requests for all KfDef instances. To understand more about the operator controller's behavior, refer to the controller-runtime documentation.

The Kubeflow Operator shares the same packages and functions as the kfctl CLI, which is the command-line approach to deploying Kubeflow. The deployment flow is therefore similar, except that the operator adds ownerReferences metadata to each application’s Kubernetes objects, making the KfDef CR the parent of all these objects. This makes the operator better than the CLI approach at tearing down a Kubeflow deployment: when the KfDef CR is deleted, the Kubernetes garbage collection mechanism takes over and removes all of the resources deployed through this KfDef configuration, and only those.
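To illustrate the ownership link, the fragment below sketches how a child object's metadata might look once it points back to its KfDef parent. The Deployment name, the KfDef name, and the uid are hypothetical placeholders. Only the four required owner reference fields are shown, because this guide does not state which optional fields the operator sets.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app          # hypothetical child resource
  namespace: kubeflow
  ownerReferences:
  - apiVersion: kfdef.apps.kubeflow.org/v1
    kind: KfDef
    name: kubeflow           # hypothetical name of the owning KfDef instance
    # Placeholder; in a real cluster this is the uid of the owning KfDef object.
    uid: 00000000-0000-0000-0000-000000000000
```

When the owner named in ownerReferences is deleted, the Kubernetes garbage collector removes dependents such as this one.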

One of the many good reasons to use an operator is to monitor resources. The Kubeflow Operator also watches all child resources of the KfDef CR. If any of these resources is deleted, the operator tries to reapply the resource manifest and bring the object up again.

The operator responds to the following events:

  • When a KfDef instance is created or updated, the operator’s reconciler is notified of the event and invokes the Apply functions provided by the kfctl package to deploy Kubeflow. The Kubeflow resources specified in the manifests are owned by the KfDef instance, with their ownerReferences set accordingly.

  • When a KfDef instance is deleted, all the secondary resources it owns are deleted through garbage collection, because their owner no longer exists. Meanwhile, the reconciler is notified of the event and removes the finalizers.

  • When any resource deployed as part of a KfDef instance is deleted, the operator’s reconciler is notified of the event and invokes the Apply functions provided by the kfctl package to redeploy Kubeflow. The deleted resource is recreated from the same manifest that was specified when the KfDef instance was created.

Deploying Kubeflow with the Kubeflow Operator involves two steps: installing the Kubeflow Operator, then deploying the KfDef custom resource.

Current Tested Operators and Pre-built Images

The Kubeflow Operator controller logic is based on the kfctl package, so for each major release of kfctl, an operator image is built and tested with the matching version of the manifests to deploy a KfDef instance. The following table shows which releases have been tested.

branch tag  operator image                       manifests version  kfdef example                note
v1.0        aipipeline/kubeflow-operator:v1.0.0  1.0.0              kfctl_k8s_istio.v1.0.0.yaml
v1.0.1      aipipeline/kubeflow-operator:v1.0.1  1.0.1              kfctl_k8s_istio.v1.0.1.yaml
v1.0.2      aipipeline/kubeflow-operator:v1.0.2  1.0.2              kfctl_k8s_istio.v1.0.2.yaml
v1.1.0      aipipeline/kubeflow-operator:v1.1.0  1.1.0              kfctl_ibm.v1.1.0.yaml
master      aipipeline/kubeflow-operator:master  master             kfctl_ibm.yaml               as of 07/29/2020

Note: To build a customized operator for a specific version of Kubeflow, run git checkout on the corresponding branch tag, and make sure to use the matching version of the manifests.
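On the KfDef side, matching the manifests version amounts to pointing the repos entry at a release archive rather than master. The fragment below is a sketch that assumes a v1.1.0 tag in the manifests repository and the usual GitHub archive URL pattern. The kfdef example file listed for your release is the authoritative source for the exact reference.

```yaml
spec:
  repos:
  - name: manifests
    # Assumed tag name; confirm it against the kfdef example for your release.
    uri: https://github.com/kubeflow/manifests/archive/v1.1.0.tar.gz
  version: v1.1.0
```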


Last modified 20.08.2021: Fix typos in docs (#2884) (e1639edb)