Run Kubeflow natively on Docker Desktop for Mac or Windows
This is a guest post by Alex Iankoulski, Docker Captain and full stack software and infrastructure architect at Shell New Energies. The views expressed here are his own and are neither opposed nor endorsed by Shell or Docker.
In this blog, I will show you how to use Docker Desktop for Mac or Windows to run Kubeflow. To make this easier, I used my Depend on Docker project, which you can find on GitHub.
Rationale
Even though we are experiencing a tectonic shift of development workflows in the cloud era towards hosted and remote environments, a substantial amount of work and experimentation still happens on developers’ local machines. The ability to scale down allows us to mimic a cloud deployment locally and enables us to play, learn quickly, and make changes in a safe, isolated environment. A good example of this rationale is provided by Kubeflow and MiniKF.
Overview
Since Kubeflow was first released by Google in 2018, adoption has increased significantly, particularly in the data science world for orchestration of machine learning pipelines. There are various ways to deploy Kubeflow both on desktops and servers as described in its Getting Started guide. However, the desktop deployments for Mac and Windows rely on running virtual machines using Vagrant and VirtualBox. If you do not wish to install Vagrant and VirtualBox on your Mac or PC but would still like to run Kubeflow, then you can simply depend on Docker! This article will show you how to deploy Kubeflow natively on Docker Desktop.
Setup
Prerequisites
Kubeflow has a hard dependency on Kubernetes and the Docker runtime. The easiest way to satisfy both of these requirements on Mac or Windows is to install Docker Desktop (version 2.1.x.x or higher). In the settings of Docker Desktop, navigate to the Kubernetes tab and check “Enable Kubernetes”:
Enabling the Kubernetes feature in Docker Desktop creates a single node Kubernetes cluster on your local machine.
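Before going further, it can help to confirm that the cluster is reachable. The snippet below is a minimal sketch, assuming kubectl is on your PATH and that the Kubernetes context is named docker-desktop (older Docker Desktop releases may use a different name, such as docker-for-desktop). The check_cluster function name is illustrative and not part of any tool.

```shell
#!/usr/bin/env bash
# Sketch: confirm that kubectl can reach the Docker Desktop cluster.
# Assumption: the context is called "docker-desktop"; pass another name
# as the first argument if yours differs.
check_cluster() {
  local context="${1:-docker-desktop}"
  if ! command -v kubectl >/dev/null 2>&1; then
    echo "kubectl not found on PATH" >&2
    return 1
  fi
  # Switch to the local context, then list the nodes of the cluster.
  kubectl config use-context "$context" && kubectl get nodes
}

# Usage:
#   check_cluster
#   check_cluster docker-for-desktop
```

If everything is in place, the node list should contain a single node, matching the single node cluster described above.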
This article offers a detailed walkthrough of setting up Kubeflow on Docker Desktop for Mac. Deploying Kubeflow on Docker Desktop for Windows using Linux containers requires two additional prerequisites:
- Linux shell – to run the bash commands from the Kubeflow installation instructions
- Kfctl and kubectl CLI – to initialize, generate, and apply the Kubeflow deployment
The easiest way to satisfy both of these dependencies is to run a Linux container that has the kfctl and kubectl utilities. A Depend on Docker project was created for this purpose. To start a bash shell with the two CLIs available, just execute:
docker run -it --rm -v <kube_config_folder_path>:/root/.kube iankoulski/kfctl bash
The remaining setup steps for both Mac and Windows are the same.
Resource Requirements
The instructions for deployment of Kubeflow on a pre-existing Kubernetes cluster specify the following resource requirements:
- 4 vCPUs
- 50 GB storage
- 12 GB memory
The settings in Docker Desktop need to be adjusted to accommodate these requirements as shown below.
Note that the settings are adjusted to more than the minimum required resources to accommodate system containers and other applications that may be running on the local machine.
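To double-check the adjusted settings from the command line, the CPU count and memory visible to Docker can be compared with the minimums listed above. This is a sketch under stated assumptions: check_resources is a hypothetical helper, the thresholds simply mirror the numbers quoted in this article, and storage is not checked.

```shell
#!/usr/bin/env bash
# Sketch: compare the resources visible to Docker with the minimums above.
# MIN_CPUS and MIN_MEM_GB mirror the requirements quoted in this article.
MIN_CPUS=4
MIN_MEM_GB=12

# check_resources <cpus> <memory_bytes>
# Prints one line per resource and returns non-zero if either is too low.
check_resources() {
  local cpus="$1" mem_bytes="$2" status=0
  local mem_gb=$(( mem_bytes / 1024 / 1024 / 1024 ))
  if [ "$cpus" -ge "$MIN_CPUS" ]; then
    echo "CPUs: $cpus (ok)"
  else
    echo "CPUs: $cpus (need at least $MIN_CPUS)"
    status=1
  fi
  if [ "$mem_gb" -ge "$MIN_MEM_GB" ]; then
    echo "Memory: ${mem_gb} GiB (ok)"
  else
    echo "Memory: ${mem_gb} GiB (need at least $MIN_MEM_GB)"
    status=1
  fi
  return "$status"
}

# Usage (requires a running Docker daemon):
#   check_resources "$(docker info --format '{{.NCPU}}')" \
#                   "$(docker info --format '{{.MemTotal}}')"
```

The memory total reported by Docker may be slightly below the configured value, so this check errs on the conservative side. That is consistent with setting the resources above the bare minimum.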
Deployment
We will follow instructions for the kfctl_k8s_istio configuration.
- Download your preferred version from the release archive:
curl -L -o kfctl_v0.6.2_darwin.tar.gz \
  https://github.com/kubeflow/kubeflow/releases/download/v0.6.2/kfctl_darwin.tar.gz
- Extract the archive:
tar -xvf kfctl_v0.6.2_darwin.tar.gz
- Set environment variables:
export PATH=$PATH:$(pwd)
export KFAPP=localkf
export CONFIG=https://raw.githubusercontent.com/kubeflow/kubeflow/v0.6-branch/bootstrap/config/kfctl_k8s_istio.0.6.2.yaml
- Initialize deployment:
kfctl init ${KFAPP} --config=${CONFIG}
cd ${KFAPP}
kfctl generate all -V
Note: The above instructions are for Kubeflow release 0.6.2 and are meant as an example. Other releases may have slightly different archive filenames, environment variable names and values, and kfctl commands. Those are available in the release-specific deployment instructions.
- Pre-pull container images (optional)
To facilitate the deployment of Kubeflow locally, we can pre-pull all required Docker images. When the container images are already present on the machine, the memory usage of Docker Desktop stays low. Pulling all images at the time of deployment may cause large spikes in memory utilization and can cause the Docker daemon to run out of resources. Pre-pulling images is especially helpful when running Kubeflow on a 16 GB laptop.
To pre-pull all container images, execute the following one-line script in your $KFAPP/kustomize folder:
for i in $(grep -R image: . | cut -d ':' -f 3,4 | sed -e 's/ //' -e 's/^"//' -e 's/"$//' | sort -u); do echo "Pulling $i"; docker pull "$i"; done
Depending on your Internet connection, this could take several minutes to complete. Even if Docker Desktop runs out of resources, restarting it and running the script again will resume pulling the remaining images from where you left off.
If you are using the kfctl container on Windows, you may wish to modify the one-line script above so that it saves the docker pull commands to a file, and then execute that file from your preferred Docker shell.
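One way to make that modification is sketched below. It reuses the same grep, cut, and sed pipeline, but writes one docker pull line per image to a file instead of pulling directly. The names list_images, write_pull_script, and pull-images.sh are illustrative assumptions, not part of kfctl.

```shell
#!/usr/bin/env bash
# Sketch: write the docker pull commands to a file instead of running them.

# list_images <dir>: print each unique image reference found under <dir>.
list_images() {
  grep -R "image:" "$1" \
    | cut -d ':' -f 3,4 \
    | sed -e 's/ //' -e 's/^"//' -e 's/"$//' \
    | sort -u
}

# write_pull_script <dir> <output_file>: one "docker pull" line per image.
write_pull_script() {
  list_images "$1" | while read -r image; do
    if [ -n "$image" ]; then
      echo "docker pull $image"
    fi
  done > "$2"
}

# Usage, from the $KFAPP/kustomize folder inside the kfctl container:
#   write_pull_script . pull-images.sh
```

The resulting file is a plain list of docker pull commands, so it can be copied out of the container and run from whichever shell has access to your Docker daemon.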
- Apply Kubeflow deployment to Kubernetes:
cd ${KFAPP}
kfctl apply all -V
Note: An existing deployment can be removed by executing “kfctl delete all -V”
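After applying the deployment, it takes a while for all pods to start. The sketch below counts the pods that have not yet reached the Running or Completed status. It assumes Kubeflow was deployed into a namespace called kubeflow, and count_pending is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: count pods that are not yet Running or Completed.
# Input is the output of "kubectl get pods --no-headers", where the third
# column holds the pod status.
count_pending() {
  awk '$3 != "Running" && $3 != "Completed" { n++ } END { print n + 0 }'
}

# Usage (assumes the "kubeflow" namespace):
#   kubectl get pods -n kubeflow --no-headers | count_pending
# Repeat until the count reaches 0, or watch progress with:
#   kubectl get pods -n kubeflow --watch
```

A count of zero only means that every pod has started or finished. Individual containers may still need a little longer before they report ready.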
- Determine the Kubeflow endpoint
To determine the endpoint, list all services in the istio-system namespace:
kubectl get svc -n istio-system
Kubeflow is exposed through the Istio ingress gateway service, on the NodePort that maps to the default HTTP port (80). In this deployment, the NodePort number is 31380. To access Kubeflow, use: http://127.0.0.1:31380
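If the port mapping on your machine differs, the NodePort can be read from the PORT(S) column of the service listing. The following sketch parses a value in that column's format. The node_port_for helper is hypothetical, and the sample string is illustrative apart from the 80:31380 mapping mentioned above.

```shell
#!/usr/bin/env bash
# Sketch: pick the NodePort mapped to a given service port out of the
# PORT(S) column printed by "kubectl get svc",
# for example "80:31380/TCP,443:31390/TCP".
node_port_for() {
  local port="$1" ports="$2"
  # Split the comma-separated entries, then split each entry on ":" and "/".
  echo "$ports" | tr ',' '\n' | awk -F'[:/]' -v p="$port" '$1 == p { print $2 }'
}

# Usage:
#   node_port_for 80 "80:31380/TCP,443:31390/TCP"
# Then check that the endpoint answers:
#   curl -s -o /dev/null -w '%{http_code}\n' "http://127.0.0.1:31380"
```

Entries without a NodePort, such as a bare "15020/TCP", are ignored, because their second field is the protocol and their first field only matches if you ask for that exact port.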
Using Kubeflow
The Kubeflow central dashboard is now accessible:
We can run one of the sample pipelines that is included in Kubeflow. Select Pipelines, then Experiments, and choose Conditional expression (or just click the [Sample] Basic – Conditional expression link on the dashboard screen).
Next, click the +Create run button, enter a name (e.g. conditional-execution-test), choose an experiment, and then click Start to initiate the run. Navigate to your pipeline by selecting it from the list of runs.
The completed pipeline run looks similar to Fig. 9 above. Due to the random nature of the coin flip in this pipeline, your actual output is likely to be different. Select a node in the graph to review various assets associated with that node, including its logs.
Conclusion
Docker Desktop enables you to easily run container applications on your local machine, including ones that require a Kubernetes cluster. Kubeflow is a deployment that typically targets larger clusters either in cloud or on-prem environments. In this article we’ve demonstrated how to deploy and use Kubeflow locally on your Docker Desktop.
References
- Docker Desktop
- About Kubeflow
- MiniKF Rationale
- Kubernetes
- Kubeflow Getting Started
- Vagrant
- VirtualBox
- Kubeflow deployment instructions
- Depend on Docker project
- Kfctl container image
Credits
I’d like to thank the following people for their help with this post and related topics:
- Yannis Zarkadas, Arrikto
- Constantinos Venetsanopoulos, Arrikto
- Josh Bottum, Arrikto
- Fabio Nonato de Paula, Shell
- Jenny Burcio, Docker
- David Aronchick, Microsoft
- Stephen Turner, Docker
- David Friedlander, Docker