Jun 02, 2020

NVIDIA K8s Device Plugin for Wind River Linux

By Pablo Rodriguez Quesada


This article is part two of a series on orchestrating container workloads on NVIDIA GPUs. Read the first part, NVIDIA container runtime for Wind River Linux.

Introduction

The advent of containers has changed the way computational workloads are managed and orchestrated in modern computing environments. Given the paradigm shift toward microservices, container orchestration has become of critical importance in today’s distributed and cloud systems [1].

Managing edge devices at the scale of hundreds or thousands of nodes is an onerous task. Fortunately, orchestrators such as Kubernetes take the complexity out of updates, rollbacks, and more in a platform-agnostic environment [2]. Orchestrators provide the means to manage heterogeneous edge clusters. It is necessary not only to orchestrate containers but also to discover the specialized hardware devices that the containers and the orchestrator can leverage. Failing to manage these resources can lead to inefficiency, wasted time, concurrency issues, and more.

Background

In the context of Service-Oriented Architectures (SOA), orchestration is the automated configuration, coordination, and management of computer systems and software. Orchestration provides an automation model where process logic is centralized yet still extensible and composable. The use of this paradigm can significantly reduce the complexity of solution environments [3]. Containers provide a convenient means to deploy software in a managed and controlled way. The controlled deployment of containers is achieved using Kubernetes, making computing at the edge completely cloud-native, intelligent, scalable, and secure [2].

Kubernetes

Kubernetes (K8s) is a portable, extensible, open-source platform orchestrator for managing containerized workloads and services. It facilitates both declarative configuration and automation [4].

Two types of nodes form a K8s cluster: master nodes and worker nodes. K8s master nodes run essential cluster services. Worker nodes run the scheduled containerized workloads in units called pods (see Figure 1 below). Pods are the smallest deployable computing units that can be created and managed in Kubernetes; they encapsulate one or more containers that share resources, including a single IP address [5].

Kubernetes architecture overview
Figure 1. Kubernetes architecture overview [6].

Device Plugins

Nodes with specialized hardware need to make the orchestrator aware of it so that the orchestrator can manage the resources and control application concurrency. The K8s community developed an interface, called device plugins, to address this need.

A K8s device plugin is a gRPC Remote Procedure Calls (gRPC) server that adds support for vendor-specific devices. A plugin handles discovery and health checks of the devices, hooks the container runtime to make the devices available inside containers, and performs cleanup. K8s designed these servers to run outside the core cluster components so that they remain independent and vendors can customize them to their needs [7].

When setting up a cluster, an administrator knows what kind of devices are present on the different machines and can install the device plugin to manage the resources automatically. The plugin detects the devices and advertises the resources to the K8s cluster through a code name. On the end-user side, the application specifies its hardware requirements, and the cluster allocates the application to the specific resource on the best node available [7].
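To make the end-user side concrete, here is a minimal sketch of a pod spec that requests one unit of an advertised resource through its resource limits (the pod name and image are hypothetical; nvidia.com/gpu is the code name used later in this tutorial):

```shell
# Sketch: generate a pod spec requesting one unit of a device-plugin
# resource (hypothetical pod name and image).
cat << 'EOF' > resource-request.yml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-consumer
spec:
  restartPolicy: OnFailure
  containers:
  - name: app
    image: example/gpu-app:latest
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
```

The scheduler only places such a pod on a node whose device plugin has advertised at least one free nvidia.com/gpu unit.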

K3S

In this tutorial, we use K3s, a custom K8s distribution developed by Rancher Labs with a focus on the edge. It has many enhancements that make it suitable for this project; for example, it is lightweight and packaged in a single binary, which makes it a good fit for embedded systems [8].

Another enhancement that K3s offers is a lightweight storage backend built on top of sqlite3, along with minimized versions of its dependencies. These dependencies are tailored for the embedded world and still cover all the basic functionality of a K8s cluster. As an example, Rancher Labs developed its own storage driver, the Local Path Provisioner, which enables the creation of persistent volume claims out of the box using local storage on the respective node [8]. These improvements and more make the Rancher version of K8s suitable for the needs of this tutorial.
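As a sketch of what the Local Path Provisioner enables, a PersistentVolumeClaim against K3s's bundled local-path storage class requires no extra setup (the claim name and size here are hypothetical; this claim is not needed for the rest of the tutorial):

```shell
# Sketch: a PVC served by K3s's bundled Local Path Provisioner
# (hypothetical claim name and size).
cat << 'EOF' > local-path-pvc.yml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-claim
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 1Gi
EOF
```

On a running K3s node, kubectl apply -f local-path-pvc.yml would create the claim, and the provisioner backs it with a directory on that node's local storage.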

NVIDIA Containers Orchestration

The NVIDIA device plugin for K8s is a DaemonSet that allows the cluster to automatically expose the number of GPUs available on each node, track their health, and run GPU-enabled containers. To use this plugin, the node must have the NVIDIA Docker stack installed, as well as the NVIDIA and CUDA drivers [9]. The plugin automatically handles registration and communication with the cluster so that a cluster user can request GPU resources in their pods.

One limitation of this plugin is that it only supports Kubernetes clusters running on the Intel 64-bit architecture. As a result, it cannot orchestrate containers on embedded ARM-based devices such as the NVIDIA Jetson family [9]. This article utilizes a custom device plugin that enables the use of NVIDIA GPUs on Jetson boards. By modifying the existing plugin to support other architectures and building an ARM64 container from the modified source code, we can orchestrate both Intel and ARM-based nodes.

Requisites

Before starting, complete the first part of this series, NVIDIA container runtime for Wind River Linux.

We assume a booted Jetson Board with the following requirements:

  • NVIDIA drivers >= 384.81
  • nvidia-docker version > 2.0 (see how to install it and its
    prerequisites)
  • Docker configured with nvidia as the default runtime
  • Kubernetes version >= 1.10
  • Wind River Linux >= LTS 19

Source: [9]
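Before continuing, a quick preflight sketch can report which of the required tools are on the target's PATH (this only checks for the binaries, not their versions; adjust the list as needed):

```shell
#!/bin/sh
# Preflight sketch: report whether each required binary is on the PATH.
# (Checks presence only, not versions.)
for cmd in docker nvidia-container-runtime kubectl; do
  if command -v "$cmd" > /dev/null 2>&1; then
    echo "found:   $cmd"
  else
    echo "missing: $cmd"
  fi
done | tee preflight.txt
```

Any "missing" line means the corresponding step from part one of this series still needs to be done.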

Wind River Linux Image

For the OS image, make sure to add the following packages into your project:

  • docker-ce
  • git
  • openssh

Install K3s

The following commands download and install the latest available version of K3s; note that this tutorial was tested with K3s version v1.18.2+k3s1.

mkdir -p /usr/local/bin
# Optionally pin the tutorial's version: INSTALL_K3S_VERSION="v1.18.2+k3s1"
curl -sfL https://get.k3s.io | sh -

After installation, check for status:

kubectl get nodes

Example output:

root@jetson-nano-qspi-sd:~# kubectl get nodes
NAME                  STATUS   ROLES    AGE   VERSION
jetson-nano-qspi-sd   Ready    master   88s   v1.18.2+k3s1

Change the K3s Default Runtime

To use the NVIDIA runtime, first configure K3s to use Docker as its container runtime:

sed -i 's/server \\/server --docker \\/'  /etc/systemd/system/k3s.service
systemctl daemon-reload
systemctl restart k3s
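The sed expression above appends --docker to the continuation line `server \` in the unit file. Its effect can be demonstrated on a two-line sample (a hypothetical excerpt; the real file is /etc/systemd/system/k3s.service):

```shell
# Demonstrate the substitution on a sample unit-file excerpt
# (hypothetical; the real file is /etc/systemd/system/k3s.service).
printf '%s\n' 'ExecStart=/usr/local/bin/k3s \' '    server \' > k3s.service.sample
sed -i 's/server \\/server --docker \\/' k3s.service.sample
cat k3s.service.sample
```

The second line of the sample comes out as `    server --docker \`, which is exactly the change made to the real service file before the daemon reload.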

Then, set the NVIDIA container runtime as Docker's default runtime by modifying /etc/docker/daemon.json as follows:

{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia"
}
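A malformed daemon.json prevents the Docker daemon from starting, so it is worth validating the file before restarting. A minimal check, assuming python3 is available on the target (run here against a local copy of the snippet above):

```shell
# Validate the runtime configuration before restarting Docker
# (written to a local copy here; the real file is /etc/docker/daemon.json).
cat << 'EOF' > daemon.json
{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia"
}
EOF
python3 -c 'import json; print(json.load(open("daemon.json"))["default-runtime"])'
# prints: nvidia
```

If the python3 call raises a JSONDecodeError, fix the file before restarting the daemon.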

Then, restart the Docker daemon:

systemctl restart docker

Install the NVIDIA K8s Device Plugin

To have the device plugin working on ARM64 architectures, the NVIDIA device plugin needs a few patches. Clone the original NVIDIA device plugin repo and apply them:

$ git clone -b 1.0.0-beta6 https://github.com/nvidia/k8s-device-plugin.git
$ cd k8s-device-plugin
$ wget https://labs.windriver.com/downloads/0001-arm64-add-support-for-arm64-architectures.patch
$ wget https://labs.windriver.com/downloads/0002-nvidia-Add-support-for-tegra-boards.patch
$ wget https://labs.windriver.com/downloads/0003-main-Add-support-for-tegra-boards.patch
$ git am 000*.patch

Then, build the device plugin container:

$ docker build -t nvidia/k8s-device-plugin:1.0.0-beta6 -f docker/arm64/Dockerfile.ubuntu16.04 .

Next, deploy the container into your cluster:

$ kubectl apply -f nvidia-device-plugin.yml

Finally, check the status of the pods and wait until all of them are running:

$ kubectl get pods -A

An example output of the device plugin is as follows:

root@jetson-nano-qspi-sd:~/test/k8s-device-plugin# kubectl logs nvidia-device-plugin-daemonset-k8g57 --namespace=kube-system
2020/05/29 19:49:07 NVIDIA Tegra device detected!
2020/05/29 19:49:07 Starting FS watcher.
2020/05/29 19:49:07 Starting OS watcher.
2020/05/29 19:49:07 Retreiving plugins.
2020/05/29 19:49:07 Starting GRPC server for 'nvidia.com/gpu'
2020/05/29 19:49:07 Starting to serve 'nvidia.com/gpu' on /var/lib/kubelet/device-plugins/nvidia.sock
2020/05/29 19:49:07 Registered device plugin for 'nvidia.com/gpu' with Kubelet

Results

After following the installation steps, you will have a working Kubernetes node with an additional nvidia.com/gpu resource:

$ kubectl describe node
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests   Limits
  --------           --------   ------
  cpu                100m (2%)  0 (0%)
  memory             70Mi (1%)  170Mi (4%)
  ephemeral-storage  0 (0%)     0 (0%)
  hugepages-2Mi      0 (0%)     0 (0%)
  nvidia.com/gpu     0          0
...

NVIDIA Container Runtime over K8s

Query the GPU device to verify that the NVIDIA runtime is working by deploying the following pod:

$ cat << EOF > query_pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: query-pod
spec:
  restartPolicy: OnFailure
  containers:
  - image: jitteam/devicequery
    name: query-ctr

    resources:
      limits:
        nvidia.com/gpu: 1
EOF
$ kubectl apply -f query_pod.yml

Check the status of the pod query-pod and wait until it reaches
"Completed":

$ kubectl get pod query-pod

Then, check the logs:

$ kubectl logs query-pod

Output:

root@jetson-nano-qspi-sd:~/k8s-device-plugin# kubectl logs query-pod
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA Tegra X1"

The container detected 1 CUDA device correctly!

GPU concurrency

Test that the concurrency of the GPU resources is being handled correctly:

cat << EOF > concurrency.yml
apiVersion: v1
kind: Pod
metadata:
  name: pod1
spec:
  restartPolicy: OnFailure
  containers:
  - image: nvcr.io/nvidia/l4t-base:r32.4.2
    name: pod1-ctr
    command: ["sleep"]
    args: ["30"]

    resources:
      limits:
        nvidia.com/gpu: 1
---
apiVersion: v1
kind: Pod
metadata:
  name: pod2
spec:
  restartPolicy: OnFailure
  containers:
  - image: nvcr.io/nvidia/l4t-base:r32.4.2
    name: pod2-ctr
    command: ["sleep"]
    args: ["30"]

    resources:
      limits:
        nvidia.com/gpu: 1
EOF

Apply the changes and check the status of the second pod:

kubectl apply -f concurrency.yml
kubectl describe pod pod2

Output:

root@jetson-nano-qspi-sd:~/k8s-device-plugin# kubectl describe pod pod2
...
Events:
  Type     Reason            Age        From               Message
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
  Warning  FailedScheduling  <unknown>  default-scheduler  0/1 nodes are available: 1 Insufficient nvidia.com/gpu.

As you can see, the second pod failed to allocate a GPU because the first pod is already using it. As soon as pod1 exits, the other pod runs successfully.

After waiting 30 seconds (the sleep duration specified in the pod spec), you will see this output instead:

root@jetson-nano-qspi-sd:~/k8s-device-plugin# kubectl describe pod pod2
...
Events:
  Type     Reason            Age        From                          Message
  ----     ------            ----       ----                          -------
  Warning  FailedScheduling  <unknown>  default-scheduler             0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
  Warning  FailedScheduling  <unknown>  default-scheduler             0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
  Normal   Scheduled         <unknown>  default-scheduler             Successfully assigned default/pod2 to jetson-nano-qspi-sd
  Normal   Pulled            6s         kubelet, jetson-nano-qspi-sd  Container image "nvcr.io/nvidia/l4t-base:r32.4.2" already present on machine
  Normal   Created           6s         kubelet, jetson-nano-qspi-sd  Created container pod2-ctr
  Normal   Started           5s         kubelet, jetson-nano-qspi-sd  Started container pod2-ctr
root@jetson-nano-qspi-sd:~# kubectl get pods
NAME   READY   STATUS      RESTARTS   AGE
pod1   0/1     Completed   0          2m38s
pod2   0/1     Completed   0          2m38s

Conclusions

The use of device plugins alongside K8s enables the orchestration of GPU-enabled devices and corrects the concurrency issues faced before. Device plugins effectively support the discovery of external devices so that containers can leverage different types of hardware accelerators. It is now possible to manage GPU workloads at the edge using state-of-the-art technologies, thereby enabling HPC areas such as AI to benefit from this kind of acceleration.

References

[1] A. M. Beltre, P. Saha, M. Govindaraju, A. Younge, and R. E. Grant, “Enabling hpc workloads on cloud infrastructure using kubernetes container orchestration mechanisms,” in 2019 ieee/acm international workshop on containers and new orchestration paradigms for isolated environments in hpc (canopie-hpc), 2019, pp. 11–20.

[2] C. Tarbett, “Why K3s Is the Future of Kubernetes at the Edge,”
Rancher Labs, Nov. 2019.

[3] T. Erl, Service-oriented architecture: Concepts, technology, and design. USA: Prentice Hall PTR, 2005.

[4] “What is Kubernetes?” 2020.

[5] M. E. Piras, L. Pireddu, M. Moro, and G. Zanetti, “Container
Orchestration on HPC Clusters,” SpringerLink, pp. 25–35, Jun. 2019.

[6] “Kubernetes: part 1 architecture and main components overview,” RTFM: Linux, DevOps and system administration, May 2020. https://rtfm.co.ua/en/kubernetes-part-1-architecture-and-main-components-overview/

[7] Kubernetes, “Community,” GitHub. 2020.

[8] “K3s - 5 less than K8s,” Rancher Labs. Apr-2020.

[9] Nvidia, “k8s-device-plugin,” GitHub. May-2020.

All product names, logos, and brands are property of their respective owners.
All company, product and service names used in this software are for identification purposes only. Wind River is a registered trademark of Wind River Systems.

Disclaimer of Warranty / No Support: Wind River does not provide support and maintenance services for this software, under Wind River’s standard Software Support and Maintenance Agreement or otherwise. Unless required by applicable law, Wind River provides the software (and each contributor provides its contribution) on an “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, either express or implied, including, without limitation, any warranties of TITLE, NONINFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the software and assume any risks associated with your exercise of permissions under the license.

Docker is a trademark of Docker, Inc.

Kubernetes is a trademark of The Linux Foundation.

NVIDIA, NVIDIA EGX, CUDA, Jetson, and Tegra are trademarks and/or registered trademarks of NVIDIA Corporation in the U.S. and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.
