
NVIDIA Container Runtime for Wind River Linux

Introduction

Training and using AI models are tasks that demand significant computational power. Current trends point toward deep neural networks, which involve thousands, if not millions, of operations per iteration. In the past year, more and more researchers have sounded the alarm on the exploding costs of deep learning: the computing power needed to train AI is now rising seven times faster than ever before [1]. These demands are pushing hardware companies to create dedicated accelerators such as neural processing units (NPUs) and GPUs.

Embedded systems are no exception to this transformation. Every day we see intelligent traffic lights, autonomous vehicles, intelligent IoT devices, and more. The current direction is to put accelerators inside these embedded devices, mainly systems-on-chip (SoCs). Hardware developers have embedded small accelerators such as GPUs and FPGAs into SoCs, system-on-module (SoM) boards, and other platforms. We call these modern systems heterogeneous computing architectures.

The use of GPUs on Linux is nothing new; we have been able to do so for many years. What containers add is a way to accelerate the development and deployment of HPC applications: they enable portability, stability, and many other desirable characteristics when deploying an application. For this reason, companies are investing heavily in these technologies. For instance, NVIDIA recently started a project that enables CUDA on Docker [2]. One concern when dealing with containers is the loss of performance; however, when comparing GPU performance with and without a container environment, researchers found that no additional overhead is introduced [3]. This consistency in performance is one of the principal benefits of containers over virtual machines: access to the GPU is seamless because the kernel stays constant.

NVIDIA-Docker on Yocto

Together with Matt Madison (maintainer of the meta-tegra layer), we created the recipes required to build and deploy NVIDIA-Docker on Wind River Linux LTS 19 (Yocto 3.0 Zeus) [4].

In this tutorial, you will learn how to enable NVIDIA containers on a custom Linux distribution and run a small test application that leverages GPUs inside a container.

Description

To enable NVIDIA containers, Docker needs the nvidia-container-runtime, a modified version of runc that adds a custom pre-start hook to all containers. The nvidia-container-runtime communicates with Docker through the library libnvidia-container, which automatically configures GNU/Linux containers to leverage NVIDIA hardware. This library relies on kernel primitives and is designed to be agnostic of the container runtime. All the effort to port these libraries and tools to Yocto was submitted to the community and is now part of the meta-tegra layer, which is maintained by Matt Madison.
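
Docker discovers this runtime through its daemon configuration. As an illustration, registering nvidia-container-runtime in /etc/docker/daemon.json typically looks like the sketch below; the meta-tegra recipes install an equivalent configuration for you, so treat this as explanatory rather than a file you must write by hand:

    {
      "runtimes": {
        "nvidia": {
          "path": "nvidia-container-runtime",
          "runtimeArgs": []
        }
      }
    }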

Note: this setup is based on the Linux for Tegra (L4T) kernel, not the original Yocto Linux kernel.

Benefits and Limitations

The main benefit of GPUs inside containers is the portability and stability of the environment at deployment time. Development also benefits from this portable environment, since developers can collaborate more efficiently. However, there are limitations due to the nature of the NVIDIA environment. The containers are heavyweight because they are based on the Linux4Tegra image, which contains the libraries required at runtime. On the other hand, because of redistribution limitations, some libraries cannot be included in the container; this requires runc to mount some proprietary libraries from the host, losing portability in the process.

Prerequisites

You are required to download NVIDIA proprietary packages from the NVIDIA website. To do so, you will need to create an NVIDIA Developer Network account.

Go to https://developer.nvidia.com/embedded/downloads, download the NVIDIA SDK Manager, install it, and download all the files for the Jetson board you own. The required JetPack version is 4.3.

Figure 1

If you need to include TensorRT in your builds, you must create the subdirectory and move all of the TensorRT packages downloaded by the SDK Manager there.

Figure 2

Creating the project

Figure 3
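
As a sketch, creating the project looks something like the following; the repository URL and branch follow Wind River Linux LTS 19 conventions, and the exact setup.sh flags are assumptions to verify against the Wind River documentation:

    git clone --branch WRLINUX_10_19_BASE https://github.com/WindRiver-Labs/wrlinux-x.git
    ./wrlinux-x/setup.sh --dl-layers --distro wrlinux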

Note: --distro wrlinux-graphics can be used for applications that require X11.

Add meta-tegra layer

DISCLAIMER: meta-tegra is a community-maintained layer not supported by Wind River at the time of writing.

Figure 4

Tested with: https://github.com/madisongh/meta-tegra/commit/74de93aa9cdce39fe536ebe677ad33079adb0bf2
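
A sketch of fetching the layer at that commit; the layers/ location is an assumption, so place the clone wherever your project keeps additional layers:

    cd layers
    git clone https://github.com/madisongh/meta-tegra.git
    cd meta-tegra
    git checkout 74de93aa9cdce39fe536ebe677ad33079adb0bf2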

Figure 5
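
The layer then has to be registered with the build. One way, once the build environment has been initialized (see the next step), is bitbake-layers; editing conf/bblayers.conf by hand works just as well:

    bitbake-layers add-layer ../layers/meta-tegra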

Configure the project

Figure 6
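
Assuming the standard Yocto environment script created by the setup tool (the build directory name is arbitrary):

    . ./oe-init-build-env build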

Set the machine to match your Jetson board

Figure 7
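
In conf/local.conf; the machine name must match one provided by meta-tegra, and jetson-nano-qspi-sd is used here as an assumed example for an SD-card Jetson Nano:

    MACHINE = "jetson-nano-qspi-sd"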

CUDA cannot be compiled with GCC versions higher than 7. Set the GCC version to 7.%:

Figure 8
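
In conf/local.conf:

    GCCVERSION = "7.%"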

Set the IMAGE export type to tegraflash for ease of deployment.

Figure 9
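
In conf/local.conf; these class and variable names follow the meta-tegra documentation:

    IMAGE_CLASSES += "image_types_tegra"
    IMAGE_FSTYPES = "tegraflash"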

Change the Docker version and add the nvidia-container-runtime.

Figure 10
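
A sketch of the corresponding conf/local.conf lines; the pinned version string is an assumption from the JetPack 4.3 era and should match what your meta-virtualization layer actually provides:

    # Pin docker to the version expected by the layers (assumed value)
    PREFERRED_VERSION_docker = "19.03.%"
    # Pull the NVIDIA runtime into the image
    IMAGE_INSTALL_append = " nvidia-container-runtime"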

Fix tini build error

Figure 11
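
At the time, tini failed to build with the distro's hardening flags. A commonly used workaround, shown here as an assumption rather than the exact fix from the original screenshots, is to relax the security flags for that single recipe in conf/local.conf:

    # Workaround (assumed): build tini without the PIE hardening flags
    SECURITY_CFLAGS_pn-tini = "${SECURITY_NOPIE_CFLAGS}"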

Set NVIDIA download location

Figure 12
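
meta-tegra picks up the SDK Manager downloads through a mirror variable in conf/local.conf; the local path below is an assumption, so point it at the directory SDK Manager actually used:

    NVIDIA_DEVNET_MIRROR = "file:///home/<user>/Downloads/nvidia/sdkm_downloads"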

Add the NVIDIA container runtime, the AI libraries, and the AI libraries' CSV files

Figure 13
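
A sketch of the image additions in conf/local.conf; the package names are assumptions based on the meta-tegra layer of that era and should be verified against the layer's recipes:

    IMAGE_INSTALL_append = " nvidia-docker nvidia-container-runtime"
    IMAGE_INSTALL_append = " cudnn tensorrt libvisionworks"
    IMAGE_INSTALL_append = " nvidia-container-csv-cudnn nvidia-container-csv-tensorrt nvidia-container-csv-visionworks"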

Enable ldconfig, which is required by the nvidia-container-runtime

Figure 14
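
ldconfig is enabled as a distro feature in conf/local.conf:

    DISTRO_FEATURES_append = " ldconfig"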

Build the project

Figure 15
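
Build the default Wind River Linux image:

    bitbake wrlinux-image-glibc-std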

Burn the image into the SD card

Figure 16
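
The build produces a tegraflash archive under the deploy directory. A sketch of unpacking it, where the tmp-glibc path and machine name are assumptions that depend on your configuration:

    cd build/tmp-glibc/deploy/images/jetson-nano-qspi-sd
    mkdir flash && cd flash
    tar -xzf ../wrlinux-image-glibc-std-jetson-nano-qspi-sd.tegraflash.tar.gz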

Connect the Jetson board to your computer using the micro-USB port, as shown in the image:

Figure 17

Figure 18

After connecting the board, run:

Figure 19
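
For an SD-card Jetson Nano, the unpacked archive provides a helper script that assembles the card image; the script name below follows meta-tegra's tegraflash support and should be treated as an assumption:

    ./dosdcard.sh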

This command creates the file wrlinux-image-glibc-std.sdcard, which contains the SD card image required to boot. Burn the image to the SD card:

Figure 20

Warning: substitute the of= device with the one that points to your SD card. Failure to do so can lead to unexpected erasure of hard disks.
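
A sketch of the dd invocation; /dev/sdX is a placeholder for your SD card device, and the block size is a common choice rather than a requirement:

    sudo dd if=wrlinux-image-glibc-std.sdcard of=/dev/sdX bs=8M status=progress
    sync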

Deploy the target

Boot up the board and find the IP address with the ifconfig command. Then, SSH into the machine and run Docker:

Figure 21
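
For example, using NVIDIA's L4T base container image; the r32.3.1 tag matches the JetPack 4.3 / L4T R32.3.1 release assumed by this tutorial, so pick the tag that matches your L4T version:

    docker run -it --rm --runtime nvidia nvcr.io/nvidia/l4t-base:r32.3.1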

Inside the container, run:

Figure 22
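
A small test that exercises the GPU is building and running the deviceQuery CUDA sample, following NVIDIA's published l4t-base instructions; the paths assume the runtime mounts the CUDA toolkit from the host into the container:

    apt-get update && apt-get install -y --no-install-recommends make g++
    cp -r /usr/local/cuda/samples /tmp
    cd /tmp/samples/1_Utilities/deviceQuery
    make && ./deviceQuery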

 

References

• [1] K. Hao, "The computing power needed to train AI is now rising seven times faster than ever before," MIT Technology Review, Nov. 2019. [Online]. Available: https://www.technologyreview.com/s/614700/the-computing-power-needed-to-train-ai-is-now-rising-seven-times-faster-than-ever-before.

• [2] NVIDIA, nvidia-docker, [Online; accessed 15 Mar. 2020], Feb. 2020. [Online]. Available: https://github.com/NVIDIA/nvidia-docker.

• [3] L. Benedicic and M. Gila, "Accessing GPUs from containers in HPC," 2016. [Online]. Available: http://sc16.supercomputing.org/scarchive/tech_poster/poster_files/post187s2-file3.pdf.

• [4] M. Madison, Container runtime for master, [Online; accessed 30 Mar. 2020], Mar. 2020. [Online]. Available: https://github.com/madisongh/meta-tegra/pull/266.