May 22, 2020 Dr. Design

NVIDIA container runtime for Wind River Linux

By Pablo Rodriguez Quesada



Training and using AI models are tasks that demand significant computational power. Current trends point toward deep neural networks, which involve thousands, if not millions, of operations per iteration. In the past year, more and more researchers have sounded the alarm on the exploding costs of deep learning. The computing power needed to do AI is now rising seven times faster than ever before [1]. These new needs are pushing hardware companies to create hardware accelerators such as neural processing units, CPUs, and GPUs.

Embedded systems are no exception to this transformation. Every day we see intelligent traffic lights, autonomous vehicles, intelligent IoT devices, and more. The current direction is to place accelerators inside these embedded devices, mainly Systems on Chip. Hardware developers have embedded small accelerators like GPUs, FPGAs, and more into SoCs, SoMs, and other systems. We call these modern systems heterogeneous computing architectures.

The use of GPUs on Linux is not new; it has been possible for many years. However, it would be great to accelerate the development and deployment of HPC applications. Containers enable portability, stability, and many other characteristics when deploying an application. For this reason, companies are investing heavily in these technologies. For instance, NVIDIA recently started a project that enables CUDA on Docker [2].

One concern when dealing with containers is the loss of performance. However, when comparing the performance of the GPU with and without the container environment, researchers found that containers cause no additional overhead [3]. This consistent performance is one of the principal benefits of containers over virtual machines; access to the GPU is seamless because the kernel stays the same.

NVIDIA-Docker on Yocto

Together with Matt Madison (maintainer of the meta-tegra layer), we created the required recipes to build and deploy NVIDIA-docker on Wind River Linux LTS 19 (Yocto 3.0 Zeus) [4].

In this tutorial, you will learn how to enable NVIDIA containers on a custom distribution of Linux and run a small test application that uses the GPU inside a container.


To enable NVIDIA containers, Docker needs the nvidia-container-runtime, a modified version of runc that adds a custom pre-start hook to all containers. The nvidia-container-runtime communicates with Docker using the library libnvidia-container, which automatically configures GNU/Linux containers leveraging NVIDIA hardware. This library relies on kernel primitives and is designed to be agnostic of the container runtime. All the effort to port these libraries and tools to the Yocto Project was submitted to the community and is now part of the meta-tegra layer, which is maintained by Matt Madison.
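As a rough illustration of how Docker learns about an alternative runtime, the sketch below builds the `runtimes` entry that NVIDIA documents for `/etc/docker/daemon.json`. The runtime name `nvidia` matches the `--runtime nvidia` option used later in this tutorial. Whether the image built here already ships such a file is not stated in this article, so the binary path and the helper name `nvidia_runtime_config` are assumptions for illustration only.

```python
import json


def nvidia_runtime_config(binary="nvidia-container-runtime"):
    """Build a daemon.json fragment registering a runtime named 'nvidia'.

    'binary' is an assumed path or name of the runtime executable on the target.
    """
    return {
        "runtimes": {
            "nvidia": {
                "path": binary,
                "runtimeArgs": [],
            }
        }
    }


# Print the JSON that would go into /etc/docker/daemon.json.
print(json.dumps(nvidia_runtime_config(), indent=4))
```

With an entry like this in place, Docker can start a container through the modified runc instead of the default one when the runtime is requested by name.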

Note: this setup is based on Linux for Tegra and not the original Yocto Linux kernel

Benefits and Limitations

The main benefit of GPUs inside containers is the portability and stability of the environment at deployment time. Development also benefits from this portable environment, as developers can collaborate more efficiently.

However, there are limitations due to the nature of the NVIDIA environment. Containers are heavyweight because they are based on the Linux4Tegra image, which contains libraries required at runtime. On the other hand, because of redistribution limitations, some libraries are not included in the container. This requires runc to mount some proprietary libraries into the container, losing portability in the process.


You are required to download NVIDIA proprietary code from their website. To do so, you will need to create an NVIDIA Developer Network account.

Go to the NVIDIA developer website, download the NVIDIA SDK Manager, install it, and download all the files for the Jetson board you own.

The required JetPack version is 4.3.


Image 1. SDK Manager installation

If you need to include TensorRT in your builds, you must create the NoDLA subdirectory and copy all of the TensorRT packages downloaded by the SDK Manager there.

$ mkdir /home/$USER/Downloads/nvidia/sdkm_downloads/NoDLA
$ cp /home/$USER/Downloads/nvidia/sdkm_downloads/libnv* /home/$USER/Downloads/nvidia/sdkm_downloads/NoDLA

Creating the project

$ git clone --branch WRLINUX_10_19_BASE
$ ./wrlinux-x/ --all-layers --dl-layers --templates feature/docker

Note: --distro wrlinux-graphics can be used for applications that require X11.

Add meta-tegra layer

DISCLAIMER: meta-tegra is a community-maintained layer and is not supported by Wind River at the time of writing.

$ git clone layers/meta-tegra
$ cd layers/meta-tegra
$ git checkout 11a02d02a7098350638d7bf3a6c1a3946d3432fd
$ cd -

This tutorial was tested with the commit checked out above. Set up the build environment and add the layers:

$ . ./environment-setup-x86_64-wrlinuxsdk-linux
$ . ./oe-init-build-env
$ bitbake-layers add-layer ../layers/meta-tegra/
$ bitbake-layers add-layer ../layers/meta-tegra/contrib

Configure the project

$ echo "BB_NO_NETWORK = '0'" >> conf/local.conf
$ echo 'INHERIT_DISTRO_remove = "whitelist"' >> conf/local.conf

Set the machine to your Jetson Board

$ echo "MACHINE='jetson-nano-qspi-sd'" >> conf/local.conf
$ echo "PREFERRED_PROVIDER_virtual/kernel = 'linux-tegra'" >> conf/local.conf

CUDA cannot be compiled with GCC versions higher than 7. Set GCC version to 7.%:

$ echo 'GCCVERSION = "7.%"' >> conf/local.conf
$ echo "require contrib/conf/include/gcc-compat.conf" >> conf/local.conf

Set the IMAGE export type to tegraflash for ease of deployment.

$ echo 'IMAGE_CLASSES += "image_types_tegra"' >> conf/local.conf
$ echo 'IMAGE_FSTYPES = "tegraflash"' >> conf/local.conf

Replace the default docker package with docker-ce (nvidia-container-runtime is added in a later step).

$ echo 'IMAGE_INSTALL_remove = "docker"' >> conf/local.conf
$ echo 'IMAGE_INSTALL_append = " docker-ce"' >> conf/local.conf

Fix tini build error

$ echo 'SECURITY_CFLAGS_pn-tini_append = " ${SECURITY_NOPIE_CFLAGS}"' >> conf/local.conf

Set NVIDIA download location

$ echo "NVIDIA_DEVNET_MIRROR='file:///home/$USER/Downloads/nvidia/sdkm_downloads'" >> conf/local.conf
$ echo 'CUDA_BINARIES_NATIVE = "cuda-binaries-ubuntu1604-native"' >> conf/local.conf

Add the NVIDIA container runtime, the AI libraries, and the CSV files for the AI libraries

$ echo 'IMAGE_INSTALL_append = " nvidia-docker nvidia-container-runtime cudnn tensorrt libvisionworks libvisionworks-sfm libvisionworks-tracking cuda-container-csv cudnn-container-csv tensorrt-container-csv libvisionworks-container-csv libvisionworks-sfm-container-csv libvisionworks-tracking-container-csv"' >> conf/local.conf

Enable ldconfig, which is required by the nvidia-container-runtime

$ echo 'DISTRO_FEATURES_append = " ldconfig"' >> conf/local.conf
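Since a full build takes a long time, it can be worth checking conf/local.conf for the settings above before running bitbake. The following is a minimal sketch of such a check, not part of the Wind River or Yocto tooling: the names `parse_conf`, `check_local_conf`, `REQUIRED_VARS`, and `REQUIRED_PACKAGES` are hypothetical, and the parser only understands simple one-line quoted assignments like the ones written by the echo commands in this tutorial.

```python
import re

# Matches simple one-line BitBake assignments: NAME = "value" or NAME='value'.
ASSIGNMENT = re.compile(
    r"""^\s*([A-Za-z0-9_/-]+)\s*(?:\?\?=|\?=|:=|\+=|=\+|\.=|=\.|=)\s*(['"])(.*)\2\s*$"""
)

# Hypothetical selection of settings from this tutorial worth verifying.
REQUIRED_VARS = ["MACHINE", "GCCVERSION", "IMAGE_FSTYPES", "NVIDIA_DEVNET_MIRROR"]
REQUIRED_PACKAGES = ["docker-ce", "nvidia-docker", "nvidia-container-runtime"]


def parse_conf(text):
    """Return {variable: [values, ...]} for every simple assignment found."""
    settings = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comments
        match = ASSIGNMENT.match(line)
        if match:
            settings.setdefault(match.group(1), []).append(match.group(3))
    return settings


def check_local_conf(text):
    """Return a list of human-readable problems (empty if none were found)."""
    settings = parse_conf(text)
    problems = [f"{var} is not set" for var in REQUIRED_VARS if var not in settings]
    installed = " ".join(settings.get("IMAGE_INSTALL_append", [])).split()
    problems += [
        f"{pkg} is missing from IMAGE_INSTALL_append"
        for pkg in REQUIRED_PACKAGES
        if pkg not in installed
    ]
    return problems


SAMPLE = """
MACHINE='jetson-nano-qspi-sd'
GCCVERSION = "7.%"
IMAGE_FSTYPES = "tegraflash"
NVIDIA_DEVNET_MIRROR='file:///home/user/Downloads/nvidia/sdkm_downloads'
IMAGE_INSTALL_append = " docker-ce"
IMAGE_INSTALL_append = " nvidia-docker nvidia-container-runtime"
"""

print(check_local_conf(SAMPLE))  # prints []
```

To check a real project, pass the contents of conf/local.conf to `check_local_conf` instead of `SAMPLE`. The check does not evaluate BitBake overrides or includes, so an empty result only means the expected lines are present.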

Build the project

$ bitbake wrlinux-image-glibc-std

Burn the image into the SD card

$ unzip -d wrlinux-jetson-nano
$ cd wrlinux-jetson-nano

Connect the Jetson Board to your computer using the micro USB cable as shown in the image:

Image 2. Recovery mode setup for Jetson Nano

Image 3. Pins Diagram for Jetson Nano

After connecting the board, run:

$ sudo ./

This command will create the file wrlinux-image-glibc-std.sdcard that contains the SD card image required to boot.

Burn the Image to the SD Card:

$ sudo dd if=wrlinux-image-glibc-std.sdcard of=/dev/***** bs=8k

Warning: replace the of= device with the one that points to your SD card.
Failure to do so can lead to the unintended erasure of hard disks.
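One way to reduce the risk described in the warning is to inspect the target device before running dd. The sketch below reads the standard Linux sysfs attributes `removable` and `size` (the latter in 512-byte sectors) for a block device. The function names and the 256 GiB threshold are hypothetical choices for illustration, and the check is only a heuristic: SD cards in some built-in readers report themselves as non-removable.

```python
from pathlib import Path


def describe_block_device(name, sysfs_root="/sys/block"):
    """Return size and removable flag for a block device such as 'sdb'.

    Returns None if the device does not exist or cannot be read.
    """
    dev = Path(sysfs_root) / name
    try:
        removable = (dev / "removable").read_text().strip() == "1"
        sectors = int((dev / "size").read_text().strip())
    except (OSError, ValueError):
        return None
    return {
        "name": name,
        "removable": removable,
        # sysfs reports the size in 512-byte sectors
        "size_gib": round(sectors * 512 / 2**30, 1),
    }


def looks_like_sd_card(info, max_gib=256):
    """Heuristic only: removable and no larger than an assumed SD card size."""
    return info is not None and info["removable"] and 0 < info["size_gib"] <= max_gib


if __name__ == "__main__":
    import sys

    # Usage: python3 check_device.py sdb
    for name in sys.argv[1:]:
        info = describe_block_device(name)
        verdict = "looks like an SD card" if looks_like_sd_card(info) else "CHECK BEFORE WRITING"
        print(name, info, verdict)
```

The script takes device names without the /dev/ prefix. A device that fails the check is not necessarily wrong, but it deserves a second look before it is passed to dd.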

Deploy the target

Boot up the board and find the IP address with the command ifconfig.

Then, ssh into the machine and run Docker:

$ ssh root@<ip_address>

Create a Python script using the example from the "Train and evaluate with Keras" section in the TensorFlow documentation:

from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(784,), name='digits')
x = layers.Dense(64, activation='relu', name='dense_1')(inputs)
x = layers.Dense(64, activation='relu', name='dense_2')(x)
outputs = layers.Dense(10, name='predictions')(x)

model = keras.Model(inputs=inputs, outputs=outputs)

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Preprocess the data (these are Numpy arrays)
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255

y_train = y_train.astype('float32')
y_test = y_test.astype('float32')

# Reserve 10,000 samples for validation
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]

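model.compile(optimizer=keras.optimizers.RMSprop(),  # Optimizer
              # Loss function to minimize (the last layer outputs raw logits)
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              # List of metrics to monitor
              metrics=['sparse_categorical_accuracy'])

print('# Fit model on training data')
history = model.fit(x_train, y_train,
                    # We pass some validation for
                    # monitoring validation loss and metrics
                    # at the end of each epoch
                    validation_data=(x_val, y_val))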

print('\nhistory dict:', history.history)

# Evaluate the model on the test data using `evaluate`
print('\n# Evaluate on test data')
results = model.evaluate(x_test, y_test, batch_size=128)
print('test loss, test acc:', results)

# Generate predictions (probabilities -- the output of the last layer)
# on new data using `predict`
print('\n# Generate predictions for 3 samples')
predictions = model.predict(x_test[:3])
print('predictions shape:', predictions.shape)

Create a Dockerfile:

FROM tianxiang84/l4t-base:all


ENTRYPOINT ["/usr/bin/python3"]
CMD ["/root/"]

Build the container:

# docker build -t l4t-tensorflow .

Run the container:

# docker run --runtime nvidia -it l4t-tensorflow 


Note the use of GPU 0 in the output:

2020-04-22 21:13:56.969319: I tensorflow/core/common_runtime/gpu/] Adding visible gpu devices: 0
2020-04-22 21:13:58.210600: I tensorflow/core/common_runtime/gpu/] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 268 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
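When the container runs unattended, the same confirmation can be extracted from the captured log rather than read by eye. The following sketch assumes the log format shown above; the regular expression and the function name `gpus_in_log` are illustrative, and other TensorFlow versions may word this message differently.

```python
import re

# Pattern for the "Created TensorFlow device" line shown in the log above.
DEVICE_RE = re.compile(
    r"Created TensorFlow device \((?P<tf_device>\S+) with (?P<memory_mb>\d+) MB memory\)"
    r" -> physical GPU \(device: (?P<index>\d+), name: (?P<name>[^,]+),"
    r".*compute capability: (?P<capability>[\d.]+)\)"
)


def gpus_in_log(log_text):
    """Return one dict per GPU device that TensorFlow reports creating."""
    return [match.groupdict() for match in DEVICE_RE.finditer(log_text)]


SAMPLE_LOG = (
    "2020-04-22 21:13:58.210600: I tensorflow/core/common_runtime/gpu/] "
    "Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 "
    "with 268 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, "
    "pci bus id: 0000:00:00.0, compute capability: 5.3)"
)

for gpu in gpus_in_log(SAMPLE_LOG):
    print(gpu["name"], gpu["memory_mb"], "MB, compute capability", gpu["capability"])
```

An empty list from `gpus_in_log` would suggest that TensorFlow fell back to the CPU, for example because the container was started without the NVIDIA runtime.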


The use of NVIDIA containers allows a smooth deployment of AI applications. Once you have your Linux distribution running containers with the custom NVIDIA runtime, getting a neural network to work is as simple as running one command. Getting an NVIDIA Tegra board to run computing-intensive workloads is now easier than ever.

With the provided custom runc engine that allows the use of CUDA and other related libraries, you will be running applications as if they were on bare metal.

One of the possibilities containers offer is combining this setup with Kubernetes or the NVIDIA EGX Platform to handle orchestration. The Kubernetes Device Plugins distribute and manage workloads across multiple acceleration devices, giving you high availability as well as other benefits. Combine this with other technologies such as TensorFlow and OpenCV, and you will have an army of edge devices ready to run your intelligent applications for you.


All product names, logos, and brands are property of their respective owners.
All company, product and service names used in this software are for identification purposes only. Wind River is a registered trademark of Wind River Systems.

Disclaimer of Warranty / No Support: Wind River does not provide support and maintenance services for this software, under Wind River’s standard Software Support and Maintenance Agreement or otherwise. Unless required by applicable law, Wind River provides the software (and each contributor provides its contribution) on an “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, either express or implied, including, without limitation, any warranties of TITLE, NONINFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the software and assume any risks associated with your exercise of permissions under the license.

TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.

Docker is a trademark of Docker, Inc.

NVIDIA, NVIDIA EGX, CUDA, Jetson, and Tegra are trademarks and/or registered trademarks of NVIDIA Corporation in the U.S. and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.
