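The Complete Process of Using GPUs in Docker

Updated: 2024-01-09 15:16:25 Author: DripBoy

This article walks through the full process of using GPUs in Docker. I hope it serves as a useful reference; corrections for any errors or omissions are welcome.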

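I. Three ways for Docker to use host hardware devices

Use the --privileged=true option to start the container in privileged mode
Use the --device option
Use the -v option to mount a volume into the container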
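The three options above can be sketched as `docker run` invocations. This is a minimal illustration, not a recommended setup: the image name `ubuntu:22.04` and the device path `/dev/nvidia0` are assumptions chosen for the example, and the commands are only printed (through a hypothetical `show` helper) so the snippet is safe to run on a machine without Docker or a GPU.

```shell
#!/bin/sh
# Assumed values for illustration only; substitute your own image and device.
IMAGE="ubuntu:22.04"
DEV="/dev/nvidia0"

# Hypothetical helper: print a command line instead of executing it.
show() { printf '%s\n' "$*"; }

# 1. Privileged mode: the container can see all host devices.
show docker run --rm --privileged=true "$IMAGE" ls /dev

# 2. --device: expose a single device node to the container.
show docker run --rm --device="$DEV" "$IMAGE" ls -l "$DEV"

# 3. -v: bind-mount a host path into the container. For a device node,
#    access may still be restricted by the container's device cgroup rules.
show docker run --rm -v "$DEV:$DEV" "$IMAGE" ls -l "$DEV"
```

Privileged mode is the broadest of the three, since it lifts most isolation between container and host, while `--device` limits exposure to the nodes you name.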

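II. How Docker's GPU support has evolved

When a Docker container uses the host's GPU, what essentially happens is that all the device files the host relies on to access the GPU are mounted into the container.

NVIDIA's approach has gone through three stages. The following excerpt from its official blog gives some background.

Source: <Enabling GPUs in the Container Runtime Ecosystem | NVIDIA Technical Blog>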
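As a rough sketch of what "mounting the device files" means in practice, the snippet below builds one `--device` flag per matching character device under `/dev`. The function name `device_flags` and its parameters are hypothetical, and the resulting command is only printed. Device nodes alone are usually not sufficient: the container also needs the user-mode driver libraries, which is the part the NVIDIA tooling described later automates.

```shell
#!/bin/sh
# Hypothetical helper: emit a --device flag for every character device
# in directory $1 whose name starts with prefix $2 (default: nvidia).
device_flags() {
  dev_dir="${1:-/dev}"
  prefix="${2:-nvidia}"
  flags=""
  for node in "$dev_dir"/"$prefix"*; do
    # Skip unmatched globs and anything that is not a character device.
    [ -c "$node" ] || continue
    flags="${flags:+$flags }--device=$node"
  done
  printf '%s\n' "$flags"
}

# Print (do not run) the command; the flag list is empty on hosts without
# NVIDIA device nodes.
echo "docker run --rm $(device_flags) nvidia/cuda nvidia-smi"
```

On a host with the NVIDIA driver loaded, the flags would typically cover nodes such as `/dev/nvidia0` and `/dev/nvidiactl`; the exact set depends on the driver and the number of GPUs.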

NVIDIA designed NVIDIA-Docker in 2016 to enable portability in Docker images that leverage NVIDIA GPUs. It allowed driver agnostic CUDA images and provided a Docker command line wrapper that mounted the user mode components of the driver and the GPU device files into the container at launch.

Over the lifecycle of NVIDIA-Docker, we realized the architecture lacked flexibility for a few reasons:

Tight integration with Docker did not allow support of other container technologies such as LXC, CRI-O, and other runtimes in the future
We wanted to leverage other tools in the Docker ecosystem, e.g. Compose (for managing applications that are composed of multiple containers)
Support GPUs as a first-class resource in orchestrators such as Kubernetes and Swarm
Improve container runtime support for GPUs, esp. automatic detection of user-level NVIDIA driver libraries, NVIDIA kernel modules, device ordering, compatibility checks and GPU features such as graphics, video acceleration

As a result, the redesigned NVIDIA-Docker moved the core runtime support for GPUs into a library called libnvidia-container. The library relies on Linux kernel primitives and is agnostic relative to the higher container runtime layers. This allows easy extension of GPU support into different container runtimes such as Docker, LXC and CRI-O. The library includes a command-line utility and also provides an API for integration into other runtimes in the future. The library, tools, and the layers we built to integrate into various runtimes are collectively called the NVIDIA Container Runtime.

Since 2015, Docker has been donating key components of its container platform, starting with the Open Containers Initiative (OCI) specification and an implementation of the specification of a lightweight container runtime called runc. In late 2016, Docker also donated containerd, a daemon which manages the container lifecycle and wraps OCI/runc. The containerd daemon handles transfer of images, execution of containers (with runc), storage, and network management. It is designed to be embedded into larger systems such as Docker. More information on the project is available on the official site.

Figure 1 shows how the libnvidia-container integrates into Docker, specifically at the runc layer. We use a custom OCI prestart hook called nvidia-container-runtime-hook to runc in order to enable GPU containers in Docker (more information about hooks can be found in the OCI runtime spec). The addition of the prestart hook to runc requires us to register a new OCI compatible runtime with Docker (using the --runtime option). At container creation time, the prestart hook checks whether the container is GPU-enabled (using environment variables) and uses the container runtime library to expose the NVIDIA GPUs to the container.

Figure 1. Integration of NVIDIA Container Runtime with Docker
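The excerpt says the prestart hook decides whether a container is GPU-enabled by looking at environment variables. The snippet below is a deliberately simplified model of that decision, not the hook's actual code. It assumes the variable in question is `NVIDIA_VISIBLE_DEVICES` (the excerpt does not name it) and that the values behave as I understand the NVIDIA container documentation to describe: unset, empty, or `void` means the hook does nothing; `none` enables driver capabilities without exposing any GPU; anything else selects devices. The function name `hook_decision` is hypothetical.

```shell
#!/bin/sh
# Simplified model of the GPU-enabled check (an assumption, not NVIDIA's code).
# $1 stands in for the container's NVIDIA_VISIBLE_DEVICES value.
hook_decision() {
  case "${1-}" in
    ""|void) echo "skip" ;;          # behave like plain runc
    none)    echo "driver-only" ;;   # driver capabilities, no GPU devices
    *)       echo "expose:$1" ;;     # e.g. all, 0, or 0,1
  esac
}

hook_decision          # container started without the variable
hook_decision none
hook_decision 0,1

# How the variable would be passed on the command line (printed, not run).
echo "docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 nvidia/cuda nvidia-smi"
```

The point of the sketch is only that GPU exposure is driven by the container's environment rather than by extra device flags on the command line.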

1. nvidia-docker

nvidia-docker is a thin wrapper layered on top of docker.

Through nvidia-docker-plugin, it adds the arguments needed to expose the hardware devices to the docker launch command.

Ubuntu distributions 
# Install nvidia-docker and nvidia-docker-plugin 
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.0-rc/nvidia-docker_1.0.0.rc-1_amd64.deb 
sudo dpkg -i /tmp/nvidia-docker_1.0.0.rc-1_amd64.deb && rm /tmp/nvidia-docker*.deb
# Test nvidia-smi
nvidia-docker run --rm nvidia/cuda nvidia-smi 
 
Other distributions 
# Install nvidia-docker and nvidia-docker-plugin 
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.0-rc/nvidia-docker_1.0.0.rc_amd64.tar.xz 
sudo tar --strip-components=1 -C /usr/bin -xvf /tmp/nvidia-docker_1.0.0.rc_amd64.tar.xz && rm /tmp/nvidia-docker*.tar.xz 
# Run nvidia-docker-plugin 
sudo -b nohup nvidia-docker-plugin > /tmp/nvidia-docker.log 
# Test nvidia-smi 
nvidia-docker run --rm nvidia/cuda nvidia-smi 
 
Standalone install 
# Install nvidia-docker and nvidia-docker-plugin 
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.0-rc/nvidia-docker_1.0.0.rc_amd64.tar.xz 
sudo tar --strip-components=1 -C /usr/bin -xvf /tmp/nvidia-docker_1.0.0.rc_amd64.tar.xz && rm /tmp/nvidia-docker*.tar.xz 
# One-time setup 
sudo nvidia-docker volume setup 
# Test nvidia-smi 
nvidia-docker run --rm nvidia/cuda nvidia-smi

2. nvidia-docker2

sudo apt-get install nvidia-docker2
sudo apt-get install nvidia-container-runtime
sudo dockerd --add-runtime=nvidia=/usr/bin/nvidia-container-runtime [...]
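Instead of passing `--add-runtime` to `dockerd` by hand, the same runtime can be registered through the `runtimes` key of the Docker daemon configuration file. The sketch below writes such a fragment to a scratch file rather than to the real configuration (commonly `/etc/docker/daemon.json`, an assumption about your install), so nothing on the host is modified. The runtime path is the one used in the command above.

```shell
#!/bin/sh
# Write an example daemon configuration to a temporary file for inspection.
# Copying it into the real daemon config and restarting Docker is left to you.
conf_file="$(mktemp)"
cat > "$conf_file" <<'EOF'
{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF

cat "$conf_file"

# With the runtime registered, a container selects it via --runtime
# (printed here, not executed).
echo "docker run --rm --runtime=nvidia nvidia/cuda nvidia-smi"
```

Keeping the registration in the configuration file means it survives daemon restarts without editing the service's start command.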

3. nvidia-container-toolkit

This applies to Docker 19.03 and later.

nvidia-container-toolkit wraps things one step further: simply pass --gpus "device=0" among the docker run arguments.
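A short sketch of common `--gpus` forms follows. The helper `gpus_flag` is hypothetical and only builds the flag text as you would type it in a shell; the commands are printed rather than run. When several device IDs are listed, the value needs literal double quotes inside single quotes, because Docker parses the option value as comma-separated fields. The image name `nvidia/cuda` is taken from the earlier examples, and you may need to pin a specific tag.

```shell
#!/bin/sh
# Hypothetical helper: build a --gpus flag for "all" or a list of device IDs.
gpus_flag() {
  if [ "$#" -eq 0 ] || [ "$1" = "all" ]; then
    printf '%s\n' '--gpus all'
    return
  fi
  # Join the IDs with commas, e.g. "0 1" -> "0,1".
  ids=$(IFS=,; printf '%s' "$*")
  # Emit the nested quoting needed for a comma-separated device list.
  printf '%s\n' "--gpus '\"device=${ids}\"'"
}

echo "docker run --rm $(gpus_flag all) nvidia/cuda nvidia-smi"
echo "docker run --rm $(gpus_flag 0) nvidia/cuda nvidia-smi"
echo "docker run --rm $(gpus_flag 0 1) nvidia/cuda nvidia-smi"
```

Compared with the two earlier approaches, no wrapper binary or `--runtime` argument appears on the command line; the GPU selection is a single option of `docker run`.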

Summary

The above is based on my personal experience; I hope it serves as a useful reference.
