The problem of low GPU utilization when training a PyTorch model

Updated: 2023-09-08 10:47:51   Author: 两只蜡笔的小新
This article discusses the problem of low GPU utilization when training a PyTorch model. It is shared for reference in the hope that it helps; if there are mistakes or anything I have not considered, please do not hesitate to point them out.

Preface

My setup has two RTX 2080Ti cards. While monitoring GPU usage during a recent training run, I noticed that although both cards were in use, the utilization of each was very unstable and the cards seemed to be used alternately, which makes training very slow.

Below are the steps I went through to resolve this.

1. CPU and memory usage

2. Checking GPU resource usage with a Linux command

watch -n 1 nvidia-smi

During the prediction stage the model only uses GPU 0, and even then utilization is only about 51%.

During training both cards are used at the same time, but utilization is still low; the highest value I captured was about 60%.
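If you would rather log utilization from the training script itself instead of watching a separate terminal, one minimal sketch is to poll the same nvidia-smi tool from Python via subprocess. The polling interval, number of rounds, and query fields below are arbitrary illustrative choices, not anything from my setup:

import subprocess
import time

def log_gpu_utilization(interval=5, rounds=3):
    # Poll nvidia-smi and print per-GPU utilization and memory use.
    query = "index,utilization.gpu,memory.used,memory.total"
    for _ in range(rounds):
        out = subprocess.run(
            ["nvidia-smi", f"--query-gpu={query}", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
        print(out.stdout.strip())
        time.sleep(interval)

if __name__ == "__main__":
    log_gpu_utilization()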

3. A fix found in the PyTorch documentation

data.DataLoader(dataset: Dataset[T_co], batch_size: Optional[int] = 1,
             shuffle: bool = False, sampler: Optional[Sampler[int]] = None,
             batch_sampler: Optional[Sampler[Sequence[int]]] = None,
             num_workers: int = 0, collate_fn: _collate_fn_t = None,
             pin_memory: bool = False, drop_last: bool = False,
             timeout: float = 0, worker_init_fn: _worker_init_fn_t = None,
             multiprocessing_context=None, generator=None,
             *, prefetch_factor: int = 2,
             persistent_workers: bool = False)

Above are the constructor's input parameters; the two settings relevant to this article are num_workers and pin_memory.

Below is the class's docstring:

class DataLoader(Generic[T_co]):
    r"""
    Data loader. Combines a dataset and a sampler, and provides an iterable over
    the given dataset.
    The :class:`~torch.utils.data.DataLoader` supports both map-style and
    iterable-style datasets with single- or multi-process loading, customizing
    loading order and optional automatic batching (collation) and memory pinning.
    See :py:mod:`torch.utils.data` documentation page for more details.
    Args:
        dataset (Dataset): dataset from which to load the data.
        batch_size (int, optional): how many samples per batch to load
            (default: ``1``).
        shuffle (bool, optional): set to ``True`` to have the data reshuffled
            at every epoch (default: ``False``).
        sampler (Sampler or Iterable, optional): defines the strategy to draw
            samples from the dataset. Can be any ``Iterable`` with ``__len__``
            implemented. If specified, :attr:`shuffle` must not be specified.
        batch_sampler (Sampler or Iterable, optional): like :attr:`sampler`, but
            returns a batch of indices at a time. Mutually exclusive with
            :attr:`batch_size`, :attr:`shuffle`, :attr:`sampler`,
            and :attr:`drop_last`.
        num_workers (int, optional): how many subprocesses to use for data
            loading. ``0`` means that the data will be loaded in the main process.
            (default: ``0``)
        collate_fn (callable, optional): merges a list of samples to form a
            mini-batch of Tensor(s).  Used when using batched loading from a
            map-style dataset.
        pin_memory (bool, optional): If ``True``, the data loader will copy Tensors
            into CUDA pinned memory before returning them.  If your data elements
            are a custom type, or your :attr:`collate_fn` returns a batch that is a custom type,
            see the example below.
        drop_last (bool, optional): set to ``True`` to drop the last incomplete batch,
            if the dataset size is not divisible by the batch size. If ``False`` and
            the size of dataset is not divisible by the batch size, then the last batch
            will be smaller. (default: ``False``)
        timeout (numeric, optional): if positive, the timeout value for collecting a batch
            from workers. Should always be non-negative. (default: ``0``)
        worker_init_fn (callable, optional): If not ``None``, this will be called on each
            worker subprocess with the worker id (an int in ``[0, num_workers - 1]``) as
            input, after seeding and before data loading. (default: ``None``)
        prefetch_factor (int, optional, keyword-only arg): Number of samples loaded
            in advance by each worker. ``2`` means there will be a total of
            2 * num_workers samples prefetched across all workers. (default: ``2``)
        persistent_workers (bool, optional): If ``True``, the data loader will not shutdown
            the worker processes after a dataset has been consumed once. This allows to
            maintain the workers `Dataset` instances alive. (default: ``False``)
    .. warning:: If the ``spawn`` start method is used, :attr:`worker_init_fn`
                 cannot be an unpicklable object, e.g., a lambda function. See
                 :ref:`multiprocessing-best-practices` on more details related
                 to multiprocessing in PyTorch.
    .. warning:: ``len(dataloader)`` heuristic is based on the length of the sampler used.
                 When :attr:`dataset` is an :class:`~torch.utils.data.IterableDataset`,
                 it instead returns an estimate based on ``len(dataset) / batch_size``, with proper
                 rounding depending on :attr:`drop_last`, regardless of multi-process loading
                 configurations. This represents the best guess PyTorch can make because PyTorch
                 trusts user :attr:`dataset` code in correctly handling multi-process
                 loading to avoid duplicate data.
                 However, if sharding results in multiple workers having incomplete last batches,
                 this estimate can still be inaccurate, because (1) an otherwise complete batch can
                 be broken into multiple ones and (2) more than one batch worth of samples can be
                 dropped when :attr:`drop_last` is set. Unfortunately, PyTorch can not detect such
                 cases in general.
                 See `Dataset Types`_ for more details on these two types of datasets and how
                 :class:`~torch.utils.data.IterableDataset` interacts with
                 `Multi-process data loading`_.
    .. warning:: See :ref:`reproducibility`, and :ref:`dataloader-workers-random-seed`, and
                 :ref:`data-loading-randomness` notes for random seed related questions.
    """

Two of these parameters turn out to be key:

num_workers (int, optional): how many subprocesses to use for data
    loading. ``0`` means that the data will be loaded in the main process.
    (default: ``0``)
pin_memory (bool, optional): If ``True``, the data loader will copy Tensors
    into CUDA pinned memory before returning them.  If your data elements
    are a custom type, or your :attr:`collate_fn` returns a batch that is a custom type,
    see the example below.

Setting num_workers = 4 and pin_memory = True, utilization went right up!

With only num_workers enabled:

With both num_workers and pin_memory enabled:
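For reference, here is a minimal sketch of what these settings look like in code, assuming a single CUDA device; the dataset and model below are placeholders rather than the ones from my training script. pin_memory pairs naturally with non_blocking=True when copying batches to the GPU:

import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    device = torch.device("cuda:0")

    # Placeholder dataset and model, just to make the sketch runnable;
    # substitute your own Dataset and network.
    dataset = TensorDataset(torch.randn(1024, 16), torch.randint(0, 2, (1024,)))
    model = torch.nn.Linear(16, 2).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    # num_workers > 0 loads and collates batches in worker subprocesses,
    # so the GPU is not left idle while the next batch is prepared;
    # pin_memory=True puts batches in page-locked host memory, which
    # speeds up the host-to-GPU copy.
    loader = DataLoader(dataset, batch_size=64, shuffle=True,
                        num_workers=4, pin_memory=True)

    for x, y in loader:
        # non_blocking=True lets the copy overlap with computation;
        # it only helps because the source tensors are pinned.
        x = x.to(device, non_blocking=True)
        y = y.to(device, non_blocking=True)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

if __name__ == "__main__":
    main()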

Summary

The above is my personal experience. I hope it can serve as a useful reference, and I hope everyone will continue to support 脚本之家.
