.. _reproducibility:

Reproducibility
===============

Completely reproducible results are not guaranteed across PyTorch releases,
individual commits, or different platforms. Furthermore, results may not be
reproducible between CPU and GPU executions, even when using identical seeds.

However, there are some steps you can take to limit the number of sources of
nondeterministic behavior for a specific platform, device, and PyTorch release.
First, you can control sources of randomness that can cause multiple executions
of your application to behave differently. Second, you can configure PyTorch
to avoid using nondeterministic algorithms for some operations, so that multiple
calls to those operations, given the same inputs, will produce the same result.

.. warning::

    Deterministic operations are often slower than nondeterministic operations, so
    single-run performance may decrease for your model. However, determinism may
    save time in development by facilitating experimentation, debugging, and
    regression testing.

Controlling sources of randomness
.................................

PyTorch random number generator
-------------------------------
You can use :meth:`torch.manual_seed()` to seed the RNG for all devices (both
CPU and CUDA)::

    import torch
    torch.manual_seed(0)

Some PyTorch operations may use random numbers internally.
:meth:`torch.svd_lowrank()` does this, for instance. Consequently, calling it
multiple times back-to-back with the same input arguments may give different
results. However, as long as :meth:`torch.manual_seed()` is called with a
constant at the beginning of an application and all other sources of
nondeterminism have been eliminated, the same series of random numbers will be
generated each time the application is run in the same environment.

It is also possible to obtain identical results from an operation that uses
random numbers by calling :meth:`torch.manual_seed()` with the same value
before each of the subsequent calls.

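As a minimal sketch (the operation, matrix size, and rank ``q`` here are only
illustrative), re-seeding before each call makes the outputs of a randomized
operation match::

    import torch

    A = torch.randn(10, 4)

    torch.manual_seed(0)
    U1, S1, V1 = torch.svd_lowrank(A, q=2)

    torch.manual_seed(0)
    U2, S2, V2 = torch.svd_lowrank(A, q=2)

    # The same seed was set before each call, so the results agree.
    assert torch.allclose(U1, U2)
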
Python
------

For custom operators, you might need to set the Python seed as well::

    import random
    random.seed(0)

Random number generators in other libraries
-------------------------------------------
If you or any of the libraries you are using rely on NumPy, you can seed the global
NumPy RNG with::

    import numpy as np
    np.random.seed(0)

However, some applications and libraries may use NumPy Random Generator objects,
not the global RNG
(`<https://numpy.org/doc/stable/reference/random/generator.html>`_), and those will
need to be seeded consistently as well.

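As a sketch (the seed value and the sampled shape are arbitrary), such a
Generator can be constructed with a fixed seed so that the numbers it produces
are reproducible across runs::

    import numpy as np

    # A seeded Generator, independent of the global NumPy RNG
    rng = np.random.default_rng(0)
    values = rng.standard_normal(3)
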
If you are using any other libraries that use random number generators, refer to
the documentation for those libraries to see how to set consistent seeds for them.

CUDA convolution benchmarking
-----------------------------
The cuDNN library, used by CUDA convolution operations, can be a source of nondeterminism
across multiple executions of an application. When a cuDNN convolution is called with a
new set of size parameters, an optional feature can run multiple convolution algorithms,
benchmarking them to find the fastest one. Then, the fastest algorithm will be used
consistently during the rest of the process for the corresponding set of size parameters.
Due to benchmarking noise and different hardware, the benchmark may select different
algorithms on subsequent runs, even on the same machine.

Disabling the benchmarking feature with :code:`torch.backends.cudnn.benchmark = False`
causes cuDNN to deterministically select an algorithm, possibly at the cost of reduced
performance.

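As a sketch, a reproducibility-oriented setup simply turns the feature off
before any convolutions run (the flag is global, so setting it once at startup
is enough)::

    import torch

    # Deterministic cuDNN algorithm selection, possibly at a performance cost
    torch.backends.cudnn.benchmark = False
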
However, if you do not need reproducibility across multiple executions of your application,
then performance might improve if the benchmarking feature is enabled with
:code:`torch.backends.cudnn.benchmark = True`.

Note that this setting is different from the :code:`torch.backends.cudnn.deterministic`
setting discussed below.

Avoiding nondeterministic algorithms
....................................
:meth:`torch.use_deterministic_algorithms` lets you configure PyTorch to use
deterministic algorithms instead of nondeterministic ones where available, and
to throw an error if an operation is known to be nondeterministic (and without
a deterministic alternative).

Please check the documentation for :meth:`torch.use_deterministic_algorithms()`
for a full list of affected operations. If an operation does not act correctly
according to the documentation, or if you need a deterministic implementation
of an operation that does not have one, please submit an issue:
`<https://github.com/pytorch/pytorch/issues?q=label:%22module:%20determinism%22>`_

For example, running the nondeterministic CUDA implementation of :meth:`torch.Tensor.index_add_`
will throw an error::

    >>> import torch
    >>> torch.use_deterministic_algorithms(True)
    >>> torch.randn(2, 2).cuda().index_add_(0, torch.tensor([0, 1]), torch.randn(2, 2))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    RuntimeError: index_add_cuda_ does not have a deterministic implementation, but you set
    'torch.use_deterministic_algorithms(True)'. ...

When :meth:`torch.bmm` is called with sparse-dense CUDA tensors it typically uses a
nondeterministic algorithm, but when the deterministic flag is turned on, its alternate
deterministic implementation will be used::

    >>> import torch
    >>> torch.use_deterministic_algorithms(True)
    >>> torch.bmm(torch.randn(2, 2, 2).to_sparse().cuda(), torch.randn(2, 2, 2).cuda())
    tensor([[[ 1.1900, -2.3409],
             [ 0.4796,  0.8003]],

            [[ 0.1509,  1.8027],
             [ 0.0333, -1.1444]]], device='cuda:0')

CUDA convolution determinism
----------------------------
While disabling CUDA convolution benchmarking (discussed above) ensures that
CUDA selects the same algorithm each time an application is run, that algorithm
itself may be nondeterministic, unless either
:code:`torch.use_deterministic_algorithms(True)` or
:code:`torch.backends.cudnn.deterministic = True` is set. The latter setting
controls only this behavior, unlike :meth:`torch.use_deterministic_algorithms`
which will make other PyTorch operations behave deterministically, too.

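As a minimal sketch of the two alternatives (either line by itself is enough
for deterministic convolution algorithms; the first also affects other
operations)::

    import torch

    # Option 1: request deterministic behavior for all supported operations
    torch.use_deterministic_algorithms(True)

    # Option 2: only force cuDNN to use deterministic convolution algorithms
    torch.backends.cudnn.deterministic = True
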
CUDA RNN and LSTM
-----------------
In some versions of CUDA, RNNs and LSTM networks may have nondeterministic behavior.
See :meth:`torch.nn.RNN` and :meth:`torch.nn.LSTM` for details and workarounds.

Filling uninitialized memory
----------------------------
Operations like :meth:`torch.empty` and :meth:`torch.Tensor.resize_` can return
tensors with uninitialized memory that contain undefined values. Using such a
tensor as an input to another operation is invalid if determinism is required,
because the output will be nondeterministic. But there is nothing to actually
prevent such invalid code from being run. So for safety,
:attr:`torch.utils.deterministic.fill_uninitialized_memory` is set to ``True``
by default, which will fill the uninitialized memory with a known value if
:code:`torch.use_deterministic_algorithms(True)` is set. This will prevent the
possibility of this kind of nondeterministic behavior.

However, filling uninitialized memory is detrimental to performance. So if your
program is valid and does not use uninitialized memory as the input to an
operation, then this setting can be turned off for better performance.

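As a sketch, turning the safety net off once you have verified that no
uninitialized values ever feed into later operations looks like this::

    import torch

    torch.use_deterministic_algorithms(True)

    # Skip filling memory returned by torch.empty() and similar operations
    # with a known value; only safe if the program never reads such memory.
    torch.utils.deterministic.fill_uninitialized_memory = False
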
DataLoader
..........

DataLoader will reseed workers following the :ref:`data-loading-randomness` algorithm.
Use :meth:`worker_init_fn` and `generator` to preserve reproducibility::

    import random

    import numpy
    import torch
    from torch.utils.data import DataLoader

    def seed_worker(worker_id):
        worker_seed = torch.initial_seed() % 2**32
        numpy.random.seed(worker_seed)
        random.seed(worker_seed)

    g = torch.Generator()
    g.manual_seed(0)

    DataLoader(
        train_dataset,
        batch_size=batch_size,
        num_workers=num_workers,
        worker_init_fn=seed_worker,
        generator=g,
    )