Understanding TorchDynamo-based ONNX Exporter Memory Usage
==========================================================

The previous TorchScript-based ONNX exporter would execute the model once to trace its execution, which could cause it to run out of
memory on your GPU if the model's memory requirements exceeded the available GPU memory. This issue has been addressed with the new
TorchDynamo-based ONNX exporter.

The TorchDynamo-based ONNX exporter uses the torch.export.export() function and leverages
`FakeTensorMode <https://pytorch.org/docs/stable/torch.compiler_fake_tensor.html>`_ to avoid performing actual tensor computations
during the export process. This approach results in significantly lower memory usage compared to the TorchScript-based ONNX exporter.
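
To see why fake tensors keep memory usage low, the snippet below is a minimal illustrative sketch (separate from the exporter example
that follows) showing that tensors created under FakeTensorMode carry only metadata such as shape and dtype, with no real storage
allocated. Note that FakeTensorMode lives in the internal torch._subclasses namespace, so the import path may change between releases.

.. code-block:: python

    import torch
    from torch._subclasses.fake_tensor import FakeTensorMode

    # Tensors created while the mode is active are FakeTensors: they record
    # shape, dtype, and device metadata, but do not allocate real storage.
    with FakeTensorMode():
        fake_input = torch.empty(30, 1, 48, 48, 48)
        print(type(fake_input))                    # <class 'torch._subclasses.fake_tensor.FakeTensor'>
        print(fake_input.shape, fake_input.dtype)  # metadata is still available

The exporter relies on this mechanism internally, so you do not need to enter the mode yourself when calling torch.onnx.export().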

Below is an example demonstrating the memory usage difference between TorchScript-based and TorchDynamo-based ONNX exporters.
In this example, we use the HighResNet model from MONAI. Before proceeding, please install it from PyPI:

.. code-block:: bash

    pip install monai


PyTorch offers a tool for capturing and visualizing memory usage traces. We will use this tool to record the memory usage of the two
exporters during the export process and compare the results. You can find more details about this tool in
`Understanding CUDA Memory Usage <https://pytorch.org/docs/stable/torch_cuda_memory.html>`_.
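
As a quick reference before the full examples, the snippet below is a minimal sketch of the recording pattern used throughout this page:
start recording, run the workload you want to profile, dump a snapshot, and stop recording. The workload and file name here are
placeholders; the private torch.cuda.memory helpers are the same ones used in the examples that follow.

.. code-block:: python

    import torch

    # Start recording the history of CUDA memory allocations (private API).
    torch.cuda.memory._record_memory_history()

    # Placeholder workload: run whatever GPU code you want to profile here.
    x = torch.randn(1024, 1024, device="cuda")
    y = x @ x

    # Write the recorded history to a pickled snapshot, then stop recording.
    torch.cuda.memory._dump_snapshot("my_workload.pickle")
    torch.cuda.memory._record_memory_history(enabled=None)

The resulting pickle file can be dropped into pytorch.org/memory_viz for inspection, exactly as done for the exporter snapshots below.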

TorchScript-based exporter
==========================

Run the code below to generate a snapshot file that records the state of allocated CUDA memory during the export process.

.. code-block:: python

    import torch

    from monai.networks.nets import HighResNet

    # Start recording the history of CUDA memory allocations.
    torch.cuda.memory._record_memory_history()

    model = HighResNet(
        spatial_dims=3, in_channels=1, out_channels=3, norm_type="batch"
    ).eval()
    model = model.to("cuda")
    data = torch.randn(30, 1, 48, 48, 48, dtype=torch.float32).to("cuda")

    # Export with the TorchScript-based exporter (dynamo=False), which traces
    # the model by actually executing it on the GPU.
    with torch.no_grad():
        onnx_program = torch.onnx.export(
            model,
            data,
            "torchscript_exporter_highresnet.onnx",
            dynamo=False,
        )

    # Save the recorded allocation history to a pickled snapshot file.
    snapshot_name = "torchscript_exporter_example.pickle"
    print(f"generate {snapshot_name}")
    torch.cuda.memory._dump_snapshot(snapshot_name)
    print("Export is done.")


Open `pytorch.org/memory_viz <https://pytorch.org/memory_viz>`_ and drag/drop the generated pickled snapshot file into the visualizer.
The memory usage is shown below:

.. image:: _static/img/onnx/torch_script_exporter_memory_usage.png

From this figure, we can see that the peak memory usage is above 2.8GB.

TorchDynamo-based exporter
==========================

Run the code below to generate a snapshot file that records the state of allocated CUDA memory during the export process.

.. code-block:: python

    import torch

    from monai.networks.nets import HighResNet

    # Start recording the history of CUDA memory allocations.
    torch.cuda.memory._record_memory_history()

    model = HighResNet(
        spatial_dims=3, in_channels=1, out_channels=3, norm_type="batch"
    ).eval()
    model = model.to("cuda")
    data = torch.randn(30, 1, 48, 48, 48, dtype=torch.float32).to("cuda")

    # Export with the TorchDynamo-based exporter (dynamo=True), which relies on
    # torch.export.export() and FakeTensorMode instead of executing the model.
    with torch.no_grad():
        onnx_program = torch.onnx.export(
            model,
            data,
            "test_faketensor.onnx",
            dynamo=True,
        )

    # Save the recorded allocation history to a pickled snapshot file.
    snapshot_name = "torchdynamo_exporter_example.pickle"
    print(f"generate {snapshot_name}")
    torch.cuda.memory._dump_snapshot(snapshot_name)
    print("Export is done.")


Open `pytorch.org/memory_viz <https://pytorch.org/memory_viz>`_ and drag/drop the generated pickled snapshot file into the visualizer.
The memory usage is shown below:

.. image:: _static/img/onnx/torch_dynamo_exporter_memory_usage.png

From this figure, we can see that the peak memory usage is only around 45MB. Compared to the peak memory usage of the TorchScript-based
exporter, this is roughly a 98% reduction in memory usage.
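
The reduction quoted above follows directly from the two peaks; below is a quick back-of-the-envelope check using the approximate
values read off the memory_viz plots.

.. code-block:: python

    # Approximate peaks read from the two memory_viz visualizations above.
    torchscript_peak_mb = 2.8 * 1024   # ~2.8 GB
    dynamo_peak_mb = 45                # ~45 MB

    reduction = 1 - dynamo_peak_mb / torchscript_peak_mb
    print(f"{reduction:.1%}")          # ~98.4%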