Add config.save.use_pinned_memory_for_d2h to serialization config (#143342)
This was benchmarked with two separate scripts on my A100:

(A) Save the state_dict of a llama3-style model on CUDA to disk with `torch.save`
(B) Save a `ModuleList` of 10 `nn.Linear(10000, 10000)` on CUDA to disk with `torch.save`

Timings are an average of 5 runs; the benchmark scripts and results are linked below.

Under both scenarios, we see a **~2x speedup in `torch.save` time with (`compute_crc32=False` and `use_pinned_memory_for_d2h=True`)** compared to the baseline of the current defaults (`compute_crc32=True` and `use_pinned_memory_for_d2h=False`).

(A) Save the state_dict of a llama3-style model on CUDA to disk with `torch.save` [[script](https://gist.github.com/mikaylagawarecki/d3a86ea1bb08045d1a839976808d7432)] [[results](https://gist.github.com/mikaylagawarecki/f61a4714e5cff703146a1fcb7e0c755c)]

| | `use_pinned_memory_for_d2h=False` (default) | `use_pinned_memory_for_d2h=True` |
|-|-|-|
| `compute_crc32=True` (default) | 28.54s | 20.76s |
| `compute_crc32=False` | 22.57s | **14.51s** |

(B) Save a `ModuleList` of 10 `nn.Linear(10000, 10000)` on CUDA to disk with `torch.save` [[script](https://gist.github.com/mikaylagawarecki/ecbc505436bdd4b5190ef1b3430c12b6)] [[results](https://gist.github.com/mikaylagawarecki/4e686bcf030b57de8c3ca74d8f5a88f7)]

| | `use_pinned_memory_for_d2h=False` (default) | `use_pinned_memory_for_d2h=True` |
|-|-|-|
| `compute_crc32=True` (default) | 8.38s | 5.53s |
| `compute_crc32=False` | 6.94s | **3.99s** |

Trace of (A) with `use_pinned_memory_for_d2h=True`, `compute_crc32=False`

<img width="1745" alt="Screenshot 2024-12-16 at 7 32 33 PM" src="https://github.com/user-attachments/assets/80b87a8c-5a70-4eb9-ad66-7abc4aa7cc25" />

Baseline trace of (A) with `use_pinned_memory_for_d2h=False`, `compute_crc32=True`

<img width="1799" alt="Screenshot 2024-12-16 at 7 38 20 PM" src="https://github.com/user-attachments/assets/13fa12d1-8f5f-424c-adc4-275b67012927" />

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143342
Approved by: https://github.com/albanD
ghstack dependencies: #143324
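To make the setup concrete, here is a minimal sketch (not the attached benchmark scripts) of scenario (B) with both knobs flipped. It assumes a CUDA device and a PyTorch build that includes this PR, so that `torch.utils.serialization.config` exposes `save.compute_crc32` and `save.use_pinned_memory_for_d2h`; the output file name is illustrative.

```python
import time

import torch
from torch import nn
from torch.utils.serialization import config  # assumed to exist per the docs change below

# Scenario (B): a ModuleList of 10 large Linear layers on CUDA.
model = nn.ModuleList([nn.Linear(10_000, 10_000) for _ in range(10)]).cuda()

# Defaults are compute_crc32=True and use_pinned_memory_for_d2h=False;
# flipping both is the configuration reported as ~2x faster in the tables above.
config.save.compute_crc32 = False
config.save.use_pinned_memory_for_d2h = True

start = time.perf_counter()
torch.save(model.state_dict(), "linears.pt")
print(f"torch.save took {time.perf_counter() - start:.2f}s")
```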
Committed by: PyTorch MergeBot
Parent commit: 3f63b742e6
Commit: 8e483654cb
@@ -503,6 +503,8 @@ Config

 * ``compute_crc32``: whether to compute and write the zip file checksum (Default : ``True``).
   See :func:`~torch.serialization.set_crc32_options`.
+* ``use_pinned_memory_for_d2h``: for storages that are on an accelerator when passed to ``torch.save``, whether to
+  move storage to pinned memory or pageable memory on CPU within ``torch.save``. (Default: ``False`` (i.e. pageable))

 ``torch.utils.serialization.config.load`` contains options that control the behavior of ``torch.load``.
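As a usage note, here is a hedged sketch of toggling the documented options from user code. It assumes the `torch.utils.serialization.config` module described in the docs hunk above; `torch.serialization.set_crc32_options` is the setter the docs reference.

```python
import torch
from torch.utils.serialization import config

# Documented defaults: checksum on, pageable (non-pinned) D2H copies.
print(config.save.compute_crc32)              # True
print(config.save.use_pinned_memory_for_d2h)  # False

# Stage accelerator storages in pinned CPU memory during torch.save.
config.save.use_pinned_memory_for_d2h = True

# The checksum can also be disabled via the setter referenced in the docs.
torch.serialization.set_crc32_options(False)
```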