Commit Graph

262 Commits

Author SHA1 Message Date
68e023cbbb [BE] Add missing type for storage dict (#156831)
For some reason, this one always bleats when I run mypy on OSX, so shut it up.

Signed-off-by: Edward Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156831
Approved by: https://github.com/mikaylagawarecki, https://github.com/atalman, https://github.com/malfet
ghstack dependencies: #156830
2025-06-26 02:48:55 +00:00
e95e8eed0a mypy 1.16.0 (#155821)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155821
Approved by: https://github.com/ezyang, https://github.com/zou3519
2025-06-14 18:18:43 +00:00
6383ddcfa4 Update serialization docs (#153631)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153631
Approved by: https://github.com/albanD
2025-05-19 20:22:07 +00:00
68c12ecfe2 Move get accelerator to use build time flags when possible (#146098)
This PR does two main things (they are in a single PR to show how the newly added APIs are used).

- Add isBuilt and isAvailable APIs to the AcceleratorHook interface. See inline doc for their exact semantic
- Use the newly added isBuilt for accelerator check to ensure it does not poison fork

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146098
Approved by: https://github.com/ngimel, https://github.com/malfet, https://github.com/EikanWang, https://github.com/jeromean

Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
2025-03-10 13:17:58 +00:00
b246cd7b82 Revert "Move get accelerator to use build time flags when possible (#146098)"
This reverts commit 17302b4bc837af079d2f6480f07ea2c99b93fb4b.

Reverted https://github.com/pytorch/pytorch/pull/146098 on behalf of https://github.com/albanD due to Still fails with cuda build on a non-gpu machine ([comment](https://github.com/pytorch/pytorch/pull/146098#issuecomment-2707191770))
2025-03-07 18:59:58 +00:00
17302b4bc8 Move get accelerator to use build time flags when possible (#146098)
This PR does two main things (they are in a single PR to show how the newly added APIs are used).

- Add isBuilt and isAvailable APIs to the AcceleratorHook interface. See inline doc for their exact semantic
- Use the newly added isBuilt for accelerator check to ensure it does not poison fork

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146098
Approved by: https://github.com/ngimel, https://github.com/malfet, https://github.com/EikanWang, https://github.com/jeromean

Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
2025-03-07 15:19:34 +00:00
d5184901c4 Make torch.serialization.skip_data work with torch.load (#148018)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148018
Approved by: https://github.com/albanD
ghstack dependencies: #147786, #147787, #147788
2025-03-06 12:04:46 +00:00
be0ceee1c3 Make record/storage alignment in torch.save configurable (#147788)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147788
Approved by: https://github.com/albanD
ghstack dependencies: #147786, #147787
2025-03-06 12:04:46 +00:00
209977e6e5 Add information about checkpoint offset to untyped storages when torch.load under FakeTensorMode (#147787)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147787
Approved by: https://github.com/albanD
ghstack dependencies: #147786
2025-03-06 12:04:39 +00:00
bdcc1b579b Allow torch.load under FakeTensorMode to load FakeTensors with correct devices (for plain Tensors) (#147786)
This only fixes _rebuild_tensor_v2 and _rebuild_tensor_v3

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147786
Approved by: https://github.com/albanD
2025-03-06 12:04:32 +00:00
c73a92fbf5 [BE][CI] bump ruff to 0.9.2: multiline assert statements (#144546)
Reference: https://docs.astral.sh/ruff/formatter/black/#assert-statements

> Unlike Black, Ruff prefers breaking the message over breaking the assertion, similar to how both Ruff and Black prefer breaking the assignment value over breaking the assignment target:
>
> ```python
> # Input
> assert (
>     len(policy_types) >= priority + num_duplicates
> ), f"This tests needs at least {priority+num_duplicates} many types."
>
>
> # Black
> assert (
>     len(policy_types) >= priority + num_duplicates
> ), f"This tests needs at least {priority+num_duplicates} many types."
>
> # Ruff
> assert len(policy_types) >= priority + num_duplicates, (
>     f"This tests needs at least {priority + num_duplicates} many types."
> )
> ```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144546
Approved by: https://github.com/malfet
2025-02-27 20:46:16 +00:00
52f6d4aa30 [BE][CI][Easy] bump ruff to 0.9.0: long statements in docstrings (#146509)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146509
Approved by: https://github.com/justinchuby, https://github.com/Skylion007
2025-02-24 19:56:08 +00:00
086d146f6f Update ruff linter for PEP585 (#147540)
This turns on PEP585 enforcement in RUFF.

- Updates the target python version
- Stops ignoring UP006 warnings (PEP585)
- Fixes a few issues which crept into the tree in the last day

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147540
Approved by: https://github.com/justinchuby, https://github.com/Skylion007
2025-02-22 04:45:17 +00:00
2190ca7f47 Use __qualname__ in add_safe_globals and update Unpickling error raised for Unsupported GLOBAL (#146815)
- Fixes #146814

Change
```python
for f in _marked_safe_globals_set:
    module, name = f.__module__, f.__name__
```
to

```python
for f in _marked_safe_globals_set:
    module, name = f.__module__, f.__qualname__
```
for avoiding same key string overwrite.

A test is also added.
```
python test/test_serialization.py TestSerialization.test_serialization_nested_class
```

- Fixes #146886
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146815
Approved by: https://github.com/mikaylagawarecki
2025-02-21 18:04:59 +00:00
db4ce78d46 PEP585: More UP006 fixes (#146392)
This should be the final PR before we can enable RUFF UP006.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146392
Approved by: https://github.com/justinchuby, https://github.com/albanD, https://github.com/Skylion007
2025-02-20 06:18:13 +00:00
cyy
d473c212fd Remove code for Python < 3.9 (#147097)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147097
Approved by: https://github.com/albanD
2025-02-14 03:22:49 +00:00
001e355a56 Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True (#143880)
## Background

This PR adds `torch.utils.serialization.config.load.calculate_storage_offsets`. This option relies  on the previous PR in this stack, where storage order was changed to non lexicographical. A `.format_version` entry was added to the zipfile and `calculate_storage_offsets` will only work on checkpoints with `.format_version`.

When this is turned on, for `torch.load(mmap=True)`, offsets of each storage record (other than the 0th storage will be calculated instead of relying on `miniz` APIs to determine this).

The existing APIs will issue multiple random reads (reading the end of central directory record, then reading the zipfile header for the record) to determine the storage offset where the record starts. This can greatly degrade `torch.load(mmap=True)` performance for non-filesystem cases.

6aaae9d78f/caffe2/serialize/inline_container.cc (L589-L605)

## How does this work

The format for the checkpoint is as such

```
archive_name/
|_ data.pkl
|_.format_version
|_byteorder
|_data/
  |_ 0
  |_ 1
  |_ 2
  |_ ...
|_
```

Each `data/i` record represents a storage, where storages are written in the order that the Pickler encounters them.

For each storage, our `persistent_load` logic saves the following metadata to the pickle file `dtype, numel, key, location` where `numel` is the number of bytes in the storage.

Note that we always use `miniz` writer  in the zip64 mode per [here](7796e308d0/caffe2/serialize/inline_container.cc (L701)) A zipfile record written by miniz looks as such

```
 ---------------- ----------------- ------------------- ---------------- --------- ------------------------------
| 30 byte header | n byte filename | zip64_extra_data | m byte padding | storage | 16 or 24 byte local dir footer  |
 ---------------- ----------------- ------------------- ---------------- --------- ------------------------------
```

- The header size (30) is given by [`MZ_ZIP_LOCAL_DIR_HEADER_SIZE`](https://github.com/pytorch/pytorch/blob/main/third_party/miniz-3.0.2/miniz.c?fbclid=IwZXh0bgNhZW0CMTEAAR2O8Vysd--UoSCxW70gabXIS1dbz733oHwuUQ5_Ff1hY2WU6PL2i6CSH4A_aem_J9oaU2HpDeWtJKOU9EnVqw#L3290)
- filename will be `"{archive_name}/{filepath}"`

- `zip64_extra_data` is determined by [`mz_zip_writer_create_zip64_extra_data`](7796e308d0/third_party/miniz-3.0.2/miniz.c (L6202)). Note that [we only create zip64_extra_data if storage_size >= 0xFFFFFFFF or the offset of the start of the header >= 0xFFFFFFFF](7796e308d0/third_party/miniz-3.0.2/miniz.c (L6519-L6524))
- `m` is determined by [`getPadding`](7796e308d0/caffe2/serialize/inline_container.cc (L254)), which accounts for filename, zip64_extra_data to determine `m` such that the start of `storage` is aligned to 64 bytes. The `m` bytes will always start with `F B padding_size" as the first 4 bytes
- The local dir footer size is determined based on [this snippet ](7796e308d0/third_party/miniz-3.0.2/miniz.c (L6610-L6632)): if the buffer size is 0 it is skipped. If the zip64_extra_data was created, it is 24, otherwise it is 16.

When `torch.utils.serialization.config.load.calculate_storage_offsets` is set we do the following
- We keep track of where the "cursor" is in the file using `current_offset`, after each persistent_load call, it will be at the offset where the header for the next record starts
- for the 0th storage, "data/0", we use the regular get_record_offset to determine the start of the storage
- for any other storage, (where the storages will be in order encountered by the unpickler, 0, 1, 2, 3, ...) we use `get_record_offset_no_read`, which re-uses the `getPadding` logic to determine the offset of the storage
- Note that `load_tensor` will only ever be called again with the same key if the storage's `._data_ptr()` is 0 [[pointer1](https://github.com/pytorch/pytorch/blob/main/torch/serialization.py#L1917-L1918)][[pointer2](https://github.com/pytorch/pytorch/blob/main/torch/serialization.py#L1936-L1937)], so we cache the offsets for this edge case
- After each storage, if the storage is non-zero, we account for the local dir footer based on the logic described above

## Testing strategy

The agreed upon testing strategy was as follows:
- Add debug code gated by an environment flag `TORCH_SERIALIZATION_DEBUG` that will run this offset calculation logic and verify it against getRecordOffset for each storage (when mmap=False)
- This flag is set throughout CI, which means that every time `torch.load` is called, the offset calculation logic is implicitly being tested.

Differential Revision: [D67673026](https://our.internmc.facebook.com/intern/diff/D67673026)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143880
Approved by: https://github.com/albanD
ghstack dependencies: #143879
2025-01-31 17:09:20 +00:00
2d6f6637d3 Remove lexicographical sorting of storage keys in torch.save (#143879)
Currently the order lexicographical (i.e. 0, 10, 11, ...19, 2, ....) instead of 0, 1, 2, 3, 4, 5 (the order that storage metadata is actually pickled in), since PyTorch will never be used with Python < 3.7 we can be assured that the keys will be read in the order of insertion (numerically sorted)

This makes it such that the order storages are written in are the same as the pickling/unpickling order so we can calculate their offsets with less random reads

* __->__ #143879
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143879
Approved by: https://github.com/albanD
2025-01-31 17:00:23 +00:00
dbef2a9bc9 Revert "Remove lexicographical sorting of storage keys in torch.save (#143879)"
This reverts commit 7db0afabaaff17dd37cf846cd786610ebf6aedd3.

Reverted https://github.com/pytorch/pytorch/pull/143879 on behalf of https://github.com/ZainRizvi due to Sorry but this is breaking internally. See D68746524 for details ([comment](https://github.com/pytorch/pytorch/pull/143879#issuecomment-2619661492))
2025-01-28 17:40:16 +00:00
9010649292 Revert "Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True (#143880)"
This reverts commit db3685a35cdce32622ab89f6c92e09d52210ff53.

Reverted https://github.com/pytorch/pytorch/pull/143880 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but either this PR or the base PR breaks distributed tests ([comment](https://github.com/pytorch/pytorch/pull/143880#issuecomment-2617743403))
2025-01-28 03:07:17 +00:00
db3685a35c Add option to serialization config to reduce random reads from get_record_offset when loading with mmap=True (#143880)
## Background

This PR adds `torch.utils.serialization.config.load.calculate_storage_offsets`. This option relies  on the previous PR in this stack, where storage order was changed to non lexicographical. A `.format_version` entry was added to the zipfile and `calculate_storage_offsets` will only work on checkpoints with `.format_version`.

When this is turned on, for `torch.load(mmap=True)`, offsets of each storage record (other than the 0th storage will be calculated instead of relying on `miniz` APIs to determine this).

The existing APIs will issue multiple random reads (reading the end of central directory record, then reading the zipfile header for the record) to determine the storage offset where the record starts. This can greatly degrade `torch.load(mmap=True)` performance for non-filesystem cases.

6aaae9d78f/caffe2/serialize/inline_container.cc (L589-L605)

## Testing strategy

The agreed upon testing strategy was as follows:
- Add debug code gated by an environment flag `TORCH_SERIALIZATION_DEBUG` that will run this offset calculation logic and verify it against getRecordOffset for each storage (when mmap=False)
- This flag is set throughout CI, which means that every time `torch.load` is called, the offset calculation logic is implicitly being tested.

Differential Revision: [D67673026](https://our.internmc.facebook.com/intern/diff/D67673026)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143880
Approved by: https://github.com/albanD
ghstack dependencies: #143879
2025-01-27 23:57:30 +00:00
7db0afabaa Remove lexicographical sorting of storage keys in torch.save (#143879)
Currently the order lexicographical (i.e. 0, 10, 11, ...19, 2, ....) instead of 0, 1, 2, 3, 4, 5 (the order that storage metadata is actually pickled in), since PyTorch will never be used with Python < 3.7 we can be assured that the keys will be read in the order of insertion (numerically sorted)

This makes it such that the order storages are written in are the same as the pickling/unpickling order so we can calculate their offsets with less random reads

Differential Revision: [D67673025](https://our.internmc.facebook.com/intern/diff/D67673025)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143879
Approved by: https://github.com/albanD
2025-01-27 23:57:30 +00:00
835e770bad Use typing.IO[bytes] instead of io.BytesIO in annotations (#144994)
Fixes #144976

Using appoach ① `IO[bytes]`, but could also try with a protocol.

## Notes:

- moved `torch.serialization.FILE_LIKE` to `torch.types.FileLike`
- Use `FileLike` annotation where it makes sense
- made sure those functions also support `os.PathLike`
- Replaced `isinstance(x, io.BytesIO)` with `isinstance(x, (io.IOBase, IO))` where appropriate.
- Replaced `BinaryIO` with `IO[bytes]` (the two ABCs are almost identical, the only difference is that `BinaryIO` allows `bytearray` input to `write`, whereas `IO[bytes]` only `bytes`)
- needed to make `torch.serialization._opener` generic to avoid LSP violations.
- skipped `torch/onnx/verification` for now (functions use `BytesIO.getvalue` which is not part of the `IO[bytes]` ABC, but it kind of seems that this is redundant, as e.g. `onnx.load` supports `str | PathLike[str] | IO[bytes]` directly...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144994
Approved by: https://github.com/ezyang, https://github.com/Skylion007
2025-01-27 18:08:07 +00:00
0afd335174 PEP585 update - torch/nn torch/optim torch/package torch/profiler torch/serialization torch/sparse torch/xpu (#145175)
See #145101 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145175
Approved by: https://github.com/bobrenjc93
2025-01-21 16:57:27 +00:00
5fd881a5b6 Revert "PEP585 update - torch/nn torch/optim torch/package torch/profiler torch/serialization torch/sparse torch/xpu (#145175)"
This reverts commit 54a00af2c6026a830f40d9e6a659ff81d51f9bc6.

Reverted https://github.com/pytorch/pytorch/pull/145175 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to break some trunk tests ([comment](https://github.com/pytorch/pytorch/pull/145175#issuecomment-2603418267))
2025-01-21 00:49:55 +00:00
54a00af2c6 PEP585 update - torch/nn torch/optim torch/package torch/profiler torch/serialization torch/sparse torch/xpu (#145175)
See #145101 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145175
Approved by: https://github.com/bobrenjc93
2025-01-20 22:32:59 +00:00
0eda02a94c Prevent legacy_load when weights_only=True (correctly) (#145020)
Only prevent `legacy_load` (.tar format removed in https://github.com/pytorch/pytorch/pull/713), not the whole of `_legacy_load` (.tar format + _use_new_zipfile_serialization=False)

Differential Revision: [D68301405](https://our.internmc.facebook.com/intern/diff/D68301405)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145020
Approved by: https://github.com/kit1980, https://github.com/albanD
2025-01-17 20:10:22 +00:00
aa4a1ff027 Revert "Prevent _legacy_load with weights_only=True (#144914)"
This reverts commit 7c3aa1da1c97812af54d41f3f0eff2ef922c0f32.

Reverted https://github.com/pytorch/pytorch/pull/144914 on behalf of https://github.com/izaitsevfb due to breaking inductor on trunk ([comment](https://github.com/pytorch/pytorch/pull/144914#issuecomment-2596922781))
2025-01-16 21:29:50 +00:00
7c3aa1da1c Prevent _legacy_load with weights_only=True (#144914)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144914
Approved by: https://github.com/malfet, https://github.com/albanD
2025-01-16 19:33:46 +00:00
8e483654cb Add config.save.use_pinned_memory_for_d2h to serialization config (#143342)
This was benchmarked with two separate scripts on my A100
(A) Save state_dict of llama3-style model on CUDA to disk with ``torch.save``
(B) Save `ModuleList` of 10 `nn.Linear(10,000, 10,000)` on CUDA to disk with `torch.save`
Timings are an average of 5 runs and benchmark scripts + results are attached

Under both scenarios, we see **~2x speedup in ``torch.save`` time with (``compute_crc32=False`` and ``use_pinned_memory_for_d2h=True``)** compared to the baseline of the current defaults (``compute_crc32=True`` and ``use_pinned_memory_for_d2h=False``

(A)  Save state_dict of llama3-style model on CUDA to disk with ``torch.save`` [[script](https://gist.github.com/mikaylagawarecki/d3a86ea1bb08045d1a839976808d7432)][[results](https://gist.github.com/mikaylagawarecki/f61a4714e5cff703146a1fcb7e0c755c)]

|                                                                                 |  use_pinned_memory_for_d2h=False (Default) |  use_pinned_memory_for_d2h=True |
|-|-|-|
| `compute_crc_32= True`  (Default)| 28.54s | 20.76s |
| `compute_crc_32 = False` | 22.57s |  **14.51s** |

(B) Save `ModuleList` of 10 `nn.Linear(10,000, 10,000)` on CUDA to disk with `torch.save` [[script](https://gist.github.com/mikaylagawarecki/ecbc505436bdd4b5190ef1b3430c12b6)][[results](https://gist.github.com/mikaylagawarecki/4e686bcf030b57de8c3ca74d8f5a88f7)]

|                                                                                 |  use_pinned_memory_for_d2h=False (Default) |  use_pinned_memory_for_d2h=True |
|-|-|-|
| `compute_crc_32= True`  (Default)| 8.38s | 5.53s |
| `compute_crc_32 = False` | 6.94s |  **3.99s** |

Trace of (A) with `use_pinned_memory_for_d2h=True`, `compute_crc32=False`
<img width="1745" alt="Screenshot 2024-12-16 at 7 32 33 PM" src="https://github.com/user-attachments/assets/80b87a8c-5a70-4eb9-ad66-7abc4aa7cc25" />

Baseline trace of (A) with `use_pinned_memory_for_d2h=False`, `compute_crc32=True`
<img width="1799" alt="Screenshot 2024-12-16 at 7 38 20 PM" src="https://github.com/user-attachments/assets/13fa12d1-8f5f-424c-adc4-275b67012927" />

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143342
Approved by: https://github.com/albanD
ghstack dependencies: #143324
2024-12-20 21:01:18 +00:00
3f63b742e6 Refactor serialization getter/setters into torch.utils.serialization.config (#143324)
Consolidate
- get/set_default_load_endianness
- get/set_default_mmap_options
- get/set_crc32_options

into one global dynamo-style config + allow global setting of mmap. The existing APIs are not removed and will get/set from the config (as they can't be removed for BC)

In #143459 I add the local (argument style) config

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143324
Approved by: https://github.com/albanD
2024-12-20 21:01:17 +00:00
ac8342f881 Prevent torch.jit.load path in torch.load when weights_only=True (#143326)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143326
Approved by: https://github.com/albanD
2024-12-18 00:17:41 +00:00
b7b56576d8 Allow user to manually pass module.name associated with global in {add}_safe_global (#142153)
Fixes #142144

A global x is saved in checkpoint as `GLOBAL x.__module__ x.__name__`. So , after allowlisting a GLOBAL it is expected to match any GLOBAL instruction of the form `GLOBAL x.__module__ x.__name__`  but there are edge cases when for the same API from the same module, what `__module__` gives changes between versions which prevents users from allowlisting the global.

In this case, in numpy < 2.1

```
torch.save("bla", np_array)
# checkpoint has GLOBAL "np.core.multiarray" "_reconstruct"
```
In np version 2.1

```
with safe_globals([np.core.multiarray._reconstruct]):
    torch.load("bla")
```
np.core.multiarray._reconstruct.__module__ gives "np._core.multiarray" (note the extra _ before core) and see what was done [here](https://github.com/numpy/numpy/blob/main/numpy/core/multiarray.py)

Since the dictionary to access safe globals is keyed on "{foo.__module__}.{foo.__name__}", __module__, __name__ will no longer match that in the checkpoint so "np.core.multiarray._reconstruct" can no longer be properly allowlisted (instead np._core.multiarray._reconstruct is a key in the dict).

We allow `add_safe_globals/safe_globals` to optionally take tuples of (global, str of module.name) to workaround such (odd/edge case) situations.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142153
Approved by: https://github.com/albanD
2024-12-06 18:56:39 +00:00
ecbb8a8800 Mention version of flip in weights_only error message (#141304)
Fixes https://github.com/pytorch/pytorch/issues/141139

How the 3 versions of the error message now look

### Version 1

Old error message:

```
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
        (1) Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
        (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
        WeightsUnpickler error: Unsupported global: GLOBAL __main__._rebuild_class_that_uses_build_instruction was not an allowed global by default. Please use `torch.serialization.add_safe_globals([_rebuild_class_that_uses_build_instruction])` or the `torch.serialization.safe_globals([_rebuild_class_that_uses_build_instruction])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
```

New error message:

```
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
        (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
        (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
        WeightsUnpickler error: Unsupported global: GLOBAL __main__._rebuild_class_that_uses_build_instruction was not an allowed global by default. Please use `torch.serialization.add_safe_globals([_rebuild_class_that_uses_build_instruction])` or the `torch.serialization.safe_globals([_rebuild_class_that_uses_build_instruction])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
````
### Version 2

Old error message:

```
_pickle.UnpicklingError: Weights only load failed. ``torch.nested`` and ``torch._dynamo`` must be imported to load nested jagged tensors (NJTs)
```

New error message:
```

_pickle.UnpicklingError: Weights only load failed. ``torch.nested`` and ``torch._dynamo`` must be imported to load nested jagged tensors (NJTs)
 In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.

 ```

 ### Version 3

Old error message
```
_pickle.UnpicklingError: Weights only load failed. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
Trying to load unsupported GLOBAL posix.execv whose module posix is blocked.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
```

New error message
```
_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
Trying to load unsupported GLOBAL posix.execv whose module posix is blocked.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
````

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141304
Approved by: https://github.com/zou3519
2024-12-03 03:26:27 +00:00
b63a84804c Allow NJT by default for weights_only torch.load (take 2) (#140739)
Per discussion with @malfet, only allow weights_only unpickler to load NJT if `torch.nested` and `torch._dynamo`  are imported

(this is slightly weird as technically `torch.nested` is actually imported by default and `torch._dynamo.decorators._DimRange` is actually what needs to be imported)

we can't import this from `torch.nested` as this would
- undo dynamo lazy import
- cause circular import

===========================
Redo of https://github.com/pytorch/pytorch/pull/140304 caused issues as `torch.nested._internal.foo` needs to be imported, which causes issues like

```python
torch/_weights_only_unpickler.py", line 339, in load
    if full_path in _get_allowed_globals():
torch/_weights_only_unpickler.py", line 188, in _get_allowed_globals
    torch.nested._internal.nested_tensor.NestedTensor
AttributeError: module 'torch.nested' has no attribute '_internal'
```

**This likely wasn't caught in our CI because imports are global during unit tests(?), so we use subprocess to properly test this time**

Differential Revision: [D65961691](https://our.internmc.facebook.com/intern/diff/D65961691)

@jbschlosser
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140739
Approved by: https://github.com/malfet
2024-11-19 02:44:53 +00:00
41bb1539d3 Fix get_unsafe_globals_in_checkpoint to account for user allowed globals per docstring (#140738)
bugfix: this function did not account for the user allowed globals :(

Differential Revision: [D65960696](https://our.internmc.facebook.com/intern/diff/D65960696)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140738
Approved by: https://github.com/malfet
2024-11-15 22:47:35 +00:00
86d7d39bff Forward fix D65441551 for T206731737 (#139767)
Test Plan: -

Differential Revision: D65482429

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139767
Approved by: https://github.com/awgu
2024-11-05 23:19:08 +00:00
ca43ecd599 Flip default on weights_only (#137602)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137602
Approved by: https://github.com/malfet, https://github.com/albanD
ghstack dependencies: #138936, #139221, #139433, #139541
2024-11-04 18:30:29 +00:00
a979318ef7 Add section to serialization note re weights_only (#139433)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139433
Approved by: https://github.com/malfet
ghstack dependencies: #138936, #139221
2024-11-01 21:51:50 +00:00
ea0e09b3f3 Add utility to get all unsafe globals in checkpoint (no pickletools dependency) (#139221)
Fixes https://github.com/pytorch/pytorch/issues/129698

https://github.com/pytorch/pytorch/pull/139106 without pickletools

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139221
Approved by: https://github.com/malfet
ghstack dependencies: #138936
2024-11-01 19:31:39 +00:00
bae3426af7 reimport pr137735 due to merging check issues (#138959)
This is  a cherry-pick from #137735 by @mikaylagawarecki , that cannot be merged due to a (wrongly) failing check for codev

@diff-train-skip-merge

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138959
Approved by: https://github.com/mikaylagawarecki
2024-10-27 16:31:34 +00:00
49ed365b22 [BE]: Update Typeguard to TypeIs for better type inference (#133814)
Uses TypeIs instead of TypeGuard for better inference. See https://peps.python.org/pep-0742/

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133814
Approved by: https://github.com/ezyang
2024-10-26 15:07:13 +00:00
e24871eb3c Add environment variable to force no weights_only load (#138225)
In preparation for `weights_only` flip, if users don't have access to the `torch.load` call

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138225
Approved by: https://github.com/albanD
2024-10-21 23:26:15 +00:00
32d4582e02 Revert "[BE]: Update Typeguard to TypeIs for better type inference (#133814)"
This reverts commit 16caa8c1b3a02e47b5f52d3c2d40d7931cc427dc.

Reverted https://github.com/pytorch/pytorch/pull/133814 on behalf of https://github.com/jeanschmidt due to checking if this will solve inductor errors ([comment](https://github.com/pytorch/pytorch/pull/133814#issuecomment-2427565425))
2024-10-21 19:40:58 +00:00
16caa8c1b3 [BE]: Update Typeguard to TypeIs for better type inference (#133814)
Uses TypeIs instead of TypeGuard for better inference. See https://peps.python.org/pep-0742/

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133814
Approved by: https://github.com/ezyang
2024-10-21 17:20:06 +00:00
c0582fd0f8 Remove unused Python variables in torch/[b-z]* (#136963)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136963
Approved by: https://github.com/ezyang
2024-10-19 16:45:22 +00:00
d531bd509e [Docs] Fix description in torch.save docs to show default for pickle_protocol instead of variable name (#138153)
Fixes #138013

Replace `DEFAULT_PROTOCOL` with actual default value `2` in `torch.save` method document

Before
![image](https://github.com/user-attachments/assets/cdd77d14-c009-4848-8538-9256bf22c32a)

After
![image](https://github.com/user-attachments/assets/f6b1063d-c955-478a-8d42-702b988426aa)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138153
Approved by: https://github.com/mikaylagawarecki
2024-10-17 18:13:05 +00:00
dd32a32cb6 Revert "Expose option to disable CRC-32 computation during torch.save (#137735)"
This reverts commit 534fa96f2d9a4feb1dcdfaecb3d73990db60f819.

Reverted https://github.com/pytorch/pytorch/pull/137735 on behalf of https://github.com/clee2000 due to failing internally D64438525, probably needs gating ([comment](https://github.com/pytorch/pytorch/pull/137735#issuecomment-2417412264))
2024-10-16 17:03:06 +00:00
534fa96f2d Expose option to disable CRC-32 computation during torch.save (#137735)
Option only works in open source, not internal

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137735
Approved by: https://github.com/albanD
2024-10-15 19:30:02 +00:00
a00faf4408 [3.13] fix 3.13 pickle error in serialization.py (#136034)
Error encountered when adding dynamo 3.13 support.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136034
Approved by: https://github.com/albanD
2024-09-14 00:02:40 +00:00