24 Commits

Author SHA1 Message Date
086dec3235 Pyrefly suppressions 6/n (#164877)
Adds suppressions to pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283

Almost there!

Test plan:
dmypy restart && python3 scripts/lintrunner.py -a
pyrefly check

step 1: delete lines in the pyrefly.toml file from the project-excludes field
step 2: run pyrefly check
step 3: add suppressions, clean up unused suppressions
before: https://gist.github.com/maggiemoss/4b3bf2037014e116bc00706a16aef199

after:

INFO 0 errors (5,064 ignored)

Only four directories left to enable

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164877
Approved by: https://github.com/oulgen
2025-10-08 02:30:57 +00:00
2f9d378f7b PEP585 update - torch/utils (#145201)
See #145101 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145201
Approved by: https://github.com/bobrenjc93
2025-01-21 21:04:10 +00:00
90e81a157a Migrate from Tuple -> tuple in torch/utils/data (#144255)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144255
Approved by: https://github.com/andrewkho
2025-01-08 04:09:45 +00:00
5e8e1d725a Remove some unused type ignores (round 1) (#142325)
Over time, a large number of the existing type ignores have become irrelevant/unused/dead as a result of improvements in annotations and type checking.

Having these `# type: ignore` linger around is not ideal for two reasons:

- They are verbose/ugly syntatically.
- They could hide genuine bugs in the future, if a refactoring would actually introduce a bug but it gets hidden by the ignore.

I'm counting over 1500 unused ignores already. This is a first PR that removes some of them. Note that I haven't touched type ignores that looked "conditional" like the import challenge mentioned in https://github.com/pytorch/pytorch/pull/60006#issuecomment-2480604728. I will address these at a later point, and eventually would enable `warn_unused_ignores = True` in the mypy configuration as discussed in that comment to prevent accumulating more dead ignores going forward.

This PR should have no effect on runtime at all.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142325
Approved by: https://github.com/Skylion007, https://github.com/janeyx99
2024-12-09 18:23:46 +00:00
7cf0b90e49 [BE] enable UFMT in torch.utils.data (#127705)
Part of #123062

- #123062

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127705
Approved by: https://github.com/ezyang
ghstack dependencies: #127706, #127704
2024-06-27 23:16:24 +00:00
f911957573 [BE] sort imports in torch.utils.data (#127704)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127704
Approved by: https://github.com/ezyang
ghstack dependencies: #127706
2024-06-27 23:16:24 +00:00
8db9dfa2d7 Flip default value for mypy disallow_untyped_defs [9/11] (#127846)
See #127836 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127846
Approved by: https://github.com/ezyang
ghstack dependencies: #127842, #127843, #127844, #127845
2024-06-08 18:50:06 +00:00
7c33ce7702 [CI] Install dill in ci (#116214)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116214
Approved by: https://github.com/malfet
ghstack dependencies: #116230
2024-01-24 23:42:35 +00:00
92e7f79609 Doc: Add and Fix docstrings for torch.util.data files (#112817)
Fixes #112635

Fix docstrings for `torch.utils.data` files.

```
Before:
> pydocstyle torch/utils/data/graph.py --count
Before: 5
After: 1

> pydocstyle torch/utils/data/graph_settings.py --count
Before: 8
After: 3

> pydocstyle torch/utils/data/dataloader.py --count
Before: 12
After: 6

> pydocstyle torch/utils/data/dataset.py --count
Before: 28
After: 23

> pydocstyle torch/utils/data/sampler.py --count
Before: 24
After: 19

> pydocstyle torch/utils/data/_utils/signal_handling.py --count
Before: 1
After: 0

> pydocstyle torch/utils/data/_utils/__init__.py --count
Before: 2
After: 0

> pydocstyle torch/utils/data/_utils/collate.py --count
Before: 20
After: 6

> pydocstyle torch/utils/data/_utils/fetch.py --count
Before: 3
After: 0

> pydocstyle torch/utils/data/_utils/pin_memory.py --count
Before: 4
After: 1

> pydocstyle torch/utils/data/datapipes/_decorator.py --count
Before: 19
After: 16

> pydocstyle torch/utils/data/datapipes/_hook_iterator.py --count
Before: 13
After: 0

> pydocstyle torch/utils/data/datapipes/_typing.py --count
Before: 17
After: 4

> pydocstyle torch/utils/data/datapipes/gen_pyi.py --count
Before: 19
After: 4
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112817
Approved by: https://github.com/kit1980
2023-11-07 17:59:56 +00:00
abc1cadddb [BE] Enable ruff's UP rules and autoformat utils/ (#105424)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105424
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-07-18 20:17:25 +00:00
64a526d4af [DataLoader] Replacing traverse function with traverse_datapipes (#85667)
This PR deprecates `traverse` function and replaces it with `traverse_datapipes` instead.

While use `DataLoader`, I realized that it is raising `FutureWarning` even though I am not explicitly using `traverse`. What is happening is that `DataLoader` invokes `traverse(dp, only_datapipe=True)`, and the usage of the keyword causes the `only_datapipe` warning to be raised.

```
/home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/utils/data/graph.py:102: FutureWarning: `only_datapipe` is deprecated from `traverse` function and will be removed after 1.13.
  warnings.warn(msg, FutureWarning)
```

A few things we'd like to do:
1. Deprecate the key word arg `only_datapipe`
2. Change the default behavior from `only_datapipe=False` to `only_datapipe=True` in the future
3. Do not raise a warning when users are using the function correctly

This creates a paradox it is impossible for the users to change their code to match the future default behavior (i.e. call `traverse(dp)` without `only_datapipe`):
  - they cannot do so because the default behavior of `traverse` hasn't changed yet, so they must use `only_datapipe=True`
  - if they use `only_datapipe=True`, eventually the kwarg will go away and cause a runtime error; they also get a `FutureWarning` in the present

IIUC, there doesn't seem to be a way to accomplish those 3 goals without replacing the function with a new one that has a different name; hence, this PR. Let me know if there is a better alternative.

If this looks right, I will send a follow up PR in `TorchData`.

Differential Revision: [D39832183](https://our.internmc.facebook.com/intern/diff/D39832183)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85667
Approved by: https://github.com/ejguan
2022-09-27 19:58:15 +00:00
ea72a0991c Add support to traverse all python collection objects (#84079)
Fixes https://github.com/pytorch/data/issues/752

This PR makes `traverse` function supporting more collections data structures from Python. The `getstate_hook` will be invoked after custom `__getstate__` function. This would guarantee that `traverse` function will be working as long as the `DataPipe` is working properly with multiprocessing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84079
Approved by: https://github.com/NivekT, https://github.com/VitalyFedyunin
2022-09-23 16:21:25 +00:00
d50aa517b5 Revert "Add support to traverse all python collection objects (#84079)"
This reverts commit e0f0c8e7b9acf6b821956acadbe79aaa0f6f0237.

Reverted https://github.com/pytorch/pytorch/pull/84079 on behalf of https://github.com/weiwangmeta due to Diff reverted internally
2022-08-29 06:34:50 +00:00
e0f0c8e7b9 Add support to traverse all python collection objects (#84079)
Fixes https://github.com/pytorch/data/issues/752

This PR makes `traverse` function supporting more collections data structures from Python. Please let me know if anyone has a better idea about how to elegantly check if the object is a collection then we can dive into this object to see wether there is any DataPipe wrapped.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84079
Approved by: https://github.com/NivekT
2022-08-26 21:02:43 +00:00
782f18e9b5 [DLv2] Make graph traverse working with unhashable DataPipe (#80509)
Summary:
This Diff removes the requirement for `traverse` function that `DataPipe` needs to be hash-able. `traverse` function now is using `id` of `DataPipe` instance rather than `DataPipe` itself as the key for both `cache` and graph.

But, it requires the changes of type of `DataPipeGraph` from `Dict[DataPipe, "DataPipeGraph"]` to `Dict[int, Tuple[DataPipe, "DataPipeGraph"]]`.

Differential Revision: D37354153

Ref PR in TorchData: https://github.com/pytorch/data/pull/559
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80509
Approved by: https://github.com/VitalyFedyunin
2022-07-12 14:47:42 +00:00
a008d19ff7 [DataPipe] Revamp serialization logic of DataPipes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74984

Approved by: https://github.com/ejguan
2022-05-10 16:16:46 +00:00
383f026791 [DataPipe] Enabling graph traversal for MapDataPipe
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74851

Approved by: https://github.com/ejguan
2022-04-22 18:06:16 +00:00
0289ab2cec Fix data-related public API (#368)
Summary:
X-link: https://github.com/pytorch/data/pull/368

This is PR aims to expose the right data-relate API.

There are two more changes made in this PR to convert public api to private api
`check_lambda_fn` -> `_check_lambda_fn`
`deprecation_warning` -> `_deprecation_warning`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76143

Reviewed By: albanD, NivekT

Differential Revision: D35798311

Pulled By: ejguan

fbshipit-source-id: b13fded5c88a533c706702fb2070c918c839dca4
(cherry picked from commit 0b534b829a2e90e1e533951c6d334fdeaa9358b9)
2022-04-21 17:27:05 -07:00
841a7f5187 [DataPipe] apply dill serialization for _Demux and add cache to traverse
- Fix _Demux can not be pickled with DILL presented https://github.com/pytorch/pytorch/pull/74958#issuecomment-1084637227
- And add cache to traverse function to prevent infinite recursion for circular reference of DataPipe (Fixes https://github.com/pytorch/data/issues/237)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75034
Approved by: https://github.com/wenleix
2022-04-04 19:45:14 +00:00
4fc6ab5e81 [DataPipe] Fix OOM when traverse IterDataPipe due to pickling (#72209)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72209

This PR would fix https://fburl.com/jtfyksyr

Test Plan: Imported from OSS

Reviewed By: dagitses, NivekT

Differential Revision: D33955933

Pulled By: ejguan

fbshipit-source-id: 120203a3c2323a0c7081715bb6628d1768f8b1c4
(cherry picked from commit 469f3d056276ad34c31dad4a3ba5c570f511854a)
2022-02-03 22:55:04 +00:00
74c834e0dc [DataPipe] adding a finally statement to ensure hook is reset (#70214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70214

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D33255306

Pulled By: NivekT

fbshipit-source-id: de2fe6bf08328e481c714aaad390db771073469e
2021-12-21 15:21:04 -08:00
d90012689f [DataPipe] Control shuffle settings from DataLoader2 (#65756)
Summary:
Makes `shuffle` DataPipe sensitive to DataLoader(2) `shuffle` kwarg.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65756

Reviewed By: albanD

Differential Revision: D31344867

Pulled By: VitalyFedyunin

fbshipit-source-id: e0084e0ac193ac784d6298328ca1222745681347
2021-12-14 07:35:26 -08:00
4c87aa77d1 [DataPipe] Traverse DataPipe graph excluding primitive and callable (#67783)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67783

Add `getstate_hook` to exclude primitive objects and callable when serialization when `exclude_primitive` is enabled for `traverse`.
For graph traversing, we don't have to handle the lambda and other stuff.
This is used by `OnDiskCacheHolder` to trace the DataPipe Graph.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D32146697

Pulled By: ejguan

fbshipit-source-id: 03b2ce981bb21066e807f57c167b77b2d0e0ce61
2021-11-15 06:46:31 -08:00
f2857883c4 Add DataPipes Graph Functions (#61235)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61235

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29588834

Pulled By: VitalyFedyunin

fbshipit-source-id: e0331d6e1fc2a3f8b6211aac83965bcf13165161
2021-07-12 10:28:35 -07:00