24 Commits

Author SHA1 Message Date
2f9d378f7b PEP585 update - torch/utils (#145201)
See #145101 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145201
Approved by: https://github.com/bobrenjc93
2025-01-21 21:04:10 +00:00
5e8e1d725a Remove some unused type ignores (round 1) (#142325)
Over time, a large number of the existing type ignores have become irrelevant/unused/dead as a result of improvements in annotations and type checking.

Having these `# type: ignore` comments linger around is not ideal for two reasons:

- They are verbose/ugly syntactically.
- They could hide genuine bugs in the future: if a refactoring actually introduced a bug, the ignore could mask it.

I'm counting over 1500 unused ignores already. This is a first PR that removes some of them. Note that I haven't touched type ignores that looked "conditional", like the import challenge mentioned in https://github.com/pytorch/pytorch/pull/60006#issuecomment-2480604728. I will address these at a later point, and would eventually enable `warn_unused_ignores = True` in the mypy configuration, as discussed in that comment, to prevent accumulating more dead ignores going forward.
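
For illustration, a hypothetical example (not from the codebase) of the kind of dead ignore this targets; with `warn_unused_ignores = True`, mypy reports the comment itself once the underlying error is gone:

```python
from typing import Optional


def first_or_none(items: list[int]) -> Optional[int]:
    # The annotation below is already correct, so the ignore suppresses
    # nothing; with `warn_unused_ignores = True` mypy flags it as unused.
    return items[0] if items else None  # type: ignore[return-value]
```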

This PR should have no effect on runtime at all.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142325
Approved by: https://github.com/Skylion007, https://github.com/janeyx99
2024-12-09 18:23:46 +00:00
4d7bf72d93 [BE][Easy] fix ruff rule needless-bool (SIM103) (#130206)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130206
Approved by: https://github.com/malfet
2024-07-14 08:17:52 +00:00
7cf0b90e49 [BE] enable UFMT in torch.utils.data (#127705)
Part of #123062

- #123062

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127705
Approved by: https://github.com/ezyang
ghstack dependencies: #127706, #127704
2024-06-27 23:16:24 +00:00
f911957573 [BE] sort imports in torch.utils.data (#127704)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127704
Approved by: https://github.com/ezyang
ghstack dependencies: #127706
2024-06-27 23:16:24 +00:00
8db9dfa2d7 Flip default value for mypy disallow_untyped_defs [9/11] (#127846)
See #127836 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127846
Approved by: https://github.com/ezyang
ghstack dependencies: #127842, #127843, #127844, #127845
2024-06-08 18:50:06 +00:00
67ef2683d9 [BE] wrap deprecated function/class with typing_extensions.deprecated (#127689)
Use `typing_extensions.deprecated` for the deprecation annotation where possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` calls that are missing a category.

Note that only warnings whose messages contain `[Dd]eprecat(ed|ion)` are updated in this PR.
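
A minimal sketch of the two patterns (hypothetical function names):

```python
import warnings

from typing_extensions import deprecated


@deprecated("`old_api` is deprecated, use `new_api` instead.", category=FutureWarning)
def old_api() -> None: ...


def legacy_helper() -> None:
    # Fallback when the decorator cannot be applied: keep the message but
    # make sure an explicit category is attached to the warning.
    warnings.warn("`legacy_helper` is deprecated.", category=FutureWarning, stacklevel=2)
```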

Resolves #126888

- #126888

This PR is split from PR #126898.

- #126898

------

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127689
Approved by: https://github.com/Skylion007
2024-06-02 12:30:43 +00:00
033e733021 Revert "[BE] wrap deprecated function/class with typing_extensions.deprecated (#126898)"
This reverts commit 749a132fb0a8325cbad4734a563aa459ca611991.

Reverted https://github.com/pytorch/pytorch/pull/126898 on behalf of https://github.com/fbgheith due to switching typing-extensions=4.3.0 to 4.9.0 causes internal failure ([comment](https://github.com/pytorch/pytorch/pull/126898#issuecomment-2142884456))
2024-05-31 19:47:24 +00:00
749a132fb0 [BE] wrap deprecated function/class with typing_extensions.deprecated (#126898)
Use `typing_extensions.deprecated` for the deprecation annotation where possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` calls that are missing a category.

Note that only warnings whose messages contain `[Dd]eprecat(ed|ion)` are updated in this PR.

UPDATE: Use `FutureWarning` instead of `DeprecationWarning`.

Resolves #126888

- #126888

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126898
Approved by: https://github.com/albanD
2024-05-29 12:09:27 +00:00
92e7f79609 Doc: Add and Fix docstrings for torch.util.data files (#112817)
Fixes #112635

Fix docstrings for `torch.utils.data` files.

```
> pydocstyle torch/utils/data/graph.py --count
Before: 5
After: 1

> pydocstyle torch/utils/data/graph_settings.py --count
Before: 8
After: 3

> pydocstyle torch/utils/data/dataloader.py --count
Before: 12
After: 6

> pydocstyle torch/utils/data/dataset.py --count
Before: 28
After: 23

> pydocstyle torch/utils/data/sampler.py --count
Before: 24
After: 19

> pydocstyle torch/utils/data/_utils/signal_handling.py --count
Before: 1
After: 0

> pydocstyle torch/utils/data/_utils/__init__.py --count
Before: 2
After: 0

> pydocstyle torch/utils/data/_utils/collate.py --count
Before: 20
After: 6

> pydocstyle torch/utils/data/_utils/fetch.py --count
Before: 3
After: 0

> pydocstyle torch/utils/data/_utils/pin_memory.py --count
Before: 4
After: 1

> pydocstyle torch/utils/data/datapipes/_decorator.py --count
Before: 19
After: 16

> pydocstyle torch/utils/data/datapipes/_hook_iterator.py --count
Before: 13
After: 0

> pydocstyle torch/utils/data/datapipes/_typing.py --count
Before: 17
After: 4

> pydocstyle torch/utils/data/datapipes/gen_pyi.py --count
Before: 19
After: 4
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112817
Approved by: https://github.com/kit1980
2023-11-07 17:59:56 +00:00
2f95a3d0fc [BE]: Apply ruff PERF fixes to torch (#104917)
Applies automated ruff fixes for the PERF rules and enables all of the automatic ones. I also updated ruff, which applied some additional fixes.
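
For illustration, one typical fix from this rule family (PERF401, a manual list build rewritten as a comprehension); the snippet is hypothetical:

```python
items = range(10)

# Before: manual accumulation loop flagged by PERF401.
squares = []
for x in items:
    squares.append(x * x)

# After: the equivalent list comprehension ruff suggests.
squares = [x * x for x in items]
```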

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104917
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-07-11 20:45:21 +00:00
29d2e4b7fa Forward fix for DataLoader to accept custom Sharding DataPipe (#97287)
Fixes #96975

Changes:
- Make sure a custom sharding `DataPipe` with `apply_sharding` can be used by `DataLoader` (see the sketch below)
  - Allow an `apply_sharding` method that omits the last argument, `sharding_group`
- Make `DataLoader` not rely on `sharding_group`
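
A minimal sketch of the kind of custom DataPipe this enables (hypothetical class; note that `apply_sharding` takes only two arguments):

```python
from torch.utils.data import DataLoader, IterDataPipe


class CustomShardingDataPipe(IterDataPipe):
    def __init__(self, source):
        self.source = source
        self.num_of_instances = 1
        self.instance_id = 0

    # No trailing `sharding_group` argument.
    def apply_sharding(self, num_of_instances, instance_id):
        self.num_of_instances = num_of_instances
        self.instance_id = instance_id

    def __iter__(self):
        for i, item in enumerate(self.source):
            if i % self.num_of_instances == self.instance_id:
                yield item


# Accepted by DataLoader without requiring sharding_group; iterate inside
# an `if __name__ == "__main__":` guard when using worker processes.
dl = DataLoader(CustomShardingDataPipe(range(10)), num_workers=2)
```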
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97287
Approved by: https://github.com/NivekT
2023-04-05 22:33:37 +00:00
d6dec1a5cf Refactor sharding data pipe into a seperate file (#94095)
Move `ShardingFilterIterDataPipe` into a dedicated file.

Also, propose a dedicated parent class (`_ShardingIterDataPipe`) for sharding datapipes, as these seem more like "system/engine-level" datapipes that give strong hints to the ReadingService (RS) on how to execute and need first-class-citizen treatment there (compared with other "user-level" datapipes, which are mostly composable `Callable[[Iterable], Iterable]`). This way `graph_settings.py` no longer needs to rely on whether `is_shardable` and `apply_sharding` are present on a DataPipe. But open to other discussions.

Open question: should [ShardingRoundRobinDispatcherIterDataPipe](01fc762003/torchdata/datapipes/iter/util/sharding.py (L16-L17)) also be considered a `_ShardingIterDataPipe`? (e.g. this sharding is executed by replicating (the metadata), while `ShardingRoundRobinDispatcherIterDataPipe` hints that replication is too expensive and therefore requires round-robin data exchange/dispatch.)

Differential Revision: D43014692

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94095
Approved by: https://github.com/ejguan, https://github.com/NivekT
2023-02-07 09:12:02 +00:00
7a112c43c1 [DataLoader2] Fix apply_sharding to accept one sharding_filter per branch (#90769)
Changes:
- Allow multiple `sharding_filter`s in the pipeline as long as they are not on the same branch
- [x] Add test

Example:
```mermaid
graph TD;
DP1-->sharding_filter_1;
sharding_filter_1-->DP3;
DP2-->sharding_filter_2;
sharding_filter_2-->DP4;
DP3-->DP4;
DP4-->output;
```
In order to properly shard `DP1` and `DP2`, we should allow multiple `sharding_filter`s.
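
A rough sketch of such a pipeline in functional form (the `zip` merge stands in for DP3/DP4):

```python
from torch.utils.data.datapipes.iter import IterableWrapper

dp1 = IterableWrapper(range(0, 10)).sharding_filter()   # branch 1: DP1 -> sharding_filter_1
dp2 = IterableWrapper(range(10, 20)).sharding_filter()  # branch 2: DP2 -> sharding_filter_2
dp = dp1.zip(dp2)                                       # merge point standing in for DP3/DP4
```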
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90769
Approved by: https://github.com/NivekT
2022-12-14 22:03:41 +00:00
9dadf8fcc2 [DataPipes] Add group support to the sharding_filter (#88424)
Differential Revision: [D41006747](https://our.internmc.facebook.com/intern/diff/D41006747)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88424
Approved by: https://github.com/ejguan
2022-11-07 22:07:01 +00:00
64a526d4af [DataLoader] Replacing traverse function with traverse_datapipes (#85667)
This PR deprecates the `traverse` function and replaces it with `traverse_datapipes`.

While using `DataLoader`, I realized that it raises a `FutureWarning` even though I am not explicitly using `traverse`. What is happening is that `DataLoader` invokes `traverse(dp, only_datapipe=True)`, and the use of that keyword causes the `only_datapipe` warning to be raised.

```
/home/ubuntu/miniconda3/lib/python3.8/site-packages/torch/utils/data/graph.py:102: FutureWarning: `only_datapipe` is deprecated from `traverse` function and will be removed after 1.13.
  warnings.warn(msg, FutureWarning)
```

A few things we'd like to do:
1. Deprecate the key word arg `only_datapipe`
2. Change the default behavior from `only_datapipe=False` to `only_datapipe=True` in the future
3. Do not raise a warning when users are using the function correctly

This creates a paradox: it is impossible for users to change their code to match the future default behavior (i.e. call `traverse(dp)` without `only_datapipe`):
  - they cannot do so because the default behavior of `traverse` hasn't changed yet, so they must use `only_datapipe=True`
  - if they use `only_datapipe=True`, eventually the kwarg will go away and cause a runtime error; they also get a `FutureWarning` in the present

IIUC, there doesn't seem to be a way to accomplish those 3 goals without replacing the function with a new one that has a different name; hence, this PR. Let me know if there is a better alternative.
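
A sketch of the intended migration (assuming the replacement keeps the `traverse_datapipes` name from this PR and lives in `torch.utils.data.graph`):

```python
from torch.utils.data.datapipes.iter import IterableWrapper
from torch.utils.data.graph import traverse, traverse_datapipes

dp = IterableWrapper(range(3))

graph = traverse(dp, only_datapipe=True)  # old call: emits the FutureWarning shown above
graph = traverse_datapipes(dp)            # new call: same graph, no kwarg, no warning
```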

If this looks right, I will send a follow up PR in `TorchData`.

Differential Revision: [D39832183](https://our.internmc.facebook.com/intern/diff/D39832183)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85667
Approved by: https://github.com/ejguan
2022-09-27 19:58:15 +00:00
33bb8ae350 Set shuffle to DataPipes with set_shuffle API (#83741)
This PR requires the following PR to be landed first: https://github.com/pytorch/pytorch/pull/83202

## Changes
- For `apply_shuffle_setting` and `apply_shuffle_seed`, make sure the shuffle setting is applied to every DataPipe that has a `set_shuffle` or `set_seed` method.
- Rename the API from `apply_shuffle_seed` to `apply_random_seed`.
- Fix a bug where `apply_shuffle_seed` only accepted hashable DataPipes. After this PR, the function uses `id` to prevent seeding the same DataPipe multiple times per epoch (see the sketch below).
- Fix another bug in `shuffler` where `reset` with `_enable=False` would also reset `_seed`.
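
A minimal sketch of the `id`-based bookkeeping described above (hypothetical helper, not the actual implementation):

```python
import torch


def _seed_each_datapipe_once(datapipes, rng: torch.Generator):
    # Key on id(dp): works for unhashable DataPipes and ensures each
    # instance is seeded at most once per epoch, even if it appears
    # multiple times in the traversed graph.
    seen_ids = set()
    for dp in datapipes:
        if id(dp) in seen_ids:
            continue
        seen_ids.add(id(dp))
        if hasattr(dp, "set_seed"):
            seed = int(torch.randint(0, 2**31, (), generator=rng).item())
            dp.set_seed(seed)
```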
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83741
Approved by: https://github.com/NivekT
2022-09-13 13:38:58 +00:00
782f18e9b5 [DLv2] Make graph traverse working with unhashable DataPipe (#80509)
Summary:
This diff removes the requirement that `DataPipe` be hashable for the `traverse` function. `traverse` now uses the `id` of the `DataPipe` instance, rather than the `DataPipe` itself, as the key for both the `cache` and the graph.

However, this requires changing the type of `DataPipeGraph` from `Dict[DataPipe, "DataPipeGraph"]` to `Dict[int, Tuple[DataPipe, "DataPipeGraph"]]` (see the sketch below).
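
A sketch of walking the new graph shape (hypothetical `iter_datapipes` helper; the alias mirrors the type above):

```python
from typing import Dict, Tuple

from torch.utils.data import IterDataPipe

# New shape: keyed by id(datapipe), with the instance kept in the value.
DataPipeGraph = Dict[int, Tuple[IterDataPipe, "DataPipeGraph"]]


def iter_datapipes(graph: DataPipeGraph):
    # Walk every entry of the (possibly nested) graph.
    for dp, subgraph in graph.values():
        yield dp
        yield from iter_datapipes(subgraph)
```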

Differential Revision: D37354153

Ref PR in TorchData: https://github.com/pytorch/data/pull/559
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80509
Approved by: https://github.com/VitalyFedyunin
2022-07-12 14:47:42 +00:00
365ce350cb Make ShufflerDataPipe deterministic for SP & MP DataLoader (#77741)
This is the first PR to make DataPipe deterministic.

Users should be able to use `torch.manual_seed(seed)` to control the shuffle order for the following cases:
- Directly over `DataPipe`
- For single-process DataLoader
- Multiprocessing DataLoader

Unfortunately, for distributed training, users have to run `apply_shuffle_seed` manually to make sure all distributed processes have the same shuffle order.
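
A minimal sketch of the single-process case described above:

```python
import torch
from torch.utils.data import DataLoader
from torch.utils.data.datapipes.iter import IterableWrapper

dp = IterableWrapper(range(10)).shuffle()

torch.manual_seed(0)
first = [x.item() for batch in DataLoader(dp) for x in batch]

torch.manual_seed(0)
second = [x.item() for batch in DataLoader(dp) for x in batch]

assert first == second  # same seed -> same shuffle order
```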
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77741
Approved by: https://github.com/VitalyFedyunin, https://github.com/NivekT
2022-05-18 23:32:07 +00:00
0289ab2cec Fix data-related public API (#368)
Summary:
X-link: https://github.com/pytorch/data/pull/368

This PR aims to expose the right data-related API.

There are two more changes made in this PR that convert public APIs to private ones:
`check_lambda_fn` -> `_check_lambda_fn`
`deprecation_warning` -> `_deprecation_warning`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76143

Reviewed By: albanD, NivekT

Differential Revision: D35798311

Pulled By: ejguan

fbshipit-source-id: b13fded5c88a533c706702fb2070c918c839dca4
(cherry picked from commit 0b534b829a2e90e1e533951c6d334fdeaa9358b9)
2022-04-21 17:27:05 -07:00
3c10987692 don't add extra shuffle in DataLoader2 if one is present
Without this, `DataLoader2` will just add a `Shuffler` to the end of the datapipe if `shuffle=True`:

```py
from torch.utils.data.dataloader_experimental import DataLoader2

from torchdata.datapipes.iter import IterableWrapper, IterDataPipe, Shuffler

class Sorter(IterDataPipe):
    def __init__(self, datapipe):
        self.datapipe = datapipe

    def __iter__(self):
        return iter(sorted(self.datapipe))

data = list(range(1000))
dp = IterableWrapper(data)
dp = Shuffler(dp).set_shuffle(False)
dp = Sorter(dp)

dl2 = DataLoader2(dp, shuffle=True, batch_size=None)

assert list(dl2) == data  # fails unless you hit a lucky random seed
```

This example is somewhat nonsensical, but it demonstrates that we cannot simply add a `Shuffler`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75014
Approved by: https://github.com/ejguan
2022-04-05 19:53:08 +00:00
5667c4ea21 Remove default parameter of ShufflerIterDataPipe (#74370)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74370

Closes https://github.com/pytorch/data/issues/298. This PR:

- removes the `default` parameter of `ShufflerIterDataPipe`
- renames `set_shuffle_setting()` to `set_shuffle()`
- lets `set_shuffle()` return `self` (see the sketch below).
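
For illustration, returning `self` allows inline chaining (a minimal sketch):

```python
from torch.utils.data.datapipes.iter import IterableWrapper

# set_shuffle(False) disables shuffling and hands back the same datapipe,
# so it can be chained inline while building the pipeline.
dp = IterableWrapper(range(10)).shuffle().set_shuffle(False)
```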

Test Plan: Imported from OSS

Reviewed By: george-qi

Differential Revision: D35073666

Pulled By: NicolasHug

fbshipit-source-id: 9847b037e70f44f36eaf4471f2c12fa8ec2ed73c
(cherry picked from commit b07ab646f308532886e8daddd57e937a53edb153)
2022-03-28 12:47:24 +00:00
4fc6ab5e81 [DataPipe] Fix OOM when traverse IterDataPipe due to pickling (#72209)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72209

This PR would fix https://fburl.com/jtfyksyr

Test Plan: Imported from OSS

Reviewed By: dagitses, NivekT

Differential Revision: D33955933

Pulled By: ejguan

fbshipit-source-id: 120203a3c2323a0c7081715bb6628d1768f8b1c4
(cherry picked from commit 469f3d056276ad34c31dad4a3ba5c570f511854a)
2022-02-03 22:55:04 +00:00
d90012689f [DataPipe] Control shuffle settings from DataLoader2 (#65756)
Summary:
Makes the `shuffle` DataPipe sensitive to the `shuffle` kwarg of DataLoader(2).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65756

Reviewed By: albanD

Differential Revision: D31344867

Pulled By: VitalyFedyunin

fbshipit-source-id: e0084e0ac193ac784d6298328ca1222745681347
2021-12-14 07:35:26 -08:00