Commit Graph

223 Commits

Author SHA1 Message Date
221350e3a4 Add None return type to init -- tests (#132352)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132352
Approved by: https://github.com/ezyang
ghstack dependencies: #132335, #132351
2024-08-01 15:44:51 +00:00
49a8e061b6 Revert "Support IPC for Expandable Segments (#130890)"
This reverts commit 0e71a88f9b2ca6b950c76a061791559cdd8a8870.

Reverted https://github.com/pytorch/pytorch/pull/130890 on behalf of https://github.com/zdevito due to some internal tests show shutdown issues with the change to the table that holds ipc handles ([comment](https://github.com/pytorch/pytorch/pull/130890#issuecomment-2250767280))
2024-07-25 15:54:57 +00:00
0e71a88f9b Support IPC for Expandable Segments (#130890)
This reapplication commit is the same as before except it resolves a build error in an internal build where `handle` was shadowed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130890
Approved by: https://github.com/dsjohns2
2024-07-24 15:45:40 +00:00
1e86387871 Revert "Support IPC for Expandable Segments (#130890)"
This reverts commit 32c2f84e349ad6e34b8559d3f1f9c27020ae702f.

Reverted https://github.com/pytorch/pytorch/pull/130890 on behalf of https://github.com/zdevito due to variable shadowing broke internal tests ([comment](https://github.com/pytorch/pytorch/pull/130890#issuecomment-2245456085))
2024-07-23 14:46:28 +00:00
32c2f84e34 Support IPC for Expandable Segments (#130890)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130890
Approved by: https://github.com/dsjohns2
ghstack dependencies: #130888, #130889
2024-07-22 16:15:01 +00:00
d80939e5e9 [BE] enable UFMT for torch/storage.py (#127706)
Part of #123062

- #123062

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127706
Approved by: https://github.com/ezyang
2024-06-27 23:16:24 +00:00
3f47c72268 add multiprocessing checks in test_dataloader.py (#128244)
Add multiprocessing checks in test_dataloader.py for tests requiring multiprocessing similar to test_multiprocessing.py: https://github.com/pytorch/pytorch/blob/main/test/test_multiprocessing.py#L41-L52. Change all Jetson skips to TEST_CUDA_IPC checks since that is the root cause of the failures on Jetson in the first place.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128244
Approved by: https://github.com/eqy, https://github.com/malfet
2024-06-15 01:32:55 +00:00
a05b2ae302 Enable UFMT on test/test_dataloader.py (#124710)
Part of: #123062

Ran lintrunner on:

- test/test_custom_op_testing.py (already deleted)
- test/test_dataloader.py

Detail:

```bash
$ lintrunner -a --take UFMT --all-files
ok No lint issues.
Successfully applied all patches.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124710
Approved by: https://github.com/soulitzer
2024-04-28 21:21:51 +00:00
29cc293725 [BE]: FURB142 - Remove set mutations. Use set update (#124551)
Uses set mutation methods instead of manually reimplementing (update, set_difference etc).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124551
Approved by: https://github.com/ezyang
2024-04-21 14:12:33 +00:00
5a1216bb2e [BE]: Update ruff to 0.4.1 (#124549)
Update ruff to 0.4.1 .
This version fixes a lot false negatives/false positives, is 20-40% faster, and has various other bug fixes.

Below is a before and after table showing the execution time of ruff lint and ruff format in milliseconds courtesy of https://astral.sh/blog/ruff-v0.4.0

| Repository                                         | Linter (v0.3) | Linter (v0.4) | Formatter (v0.3) | Formatter (v0.4) |
|----------------------------------------------------|---------------|---------------|------------------|------------------|
| [pytorch/pytorch](https://github.com/pytorch/pytorch) | 328.7         | 251.8         | 351.1            | 274.9            |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124549
Approved by: https://github.com/ezyang
2024-04-21 14:06:23 +00:00
1562dae62c [BE]: Apply RUF025 dict.fromkeys preview rule (#118637)
Simplifies and optimizes dict construction using the `fromkeys` classmethod ctor. This also makes it really obvious when all the keys will have the same static value, which could be a bug if unintentional. It is also significantly faster than using a dict comprehension. The rule is in preview, but I am adding a forward fix for when it becomes stable.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118637
Approved by: https://github.com/albanD
2024-01-30 20:46:54 +00:00
7c33ce7702 [CI] Install dill in ci (#116214)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116214
Approved by: https://github.com/malfet
ghstack dependencies: #116230
2024-01-24 23:42:35 +00:00
ee5d981249 [BE]: Enable RUFF PERF402 and apply fixes (#115505)
* Enable PERF402. Makes code more efficient and succinct by removing useless list copies that could be accomplished either via a list constructor or extend call. All test cases have noqa added since performance is not as sensitive in that folder.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115505
Approved by: https://github.com/malfet
2023-12-20 18:01:24 +00:00
376217cc0b [BE]: Apply FURB145 to make code more readable and idiomatic. (#112990)
Testing out some new rules that are in beta, I think I will apply this one codebase wide once it's out of preview. Replaces the hack of using `[:]` to do copies of list with the proper copy method. More efficient and more readable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112990
Approved by: https://github.com/ezyang
2023-11-06 13:15:04 +00:00
91eeb77260 StackDataset batched sampling (#110694)
Optimization of loading minibatches

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110694
Approved by: https://github.com/ejguan
2023-10-10 22:05:51 +00:00
ac01304e22 pin_memory support for NT (#110404)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110404
Approved by: https://github.com/cpuhrsch, https://github.com/albanD
ghstack dependencies: #110292
2023-10-10 21:58:19 +00:00
43ea782af3 Multiprocessing support for NT (#110292)
Fixes #110161

Allows NTs to be used in DataLoaders with `num_workers > 1`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110292
Approved by: https://github.com/cpuhrsch, https://github.com/albanD
2023-10-10 21:58:19 +00:00
dac895c10a Revert "Multiprocessing support for NT (#110292)"
This reverts commit f17fe89e14ef7c29690d989c857ae011b8589b80.

Reverted https://github.com/pytorch/pytorch/pull/110292 on behalf of https://github.com/kit1980 due to Causes CUDA memory leaks ([comment](https://github.com/pytorch/pytorch/pull/110292#issuecomment-1749852095))
2023-10-06 01:07:40 +00:00
81ce5d5725 Revert "pin_memory support for NT (#110404)"
This reverts commit 3597325bc7f07d97ded1c94c47bb59c98e080a0f.

Reverted https://github.com/pytorch/pytorch/pull/110404 on behalf of https://github.com/kit1980 due to Previous PR in the stack caused CUDA memory leaks ([comment](https://github.com/pytorch/pytorch/pull/110404#issuecomment-1749850211))
2023-10-06 01:03:17 +00:00
3597325bc7 pin_memory support for NT (#110404)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110404
Approved by: https://github.com/cpuhrsch, https://github.com/albanD
ghstack dependencies: #110292
2023-10-05 16:33:22 +00:00
f17fe89e14 Multiprocessing support for NT (#110292)
Fixes #110161

Allows NTs to be used in DataLoaders with `num_workers > 1`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110292
Approved by: https://github.com/cpuhrsch, https://github.com/albanD
2023-10-05 15:04:48 +00:00
7e6cf04a84 Revert "Multiprocessing support for NT (#110292)"
This reverts commit 881e7304d6315c17953fa5b9bc1dfe07dcb7d166.

Reverted https://github.com/pytorch/pytorch/pull/110292 on behalf of https://github.com/jbschlosser due to Address review comments ([comment](https://github.com/pytorch/pytorch/pull/110292#issuecomment-1743524901))
2023-10-02 18:27:13 +00:00
881e7304d6 Multiprocessing support for NT (#110292)
Fixes #110161

Allows NTs to be used in DataLoaders with `num_workers > 1`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110292
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #110219
2023-10-02 18:14:34 +00:00
3488837ec1 Update ruff to v0.0.286 (#108058)
Updates ruff to v0.0.286 and fixes one false negative.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108058
Approved by: https://github.com/albanD
2023-08-28 22:55:56 +00:00
ecde622649 Revert "reseed all Generators in Dataloader's _worker_loop() -- via GC (#107131)"
This reverts commit 42625da5e1c29d710abf6db01c2506898043fdb2.

Reverted https://github.com/pytorch/pytorch/pull/107131 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/107131#issuecomment-1690325745))
2023-08-23 17:08:07 +00:00
42625da5e1 reseed all Generators in Dataloader's _worker_loop() -- via GC (#107131)
Alternative to https://github.com/pytorch/pytorch/pull/107034, implements @ezyang 's suggestion from https://github.com/pytorch/pytorch/pull/107034#discussion_r1292857201.

This PR addresses https://fb.workplace.com/groups/pytorch.oss.dev/posts/1699944830430051 and does a bunch of stacked changes:

- Make `Generator` class support GC;this makes all `Generator` instances tracked and accessile through Python's GC.
- Use the GC to retrieve all existing Generator instances in Dataloader's `_worker_loop` and re-seed them: this extends what is already applied to the global/default Generator, which is already re-seeded.

~TODO: a bit of docs and justification, which I'll do if this PR is mergeable.~ -- Done

CC @albanD @ezyang  as previously discussed

BC-Breaking Note
-------------------

We now re-seed all `Generator` instances within the `Dataloader` workers' loop to ensure that their RNG is different across workers.
Previously, the RNG of user-defined `Generators` would be the same across workers, which could lead to wrong training procedures. This only affects user-defined `Generators`, not the default `Generator` (which was already re-seeded).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107131
Approved by: https://github.com/ezyang
2023-08-18 10:23:23 +00:00
bb2fcc7659 unify TEST_CUDA (#106685)
Fixes #ISSUE_NUMBER
as title, unify TEST_CUDA
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106685
Approved by: https://github.com/zou3519
2023-08-10 09:01:36 +00:00
4cc1745b13 [BE] f-stringify torch/ and scripts (#105538)
This PR is a follow up on the pyupgrade series to convert more strings to use f-strings using `flynt`.

- https://docs.python.org/3/reference/lexical_analysis.html#f-strings
- https://pypi.org/project/flynt/

Command used:

```
flynt torch/ -ll 120
flynt scripts/ -ll 120
flynt tools/ -ll 120
```

and excluded `collect_env.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105538
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-07-21 19:35:24 +00:00
73e1455327 [BE] Enable ruff's UP rules and autoformat test/ (#105434)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105434
Approved by: https://github.com/albanD
2023-07-19 20:36:06 +00:00
28098cae6b [DataLoader] Adding StackDataset (#101338)
Torch wrapping datasets list has:
`TensorDataset`
`ConcatDataset`
`ChainDataset`

`TensorDataset` is useful for stacking sets of tensors but can't work with objects without `.size()` method.

This PR proposes `StackDataset`, similar to `TensorDataset` but for a general case like `ConcatDataset`.

Possible usage of `StackDataset` is multimodal networks with different input like image+text or for staking non-tensor input and property to predict.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101338
Approved by: https://github.com/ejguan, https://github.com/NivekT
2023-05-18 00:57:12 +00:00
7ff1f3f3f6 Revert "Revert "Expandable blocks in allocator (#96995)"" (#99275)
This reverts commit 851e89c8e817f28270e0fc21d74ced9446bea747.

Differential Revision: [D45034526](https://our.internmc.facebook.com/intern/diff/D45034526)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99275
Approved by: https://github.com/eellison
2023-04-17 23:46:08 +00:00
851e89c8e8 Revert "Expandable blocks in allocator (#96995)"
This reverts commit 6a50b83b739c2d37d0f518f98b8e624eca0ea153.

Reverted https://github.com/pytorch/pytorch/pull/96995 on behalf of https://github.com/izaitsevfb due to Breaks internal tests
2023-04-16 19:23:37 +00:00
6a50b83b73 Expandable blocks in allocator (#96995)
Common advice we give for handling memory fragmentation issues is to
allocate a big block upfront to reserve memory which will get split up later.
For programs with changing tensor sizes this can be especially helpful to
avoid OOMs that happen the first time we see a new largest input and would
otherwise have to allocate new segments.

However the issue with allocating a block upfront is that is nearly impossible
to correctly estimate the size of that block. If too small, space in the block
will run out and the allocator will allocate separate blocks anyway. Too large,
and other non-PyTorch libraries might stop working because they cannot allocate
any memory.

This patch provides the same benefits as using a pre-allocating block but
without having to choose its size upfront. Using the cuMemMap-style APIs,
it adds the ability to expand the last block in a segment when more memory is
needed.

Compared to universally using cudaMallocAsync to avoid fragmentation,
this patch can fix this common fragmentation issue while preserving most
of the existing allocator behavior. This behavior can be enabled and disabled dynamically.
 This should allow users to, for instance, allocate long-lived parameters and state in individual buffers,
and put temporary state into the large expandable blocks, further reducing
fragmentation.

See inline comments for information about the implementation and its limitations.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96995
Approved by: https://github.com/eellison
2023-04-14 09:49:11 +00:00
8aa34602f7 Jetson Update for CI Redo (#94549)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94549
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-02-21 17:13:38 +00:00
046e88a291 [BE] [3/3] Rewrite super() calls in test (#94592)
Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied.

- #94587
- #94588
- #94592

Also, methods with only a `super()` call are removed:

```diff
class MyModule(nn.Module):
-   def __init__(self):
-       super().__init__()
-
    def forward(self, ...):
        ...
```

Some cases that change the semantics should be kept unchanged. E.g.:

f152a79be9/caffe2/python/net_printer.py (L184-L190)

f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94592
Approved by: https://github.com/ezyang, https://github.com/seemethere
2023-02-12 22:20:53 +00:00
67d9790985 [BE] Apply almost all remaining flake8-comprehension checks (#94676)
Applies the remaining flake8-comprehension fixes and checks. This changes replace all remaining unnecessary generator expressions with list/dict/set comprehensions which are more succinct, performant, and better supported by our torch.jit compiler. It also removes useless generators such as 'set(a for a in b)`, resolving it into just the set call.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94676
Approved by: https://github.com/ezyang
2023-02-12 01:01:25 +00:00
9171f7d4cd [BE] Modernize PyTorch even more for 3.8 with pyupgrade (#94520)
Applies some more pyupgrade fixits to PyTorch

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94520
Approved by: https://github.com/ezyang
2023-02-10 18:02:50 +00:00
8fce9a09cd [BE]: pyupgrade Python to 3.8 - imports and object inheritance only (#94308)
Apply parts of pyupgrade to torch (starting with the safest changes).
This PR only does two things: removes the need to inherit from object and removes unused future imports.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-07 21:10:56 +00:00
04689ae209 [CI][ROCm] skip multiprocessing tests that trigger hangs (#92101)
Skip tests affected by #90940.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92101
Approved by: https://github.com/huydhn
2023-01-13 22:39:00 +00:00
a775204499 Fix issue 38095 TODO in test_dataloader.py (#90084)
Fix TODO related to https://github.com/pytorch/pytorch/issues/38095

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90084
Approved by: https://github.com/clee2000, https://github.com/NivekT
2022-12-03 03:01:52 +00:00
21dd311077 Add a mode to rerun all disabled tests (without running anything else) (#88646)
Rerun all disabled test to gather their latest result so that we can close disabled tickets automatically. When running under this mode (RERUN_DISABLED_TESTS=true), only disabled tests are run while the rest are skipped `<skipped message="Test is enabled but --rerun-disabled-tests verification mode is set, so only disabled tests are run" type="skip"/>`

The logic is roughly as follows, the test runs multiple times (n=50)

* If the disabled test passes, and it's flaky, do nothing because it's still flaky.  In the test report, we'll see the test passes with the following skipped message:
```
<testcase classname="TestMultiprocessing" file="test_multiprocessing.py" line="357" name="test_fs" time="0.000" timestamp="0001-01-01T00:00:00">
    <skipped message="{&quot;flaky&quot;: True, &quot;num_red&quot;: 4, &quot;num_green&quot;: 0, &quot;max_num_retries&quot;: 3, &quot;rerun_disabled_test&quot;: true}" type="skip"/>
</testcase>
```

* If the disabled test passes every single time, and it is not flaky anymore, mark it so that it can be closed later.  We will see the test runs and passes, i.e.
```
<testcase classname="TestCommonCUDA" name="test_out_warning_linalg_lu_factor_cuda" time="0.170" file="test_ops.py" />
```

* If the disabled test fails after all retries, this is also expected. So only report this but don't fail the job (because we don't care about red signals here), we'll see the test is skipped (without the `flaky` field), i.e.
```
<testcase classname="TestMultiprocessing" file="test_multiprocessing.py" line="357" name="test_fs" time="0.000" timestamp="0001-01-01T00:00:00">
    <skipped message="{&quot;num_red&quot;: 4, &quot;num_green&quot;: 0, &quot;max_num_retries&quot;: 3, &quot;rerun_disabled_test&quot;: true}" type="skip"/>
</testcase>
```

This runs at the same schedule as `mem_leak_check` (daily).  The change to update test stats, and (potentially) grouping on HUD will come in separated PRs.

### Testing

* pull https://github.com/pytorch/pytorch/actions/runs/3447434434
* trunk https://github.com/pytorch/pytorch/actions/runs/3447434928
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88646
Approved by: https://github.com/clee2000
2022-11-15 05:08:26 +00:00
be8d88f8d0 [DataLoader] Removing DataLoader2 related code (#88848)
Removing these lines of code as `DataLoader2` has been added to [TorchData](https://github.com/pytorch/data). I'm importing this to confirm it will not impact internal codes.

Differential Revision: [D41201578](https://our.internmc.facebook.com/intern/diff/D41201578)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88848
Approved by: https://github.com/ejguan
2022-11-11 22:27:01 +00:00
4c19981316 [DataPipe] Reset Shuffler's iterator when NotStarted (#83535)
This PR changes the behavior of `IterDataPipe` to always invoke `reset` for the state of `NotStarted`. The main reason is we normally put lazy initialization code into `reset` function. Even for the state of `NotStarted`, we should invoke `reset` to initialize those lazy variables. Otherwise, we have to manually determine if the state is `NotStarted` or `Iterating` in `__iter__` function and only manually invoke `reset` in the state of `NotStarted`.

This PR also makes `Shuffler` is able to serialize with `buffer` and `rng_state`.

The following part is removed:

~I am also add `_snapshot_state` into serialization state and during `__setstate__` only change the state to `Restored` if the original state is `Iterating`. Especially, for the case of deserializing/serializing `NotStarted` DataPipe (multiprocessing), we would invoke `set_seed` for `Shuffler`. We need the `DataPipe` remains as `NotStarted` to properly `reset`.~

I am listing all the expected behavior state transition below:
- Initial state: `NotStarted`
  - `iter` -> Call `reset` and change the state to `Iterating`
  - serialize/deserialize -> Keep the state as `NotStarted` (will `reset` if `iter` is called afterwards)
- Initial state: `Iterating`
  - `iter` -> Call `reset` and keep the state to `Iterating`
  - serialize/deserialize -> Change the state as `Restored`
- Initial state: `Restored`
  - `iter` -> Only change the state to `Iterating`
  - serialize/deserialize -> Not allowed

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83535
Approved by: https://github.com/NivekT
2022-08-25 19:45:41 +00:00
4f00c7589d Fix and unskip dataloader tests on ARM (#83125)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83125
Approved by: https://github.com/albanD
2022-08-12 19:21:59 +00:00
2255911f8a Make M1 tests green (#82213)
This is skipping all the failing tests and add a new master job to test on M1

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82213
Approved by: https://github.com/seemethere, https://github.com/soulitzer, https://github.com/malfet
2022-08-05 16:12:08 +00:00
2785818f5a Choose test affinity based on current affinity (#80327)
This avoids test failures in cgroup environments

Fixes https://github.com/pytorch/pytorch/issues/44368

CC @VitalyFedyunin

new PR after #44369 got closed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80327
Approved by: https://github.com/VitalyFedyunin
2022-07-20 16:45:57 +00:00
270069cfb9 Fix DataLoader flaky tests that run out of shared memory (#81660)
Fixes #70517 #70547 #74598

I basically disable the `loader_kill` test that would prevent proper clean up, which would be handled by Python interpreter in normal python script execution. However, in pytest framework, the python interpreter never exits and execute clean up on the shared memory using `/multiprocessing/resource_tracker.py`. So, we might encounter the running out of shared memory in some cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81660
Approved by: https://github.com/NivekT
2022-07-19 15:38:20 +00:00
3064982fb8 Support percentages in random_split (#78877)
Fixes #78510

This PR adds support for using fractions with `random_split`. This should be completely backwards-compatible as the fractional-style splitting is only applied when the sum across the input lengths is lower than 1.0
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78877
Approved by: https://github.com/ejguan
2022-06-16 02:00:25 +00:00
c978b609f7 [ci] remove IN_CI env var
The conventional env var to set is CI. Both circle and GHA set it, so
IN_CI is unnecessary

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79229

Approved by: https://github.com/janeyx99
2022-06-11 17:16:30 +00:00
883f8ef62e [DataLoader] DataLoader now automatically apply sharding to DataPipes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78631

Approved by: https://github.com/ejguan, https://github.com/NivekT
2022-06-02 17:40:29 +00:00