Commit Graph

10 Commits

Author SHA1 Message Date
8db9dfa2d7 Flip default value for mypy disallow_untyped_defs [9/11] (#127846)
See #127836 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127846
Approved by: https://github.com/ezyang
ghstack dependencies: #127842, #127843, #127844, #127845
2024-06-08 18:50:06 +00:00
93e249969b [BE] enable ruff rule RSE and remove useless parentheses in raise statements (#124261)
Remove useless parentheses in `raise` statements if the exception type is raised with no argument.
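
As a minimal illustration of the pattern this rule targets (the function names below are hypothetical and not taken from the PR): raising an exception class with no arguments behaves the same with or without the empty call parentheses, because Python instantiates the class when it is raised.

```python
def with_parens():
    # Form flagged by the rule: the empty call parentheses add nothing.
    raise NotImplementedError()


def without_parens():
    # Preferred form: Python instantiates the class when it is raised.
    raise NotImplementedError


def raised_type(fn):
    """Return the type of the exception raised by fn (illustrative helper)."""
    try:
        fn()
    except Exception as exc:
        return type(exc)
    return None
```

Both functions raise the same exception type, so the rewrite should be purely cosmetic.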

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124261
Approved by: https://github.com/albanD
2024-04-17 19:29:34 +00:00
92e7f79609 Doc: Add and Fix docstrings for torch.utils.data files (#112817)
Fixes #112635

Fix docstrings for `torch.utils.data` files.

```
> pydocstyle torch/utils/data/graph.py --count
Before: 5
After: 1

> pydocstyle torch/utils/data/graph_settings.py --count
Before: 8
After: 3

> pydocstyle torch/utils/data/dataloader.py --count
Before: 12
After: 6

> pydocstyle torch/utils/data/dataset.py --count
Before: 28
After: 23

> pydocstyle torch/utils/data/sampler.py --count
Before: 24
After: 19

> pydocstyle torch/utils/data/_utils/signal_handling.py --count
Before: 1
After: 0

> pydocstyle torch/utils/data/_utils/__init__.py --count
Before: 2
After: 0

> pydocstyle torch/utils/data/_utils/collate.py --count
Before: 20
After: 6

> pydocstyle torch/utils/data/_utils/fetch.py --count
Before: 3
After: 0

> pydocstyle torch/utils/data/_utils/pin_memory.py --count
Before: 4
After: 1

> pydocstyle torch/utils/data/datapipes/_decorator.py --count
Before: 19
After: 16

> pydocstyle torch/utils/data/datapipes/_hook_iterator.py --count
Before: 13
After: 0

> pydocstyle torch/utils/data/datapipes/_typing.py --count
Before: 17
After: 4

> pydocstyle torch/utils/data/datapipes/gen_pyi.py --count
Before: 19
After: 4
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112817
Approved by: https://github.com/kit1980
2023-11-07 17:59:56 +00:00
5b1cedacde [BE] [2/3] Rewrite super() calls in functorch and torch (#94588)
Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied.

- #94587
- #94588
- #94592

Also, methods with only a `super()` call are removed:

```diff
class MyModule(nn.Module):
-   def __init__(self):
-       super().__init__()
-
    def forward(self, ...):
        ...
```
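
A small sketch of why the rewrite is non-semantic in the common case, using hypothetical classes rather than code from the PR: inside a method, the zero-argument `super()` resolves to the same parent as the explicit `super(Class, self)` form, and an `__init__` that only forwards to the parent can be dropped.

```python
class Base:
    def __init__(self):
        self.calls = ["Base"]


class OldStyle(Base):
    def __init__(self):
        # Explicit two-argument form, as written before the rewrite.
        super(OldStyle, self).__init__()
        self.calls.append("OldStyle")


class NewStyle(Base):
    def __init__(self):
        # Zero-argument form, as written after the rewrite.
        super().__init__()
        self.calls.append("NewStyle")


class NoInit(Base):
    # An __init__ containing only a super() call can be removed:
    # Base.__init__ is inherited and runs unchanged.
    pass
```

Instantiating each class runs `Base.__init__` exactly once, which is the behavior the rewrite is meant to preserve.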

Some cases that change the semantics should be kept unchanged. E.g.:

f152a79be9/caffe2/python/net_printer.py (L184-L190)

f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94588
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-10 21:16:33 +00:00
8fce9a09cd [BE]: pyupgrade Python to 3.8 - imports and object inheritance only (#94308)
Apply parts of pyupgrade to torch (starting with the safest changes).
This PR only does two things: removes the need to inherit from object and removes unused future imports.
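
For illustration only (the class names are hypothetical): in Python 3 every class already inherits from `object`, so the explicit base is redundant.

```python
class Explicit(object):
    # Spelling kept for Python 2 compatibility; redundant in Python 3.
    pass


class Implicit:
    # Equivalent in Python 3: object is the implicit base class.
    pass
```

Both classes have `object` as their only base, so dropping the explicit base should not change behavior.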

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-07 21:10:56 +00:00
52a2b61203 Fix fetch function which breaks user code (#85099)
The [fastNLP](https://github.com/fastnlp/fastNLP/blob/v0.6.0/fastNLP/core/batch.py#L51) model uses DataSetGetter to fetch data from the dataset. The following code breaks because of https://github.com/pytorch/pytorch/pull/84301:

```
import os

import torch
from fastNLP.core.batch import DataSetGetter
from fastNLP.io.pipe.qa import CMRC2018BertPipe

input_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), ".data", "cmrc2018-sim")
data_bundle = CMRC2018BertPipe().process_from_file(paths=input_dir)
data_bundle.rename_field('chars', 'words')
dataset = data_bundle.get_dataset('dev')
as_numpy = False  # assumed value; the original snippet does not define it
dataset = DataSetGetter(dataset, as_numpy)
dataiter = torch.utils.data.DataLoader(dataset=dataset)
for batch in dataiter:
    pass  # data-processing...
```

This is because for the `DataSetGetter` class, the following condition holds:
```
# hasattr(dataset_getter, '__getitems__') == True
# dataset_getter.__getitems__ == None
```

This PR adds an additional check to make sure `__getitems__` is only called when it is not None.
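
The failure mode and the guard can be sketched in plain Python. The class and function below are hypothetical stand-ins (neither fastNLP's `DataSetGetter` nor the actual fetcher code) and assume only what is stated above: the attribute exists but its value is `None`.

```python
class GetterWithNoneGetitems:
    """Stand-in dataset: hasattr(..., '__getitems__') is True, but the value is None."""

    __getitems__ = None

    def __init__(self, data):
        self.data = data

    def __getitem__(self, idx):
        return self.data[idx]


def fetch(dataset, indices):
    # Take the batched path only when __getitems__ exists AND is not None.
    batched = getattr(dataset, "__getitems__", None)
    if batched is not None:
        return batched(indices)
    # Otherwise fall back to fetching one element at a time.
    return [dataset[i] for i in indices]
```

With only a `hasattr` check, the stand-in dataset would reach the batched path and fail with `TypeError: 'NoneType' object is not callable`; the extra `None` check sends it to the per-element fallback instead.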

This error was found by the torchbench nightly CI, original error stack trace:
```
ERROR: test_fastNLP_Bert_train_cuda (__main__.TestBenchmark)
----------------------------------------------------------------------
components._impl.workers.subprocess_rpc.ChildTraceException: Traceback (most recent call last):
  File "/home/circleci/project/components/_impl/workers/subprocess_rpc.py", line 470, in _run_block
    exec(  # noqa: P204
  File "<subprocess-worker>", line 35, in <module>
  File "<subprocess-worker>", line 12, in _run_in_worker_f
  File "/home/circleci/project/torchbenchmark/util/model.py", line 16, in __call__
    obj = type.__call__(cls, *args, **kwargs)
  File "/home/circleci/project/torchbenchmark/models/fastNLP_Bert/__init__.py", line 93, in __init__
    self.example_inputs = self._prefetch(example_inputs)
  File "/home/circleci/project/torchbenchmark/models/fastNLP_Bert/__init__.py", line 133, in _prefetch
    for batch_x, batch_y in example_inputs:
  File "/home/circleci/miniconda3/lib/python3.8/site-packages/fastNLP/core/batch.py", line 266, in __iter__
    for indices, batch_x, batch_y in self.dataiter:
  File "/home/circleci/miniconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/home/circleci/miniconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 719, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/circleci/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 56, in fetch
    data = self.dataset.__getitems__(possibly_batched_index)
TypeError: 'NoneType' object is not callable
```

Full error log: https://app.circleci.com/pipelines/github/pytorch/benchmark/5143/workflows/0676f36d-0ab4-42bd-adb4-90e6b0df76d1/jobs/5293
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85099
Approved by: https://github.com/ejguan
2022-09-15 21:48:28 +00:00
335033f718 asyncio increase throughput (pytorch change) (#84301)
Summary: This diff adds a check in the fetcher: if the dataset to be fetched defines a `__getitems__` method, the fetcher uses it to fetch a batch of elements in one call, as opposed to one by one. This is beneficial for I/O-bound usage.
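
A minimal sketch of the idea, assuming a hypothetical dataset class and a `calls` counter added purely for illustration: when the batched method is available, the whole batch costs one call into the dataset instead of one call per index.

```python
class BatchedDataset:
    """Hypothetical dataset that can serve a whole batch in one call."""

    def __init__(self, data):
        self.data = data
        self.calls = 0  # counts calls into the dataset, for illustration

    def __getitem__(self, idx):
        self.calls += 1
        return self.data[idx]

    def __getitems__(self, indices):
        self.calls += 1  # one call for the whole batch
        return [self.data[i] for i in indices]


def fetch_batch(dataset, indices):
    # Prefer the batched method when the dataset provides one.
    if hasattr(dataset, "__getitems__"):
        return dataset.__getitems__(indices)
    # Fall back to one lookup per index.
    return [dataset[i] for i in indices]
```

For a dataset backed by a remote store, collapsing several lookups into one call is where an I/O-bound workload would be expected to gain throughput.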

Differential Revision: D39145980

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84301
Approved by: https://github.com/VitalyFedyunin
2022-09-08 17:00:45 +00:00
d2e03051e0 Fix fetcher continuing to call next after StopIteration (#59313)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59312
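
The linked issue is not reproduced here, so the following is only a sketch of the behavior the title describes, using a hypothetical class and an assumed `ended` flag: once the underlying iterator is exhausted, later fetches raise `StopIteration` immediately instead of calling `next` on it again.

```python
class IterableFetcher:
    """Hypothetical fetcher over an iterable source (not the PyTorch class)."""

    def __init__(self, iterable):
        self.it = iter(iterable)
        self.ended = False  # assumed flag: set once the iterator is exhausted

    def fetch(self, batch_size):
        if self.ended:
            # Do not touch the exhausted iterator again.
            raise StopIteration
        batch = []
        for _ in range(batch_size):
            try:
                batch.append(next(self.it))
            except StopIteration:
                self.ended = True
                break
        if not batch:
            raise StopIteration
        return batch
```

A final partial batch is still returned; only the fetch after it raises.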

cc VitalyFedyunin dzhulgakov

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59313

Reviewed By: jbschlosser

Differential Revision: D28837762

Pulled By: dzhulgakov

fbshipit-source-id: 95cc29359aaba0f24ca169c5495ab5c6c53a0dce
2021-06-02 19:14:25 -07:00
f522bde121 Replace references to _DataLoaderIter with _BaseDataLoaderIter (#27105)
Summary:
Back in April, malmaud added type annotations for `dataloader.py`. At about the same time, SsnL in https://github.com/pytorch/pytorch/issues/19228 replaced `_DataLoaderIter` with `_BaseDataLoaderIter` and two subclasses, `_SingleProcessDataLoaderIter` and `_MultiProcessingDataLoaderIter`. Probably because these changes happened in parallel, the type stubs and several other references in the codebase were never updated to match this refactoring.

I've gone ahead and done the updates to reflect the refactoring in https://github.com/pytorch/pytorch/issues/19228, which fixes the specific type stub/implementation mismatch pointed out in https://github.com/pytorch/pytorch/issues/26673, although not the broader problem that PyTorch doesn't have a test to make sure that the `.pyi` type stub files match the real API defined in `.py` files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27105

Differential Revision: D17813641

Pulled By: ezyang

fbshipit-source-id: ed7ac025c8d6ad3f298dd073347ec83bb4b6600c
2019-10-08 12:09:02 -07:00
058beae411 Add IterableDataset (#19228)
Summary:
This is a modified version of https://github.com/pytorch/pytorch/pull/14705, since the commit structure for that PR is quite messy.

1. Add `IterableDataset`.
2. So we have 2 data loader modes: `Iterable` and `Map`.

    1. `Iterable` if the `dataset` is an instance of `IterableDataset`
    2. `Map` otherwise

3. Add better support for non-batch loading (i.e., `batch_size=None` and `batch_sampler=None`). This is useful in doing things like bulk loading.
4. Refactor `DataLoaderIter` into two classes, `_SingleProcessDataLoaderIter` and `_MultiProcessingDataLoaderIter`. Rename some methods to be more generic, e.g., `get_batch` -> `get_data`.
5. Add `torch.utils.data.get_worker_info`, which returns worker information in a worker process (e.g., worker id, dataset object copy, etc.) and can be used in `IterableDataset.__iter__` and `worker_init_fn` to do per-worker configuration.
6. Add `ChainDataset`, which is the analog of `ConcatDataset` for `IterableDataset`.
7. Import `torch.utils.data` in `torch/__init__.py`.
8. Add data loader examples and documentation.
9. Use `get_worker_info` to detect whether we are in a worker process in `default_collate`.
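
The two loading modes and the non-batch path in the list above can be sketched without PyTorch. Everything below is a hypothetical stand-in (the classes `IterableStyle` and `MapStyle` and the functions `pick_mode` and `load` are not PyTorch APIs); it only illustrates choosing a mode from the dataset type and treating `batch_size=None` as "yield individual samples".

```python
from itertools import islice


class IterableStyle:
    """Stand-in for an iterable-style dataset: only supports iteration."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return iter(range(self.n))


class MapStyle:
    """Stand-in for a map-style dataset: supports indexing and len()."""

    def __init__(self, n):
        self.data = list(range(n))

    def __getitem__(self, idx):
        return self.data[idx]

    def __len__(self):
        return len(self.data)


def pick_mode(dataset):
    # Iterable mode for iterable-style datasets, Map mode otherwise.
    return "Iterable" if isinstance(dataset, IterableStyle) else "Map"


def load(dataset, batch_size=None):
    """Yield single samples when batch_size is None, else lists of samples."""
    if pick_mode(dataset) == "Iterable":
        stream = iter(dataset)
    else:
        stream = (dataset[i] for i in range(len(dataset)))
    if batch_size is None:
        # Non-batch loading: pass samples through unchanged.
        yield from stream
        return
    while True:
        batch = list(islice(stream, batch_size))
        if not batch:
            return
        yield batch
```

In this sketch the last batch may be shorter than `batch_size`; samplers, collation, and worker processes are deliberately left out.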

Closes https://github.com/pytorch/pytorch/issues/17909, https://github.com/pytorch/pytorch/issues/18096, https://github.com/pytorch/pytorch/issues/19946, and some of https://github.com/pytorch/pytorch/issues/13023
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19228

Reviewed By: bddppq

Differential Revision: D15058152

fbshipit-source-id: 9e081a901a071d7e4502b88054a34b450ab5ddde
2019-06-20 20:12:44 -07:00