The [fastNLP](https://github.com/fastnlp/fastNLP/blob/v0.6.0/fastNLP/core/batch.py#L51) model uses `DataSetGetter` to fetch data from the dataset. The following code breaks because of https://github.com/pytorch/pytorch/pull/84301:
```
import os
import torch
from fastNLP.core.batch import DataSetGetter
from fastNLP.io.pipe.qa import CMRC2018BertPipe

input_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), ".data", "cmrc2018-sim")
data_bundle = CMRC2018BertPipe().process_from_file(paths=input_dir)
data_bundle.rename_field('chars', 'words')
dataset = data_bundle.get_dataset('dev')
dataset = DataSetGetter(dataset, as_numpy=False)
dataiter = torch.utils.data.DataLoader(dataset=dataset)
for batch in dataiter:
    ...  # data-processing
```
This is because for the `DataSetGetter` class, the following condition holds:
```
# hasattr(dataset_getter, '__getitems__') == True
# dataset_getter.__getitems__ == None
```
This PR adds an additional check to make sure `__getitems__` is only called when it is not None.
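For reference, a minimal sketch of the guarded fetch path (modeled on `torch/utils/data/_utils/fetch.py`; the exact upstream code may differ slightly):
```
# Illustrative map-style fetcher with the extra None check.
class _MapDatasetFetcher:
    def __init__(self, dataset, collate_fn, auto_collation):
        self.dataset = dataset
        self.collate_fn = collate_fn
        self.auto_collation = auto_collation

    def fetch(self, possibly_batched_index):
        if self.auto_collation:
            # Only take the batched path when __getitems__ exists AND is not None.
            # DataSetGetter defines the attribute but sets it to None, so without
            # the second half of this check we end up calling None(...) and crash.
            if hasattr(self.dataset, "__getitems__") and self.dataset.__getitems__:
                data = self.dataset.__getitems__(possibly_batched_index)
            else:
                data = [self.dataset[idx] for idx in possibly_batched_index]
        else:
            data = self.dataset[possibly_batched_index]
        return self.collate_fn(data)
```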
This error was found by the torchbench nightly CI; original error stack trace:
```
ERROR: test_fastNLP_Bert_train_cuda (__main__.TestBenchmark)
----------------------------------------------------------------------
components._impl.workers.subprocess_rpc.ChildTraceException: Traceback (most recent call last):
File "/home/circleci/project/components/_impl/workers/subprocess_rpc.py", line 470, in _run_block
exec( # noqa: P204
File "<subprocess-worker>", line 35, in <module>
File "<subprocess-worker>", line 12, in _run_in_worker_f
File "/home/circleci/project/torchbenchmark/util/model.py", line 16, in __call__
obj = type.__call__(cls, *args, **kwargs)
File "/home/circleci/project/torchbenchmark/models/fastNLP_Bert/__init__.py", line 93, in __init__
self.example_inputs = self._prefetch(example_inputs)
File "/home/circleci/project/torchbenchmark/models/fastNLP_Bert/__init__.py", line 133, in _prefetch
for batch_x, batch_y in example_inputs:
File "/home/circleci/miniconda3/lib/python3.8/site-packages/fastNLP/core/batch.py", line 266, in __iter__
for indices, batch_x, batch_y in self.dataiter:
File "/home/circleci/miniconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
data = self._next_data()
File "/home/circleci/miniconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 719, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/home/circleci/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 56, in fetch
data = self.dataset.__getitems__(possibly_batched_index)
TypeError: 'NoneType' object is not callable
```
Full error log: https://app.circleci.com/pipelines/github/pytorch/benchmark/5143/workflows/0676f36d-0ab4-42bd-adb4-90e6b0df76d1/jobs/5293
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85099
Approved by: https://github.com/ejguan
Summary: This diff adds a check in the fetcher: if the dataset to be fetched defines a `__getitems__` method, use it to fetch a batch of elements at once, as opposed to one by one. This is beneficial for IO-bound usage.
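As a hedged illustration (the dataset below is made up; only the `__getitems__` hook itself comes from this diff), a map-style dataset can serve a whole batch of indices in one call:
```
import torch
from torch.utils.data import Dataset, DataLoader

class RecordDataset(Dataset):
    # Hypothetical dataset; in practice the batched path pays off when each
    # fetch involves a disk or network round trip.
    def __init__(self, records):
        self.records = records

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        # Fallback path: one call (and one IO round trip) per sample.
        return self.records[idx]

    def __getitems__(self, indices):
        # Batched path: the fetcher hands over the whole batch of indices,
        # so an IO-bound backend can serve them in a single request.
        return [self.records[i] for i in indices]

loader = DataLoader(RecordDataset(list(range(100))), batch_size=8)
for batch in loader:
    pass  # each batch was fetched with a single __getitems__ call
```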
Differential Revision: D39145980
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84301
Approved by: https://github.com/VitalyFedyunin
Summary:
Back in April, malmaud added type annotations for `dataloader.py`. At about the same time, SsnL in https://github.com/pytorch/pytorch/issues/19228 replaced `_DataLoaderIter` with `_BaseDataLoaderIter` and two subclasses, `_SingleProcessDataLoaderIter` and `_MultiProcessingDataLoaderIter`. Probably because these changes landed in parallel, the type stubs and several other references in the codebase were never updated to match this refactoring.
I've gone ahead and updated them to reflect the refactoring in https://github.com/pytorch/pytorch/issues/19228, which fixes the specific type stub/implementation mismatch pointed out in https://github.com/pytorch/pytorch/issues/26673, although not the broader problem that PyTorch doesn't have a test to make sure that the `.pyi` type stub files match the real API defined in the `.py` files.
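For context, a rough sketch of the iterator hierarchy the stubs now mirror (illustrative only, not the exact `.pyi` contents):
```
from typing import Any
from torch.utils.data import DataLoader

class _BaseDataLoaderIter:
    def __init__(self, loader: DataLoader) -> None: ...
    def __iter__(self) -> "_BaseDataLoaderIter": ...
    def __next__(self) -> Any: ...
    def __len__(self) -> int: ...

class _SingleProcessDataLoaderIter(_BaseDataLoaderIter): ...    # fetches in the main process
class _MultiProcessingDataLoaderIter(_BaseDataLoaderIter): ...  # fetches via worker processes
```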
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27105
Differential Revision: D17813641
Pulled By: ezyang
fbshipit-source-id: ed7ac025c8d6ad3f298dd073347ec83bb4b6600c
Summary:
This is a modified version of https://github.com/pytorch/pytorch/pull/14705, since the commit structure of that PR is quite messy.
1. Add `IterableDataset`.
2. So we now have two data loading modes: `Iterable` and `Map`.
    1. `Iterable` if the `dataset` is an instance of `IterableDataset`.
    2. `Map` otherwise.
3. Add better support for non-batch loading (i.e., `batch_size=None` and `batch_sampler=None`). This is useful in doing things like bulk loading.
4. Refactor `DataLoaderIter` into two classes, `_SingleProcessDataLoaderIter` and `_MultiProcessingDataLoaderIter`. Rename some methods to be more generic, e.g., `get_batch` -> `get_data`.
5. Add `torch.utils.data.get_worker_info`, which returns worker information in a worker process (e.g., worker id, dataset object copy, etc.) and can be used in `IterableDataset.__iter__` and `worker_init_fn` to do per-worker configuration (see the sketch after this list).
6. Add `ChainDataset`, which is the analog of `ConcatDataset` for `IterableDataset`.
7. Import `torch.utils.data` in `torch/__init__.py`.
8. Add data loader examples and documentation.
9. Use `get_worker_info` to detect whether we are in a worker process in `default_collate`.
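A minimal sketch of how `IterableDataset` and `get_worker_info` fit together (the class and the sharding scheme below are illustrative, not part of this PR):
```
import torch
from torch.utils.data import DataLoader, IterableDataset, get_worker_info

class RangeStream(IterableDataset):
    """Illustrative iterable dataset that shards a range across workers."""

    def __init__(self, start, end):
        self.start, self.end = start, end

    def __iter__(self):
        info = get_worker_info()
        if info is None:
            # Single-process loading: yield the full range.
            lo, hi = self.start, self.end
        else:
            # In a worker process: split the range evenly across workers.
            per_worker = -(-(self.end - self.start) // info.num_workers)  # ceil division
            lo = self.start + info.id * per_worker
            hi = min(lo + per_worker, self.end)
        return iter(range(lo, hi))

loader = DataLoader(RangeStream(0, 16), batch_size=4, num_workers=2)
for batch in loader:
    print(batch)  # each element of 0..15 appears exactly once across workers
```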
Closes https://github.com/pytorch/pytorch/issues/17909, https://github.com/pytorch/pytorch/issues/18096, https://github.com/pytorch/pytorch/issues/19946, and some of https://github.com/pytorch/pytorch/issues/13023
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19228
Reviewed By: bddppq
Differential Revision: D15058152
fbshipit-source-id: 9e081a901a071d7e4502b88054a34b450ab5ddde