pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-11-19 10:04:58 +08:00

Author	SHA1	Message	Date
linhaifeng	3e7a66fae1	[BugFix][Refactor] fix several instances which use f = open(...) without a corresponding f.close() (#167423 ) This pattern can lead to potential file descriptor leaks, which can cause resource exhaustion or other unpredictable issues Pull Request resolved: https://github.com/pytorch/pytorch/pull/167423 Approved by: https://github.com/cyyever, https://github.com/Skylion007	2025-11-11 11:27:59 +00:00
Xuehai Pan	c73a92fbf5	[BE][CI] bump `ruff` to 0.9.2: multiline `assert` statements (#144546 ) Reference: https://docs.astral.sh/ruff/formatter/black/#assert-statements > Unlike Black, Ruff prefers breaking the message over breaking the assertion, similar to how both Ruff and Black prefer breaking the assignment value over breaking the assignment target: > > ```python > # Input > assert ( > len(policy_types) >= priority + num_duplicates > ), f"This tests needs at least {priority+num_duplicates} many types." > > > # Black > assert ( > len(policy_types) >= priority + num_duplicates > ), f"This tests needs at least {priority+num_duplicates} many types." > > # Ruff > assert len(policy_types) >= priority + num_duplicates, ( > f"This tests needs at least {priority + num_duplicates} many types." > ) > ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/144546 Approved by: https://github.com/malfet	2025-02-27 20:46:16 +00:00
Tom Ritchford	498a7808ff	Fix unused Python variables outside torch/ and test/ (#136359 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136359 Approved by: https://github.com/albanD	2024-12-11 17:10:23 +00:00
Jez Ng	b346e99376	remove fast_flush arguments (#135387 ) I've removed them from upstream Triton in https://github.com/triton-lang/triton/pull/4485. It looks like most places in the code use the default value of `fast_flush=True` anyway, though there are two PRs from @pearu that use `False`. To my knowledge, there's no reason to use the `False` value. Differential Revision: [D62325778](https://our.internmc.facebook.com/intern/diff/D62325778) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135387 Approved by: https://github.com/nmacchioni, https://github.com/jansel	2024-09-13 08:13:46 +00:00
Nicolas Macchioni	5cb05a82b4	[BC breaking] move benchmarking + prefer inductor path (#132827 ) move benchmarking out of `torch._inductor.runtime.runtime_utils` and into `torch._inductor.runtime.benchmarking`, and prefer this path over directly accessing Triton's benchmarking Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/132827 Approved by: https://github.com/eellison	2024-08-08 00:47:45 +00:00
Xuehai Pan	973037be6a	[BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): `list()` / `tuple()` / `dict()` (#130199 ) This PR changes the empty collection factory call to Python literals: - `list()` -> `[]` - `tuple()` -> `()` - `dict()` -> `{}` The Python literals are more performant and safer. For example, the bytecode for building an empty dictionary: ```bash $ python3 -m dis - <<EOS import collections d1 = {} d2 = dict() dict = collections.OrderedDict d3 = dict() EOS ``` ```text 0 0 RESUME 0 1 2 LOAD_CONST 0 (0) 4 LOAD_CONST 1 (None) 6 IMPORT_NAME 0 (collections) 8 STORE_NAME 0 (collections) 3 10 BUILD_MAP 0 12 STORE_NAME 1 (d1) 4 14 PUSH_NULL 16 LOAD_NAME 2 (dict) 18 CALL 0 26 STORE_NAME 3 (d2) 6 28 LOAD_NAME 0 (collections) 30 LOAD_ATTR 8 (OrderedDict) 50 STORE_NAME 2 (dict) 7 52 PUSH_NULL 54 LOAD_NAME 2 (dict) 56 CALL 0 64 STORE_NAME 5 (d3) 66 RETURN_CONST 1 (None) ``` The dict literal `{}` only has one bytecode `BUILD_MAP`, while the factory call `dict()` has three `PUSH_NULL + LOAD_NAME + CALL`. Also, the factory call is not safe if users override the `dict` name in `locals` or `globals` (see the example of replacing with `OrderedDict` above). Pull Request resolved: https://github.com/pytorch/pytorch/pull/130199 Approved by: https://github.com/malfet	2024-07-11 17:30:28 +00:00
Xuehai Pan	26f4f10ac8	[5/N][Easy] fix typo for `usort` config in `pyproject.toml` (`kown` -> `known`): sort torch (#127126 ) The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo make `usort` do more and generate the changes in the PR. Except `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126 Approved by: https://github.com/kit1980	2024-05-27 14:49:57 +00:00
PyTorch MergeBot	55c0ab2887	Revert "[5/N][Easy] fix typo for `usort` config in `pyproject.toml` (`kown` -> `known`): sort torch (#127126 )" This reverts commit 7763c83af67eebfdd5185dbe6ce15ece2b992a0f. Reverted https://github.com/pytorch/pytorch/pull/127126 on behalf of https://github.com/XuehaiPan due to Broken CI ([comment](https://github.com/pytorch/pytorch/pull/127126#issuecomment-2133044286))	2024-05-27 09:22:08 +00:00
Xuehai Pan	7763c83af6	[5/N][Easy] fix typo for `usort` config in `pyproject.toml` (`kown` -> `known`): sort torch (#127126 ) The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo make `usort` do more and generate the changes in the PR. Except `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126 Approved by: https://github.com/kit1980 ghstack dependencies: #127122, #127123, #127124, #127125	2024-05-27 04:22:18 +00:00
Pearu Peterson	cf6041e942	Use weakref in storing tensors as keys (follow-up to #111470 ) (#112076 ) This PR addresses the discussion items in https://github.com/pytorch/pytorch/pull/111470#discussion_r1369008167, that is, - use weakref when storing tensors as keys, - add `storage_offset` to the key data, - and revise the description of the `TensorAsKey` utility. Pull Request resolved: https://github.com/pytorch/pytorch/pull/112076 Approved by: https://github.com/cpuhrsch ghstack dependencies: #112154	2023-10-30 19:16:05 +00:00
Pearu Peterson	6382011843	Add NVIDIA A100 optimized meta parameters to bsr_dense_mm (#111760 ) As in the title. The figures below illustrate the performance differences of bsr_dense_mm with optimized parameters and bsr_dense_mm with default parameters (GPU: NVIDIA A100-SXM4-80GB). The first figure represents the performance equilibrium point in BSR tensor sparsity at which value bsr_dense_mm have the same performance characteristics as torch.matmul. The second figure represents speedups from using optimized meta parameters in bsr_dense_mm at its performance equilibrium points with respect to bsr_dense_mm with default meta parameters. In sum, this PR speeds up `bsr_dense_mm` about 50 % depending on the bsr tensor shape and blocksize and lowers the performance equilibrium points of BSR tensor sparsity and strided tensor for matmul operations. <img src="https://github.com/pytorch/pytorch/assets/402156/6fe9d35f-dd21-4aa0-bb01-6ee257254453" width="48%"> <img src="https://github.com/pytorch/pytorch/assets/402156/506921c6-3770-4209-ad3d-498d2ae4989d" width="48%"> Pull Request resolved: https://github.com/pytorch/pytorch/pull/111760 Approved by: https://github.com/cpuhrsch ghstack dependencies: #110396, #111470, #111489	2023-10-23 23:52:49 +00:00
Pearu Peterson	6078ed95cc	Use lru_cache to cache indices data for bsr_scatter_mm. (#111470 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/111470 Approved by: https://github.com/cpuhrsch ghstack dependencies: #110396	2023-10-23 23:52:49 +00:00
Pearu Peterson	d4708a6da7	Add scatter_mm and bsr_scatter_mm operations. (#110396 ) This PR introduces `scatter_mm` operation (compute `mm` of arbitrary pairs of tensors given in batches of tensors) that is used to implement `bsr_scatter_mm` that is equivalent to `bsr_dense_mm` (the `mm` operation on bsr and strided tensors). The implementation is provided both in Triton (when tensor dimensions are multiples of 16) and in PyTorch (otherwise). The figures below illustrate the performance differences of `bsr_scatter_mm` and `bsr_dense_mm` (GPU: `NVIDIA GeForce RTX 2060 SUPER`). The first figure represents the performance equilibrium point in BSR tensor sparsity at which value `bsr_scatter_mm` or `bsr_dense_mm` have the same performance characteristics as `torch.matmul`. The second figure represents speedups from using `bsr_scatter_mm` at its performance equilibrium points with respect to `bsr_dense_mm`. <img src="https://github.com/pytorch/pytorch/assets/402156/526d182e-937f-4812-a6c4-904f52d6d5ab" width="48%"> <img src="https://github.com/pytorch/pytorch/assets/402156/ccb606ab-1f3f-4133-887c-b56285f4f168" width="48%"> The same figures for GPU card `NVIDIA A100-SXM4-80GB`: <img src="https://github.com/pytorch/pytorch/assets/402156/25466f1d-df34-4d1c-a975-afb478e4d9f0" width="48%"> <img src="https://github.com/pytorch/pytorch/assets/402156/6ada91f0-a20f-4f0d-8a48-1f4ccc60d08e" width="48%"> In sum: - `bsr_scatter_mm` is about 2x faster than `bsr_dense_mm` for small block sizes of 16 and 32 and large tensors [GPU: `NVIDIA GeForce RTX 2060 SUPER`]. - `bsr_scatter_mm` is up to 2x faster than `bsr_dense_mm` for small block sizes of 16 and large tensors [GPU: `NVIDIA A100-SXM4-80GB`]. - `bsr_dense_mm` is up to 20 % faster than `bsr_scatter_mm` for block sizes of 64 or larger [GPU: `NVIDIA GeForce RTX 2060 SUPER`]. - However, `bsr_dense_mm` fails with `OutOfResources` exception for block sizes of 256 or larger whereas `bsr_scatter_mm` succeeds. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110396 Approved by: https://github.com/cpuhrsch	2023-10-23 19:45:30 +00:00

13 Commits