pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Author	SHA1	Message	Date
Xiaodong Wang	0a94bb432e	[ROCm] CK Flash Attention Backend (#143695 ) Replace https://github.com/pytorch/pytorch/pull/138947 for re-import. Replaces https://github.com/ROCm/pytorch/pull/1592 This PR contains the initial implementation of SDPA with composable_kernel backend. The CK path can be forced by simply calling torch.backends.cuda.preferred_rocm_fa_library("ck"). Similarly, you can force the incumbent aotriton implementation by passing in "aotriton" or "default". As you'd expect, not setting this option will result in aotriton being used as the backend. In the case of CK, if pytorch deems flash attention usable, then it will use the CK path in all the same places aotriton would have been used. This PR makes no changes to the heuristics which select which attention scheme to use (i.e. flash attention vs memory efficient attention vs math, etc.). It only gets called when flash attention is both enabled (via USE_FLASH_ATTENTION) and is selected at runtime by the existing heuristics. Files located in pytorch/aten/src/ATen/native/transformers/hip/flash_attn/ck/mha* have been pulled from https://github.com/Dao-AILab/flash-attention courtesy of the hard work of @tridao, who is the co-author. NOTE: In order to use this backend, the user MUST set USE_CK_FLASH_ATTENTION=1 in their environment when they build PyTorch. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143695 Approved by: https://github.com/malfet Co-authored-by: Andy Lugo <Andy.LugoReyes@amd.com> Co-authored-by: Jithun Nair <jithun.nair@amd.com>	2025-01-03 22:01:36 +00:00
Huy Do	3251171ae8	Make whl metadata publicly readable (#144164 ) After https://github.com/pytorch/pytorch/pull/143677/files#r1902138480 lands, the new nightly wheel metadata is not publicly readable, causing pip install to fail, for example https://github.com/pytorch/pytorch/actions/runs/12603415308/job/35128414909. FBGEMM folks have also noticed this failure on their end (cc @q10) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144164 Approved by: https://github.com/clee2000	2025-01-03 21:08:49 +00:00
drisspg	9bf2a9a616	[ScaledMM] Fix NaNs in test for garbage input data (#144042 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144042 Approved by: https://github.com/janeyx99	2025-01-03 21:02:25 +00:00
Jay Zhang	b75f32b848	Update TorchDynamo-based ONNX Exporter memory usage example code. (#144139 ) Address related comments earlier. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144139 Approved by: https://github.com/justinchuby Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>	2025-01-03 20:41:36 +00:00
bobrenjc93	64bffb3124	remove allow-untyped-defs onnx/_internal/exporter/_fx_passes.py (#144134 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144134 Approved by: https://github.com/Skylion007	2025-01-03 20:18:40 +00:00
bobrenjc93	64b197b603	remove allow-untyped-defs from export/_remove_auto_functionalized_pass.py (#144135 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144135 Approved by: https://github.com/Skylion007	2025-01-03 20:08:11 +00:00
bobrenjc93	9b8a4e7141	remove allow-untyped-defs from torch/onnx/operators.py (#144133 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144133 Approved by: https://github.com/Skylion007	2025-01-03 20:07:56 +00:00
bobrenjc93	6e09d32c00	remove allow-untyped-defs from torch/jit/_passes/_property_propagation.py (#144132 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144132 Approved by: https://github.com/Skylion007	2025-01-03 20:07:37 +00:00
Wanchao Liang	eb7a303d21	[dtensor] expose the __create_chunk_list__ in the doc (#144100 ) as titled, this PR exposes this dunder method as a public API in the doc, so that different checkpoint implementations can leverage this protocol, instead of exposing a separate API Pull Request resolved: https://github.com/pytorch/pytorch/pull/144100 Approved by: https://github.com/awgu ghstack dependencies: #144099	2025-01-03 20:06:23 +00:00
Xuehai Pan	45411d1fc9	Use absolute path `path.resolve()` -> `path.absolute()` (#129409 ) Changes: 1. Always explicit `.absolute()`: `Path(__file__)` -> `Path(__file__).absolute()` 2. Replace `path.resolve()` with `path.absolute()` if the code is resolving the PyTorch repo root directory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129409 Approved by: https://github.com/albanD	2025-01-03 20:03:40 +00:00
bobrenjc93	e9e18a9617	remove allow-untyped-defs from _export/db/logging.py (#144093 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144093 Approved by: https://github.com/Skylion007	2025-01-03 19:36:14 +00:00
Nikita Shulga	ad09395674	[MPSInductor] Fix multi rangevar kernel invocation (#144050 ) By changing `thread_position_in_grid` type to uint{n} and passing dimensions during the kernel call `pytest test/inductor/test_torchinductor.py -k _mps` score is 445 failed, 309 passed, 32 skipped Pull Request resolved: https://github.com/pytorch/pytorch/pull/144050 Approved by: https://github.com/jansel ghstack dependencies: #144055, #144051, #144122, #144105, #144156	2025-01-03 19:32:43 +00:00
Nikita Shulga	52e107a7ca	[MPSInductor] Add `constant`, `isinf` and `isnan` ops (#144156 ) Per Table 6.5 of [Metal Language Specification](https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf) infinity is `HUGE_VALF` Pull Request resolved: https://github.com/pytorch/pytorch/pull/144156 Approved by: https://github.com/Skylion007, https://github.com/jansel ghstack dependencies: #144055, #144051, #144122, #144105	2025-01-03 19:32:43 +00:00
Catherine Lee	383ff4011c	[ez] Use strip for arg sanitization in upload_metadata_file to improve readability (#144155 ) Minor thing that improves readability. I didn't realize you could specify characters for strip when I wrote this Pull Request resolved: https://github.com/pytorch/pytorch/pull/144155 Approved by: https://github.com/huydhn, https://github.com/Skylion007	2025-01-03 19:25:30 +00:00
bobrenjc93	8b3479e361	remove allow-untyped-defs from torch/distributed/fsdp/_dynamo_utils.py (#144131 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144131 Approved by: https://github.com/Skylion007	2025-01-03 19:07:21 +00:00
Jane Xu	7b69f7b449	Clarify what we mean by decoupled weight decay in the *AdamWs (#144101 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144101 Approved by: https://github.com/albanD	2025-01-03 19:06:00 +00:00
Yidi Wu	c36f94b373	[while_loop][dynamo] auto-unspecialize int input and output to unbacked symints (#143106 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143106 Approved by: https://github.com/zou3519 ghstack dependencies: #143105, #143545	2025-01-03 19:01:07 +00:00
Yidi Wu	5660709856	[hop][BE] unify meta checking with check_meta_consistency (#143545 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143545 Approved by: https://github.com/zou3519 ghstack dependencies: #143105	2025-01-03 19:01:07 +00:00
Yidi Wu	6e8dca9ff3	[while_loop][aot] auto-unspecialize int input and output to unbacked symints (#143105 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143105 Approved by: https://github.com/zou3519	2025-01-03 19:01:07 +00:00
Davide Italiano	56f6289f6a	[mps/inductor] Add support for atanh(). (#144121 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144121 Approved by: https://github.com/jansel, https://github.com/malfet	2025-01-03 18:55:05 +00:00
Nikita Shulga	a7b61c5b49	[MPSInductor] Add signbit op support (#144105 ) By mapping it to `metal::signbit` Pull Request resolved: https://github.com/pytorch/pytorch/pull/144105 Approved by: https://github.com/jansel, https://github.com/Skylion007 ghstack dependencies: #144055, #144051, #144122	2025-01-03 18:34:46 +00:00
PyTorch MergeBot	8d63a4a409	Revert "Set `enable_trace_contextlib_contextmanager` flag to True (#140604 )" This reverts commit 1c817fe6714cec510ccc6022b2c3e66146c3ad59. Reverted https://github.com/pytorch/pytorch/pull/140604 on behalf of https://github.com/guilhermeleobas due to breaking one of the benchmarks (moco) ([comment](https://github.com/pytorch/pytorch/pull/140604#issuecomment-2569640837))	2025-01-03 18:23:53 +00:00
Animesh Jain	c5c897c3a1	[dynamo][easy] Miscellaneous fixes (#144141 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144141 Approved by: https://github.com/williamwen42 ghstack dependencies: #144129, #144130	2025-01-03 18:22:56 +00:00
Animesh Jain	732359c633	[dynamo][easy] Minor fixes in guards.cpp (#144130 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144130 Approved by: https://github.com/williamwen42 ghstack dependencies: #144129	2025-01-03 18:22:56 +00:00
Animesh Jain	a450e177fd	[dynamo] remove inline inbuilt tests as flag is enabled by default (#144129 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144129 Approved by: https://github.com/williamwen42	2025-01-03 18:22:56 +00:00
PyTorch MergeBot	2409b49a33	Revert "Rewrite _reparametrize_module to use `contextmanager` (#138203 )" This reverts commit 7bf3b7cdc5631f9991eebcdd8ec09095339a9973. Reverted https://github.com/pytorch/pytorch/pull/138203 on behalf of https://github.com/guilhermeleobas due to breaking one of the benchmarks (moco) ([comment](https://github.com/pytorch/pytorch/pull/138203#issuecomment-2569634001))	2025-01-03 18:17:32 +00:00
Blaine Burton Rister	60fe8a65af	[Inductor] Generalize tiling algorithm to handle fused reductions (#144041 ) # Issue This PR cleans up an edge case that wasn't handled by https://github.com/pytorch/pytorch/pull/137243. The existing tiling code assumes that `node.get_ranges()` is a reliable source of pointwise and reduction numels. This is true for pointwise kernels, but the situation is more complicated with reductions. Since reductions change the number of elements in a tensor, not all ops within a reduction kernel will have the same number of iterations. For example, `var_mean` fuses pointwise division with the output of reduction sum, and the division lacks the corresponding reduction ranges. # Fix Instead of getting numels from `node.get_ranges()`, explicitly pass the global pointwise and reduction numels to the relevant tiling functions. In `SIMDKernel.complete_partial_tiling`, we solve for the missing numel by dividing the global numel by the partial tiling's numel. This ensures all tilings have the correct global numel. Also, in `SIMDKernel.is_compatible`, add the global reduction numel to node ranges that are missing it. For example, `{"x": 8, "r0_": 8}` is compatible with a node of ranges `([8], [])` when we have `reduction_numel=8`. Finally, this PR generalizes some of the existing codegen to handle multiple reduction dims. We already had code to ignore reduction splits for pointwise kernels, but it only worked for 1D reductions. Now it can handle ND. # Test plan This PR parametrizes the existing CI test for `var_mean` to also run with tiled reductions. It also adds a new test checking that `var_mean` generates 2D tilings (with tiled reduction enabled). These new tests would fail on the current main branch. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144041 Approved by: https://github.com/jansel	2025-01-03 18:16:27 +00:00
Colin Peppler	e93f625d00	[AOTI] don't codegen autotune_at_compile_time for non-Triton kernels (#143990 ) `autotune_at_compile_time` is a separate codegen file specifically for autotuning Triton kernels. We can skip it for non-Triton kernels (like CUTLASS). This test (test_aoti_workspace_ptr) checks that `workspace_0.data_ptr()` is codegen-ed correctly in AOTI. ``` // in AOTI codegen kernels.cuda_fused_0( (const half)arg0_1.data_ptr(), (const half)arg1_1.data_ptr(), (half)buf0.data_ptr(), (int)200, (int)5216, (int)10432, (int)10432, (int)5216, (int)0, (int)5216, (size_t)nullptr, (uint8_t*)workspace_0.data_ptr(), stream); ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/143990 Approved by: https://github.com/henrylhtsang, https://github.com/chenyang78, https://github.com/desertfire	2025-01-03 18:01:12 +00:00
Huy Do	f3968373c1	Migrate the rest of CUDA 12.1 jobs to 12.4 (#144118 ) CUDA 12.4 is the default now and we don't build nightly 12.1 anymore, so it's time to move the rest of CI jobs to 12.4. I also clean up some redundant CI jobs on periodic and inductor-periodic. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144118 Approved by: https://github.com/atalman	2025-01-03 17:45:41 +00:00
Huy Do	cbdc70ae07	Use the build environment as sccache prefix instead of workflow name (#144112 ) This is an attempt to improve cache usage for jobs in non-pull workflows like periodic, slow, or inductor as we are seeing build timeout there from time to time, for example https://github.com/pytorch/pytorch/actions/runs/12553928804. The build timeout never happens in pull or trunk AFAICT because they are more up to date with the cache content coming from the PR itself. Logically, the same build should use the same cache regardless of the workflows. We have many examples where the same build, for example [linux-focal-cuda12.4-py3.10-gcc9-sm86](https://github.com/search?q=repo%3Apytorch%2Fpytorch+linux-focal-cuda12.4-py3.10-gcc9-sm86&type=code), is split between different workflows and, thus, uses different caches. I could gather some sccache stats from CH in the meantime to try to prove the improvement before and after this lands. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144112 Approved by: https://github.com/malfet	2025-01-03 17:33:03 +00:00
Benjamin Glass	b9fbd65dfd	AOTI fallback ops: remove ops that were never codegen'ed (#143421 ) Removes 4 fallback ops that are currently not possible to codegen, which does not break ABI-compatibility. 1. `_cudnn_rnn_backward` and `_histogramdd_bin_edges` both return `Tensor[]`, which we cannot codegen with the current design. 2. `_sparse_coo_tensor_with_dims_and_tensors` only supplies a Sparse operator, which we don't support. 3. `zeros.names` requires a `Dimname` input, which we can't currently codegen. Removing these ops from the list will improve test performance, since the fallback op generation will use the Python proxy executor instead of calling non-existent C functions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143421 Approved by: https://github.com/desertfire ghstack dependencies: #141371, #143223	2025-01-03 16:05:38 +00:00
Benjamin Glass	b5b419d627	cpp_wrapper: Use runtime dispatched fallbacks for complex ops (#143223 ) When calling a fallback op in cpp_wrapper mode, where any of the inputs are complex numbers, utilize the runtime dispatched fallback mode. This properly handles the Conjugate and Negative dispatch keys, if present, in exchange for a performance pessimization in complex arithmetic. This PR additionally fixes some cascading failure modes exposed in our `aot_inductor` tests by this change. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143223 Approved by: https://github.com/desertfire ghstack dependencies: #141371	2025-01-03 16:05:38 +00:00
Benjamin Glass	e88d06f54e	ir.ExternKernel: correctly handle kwarg default arguments (#141371 ) Additionally, enable torchinductor opinfo tests exercising all previously fixed bugs in this stack. Note: I've manually sharded the cpp_wrapper CI checks into 2 shards. Once all OpInfo tests are enabled we should switch back to automatic sharding, but until then the pipeline doesn't have appropriate timing stats. More shards would be helpful given the compilation slowdown associated with cpp_wrapper, but 2 will do for now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141371 Approved by: https://github.com/desertfire	2025-01-03 16:05:31 +00:00
Nikita Shulga	f7644efa79	[MPSInductor][EZ] Fix logical_[or|and] ops (#144122 ) For boolean operands it does not really matter whether `&` or `&&` is used, but if one were ever to rely on operator precedence, then bitwise ops should have higher precedence than logical ones Pull Request resolved: https://github.com/pytorch/pytorch/pull/144122 Approved by: https://github.com/huydhn ghstack dependencies: #144055, #144051	2025-01-03 15:28:07 +00:00
Nikita Shulga	b336d72dae	[MPSInductor] Preserve dtype during load (#144051 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144051 Approved by: https://github.com/Skylion007 ghstack dependencies: #144055	2025-01-03 15:17:33 +00:00
Valentine233	a1ae8fadc7	[cpu][vec] support reduce ops for add and max (#144065 ) ### Description During the support of INT8 SDPA https://github.com/pytorch/ao/pull/1372, we find that `at::vec::vec_reduce_all<int32_t>` would go into slow scalar path when doing sum and max. So here, we support the two reduce-related ops `reduce_add` and `reduce_max` for `vec512` and `vec256`, using the Sequence instructions. ### Details - Support vectorized `reduce_add` and `reduce_max` for dtypes `int32` and `float32`, using the Sequence instructions; - Implement the scalar version for fallback path in vec base; - Add the operator `reduce` in vec base, in order to simplify the code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144065 Approved by: https://github.com/mingfeima	2025-01-03 13:01:52 +00:00
Michael Diggin	55dc61dd52	Dataloader distribute tasks to workers when in_order is False (#142324 ) Fixes #105203 and is a follow up PR to #141833 When `in_order` is True (the default), tasks are given out to workers in a round robin fashion. When `in_order` is False this is no longer needed, as we give up guarantees of reproducibility, and instead tasks should be given to workers that are able to perform work. In this PR I've added tracking of the number of outstanding tasks for each worker (updated when tasks are added to their queue, and when data is returned to the main thread). When finding the next queue to add a task to, if `in_order` is False it will only add the task to the workers queue if it has fewer than `_prefetch_factor` tasks outstanding. The current default behaviour is left as is. Tests are also updated to assert on the worker IDs for each sample of data returned. I've run the following to confirm they aren't flaky ```bash for i in {1..20}; do python test/test_dataloader.py TestOutOfOrderDataLoader; done ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/142324 Approved by: https://github.com/andrewkho	2025-01-03 12:57:04 +00:00
blzheng	c09bf71bd6	[Inductor][CPU] Fix C++ compile error of torch.max on bool type (#143848 ) Fix https://github.com/pytorch/pytorch/issues/143568 Before: ![image](https://github.com/user-attachments/assets/3e1e869e-7ae7-45c0-a334-8a663028e003) After: ![image](https://github.com/user-attachments/assets/91f72920-64bd-449a-a6c6-6048409c1450) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143848 Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel	2025-01-03 09:00:43 +00:00
Xuehai Pan	d9507548d8	[dynamo][BE] move `zip_longest` polyfill to submodule `polyfills.itertools` (#144067 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144067 Approved by: https://github.com/yanboliang ghstack dependencies: #144066	2025-01-03 08:08:31 +00:00
Xuehai Pan	fb1beb31d2	[dynamo][BE] move `dropwhile` polyfill to submodule `polyfills.itertools` (#144066 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144066 Approved by: https://github.com/jansel	2025-01-03 08:08:31 +00:00
hongxyan	00df63f09f	[ROCm] Fix for ld failed to convert GOTPCREL relocation in PyTorch build (#143986 ) I experienced an error while doing a DEBUG build of pytorch on rocm: ``` additional relocation overflows omitted from the output /usr/bin/ld: failed to convert GOTPCREL relocation; relink with --no-relax ``` Based on discussions on similar issue #138427, I fixed it after adding the `--offload-compress` to the HIP_HIPCC_FLAGS to successfully build DEBUG mode on my node. Further updated the PR to enable the flag for non-DEBUG builds as well due to the size reduction. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143986 Approved by: https://github.com/jeffdaily	2025-01-03 06:53:08 +00:00
Xu Han	e141cb9c34	export AOTI_TORCH_EXPORT on Windows. (#140030 ) Fixes #139954 reproduce UT: ```cmd pytest test/inductor/test_torchinductor_codegen_dynamic_shapes.py -k test_device_assert_dynamic_shapes_cpu ``` Issue: <img width="856" alt="image" src="https://github.com/user-attachments/assets/5fc501a9-54e5-45ac-9fb3-509ec11a7abe"> After fixing: ![Image](https://github.com/user-attachments/assets/883846fb-8e92-4b9c-9400-daab32382a3a) Reland: 1. Declare export on Windows explicitly. 2. Support cpu, cuda and xpu devices. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140030 Approved by: https://github.com/jgong5, https://github.com/desertfire	2025-01-03 05:41:06 +00:00
Wanchao Liang	48a05ee773	[dtensor] improve doc of the DTensor class (#144099 ) as titled: explicitly list all public members to make sure the public API stays consistent, also use groupwise as the member order to make doc look better Pull Request resolved: https://github.com/pytorch/pytorch/pull/144099 Approved by: https://github.com/awgu	2025-01-03 05:35:44 +00:00
Davide Italiano	41b5c600df	[ReduceOps] Add dimension checking for cummin()/cummax(). (#143920 ) Summary: cum{min,max} didn't guard against 0-d vector and allowed an arbitrary dimension to be passed. Test Plan: torch_test.py Reviewers: Subscribers: Tasks: Tags: Fixes #71477 Pull Request resolved: https://github.com/pytorch/pytorch/pull/143920 Approved by: https://github.com/malfet	2025-01-03 04:14:33 +00:00
Bin Bao	c5b75f8db1	[AOTI] Remove more AOTI_TORCH_EXPORT (#144081 ) Summary: Similar to https://github.com/pytorch/pytorch/pull/142500, remove redundant AOTI_TORCH_EXPORT from several cpp files, to solve a windows build issue. Differential Revision: D67762069 Pull Request resolved: https://github.com/pytorch/pytorch/pull/144081 Approved by: https://github.com/yushangdi	2025-01-03 02:17:38 +00:00
Jithun Nair	c31912666e	[ROCm] Print amdgpu info on bare metal for CI runners (#144038 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144038 Approved by: https://github.com/jeffdaily	2025-01-03 02:00:40 +00:00
Michal Gallus	37e9da0687	[ROCm][Windows] Disable roctracer-related code (#143329 ) Currently, the roctracer for Windows is not available. This PR disables any mentions of its usage for Windows, and creates dummy functions for Windows that keep compatibility with existing code but warn the user that roctracer is unavailable on Windows. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143329 Approved by: https://github.com/sraikund16	2025-01-03 01:51:01 +00:00
bobrenjc93	891a86d1ad	remove allow-untyped-defs from ao/quantization/experimental/fake_quantize.py (#144091 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144091 Approved by: https://github.com/aorenste	2025-01-03 01:26:36 +00:00
bobrenjc93	377e29745f	remove allow-untyped-defs from distributed/elastic/utils/data/cycling_iterator.py (#144090 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144090 Approved by: https://github.com/aorenste	2025-01-03 01:22:50 +00:00
bobrenjc93	0d6db839a7	remove allow-untyped-defs from utils/data/datapipes/iter/streamreader.py (#144088 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144088 Approved by: https://github.com/aorenste	2025-01-03 01:21:44 +00:00
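
The worker-selection rule described in 55dc61dd52 (#142324) can be sketched in plain Python. This is only an illustration under stated assumptions, not the DataLoader's actual code: `pick_worker`, `order`, and `outstanding` are hypothetical names, and the real implementation may differ in how it tracks tasks and what it does when every worker is busy. With `in_order=True` the next worker in round-robin order always gets the task; with `in_order=False` a worker is skipped once it holds `prefetch_factor` outstanding tasks.

```python
from itertools import cycle
from typing import Iterator, List, Optional


def pick_worker(
    order: Iterator[int],
    outstanding: List[int],
    prefetch_factor: int,
    in_order: bool,
) -> Optional[int]:
    """Return the id of the worker that should receive the next task.

    Hypothetical helper: `order` cycles over worker ids round robin and
    `outstanding[i]` counts tasks sent to worker i but not yet returned.
    """
    if in_order:
        # Strict round robin: take the next worker regardless of its load.
        return next(order)
    # Out of order: visit each worker at most once, skipping any that
    # already holds `prefetch_factor` outstanding tasks.
    for _ in range(len(outstanding)):
        worker_id = next(order)
        if outstanding[worker_id] < prefetch_factor:
            return worker_id
    return None  # every worker is saturated; the caller would have to wait


outstanding = [2, 0, 1]  # worker 0 is slow and already holds two tasks
first = pick_worker(cycle(range(3)), outstanding, prefetch_factor=2, in_order=True)
second = pick_worker(cycle(range(3)), outstanding, prefetch_factor=2, in_order=False)
print(first, second)
```

In this sketch the in-order call still hands the task to the saturated worker 0, while the out-of-order call moves on to worker 1, which is the trade-off the commit message describes: better utilization in exchange for giving up reproducible ordering.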

82797 Commits