pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 12:54:11 +08:00

Author	SHA1	Message	Date
Klaus Zimmermann	50d418f69f	Replace setup.py bdist_wheel with python -m build --wheel (#156712 ) Previously we already replaced most use of `python setup.py develop/install`. This PR also replaces the use of `setup.py bdist_wheel` with the modern `python -m build --wheel` alternative. Pull Request resolved: https://github.com/pytorch/pytorch/pull/156712 Approved by: https://github.com/atalman ghstack dependencies: #156711	2025-09-29 21:51:32 +00:00
Catherine Lee	c332d58184	[testing] upload test stats: Add info to the invoking file summary and some other changes (#164016 ) * Changes some internal logic for grouping so hopefully it's slightly less annoying to write code for * Changes the invoking file summary to just use file, which I think is correct most of the time * Adds some fields to the file summary, like skips, errors, etc so I can reuse it for file report regression things Output should be the same, maybe with slightly more fields since I got rid of some of the pops Pull Request resolved: https://github.com/pytorch/pytorch/pull/164016 Approved by: https://github.com/huydhn	2025-09-29 21:20:18 +00:00
Edward Yang	efd7fd5ed5	Consistently use c10_ovrsource in arvr mode everywhere (#164128 ) Summary: Previously, many arvr targets transitively depended on c10, not c10_ovrsource, because they either explicitly depended on c10 (because they didn't know better) or they depended on legacy Caffe2, which never got the ovrsource treatment. So we found all these spots (driven by D82283623) and forced them to query arvr mode to figure out which one they should use. The goal is you NEVER have both targets in the same build rule at the same time. This diff could be reverted if D82224960 works out but I haven't gotten it to work yet. Test Plan: sandcastle Reviewed By: EscapeZero Differential Revision: D82390436 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164128 Approved by: https://github.com/albanD, https://github.com/malfet	2025-09-29 20:47:20 +00:00
eellison	b5d4d350f5	Helper to augment graph with additional deps (#163959 ) In comm-compute overlap we will have a graph with: ``` def foo(...): ag = all_gather(...) hiding_compute = mm(...) wait(ag) ``` There is no explicit dependency between the hiding compute and the collectives, but we want to add implicit dependencies from wait->hiding_compute, and from hiding_compute->all_gather to preserve overlap. Additionally, while bucketing, we will merge collective starts and collective waits together. In this case, we will want to treat the two nodes as a single subgraph - each node in the merged set will have the union of all deps in the set. This pr adds `AugmentedGraphHelper` that adds the apis, and allows querying for dependency with this augmented graph. Pull Request resolved: https://github.com/pytorch/pytorch/pull/163959 Approved by: https://github.com/v0i0, https://github.com/IvanKobzarev ghstack dependencies: #163215, #163754	2025-09-29 20:43:12 +00:00
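Note on the entry above: the two ideas it describes (extra implicit dependencies, and merged nodes that share the union of their dependencies) can be illustrated with a minimal pure-Python sketch. This is not the `AugmentedGraphHelper` added by the PR; the class `ToyAugmentedGraph`, its method names, and the string node names are all hypothetical and exist only for illustration.

```python
from collections import defaultdict


class ToyAugmentedGraph:
    """Toy dependency tracker (hypothetical, not the PyTorch helper)."""

    def __init__(self, edges):
        # deps[n] holds the nodes that n depends on directly.
        self.deps = defaultdict(set)
        for node, dep in edges:
            self.deps[node].add(dep)
        # merged[n] is the shared set of nodes that n was merged with.
        self.merged = {}

    def add_extra_dep(self, node, dep):
        # Record an implicit dependency that is not an edge in the real graph.
        self.deps[node].add(dep)

    def merge(self, nodes):
        # Treat all given nodes (and anything already merged with them) as one unit.
        group = set()
        for n in nodes:
            group |= self.merged.get(n, {n})
        for n in group:
            self.merged[n] = group

    def direct_deps(self, node):
        # A merged node sees the union of the deps of every node in its set.
        group = self.merged.get(node, {node})
        out = set()
        for n in group:
            out |= self.deps[n]
        return out - group

    def depends_on(self, node, target):
        # Depth-first search over explicit, extra, and merged dependencies.
        target_group = self.merged.get(target, {target})
        seen, stack = set(), [node]
        while stack:
            cur = stack.pop()
            for dep in self.direct_deps(cur):
                if dep in target_group:
                    return True
                if dep not in seen:
                    seen.add(dep)
                    stack.append(dep)
        return False


# Mirrors the foo() example: wait(ag) is the only explicit dependency.
g = ToyAugmentedGraph([("wait", "ag")])
g.add_extra_dep("wait", "mm")  # wait -> hiding_compute
g.add_extra_dep("mm", "ag")    # hiding_compute -> all_gather

# Bucketing: two collective starts merged into one unit.
h = ToyAugmentedGraph([("ag2", "x")])
h.merge(["ag1", "ag2"])
```

After the extra edges are added, `g.depends_on("wait", "mm")` holds even though the original graph has no such edge, and after the merge `h.depends_on("ag1", "x")` holds because `ag1` inherits the dependencies of `ag2`.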
Nikita Shulga	6db1b9dd21	[MPS] Chunk fillBuffer into 4Gb slices (#164108 ) To avoid regression on MacOS 26, which one could observe by running the following script ```swift import Metal let bufferSize = 1<<32 + 4 guard let device = MTLCreateSystemDefaultDevice() else { fatalError("No Metal device found") } guard let buffer = device.makeBuffer(length: bufferSize, options: .storageModeShared) else { fatalError("Failed to create buffer") } guard let cmdQueue = device.makeCommandQueue() else { fatalError("Failed to create command queue") } guard let cmdBuffer = cmdQueue.makeCommandBuffer() else { fatalError("Failed to create command buffer") } guard let blitEncoder = cmdBuffer.makeBlitCommandEncoder() else { fatalError("Failed to create blit encoder") } blitEncoder.fill(buffer: buffer, range: 0..<bufferSize, value: 0x42) blitEncoder.endEncoding() cmdBuffer.commit() cmdBuffer.waitUntilCompleted() let tailOffs = 8 let hostPtr = buffer.contents().bindMemory(to: UInt8.self, capacity: bufferSize) let tail = Array(UnsafeBufferPointer(start: hostPtr + (bufferSize - tailOffs), count: tailOffs)) for (idx, val) in tail.enumerated() { print("Offs 0x\(String(bufferSize - tailOffs + idx, radix: 16)): 0x\(String(val, radix: 16))") } ``` Test plan: run `test_indexing.py` on MacOS-26 Fixes https://github.com/pytorch/pytorch/issues/161265 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164108 Approved by: https://github.com/Skylion007	2025-09-29 20:19:29 +00:00
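Note on the entry above: the chunking idea in the title can be sketched in a few lines of Python. This is only an illustration of splitting a byte range into slices, not the code in the MPS backend; the function name `chunk_ranges` is hypothetical, and treating "4Gb" as `1 << 32` bytes is an assumption.

```python
def chunk_ranges(total_bytes, max_chunk=1 << 32):
    """Split [0, total_bytes) into (offset, length) slices of at most max_chunk bytes."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    ranges = []
    offset = 0
    while offset < total_bytes:
        # The last slice may be shorter than max_chunk.
        length = min(max_chunk, total_bytes - offset)
        ranges.append((offset, length))
        offset += length
    return ranges


# Same size as the Swift repro: 4 bytes past the 4 GiB boundary.
# Parentheses are needed in Python, where + binds tighter than <<.
buffer_size = (1 << 32) + 4
slices = chunk_ranges(buffer_size)
```

For this buffer size the sketch yields two slices, one of `1 << 32` bytes and a 4-byte tail, so each fill call would stay within the per-call limit assumed here.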
PyTorch MergeBot	9e792f583a	Revert "[export] Skip the check instead of disable (#164084 )" This reverts commit c2768d0f5af840a94c342ed9eac3e26c819aa3f0. Reverted https://github.com/pytorch/pytorch/pull/164084 on behalf of https://github.com/yangw-dev due to broke internal tests ([comment](https://github.com/pytorch/pytorch/pull/164084#issuecomment-3348862668))	2025-09-29 20:09:13 +00:00
PyTorch MergeBot	6650f5af74	Revert "[dynamo] Special path for cloning of torch dispatch tensors (#164081 )" This reverts commit 811c693c49f7cd3da2ea174955d12f2f8780bd46. Reverted https://github.com/pytorch/pytorch/pull/164081 on behalf of https://github.com/yangw-dev due to broke internal tests ([comment](https://github.com/pytorch/pytorch/pull/164084#issuecomment-3348862668))	2025-09-29 20:09:13 +00:00
atalman	349c960970	Use linux.g4dn.4xlarge.nvidia.gpu for cuda 12.4 legacy driver tests (#163956 ) Workaround for https://github.com/pytorch/pytorch/issues/163658 Looks like the workflow passes on 12.8 builds that use linux.g4dn.4xlarge.nvidia.gpu but it's failing on 12.6 builds that use linux.4xlarge.nvidia.gpu: https://github.com/pytorch/pytorch/actions/runs/17953843505/job/51080623612#step:13:470 Pull Request resolved: https://github.com/pytorch/pytorch/pull/163956 Approved by: https://github.com/malfet Co-authored-by: Mark Saroufim <marksaroufim@meta.com>	2025-09-29 19:38:17 +00:00
atalman	f090818a40	Rename remaining periodic and xpu workflows py3.9->py3.10 (#164127 ) Fix naming: py3.9 should be py3.10. These jobs were already migrated to 3.10. Please see: https://github.com/pytorch/pytorch/actions/runs/18091356163/job/51472526131#step:16:224 ``` Python version: + python --version Python 3.10.18 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/164127 Approved by: https://github.com/malfet	2025-09-29 19:26:21 +00:00
eellison	e1bd5b60cf	refactor bucketing (#163754 ) Preparatory refactor Pull Request resolved: https://github.com/pytorch/pytorch/pull/163754 Approved by: https://github.com/IvanKobzarev ghstack dependencies: #163215	2025-09-29 18:32:41 +00:00
eellison	c9b5af9a38	[inductor] do comm compute overlap at aten fx level (#163215 ) This is the first part of the stack that does comm/compute reordering, and then uses the exposure analysis to do bucketing. Subsequent prs will handle: - use of exposure analysis to do bucketing - make sure inductor respects comm/compute overlapping done at fx level - non-profiling mm estimation/rank broadcasting of profile results Other misc: - Validate accuracy of nccl estimations ( use ruisi's profiling instead ?) For a llama 2d parallelism test, on forward, we overlap all but 2 of potentially hidden collectives. For backward, we overlap 217/269 of potentially hidden collectives. If you increase `compute_overlap_multipler` (for fudge factor of inaccurate comms estimation), that goes down to all but 16 of potentially hidden collectives. fwd example: https://gist.github.com/eellison/76209c49d8829c5f1e323d34a3f040c3 bwd example: https://gist.github.com/eellison/6cfc2285df53a94cfa4012f5fdae5c51 Pull Request resolved: https://github.com/pytorch/pytorch/pull/163215 Approved by: https://github.com/IvanKobzarev	2025-09-29 18:18:03 +00:00
Blaine Burton Rister	604da4bb9a	[Inductor-FX] Support unbacked symbol definitions (#163729 ) # Problem Inductor sometimes generates unbacked symints to handle things like mismatched branches of `torch.cond`. This code is represented by `pytree.KeyPath`, with special codegen logic to convert it to Python and C++. This was not previously supported by the FX backend. # Feature This PR adds support for unbacked symbol declarations to the FX backend. The implementation is fairly straightforward. 1. Instead of raw Python/C++, update the wrapper codegen method to emit a new Wrapper IR line called `UnbackedSymbolDefsLine`. This contains all the information needed to generate the Python and C++ code. 2. Move the existing Python/C++ codegen to a private method, which is invoked by `UnbackedSymbolDefsLine.codegen()`. 3. Implement a method to generate FX IR from unbacked symbol definitions. The implementation is based on recursive descent, consuming some keypath entries, emitting an FX IR node, and recursing to the rest of the keypath. It is conceptually identical to the existing algorithm for Python and C++, except it generates FX nodes. 4. The FX backend currently relies on size hints to generate autotuning arguments, and consequently autotuning does not support unbacked SymInts. At some point, we would like to generalize the autotuning logic to support these. But for now, simply emit a warning and skip autotuning when we see them. 5. The new test case exposed some tricky issues reconciling Triton call args with constants stored in `triton_meta`. This PR rewrites the relevant helper function to do this in a more principled way. # Test plan This PR imports an existing control flow test to the FX backend's test suite. The test uses unbacked symbol definitions to handle mismatched dynamic shapes coming from `torch.cond` branches. Pull Request resolved: https://github.com/pytorch/pytorch/pull/163729 Approved by: https://github.com/jansel	2025-09-29 18:10:37 +00:00
Nikita Shulga	8f32adc90a	[MPSHooks] Release pending command encoder (#164093 ) Before returning a command buffer, as subsequent callers are very likely to allocate their own encoder, which results in the following runtime error ``` tryCoalescingPreviousComputeCommandEncoderWithConfig:nextEncoderClass:]:1090: failed assertion `A command encoder is already encoding to this command buffer' ``` Added regression test to `test_mps_extension` Please note that `torch::mps::get_command_buffer()` should be called with dispatch_queue held, both before and after this change, but many implementations skip that Fixes https://github.com/pytorch/pytorch/issues/163721 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164093 Approved by: https://github.com/atalman, https://github.com/Skylion007	2025-09-29 17:50:12 +00:00
Nikita Shulga	3fa3bfbfda	[EZ][BE] Fix unused parameter warnings in EmbeddingBag (#164135 ) Before this change following were emitted during compilation ``` [7/31] Compiling /Users/malfet/git/pytorch/pytorch/aten/src/ATen/native/mps/kernels/EmbeddingBag.metal to EmbeddingBag_31.air /Users/malfet/git/pytorch/pytorch/aten/src/ATen/native/mps/kernels/EmbeddingBag.metal:28:12: warning: unused parameter 'is_first' [-Wunused-parameter] bool is_first) { ^ /Users/malfet/git/pytorch/pytorch/aten/src/ATen/native/mps/kernels/EmbeddingBag.metal:47:16: warning: unused parameter 'per_sample_weights_index' [-Wunused-parameter] uint32_t per_sample_weights_index, ^ /Users/malfet/git/pytorch/pytorch/aten/src/ATen/native/mps/kernels/EmbeddingBag.metal:48:19: warning: unused parameter 'per_sample_weights' [-Wunused-parameter] constant T* per_sample_weights, ^ /Users/malfet/git/pytorch/pytorch/aten/src/ATen/native/mps/kernels/EmbeddingBag.metal:49:16: warning: unused parameter 'per_sample_weights_stride' [-Wunused-parameter] uint32_t per_sample_weights_stride) { ^ /Users/malfet/git/pytorch/pytorch/aten/src/ATen/native/mps/kernels/EmbeddingBag.metal:74:19: warning: unused parameter 'weight_val' [-Wunused-parameter] opmath_t<T> weight_val, ^ /Users/malfet/git/pytorch/pytorch/aten/src/ATen/native/mps/kernels/EmbeddingBag.metal:75:19: warning: unused parameter 'out_val' [-Wunused-parameter] opmath_t<T> out_val, ^ /Users/malfet/git/pytorch/pytorch/aten/src/ATen/native/mps/kernels/EmbeddingBag.metal:76:12: warning: unused parameter 'is_first' [-Wunused-parameter] bool is_first, ^ /Users/malfet/git/pytorch/pytorch/aten/src/ATen/native/mps/kernels/EmbeddingBag.metal:77:17: warning: unused parameter 'max_idx' [-Wunused-parameter] thread I& max_idx, ^ /Users/malfet/git/pytorch/pytorch/aten/src/ATen/native/mps/kernels/EmbeddingBag.metal:78:9: warning: unused parameter 'weight_idx' [-Wunused-parameter] I weight_idx, ^ /Users/malfet/git/pytorch/pytorch/aten/src/ATen/native/mps/kernels/EmbeddingBag.metal:79:12: warning: unused parameter 'pad' [-Wunused-parameter] bool pad) {} ^ ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/164135 Approved by: https://github.com/Skylion007	2025-09-29 17:44:09 +00:00
Fabian	8701f18bc0	Adjust ...mark_unbacked() -> ...decorators.mark_unbacked() in logs. (#164131 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164131 Approved by: https://github.com/albanD, https://github.com/Skylion007	2025-09-29 17:44:00 +00:00
Janani Sriram	a56e7a1920	[Max Autotune][B200] Add addmm config to avoid test OOM (#164020 ) Summary: Add a new `addmm` config that is small enough to not cause an OOM (out of memory error), since the configs for `blackwell_persistent_mm_configs`, which `addmm` used, are too large. Test Plan: `test_max_autotune.py` Differential Revision: D83378477 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164020 Approved by: https://github.com/coconutruben, https://github.com/njriasan	2025-09-29 17:38:46 +00:00
Janani Sriram	e2c894c97d	[Inductor][ATen][FP8] Relax stride check for block-wise scaling when scaling dimension is 1 (#163829 ) Summary: Relax stride check for block-wise scaling (1x128, 128x128) when a dimension of the scaling factor is 1. When the scaling tensor has a dimension of size 1, the stride is effectively "meaningless" to PyTorch, i.e. PyTorch decides to replace its stride with a default of `[1, 1]`. However, the old stride check required the stride to match one of the scaling dimensions. Here, we relax the stride check when the effective stride is 1 in order to allow for cases in which `K <= 128` and `N <= 128`. Test Plan: ``` pytest -s -v test/test_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_float32_lhs_block_1_rhs_block_128_cuda 2>&1 \| tee ~/personal/stride_check.log ``` Differential Revision: D83023706 Pull Request resolved: https://github.com/pytorch/pytorch/pull/163829 Approved by: https://github.com/lw, https://github.com/eqy	2025-09-29 17:28:26 +00:00
PyTorch MergeBot	6b473c90cf	Revert "[inductor] require shape in TritonCSEVariable (#162275 )" This reverts commit c257570e6cd25753f9f0a640b965148ead2cf918. Reverted https://github.com/pytorch/pytorch/pull/162275 on behalf of https://github.com/jeffdaily due to sorry this broke rocm CI; inductor/test_select_algorithm.py::TestTemplateRender::test_finalized_subclass_hooks [GH job link](https://github.com/pytorch/pytorch/actions/runs/18048893250/job/51366715091) [HUD commit link](`c257570e6c`) ([comment](https://github.com/pytorch/pytorch/pull/162275#issuecomment-3348159095))	2025-09-29 17:26:54 +00:00
Janani Sriram	6bcc6bbc85	[Inductor][FP8] Add op_name for ScaledMM TMA template heuristic (#164019 ) Summary: For H100s and below, add `op_name="scaled_mm"` to the template heuristic for `CUDAScaledTMATemplateConfigHeuristic` such that `scaled_mm` persistent + TMA tests do not default to the "mm" heuristics. Test Plan: `test_max_autotune.py` Differential Revision: D83390775 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164019 Approved by: https://github.com/njriasan	2025-09-29 17:24:26 +00:00
Nikita Shulga	95be302889	Skip test_conv3d_cudnn_broken on ROCM (#164138 ) Followup after https://github.com/pytorch/pytorch/pull/163903 Fixes https://github.com/pytorch/pytorch/issues/164137 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164138 Approved by: https://github.com/Camyll	2025-09-29 16:56:51 +00:00
Yuanyuan Chen	f433e681b9	Remove export of slice_in_dim (#164117 ) Cannot find `slice_in_dim` in OSS. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164117 Approved by: https://github.com/soulitzer	2025-09-29 16:56:14 +00:00
Dev Sashidhar	5ff2387dbe	Fix comment on broadcasting example to clarify dimension mismatch (#162177 ) Fixes #162116 Updated the comment in the broadcasting example to clarify that tensors with mismatched dimension sizes (0 vs 2) are not broadcastable. Removed incorrect reference to missing dimensions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162177 Approved by: https://github.com/soulitzer	2025-09-29 16:47:48 +00:00
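Note on the entry above: the size-mismatch case it mentions (0 vs 2) can be checked with a small pure-Python sketch of the trailing-dimension rule. The helper `sizes_compatible` is hypothetical and works on plain shape tuples rather than tensors; it covers only the per-dimension size rule (equal, one of them 1, or missing), not every condition in the PyTorch broadcasting notes.

```python
def sizes_compatible(shape_a, shape_b):
    """Check the trailing-dimension size rule on two shape tuples."""
    # zip stops at the shorter shape, which models "one of them does not exist".
    for a, b in zip(reversed(shape_a), reversed(shape_b)):
        if a != b and a != 1 and b != 1:
            return False
    return True


mismatch = sizes_compatible((0,), (2, 2))        # trailing sizes 0 vs 2
compatible = sizes_compatible((5, 3, 4, 1), (3, 1, 1))
```

Here `mismatch` is `False` because the trailing sizes 0 and 2 differ and neither is 1, while `compatible` is `True`.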
Nikita Shulga	84b57c93db	[MPSInductor] Unskip test_repeat_interleave_Tensor_decomp (#164136 ) Not sure what was the problem, but it passes for me locally Fixes https://github.com/pytorch/pytorch/issues/159408 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164136 Approved by: https://github.com/v0i0	2025-09-29 16:20:34 +00:00
Markus Hoehnerbach	069ccf5f1e	[inductor] pdl: enable launch and deduplicate waits (#162014 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/162014 Approved by: https://github.com/eellison	2025-09-29 16:10:26 +00:00
Vismai Khanderao	1c12d7416b	[SDPA] [MPS] Fixes regression in 2.8.0 for scaled_dot_product_attention using mps (#163598 ) Fixes #163597 - Updates fast SDPA implementations to take in query tensor stride info similar to key and value instead of assuming stride. - Updated tests with additional transpose/permutation layouts. New tests catch the regression. ### Benchmarking with script found in [implementation PR](https://github.com/pytorch/pytorch/pull/152781#:~:text=19.8%25%20speed%20improvement-,Script%20to%20get%20perf%3A,-import%20torch%0Aimport) Times are averaged over 100000 iterations. This change should not have any significant performance difference. Tested on an M3 Pro ### Vector Fast Path (q_len=1, k_len=256) - Before: 0.160 ms - After: 0.157 ms ### Vector 2-pass (q_len=1, k_len=4096) - Before: 0.342 ms - After: 0.339 ms ### Vector Fast Path (q_len=8, k_len=256) - Before: 0.228 ms - After: 0.231 ms ### Vector 2-pass (q_len=8, k_len=4096) - Before: 0.432 ms - After: 0.436 ms Pull Request resolved: https://github.com/pytorch/pytorch/pull/163598 Approved by: https://github.com/malfet	2025-09-29 16:09:46 +00:00
dolpm	3746039b47	[inductor] fix: 'get_raw_stream' undefined (#163707 ) Summary: ran into this when precompiling baidu/ERNIE-4.5-21B-A3B-PT codegen after fix: ```py import triton import triton.language as tl from torch._inductor.runtime.triton_heuristics import start_graph, end_graph from torch._C import _cuda_getCurrentRawStream as get_raw_stream with torch.cuda._DeviceGuard(0): stream0 = get_raw_stream(0) ... ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/163707 Approved by: https://github.com/jamesjwu	2025-09-29 15:48:16 +00:00
Paul Zhang	872edd89d6	Enable outer reductions in fbcode (#163884 ) Summary: Enabling the outer reduction optimization in fbcode Test Plan: Evals in https://docs.google.com/document/d/1-tcItRsyEaibaXL56Zq2-CWh5wCmHXDDgDQT_9uOvXE/edit?tab=t.0#bookmark=id.tkgzaitxacg0 Differential Revision: D81948542 Pull Request resolved: https://github.com/pytorch/pytorch/pull/163884 Approved by: https://github.com/Skylion007	2025-09-29 15:25:17 +00:00
Howard Huang	47ed41109f	Fix PgNccl coalesced profiling (#160680 ) Admittedly I'm a noob when looking at traces, but this looked pretty off to me: <img width="1528" height="824" alt="Screenshot 2025-08-14 at 5 27 49 PM" src="https://github.com/user-attachments/assets/871e7b4c-0e47-4c84-97cc-8198b7b76d4b" /> 1. Why are there so many "nccl:coalesced" on the CPU thread 2. Why is there "nccl:coalesced" on compute stream (stream 7) Here is what is happening: CPU side: In `endCoalescing`, we create a [work object ](`3be70dc30e/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp (L3473)`) with the profiling title "nccl:coalesced" GPU side: The CUDA kernels will inherit this profiling title What is missing: We forgot to call the record function [callback](`3be70dc30e/torch/csrc/distributed/c10d/Work.cpp (L35-L38)`). With this change we finish immediately on the CPU side, but the ncclDevKernel_SendRecv still has the coalesced title. New trace looks like this: <img width="1123" height="637" alt="image" src="https://github.com/user-attachments/assets/f015fd64-85cd-452a-be24-3e7724f84e44" /> Pull Request resolved: https://github.com/pytorch/pytorch/pull/160680 Approved by: https://github.com/fegin, https://github.com/kwen2501	2025-09-29 15:21:55 +00:00
Klaus Zimmermann	fa54b08cd5	Replace setup.py install with pip install (#156711 ) #156027 already replaced most use of `python setup.py install`. This PR only adds a few more occurrences and adds `--no-build-isolation` in a few places. Pull Request resolved: https://github.com/pytorch/pytorch/pull/156711 Approved by: https://github.com/atalman	2025-09-29 15:15:10 +00:00
Nicolas De Carli	92284fb2ff	Add SVE128 ISA (#158932 ) Summary: Partly Importing and adapting https://github.com/pytorch/pytorch/pull/138388, adding SVE128 as ISA. Intention is to add SVE128 translation layers for Vectorized data types. Idea is to have 1 PR per file, aside from the current one, plus a last one modifying cmake files to enable the new ISA selectively. Tested current changes on a nightly run, to verify no regressions occur on systems leveraging SVE256. No regressions spotted when running test_ops.py, a set of 34k unit tests. A machine leveraging SVE128 was used towards this testing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/158932 Approved by: https://github.com/malfet	2025-09-29 14:49:19 +00:00
PaulZhang12	84d673ef57	Add less warps config to inner reductions (#162447 ) Add less warps to ensure proper vectorization + memory coalescing for inner reductions, prefer more work per thread <img width="1717" height="731" alt="Screenshot 2025-09-17 at 10 03 25 AM" src="https://github.com/user-attachments/assets/7b1f4a30-62f2-4bee-bb9c-122501bde63e" /> Differential Revision: [D83343892](https://our.internmc.facebook.com/intern/diff/D83343892) Pull Request resolved: https://github.com/pytorch/pytorch/pull/162447 Approved by: https://github.com/v0i0, https://github.com/eellison, https://github.com/shunting314	2025-09-29 13:48:36 +00:00
Jean Schmidt	d633bac252	Update issue templates adding a DISABLE AUTOREVERT option (#163858 ) This should be used to disable autorevert functionality if users feels the need to. Pull Request resolved: https://github.com/pytorch/pytorch/pull/163858 Approved by: https://github.com/izaitsevfb	2025-09-29 13:10:05 +00:00
PyTorch UpdateBot	d81476e211	[xla hash update] update the pinned xla hash (#163494 ) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml). Update the pinned xla hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/163494 Approved by: https://github.com/pytorchbot	2025-09-29 12:31:16 +00:00
PyTorch UpdateBot	a0ae2f9aa0	Update slow tests (#163493 ) This PR is auto-generated weekly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/weekly.yml). Update the list of slow tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/163493 Approved by: https://github.com/pytorchbot	2025-09-29 11:58:17 +00:00
Ke Wen	615da7b95e	[fx] Allow customization of submod name in split graph (#164035 ) Fixes #164030: HOP and pipelining both name things submod_i by adding an optional argument `partition_affix` to `split_module` API. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164035 Approved by: https://github.com/ezyang ghstack dependencies: #164045	2025-09-29 09:16:36 +00:00
Deng, Daisy	4fd70d4e7b	[1/N]Enable some tests in test_ops.TestCommon on Intel GPU (#159944 ) For https://github.com/pytorch/pytorch/issues/114850, we will port aten unit tests to Intel GPU. This PR works on some test cases in test/test_ops.py. We enable Intel GPU with the following methods and try our best to keep the original code style: 1. Extended XPUTestBase.get_all_devices to support multiple devices 2. Added skipXPU decorator 3. Extended onlyOn to support device list 4. Enabled 'xpu' for some test paths 5. Added allow_xpu=True for supported test classes. 6. Replaced onlyCUDA with onlyOn(['cuda', 'xpu']) for supported tests 7. Use skipIfXpu and skipXPU to disable unsupported tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/159944 Approved by: https://github.com/guangyey, https://github.com/EikanWang, https://github.com/albanD	2025-09-29 09:08:04 +00:00
Animesh Jain	e1e5e040cd	[dynamo][export] Add some missing trace rules (#164080 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164080 Approved by: https://github.com/tugsbayasgalan	2025-09-29 08:47:24 +00:00
Ke Wen	5ddad22196	[PP] Use default export mode (non-strict) (#164045 ) export's default mode has switched from strict to non-strict. We just follow suit in PP. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164045 Approved by: https://github.com/H-Huang	2025-09-29 06:31:06 +00:00
Valentine233	90512fa5bd	[Quant] extend the op list for quant lift up (#163621 ) Add `aten.reshape.default` into the op list of quant lift up, in order to fuse more potential quantized kernels. Pull Request resolved: https://github.com/pytorch/pytorch/pull/163621 Approved by: https://github.com/mingfeima, https://github.com/Xia-Weiwen, https://github.com/jansel	2025-09-29 06:14:45 +00:00
Isalia20	48a5470cf8	[CUDA] fix indexing on large tensor causing invalid configuration argument (#164049 ) Fixes #164048 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164049 Approved by: https://github.com/eqy	2025-09-29 06:07:35 +00:00
CaoE	b9854c9d89	[Inductor][CPP] Fix the test case of test_linear_reuse_kernels (#163723 ) Fixes #163491. Add tolerances to make `test_linear_reuse_kernels` more stable. Pull Request resolved: https://github.com/pytorch/pytorch/pull/163723 Approved by: https://github.com/leslie-fang-intel	2025-09-29 05:29:01 +00:00
can-gaa-hou	eb4361a801	[Fix] Adding missing `f` prefixes to formatted strings [1/N] (#164065 ) As stated in the title. * #164068 * #164067 * #164066 * __->__ #164065 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164065 Approved by: https://github.com/Skylion007	2025-09-29 04:53:00 +00:00
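Note on the entry above: the class of bug this series targets is easy to show in isolation. The variable names below are made up for illustration and do not come from the PRs.

```python
name = "weight"

# Without the f prefix the braces are kept literally.
missing_prefix = "parameter {name} not found"

# With the f prefix the expression inside the braces is interpolated.
with_prefix = f"parameter {name} not found"
```

The first string still contains the text `{name}`, which is usually a sign that the prefix was forgotten in an error or log message.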
PyTorch UpdateBot	d131f213ac	[vllm hash update] update the pinned vllm hash (#164092 ) This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml). Update the pinned vllm hash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164092 Approved by: https://github.com/pytorchbot	2025-09-29 04:41:06 +00:00
can-gaa-hou	7c7ae86991	[Fix] Adding missing `f` prefixes to formatted strings [2/N] (#164066 ) As stated in the title. * #164068 * #164067 * __->__ #164066 * #164065 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164066 Approved by: https://github.com/Skylion007	2025-09-29 04:40:44 +00:00
can-gaa-hou	ad32ed83b3	[Fix] Adding missing `f` prefixes to formatted strings [3/N] (#164067 ) As stated in the title. * #164068 * __->__ #164067 * #164066 * #164065 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164067 Approved by: https://github.com/Skylion007	2025-09-29 04:35:23 +00:00
Animesh Jain	d8becd1cf4	[dynamo][export] Make the source_stack and fqn info same between dynamo and export (#164085 ) preparing for landing the install_free_tensors flag Pull Request resolved: https://github.com/pytorch/pytorch/pull/164085 Approved by: https://github.com/tugsbayasgalan	2025-09-29 04:35:13 +00:00
can-gaa-hou	e64dd8c694	[Fix] Adding missing `f` prefixes to formatted strings [4/N] (#164068 ) As stated in the title. * __->__ #164068 * #164067 * #164066 * #164065 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164068 Approved by: https://github.com/Skylion007	2025-09-29 04:07:07 +00:00
Xuehai Pan	047ae24e34	Eliminate setup.py install/develop in the codebase (#162329 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/162329 Approved by: https://github.com/ezyang	2025-09-29 03:54:28 +00:00
Yuanyuan Chen	3cda34ebde	[2/N] Apply ruff UP035 check in torch files (#164054 ) This is the result of applying the ruff `UP035` check. `Callable` is imported from `collections.abc` instead of `typing`. `TypeAlias` is also imported from `typing`. This PR is the follow-up of #163947. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164054 Approved by: https://github.com/ezyang, https://github.com/Skylion007	2025-09-29 03:35:32 +00:00
Yuanyuan Chen	352197c508	Remove old ROCm skip conditions in tests (#164058 ) This PR removes skip conditions for ROCM <= 3.5. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164058 Approved by: https://github.com/kwen2501	2025-09-29 03:00:58 +00:00

93688 Commits