We want to create a `shared_state` to store `root_mesh`, `rank_map`, and pg caches. We can add more to it down the road so that it becomes a singleton for bookkeeping, which also aligns with our original proposal to move toward the idea of a mesh universe.
cc H-Huang awgu wanchaol fegin wz337 wconstab d4l3k pragupta msaroufim dcci
The deprecation warning led to warning spam in PyTorch APIs like `torch.compile`. This is not how a deprecation warning should go: if we add a deprecation warning, we'd better update our built-in APIs to prevent the spam.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166956
Approved by: https://github.com/albanD
We store a mapping between generated fx graph code and the original model code stack trace in `fx.traceback._FX_METADATA_REGISTRY`, and post-process the memory snapshot to append the original model stack trace information.
To achieve this, the biggest change we had to make in `aot_eager` mode is to give each generated fx graph a unique stack trace, i.e. it cannot just be `<eval_with_key>`. We set `co_filename` to **pretend** that the code comes from the `co_filename` file. Now, instead of `<eval_with_key>` in the stack trace, we get something like `fx_generated_3a4b5c6d7e8f9a0.py`.
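For illustration (a minimal sketch with a made-up filename and source, not the PR's actual code): `compile()`'s filename argument becomes `co_filename` on the resulting code object, and registering the source with `linecache` lets tracebacks render the lines:
```python
import linecache

src = "def forward(x):\n    return x + 1\n"
filename = "fx_generated_0001.py"  # hypothetical unique name

# Register the source so tracebacks can display these lines.
linecache.cache[filename] = (len(src), None, src.splitlines(True), filename)

namespace = {}
exec(compile(src, filename, "exec"), namespace)
print(namespace["forward"].__code__.co_filename)  # fx_generated_0001.py
```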
An `augment_with_fx_traces` arg is added to `torch.cuda.memory._snapshot` and `_dump_snapshot`. When the arg is set to `True`, a post-processing pass populates the original model stack traces into the snapshot frames.
The new behavior of `GraphModule` can be controlled by `TORCH_ENRICH_RPOFILER_STACK_TRACE` or `_dynamo.config.enrich_profiler_metadata=True`.
Alternative:
Instead of setting `co_filename`, we could also do it as below.
Note that if we do it this way, we will need to dump the file to disk to make the graph module torch-scriptable: TorchScript requires source access in order to carry out compilation, so we need to make sure the original `.py` files are available.
```python
key = filename
globals_copy = globals.copy()
globals_copy["__file__"] = key
globals_copy["__name__"] = key
linecache.lazycache(key, globals_copy)
exec(compile(src, key, "exec"), globals)
```
Other changes:
- Update `MemoryViz.js` to display fx node information and the original model code if they exist
```
python test/test_fx.py -k test_lineno_map
python test/test_fx.py -k test_custom_traceback_raised
python test/test_public_bindings.py
python test/test_cuda.py -k test_fx_memory
python test/test_fx.py -k test_informative_co_filename
python test/test_fx.py -k test_autowrap_functions
python test/dynamo/test_utils.py -k test_inductor_provenance
```
```python
# Profile with memory snapshot
torch.cuda.memory._record_memory_history()

with torch._dynamo.config.patch("enrich_profiler_stack_trace", True):
    compiled = torch.compile(mod, backend="aot_eager", fullgraph=True)
    result = compiled(torch.randn(10, 10, device="cuda:0"))

torch.cuda.memory._dump_snapshot("memory_snapshot.pickle", augment_with_fx_traces=True)
torch.cuda.memory._record_memory_history(enabled=None)
```
<img width="913" height="711" alt="Screenshot 2025-10-30 at 10 40 44 AM" src="https://github.com/user-attachments/assets/8d7a1833-f98d-4756-b666-1d63ab57b27b" />
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166676
Approved by: https://github.com/albanD, https://github.com/ezyang
Results from CI:
No failures, but jobs generally take longer, maybe a ~20% increase in time.
But the smaller runner is ~25% of the cost of the current runner, so in terms of cost this is a decrease.
If the 20% is too much, we can try the 4x larger runners, which are about half the cost of the current runner; that would probably still result in cost savings with hopefully less impact on time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164989
Approved by: https://github.com/BoyuanFeng, https://github.com/huydhn
Summary:
cuBlasLt enforces size/stride requirements for 1x128 and 128x128 blockwise scaling kernels, some of which weren't being handled, causing silently incorrect answers, especially for the 128x128 scaling cases.
For deepseek-style scaling ([docs](https://docs.nvidia.com/cuda/cublas/#scaling-factors-layouts)), with `A: MxK` and `B: KxN`, cuBlasLt requires the following:
```Py
L = K // 128
L4 = round_up(L, 4)
1x128 x 128x128:
* A_scale: [M, K // 128], stride: [1, M]
* B_scale: [L4, N // 128], stride: [1, L4]
128x128 x 1x128:
* A_scale: [L4, M // 128], stride: [1, L4]
* B_scale: [N, K // 128], stride: [1, N]
1x128 x 1x128:
* A_scale: [M, K // 128], stride: [1, M]
* B_scale: [N, K // 128], stride: [1, N]
```
Notable here is the `L4` term: we must round up to the nearest multiple of 4 blocks in the `K` dimension. This wasn't enforced previously and caused silent wrong answers where `(K // 128) % 4 != 0`.
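As a worked illustration (a minimal sketch with made-up sizes and our own `round_up` helper, not code from this PR), the 1x128 x 128x128 case with `(K // 128) % 4 != 0` works out as:
```python
def round_up(x: int, m: int) -> int:
    return (x + m - 1) // m * m

M, N, K = 256, 512, 384         # K // 128 == 3, not a multiple of 4
L = K // 128                    # number of 128-wide blocks along K
L4 = round_up(L, 4)             # cuBlasLt requires padding up to 4

A_scale_shape = (M, K // 128)   # (256, 3), expected stride [1, M]
B_scale_shape = (L4, N // 128)  # (4, 4), expected stride [1, L4]
print(A_scale_shape, B_scale_shape)
```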
Signed-off-by: Simon Layton <simonlayton@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166752
Approved by: https://github.com/drisspg, https://github.com/vkuzo
Fixes #164684
### Description
Symbolic tracing fails during multiplication between a `SymBool` and a `Tensor`. This scenario is triggered when `.item()` is called on a 0-dim boolean tensor within a `torch.compile` region. In compile mode, this yields a `SymBool`, and the subsequent `SymBool * FakeTensor` operation is unsupported, leading to a `TypeError` or a data-dependent `UserError`.
### Solution
This PR addresses the issue at the type-conversion level, as suggested by reviewers.
The root cause of the `TypeError` is that `torch.sym_float()` (which is called by `_maybe_convert_to_dtype` during type promotion for `aten.mul`) lacks a conversion path for `SymBool` and incorrectly falls back to `builtins.float(SymBool)`.
This PR fixes that by implementing the `__sym_float__(self)` method on the `SymBool` class (defined in `torch/__init__.py`).
The `torch.sym_float(a)` utility function is already designed to check for `hasattr(a, "__sym_float__")` before falling back to `builtins.float()`. By adding this method, `SymBool` instances now correctly advertise their ability to be cast to `SymFloat`. The new method leverages `self.node.sym_float()` to convert the symbolic boolean value to its symbolic float representation (0.0 or 1.0), resolving the `TypeError` at its source.
This approach is more fundamental than modifying a specific operation in `builtin.py` and ensures `SymBool` can be correctly promoted to `SymFloat` in any operation, while still preserving its boolean nature for control-flow operations like `guard_or_false` (which is verified by a new test case).
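A minimal sketch of the fix's shape, reconstructed from the description above (the exact code in `torch/__init__.py` may differ):
```python
class SymBool:
    ...  # existing class body elided

    def __sym_float__(self):
        # torch.sym_float() checks hasattr(a, "__sym_float__") first, so
        # defining this routes SymBool through the symbolic conversion
        # path (0.0 or 1.0) instead of builtins.float(SymBool).
        return SymFloat(self.node.sym_float())
```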
### Verification
1. **Bug Reproduced**: The initial `UserError: Could not guard on data-dependent expression` was successfully reproduced with the script from the issue, as shown below.
<img width="1369" height="945" alt="Screenshot 2025-10-13 at 10 29 05" src="https://github.com/user-attachments/assets/8daa4555-3347-4af5-906a-02150b8df9d1" />
2. **Fix Validated**: After applying the code changes, the same script now runs to completion, printing `✅ eager success` and `✅ compile success`, as shown below.
<img width="1228" height="82" alt="Screenshot 2025-10-13 at 10 29 21" src="https://github.com/user-attachments/assets/94c4f143-b898-4dda-9bff-0ad5450a30fa" />
3. **Tests Added**: Added a new test class `DynamoOpPromotionTests` to `test/dynamo/test_misc.py` with three new test cases:
   1. `test_symbool_tensor_mul_does_not_fail`: verifies that the original bug report code (with `.item()` + `*`) no longer raises an error when compiled.
   2. `test_symbool_guard_or_false`: verifies that this fix does not cause a regression for `guard_or_false(SymBool)` (the concern raised by reviewers).
   3. `test_symbool_tensor_mul`: verifies the behavior of `Tensor(bool) * Tensor(float)` (without `.item()`) for completeness.

All new tests pass locally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165264
Approved by: https://github.com/laithsakka, https://github.com/Lucaskabela
# Motivation
This PR introduces support for peer-to-peer (P2P) access, including querying and enabling P2P connections between two devices.
It supports two categories of allocations:
- Regular allocations;
- Expandable segment allocations.
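For illustration only: the CUDA backend already exposes a Python-level P2P capability query, shown below; whether and how the XPU backend surfaces an equivalent Python API is not specified by this PR.
```python
import torch

# CUDA analogue of the P2P capability this PR adds at the XPU
# allocator level: ask whether device 0 can access device 1's memory.
if torch.cuda.is_available() and torch.cuda.device_count() >= 2:
    print(torch.cuda.can_device_access_peer(0, 1))
```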
# Additional Context
As a follow-up, we should use this feature to optimize our copy kernel when P2P is supported.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166424
Approved by: https://github.com/gujinghui, https://github.com/albanD
ghstack dependencies: #166299, #166292
# Motivation
This PR intends to add expandable segment feature support on XPU. This will help
- Reduce memory fragmentation;
- Gradually map physical pages into virtual address space as needed.
# Additional Context
The traditional caching allocator frequently allocates and frees device memory blocks. Over time, with varying tensor sizes, the device address space becomes fragmented; even when there's enough total free memory, a lack of contiguous space can cause large allocations to fail.
The **expandable segment** feature addresses this by dynamically extending physical memory within a reserved virtual address range, reducing fragmentation and minimizing reallocation overhead.
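A usage sketch under stated assumptions: CUDA enables this via the allocator config (`PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`); we assume here that XPU honors the analogous generic `PYTORCH_ALLOC_CONF` knob, which this PR description does not spell out.
```python
import os

# Assumption: XPU reads the generic allocator config the way CUDA reads
# PYTORCH_CUDA_ALLOC_CONF. Must be set before the first XPU allocation.
os.environ["PYTORCH_ALLOC_CONF"] = "expandable_segments:True"

import torch  # noqa: E402

x = torch.empty(1 << 20, device="xpu")  # segments can now grow on demand
```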
The potential drawbacks are:
- Virtual memory overhead;
- Potential page mapping overhead;
- Increased complexity.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166292
Approved by: https://github.com/albanD, https://github.com/EikanWang, https://github.com/gujinghui
ghstack dependencies: #166299
Fixes #159139
## The Cause
The bug occurs because the `OptimizedModule` wrapper in `torch._dynamo.eval_frame` doesn't implement the `__len__` method. This causes Python's `bool()` check to fall back to the default object truthiness (always `True`) instead of correctly evaluating containers with `len() == 0` as `False`.
## The Fix
A very easy fix: add a `__len__` method to `OptimizedModule` in `torch._dynamo.eval_frame` that delegates the call to the original module:
```python
def __len__(self):
"""
Proxy the len() call to the original module to fix truthiness checks.
"""
return len(self._orig_mod)
```
This successfully fixes the issue; the script now works as expected.
## Reproduction Script
```python
import torch
import torch.nn as nn
# Create an empty nn.ModuleList
original = nn.ModuleList()
# Compile it using torch.compile
compiled = torch.compile(original)
# Compare their boolean evaluations
print(f"bool(original): {bool(original)}")
print(f"bool(compiled): {bool(compiled)}")
# Trigger failure if they differ
assert bool(original) == bool(compiled), "BUG: truthiness behavior mismatch after compilation"
```
## Output
```
bool(original): False
bool(compiled): False
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159208
Approved by: https://github.com/Lucaskabela
Co-authored-by: pushkar-hue <pushkarsharma.rtm@gmail.com>
Co-authored-by: Lucas Kabela <lucasakabela@gmail.com>
Provides type coverage for `torch/_dynamo/variables/builtin.py`
### Coverage report:
`mypy torch/_dynamo/variables/builtin.py --linecount-report /tmp/coverage_log`
Comparing before and after, we go from 2213 lines and 64 functions covered to 3212 lines and 85 functions covered.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166745
Approved by: https://github.com/williamwen42
Fixes #163149
### Summary:
Fixes mypy type checking failures in `test_type_hints` by consolidating typing imports and eliminating duplicate/conflicting import patterns that caused mypy to fail resolving type annotations.
### Impact:
- `test_type_hints` now passes
- module: tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163150
Approved by: https://github.com/Skylion007
Summary:
Conversion from/to bfloat16 was not covered by the conversion templates, because they used `bfloat16_t` as the data type instead of the custom `c10::BFloat16`.
Conversion by casting from/to `bfloat16_t` is broken in clang-[17, 20] and fixed in clang-21. Because PyTorch does not currently have CI running binaries compiled with clang-21, we won't implement that approach for now.
We are currently only adding conversion from bfloat16, as it can be implemented by zero-extending into a 4-byte float.
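To illustrate the technique (a minimal Python sketch for exposition; the PR implements this in vectorized C++): a bfloat16 holds the upper 16 bits of a float32, so conversion to float is a zero-extension into the high half of a 4-byte word:
```python
import struct

def bf16_bits_to_float(bits: int) -> float:
    # Zero-extend the 16 bfloat16 bits into the top of a 32-bit word;
    # the low mantissa bits are simply zero.
    widened = (bits & 0xFFFF) << 16
    return struct.unpack("<f", struct.pack("<I", widened))[0]

assert bf16_bits_to_float(0x3F80) == 1.0  # 0x3F80 is bfloat16 1.0
```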
We've observed the following performance improvements when compiling with clang-19 and targeting armv9a+sve2:

| Conversion | Before | After | Throughput increase |
|---|---|---|---|
| bfloat16_t->uint8 | 423.583us | 123.783us | 342% |
| bfloat16_t->int8 | 424.090us | 131.575us | 322% |
| bfloat16_t->int16 | 430.817us | 136.794us | 315% |
| bfloat16_t->int64 | 571.547us | 177.699us | 322% |
| bfloat16_t->double | 459.089us | 165.556us | 277% |
Test Plan:
Correctness:
```
buck2 test mode/opt //caffe2/test:test_ops
buck2 test mode/opt //caffe2/test:torch
```
Performance:
```
buck2 run mode/opt //caffe2/benchmarks/operator_benchmark/fb:operator_benchmark_test
```
Differential Revision: D86119613
Pull Request resolved: https://github.com/pytorch/pytorch/pull/166880
Approved by: https://github.com/mcfi, https://github.com/aditew01