Commit Graph

94551 Commits

Author SHA1 Message Date
d795fb225a [RFC] Add pyrefly to lintrunner (#165179)
This will add pyrefly to lintrunner as a warning-only check, allowing us to collect feedback about the tool before switching to pyrefly as the main type checker.

References the steps outlined here: https://github.com/pytorch/pytorch/issues/163283

Test plan:
`lintrunner init`
`lintrunner`
Confirm that when pyrefly errors are present, the results look like: https://gist.github.com/maggiemoss/e6cb2d015dd1ded560ae1329098cf33f

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165179
Approved by: https://github.com/ezyang
trunk/d795fb225ace717f692ceb3f1d20dfb35afbdf8a
2025-10-16 20:07:09 +00:00
7df9aca529 [ROCm][Windows] Enable AOTriton runtime compile on Windows (#165538)
AOTriton uses prebuilt runtime binaries if the user's ROCm version matches the one used to generate the prebuilt runtime. However, since there's no prebuilt runtime available for Windows, this check needs to be bypassed there. This PR enables that by changing the condition to always build the AOTriton runtime from source on Windows.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165538
Approved by: https://github.com/xinyazhang, https://github.com/jeffdaily
trunk/7df9aca52946ae47ca4d98dbe0685a412fbc77b8
2025-10-16 19:51:43 +00:00
d4a713cd9c Change forkserver test to only run below 3.13.8 (#165667)
A multiprocessing bug was fixed in 3.13.8; see [https://docs.python.org/3.13/whatsnew/changelog.html](https://docs.python.org/3.13/whatsnew/changelog.html) and [gh-126631](https://github.com/python/cpython/issues/126631).

This test would therefore start failing once we update to Python 3.13.8, so it is now gated to run only on versions below 3.13.8.
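A hedged sketch of the kind of version gate this implies (the decorator placement and test name are illustrative, not the PR's exact code):

```python
import sys
import unittest

# Illustrative only: run the forkserver repro test exclusively on
# Python versions below 3.13.8, where gh-126631 is still unfixed.
@unittest.skipIf(
    sys.version_info >= (3, 13, 8),
    "multiprocessing bug fixed upstream in 3.13.8 (gh-126631)",
)
class ForkserverReproTest(unittest.TestCase):
    def test_forkserver_repro(self):
        ...
```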
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165667
Approved by: https://github.com/malfet
trunk/d4a713cd9c8ea1dc13917d3311d73c13914306a6
2025-10-16 19:34:10 +00:00
5daef30b26 158232 Fix autocast cache incorrectly retaining no_grad state (#165068)
Fixes #158232
The autocast caching heuristic in `aten/src/ATen/autocast_mode.cpp:139` did not account for gradient mode state when deciding whether to cache. FSDP2 is not directly related.

~~This PR adds a `GradMode::is_enabled()` check to the caching condition. Caching is now disabled in `no_grad()` contexts to prevent storing tensors with incorrect gradient state. This ensures correctness at the cost of forgoing the cache.~~
This PR proposes separate caches for gradient-enabled and gradient-disabled modes.
Adds tests.
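A minimal repro sketch of the failure mode described above, with illustrative shapes and modules (not taken from the issue):

```python
import torch

lin = torch.nn.Linear(4, 4)
x = torch.randn(2, 4)

with torch.autocast("cpu", dtype=torch.bfloat16):
    with torch.no_grad():
        lin(x)   # the bf16 weight cast can be cached while grad is off
    out = lin(x)  # before the fix, a cache hit could reuse the cast made
                  # under no_grad, detaching it from autograd

# symptom: gradients may not flow back to lin.weight (or backward fails)
out.sum().backward()
```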

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165068
Approved by: https://github.com/ngimel, https://github.com/janeyx99
trunk/5daef30b26b794d237fbbc399c1d47ec0380200a
2025-10-16 19:32:01 +00:00
6dedd34c31 [CD] Skip 12.9 build on Windows (#165665)
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165665
Approved by: https://github.com/Camyll, https://github.com/malfet
trunk/6dedd34c31b9b9ba3a91931efe79eee99cd56cef
2025-10-16 19:11:27 +00:00
a303d6dda9 [inductor] don't try to reorder loops for template (#165601)
fix https://github.com/pytorch/pytorch/issues/165579

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165601
Approved by: https://github.com/yushangdi
trunk/a303d6dda9532f6e6a8e0776ba866727df28b721
2025-10-16 19:05:21 +00:00
7669ac9402 [ROCm] Add scaled_mm v2 support. (#165528)
Adds MX FP4 support in Blas.cpp: updates the `scale_kernel_dispatch` array and the `ScaledGemmImplementation` enum to include MXFP4, and modifies the tests under test_scaled_matmul_cuda accordingly.

`PYTORCH_TEST_WITH_ROCM=1 python test/test_scaled_matmul_cuda.py -v -k test_blockwise`
115 tests passed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165528
Approved by: https://github.com/jeffdaily
trunk/7669ac940280f3af50ef5ec2a41d788df91abdbc
2025-10-16 18:36:41 +00:00
86fd4fc23e [DeviceMesh] Simplify unflatten method (#165556)
By adding a few small helpers (e.g., a `splice` method to `_MeshLayout`, and making `_init_process_groups` static and thus stateless) we can substantially shorten the definition of the unflatten method, and help readability.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165556
Approved by: https://github.com/fduwjj
ghstack dependencies: #165554, #165555
trunk/86fd4fc23e697e275d37c36e3cbe521f156434fd
2025-10-16 18:36:16 +00:00
99097b6d89 [DeviceMesh] Introduce private constructor instead of _create_mesh_from_ranks (#165555)
The refactoring of DeviceMesh is heavily constrained by the signature of its constructor, a public API that contains some "legacy" concepts we'd love to get rid of, such as an explicit/materialized `mesh` Tensor.

In other languages the solution to this would be to add a private overload of the constructor. Python doesn't natively allow this, but in this PR I managed to build something that approximates it.

This new private constructor basically only takes `_layout`, `_global_rank_permutation`, and `mesh_dim_names`.

With such a constructor we can effectively simplify a lot of callsites and get rid of the `_create_mesh_from_ranks` helper method. That's a good thing because it was instantiating many DeviceMeshes in a for loop, which always felt unnecessary.
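A general sketch of one way to approximate a private constructor overload in Python, using hypothetical names (this is not the actual DeviceMesh code):

```python
class Mesh:
    _PRIVATE = object()  # sentinel that gates the internal path

    def __init__(self, mesh_tensor=None, *, _token=None, _layout=None,
                 mesh_dim_names=None):
        if _token is Mesh._PRIVATE:
            # trusted internal path: no materialized mesh tensor needed
            self._layout = _layout
            self._mesh_dim_names = mesh_dim_names
        else:
            # public path: validate and accept an explicit mesh tensor
            self._mesh_tensor = mesh_tensor

    @classmethod
    def _from_layout(cls, layout, mesh_dim_names=None):
        # the "private overload": only internal callers should use this
        return cls(_token=cls._PRIVATE, _layout=layout,
                   mesh_dim_names=mesh_dim_names)
```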

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165555
Approved by: https://github.com/fduwjj, https://github.com/fegin
ghstack dependencies: #165554
2025-10-16 18:36:16 +00:00
a214371008 [FP8] Add other Blackwell compute-capabilities to expected fail test_honor_sm_carveout (#165159)
The CUTLASS SM hint also isn't working for other Blackwells; green contexts are needed for carveout.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165159
Approved by: https://github.com/Skylion007
trunk/a21437100815725eaaa086aafca2c12ca3e8cedb
2025-10-16 18:35:06 +00:00
7d87d7052e [inductor][bucketing] Fx collectives bucketing of multiple dtypes (#162470)
Enables bucketing tensors of multiple dtypes into a single bucketed collective.

The first target is bucketing bf16 and f32 together, but this can already be used with other dtypes.

For now, multi-dtype bucketing is only supported in "custom_ops" mode; the non-custom_ops path needs additional work on the inductor side.
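For intuition, a hedged sketch of one way mixed-dtype bucketing can work — upcast bf16 to fp32 (lossless), reduce one flat buffer, then split and downcast. This illustrates the idea only; it is not the PR's implementation:

```python
import torch
import torch.distributed as dist

def bucketed_all_reduce(tensors: list[torch.Tensor]) -> list[torch.Tensor]:
    # one collective for a mix of bf16/f32 tensors
    flat = torch.cat([t.flatten().to(torch.float32) for t in tensors])
    dist.all_reduce(flat)  # single bucketed collective
    out, offset = [], 0
    for t in tensors:
        n = t.numel()
        out.append(flat[offset:offset + n].view_as(t).to(t.dtype))
        offset += n
    return out
```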

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162470
Approved by: https://github.com/eellison
trunk/7d87d7052ef40fc802d8340c6a56ce3b7beb8407
2025-10-16 18:31:43 +00:00
1a34ff4e04 Fixing get_local_rank() variable missing when compiled (#165432)
Fixes #165215

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165432
Approved by: https://github.com/bdhirsh
trunk/1a34ff4e04ea45d58f3d49d560086ba256702ccc
2025-10-16 18:20:34 +00:00
fe5ccb1a74 bf16 support for per tensor backward (#165362)
Adds bf16 support for the backward pass of `torch._fake_quantize_learnable_per_tensor_affine()`.

Note that for testing, we modified the seed to avoid having to increase tolerances for cases where differences between Python and C++ downcasting cause tensor mismatches (e.g., 27.87704 vs. 27.8408 before downcasting, 27.7500 vs. 27.8750 after downcasting for the Python vs. C++ op).
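A usage sketch of the op being extended; the dtypes chosen for `scale`/`zero_point` here are assumptions:

```python
import torch

x = torch.randn(8, dtype=torch.bfloat16, requires_grad=True)
scale = torch.tensor([0.1], requires_grad=True)
zero_point = torch.tensor([0.0], requires_grad=True)

out = torch._fake_quantize_learnable_per_tensor_affine(
    x, scale, zero_point, 0, 255, 1.0  # quant_min, quant_max, grad_factor
)
out.sum().backward()  # with this PR, the backward supports bf16 input
```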

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165362
Approved by: https://github.com/andrewor14
trunk/fe5ccb1a74b983ecc9e111b704c62e2129e7e03f
2025-10-16 17:47:01 +00:00
85586d7efc Make c7i the default for _linux-build.yml (#164747)
Use linux.c7i.2xlarge as the default runner for the _linux-build.yml workflow. In testing we found that switching from c5 to c7i yields 15-20% faster build times while costing only 5% more, so this should reduce the overall cost of jobs using _linux-build.yml.

Relates to pytorch/test-infra#7175.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164747
Approved by: https://github.com/atalman
trunk/85586d7efcefb36d44264d1019f71ea58d8c472b
2025-10-16 17:37:51 +00:00
e1d71a6b35 Revert "12/n : Remove fbandroid_compiler_flags (#165558)"
This reverts commit d7ffa8b8a29ba6071c51499c1df3d702d0a26f72.

Reverted https://github.com/pytorch/pytorch/pull/165558 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/165558#issuecomment-3411879769))
trunk/e1d71a6b35318c5d492a3900c84b904be8b8c9de
2025-10-16 17:18:56 +00:00
d61a9b88cf [DeviceMesh] Prefer using _layout over _mesh for all sorts of things (#165554)
The goal of this PR is to avoid storing the explicit `mesh` Tensor inside each DeviceMesh, and instead compute it on-the-fly when the end user needs it, and try to replace all of its internal usages with `_layout` and the newly-introduced `_global_rank_permutation` Tensor. The name of this attribute is up for debate. The advantage of the `_global_rank_permutation` Tensor is that it is _the same_ Tensor for the root mesh and all its children, so it doesn't need to be copied/reallocated.
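A toy sketch of the compute-on-access idea, with hypothetical stand-in names (`perm` for `_global_rank_permutation`, `shape` for the layout's sizes); it ignores submesh slicing entirely:

```python
import torch

class MeshSketch:
    def __init__(self, perm: torch.Tensor, shape: tuple[int, ...]):
        # `perm` is shared by the root mesh and all its children
        self._perm, self._shape = perm, shape

    @property
    def mesh(self) -> torch.Tensor:
        # materialized on demand instead of stored on every instance
        return self._perm.view(self._shape)
```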

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165554
Approved by: https://github.com/fduwjj
trunk/d61a9b88cf3be04a29c5a7d6e9622ae5e8d51de3
2025-10-16 17:01:44 +00:00
99b32a6750 [inductor] print 0.0 as 0 for triton (#164291)
Fixes https://github.com/pytorch/pytorch/issues/164157
Fixes https://github.com/pytorch/pytorch/issues/164086

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164291
Approved by: https://github.com/bobrenjc93
ciflow/mps/165453 ciflow/trunk/165453 ciflow/inductor/165453 trunk/99b32a6750bfd0cfe2bc84a47823e1da34802b7b
2025-10-16 16:37:50 +00:00
783da8b8e7 Repro for property related Dynamo graph break (#165609)
Signed-off-by: Edward Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165609
Approved by: https://github.com/albanD, https://github.com/gchanan, https://github.com/malfet, https://github.com/anijain2305
trunk/783da8b8e7f3af90c5b8bde4c849768bd2860834 viable/strict/1760646330
2025-10-16 16:22:43 +00:00
ed74dc054d add the option to disable functionalization in AOTDispatcher (#164577)
I'm cleaning this PR up as a proper way of disabling functionalization via config in AOTDispatcher. I removed the non-functionalization related changes from the original version:

(1) preventing proxy mode (and functionalization) from incorrectly decomposing CIA ops (Ed has a PR for it here: https://github.com/pytorch/pytorch/pull/164939)

(2) preventing python-dispatcher-based decomps above autograd from running. I'm not doing this for now; I'll likely do it in a follow-up.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164577
Approved by: https://github.com/ezyang
ghstack dependencies: #165372
trunk/ed74dc054d45ede6ebf77e1e1b7e2a7a15612e55
2025-10-16 15:44:11 +00:00
f33c7e1a43 add and fix OpInfo tests for the default partitioner (#165372)
I noticed the default partitioner was breaking in some dynamic shape tests, so prior to turning off functionalization I want to tweak it to pass all of our OpInfo tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165372
Approved by: https://github.com/ezyang
2025-10-16 15:44:11 +00:00
219fb6aafc Refactor CUDAAllocatorConfig using ConfigTokenizer (#165281)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165281
Approved by: https://github.com/albanD
ghstack dependencies: #165129, #165131, #165135, #165136
trunk/219fb6aafc6203a1be68798ced470a26e7a2a5d3
2025-10-16 15:26:50 +00:00
515b5ff539 Remove unused code in CUDAAllocatorConfig (#165136)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165136
Approved by: https://github.com/Skylion007
ghstack dependencies: #165129, #165131, #165135
2025-10-16 15:26:50 +00:00
608a6d4a26 Reuse AcceleratorAllocatorConfig in CUDAAllocatorConfig (#165135)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165135
Approved by: https://github.com/Skylion007
ghstack dependencies: #165129, #165131
2025-10-16 15:26:40 +00:00
03e5dbb26e Register CUDAAllocatorConfig to AcceleratorAllocatorConfig (#165131)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165131
Approved by: https://github.com/Skylion007
ghstack dependencies: #165129
2025-10-16 15:26:28 +00:00
7ee45f7503 Restore AcceleratorAllocatorConfig to avoid potential regression (#165129)
# Motivation
This PR aims to restore `AcceleratorAllocatorConfig` to avoid the potential regression mentioned in https://github.com/pytorch/pytorch/pull/160666#issue-3323270375
These code change would be reverted in the following PR https://github.com/pytorch/pytorch/pull/165304
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165129
Approved by: https://github.com/albanD
2025-10-16 15:26:17 +00:00
e6d9d68598 [Bugfix][Dynamo] Fix Sparse tensors by graph break in Dynamo (#164873)
Fixes #164823 by making lack of support for sparse tensors very explicit (in fake tensor, inductor, and lowering code)
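A hedged illustration of the intended behavior after this change (the exact fallback path is an assumption):

```python
import torch

@torch.compile
def f(x):
    return x.sum()

s = torch.eye(4).to_sparse()
# instead of crashing inside fake tensor / inductor, Dynamo now
# graph-breaks on the sparse input and the call runs in eager mode
print(f(s))
```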

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164873
Approved by: https://github.com/williamwen42, https://github.com/eellison, https://github.com/mlazos
trunk/e6d9d685986c9b46013a6bef99ecf532a481b8e8
2025-10-16 15:06:20 +00:00
1a5b7eca7b [BE] Fold cond into TORCH_CHECK(false,...) (#165593)
Replace `if (!foo) { TORCH_CHECK(false, "bar");}` with `TORCH_CHECK(foo,"bar");`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165593
Approved by: https://github.com/albanD
ghstack dependencies: #165594
trunk/1a5b7eca7b6a0a73a6d4c03ebe8c45fbb0c115ae viable/strict/1760642528
2025-10-16 15:00:30 +00:00
8573574b32 [MPS] sparse mask implementation (#165102)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165102
Approved by: https://github.com/malfet
trunk/8573574b3242d93f3844c7c0bc8fec913eca3e19 viable/strict/1760640743
2025-10-16 14:31:00 +00:00
e6033f6efb [MPS] Improve index_fill_ error handling (#165594)
It should not throw "Cannot convert a float64 Tensor to MPS", but rather a sensible "Converting complex Scalar to non-complex type is not supported".
Adds a TODO about complex support; that's probably a good reason to rip MPSGraph out of index_fill as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165594
Approved by: https://github.com/dcci, https://github.com/kulinseth
trunk/e6033f6efb20e717c41a32bfddeeb638387a2e76
2025-10-16 14:18:39 +00:00
9272437cde Fx collectives bucketing: add bucket all_reduce (#165351)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165351
Approved by: https://github.com/eellison
trunk/9272437cde67fcbb7dde66373382f711fd189418 viable/strict/1760635490
2025-10-16 13:27:33 +00:00
f06e669f6c refactor: replace runtime_error with TORCH_CHECK for better error handling (#163628)
Fixes some parts of issue #148114

@pytorchbot label "topic: not user facing"

@FFFrog PTAL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163628
Approved by: https://github.com/albanD
trunk/f06e669f6c5a0b1840dc57224fecc1a27d46b049 viable/strict/1760628335
2025-10-16 11:09:48 +00:00
69b05913fb Revert "Add mingw to docker (#165560)"
This reverts commit 5e480b8ecf870e4a466c165701ab0e9d055f2ceb.

Reverted https://github.com/pytorch/pytorch/pull/165560 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](https://github.com/pytorch/pytorch/pull/165560#issuecomment-3409814274))
trunk/69b05913fb0332f9a938c74e26b106e2bd24d82e viable/strict/1760619733
2025-10-16 08:42:11 +00:00
d73c283c3a [CUDA] Large tensor maxpool crash fix (#165374)
Fixes #165297

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165374
Approved by: https://github.com/eqy, https://github.com/malfet
trunk/d73c283c3a315cbed83e1795bb05db8ec315c48a
2025-10-16 07:59:46 +00:00
eaeaa08e3a [PowerPC] Disable MKLDNN TF32 on PowerPC to fix build failure (#163454)
The commit f4d8bc46c7706f872abcb4ec41f0b32207d5d826 added TF32 support for x86 CPUs,
which causes build failures on PowerPC systems with mkldnn.

This patch disables TF32 paths on PowerPC while keeping x86 TF32 support intact,
allowing PyTorch to build successfully on PowerPC.

I have run the mkldnn test case on PowerPC, and it passed successfully.

`pytest test/test_mkldnn.py`
87 passed, 2 skipped in 1709.02s (0:28:29)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163454
Approved by: https://github.com/jgong5, https://github.com/malfet
trunk/eaeaa08e3a8071be46f833f7b46aa642ec14e0f7
2025-10-16 06:13:59 +00:00
d0c32971b4 Refine XPU allocator message when OOM (#165509)
# Motivation
Provide more information and align with other backends to enhance the user experience.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165509
Approved by: https://github.com/EikanWang
ghstack dependencies: #165508
trunk/d0c32971b41ba9b9e9b8953beb8c29dd275ebdd3
2025-10-16 05:47:49 +00:00
d7ffa8b8a2 12/n : Remove fbandroid_compiler_flags (#165558)
Summary:
Currently `get_c2_fbandroid_xplat_compiler_flags()` is reading the `caffe2.strip_glog` buckconfig which we want to get rid of.
This diff removes the `fbandroid_compiler_flags` arg and merges it into `compiler_flags` using a nested select and the select version of the method.

The goal is to get rid of all the usages of `get_c2_fbandroid_xplat_compiler_flags()` so that we can get rid of the `caffe2.strip_glog` buckconfig

Test Plan: CI

Differential Revision: D84626885

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165558
Approved by: https://github.com/malfet
trunk/d7ffa8b8a29ba6071c51499c1df3d702d0a26f72
2025-10-16 05:46:02 +00:00
00afa06800 Add cse for make_block_ptr in Triton codegen (#163399)
Summary: per title
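For context, a generic sketch of expression-level CSE in a codegen cache; nothing here reflects inductor's actual implementation:

```python
class CSECache:
    def __init__(self) -> None:
        self.cache: dict[str, str] = {}  # expression text -> temp name
        self.lines: list[str] = []       # emitted code

    def emit(self, expr: str) -> str:
        # identical expressions (e.g. repeated make_block_ptr calls)
        # are emitted once and then reused via their temporary
        if expr not in self.cache:
            name = f"tmp{len(self.cache)}"
            self.lines.append(f"{name} = {expr}")
            self.cache[expr] = name
        return self.cache[expr]
```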

Test Plan: added test cases

Differential Revision: D82648215

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163399
Approved by: https://github.com/jansel, https://github.com/njriasan
trunk/00afa06800b7af2aefabeb50c006c45edf3a233c
2025-10-16 05:29:48 +00:00
5d0b22008d Codemod inductor/fx_passes from Optional to union none (#165606)
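The codemod is the standard PEP 604 rewrite; for illustration:

```python
from typing import Optional

def before(x: Optional[int]) -> Optional[str]: ...

def after(x: int | None) -> str | None: ...
```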
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165606
Approved by: https://github.com/aorenste
ghstack dependencies: #165604, #165605
trunk/5d0b22008d4e4f8d73d5e16d4dc2029fd801bba0
2025-10-16 04:59:47 +00:00
ab6014a903 Codemod inductor/runtime from Optional to union none (#165605)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165605
Approved by: https://github.com/aorenste
ghstack dependencies: #165604
2025-10-16 04:59:47 +00:00
f6daffc54d Codemod codecache.py from Optional to union none (#165604)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165604
Approved by: https://github.com/aorenste
2025-10-16 04:59:37 +00:00
66b75693ae Reuse kLargeBuffer in XPUCachingAllocator (#165508)
# Motivation
Reuse the shared code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165508
Approved by: https://github.com/EikanWang
trunk/66b75693aeda0f0219106839ed02e9c7577f0bec
2025-10-16 04:12:52 +00:00
21697feff2 [hop] run local_map with interpreter to preserve fx_traceback annotations (#165336)
We have an issue when using fx_traceback.annotate with HOPs that trace joint graphs. HOPs have bodies that have already been traced by Dynamo and, after Animesh's PR, those bodies do carry the annotations. But when we lower a Dynamo HOP body to aten in either pre-dispatch or post-dispatch, we need to propagate the annotations to the aten nodes.

AOTAutograd does this indirectly by piggybacking off the `PropagateUnbackedSymInts` fx.Interpreter. I'm not sure if all HOPs should be using it to trace their joints or not. This PR adds an interpreter to local_map's implementation.
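A hedged sketch of the interpreter idea — re-running a traced body node by node so each node's metadata stays available during lowering. The class below is a generic illustration, not local_map's actual code:

```python
import torch.fx as fx

class MetaPreservingInterpreter(fx.Interpreter):
    def run_node(self, n: fx.Node):
        # n.meta holds annotations (e.g. from fx_traceback.annotate);
        # an enclosing tracer can read them while this node executes,
        # stamping them onto the aten nodes it records
        self.current_node_meta = n.meta
        return super().run_node(n)
```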
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165336
Approved by: https://github.com/yushangdi
trunk/21697feff257ad04dd916ef63b8b841c38f7e9ee
2025-10-16 02:53:17 +00:00
12fa4192c5 [ContextParallel] add process-time based Round-Robin load-balance to CP (#163617)
**Summary**
The load-balancing problem can be modeled as the [identical-machines scheduling](https://en.wikipedia.org/wiki/Identical-machines_scheduling) problem. We already provided an easy-to-extend interface in #161062 for implementing load-balancing, and in this PR we start by adding a Round-Robin solution as an example, along with verification. This can easily be adapted to other solutions such as shortest-processing-time-first / longest-processing-time-first, with extra padding added for collectives.

- Added a new type of `_LoadBalancer` implementation, `_PTRRLoadBalancer`, which is designed for `flex_attention()`. This load-balance strategy analyzes the `BlockMask` sparsity info and performs Round-Robin assignment (unlike traditional Round-Robin's circular order, we do it in zig-zag order); see the sketch after this list.
- Made `_context_parallel_buffers` and `context_parallel_unshard` handle batched load-balance indices (previously they could only handle non-batched ones), as in `create_cp_block_mask`.
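A minimal sketch of zig-zag Round-Robin assignment (one common realization; not necessarily `_PTRRLoadBalancer`'s exact indexing):

```python
def zigzag_round_robin(num_blocks: int, world_size: int) -> list[int]:
    # sweep ranks forward then backward (zig-zag) so that expensive
    # early blocks and cheap late blocks spread evenly across ranks
    order = list(range(world_size)) + list(range(world_size - 1, -1, -1))
    return [order[i % (2 * world_size)] for i in range(num_blocks)]

# e.g. world_size=4, num_blocks=8 -> [0, 1, 2, 3, 3, 2, 1, 0]
```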

**Test**
`pytest test/distributed/tensor/test_attention.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163617
Approved by: https://github.com/fegin
trunk/12fa4192c5e6440d400aa45ccb4f33f0f5f36ace
2025-10-16 02:20:27 +00:00
23fb7e9f4b [CI] Add arch prefix in front of op benchmark results (#165584)
To be able to run x86 and aarch64 benchmarks later on
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165584
Approved by: https://github.com/huydhn
ghstack dependencies: #165583
trunk/23fb7e9f4b564e9f00c26231c9d9c3138eaff8ba
2025-10-16 01:50:52 +00:00
5e480b8ecf Add mingw to docker (#165560)
Add mingw to `pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11` docker image to support AOTI cross-compilation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165560
Approved by: https://github.com/malfet
ghstack dependencies: #165574
trunk/5e480b8ecf870e4a466c165701ab0e9d055f2ceb
2025-10-16 01:31:50 +00:00
19ba506ca3 Support libtorch and posix mingw flavor (#165574)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165574
Approved by: https://github.com/desertfire
2025-10-16 01:31:50 +00:00
003dd13073 [dynamo, guards] Better error messages when generated guard fails on the same frame (#165242)
Not sure what exactly we want to have in the message, but that's easy to adjust. I tried to find a reliable test to reproduce this message (it happens only when a guard fails right after it's created), but I ended up mocking a `guard_manager.check` function to return `False` to trigger this behavior. I think that's fine: any other case we pick (like `datetime.now()`) is something we'll want to patch one day anyway, and every such patch would force us to chase another repro test.

@williamwen42

Fixes #164990

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165242
Approved by: https://github.com/williamwen42
trunk/003dd130730993eedc302f769b7b653016ab6450 viable/strict/1760592311
2025-10-16 01:05:31 +00:00
c2bd41ac9f Build vLLM nightly wheels for CUDA 13.0 (#163239)
Now that https://github.com/vllm-project/vllm/pull/24599 has been merged
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163239
Approved by: https://github.com/malfet, https://github.com/atalman
trunk/c2bd41ac9f64cd873afa8a061f14192adaadbf7e
2025-10-16 01:03:26 +00:00
ca8bd5dbed Move toString(ScalarType) and ScalarType ostream operator to headeronly (#164405)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164405
Approved by: https://github.com/Skylion007, https://github.com/janeyx99
ghstack dependencies: #164350, #164354
trunk/ca8bd5dbedb5b46f78026e0378b0f47500ddba38
2025-10-16 00:55:43 +00:00
26f3803433 Remove workaround to old CUDA bug (#164354)
As in the title.

A check for https://github.com/pytorch/pytorch/issues/164348 to see if the workaround can be removed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164354
Approved by: https://github.com/janeyx99, https://github.com/ngimel, https://github.com/malfet, https://github.com/jeffdaily
ghstack dependencies: #164350
2025-10-16 00:55:43 +00:00