1867 Commits

Author SHA1 Message Date
b3b4d28f4c [submodule][cutlass] Update pin to b995f93 v4.0.0 (#157376)
@Skylion007 seems to be AFK; see the earlier attempt: https://github.com/pytorch/pytorch/pull/153541

https://github.com/NVIDIA/cutlass/releases/tag/v4.0.0
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157376
Approved by: https://github.com/drisspg, https://github.com/Skylion007
2025-07-07 16:55:47 +00:00
b5ce77c1f5 [ROCm] Initial AITER Integration for mha_bwd asm kernels (#152630)
Generates the AITER plumbing via CMake and calls into the FAv3 asm backward CK kernels.

Updates the composable_kernel submodule for this change.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152630
Approved by: https://github.com/xw285cornell, https://github.com/yoyoyocmu
2025-07-01 02:53:27 +00:00
ee56e9f8a8 [BE] Make Eigen an optional dependency (#155955)
Eigen's version is now controlled by `eigen_pin.txt`, and it will be installed only if no BLAS provider can be found.
Why this is good for CI: we don't really ever build with Eigen, and GitLab can be down while GitHub is up, which has caused spurious CI failures in the past.

Remove eigen submodule and replace it with eigen_pin.txt

Fixes https://github.com/pytorch/pytorch/issues/108773
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155955
Approved by: https://github.com/atalman
2025-06-21 03:02:02 +00:00
208ec60e72 Revert "[BE] Make Eigen an optional dependency (#155955)"
This reverts commit 1b50c12584909bda00009f4f0fd0d38ec792d019.

Reverted https://github.com/pytorch/pytorch/pull/155955 on behalf of https://github.com/atalman due to need to revert eigen test ([comment](https://github.com/pytorch/pytorch/pull/155955#issuecomment-2992512124))
2025-06-20 18:43:52 +00:00
1b50c12584 [BE] Make Eigen an optional dependency (#155955)
Eigen's version is now controlled by `eigen_pin.txt`, and it will be installed only if no BLAS provider can be found.
Why this is good for CI: we don't really ever build with Eigen, and GitLab can be down while GitHub is up, which has caused spurious CI failures in the past.

Remove eigen submodule and replace it with eigen_pin.txt

Fixes https://github.com/pytorch/pytorch/issues/108773
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155955
Approved by: https://github.com/atalman
ghstack dependencies: #155947, #155954
2025-06-20 17:21:27 +00:00
72c8751b61 Align meta deducing for fft_r2c with fft_r2c_mkl on XPU (#156048)
There is a memory-layout mismatch between the XPU `fft_r2c` implementation and Inductor's meta deduction.
The original Inductor meta deduction for `fft_r2c` on the XPU backend was aligned with the CPU (fallback) path. This PR corrects the Inductor meta deduction and updates the torch-xpu-ops commit to [intel/torch-xpu-ops@`3a9419c`](3a9419c8bb).
The XPU implementation first performs the R2C transform on the last dimension, followed by iterative C2C transforms on the remaining dimensions.
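
As a rough Python-level illustration of that decomposition (this mirrors the math, not the XPU kernel code; the shapes are made up for the example):

```py
import torch

x = torch.randn(4, 6, 8)

# R2C transform on the last dimension: last dim shrinks to n//2 + 1
step1 = torch.fft.rfft(x, dim=-1)           # complex, shape (4, 6, 5)

# iterative C2C transforms on the remaining dimensions
result = torch.fft.fftn(step1, dim=(0, 1))

# matches the fused multi-dimensional R2C transform
torch.testing.assert_close(result, torch.fft.rfftn(x))
```

The Inductor meta function has to deduce exactly this output shape and layout, which is where the XPU backend previously diverged.
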
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156048
Approved by: https://github.com/guangyey, https://github.com/etaf, https://github.com/jansel
2025-06-20 01:41:03 +00:00
d99cac2816 [Kineto][submodule] Update kineto pin for XPU toggle feature (#155488)
Part of #154898
Update kineto submodule

Summary: We add the toggleCollectionDynamic functionality to XPUPTI in Kineto, so the profiler can be enabled and disabled dynamically.
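
As a sketch of the user-visible effect (this uses the standard `torch.profiler` API, not the new XPUPTI hooks themselves, so treat the mapping as an assumption):

```py
import torch
from torch.profiler import profile, ProfilerActivity

prof = profile(activities=[ProfilerActivity.CPU])
prof.start()                                   # collection toggled on
a = torch.randn(512, 512)
b = a @ a                                      # profiled region
prof.stop()                                    # collection toggled off dynamically
print(prof.key_averages().table(row_limit=5))
```
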
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155488
Approved by: https://github.com/guangyey, https://github.com/sraikund16
2025-06-18 12:39:58 +00:00
4a4cac0cef Update torch-xpu-ops commit pin (#154962)
Update the torch-xpu-ops commit to [intel/torch-xpu-ops@`a3a196`](a3a196ccdb), which includes:

- Enhanced Adaptive Average Pooling 2D Backward Kernel for performance and code simplification
- Group Norm Backward Optimization with vectorization and parallel reduction
- Support CL path for MaxUnpooling2d and MaxUnpooling3d
- Rename USE_ONEMKL to USE_ONEMKL_XPU and set it to ON by default
- Refactor the USE_XCCL & USE_C10D_XCCL options
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154962
Approved by: https://github.com/EikanWang
2025-06-09 15:54:13 +00:00
956716880f [c10d][gloo] Enable using c10::Half for gloo (#153862)
Testing with https://github.com/pytorch/gloo/pull/446 shows that the numerical issues reported in https://github.com/pytorch/pytorch/issues/152300 are indeed resolved, and we added a unit test for them. Also updates the gloo submodule to reflect the change on the gloo side.
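
A minimal single-process sketch of the path this exercises (the rendezvous values are placeholders):

```py
import os
import torch
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

# fp16 all_reduce over gloo: the c10::Half path this change enables
t = torch.full((1024,), 1.5, dtype=torch.float16)
dist.all_reduce(t)
torch.testing.assert_close(t, torch.full_like(t, 1.5))

dist.destroy_process_group()
```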

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153862
Approved by: https://github.com/d4l3k, https://github.com/clee2000, https://github.com/malfet
2025-06-04 17:53:08 +00:00
cyy
f6275bf0fe Bump pocketfft submodule to the latest (#154845)
Fixes #154843

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154845
Approved by: https://github.com/Skylion007, https://github.com/malfet
2025-06-02 14:54:13 +00:00
80af98c6c3 [BE]: Update nlohmann submodule to 3.12.0 (#154817)
This is mostly compiler fixes, C++20 fixes, and clang-tidy fixes. It should be entirely backwards compatible with our current version.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154817
Approved by: https://github.com/jansel, https://github.com/malfet
2025-06-02 01:29:58 +00:00
206e9d5160 [BE]: Update cpp-httplib submodule to 0.20.1 (#154825)
Updates cpp-httplib to 0.20.1. This mostly brings OSS a bunch of CMake fixes, C++ compiler fixes, and bugfixes from upstream. It's a header-only library, so the upgrade should be straightforward.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154825
Approved by: https://github.com/malfet
2025-06-01 21:44:23 +00:00
cyy
5616fa4a68 [Submodule] Bump flatbuffers to v24.12.23 (#143964)
This submodule has not been updated for a long time.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143964
Approved by: https://github.com/Skylion007
2025-06-01 02:25:57 +00:00
c33fc9dae3 [BE][Ez]: Update VulkanMemoryAllocator to 3.3.0 (#154796)
The last update to this submodule was three years ago; the API is pretty stable, and this is a minor-version release update. Part of a series of PRs to eradicate low required CMake versions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154796
Approved by: https://github.com/jansel
2025-06-01 00:30:56 +00:00
b85c460749 [BE][Ez]: Update NVTX submodule to 3.2.1 (#154797)
Update NVTX3 submodule to 3.2.1.
* Mostly improved compiler support, Python support, and better CMake and C++ support.
* Also adds a few new APIs to support fancy new features.
* This is a header-only library, so it should be an easy, non-invasive change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154797
Approved by: https://github.com/jansel
2025-05-31 23:01:13 +00:00
ed1ff7d0fb [BE][Ez]: Update mimalloc submodule to 2.2.3 (#154720)
Updates the minor version of mimalloc. The old version is more than two years old, and the newer release has performance and compiler fixes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154720
Approved by: https://github.com/jansel
2025-05-30 20:17:13 +00:00
66f53889d5 [nativert] port semaphore to c10 util (#153504)
Summary:
nativert RFC: https://github.com/zhxchen17/rfcs/blob/master/RFC-0043-torch-native-runtime.md

To land the runtime into PyTorch core, we will gradually land logical parts of the code into the Github issue and get each piece properly reviewed.

This diff adds a simple semaphore interface to c10 until C++20, where we get `std::counting_semaphore`.

We're going to need an OSS build export to take a look at this...
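
For intuition, here is a Python analogue of the counting-semaphore semantics the c10 interface stands in for (this is `threading.Semaphore`, not the c10 API):

```py
import threading

sem = threading.Semaphore(2)   # counting semaphore: at most 2 concurrent holders

def worker(i):
    with sem:                  # acquire() on entry, release() on exit
        print(f"worker {i} holds a permit")

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```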

Test Plan: CI

Differential Revision: D73882656

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153504
Approved by: https://github.com/zhxchen17
2025-05-28 19:17:30 +00:00
b8452e55bc [Kineto x Insight] Update Kineto submodule (#154426)
Summary: We add a new ActivityType::MTIA_INSIGHT in 20f652846f

Test Plan: CI

Differential Revision: D75454945

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154426
Approved by: https://github.com/Skylion007
2025-05-27 18:29:29 +00:00
a664cfdf95 Add C10_NODEPRECATED check for xpu (#153935)
# Motivation
Add the `C10_NODEPRECATED` check for XPU. This disallows the XPU codebase from using `c10::optional`.

What does the torch-xpu-ops commit update change?
It deprecates `c10::optional`, `c10::nullopt`, and `c10::make_optional` in favor of their std counterparts.

# Additional Context
This PR depends on
https://github.com/intel/torch-xpu-ops/pull/1683
https://github.com/intel/torch-xpu-ops/pull/1690

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153935
Approved by: https://github.com/Skylion007, https://github.com/cyyever
2025-05-22 06:44:04 +00:00
11f8511455 Update torch-xpu-ops commit pin (#153902)
Update the torch-xpu-ops commit to defce46ae7, which includes:

- Resolve the `aten::gamma` accuracy gap compared to scipy
- Optimize `layernorm_vectorized_impl` by using adaptive work-group selection for small shapes
- [Introduce an async flag and use the current stream to avoid stream syncs](https://github.com/intel/torch-xpu-ops/pull/1546)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153902
Approved by: https://github.com/Skylion007, https://github.com/EikanWang
2025-05-21 13:29:41 +00:00
05bc78e64f [submodule] Update fbgemm pinned version (#153950)
Summary:
Update the fbgemm pinned version in PyTorch.
Related update in fbgemm: D74434751

Included changes:
- Update the fbgemm external dependencies directory in setup.py
- Add a DISABLE_FBGEMM_AUTOVEC flag to disable fbgemm's autovec

Test Plan: PyTorch OSS CI

Differential Revision: D75073516

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153950
Approved by: https://github.com/Skylion007, https://github.com/ngimel
2025-05-20 20:24:27 +00:00
ef958fa152 [cuDNN][cuDNN frontend] upgrade cuDNN frontend submodule to 1.12 (#153888)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153888
Approved by: https://github.com/Skylion007
2025-05-20 15:08:37 +00:00
d869ea11e0 [BE]: Update fmtlib submodule to 11.2.0 (#153853)
Update fmtlib to 11.2.0 with a lot of miscellaneous fixes for various compilers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153853
Approved by: https://github.com/malfet
2025-05-20 14:11:18 +00:00
cyy
7ae7324ac4 [submodule] Update google benchmark to v1.9.3 (#153676)
And remove `include_directories`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153676
Approved by: https://github.com/Skylion007
2025-05-16 23:31:53 +00:00
cyy
9d3b6ee4c1 [submodule] Update gtest to v1.17.0 (#153618)
And remove some outdated CMake code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153618
Approved by: https://github.com/malfet
2025-05-16 01:24:19 +00:00
d1dd2c1fc8 gloo: cuda (#153406)
This enables Gloo CUDA when used with a backend that supports GPUDirect, which currently is only the IBVERBS backend.

This requires some changes to Gloo which are in https://github.com/pytorch/gloo/pull/441

Since we're now depending on gloo_cuda, we need to split ProcessGroupGloo into two pieces: one with the CPU bits (libtorch_cpu) and one with the CUDA kernels (libtorch_cuda). This unfortunately requires some major refactoring, as some CPU code is shared across both.

The gloo submodule is updated to depend on the new Gloo changes.

Test plan:

```py
import os
import time

transport = "TCP"
#transport = "IBVERBS"

os.environ["GLOO_DEVICE_TRANSPORT"] = transport
rank = int(os.environ["RANK"])
os.environ["CUDA_VISIBLE_DEVICES"] = str(rank)

ibv = "mlx5_0:1,mlx5_3:1,mlx5_4:1,mlx5_5:1,mlx5_6:1,mlx5_9:1,mlx5_10:1,mlx5_11:1".split(",")[rank]
ibv_name, ibv_port = ibv.split(":")
os.environ["TORCH_GLOO_IBV_NAME"] = ibv_name
os.environ["TORCH_GLOO_IBV_PORT"] = ibv_port
os.environ["TORCH_GLOO_IBV_INDEX"] = "3"

import torch
import torch.distributed as dist

dist.init_process_group("gloo")

rank = dist.get_rank()

# initial sanity check
#device = "cpu"
#t = torch.zeros(10, device=device)
#dist.all_reduce(t)
#print("sanity complete")

device = "cpu"

iters = 10
warmup_iters = 2

for nelem in [10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000]:
    t = torch.zeros(nelem, device=device)

    torch.cuda.current_stream().synchronize()
    for i in range(warmup_iters):
        dist.all_reduce(t)

    torch.cuda.current_stream().synchronize()

    start = time.perf_counter()

    for i in range(iters):
        dist.all_reduce(t)

    torch.cuda.current_stream().synchronize()

    dur = (time.perf_counter() - start)
    qps = iters/dur

    bandwidth_gb = t.nbytes * iters / dur / 1e9

    gb = t.nbytes / 1e9

    if rank == 0:
        print(f"{transport=} {device=} {iters=} {nelem=} {qps=} {gb=} {bandwidth_gb=}\n", end="")
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153406
Approved by: https://github.com/fduwjj
2025-05-16 01:13:13 +00:00
cyy
e5e06d9cab [submodule] Update kleidiai to v1.8.0 (#153592)
And cleans up some CMake instructions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153592
Approved by: https://github.com/malfet
2025-05-15 10:14:05 +00:00
1f48bab377 Update torch-xpu-ops commit pin (#153445)
Update the torch-xpu-ops commit to [207105038963e5f9f012f1a0cfd3b9f57b2ab5b0](2071050389), includes:

- Improve the accuracy of `upsample_bilinear2d_backward`
- Enhance the performance of `avg_pool2d`
- Update the implementation of scatter-gather and indexing

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153445
Approved by: https://github.com/guangyey, https://github.com/EikanWang
2025-05-14 15:34:47 +00:00
9c3cef437c gloo: support ibverbs in cmake (#153425)
This updates the gloo submodule in PyTorch to a version that supports the new ibverbs backend.

Test plan:

```
sudo dnf install rdma-core-devel
USE_GLOO_IBVERBS=ON python setup.py develop
torchrun --nproc_per_node 2 ~/scripts/gloo_ibverbs_test.py
```

```py
"""
run with:

torchrun --nproc_per_node 2 ~/scripts/gloo_ibverbs_test.py
"""

import os

os.environ["GLOO_DEVICE_TRANSPORT"] = "IBVERBS"

import torch
import torch.distributed as dist

dist.init_process_group("gloo")

rank = dist.get_rank()

if rank == 0:
    device = "cpu"
else:
    device = "cuda"

print(device)

t = torch.full((10, 100), fill_value=(rank+1), device=device)
target = torch.full((10, 100), fill_value=3, device=device)

dist.all_reduce(t)

torch.testing.assert_close(t, target)

t = torch.full((10, 100), fill_value=(rank+1), device=device)

if rank == 0:
    dist.send(t, dst=1)
else:
    dist.recv(t, src=0)
    torch.testing.assert_close(t, torch.full_like(t, 1))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153425
Approved by: https://github.com/fduwjj
2025-05-13 17:09:00 +00:00
cyy
15e08f9571 [submodule] Update ONNX to 1.18 (#152200)
Update ONNX to 1.18.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152200
Approved by: https://github.com/justinchuby, https://github.com/malfet
2025-05-13 04:18:45 +00:00
76e34e3850 [Kineto] Upgrade the kineto commit to fb36cce (#152007)
XPU intends to upgrade the oneAPI version (https://github.com/pytorch/pytorch/issues/151097) to support torch Distributed. However, the PTI within the oneAPI to be upgraded introduces breaking changes: it changed the signatures of the following APIs.
- ptiViewEnableRuntimeApi
- ptiViewGetApiIdName

To avoid breakage from PTI's upcoming non-backward-compatible changes, we refined the XPU PTI integration in kineto: we check the PTI version and then invoke the PTI API accordingly. This means the kineto in this PR can handle the non-backward-compatible change in the upcoming oneAPI 2025.1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152007
Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/sraikund16, https://github.com/malfet
2025-05-09 18:38:41 +00:00
07a29dbe81 [BE]: Update cutlass submodule to 3.9.2 (#152779)
A lot of last-minute bugfixes for CUTLASS Blackwell that we should upstream. It's a header-only library and a minor release, so this should strictly improve compiler support and fix some bugs. We needed to update some instruction numbers in torch.compile baselines for the new kernels.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152779
Approved by: https://github.com/henrylhtsang
2025-05-06 16:08:24 +00:00
cyy
ac792a0dca [submodule] Bump ITTAPI to 3.25.5 (#150263)
It hasn't been updated for three years. This also removes the CMake 4 workaround.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150263
Approved by: https://github.com/sraikund16
2025-05-06 01:02:18 +00:00
361bf056a7 [nativert] Add moodycamel/concurrentqueue as third-party dependency (#152033)
nativert RFC:  https://github.com/zhxchen17/rfcs/blob/master/RFC-0043-torch-native-runtime.md

moodycamel/concurrentqueue is a high-performance MPMC queue implementation and is single-header only. We want to add this to third_party to be used with the upcoming Torch Native Runtime.

The source code is imported from commit hash 2f09da73d22a47dc8a89cdd4fc4c3bfae07f4284 of https://github.com/cameron314/concurrentqueue
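
For intuition about MPMC (multi-producer, multi-consumer) semantics, here is a Python analogue using the standard thread-safe `queue.Queue`; the vendored C++ library provides the same model, lock-free and much faster:

```py
import queue
import threading

q = queue.Queue()

def producer(i):
    for j in range(3):
        q.put((i, j))          # multiple producers enqueue concurrently

def consumer():
    while True:
        item = q.get()         # multiple consumers dequeue concurrently
        if item is None:
            break

producers = [threading.Thread(target=producer, args=(i,)) for i in range(2)]
consumers = [threading.Thread(target=consumer) for _ in range(2)]
for t in producers + consumers:
    t.start()
for t in producers:
    t.join()
for _ in consumers:
    q.put(None)                # poison pills to stop the consumers
for t in consumers:
    t.join()
```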

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152033
Approved by: https://github.com/seemethere, https://github.com/malfet
2025-04-30 21:37:20 +00:00
f05d3e5019 [torch-xpu-ops] Update torch-xpu-ops commit pin. (#152321)
Update the torch-xpu-ops commit to [655fa9bc7f88ab5bd3766b5f2fd5b43989c2caca](655fa9bc7f), including:

- Fixes batch_norm numeric error by adding additional boundary check
- Enable two operators: fft & jagged_to_padded_dense
- XCCL-relevant changes:
  - Cache `cclStream` to improve performance.
  - Add support for complex datatypes in `allgather` and `broadcast`.
  - Support coalescing operations and `batch_isend_irecv`.
  - Introduce additional logging; use `export TORCH_CPP_LOG_LEVEL=INFO`.
- Fix #152296
- Fix #152020

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152321
Approved by: https://github.com/EikanWang, https://github.com/Skylion007
2025-04-29 04:00:09 +00:00
a6d38051ee [CUDA][CUTLASS] CUTLASS 3.9 submodule upgrade (#151253)
Originally authored by Jack Kosaian; likely needs #ifdefs if we want to preserve compatibility with 3.8.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151253
Approved by: https://github.com/Skylion007, https://github.com/henrylhtsang

Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
2025-04-28 23:10:14 +00:00
8172397025 Revert "Update torch-xpu-ops commit pin (#150827)"
This reverts commit 776aa682218bad4df7b6cd46ef2a0f1d8ca1194c.

Reverted https://github.com/pytorch/pytorch/pull/150827 on behalf of https://github.com/etaf due to Inductor UT regression ([comment](https://github.com/pytorch/pytorch/pull/150827#issuecomment-2825857903))
2025-04-24 00:41:06 +00:00
4d2d833976 [CI] Update sleef submodule to v3.8 (#151955)
Should help with RISC-V cross-compilation.
The 3.9.0 migration is blocked by the sleef project switching to C++20.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151955
Approved by: https://github.com/atalman, https://github.com/wdvr, https://github.com/Skylion007
2025-04-23 23:56:05 +00:00
4bf09562e4 [EZ/Profiler] Update Submodule (#151843)
Summary: Update to d82680bbd4

Test Plan: CI

Differential Revision: D73397323

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151843
Approved by: https://github.com/Skylion007, https://github.com/aaronenyeshi
2025-04-22 18:19:43 +00:00
776aa68221 Update torch-xpu-ops commit pin (#150827)
Update the torch-xpu-ops commit to [b51dd3ef4f4d0f6b44c59e61431c5d29354dcaf6](b51dd3ef4f), including:
- Update commit pin to xpu-ops main branch
- Fixes batch_norm numeric error by adding additional boundary check
- Enable two operators: fft & jagged_to_padded_dense
- XCCL relevant changes:
1. Cache `cclStream` to improve performance.
2. Add support for complex datatypes in `allgather` and `broadcast`.
3. Support `coalescing` operations and `batch_isend_irecv`.
4. Introduce additional logging; use `export TORCH_CPP_LOG_LEVEL=INFO`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150827
Approved by: https://github.com/EikanWang

Co-authored-by: Yu, Guangye <106960996+guangyey@users.noreply.github.com>
2025-04-18 10:12:59 +00:00
ad5e9065ac [Profiler/Easy] Remove temp flag for on-demand Memory Snapshot (#151068)
Summary: Now that the profiler implementation is in, we don't need the temporary flag. Submodule update too.

Test Plan: CI

Reviewed By: sanrise

Differential Revision: D72672186

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151068
Approved by: https://github.com/davidberard98
2025-04-11 18:50:25 +00:00
df4e5294a6 Reapply "ProcessGroupGloo: support lazy_init (#150801)" (#151031)
This reverts commit 73f3d6d9aaa128d9917e8b3790933ba2855066cc.

Reapplies #150801

Test plan:

See #150801

submodule

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151031
Approved by: https://github.com/fduwjj
2025-04-11 01:58:35 +00:00
73f3d6d9aa Revert "ProcessGroupGloo: support lazy_init (#150801)"
This reverts commit f237ee54bfb35d16cd10e358d4b78578c88a5781.

Reverted https://github.com/pytorch/pytorch/pull/150801 on behalf of https://github.com/atalman due to failing internally ([comment](https://github.com/pytorch/pytorch/pull/150801#issuecomment-2793161239))
2025-04-10 13:44:31 +00:00
f237ee54bf ProcessGroupGloo: support lazy_init (#150801)
This adds lazy initialization support to ProcessGroupGloo via `TORCH_GLOO_LAZY_INIT` or via `create_device(..., lazy_init=True)`

This is still a draft PR as there's one race condition when doing coalesced operations that needs to be fixed upstream in Gloo first. Depends on https://github.com/facebookincubator/gloo/pull/427 landing first

This also updates the gloo submodule to include the required changes.

Test plan:

added lazy init test variants

```
pytest -v test/distributed/test_c10d_gloo.py -k Lazy
```
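
A single-process sketch of the env-var route (`TORCH_GLOO_LAZY_INIT` is the variable named above; the single-rank setup is just for illustration):

```py
import os

os.environ["TORCH_GLOO_LAZY_INIT"] = "1"
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")

import torch
import torch.distributed as dist

dist.init_process_group("gloo", rank=0, world_size=1)  # no eager connection setup

t = torch.ones(4)
dist.all_reduce(t)           # first collective triggers the deferred init
dist.destroy_process_group()
```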

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150801
Approved by: https://github.com/fduwjj
2025-04-09 19:29:50 +00:00
99c9a31386 [submodule] [Snapshot/Profiler] Memory Snapshot On Demand (#150559)
Summary:
Profiler side of memory snapshot.

1. Add an API to actually take the snapshot when the client interface is called
2. Add ifdefs to the builds so that kineto hooks the snapshot correctly.

Design Philosophy: There is one interesting part of this implementation, and it is export. For export, we call the Python implementation rather than C++, even though we are already in C++, because it is better to have a single export path rather than two. Personally, I want parity between auto-trace and on-demand, so if we can limit the side paths, we will have an easier time maintaining this relationship.
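
For reference, the auto-trace counterpart that on-demand is meant to stay in parity with uses the existing (private) CUDA memory APIs; a minimal sketch:

```py
import torch

if torch.cuda.is_available():
    torch.cuda.memory._record_memory_history(max_entries=100000)

    x = torch.randn(1024, 1024, device="cuda")
    y = x @ x                                   # allocations recorded here

    # export goes through the Python impl — the single path discussed above
    torch.cuda.memory._dump_snapshot("snapshot.pickle")
    torch.cuda.memory._record_memory_history(enabled=None)  # stop recording
```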

Test Plan: {F1976563426}

Reviewed By: sanrise

Differential Revision: D70733247

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150559
Approved by: https://github.com/sanrise
2025-04-07 13:04:38 +00:00
2e23768d25 Expose symbols on macos in the xplat pytorch stack (#150487)
Summary:
X-link: https://github.com/pytorch/executorch/pull/9819

Had to revert D71321310 because it affected way too many targets and build sizes.

These changes should expose just enough symbols to be buildable in arvr mode on macOS. Could potentially narrow it down even more by avoiding e.g. `get_pt_compiler_flags`.

Differential Revision: D72255474

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150487
Approved by: https://github.com/drisspg
2025-04-04 23:03:16 +00:00
0198e44f37 Update torch-xpu-ops commit pin to 98c808d (#150554)
Update the torch-xpu-ops commit to [98c808dea6de7330c415aa777d6921944cf79887](98c808dea6), which includes:

- Fixes #150001 by removing pre-CXX11 ABI logic from build script for XPU
- Fixes #150430
- Fixes XCCL build issue caused by PR #150398

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150554
Approved by: https://github.com/EikanWang, https://github.com/malfet
2025-04-02 22:42:18 +00:00
91666eef60 Update gloo submodule (#150320)
This updates its minimum CMake version (via https://github.com/facebookincubator/gloo/pull/424) and removes the cmake-4.0.0 workarounds for gloo.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150320
Approved by: https://github.com/atalman
2025-03-31 22:40:27 +00:00
f74d5d576a Update torch-xpu-ops commit pin to 3ee2bd2 (#150300)
Update the torch-xpu-ops commit to [3ee2bd2f13e1ed17a685986ff667a58bed5f2aa5](3ee2bd2f13)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150300
Approved by: https://github.com/EikanWang
2025-03-31 13:36:11 +00:00
e91f84c87d [BE]: Update cudnn frontend submodule to 1.11.0 (#149759)
Updates the cuDNN frontend submodule to 1.11.0. Adds some new features like score_mod from flex_attention, plus a lot of bugfixes and new feature knobs.
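
For context, `score_mod` is the per-element attention-score hook from `torch.nn.attention.flex_attention`; a minimal eager-mode sketch (toy shapes, made-up relative-position bias):

```py
import torch
from torch.nn.attention.flex_attention import flex_attention

def rel_bias(score, b, h, q_idx, kv_idx):
    # example score_mod: add a small relative-position bias
    return score + (q_idx - kv_idx) * 0.01

q = torch.randn(1, 2, 16, 8)   # (batch, heads, seq, head_dim)
k = torch.randn(1, 2, 16, 8)
v = torch.randn(1, 2, 16, 8)

out = flex_attention(q, k, v, score_mod=rel_bias)
print(out.shape)               # torch.Size([1, 2, 16, 8])
```
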
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149759
Approved by: https://github.com/jansel
2025-03-30 17:14:26 +00:00