58 Commits

Author SHA1 Message Date
9e89b1c4c7 Update torch-xpu-ops commit pin (#165321)
Update the torch-xpu-ops commit to [intel/torch-xpu-ops@ce9db1](ce9db15136), includes:

- Fix test_barrier hang by using static global rank in ProcessGroupXCCL
- Update install_xpu_headers only when the content changes, to speed up recompilation
- Add global rank information to communication logging
- Remove duplicate normalization from FFT methods
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165321
Approved by: https://github.com/EikanWang
2025-10-14 09:07:24 +00:00
53f5af8c92 Update torch-xpu-ops commit pin (#164237)
Update the torch-xpu-ops commit to [intel/torch-xpu-ops@f30173](f301733b03), includes:

- Install xpu internal headers to PyTorch
- Fix error handling for BatchLinearAlgebra Ops
- Fix unnecessary double data type conversion
- Fix overflow when calculating workgroups count
- Fix segmentation fault and calculation error in AveragePool2dKernel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164237
Approved by: https://github.com/EikanWang
2025-10-09 10:38:59 +00:00
4783e3ff49 Update torch-xpu-ops commit pin (#163758)
Update the torch-xpu-ops commit to [intel/torch-xpu-ops@229e8b](229e8ba104), includes:

- Revert tracking of Work status for FlightRecorder in ProcessGroupXCCL to fix memory leak
- Enable SYCL warnings on Linux
- Fix accuracy issues with CTC loss
- Enable aten::nonzero_static on XPU backend
- Stop recursive calculations in polynomial kernels if tensor has NaNs
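
The last item can be pictured with a small scalar sketch. This is not the torch-xpu-ops kernel: the choice of the Chebyshev polynomial of the first kind and the name `chebyshev_t` are assumptions made only for illustration. The point is that a NaN input can never produce a non-NaN result from the recurrence, so the loop can be skipped.

```python
import math


def chebyshev_t(x: float, n: int) -> float:
    """Evaluate T_n(x) with the three-term recurrence T_k = 2*x*T_{k-1} - T_{k-2}."""
    # Early exit (illustrative): NaN propagates through every step of the
    # recurrence anyway, so there is no reason to iterate n times.
    if math.isnan(x):
        return x
    if n == 0:
        return 1.0
    if n == 1:
        return x
    prev, curr = 1.0, x
    for _ in range(2, n + 1):
        prev, curr = curr, 2.0 * x * curr - prev
    return curr


# A very large degree returns immediately for NaN input instead of looping.
nan_result = chebyshev_t(float("nan"), 10**9)
```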
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163758
Approved by: https://github.com/EikanWang
2025-09-26 09:05:08 +00:00
e134bb340a Update torch-xpu-ops commit pin (#163244)
Update the torch-xpu-ops commit to 24fab67b6e, includes:

- Clean up getDeviceIndexOfCurrentQueue
- Fix hardswish gradients corner case
- Fix xccl contiguous check
- Move checks from nonzero kernel to operator
- Support high-priority streams for XCCL

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163244
Approved by: https://github.com/EikanWang
2025-09-19 02:04:40 +00:00
9786243b64 Update torch-xpu-ops commit pin (#162804)
Update the torch-xpu-ops commit to [intel/torch-xpu-ops@d8c3ee](d8c3eefc29), includes:

- Optimize adaptive average pool for channel-last memory format
- Add unregister wait_tensor
- Replace deprecated `[[intel::reqd_sub_group_size(SgSize)]]` with `[[sycl::reqd_sub_group_size(SIMD)]]` and remove unnecessary attributes
- Revert "Roll back to original usage of sycl::get_kernel_bundle"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162804
Approved by: https://github.com/EikanWang
2025-09-16 06:30:48 +00:00
ba7f546ccc Update torch-xpu-ops commit pin (#162062)
Update the torch-xpu-ops commit to [intel/torch-xpu-ops@83c5a5](83c5a5a551), includes:

- Revert "Disable xccl timer avoid drlm hang" because XPU time event issue has been fixed
- Fall back to CPU in the lu_factor kernel for a single batch
- Enable aten::linalg_inv and aten::linalg_inv_ex on XPU
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162062
Approved by: https://github.com/EikanWang
2025-09-04 17:05:33 +00:00
a99d8d39bc Update torch-xpu-ops commit pin (#161919)
# Motivation
1. Fall back to CPU for some linalg functionality, such as `linalg_eig`, `linalg_householder_product`, and `linalg_solve_triangular`;
2. Fix codegen dependency bug.

# Additional Context
This PR aims to fix https://github.com/pytorch/pytorch/issues/161498

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161919
Approved by: https://github.com/EikanWang
2025-09-02 17:09:07 +00:00
f44ad54bc6 Update torch-xpu-ops commit pin (#161152)
Update the torch-xpu-ops commit to [8b58040ee32689487f660462f655085f31506dab](8b58040ee3), includes:

- Add vectorization path on maxpool forward channel last
- Add FlightRecorder support for ProcessGroupXCCL
- Fix random build failure on codegen
- Suppress dllexport warning on Windows
- Make torch-xpu-ops build depend on ATen XPU
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161152
Approved by: https://github.com/EikanWang

Co-authored-by: Yu, Guangye <106960996+guangyey@users.noreply.github.com>
2025-08-30 07:19:24 +00:00
9b4adc4db7 [fr] [xpu] Add FlightRecorder support for ProcessGroupXCCL (#158568)
Adds support for FlightRecorder in ProcessGroupXCCL.

See https://github.com/intel/torch-xpu-ops/pull/1867 for XCCL implementation and more details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158568
Approved by: https://github.com/guangyey, https://github.com/fduwjj
2025-08-22 09:03:35 +00:00
663da17b62 Update torch-xpu-ops commit pin (#160062)
Update the torch-xpu-ops commit to [77cc792cd265179745d335579d233e6d4f9a2667](77cc792cd2), includes:

- Ensures that the XPU cache is cleared before creating tensors during the test
- Add unused variable warning
- Fix test_linalg and test_torch issue with bf32_on_and_off updates
- Fix deterministic indexing with broadcast
- Fix dist.gather with noncontiguous tensor
- Improve accuracy of index put deterministic kernel
- Add a dependency on the generated files to avoid building before code generation
- Optimize embedding bag

Fixes #160661

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160062
Approved by: https://github.com/EikanWang
2025-08-15 15:27:24 +00:00
57ab39f7e4 Update torch-xpu-ops commit pin (#159621)
Update the torch-xpu-ops commit to [intel/torch-xpu-ops@1f7a57](1f7a57f507), includes:

- Add Template Parameter to the function `gpu_kernel` for Controlling Broadcasting Vectorization
- Add optional NaN checks to XCCL
- Fix NllLossForwardReduce2DKernelFunctor accuracy
- Extend the existing communication logging to include the reduction operation for collective calls
- [Reland] Install xpu codegen header to torch/include
Pull Request resolved: https://github.com/pytorch/pytorch/pull/159621
Approved by: https://github.com/EikanWang
2025-08-05 01:46:15 +00:00
72c8751b61 Align meta deducing for fft_r2c with fft_r2c_mkl on XPU (#156048)
There is a memory layout mismatch between the XPU implementation of `fft_r2c` and Inductor's meta deduction for it.
Originally, the Inductor meta deduction for `fft_r2c` on the XPU backend was aligned with the CPU (fallback) implementation. This PR corrects the Inductor meta deduction and updates the torch-xpu-ops commit to [intel/torch-xpu-ops@`3a9419c`](3a9419c8bb).
The XPU implementation first performs the R2C transform on the last dimension, followed by iterative C2C transforms on the remaining dimensions.
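
The order of operations described above can be checked with a minimal pure-Python sketch. It uses a naive DFT rather than any PyTorch or oneMKL API, all function names are hypothetical, and it only illustrates the arithmetic (R2C on the last dimension, then C2C on the remaining one); it says nothing about strides or the memory layout that the meta deduction has to match.

```python
import cmath


def dft(seq):
    """Naive O(n^2) forward DFT of a 1-D sequence."""
    n = len(seq)
    return [
        sum(seq[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def rfft_last_dim(rows):
    """R2C step: transform each row and keep the n // 2 + 1 non-redundant bins."""
    return [dft(row)[: len(row) // 2 + 1] for row in rows]


def fft_first_dim(rows):
    """C2C step: full complex transform along the first dimension (columns)."""
    cols = [dft(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def rfft2_two_step(x):
    """R2C on the last dimension, followed by C2C on the remaining dimension."""
    return fft_first_dim(rfft_last_dim(x))


def dft2_reference(x):
    """Direct 2-D DFT from the definition, used only for comparison."""
    h, w = len(x), len(x[0])
    return [
        [
            sum(
                x[r][c] * cmath.exp(-2j * cmath.pi * (u * r / h + v * c / w))
                for r in range(h)
                for c in range(w)
            )
            for v in range(w)
        ]
        for u in range(h)
    ]


x = [[1.0, 2.0, 0.5, -1.0], [0.0, 3.0, 1.5, 2.0], [4.0, -2.0, 1.0, 0.0]]
two_step = rfft2_two_step(x)
reference = dft2_reference(x)
```

For this 3x4 real input the two-step result has shape 3x3 and agrees with the first three columns of the direct 2-D transform up to rounding.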
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156048
Approved by: https://github.com/guangyey, https://github.com/etaf, https://github.com/jansel
2025-06-20 01:41:03 +00:00
4a4cac0cef Update torch-xpu-ops commit pin (#154962)
Update the torch-xpu-ops commit to [intel/torch-xpu-ops@`a3a196`](a3a196ccdb), includes:

- Enhanced Adaptive Average Pooling 2D Backward Kernel for performance and code simplification
- Group Norm Backward Optimization with vectorization and parallel reduction
- Support CL path for MaxUnpooling2d and MaxUnpooling3d
- Rename USE_ONEMKL as USE_ONEMKL_XPU and set it as default ON
- Refactor USE_XCCL & USE_C10D_XCCL option
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154962
Approved by: https://github.com/EikanWang
2025-06-09 15:54:13 +00:00
a664cfdf95 Add C10_NODEPRECATED check for xpu (#153935)
# Motivation
Add a `C10_NODEPRECATED` check for XPU. This prevents the XPU codebase from using `c10::optional`.

What does the torch-xpu-ops commit update change?
It deprecates `c10::optional`, `c10::nullopt`, and `c10::make_optional`; the `std` counterparts are used instead.

# Additional Context
This PR depends on
https://github.com/intel/torch-xpu-ops/pull/1683
https://github.com/intel/torch-xpu-ops/pull/1690

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153935
Approved by: https://github.com/Skylion007, https://github.com/cyyever
2025-05-22 06:44:04 +00:00
11f8511455 Update torch-xpu-ops commit pin (#153902)
Update the torch-xpu-ops commit to defce46ae7, includes:

- Resolve the aten::gamma accuracy gap compared to scipy
- Optimize layernom_vectorized_impl by using adaptive wg selection for small shapes
- [Intro async flag and use current stream avoid stream sync](https://github.com/intel/torch-xpu-ops/pull/1546)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153902
Approved by: https://github.com/Skylion007, https://github.com/EikanWang
2025-05-21 13:29:41 +00:00
1f48bab377 Update torch-xpu-ops commit pin (#153445)
Update the torch-xpu-ops commit to [207105038963e5f9f012f1a0cfd3b9f57b2ab5b0](2071050389), includes:

- Improve the accuracy of `upsample_bilinear2d_backward`
- Enhance the performance of `avg_pool2d`
- Update the implementation of scatter-gather and indexing

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153445
Approved by: https://github.com/guangyey, https://github.com/EikanWang
2025-05-14 15:34:47 +00:00
f05d3e5019 [torch-xpu-ops] Update torch-xpu-ops commit pin. (#152321)
Update the torch-xpu-ops commit to [655fa9bc7f88ab5bd3766b5f2fd5b43989c2caca](655fa9bc7f), including:

- Fixes batch_norm numeric error by adding additional boundary check
- Enable two operators: fft & jagged_to_padded_dense
- XCCL relevant changes:
  - Cache `cclStream` to improve performance.
  - Add support for complex datatypes in `allgather` and `broadcast`.
  - Support `coalescing` operations and `batch_isend_irecv`.
  - Introduce additional logging; use `export TORCH_CPP_LOG_LEVEL=INFO`.
- Fix #152296
- Fix #152020
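
As a sketch of how the logging item might be used from a Python launcher rather than a shell `export`: the helper names below are hypothetical, and the only thing taken from the commit message is the `TORCH_CPP_LOG_LEVEL=INFO` setting. The variable is placed in the child process environment so it is already set when the process starts.

```python
import os
import subprocess


def env_with_cpp_logging(base_env=None, level="INFO"):
    """Return a copy of the environment with TORCH_CPP_LOG_LEVEL set."""
    env = dict(os.environ if base_env is None else base_env)
    env["TORCH_CPP_LOG_LEVEL"] = level
    return env


def run_with_cpp_logging(argv):
    """Launch a command (for example a training script) with the logging enabled."""
    return subprocess.run(argv, env=env_with_cpp_logging(), check=False)


# Example on a small, explicit environment; nothing is launched here.
example_env = env_with_cpp_logging({"PATH": "/usr/bin"})
```

Building a fresh dictionary instead of mutating `os.environ` keeps the setting scoped to the launched process.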

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152321
Approved by: https://github.com/EikanWang, https://github.com/Skylion007
2025-04-29 04:00:09 +00:00
8172397025 Revert "Update torch-xpu-ops commit pin (#150827)"
This reverts commit 776aa682218bad4df7b6cd46ef2a0f1d8ca1194c.

Reverted https://github.com/pytorch/pytorch/pull/150827 on behalf of https://github.com/etaf due to Inductor UT regression ([comment](https://github.com/pytorch/pytorch/pull/150827#issuecomment-2825857903))
2025-04-24 00:41:06 +00:00
776aa68221 Update torch-xpu-ops commit pin (#150827)
Update the torch-xpu-ops commit to [b51dd3ef4f4d0f6b44c59e61431c5d29354dcaf6](b51dd3ef4f), including:
- Update commit pin to xpu-ops main branch
- Fixes batch_norm numeric error by adding additional boundary check
- Enable two operators: fft & jagged_to_padded_dense
- XCCL relevant changes:
1. Cache `cclStream` to improve performance.
2. Add support for complex datatypes in `allgather` and `broadcast`.
3. Support `coalescing` operations and `batch_isend_irecv`.
4. Introduce additional logging; use `export TORCH_CPP_LOG_LEVEL=INFO`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150827
Approved by: https://github.com/EikanWang

Co-authored-by: Yu, Guangye <106960996+guangyey@users.noreply.github.com>
2025-04-18 10:12:59 +00:00
0198e44f37 Update torch-xpu-ops commit pin to 98c808d (#150554)
Update the torch-xpu-ops commit to [98c808dea6de7330c415aa777d6921944cf79887](98c808dea6), includes:

- Fixes #150001 by removing pre-CXX11 ABI logic from build script for XPU
- Fixes #150430
- Fixes XCCL build issue caused by PR #150398

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150554
Approved by: https://github.com/EikanWang, https://github.com/malfet
2025-04-02 22:42:18 +00:00
f74d5d576a Update torch-xpu-ops commit pin to 3ee2bd2 (#150300)
Update the torch-xpu-ops commit to [3ee2bd2f13e1ed17a685986ff667a58bed5f2aa5](3ee2bd2f13)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150300
Approved by: https://github.com/EikanWang
2025-03-31 13:36:11 +00:00
e9c12e819d Update torch-xpu-ops commit pin (#148881)
Update the torch-xpu-ops commit to [026b2c8c7c92a7b2cec5d26334006e3423251cc6](026b2c8c7c), includes:

- Enable AOT for LNL

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148881
Approved by: https://github.com/EikanWang
2025-03-10 20:31:51 +00:00
21bd5fe203 Update torch-xpu-ops commit pin (#147968)
Update the torch-xpu-ops commit to [86aaaf8a9dd6932c088b7afcac0c0856b23d341a](86aaaf8a9d), includes:

- Bugfix (PT2E/BatchNorm)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147968
Approved by: https://github.com/Skylion007
2025-02-27 05:01:12 +00:00
7bd2e3bca1 Update torch-xpu-ops commit pin (#147743)
Update the torch-xpu-ops commit to [306a0ffb6e0cae27c5bd9a3b9cd378048c8e00e7](306a0ffb6e), includes:

- Bugfix (LayerNorm/Nonzeros)
- Update AOT target

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147743
Approved by: https://github.com/EikanWang
2025-02-25 08:06:35 +00:00
6edc419d69 Update torch-xpu-ops commit pin (#147358)
Update the torch-xpu-ops commit to [a14d1eaa834a616705068103dc8129319087e864](a14d1eaa83), includes:

- SparseCSR XPU support
- Refine build system

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147358
Approved by: https://github.com/EikanWang
2025-02-18 16:05:25 +00:00
6a2bb629ec Update torch-xpu-ops commit pin (#147302)
Update the torch-xpu-ops commit to [b421032c8fed40df5eaee395c2e7f5f8a7bcc815](b421032c8f), includes:

- Correct int4 weight pack implementation
- Enhance build system: only build one shared library for the user

Pull Request resolved: https://github.com/pytorch/pytorch/pull/147302
Approved by: https://github.com/EikanWang
2025-02-18 05:04:15 +00:00
de26ddfbdc Update torch-xpu-ops commit pin (#146671)
Update the torch-xpu-ops commit to [80c375570e2b6b2989a8610da1871f8a50dfddc7](80c375570e), includes:

- Aten operator coverage improvement
- SYCL kernel optimization
- Nested Tensor OPs support

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146671
Approved by: https://github.com/EikanWang
2025-02-14 09:30:36 +00:00
6470b0ea6f Update torch-xpu-ops commit pin (#144739)
Update the torch-xpu-ops commit to [22cc419e4e60f469341712a5a103fa309a7dfd48](22cc419e4e), includes:

- Fix building issue https://github.com/intel/torch-xpu-ops/issues/1279
- Aten operator coverage improvement

Note: the new torch-xpu-ops commit doesn't support bundle 0.5.3

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144739
Approved by: https://github.com/EikanWang, https://github.com/malfet
2025-01-16 15:12:37 +00:00
1e881ceecf Update torch-xpu-ops commit pin (#143984)
Update the torch-xpu-ops commit to [28cfac20ec662abdb0ac98faf122450013e8f520](28cfac20ec), includes:

- Disable batch_norm vectorization path to fix accuracy issues.
- Fix the LSTM/RNN implementation error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143984
Approved by: https://github.com/EikanWang, https://github.com/ruidazeng, https://github.com/desertfire, https://github.com/jansel
2025-01-05 09:01:36 +00:00
2ed4d65af0 Update torch-xpu-ops commit pin (#143853)
Update the torch-xpu-ops commit to [214f33](214f33b9d9), includes:

- Fix building issue for transformer related operators
- Improve XPU operator coverage

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143853
Approved by: https://github.com/EikanWang
2024-12-30 02:38:16 +00:00
3cdd997f4c Update torch-xpu-ops commit pin (#142113)
Update the torch-xpu-ops commit to [7ecb0b](7ecb0b1a56), includes:

- Capture rrelu_with_noise noise mutation in compile (Resolve https://github.com/pytorch/pytorch/issues/142102)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142113
Approved by: https://github.com/EikanWang
2024-12-05 17:00:29 +00:00
b31d3b2f41 Update torch-xpu-ops commit pin (#141949)
Update the torch-xpu-ops commit to [f31219](f312190a92), includes:

- Add lazy init for empty_xpu
- Fix nan propagation error for soft_shrink
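
The commit message does not describe the actual bug, so the following is only a scalar sketch of one common way a soft-shrink implementation can lose NaNs: every comparison with NaN is false, so a branch-based version falls through to the zero case. The function names are hypothetical; the default `lambd=0.5` mirrors the PyTorch default.

```python
import math


def softshrink_naive(x: float, lambd: float = 0.5) -> float:
    """Branch-based soft shrink; NaN fails both comparisons and becomes 0.0."""
    if x > lambd:
        return x - lambd
    if x < -lambd:
        return x + lambd
    return 0.0


def softshrink_nan_aware(x: float, lambd: float = 0.5) -> float:
    """Same function, but NaN inputs are passed through unchanged."""
    if math.isnan(x):
        return x
    return softshrink_naive(x, lambd)


nan = float("nan")
lost = softshrink_naive(nan)      # NaN silently turns into 0.0
kept = softshrink_nan_aware(nan)  # NaN is propagated
```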

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141949
Approved by: https://github.com/EikanWang
2024-12-05 05:22:38 +00:00
81ab2cc757 Update torch-xpu-ops commit pin (#141201)
Update the torch-xpu-ops commit to [1e32bbc](1e32bbc3d9), includes:

- Improve XPU aten operator coverage
- Support basic `SparseXPU` operators

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141201
Approved by: https://github.com/EikanWang, https://github.com/jansel
2024-12-02 01:49:07 +00:00
ae7f809bfc Update torch-xpu-ops commit pin (#140782)
Update the torch-xpu-ops commit to [bf4bab1](bf4bab1fff), includes:

- Fix Werror=terminate relevant building issues
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140782
Approved by: https://github.com/EikanWang
2024-11-15 10:10:52 +00:00
f1e045eb75 Update torch-xpu-ops commit pin (#140277)
Update the torch-xpu-ops commit to [01f4e29](01f4e293fa), includes:
- Improve XPU operator coverage
- Fix `Werror=comments` relevant building issues

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140277
Approved by: https://github.com/EikanWang, https://github.com/atalman
2024-11-13 23:38:51 +00:00
c3087ace58 Update torch-xpu-ops commit pin (#139986)
Update the torch-xpu-ops commit to [5e29831](https://github.com/intel/torch-xpu-ops/commit/5e29831). Includes:
- OneAPI-2025 build issue fix
- Enhancement of the XPU operator coverage

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139986
Approved by: https://github.com/guangyey, https://github.com/jansel
2024-11-10 06:49:38 +00:00
d08dbd0436 Update torch-xpu-ops commit pin (#139041)
# Motivation
This PR intends to update torch-xpu-ops commit pin. It mainly includes the following two highlighted changes:
1. split the DLL library into 4 smaller libraries to avoid the 2 GB limit on Windows;
2. some new operators added, for example, `cdist`, `pdist`, `maxunpool2d`, `maxunpool3d`, `upsample_trilinear3d`, Bessel operators, etc.

# Additional Context
We have to supply XPU device check logic in `cdist` and `pdist` ops.
This PR depends on https://github.com/pytorch/pytorch/pull/139050 to fix Windows build issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139041
Approved by: https://github.com/EikanWang, https://github.com/ezyang
2024-10-31 05:06:06 +00:00
0efa590d43 [CI] Fix XPU CI failure (#138548)
# Motivation
Fix https://github.com/pytorch/pytorch/issues/138577.

# Solution
1. All UTs in `test/inductor/test_compiled_optimizers.py` are fixed by https://github.com/pytorch/pytorch/pull/134170
2. The UT in `test/inductor/test_pattern_matcher.py` was introduced by https://github.com/pytorch/pytorch/pull/138089; we skip this UT because the feature `max_autotune_gemm_backends:Triton` is unsupported.
3. We have a new impl related to `histc`, so we remove the expected failure from `test/inductor/test_torchinductor_opinfo.py`
4. We support `avg_pool3d` for `fp16` data type, so we remove the expected failure from `test/inductor/test_torchinductor_opinfo.py`
5. CUDA-biased code was introduced by https://github.com/pytorch/pytorch/issues/138472, so we generalize it to `GPU_TYPE`.

# Additional Context
> Why update torch-xpu-ops commit pin here?

We have to update commit pin to avoid the build failure raised by the code change [C10_UNUSED](https://github.com/pytorch/pytorch/pull/138364).

> What features does the torch-xpu-ops update include?

1. Add some foreach ops, like unary ops and `foreach_clamp_max`;
2. Add forward and backward for some pooling ops, like `avg_pool3d` and `max_pool3d`;
3. Add some other ops, like `log_normal_`, `index_copy`, and `mode`;
4. Fix the build failure related to `C10_UNUSED`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138548
Approved by: https://github.com/malfet, https://github.com/EikanWang
2024-10-24 07:56:26 +00:00
5689e33cfe [Intel GPU] Fix Windows linkage issue due to invisible structured kernel symbols (#137794)
The Intel GPU ATen library (libtorch_xpu) uses `torchgen` to generate structured kernels. Currently, the generated structured kernels are decorated with `TORCH_API` to control their visibility, and `TORCH_API` is in turn controlled by the `CAFFE2_BUILD_MAIN_LIB` macro. However, we cannot simply enable `CAFFE2_BUILD_MAIN_LIB` for the Intel GPU ATen library, because the macro serves more than the `TORCH_API` semantics. As a result, `TORCH_API` resolves to `hidden` symbol visibility for this library.

https://github.com/pytorch/pytorch/blob/main/c10/macros/Export.h#L95-L99

Therefore, we need to use `TORCH_XPU_API` to decorate the generated structured kernels.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137794
Approved by: https://github.com/atalman
ghstack dependencies: #137873
2024-10-15 15:31:37 +00:00
079f909263 Revert "Make Context to be Device-agnostic Step by Step (1/N) (#136519)"
This reverts commit be0b75256a7e516217b059ef273901b95c022fe7.

Reverted https://github.com/pytorch/pytorch/pull/136519 on behalf of https://github.com/jovianjaison due to this pr is causing errors internally ([comment](https://github.com/pytorch/pytorch/pull/136519#issuecomment-2405781093))
2024-10-10 18:32:17 +00:00
be0b75256a Make Context to be Device-agnostic Step by Step (1/N) (#136519)
- make init to be device-agnostic and move it to AcceleratorHooksInterface
- refactoring context related to device initialization

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136519
Approved by: https://github.com/ezyang, https://github.com/EikanWang, https://github.com/guangyey
2024-10-09 02:13:36 +00:00
0d1d69fd25 Update torch-xpu-ops pin (ATen XPU implementation) (#135647)
Release cycle for PyTorch 2.5
1. Fix a runtime error on Windows: torch_xpu_ops_unary_binary_kernels.dll fails to load because the binary is too large.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135647
Approved by: https://github.com/EikanWang
2024-09-12 03:16:08 +00:00
60d98b4cfb Update torch-xpu-ops pin (ATen XPU implementation) (#135300)
Release cycle for PyTorch 2.5
1. Bug fix: correct the reduction logic in the cdist kernel.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135300
Approved by: https://github.com/EikanWang
2024-09-06 07:30:09 +00:00
b99ef1a02e Update torch-xpu-ops pin (ATen XPU implementation) (#135185)
Release cycle for PyTorch 2.5
1. Update specific AOT targets for Windows. On Windows, AOT target list prefers Intel client GPUs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135185
Approved by: https://github.com/EikanWang
2024-09-05 10:05:23 +00:00
2443507acc Update torch-xpu-ops pin (ATen XPU implementation) (#134983)
Release cycle for PyTorch 2.5
1. Enable Windows build in latest torch-xpu-ops. Resolved large bin issue.
2. Refine test infrastructure for compatibility on different HW platforms.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134983
Approved by: https://github.com/EikanWang
2024-09-03 12:14:37 +00:00
b7baa062fc Update torch-xpu-ops pin (ATen XPU implementation) (#133850)
Bug fixes for PyTorch 2.5:
1. Use the SYCL group algorithm API instead of the old style for sub-group shift utilities.
2. Add preprocessing in the reduction kernel for cases requiring a data type cast.
3. Make group norm memory format compatible.
4. ZeroTensor: a. Remove unnecessary ATen operator registrations; otherwise the ZeroTensor process is bypassed. b. Align preprocessing with the in-tree implementation in aten::copy_.
5. Rebase checkIndexTensorTypes usage.
6. Align with the latest semantics of PyTorch foreach operators: return multiple tensors with offset=0.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133850
Approved by: https://github.com/EikanWang
2024-08-22 06:27:03 +00:00
81b8d3586f Update torch-xpu-ops pin (ATen XPU implementation) (#132390)
Regular update.
1. 69 new ATen operators and variants are added. See https://github.com/intel/torch-xpu-ops/blob/main/yaml/xpu_functions.yaml.
2. Align with PyTorch in-tree to use safe data pointer access APIs.
3. Enable FP64 conversion emulation for some platforms.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132390
Approved by: https://github.com/EikanWang
2024-08-04 02:22:46 +00:00
dfba85c26b Update torch-xpu-ops pin (ATen XPU implementation) (#131643)
# Motivation
Regular update.
1. Some new ATen ops support
2. ABI=0 build support
3. Remove dispatched implementation of `pin_memory` and `is_pinned`
4. Enhance deterministic usage

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131643
Approved by: https://github.com/EikanWang
2024-07-26 05:51:58 +00:00
b556d31586 Update torch-xpu-ops pin (ATen XPU implementation) (#131015)
Regular update.
1. 90 new ATen operators and their variants are supported for XPU.
2. Bug fixes: a. Fix out-of-bounds memory access in the index_put kernel. b. Fix a debug build error.
3. Binary change: split the device AOT code of SYCL kernels into multiple libraries to avoid linkage failure.
4. torch-xpu-ops test case enhancements: a. Hook PyTorch testing op_db to align the OpInfo configuration with CUDA. b. Hook _check_arg_device2 and freeze_rng_state to make XPU happy.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131015
Approved by: https://github.com/EikanWang
2024-07-19 02:18:55 +00:00
cf090e222e Update torch-xpu-ops pin (ATen XPU implementation) (#130333)
1. Fix a compilation error due to a PyTorch update: the prototype of the helper function `checkIndexTensorTypes` changed.
2. Fix a compilation error due to a PyTorch update: PyTorch now enforces -Werror=unused-function.
3. Fix an Inductor test case failure caused by a CUDA-biased implementation in the test case. https://github.com/pytorch/pytorch/issues/130426

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130333
Approved by: https://github.com/EikanWang, https://github.com/atalman
2024-07-10 18:10:53 +00:00