pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Author	SHA1	Message	Date
Cui, Yifeng	9e89b1c4c7	Update torch-xpu-ops commit pin (#165321) Update the torch-xpu-ops commit to [intel/torch-xpu-ops@ce9db1](`ce9db15136`), includes: - Fix test_barrier hang by using static global rank in ProcessGroupXCCL - Update install_xpu_headers only when content changes, to speed up recompilation - Add global rank information to communication logging - Remove duplicate normalization from FFT methods Pull Request resolved: https://github.com/pytorch/pytorch/pull/165321 Approved by: https://github.com/EikanWang	2025-10-14 09:07:24 +00:00
Cui, Yifeng	53f5af8c92	Update torch-xpu-ops commit pin (#164237) Update the torch-xpu-ops commit to [intel/torch-xpu-ops@f30173](`f301733b03`), includes: - Install XPU internal headers into PyTorch - Fix error handling for BatchLinearAlgebra Ops - Fix unnecessary double data type conversion - Fix overflow when calculating workgroup count - Fix segmentation fault and calculation error in AveragePool2dKernel Pull Request resolved: https://github.com/pytorch/pytorch/pull/164237 Approved by: https://github.com/EikanWang	2025-10-09 10:38:59 +00:00
Cui, Yifeng	4783e3ff49	Update torch-xpu-ops commit pin (#163758) Update the torch-xpu-ops commit to [intel/torch-xpu-ops@229e8b](`229e8ba104`), includes: - Revert tracking of Work status for FlightRecorder in ProcessGroupXCCL to fix a memory leak - Enable SYCL warnings on Linux - Fix accuracy issues with CTC loss - Enable aten::nonzero_static on the XPU backend - Stop recursive calculations in polynomial kernels if the tensor has NaNs Pull Request resolved: https://github.com/pytorch/pytorch/pull/163758 Approved by: https://github.com/EikanWang	2025-09-26 09:05:08 +00:00
Han Chao	e134bb340a	Update torch-xpu-ops commit pin (#163244) Update the torch-xpu-ops commit to `24fab67b6e`, includes: - Clean up getDeviceIndexOfCurrentQueue - Fix hardswish gradient corner case - Fix xccl contiguous check - Move checks from nonzero kernel to operator - Support high-priority stream for xccl Pull Request resolved: https://github.com/pytorch/pytorch/pull/163244 Approved by: https://github.com/EikanWang	2025-09-19 02:04:40 +00:00
Cui, Yifeng	9786243b64	Update torch-xpu-ops commit pin (#162804) Update the torch-xpu-ops commit to [intel/torch-xpu-ops@d8c3ee](`d8c3eefc29`), includes: - Optimize adaptive average pool for channels-last memory format - Add unregister wait_tensor - Replace deprecated `[[intel::reqd_sub_group_size(SgSize)]]` with `[[sycl::reqd_sub_group_size(SIMD)]]` and remove unnecessary attributes - Revert "Roll back to original usage of sycl::get_kernel_bundle" Pull Request resolved: https://github.com/pytorch/pytorch/pull/162804 Approved by: https://github.com/EikanWang	2025-09-16 06:30:48 +00:00
Cui, Yifeng	ba7f546ccc	Update torch-xpu-ops commit pin (#162062) Update the torch-xpu-ops commit to [intel/torch-xpu-ops@83c5a5](`83c5a5a551`), includes: - Revert "Disable xccl timer avoid drlm hang" because the XPU time event issue has been fixed - Fall back lu_factor kernel to CPU for single batch - Enable aten::linalg_inv and aten::linalg_inv_ex on XPU Pull Request resolved: https://github.com/pytorch/pytorch/pull/162062 Approved by: https://github.com/EikanWang	2025-09-04 17:05:33 +00:00
Yu, Guangye	a99d8d39bc	Update torch-xpu-ops commit pin (#161919) # Motivation 1. Fall back some linalg functions, such as `linalg_eig`, `linalg_householder_product`, and `linalg_solve_triangular`, to CPU; 2. Fix a codegen dependency bug. # Additional Context This PR aims to fix https://github.com/pytorch/pytorch/issues/161498 Pull Request resolved: https://github.com/pytorch/pytorch/pull/161919 Approved by: https://github.com/EikanWang	2025-09-02 17:09:07 +00:00
yucai-intel	f44ad54bc6	Update torch-xpu-ops commit pin (#161152) Update the torch-xpu-ops commit to [8b58040ee32689487f660462f655085f31506dab](`8b58040ee3`), includes: - Add vectorization path for channels-last maxpool forward - Add FlightRecorder support for ProcessGroupXCCL - Fix random build failure in codegen - Suppress dllexport warning on Windows - Make torch-xpu-ops build depend on ATen XPU Pull Request resolved: https://github.com/pytorch/pytorch/pull/161152 Approved by: https://github.com/EikanWang Co-authored-by: Yu, Guangye <106960996+guangyey@users.noreply.github.com>	2025-08-30 07:19:24 +00:00
frost-intel	9b4adc4db7	[fr] [xpu] Add FlightRecorder support for ProcessGroupXCCL (#158568) Adds support for FlightRecorder in ProcessGroupXCCL. See https://github.com/intel/torch-xpu-ops/pull/1867 for the XCCL implementation and more details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/158568 Approved by: https://github.com/guangyey, https://github.com/fduwjj	2025-08-22 09:03:35 +00:00
chunhuanMeng	663da17b62	Update torch-xpu-ops commit pin (#160062) Update the torch-xpu-ops commit to [77cc792cd265179745d335579d233e6d4f9a2667](`77cc792cd2`), includes: - Ensure that the XPU cache is cleared before creating tensors during the test - Add unused variable warning - Fix test_linalg and test_torch issues with bf32_on_and_off updates - Fix deterministic indexing with broadcast - Fix dist.gather with noncontiguous tensors - Improve accuracy of the deterministic index put kernel - Add a dependency on generated files to avoid building before generation - Optimize embedding bag Fixes #160661 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160062 Approved by: https://github.com/EikanWang	2025-08-15 15:27:24 +00:00
Cui, Yifeng	57ab39f7e4	Update torch-xpu-ops commit pin (#159621) Update the torch-xpu-ops commit to [intel/torch-xpu-ops@1f7a57](`1f7a57f507`), includes: - Add a template parameter to the function `gpu_kernel` for controlling broadcasting vectorization - Add optional NaN checks to XCCL - Fix NllLossForwardReduce2DKernelFunctor accuracy - Extend the existing communication logging to include the reduction operation for collective calls - [Reland] Install xpu codegen header to torch/include Pull Request resolved: https://github.com/pytorch/pytorch/pull/159621 Approved by: https://github.com/EikanWang	2025-08-05 01:46:15 +00:00
Cui, Yifeng	72c8751b61	Align meta deducing for fft_r2c with fft_r2c_mkl on XPU (#156048) There is a memory layout mismatch between the XPU `fft_r2c` output and Inductor's meta deduction. Previously, the Inductor meta deduction of `fft_r2c` for the XPU backend was aligned with CPU (fallback). This PR corrects the Inductor meta deduction and updates the torch-xpu-ops commit to [intel/torch-xpu-ops@`3a9419c`](`3a9419c8bb`). The XPU implementation first performs the R2C transform on the last dimension, followed by iterative C2C transforms on the remaining dimensions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/156048 Approved by: https://github.com/guangyey, https://github.com/etaf, https://github.com/jansel	2025-06-20 01:41:03 +00:00
Cui, Yifeng	4a4cac0cef	Update torch-xpu-ops commit pin (#154962) Update the torch-xpu-ops commit to [intel/torch-xpu-ops@`a3a196`](`a3a196ccdb`), includes: - Enhanced Adaptive Average Pooling 2D Backward Kernel for performance and code simplification - Group Norm Backward Optimization with vectorization and parallel reduction - Support CL path for MaxUnpooling2d and MaxUnpooling3d - Rename USE_ONEMKL to USE_ONEMKL_XPU and set it to ON by default - Refactor USE_XCCL & USE_C10D_XCCL options Pull Request resolved: https://github.com/pytorch/pytorch/pull/154962 Approved by: https://github.com/EikanWang	2025-06-09 15:54:13 +00:00
Yu, Guangye	a664cfdf95	Add C10_NODEPRECATED check for xpu (#153935) # Motivation Add `C10_NODEPRECATED` check for XPU. This prevents the XPU codebase from using `c10::optional`. What does the torch-xpu-ops commit update change? It drops the deprecated `c10::optional`, `c10::nullopt`, and `c10::make_optional` in favor of their `std` counterparts. # Additional Context This PR depends on https://github.com/intel/torch-xpu-ops/pull/1683 and https://github.com/intel/torch-xpu-ops/pull/1690 Pull Request resolved: https://github.com/pytorch/pytorch/pull/153935 Approved by: https://github.com/Skylion007, https://github.com/cyyever	2025-05-22 06:44:04 +00:00
Yutao Xu	11f8511455	Update torch-xpu-ops commit pin (#153902) Update the torch-xpu-ops commit to `defce46ae7`, includes: - Resolve the aten::gamma accuracy gap compared to scipy - Optimize layernom_vectorized_impl by using adaptive wg selection for small shapes - [Intro async flag and use current stream avoid stream sync](https://github.com/intel/torch-xpu-ops/pull/1546) Pull Request resolved: https://github.com/pytorch/pytorch/pull/153902 Approved by: https://github.com/Skylion007, https://github.com/EikanWang	2025-05-21 13:29:41 +00:00
chunhuanMeng	1f48bab377	Update torch-xpu-ops commit pin (#153445) Update the torch-xpu-ops commit to [207105038963e5f9f012f1a0cfd3b9f57b2ab5b0](`2071050389`), includes: - Improve the accuracy of `upsample_bilinear2d_backward` - Enhance the performance of `avg_pool2d` - Update the implementation of scatter-gather and indexing Pull Request resolved: https://github.com/pytorch/pytorch/pull/153445 Approved by: https://github.com/guangyey, https://github.com/EikanWang	2025-05-14 15:34:47 +00:00
xinan.lin	f05d3e5019	[torch-xpu-ops] Update torch-xpu-ops commit pin. (#152321) Update the torch-xpu-ops commit to [655fa9bc7f88ab5bd3766b5f2fd5b43989c2caca](`655fa9bc7f`), including: - Fixes batch_norm numeric error by adding additional boundary check - Enable two operators: fft & jagged_to_padded_dense - XCCL relevant changes: - Cache cclStream to improve performance. - Add support for complex datatypes in allgather and broadcast. - Support coalescing operations and batch_isend_irecv. - Introduce additional logging; use export TORCH_CPP_LOG_LEVEL=INFO. - Fix #152296 - Fix #152020 Pull Request resolved: https://github.com/pytorch/pytorch/pull/152321 Approved by: https://github.com/EikanWang, https://github.com/Skylion007	2025-04-29 04:00:09 +00:00
PyTorch MergeBot	8172397025	Revert "Update torch-xpu-ops commit pin (#150827)" This reverts commit 776aa682218bad4df7b6cd46ef2a0f1d8ca1194c. Reverted https://github.com/pytorch/pytorch/pull/150827 on behalf of https://github.com/etaf due to Inductor UT regression ([comment](https://github.com/pytorch/pytorch/pull/150827#issuecomment-2825857903))	2025-04-24 00:41:06 +00:00
Yutao Xu	776aa68221	Update torch-xpu-ops commit pin (#150827) Update the torch-xpu-ops commit to [b51dd3ef4f4d0f6b44c59e61431c5d29354dcaf6](`b51dd3ef4f`), including: - Update commit pin to xpu-ops main branch - Fixes batch_norm numeric error by adding additional boundary check - Enable two operators: fft & jagged_to_padded_dense - XCCL relevant changes: 1. Cache `cclStream` to improve performance. 2. Add support for complex datatypes in `allgather` and `broadcast`. 3. Support `coalescing` operations and `batch_isend_irecv`. 4. Introduce additional logging; use `export TORCH_CPP_LOG_LEVEL=INFO`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150827 Approved by: https://github.com/EikanWang Co-authored-by: Yu, Guangye <106960996+guangyey@users.noreply.github.com>	2025-04-18 10:12:59 +00:00
Wang, Chuanqi	0198e44f37	Update torch-xpu-ops commit pin to 98c808d (#150554) Update the torch-xpu-ops commit to [98c808dea6de7330c415aa777d6921944cf79887](`98c808dea6`), includes: - Fixes #150001 by removing pre-CXX11 ABI logic from build script for XPU - Fixes #150430 - Fixes XCCL build issue caused by PR #150398 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150554 Approved by: https://github.com/EikanWang, https://github.com/malfet	2025-04-02 22:42:18 +00:00
Wang, Chuanqi	f74d5d576a	Update torch-xpu-ops commit pin to 3ee2bd2 (#150300) Update the torch-xpu-ops commit to [3ee2bd2f13e1ed17a685986ff667a58bed5f2aa5](`3ee2bd2f13`) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150300 Approved by: https://github.com/EikanWang	2025-03-31 13:36:11 +00:00
chunhuanMeng	e9c12e819d	Update torch-xpu-ops commit pin (#148881) Update the torch-xpu-ops commit to [026b2c8c7c92a7b2cec5d26334006e3423251cc6](`026b2c8c7c`), includes: - Enable AOT for LNL Pull Request resolved: https://github.com/pytorch/pytorch/pull/148881 Approved by: https://github.com/EikanWang	2025-03-10 20:31:51 +00:00
Yutao Xu	21bd5fe203	Update torch-xpu-ops commit pin (#147968) Update the torch-xpu-ops commit to [86aaaf8a9dd6932c088b7afcac0c0856b23d341a](`86aaaf8a9d`), includes: - Bugfix (PT2E/BatchNorm) Pull Request resolved: https://github.com/pytorch/pytorch/pull/147968 Approved by: https://github.com/Skylion007	2025-02-27 05:01:12 +00:00
Yutao Xu	7bd2e3bca1	Update torch-xpu-ops commit pin (#147743) Update the torch-xpu-ops commit to [306a0ffb6e0cae27c5bd9a3b9cd378048c8e00e7](`306a0ffb6e`), includes: - Bugfix (LayerNorm/Nonzeros) - Update AOT target Pull Request resolved: https://github.com/pytorch/pytorch/pull/147743 Approved by: https://github.com/EikanWang	2025-02-25 08:06:35 +00:00
Yutao Xu	6edc419d69	Update torch-xpu-ops commit pin (#147358) Update the torch-xpu-ops commit to [a14d1eaa834a616705068103dc8129319087e864](`a14d1eaa83`), includes: - SparseCSR XPU support - Refine build system Pull Request resolved: https://github.com/pytorch/pytorch/pull/147358 Approved by: https://github.com/EikanWang	2025-02-18 16:05:25 +00:00
Yutao Xu	6a2bb629ec	Update torch-xpu-ops commit pin (#147302) Update the torch-xpu-ops commit to [b421032c8fed40df5eaee395c2e7f5f8a7bcc815](`b421032c8f`), includes: - Correct int4 weight pack implementation - Enhance build system: only build one shared library for the user Pull Request resolved: https://github.com/pytorch/pytorch/pull/147302 Approved by: https://github.com/EikanWang	2025-02-18 05:04:15 +00:00
Yutao Xu	de26ddfbdc	Update torch-xpu-ops commit pin (#146671) Update the torch-xpu-ops commit to [80c375570e2b6b2989a8610da1871f8a50dfddc7](`80c375570e`), includes: - ATen operator coverage improvement - SYCL kernel optimization - Nested Tensor OPs support Pull Request resolved: https://github.com/pytorch/pytorch/pull/146671 Approved by: https://github.com/EikanWang	2025-02-14 09:30:36 +00:00
Yutao Xu	6470b0ea6f	Update torch-xpu-ops commit pin (#144739) Update the torch-xpu-ops commit to [22cc419e4e60f469341712a5a103fa309a7dfd48](`22cc419e4e`), includes: - Fix build issue https://github.com/intel/torch-xpu-ops/issues/1279 - ATen operator coverage improvement Note: the new torch-xpu-ops commit does not support bundle 0.5.3 Pull Request resolved: https://github.com/pytorch/pytorch/pull/144739 Approved by: https://github.com/EikanWang, https://github.com/malfet	2025-01-16 15:12:37 +00:00
Yutao Xu	1e881ceecf	Update torch-xpu-ops commit pin (#143984) Update the torch-xpu-ops commit to [28cfac20ec662abdb0ac98faf122450013e8f520](`28cfac20ec`), includes: - Disable batch_norm vectorization path to fix accuracy issues. - Fix the LSTM/RNN implementation error. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143984 Approved by: https://github.com/EikanWang, https://github.com/ruidazeng, https://github.com/desertfire, https://github.com/jansel	2025-01-05 09:01:36 +00:00
Yutao Xu	2ed4d65af0	Update torch-xpu-ops commit pin (#143853) Update the torch-xpu-ops commit to [214f33](`214f33b9d9`), includes: - Fix build issue for transformer-related operators - Improve XPU operator coverage Pull Request resolved: https://github.com/pytorch/pytorch/pull/143853 Approved by: https://github.com/EikanWang	2024-12-30 02:38:16 +00:00
Yutao Xu	3cdd997f4c	Update torch-xpu-ops commit pin (#142113) Update the torch-xpu-ops commit to [7ecb0b](`7ecb0b1a56`), includes: - Capture rrelu_with_noise noise mutation in compile (Resolve https://github.com/pytorch/pytorch/issues/142102) Pull Request resolved: https://github.com/pytorch/pytorch/pull/142113 Approved by: https://github.com/EikanWang	2024-12-05 17:00:29 +00:00
Yutao Xu	b31d3b2f41	Update torch-xpu-ops commit pin (#141949) Update the torch-xpu-ops commit to [f31219](`f312190a92`), includes: - Add lazy init for empty_xpu - Fix NaN propagation error for soft_shrink Pull Request resolved: https://github.com/pytorch/pytorch/pull/141949 Approved by: https://github.com/EikanWang	2024-12-05 05:22:38 +00:00
Yutao Xu	81ab2cc757	Update torch-xpu-ops commit pin (#141201) Update the torch-xpu-ops commit to [1e32bbc](`1e32bbc3d9`), includes: - Improve XPU ATen operator coverage - Support basic `SparseXPU` operators Pull Request resolved: https://github.com/pytorch/pytorch/pull/141201 Approved by: https://github.com/EikanWang, https://github.com/jansel	2024-12-02 01:49:07 +00:00
Yutao Xu	ae7f809bfc	Update torch-xpu-ops commit pin (#140782) Update the torch-xpu-ops commit to [bf4bab1](`bf4bab1fff`), includes: - Fix build issues related to `Werror=terminate` Pull Request resolved: https://github.com/pytorch/pytorch/pull/140782 Approved by: https://github.com/EikanWang	2024-11-15 10:10:52 +00:00
Yutao Xu	f1e045eb75	Update torch-xpu-ops commit pin (#140277) Update the torch-xpu-ops commit to [01f4e29](`01f4e293fa`), includes: - Improve XPU operator coverage - Fix build issues related to `Werror=comments` Pull Request resolved: https://github.com/pytorch/pytorch/pull/140277 Approved by: https://github.com/EikanWang, https://github.com/atalman	2024-11-13 23:38:51 +00:00
Yutao Xu	c3087ace58	Update torch-xpu-ops commit pin (#139986) Update the torch-xpu-ops commit to [5e29831](https://github.com/intel/torch-xpu-ops/commit/5e29831). Includes: - oneAPI 2025 build issue fix - Enhancement of the XPU operator coverage Pull Request resolved: https://github.com/pytorch/pytorch/pull/139986 Approved by: https://github.com/guangyey, https://github.com/jansel	2024-11-10 06:49:38 +00:00
Yu, Guangye	d08dbd0436	Update torch-xpu-ops commit pin (#139041) # Motivation This PR updates the torch-xpu-ops commit pin. It mainly includes the following two highlighted changes: 1. Split the DLL library into 4 smaller libraries to avoid the 2 GB limitation on Windows; 2. Add some new operators, for example `cdist`, `pdist`, `maxunpool2d`, `maxunpool3d`, `upsample_trilinear3d`, Bessel operators, etc. # Additional Context We had to add XPU device check logic to the `cdist` and `pdist` ops. This PR depends on https://github.com/pytorch/pytorch/pull/139050 to fix a Windows build issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139041 Approved by: https://github.com/EikanWang, https://github.com/ezyang	2024-10-31 05:06:06 +00:00
Yu, Guangye	0efa590d43	[CI] Fix XPU CI failure (#138548) # Motivation Fix https://github.com/pytorch/pytorch/issues/138577. # Solution 1. All UTs in `test/inductor/test_compiled_optimizers.py` are fixed by https://github.com/pytorch/pytorch/pull/134170 2. The UT in `test/inductor/test_pattern_matcher.py` was introduced by https://github.com/pytorch/pytorch/pull/138089; we skip this UT because the feature `max_autotune_gemm_backends:Triton` is unsupported. 3. We have a new implementation of `histc`, so we remove the expected failure from `test/inductor/test_torchinductor_opinfo.py` 4. We support `avg_pool3d` for the `fp16` data type, so we remove the expected failure from `test/inductor/test_torchinductor_opinfo.py` 5. CUDA-biased code was introduced by https://github.com/pytorch/pytorch/issues/138472; we generalize it to `GPU_TYPE`. # Additional Context > Why update the torch-xpu-ops commit pin here? We have to update the commit pin to avoid the build failure raised by the code change [C10_UNUSED](https://github.com/pytorch/pytorch/pull/138364). > What features does the torch-xpu-ops update bring? 1. Add some foreach ops, like `unary ops` and `foreach_clamp_max`, etc.; 2. Add forward and backward for some pooling ops, like `avg_pool3d` and `max_pool3d`; 3. Add some other ops, like `log_normal_`, `index_copy`, and `mode`, etc.; 4. Fix build failure related to `C10_UNUSED`; Pull Request resolved: https://github.com/pytorch/pytorch/pull/138548 Approved by: https://github.com/malfet, https://github.com/EikanWang	2024-10-24 07:56:26 +00:00
Wang, Eikan	5689e33cfe	[Intel GPU] Fix Windows linkage issue due to invisible structured kernel symbols (#137794) The Intel GPU ATen library (libtorch_xpu) uses `torchgen` to generate structured kernels. Currently, the generated structured kernels are decorated with `TORCH_API` to control their visibility, and `TORCH_API` is controlled by the `CAFFE2_BUILD_MAIN_LIB` macro. However, we cannot naively enable `CAFFE2_BUILD_MAIN_LIB` for the Intel GPU ATen library, because the macro does more than define the `TORCH_API` semantic. As a result, within this library `TORCH_API` leaves the symbols `hidden`. https://github.com/pytorch/pytorch/blob/main/c10/macros/Export.h#L95-L99 Therefore, we need to use `TORCH_XPU_API` to decorate the generated structured kernels. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137794 Approved by: https://github.com/atalman ghstack dependencies: #137873	2024-10-15 15:31:37 +00:00
PyTorch MergeBot	079f909263	Revert "Make Context to be Device-agnostic Step by Step (1/N) (#136519)" This reverts commit be0b75256a7e516217b059ef273901b95c022fe7. Reverted https://github.com/pytorch/pytorch/pull/136519 on behalf of https://github.com/jovianjaison due to this pr is causing errors internally ([comment](https://github.com/pytorch/pytorch/pull/136519#issuecomment-2405781093))	2024-10-10 18:32:17 +00:00
FFFrog	be0b75256a	Make Context to be Device-agnostic Step by Step (1/N) (#136519) - Make init device-agnostic and move it to AcceleratorHooksInterface - Refactor context related to device initialization Pull Request resolved: https://github.com/pytorch/pytorch/pull/136519 Approved by: https://github.com/ezyang, https://github.com/EikanWang, https://github.com/guangyey	2024-10-09 02:13:36 +00:00
Feng Yuan	0d1d69fd25	Update torch-xpu-ops pin (ATen XPU implementation) (#135647) Release cycle for PyTorch 2.5 1. Fix runtime error on Windows: failure to load torch_xpu_ops_unary_binary_kernels.dll because the binary is too large. Pull Request resolved: https://github.com/pytorch/pytorch/pull/135647 Approved by: https://github.com/EikanWang	2024-09-12 03:16:08 +00:00
Feng Yuan	60d98b4cfb	Update torch-xpu-ops pin (ATen XPU implementation) (#135300) Release cycle for PyTorch 2.5 1. Bug fix: correct reduction logic in cdist kernel. Pull Request resolved: https://github.com/pytorch/pytorch/pull/135300 Approved by: https://github.com/EikanWang	2024-09-06 07:30:09 +00:00
Feng Yuan	b99ef1a02e	Update torch-xpu-ops pin (ATen XPU implementation) (#135185) Release cycle for PyTorch 2.5 1. Update specific AOT targets for Windows. On Windows, the AOT target list prefers Intel client GPUs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/135185 Approved by: https://github.com/EikanWang	2024-09-05 10:05:23 +00:00
Feng Yuan	2443507acc	Update torch-xpu-ops pin (ATen XPU implementation) (#134983) Release cycle for PyTorch 2.5 1. Enable Windows build in latest torch-xpu-ops. Resolved the large binary issue. 2. Refine test infrastructure for compatibility on different HW platforms. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134983 Approved by: https://github.com/EikanWang	2024-09-03 12:14:37 +00:00
Feng Yuan	b7baa062fc	Update torch-xpu-ops pin (ATen XPU implementation) (#133850) Bug fixes for PyTorch 2.5: 1. Use the SYCL group algorithm API instead of the old style for sub-group shift utilities. 2. Add preprocessing in the reduction kernel for cases requiring a data type cast. 3. Make group norm memory-format compatible. 4. ZeroTensor: a. Remove unnecessary ATen operator registrations; otherwise ZeroTensor processing is bypassed. b. Align preprocessing with the in-tree implementation in aten::copy_. 5. Rebase checkIndexTensorTypes usage. 6. Align with the latest semantics of PyTorch foreach operators. Return multiple tensors with offset=0. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133850 Approved by: https://github.com/EikanWang	2024-08-22 06:27:03 +00:00
Feng Yuan	81b8d3586f	Update torch-xpu-ops pin (ATen XPU implementation) (#132390) Regular update. 1. 69 new ATen operators and variants are added. See https://github.com/intel/torch-xpu-ops/blob/main/yaml/xpu_functions.yaml. 2. Align with PyTorch in-tree to use safe data pointer access APIs. 3. Enable FP64 conversion emulation for some platforms. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132390 Approved by: https://github.com/EikanWang	2024-08-04 02:22:46 +00:00
Yu, Guangye	dfba85c26b	Update torch-xpu-ops pin (ATen XPU implementation) (#131643) # Motivation Regular update. 1. Support for some new ATen ops 2. ABI=0 build support 3. Remove dispatched implementation of pin_memory and is_pinned 4. Enhance deterministic usage Pull Request resolved: https://github.com/pytorch/pytorch/pull/131643 Approved by: https://github.com/EikanWang	2024-07-26 05:51:58 +00:00
Feng Yuan	b556d31586	Update torch-xpu-ops pin (ATen XPU implementation) (#131015) Regular update. 1. 90 new ATen operators and their variants are supported for XPU. 2. Bug fixes: a. Fix out-of-bounds memory access in index_put kernel b. Fix debug build error 3. Binary change. Split device AOT code of SYCL kernel into multiple libraries to avoid linkage failure. 4. torch-xpu-ops test case enhancement: a. Hook PyTorch testing op_db to align OpInfo configuration with CUDA b. Hook _check_arg_device2 and freeze_rng_state to make XPU happy Pull Request resolved: https://github.com/pytorch/pytorch/pull/131015 Approved by: https://github.com/EikanWang	2024-07-19 02:18:55 +00:00
Feng Yuan	cf090e222e	Update torch-xpu-ops pin (ATen XPU implementation) (#130333) 1. Fix compilation error due to PyTorch update: the prototype of the helper function `checkIndexTensorTypes` changed. 2. Fix compilation error due to PyTorch update: PyTorch now enforces -Werror=unused-function. 3. Fix Inductor test case failure caused by a CUDA-biased implementation in the case. https://github.com/pytorch/pytorch/issues/130426 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130333 Approved by: https://github.com/EikanWang, https://github.com/atalman	2024-07-10 18:10:53 +00:00


58 Commits
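The #164237 entry only says that an overflow when calculating the workgroup count was fixed; it does not describe the bug. A typical source of such overflows is the ceil-division idiom `(n + wg_size - 1) / wg_size` evaluated in 32-bit signed arithmetic. The sketch below emulates that in Python to show the failure mode and an overflow-free alternative. The function names, and the assumption that the bug took this form, are hypothetical. In C++ signed overflow is undefined behavior; the emulation assumes the usual two's-complement wrap-around.

```python
def wrap_int32(v: int) -> int:
    """Emulate two's-complement wrap-around of a 32-bit signed integer."""
    return (v + 2**31) % 2**32 - 2**31


def trunc_div(a: int, b: int) -> int:
    """C/C++-style integer division, which truncates toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def num_workgroups_naive(n: int, wg_size: int) -> int:
    # Hypothetical buggy version: the ceil-division idiom evaluated in 32-bit
    # signed arithmetic. n + wg_size - 1 wraps negative when n is near INT32_MAX.
    return trunc_div(wrap_int32(n + wg_size - 1), wg_size)


def num_workgroups_safe(n: int, wg_size: int) -> int:
    # Overflow-free ceil division for n >= 0 and wg_size > 0:
    # no intermediate value exceeds n.
    return n // wg_size + (1 if n % wg_size else 0)
```

With `n = 2**31 - 1` and `wg_size = 256`, the naive version returns a negative count (-8388607), while the safe version returns `2**23` (8388608).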
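The polynomial-kernel change in #163758 stops the recursive calculation early when the input is NaN, since every later term of the recurrence would be NaN anyway. A minimal sketch of the idea for the Chebyshev polynomial of the first kind, T_0(x) = 1, T_1(x) = x, T_{k}(x) = 2x T_{k-1}(x) - T_{k-2}(x). Choosing T_n as the example and the placement of the NaN check are assumptions for illustration; this is not the torch-xpu-ops kernel.

```python
import math


def chebyshev_t(x: float, n: int) -> float:
    """T_n(x), Chebyshev polynomial of the first kind, via the three-term recurrence."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 1.0
    if n == 1:
        return x
    if math.isnan(x):
        # Early exit: every term from T_1 onward is NaN, so skip the loop entirely.
        return x
    p, q = 1.0, x  # T_0(x), T_1(x)
    for _ in range(2, n + 1):
        p, q = q, 2.0 * x * q - p  # T_k(x) = 2x T_{k-1}(x) - T_{k-2}(x)
    return q
```

For large `n` the early exit saves the whole O(n) loop for NaN inputs, which matters when the kernel runs the recurrence per element.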
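The staged transform order described in #156048 (R2C on the last dimension, then C2C on each remaining dimension) gives the same values as a direct multi-dimensional R2C transform, because the DFT is separable. The sketch below checks this for the 2-D case in pure Python with a naive O(n^2) DFT. The function names are illustrative and not part of torch-xpu-ops, and the sketch does not model the output memory layout (strides), which is what the meta-deduction fix actually concerns.

```python
import cmath


def dft(xs):
    """Naive O(n^2) complex DFT of a 1-D sequence (forward sign, unnormalized)."""
    n = len(xs)
    return [sum(xs[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def rfft(xs):
    """Real-to-complex DFT: keep only the n//2 + 1 non-redundant bins."""
    return dft(xs)[: len(xs) // 2 + 1]


def rfft2_staged(x):
    """R2C along the last dimension, then C2C along the remaining (first) dimension."""
    rows = [rfft(row) for row in x]  # shape (m, n//2 + 1)
    m, half = len(rows), len(rows[0])
    out = [[0j] * half for _ in range(m)]
    for col in range(half):
        column = dft([rows[r][col] for r in range(m)])  # C2C on dimension 0
        for r in range(m):
            out[r][col] = column[r]
    return out


def rfft2_direct(x):
    """Reference: 2-D DFT from the definition, keeping the first n//2 + 1 columns."""
    m, n = len(x), len(x[0])
    return [[sum(x[a][b] * cmath.exp(-2j * cmath.pi * (k * a / m + l * b / n))
                 for a in range(m) for b in range(n))
             for l in range(n // 2 + 1)]
            for k in range(m)]
```

For real tensors, `torch.fft.rfftn` computes the same quantity; the staged order only affects how an implementation arrives at it.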
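The XCCL logging added in #152321 (and the reverted #150827) is enabled through the `TORCH_CPP_LOG_LEVEL` environment variable. Because the C++ log level is picked up when the runtime initializes, it is safest to have the variable set before torch is imported, for example in the environment of a launched child process. A minimal sketch, assuming the workload is started as a subprocess; `env_with_cpp_logging` is a hypothetical helper, not a PyTorch API.

```python
import os
from typing import Mapping, Optional

# Levels listed in the PyTorch documentation for TORCH_CPP_LOG_LEVEL.
_CPP_LOG_LEVELS = {"INFO", "WARNING", "ERROR", "FATAL"}


def env_with_cpp_logging(level: str = "INFO",
                         base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of `base` (default: os.environ) with TORCH_CPP_LOG_LEVEL set.

    Pass the result as `env=` when launching a child process, so the variable
    is already present when that process imports torch. `base` is not modified.
    """
    level = level.upper()
    if level not in _CPP_LOG_LEVELS:
        raise ValueError(f"unsupported TORCH_CPP_LOG_LEVEL: {level!r}")
    env = dict(os.environ if base is None else base)
    env["TORCH_CPP_LOG_LEVEL"] = level
    return env
```

A typical call would be `subprocess.run([sys.executable, "train.py"], env=env_with_cpp_logging(), check=True)`, where `train.py` stands in for your own script. In a shell, `export TORCH_CPP_LOG_LEVEL=INFO` before launching achieves the same thing.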
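The #141949 entry does not describe the `soft_shrink` NaN bug itself. One common way NaN propagation breaks in piecewise kernels is that every comparison with NaN is false, so a NaN input falls through to the zero branch. The hypothetical sketch below contrasts such a naive scalar version with one that checks for NaN first; it illustrates the failure mode and is not the torch-xpu-ops kernel. The default `lambd=0.5` matches `torch.nn.functional.softshrink`.

```python
import math


def softshrink_naive(x: float, lambd: float = 0.5) -> float:
    # Every comparison with NaN is False, so NaN silently lands in the zero branch.
    if x > lambd:
        return x - lambd
    if x < -lambd:
        return x + lambd
    return 0.0


def softshrink(x: float, lambd: float = 0.5) -> float:
    # Check for NaN first so that it propagates to the output.
    if math.isnan(x):
        return x
    return softshrink_naive(x, lambd)
```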