pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Author	SHA1	Message	Date
fengqing.lu	acece97c3a	[Intel GPU] Upgrade OneDNN XPU Tag to v3.9.1 (#161932 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/161932 Approved by: https://github.com/EikanWang, https://github.com/Skylion007, https://github.com/guangyey	2025-09-04 11:05:10 +00:00
Nathan Brown	93da9952a7	gloo: fix building system gloo with CUDA/HIP (#146637 ) Fix incorrect linking of Gloo's libraries when building with system Gloo. Previously, either Gloo's native library or Gloo's CUDA library were linked. However, Gloo had changed such that all users of Gloo must link the native library, and can optionally link the CUDA or HIP library for Gloo + CUDA/HIP support. This had been updated when building/linking with vendored Gloo, but not when using system Gloo. Fixes: #146239 Reported-by: Adam J Stewart <ajstewart426@gmail.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/146637 Approved by: https://github.com/malfet	2025-08-06 22:56:31 +00:00
cyy	3ee8828c87	[1/N] Don't use CUDA.cmake module (#157188 ) Small changes before removing CUDA.cmake. Pull Request resolved: https://github.com/pytorch/pytorch/pull/157188 Approved by: https://github.com/ezyang	2025-07-08 03:05:35 +00:00
Isuru Fernando	8f0998aafe	Check F2C BLAS for OpenBLAS and other vendors (#143846 ) This issue came from https://github.com/conda-forge/pytorch-cpu-feedstock/issues/180. MKL follows the F2C convention for returning single precision floats as doubles and uses the G77 convention for returning complex valued scalars. OpenBLAS does the opposite. There is a check for this already, but it's done only when the Generic BLAS vendor code path is used and this PR moves that code to `Dependencies.cmake` to make it work when the BLAS vendor is OpenBLAS and others Pull Request resolved: https://github.com/pytorch/pytorch/pull/143846 Approved by: https://github.com/rgommers, https://github.com/atalman	2025-07-01 05:56:24 +00:00
Avanish Tiwari	5e18bc3331	[PowerPC] Fixed build issue for vsx vec256 complexfloat and scaled_mm_out_cpu (#155255 ) Pytorch build is failing on power system from this commit ec24f8f58a74502c5a2488f5d9e85a817616dda0 *Build Failure Logs* Error related to mkldnn ``` pytorch/aten/src/ATen/native/Blas.cpp:302:26: error: ‘cpuinfo_has_x86_amx_int8’ was not declared in this scope 302 \| if ((!mixed_dtype && cpuinfo_has_x86_amx_int8()) \|\| \| ^~~~~~~~~~~~~~~~~~~~~~~~ pytorch/aten/src/ATen/native/Blas.cpp:303:25: error: ‘cpuinfo_has_x86_amx_fp16’ was not declared in this scope 303 \| (mixed_dtype && cpuinfo_has_x86_amx_fp16())) { \| ^~~~~~~~~~~~~~~~~~~~~~~~ ``` Error related to vec256 complex float redefinition ``` aten/src/ATen/cpu/vec/vec256/vsx/vec256_complex_float_vsx.h:19:7: error: specialization of ‘at::vec::DEFAULT::Vectorized<c10::complex<float> >’ after instantiation 19 \| class Vectorized<ComplexFlt> { \| ^~~~~~~~~~~~~~~~~~~~~~ aten/src/ATen/cpu/vec/vec256/vsx/vec256_complex_float_vsx.h:19:7: error: redefinition of ‘class at::vec::DEFAULT::Vectorized<c10::complex<float> >’  aten/src/ATen/cpu/vec/vec256/vsx/vec256_complex_float_vsx.h:633:18: error: ‘const class at::vec::DEFAULT::Vectorized<c10::complex<float> >’ has no member named ‘abs_2_’ 633 \| auto abs_a = a.abs_2_(); \| ^~~~~~ aten/src/ATen/cpu/vec/vec256/vsx/vec256_complex_float_vsx.h:634:18: error: ‘const class at::vec::DEFAULT::Vectorized<c10::complex<float> >’ has no member named ‘abs_2_’ 634 \| auto abs_b = b.abs_2_(); \| ^~~~~~ /aten/src/ATen/cpu/vec/vec256/vsx/vec256_complex_float_vsx.h:666:17: error: ‘const class at::vec::DEFAULT::Vectorized<c10::complex<float> >’ has no member named ‘vec0’ 666 \| vec_add(a.vec0(), b.vec0()), vec_add(a.vec1(), b.vec1())}; aten/src/ATen/cpu/vec/vec256/vsx/vec256_complex_float_vsx.h:673:17: error: ‘const class at::vec::DEFAULT::Vectorized<c10::complex<float> >’ has no member named ‘vec0’ 673 \| vec_sub(a.vec0(), b.vec0()), vec_sub(a.vec1(), b.vec1())}; \| ^~~~ 
aten/src/ATen/cpu/vec/vec256/vsx/vec256_complex_float_vsx.h:680:27: error: ‘const class at::vec::DEFAULT::Vectorized<c10::complex<float> >’ has no member named ‘vec0’ 680 \| vec_and(a.vec0(), b.vec0()), vec_and(a.vec1(), b.vec1())}; ``` *With this changes build logs* ``` Building wheel torch-2.8.0a0+gita3098a7 -- Building version 2.8.0a0+gita3098a7 -- Checkout nccl release tag: v2.26.5-1 cmake -GNinja -DBLAS=OpenBLAS -DBUILD_PYTHON=True -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/home/avanish/OfficeWork2025/JuneWork/pytorch_5Jun/pack/torch_night_5Jun/pytorch/torch -DCMAKE_PREFIX_PATH=/home/avanish/OfficeWork2025/JuneWork/pyenv/pytorch_5Jun/lib/python3.12/site-packages -DPython_EXECUTABLE=/home/avanish/OfficeWork2025/JuneWork/pyenv/pytorch_5Jun/bin/python -DTORCH_BUILD_VERSION=2.8.0a0+gita3098a7 -DUSE_MKLDNN=ON -DUSE_MKLDNN_CBLAS=ON -DUSE_NUMPY=True -DUSE_OPENMP=ON /home/avanish/OfficeWork2025/JuneWork/pytorch_5Jun/pack/torch_night_5Jun/pytorch cmake --build . --target install --config Release running build_ext -- Building with NumPy bindings -- Not using cuDNN -- Not using CUDA -- Not using XPU -- Using MKLDNN -- Not using Compute Library for the Arm architecture with MKLDNN -- Using CBLAS in MKLDNN -- Not using NCCL -- Building with distributed package: -- USE_TENSORPIPE=True -- USE_GLOO=True -- USE_MPI=False -- Building Executorch -- Not using ITT Copying functorch._C from functorch/functorch.so to /home/avanish/OfficeWork2025/JuneWork/pytorch_5Jun/pack/torch_night_5Jun/pytorch/build/lib.linux-ppc64le-cpython-312/functorch/_C.cpython-312-powerpc64le-linux-gnu.so copying functorch/functorch.so -> /home/avanish/OfficeWork2025/JuneWork/pytorch_5Jun/pack/torch_night_5Jun/pytorch/build/lib.linux-ppc64le-cpython-312/functorch/_C.cpython-312-powerpc64le-linux-gnu.so building 'torch._C' extension creating build/temp.linux-ppc64le-cpython-312/torch/csrc ``` This patch will fix the pytorch build issue on power, and i am able to build successfully. 
Hi @malfet @albanD Please review this PR for pytorch build issue that we are observing on power. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155255 Approved by: https://github.com/albanD, https://github.com/malfet	2025-06-30 17:54:37 +00:00
PyTorch MergeBot	b1d62febd0	Revert "Use official CUDAToolkit module in CMake (#154595 )" This reverts commit 08dae945ae380d80efbaf140a95abfc5d96e5100. Reverted https://github.com/pytorch/pytorch/pull/154595 on behalf of https://github.com/malfet due to It breaks on some local setup with no clear diagnostic, but looks like it fails to find cuFile ([comment](https://github.com/pytorch/pytorch/pull/154595#issuecomment-2997959344))	2025-06-23 21:15:31 +00:00
cyy	08dae945ae	Use official CUDAToolkit module in CMake (#154595 ) Use CUDA language in CMake and remove forked FindCUDAToolkit.cmake. Some CUDA targets are also renamed with `torch::` prefix. Pull Request resolved: https://github.com/pytorch/pytorch/pull/154595 Approved by: https://github.com/albanD	2025-06-22 05:44:29 +00:00
Xuehai Pan	ccea6ddac3	[BE] fix typos in cmake/ (#156079 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/156079 Approved by: https://github.com/Skylion007	2025-06-17 19:25:43 +00:00
Yu, Guangye	10cef1e25d	Remove torch XPU ABI=0 build logic for old compiler (#150095 ) # Motivation Follow https://github.com/pytorch/pytorch/pull/149888, this PR intends to remove ABI=0 build logic for PyTorch XPU build with old compiler (< 2025.0). For newer compilers >= 2025.0, the ABI is neutral by default without requiring additional compilation options (`-fpreview-breaking-changes`). # Additional Context This PR depends on XPU CI pass, which will be fixed by https://github.com/pytorch/pytorch/pull/149843 and https://github.com/intel/torch-xpu-ops/pull/1515 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150095 Approved by: https://github.com/EikanWang, https://github.com/malfet	2025-06-06 13:13:19 +00:00
fengqing.lu@intel.com	7b074346e0	[Intel GPU] Support f32 intermediate dtype, headdim size <=576 and f32 causal mask for SDPA (#152091 ) In OneDNN v3.7, SDPA has the following defects: 1. The dtype of intermediate values is the same as QKV, while PyTorch uses FP32 for intermediate values to ensure better accuracy. 2. Only headdim size <= 256 is supported. 3. An implicit causal mask is not supported when QKV is FP32, so we need to build an attention mask explicitly with aten ops. OneDNN v3.8 addresses these defects. Since these are tiny changes, I decided to put them in a single PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/152091 Approved by: https://github.com/EikanWang, https://github.com/guangyey, https://github.com/drisspg	2025-06-04 05:18:36 +00:00
atalman	22641f42b6	[Binary-builds]Use System NCCL by default in CI/CD. (#152835 ) Use system NCCL by default. The correct NCCL version is already built into the Manylinux docker image. Will follow up with a PR that detects whether the user has NCCL installed and enables USE_SYSTEM_NCCL by default in that case. Pull Request resolved: https://github.com/pytorch/pytorch/pull/152835 Approved by: https://github.com/malfet	2025-05-30 18:51:48 +00:00
Ben Olson	1bebe0424e	Fix platform detection in MKLDNN CMake file (#142067 ) When building PyTorch with `USE_XPU=True` and Clang, the user sees misleading errors related to incorrect platform detection that assumes that all users that are not using the GNU compilers are on Windows. We can fix this by simply using CMake's builtin platform detection variables. Pull Request resolved: https://github.com/pytorch/pytorch/pull/142067 Approved by: https://github.com/EikanWang, https://github.com/min-jean-cho, https://github.com/guangyey	2025-05-26 06:09:37 +00:00
Anthony Shoumikhin	7d39e73c57	Fix more URLs (#153277 ) Or ignore them. Found by running the lint_urls.sh script locally with https://github.com/pytorch/pytorch/pull/153246 Pull Request resolved: https://github.com/pytorch/pytorch/pull/153277 Approved by: https://github.com/malfet	2025-05-14 16:23:50 +00:00
Milos Puzovic	642e9305eb	Fixes detection of ArmPL on Linux platform (#150031 ) On Linux it failed to detect that there is bin directory as it wasn't looking for armpl-info which is the only file that is in that directory on Linux and also adding link to math library as it is required to link against when checking for LAPACK functions. Fixes #149610 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150031 Approved by: https://github.com/fadara01, https://github.com/malfet	2025-05-07 19:47:21 +00:00
Nikita Shulga	07290bdcdc	Skip search for MKL on ARM cpus (#145850 ) It will not find it anyway and makes a bit easier parsing thru CMake log on non-x86 systems Pull Request resolved: https://github.com/pytorch/pytorch/pull/145850 Approved by: https://github.com/atalman	2025-05-02 18:39:49 +00:00
Vinitha Vijayan	e872bf8f88	Avoid linking multiple OMP runtimes in libtorch_cpu.so if BLAS used is OpenBLAS. (#147725 ) When PyTorch is built with OpenBLAS support and libopenblas is directly linked with libgomp.so, libtorch_cpu.so ends up with multiple OMP runtimes linked against it. This may result in unexpected runtime behaviour or regressions. This patch fixes this by avoiding linking against libomp.so if OpenBLAS is linked against libgomp.so Fixes #146603 Pull Request resolved: https://github.com/pytorch/pytorch/pull/147725 Approved by: https://github.com/albanD	2025-04-29 23:39:48 +00:00
Ryo Suzuki	fcbbb03d48	Extend vec backend with BF16 SVE intrinsics (#143666 ) - Following the work in https://github.com/pytorch/pytorch/pull/119571, BF16 SVE intrinsics are added to the Vectorized class, providing ~1.7x speedup on `silu` and `softmax`. - Added bf16 detection in CMake - Added a guard for native NEON code to prevent compilation errors @aditew01 @maajidkhann please have a look Pull Request resolved: https://github.com/pytorch/pytorch/pull/143666 Approved by: https://github.com/malfet, https://github.com/aditew01, https://github.com/nikhil-arm Co-authored-by: Aditya Tewari <aditya.tewari@arm.com>	2025-04-28 18:25:44 +00:00
PyTorch MergeBot	bada898f5e	Revert "Extend vec backend with BF16 SVE intrinsics (#143666 )" This reverts commit d072254eaea325a507c1498431e4c8294205fe2d. Reverted https://github.com/pytorch/pytorch/pull/143666 on behalf of https://github.com/malfet due to I'm unsure why this PR got merged, as it doesn't have a valid review ([comment](https://github.com/pytorch/pytorch/pull/143666#issuecomment-2749013169))	2025-03-24 18:13:50 +00:00
Yu, Guangye	db9b031b00	Add default XPU toolkit path to CMake (#149270 ) # Motivation Add default XPU runtime path to CMake to mitigate https://github.com/pytorch/pytorch/issues/149075 This ensures proper linking with `libtorch` when a user does not source the Torch XPU toolkit while working on a C++ library or executable. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149270 Approved by: https://github.com/dvrogozh, https://github.com/EikanWang, https://github.com/atalman	2025-03-24 14:41:24 +00:00
Ryo Suzuki	d072254eae	Extend vec backend with BF16 SVE intrinsics (#143666 ) - Following the work in https://github.com/pytorch/pytorch/pull/119571, BF16 SVE intrinsics are added to the Vectorized class, providing ~1.7x speedup on `silu` and `softmax`. - Added bf16 detection in CMake - Added a guard for native NEON code to prevent compilation errors @aditew01 @maajidkhann please have a look Pull Request resolved: https://github.com/pytorch/pytorch/pull/143666 Approved by: https://github.com/swolchok, https://github.com/aditew01 Co-authored-by: Aditya Tewari <aditya.tewari@arm.com>	2025-03-21 10:55:11 +00:00
Dmitry Rogozhkin	45a879e55b	xpu: improve error handling and reporting in XPU cmake files (#149353 ) For #149075 * Add a graceful cmake error instead of cryptic one if SYCL runtime is not found: ``` The link interface of target "c10_xpu" contains: torch::xpurt but the target was not found. ``` * Suppress unclear cmake error if SYCL compiler is not available and further version query fails: ``` CMake Error at /home/dvrogozh/pytorch/torch/share/cmake/Caffe2/FindSYCLToolkit.cmake:37 (string): string sub-command REGEX, mode REPLACE needs at least 6 arguments total to command. ``` CC: @gujinghui @EikanWang @fengyuan14 @guangyey @jgong5 Pull Request resolved: https://github.com/pytorch/pytorch/pull/149353 Approved by: https://github.com/guangyey, https://github.com/malfet	2025-03-20 02:00:39 +00:00
Michal Gallus	5bbca7d328	[ROCm][Windows] Fix OpenMP Flags for clang-cl (#148097 ) When clang-cl parses its command line arguments, it expects MSVC-style arguments (beginning with `/` such as `/WX`, `/MD`, etc.) to be provided, and clang-style arguments to be preceded by `-Xclang`; otherwise, the clang-style parameters are ignored because they are interpreted as unrecognized compiler options. Pull Request resolved: https://github.com/pytorch/pytorch/pull/148097 Approved by: https://github.com/jeffdaily	2025-03-10 22:47:15 +00:00
Fadi Arafeh	d1f21d8ec3	Enable Direct Use of Arm Compute Library (ACL) in ATen (#148584 ) ACL is already built with PyTorch as a shared library when USE_MKLDNN_ACL is set. Currently, it is only used indirectly in ATen via oneDNN for AArch64 targets. However there are cases where it makes sense to utilize ACL directly without oneDNN as an intermediary - e.g. quantization. See #145942, #147337, #146620. This patch enables such use cases by exposing ACL to ATen Pull Request resolved: https://github.com/pytorch/pytorch/pull/148584 Approved by: https://github.com/malfet	2025-03-10 18:29:51 +00:00
ZhiweiYan-96	4075646bd8	Use oneDNN v3.7.1 for Intel GPU (#148403 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/148403 Approved by: https://github.com/EikanWang Co-authored-by: majing <jing1.ma@intel.com> Co-authored-by: xiaolil1 <xiaoli.liu@intel.com>	2025-03-07 08:03:49 +00:00
ZhiweiYan-96	af720cd5a7	[Intel GPU] Decouple Intel GPU oneDNN from other backends (#147926 ) # Motivation Currently, Intel GPU is moving forward rapidly with feature development. We (Intel GPU) want independent version control over the oneDNN component so that we can quickly adopt the optimizations and bug fixes provided by the oneDNN team. This PR does not change the behavior of other backends such as Intel CPU and ARM. They can keep using the stable version contained in `third_party/ideep`. # Detail At compilation time, we `git clone` oneDNN via the URL `https://github.com/oneapi-src/oneDNN` and check out the tag/commit that the Intel GPU backend prefers. This feature is supported by the CMake `ExternalProject_Add` command. Following is a build log example: ```bash [11/60] Performing download step (git clone) for 'xpu_mkldnn_proj' Cloning into 'xpu_mkldnn_proj'... HEAD is now at 5e92240360 meta: updated citation file [12/60] Performing update step for 'xpu_mkldnn_proj' -- Already at requested tag: v3.7 [13/60] No patch step for 'xpu_mkldnn_proj' ``` The log demonstrates that we explicitly download the source files and check out a specific tag. 
The oneDNN source files are located at `build/xpu_mkldnn_proj-prefix/src/xpu_mkldnn_proj` # Runtime verification Running UT for CPU ```bash onednn_verbose,v1,info,oneDNN v3.7.0 (commit fc3f17ad469b8a6da7192ae12d32625faa509f1e) onednn_verbose,v1,info,cpu,runtime:OpenMP,nthr:24 onednn_verbose,v1,info,cpu,isa:Intel AVX-512 with Intel DL Boost onednn_verbose,v1,info,gpu,runtime:none onednn_verbose,v1,info,graph,backend,0:dnnl_backend onednn_verbose,v1,primitive,info,template:operation,engine ``` Running UT for Intel GPU ```bash onednn_verbose,v1,info,oneDNN v3.7.0 (commit 5e9224036021433d2577548ed0539fe9a53256bc) onednn_verbose,v1,info,cpu,runtime:threadpool,nthr:24 onednn_verbose,v1,info,cpu,isa:Intel AVX-512 with Intel DL Boost onednn_verbose,v1,info,gpu,runtime:DPC++ onednn_verbose,v1,info,gpu,engine,sycl gpu device count:2 ``` We can see that Intel GPU uses commit `5e922` (tag v3.7), while CPU uses `fc3f17` Pull Request resolved: https://github.com/pytorch/pytorch/pull/147926 Approved by: https://github.com/EikanWang Co-authored-by: leizhenyuan <zhenyuan.lei@intel.com>	2025-02-28 07:42:06 +00:00
Wang, Eikan	2c35af4def	[Intel GPU] Avoid including CPU oneDNN header files for Intel GPU (#147969 ) XPU builds oneDNN in another folder. The XPU oneDNN head files are in the XPU-specific folder - `${__XPU_MKLDNN_BUILD_DIR}`. `f522d899fb/cmake/Modules/FindMKLDNN.cmake (L73)` So, `${PROJECT_SOURCE_DIR}/third_party/ideep/mkl-dnn/include` is useless for XPU. `XPU_MKLDNN_INCLUDE` is good enough. Meanwhile, it may mess up the included files if the version of XPU oneDNN differs from other backends. * __->__ #147969 Pull Request resolved: https://github.com/pytorch/pytorch/pull/147969 Approved by: https://github.com/ZhiweiYan-96, https://github.com/liangan1, https://github.com/atalman	2025-02-27 14:22:17 +00:00
Ding, Yi1	af1072ffb6	[Intel GPU] Enable BUILD_GRAPH for xpu_mkldnn (#147608 ) For preparation of OneDNN based XPU SDPA enabling. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147608 Approved by: https://github.com/EikanWang, https://github.com/atalman	2025-02-21 16:12:30 +00:00
Nikita Shulga	0d5f0a81c5	[CMake] Find HomeBrew OpenMP on MacOS (#145870 ) Either via `OMP_PREFIX` envvar or by searching in `/opt/homebrew/opt/libomp` folder Modify libomp bundling logic in setup.py to change absolute path to libomp.dylib to a relative one if necessary Pull Request resolved: https://github.com/pytorch/pytorch/pull/145870 Approved by: https://github.com/Skylion007, https://github.com/atalman ghstack dependencies: #145871	2025-01-30 03:19:51 +00:00
PyTorch MergeBot	b80482988f	Revert "[CMake] Find HomeBrew OpenMP on MacOS (#145870 )" This reverts commit c26bb9ba5bd40d256a25436212279bc7e4b436ae. Reverted https://github.com/pytorch/pytorch/pull/145870 on behalf of https://github.com/malfet due to Want to refine it a bit ([comment](https://github.com/pytorch/pytorch/pull/145870#issuecomment-2622659614))	2025-01-29 19:34:27 +00:00
Nikita Shulga	c26bb9ba5b	[CMake] Find HomeBrew OpenMP on MacOS (#145870 ) Either via `OMP_PREFIX` envvar or just searching in that folder Pull Request resolved: https://github.com/pytorch/pytorch/pull/145870 Approved by: https://github.com/Skylion007	2025-01-28 23:09:37 +00:00
Nikita Shulga	8d91bfd965	[BE] Include CheckFunctionExists in `FindBLAS.cmake` (#145849 ) It's used in the script, so it must be included Pull Request resolved: https://github.com/pytorch/pytorch/pull/145849 Approved by: https://github.com/Skylion007	2025-01-28 19:47:05 +00:00
Stefan-Alin Pahontu	0674ab7e33	solve apl dependency issue (#145215 ) According to the [APL documentation](https://developer.arm.com/documentation/101004/2404/General-information/Arm-Performance-Libraries-example-programs), libraries ending with _mp are OpenMP multi-threaded libraries. When a project is compiled with MSVC and the -openmp flag, the vcomp library (Visual C++ implementation of OpenMP) is used for runtime calls. However, the current APL implementation uses the libomp.dll (LLVM) variant. As a result, there are unexpected behaviors at runtime. --- For Example: ```python import torch # Create a sparse tensor # Input (Sparse Tensor): # [[0, 1], # [1, 0]] indices = torch.tensor([[0, 1], [1, 0]]) values = torch.tensor([1, 1], dtype=torch.float32) size = torch.Size([2, 2]) sparse_tensor = torch.sparse_coo_tensor(indices, values, size) # Convert sparse tensor to dense tensor dense_tensor = sparse_tensor.to_dense() # Expected Output (Dense Tensor): # [[0, 1], # [1, 0]] print("\nDense Tensor:") print(dense_tensor) ``` However, it prints unexpected outputs such as: ```python # [[0, 11], # [10, 0]] ``` The issue arises because the following code does not function as expected at runtime: https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/ParallelOpenMP.h#L30 ```c++ // returns 1 , however since OpenMP is enabled it should return total number of threads int64_t num_threads = omp_get_num_threads(); ``` --- In the runtime, loading multiple OpenMP libraries (in this case `libomp` and `vcomp`) is causing unexpected behaviours. So, we've changed libraries from `_mp` to non `_mp` versions and we used `vcomp` for OpenMP calls. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145215 Approved by: https://github.com/ozanMSFT, https://github.com/malfet Co-authored-by: Ozan Aydin <148207261+ozanMSFT@users.noreply.github.com>	2025-01-27 13:02:16 +00:00
Yu, Guangye	891ba2ec8a	Fix xpu cmake typo (#140374 ) # Motivation This PR aims to fix a typo in the CMake build. The typo impacts the XPU Windows build and results in PyTorch being built without XPU, which is unexpected. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140374 Approved by: https://github.com/EikanWang, https://github.com/ezyang, https://github.com/atalman	2024-11-13 00:26:35 +00:00
Yu, Guangye	8051ee802c	Add XPU compiler version control in cmake to keep BC (#139258 ) # Motivation This PR aims to maintain backward compatibility when building PyTorch XPU with the old and new compilers. # Additional Context The details are described here. The new compiler (2025.0.0) has some breaking changes compared with the old compiler(2024.1), for examples: 1. On Windows, sycl library is named `sycl7.lib` in the old compiler but is named `sycl.lib` in the new compiler. 2. On Linux, in order to support ABI=0, we have to link `libsycl-preview.so` in the old compiler but we could link `libsycl.so` in the new compiler to have the same ABI compatibility. 3. We added a macro `SYCL_COMPILER_VERSION` to support our new code has good backward compatibility with the old compiler. Now the new feature(Event elapsed_time, memory summary, and device architecture property) introduced by the new compiler will be controlled within the macro `SYCL_COMPILER_VERSION`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139258 Approved by: https://github.com/EikanWang, https://github.com/atalman, https://github.com/gujinghui	2024-11-09 13:31:21 +00:00
Irem Yuksel	b021486405	Enable Windows Arm64 (#133088 ) This PR enables Pytorch for Windows on Arm64 - CPU only. Currently, there aren't any checks in place to build and test for Windows on Arm64, but we're working to implement those as soon as possible. We recommend using [Arm Performance Libraries (APL)](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Libraries) as a BLAS option, which is introduced in this PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133088 Approved by: https://github.com/malfet Co-authored-by: cristian panaite <panaite.cristian2000@gmail.com> Co-authored-by: Stefan-Alin Pahontu <56953855+alinpahontu2912@users.noreply.github.com> Co-authored-by: Ozan Aydin <148207261+ozanMSFT@users.noreply.github.com>	2024-10-24 16:10:44 +00:00
maajidkhann	5a6ddbcc3b	Extending the Pytorch vec backend for SVE (ARM) (#119571 ) Motivation: In PyTorch, ATen vectorization supports multiple platforms, including x86 and Arm, as well as multiple data types. It provides a generic implementation of the Vector (Vec) type that allows the programmer to write code packing various primitives (such as floats) within 256-bit and 512-bit registers. It can be extended to support other ISAs easily by adding more VecISA sub-classes. Reference Link: https://github.com/pytorch/pytorch/tree/main/aten/src/ATen/cpu/vec This PR: * Our goal with this contribution is to add an SVE backend for Vec in the ATen vectorization for the CPU backend, which benefits any Arm CPU that supports SVE. * More about SVE ISA for ARM: [https://developer.arm.com/Architectures/Scalable Vector Extensions](https://developer.arm.com/Architectures/Scalable%20Vector%20Extensions) * We are using the ARM C Language Extensions for SVE (https://developer.arm.com/documentation/102699/0100/Optimizing-with-intrinsics ) to accelerate performance for various operators in the SVE backend for Vec. * Currently we are adding support only for the SVE ISA with a vector length of 256 bits (SVE 256). In future, we plan to extend this SVE support to other vector lengths as well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119571 Approved by: https://github.com/malfet, https://github.com/snadampal Co-authored-by: Divya Kotadiya <divya.kotadiya@fujitsu.com>	2024-09-18 18:59:10 +00:00
Dmitry Rogozhkin	9852c6d236	xpu: fix 3rd party builds on systems with cmake<3.25 (#135767 ) The CMake LINUX variable is available only starting from CMake 3.25. It is better to use CMAKE_SYSTEM_NAME instead to relax the CMake version requirement. See: https://cmake.org/cmake/help/v3.25/variable/LINUX.html Fixes: #135766 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135767 Approved by: https://github.com/malfet, https://github.com/guangyey	2024-09-12 05:31:01 +00:00
CaoE	f7c0c06692	Add oneDNN BRGEMM support on CPU (#131878 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/131878 Approved by: https://github.com/jgong5, https://github.com/peterbell10	2024-09-07 13:22:30 +00:00
min-jean-cho	ecbd715363	[Intel GPU][Windows] Fix overriding default CMAKE_CXX_FLAGS (#135093 ) The root cause is that `/EHsc` is part of the default `CMAKE_CXX_FLAGS` in CMake. Fix to not override the default `CMAKE_CXX_FLAGS`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/135093 Approved by: https://github.com/EikanWang, https://github.com/atalman	2024-09-05 12:52:43 +00:00
Edward Z. Yang	a258844a32	Properly handle empty CPUINFO variable (#134916 ) Fixes https://github.com/pytorch/pytorch/issues/134915 But I did not root cause why CPUINFO is totally empty to begin with... Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/134916 Approved by: https://github.com/Skylion007	2024-09-03 15:59:59 +00:00
Yu, Guangye	3402a5d865	fix windows xpu build issue (#133845 ) # Motivation If build XPU via oneAPI 2024.2, it will fail because `sycl-preview.lib` exists in windows. And linking the unexpected lib results in `error LNK2019: unresolved external symbol`. # Solution Use explicitly `sycl-preview` in linux build only. # Additional Context For `find_library`, please note that the variable will not be updated if it has been stored. ``` If the library is found the result is stored in the variable and the search will not be repeated unless the variable is cleared. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/133845 Approved by: https://github.com/min-jean-cho, https://github.com/EikanWang, https://github.com/atalman, https://github.com/malfet	2024-08-29 23:53:32 +00:00
Zitong Zhan	90c821814e	SparseCsrCUDA: cuDSS backend for linalg.solve (#129856 ) This PR switches to cuDSS library and has the same purpose of #127692, which is to add Sparse CSR tensor support to linalg.solve. Fixes #69538 Minimum example of usage: ``` import torch if __name__ == '__main__': spd = torch.rand(4, 3) A = spd.T @ spd b = torch.rand(3).to(torch.float64).cuda() A = A.to_sparse_csr().to(torch.float64).cuda() x = torch.linalg.solve(A, b) print((A @ x - b).norm()) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129856 Approved by: https://github.com/amjames, https://github.com/lezcano, https://github.com/huydhn Co-authored-by: Zihang Fang <zhfang1108@gmail.com> Co-authored-by: Huy Do <huydhn@gmail.com>	2024-08-22 07:57:30 +00:00
Mikayla Gawarecki	018e48c337	[Reland] Add wrappers for synchronous GPUDirect Storage APIs (#133489 ) Reland #130633 USE_CUFILE turned off by default in this version Pull Request resolved: https://github.com/pytorch/pytorch/pull/133489 Approved by: https://github.com/albanD	2024-08-15 17:11:52 +00:00
Yu, Guangye	92bebb46fa	Support XPU ABI=0 build (#130110 ) # Motivation This PR intends to support ABI=0 build for XPU backend. # Additional Context The major change is adding a compilation option `-D__INTEL_PREVIEW_BREAKING_CHANGES` for the host compiler(gcc) and `-fpreview-breaking-changes` for XPU device kernel code compiler(icpx), why? Because we use - gcc to compile host code and link SYCL runtime. So we need to pass `-D__INTEL_PREVIEW_BREAKING_CHANGES` to tell the host compiler invoking the ABI-neutral API included in SYCL. And - use icpx to compile device kernel code and link SYCL runtime. So we need to pass `-fpreview-breaking-changes` to tell the device kernel compiler building ABI-neutral code. Besides, - `libsycl-preview.so` is an ABI-neutral library but `libsycl.so` is not. This PR depends on https://github.com/pytorch/pytorch/pull/131643. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130110 Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/albanD	2024-08-01 21:42:14 +00:00
PyTorch MergeBot	e191b83462	Revert "Add wrappers for synchronous GPUDirect Storage APIs (#130633 )" This reverts commit 709ddf7a9dcfa1268848b72f6f56b55afa6728d6. Reverted https://github.com/pytorch/pytorch/pull/130633 on behalf of https://github.com/clee2000 due to still failing internally D60265673 ([comment](https://github.com/pytorch/pytorch/pull/130633#issuecomment-2253239607))	2024-07-26 18:08:20 +00:00
Mikayla Gawarecki	709ddf7a9d	Add wrappers for synchronous GPUDirect Storage APIs (#130633 ) Based in part on https://github.com/NVIDIA/apex/pull/1774 Differential Revision: [D60155434](https://our.internmc.facebook.com/intern/diff/D60155434) Pull Request resolved: https://github.com/pytorch/pytorch/pull/130633 Approved by: https://github.com/albanD	2024-07-25 22:23:38 +00:00
PyTorch MergeBot	e4b5645f83	Revert "Add wrappers for synchronous GPUDirect Storage APIs (#130633 )" This reverts commit 5b5e0698a5f560decb9bbdd150ed7b0622eb7777. Reverted https://github.com/pytorch/pytorch/pull/130633 on behalf of https://github.com/clee2000 due to breaking a lot of jobs and build rules internally D60085885, possibly needs to update some bazel build? ([comment](https://github.com/pytorch/pytorch/pull/130633#issuecomment-2245806738))	2024-07-23 17:19:34 +00:00
Mikayla Gawarecki	5b5e0698a5	Add wrappers for synchronous GPUDirect Storage APIs (#130633 ) Based in part on https://github.com/NVIDIA/apex/pull/1774 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130633 Approved by: https://github.com/albanD	2024-07-22 14:51:24 +00:00
Xu Han	f1456c74a0	Fix mkl-static issue for Windows. (#130697 ) Background: We found the pytorch Windows release/2.4 performance regression: https://github.com/pytorch/pytorch/issues/130619 After some debug works, I found the pytorch Windows static mkl build options are wrong: <img width="1049" alt="image" src="https://github.com/user-attachments/assets/38692142-bfca-4c98-8092-6e105c82bb13"> 1. Thread lib is wrong. 2. Miss `openmp` lib and config. > Debug history: https://github.com/pytorch/pytorch/issues/130619#issuecomment-2226782504 and https://github.com/pytorch/pytorch/issues/130619#issuecomment-2226418611 This PR will fix `mkl-static` build options issue. <img width="863" alt="image" src="https://github.com/user-attachments/assets/834f6cee-7e6d-4d74-b2bc-8a270f05e429"> Reference: <img width="482" alt="image" src="https://github.com/user-attachments/assets/8184dadb-f230-4062-a49f-51df1d7285f5"> https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html#gs.c6izlg Pull Request resolved: https://github.com/pytorch/pytorch/pull/130697 Approved by: https://github.com/jgong5, https://github.com/atalman	2024-07-15 19:28:11 +00:00
Nikita Shulga	fe4032fe20	[BE][CMake] Do not use `EXEC_PROGRAM` (#129714 ) It was deprecated since CMake-3.0 in favor of `execute_process`, see https://cmake.org/cmake/help/v3.18/command/exec_program.html This makes the following warning disappear: ``` CMake Warning (dev) at cmake/Modules/FindARM.cmake:5 (EXEC_PROGRAM): Policy CMP0153 is not set: The exec_program command should not be called. Run "cmake --help-policy CMP0153" for policy details. Use the cmake_policy command to set the policy and suppress this warning. Use execute_process() instead. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129714 Approved by: https://github.com/kit1980	2024-06-28 13:29:52 +00:00

237 Commits