pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Author	SHA1	Message	Date
Nikita Shulga	9a3c4b917e	[CMake] Remove forcing of `-O2` from `torch_compile_options` (#164894 ) The forcing was introduced by `75a65ffe0f`. Hat tip to @jathu for alerting me about the issue. As a result, all our PyTorch builds were shipped with `-O2` for almost all of its modern history. Partially undo the damage introduced by https://github.com/pytorch/pytorch/pull/128406 that caused cross-ISA symbol leaks, to be properly followed up in https://github.com/pytorch/pytorch/issues/165123 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164894 Approved by: https://github.com/ezyang	2025-10-10 04:43:53 +00:00
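GCC and Clang use the last `-O` option that appears on a command line. A forced `-O2` that is appended through per-target compile options can therefore silently override an earlier `-O3`, such as the one in CMake's default GCC/Clang Release flags. The sketch below models only that last-one-wins rule. The helper name and the sample command lines are illustrative assumptions. The real flag order in a CMake-generated command depends on the generator and configuration.

```python
import re

# Matches GCC/Clang optimization options: -O, -O0..-O9, -Os, -Oz, -Og, -Ofast.
_OPT_FLAG = re.compile(r"-O(?:[0-9]|s|z|g|fast)?")


def effective_opt_level(args):
    """Return the last optimization flag in `args` (last one wins), or None."""
    level = None
    for arg in args:
        if _OPT_FLAG.fullmatch(arg):
            level = arg
    return level


# Hypothetical compile line: Release flags first, per-target options appended later.
release_cmd = ["c++", "-O3", "-DNDEBUG", "-O2", "-c", "Foo.cpp"]
print(effective_opt_level(release_cmd))  # -O2
print(effective_opt_level(["c++", "-O3", "-DNDEBUG", "-c", "Foo.cpp"]))  # -O3
```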
Robert Hardwick	1aeac304b8	Move prioritized text linker optimization code from setup.py to cmake (#160078 ) Note. This is a replica PR of #155901 which will be closed. I had to create a new PR in order to add it into my ghstack as there are some later commits which depend on it. ### Summary 🚀 This PR moves the prioritized text linker optimization from setup.py to cmake (and enables it by default on Linux aarch64 systems). This change consolidates what was previously manual CI logic into a single location (cmake), ensuring consistent behavior across local builds, CI pipelines, and developer environments. ### Motivation Prioritized text layout has measurable performance benefits on Arm systems by reducing code padding and improving cache utilization. This optimization was previously triggered manually via CI scripts (.ci/aarch64_linux/aarch64_ci_build.sh) or user-set environment variables. By detecting the target architecture within setup.py, this change enables the optimization automatically where applicable, improving maintainability and usability. Note: Due to ninja/cmake graph generation issues, we cannot apply the linker file globally to all targets, so the targets must be defined manually. See CMakeLists.txt: the main libraries torch_python, torch, torch_cpu, torch_cuda, and torch_xpu have been targeted, which should be enough to maintain the performance benefits outlined above. Co-authored-by: Usamah Zaheer <usamah.zaheer@arm.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/160078 Approved by: https://github.com/seemethere	2025-09-18 17:09:48 +00:00
PyTorch MergeBot	94db2ad51d	Revert "Move prioritized text linker optimization code from setup.py to cmake (#160078 )" This reverts commit 26b3ae58908becbb03b28636f7384d2972a8c9a5. Reverted https://github.com/pytorch/pytorch/pull/160078 on behalf of https://github.com/atalman due to Sorry reverting this broke linux aarch64 CUDA nightlies [pytorch/pytorch/actions/runs/17637486681/job/50146967503](https://github.com/pytorch/pytorch/actions/runs/17637486681/job/50146967503) ([comment](https://github.com/pytorch/pytorch/pull/160078#issuecomment-3281426631))	2025-09-11 15:29:29 +00:00
Robert Hardwick	26b3ae5890	Move prioritized text linker optimization code from setup.py to cmake (#160078 ) Note. This is a replica PR of #155901 which will be closed. I had to create a new PR in order to add it into my ghstack as there are some later commits which depend on it. ### Summary 🚀 This PR moves the prioritized text linker optimization from setup.py to cmake (and enables it by default on Linux aarch64 systems). This change consolidates what was previously manual CI logic into a single location (cmake), ensuring consistent behavior across local builds, CI pipelines, and developer environments. ### Motivation Prioritized text layout has measurable performance benefits on Arm systems by reducing code padding and improving cache utilization. This optimization was previously triggered manually via CI scripts (.ci/aarch64_linux/aarch64_ci_build.sh) or user-set environment variables. By detecting the target architecture within setup.py, this change enables the optimization automatically where applicable, improving maintainability and usability. Note: Due to ninja/cmake graph generation issues, we cannot apply the linker file globally to all targets, so the targets must be defined manually. See CMakeLists.txt: the main libraries torch_python, torch, torch_cpu, torch_cuda, and torch_xpu have been targeted, which should be enough to maintain the performance benefits outlined above. Co-authored-by: Usamah Zaheer <usamah.zaheer@arm.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/160078 Approved by: https://github.com/seemethere	2025-09-10 09:21:53 +00:00
atalman	bffc7dd1f3	[CD] Add cuda 13.0 libtorch builds, remove CUDA 12.9 builds (#161916 ) Related to https://github.com/pytorch/pytorch/issues/159779 Adding CUDA 13.0 libtorch builds, followup after https://github.com/pytorch/pytorch/pull/160956 Removing CUDA 12.9 builds, See https://github.com/pytorch/pytorch/issues/159980 Pull Request resolved: https://github.com/pytorch/pytorch/pull/161916 Approved by: https://github.com/jeanschmidt, https://github.com/Skylion007 Co-authored-by: Ting Lu <tingl@nvidia.com>	2025-09-05 07:47:54 +00:00
Scott Todd	ee89cc7a0a	[ROCm][Windows] Fix LoadHIP handling of environment variable paths on Windows. (#159080 ) See https://cmake.org/cmake/help/latest/command/file.html#path-conversion. Paths stored in environment variables may use `/` or `\` (e.g. on Windows), while cmake-style paths always use `/`. This fixes configure errors like: ``` CMake Error at D:/b/pytorch_main/build/CMakeFiles/CMakeScratch/TryCompile-srhq07/CMakeLists.txt:2 (set): Syntax error in cmake code at D:/b/pytorch_main/build/CMakeFiles/CMakeScratch/TryCompile-srhq07/CMakeLists.txt:2 when parsing string D:\projects\TheRock\external-builds\pytorch\.venv\Lib\site-packages\_rocm_sdk_devel/cmake/;D:/b/pytorch_main/cmake/Modules Invalid character escape '\p'. CMake Error at D:/projects/TheRock/external-builds/pytorch/.venv/Lib/site-packages/cmake/data/share/cmake-3.31/Modules/Internal/CheckSourceCompiles.cmake:108 (try_compile): Failed to configure test project build system. ``` (note the mixed usage of `\` and `/` in that string) Pull Request resolved: https://github.com/pytorch/pytorch/pull/159080 Approved by: https://github.com/jeffdaily	2025-08-12 00:18:19 +00:00
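CMake's `file(TO_CMAKE_PATH ...)` converts a native path, or a native search-path list, into CMake's forward-slash form. After that conversion, later string parsing no longer sees sequences like `\p` as escapes. As a rough Python analogy only, the same normalization of a `;`-separated Windows path list can be written with `pathlib.PureWindowsPath.as_posix()`. This is not the CMake change from the PR, and the function name is an assumption. The sample value is the mixed-separator string from the error above. One difference: unlike CMake, `pathlib` also drops trailing separators.

```python
from pathlib import PureWindowsPath


def to_cmake_path_list(native: str) -> list:
    """Convert a ';'-separated Windows path list into forward-slash paths.

    Rough analogy of CMake's file(TO_CMAKE_PATH ...): every backslash
    becomes '/', so no '\\p'-style escape sequences survive.
    """
    return [PureWindowsPath(p).as_posix() for p in native.split(";") if p]


env_value = (
    r"D:\projects\TheRock\external-builds\pytorch\.venv\Lib\site-packages"
    r"\_rocm_sdk_devel/cmake/;D:/b/pytorch_main/cmake/Modules"
)
print(to_cmake_path_list(env_value))
```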
Nikita Shulga	0df78f0c11	Remove `/d2implyavx512upperregs-` flag (#159431 ) And reopen https://github.com/pytorch/pytorch/issues/145702, as this flag is not documented anywhere and slows down sccache-accelerated builds. Per https://developercommunity.visualstudio.com/t/Invalid-code-gen-when-using-AVX2-and-SSE/10527298#T-N10562579, it does not work around a compiler bug, but rather disables some optimizations of AVX512 instructions that are being invoked in the AVX2 codepath. Fixes https://github.com/pytorch/pytorch/issues/159082 Pull Request resolved: https://github.com/pytorch/pytorch/pull/159431 Approved by: https://github.com/clee2000	2025-07-30 18:47:03 +00:00
cyy	65c1109ca2	Remove CUDA 11 CMake code (#156795 ) CUDA 11 is no longer supported. Pull Request resolved: https://github.com/pytorch/pytorch/pull/156795 Approved by: https://github.com/atalman, https://github.com/malfet	2025-07-24 08:00:41 +00:00
Nikita Shulga	abe0c9538a	[BE] Fix extra-semi warnings (#158730 ) And prevent new ones from appearing by removing `-Wno-error=extra-semi` (not sure what the reason was behind enabling the warning but not erroring on it when building with -Werror, introduced by https://github.com/pytorch/pytorch/pull/140236 ). 300+ violations of that rule were fixed by running `sed -i -e "s/});/})/" /` against `torch/nativert`. Other 3p deps that need updates: - TensorPipe - LLVM - FBGEMM Pull Request resolved: https://github.com/pytorch/pytorch/pull/158730 Approved by: https://github.com/Skylion007	2025-07-22 01:05:03 +00:00
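The `sed` expression `s/});/})/` has no `g` flag, so it rewrites only the first `});` on each line to `})`, dropping the trailing semicolon. Below is a Python equivalent of just that substitution, which could be used to preview the change before running `sed -i`. The macro name `REGISTER_THING` is hypothetical. Whether a given `;` is really redundant still depends on how the surrounding code or macro expands.

```python
import re


def drop_semicolon_after_brace_paren(text: str) -> str:
    """Mirror `sed -e "s/});/})/"`: per line, rewrite the first '});' to '})'."""
    return "\n".join(
        re.sub(r"\}\);", "})", line, count=1) for line in text.split("\n")
    )


# Hypothetical macro invocation whose trailing ';' would be reported as extra.
before = "REGISTER_THING(foo, [] {\n  return 1;\n});"
print(drop_semicolon_after_brace_paren(before))
```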
Yu, Guangye	cbe1cb7018	[CMake] Move xpu flag to xpu.cmake (#158542 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/158542 Approved by: https://github.com/gujinghui, https://github.com/ezyang	2025-07-21 17:19:59 +00:00
tvukovic-amd	a23f4471b9	[ROCm][Windows] Fix finding ROCm/HIP version (#156486 ) This commit fixes a Windows build issue caused by trying to use rocm-core (rocm-core doesn't exist in the HIP SDK). Pull Request resolved: https://github.com/pytorch/pytorch/pull/156486 Approved by: https://github.com/jeffdaily, https://github.com/stellaraccident	2025-07-16 15:31:43 +00:00
dsashidh	4ff9b7fa31	Fix diagnostic message for CUDA version mismatch in cuda.cmake (#157370 ) This PR fixes #157354 It fixes the issue in 'cmake/public/cuda.cmake' where a diagnostic message incorrectly showed an empty CUDA version when 'FindCUDA' and header-reported versions differed. The problem was caused by this line: set(${cuda_version_from_findcuda} ${CUDA_VERSION_STRING}) This incorrectly used the value of cuda_version_from_findcuda as a variable name. As a result the version string wasn't assigned and the error message omitted the version. This has been corrected to: set(cuda_version_from_findcuda ${CUDA_VERSION_STRING}) Now the diagnostic message properly displays the CUDA version reported by FindCUDA. Pull Request resolved: https://github.com/pytorch/pytorch/pull/157370 Approved by: https://github.com/soulitzer	2025-07-11 20:58:35 +00:00
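The bug is a CMake variable-indirection mistake, and a small toy model makes it concrete. In CMake, `${x}` expands to the value of `x`, and an unquoted argument that expands to nothing is dropped. `set(${cuda_version_from_findcuda} ...)` therefore names the variable after whatever `cuda_version_from_findcuda` held. The Python below is only a simplified model of that expansion, not CMake itself. It assumes `cuda_version_from_findcuda` was unset beforehand, and `12.8` is an arbitrary sample version.

```python
def expand(scope: dict, token: str) -> str:
    """Expand one token: '${x}' becomes x's value ('' if unset)."""
    if token.startswith("${") and token.endswith("}"):
        return scope.get(token[2:-1], "")
    return token


def cmake_set(scope: dict, args: list) -> None:
    """Toy model of set(<name> [<value>...]) after argument expansion."""
    args = [a for a in args if a != ""]  # empty unquoted expansions vanish
    if not args:
        raise ValueError("set() needs at least a variable name")
    name, values = args[0], args[1:]
    if values:
        scope[name] = ";".join(values)
    else:
        scope.pop(name, None)  # set(<name>) with no value unsets <name>


scope = {"CUDA_VERSION_STRING": "12.8"}

buggy = ["${cuda_version_from_findcuda}", "${CUDA_VERSION_STRING}"]
cmake_set(scope, [expand(scope, t) for t in buggy])  # becomes set(12.8)
print(repr(scope.get("cuda_version_from_findcuda", "")))  # ''

fixed = ["cuda_version_from_findcuda", "${CUDA_VERSION_STRING}"]
cmake_set(scope, [expand(scope, t) for t in fixed])
print(scope["cuda_version_from_findcuda"])  # 12.8
```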
cyy	7381c77724	Use CMake wholearchive group (#156393 ) Use CMake wholearchive group to simplify code. It may also support more OSes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/156393 Approved by: https://github.com/ezyang	2025-07-08 12:20:29 +00:00
cyy	7c1f627828	Fix 'dllimport attribute ignored on inline function' (#157670 ) There are lots of warnings in builds:
```
2025-07-05T16:59:46.9208806Z C:\actions-runner\_work\pytorch\pytorch\build\aten\src\ATen\core\TensorBody.h(5043,29): warning: 'at::Tensor::less_' redeclared inline; 'dllimport' attribute ignored [-Wignored-attributes]
2025-07-05T16:59:46.9209030Z 5043 | inline at::Tensor & Tensor::less_(const at::Scalar & other) const {
2025-07-05T16:59:46.9209104Z | ^
2025-07-05T16:59:46.9209671Z C:\actions-runner\_work\pytorch\pytorch\build\aten\src\ATen\core\TensorBody.h(5048,29): warning: 'at::Tensor::less_' redeclared inline; 'dllimport' attribute ignored [-Wignored-attributes]
2025-07-05T16:59:46.9209860Z 5048 | inline at::Tensor & Tensor::less_(const at::Tensor & other) const
```
This PR has fixed them and turned the warning into an error. Pull Request resolved: https://github.com/pytorch/pytorch/pull/157670 Approved by: https://github.com/albanD	2025-07-07 16:57:48 +00:00
Erik Schultheis	cdb144fcf0	Display a warning when overwriting `CMAKE_CUDA_ARCHITECTURES` (#156123 ) Really, pytorch shouldn't be messing with basic _global_ cmake configuration like this, but without a careful analysis of what depends on this behaviour, I'm not confident enough to propose a change. But at least notifying the user that something wonky is going on seems like a good idea. @drisspg Pull Request resolved: https://github.com/pytorch/pytorch/pull/156123 Approved by: https://github.com/drisspg, https://github.com/msaroufim Co-authored-by: Mark Saroufim <marksaroufim@meta.com>	2025-06-28 11:22:09 +00:00
cyy	ce1a07570d	Fix TORCH_CUDA_ARCH_LIST (#156667 ) Before the fix, the `TORCH_CUDA_ARCH_LIST` variable contained the string `TORCH_CUDA_ARCH_LIST`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/156667 Approved by: https://github.com/ngimel	2025-06-24 07:27:53 +00:00
tvukovic-amd	b2d473c8f8	[ROCm][Windows] Fix rocsolver undefined symbol error (#156591 ) Fix undefined symbol error while using `rocsolver_ssyevd_strided_batched` call in `aten/src/ATen/native/cuda/linalg/BatchLinearAlgebraLib.cpp`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/156591 Approved by: https://github.com/jeffdaily	2025-06-24 03:28:45 +00:00
PyTorch MergeBot	b1d62febd0	Revert "Use official CUDAToolkit module in CMake (#154595 )" This reverts commit 08dae945ae380d80efbaf140a95abfc5d96e5100. Reverted https://github.com/pytorch/pytorch/pull/154595 on behalf of https://github.com/malfet due to It breaks on some local setup with no clear diagnostic, but looks like it fails to find cuFile ([comment](https://github.com/pytorch/pytorch/pull/154595#issuecomment-2997959344))	2025-06-23 21:15:31 +00:00
PyTorch MergeBot	4f70fbbd16	Revert "Use CMake wholearchive group (#156393 )" This reverts commit d1b4e0fa9a5feb22fc6de1d36dc4c9dac685caed. Reverted https://github.com/pytorch/pytorch/pull/156393 on behalf of https://github.com/etaf due to This PR is breaking XPU windows build. ([comment](https://github.com/pytorch/pytorch/pull/156393#issuecomment-2995576362))	2025-06-23 09:03:19 +00:00
cyy	d1b4e0fa9a	Use CMake wholearchive group (#156393 ) Use CMake wholearchive group to simplify code. It may also support more OSes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/156393 Approved by: https://github.com/ezyang	2025-06-23 06:22:34 +00:00
cyy	08dae945ae	Use official CUDAToolkit module in CMake (#154595 ) Use CUDA language in CMake and remove forked FindCUDAToolkit.cmake. Some CUDA targets are also renamed with `torch::` prefix. Pull Request resolved: https://github.com/pytorch/pytorch/pull/154595 Approved by: https://github.com/albanD	2025-06-22 05:44:29 +00:00
Jeff Daily	30d3cf62fb	support CUBLASLT_MATMUL_MATRIX_SCALE_OUTER_VEC_32F (#154680 ) Requires CUDA >= 12.9 and sm_90. hipBLASLt has a similar enum but is not available until ROCm 7.0. Support the new enum early using a cmake test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/154680 Approved by: https://github.com/malfet, https://github.com/atalman	2025-06-18 18:39:01 +00:00
Xuehai Pan	ccea6ddac3	[BE] fix typos in cmake/ (#156079 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/156079 Approved by: https://github.com/Skylion007	2025-06-17 19:25:43 +00:00
Wei Wang	ab56e5add9	[CUDA][BUILD] Add back the capability to use env TORCH_CUDA_ARCH_LIST (#155314 ) Add back the capability to use the TORCH_CUDA_ARCH_LIST environment variable to control how downstream projects (which use find_package (torch)) build. Follow up to: https://github.com/pytorch/pytorch/pull/152715 Before this PR, on a CPU-only machine, building a downstream project would ignore the TORCH_CUDA_ARCH_LIST setting (if set) and go straight to the auto GPU detection mode, in which case no GPU would be detected and an excessive list of CUDA architectures might be used. This also meant there was no way to build a binary targeting a different SM than the one on the machine the developer is using. After this PR, TORCH_CUDA_ARCH_LIST lets developers explicitly control which SM architectures to build. p.s. I think this PR might have been the original intent of https://github.com/pytorch/pytorch/pull/152715 Pull Request resolved: https://github.com/pytorch/pytorch/pull/155314 Approved by: https://github.com/janeyx99, https://github.com/eqy, https://github.com/atalman	2025-06-07 15:52:39 +00:00
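The behaviour described above reduces to a precedence rule. An explicitly set `TORCH_CUDA_ARCH_LIST` wins, and GPU autodetection is only a fallback. The sketch below shows that rule. It is not the logic in PyTorch's CMake files: the function name, the acceptance of both `;` and space separators, and the fallback tuple are illustrative assumptions.

```python
import os


def resolve_cuda_arch_list(environ=None, detected=None, fallback=("8.0", "9.0")):
    """Pick CUDA architectures: explicit env var, then detected GPUs, then a fallback.

    The name and the fallback tuple are hypothetical placeholders.
    """
    environ = os.environ if environ is None else environ
    explicit = environ.get("TORCH_CUDA_ARCH_LIST", "").strip()
    if explicit:
        # Accept both "8.0;9.0" and "8.0 9.0" style separators.
        return explicit.replace(";", " ").split()
    if detected:
        return list(detected)
    return list(fallback)


# CPU-only machine: nothing is detected, but the developer targets sm_90 explicitly.
print(resolve_cuda_arch_list({"TORCH_CUDA_ARCH_LIST": "9.0"}, detected=[]))  # ['9.0']
```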
Stella Laurenzo	10cd1de518	[ROCm] Make optional features in LoadHIP better conditioned. (#155305 ) * The `rocm-core` CMake package only started appearing in ROCm 6.4, so rework the version probing to work if it is not present. Also collapses the unneeded operating system conditioning in favor of feature probing. * Make `hipsparselt` optional: it only started appearing in ROCm 6.4 and it is not in all recent distribution channels yet. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155305 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-06-07 02:20:55 +00:00
Yu, Guangye	10cef1e25d	Remove torch XPU ABI=0 build logic for old compiler (#150095 ) # Motivation Follow https://github.com/pytorch/pytorch/pull/149888, this PR intends to remove ABI=0 build logic for PyTorch XPU build with old compiler (< 2025.0). For newer compilers >= 2025.0, the ABI is neutral by default without requiring additional compilation options (`-fpreview-breaking-changes`). # Additional Context This PR depends on XPU CI pass, which will be fixed by https://github.com/pytorch/pytorch/pull/149843 and https://github.com/intel/torch-xpu-ops/pull/1515 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150095 Approved by: https://github.com/EikanWang, https://github.com/malfet	2025-06-06 13:13:19 +00:00
Peter Y. Yeh	43390d8b13	ROCm Sparsity through HipSparseLT (#150578 ) TLDR:
- This pull request introduces support for hipSPARSELt in ROCm; current usage is semi-structured sparsity.
- Requires ROCm 6.4 && gfx942/gfx950.
- The average performance uplift (compared to the dense operation) is ~20% in ROCm 6.4, with further improvements expected.

### Dense vs. Sparse Performance Comparison

#### NT (Row-major)
Average Uplift: `1.20`

| M | N | K | hipsparselt-bench (us) | hipblaslt-bench get all (us) | Uplift |
|-------|--------|--------|-------------------------|-------------------------------|--------|
| 14336 | 8 | 4096 | 20.05 | 25.3 | 1.26 |
| 4096 | 8 | 14336 | 21.07 | 25.28 | 1.20 |
| 3072 | 3072 | 10240 | 299.05 | 351.82 | 1.18 |
| 3072 | 1536 | 768 | 18.56 | 20.05 | 1.08 |
| 3072 | 17664 | 768 | 163.13 | 173.91 | 1.07 |
| 3072 | 196608 | 768 | 1717.30 | 1949.63 | 1.14 |
| 3072 | 24576 | 768 | 206.84 | 242.98 | 1.17 |
| 3072 | 6144 | 768 | 53.90 | 56.88 | 1.06 |
| 3072 | 98304 | 768 | 833.77 | 962.28 | 1.15 |
| 768 | 1536 | 768 | 8.53 | 19.65 | 2.30 |
| 768 | 17664 | 768 | 46.02 | 46.84 | 1.02 |
| 768 | 196608 | 768 | 463.15 | 540.46 | 1.17 |
| 768 | 24576 | 768 | 54.32 | 59.55 | 1.10 |
| 768 | 6144 | 768 | 19.47 | 20.15 | 1.03 |
| 768 | 98304 | 768 | 231.88 | 258.73 | 1.12 |

---

#### NN (Row-major)
Average Uplift: `1.13`

| M | N | K | hipsparselt-bench (us) | hipblaslt-bench get all (us) | Uplift |
|-----|--------|-------|-------------------------|-------------------------------|--------|
| 768 | 1536 | 3072 | 27.50 | 28.78 | 1.05 |
| 768 | 17664 | 3072 | 125.06 | 158.94 | 1.27 |
| 768 | 196608 | 3072 | 1568.38 | 1767.12 | 1.13 |
| 768 | 24576 | 3072 | 171.05 | 203.49 | 1.19 |
| 768 | 6144 | 3072 | 58.72 | 60.39 | 1.03 |
| 768 | 98304 | 3072 | 787.15 | 887.60 | 1.13 |
-------------------------

This pull request introduces support for hipSPARSELt in ROCm, alongside various updates and improvements to the codebase and test suite. The changes primarily involve adding configuration flags, updating conditional checks, and ensuring compatibility with hipSPARSELt.

### ROCm and hipSPARSELt Support:
* `BUILD.bazel`: Added `@AT_HIPSPARSELT_ENABLED@` substitution to enable hipSPARSELt support.
* `aten/CMakeLists.txt`: Introduced a conditional flag to enable hipSPARSELt support based on ROCm version.
* `aten/src/ATen/CMakeLists.txt`: Added `AT_HIPSPARSELT_ENABLED` configuration.
* `aten/src/ATen/cuda/CUDAConfig.h.in`: Defined `AT_HIPSPARSELT_ENABLED` macro.
* `caffe2/CMakeLists.txt`, `cmake/Dependencies.cmake`, `cmake/public/LoadHIP.cmake`: Included hipSPARSELt in the ROCm dependencies.

### Codebase Updates:
* `aten/src/ATen/native/sparse/cuda/cuSPARSELtOps.cpp`: Added hipSPARSELt support checks and initialization functions. Updated various methods to conditionally handle hipSPARSELt.

### Test Suite Updates:
* `test/test_sparse_semi_structured.py`: Added checks for hipSPARSELt availability and updated test conditions to skip tests not supported on ROCm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150578 Approved by: https://github.com/jeffdaily	2025-05-31 02:03:40 +00:00
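In the benchmark tables, the Uplift column is the hipBLASLt (dense) time divided by the hipSPARSELt (sparse) time, so values above 1 mean the sparse kernel is faster. The snippet recomputes the NN (row-major) numbers from the table. Treating "Average Uplift" as a plain arithmetic mean of the per-row ratios is an assumption, though it does reproduce the reported `1.13`.

```python
from statistics import mean

# (hipsparselt-bench us, hipblaslt-bench us) for the NN (row-major) rows.
nn_times = [
    (27.50, 28.78),
    (125.06, 158.94),
    (1568.38, 1767.12),
    (171.05, 203.49),
    (58.72, 60.39),
    (787.15, 887.60),
]

# Uplift = dense time / sparse time.
uplifts = [dense / sparse for sparse, dense in nn_times]
print([round(u, 2) for u in uplifts])  # [1.05, 1.27, 1.13, 1.19, 1.03, 1.13]
print(round(mean(uplifts), 2))  # 1.13
```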
cyy	8fe7ec6721	Add /Zc:preprocessor for torch libraries in MSVC builds (#147825 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/147825 Approved by: https://github.com/janeyx99	2025-05-24 06:57:46 +00:00
Xu Han	7421c21b5e	remove unused code. (#153979 ) Remove the unused cmake code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/153979 Approved by: https://github.com/albanD	2025-05-22 17:50:11 +00:00
Xu Han	179e7d8624	Fix vs2022 caused AVX512 illegal instruction issue. (#153480 ) Fixes #145702 Add `/d2implyavx512upperregs-` to disable an over-aggressive compiler optimization that caused AVX512 registers to be used on AVX2 machines. Reference: https://github.com/pytorch/pytorch/issues/145702#issuecomment-2874029459 Local test passed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/153480 Approved by: https://github.com/Blackhex, https://github.com/cyyever, https://github.com/atalman	2025-05-20 20:37:00 +00:00
Xu Han	2ade886412	[XPU] [Windows] Auto turn on kineto XPU build when compiler version support. (#153681 ) Starting with SYCL compiler 20250101, the dependency on the Level Zero header is removed, so we can turn on kineto XPU by default. Pull Request resolved: https://github.com/pytorch/pytorch/pull/153681 Approved by: https://github.com/chuanqi129, https://github.com/cyyever, https://github.com/EikanWang	2025-05-19 03:07:14 +00:00
Jithun Nair	f11d7a5978	[ROCm] Update spack includes (#152569 ) * Cleans up code in `caffe2/CMakeLists.txt` to remove individual ROCm library include paths and use `ROCM_INCLUDE_DIRS` CMake var instead * `ROCM_INCLUDE_DIRS` CMake var is set in `cmake/public/LoadHIP.cmake` by adding all the ROCm packages that PyTorch depends on * `rocm_version.h` is provided by the `rocm-core` package, so use the include directory for that component to be compliant with Spack * Move `find_package_and_print_version(hip REQUIRED CONFIG)` earlier so that `hip_version.h` can be located in the hip package include dir for Spack * `list(REMOVE_DUPLICATES ROCM_INCLUDE_DIRS)` to remove duplicate `/opt/rocm/include` entries in the non-Spack case * Remove user-provided env var `ROCM_INCLUDE_DIRS` since `ROCM_PATH` already exists as a user-provided env var, which should be sufficient to locate the include directories for ROCm. Pull Request resolved: https://github.com/pytorch/pytorch/pull/152569 Approved by: https://github.com/renjithravindrankannath, https://github.com/jeffdaily Co-authored-by: Renjith Ravindran <Renjith.RavindranKannath@amd.com>	2025-05-09 21:36:38 +00:00
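CMake's `list(REMOVE_DUPLICATES ...)` keeps the first occurrence of each item and preserves the relative order of the rest. Order matters for include search paths, which the compiler tries in sequence. The Python equivalent below illustrates that behaviour on `ROCM_INCLUDE_DIRS`-style lists. The Spack-style directories are hypothetical examples, not paths taken from the PR.

```python
def remove_duplicates(items):
    """Order-preserving de-duplication: keep the first occurrence of each item."""
    return list(dict.fromkeys(items))


# Non-Spack case: several ROCm packages report the same prefix.
non_spack = ["/opt/rocm/include", "/opt/rocm/include", "/opt/rocm/include"]
# Hypothetical Spack-style layout: one include dir per package, with a repeat.
spack = ["/spack/hip/include", "/spack/rocm-core/include", "/spack/hip/include"]

print(remove_duplicates(non_spack))  # ['/opt/rocm/include']
print(remove_duplicates(spack))  # ['/spack/hip/include', '/spack/rocm-core/include']
```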
Jane Xu	2107d87dc9	[BE] remove outdated warning about TORCH_CUDA_ARCH_LIST (#152715 ) I saw this warning when compiling a 3rd party lib and did not agree with it. I'm not sure of the original reason why we would want to force people to pass in TORCH_CUDA_ARCH_LIST to cmake vs set it as an env var. As a developer, it's much easier to set it as an env var or have it be autodetected. I also realized this warning was from before 2018!!! 7 years ago! And there are no plans to actually enforce this (nor should there be), so let's remove this misleading warning. Pull Request resolved: https://github.com/pytorch/pytorch/pull/152715 Approved by: https://github.com/malfet, https://github.com/zou3519	2025-05-02 23:00:51 +00:00
cyy	e9e1aacef8	Enable -Wunused on torch targets (#150077 ) For GCC, ``-Wunused`` contains: ``` -Wunused-function Warn whenever a static function is declared but not defined or a non-inline static function is unused. -Wunused-label Warn whenever a label is declared but not used. To suppress this warning use the unused attribute. -Wunused-parameter Warn whenever a function parameter is unused aside from its declaration. To suppress this warning use the unused attribute. -Wunused-variable Warn whenever a local variable or non-constant static variable is unused aside from its declaration. To suppress this warning use the unused attribute. ``` For Clang, some of the diagnostics controlled by ``-Wunused`` are enabled by default: ``` Controls [-Wunused-argument](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-argument), [-Wunused-but-set-variable](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-but-set-variable), [-Wunused-function](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-function), [-Wunused-label](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-label), [-Wunused-lambda-capture](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-lambda-capture), [-Wunused-local-typedef](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-local-typedef), [-Wunused-private-field](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-private-field), [-Wunused-property-ivar](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-property-ivar), [-Wunused-value](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-value), [-Wunused-variable](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-variable). ``` These checks are all useful. This PR aims to enable ``-Wunused`` without breaking code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150077 Approved by: https://github.com/zou3519, https://github.com/wdvr	2025-05-02 07:14:19 +00:00
PyTorch MergeBot	6dadfc4457	Revert "Enable -Wunused on torch targets (#150077 )" This reverts commit 688adc9941f855e78dd4d595682eea16317b7f54. Reverted https://github.com/pytorch/pytorch/pull/150077 on behalf of https://github.com/wdvr due to failing internally with use of undeclared identifier ([comment](https://github.com/pytorch/pytorch/pull/150077#issuecomment-2846499828))	2025-05-02 06:53:20 +00:00
cyy	688adc9941	Enable -Wunused on torch targets (#150077 ) For GCC, ``-Wunused`` contains: ``` -Wunused-function Warn whenever a static function is declared but not defined or a non-inline static function is unused. -Wunused-label Warn whenever a label is declared but not used. To suppress this warning use the unused attribute. -Wunused-parameter Warn whenever a function parameter is unused aside from its declaration. To suppress this warning use the unused attribute. -Wunused-variable Warn whenever a local variable or non-constant static variable is unused aside from its declaration. To suppress this warning use the unused attribute. ``` For Clang, some of the diagnostics controlled by ``-Wunused`` are enabled by default: ``` Controls [-Wunused-argument](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-argument), [-Wunused-but-set-variable](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-but-set-variable), [-Wunused-function](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-function), [-Wunused-label](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-label), [-Wunused-lambda-capture](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-lambda-capture), [-Wunused-local-typedef](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-local-typedef), [-Wunused-private-field](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-private-field), [-Wunused-property-ivar](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-property-ivar), [-Wunused-value](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-value), [-Wunused-variable](https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-variable). ``` These checks are all useful. This PR aims to enable ``-Wunused`` without breaking code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150077 Approved by: https://github.com/zou3519	2025-05-01 04:09:06 +00:00
Anthony Shoumikhin	7cae7902a2	Add scripts to check xrefs and urls (#151844 ) Traverses the docs and code to find any broken links Pull Request resolved: https://github.com/pytorch/pytorch/pull/151844 Approved by: https://github.com/huydhn	2025-04-28 09:30:07 +00:00
Wei Wang	b74be52454	[CUDA][NVTX] Move nvtx3 code from cmake/public/cuda.cmake to cmake/Dependencies.cmake (#151583 ) Fixes [#147220] Context: In the CUDA NVTX world, there are NVTX v2 and NVTX v3. As announced in the CUDA release notes, e.g. [CUDA 12.8 Update 1](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#deprecated-or-dropped-operating-systems): "NVTX v2 is deprecated. To migrate to NVTX v3, change your code from `#include <nvtoolsext.h>` to `#include "nvtx3/nvtoolsext.h"`. This header is included in the toolkit." On the PyTorch side, the TORCH_CUDA_USE_NVTX3 compile-time macro is used, and it is set to true when (most of the time) nvtx3 is found. nvtx3 is found in two cases: 1) With USE_SYSTEM_NVTX=0 (the default), the torch build process automatically looks for nvtx3 in pytorch/third_party/nvtx. This is the most common case. 2) With USE_SYSTEM_NVTX=1, nvtx3 is found from the installed CUDA toolkit (e.g. CUDA 12.8 and even some earlier CUDA versions). As described in #147220, the reason it can find pytorch/third_party/nvtx is that it uses PROJECT_SOURCE_DIR at `6f035d8462/cmake/public/cuda.cmake (L176)`. Before this PR: the PyTorch build would succeed in finding nvtx3 through the process described above, and everything is good. But downstream projects like torchvision can fail, and by default would fail, because of the following: 1) USE_SYSTEM_NVTX=0 is used (most likely, because it is the default) 2) NVTX v2 can no longer be found (e.g. in future CUDA versions, because deprecation will eventually become removal) 3) TorchVision cannot find NVTX3 either, because torchvision invokes [pytorch/cmake/public/cuda.cmake], but there PROJECT_SOURCE_DIR is no longer the pytorch source but the torchvision source! 
4) One workaround is to "USE_SYSTEM_NVTX=1" but users have to explicitly set this and do the plumbing work After this PR: PyTorch can still find nvtx3 because the part of the code that finds nvtx3 is just moved to a new place. The CI logs are showing it being able to find nvtx3. e.g. [this job](https://productionresultssa14.blob.core.windows.net/actions-results/47f8efaa-0afe-4e1f-bc94-0a82629941cb/workflow-job-run-dc8201b1-845b-5da1-a6ea-d3360ce1b508/logs/job/job-logs.txt?rsct=text%2Fplain&se=2025-04-18T20%3A38%3A05Z&sig=yMd6egC%2Banl3lR%2BudXFX18bfUH189z0DTGLtscHQJwY%3D&ske=2025-04-19T06%3A21%3A45Z&skoid=ca7593d4-ee42-46cd-af88-8b886a2f84eb&sks=b&skt=2025-04-18T18%3A21%3A45Z&sktid=398a6654-997b-47e9-b12b-9515b896b4de&skv=2025-01-05&sp=r&spr=https&sr=b&st=2025-04-18T20%3A28%3A00Z&sv=2025-01-05), which reads "`Found nvtx3: C:/actions-runner/_work/pytorch/pytorch/pytorch/third_party/NVTX/c/include`" For torchvision, it still invoke [pytorch/cmake/public/cuda.cmake] but it no longer tries to find nvtx3 as torchvision is not using nvtx3 (if in future it uses, it can set USE_SYSTEM_NVTX=1 by default). So it would avoid the error reported in [#147220] Pull Request resolved: https://github.com/pytorch/pytorch/pull/151583 Approved by: https://github.com/eqy, https://github.com/atalman, https://github.com/malfet	2025-04-18 21:18:09 +00:00
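The NVTX v2 to v3 include migration quoted from the CUDA release notes can be sketched as follows. This is a minimal illustration, not PyTorch code. It assumes the header-only NVTX v3 layout, where the header is spelled `nvtx3/nvToolsExt.h` in NVIDIA's NVTX repository. The `HAVE_NVTX3` macro and the `range_push`, `range_pop`, and `traced_sum` helpers are hypothetical names. When the header is not on the include path, the ranges compile to no-ops.

```cpp
#include <cassert>

// Prefer the header-only NVTX v3 include; the NVTX v2 <nvToolsExt.h> is deprecated.
#if defined(__has_include)
#if __has_include(<nvtx3/nvToolsExt.h>)
#include <nvtx3/nvToolsExt.h>
#define HAVE_NVTX3 1
#endif
#endif
#ifndef HAVE_NVTX3
#define HAVE_NVTX3 0  // hypothetical macro: NVTX v3 headers not found, use no-ops
#endif

// Hypothetical wrappers: emit NVTX ranges only when NVTX v3 was found.
inline void range_push(const char* name) {
#if HAVE_NVTX3
  nvtxRangePushA(name);
#else
  (void)name;
#endif
}

inline void range_pop() {
#if HAVE_NVTX3
  nvtxRangePop();
#endif
}

// Example workload wrapped in a named range (shown on the timeline when profiled).
int traced_sum(int n) {
  range_push("traced_sum");
  int total = 0;
  for (int i = 1; i <= n; ++i) total += i;
  range_pop();
  return total;
}
```

Because NVTX v3 is header-only, the include path alone decides which branch is compiled. That is why the location where CMake searches for `nvtx3` matters for downstream projects.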
Faa Diallo	423e4a4568	[ROCm] cmake 4 workaround for hiprtc (#150324 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150324 Approved by: https://github.com/jeffdaily, https://github.com/atalman, https://github.com/malfet	2025-03-31 21:55:53 +00:00
Nichols A. Romero	c0af782f30	[ROCm] Change LoadHIP to use find_file for rocm_version.h (#149983 ) Fixes #149805 Pull Request resolved: https://github.com/pytorch/pytorch/pull/149983 Approved by: https://github.com/jeffdaily	2025-03-26 21:26:41 +00:00
cyy	79e8a69257	Enable move warnings for torch targets (#149923 ) This PR enables more move warnings for torch targets and fixes some code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149923 Approved by: https://github.com/malfet	2025-03-26 08:38:13 +00:00
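The commit does not say which move warnings it enabled. As an assumption, the sketch below uses two common GCC/Clang diagnostics, `-Wpessimizing-move` and `-Wredundant-move`, to show the kind of code such warnings flag and the usual fix. All function and type names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// -Wpessimizing-move flags `return std::move(v);` on a local, because the
// explicit move disables copy elision (NRVO). Fix: return the local by name.
std::vector<int> make_ones(std::size_t n) {
  std::vector<int> v(n, 1);
  return v;  // NRVO, or an implicit move if elision does not happen
}

// -Wredundant-move flags `return std::move(s);` on a by-value parameter,
// which is already treated as an rvalue on return. Fix: return by name.
std::string with_suffix(std::string s) {
  s += "_suffix";
  return s;  // implicit move from the parameter
}

// An explicit move that *is* needed: sinking a by-value parameter into a member.
struct Holder {
  std::string name;
  explicit Holder(std::string n) : name(std::move(n)) {}
};
```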
Su, Tong	60523540f1	Force build to conform C++ standard on windows by adding /permissive- flag (#149035 ) Fixes #147366 1. Add `/permissive-` to the `torch_compile_options` for the build to conform to the C++ standard. 2. Fix the error when trying to assign a string literal to a non-const ptr. The `/permissive-` flag can be found at https://learn.microsoft.com/en-us/cpp/build/reference/permissive-standards-conformance?view=msvc-170 From the above [doc](https://learn.microsoft.com/en-us/cpp/build/reference/permissive-standards-conformance?view=msvc-170#remarks), > By default, the /permissive- option is set in new projects created by Visual Studio 2017 version 15.5 and later versions. > The /permissive- option is implicitly set by the /std:c++latest option starting in Visual Studio 2019 version 16.8, and in version 16.11 by the /std:c++20 option. Thus, it is reasonable to add this flag to the existing project. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149035 Approved by: https://github.com/guangyey, https://github.com/malfet	2025-03-18 01:51:46 +00:00
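Item 2 refers to the string-literal error that `/permissive-` exposes. The snippet below is an illustrative sketch of that class of error, not the code the commit changed. A string literal has type `const char[N]`, so binding it to a plain `char*` has been ill-formed since C++11. MSVC accepts the conversion in its default permissive mode and rejects it under `/permissive-`, which implies `/Zc:strictStrings`. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Rejected under /permissive- (and ill-formed in ISO C++11 and later):
//   char* greeting = "hello";

// Fix 1: if the string is only read, point to const char.
const char* greeting() {
  const char* s = "hello";
  return s;
}

// Fix 2: if the characters must be modified, copy the literal into an array.
char capitalized_first() {
  char buf[] = "hello";  // writable copy initialized from the literal
  buf[0] = 'H';
  return buf[0];
}
```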
Michal Gallus	b706044cca	[ROCm][Windows] Enable hipblaslt for Windows (#148563 ) This PR adds hipblaslt library as one of the Windows' dependencies. `rocBLAS` is added too, since certain symbols aren't detected with `hipblas` alone on Windows. Pull Request resolved: https://github.com/pytorch/pytorch/pull/148563 Approved by: https://github.com/jeffdaily	2025-03-10 21:07:16 +00:00
cyy	f7c0c230b0	Fix compile errors (#148758 ) Fix ``` /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1/bits/unique_ptr.h:91:16: error: invalid application of 'sizeof' to an incomplete type 'torch::jit::AliasDb::WriteRegistry' 91 | static_assert(sizeof(_Tp)>0, | ^~~~~~~~~~~ /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/14.2.1/../../../../include/c++/14.2.1/bits/unique_ptr.h:399:4: note: in instantiation of member function 'std::default_delete<torch::jit::AliasDb::WriteRegistry>::operator()' requested here 399 | get_deleter()(std::move(__ptr)); | ^ ../torch/csrc/jit/ir/alias_analysis.cpp:200:10: note: in instantiation of member function 'std::unique_ptr<torch::jit::AliasDb::WriteRegistry>::~unique_ptr' requested here 200 | AliasDb::~AliasDb() = default; | ^ ../torch/csrc/jit/ir/alias_analysis.cpp:200:23: note: in defaulted destructor for 'torch::jit::AliasDb' first required here 200 | AliasDb::~AliasDb() = default; | ^ ../torch/csrc/jit/ir/alias_analysis.h:298:10: note: forward declaration of 'torch::jit::AliasDb::WriteRegistry' 298 | struct WriteRegistry; | ^ 1 error generated. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/148758 Approved by: https://github.com/Skylion007	2025-03-08 04:56:42 +00:00
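The log above shows only the diagnostic, not the patch. The general rule behind it is that `std::unique_ptr<T>`'s default deleter `static_assert`s that `T` is complete. A defaulted destructor must therefore be defined at a point where the forward-declared type has been fully defined. Below is a minimal single-file sketch of that pattern. `Db` and `Registry` are hypothetical stand-ins for `AliasDb` and `WriteRegistry`, and this is not the actual fix applied in the commit.

```cpp
#include <cassert>
#include <memory>

// "Header" part: the nested type is only forward-declared here.
class Db {
 public:
  Db();
  ~Db();  // declared only; defining it as `= default` here would hit the static_assert
  int writes() const;

 private:
  struct Registry;  // incomplete at this point
  std::unique_ptr<Registry> registry_;
};

// "Source" part: complete the type before anything that destroys it.
struct Db::Registry {
  int count = 0;
};

Db::Db() : registry_(std::make_unique<Registry>()) { registry_->count = 1; }
Db::~Db() = default;  // OK: Registry is complete here, so default_delete can instantiate
int Db::writes() const { return registry_->count; }
```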
Xiao Wang	976ff5cf01	Add cmake hints to USE_SYSTEM_NVTX for nvtx3 include dir (#147418 ) Per title. Sometimes it's hard for CMake to find NVTX3 without the CUDA include path hint. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147418 Approved by: https://github.com/nWEIdia, https://github.com/malfet	2025-02-26 20:52:28 +00:00
Peter Yeh	81dccd706b	[ROCm] OCP FP8 Support for new GPUs (#146632 ) TL;DR: Follows up on / builds on top of https://github.com/pytorch/pytorch/pull/144476. Adds OCP FP8 support for gfx950; refer to https://github.com/pytorch/ao/pull/1677. This pull request includes several changes to improve compatibility and support for new GPU architectures and data types, particularly for ROCm. The key updates add support for new ROCm versions and GPU architectures, update data type handling, and remove outdated checks. ### Improvements to GPU Architecture and ROCm Version Support: * `aten/src/ATen/Context.cpp`: Added support for new GPU architectures `gfx1200`, `gfx1201`, and `gfx950` based on ROCm version checks. * `aten/src/ATen/native/cuda/Blas.cpp`: Updated architecture support in multiple functions to include `gfx1200`, `gfx1201`, and `gfx950` based on ROCm version checks. ### Updates to Data Type Handling: * `aten/src/ATen/cuda/CUDADataType.h`: Enhanced data type conversion to include new float8 types for both CUDA and ROCm environments. * `aten/src/ATen/cuda/tunable/GemmHipblaslt.h`: Updated the `HipDataTypeFor` template to handle new float8 types and added hard-coded enum values for ROCm versions prior to 6.3. 
### Removal of Outdated Checks: * `cmake/public/LoadHIP.cmake`: Removed the check for `HIP_NEW_TYPE_ENUMS`, as it is no longer necessary with the updated ROCm versions. These changes ensure better compatibility and performance on newer hardware and software environments, particularly for users leveraging ROCm and CUDA for deep learning and scientific computing tasks. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146632 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-02-24 22:47:52 +00:00
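`HipDataTypeFor` follows a common C++ pattern: a template that maps an element type to the enum value a BLAS library expects. The sketch below shows that pattern only. The enum, the FP8 tag types, and the `DataTypeFor` trait are hypothetical stand-ins, not hipBLASLt's `hipDataType` values and not PyTorch's actual signature. With this design, an unmapped element type fails at compile time instead of silently picking a wrong enum.

```cpp
#include <cassert>

// Hypothetical stand-ins for a library's data-type enum and for element types.
enum class BlasDataType { kFloat32, kBFloat16, kFloat8E4M3, kFloat8E5M2 };
struct BFloat16 {};
struct Float8E4M3 {};
struct Float8E5M2 {};

// Primary template: instantiating it for an unmapped type is a compile-time error.
template <typename T>
struct DataTypeFor {
  static_assert(sizeof(T) == 0, "no BLAS data type mapped for this element type");
};

// One explicit specialization per supported element type.
template <> struct DataTypeFor<float> { static constexpr BlasDataType value = BlasDataType::kFloat32; };
template <> struct DataTypeFor<BFloat16> { static constexpr BlasDataType value = BlasDataType::kBFloat16; };
template <> struct DataTypeFor<Float8E4M3> { static constexpr BlasDataType value = BlasDataType::kFloat8E4M3; };
template <> struct DataTypeFor<Float8E5M2> { static constexpr BlasDataType value = BlasDataType::kFloat8E5M2; };

// Convenience variable template (C++14).
template <typename T>
constexpr BlasDataType data_type_for_v = DataTypeFor<T>::value;

static_assert(data_type_for_v<float> == BlasDataType::kFloat32, "checked at compile time");
```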
PyTorch MergeBot	3e2d9d079e	Revert "[ROCm] OCP FP8 Support for new GPUs (#146632 )" This reverts commit f95ab46797e1f3e8cc48ce2f45e4f6985132fb19. Reverted https://github.com/pytorch/pytorch/pull/146632 on behalf of https://github.com/jeanschmidt due to Breaking internal builds, I'll find someone to help merge this PR back to main ([comment](https://github.com/pytorch/pytorch/pull/146632#issuecomment-2676823614))	2025-02-23 12:04:50 +00:00
Peter Yeh	f95ab46797	[ROCm] OCP FP8 Support for new GPUs (#146632 ) TL;DR: Follows up on / builds on top of https://github.com/pytorch/pytorch/pull/144476. Adds OCP FP8 support for gfx950; refer to https://github.com/pytorch/ao/pull/1677. This pull request includes several changes to improve compatibility and support for new GPU architectures and data types, particularly for ROCm. The key updates add support for new ROCm versions and GPU architectures, update data type handling, and remove outdated checks. ### Improvements to GPU Architecture and ROCm Version Support: * `aten/src/ATen/Context.cpp`: Added support for new GPU architectures `gfx1200`, `gfx1201`, and `gfx950` based on ROCm version checks. * `aten/src/ATen/native/cuda/Blas.cpp`: Updated architecture support in multiple functions to include `gfx1200`, `gfx1201`, and `gfx950` based on ROCm version checks. ### Updates to Data Type Handling: * `aten/src/ATen/cuda/CUDADataType.h`: Enhanced data type conversion to include new float8 types for both CUDA and ROCm environments. * `aten/src/ATen/cuda/tunable/GemmHipblaslt.h`: Updated the `HipDataTypeFor` template to handle new float8 types and added hard-coded enum values for ROCm versions prior to 6.3. 
### Removal of Outdated Checks: * `cmake/public/LoadHIP.cmake`: Removed the check for `HIP_NEW_TYPE_ENUMS`, as it is no longer necessary with the updated ROCm versions. These changes ensure better compatibility and performance on newer hardware and software environments, particularly for users leveraging ROCm and CUDA for deep learning and scientific computing tasks. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146632 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>	2025-02-21 23:44:08 +00:00
Michal Gallus	3f5ed05688	[Windows][ROCm] Fix c10 hip tests (#146599 ) - Solves a problem related to .hip source files being ignored by the build system when HIP language is not enabled in CMake. - Also ensures that the test executables link to an appropriate CRT Runtime Library and hence have access to all the necessary symbols. Previously, there were many problems related to linkage errors. - Moves part of Linux-related hipBLASLt changes in `LoadHIP.cmake` under the UNIX conditional branch, as these aren't supported on Windows yet. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146599 Approved by: https://github.com/jeffdaily	2025-02-06 23:41:25 +00:00
Jeff Daily	6ac0616504	[ROCm] hipblaslt rowwise f8 gemm (#144432 ) hipblaslt added rowwise f8 gemm support. Integrate with scaled_mm. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144432 Approved by: https://github.com/drisspg	2025-01-15 18:23:44 +00:00
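In rowwise FP8 scaling, as used by `scaled_mm`, each row of the left operand has its own scale and each column of the right operand has its own scale. The output is rescaled by the product of the two. The C++ sketch below shows only that scale bookkeeping, under stated assumptions. It uses 448, the largest finite value of the OCP E4M3 format, as the target range. It keeps everything in `float`, so no FP8 rounding or saturation is modeled. The function names are hypothetical, not hipBLASLt or PyTorch APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

constexpr float kFp8Max = 448.0f;  // assumed target range: max finite OCP E4M3 value

// One scale per row of A: maps each row's max |value| onto kFp8Max.
std::vector<float> row_scales(const Matrix& a) {
  std::vector<float> s;
  for (const auto& row : a) {
    float amax = 0.0f;
    for (float v : row) amax = std::max(amax, std::fabs(v));
    s.push_back(amax > 0.0f ? amax / kFp8Max : 1.0f);
  }
  return s;
}

// One scale per column of B.
std::vector<float> col_scales(const Matrix& b) {
  const std::size_t n = b.empty() ? 0 : b[0].size();
  std::vector<float> s(n, 1.0f);
  for (std::size_t j = 0; j < n; ++j) {
    float amax = 0.0f;
    for (const auto& row : b) amax = std::max(amax, std::fabs(row[j]));
    if (amax > 0.0f) s[j] = amax / kFp8Max;
  }
  return s;
}

// C[i][j] = sa[i] * sb[j] * sum_k qa[i][k] * qb[k][j], with qa = A / sa, qb = B / sb.
Matrix rowwise_scaled_mm(const Matrix& a, const Matrix& b) {
  const auto sa = row_scales(a);
  const auto sb = col_scales(b);
  const std::size_t m = a.size(), k = b.size(), n = k ? b[0].size() : 0;
  Matrix c(m, std::vector<float>(n, 0.0f));
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j) {
      float acc = 0.0f;
      for (std::size_t p = 0; p < k; ++p)
        acc += (a[i][p] / sa[i]) * (b[p][j] / sb[j]);  // scaled operands (no FP8 rounding here)
      c[i][j] = sa[i] * sb[j] * acc;  // apply both scales once per output element
    }
  return c;
}
```

The scales factor out of the inner sum over `k`. A kernel can therefore accumulate the low-precision products first and apply `sa[i] * sb[j]` once per output element.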

359 Commits