pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 12:54:11 +08:00

Author	SHA1	Message	Date
atalman	c17ba69ba5	[submodule] Revert "Adds support for accelerated sorting with x86-simd-sort (#127936 ) (#141901 ) Looks like the original PR caused: https://github.com/pytorch/pytorch/issues/140590 Please see comment: https://github.com/pytorch/pytorch/issues/140590#issuecomment-2508704480 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141901 Approved by: https://github.com/andrewor14, https://github.com/malfet	2024-12-03 00:16:35 +00:00
cyy	0fca51bcc4	[11/N] Fix Wextra-semi warning (#140926 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/140926 Approved by: https://github.com/ezyang	2024-11-20 00:32:45 +00:00
Max Ren	cca34be584	Update XNNPACK Version (#139913 ) Updating XNNPACK Version to 4ea82e595b36106653175dcb04b2aa532660d0d8 submodule update Pull Request resolved: https://github.com/pytorch/pytorch/pull/139913 Approved by: https://github.com/digantdesai, https://github.com/huydhn	2024-11-18 18:16:31 +00:00
Nathan Brown	a290c1d748	Fix building with system GLOO (#140275 ) Leverage existing FindGloo CMake module to locate system's library and headers. Add system's gloo headers to include path rather than the gloo from third party when USE_SYSTEM_GLOO is specified. Fixes #140274 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140275 Approved by: https://github.com/malfet	2024-11-11 22:58:39 +00:00
Yu, Guangye	8051ee802c	Add XPU compiler version control in cmake to keep BC (#139258 ) # Motivation This PR aims to maintain backward compatibility when building PyTorch XPU with the old and new compilers. # Additional Context The details are described here. The new compiler (2025.0.0) has some breaking changes compared with the old compiler(2024.1), for examples: 1. On Windows, sycl library is named `sycl7.lib` in the old compiler but is named `sycl.lib` in the new compiler. 2. On Linux, in order to support ABI=0, we have to link `libsycl-preview.so` in the old compiler but we could link `libsycl.so` in the new compiler to have the same ABI compatibility. 3. We added a macro `SYCL_COMPILER_VERSION` to support our new code has good backward compatibility with the old compiler. Now the new feature(Event elapsed_time, memory summary, and device architecture property) introduced by the new compiler will be controlled within the macro `SYCL_COMPILER_VERSION`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139258 Approved by: https://github.com/EikanWang, https://github.com/atalman, https://github.com/gujinghui	2024-11-09 13:31:21 +00:00
Matthew Sterrett	7e65060410	Adds support for accelerated sorting with x86-simd-sort (#127936 ) Adds x86-simd-sort as a submodule to accelerate sorting for 32-bit and 64-bit datatypes when AVX2 or AVX512 are available. For contiguous data, this can be over a 10x speedup for large arrays. For discontiguous data, it can give over a 4x speedup with larger arrays. These benchmarks were gathered on a Skylake system (7900x), limited to 8 threads. <details> <summary><b>Contiguous Benchmarks</b></summary> ``` float32, normally distributed (in microseconds) size Default AVX2 AVX512 Default/AVX2 Default/AVX512 16 7.150844336 6.886271477 7.132277489 1.038420335 1.002603214 128 9.208030939 8.478154898 7.846915245 1.086089019 1.173458697 1024 37.79037627 23.60707456 16.44122627 1.600807257 2.298513241 10000 714.7355628 203.9921844 105.5683001 3.503739934 6.770361577 100000 8383.074408 721.6333354 465.3709247 11.61680593 18.01374766 1000000 97124.31945 5632.054572 3920.148401 17.24491803 24.77567416 10000000 1161974.907 86070.48988 71533.82301 13.50027063 16.24371323 int32_t, uniformly distributed (in microseconds) size Default AVX2 AVX512 Default/AVX2 Default/AVX512 16 7.203208685 6.92212224 7.014458179 1.040606975 1.026908779 128 8.972388983 8.195516348 7.592543125 1.094792396 1.18173698 1024 32.77489477 23.6874548 15.36617105 1.383639359 2.132925285 10000 607.8824128 193.3402024 99.25090471 3.144107667 6.124703997 100000 523.9384684 608.1836536 442.3166784 0.861480682 1.184532472 1000000 5211.348627 5271.598405 3518.861883 0.988570871 1.480975611 10000000 133853.6263 81463.05084 67852.97394 1.643120714 1.972700952 ``` </details> Note that the int32_t sort is accelerated by FBGEMM's radix sort for larger arrays, but this only handles contiguous data and in one sorting direction. <details> <summary><b>Discontiguous Benchmarks</b></summary> ``` float, normal distributed, discontiguous in sorted dimension (in microseconds) size Default AVX2 AVX512 Default/AVX2 Default/AVX512 16 3.836543679 4.011214256 3.84376061 0.956454439 0.99812243 128 5.755310194 5.755723127 4.820394962 0.999928257 1.193949923 1024 49.46946019 24.78790785 15.47874362 1.995709379 3.195960952 10000 665.2505291 236.6165959 143.9490662 2.811512551 4.621429974 100000 4328.002203 1329.001212 818.3516414 3.256582586 5.288682743 1000000 47651.5018 16693.72045 11827.39551 2.854456677 4.028909133 10000000 556655.1288 236252.6258 184215.9828 2.356185998 3.021752621 int32_t, uniformly distributed, discontiguous in sorted dimension (in microseconds) size Default AVX2 AVX512 Default/AVX2 Default/AVX512 16 3.817994356 3.878117442 3.770039797 0.984496837 1.012719908 128 5.578731397 5.577152082 4.716770534 1.000283176 1.182743862 1024 43.3412619 23.61275801 14.55446819 1.835501887 2.977866408 10000 634.3997478 224.4322851 133.9518324 2.826686667 4.736028889 100000 4084.358152 1292.363303 781.7867576 3.16037924 5.22438902 1000000 46262.20465 16608.35284 11367.51817 2.785478192 4.06968381 10000000 541231.9104 235185.1861 180249.9294 2.301301028 3.002674742 ``` </details> Pull Request resolved: https://github.com/pytorch/pytorch/pull/127936 Approved by: https://github.com/jgong5, https://github.com/peterbell10, https://github.com/sanchitintel	2024-11-02 02:14:01 +00:00
Irem Yuksel	b021486405	Enable Windows Arm64 (#133088 ) This PR enables Pytorch for Windows on Arm64 - CPU only. Currently, there aren't any checks in place to build and test for Windows on Arm64, but we're working to implement those as soon as possible. We recommend using [Arm Performance Libraries (APL)](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Libraries) as a BLAS option, which is introduced in this PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133088 Approved by: https://github.com/malfet Co-authored-by: cristian panaite <panaite.cristian2000@gmail.com> Co-authored-by: Stefan-Alin Pahontu <56953855+alinpahontu2912@users.noreply.github.com> Co-authored-by: Ozan Aydin <148207261+ozanMSFT@users.noreply.github.com>	2024-10-24 16:10:44 +00:00
cyy	af8bd323e8	Remove legacy Caffe2 pthreadpool from CMake (#134936 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/134936 Approved by: https://github.com/ezyang	2024-10-17 05:22:08 +00:00
Nichols A. Romero	e8f1dd6ba0	Fix hardcoded ROCm paths in `Caffe2Targets.cmake` (#136283 ) Fixes #131701 Use CMake imported targets more consistently to eliminate hardcode paths. Here is the new relevant sections of Caffe2Targets.cmake: ``` set_target_properties(c10_hip PROPERTIES INTERFACE_INCLUDE_DIRECTORIES "${_IMPORT_PREFIX}/include" INTERFACE_LINK_LIBRARIES "c10;hip::amdhip64" ) ``` ``` set_target_properties(torch_hip PROPERTIES INTERFACE_COMPILE_DEFINITIONS "USE_C10D_NCCL" INTERFACE_COMPILE_OPTIONS "-fPIC;-D__HIP_PLATFORM_AMD__=1;-DCUDA_HAS_FP16=1;-DUSE_ROCM;-D__HIP_NO_HALF_OPERATORS__=1;-D__HIP_NO_HALF_CONVERSIONS__=1;-DTORCH_HIP_VERSION=602;-Wno-shift-count-negative;-Wno-shift-count-overflow;-Wno-duplicate-decl-specifier;-DCAFFE2_USE_MIOPEN;-DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_HIP;-std=c++17;-DHIPBLAS_V2;-DHIP_NEW_TYPE_ENUMS" INTERFACE_INCLUDE_DIRECTORIES "${_IMPORT_PREFIX}/include" INTERFACE_LINK_LIBRARIES "c10_hip;torch_cpu_library;hip::amdhip64;MIOpen;hiprtc::hiprtc;roc::hipblaslt;roc::hipblas;hip::hipfft;hip::hiprand;roc::hipsparse;roc::hipsolver" ) ``` HIPCUB dependency was not actually used; which is why it is removed here as the imported target had undesirable side effects. Pull Request resolved: https://github.com/pytorch/pytorch/pull/136283 Approved by: https://github.com/jeffdaily, https://github.com/Skylion007, https://github.com/jithunnair-amd, https://github.com/atalman	2024-09-26 00:34:43 +00:00
PyTorch MergeBot	0e19522122	Revert "Adds support for accelerated sorting with x86-simd-sort (#127936 )" This reverts commit 239a9ad65eebf93dcf9bb108a5129d4160b12c86. Reverted https://github.com/pytorch/pytorch/pull/127936 on behalf of https://github.com/atalman due to test/test_sort_and_select.py::TestSortAndSelectCPU::test_sort_discontiguous_slow_cpu_float32 [GH job link](https://github.com/pytorch/pytorch/actions/runs/10994904767/job/30525578456) [HUD commit link](`239a9ad65e`) ([comment](https://github.com/pytorch/pytorch/pull/127936#issuecomment-2368522316))	2024-09-23 14:52:23 +00:00
cyy	c459430558	Pass Werror to CUDA host compiler (#130213 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/130213 Approved by: https://github.com/ezyang	2024-09-21 08:01:06 +00:00
Matthew Sterrett	239a9ad65e	Adds support for accelerated sorting with x86-simd-sort (#127936 ) Adds x86-simd-sort as a submodule to accelerate sorting for 32-bit and 64-bit datatypes when AVX2 or AVX512 are available. For contiguous data, this can be over a 10x speedup for large arrays. For discontiguous data, it can give over a 4x speedup with larger arrays. These benchmarks were gathered on a Skylake system (7900x), limited to 8 threads. <details> <summary><b>Contiguous Benchmarks</b></summary> ``` float32, normally distributed (in microseconds) size Default AVX2 AVX512 Default/AVX2 Default/AVX512 16 7.150844336 6.886271477 7.132277489 1.038420335 1.002603214 128 9.208030939 8.478154898 7.846915245 1.086089019 1.173458697 1024 37.79037627 23.60707456 16.44122627 1.600807257 2.298513241 10000 714.7355628 203.9921844 105.5683001 3.503739934 6.770361577 100000 8383.074408 721.6333354 465.3709247 11.61680593 18.01374766 1000000 97124.31945 5632.054572 3920.148401 17.24491803 24.77567416 10000000 1161974.907 86070.48988 71533.82301 13.50027063 16.24371323 int32_t, uniformly distributed (in microseconds) size Default AVX2 AVX512 Default/AVX2 Default/AVX512 16 7.203208685 6.92212224 7.014458179 1.040606975 1.026908779 128 8.972388983 8.195516348 7.592543125 1.094792396 1.18173698 1024 32.77489477 23.6874548 15.36617105 1.383639359 2.132925285 10000 607.8824128 193.3402024 99.25090471 3.144107667 6.124703997 100000 523.9384684 608.1836536 442.3166784 0.861480682 1.184532472 1000000 5211.348627 5271.598405 3518.861883 0.988570871 1.480975611 10000000 133853.6263 81463.05084 67852.97394 1.643120714 1.972700952 ``` </details> Note that the int32_t sort is accelerated by FBGEMM's radix sort for larger arrays, but this only handles contiguous data and in one sorting direction. <details> <summary><b>Discontiguous Benchmarks</b></summary> ``` float, normal distributed, discontiguous in sorted dimension (in microseconds) size Default AVX2 AVX512 Default/AVX2 Default/AVX512 16 3.836543679 4.011214256 3.84376061 0.956454439 0.99812243 128 5.755310194 5.755723127 4.820394962 0.999928257 1.193949923 1024 49.46946019 24.78790785 15.47874362 1.995709379 3.195960952 10000 665.2505291 236.6165959 143.9490662 2.811512551 4.621429974 100000 4328.002203 1329.001212 818.3516414 3.256582586 5.288682743 1000000 47651.5018 16693.72045 11827.39551 2.854456677 4.028909133 10000000 556655.1288 236252.6258 184215.9828 2.356185998 3.021752621 int32_t, uniformly distributed, discontiguous in sorted dimension (in microseconds) size Default AVX2 AVX512 Default/AVX2 Default/AVX512 16 3.817994356 3.878117442 3.770039797 0.984496837 1.012719908 128 5.578731397 5.577152082 4.716770534 1.000283176 1.182743862 1024 43.3412619 23.61275801 14.55446819 1.835501887 2.977866408 10000 634.3997478 224.4322851 133.9518324 2.826686667 4.736028889 100000 4084.358152 1292.363303 781.7867576 3.16037924 5.22438902 1000000 46262.20465 16608.35284 11367.51817 2.785478192 4.06968381 10000000 541231.9104 235185.1861 180249.9294 2.301301028 3.002674742 ``` </details> Pull Request resolved: https://github.com/pytorch/pytorch/pull/127936 Approved by: https://github.com/jgong5, https://github.com/peterbell10	2024-09-20 21:19:33 +00:00
min-jean-cho	416a7894fe	[Windows][XPU] Disable Kineto PTI on Windows only (#134620 ) Disable Kineto + XPU PTI on Windows only. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134620 Approved by: https://github.com/guangyey, https://github.com/malfet Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2024-08-29 20:58:55 +00:00
Gregory Comer	3b40b07efb	Update PyTorch for XNNPACK 87ee0b4 (#134518 ) Summary: Update XNNPACK library version. Test Plan: Combined diff CI is clean: D61586079 (all changes, has to be split out for export). Differential Revision: D61822610 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134518 Approved by: https://github.com/mcr229	2024-08-28 19:24:04 +00:00
Xinya Zhang	5fd670e0ef	[ROCM] Properly disable Flash Attention/Efficient Attention with environment variables (#133866 ) Now `USE_FLASH_ATTENTION=0 USE_MEM_EFF_ATTENTION=0 python setup.py` can compile correctly Fixes #125230 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133866 Approved by: https://github.com/jithunnair-amd, https://github.com/jeffdaily, https://github.com/malfet	2024-08-27 18:24:29 +00:00
Mikayla Gawarecki	018e48c337	[Reland] Add wrappers for synchronous GPUDirect Storage APIs (#133489 ) Reland #130633 USE_CUFILE turned off by default in this version Pull Request resolved: https://github.com/pytorch/pytorch/pull/133489 Approved by: https://github.com/albanD	2024-08-15 17:11:52 +00:00
PyTorch MergeBot	fa1d7b0262	Revert "Remove unused Caffe2 macros (#132979 )" This reverts commit da65cfbdea4f1f2176f6242004bda940a24f9ddb. Reverted https://github.com/pytorch/pytorch/pull/132979 on behalf of https://github.com/ezyang due to these are apparently load bearing internally ([comment](https://github.com/pytorch/pytorch/pull/132979#issuecomment-2284666332))	2024-08-12 18:34:56 +00:00
cyy	da65cfbdea	Remove unused Caffe2 macros (#132979 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/132979 Approved by: https://github.com/ezyang	2024-08-09 04:48:20 +00:00
cyy	05e8e87a69	[Submodule] Remove foxi (#132976 ) It is not used after removal of Caffe2 code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132976 Approved by: https://github.com/ezyang	2024-08-09 03:46:52 +00:00
Chen, Zejun	26b0011fb8	[XPU][Kineto Submodule] Introduce kineto-based XPU profiler (#130811 ) As XPU became a PyTorch built-in device, the profiler support is indispensable part of functionality completeness. This PR is associated with the PR to introduce XPU profiler plugin into the kineto. When USE_XPU is enabled, the LIBKINETO_NOXPUPTI option will be suppressed accordingly, which allows kineto to build with XPU profiler plugin. Associated PR to introduce kineto-based XPU profiler into kineto: https://github.com/pytorch/kineto/pull/961 Also updates the Kineto Submodule to include XPU changes. Co-authored-by: Aaron Enye Shi <enye.shi@gmail.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/130811 Approved by: https://github.com/aaronenyeshi	2024-08-07 18:41:37 +00:00
Yu, Guangye	92bebb46fa	Support XPU ABI=0 build (#130110 ) # Motivation This PR intends to support ABI=0 build for XPU backend. # Additional Context The major change is adding a compilation option `-D__INTEL_PREVIEW_BREAKING_CHANGES` for the host compiler(gcc) and `-fpreview-breaking-changes` for XPU device kernel code compiler(icpx), why? Because we use - gcc to compile host code and link SYCL runtime. So we need to pass `-D__INTEL_PREVIEW_BREAKING_CHANGES` to tell the host compiler invoking the ABI-neutral API included in SYCL. And - use icpx to compile device kernel code and link SYCL runtime. So we need to pass `-fpreview-breaking-changes` to tell the device kernel compiler building ABI-neutral code. Besides, - `libsycl-preview.so` is an ABI-neutral library but `libsycl.so` is not. This PR depends on https://github.com/pytorch/pytorch/pull/131643. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130110 Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/albanD	2024-08-01 21:42:14 +00:00
PyTorch MergeBot	e191b83462	Revert "Add wrappers for synchronous GPUDirect Storage APIs (#130633 )" This reverts commit 709ddf7a9dcfa1268848b72f6f56b55afa6728d6. Reverted https://github.com/pytorch/pytorch/pull/130633 on behalf of https://github.com/clee2000 due to still failing internally D60265673 ([comment](https://github.com/pytorch/pytorch/pull/130633#issuecomment-2253239607))	2024-07-26 18:08:20 +00:00
Mikayla Gawarecki	709ddf7a9d	Add wrappers for synchronous GPUDirect Storage APIs (#130633 ) Based in part on https://github.com/NVIDIA/apex/pull/1774 Differential Revision: [D60155434](https://our.internmc.facebook.com/intern/diff/D60155434) Pull Request resolved: https://github.com/pytorch/pytorch/pull/130633 Approved by: https://github.com/albanD	2024-07-25 22:23:38 +00:00
PyTorch MergeBot	e4b5645f83	Revert "Add wrappers for synchronous GPUDirect Storage APIs (#130633 )" This reverts commit 5b5e0698a5f560decb9bbdd150ed7b0622eb7777. Reverted https://github.com/pytorch/pytorch/pull/130633 on behalf of https://github.com/clee2000 due to breaking a lot of jobs and build rules internally D60085885, possibly needs to update some bazel build? ([comment](https://github.com/pytorch/pytorch/pull/130633#issuecomment-2245806738))	2024-07-23 17:19:34 +00:00
Mikayla Gawarecki	5b5e0698a5	Add wrappers for synchronous GPUDirect Storage APIs (#130633 ) Based in part on https://github.com/NVIDIA/apex/pull/1774 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130633 Approved by: https://github.com/albanD	2024-07-22 14:51:24 +00:00
cyy	a6345d3477	[CMake] [3/N] Remove unused code (#130322 ) Some functions used by Caffe2 were removed along with some outdated checks. Follows #130006. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130322 Approved by: https://github.com/r-barnes	2024-07-09 19:33:33 +00:00
Yichen Yan	953c6476bd	[CMAKE] Look for `Development.Module` instead of `Development` (#129669 ) Based on the [cmake issue](https://gitlab.kitware.com/cmake/cmake/-/issues/23716) and [manylinux issue](https://github.com/pypa/manylinux/issues/1347), when building a python module, it should find the `Development.Module` module, not `Development`, which includes `Development.Module` and `Development.Embed`, and will expect the shared python library only. After this PR and before #124613, pytorch could be built with a static libpython (e.g. in manylinux). Pull Request resolved: https://github.com/pytorch/pytorch/pull/129669 Approved by: https://github.com/malfet	2024-07-09 09:16:43 +00:00
Shivam Raikundalia	a21d4363d2	[Profiler] Remove all instances of TMP_USE_TSC_AS_TIMESTAMP (#129973 ) Summary: Now that D56584521 is in, we can remove all insteances of TMP_USE_TSC_AS_TIMESTAMP Test Plan: Ran resnet. Trace looks good https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/devvm2185.cco0.facebook.com/rank-0.Jun_27_14_46_01.1967733.pt.trace.json.gz&bucket=gpu_traces Reviewed By: aaronenyeshi, swolchok Differential Revision: D59132793 Pull Request resolved: https://github.com/pytorch/pytorch/pull/129973 Approved by: https://github.com/aaronenyeshi	2024-07-03 19:28:52 +00:00
cyy	46366888d7	Remove outdated CMake code (#129851 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/129851 Approved by: https://github.com/ezyang	2024-07-02 00:40:37 +00:00
Shivam Raikundalia	1d0efedc85	[Profiler] Add TSC Clock Callback to CUPTI (#125036 ) Summary: Right now we use the default clock for CUPTI which is not monotonic nor particularly fast. We have already added the Kineto side of the implementation here: https://www.internalfb.com/diff/D56525885 This diff only adds the compile flags such that the TSC format is used and sets the converter using a libkineto call in the profiler Test Plan: Obtained following trace using resnet test: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/devvm2185.cco0.facebook.com/rank-0.Apr_25_11_03_18.3862943.pt.trace.json.gz&bucket=gpu_traces TBD: Add benchmarks Differential Revision: D56584521 Pull Request resolved: https://github.com/pytorch/pytorch/pull/125036 Approved by: https://github.com/aaronenyeshi	2024-06-27 21:07:43 +00:00
Chirag Pandya	64f1111d38	Expose nholmann json to torch (#129570 ) Summary: Expose nlohmann json library so that it can be used from inside Pytorch. The library already exists in the `third_party` directory. This PR is making `nlohmann/json.hpp` header available to be used from `torch.distributed`. The next PR makes actual use of this header. imported-using-ghimport Test Plan: Imported from OSS Reviewed By: malfet Differential Revision: D59035246 Pulled By: c-p-i-o Pull Request resolved: https://github.com/pytorch/pytorch/pull/129570 Approved by: https://github.com/d4l3k, https://github.com/malfet	2024-06-26 21:59:26 +00:00
cyy	479ce5e2f4	Remove outdated CUDA code from CMake (#128801 ) It's possible to simplify some CUDA handling logic in CMake. Pull Request resolved: https://github.com/pytorch/pytorch/pull/128801 Approved by: https://github.com/r-barnes, https://github.com/malfet	2024-06-21 15:00:00 +00:00
sdp	b4a0161449	Build SYCL kernels for ATen XPU ops on Native Windows (take 2) (#127390 ) Original PR https://github.com/pytorch/pytorch/pull/126725 is closed due to bad rebase. ------- As proposed in https://github.com/pytorch/pytorch/issues/126719, we are enabling PyTorch XPU on Native Windows on Intel GPU. This PR enables XPU build on Windows as the first step of #126719: - Enable `USE_XPU` build on Windows using MSVC as host compiler. The use of MSVC as host compiler seamlessly aligns with the existing PyTorch build on Windows. - Build oneDNN GPU library on Windows. Co-authored-by: Yu, Guangye <guangye.yu@intel.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/127390 Approved by: https://github.com/guangyey, https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/ezyang	2024-06-06 01:41:06 +00:00
cyy	df75a9dc80	Remove Caffe2/onnx (#127991 ) Remove Caffe2/onnx since it is not used. Other tiny fixes are also applied. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127991 Approved by: https://github.com/ezyang	2024-06-05 15:10:12 +00:00
Tristan Rice	597922ba21	Reapply "distributed debug handlers (#126601 )" (#127805 ) This reverts commit 7646825c3eb687030c4f873b01312be0eed80174. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127805 Approved by: https://github.com/PaliC	2024-06-04 19:44:30 +00:00
atalman	53f001c599	Revert "correct BLAS input (#126200 )" (#127762 ) This reverts commit ea13e9a097aaa875a2b404822579b7f8b62ea291. Looks like this could have caused: https://github.com/pytorch/pytorch/actions/runs/9346105069/job/25722431775#step:17:984 Aarch64 tests failures: ``` + echo 'Checking that MKLDNN is available on aarch64' Checking that MKLDNN is available on aarch64 + pushd /tmp /tmp / + python -c 'import torch; exit(0 if torch.backends.mkldnn.is_available() else 1)' Error: Process completed with exit code 1. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/127762 Approved by: https://github.com/PaliC, https://github.com/malfet	2024-06-03 15:49:48 +00:00
cyy	4e7f497bb3	[Submodule] Remove ios-cmake (#127694 ) It has not been updated for a long time and CI iOS builds don't rely on it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127694 Approved by: https://github.com/ezyang	2024-06-02 04:40:21 +00:00
Robert Mast	ea13e9a097	correct BLAS input (#126200 ) Fixes #32407 With this little correction to Dependencies.cmake it is possible to build an MKL-free version of Pytorch up from version v2.0.0 by explicitly choosing another MKL-free BLAS. This pullrequest fulfills the "if not already present" part of the original comment in Dependencies.cmake: "setting default preferred BLAS options if not already present." It's tested with this Action-.yml: ``` name: Build PyTorch v2.0.0 without AVX on: push: branches: - v2.0.0 pull_request: branches: - v2.0.0 jobs: build: runs-on: ubuntu-20.04 defaults: run: shell: bash -el {0} steps: - name: Checkout repository uses: actions/checkout@v4 with: #repository: 'pytorch/pytorch' #ref: 'v2.3.0' submodules: 'recursive' - uses: conda-incubator/setup-miniconda@v3 with: auto-activate-base: true activate-environment: true python-version: 3.10.13 - name: Install Dependencies - Common - Linux 2 run: \| conda info conda list conda install nomkl conda install astunparse numpy ninja pyyaml setuptools cmake cffi typing_extensions future six requests dataclasses export PYTORCH_CPU_CAPABILITY=cpu export ATEN_CPU_CAPABILITY_DEFAULT=cpu export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"} export ATEN_CPU_CAPABILITY=default export USE_NNPACK=0 export MAX_JOBS=4 export USE_CUDA=0 export USE_ROCM=0 export BLAS=OpenBLAS export CMAKE_ARGS="-D CMAKE_BUILD_TYPE=Release -D USE_AVX=OFF -D USE_NNPACK=OFF -D C_HAS_AVX_2=OFF -D C_HAS_AVX2_2=OFF -D CXX_HAS_AVX_2=OFF -D CXX_HAS_AVX2_2=OFF -D CAFFE2_COMPILER_SUPPORTS_AVX512_EXTENSIONS=OFF -DPYTHON_INCLUDE_DIR=$(python -c "import sysconfig; print(sysconfig.get_path('include'))") -DPYTHON_LIBRARY=$(python -c "import sysconfig; print(sysconfig.get_config_var('LIBDIR'))") -DPYTHON_EXECUTABLE:FILEPATH=`which python`" pip install build wheel typing_extensions python setup.py bdist_wheel - name: Archive production artifacts uses: actions/upload-artifact@v4 with: name: dist-without-markdown path: \| dist !dist/*/.md ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/126200 Approved by: https://github.com/jgong5, https://github.com/kit1980	2024-05-31 19:38:42 +00:00
cyy	3e66052e16	Improve python3 discovery code in CMake (#127600 ) The improvement is based on my comments in #124613 and it also fixes the current linux-s390x-binary-manywheel CI failures. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127600 Approved by: https://github.com/Skylion007	2024-05-31 17:29:06 +00:00
cyy	0c5faee372	Replace python::python with Python::Module (#127485 ) Use found Python::Module target Pull Request resolved: https://github.com/pytorch/pytorch/pull/127485 Approved by: https://github.com/ezyang	2024-05-31 05:57:05 +00:00
PyTorch MergeBot	7646825c3e	Revert "distributed debug handlers (#126601 )" This reverts commit 3d541835d509910fceca00fc5a916e9718c391d8. Reverted https://github.com/pytorch/pytorch/pull/126601 on behalf of https://github.com/PaliC due to breaking internal typechecking tests ([comment](https://github.com/pytorch/pytorch/pull/126601#issuecomment-2141076987))	2024-05-31 01:21:24 +00:00
cyy	d44daebdbc	[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051 Approved by: https://github.com/cpuhrsch, https://github.com/malfet	2024-05-31 01:20:45 +00:00
Tristan Rice	3d541835d5	distributed debug handlers (#126601 ) This adds debug handlers as described in: * https://gist.github.com/d4l3k/828b7be585c7615e85b2c448b308d925 (public copy) * https://docs.google.com/document/d/1la68szcS6wUYElUUX-P6zXgkPA8lnfzpagMTPys3aQ8/edit (internal copy) This is only adding the C++ pieces that will be used from the main process. The Python and torchrun pieces will be added in a follow up PR. This adds 2 handlers out of the box: * `/handler/ping` for testing purposes * `/handler/dump_nccl_trace_pickle` as a POC integration with Flight Recorder Test plan: ``` python test/distributed/elastic/test_control_plane.py ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/126601 Approved by: https://github.com/kurman, https://github.com/c-p-i-o	2024-05-30 02:21:08 +00:00
PyTorch MergeBot	67739d8c6f	Revert "[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051 )" This reverts commit 699db7988d84d163ebb6919f78885e4630182a7a. Reverted https://github.com/pytorch/pytorch/pull/127051 on behalf of https://github.com/PaliC due to This PR needs to be synced using the import button as there is a bug in our diff train ([comment](https://github.com/pytorch/pytorch/pull/127051#issuecomment-2138496995))	2024-05-30 01:16:57 +00:00
cyy	8ea1dc8748	Use Python::NumPy target (#127399 ) Now that we use FindPython, use it again for numpy detection. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127399 Approved by: https://github.com/malfet	2024-05-29 23:17:58 +00:00
Nikita Shulga	0910429d72	[BE][CMake] Use FindPython module (#124613 ) As FindPythonInterp and FindPythonLibs has been deprecated since cmake-3.12 Replace `PYTHON_EXECUTABLE` with `Python_EXECUTABLE` everywhere (CMake variable names are case-sensitive) This makes PyTorch buildable with python3 binary shipped with XCode on MacOS TODO: Get rid of `FindNumpy` as its part of Python package Pull Request resolved: https://github.com/pytorch/pytorch/pull/124613 Approved by: https://github.com/cyyever, https://github.com/Skylion007	2024-05-29 13:17:35 +00:00
cyy	699db7988d	[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051 Approved by: https://github.com/cpuhrsch, https://github.com/malfet	2024-05-29 11:58:03 +00:00
PyTorch MergeBot	cdbb2c9acc	Revert "[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051 )" This reverts commit 4fdbaa794f9d5af2f171f772a51cb710c51c925f. Reverted https://github.com/pytorch/pytorch/pull/127051 on behalf of https://github.com/PaliC due to This PR needs to be synced using the import button as there is a bug in our diff train ([comment](https://github.com/pytorch/pytorch/pull/127051#issuecomment-2136428735))	2024-05-29 03:02:35 +00:00
cyy	4fdbaa794f	[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051 Approved by: https://github.com/cpuhrsch, https://github.com/malfet	2024-05-27 03:54:03 +00:00
cyy	95e5c994f9	[Submodule] Clear USE_QNNPACK build option (#126941 ) Following the removal of QNNPACK third-party module #126657, we can clear more build system code. Also third_party/neon2sse was removed because it is not used. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126941 Approved by: https://github.com/ezyang	2024-05-24 00:12:56 +00:00

1 2 3 4 5 ...

625 Commits