pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Author	SHA1	Message	Date
Xuehai Pan	a3abfa5cb5	[BE][Easy][1/19] enforce style for empty lines in import segments (#129752 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129752 Approved by: https://github.com/ezyang, https://github.com/malfet	2024-07-16 00:42:56 +00:00
PyTorch MergeBot	074a5c0c9b	Revert "[BE] bump `optree` version to 0.12.1 (#130139 )" This reverts commit 8fcb156e8b5697a8f292db6db2a1803c5f4ce2d7. Reverted https://github.com/pytorch/pytorch/pull/130139 on behalf of https://github.com/clee2000 due to broke inductor/test_torchinductor_codegen_dynamic_shapes.py and test_sympy_utils.py `8fcb156e8b` ([comment](https://github.com/pytorch/pytorch/pull/130139#issuecomment-2229248447))	2024-07-15 19:42:11 +00:00
Xuehai Pan	8fcb156e8b	[BE] bump `optree` version to 0.12.1 (#130139 ) 0.12.0 Major Updates: - Add context manager to temporarily set the dictionary sorting mode - Add accessor APIs - Use `stable` tag for `pybind11` for Python 3.13 support - Fix potential segmentation fault for pickling support 0.12.1 Updates: - Fix warning regression during import when launch with strict warning filters Closes #130155 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130139 Approved by: https://github.com/zou3519	2024-07-15 17:27:07 +00:00
Catherine Lee	df50452279	Pin optree==0.11.0 on windows CI (#130155 ) Fixes #ISSUE_NUMBER doctests test_testing Failing run has 0.12.0 https://github.com/pytorch/pytorch/actions/runs/9804335516/job/27072891998 Succeeding run has 0.11.0 https://github.com/pytorch/pytorch/actions/runs/9798330845/job/27057359554 It is already pinned for mac and linux Pull Request resolved: https://github.com/pytorch/pytorch/pull/130155 Approved by: https://github.com/huydhn, https://github.com/atalman	2024-07-05 20:28:58 +00:00
PaliC	3d56673b24	[Split Build][BE] remove extraneous .py, .a, and .so files (#130053 ) Removes extraneous .a, .so, and .py files from the split build. From here we can also clean up the builder script which produces the binary to do this. That PR is https://github.com/pytorch/builder/pull/1912 Verification: The built wheel with BUILD_LIBTORCH_WHL=1 has only the following files with .a, .so, and .py extensions: ``` sahanp@devgpu086 ~/p/dist (viable/strict)> pwd (pytorch-3.10) /home/sahanp/pytorch/dist sahanp@devgpu086 ~/p/dist (viable/strict)> find . -type f \( -name "*.py" -o -name "*.a" -o -name "*.so" \) (pytorch-3.10) ./torch/__init__.py ./torch/lib/libbackend_with_compiler.so ./torch/lib/libc10.so ./torch/lib/libjitbackend_test.so ./torch/lib/libtorch.so ./torch/lib/libtorch_cpu.so ./torch/lib/libtorch_global_deps.so ./torch/lib/libtorchbind_test.so sahanp@devgpu086 ~/p/dist (viable/strict)> ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/130053 Approved by: https://github.com/atalman	2024-07-05 19:05:32 +00:00
Randolf Scholz	22a06869f2	include jit/*.pyi (#129654 ) Fixes #108781, see https://github.com/pytorch/pytorch/pull/108782#issuecomment-1927321532 Pull Request resolved: https://github.com/pytorch/pytorch/pull/129654 Approved by: https://github.com/ezyang	2024-06-28 12:40:11 +00:00
Xu Han	424068d0d2	[Windows] remove mkl shared library dependency. (#129493 ) # Background I fixed the PyTorch Windows missing MKL shared library dependency issue: https://github.com/pytorch/pytorch/issues/124009 The solution is to statically link the MKL library into the torch_cpu module: 1. PyTorch MKL static link PR: https://github.com/pytorch/pytorch/pull/124925 2. builder installs the MKL static library: https://github.com/pytorch/builder/pull/1790 Double-confirmed that the current build links MKL statically: https://github.com/pytorch/pytorch/issues/124009#issuecomment-2160941802 # Goal Remove the setup.py `install_requires` entry that installs the MKL shared library on PyTorch Windows. It is no longer required because MKL is now statically linked. This reduces PyTorch install network traffic and avoids installing an unneeded MKL shared library package. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129493 Approved by: https://github.com/malfet	2024-06-28 11:42:21 +00:00
Nikita Shulga	816e8a3f21	[MacOS] Improve libomp packaging (#129473 ) Instead of replacing `@rpath/libomp.dylib` with `@loader_path/libomp.dylib`, keep it in place and add `@loader_path` as a new rpath. This should prevent double-loading of the OpenMP runtime: with `@rpath` the loader is allowed to reuse an already-loaded library, whereas an `@loader_path` install name forces it to load the library from a location relative to the binary that loads it. Test plan: - Prepare the environment ```shell conda create -n py310-cf python=3.10 numpy pip -c conda-forge conda activate py310-cf pip install torch --index-url https://download.pytorch.org/whl/test/cpu ``` - Verify that OpenMP is loaded twice and then crashes ```shell KMP_VERSION=true python -c "import numpy as np; import torch; print(torch.__version__, torch.backends.openmp.is_available()); print(torch.rand(300, 300).abs().max())" ``` output: ``` LLVM OMP version: 5.0.20140926 LLVM OMP library type: performance LLVM OMP link type: dynamic LLVM OMP build time: no_timestamp LLVM OMP build compiler: Clang 16.0 LLVM OMP alternative compiler support: yes LLVM OMP API version: 5.0 (201611) LLVM OMP dynamic error checking: no LLVM OMP thread affinity support: no LLVM OMP version: 5.0.20140926 LLVM OMP library type: performance LLVM OMP link type: dynamic LLVM OMP build time: no_timestamp LLVM OMP build compiler: Clang 12.0 LLVM OMP alternative compiler support: yes LLVM OMP API version: 5.0 (201611) LLVM OMP dynamic error checking: no LLVM OMP thread affinity support: no 2.4.0 True zsh: segmentation fault KMP_VERSION=true python -c ``` - Install the artifact from this PR and make sure it passes the same test ```shell python -mpip install ~/Downloads/torch-2.5.0.dev20240625-cp310-none-macosx_11_0_arm64.whl KMP_VERSION=true python -c "import numpy as np; import torch; print(torch.__version__, torch.backends.openmp.is_available()); print(torch.rand(300, 300).abs().max())" ``` output ``` LLVM OMP version: 5.0.20140926 LLVM OMP library type: performance LLVM OMP
link type: dynamic LLVM OMP build time: no_timestamp LLVM OMP build compiler: Clang 16.0 LLVM OMP alternative compiler support: yes LLVM OMP API version: 5.0 (201611) LLVM OMP dynamic error checking: no LLVM OMP thread affinity support: no 2.5.0.dev20240625 True tensor(1.0000) ``` - Make sure it still uses bundled OpenMP if none is available in the environment ``` conda uninstall numpy -c conda-forge KMP_VERSION=true python -c "from ctypes import cdll, c_char_p, c_uint32; import torch; from ctypes import cdll, c_char_p, c_uint32; libdyld = cdll.LoadLibrary('libSystem.dylib'); libdyld._dyld_image_count.restype = c_uint32; libdyld._dyld_get_image_name.restype = c_char_p; libdyld._dyld_get_image_name.argtypes = [c_uint32]; print(torch.rand(300, 300).abs().max()); libs = [libdyld._dyld_get_image_name(i).decode('ascii') for i in range(libdyld._dyld_image_count())]; print([l for l in libs if 'libomp.dylib' in l])" ``` Fixes https://github.com/pytorch/pytorch/issues/124497 and https://github.com/pytorch/pytorch/issues/126385 Pull Request resolved: https://github.com/pytorch/pytorch/pull/129473 Approved by: https://github.com/atalman	2024-06-25 19:12:34 +00:00
PaliC	b0044e2e18	[Split Build] Support nightly release (#129011 ) This PR adds the split build to our binaries workflow. Validation for the workflow is done using the PR above in conjunction with https://github.com/pytorch/builder/pull/1876. Test Workflow: Check CI in the workflow above Pull Request resolved: https://github.com/pytorch/pytorch/pull/129011 Approved by: https://github.com/atalman	2024-06-22 05:45:14 +00:00
cyy	479ce5e2f4	Remove outdated CUDA code from CMake (#128801 ) It's possible to simplify some CUDA handling logic in CMake. Pull Request resolved: https://github.com/pytorch/pytorch/pull/128801 Approved by: https://github.com/r-barnes, https://github.com/malfet	2024-06-21 15:00:00 +00:00
PaliC	7d33ff59ba	[Split Build]Use same package (#127934 ) This PR removes the second separate package we were using for the libtorch wheel. To test that this works, we will use the PRs above this one in the stack. As a sanity check, these are the wheels produced by running ``` python setup.py clean && BUILD_LIBTORCH_WHL=1 with-proxy python setup.py bdist_wheel && BUILD_PYTHON_ONLY=1 with-proxy python setup.py bdist_wheel --cmake ``` ``` sahanp@devgpu086 ~/pytorch ((5f15e171…))> ls -al dist/ (pytorch-3.10) total 677236 drwxr-xr-x 1 sahanp users 188 Jun 4 12:19 ./ drwxr-xr-x 1 sahanp users 1696 Jun 4 12:59 ../ -rw-r--r-- 1 sahanp users 81405742 Jun 4 12:19 torch-2.4.0a0+gitca0a73c-cp310-cp310-linux_x86_64.whl -rw-r--r-- 1 sahanp users 612076919 Jun 4 12:19 libtorch-2.4.0a0+gitca0a73c-py3-none-any.whl ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/127934 Approved by: https://github.com/atalman	2024-06-19 15:57:21 +00:00
albanD	8bcebc8dae	Add runtime dependency on setuptools for cpp_extensions (#127921 ) As per the title: setuptools is no longer installed by default in Python 3.12 environments, and we use it in `torch.utils.cpp_extension.*`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127921 Approved by: https://github.com/Skylion007	2024-06-05 23:59:38 +00:00
cyy	d44daebdbc	[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051 Approved by: https://github.com/cpuhrsch, https://github.com/malfet	2024-05-31 01:20:45 +00:00
PyTorch MergeBot	67739d8c6f	Revert "[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051 )" This reverts commit 699db7988d84d163ebb6919f78885e4630182a7a. Reverted https://github.com/pytorch/pytorch/pull/127051 on behalf of https://github.com/PaliC due to This PR needs to be synced using the import button as there is a bug in our diff train ([comment](https://github.com/pytorch/pytorch/pull/127051#issuecomment-2138496995))	2024-05-30 01:16:57 +00:00
PaliC	9257a0698b	[Split Build] Load dependencies from libtorch in __init__.py (#126826 ) This PR makes it such that we search for a libtorch wheel when initializing pytorch in order to find the necessary shared libraries. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126826 Approved by: https://github.com/huydhn, https://github.com/atalman, https://github.com/ZainRizvi	2024-05-29 22:03:50 +00:00
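The idea in #126826, locating an installed libtorch package at import time and loading its shared libraries, can be sketched as follows. This is a hypothetical illustration, not the actual `__init__.py` code: the helper names are invented, and the `libtorch/lib/` layout is assumed from the `ldd` output shown in #126328.

```python
import ctypes
import glob
import importlib.util
import os


def find_libtorch_libraries(package="libtorch"):
    """Return shared-library paths under <package>/lib, or [] if the package is absent.

    Hypothetical helper: the package name and lib/ layout are assumptions.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return []
    pkg_dir = list(spec.submodule_search_locations)[0]
    # Match libfoo.so as well as versioned names such as libfoo.so.1.
    return sorted(glob.glob(os.path.join(pkg_dir, "lib", "*.so*")))


def preload(paths):
    """Load each library with RTLD_GLOBAL so later extensions can resolve its symbols."""
    return [ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL) for path in paths]
```

In practice load order matters (a library's dependencies must be resolvable when it is loaded), which this alphabetical sketch does not handle.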
cyy	699db7988d	[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051 Approved by: https://github.com/cpuhrsch, https://github.com/malfet	2024-05-29 11:58:03 +00:00
PaliC	a25b28a753	[Split Build] Add option to create libtorch wheel and use it to build pytorch as a separate wheel (#126328 ) Creates an option to just build the libtorch portion of pytorch such that we have the necessary .so files. Then it builds a torch package using the libtorch wheel. These options are enabled using ` BUILD_LIBTORCH_WHL` and `BUILD_PYTHON_ONLY`. We run ``` BUILD_LIBTORCH_WHL=1 python setup.py install python setup.py clean BUILD_PYTHON_ONLY=1 python setup.py install ``` to produce ``` sahanp@devgpu086 ~/pytorch (detached HEAD\|REBASE-i 3/5)> ls /home/sahanp/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/torch/lib/ (pytorch-3.10) libshm.so* libtorch_global_deps.so* libtorch_python.so* sahanp@devgpu086 ~/pytorch (detached HEAD\|REBASE-i 3/5)> ldd build/lib/libtorch_python.so (pytorch-3.10) linux-vdso.so.1 (0x00007ffdc2d37000) libtorch.so => /home/sahanp/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/libtorch/lib/libtorch.so (0x00007f539fe99000) libshm.so => /home/sahanp/pytorch/build/lib/libshm.so (0x00007f539fe90000) libcudnn.so.8 => /usr/local/cuda-12.1/targets/x86_64-linux/lib/libcudnn.so.8 (0x00007f539e800000) libnvToolsExt.so.1 => /usr/local/cuda/lib64/libnvToolsExt.so.1 (0x00007f539e400000) libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f539e000000) libm.so.6 => /lib64/libm.so.6 (0x00007f539fda5000) libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f539ebe5000) libc.so.6 => /lib64/libc.so.6 (0x00007f539dc00000) /lib64/ld-linux-x86-64.so.2 (0x00007f539fea0000) libtorch_cpu.so => /home/sahanp/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/libtorch/lib/libtorch_cpu.so (0x00007f5392400000) libtorch_cuda.so => /home/sahanp/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/libtorch/lib/libtorch_cuda.so (0x00007f5380000000) librt.so.1 => /lib64/librt.so.1 (0x00007f539fd9e000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f539fd99000) libdl.so.2 => /lib64/libdl.so.2 (0x00007f539fd94000) libc10.so => 
/home/sahanp/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/libtorch/lib/libc10.so (0x00007f539eb07000) libmkl_intel_lp64.so.2 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libmkl_intel_lp64.so.2 (0x00007f537ec00000) libmkl_gnu_thread.so.2 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libmkl_gnu_thread.so.2 (0x00007f537ce00000) libmkl_core.so.2 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libmkl_core.so.2 (0x00007f5378800000) libomp.so => /home/sahanp/.conda/envs/pytorch-3.10/lib/libomp.so (0x00007f539e707000) libcupti.so.12 => /usr/local/cuda/lib64/libcupti.so.12 (0x00007f5377e00000) libcudart.so.12 => /usr/local/cuda/lib64/libcudart.so.12 (0x00007f5377a00000) libc10_cuda.so => /home/sahanp/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/libtorch/lib/libc10_cuda.so (0x00007f539ea6a000) libcusparse.so.12 => /usr/local/cuda/lib64/libcusparse.so.12 (0x00007f5368400000) libcufft.so.11 => /usr/local/cuda/lib64/libcufft.so.11 (0x00007f535ee00000) libcusolver.so.11 => /usr/local/cuda/lib64/libcusolver.so.11 (0x00007f534c800000) libcurand.so.10 => /usr/local/cuda/lib64/libcurand.so.10 (0x00007f5346200000) libcublas.so.12 => /usr/local/cuda/lib64/libcublas.so.12 (0x00007f533f800000) libcublasLt.so.12 => /usr/local/cuda/lib64/libcublasLt.so.12 (0x00007f531e800000) libutil.so.1 => /lib64/libutil.so.1 (0x00007f539ea63000) libnvJitLink.so.12 => /usr/local/cuda/lib64/libnvJitLink.so.12 (0x00007f531b800000) sahanp@devgpu086 ~/pytorch (detached HEAD\|REBASE-i 3/5)> ldd build/lib/libtorch_global_deps.so (pytorch-3.10) linux-vdso.so.1 (0x00007ffc265df000) libmkl_intel_lp64.so.2 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libmkl_intel_lp64.so.2 (0x00007fa93fc00000) libmkl_gnu_thread.so.2 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libmkl_gnu_thread.so.2 (0x00007fa93de00000) libmkl_core.so.2 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libmkl_core.so.2 (0x00007fa939800000) libm.so.6 => /lib64/libm.so.6 (0x00007fa940f05000) libcudart.so.12 => 
/usr/local/cuda/lib64/libcudart.so.12 (0x00007fa939400000) libnvToolsExt.so.1 => /usr/local/cuda/lib64/libnvToolsExt.so.1 (0x00007fa939000000) libgomp.so.1 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libgomp.so.1 (0x00007fa93fb07000) libc.so.6 => /lib64/libc.so.6 (0x00007fa938c00000) libdl.so.2 => /lib64/libdl.so.2 (0x00007fa940efe000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fa940ef9000) /lib64/ld-linux-x86-64.so.2 (0x00007fa940ff5000) librt.so.1 => /lib64/librt.so.1 (0x00007fa940ef2000) libstdc++.so.6 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libstdc++.so.6 (0x00007fa93921d000) libgcc_s.so.1 => /home/sahanp/.conda/envs/pytorch-3.10/lib/libgcc_s.so.1 (0x00007fa93faec000) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/126328 Approved by: https://github.com/atalman	2024-05-29 04:33:56 +00:00
PyTorch MergeBot	cdbb2c9acc	Revert "[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051 )" This reverts commit 4fdbaa794f9d5af2f171f772a51cb710c51c925f. Reverted https://github.com/pytorch/pytorch/pull/127051 on behalf of https://github.com/PaliC due to This PR needs to be synced using the import button as there is a bug in our diff train ([comment](https://github.com/pytorch/pytorch/pull/127051#issuecomment-2136428735))	2024-05-29 03:02:35 +00:00
cyy	4fdbaa794f	[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051 Approved by: https://github.com/cpuhrsch, https://github.com/malfet	2024-05-27 03:54:03 +00:00
cyy	7428fd19fe	Remove outdated options from setup.py (#125988 ) Since the recent removal of Caffe2 files. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125988 Approved by: https://github.com/ezyang	2024-05-21 18:48:23 +00:00
cyy	4ed93d6e0c	[Submodule] Remove zstd dependency (#126485 ) After searching in the codebase, it seems that zstd is not in use now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126485 Approved by: https://github.com/ezyang	2024-05-17 12:49:23 +00:00
FEI	b950217f19	Support third-party devices emit a range for each autograd operator (#125822 ) Fixes #125752 Pull Request resolved: https://github.com/pytorch/pytorch/pull/125822 Approved by: https://github.com/aaronenyeshi	2024-05-15 05:06:24 +00:00
Richard Barnes	b9e7b35912	Remove caffe2 from more build files (#125898 ) Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/125898 Approved by: https://github.com/Skylion007	2024-05-13 18:37:59 +00:00
PyTorch MergeBot	33fae4fcf4	Revert "Use recursive blob for package data (#119257 )" This reverts commit f20e3ae0c36146c962a5665018e9ad662a7cf211. Reverted https://github.com/pytorch/pytorch/pull/119257 on behalf of https://github.com/malfet due to This likely caused https://github.com/pytorch/pytorch/issues/124941, not sure why warning about recursive grep was ignored ([comment](https://github.com/pytorch/pytorch/pull/119257#issuecomment-2078312309))	2024-04-25 23:08:22 +00:00
Florian	7ad6dc2cf3	[Profiler][PrivateUse1] Profiler support PrivateUse1 key (#124818 ) Summary: 1.Package public headers of kineto if USE_KINETO so that they can be used by PrivateUse1 user. 2.Add PrivateUse1 key to ActivityType. 3. Support PrivateUse1 key in function deviceTypeFromActivity and _supported_activities. 4. Fix some bugs when processing profiler results. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124818 Approved by: https://github.com/aaronenyeshi	2024-04-24 18:52:08 +00:00
Timmy Xiao	f20e3ae0c3	Use recursive blob for package data (#119257 ) setup.py now supports a recursive glob for package data. I only added `.cpp`, `.h`, and `.yaml` files; I'm not sure whether BAZEL or other files should also be included in package_data. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119257 Approved by: https://github.com/zou3519	2024-04-20 06:33:39 +00:00
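A recursive glob for package data, as described in #119257, can be sketched with the standard `glob` module: a `**` component only descends into subdirectories when `recursive=True` is passed. The helper `collect_package_data` and its default extension list are hypothetical, not the code from setup.py.

```python
import glob
import os


def collect_package_data(package_dir, extensions=(".cpp", ".h", ".yaml")):
    """Return paths (relative to package_dir) of matching files at any depth.

    Hypothetical helper for illustration; the real setup.py logic may differ.
    """
    matches = []
    for ext in extensions:
        # "**" matches zero or more directories, so top-level files are included too.
        pattern = os.path.join(package_dir, "**", "*" + ext)
        for path in glob.glob(pattern, recursive=True):
            matches.append(os.path.relpath(path, package_dir))
    return sorted(matches)
```

Because the pattern can match every file under a large source tree, it is worth checking the resulting file list; #119257 was later reverted in this log.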
PyTorch MergeBot	36f6928a37	Revert "[Profiler][PrivateUse1] Profiler support PrivateUse1 key (#120556 )" This reverts commit 41613a0803f7cde7956f039bc80f94253b0843f9. Reverted https://github.com/pytorch/pytorch/pull/120556 on behalf of https://github.com/aaronenyeshi due to Breaks GPU Chrome trace UI ([comment](https://github.com/pytorch/pytorch/pull/120556#issuecomment-2061578951))	2024-04-17 15:38:14 +00:00
Xuehai Pan	2e48f7b044	[pytree] add `tree_iter` function (#123913 ) - Add a new `tree_iter` function. - Bump `optree` version to `0.11.0` for C++ version of `tree_iter`. This PR is split from #120300. - #120300 Pull Request resolved: https://github.com/pytorch/pytorch/pull/123913 Approved by: https://github.com/zou3519	2024-04-16 06:02:08 +00:00
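`tree_iter` yields the leaves of a pytree lazily instead of materializing a list. Below is a minimal pure-Python sketch of that idea. It assumes only lists, tuples, and dicts are containers. PyTorch's pytree also supports registered node types and treats some values (such as `None`) differently, so `tree_iter_sketch` is a hypothetical stand-in rather than the real API.

```python
from typing import Any, Iterator


def tree_iter_sketch(tree: Any) -> Iterator[Any]:
    """Yield leaves depth-first, left to right. Hypothetical helper, not torch's API."""
    if isinstance(tree, (list, tuple)):
        for child in tree:
            yield from tree_iter_sketch(child)
    elif isinstance(tree, dict):
        # Assumption: dict values are visited in insertion order; pytree
        # libraries may order dict keys differently.
        for key in tree:
            yield from tree_iter_sketch(tree[key])
    else:
        yield tree
```

Usage: `list(tree_iter_sketch([1, (2, {"a": 3})]))` gives `[1, 2, 3]`, and `next(...)` on the generator can stop early without flattening the rest of the tree.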
Florian	41613a0803	[Profiler][PrivateUse1] Profiler support PrivateUse1 key (#120556 ) Summary: 1.Package public headers of kineto if USE_KINETO so that they can be used by PrivateUse1 user. 2.Add PrivateUse1 key to ActivityType. 3. Support PrivateUse1 key in function deviceTypeFromActivity and _supported_activities. 4. Fix some bugs when processing profiler results. Co-authored-by: albanD <desmaison.alban@gmail.com> Co-authored-by: Aaron Shi <enye.shi@gmail.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/120556 Approved by: https://github.com/aaronenyeshi	2024-04-12 14:28:19 +00:00
Aidyn-A	a6080f79e9	[Build] Add linker script optimization (#121975 ) This PR adds a linker script optimization based on prioritized symbols that can be extracted from the profiles of popular workloads. The present linker script was generated to target ARM+CUDA and can be extended later if necessary. The reason we target ARM is shown below: > PyTorch and other applications that access more than 24x 2MB code regions in quick succession can result in performance bottlenecks in the CPU front-end. The link-time optimization improves executable code locality and improves performance. We recommend always turning on the optimization for PyTorch and other applications that behave similarly. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121975 Approved by: https://github.com/ptrblck, https://github.com/atalman	2024-04-09 20:22:25 +00:00
Nikita Shulga	5b0ce8f334	[Wheel] Change libtorch_cpu OpenMP search path (#123417 ) To prevent delocate from double-packing it, which makes Torch wheels unusable with torch.compile out of the box Fixes https://github.com/pytorch/pytorch/issues/122705 Pull Request resolved: https://github.com/pytorch/pytorch/pull/123417 Approved by: https://github.com/atalman	2024-04-05 13:02:38 +00:00
Lei,zhenyuan	15bd81bfaf	expose transformer header in cmake and wheel (#122586 ) Expose the transformer header in CMake and the wheel; some of its utility functions are used in nested transformer development on the IPEX side. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122586 Approved by: https://github.com/drisspg, https://github.com/Neilblaze, https://github.com/gujinghui	2024-04-03 02:27:40 +00:00
dujinhang	9990d1bc22	Add 'profiler/python' to the package.' (#121892 ) Fixes #ISSUE_NUMBER expose the `py_symbolize` interface for use. thank you Pull Request resolved: https://github.com/pytorch/pytorch/pull/121892 Approved by: https://github.com/zdevito	2024-03-16 11:11:26 +00:00
Bin Bao	bd19d6d822	[AOTI] Use torchgen to generate C shim functions (#120513 ) Summary: The current C shim layer manually implements a C interface for a handful of ops. Obviously that's not scalable if we want to extend it to cover all aten ops. This new torchgen script automatically generates C shim interfaces for CPU and CUDA backends. The interface follows the same parameter passing rules as the current C shim layer, such as * Use plain C data types to pass parameters * Use AtenTensorHandle to pass at::Tensor * Use pointer type to pass optional parameter * Use pointer+length to pass list * Use device_type+device_index to pass device * When a parameter is a pointer of pointer, e.g. AtenTensorHandle**, the script generates either a list of optional values or an optional list of values https://gist.github.com/desertfire/83701532b126c6d34dae6ba68a1b074a is an example of the generated torch/csrc/inductor/aoti_torch/generated/c_shim_cuda.cpp file. The current version doesn't generate C shim wrappers for all aten ops, and probably generates more wrappers than needed on the other hand, but it should serve as a good basis. This PR by itself won't change AOTI codegen and thus won't introduce any FC breakage. The actual wrapper codegen changes will come in another PR with some version control flag to avoid FC breakage. Differential Revision: [D54258087](https://our.internmc.facebook.com/intern/diff/D54258087) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120513 Approved by: https://github.com/jansel	2024-03-05 04:28:44 +00:00
atalman	cb812c9832	Add windows constraint to mkl package in wheel (#121014 ) Follow up on: https://github.com/pytorch/pytorch/pull/102604 Address this comment: https://github.com/pytorch/pytorch/pull/102604#discussion_r1419944305 Whl metadata for all wheels published to pypi must match, otherwise poetry install will fail see this comment: https://github.com/pytorch/pytorch/issues/88049#issuecomment-1302555269 Pull Request resolved: https://github.com/pytorch/pytorch/pull/121014 Approved by: https://github.com/malfet	2024-03-04 20:54:26 +00:00
Kurman Karabukaev	b0cfa96e82	[Torchelastic][Logging] Pluggable logsspecs using python entrypoints and option to specify one by name. (#120942 ) Summary: Expose an option to users to specify name of the LogsSpec implementation to use. - Has to be defined in entrypoints under `torchrun.logs_specs` group. - Must implement LogsSpec defined in prior PR/diff. Test Plan: unit test+local tests Reviewed By: ezyang Differential Revision: D54180838 Pull Request resolved: https://github.com/pytorch/pytorch/pull/120942 Approved by: https://github.com/ezyang	2024-03-02 08:07:52 +00:00
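Looking up a named implementation in an entry-point group, as #120942 describes for `torchrun.logs_specs`, can be sketched with `importlib.metadata`. Only the group name comes from the commit message; `load_logs_spec_class` and its error handling are hypothetical, not torchrun's actual code.

```python
import importlib.metadata

GROUP = "torchrun.logs_specs"


def load_logs_spec_class(name, group=GROUP):
    """Return the object registered under `name` in the entry-point group.

    Hypothetical helper for illustration; raises ValueError if nothing matches.
    """
    eps = importlib.metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: filter by group with select().
        candidates = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group.
        candidates = eps.get(group, [])
    for ep in candidates:
        if ep.name == name:
            return ep.load()
    raise ValueError(f"no entry point named {name!r} in group {group!r}")
```

A third-party package would register its implementation under that group in its packaging metadata, for example a `[project.entry-points."torchrun.logs_specs"]` table in `pyproject.toml` containing `my_spec = "my_pkg.logs:MyLogsSpec"` (hypothetical names).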
Lei,zhenyuan	eee040c939	expose nested header to wheel (#120603 ) Expose the nested tensor headers in the PyTorch wheel, to help developers reuse PyTorch's nested-tensor utility headers from inside the wheel. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120603 Approved by: https://github.com/jbschlosser, https://github.com/gujinghui	2024-03-01 09:59:45 +00:00
Yu, Guangye	df40847486	Add xpu header to include/ATen/xpu (#120786 ) # Motivation Add xpu header file to `include/ATen/xpu` to make them public. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120786 Approved by: https://github.com/gujinghui, https://github.com/EikanWang, https://github.com/jgong5, https://github.com/albanD	2024-02-28 16:22:14 +00:00
Jeff Daily	0e6eee3c89	[ROCm] TunableOp (#114894 ) Some operations, such as GEMMs, could be implemented using more than one library or more than one technique. For example, a GEMM could be implemented for CUDA or ROCm using either the blas or blasLt libraries. Further, ROCm's rocblas and hipblaslt libraries allow the user to query for all possible algorithms and then choose one. How does one know which implementation is the fastest and should be chosen? That's what TunableOp provides. See the README.md for additional details. TunableOp was ported from onnxruntime starting from commit `08dce54266`. The content was significantly modified and reorganized for use within PyTorch. The files copied and their approximate new names or source content location within aten/src/ATen/cuda/tunable include the following: - onnxruntime/core/framework/tunable.h -> Tunable.h - onnxruntime/core/framework/tuning_context.h -> Tunable.h - onnxruntime/core/framework/tuning_context_impl.h -> Tunable.cpp - onnxruntime/core/providers/rocm/tunable/gemm_common.h -> GemmCommon.h - onnxruntime/core/providers/rocm/tunable/gemm_hipblaslt.h -> GemmHipblaslt.h - onnxruntime/core/providers/rocm/tunable/gemm_rocblas.h -> GemmRocblas.h - onnxruntime/core/providers/rocm/tunable/gemm_tunable.cuh -> TunableGemm.h - onnxruntime/core/providers/rocm/tunable/rocm_tuning_context.cc -> Tunable.cpp - onnxruntime/core/providers/rocm/tunable/util.h -> StreamTimer.h - onnxruntime/core/providers/rocm/tunable/util.cc -> StreamTimer.cpp Pull Request resolved: https://github.com/pytorch/pytorch/pull/114894 Approved by: https://github.com/xw285cornell, https://github.com/jianyuh	2024-02-14 19:03:49 +00:00
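The core idea behind TunableOp, timing each candidate implementation of an op on the same inputs and keeping the fastest, can be sketched in a few lines. This is a simplified illustration with a hypothetical `tune` helper and CPU wall-clock timing via `time.perf_counter`. The real TunableOp is C++ code that times GPU work; see the README referenced in the commit for its actual behavior.

```python
import time
from typing import Callable, Dict, Tuple


def tune(candidates: Dict[str, Callable], *args, repeats: int = 5) -> Tuple[str, float]:
    """Run every candidate on the same inputs and return (fastest_name, mean_seconds).

    Hypothetical helper; assumes all candidates compute the same result.
    """
    best_name, best_time = None, float("inf")
    for name, fn in candidates.items():
        fn(*args)  # warm-up run, excluded from timing
        start = time.perf_counter()
        for _ in range(repeats):
            fn(*args)
        elapsed = (time.perf_counter() - start) / repeats
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name, best_time
```

In the same spirit, a tuned choice would normally be remembered and reused for later calls with the same problem shape rather than re-tuned on every call.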
Aaron Gokaslan	f9200c8608	[BE][Ez]: FURB129: remove unneeded readlines() (#119796 ) Applies a refurb rule to remove any readlines() in a for loop iteration as it just creates a temporary list in memory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119796 Approved by: https://github.com/ezyang	2024-02-13 21:21:22 +00:00
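The FURB129 rewrite from #119796 relies on the fact that a file object is already an iterator over its lines. Calling `readlines()` inside a `for` loop only builds a throwaway list first. The two hypothetical functions below count non-empty lines; the second one streams the file line by line.

```python
def count_nonempty_readlines(path):
    # Before: readlines() materializes every line in a list first.
    with open(path) as f:
        return sum(1 for line in f.readlines() if line.strip())


def count_nonempty_streaming(path):
    # After (FURB129): iterate the file object directly, one line at a time.
    with open(path) as f:
        return sum(1 for line in f if line.strip())
```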
Nikita Shulga	60148f1761	[EZ] Set maximum supported version of Python as 3.12 (#119743 ) Doesn't really affect anything other than metadata on PyPI website Otherwise programming languages tab on https://pypi.org/project/torch/2.2.0/ shows supported version 3.8 to 3.10: <img width="239" alt="image" src="https://github.com/pytorch/pytorch/assets/2453524/e17f9982-8833-4cd8-b8d8-b2f1cb538548"> Pull Request resolved: https://github.com/pytorch/pytorch/pull/119743 Approved by: https://github.com/kit1980, https://github.com/Skylion007	2024-02-13 06:56:32 +00:00
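The "Programming Language" list on PyPI comes from the trove classifiers in the package metadata, so supporting Python 3.12 means emitting a `3.12` classifier. Assuming the classifiers are derived from a min/max version range, a hypothetical helper might look like this (not the actual setup.py code):

```python
def python_version_classifiers(min_minor, max_minor, major=3):
    """Build one trove classifier per supported Python minor version (inclusive range).

    Hypothetical helper for illustration.
    """
    return [
        f"Programming Language :: Python :: {major}.{minor}"
        for minor in range(min_minor, max_minor + 1)
    ]
```

For example, `python_version_classifiers(8, 12)` yields five classifiers, from `3.8` through `3.12`.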
Yu, Guangye	a205e7bf56	[3/4] Intel GPU Runtime Upstreaming for Device (#116850 ) # Motivation Following [[1/4] Intel GPU Runtime Upstreaming for Device](https://github.com/pytorch/pytorch/pull/116019), and as mentioned in [[RFC] Intel GPU Runtime Upstreaming](https://github.com/pytorch/pytorch/issues/114842), this third PR covers the changes under `libtorch_python`. # Design This PR primarily provides device-related APIs in the Python frontend, including - `torch.xpu.is_available` - `torch.xpu.device_count` - `torch.xpu.current_device` - `torch.xpu.set_device` - `torch.xpu.device` - `torch.xpu.device_of` - `torch.xpu.get_device_name` - `torch.xpu.get_device_capability` - `torch.xpu.get_device_properties` - ==================== - `torch.xpu._DeviceGuard` - `torch.xpu._is_compiled` - `torch.xpu._get_device` # Additional Context We will implement support for lazy initialization in the next PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/116850 Approved by: https://github.com/EikanWang, https://github.com/jgong5, https://github.com/gujinghui, https://github.com/malfet	2024-02-01 12:31:26 +00:00
Zhengxu Chen	2d37a046e7	[export] Enforce serialization BC/FC with updater script. (#118424 ) Summary: This diff implements a mechanism for safely updating the torch.export serialization schema, aka schema.py, which is the API surface with the strongest compatibility guarantee. The diff consists of 3 changes: - Added a script to "build" or "materialize" schema.py into a platform-neutral format (yaml), which serves as the committed form of the serialization schema. - Added a unit test comparing schema.py against schema.yaml, which forces developers to run the updater script when the two files mismatch. - Added a checker inside the updater script, so that all compatible changes result in a minor version bump and all incompatible changes result in a major version bump. torch.export's serialization BC/FC policy is (tentatively) documented here: https://docs.google.com/document/d/1EN7JrHbOPDhbpLDtiYG4_BPUs7PttpXlbZ27FuwKhxg/edit#heading=h.pup7ir8rqjhx , we will update the As noted in the code doc, people should be able to run the following command to update the schema properly from now on: ``` python scripts/export/update_schema.py --prefix <path_to_torch_development_directory> or buck run caffe2:export_update_schema -- --prefix /data/users/$USER/fbsource/fbcode/caffe2/ ``` Test Plan: buck test mode/opt caffe2/test:test_export -- -r test_schema buck run caffe2:update_export_schema -- --prefix /data/users/$USER/fbsource/fbcode/caffe2/ Differential Revision: D52971020 Pull Request resolved: https://github.com/pytorch/pytorch/pull/118424 Approved by: https://github.com/angelayi	2024-01-31 05:37:58 +00:00
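The version-bump rule in #118424 says compatible schema changes bump the minor version and incompatible ones bump the major version. It can be sketched as below. The `(major, minor)` tuple and the choice to reset the minor version on a major bump are assumptions for illustration, not taken from `update_schema.py`.

```python
from typing import Tuple


def bump_schema_version(version: Tuple[int, int], compatible: bool) -> Tuple[int, int]:
    """Return the next (major, minor) schema version.

    Hypothetical helper: assumes a major bump resets the minor version to 0.
    """
    major, minor = version
    if compatible:
        # Compatible change: readers of the old schema keep working.
        return (major, minor + 1)
    # Incompatible change: signal a breaking schema revision.
    return (major + 1, 0)
```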
feifan	3c77a3ed03	export ATen/native/sparse/*.h (#118274 ) Fixes #ISSUE_NUMBER We are trying to adapt `SparsePrivateUse1` in our code. However, I found that `sparse_stup` has not been exposed yet, which makes it impossible for me to implement stup and register. I hope that the header files in this directory can be exposed. @albanD Pull Request resolved: https://github.com/pytorch/pytorch/pull/118274 Approved by: https://github.com/ezyang	2024-01-25 22:47:39 +00:00
mantaionut	6784594532	Fix sparse windows on CPU with MKL (#102604 ) Fix https://github.com/pytorch/pytorch/issues/97352. This PR changes the way linking to Intel MKL is done and updates MKL on Windows to mkl-2021.4.0. For both conda and pip there are MKL packages that you can link against dynamically: mkl-devel contains the static versions of the dlls, and mkl contains the dlls needed at runtime. Starting with 2021.4.0, MKL dlls and static libs have the version in their names (for MKL 2023 we have mkl_core.2.dll and for 2021.4.0 we have mkl_core.1.dll), so it's possible to have multiple versions installed and have them work properly. For the wheel build I added a dependency on the MKL wheel, for conda a dependency on the conda MKL package, and for libtorch I copied the MKL binaries into libtorch. To test this PR I had to use a custom builder: https://github.com/pytorch/builder/pull/1467 Pull Request resolved: https://github.com/pytorch/pytorch/pull/102604 Approved by: https://github.com/IvanYashchuk, https://github.com/malfet	2024-01-23 17:41:18 +00:00
Nikita Shulga	c4eab49ded	[MacOS] Embed libomp.dylib/omp.h into MacOS wheel (#114816 ) To keep them on par with what we do on x86, and embed `omp.h` as well, since it is needed for `torch.compile` on CPU. Pull Request resolved: https://github.com/pytorch/pytorch/pull/114816 Approved by: https://github.com/atalman	2024-01-19 21:21:33 +00:00
Yu, Guangye	50049cfaa0	[1/4] Intel GPU Runtime Upstreaming for Device (#116019 ) # Motivation As mentioned in [[RFC] Intel GPU Runtime Upstreaming](https://github.com/pytorch/pytorch/issues/114842), the first runtime component we would like to upstream is `Device`, which contains the device management functions of Intel GPU's runtime. To facilitate code review, we split the code changes into 4 PRs. This is one of the 4 PRs and covers the changes under `c10`. # Design An Intel GPU device is a wrapper around a SYCL device on which kernels can be executed. In our design, we maintain a SYCL device pool containing all the GPU devices of the current machine, and PyTorch manages the status of the device pool. Thread safety is considered in this design. The corresponding C++ files related to `Device` will be placed in the c10/xpu folder, and we provide c10 device runtime APIs, like - `c10::xpu::device_count` - `c10::xpu::set_device` - ... # Additional Context In our plan, 4 PRs should be submitted to PyTorch for `Device`: 1. for c10 2. for aten 3. for python frontend 4. for lazy initialization shared with CUDA Pull Request resolved: https://github.com/pytorch/pytorch/pull/116019 Approved by: https://github.com/gujinghui, https://github.com/jgong5, https://github.com/EikanWang, https://github.com/malfet	2024-01-12 07:36:25 +00:00
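The device-pool design in this commit (one pool holding every GPU on the machine, exposed through runtime APIs such as `device_count` and `set_device`) can be sketched in Python as follows. This is a hypothetical illustration, not the C++ `c10::xpu` implementation: the `DevicePool` class, the `enumerate_devices` callback standing in for SYCL device discovery, the lazy one-time initialization, and keeping the current device index in thread-local storage are all assumptions.

```python
import threading
from typing import Callable, List


class DevicePool:
    """Hypothetical pool: devices shared process-wide, current index kept per thread."""

    def __init__(self, enumerate_devices: Callable[[], List[str]]):
        self._enumerate = enumerate_devices  # stand-in for real SYCL device discovery
        self._devices: List[str] = []
        self._initialized = False
        self._init_lock = threading.Lock()
        self._tls = threading.local()  # each thread sees its own `current` attribute

    def _lazy_init(self) -> None:
        # Double-checked locking: enumeration runs once even if threads race here.
        if self._initialized:
            return
        with self._init_lock:
            if not self._initialized:
                self._devices = list(self._enumerate())
                self._initialized = True

    def device_count(self) -> int:
        self._lazy_init()
        return len(self._devices)

    def set_device(self, index: int) -> None:
        self._lazy_init()
        if not 0 <= index < len(self._devices):
            raise IndexError(f"device index {index} out of range [0, {len(self._devices)})")
        self._tls.current = index  # affects only the calling thread

    def current_device(self) -> int:
        self._lazy_init()
        return getattr(self._tls, "current", 0)  # threads that never set a device use 0
```

Keeping the pool shared but the current index per thread means one thread calling `set_device` does not change which device other threads target.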
Edward Yang	b4a35632f9	Add function to materialize COW storages (#117053 ) Summary: From Kurt Mohler, see https://github.com/pytorch/pytorch/pull/113396 (manually imported due to ghimport problems) Test Plan: sandcastle, OSS CI Differential Revision: D52610522 Pull Request resolved: https://github.com/pytorch/pytorch/pull/117053 Approved by: https://github.com/malfet, https://github.com/kurtamohler	2024-01-10 15:34:16 +00:00
PyTorch MergeBot	9ac0e6971a	Revert "[1/4] Intel GPU Runtime Upstreaming for Device (#116019 )" This reverts commit b4cebe2c34242ceee3a1bc285f426662942a29ac. Reverted https://github.com/pytorch/pytorch/pull/116019 on behalf of https://github.com/malfet due to Broke internal and periodic buck builds, see https://github.com/pytorch/pytorch/actions/runs/7414664129/job/20176215868 ([comment](https://github.com/pytorch/pytorch/pull/116019#issuecomment-1879030285))	2024-01-05 17:36:39 +00:00
Yu, Guangye	b4cebe2c34	[1/4] Intel GPU Runtime Upstreaming for Device (#116019 ) # Motivation As mentioned in [[RFC] Intel GPU Runtime Upstreaming](https://github.com/pytorch/pytorch/issues/114842), the first runtime component we would like to upstream is `Device`, which contains the device management functions of Intel GPU's runtime. To facilitate code review, we split the code changes into 4 PRs. This is one of the 4 PRs and covers the changes under `c10`. # Design An Intel GPU device is a wrapper around a SYCL device on which kernels can be executed. In our design, we maintain a SYCL device pool containing all the GPU devices of the current machine, and PyTorch manages the status of the device pool. Thread safety is considered in this design. The corresponding C++ files related to `Device` will be placed in the c10/xpu folder, and we provide c10 device runtime APIs, like - `c10::xpu::device_count` - `c10::xpu::set_device` - ... # Additional Context In our plan, 4 PRs should be submitted to PyTorch for `Device`: 1. for c10 2. for aten 3. for python frontend 4. for lazy initialization shared with CUDA Pull Request resolved: https://github.com/pytorch/pytorch/pull/116019 Approved by: https://github.com/gujinghui, https://github.com/jgong5, https://github.com/EikanWang, https://github.com/malfet	2024-01-04 17:35:04 +00:00


796 Commits