pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Author	SHA1	Message	Date
jjsjann123	7282be3d91	Patch for nvfuser build (#97404 ) 1. Packaging nvfuser headers to support C++ builds against nvfuser; 2. Moving `#include <torch/csrc/jit/codegen/fuser/interface.h>` from `torch/csrc/jit/runtime/register_ops_utils.h` to `torch/csrc/jit/runtime/register_prim_ops_fulljit.cpp` to avoid a missing header, since pytorch doesn't package `interface.h`; 3. Patching DynamicLibrary load of nvfuser to leak the handle, which avoids double de-allocation of `libnvfuser_codegen.so`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97404 Approved by: https://github.com/davidberard98	2023-03-28 23:36:08 +00:00
Han Qi (qihqi)	b895a0a675	[BE] Move flatbuffer related python C bindings to script_init (#97476 ) Summary: Extra C binding module for flatbuffer was introduced because not all dependencies of PyTorch want to (or can) bundle in flatbuffer. However, flatbuffer is in by default now so this separate binding is no longer needed. Test Plan: existing unit tests Differential Revision: D44352583 Pull Request resolved: https://github.com/pytorch/pytorch/pull/97476 Approved by: https://github.com/dbort	2023-03-28 17:56:32 +00:00
PyTorch MergeBot	5170995b2a	Revert "Upgrade NVTX to NVTX3 (#90689 )" This reverts commit e64ddd1ab9d46cfc921c19269969ffc5cd7d6f6c. Reverted https://github.com/pytorch/pytorch/pull/90689 on behalf of https://github.com/osalpekar due to Build Failures due to not being able to find one nvtx3 header in FRL jobs: [D42332540](https://www.internalfb.com/diff/D42332540)	2023-03-24 18:16:06 +00:00
cyy	e64ddd1ab9	Upgrade NVTX to NVTX3 (#90689 ) Due to the recent upgrade to CUDA 11, we can upgrade NVTX to NVTX3 as well, which is a header-only library that can simplify the build system a lot. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90689 Approved by: https://github.com/soumith, https://github.com/malfet	2023-03-23 01:56:42 +00:00
Nikita Shulga	1ab883797a	[BE] Dedup hardcoded triton versions (#96580 ) Define it once in `.ci/docker/triton_version.txt` and use everywhere. Also, patch version defined in `triton/__init__.py` as currently it always returns `2.0.0` even if package name is `2.1.0`. Follow-up after https://github.com/pytorch/pytorch/pull/95896 where version needed to be updated in 4+ places. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96580 Approved by: https://github.com/huydhn	2023-03-12 20:00:48 +00:00
PyTorch MergeBot	30b968f60d	Revert "[BE] Dedup hardcoded triton versions (#96580 )" This reverts commit c131e51e6248cf04135db317040b5be3ab944d41. Reverted https://github.com/pytorch/pytorch/pull/96580 on behalf of https://github.com/malfet due to Forgot to fix lint	2023-03-12 19:37:52 +00:00
Nikita Shulga	c131e51e62	[BE] Dedup hardcoded triton versions (#96580 ) Define it once in `.ci/docker/triton_version.txt` and use everywhere. Also, patch version defined in `triton/__init__.py` as currently it always returns `2.0.0` even if package name is `2.1.0`. Follow-up after https://github.com/pytorch/pytorch/pull/95896 where version needed to be updated in 4+ places. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96580 Approved by: https://github.com/huydhn	2023-03-12 16:56:04 +00:00
Natalia Gimelshein	76cac70939	new triton main pin (#95896 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/95896 Approved by: https://github.com/jansel, https://github.com/malfet	2023-03-10 06:30:41 +00:00
cyy	6786a24fd2	fix some tiny code issues (#95757 ) This PR tries to: 1. fix a misspelled NDEBUG preprocessing condition. 2. get rid of all writable-strings warnings. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95757 Approved by: https://github.com/soulitzer	2023-03-01 23:27:32 +00:00
Wei Wang	46f092dc66	Add jinja2 as mandatory dependency (#95691 ) Should fix #95671 for nightly wheels issue. v2.0.0 RC does not need this. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95691 Approved by: https://github.com/malfet	2023-03-01 17:28:55 +00:00
cyy	f27e09de04	Cleanup Windows warning suppression in CMake and fix some warnings in the source code (#94927 ) This PR does two things: 1. It moves some Windows warning suppression from various CMake files into the main CMakeLists.txt, following the conventions of gcc and clang. 2. It fixes some Windows warnings in the source code. Most importantly, it fixes lots of dll warnings by adjusting C10_API to TORCH_API or TORCH_PYTHON_API. There are still some dll warnings because some TORCH_API functions are actually built as part of libtorch_python. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94927 Approved by: https://github.com/malfet	2023-02-27 19:22:20 +00:00
donnyyou	5d70ee93fa	Expose more headers for extensions. (#95447 ) Expose more headers for extensions of distributed methods. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95447 Approved by: https://github.com/ezyang	2023-02-27 18:59:40 +00:00
jjsjann123	21eb7f70f1	Nvfuser python API import fix (#94036 ) 1. Having nvfuser python API import working with both devel and upstream; 2. Add environment variable to allow custom nvfuser code base to be built with upstream pytorch core. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94036 Approved by: https://github.com/malfet, https://github.com/davidberard98	2023-02-16 20:10:40 +00:00
Douglas Lehr	77d1135566	[ROCm] Pyt 2.0 rocm staging (#94660 ) Add triton support for ROCm builds of PyTorch. * Enables inductor and dynamo when rocm is detected * Adds support for pytorch-triton-mlir backend * Adds check_rocm support for verify_dynamo.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/94660 Approved by: https://github.com/malfet	2023-02-15 06:15:18 +00:00
Wen Chen	69bcefceec	[ROCm] Added MIOpen header files to installation package for ROCm. (#92969 ) Added MIOpen header files to installation package for building PyTorch extensions that require MIOpen as a dependency. Pull Request resolved: https://github.com/pytorch/pytorch/pull/92969 Approved by: https://github.com/jeffdaily, https://github.com/malfet	2023-02-14 21:43:31 +00:00
Xuehai Pan	69e0bda999	[BE] Import `Literal`, `Protocol`, and `Final` from standard library `typing` as of Python 3.8+ (#94490 ) Changes: 1. `typing_extensions -> typing-extensions` in dependency. Use dash rather than underscore to fit the [PEP 503: Normalized Names](https://peps.python.org/pep-0503/#normalized-names) convention. ```python import re def normalize(name): return re.sub(r"[-_.]+", "-", name).lower() ``` 2. Import `Literal`, `Protocol`, and `Final` from standard library as of Python 3.8+ 3. Replace `Union[Literal[XXX], Literal[YYY]]` with `Literal[XXX, YYY]`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94490 Approved by: https://github.com/ezyang, https://github.com/albanD	2023-02-09 19:17:49 +00:00
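The PEP 503 normalization quoted in #94490 is flattened onto one line above. A runnable sketch of the same function follows; the sample names are chosen here for illustration and are not taken from the PR.

```python
import re


def normalize(name: str) -> str:
    """Normalize a project name following the PEP 503 rule quoted above."""
    # Collapse any run of '-', '_' or '.' into a single '-', then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


# Illustrative inputs (assumed, not from the PR): all three spellings
# refer to the same project once normalized.
names = ["typing_extensions", "typing-extensions", "Typing.Extensions"]
normalized = [normalize(n) for n in names]
```

This is why the dependency can be spelled with a dash: an installer that follows PEP 503 treats both spellings as the same project.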
Soumith Chintala	76b999803a	add filelock as a dependency (#91607 ) `filelock` is a dependency now for inductor's caching mechanism and CPU backend. Add `filelock` as a dependency Fixes https://github.com/pytorch/pytorch/issues/93499 Pull Request resolved: https://github.com/pytorch/pytorch/pull/91607 Approved by: https://github.com/anijain2305, https://github.com/jansel	2023-02-01 17:30:55 +00:00
Nikita Shulga	5976f0bdfe	Set min supported Python version to 3.8 (#93155 ) Also, grep for `if sys.version_info .cond. (3, 8)` and replace them with the appropriate action. This is the last in a series of PRs that moved CI/CD away from testing PyTorch behavior against Python-3.7. Fixes https://github.com/pytorch/pytorch/issues/80513 Pull Request resolved: https://github.com/pytorch/pytorch/pull/93155 Approved by: https://github.com/huydhn	2023-01-29 18:28:46 +00:00
jjsjann123	c11b301bcd	[NVFUSER] refactor nvfuser build (#89621 ) This PR is the first step towards refactoring the build for nvfuser in order to have the codegen be a standalone library. Contents inside this PR: 1. nvfuser code base has been moved to `./nvfuser`, from `./torch/csrc/jit/codegen/cuda/`, except for registration code for integration (interface.h/interface.cpp) 2. splits the build system so nvfuser is generating its own `.so` files. Currently there are: - `libnvfuser_codegen.so`, which contains the integration, codegen and runtime system of nvfuser - `nvfuser.so`, which is nvfuser's python API via pybind. Python frontend is now exposed via `nvfuser._C.XXX` instead of `torch._C._nvfuser` 3. nvfuser cpp tests are currently being compiled into `nvfuser_tests` 4. cmake is refactored so that: - nvfuser now has its own `CMakeLists.txt`, which is under `torch/csrc/jit/codegen/cuda/`. - nvfuser backend code is not compiled inside `libtorch_cuda_xxx` any more - nvfuser is added as a subdirectory under `./CMakeLists.txt` at the very end after torch is built. - since nvfuser has a dependency on torch, the registration of nvfuser at runtime is done via dlopen (`at::DynamicLibrary`). This avoids a circular dependency in cmake, which would be a nightmare to handle. For details, look at `torch/csrc/jit/codegen/cuda/interface.cpp::LoadingNvfuserLibrary` Future work that's scoped in following PR: - Currently since nvfuser codegen has a dependency on torch, we need to refactor that out so we can move nvfuser into a submodule and not rely on dlopen to load the library. @malfet - Since we moved nvfuser into a cmake build, we effectively disabled bazel build for nvfuser. This could impact internal workload at Meta, so we need to put support back. cc'ing @vors Pull Request resolved: https://github.com/pytorch/pytorch/pull/89621 Approved by: https://github.com/davidberard98	2023-01-26 02:50:44 +00:00
Driss Guessous	4bc0491752	Add USE_FLASH_ATTENTION flag to setup.py (#92903 ) # Summary Adds documentation to setup.py for USE_FLASH_ATTENTION=0 disabling to decrease build times. Pull Request resolved: https://github.com/pytorch/pytorch/pull/92903 Approved by: https://github.com/cpuhrsch, https://github.com/bdhirsh	2023-01-24 22:59:51 +00:00
Jason Ansel	7c1c239db1	[inductor] Rewrite Triton templates + epilogue fusion (retry) (#91575 ) This reverts commit 94262efc7d381ace82aa74ed2f5f5ec76f8fca95 to reland #91105 / #90738. Fixes https://github.com/pytorch/torchdynamo/issues/2015 Pull Request resolved: https://github.com/pytorch/pytorch/pull/91575 Approved by: https://github.com/ngimel	2023-01-11 00:08:03 +00:00
Adrian Ostrowski	d0a4e2e782	Don't remove files across the whole OS on clean (#91503 ) setup.py clean now won't remove paths matching .gitignore patterns across the entire OS. Instead, now only files from the repository will be removed. `/build_*` had to be removed from .gitignore because with the wildcard fixed, build_variables.bzl file was deleted on cleanup. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91503 Approved by: https://github.com/soumith	2023-01-06 05:13:51 +00:00
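A minimal sketch of the idea behind #91503, assuming ignore-style patterns are expanded with `glob` relative to the repository root. The helper name `paths_to_clean` and the demo files are hypothetical and not taken from `setup.py`.

```python
import glob
import os
import tempfile


def paths_to_clean(repo_root, patterns):
    """Expand ignore-style patterns relative to repo_root only (hypothetical helper)."""
    matches = []
    for pattern in patterns:
        relative = pattern.strip()
        if not relative or relative.startswith("#"):
            continue  # skip blank lines and comments
        # Drop leading "/" or "./" so a pattern such as "/build" is resolved
        # inside the repository instead of at the filesystem root.
        while relative.startswith(("/", "./")):
            relative = relative[1:] if relative.startswith("/") else relative[2:]
        if relative:
            matches.extend(glob.glob(os.path.join(repo_root, relative)))
    return sorted(matches)


# Demo in a throwaway directory standing in for a repository checkout.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "build"))
    open(os.path.join(root, "keep.py"), "w").close()
    found = [
        os.path.relpath(path, root)
        for path in paths_to_clean(root, ["/build", "# a comment", "*.pyc"])
    ]
```

The sketch does not guard against patterns containing `..`; a real cleanup step would need to reject or resolve those as well.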
Wei Wang	cce577b391	Revert D42257039: Multisect successfully blamed D42257039 for test or build failures (#91548 ) Summary: This diff is reverting D42257039 D42257039 has been identified to be causing the following test or build failures: Tests affected: - [assistant/neural_dm/rl/modules/tests:action_mask_classifier_test - main](https://www.internalfb.com/intern/test/281475048940766/) Here's the Multisect link: https://www.internalfb.com/intern/testinfra/multisect/1493969 Here are the tasks that are relevant to this breakage: T93770103: 1 test started failing for oncall assistant_multimodal in the last 2 weeks We're generating a revert to back out the changes in this diff, please note the backout may land if someone accepts it. Test Plan: NA Reviewed By: weiwangmeta Differential Revision: D42272391 Pull Request resolved: https://github.com/pytorch/pytorch/pull/91548 Approved by: https://github.com/kit1980	2023-01-02 21:08:30 +00:00
Nikita Shulga	bc92444b34	Rename `torchtriton` (#91539 ) to `pytorch-triton` Pull Request resolved: https://github.com/pytorch/pytorch/pull/91539 Approved by: https://github.com/seemethere, https://github.com/soumith	2022-12-30 22:49:17 +00:00
Jasha	1c681f4bd8	Fix distutils.LooseVersion DeprecationWarning (#88524 ) Fixes #84712 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88524 Approved by: https://github.com/MaKaNu, https://github.com/milutter, https://github.com/soumith	2022-12-27 11:46:00 +00:00
Mengwei Liu	2f154f68ea	[torchgen] Add CI job to make sure torchgen works for Executorch op registration (#89596 ) ## Job Test running on most CI jobs. ## Test binary * `test_main.cpp`: entry for gtest * `test_operator_registration.cpp`: test cases for gtest ## Helper sources * `operator_registry.h/cpp`: simple operator registry for testing purpose. * `Evalue.h`: a boxed data type that wraps ATen types, for testing purpose. * `selected_operators.yaml`: operators Executorch care about so far, we should cover all of them. ## Templates * `NativeFunctions.h`: for generating headers for native functions. (not compiled in the test, since we will be using `libtorch`) * `RegisterCodegenUnboxedKernels.cpp`: for registering boxed operators. * `Functions.h`: for declaring operator C++ APIs. Generated `Functions.h` merely wraps `ATen/Functions.h`. ## Build files * `CMakeLists.txt`: generate code to register ops. * `build.sh`: driver file, to be called by CI job. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89596 Approved by: https://github.com/ezyang	2022-12-21 03:07:32 +00:00
PyTorch MergeBot	94262efc7d	Revert "[inductor] Rewrite Triton templates + epilogue fusion (retry) (#91105 )" This reverts commit d6dd2e97da619319a103d1061290fe33ce33b6a4. Reverted https://github.com/pytorch/pytorch/pull/91105 on behalf of https://github.com/atalman due to Broke internal builds	2022-12-21 00:02:38 +00:00
Jason Ansel	d6dd2e97da	[inductor] Rewrite Triton templates + epilogue fusion (retry) (#91105 ) https://github.com/pytorch/pytorch/pull/90738 seems a bit borked. ghimport fails on it, and I unlinked it from the Phabricator diff, but it still won't land. This is an exact copy of that PR without using ghstack. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91105 Approved by: https://github.com/ngimel	2022-12-20 02:38:23 +00:00
atalman	3bd37ff2d5	Removing invalid git option when updating submodules (#91132 ) Same as this: https://github.com/pytorch/builder/pull/1246 Related to following git commit: `51243f9f0f` Which makes jobs = 0 invalid. Nightlies for MacOS are failing because of this issue: https://github.com/pytorch/pytorch/actions/runs/3729522653/jobs/6325523414 Pull Request resolved: https://github.com/pytorch/pytorch/pull/91132 Approved by: https://github.com/kit1980, https://github.com/huydhn, https://github.com/malfet, https://github.com/seemethere	2022-12-20 02:17:02 +00:00
Ram Rachum	351d73b97f	Fix exception causes all over the codebase (#90271 ) This is the continuation to #90134 and hopefully the final PR in this series. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90271 Approved by: https://github.com/kit1980	2022-12-07 04:29:00 +00:00
Alexander Grund	fdb2dd113d	Install missing VSX headers (POWER) (#85547 ) E.g. `test_cpp_extensions_aot_ninja` fails as it includes `vec.h` which requires the vec/vsx/* headers and `sleef.h`. The latter is also required for AVX512 builds on non MSVC compilers. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85547 Approved by: https://github.com/kit1980	2022-11-24 01:52:11 +00:00
Jacob Hayes	2e358cc98f	Add platform markers for linux only extra_install_requires (#88826 ) Fixes #88049 https://github.com/pytorch/pytorch/pull/85097 added new extra dependencies on `nvidia-*`. They are linux (GPU) only packages, but were not marked as such, causing issues installing pytorch 1.13 via Poetry (and possibly other tools that follow PyPI's metadata API) on non-Linux systems. This "fixes" the issue by adding the `; platform_system == 'Linux'` marker on these dependencies, but the main problem of different metadata for different wheels is a [somewhat larger issue](https://github.com/pytorch/pytorch/issues/88049#issuecomment-1302555269). https://github.com/pytorch/pytorch/pull/85097 used `;` as a delimiter for splitting the different deps, but that is the delimiter used in markers, so I changed to split on `\|`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88826 Approved by: https://github.com/neersighted, https://github.com/lalmei, https://github.com/malfet	2022-11-18 14:09:21 +00:00
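To illustrate why #88826 changed the delimiter, here is a minimal sketch of splitting a requirement string on `|` so that the `;` inside each environment marker survives. The helper name and the package names are hypothetical and not taken from `setup.py`.

```python
from typing import List


def split_requirements(raw: str) -> List[str]:
    """Split a '|'-delimited requirement string, keeping ';' markers intact."""
    return [part.strip() for part in raw.split("|") if part.strip()]


# Hypothetical package names and versions, for illustration only.
raw = (
    "example-gpu-runtime==1.0; platform_system == 'Linux' | "
    "example-gpu-math==2.0; platform_system == 'Linux'"
)
requirements = split_requirements(raw)
```

Splitting the same string on `;` would instead separate each requirement from its marker, which is the problem the commit message describes.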
Wang, Eikan	6541e51ffd	Explicit vectorization support for TorchInductor (#87068 ) In this PR, we replace OMP SIMD with `aten::vec` to optimize TorchInductor vectorization performance. Take `res=torch.exp(torch.add(x, y))` as the example. The generated code is as follows if `config.cpp.simdlen` is 8. ```C++ extern "C" void kernel(const float* __restrict__ in_ptr0, const float* __restrict__ in_ptr1, float* __restrict__ out_ptr0, const long ks0, const long ks1) { #pragma omp parallel num_threads(48) { #pragma omp for for(long i0=0; i0<((ks0*ks1) / 8); ++i0) { auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); auto tmp2 = tmp0 + tmp1; auto tmp3 = tmp2.exp(); tmp3.store(out_ptr0 + 8*i0); } #pragma omp for simd simdlen(4) for(long i0=8*(((ks0*ks1) / 8)); i0<ks0*ks1; ++i0) { auto tmp0 = in_ptr0[i0]; auto tmp1 = in_ptr1[i0]; auto tmp2 = tmp0 + tmp1; auto tmp3 = std::exp(tmp2); out_ptr0[i0] = tmp3; } } } ``` The major pipeline is as follows. - Check whether the loop body could be vectorized by `aten::vec`. The checker consists of two parts. [One](`bf66991fc4/torch/_inductor/codegen/cpp.py (L702)`) is to check whether all the `ops` have been supported. The [other one](`355326faa3/torch/_inductor/codegen/cpp.py (L672)`) is to check whether the data access could be vectorized. - [`CppSimdVecKernelChecker`](`355326faa3/torch/_inductor/codegen/cpp.py (L655)`) - Create the `aten::vec` kernel and original omp simd kernel. Regarding the original omp simd kernel, it serves for the tail loop when the loop is vectorized.
- [`CppSimdVecKernel`](`355326faa3/torch/_inductor/codegen/cpp.py (L601)`) - [`CppSimdVecOverrides`](`355326faa3/torch/_inductor/codegen/cpp.py (L159)`): The ops that we have supported on the top of `aten::vec` - Create kernel - [`aten::vec` kernel](`355326faa3/torch/_inductor/codegen/cpp.py (L924)`) - [`Original CPP kernel - OMP SIMD`](`355326faa3/torch/_inductor/codegen/cpp.py (L929)`) - Generate code - [`CppKernelProxy`](`355326faa3/torch/_inductor/codegen/cpp.py (L753)`) is used to combine the `aten::vec` kernel and original cpp kernel - [Vectorize the most inner loop](`355326faa3/torch/_inductor/codegen/cpp.py (L753)`) - [Generate code](`355326faa3/torch/_inductor/codegen/cpp.py (L821)`) Next steps: - [x] Support reduction - [x] Vectorize the tail loop with `aten::vec` - [ ] Support BF16 - [ ] Optimize the loop condition and loop index calculation by replacing `div` with `add` Pull Request resolved: https://github.com/pytorch/pytorch/pull/87068 Approved by: https://github.com/jgong5, https://github.com/jansel	2022-11-07 06:24:14 +00:00
Radek Bartoň	ba26bc0fc2	Fix random "C1041: cannot open program database" errors when compiling on Windows (#88084 ) Adds `/FS` option to `CMAKE_CXX_FLAGS` and `CMAKE_CUDA_FLAGS`. So far I've encountered this kind of error: ``` C:\Users\MyUser\AppData\Local\Temp\tmpxft_00004728_00000000-7_cuda.cudafe1.cpp: fatal error C1041: cannot open program database 'C:\Projects\pytorch\build\third_party\gloo\gloo\CMakeFiles\gloo_cuda.dir\vc140.pdb'; if multiple CL.EXE write to the same .PDB file, please use /FS ``` when building with VS 2022. cc @peterjc123 @mszhanyi @skyline75489 @nbcsm Related issues: - https://github.com/pytorch/pytorch/issues/87691 - https://github.com/pytorch/pytorch/issues/39989 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88084 Approved by: https://github.com/ezyang	2022-10-31 21:11:16 +00:00
Nikita Shulga	e7b854fae9	[BE] Do not package caffe2 in wheel (#87986 ) If PyTorch is built without caffe2 integration, do not package unusable .py files/headers. The same is true of functorch: don't package it unless building with `functorch` (although, I wonder if we should remove this option at some point in the future). Follow-up after https://github.com/pytorch/builder/pull/1181 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87986 Approved by: https://github.com/seemethere	2022-10-30 04:31:45 +00:00
atalman	4f2d869095	Fix distributed issue by including distributed files (#87615 ) This fixes regression in distributed headers installation. Caused by following PR: https://github.com/pytorch/pytorch/pull/85953 which removed the inclusions Fixes #87173 Test plan from wheel build by this CI: https://github.com/pytorch/pytorch/actions/runs/3314742519 ``` [ec2-user@ip-10-0-9-132 c10d]$ pwd /home/ec2-user/actions-runner/_work/_temp/artifacts/torch/include/torch/csrc/distributed/c10d [ec2-user@ip-10-0-9-132 c10d]$ ls -las total 300 4 drwxr-xr-x 2 ec2-user ec2-user 4096 Oct 24 19:12 . 0 drwxr-xr-x 4 ec2-user ec2-user 29 Oct 24 19:12 .. 12 -rw-r--r-- 1 ec2-user ec2-user 9051 Oct 24 17:28 Backend.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 216 Oct 24 17:28 c10d.h 4 -rw-r--r-- 1 ec2-user ec2-user 3880 Oct 24 17:28 comm.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 604 Oct 24 17:28 debug.h 4 -rw-r--r-- 1 ec2-user ec2-user 1717 Oct 24 17:28 default_comm_hooks.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 1316 Oct 24 17:28 error.h 4 -rw-r--r-- 1 ec2-user ec2-user 962 Oct 24 17:28 exception.h 4 -rw-r--r-- 1 ec2-user ec2-user 1461 Oct 24 17:28 FileStore.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 771 Oct 24 17:28 GlooDeviceFactory.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 1154 Oct 24 17:28 HashStore.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 4058 Oct 24 17:28 logger.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 2059 Oct 24 17:28 logging.h 8 -rw-r--r-- 1 ec2-user ec2-user 7979 Oct 24 17:28 NCCLUtils.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 2756 Oct 24 17:28 Ops.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 1814 Oct 24 17:28 ParamCommsUtils.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 1478 Oct 24 17:28 PrefixStore.hpp 16 -rw-r--r-- 1 ec2-user ec2-user 13235 Oct 24 17:28 ProcessGroupGloo.hpp 12 -rw-r--r-- 1 ec2-user ec2-user 11298 Oct 24 17:28 ProcessGroup.hpp 12 -rw-r--r-- 1 ec2-user ec2-user 8645 Oct 24 17:28 ProcessGroupMPI.hpp 28 -rw-r--r-- 1 ec2-user ec2-user 26526 Oct 24 17:28 ProcessGroupNCCL.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 3805 Oct 24 17:28 
ProcessGroupRoundRobin.hpp 12 -rw-r--r-- 1 ec2-user ec2-user 10361 Oct 24 17:28 ProcessGroupUCC.hpp 8 -rw-r--r-- 1 ec2-user ec2-user 5062 Oct 24 17:28 ProcessGroupWrapper.hpp 8 -rw-r--r-- 1 ec2-user ec2-user 4201 Oct 24 17:28 PyProcessGroup.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 1072 Oct 24 17:28 python_comm_hook.h 24 -rw-r--r-- 1 ec2-user ec2-user 23859 Oct 24 17:28 reducer.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 2330 Oct 24 17:28 reducer_timer.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 1683 Oct 24 17:28 sequence_num.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 2108 Oct 24 17:28 socket.h 4 -rw-r--r-- 1 ec2-user ec2-user 2589 Oct 24 17:28 Store.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 3264 Oct 24 17:28 TCPStore.hpp 8 -rw-r--r-- 1 ec2-user ec2-user 6944 Oct 24 17:28 TraceUtils.h 8 -rw-r--r-- 1 ec2-user ec2-user 4539 Oct 24 17:28 Types.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 580 Oct 24 17:28 UCCForNCCL.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 2301 Oct 24 17:28 UCCTracing.hpp 8 -rw-r--r-- 1 ec2-user ec2-user 4933 Oct 24 17:28 UCCUtils.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 584 Oct 24 17:28 UnixSockUtils.hpp 24 -rw-r--r-- 1 ec2-user ec2-user 20796 Oct 24 17:28 Utils.hpp 4 -rw-r--r-- 1 ec2-user ec2-user 575 Oct 24 17:28 WinSockUtils.hpp 8 -rw-r--r-- 1 ec2-user ec2-user 4259 Oct 24 17:28 Work.hpp ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/87615 Approved by: https://github.com/malfet	2022-10-24 19:38:07 +00:00
Nikita Shulga	dfe3fc028c	[CI] Add triton wheels build workflow (#87234 ) Also, add `torchtriton` and `jinja2` as extra `dynamo` dependencies to PyTorch wheels. Version packages as first 10 characters of pinned repo hash and make `torch[dynamo]` wheel depend on the exact version it was built against. TODO: Automate uploading to nightly wheels storage Pull Request resolved: https://github.com/pytorch/pytorch/pull/87234 Approved by: https://github.com/msaroufim	2022-10-19 03:35:16 +00:00
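A minimal sketch of the versioning scheme described in #87234, assuming the short hash is appended to a base version as a local version segment. The `+` separator, the base version `2.0.0`, the commit hash, and the package name `example-triton` are all assumptions made for illustration; the commit message only states that the first 10 characters of the pinned hash are used.

```python
def pinned_version(base_version: str, pinned_commit: str) -> str:
    """Build a version string from the first 10 characters of a pinned commit."""
    short_hash = pinned_commit[:10]
    # Assumption: the hash is attached as a PEP 440 local version ("+...").
    return f"{base_version}+{short_hash}"


# Hypothetical 40-character commit hash standing in for the pinned repo hash.
pin = "0123456789abcdef0123456789abcdef01234567"
version = pinned_version("2.0.0", pin)

# An exact pin of this form is what lets a wheel depend on the precise
# build it was tested against ("example-triton" is a made-up name).
requirement = f"example-triton=={version}"
```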
Kevin Tse	0cb273b5d9	[DataPipe] Fixing interface generation in setup.py (#87081 ) Based on the artifact generated on this [page](https://hud.pytorch.org/pr/87081), I downloaded [[s3] linux-focal-py3.7-clang7-asan/artifacts.zip](https://gha-artifacts.s3.amazonaws.com/pytorch/pytorch/3266430083/linux-focal-py3.7-clang7-asan/artifacts.zip) (1.14 GB) and unpacked it. `torch.utils.data.datapipes.datapipe.pyi` does exist. I believe this means the file should be part of the distribution. I also did `wheel unpack ***.whl` to confirm the existence of the file. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87081 Approved by: https://github.com/ejguan	2022-10-17 21:45:33 +00:00
PyTorch MergeBot	8eb579e362	Revert "[Profiler] Move legacy profiler out of `torch/csrc/autograd` (#85512 )" This reverts commit 157a3d2a7cd25779258f3e3dcef14633f1930103. Reverted https://github.com/pytorch/pytorch/pull/85512 on behalf of https://github.com/DanilBaibak due to Due to files were deleted, the internal build failed. Please re-submit via codev.	2022-10-14 14:56:59 +00:00
Taylor Robie	157a3d2a7c	[Profiler] Move legacy profiler out of `torch/csrc/autograd` (#85512 ) The legacy profiler is an eyesore in the autograd folder. At this point the implementation is almost completely decoupled from the rest of profiler, and it is in maintenance mode pending deprecation. As a result, I'm moving it to `torch/csrc/profiler/standalone`. Unfortunately BC requires that the symbols remain in `torch::autograd::profiler`, so I've put some basic forwarding logic in `torch/csrc/autograd/profiler.h`. One strange bit is that `profiler_legacy.h` forward declares `torch::autograd::Node`, but doesn't seem to do anything with it. I think we can delete it, but I want to test to make sure. (Note: this should not land until https://github.com/pytorch/torchrec/pull/595 is landed.) Differential Revision: [D39108648](https://our.internmc.facebook.com/intern/diff/D39108648/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85512 Approved by: https://github.com/aaronenyeshi	2022-10-14 05:38:48 +00:00
Taylor Robie	b8f14b7877	[Profiler][Minor] Group and consolidate stub APIs (#85510 ) There is a concept in profiler of a stub that wraps a profiling API. It was introduced for CUDA profiling before Kineto, and ITT has adopted it to call into VTune APIs. However for the most part we don't really interact with them when developing the PyTorch profiler. Thus it makes sense to unify the fallback registration mechanism and create a subfolder to free up real estate in the top level `torch/csrc/profiler` directory. Differential Revision: [D39108647](https://our.internmc.facebook.com/intern/diff/D39108647/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85510 Approved by: https://github.com/aaronenyeshi	2022-10-14 05:38:46 +00:00
Jason Ansel	c7c09722ad	Move TorchDynamo into PyTorch core (#86461 ) Context: https://github.com/pytorch/torchdynamo/issues/1588 This PR moves [TorchDynamo](https://github.com/pytorch/torchdynamo) and TorchInductor into PyTorch core. - `torchdynamo` becomes `torch._dynamo` - `torchinductor` becomes `torch._inductor` This PR was generated by running `copy_to_core.sh` in https://github.com/pytorch/torchdynamo/pull/1538 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86461 Approved by: https://github.com/voznesenskym	2022-10-13 23:18:06 +00:00
Jason Ansel	f1fdb6efbd	Manual changes for moving dynamo to core (#86621 ) This is the subset of the changes in #86461 not auto-generated by `copy_to_core.sh`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86621 Approved by: https://github.com/albanD	2022-10-11 23:01:21 +00:00
Sahan Paliskara	936e93058b	Delete torch::deploy from pytorch core (#85953 ) As we have migrated torch::deploy over to https://github.com/pytorch/multipy, we can now delete it from pytorch core as ongoing development will happen there. This PR was created due to syncing issues with https://github.com/pytorch/pytorch/pull/85443 which is where the review history can be found. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85953 Approved by: https://github.com/seemethere, https://github.com/malfet	2022-10-06 07:20:16 +00:00
Min Si	089a64e99e	Install c10d headers with absolute path (#86257 ) https://github.com/pytorch/pytorch/pull/85780 updated all c10d headers in pytorch to use absolute path following the other distributed components. However, the headers were still copied to `${TORCH_INSTALL_INCLUDE_DIR}/torch`, thus external extensions still have to reference the c10d headers as `<c10d/*.h>`, making the usage inconsistent (the only exception was c10d/exception.h, which was copied to `${TORCH_INSTALL_INCLUDE_DIR}/torch/csrc/distributed/c10d`). This patch fixes the installation step to copy all c10d headers to `${TORCH_INSTALL_INCLUDE_DIR}/torch/csrc/distributed/c10d`, thus external extensions can consistently reference c10d headers with the absolute path. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86257 Approved by: https://github.com/kumpera	2022-10-05 20:02:05 +00:00
Jane Xu	3cdf621fe5	Add opt-einsum to CI (#85574 ) Depends on https://github.com/pytorch/pytorch/pull/84890. This PR adds opt_einsum to CI, enabling path optimization for the multi-input case. It also updates the installation sites to install torch with einsum, but those are mostly to make sure it would work on the user's end (as opt-einsum would have already been installed in the docker or in prior set up steps). This PR also updates the windows build_pytorch.bat script to use the same bdist_wheel and install commands as on Linux, replacing the `setup.py install` that'll become deprecated. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85574 Approved by: https://github.com/huydhn, https://github.com/soulitzer	2022-09-29 14:28:55 +00:00
Jane Xu	e7e1cd945f	Add path optimize kwarg to einsum (#84890 ) ## This PR seeks to: - [x] add c++ support for an optimize path - [x] add python opt_einsum path passthrough - [x] add opt_einsum to OSS requirements, but a soft one - [x] show benchmark results here Additional things I've explored + their conclusions: - Delaying the summing over dimensions => added! - The idea here is to not incur kernel calls to `sum` as we try to early sum out in einsum. Thus, we collect all the dimensions that need to be summed together in one contraction + sum at the end instead of summing as we go. While this optimization didn't feel like it made things faster for the random cases we've selected (they all summed 1 dim per contraction), it is a good principle and would help more common use cases that would reduce multiple dimensions at a time (like `bxy,xyi,xyj->bij`). - Caching contract_path based on equation and tensor sizes => dropped :( - The benchmarks were strictly worse for all the cases, and, from scanning the use cases, I observed people do not often call einsum on the same equation/tensor order enough for caching to be justified. I do think caching can be effective in the future, but it would require further investigation. ## Not a part of this PR (but are next steps): - adding opt_einsum package to OSS CI - adding it to internal CI - potentially adding a kwarg path argument to the python API -- if the path is given, we wouldn't have to spend time calculating it, but there would be some time lost validating user input. ## Testing: - Added more tests to CI ## Benchmarking: TL;DRs - torch.einsum with opt_einsum is a definite win for the production case. - torch.einsum with opt_einsum installed is consistently fast, but has an overhead of needing to find the path. If the path is already found/optimal, it will be slightly slower. - The einsum overhead decreases for bigger dimensions. 
- torch.einsum without opt_einsum installed is comparable to before this commit, with occasional slowness potentially due to not reshaping/squeezing as we contract until the end. - For many of the random generated cases, the dimensions were too similar and small where an optimal order wasn't that much more optimal than just going left to right. However, in production, dimensions are commonly quite distinct (batch size will be small, but the data will be huge). - torch.einsum opt is comparable (slightly faster overall) compared to numpy.einsum opt for the cpu case. This is interesting given that torch.einsum currently spends time computing the path, but numpy.einsum takes it as input. - torch.einsum opt is significantly faster than numpy.einsum opt for the gpu case. This is because numpy doesn't take advantage of GPUs. The following benchmarks were done on an A100 GPU and Linux CPUs. The line in the first chart separates GPU (on top) from CPU, and the line in the second graph separates CPU (on top) and then GPU. Sorry it's flipped 😛 . 
Production example (see [colab benchmark](https://colab.research.google.com/drive/1V2s4v1dOOKwRvp5T_DC-PNUosOV9FFJx?authuser=1#scrollTo=WZoQkC8Mdt6I) for more context): <img width="1176" alt="image" src="https://user-images.githubusercontent.com/31798555/192012636-9a68bfa7-2601-43b1-afeb-b4e0877db6a4.png"> Randomly generated examples (the same ones as in https://github.com/pytorch/pytorch/pull/60191) <img width="1176" alt="image" src="https://user-images.githubusercontent.com/31798555/192012804-1c639595-b3e6-48c9-a385-ad851c13e1c2.png"> Open below to see old + not super relevant benchmarking results: <details> Benchmark results BEFORE this PR (on Linux -- I will update devices so they are consistent later): <img width="776" alt="image" src="https://user-images.githubusercontent.com/31798555/190807274-18f71fce-556e-47f4-b18c-e0f7d0c0d5aa.png"> Benchmark results with the code on this PR (on my x86 mac): For the CPU internal use case -- ![image](https://user-images.githubusercontent.com/31798555/190801376-6f591b00-cebd-4ca7-bb23-ae8f17f1634e.png) For the general use case -- It looks like numpy opt still does better in several of these random cases, but torch einsum opt is consistently faster than torch.einsum. ![image](https://user-images.githubusercontent.com/31798555/190811730-fbb6797d-af59-4f5a-92da-ba4103372014.png) </details> Pull Request resolved: https://github.com/pytorch/pytorch/pull/84890 Approved by: https://github.com/albanD, https://github.com/soulitzer	2022-09-24 03:47:36 +00:00
atalman	eb94df28c7	Use pip install cu117 (#85097 ) Creates new wheel workflow specific to CUDA 11.7 that does not bundle the cudnn and cublas. Workflow: https://github.com/pytorch/pytorch/actions/runs/3094622781 New Package: manywheel-py3_10-cuda11_7-with-pypi-cudnn \| 843 MB Old Package: manywheel-py3_10-cuda11_7 \| 1.65 GB Testing workflow: [manywheel-py3_7-cuda11_7-with-pypi-cudnn-build / build](https://github.com/pytorch/pytorch/actions/runs/3091145546/jobs/5000867662#logs): ``` Bundling without cudnn and cublas. + DEPS_LIST=("/usr/local/cuda/lib64/libcudart.so.11.0" "/usr/local/cuda/lib64/libnvToolsExt.so.1" "/usr/local/cuda/lib64/libnvrtc.so.11.2" "/usr/local/cuda/lib64/libnvrtc-builtins.so.11.7" "$LIBGOMP_PATH") + DEPS_SONAME=("libcudart.so.11.0" "libnvToolsExt.so.1" "libnvrtc.so.11.2" "libnvrtc-builtins.so.11.7" "libgomp.so.1") ..... pytorch_extra_install_requirements: nvidia-cuda-runtime-cu11, nvidia-cudnn-cu11, nvidia-cublas-cu11 ``` [manywheel-py3_7-cuda11_7-build / build](https://github.com/pytorch/pytorch/actions/runs/3091145546/jobs/5000863250#logs) ``` Bundling with cudnn and cublas. 
+ DEPS_LIST=("/usr/local/cuda/lib64/libcudart.so.11.0" "/usr/local/cuda/lib64/libnvToolsExt.so.1" "/usr/local/cuda/lib64/libnvrtc.so.11.2" "/usr/local/cuda/lib64/libnvrtc-builtins.so.11.7" "/usr/local/cuda/lib64/libcudnn_adv_infer.so.8" "/usr/local/cuda/lib64/libcudnn_adv_train.so.8" "/usr/local/cuda/lib64/libcudnn_cnn_infer.so.8" "/usr/local/cuda/lib64/libcudnn_cnn_train.so.8" "/usr/local/cuda/lib64/libcudnn_ops_infer.so.8" "/usr/local/cuda/lib64/libcudnn_ops_train.so.8" "/usr/local/cuda/lib64/libcudnn.so.8" "/usr/local/cuda/lib64/libcublas.so.11" "/usr/local/cuda/lib64/libcublasLt.so.11" "$LIBGOMP_PATH") + DEPS_SONAME=("libcudart.so.11.0" "libnvToolsExt.so.1" "libnvrtc.so.11.2" "libnvrtc-builtins.so.11.7" "libcudnn_adv_infer.so.8" "libcudnn_adv_train.so.8" "libcudnn_cnn_infer.so.8" "libcudnn_cnn_train.so.8" "libcudnn_ops_infer.so.8" "libcudnn_ops_train.so.8" "libcudnn.so.8" "libcublas.so.11" "libcublasLt.so.11" "libgomp.so.1") ``` cc: @malfet @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/85097 Approved by: https://github.com/malfet	2022-09-21 16:30:25 +00:00
Nikita Shulga	d05a11337c	[CMake] Add functorch target (#83464 ) Move functorch/functorch into `functorch` folder - Add functorch/CMakeLists.txt that adds `functorch` native python extension - Modify `setup.py` to package pytorch and functorch together into a single wheel - Modify `functorch.__version__` is not equal to that of `torch.__version__` - Add dummy `functorch/setup.py` file for the projects that still want to build it Differential Revision: [D39058811](https://our.internmc.facebook.com/intern/diff/D39058811) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83464 Approved by: https://github.com/zou3519	2022-09-14 00:05:33 +00:00
Kento Nozawa	5238404f4d	Increment `version_range_max` (#84815 ) Python 3.10 should be added as a listing in `Programming Language` on https://pypi.org/project/torch/: <img width="238" alt="Screenshot 2022-09-11 at 2 48 01" src="https://user-images.githubusercontent.com/7121753/189495599-72bd6a28-4248-4e4e-8194-b5b1f9e984e2.png"> Pull Request resolved: https://github.com/pytorch/pytorch/pull/84815 Approved by: https://github.com/malfet	2022-09-12 21:38:16 +00:00
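
Note on e7e1cd945f (#84890): the message says that an optimized contraction order pays off mainly when dimensions are quite distinct (small batch, huge data). The sketch below is a rough illustration of why. It is neither the opt_einsum algorithm nor the torch.einsum implementation: it assumes the simpler case of a plain matrix chain, and the helper names (`matmul_flops`, `chain_cost_left_to_right`, `chain_cost_optimal`) and the example shapes are hypothetical.

```python
def matmul_flops(m, k, n):
    """Multiply-add count for an (m x k) @ (k x n) product."""
    return m * k * n


def chain_cost_left_to_right(shapes):
    """Cost of contracting a matrix chain strictly left to right."""
    rows, cols = shapes[0]
    total = 0
    for k, n in shapes[1:]:
        assert cols == k, "inner dimensions must match"
        total += matmul_flops(rows, k, n)
        cols = n
    return total


def chain_cost_optimal(shapes):
    """Minimum cost over all parenthesizations (classic matrix-chain DP)."""
    # dims[i] x dims[i + 1] is the shape of the i-th matrix in the chain.
    dims = [shapes[0][0]] + [s[1] for s in shapes]
    n = len(shapes)
    best = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            best[i][j] = min(
                best[i][k] + best[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)
            )
    return best[0][n - 1]


# Hypothetical shapes: a small batch (8) against large data (1000).
shapes = [(8, 1000), (1000, 1000), (1000, 1)]
left_to_right = chain_cost_left_to_right(shapes)
optimal = chain_cost_optimal(shapes)
print(left_to_right, optimal)
```

With these assumed shapes, contracting the two right-hand operands first avoids the large intermediate product, so the optimal order needs roughly an eighth of the multiply-adds. When all dimensions are similar and small, the two costs are close, which matches the observation in the commit message about the randomly generated cases.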


893 Commits