Commit Graph

50 Commits

564d00f364 Revert "Fix clang-tidy warnings in Caffe2 code (#134935)"
This reverts commit 7cfd23636c8fa6fcbb8bf3ea34e15b847ec9ad9d.

Reverted https://github.com/pytorch/pytorch/pull/134935 on behalf of https://github.com/izaitsevfb due to breaks internal builds, caffe2 is still used internally ([comment](https://github.com/pytorch/pytorch/pull/134935#issuecomment-2349368152))
2024-09-13 16:42:37 +00:00
cyy 7cfd23636c Fix clang-tidy warnings in Caffe2 code (#134935)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134935
Approved by: https://github.com/ezyang
2024-09-12 03:27:09 +00:00
26ec06e45d [amd][lowering] hipify shim v2 headers (#134689)
Summary: The default c_shim version was switched to 2 for HIP in D60674018. This results in some linking errors where shim function symbols are missing from the compiled .so file (e.g. P1551186492) when building lowering benchmark scripts, since the required files aren't included. Hipify the shim v2 generated header files as well, since they're needed during codegen when the buck binaries are executed.

Reviewed By: frank-wei, zoranzhao, henryoier

Differential Revision: D61865202

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134689
Approved by: https://github.com/zoranzhao
2024-08-28 21:53:24 +00:00
c638a40a93 [Caffe2] Remove unused AVX512 code (#133160)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133160
Approved by: https://github.com/albanD
2024-08-23 23:16:16 +00:00
fa1d7b0262 Revert "Remove unused Caffe2 macros (#132979)"
This reverts commit da65cfbdea4f1f2176f6242004bda940a24f9ddb.

Reverted https://github.com/pytorch/pytorch/pull/132979 on behalf of https://github.com/ezyang due to these are apparently load bearing internally ([comment](https://github.com/pytorch/pytorch/pull/132979#issuecomment-2284666332))
2024-08-12 18:34:56 +00:00
cyy da65cfbdea Remove unused Caffe2 macros (#132979)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132979
Approved by: https://github.com/ezyang
2024-08-09 04:48:20 +00:00
ff8042bcfb Enable AOTI shim v2 build and add into libtorch (#125211)
Summary:
Follow up of https://github.com/pytorch/pytorch/pull/125087

This diff will create the shim v2 header and cpp file and the corresponding build targets.

Differential Revision: D56617546

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125211
Approved by: https://github.com/desertfire
2024-05-31 23:56:11 +00:00
9ec8dd2467 Reify view_func() closures as ViewFuncs (#118404)
Replaces `view_func()` closures with a reified `ViewFunc` data structure. Codegen generates a `ViewFunc` subclass for each view op (e.g. `NarrowViewFunc`) containing state needed to reconstruct the view. The `ViewFunc` API allows for querying and hot-swapping any `SymInt`s or `Tensors` in the state through `get_symints()` / `get_tensors()` / `clone_and_set()`, which will be essential for fake-ification later on.

```cpp
/// Base class for view functions, providing reapplication of a view on a new base.
/// Each view op should get a codegenerated subclass of this class containing
/// any state needed to reconstruct the view. The class also provides convenience
/// accessors for saved SymInts / tensor state. This is useful for e.g. fake-ification,
/// where we want to use symbolic values or fake tensors instead.
struct TORCH_API ViewFunc {
  virtual ~ViewFunc() {}
  /// Returns any SymInts in the saved state.
  virtual std::vector<c10::SymInt> get_symints() const { return {}; }
  /// Returns the number of SymInts in the saved state.
  virtual size_t num_symints() const { return 0; }
  /// Returns any tensors in the saved state.
  virtual std::vector<at::Tensor> get_tensors() const { return {}; }
  /// Returns the number of tensors in the saved state.
  virtual size_t num_tensors() const { return 0; }
  /// Reapplies the view on the given base using the saved state.
  virtual at::Tensor operator()(const at::Tensor&) const = 0;
  /// Returns a clone of this ViewFunc, optionally with the specified saved state.
  virtual std::unique_ptr<ViewFunc> clone_and_set(
      std::optional<std::vector<c10::SymInt>> = c10::nullopt,
      std::optional<std::vector<at::Tensor>> = c10::nullopt) const = 0;

protected:
  /// Sets the values of any SymInts in the saved state. The input vector size must
  /// match the number of SymInts in the saved state (i.e. the size of the list
  /// returned by get_symints()).
  virtual void set_symints(std::vector<c10::SymInt>) {}
  /// Sets the values of any Tensors in the saved state. The input vector size must
  /// match the number of Tensors in the saved state (i.e. the size of the list
  /// returned by get_tensors()).
  virtual void set_tensors(std::vector<at::Tensor>) {}
};
```

New codegen files:
* `torch/csrc/autograd/generated/ViewFunc.h`
* `torch/csrc/autograd/generated/ViewFuncs.cpp`

The templates for these also contain impls for `ChainedViewFunc` and `ErroringViewFunc`, which are used in a few places within autograd.

Example codegen for `slice.Tensor`:
```cpp
// torch/csrc/autograd/generated/ViewFuncs.h
#define SLICE_TENSOR_VIEW_FUNC_AVAILABLE
struct SliceTensorViewFunc : public torch::autograd::ViewFunc {
  SliceTensorViewFunc(int64_t dim, c10::optional<c10::SymInt> start, c10::optional<c10::SymInt> end, c10::SymInt step) : dim(dim), start(start), end(end), step(step)
  {};
  virtual ~SliceTensorViewFunc() override {};
  virtual std::vector<c10::SymInt> get_symints() const override;
  virtual size_t num_symints() const override;
  virtual std::vector<at::Tensor> get_tensors() const override;
  virtual size_t num_tensors() const override;
  virtual at::Tensor operator()(const at::Tensor&) const override;
  virtual std::unique_ptr<ViewFunc> clone_and_set(
      std::optional<std::vector<c10::SymInt>> = c10::nullopt,
      std::optional<std::vector<at::Tensor>> = c10::nullopt) const override;

protected:
  virtual void set_symints(std::vector<c10::SymInt>) override;
  virtual void set_tensors(std::vector<at::Tensor>) override;

private:
  int64_t dim;
  c10::optional<c10::SymInt> start;
  c10::optional<c10::SymInt> end;
  c10::SymInt step;
};
...

// torch/csrc/autograd/generated/ViewFuncs.cpp
std::vector<c10::SymInt> SliceTensorViewFunc::get_symints() const {
  ::std::vector<c10::SymInt> symints;
  symints.reserve((start.has_value() ? 1 : 0) + (end.has_value() ? 1 : 0) + 1);
  if(start.has_value()) symints.insert(symints.end(), *(start));
  if(end.has_value()) symints.insert(symints.end(), *(end));
  symints.push_back(step);
  return symints;
}

size_t SliceTensorViewFunc::num_symints() const {
  return static_cast<size_t>((start.has_value() ? 1 : 0) + (end.has_value() ? 1 : 0) + 1);
}

void SliceTensorViewFunc::set_symints(std::vector<c10::SymInt> symints) {
  TORCH_INTERNAL_ASSERT(symints.size() == num_symints());
  auto i = 0;
  if(start.has_value()) start = symints[i];
  i += (start.has_value() ? 1 : 0);
  if(end.has_value()) end = symints[i];
  i += (end.has_value() ? 1 : 0);
  step = symints[i];
}

std::vector<at::Tensor> SliceTensorViewFunc::get_tensors() const {
  ::std::vector<at::Tensor> tensors;
  return tensors;
}

size_t SliceTensorViewFunc::num_tensors() const {
  return static_cast<size_t>(0);
}

void SliceTensorViewFunc::set_tensors(std::vector<at::Tensor> tensors) {
  TORCH_INTERNAL_ASSERT(tensors.size() == num_tensors());

}

at::Tensor SliceTensorViewFunc::operator()(const at::Tensor& input_base) const {
  return at::_ops::slice_Tensor::call(input_base, dim, start, end, step);
}

std::unique_ptr<ViewFunc> SliceTensorViewFunc::clone_and_set(
    std::optional<std::vector<c10::SymInt>> symints,
    std::optional<std::vector<at::Tensor>> tensors) const {
  auto output = std::make_unique<SliceTensorViewFunc>(dim, start, end, step);
  if (symints.has_value()) {
    output->set_symints(std::move(*(symints)));
  }
  if (tensors.has_value()) {
    output->set_tensors(std::move(*(tensors)));
  }
  return output;
}
```

The `_view_func()` / `_view_func_unsafe()` methods now accept two additional (optional) args for `symint_visitor_fn` / `tensor_visitor_fn`. If these are defined, they are expected to be Python callables that operate on a single SymInt / tensor and return a new one. This allows for the hot-swapping needed during fake-ification; a usage sketch follows.
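A hedged sketch of this replay/visitor flow; the view op, shapes, and identity visitors here are illustrative, not from the PR:
```python
import torch

base = torch.randn(4, 6)
view = base.narrow(1, 1, 4)   # a view op whose ViewFunc saves SymInts

# Replay the same view on a new base:
new_base = torch.randn(4, 6)
replayed = view._view_func(new_base)

# Replay while hot-swapping saved state: each saved SymInt / tensor is fed
# through the corresponding visitor and replaced by its return value.
swapped = view._view_func(
    new_base,
    symint_visitor_fn=lambda s: s,   # e.g. substitute symbolic values here
    tensor_visitor_fn=lambda t: t,   # e.g. substitute fake tensors here
)
```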

For testing, there are extensive pre-existing tests, and I added a test to ensure that hot-swapping functions correctly.
```sh
python test/test_autograd.py -k test_view_func_replay
python test/test_ops.py -k test_view_replay
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118404
Approved by: https://github.com/ezyang
2024-02-14 22:00:43 +00:00
24bdd03d23 Revert "Reify view_func() closures as ViewFuncs (#118404)"
This reverts commit d5a6762263a98e5153bc057c8ba4f377542c7e55.

Reverted https://github.com/pytorch/pytorch/pull/118404 on behalf of https://github.com/DanilBaibak due to Broken trunk ([comment](https://github.com/pytorch/pytorch/pull/118404#issuecomment-1938600260))
2024-02-12 12:38:51 +00:00
d5a6762263 Reify view_func() closures as ViewFuncs (#118404)
Same description as the reland above (9ec8dd2467); the full commit message, including the `ViewFunc` API and generated `slice.Tensor` examples, is identical.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118404
Approved by: https://github.com/ezyang
2024-02-09 18:51:36 +00:00
66a76516bf [ROCm] Disabling Kernel Asserts for ROCm by default - fix and clean up and refactoring (#114660)
Related to #103973, #110532, #108404, #94891

**Context:**
As commented in 6ae0554d11/cmake/Dependencies.cmake (L1198)
Kernel asserts are enabled by default for CUDA and disabled for ROCm.
However, this was somewhat broken, and kernel asserts were still enabled for ROCm.

Disabling kernel asserts is also needed for users who do not have PCIe atomics support. These community users have verified that disabling kernel asserts on the PyTorch/ROCm platform fixed their PyTorch workflows, such as torch.sum scripts and Stable Diffusion (see the related issues).

**Changes:**

This pull request serves the following purposes:
* Refactor and clean up the logic, making it simpler to enable and disable kernel asserts for ROCm.
* Fix the bug where kernel asserts for ROCm were not disabled by default.

Specifically,
- Renamed `TORCH_DISABLE_GPU_ASSERTS` to `C10_USE_ROCM_KERNEL_ASSERT` for the following reasons:
(1) This variable only applies to ROCm.
(2) The new name aligns better with the `#define CUDA_KERNEL_ASSERT` macro.
(3) With `USE_` in front of the name, we can easily control the feature with an environment variable at build time (e.g. `USE_ROCM_KERNEL_ASSERT=1 python setup.py develop` will enable kernel asserts for a ROCm build).
- Got rid of `ROCM_FORCE_ENABLE_GPU_ASSERTS` to simplify the logic and make it easier to understand and maintain.
- Added `#cmakedefine` to carry the CMake variable over to C++.

**Tests:**
(1) Build in default mode and verify that `USE_ROCM_KERNEL_ASSERT` is OFF (0) and that kernel asserts are disabled:

```
python setup.py develop
```
Verify CMakeCache.txt has correct value.
```
/xxxx/pytorch/build$ grep USE_ROCM_KERNEL_ASSERT CMakeCache.txt
USE_ROCM_KERNEL_ASSERT:BOOL=0
```
Tested the following code in both ROCm and CUDA builds, expecting different return codes:

```
subprocess.call([sys.executable, '-c', "import torch;torch._assert_async(torch.tensor(0,device='cuda'));torch.cuda.synchronize()"])
```
This piece of code is adapted from the unit test below, to work around the fact that the unit test is currently skipped for ROCm. (We will look into re-enabling it in the future.)

```
python test/test_cuda_expandable_segments.py -k test_fixed_cuda_assert_async
```

Ran the following script, expecting r == 0 since `CUDA_KERNEL_ASSERT` is defined as nothing:
```
>>> import sys
>>> import subprocess
>>> r=subprocess.call([sys.executable, '-c', "import torch;torch._assert_async(torch.tensor(0,device='cuda'));torch.cuda.synchronize()"])
>>> r
0
```

(2) Enable kernel asserts by building with `USE_ROCM_KERNEL_ASSERT=1` or `USE_ROCM_KERNEL_ASSERT=ON`:
```
USE_ROCM_KERNEL_ASSERT=1 python setup.py develop
```

Verify `USE_ROCM_KERNEL_ASSERT` is `1`
```
/xxxx/pytorch/build$ grep USE_ROCM_KERNEL_ASSERT CMakeCache.txt
USE_ROCM_KERNEL_ASSERT:BOOL=1
```

Run the assert test, expecting a return code not equal to 0.

```
>>> import sys
>>> import subprocess
>>> r=subprocess.call([sys.executable, '-c', "import torch;torch._assert_async(torch.tensor(0,device='cuda'));torch.cuda.synchronize()"])
>>>/xxxx/pytorch/aten/src/ATen/native/hip/TensorCompare.hip:108: _assert_async_cuda_kernel: Device-side assertion `input[0] != 0' failed.
:0:rocdevice.cpp            :2690: 2435301199202 us: [pid:206019 tid:0x7f6cf0a77700] Callback: Queue 0x7f64e8400000 aborting with error : HSA_STATUS_ERROR_EXCEPTION: An HSAIL operation resulted in a hardware exception. code: 0x1016

>>> r
-6
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114660
Approved by: https://github.com/jeffdaily, https://github.com/malfet, https://github.com/jithunnair-amd
2023-12-13 15:44:53 +00:00
00908475e6 Use global variables to register the return_types namedtuples (#108832)
Fixes #69221. Builds on top of #107000, fixing the buck build issue linked [here](https://github.com/pytorch/pytorch/pull/107000#issuecomment-1708857375).
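For context, a hedged sketch of the `torch.return_types` namedtuples this registration concerns (standard PyTorch behavior; values are illustrative):
```python
import torch

t = torch.tensor([[1.0, 3.0], [2.0, 0.0]])
out = torch.max(t, dim=0)   # returns a torch.return_types.max namedtuple
print(type(out))            # <class 'torch.return_types.max'>
print(out.values)           # tensor([2., 3.])
print(out.indices)          # tensor([1, 0])
values, indices = out       # unpacks like a plain tuple
```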

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108832
Approved by: https://github.com/zou3519
2023-09-13 17:42:46 +00:00
27d5dcf589 Revert "Use global variables to register the return_types namedtuples (#107000)"
This reverts commit ae8eb7a3f9aee106affca3b27c1f4031bd216730.

Reverted https://github.com/pytorch/pytorch/pull/107000 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing internal build ([comment](https://github.com/pytorch/pytorch/pull/107000#issuecomment-1708862325))
2023-09-06 18:13:23 +00:00
ae8eb7a3f9 Use global variables to register the return_types namedtuples (#107000)
Fixes #69221

@pytorchbot label "topic: not user facing"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107000
Approved by: https://github.com/zou3519
2023-09-05 20:00:29 +00:00
950431c334 extract out a caffe2 macros library (#98156)
Slowly carving out the minimal caffe2 dependencies to build PyTorch.

Differential Revision: [D44609764](https://our.internmc.facebook.com/intern/diff/D44609764/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D44609764/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98156
Approved by: https://github.com/ezyang, https://github.com/PaliC
2023-04-04 10:04:21 +00:00
301f00f350 generate caffe2/core/macros.h in shared build structure (#98131)
This is only used by Bazel for now.

Differential Revision: [D44604078](https://our.internmc.facebook.com/intern/diff/D44604078/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D44604078/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98131
Approved by: https://github.com/ezyang, https://github.com/PaliC
2023-04-04 09:23:03 +00:00
2ac9086987 run buildifier on unified build files (#98141)
This is pretty tricky. buildifier by default doesn't do much to these
files. It does a little more if you tell it that they are
`BUILD.bazel` files with -type=build. But it can do even more if you
remove the target definitions from the `def define_rules()` wrapper
and dedent them.

I wrote a little wrapper that does that. I'll submit it at a later
date.

Differential Revision: [D44606558](https://our.internmc.facebook.com/intern/diff/D44606558/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D44606558/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98141
Approved by: https://github.com/ezyang, https://github.com/PaliC
2023-04-04 00:37:19 +00:00
f4f1a5b5b3 Revert "Move functional collectives to the right namespace (#97793)"
This reverts commit 184bfbc3d7b37e8f202f4938f6ea9ba557c93b1e.

Reverted https://github.com/pytorch/pytorch/pull/97793 on behalf of https://github.com/atalman due to breaks internal builds
2023-03-31 16:02:07 +00:00
184bfbc3d7 Move functional collectives to the right namespace (#97793)
This moves them from `torch._C._nn` to `torch._C._dist`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97793
Approved by: https://github.com/albanD
2023-03-30 22:18:13 +00:00
e217b30b0f Add torch.nested namespace (#84102)
First step towards #83775
- only `to_padded_tensor` is moved to the nested namespace for now
- following the schema used for `special`, `fft`, `linalg` and other namespaces, nested functions are registered in native_functions.yaml as `nested_{function_name}` and are bound to the desired Python name in `torch/nested/__init__.py` and the desired C++ name in `torch/csrc/api/include/torch/nested.h` (see the usage sketch below).
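A hedged usage sketch of the resulting Python binding, written against the current `torch.nested` API (the nested-tensor constructor is not part of this PR):
```python
import torch

nt = torch.nested.nested_tensor([torch.randn(2, 3), torch.randn(1, 3)])
padded = torch.nested.to_padded_tensor(nt, padding=0.0)
print(padded.shape)   # torch.Size([2, 2, 3]); the ragged dim is zero-padded
```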

~~**Question**: should we keep the documentation for `Tensor.to_padded_tensor` or can this deleted since it is shared by `torch.nested.to_padded_tensor`?~~

[generated nested docs](https://docs-preview.pytorch.org/84102/nested.html?highlight=nested#module-torch.nested)

Differential Revision: [D39361148](https://our.internmc.facebook.com/intern/diff/D39361148)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84102
Approved by: https://github.com/drisspg
2022-09-12 16:31:05 +00:00
673b35c847 Better reshape with autograd support (#82754) (#84154)
The original author is @YifanShenSZ and the original PR is #82754.
# Summary:
The previous reshape ([#80981](https://github.com/pytorch/pytorch/pull/80981)) is OK for forward, but needs improvement for backward: it must handle the "sometimes view, sometimes copy" behavior (a plain-tensor sketch of this behavior follows below).

This pull request fixes it by:
1. add a new alias dispatch key `CompositeImplicitAutogradNestedTensor`, which ideally would work as the nested-tensor version of `CompositeImplicitAutograd`
2. register `reshape_nested` to `reshape` by `CompositeImplicitAutogradNestedTensor`

Side changes:
* add contiguous memory format support to `clone_nested`
* add `view_nested`
* add `reshape_as_nested`

Fixes issue [#83041](https://github.com/pytorch/pytorch/issues/83041)
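A plain-tensor sketch of that "sometimes view, sometimes copy" behavior (ordinary dense-tensor semantics; this PR extends the backward handling to nested tensors):
```python
import torch

a = torch.randn(2, 3, requires_grad=True)
v = a.reshape(6)        # contiguous input: reshape returns a view
c = a.t().reshape(6)    # non-contiguous input: reshape materializes a copy
(v.sum() + c.sum()).backward()
print(a.grad)           # gradients flow through both the view and copy paths
```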

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82754

Test Plan:
Imported from GitHub, without a `Test Plan:` line.


Reviewed By: albanD

Differential Revision: D39023822

Pulled By: drisspg

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84154
Approved by: https://github.com/bdhirsh, https://github.com/albanD
2022-09-01 20:01:39 +00:00
406ce692ca [torchgen] Generate wrapper functions under custom namespaces (#81744)
Summary:
A follow-up to #81581. Before these 2 PRs, if an operator with a custom kernel namespace was added to `native_functions.yaml` (or any other yaml consumed by `torchgen`), although we were able to recognize the custom kernel in files such as `NativeFunctions.h` and `RegisterCPU.cpp`, we still generated backend-specific wrappers under the hardcoded `at` namespace. This PR changes that behavior by generating wrapper functions under custom namespaces.

For example, if the entries in the yaml file look like:

```
- func: op_1(Tensor(a) self) -> Tensor(a)
  dispatch:
    CPU: at::op_1_kernel # ATen kernel

- func: op_2(Tensor(a) self) -> Tensor(a)
  dispatch:
    CPU: custom::op_2_kernel # custom kernel
```

We generate the following code for `CPUFunctions_inl.h` and `RegisterCPU.cpp`:

`CPUFunctions_inl.h`:
```
namespace at {
namespace cpu {
TORCH_API at::Tensor & op_1(const at::Tensor & self);
} // namespace cpu
} // namespace at

namespace custom {
namespace cpu {
TORCH_API at::Tensor & op_2(const at::Tensor & self);
} // namespace cpu
} // namespace custom

```

Notice the difference between `at::cpu` and `custom::cpu`.

Then the definitions for these can be found in `RegisterCPU.cpp`.

`RegisterCPU.cpp`:
```
#include "CPUFunctions.h"

namespace at {

namespace {
at::Tensor & wrapper_op_1(const at::Tensor & self) {
    // No device check
  // DeviceGuard omitted
  return at::native::op_1_kernel(self);
}
} // anonymous namespace

TORCH_LIBRARY_IMPL(aten, CPU, m) {
  m.impl("op_1", TORCH_FN(wrapper_op_1));
}

namespace cpu {
at::Tensor & op_1(const at::Tensor & self) {
  return wrapper_op_1(self);
}
} // namespace cpu
} // namespace at

namespace custom {

namespace {
at::Tensor & wrapper_op_2(const at::Tensor & self) {
    // No device check
  // DeviceGuard omitted
  return at::native::op_2_kernel(self);
}
} // anonymous namespace

TORCH_LIBRARY_IMPL(aten, CPU, m) {
  m.impl("op_2", TORCH_FN(wrapper_op_2));
}

namespace cpu {
at::Tensor & op_2(const at::Tensor & self) {
  return wrapper_op_2(self);
}
} // namespace cpu
} // namespace custom

```
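For illustration only — assuming `op_1` were a real entry in `native_functions.yaml`, its schema would still be registered under the `aten` namespace (the `TORCH_LIBRARY_IMPL(aten, CPU, m)` block above), so a Python call would dispatch through the registry to the generated wrapper (hypothetical op name):
```python
import torch

t = torch.randn(3)
out = torch.ops.aten.op_1(t)   # dispatches to wrapper_op_1 for CPU tensors
```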

The benefit of this change is that it unifies all the namespaces derived from custom ops. In the example above, there are:

1. `custom::native` for kernels
2. `custom::<dispatch_key>` e.g., `custom::cpu` for wrappers

These customized operators will have nothing to do with `at::native`, `at::cpu`, etc.

Test Plan: This is very hard to test. I will refactor this logic and abstract out some layers so it's testable; will do that in coming PRs.

Differential Revision: D37972772

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81744
Approved by: https://github.com/bdhirsh
2022-08-04 07:48:44 +00:00
5c92777307 Stop checking in VmapGeneratedPlumbing.h (#82351)
This PR changes VmapGeneratedPlumbing.h to be generated by torchgen. The
output file is ATen/VmapGeneratedPlumbing.h.

Why generate this file inside PyTorch codegen instead of a separate step
in functorch?
- I can't figure out how to get functorch's fbcode target to generate it
- functorch's build system will, in the mid-term, be absorbed into
pytorch's build system, so I don't want to do the extra work of adding
a step to the functorch build process.

Test Plan:
- build pytorch, build functorch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82351
Approved by: https://github.com/ezyang
2022-07-27 20:39:37 +00:00
6f0c253956 Add sparse, quantized and nested tensor meta support to codegen (#81793)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81793
Approved by: https://github.com/cpuhrsch, https://github.com/bdhirsh
2022-07-21 21:23:56 +00:00
51cc614cb9 [pytorch] add missing -fexceptions flags (#81394)
Summary:
Add missing `-fexceptions` flags that are currently being passed through `exported_preprocessor_flags`. The exported preprocessor flags will be removed in a subsequent diff.

This is a rediff of D37386802 (3e1ac21c3b) with the changes split out to avoid reverts.

Test Plan:
Check flag is present:
```
$ buck uquery xplat/caffe2:common_core -a 'compiler_flags'
{
  "//xplat/caffe2:common_core" : {
    "compiler_flags" : [
      "-fexceptions",
      "-frtti",
      "-Os",
      "-Wno-unknown-pragmas",
      "-Wno-write-strings",
      "-Wno-unused-variable",
      "-Wno-unused-function",
      "-Wno-deprecated-declarations",
      "-Wno-shadow",
      "-Wno-global-constructors",
      "-Wno-missing-prototypes",
      "-std=gnu++17",
      "/EHsc",
      "/GR",
      "/O1",
      "/wd4101"
    ]
  }
}
```

Differential Revision: D37813869

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81394
Approved by: https://github.com/linbinyu
2022-07-14 20:03:17 +00:00
e608befae4 Revert "[c10] move fexceptions to compiler_flags (#80387)"
This reverts commit 3e1ac21c3bcbc0e27dcf058900e9572d3c135a20.

Reverted https://github.com/pytorch/pytorch/pull/80387 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally
2022-07-12 14:50:55 +00:00
3e1ac21c3b [c10] move fexceptions to compiler_flags (#80387)
Summary: Move `-fexceptions` out of the exported preprocessor flags and into the library's compiler flags. Apply the same changes to all rdeps of this library in the caffe2 subtree.

Test Plan:
Verify that no rdeps with cpp sources are missing `-fexceptions`:
```
% buck uquery 'kind(cxx*, rdeps(//xplat/caffe2/..., //xplat/caffe2/c10:c10, 1))' > /tmp/rdeps
% buck uquery '%Ss - attrfilter(preprocessor_flags, "-fexceptions", %Ss) - attrfilter(compiler_flags, "-fexceptions", %Ss)' @/tmp/rdeps
//xplat/pytorch_models/build/pytorch_dev_mobilenetv3/v1/nnc:asm
//xplat/pytorch_models/build/aot_test_model/v1/nnc:asm
//xplat/pytorch_models/build/pytorch_dev_linear/v1/nnc:asm
//xplat/pytorch_models/build/bi_bytedoc_nnc/v1/nnc:asm
//xplat/pytorch_models/build/bi_bytedoc_nnc/v2/nnc:asm
```

Differential Revision: D37386802

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80387
Approved by: https://github.com/linbinyu
2022-07-12 14:49:16 +00:00
7ea723b8f6 Updating miniz library from version 2.0.8 -> 2.1.0 (#79636)
Summary:
This PR updates the miniz library from version 2.0.8 to 2.1.0.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79636
Approved by: https://github.com/albanD
2022-06-22 15:02:16 +00:00
e21c0ac9a5 use exe/exepath in our genrules
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79626

Buck does not properly handle caching when the executable is
identified with `$(location ...)`. See
https://fb.workplace.com/groups/askbuck/posts/8600146743367198 for
more information.

Differential Revision: [D37179273](https://our.internmc.facebook.com/intern/diff/D37179273/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D37179273/)!

Approved by: https://github.com/malfet
2022-06-16 02:23:51 +00:00
86606fbe22 fix generate-code caching by indicating that the binary is an executable
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79625

Per Josiah Gaskin's followup on
https://www.internalfb.com/intern/qa/365579, using $(exe ...) instead
of $(location ...) should address the caching behavior.

@override-unit-failures
(Note: this ignores all push blocking failures!)

Differential Revision: [D36970846](https://our.internmc.facebook.com/intern/diff/D36970846/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36970846/)!

Approved by: https://github.com/malfet
2022-06-16 02:21:03 +00:00
adf8060600 add a new alias key for functional to view op decompositions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79615

Approved by: https://github.com/zou3519
2022-06-15 23:18:09 +00:00
eb5751d84b move gen_aten and gen_aten_hip into shared build structure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77751

This requires two changes to rule generation:
 * pulling the cpu static dispatch prediction into the rules
 * disabling the Bazel-style generated file aliases

Differential Revision: [D36481918](https://our.internmc.facebook.com/intern/diff/D36481918/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36481918/)!

Approved by: https://github.com/kit1980, https://github.com/seemethere
2022-06-15 18:22:52 +00:00
38350acf8f Autogen Tags enum, and allow specifying tags while defining an op
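No description was provided. For context, a hedged sketch of the autogenerated `torch.Tag` enum as it is visible from today's Python API (this surface postdates the commit):
```python
import torch

# The autogenerated Tags enum:
print(torch.Tag.inplace_view)

# Tags specified while defining an op are attached to its overloads, e.g.
# transpose_ is tagged inplace_view in native_functions.yaml:
print(torch.ops.aten.transpose_.default.tags)
```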
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79322

Approved by: https://github.com/albanD
2022-06-11 00:29:32 +00:00
7d12eecba1 move GENERATED_CPP_CUDA to caffe2/build.bzl
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77744

This is needed by gen_aten and its immediate downstream libraries. As
such, it can live solely in the shared build structure.

Differential Revision: [D36480812](https://our.internmc.facebook.com/intern/diff/D36480812/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36480812/)!

Approved by: https://github.com/kit1980
2022-06-02 18:38:05 +00:00
7dc5b5bf10 move generated_srcs_list.bzl into caffe2/build.bzl
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77680

This is only used by ATen code generation and libraries. These are
about to move into the shared build structure, so let's move this
cleanly first.

Differential Revision: [D36455725](https://our.internmc.facebook.com/intern/diff/D36455725/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36455725/)!

Approved by: https://github.com/kit1980
2022-06-01 23:03:54 +00:00
02c4d877b4 Codegen Non-Native IR Nodes (#76535)
Add codegen infrastructure to generate IR nodes for non-native ops.

The proposed change is to add a `non_native` key to the `{backend}_native_functions.yaml` file that contains schema definitions similar to what is found in `native_functions.yaml`, e.g.:
```
non_native:
    ...
    - func: expand(Tensor input, int[] size, bool is_scalar_expand) -> Tensor
    ...
```
These definitions are parsed into a `LazyIrSchema` that can be used for generating IR nodes via `GenLazyIR`.

Fixes #74628

CC: @wconstab @desertfire @henrytwo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76535
Approved by: https://github.com/wconstab
2022-05-24 19:29:23 +00:00
c2ff413622 move generated-autograd-headers to the shared build structure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76183

This is a relatively simple target, but we have to fix our header
expansion to understand generated files. The next step will be to use
this in Bazel.

Differential Revision: [D35820541](https://our.internmc.facebook.com/intern/diff/D35820541/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D35820541/)!

Approved by: https://github.com/dreiss, https://github.com/malfet
2022-05-19 04:31:56 +00:00
e517fc8b28 eliminate Bazel's libtorch_cpp_generated_sources
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76179

This list is redundant with the shared build structure.

Differential Revision: [D35818500](https://our.internmc.facebook.com/intern/diff/D35818500/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D35818500/)!

Approved by: https://github.com/dreiss
2022-05-17 03:46:49 +00:00
a013d83bf9 eliminate Bazel's libtorch_python_generated_sources
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76178

These contents are already identified in the shared build structure.

Differential Revision: [D35817999](https://our.internmc.facebook.com/intern/diff/D35817999/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D35817999/)!

Approved by: https://github.com/dreiss
2022-05-17 03:43:02 +00:00
7eaf4780ba Revert "[LT] Store OpKind for each IR subclass in a static field"
This reverts commit ac37ddc79557d7a5ec184a7d2924241ccfc8a333.

Reverted https://github.com/pytorch/pytorch/pull/76711 on behalf of https://github.com/malfet
2022-05-09 20:50:09 +00:00
ac37ddc795 [LT] Store OpKind for each IR subclass in a static field
Summary: Currently OpKind is stored as an object field called op_ for each IR
node, and one usage of op_ is to avoid dynamic_cast in NodeCast when we
need to downcast a base-node pointer into a concrete sub-node pointer.
As a result, we need to construct and pass in an op when downcasting
nodes, and this becomes quite annoying when we start to implement the
trie-based IR node reuse. More importantly, the op for each subclass
should be unique for that subclass, and thus making it a const static field
is a more logical design.

In this PR, we still keep the object-level op_ for easier XLA adoption. As
future work, we can come back to remove op_, make the op() method
virtual, and get rid of OpKind in all the node constructors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76711

Approved by: https://github.com/wconstab, https://github.com/JackCaoG
2022-05-06 19:14:46 +00:00
37fb636b7f fix package violation caused by D35587412 (#76808)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76808

This reached into aten/TARGETS in fbcode.
ghstack-source-id: 155484095

Test Plan: Verified manually.

Reviewed By: dreiss, malfet

Differential Revision: D36128458

fbshipit-source-id: c7447b3a40fe905993e799d211241e72183f8acb
(cherry picked from commit b68eb7a45d8973fadab2dfcafcbb0f63801abd40)
2022-05-05 23:39:03 +00:00
ac45fb9b93 switch Bazel to the shared generate-code genrule (#75790)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75790

We were building it before, but now we use it in downstream
rules. This enables us to eliminate the handwritten genrule.
ghstack-source-id: 155300051

Test Plan: Verified locally and in CI.

Reviewed By: dreiss

Differential Revision: D35645390

fbshipit-source-id: 478bb37a6ec295c232f66383babf46606e83ed5e
(cherry picked from commit 2822d4c5b48c6d9282149b2d43cf72d645237196)
2022-05-04 15:26:25 +00:00
096ff0ecca introduce new --gen-dir flag to generate_code and use it in fbcode (#75800)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75800

This leads to more similarities between OSS CMake and eventually OSS
Bazel. We will be able to generate files with the same names and not
have different file lists between the builds.
ghstack-source-id: 155300043

Test Plan: Verified locally and in CI.

Reviewed By: dreiss

Differential Revision: D35648586

fbshipit-source-id: 9f1638b5665ebcc64466883f65ef24a2bfd05228
(cherry picked from commit 7f2acff1baa8dfafddefdc720714f8d39feda436)
2022-05-04 15:26:25 +00:00
401179f263 disable the //:generate-code target in Bazel (#76174)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76174

This is about to conflict with the existing Bazel codegen
outputs. Switch to it atomically.
ghstack-source-id: 155029309

Test Plan: Verify manually and rely on CI.

Reviewed By: dreiss

Differential Revision: D35815288

fbshipit-source-id: 8b35e7baeb8572aef13c07cac689ee84dc7335d5
(cherry picked from commit 6dde9831a30fcf664b73fccaa51e30a7049b3251)
2022-05-03 12:13:19 +00:00
eb27c85160 move generate-code into shared build structure (#75699)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75699

ghstack-source-id: 155255334

Test Plan: Rely on CI.

Reviewed By: dreiss

Differential Revision: D35587412

fbshipit-source-id: 5ab79c07029de279a1fae36519654a73bb61d430
(cherry picked from commit 4896b72a6c0cc087e36889d21d2d885009d94a6d)
2022-05-03 09:53:37 +00:00
8b1cf8ed6b move version_h to shared build structure in Buck (#75964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75964

This is already in the shared build structure for Bazel, but we need
to implement genrule for fbcode.

There's an xplat target that can't build in fbcode yet because the
dependencies don't line up, so we have to add a tag to exclude it.

ghstack-source-id: 154696020

Test Plan: Rely on CI

Reviewed By: malfet

Differential Revision: D35443900

fbshipit-source-id: 0768b29906c8218d7aebfdc7c18d69f59a0c9384
(cherry picked from commit bff47be441bd142392a07aa177be02e18aa86f1c)
2022-04-26 12:06:09 +00:00
f4200600e4 move Bazel version header generation to shared build structure (#75332)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75332

ghstack-source-id: 154678044

Test Plan: Rely on OSS CI.

Reviewed By: malfet

Differential Revision: D35434229

fbshipit-source-id: 7cdd33fa32d0c485f44477e414c24c9bc4b74963
(cherry picked from commit 60285c613e8703c52f36f0bf1178e35c04574ffa)
2022-04-25 17:51:30 +00:00
d78dd825ba define the caffe2_serialize target in Bazel (#75942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75942

This also requires changes to the target definition and the xplat translator to get it working.
ghstack-source-id: 154678046

Test Plan: Verify locally and rely on CI.

Reviewed By: malfet

Differential Revision: D35704597

fbshipit-source-id: 6b0d9f5a044609b24dda656f80233ba6186c097f
(cherry picked from commit 6de43c5ca7a973c9f8b71f4d60d4d5e85cc2ba21)
2022-04-25 16:14:05 +00:00
a5b4839f35 Move //xplat/caffe2:caffe2_serialize to shared build structure
Summary: This is a first step toward migrating xplat targets to the shared build structure. Eventually both xplat Buck and open-source Bazel targets will be generated from the shared build.bzl.

Test Plan: Should be no-op, rely on CI.

Reviewed By: malfet

Differential Revision: D35270004

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75847
Approved by: https://github.com/linbinyu
2022-04-15 17:25:29 +00:00