Commit Graph

50 Commits

564d00f364 Revert "Fix clang-tidy warnings in Caffe2 code (#134935)"
This reverts commit 7cfd23636c8fa6fcbb8bf3ea34e15b847ec9ad9d.

Reverted https://github.com/pytorch/pytorch/pull/134935 on behalf of https://github.com/izaitsevfb due to breaks internal builds, caffe2 is still used internally ([comment](https://github.com/pytorch/pytorch/pull/134935#issuecomment-2349368152))
2024-09-13 16:42:37 +00:00
cyy 7cfd23636c Fix clang-tidy warnings in Caffe2 code (#134935)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134935
Approved by: https://github.com/ezyang
2024-09-12 03:27:09 +00:00
26ec06e45d [amd][lowering] hipify shim v2 headers (#134689)
Summary: The default c_shim version was switched to 2 for HIP in D60674018. This results in some linking errors where shim function symbols are missing from the compiled .so file (e.g. P1551186492) when building lowering benchmark scripts, since the required files aren't included. Hipify the shim v2 generated header files as well, since they're needed during codegen when the buck binaries are executed.

Reviewed By: frank-wei, zoranzhao, henryoier

Differential Revision: D61865202

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134689
Approved by: https://github.com/zoranzhao
2024-08-28 21:53:24 +00:00
c638a40a93 [Caffe2] Remove unused AVX512 code (#133160)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133160
Approved by: https://github.com/albanD
2024-08-23 23:16:16 +00:00
fa1d7b0262 Revert "Remove unused Caffe2 macros (#132979)"
This reverts commit da65cfbdea4f1f2176f6242004bda940a24f9ddb.

Reverted https://github.com/pytorch/pytorch/pull/132979 on behalf of https://github.com/ezyang due to these are apparently load bearing internally ([comment](https://github.com/pytorch/pytorch/pull/132979#issuecomment-2284666332))
2024-08-12 18:34:56 +00:00
cyy da65cfbdea Remove unused Caffe2 macros (#132979)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132979
Approved by: https://github.com/ezyang
2024-08-09 04:48:20 +00:00
ff8042bcfb Enable AOTI shim v2 build and add into libtorch (#125211)
Summary:
Follow up of https://github.com/pytorch/pytorch/pull/125087

This diff will create the shim v2 header and cpp file and the corresponding build targets.

Differential Revision: D56617546

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125211
Approved by: https://github.com/desertfire
2024-05-31 23:56:11 +00:00
9ec8dd2467 Reify view_func() closures as ViewFuncs (#118404)
Replaces `view_func()` closures with a reified `ViewFunc` data structure. Codegen generates a `ViewFunc` subclass for each view op (e.g. `NarrowViewFunc`) containing state needed to reconstruct the view. The `ViewFunc` API allows for querying and hot-swapping any `SymInt`s or `Tensors` in the state through `get_symints()` / `get_tensors()` / `clone_and_set()`, which will be essential for fake-ification later on.

```cpp
/// Base class for view functions, providing reapplication of a view on a new base.
/// Each view op should get a codegenerated subclass of this class containing
/// any state needed to reconstruct the view. The class also provides convenience
/// accessors for saved SymInts / tensor state. This is useful for e.g. fake-ification,
/// where we want to use symbolic values or fake tensors instead.
struct TORCH_API ViewFunc {
  virtual ~ViewFunc() {}
  /// Returns any SymInts in the saved state.
  virtual std::vector<c10::SymInt> get_symints() const { return {}; }
  /// Returns the number of SymInts in the saved state.
  virtual size_t num_symints() const { return 0; }
  /// Returns any tensors in the saved state.
  virtual std::vector<at::Tensor> get_tensors() const { return {}; }
  /// Returns the number of tensors in the saved state.
  virtual size_t num_tensors() const { return 0; }
  /// Reapplies the view on the given base using the saved state.
  virtual at::Tensor operator()(const at::Tensor&) const = 0;
  /// Returns a clone of this ViewFunc, optionally with the specified saved state.
  virtual std::unique_ptr<ViewFunc> clone_and_set(
      std::optional<std::vector<c10::SymInt>> = c10::nullopt,
      std::optional<std::vector<at::Tensor>> = c10::nullopt) const = 0;

protected:
  /// Sets the values of any SymInts in the saved state. The input vector size must
  /// match the number of SymInts in the saved state (i.e. the size of the list
  /// returned by get_symints()).
  virtual void set_symints(std::vector<c10::SymInt>) {}
  /// Sets the values of any Tensors in the saved state. The input vector size must
  /// match the number of Tensors in the saved state (i.e. the size of the list
  /// returned by get_tensors()).
  virtual void set_tensors(std::vector<at::Tensor>) {}
};
```

New codegen files:
* `torch/csrc/autograd/generated/ViewFunc.h`
* `torch/csrc/autograd/generated/ViewFuncs.cpp`

The templates for these also contain impls for `ChainedViewFunc` and `ErroringViewFunc`, which are used in a few places within autograd.

Example codegen for `slice.Tensor`:
```cpp
// torch/csrc/autograd/generated/ViewFuncs.h
#define SLICE_TENSOR_VIEW_FUNC_AVAILABLE
struct SliceTensorViewFunc : public torch::autograd::ViewFunc {
  SliceTensorViewFunc(int64_t dim, c10::optional<c10::SymInt> start, c10::optional<c10::SymInt> end, c10::SymInt step) : dim(dim), start(start), end(end), step(step)
  {};
  virtual ~SliceTensorViewFunc() override {};
  virtual std::vector<c10::SymInt> get_symints() const override;
  virtual size_t num_symints() const override;
  virtual std::vector<at::Tensor> get_tensors() const override;
  virtual size_t num_tensors() const override;
  virtual at::Tensor operator()(const at::Tensor&) const override;
  virtual std::unique_ptr<ViewFunc> clone_and_set(
      std::optional<std::vector<c10::SymInt>> = c10::nullopt,
      std::optional<std::vector<at::Tensor>> = c10::nullopt) const override;

protected:
  virtual void set_symints(std::vector<c10::SymInt>) override;
  virtual void set_tensors(std::vector<at::Tensor>) override;

private:
  int64_t dim;
  c10::optional<c10::SymInt> start;
  c10::optional<c10::SymInt> end;
  c10::SymInt step;
};
...

// torch/csrc/autograd/generated/ViewFuncs.cpp
std::vector<c10::SymInt> SliceTensorViewFunc::get_symints() const {
  ::std::vector<c10::SymInt> symints;
  symints.reserve((start.has_value() ? 1 : 0) + (end.has_value() ? 1 : 0) + 1);
  if(start.has_value()) symints.insert(symints.end(), *(start));
  if(end.has_value()) symints.insert(symints.end(), *(end));
  symints.push_back(step);
  return symints;
}

size_t SliceTensorViewFunc::num_symints() const {
  return static_cast<size_t>((start.has_value() ? 1 : 0) + (end.has_value() ? 1 : 0) + 1);
}

void SliceTensorViewFunc::set_symints(std::vector<c10::SymInt> symints) {
  TORCH_INTERNAL_ASSERT(symints.size() == num_symints());
  auto i = 0;
  if(start.has_value()) start = symints[i];
  i += (start.has_value() ? 1 : 0);
  if(end.has_value()) end = symints[i];
  i += (end.has_value() ? 1 : 0);
  step = symints[i];
}

std::vector<at::Tensor> SliceTensorViewFunc::get_tensors() const {
  ::std::vector<at::Tensor> tensors;
  return tensors;
}

size_t SliceTensorViewFunc::num_tensors() const {
  return static_cast<size_t>(0);
}

void SliceTensorViewFunc::set_tensors(std::vector<at::Tensor> tensors) {
  TORCH_INTERNAL_ASSERT(tensors.size() == num_tensors());

}

at::Tensor SliceTensorViewFunc::operator()(const at::Tensor& input_base) const {
  return at::_ops::slice_Tensor::call(input_base, dim, start, end, step);
}

std::unique_ptr<ViewFunc> SliceTensorViewFunc::clone_and_set(
    std::optional<std::vector<c10::SymInt>> symints,
    std::optional<std::vector<at::Tensor>> tensors) const {
  auto output = std::make_unique<SliceTensorViewFunc>(dim, start, end, step);
  if (symints.has_value()) {
    output->set_symints(std::move(*(symints)));
  }
  if (tensors.has_value()) {
    output->set_tensors(std::move(*(tensors)));
  }
  return output;
}
```

The `_view_func()` / `_view_func_unsafe()` methods now accept two additional (optional) args for `symint_visitor_fn` / `tensor_visitor_fn`. If these are defined, they are expected to be Python callables that operate on a single SymInt / tensor and return a new one. This allows for the hot-swapping needed during fake-ification; a usage sketch follows.
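A hedged sketch of this replay/visitor flow; the view op, shapes, and identity visitors here are illustrative, not from the PR:
```python
import torch

base = torch.randn(4, 6)
view = base.narrow(1, 1, 4)   # a view op whose ViewFunc saves SymInts

# Replay the same view on a new base:
new_base = torch.randn(4, 6)
replayed = view._view_func(new_base)

# Replay while hot-swapping saved state: each saved SymInt / tensor is fed
# through the corresponding visitor and replaced by its return value.
swapped = view._view_func(
    new_base,
    symint_visitor_fn=lambda s: s,   # e.g. substitute symbolic values here
    tensor_visitor_fn=lambda t: t,   # e.g. substitute fake tensors here
)
```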

For testing, there are extensive pre-existing tests, and I added a test to ensure that hot-swapping functions correctly.
```sh
python test/test_autograd.py -k test_view_func_replay
python test/test_ops.py -k test_view_replay
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118404
Approved by: https://github.com/ezyang
2024-02-14 22:00:43 +00:00
24bdd03d23 Revert "Reify view_func() closures as ViewFuncs (#118404)"
This reverts commit d5a6762263a98e5153bc057c8ba4f377542c7e55.

Reverted https://github.com/pytorch/pytorch/pull/118404 on behalf of https://github.com/DanilBaibak due to Broken trunk ([comment](https://github.com/pytorch/pytorch/pull/118404#issuecomment-1938600260))
2024-02-12 12:38:51 +00:00
d5a6762263 Reify view_func() closures as ViewFuncs (#118404)
Same description as the reland above (9ec8dd2467); the full commit message, including the `ViewFunc` API and generated `slice.Tensor` examples, is identical.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118404
Approved by: https://github.com/ezyang
2024-02-09 18:51:36 +00:00
66a76516bf [ROCm] Disabling Kernel Asserts for ROCm by default - fix and clean up and refactoring (#114660)
Related to #103973, #110532, #108404, #94891

**Context:**
As commented in 6ae0554d11/cmake/Dependencies.cmake (L1198)
Kernel asserts are enabled by default for CUDA and disabled for ROCm.
However, this was somewhat broken, and kernel asserts were still enabled for ROCm.

Disabling kernel asserts is also needed for users who do not have PCIe atomics support. These community users have verified that disabling kernel asserts on the PyTorch/ROCm platform fixed their PyTorch workflows, such as torch.sum scripts and Stable Diffusion (see the related issues).

**Changes:**

This pull request serves the following purposes:
* Refactor and clean up the logic, making it simpler to enable and disable kernel asserts for ROCm.
* Fix the bug where kernel asserts for ROCm were not disabled by default.

Specifically,
- Renamed `TORCH_DISABLE_GPU_ASSERTS` to `C10_USE_ROCM_KERNEL_ASSERT` for the following reasons:
(1) This variable only applies to ROCm.
(2) The new name aligns better with the `#define CUDA_KERNEL_ASSERT` macro.
(3) With `USE_` in front of the name, we can easily control the feature with an environment variable at build time (e.g. `USE_ROCM_KERNEL_ASSERT=1 python setup.py develop` will enable kernel asserts for a ROCm build).
- Got rid of `ROCM_FORCE_ENABLE_GPU_ASSERTS` to simplify the logic and make it easier to understand and maintain.
- Added `#cmakedefine` to carry the CMake variable over to C++.

**Tests:**
(1) Build in default mode and verify that `USE_ROCM_KERNEL_ASSERT` is OFF (0) and that kernel asserts are disabled:

```
python setup.py develop
```
Verify CMakeCache.txt has correct value.
```
/xxxx/pytorch/build$ grep USE_ROCM_KERNEL_ASSERT CMakeCache.txt
USE_ROCM_KERNEL_ASSERT:BOOL=0
```
Tested the following code in both ROCm and CUDA builds, expecting different return codes:

```
subprocess.call([sys.executable, '-c', "import torch;torch._assert_async(torch.tensor(0,device='cuda'));torch.cuda.synchronize()"])
```
This piece of code is adapted from the unit test below, to work around the fact that the unit test is currently skipped for ROCm. (We will look into re-enabling it in the future.)

```
python test/test_cuda_expandable_segments.py -k test_fixed_cuda_assert_async
```

Ran the following script, expecting r == 0 since `CUDA_KERNEL_ASSERT` is defined as nothing:
```
>>> import sys
>>> import subprocess
>>> r=subprocess.call([sys.executable, '-c', "import torch;torch._assert_async(torch.tensor(0,device='cuda'));torch.cuda.synchronize()"])
>>> r
0
```

(2) Enable kernel asserts by building with `USE_ROCM_KERNEL_ASSERT=1` or `USE_ROCM_KERNEL_ASSERT=ON`:
```
USE_ROCM_KERNEL_ASSERT=1 python setup.py develop
```

Verify `USE_ROCM_KERNEL_ASSERT` is `1`
```
/xxxx/pytorch/build$ grep USE_ROCM_KERNEL_ASSERT CMakeCache.txt
USE_ROCM_KERNEL_ASSERT:BOOL=1
```

Run the assert test, expecting a return code not equal to 0.

```
>>> import sys
>>> import subprocess
>>> r=subprocess.call([sys.executable, '-c', "import torch;torch._assert_async(torch.tensor(0,device='cuda'));torch.cuda.synchronize()"])
>>>/xxxx/pytorch/aten/src/ATen/native/hip/TensorCompare.hip:108: _assert_async_cuda_kernel: Device-side assertion `input[0] != 0' failed.
:0:rocdevice.cpp            :2690: 2435301199202 us: [pid:206019 tid:0x7f6cf0a77700] Callback: Queue 0x7f64e8400000 aborting with error : HSA_STATUS_ERROR_EXCEPTION: An HSAIL operation resulted in a hardware exception. code: 0x1016

>>> r
-6
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114660
Approved by: https://github.com/jeffdaily, https://github.com/malfet, https://github.com/jithunnair-amd
2023-12-13 15:44:53 +00:00
00908475e6 Use global variables to register the return_types namedtuples (#108832)
Fixes #69221. Builds on top of #107000, fixing the buck build issue linked [here](https://github.com/pytorch/pytorch/pull/107000#issuecomment-1708857375).
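For context, a hedged sketch of the `torch.return_types` namedtuples this registration concerns (standard PyTorch behavior; values are illustrative):
```python
import torch

t = torch.tensor([[1.0, 3.0], [2.0, 0.0]])
out = torch.max(t, dim=0)   # returns a torch.return_types.max namedtuple
print(type(out))            # <class 'torch.return_types.max'>
print(out.values)           # tensor([2., 3.])
print(out.indices)          # tensor([1, 0])
values, indices = out       # unpacks like a plain tuple
```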

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108832
Approved by: https://github.com/zou3519
2023-09-13 17:42:46 +00:00
27d5dcf589 Revert "Use global variables to register the return_types namedtuples (#107000)"
This reverts commit ae8eb7a3f9aee106affca3b27c1f4031bd216730.

Reverted https://github.com/pytorch/pytorch/pull/107000 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing internal build ([comment](https://github.com/pytorch/pytorch/pull/107000#issuecomment-1708862325))
2023-09-06 18:13:23 +00:00
ae8eb7a3f9 Use global variables to register the return_types namedtuples (#107000)
Fixes #69221

@pytorchbot label "topic: not user facing"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107000
Approved by: https://github.com/zou3519
2023-09-05 20:00:29 +00:00
950431c334 extract out a caffe2 macros library (#98156)
Slowly carving out the minimal caffe2 dependencies to build PyTorch.

Differential Revision: [D44609764](https://our.internmc.facebook.com/intern/diff/D44609764/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D44609764/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98156
Approved by: https://github.com/ezyang, https://github.com/PaliC
2023-04-04 10:04:21 +00:00
301f00f350 generate caffe2/core/macros.h in shared build structure (#98131)
This is only used by Bazel for now.

Differential Revision: [D44604078](https://our.internmc.facebook.com/intern/diff/D44604078/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D44604078/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98131
Approved by: https://github.com/ezyang, https://github.com/PaliC
2023-04-04 09:23:03 +00:00
2ac9086987 run buildifier on unified build files (#98141)
This is pretty tricky. buildifier by default doesn't do much to these
files. It does a little more if you tell it that they are
`BUILD.bazel` files with -type=build. But it can do even more if you
remove the target definitions from the `def define_rules()` wrapper
and dedent them.

I wrote a little wrapper that does that. I'll submit it at a later
date.

Differential Revision: [D44606558](https://our.internmc.facebook.com/intern/diff/D44606558/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D44606558/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98141
Approved by: https://github.com/ezyang, https://github.com/PaliC
2023-04-04 00:37:19 +00:00
f4f1a5b5b3 Revert "Move functional collectives to the right namespace (#97793)"
This reverts commit 184bfbc3d7b37e8f202f4938f6ea9ba557c93b1e.

Reverted https://github.com/pytorch/pytorch/pull/97793 on behalf of https://github.com/atalman due to breaks internal builds
2023-03-31 16:02:07 +00:00
184bfbc3d7 Move functional collectives to the right namespace (#97793)
This moves them from `torch._C._nn` to `torch._C._dist`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97793
Approved by: https://github.com/albanD
2023-03-30 22:18:13 +00:00
e217b30b0f Add torch.nested namespace (#84102)
First step towards #83775
- only `to_padded_tensor` is moved to the nested namespace for now
- following the schema used for `special`, `fft`, `linalg` and other namespaces, nested functions are registered in native_functions.yaml as `nested_{function_name}` and are bound to the desired Python name in `torch/nested/__init__.py` and the desired C++ name in `torch/csrc/api/include/torch/nested.h` (see the usage sketch below).
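A hedged usage sketch of the resulting Python binding, written against the current `torch.nested` API (the nested-tensor constructor is not part of this PR):
```python
import torch

nt = torch.nested.nested_tensor([torch.randn(2, 3), torch.randn(1, 3)])
padded = torch.nested.to_padded_tensor(nt, padding=0.0)
print(padded.shape)   # torch.Size([2, 2, 3]); the ragged dim is zero-padded
```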

~~**Question**: should we keep the documentation for `Tensor.to_padded_tensor` or can this deleted since it is shared by `torch.nested.to_padded_tensor`?~~

[generated nested docs](https://docs-preview.pytorch.org/84102/nested.html?highlight=nested#module-torch.nested)

Differential Revision: [D39361148](https://our.internmc.facebook.com/intern/diff/D39361148)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84102
Approved by: https://github.com/drisspg
2022-09-12 16:31:05 +00:00
673b35c847 Better reshape with autograd support (#82754) (#84154)
The original author is @YifanShenSZ and the original PR is #82754.
# Summary:
The previous reshape ([#80981](https://github.com/pytorch/pytorch/pull/80981)) is OK for forward, but needs improvement for backward: it must handle the "sometimes view, sometimes copy" behavior (a plain-tensor sketch of this behavior follows below).

This pull request fixes it by:
1. add a new alias dispatch key `CompositeImplicitAutogradNestedTensor`, which ideally would work as the nested-tensor version of `CompositeImplicitAutograd`
2. register `reshape_nested` to `reshape` by `CompositeImplicitAutogradNestedTensor`

Side changes:
* add contiguous memory format support to `clone_nested`
* add `view_nested`
* add `reshape_as_nested`

Fixes issue [#83041](https://github.com/pytorch/pytorch/issues/83041)
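A plain-tensor sketch of that "sometimes view, sometimes copy" behavior (ordinary dense-tensor semantics; this PR extends the backward handling to nested tensors):
```python
import torch

a = torch.randn(2, 3, requires_grad=True)
v = a.reshape(6)        # contiguous input: reshape returns a view
c = a.t().reshape(6)    # non-contiguous input: reshape materializes a copy
(v.sum() + c.sum()).backward()
print(a.grad)           # gradients flow through both the view and copy paths
```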

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82754

Test Plan:
Imported from GitHub, without a `Test Plan:` line.


Reviewed By: albanD

Differential Revision: D39023822

Pulled By: drisspg

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84154
Approved by: https://github.com/bdhirsh, https://github.com/albanD
2022-09-01 20:01:39 +00:00
406ce692ca [torchgen] Generate wrapper functions under custom namespaces (#81744)
Summary:
A follow-up to #81581. Before these 2 PRs, if an operator with a custom kernel namespace was added to `native_functions.yaml` (or any other yaml consumed by `torchgen`), although we were able to recognize the custom kernel in files such as `NativeFunctions.h` and `RegisterCPU.cpp`, we still generated backend-specific wrappers under the hardcoded `at` namespace. This PR changes that behavior by generating wrapper functions under custom namespaces.

For example, if the entries in the yaml file look like:

```
- func: op_1(Tensor(a) self) -> Tensor(a)
  dispatch:
    CPU: at::op_1_kernel # ATen kernel

- func: op_2(Tensor(a) self) -> Tensor(a)
  dispatch:
    CPU: custom::op_2_kernel # custom kernel
```

We generate the following code for `CPUFunctions_inl.h` and `RegisterCPU.cpp`:

`CPUFunctions_inl.h`:
```
namespace at {
namespace cpu {
TORCH_API at::Tensor & op_1(const at::Tensor & self);
} // namespace cpu
} // namespace at

namespace custom {
namespace cpu {
TORCH_API at::Tensor & op_2(const at::Tensor & self);
} // namespace cpu
} // namespace custom

```

Notice the difference between `at::cpu` and `custom::cpu`.

Then the definitions for these can be found in `RegisterCPU.cpp`.

`RegisterCPU.cpp`:
```
#include "CPUFunctions.h"

namespace at {

namespace {
at::Tensor & wrapper_op_1(const at::Tensor & self) {
    // No device check
  // DeviceGuard omitted
  return at::native::op_1_kernel(self);
}
} // anonymous namespace

TORCH_LIBRARY_IMPL(aten, CPU, m) {
  m.impl("op_1", TORCH_FN(wrapper_op_1));
}

namespace cpu {
at::Tensor & op_1(const at::Tensor & self) {
  return wrapper_op_1(self);
}
} // namespace cpu
} // namespace at

namespace custom {

namespace {
at::Tensor & wrapper_op_2(const at::Tensor & self) {
    // No device check
  // DeviceGuard omitted
  return at::native::op_2_kernel(self);
}
} // anonymous namespace

TORCH_LIBRARY_IMPL(aten, CPU, m) {
  m.impl("op_2", TORCH_FN(wrapper_op_2));
}

namespace cpu {
at::Tensor & op_2(const at::Tensor & self) {
  return wrapper_op_2(self);
}
} // namespace cpu
} // namespace custom

```
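For illustration only — assuming `op_1` were a real entry in `native_functions.yaml`, its schema would still be registered under the `aten` namespace (the `TORCH_LIBRARY_IMPL(aten, CPU, m)` block above), so a Python call would dispatch through the registry to the generated wrapper (hypothetical op name):
```python
import torch

t = torch.randn(3)
out = torch.ops.aten.op_1(t)   # dispatches to wrapper_op_1 for CPU tensors
```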

The benefit of this change is that it unifies all the namespaces derived from custom ops. In the example above, there are:

1. `custom::native` for kernels
2. `custom::<dispatch_key>` e.g., `custom::cpu` for wrappers

These customized operators will have nothing to do with `at::native`, `at::cpu`, etc.

Test Plan: This is very hard to test. I will refactor this logic and abstract out some layers so it's testable; will do that in coming PRs.

Differential Revision: D37972772

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81744
Approved by: https://github.com/bdhirsh
2022-08-04 07:48:44 +00:00
5c92777307 Stop checking in VmapGeneratedPlumbing.h (#82351)
This PR changes VmapGeneratedPlumbing.h to be generated by torchgen. The
output file is ATen/VmapGeneratedPlumbing.h.

Why generate this file inside PyTorch codegen instead of a separate step
in functorch?
- I can't figure out how to get functorch's fbcode target to generate it
- functorch's build system will, in the mid-term, be absorbed into
pytorch's build system, so I don't want to do the extra work of adding
a step to the functorch build process.

Test Plan:
- build pytorch, build functorch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82351
Approved by: https://github.com/ezyang
2022-07-27 20:39:37 +00:00
6f0c253956 Add sparse, quantized and nested tensor meta support to codegen (#81793)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81793
Approved by: https://github.com/cpuhrsch, https://github.com/bdhirsh
2022-07-21 21:23:56 +00:00
51cc614cb9 [pytorch] add missing -fexceptions flags (#81394)
Summary:
Add missing `-fexceptions` flags that are currently being passed through `exported_preprocessor_flags`. The exported preprocessor flags will be removed in a subsequent diff.

This is a rediff of D37386802 (3e1ac21c3b) with the changes split out to avoid reverts.

Test Plan:
Check flag is present:
```
$ buck uquery xplat/caffe2:common_core -a 'compiler_flags'
{
  "//xplat/caffe2:common_core" : {
    "compiler_flags" : [
      "-fexceptions",
      "-frtti",
      "-Os",
      "-Wno-unknown-pragmas",
      "-Wno-write-strings",
      "-Wno-unused-variable",
      "-Wno-unused-function",
      "-Wno-deprecated-declarations",
      "-Wno-shadow",
      "-Wno-global-constructors",
      "-Wno-missing-prototypes",
      "-std=gnu++17",
      "/EHsc",
      "/GR",
      "/O1",
      "/wd4101"
    ]
  }
}
```

Differential Revision: D37813869

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81394
Approved by: https://github.com/linbinyu
2022-07-14 20:03:17 +00:00
e608befae4 Revert "[c10] move fexceptions to compiler_flags (#80387)"
This reverts commit 3e1ac21c3bcbc0e27dcf058900e9572d3c135a20.

Reverted https://github.com/pytorch/pytorch/pull/80387 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally
2022-07-12 14:50:55 +00:00
3e1ac21c3b [c10] move fexceptions to compiler_flags (#80387)
Summary: Move `-fexceptions` out of the exported preprocessor flags and into the library's compiler flags. Apply the same changes to all rdeps of this library in the caffe2 subtree.

Test Plan:
Verify that no rdeps with cpp sources are missing `-fexceptions`:
```
% buck uquery 'kind(cxx*, rdeps(//xplat/caffe2/..., //xplat/caffe2/c10:c10, 1))' > /tmp/rdeps
% buck uquery '%Ss - attrfilter(preprocessor_flags, "-fexceptions", %Ss) - attrfilter(compiler_flags, "-fexceptions", %Ss)' @/tmp/rdeps
//xplat/pytorch_models/build/pytorch_dev_mobilenetv3/v1/nnc:asm
//xplat/pytorch_models/build/aot_test_model/v1/nnc:asm
//xplat/pytorch_models/build/pytorch_dev_linear/v1/nnc:asm
//xplat/pytorch_models/build/bi_bytedoc_nnc/v1/nnc:asm
//xplat/pytorch_models/build/bi_bytedoc_nnc/v2/nnc:asm
```

Differential Revision: D37386802

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80387
Approved by: https://github.com/linbinyu
2022-07-12 14:49:16 +00:00
7ea723b8f6 Updating miniz library from version 2.0.8 -> 2.1.0 (#79636)
Summary:
This PR updates the miniz library from version 2.0.8 to 2.1.0.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79636
Approved by: https://github.com/albanD
2022-06-22 15:02:16 +00:00
e21c0ac9a5 use exe/exepath in our genrules
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79626

Buck does not properly handle caching when the executable is
identified with `$(location ...)`. See
https://fb.workplace.com/groups/askbuck/posts/8600146743367198 for
more information.

Differential Revision: [D37179273](https://our.internmc.facebook.com/intern/diff/D37179273/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D37179273/)!

Approved by: https://github.com/malfet
2022-06-16 02:23:51 +00:00
86606fbe22 fix generate-code caching by indicating that the binary is an executable
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79625

Per Josiah Gaskin's followup on
https://www.internalfb.com/intern/qa/365579, using $(exe ...) instead
of $(location ...) should address the caching behavior.

@override-unit-failures
(Note: this ignores all push blocking failures!)

Differential Revision: [D36970846](https://our.internmc.facebook.com/intern/diff/D36970846/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36970846/)!

Approved by: https://github.com/malfet
2022-06-16 02:21:03 +00:00
adf8060600 add a new alias key for functional to view op decompositions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79615

Approved by: https://github.com/zou3519
2022-06-15 23:18:09 +00:00
eb5751d84b move gen_aten and gen_aten_hip into shared build structure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77751

This requires two changes to rule generation:
 * pulling the cpu static dispatch prediction into the rules
 * disabling the Bazel-style generated file aliases

Differential Revision: [D36481918](https://our.internmc.facebook.com/intern/diff/D36481918/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36481918/)!

Approved by: https://github.com/kit1980, https://github.com/seemethere
2022-06-15 18:22:52 +00:00
38350acf8f Autogen Tags enum, and allow specifying tags while defining an op
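No description was provided. For context, a hedged sketch of the autogenerated `torch.Tag` enum as it is visible from today's Python API (this surface postdates the commit):
```python
import torch

# The autogenerated Tags enum:
print(torch.Tag.inplace_view)

# Tags specified while defining an op are attached to its overloads, e.g.
# transpose_ is tagged inplace_view in native_functions.yaml:
print(torch.ops.aten.transpose_.default.tags)
```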
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79322

Approved by: https://github.com/albanD
2022-06-11 00:29:32 +00:00
7d12eecba1 move GENERATED_CPP_CUDA to caffe2/build.bzl
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77744

This is needed by gen_aten and its immediate downstream libraries. As
such, it can live solely in the shared build structure.

Differential Revision: [D36480812](https://our.internmc.facebook.com/intern/diff/D36480812/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36480812/)!

Approved by: https://github.com/kit1980
2022-06-02 18:38:05 +00:00
7dc5b5bf10 move generated_srcs_list.bzl into caffe2/build.bzl
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77680

This is only used by ATen code generation and libraries. These are
about to move into the shared build structure, so let's move this
cleanly first.

Differential Revision: [D36455725](https://our.internmc.facebook.com/intern/diff/D36455725/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36455725/)!

Approved by: https://github.com/kit1980
2022-06-01 23:03:54 +00:00
02c4d877b4 Codegen Non-Native IR Nodes (#76535)
Add codegen infrastructure to generate IR nodes for non-native ops.

The proposed change is to add a `non_native` key to the `{backend}_native_functions.yaml` file that contains schema definitions similar to what is found in `native_functions.yaml`, e.g.:
```
non_native:
    ...
    - func: expand(Tensor input, int[] size, bool is_scalar_expand) -> Tensor
    ...
```
These definitions are parsed into a `LazyIrSchema` that can be used for generating IR nodes via `GenLazyIR`.

Fixes #74628

CC: @wconstab @desertfire @henrytwo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76535
Approved by: https://github.com/wconstab
2022-05-24 19:29:23 +00:00
c2ff413622 move generated-autograd-headers to the shared build structure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76183

This is a relatively simple target, but we have to fix our header
expansion to understand generated files. The next step will be to use
this in Bazel.

Differential Revision: [D35820541](https://our.internmc.facebook.com/intern/diff/D35820541/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D35820541/)!

Approved by: https://github.com/dreiss, https://github.com/malfet
2022-05-19 04:31:56 +00:00
e517fc8b28 eliminate Bazel's libtorch_cpp_generated_sources
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76179

This list is redundant with the shared build structure.

Differential Revision: [D35818500](https://our.internmc.facebook.com/intern/diff/D35818500/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D35818500/)!

Approved by: https://github.com/dreiss
2022-05-17 03:46:49 +00:00
a013d83bf9 eliminate Bazel's libtorch_python_generated_sources
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76178

These contents are already identified in the shared build structure.

Differential Revision: [D35817999](https://our.internmc.facebook.com/intern/diff/D35817999/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D35817999/)!

Approved by: https://github.com/dreiss
2022-05-17 03:43:02 +00:00
7eaf4780ba Revert "[LT] Store OpKind for each IR subclass in a static field"
This reverts commit ac37ddc79557d7a5ec184a7d2924241ccfc8a333.

Reverted https://github.com/pytorch/pytorch/pull/76711 on behalf of https://github.com/malfet
2022-05-09 20:50:09 +00:00
ac37ddc795 [LT] Store OpKind for each IR subclass in a static field
Summary: Currently OpKind is stored as an object field called op_ for each IR
node, and one usage of op_ is to avoid dynamic_cast in NodeCast when we
need to downcast a base-node pointer into a concrete sub-node pointer.
As a result, we need to construct and pass in an op when downcasting
nodes, and this becomes quite annoying when we start to implement the
trie-based IR node reuse. More importantly, the op for each subclass
should be unique for that subclass, and thus making it a const static field
is a more logical design.

In this PR, we still keep the object-level op_ for easier XLA adoption. As
future work, we can come back to remove op_, make the op() method
virtual, and get rid of OpKind in all the node constructors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76711

Approved by: https://github.com/wconstab, https://github.com/JackCaoG
2022-05-06 19:14:46 +00:00
37fb636b7f fix package violation caused by D35587412 (#76808)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76808

This reached into aten/TARGETS in fbcode.
ghstack-source-id: 155484095

Test Plan: Verified manually.

Reviewed By: dreiss, malfet

Differential Revision: D36128458

fbshipit-source-id: c7447b3a40fe905993e799d211241e72183f8acb
(cherry picked from commit b68eb7a45d8973fadab2dfcafcbb0f63801abd40)
2022-05-05 23:39:03 +00:00
ac45fb9b93 switch Bazel to the shared generate-code genrule (#75790)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75790

We were building it before, but now we use it in downstream
rules. This enables us to eliminate the handwritten genrule.
ghstack-source-id: 155300051

Test Plan: Verified locally and in CI.

Reviewed By: dreiss

Differential Revision: D35645390

fbshipit-source-id: 478bb37a6ec295c232f66383babf46606e83ed5e
(cherry picked from commit 2822d4c5b48c6d9282149b2d43cf72d645237196)
2022-05-04 15:26:25 +00:00
096ff0ecca introduce new --gen-dir flag to generate_code and use it in fbcode (#75800)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75800

This leads to more similarities between OSS CMake and eventually OSS
Bazel. We will be able to generate files with the same names and not
have different file lists between the builds.
ghstack-source-id: 155300043

Test Plan: Verified locally and in CI.

Reviewed By: dreiss

Differential Revision: D35648586

fbshipit-source-id: 9f1638b5665ebcc64466883f65ef24a2bfd05228
(cherry picked from commit 7f2acff1baa8dfafddefdc720714f8d39feda436)
2022-05-04 15:26:25 +00:00
401179f263 disable the //:generate-code target in Bazel (#76174)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76174

This is about to conflict with the existing Bazel codegen
outputs. Switch to it atomically.
ghstack-source-id: 155029309

Test Plan: Verify manually and rely on CI.

Reviewed By: dreiss

Differential Revision: D35815288

fbshipit-source-id: 8b35e7baeb8572aef13c07cac689ee84dc7335d5
(cherry picked from commit 6dde9831a30fcf664b73fccaa51e30a7049b3251)
2022-05-03 12:13:19 +00:00
eb27c85160 move generate-code into shared build structure (#75699)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75699

ghstack-source-id: 155255334

Test Plan: Rely on CI.

Reviewed By: dreiss

Differential Revision: D35587412

fbshipit-source-id: 5ab79c07029de279a1fae36519654a73bb61d430
(cherry picked from commit 4896b72a6c0cc087e36889d21d2d885009d94a6d)
2022-05-03 09:53:37 +00:00
8b1cf8ed6b move version_h to shared build structure in Buck (#75964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75964

This is already in the shared build structure for Bazel, but we need
to implement genrule for fbcode.

There's an xplat target that can't build in fbcode yet because the
dependencies don't line up, so we have to add a tag to exclude it.

ghstack-source-id: 154696020

Test Plan: Rely on CI

Reviewed By: malfet

Differential Revision: D35443900

fbshipit-source-id: 0768b29906c8218d7aebfdc7c18d69f59a0c9384
(cherry picked from commit bff47be441bd142392a07aa177be02e18aa86f1c)
2022-04-26 12:06:09 +00:00
f4200600e4 move Bazel version header generation to shared build structure (#75332)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75332

ghstack-source-id: 154678044

Test Plan: Rely on OSS CI.

Reviewed By: malfet

Differential Revision: D35434229

fbshipit-source-id: 7cdd33fa32d0c485f44477e414c24c9bc4b74963
(cherry picked from commit 60285c613e8703c52f36f0bf1178e35c04574ffa)
2022-04-25 17:51:30 +00:00
d78dd825ba define the caffe2_serialize target in Bazel (#75942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75942

This also requires changes to the target definition and the xplat translator to get it working.
ghstack-source-id: 154678046

Test Plan: Verify locally and rely on CI.

Reviewed By: malfet

Differential Revision: D35704597

fbshipit-source-id: 6b0d9f5a044609b24dda656f80233ba6186c097f
(cherry picked from commit 6de43c5ca7a973c9f8b71f4d60d4d5e85cc2ba21)
2022-04-25 16:14:05 +00:00
a5b4839f35 Move //xplat/caffe2:caffe2_serialize to shared build structure
Summary: This is a first step toward migrating xplat targets to the shared build structure. Eventually both xplat Buck and open-source Bazel targets will be generated from the shared build.bzl.

Test Plan: Should be no-op, rely on CI.

Reviewed By: malfet

Differential Revision: D35270004

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75847
Approved by: https://github.com/linbinyu
2022-04-15 17:25:29 +00:00