02fee6caec
Revert "Change ATEN generator argument type to const std::optional<Generator>& ( #120076 )"
...
This reverts commit ecbe82b9cec75324b7efb58e1d9cae6b35b71bdc.
Reverted https://github.com/pytorch/pytorch/pull/120076 on behalf of https://github.com/jeanschmidt due to Reverting in order to check if this will fix XLA trunk jobs ([comment](https://github.com/pytorch/pytorch/pull/120076#issuecomment-2015272644 ))
2024-03-22 14:53:45 +00:00
ecbe82b9ce
Change ATEN generator argument type to const std::optional<Generator>& ( #120076 )
...
This PR proposes using `const std::optional<Generator>&` for the underlying functions to avoid unnecessary copy and move operations. The torchgen code was changed to generate the new type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120076
Approved by: https://github.com/malfet
2024-03-22 03:49:31 +00:00
cd6bfc7965
Proper view support for jagged layout NestedTensor ( #113279 )
...
This PR:
* Introduces an ATen op for creating true jagged views from a dense values buffer
* `_nested_view_from_jagged(values, offsets, lengths, ragged_idx, dummy)`
* This op is implemented on the Python side using torch.library so we can return a subclass instance
* `jagged_from_list()` now uses this instead of the old autograd.Function `ViewNestedFromBuffer`
* The latter op is used for non-contiguous JTs returned via `torch.nested.narrow()`
* `dummy` is an awful hack to ensure that `NestedTensor.__torch_dispatch__()` is invoked for our view
* Introduces an ATen op for accessing the `values` component of an NT via a view
* `_nested_get_values(nt)`
* **Removes** the autograd.Functions `ViewNestedFromBuffer` and `ViewBufferFromNested` in favor of `nested_from_values_offsets()` / `nested_from_values_offsets_lengths()` and `nt.values()`, respectively.
* Changes test code to prefer `as_nested_tensor()` over `jagged_from_list()` directly
* Similarly, avoid `buffer_from_jagged()`, preferring `values()`
* Depends on general subclass view fake-ification on the PT2 side (handled solely in previous PRs in the stack)
With these changes, the semantics of jagged layout NTs are such that they are considered a true view of the underlying `values` buffer. This means views of jagged NTs are views of the underlying buffer as well, simplifying some handling.
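To illustrate the "true view" semantics described above, here is a minimal sketch in plain Python. It assumes a flat `values` buffer and an `offsets` list; `jagged_rows` is a hypothetical helper and not part of `torch.nested`. A `memoryview` stands in for tensor views, so a write through one row is visible in the underlying buffer.

```python
from array import array


def jagged_rows(values, offsets):
    """Return one zero-copy view per row (hypothetical helper).

    Row i spans values[offsets[i]:offsets[i + 1]].
    """
    buf = memoryview(values)
    return [buf[start:end] for start, end in zip(offsets, offsets[1:])]


values = array("d", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
offsets = [0, 2, 3, 6]  # three rows of lengths 2, 1 and 3

rows = jagged_rows(values, offsets)
rows[2][0] = 40.0  # write through the view of the third row
```

Because each row is a view rather than a copy, `values[3]` now holds the value written through `rows[2]`.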
Differential Revision: [D54269922](https://our.internmc.facebook.com/intern/diff/D54269922 )
Co-authored-by: voznesenskym <voznesenskym@gmail.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113279
Approved by: https://github.com/ezyang
2024-03-22 02:12:36 +00:00
224beecee6
Revert "Proper view support for jagged layout NestedTensor ( #113279 )"
...
This reverts commit 5855c490f09a028bfdfefea8b93c9833eb55dc5c.
Reverted https://github.com/pytorch/pytorch/pull/113279 on behalf of https://github.com/jbschlosser due to Need to fix BC thing ([comment](https://github.com/pytorch/pytorch/pull/113279#issuecomment-2013899762 ))
2024-03-21 22:03:01 +00:00
5855c490f0
Proper view support for jagged layout NestedTensor ( #113279 )
...
This PR:
* Introduces an ATen op for creating true jagged views from a dense values buffer
* `_nested_view_from_jagged(values, offsets, lengths, ragged_idx, dummy)`
* This op is implemented on the Python side using torch.library so we can return a subclass instance
* `jagged_from_list()` now uses this instead of the old autograd.Function `ViewNestedFromBuffer`
* The latter op is used for non-contiguous JTs returned via `torch.nested.narrow()`
* `dummy` is an awful hack to ensure that `NestedTensor.__torch_dispatch__()` is invoked for our view
* Introduces an ATen op for accessing the `values` component of an NT via a view
* `_nested_get_values(nt)`
* **Removes** the autograd.Functions `ViewNestedFromBuffer` and `ViewBufferFromNested` in favor of `nested_from_values_offsets()` / `nested_from_values_offsets_lengths()` and `nt.values()`, respectively.
* Changes test code to prefer `as_nested_tensor()` over `jagged_from_list()` directly
* Similarly, avoid `buffer_from_jagged()`, preferring `values()`
* Depends on general subclass view fake-ification on the PT2 side (handled solely in previous PRs in the stack)
With these changes, the semantics of jagged layout NTs are such that they are considered a true view of the underlying `values` buffer. This means views of jagged NTs are views of the underlying buffer as well, simplifying some handling.
Differential Revision: [D54269922](https://our.internmc.facebook.com/intern/diff/D54269922 )
Co-authored-by: voznesenskym <voznesenskym@gmail.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113279
Approved by: https://github.com/ezyang
2024-03-20 23:45:34 +00:00
c0996866f4
Revert "Change ATEN generator argument type to const std::optional<Generator>& ( #120076 )"
...
This reverts commit 4305c64fea154ee1ab566e19bd7568753fc30916.
Reverted https://github.com/pytorch/pytorch/pull/120076 on behalf of https://github.com/izaitsevfb due to breaking internal builds(take 3) ([comment](https://github.com/pytorch/pytorch/pull/120076#issuecomment-1986338164 ))
2024-03-08 20:01:03 +00:00
4305c64fea
Change ATEN generator argument type to const std::optional<Generator>& ( #120076 )
...
This PR proposes using `const std::optional<Generator>&` for the underlying functions to avoid unnecessary copy and move operations. The torchgen code was changed to generate the new type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120076
Approved by: https://github.com/malfet
2024-03-07 09:52:21 +00:00
c3496d50f0
Fix torch.return_types init signature ( #119284 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119284
Approved by: https://github.com/peterbell10 , https://github.com/XuehaiPan
2024-02-23 21:52:34 +00:00
5c46600f84
[RELAND] refactor lazy init to device-agnostic ( #119248 )
...
# Motivation
This PR extends `cuda_lazy_init` to `device_lazy_init`, a device-agnostic API that can support any backend, and changes `maybe_initialize_cuda` to `maybe_initialize_device` to support lazy initialization for CUDA while maintaining scalability.
# Design
We maintain a flag for each backend to manage the lazy initialization state separately.
# Additional Context
No additional UTs are needed.
This is a reland PR; the original PR is [refactor lazy init to device-agnostic](https://github.com/pytorch/pytorch/pull/118846 ).
This is a common PR and does not trigger xpu ciflow.
Differential Revision: [D53478332](https://our.internmc.facebook.com/intern/diff/D53478332 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119248
Approved by: https://github.com/EikanWang , https://github.com/gujinghui , https://github.com/jgong5 , https://github.com/atalman
2024-02-07 15:58:51 +00:00
ab613a4019
Revert "refactor lazy init to device-agnostic ( #118846 )"
...
This reverts commit 520771d7b35034c96c5b4604ecf8960e6aab856f.
Reverted https://github.com/pytorch/pytorch/pull/118846 on behalf of https://github.com/atalman due to Failing, tests https://github.com/pytorch/torchdistx/blob/main/src/python/torchdistx/_C/fake.cc#L11 ([comment](https://github.com/pytorch/pytorch/pull/118846#issuecomment-1927651305 ))
2024-02-05 18:06:30 +00:00
520771d7b3
refactor lazy init to device-agnostic ( #118846 )
...
# Motivation
This PR extends `cuda_lazy_init` to `device_lazy_init`, a device-agnostic API that can support any backend, and changes `maybe_initialize_cuda` to `maybe_initialize_device` to support lazy initialization for CUDA while maintaining scalability.
# Design
We maintain a flag for each backend to manage the lazy initialization state separately.
# Additional Context
No additional UTs are needed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118846
Approved by: https://github.com/malfet
2024-02-02 12:10:39 +00:00
1562dae62c
[BE]: Apply RUF025 dict.fromkeys preview rule ( #118637 )
...
Simplifies and optimizes dict construction using the `fromkeys` classmethod ctor. This also makes it really obvious when all the keys will have the same static value, which could be a bug if unintentional. It is also significantly faster than using a dict comprehension. The rule is in preview, but I am adding a forward fix for when it becomes stable.
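A small sketch of the rewrite this rule suggests, together with the shared-value pitfall mentioned above. The key names are made up for illustration.

```python
keys = ["cpu", "cuda", "xpu"]

by_comprehension = {k: 0 for k in keys}  # constant-value comprehension
by_fromkeys = dict.fromkeys(keys, 0)     # equivalent, and states the intent

# Caveat: fromkeys stores the *same* object under every key.
shared = dict.fromkeys(keys, [])
shared["cpu"].append(1)

# A comprehension is still the right tool when each key needs its own object.
separate = {k: [] for k in keys}
separate["cpu"].append(1)
```

With an immutable value such as `0` the two forms behave identically; with a mutable value, `fromkeys` aliases one object across all keys, which is the potential bug the rule makes visible.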
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118637
Approved by: https://github.com/albanD
2024-01-30 20:46:54 +00:00
46712b019d
Enable local_partial_types ( #118467 )
...
When using dmypy, this setting is enabled and cannot be turned off. Force it for regular mypy too.
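As a rough illustration of what this setting affects, the sketch below follows the pattern described in the mypy documentation for `local_partial_types`: a class-level attribute initialized to `None` and assigned later in another scope needs an explicit annotation. The `Cache` class is a made-up example.

```python
from typing import Optional


class Cache:
    # With local_partial_types enabled, mypy does not complete the type of a
    # bare `value = None` from the assignment inside __init__, so the
    # attribute is annotated explicitly here.
    value: Optional[int] = None

    def __init__(self) -> None:
        self.value = 1


cache = Cache()
```

The runtime behavior is unchanged; only the annotation requirement differs between the two mypy modes.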
Signed-off-by: Edward Z. Yang <ezyang@meta.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118467
Approved by: https://github.com/Skylion007
ghstack dependencies: #118414 , #118418 , #118432
2024-01-28 13:38:22 +00:00
24133e44b1
Fix return type hint for list types ( #118238 )
...
All single element list types are `Tensor[]` so they will always be Tuple.
I don't know of any way to easily access the pyi type and compare that to a real run so no testing here :(
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118238
Approved by: https://github.com/ezyang
2024-01-25 23:35:20 +00:00
16d69290c6
Use view name instead of view_copy name for functional inverses ( #117056 )
...
Ex: `unsqueeze_copy_inverse()` -> `unsqueeze_inverse()`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117056
Approved by: https://github.com/bdhirsh
2024-01-10 00:52:36 +00:00
52f0457d7d
Support view returns for functional inverses on narrowing views ( #115893 )
...
Part 1 of implementation for general [subclass view fake-ification](https://docs.google.com/document/d/1C5taWiplmX7nKiURXDOAZG2W5VNJ2iV0fQFq92H0Cxw ).
The following functional inverses are currently implemented scatter-style and thus never return views:
* `as_strided_copy_inverse()`
* `diagonal_copy_inverse()`
* `expand_copy_inverse()`
* `select_copy_int_inverse()`
* `slice_copy_Tensor_inverse()`
* `split_copy_Tensor_inverse()`
* `split_with_sizes_copy_inverse()`
* `unbind_copy_int_inverse()`
* `unfold_copy_inverse()`
We need to get actual views for the introduction of reverse view funcs coming next.
Details:
* Use `as_strided()` to implement actual view inverses for the above
* Assumes we're given a mutated_view that is actually part of a bigger storage; this isn't really the case for functionalization
* Introduce `InverseReturnMode` enum for customization of functional inverses
* `AlwaysView` - always return an actual view; needed for reverse view_funcs()
* `NeverView` - always do a copy; useful for certain functionalization use cases (e.g. XLA, executorch)
* `ViewOrScatterInverse` - return an actual view in most cases, but prefer scatter inverses when they exist. this avoids the need to implement `as_strided()` for subclasses, which can be difficult or impossible
* Make sure functionalization works as before
* Use `ViewOrScatterInverse` when reapply_views TLS is True or `NeverView` otherwise
* Adds tests to ensure old behavior for above inverses **in functionalization**
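The mode selection described in the list above can be sketched as follows. This is a simplified Python model, assuming the three modes behave exactly as the bullets state; the real enum lives in C++, and both helper functions here are hypothetical.

```python
from enum import Enum, auto


class InverseReturnMode(Enum):
    ALWAYS_VIEW = auto()              # always return an actual view
    NEVER_VIEW = auto()               # always copy
    VIEW_OR_SCATTER_INVERSE = auto()  # view, unless a scatter inverse exists


def mode_for_functionalization(reapply_views: bool) -> InverseReturnMode:
    """Pick the mode functionalization uses, per the description above."""
    if reapply_views:
        return InverseReturnMode.VIEW_OR_SCATTER_INVERSE
    return InverseReturnMode.NEVER_VIEW


def returns_view(mode: InverseReturnMode, has_scatter_inverse: bool) -> bool:
    """Whether an inverse called under `mode` would hand back a view."""
    if mode is InverseReturnMode.ALWAYS_VIEW:
        return True
    if mode is InverseReturnMode.NEVER_VIEW:
        return False
    return not has_scatter_inverse
```

Under this model, an op with a scatter-style inverse keeps its old copying behavior in functionalization, while `ALWAYS_VIEW` is reserved for callers that need a real view.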
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115893
Approved by: https://github.com/bdhirsh
2023-12-21 21:39:22 +00:00
ee5d981249
[BE]: Enable RUFF PERF402 and apply fixes ( #115505 )
...
* Enable PERF402. Makes code more efficient and succinct by removing useless list copies that could be accomplished either via a list constructor or extend call. All test cases have noqa added since performance is not as sensitive in that folder.
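For reference, a minimal before/after of the pattern this rule targets. The variable names are illustrative only.

```python
source = [3, 1, 2]

# Element-by-element copy in a loop: the pattern PERF402 reports.
copied_slowly = []
for item in source:
    copied_slowly.append(item)

# Equivalent forms using the list constructor or extend().
copied = list(source)

extended = [0]
extended.extend(source)
```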
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115505
Approved by: https://github.com/malfet
2023-12-20 18:01:24 +00:00
99554112d3
[pytorch] add namespace for optTypeMetaToScalarType in codegen to avoid not declared when compile ( #115623 )
...
Fixes a compilation failure in some environments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115623
Approved by: https://github.com/albanD
2023-12-13 00:59:01 +00:00
7fc292930c
Add support for torch.Generator type in TorchScript ( #110413 )
...
- Add support for `torch.Generator` type in TorchScript
- Add `generator` args to all `torch.nn.init` functions that call `uniform_` or `normal_`
- Add support for `torch.Generator` in LTC's TorchScript backend (CC: @wconstab)
CC: @eellison @davidberard98 @GlebKazantaev @behzad-a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110413
Approved by: https://github.com/wconstab , https://github.com/albanD , https://github.com/glebk-cerebras , https://github.com/davidberard98
2023-11-21 23:07:21 +00:00
8c4812be80
Replace expect_int with guard_int ( #113921 )
...
The idea is that instead of erroring, we will just specialize at these sites.
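The difference between erroring and specializing can be shown with a toy model. `ToySymInt` is entirely hypothetical and is not the `c10::SymInt` implementation; it only mimics the two behaviors named in the commit title.

```python
class ToySymInt:
    """Toy stand-in for a possibly symbolic integer (hypothetical)."""

    def __init__(self, hint, symbolic):
        self.hint = hint          # the concrete value seen while tracing
        self.symbolic = symbolic  # True if the value is not a plain int
        self.guards = []          # conditions the traced program relies on

    def expect_int(self):
        # Old behavior: refuse to proceed when the value is symbolic.
        if self.symbolic:
            raise RuntimeError("expected a concrete int")
        return self.hint

    def guard_int(self):
        # New behavior: specialize on the hint and record a guard.
        if self.symbolic:
            self.guards.append(f"value == {self.hint}")
        return self.hint


size = ToySymInt(hint=8, symbolic=True)
specialized = size.guard_int()
```

In this model `expect_int()` would raise for `size`, whereas `guard_int()` returns the hint and leaves a guard behind that records the specialization.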
Fixes https://github.com/pytorch/pytorch/issues/113142
Signed-off-by: Edward Z. Yang <ezyang@meta.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113921
Approved by: https://github.com/zou3519
2023-11-20 21:27:48 +00:00
dbb96ef30d
improve annotation device parameters where a device ordinal is allowed ( #113647 )
...
Using mypy in code that depends on pytorch, I noticed that the type annotation doesn't allow a device ordinal.
`error: Argument "device" to "to_empty" of "Module" has incompatible type "int"; expected "str | device" [arg-type]`
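A sketch of the kind of annotation that accepts a device ordinal, written without importing torch. `Device` is a stand-in class, and `DeviceLike`, `normalize_device`, and the `default_type` parameter are hypothetical names used only for illustration.

```python
from typing import Union


class Device:
    """Minimal stand-in for torch.device (hypothetical)."""

    def __init__(self, type_: str, index: int = 0) -> None:
        self.type = type_
        self.index = index


# Accepting `int` lets callers pass a bare device ordinal.
DeviceLike = Union[str, Device, int]


def normalize_device(device: DeviceLike, default_type: str = "cuda") -> Device:
    """Turn any accepted form into a Device; default_type is an assumption."""
    if isinstance(device, Device):
        return device
    if isinstance(device, int):
        return Device(default_type, device)
    type_, _, index = device.partition(":")
    return Device(type_, int(index) if index else 0)


from_ordinal = normalize_device(1)
from_string = normalize_device("cpu")
```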
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113647
Approved by: https://github.com/albanD
2023-11-17 14:41:22 +00:00
deec2380c7
Add 0dim Tensor overload for _foreach_div ( #113688 )
...
This PR is ALMOST basically just following the steps from #106677 EXCEPT! We do add one feature. Similar to fused_adam(w), for the CUDA dispatches: when the scalar tensor is on CPU, we .item and redispatch to the normal scalar overload. Otherwise, the cuda kernel will complain about mismatch in devices between the scalar and the tensors.
Why do we add this feature? Our optimizers want to allow lr as a tensor, and lr could be a CPU tensor. lr is used with foreach_div_ in Adam, so our CI will break otherwise.
After this PR, `_foreach_mul` and `_foreach_div` will accept either a CPU or a GPU tensor for the scalar tensor (vs only a GPU tensor). They join the ranks of `fused_adam(w)` in this characteristic. I did not yet do the same thing for foreach_add (the only other foreach op with a .Tensor overload) because there is no use case and will be more involved.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113688
Approved by: https://github.com/mlazos , https://github.com/albanD
2023-11-15 20:59:32 +00:00
6c187246d6
Add support for float8_e4m3fnuz and _e5m2fnuz ( #107586 )
...
This PR relates to the feature in [this feature submission](https://docs.google.com/document/d/1pF2T1xz54IPg1jG7FhykbrpbcJZVelQw0v8vBaoLkfs/edit ). It has been based on #104242 which adds similar float8 types.
These new types added in this PR are described in the paper at https://arxiv.org/abs/2206.02915 . A brief description and comparison of the types with other float8 types can be also found in the [OpenXLA RFC](https://github.com/openxla/stablehlo/blob/main/rfcs/20230321-fp8_fnuz.md ).
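For orientation, here is a small decoder sketch for the two `fnuz` layouts. It assumes the encoding described in the linked RFC: no infinities, no negative zero, the bit pattern `0x80` reserved for NaN, and exponent biases of 8 (e4m3fnuz) and 16 (e5m2fnuz). `decode_fnuz` is a hypothetical helper, not a PyTorch function.

```python
import math


def decode_fnuz(bits: int, exp_bits: int, man_bits: int) -> float:
    """Decode one byte of a float8 fnuz format (hypothetical helper)."""
    if bits == 0x80:  # the would-be negative zero encodes NaN
        return math.nan
    bias = 1 << (exp_bits - 1)  # 8 for e4m3fnuz, 16 for e5m2fnuz
    sign = -1.0 if bits & 0x80 else 1.0
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    if exp == 0:  # zero and subnormals
        return sign * man * 2.0 ** (1 - bias - man_bits)
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


max_e4m3fnuz = decode_fnuz(0x7F, exp_bits=4, man_bits=3)
max_e5m2fnuz = decode_fnuz(0x7F, exp_bits=5, man_bits=2)
```

Since no exponent value is reserved for infinities or NaNs, the all-ones pattern `0x7F` decodes to the largest finite value in both formats.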
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107586
Approved by: https://github.com/seemethere , https://github.com/malfet
2023-11-15 15:01:11 +00:00
252e68a83b
Revert "Add support for torch.Generator type in TorchScript ( #110413 )"
...
This reverts commit 54493fe8c4b1cca4c5ff993b99eb3e3dbc984226.
Reverted https://github.com/pytorch/pytorch/pull/110413 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is, unfortunately, still breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/110413#issuecomment-1811625557 ))
2023-11-15 00:51:23 +00:00
54493fe8c4
Add support for torch.Generator type in TorchScript ( #110413 )
...
- Add support for `torch.Generator` type in TorchScript
- Add `generator` args to all `torch.nn.init` functions that call `uniform_` or `normal_`
- Add support for `torch.Generator` in LTC's TorchScript backend (CC: @wconstab)
CC: @eellison @davidberard98 @GlebKazantaev @behzad-a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110413
Approved by: https://github.com/wconstab , https://github.com/albanD , https://github.com/glebk-cerebras , https://github.com/davidberard98
2023-11-13 23:18:14 +00:00
9a28a7b498
Revert "Add support for torch.Generator type in TorchScript ( #110413 )"
...
This reverts commit 27e31ab6e86259b27d816d6fb6e7a69de526a0e4.
Reverted https://github.com/pytorch/pytorch/pull/110413 on behalf of https://github.com/PaliC due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/110413#issuecomment-1799003164 ))
2023-11-07 15:53:32 +00:00
27e31ab6e8
Add support for torch.Generator type in TorchScript ( #110413 )
...
- Add support for `torch.Generator` type in TorchScript
- Add `generator` args to all `torch.nn.init` functions that call `uniform_` or `normal_`
- Add support for `torch.Generator` in LTC's TorchScript backend (CC: @wconstab)
CC: @eellison @davidberard98 @GlebKazantaev @behzad-a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110413
Approved by: https://github.com/wconstab , https://github.com/albanD , https://github.com/glebk-cerebras , https://github.com/davidberard98
2023-11-06 21:27:02 +00:00
19e9f5cc7b
[torchgen] Add support for optional tensor ( #112938 )
...
Summary: As titled
Test Plan: rely on CI
Differential Revision: D50997957
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112938
Approved by: https://github.com/Skylion007
2023-11-06 20:03:05 +00:00
1ad0f0b308
[BE]: remove unnecessary enumerate calls ( #111690 )
...
Remove unnecessary enumerate calls, entirely automated fixes so probably reasonably low risk.
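A minimal example of the kind of change involved, using made-up variable names: when the index produced by `enumerate` is never read, the call can be dropped.

```python
sizes = [4, 8, 15]

# Before: the index from enumerate() is never used.
total_before = 0
for _index, size in enumerate(sizes):
    total_before += size

# After: iterate over the values directly.
total_after = 0
for size in sizes:
    total_after += size
```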
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111690
Approved by: https://github.com/malfet
2023-10-20 23:20:29 +00:00
ca7d084ff9
Add ScalarTensor or 0dim overload for _foreach_add ( #111079 )
...
Adding a Tensor overload will allow us to:
- optimize in more cases than before
- increase coverage for scalarTensor instead of just scalars in our foreach APIs
The main complication in this PR was that add.Tensor has a scalar overload, so I've now built out support for that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111079
Approved by: https://github.com/albanD
2023-10-20 01:34:07 +00:00
ac48c11ab7
Fix typo under torchgen directory ( #111154 )
...
This PR fixes typos in comments and messages in files under the `torchgen` directory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111154
Approved by: https://github.com/rajveer43 , https://github.com/Skylion007
2023-10-13 16:43:46 +00:00
dede1e96e2
[BE] Enable Ruff's Flake8 PYI018 ( #111101 )
...
Enable [unused-private-type-var (PYI018)](https://docs.astral.sh/ruff/rules/unused-private-type-var/#unused-private-type-var-pyi018 )
Link: #110950
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111101
Approved by: https://github.com/albanD
2023-10-12 16:26:21 +00:00
6a974bec5d
Change flash attention outputs to be SymInt instead of int ( #110533 )
...
Fixes https://github.com/pytorch/pytorch/issues/110322
Signed-off-by: Edward Z. Yang <ezyang@meta.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110533
Approved by: https://github.com/albanD
2023-10-05 01:00:07 +00:00
053367b1ed
fix: flake8-bugbear code B024 ( #107265 )
...
See #106571 item B024
This fix concerns the addition of `abstractmethod` to methods declared inside abstract classes.
Should I also include PEP8 compliant reformatting on the files I had to modify ?
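To show what B024 is about, here is a short sketch with hypothetical class names: a class deriving from `ABC` only refuses instantiation once at least one method is marked with `abstractmethod`.

```python
from abc import ABC, abstractmethod


class Base(ABC):
    @abstractmethod
    def run(self) -> str:
        """Subclasses must provide an implementation."""


class Impl(Base):
    def run(self) -> str:
        return "ok"


try:
    Base()  # abstract method present, so this raises TypeError
except TypeError:
    base_instantiable = False
else:
    base_instantiable = True

result = Impl().run()
```

Without the decorator, `Base()` would succeed silently, which is the situation the lint rule flags.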
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107265
Approved by: https://github.com/kit1980
2023-10-04 23:52:52 +00:00
d9fb7166d6
[BE] use DeviceIndex instead of int64_t for related device interfaces ( #103068 )
...
This PR unifies the device interfaces in aten/*cpp and torch/csrc/*cpp to use **c10::DeviceIndex**.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103068
Approved by: https://github.com/malfet
2023-08-25 20:16:14 +00:00
5814380e7b
Revert "Revert "Reland "Add forward mode AD to out-place foreach functions ( #102409 ) ( #106043 )""" ( #106320 )
...
Fixes a typo in the number of tensors and elements specified in the test that failed under slow gradcheck.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106320
Approved by: https://github.com/soulitzer
2023-08-18 23:01:42 +00:00
2b427ae3a7
Revert "Reland "Add forward mode AD to out-place foreach functions ( #102409 ) ( #106043 )"
...
This reverts commit e773f28ee307e2a246a4b765f3a51117661b45ba.
Reverted https://github.com/pytorch/pytorch/pull/106043 on behalf of https://github.com/DanilBaibak due to Break slow tests ([comment](https://github.com/pytorch/pytorch/pull/106043#issuecomment-1658642734 ))
2023-07-31 15:50:36 +00:00
e773f28ee3
Reland "Add forward mode AD to out-place foreach functions ( #102409 ) ( #106043 )
...
forward-mode AD of out-of-place foreach functions, finally.
rel:
- #102409
- #105504
- #58833
- #100695
---
# Generated Foreach
```c++
::std::vector<at::Tensor> _foreach_sinh(c10::DispatchKeySet ks, at::TensorList self) {
auto self_ = unpack(self, "self", 0);
[[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );
std::vector<bool> _any_has_forward_grad_result(self.size());
for (const auto& i : c10::irange(self.size())) {
_any_has_forward_grad_result[i] = isFwGradDefined(self[i]);
}
std::shared_ptr<ForeachSinhBackward0> grad_fn;
if (_any_requires_grad) {
grad_fn = std::shared_ptr<ForeachSinhBackward0>(new ForeachSinhBackward0(), deleteNode);
grad_fn->set_next_edges(collect_next_edges( self ));
grad_fn->self_ = make_saved_variable_list(self);
grad_fn->self_size_ = self.size();
}
#ifndef NDEBUG
std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
for (const Tensor& tensor : self_)
self__storage_saved.push_back(
tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
for (size_t i=0; i<self_.size(); i++)
if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
#endif
auto _tmp = ([&]() {
at::AutoDispatchBelowADInplaceOrView guard;
return at::redispatch::_foreach_sinh(ks & c10::after_autograd_keyset, self_);
})();
auto result = std::move(_tmp);
#ifndef NDEBUG
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
}
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
}
#endif
if (grad_fn) {
set_history(flatten_tensor_args( result ), grad_fn);
}
std::vector<c10::optional<at::Tensor>> result_new_fw_grad_opts(self.size(), c10::nullopt);
for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
if (_any_has_forward_grad_result[i]) {
auto self_t_raw = toNonOptFwGrad(self[i]);
auto self_tensor = toNonOptTensor(self[i]);
auto self_t = (self_t_raw.defined() || !self_tensor.defined())
? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
auto self_p = toNonOptPrimal(self[i]);
result_new_fw_grad_opts[i] = (self_t.conj() * self_p.cosh().conj()).conj();
}
}
for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
auto& result_new_fw_grad_opt = result_new_fw_grad_opts[i];
if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result[i].defined()) {
// The hardcoded 0 here will need to be updated once we support multiple levels.
result[i]._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
}
}
return result;
}
::std::vector<at::Tensor> _foreach_norm_Scalar(c10::DispatchKeySet ks, at::TensorList self, const at::Scalar & ord) {
auto self_ = unpack(self, "self", 0);
[[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );
std::vector<bool> _any_has_forward_grad_result(self.size());
for (const auto& i : c10::irange(self.size())) {
_any_has_forward_grad_result[i] = isFwGradDefined(self[i]);
}
std::shared_ptr<ForeachNormBackward0> grad_fn;
if (_any_requires_grad) {
grad_fn = std::shared_ptr<ForeachNormBackward0>(new ForeachNormBackward0(), deleteNode);
grad_fn->set_next_edges(collect_next_edges( self ));
grad_fn->ord = ord;
grad_fn->self_ = make_saved_variable_list(self);
grad_fn->self_size_ = self.size();
}
#ifndef NDEBUG
std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
for (const Tensor& tensor : self_)
self__storage_saved.push_back(
tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
for (size_t i=0; i<self_.size(); i++)
if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
#endif
auto _tmp = ([&]() {
at::AutoDispatchBelowADInplaceOrView guard;
return at::redispatch::_foreach_norm(ks & c10::after_autograd_keyset, self_, ord);
})();
auto result = std::move(_tmp);
#ifndef NDEBUG
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
}
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
}
#endif
if (grad_fn) {
set_history(flatten_tensor_args( result ), grad_fn);
}
std::vector<c10::optional<at::Tensor>> result_new_fw_grad_opts(self.size(), c10::nullopt);
for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
if (_any_has_forward_grad_result[i]) {
auto self_t_raw = toNonOptFwGrad(self[i]);
auto self_tensor = toNonOptTensor(self[i]);
auto self_t = (self_t_raw.defined() || !self_tensor.defined())
? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
auto self_p = toNonOptPrimal(self[i]);
result_new_fw_grad_opts[i] = norm_jvp(self_p, self_t, ord, result[i]);
}
}
for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
auto& result_new_fw_grad_opt = result_new_fw_grad_opts[i];
if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result[i].defined()) {
// The hardcoded 0 here will need to be updated once we support multiple levels.
result[i]._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
}
}
if (grad_fn) {
grad_fn->result = result;
}
return result;
}
```
# Reference
```c++
at::Tensor sinh(c10::DispatchKeySet ks, const at::Tensor & self) {
auto& self_ = unpack(self, "self", 0);
[[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );
[[maybe_unused]] auto _any_has_forward_grad_result = (isFwGradDefined(self));
std::shared_ptr<SinhBackward0> grad_fn;
if (_any_requires_grad) {
grad_fn = std::shared_ptr<SinhBackward0>(new SinhBackward0(), deleteNode);
grad_fn->set_next_edges(collect_next_edges( self ));
grad_fn->self_ = SavedVariable(self, false);
}
#ifndef NDEBUG
c10::optional<Storage> self__storage_saved =
self_.has_storage() ? c10::optional<Storage>(self_.storage()) : c10::nullopt;
c10::intrusive_ptr<TensorImpl> self__impl_saved;
if (self_.defined()) self__impl_saved = self_.getIntrusivePtr();
#endif
auto _tmp = ([&]() {
at::AutoDispatchBelowADInplaceOrView guard;
return at::redispatch::sinh(ks & c10::after_autograd_keyset, self_);
})();
auto result = std::move(_tmp);
#ifndef NDEBUG
if (self__storage_saved.has_value() &&
!at::impl::dispatch_mode_enabled() &&
!at::impl::tensor_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__storage_saved.value().is_alias_of(self_.storage()));
if (self__impl_saved && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__impl_saved == self_.getIntrusivePtr());
if (result.has_storage() && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result)) {
TORCH_INTERNAL_ASSERT(result.storage().use_count() == 1, "function: sinh");
}
if (!at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result))
TORCH_INTERNAL_ASSERT(result.use_count() <= 1, "function: sinh");
#endif
if (grad_fn) {
set_history(flatten_tensor_args( result ), grad_fn);
}
c10::optional<at::Tensor> result_new_fw_grad_opt = c10::nullopt;
if (_any_has_forward_grad_result && (result.defined())) {
auto self_t_raw = toNonOptFwGrad(self);
auto self_tensor = toNonOptTensor(self);
auto self_t = (self_t_raw.defined() || !self_tensor.defined())
? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
auto self_p = toNonOptPrimal(self);
result_new_fw_grad_opt = (self_t.conj() * self_p.cosh().conj()).conj();
}
if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result.defined()) {
// The hardcoded 0 here will need to be updated once we support multiple levels.
result._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
}
return result;
}
at::Tensor norm_Scalar(c10::DispatchKeySet ks, const at::Tensor & self, const at::Scalar & p) {
auto& self_ = unpack(self, "self", 0);
[[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );
[[maybe_unused]] auto _any_has_forward_grad_result = (isFwGradDefined(self));
std::shared_ptr<NormBackward0> grad_fn;
if (_any_requires_grad) {
grad_fn = std::shared_ptr<NormBackward0>(new NormBackward0(), deleteNode);
grad_fn->set_next_edges(collect_next_edges( self ));
grad_fn->p = p;
grad_fn->self_ = SavedVariable(self, false);
}
#ifndef NDEBUG
c10::optional<Storage> self__storage_saved =
self_.has_storage() ? c10::optional<Storage>(self_.storage()) : c10::nullopt;
c10::intrusive_ptr<TensorImpl> self__impl_saved;
if (self_.defined()) self__impl_saved = self_.getIntrusivePtr();
#endif
auto _tmp = ([&]() {
at::AutoDispatchBelowADInplaceOrView guard;
return at::redispatch::norm(ks & c10::after_autograd_keyset, self_, p);
})();
auto result = std::move(_tmp);
#ifndef NDEBUG
if (self__storage_saved.has_value() &&
!at::impl::dispatch_mode_enabled() &&
!at::impl::tensor_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__storage_saved.value().is_alias_of(self_.storage()));
if (self__impl_saved && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__impl_saved == self_.getIntrusivePtr());
if (result.has_storage() && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result)) {
TORCH_INTERNAL_ASSERT(result.storage().use_count() == 1, "function: norm_Scalar");
}
if (!at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result))
TORCH_INTERNAL_ASSERT(result.use_count() <= 1, "function: norm_Scalar");
#endif
if (grad_fn) {
set_history(flatten_tensor_args( result ), grad_fn);
}
throw_error_for_complex_autograd(result, "norm");
c10::optional<at::Tensor> result_new_fw_grad_opt = c10::nullopt;
if (_any_has_forward_grad_result && (result.defined())) {
auto self_t_raw = toNonOptFwGrad(self);
auto self_tensor = toNonOptTensor(self);
auto self_t = (self_t_raw.defined() || !self_tensor.defined())
? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
auto self_p = toNonOptPrimal(self);
result_new_fw_grad_opt = norm_jvp(self_p, self_t, p, result);
}
if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result.defined()) {
// The hardcoded 0 here will need to be updated once we support multiple levels.
result._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
}
if (grad_fn) {
grad_fn->result_ = SavedVariable(result, true);
}
return result;
}
```
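The forward-gradient formula used for `sinh` in the code above, `self_t * cosh(self_p)`, can be sanity-checked on plain real floats, where the `conj()` calls are no-ops. The sketch below is independent of PyTorch; `sinh_jvp` is a hypothetical helper that compares the formula with a central finite difference.

```python
import math


def sinh_jvp(primal: float, tangent: float):
    """Return sinh(primal) and its Jacobian-vector product with tangent."""
    return math.sinh(primal), tangent * math.cosh(primal)


x, t = 0.5, 2.0
value, jvp = sinh_jvp(x, t)

# Central finite difference along the tangent direction as a reference.
eps = 1e-6
finite_diff = (math.sinh(x + eps * t) - math.sinh(x - eps * t)) / (2 * eps)
```

The foreach variant applies the same formula once per tensor in the list, which is why the generated code loops over `result_new_fw_grad_opts`.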
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106043
Approved by: https://github.com/soulitzer
2023-07-27 03:13:24 +00:00
70b0f1b248
fix some typos ( #106018 )
...
Fix typos in `test_static_module.cc`, `backend_cutting_test.cc` and `types_base.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106018
Approved by: https://github.com/awgu
2023-07-26 18:14:44 +00:00
4cc1745b13
[BE] f-stringify torch/ and scripts ( #105538 )
...
This PR is a follow-up to the pyupgrade series that converts more strings to f-strings using `flynt`.
- https://docs.python.org/3/reference/lexical_analysis.html#f-strings
- https://pypi.org/project/flynt/
Command used:
```
flynt torch/ -ll 120
flynt scripts/ -ll 120
flynt tools/ -ll 120
```
and excluded `collect_env.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105538
Approved by: https://github.com/ezyang , https://github.com/malfet
2023-07-21 19:35:24 +00:00
b64bd4a5dd
Add torch.float8_e5m2 and torch.float8_e4m3 data types ( #104242 )
...
Proposal of two float8 variants - e5m2 and e4m3 - based on https://arxiv.org/pdf/2209.05433.pdf
Hide all Float8 operator implementations behind an `#if !defined(C10_MOBILE)` guard to keep the Android build size almost unchanged
TODO:
- Refactor duplicated code
- Clean up unbalanced pragma pop in dtype utils
- Add native implementation on the CUDA side
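To make the two bit layouts concrete, here is a minimal standalone sketch of a decoder, assuming the encodings described in the linked paper: e5m2 has 5 exponent bits, 2 mantissa bits, and IEEE-style inf/NaN, while e4m3 has 4 exponent bits, 3 mantissa bits, no infinities, and a single NaN mantissa pattern. `decode_fp8` is a hypothetical helper written for illustration; it is not the implementation added by this PR.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: decode one 8-bit float into a 32-bit float.
// ieee_specials == true : an all-ones exponent encodes inf/NaN (e5m2 in the paper).
// ieee_specials == false: only exponent and mantissa both all-ones is NaN,
//                         and there is no infinity (e4m3 in the paper).
inline float decode_fp8(std::uint8_t bits, int exp_bits, int man_bits, bool ieee_specials) {
  assert(exp_bits + man_bits == 7);  // one bit is reserved for the sign
  const int bias = (1 << (exp_bits - 1)) - 1;
  const int exp_mask = (1 << exp_bits) - 1;
  const int man_mask = (1 << man_bits) - 1;
  const float sign = (bits & 0x80) ? -1.0f : 1.0f;
  const int exp = (bits >> man_bits) & exp_mask;
  const int man = bits & man_mask;

  if (exp == exp_mask) {
    if (ieee_specials) {
      return man == 0 ? sign * std::numeric_limits<float>::infinity()
                      : std::numeric_limits<float>::quiet_NaN();
    }
    if (man == man_mask) {
      return std::numeric_limits<float>::quiet_NaN();
    }
    // Otherwise fall through: in e4m3 the top exponent still holds normal values.
  }

  const float frac = static_cast<float>(man) / static_cast<float>(1 << man_bits);
  if (exp == 0) {
    return sign * std::ldexp(frac, 1 - bias);  // subnormal: no implicit leading 1
  }
  return sign * std::ldexp(1.0f + frac, exp - bias);
}
```

Under these assumptions, `decode_fp8(0x7B, 5, 2, true)` yields the largest finite e5m2 value (57344) and `decode_fp8(0x7E, 4, 3, false)` yields the largest e4m3 value (448).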
Co-authored-by: Nikita Shulga <nshulga@meta.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104242
Approved by: https://github.com/albanD
2023-07-20 16:09:11 +00:00
f2b15772ff
Revert "Add torch.float8_e5m2 and torch.float8_e4m3 data types ( #104242 )"
...
This reverts commit a9804130e5a9a982d82934fa9702abd08d6903ce.
Reverted https://github.com/pytorch/pytorch/pull/104242 on behalf of https://github.com/PaliC due to breaks lint (run lintrunner and remerge) ([comment](https://github.com/pytorch/pytorch/pull/104242#issuecomment-1644150284 ))
2023-07-20 15:37:53 +00:00
a9804130e5
Add torch.float8_e5m2 and torch.float8_e4m3 data types ( #104242 )
...
Proposal of two float8 variants - e5m2 and e4m3 - based on https://arxiv.org/pdf/2209.05433.pdf
Hide all Float8 operator implementations behind an `#if !defined(C10_MOBILE)` guard to keep the Android build size almost unchanged
TODO:
- Refactor duplicated code
- Clean up unbalanced pragma pop in dtype utils
- Add native implementation on the CUDA side
Co-authored-by: Nikita Shulga <nshulga@meta.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104242
Approved by: https://github.com/albanD
2023-07-20 09:45:45 +00:00
964d29f312
[BE] Enable ruff's UP rules and autoformat torchgen/ ( #105423 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105423
Approved by: https://github.com/Skylion007
2023-07-18 06:44:20 +00:00
8958f041be
Revert "Add forward mode AD to out-place foreach functions ( #102409 )"
...
This reverts commit e2ec0ba404f9fbd3c215cad4cabd7383c692cb33.
Reverted https://github.com/pytorch/pytorch/pull/102409 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but it is failing some tests in trunk e799f565eb
([comment](https://github.com/pytorch/pytorch/pull/102409#issuecomment-1615254393 ))
2023-06-30 22:46:57 +00:00
e2ec0ba404
Add forward mode AD to out-place foreach functions ( #102409 )
...
The major difference from the in-place support is that some out-place functions have their derivatives spelled out in derivatives.yaml. This requires changes in `load_derivatives.py` and special handling in various places, because the derivatives of the remaining functions are generated by `torchgen`.
rel:
- #58833
- #100695
---
# Generated Foreach
```c++
::std::vector<at::Tensor> _foreach_sinh(c10::DispatchKeySet ks, at::TensorList self) {
auto self_ = unpack(self, "self", 0);
[[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );
std::vector<bool> _any_has_forward_grad_result(self.size());
for (const auto& i : c10::irange(self.size())) {
_any_has_forward_grad_result[i] = isFwGradDefined(self[i]);
}
std::shared_ptr<ForeachSinhBackward0> grad_fn;
if (_any_requires_grad) {
grad_fn = std::shared_ptr<ForeachSinhBackward0>(new ForeachSinhBackward0(), deleteNode);
grad_fn->set_next_edges(collect_next_edges( self ));
grad_fn->self_ = make_saved_variable_list(self);
grad_fn->self_size_ = self.size();
}
#ifndef NDEBUG
std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
for (const Tensor& tensor : self_)
self__storage_saved.push_back(
tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
for (size_t i=0; i<self_.size(); i++)
if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
#endif
auto _tmp = ([&]() {
at::AutoDispatchBelowADInplaceOrView guard;
return at::redispatch::_foreach_sinh(ks & c10::after_autograd_keyset, self_);
})();
auto result = std::move(_tmp);
#ifndef NDEBUG
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
}
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
}
#endif
if (grad_fn) {
set_history(flatten_tensor_args( result ), grad_fn);
}
std::vector<c10::optional<at::Tensor>> result_new_fw_grad_opts(self.size(), c10::nullopt);
for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
if (_any_has_forward_grad_result[i]) {
auto self_t_raw = toNonOptFwGrad(self[i]);
auto self_tensor = toNonOptTensor(self[i]);
auto self_t = (self_t_raw.defined() || !self_tensor.defined())
? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
auto self_p = toNonOptPrimal(self[i]);
result_new_fw_grad_opts[i] = (self_t.conj() * self_p.cosh().conj()).conj();
}
}
for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
auto& result_new_fw_grad_opt = result_new_fw_grad_opts[i];
if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result[i].defined()) {
// The hardcoded 0 here will need to be updated once we support multiple levels.
result[i]._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
}
}
return result;
}
::std::vector<at::Tensor> _foreach_norm_Scalar(c10::DispatchKeySet ks, at::TensorList self, const at::Scalar & ord) {
auto self_ = unpack(self, "self", 0);
[[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );
std::vector<bool> _any_has_forward_grad_result(self.size());
for (const auto& i : c10::irange(self.size())) {
_any_has_forward_grad_result[i] = isFwGradDefined(self[i]);
}
std::shared_ptr<ForeachNormBackward0> grad_fn;
if (_any_requires_grad) {
grad_fn = std::shared_ptr<ForeachNormBackward0>(new ForeachNormBackward0(), deleteNode);
grad_fn->set_next_edges(collect_next_edges( self ));
grad_fn->ord = ord;
grad_fn->self_ = make_saved_variable_list(self);
grad_fn->self_size_ = self.size();
}
#ifndef NDEBUG
std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
for (const Tensor& tensor : self_)
self__storage_saved.push_back(
tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
for (size_t i=0; i<self_.size(); i++)
if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
#endif
auto _tmp = ([&]() {
at::AutoDispatchBelowADInplaceOrView guard;
return at::redispatch::_foreach_norm(ks & c10::after_autograd_keyset, self_, ord);
})();
auto result = std::move(_tmp);
#ifndef NDEBUG
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
}
for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
}
#endif
if (grad_fn) {
set_history(flatten_tensor_args( result ), grad_fn);
}
std::vector<c10::optional<at::Tensor>> result_new_fw_grad_opts(self.size(), c10::nullopt);
for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
if (_any_has_forward_grad_result[i]) {
auto self_t_raw = toNonOptFwGrad(self[i]);
auto self_tensor = toNonOptTensor(self[i]);
auto self_t = (self_t_raw.defined() || !self_tensor.defined())
? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
auto self_p = toNonOptPrimal(self[i]);
result_new_fw_grad_opts[i] = norm_jvp(self_p, self_t, ord, result[i]);
}
}
for (const auto& i : c10::irange(result_new_fw_grad_opts.size())) {
auto& result_new_fw_grad_opt = result_new_fw_grad_opts[i];
if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result[i].defined()) {
// The hardcoded 0 here will need to be updated once we support multiple levels.
result[i]._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
}
}
if (grad_fn) {
grad_fn->result = result;
}
return result;
}
```
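One detail in the debug-only (`#ifndef NDEBUG`) section of the listing above: `self__storage_saved` is constructed with `self_.size()` elements and then filled with `push_back`. The standalone sketch below, which uses plain `int` values in place of `Storage` and hypothetical function names, shows what that pattern does with a `std::vector`: the saved values land after the size-constructed empty slots, so a later `saved[i].has_value()` check for `i < n` would not see them. The second function is one possible alternative (index assignment), offered as an illustration rather than as the fix adopted upstream.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Mirrors the shape of the listing: size-construct, then push_back.
// The result has 2 * n elements; the first n are empty optionals.
inline std::vector<std::optional<int>> save_with_push_back(const std::vector<int>& values) {
  std::vector<std::optional<int>> saved(values.size());
  for (int v : values) {
    saved.push_back(v);
  }
  return saved;
}

// Alternative sketch: size-construct, then assign by index, so that
// saved[i] corresponds to values[i].
inline std::vector<std::optional<int>> save_by_index(const std::vector<int>& values) {
  std::vector<std::optional<int>> saved(values.size());
  for (std::size_t i = 0; i < values.size(); ++i) {
    saved[i] = values[i];
  }
  assert(saved.size() == values.size());
  return saved;
}
```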
# Reference
```c++
at::Tensor sinh(c10::DispatchKeySet ks, const at::Tensor & self) {
auto& self_ = unpack(self, "self", 0);
[[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );
[[maybe_unused]] auto _any_has_forward_grad_result = (isFwGradDefined(self));
std::shared_ptr<SinhBackward0> grad_fn;
if (_any_requires_grad) {
grad_fn = std::shared_ptr<SinhBackward0>(new SinhBackward0(), deleteNode);
grad_fn->set_next_edges(collect_next_edges( self ));
grad_fn->self_ = SavedVariable(self, false);
}
#ifndef NDEBUG
c10::optional<Storage> self__storage_saved =
self_.has_storage() ? c10::optional<Storage>(self_.storage()) : c10::nullopt;
c10::intrusive_ptr<TensorImpl> self__impl_saved;
if (self_.defined()) self__impl_saved = self_.getIntrusivePtr();
#endif
auto _tmp = ([&]() {
at::AutoDispatchBelowADInplaceOrView guard;
return at::redispatch::sinh(ks & c10::after_autograd_keyset, self_);
})();
auto result = std::move(_tmp);
#ifndef NDEBUG
if (self__storage_saved.has_value() &&
!at::impl::dispatch_mode_enabled() &&
!at::impl::tensor_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__storage_saved.value().is_alias_of(self_.storage()));
if (self__impl_saved && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__impl_saved == self_.getIntrusivePtr());
if (result.has_storage() && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result)) {
TORCH_INTERNAL_ASSERT(result.storage().use_count() == 1, "function: sinh");
}
if (!at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result))
TORCH_INTERNAL_ASSERT(result.use_count() <= 1, "function: sinh");
#endif
if (grad_fn) {
set_history(flatten_tensor_args( result ), grad_fn);
}
c10::optional<at::Tensor> result_new_fw_grad_opt = c10::nullopt;
if (_any_has_forward_grad_result && (result.defined())) {
auto self_t_raw = toNonOptFwGrad(self);
auto self_tensor = toNonOptTensor(self);
auto self_t = (self_t_raw.defined() || !self_tensor.defined())
? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
auto self_p = toNonOptPrimal(self);
result_new_fw_grad_opt = (self_t.conj() * self_p.cosh().conj()).conj();
}
if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result.defined()) {
// The hardcoded 0 here will need to be updated once we support multiple levels.
result._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
}
return result;
}
at::Tensor norm_Scalar(c10::DispatchKeySet ks, const at::Tensor & self, const at::Scalar & p) {
auto& self_ = unpack(self, "self", 0);
[[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );
[[maybe_unused]] auto _any_has_forward_grad_result = (isFwGradDefined(self));
std::shared_ptr<NormBackward0> grad_fn;
if (_any_requires_grad) {
grad_fn = std::shared_ptr<NormBackward0>(new NormBackward0(), deleteNode);
grad_fn->set_next_edges(collect_next_edges( self ));
grad_fn->p = p;
grad_fn->self_ = SavedVariable(self, false);
}
#ifndef NDEBUG
c10::optional<Storage> self__storage_saved =
self_.has_storage() ? c10::optional<Storage>(self_.storage()) : c10::nullopt;
c10::intrusive_ptr<TensorImpl> self__impl_saved;
if (self_.defined()) self__impl_saved = self_.getIntrusivePtr();
#endif
auto _tmp = ([&]() {
at::AutoDispatchBelowADInplaceOrView guard;
return at::redispatch::norm(ks & c10::after_autograd_keyset, self_, p);
})();
auto result = std::move(_tmp);
#ifndef NDEBUG
if (self__storage_saved.has_value() &&
!at::impl::dispatch_mode_enabled() &&
!at::impl::tensor_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__storage_saved.value().is_alias_of(self_.storage()));
if (self__impl_saved && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(self_))
TORCH_INTERNAL_ASSERT(self__impl_saved == self_.getIntrusivePtr());
if (result.has_storage() && !at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result)) {
TORCH_INTERNAL_ASSERT(result.storage().use_count() == 1, "function: norm_Scalar");
}
if (!at::impl::dispatch_mode_enabled() && !at::impl::tensor_has_dispatch(result))
TORCH_INTERNAL_ASSERT(result.use_count() <= 1, "function: norm_Scalar");
#endif
if (grad_fn) {
set_history(flatten_tensor_args( result ), grad_fn);
}
throw_error_for_complex_autograd(result, "norm");
c10::optional<at::Tensor> result_new_fw_grad_opt = c10::nullopt;
if (_any_has_forward_grad_result && (result.defined())) {
auto self_t_raw = toNonOptFwGrad(self);
auto self_tensor = toNonOptTensor(self);
auto self_t = (self_t_raw.defined() || !self_tensor.defined())
? self_t_raw : at::_efficientzerotensor(self_tensor.sizes(), self_tensor.options());
auto self_p = toNonOptPrimal(self);
result_new_fw_grad_opt = norm_jvp(self_p, self_t, p, result);
}
if (result_new_fw_grad_opt.has_value() && result_new_fw_grad_opt.value().defined() && result.defined()) {
// The hardcoded 0 here will need to be updated once we support multiple levels.
result._set_fw_grad(result_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
}
if (grad_fn) {
grad_fn->result_ = SavedVariable(result, true);
}
return result;
}
```
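The forward-mode part of the generated code can be summarized with a small standalone sketch. It assumes real scalars instead of tensors (so the `conj()` calls drop out) and uses a hypothetical `Dual` struct holding a primal value and an optional tangent. As in the `_foreach_sinh` listing, each element is handled independently and a tangent is produced only for inputs that carry one, using the rule tangent_out = tangent_in * cosh(primal).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a tensor with an optional forward gradient.
struct Dual {
  double primal;
  std::optional<double> tangent;  // nullopt: no forward gradient defined
};

// Element-wise sinh over a list, propagating tangents per element.
inline std::vector<Dual> foreach_sinh(const std::vector<Dual>& inputs) {
  std::vector<Dual> outputs;
  outputs.reserve(inputs.size());
  for (const Dual& x : inputs) {
    Dual y{std::sinh(x.primal), std::nullopt};
    if (x.tangent.has_value()) {
      // JVP of sinh: d/dx sinh(x) = cosh(x)
      y.tangent = *x.tangent * std::cosh(x.primal);
    }
    outputs.push_back(y);
  }
  assert(outputs.size() == inputs.size());
  return outputs;
}
```

For example, an input list containing `{0.0, 1.0}` and `{1.0, std::nullopt}` gives a first output with primal 0 and tangent 1, and a second output with no tangent.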
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102409
Approved by: https://github.com/soulitzer
2023-06-30 04:51:43 +00:00
d997969b8b
[Reland] Add sym_size/stride/numel/storage_offset to native_function.yaml ( #103107 )
...
Differential Revision: D46459100
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103107
Approved by: https://github.com/angelayi , https://github.com/soulitzer
2023-06-12 19:18:49 +00:00
ba2bc7df8f
Enable backward
on _foreach_zero_
( #101149 )
...
Currently torchgen cannot find an appropriate `DifferentiabilityInfo` for `_foreach_zero_` because `gen_foreach_derivativeinfo` doesn't correctly make use of `functional_info_by_signature` and `differentiability_infos`, and `is_reference_for_foreach` is a bit too strict for `_foreach_zero_`.
Generated code in `VariableType`
```c++
void _foreach_zero_(c10::DispatchKeySet ks, at::TensorList self) {
  auto self_ = unpack(self, "self", 0);
  [[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self );
  std::vector<c10::optional<at::Tensor>> original_selfs(self.size());
  std::vector<std::shared_ptr<ZeroBackward0>> grad_fns;
  if (_any_requires_grad) {
    for (const auto& i : c10::irange( self.size() )) {
      const auto ith_requires_grad = compute_requires_grad(self[i]);
      check_inplace(self[i], ith_requires_grad);
      grad_fns.push_back([&]() -> std::shared_ptr<ZeroBackward0> {
        if (!ith_requires_grad) {
          return nullptr;
        } else {
          auto grad_fn = std::shared_ptr<ZeroBackward0>(new ZeroBackward0(), deleteNode);
          grad_fn->set_next_edges(collect_next_edges( self[i] ));
          return grad_fn;
        }
      }());
    }
  }
#ifndef NDEBUG
  std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
  for (const Tensor& tensor : self_)
    self__storage_saved.push_back(
      tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
  std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
  for (size_t i=0; i<self_.size(); i++)
    if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
#endif
  {
    at::AutoDispatchBelowAutograd guard;
    at::redispatch::_foreach_zero_(ks & c10::after_autograd_keyset, self_);
  }
#ifndef NDEBUG
  for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
      TORCH_INTERNAL_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
  }
  for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
      TORCH_INTERNAL_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
  }
#endif
  if (!grad_fns.empty()) {
    auto differentiable_outputs = flatten_tensor_args( self );
    TORCH_INTERNAL_ASSERT(differentiable_outputs.size() == grad_fns.size());
    for (const auto& i : c10::irange(grad_fns.size())) {
      auto grad_fn = grad_fns[i];
      if (grad_fn != nullptr) {
        rebase_history(differentiable_outputs[i], grad_fns[i]);
      }
    }
  }
}
```
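The per-tensor bookkeeping in the generated function above can be sketched without any PyTorch types. The code below is a simplified illustration under stated assumptions: `Node` and `FakeTensor` are hypothetical stand-ins for `ZeroBackward0` and `at::Tensor`, and assigning `grad_fn` stands in for `rebase_history`. It shows the same shape as the listing: build one entry per input only when at least one input requires grad, store `nullptr` for inputs that do not, and attach a node only where the entry is non-null.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Node { int input_index; };  // hypothetical stand-in for ZeroBackward0

struct FakeTensor {                // hypothetical stand-in for at::Tensor
  bool requires_grad;
  std::shared_ptr<Node> grad_fn;
};

inline void foreach_zero_attach(std::vector<FakeTensor>& tensors) {
  bool any_requires_grad = false;
  for (const FakeTensor& t : tensors) {
    any_requires_grad = any_requires_grad || t.requires_grad;
  }

  // One slot per tensor; nullptr marks "no backward node needed".
  std::vector<std::shared_ptr<Node>> grad_fns;
  if (any_requires_grad) {
    for (std::size_t i = 0; i < tensors.size(); ++i) {
      grad_fns.push_back(tensors[i].requires_grad
                             ? std::make_shared<Node>(Node{static_cast<int>(i)})
                             : nullptr);
    }
  }

  // (The in-place zeroing itself would be dispatched here.)

  if (!grad_fns.empty()) {
    assert(grad_fns.size() == tensors.size());
    for (std::size_t i = 0; i < grad_fns.size(); ++i) {
      if (grad_fns[i] != nullptr) {
        tensors[i].grad_fn = grad_fns[i];  // stands in for rebase_history
      }
    }
  }
}
```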
Rel:
- #58833
- #96405
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101149
Approved by: https://github.com/soulitzer
2023-05-17 03:10:13 +00:00
20cf42de2c
Revert "[Reland] Add sym_size/stride/numel/storage_offset to native_function.… ( #100749 )"
...
This reverts commit bb454891ed5ce97f580ae52e20f8e9ff2d0f3bf5.
2023-05-16 18:17:02 -07:00
b94f143ace
SymIntify convNd and conv_transposeNd, fix inductor symint handling ( #101488 )
...
Fixes https://github.com/pytorch/pytorch/issues/101014
Signed-off-by: Edward Z. Yang <ezyang@meta.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101488
Approved by: https://github.com/ngimel
2023-05-16 17:46:52 +00:00