This PR introduces **-Wmissing-prototypes** of clang-tidy to prevent further coding errors such as the one fixed by PR #96714.
<!--
copilot:summary
-->
### <samp>🤖 Generated by Copilot at fd2cf2a</samp>
This pull request makes several internal functions static to improve performance and avoid name clashes. It also fixes some typos, formatting, and missing includes in various files. It adds a new .clang-tidy check to warn about missing prototypes for non-static functions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96805
Approved by: https://github.com/malfet, https://github.com/albanD
The strategy is that we will heap allocate a LargeNegativeIntSymNodeImpl whenever we have a large negative int, so that we can keep the old `is_symbolic` test (now called `is_heap_allocated`) on SymInt. Whenever we need to do something with these ints, though, we convert them back into a plain `int64_t` (and then, e.g., wrap it in whatever user specificed SymNodeImpl they need.) We cannot wrap directly in the user specified SymNodeImpl as we generally do not know what the "tracing context" is from C++. We expect large negative ints to be rare, so we don't apply optimizations like singleton-ifying INT_MIN. Here's the order to review:
* c10/core/SymInt.h and cpp
* `is_symbolic` renamed to `is_heap_allocated` as I needed to audit all use sites: the old `is_symbolic` test would return true for large negative int, but it would be wrong to then try to dispatch on the LargeNegativeIntSymNodeImpl which supports very few operations. In this file, I had to update expect_int,
* If you pass in a large negative integer, we instead heap allocate it in `promote_to_negative`. The function is written in a funny way to keep compact constructor code for SymInt (the heap allocation happens out of line)
* clone is now moved out-of-line
* New method maybe_as_int which will give you a constant int if it is possible, either because it's stored inline or in LargeNegativeIntSymNodeImpl. This is the preferred replacement for previous use of is_symbolic() and then as_int_unchecked().
* Rename toSymNodeImpl to toSymNode, which is more correct (since it returns a SymNode)
* Complete rewrite of `normalize_symints.cpp` to use new `maybe_as_int`. Cannot easily use the old code structure, so it's now done doing a macro and typing out each case manually (it's actually not that bad.)
* Reimplementations of all the unary operators by hand to use `maybe_as_int`, relatively simple.
* c10/core/LargeNegativeIntSymNodeImpl.h - Just stores a int64_t value, but it has to be big and negative. Most methods are not implemented, since we will rewrap the large negative int in the real SymNodeImpl subclass before doing operations with it
* The rest of the files are just rewriting code to use `maybe_as_int`. There is a nontrivial comment in c10/core/SymIntArrayRef.h
Very minor test adjustment in c10/test/core/SymInt_test.cpp . Plan to exercise this properly in next PR.
Companion XLA PR: https://github.com/pytorch/xla/pull/4882
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99157
Approved by: https://github.com/albanD
…eMeta
This modularizes ExtraMeta to bring down its creation cost when it is needed for other functions than syn shape handling.
Change-Id: Ife59b201b0c4fd75090fe8be5171a6dd73a10d10
Fixes #ISSUE_NUMBER
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98399
Approved by: https://github.com/ezyang
This PR removes the unnecessary == 0 guard when constructing empty tensors, by ensuring that when we create a contiguous tensor we go directly to the C++ torch.empty implementation (instead of indirecting through empty_strided), where we can bypass doing zero tests when computing the size of the storage. This probably also speeds up trace time.
When I did this, I found out that `empty_tensor_restride_symint` was flagrantly wrong (we had never exercised it before because we redirected to `empty_strided` in PrimTorch decomp, which doesn't hit this codepath.) The bugs:
* Stride computation was wrong (only `last_idx` was ever written to)
* Using set_sizes_and_strides with `sym_sizes` input doesn't work, because there is some sort of ordering problem where `clone_symvec` isn't safe when you clone a vector into itself. Probably should fix this.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94512
Approved by: https://github.com/ngimel
The basic idea behind this PR is that we want to continue using the guarding implementations of contiguity tests, if all of the elements are backend (aka, have hints). If they don't have hints, we'll have to do something slower (use the non-short circuiting, non guarding implementations of contiguity), but most of the time you aren't dealing with unbacked SymInts.
So this PR has three parts.
1. We expose `has_hint` on `SymNode`. This allows us to query whether or not a SymInt is backed or not from C++. Fairly self explanatory. Will require LTC/XLA updates; but for backends that don't support unbacked SymInts you can just always return true.
2. We update `compute_non_overlapping_and_dense` to test if the inputs are hinted. If they are all hinted, we use the conventional C++ implementation. Otherwise we call into Python. The Python case is not heavily tested right now because I haven't gotten all of the pieces for unbacked SymInts working yet. Coming soon.
3. We add stubs for all of the other contiguity tests. The intention is to apply the same treatment to them as well, but this is not wired up yet for safety reasons.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94431
Approved by: https://github.com/voznesenskym
This changes TensorImpl to store SymBool instead of bool. However, it doesn't actually compute these quantities symbolically (outside of some top level disjunctions.) The purpose of this PR is to make it easier to diagnose performance problems in the next PR, as after this change we can switch to guardless implementations without modifying TensorImpl.h
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92229
Approved by: https://github.com/Skylion007, https://github.com/albanD
Applies various automated fixes that reduces the number of spurious copies in torch, aten, and c10. I also inlined any default dtors that would have made the type trivially destructible.
Follow up to #89000
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90629
Approved by: https://github.com/ezyang
Along the way, I undid making sparse/dense dim symint (they're
dimensions, so they should be static.)
Also symintify set_indices_and_values_unsafe
There is a little bit of a nontrivial infra change here: previously, we didn't populate the strides field on sparse tensors. It is now populated with "empty" strides, and this meant that sparse tensors were falsely reporting they were non-overlapping dense/contiguous. I added in a hack to work around this case.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88573
Approved by: https://github.com/anjali411
I saw some missed optimization opportunities in C10 using std::move and thought I would submit a PR to fix them. There are particularly a lot of them dealing with the symbolic operators which are used in quite a few places including in loops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88512
Approved by: https://github.com/ezyang
Previously, our handling for contiguity was inconsistent in the following ways:
- is_strides_like 2d/3d and is_non_overlapping_and_dense always were computed
based on sizes_and_strides_, even if you had symbolic ints
- Furthermore, even if you set custom policy for strides, these quantities were
not overridable by subclasses
- Furthermore, we didn't even store these fields on ExtraMeta
- We duplicate implementations of compute_contiguous (plain, channels last,
channels last 3d)
- We inconsistently called refresh_numel()/refresh_contiguous(), versus
recomputing it ourselves
This factor makes a consistent strategy for all of the boolean fields, and
for numel computation. After this refactor:
- All layout boolean fields are interposable via strides policy
and can be overridden from Python; you will never access a garbage field
- All layout boolean fields are on ExtraMeta
- You can always call refresh_numel/contiguous, no matter if your Tensor is
contiguous or not
- The numel/layout boolean fields are always populated consistently with
the sizes strides fields (either on Tensor or ExtraMeta), even if you
have custom policy
- There is only one implementation of the actual computation logic
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: [D39907696](https://our.internmc.facebook.com/intern/diff/D39907696)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85858
Approved by: https://github.com/albanD
From the perspective of having valid sympy expressions for any given size/stride property, we can have tensors inherit SymInts from each other (in cases where the size expression is unchanged, which is a common case).
But we also use SymInts to let us build graph traces of our programs, and we need to be able to trace from a SymInt back to the tensor that it originated from in order to trace correct graphs. This change ensures each tensor starts with fresh SymInts.
- note: our policy has already been to use PySymIntNode objects to store pointers to proxy-tracer objects for use during tracing
- before making this change (to clone symints), sometimes we'd attempt to store more than one proxy-tracer object on the same symint and the last-stored one would clobber all the earlier ones. This would result in tracing the wrong graph in some cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85878
Approved by: https://github.com/ezyang
Based on @ezyang's suggestion, mode stack now has "one true mode" which is the _only_ mode that can ever be active at the C++ level. That mode's torch dispatch is just to take the top mode in the stack, reenable itself (if we aren't at the end of the mode stack), and run the top mode's torch_{dispatch|function}
This maintains that in the middle of a mode's torch dispatch, the mode itself will not be active. It changes the function the user has to call to see what the current mode is (no longer queries the C++, it's python only) but allows the user to also see the entire mode stack easily
Removes `enable_torch_dispatch_mode` and `.restore()` since neither makes sense in this new setup
### Background
Why do we want this? Well, a pretty common pattern that was coming up was that users had to do something like
```python
## PRE-PR UX
def f(mode):
with mode.restore(): # user needs to understand this restore thing?
...
with Mode() as m:
pass
f(m)
```
Many users were getting error from forgetting to call `.restore` or from forgetting to add the (tbh weird) "mode instantiation" step where they use the mode as a context manager with an empty body. Really, they wanted to treat modes like context managers and just write
```python
## FROM FEEDBACK, USER DESIRED CODE. POSSIBLE POST-PR
def f(mode):
with mode:
...
f(Mode())
```
** Technical Details **
With the old mode stack, we basically had a linked list so the mode itself could only be used once and had a fixed parent. In this new design, the mode stack is just a python list that we're pushing to and popping from. There's only one mode that's ever active at the C++ level and it runs the next mode in the Python list. The modes don't have state on them anymore
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84774
Approved by: https://github.com/ezyang, https://github.com/zou3519
A longstanding confusion in the implementation of fake tensor and proxy tensor is what to do about torch.ops.aten.sym_sizes and related calls. In particular, when you have a tensor that (1) has symbolic shapes and (2) has a `__torch_dispatch__` call, previously, you would always get `__torch_dispatch__` calls for sizes/strides query, *even if you didn't request it* via the dispatch kwargs in `make_wrapper_subclass`.
The reason for this is because we were previously mixing several concepts: "I want to dispatch to Python", "I want to call a virtual method" and "I have dynamic shapes". A single boolean variable controlled all of these things, and so it was not possible to understand inside TensorImpl what the user had actually originally requested.
In this PR, we track each of these concepts individually so that we can preserve user intent. Then, we combine these into a single "policy" variable that controls whether or not we can use the fastpath or not. For the policy to trigger, we only need one of the exceptional cases to be true.
Billing of changes:
* Rename `set_sizes_strides_policy` to `set_custom_sizes_strides`; in general, you cannot DIRECTLY set policy; you have to indirectly set it by the public functions.
* Some helpers for sizes and strides, since it's more complicated (as it is an enum, rather than just bools as is the case for device and layout). `matches_python_custom` is used to test the Python dispatch user ask. `matches_policy` does the policy test (only used in the user facing functions.)
* I reorged the accessor methods so that they are more logical. This makes the diff bad, so I recommend reading the final code directly.
* The default custom implementations now more reliably call their default() implementations
* As bonus refactor, I devirtualized some functions that don't need to be virtual
* `set_sym_sizes_and_strides` is renamed to `set_sizes_and_strides` to make it easier to use in template contexts; it optionally takes a storage offset now so you can set all three values at the same time. If you use the SymInt overload but there are no symbolic integers, we give you a normal resize.
* This adds `sym_storage_offset` since we had that in the symbolic shapes branch and there's no reason not to put it in (and it reduces merge conflicts)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84641
Approved by: https://github.com/wconstab
swolchok reported that non-tracing usage of Tensor we are wasting a lot
of time on is_symbolic() tests, e.g., when destructing SymInts. This
is a regression for no good reason because we don't actually ever
have SymInts in those cases. This PR moves the stored SymInts on
Tensor out of line, into a separate ExtraMeta struct, which is only
allocated when we make a Tensor store symbolic sizes/strides.
To avoid adding another word to TensorImpl, I take over the named tensor
metadata field. This makes named tensor require a double indirection
and use up more space, but it's OK since we're going to delete this
feature anyway soon.
I restore regular int64_t storage on Tensor. This entailed reverting
https://github.com/pytorch/pytorch/pull/82467 ; there are no other
substantive changes to SizesAndStrides so a close review is not
necessary.
I don't bother optimizes sizes and strides in ExtraMeta in the same
way stock tensor is optimized. I add a SymDimVector alias. I make
SymInt UNCHECKED constructor public as it is a useful optimization
in some situations when the int is known to be positive.
I thought about storing the SymInts on the Python object instead.
However, because we can allocate symbolic shape tensors directly
from C++, we cannot guarantee that there is a PyInterpreter for
a Tensor. So we do it this way instead; it's also faster since you
don't have to take out the GIL to do accesses.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84390
Approved by: https://github.com/swolchok, https://github.com/Krovatkin
I realized that we can deal with the dead vtable problem by...
introducing another indirection! The resulting code is worse
(you have to do one more dereference to get to the vtable), but
the reduction in boilerplate is, IMO, worth it.
I did this refactor because I'm about to add a lot more methods
to PyInterpreter to handle expunging SymInt from TensorImpl.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84388
Approved by: https://github.com/albanD
The bug is that:
(1) functionalization kernels internally call `at::empty_strided()` to construct meta tensors, and then call the meta tensor op
(2) This happens with the Python dispatch key already added to the TLS exclude set, so we expect these meta tensors never to enter python
(3) When calling detach() though, `TensorImpl::shallow_copy_and_detach()` will currently always call into python when a PythonMode is set. Instead, I updated it to check if the Python key is in the TLS exclude set first.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83701
Approved by: https://github.com/ezyang
# Summary
This is PR is pulling out all the changes from #81838 specific to properly creating nested_tensor views. I will update this comment with a design doc once that has been made. This should enable proper creation of NestedTensor views, two nested_tensors sharing the same buffer_ but with different NestedTensor meta data.
The function `create_nested_tensor_view` is a helper function for creating a new nested tensor whose storage aliases the base causing the underlying storage to be shared - and is therefore a view.
This function by itself is not differentiable and therefore autograd does not track its uses. If a nested tensor function implementation uses this helper in its implementation the aten_op must meet two requirements:
- The function must return a view of the input
- The function must be explicit and defines its backward
## Testing
A bug was found when creating a base tensor out of inference mode and then creating a view in inference mode. This test has been aded to this PR in order to show the effect of the change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82658
Approved by: https://github.com/albanD
I noticed I was missing tensor creations with modes when I tried
to delete proxy tensor. This was the cause.
Hypothetically, all PyInterpreter calls could get this treatment.
But I think it only matters for detach; the rest do not return
Tensors and most modes will not be interested in them.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83372
Approved by: https://github.com/zou3519
Add `TensorImpl::sym_strides`, bind it to python with `torch.ops.aten.sym_strides`, and use it in `ProxyTensor` and `FakeTensor`.
Before, `ProxyTensor` was generating `ProxySymInt`'s for the sizes, but not for the strides. Internally we still represent strides with a `SymIntArrayRef` though, so I ran into some weird issues where sizes were showing up as `ProxySymInt`, but strides were `PySymInt`'s.
Differential Revision: [D38594558](https://our.internmc.facebook.com/intern/diff/D38594558)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81300
Approved by: https://github.com/ezyang