Commit Graph

341 Commits

1ba1b7b597 Support remaining *_like factory functions for NJT (#144889)
Fixes #144761

This PR adds NJT impls for those *_like functions that were previously missing:
* `full_like()`
* `rand_like()`
* `randint_like()`

It also fixes a bug in existing *_like functions when a new device is specified. The fix is to also transfer `offsets` / `lengths` to the new device.
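
A rough usage sketch of the behavior described above (shapes are illustrative, not taken from the PR):
```python
import torch

nt = torch.nested.nested_tensor(
    [torch.randn(2, 3), torch.randn(4, 3)], layout=torch.jagged
)

# Newly supported *_like ops on NJT:
full = torch.full_like(nt, 7.0)
rand = torch.rand_like(nt)
randint = torch.randint_like(nt, 0, 10)

# Device-transfer fix: the nested metadata follows the values to the new device.
if torch.cuda.is_available():
    nt_cuda = torch.zeros_like(nt, device="cuda")
    assert nt_cuda.values().device.type == "cuda"
    assert nt_cuda.offsets().device.type == "cuda"  # previously left on the original device
```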
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144889
Approved by: https://github.com/soulitzer
2025-01-27 21:33:51 +00:00
46e06e1d09 Avoid data-dependent errors in NJT tests via capture_scalar_outputs=True (#144588)
Part of my BE project addressing NJT bugs surfaced via OpInfo tests.

There are several xfails related to data-dependent errors in torch.compile. This PR sets `torch._dynamo.config.capture_scalar_outputs=True` to avoid these, which tends to exercise unbacked SymInt logic and will require `torch._check()`-related fixes.
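
A minimal sketch of the flag in isolation (not the test code itself):
```python
import torch

torch._dynamo.config.capture_scalar_outputs = True

@torch.compile(fullgraph=True)
def f(x):
    # .item() is a data-dependent scalar output; with the flag set, Dynamo captures it
    # as an unbacked SymFloat/SymInt instead of raising a data-dependent error.
    n = x.max().item()
    return x + n

f(torch.randn(4))
```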
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144588
Approved by: https://github.com/soulitzer
ghstack dependencies: #144586, #144587
2025-01-24 22:45:01 +00:00
128f3627b1 Implement backward for NJT matmul (#144587)
Part of my BE project addressing NJT bugs surfaced via OpInfo tests.

This PR implements missing backward support for NJT matmul. Notably, for dense tensors, matmul dispatches to bmm. However, due to historical reasons related to NST, NJT handles matmul directly, and thus can't rely on the CompositeImplicit impl of matmul to get the derivative formula.
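
A hedged usage sketch of what now works end-to-end (shapes are illustrative):
```python
import torch

a = torch.nested.nested_tensor(
    [torch.randn(2, 4), torch.randn(3, 4)],
    layout=torch.jagged, requires_grad=True,
)
w = torch.randn(4, 5, requires_grad=True)

out = torch.matmul(a, w)       # (B, j, 4) @ (4, 5) -> (B, j, 5)
out.values().sum().backward()  # previously failed for lack of an NJT derivative formula
assert a.grad is not None and w.grad is not None
```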

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144587
Approved by: https://github.com/soulitzer
ghstack dependencies: #144586
2025-01-21 18:27:50 +00:00
af204135d8 Fix NJT fill.Scalar for contiguous inputs (#144586)
Part of my BE project addressing NJT bugs surfaced via OpInfo tests.

This PR implements the missing `fill.Scalar` support, which works fine for contiguous inputs, but there is still some AOTAutograd debugging required to handle non-contiguous transposed NJTs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144586
Approved by: https://github.com/soulitzer
2025-01-21 18:22:08 +00:00
b63b81410c Fix NJT frexp() to handle both outputs (#144585)
Part of my BE project addressing NJT bugs surfaced via OpInfo tests.

Before this PR, `frexp()` for NJT was handled via the unary pointwise fallback. The op returns a tuple, however, and the fallback doesn't handle that. This PR defines an explicit impl for `frexp()` that wraps both returned `(mantissa, exponent)` as NJTs.
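
A short sketch of the new behavior (assuming the usual jagged-layout constructor):
```python
import torch

nt = torch.nested.nested_tensor(
    [torch.randn(2, 3), torch.randn(4, 3)], layout=torch.jagged
)
mantissa, exponent = torch.frexp(nt)
assert mantissa.is_nested and exponent.is_nested
# Recompose: nt == ldexp(mantissa, exponent) elementwise.
torch.testing.assert_close(nt.values(), torch.ldexp(mantissa.values(), exponent.values()))
```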
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144585
Approved by: https://github.com/soulitzer
ghstack dependencies: #144582, #144583, #144584
2025-01-18 15:59:56 +00:00
3ee531f8b9 Support NJT chunk() backward on batch dim (#144584)
Part of my BE project addressing NJT bugs surfaced via OpInfo tests.

Implements `chunk()` backward on the batch dim, which was left out before. This PR unbinds the components and invokes `copy_()` on these to pass along the appropriate gradients.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144584
Approved by: https://github.com/soulitzer
ghstack dependencies: #144582, #144583
2025-01-18 15:58:24 +00:00
a8ef423fed Fix NJT min / max backward() for non-ragged reductions (#144583)
Part of my BE project addressing NJT bugs surfaced via OpInfo tests.

`value_selecting_reduction_backward()` is used in the backward for min / max, so this PR implements it for NJT. Notably, this isn't enough for reducing over the ragged dim, since that results in a dense tensor and thus NJT's torch_dispatch will not be called for this op. We need factory function support for nested ints to fix that case.
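
A rough sketch, assuming reduction over a trailing non-ragged dim works as described above:
```python
import torch

nt = torch.nested.nested_tensor(
    [torch.randn(2, 3), torch.randn(4, 3)],
    layout=torch.jagged, requires_grad=True,
)
# Reduce over the last (non-ragged) dim; the ragged-dim case still needs
# nested-int factory function support, per the note above.
values, indices = nt.max(dim=-1)
values.values().sum().backward()
assert nt.grad is not None
```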
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144583
Approved by: https://github.com/soulitzer
ghstack dependencies: #144582
2025-01-17 20:57:11 +00:00
cac10b8190 Fix NJT OpInfo entry for nn.functional.prelu (#144582)
Part of my BE project addressing NJT bugs surfaced via OpInfo tests.

The OpInfo entry for prelu was wrong before this PR; `weight` needs to be passed as well. The op isn't fully implemented yet.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144582
Approved by: https://github.com/soulitzer
2025-01-17 20:36:15 +00:00
c07dc64017 Update pin memory related APIs to not pass 'device' argument (#131858)
Based on https://github.com/pytorch/pytorch/pull/126376, this PR updates all PT callers (e.g., `Tensor.is_pinned()`, `Tensor.pin_memory()`) to not pass the `device` argument.
For `Storage` / `UntypedStorage` `is_pinned()` / `pin_memory()`, the `device` argument is kept but passing it is discouraged; if not given, the default `device` is still 'cuda' for BC.
Additionally, with device-agnostic pin_memory, the `pin_memory_device` argument of `torch.utils.data.DataLoader` is now discouraged. For BC, explicitly passing this argument is still effective; if not given, the default `device` will be the current accelerator.
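
A hedged sketch of the device-agnostic usage described above:
```python
import torch
from torch.utils.data import DataLoader, TensorDataset

x = torch.randn(8, 4)
if torch.cuda.is_available():
    pinned = x.pin_memory()    # no device argument; the current accelerator is used
    assert pinned.is_pinned()  # likewise queried without a device argument

# DataLoader: pin_memory=True alone is preferred; pin_memory_device is discouraged
# (still effective for BC if explicitly passed).
loader = DataLoader(TensorDataset(x), batch_size=4, pin_memory=True)
```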

Fixes #124908
Relates https://github.com/pytorch/pytorch/pull/126376

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131858
Approved by: https://github.com/albanD

Co-authored-by: albanD <desmaison.alban@gmail.com>
2025-01-15 17:23:35 +00:00
df458be4e5 [4/N] Apply py39 ruff and pyupgrade fixes (#143257)
`torch/fx/passes/annotate_getitem_nodes.py` was changed to support the new type hinting annotations.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143257
Approved by: https://github.com/justinchuby, https://github.com/albanD
2025-01-04 10:47:51 +00:00
228b228449 Fix batch-specific attention mod for NJT + Flex (#143866)
Fixes #143788
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143866
Approved by: https://github.com/Skylion007, https://github.com/cpuhrsch
2024-12-27 20:51:41 +00:00
d8c8ba2440 Fix unused Python variables in test/[e-z]* (#136964)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136964
Approved by: https://github.com/justinchuby, https://github.com/albanD
2024-12-18 23:02:30 +00:00
5dabe2d464 Fix NJT backward tests (#143072)
This PR fixes some issues with NJT backward / compile backward tests:
1. `requires_grad` was not being propagated appropriately during `SampleInput` generation, so a LOT of backward cases were untested before (sad times). This PR utilizes a helper function `_clone()` to clone() / detach() NJTs for SampleInputs while preserving `requires_grad` status. Note: the clone() / detach() stuff is for autograd; can't have two SampleInputs as part of the same autograd graph.
2. Per-sample skips weren't *fully* working; the op logic would still be invoked even with a skip. I found this out thanks to `split_with_sizes`, which segfaults during backward because it tries to use an NST-specific formula. As annoying as it is, I tried a ton of things but ultimately had to split `subtest_ctx` into that plus a separate `skip_xfail_ctx` to run the subtests within.
    * Updated all uses of per-sample skips / xfails: 4 in `test_nestedtensor.py` and 1 in `test_vmap.py`
3. Added the appropriate skips / xfails to get everything passing. There are a shitton of bugs to fix!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143072
Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer
2024-12-12 18:06:23 +00:00
e1196dfe51 Deprecate torch._utils.is_compiling() (#127690)
This PR is split from PR #126898.

- #126898
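
A minimal sketch of the migration, assuming `torch.compiler.is_compiling()` is the intended public replacement:
```python
import torch

def scale_or_shift(x):
    # torch._utils.is_compiling() is deprecated; query the public API instead.
    if torch.compiler.is_compiling():
        return x * 2.0  # path taken under torch.compile / export
    return x + 1.0      # eager path

torch.compile(scale_or_shift)(torch.randn(3))
```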

------

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127690
Approved by: https://github.com/Skylion007, https://github.com/malfet
2024-12-08 22:55:36 +00:00
5663ad99e7 Fix per-sample xfails for NJT tests (#142243)
#140736 fixed some xfails, but these were not properly failing in CI due to #142157. This PR removes the xfails so we can land a fix to that issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142243
Approved by: https://github.com/huydhn
2024-12-06 22:39:35 +00:00
661d1f0372 [aotd] non-contiguous NestedTensor mutation in compile (#139630)
Allow mutations for subclasses that are non-contiguous.

Changes:

Removing assert in collect_metadata_analysis

Main requested test case: compilation of `NJT.index_put()`.

A test that compiles `NJT.index_put()` is added in test_nestedtensor.py.

index_put() decomposes into NJT split / unbind, which needed additional `torch._check` / `torch._check_is_size` calls for `NJT.unbind()` and `guard_size_oblivious()` usage in _meta_registrations and _inductor/lowering.py.

Special case:
If a tangent is mutated outside of the graph, it does not participate in the backward graph. Autograd will set such a tangent to a zeros tensor in this case.

We handle it separately in CompiledFunction.backward: no processing is done for this tangent, and it is broadcast to the number of expected subclass unwrapped arguments.

Disabling 2 tests for dynamo:
1/ For nested tensor - a symbolic shapes issue on the nested_tensor index operation that does splits [0, 0, 0] - there is a failure with "pending unbacked symints". This PR does not add more .tolist()/item() ops than there were before.

2/ Since we no longer fail with an exception in collect_metadata_analysis, new paths for dynamo started running, and a test started failing with something strange: the set_ handling of storage_offset (because of the views test) updates the storage from "cpu" to "meta".

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139630
Approved by: https://github.com/bdhirsh
2024-12-06 12:18:46 +00:00
161a2340ee Switch to using Python nested int (#141166)
Doesn't seem to noticeably slow down eager - TestNestedTensorSubclass tests with and without the PR finished in similar amounts of time (around 57s, 58s)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141166
Approved by: https://github.com/ezyang
2024-12-02 19:17:30 +00:00
c9e2b3fefe NJT: Return correct number of outputs for chunk() on the batch dim (#141604)
Old logic was completely wrong, returning `chunk_size` chunks instead of the intended number. The original test didn't catch this because `chunk_size == num_chunks` :p New OpInfo-based testing covers it though.
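
A quick sketch of the corrected semantics on the batch dim (sizes are illustrative):
```python
import torch

nt = torch.nested.nested_tensor(
    [torch.randn(i + 1, 3) for i in range(8)], layout=torch.jagged
)
chunks = nt.chunk(4, dim=0)  # batch of 8 split into 4 chunks of size 2
assert len(chunks) == 4      # previously 2 chunks (the chunk *size*) were returned
assert all(c.size(0) == 2 for c in chunks)
```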
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141604
Approved by: https://github.com/soulitzer
ghstack dependencies: #141500, #140736, #140161, #141392, #141506
2024-11-27 02:31:23 +00:00
43121b6f0d Adjust output NJT ragged_idx for reductions and select() (#141506)
This fixes some bugs when performing reductions / select() on dims before the ragged dim. In this case, the output NJT has a smaller number of dims, and its ragged_idx should reflect that correctly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141506
Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer
ghstack dependencies: #141500, #140736, #140161, #141392
2024-11-27 02:25:53 +00:00
23793cf93d NJT unsqueeze() fixes (#141392)
This PR contains three `unsqueeze()`-related fixes for NJT:
1. Adjusts the output's `_ragged_idx` when `unsqueeze()` inserts a dim before the ragged dim
2. Corrects the unbind reference for `unsqueeze()` after the last input dim. For this case, the dim kwarg canonicalization logic needs to be applied wrt `inp.dim() + 1` to account for `dim=-1` properly
3. Adds ragged dim support to `unsqueeze()`, allowing for e.g. `(B, j1, D) -> (B, 1, j1, D)`. This is okay now after #137125

Note that `unsqueeze()` still doesn't support batch dim operation, and arguably should never support this.
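
A hedged sketch of fixes (1) and (3); `_ragged_idx` is internal and shown only for illustration:
```python
import torch

nt = torch.nested.nested_tensor(
    [torch.randn(2, 3), torch.randn(4, 3)], layout=torch.jagged
)                        # (B, j, D) with _ragged_idx == 1

a = nt.unsqueeze(1)      # (B, 1, j, D): dim inserted before the ragged dim
assert a._ragged_idx == 2

b = nt.unsqueeze(-1)     # (B, j, D, 1): dim=-1 canonicalized wrt inp.dim() + 1
assert b.shape[-1] == 1
```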
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141392
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #141500, #140736, #140161
2024-11-26 22:38:35 +00:00
9ee5d6f83c Initial NJT testing over dim type / views (#140161)
This PR introduces `ExtraOpData`, a structure that contains op metadata regarding whether the op is a view and the dim-related args it accepts. It also populates a huge database for dim-wise / view ops with this info.

Test logic (sample input generation, references) has been updated to utilize this data. It allows for a fairly generic set of sample inputs & a reference for the class of ops that accept a single NJT and operate dim-wise (AKA "unary dimwise ops").

Testing is added over the following ops:
* `chunk()`
* `narrow()`
* `select()`
* `split()`
* `split_with_sizes()`
* `squeeze()`
* `unflatten()`
* `unsqueeze()`

Most of the above do not operate on the ragged / batch dims or on non-contiguous NJTs, so the proper xfails are added as needed.

I also slipped in a couple minor fixes (sorry):
1. The `_wrap_jagged_dim()` helper now avoids assuming the `nt._ragged_idx == 1` and allows for a batch dim to be a valid input, disambiguating the converted inner dim as necessary through an additional `operating_on_batch` return value (i.e. both dim=0 and dim=1 map to dim=0 on the inner values tensor, since that dim represents a packed ragged dim for all batch items)
2. Padded dense -> NJT conversion requires shape gymnastics to operate with the restrictive FBGEMM kernel. The gymnastics were slightly wrong for the transposed NJT case, and this PR fixes that
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140161
Approved by: https://github.com/Skylion007, https://github.com/cpuhrsch
ghstack dependencies: #141500, #140736
2024-11-26 22:08:08 +00:00
869d629c0f Forward / backward NJT support for several activation functions (#140736)
Several activation functions were unimplemented due to missing `pointwise` tags. This PR adds them and corresponding backwards implementations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140736
Approved by: https://github.com/Skylion007, https://github.com/cpuhrsch
ghstack dependencies: #141500
2024-11-26 21:19:58 +00:00
8ba555ec8a Fix where() for NJT (#141500)
**Background:** It's common to use `scalar_tensor()` in the input to `where()` to convert any scalars present to compatible tensors with matching options, *including layout*. This shows up in various places, notably including derivative formulas ([example](78491d6afc/tools/autograd/derivatives.yaml (L432-L434))). It causes problems for NJTs because they have `layout=torch.jagged` and it never makes sense to create a scalar tensor with this layout. Some of the breakage only seems to happen in CI for reasons I don't fully understand (see the revert of #140736 due to softshrink's derivative formula).

**This PR:**
* Allows non-contiguous NJT inputs to `where()` + adds tests for this
* Handles scalar tensor / dense tensor inputs for `condition` / `other` + adds tests for this
    * Uses limited `broadcast_tensors()` / `broadcast_to()` support
    * Improves `expand()` to work on non-contig NJTs
* Changes `scalar_tensor()` to use `torch.strided` instead of `torch.jagged` in both eager and torch.compile (i.e. meta registration)
* Changes backward formulas for `sinc`, `pow`, `special.i1`, and `special.i1e` to use `scalar_tensor()` instead of e.g. `zeros({})`

**Alternative approach:** Update all problematic usages of `scalar_tensor()` to avoid ever passing `layout=torch.jagged`. This is an extensive change and includes `torch.where()` logic, a bunch of derivative formulas, and likely other places not yet discovered.
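
A hedged usage sketch of the `where()` support described above:
```python
import torch

nt = torch.nested.nested_tensor(
    [torch.randn(2, 3), torch.randn(4, 3)], layout=torch.jagged
)

# Scalar / scalar-tensor `other` alongside an NJT condition / self:
out = torch.where(nt > 0, nt, 0.0)
assert out.is_nested

# Non-contiguous (transposed) NJT inputs also work now:
nt_t = nt.transpose(1, 2)
out_t = torch.where(nt_t > 0, nt_t, torch.tensor(0.0))
```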
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141500
Approved by: https://github.com/malfet, https://github.com/cpuhrsch, https://github.com/soulitzer
2024-11-26 20:13:27 +00:00
cffeb83f15 Revert "Forward / backward NJT support for several activation functions (#140736)"
This reverts commit daaecb96d6b8049f8ca95974cd8a45b2fb9d4e28.

Reverted https://github.com/pytorch/pytorch/pull/140736 on behalf of https://github.com/malfet due to Take 2, of stack revert your change but its tests are failing in trunk ([comment](https://github.com/pytorch/pytorch/pull/140736#issuecomment-2498479702))
2024-11-25 16:27:00 +00:00
e0f9ec4a25 Revert "Initial NJT testing over dim type / views (#140161)"
This reverts commit 730caf0aed187ce5c1c36fae7e9ae1f700585280.

Reverted https://github.com/pytorch/pytorch/pull/140161 on behalf of https://github.com/malfet due to Sorry for reverting your change but its tests are failing in trunk ([comment](https://github.com/pytorch/pytorch/pull/140736#issuecomment-2498358652))
2024-11-25 15:40:54 +00:00
58727b6f5f Revert "NJT unsqueeze() fixes (#141392)"
This reverts commit 48409a5cc6b14b6a5237beb6263a436d309afcd2.

Reverted https://github.com/pytorch/pytorch/pull/141392 on behalf of https://github.com/malfet due to Sorry for reverting your change but its tests are failing in trunk ([comment](https://github.com/pytorch/pytorch/pull/140736#issuecomment-2498358652))
2024-11-25 15:40:54 +00:00
48409a5cc6 NJT unsqueeze() fixes (#141392)
This PR contains three `unsqueeze()`-related fixes for NJT:
1. Adjusts the output's `_ragged_idx` when `unsqueeze()` inserts a dim before the ragged dim
2. Corrects the unbind reference for `unsqueeze()` after the last input dim. For this case, the dim kwarg canonicalization logic needs to be applied wrt `inp.dim() + 1` to account for `dim=-1` properly
3. Adds ragged dim support to `unsqueeze()`, allowing for e.g. `(B, j1, D) -> (B, 1, j1, D)`. This is okay now after #137125

Note that `unsqueeze()` still doesn't support batch dim operation, and arguably should never support this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141392
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #140736, #140161
2024-11-25 08:08:38 +00:00
730caf0aed Initial NJT testing over dim type / views (#140161)
This PR introduces `ExtraOpData`, a structure that contains op metadata regarding whether the op is a view and the dim-related args it accepts. It also populates a huge database for dim-wise / view ops with this info.

Test logic (sample input generation, references) has been updated to utilize this data. It allows for a fairly generic set of sample inputs & a reference for the class of ops that accept a single NJT and operate dim-wise (AKA "unary dimwise ops").

Testing is added over the following ops:
* `chunk()`
* `narrow()`
* `select()`
* `split()`
* `split_with_sizes()`
* `squeeze()`
* `unflatten()`
* `unsqueeze()`

Most of the above do not operate on the ragged / batch dims or on non-contiguous NJTs, so the proper xfails are added as needed.

I also slipped in a couple minor fixes (sorry):
1. The `_wrap_jagged_dim()` helper now avoids assuming the `nt._ragged_idx == 1` and allows for a batch dim to be a valid input, disambiguating the converted inner dim as necessary through an additional `operating_on_batch` return value (i.e. both dim=0 and dim=1 map to dim=0 on the inner values tensor, since that dim represents a packed ragged dim for all batch items)
2. Padded dense -> NJT conversion requires shape gymnastics to operate with the restrictive FBGEMM kernel. The gymnastics were slightly wrong for the transposed NJT case, and this PR fixes that
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140161
Approved by: https://github.com/Skylion007, https://github.com/cpuhrsch
ghstack dependencies: #140736
2024-11-25 08:08:38 +00:00
daaecb96d6 Forward / backward NJT support for several activation functions (#140736)
Several activation functions were unimplemented due to missing `pointwise` tags. This PR adds them and corresponding backwards implementations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140736
Approved by: https://github.com/Skylion007, https://github.com/cpuhrsch
2024-11-25 08:08:31 +00:00
1f734bc90c Add bfloat16 support to torch.bmm(NST, NST) (#141380)
Adds bfloat16 support to torch.bmm(NST, NST) where NST is NestedTensor with the torch.strided (default) layout.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141380
Approved by: https://github.com/jbschlosser
2024-11-23 04:18:48 +00:00
2e7ba0b194 Revert "Switch to using Python nested int (#141166)"
This reverts commit e2e8a7fa2e519433a4ec1071f80d2f6f843c6300.

Reverted https://github.com/pytorch/pytorch/pull/141166 on behalf of https://github.com/clee2000 due to broke docs [GH job link](https://github.com/pytorch/pytorch/actions/runs/11980936976/job/33406870951) [HUD commit link](e2e8a7fa2e) ([comment](https://github.com/pytorch/pytorch/pull/141166#issuecomment-2495112297))
2024-11-22 23:54:36 +00:00
0be0c944b1 Revert "Forward / backward NJT support for several activation functions (#140736)"
This reverts commit af70f5e04c69839a1a0e08942254c170dc4c3d61.

Reverted https://github.com/pytorch/pytorch/pull/140736 on behalf of https://github.com/huydhn due to Sorry for reverting your change but its tests are failing in trunk ([comment](https://github.com/pytorch/pytorch/pull/140736#issuecomment-2495075871))
2024-11-22 23:15:55 +00:00
e2e8a7fa2e Switch to using Python nested int (#141166)
Doesn't seem to noticeably slow down eager - TestNestedTensorSubclass tests with and without the PR finished in similar amounts of time (around 57s, 58s)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141166
Approved by: https://github.com/ezyang
2024-11-22 22:12:25 +00:00
af70f5e04c Forward / backward NJT support for several activation functions (#140736)
Several activation functions were unimplemented due to missing `pointwise` tags. This PR adds them and corresponding backwards implementations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140736
Approved by: https://github.com/Skylion007, https://github.com/cpuhrsch
2024-11-22 22:05:53 +00:00
41f315417c Fix NJT linear_backward() memory usage (#141163)
Fixes #141112

The formula we're using for `linear_backward()` is inefficient for higher dim input sizes, even if the input is trivially higher dim (e.g. via use of `unsqueeze()`). This PR updates the formula to match the more efficient version employed by NST. Specifically, note the leading dim collapse for `grad_output`'s values before we compute the various matmuls.
d5ee1d1b58/aten/src/ATen/native/nested/NestedTensorBackward.cpp (L37-L70)
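
For illustration only (not the actual NST/NJT kernel code), the "leading dim collapse" amounts to flattening all leading dims before the matmul so no large intermediate is materialized:
```python
import torch

def grad_weight_sketch(grad_output_values, input_values):
    # Collapse (..., out_features) -> (N, out_features) and (..., in_features) -> (N, in_features),
    # then grad_W = grad_out^T @ input, summed over all rows in a single matmul.
    go = grad_output_values.reshape(-1, grad_output_values.shape[-1])
    inp = input_values.reshape(-1, input_values.shape[-1])
    return go.t() @ inp  # (out_features, in_features)
```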

Testing for correctness is done via existing gradcheck tests (e.g. `test_backward_nn_functional_linear`). I added a memory usage test but I think it's likely there's a better way to do this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141163
Approved by: https://github.com/Skylion007, https://github.com/cpuhrsch, https://github.com/soulitzer
2024-11-21 15:22:45 +00:00
5e0c009a5a Forward fix lint after #140443 (#141088)
TSIA
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141088
Approved by: https://github.com/atalman
2024-11-20 02:21:24 +00:00
780c580d68 General per-SampleInput xfail / skip system (#140443)
### Background
This PR adds the functionality to xfail / skip on a per-`SampleInput` basis for `OpInfo` tests. See #89354 and #82669 for some requests asking for this type of functionality.

This was originally landed for NJT in #138370 and is generalized and slightly tweaked here.

### Design
#### Principles
* Clean separation among `SampleInput` generation logic, test logic that uses the `SampleInput`s, and xfail / skip logic (which will change as bugs are addressed).
* Flexibility in xfail / skip predicate specification - ideally each bug can be handled by a single skip / xfail, even if it surfaces across a specific class of ops.
    * This is important in practice for NJT, where it's common to have a bug that affects all binary ops, for example.
* Opt-in with minimal test logic changes + no substantial impact on other tests.

#### Details
The core new concept is a `SampleRule`, which can be either an `XFailRule` or `SkipRule`.

```python
@dataclass
class SampleRule(ABC):
    # function to indicate whether the rule applies to this op; return True if so
    # NB: str arg of callable is device_type
    op_match_fn: Callable[[str, OpInfo], bool] = None
    # function to indicate whether the rule applies to this sample; return True if so
    sample_match_fn: Callable[[torch.device, SampleInput], bool] = None
    # optional name for identifying the rule
    name: str = ""

@dataclass
class XFailRule(SampleRule):
    # expected error type
    error_type: TypeVar = Exception
    # expected error message
    error_msg: str = ".*"

@dataclass
class SkipRule(SampleRule):
    ...
```

* See below for example usage details, but at a high level: each test should have a corresponding list of `sample_skips_and_xfails`.
    * The list of `sample_skips_and_xfails` is traversed in order, and the first rule that matches (if any) is applied, so order can matter.
    * The PR includes a logging mechanism for matched rules accessible by setting the loglevel to `DEBUG`.
* The split between `op_match_fn` and `sample_match_fn` is made to allow pre-filtering of the list of rules to get only those that apply to the op under test.
* Each `SampleInput` is run within a subtest context so they can be individually skipped / xfailed as needed. This also means that a test will no longer stop after the first erroring `SampleInput`; all samples will be run through test logic.

### Example Usage
Consider the following OpInfo test:
```python
class MyTestCase(TestCase):
    @ops(op_db)
    def test_foo(self, device, dtype, op):
        for sample in op.sample_inputs(device, dtype, requires_grad=False):
            # do some SampleInput-based test logic
            output = op.op(sample.input, *sample.args, **sample.kwargs)
            ...
```

This is a common pattern for such tests; simply generate a list of `SampleInputs` and run them through the op. Now say you want to xfail one of these `SampleInput`s for a given op. Today, you have to xfail the entire test or hack around this in the test logic.

This PR lets you do this to get very flexible xfail / skips based on op / sample input properties:
```python
# NB: Define rules for per-SampleInput xfails / skips. These can also be defined in-line in the @ops decorator, but
# it can be more readable to maintain these somewhere else. These are attempted to be matched in order and
# the first one that matches applies, so order can matter.
FOO_SKIPS_AND_XFAILS = [
    XFailRule(
        error_type=ValueError,
        error_msg="2D inputs not supported",
        op_match_fn=lambda device, op: (
            # NB: logic for which ops this rule applies to goes here
            op.full_name == "add"
        ),
        sample_match_fn=lambda device, sample: (
            # NB: logic which samples this rule applies to goes here
            sample.input.dim() == 2
        ),
        # NB: optional rule identifier can help with debugging matched rules
        name="add_with_2D_inputs_not_supported",
    ),
    # NB: This follows a similar structure as XFailRule but without error_type / error_msg. Obviously
    # this skips a particular SampleInput instead of xfailing :)
    SkipRule(...),
    ...
]

class MyTestCase(TestCase):
    @ops(op_db)
    @sample_skips_and_xfails(FOO_SKIPS_AND_XFAILS)
    # NB: the @ops decorator automatically filters out any rules that don't apply to this op
    def test_foo(self, device, dtype, op):
        for sample, subtest_ctx in op.sample_inputs(
            # NB: use_subtests=True is required for skips / xfails to work. If skips / xfails are defined and use_subtests != True,
            # an informative error will be thrown.
            device, dtype, requires_grad=False, use_subtests=True
        ):
            # NB: this subtest context manager runs each sample input as a "subtest" and handles skips / xfails appropriately
            with subtest_ctx(self):
                # do some SampleInput-based test logic
                output = op.op(sample.input, *sample.args, **sample.kwargs)
                ...
```

More examples can be seen in `test/test_nestedtensor.py`, where this system is used in practice.

I also demonstrate usage of syntactic sugar over this system in `test/functorch/test_vmap.py`. Here, a skip for the `to()` operator is replaced with a granular xfail for `test_vmap_exhaustive()`:
```python
...
# pre-existing xfail
xfail("item"),
# new granular xfail using syntactic sugar over the general system
xfailIf(
    "to",
    lambda sample: (
        sample.kwargs["memory_format"] == torch.channels_last
    ),
),
...
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140443
Approved by: https://github.com/janeyx99, https://github.com/zou3519
ghstack dependencies: #140160, #138370
2024-11-19 23:09:38 +00:00
b63a84804c Allow NJT by default for weights_only torch.load (take 2) (#140739)
Per discussion with @malfet, only allow the weights_only unpickler to load NJT if `torch.nested` and `torch._dynamo` are imported.

(This is slightly weird since `torch.nested` is technically imported by default, and `torch._dynamo.decorators._DimRange` is what actually needs to be imported.)

We can't import this from `torch.nested` as this would:
- undo dynamo lazy import
- cause circular import
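
A hedged sketch of the gated behavior (assuming standard NJT serialization):
```python
import io
import torch
import torch._dynamo  # importing this (plus torch.nested) gates the NJT allowlist

nt = torch.nested.nested_tensor(
    [torch.randn(2), torch.randn(3)], layout=torch.jagged
)
buf = io.BytesIO()
torch.save(nt, buf)
buf.seek(0)
loaded = torch.load(buf, weights_only=True)  # allowed once the gating modules are imported
```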

===========================
Redo of https://github.com/pytorch/pytorch/pull/140304, which caused issues because `torch.nested._internal.foo` needs to be imported, resulting in errors like

```python
torch/_weights_only_unpickler.py", line 339, in load
    if full_path in _get_allowed_globals():
torch/_weights_only_unpickler.py", line 188, in _get_allowed_globals
    torch.nested._internal.nested_tensor.NestedTensor
AttributeError: module 'torch.nested' has no attribute '_internal'
```

**This likely wasn't caught in our CI because imports are global during unit tests(?), so we use subprocess to properly test this time**

Differential Revision: [D65961691](https://our.internmc.facebook.com/intern/diff/D65961691)

@jbschlosser
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140739
Approved by: https://github.com/malfet
2024-11-19 02:44:53 +00:00
e80b1b2870 Flex + NJT: cross attention support (#140723)
Fixes #140598

Allows ragged structures for query and key+value sequence lengths to differ (i.e. supports cross attention for Flex + NJT).

Technically, this is BC-breaking thanks to arg renaming and positional arg reordering in `create_nested_block_mask()`, but Flex + NJT support isn't in a major release yet so I'm hoping we can just do it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140723
Approved by: https://github.com/drisspg
2024-11-18 19:49:45 +00:00
5dc6b8c19e Revert "Allow NJT by default for weights_only torch.load (#140304)"
This reverts commit 1f28235ee2984dbad45b55aa65358b59a7aeea33.

Reverted https://github.com/pytorch/pytorch/pull/140304 on behalf of https://github.com/mikaylagawarecki due to Breaking internal tests due to missing torch.nested._internal ([comment](https://github.com/pytorch/pytorch/pull/140304#issuecomment-2473928461))
2024-11-13 15:24:00 +00:00
cb71bcc542 Replace clone.detach with detach.clone (#140264)
Fixes #64532

As stated in the issue, replace `clone().detach()` with `detach().clone()`.
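
A minimal sketch of the pattern change:
```python
import torch

x = torch.randn(3, requires_grad=True)

y_old = x.clone().detach()  # old pattern: the clone is recorded by autograd, then detached
y_new = x.detach().clone()  # preferred: detach first, so the clone is never tracked
assert not y_new.requires_grad
```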

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140264
Approved by: https://github.com/soulitzer
2024-11-13 07:01:02 +00:00
d6b3ad4de2 [Dynamo] Replace torch._dynamo.optimize() with torch.compile() [2/N] (#140238)
related commits:

- #139706
- #140238
- #140247
- #140253
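
A minimal before/after sketch of the migration (backend name is illustrative):
```python
import torch

def f(x):
    return torch.sin(x) + 1

# Old style (being replaced):
# compiled = torch._dynamo.optimize("inductor")(f)

# New style:
compiled = torch.compile(f, backend="inductor")
compiled(torch.randn(4))
```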

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140238
Approved by: https://github.com/soulitzer
2024-11-13 05:13:39 +00:00
3d2dd14217 [BE][Bugfix]: Add rad2deg to pointwise ops (#140290)
Adds missing pointwise tags. Apparently this allows NestedTensor to properly generate a function for the OpInfo.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140290
Approved by: https://github.com/jbschlosser
2024-11-13 00:02:00 +00:00
1f28235ee2 Allow NJT by default for weights_only torch.load (#140304)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140304
Approved by: https://github.com/jbschlosser
2024-11-12 23:34:27 +00:00
034b105d53 [BE][Ez]: Add NT unary op macro (#140213)
* Adds a macro to simplify adding more unary ops to NT.
* Adds sqrt support to NT
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140213
Approved by: https://github.com/jbschlosser
2024-11-12 19:50:06 +00:00
09bab7566a Revert "Allow NJT by default for weights_only torch.load (#140304)"
This reverts commit 455dc4c14264a0cd7d70ba5328382a9fb7769094.

Reverted https://github.com/pytorch/pytorch/pull/140304 on behalf of https://github.com/huydhn due to A bunch of failure shows up in trunk after this lands, so probably a landrace ([comment](https://github.com/pytorch/pytorch/pull/140304#issuecomment-2469602096))
2024-11-12 04:53:10 +00:00
455dc4c142 Allow NJT by default for weights_only torch.load (#140304)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140304
Approved by: https://github.com/jbschlosser
2024-11-12 02:04:18 +00:00
e7ec294c10 NJT OpInfo tests v2 (#138370)
This PR updates OpInfo-based tests for NJTs:
* Adds extensive coverage across non-contiguous NJTs (both non-contiguous transposed and non-contiguous with holes)
    * The `_sample_njts()` helper that `sample_input_func`s utilize now produces non-contig NJTs as well
* Utilizes a `SampleInput`-based xfail system for granular classification of bugs. For example, it's possible to indicate that a class of ops is expected to fail only on non-contig with holes NJT inputs.
    * I decided on adding `SampleInput`s and utilizing this system over using test parametrization for two reasons:
        * Test perf - adding `SampleInput`s is faster than generating entire new tests
        * Avoiding the possibility of `sample_input_func`s not respecting the non-contig test parameter - this would result in silently incorrect passing of these tests. Keeping the responsibility for `SampleInput` generation firmly within each `OpInfo`'s `sample_input_func` means weirdness like this isn't possible
* Improves `SampleInput` naming for a bunch of `sample_input_func`s. This makes it easier to xfail them as needed. For example, binary / unary / other ops now use the new `_describe_njt()` helper to get a string repr that uniquely defines the type of NJT being passed to the op
* Adds appropriate `XFailRule`s to get tests passing for forward / backward / forward compile / backward compile. In general, each xfail corresponds to some bug that needs to be fixed

```python
# Represents a rule indicating how to xfail a particular test. It allows granularity
# at the device, dtype, op, and individual sample levels. This flexibility allows entire
# bugs to be represented by a single rule, even if this corresponds with multiple conceptual
# test cases across multiple ops.
@dataclass
class XFailRule:
    # expected error type
    error_type: TypeVar = Exception
    # expected error message
    error_msg: str = ".*"
    # function to indicate whether the rule applies; return True if so
    match_fn: Callable[[torch.device, torch.dtype, OpInfo, SampleInput], bool] = None
    # optional name for identifying the rule
    name: str = ""

    def match(self, device, dtype, op, sample) -> bool:
        return self.match_fn(device, dtype, op, sample)
```

Example:
```python
    # Bug when broadcasting a binary op with non-contiguous with holes NJT + dense
    # tensor with 1 in ragged dim.
    XFailRule(
        error_type=RuntimeError,
        error_msg="cannot call binary pointwise function .* with inputs of shapes",
        match_fn=lambda device, dtype, op, sample: (
            isinstance(op, BinaryUfuncInfo)
            and "noncontig_holes" in sample.name
            and "broadcasting 1 over ragged" in sample.name
        ),
        name="binary_noncontig_holes_broadcasting_1_over_ragged",
    ),
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138370
Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer
ghstack dependencies: #140160
2024-11-11 19:35:24 +00:00
cbf449c83c [BE]: Add NT missing fp classification functions (#139890)
Follow-up to some issues with missing ops that @malfet's recent PR (#139763) pointed out. Tried to mirror it to other important nearby ops. It seems like we could automate / autogen this more for generic pointwise ops like this.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139890
Approved by: https://github.com/malfet
2024-11-06 22:00:54 +00:00
3abbde976d Allow any single non-batch dim to be ragged for NJT (#137125)
Fixes #137512

Relaxes the restriction that the ragged dim must be immediately next to the batch dim, e.g. `(B, *, D_0, ..., D_N)`. This allows NJTs of shape e.g. `(B, D, j0)` to be constructed directly. Before this PR, it was possible to get an NJT of shape `(B, D, j0)` by constructing an NJT of shape `(B, j0, D)` and transposing it; this PR allows a user to go straight there without the transpose. The standard `torch.nested.nested_tensor(list)` constructor has been updated to support this.

At the very least, this is useful for testing on transposed NJTs. I'm willing to make this functionality private if needed.
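
A rough construction sketch, assuming the updated constructor infers the ragged dim from the components as described:
```python
import torch

# Components with a fixed leading dim and a ragged trailing dim: shapes (4, L_i).
ts = [torch.randn(4, 2), torch.randn(4, 5), torch.randn(4, 3)]

# Direct construction of a (B, 4, j0) NJT, i.e. the ragged dim is not adjacent to batch.
nt = torch.nested.nested_tensor(ts, layout=torch.jagged)
assert nt._ragged_idx == 2

# Previously you had to build (B, j0, 4) and transpose:
nt_old = torch.nested.nested_tensor(
    [t.transpose(0, 1) for t in ts], layout=torch.jagged
).transpose(1, 2)
```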
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137125
Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer
2024-11-06 18:50:08 +00:00