125 Commits

4b8127b90e Revert "[Dynamo] Fix nested function resume execution (#100426)"
This reverts commit d719f0276d69a8315b65f4c4500cfc1cdaddb025.

Reverted https://github.com/pytorch/pytorch/pull/100426 on behalf of https://github.com/jeanschmidt due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/100426#issuecomment-1540915913))
2023-05-09 21:32:13 +00:00
d719f0276d [Dynamo] Fix nested function resume execution (#100426)
Fixes #99665

Let me explain the root cause using the unit test I added:
* This bug is triggered when:
  * `wrapped` is a nested function.
  * `wrapped` lives in a module different from the one containing the main function `fn`.
  * There is a graph break inside `wrapped`.
* The root cause: when resuming a nested function, we were actually using the outermost function's (`fn` in my example) globals, but `wrapped` calls `inner_func`, which is not part of `fn`'s globals, so we have to set the correct globals when the nested function resumes execution (see the sketch below).
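
A minimal repro sketch of the scenario (illustrative only; the module split is assumed and this is not the exact unit test from the PR):

```python
# mod.py -- lives in a different module from the main function `fn`
import torch

def inner_func(x):
    return x + 1

def gen_wrapped():
    def wrapped(x):
        x = x * 2
        torch._dynamo.graph_break()  # graph break inside the nested function
        return inner_func(x)         # resolved via mod.py's globals
    return wrapped
```

```python
# main.py
import torch
import mod

wrapped = mod.gen_wrapped()

@torch.compile(backend="eager")
def fn(x):
    return wrapped(x)

# Before the fix, the resume function for `wrapped` executed with `fn`'s
# globals, where `inner_func` is undefined, so resumption failed.
fn(torch.ones(3))
```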

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100426
Approved by: https://github.com/jansel
2023-05-06 05:04:50 +00:00
77dae43767 Don't truncate leading 1s if they are unbacked (#95141)
This prevents us from guarding on leading unbacked SymInts.

In my previous attempt, https://github.com/pytorch/pytorch/pull/94521, I got the logic a bit wrong. My idea there was to avoid slicing when the values to be set have low enough dimensionality that they definitely aren't too long. To do this, I need to compute the difference between the dimensionality of the data to be set and the post-slice space for the values, but in the original PR I incorrectly compared against the *pre-slice* space. Another incorrect version of this PR compared against variableIndices.size(); but remember that in advanced indexing with tensors/lists, each of the individual indices specifies what coordinates to read out of each dimension! A third incorrect attempt tested `variableIndices[0].dim()`, which is only correct if you don't broadcast one of the later variable indices and if there are enough variableIndices to cover all dims. This is all quite complicated, so I went for the simpler solution of checking whether the leading dim has a hint before testing whether it equals one.
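
To make the final check concrete, here is a hedged Python sketch (illustrative only, not the actual ATen code; `has_hint` stands in for the symbolic-shapes hint query):

```python
def strip_leading_ones(value_sizes, target_dim, has_hint):
    """Drop leading size-1 dims from the values until the dimensionality
    matches, but never compare an unbacked SymInt against 1, since that
    comparison would create a guard on it."""
    sizes = list(value_sizes)
    while len(sizes) > target_dim:
        if not has_hint(sizes[0]):  # unbacked leading dim: leave it alone
            break
        if sizes[0] != 1:
            break
        sizes.pop(0)
    return sizes
```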

BTW, there was previously no test for this leading-1 stripping behavior. There is now a test for it, based on the real code that caused the problem.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95141
Approved by: https://github.com/ngimel
2023-02-21 00:22:24 +00:00
d5d55363d9 Add broadcastable check to index_put (#94849)
Copy-n-paste the broadcastable check from
989299802c/aten/src/ATen/native/TensorAdvancedIndexing.cpp (L582-L583)

which is used for both the CPU and CUDA paths, except when the op is called on GPU with `deterministicAlgorithms()` set to true.

Follow-up: do the same for XLA and fix the case when the indices are not null.

Fixes https://github.com/pytorch/pytorch/issues/94667
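
A hedged illustration of the check in action (shapes chosen for the example):

```python
import torch

x = torch.zeros(4, 4)
idx = torch.tensor([0, 1])
values = torch.ones(3)  # (3,) is not broadcastable to the indexed shape (2, 4)

# With the check in place this raises a broadcast error on both CPU and CUDA
# instead of misbehaving.
x.index_put_((idx,), values)
```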

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94849
Approved by: https://github.com/ngimel
2023-02-17 20:37:23 +00:00
21dd311077 Add a mode to rerun all disabled tests (without running anything else) (#88646)
Rerun all disabled tests to gather their latest results so that we can close disabled tickets automatically. When running under this mode (RERUN_DISABLED_TESTS=true), only disabled tests are run while the rest are skipped: `<skipped message="Test is enabled but --rerun-disabled-tests verification mode is set, so only disabled tests are run" type="skip"/>`

The logic is roughly as follows; each test runs multiple times (n=50):

* If the disabled test passes only some of the time, it is still flaky, so do nothing. In the test report, we'll see the test pass with the following skipped message:
```
<testcase classname="TestMultiprocessing" file="test_multiprocessing.py" line="357" name="test_fs" time="0.000" timestamp="0001-01-01T00:00:00">
    <skipped message="{&quot;flaky&quot;: True, &quot;num_red&quot;: 4, &quot;num_green&quot;: 0, &quot;max_num_retries&quot;: 3, &quot;rerun_disabled_test&quot;: true}" type="skip"/>
</testcase>
```

* If the disabled test passes every single time, it is not flaky anymore; mark it so that the disabled ticket can be closed later. We will see the test run and pass, i.e.
```
<testcase classname="TestCommonCUDA" name="test_out_warning_linalg_lu_factor_cuda" time="0.170" file="test_ops.py" />
```

* If the disabled test fails after all retries, this is also expected, so just report it but don't fail the job (we don't care about red signals here). We'll see the test skipped (without the `flaky` field), i.e.
```
<testcase classname="TestMultiprocessing" file="test_multiprocessing.py" line="357" name="test_fs" time="0.000" timestamp="0001-01-01T00:00:00">
    <skipped message="{&quot;num_red&quot;: 4, &quot;num_green&quot;: 0, &quot;max_num_retries&quot;: 3, &quot;rerun_disabled_test&quot;: true}" type="skip"/>
</testcase>
```
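
Putting the three cases together, a hedged sketch of the classification logic (field names borrowed from the skipped messages above; the real implementation lives in the test harness):

```python
def classify_rerun(num_green: int, num_red: int) -> str:
    if num_red == 0:
        # Passed every retry: candidate for closing the disabled ticket.
        return "no longer flaky"
    if num_green == 0:
        # Failed every retry: report it, but don't fail the job.
        return "still failing"
    # Mixed results: still flaky, do nothing.
    return "flaky"
```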

This runs on the same schedule as `mem_leak_check` (daily).  The change to update test stats and (potentially) grouping on HUD will come in separate PRs.

### Testing

* pull https://github.com/pytorch/pytorch/actions/runs/3447434434
* trunk https://github.com/pytorch/pytorch/actions/runs/3447434928
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88646
Approved by: https://github.com/clee2000
2022-11-15 05:08:26 +00:00
dc9c507d24 add nominal support for int32 indices in index/index_put ops (#86309)
Currently the index_select/index_add decompositions decompose to the `index` or `index_put` ops. The problem is that `index_select` and `index_add` accept int32 indices while `index` doesn't, which leads to errors in the meta functions for those decompositions. This PR adds non-performant support for int32 indices to the `index` operations, allowing the decompositions to go through.
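
A hedged example of a path this unblocks (`index_select` accepts int32 indices and decomposes through `index`):

```python
import torch

x = torch.randn(10)
idx = torch.tensor([0, 3, 7], dtype=torch.int32)

# index_select accepts int32 indices; under decomposition it routes through
# aten::index, which previously required int64 and failed in the meta func.
out = torch.index_select(x, 0, idx)
```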

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86309
Approved by: https://github.com/lezcano
2022-10-05 23:59:16 +00:00
f37069aac7 Re-enable fixed dynamo tests (#84969)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84969
Approved by: https://github.com/bdhirsh, https://github.com/ezyang
2022-09-16 15:36:52 +00:00
f701cb04fb Test Dynamo CI w Fake Tensors (#84282)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84282
Approved by: https://github.com/anijain2305
2022-09-01 00:15:05 +00:00
cd41c8f032 fix set item to scalar tensor missing gradient info
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78246

Approved by: https://github.com/ngimel
2022-05-25 21:22:44 +00:00
394b4d853c Fix deterministic indexing with non-contiguous tensor
Fixes #76176 (first case)
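
A hedged sketch of the kind of case being fixed (requires a CUDA device; shapes and values are assumed, not taken from the issue):

```python
import torch

torch.use_deterministic_algorithms(True)

x = torch.zeros(4, 4, device="cuda").t()    # non-contiguous view of the storage
idx = torch.tensor([0, 0, 1], device="cuda")
src = torch.ones(3, 4, device="cuda")

# The deterministic (sort-based) CUDA path previously mishandled
# non-contiguous `self`; after the fix it matches the CPU result.
x.index_put_((idx,), src, accumulate=True)
```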

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76220
Approved by: https://github.com/mruberry
2022-04-22 18:46:50 +00:00
0988dc481a [Codemod][Codemod deprecated unittest asserts] fbcode//caffe2/test (#71708)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71708

In Python 3.2, a number of unittest assert aliases were deprecated.

In Python 3.11, these deprecated aliases are removed completely. The files in this change still use them.

Switch over to the syntax supported from Python 3.2 onwards.
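
An illustrative before/after with a few of the deprecated aliases (not an exhaustive list):

```python
import unittest

class Example(unittest.TestCase):
    def test_asserts(self):
        # Deprecated aliases, removed in Python 3.11:
        #   self.assertEquals(1, 1)
        #   self.assertRegexpMatches("abc", "a.c")
        #   self.failUnless(True)
        # Supported spellings from Python 3.2 onwards:
        self.assertEqual(1, 1)
        self.assertRegex("abc", "a.c")
        self.assertTrue(True)

if __name__ == "__main__":
    unittest.main()
```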

Test Plan: Tested on the internal test suite runner.

Reviewed By: ajtulloch

Differential Revision: D33503694

fbshipit-source-id: a150f296033260acf8365d77b837ce0679f57361
(cherry picked from commit abf60ed97409265222915d8265aaabedd625fd93)
2022-03-15 19:28:52 +00:00
a2ab06514b Fixes CUDA vs CPU consistency for index_put_ when accumulating (part 2) (#67189)
Summary:
Description:
- Follow up PR to https://github.com/pytorch/pytorch/issues/66790 to fix the tests on functorch, https://github.com/pytorch/functorch/issues/195

In functorch, a null tensor is added to the list of indices for the batch dimension in C++, but I cannot find an equivalent of that in Python without using `torch.jit.script`. If anyone can suggest a better solution, I'd be happy to replace the current way of testing.

cc ngimel zou3519

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67189

Reviewed By: suo

Differential Revision: D31966686

Pulled By: ngimel

fbshipit-source-id: a14b9e5d77d9f43cd728d474e2976d84a87a6ff4
2021-11-08 17:56:43 -08:00
efdb17b984 Add meta support to tensor range factories (#67032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67032

This PR adds meta backend support to the `range`, `arange`, `linspace`, and `logspace` operators.
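
A hedged example of what this enables: range factories can now produce meta tensors, where shape and dtype are computed without allocating real storage.

```python
import torch

t = torch.linspace(0, 1, steps=50, device="meta")
print(t.shape, t.device)  # torch.Size([50]) meta
```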

Note that the original PR (#66630) was reverted due to two failing unit tests in the Bionic CI. This revision includes a fix for those tests; otherwise its content is identical to the previous PR.

Original commit changeset: 2f9d8d1acbb0
ghstack-source-id: 142487306

Test Plan: Extended the existing tensor creation tests to assert meta backend support.

Reviewed By: zhaojuanmao

Differential Revision: D31834403

fbshipit-source-id: a489858a2a8a38a03234b14408e14d2b208a8d34
2021-11-05 15:36:29 -07:00
885a8e53ba replace onlyOnCPUAndCUDA with onlyNativeDeviceTypes (#65201)
Summary:
Reference https://github.com/pytorch/pytorch/issues/53849

Replace `onlyOnCPUAndCUDA` with `onlyNativeDeviceTypes`, which covers `cpu`, `cuda`, and `meta`.
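
A hedged sketch of the mechanical change in a device-generic test file:

```python
from torch.testing._internal.common_utils import TestCase
from torch.testing._internal.common_device_type import (
    instantiate_device_type_tests,
    onlyNativeDeviceTypes,
)

class MyTest(TestCase):
    @onlyNativeDeviceTypes  # was: @onlyOnCPUAndCUDA; now also covers meta
    def test_something(self, device):
        pass

instantiate_device_type_tests(MyTest, globals())
```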

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65201

Reviewed By: mrshenli

Differential Revision: D31299718

Pulled By: mruberry

fbshipit-source-id: 2d8356450c035d6a314209ab51b2c237583920fd
2021-11-01 09:22:34 -07:00
28fac23409 Fixes CUDA vs CPU consistency for index_put_ when accumulating (#66790)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/39227
Fixes https://github.com/pytorch/pytorch/issues/66495 (duplicate of 39227)

Description:
- Expands values for CUDA implementation
- Improved shape checking for CUDA
- Improved error message for CUDA
- Added tests
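
A hedged example of the consistency being fixed (requires CUDA; values assumed):

```python
import torch

idx = torch.tensor([0, 0, 2])
v = torch.tensor(1.0)  # 0-d value that must be expanded over the indices

x_cpu = torch.zeros(5)
x_cpu.index_put_((idx,), v, accumulate=True)

x_gpu = torch.zeros(5, device="cuda")
x_gpu.index_put_((idx.cuda(),), v.cuda(), accumulate=True)

print(x_cpu)        # tensor([2., 0., 1., 0., 0.])
print(x_gpu.cpu())  # now matches the CPU result
```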

cc zou3519

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66790

Reviewed By: mruberry

Differential Revision: D31843566

Pulled By: ngimel

fbshipit-source-id: c9e5d12a33e1067619c210174ba6e3cd66d5718b
2021-10-21 19:09:57 -07:00
8a65047acc [skip ci] Set test owners for everything considered with module: tests (#66865)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66865

Reviewed By: anjali411

Differential Revision: D31771147

Pulled By: janeyx99

fbshipit-source-id: 8bebe5ac2098364ef1ee93b590abb5f4455b0f89
2021-10-20 09:37:03 -07:00
d37636901e [Doc] make_tensor to torch.testing module (#63925)
Summary:
This PR aims to add `make_tensor` to the `torch.testing` module in PyTorch docs.
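
A hedged usage example of the helper being documented:

```python
import torch
from torch.testing import make_tensor

t = make_tensor((2, 3), dtype=torch.float32, device="cpu", low=-1.0, high=1.0)
print(t.shape, t.dtype)  # torch.Size([2, 3]) torch.float32
```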

TODOs:

* [x] Add examples

cc: pmeier mruberry brianjo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63925

Reviewed By: ngimel

Differential Revision: D30633487

Pulled By: mruberry

fbshipit-source-id: 8e5a1f880c6ece5925b4039fee8122bd739538af
2021-08-30 12:25:40 -07:00
1022443168 Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: revert-hammer

Differential Revision:
D30279364 (b004307252)

Original commit changeset: c1ed77dfe43a

fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e
2021-08-12 11:45:01 -07:00
b004307252 [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: manual inspection & sandcastle

Reviewed By: zertosh

Differential Revision: D30279364

fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a
2021-08-12 10:58:35 -07:00
42d6543c7b [bc-breaking] Dispatch index_put with boolean mask argument to masked_fill (#61612)
Summary:
https://github.com/pytorch/pytorch/issues/57515

Based on ngimel's branch, with a few tweaks to determine when to copy value tensors to device memory, plus additional tests.
bc-breaking note: Previously, if in `x[index] = value` the `value` was a 0-d tensor on a device different from `x`'s device, it resulted in a RuntimeError. Now this case is handled by copying `value` to the correct device.
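
A hedged example of the new behavior (requires CUDA):

```python
import torch

x = torch.zeros(4, device="cuda")
mask = torch.tensor([True, False, True, False], device="cuda")
v = torch.tensor(1.0)  # 0-d tensor on the CPU, a different device from x

x[mask] = v            # previously a RuntimeError; now v is copied to x's device
print(x.cpu())         # tensor([1., 0., 1., 0.])
```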

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61612

Reviewed By: mrshenli

Differential Revision: D29753491

Pulled By: ngimel

fbshipit-source-id: 3fba14f4c2b9b136b50af020f9c1eda88f7373b0
2021-07-19 22:53:14 -07:00
61f946bba6 don't copy indices to the self device in dispatch_index (#59059)
Summary:
Let the index/index_put implementations in ATen take care of moving the indices to the correct device instead of making the Python wrapper do it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59059

Reviewed By: mruberry

Differential Revision: D28750562

Pulled By: ngimel

fbshipit-source-id: 2f2b5f875733898f1c0b30b544c89808f91e4a6f
2021-05-27 14:19:59 -07:00
3de86b951d Migrate thrust->cub for index put (#55693)
Summary:
64-bit indexing is not supported because, if `num_indices = 2^31`, the four int64 tensors of `num_indices` elements each would take 64 GB of RAM (2^31 × 8 bytes × 4). I don't think anybody will be interested in running `index_put` with 64 GB of GPU RAM.

Benchmark on CUDA 11.3 RTX3090:
```python
import torch
import itertools

def run50_sync(f):
    for _ in range(50):
        f()
    torch.cuda.synchronize()

run50_sync(lambda: torch.randperm(1000000, device='cuda'))

def benchmark(M, L):
    a = torch.randn(M, device='cuda')
    i1 = torch.randint(M, (L,), dtype=torch.long, device='cuda')
    v = torch.randn(L, device='cuda')

    torch.cuda.synchronize()

    %timeit run50_sync(lambda:a.index_put_((i1,), v, True))

for M, L in itertools.product((100, 100000, 10000000), repeat=2):
    print(M, L)
    benchmark(M, L)
```

Before
```
100 100
5.13 ms ± 91 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
100 100000
30.2 ms ± 471 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
100 10000000
3.17 s ± 14.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
100000 100
5.19 ms ± 61.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
100000 100000
11.9 ms ± 200 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
100000 10000000
712 ms ± 3.49 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
10000000 100
5.07 ms ± 66.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10000000 100000
12.1 ms ± 76.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10000000 10000000
627 ms ± 7.65 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

After
```
100 100
3.75 ms ± 49.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
100 100000
26.2 ms ± 154 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
100 10000000
2.81 s ± 23.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
100000 100
3.85 ms ± 16.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
100000 100000
9.74 ms ± 40.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
100000 10000000
444 ms ± 1.86 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
10000000 100
3.85 ms ± 14.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10000000 100000
10.7 ms ± 116 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10000000 10000000
396 ms ± 2.63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55693

Reviewed By: albanD

Differential Revision: D27895967

Pulled By: ngimel

fbshipit-source-id: 0616ce33395ce46f1a4161dfd38940b8e54fedc2
2021-04-27 12:27:09 -07:00
399b66c813 Ports logdet from method_tests() to op_db (#55743)
Summary:
Per title. Also updates some tensor construction helpers.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55743

Reviewed By: ngimel

Differential Revision: D27702060

Pulled By: mruberry

fbshipit-source-id: f64b7bee855733ad1f4fd182819ceec5831d9878
2021-04-11 20:39:16 -07:00
93bf0ae6fc Remove legacy constructor calls from pytorch codebase. (#54142)
Summary:
Follow up from https://github.com/pytorch/pytorch/issues/53889
Related to https://github.com/pytorch/pytorch/issues/47112

Removing every occurrence of the legacy constructor call present in PyTorch at:
- _docs_
- _benchmarks_
- _test_
- _caffe2_
- _CONTRIBUTING.md_
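
For reference, a hedged sketch of the kind of call being replaced:

```python
import torch

# Legacy constructor being removed from the codebase; it always produces
# float32 and is ambiguous between sizes and data:
a = torch.Tensor([1, 2, 3])

# Preferred replacements:
b = torch.tensor([1.0, 2.0, 3.0])  # infers dtype from the data
c = torch.empty(3)                 # explicit uninitialized storage
```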

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54142

Reviewed By: ngimel

Differential Revision: D27699450

Pulled By: mruberry

fbshipit-source-id: 530aa3f5746cc8bc1407d5d51b2bbd8075e30546
2021-04-11 15:45:17 -07:00
0527d14248 [numpy] Add torch.take_along_dim (#52833)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/38349

Wrapper around the existing `torch.gather` with broadcasting logic.

TODO:
* [x] Add Doc entry (see if phrasing can be improved)
* [x] Add OpInfo
* [x] Add test against numpy
* [x] Handle broadcasting behaviour and when dim is not given.
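
A usage example mirroring NumPy's `take_along_axis`:

```python
import torch

t = torch.tensor([[10, 30, 20],
                  [60, 40, 50]])
idx = t.argsort(dim=1)

print(torch.take_along_dim(t, idx, dim=1))
# tensor([[10, 20, 30],
#         [40, 50, 60]])
```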

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52833

Reviewed By: malfet

Differential Revision: D27319038

Pulled By: mruberry

fbshipit-source-id: 00f307825f92c679d96e264997aa5509172f5ed1
2021-03-28 05:22:51 -07:00
dc070605f1 TST Replaces assertEqualIgnoreTypes with assertEqual in test_indexing (#53115)
Summary:
Related to https://github.com/pytorch/pytorch/issues/38095 and https://github.com/pytorch/pytorch/issues/50006

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53115

Reviewed By: mruberry

Differential Revision: D27086086

Pulled By: VitalyFedyunin

fbshipit-source-id: 7a6af6bcf3d7ce9ba96d47a24a40f451d00f0e67
2021-03-16 16:06:36 -07:00
4a2aa0f5f1 index_put_ for complex tensors on CUDA (#51148)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51148

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D26102025

Pulled By: anjali411

fbshipit-source-id: b1b6fd12fda03c4520a3c3200226edf352496188
2021-01-27 09:11:37 -08:00
50b361a821 Enable BF16 for indexing on CUDA (#48801)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48801

Reviewed By: glaringlee

Differential Revision: D25542914

Pulled By: ngimel

fbshipit-source-id: 4113eb2729d15b40a89268172cc37122b5213624
2020-12-14 17:24:31 -08:00
36c87f1243 Refactors test_torch.py to be fewer than 10k lines (#47356)
Summary:
Creates multiple new test suites to have fewer tests in test_torch.py, consistent with previous test suite creation like test_unary_ufuncs.py and test_linalg.py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47356

Reviewed By: ngimel

Differential Revision: D25202268

Pulled By: mruberry

fbshipit-source-id: 75fde3ca76545d1b32b86d432a5cb7a5ba8f5bb6
2020-11-28 20:11:40 -08:00
0aecbbb762 Changes TensorIterator computation to not consider out kwarg, lets UnaryOps safe cast to out (#39655)
Summary:
**BC breaking note:**

In PyTorch 1.5 passing the out= kwarg to some functions, like torch.add, could affect the computation. That is,

```
out = torch.add(a, b)
```

could produce a different tensor than

```
torch.add(a, b, out=out)
```

This is because previously the out argument participated in the type promotion rules. For greater consistency with NumPy, Python, and C++, in PyTorch 1.6 the out argument no longer participates in type promotion, and has no effect on the computation performed.
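
A hedged illustration of the 1.6 behavior (dtypes chosen for the example):

```python
import torch

a = torch.tensor([1, 2])              # int64
b = torch.tensor([1.5, 2.5])          # float32
out = torch.empty(2, dtype=torch.float64)

# The computation type, float32, is promoted from the inputs only; the result
# is then safely cast into the float64 `out`. In 1.5, out's dtype participated
# in promotion and could change the computation itself.
torch.add(a, b, out=out)
```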

**ORIGINAL PR NOTE**

This PR effectively rewrites Tensor Iterator's "compute_types" function to both clarify its behavior and change how our type promotion works to never consider the out argument when determining the iterator's "common dtype," AKA its "computation type." That is,

```
a = op(b, c)
```

should always produce the same result as

```
op(b, c, out=a)
```

This is consistent with NumPy and programming languages like Python and C++.

The conceptual model for this change is that a TensorIterator may have a "common computation type" that all inputs are cast to and its computation performed in. This common computation type, if it exists, is determined by applying our type promotion rules to the inputs.

A common computation type is natural for some classes of functions, like many binary elementwise functions (e.g. add, sub, mul, div...). (NumPy describes these as "universal functions.") Many functions, however, like indexing operations, don't have a natural common computation type. In the future we'll likely want to support setting the TensorIterator's common computation type explicitly to enable "floating ufuncs" like the sin function that promote integer types to the default scalar type. Logic like that is beyond the type promotion system, which can only review inputs.

Implementing this change in a readable and maintainable manner was challenging because compute_types() has had many small modifications from many authors over ~2 year period, and the existing logic was in some places outdated and in other places unnecessarily complicated. The existing "strategies" approach also painted with a broad brush, and two of them no longer made conceptual sense after this change. As a result, the new version of this function has a small set of flags to control its behavior. This has the positive effect of disentangling checks like all operands having the same device and their having the same dtype.

Additional changes in this PR:

- Unary operations now support out arguments with different dtypes. Like binary ops they check canCast(computation type, out dtype).
- The dtype checking for lerp was outdated and its error message included the wrong variable. It has been fixed.
- The check for whether all tensors are on the same device has been separated from other checks. TensorIterators used by copy disable this check.
- As a result of this change, the output dtype can be computed if only the input types are available.
- The "fast path" for checking if a common dtype computation is necessary has been updated and simplified to also handle zero-dim tensors.
- A couple helper functions for compute_types() have been inlined to improve readability.
- The confusingly named and no longer used promote_gpu_output_dtypes_ has been removed. This variable was intended to support casting fp16 reductions on GPU, but it has become a nullop. That logic is now implemented here: 856215509d/aten/src/ATen/native/ReduceOpsUtils.h (L207).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39655

Differential Revision: D21970878

Pulled By: mruberry

fbshipit-source-id: 5e6354c78240877ab5d6b1f7cfb351bd89049012
2020-06-10 09:04:13 -07:00
57d01be92b Replacing assertEqual with assertEqualIgnoreType wherever types mismatch (#38102)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38102

Test Plan: Imported from OSS

Differential Revision: D21477060

Pulled By: VitalyFedyunin

fbshipit-source-id: 25e0fd837ca9bfccf0ce994c80f7790c894096d4
2020-05-09 14:48:55 -07:00
b64fc3c4b5 Changes warnings generated in cpp to show point of Python origination (#36052)
Summary:
Today in PyTorch, warnings triggered in C++ are printed to Python users like this:

`../aten/src/ATen/native/BinaryOps.cpp:81: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.`

This may be unhelpful to Python users, who have complained it's difficult to relate these messages back to their programs. After this PR, warnings that go through the PyWarningHandler and allow it to add context print like this:

```
test/test_torch.py:16463: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead. (Triggered internally at  ../aten/src/ATen/native/BinaryOps.cpp:81.)
  cpu_result = getattr(cpu_tensor, op_str)(*cpu_args)
```

This relates the warning back to the user's program. The information about the cpp file and line number is preserved in the body of the warning message.

Some warnings, like those generated in the JIT, already account for a user's Python context, and so they specify that they should be printed verbatim and are unaffected by this change. Warnings originating in Python and warnings that go through c10's warning handler, which prints to cerr, are also unaffected.

A test is added to test_torch.py for this behavior. The test relies on uint8 indexing being deprecated and on its warning originating from its current header file, which is an unfortunate dependency. We could implement a `torch.warn` function instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36052

Differential Revision: D20887740

Pulled By: mruberry

fbshipit-source-id: d3515c6658a387acb7fccaf83f23dbb452f02847
2020-04-25 21:18:58 -07:00
a604041a11 Back out "[pytorch][PR] indexing: throw exception for masks with dtype=uint8" (#36013)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36013

Original commit changeset: f4ebaabf427d

Test Plan: CI

Differential Revision: D20853694

fbshipit-source-id: 93deb43f67a385ddfd6853fef6f1dc6de408ec37
2020-04-03 21:40:02 -07:00
2f84a07b58 indexing: throw exception for masks with dtype=uint8 (#34418)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33751
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34418

Differential Revision: D20776164

Pulled By: ngimel

fbshipit-source-id: f4ebaabf427d7967f2f317235562f91c8f9216f0
2020-03-31 20:51:56 -07:00
a500491cbc Fix index_put when tensor length > int_max (#33753)
Summary:
This PR fixes https://github.com/pytorch/pytorch/issues/33345.

The original CUDA kernel looks good. I changed most appearances of `int` to `int64_t` to avoid the CUDA memory-access issue, removed the two `TORCH_CHECK`s, and added a unit test.

cc csarofeen ngimel ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33753

Differential Revision: D20185005

Pulled By: ngimel

fbshipit-source-id: ef0abdc12ea680e10fe6b85266e2773c7a272f0d
2020-03-01 21:51:23 -08:00
f050b16dd9 Move pytorch distributed tests to separate folder for contbuild. (#30445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30445

Create distributed and rpc directories under caffe/test for better management
of unit tests.

Differential Revision: D18702786

fbshipit-source-id: e9daeed0cfb846ef68806f6decfcb57c0e0e3606
2020-01-22 21:16:59 -08:00
ee87b01f40 add additional types to indexing operations dispatch (#31692)
Summary:
- Fixes https://github.com/pytorch/pytorch/issues/31672
- Adds Bfloat16 dispatch to the indexing operations that were missing it
    - index_put on cuda does not have bfloat16 dispatch, because I'm not sure bfloat16 math ops work on cuda

Note: `index_put_` with `accum=True` is enabled for `bool`, which does not make much sense, but I'm not the one who started it, so this behavior is preserved.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31692

Differential Revision: D19249561

Pulled By: ngimel

fbshipit-source-id: 1269196194f7b9f611b32be198c001704731a78f
2019-12-29 23:03:54 -08:00
285cc13435 check devices for all input tensors in index_put (#31280)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/30960
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31280

Differential Revision: D19149114

Pulled By: ngimel

fbshipit-source-id: af185a98ac6ea614f43bbf865de02ea113d4ed56
2019-12-18 09:25:40 -08:00
9b875e1256 Buffer python warning to avoid deadlocks
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26613

Test Plan: Imported from OSS

Differential Revision: D18249633

Pulled By: albanD

fbshipit-source-id: 863f52400e1b97943a67a9e1abb09ae8d045e7f0
2019-11-07 08:35:06 -08:00
a9a9d362e2 Makes test_indexing.py device generic (#26634)
Summary:
- Makes test_indexing.py device generic
- Removes test_indexing_cuda.py

Note: a couple tests in test_indexing.py were already CPU and CUDA tests, meaning these tests were run multiple times when CUDA was available. Genericizing test_indexing.py corrects this and lets these tests be run on other device types, like XLA, too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26634

Differential Revision: D17529001

Pulled By: mruberry

fbshipit-source-id: e71ba28d947749255a0aceeb7b77a42c4811439d
2019-09-23 11:52:48 -07:00
19c675178f Updated docs and added deprecation warnings to acknowledge a bool tensor (#22261)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22261
ghimport-source-id: 1611d62d056a04c0ad15ef662e594a3d206a78e2

Test Plan: Imported from OSS

Differential Revision: D16005990

Pulled By: izdeby

fbshipit-source-id: 2413824aa75a0755719e4df11acd21e6607e5a85
2019-08-05 07:42:34 -07:00
00c1584979 Added possibility to index scalars by bool masks (#21030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21030
ghimport-source-id: 7a66ca096c62d050a38a6fcc9f6b2d61e387eb34

Differential Revision: D15530498

Pulled By: izdeby

fbshipit-source-id: d5d38f9610caa55fb7179d41f568c5ea5fa1f2e2
2019-05-29 09:32:55 -07:00
5950c1e8c4 Added indexing for bool tensors and bool Indices (#18583)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18583
ghimport-source-id: 2b1941449827f4ab632fa0f5c8cf0791a6be0845

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18583 Added indexing for bool tensors and bool Indices**
* #18505 Added numpy conversion
* #18166 Bool Tensor for CUDA

-----------
This PR enables bool tensor indexing and indexing with bool indices. This is a part of Bool Tensor feature implementation work. The whole plan looks like this:
1. Storage Implementation [Done]
2. Tensor Creation.
    a) CPU [Done]
    b) CUDA [In review]
3. Tensor Conversions. [In review]
4. Tensor Indexing. [This PR]
5. Tensor Operations.
6. Back compatibility related changes.

TODO:
as a follow-up, we should move the nonzero method from TH to ATen to make the code cleaner.

Change:
```
v = torch.tensor([True, False, True], dtype=torch.bool)
boolIndices = torch.tensor([True, False, False], dtype=torch.bool)
v[boolIndices]
-> tensor([True], dtype=torch.bool)

v = torch.randn(5, 7, 3)
boolIndices = torch.tensor([True, False, True, True, False], dtype=torch.bool)
v[boolIndices]
->
tensor([[[ 0.5885, -0.3322,  0.7388],
         [ 1.1182,  0.7808, -1.1492],
         [-0.7952,  0.5255, -0.0251],
         [ 0.7128,  0.8099,  1.2689],
         [-0.7018, -1.4733, -0.3732],
         [ 0.4503,  0.4986, -1.1605],
         [ 0.3348, -1.3767, -0.2976]],

        [[-2.0303, -0.4720, -0.1448],
         [-0.1914, -0.6821,  2.0061],
         [-1.0420, -0.1872, -0.3438],
         [ 1.7587, -0.4183, -0.7577],
         [ 1.0094, -0.1950, -0.2430],
         [ 0.1174,  0.3308, -0.5700],
         [ 0.1110, -0.2714,  1.3006]],

        [[-0.1946, -1.4747, -0.4650],
         [-1.0567,  1.0110, -0.2809],
         [ 0.3729, -0.5699,  0.0815],
         [-0.7733, -0.8316,  0.1674],
         [ 1.2000, -0.3745, -1.1679],
         [ 1.7105,  0.9851, -0.1907],
         [-1.1077,  0.2086, -0.0548]]])
```

Differential Revision: D14673403

fbshipit-source-id: 2b88ec2c7eb26a4f5ef64f8707fb68068d476fc9
2019-04-03 12:47:26 -07:00
173f224570 Turn on F401: Unused import warning. (#18598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**

This was requested by someone at Facebook; this lint is turned
on for Facebook by default.  "Sure, why not."

I had to noqa a number of imports in __init__.  Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it.  Left for future work.

Be careful!  flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments.  flake8-3 will
report an import unused; flake8-2 will not.  For now, I just
noqa'd all these sites.

All the changes were done by hand.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14687478

fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
2019-03-30 09:01:17 -07:00
a5e7b1d032 Use IndexError instead of RuntimeError in ATen CPU kernels
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17049

Reviewed By: ezyang

Differential Revision: D14064700

Pulled By: fmassa

fbshipit-source-id: 3575db103bba5a7d82f574cbb082beca419151ec
2019-02-13 10:19:28 -08:00
482d3a3bf3 printing correct dimension while indexing (#16495)
Summary:
applySelect modifies the tensor and removes the topmost dimension, which makes the dimension complicated to track using `dim` alone; we need another parameter, `real_dim`, to record the original dimension.
fixes #16192
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16495

Differential Revision: D13897182

Pulled By: gchanan

fbshipit-source-id: 105581dbbff6b431cc8e2539a07e0058161e53a1
2019-01-31 11:45:56 -08:00
d6cbcb43c5 allow numpy-like boolean-list indexing in pytorch (#14932)
Summary:
Suggested fix for issue #6773; the fix allows NumPy-like boolean-list indexing in PyTorch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14932

Differential Revision: D13398795

Pulled By: ezyang

fbshipit-source-id: 67f8daf9829db2550ff76d2bde673be6dd2708cd
2018-12-20 15:33:06 -08:00
04b65dfd1f Issue 14984: Remove divide by zero error in index_put_ (#14986)
Summary:
Since https://github.com/pytorch/pytorch/pull/13420, no check for a zero-element index tensor was done in the accumulate=True (serial) case in the new TensorIterator code.

https://github.com/pytorch/pytorch/issues/14984
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14986

Differential Revision: D13417861

Pulled By: colesbury

fbshipit-source-id: e6ed1af8f708b53a35803fc157ed1f043169ec89
2018-12-11 13:38:12 -08:00
c1c841a4e7 Changes based on @gchanan's review of #13420 (#14441)
Summary:
```
The most significant change is that this fixes the error message when
indexing an empty tensor with an out-of-bounds index. For example:

  x = torch.ones(10, 0)
  x[:, [3, 4]]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14441

Differential Revision: D13226737

Pulled By: colesbury

fbshipit-source-id: d1c4a35a30e3217e3d1727d13f6b354a4a3b2a24
2018-11-30 11:03:20 -08:00
006505bb8f Speed-up "advanced" indexing operations (#13420)
Summary:
This speeds up "advanced" indexing (indexing a tensor by a tensor)
on CPU and GPU. There's still a bunch of work to do, including
speeding up indexing by a byte (boolean) mask and speeding up the derivative
calculation for advanced indexing.

Here are some speed comparisons against indexing on master using a little [benchmark script](https://gist.github.com/colesbury/c369db72aad594e5e032c8fda557d909) with 16 OpenMP threads and on a P100. The test cases are listed as (input shape -> output shape).

| Test case             | CPU (old vs. new)   | CUDA (old vs. new)     |
|-----------------------|---------------------|------------------------|
| 1024x1024 -> 512x1024 | 225 us vs. **57 us**  | 297 us vs. **47 us** |
| 1024x1024 -> 1024x512 | 208 us vs. **153 us** | 335 us vs. **54 us** |
| 50x50 -> 20000x50     | 617 us vs. **77 us**  | 239 us vs. **54 us** |
| 50x50 -> 50x20000     | 575 us vs. **236 us** | 262 us vs. **58 us** |
| 2x5x10 -> 10          | 65 us  vs. **18 us**  | 612 us vs. **93 us** |

See #11647
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13420

Reviewed By: soumith

Differential Revision: D13088936

Pulled By: colesbury

fbshipit-source-id: 0a5c2ee9aa54e15f96d06692d1694c3b24b924e2
2018-11-27 15:23:59 -08:00