164 Commits

Author SHA1 Message Date
d5988c5eca remove unused type: ignore directives (#60006)
Summary:
During development it is common practice to put `type: ignore` comments on lines that are correct but that `mypy` fails to recognize as such. This often happens because the `mypy` version in use at the time could not handle the pattern.

With every new release `mypy` gets better at handling complex code. In addition to fixing all the previously accepted but now failing patterns, we should also revisit all `type: ignore` comments to see whether they are still needed. Fortunately, we don't need to do it manually: by adding `warn_unused_ignores = True` to the configuration, `mypy` will error out when it encounters a `type: ignore` that is no longer needed.
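
As a small illustration of the kind of leftover this setting catches, here is a minimal sketch. The function `total` is hypothetical and not taken from the codebase. The line carrying the comment type-checks cleanly on its own, so the ignore is redundant.

```python
from typing import List


def total(values: List[int]) -> int:
    # Hypothetical leftover: this line needs no suppression, so with
    # warn_unused_ignores enabled mypy flags the comment as unused.
    return sum(values)  # type: ignore
```

The comment has no effect at runtime, which is why such leftovers tend to go unnoticed until the type checker is asked to report them.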

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60006

Reviewed By: jbschlosser, malfet

Differential Revision: D29133237

Pulled By: albanD

fbshipit-source-id: 41e82edc5cd5affa7ccedad044b59b94dad4425a
2021-06-18 07:23:31 -07:00
b8ab98626b only runs mem leak check on master (#60023)
Summary:
Set an environment variable so that the CUDA memory leak check only runs on master CI jobs.

See discussion in https://github.com/pytorch/pytorch/pull/59402#issuecomment-860773034

See stats before/after disabling mem leak check: https://github.com/pytorch/pytorch/pull/59942#issuecomment-860947095

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60023

Test Plan:
https://github.com/pytorch/pytorch/issues/60108
https://github.com/pytorch/pytorch/issues/60116

Reviewed By: janeyx99

Differential Revision: D29164182

Pulled By: walterddr

fbshipit-source-id: dfe88c2c1275b6eb35f18b58aacdc220f34ccb59
2021-06-17 07:56:26 -07:00
a62f6b6d04 ci: Add skipIfOnGHA util (#59748)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59748

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D29008217

Pulled By: seemethere

fbshipit-source-id: ffc2f7935df722f26c1252e3833085430ada7433
2021-06-09 21:19:26 -07:00
97dfc7e300 [Reland] Adding run specified tests option to run_test.py (#59649)
Summary:
Reland of https://github.com/pytorch/pytorch/issues/59487

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59649

Reviewed By: samestep

Differential Revision: D28970751

Pulled By: janeyx99

fbshipit-source-id: 6e28d4dcfdab8a49da4b6a02c57516b08bacd7b5
2021-06-08 16:04:46 -07:00
0208e604e3 seems os.environ.get() not working well on windows (#59634)
Summary:
replace with os.getenv() instead

For some reason this was intermittently failing azure pipelines. I can't login to the pipeline itself for debugging but here are 2 examples: [successful](https://app.circleci.com/pipelines/github/pytorch/pytorch/332405/workflows/944609ad-5dcf-49da-984f-26c381d1f16c/jobs/13969059) vs [failed](https://app.circleci.com/pipelines/github/pytorch/pytorch/332518/workflows/21f8a5a6-3b95-432e-be42-ac98008c671b/jobs/13975637)

However, given that the other constants exposed by common_utils.py using `os.getenv()` were working, I am making them consistent.
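
A minimal sketch of reading a boolean flag through `os.getenv()`, in the consistent style described above. The flag name `EXAMPLE_ONLY_ON_MASTER` and the helper `flag_enabled` are hypothetical and used only for illustration; they are not the names used in common_utils.py.

```python
import os

# Hypothetical flag name, used only for this illustration.
FLAG_NAME = "EXAMPLE_ONLY_ON_MASTER"


def flag_enabled(name: str, default: str = "0") -> bool:
    """Return True when the environment variable is set to "1"."""
    return os.getenv(name, default) == "1"
```

Calling `flag_enabled(FLAG_NAME)` returns False when the variable is unset, because the default of "0" is used.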

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59634

Test Plan: CI/master

Reviewed By: jbschlosser

Differential Revision: D28966412

Pulled By: walterddr

fbshipit-source-id: 7bcb9adf06df0acabd9574459eb6637c3e6a2947
2021-06-08 11:59:39 -07:00
5d6a10a765 Revert D28913223: [pytorch][PR] Adding run-specified-test-cases option in run_test.py
Test Plan: revert-hammer

Differential Revision:
D28913223 (24432eaa29)

Original commit changeset: 0d1f99109734

fbshipit-source-id: 47c073720cff23a5d4cb64556381c46025e90937
2021-06-08 02:18:16 -07:00
57d8bccd00 only reorder tests based on git diff if IN_CI (#59565)
Summary:
Do not reorder tests unless IN_CI is set; reordering makes the test order nondeterministic during local development, since most of us branch out from viable/strict rather than the head of master.
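
The gating described above could be sketched roughly as follows. The function `maybe_reorder` and its parameters are hypothetical, and the real logic in run_test.py may differ.

```python
from typing import List, Set


def maybe_reorder(tests: List[str], changed: Set[str], in_ci: bool) -> List[str]:
    """Move tests touched by the diff to the front, but only in CI."""
    if not in_ci:
        # Local runs keep the default, deterministic order.
        return list(tests)
    prioritized = [t for t in tests if t in changed]
    rest = [t for t in tests if t not in changed]
    return prioritized + rest
```

With `in_ci=False` the input order is returned unchanged, so local runs do not depend on the state of the working tree.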

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59565

Reviewed By: ejguan

Differential Revision: D28943906

Pulled By: walterddr

fbshipit-source-id: e742e7ce4b3fc017d7563b01e93c4cd774d0a537
2021-06-07 17:54:19 -07:00
24432eaa29 Adding run-specified-test-cases option in run_test.py (#59487)
Summary:
The run-specified-test-cases option would allow us to specify a list of test cases to run by having a CSV with minimally two columns: test_filename and test_case_name.

This PR also adds .json to some files we use for better clarity.

Usage:
`python test/run_test.py --run-specified-test-cases <csv_file>` where the csv file can look like:
```
test_filename,test_case_name,test_total_time,windows_only_failure_sha_count,total_sha_count,windows_failure_count,linux_failure_count,windows_total_count,linux_total_count
test_cuda,test_cudnn_multiple_threads_same_device,8068.8409659525,46,3768,53,0,2181,6750
test_utils,test_load_standalone,8308.8062920459,14,4630,65,0,2718,8729
test_ops,test_forward_mode_AD_acosh_cuda_complex128,91.652619369806,11,1971,26,1,1197,3825
test_ops,test_forward_mode_AD_acos_cuda_complex128,91.825633094915,11,1971,26,1,1197,3825
test_profiler,test_source,60.93786725749,9,4656,21,3,2742,8805
test_profiler,test_profiler_tracing,203.09352795241,9,4662,21,3,2737,8807
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59487

Test Plan:
Without specifying the option, everything should behave as it did before.

Running `python test/run_test.py --run-specified-test-cases windows_smoke_tests.csv` resulted in this paste P420276949 (you can see internally). A snippet looks like:
```
(pytorch) janeyx@janeyx-mbp pytorch % python test/run_test.py --run-specified-test-cases windows_smoke_tests.csv
Loading specified test cases to run from windows_smoke_tests.csv.
Processed 28 test cases.
Running test_cpp_extensions_jit ... [2021-06-04 17:24:41.213644]
Executing ['/Users/janeyx/miniconda3/envs/pytorch/bin/python', 'test_cpp_extensions_jit.py', '-k', 'test_jit_cuda_archflags'] ... [2021-06-04 17:24:41.213781]
s
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK (skipped=1)
...
```
With pytest, an example executable would be:
`Running test_dataloader ... [2021-06-04 17:37:57.643039]
Executing ['/Users/janeyx/miniconda3/envs/pytorch/bin/python', '-m', 'pytest', 'test_dataloader.py', '-v', '-k', 'test_segfault or test_timeout'] ... [2021-06-04 17:37:57.643327]`
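
A rough sketch of how such a CSV might be turned into per-file `-k` filters like the ones in the logs above. The helpers `load_specified_test_cases` and `k_expression` are hypothetical names; only the two column names `test_filename` and `test_case_name` come from the description.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def load_specified_test_cases(csv_text: str) -> Dict[str, List[str]]:
    """Group test case names by test file, ignoring any extra columns."""
    cases = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        cases[row["test_filename"]].append(row["test_case_name"])
    return dict(cases)


def k_expression(test_cases: List[str]) -> str:
    """Join test case names into a single '-k' filter expression."""
    return " or ".join(test_cases)
```

Grouping by file first means each test file is launched once with a combined filter, rather than once per test case.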

Reviewed By: samestep

Differential Revision: D28913223

Pulled By: janeyx99

fbshipit-source-id: 0d1f9910973426b8756815c697b483160517b127
2021-06-07 16:27:43 -07:00
344ecb2e71 flip via TI (#59509)
Summary:
Resubmit of https://github.com/pytorch/pytorch/issues/58747

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59509

Reviewed By: mruberry

Differential Revision: D28918665

Pulled By: ngimel

fbshipit-source-id: b045c7b35eaf22e53b1bc359ffbe5a4fda05dcda
2021-06-05 15:43:29 -07:00
5117ac3bb4 Revert D28877076: [pytorch][PR] torch.flip via TI
Test Plan: revert-hammer

Differential Revision:
D28877076 (d82bc3feb8)

Original commit changeset: 4fa6eb519085

fbshipit-source-id: c81e7d3283ff6822db913bf9f49a1533268755d0
2021-06-04 23:03:53 -07:00
d82bc3feb8 torch.flip via TI (#58747)
Summary:
Implements an idea by ngimel to improve the performance of `torch.flip` via a clever hack into TI to bypass the fact that TI is not designed to work with negative indices.

Something that might be added is vectorisation support on CPU, given how simple the implementation is now.

Some low-hanging fruits that I did not implement:
- Write it as a structured kernel
- Migrate the tests to opinfos
- Have a look at `cumsum_backward` and `cumprod_backward`,  as I think that they could be implemented faster with `flip`, now that `flip` is fast.
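
To make the last point concrete: the gradient of a cumulative sum is a reversed cumulative sum of the incoming gradient, which can be written as flip, cumsum, flip. The sketch below uses plain Python lists rather than tensors, and the helper names are hypothetical; it only illustrates the identity, not the actual backward implementation.

```python
from itertools import accumulate
from typing import List


def flip(xs: List[float]) -> List[float]:
    """Reverse a 1-D sequence."""
    return xs[::-1]


def cumsum_backward_via_flip(grad: List[float]) -> List[float]:
    """Reverse cumulative sum of grad, computed as flip(cumsum(flip(grad)))."""
    return flip(list(accumulate(flip(grad))))
```

Entry `j` of the result is the sum of `grad[j:]`, which is the contribution of input `j` to every later cumulative sum.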

**Edit**
This operation already has OpInfos, and it cannot be migrated to a structured kernel because it implements quantisation.

Summary of the PR:
- x1.5-3 performance boost on CPU
- x1.5-2 performance boost on CUDA
- Comparable performance across dimensions, regardless of the strides (thanks TI)
- Simpler code

<details>
<summary>
Test Script
</summary>

```python
from itertools import product

import torch
from torch.utils.benchmark import Compare, Timer

def get_timer(size, dims, num_threads, device):
    x = torch.rand(*size, device=device)

    timer = Timer(
        "torch.flip(x, dims=dims)",
        globals={"x": x, "dims": dims},
        label=f"Flip {device}",
        description=f"dims: {dims}",
        sub_label=f"size: {size}",
        num_threads=num_threads,
    )

    return timer.blocked_autorange(min_run_time=5)

def get_params():
    sizes = ((1000,)*2, (1000,)*3, (10000,)*2)
    for size, device in product(sizes, ("cpu", "cuda")):
        threads = (1, 2, 4) if device == "cpu" else (1,)
        list_dims = [(0,), (1,), (0, 1)]
        if len(size) == 3:
            list_dims.append((0, 2))
        for num_threads, dims in product(threads, list_dims):
            yield size, dims, num_threads, device

def compare():
    compare = Compare([get_timer(*params) for params in get_params()])
    compare.trim_significant_figures()
    compare.colorize()
    compare.print()

compare()
```
</details>

<details>
<summary>
Benchmark PR
</summary>

![image](https://user-images.githubusercontent.com/3291265/119139954-81e46d80-ba3b-11eb-9aad-e825e515d41b.png)

</details>

<details>
<summary>
Benchmark master
</summary>

![image](https://user-images.githubusercontent.com/3291265/119139915-76914200-ba3b-11eb-9aa8-84b3ca220c93.png)

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58747

Reviewed By: agolynski

Differential Revision: D28877076

Pulled By: ngimel

fbshipit-source-id: 4fa6eb519085950176cb3a9161eeb3b6289ec575
2021-06-04 20:13:38 -07:00
e9e5588588 Improve Tensor traverse to traverse its grad_fn when possible (#58271)
Summary:
There are two main changes here:
- A THPVariable will now visit its grad_fn if there is no other reference to the C++ Tensor and no other reference to the grad_fn. The critical observation compared to the existing comment (thanks Ed!) is that if we also check that the C++ Tensor object is not referenced anywhere else, we can be sure that no one can change the grad_fn refcount between the traverse and the clear.
- A THPVariable doesn't need a special clear for this new case, as we're the only owner of the C++ Tensor, so the cdata.reset() will necessarily free the Tensor and all its resources.

The two tests are to ensure:
- That the cycles are indeed collectible by the gc
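
The shape of such a collectibility test can be sketched with plain Python objects. This is only an analogy: `Node` is a hypothetical stand-in, and the real tests build cycles through tensors and their grad_fn.

```python
import gc
import weakref


class Node:
    """Plain object that can hold a reference to another Node."""

    def __init__(self):
        self.other = None


def cycle_is_collected() -> bool:
    """Build a two-object reference cycle and check that gc reclaims it."""
    a, b = Node(), Node()
    a.other, b.other = b, a  # reference cycle: a -> b -> a
    probe = weakref.ref(a)   # does not keep `a` alive
    del a, b
    gc.collect()
    return probe() is None
```

The weak reference is the probe: it becomes dead only once the garbage collector has traversed the cycle and freed the objects.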

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58271

Reviewed By: ngimel

Differential Revision: D28796461

Pulled By: albanD

fbshipit-source-id: 62c05930ddd0c48422c79b03118db41a73c1355d
2021-06-01 10:27:52 -07:00
ea465f7378 OpInfo: true_divide and minor fix (#59154)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59154

Reviewed By: ngimel

Differential Revision: D28780115

Pulled By: mruberry

fbshipit-source-id: 91e254698597fa0c7d4df6053ec017a85e180304
2021-05-30 18:35:30 -07:00
0c1420aa3c OpInfo: fmod and remainder (#57941)
Summary:
See https://github.com/pytorch/pytorch/issues/54261

cc: mruberry Lezcano kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57941

Reviewed By: mrshenli

Differential Revision: D28744464

Pulled By: mruberry

fbshipit-source-id: 19847277d4f8d3a39a706c2b3c9eddf0dedcb20c
2021-05-27 20:32:56 -07:00
b14c3205fd [JIT] Add torch._C.ScriptDict (#52659)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52659

**Summary**
This commit adds `torch._C.ScriptDict`, a dictionary type that has reference
semantics across the Python/TorchScript boundary. That is, modifications
made to instances of `torch._C.ScriptDict` in TorchScript are visible in
Python even when it is not returned from the function. Instances can be
constructed by passing an instance of a Python dictionary to
`torch.jit.script`. In the case of an empty dictionary, its type is
assumed to be `Dict[str, Tensor]` to be consistent with the handling of
empty dictionaries in TorchScript source code.

`torch._C.ScriptDict` is implemented using a modified version of pybind's `stl_bind.h`-style bindings attached to `ScriptDict`, `ScriptDictIterator` and `ScriptDictKeyIterator`, wrapper classes around `c10::impl::GenericDict` and `c10::impl::GenericDict::iterator`. These bindings allow instances of `torch._C.ScriptDict` to be used as if they were regular Python `dict` objects. Reference semantics are achieved by simply retrieving the `IValue` contained in `ScriptDict` in `toIValue` (invoked when converting Python arguments to `IValues` before calling TorchScript code).

**Test Plan**
This commit adds `TestScriptDict` to `test_list_dict.py`, a set of tests
that check that all of the common dictionary operations are supported
and that instances have reference semantics across the
Python/TorchScript boundary.

Differential Revision:
D27211605

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Pulled By: SplitInfinity

fbshipit-source-id: 446d4e5328375791aa73eb9e8b04dfe3465af960
2021-05-27 10:25:30 -07:00
aaca12bcc2 Deprecate in docs torch.svd and change svd -> linalg_svd (#57981)
Summary:
This PR adds a note to the documentation that torch.svd is deprecated together with an upgrade guide on how to use `torch.linalg.svd` and `torch.linalg.svdvals` (Lezcano's instructions from https://github.com/pytorch/pytorch/issues/57549).
In addition, all usage of the old svd function is replaced with a new one from torch.linalg module, except for the `at::linalg_pinv` function, that fails the XLA CI build (https://github.com/pytorch/xla/issues/2755, see failure in draft PR https://github.com/pytorch/pytorch/pull/57772).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57981

Reviewed By: ngimel

Differential Revision: D28345558

Pulled By: mruberry

fbshipit-source-id: 02dd9ae6efe975026e80ca128e9b91dfc65d7213
2021-05-11 18:04:10 -07:00
29753339b7 Do not download slow test when on sandcastle (#57953)
Summary:
Downloading the slow_test list on SC causes a timeout; this is an even bigger issue since `common_utils.py` is reused in many internal projects/modules.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57953

Test Plan: CI

Reviewed By: janeyx99

Differential Revision: D28325527

fbshipit-source-id: ae47c9e43ad6f416008005bb26ceb2f3d6966f2e
2021-05-10 10:39:10 -07:00
18c89a904b Modernize test-suite in sparse tensor CSR (#56392)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56392

Fixes for gh-56371 and gh-56369

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D27913212

Pulled By: mruberry

fbshipit-source-id: 2c78fe9fa4b6c6b566d9eb01f71e6016d672a545
2021-04-27 15:22:17 -07:00
d01302431c Enable fast gradcheck for real inputs and outputs (#55237)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55237

In this PR, we reenable fast-gradcheck and resolve misc issues that arise:
Before landing this PR, land #55182 so that slow tests are still being run periodically.

Bolded indicates the issue is handled in this PR, otherwise it is handled in a previous PR.

**Non-determinism issues**:
- ops that do not have deterministic implementation (as documented https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html#torch.use_deterministic_algorithms)
  - test_pad_cuda (replication_pad2d) (test_nn)
  - interpolate (test_nn)
  - cummin, cummax (scatter_add_cuda_kernel) (test_ops)
  - test_fn_gradgrad_prod_cpu_float64 (test_ops)

Randomness:
  - RRelu (new module tests) - we fix by using our own generator as to avoid messing with user RNG state (handled in #54480)

Numerical precision issues:
- jacobian mismatch: test_gelu (test_nn, float32, not able to replicate locally) - we fixed this by disabling for float32 (handled in previous  PR)
- cholesky_solve (test_linalg): #56235 handled in previous PR
- **cumprod** (test_ops) - #56275 disabled fast gradcheck

Not yet replicated:
 - test_relaxed_one_hot_categorical_2d (test_distributions)

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D27920906

fbshipit-source-id: 894dd7bf20b74f1a91a5bc24fe56794b4ee24656
2021-04-22 19:46:37 -07:00
5dcc7ac35c Add new scheduled job to circle-ci workflow (#55182)
Summary:
Under this setting the job should run 3 times a day.

When the environment variable `PYTORCH_TEST_WITH_SLOW_GRADCHECK` is set to `ON`, set the default value for `fast_mode` in the gradcheck wrapper to False. This would be overridden by whatever value the user explicitly passes in.
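
The defaulting rule can be sketched as below. The helper `resolve_fast_mode` is hypothetical, and the sketch assumes that the default is fast mode whenever the variable is not `ON`; the actual wrapper may differ.

```python
import os
from typing import Optional


def resolve_fast_mode(fast_mode: Optional[bool] = None) -> bool:
    """Pick the effective fast_mode; an explicit argument always wins."""
    if fast_mode is not None:
        return fast_mode
    # Assumption: fast mode is the default unless slow gradcheck is requested.
    slow = os.getenv("PYTORCH_TEST_WITH_SLOW_GRADCHECK", "OFF") == "ON"
    return not slow
```

Using `None` as the sentinel lets the wrapper tell "not passed" apart from an explicit `False`.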

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55182

Reviewed By: albanD

Differential Revision: D27919236

Pulled By: soulitzer

fbshipit-source-id: 3a55ec6edcfc6e65fbc3a8a09c63aaea1bd1c5bf
2021-04-21 17:05:10 -07:00
59b61f912a Switch assertWarnsOnceRegex logic to check any instead of all. (#56434)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56434

If we hit multiple TORCH_WARN from different sources when running the
statement, it makes more sense to me that we want to check the regex is
met in any one of the warning messages instead of all messages.
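
The difference between the two checks can be sketched with the standard `warnings` module. The helper names are hypothetical and this is not the implementation of `assertWarnsOnceRegex`.

```python
import re
import warnings
from typing import Callable, List


def _collect_messages(fn: Callable[[], None]) -> List[str]:
    """Run fn and return the text of every warning it raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        fn()
    return [str(w.message) for w in caught]


def any_warning_matches(fn: Callable[[], None], pattern: str) -> bool:
    """New behaviour: at least one warning must match the pattern."""
    return any(re.search(pattern, m) for m in _collect_messages(fn))


def all_warnings_match(fn: Callable[[], None], pattern: str) -> bool:
    """Old behaviour: every warning must match the pattern."""
    messages = _collect_messages(fn)
    return bool(messages) and all(re.search(pattern, m) for m in messages)
```

With two warnings from different sources, only the `any` variant passes when just one of them matches the expected regex.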

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D27871946

Pulled By: ailzhang

fbshipit-source-id: 5940a8e43e4cc91aef213ef01e48d506fd9a1132
2021-04-20 10:37:36 -07:00
e3900d2ba5 Add lint for unqualified noqa (#56272)
Summary:
As this diff shows, currently there are a couple hundred instances of raw `noqa` in the codebase, which just ignore all errors on a given line. That isn't great, so this PR changes all existing instances of that antipattern to qualify the `noqa` with respect to a specific error code, and adds a lint to prevent more of this from happening in the future.

Interestingly, some of the examples the `noqa` lint catches are genuine attempts to qualify the `noqa` with a specific error code, such as these two:
```
test/jit/test_misc.py:27:            print(f"{hello + ' ' + test}, I'm a {test}") # noqa E999
test/jit/test_misc.py:28:            print(f"format blank") # noqa F541
```
However, those are still wrong because they are [missing a colon](https://flake8.pycqa.org/en/3.9.1/user/violations.html#in-line-ignoring-errors), which actually causes the error code to be completely ignored:

- If you change them to anything else, the warnings will still be suppressed.
- If you add the necessary colons then it is revealed that `E261` was also being suppressed, unintentionally:
  ```
  test/jit/test_misc.py:27:57: E261 at least two spaces before inline comment
  test/jit/test_misc.py:28:35: E261 at least two spaces before inline comment
  ```
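
A rough sketch of what a check for unqualified `noqa` comments could look like. The PR uses a `git grep` lint; the regular expressions and the helper `is_unqualified_noqa` here are assumptions made for illustration, not the lint that was added.

```python
import re

# A qualified comment has a colon followed by an error code, e.g. "# noqa: F541".
QUALIFIED = re.compile(r"#\s*noqa:\s*[A-Z]+[0-9]+")
ANY_NOQA = re.compile(r"#\s*noqa\b")


def is_unqualified_noqa(line: str) -> bool:
    """Return True for a noqa comment that lacks ': <code>'."""
    return bool(ANY_NOQA.search(line)) and not QUALIFIED.search(line)
```

Both the bare form and the colon-less form are reported, since without the colon the error code is ignored.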

I did try using [flake8-noqa](https://pypi.org/project/flake8-noqa/) instead of a custom `git grep` lint, but it didn't seem to work. This PR is definitely missing some of the functionality that flake8-noqa is supposed to provide, though, so if someone can figure out how to use it, we should do that instead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56272

Test Plan:
CI should pass on the tip of this PR, and we know that the lint works because the following CI run (before this PR was finished) failed:

- https://github.com/pytorch/pytorch/runs/2365189927

Reviewed By: janeyx99

Differential Revision: D27830127

Pulled By: samestep

fbshipit-source-id: d6dcf4f945ebd18cd76c46a07f3b408296864fcb
2021-04-19 13:16:18 -07:00
42f5d66080 [DDP] Fixes flaky tests caused by incorrect floating-point comparison (#56192)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50699.

The root cause was that some floating-point assertions had a "greater than or **equal to**" condition. The "equal to" part was causing flakiness due to strict equality check (`==`) in `TestCase.assertGreaterEqual()`. This PR introduces a new assertion method called `assertGreaterAlmostEqual()` in `common_utils.py` that mitigates the problem by behaving similar to `TestCase.assertAlmostEqual()`.
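
A minimal sketch of an assertion with this behaviour, built on `unittest`. The class name `AlmostTestCase` is hypothetical, and the signature and tolerance handling here are assumptions; the method added to `common_utils.py` may differ.

```python
import unittest


class AlmostTestCase(unittest.TestCase):
    def assertGreaterAlmostEqual(self, first, second, places=7, msg=None):
        """Pass if first >= second, or if the two are equal within tolerance."""
        if first >= second:
            return
        # first is smaller: accept it only if the gap is within rounding error.
        self.assertAlmostEqual(first, second, places=places, msg=msg)
```

A value that falls short of the bound by a rounding error no longer fails, while a clearly smaller value still raises `AssertionError`.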

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56192

Reviewed By: zhaojuanmao

Differential Revision: D27804724

Pulled By: cbalioglu

fbshipit-source-id: bc44a41ca4ce45dfee62fb3769fb47bfd9028831
2021-04-15 17:15:42 -07:00
3c6b52ae62 Cache slow/disabled test files (#55682)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55682

Fixes #55648

For now it downloads and writes the relevant files to the system's temp dir and marks it as valid for 3 hours.
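
The validity check could be sketched as follows, assuming the file's modification time is used as the download timestamp. The helper `is_cache_valid` and the constant name are hypothetical.

```python
import os
import tempfile
import time
from typing import Optional

MAX_AGE_SECONDS = 3 * 60 * 60  # cached files are considered valid for 3 hours


def is_cache_valid(path: str, now: Optional[float] = None,
                   max_age: float = MAX_AGE_SECONDS) -> bool:
    """Return True if the cached file exists and is younger than max_age."""
    if not os.path.exists(path):
        return False
    now = time.time() if now is None else now
    return now - os.path.getmtime(path) < max_age
```

A caller would download and rewrite the file under `tempfile.gettempdir()` only when this returns False.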

Test Plan: Imported from OSS

Reviewed By: malfet, nikithamalgifb

Differential Revision: D27685616

Pulled By: driazati

fbshipit-source-id: 27469b85fe4b6b4addde6b22bf795bca3d4990ef
2021-04-12 09:17:07 -07:00
399b66c813 Ports logdet from method_tests() to op_db (#55743)
Summary:
Per title. Also updates some tensor construction helpers.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55743

Reviewed By: ngimel

Differential Revision: D27702060

Pulled By: mruberry

fbshipit-source-id: f64b7bee855733ad1f4fd182819ceec5831d9878
2021-04-11 20:39:16 -07:00
6ee333cdb5 modernize test_sparse (#54572)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54572

Adding device generic tests to `test_sparse`.
Follow-up PR: #54153

I think this is ready for review.
Looking forward to your comments. cc mruberry.

Thanks

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D27562663

Pulled By: mruberry

fbshipit-source-id: c48973e707f779b529bc7f61b75103194b428987
2021-04-09 12:19:29 -07:00
2a24a2418a common_utils.py use new file names for disabled/slow tests (#55620)
Summary:
Following these changes in renaming the files:
https://github.com/pytorch/pytorch/pull/55618
https://github.com/pytorch/test-infra/pull/3

We should update the use sites in common_utils.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55620

Reviewed By: samestep

Differential Revision: D27651884

Pulled By: janeyx99

fbshipit-source-id: 298a981e55e0b7c95202294d9bc4b3fcce359590
2021-04-09 09:25:20 -07:00
f4967d68f5 make torch.testing asserts importable (#54769)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54769

Follow-up to #53820. This

- makes the `asserts.py` module private as per suggestion from rgommers in https://github.com/pytorch/pytorch/pull/53820#issuecomment-802661387. With this the functions should only be accessible through `torch.testing`, giving us the option to change the underlying structure later.
- moves the code from `torch/testing/__init__.py` to `torch/testing/_core.py` (happy to accept other name suggestions). Otherwise we can't import the new `_asserts.py` in `torch/testing/__init__.py` due to circular imports.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D27438451

Pulled By: mruberry

fbshipit-source-id: c7292b4d5709185b42b4aac8016648562688040e
2021-04-07 23:53:02 -07:00
2e9eb5afa2 Use slow tests stats in common_utils (#55190)
Summary:
This is a step in adding automatic slowTest detection to our testing infrastructure. This uses stats (updated daily) in https://github.com/pytorch/test-infra/blob/master/stats/.pytorch-slow-tests to determine whether more tests need to be marked as slow as they are run.

More details in previous PR draft/proposal [here](https://github.com/pytorch/pytorch/pull/54456#issue-598388491), though I no longer think we need the third step as using a raw git file does not require much processing.

Upon looking at [logs](https://circleci.com/api/v1.1/project/github/pytorch/pytorch/12060292/output/107/0?file=true&allocation-id=606660dbd8e5857bcc2b2e0f-0-build%2F60DCA8CD) for the coverage tests as of the first commit [when I had not skipped the tests so we could see their actual times], here are some slow tests that weren't marked as slow before:
```
test_fn_gradgrad_unfold_cpu_complex128 (__main__.TestGradientsCPU) (172.554s)
test_matmul_4d_4d_complex_cpu (__main__.TestAutogradDeviceTypeCPU) (180.057s)
test_conv1d_basic (__main__.TestXNNPACKConv1dTransformPass) (94.737s)
```

And here is a test that wasn't actually slow but was still marked as slow based on stats:
```
test_trunc_normal (__main__.TestNNInit) ... ok (1.208s)
```

The new logs show the above tests as skipped (as they should be):
[Coverage Test 1](https://app.circleci.com/pipelines/github/pytorch/pytorch/296224/workflows/ba6c2917-51f8-4fb8-be57-90151c2e5c25/jobs/12126156) and [Coverage Test 2](https://app.circleci.com/pipelines/github/pytorch/pytorch/296224/workflows/ba6c2917-51f8-4fb8-be57-90151c2e5c25/jobs/12126155)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55190

Reviewed By: samestep

Differential Revision: D27566663

Pulled By: janeyx99

fbshipit-source-id: c13f8c676bb8eb15d9d697d224dbaef7df98aef3
2021-04-07 08:04:39 -07:00
fd02fc5d71 Port put_ and take from TH to ATen (#53356)
Summary:
The two ports were done together, as they can be implemented with the same kernel. In TH, they were already implemented with the same kernel.

Resolves https://github.com/pytorch/pytorch/issues/24751
Resolves https://github.com/pytorch/pytorch/issues/24614
Resolves https://github.com/pytorch/pytorch/issues/24640
Resolves https://github.com/pytorch/pytorch/issues/24772

This port makes sure that it interacts correctly with the "deterministic algorithms" flag, as done in https://github.com/pytorch/pytorch/pull/51388

This PR also makes these two functions correct in the following aspects (all of them added to the tests as well):
- Support for complex numbers
- Correct handling of scalar inputs and zero-dimensional inputs
- Implementation that does not do any copies nor sorting of any of the input tensors
- Faster and more correct implementation of the backwards (now it works as it should when `source.shape() != index.shape()`)
- Now `put_(..., accumulate=True)` is implemented correctly with atomic operations on GPU / CPU (when possible) and is deterministic (modulo the loss of precision that might happen due to the reordering of a sum of floats)
- Adds the `torch.put` function that was missing, (`index_put` exists, for example)
- Corrected docs
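
For readers unfamiliar with the two operations, their indexing semantics can be sketched on a flat Python list: both treat the tensor as one-dimensional and address it by linear index. The helpers below are a hypothetical illustration only, not the ATen kernels, and unlike `put_` the `put` helper returns a new list instead of modifying in place.

```python
from typing import List


def take(flat: List[float], index: List[int]) -> List[float]:
    """Gather the elements of a flattened tensor at the given linear indices."""
    return [flat[i] for i in index]


def put(flat: List[float], index: List[int], source: List[float],
        accumulate: bool = False) -> List[float]:
    """Write source into a copy of flat at the given linear indices.

    With accumulate=True, values are added, so repeated indices sum up.
    """
    out = list(flat)
    for i, v in zip(index, source):
        out[i] = out[i] + v if accumulate else v
    return out
```

Repeated indices are where `accumulate=True` matters: each contribution is added, which is why the parallel implementation needs atomic operations.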

It also adds a much more thorough testing to the operations and their gradients.

There is a BC-breaking change, and that is that now we check that the inputs do not overlap in the `put_` operation. This was handled (some of the cases, other cases were wrong) in the TH implementation by making contiguous copies of the inputs. How should we handle this one?

**Edit.** Benchmarks:
<details>
<summary>Script</summary>

```python
from IPython import get_ipython
import torch
from itertools import product

torch.manual_seed(13)
torch.set_num_threads(1)

ipython = get_ipython()

cpu = torch.device('cpu')
cuda = torch.device('cuda')

def run_test(ndims, size, index_len, device, cmd):
    print(f"cmd: {cmd}, ndims: {ndims}, tensor_size: {size}, index_len: {index_len}, device: {device}")

    large_tensor = torch.rand(*([size] * ndims), device=device)
    small_tensor = torch.rand((index_len,), device=device)
    index = torch.randint(size * ndims, (index_len,), dtype=torch.long, device=device)
    if cmd == "put":
        command = "large_tensor.put_(index, small_tensor, accumulate=False)"
        if device == cuda:
            command += "; torch.cuda.synchronize()"
    elif cmd == "accumulate":
        command = "large_tensor.put_(index, small_tensor, accumulate=True)"
        if device == cuda:
            command += "; torch.cuda.synchronize()"
    elif cmd == "take":
        command = "torch.take(large_tensor, index)"
        if device == cuda:
            command += "; torch.cuda.synchronize()"
    ipython.magic(f"timeit {command}")
    print()

for method, device in product(["accumulate", "put", "take"], [cpu, cuda]):
    run_test(3, 1000, 10, device, method)
    run_test(3, 1000, 1000, device, method)
    run_test(3, 1000, 10000, device, method)
    run_test(2, 10000, 100000, device, method)
```
</details>

```python
put_(accumulate=False)
```

<details>
<summary>ATen CPU (1.5x - 2x speedup)</summary>

```python
cmd: put, ndims: 3, tensor_size: 1000, index_len: 10, device: cpu
1.05 µs ± 2.35 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

cmd: put, ndims: 3, tensor_size: 1000, index_len: 1000, device: cpu
3.15 µs ± 5.13 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: put, ndims: 3, tensor_size: 1000, index_len: 10000, device: cpu
21.6 µs ± 13.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

cmd: put, ndims: 2, tensor_size: 10000, index_len: 100000, device: cpu
238 µs ± 781 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
</details>

<details>
<summary>TH CPU</summary>

```python
cmd: put, ndims: 3, tensor_size: 1000, index_len: 10, device: cpu
722 ns ± 2.67 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

cmd: put, ndims: 3, tensor_size: 1000, index_len: 1000, device: cpu
4.89 µs ± 18.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: put, ndims: 3, tensor_size: 1000, index_len: 10000, device: cpu
42.5 µs ± 96.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

cmd: put, ndims: 2, tensor_size: 10000, index_len: 100000, device: cpu
428 µs ± 774 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
</details>
<details>
<summary>ATen GPU (same speed)</summary>

```python
cmd: put, ndims: 3, tensor_size: 1000, index_len: 10, device: cuda
8.99 µs ± 16 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: put, ndims: 3, tensor_size: 1000, index_len: 1000, device: cuda
10.4 µs ± 24.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: put, ndims: 3, tensor_size: 1000, index_len: 10000, device: cuda
10.4 µs ± 11.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: put, ndims: 2, tensor_size: 10000, index_len: 100000, device: cuda
15.6 µs ± 1.12 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```
</details>

<details>
<summary>TH GPU</summary>

```python
cmd: put, ndims: 3, tensor_size: 1000, index_len: 10, device: cuda
8.44 µs ± 31.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: put, ndims: 3, tensor_size: 1000, index_len: 1000, device: cuda
9.09 µs ± 4.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: put, ndims: 3, tensor_size: 1000, index_len: 10000, device: cuda
9.77 µs ± 0.998 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: put, ndims: 2, tensor_size: 10000, index_len: 100000, device: cuda
15.8 µs ± 5.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```
</details>

```python
put_(accumulate=True)
```

<details>
<summary>ATen CPU (x2 speedup)</summary>

```python
cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10, device: cpu
1.12 µs ± 2.91 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 1000, device: cpu
3.14 µs ± 2.05 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10000, device: cpu
20.8 µs ± 25.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

cmd: accumulate, ndims: 2, tensor_size: 10000, index_len: 100000, device: cpu
264 µs ± 263 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
</details>

<details>
<summary>TH CPU</summary>

```python
cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10, device: cpu
814 ns ± 1.87 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 1000, device: cpu
5.11 µs ± 6.02 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10000, device: cpu
43.9 µs ± 49.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

cmd: accumulate, ndims: 2, tensor_size: 10000, index_len: 100000, device: cpu
442 µs ± 1.07 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
</details>
<details>
<summary>ATen GPU (3x - 11x speedup)</summary>

```python
cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10, device: cuda
9.01 µs ± 14.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 1000, device: cuda
10.4 µs ± 15.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10000, device: cuda
10.3 µs ± 44.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: accumulate, ndims: 2, tensor_size: 10000, index_len: 100000, device: cuda
12.6 µs ± 19 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```
</details>

<details>
<summary>TH GPU</summary>

```python
cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10, device: cuda
34.7 µs ± 131 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 1000, device: cuda
38.2 µs ± 116 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

cmd: accumulate, ndims: 3, tensor_size: 1000, index_len: 10000, device: cuda
61.2 µs ± 50.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

cmd: accumulate, ndims: 2, tensor_size: 10000, index_len: 100000, device: cuda
140 µs ± 24.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
</details>
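
The speedup range quoted in the summary can be sanity-checked by dividing each TH timing by the matching ATen timing. A minimal sketch using the `put_(accumulate=True)` CUDA numbers listed above (the variable names are illustrative, not part of the benchmark script):

```python
# Timings in microseconds, copied from the accumulate / cuda results above,
# ordered by index_len: 10, 1000, 10000, 100000.
th_gpu_us = [34.7, 38.2, 61.2, 140.0]
aten_gpu_us = [9.01, 10.4, 10.3, 12.6]

# Speedup of the ATen kernel relative to the TH kernel for each configuration.
speedups = [th / aten for th, aten in zip(th_gpu_us, aten_gpu_us)]

for th, aten, speedup in zip(th_gpu_us, aten_gpu_us, speedups):
    print(f"TH {th:6.1f} µs / ATen {aten:5.2f} µs = {speedup:.1f}x")
```

The ratios fall between roughly 3.7x and 11.1x, in line with the "3x - 11x" figure in the ATen GPU summary.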

```python
take()
```

<details>
<summary>ATen CPU (1.1x speedup)</summary>

```python
cmd: take, ndims: 3, tensor_size: 1000, index_len: 10, device: cpu
1.18 µs ± 2.34 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

cmd: take, ndims: 3, tensor_size: 1000, index_len: 1000, device: cpu
2.79 µs ± 2.96 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: take, ndims: 3, tensor_size: 1000, index_len: 10000, device: cpu
16.6 µs ± 10.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: take, ndims: 2, tensor_size: 10000, index_len: 100000, device: cpu
161 µs ± 984 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
</details>

<details>
<summary>TH CPU</summary>

```python
cmd: take, ndims: 3, tensor_size: 1000, index_len: 10, device: cpu
1.1 µs ± 3.14 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

cmd: take, ndims: 3, tensor_size: 1000, index_len: 1000, device: cpu
2.93 µs ± 7.31 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: take, ndims: 3, tensor_size: 1000, index_len: 10000, device: cpu
18.6 µs ± 14.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: take, ndims: 2, tensor_size: 10000, index_len: 100000, device: cpu
178 µs ± 139 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
</details>
<details>
<summary>ATen GPU (same speed)</summary>

```python
cmd: take, ndims: 3, tensor_size: 1000, index_len: 10, device: cuda
9.38 µs ± 23.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: take, ndims: 3, tensor_size: 1000, index_len: 1000, device: cuda
10.7 µs ± 9.77 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: take, ndims: 3, tensor_size: 1000, index_len: 10000, device: cuda
10.6 µs ± 107 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: take, ndims: 2, tensor_size: 10000, index_len: 100000, device: cuda
11.5 µs ± 21.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```
</details>

<details>
<summary>TH GPU</summary>

```python
cmd: take, ndims: 3, tensor_size: 1000, index_len: 10, device: cuda
9.31 µs ± 7.57 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: take, ndims: 3, tensor_size: 1000, index_len: 1000, device: cuda
9.52 µs ± 5.78 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: take, ndims: 3, tensor_size: 1000, index_len: 10000, device: cuda
9.73 µs ± 17.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cmd: take, ndims: 2, tensor_size: 10000, index_len: 100000, device: cuda
11.7 µs ± 5.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```
</details>

cc mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53356

Reviewed By: mruberry

Differential Revision: D27520243

Pulled By: ngimel

fbshipit-source-id: e3979349c2c62d2949e09fb05e5fd4883fbc9093
2021-04-05 18:05:38 -07:00
6d87b3667f Added support for TensorList inputs in OpInfo (#54922)
Summary:
Stack:
* https://github.com/pytorch/pytorch/issues/54954 Fixed OpInfo jit tests failing for TensorList inputs
* __#54922 Added support for TensorList inputs in OpInfo__

Updated OpInfo to accept either a `Tensor` or `TensorList` as `sample.input` and added workarounds to make this work with gradcheck.

Note: JIT testing support for TensorList inputs will be added in a follow-up PR.

Fixes https://github.com/pytorch/pytorch/issues/51996

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54922

Reviewed By: H-Huang

Differential Revision: D27448952

Pulled By: heitorschueroff

fbshipit-source-id: 3f24a56f6180eb2d044dcfc89ba59fce8acfe278
2021-03-31 04:42:10 -07:00
b5ab348253 Fix missing format string qualifier (#54705)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54705

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D27338808

Pulled By: ezyang

fbshipit-source-id: b21c931c2306e525bc444766bc203bb303868dbf
2021-03-27 11:55:36 -07:00
61b074581c torch.prod backward for complex types. (#48125)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53511
torch.det depends on torch.prod, which in turn depends on several other functions that themselves depend on torch.prod. Because of this circular relationship, this PR enables complex backward support for several functions at once.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48125

Reviewed By: pbelevich

Differential Revision: D27188589

Pulled By: anjali411

fbshipit-source-id: bbb80f8ecb83a0c3bea2b917627d3cd3b84eb09a
2021-03-19 09:44:08 -07:00
a2a7179695 Fix bug in assertRaises NotImplemented handling when no exception is thrown (#54126)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54126

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: agolynski, mruberry

Differential Revision: D27109510

Pulled By: ezyang

fbshipit-source-id: ba5a4de85ca00f81724f3d4e645797e8f32aa3b1
2021-03-17 12:30:51 -07:00
c2f41b6b84 Add meta device to generic device testing framework, skip NotImplementedError (#53682)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53682

With this, under the meta device, 101 tests passed and 16953 skipped.
It ain't much, but it's a start.

Some various bits and bobs:
- NotImplementedError suppression at test level is implemented
  in the same way as CUDA memory leak check, i.e., by wrapping
  test methods and monkeypatching them back in.
- I had to reimplement assertRaises/assertRaisesRegex from scratch to
  ignore NotImplementedError when _ignore_not_implemented_error is True.
  The implementation relies on a small amount of private API that hasn't
  changed since 2010.
- expectedAlertNondeterministic doesn't really work so I skipped them
  all; there's probably a way to do it better
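
The wrap-and-monkeypatch approach from the first bullet can be sketched roughly as follows. This is not the code from the PR: `skip_on_not_implemented`, `wrap_test_methods`, and `ExampleMetaTests` are hypothetical names, and the sketch uses only the standard `unittest` module.

```python
import functools
import unittest


def skip_on_not_implemented(test_method):
    """Hypothetical wrapper: report NotImplementedError as a skip, not an error."""
    @functools.wraps(test_method)
    def wrapper(self, *args, **kwargs):
        try:
            return test_method(self, *args, **kwargs)
        except NotImplementedError as exc:
            raise unittest.SkipTest(f"not implemented: {exc}")
    return wrapper


def wrap_test_methods(test_cls):
    """Monkeypatch every test* method defined on test_cls with the wrapper."""
    for name in list(vars(test_cls)):
        attr = getattr(test_cls, name)
        if name.startswith("test") and callable(attr):
            setattr(test_cls, name, skip_on_not_implemented(attr))
    return test_cls


class ExampleMetaTests(unittest.TestCase):
    def test_supported(self):
        self.assertEqual(1 + 1, 2)

    def test_unsupported(self):
        # Stands in for an operator that has no kernel on the device under test.
        raise NotImplementedError("op has no meta kernel")


wrap_test_methods(ExampleMetaTests)


def run_example():
    """Run the example suite and return the unittest.TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleMetaTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Calling `run_example()` runs both tests; the unsupported one is recorded as skipped rather than as an error, which is the behaviour the skip counts above reflect.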

I tested this using `pytest --disable-warnings --tb=native -k meta --sw
test/*.py` and a pile of extra patches to make collection actually work
(lol).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D26955539

Pulled By: ezyang

fbshipit-source-id: ac21c8734562497fdcca3b614a28010bc4c03d74
2021-03-14 20:41:19 -07:00
d47d246206 Add 'noarch' tests which only run in one CI config (#53747)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53747

Fixes #53743

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D26971343

Pulled By: ezyang

fbshipit-source-id: cee7aa10063ae674f741406a3af830e4b4f128df
2021-03-14 20:39:07 -07:00
00771eff8e [reland] Add OpInfo for bitwise_not (#53181)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/42515

Note: Reland https://github.com/pytorch/pytorch/issues/51944

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53181

Reviewed By: albanD

Differential Revision: D27004695

Pulled By: mruberry

fbshipit-source-id: 92b4e8c60bb6f3c302907716de040b5c81c8db69
2021-03-12 14:43:56 -08:00
afa1ff8e04 Implements torch.linalg.lstsq (#49093)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44378 by providing a wider range of drivers similar to what SciPy is doing.

The supported CPU drivers are `gels, gelsy, gelsd, gelss`.
The CUDA interface implements only `gels`, and only for overdetermined systems.

The current state of this PR:
- [x] CPU interface
- [x] CUDA interface
- [x] CPU tests
- [x] CUDA tests
- [x] Memory-efficient batch-wise iteration with broadcasting which fixes https://github.com/pytorch/pytorch/issues/49252
- [x] docs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49093

Reviewed By: albanD

Differential Revision: D26991788

Pulled By: mruberry

fbshipit-source-id: 8af9ada979240b255402f55210c0af1cba6a0a3c
2021-03-12 13:25:55 -08:00
1e992810b5 Revert D26811466: [pytorch][PR] [reland] Add OpInfo for bitwise_not and make ROCM and CUDA OpInfo tests consistent
Test Plan: revert-hammer

Differential Revision:
D26811466 (a5ada2127d)

Original commit changeset: 8434a7515d83

fbshipit-source-id: 9c2c760e18154a88cf7531e45843a802e3f3d19c
2021-03-08 15:47:47 -08:00
a5ada2127d [reland] Add OpInfo for bitwise_not and make ROCM and CUDA OpInfo tests consistent (#53181)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/42515

This PR also enables the OpInfo tests on ROCM to check the same dtypes as on CUDA.

Note: Reland https://github.com/pytorch/pytorch/issues/51944

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53181

Reviewed By: zhangguanheng66

Differential Revision: D26811466

Pulled By: mruberry

fbshipit-source-id: 8434a7515d83ed859db1b2f916fad81a9deaeb9b
2021-03-08 03:39:01 -08:00
54a2498919 Modify tests to use assertWarnsOnceRegex instead of maybeWarnsRegex (#52387)
Summary:
Related to https://github.com/pytorch/pytorch/issues/50006

Follow on for https://github.com/pytorch/pytorch/issues/48560 to ensure TORCH_WARN_ONCE warnings are caught. Most of this is straight-forward find-and-replace, but I did find one place where the TORCH_WARN_ONCE warning was not wrapped into a python warning.
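
Warn-once behaviour is awkward to test because a warning that already fired earlier in the process may not fire again. The sketch below shows the same problem and one way around it using only Python's standard `warnings` module. It is an analogy, not the PyTorch helper: `assert_warns_regex_always` is a hypothetical name, and it does not touch the C++ `TORCH_WARN_ONCE` machinery.

```python
import re
import warnings
from contextlib import contextmanager


@contextmanager
def assert_warns_regex_always(category, pattern):
    """Hypothetical helper: fail unless a matching warning is raised in the block."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" disables Python's own once-per-location suppression, so a
        # warning that already fired earlier in the process is still recorded.
        warnings.simplefilter("always")
        yield caught
    matches = [
        w for w in caught
        if issubclass(w.category, category) and re.search(pattern, str(w.message))
    ]
    if not matches:
        raise AssertionError(f"no {category.__name__} matching {pattern!r} was raised")


def deprecated_call():
    warnings.warn("feature X is deprecated", UserWarning)


# The warning is captured on every use, even though it comes from the same line.
for _ in range(2):
    with assert_warns_regex_always(UserWarning, r"deprecated"):
        deprecated_call()
```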

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52387

Reviewed By: albanD

Differential Revision: D26773387

Pulled By: mruberry

fbshipit-source-id: 5be7efbc8ab4a32ec8437c9c45f3b6c3c328f5dd
2021-03-08 03:32:14 -08:00
8c798e0622 Forbid trailing whitespace (#53406)
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857

These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
  - `GLOSSARY.md`
  - `aten/src/ATen/core/op_registration/README.md`
  - `scripts/README.md`
  - `torch/csrc/jit/codegen/fuser/README.md`

The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```
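
For readers without `gsed`, the substitution `s/ *$//` can be approximated in Python. This is a sketch of the text transformation only: `strip_trailing_spaces` is a hypothetical helper, and it does not replicate the `git grep` file selection or the exclude patterns.

```python
import re

# Like the sed expression, match only space characters (not tabs) at line ends.
_TRAILING_SPACES = re.compile(r" +$", flags=re.MULTILINE)


def strip_trailing_spaces(text: str) -> str:
    """Remove runs of spaces that end a line; leave everything else untouched."""
    return _TRAILING_SPACES.sub("", text)


cleaned = strip_trailing_spaces("first line   \nsecond line\n    indented \n")
print(repr(cleaned))
```

Leading indentation and tabs are preserved; only spaces immediately before a line break (or the end of the text) are dropped.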

I looked over the auto-generated changes and didn't see anything that looked problematic.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406

Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377

This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348

Reviewed By: walterddr, seemethere

Differential Revision: D26856620

Pulled By: samestep

fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
2021-03-05 17:22:55 -08:00
9c2673df46 Revert D26723384: [pytorch][PR] Implements torch.linalg.lstsq
Test Plan: revert-hammer

Differential Revision:
D26723384 (3ac9013235)

Original commit changeset: c9866a95f140

fbshipit-source-id: 3e5263d71facdc91ca09d7dcbbbe3ba818ee2821
2021-03-03 15:24:25 -08:00
3ac9013235 Implements torch.linalg.lstsq (#49093)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44378 by providing a wider range of drivers similar to what SciPy is doing.

The supported CPU drivers are `gels, gelsy, gelsd, gelss`.
The CUDA interface implements only `gels`, and only for overdetermined systems.

The current state of this PR:
- [x] CPU interface
- [x] CUDA interface
- [x] CPU tests
- [x] CUDA tests
- [x] Memory-efficient batch-wise iteration with broadcasting which fixes https://github.com/pytorch/pytorch/issues/49252
- [x] docs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49093

Reviewed By: H-Huang

Differential Revision: D26723384

Pulled By: mruberry

fbshipit-source-id: c9866a95f14091955cf42de22f4ac9e2da009713
2021-03-02 19:00:07 -08:00
2444b4d122 Add wait_for_worker param to TCPStore and fix port in use flaky test failures (#52888)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52888

Test Plan: Imported from OSS

Reviewed By: glaringlee

Differential Revision: D26678707

Pulled By: H-Huang

fbshipit-source-id: 5662e60c4d06d88d2e57834f496b52fb7600de29
2021-03-02 14:31:33 -08:00
f71d9e28f9 Store test filename in test report path (#52791)
Summary:
This way, we can have a mapping from the test files we directly execute (the tests [here](https://github.com/pytorch/pytorch/blob/master/test/run_test.py#L20)) to the test suites that we store data for in XML reports.

This will be useful later for categorizing the tests we run in CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52791

Reviewed By: samestep

Differential Revision: D26655086

Pulled By: janeyx99

fbshipit-source-id: 94be32f80d7bc0ea1a7a11d4c4b1d3d8e774c5ea
2021-02-25 07:53:30 -08:00
08d7f29601 Add discontiguous kwarg to make_tensor (#51985)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51985

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D26375733

Pulled By: heitorschueroff

fbshipit-source-id: bb7831dc28c24b90c6f83885681eeccfdbb83438
2021-02-24 08:57:24 -08:00
e8ab58bfc7 [reland] Early terminate CUDA on common_utils TestCases (#52126)
Summary:
Take 2 of https://github.com/pytorch/pytorch/issues/50914
This change moves the early termination logic into common_utils.TestCase class.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52126

Test Plan: CI with ci-all tag

Reviewed By: malfet

Differential Revision: D26391762

Pulled By: walterddr

fbshipit-source-id: a149ecc47ccda7f2795e107fb95915506ae060b4
2021-02-12 07:32:42 -08:00
9f1f5636d7 Revert D26019289: [pytorch][PR] Early terminate CUDA on common_utils TestCases
Test Plan: revert-hammer

Differential Revision:
D26019289 (c1b7ca8062)

Original commit changeset: ddc7c1c0d00d

fbshipit-source-id: 6902d03fa06cda5d03191846bc4dd98af501b594
2021-02-10 17:29:10 -08:00
c1b7ca8062 Early terminate CUDA on common_utils TestCases (#50914)
Summary:
This is a follow up on https://github.com/pytorch/pytorch/issues/49869.

Previously, CUDA early termination only happened for generic test classes that extend `DeviceTypeTestBase`. JIT test cases, which extend common_utils.TestCase, could not benefit from the early termination.

This change moves the early termination logic into the common_utils.TestCase class.
- All tests that extend common_utils.TestCase should now terminate early if a CUDA assert occurs.
- TestCases that extend common_device_type.DeviceTypeTestBase still only call torch.cuda.synchronize() when a RuntimeError (RTE) is thrown.
- TestCases that extend common_utils.TestCase always synchronize CUDA as long as `torch.cuda.is_initialized()` returns true, regardless of whether the test case uses the GPU.
- This is disabled in common_distributed.py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50914

Reviewed By: malfet

Differential Revision: D26019289

Pulled By: walterddr

fbshipit-source-id: ddc7c1c0d00db4d073a6c8bc5b7733637a7e77d1
2021-02-10 07:15:40 -08:00