40 Commits

Author SHA1 Message Date
b90496eef5 [nn] zero_grad() set_to_none default True (#92731)
Attempts to fix #92656

BC-breaking! This changes the default of zero_grad in optim and in nn so that gradients are set to None instead of zero tensors. We are changing the default because there are proven perf wins and existing code has typically not regressed due to this change. (will probably have to flesh out this note more).
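The practical consequence for callers is that code reading gradients after `zero_grad()` has to tolerate `None`. A minimal pure-Python sketch of that guard follows; `Param`, `zero_grad`, and `grad_sq_norm` here are hypothetical stand-ins written for illustration, not PyTorch's classes or functions.

```python
from typing import List, Optional


class Param:
    """Hypothetical stand-in for a parameter holding an optional gradient."""

    def __init__(self, grad: Optional[List[float]] = None) -> None:
        self.grad = grad


def zero_grad(params: List[Param], set_to_none: bool = True) -> None:
    """Reset gradients: drop them (None) or overwrite them with zeros."""
    for p in params:
        if p.grad is None:
            continue
        p.grad = None if set_to_none else [0.0] * len(p.grad)


def grad_sq_norm(params: List[Param]) -> float:
    """Sum of squared gradient entries, skipping parameters without a gradient."""
    return sum(g * g for p in params if p.grad is not None for g in p.grad)


params = [Param([3.0, 4.0]), Param()]
norm_before = grad_sq_norm(params)  # only the first parameter contributes
zero_grad(params)                   # default mirrors the new behavior: grads become None
norm_after = grad_sq_norm(params)   # nothing left to sum, and no error on None
```

Code that instead indexes or calls methods on the gradient unconditionally would fail on `None`, which is the kind of regression the BC note warns about.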

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92731
Approved by: https://github.com/ngimel
2023-01-26 01:04:28 +00:00
b3603f8129 Revert "Deduplicate c10 error and PyTorchError hierarchy (#87855)"
This reverts commit 34f2d3e6ae56744c20c2f859f97101dff291bbbc.

Reverted https://github.com/pytorch/pytorch/pull/87855 on behalf of https://github.com/osalpekar due to perf regression in quantization tests
2023-01-06 19:56:35 +00:00
34f2d3e6ae Deduplicate c10 error and PyTorchError hierarchy (#87855)
Fixes #53370

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87855
Approved by: https://github.com/albanD
2023-01-02 15:53:36 +00:00
eqy
946e57704e Drop compute capability < 5.0 in CUDA 12 (#91213)
CC @ptrblck @crcrpar

#91122
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91213
Approved by: https://github.com/ngimel
2022-12-30 04:53:05 +00:00
4c20c0509d Split out forward AD tests from test_ops_gradients and reenable slow gradcheck CI (#88216)
Fixes: https://github.com/pytorch/pytorch/issues/88010

This PR does a couple things to stop slow gradcheck from timing out:
- Splits out test_ops_fwd_gradients from test_ops_gradients, and factors out TestFwdGradients and TestBwdGradients which both inherit from TestGradients, now situated in common_utils (maybe there is a better place?)
- Skips CompositeCompliance (and several other test files) for slow gradcheck CI since they do not use gradcheck
- Because test times for test_ops_fwd_gradients and test_ops_gradients are either unknown or wrong, we hardcode them for now to prevent them from being put together. We can undo the hack after we see actual test times are updated. ("def calculate_shards" randomly divides tests with unknown test times in a round-robin fashion.)
- Updates references to test_ops_gradients and TestGradients
- Test files that are skipped for slow gradcheck CI are now centrally located in run_tests.py. This reduces how fine-grained we can be with the skips, so for some skips (one so far) we still use the old skipping mechanism, e.g. for test_mps
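To illustrate why hardcoding test times keeps the two gradient test files apart, here is a rough sketch of time-based sharding. It is an assumption about the general technique (greedy placement for known times, round-robin for unknown ones), not the actual `calculate_shards` code; `shard_tests` and all times below are hypothetical.

```python
from typing import Dict, List, Optional


def shard_tests(times: Dict[str, Optional[float]], num_shards: int) -> List[List[str]]:
    """Place known-time tests greedily on the lightest shard,
    then spread unknown-time tests round-robin."""
    shards: List[List[str]] = [[] for _ in range(num_shards)]
    totals = [0.0] * num_shards

    known = [name for name, t in times.items() if t is not None]
    for name in sorted(known, key=lambda n: times[n], reverse=True):
        lightest = totals.index(min(totals))
        shards[lightest].append(name)
        totals[lightest] += times[name]

    unknown = sorted(name for name, t in times.items() if t is None)
    for k, name in enumerate(unknown):
        shards[k % num_shards].append(name)
    return shards


# Hypothetical hardcoded times (in minutes) for the two slow files.
times = {
    "test_ops_gradients": 90.0,
    "test_ops_fwd_gradients": 90.0,
    "test_nn": 20.0,
    "test_misc": None,  # unknown time, placed round-robin
}
shards = shard_tests(times, num_shards=2)
```

With both slow files given a large known time, the greedy step puts them on different shards; without a time they would be placed purely by position in the round-robin.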

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88216
Approved by: https://github.com/albanD
2022-11-03 00:20:45 +00:00
c794ee5cc1 Reenable TestCppExtensionJIT on M1 (#84552)
Works fine locally, let's see if it'll pass CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84552
Approved by: https://github.com/kit1980
2022-09-06 17:49:29 +00:00
2255911f8a Make M1 tests green (#82213)
This skips all the failing tests and adds a new master job to test on M1

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82213
Approved by: https://github.com/seemethere, https://github.com/soulitzer, https://github.com/malfet
2022-08-05 16:12:08 +00:00
0fcdf936e7 Skip tests that don't call gradcheck in slow gradcheck CI (#82117)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82117
Approved by: https://github.com/kit1980, https://github.com/albanD
2022-07-25 21:33:52 +00:00
8473173c36 Remove breakpad dependency
This functionality does not seem to be used
and there are some requests to update dependency.

Add `third_party` to torch_cpu include directories if compiling with
Caffe2 support, as `caffe2/quantization/server/conv_dnnlowp_op.cc` depends on `third_party/fbgemm/src/RefImplementations.h`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75394
Approved by: https://github.com/janeyx99, https://github.com/seemethere
2022-05-03 20:21:55 +00:00
d79d9fa283 Revert "Remove breakpad dependency"
This reverts commit 9aa3c7fd8389735b04622bf07f6ef85c608374d0.

Reverted https://github.com/pytorch/pytorch/pull/75394 on behalf of https://github.com/malfet
2022-04-17 17:58:51 +00:00
9aa3c7fd83 Remove breakpad dependency
This functionality does not seem to be used
and there are some requests to update dependency

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75394
Approved by: https://github.com/janeyx99, https://github.com/seemethere
2022-04-17 17:43:45 +00:00
e279963eef Remove remaining THC code (#69039)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69039

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D32872476

Pulled By: ngimel

fbshipit-source-id: 7972aacc24aef9450fb59b707ed6396c501bcb31
2021-12-08 12:18:08 -08:00
f5c5ab2868 [skip ci] Set test owner for cpp-extensions tests (#66837)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc yf225 glaringlee zou3519

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66837

Reviewed By: anjali411

Differential Revision: D31828401

Pulled By: janeyx99

fbshipit-source-id: 35ac27f3e1c0eb70ccb38c07c42ba61bd0c848fe
2021-10-21 08:15:38 -07:00
8eb85b5027 Remove THCNumerics (#66388)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66388

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D31547710

Pulled By: ngimel

fbshipit-source-id: 20710328f2e5fc2e931a3f8ba9b4243acc310d54
2021-10-12 22:05:03 -07:00
9afdf017dc Add force_on_cpu test to win cuda10.2 on GHA (#65094)
Summary:
Part of migrating from Circle.

Once we get a successful force_on_cpu test, we can move it to trunk only.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65094

Reviewed By: seemethere

Differential Revision: D31086289

Pulled By: janeyx99

fbshipit-source-id: e1d135cc844d51f0b243b40efb49edca277d9de8
2021-09-21 11:14:15 -07:00
bd8608cd5c Use CMake for breakpad (#63186)
Summary:
We currently build breakpad from [this fork](https://github.com/driazati/breakpad) to include extra logic to restore signal handlers that were previously present. With some [new additions](https://github.com/google/breakpad/compare/main...driazati:main) this fork now includes a CMake based build, so we can add breakpad as a proper dependency rather than rely on including it in Docker images as a system library which is error prone (we have a bunch of images) and hard to extend to MacOS / Windows. This also includes some changes to the crash handling code to support MacOS / Windows in a similar way to Linux.

```python
import torch

# On Windows this writes crashes to C:\Users\<user>\AppData\pytorch_crashes
# On MacOS/Linux this writes crashes to /tmp/pytorch_crashes
torch.utils._crash_handler.enable_minidumps()

# Easy way to cause a segfault and trigger the handler
torch.bincount(input=torch.tensor([9223372036854775807]))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63186

Reviewed By: malfet, seemethere

Differential Revision: D30318404

Pulled By: driazati

fbshipit-source-id: 0d7daf3701cfaba5451cc529a0730272ab1eb1dc
2021-08-19 10:42:01 -07:00
1022443168 Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: revert-hammer

Differential Revision:
D30279364 (b004307252)

Original commit changeset: c1ed77dfe43a

fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e
2021-08-12 11:45:01 -07:00
b004307252 [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: manual inspection & sandcastle

Reviewed By: zertosh

Differential Revision: D30279364

fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a
2021-08-12 10:58:35 -07:00
86715623dd Adding super calls to JIT test case setUp and tearDown (#61922)
Summary:
This problem surfaced when opening https://github.com/pytorch/pytorch/issues/61655 did not manage to skip the appropriate test case.

I then investigated and realized it was because the setUp code that does the test disabling is not called because another defined setUp overrode the parent class' setUp.

I am not sure if that was intentional. If so, we would have to adapt the child class's code to call the check_if_enable function in common_utils.
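The failure mode can be reproduced with plain `unittest`. In the sketch below, `DISABLED_TESTS` and the skip logic in `BaseTestCase.setUp` are hypothetical stand-ins for the disabling check; only the `unittest` calls are real API.

```python
import unittest

DISABLED_TESTS = {"test_flaky"}  # hypothetical set of disabled test names


class BaseTestCase(unittest.TestCase):
    def setUp(self):
        super().setUp()
        # Stand-in for the test-disabling check done in the parent class.
        if self.id().split(".")[-1] in DISABLED_TESTS:
            self.skipTest("disabled via issue tracker")


class WithoutSuper(BaseTestCase):
    def setUp(self):
        # Overrides setUp without calling the parent: the check never runs.
        self.resource = object()

    def test_flaky(self):
        pass


class WithSuper(BaseTestCase):
    def setUp(self):
        super().setUp()  # the disabling check runs first
        self.resource = object()

    def test_flaky(self):
        pass


def count_skips(case_cls) -> int:
    """Run the single test method and report how many skips were recorded."""
    result = unittest.TestResult()
    case_cls("test_flaky").run(result)
    return len(result.skipped)


skipped_without_super = count_skips(WithoutSuper)
skipped_with_super = count_skips(WithSuper)
```

Only the class that calls `super().setUp()` honors the disabled list.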

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61922

Reviewed By: ejguan

Differential Revision: D29798716

Pulled By: janeyx99

fbshipit-source-id: d31b664e48507d69de14574ff5e6ecf1d41ae24d
2021-07-20 15:08:44 -07:00
45cc207a88 Fix breakpad build + add test canary (#60990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60990

This makes the breakpad build more explicit in its messaging and hints to cmake where to look for the library (it wasn't able to find it without `PATHS` on CI even though that works locally). This also adds a smoke test that will fail if breakpad isn't present on a CI job where it is expected (e.g. binary builds).

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D29514316

Pulled By: driazati

fbshipit-source-id: 79514363334788f311ba5d4f25deed3452f0c3eb
2021-07-06 14:15:07 -07:00
059a717c9e Fix breakpad build and add to more images (#59236)
Summary:
This PR
* adds the breakpad build to most of the remaining docker images (except the mobile + slim ones)
* pins to a [fork of breakpad](https://github.com/google/breakpad/compare/master...driazati:master?expand=1) to enable daisy chaining on signal handlers
* renames the API to be nicer

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59236

Reviewed By: malfet

Differential Revision: D28792511

Pulled By: driazati

fbshipit-source-id: 83723e74b7f0a00e1695210ac2620a0c91ab4bf2
2021-06-01 22:47:14 -07:00
1ec12fd491 Add minidump collection via breakpad (#55647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55647

This adds [breakpad](https://github.com/google/breakpad) which comes with out-of-the-box utilities to register a signal handler that writes out a minidump on an unhandled exception. Right now this is gated behind a flag in `torch.utils`, but in the future it could be on by default. Sizewise this adds about 500k to `libtorch_cpu.so` (187275968 B to 187810016 B).

```bash
$ cat <<EOF > test.py
import torch

torch.utils.enable_minidump_collection()

# temporary util that just segfaults
torch._C._crash()
EOF

$ python test.py
Wrote minidump to /tmp/pytorch_crashes/6a829041-50e9-4247-ea992f99-a74cf47a.dmp
fish: “python test.py” terminated by signal SIGSEGV (Address boundary error)
$ minidump-2-core /tmp/pytorch_crashes/6a829041-50e9-4247-ea992f99-a74cf47a.dmp -o core.dmp
$ gdb python core.dmp
... commence debugging ...
```

Right now all exceptions that get passed up to Python don't trigger the signal handler (which by default only
handles [these](https://github.com/google/breakpad/blob/main/src/client/linux/handler/exception_handler.cc#L115)). It would be possible for PyTorch exceptions to explicitly write a minidump when passed up to Python (maybe only when the exception is unhandled or something).

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D27679767

Pulled By: driazati

fbshipit-source-id: 1ab3b5160b6dc405f5097eb25acc644d533358d7
2021-04-16 13:05:01 -07:00
c0966914bc Internal gradcheck wrapper in testing._internal that sets certain flags to True (#51133)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49409

There are many call sites where gradcheck/gradgradcheck is now implicitly invoked with `check_batched_grad` as True, where it was previously False. Cases fall into two basic categories:
1) the call site was previously using `torch.autograd.gradcheck` but is now changed to use the globally imported function instead
2) the call site was already using the globally imported function, but does not explicitly pass the `check_batched_grad` flag

Only in the _assertGradAndGradgradChecks cases, which are infrequent, I assumed that the author is aware that omitting the flag means not applying check_batched_grad=True. (but maybe that is not the case?)

Overall this PR in its current state assumes that unless the author explicitly specified `check_batched_grad=False`, they were just probably not aware of this flag and did not mean to have this flag as False.
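The wrapper pattern described above can be sketched without PyTorch. `external_gradcheck` below is a hypothetical stand-in for the public function (it only reports the flag it received), and the wrapper's signature is an assumption made for illustration.

```python
from typing import Any, Callable, Dict, Tuple


def external_gradcheck(fn: Callable, inputs: Tuple, *, check_batched_grad: bool = False) -> Dict[str, Any]:
    """Hypothetical stand-in for the public gradcheck: reports the flag it ran with."""
    return {"check_batched_grad": check_batched_grad}


def gradcheck(fn: Callable, inputs: Tuple, **kwargs: Any) -> Dict[str, Any]:
    """Internal wrapper: turn the stricter check on unless the caller opts out."""
    kwargs.setdefault("check_batched_grad", True)
    return external_gradcheck(fn, inputs, **kwargs)


def square(x: float) -> float:
    return x * x


implicit = gradcheck(square, (2.0,))                            # flag defaults to True
explicit = gradcheck(square, (2.0,), check_batched_grad=False)  # explicit opt-out is kept
```

Using `setdefault` preserves an explicit `check_batched_grad=False`, which matches the assumption stated above that only callers who passed the flag meant to disable it.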

So far exceptions to the above (as discovered by CI) include:
 - Mkldnn (opaque tensors do not have strides) https://app.circleci.com/pipelines/github/pytorch/pytorch/264416/workflows/e4d87886-6247-4305-8526-2696130aa9a4/jobs/10401882/tests
 - all cases in test_sparse (https://app.circleci.com/pipelines/github/pytorch/pytorch/264553/workflows/3c1cbe30-830d-4acd-b240-38d833dccd9b/jobs/10407103)
 - all cases in test_overrides (https://app.circleci.com/pipelines/github/pytorch/pytorch/264553/workflows/3c1cbe30-830d-4acd-b240-38d833dccd9b/jobs/10407236)
 - test_autograd (test_LSTM_grad_and_gradgrad) - (https://app.circleci.com/pipelines/github/pytorch/pytorch/264553/workflows/3c1cbe30-830d-4acd-b240-38d833dccd9b/jobs/10407235)
 - test_data_parallel (test_data_parallel_buffers_requiring_grad) - *SIGSEGV* (https://app.circleci.com/pipelines/github/pytorch/pytorch/264820/workflows/14d89503-040d-4e3d-9f7b-0bc04833589b/jobs/10422697)
 - test_nn (https://app.circleci.com/pipelines/github/pytorch/pytorch/264919/workflows/df79e3ed-8a31-4a8e-b584-858ee99686ff/jobs/10427315)

Possible TODO is to prevent new tests from invoking external gradcheck.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51133

Reviewed By: ezyang

Differential Revision: D26147919

Pulled By: soulitzer

fbshipit-source-id: dff883b50f337510a89f391ea2fd87de2d531432
2021-01-29 09:13:37 -08:00
5834b3b204 Fix test_jit_cuda_archflags on machine with more than one arch (#50405)
Summary:
This fixes the following flaky test on machine with gpus of different arch:
```
_________________________________________________________________________________________________________________ TestCppExtensionJIT.test_jit_cuda_archflags __________________________________________________________________________________________________________________

self = <test_cpp_extensions_jit.TestCppExtensionJIT testMethod=test_jit_cuda_archflags>

    unittest.skipIf(not TEST_CUDA, "CUDA not found")
    unittest.skipIf(TEST_ROCM, "disabled on rocm")
    def test_jit_cuda_archflags(self):
        # Test a number of combinations:
        #   - the default for the machine we're testing on
        #   - Separators, can be ';' (most common) or ' '
        #   - Architecture names
        #   - With/without '+PTX'

        capability = torch.cuda.get_device_capability()
        # expected values is length-2 tuple: (list of ELF, list of PTX)
        # note: there should not be more than one PTX value
        archflags = {
            '': (['{}{}'.format(capability[0], capability[1])], None),
            "Maxwell+Tegra;6.1": (['53', '61'], None),
            "Pascal 3.5": (['35', '60', '61'], None),
            "Volta": (['70'], ['70']),
        }
        if int(torch.version.cuda.split('.')[0]) >= 10:
            # CUDA 9 only supports compute capability <= 7.2
            archflags["7.5+PTX"] = (['75'], ['75'])
            archflags["5.0;6.0+PTX;7.0;7.5"] = (['50', '60', '70', '75'], ['60'])

        for flags, expected in archflags.items():
>           self._run_jit_cuda_archflags(flags, expected)

test_cpp_extensions_jit.py:198:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test_cpp_extensions_jit.py:158: in _run_jit_cuda_archflags
    _check_cuobjdump_output(expected[0])
test_cpp_extensions_jit.py:134: in _check_cuobjdump_output
    self.assertEqual(actual_arches, expected_arches,
../../.local/lib/python3.9/site-packages/torch/testing/_internal/common_utils.py:1211: in assertEqual
    super().assertEqual(len(x), len(y), msg=self._get_assert_msg(msg, debug_msg=debug_msg))
E   AssertionError: 2 != 1 : Attempted to compare the lengths of [iterable] types: Expected: 2; Actual: 1.
E   Flags: ,  Actual: ['sm_75', 'sm_86'],  Expected: ['sm_86']
E   Stderr:
E   Output: ELF file    1: cudaext_archflags.1.sm_75.cubin
E   ELF file    2: cudaext_archflags.2.sm_86.cubin

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50405

Reviewed By: albanD

Differential Revision: D25920200

Pulled By: mrshenli

fbshipit-source-id: 1042a984142108f954a283407334d39e3ec328ce
2021-01-26 08:38:54 -08:00
654ab209c6 [JIT] Disable broken tests (#43750)
Summary:
These started failing since **https://github.com/pytorch/pytorch/pull/43633** for indecipherable reasons; temporarily disable. The errors on the PRs were
```
Downloading workspace layers
  workflows/workspaces/3ca9ca71-7449-4ae1-bb7b-b7612629cc62/0/8607ba99-5ced-473b-b60a-0025b48739a6/0/105.tar.gz - 8.4 MB
Applying workspace layers
  8607ba99-5ced-473b-b60a-0025b48739a6
```
which is not too helpful...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43750

Reviewed By: ZolotukhinM

Differential Revision: D23388060

Pulled By: eellison

fbshipit-source-id: 96afa0160ec948049f3e194787a0a7ddbeb5124a
2020-08-27 18:12:57 -07:00
0f78e596ba ROCm: Fix linking of custom ops in load_inline (#41257)
Summary:
Previously we did not link against amdhip64 (roughly equivalent to cudart). Apparently, the recent RTLD_GLOBAL fixes prevent the extensions from finding the symbols needed for launching kernels.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41257

Reviewed By: zou3519

Differential Revision: D22573288

Pulled By: ezyang

fbshipit-source-id: 89f9329b2097df26785e2f67e236d60984d40fdd
2020-07-17 12:14:50 -07:00
a318234eb0 Print raising warnings in Python rather than C++ if other error occurs (#41116)
Summary:
When we return to Python from C++ in PyTorch and have warnings and an error, we have the problem of what to do when the warnings throw because we can only throw one error.
Previously, if we had an error, we punted all warnings to the C++ warning handler which would write them to stderr (i.e. system fid 2) or pass them on to glog.

This has drawbacks if an error happened:
- Warnings are not handled through Python even if they don't raise,
- warnings are always printed with no way to suppress this,
- the printing bypasses sys.stderr, so Python modules wanting to
  modify this don't work (with the prominent example being Jupyter).

This patch does the following instead:
- Set the warning using standard Python extension mechanisms,
- if Python decides that this warning is an error and we have a
  PyTorch error, we print the warning through Python and clear
  the error state (from the warning).

This resolves the three drawbacks discussed above, in particular it fixes https://github.com/pytorch/pytorch/issues/37240 .
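For context, routing a warning through Python matters because Python's standard `warnings` machinery lets callers suppress it or promote it to an exception, neither of which is possible for text written directly to file descriptor 2. A small pure-Python illustration follows; `emit` is a hypothetical stand-in for an operation that warns, and no PyTorch code is involved.

```python
import warnings


def emit() -> None:
    """Hypothetical operation that issues a warning through Python."""
    warnings.warn("this op is deprecated", UserWarning)


# Case 1: the caller suppresses the warning, so nothing is recorded.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("ignore")
    emit()
suppressed_count = len(caught)

# Case 2: the caller's filter turns the warning into an exception.
raised = False
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        emit()
    except UserWarning:
        raised = True
```

The second case is the situation the patch has to handle: if an error is already pending, a warning that "raises" cannot become a second exception, so it is printed through Python instead.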

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41116

Differential Revision: D22456393

Pulled By: albanD

fbshipit-source-id: c3376735723b092efe67319321a8a993402985c7
2020-07-09 11:38:07 -07:00
ac8c8b028d [ROCm] restore jit tests (#40447)
Summary:
Remove `skipIfRocm` from most jit tests and enable `RUN_CUDA_HALF` tests for ROCm.

These changes passed more than three rounds of CI testing against the ROCm CI.

CC ezyang xw285cornell sunway513
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40447

Differential Revision: D22190711

Pulled By: xw285cornell

fbshipit-source-id: bac44825a2675d247b3abe2ec2f80420a95348a3
2020-06-27 01:03:59 -07:00
3e6fa778a5 Testcppextensionjit rebuild once (#40169)
Summary:
Previous:
    The decorator dont_wipe_extensions_build_folder controlled whether the build path was cleaned.
Now:
    If the cpp files or args changed, the extension is rebuilt. The build path is cleaned only before and after the test suite.
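One common way to decide "sources or args changed" is to fingerprint them and compare against the fingerprint of the last build. The sketch below shows that general idea only; the commit message does not say how the check is implemented, and `build_fingerprint` and `needs_rebuild` are hypothetical names.

```python
import hashlib
from typing import Dict, Optional, Sequence


def build_fingerprint(sources: Dict[str, str], args: Sequence[str]) -> str:
    """Hash source names, source contents, and build arguments into one digest."""
    h = hashlib.sha256()
    for name in sorted(sources):
        for part in (name, sources[name]):
            h.update(part.encode("utf-8"))
            h.update(b"\0")  # separator so adjacent fields cannot run together
    for arg in args:
        h.update(arg.encode("utf-8"))
        h.update(b"\0")
    return h.hexdigest()


def needs_rebuild(previous: Optional[str], sources: Dict[str, str], args: Sequence[str]) -> bool:
    """Rebuild when there is no previous build or the fingerprint differs."""
    return previous != build_fingerprint(sources, args)


sources = {"ext.cpp": "int add(int a, int b) { return a + b; }"}
first = build_fingerprint(sources, ["-O2"])
same_inputs = needs_rebuild(first, sources, ["-O2"])    # unchanged: reuse the build
changed_args = needs_rebuild(first, sources, ["-O3"])   # flag changed: rebuild
```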
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40169

Differential Revision: D22161450

Pulled By: ezyang

fbshipit-source-id: 9167c8265e13922f68cd886be900f84ffc6afb84
2020-06-23 08:43:14 -07:00
13120bf677 Updates assertEqual to require atol and rtol, removes positional atol (#38872)
Summary:
This updates assertEqual and assertEqual-like functions to either require both or neither of atol and rtol be specified. This should improve clarity around handling precision in the test suite, and it allows us to remove the legacy positional atol argument from assertEqual. In addition, the "message" kwarg is replaced with a kwarg-only "msg" argument whose name is consistent with unittest's assertEqual argument.

In the future we could make "msg" an optional third positional argument to be more consistent with unittest's assertEqual, but requiring it be specified should be clear, and we can easily update the signature to make "msg" an optional positional argument in the future, too.
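The "both or neither" rule can be sketched as a small standalone helper. This is an illustration of the rule only, not the test suite's `assertEqual`; `assert_close` and the default tolerances are hypothetical.

```python
from typing import Optional

DEFAULT_RTOL = 1e-5  # hypothetical defaults, used only when neither is given
DEFAULT_ATOL = 1e-8


def assert_close(
    actual: float,
    expected: float,
    *,
    atol: Optional[float] = None,
    rtol: Optional[float] = None,
    msg: Optional[str] = None,
) -> None:
    """Compare two floats; atol and rtol must be given together or not at all."""
    if (atol is None) != (rtol is None):
        raise ValueError("atol and rtol must be specified together or not at all")
    if atol is None or rtol is None:
        atol, rtol = DEFAULT_ATOL, DEFAULT_RTOL
    if abs(actual - expected) > atol + rtol * abs(expected):
        raise AssertionError(msg or f"{actual} != {expected} (atol={atol}, rtol={rtol})")


assert_close(1.0, 1.0 + 1e-9)                # neither given: defaults apply
assert_close(1.0, 1.1, atol=0.2, rtol=0.0)   # both given: explicit tolerances apply
```

Making the tolerances and `msg` keyword-only means a stray third positional argument can no longer be silently read as `atol`.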
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38872

Differential Revision: D21740237

Pulled By: mruberry

fbshipit-source-id: acbc027aa1d7877a49664d94db9a5fff91a07042
2020-05-27 06:31:07 -07:00
63e545e0fe Revert D21717199: [pytorch][PR] Updates assertEqual to require atol and rtol, removes positional atol
Test Plan: revert-hammer

Differential Revision:
D21717199

Original commit changeset: 9feb856f94ee

fbshipit-source-id: bfde9c39a5ce99f0ca6183a7dde703c65b7c8259
2020-05-26 18:23:59 -07:00
6ddca30b2d Updates assertEqual to require atol and rtol, removes positional atol (#38872)
Summary:
This updates assertEqual and assertEqual-like functions to either require both or neither of atol and rtol be specified. This should improve clarity around handling precision in the test suite, and it allows us to remove the legacy positional atol argument from assertEqual. In addition, the "message" kwarg is replaced with a kwarg-only "msg" argument whose name is consistent with unittest's assertEqual argument.

In the future we could make "msg" an optional third positional argument to be more consistent with unittest's assertEqual, but requiring it be specified should be clear, and we can easily update the signature to make "msg" an optional positional argument in the future, too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38872

Differential Revision: D21717199

Pulled By: mruberry

fbshipit-source-id: 9feb856f94eee911b44f6c7140a1d07c1b026d3a
2020-05-26 08:30:23 -07:00
12f5a32863 Don't use NonVariableTypeMode in custom ops (#37355)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37355

Potentially fixes https://github.com/pytorch/pytorch/issues/37306
ghstack-source-id: 103073537

Test Plan: waitforsandcastle

Differential Revision: D21261946

fbshipit-source-id: 454652b528dcf942bec5438f89201822de40bbf0
2020-04-28 20:11:31 -07:00
e75fb4356b Remove (most) Python 2 support from Python code (#35615)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35615

Python 2 has reached end-of-life and is no longer supported by PyTorch.
Now we can clean up a lot of cruft that we put in place to support it.
These changes were all done manually, and I skipped anything that seemed
like it would take more than a few seconds, so I think it makes sense to
review it manually as well (though using side-by-side view and ignoring
whitespace change might be helpful).

Test Plan: CI

Differential Revision: D20842886

Pulled By: dreiss

fbshipit-source-id: 8cad4e87c45895e7ce3938a88e61157a79504aed
2020-04-22 09:23:14 -07:00
d070c0bcf0 ROCm: enable cpp_extensions.load/load_inline (#35897)
Summary:
This enables cpp_extensions.load/load_inline. This works by hipify-ing cuda sources.
Also enable tests.
CuDNN/MIOpen extensions aren't yet supported, I propose to not do this in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35897

Differential Revision: D20983279

Pulled By: ezyang

fbshipit-source-id: a5d0f5ac592d04488a6a46522c58e2ee0a6fd57c
2020-04-13 11:44:08 -07:00
9e3605de98 [RELAND] New operator registration API (#35061) (#35629)
Summary:
Reland of https://github.com/pytorch/pytorch/pull/35061 ; removed
the get qualified type name magic from debug strings to work around
MSVC 2017 bug.

Main points of the new API:

- You can register implementations (impl) without having to specify a schema.
- Registrations are commutative, so no matter what order your static
  initializers run, you end up with the same end result.

op_registration_test.cpp contains a reasonably comprehensive accounting
for the available API surface

How does this implementation proceed?  The basic concept is to relax the
internal invariants of Dispatcher data structures to allow the
possibility that a FunctionSchema is not specified in an Operator.

- DispatchKeyExtractor has an uninitialized state where it doesn't look
  for dispatch keys in any arguments of the stack.  It can have a
  schema (de)registered to itself post facto with
  registerSchema/unregisterSchema.
- DispatchTable has a new constructor taking only an OperatorName for
  the uninitialized state.  It can have a schema (de)registered to itself
  post facto with registerSchema/unregisterSchema
- OperatorDef maintains counts of both defs as well as defs_and_impls.
  defs_and_impls keeps track of the outstanding impl registrations; you
  may have impl registrations but no defs.  If there are no defs (no
  schema), the operator is not returned by findSchema.  A new
  findOperatorByName function unconditionally returns the OperatorHandle
  even if there's no schema.  OperatorHandle::hasSchema can be used
  to check if the operator has schema.
- Replaced 'registerKernel' with 'registerImpl', which is the new
  interface for directly registering kernels without implementations.
- Because 'registerImpl' no longer requires an OperatorHandle, change
  'registerDef' to only return a RegistrationHandleRAII.  This is marginally
  less efficient (since we're doing two hash table lookups on a registration
  now), but this won't matter in the long term, and probably doesn't
  matter now either.
- Rename registerBackendFallbackKernel to registerFallback (this exposed
  a bunch of places where we're improperly directly interfacing with Dispatcher;
  we need to add this capability to the true public API)
- All code generated internal registrations are switched to use the new
  API.  This includes VariableType registrations (which previously
  weren't converted) and the mobile autograd stuff
- Switch the new-style def()/impl() APIs to interact directly with Dispatcher,
  rather than indirecting through the old API
- We deleted alias analysis kind merging entirely.  As a nod to BC, it's
  possible to define a full schema with alias analysis kind, and then
  later do another full schema def with missing alias analysis kind, but
  the opposite direction is not allowed.  We can remove this entirely
  following the plan at https://github.com/pytorch/pytorch/issues/35040
- Schema matching is moved inside the dispatcher, because we might not
  be able to immediately schema match at the point of an impl() (because
  we don't have the schema yet).  To do this, we store the inferred
  function schema inside a KernelEntry, so we can check it when we get
  the real schema.
- Registered kernel functions now store a debug string which
  can be used to more easily identify them.  Tests use this to
  distinguish between multiple distinct registrations; regular
  invocations get only very basic information.

Because we need our static initializers to work no matter what order
they're run, the testing strategy on this PR is quite involved.

The general concept:
- Bind a (very gimped) version of the dispatcher API from Python,
  so that we can easily write a more complex testing harness
  using expect tests.
- For series of registrations we want to test, exhaustively
  test every possible permutation of registrations (and
  deregistrations), and show that the intermediate states
  agree no matter what path is taken.
- Intermediate states are rendered using a new dumpState()
  debugging method that prints the internal state of the
  dispatcher.  This method may be generally useful for people
  who want to see what's in the dispatcher.
- Simultaneously, add a new invariant testing function which
  checks that the internal invariants of the dispatcher are
  upheld (so we don't have to print internal implementation
  details of the dispatcher)
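The testing idea above (apply registrations in every order and compare the rendered state) can be sketched on a toy registry. `ToyDispatcher`, `dump_state`, and `check_commutes` are hypothetical and far simpler than the real dispatcher and test_dispatch.py; the sketch only shows the shape of the permutation check.

```python
import itertools
from typing import Callable, Dict, FrozenSet, List, Set


class ToyDispatcher:
    """Toy registry: an operator may have impls before it has a schema."""

    def __init__(self) -> None:
        self.schemas: Dict[str, str] = {}
        self.impls: Dict[str, Set[str]] = {}

    def register_def(self, name: str, schema: str) -> None:
        self.schemas[name] = schema  # last writer wins

    def register_impl(self, name: str, key: str) -> None:
        self.impls.setdefault(name, set()).add(key)

    def dump_state(self) -> str:
        """Render the registry deterministically so states can be compared."""
        lines: List[str] = []
        for name in sorted(set(self.schemas) | set(self.impls)):
            lines.append(f"name: {name}")
            lines.append(f"schema: {self.schemas.get(name, '(none)')}")
            lines.append(f"impls: {sorted(self.impls.get(name, ()))}")
        return "\n".join(lines)


def check_commutes(registrations: List[Callable[[ToyDispatcher], None]]) -> bool:
    """Apply every permutation; the state after any given subset of
    registrations must not depend on the order in which they ran."""
    seen: Dict[FrozenSet[int], str] = {}
    for order in itertools.permutations(range(len(registrations))):
        dispatcher = ToyDispatcher()
        applied: FrozenSet[int] = frozenset()
        for i in order:
            registrations[i](dispatcher)
            applied = applied | {i}
            state = dispatcher.dump_state()
            if seen.setdefault(applied, state) != state:
                return False
    return True


commuting = check_commutes([
    lambda d: d.register_def("test::foo", "test::foo(Tensor x) -> Tensor"),
    lambda d: d.register_impl("test::foo", "CPU"),
    lambda d: d.register_impl("test::foo", "CUDA"),
])
conflicting = check_commutes([
    lambda d: d.register_def("test::foo", "test::foo(Tensor x) -> Tensor"),
    lambda d: d.register_def("test::foo", "test::foo(Tensor x, Tensor y) -> Tensor"),
])
```

The second case shows the harness catching an order dependence: two different defs for the same name leave the toy registry in a state that depends on which ran last.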

The testing framework found a few bugs in development.  For example,
here is a case where we registered schema too early, before checking
if it was valid:

```
Traceback (most recent call last):
  File "test/test_dispatch.py", line 164, in test_def_impl_schema_mismatch
    ], raises=True)
  File "test/test_dispatch.py", line 135, in commute
    results=results, raises=raises)
  File "test/test_dispatch.py", line 83, in run_permutation
    .format(ctor_order[:i], op_ix))
  File "test/test_dispatch.py", line 59, in check_invariants
    .format(expected_provenance, actual_provenance)
AssertionError: 'name[16 chars]ema: (none)\ncatchall: boxed unboxed :: (Tenso[18 chars]0)\n' != 'name[16 chars]ema: test::foo(Tensor x, Tensor y) -> (Tensor)[53 chars]0)\n'
  name: test::foo
- schema: (none)
+ schema: test::foo(Tensor x, Tensor y) -> (Tensor)
  catchall: boxed unboxed :: (Tensor _0) -> (Tensor _0)
 : expected from running ctors (1,); actual from running ctors (1,) and then failing to run ctor 0 (did this failure leave the dispatcher in a wedged state? it shouldn't!)
```

There are also C++ smoketests for the API.  These tests comprehensively
cover the C++ API surface of the new operator registration API, but
don't check very hard if the API does the right thing (that's what
test_dispatch.py is for)

Some miscellaneous changes which could have been split into other
PRs, but I was too lazy to do so:

- Add torch::jit::parseName (mirroring parseSchema/parseSchemaOrName)
- Add cloneWithName functionality to FunctionSchema
- Unconditionally generate schema registration, even when type_method_dispatch
  is a dict.  The one exception is for manual registrations....
- Add fallback, CppFunction::makeFallthrough and
  CppFunction::makeFromBoxedFunction to public API of op_registration, so we can
  stop calling internal registerImpl directly
- Add new syntax sugar dispatch_autograd for registering autograd kernels
- Minor OperatorName cleanup, storing OperatorName in DispatchTable
  and defining operator<< on OperatorName
- Refactored the op registration API to take FunctionSchema directly.
  We now do namespacing by post facto fixing up the OperatorName
  embedded in FunctionSchema.  This also means that you can
  now do torch::import("ns1").def("ns2::blah") and have the ns2
  override ns1 (although maybe this is not the correct behavior.)
- New torch::schema public API, for attaching alias analysis kind
  annotation kinds.  This meant we had to template up some function
  signatures which previously took const char*.  There's now a nice
  comment explaining this strategy.
- torch::import now takes std::string which means we can use
  the namespacing from Python

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35629

Differential Revision: D20724551

Pulled By: ezyang

fbshipit-source-id: befa46a1affb4ec4ae1fb39e3564a63695a6ca41
2020-03-29 19:48:29 -07:00
227beb9095 Revert D20680520: New operator registration API
Test Plan: revert-hammer

Differential Revision:
D20680520

Original commit changeset: 5d39a28e4ec7

fbshipit-source-id: 5b2497ffc24db9a05b01d526f161bc0164f9f707
2020-03-28 14:49:56 -07:00
28ab8c6ff8 New operator registration API (#35061)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35061

Main points of the new API:

- You can register implementations (impl) without having to specify a schema.
- Registrations are commutative, so no matter what order your static
  initializers run, you end up with the same end result.

op_registration_test.cpp contains a reasonably comprehensive accounting
of the available API surface.

How does this implementation proceed?  The basic concept is to relax the
internal invariants of Dispatcher data structures to allow the
possibility that a FunctionSchema is not specified in an Operator.

- DispatchKeyExtractor has an uninitialized state where it doesn't look
  for dispatch keys in any arguments of the stack.  It can have a
  schema (de)registered to itself post facto with
  registerSchema/unregisterSchema.
- DispatchTable has a new constructor taking only an OperatorName for
  the uninitialized state.  It can have a schema (de)registered to itself
  post facto with registerSchema/unregisterSchema
- OperatorDef maintains counts of both defs as well as defs_and_impls.
  defs_and_impls keeps track of the outstanding impl registrations; you
  may have impl registrations but no defs.  If there are no defs (no
  schema), the operator is not returned by findSchema.  A new
  findOperatorByName function unconditionally returns the OperatorHandle
  even if there's no schema.  OperatorHandle::hasSchema can be used
  to check if the operator has schema.
- Replaced 'registerKernel' with 'registerImpl', which is the new
  interface for directly registering kernels without implementations.
- Because 'registerImpl' no longer requires an OperatorHandle, change
  'registerDef' to only return a RegistrationHandleRAII.  This is marginally
  less efficient (since we're doing two hash table lookups on a registration
  now), but this won't matter in the long term, and probably doesn't
  matter now either.
- Rename registerBackendFallbackKernel to registerFallback (this exposed
  a bunch of places where we're improperly directly interfacing with Dispatcher;
  we need to add this capability to the true public API)
- All code generated internal registrations are switched to use the new
  API.  This includes VariableType registrations (which previously
  weren't converted) and the mobile autograd stuff
- Switch the new-style def()/impl() APIs to interact directly with Dispatcher,
  rather than indirecting through the old API
- We deleted alias analysis kind merging entirely.  As a nod to BC, it's
  possible to define a full schema with alias analysis kind, and then
  later do another full schema def with missing alias analysis kind, but
  the opposite direction is not allowed.  We can remove this entirely
  following the plan at https://github.com/pytorch/pytorch/issues/35040
- Schema matching is moved inside the dispatcher, because we might not
  be able to immediately schema match at the point of an impl() (because
  we don't have the schema yet).  To do this, we store the inferred
  function schema inside a KernelEntry, so we can check it when we get
  the real schema.
- Registered kernel functions now store a debug string which
  can be used to more easily identify them.  There's some best
  effort stuff based on __FUNCSIG__ but this is only really
  capable of reporting types and not function symbols.  Tests
  use this to distinguish between multiple distinct registrations.
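
To illustrate the defs / defs_and_impls bookkeeping described in the list above, here is a minimal toy model in Python. It is a sketch only: `ToyDispatcher`, `ToyOperator` and all method names are hypothetical stand-ins, not the real C++ Dispatcher API, and kernels are plain strings.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyOperator:
    # Hypothetical stand-in for an operator entry; not the real OperatorDef.
    name: str
    schema: Optional[str] = None
    defs: int = 0            # outstanding def() registrations
    defs_and_impls: int = 0  # outstanding def() plus impl() registrations
    kernels: List[str] = field(default_factory=list)


class ToyDispatcher:
    def __init__(self) -> None:
        self._ops: Dict[str, ToyOperator] = {}

    def _get_or_create(self, name: str) -> ToyOperator:
        return self._ops.setdefault(name, ToyOperator(name))

    def register_def(self, name: str, schema: str) -> None:
        op = self._get_or_create(name)
        op.schema = schema
        op.defs += 1
        op.defs_and_impls += 1

    def register_impl(self, name: str, kernel: str) -> None:
        # An impl may arrive before any schema is known.
        op = self._get_or_create(name)
        op.kernels.append(kernel)
        op.defs_and_impls += 1

    def deregister_def(self, name: str) -> None:
        op = self._ops[name]
        op.defs -= 1
        op.defs_and_impls -= 1
        if op.defs == 0:
            op.schema = None
        self._cleanup(name)

    def deregister_impl(self, name: str, kernel: str) -> None:
        op = self._ops[name]
        op.kernels.remove(kernel)
        op.defs_and_impls -= 1
        self._cleanup(name)

    def _cleanup(self, name: str) -> None:
        # Drop the entry once nothing refers to it any more.
        if self._ops[name].defs_and_impls == 0:
            del self._ops[name]

    def find_schema(self, name: str) -> Optional[ToyOperator]:
        # Only operators with at least one def (a schema) are visible here.
        op = self._ops.get(name)
        return op if op is not None and op.defs > 0 else None

    def find_operator_by_name(self, name: str) -> Optional[ToyOperator]:
        # Returns the operator even when it has no schema yet.
        return self._ops.get(name)
```

In this toy model, an operator that only has impl registrations is found by `find_operator_by_name` but not by `find_schema`, which mirrors the distinction drawn above.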

Because we need our static initializers to work no matter what order
they're run, the testing strategy on this PR is quite involved.

The general concept:
- Bind a (very gimped) version of the dispatcher API from Python,
  so that we can easily write a more complex testing harness
  using expect tests.
- For series of registrations we want to test, exhaustively
  test every possible permutation of registrations (and
  deregistrations), and show that the intermediate states
  agree no matter what path is taken.
- Intermediate states are rendered using a new dumpState()
  debugging method that prints the internal state of the
  dispatcher.  This method may be generally useful for people
  who want to see what's in the dispatcher.
- Simultaneously, add a new invariant testing function which
  checks that the internal invariants of the dispatcher are
  upheld (so we don't have to print internal implementation
  details of the dispatcher)
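
The permutation idea above can be sketched in a few lines of Python. This is not the code in test_dispatch.py; `apply_op`, `render` and `check_commute` are hypothetical names, the state is a plain dict, and only registrations (not deregistrations) are covered.

```python
import itertools


def apply_op(state, op):
    """Apply one registration and return a new state (the input is not mutated)."""
    kind, name, payload = op
    schema, kernels = state.get(name, (None, frozenset()))
    if kind == "def":
        schema = payload
    else:  # "impl"
        kernels = kernels | {payload}
    new_state = dict(state)
    new_state[name] = (schema, kernels)
    return new_state


def render(state):
    """Render the state as text, similar in spirit to a dumpState() output."""
    lines = []
    for name in sorted(state):
        schema, kernels = state[name]
        lines.append(f"name: {name}")
        lines.append(f"schema: {schema if schema is not None else '(none)'}")
        for kernel in sorted(kernels):
            lines.append(f"kernel: {kernel}")
    return "\n".join(lines)


def check_commute(ops):
    """Run every permutation of ops and check that intermediate states agree.

    Two paths that have applied the same set of registrations must render
    identically, no matter in which order they were applied.
    """
    seen = {}  # frozenset of applied op indices -> rendered state
    for order in itertools.permutations(range(len(ops))):
        state = {}
        applied = frozenset()
        for ix in order:
            state = apply_op(state, ops[ix])
            applied = applied | {ix}
            rendered = render(state)
            expected = seen.setdefault(applied, rendered)
            assert rendered == expected, (
                f"order {order} diverged after applying {sorted(applied)}"
            )
    return seen[frozenset(range(len(ops)))]
```

Keying the expected output on the set of applied registrations is what makes the check cover intermediate states and not just the final one.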

The testing framework found a few bugs in development.  For example,
here is a case where we registered schema too early, before checking
if it was valid:

```
Traceback (most recent call last):
  File "test/test_dispatch.py", line 164, in test_def_impl_schema_mismatch
    ], raises=True)
  File "test/test_dispatch.py", line 135, in commute
    results=results, raises=raises)
  File "test/test_dispatch.py", line 83, in run_permutation
    .format(ctor_order[:i], op_ix))
  File "test/test_dispatch.py", line 59, in check_invariants
    .format(expected_provenance, actual_provenance)
AssertionError: 'name[16 chars]ema: (none)\ncatchall: boxed unboxed :: (Tenso[18 chars]0)\n' != 'name[16 chars]ema: test::foo(Tensor x, Tensor y) -> (Tensor)[53 chars]0)\n'
  name: test::foo
- schema: (none)
+ schema: test::foo(Tensor x, Tensor y) -> (Tensor)
  catchall: boxed unboxed :: (Tensor _0) -> (Tensor _0)
 : expected from running ctors (1,); actual from running ctors (1,) and then failing to run ctor 0 (did this failure leave the dispatcher in a wedged state? it shouldn't!)
```

There are also C++ smoketests for the API.  These tests comprehensively
cover the C++ API surface of the new operator registration API, but
don't check very hard if the API does the right thing (that's what
test_dispatch.py is for).

Some miscellaneous changes which could have been split into other
PRs, but I was too lazy to do so:

- Add torch::jit::parseName (mirroring parseSchema/parseSchemaOrName)
- Add cloneWithName functionality to FunctionSchema
- Unconditionally generate schema registration, even when type_method_dispatch
  is a dict.  The one exception is for manual registrations....
- Add fallback, CppFunction::makeFallthrough and
  CppFunction::makeFromBoxedFunction to public API of op_registration, so we can
  stop calling internal registerImpl directly
- Add new syntax sugar dispatch_autograd for registering autograd kernels
- Minor OperatorName cleanup, storing OperatorName in DispatchTable
  and defining operator<< on OperatorName
- Refactored the op registration API to take FunctionSchema directly.
  We now do namespacing by post facto fixing up the OperatorName
  embedded in FunctionSchema.  This also means that you can
  now do torch::import("ns1").def("ns2::blah") and have the ns2
  override ns1 (although maybe this is not the correct behavior.)
- New torch::schema public API, for attaching alias analysis kind
  annotations.  This meant we had to template up some function
  signatures which previously took const char*.  There's now a nice
  comment explaining this strategy.
- torch::import now takes std::string which means we can use
  the namespacing from Python
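
The namespace fix-up behavior mentioned in the list above (an explicit "ns2::" in the name overriding the imported "ns1") can be sketched as follows. `qualify` is a hypothetical helper operating on plain strings, not part of the registration API, and it assumes an unqualified name simply receives the import namespace.

```python
def qualify(import_ns: str, name: str) -> str:
    """Return a fully qualified operator name.

    If `name` already carries a namespace ("ns2::blah"), that namespace wins;
    otherwise the namespace given at import time is prepended.
    """
    if "::" in name:
        return name
    return f"{import_ns}::{name}"
```

Under these assumptions, `qualify("ns1", "blah")` gives `"ns1::blah"`, while `qualify("ns1", "ns2::blah")` keeps `"ns2::blah"`.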

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D20680520

Pulled By: ezyang

fbshipit-source-id: 5d39a28e4ec7c73fe4b1fb2222e865ab65e188f5
2020-03-28 10:52:49 -07:00
44af8ee6cd Add pybind11 exception translator (#30588)
Summary:
Closes https://github.com/pytorch/pytorch/issues/30027

The idea here is that you can bind a function with `pybind11` in a single line and without modifying the function:
```cpp
m.def("foo", foo, py::call_guard<torch::PyWarningHandler>());
```
Here, warnings are handled by the [`call_guard`](https://pybind11.readthedocs.io/en/stable/advanced/functions.html#call-guard) and exceptions are handled by the `pybind11` exception translator. To do this, I have added support for handling C++ exceptions in `torch::PyWarningHandler`'s destructor without setting the Python error state beforehand.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30588

Differential Revision: D19905626

Pulled By: albanD

fbshipit-source-id: 90c0a5e298b123cc0c8ab9c52c91be4e96ea47c6
2020-02-18 11:33:29 -08:00
6209412647 Add option to use ninja to compile ahead-of-time cpp_extensions (#32495)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32495

Background
------------------------------
Previously, ninja was used to compile+link inline cpp_extensions and
ahead-of-time cpp_extensions were compiled with distutils. This PR adds
the ability to compile (but not link) ahead-of-time cpp_extensions with ninja.

The main motivation for this is to speed up cpp_extension builds: distutils
does not make use of parallelism. With this PR, using the new option, on my machine,
- torchvision compilation goes from 3m43s to 49s
- nestedtensor compilation goes from 2m0s to 28s.

User-facing changes
------------------------------

I added a `use_ninja` flag to BuildExtension. This defaults to
`True`. When `use_ninja` is True:
- it will attempt to use ninja.
- If we cannot use ninja, it emits a warning and falls back to
distutils.
- Situations where we cannot use ninja: Windows (NYI, I'll open a new issue
for this), or ninja cannot be found on the system.
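
A minimal sketch of the fallback policy described above, using only the standard library. `can_use_ninja` and `resolve_backend` are hypothetical helpers written for illustration; they are not the functions in torch.utils.cpp_extension.

```python
import shutil
import sys
import warnings
from typing import Optional, Tuple


def can_use_ninja(platform: str, ninja_path: Optional[str]) -> Tuple[bool, str]:
    """Return (ok, reason); reason explains why ninja cannot be used."""
    if platform.startswith("win"):
        return False, "ninja builds are not yet implemented on Windows"
    if ninja_path is None:
        return False, "ninja was not found on this system"
    return True, ""


def resolve_backend(use_ninja: bool, platform: str, ninja_path: Optional[str]) -> str:
    """Pick the compile backend, warning when falling back to distutils."""
    if not use_ninja:
        return "distutils"
    ok, reason = can_use_ninja(platform, ninja_path)
    if not ok:
        warnings.warn(f"Falling back to distutils: {reason}")
        return "distutils"
    return "ninja"


if __name__ == "__main__":
    # Probe the current machine: sys.platform and the ninja binary on PATH.
    print(resolve_backend(True, sys.platform, shutil.which("ninja")))
```

Passing the platform and the ninja path in as arguments keeps the decision a pure function, which makes the Windows and missing-ninja cases easy to exercise without changing the environment.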

Implementation Details
------------------------------

This PR makes this change in two steps. Please let me know if it would be
easier to review this if I split this up into a stacked diff.
Those changes are:
1) refactor _write_ninja_file to separate the policy (what compiler flags
to pass) from the mechanism (how to write the ninja file and do compilation).
2) call _write_ninja_file and _run_ninja_build while building
ahead-of-time cpp_extensions. These are only used to compile objects;
distutils still handles the linking.

Change 1: refactor _write_ninja_file to separate policy from mechanism
- I split _write_ninja_file into: _write_ninja_file and
_write_ninja_file_to_build_library
- I renamed _build_extension_module to _run_ninja_build

Change 2: Call _write_ninja_file while building ahead-of-time
cpp_extensions
- _write_ninja_file_and_compile_objects calls _write_ninja_file to only
build object files.
- We monkey-patch distutils.CCompiler.compile to call
_write_ninja_file_and_compile_objects
- distutils still handles the linking step. The linking step is not a
bottleneck so it was not a concern.
- This change only works on unix-based systems. Our code for windows
goes down a different codepath and I did not want to mess with that.
- If a system does not support ninja, we raise a warning and fall back
to the original compilation path.
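
The monkey-patching pattern from Change 2 can be sketched with a stand-in compiler object. `ToyCompiler`, `patch_compile` and the callables passed to it are hypothetical; the real change patches the distutils compiler and drives ninja, which this sketch does not do.

```python
import warnings


class ToyCompiler:
    """Stand-in for a distutils-style compiler object (hypothetical)."""

    def compile(self, sources):
        # Original path: pretend to compile each source into an object file.
        return [src.rsplit(".", 1)[0] + ".o" for src in sources]

    def link(self, objects, output):
        # Linking is left untouched by the patch.
        return f"{output} <- {' '.join(objects)}"


def patch_compile(compiler, fast_compile, fast_path_available):
    """Replace compiler.compile with a wrapper that prefers fast_compile.

    Returns the original compile method so a caller can restore it later.
    """
    original_compile = compiler.compile

    def wrapped_compile(sources):
        if not fast_path_available():
            warnings.warn("fast compile path unavailable; using the original compile")
            return original_compile(sources)
        return fast_compile(sources)

    compiler.compile = wrapped_compile  # instance-level monkey-patch
    return original_compile
```

Only `compile` is swapped out; `link` keeps its original behavior, matching the split described above where the linking step stays on the existing path.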

Test Plan
------------------------------

Adhoc testing
- I built torchvision using pytorch master and printed out the build
commands. Next, I used this branch to build torchvision and looked at
the ninja file. I compared the ninja file with the build commands and
asserted that they were functionally the same.
- I repeated the above for pytorch/nestedtensor.

PyTorch test suite
- I split `test_cpp_extensions` into `test_cpp_extensions_aot` and
`test_cpp_extensions_jit`. The AOT (ahead-of-time) version tests
ahead-of-time and the JIT version tests just-in-time (not to be confused
with TorchScript)
- `test_cpp_extensions_aot` gets run TWICE by run_test.py, once with
a module that was built with ninja, and once with a module that was
built without ninja.
- run_test.py asserts that when we are building with use_ninja=True,
ninja is actually available on the system.

Test Plan: Imported from OSS

Differential Revision: D19730432

Pulled By: zou3519

fbshipit-source-id: 819590d01cf65e8da5a1e8019b8b3084792fee90
2020-02-05 18:49:29 -08:00