226 Commits

Author SHA1 Message Date
8fae7027b3 Don't introduce new overload for SymInt (#83628)
Previously, we introduced new SymInt overloads for every function we wanted.  This led to a lot of boilerplate, and also a lot of confusion about how the overloads needed to be implemented.

This PR takes a simpler but more risky approach: just take the original function and changes its ints to SymInts.

This is BC-breaking in the following ways:

* The C++ API for registering implementations for aten operators will change from int64_t to SymInt whenever you make this change. Code generated registrations in PyTorch do not change as codegen handles the translation automatically, but manual registrations will need to follow the change.  Typically, if you now accept a SymInt where you previously only took int64_t, you have to convert it back manually.  This will definitely break XLA, see companion PR https://github.com/pytorch/xla/pull/3914 Note that not all dispatch keys get the automatic translation; all the composite keys and Meta keys are modified to take SymInt directly (because they should handle them directly), and so there are adjustments for this.

This is not BC-breaking in the following ways:

* The user facing C++ API remains compatible.  Even if a function changes from int to SymInt, the default C++ binding still takes only ints.  (e.g., at::empty(IntArrayRef, ...).  To call with SymInts, you must call at::empty_symint instead. This involved adding two more signatures to CppSignatureGroup; in many cases I refactored code to iterate over all signatures in the group instead of hard-coding the two that previously existed.
* This is TorchScript compatible; internally we treat SymInts as ints so there is no change to what happens at runtime in TorchScript. In particular, it's OK to reference an empty schema by its old type (using int types), as long as you're not doing string equality (which you shouldn't be), these parse to the same underyling type.

Structure of the PR:

* The general strategy of this PR is that, even when you write `SymInt` inside `native_functions.yaml`, sometimes, we will treat it *as if* it were an `int`. This idea pervades the codegen changes, where we have a translation from SymInt to c10::SymInt or int64_t, and this is controlled by a symint kwarg which I added and then audited all call sites to decide which I wanted. Here are some of the major places where we pick one or the other:
  * The C++ FunctionSchema representation represents `SymInt` as `int`. There are a few places we do need to know that we actually have a SymInt and we consult `real_type()` to get the real type in this case. In particular:
    * When we do schema validation of C++ operator registration, we must compare against true schema (as the C++ API will provide `c10::SymInt`, and this will only be accepted if the schema is `SymInt`. This is handled with cloneWithRealTypes before we check for schema differences.
    * In `toIValue` argument parsing, we parse against the true schema value. For backwards compatibility reasons, I do still accept ints in many places where Layout/SymInt/etc were expected. (Well, accepting int where SymInt is expected is not BC, it's just the right logic!)
  * In particular, because SymInt never shows up as type() in FunctionSchema, this means that we no longer need a dedicated Tag::SymInt. This is good, because SymInts never show up in mobile anyway.
* Changes to functorch/aten are mostly about tracking changes to the C++ API registration convention. Additionally, since SymInt overloads no longer exist, registrations for SymInt implementations are deleted. In many cases, the old implementations did not properly support SymInts; I did not add any new functionality with this PR, but I did try to annotate with TODOs where this is work to do. Finally, because the signature of `native::` API changed from int to SymInt, I need to find alternative APIs for people who were directly calling these functions to call. Typically, I insert a new dispatch call when perf doesn't matter, or use `at::compositeexplicitautograd` namespace to handle other caes.
* The change to `make_boxed_from_unboxed_functor.h` is so that we accept a plain IntList IValue anywhere a SymIntList is expected; these are read-only arguments so covariant typing is OK.
* I change how unboxing logic works slightly. Previously, we interpret the C++ type for Layout/etc directly as IntType JIT type, which works well because the incoming IValue is tagged as an integer. Now, we interpret the C++ type for Layout as its true type, e.g., LayoutType (change to `jit_type.h`), but then we accept an int IValue for it anyway. This makes it symmetric with SymInt, where we interpret the C++ type as SymIntType, and then accept SymInt and int IValues for it.
* I renamed the `empty.names` overload to `empty_names` to make it less confusing (I kept mixing it up with the real empty overload)
* I deleted the `empty.SymInt` overload, which ended up killing a pile of functions. (This was originally a separate PR but the profiler expect test was giving me grief so I folded it in.)
* I deleted the LazyDynamicOpsTest tests. These were failing after these changes, and I couldn't figure out why they used to be passing: they make use of `narrow_copy` which didn't actually support SymInts; they were immediately converted to ints.
* I bashed LTC into working. The patches made here are not the end of the story. The big problem is that SymInt translates into Value, but what if you have a list of SymInt? This cannot be conveniently represented in the IR today, since variadic Values are not supported. To work around this, I translate SymInt[] into plain int[] (this is fine for tests because LTC dynamic shapes never actually worked); but this will need to be fixed for proper LTC SymInt support. The LTC codegen also looked somewhat questionable; I added comments based on my code reading.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83628
Approved by: https://github.com/albanD, https://github.com/bdhirsh
2022-08-23 22:04:07 +00:00
badbdb0330 [torchgen] Relax the restriction on number of custom namespaces (#83580)
Summary:
We started to see use cases where it involves more than 1 custom namespace to live within the same yaml file. Hence relaxing the restriction that 1 yaml file can only have 1 custom namespace other than `aten`. Updated unit test as well.

Differential Revision: D38775685

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83580
Approved by: https://github.com/JacobSzwejbka
2022-08-18 04:47:13 +00:00
11d4d91bdc [torchgen] Add logic in annotation parser to accept alias set (#83501)
Extending the current regex in `model.py` to support annotation alias set. See issue #83214.

Ideally we should have a full fledged lexer similar to `schema_type_parser.cpp`, since regex can be more and more difficult to read if we add more support to it.

Adding this to unblock this issue for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83501
Approved by: https://github.com/SherlockNoMad
2022-08-17 07:04:25 +00:00
d0d6b1f222 [torchgen] Generate out variant for functional operator (#81437)
Summary:
Previously we don't generate out variant (both schema and kernel) for an operator with functional variant only. This adds support for that and adds test.

## Changes on `native_function_generation.py`

We are generating out variant for all functional variants if possible. This PR introduces a lot of newly generated out variants and `native_functions.yaml` needs to incorporate the changes by adding `autogen` keywords.

The logic for determining what operators we should generate an out variant for is the following:

1. No existing out variant for this `NativeFunction`
2. Contains an existing in place, mutable or functional variant
3. Contains at least 1 tensor like return(s)

For operators matching the first two conditions but failing the third, I listed them in `FUNCTIONAL_OPS_THAT_CANNOT_GET_AN_OUT_VARIANT`.

## Special handling

The following operators satisfy all 3 criteria above but we chose to not autogen them, with some reasons.
* `mkldnn_adaptive_avg_pool2d`, the generated out variant `mkldnn_adaptive_avg_pool2d.out` is colliding with the `mkldnn_adaptive_avg_pool2d_out` kernel in `adaptive_avg_pool2d.out` operator. I manually created `mkldnn_adaptive_avg_pool2d.out` and renamed `mkldnn_adaptive_avg_pool2d_out` to `mkldnn_adaptive_avg_pool2d_out_stub`.
* `min`, `max` and `mean`. There already exist `min.out`, `max.out` and `mean.out` but they are having different semantics with the functional ones. I manually created `min.unary_out`, `max.unary_out` and `mean.dtype_out` to disambiguate.

## Autograd Changes

We introduced a logic to not match derivatives info in `derivatives.yaml` to out variant, since we are generating `NOT_IMPLEMENTED` kernels for those out variants anyway. The issue we are seeing with the original logic is that it doesn't handle `TensorOption` arguments really well. For example we have these two operators:

* `_to_copy(Tensor self, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, bool non_blocking=False, MemoryFormat? memory_format=None) -> Tensor`
* `_to_copy.out(Tensor self, *, bool non_blocking=False, MemoryFormat? memory_format=None, Tensor(a!) out) -> Tensor(a!)`

If we uses `_to_copy` derivative info, there will be compilation error since `dtype` is missing from `_to_copy.out` signature.
Test Plan: Rely on unit test

Differential Revision: D37832342

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81437
Approved by: https://github.com/iseeyuan, https://github.com/bdhirsh
2022-08-13 05:44:53 +00:00
c322fc03a1 [torchgen] Fix selective build error on custom namespace (#83141)
Summary: Currently `SelectiveBuilder` is hardcoding namespace `aten` for operators. This is not working anymore since operators started to have custom namespaces. This fixes it.

Test Plan: Rely on newly added unit test

Differential Revision: D38565527

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83141
Approved by: https://github.com/JacobSzwejbka
2022-08-10 21:27:05 +00:00
e3e33cfae0 Enable codegen of per-dispatch key derivative formulas in derivatives.yaml (#82801)
`derivatives.yaml` can now take a `dispatch` entry which registers per-autograd dispatch key derivatives such as
```
name: foo(Tensor self, Tensor y) -> Tensor
dispatch:
  Default:
    x: grad
    y: grad.expand(y.sizes())
  AutogradNestedTensor:
    x: grad
    y:  NestedTensor_foo_backward(grad, y)
output_differentiabilty: [True]
```

However the old schema where there is no `dispatch` entry is still supported.

Would greatly appreciate feedback on *how to improve the testing strategy* of this PR, currently have registered an aten test op in TestOps.cpp with dummy gradients in derivatives.yaml and have some tests in test_autograd.py:TestAutogradMultipleDispatch but I am not sure whether these are sufficiently rigorous.

Additionally, this PR also makes the assumption that sets like [VIEW_FUNCTIONS](ff5399e528/tools/autograd/gen_inplace_or_view_type.py (L60)) are per-native-function and not per-native-function-and-dispatch-key. I'm not sure whether this is necessarily the case, *would there ever be a situation where (e.g. a nested_tensor op is a view op but the aten function is not or vice versa?)*

* __->__ #82801
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82801
Approved by: https://github.com/bhosmer, https://github.com/albanD
2022-08-10 19:26:29 +00:00
943553965e support custom class in torchgen schema parser (#82925)
Differential Revision: [D38480514](https://our.internmc.facebook.com/intern/diff/D38480514/)

torchgen schema parser does not support parsing function schemas using custom class so far. Here is an example:
```
quantized::conv2d_relu.new(Tensor qx, __torch__.torch.classes.quantized.Conv2dPackedParamsBase packed_weight, float output_scale, int output_zero_point) -> (Tensor)
```

This PR parse custom class name and encapsulate that into an object of CustomClassType. The only thing we need right now is just store the string class name and return that in `__str__` method.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82925
Approved by: https://github.com/ezyang, https://github.com/bdhirsh
2022-08-08 22:24:43 +00:00
301fe8c27d [torchgen] Fix multiple backends with custom namespace (#82133)
Summary:
Some quantized operators needs `QuantizedCPU` backend, due to an issue in namespace checking, currently if we have two backends as well as a custom namespaces in native function, codegen will hit assertion error. This PR fixes this issue

The root cause is that codegen right now asserts that a native function should only have one namespace. The current behavior is that If a native function is not found in a `BackendIndex`, we will use default namespace for that backend, for fallback kernels. However that default namespace may not be listed in the yaml file and it should not be counted when checking if we have two different namespaces for that backend. In our error case, we have 2 `BackendIndex`, one for `QuantizedCPU` and one for `CPU`. The native function doesn't have a kernel in `QuantizedCPU` but we still use a default namespace (`at::native`) for it. Since we have a custom namespace for dispatch key `CPU`, we ran into the assertion error.

This PR changes the assertion criteria. We only error out if a namespace has two or more kernels and they have two or more different namespaces.

Test Plan: rely on newly added unit test

Differential Revision: D38101345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82133
Approved by: https://github.com/iseeyuan
2022-07-29 22:53:58 +00:00
30e74be784 a new section for ir generation (#81847)
This is to get a conversation started.

* @JackCaoG we could add attributes to items in `ir_codegen` section to customize IR generation logic (e.g. not generating `::Lower`). Though it could be a bit tricky to thread it through.
* Adding an extra argument to `map_codegen` to filter native functions out seems like a step in the right direction. Otherwise, it's a bit confusing how do we go from a full list to a codegen list.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81847
Approved by: https://github.com/JackCaoG, https://github.com/wconstab, https://github.com/bdhirsh
2022-07-26 20:39:07 +00:00
06a0cfc0ea pytest to run test_ops, test_ops_gradients, test_ops_jit in non linux cuda environments (#79898)
This PR uses pytest to run test_ops, test_ops_gradients, and test_ops_jit in parallel in non linux cuda environments to decrease TTS.  I am excluding linux cuda because running in parallel results in errors due to running out of memory

Notes:
* update hypothesis version for compatability with pytest
* use rerun-failures to rerun tests (similar to flaky tests, although these test files generally don't have flaky tests)
  * reruns are denoted by a rerun tag in the xml.  Failed reruns also have the failure tag.  Successes (meaning that the test is flaky) do not have the failure tag.
* see https://docs.google.com/spreadsheets/d/1aO0Rbg3y3ch7ghipt63PG2KNEUppl9a5b18Hmv2CZ4E/edit#gid=602543594 for info on speedup (or slowdown in the case of slow tests)
  * expecting windows tests to decrease by 60 minutes total
* slow test infra is expected to stay the same - verified by running pytest and unittest on the same job and check the number of skipped/run tests
* test reports to s3 changed - add entirely new table to keep track of invoking_file times
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79898
Approved by: https://github.com/malfet, https://github.com/janeyx99
2022-07-19 19:50:57 +00:00
347b036350 Apply ufmt linter to all py files under tools (#81285)
With ufmt in place https://github.com/pytorch/pytorch/pull/81157, we can now use it to gradually format all files. I'm breaking this down into multiple smaller batches to avoid too many merge conflicts later on.

This batch (as copied from the current BLACK linter config):
* `tools/**/*.py`

Upcoming batchs:
* `torchgen/**/*.py`
* `torch/package/**/*.py`
* `torch/onnx/**/*.py`
* `torch/_refs/**/*.py`
* `torch/_prims/**/*.py`
* `torch/_meta_registrations.py`
* `torch/_decomp/**/*.py`
* `test/onnx/**/*.py`

Once they are all formatted, BLACK linter will be removed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81285
Approved by: https://github.com/suo
2022-07-13 07:59:22 +00:00
80f6d2e9e6 [torchgen] Extract out schema registration logic into a function (#80780)
Summary:
A followup to  #78015 and #79733. In those PRs I introduced custom namespace support into:
* `Register<DispatchKey>.cpp`
* `RegisterSchema.cpp`
* `NativeFunctions.h`

This PR extracts out logic that generates schema registration code (used in `RegisterSchema.cpp`) into a function so that it can be easily tested and reused. Added unit test to cover the logic as well.

Test Plan: Rely on newly added unit tests.

Differential Revision: D37581186

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80780
Approved by: https://github.com/iseeyuan
2022-07-12 21:52:42 +00:00
5c8a9803c8 [torchgen] Support multiple namespace in NativeFunctions.h (#79733)
Summary:
This is a follow up to #78015. This PR
* introduces namespace logic for generating `NativeFunctions.h`.
* adds helper function to extract namespace from string
* relaxes the constraint on the levels we support for custom kernel namespace to 2

Test Plan:
Yaml entry:
```
- func: unsqueeze.out(Tensor(a) self, int dim, *, Tensor(a!) out) -> Tensor(a!)
  variants: function
  device_check: NoCheck
  dispatch:
    CPU: custom_1::custom_2::unsqueeze
```

Generated `NativeFunctions.h`:

```
namespace custom_1 {
namespace custom_2 {
namespace native {
    TORCH_API at::Tensor & unsqueeze(const at::Tensor & self, int64_t dim, at::Tensor & out);
} // namespace native
} // namespace custom_2
} // namespace custom_1

```

Differential Revision: D37198111

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79733
Approved by: https://github.com/bdhirsh
2022-07-08 21:56:52 +00:00
b62d39eda0 Consolidate all python targets in the tools folder (#80408)
Summary:
All buck targets that points to caffe2/tools folder are now moved to tools/BUCK.
This also eliminates all python library/binary import in pt_defs.bzl, which caused T124308913.

Test Plan: CI

Differential Revision: D37468313

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80408
Approved by: https://github.com/seemethere, https://github.com/malfet
2022-06-29 23:27:47 +00:00
d67ce755ad [ci] only upload summarized test stats for PRs (#80295)
We are uploading too many tests which is stressing out Rockset. We do
not really use granular per-test statistics for PRs, so change to
uploading summary information for PRs.

We will still be uploading per-test stats from master.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80295
Approved by: https://github.com/seemethere, https://github.com/malfet, https://github.com/janeyx99
2022-06-27 18:40:39 +00:00
8878bc7bfe Revert "[ci] only upload summarized test stats for PRs (#80295)"
This reverts commit 5d595182c7151b8dfae3a9b42569e22657bc5fbc.

Reverted https://github.com/pytorch/pytorch/pull/80295 on behalf of https://github.com/suo due to reverting and relanding with a small bugfix
2022-06-27 18:33:05 +00:00
5d595182c7 [ci] only upload summarized test stats for PRs (#80295)
We are uploading too many tests which is stressing out Rockset. We do
not really use granular per-test statistics for PRs, so change to
uploading summary information for PRs.

We will still be uploading per-test stats from master.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80295
Approved by: https://github.com/seemethere, https://github.com/malfet
2022-06-27 18:24:15 +00:00
eaaa34daef [ci] write test suites to rockset
Currently we upload all `testcase` elements as individual test runs to
Rockset. It would be nice to also have `testsuite`s as well, which
aggregate high level information.

These aggregations could technically be performed in the backend, but it's
faster to just log the data since we already have it in the XML test
report.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79265

Approved by: https://github.com/seemethere
2022-06-10 15:38:09 +00:00
b26c5b4638 [ci] refactor upload_test_stats + add unit test
Clean some things up and add a unit test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79210

Approved by: https://github.com/janeyx99
2022-06-10 15:38:09 +00:00
02c4d877b4 Codegen Non-Native IR Nodes (#76535)
Add codegen infrastructure to generate IR nodes for non-native ops.

The proposed change is to add a `non_native` key to the `{backend}_native_functions.yaml` file that contains schema definitions similar to what is found in `native_functions.yaml`. e.g.
```
non_native:
    ...
    - func: expand(Tensor input, int[] size, bool is_scalar_expand) -> Tensor
    ...
```
these definitions are parsed into a `LazyIrSchema` that can be used for generating IR nodes using `GenLazyIR`.

Fixes #74628

CC: @wconstab @desertfire @henrytwo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76535
Approved by: https://github.com/wconstab
2022-05-24 19:29:23 +00:00
6cbe9d1f58 [ci] delete old linter stuff
lintrunner has been running for a while, so delete redundant linter
things

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76984

Approved by: https://github.com/janeyx99
2022-05-11 07:40:44 +00:00
12f67ed2c0 Revert "[ci] delete old linter stuff"
This reverts commit 3a68155ce0973c005457593375801a2cc19de54f.

Reverted https://github.com/pytorch/pytorch/pull/76984 on behalf of https://github.com/janeyx99
2022-05-10 23:17:40 +00:00
3a68155ce0 [ci] delete old linter stuff
lintrunner has been running for a while, so delete redundant linter
things

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76984

Approved by: https://github.com/janeyx99
2022-05-10 22:06:18 +00:00
fb0f285638 [lint] upgrade mypy to latest version
Fixes https://github.com/pytorch/pytorch/issues/75927.

Had to fix some bugs and add some ignores.

To check if clean:
```
lintrunner --paths-cmd='git grep -Il .' --take MYPY,MYPYSTRICT
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76753

Approved by: https://github.com/malfet
2022-05-03 20:51:34 +00:00
3d7428d9ac Revert "[lint] upgrade mypy to latest version"
This reverts commit 9bf18aab94943f5352604a39340ad57ad4d0c5a4.

Reverted https://github.com/pytorch/pytorch/pull/76753 on behalf of https://github.com/suo
2022-05-03 20:01:18 +00:00
9bf18aab94 [lint] upgrade mypy to latest version
Fixes https://github.com/pytorch/pytorch/issues/75927.

Had to fix some bugs and add some ignores.

To check if clean:
```
lintrunner --paths-cmd='git grep -Il .' --take MYPY,MYPYSTRICT
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76753

Approved by: https://github.com/malfet
2022-05-03 19:43:28 +00:00
0708630d9f Allow sharding for distributed tests
Addresses my mistake introduced in https://github.com/pytorch/pytorch/pull/76536#issuecomment-1112657429

Also allows for sharding 1 in run_test.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76570
Approved by: https://github.com/jeffdaily, https://github.com/seemethere
2022-04-29 03:55:07 +00:00
b204ad863f Revert "Revert "Allow specifying tags for aten operators in native_functions.yaml""
This reverts commit ea44645c9a682a4e212e64b94a86383c3388ed6b.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76456

Approved by: https://github.com/osalpekar
2022-04-28 02:04:57 +00:00
36420b5e8c Rename tools/codegen to torchgen (#76275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76275

In preparation for addressing
https://github.com/pytorch/pytorch/issues/73212

Diff was generated with:

```
git mv tools/codegen torchgen
git grep -l 'tools.codegen' | xargs sed -i 's/tools.codegen/torchgen/g'
sed -i "s/\${TOOLS_PATH}\/codegen/\${TORCH_ROOT}\/torchgen/g" caffe2/CMakeLists.txt
```

and a manual edits to:

* tools/test/test_gen_backend_stubs.py
* torchgen/build.bzl
* torchgen/gen_backend_stubs.py

aka this diff:

```
 diff --git a/tools/test/test_gen_backend_stubs.py b/tools/test/test_gen_backend_stubs.py
index 3dc26c6d2d..104054575e 100644
 --- a/tools/test/test_gen_backend_stubs.py
+++ b/tools/test/test_gen_backend_stubs.py
@@ -9,7 +9,7 @@ from torchgen.gen_backend_stubs import run
 from torchgen.gen import _GLOBAL_PARSE_NATIVE_YAML_CACHE  # noqa: F401

 path = os.path.dirname(os.path.realpath(__file__))
-gen_backend_stubs_path = os.path.join(path, '../torchgen/gen_backend_stubs.py')
+gen_backend_stubs_path = os.path.join(path, '../../torchgen/gen_backend_stubs.py')

 # gen_backend_stubs.py is an integration point that is called directly by external backends.
 # The tests here are to confirm that badly formed inputs result in reasonable error messages.
 diff --git a/torchgen/build.bzl b/torchgen/build.bzl
index ed04e35a43..d00078a3cf 100644
 --- a/torchgen/build.bzl
+++ b/torchgen/build.bzl
@@ -1,6 +1,6 @@
 def define_targets(rules):
     rules.py_library(
-        name = "codegen",
+        name = "torchgen",
         srcs = rules.glob(["**/*.py"]),
         deps = [
             rules.requirement("PyYAML"),
@@ -11,6 +11,6 @@ def define_targets(rules):

     rules.py_binary(
         name = "gen",
-        srcs = [":codegen"],
+        srcs = [":torchgen"],
         visibility = ["//visibility:public"],
     )
 diff --git a/torchgen/gen_backend_stubs.py b/torchgen/gen_backend_stubs.py
index c1a672a655..beee7a15e0 100644
 --- a/torchgen/gen_backend_stubs.py
+++ b/torchgen/gen_backend_stubs.py
@@ -474,7 +474,7 @@ def run(
 ) -> None:

     # Assumes that this file lives at PYTORCH_ROOT/torchgen/gen_backend_stubs.py
-    pytorch_root = pathlib.Path(__file__).parent.parent.parent.absolute()
+    pytorch_root = pathlib.Path(__file__).parent.parent.absolute()
     template_dir = os.path.join(pytorch_root, "aten/src/ATen/templates")

     def make_file_manager(install_dir: str) -> FileManager:
```

run_all_fbandroid_tests

Test Plan: sandcastle

Reviewed By: albanD, ngimel

Differential Revision: D35770317

fbshipit-source-id: 153ac4a7fef15b1e750812a90bfafdbc8f1ebcdf
(cherry picked from commit c6d485d1d4648fa1c8a4c14c5bf3d8e899b9b4dd)
2022-04-25 01:38:06 +00:00
a11c1bbdd0 Run Black on all of tools/
Signed-off-by: Edward Z. Yang <ezyangfb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76089

Approved by: https://github.com/albanD
2022-04-20 17:29:41 +00:00
cbbb96c271 [lint] add shellcheck to lintrunner
As title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75856

Approved by: https://github.com/malfet
2022-04-15 04:03:54 +00:00
1c60b9aaa5 [ci] use lintrunner in CI
This changes our lint workflows to use lintrunner for the linters that
are currently supported

+ some random fixes to make things lint clean on master
+ changes to Makefile to use lintrunner

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68460

Approved by: https://github.com/t10-13rocket, https://github.com/seemethere, https://github.com/janeyx99
2022-04-15 00:08:21 +00:00
db6165215e Revert "[ci] use lintrunner in CI"
This reverts commit 4c3ee53522ab8f71749b9f1412bea523f7b04304.

Reverted https://github.com/pytorch/pytorch/pull/68460 on behalf of https://github.com/malfet
2022-04-14 23:27:27 +00:00
4c3ee53522 [ci] use lintrunner in CI
This changes our lint workflows to use lintrunner for the linters that
are currently supported

+ some random fixes to make things lint clean on master
+ changes to Makefile to use lintrunner

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68460

Approved by: https://github.com/t10-13rocket, https://github.com/seemethere, https://github.com/janeyx99
2022-04-14 17:43:41 +00:00
bfbe4b60aa use links to re-enable tests
Adds ability to use links to PRs to re-enable disabled tests.

Test Plan

Included "fixes <link to 74223>" in a commit message, which is a disabled test belonging to the periodic job linux-bionic-cuda11.5-py3.7-gcc7. Looking at the logs of this job will show that this test (test_exception_all) was run in the job.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75210
Approved by: https://github.com/suo, https://github.com/janeyx99
2022-04-05 21:09:55 +00:00
40d70a6fcb re-enable disabled test via commit message
Fixes #[74132](https://github.com/pytorch/pytorch/issues/74132)

Allows for re-enabling disabled tests using commit messages.  Previously, this done in the PR body by writing something similar to "Fixes #<issue number that disables a test>" and the test would be re-enabled.  Now, this re-enabling can be done in either the commit message or the PR body.

Test Plan

One of the commits in this PR contains "fixes 74223" (the # is ommited because github would mistakenly close that issue), which is a disabled test belonging to the periodic job linux-bionic-cuda11.5-py3.7-gcc7.  Looking at the logs of this job will show that this test (test_exception_all) was run in the job.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74981
Approved by: https://github.com/seemethere, https://github.com/janeyx99
2022-04-01 15:49:42 +00:00
ea44645c9a Revert "Allow specifying tags for aten operators in native_functions.yaml"
This reverts commit 1dab71ab258df5168bb635530a820625f9d4b522.

Reverted https://github.com/pytorch/pytorch/pull/72549 on behalf of https://github.com/malfet
2022-03-28 18:04:38 +00:00
1dab71ab25 Allow specifying tags for aten operators in native_functions.yaml
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72549

Approved by: https://github.com/ezyang
2022-03-25 21:17:52 +00:00
78cb1abc97 Class name
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73201
Approved by: https://github.com/bdhirsh
2022-03-23 14:18:02 +00:00
ce7910ba81 ufunc codegen (#65851)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65851

Design doc: https://docs.google.com/document/d/12rtlHnPUpaJ-I52Iob3L0WA3rKRr_OY7fXqeCvn2MVY/edit

First read the design doc to understand the user syntax. In this PR, we have converted add to use ufunc codegen; most of the cpp changes are deleting the preexisting implementations of add, and ufunc/add.h are the new implementations in the ufunc format.

The bulk of this PR is in the new codegen machinery. Here's the order to read the files:

* `tools/codegen/model.py`
  * Some self-explanatory utility classes: `ScalarType`, `DTYPE_CLASSES`
  * New classes for representing ufunc entries in `native_functions.yaml`: `UfuncKey` and  `UfuncInnerLoop`, as well as parsing logic for these entries. UfuncKey has some unusual entries (e.g., CPUScalar) that don't show up in the documentation, more on these below).
  * A predicate `is_ufunc_dispatch_key` for testing which dispatch keys should get automatically generated when an operator opts into ufuncs (CPU and CUDA, for now!)
* `tools/codegen/api/types.py`
  * More self-explanatory utility stuff: ScalarTypeToCppMapping mapping ScalarType to CppTypes; Binding.rename for changing the name of a binding (used when we assign constructor variables to member variables inside CUDA functors)
  * New VectorizedCType, representing `at::vec::Vectorized<T>`. This is used inside vectorized CPU codegen.
  * New `scalar_t` and `opmath_t` BaseCppTypes, representing template parameters that we work with when doing codegen inside ufunc kernel loops (e.g., where you previously had Tensor, now you have `scalar_t`)
  * `StructuredImplSignature` represents a `TORCH_IMPL_FUNC` definition, and straightforwardly follows from preexisting `tools.codegen.api.structured`
* `tools/codegen/translate.py` - Yes, we use translate a LOT in this PR. I improved some of the documentation, the only substantive changes are adding two new conversions: given a `scalar_t` or a `const Scalar&`, make it convertible to an `opmath_t`
* `tools/codegen/api/ufunc.py`
  * OK, now we're at the meaty stuff. This file represents the calling conventions of three important concepts in ufunc codegen, which we'll describe shortly. All of these APIs are relatively simple, since there aren't any complicated types by the time you get to kernels.
  * stubs are the DispatchStub trampolines that CPU kernels use to get to their vectorized versions. They drop all Tensor arguments (as they are in TensorIterator) but otherwise match the structured calling convention
  * ufuncs are the inner loop template functions that you wrote in ufunc/add.h which do the actual computation in question. Here, all the Tensors and Scalars have been converted into the computation type (`opmath_t` in CUDA, `scalar_t` in CPU)
  * ufunctors are a CUDA-only concept representing functors that take some of their arguments on a host-side constructor, and the rest in the device-side apply. Once again, Tensors and Scalars are converted into the computation type, `opmath_t`, but for clarity all the functions take `scalar_t` as argument (as this is the type that is most salient at the call site). Because the constructor and apply are code generated separately, `ufunctor_arguments` returns a teeny struct `UfunctorBindings`
* `tools/codegen/dest/ufunc.py` - the workhorse. This gets its own section below.
* `tools/codegen/gen.py` - just calling out to the new dest.ufunc implementation to generate UfuncCPU_add.cpp, UFuncCPUKernel_add.cpp and UfuncCUDA_add.cu files per ufunc operator. Each of these files does what you expect (small file that registers kernel and calls stub; CPU implementation; CUDA implementation). There is a new file manager for UFuncCPUKernel files as these need to get replicated by cmake for vectorization. One little trick to avoid recompilation is we directly replicate code generated forward declarations in these files, to reduce the number of headers we depend on (this is codegen, we're just doing the preprocessors job!)
* I'll talk about build system adjustments below.

OK, let's talk about tools/codegen/dest/ufunc.py. This file can be roughly understood in two halves: one for CPU code generation, and the other for CUDA code generation.

**CPU codegen.** Here's roughly what we want to generate:

```
// in UfuncCPU_add.cpp
using add_fn = void (*)(TensorIteratorBase&, const at::Scalar&);
DECLARE_DISPATCH(add_fn, add_stub);
DEFINE_DISPATCH(add_stub);

TORCH_IMPL_FUNC(ufunc_add_CPU)
(const at::Tensor& self, const at::Tensor& other, const at::Scalar& alpha, const at::Tensor& out) {
  add_stub(device_type(), *this, alpha);
}

// in UfuncCPUKernel_add.cpp
void add_kernel(TensorIteratorBase& iter, const at::Scalar& alpha) {
  at::ScalarType st = iter.common_dtype();
  RECORD_KERNEL_FUNCTION_DTYPE("add_stub", st);
  switch (st) {
    AT_PRIVATE_CASE_TYPE("add_stub", at::ScalarType::Bool, bool, [&]() {
      auto _s_alpha = alpha.to<scalar_t>();
      cpu_kernel(iter, [=](scalar_t self, scalar_t other) {
        return ufunc::add(self, other, _s_alpha);
      });
    })

    AT_PRIVATE_CASE_TYPE(
        "add_stub", at::ScalarType::ComplexFloat, c10::complex<float>, [&]() {
          auto _s_alpha = alpha.to<scalar_t>();
          auto _v_alpha = at::vec::Vectorized<scalar_t>(_s_alpha);
          cpu_kernel_vec(
              iter,
              [=](scalar_t self, scalar_t other) {
                return ufunc::add(self, other, _s_alpha);
              },
              [=](at::vec::Vectorized<scalar_t> self,
                  at::vec::Vectorized<scalar_t> other) {
                return ufunc::add(self, other, _v_alpha);
              });
        })

   ...
```

The most interesting change about the generated code is what previously was an `AT_DISPATCH` macro invocation is now an unrolled loop. This makes it easier to vary behavior per-dtype (you can see in this example that the entry for bool and float differ) without having to add extra condtionals on top.

Otherwise, to generate this code, we have to hop through several successive API changes:

* In TORCH_IMPL_FUNC(ufunc_add_CPU), go from StructuredImplSignature to StubSignature (call the stub). This is normal argument massaging in the classic translate style.
* In add_kernel, go from StubSignature to UfuncSignature. This is nontrivial, because we must do various conversions outside of the inner kernel loop. These conversions are done by hand, setting up the context appropriately, and then the final ufunc call is done using translate. (BTW, I introduce a new convention here, call on a Signature, for code generating a C++ call, and I think we should try to use this convention elsewhere)

The other piece of nontrivial logic is the reindexing by dtype. This reindexing exists because the native_functions.yaml format is indexed by UfuncKey:

```
   Generic: add (AllAndComplex, BFloat16, Half)
   ScalarOnly: add (Bool)
```

but when we do code generation, we case on dtype first, and then we generate a `cpu_kernel` or `cpu_kernel_vec` call. We also don't care about CUDA code generation (which Generic) hits. Do this, we lower these keys into two low level keys, CPUScalar and CPUVector, which represent the CPU scalar and CPU vectorized ufuncs, respectively (Generic maps to CPUScalar and CPUVector, while ScalarOnly maps to CPUScalar only). Reindexing then gives us:

```
    AllAndComplex:
        CPUScalar: add
        CPUVector: add
    Bool:
        CPUScalar: add
    ...
```

which is a good format for code generation, but too wordy to force native_functions.yaml authors to write. Note that when reindexing, it is possible for there to be a conflicting definition for the same dtype; we just define a precedence order and have one override the other, so that it is easy to specialize on a particular dtype if necessary. Also note that because CPUScalar/CPUVector are part of UfuncKey, technically you can manually specify them in native_functions.yaml, although I don't expect this functionality to be used.

**CUDA codegen.** CUDA code generation has many of the same ideas as CPU codegen, but it needs to know about functors, and stubs are handled slightly differently.  Here is what we want to generate:

```
template <typename scalar_t>
struct CUDAFunctorOnSelf_add {
  using opmath_t = at::opmath_type<scalar_t>;
  opmath_t other_;
  opmath_t alpha_;
  CUDAFunctorOnSelf_add(opmath_t other, opmath_t alpha)
      : other_(other), alpha_(alpha) {}
  __device__ scalar_t operator()(scalar_t self) {
    return ufunc::add(static_cast<opmath_t>(self), other_, alpha_);
  }
};

... two more functors ...

void add_kernel(TensorIteratorBase& iter, const at::Scalar & alpha) {
  TensorIteratorBase& iter = *this;
  at::ScalarType st = iter.common_dtype();
  RECORD_KERNEL_FUNCTION_DTYPE("ufunc_add_CUDA", st);
  switch (st) {
    AT_PRIVATE_CASE_TYPE("ufunc_add_CUDA", at::ScalarType::Bool, bool, [&]() {
      using opmath_t = at::opmath_type<scalar_t>;
      if (false) {
      } else if (iter.is_cpu_scalar(1)) {
        CUDAFunctorOnOther_add<scalar_t> ufunctor(
            iter.scalar_value<opmath_t>(1), (alpha).to<opmath_t>());
        iter.remove_operand(1);
        gpu_kernel(iter, ufunctor);
      } else if (iter.is_cpu_scalar(2)) {
        CUDAFunctorOnSelf_add<scalar_t> ufunctor(
            iter.scalar_value<opmath_t>(2), (alpha).to<opmath_t>());
        iter.remove_operand(2);
        gpu_kernel(iter, ufunctor);
      } else {
        gpu_kernel(iter, CUDAFunctor_add<scalar_t>((alpha).to<opmath_t>()));
      }
    })

   ...

REGISTER_DISPATCH(add_stub, &add_kernel);

TORCH_IMPL_FUNC(ufunc_add_CUDA)
(const at::Tensor& self,
 const at::Tensor& other,
 const at::Scalar& alpha,
 const at::Tensor& out) {
  add_kernel(*this, alpha);
}

```

The functor business is the bulk of the complexity. Like CPU, we decompose CUDA implementation into three low-level keys: CUDAFunctor (normal, all CUDA kernels will have this), and CUDAFunctorOnOther/CUDAFunctorOnScalar (these are to support Tensor-Scalar specializations when the Scalar lives on CPU). Both Generic and ScalarOnly provide ufuncs for CUDAFunctor, but for us to also lift these into Tensor-Scalar specializations, the operator itself must be eligible for Tensor-Scalar specialization. At the moment, this is hardcoded to be all binary operators, but in the future we can use tags in native_functions.yaml to disambiguate (or perhaps expand codegen to handle n-ary operators).

The reindexing process not only reassociates ufuncs by dtype, but it also works out if Tensor-Scalar specializations are needed and codegens the ufunctors necessary for the level of specialization here (`compute_ufunc_cuda_functors`). Generating the actual kernel (`compute_ufunc_cuda_dtype_body`) just consists of, for each specialization, constructing the functor and then passing it off to `gpu_kernel`. Most of the hard work is in functor generation, where we take care to make sure `operator()` has the correct input and output types (which `gpu_kernel` uses to arrange for memory accesses to the actual CUDA tensor; if you get these types wrong, your kernel will still work, it will just run very slowly!)

There is one big subtlety with CUDA codegen: this won't work:

```
   Generic: add (AllAndComplex, BFloat16, Half)
   ScalarOnly: add_bool (Bool)
```

This is because, even though there are separate Generic/ScalarOnly entries, we only generate a single functor to cover ALL dtypes in this case, and the functor has the ufunc name hardcoded into it. You'll get an error if you try to do this; to fix it, just make sure the ufunc is named the same consistently throughout. In the code, you see this because after testing for the short circuit case (when a user provided the functor themselves), we squash all the generic entries together and assert their ufunc names are the same. Hypothetically, if we generated a separate functor per dtype, we could support differently named ufuncs but... why would you do that to yourself. (One piece of nastiness is that the native_functions.yaml syntax doesn't stop you from shooting yourself in the foot.)

A brief word about CUDA stubs: technically, they are not necessary, as there is no CPU/CPUKernel style split for CUDA kernels (so, if you look, structured impl actually calls add_kernel directly). However, there is some code that still makes use of CUDA stubs (in particular, I use the stub to conveniently reimplement sub in terms of add), so we still register it. This might be worth frying some more at a later point in time.

**Build system changes.** If you are at FB, you should review these changes in fbcode, as there are several changes in files that are not exported to ShipIt.

The build system changes in this patch are substantively complicated by the fact that I have to implement these changes five times:

* OSS cmake build
* OSS Bazel build
* FB fbcode Buck build
* FB xplat Buck build (selective build)
* FB ovrsource Buck build

Due to technical limitations in the xplat Buck build related to selective build, it is required that you list every ufunc header manually (this is done in tools/build_variables.bzl)

The OSS cmake changes are entirely in cmake/Codegen.cmake there is a new set of files cpu_vec_generated (corresponding to UfuncCPUKernel files) which is wired up in the same way as other files. These files are different because they need to get compiled multiple times under different vectorization settings. I adjust the codegen, slightly refactoring the inner loop into its own function so I can use different base path calculation depending on if the file is traditional (in the native/cpu folder) or generated (new stuff from this diff.

The Bazel/Buck changes are organized around tools/build_variables.bzl, which contain the canonical list of ufunc headers (aten_ufunc_headers), and tools/ufunc_defs.bzl (added to ShipIt export list in D34465699) which defines a number of functions that compute the generated cpu, cpu kernel and cuda files based on the headers list. For convenience, these functions take a genpattern (a string with a {} for interpolation) which can be used to easily reformat the list of formats in target form, which is commonly needed in the build systems.

The split between build_variables.bzl and ufunc_defs.bzl is required because build_variables.bzl is executed by a conventional Python interpreter as part of the OSS cmake, but we require Skylark features to implement the functions in ufunc_defs.bzl (I did some quick Googling but didn't find a lightweight way to run the Skylark interpreter in open source.)

With these new file lists, the rest of the build changes are mostly inserting references to these files wherever necessary; in particular, cpu kernel files have to be worked into the multiple vectorization build flow (intern_build_aten_ops in OSS Bazel). Most of the subtlety relates to selective build. Selective build requires operator files to be copied per overall selective build; as dhruvbird explains to me, glob expansion happens during the action graph phase, but the selective build handling of TEMPLATE_SOURCE_LIST is referencing the target graph. In other words, we can't use a glob to generate deps for another rule, because we need to copy files from wherever (included generated files) to a staging folder so the rules can pick them up.

It can be somewhat confusing to understand which bzl files are associated with which build. Here are the relevant mappings for files I edited:

* Used by everyone - tools/build_tools.bzl, tools/ufunc_defs.bzl
* OSS Bazel - aten.bzl, BUILD.bazel
* FB fbcode Buck - TARGETS
* FB xplat Buck -BUCK, pt_defs.bzl, pt_template_srcs.bzl
* FB ovrsource Buck - ovrsource_defs.bzl, pt_defs.bzl

Note that pt_defs.bzl is used by both xplat and ovrsource. This leads to the "tiresome" handling for enabled backends, as selective build is CPU only, but ovrsource is CPU and CUDA.

BTW, while I was at it, I beefed up fb/build_arvr.sh to also do a CUDA ovrsource build, which was not triggered previously.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31306586

Pulled By: ezyang

fbshipit-source-id: 210258ce83f578f79cf91b77bfaeac34945a00c6
(cherry picked from commit d65157b0b894b6701ee062f05a5f57790a06c91c)
2022-03-01 00:33:40 +00:00
08510ba5e4 Disable test history as it's fragile
Related to #73083

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73093
2022-02-18 18:19:03 +00:00
7b8c43cd7c Revert "Revert D32498570: make codegen'd device guards not cuda-specific. Allow them to be used in external codegen" (#69951)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69951

This reverts commit 0ef523633fddf2d63e97d5028b00af10ff344561.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33113543

Pulled By: bdhirsh

fbshipit-source-id: b28073ee0870b413ea9f617f27671ae5c6f3c696
2022-01-04 14:53:21 -08:00
bb5b4cceb6 Revert "Revert D32498569: allow external backend codegen to toggle whether to generate out= and inplace kernels" (#69950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69950

This reverts commit f6cad53443704dfe5a20cc62bee14d91e3bffcaa.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33113545

Pulled By: bdhirsh

fbshipit-source-id: d6590294662588d36c09662dea65919ad4e1e288
2022-01-04 14:52:00 -08:00
f6cad53443 Revert D32498569: allow external backend codegen to toggle whether to generate out= and inplace kernels
Test Plan: revert-hammer

Differential Revision:
D32498569 (aa0cf68c17)

Original commit changeset: ebd932d042b9

Original Phabricator Diff: D32498569 (aa0cf68c17)

fbshipit-source-id: 21a393fa339510d926512a7983d33ece327b743d
2021-12-14 15:27:24 -08:00
0ef523633f Revert D32498570: make codegen'd device guards not cuda-specific. Allow them to be used in external codegen
Test Plan: revert-hammer

Differential Revision:
D32498570 (2e7a91c45f)

Original commit changeset: 0ce6a5614417

Original Phabricator Diff: D32498570 (2e7a91c45f)

fbshipit-source-id: 7c64ce1b5e51a680b4aeae8721e0c9e15c793289
2021-12-14 15:04:10 -08:00
2e7a91c45f make codegen'd device guards not cuda-specific. Allow them to be used in external codegen (#68531)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68531

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32498570

Pulled By: bdhirsh

fbshipit-source-id: 0ce6a5614417671313b4d274ea84742c5b81d1b0
2021-12-14 10:25:04 -08:00
aa0cf68c17 allow external backend codegen to toggle whether to generate out= and inplace kernels (#68530)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68530

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32498569

Pulled By: bdhirsh

fbshipit-source-id: ebd932d042b988e19c71aa04a21677db9bdc9f04
2021-12-14 10:25:02 -08:00
923f06621c Fix Windows ninja builds when MAX_JOBS is specified (#65444)
Summary:
Reported by cloudhan in https://github.com/pytorch/pytorch/pull/64733#issuecomment-924545463

Fixes regression introduced by 047e68235f

cc malfet seemethere

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65444

Reviewed By: dagitses, seemethere

Differential Revision: D31103260

Pulled By: malfet

fbshipit-source-id: 9d5454a64cb8a0b96264119cf16582cc5afed284
2021-09-22 14:04:31 -07:00
543185a0fd support using gradients named for outputs in derivatives (#63947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63947

Fixes #62196

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30541485

Pulled By: dagitses

fbshipit-source-id: ea1dd0edd1a51936a295631e52b85e9c022a9c87
2021-09-18 07:31:45 -07:00
047e68235f delegate parallelism to Ninja when possible (#64733)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64733

The previous implementation was wrong when CPU scheduling affinity is
set. In fact, it is still wrong if Ninja is not being used.

When there is CPU scheduling affinity set, the number of processors
available on the system likely exceeds the number of processors that
are usable to the build. We ought to use
`len(os.sched_getaffinity(0))` to determine the effective parallelism.

This change is more minimal and instead just delegates to Ninja (which
handles this correctly) when it is used.

Test Plan:
I verified this worked as correctly using Ninja on a 96-core machine
with 24 cores available for scheduling by checking:
 * the cmake command did not specify "-j"
 * the number of top-level jobs in top/pstree never exceeded 26 (24 +
   2)

And I verified we get the legacy behavior by specifying USE_NINJA=0 on
the build.

Reviewed By: jbschlosser, driazati

Differential Revision: D30968796

Pulled By: dagitses

fbshipit-source-id: 29547dd378fea793957bcc2f7d52d5def1ecace2
2021-09-17 12:28:28 -07:00