Commit Graph

34 Commits

c4d1ff02f8 [Lint] Update clang-format to 19.1.4 (#153889)
All changes other than the one to `tools/linter/adapters/s3_init_config.json` were generated by the newer clang-format.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153889
Approved by: https://github.com/cyyever, https://github.com/atalman
2025-05-20 14:12:46 +00:00
3ed5f1fb77 [CUDA][cuBLAS] Aten GEMM overload for FP32 output from FP16/BF16 inputs (#150812)
Enable FP32 output from FP16/BF16 GEMMs in aten with cuBLAS. Accumulation for these GEMMs is generally already done in FP32. Adds the functionality to the following aten operators:
* mm
* bmm
* addmm
* baddbmm

Follow up of customer issue: https://github.com/pytorch/pytorch/issues/146241#issuecomment-2781889390

Differential Revision: [D73126191](https://our.internmc.facebook.com/intern/diff/D73126191)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150812
Approved by: https://github.com/ngimel, https://github.com/eqy
2025-04-18 01:53:26 +00:00
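The FP32-accumulation idea in the message above can be sketched as a naive GEMM. This is illustrative only: `gemm_fp32_out` and its signature are invented for this sketch and are not the aten/cuBLAS code path; the point is that the accumulator is already FP32, so the result can be written out in FP32 instead of being rounded back to the input dtype.

```cpp
#include <cstddef>
#include <vector>

// Naive GEMM sketch: inputs of type In, accumulation and output in float.
// Mirrors the idea that FP16/BF16 GEMMs already accumulate in FP32, so an
// FP32 output just skips the final cast back to In.
template <typename In>
std::vector<float> gemm_fp32_out(const std::vector<In>& a,
                                 const std::vector<In>& b,
                                 std::size_t m, std::size_t k, std::size_t n) {
    std::vector<float> out(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;  // FP32 accumulator, as in the cuBLAS GEMMs
            for (std::size_t p = 0; p < k; ++p) {
                acc += static_cast<float>(a[i * k + p]) *
                       static_cast<float>(b[p * n + j]);
            }
            out[i * n + j] = acc;  // no cast back to the input dtype
        }
    }
    return out;
}
```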
cyy
73d0f484b3 [structural binding][11/N] Replace std::tie with structural binding (#130830)
Follows  #130784

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130830
Approved by: https://github.com/janeyx99
2024-07-18 00:45:06 +00:00
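The codemod above is mechanical; a minimal before/after sketch (the `lookup` helper is invented for illustration):

```cpp
#include <string>
#include <tuple>
#include <utility>

std::pair<int, std::string> lookup() { return {42, "hit"}; }

int code_via_tie() {
    // Before the codemod: locals declared first, then filled via std::tie.
    int code;
    std::string msg;
    std::tie(code, msg) = lookup();
    return code;
}

int code_via_binding() {
    // After the codemod: one C++17 structured binding, no pre-declarations.
    auto [code, msg] = lookup();
    return code;
}
```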
3f5b59eef4 [codemod] c10::optional -> std::optional in caffe2/aten/src/ATen/DeviceGuard.h +117 (#126901)
Summary:
Generated with
```
fbgs -f '.*\.(cpp|cxx|cc|h|hpp|cu|cuh)$' c10::optional -l | perl -pe 's/^fbsource.fbcode.//' | grep -v executorch | xargs -n 50 perl -pi -e 's/c10::optional/std::optional/g'
```

 - If you approve of this diff, please use the "Accept & Ship" button :-)

(117 files modified.)

Test Plan: Sandcastle

Reviewed By: palmje

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126901
Approved by: https://github.com/Skylion007, https://github.com/eqy
2024-05-24 00:26:15 +00:00
ed327876f5 [codemod] c10::optional -> std::optional (#126135)
Generated by running the following from PyTorch root:
```
find . -regex ".*\.\(cpp\|h\|cu\|hpp\|cc\|cxx\)$" | grep -v "build/" | xargs -n 50 -P 4 perl -pi -e 's/c10::optional/std::optional/'
```

`c10::optional` is just an alias for `std::optional`. This removes usages of that alias in preparation for eliminating it entirely.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126135
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/albanD, https://github.com/aaronenyeshi
2024-05-14 19:35:51 +00:00
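Since `c10::optional` was only an alias, call sites behave identically after the rewrite. A minimal sketch of the post-codemod spelling (the `parse_digit` helper is invented for illustration):

```cpp
#include <optional>

// After the codemod, what used to be spelled c10::optional<int> is
// spelled std::optional<int>; semantics are unchanged.
std::optional<int> parse_digit(char c) {
    if (c >= '0' && c <= '9') {
        return c - '0';
    }
    return std::nullopt;  // absent value, formerly c10::nullopt
}
```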
cyy
dee100945e [2/N] Move c10::variant to std::variant (#109723)
This PR moves most c10::variant call sites to std::variant.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109723
Approved by: https://github.com/ezyang
2023-09-24 02:47:43 +00:00
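The moved call sites use the standard `std::variant`/`std::visit` idiom directly. A self-contained sketch (the `Value`/`kind_of` names are invented for illustration):

```cpp
#include <string>
#include <type_traits>
#include <variant>

using Value = std::variant<int, double, std::string>;

// Dispatch over the active alternative with std::visit, exactly as the
// former c10::variant call sites now do with std::variant.
std::string kind_of(const Value& v) {
    return std::visit([](const auto& x) -> std::string {
        using T = std::decay_t<decltype(x)>;
        if constexpr (std::is_same_v<T, int>) {
            return "int";
        } else if constexpr (std::is_same_v<T, double>) {
            return "double";
        } else {
            return "string";
        }
    }, v);
}
```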
429a80dded [NNC] Lowering function generates the output buffer with the specified stride (#76529)
Summary:
Pass stride information to the lowering function to generate the output buffer with the proper memory layout.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76529

Reviewed By: ZolotukhinM

Differential Revision: D36116712

Pulled By: IvanKobzarev

fbshipit-source-id: d3901f756b3710ecce172d6db3ecb0b7c12fb929
(cherry picked from commit b6cd53c91c01db36ea0e99167dc0ce0ae1d3aa23)
2022-05-04 20:04:22 +00:00
1d55518198 Revert "[nnc] Strides to Tensor (#72962)"
This reverts commit 939060925f28c9498da42225f216d838e1f7f4ca.

Fixes https://github.com/pytorch/vision/issues/5873

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76332
Approved by: https://github.com/seemethere
2022-04-25 19:50:00 +00:00
939060925f [nnc] Strides to Tensor (#72962)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72962

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM, cpuhrsch

Differential Revision: D34589306

Pulled By: IvanKobzarev

fbshipit-source-id: ecee5249760ecc0c8b2edb1842b90218899bc944
(cherry picked from commit 9e310c4c67389da30da89126d838ffe3864aba6f)
2022-04-23 19:35:15 +00:00
252e1ccce6 Enable TE fuser to support user defined operator (#73073)
Summary:
PyTorch supports registering a custom operator via `TORCH_LIBRARY_FRAGMENT` / `TORCH_LIBRARY_IMPL`, and `torch::jit::tensorexpr::getNNCLoweringRegistry` can register a lowering for a custom operator. But the TE fuser pass's conditional checks do not support custom operators. The `isSupported` check of `tensorexpr_fuser` tests whether the `Node` is in `get_tensorexpr_elementwise_set()`, `supported_non_eltwise_set()`, `supported_misc_set`, or `supported_reduction_set`. If a custom operator needs to be added to the TE fusion group, this check will block it.

Taking the RN50 as an example, we can speed up the model by fusing the convolution and consecutive element-wise operator into a custom operator. The framework overhead becomes non-negligible when the computation becomes more efficient, especially for the latency mode and the tiny models. If the TE fuser allows adding the custom operator to the fusion group, then the entire RN50 model could be fused by TE as a single operator/function consisting of "ExternalCalls" and TE-IR.  This could significantly reduce framework overhead, which in turn improves RN50 E2E performance. The same goes for other models.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73073

Reviewed By: pbelevich

Differential Revision: D35453165

Pulled By: ZolotukhinM

fbshipit-source-id: a764cf340b0b1e05fe230649cbe44f5786bdd37d
(cherry picked from commit ee95aa4d36714540fbb216a338799e6a6bb966d5)
2022-04-07 04:36:39 +00:00
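The shape of the check described in the message, and how an extensible registry lifts the restriction, can be sketched in plain C++. This is a hypothetical stand-in, not the real `tensorexpr_fuser` code: membership in a few fixed sets decides fusability, and a mutable registry lets user-defined operators opt in.

```cpp
#include <set>
#include <string>

// Fixed supported sets (stand-ins for get_tensorexpr_elementwise_set() etc.).
static const std::set<std::string> kElementwise = {"aten::add", "aten::relu"};
static const std::set<std::string> kReductions  = {"aten::sum", "aten::max"};

// User-extensible registry: custom operators opt in to fusion here.
std::set<std::string>& custom_ops() {
    static std::set<std::string> reg;
    return reg;
}

// Sketch of an isSupported-style check that also consults the registry.
bool is_supported(const std::string& op) {
    return kElementwise.count(op) > 0 ||
           kReductions.count(op) > 0 ||
           custom_ops().count(op) > 0;
}
```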
1855b14922 [TensorExpr] Delete DimArg class. (#72390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72390

This class didn't add much value and only caused more boilerplate code.
This change removes the class and replaces all its uses with
`ExprHandle`.

A side effect of this change is different names in loop variables, which
caused massive mechanical changes in our tests.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D34030296

Pulled By: ZolotukhinM

fbshipit-source-id: 2ba4e313506a43ab129a10d99e72b638b7d40108
(cherry picked from commit c2ec46a0587cafd4e915c5bf1e0dc0b5d244e8d5)
2022-02-11 01:21:59 +00:00
75ce040620 [TensorExpr] Allow for 'keepdim' argument in aten::mean in NNC's external call. (#68756)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68756

That fixes some warnings in our tests.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D32600952

Pulled By: ZolotukhinM

fbshipit-source-id: 548eaf3659e20795cce44d8f57e77f4a47d44d98
2021-11-30 00:06:34 -08:00
ff5c61a74e [TensorExpr] Add lowering for aten::max (reduction). (#66519)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66519

Differential Revision: D31590853

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: a702621621f681d7f5392912e8a77ca124e14170
2021-11-03 09:44:09 -07:00
00afe9ba7b [TensorExpr] Add lowering for aten::embedding. (#66518)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66518

Differential Revision: D31590855

Test Plan: Imported from OSS

Reviewed By: pbelevich

Pulled By: ZolotukhinM

fbshipit-source-id: aace0a87b1649330dae44182f7873aca27160d64
2021-11-03 09:44:07 -07:00
008a58d226 [TensorExpr] Add lowering for aten::conv1d. (#66517)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66517

Differential Revision: D31590856

Test Plan: Imported from OSS

Reviewed By: pbelevich

Pulled By: ZolotukhinM

fbshipit-source-id: c05a37d8741acd0606c2adb8d6cfeb1f57bc8aa0
2021-11-03 09:44:05 -07:00
f23f21dafe [TensorExpr] Remove 'Placeholder' class. (#64887)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64887

BufHandle has exactly the same functionality and should be used instead.

Differential Revision: D30889483

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 365fe8e396731b88920535a3de96bd3301aaa3f3
2021-09-14 00:22:44 -07:00
f0d274294d [TensorExpr] Nuke KernelArena and KernelScope. (#63587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63587

Now that there are no classes using KernelArena for memory management, we
can remove it.

Differential Revision: D30429115

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 375f6f9294d27790645eeb7cb5a8e87047a57544
2021-08-24 00:32:16 -07:00
62d02f2b57 [TensorExpr] Make 'Tensor' a value type. (#63586)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63586

This is another commit in transition from KernelArena memory management.
Tensor is essentially just a pair of <BufPtr, StmtPtr> and we don't need
to dynamically allocate it at all - it's cheap to pass it by value, and
that's what we're switching to in this commit.

After this change nothing uses KernelScope/KernelArena and they can be
safely removed.

Differential Revision: D30429114

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: f90b859cfe863692b7beffbe9bd0e4143df1e819
2021-08-24 00:32:13 -07:00
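The `<BufPtr, StmtPtr>` pair described above can be sketched as a plain value type. These are simplified stand-ins, not the real NNC classes; the point is that copying such a Tensor is cheap and needs no arena allocation.

```cpp
#include <memory>

struct Buf {};   // stand-in for the NNC buffer node
struct Stmt {};  // stand-in for the NNC statement node

using BufPtr = std::shared_ptr<Buf>;
using StmtPtr = std::shared_ptr<Stmt>;

// After the change, Tensor is just a value pair of pointers: copyable,
// passable by value, with no KernelArena/KernelScope involved.
struct Tensor {
    BufPtr buf;    // the buffer the tensor writes to
    StmtPtr stmt;  // the statement that computes it
};
```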
1d62fb8a63 [TensorExpr] Speedup ExternalCall.ComputeInterop test by reducing tensor sizes. (#63526)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63526

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30411410

Pulled By: ZolotukhinM

fbshipit-source-id: d9a99afac14d2238b5100c98ae9ed4467f9f05ea
2021-08-18 22:58:25 -07:00
1dc2b52764 [TensorExpr] Add a wrapper for all expr and stmt pointers. (#63195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63195

This helps us to later switch from using KernelArena with raw pointers
to shared pointers without having to change all our source files at
once.

The changes are mechanical and should not affect any functionality.

With this PR, we're changing the following:
 * `Add*` --> `AddPtr`
 * `new Add(...)` --> `alloc<Add>(...)`
 * `dynamic_cast<Add*>` --> `to<Add>`
 * `static_cast<Add*>` --> `static_to<Add>`

Due to some complications with args forwarding, some places became more
verbose, e.g.:
 * `new Block({})` --> `new Block(std::vector<ExprPtr>())`

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30292779

Pulled By: ZolotukhinM

fbshipit-source-id: 150301c7d2df56b608b035827b6a9a87f5e2d9e9
2021-08-17 13:44:45 -07:00
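The wrapper scheme in the list above can be sketched with smart pointers. The names mirror the commit message, but this is a simplified stand-in, not the real IR classes: `alloc<T>` replaces `new T(...)` and `to<T>` replaces `dynamic_cast<T*>`.

```cpp
#include <memory>
#include <utility>

struct Expr { virtual ~Expr() = default; };  // polymorphic base for casts
struct Add : Expr {
    int lhs, rhs;
    Add(int a, int b) : lhs(a), rhs(b) {}
};
struct Mul : Expr {};

using ExprPtr = std::shared_ptr<Expr>;
template <typename T>
using Ptr = std::shared_ptr<T>;  // `Add*` --> `AddPtr` becomes Ptr<Add>

// `new Add(...)` --> `alloc<Add>(...)`
template <typename T, typename... Args>
Ptr<T> alloc(Args&&... args) {
    return std::make_shared<T>(std::forward<Args>(args)...);
}

// `dynamic_cast<Add*>` --> `to<Add>`
template <typename T>
Ptr<T> to(const ExprPtr& e) {
    return std::dynamic_pointer_cast<T>(e);
}
```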
a9b0a921d5 Disable avoid-non-const-global-variables lint check (#62008)
Summary:
Because the GoogleTest `TEST` macro, as well as `DEFINE_DISPATCH`, is non-compliant with this check.

All changes but the ones to `.clang-tidy` are generated using following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`;  do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008

Reviewed By: driazati, r-barnes

Differential Revision: D29838584

Pulled By: malfet

fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
2021-07-22 18:04:40 -07:00
b0c9762e2d [pytorch][nnc] external function call to xnnpack ops (#59525)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59525

This PR added NNC external function call binding for two XNNPack ops:
- prepacked::linear_clamp_run
- prepacked::conv2d_clamp_run

Both ops take two arguments: a regular input tensor and a prepacked context
object that contains other parameters like weights/bias/etc. The prepacked
context object's type is a custom class.

NNC doesn't generate assembly code that reads the content of the prepacked
object directly. It simply passes it into the XNNPack ops wrapper, so both
NNC and the generated assembly code don't need to know the custom class type.

At compilation time, we use a size-1 dummy tensor as the placeholder for the
prepacked XNNPack context object.

At runtime, we pass in the raw pointer of the XNNPack context object as if it
were a regular tensor storage data pointer.

Inside the external function call wrapper, we reinterpret_cast the raw pointer
back to the custom class type before dispatching to the XNNPack ops.
ghstack-source-id: 132135512

Test Plan: unit test

Reviewed By: bertmaher

Differential Revision: D28924934

fbshipit-source-id: 15326b35dc6c022f4c3f247a2037c361e06e80b4
2021-06-22 21:29:31 -07:00
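The pointer-smuggling trick described above can be sketched in plain C++. The names here (`PrepackedContext`, `run_linear`) are hypothetical, not the XNNPack API: an opaque context is passed through a slot that normally holds tensor storage, and `reinterpret_cast` recovers it inside the wrapper.

```cpp
// Hypothetical prepacked context (weights/bias/etc. in the real case).
struct PrepackedContext {
    float weight;
};

// External-call wrapper sketch: buf_data[0] is real tensor storage, but
// buf_data[1] is actually a PrepackedContext*, cast back before dispatch.
float run_linear(void** buf_data, float input) {
    auto* ctx = reinterpret_cast<PrepackedContext*>(buf_data[1]);
    return ctx->weight * input;
}
```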
cbfce376a8 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D28319469

fbshipit-source-id: 8295597a8ee16b2fef3f7aacdd6c892cb22db988
2021-05-10 03:39:31 -07:00
3a66a1cb99 [clang-tidy] Exclude cppcoreguidelines-avoid-magic-numbers (#57841)
Summary:
Add cppcoreguidelines-avoid-magic-numbers exclusion to clang-tidy
Remove existing nolint warnings using following script:
```
for file in `git ls-files | grep -v \.py`; do gsed '/^ *\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers)/d' -i  $file; done
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57841

Reviewed By: samestep

Differential Revision: D28295045

Pulled By: malfet

fbshipit-source-id: 7c6e8d1213c9593f169ed3df6a916498f1a97163
2021-05-07 20:02:33 -07:00
3c4d57c18b [pytorch][nnc] update external functions for mobile build (#56850)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56850

This is part of the changes to enable NNC AOT compilation for mobile.
The generated kernels need to call these external functions, so change the declarations to use C linkage when building the mobile runtime.

Added nnc_aten_addmm external function.

ghstack-source-id: 127877411

Test Plan:
- build & CI;
- tested mobile build with stacked PRs;

Reviewed By: ZolotukhinM

Differential Revision: D27897154

fbshipit-source-id: 61d5499d7781a83bd2657859659fd1b5043d6b04
2021-04-30 19:07:19 -07:00
4cb534f92e Make PyTorch code-base clang-tidy compliant (#56892)
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os

def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files

def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname,"-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit","--all", "-m", f"NOLINT stubs for {fname}"])

def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)

if __name__ == "__main__":
    main()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892

Reviewed By: H-Huang

Differential Revision: D27991944

Pulled By: malfet

fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
2021-04-28 14:10:25 -07:00
7ab654afd7 [TensorExpr] Rename Tensor::call to Tensor::load to be consistent with Buf and Placeholder. (#55826)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55826

It's a mechanical change.

Differential Revision: D27717777

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: fbc1bb99602250c706cf2c8c2684119c323e4d51
2021-04-13 12:08:53 -07:00
1ceb90405b [TensorExpr] Add plumbing for conv2d fusion. (#54439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54439

For now the only way to represent conv2d in TE is via an external call,
and since the aten library doesn't have an out variant for conv2d, the
external call has to perform an extra copy. Because of that, fusing
conv2d currently regresses performance and is hence disabled. However, in
the near future we should have two alternative ways to enable it:
1) represent conv2d natively in TE (without an external call)
2) add an out variant for conv2d

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D27237045

Pulled By: ZolotukhinM

fbshipit-source-id: f5545ff711b75f9f37bc056316d1999a70043b4c
2021-03-24 18:49:07 -07:00
067ad31210 [NNC] Added some more external function bindings (#53420)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53420

Reviewed By: navahgar

Differential Revision: D26876784

Pulled By: Chillee

fbshipit-source-id: 05e7c782a72de5159879f88a104f1a273e0345eb
2021-03-08 14:18:30 -08:00
64847c7f0b [TensorExpr] Properly handle ExternalCalls in LoadStore analysis and Inliner. (#52628)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52628

Prior to this change ExternalCalls were not considered as Loads or
Stores to/from its buffers, which led to incorrect behavior in inlining.
This PR fixes it.

Differential Revision: D26589378

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: cd69d5f7075f6dc756aabcf676842b9a250334d6
2021-02-22 21:50:48 -08:00
52e6ef8b53 [TensorExpr] Add another test for ExternalCalls. (#52162)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52162

This test demonstrates how external calls can interoperate with other
tensor computations and between themselves.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D26410813

Pulled By: ZolotukhinM

fbshipit-source-id: 8180164013b43f613d53620d1b249e0af769ae8e
2021-02-13 18:38:17 -08:00
c639513378 [TensorExpr] Resubmit: Introduce ExternalCall nodes to TE IR. (#51594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51594

ExternalCall nodes represent opaque calls to external functions that fill a
tensor (buffer) with values. They can be used to include nodes that are
otherwise not representable in TE, or whose TE representation is currently too
slow.

To make an external function available in NNC as ExternalCall, one needs to
implement a "bridge" function that would take raw (void*) pointers to the data
along with the arrays containing dimension info. This function would then
internally call the desired external function and make sure the results of the
call are correctly placed in the provided raw data buffers.

The reason the PR was previously reverted was that the LLVM generated
calls to bridge functions were breaking unwind tables. This is now fixed
by requiring bridge functions to never throw and setting the
corresponding attribute in the LLVM generated code.

Differential Revision: D26213882

Test Plan: Imported from OSS

Reviewed By: pbelevich, ngimel

Pulled By: ZolotukhinM

fbshipit-source-id: db954d8338e2d750c2bf0a41e88e38bd494f2945
2021-02-03 10:22:54 -08:00
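The "bridge" function contract described above can be sketched as follows. The signature is illustrative, not NNC's exact ABI: raw data pointers plus dimension info come in, results are written into the provided output buffer, and the function is `noexcept` because, as the message notes, the LLVM-generated call sites require bridge functions never to throw.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical bridge: scale every element of `in` by `s` into `out`.
// Raw pointers and a dims array stand in for the tensor metadata a real
// bridge would receive; noexcept matches the unwind-table fix.
void nnc_bridge_scale(float* out, const float* in,
                      const int64_t* dims, std::size_t ndim,
                      float s) noexcept {
    int64_t n = 1;
    for (std::size_t i = 0; i < ndim; ++i) {
        n *= dims[i];  // total element count from the dimension info
    }
    for (int64_t i = 0; i < n; ++i) {
        out[i] = in[i] * s;  // results placed in the provided raw buffer
    }
}
```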
4f37150f40 Revert D26179083: [TensorExpr] Introduce ExternalCall nodes to TE IR.
Test Plan: revert-hammer

Differential Revision:
D26179083 (f4fc3e3920)

Original commit changeset: 9e44de098ae9

fbshipit-source-id: d15684e04c65c395b4102d4f98a4488482822d1b
2021-02-02 05:29:41 -08:00
f4fc3e3920 [TensorExpr] Introduce ExternalCall nodes to TE IR. (#51475)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51475

ExternalCall nodes represent opaque calls to external functions that fill a
tensor (buffer) with values. They can be used to include nodes that are
otherwise not representable in TE, or whose TE representation is currently too
slow.

To make an external function available in NNC as ExternalCall, one needs to
implement a "bridge" function that would take raw (void*) pointers to the data
along with the arrays containing dimension info. This function would then
internally call the desired external function and make sure the results of the
call are correctly placed in the provided raw data buffers.

Test Plan: Imported from OSS

Reviewed By: pbelevich, Chillee

Differential Revision: D26179083

Pulled By: ZolotukhinM

fbshipit-source-id: 9e44de098ae94d25772cf5e2659d539fa6f3f659
2021-02-02 00:50:46 -08:00