11 Commits

cyy
1dd503c6fb [4/N] Fix Wextra-semi warning (#139256)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139256
Approved by: https://github.com/ezyang
2024-10-31 03:01:14 +00:00
8f1c3c68d3 [BE] Use nested namespaces in .cpp/.cu files (#92100)
As we now live in a C++17 world

This is a functional no-op, just (illustrated after the list):
- `s/namespace at { namespace native {/namespace at::native {/`
- `s/namespace torch { namespace jit {/namespace torch::jit {/`
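
For illustration, a minimal before/after of what the substitution does (the declaration is a stand-in, not taken from the diff):
```
// Before (pre-C++17): nested opening braces, two closing braces
namespace at { namespace native {
void example_kernel();  // stand-in declaration
}} // namespace at::native

// After (C++17 nested namespace definition): same meaning, less noise
namespace at::native {
void example_kernel();  // stand-in declaration
} // namespace at::native
```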

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92100
Approved by: https://github.com/izaitsevfb
2023-01-13 16:32:34 +00:00
5816a95ca9 CPU Kernel: Use per-operator headers (#71137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71137

ATen has a header dependency problem. Whenever an operator is added or modified, it changes `ATen/Functions.h` and `ATen/NativeFunctions.h` which in turn requires essentially every single file to be rebuilt. Per-operator headers allow files to only include the specific operators they use and so minimizes unnecessary rebuilds during incremental builds and improves cache hits in CI builds.

See this note for more details:
3a03af2f50/aten/src/ATen/templates/Functions.h (L20)
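
A sketch of the include pattern this enables, assuming the `ATen/ops/*.h` per-operator layout and the `AT_PER_OPERATOR_HEADERS` guard used in ATen sources:
```
#ifndef AT_PER_OPERATOR_HEADERS
// Monolithic headers: declare every operator, so any operator change
// forces this file to rebuild.
#include <ATen/Functions.h>
#include <ATen/NativeFunctions.h>
#else
// Per-operator headers: only the operators this file actually calls.
#include <ATen/ops/add.h>
#include <ATen/ops/empty.h>
#endif
```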

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D33949900

Pulled By: malfet

fbshipit-source-id: d76a53bfab9ea75391de060a2d85923339f93a7c
(cherry picked from commit d3be6e4283bbc8d5c967c4634ea1a6b3386861ed)
2022-03-05 01:32:17 +00:00
65f54bc000 [SR] Optimize VarStack (#68750)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68750

There was some room for optimization in static runtime's `prim::VarStack`:

* Avoid refcount bumps: constructing the `std::vector<at::Tensor>` can be avoided by writing a custom version of `stack_out` that takes a `std::vector<at::Tensor*>` (see the sketch after this list)

* Skip the memory overlap check

* Avoid device dispatcher overhead in a few places (e.g. `tensor.unsqueeze -> at::native::unsqueeze`)
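
A minimal sketch of the first and third points, assuming a hypothetical `stack_out` variant named `stack_out_borrowed` (the actual static runtime code differs):
```
#include <ATen/ATen.h>
#include <ATen/NativeFunctions.h>
#include <vector>

// Hypothetical stack_out variant taking borrowed pointers: callers can pass
// raw Tensor pointers, so assembling the argument list bumps no refcounts.
void stack_out_borrowed(const std::vector<at::Tensor*>& tensors,
                        int64_t dim, at::Tensor& out) {
  std::vector<at::Tensor> unsqueezed;
  unsqueezed.reserve(tensors.size());
  for (const at::Tensor* t : tensors) {
    // at::native::unsqueeze skips the device dispatcher round trip that
    // t->unsqueeze(dim) would take.
    unsqueezed.push_back(at::native::unsqueeze(*t, dim));
  }
  at::cat_out(out, unsqueezed, dim);
}
```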

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- Stack`

Reviewed By: swolchok

Differential Revision: D32596934

fbshipit-source-id: e8f0ccea37c48924cb4fccbfdac4e1e11da95ee0
2021-12-20 11:46:11 -08:00
bceb1db885 use irange for loops 3 (#66747)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66747

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit; a number of reversions and unused-variable warning suppressions were added by hand.
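
A hedged before/after sketch of what the upgrader produces:
```
#include <c10/util/irange.h>

void example(int64_t n) {
  // Before: manual induction variable and bound check.
  for (int64_t i = 0; i < n; i++) {
    (void)i;  // loop body
  }
  // After: const, correctly-typed index with no manual bookkeeping.
  for (const auto i : c10::irange(n)) {
    (void)i;  // loop body
  }
}
```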

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31705365

fbshipit-source-id: 5c3af2184766b063eed2f4e8feb69f1fedd3503e
2021-10-18 21:50:32 -07:00
2f099c7555 Revert D30652629: use irange for loops
Test Plan: revert-hammer

Differential Revision: D30652629 (687c2267d4)

Original commit changeset: 0ae6c4bbbb55

fbshipit-source-id: 5c4f067b584a021c8c9656454d1ee60999600fb3
2021-10-15 15:23:10 -07:00
687c2267d4 use irange for loops (#66234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit; a number of reversions and unused-variable warning suppressions were added by hand.

bypass_size_limit
allow-large-files

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D30652629

fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
2021-10-15 13:50:33 -07:00
a9b0a921d5 Disable avoid-non-const-global-variables lint check (#62008)
Summary:
The GoogleTest `TEST` macro is non-compliant with this check, and so is `DEFINE_DISPATCH`.

All changes but the ones to `.clang-tidy` were generated using the following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`;  do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```
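
For context, this is the kind of suppression line the script deletes; `add_fn` and `add_stub` are illustrative names, not taken from the diff:
```
#include <ATen/native/DispatchStub.h>

namespace at::native {
// Both macro lines expand to non-const globals, which is what tripped the
// cppcoreguidelines-avoid-non-const-global-variables check.
using add_fn = void (*)(at::Tensor&);
DECLARE_DISPATCH(add_fn, add_stub);
// With the check disabled globally in .clang-tidy, per-line suppressions
// like the one below become redundant and are removed:
// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
DEFINE_DISPATCH(add_stub);
}  // namespace at::native
```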

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008

Reviewed By: driazati, r-barnes

Differential Revision: D29838584

Pulled By: malfet

fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
2021-07-22 18:04:40 -07:00
47c566ebb1 Rename namespace vec256 to vec, struct Vec256 to Vectorized (and other related classes/structs) (#58438)
Summary:
In order to make it more convenient for maintainers to review the ATen AVX512 implementation, this PR renames the namespace `vec256` to `vec`. Modifying 77 files and creating 2 new files only took a few minutes, and since these changes aren't significant, fewer files would have to be reviewed in https://github.com/pytorch/pytorch/issues/56992.
The struct `Vec256` is not being renamed to `Vec` but to `Vectorized`, because there are some `using Vec =` statements in the codebase, so `Vectorized` was the more convenient choice. However, I can still rename it to `Vec`, if required.

### Changes made in this PR -
Created `aten/src/ATen/cpu/vec` with subdirectory `vec256` (vec512 would be added via https://github.com/pytorch/pytorch/issues/56992).
The changes were made in this manner -

1. First, a script was run to rename `vec256` to `vec` & `Vec` to `Vectorized` -
```
# Ref: https://stackoverflow.com/a/20721292
cd aten/src
grep -rli 'vec256\/vec256\.h' * | xargs -i@ sed -i 's/vec256\/vec256\.h/vec\/vec\.h/g' @
grep -rli 'vec256\/functional\.h' * | xargs -i@ sed -i 's/vec256\/functional\.h/vec\/functional\.h/g' @
grep -rli 'vec256\/intrinsics\.h' * | xargs -i@ sed -i 's/vec256\/intrinsics\.h/vec\/vec256\/intrinsics\.h/g' @
grep -rli 'namespace vec256' * | xargs -i@ sed -i 's/namespace vec256/namespace vec/g' @
grep -rli 'Vec256' * | xargs -i@ sed -i 's/Vec256/Vectorized/g' @
grep -rli 'vec256\:\:' * | xargs -i@ sed -i 's/vec256\:\:/vec\:\:/g' @
grep -rli 'at\:\:vec256' * | xargs -i@ sed -i 's/at\:\:vec256/at\:\:vec/g' @
cd ATen/cpu
mkdir vec
mv vec256 vec
cd vec/vec256
grep -rli 'cpu\/vec256\/' * | xargs -i@ sed -i 's/cpu\/vec256\//cpu\/vec\/vec256\//g' @
grep -rli 'vec\/vec\.h' * | xargs -i@ sed -i 's/vec\/vec\.h/vec\/vec256\.h/g' @
```

2. `vec256` & `VEC256` were replaced with `vec` & `VEC` respectively in 4 CMake files.

3. In `pytorch_vec/aten/src/ATen/test/`, `vec256_test_all_types.h` & `vec256_test_all_types.cpp` were renamed.

4. `pytorch_vec/aten/src/ATen/cpu/vec/vec.h` & `pytorch_vec/aten/src/ATen/cpu/vec/functional.h` were created.
Both currently have one line each and would have 5 once AVX512 support is added to ATen. (A brief usage sketch under the new names follows.)
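
A small usage sketch under the new names, assuming AVX2-era `Vectorized<float>` (illustrative, not from the PR):
```
#include <ATen/cpu/vec/vec.h>

// Scale a float buffer in place using the renamed API:
// at::vec::Vectorized<float> was formerly at::vec256::Vec256<float>.
void scale(float* data, float factor, int64_t n) {
  using Vec = at::vec::Vectorized<float>;
  const Vec vfactor(factor);               // broadcast constructor
  int64_t i = 0;
  for (; i + Vec::size() <= n; i += Vec::size()) {
    Vec v = Vec::loadu(data + i);          // unaligned load of one lane group
    (v * vfactor).store(data + i);         // element-wise multiply, store back
  }
  for (; i < n; i++) {                     // scalar tail
    data[i] *= factor;
  }
}
```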

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58438

Reviewed By: malfet

Differential Revision: D28509615

Pulled By: ezyang

fbshipit-source-id: 63840df5f23b3b59e203d25816e2977c6a901780
2021-05-19 16:04:36 -07:00
4cb534f92e Make PyTorch code-base clang-tidy compliant (#56892)
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os

def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files

def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname,"-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit","--all", "-m", f"NOLINT stubs for {fname}"])

def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)

if __name__ == "__main__":
    main()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892

Reviewed By: H-Huang

Differential Revision: D27991944

Pulled By: malfet

fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
2021-04-28 14:10:25 -07:00
0d9ca21d74 [Static Runtime] Native stack for contiguous inputs (#50863)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50863

- Avoid calling unsqueeze on every input tensor by copying data directly (see the sketch after this list)
- Model benchmark shows a small improvement: -2.3% (b=1), -1.1% (b=20)
- This diff does not yet modify the torch::stack implementation, only the static_runtime path. A follow-up diff will do this.
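
A hedged sketch of the idea, assuming same-shaped, same-dtype, contiguous CPU inputs stacked along a new leading dim (the real static runtime kernel handles more cases):
```
#include <ATen/ATen.h>
#include <cstring>
#include <vector>

// Copy each input's bytes straight into a preallocated output, skipping
// the usual per-input unsqueeze + cat dispatch.
at::Tensor stack_contiguous(const std::vector<at::Tensor>& inputs) {
  auto sizes = inputs[0].sizes().vec();
  sizes.insert(sizes.begin(), static_cast<int64_t>(inputs.size()));
  at::Tensor out = at::empty(sizes, inputs[0].options());
  const size_t bytes = inputs[0].nbytes();
  char* dst = static_cast<char*>(out.data_ptr());
  for (const auto& t : inputs) {
    std::memcpy(dst, t.data_ptr(), bytes);  // assumes t.is_contiguous()
    dst += bytes;
  }
  return out;
}
```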

Test Plan:
# Test
```
buck test //caffe2/aten:native_test
buck run //caffe2/test:torch
```
# Op benchmark
Expected no changes here, because this diff only touches static runtime.
```
Baseline                  |Native                   |Change
6.38                      |6.336                    |-0.69%
6.553                     |6.588                    |0.53%
14.904                    |14.883                   |-0.14%
5.657                     |5.68                     |0.41%
5.612                     |5.795                    |3.26%
6.051                     |6.058                    |0.12%
4.225                     |4.252                    |0.64%
4.24                      |4.294                    |1.27%
6.28                      |4.249                    |-32.34%
6.267                     |4.257                    |-32.07%
418.932                   |404.356                  |-3.48%
417.694                   |404.752                  |-3.10%
1592.455                  |1583.277                 |-0.58%
2919.261                  |2685.636                 |-8.00%
211.458                   |202.838                  |-4.08%
211.518                   |203.229                  |-3.92%
783.953                   |792.198                  |1.05%
1457.823                  |1348.824                 |-7.48%
2032.816                  |1975.961                 |-2.80%
2090.662                  |2000.612                 |-4.31%
6487.098                  |6635.41                  |2.29%
11874.702                 |10853.302                |-8.60%
2123.83                   |2039.272                 |-3.98%
2195.453                  |2221.82                  |1.20%
6435.978                  |6593.363                 |2.45%
11852.205                 |10858.92                 |-8.38%
2036.526                  |1983.042                 |-2.63%
2055.618                  |2072.03                  |0.80%
6417.192                  |6681.064                 |4.11%
12468.744                 |10888.336                |-12.67%
4959.704                  |4954.734                 |-0.10%
5121.823                  |4996.84                  |-2.44%
5082.105                  |5029.652                 |-1.03%
5395.936                  |5438.628                 |0.79%
5162.756                  |5114.147                 |-0.94%
23798.08                  |21884.065                |-8.04%
4957.921                  |4972.01                  |0.28%
4971.234                  |4968.977                 |-0.05%
5005.909                  |5039.95                  |0.68%
5159.614                  |5180.426                 |0.40%
5013.221                  |5202.684                 |3.78%
20238.741                 |20212.581                |-0.13%
7632.439                  |7610.345                 |-0.29%
7589.376                  |7679.148                 |1.18%
7859.937                  |7850.485                 |-0.12%
8214.213                  |8150.846                 |-0.77%
11606.562                 |11724.139                |1.01%
34612.919                 |34817.677                |0.59%
```

# Adindexer model benchmark

```
caffe2=0 batch={1|20} profile=1 ./scripts/bwasti/static_runtime/run.sh
```

## Baseline
```
Batch 1
0.00291311 ms.    3.97139%. aten::stack (1 nodes)
Batch 20
0.00477447 ms.   0.934081%. aten::stack (1 nodes)

```

## Native stack (this change)
```
Batch 1
0.00115161 ms.    1.67388%. aten::stack (1 nodes)
Batch 20
0.00264831 ms.   0.543767%. aten::stack (1 nodes)
```

Reviewed By: hlu1

Differential Revision: D25988638

fbshipit-source-id: 82ce84c88963cae40dc5819004baf03ce9093ecc
2021-02-03 14:59:52 -08:00