11 Commits

cyy
1dd503c6fb [4/N] Fix Wextra-semi warning (#139256)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139256
Approved by: https://github.com/ezyang
2024-10-31 03:01:14 +00:00
8f1c3c68d3 [BE] Use nested namespaces in .cpp/.cu files (#92100)
As we now live in a C++17 world

This is a functional no-op, just (illustrated after the list):
- `s/namespace at { namespace native {/namespace at::native {/`
- `s/namespace torch { namespace jit {/namespace torch::jit {/`
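
For illustration, a minimal before/after of what the substitution does (the declaration is a stand-in, not taken from the diff):
```
// Before (pre-C++17): nested opening braces, two closing braces
namespace at { namespace native {
void example_kernel();  // stand-in declaration
}} // namespace at::native

// After (C++17 nested namespace definition): same meaning, less noise
namespace at::native {
void example_kernel();  // stand-in declaration
} // namespace at::native
```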

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92100
Approved by: https://github.com/izaitsevfb
2023-01-13 16:32:34 +00:00
5816a95ca9 CPU Kernel: Use per-operator headers (#71137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71137

ATen has a header dependency problem. Whenever an operator is added or modified, it changes `ATen/Functions.h` and `ATen/NativeFunctions.h` which in turn requires essentially every single file to be rebuilt. Per-operator headers allow files to only include the specific operators they use and so minimizes unnecessary rebuilds during incremental builds and improves cache hits in CI builds.

See this note for more details:
3a03af2f50/aten/src/ATen/templates/Functions.h (L20)
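
A sketch of the include pattern this enables, assuming the `ATen/ops/*.h` per-operator layout and the `AT_PER_OPERATOR_HEADERS` guard used in ATen sources:
```
#ifndef AT_PER_OPERATOR_HEADERS
// Monolithic headers: declare every operator, so any operator change
// forces this file to rebuild.
#include <ATen/Functions.h>
#include <ATen/NativeFunctions.h>
#else
// Per-operator headers: only the operators this file actually calls.
#include <ATen/ops/add.h>
#include <ATen/ops/empty.h>
#endif
```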

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D33949900

Pulled By: malfet

fbshipit-source-id: d76a53bfab9ea75391de060a2d85923339f93a7c
(cherry picked from commit d3be6e4283bbc8d5c967c4634ea1a6b3386861ed)
2022-03-05 01:32:17 +00:00
65f54bc000 [SR] Optimize VarStack (#68750)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68750

There was some room for optimization in static runtime's `prim::VarStack`:

* Avoid refcount bumps: constructing the `std::vector<at::Tensor>` can be avoided by writing a custom version of `stack_out` that takes a `std::vector<at::Tensor*>` (see the sketch after this list)

* Skip the memory overlap check

* Avoid device dispatcher overhead in a few places (e.g. `tensor.unsqueeze -> at::native::unsqueeze`)
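
A minimal sketch of the first and third points, assuming a hypothetical `stack_out` variant named `stack_out_borrowed` (the actual static runtime code differs):
```
#include <ATen/ATen.h>
#include <ATen/NativeFunctions.h>
#include <vector>

// Hypothetical stack_out variant taking borrowed pointers: callers can pass
// raw Tensor pointers, so assembling the argument list bumps no refcounts.
void stack_out_borrowed(const std::vector<at::Tensor*>& tensors,
                        int64_t dim, at::Tensor& out) {
  std::vector<at::Tensor> unsqueezed;
  unsqueezed.reserve(tensors.size());
  for (const at::Tensor* t : tensors) {
    // at::native::unsqueeze skips the device dispatcher round trip that
    // t->unsqueeze(dim) would take.
    unsqueezed.push_back(at::native::unsqueeze(*t, dim));
  }
  at::cat_out(out, unsqueezed, dim);
}
```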

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- Stack`

Reviewed By: swolchok

Differential Revision: D32596934

fbshipit-source-id: e8f0ccea37c48924cb4fccbfdac4e1e11da95ee0
2021-12-20 11:46:11 -08:00
bceb1db885 use irange for loops 3 (#66747)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66747

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit; a number of reversions and unused-variable warning suppressions were added by hand.
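
A hedged before/after sketch of what the upgrader produces:
```
#include <c10/util/irange.h>

void example(int64_t n) {
  // Before: manual induction variable and bound check.
  for (int64_t i = 0; i < n; i++) {
    (void)i;  // loop body
  }
  // After: const, correctly-typed index with no manual bookkeeping.
  for (const auto i : c10::irange(n)) {
    (void)i;  // loop body
  }
}
```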

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31705365

fbshipit-source-id: 5c3af2184766b063eed2f4e8feb69f1fedd3503e
2021-10-18 21:50:32 -07:00
2f099c7555 Revert D30652629: use irange for loops
Test Plan: revert-hammer

Differential Revision: D30652629 (687c2267d4)

Original commit changeset: 0ae6c4bbbb55

fbshipit-source-id: 5c4f067b584a021c8c9656454d1ee60999600fb3
2021-10-15 15:23:10 -07:00
687c2267d4 use irange for loops (#66234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit; a number of reversions and unused-variable warning suppressions were added by hand.

bypass_size_limit
allow-large-files

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D30652629

fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
2021-10-15 13:50:33 -07:00
a9b0a921d5 Disable avoid-non-const-global-variables lint check (#62008)
Summary:
The GoogleTest `TEST` macro is non-compliant with this check, and so is `DEFINE_DISPATCH`.

All changes but the ones to `.clang-tidy` were generated using the following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`;  do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```
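
For context, this is the kind of suppression line the script deletes; `add_fn` and `add_stub` are illustrative names, not taken from the diff:
```
#include <ATen/native/DispatchStub.h>

namespace at::native {
// Both macro lines expand to non-const globals, which is what tripped the
// cppcoreguidelines-avoid-non-const-global-variables check.
using add_fn = void (*)(at::Tensor&);
DECLARE_DISPATCH(add_fn, add_stub);
// With the check disabled globally in .clang-tidy, per-line suppressions
// like the one below become redundant and are removed:
// NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)
DEFINE_DISPATCH(add_stub);
}  // namespace at::native
```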

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008

Reviewed By: driazati, r-barnes

Differential Revision: D29838584

Pulled By: malfet

fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
2021-07-22 18:04:40 -07:00
47c566ebb1 Rename namespace vec256 to vec, struct Vec256 to Vectorized (and other related classes/structs) (#58438)
Summary:
In order to make it more convenient for maintainers to review the ATen AVX512 implementation, this PR renames the namespace `vec256` to `vec`. Modifying 77 files and creating 2 new files only took a few minutes, and since these changes aren't significant, fewer files would have to be reviewed in https://github.com/pytorch/pytorch/issues/56992.
The struct `Vec256` is not being renamed to `Vec` but to `Vectorized`, because there are some `using Vec =` statements in the codebase, so `Vectorized` was the more convenient choice. However, I can still rename it to `Vec`, if required.

### Changes made in this PR -
Created `aten/src/ATen/cpu/vec` with subdirectory `vec256` (vec512 would be added via https://github.com/pytorch/pytorch/issues/56992).
The changes were made in this manner -

1. First, a script was run to rename `vec256` to `vec` & `Vec` to `Vectorized` -
```
# Ref: https://stackoverflow.com/a/20721292
cd aten/src
grep -rli 'vec256\/vec256\.h' * | xargs -i@ sed -i 's/vec256\/vec256\.h/vec\/vec\.h/g' @
grep -rli 'vec256\/functional\.h' * | xargs -i@ sed -i 's/vec256\/functional\.h/vec\/functional\.h/g' @
grep -rli 'vec256\/intrinsics\.h' * | xargs -i@ sed -i 's/vec256\/intrinsics\.h/vec\/vec256\/intrinsics\.h/g' @
grep -rli 'namespace vec256' * | xargs -i@ sed -i 's/namespace vec256/namespace vec/g' @
grep -rli 'Vec256' * | xargs -i@ sed -i 's/Vec256/Vectorized/g' @
grep -rli 'vec256\:\:' * | xargs -i@ sed -i 's/vec256\:\:/vec\:\:/g' @
grep -rli 'at\:\:vec256' * | xargs -i@ sed -i 's/at\:\:vec256/at\:\:vec/g' @
cd ATen/cpu
mkdir vec
mv vec256 vec
cd vec/vec256
grep -rli 'cpu\/vec256\/' * | xargs -i@ sed -i 's/cpu\/vec256\//cpu\/vec\/vec256\//g' @
grep -rli 'vec\/vec\.h' * | xargs -i@ sed -i 's/vec\/vec\.h/vec\/vec256\.h/g' @
```

2. `vec256` & `VEC256` were replaced with `vec` & `VEC` respectively in 4 CMake files.

3. In `pytorch_vec/aten/src/ATen/test/`, `vec256_test_all_types.h` & `vec256_test_all_types.cpp` were renamed.

4. `pytorch_vec/aten/src/ATen/cpu/vec/vec.h` & `pytorch_vec/aten/src/ATen/cpu/vec/functional.h` were created.
Both currently have one line each and would have 5 once AVX512 support is added to ATen. (A brief usage sketch under the new names follows.)
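
A small usage sketch under the new names, assuming AVX2-era `Vectorized<float>` (illustrative, not from the PR):
```
#include <ATen/cpu/vec/vec.h>

// Scale a float buffer in place using the renamed API:
// at::vec::Vectorized<float> was formerly at::vec256::Vec256<float>.
void scale(float* data, float factor, int64_t n) {
  using Vec = at::vec::Vectorized<float>;
  const Vec vfactor(factor);               // broadcast constructor
  int64_t i = 0;
  for (; i + Vec::size() <= n; i += Vec::size()) {
    Vec v = Vec::loadu(data + i);          // unaligned load of one lane group
    (v * vfactor).store(data + i);         // element-wise multiply, store back
  }
  for (; i < n; i++) {                     // scalar tail
    data[i] *= factor;
  }
}
```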

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58438

Reviewed By: malfet

Differential Revision: D28509615

Pulled By: ezyang

fbshipit-source-id: 63840df5f23b3b59e203d25816e2977c6a901780
2021-05-19 16:04:36 -07:00
4cb534f92e Make PyTorch code-base clang-tidy compliant (#56892)
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os

def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files

def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname,"-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit","--all", "-m", f"NOLINT stubs for {fname}"])

def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)

if __name__ == "__main__":
    main()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892

Reviewed By: H-Huang

Differential Revision: D27991944

Pulled By: malfet

fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
2021-04-28 14:10:25 -07:00
0d9ca21d74 [Static Runtime] Native stack for contiguous inputs (#50863)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50863

- Avoid calling unsqueeze on every input tensor by copying data directly (see the sketch after this list)
- Model benchmark shows a small improvement: -2.3% (b=1), -1.1% (b=20)
- This diff does not yet modify the torch::stack implementation, only the static_runtime path. A follow-up diff will do this.
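
A hedged sketch of the idea, assuming same-shaped, same-dtype, contiguous CPU inputs stacked along a new leading dim (the real static runtime kernel handles more cases):
```
#include <ATen/ATen.h>
#include <cstring>
#include <vector>

// Copy each input's bytes straight into a preallocated output, skipping
// the usual per-input unsqueeze + cat dispatch.
at::Tensor stack_contiguous(const std::vector<at::Tensor>& inputs) {
  auto sizes = inputs[0].sizes().vec();
  sizes.insert(sizes.begin(), static_cast<int64_t>(inputs.size()));
  at::Tensor out = at::empty(sizes, inputs[0].options());
  const size_t bytes = inputs[0].nbytes();
  char* dst = static_cast<char*>(out.data_ptr());
  for (const auto& t : inputs) {
    std::memcpy(dst, t.data_ptr(), bytes);  // assumes t.is_contiguous()
    dst += bytes;
  }
  return out;
}
```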

Test Plan:
# Test
```
buck test //caffe2/aten:native_test
buck run //caffe2/test:torch
```
# Op benchmark
Expected no changes here, because this diff only touches static runtime.
```
Baseline                  |Native                   |Change
6.38                      |6.336                    |-0.69%
6.553                     |6.588                    |0.53%
14.904                    |14.883                   |-0.14%
5.657                     |5.68                     |0.41%
5.612                     |5.795                    |3.26%
6.051                     |6.058                    |0.12%
4.225                     |4.252                    |0.64%
4.24                      |4.294                    |1.27%
6.28                      |4.249                    |-32.34%
6.267                     |4.257                    |-32.07%
418.932                   |404.356                  |-3.48%
417.694                   |404.752                  |-3.10%
1592.455                  |1583.277                 |-0.58%
2919.261                  |2685.636                 |-8.00%
211.458                   |202.838                  |-4.08%
211.518                   |203.229                  |-3.92%
783.953                   |792.198                  |1.05%
1457.823                  |1348.824                 |-7.48%
2032.816                  |1975.961                 |-2.80%
2090.662                  |2000.612                 |-4.31%
6487.098                  |6635.41                  |2.29%
11874.702                 |10853.302                |-8.60%
2123.83                   |2039.272                 |-3.98%
2195.453                  |2221.82                  |1.20%
6435.978                  |6593.363                 |2.45%
11852.205                 |10858.92                 |-8.38%
2036.526                  |1983.042                 |-2.63%
2055.618                  |2072.03                  |0.80%
6417.192                  |6681.064                 |4.11%
12468.744                 |10888.336                |-12.67%
4959.704                  |4954.734                 |-0.10%
5121.823                  |4996.84                  |-2.44%
5082.105                  |5029.652                 |-1.03%
5395.936                  |5438.628                 |0.79%
5162.756                  |5114.147                 |-0.94%
23798.08                  |21884.065                |-8.04%
4957.921                  |4972.01                  |0.28%
4971.234                  |4968.977                 |-0.05%
5005.909                  |5039.95                  |0.68%
5159.614                  |5180.426                 |0.40%
5013.221                  |5202.684                 |3.78%
20238.741                 |20212.581                |-0.13%
7632.439                  |7610.345                 |-0.29%
7589.376                  |7679.148                 |1.18%
7859.937                  |7850.485                 |-0.12%
8214.213                  |8150.846                 |-0.77%
11606.562                 |11724.139                |1.01%
34612.919                 |34817.677                |0.59%
```

# Adindexer model benchmark

```
caffe2=0 batch={1|20} profile=1 ./scripts/bwasti/static_runtime/run.sh
```

## Baseline
```
Batch 1
0.00291311 ms.    3.97139%. aten::stack (1 nodes)
Batch 20
0.00477447 ms.   0.934081%. aten::stack (1 nodes)

```

## Native stack (this change)
```
Batch 1
0.00115161 ms.    1.67388%. aten::stack (1 nodes)
Batch 20
0.00264831 ms.   0.543767%. aten::stack (1 nodes)
```

Reviewed By: hlu1

Differential Revision: D25988638

fbshipit-source-id: 82ce84c88963cae40dc5819004baf03ce9093ecc
2021-02-03 14:59:52 -08:00