8 Commits

Author SHA1 Message Date
622947afa8 [BE] Use nested namespace in ATen/native (#115938)
It's a C++17 feature that usually makes code a bit more compact, and should have no side-effects otherwise.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115938
Approved by: https://github.com/Skylion007
2023-12-16 06:07:40 +00:00
8a9aca7b8d Reland 2 Many symintifications (#87604) (#87980)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87980
Approved by: https://github.com/ezyang
2022-10-28 13:40:11 +00:00
8b4d95759c Revert "Many symintifications (#87604)"
This reverts commit 777e6a2c5100f3274cff1bcf7e47ccbe1a651927.

Reverted https://github.com/pytorch/pytorch/pull/87604 on behalf of https://github.com/weiwangmeta due to breaking internal builds
2022-10-28 03:00:11 +00:00
777e6a2c51 Many symintifications (#87604)
Adds
expand_inplace
conv conv_double_backward
convolution
adaptive_avg_pool2d_symint
_embedding_bag_backward_symint
cudnn_grid_sampler
cuda 32 bit indexing
nll_loss / nll_loss_2d
tensor split
pooling same mode
cudnn_is_acceptable
storage nbytes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87604
Approved by: https://github.com/ezyang
2022-10-26 17:33:53 +00:00
4abd3e299d ATen/native (3/6): Use per-operator headers (#75573)
Differential Revision: [D40126701](https://our.internmc.facebook.com/intern/diff/D40126701)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75573
Approved by: https://github.com/malfet
2022-10-24 23:12:14 +00:00
96e3d1a76c Remove native_functions.yaml dependency from Sorting.cu (#66621)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66621

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D31856099

Pulled By: dagitses

fbshipit-source-id: d9c2b6b45099e49c7beaae5888140de350d23696
2021-11-02 14:46:29 -07:00
4cb534f92e Make PyTorch code-base clang-tidy compliant (#56892)
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
import json
import os
from subprocess import check_call, check_output

def get_compiled_files_list():
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files

def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname, "-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit", "--all", "-m", f"NOLINT stubs for {fname}"])

def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)

if __name__ == "__main__":
    main()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892

Reviewed By: H-Huang

Differential Revision: D27991944

Pulled By: malfet

fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
2021-04-28 14:10:25 -07:00
33519e19ab Fix 64-bit indexing in GridSampler (#41923)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41656

For the CPU version, this is a regression introduced in https://github.com/pytorch/pytorch/issues/10980, which vectorized the `grid_sampler_2d` implementation. It uses the AVX2 gather intrinsic, which for `float` requires 32-bit indices to match the number of floats in the AVX register. There is also an `i64gather_ps` variant, but it utilizes only half of the vector width, so it would be expected to give worse performance in the more likely case where 32-bit indexing is acceptable. So, I've left the optimized AVX version as-is and reinstated the old non-vectorized version as a fallback.

For the CUDA version, this operation has never supported 64-bit indexing, so this isn't a regression. I've templated the kernel on index type and added 64-bit variants, although I gather that in some places a simple `TORCH_CHECK(canUse32BitIndexMath(...))` is used instead. So, there is a decision to be made here.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41923

Reviewed By: glaringlee

Differential Revision: D22925931

Pulled By: zou3519

fbshipit-source-id: 920816107aae26360c5e7f4e9c729fa9057268bb
2020-08-06 16:08:09 -07:00