Commit Graph

135 Commits

Author SHA1 Message Date
30587195d3 Migrate c10/macros/cmake_macros.h.in to torch/headeronly (#158035)
Summary: As above, also changes a bunch of the build files to be better

Test Plan:
internal and external CI

did run buck2 build fbcode//caffe2:torch and it succeeded

Rollback Plan:

Reviewed By: swolchok

Differential Revision: D78016591

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158035
Approved by: https://github.com/swolchok
2025-07-15 19:52:59 +00:00
04bd7e6850 [ROCm] Remove use of warpsize on host-side compilation (#156979)
Changes needed for ROCm7.0:
* `warpSize` is _not_ a compile-time constant on device-side compilation for ROCm anymore
* `warpSize` is _not_ defined on host-side compilation, hence `at::cuda::warp_size()` must be used to query warpsize at runtime
* Redefining `C10_WARP_SIZE` to be a compile-time constant, with a reasonable value for device-side compilation, but an unreasonable value of 1 for host-side compilation

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156979
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-07-01 04:55:31 +00:00
6303cc41b7 [ROCm] support CUDA_KERNEL_ASSERT using abort() (#155262)
We won't have the full message that __assert_fail would provide, but at least we won't silently do nothing.

Fixes #155045.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155262
Approved by: https://github.com/hongxiayang, https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-06-18 23:52:35 +00:00
c4d1ff02f8 [Lint] Update clang-format to 19.1.4 (#153889)
All changes other than the one to `tools/linter/adapters/s3_init_config.json` are generated by newer clang-format
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153889
Approved by: https://github.com/cyyever, https://github.com/atalman
2025-05-20 14:12:46 +00:00
e2f9759bd0 Fix broken URLs (#152237)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152237
Approved by: https://github.com/huydhn, https://github.com/malfet
2025-04-27 09:56:42 +00:00
cyy
79e8a69257 Enable move warnings for torch targets (#149923)
This PR enables more move warnings for torch targets and fixes some code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149923
Approved by: https://github.com/malfet
2025-03-26 08:38:13 +00:00
73fde0d940 [PyTorch] Unbreak C10_ALWAYS_INLINE_ATTRIBUTE on MSVC (#139363)
At least one recent version refuses to accept it on a lambda, so disable.

Differential Revision: [D65250256](https://our.internmc.facebook.com/intern/diff/D65250256/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D65250256/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139363
Approved by: https://github.com/ngimel, https://github.com/malfet
2024-10-31 07:40:05 +00:00
dbf0fa811a Remove C10_HOST_CONSTEXPR_EXCEPT_WIN_CUDA and CONSTEXPR_EXCEPT_WIN_CUDA (#138479)
BC linter suppressed due to removal of `tools/linter/adapters/constexpr_linter.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138479
Approved by: https://github.com/eqy, https://github.com/malfet
2024-10-24 07:51:05 +00:00
b1b7c714ed Add deprecated C10_UNUSED and C10_NODISCARD macros back (#138398)
For backwards compatibility. Disallow internal use.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138398
Approved by: https://github.com/malfet
2024-10-20 00:21:19 +00:00
fddabc6e0b C10_UNUSED to [[maybe_unused]] (#6357) (#138364)
Summary: Pull Request resolved: https://github.com/pytorch/executorch/pull/6357

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138364
Approved by: https://github.com/Skylion007, https://github.com/eqy
2024-10-19 13:17:43 +00:00
cyy
2f6a70bfea Enable more UBSAN checks (#138288)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138288
Approved by: https://github.com/ezyang
2024-10-19 13:00:26 +00:00
542f7c8383 Eliminate C10_NODISCARD (#138336)
Test Plan: Sandcastle

Reviewed By: swolchok

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138336
Approved by: https://github.com/Skylion007
2024-10-19 02:54:06 +00:00
8dd575faf6 [BE] Modernize C10_UNUSED (#138102)
[`[[maybe_unused]]`](https://en.cppreference.com/w/cpp/language/attributes/maybe_unused) is part of C++17 standard

Test Plan: Sandcastle

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138102
Approved by: https://github.com/Skylion007, https://github.com/albanD, https://github.com/malfet, https://github.com/eqy
2024-10-18 16:33:01 +00:00
8abbd1c7c7 Modernize C10_NODISCARD to [[nodiscard]] (#138151)
PyTorch is C++17 now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138151
Approved by: https://github.com/Skylion007, https://github.com/albanD
2024-10-17 18:07:39 +00:00
c8a7da305b [PyTorch] Add attribute version of C10_ALWAYS_INLINE (#136445)
Sometimes (such as on a lambda), you need `__attribute__((always_inline))` but not `inline`.

Differential Revision: [D63266917](https://our.internmc.facebook.com/intern/diff/D63266917/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136445
Approved by: https://github.com/malfet
2024-10-03 18:18:37 +00:00
db193d1e29 add msg to _assert_async (#134813)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134813
Approved by: https://github.com/ezyang, https://github.com/eqy, https://github.com/albanD
2024-09-03 06:33:18 +00:00
8629939a51 [torch/c10] Add C10_UBSAN_ENABLED macro and use it to disable SymInt_… (#127967)
Adds `C10_UBSAN_ENABLED` macro and use it to disable `SymIntTest::Overflows` (fails under `signed-integer-overflow` UBSAN check).

Also cleans up UBSAN guard in `jit/test_misc.cpp` to use `C10_UBSAN_ENABLED`  and the existing `C10_ASAN_ENABLED` instead of locally defining `HAS_ASANUBSAN`.

> NOTE: This should fix `SymIntTest::Overflows` failing under ubsan in fbcode too...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127967
Approved by: https://github.com/atalman, https://github.com/d4l3k, https://github.com/malfet
2024-06-14 16:01:12 +00:00
bae3b17fd9 Tweak a comment and fix spelling (#126681)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126681
Approved by: https://github.com/Skylion007
2024-05-21 17:19:06 +00:00
ae9a4fa63c [ROCm] enforce ROCM_VERSION >= 6.0 (#125646)
Remove any code relying on ROCM_VERSION < 6.0.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125646
Approved by: https://github.com/albanD, https://github.com/eqy
2024-05-12 18:01:28 +00:00
cyy
d5d13ab15e Remove C10_FALLTHROUGH (#120157)
Since [[fallthrough]] is supported in our C++17 compilers and no other repo is using it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120157
Approved by: https://github.com/Skylion007
2024-02-21 06:18:58 +00:00
9aa8bbf7f2 [BE] Delete C10_IS_TRIVIALLY_COPYABLE (#120120)
It's not used anywhere in PyTorch after custom implementation of `c10::optional` is gone, and it's not used by the repo as well, see https://github.com/search?type=code&q=C10_IS_TRIVIALLY_COPYABLE+org%3Apytorch

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120120
Approved by: https://github.com/Skylion007, https://github.com/albanD, https://github.com/huydhn
2024-02-17 01:04:30 +00:00
79811e765c [2/4] Intel GPU Runtime Upstreaming for Device (#116833)
# Motivation
According to [[1/4] Intel GPU Runtime Upstreaming for Device](https://github.com/pytorch/pytorch/pull/116019), as mentioned in [[RFC] Intel GPU Runtime Upstreaming](https://github.com/pytorch/pytorch/issues/114842), the second PR  covers the changes under `aten`.

# Design
We will compile the code for XPU separately into a library named `libtorch_xpu.so`. Currently, it primarily offers device-related APIs, including
- `getCurrentDeviceProperties`
- `getDeviceProperties`
- `getGlobalIdxFromDevice`
- `getDeviceFromPtr`

# Additional Context
`XPUHooks` is an indispensable part of the runtime. We upstream `XPUHooks` in this PR since there is some code related to `Device` in it and we also refine some logic and code to avoid forward declaration in `DLPack`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116833
Approved by: https://github.com/EikanWang, https://github.com/jgong5, https://github.com/gujinghui, https://github.com/malfet
2024-01-18 05:02:42 +00:00
50049cfaa0 [1/4] Intel GPU Runtime Upstreaming for Device (#116019)
# Motivation
As mentioned in [[RFC] Intel GPU Runtime Upstreaming](https://github.com/pytorch/pytorch/issues/114842), The first runtime component we would like to upstream is `Device` which contains the device management functions of Intel GPU's runtime. To facilitate the code review, we split the code changes into 4 PRs. This is one of the 4 PRs and covers the changes under `c10`.

# Design
Intel GPU device is a wrapper of sycl device on which kernels can be executed. In our design, we will maintain a sycl device pool containing all the GPU devices of the current machine, and manage the status of the device pool by PyTorch. The thread local safe is considered in this design. The corresponding C++ files related to `Device` will be placed in c10/xpu folder. And we provide the c10 device runtime APIs, like
  - `c10::xpu::device_count`
  - `c10::xpu::set_device`
  - ...

# Additional Context
In our plan, 4 PRs should be submitted to PyTorch for `Device`:
1. for c10
2. for aten
3. for python frontend
4. for lazy initialization shared with CUDA

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116019
Approved by: https://github.com/gujinghui, https://github.com/jgong5, https://github.com/EikanWang, https://github.com/malfet
2024-01-12 07:36:25 +00:00
9ac0e6971a Revert "[1/4] Intel GPU Runtime Upstreaming for Device (#116019)"
This reverts commit b4cebe2c34242ceee3a1bc285f426662942a29ac.

Reverted https://github.com/pytorch/pytorch/pull/116019 on behalf of https://github.com/malfet due to Broke internal and periodic buck builds, see https://github.com/pytorch/pytorch/actions/runs/7414664129/job/20176215868 ([comment](https://github.com/pytorch/pytorch/pull/116019#issuecomment-1879030285))
2024-01-05 17:36:39 +00:00
b4cebe2c34 [1/4] Intel GPU Runtime Upstreaming for Device (#116019)
# Motivation
As mentioned in [[RFC] Intel GPU Runtime Upstreaming](https://github.com/pytorch/pytorch/issues/114842), The first runtime component we would like to upstream is `Device` which contains the device management functions of Intel GPU's runtime. To facilitate the code review, we split the code changes into 4 PRs. This is one of the 4 PRs and covers the changes under `c10`.

# Design
Intel GPU device is a wrapper of sycl device on which kernels can be executed. In our design, we will maintain a sycl device pool containing all the GPU devices of the current machine, and manage the status of the device pool by PyTorch. The thread local safe is considered in this design. The corresponding C++ files related to `Device` will be placed in c10/xpu folder. And we provide the c10 device runtime APIs, like
  - `c10::xpu::device_count`
  - `c10::xpu::set_device`
  - ...

# Additional Context
In our plan, 4 PRs should be submitted to PyTorch for `Device`:
1. for c10
2. for aten
3. for python frontend
4. for lazy initialization shared with CUDA

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116019
Approved by: https://github.com/gujinghui, https://github.com/jgong5, https://github.com/EikanWang, https://github.com/malfet
2024-01-04 17:35:04 +00:00
53e32d12c4 [c10] Use nested namespace in c10/cuda (#116464)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116464
Approved by: https://github.com/Skylion007
2023-12-27 23:14:00 +00:00
d7caef7996 [CI] Update clang-format (#116002)
To 17.0.6 build using https://github.com/pytorch/test-infra/blob/main/.github/workflows/clang-tidy-linux.yml

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116002
Approved by: https://github.com/suo
2023-12-18 14:58:46 +00:00
66a76516bf [ROCm] Disabling Kernel Asserts for ROCm by default - fix and clean up and refactoring (#114660)
Related to #103973  #110532 #108404 #94891

**Context:**
As commented in 6ae0554d11/cmake/Dependencies.cmake (L1198)
Kernel asserts are enabled by default for CUDA and disabled for ROCm.
However it is somewhat broken, and Kernel assert was still enabled for ROCm.

Disabling kernel assert is also needed for users who do not have PCIe atomics support. These community users have verified that disabling the kernel assert in PyTorch/ROCm platform fixed their pytorch workflow, like torch.sum script, stable-diffusion. (see the related issues)

**Changes:**

This pull request serves the following purposes:
* Refactor and clean up the logic,  make it simpler for ROCm to enable and disable Kernel Asserts
* Fix the bug that Kernel Asserts for ROCm was not disabled by default.

Specifically,
- Renamed `TORCH_DISABLE_GPU_ASSERTS` to `C10_USE_ROCM_KERNEL_ASSERT` for the following reasons:
(1) This variable only applies to ROCm.
(2) The new name is more align with #define CUDA_KERNEL_ASSERT function.
(3) With USE_ in front of the name, we can easily control it with environment variable to turn on and off this feature during build (e.g. `USE_ROCM_KERNEL_ASSERT=1 python setup.py develop` will enable kernel assert for ROCm build).
- Get rid of the `ROCM_FORCE_ENABLE_GPU_ASSERTS' to simplify the logic and make it easier to understand and maintain
- Added `#cmakedefine` to carry over the CMake variable to C++

**Tests:**
(1) build with default mode and verify that USE_ROCM_KERNEL_ASSERT  is OFF(0), and kernel assert is disabled:

```
python setup.py develop
```
Verify CMakeCache.txt has correct value.
```
/xxxx/pytorch/build$ grep USE_ROCM_KERNEL_ASSERT CMakeCache.txt
USE_ROCM_KERNEL_ASSERT:BOOL=0
```
Tested the following code in ROCm build and CUDA build, and expected the return code differently.

```
subprocess.call([sys.executable, '-c', "import torch;torch._assert_async(torch.tensor(0,device='cuda'));torch.cuda.synchronize()"])
```
This piece of code is adapted from below unit test to get around the limitation that this unit test now was skipped for ROCm. (We will check to enable this unit test in the future)

```
python test/test_cuda_expandable_segments.py -k test_fixed_cuda_assert_async
```

Ran the following script, expecting r ==0 since the CUDA_KERNEL_ASSERT is defined as nothing:
```
>> import sys
>>> import subprocess
>>> r=subprocess.call([sys.executable, '-c', "import torch;torch._assert_async(torch.tensor(0,device='cuda'));torch.cuda.synchronize()"])
>>> r
0
```

(2) Enable the kernel assert by building with USE_ROCM_KERNEL_ASSERT=1, or USE_ROCM_KERNEL_ASSERT=ON
```
USE_ROCM_KERNEL_ASSERT=1 python setup.py develop
```

Verify `USE_ROCM_KERNEL_ASSERT` is `1`
```
/xxxx/pytorch/build$ grep USE_ROCM_KERNEL_ASSERT CMakeCache.txt
USE_ROCM_KERNEL_ASSERT:BOOL=1
```

Run the assert test, and expected return code not equal to 0.

```
>> import sys
>>> import subprocess
>>> r=subprocess.call([sys.executable, '-c', "import torch;torch._assert_async(torch.tensor(0,device='cuda'));torch.cuda.synchronize()"])
>>>/xxxx/pytorch/aten/src/ATen/native/hip/TensorCompare.hip:108: _assert_async_cuda_kernel: Device-side assertion `input[0] != 0' failed.
:0:rocdevice.cpp            :2690: 2435301199202 us: [pid:206019 tid:0x7f6cf0a77700] Callback: Queue 0x7f64e8400000 aborting with error : HSA_STATUS_ERROR_EXCEPTION: An HSAIL operation resulted in a hardware exception. code: 0x1016

>>> r
-6
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114660
Approved by: https://github.com/jeffdaily, https://github.com/malfet, https://github.com/jithunnair-amd
2023-12-13 15:44:53 +00:00
cyy
e676ec2fe7 Fix undefined __assert_fail on FreeBSD (#111761)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111761
Approved by: https://github.com/Skylion007
2023-10-23 12:46:03 +00:00
97396cdbb2 Fix undefined behavior detected by clang-12 (#106354)
Compiler behavior when non-zero offset is added to a null pointer is undefined and is a bad habit.

- When `lapackEig` is called with to estimate a workspace size, do not add matrix size to the W pointer.
- When `unpack_pivots_cpu_kernel` with zero `dim_size` exit early.
- When `topk_impl_loop` is called with  `k` is zero, exit right away as output tensors are empty anyway.
- Ignore adding non-zero storage-offset in `TensorImpl::data_ptr_impl_impl`, which can be the case if tensor is created as `torch.empty(3)[4:]`.
- In `s_addmm_out_sparse_dense_worker` do not call `axpy` over an empty vector.
- In `_sparse_binary_op_intersection_kernel_impl` do skip computing `ptr_indices_dim` when `sparse_dim` is empty.
- Exit `grid_sample` forward/backward kernels earlier if either `input` or `grid` are empty tensors.

Found by asan in clang-12

Before the change UBSan report looks as follows:
```
 ASAN_SYMBOLIZER_PATH=/usr/lib/llvm-12/bin/llvm-symbolizer UBSAN_OPTIONS=print_stacktrace=1 LD_PRELOAD=/usr/lib/llvm-12/lib/clang/12.0.1/lib/linux/libclang_rt.asan-x86_64.so python test_fx_experimental.py -v -k test_normalize_operator_exhaustive_linalg_eig_cpu_float32
Test results will be stored in test-reports/python-unittest/test_fx_experimental

Running tests...
----------------------------------------------------------------------
  test_normalize_operator_exhaustive_linalg_eig_cpu_float32 (__main__.TestNormalizeOperatorsCPU) ... /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/overrides.py:111: UserWarning: 'has_cuda' is deprecated, please use 'torch.backends.cuda.is_built()'
  torch.has_cuda,
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/overrides.py:112: UserWarning: 'has_cudnn' is deprecated, please use 'torch.backends.cudnn.is_available()'
  torch.has_cudnn,
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/overrides.py:118: UserWarning: 'has_mps' is deprecated, please use 'torch.backends.mps.is_built()'
  torch.has_mps,
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/overrides.py:119: UserWarning: 'has_mkldnn' is deprecated, please use 'torch.backends.mkldnn.is_available()'
  torch.has_mkldnn,
/var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:937:17: runtime error: applying non-zero offset 20 to null pointer
    #0 0x7f2025794888 in void at::native::lapackEig<float, float>(char, char, int, float*, int, float*, float*, int, float*, int, float*, int, float*, int*) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x9945888)
    #1 0x7f20257da256 in void at::native::(anonymous namespace)::apply_linalg_eig<float>(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, bool) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x998b256)
    #2 0x7f20257d902d in at::native::(anonymous namespace)::linalg_eig_kernel(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor const&, bool) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x998a02d)
    #3 0x7f20257b5b3d in at::native::linalg_eig_out_info(at::Tensor const&, at::Tensor&, at::Tensor&, at::Tensor&, bool) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x9966b3d)
    #4 0x7f20257b4770 in at::native::linalg_eig_out(at::Tensor const&, at::Tensor&, at::Tensor&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x9965770)
    #5 0x7f20280710e6 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor&, at::Tensor&> (at::Tensor const&, at::Tensor&, at::Tensor&), &(at::(anonymous namespace)::(anonymous namespace)::wrapper_CPU_out_linalg_eig_out(at::Tensor const&, at::Tensor&, at::Tensor&))>, std::tuple<at::Tensor&, at::Tensor&>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor&, at::Tensor&> >, std::tuple<at::Tensor&, at::Tensor&> (at::Tensor const&, at::Tensor&, at::Tensor&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor&, at::Tensor&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xc2220e6)
    #6 0x7f202727a045 in at::_ops::linalg_eig_out::call(at::Tensor const&, at::Tensor&, at::Tensor&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xb42b045)
    #7 0x7f20257b7e29 in at::native::linalg_eig(at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x9968e29)
    #8 0x7f2028070bf0 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor> (at::Tensor const&), &(at::(anonymous namespace)::(anonymous namespace)::wrapper_CPU__linalg_eig(at::Tensor const&))>, std::tuple<at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&> >, std::tuple<at::Tensor, at::Tensor> (at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xc221bf0)
    #9 0x7f2026b1f787 in std::tuple<at::Tensor, at::Tensor> c10::Dispatcher::redispatch<std::tuple<at::Tensor, at::Tensor>, at::Tensor const&>(c10::TypedOperatorHandle<std::tuple<at::Tensor, at::Tensor> (at::Tensor const&)> const&, c10::DispatchKeySet, at::Tensor const&) const (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xacd0787)
    #10 0x7f20273230a7 in at::_ops::linalg_eig::redispatch(c10::DispatchKeySet, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xb4d40a7)
    #11 0x7f202c3cc32d in torch::autograd::VariableType::(anonymous namespace)::linalg_eig(c10::DispatchKeySet, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x1057d32d)
    #12 0x7f202c3cba96 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&), &(torch::autograd::VariableType::(anonymous namespace)::linalg_eig(c10::DispatchKeySet, at::Tensor const&))>, std::tuple<at::Tensor, at::Tensor>, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&> >, std::tuple<at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x1057ca96)
    #13 0x7f20272798e0 in at::_ops::linalg_eig::call(at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xb42a8e0)
    #14 0x7f2043d97ae3 in torch::autograd::THPVariable_linalg_eig(_object*, _object*, _object*) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_python.so+0x23feae3)
    #15 0x5072d6 in cfunction_call /usr/local/src/conda/python-3.9.17/Objects/methodobject.c:543:19
    ...

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:937:17 in
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106354
Approved by: https://github.com/huydhn, https://github.com/lezcano
2023-08-03 05:36:03 +00:00
a1d0db1c60 [pytorch] Fix MSVC unexpected tokens following preprocessor directive (#105922)
Summary:
Fix this warning:
```
caffe2\c10\macros\Macros.h(138): warning C4067: unexpected tokens following preprocessor directive - expected a newline
```
`caffe2/c10/util/variant.h` already has a similar to check and define a stub for `__has_attribute(x)`, so this would not be new to caffe2/pytorch.

Test Plan: CI should complete, still with plenty of caffe2 warnings but this one should be gone from the Windows build log

Differential Revision: D47735319

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105922
Approved by: https://github.com/kit1980
2023-07-27 06:03:31 +00:00
076781ba9b Revert "fix building errors on FreeBSD (#105897)"
This reverts commit 5c5eece6d85d5be3485f96a6da3905f2dd28331b.

Reverted https://github.com/pytorch/pytorch/pull/105897 on behalf of https://github.com/PaliC due to causing regressions on internal models ([comment](https://github.com/pytorch/pytorch/pull/105897#issuecomment-1652840218))
2023-07-27 03:01:44 +00:00
cyy
5c5eece6d8 fix building errors on FreeBSD (#105897)
Although FreeBSD is not officially supported, this PR fixes some errors on FreeBSD.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105897
Approved by: https://github.com/kit1980
2023-07-26 08:11:42 +00:00
cyy
37f7c00a8a More fixes and improved clang-tidy checkers (#93213)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93213
Approved by: https://github.com/Skylion007
2023-02-01 14:44:17 +00:00
cyy
f172feae0d More tidy fixes (#93069)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93069
Approved by: https://github.com/Skylion007
2023-01-27 06:40:50 +00:00
eqy
f8026413f5 Fix CUDA_MAX_THREADS_PER_SM for sm_89 (#91972)
Basically the same as #88644, to fix warnings like `ptxas warning : Value of threads per SM for entry _ZN2at6native13reduce_kernelILi512ELi1ENS0_8ReduceOpIfNS0_10NormTwoffEEjfLi4EEEEEvT1_ is out of range. .minnctapersm will be ignored`

CC @ptrblck @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91972
Approved by: https://github.com/ngimel
2023-01-12 00:30:27 +00:00
bfdc0358dc Compile fix for Clang + libc++ (#91212)
Summary:
LLVM 15 has a compile issue with the deprecated __has_trivial_copy. Update the GCC ifdef logic to exclude Clang + libc++.

```
caffe2/c10/util/Optional.h:536:13: error: builtin __has_trivial_copy is deprecated; use __is_trivially_copyable instead [-Werror,-Wdeprecated-builtins]
            C10_IS_TRIVIALLY_COPYABLE(T) &&
            ^
caffe2/c10/macros/Macros.h:438:38: note: expanded from macro 'C10_IS_TRIVIALLY_COPYABLE'
#define C10_IS_TRIVIALLY_COPYABLE(T) __has_trivial_copy(T)
```

Test Plan: CI

Reviewed By: kit1980

Differential Revision: D42180203

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91212
Approved by: https://github.com/kit1980, https://github.com/soumith
2022-12-21 11:19:58 +00:00
322e4b4c8a set -Wsuggest-override for builds (#89852)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/pytorch/pytorch/pull/89852).
* __->__ #89852
* #89851

set -Wsuggest-override for builds

Summary: This was flagged by a Meta internal build.

Test Plan: Rely on CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89852
Approved by: https://github.com/malfet
2022-12-19 22:08:47 +00:00
3e30a9ea1c Fix CUDA_MAX_THREADS_PER_SM for sm_87 (#88644)
#88326
CC @ngimel @ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88644
Approved by: https://github.com/ngimel
2022-11-08 19:44:23 +00:00
fbd08fb358 Introduce TORCH_DISABLE_GPU_ASSERTS (#84190)
- Asserts for CUDA are enabled by default
- Disabled for ROCm by default by setting `TORCH_DISABLE_GPU_ASSERTS` to `ON`
- Can be enabled for ROCm by setting above variable to`OFF` during build or can be forcefully enabled by setting `ROCM_FORCE_ENABLE_GPU_ASSERTS:BOOL=ON`

This is follow up changes as per comment in PR #81790, comment [link](https://github.com/pytorch/pytorch/pull/81790#issuecomment-1215929021)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84190
Approved by: https://github.com/jeffdaily, https://github.com/malfet
2022-11-04 04:43:05 +00:00
0fa23663cc Revert "Introduce TORCH_DISABLE_GPU_ASSERTS (#84190)"
This reverts commit 1e2c4a6e0e60dda763b53f00f25ee5c1f1e5233d.

Reverted https://github.com/pytorch/pytorch/pull/84190 on behalf of https://github.com/malfet due to Needs internal changes, has to be landed via co-dev
2022-11-02 18:13:37 +00:00
1e2c4a6e0e Introduce TORCH_DISABLE_GPU_ASSERTS (#84190)
- Asserts for CUDA are enabled by default
- Disabled for ROCm by default by setting `TORCH_DISABLE_GPU_ASSERTS` to `ON`
- Can be enabled for ROCm by setting above variable to`OFF` during build or can be forcefully enabled by setting `ROCM_FORCE_ENABLE_GPU_ASSERTS:BOOL=ON`

This is follow up changes as per comment in PR #81790, comment [link](https://github.com/pytorch/pytorch/pull/81790#issuecomment-1215929021)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84190
Approved by: https://github.com/jeffdaily, https://github.com/malfet
2022-11-02 17:41:57 +00:00
9c793b366f Move incorrectly placed closing curly brace of extern "C" block (#87853)
### Bug description
When `__SYCL_DEVICE_ONLY__` is defined, while building PyTorch, the output of the preprocessing step would not have the closing curly brace of the `extern "C"` block, as it has been incorrectly placed. Compilers don't seem to report an error or a warning for a missing closing brace of an `extern "C"` block.

### Impact of the bug
If `c10/macros/Macros.h` would be included in a C++ file, and after the preprocessing stage, if the preprocessed source file would have some templated code after `extern "C" {`, then, after compilation, linking might fail with the error `templates must have c++ linkage`). eg. https://stackoverflow.com/questions/61717819/template-with-c-linkage-error-when-using-template-keyword-in-main-cpp/61717908#61717908 (its answer also has a small snippet of code to reproduce such an issue).

### Solution in this PR
one-liner bug fix that rectifies the placement of closing curly brace (`}`), so that the `extern "C"` block ends properly when `__SYCL_DEVICE_ONLY__` is defined.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87853
Approved by: https://github.com/jgong5, https://github.com/kit1980, https://github.com/malfet
2022-10-28 03:42:20 +00:00
85ffbedfb2 Strip GCC5 stuff from PyTorch (#85914)
[This file](https://github.com/pytorch/pytorch/pull/63208/files) indicates that we don't support anything less than GCC 7.5. Given that, let's remove this GCC 5 stuff.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85914
Approved by: https://github.com/ezyang
2022-10-26 00:07:44 +00:00
ff56f1c30d Define the SYCL device version assertation used in the other backend, like XPU (#84106)
# Motivation:
We need a device version assertation that can be used in SYCL kernel. SYCL_KERNEL_ASSERT will be used in the kernel launched on device XPU.

# Solution:
We add a macro SYCL_KERNEL_ASSERT via __assert_fail declaration for Linux and _wassert declaration for Windows even though  NDEBUG is enabled.

# Additional context:
`__assert_fail` in SYCL kernel
`extern SYCL_EXTERNAL void __assert_fail(const char *expr, const char *file, unsigned int line, const char *func);`
`_wassert` in SYCL kernel
`extern SYCL_EXTERNAL void _wassert(const wchar_t *wexpr, const wchar_t *wfile, unsigned line);`
No additional unit test because this change could not affect PyTorch's functionality. It only affects assertation in kernel on XPU backend. So it is difficult to add ut to test it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84106
Approved by: https://github.com/malfet
2022-09-01 22:22:28 +00:00
a08911400e Use C10_HAS_CPP_ATTRIBUTE to simplify nodiscard definition (#83976)
`C10_HAS_CPP_ATTRIBUTE` only expands to `__has_cpp_attribute` when it
is defined, so we avoid the extra `#if defined(__has_cpp_attribute)`
checks and double-nested `#if`s
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83976
Approved by: https://github.com/albanD
2022-08-26 15:45:47 +00:00
e4af53c1a1 [PyTorch] Remove unused sstream/string includes from c10/macros/Macros.h (#83353)
Nothing in the rest of the header seems to use these.

Differential Revision: [D38672680](https://our.internmc.facebook.com/intern/diff/D38672680/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83353
Approved by: https://github.com/malfet
2022-08-23 16:56:00 +00:00
8473e69684 [ROCm] Fixes the kernel asserts API declaration mismatch error (#81790)
This problem updates the the PR [#73040](https://github.com/pytorch/pytorch/pull/73040)

The compilation error in pyTorch with ROCm is successful with these changes when `NDEBUG` is enabled.

Solution:
For HIP we keep `__device__ __assert_fail()`
and for host side compilation we want to use the `__assert_fail()` from the glibc library.

Tested the code by compiling with below steps
```
python3 tools/amd_build/build_amd.py
python3 setup.py develop --cmake-only
cmake -DHIP_HIPCC_FLAGS_RELEASE="-DNDEBUG" build
cmake --build build
```

The UT test_fixed_cuda_assert_async is still skipped due performance overhead.

cc @jithunnair-amd

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81790
Approved by: https://github.com/shintaro-iwasaki, https://github.com/jeffdaily, https://github.com/malfet
2022-08-16 19:22:31 +00:00
80c4919bec [PyTorch] Stack-allocate boxed args for RecordFunction
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76266

Saving a heap allocation in this path improves performance.

Differential Revision: [D34090699](https://our.internmc.facebook.com/intern/diff/D34090699/)

Approved by: https://github.com/ezyang
2022-05-24 17:06:40 +00:00
2e480fc2db Cleanup ATen-core forward declarations
I noticed that when `SymInt` was introduced, `jit_type_base.h` was
added as an include to the `Operator.h` template which is supposed to
be kept extremely clean and only use forward declarations. Also,
that forward declarations for `OptionalArrayRef` were missing.

So, I've refactored the forward declarations into
`ATen/core/ATen_fwd.h` and cleaned up some of the `c10`
headers that were masking these missing declarations. I've also
re-generated the pre-compiled header so `SymInt` is included.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76576
Approved by: https://github.com/albanD
2022-05-02 14:50:48 +00:00