179 Commits

Author SHA1 Message Date
2b0f9b1f61 Move c10/macros/Macros.h to headeronly (#158365)

Differential Revision: [D78361893](https://our.internmc.facebook.com/intern/diff/D78361893/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158365
Approved by: https://github.com/swolchok
ghstack dependencies: #158358
2025-07-16 18:46:52 +00:00
b40f48d191 Move the rest of c10/macros/Export.h (#158358)
Differential Revision: [D78356975](https://our.internmc.facebook.com/intern/diff/D78356975/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158358
Approved by: https://github.com/swolchok
2025-07-16 18:46:52 +00:00
30587195d3 Migrate c10/macros/cmake_macros.h.in to torch/headeronly (#158035)
Summary: As above; also restructures a number of the build files for clarity

Test Plan:
internal and external CI

Also ran `buck2 build fbcode//caffe2:torch`, which succeeded

Rollback Plan:

Reviewed By: swolchok

Differential Revision: D78016591

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158035
Approved by: https://github.com/swolchok
2025-07-15 19:52:59 +00:00
04bd7e6850 [ROCm] Remove use of warpsize on host-side compilation (#156979)
Changes needed for ROCm7.0:
* `warpSize` is _not_ a compile-time constant on device-side compilation for ROCm anymore
* `warpSize` is _not_ defined on host-side compilation, hence `at::cuda::warp_size()` must be used to query warpsize at runtime
* `C10_WARP_SIZE` is redefined as a compile-time constant, with a reasonable value for device-side compilation but a deliberately unreasonable value of 1 for host-side compilation (see the sketch below)
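
A minimal sketch of the split described in the last bullet, assuming the macro lives in `c10/macros/Macros.h`; the exact guards and per-arch values in the real header may differ:

```
#if defined(USE_ROCM)
#if defined(__HIP_DEVICE_COMPILE__)
#define C10_WARP_SIZE 64 // reasonable compile-time constant for device code
#else
#define C10_WARP_SIZE 1  // deliberately unreasonable: host code must query
                         // at::cuda::warp_size() at runtime instead
#endif
#else
#define C10_WARP_SIZE 32 // CUDA warps are always 32 lanes
#endif
```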

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156979
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-07-01 04:55:31 +00:00
fee2377f9e Reapply D77381084 / #156964: Rename torch::standalone to headeronly (#157251)
Was reverted due to an internal failure, which should now be fixed. I believe Jane wants this reapplied and picked to release, and she's out this week.

Original summary:

headeronly is clearer; let's change the name before anyone depends on standalone

Differential Revision: [D77520173](https://our.internmc.facebook.com/intern/diff/D77520173/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157251
Approved by: https://github.com/janeyx99, https://github.com/Skylion007, https://github.com/desertfire
2025-06-30 23:25:30 +00:00
e290a4c645 Revert "Rename torch::standalone to headeronly (#156964)"
This reverts commit 7e54c02a35b905e758497b856a1953eb009ba836.

Reverted https://github.com/pytorch/pytorch/pull/156964 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/156964#issuecomment-3011136947))
2025-06-27 02:20:33 +00:00
7e54c02a35 Rename torch::standalone to headeronly (#156964)
Summary: headeronly is clearer; let's change the name before anyone depends on standalone

Test Plan:
CI should pass!

Rollback Plan:

Differential Revision: D77381084

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156964
Approved by: https://github.com/swolchok, https://github.com/albanD, https://github.com/desertfire
2025-06-27 01:00:14 +00:00
acaf6ba3c6 Organize BUCK for torch/standalone (#156503)
Summary: Undo the high-level BUCKification in favor of something more organized by moving it to the directory itself

Test Plan:
CI

Rollback Plan:

Reviewed By: swolchok

Differential Revision: D76920013

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156503
Approved by: https://github.com/swolchok
2025-06-25 22:56:15 +00:00
6303cc41b7 [ROCm] support CUDA_KERNEL_ASSERT using abort() (#155262)
We won't have the full message that __assert_fail would provide, but at least we won't silently do nothing.

Fixes #155045.
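
A minimal sketch of the fallback described above (the actual macro in `c10/macros/Macros.h` is more involved; `C10_UNLIKELY` is the existing c10 branch hint):

```
#if defined(USE_ROCM)
#define CUDA_KERNEL_ASSERT(cond)                               \
  do {                                                         \
    if (C10_UNLIKELY(!(cond))) {                               \
      abort(); /* no message, but no longer a silent no-op */  \
    }                                                          \
  } while (0)
#endif
```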

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155262
Approved by: https://github.com/hongxiayang, https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-06-18 23:52:35 +00:00
13044b2b04 Move c10/macros/Export.h to torch/standalone (#154850)
Summary: The goal of this PR and future follow-up PRs is to group a set of header files required by AOTInductor Standalone in a separate directory, ensuring they are implemented in a header-only manner.

Test Plan: CI

Differential Revision: D75756619

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154850
Approved by: https://github.com/janeyx99
2025-06-03 06:18:59 +00:00
c4d1ff02f8 [Lint] Update clang-format to 19.1.4 (#153889)
All changes other than the one to `tools/linter/adapters/s3_init_config.json` are generated by newer clang-format
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153889
Approved by: https://github.com/cyyever, https://github.com/atalman
2025-05-20 14:12:46 +00:00
e2f9759bd0 Fix broken URLs (#152237)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152237
Approved by: https://github.com/huydhn, https://github.com/malfet
2025-04-27 09:56:42 +00:00
cyy
79e8a69257 Enable move warnings for torch targets (#149923)
This PR enables more move warnings for torch targets and fixes some code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149923
Approved by: https://github.com/malfet
2025-03-26 08:38:13 +00:00
93633d0e80 [ROCm][Windows] Fix export macros (#144098)
For correct import and export of functions when dynamic linkage is used for HIP libraries on Windows, the appropriate export/import macros need to be in place. This pull request reuses the existing CUDA import/export macros by converting them to the corresponding HIP macros during the hipification process.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144098
Approved by: https://github.com/jeffdaily
2025-01-04 17:12:46 +00:00
73fde0d940 [PyTorch] Unbreak C10_ALWAYS_INLINE_ATTRIBUTE on MSVC (#139363)
At least one recent version refuses to accept it on a lambda, so disable it there.

Differential Revision: [D65250256](https://our.internmc.facebook.com/intern/diff/D65250256/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D65250256/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139363
Approved by: https://github.com/ngimel, https://github.com/malfet
2024-10-31 07:40:05 +00:00
dbf0fa811a Remove C10_HOST_CONSTEXPR_EXCEPT_WIN_CUDA and CONSTEXPR_EXCEPT_WIN_CUDA (#138479)
BC linter suppressed due to removal of `tools/linter/adapters/constexpr_linter.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138479
Approved by: https://github.com/eqy, https://github.com/malfet
2024-10-24 07:51:05 +00:00
b1b7c714ed Add deprecated C10_UNUSED and C10_NODISCARD macros back (#138398)
For backwards compatibility. Disallow internal use.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138398
Approved by: https://github.com/malfet
2024-10-20 00:21:19 +00:00
fddabc6e0b C10_UNUSED to [[maybe_unused]] (#6357) (#138364)
Summary: Pull Request resolved: https://github.com/pytorch/executorch/pull/6357

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138364
Approved by: https://github.com/Skylion007, https://github.com/eqy
2024-10-19 13:17:43 +00:00
cyy
2f6a70bfea Enable more UBSAN checks (#138288)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138288
Approved by: https://github.com/ezyang
2024-10-19 13:00:26 +00:00
542f7c8383 Eliminate C10_NODISCARD (#138336)
Test Plan: Sandcastle

Reviewed By: swolchok

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138336
Approved by: https://github.com/Skylion007
2024-10-19 02:54:06 +00:00
8dd575faf6 [BE] Modernize C10_UNUSED (#138102)
[`[[maybe_unused]]`](https://en.cppreference.com/w/cpp/language/attributes/maybe_unused) is part of the C++17 standard

Test Plan: Sandcastle

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138102
Approved by: https://github.com/Skylion007, https://github.com/albanD, https://github.com/malfet, https://github.com/eqy
2024-10-18 16:33:01 +00:00
8abbd1c7c7 Modernize C10_NODISCARD to [[nodiscard]] (#138151)
PyTorch is C++17 now.
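Illustrative call sites for this commit and #138102, with hypothetical function names:

```
[[nodiscard]] int compute() { return 42; } // was: C10_NODISCARD int compute();

int main() {
  [[maybe_unused]] int x = compute(); // was: C10_UNUSED int x = compute();
  return 0;
}
```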
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138151
Approved by: https://github.com/Skylion007, https://github.com/albanD
2024-10-17 18:07:39 +00:00
c8a7da305b [PyTorch] Add attribute version of C10_ALWAYS_INLINE (#136445)
Sometimes (such as on a lambda), you need `__attribute__((always_inline))` but not `inline`.
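
A minimal sketch of the attribute-only variant, assuming GCC/Clang-style attributes; the macro body and the lambda below are illustrative, not the actual header:

```
#if defined(__GNUC__) || defined(__clang__)
#define C10_ALWAYS_INLINE_ATTRIBUTE __attribute__((always_inline))
#else
#define C10_ALWAYS_INLINE_ATTRIBUTE
#endif

int main() {
  // `inline` cannot be applied to a lambda, but the attribute can:
  auto square = [](int x) C10_ALWAYS_INLINE_ATTRIBUTE { return x * x; };
  return square(3) == 9 ? 0 : 1;
}
```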

Differential Revision: [D63266917](https://our.internmc.facebook.com/intern/diff/D63266917/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136445
Approved by: https://github.com/malfet
2024-10-03 18:18:37 +00:00
db193d1e29 add msg to _assert_async (#134813)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134813
Approved by: https://github.com/ezyang, https://github.com/eqy, https://github.com/albanD
2024-09-03 06:33:18 +00:00
8629939a51 [torch/c10] Add C10_UBSAN_ENABLED macro and use it to disable SymInt_… (#127967)
Adds `C10_UBSAN_ENABLED` macro and use it to disable `SymIntTest::Overflows` (fails under `signed-integer-overflow` UBSAN check).

Also cleans up the UBSAN guard in `jit/test_misc.cpp` to use `C10_UBSAN_ENABLED` and the existing `C10_ASAN_ENABLED` instead of locally defining `HAS_ASANUBSAN`.

> NOTE: This should fix `SymIntTest::Overflows` failing under ubsan in fbcode too...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127967
Approved by: https://github.com/atalman, https://github.com/d4l3k, https://github.com/malfet
2024-06-14 16:01:12 +00:00
bae3b17fd9 Tweak a comment and fix spelling (#126681)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126681
Approved by: https://github.com/Skylion007
2024-05-21 17:19:06 +00:00
ae9a4fa63c [ROCm] enforce ROCM_VERSION >= 6.0 (#125646)
Remove any code relying on ROCM_VERSION < 6.0.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125646
Approved by: https://github.com/albanD, https://github.com/eqy
2024-05-12 18:01:28 +00:00
cyy
d5d13ab15e Remove C10_FALLTHROUGH (#120157)
`[[fallthrough]]` is supported by our C++17 compilers, and no other repo uses the macro, so it can be removed.
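
Call sites now use the standard attribute directly; a sketch with a hypothetical switch:

```
int classify(int v) {
  switch (v) {
    case 0:
      v += 1;
      [[fallthrough]]; // was: C10_FALLTHROUGH
    case 1:
      return v;
    default:
      return -1;
  }
}
```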
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120157
Approved by: https://github.com/Skylion007
2024-02-21 06:18:58 +00:00
9aa8bbf7f2 [BE] Delete C10_IS_TRIVIALLY_COPYABLE (#120120)
It's not used anywhere in PyTorch now that the custom implementation of `c10::optional` is gone, nor is it used by any other repo; see https://github.com/search?type=code&q=C10_IS_TRIVIALLY_COPYABLE+org%3Apytorch

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120120
Approved by: https://github.com/Skylion007, https://github.com/albanD, https://github.com/huydhn
2024-02-17 01:04:30 +00:00
79811e765c [2/4] Intel GPU Runtime Upstreaming for Device (#116833)
# Motivation
Following [[1/4] Intel GPU Runtime Upstreaming for Device](https://github.com/pytorch/pytorch/pull/116019), as outlined in [[RFC] Intel GPU Runtime Upstreaming](https://github.com/pytorch/pytorch/issues/114842), this second PR covers the changes under `aten`.

# Design
We will compile the code for XPU separately into a library named `libtorch_xpu.so`. Currently, it primarily offers device-related APIs, including
- `getCurrentDeviceProperties`
- `getDeviceProperties`
- `getGlobalIdxFromDevice`
- `getDeviceFromPtr`

# Additional Context
`XPUHooks` is an indispensable part of the runtime. We upstream `XPUHooks` in this PR since it contains some `Device`-related code; we also refine some logic and code to avoid a forward declaration in `DLPack`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116833
Approved by: https://github.com/EikanWang, https://github.com/jgong5, https://github.com/gujinghui, https://github.com/malfet
2024-01-18 05:02:42 +00:00
50049cfaa0 [1/4] Intel GPU Runtime Upstreaming for Device (#116019)
# Motivation
As mentioned in [[RFC] Intel GPU Runtime Upstreaming](https://github.com/pytorch/pytorch/issues/114842), the first runtime component we would like to upstream is `Device`, which contains the device-management functions of the Intel GPU runtime. To facilitate code review, we split the changes into 4 PRs. This is one of the 4 PRs and covers the changes under `c10`.

# Design
An Intel GPU device is a wrapper around a SYCL device on which kernels can be executed. In our design, we maintain a SYCL device pool containing all the GPU devices of the current machine, with PyTorch managing the pool's status; thread-local safety is considered in this design. The C++ files related to `Device` will be placed in the c10/xpu folder, and we provide c10 device runtime APIs such as
  - `c10::xpu::device_count`
  - `c10::xpu::set_device`
  - ...
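
A hedged usage sketch of these c10 XPU APIs; the header path and exact signatures are assumptions based on the c10 CUDA analogues:

```
#include <c10/xpu/XPUFunctions.h>

int main() {
  if (c10::xpu::device_count() > 0) {
    c10::xpu::set_device(0); // select the first Intel GPU
  }
  return 0;
}
```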

# Additional Context
In our plan, 4 PRs should be submitted to PyTorch for `Device`:
1. for c10
2. for aten
3. for python frontend
4. for lazy initialization shared with CUDA

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116019
Approved by: https://github.com/gujinghui, https://github.com/jgong5, https://github.com/EikanWang, https://github.com/malfet
2024-01-12 07:36:25 +00:00
9ac0e6971a Revert "[1/4] Intel GPU Runtime Upstreaming for Device (#116019)"
This reverts commit b4cebe2c34242ceee3a1bc285f426662942a29ac.

Reverted https://github.com/pytorch/pytorch/pull/116019 on behalf of https://github.com/malfet due to Broke internal and periodic buck builds, see https://github.com/pytorch/pytorch/actions/runs/7414664129/job/20176215868 ([comment](https://github.com/pytorch/pytorch/pull/116019#issuecomment-1879030285))
2024-01-05 17:36:39 +00:00
b4cebe2c34 [1/4] Intel GPU Runtime Upstreaming for Device (#116019)
# Motivation
As mentioned in [[RFC] Intel GPU Runtime Upstreaming](https://github.com/pytorch/pytorch/issues/114842), the first runtime component we would like to upstream is `Device`, which contains the device-management functions of the Intel GPU runtime. To facilitate code review, we split the changes into 4 PRs. This is one of the 4 PRs and covers the changes under `c10`.

# Design
An Intel GPU device is a wrapper around a SYCL device on which kernels can be executed. In our design, we maintain a SYCL device pool containing all the GPU devices of the current machine, with PyTorch managing the pool's status; thread-local safety is considered in this design. The C++ files related to `Device` will be placed in the c10/xpu folder, and we provide c10 device runtime APIs such as
  - `c10::xpu::device_count`
  - `c10::xpu::set_device`
  - ...

# Additional Context
In our plan, 4 PRs should be submitted to PyTorch for `Device`:
1. for c10
2. for aten
3. for python frontend
4. for lazy initialization shared with CUDA

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116019
Approved by: https://github.com/gujinghui, https://github.com/jgong5, https://github.com/EikanWang, https://github.com/malfet
2024-01-04 17:35:04 +00:00
53e32d12c4 [c10] Use nested namespace in c10/cuda (#116464)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116464
Approved by: https://github.com/Skylion007
2023-12-27 23:14:00 +00:00
d7caef7996 [CI] Update clang-format (#116002)
To 17.0.6, built using https://github.com/pytorch/test-infra/blob/main/.github/workflows/clang-tidy-linux.yml

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116002
Approved by: https://github.com/suo
2023-12-18 14:58:46 +00:00
66a76516bf [ROCm] Disabling Kernel Asserts for ROCm by default - fix and clean up and refactoring (#114660)
Related to #103973  #110532 #108404 #94891

**Context:**
As commented in 6ae0554d11/cmake/Dependencies.cmake (L1198)
Kernel asserts are enabled by default for CUDA and disabled for ROCm.
However, this was somewhat broken, and kernel asserts were still enabled for ROCm.

Disabling kernel asserts is also needed for users who do not have PCIe atomics support. These community users have verified that disabling kernel asserts on the PyTorch/ROCm platform fixed their PyTorch workflows, such as torch.sum scripts and stable-diffusion (see the related issues).

**Changes:**

This pull request serves the following purposes:
* Refactor and clean up the logic, making it simpler to enable and disable kernel asserts for ROCm
* Fix the bug that kernel asserts for ROCm were not disabled by default.

Specifically,
- Renamed `TORCH_DISABLE_GPU_ASSERTS` to `C10_USE_ROCM_KERNEL_ASSERT` for the following reasons:
(1) This variable only applies to ROCm.
(2) The new name aligns better with the `CUDA_KERNEL_ASSERT` macro.
(3) With `USE_` in front of the name, we can easily control it with an environment variable to turn this feature on and off during the build (e.g. `USE_ROCM_KERNEL_ASSERT=1 python setup.py develop` will enable kernel asserts for a ROCm build).
- Removed `ROCM_FORCE_ENABLE_GPU_ASSERTS` to simplify the logic and make it easier to understand and maintain
- Added `#cmakedefine` to carry over the CMake variable to C++

**Tests:**
(1) Build in default mode and verify that `USE_ROCM_KERNEL_ASSERT` is OFF (0) and kernel asserts are disabled:

```
python setup.py develop
```
Verify CMakeCache.txt has correct value.
```
/xxxx/pytorch/build$ grep USE_ROCM_KERNEL_ASSERT CMakeCache.txt
USE_ROCM_KERNEL_ASSERT:BOOL=0
```
Tested the following code in both ROCm and CUDA builds, expecting different return codes.

```
subprocess.call([sys.executable, '-c', "import torch;torch._assert_async(torch.tensor(0,device='cuda'));torch.cuda.synchronize()"])
```
This piece of code is adapted from the unit test below, to work around the fact that the unit test is currently skipped for ROCm. (We will look into re-enabling it in the future.)

```
python test/test_cuda_expandable_segments.py -k test_fixed_cuda_assert_async
```

Ran the following script, expecting `r == 0` since `CUDA_KERNEL_ASSERT` is defined as a no-op:
```
>>> import sys
>>> import subprocess
>>> r=subprocess.call([sys.executable, '-c', "import torch;torch._assert_async(torch.tensor(0,device='cuda'));torch.cuda.synchronize()"])
>>> r
0
```

(2) Enable kernel asserts by building with `USE_ROCM_KERNEL_ASSERT=1` or `USE_ROCM_KERNEL_ASSERT=ON`:
```
USE_ROCM_KERNEL_ASSERT=1 python setup.py develop
```

Verify `USE_ROCM_KERNEL_ASSERT` is `1`
```
/xxxx/pytorch/build$ grep USE_ROCM_KERNEL_ASSERT CMakeCache.txt
USE_ROCM_KERNEL_ASSERT:BOOL=1
```

Run the assert test, expecting a non-zero return code.

```
>>> import sys
>>> import subprocess
>>> r=subprocess.call([sys.executable, '-c', "import torch;torch._assert_async(torch.tensor(0,device='cuda'));torch.cuda.synchronize()"])
>>>/xxxx/pytorch/aten/src/ATen/native/hip/TensorCompare.hip:108: _assert_async_cuda_kernel: Device-side assertion `input[0] != 0' failed.
:0:rocdevice.cpp            :2690: 2435301199202 us: [pid:206019 tid:0x7f6cf0a77700] Callback: Queue 0x7f64e8400000 aborting with error : HSA_STATUS_ERROR_EXCEPTION: An HSAIL operation resulted in a hardware exception. code: 0x1016

>>> r
-6
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114660
Approved by: https://github.com/jeffdaily, https://github.com/malfet, https://github.com/jithunnair-amd
2023-12-13 15:44:53 +00:00
cyy
e676ec2fe7 Fix undefined __assert_fail on FreeBSD (#111761)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111761
Approved by: https://github.com/Skylion007
2023-10-23 12:46:03 +00:00
97396cdbb2 Fix undefined behavior detected by clang-12 (#106354)
Adding a non-zero offset to a null pointer is undefined behavior and a bad habit.

- When `lapackEig` is called to estimate a workspace size, do not add the matrix size to the W pointer.
- When `unpack_pivots_cpu_kernel` is called with zero `dim_size`, exit early.
- When `topk_impl_loop` is called with `k` equal to zero, exit right away, as the output tensors are empty anyway.
- Ignore adding a non-zero storage offset in `TensorImpl::data_ptr_impl_impl`, which can be the case if a tensor is created as `torch.empty(3)[4:]`.
- In `s_addmm_out_sparse_dense_worker`, do not call `axpy` over an empty vector.
- In `_sparse_binary_op_intersection_kernel_impl`, skip computing `ptr_indices_dim` when `sparse_dim` is empty.
- Exit the `grid_sample` forward/backward kernels early if either `input` or `grid` is an empty tensor.
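
Several of the fixes above share an early-exit pattern; a sketch with hypothetical names:

```
#include <cstdint>

// Return before any pointer arithmetic can touch a possibly-null base pointer.
void topk_like_loop(const float* input, float* out, int64_t k) {
  if (k == 0) {
    return; // outputs are empty; `input`/`out` may be null, so never offset them
  }
  for (int64_t i = 0; i < k; ++i) {
    out[i] = input[i]; // safe: pointers are only used when k > 0
  }
}
```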

Found by UBSan in clang-12.

Before the change, the UBSan report looked as follows:
```
 ASAN_SYMBOLIZER_PATH=/usr/lib/llvm-12/bin/llvm-symbolizer UBSAN_OPTIONS=print_stacktrace=1 LD_PRELOAD=/usr/lib/llvm-12/lib/clang/12.0.1/lib/linux/libclang_rt.asan-x86_64.so python test_fx_experimental.py -v -k test_normalize_operator_exhaustive_linalg_eig_cpu_float32
Test results will be stored in test-reports/python-unittest/test_fx_experimental

Running tests...
----------------------------------------------------------------------
  test_normalize_operator_exhaustive_linalg_eig_cpu_float32 (__main__.TestNormalizeOperatorsCPU) ... /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/overrides.py:111: UserWarning: 'has_cuda' is deprecated, please use 'torch.backends.cuda.is_built()'
  torch.has_cuda,
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/overrides.py:112: UserWarning: 'has_cudnn' is deprecated, please use 'torch.backends.cudnn.is_available()'
  torch.has_cudnn,
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/overrides.py:118: UserWarning: 'has_mps' is deprecated, please use 'torch.backends.mps.is_built()'
  torch.has_mps,
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/overrides.py:119: UserWarning: 'has_mkldnn' is deprecated, please use 'torch.backends.mkldnn.is_available()'
  torch.has_mkldnn,
/var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:937:17: runtime error: applying non-zero offset 20 to null pointer
    #0 0x7f2025794888 in void at::native::lapackEig<float, float>(char, char, int, float*, int, float*, float*, int, float*, int, float*, int, float*, int*) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x9945888)
    #1 0x7f20257da256 in void at::native::(anonymous namespace)::apply_linalg_eig<float>(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, bool) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x998b256)
    #2 0x7f20257d902d in at::native::(anonymous namespace)::linalg_eig_kernel(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor const&, bool) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x998a02d)
    #3 0x7f20257b5b3d in at::native::linalg_eig_out_info(at::Tensor const&, at::Tensor&, at::Tensor&, at::Tensor&, bool) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x9966b3d)
    #4 0x7f20257b4770 in at::native::linalg_eig_out(at::Tensor const&, at::Tensor&, at::Tensor&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x9965770)
    #5 0x7f20280710e6 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor&, at::Tensor&> (at::Tensor const&, at::Tensor&, at::Tensor&), &(at::(anonymous namespace)::(anonymous namespace)::wrapper_CPU_out_linalg_eig_out(at::Tensor const&, at::Tensor&, at::Tensor&))>, std::tuple<at::Tensor&, at::Tensor&>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor&, at::Tensor&> >, std::tuple<at::Tensor&, at::Tensor&> (at::Tensor const&, at::Tensor&, at::Tensor&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor&, at::Tensor&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xc2220e6)
    #6 0x7f202727a045 in at::_ops::linalg_eig_out::call(at::Tensor const&, at::Tensor&, at::Tensor&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xb42b045)
    #7 0x7f20257b7e29 in at::native::linalg_eig(at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x9968e29)
    #8 0x7f2028070bf0 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor> (at::Tensor const&), &(at::(anonymous namespace)::(anonymous namespace)::wrapper_CPU__linalg_eig(at::Tensor const&))>, std::tuple<at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&> >, std::tuple<at::Tensor, at::Tensor> (at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xc221bf0)
    #9 0x7f2026b1f787 in std::tuple<at::Tensor, at::Tensor> c10::Dispatcher::redispatch<std::tuple<at::Tensor, at::Tensor>, at::Tensor const&>(c10::TypedOperatorHandle<std::tuple<at::Tensor, at::Tensor> (at::Tensor const&)> const&, c10::DispatchKeySet, at::Tensor const&) const (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xacd0787)
    #10 0x7f20273230a7 in at::_ops::linalg_eig::redispatch(c10::DispatchKeySet, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xb4d40a7)
    #11 0x7f202c3cc32d in torch::autograd::VariableType::(anonymous namespace)::linalg_eig(c10::DispatchKeySet, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x1057d32d)
    #12 0x7f202c3cba96 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&), &(torch::autograd::VariableType::(anonymous namespace)::linalg_eig(c10::DispatchKeySet, at::Tensor const&))>, std::tuple<at::Tensor, at::Tensor>, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&> >, std::tuple<at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x1057ca96)
    #13 0x7f20272798e0 in at::_ops::linalg_eig::call(at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xb42a8e0)
    #14 0x7f2043d97ae3 in torch::autograd::THPVariable_linalg_eig(_object*, _object*, _object*) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_python.so+0x23feae3)
    #15 0x5072d6 in cfunction_call /usr/local/src/conda/python-3.9.17/Objects/methodobject.c:543:19
    ...

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:937:17 in
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106354
Approved by: https://github.com/huydhn, https://github.com/lezcano
2023-08-03 05:36:03 +00:00
a1d0db1c60 [pytorch] Fix MSVC unexpected tokens following preprocessor directive (#105922)
Summary:
Fix this warning:
```
caffe2\c10\macros\Macros.h(138): warning C4067: unexpected tokens following preprocessor directive - expected a newline
```
`caffe2/c10/util/variant.h` already has a similar check that defines a stub for `__has_attribute(x)`, so this would not be new to caffe2/pytorch.
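
The stub in question is the usual idiom; a sketch (exact placement in Macros.h may differ):

```
// Define __has_attribute(x) to 0 for compilers (e.g. MSVC) that lack it,
// so later `#if __has_attribute(...)` checks preprocess cleanly.
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif
```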

Test Plan: CI should complete, still with plenty of caffe2 warnings, but this one should be gone from the Windows build log

Differential Revision: D47735319

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105922
Approved by: https://github.com/kit1980
2023-07-27 06:03:31 +00:00
076781ba9b Revert "fix building errors on FreeBSD (#105897)"
This reverts commit 5c5eece6d85d5be3485f96a6da3905f2dd28331b.

Reverted https://github.com/pytorch/pytorch/pull/105897 on behalf of https://github.com/PaliC due to causing regressions on internal models ([comment](https://github.com/pytorch/pytorch/pull/105897#issuecomment-1652840218))
2023-07-27 03:01:44 +00:00
cyy
5c5eece6d8 fix building errors on FreeBSD (#105897)
Although FreeBSD is not officially supported, this PR fixes some build errors on FreeBSD.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105897
Approved by: https://github.com/kit1980
2023-07-26 08:11:42 +00:00
2ac9086987 run buildifier on unified build files (#98141)
This is pretty tricky. buildifier by default doesn't do much to these
files. It does a little more if you tell it that they are
`BUILD.bazel` files with -type=build. But it can do even more if you
remove the target definitions from the `def define_rules()` wrapper
and dedent them.

I wrote a little wrapper that does that. I'll submit it at a later
date.

Differential Revision: [D44606558](https://our.internmc.facebook.com/intern/diff/D44606558/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D44606558/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98141
Approved by: https://github.com/ezyang, https://github.com/PaliC
2023-04-04 00:37:19 +00:00
cyy
37f7c00a8a More fixes and improved clang-tidy checkers (#93213)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93213
Approved by: https://github.com/Skylion007
2023-02-01 14:44:17 +00:00
cyy
f172feae0d More tidy fixes (#93069)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93069
Approved by: https://github.com/Skylion007
2023-01-27 06:40:50 +00:00
eqy
f8026413f5 Fix CUDA_MAX_THREADS_PER_SM for sm_89 (#91972)
Basically the same as #88644, to fix warnings like `ptxas warning : Value of threads per SM for entry _ZN2at6native13reduce_kernelILi512ELi1ENS0_8ReduceOpIfNS0_10NormTwoffEEjfLi4EEEEEvT1_ is out of range. .minnctapersm will be ignored`
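
A minimal sketch of the fix, assuming the macro lives in `c10/macros/Macros.h` (the real definition covers more architectures):

```
// Ada (sm_89) allows at most 1536 resident threads per SM, not 2048.
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ == 890)
#define CUDA_MAX_THREADS_PER_SM 1536
#endif
```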

CC @ptrblck @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91972
Approved by: https://github.com/ngimel
2023-01-12 00:30:27 +00:00
bfdc0358dc Compile fix for Clang + libc++ (#91212)
Summary:
LLVM 15 has a compile issue with the deprecated `__has_trivial_copy`. Update the GCC ifdef logic to exclude Clang + libc++.

```
caffe2/c10/util/Optional.h:536:13: error: builtin __has_trivial_copy is deprecated; use __is_trivially_copyable instead [-Werror,-Wdeprecated-builtins]
            C10_IS_TRIVIALLY_COPYABLE(T) &&
            ^
caffe2/c10/macros/Macros.h:438:38: note: expanded from macro 'C10_IS_TRIVIALLY_COPYABLE'
#define C10_IS_TRIVIALLY_COPYABLE(T) __has_trivial_copy(T)
```
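
A hedged sketch of the guard described in the summary; the actual `#ifdef` chain in `caffe2/c10/macros/Macros.h` differs in detail:

```
#if defined(__clang__) && defined(_LIBCPP_VERSION)
// Clang + libc++: the GCC builtin is deprecated, use the modern one.
#define C10_IS_TRIVIALLY_COPYABLE(T) __is_trivially_copyable(T)
#elif defined(__GNUC__)
#define C10_IS_TRIVIALLY_COPYABLE(T) __has_trivial_copy(T)
#else
// Fallback; requires <type_traits> at the use site.
#define C10_IS_TRIVIALLY_COPYABLE(T) std::is_trivially_copyable<T>::value
#endif
```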

Test Plan: CI

Reviewed By: kit1980

Differential Revision: D42180203

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91212
Approved by: https://github.com/kit1980, https://github.com/soumith
2022-12-21 11:19:58 +00:00
322e4b4c8a set -Wsuggest-override for builds (#89852)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/pytorch/pytorch/pull/89852).
* __->__ #89852
* #89851

set -Wsuggest-override for builds

Summary: This was flagged by a Meta internal build.

Test Plan: Rely on CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89852
Approved by: https://github.com/malfet
2022-12-19 22:08:47 +00:00
5654fed23e Export c10/[macros|util] headers to be used by internal inductor builds (#89249)
Summary: Fixes a package-boundary violation that existed in the previous implementation

Test Plan: CI

Differential Revision: D41391862

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89249
Approved by: https://github.com/izaitsevfb
2022-11-18 10:51:07 +00:00
3e30a9ea1c Fix CUDA_MAX_THREADS_PER_SM for sm_87 (#88644)
#88326
CC @ngimel @ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88644
Approved by: https://github.com/ngimel
2022-11-08 19:44:23 +00:00
fbd08fb358 Introduce TORCH_DISABLE_GPU_ASSERTS (#84190)
- Asserts for CUDA are enabled by default
- Disabled for ROCm by default by setting `TORCH_DISABLE_GPU_ASSERTS` to `ON`
- Can be enabled for ROCm by setting the above variable to `OFF` during the build, or forcefully enabled by setting `ROCM_FORCE_ENABLE_GPU_ASSERTS:BOOL=ON`

This is a follow-up change per a comment in PR #81790 ([link](https://github.com/pytorch/pytorch/pull/81790#issuecomment-1215929021)).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84190
Approved by: https://github.com/jeffdaily, https://github.com/malfet
2022-11-04 04:43:05 +00:00