<!--
copilot:summary
-->
### <samp>🤖 Generated by Copilot at 4f0b524</samp>
This pull request updates the codebase and the documentation to use C++17 instead of C++14 as the minimum required C++ standard. This affects the `ATen`, `c10`, and `torch` libraries and their dependencies, as well as the CI system and the `conda` package metadata.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100557
Approved by: https://github.com/malfet
This diff locks in C++17 as the minimum standard with which PyTorch can be compiled.
This makes it possible to use all C++17 features in PyTorch.
This breaks backward compatibility in the sense that users with older compilers may find that their compilers are no longer sufficient for the job.
Summary: #buildmore
Differential Revision: D44356879
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98209
Approved by: https://github.com/ezyang, https://github.com/malfet, https://github.com/PaliC
… as equivalent replacements for std::is_pod and std::is_pod_v because they are deprecated in C++20.
When consuming libtorch header files in a project that uses C++20, there are warnings about std::is_pod being deprecated. This patch fixes that issue.
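A minimal sketch of the kind of replacement described (assuming the trait combination the standard names as equivalent to POD-ness; the actual helper name used in c10 may differ):
```cpp
#include <type_traits>

// Hypothetical stand-in for the deprecated std::is_pod: the standard defines
// a POD type as one that is both trivial and standard-layout.
template <typename T>
struct is_pod : std::integral_constant<bool,
                    std::is_standard_layout<T>::value &&
                    std::is_trivial<T>::value> {};

template <typename T>
constexpr bool is_pod_v = is_pod<T>::value;
```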
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88918
Approved by: https://github.com/ezyang
… all instances of std::result_of and std::result_of_t are conditionally replaced by std::invoke_result and std::invoke_result_t if __cpp_lib_is_invocable >= 201703L. std::invoke_result was only introduced in C++17, so it should probably not be required yet.
Fixes #71657 and a small part of #69290
Tested on CentOS 7 / gcc11 + a private project that requires C++20.
I think the main questions to check by a maintainer are,
- whether my choices of preprocessor blocks are appropriate
- whether there are any very subtle differences between std::result_of and std::invoke_result that I have missed
- whether in any of the replacements the 'new' side can/should be simplified further
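A sketch of the conditional replacement described above (the alias name and its location are illustrative, not the PR's exact code):
```cpp
#include <type_traits>

#if defined(__cpp_lib_is_invocable) && __cpp_lib_is_invocable >= 201703L
// C++17 and later: use std::invoke_result.
template <typename F, typename... Args>
using invoke_result_t = std::invoke_result_t<F, Args...>;
#else
// Pre-C++17 fallback: std::result_of is still available here.
template <typename F, typename... Args>
using invoke_result_t = typename std::result_of<F(Args...)>::type;
#endif
```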
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79985
Approved by: https://github.com/ezyang
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65610
- Replace HIP_PLATFORM_HCC with USE_ROCM
- Don't rely on CUDA_VERSION or HIP_VERSION; use USE_ROCM and ROCM_VERSION instead.
- In the next PR:
  - Remove the mapping from CUDA_VERSION to HIP_VERSION and from CUDA to HIP in hipify.
  - HIP_PLATFORM_HCC is deprecated, so add HIP_PLATFORM_AMD to support HIP host code compilation on gcc.
cc jeffdaily sunway513 jithunnair-amd ROCmSupport amathews-amd
Reviewed By: jbschlosser
Differential Revision: D30909053
Pulled By: ezyang
fbshipit-source-id: 224a966ebf1aaec79beccbbd686fdf3d49267e06
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56830
Opt into formatting on GitHub and format everything. This is a trial run before turning on formatting for more and eventually all of the codebase.
Test Plan: CI
Reviewed By: zertosh
Differential Revision: D27979080
fbshipit-source-id: a80f0c48691c08ae8ca0af06377b87e6a2351151
Summary:
This PR aims to reduce the import overhead and symbol noise from the `windows.h` headers.
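One common way to cut down what `windows.h` drags in, shown here as an assumed approach rather than the exact change in this PR:
```cpp
// Trim windows.h before including it: WIN32_LEAN_AND_MEAN excludes
// rarely-used subsystems (e.g. sockets, crypto), and NOMINMAX stops
// windows.h from defining min/max macros that collide with std::min/std::max.
#ifndef WIN32_LEAN_AND_MEAN
#define WIN32_LEAN_AND_MEAN
#endif
#ifndef NOMINMAX
#define NOMINMAX
#endif
#include <windows.h>
```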
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48009
Reviewed By: gchanan
Differential Revision: D25045840
Pulled By: ezyang
fbshipit-source-id: 01fda70f433ba2dd0cd2d7cd676ab6ffe9d98b90
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30922
A new C++14 feature we can use now.
ghstack-source-id: 103767403
Test Plan: waitforsandcastle
Differential Revision: D18869644
fbshipit-source-id: 54541c8004b2116386668a31eb9b0410a603b7dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38154
This should give better error messages and shorter stack traces on C++17 builds (e.g. fbcode)
ghstack-source-id: 103775564
Test Plan: waitforsandcastle
Differential Revision: D21483327
fbshipit-source-id: 184d1f9c0543bf43dc9713fa97fcc5955e7be319
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31091
This implements a C++17 "if constexpr"-like feature for C++14.
This can be used, for example, to replace SFINAE or to force the compiler to remove some parts of a function in the assembly based on a condition.
PRs stacked on top will use this to simplify some of our template metaprogramming.
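A minimal sketch of how such a feature can be emulated in C++14 (illustrative only; the actual c10 implementation may differ). The key trick is that the branch bodies are generic lambdas, so the untaken branch is never instantiated:
```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

namespace sketch {

// Identity functor handed to the callbacks: it makes their bodies dependent
// on a template parameter, so only the invoked branch is ever instantiated.
struct Identity {
  template <class T>
  T&& operator()(T&& x) const { return std::forward<T>(x); }
};

template <class Then, class Else>
decltype(auto) if_constexpr_impl(std::true_type, Then&& then, Else&&) {
  return std::forward<Then>(then)(Identity{});
}

template <class Then, class Else>
decltype(auto) if_constexpr_impl(std::false_type, Then&&, Else&& else_) {
  return std::forward<Else>(else_)(Identity{});
}

// C++14 stand-in for "if constexpr": tag-dispatch on the compile-time bool.
template <bool Condition, class Then, class Else>
decltype(auto) if_constexpr(Then&& then, Else&& else_) {
  return if_constexpr_impl(std::integral_constant<bool, Condition>{},
                           std::forward<Then>(then),
                           std::forward<Else>(else_));
}

// Example: container.size() is only instantiated when Condition is true,
// so calling size_or_zero(42) compiles even though int has no .size().
template <class T>
std::size_t size_or_zero(const T& container) {
  return if_constexpr<std::is_class<T>::value>(
      [&](auto identity) { return identity(container).size(); },
      [&](auto) -> std::size_t { return 0; });
}

} // namespace sketch
```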
ghstack-source-id: 102867141
Test Plan: unit tests
Differential Revision: D18927220
fbshipit-source-id: 19a135e00af6ebb0139ce3730353762d4512158f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33732
Move and forward arguments instead of copying them.
Benchmarks:
A microbenchmark calling the add operation on two tensors in a tight loop shows a 5% improvement in performance.
No visible change for a model like resnet that does more work in its kernels.
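As a rough illustration of the pattern (hypothetical names; the actual call sites are in the dispatcher):
```cpp
#include <utility>

// Hypothetical before/after: instead of copying arguments into the kernel
// call, perfectly forward them so rvalues are moved rather than copied.
//   return kernel(args...);                      // copies
//   return kernel(std::forward<Args>(args)...);  // moves where possible
template <class Kernel, class... Args>
decltype(auto) call_kernel(Kernel&& kernel, Args&&... args) {
  return std::forward<Kernel>(kernel)(std::forward<Args>(args)...);
}
```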
ghstack-source-id: 99161486
Test Plan: benchmarks
Differential Revision: D20082642
fbshipit-source-id: eeac59686f8621dd5eaa85d61e6d219bba48c847
Summary:
…have different argument types"
This reverts commit 05fb160048b71c1b8b00d2083a08618318158c1a.
Please go to https://github.com/pytorch/pytorch/pull/33558 and check the CUDA 9 results on CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33553
Differential Revision: D20017575
Pulled By: ngimel
fbshipit-source-id: a5fd78eea00c7b0925ab21fd90a7daeb66725f1a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31351
Clang 4 needs the c10:: namespace specifier on fully_qualified_type_name_impl() to work correctly.
Also, let's add an error message for people using Clang 3 and earlier; we don't support those compilers anymore, but before this PR they got a cryptic failure instead of a clear message.
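For illustration, the two changes look roughly like this (simplified sketch; the error wording is an assumption):
```cpp
// Inside namespace c10, the call is now spelled with its namespace so
// Clang 4 resolves it correctly:
//   return fully_qualified_type_name_impl<T>();       // misresolved by Clang 4
//   return c10::fully_qualified_type_name_impl<T>();  // works everywhere

// And an explicit error for compilers we no longer support:
#if defined(__clang__) && __clang_major__ < 4
#error "PyTorch requires Clang 4 or later."
#endif
```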
ghstack-source-id: 96380163
Test Plan: testinprod
Differential Revision: D19135587
fbshipit-source-id: c206b56240b36e5c207fb2b69c389bb39f1e62aa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30915
Since we now have C++14, we don't need these c10::guts helpers anymore
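For example (illustrative mappings; exact helper names may vary), C++14 lets call sites use the standard library versions directly:
```cpp
#include <memory>
#include <type_traits>

// Hypothetical before/after of removing c10::guts shims now that C++14
// provides these in the standard library:
//   c10::guts::make_unique<T>(args...)  ->  std::make_unique<T>(args...)
//   c10::guts::enable_if_t<B, T>        ->  std::enable_if_t<B, T>
auto p = std::make_unique<int>(42);  // previously via a guts helper (assumed)
```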
ghstack-source-id: 95777609
Test Plan: waitforsandcastle
Differential Revision: D18869639
fbshipit-source-id: 97716f932297c64c6e814410ac47b444c33d4e2e
Summary:
In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for complex numbers is here: [pytorch-cpu-strided-complex extension](https://gitlab.com/pytorch-complex/pytorch-cpu-strided-complex)
Changes so far:
- [x] Renamed references to variable "I" that may be confused with the "I" defined in complex.h. I did this to avoid confusing CI failure messages, as complex.h is included by more source files (see the illustration after this list).
  - aten/src/ATen/native/cpu/Loops.h (Renamed I to INDEX)
  - aten/src/ATen/native/cuda/Loops.cuh (Renamed I to INDEX)
  - aten/src/ATen/core/ivalue_inl.h (Renamed I to INDEX)
  - c10/util/Array.h (Renamed I to INDEX)
  - c10/util/C++17.h (Renamed I to INDEX)
  - c10/util/Metaprogramming.h (Renamed I to INDEX)
  - c10/util/SmallVector.h (custom renaming)
- [x] Added complex support for linear algebra ops.
  - SVD needed to be modified to support mixed data types
    - Example: U (std::complex<double>), S (double), V (std::complex<double>)
  - See the before and after benchmarks below (no observable change in performance).
- [x] Added complex support for reduce ops.
  - var/std computations could have been faster if it were possible to interpret a std::complex<double> Tensor as a double Tensor.
- [x] Added complex derivative support for autograd functionality.
  - Derivatives are the same as those defined by the numpy-based autograd library for real(), imag(), conj(), angle(). These functions only affect complex numbers.
  - The derivative of abs() has not been modified, so as not to interfere with existing code.
  - Autograd defines abs() for complex numbers and fabs() for real numbers. I will look into this further down the road.
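To illustrate why the "I" rename mentioned above matters (a minimal reproduction, not code from the PR):
```cpp
#include <cstddef>

// On toolchains where the C header <complex.h> gets pulled in, it defines a
// macro named I (the imaginary unit), e.g.:
//   #define I _Complex_I
// Any template parameter or variable spelled I is then macro-expanded:
//   template <std::size_t I> struct Get {};  // breaks once I is a macro
// Renaming sidesteps the collision:
template <std::size_t INDEX>
struct Get {
  static constexpr std::size_t value = INDEX;
};
```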
PyTorch/Caffe2 operator micro-benchmarks before the changes:
```
Tag : short
Benchmarking PyTorch: svd
Mode: Eager
Name: svd_M512_N512
Input: M: 512, N: 512
Forward Execution Time (us) : 162339.425
Forward Execution Time (us) : 162517.479
Forward Execution Time (us) : 162847.775
```
PyTorch/Caffe2 operator micro-benchmarks after the changes:
```
Tag : short
Benchmarking PyTorch: svd
Mode: Eager
Name: svd_M512_N512
Input: M: 512, N: 512
Forward Execution Time (us) : 162032.117
Forward Execution Time (us) : 161943.484
Forward Execution Time (us) : 162513.786
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27653
Differential Revision: D17907886
Pulled By: ezyang
fbshipit-source-id: a88b6d0427591ec1fba09e97c880f535c5d0e513
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26616
Implement C++17 std::string_view for C++11.
This is useful for compile-time type name retrieval, which I'm going to stack on top of this.
It is also useful for replacing `const std::string&` throughout our codebase.
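A minimal sketch of the shape of such a type (illustrative; the real c10 implementation is far more complete):
```cpp
#include <cstddef>
#include <cstring>

// Bare-bones constexpr string view for pre-C++17 code: a pointer plus a
// length, with no ownership and no allocation.
class string_view {
 public:
  constexpr string_view(const char* data, std::size_t size)
      : data_(data), size_(size) {}
  string_view(const char* data)  // from a NUL-terminated string
      : data_(data), size_(std::strlen(data)) {}
  constexpr const char* data() const { return data_; }
  constexpr std::size_t size() const { return size_; }
  constexpr char operator[](std::size_t i) const { return data_[i]; }

 private:
  const char* data_;
  std::size_t size_;
};
```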
ghstack-source-id: 92100314
Test Plan: unit tests
Differential Revision: D17518992
fbshipit-source-id: 48e31c677d51b0041f4b37e89a92bd176d4a0b08
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28098
Make sure that we're building with GCC 5 everywhere
ghstack-source-id: 92013998
Test Plan: waitforsandcastle
Differential Revision: D17953640
fbshipit-source-id: 26d978c60fc973c787383297d730b45d40fa300b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26869
Having a lot of shared_ptr<Functor> instances cost us ~1.1MB of binary size in libtorch.so.
This PR fixes that.
ghstack-source-id: 90842812
Test Plan: measure libtorch.so size
Differential Revision: D17595674
fbshipit-source-id: 05151047ee8e85c05205b7510a33915ba98bab58
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26757
This doesn't switch any open source builds or CI.
The internal fbcode build has been C++17 for quite some time already, but in CUDA code, we had it restricted to C++11.
This diff changes that to C++14.
Because this doesn't change anything open source, the risk of this is low.
ghstack-source-id: 90728524
Test Plan: waitforsandcastle
Differential Revision: D17558142
fbshipit-source-id: 9cfd47e38e71d5a2fdae2f535c01f281bf007d9a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23888
This is an alternative to https://github.com/pytorch/pytorch/pull/23684.
Instead of splitting a bunch of headers into declaration and definition, we change tensor includes to only include the tensor declaration when the tensor definition isn't needed.
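The idea, sketched with hypothetical header names (the actual split and paths may differ):
```cpp
// TensorFwd.h (hypothetical name): declaration only, cheap to include.
namespace at {
class Tensor;
}

// Headers that only pass Tensor around need just the declaration:
namespace at {
Tensor* pick(Tensor* a, Tensor* b, bool first);  // fine without the definition
}

// Only files that call methods on Tensor include the full definition,
// e.g. (path assumed):
//   #include <ATen/core/Tensor.h>
```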
ghstack-source-id: 89357687
Test Plan: waitforsandcastle
Differential Revision: D16673569
fbshipit-source-id: fa1d92809b05de7910a8c2dc2f55abe071ca63bf
Summary:
I have some test code in there as well, along with a script "test_libtorch" to run it. You'll need to modify `test_libtorch` to point to where you have `pytorch` built. I currently require that `pybind11` is included as a subdirectory of the test, but added it to the `.gitignore` to make this reviewable.
Currently, something like this works:
```cpp
#include <cstdint>
#include <iostream>

struct Foo {
  int x, y;
  Foo() : x(2), y(5) {}
  Foo(int x_, int y_) : x(x_), y(y_) {}
  void display() {
    std::cout << "x: " << x << ' ' << "y: " << y << std::endl;
  }
  int64_t add(int64_t z) {
    return (x + y) * z;
  }
};

static auto test = torch::jit::class_<Foo>("Foo")
    .def(torch::jit::init<int64_t, int64_t>())
    .def("display", &Foo::display)
    .def("add", &Foo::add)
    .def("combine", &Foo::combine);  // combine is discussed in the issues below
```
with
```py
@torch.jit.script
def f(x):
    val = torch._C.Foo(5, 3)
    val.display()
    print(val.add(3))
```
results in
```
x: 5 y: 3
24
```
Current issues:
- [x] The Python class created by TorchScript doesn't interact properly with the surrounding code.
```py
@torch.jit.script
def f(x):
    val = torch._C.Foo(5, 3)
    return val
```
- [x] Doesn't properly take in non-pointer classes. Can't define this function signature in cpp (we don't want to support this, I believe):
```cpp
void combine(Foo x) {
```
- [x] Has some issues with memory for blobs when constructing multiple objects (fix constant propagation pass to not treat capsules as the same object).
```py
@torch.jit.script
def f(x):
    val = torch._C.Foo(5, 3)
    val2 = torch._C.Foo(100, 0)
    val.display()
    print(val.add(3))
```
- [ ] Can't define multiple constructors (need to define an overload string; currently not possible since we don't support overloaded methods).
- [x] `init` is a little bit different syntax than `pybind`. `.init<...>()` instead of `.def(py::init<>())`
- [x] I couldn't figure out how to add some files into the build so they'd be copied to the `include/` directories, so I symlinked them manually.
- [ ] Currently, the conversion from Python into Torchscript doesn't work.
- [ ] Torchbind also currently requires a Python/Pybind dependency. Fixing this would probably involve some kind of macro to bind into Python when possible.
- [ ] We pass back into Python by value, currently. There's no way of passing by reference.
- [x] Currently can only register one method with the same type signature. This is because we create a `static auto opRegistry`, and the function is templated on the type signature.
Somewhat blocked on https://github.com/pytorch/pytorch/pull/21177. We currently use some structures that will be refactored by that PR (namely `return_type_to_ivalue` and `ivalue_to_arg_type`).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21098
Differential Revision: D16634872
Pulled By: Chillee
fbshipit-source-id: 1408bb89ea649c27d560df59e2cf9920467fe1de
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20773
This removes the feature to register fallback kernels that are called when no other kernel matches.
Instead, we introduce the concept of catchall kernels that are always called independent of inputs.
If you only have a fallback/catchall kernel and no kernels with concrete dispatch keys, then both concepts behave in the same way.
The difference is that we now disallow operators to have both a catchall kernel and kernels with concrete dispatch keys.
This was possible before when they were fallback kernels.
The reason for this change is that we anticipate needing a method_missing feature in backends, i.e. a backend-wide fallback to call when the backend doesn't specify a kernel for an operator.
We are not clear on the precedence between this backend-wide fallback and an operator-level fallback. Disallow fallbacks for now so we are free to choose later without breaking backwards compatibility.
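For illustration, a catchall registration in the c10 API of that era looked roughly like this (a sketch; the operator name is made up, and the option spelling is an assumption if it differs from the actual API):
```cpp
#include <ATen/core/op_registration/op_registration.h>

// A kernel functor in the c10 registration style.
struct IdentityKernel final : c10::OperatorKernel {
  at::Tensor operator()(const at::Tensor& input) {
    return input;
  }
};

// A catchall kernel runs for every call, independent of the inputs' dispatch
// keys; after this change it cannot coexist with per-dispatch-key kernels
// for the same operator.
static auto registry = c10::RegisterOperators().op(
    "myops::identity(Tensor input) -> Tensor",
    c10::RegisterOperators::options().catchAllKernel<IdentityKernel>());
```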
Reviewed By: dzhulgakov
Differential Revision: D15438977
fbshipit-source-id: cb3aa764a1659d909ee21a7bd8ec3d32438aafaa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19281
String<->Number conversions aren't available in the STL used in our Android environment.
This diff adds workarounds for that so that the function schema parser can be compiled for android
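A typical workaround of this kind (a sketch, not necessarily the exact one added here) routes the conversions through facilities the Android STL does provide:
```cpp
#include <cstdlib>
#include <sstream>
#include <string>

// to_string replacement built on string streams.
template <class T>
std::string to_string_compat(T value) {
  std::ostringstream stream;
  stream << value;
  return stream.str();
}

// stod replacement built on the C library.
inline double stod_compat(const std::string& s) {
  return std::strtod(s.c_str(), /*str_end=*/nullptr);
}
```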
Reviewed By: dzhulgakov
Differential Revision: D14931649
fbshipit-source-id: d5d386f2c474d3742ed89e52dff751513142efad
Summary:
Define `AT_CPP14_CONSTEXPR` as empty instead of `constexpr` on Windows with CUDA >= 9.2 as a workaround.
Discussed in #18425.
When using CUDA 10.1 on Windows, I faced the following errors:
~~~
D:/data/source/pytorch\c10/util/ArrayRef.h(144): error: variable in constexpr function does not have automatic storage duration
detected during instantiation of "const T &c10::ArrayRef<T>::front() const [with T=at::Tensor]"
D:/data/source/pytorch/aten/src\ATen/DeviceGuard.h(30): here
~~~
According to the documentation of CUDA Toolkit v10.1.105, the compiler supports `constexpr` with relaxed requirements (in C++14), but compilation failed.
I suspect this could be a compiler bug that requires this workaround.
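The resulting macro looks roughly like this (the exact condition is an assumption based on the described behavior; CUDA_VERSION comes from cuda.h, where 9020 means CUDA 9.2):
```cpp
// constexpr everywhere except MSVC + CUDA >= 9.2, where nvcc rejects
// otherwise-valid relaxed-constexpr code (see the error above).
#if defined(_MSC_VER) && defined(__CUDACC__) && CUDA_VERSION >= 9020
#define AT_CPP14_CONSTEXPR
#else
#define AT_CPP14_CONSTEXPR constexpr
#endif
```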
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18986
Differential Revision: D14821836
Pulled By: ezyang
fbshipit-source-id: 9800da2fe7291e7c09e8e5e882adebab08d83ae3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18256
This diff infers the function schema from the kernel function/functor and checks that it matches the specified function schema.
This diff does not yet allow omitting the function schema in the registration API. That will come in a future diff.
Reviewed By: dzhulgakov
Differential Revision: D14552738
fbshipit-source-id: 00202b489ede19f26ae686c97416b38c72c11532
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18162
- Adds the API to register functor- and function-based kernels.
- Change the experimental c10 ops to use this new API instead of the old one
- Deletes the old APIs in KernelRegistration.h and OpSchemaRegistration.h
Reviewed By: dzhulgakov
Differential Revision: D14514239
fbshipit-source-id: 35b2f6e8f62964e54886450a6a5fac812ed20f26
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18159
In some instances, the call to forward could clash with std::forward. Fully qualify it to make sure it gets the right one
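A small illustration of the kind of ambiguity (hypothetical names):
```cpp
#include <string>
#include <type_traits>
#include <utility>

namespace mylib {

// A local function named forward (stand-in for the clashing one).
template <class T>
constexpr T&& forward(std::remove_reference_t<T>& t) noexcept {
  return static_cast<T&&>(t);
}

template <class T>
void call(T&& t) {
  // Unqualified and with explicit template arguments, argument-dependent
  // lookup on std:: argument types (e.g. std::string) also finds
  // std::forward, making the call ambiguous:
  //   forward<T>(t);       // error: ambiguous
  mylib::forward<T>(t);     // fully qualified: unambiguous
}

} // namespace mylib
```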
Reviewed By: ezyang
Differential Revision: D14512189
fbshipit-source-id: 6242607dbe54fcdb93229c1a4aaee8b84a88caa1