caffe2/c10/core/TensorImpl.h: adapt to clang 12 (#70973)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70973

clang12 builds fail like this:

caffe2/c10/core/TensorImpl.h:2615:1: error: static_assert failed due to requirement 'sizeof(void *) != sizeof(long) || sizeof(c10::TensorImpl) == sizeof(long) * 24' "You changed the size of TensorImpl on 64-bit arch. See Note [TensorImpl size constraints] on how to proceed."

Yet eliciting the size of that struct with this one-line addition:

char (*__show_sizeof)[sizeof( TensorImpl )] = 1;

reports that its size is indeed 192 (i.e., 8 * 24):

caffe2/c10/core/TensorImpl.h:2615:8: error: cannot initialize a variable of type 'char (*)[192]' with an rvalue of type 'int'

On closer inspection we determined that the failures were occurring because TensorImpl was sometimes of size 208 and other times of size 192. The 192 size was expected, and TensorImpl was hard-coded to raise an error in any other case on a 64-bit system, including the one we found, where the size was 208.

Further investigation revealed that systems using GCC 11 and CUDA 11040, with either C++ 201402 or 201703, would sometimes yield TensorImpl sizes of 208, whereas newer systems without CUDA would always yield 192. The difference turned out to be that `std::unique_ptr` on NVCC systems is sometimes 16 bytes and other times 8, fully accounting for the observed difference in TensorImpl sizes. We have not yet been able to find a set of preprocessor macros that predicts when each size will occur.

To handle the situation, we've added extensive debugging information to the TensorImpl size-checking logic. A number of preprocessor definitions capture compiler versions and other information to help understand what changes might have affected the size of TensorImpl. The size of each member of TensorImpl is now checked individually, along with the total size. Template-based comparison functions provide compile-time output about the system state as well as the observed and expected sizes of each item checked.

The template-based comparison functions would break the build on a 32-bit system, because the templates and their associated static_asserts are compiled whether or not they are ultimately used. In C++17 we could prevent this using `if constexpr`; however, PyTorch is pinned to C++14, so we cannot. Instead, we check the pointer size (`#if UINTPTR_MAX == 0xFFFFFFFF`) to determine which kind of system we're on and provide separate checks for 32-bit and 64-bit systems.

A final wrinkle is that 32-bit systems show some variation in data sizes as well. We handle these by checking that the relevant items are `<=` the expected values.

In summary, improvements over the previous situation:
* Checks for 32-bit systems have been added.
* The sizes of individual fields are now checked.
* Compile-time size results (expected versus observed) are provided.
* Compile-time compiler and system info is provided.
* Landing this diff will actually enable checks of TensorImpl size; they are currently disabled to expedite LLVM-12 + newer CUDA upgrade efforts.

Some work that could still be done:
* Figure out which preprocessor flags (if any) predict the size of `std::unique_ptr` on 64-bit systems and the sizes of various elements on 32-bit systems.

Test Plan: Building no longer triggers that static_assert failure.

Reviewed By: luciang

Differential Revision: D32749655

fbshipit-source-id: 481f84da6ff61b876a5aaba89b8589ec54d59fbe
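The key debugging trick in the diff below, shown in isolation: a failed static_assert inside a function template makes the compiler print the instantiation it was processing, so passing the observed and expected sizes as template arguments causes the diagnostic itself to report both numbers. A minimal standalone sketch of the technique (hypothetical names `Sample` and `sizes_are_equal`, not the PyTorch code):

#include <cstddef>

struct Sample {
  long long a; // 8 bytes
  long long b; // 8 bytes
};

// If this assert fires, the diagnostic includes the instantiation, e.g.
// "in instantiation of 'sizes_are_equal<24, 16>'", revealing both sizes.
template <std::size_t Actual, std::size_t Expected>
constexpr bool sizes_are_equal() {
  static_assert(Actual == Expected, "Size mismatch!");
  return true;
}

static_assert(sizes_are_equal<sizeof(Sample), 16>(), "Never shown: the inner assert fires first.");

When the sizes differ, the inner static_assert fails first during instantiation, carrying the actual numbers in its instantiation context, so the outer message is never reached.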
Committed by: Facebook GitHub Bot
Parent: 385773cb77
Commit: 6c1be299c1
@@ -509,6 +509,24 @@ struct C10_API VariableVersion {
  }
};

// Forward declaration of TensorImpl needed for forward declaration of
// C10_TensorImpl_Size_Check_Dummy_Class
struct C10_API TensorImpl;

// Forward declaration needed because TensorImpl needs to be friends with
// C10_TensorImpl_Size_Check_Dummy_Class in order to check the size
// of its private fields.
template <
    size_t cplusplus,
    size_t clang_ver_major,
    size_t gcc_ver,
    size_t gcc_ver_minor,
    size_t nvcc,
    size_t cuda_version,
    size_t cuda_version_major,
    size_t ptr_size>
class C10_TensorImpl_Size_Check_Dummy_Class;

/**
 * NOTE: Some TensorImpl methods are small and not overridden in the
 * PyTorch codebase itself, but may theoretically need to be

@@ -2586,6 +2604,20 @@ struct C10_API TensorImpl : public c10::intrusive_ptr_target {
  // INVARIANT: named_tensor_meta_ != nullptr <==>
  // key_set_.has(DispatchKey::Named)
  DispatchKeySet key_set_;

 private:
  // C10_TensorImpl_Size_Check_Dummy_Class needs to be friends with
  // TensorImpl so it can inspect the size of private fields
  template <
      size_t cplusplus,
      size_t clang_ver_major,
      size_t gcc_ver,
      size_t gcc_ver_minor,
      size_t nvcc,
      size_t cuda_version,
      size_t cuda_version_major,
      size_t ptr_size>
  friend class C10_TensorImpl_Size_Check_Dummy_Class;
};

// Note [TensorImpl size constraints]

@@ -2639,15 +2671,182 @@ struct C10_API TensorImpl : public c10::intrusive_ptr_target {
// DispatchKeySet
//

// TODO: On C++20 the size has changed. Temporarily disable the assert to
// unblock other C++20 migration.
#if 0
static_assert(
    sizeof(void*) != sizeof(int64_t) || // if 64-bit...
    sizeof(TensorImpl) == sizeof(int64_t) * 24,
    "You changed the size of TensorImpl on 64-bit arch."
    "See Note [TensorImpl size constraints] on how to proceed.");
#endif

// Various preprocessor macros we use to check that the
// TensorImpl size hasn't changed unexpectedly. We undef
// these later.
#ifndef __NVCC__
#define C10_NVCC 0
#else
#define C10_NVCC __NVCC__
#endif

#ifndef __CUDA_VER_MAJOR__
#define C10_CUDA_VERSION_MAJOR 0
#else
#define C10_CUDA_VERSION_MAJOR __CUDA_VER_MAJOR__
#endif

#ifndef CUDA_VERSION
#define C10_CUDA_VERSION 0
#else
#define C10_CUDA_VERSION CUDA_VERSION
#endif

#ifndef __clang_major__
#define C10_CLANG_MAJOR_VERSION 0
#else
#define C10_CLANG_MAJOR_VERSION __clang_major__
#endif

#ifndef __GNUC__
#define C10_GCC_VERSION 0
#else
#define C10_GCC_VERSION __GNUC__
#endif

#ifndef __GNUC_MINOR__
#define C10_GCC_VERSION_MINOR 0
#else
#define C10_GCC_VERSION_MINOR __GNUC_MINOR__
#endif

// We use a templatized class to both contain the logic of checking the sizes
// as well as to provide compile-time information that might be useful in
// figuring out why sizes may have changed.
template <
    size_t cplusplus = __cplusplus,
    size_t clang_ver_major = C10_CLANG_MAJOR_VERSION,
    size_t gcc_ver = C10_GCC_VERSION,
    size_t gcc_ver_minor = C10_GCC_VERSION_MINOR,
    size_t nvcc = C10_NVCC,
    size_t cuda_version = C10_CUDA_VERSION,
    size_t cuda_version_major = C10_CUDA_VERSION_MAJOR,
    size_t ptr_size = sizeof(void*)>
class C10_TensorImpl_Size_Check_Dummy_Class : private TensorImpl {
  // Names of (non-bitfield) fields in TensorImpl; used to provide
  // compile-time info about fields whose size changes unexpectedly.
  enum class FieldNameEnum {
    storage_,
    autograd_meta_,
    named_tensor_meta_,
    version_counter_,
    pyobj_interpreter_,
    pyobj_,
    sizes_and_strides_,
    storage_offset_,
    numel_,
    data_type_,
    device_opt_,
    key_set_,
    TOTAL_SIZE
  };

  // Provides a compile-time equality check that reveals what numbers
  // were used and on which quantity
  template <size_t Actual, size_t Expected, FieldNameEnum FieldName>
  constexpr static bool are_equal() {
    static_assert(
        Actual == Expected,
        "Actual and Expected sizes of a field did not match!");
    return true;
  }

  // Provides a compile-time <= check that reveals what numbers
  // were used and on which quantity
  template <size_t Actual, size_t Expected, FieldNameEnum FieldName>
  constexpr static bool is_le() {
    static_assert(
        Actual <= Expected,
        "Actual and Expected sizes of a field did not match!");
    return true;
  }

 public:
  // Compile-time check that TensorImpl field sizes are as expected
  //
  // Observed total sizes and associated versions
  // If you find a flag that predicts when unique_ptr has 16 bytes
  // on 64-bit systems or when sizes_and_strides_ is 84 vs 88 bytes
  // on 32-bit systems you get a cookie!
  // Length | LLVM |  GCC |    C++ |  CUDA
  //    192 |    ? | 11.2 | 201703 | 11040
  //    208 |    ? | 11.2 | 201703 | 11040
  //    208 |    ? | 11.2 | 201402 | 11040
  //    192 |    ? | 11.2 | 201402 | 11040
  //    160 |   12 |  4.2 | 201703 |     0
  //
  // To keep things clean, we split on systems here.

#if UINTPTR_MAX == 0xFFFFFFFF
  // This is a 32-bit system
  static constexpr bool check_sizes() {
    constexpr size_t tsize = 20 * sizeof(int64_t);

    // clang-format off
    static_assert(are_equal<sizeof(storage_), 4, FieldNameEnum::storage_>(), "Size of storage_ changed!");
    static_assert(are_equal<sizeof(autograd_meta_), 4, FieldNameEnum::autograd_meta_>(), "Size of autograd_meta_ changed!");
    static_assert(are_equal<sizeof(named_tensor_meta_), 4, FieldNameEnum::named_tensor_meta_>(), "Size of named_tensor_meta_ changed!");
    static_assert(are_equal<sizeof(version_counter_), 4, FieldNameEnum::version_counter_>(), "Size of version_counter_ changed!");
    static_assert(are_equal<sizeof(pyobj_interpreter_), 4, FieldNameEnum::pyobj_interpreter_>(), "Size of pyobj_interpreter_ changed!");
    static_assert(are_equal<sizeof(pyobj_), 4, FieldNameEnum::pyobj_>(), "Size of pyobj_ changed!");
    static_assert(is_le<sizeof(sizes_and_strides_), 88, FieldNameEnum::sizes_and_strides_>(), "Size of sizes_and_strides_ changed!");
    static_assert(are_equal<sizeof(storage_offset_), 8, FieldNameEnum::storage_offset_>(), "Size of storage_offset_ changed!");
    static_assert(are_equal<sizeof(numel_), 8, FieldNameEnum::numel_>(), "Size of numel_ changed!");
    static_assert(are_equal<sizeof(data_type_), 2, FieldNameEnum::data_type_>(), "Size of data_type_ changed!");
    static_assert(are_equal<sizeof(device_opt_), 3, FieldNameEnum::device_opt_>(), "Size of device_opt_ changed!");
    static_assert(are_equal<sizeof(key_set_), 8, FieldNameEnum::key_set_>(), "Size of key_set_ changed!");
    static_assert(is_le<sizeof(TensorImpl), tsize, FieldNameEnum::TOTAL_SIZE>(), "Total size changed!");
    // clang-format on

    return true;
  }
#else
  // This is a 64-bit system
  static constexpr bool check_sizes() {
    constexpr size_t tsize = 26 * sizeof(int64_t);

    // clang-format off
    static_assert(are_equal<sizeof(storage_), 8, FieldNameEnum::storage_>(), "Size of storage_ changed!");
    // On some systems involving NVCC the size of unique_ptr is 16 bytes. We haven't
    // figured out how to detect those via preprocessor macros yet, so we use <=
    // comparisons for the relevant fields.
    static_assert(is_le<sizeof(autograd_meta_), 16, FieldNameEnum::autograd_meta_>(), "Size of autograd_meta_ changed!");
    static_assert(is_le<sizeof(named_tensor_meta_), 16, FieldNameEnum::named_tensor_meta_>(), "Size of named_tensor_meta_ changed!");
    static_assert(are_equal<sizeof(version_counter_), 8, FieldNameEnum::version_counter_>(), "Size of version_counter_ changed!");
    static_assert(are_equal<sizeof(pyobj_interpreter_), 8, FieldNameEnum::pyobj_interpreter_>(), "Size of pyobj_interpreter_ changed!");
    static_assert(are_equal<sizeof(pyobj_), 8, FieldNameEnum::pyobj_>(), "Size of pyobj_ changed!");
    static_assert(are_equal<sizeof(sizes_and_strides_), 88, FieldNameEnum::sizes_and_strides_>(), "Size of sizes_and_strides_ changed!");
    static_assert(are_equal<sizeof(storage_offset_), 8, FieldNameEnum::storage_offset_>(), "Size of storage_offset_ changed!");
    static_assert(are_equal<sizeof(numel_), 8, FieldNameEnum::numel_>(), "Size of numel_ changed!");
    static_assert(are_equal<sizeof(data_type_), 2, FieldNameEnum::data_type_>(), "Size of data_type_ changed!");
    static_assert(are_equal<sizeof(device_opt_), 3, FieldNameEnum::device_opt_>(), "Size of device_opt_ changed!");
    static_assert(are_equal<sizeof(key_set_), 8, FieldNameEnum::key_set_>(), "Size of key_set_ changed!");
    static_assert(is_le<sizeof(TensorImpl), tsize, FieldNameEnum::TOTAL_SIZE>(), "Total size changed!");
    // clang-format on

    return true;
  }
#endif
};

// We use a class to encapsulate size-checking logic with
// templates to capture sizes and flags. We call this within
// a static assert to prove there is no run-time behaviour.
// Since the methods we call return either true or fail their
// own static_asserts, we should never see the error messages
// below.
static_assert(
    C10_TensorImpl_Size_Check_Dummy_Class<>::check_sizes(),
    "You should not see this message.");

// Clean up after ourselves
#undef C10_NVCC
#undef C10_CUDA_VERSION_MAJOR
#undef C10_CUDA_VERSION
#undef C10_CLANG_MAJOR_VERSION
#undef C10_GCC_VERSION
#undef C10_GCC_VERSION_MINOR

} // namespace c10

C10_CLANG_DIAGNOSTIC_POP()
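The commit message notes that C++17's `if constexpr` would have avoided the preprocessor split on UINTPTR_MAX, because the discarded branch of an `if constexpr` inside a template is never instantiated. A hedged sketch of what that could look like, using the total-size constants from the code above; `check_total_size_cpp17` is a hypothetical name, and this is illustrative only, not what was landed, since PyTorch was pinned to C++14 at the time:

#include <cstdint>

template <typename T>
constexpr bool check_total_size_cpp17() {
  if constexpr (sizeof(void*) == 4) {
    // 32-bit branch: this static_assert depends on T, so it is not
    // instantiated when the branch is discarded on 64-bit builds.
    static_assert(sizeof(T) <= 20 * sizeof(int64_t), "32-bit total size changed!");
  } else {
    // 64-bit branch, likewise only instantiated on 64-bit builds.
    static_assert(sizeof(T) <= 26 * sizeof(int64_t), "64-bit total size changed!");
  }
  return true;
}

// Usage would mirror the landed pattern:
// static_assert(check_total_size_cpp17<TensorImpl>(), "You should not see this message.");

Because the checks must compile under C++14, the landed code instead splits on `#if UINTPTR_MAX == 0xFFFFFFFF` so that only one set of static_asserts is ever seen by the compiler.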