Mirror of https://github.com/pytorch/pytorch.git, synced 2025-10-20 21:14:14 +08:00
Summary:
## Motivation

This PR upgrades MKL-DNN from v0.20 to DNNL v1.2 and resolves https://github.com/pytorch/pytorch/issues/30300.

DNNL (Deep Neural Network Library) is the new brand of MKL-DNN, which improves performance, quality, and usability over the old version. This PR focuses on migrating all existing functionality, including minor fixes, performance improvements, and code cleanup. It serves as the cornerstone of our future efforts to accommodate new features such as OpenCL support, BF16 training, and INT8 inference, and to let the PyTorch community derive more benefit from the Intel Architecture.

## What's included?

Even though DNNL introduces many breaking changes to the API, we managed to absorb most of them in ideep, so this PR contains only minimal changes to the integration code in PyTorch. Below is a summary of the changes:

**General:**

1. Replace the op-level allocator with a globally registered allocator

```
// before
ideep::sum::compute<AllocForMKLDNN>(scales, {x, y}, z);

// after
ideep::sum::compute(scales, {x, y}, z);
```

The allocator is now registered in `aten/src/ATen/native/mkldnn/IDeepRegistration.cpp`. Thereafter, all tensors derived from the `cpu_engine` (by default) will use the c10 allocator.

```
RegisterEngineAllocator cpu_alloc(
  ideep::engine::cpu_engine(),
  [](size_t size) {
    return c10::GetAllocator(c10::DeviceType::CPU)->raw_allocate(size);
  },
  [](void* p) {
    c10::GetAllocator(c10::DeviceType::CPU)->raw_deallocate(p);
  }
);
```

------

2. Simplify group convolution

Convolution used to have a scenario where the ideep tensor shape mismatched the aten tensor shape: when `groups > 1`, DNNL expects weight tensors to be 5-d with an extra group dimension, e.g. `goihw` instead of `oihw` in the 2-d conv case. As shown below, this shape difference previously required a lot of extra checks. The difference is now hidden entirely inside ideep and all tensors align with PyTorch's definition, so these checks could be safely removed from both the aten and c2 integration code.

```
// aten/src/ATen/native/mkldnn/Conv.cpp
if (w.ndims() == x.ndims() + 1) {
  AT_ASSERTM(
      groups > 1,
      "Only group _mkldnn_conv2d weights could have been reordered to 5d");
  kernel_size[0] = w.get_dim(0) * w.get_dim(1);
  std::copy_n(
      w.get_dims().cbegin() + 2, x.ndims() - 1, kernel_size.begin() + 1);
} else {
  std::copy_n(w.get_dims().cbegin(), x.ndims(), kernel_size.begin());
}
```

------

3. Enable the DNNL built-in cache

Previously, we stored DNNL jitted kernels along with intermediate buffers inside ideep using an LRU cache. We now switch to the newly added DNNL built-in cache and **no longer** cache buffers, in order to reduce the memory footprint. This change is mainly visible as lower memory usage in memory profiling results. On the code side, we removed the few lines of `op_key_` handling that the ideep cache depended on.

------

4. Use 64-bit integers to denote dimensions

We changed the type of `ideep::dims` from `vector<int32_t>` to `vector<int64_t>`. This makes ideep dims no longer compatible with the 32-bit dims used by caffe2, so we use expressions like `{stride_.begin(), stride_.end()}` to cast a parameter such as `stride_` into an int64 vector.

**Misc changes in each commit:**

**Commit:** change build options

Some build options were slightly renamed, mainly to avoid name collisions with other projects that include DNNL as a subproject. In addition, the DNNL built-in cache is enabled by the option `DNNL_ENABLE_PRIMITIVE_CACHE`.

Old | New
-- | --
WITH_EXAMPLE | MKLDNN_BUILD_EXAMPLES
WITH_TEST | MKLDNN_BUILD_TESTS
MKLDNN_THREADING | MKLDNN_CPU_RUNTIME
MKLDNN_USE_MKL | N/A (MKL is no longer used)

------

**Commit:** aten reintegration

- aten/src/ATen/native/mkldnn/BinaryOps.cpp
  Implement binary ops using the new `binary` operation provided by DNNL
- aten/src/ATen/native/mkldnn/Conv.cpp
  Clean up group convolution checks
  Simplify conv backward integration
- aten/src/ATen/native/mkldnn/MKLDNNConversions.cpp
  Simplify prepacking of convolution weights
- test/test_mkldnn.py
  Fix an issue in the conv2d unit test: it did not actually compare conv results between the mkldnn and aten implementations; it compared mkldnn with mkldnn, because the default CPU path also dispatches to mkldnn. We now use `torch.backends.mkldnn.flags` to fix this (see the sketch below)
- torch/utils/mkldnn.py
  Prepack the weight tensor in module `__init__` to significantly improve performance
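The last two bullets can be illustrated with a minimal sketch. This is a hypothetical example (the layer shape, input size, and tolerance are made up); it simply exercises the `torch.backends.mkldnn.flags` context manager and the `torch.utils.mkldnn.to_mkldnn` helper mentioned above:

```
import torch
import torch.utils.mkldnn as mkldnn_utils

conv = torch.nn.Conv2d(3, 16, kernel_size=3).eval()
x = torch.randn(1, 3, 224, 224)

# Reference result with the MKL-DNN fast path disabled, so the test really
# compares "mkldnn vs. native" instead of "mkldnn vs. mkldnn".
with torch.backends.mkldnn.flags(enabled=False):
    y_ref = conv(x)

# MKL-DNN result: to_mkldnn() converts the module (prepacking its weight up
# front) and expects inputs in the mkldnn layout.
y_mkldnn = mkldnn_utils.to_mkldnn(conv)(x.to_mkldnn()).to_dense()

assert torch.allclose(y_ref, y_mkldnn, atol=1e-4)
```

Disabling the mkldnn fast path for the reference run is what makes the comparison meaningful, since the default CPU path would otherwise dispatch to mkldnn as well.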
------

**Commit:** caffe2 reintegration

- caffe2/ideep/ideep_utils.h
  Clean up unused type definitions
- caffe2/ideep/operators/adam_op.cc & caffe2/ideep/operators/momentum_sgd_op.cc
  Unify tensor initialization with `ideep::tensor::init`; obsolete `ideep::tensor::reinit`
- caffe2/ideep/operators/conv_op.cc & caffe2/ideep/operators/quantization/int8_conv_op.cc
  Clean up group convolution checks
  Revamp the convolution API
- caffe2/ideep/operators/conv_transpose_op.cc
  Clean up group convolution checks
  Clean up deconv workaround code

------

**Commit:** custom allocator

- Register the c10 allocator as mentioned above

## Performance

We tested inference on some common models based on user scenarios, and most performance numbers are either better than or on par with DNNL 0.20.

ratio: new / old | Latency (batch=1, 4T) | Throughput (batch=64, 56T)
-- | -- | --
pytorch resnet18 | 121.4% | 99.7%
pytorch resnet50 | 123.1% | 106.9%
pytorch resnext101_32x8d | 116.3% | 100.1%
pytorch resnext50_32x4d | 141.9% | 104.4%
pytorch mobilenet_v2 | 163.0% | 105.8%
caffe2 alexnet | 303.0% | 99.2%
caffe2 googlenet-v3 | 101.1% | 99.2%
caffe2 inception-v1 | 102.2% | 101.7%
caffe2 mobilenet-v1 | 356.1% | 253.7%
caffe2 resnet101 | 100.4% | 99.8%
caffe2 resnet152 | 99.8% | 99.8%
caffe2 shufflenet | 141.1% | 69.0% †
caffe2 squeezenet | 98.5% | 99.2%
caffe2 vgg16 | 136.8% | 100.6%
caffe2 googlenet-v3 int8 | 100.0% | 100.7%
caffe2 mobilenet-v1 int8 | 779.2% | 943.0%
caffe2 resnet50 int8 | 99.5% | 95.5%

_Configuration:
Platform: Skylake 8180
Latency test: 4 threads, warmup 30, iterations 500, batch size 1
Throughput test: 56 threads, warmup 30, iterations 200, batch size 64_

† Shufflenet is one of the few models that require temporary buffers during inference. The performance degradation is expected, since we no longer cache any buffers in ideep. As a remedy, we suggest that users opt for a caching allocator such as **jemalloc** as a drop-in replacement for the system allocator in such heavy workloads.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/32422

Test Plan:
Perf results: https://our.intern.facebook.com/intern/fblearner/details/177790608?tab=Experiment%20Results
10% improvement for ResNext with avx512, neutral on avx2
More results: https://fb.quip.com/ob10AL0bCDXW#NNNACAUoHJP

Reviewed By: yinghai

Differential Revision: D20381325

Pulled By: dzhulgakov

fbshipit-source-id: 803b906fd89ed8b723c5fcab55039efe3e4bcb77
356 lines
15 KiB
Python
"Manages CMake."
|
|
|
|
from __future__ import print_function
|
|
|
|
import multiprocessing
|
|
import os
|
|
import re
|
|
from subprocess import check_call, check_output
|
|
import sys
|
|
import distutils
|
|
import distutils.sysconfig
|
|
from distutils.version import LooseVersion
|
|
|
|
from . import which
|
|
from .env import (BUILD_DIR, IS_64BIT, IS_DARWIN, IS_WINDOWS, check_negative_env_flag)
|
|
from .numpy_ import USE_NUMPY, NUMPY_INCLUDE_DIR
|
|
|
|
|
|
def _mkdir_p(d):
|
|
try:
|
|
os.makedirs(d)
|
|
except OSError:
|
|
pass
|
|
|
|
|
|
# Ninja
|
|
# Use ninja if it is on the PATH. Previous version of PyTorch required the
|
|
# ninja python package, but we no longer use it, so we do not have to import it
|
|
USE_NINJA = (not check_negative_env_flag('USE_NINJA') and
|
|
which('ninja') is not None)
|
|
|
|
def convert_cmake_value_to_python_value(cmake_value, cmake_type):
    r"""Convert a CMake value in a string form to a Python value.

    Arguments:
      cmake_value (string): The CMake value in a string form (e.g., "ON", "OFF", "1").
      cmake_type (string): The CMake type of :attr:`cmake_value`.

    Returns:
      A Python value corresponding to :attr:`cmake_value` with type :attr:`cmake_type`.
    """

    cmake_type = cmake_type.upper()
    up_val = cmake_value.upper()
    if cmake_type == 'BOOL':
        # https://gitlab.kitware.com/cmake/community/wikis/doc/cmake/VariablesListsStrings#boolean-values-in-cmake
        return not (up_val in ('FALSE', 'OFF', 'N', 'NO', '0', '', 'NOTFOUND') or up_val.endswith('-NOTFOUND'))
    elif cmake_type == 'FILEPATH':
        if up_val.endswith('-NOTFOUND'):
            return None
        else:
            return cmake_value
    else:  # Directly return the cmake_value.
        return cmake_value
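
# Illustrative examples of the mapping above (the calls are hypothetical, but
# the results follow directly from the rules in the function):
#   convert_cmake_value_to_python_value('ON', 'BOOL')               -> True
#   convert_cmake_value_to_python_value('0', 'BOOL')                -> False
#   convert_cmake_value_to_python_value('FOO-NOTFOUND', 'FILEPATH') -> None
#   convert_cmake_value_to_python_value('Release', 'STRING')        -> 'Release'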


def get_cmake_cache_variables_from_file(cmake_cache_file):
    r"""Gets values in CMakeCache.txt into a dictionary.

    Arguments:
      cmake_cache_file: A CMakeCache.txt file object.

    Returns:
      dict: A ``dict`` containing the value of cached CMake variables.
    """

    results = dict()
    for i, line in enumerate(cmake_cache_file, 1):
        line = line.strip()
        if not line or line.startswith(('#', '//')):
            # Blank or comment line, skip
            continue

        # Almost any character can be part of variable name and value. As a practical matter, we assume the type,
        # if specified, would be valid as a C variable name. It should match the following kinds of strings:
        #
        #   USE_CUDA:BOOL=ON
        #   "USE_CUDA":BOOL=ON
        #   USE_CUDA=ON
        #   USE_CUDA:=ON
        #   Intel(R) MKL-DNN_SOURCE_DIR:STATIC=/path/to/pytorch/third_party/ideep/mkl-dnn
        #   "OpenMP_COMPILE_RESULT_CXX_openmp:experimental":INTERNAL=FALSE
        matched = re.match(r'("?)(.+?)\1(?::\s*([a-zA-Z_-][a-zA-Z0-9_-]*)?)?\s*=\s*(.*)', line)
        if matched is None:  # Illegal line
            raise ValueError('Unexpected line {} in {}: {}'.format(i, repr(cmake_cache_file), line))
        _, variable, type_, value = matched.groups()
        if type_ is None:
            type_ = ''
        if type_.upper() in ('INTERNAL', 'STATIC'):
            # CMake internal variable, do not touch
            continue
        results[variable] = convert_cmake_value_to_python_value(value, type_)

    return results


class CMake:
    "Manages cmake."

    def __init__(self, build_dir=BUILD_DIR):
        self._cmake_command = CMake._get_cmake_command()
        self.build_dir = build_dir

    @property
    def _cmake_cache_file(self):
        r"""Returns the path to CMakeCache.txt.

        Returns:
          string: The path to CMakeCache.txt.
        """
        return os.path.join(self.build_dir, 'CMakeCache.txt')

    @staticmethod
    def _get_cmake_command():
        "Returns cmake command."

        cmake_command = 'cmake'
        if IS_WINDOWS:
            return cmake_command
        cmake3 = which('cmake3')
        if cmake3 is not None:
            cmake = which('cmake')
            if cmake is not None:
                bare_version = CMake._get_version(cmake)
                if (bare_version < LooseVersion("3.5.0") and
                        CMake._get_version(cmake3) > bare_version):
                    cmake_command = 'cmake3'
        return cmake_command

    @staticmethod
    def _get_version(cmd):
        "Returns cmake version."

        for line in check_output([cmd, '--version']).decode('utf-8').split('\n'):
            if 'version' in line:
                return LooseVersion(line.strip().split(' ')[2])
        raise RuntimeError('no version found')
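
    # For reference (illustrative): `cmake --version` typically prints a line like
    #   "cmake version 3.10.2"
    # which the loop above turns into LooseVersion('3.10.2').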

    def run(self, args, env):
        "Executes cmake with arguments and an environment."

        command = [self._cmake_command] + args
        print(' '.join(command))
        check_call(command, cwd=self.build_dir, env=env)

    @staticmethod
    def defines(args, **kwargs):
        "Adds definitions to a cmake argument list."
        for key, value in sorted(kwargs.items()):
            if value is not None:
                args.append('-D{}={}'.format(key, value))
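
    # Example of the helper above (hypothetical values, added for illustration):
    #   args = []
    #   CMake.defines(args, CMAKE_BUILD_TYPE='Release', USE_CUDA=None)
    #   # args is now ['-DCMAKE_BUILD_TYPE=Release']; None-valued options are skipped.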

    def get_cmake_cache_variables(self):
        r"""Gets values in CMakeCache.txt into a dictionary.

        Returns:
          dict: A ``dict`` containing the value of cached CMake variables.
        """
        with open(self._cmake_cache_file) as f:
            return get_cmake_cache_variables_from_file(f)
    def generate(self, version, cmake_python_library, build_python, build_test, my_env, rerun):
        "Runs cmake to generate native build files."

        if rerun and os.path.isfile(self._cmake_cache_file):
            os.remove(self._cmake_cache_file)
        ninja_build_file = os.path.join(self.build_dir, 'build.ninja')
        if os.path.exists(self._cmake_cache_file) and not (
                USE_NINJA and not os.path.exists(ninja_build_file)):
            # Everything's in place. Do not rerun.
            return

        args = []
        if USE_NINJA:
            # Avoid conflicts in '-G' and the `CMAKE_GENERATOR`
            os.environ['CMAKE_GENERATOR'] = 'Ninja'
            args.append('-GNinja')
        elif IS_WINDOWS:
            generator = os.getenv('CMAKE_GENERATOR', 'Visual Studio 15 2017')
            supported = ['Visual Studio 15 2017', 'Visual Studio 16 2019']
            if generator not in supported:
                print('Unsupported `CMAKE_GENERATOR`: ' + generator)
                print('Please set it to one of the following values: ')
                print('\n'.join(supported))
                sys.exit(1)
            args.append('-G' + generator)
            toolset_dict = {}
            toolset_version = os.getenv('CMAKE_GENERATOR_TOOLSET_VERSION')
            if toolset_version is not None:
                toolset_dict['version'] = toolset_version
                curr_toolset = os.getenv('VCToolsVersion')
                if curr_toolset is None:
                    print('When you specify `CMAKE_GENERATOR_TOOLSET_VERSION`, you must also '
                          'activate the vs environment of this version. Please read the notes '
                          'in the build steps carefully.')
                    sys.exit(1)
            if IS_64BIT:
                args.append('-Ax64')
                toolset_dict['host'] = 'x64'
            if toolset_dict:
                toolset_expr = ','.join(["{}={}".format(k, v) for k, v in toolset_dict.items()])
                args.append('-T' + toolset_expr)

        base_dir = os.path.dirname(os.path.dirname(os.path.dirname(
            os.path.abspath(__file__))))
        install_dir = os.path.join(base_dir, "torch")

        _mkdir_p(install_dir)
        _mkdir_p(self.build_dir)

        # Store build options that are directly stored in environment variables
        build_options = {
            # The default value cannot be easily obtained in CMakeLists.txt. We set it here.
            'CMAKE_PREFIX_PATH': distutils.sysconfig.get_python_lib()
        }
        # Build options that do not start with "BUILD_", "USE_", or "CMAKE_" and are directly controlled by env vars.
        # This is a dict that maps environment variables to the corresponding variable name in CMake.
        additional_options = {
            # Key: environment variable name. Value: Corresponding variable name to be passed to CMake. If you are
            # adding a new build option to this block: Consider making these two names identical and adding this option
            # in the block below.
            '_GLIBCXX_USE_CXX11_ABI': 'GLIBCXX_USE_CXX11_ABI',
            'CUDNN_LIB_DIR': 'CUDNN_LIBRARY',
            'USE_CUDA_STATIC_LINK': 'CAFFE2_STATIC_LINK_CUDA',
        }
        additional_options.update({
            # Build options that have the same environment variable name and CMake variable name and that do not start
            # with "BUILD_", "USE_", or "CMAKE_". If you are adding a new build option, also make sure you add it to
            # CMakeLists.txt.
            var: var for var in
            ('BLAS',
             'BUILDING_WITH_TORCH_LIBS',
             'CUDA_HOST_COMPILER',
             'CUDA_NVCC_EXECUTABLE',
             'CUDNN_LIBRARY',
             'CUDNN_INCLUDE_DIR',
             'CUDNN_ROOT',
             'EXPERIMENTAL_SINGLE_THREAD_POOL',
             'INSTALL_TEST',
             'JAVA_HOME',
             'INTEL_MKL_DIR',
             'INTEL_OMP_DIR',
             'MKL_THREADING',
             'MKLDNN_CPU_RUNTIME',
             'MSVC_Z7_OVERRIDE',
             'Numa_INCLUDE_DIR',
             'Numa_LIBRARIES',
             'ONNX_ML',
             'ONNX_NAMESPACE',
             'ATEN_THREADING',
             'WERROR')
        })

        for var, val in my_env.items():
            # We currently pass over all environment variables that start with "BUILD_", "USE_", and "CMAKE_". This is
            # because we currently have no reliable way to get the list of all build options we have specified in
            # CMakeLists.txt. (`cmake -L` won't print dependent options when the dependency condition is not met.) We
            # will possibly change this in the future by parsing CMakeLists.txt ourselves (then additional_options
            # would also not need to be specified here).
            true_var = additional_options.get(var)
            if true_var is not None:
                build_options[true_var] = val
            elif var.startswith(('BUILD_', 'USE_', 'CMAKE_')):
                build_options[var] = val

        # Some options must be post-processed. Ideally, this list will be shrunk to only one or two options in the
        # future, as CMake can detect many of these libraries pretty comfortably. We have them here for now before
        # CMake integration is completed. They appear here rather than in the CMake.defines call below because they
        # start with either "BUILD_" or "USE_" and must be overwritten here.
        build_options.update({
            # Note: Do not add new build options to this dict if it is directly read from an environment variable --
            # you only need to add one in `CMakeLists.txt`. All build options that start with "BUILD_", "USE_", or
            # "CMAKE_" are automatically passed to CMake; for other options you can add to additional_options above.
            'BUILD_PYTHON': build_python,
            'BUILD_TEST': build_test,
            # Most library detection should go to the CMake script, except this one, which Python can do a much better
            # job of due to NumPy's inherent Pythonic nature.
            'USE_NUMPY': USE_NUMPY,
        })

        # Options starting with CMAKE_
        cmake__options = {
            'CMAKE_INSTALL_PREFIX': install_dir,
        }

        # We set some CMAKE_* options in our Python build code instead of relying on the user's direct settings. Emit
        # an error if the user also attempts to set these CMAKE options directly.
        specified_cmake__options = set(build_options).intersection(cmake__options)
        if len(specified_cmake__options) > 0:
            print(', '.join(specified_cmake__options) +
                  ' should not be specified in environment variables. They are directly set by the PyTorch build script.')
            sys.exit(1)
        build_options.update(cmake__options)

        CMake.defines(args,
                      PYTHON_EXECUTABLE=sys.executable,
                      PYTHON_LIBRARY=cmake_python_library,
                      PYTHON_INCLUDE_DIR=distutils.sysconfig.get_python_inc(),
                      TORCH_BUILD_VERSION=version,
                      NUMPY_INCLUDE_DIR=NUMPY_INCLUDE_DIR,
                      **build_options)

        expected_wrapper = '/usr/local/opt/ccache/libexec'
        if IS_DARWIN and os.path.exists(expected_wrapper):
            if 'CMAKE_C_COMPILER' not in build_options and 'CC' not in os.environ:
                CMake.defines(args, CMAKE_C_COMPILER="{}/gcc".format(expected_wrapper))
            if 'CMAKE_CXX_COMPILER' not in build_options and 'CXX' not in os.environ:
                CMake.defines(args, CMAKE_CXX_COMPILER="{}/g++".format(expected_wrapper))

        for env_var_name in my_env:
            if env_var_name.startswith('gh'):
                # GitHub env vars use utf-8; on Windows, non-ascii characters may
                # cause problems, so encode first
                try:
                    my_env[env_var_name] = str(my_env[env_var_name].encode("utf-8"))
                except UnicodeDecodeError as e:
                    shex = ':'.join('{:02x}'.format(ord(c)) for c in my_env[env_var_name])
                    print('Invalid ENV[{}] = {}'.format(env_var_name, shex), file=sys.stderr)
                    print(e, file=sys.stderr)
        # According to the CMake manual, we should pass the arguments first,
        # and put the directory as the last element. Otherwise, these flags
        # may not be passed correctly.
        # Reference:
        # 1. https://cmake.org/cmake/help/latest/manual/cmake.1.html#synopsis
        # 2. https://stackoverflow.com/a/27169347
        args.append(base_dir)
        self.run(args, env=my_env)

    def build(self, my_env):
        "Runs cmake to build binaries."

        from .env import build_type

        max_jobs = os.getenv('MAX_JOBS', str(multiprocessing.cpu_count()))
        build_args = ['--build', '.', '--target', 'install', '--config', build_type.build_type_string]
        # This ``if-else'' clause would be unnecessary when cmake 3.12 becomes the
        # minimum version, since it provides a '-j' option:
        # build_args += ['-j', max_jobs] would be sufficient by then.
        if IS_WINDOWS and not USE_NINJA:  # We are likely using msbuild here
            build_args += ['--', '/p:CL_MPCount={}'.format(max_jobs)]
        else:
            build_args += ['--', '-j', max_jobs]
        self.run(build_args, my_env)

        # In cmake, .cu compilation involves generating certain intermediates
        # such as .cu.o and .cu.depend, and these intermediates finally get compiled
        # into the final .so.
        # Ninja updates build.ninja's timestamp after all dependent files have been built,
        # and re-kicks cmake on incremental builds if any of the dependent files
        # have a timestamp newer than build.ninja's timestamp.
        # There is a cmake bug with the Ninja backend, where the .cu.depend files
        # are still being written by the time the build.ninja timestamp is updated,
        # so the .cu.depend files' newer timestamps interfere with ninja's incremental
        # build detection.
        # This line works around that bug by manually updating the build.ninja timestamp
        # after the entire build is finished.
        ninja_build_file = os.path.join(self.build_dir, 'build.ninja')
        if os.path.exists(ninja_build_file):
            os.utime(ninja_build_file, None)