pytorch/cmake/TorchConfig.cmake.in
Ashkan Aliabadi 6aecfd1e80 Mobile Backend: NHWC memory layout + XNNPACK integration. (#33722)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33722

In order to improve the CPU performance of floating-point models on mobile, this PR introduces a new mobile CPU backend that implements the most common mobile operators with support for the NHWC memory layout through integration with XNNPACK.

XNNPACK itself, and this code path, are currently only included in the build; the actual integration is gated behind USE_XNNPACK preprocessor guards. This preprocessor symbol is intentionally not passed to the compiler yet, so that the rollout can happen in multiple stages in follow-up PRs. This changeset builds XNNPACK as part of the build when the identically named USE_XNNPACK CMake variable, defaulted to ON, is enabled, but does not expose or enable this code path in any other way.
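
For illustration only, a configure-time gate of this kind is typically wired up in CMake roughly as follows (a minimal sketch; the option description and the third_party path are assumptions, not the exact upstream wiring):

    # Configure-time switch, defaulted to ON; users can opt out with
    #   cmake -DUSE_XNNPACK=OFF
    option(USE_XNNPACK "Build with XNNPACK integration" ON)
    if(USE_XNNPACK)
      # Pull the XNNPACK sources into the build (path is illustrative).
      add_subdirectory(third_party/XNNPACK)
    endif()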

Furthermore, it is worth pointing out that in order to efficiently map models onto these operators, some front-end method of exposing this backend to the user is needed. The less efficient approach would be to hook these operators into their corresponding native implementations whenever a series of XNNPACK-specific conditions are met, much like how NNPACK is integrated with PyTorch today.

Having said that, while that implementation is still expected to outperform NNPACK based on the benchmarks I ran, it would leave a considerable gap between the performance achieved and the maximum performance XNNPACK enables, as it provides no way to compute one-time setup operations once and factor them out of the innermost forward() loop.

The better solution, and one we will decide on soon, would involve either a JIT pass that maps nn operators onto these newly introduced operators while allowing one-time calculations to be factored out, much as is done for quantized mobile models, or new eager-mode modules that call directly into these implementations, whether through c10 or some other mechanism, likewise decoupling op creation from op execution.

This PR does not include any of the front-end changes mentioned above. Neither does it include the mobile threadpool unification present in the original https://github.com/pytorch/pytorch/issues/30644. Furthermore, this code path seems to be faster than NNPACK in a good number of use cases, which could allow us to remove NNPACK from ATen and simplify the codebase, provided there is widespread support for such a move.

Regardless, these changes will be introduced gradually and in a more controlled way in subsequent PRs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/32509

Test Plan:
Build: CI
Functionality: Not exposed

Reviewed By: dreiss

Differential Revision: D20069796

Pulled By: AshkanAliabadi

fbshipit-source-id: d46c1c91d4bea91979ea5bd46971ced5417d309c
2020-02-24 21:58:56 -08:00


# FindTorch
# ---------
#
# Finds the Torch library
#
# This will define the following variables:
#
# TORCH_FOUND -- True if the system has the Torch library
# TORCH_INCLUDE_DIRS -- The include directories for torch
# TORCH_LIBRARIES -- Libraries to link against
# TORCH_CXX_FLAGS -- Additional (required) compiler flags
#
# and the following imported targets:
#
# torch
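#
# A minimal downstream CMakeLists.txt might consume this package as follows
# (a sketch; the example-app project and source file names are illustrative):
#
#   find_package(Torch REQUIRED)
#   add_executable(example-app example-app.cpp)
#   target_link_libraries(example-app "${TORCH_LIBRARIES}")
#   set_property(TARGET example-app PROPERTY CXX_STANDARD 14)
#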
include(FindPackageHandleStandardArgs)
if (DEFINED ENV{TORCH_INSTALL_PREFIX})
  set(TORCH_INSTALL_PREFIX $ENV{TORCH_INSTALL_PREFIX})
else()
  # Assume we are in <install-prefix>/share/cmake/Torch/TorchConfig.cmake
  get_filename_component(CMAKE_CURRENT_LIST_DIR "${CMAKE_CURRENT_LIST_FILE}" PATH)
  get_filename_component(TORCH_INSTALL_PREFIX "${CMAKE_CURRENT_LIST_DIR}/../../../" ABSOLUTE)
endif()
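# For example, an install under /usr/local places this file at
# /usr/local/share/cmake/Torch/TorchConfig.cmake, so TORCH_INSTALL_PREFIX
# resolves to /usr/local.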
# Include directories.
set(TORCH_INCLUDE_DIRS
  ${TORCH_INSTALL_PREFIX}/include
  ${TORCH_INSTALL_PREFIX}/include/torch/csrc/api/include)
# Library dependencies.
if (@BUILD_SHARED_LIBS@)
  find_package(Caffe2 REQUIRED PATHS ${CMAKE_CURRENT_LIST_DIR}/../Caffe2)
  set(TORCH_LIBRARIES torch ${Caffe2_MAIN_LIBS})
else()
  add_library(torch STATIC IMPORTED) # IMPORTED_LOCATION is set at the bottom
  set(TORCH_LIBRARIES torch)
endif()
find_library(C10_LIBRARY c10 PATHS "${TORCH_INSTALL_PREFIX}/lib")
list(APPEND TORCH_LIBRARIES ${C10_LIBRARY})
# We need to manually add dependent libraries when they are not linked into
# the shared library.
# TODO: this list might be incomplete.
if (NOT @BUILD_SHARED_LIBS@)
  find_library(TORCH_CPU_LIBRARY torch_cpu PATHS "${TORCH_INSTALL_PREFIX}/lib")
  list(APPEND TORCH_LIBRARIES ${TORCH_CPU_LIBRARY})
  if (@USE_NNPACK@)
    find_library(NNPACK_LIBRARY nnpack PATHS "${TORCH_INSTALL_PREFIX}/lib")
    list(APPEND TORCH_LIBRARIES ${NNPACK_LIBRARY})
  endif()
  if (@USE_PYTORCH_QNNPACK@)
    find_library(PYTORCH_QNNPACK_LIBRARY pytorch_qnnpack PATHS "${TORCH_INSTALL_PREFIX}/lib")
    list(APPEND TORCH_LIBRARIES ${PYTORCH_QNNPACK_LIBRARY})
  endif()
  if (@USE_XNNPACK@)
    find_library(XNNPACK_LIBRARY XNNPACK PATHS "${TORCH_INSTALL_PREFIX}/lib")
    list(APPEND TORCH_LIBRARIES ${XNNPACK_LIBRARY})
  endif()
  if (@INTERN_USE_EIGEN_BLAS@)
    find_library(EIGEN_BLAS_LIBRARY eigen_blas PATHS "${TORCH_INSTALL_PREFIX}/lib")
    list(APPEND TORCH_LIBRARIES ${EIGEN_BLAS_LIBRARY})
  endif()
  find_library(CPUINFO_LIBRARY cpuinfo PATHS "${TORCH_INSTALL_PREFIX}/lib")
  list(APPEND TORCH_LIBRARIES ${CPUINFO_LIBRARY})
  find_library(CLOG_LIBRARY clog PATHS "${TORCH_INSTALL_PREFIX}/lib")
  list(APPEND TORCH_LIBRARIES ${CLOG_LIBRARY})
endif()
if (@USE_CUDA@)
  if(MSVC)
    set(NVTOOLEXT_HOME "C:/Program Files/NVIDIA Corporation/NvToolsExt")
    if (DEFINED ENV{NVTOOLEXT_HOME})
      set(NVTOOLEXT_HOME $ENV{NVTOOLEXT_HOME})
    endif()
    set(TORCH_CUDA_LIBRARIES
      ${NVTOOLEXT_HOME}/lib/x64/nvToolsExt64_1.lib
      ${CUDA_LIBRARIES})
    list(APPEND TORCH_INCLUDE_DIRS ${NVTOOLEXT_HOME}/include)
    find_library(CAFFE2_NVRTC_LIBRARY caffe2_nvrtc PATHS "${TORCH_INSTALL_PREFIX}/lib")
    list(APPEND TORCH_CUDA_LIBRARIES ${CAFFE2_NVRTC_LIBRARY})
  elseif(APPLE)
    set(TORCH_CUDA_LIBRARIES
      ${CUDA_TOOLKIT_ROOT_DIR}/lib/libcudart.dylib
      ${CUDA_TOOLKIT_ROOT_DIR}/lib/libnvrtc.dylib
      ${CUDA_TOOLKIT_ROOT_DIR}/lib/libnvToolsExt.dylib
      ${CUDA_LIBRARIES})
  else()
    find_library(LIBNVTOOLSEXT libnvToolsExt.so PATHS ${CUDA_TOOLKIT_ROOT_DIR}/lib64/)
    set(TORCH_CUDA_LIBRARIES
      ${CUDA_CUDA_LIB}
      ${CUDA_NVRTC_LIB}
      ${LIBNVTOOLSEXT}
      ${CUDA_LIBRARIES})
  endif()
  find_library(C10_CUDA_LIBRARY c10_cuda PATHS "${TORCH_INSTALL_PREFIX}/lib")
  list(APPEND TORCH_CUDA_LIBRARIES ${C10_CUDA_LIBRARY})
  list(APPEND TORCH_LIBRARIES ${TORCH_CUDA_LIBRARIES})
endif()
# When we build libtorch with the old GCC ABI, dependent libraries must too.
if ("${CMAKE_CXX_COMPILER_ID}" STREQUAL "GNU")
set(TORCH_CXX_FLAGS "-D_GLIBCXX_USE_CXX11_ABI=@GLIBCXX_USE_CXX11_ABI@")
endif()
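# Note: projects that link via the TORCH_LIBRARIES variable rather than the
# imported torch target should add these flags themselves, e.g. (a sketch):
#   set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")
# Linking against the torch target picks them up automatically through the
# INTERFACE_COMPILE_OPTIONS property set below.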
find_library(TORCH_LIBRARY torch PATHS "${TORCH_INSTALL_PREFIX}/lib")
set_target_properties(torch PROPERTIES
  IMPORTED_LOCATION "${TORCH_LIBRARY}"
  INTERFACE_INCLUDE_DIRECTORIES "${TORCH_INCLUDE_DIRS}"
  CXX_STANDARD 14
)
if (TORCH_CXX_FLAGS)
  set_property(TARGET torch PROPERTY INTERFACE_COMPILE_OPTIONS "${TORCH_CXX_FLAGS}")
endif()
find_package_handle_standard_args(torch DEFAULT_MSG TORCH_LIBRARY TORCH_INCLUDE_DIRS)