According to the [APL documentation](https://developer.arm.com/documentation/101004/2404/General-information/Arm-Performance-Libraries-example-programs), libraries ending with `_mp` are OpenMP multi-threaded libraries. When a project is compiled with MSVC and the `-openmp` flag, the `vcomp` library (the Visual C++ implementation of OpenMP) is used for runtime calls. However, the current APL implementation uses the `libomp.dll` (LLVM) variant. As a result, there is unexpected behavior at runtime.

---

For example:

```python
import torch

# Create a sparse tensor
# Input (Sparse Tensor):
# [[0, 1],
#  [1, 0]]
indices = torch.tensor([[0, 1], [1, 0]])
values = torch.tensor([1, 1], dtype=torch.float32)
size = torch.Size([2, 2])

sparse_tensor = torch.sparse_coo_tensor(indices, values, size)

# Convert sparse tensor to dense tensor
dense_tensor = sparse_tensor.to_dense()

# Expected Output (Dense Tensor):
# [[0, 1],
#  [1, 0]]
print("\nDense Tensor:")
print(dense_tensor)
```

However, it prints unexpected outputs such as:

```python
# [[0, 11],
#  [10, 0]]
```

The issue arises because the following code does not function as expected at runtime:

https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/ParallelOpenMP.h#L30

```c++
// returns 1, however since OpenMP is enabled it should return total number of threads
int64_t num_threads = omp_get_num_threads();
```

---

At runtime, loading multiple OpenMP libraries (in this case `libomp` and `vcomp`) causes unexpected behavior. So, we've changed the libraries from the `_mp` to the non-`_mp` versions and used `vcomp` for OpenMP calls.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145215
Approved by: https://github.com/ozanMSFT, https://github.com/malfet

Co-authored-by: Ozan Aydin <148207261+ozanMSFT@users.noreply.github.com>

This folder contains various custom cmake modules for finding libraries and packages. Details about some of them are listed below.
FindOpenMP.cmake
This is modified from the file included in the CMake 3.13 release, with the following changes:
- Replace `VERSION_GREATER_EQUAL` with `NOT ... VERSION_LESS`, as `VERSION_GREATER_EQUAL` is not supported in CMake 3.5 (our min supported version).
- Update the `separate_arguments` commands to not use `NATIVE_COMMAND`, which is not supported in CMake 3.5 (our min supported version).
- Make it respect the `QUIET` flag so that, when it is set, `try_compile` failures are not reported.
- For `AppleClang` compilers, use `-Xpreprocessor` instead of `-Xclang`, as the latter is not documented.
- For `AppleClang` compilers, an extra flag option is tried, which is `-Xpreprocessor -openmp -I${DIR_OF_omp_h}`, where `${DIR_OF_omp_h}` is obtained using `find_path` on `omp.h` with `brew`'s default include directory as a hint. Without this, the compiler will complain about missing headers as they are not natively included in Apple's LLVM.
- For non-GNU compilers, whenever we try a candidate OpenMP flag, first try it with directly linking MKL's `libomp` if it has one. Otherwise, we may end up linking two `libomp`s and end up with this nasty error:

  > OMP: Error #15: Initializing libomp.dylib, but found libiomp5.dylib already initialized.
  >
  > OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://openmp.llvm.org/

  See NOTE [ Linking both MKL and OpenMP ] for details.
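
The duplicate-runtime problem described in the last item can be checked for by looking at the names of the shared libraries a process has loaded. The following is a minimal sketch of that idea, not something this folder or CMake provides: the helper names `find_openmp_runtimes` and `has_conflicting_runtimes` are hypothetical, and the prefix table is an illustrative assumption covering only the common runtime file names (`libomp`, `libiomp5`, `libgomp`, `vcomp`), so it may miss others.

```python
import re

# Filename prefixes of common OpenMP runtimes, grouped by family.
# Assumption: this table is illustrative and not exhaustive.
OPENMP_RUNTIME_PREFIXES = {
    "LLVM (libomp)": ("libomp",),
    "Intel (libiomp5)": ("libiomp5",),
    "GNU (libgomp)": ("libgomp",),
    "MSVC (vcomp)": ("vcomp",),
}


def find_openmp_runtimes(library_paths):
    """Return {runtime family: sorted list of matching file names}."""
    found = {}
    for path in library_paths:
        # Split on both separators so Windows paths work on any host OS.
        name = re.split(r"[\\/]", path)[-1].lower()
        for family, prefixes in OPENMP_RUNTIME_PREFIXES.items():
            if name.startswith(prefixes):
                found.setdefault(family, set()).add(name)
    return {family: sorted(names) for family, names in found.items()}


def has_conflicting_runtimes(library_paths):
    """True when more than one OpenMP runtime family appears in the list."""
    return len(find_openmp_runtimes(library_paths)) > 1
```

For instance, passing `["/usr/local/lib/libomp.dylib", "/opt/intel/lib/libiomp5.dylib"]` to `has_conflicting_runtimes` reports a conflict, which matches the `libomp.dylib` / `libiomp5.dylib` pair named in the error message above. How the list of loaded libraries is obtained is platform-specific and left out of the sketch.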