[Intel GPU] allow_tf32 for oneDNN backend - XPU part (#137570)
# Motivation

Add the context variable `torch.backends.mkldnn.allow_tf32` to control tf32 computation in convolution kernels on the XPU side. The tf32 data type helps improve the performance of deep learning workloads during training/inference. This PR uses the [oneDNN fpmath_mode API](https://oneapi-src.github.io/oneDNN/dev_guide_attributes_fpmath_mode.html#the-floating-point-math-mode-attribute) to trigger tf32 acceleration in convolution kernels.

# Validation

* Unit test for the context variable: `python test/xpu/test_conv.py -k test_mkldnn_allow_tf32_get_set`
* Runtime exemplification:

```
onednn_verbose,primitive,exec,gpu:0,convolution,jit:ir,forward_training,src_f32::blocked:abcd::f0 wei_f32::blocked:abcd::f0 bia_f32::blocked:a::f0 dst_f32::blocked:abcd::f0,attr-scratchpad:user attr-fpmath:tf32,alg:convolution_direct,mb20_ic16oc33_ih50oh24kh3sh2dh0ph0_iw100ow49kw3sw2dw0pw0,0.649902
onednn_verbose,primitive,exec,gpu:0,convolution,jit:ir,forward_training,src_f32::blocked:abcd::f0 wei_f32::blocked:abcd::f0 bia_f32::blocked:a::f0 dst_f32::blocked:abcd::f0,attr-scratchpad:user attr-fpmath:tf32,alg:convolution_direct,mb20_ic33oc33_ih24oh24kh3sh1dh0ph1_iw49ow49kw3sw1dw0pw1,0.151855
onednn_verbose,primitive,exec,gpu:0,convolution,jit:ir,backward_data,src_f32::blocked:abcd::f0 wei_f32::blocked:abcd::f0 bia_undef::undef::: dst_f32::blocked:abcd::f0,attr-scratchpad:user attr-fpmath:tf32,alg:convolution_direct,mb20_ic33oc33_ih24oh24kh3sh1dh0ph1_iw49ow49kw3sw1dw0pw1,0.167969
onednn_verbose,primitive,exec,gpu:0,convolution,jit:ir,backward_weights,src_f32::blocked:abcd::f0 wei_f32::blocked:abcd::f0 bia_f32::blocked:a::f0 dst_f32::blocked:abcd::f0,attr-scratchpad:user attr-fpmath:tf32,alg:convolution_direct,mb20_ic33oc33_ih24oh24kh3sh1dh0ph1_iw49ow49kw3sw1dw0pw1,0.26709
onednn_verbose,primitive,exec,gpu:0,convolution,jit:ir,backward_weights,src_f32::blocked:abcd::f0 wei_f32::blocked:abcd::f0 bia_f32::blocked:a::f0 dst_f32::blocked:abcd::f0,attr-scratchpad:user attr-fpmath:tf32,alg:convolution_direct,mb20_ic16oc33_ih50oh24kh3sh2dh0ph0_iw100ow49kw3sw2dw0pw0,0.219971
```

The `attr-fpmath:tf32` field in the verbose output shows that the context setting successfully triggers tf32 computation in the conv forward/backward_data/backward_weights kernels.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137570
Approved by: https://github.com/guangyey, https://github.com/EikanWang, https://github.com/atalman, https://github.com/malfet

Co-authored-by: Yu, Guangye <guangye.yu@intel.com>
Committed by PyTorch MergeBot · parent 198ffbdf11 · commit ae351d4d0e
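As an illustration of how the new knob is meant to be used, here is a minimal Python sketch. It is not taken from the PR itself: the attribute path `torch.backends.mkldnn.allow_tf32` follows the commit message and the existing cuDNN flag, `ONEDNN_VERBOSE=1` is the standard oneDNN verbose switch, and an XPU device is assumed to be available.

```python
# Minimal sketch (assumes an XPU-enabled PyTorch build with oneDNN support).
# Setting ONEDNN_VERBOSE=1 before the first primitive runs prints log lines
# such as the "attr-fpmath:tf32" entries quoted in the commit message above.
import os
os.environ.setdefault("ONEDNN_VERBOSE", "1")

import torch

# Toggle the new context flag; it is backed by the _set_onednn_allow_tf32 binding.
torch.backends.mkldnn.allow_tf32 = True
print(torch.backends.mkldnn.allow_tf32)  # True

# Shapes mirror the verbose log above: mb20_ic16oc33_ih50..._iw100...
conv = torch.nn.Conv2d(16, 33, kernel_size=3, stride=2).to("xpu")
x = torch.randn(20, 16, 50, 100, device="xpu", requires_grad=True)

# Forward + backward should now dispatch to tf32-enabled oneDNN conv kernels.
out = conv(x)
out.sum().backward()

# Restore strict fp32 math when accumulation accuracy matters.
torch.backends.mkldnn.allow_tf32 = False
```

With verbose logging on and the flag set, the convolution calls should emit `attr-fpmath:tf32` entries like those quoted above.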
```diff
@@ -947,6 +947,29 @@ static PyObject* THPModule_setDeterministicAlgorithms(
   END_HANDLE_TH_ERRORS
 }
 
+static PyObject* THPModule_setAllowTF32OneDNN(
+    PyObject* _unused,
+    PyObject* arg) {
+  HANDLE_TH_ERRORS
+  TORCH_CHECK(
+      PyBool_Check(arg),
+      "_set_onednn_allow_tf32 expects a bool, "
+      "but got ",
+      THPUtils_typename(arg));
+  at::globalContext().setAllowTF32OneDNN(arg == Py_True);
+  Py_RETURN_NONE;
+  END_HANDLE_TH_ERRORS
+}
+
+static PyObject* THPModule_allowTF32OneDNN(
+    PyObject* _unused,
+    PyObject* noargs) {
+  if (at::globalContext().allowTF32OneDNN())
+    Py_RETURN_TRUE;
+  else
+    Py_RETURN_FALSE;
+}
+
 static PyObject* THPModule_deterministicAlgorithms(
     PyObject* _unused,
     PyObject* noargs) {
@@ -1527,6 +1550,8 @@ static std::initializer_list<PyMethodDef> TorchMethods = {
     {"_set_mkldnn_enabled", THPModule_setUserEnabledMkldnn, METH_O, nullptr},
     {"_get_cudnn_allow_tf32", THPModule_allowTF32CuDNN, METH_NOARGS, nullptr},
     {"_set_cudnn_allow_tf32", THPModule_setAllowTF32CuDNN, METH_O, nullptr},
+    {"_get_onednn_allow_tf32", THPModule_allowTF32OneDNN, METH_NOARGS, nullptr},
+    {"_set_onednn_allow_tf32", THPModule_setAllowTF32OneDNN, METH_O, nullptr},
     {"_get_cudnn_benchmark", THPModule_benchmarkCuDNN, METH_NOARGS, nullptr},
     {"_set_cudnn_benchmark", THPModule_setBenchmarkCuDNN, METH_O, nullptr},
     {"_get_cudnn_deterministic",
```
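Underneath the backend flag sit the two bindings registered in `TorchMethods` above. A rough sketch of their expected Python-level behavior, assuming a build that exposes them on `torch._C` (the binding names come from the diff; the default value is an assumption):

```python
import torch

# Getter takes no arguments (METH_NOARGS) and returns a Python bool.
print(torch._C._get_onednn_allow_tf32())  # assumed to be False by default

# Setter takes a single bool (METH_O) and updates the global ATen context.
torch._C._set_onednn_allow_tf32(True)
assert torch._C._get_onednn_allow_tf32() is True

# Passing a non-bool trips the TORCH_CHECK in THPModule_setAllowTF32OneDNN,
# e.g. "_set_onednn_allow_tf32 expects a bool, but got int".
try:
    torch._C._set_onednn_allow_tf32(1)
except RuntimeError as e:
    print(e)
```

The `METH_O` calling convention means the setter receives exactly one positional argument, and the `TORCH_CHECK` surfaces in Python as a `RuntimeError` when that argument is not a bool.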