Update on "[xpu][feature][Inductor XPU GEMM] Step 10/N: Switch XPU triton scheduling to combined"

This PR officially switches the Inductor XPU backend scheduling from TritonScheduling to XPUCombinedScheduling, enabling support for both the CUTLASS and Triton backends. It also refactors test_cutlass_backend so it can run on XPU, and adds sycl-tla (Intel CUTLASS) to the XPU CI. Note that the CUTLASS XPU backend does not yet support epilogue fusion.
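
For illustration only (not part of this PR's diff): a minimal sketch of how the combined scheduling could be exercised from user code, assuming a local sycl-tla checkout and that the usual Inductor max-autotune GEMM backend knob also governs XPU. The path and backend string below are placeholders, not values taken from this change.

# Minimal sketch, assuming the standard Inductor knobs apply to XPU as well.
import os

import torch
import torch._inductor.config as inductor_config

# Point Inductor at a sycl-tla (Intel CUTLASS) checkout, mirroring the CI setup below.
# "./third_party/sycl-tla" is a placeholder path.
os.environ.setdefault("TORCHINDUCTOR_CUTLASS_DIR", "./third_party/sycl-tla")

# Let CUTLASS and Triton templates compete during max-autotune GEMM selection.
# (Per the PR description, epilogue fusion is not yet supported for CUTLASS on XPU.)
inductor_config.max_autotune_gemm_backends = "CUTLASS,TRITON"


def matmul(a, b):
    return a @ b


if torch.xpu.is_available():
    compiled = torch.compile(matmul, mode="max-autotune")
    a = torch.randn(1024, 1024, device="xpu", dtype=torch.float16)
    b = torch.randn(1024, 1024, device="xpu", dtype=torch.float16)
    out = compiled(a, b)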

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben

[ghstack-poisoned]
Author: xinan.lin
Date: 2025-11-15 01:00:49 +00:00
3 changed files with 10 additions and 6 deletions


@@ -341,7 +341,9 @@ function print_sccache_stats() {
   fi
 }
 
-function clone_sycl_tla() {
-  rm -rf ./sycl-tla
-  git clone --depth 1 --single-branch -b v0.6 --quiet https://github.com/intel/sycl-tla.git
+function install_sycl_tla() {
+  target_dir=$1
+  rm -rf "$target_dir"
+  git clone --depth 1 --single-branch -b v0.6 --quiet https://github.com/intel/sycl-tla.git "${target_dir}"
+  pip install dpctl==0.20.2
 }


@@ -178,9 +178,11 @@ elif [[ "$BUILD_ENVIRONMENT" == *xpu* ]]; then
   export PYTORCH_TESTING_DEVICE_ONLY_FOR="xpu"
   # setting PYTHON_TEST_EXTRA_OPTION
   export PYTHON_TEST_EXTRA_OPTION="--xpu"
-  clone_sycl_tla
-  TORCHINDUCTOR_CUTLASS_DIR=$(realpath ./sycl-tla)
+  # setting sycl-tla (Intel cutlass)
+  install_sycl_tla "./third_party/sycl-tla" || exit 1
+  TORCHINDUCTOR_CUTLASS_DIR=$(realpath "./third_party/sycl-tla")
   export TORCHINDUCTOR_CUTLASS_DIR
+  # end setting sycl-tla
 fi
 if [[ "$TEST_CONFIG" == *crossref* ]]; then


@@ -24,7 +24,7 @@ if TYPE_CHECKING:
     import torch
     from torch.utils._ordered_set import OrderedSet
-    from .common import BackendFeature
+    from ..common import BackendFeature
 _IntLike: TypeAlias = Union[int, Expr]