Compare commits

..

6 Commits

Author SHA1 Message Date
1eba9b3aa3 change the test wheel to release wheel when release wheel available (#145884)
change the test wheel to release wheel when release wheel available (#145252)

change the test wheel to release wheel when release wheel available
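
In practice the change swaps the test index for the release index once release wheels are published. For example, with the XPU wheels used elsewhere in this compare:

```
# Before: release-candidate (test) wheels
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/test/xpu
# After: release wheels, once available
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/xpu
```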

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145252
Approved by: https://github.com/seemethere, https://github.com/atalman

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
(cherry picked from commit 9003d81144fcda2d96814cf9126dbe2b9deb7de7)

Co-authored-by: Zheng, Zhaoqiong <zhaoqiong.zheng@intel.com>
2025-01-28 16:09:34 -08:00
2236df1770 [CUDA] Change slim-wheel libraries load order (#145662)
[CUDA] Change slim-wheel libraries load order (#145638)

There is no libnvjitlink in CUDA-11.x, so attempting to load it first aborts execution and prevents the script from preloading nvrtc.

Fixes issues reported in https://github.com/pytorch/pytorch/pull/145614#issuecomment-2613107072
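
A minimal sketch of the fix's idea (illustrative only, not the actual torch code): preload nvrtc before nvjitlink and guard each load separately, so a missing libnvjitlink on CUDA 11.x cannot abort the whole preload pass:

```
import ctypes

# Illustrative sonames; the real code globs for version-suffixed names.
for soname in ("libnvrtc.so.11.2", "libnvJitLink.so.12"):
    try:
        ctypes.CDLL(soname, mode=ctypes.RTLD_GLOBAL)
    except OSError:
        pass  # library absent for this CUDA version, e.g. no nvjitlink on 11.x
```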

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145638
Approved by: https://github.com/atalman, https://github.com/kit1980, https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
(cherry picked from commit 2a70de7e9257e3f8c2874a10e3612c8939b79867)

Co-authored-by: Wei Wang <weiwan@nvidia.com>
2025-01-24 14:54:25 -08:00
3207040966 [CD] Fix slim-wheel cuda_nvrtc import problem (#145614)
[CD] Fix slim-wheel cuda_nvrtc import problem (#145582)

Similar fix as: https://github.com/pytorch/pytorch/pull/144816

Fixes: https://github.com/pytorch/pytorch/issues/145580

Found during testing of https://github.com/pytorch/pytorch/issues/138340

Please note that both nvrtc and nvjitlink exist for CUDA 11.8, 12.4, and 12.6, hence we can safely remove the if statement: preloading applies to all supported CUDA versions.

CUDA 11.8 path:
```
(.venv) root@b4ffe5c8ac8c:/pytorch/.ci/pytorch/smoke_test# ls /.venv/lib/python3.12/site-packages/torch/lib/../../nvidia/cuda_nvrtc/lib
__init__.py  __pycache__  libnvrtc-builtins.so.11.8  libnvrtc-builtins.so.12.4  libnvrtc.so.11.2  libnvrtc.so.12
(.venv) root@b4ffe5c8ac8c:/pytorch/.ci/pytorch/smoke_test# ls /.venv/lib/python3.12/site-packages/torch/lib/../../nvidia/nvjitlink/lib
__init__.py  __pycache__  libnvJitLink.so.12
```
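
Given that layout, preloading reduces to a glob over the wheel-installed ``nvidia/<package>/lib`` directories. A rough sketch of the pattern (function name and structure are illustrative, not the exact torch implementation):

```
import ctypes
import glob
import os
import sys

def preload_wheel_lib(lib_folder: str, lib_pattern: str) -> None:
    # Search every site-packages entry for nvidia/<lib_folder>/lib/<pattern>
    # and load the first match eagerly, so later dlopen calls resolve to the
    # wheel-provided copy rather than whatever LD_LIBRARY_PATH points at.
    for entry in sys.path:
        matches = glob.glob(os.path.join(entry, "nvidia", lib_folder, "lib", lib_pattern))
        if matches:
            ctypes.CDLL(matches[0], mode=ctypes.RTLD_GLOBAL)
            return

# Same order as the final fix: nvrtc first, then nvjitlink.
preload_wheel_lib("cuda_nvrtc", "libnvrtc.so.*[0-9]")
preload_wheel_lib("nvjitlink", "libnvJitLink.so.*[0-9]")
```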

Test with rc 2.6 and CUDA 11.8:
```
python cudnn_test.py
2.6.0+cu118
---------------------------------------------SDPA-Flash---------------------------------------------
ALL GOOD
---------------------------------------------SDPA-CuDNN---------------------------------------------
ALL GOOD
```
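
The content of ``cudnn_test.py`` is not shown in this compare; a hypothetical stand-in that exercises the same two SDPA backends could look like this:

```
import torch
from torch.nn.attention import SDPBackend, sdpa_kernel

print(torch.__version__)
# Small half-precision attention inputs on the GPU.
q, k, v = (torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.half) for _ in range(3))
for backend in (SDPBackend.FLASH_ATTENTION, SDPBackend.CUDNN_ATTENTION):
    with sdpa_kernel(backend):
        out = torch.nn.functional.scaled_dot_product_attention(q, k, v)
    print(backend.name, "ALL GOOD" if out.isfinite().all() else "FAILED")
```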

Thank you @nWEIdia for discovering this issue

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145582
Approved by: https://github.com/nWEIdia, https://github.com/eqy, https://github.com/kit1980, https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
(cherry picked from commit 9752c7c1c819ce9027806c20492adc235dddecd6)

Co-authored-by: atalman <atalman@fb.com>
2025-01-24 08:40:13 -08:00
ca3c3a63b8 [Release-Only] Remove ptx from Linux CUDA 12.6 binary builds (#145616)
CUDA 12.6: remove +PTX
2025-01-24 08:39:52 -08:00
7be6b5db47 Fix IndentationError of code example (#145525)
Fix IndentationError of code example (#145251)

I found there is an IndentationError when trying to copy-paste the example of inference with torch.compile.
This PR fixes the formatting.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145251
Approved by: https://github.com/mikaylagawarecki

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
(cherry picked from commit fef92c9447c6786b095fdbada6cfe7280c510e59)

Co-authored-by: Zheng, Zhaoqiong <zhaoqiong.zheng@intel.com>
2025-01-24 09:16:57 -05:00
dcb8ad070f update get start xpu (#145286)
update get start xpu (#143183)

- Support new Intel client GPU on Windows [Intel® Arc™ B-Series graphics](https://www.intel.com/content/www/us/en/products/docs/discrete-gpus/arc/desktop/b-series/overview.html) and [Intel® Core™ Ultra Series 2 with Intel® Arc™ Graphics](https://www.intel.com/content/www/us/en/products/details/processors/core-ultra.html)
- Support vision/audio prebuilt wheels on Windows
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143183
Approved by: https://github.com/EikanWang, https://github.com/leslie-fang-intel, https://github.com/atalman, https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
(cherry picked from commit 465a1cfe2e8a49cb72df3bb33e78bf1572e13e51)

Co-authored-by: ZhaoqiongZ <106125927+ZhaoqiongZ@users.noreply.github.com>
2025-01-24 09:15:54 -05:00
3 changed files with 42 additions and 61 deletions


@@ -63,7 +63,7 @@ case ${CUDA_VERSION} in
         if [[ "$GPU_ARCH_TYPE" = "cuda-aarch64" ]]; then
             TORCH_CUDA_ARCH_LIST="9.0"
         else
-            TORCH_CUDA_ARCH_LIST="${TORCH_CUDA_ARCH_LIST};9.0+PTX"
+            TORCH_CUDA_ARCH_LIST="${TORCH_CUDA_ARCH_LIST};9.0"
         fi
         EXTRA_CAFFE2_CMAKE_FLAGS+=("-DATEN_NO_TEST=ON")
         ;;
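
For context on this hunk: the ``+PTX`` suffix additionally embeds forward-compatible PTX for that architecture alongside the native SASS, which enlarges the binary; dropping it keeps SASS only. A hypothetical build invocation showing the difference:

```
# With "+PTX": SASS for sm_90 plus PTX for compute_90 (larger binary,
# runnable on future architectures via JIT compilation).
TORCH_CUDA_ARCH_LIST="9.0+PTX" python setup.py bdist_wheel

# Without: SASS for sm_90 only (smaller binary, as in this release-only change).
TORCH_CUDA_ARCH_LIST="9.0" python setup.py bdist_wheel
```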


@@ -8,23 +8,24 @@ Hardware Prerequisite
    :widths: 50 50
    :header-rows: 1

-   * - Validated Hardware
-     - Supported OS
-   * - Intel® Data Center GPU Max Series
-     - Linux
-   * - Intel Client GPU
-     - Windows/Linux
+   * - Supported OS
+     - Validated Hardware
+   * - Linux
+     - Intel® Client GPUs / Intel® Data Center GPU Max Series
+   * - Windows
+     - Intel® Client GPUs
+   * - WSL2 (experimental feature)
+     - Intel® Client GPUs

-Intel GPUs support (Prototype) is ready in PyTorch* 2.5 for Intel® Data Center GPU Max Series and Intel® Client GPUs on both Linux and Windows, which brings Intel GPUs and the SYCL* software stack into the official PyTorch stack with consistent user experience to embrace more AI application scenarios.
+Intel GPUs support (Prototype) is ready in PyTorch* 2.6 for Intel® Client GPUs and Intel® Data Center GPU Max Series on both Linux and Windows, which brings Intel GPUs and the SYCL* software stack into the official PyTorch stack with consistent user experience to embrace more AI application scenarios.

 Software Prerequisite
 ---------------------

-Visit `PyTorch Installation Prerequisites for Intel GPUs <https://www.intel.com/content/www/us/en/developer/articles/tool/pytorch-prerequisites-for-intel-gpus.html>`_ for more detailed information regarding:
+To use PyTorch on Intel GPUs, you need to install the Intel GPUs driver first. For installation guide, visit `Intel GPUs Driver Installation <https://www.intel.com/content/www/us/en/developer/articles/tool/pytorch-prerequisites-for-intel-gpu/2-6.html#driver-installation>`_.
+Intel GPU Drivers are sufficient for binary installation, while building from source requires both Intel GPU Drivers and Intel® Deep Learning Essentials. Please refer to `PyTorch Installation Prerequisites for Intel GPUs <https://www.intel.com/content/www/us/en/developer/articles/tool/pytorch-prerequisites-for-intel-gpu/2-6.html>`_ for more information.

-#. Intel GPU driver installation
-#. Intel support package installation
-#. Environment setup

 Installation
 ------------
@@ -32,17 +33,13 @@ Installation
 Binaries
 ^^^^^^^^

-Platform Linux
-""""""""""""""
-Now that we have `Intel GPU Driver <https://www.intel.com/content/www/us/en/developer/articles/tool/pytorch-prerequisites-for-intel-gpu/2-6.html#driver-installation>`_ installed, use the following commands to install ``pytorch``, ``torchvision``, ``torchaudio`` on Linux.
+Now we have all the required packages installed and environment activated. Use the following commands to install ``pytorch``, ``torchvision``, ``torchaudio`` on Linux.

-For preview wheels
+For release wheels

 .. code-block::

-    pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/test/xpu
+    pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/xpu

 For nightly wheels
@@ -50,26 +47,13 @@ For nightly wheels

     pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/xpu

-Platform Windows
-""""""""""""""""
-Now we have all the required packages installed and environment activated. Use the following commands to install ``pytorch`` on Windows, build from source for ``torchvision`` and ``torchaudio``.
-
-For preview wheels
-.. code-block::
-
-    pip3 install torch --index-url https://download.pytorch.org/whl/test/xpu
-
-For nightly wheels
-.. code-block::
-
-    pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/xpu

 From Source
 ^^^^^^^^^^^

 Now that we have `Intel GPU Driver and Intel® Deep Learning Essentials <https://www.intel.com/content/www/us/en/developer/articles/tool/pytorch-prerequisites-for-intel-gpu/2-6.html>`_ installed. Follow guides to build ``pytorch``, ``torchvision``, ``torchaudio`` from source.

 Build from source for ``torch`` refer to `PyTorch Installation Build from source <https://github.com/pytorch/pytorch?tab=readme-ov-file#from-source>`_.
 Build from source for ``torchvision`` refer to `Torchvision Installation Build from source <https://github.com/pytorch/vision/blob/main/CONTRIBUTING.md#development-installation>`_.
@@ -86,11 +70,7 @@ To check if your Intel GPU is available, you would typically use the following code:

    import torch
    torch.xpu.is_available() # torch.xpu is the API for Intel GPU support

-If the output is ``False``, double check following steps below.
-
-#. Intel GPU driver installation
-#. Intel support package installation
-#. Environment setup
+If the output is ``False``, double check driver installation for Intel GPUs.

 Minimum Code Change
 -------------------
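
As a usage sketch of the availability check above (assuming a PyTorch build with XPU support), a script can fall back to CPU when no Intel GPU is detected:

```
import torch

device = torch.device("xpu" if torch.xpu.is_available() else "cpu")
x = torch.randn(8, 8, device=device)
print(device, x.sum().item())
```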
@@ -183,22 +163,22 @@ Inference with ``torch.compile``
    model = model.to("xpu")
    data = data.to("xpu")

-   for i in range(ITERS):
-   start = time.time()
-   with torch.no_grad():
-   model(data)
-   torch.xpu.synchronize()
-   end = time.time()
-   print(f"Inference time before torch.compile for iteration {i}: {(end-start)*1000} ms")
+   for i in range(ITERS):
+       start = time.time()
+       with torch.no_grad():
+           model(data)
+           torch.xpu.synchronize()
+       end = time.time()
+       print(f"Inference time before torch.compile for iteration {i}: {(end-start)*1000} ms")

-   model = torch.compile(model)
-   for i in range(ITERS):
-   start = time.time()
-   with torch.no_grad():
-   model(data)
-   torch.xpu.synchronize()
-   end = time.time()
-   print(f"Inference time after torch.compile for iteration {i}: {(end-start)*1000} ms")
+   model = torch.compile(model)
+   for i in range(ITERS):
+       start = time.time()
+       with torch.no_grad():
+           model(data)
+           torch.xpu.synchronize()
+       end = time.time()
+       print(f"Inference time after torch.compile for iteration {i}: {(end-start)*1000} ms")

    print("Execution finished")


@@ -316,20 +316,21 @@ def _load_global_deps() -> None:
     try:
         ctypes.CDLL(global_deps_lib_path, mode=ctypes.RTLD_GLOBAL)
-        # Workaround slim-wheel CUDA-12.4+ dependency bug in libcusparse by preloading nvjitlink
-        # In those versions of cuda cusparse depends on nvjitlink, but does not have rpath when
+        # Workaround slim-wheel CUDA dependency bugs in cusparse and cudnn by preloading nvjitlink
+        # and nvrtc. In CUDA-12.4+ cusparse depends on nvjitlink, but does not have rpath when
         # shipped as wheel, which results in OS picking wrong/older version of nvjitlink library
-        # if `LD_LIBRARY_PATH` is defined
-        # See https://github.com/pytorch/pytorch/issues/138460
-        if version.cuda not in ["12.4", "12.6"]:  # type: ignore[name-defined]
-            return
+        # if `LD_LIBRARY_PATH` is defined, see https://github.com/pytorch/pytorch/issues/138460
+        # Similar issue exist in cudnn that dynamically loads nvrtc, unaware of its relative path.
+        # See https://github.com/pytorch/pytorch/issues/145580
         try:
             with open("/proc/self/maps") as f:
                 _maps = f.read()
             # libtorch_global_deps.so always depends in cudart, check if its installed via wheel
             if "nvidia/cuda_runtime/lib/libcudart.so" not in _maps:
                 return
-            # If all abovementioned conditions are met, preload nvjitlink
+            # If all above-mentioned conditions are met, preload nvrtc and nvjitlink
+            # Please note that order are important for CUDA-11.8 , as nvjitlink does not exist there
+            _preload_cuda_deps("cuda_nvrtc", "libnvrtc.so.*[0-9]")
             _preload_cuda_deps("nvjitlink", "libnvJitLink.so.*[0-9]")
         except Exception:
             pass
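
One hedged way to verify which copies were actually mapped after the preload, reading ``/proc/self/maps`` (Linux only) just as the code above does:

```
import torch  # importing torch triggers _load_global_deps()

seen = set()
with open("/proc/self/maps") as f:
    for line in f:
        path = line.rstrip("\n").split(maxsplit=5)[-1]
        if ("nvrtc" in path or "nvJitLink" in path) and path not in seen:
            seen.add(path)
            print(path)
```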