pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-21 05:34:18 +08:00

Author	SHA1	Message	Date
Yu, Guangye	0819de412d	Add a new API torch.xpu.can_device_access_peer for Intel GPU (#162705 ) # Motivation Aligned with other backends, this PR introduces a new API `torch.xpu.can_device_access_peer`, which is used in vllm distributed [scenarios](`2048c4e379/vllm/distributed/device_communicators/custom_all_reduce.py (L37)`). Pull Request resolved: https://github.com/pytorch/pytorch/pull/162705 Approved by: https://github.com/EikanWang, https://github.com/ezyang	2025-09-16 18:00:22 +00:00
Yu, Guangye	f8746b878d	Add uuid to XPU device properties (#161392 ) # Motivation Fix https://github.com/intel/torch-xpu-ops/issues/1955 Refer to https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/supported/sycl_ext_intel_device_info.md#device-uuid, `ext::intel::info::device::uuid` returns `std::array<unsigned char, 16>` as the UUID. Pull Request resolved: https://github.com/pytorch/pytorch/pull/161392 Approved by: https://github.com/EikanWang, https://github.com/albanD	2025-09-02 06:41:32 +00:00
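The commit above says the SYCL extension returns the UUID as 16 raw bytes (`std::array<unsigned char, 16>`). As a minimal sketch of rendering such a value in the usual hyphenated form, the snippet below uses only Python's standard `uuid` module. The helper name `format_device_uuid` and the sample bytes are hypothetical, and how PyTorch itself exposes or prints the property is not shown here.

```python
import uuid


def format_device_uuid(raw: bytes) -> str:
    """Render a 16-byte device UUID in canonical 8-4-4-4-12 hex form."""
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    return str(uuid.UUID(bytes=raw))


# Hypothetical sample value; a real UUID would come from the device properties.
sample = bytes(range(16))
print(format_device_uuid(sample))
```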
Yu, Guangye	cb1e31362c	Remove background thread UT on XPU to fix CI (#161844 ) # Motivation Because we revert `torch._C._set_allocator_settings` in https://github.com/pytorch/pytorch/pull/161626, this UT becomes invalid. Fix https://github.com/pytorch/pytorch/issues/161697 Pull Request resolved: https://github.com/pytorch/pytorch/pull/161844 Approved by: https://github.com/gujinghui	2025-09-01 03:45:26 +00:00
Yu, Guangye	8cfaf51d4e	Generalize support of background thread in pinned allocator (#160505 ) # Motivation https://github.com/pytorch/pytorch/pull/135524 introduced background-thread support only for CUDA; this PR extends that support to other backends such as XPU. Pull Request resolved: https://github.com/pytorch/pytorch/pull/160505 Approved by: https://github.com/albanD	2025-08-14 02:22:39 +00:00
Yu, Guangye	da1f608ca3	Add UT for torch.accelerator memory-related API (#155200 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/155200 Approved by: https://github.com/albanD ghstack dependencies: #138222, #152932	2025-08-08 17:41:22 +00:00
PyTorch MergeBot	c4e64467b5	Revert "Add UT for torch.accelerator memory-related API (#155200 )" This reverts commit 4604f0482c2b4a3001b62e5bc5085149a9bb053c. Reverted https://github.com/pytorch/pytorch/pull/155200 on behalf of https://github.com/jithunnair-amd due to Broke ROCm periodic runs on MI300 e.g. https://github.com/pytorch/pytorch/actions/runs/16764977800/job/47470050573 ([comment](https://github.com/pytorch/pytorch/pull/138222#issuecomment-3164941815))	2025-08-07 16:34:36 +00:00
Yu, Guangye	4604f0482c	Add UT for torch.accelerator memory-related API (#155200 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/155200 Approved by: https://github.com/albanD ghstack dependencies: #138222, #152932	2025-08-06 02:22:18 +00:00
Yu, Guangye	5cc4e856fd	Add device_id to XPU device properties (#156481 ) # Motivation Some older Intel iGPUs may share the same device name across different hardware products. (See [device name example](`aaa01c06f9/shared/source/dll/devices/devices_base.inl (L190-L199)`)) To help disambiguate which specific iGPU product is being used, we introduce the use of a [device id](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/supported/sycl_ext_intel_device_info.md#device-id). This device id corresponds to the Device ID in [official Intel product specification](https://www.intel.com/content/www/us/en/products/sku/232155/intel-core-i71360p-processor-18m-cache-up-to-5-00-ghz/specifications.html) and enables more accurate identification and troubleshooting for user issues. Pull Request resolved: https://github.com/pytorch/pytorch/pull/156481 Approved by: https://github.com/EikanWang, https://github.com/albanD	2025-07-03 01:22:11 +00:00
Yu, Guangye	10cef1e25d	Remove torch XPU ABI=0 build logic for old compiler (#150095 ) # Motivation Follow https://github.com/pytorch/pytorch/pull/149888, this PR intends to remove ABI=0 build logic for PyTorch XPU build with old compiler (< 2025.0). For newer compilers >= 2025.0, the ABI is neutral by default without requiring additional compilation options (`-fpreview-breaking-changes`). # Additional Context This PR depends on XPU CI pass, which will be fixed by https://github.com/pytorch/pytorch/pull/149843 and https://github.com/intel/torch-xpu-ops/pull/1515 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150095 Approved by: https://github.com/EikanWang, https://github.com/malfet	2025-06-06 13:13:19 +00:00
Yu, Guangye	adfd5b293a	Enhance UT on elapsed_time for XPUEvent (#154494 ) # Motivation Enhance the UT to guard against an incorrect elapsed time returned by XPU's Event. Pull Request resolved: https://github.com/pytorch/pytorch/pull/154494 Approved by: https://github.com/EikanWang	2025-05-30 02:00:02 +00:00
Panagiotis Kourdis	44f19c7179	Record the XPU and XCCL build settings in the compiled binary (#147161 ) Fixes #ISSUE_NUMBER Currently the XPU and XCCL build settings are not recorded in the compiled binary and are not shown using the `torch.__config__.show()` which is a quick way to check if the binary has been built with such support. Below is the output adding them (see end of last line): ``` Python 3.12.8 | packaged by conda-forge | (main, Dec 5 2024, 14:24:40) [GCC 13.3.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import torch >>> print(torch.__config__.show()) PyTorch built with: - GCC 13.3 - C++ Version: 201703 - Intel(R) oneAPI Math Kernel Library Version 2025.1-Product Build 20250203 for Intel(R) 64 architecture applications - Intel(R) MKL-DNN v3.5.3 (Git Hash 66f0cb9eb66affd2da3bf5f8d897376f04aae6af) - OpenMP 201511 (a.k.a. OpenMP 4.5) - LAPACK is enabled (usually provided by MKL) - CPU capability usage: AVX512 XPU backend - Build settings: BLAS_INFO=mkl, BUILD_TYPE=RelWithDebInfo, COMMIT_SHA=43eb39d7c832b5560f7bfa8d29cc7919ac21c0ca, CXX_COMPILER=/home/pkourdis/compilers/gcc-13.3.0/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=OFF -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-dangling-reference -Wno-error=dangling-reference -Wno-error=redundant-move -DUSE_XPU -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.7.0, USE_CUDA=0, USE_CUDNN=OFF, USE_CUSPARSELT=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=1, USE_MPI=0, USE_NCCL=OFF, USE_NNPACK=0, USE_OPENMP=ON, USE_ROCM=0, USE_ROCM_KERNEL_ASSERT=OFF, USE_XCCL=1, USE_XPU=1, ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/147161 Approved by: https://github.com/guangyey, https://github.com/EikanWang, https://github.com/albanD Co-authored-by: Yu, Guangye <106960996+guangyey@users.noreply.github.com>	2025-05-20 09:21:39 +00:00
Yu, Guangye	e32a16a9da	Correct torch.xpu.is_bf16_supported return False if no XPU detected (#152317 ) # Motivation Fix https://github.com/pytorch/pytorch/issues/152301 When XPU is not available, calling `torch.xpu.is_bf16_supported()` still returns `True`, which is inconsistent with the expected behavior (should be False). # Solution Align to other backend, adding `including_emulation` to `torch.xpu.is_bf16_supported` and, - return `False` if XPU is not available - return `True` if `including_emulation` is True - return `torch.xpu.get_device_properties().has_bfloat16_conversions` if `including_emulation` is False, it means if the device could generate SPIRV code for bf16. Pull Request resolved: https://github.com/pytorch/pytorch/pull/152317 Approved by: https://github.com/EikanWang	2025-05-06 10:03:17 +00:00
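The three rules listed in the commit above can be read as a small decision function. The sketch below is plain Python that does not import torch. The function name, its explicit `xpu_available` and `has_bfloat16_conversions` arguments, and the default value of `including_emulation` are all assumptions made for illustration, not the library's actual signature.

```python
def is_bf16_supported_sketch(
    xpu_available: bool,
    has_bfloat16_conversions: bool,
    including_emulation: bool = True,  # default chosen for illustration only
) -> bool:
    """Mirror the decision rules described in the commit message."""
    # Rule 1: without an XPU device, bf16 is reported as unsupported.
    if not xpu_available:
        return False
    # Rule 2: when emulation is acceptable, any available XPU qualifies.
    if including_emulation:
        return True
    # Rule 3: otherwise defer to the device property named in the commit.
    return has_bfloat16_conversions


# The case the fix targets: no XPU present must yield False.
print(is_bf16_supported_sketch(xpu_available=False, has_bfloat16_conversions=True))
```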
Yu, Guangye	35c727e7ff	Fix typo on `test_multi_device_context_manager` for XPU (#152812 ) # Motivation Align https://github.com/pytorch/pytorch/pull/152474, fix the typo on UT for XPU introduced by https://github.com/pytorch/pytorch/issues/148864 Pull Request resolved: https://github.com/pytorch/pytorch/pull/152812 Approved by: https://github.com/EikanWang, https://github.com/Skylion007	2025-05-06 02:51:19 +00:00
FFFrog	580913290c	[Easy] The event_id of torch.cuda.Event and torch.xpu.Event always is 0 (#151226 ) Although torch.cuda.Event and torch.xpu.Event have cuda_event and sycl_event fields respectively, the event_id exposed from the base class torch.Event is always 0, which can confuse users. The memory of torch.Event is not useful to torch.cuda.Event and torch.xpu.Event, but we still need to inherit from torch.Event because CPython will check it. Repro with cuda: ``` >>> import torch >>> event = torch.cuda.Event() >>> event.cuda_event 0 >>> event.event_id 0 >>> event.record() >>> event.cuda_event 127982096 >>> event.event_id 0 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/151226 Approved by: https://github.com/albanD, https://github.com/guangyey ghstack dependencies: #151404, #151221, #151411	2025-04-26 14:18:22 +00:00
Yu, Guangye	33c75cae0a	Add torch.accelerator.device_index as accelerator's device switch context (#148864 ) # Motivation We propose adding support for the Python with statement on `torch.accelerator.device_index` to enable device switching functionality. This enhancement would simplify writing device-agnostic code and provide benefits across all accelerators. Its device-specific counterparts include [`torch.cuda.device`](`00199acdb8/torch/cuda/__init__.py (L482)`) and [`torch.cuda._DeviceGuard`](`00199acdb8/torch/cuda/__init__.py (L469)`). Design Philosophy It accepts either an `Int` or `None` as input. When `None` is passed, no device switch is performed. Supporting `None` is important for compatibility, as it's possible to encounter `None` values from `torch.device.index`. Therefore, with this PR, we can do like this ```python src = 0 dst = 1 # Set src to current device torch.accelerator.set_device_index(src) with torch.accelerator.device_index(dst): # Inside with statement, we set dst to current device assert torch.accelerator.get_device_index() == dst # Here the current device should be src assert torch.accelerator.get_device_index() == src ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/148864 Approved by: https://github.com/albanD	2025-04-25 09:45:25 +00:00
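The behaviour described in the commit above (switch on entry, restore on exit, do nothing for `None`) can be sketched with `contextlib`. This is a torch-free illustration: the module-level `_current_device` variable and the three functions are hypothetical stand-ins for the accelerator runtime, not the implementation in the PR.

```python
from contextlib import contextmanager
from typing import Iterator, Optional

_current_device = 0  # hypothetical stand-in for the runtime's current device


def get_device_index() -> int:
    return _current_device


def set_device_index(index: int) -> None:
    global _current_device
    _current_device = index


@contextmanager
def device_index(index: Optional[int]) -> Iterator[None]:
    """Temporarily switch the current device; None means no switch."""
    if index is None:
        yield
        return
    previous = get_device_index()
    set_device_index(index)
    try:
        yield
    finally:
        # Restore the previous device even if the body raised.
        set_device_index(previous)


set_device_index(0)
with device_index(1):
    print("inside:", get_device_index())
print("outside:", get_device_index())
```

Accepting `None` lets callers pass an unset `torch.device.index` straight through without a separate branch, which is the compatibility point the commit message makes.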
PyTorch MergeBot	33808f0ebd	Revert "[Easy] The event_id of torch.cuda.Event and torch.xpu.Event always is 0 (#151226 )" This reverts commit 8e5fefedf4af3f31ccd05290c1b21eedf6a4ad1b. Reverted https://github.com/pytorch/pytorch/pull/151226 on behalf of https://github.com/malfet due to Reverting to unblock revert of https://github.com/pytorch/pytorch/pull/151404 ([comment](https://github.com/pytorch/pytorch/pull/151226#issuecomment-2819030735))	2025-04-21 17:07:49 +00:00
FFFrog	8e5fefedf4	[Easy] The event_id of torch.cuda.Event and torch.xpu.Event always is 0 (#151226 ) Although torch.cuda.Event and torch.xpu.Event have cuda_event and sycl_event fields respectively, the event_id exposed from the base class torch.Event is always 0, which can confuse users. The memory of torch.Event is not useful to torch.cuda.Event and torch.xpu.Event, but we still need to inherit from torch.Event because CPython will check it. Repro with cuda: ``` >>> import torch >>> event = torch.cuda.Event() >>> event.cuda_event 0 >>> event.event_id 0 >>> event.record() >>> event.cuda_event 127982096 >>> event.event_id 0 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/151226 Approved by: https://github.com/albanD	2025-04-19 10:42:00 +00:00
LuFengqing	0376bbf5b3	[XPU] skip a subprocess UT for Windows (#150999 ) This case creates a subprocess inside a subprocess. On Windows, the function cannot be loaded in this scenario, so the test is skipped. ``` File "C:\ProgramData\miniforge3\envs\lfq\lib\multiprocessing\spawn.py", line 116, in spawn_main exitcode = _main(fd, parent_sentinel) File "C:\ProgramData\miniforge3\envs\lfq\lib\multiprocessing\spawn.py", line 126, in _main self = reduction.pickle.load(from_parent) AttributeError: Can't get attribute 'run_model' on <module '__main__' (built-in)> Traceback (most recent call last): File "<string>", line 25, in <module> File "<string>", line 16, in test_multi_process AssertionError ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/150999 Approved by: https://github.com/guangyey, https://github.com/EikanWang Co-authored-by: Yu, Guangye <106960996+guangyey@users.noreply.github.com>	2025-04-18 08:55:47 +00:00
fengqing.lu	a106842ea8	[XPU] Fix XPU unit test on Windows (#150520 ) This PR is to resolve issue reported in https://github.com/intel/torch-xpu-ops/issues/1478 There are two cases failing in our Windows CI enabling. - test_xpu.py::TestXpuXPU::test_lazy_init_xpu Needs to add `if __name__ == '__main__':` for Windows when using multiprocess. Refer to https://stackoverflow.com/a/18205006 ``` RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase. This probably means that you are not using fork to start your child processes and you have forgotten to use the proper idiom in the main module: if __name__ == '__main__': freeze_support() ... The "freeze_support()" line can be omitted if the program is not going to be frozen to produce an executable. Traceback (most recent call last): File "C:\Users\sdp\lufengqing\torch-xpu-ops\test\xpu\xpu_test_utils.py", line 24, in <module> test_multi_process(model, input) File "C:\Users\sdp\lufengqing\torch-xpu-ops\test\xpu\xpu_test_utils.py", line 16, in test_multi_process assert p.exitcode == 0 AssertionError ``` - test_xpu.py::TestXpuXPU::test_wrong_xpu_fork_xpu is a linux only test case, we should skip it on Windows. Refer to `248487f455/test/test_multiprocessing.py (L609)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/150520 Approved by: https://github.com/guangyey, https://github.com/EikanWang	2025-04-08 07:02:40 +00:00
Stonepia	6c0e7463af	Fix test_device_memory_allocated (#147311 ) Fixes #147310 The `torch.ones` allocates memory and is released immediately, thus the following assertion will fail. This PR stores it into a temp variable to fix it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147311 Approved by: https://github.com/guangyey, https://github.com/Skylion007	2025-02-17 19:00:53 +00:00
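The failure mode in the commit above, where a result that is never bound to a name is released before the assertion runs, can be reproduced without torch. The sketch below uses a hypothetical `TrackedBuffer` class with a byte counter in place of a real tensor and allocator, and it relies on CPython's reference counting, which frees an unreferenced object immediately.

```python
class TrackedBuffer:
    """Hypothetical stand-in for a tensor whose memory is tracked."""

    live_bytes = 0  # plays the role of a memory_allocated() counter

    def __init__(self, nbytes: int) -> None:
        self.nbytes = nbytes
        TrackedBuffer.live_bytes += nbytes

    def __del__(self) -> None:
        TrackedBuffer.live_bytes -= self.nbytes


# Unbound temporary: CPython drops the last reference right away,
# so the counter is back at its old value on the next line.
TrackedBuffer(1024)
after_temporary = TrackedBuffer.live_bytes

# Bound to a name: the buffer stays alive while `tmp` exists.
tmp = TrackedBuffer(1024)
after_binding = TrackedBuffer.live_bytes

print(after_temporary, after_binding)
```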
PyTorch MergeBot	b80ecc4457	Revert "Fix poision child process issue when call getAccelerator() (#144368 )" This reverts commit 2583d831d40d6fa64f0b637d5bc7598e484a3283. Reverted https://github.com/pytorch/pytorch/pull/144368 on behalf of https://github.com/clee2000 due to broke internal tests D68023262, probably the same problem as noted in the issue this PR is mentioned above ([comment](https://github.com/pytorch/pytorch/pull/144368#issuecomment-2584848568))	2025-01-10 23:36:43 +00:00
Yu, Guangye	2583d831d4	Fix poision child process issue when call getAccelerator() (#144368 ) # Motivation fix https://github.com/pytorch/pytorch/issues/144152 # Solution - Align `at::globalContext()::hasXXX` to determine if accelerator XXX is built with PyTorch or an extension already registered to PyTorch. - Define `at::hasXXX` to determine if accelerator XXX is available at runtime. - Use `at::globalContext()::hasXXX` in `getAccelerator` rather than `at::hasXXX` to avoid initializing the XXX runtime (which can poison child processes) while detecting the current accelerator. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144368 Approved by: https://github.com/albanD, https://github.com/atalman, https://github.com/gujinghui	2025-01-10 09:28:27 +00:00
Yu, Guangye	6de110b862	Support with statement on torch.Stream (#140138 ) # Motivation We propose to support Python with statement on `torch.Stream`. This is a benefit for all accelerators when writing device-agnostic code. The device-specific stream will also be supported because they are generally derived from `torch.Stream`. With this PR, we can do like this ```python s1= torch.Stream() # Set s1 to the current stream torch.accelerator.set_stream(s1) with torch.Stream() as s2: # Inside with statement, we set s2 to the current stream assert torch.accelerator.current_stream() == s2 # Here the current stream should be s1 assert torch.accelerator.current_stream() == s1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/140138 Approved by: https://github.com/albanD	2025-01-10 02:05:19 +00:00
Yu, Guangye	07fa6e2c8b	Fix torch.accelerator api abort when passing invaild device (#143550 ) # Motivation Fix https://github.com/pytorch/pytorch/issues/143543 # Solution We should raise python exception instead of aborting... # Additional Context without this PR: ```python >>> import torch >>> torch.accelerator.current_stream(torch.accelerator.device_count()) terminate called after throwing an instance of 'c10::Error' what(): device is out of range, device is 2, total number of device is 2. Exception raised from check_device_index at /home/dvrogozh/git/pytorch/pytorch/c10/xpu/XPUFunctions.h:36 (most recent call first): frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xac (0x7f30707eb95c in /home/dvrogozh/git/pytorch/pytorch/torch/lib/libc10.so) frame #1: c10::detail::torchCheckFail(char const, char const, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xf3 (0x7f307078fc57 in /home/dvrogozh/git/pytorch/pytorch/torch/lib/libc10.so) frame #2: <unknown function> + 0x19a3e (0x7f3070c2ba3e in /home/dvrogozh/git/pytorch/pytorch/torch/lib/libc10_xpu.so) frame #3: c10::xpu::getCurrentXPUStream(signed char) + 0x2f (0x7f3070c2c83f in /home/dvrogozh/git/pytorch/pytorch/torch/lib/libc10_xpu.so) frame #4: <unknown function> + 0x1ca35 (0x7f3070c2ea35 in /home/dvrogozh/git/pytorch/pytorch/torch/lib/libc10_xpu.so) frame #5: <unknown function> + 0x653f15 (0x7f3083391f15 in /home/dvrogozh/git/pytorch/pytorch/torch/lib/libtorch_python.so) frame #6: <unknown function> + 0x39e5f2 (0x7f30830dc5f2 in /home/dvrogozh/git/pytorch/pytorch/torch/lib/libtorch_python.so) <omitting python frames> frame #20: <unknown function> + 0x29d90 (0x7f308b19bd90 in /lib/x86_64-linux-gnu/libc.so.6) frame #21: __libc_start_main + 0x80 (0x7f308b19be40 in /lib/x86_64-linux-gnu/libc.so.6) Aborted (core dumped) ``` with this PR: ```python >>> import torch >>> torch.accelerator.current_stream(torch.accelerator.device_count()) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/home/pt-gpu/4T-4652/guangyey/stock-pytorch/torch/accelerator/__init__.py", line 123, in current_stream return torch._C._accelerator_getStream(device_index) RuntimeError: The device index is out of range. It must be in [0, 2), but got 2. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/143550 Approved by: https://github.com/EikanWang, https://github.com/dvrogozh, https://github.com/albanD	2024-12-23 03:44:22 +00:00
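The fix above replaces a process abort with a catchable Python exception. A minimal torch-free sketch of that kind of range check follows. The helper name `check_device_index` is borrowed from the stack trace, but this pure-Python version is only an illustration and not the code added by the PR; the message format copies the `RuntimeError` text quoted in the commit.

```python
def check_device_index(index: int, device_count: int) -> int:
    """Validate a device index, raising instead of aborting the process."""
    if not 0 <= index < device_count:
        raise RuntimeError(
            "The device index is out of range. "
            f"It must be in [0, {device_count}), but got {index}."
        )
    return index


try:
    # Passing the device count itself is one past the last valid index.
    check_device_index(2, device_count=2)
except RuntimeError as err:
    print(err)
```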
Tom Ritchford	d8c8ba2440	Fix unused Python variables in test/[e-z]* (#136964 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136964 Approved by: https://github.com/justinchuby, https://github.com/albanD	2024-12-18 23:02:30 +00:00
Yu, Guangye	45ac4ebf15	[RELAND] Add UTs for accelerator device-agnostic runtime APIs (#133572 ) # Motivation This PR intends to add UTs for accelerator device-agnostic APIs. # Additional Context This PR is relanded. It is reverted because `torch.Event` doesn't support mps backend. We have fixed it in https://github.com/pytorch/pytorch/pull/142468. The previous commit is `952514f0c8` Pull Request resolved: https://github.com/pytorch/pytorch/pull/133572 Approved by: https://github.com/EikanWang, https://github.com/albanD ghstack dependencies: #143171	2024-12-16 02:18:41 +00:00
PyTorch MergeBot	1b3f8b7589	Revert "[RELAND] Add UTs for accelerator device-agnostic runtime APIs (#133572 )" This reverts commit 209119424922b135fef39aba1f25da3b67f5879a. Reverted https://github.com/pytorch/pytorch/pull/133572 on behalf of https://github.com/huydhn due to Sorry for reverting your change but the new test is still very flaky on MacOS even when it does not segfault anymore ([comment](https://github.com/pytorch/pytorch/pull/133572#issuecomment-2537256522))	2024-12-11 21:47:18 +00:00
Yu, Guangye	2091194249	[RELAND] Add UTs for accelerator device-agnostic runtime APIs (#133572 ) # Motivation This PR intends to add UTs for accelerator device-agnostic APIs. # Additional Context This PR is relanded. It is reverted because `torch.Event` doesn't support mps backend. We have fixed it in https://github.com/pytorch/pytorch/pull/142468. The previous commit is `952514f0c8` Pull Request resolved: https://github.com/pytorch/pytorch/pull/133572 Approved by: https://github.com/EikanWang, https://github.com/albanD ghstack dependencies: #142468	2024-12-11 02:04:52 +00:00
PyTorch MergeBot	a1c6cf7e9f	Revert "Add UTs for accelerator device-agnostic runtime APIs (#133572 )" This reverts commit 952514f0c8d8ff2e1719e0ca82b0d178a5c5ff45. Reverted https://github.com/pytorch/pytorch/pull/133572 on behalf of https://github.com/malfet due to Sorry for reverting your PR, but it segfaults on MacOS ([comment](https://github.com/pytorch/pytorch/pull/133572#issuecomment-2530354401))	2024-12-10 04:42:55 +00:00
Yu, Guangye	952514f0c8	Add UTs for accelerator device-agnostic runtime APIs (#133572 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133572 Approved by: https://github.com/EikanWang, https://github.com/albanD	2024-12-07 13:14:10 +00:00
Yu, Guangye	8dd4673cea	Support torch.xpu.mem_get_info API (#141230 ) # Motivation Fix https://github.com/pytorch/pytorch/issues/130599 This PR intends to add a new API, `torch.xpu.mem_get_info`, which is widely used in popular model workloads. For example, [here](`403c0714d1/src/accelerate/utils/modeling.py (L721)`) we need to get current GPU memory usage to split or load the model. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141230 Approved by: https://github.com/EikanWang, https://github.com/albanD	2024-12-05 08:17:25 +00:00
cyy	653efe14e4	[3/N] Enable UBSAN tests (#142022 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/142022 Approved by: https://github.com/ezyang	2024-12-05 06:06:53 +00:00
Yu, Guangye	b556549357	Use default context on Windows for Intel GPU (#138049 ) # Motivation Use default context in Windows to keep consistency with Linux. It makes it easy to interact with external libraries like `dlpack`. # Additional Context This PR depends on Intel GPU oneAPI 2025.0. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138049 Approved by: https://github.com/gujinghui	2024-11-28 02:49:46 +00:00
Yu, Guangye	a8482ab3a8	[Reland] Enable XPUEvent elapsed_time function (#140873 ) # Motivation This PR intends to reland https://github.com/pytorch/pytorch/pull/134666 that has been reverted in https://github.com/pytorch/pytorch/pull/140872 We reverted it because I forgot to support `elapsed_time` for `XPUGuardImpl`, which resulted in `c10::Event` not supporting `elapsed_time` and blocked XPU CI. # Additional Context We split https://github.com/pytorch/pytorch/pull/134666 into two parts: one part, PR #140865, supports `elapsed_time` for `torch.Event`, and the other, this PR, supports `torch.xpu.elapsed_time`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140873 Approved by: https://github.com/gujinghui ghstack dependencies: #140865	2024-11-28 02:41:11 +00:00
Yu, Guangye	b1a8be6b0a	Support torch.Event elapsed_time method on XPU (#140865 ) # Motivation This PR aims to support c10::Event/torch.Event elapsed_time method on XPU. We create a profiling tag Event when the timing flag is enabled. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140865 Approved by: https://github.com/Samkm0084, https://github.com/gujinghui	2024-11-28 02:41:11 +00:00
Yu, Guangye	1af69eee4a	Solid XPU UT test_memory_allocation (#141325 ) # Motivation Fix https://github.com/pytorch/pytorch/issues/141326 # Additional Context We use the previous value queried by these APIs as the reference value rather than 0. With this PR, we don't depend on the Python garbage collection mechanism anymore. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141325 Approved by: https://github.com/EikanWang	2024-11-22 13:14:49 +00:00
Yu, Guangye	62d2c5b667	Revert "Enable XPUEvent elapsed_time function (#134666 )" (#140872 ) # Motivation This PR raises an internal UT failure on XPU. This reverts commit 4bbd6da33101a8d709f1d2921ad8ae6f9b0dc166. # Additional Context refer to https://github.com/pytorch/pytorch/issues/140814 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140872 Approved by: https://github.com/EikanWang	2024-11-18 02:58:05 +00:00
xinan.lin	8d3a07e321	[Inductor UT] Skip test_decompose_mem_bound_mm.py for XPU since we have not enabled decompose_mem_bound_mm for XPU. (#140517 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/140517 Approved by: https://github.com/EikanWang, https://github.com/jansel	2024-11-14 03:36:20 +00:00
Yu, Guangye	4bbd6da331	Enable XPUEvent elapsed_time function (#134666 ) # Motivation This PR aims to enable `elapsed_time` function for `XPUEvent`. # Additional Context This PR depends on toolchain oneAPI 2025.0. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134666 Approved by: https://github.com/EikanWang, https://github.com/ezyang	2024-11-13 04:32:50 +00:00
Yu, Guangye	659d2132be	Add architecture to XPU device property (#138186 ) # Motivation Add `architecture` to XPU device property. In some cases, low-level application code can use special features or do specific optimizations depending on the device architecture, and this PR enables such applications. Modified from https://github.com/pytorch/pytorch/pull/129675/files Pull Request resolved: https://github.com/pytorch/pytorch/pull/138186 Approved by: https://github.com/ezyang	2024-11-13 03:35:13 +00:00
Yu, Guangye	052b67e2b4	Add torch.version.xpu (#139466 ) # Motivation We add a new attribute `torch.version.xpu` to facilitate the problem diagnosing and version control. # Additional Context It is aligned with `torch.version.cuda` and `torch.version.hip`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139466 Approved by: https://github.com/EikanWang, https://github.com/ezyang, https://github.com/atalman, https://github.com/malfet ghstack dependencies: #139258	2024-11-09 13:31:21 +00:00
Yu, Guangye	8cda774a03	Add torch.xpu.get_arch_list and torch.xpu.get_gencode_flags for XPU (#137773 ) # Motivation Add `torch.xpu.get_arch_list()` and `torch.xpu.get_gencode_flags()` methods that return architecture list and AOT flags to preserve what flags PyTorch XPU was built with. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137773 Approved by: https://github.com/EikanWang, https://github.com/albanD	2024-10-18 02:28:08 +00:00
Yu, Guangye	df5bbc09d1	Make device-specific event inherits from torch.Event (#134845 ) # Motivation This PR intends to make device-specific Event inherit from the generic torch.Event. The benefit is providing a generic abstract class `torch.Event` for different devices, like `torch.Stream`. This makes it easier for Dynamo to capture the Event of different devices, like torch.cuda.Event and torch.xpu.Event. The next PR will remove the now-unused base classes `_StreamBase` and `_EventBase` to avoid multiple inheritance. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134845 Approved by: https://github.com/albanD, https://github.com/EikanWang	2024-10-01 06:28:41 +00:00
Yu, Guangye	e6b68359d7	Fix xpu memory stats error (#135818 ) # Motivation fix https://github.com/pytorch/pytorch/issues/135726 After merging two free blocks, I made a stupid mistake of ignoring the correct size to decrease the active memory size, which should be the original block size instead of the merged block size. # Additional Context Add a UT to guard this scenario. Pull Request resolved: https://github.com/pytorch/pytorch/pull/135818 Approved by: https://github.com/EikanWang	2024-09-13 02:41:21 +00:00
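The bookkeeping rule stated in the commit above is that, when a freed block is merged with a free neighbour, the active-memory counter must drop by the original block size and not by the merged size. The toy model below illustrates it; `Block`, `ToyStats`, and `free_block` are hypothetical names and this is not the XPU caching allocator.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    size: int
    free: bool = False


@dataclass
class ToyStats:
    active_bytes: int = 0


def free_block(stats: ToyStats, block: Block,
               free_neighbor: Optional[Block] = None) -> Block:
    """Free `block`, merging it with an adjacent free block if one is given."""
    original_size = block.size  # capture before merging changes block.size
    block.free = True
    if free_neighbor is not None and free_neighbor.free:
        block.size += free_neighbor.size
    # Only the bytes that were active leave the counter. Subtracting the
    # merged size here would undercount active memory.
    stats.active_bytes -= original_size
    return block


stats = ToyStats(active_bytes=512)
merged = free_block(stats, Block(512), Block(256, free=True))
print(merged.size, stats.active_bytes)
```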
Yu, Guangye	b53d97c7be	[Intel GPU] Add XPU memory-related APIs (#129919 ) # Motivation According to https://github.com/pytorch/pytorch/issues/116322, we will help unify the device allocator. So we introduce a simple xpu device allocator only with the key functionality first. And expect to add some memory statistics-related functionality after the unification. But now, some memory statistic-related APIs listed in https://github.com/pytorch/pytorch/issues/127929 are requested. We need more time to unify the device allocator. In order to facilitate the user experience, we expect to support these memory statistic-related APIs before the unification. # Additional Context Fixes: #127929 Pull Request resolved: https://github.com/pytorch/pytorch/pull/129919 Approved by: https://github.com/dvrogozh, https://github.com/abhilash1910, https://github.com/gujinghui, https://github.com/EikanWang, https://github.com/albanD ghstack dependencies: #130923	2024-09-07 11:15:17 +00:00
FFFrog	80a6d60829	Moving _run_autocast_outofplace to basic class named TestAutocast to reduce redundance (#134460 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/134460 Approved by: https://github.com/EikanWang, https://github.com/ezyang	2024-09-04 10:48:58 +00:00
Yu, Guangye	fbd020fce6	Add new prop to _XpuDevicePropertie for triton gemm optimization (#131738 ) # Motivation This PR aims to add new properties to `_XpuDevicePropertie` for triton gemm optimization. # Additional Context `ext_oneapi_supports_cl_extension` is not an ABI-neutral API. It depends on compiler 2025.0. For more details, see https://github.com/intel/llvm/pull/13212 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131738 Approved by: https://github.com/gujinghui	2024-08-18 08:32:30 +00:00
Xuehai Pan	ba48cf6535	[BE][Easy][6/19] enforce style for empty lines in import segments in `test/` (#129757 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129757 Approved by: https://github.com/ezyang	2024-07-17 06:42:37 +00:00
Yu, Guangye	78a0b010eb	Refine XPU UTs (#130138 ) # Motivation 1. enable all test cases related to `TestXpu` running in XPU CI. 2. make `test_lazy_init` stable. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130138 Approved by: https://github.com/EikanWang	2024-07-05 09:56:22 +00:00
Yu, Guangye	98d34d849d	Add a XPU UT to ensure lazy init (#129638 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129638 Approved by: https://github.com/gujinghui	2024-06-28 13:22:17 +00:00

67 Commits