Fixes #160520
Summary:
When running Inductor with cpp_wrapper under a DeviceContext, non-tensor arguments were being wrapped with torch.tensor(arg) without specifying a device. This created the tensor on the currently active device (e.g. CUDA), and the value was later fetched back to the CPU via .item(), causing unnecessary host-device-host memory transfers.
This PR fixes the issue by explicitly creating scalar tensors on the CPU:
```
input_tensors = [
    arg if isinstance(arg, torch.Tensor) else torch.tensor(arg, device="cpu")
    for arg in args
]
```
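For illustration, a minimal standalone sketch of the round trip this avoids (assuming a CUDA build; the DeviceContext is modeled here with the torch.device context manager):
```
import torch

if torch.cuda.is_available():
    with torch.device("cuda"):
        t = torch.tensor(3.0)                # allocated on the active device (CUDA)
        assert t.device.type == "cuda"
        _ = t.item()                         # device-to-host copy back to CPU

        t = torch.tensor(3.0, device="cpu")  # pinned to CPU: no round trip
        assert t.device.type == "cpu"
        _ = t.item()                         # plain host read
```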
impact: inductor, codegen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160584
Approved by: https://github.com/benjaminglass1, https://github.com/desertfire, https://github.com/mlazos, https://github.com/jeffdaily
The environment variable PYTORCH_TESTING_DEVICE_ONLY_FOR controls the devices used in get_desired_device_type_test_bases, so we add RUN_CPU and RUN_GPU flags to make sure cases are only enabled for the devices specified by PYTORCH_TESTING_DEVICE_ONLY_FOR. For example, enable only the GPU cases and skip the CPU cases even when HAS_CPU is true.
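Roughly, the gating behaves like the following sketch (the helper name and flag wiring are illustrative, not the exact code in common_device_type.py):
```
import os

# Devices explicitly requested, e.g. PYTORCH_TESTING_DEVICE_ONLY_FOR="cuda"
only_for = os.environ.get("PYTORCH_TESTING_DEVICE_ONLY_FOR", "")
requested = {d.strip() for d in only_for.split(",") if d.strip()}

def device_enabled(device: str, available: bool) -> bool:
    # A device's cases run only if the device is available on the machine
    # and, when the env var is set, it appears in the requested list.
    return available and (not requested or device in requested)

HAS_CPU, HAS_GPU = True, True                # stand-ins for the real capability checks
RUN_CPU = device_enabled("cpu", HAS_CPU)     # False when only "cuda" is requested
RUN_GPU = device_enabled("cuda", HAS_GPU)
```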
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149023
Approved by: https://github.com/jansel, https://github.com/cyyever
Longer term it would be good to add this as a feature to cpp_wrapper, but this change makes sure it doesn't fail on main.
Not sure if this needs a test, since it's not meant to compose, but I will add one if necessary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145538
Approved by: https://github.com/desertfire
This PR adds initial max-autotune support for XPU. The current Triton templates and configurations are not well optimized for XPU, so performance is not ready yet, and the `mm_plus_mm` template has accuracy issues in some cases. We will address these issues in follow-up PRs.
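For context, max-autotune is enabled through the standard torch.compile mode flag; a minimal sketch of exercising it on an XPU device (assuming a PyTorch build with XPU support; substitute "cuda" or "cpu" on other builds):
```
import torch

device = "xpu" if torch.xpu.is_available() else "cpu"

def f(a, b):
    return a @ b

# mode="max-autotune" lets Inductor benchmark Triton templates/configs
# and pick the fastest kernel for each op.
compiled = torch.compile(f, mode="max-autotune")
a = torch.randn(256, 256, device=device)
b = torch.randn(256, 256, device=device)
out = compiled(a, b)
```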
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143266
Approved by: https://github.com/EikanWang, https://github.com/jansel