This PR extracts some test cases from TestPatternMatcher into a newly created TestPatternMatcherGeneric, and uses instantiate_device_type_tests to make them reusable across multiple devices.
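For context, here is a minimal, hypothetical sketch of the pattern this PR applies: a device-generic test class whose methods take a `device` argument and are instantiated per backend via `instantiate_device_type_tests`. The class and test names below are illustrative, not the actual tests moved by this PR.
```
import torch
from torch.testing._internal.common_device_type import instantiate_device_type_tests
from torch.testing._internal.common_utils import TestCase, run_tests

class TestPatternMatcherGenericExample(TestCase):
    # Device-generic test: the framework injects `device` per backend.
    def test_mm_shape(self, device):
        a = torch.randn(4, 8, device=device)
        b = torch.randn(8, 16, device=device)
        self.assertEqual((a @ b).shape, (4, 16))

# Generates per-device classes such as TestPatternMatcherGenericExampleCPU
# or TestPatternMatcherGenericExampleCUDA, depending on available backends.
instantiate_device_type_tests(TestPatternMatcherGenericExample, globals())

if __name__ == "__main__":
    run_tests()
```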
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150286
Approved by: https://github.com/jansel
**Summary**
This is part of the work to enable max-autotune with the GEMM template for WoQ INT4 GEMM on CPU.
This PR adds a lowering pass for `torch.ops.aten._weight_int4pack_mm_for_cpu`, the op used for WoQ INT4 in Torchao. The lowering pass is a prerequisite for max-autotune, which is planned to be enabled for this op in subsequent PRs.
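As a rough illustration only (Inductor's lowering internals are private and change frequently, and this is not the code from this PR), registering a lowering for an ATen op typically looks like the sketch below. A real max-autotune-ready lowering would build Inductor IR or a template choice rather than a plain fallback.
```
import torch
from torch._inductor.lowering import make_fallback, register_lowering

aten = torch.ops.aten

# Simplest possible "lowering": route the op through Inductor's generic
# fallback kernel. This is often the starting point before a dedicated
# lowering or an autotuned GEMM template exists.
make_fallback(aten._weight_int4pack_mm_for_cpu)

# A dedicated lowering would instead be registered roughly like this
# (body omitted; the real implementation constructs Inductor IR for the op):
# @register_lowering(aten._weight_int4pack_mm_for_cpu, type_promotion_kind=None)
# def int4pack_mm_cpu(x, w, q_group_size, q_scale_and_zeros):
#     ...
```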
**Test plan**
```
python test/inductor/test_mkldnn_pattern_matcher.py -k test_woq_int4
python test/inductor/test_cpu_cpp_wrapper.py -k test_woq_int4
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145250
Approved by: https://github.com/leslie-fang-intel, https://github.com/jerryzh168
ghstack dependencies: #145245
**Summary**
Enable the CPP Grouped GEMM Fusion, lowering and Grouped GEMM Template following the RFC: https://github.com/pytorch/pytorch/issues/144012
- Support a flexible number of GEMMs
- Share the activation across GEMMs
  - The Grouped GEMM template supports independent activations
  - However, the pattern matcher requires an anchor node, which is the shared activation across GEMMs
- Each GEMM can have a unique weight, but all weights must have the same sizes
- Each GEMM can have a unique bias or no bias
  - The current PR does not yet support biases; this will be addressed in a follow-up epilogue fusion PR
- Each GEMM can have its own epilogues
  - Epilogue fusion is not yet supported in this PR and will be enabled in an upcoming follow-up PR
**Test Plan**
```
python -u -m pytest -s -v test/inductor/test_cpu_select_algorithm.py -k test_grouped_linear
python -u -m pytest -s -v test/inductor/test_cpu_select_algorithm.py -k test_grouped_linear_invalid
python -u -m pytest -s -v test/inductor/test_cpu_cpp_wrapper.py -k test_grouped_linear
```
**Example**
Here is an example; the generated code is linked below.
```
import torch

batch_size = 4
in_features = 512
out_features = 1024
dtype = torch.bfloat16
bias = False  # biases are not yet supported by the Grouped GEMM template

class M(torch.nn.Module):
    def __init__(self, bias):
        super().__init__()
        self.linear0 = torch.nn.Linear(in_features, out_features, bias=False)
        self.linear1 = torch.nn.Linear(in_features, out_features, bias=False)

    def forward(self, x):
        return self.linear0(x), self.linear1(x)

if __name__ == "__main__":
    with torch.no_grad():
        input = torch.randn(batch_size, in_features, dtype=dtype)
        m = M(bias=bias).to(dtype=dtype).eval()
        cm = torch.compile(m)
        act_res = cm(input)
```
Generated Code: https://gist.github.com/leslie-fang-intel/ed2e8d23aeb3586eb504feeace692e16#file-grouped-gemm-generated-code-py
**Next Step**
- Support Epilogue fusion
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143796
Approved by: https://github.com/jgong5, https://github.com/jansel
This PR adds a C shim for `QConvPointWisePT2E` and `QConvPointWiseBinaryPT2E`, similar to https://github.com/pytorch/pytorch/pull/138439. In addition, we aligned the implementation of `qconv_pointwise` with `qlinear_pointwise` in the following aspects:
1. The parameter orders of `qconv_pointwise` and `qlinear_pointwise` were quite different, so we aligned the schema of `qconv_pointwise` to follow a parameter order similar to `qlinear_pointwise`, making the two more consistent.
2. We now always convert `x_scale` and `x_zero_point` to Tensors, just like in the lowering of `qlinear_pointwise`. This avoids the need to create two separate C APIs (one for `double x_scale` and `int64_t x_zero_point`, and another for `Tensor` versions); instead, we only need one API for `Tensor`-based `x_scale` and `x_zero_point`. If we later add dynamic quantization for qconv (which will use `Tensor` for `x_scale` and `x_zero_point`), we can reuse the code from this PR without changing the C shim layer API.
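The scalar-to-Tensor normalization can be pictured with the small, hypothetical helper below; the function name and exact dtypes are illustrative, not the PR's actual lowering code. It shows why a single Tensor-based C-shim entry point is enough.
```
import torch

def _normalize_x_qparams(x_scale, x_zero_point):
    # Accept Python scalars or Tensors and always hand Tensors to the op,
    # mirroring what the qlinear_pointwise lowering already does. With this,
    # one Tensor-based C shim API covers both the static case (scalars known
    # at lowering time) and a future dynamic-quantization case.
    if not isinstance(x_scale, torch.Tensor):
        x_scale = torch.tensor(x_scale, dtype=torch.float64)
    if not isinstance(x_zero_point, torch.Tensor):
        x_zero_point = torch.tensor(x_zero_point, dtype=torch.int64)
    return x_scale, x_zero_point
```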
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138540
Approved by: https://github.com/jgong5, https://github.com/desertfire
ghstack dependencies: #138691, #138806
Fixes the following error when running WOQ-INT8 LLaMA:
```
E In file included from /home/user/inductor/pytorch/torch/include/torch/csrc/inductor/aoti_runtime/arrayref_tensor.h:3,
E from /tmp/torchinductor_user/sw/csw5gfmlzp5iooqvfwl2gwn574frwdpmtrx2y6nu2m6x76d3xcux.cpp:4:
E /tmp/torchinductor_user/sw/csw5gfmlzp5iooqvfwl2gwn574frwdpmtrx2y6nu2m6x76d3xcux.cpp: In function ‘void inductor_entry_impl(AtenTensorOpaque**, AtenTensorOpaque**)’:
E /tmp/torchinductor_user/sw/csw5gfmlzp5iooqvfwl2gwn574frwdpmtrx2y6nu2m6x76d3xcux.cpp:117:33: error: ‘aoti_torch_cpu__weight_int8pack_mm’ was not declared in this scope
E 117 | AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_cpu__weight_int8pack_mm(convert_arrayref_tensor_to_tensor(arg8_1), _frozen_param0, _frozen_param1, &buf0_handle));
E | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138691
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5, https://github.com/desertfire
The test runs all of its 512 combinations sequentially, so it takes more than 30 minutes to finish, or times out on ASAN after one hour. Parametrizing it breaks it up so that individual tests can finish and no longer need to be marked as slow.
The test also seems to run out of memory on a 2xlarge runner with a `std::bad_alloc` error; this change may fix that issue as well (pending CI testing).
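A minimal, hypothetical sketch of the splitting approach (the class, test, and parameter names are made up for illustration): each parameter combination becomes its own test case, so each one gets its own timeout and slow-test classification.
```
import torch
from torch.testing._internal.common_utils import (
    TestCase,
    instantiate_parametrized_tests,
    parametrize,
    run_tests,
)

class ExampleParametrizedTests(TestCase):
    # Instead of looping over all combinations inside a single long test,
    # each (dtype, bias) pair is generated as a separate test method.
    @parametrize("dtype", [torch.float32, torch.bfloat16])
    @parametrize("bias", [True, False])
    def test_linear(self, dtype, bias):
        m = torch.nn.Linear(8, 4, bias=bias).to(dtype=dtype)
        x = torch.randn(2, 8, dtype=dtype)
        self.assertEqual(m(x).shape, (2, 4))

instantiate_parametrized_tests(ExampleParametrizedTests)

if __name__ == "__main__":
    run_tests()
```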
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137447
Approved by: https://github.com/albanD, https://github.com/malfet
Summary:
Tests in test_mkldnn_pattern_matcher.py can take too long to finish. Split them into smaller tests using `parametrize`.
This test file likely has further refactoring opportunities as well; a next step would be to parametrize the add functions.
Differential Revision: D63723925
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137153
Approved by: https://github.com/desertfire
Summary: Fixed a bunch of fbcode imports that happened to work but confused autodeps. After this autodeps still suggests "improvements" to TARGETS (which breaks our builds) but at least it can find all the imports.
Test Plan:
```
fbpython fbcode/tools/build/buck/linters/lint_autoformat.py --linter=autodeps --default-exec-timeout=1800 -- fbcode/caffe2/TARGETS fbcode/caffe2/test/TARGETS
```
Before:
```
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "test_export" (from caffe2/test/export/testing.py:229) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See https://fbur$
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "testing" (from caffe2/test/export/test_export.py:87) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See https://fburl$
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "test_export" (from caffe2/test/export/test_serdes.py:9) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See https://fb$
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "testing" (from caffe2/test/export/test_serdes.py:10) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See https://fburl$
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "testing" (from caffe2/test/export/test_retraceability.py:7) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See https:$
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "test_export" (from caffe2/test/export/test_retraceability.py:6) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See ht$
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "testing" (from caffe2/test/export/test_export_nonstrict.py:7) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See http$
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "test_export" (from caffe2/test/export/test_export_nonstrict.py:6) when processing rule "test_export". Please make sure it's listed in the srcs parameter of another rule. See $
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "test_export" (from caffe2/test/export/test_export_training_ir_to_run_decomp.py:8) when processing rule "test_export". Please make sure it's listed in the srcs parameter of an$
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "testing" (from caffe2/test/export/test_export_training_ir_to_run_decomp.py:10) when processing rule "test_export". Please make sure it's listed in the srcs parameter of anoth$
ERROR while processing caffe2/test/TARGETS: Found "//python/typeshed_internal:typeshed_internal_library" owner for "cv2" but it is protected by visibility rules: [] (from caffe2/test/test_bundled_images.py:7) when processing rule "test_bundled_$
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "caffe2.test.profiler_test_cpp_thread_lib" (from caffe2/test/profiler/test_cpp_thread.py:29) when processing rule "profiler_test_cpp_thread". Please make sure it's listed in t$
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "torch._utils_internal.get_file_path_2" (from caffe2/test/test_custom_ops.py:23) when processing rule "custom_ops". Please make sure it's listed in the srcs parameter of anoth$
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "torch._utils_internal.get_file_path_2" (from caffe2/test/test_public_bindings.py:13) when processing rule "public_bindings". Please make sure it's listed in the srcs paramete$
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "torch._C._profiler.symbolize_tracebacks" (from caffe2/test/test_cuda.py:3348) when processing rule "test_cuda". Please make sure it's listed in the srcs parameter of another $
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for "torch._C._profiler.gather_traceback" (from caffe2/test/test_cuda.py:3348) when processing rule "test_cuda". Please make sure it's listed in the srcs parameter of another rule$
ERROR while processing caffe2/test/TARGETS: Cannot find an owner for include <torch/csrc/autograd/profiler_kineto.h> (from caffe2/test/profiler/test_cpp_thread.cpp:2) when processing profiler_test_cpp_thread_lib. Some things to try:
```
Differential Revision: D62049222
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135614
Approved by: https://github.com/oulgen, https://github.com/laithsakka
Summary: For S444023
Test Plan:
The revert prevented the NaN errors: f639391901
The training job ran for 7767 iterations; the NaN errors previously showed up within the first 1k.
Reviewed By: nmacchioni
Differential Revision: D62224747
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135186
Approved by: https://github.com/kit1980
Summary:
When a user sets config.profiler_mark_wrapper_call, RECORD_FUNCTION annotations are added to the code. This requires importing the header <ATen/record_function.h>, but the conditional for doing so didn't check
config.profiler_mark_wrapper_call.
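For reference, a minimal sketch of the configuration that exercises this path, assuming the C++ wrapper is enabled; the flags are set via `torch._inductor.config`, and the toy function is arbitrary.
```
import torch
import torch._inductor.config as inductor_config

# Enabling both flags makes Inductor emit RECORD_FUNCTION annotations in the
# C++ wrapper code, which is what requires <ATen/record_function.h>.
inductor_config.cpp_wrapper = True
inductor_config.profiler_mark_wrapper_call = True

def f(x):
    return x.sin() + 1

compiled = torch.compile(f)
out = compiled(torch.randn(8))
```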
Test Plan:
This case is already covered in test_profiler_mark_wrapper_call.
```
(pytorch-3.10) [gabeferns@devvm2252.cco0 ~/pytorch (missing-profile-include)]$ TORCHINDUCTOR_ABI_COMPATIBLE=1 TORCHINDUCTOR_CPP_WRAPPER=1 python test/inductor/test_torchinductor.py -k CpuTests.test_profiler_mark_wrapper_call_cpu
stats [('calls_captured', 1), ('unique_graphs', 1)]
inductor [('fxgraph_cache_miss', 1)]
aot_autograd [('total', 1), ('ok', 1)]
.
----------------------------------------------------------------------
Ran 1 test in 8.080s
OK
```
Fixes https://github.com/pytorch/pytorch/issues/131339
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132419
Approved by: https://github.com/jgong5, https://github.com/desertfire
Fix the compilation error:
```cpp
/tmp/tmpywg34bca/tg/ctg7wbli6pvydsjr2xsxamdbamkquhlincuky3dzopa3ilrxqdwt.cpp:401:24: error: cannot convert ‘at::Tensor’ to ‘const bfloat16*’ {aka ‘const c10::BFloat16*’}
401 | cpp_fused_div_mm_0(arg2_1, constant2, _frozen_param1, buf1);
| ^~~~~~
| |
| at::Tensor
```
The generated code after the fix will be:
```cpp
cpp_fused_div_mm_0((bfloat16*)(arg2_1.data_ptr()), (bfloat16*)(constant2.data_ptr()), (bfloat16*)(_frozen_param1.data_ptr()), (bfloat16*)(buf1.data_ptr()));
```
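For context, a hedged sketch of the kind of model that yields such a fused div+mm kernel under the C++ wrapper with freezing; the shapes, names, and flags here are illustrative, not the original reproducer.
```
import torch
import torch._inductor.config as inductor_config

inductor_config.cpp_wrapper = True
inductor_config.freezing = True  # turns the weight into a frozen constant

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(64, 32, bias=False, dtype=torch.bfloat16)

    def forward(self, x, scale):
        # A division followed by a matmul against the frozen weight, the
        # pattern behind the cpp_fused_div_mm_* kernel shown above.
        return self.linear(x / scale)

m = M().eval()
with torch.no_grad():
    compiled = torch.compile(m)
    out = compiled(
        torch.randn(8, 64, dtype=torch.bfloat16),
        torch.tensor(2.0, dtype=torch.bfloat16),
    )
```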
Multiple changes are required for ABI-compatible mode; those are separated into a follow-up PR in this ghstack: https://github.com/pytorch/pytorch/pull/131841
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129557
Approved by: https://github.com/leslie-fang-intel
https://github.com/pytorch/pytorch/pull/126717 will skip the tests in both ABI-compatible and non-ABI-compatible mode.
They are not expected to be skipped in non-ABI-compatible mode, since they run successfully in that mode and only have issues in ABI-compatible mode.
We leverage the existing `xfail_list` for tests that fail only in ABI-compatible mode (a schematic sketch of this mechanism follows the list below).
- `test_qlinear_add` is already in the `xfail_list`.
- `test_linear_packed` doesn't fail in my local run (with `TORCHINDUCTOR_ABI_COMPATIBLE=1`) or in the CI of this PR, so I didn't add it to the `xfail_list`.
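As referenced above, a schematic sketch of the xfail-list idea; the names and structure are hypothetical, not the actual test-file internals. Listed tests are expected to fail only when ABI-compatible mode is on, instead of being skipped in both modes.
```
import unittest

# Tests known to fail only in ABI-compatible mode.
xfail_list = ["test_qlinear_add"]

def maybe_mark_xfail(test_name, test_func, abi_compatible):
    # In ABI-compatible mode, mark as expected-failure instead of skipping;
    # in non-ABI-compatible mode the test still runs and must pass.
    if abi_compatible and test_name in xfail_list:
        return unittest.expectedFailure(test_func)
    return test_func
```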
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128506
Approved by: https://github.com/jgong5, https://github.com/desertfire
Summary: Found during testing with remote caching: use the same output logger object between graph.py and codecache.py, since it is patched in `run_and_get_cpp_code`. That allows us to capture any logging produced from the codecache path when using `run_and_get_cpp_code`. I'm also fixing a few tests that were passing mistakenly because logging was missing.
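A hedged usage sketch of `run_and_get_cpp_code`, the helper whose captured output this change fixes; the toy function is arbitrary, and the helper itself lives in `torch._inductor.utils`.
```
import torch
import torch._inductor.config as inductor_config
from torch._inductor.utils import run_and_get_cpp_code

@torch.compile
def f(x):
    return x * 2 + 1

# Returns both the result and the generated C++ wrapper source, collected
# through the (now shared) output logger.
with inductor_config.patch(cpp_wrapper=True):
    result, cpp_code = run_and_get_cpp_code(f, torch.randn(4))
print(cpp_code[:200])
```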
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128794
Approved by: https://github.com/oulgen, https://github.com/leslie-fang-intel