pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-21 05:34:18 +08:00

Author	SHA1	Message	Date
angelayi	3c8c509a9c	[export] Fix custom ops in subgraphs (#160004 ) Fixes https://github.com/pytorch/pytorch/issues/159995 Currently there are two problems with extern kernels in subgraphs: 1. They don't get serialized to the extern kernel json file because we only look at the toplevel graph. 2. Since the scope of each extern_kernel list is within its own subgraph, the indices referencing the operators are wrong because each subgraph starts counting from 0. So, this PR moves the extern_kernels list to a global view (under virtualized) so that we can count the extern kernels across subgraphs and the toplevel graph. Pull Request resolved: https://github.com/pytorch/pytorch/pull/160004 Approved by: https://github.com/ydwu4	2025-08-18 15:42:19 +00:00
Xu Han	aacb944079	[aot inductor] fix clang-asan for consts_cpp. (#158175 ) In the previous PR, https://github.com/pytorch/pytorch/pull/157608 , I added `format_consts_to_cpp` to build consts bytes. But it still raises a clang ASAN `stack allocation` error when building large consts. This PR: 1. adds `test_aot_inductor_consts_cpp_build` to the stack allocation skip list. 2. adds ATTRIBUTE_NO_SANITIZE_ADDRESS to skip the ASAN check, because the consts array is located in the global area. Pull Request resolved: https://github.com/pytorch/pytorch/pull/158175 Approved by: https://github.com/jansel	2025-07-12 07:14:05 +00:00
Catherine Lee	32c1611263	[CI][run_test] Fix rerun logic for failing at exit (#155853 ) Sometimes a test file reports success according to pytest, but fails afterwards, and the rerun logic doesn't handle that correctly. The name of the last run test is saved in order to do more efficient reruns (target the last run test for a rerun without rerunning the entire file). This is usually correct, e.g. test fails and pytest catches it -> lastrun = the test that failed, test segfaults (pytest doesn't catch) -> lastrun is the test that segfaulted. But sometimes pytest reports a success, but the process has a nonzero exit code. The two cases I know of are hangs and double freeing at exit. In this case, it's unclear which test caused the failure, so lastrun is set to be the first test that ran in that session, so that during the next session it will start from the beginning in an attempt to replicate the error (an alternate solution would be to just fail and not rerun, which might be the better option). But then it reruns with runsingle, which prevents lastrun from being reset (not sure why, I'm pretty sure there's no difference between resetting and not normally), so lastrun becomes the last test that ran, and it's not always true that lastrun is the one that caused it. Then on the next run, it starts from the last test and the process now exits cleanly. Short term solution here: ensure the lastrun is always set to the initial value if the session succeeds. This is correct even in the normal path because the initial value shouldn't change in that case. Things that still need to be fixed: * log says "running single test" which is not true * no xml reports get generated here * also no xml reports get generated on segfault * docs for this. I think I have a PR that fixes the above but it's old so I need to take another look. Testing: This is from when I was based on a commit that had a hang for macs, and before I added the skips in inductor array ref: `cc862d2c14` Pull Request resolved: https://github.com/pytorch/pytorch/pull/155853 Approved by: https://github.com/malfet	2025-06-17 17:51:40 +00:00
Shangdi Yu	efdcc981d0	Back out "Do not propagate real tensor in extern kernel" (#151813 ) Summary: D73002775 breaks aot_compile for many draft exported models on PT2I dashboard. Revert. Example error msg: ``` OrderedSet([]) >= OrderedSet([u1185, u1186, u1187]) (inductor >= fx) fx node is: %embedding_bag_byte_prepack : [num_users=4] = call_function[target=torch.ops.quantized.embedding_bag_byte_prepack.default](args = (%view_10,), kwargs = {}) new operations are: ``` Differential Revision: D73381032 Pull Request resolved: https://github.com/pytorch/pytorch/pull/151813 Approved by: https://github.com/angelayi, https://github.com/zou3519	2025-04-21 22:54:03 +00:00
Shangdi Yu	931bd05560	Do not propagate real tensor in extern kernel (#151377 ) Summary: See internal Diff for more details. In ExternKernel, the FakeTensors do not have associated real tensors, because they are just created from ir.Node's shape and stride. Test Plan: ``` buck run fbcode//mode/dev-nosan //caffe2/test/inductor:test_aot_inductor -- -r aoti_data_dependent_ex buck2 run mode/dev-nosan fbcode//caffe2/test/inductor:aot_inductor_arrayref_cpu -- -r data_dependent_extern_kernel_op ``` Differential Revision: D73002775 Pull Request resolved: https://github.com/pytorch/pytorch/pull/151377 Approved by: https://github.com/angelayi	2025-04-18 17:28:13 +00:00
Mu-Chu Lee	e567900998	[AOTInductor] Activate CPU test for update_constant_buffer (#149162 ) Summary: Fixed by #145459 Test Plan: Re-activating tests. Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/149162 Approved by: https://github.com/chenyang78, https://github.com/jingsh	2025-03-14 04:09:57 +00:00
Yidi Wu	923ce10f6c	[while_loop] require stride to be the same as input for body_fn (#148002 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/148002 Approved by: https://github.com/zou3519	2025-03-12 17:15:10 +00:00
Yidi Wu	982d7ba3ef	[while_loop][inductor] relax the constraint that all inputs must be on the same device (#148019 ) Previously, we required all inputs of while_loop to be on the same device. However, there are use cases where we want to keep some of the inputs on cpu while others are on gpu, e.g. a loop_idx on cpu will save the gpu to device copies. This PR relaxes the constraint and only checks that the carry and input at the same position have the same device. Pull Request resolved: https://github.com/pytorch/pytorch/pull/148019 Approved by: https://github.com/eellison, https://github.com/jansel	2025-02-28 18:27:03 +00:00
Yidi Wu	2d2f60bdda	[cond] support mismatched output in inductor (#147567 ) In this PR, we extract `codegen_unbacked_symbol_defs` of FallbackKernel out as a `codegen_unbacked_symbol_defs_for_outputs` method in wrapper. With it, HOPs can support the case where the subgraph returns a tensor with unbacked symints. This PR only does this for cond; we'll have follow-up PRs for others (e.g. while_loop) as well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147567 Approved by: https://github.com/jansel	2025-02-28 18:26:48 +00:00
Yidi Wu	8f073065d5	[while_loop][inductor] support sym expression as cond_fn output (#146222 ) As titled. Previously, we only supported tensor output of cond_fn; this PR also allows a shape expr to be returned in cond_fn. aoti generated output code looks like: ``` V0203 11:28:05.750000 2611693 torch/_inductor/compile_fx.py:1091] [1/0] [__output_code] bool buf7_cond_result; .... (while_loop_cond_graph_0_arg2_1_handle); V0203 11:27:59.336000 2611693 torch/_inductor/compile_fx.py:1091] [1/0] [__output_code] buf7_cond_result = u0 + u1 < 10L; V0203 11:27:59.336000 2611693 torch/_inductor/compile_fx.py:1091] [1/0] [__output_code] if (!buf7_cond_result) break; ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/146222 Approved by: https://github.com/desertfire	2025-02-10 21:25:40 +00:00
Yidi Wu	b0fe975521	[hop][inductor] track the dependency on unbacked symbols correctly with constant_args for hops (#143456 ) Before the PR, we were getting an undefined symbol error for output code when an unbacked symint is only used in the hop, because we didn't correctly record the dependency of the unbacked symbols for hops and it got DCEed accidentally. This PR adds the symbol arguments to `constant_args`, where the dependencies can be correctly constructed when `get_unbacked_symbol_uses` is called to check constant_args. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143456 Approved by: https://github.com/desertfire	2025-02-04 18:47:34 +00:00
Tom Ritchford	d8c8ba2440	Fix unused Python variables in test/[e-z]* (#136964 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136964 Approved by: https://github.com/justinchuby, https://github.com/albanD	2024-12-18 23:02:30 +00:00
Scott Wolchok	274223d719	Add and use borrow_arrayref_tensor_as_tensor (#142183 ) Differential Revision: [D66847773](https://our.internmc.facebook.com/intern/diff/D66847773/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/142183 Approved by: https://github.com/desertfire, https://github.com/hl475 ghstack dependencies: #142340, #142182	2024-12-09 22:23:21 +00:00
Scott Wolchok	dc1ef9afb4	Reapply #142091 (Unbreak dynamic shape minimal arrayref interface tests) (#142340 ) Simple bug got introduced somewhere. The original PR was reverted because it broke (caused unexpected successes for) some tests in test_aot_inductor_arrayref.py that still only run internally because #123691 hasn't been fixed. I've fixed those. Differential Revision: [D66890276](https://our.internmc.facebook.com/intern/diff/D66890276/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/142340 Approved by: https://github.com/hl475	2024-12-09 22:23:21 +00:00
PyTorch MergeBot	683ec42958	Revert "Unbreak dynamic shape minimal arrayref interface tests (#142091 )" This reverts commit 2bfc600644ed59332f9da7b94558b9c4c9562b0d. Reverted https://github.com/pytorch/pytorch/pull/142091 on behalf of https://github.com/atalman due to Breaks internal changes ([comment](https://github.com/pytorch/pytorch/pull/142091#issuecomment-2523906048))	2024-12-06 18:25:54 +00:00
Scott Wolchok	2bfc600644	Unbreak dynamic shape minimal arrayref interface tests (#142091 ) Simple bug got introduced somewhere. Differential Revision: [D66792420](https://our.internmc.facebook.com/intern/diff/D66792420/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/142091 Approved by: https://github.com/desertfire, https://github.com/hl475	2024-12-05 23:26:35 +00:00
bhack	ae9cda0221	Add `truediv` support in export serializer (#136364 ) Fixes #136113 - [x] Initial `truediv` coverage - [ ] Expand/reduce coverage? - [x] Add tests - [x] Re-check docstrings - [ ] Linting Pull Request resolved: https://github.com/pytorch/pytorch/pull/136364 Approved by: https://github.com/pianpwk Co-authored-by: Angela Yi <angelayi@meta.com> Co-authored-by: Pian Pawakapan <pianpwk@meta.com>	2024-12-05 17:33:33 +00:00
Mu-Chu Lee	b08bc07cd7	[AOTInductor] Option to not include weight in .so (#141997 ) Summary: Add an option in config to not include weights in .so Test Plan: `test/inductor:test_aot_inductor -- -r test_so_without_weight_cuda` Reviewed By: desertfire Differential Revision: D65968885 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141997 Approved by: https://github.com/desertfire	2024-12-05 03:35:54 +00:00
Shangdi Yu	7dfb439a2a	Only write predicate once when there are multiple torch.cond (#141528 ) Fixes #140606 TEST PLAN: ``` python test/inductor/test_aot_inductor.py -k cond_share python test/inductor/test_aot_inductor_arrayref.py -k cond_share ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/141528 Approved by: https://github.com/desertfire	2024-12-04 01:56:10 +00:00
Mu-Chu Lee	8b0fcad0fd	[AOTInductor] Add update_constant_buffer pybind support (#140755 ) Summary: We add update_constant_buffer python support for testing purpose. Test Plan: Included in commit Differential Revision: D65968613 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140755 Approved by: https://github.com/22quinn	2024-12-03 20:34:25 +00:00
PyTorch MergeBot	6e61ff4fd3	Revert "Add `truediv` support in export serializer (#136364 )" This reverts commit 1df440dc4e7ece40db597ce8e477e14b9c44fea7. Reverted https://github.com/pytorch/pytorch/pull/136364 on behalf of https://github.com/huydhn due to Sorry for reverting your change but its doc build failure is legit ([comment](https://github.com/pytorch/pytorch/pull/136364#issuecomment-2502620732))	2024-11-27 03:24:31 +00:00
bhack	1df440dc4e	Add `truediv` support in export serializer (#136364 ) Fixes #136113 - [x] Initial `truediv` coverage - [ ] Expand/reduce coverage? - [x] Add tests - [x] Re-check docstrings - [ ] Linting Pull Request resolved: https://github.com/pytorch/pytorch/pull/136364 Approved by: https://github.com/pianpwk Co-authored-by: Angela Yi <angelayi@meta.com> Co-authored-by: Pian Pawakapan <pianpwk@meta.com>	2024-11-27 00:31:47 +00:00
angelayi	a3e516d165	[aoti] Split custom ops tests (#140977 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/140977 Approved by: https://github.com/desertfire	2024-11-22 06:18:25 +00:00
Bin Bao	040af3053a	[AOTI] Fix a two-pass kernel missmatch (#141041 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/140766. In AOTI's two-pass codegen, the first pass generates triton_per_fused_add_native_layer_norm_4, and the second pass generates triton_red_fused_add_native_layer_norm_4. While this problem will go away with the incoming one-pass implementation, further debugging reveals there is a mismatch in has_non_contiguous_pw_in_reduction_kernel between the two passes, due to a symbol comparison problem in stride1_for_last_dim. Differential Revision: [D66203298](https://our.internmc.facebook.com/intern/diff/D66203298) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141041 Approved by: https://github.com/shunting314	2024-11-20 23:34:24 +00:00
Yidi Wu	4b3ce62946	[while_loop] support pytree inputs (#140059 ) Previously, we only supported carries that are a tuple of tensors. This PR enables us to support a pytree of tensors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140059 Approved by: https://github.com/zou3519	2024-11-20 21:12:29 +00:00
Henry Tsang	350bc2a166	[export] Add support for symbool to make it usable for torch.cond (#138765 ) # Why? I want the following code to work. minimal repro: ``` class M(torch.nn.Module): def forward(self, dilate_flag): return dilate_flag.item() input1 = (torch.tensor([1], dtype=torch.bool, device="cuda"),) model = M().cuda() ep = torch.export.export(model, input1, strict=True) path = torch._inductor.aot_compile(ep.module(), input1) aot_model = torch._export.aot_load(path, device="cuda") actual_output = aot_model(input1) ``` error: AssertionError: Encountered an unsupported object of type <class 'torch.SymBool'> while writing the metadata for exported program second error will be handled by https://github.com/pytorch/pytorch/pull/138760 # Motivation I could technically bypass it with a torch.int tensor. However, it doesn't work with torch.cond. I want the following to work. It would also require https://github.com/pytorch/pytorch/pull/138760 for aot compile to work. ``` class M(torch.nn.Module): def __init__(self) -> None: super().__init__() self.dilate_flag = 0 def forward(self, dilate_flag): self.dilate_flag = dilate_flag.item() def true_fn(dilate_flag): return dilate_flag.clone() def false_fn(dilate_flag): return dilate_flag.clone() torch.cond( self.dilate_flag, true_fn, false_fn, (dilate_flag,), ) return self.dilate_flag input1 = (torch.tensor([1], dtype=torch.bool, device="cuda"),) input2 = (torch.tensor([0], dtype=torch.bool, device="cuda"),) inputs = (input1, input2) model = M().cuda() for input in inputs: expected_output = model(input) ep = torch.export.export(model, input, strict=False) path = torch._inductor.aot_compile(ep.module(), input) aot_model = torch._export.aot_load(path, device="cuda") actual_output = aot_model(*input) assert ( expected_output == actual_output ), f"henry they are not equal {expected_output} != {actual_output}" ``` Differential Revision: D64867504 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138765 Approved by: https://github.com/ydwu4	2024-11-04 23:31:49 +00:00
Bin Bao	7d081cabfb	[AOTI] Forward fix #139458 (#139485 ) Summary: A new test added in https://github.com/pytorch/pytorch/pull/139458 only fails in certain CI instances. Skip for now as the failing test has a low priority. @diff-train-skip-merge (to silence fb bot so that I can land this myself) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139485 Approved by: https://github.com/huydhn, https://github.com/hl475	2024-11-01 17:14:40 +00:00
angelayi	8c22e09e39	[aoti] Add masked_select to cshim (#139071 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/139071 Approved by: https://github.com/desertfire	2024-10-31 21:52:53 +00:00
Wu, Chunyuan	3192bdeea4	[AOTI] Use `len(serialized_weights)` when calculating `consts_size` (#139054 ) Fixes the failure of INT8 DLRM using AOTI. The previous code calculates `consts_size` directly using `tensor` from `graph.constants`: ``` consts_size = sum( get_nbytes_of_tensor(tensor, all_cuda) for (name, tensor) in graph.constants.items() if name not in graph.folded_constants ) ``` Meanwhile, the actual bytes to serialize (`serialized_weights`) are built from `graph.get_original_value_of_constant(name)`: ``` serialized_weights = b"".join( _to_bytes(graph.get_original_value_of_constant(name), all_cuda) for name in graph.constants.keys() if name not in graph.folded_constants ) ``` `tensor` from `graph.constants` could be different from `graph.get_original_value_of_constant(name)`, thus making the `consts_size` inconsistent with the actual byte size of the `serialized_weights`, resulting in the runtime error `weights_offset must be aligned to 16K boundary`, similar to what happened in https://github.com/pytorch/pytorch/pull/135205. This PR directly gets `consts_size` using `len(serialized_weights)`, which fixes the inconsistency. We also added a `reduce_range` argument to the `get_default_x86_inductor_quantization_config` function, which is needed in the unit test to avoid accuracy issues on CI machines (earlier CPUs without VNNI). Pull Request resolved: https://github.com/pytorch/pytorch/pull/139054 Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5, https://github.com/desertfire	2024-10-31 09:54:16 +00:00
Bin Bao	2cee5a39ad	[AOTI] Fix check_model_with_multiple_inputs in test_aot_inductor (#138379 ) Summary: Add missing use_minimal_arrayref_interface setting to check_model_with_multiple_inputs. Differential Revision: [D64635211](https://our.internmc.facebook.com/intern/diff/D64635211) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138379 Approved by: https://github.com/hl475 ghstack dependencies: #138544	2024-10-23 00:54:29 +00:00
Bin Bao	54fbd897d9	[AOTI][refactor] Clean up test_aot_inductor skip list (#138544 ) Summary: Remove skips for already fixed tests. Change remaining skip to xfail so that the failure list can be more proactively maintained. Differential Revision: [D64761257](https://our.internmc.facebook.com/intern/diff/D64761257) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138544 Approved by: https://github.com/chenyang78, https://github.com/hl475	2024-10-22 21:32:49 +00:00
Bin Bao	2827befe61	[AOTI][reland] Fix test_index_put_with_none_index_cpu_with_stack_allocation (#138541 ) Summary: The problem happened after splitting CppWrapperCpu and CppWrapperCpuArrayRef, because CppWrapperCpuArrayRef.generate_index_put_fallback missed a statement. Running test_aot_inductor.py as a whole didn't reveal the problem, but running test_index_put_with_none_index_cpu_with_stack_allocation individually did. Digging deeper, the root cause is that init_backend_registration has incorrectly cached the CPU CppWrapperCodegen class, which means CppWrapperCpuArrayRef was never picked when running test_aot_inductor.py as a whole. To fix the problem, all the ArrayRef tests are split into a separate file. Also, a check is added that regex-matches AOTInductorModelRunMinimalArrayrefInterface so this kind of false passing signal won't go unnoticed. Differential Revision: [D64734106](https://our.internmc.facebook.com/intern/diff/D64734106) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138541 Approved by: https://github.com/frank-wei	2024-10-22 14:17:27 +00:00
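
Note on 3192bdeea4 (#139054): the size mismatch described in that commit message can be reproduced in miniature without PyTorch. The sketch below is only an illustration under stated assumptions: `to_bytes`, `constants`, `original_values`, and `folded_constants` are hypothetical stand-ins for `_to_bytes`, `graph.constants`, `graph.get_original_value_of_constant(name)`, and `graph.folded_constants`, and plain lists of floats stand in for tensors. It shows why a size computed from one source can disagree with bytes serialized from another, and why taking `len(serialized_weights)` cannot disagree.

```python
import struct


def to_bytes(values):
    """Hypothetical stand-in for `_to_bytes`: pack floats as little-endian float32."""
    return struct.pack(f"<{len(values)}f", *values)


# Hypothetical data: the value held in `constants` differs from the original
# value that is actually serialized (here it is simply shorter).
constants = {"weight": [1.0, 2.0], "folded": [9.0]}
original_values = {"weight": [1.0, 2.0, 3.0, 4.0], "folded": [9.0]}
folded_constants = {"folded"}

# Size computed from `constants`, as in the code quoted as "previous" above.
size_from_constants = sum(
    len(to_bytes(value))
    for name, value in constants.items()
    if name not in folded_constants
)

# Bytes actually written, built from the original values.
serialized_weights = b"".join(
    to_bytes(original_values[name])
    for name in constants
    if name not in folded_constants
)

# Size taken from the serialized blob itself, so it always matches the blob.
consts_size = len(serialized_weights)

print(size_from_constants, consts_size)  # prints: 8 16
```

Deriving the size from the serialized blob removes the second source of truth, which is the design choice the commit message describes.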

32 Commits