87 Commits

979e10f7d6 [Bugfix] Match eager stride semantics for cloned tensors with preserve_format in compile (#163017)
Fixes #161010 by making `clone_meta` match the semantics of strides for eager mode.

This is:
  * Case 1: the tensor is_non_overlapping_and_dense; in this case, the output strides should match the input tensor's strides
  * Case 2: otherwise, the output strides should be contiguous, computed from the input tensor using `compute_elementwise_output_strides`
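
A minimal sketch of the two cases, built on the `torch._prims_common` helpers the commit names (the wiring into `clone_meta` itself is elided):

```py
import torch
from torch._prims_common import (
    compute_elementwise_output_strides,
    is_non_overlapping_and_dense,
)

def cloned_strides(t: torch.Tensor):
    # Case 1: non-overlapping and dense -> preserve the input strides.
    if is_non_overlapping_and_dense(t):
        return t.stride()
    # Case 2: otherwise compute contiguous output strides the way eager
    # elementwise ops do.
    return compute_elementwise_output_strides(t)

x = torch.empty(4, 4).t()  # dense but non-contiguous (transposed)
print(cloned_strides(x))   # (1, 4): strides preserved, matching eager clone
```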

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163017
Approved by: https://github.com/williamwen42, https://github.com/xmfan

Co-authored-by: morrison-turnansky <mturnans@redhat.com>
2025-09-19 19:41:33 +00:00
e4174b1fd7 remove gso from collapse_view_helper (#162212)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162212
Approved by: https://github.com/aorenste

Co-authored-by: Aaron Orenstein <aorenste@fb.com>
2025-09-10 00:17:15 +00:00
d8c8ba2440 Fix unused Python variables in test/[e-z]* (#136964)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136964
Approved by: https://github.com/justinchuby, https://github.com/albanD
2024-12-18 23:02:30 +00:00
8d2c272e5a properly register conjugate/neg fallthroughs to prim ops (#132699)
A few aten ops (like `clone` and `copy_`) get fallthrough registrations to the Conjugate/Negative keys. We haven't been giving the same treatment to their corresponding `prims` variants, which can cause infinite loops in some cases.
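
For illustration, such a fallthrough can be registered from Python roughly like this (a sketch only; after this PR the prims registrations already exist, so re-running it against a current build would conflict):

```py
import torch
from torch.library import Library, fallthrough_kernel

# Give a prims op the same Conjugate/Negative fallthroughs its aten
# counterpart already has, so dispatch skips these keys instead of looping.
lib = Library("prims", "IMPL")
lib.impl("clone", fallthrough_kernel, "Conjugate")
lib.impl("clone", fallthrough_kernel, "Negative")
```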

Fixes an infinite loop that showed up in tests from https://github.com/pytorch/pytorch/pull/132563

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132699
Approved by: https://github.com/albanD
2024-08-06 17:57:04 +00:00
fc872e98f3 Infer prim tags from equivalent aten ones (#130367)
Take the intersection of the tags across the corresponding aten op overloads. Previously, some rng ops had no tags at all, which caused issues with constant folding (they should get decomposed, but that's a separate issue).
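
A sketch of the rule, computed here for an aten overload packet (the aten-to-prim correspondence itself is handled elsewhere):

```py
import torch

# Intersect the tags of all overloads of an aten op; the matching prim
# would inherit this common set.
packet = torch.ops.aten.rand
tag_sets = [set(getattr(packet, name).tags) for name in packet.overloads()]
common_tags = set.intersection(*tag_sets)
print(common_tags)  # e.g. torch.Tag.nondeterministic_seeded, if shared by all
```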

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130367
Approved by: https://github.com/ezyang
2024-07-11 20:53:52 +00:00
67ef2683d9 [BE] wrap deprecated function/class with typing_extensions.deprecated (#127689)
Use `typing_extensions.deprecated` for deprecation annotation if possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` if the category is missing.

Note that only warnings whose messages contain `[Dd]eprecat(ed|ion)` are updated in this PR.
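
A sketch of the two patterns (function names hypothetical):

```py
import warnings
from typing_extensions import deprecated

@deprecated("`old_fn` is deprecated, use `new_fn` instead", category=FutureWarning)
def old_fn() -> None:
    ...

def other_fn() -> None:
    # Where the decorator does not fit, pass the category explicitly:
    warnings.warn("`other_fn` is deprecated", category=FutureWarning, stacklevel=2)
```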

Resolves #126888

This PR is split from PR #126898.

------

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127689
Approved by: https://github.com/Skylion007
2024-06-02 12:30:43 +00:00
033e733021 Revert "[BE] wrap deprecated function/class with typing_extensions.deprecated (#126898)"
This reverts commit 749a132fb0a8325cbad4734a563aa459ca611991.

Reverted https://github.com/pytorch/pytorch/pull/126898 on behalf of https://github.com/fbgheith due to switching typing-extensions=4.3.0 to 4.9.0 causes internal failure ([comment](https://github.com/pytorch/pytorch/pull/126898#issuecomment-2142884456))
2024-05-31 19:47:24 +00:00
749a132fb0 [BE] wrap deprecated function/class with typing_extensions.deprecated (#126898)
Use `typing_extensions.deprecated` for deprecation annotation if possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` if the category is missing.

Note that only warnings whose messages contain `[Dd]eprecat(ed|ion)` are updated in this PR.

UPDATE: Use `FutureWarning` instead of `DeprecationWarning`.

Resolves #126888

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126898
Approved by: https://github.com/albanD
2024-05-29 12:09:27 +00:00
c006c8b50e Revert "markDynamoStrictTest some more (#115885)"
This reverts commit 55ce4693ff2c0b6e50b8af323f36ecc7ff929638.

Reverted https://github.com/pytorch/pytorch/pull/115885 on behalf of https://github.com/atalman due to OSSCI oncall, broke inductor ([comment](https://github.com/pytorch/pytorch/pull/115885#issuecomment-1858409669))
2023-12-15 19:51:24 +00:00
55ce4693ff markDynamoStrictTest some more (#115885)
Featuring
test_native_mha.py
test_nn.py
test_prims.py
test_schema_check.py
test_serialization.py
test_show_pickle.py
test_sort_and_select.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115885
Approved by: https://github.com/voznesenskym
ghstack dependencies: #115845, #115855, #115856, #115857, #115858, #115870, #115871, #115879
2023-12-15 13:19:52 +00:00
ba4285bd9e Deprecate primTorch module, replace it with decompositions in module Owners (#114754)
Context: pt2 oncall is revamping its labeling system. One of the guidelines is to remove duplicate labeling in our system. Both the primTorch and decomposition labels refer to the same thing. primTorch was the legacy name (and we no longer have a primTorch project), so using decomposition as the label name makes more sense.

Right now, the only open issues that use "module: primTorch" are the ones generated by the DISABLED bots. Once we replace the label in the bot, we can safely remove the primTorch label.

Here is an example of an issue with the primTorch label:
https://github.com/pytorch/pytorch/issues/112719

Torchbot uses the following logic to auto-extract module owners:
https://github.com/pytorch/test-infra/blob/main/torchci/pages/api/flaky-tests/disable.ts#L391

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114754
Approved by: https://github.com/huydhn
2023-11-29 18:27:20 +00:00
7a21e960c6 fix infinite loop with primtorch and .to(meta) (#109632)
Fixes https://github.com/pytorch/pytorch/issues/103532, which I needed in order to more easily/properly test that python functionalization is at parity with C++ functionalization for conj/neg.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109632
Approved by: https://github.com/ezyang
ghstack dependencies: #108654, #109662
2023-09-22 07:09:04 +00:00
c913f3857f Remove dynamo+nvfuser (#105789)
This PR removes unmaintained Dynamo+nvFuser.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105789
Approved by: https://github.com/jansel, https://github.com/jjsjann123, https://github.com/albanD
2023-08-08 22:29:32 +00:00
891bb259f8 Revert "Remove dynamo+nvfuser (#105789)"
This reverts commit 6030151d3758715097b89026e9b3b3f839fbd544.

Reverted https://github.com/pytorch/pytorch/pull/105789 on behalf of https://github.com/DanilBaibak due to Break a lot of tests on main. ([comment](https://github.com/pytorch/pytorch/pull/105789#issuecomment-1669710571))
2023-08-08 14:20:32 +00:00
6030151d37 Remove dynamo+nvfuser (#105789)
This PR removes unmaintained Dynamo+nvFuser.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105789
Approved by: https://github.com/jansel, https://github.com/jjsjann123, https://github.com/albanD
2023-08-08 13:29:31 +00:00
0ad93a3d56 Fix aten.logspace decomposition (#105201)
Fixes #104118

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105201
Approved by: https://github.com/ezyang
2023-07-22 04:10:20 +00:00
3912b722f3 Upgrade LoggingTensor mode and add traceback collection (#103734)
Parts borrowed from: https://github.com/albanD/subclass_zoo/blob/main/logging_mode.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103734
Approved by: https://github.com/albanD
2023-06-21 22:04:30 +00:00
ee83c646bb Replace _prims_common.check with torch._check* (#103240)
This relands most of the changes from #102219, which were backed out by #103128. However, instead of removing `_prims_common.check`, it adds a warning and a comment mentioning that it will be removed in the future and that `torch._check*` should be used instead. As mentioned in https://github.com/pytorch/pytorch/pull/103128#pullrequestreview-1466414415, `_prims_common.check` cannot yet be removed because of some internal usage.
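
A usage sketch of the preferred API:

```py
import torch

# torch._check raises RuntimeError with a lazily-built message; siblings
# like torch._check_index and torch._check_value raise other error types.
def first_row(x: torch.Tensor) -> torch.Tensor:
    torch._check(x.dim() >= 2, lambda: f"expected at least 2 dims, got {x.dim()}")
    return x[0]

first_row(torch.randn(3, 4))   # ok
# first_row(torch.randn(3))    # raises RuntimeError
```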

Part of #72948

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103240
Approved by: https://github.com/albanD
2023-06-21 00:46:17 +00:00
58d2c66a70 [activation checkpointing] Higher order functional rng op wrappers (#102934)
Introduces two higher order operators
* run_and_save_rng_state - Saves the current rng state and then runs the op.
* run_with_rng_state - Runs the op with the rng state supplied as an input

Ideally, we would like to use torch.compile for these operators, but currently the plan is to introduce them at the partitioner level, obviating the need to support them fully through the torch.compile stack. To ensure good enough debugging with minifiers, we have ensured that they work with make_fx. In the future, we can move them onto torch.compile.
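
A conceptual sketch of the two wrappers' eager semantics (not the real higher-order operator implementations, which live at the dispatcher level):

```py
import torch

def run_and_save_rng_state(op, *args, **kwargs):
    # Save the current rng state, then run the op.
    state = torch.get_rng_state()  # CPU state shown; CUDA is analogous
    return state, op(*args, **kwargs)

def run_with_rng_state(state, op, *args, **kwargs):
    # Run the op with the supplied rng state, restoring the current one after.
    current = torch.get_rng_state()
    torch.set_rng_state(state)
    try:
        return op(*args, **kwargs)
    finally:
        torch.set_rng_state(current)

state, a = run_and_save_rng_state(torch.rand, 3)
b = run_with_rng_state(state, torch.rand, 3)
assert torch.equal(a, b)  # replaying with the saved state reproduces the output
```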

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102934
Approved by: https://github.com/jansel, https://github.com/zou3519
2023-06-12 22:54:17 +00:00
4c6f7cbc86 Fix prims unbind if given dimension size is 0 (#100122)
Fixes #99832

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100122
Approved by: https://github.com/ngimel
2023-04-26 23:40:21 +00:00
6bc4651193 [philox_rand] Dynamic shape support (#99290)
Extends the functionalization-of-RNG work to dynamic shapes. An example of the generated graph looks like this:

~~~

[2023-04-24 21:41:37,446] torch._functorch.aot_autograd.__aot_graphs: [INFO] TRACED GRAPH
 ===== Forward graph 1 =====
 <eval_with_key>.7 class <lambda>(torch.nn.Module):
    def forward(self, arg0_1: i64[], arg1_1: i64[], arg2_1: Sym(s0), arg3_1: Sym(s1), arg4_1: f32[s0, s1]):
        # File: /scratch/anijain/work/pytorch/test/test_functionalization_of_rng_ops.py:46, code: a = torch.rand_like(x) * x
        add: i64[] = torch.ops.aten.add.Tensor(arg1_1, 0)
        philox_rand = torch.ops.rngprims.philox_rand.default([arg2_1, arg3_1], arg0_1, add, None, device(type='cuda', index=0), torch.float32);  add = None
        getitem: f32[s0, s1] = philox_rand[0]
        getitem_1: i64[] = philox_rand[1];  philox_rand = None
        add_1: i64[] = torch.ops.aten.add.Tensor(getitem_1, 0);  getitem_1 = None
        mul: f32[s0, s1] = torch.ops.aten.mul.Tensor(getitem, arg4_1);  getitem = arg4_1 = None

        # File: /scratch/anijain/work/pytorch/test/test_functionalization_of_rng_ops.py:47, code: a = torch.rand_like(x) * a
        add_2: i64[] = torch.ops.aten.add.Tensor(arg1_1, add_1)
        philox_rand_1 = torch.ops.rngprims.philox_rand.default([arg2_1, arg3_1], arg0_1, add_2, None, device(type='cuda', index=0), torch.float32);  arg2_1 = arg3_1 = arg0_1 = add_2 = None
        getitem_2: f32[s0, s1] = philox_rand_1[0]
        getitem_3: i64[] = philox_rand_1[1];  philox_rand_1 = None
        add_3: i64[] = torch.ops.aten.add.Tensor(add_1, getitem_3);  add_1 = getitem_3 = None
        mul_1: f32[s0, s1] = torch.ops.aten.mul.Tensor(getitem_2, mul);  getitem_2 = mul = None

        # No stacktrace found for following nodes
        add_4: i64[] = torch.ops.aten.add.Tensor(arg1_1, add_3);  arg1_1 = add_3 = None
        return (mul_1, add_4)

~~~

Each rand op is accompanied by its offset calculation op.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99290
Approved by: https://github.com/ezyang, https://github.com/bdhirsh
2023-04-25 22:40:28 +00:00
fdbc8625a1 Functionalization of torch.rand/rand_like ops (#97377)
This PR introduces the functionalization of RNG ops. Key points are

* Introduces a new `philox_rand` prim operator that accepts seed, offset.
* Adds decompositions for random operators that use these philox_rand prims
* Adds a PhiloxStateTracker to track the offset for each occurrence of rand ops
* Changes calling convention of AOT Autograd and adds <fwd_seed, fwd_base_offset> and <bwd_seed, bwd_base_offset>
* Monkeypatches set_rng_state and get_rng_state during AOT Autograd tracing to record the rng state behavior
* Raises an assertion for CPU because CPU does not have Philox RNG.

Not dealt in this PR
* dropout op - offset calculation is different
* other distributions like normal, poisson etc
* Inductor support
* Cudagraph support
* Dynamic shape support

An example
~~~

class Custom(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        a = torch.rand_like(x) * x
        a = torch.rand_like(x) * a
        return a

    @staticmethod
    def backward(ctx, grad_out):
        x, = ctx.saved_tensors
        return grad_out * torch.rand_like(grad_out) * torch.cos(x)

====== Forward graph 0 ======
def forward(self, fwd_seed_1: i64[], fwd_base_offset_1: i64[], primals_1: f32[16, 16]):
    # No stacktrace found for following nodes
    add: i64[] = torch.ops.aten.add.Tensor(fwd_base_offset_1, 0)
    philox_rand: f32[16, 16] = torch.ops.prims.philox_rand.default([16, 16], fwd_seed_1, add, [16, 1], device(type='cuda', index=0), torch.float32);  add = None
    mul: f32[16, 16] = torch.ops.aten.mul.Tensor(philox_rand, primals_1);  philox_rand = None
    add_1: i64[] = torch.ops.aten.add.Tensor(fwd_base_offset_1, 4);  fwd_base_offset_1 = None
    philox_rand_1: f32[16, 16] = torch.ops.prims.philox_rand.default([16, 16], fwd_seed_1, add_1, [16, 1], device(type='cuda', index=0), torch.float32);  fwd_seed_1 = add_1 = None
    mul_1: f32[16, 16] = torch.ops.aten.mul.Tensor(philox_rand_1, mul);  philox_rand_1 = mul = None
    return [mul_1, primals_1]

====== Backward graph 0 ======
def forward(self, bwd_seed_1: i64[], bwd_base_offset_1: i64[], primals_1: f32[16, 16], tangents_1: f32[16, 16]):
    # No stacktrace found for following nodes
    add_2: i64[] = torch.ops.aten.add.Tensor(bwd_base_offset_1, 0);  bwd_base_offset_1 = None
    philox_rand_2: f32[16, 16] = torch.ops.prims.philox_rand.default([16, 16], bwd_seed_1, add_2, [16, 1], device(type='cuda', index=0), torch.float32);  bwd_seed_1 = add_2 = None
    mul_2: f32[16, 16] = torch.ops.aten.mul.Tensor(tangents_1, philox_rand_2);  tangents_1 = philox_rand_2 = None
    cos: f32[16, 16] = torch.ops.aten.cos.default(primals_1);  primals_1 = None
    mul_3: f32[16, 16] = torch.ops.aten.mul.Tensor(mul_2, cos);  mul_2 = cos = None
    return [mul_3]

~~~

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97377
Approved by: https://github.com/ezyang
2023-04-16 09:55:56 +00:00
640b9c80f9 [primTorch] Redefine prim.collapse{,_view} end point to be inclusive (#92017)
This makes `prims.collapse(a, start, end)` match the behavior of
`torch.flatten(a, start, end)` more closely.
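
Illustrated via `torch.flatten`, whose semantics `prims.collapse` now mirrors:

```py
import torch

x = torch.randn(2, 3, 4, 5)
# start=1, end=2 collapses dims 1..2 inclusive, as flatten does:
assert torch.flatten(x, 1, 2).shape == (2, 12, 5)
```
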
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92017
Approved by: https://github.com/mruberry
2023-02-21 20:36:50 +00:00
2622adb980 [primTorch] Make prims.collapse a real prim (#91748)
`prims.collapse` is currently just a plain python function wrapping
`prims.reshape`. This turns it into a real prim, and also factors out some of
the code duplicated with `_collapse_view_aten`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91748
Approved by: https://github.com/lezcano, https://github.com/ngimel
2023-02-21 20:36:50 +00:00
21eb7f70f1 Nvfuser python API import fix (#94036)
1. Having nvfuser python API import working with both devel and upstream;
2. Add environment variable to allow custom nvfuser code base to be built with upstream pytorch core.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94036
Approved by: https://github.com/malfet, https://github.com/davidberard98
2023-02-16 20:10:40 +00:00
c11b301bcd [NVFUSER] refactor nvfuser build (#89621)
This PR is the first step towards refactoring the build for nvfuser so that the codegen becomes a standalone library.

Contents inside this PR:
1. nvfuser code base has been moved to `./nvfuser`, from `./torch/csrc/jit/codegen/cuda/`, except for registration code for integration (interface.h/interface.cpp)
2. splits the build system so nvfuser is generating its own `.so` files. Currently there are:
    - `libnvfuser_codegen.so`, which contains the integration, codegen and runtime system of nvfuser
    - `nvfuser.so`, which is nvfuser's python API via pybind. Python frontend is now exposed via `nvfuser._C.XXX` instead of `torch._C._nvfuser`
3. nvfuser cpp tests are currently compiled into `nvfuser_tests`
4. cmake is refactored so that:
    - nvfuser now has its own `CMakeLists.txt`, which is under `torch/csrc/jit/codegen/cuda/`.
    - nvfuser backend code is not compiled inside `libtorch_cuda_xxx` any more
    - nvfuser is added as a subdirectory under `./CMakeLists.txt` at the very end after torch is built.
    - since nvfuser has dependency on torch, the registration of nvfuser at runtime is done via dlopen (`at::DynamicLibrary`). This avoids circular dependency in cmake, which will be a nightmare to handle. For details, look at `torch/csrc/jit/codegen/cuda/interface.cpp::LoadingNvfuserLibrary`

Future work that's scoped in following PR:
- Currently nvfuser codegen has a dependency on torch; we need to refactor that out so we can move nvfuser into a submodule and not rely on dlopen to load the library. @malfet
- Since we moved nvfuser into a cmake build, we effectively disabled bazel build for nvfuser. This could impact internal workloads at Meta, so we need to add that support back. cc'ing @vors

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89621
Approved by: https://github.com/davidberard98
2023-01-26 02:50:44 +00:00
f0b592dae7 Make masked_fill reference traceable (#90972)
As the comment states, `item()` cannot be used since you can't trace through a
scalar.
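
A sketch of the traceable formulation: the reference keeps the fill value as a 0-dim tensor rather than calling `item()`, which would extract a Python scalar the trace cannot represent.

```py
import torch

def masked_fill_sketch(x: torch.Tensor, mask: torch.Tensor, value: torch.Tensor):
    # torch.where stays in tensor-land, so a tracer can follow it end to end.
    return torch.where(mask, value, x)

x = torch.randn(4)
print(masked_fill_sketch(x, x > 0, torch.tensor(0.0)))
```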

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90972
Approved by: https://github.com/ngimel
2023-01-18 10:54:42 +00:00
15949fc248 [ROCm] Enable few test_prim UTs for ROCm (#88983)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88983
Approved by: https://github.com/IvanYashchuk, https://github.com/jeffdaily, https://github.com/malfet
2022-12-07 06:21:31 +00:00
3c9431f505 Add factory functions to python frontend (#89230)
- Add a `full` nvprim to support factory functions, because the `full` reference uses `empty` and `fill` while nvFuser has a native `full` factory function.
- Change `full_like` reference to call `full` to avoid defining another nvprim.
- Enable support for `new_zeros` to enable the `cudnn_batch_norm` decomposition.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89230
Approved by: https://github.com/kevinstephano, https://github.com/mruberry
2022-12-06 07:16:21 +00:00
fe3a226d74 [minor] use set_default_dtype instead of try and finally (#88295)
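
The change in a nutshell (import path of the test helper assumed):

```py
import torch
from torch.testing._internal.common_utils import set_default_dtype

# Before: manual save/restore
saved = torch.get_default_dtype()
try:
    torch.set_default_dtype(torch.double)
    assert torch.empty(1).dtype is torch.double
finally:
    torch.set_default_dtype(saved)

# After: the context manager restores the dtype automatically
with set_default_dtype(torch.double):
    assert torch.empty(1).dtype is torch.double
```
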
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88295
Approved by: https://github.com/mruberry
2022-11-03 19:28:33 +00:00
4a8382b58e Update caching of tensor arguments for nvFuser's fusion creation (#87860)
Previously, nvFuser's fusion definition was cached based on the concrete shapes and strides of tensor inputs, for simplicity and correctness. This PR changes the Python-side cache to check the number of dimensions, size-1 dimensions, and contiguity information derived from the given strides and shapes.
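
A sketch of such a stride-agnostic cache key (names hypothetical):

```py
import torch

def fusion_cache_key(t: torch.Tensor):
    # Rank, positions of size-1 dimensions, and contiguity of the layout;
    # concrete sizes/strides are deliberately excluded.
    return (
        t.dim(),
        tuple(i for i, s in enumerate(t.shape) if s == 1),
        t.is_contiguous(),
    )

a = torch.randn(8, 1, 4)
b = torch.randn(9, 1, 5)  # different concrete sizes, same key
assert fusion_cache_key(a) == fusion_cache_key(b)
```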

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87860
Approved by: https://github.com/kevinstephano, https://github.com/jjsjann123, https://github.com/ngimel
2022-11-02 09:29:20 +00:00
c0761a835b Revert "[dynamo] Error when user nests FX with dynamo (#87797)"
This reverts commit 1da5aeb97b73664ff0fe2f4bb48379655cede969.

Reverted https://github.com/pytorch/pytorch/pull/87797 on behalf of https://github.com/ezyang due to breaks nvfuser stack, needs more investigation
2022-10-31 23:49:37 +00:00
1da5aeb97b [dynamo] Error when user nests FX with dynamo (#87797)
Today, this doesn't work and dynamo errors out in a very non-obvious way (see:
https://gist.github.com/suo/dde04830372ab51a4a34ea760f14200a).

Here, we detect the error early and exit with a nicer msg. Also add a
config option to just no-op dynamo (which need to unblock internal
enablement).

cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87797
Approved by: https://github.com/yf225, https://github.com/soumith, https://github.com/jansel
2022-10-28 04:59:08 +00:00
ae4fbac819 Enable nvprims.transpose fusions for nvFuser (#86967)
This PR allows transposes to be fused with other operations. If a fusion group is formed only from operations that just manipulate metadata in PyTorch (transpose, view, etc.), then this group is not sent to nvFuser.
On top of that, if we have converted to `nvprims` but then decided not to form a fusion group, we modify the graph to use the `prim.impl_aten` attribute instead of calling `prim(*args, **kwargs)`, which has higher overhead.

cc @kevinstephano @jjsjann123
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86967
Approved by: https://github.com/jjsjann123, https://github.com/SherlockNoMad
2022-10-26 17:00:07 +00:00
72f446b9bc Remove getitem special handling in the partitioner (#87073)
This special handling of getitem unnecessarily splits fusions at functions with tuple outputs.

Example script:
```py
import torch
from torch.fx.passes.infra.partitioner import CapabilityBasedPartitioner
from torch._prims.nvfuser_executor import NvfuserPrimOperatorSupport
from torch.fx.experimental.proxy_tensor import make_fx

def func(x):
    xx = torch.ops.nvprims.add(x, 1)
    var, mean = torch.ops.nvprims.var_mean(x, correction=0)
    var_cos = torch.ops.nvprims.cos(var)
    mean_sin = torch.ops.nvprims.sin(mean)
    return torch.ops.nvprims.add(var_cos, mean_sin)

a = torch.randn(5, 3, 3, device="cuda")
gm = make_fx(func)(a)
gm.graph.print_tabular()

supported_ops = NvfuserPrimOperatorSupport()
partitioner = CapabilityBasedPartitioner(
    gm, supported_ops, allows_single_node_partition=False
)
partitions = partitioner.propose_partitions()
print(partitions)
partitioned_graph = partitioner.fuse_partitions(partitions)
partitioned_graph.graph.print_tabular()
```
Output on master:
```py
opcode         name       target                       args              kwargs
-------------  ---------  ---------------------------  ----------------  -----------------
placeholder    x_1        x_1                          ()                {}
call_function  add        nvprims.add.default          (x_1, 1)          {}
call_function  var_mean   nvprims.var_mean.main        (x_1, [0, 1, 2])  {'correction': 0}
call_function  getitem    <built-in function getitem>  (var_mean, 0)     {}
call_function  getitem_1  <built-in function getitem>  (var_mean, 1)     {}
call_function  cos        nvprims.cos.default          (getitem,)        {}
call_function  sin        nvprims.sin.default          (getitem_1,)      {}
call_function  add_1      nvprims.add.default          (cos, sin)        {}
output         output     output                       (add_1,)          {}
[{cos, sin, add_1}, {var_mean, add, getitem, getitem_1}]
opcode         name       target                       args                    kwargs
-------------  ---------  ---------------------------  ----------------------  --------
placeholder    x_1        x_1                          ()                      {}
call_module    fused_1    fused_1                      (x_1,)                  {}
call_function  getitem_2  <built-in function getitem>  (fused_1, 0)            {}
call_function  getitem_3  <built-in function getitem>  (fused_1, 1)            {}
call_module    fused_0    fused_0                      (getitem_2, getitem_3)  {}
output         output     output                       (fused_0,)              {}
```
Output with this PR:
```
[{var_mean, add_1, cos, sin, add, getitem_1, getitem}]
opcode       name     target    args        kwargs
-----------  -------  --------  ----------  --------
placeholder  x_1      x_1       ()          {}
call_module  fused_0  fused_0   (x_1,)      {}
output       output   output    (fused_0,)  {}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87073
Approved by: https://github.com/jjsjann123, https://github.com/SherlockNoMad
2022-10-26 14:18:46 +00:00
31931515bc Workarounds for cudnn_batch_norm with TorchRefsNvfuserCapabilityMode (#86796)
This PR adds workarounds to support AOT Autograd's graphs containing `aten.cudnn_batch_norm` and `aten.cudnn_batch_norm_backward` with `TorchRefsNvfuserCapabilityMode`.

The problem with the decomposition of `aten.cudnn_batch_norm` is that it uses a `new_empty` call that is not supported by nvFuser, and we are conservative about lowering functions to nvprims by default.

The problem with the decomposition of `aten.cudnn_batch_norm_backward` is described here https://github.com/pytorch/pytorch/pull/86115#issue-1394883782, but changing the decomposition directly in that PR makes many tests fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86796
Approved by: https://github.com/mruberry
2022-10-17 18:46:28 +00:00
fc3afc8407 Remove empty_like+fill from AOT Autograd graphs for nvFuser (#86908)
AOT Autograd records C++ code `1 - tensor` as a sequence of empty_like, fill, and sub (see https://github.com/pytorch/pytorch/issues/86612).

Neither empty_like nor fill is supported yet. This PR is a workaround to enable fusions of `silu_backward`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86908
Approved by: https://github.com/ngimel
2022-10-14 19:49:39 +00:00
fd80684784 Add nvFuser support for torch.Tensor.view (#84634)
This is an alternative to https://github.com/pytorch/pytorch/pull/83739. While PrimTorch has `view` as a reference, we would like to use nvFuser's implementation for `view` for now. Later we might transition to PrimTorch's `torch._refs.view`.

See `test_nvprims_view` for examples of things that are now sent to nvFuser. Note that nvFuser's `view` is a copy-like operation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84634
Approved by: https://github.com/kevinstephano, https://github.com/mruberry
2022-10-14 12:08:02 +00:00
68a6113248 Add nvFuser support for torch.native_batch_norm (#85562)
This PR adds nvFuser's implementation for batch_norm as there's no reference yet (https://github.com/pytorch/pytorch/pull/81191) and no in-place copy support (https://github.com/pytorch/pytorch/pull/84545).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85562
Approved by: https://github.com/kevinstephano, https://github.com/ngimel
2022-10-03 15:03:08 +00:00
fd553c46f4 nvprim op support runtime checks on dtype compatibility on prims.convert_element_type (#85566)
I'm seeing an issue where we lower `_to_copy` into `nvprims.convert_element_type`. In cases where we are casting to a dtype that's not supported by nvFuser, this raises a runtime error.

I added a quick check in the lowering logic where each op can peek at the fx.Node and make a runtime decision on whether the given op should be lowered to nvprims.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85566
Approved by: https://github.com/IvanYashchuk, https://github.com/ngimel
2022-09-30 23:19:25 +00:00
b00a5359f7 Add a way to skip lowering to nvprims (#85811)
This PR adds a `skip_ops` argument to `TorchRefsNvfuserCapabilityMode` and `NvfuserPrimsMode`: an iterable of function names to be skipped during translation to nvprims.
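
A hypothetical usage sketch (import path and name-string format assumed; this API was later removed along with the Dynamo+nvFuser integration):

```py
import torch
from torch._prims.context import TorchRefsNvfuserCapabilityMode

# Ops named in skip_ops are left untouched during translation to nvprims.
with TorchRefsNvfuserCapabilityMode(skip_ops=["torch.sigmoid"]):
    y = torch.sigmoid(torch.randn(4))  # stays as aten, not nvprims
```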

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85811
Approved by: https://github.com/mruberry, https://github.com/jjsjann123
2022-09-30 12:01:45 +00:00
cab6ffa0f7 catches failure on nvprim speculative lowering (#85580)
Fixes #85517

Added a try/except around tracing `get_isolated_graphmodule` inside `_is_func_unsupported_nvfuser`. This stops speculative lowering to nvprims when the query errors out.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85580
Approved by: https://github.com/mruberry, https://github.com/IvanYashchuk
2022-09-29 15:22:45 +00:00
795028a3ce Make Python reference for permute accept varargs (#85460)
Fixes #85452
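
Both spellings now work, matching `torch.Tensor.permute`:

```py
import torch
import torch._refs

x = torch.randn(2, 3, 4)
y1 = torch._refs.permute(x, (2, 0, 1))  # sequence form
y2 = torch._refs.permute(x, 2, 0, 1)    # varargs form enabled by this PR
assert y1.shape == y2.shape == (4, 2, 3)
```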

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85460
Approved by: https://github.com/jjsjann123, https://github.com/mruberry, https://github.com/ngimel
2022-09-28 03:50:42 +00:00
18d8c548f4 [Modes] remove enable and rewrite mode stack (squashed) (#84774)
Based on @ezyang's suggestion, the mode stack now has "one true mode", which is the _only_ mode that can ever be active at the C++ level. That mode's torch dispatch just takes the top mode in the stack, reenables itself (if we aren't at the end of the mode stack), and runs the top mode's torch_{dispatch|function}.

This maintains that in the middle of a mode's torch dispatch, the mode itself will not be active. It changes the function the user has to call to see what the current mode is (it no longer queries the C++; it's python only), but allows the user to also see the entire mode stack easily.

Removes `enable_torch_dispatch_mode` and `.restore()` since neither makes sense in this new setup

### Background
Why do we want this? Well, a pretty common pattern that was coming up was that users had to do something like

```python
## PRE-PR UX
def f(mode):
  with mode.restore():  # user needs to understand this restore thing?
    ...

with Mode() as m:
  pass
f(m)
```

Many users were getting errors from forgetting to call `.restore` or from forgetting to add the (tbh weird) "mode instantiation" step where they use the mode as a context manager with an empty body. Really, they wanted to treat modes like context managers and just write
```python
## FROM FEEDBACK, USER DESIRED CODE. POSSIBLE POST-PR
def f(mode):
  with mode:
    ...
f(Mode())
```

### Technical Details
With the old mode stack, we basically had a linked list, so the mode itself could only be used once and had a fixed parent. In this new design, the mode stack is just a python list that we're pushing to and popping from. There's only one mode that's ever active at the C++ level, and it runs the next mode in the Python list. The modes don't have state on them anymore.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84774
Approved by: https://github.com/ezyang, https://github.com/zou3519
2022-09-27 01:04:35 +00:00
c7b17d7eb1 Add nvprims rand_like support for Dropout (#85077)
NM
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85077
Approved by: https://github.com/IvanYashchuk, https://github.com/mruberry
2022-09-23 18:03:35 +00:00
8ca1839d32 Python Dispatcher integration with C++ dispatcher (#85050)
#84826 but without ghstack
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85050
Approved by: https://github.com/malfet
2022-09-15 00:43:36 +00:00
706b990306 Revert "Python Dispatcher integration with C++ dispatcher (#84826)"
This reverts commit 35f6a69191ef762cf22b6cbfe94b8d9406e16674.

Reverted https://github.com/pytorch/pytorch/pull/84826 on behalf of https://github.com/malfet due to Broke dynamo, see 35f6a69191
2022-09-14 14:07:58 +00:00
35f6a69191 Python Dispatcher integration with C++ dispatcher (#84826)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

From @ezyang's original PR:

There are a number of situations where we have non-backend kernels (e.g., CompositeImplicitAutograd, batching rules) which we would like to port to Python, but we have no way to integrate these ports with the overall system while using preexisting C++ registrations otherwise. This PR changes that by introducing a Python dispatcher (which can have its own kernels directly in Python), which can interpose over ordinary C++ dispatch. The ingredients:

We introduce a new PythonDispatcher dispatch key, that has the same tenor as FuncTorchDynamicLayerFrontMode: it works by getting triggered before every other dispatch key in the dispatch key set, and shunting to a Python implementation.
The Python dispatcher is a per-interpreter global object that is enabled/disabled via the guard EnablePythonDispatcher/DisablePythonDispatcher. We don't make it compositional as I have no idea what a compositional version of this feature would look like. Because it is global, we don't need to memory manage it and so I use a simpler SafePyHandle (newly added) to control access to this pointer from non-Python C++. Like __torch_dispatch__, we use PyInterpreter to get to the Python interpreter to handle the dispatch.
I need to reimplement dispatch table computation logic in Python. To do this, I expose a lot more helper functions for doing computations on alias dispatch keys and similar. I also improve the pybind11 handling for DispatchKey so that you can either accept the pybind11 bound enum or a string; this simplifies our binding code. See https://github.com/pybind/pybind11/issues/483#issuecomment-1237418106 for how this works; the technique is generally useful.

I need to be able to call backend fallbacks. I do this by permitting you to call at a dispatch key which doesn't have a kernel for the operator; if the kernel doesn't exist, we check the backend fallback table instead.
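
A minimal usage sketch of the guard from Python:

```py
import torch
from torch._dispatch.python import enable_python_dispatcher

# With the Python dispatcher enabled, Python-registered kernels (e.g.
# decompositions) can interpose before ordinary C++ dispatch.
with enable_python_dispatcher():
    out = torch.arange(4).add(1)
print(out)
```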

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84826
Approved by: https://github.com/ezyang
2022-09-14 06:57:19 +00:00
d802fcfcd8 Add config to PrimTorch's nvFuser executor (#84482)
This PR adds `executor_parameters` keyword argument to `torch._prims.executor.execute`.

For now there are two knobs:
* `use_python_fusion_cache: bool = True` whether to use lru_cache when constructing fusion object or not.
* `allow_single_op_fusion: bool = True` whether to allow fusions with single callable

Behavior can be controlled by passing a dict with custom values as the `executor_parameters` argument.
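
A usage sketch of the API described above (the exact `execute` signature is assumed, and the nvFuser executor has since been removed from PyTorch):

```py
import torch
from torch._prims.context import TorchRefsMode
from torch._prims.executor import execute
from torch.fx.experimental.proxy_tensor import make_fx

def fn(a, b):
    return torch.sigmoid(a) + b

a = torch.randn(4, device="cuda")
b = torch.randn(4, device="cuda")
with TorchRefsMode():
    gm = make_fx(fn)(a, b)  # trace to a graph of prims/refs

out = execute(
    gm, a, b,
    executor="nvfuser",
    executor_parameters={
        "use_python_fusion_cache": False,  # bypass the lru_cache on fusions
        "allow_single_op_fusion": True,
    },
)
```
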
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84482
Approved by: https://github.com/jjsjann123, https://github.com/ngimel
2022-09-09 07:58:21 +00:00
1a33e944b5 nvfuser torchbench patch (#84411)
1. Patching nvfuser_execute to take the aten fallback for nvprims when no CUDA tensors are provided as inputs;
2. Extending support of the nvfuser python API to cpu scalar tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84411
Approved by: https://github.com/ngimel, https://github.com/kevinstephano, https://github.com/IvanYashchuk
2022-09-07 05:22:37 +00:00