pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-11-03 07:24:58 +08:00

Author	SHA1	Message	Date
Prachi Gupta	c5ebc12f7f	[ROCm] unskip test_non_standard_bool except for failing ops (#152956 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/152956 Approved by: https://github.com/jithunnair-amd, https://github.com/jeffdaily	2025-05-13 15:55:42 +00:00
Prachi Gupta	b8f4dc5a9f	[ROCm] opportunistic fastatomics for ReduceAdd operations for MI300 GPUs (#146264 ) In this approach, we are catching any lane within a wave that is doing fastatomics to the same destination address and computing the sum on the CU. This is leading to 3x improvement in scatter_add performance and 2x improvement in index_select. scatter_add performance on MI300x: dtype\|Baseline (before optimizations)\|opportunistic fastatomics -------\|----------------------------------\|---------------------------------- f32\|1.389425039\|0.430447996 fp16\|2.195472956\|0.779729486 bf16\|2.194051027\|0.784599513 Using the following reproducer ``` import torch import triton def main(): dtype = torch.float32 dim = 1305301 a = torch.rand(100, device="cuda", dtype=dtype) index = torch.randint(0, 100, (dim,), device="cuda") src = torch.rand(dim, device="cuda", dtype=dtype) print("=" * 20) print( triton.testing.do_bench( lambda: a.scatter_add(0, index, src), return_mode="median", ) ) print("=" * 20) if __name__ == "__main__": main() ``` co-authored by: @amd-hhashemi Pull Request resolved: https://github.com/pytorch/pytorch/pull/146264 Approved by: https://github.com/jeffdaily, https://github.com/mxz297 Co-authored-by: Hashem Hashemi <hashem.hashemi@amd.com>	2025-04-22 21:55:40 +00:00
FFFrog	b01877aa13	Fix addbmm & addmv & baddbmm out dtype check (#148176 ) ---- - torch.addbmm - torch.addmv - torch.baddbmm ISSUE related: https://github.com/pytorch/pytorch/issues/138399 Pull Request resolved: https://github.com/pytorch/pytorch/pull/148176 Approved by: https://github.com/jansel ghstack dependencies: #148174	2025-04-09 07:02:56 +00:00
FFFrog	3e0038ae85	Fix torch.matmul related out dtype check (#148174 ) ---- - torch.matmul -> CompositeImplicitAutograd -> dot_out (when left_dim == 1 & right_dim == 1) -> mv_out (when left_dim == 2 & right_dim == 1) -> mm_out (when left_dim == 1 & right_dim == 2) -> ... - torch.dot - torch.vdot - torch.mm - torch.mv ISSUE related: https://github.com/pytorch/pytorch/issues/138399 Pull Request resolved: https://github.com/pytorch/pytorch/pull/148174 Approved by: https://github.com/jansel	2025-04-08 17:00:28 +00:00
Kurt Mohler	164d2c887b	Add check in `test_cow_input` to ensure COW data is never changed (#150723 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150723 Approved by: https://github.com/Skylion007	2025-04-07 04:35:00 +00:00
Robert Hardwick	810d2a3dbd	[ARM] Fix bug in _ref_test_helper in test_ops and fix failing test on Aarch64 (#146597 ) We have a failing unit test on Aarch64 ``` Exception: Caused by reference input at index 34: SampleInput(input=Tensor[size=(5, 5, 4), device="cpu", dtype=torch.complex64, contiguous=False], args=(), kwargs={}, broadcasts_input=False, name='') To execute this test, run the following from the base repo dir: PYTORCH_OPINFO_SAMPLE_INPUT_INDEX=34 python test/test_ops.py TestCommonCPU.test_python_ref__refs_square_cpu_complex64 This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 ``` After debugging, I found that the `ex` variable is not reset to None on each loop iteration inside _ref_test_helper. Fixing that exposed another expectedFailure to re-enable, `nn.functional.hinge_embedding_loss`, which was incorrectly being skipped because of the same problem. `4a545eb85d/test/test_ops.py (L546)`: the `ex` variable is not reset after this line for the next loop iteration. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146597 Approved by: https://github.com/digantdesai	2025-02-25 14:15:10 +00:00
cyy	8f728e28dd	Enable ASAN in CUDA tests (#147512 ) It should work. Pull Request resolved: https://github.com/pytorch/pytorch/pull/147512 Approved by: https://github.com/soulitzer	2025-02-25 02:58:39 +00:00
FFFrog	b0fa92042b	Fix torch.mean out dtype check (#147188 ) For CPU: Type promotion is supported for torch.mean For Meta: Not supported for torch.mean ISSUE related: https://github.com/pytorch/pytorch/issues/138399 Pull Request resolved: https://github.com/pytorch/pytorch/pull/147188 Approved by: https://github.com/albanD	2025-02-25 02:50:03 +00:00
Aaron Orenstein	086d146f6f	Update ruff linter for PEP585 (#147540 ) This turns on PEP585 enforcement in RUFF. - Updates the target python version - Stops ignoring UP006 warnings (PEP585) - Fixes a few issues which crept into the tree in the last day Pull Request resolved: https://github.com/pytorch/pytorch/pull/147540 Approved by: https://github.com/justinchuby, https://github.com/Skylion007	2025-02-22 04:45:17 +00:00
Shunting Zhang	bc0191802f	[inductor] add size-asserts for fallback ops (#145904 ) Fix https://github.com/pytorch/pytorch/issues/144717 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145904 Approved by: https://github.com/jansel	2025-02-07 18:44:32 +00:00
Aaron Gokaslan	f3304571fc	[BE][Ez]: FURB148 - remove useless enumerate calls (#145619 ) Remove useless enumerate calls Pull Request resolved: https://github.com/pytorch/pytorch/pull/145619 Approved by: https://github.com/drisspg	2025-01-24 23:37:15 +00:00
cyy	df458be4e5	[4/N] Apply py39 ruff and pyupgrade fixes (#143257 ) ```torch/fx/passes/annotate_getitem_nodes.py``` was changed to support the new type hinting annotations. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143257 Approved by: https://github.com/justinchuby, https://github.com/albanD	2025-01-04 10:47:51 +00:00
Wenqin Yang	8d9ff9c8a4	Fix a bug for wrong stride in fake tensor (#141427 ) Fixes #141426 Please see details in the issue. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141427 Approved by: https://github.com/jansel	2024-12-31 23:45:32 +00:00
Tom Ritchford	d8c8ba2440	Fix unused Python variables in test/[e-z]* (#136964 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136964 Approved by: https://github.com/justinchuby, https://github.com/albanD	2024-12-18 23:02:30 +00:00
cyy	7c1d5db1f3	[2/N] Enable UBSAN tests (#141740 ) Apply c10::load in more places. The function was introduced to cast a byte to valid boolean values, thus fixing the UBSAN errors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/141740 Approved by: https://github.com/ezyang	2024-12-03 20:52:26 +00:00
George Wigley	44707b0667	Pass rounding_mode for div reference inputs through kwargs (#136308 ) Previously, the reference inputs for div with rounding mode did not supply the rounding_mode keyword argument. This didn't match the sample inputs for this op. Pull Request resolved: https://github.com/pytorch/pytorch/pull/136308 Approved by: https://github.com/albanD Co-authored-by: Xia, Weiwen <weiwen.xia@intel.com> Co-authored-by: Bob Ren <bobren@meta.com> Co-authored-by: Xilun Wu <12968408+XilunWu@users.noreply.github.com> Co-authored-by: siahuat0727 <tansiahuat@gmail.com>	2024-11-29 21:28:24 +00:00
Jake Harmon	740d1eb030	Fix test_out when run on CPU with CUDA available (#137140 ) Ever since #135140, this test will fail if run with CPU parameterization (e.g. test_out__refs_logical_or_cpu_float32) and CUDA available - as far as I can tell, the PyTorch CI isn't currently checking for this. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137140 Approved by: https://github.com/ezyang	2024-11-21 23:10:07 +00:00
Yukio Siraichi	446ea2aea5	`pow`: fix meta function output argument dtype check. (#140287 ) Tracking issue: #138399 This PR changes the `pow` C++ implementation, making its C++ meta kernel consistent with its Python ref implementation. The following example shows the inconsistency between the two: ```python def run(device): S = (5,) a = torch.rand(S, device=device, dtype=torch.float32) b = 2 out = torch.empty(S, device=device, dtype=torch.float64) return torch.pow(a, b, out=out) >>> run("cpu") Traceback (most recent call last): File "test.py", line 34, in run return torch.pow(a, b, out=out) RuntimeError: Found dtype Double but expected Float >>> run("meta") tensor(..., device='meta', size=(5,), dtype=torch.float64) ``` ~Update:~ ~Note that this happens only for `pow.Tensor_Scalar` overloads. Therefore, this PR needed further 2 modifications:~ - ~Split the `pow` ref implementation, making `pow.Tensor_Scalar` error on mismatching output dtypes~ - ~Create a dispatch for `pow` when `_refs.pow()` is called~ Update: Changing the `TensorIteratorConfig` for `pow.Tensor_Scalar` was easier and, after the discussion below, more correct. The solution was to change the `TensorIteratorBase::build_output_borrowing_argument_owning_unary_op` function, setting: - `cast_common_dtype_to_outputs`; and - `enforce_safe_casting_to_output`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140287 Approved by: https://github.com/ezyang	2024-11-20 13:28:47 +00:00
Yukio Siraichi	48a276c5a0	`log_softmax`: fix meta function output argument dtype check. (#140289 ) Tracking issue: #138399 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140289 Approved by: https://github.com/ezyang ghstack dependencies: #140186, #140286, #140288	2024-11-18 23:05:29 +00:00
Yukio Siraichi	435286e985	Fix unary references' out dtype check. (#140288 ) Tracking issue: #138399 This PR fixes a number of reference implementations (which are also used as meta functions), making them more consistent with CPU device. More specifically, it fixes those operations that use `_make_elementwise_unary_reference` decorator, and don't error on mismatching out argument dtype while they error when using concrete devices (e.g. CPU). The fixed operations are: - `abs` - `ceil` - `floor` - `frac` - `isneginf` - `isposinf` - `sgn` - `sign` - `signbit` - `trunc` Pull Request resolved: https://github.com/pytorch/pytorch/pull/140288 Approved by: https://github.com/ezyang ghstack dependencies: #140186, #140286	2024-11-18 23:05:29 +00:00
Yukio Siraichi	216b6a952c	`triangular_solve`: fix meta function output argument dtype check. (#140286 ) Tracking issue: #138399 Pull Request resolved: https://github.com/pytorch/pytorch/pull/140286 Approved by: https://github.com/ezyang ghstack dependencies: #140186	2024-11-14 15:25:14 +00:00
zeshengzong	cb71bcc542	Replace clone.detach with detach.clone (#140264 ) Fixes #64532 As stated in the issue, replace `clone.detach` with `detach.clone` Pull Request resolved: https://github.com/pytorch/pytorch/pull/140264 Approved by: https://github.com/soulitzer	2024-11-13 07:01:02 +00:00
Yukio Siraichi	c182c7ccfc	Fix `triangular_solve` meta function out parameter names. (#140186 ) This PR replaces the parameter names specified in the `triangular_solve_meta` function (specifically in its `@out_wrapper(...)` decorator) by those written in the _native_functions.yaml_ file. This name mismatch caused the operation to fail when using the meta device (see error below): ```python Traceback (most recent call last): File "examples/test.py", line 23, in <module> torch.triangular_solve(b.to("meta"), A.to("meta"), out=meta_out) File "torch/_decomp/__init__.py", line 100, in _fn return f(*args, **kwargs, out=None if is_none else out_kwargs) File "torch/_prims_common/wrappers.py", line 289, in _fn result = fn(*args, **kwargs) TypeError: triangular_solve_meta() got an unexpected keyword argument 'X' ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/140186 Approved by: https://github.com/ezyang	2024-11-12 19:04:34 +00:00
Yukio Siraichi	fef5e94657	`addmm`: error on output dtype mismatch. (#138520 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138520 Approved by: https://github.com/ezyang ghstack dependencies: #138515	2024-10-30 21:46:39 +00:00
Yukio Siraichi	6da3a043a8	Add test for consistency between meta and CPU devices. (#138515 ) Reference: https://github.com/pytorch/pytorch/issues/138399 This PR introduces an `OpInfo` test that checks whether running each `out=` operation using meta inputs is consistent with using concrete (e.g. CPU) inputs. More specifically, it tests the case where the output tensors are not of the expected data type. According to the `out=` specification, some operations should error. I have added XFAIL to the set of operations that are currently failing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138515 Approved by: https://github.com/ezyang	2024-10-30 21:46:39 +00:00
cyy	da1c1a9884	[4/N] Don't skip ASAN on some tests (#139189 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139189 Approved by: https://github.com/ezyang	2024-10-30 00:59:32 +00:00
PyTorch MergeBot	228963ad60	Revert "Add test for consistency between meta and CPU devices. (#138515 )" This reverts commit 006130d8eae834d17e3d3e21e61c506740cce6dc. Reverted https://github.com/pytorch/pytorch/pull/138515 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but the test is failing in trunk, maybe a landrace ([comment](https://github.com/pytorch/pytorch/pull/138515#issuecomment-2442357471))	2024-10-28 18:45:09 +00:00
Yukio Siraichi	006130d8ea	Add test for consistency between meta and CPU devices. (#138515 ) Reference: https://github.com/pytorch/pytorch/issues/138399 This PR introduces an `OpInfo` test that checks whether running each `out=` operation using meta inputs is consistent with using concrete (e.g. CPU) inputs. More specifically, it tests the case where the output tensors are not of the expected data type. According to the `out=` specification, some operations should error. I have added XFAIL to the set of operations that are currently failing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138515 Approved by: https://github.com/ezyang	2024-10-28 16:58:48 +00:00
Benjamin Glass	f984b88718	Ensure noncontiguous tensor creation tests offsetting (#136396 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136396 Approved by: https://github.com/amjames, https://github.com/eellison ghstack dependencies: #136055	2024-10-02 00:40:43 +00:00
Fuzzkatt	6300eb1dc7	tf32 off for test_noncontiguous_samples in test_ops.py (#136484 ) Upstreaming minor unit test fix from nvidia internal CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/136484 Approved by: https://github.com/soulitzer	2024-09-24 14:26:47 +00:00
David Berard	289486d007	Move attention kernels back from fake_impls to meta_registrations (#134288 ) See #121528 for additional context. In #120682, we moved the attention kernels from meta_registrations to fake_impls with the intent of fixing the device handling for seed/offset: these are typically on CPU. We needed to put the registrations in fake_impls to do this because meta_registrations doesn't have a way to specify device, whereas fake_impls does. But when we tried to actually fix the device types (#120839), we had to revert the PR because it broke cudagraph handling (during which seed/offset _are_ on CUDA). Now, we want to put the registrations back in meta_registrations so that we can call these kernels with meta tensors. The use case is later in this stack - we want to be able to use the flop counter with these kernels. Also - I specifically skip the `compare_tensor_meta()` check in test_fake / test_fake_autocast tests for the `_efficient_attention_forward` and `_flash_attention_forward` kernels, which fails because of the device mismatch from the seed/offset tensors. Then we can un-skip these opinfos. I verified that the efficient_attention_forward bug (#120842) is now caught by these opinfos if I revert the fix from this PR. Differential Revision: [D61687369](https://our.internmc.facebook.com/intern/diff/D61687369) Pull Request resolved: https://github.com/pytorch/pytorch/pull/134288 Approved by: https://github.com/drisspg	2024-08-27 21:10:36 +00:00
David Berard	d433a603af	[BE] use torch.amp.autocast instead of torch.cuda.amp.autocast (#134291 ) torch.cuda.amp.autocast / torch.cpu.amp.autocast are deprecated and spew a ton of warnings when these tests run. This PR: Update to just use torch.amp.autocast(device). Note: this uncovers a bug in the test: when `device` is CUDA, it actually shows up as "cuda:0" - so previously, this test was _always_ using `torch.cpu.amp.autocast` even for `cuda` device. This PR fixes this, and uncovers additional bugs in `pinverse` and `linalg.pinv`; `linalg.pinv` was already failing before on CPU, but now the test also catches failures on CUDA, (and this PR adds to the skipped-test list). Pull Request resolved: https://github.com/pytorch/pytorch/pull/134291 Approved by: https://github.com/YuqingJ	2024-08-24 15:07:49 +00:00
Xuehai Pan	4226ed1585	[BE] Format uncategorized Python files with `ruff format` (#132576 ) Remove patterns `**`, `test/**`, and `torch/**` in `tools/linter/adapters/pyfmt_linter.py` and run `lintrunner`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132576 Approved by: https://github.com/ezyang, https://github.com/Skylion007 ghstack dependencies: #132574	2024-08-04 17:13:31 +00:00
zengxian	d3e932dc10	[CI] Add inductor cpu accuracy test running on AVX2 runners (#128682 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/128682 Approved by: https://github.com/jgong5, https://github.com/desertfire	2024-07-26 13:24:41 +00:00
ankurneog	ebc012ace6	Add hooks for execution on intel gaudi devices - 1 (#128584 ) ## Motivation This is a follow-up to PR https://github.com/pytorch/pytorch/pull/126970 to support Gaudi devices for PyTorch UT execution. ## Changes We are adding additional hooks to: 1. Add dtype exceptions for Gaudi/HPU 2. Extend onlyNativeDevices decorator functionality to add additional devices Pull Request resolved: https://github.com/pytorch/pytorch/pull/128584 Approved by: https://github.com/albanD	2024-07-20 05:03:36 +00:00
Xuehai Pan	ba48cf6535	[BE][Easy][6/19] enforce style for empty lines in import segments in `test/` (#129757 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129757 Approved by: https://github.com/ezyang	2024-07-17 06:42:37 +00:00
Xuehai Pan	4d7bf72d93	[BE][Easy] fix ruff rule needless-bool (SIM103) (#130206 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/130206 Approved by: https://github.com/malfet	2024-07-14 08:17:52 +00:00
Xuehai Pan	973037be6a	[BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): `list()` / `tuple()` / `dict()` (#130199 ) This PR changes the empty collection factory call to Python literals: - `list()` -> `[]` - `tuple()` -> `()` - `dict()` -> `{}` The Python literals are more performant and safer. For example, the bytecode for building an empty dictionary: ```bash $ python3 -m dis - <<EOS import collections d1 = {} d2 = dict() dict = collections.OrderedDict d3 = dict() EOS ``` ```text 0 0 RESUME 0 1 2 LOAD_CONST 0 (0) 4 LOAD_CONST 1 (None) 6 IMPORT_NAME 0 (collections) 8 STORE_NAME 0 (collections) 3 10 BUILD_MAP 0 12 STORE_NAME 1 (d1) 4 14 PUSH_NULL 16 LOAD_NAME 2 (dict) 18 CALL 0 26 STORE_NAME 3 (d2) 6 28 LOAD_NAME 0 (collections) 30 LOAD_ATTR 8 (OrderedDict) 50 STORE_NAME 2 (dict) 7 52 PUSH_NULL 54 LOAD_NAME 2 (dict) 56 CALL 0 64 STORE_NAME 5 (d3) 66 RETURN_CONST 1 (None) ``` The dict literal `{}` only has one bytecode `BUILD_MAP`, while the factory call `dict()` has three `PUSH_NULL + LOAD_NAME + CALL`. Also, the factory call is not safe if users override the `dict` name in `locals` or `globals` (see the example of replacing with `OrderedDict` above). Pull Request resolved: https://github.com/pytorch/pytorch/pull/130199 Approved by: https://github.com/malfet	2024-07-11 17:30:28 +00:00
a-gardner1	3c1cf03fde	Add fake impl for aten.unique_dim (#126561 ) Follow-up to #113118 and #124306. Developed in coordination with the solution to https://github.com/microsoft/onnxscript/pull/1547 This PR adds the missing fake tensor implementation for `aten.unique_dim`, thus enabling tracing and compilation of `torch.unique` when `dim` is not None. Local testing has proceeded with the following simple script (provided that one has checked out the changes in https://github.com/microsoft/onnxscript/pull/1547): ```python import onnx import onnxruntime as ort import logging import numpy as np onnx_program = torch.onnx.dynamo_export( lambda x: torch.unique(x, dim=0, return_inverse=True), torch.arange(10), export_options=torch.onnx.ExportOptions( dynamic_shapes=True, diagnostic_options=torch.onnx.DiagnosticOptions( verbosity_level=logging.DEBUG))) onnx_program.save("torch_unique.onnx") onnx_inputs = onnx_program.adapt_torch_inputs_to_onnx(torch.arange(10)) onnx_outputs = onnx_program(*onnx_inputs) loaded_onnx_program = onnx.load("torch_unique.onnx") onnx.checker.check_model(loaded_onnx_program) ort_session = ort.InferenceSession("torch_unique.onnx") inputs = np.random.randint(0, 10, 10) print(f"Inputs: {inputs}") outputs = ort_session.run(None, { "l_x_": inputs }) print(f"Outputs: {outputs}") print("Success") ``` Co-authored-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/126561 Approved by: https://github.com/ezyang	2024-06-01 04:03:10 +00:00
Aaron Gokaslan	5a1216bb2e	[BE]: Update ruff to 0.4.1 (#124549 ) Update ruff to 0.4.1. This version fixes a lot of false negatives/false positives, is 20-40% faster, and has various other bug fixes. Below is a before and after table showing the execution time of ruff lint and ruff format in milliseconds courtesy of https://astral.sh/blog/ruff-v0.4.0 \| Repository \| Linter (v0.3) \| Linter (v0.4) \| Formatter (v0.3) \| Formatter (v0.4) \| \|----------------------------------------------------\|---------------\|---------------\|------------------\|------------------\| \| [pytorch/pytorch](https://github.com/pytorch/pytorch) \| 328.7 \| 251.8 \| 351.1 \| 274.9 \| Pull Request resolved: https://github.com/pytorch/pytorch/pull/124549 Approved by: https://github.com/ezyang	2024-04-21 14:06:23 +00:00
Tugsbayasgalan Manlaibaatar	d23bf9cef0	Add fake impl for aten.unique2 (#124306 ) Reapply of: https://github.com/pytorch/pytorch/pull/121571 Differential Revision: [D56258431](https://our.internmc.facebook.com/intern/diff/D56258431) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124306 Approved by: https://github.com/gmagogsfm	2024-04-17 22:55:27 +00:00
statelesshz	2216068559	Enable UFMT on test/test_ops* (#123935 ) Part of https://github.com/pytorch/pytorch/issues/123062 Pull Request resolved: https://github.com/pytorch/pytorch/pull/123935 Approved by: https://github.com/ezyang	2024-04-13 03:31:56 +00:00
Kurt Mohler	db895ace1d	Only run backward part of COW test if results are strided (#123870 ) Fixes #123792 Pull Request resolved: https://github.com/pytorch/pytorch/pull/123870 Approved by: https://github.com/ezyang	2024-04-12 04:43:02 +00:00
Kurt Mohler	3908ebca86	Test COW materialization in backward ops (#123593 ) Part of #97856 Pull Request resolved: https://github.com/pytorch/pytorch/pull/123593 Approved by: https://github.com/ezyang	2024-04-09 22:31:50 +00:00
Kurt Mohler	ca9606f809	Update COW OpInfo test to include kwargs and expected materialization (#122437 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122437 Approved by: https://github.com/ezyang	2024-03-24 06:07:30 +00:00
PyTorch MergeBot	c80601f35a	Revert "Avoid COW materialize in conv, log sigmoid, repeat, group_norm, batch_norm (#121537 )" This reverts commit a2a88f39ee991f471f2a2c54571886d70f5cd2e6. Reverted https://github.com/pytorch/pytorch/pull/121537 on behalf of https://github.com/kurtamohler due to flaky CI failures ([comment](https://github.com/pytorch/pytorch/pull/121537#issuecomment-2010937226))	2024-03-21 00:03:30 +00:00
Kurt Mohler	a2a88f39ee	Avoid COW materialize in conv, log sigmoid, repeat, group_norm, batch_norm (#121537 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121537 Approved by: https://github.com/ezyang	2024-03-19 06:15:00 +00:00
Peter Bell	34a28f01dd	[Autograd] Improve error for leaf tensors as out argument to fallback (#121089 ) Closes #120988 Currently operators that hit the autograd fallback call `check_inplace` on all mutated inputs, including out arguments. This leads to a slightly confusing error message: ``` RuntimeError: a leaf Variable that requires grad is being used in an in-place operation. ``` Compared to functions that don't fallback, which raise ``` RuntimeError: add(): functions with out=... arguments don't support automatic differentiation, but one of the arguments requires grad. ``` This changes the error message to make clear the issue is with the out argument, but does not tighten the check to outright ban out arguments that require grad. Instead, I use the same checks from `check_inplace` which allows non-leaf tensors that require grad to pass without error. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121089 Approved by: https://github.com/lezcano, https://github.com/soulitzer ghstack dependencies: #121142	2024-03-05 21:13:27 +00:00
Kurt Mohler	77aea289ae	Add test to check that COW inputs are not materialized (#119507 ) Part of #97856 Pull Request resolved: https://github.com/pytorch/pytorch/pull/119507 Approved by: https://github.com/ezyang ghstack dependencies: #120455	2024-03-01 05:05:28 +00:00
Sergii Dymchenko	09aefe1502	Fix ouput typos (#120870 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120870 Approved by: https://github.com/clee2000	2024-02-29 08:29:14 +00:00


434 Commits
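
Note on commit 973037be6a (C408): that commit message argues that the literal `{}` compiles to a single `BUILD_MAP` instruction, while `dict()` needs a name lookup plus a call and can be redirected if the name `dict` is rebound. The sketch below is one way to check both claims locally. It is an illustration only and not code from the PyTorch tree: the helper names `opnames` and `make_empty_mappings` are made up for this example, and exact opcode names for calls vary between Python versions, so the check only looks for an instruction whose name starts with `CALL`.

```python
import collections
import dis


def opnames(source):
    """Return the bytecode instruction names for a snippet of source (hypothetical helper)."""
    code = compile(source, "<demo>", "exec")
    return [ins.opname for ins in dis.get_instructions(code)]


literal_ops = opnames("d = {}")
factory_ops = opnames("d = dict()")

# The literal builds the mapping directly with BUILD_MAP.
print("BUILD_MAP" in literal_ops)
# The factory call looks up the name `dict` and then calls it;
# the call opcode is CALL or CALL_FUNCTION depending on the Python version.
print("LOAD_NAME" in factory_ops, any(op.startswith("CALL") for op in factory_ops))


def make_empty_mappings():
    """Show that `dict()` follows whatever the name `dict` is bound to (hypothetical helper)."""
    dict = collections.OrderedDict  # shadows the builtin inside this function only
    return {}, dict()


literal, shadowed = make_empty_mappings()
# The literal is still a plain dict; the factory call produced an OrderedDict.
print(type(literal).__name__, type(shadowed).__name__)
```

The shadowing is done inside a function so the rebinding stays local and does not affect the rest of the module.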