This PR enables all PIE rules in ruff. Some rules from this family were already enabled; the newly added rules are:
```
PIE796 Enum contains duplicate value: {value}
PIE808 Unnecessary start argument in range
```
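For illustration, a minimal Python sketch (not taken from the PyTorch codebase) of the patterns these two rules flag:
```
from enum import Enum


class Color(Enum):
    RED = 1
    BLUE = 1  # PIE796: duplicate value (BLUE silently becomes an alias of RED)


# PIE808: the start argument 0 is redundant
total = 0
for i in range(0, 10):
    total += i

# Preferred form after the lint fix
for i in range(10):
    total += i
```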
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165814
Approved by: https://github.com/ezyang
Added a check in aten/src/ATen/native/EmbeddingBag.cpp that determines whether per_sample_weights requires a gradient, and uses that to decide whether at::_embedding_bag_forward_only or at::_embedding_bag should run.
Also added two tests in test_embedding.py that verify the call now works.
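For illustration, a hedged Python sketch of the scenario the tests cover (assumed setup, not the exact test code):
```
import torch
import torch.nn.functional as F

# Assumed repro: only per_sample_weights requires grad, so the dispatch must
# pick at::_embedding_bag (not the forward-only variant) for backward to
# produce per_sample_weights.grad.
weight = torch.randn(10, 3)                 # embedding table, no grad needed
indices = torch.tensor([1, 2, 4, 5, 4, 3])
offsets = torch.tensor([0, 3])
per_sample_weights = torch.rand(6, requires_grad=True)

out = F.embedding_bag(indices, weight, offsets, mode="sum",
                      per_sample_weights=per_sample_weights)
out.sum().backward()
print(per_sample_weights.grad.shape)        # torch.Size([6])
```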
Fixes #136457
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142338
Approved by: https://github.com/soulitzer
Summary:
# Latest Update
This diff is no longer needed: it turned out we did need the check to exist, to make meta behave the same as other devices; see D54526190.
---------------------------------
# Background
T176105639
| case | embedding bag weight | per_sample_weight | fbgemm lookup | forward in meta |
| --- | --- | --- | --- | --- |
| A | fp32 | fp32 | good | good |
| B | fp16 | fp32 | good | failed [check](https://fburl.com/code/k3n3h031) that forces weight dtype == per_sample_weights dtype |
| C | fp16 | fp16 | P1046999270, RuntimeError: "expected scalar type Float but found Half from fbgemm call" | good |
| D | fp32 | fp16 | N/A | N/A |
Currently we are in case A: users need to add `use_fp32_embedding` in training to force the embedding bag dtype to be fp32. However, users actually want case B, using fp16 as the embedding bag weight. When they delete `use_fp32_embedding`, they fail the [check](https://fburl.com/code/k3n3h031) in meta_registration that forces `weight dtype == per_sample_weights dtype`.
The check is actually not necessary, because the fbgemm backend does support case B. Additionally, later on in `meta_embedding_bag`, `weight` and `per_sample_weights` don't need to be the same dtype (https://fburl.com/code/q0tho05h: weight is src, per_sample_weights is scale) for `is_fast_path_index_select`.
# This diff
Therefore, this diff removes the unnecessary [check](https://fburl.com/code/k3n3h031) to support case B in the meta forward. With this, users can use fp16 as the emb bag dtype without having to force per_sample_weights to the same dtype in the meta forward (see Test Plan).
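As a hedged illustration (a minimal sketch with assumed shapes, not the Test Plan command below), case B in the meta forward looks roughly like this:
```
import torch
import torch.nn.functional as F

# Assumed minimal sketch of case B on the meta device: fp16 weight with
# fp32 per_sample_weights. With the dtype check removed from the meta
# registration, this forward is expected to return a meta tensor instead
# of raising.
weight = torch.empty(10, 4, dtype=torch.float16, device="meta")
indices = torch.empty(6, dtype=torch.long, device="meta")
offsets = torch.empty(2, dtype=torch.long, device="meta")
psw = torch.empty(6, dtype=torch.float32, device="meta")

out = F.embedding_bag(indices, weight, offsets, mode="sum",
                      per_sample_weights=psw)
print(out.shape, out.dtype, out.device)
```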
# Reference diffs to resolve this issue
Diff 1: D52591217
This passes the embedding bag dtype to feature_processor so that per_sample_weights has the same dtype as the emb bag weight. However, `is_meta` also needs to be passed because of case C: fbgemm still does not support per_sample_weights = fp16 (see the table above), so users are forced to make per_sample_weights fp16 only when it is on meta. The solution requires too many hacks.
Diff 2: D53232739
Basically does the same thing as diff 1 (D52591217), except that the hack is added in the TorchRec library. This adds an `if` in EBC and PEA: when the emb bag weight is fp16, it forces per_sample_weights to fp16 too. However, this runs into the same fbgemm issue and has broken a bunch of prod models.
Test Plan:
# APS
The following command runs icvr_launcher, which triggers ads_launcher and runs forward on the meta device:
```
buck2 run mode/opt -c python.package_style=inplace //aps_models/ads/icvr:icvr_launcher_publish -- mode=mast_ig_fm_when_combo0_uhm_publish launcher.fbl_entitlement=ads_global_tc_ads_score launcher.data_project=oncall_ads_model_platform launcher.tags=[ads_ranking_taxonomy_exlarge_fm_prod] stages.train=false
```
Result:
{F1461463993}
Reviewed By: ezyang
Differential Revision: D54175438
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136774
Approved by: https://github.com/ezyang
…with large index
Fixes#130806
When an output of size 2147483648 (= 131072 * 16384) is expected, as in the issue above, the following error was thrown:
RuntimeError: HIP error: invalid configuration argument
What happened was that the second parameter passed to hipLaunchKernel was an invalid {2147483648, 1, 1}.
Found two issues in Indexing.cu:
1: ptrdiff_t was used, but it is a signed int, so outTotalSize >= 2147483648 can cause an overflow when doing [this](39493aa934/aten/src/ATen/native/cuda/Indexing.cu (L1367)).
2: On ROCm, std::min -> ::min did not work as expected when outTotalSize >= 2147483648.
As a result, 2147483648 was passed to hipLaunchKernel as the number of threads per block, which the GPU cannot support. The original code intended to set 128 threads per block; that choice is debatable, since the performance may not be good on the latest powerful GPUs (perhaps a TODO to revisit for perf), but at least it does not cause the `invalid configuration argument` error.
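For context, a hedged Python sketch of the kind of lookup involved (the shapes are assumptions chosen to produce a 2147483648-element output; the exact snippet is in the linked issue):
```
import torch

# Assumed shapes yielding 131072 * 16384 = 2147483648 output elements;
# previously this overflowed the HIP launch configuration on ROCm.
weight = torch.randn(32, 16384, device="cuda", dtype=torch.float16)
indices = torch.randint(0, 32, (131072,), device="cuda")
out = torch.nn.functional.embedding(indices, weight)
print(out.dim(), out.numel())   # 2 2147483648
```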
[Test]
Run the same code snippet as in the [issue](https://github.com/pytorch/pytorch/issues/130806) and print the output, its dim, and its numel(); it now looks like this:
```
output=tensor([[ 0.4044, -0.0244, -0.6865, ..., -0.7800, 0.1175, 1.6726],
[-1.0866, -0.1609, 0.3538, ..., 1.9105, 0.7882, 1.1583],
[-2.2079, 0.3736, 0.3610, ..., -0.2658, -0.0459, 1.3077],
...,
[ 0.8753, -0.7482, -0.1978, ..., 0.9016, 1.1501, -0.5178],
[-1.5845, -0.6277, 1.4520, ..., 0.5733, -2.1198, -0.0915],
[-0.6310, -1.0239, -0.1910, ..., 0.4309, 0.1630, 0.3239]],
device='cuda:0'), dim=2, numel=2147483648
```
Added a large tensor unit test too.
```
/pytorch# pytest test/nn/test_embedding.py -k test_large_tensors
================================================================================== test session starts ===================================================================================
platform linux -- Python 3.9.19, pytest-7.3.2, pluggy-1.4.0
rootdir: /dockerx/development/pytorch
configfile: pytest.ini
plugins: flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, cpp-2.3.0, hypothesis-5.35.1
collected 288 items / 287 deselected / 1 selected
Running 1 items in this shard
test/nn/test_embedding.py . [100%]
=========================================================================== 1 passed, 287 deselected in 3.16s ============================================================================
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130994
Approved by: https://github.com/jeffdaily, https://github.com/xw285cornell
# Summary
Introduced a BC-breaking change in #109533 for the case where self is a view of the value: by using the copy_() op inside fill_, we were hitting `assert_no_partial_overlap` in TensorIterator.
Ideally we would be able to avoid this check when value.numel() == 1, but rather than monkeying around with TensorIterator, I just clone the input instead.
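A hedged sketch of the overlapping case (an assumed minimal repro, not the original report):
```
import torch

# Assumed minimal repro: the fill value is a 0-d view into the tensor being
# filled, so routing fill_ through copy_ tripped assert_no_partial_overlap;
# cloning the value sidesteps the partial-overlap check.
t = torch.arange(4.0)
t.fill_(t[1])   # value aliases self
print(t)        # tensor([1., 1., 1., 1.])
```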
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109835
Approved by: https://github.com/mikaylagawarecki
OK, so this PR used to be about reducing the number of constants we specialize on, but it turns out that unspecialization was ~essentially never used (because we still constant specialized way too aggressively) and I ended up having to fix a bunch of issues to actually get tests to pass. So this PR is now "make int unspecialization actually work". As part of this, I have to turn off unspecialization by default, as there are still latent bugs in inductor.
The general strategy is that an unspecialized int is represented as a SymInt. Representing it as a 0d tensor (which is what the code used to do) is untenable: (1) we often need unspecialized ints to participate in size computations, but we have no way of propagating sympy expressions through tensor compute, and (2) a lot of APIs work when passed SymInt, but not when passed a Tensor. However, I continue to represent Numpy scalars as Tensors, as they are rarely used for size computation and they have an explicit dtype, so they are more accurately modeled as 0d tensors.
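A hedged sketch of the intended usage (the config flag name and defaults here are assumptions based on this description and may differ across versions):
```
import torch
import torch._dynamo as dynamo

# Assumption: int unspecialization is opt-in via this flag, since the PR
# turns it off by default.
dynamo.config.specialize_int = False

@torch.compile(dynamic=True)
def scale(x, k: int):
    return x * k

x = torch.randn(4)
scale(x, 3)
scale(x, 7)   # intended to reuse the same graph, with k carried as a SymInt
scale(x, 1)   # 0 and 1 remain specialized, per the notes below
```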
* I folded in the changes from https://github.com/pytorch/pytorch/pull/95099 as I cannot represent unspecialized ints as SymInts without also turning on dynamic shapes. This also eliminates the necessity for test_unspec.py, as toggling specialization without dynamic shapes doesn't do anything. As dynamic shapes defaults to unspecializing, I just deleted this entirely; for the specialization case, I rely on regular static shape tests to catch it. (Hypothetically, we could also rerun all the tests with dynamic shapes, but WITH int/float specialization, but this seems... not that useful? I mean, I guess export wants it, but I'd kind of like our Source heuristic to improve enough that export doesn't have to toggle this either.)
* Only 0/1 integers get specialized by default now
* A hodgepodge of fixes. I'll comment on the PR about them.
Fixes https://github.com/pytorch/pytorch/issues/95469
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95621
Approved by: https://github.com/jansel, https://github.com/Chillee
This reverts commit f3bf46e801dec2637751224fd6e27fbf97453bc6.
Reverted https://github.com/pytorch/pytorch/pull/94163 on behalf of https://github.com/huydhn due to Sorry for reverting your PR. But I suspect that it causes flaky SIGSEGV failure for linux-bionic-py3.8-clang9 / test (crossref) job in trunk. For example, 05397b1250
This PR fixes the segfault reported at https://github.com/pytorch/pytorch/issues/89677, a `double free` issue caused by an `invalid read`.
The reported issue breaks in the slow path for `EmbeddingBag` on float32, at [EmbeddingBag.cpp#L451](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/EmbeddingBag.cpp#L451).
The root cause is that, for the reported case, `add_indices` contains an index that exceeds the range of `output_data`.
The offsets are given as
```
{0, 6, 12, 15, 25, 32, 40, 42, 46, 53, 53}
```
The `indices` has 55 elements and `offsets[-1] != indices.size(0)`.
When `include_last_offset` is true, the `output` will have shape {offsets.size(0) - 1, weight.sizes()[1]}, which is {10, 5} here.
Originally, `add_indices` will be (I re-arrange the 1D tensor by rows, so here 10 rows in total):
```
### this is 55 elements
0 0 0 0 0 0
1 1 1 1 1 1
2 2 2
3 3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4
5 5 5 5 5 5 5 5
6 6
7 7 7 7
8 8 8 8 8 8 8
10 10
```
The last row has an index of 10, which is out of range for the output tensor of size [10, 5].
The reason is that `make_offset2bag` at [EmbeddingBag.cpp#L66](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/EmbeddingBag.cpp#L66) gives the following `offset2bag`:
```
### this is 55 + 1 elements:
0 0 0 0 0 0 1
0 0 0 0 0 1
0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 1
0 0 0 0 0 0 0 1
0 1
0 0 0 1
0 0 0 0 0 0 2
0 0
```
Notice that index 53 is added twice.
The fix is to ignore the last index from `offsets` when `include_last_offset` is true; this behavior also aligns with CUDA, per https://github.com/pytorch/pytorch/pull/57208#issuecomment-1021727378.
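A hedged sketch of the reported configuration (assumed shapes; using per_sample_weights to stay on the float32 slow path is also an assumption):
```
import torch
import torch.nn.functional as F

# Assumed repro shape: 55 indices, offsets[-1] == 53 != 55, and
# include_last_offset=True. Previously this produced an out-of-range bag
# index and the reported double free; with the fix it returns a [10, 5]
# output.
weight = torch.randn(60, 5)
indices = torch.randint(0, 60, (55,))
offsets = torch.tensor([0, 6, 12, 15, 25, 32, 40, 42, 46, 53, 53])
psw = torch.rand(55)   # assumed to keep the call off the fbgemm fast path

out = F.embedding_bag(indices, weight, offsets, mode="sum",
                      per_sample_weights=psw, include_last_offset=True)
print(out.shape)   # torch.Size([10, 5])
```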
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90358
Approved by: https://github.com/ezyang