28 Commits

Author SHA1 Message Date
fdab48a7c1 Enable all PIE rules on ruff (#165814)
This PR enables all PIE rules in ruff. Some rules from this family were already enabled; the newly added rules are:
```
PIE796  Enum contains duplicate value: {value}
PIE808  Unnecessary start argument in range
```
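For illustration, a minimal sketch of code each new rule would flag (the names here are hypothetical):

```python
import enum

class Color(enum.Enum):
    RED = 1
    CRIMSON = 1  # PIE796: duplicate value; CRIMSON silently becomes an alias of RED

for i in range(0, 10):  # PIE808: the explicit 0 start argument is unnecessary
    print(i)
```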

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165814
Approved by: https://github.com/ezyang
2025-10-18 07:36:18 +00:00
24520b8386 Revert "Enable all PIE rules on ruff (#165814)"
This reverts commit c79dfdc6550e872783aa5cb5fc9e86589bf18872.

Reverted https://github.com/pytorch/pytorch/pull/165814 on behalf of https://github.com/cyyever due to Need to cover more files ([comment](https://github.com/pytorch/pytorch/pull/165814#issuecomment-3417931863))
2025-10-18 07:21:08 +00:00
c79dfdc655 Enable all PIE rules on ruff (#165814)
This PR enables all PIE rules in ruff. Some rules from this family were already enabled; the newly added rules are:
```
PIE796  Enum contains duplicate value: {value}
PIE808  Unnecessary start argument in range
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165814
Approved by: https://github.com/ezyang
2025-10-18 06:40:12 +00:00
94e634942a Fix int32 overflow in embedding_dense_backward (#165095)
If `max_partial_segment` is large, we can overflow `gid` and cause a bunch of illegal memory accesses (IMAs).
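A rough sketch of the failure mode (the sizes here are hypothetical):

```python
# In the kernel, gid is computed roughly as segment_index * stride. With 32-bit
# arithmetic the product wraps negative once it exceeds 2**31 - 1, producing
# out-of-bounds (illegal) memory accesses; the fix is 64-bit indexing.
max_partial_segment, stride = 131072, 16384
gid = max_partial_segment * stride      # 2147483648
print(gid > 2**31 - 1)                  # True: does not fit in int32
```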
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165095
Approved by: https://github.com/ngimel, https://github.com/eqy
2025-10-10 19:47:38 +00:00
3a20a20e70 Fix largeTensorTest malfunction on XPU (#161988)
# Motivation
https://github.com/pytorch/pytorch/pull/143553/files#diff-6492991193449e118ff0c8d42ca544cc38a73604e505ff246a3c711aeab91748R1345 makes `largeTensorTest` malfunction on XPU. This PR aims to fix it.
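For context, a hedged sketch of how the decorator is typically used (the size, device, and test names here are illustrative):

```python
from torch.testing._internal.common_device_type import largeTensorTest

class TestFoo:  # hypothetical test class
    @largeTensorTest("16GB", device="xpu")
    def test_big_tensor(self, device):
        ...
```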

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161988
Approved by: https://github.com/EikanWang, https://github.com/albanD
2025-09-04 16:10:03 +00:00
21833c9642 Added Differentiable per_sample_weights Check to EmbeddingBag.cpp (#142338)
Added a check in aten/src/ATen/native/EmbeddingBag.cpp that inspects whether per_sample_weights requires a gradient in order to determine whether at::_embedding_bag_forward_only or at::_embedding_bag should run.

Also added two tests in test_embedding.py that verify the call now works.
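A minimal sketch of the now-differentiable path (shapes are illustrative):

```python
import torch
import torch.nn.functional as F

weight = torch.randn(10, 3, requires_grad=True)
input = torch.tensor([1, 2, 4, 5])
offsets = torch.tensor([0, 2])
per_sample_weights = torch.rand(4, requires_grad=True)

# With the check in place, a per_sample_weights that requires grad routes
# through at::_embedding_bag rather than the forward-only variant.
out = F.embedding_bag(input, weight, offsets, mode="sum",
                      per_sample_weights=per_sample_weights)
out.sum().backward()
print(per_sample_weights.grad.shape)  # torch.Size([4])
```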

Fixes #136457
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142338
Approved by: https://github.com/soulitzer
2024-12-11 03:42:17 +00:00
cb71bcc542 Replace clone.detach with detach.clone (#140264)
Fixes #64532

As stated in the issue, replace `clone().detach()` with `detach().clone()`.
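The difference in a nutshell (both produce the same values):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x.clone().detach()  # clone is recorded by autograd first, then detached
z = x.detach().clone()  # detach first, so the clone is never tracked: slightly cheaper
```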

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140264
Approved by: https://github.com/soulitzer
2024-11-13 07:01:02 +00:00
a777dea3b3 Remove dtype check on meta device (#136774)
Summary:
# Latest Update

This diff is no longer needed: the check turned out to be necessary after all, to make meta behave the same as other devices; see D54526190.

---------------------------------

# Background

T176105639

| case | embedding bag weight | per_sample_weight | fbgemm lookup | forward in meta |
| --- | --- | --- | --- | --- |
| A | fp32 | fp32 | good | good |
| B | fp16 | fp32 | good | failed [check](https://fburl.com/code/k3n3h031) that forces weight dtype == per_sample_weights dtype |
| C | fp16 | fp16 | P1046999270, RuntimeError: "expected scalar type Float but found Half from fbgemm call" | good |
| D | fp32 | fp16 | N/A | N/A |

Currently we are in case A. Users need to add `use_fp32_embedding` in training to force the embedding bag dtype to be fp32. However, users actually want case B, using fp16 as the embedding bag weight. When deleting `use_fp32_embedding`, they fail the [check](https://fburl.com/code/k3n3h031) that forces `weight dtype == per_sample_weights dtype` in meta_registration.

The check is actually not necessary, because the fbgemm backend does support case B. Additionally, later on in `meta_embedding_bag`, `weight` and `per_sample_weights` don't need to be of the same dtype (https://fburl.com/code/q0tho05h; weight is src, per_sample_weights is scale) for `is_fast_path_index_select`.

# This diff
Therefore, this diff removes the unnecessary [check](https://fburl.com/code/k3n3h031) to support case B in the meta forward. With this, users can use fp16 as the emb bag dtype without forcing per_sample_weights to the same dtype in the meta forward (see Test Plan).

# Reference diffs to resolve this issue
Diff 1: D52591217
This passes the embedding bag dtype to feature_processor to make per_sample_weights the same dtype as the emb bag weight. However, `is_meta` also needs to be passed because of case C: fbgemm still does not support per_sample_weights = fp16 (see the table above), so users are forced to make per_sample_weights fp16 only when on meta. The solution requires too many hacks.

Diff 2: D53232739
Basically doing the same thing as diff 1 (D52591217), except that the hack is added in the TorchRec library. This adds an if in EBC and PEA: when the emb bag weight is fp16, it forces per_sample_weights to fp16 too. However, that runs into the fbgemm issue as well and has broken a bunch of prod models.

Test Plan:
# APS
The following command will run icvr_launcher which triggers ads_launcher and run forward in meta device:
```
buck2 run mode/opt -c python.package_style=inplace //aps_models/ads/icvr:icvr_launcher_publish -- mode=mast_ig_fm_when_combo0_uhm_publish launcher.fbl_entitlement=ads_global_tc_ads_score launcher.data_project=oncall_ads_model_platform launcher.tags=[ads_ranking_taxonomy_exlarge_fm_prod] stages.train=false
```

Result:
 {F1461463993}

Reviewed By: ezyang

Differential Revision: D54175438

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136774
Approved by: https://github.com/ezyang
2024-10-12 05:45:21 +00:00
fbe6f42dcf [BE][Easy][8/19] enforce style for empty lines in import segments in test/[k-p]*/ (#129759)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by the linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129759
Approved by: https://github.com/justinchuby, https://github.com/ezyang
2024-07-31 02:09:20 +00:00
637ab85e7f fix for launching kernel invalid config error when calling embedding with large index (#130994)

Fixes #130806
When an output of size 2147483648 (= 131072 * 16384) is expected, as in the issue above, it threw the following error:
RuntimeError: HIP error: invalid configuration argument

What happened was that the second parameter passed to hipLaunchKernel was {2147483648,1,1}.
Found two issues in Indexing.cu:

1: ptrdiff_t was used, but it is a signed int; outTotalSize >= 2147483648 can cause overflow when doing [this](39493aa934/aten/src/ATen/native/cuda/Indexing.cu (L1367)).
2: On ROCm, std::min -> ::min did not work as expected when outTotalSize >= 2147483648.

As a result, 2147483648 was passed to hipLaunchKernel, which the GPU cannot support since this number specifies the number of threads per block. The original code intended to set 128 threads per block; that choice is debatable since the perf may not be good on the latest powerful GPUs (a TODO item to revisit, maybe), but at least it would not cause the `invalid configuration argument` error.
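A hedged sketch of the triggering pattern, with sizes chosen to hit the 2147483648-element output (this is illustrative, not the exact snippet from the issue):

```python
import torch

emb = torch.nn.Embedding(16384, 16384).cuda()
idx = torch.randint(0, 16384, (131072,), device="cuda")
out = emb(idx)  # numel = 131072 * 16384 = 2147483648; previously raised
                # "HIP error: invalid configuration argument" on ROCm
print(out.shape, out.numel())
```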

[Test]
Ran the same code snippet from the [issue](https://github.com/pytorch/pytorch/issues/130806) and printed the output, its dim, and numel(); it now looks like this:
```
output=tensor([[ 0.4044, -0.0244, -0.6865,  ..., -0.7800,  0.1175,  1.6726],
        [-1.0866, -0.1609,  0.3538,  ...,  1.9105,  0.7882,  1.1583],
        [-2.2079,  0.3736,  0.3610,  ..., -0.2658, -0.0459,  1.3077],
        ...,
        [ 0.8753, -0.7482, -0.1978,  ...,  0.9016,  1.1501, -0.5178],
        [-1.5845, -0.6277,  1.4520,  ...,  0.5733, -2.1198, -0.0915],
        [-0.6310, -1.0239, -0.1910,  ...,  0.4309,  0.1630,  0.3239]],
       device='cuda:0'), dim=2, numel=2147483648
```

Added a large tensor unit test too.
```
/pytorch# pytest test/nn/test_embedding.py -k test_large_tensors
================================================================================== test session starts ===================================================================================
platform linux -- Python 3.9.19, pytest-7.3.2, pluggy-1.4.0
rootdir: /dockerx/development/pytorch
configfile: pytest.ini
plugins: flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, cpp-2.3.0, hypothesis-5.35.1
collected 288 items / 287 deselected / 1 selected
Running 1 items in this shard

test/nn/test_embedding.py .                                                                                                                                                        [100%]

=========================================================================== 1 passed, 287 deselected in 3.16s ============================================================================
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130994
Approved by: https://github.com/jeffdaily, https://github.com/xw285cornell
2024-07-20 08:33:29 +00:00
a625705290 Enable UFMT on all of test/nn (#123809)
Part of: #123062

Ran lintrunner on:

- `test/nn`

with command:

```bash
lintrunner -a --take UFMT --all-files
```

Co-authored-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123809
Approved by: https://github.com/mikaylagawarecki
2024-04-12 18:32:25 +00:00
ce2903080c Add sparse compressed fake tensor support (#120920)
As in the title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120920
Approved by: https://github.com/ezyang
2024-03-04 14:38:45 +00:00
39e4d1a535 Make TestEmbeddingNNDeviceTypeCPU::test_EmbeddingBag_per_sample_weights_and_no_offsets_cpu_int32_float32 compatible with TorchDynamo (#120831)
Previously, the test case directly accessed the tensor data via tensor.data, which is not supported on FakeTensor. As a workaround, we manually copy the tensor.
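Roughly, the workaround replaces a `.data` write with an explicit copy (the module and tensor names here are hypothetical stand-ins):

```python
import torch

module = torch.nn.Embedding(4, 2)
new_weight = torch.randn(4, 2)

# before (not supported under FakeTensor): module.weight.data = new_weight
# after: copy explicitly instead of assigning through .data
with torch.no_grad():
    module.weight.copy_(new_weight)
```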
Fixes: https://github.com/pytorch/pytorch/issues/119788

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120831
Approved by: https://github.com/janeyx99
2024-03-01 20:27:41 +00:00
e660bd1422 Re-enable some embedded bag tests (#111712)
They were temporarily disabled in 2019 by https://github.com/pytorch/pytorch/pull/26599

As suggested, increased the relative tolerance from 0 to 2% for tests using the float16 dtype.

<!--
copilot:poem
-->
### <samp>🤖 Generated by Copilot at 1e49d84</samp>

> _`TestEmbeddingNN`_
> _CUDA tests restored_
> _Bug fixed in autumn breeze_

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111712
Approved by: https://github.com/huydhn
2023-10-26 22:16:38 +00:00
192477b5ba Enable flake8-bugbear B020 lint (#110823)
Fixes part of https://github.com/pytorch/pytorch/issues/106571
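B020 flags a loop whose control variable shadows the iterable being looped over, e.g.:

```python
items = [1, 2, 3]
for items in items:  # B020: loop variable `items` overwrites the iterable
    print(items)
```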

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110823
Approved by: https://github.com/Skylion007
2023-10-24 22:43:47 +00:00
818f2297e6 Ensure fill_ works when value is a view of self (#109835)
# Summary
#109533 introduced a BC-breaking change for the case where self is a view of the value: by using the copy_() op inside fill_, we were hitting `assert_no_partial_overlap` in the tensor iterator.

Ideally we would avoid this check when value.numel() == 1, but rather than monkeying around with the tensor iterator, I just clone the input instead.
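A minimal repro sketch of the pattern this restores:

```python
import torch

t = torch.arange(4.0)
t.fill_(t[2])  # value is a 0-dim view of self; cloning the value avoids
               # tripping assert_no_partial_overlap in the tensor iterator
print(t)       # tensor([2., 2., 2., 2.])
```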
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109835
Approved by: https://github.com/mikaylagawarecki
2023-09-26 17:12:48 +00:00
deea268e43 Update aten_fill to avoid d2h sync (#109533)
Fixes #109115

### Before:
![Before](https://github.com/pytorch/pytorch/assets/32754868/394a4c51-7cae-4d05-b9ad-b17d02beaf72)

### After:
![After](https://github.com/pytorch/pytorch/assets/32754868/e2f774f5-5374-49c3-95ec-dd3a85f74a2e)
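Conceptually, the change keeps a device-side 0-dim fill value on the device instead of synchronizing it to the host (a sketch, not the exact kernel path):

```python
import torch

x = torch.empty(1024, device="cuda")
v = torch.tensor(3.0, device="cuda")
x.fill_(v)  # previously forced a device-to-host sync (e.g. via .item());
            # now the value stays on device
```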

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109533
Approved by: https://github.com/mikaylagawarecki
2023-09-19 13:34:49 +00:00
e5e9d563c2 Lift user defined attributes into inputs for certain cases (user defined types and tensors) (#103386)
(1) Lazy (converts to a dynamo variable on access only)
(2) Uses existing side effect/reconstruct tech
(3) Not tensor-opinionated

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103386
Approved by: https://github.com/jansel
2023-06-20 23:45:19 +00:00
d303665d33 Make int unspecialization actually work (#95621)
OK, so this PR used to be about reducing the number of constants we specialize on, but it turns out that unspecialization was ~essentially never used (because we still constant specialized way too aggressively) and I ended up having to fix a bunch of issues to actually get tests to pass. So this PR is now "make int unspecialization actually work". As part of this, I have to turn off unspecialization by default, as there are still latent bugs in inductor.

The general strategy is that an unspecialized int is represented as a SymInt. Representing it as a 0d tensor (which is what the code used to do) is untenable: (1) we often need unspecialized ints to participate in size computations, but we have no way of propagating sympy expressions through tensor compute, and (2) a lot of APIs work when passed SymInt, but not when passed a Tensor. However, I continue to represent Numpy scalars as Tensors, as they are rarely used for size computation and they have an explicit dtype, so they are more accurately modeled as 0d tensors.
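As a hedged illustration of the SymInt representation using today's torch.compile API (exact recompile behavior depends on the config):

```python
import torch

@torch.compile(dynamic=True)
def f(x, n):
    return x[:n] * n  # n flows into size computation, so it is traced as a SymInt

f(torch.randn(8), 3)
f(torch.randn(8), 5)  # a different plain int can reuse the symbolic graph;
                      # only 0 and 1 are specialized by default
```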

* I folded in the changes from https://github.com/pytorch/pytorch/pull/95099 as I cannot represent unspecialized ints as SymInts without also turning on dynamic shapes. This also eliminates the necessity for test_unspec.py, as toggling specialization without dynamic shapes doesn't do anything. As dynamic shapes defaults to unspecializing, I just deleted this entirely; for the specialization case, I rely on regular static shape tests to catch it. (Hypothetically, we could also rerun all the tests with dynamic shapes, but WITH int/float specialization, but this seems... not that useful? I mean, I guess export wants it, but I'd kind of like our Source heuristic to improve enough that export doesn't have to toggle this either.)
* Only 0/1 integers get specialized by default now
* A hodgepodge of fixes. I'll comment on the PR about them.

Fixes https://github.com/pytorch/pytorch/issues/95469

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95621
Approved by: https://github.com/jansel, https://github.com/Chillee
2023-03-04 01:22:08 +00:00
8aa34602f7 Jetson Update for CI Redo (#94549)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94549
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-02-21 17:13:38 +00:00
e5c2a35d83 Add check that embedding_bag's weight is 2D (#94931)
Fixes https://github.com/pytorch/pytorch/issues/94445
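A sketch of the input shape the check now rejects cleanly:

```python
import torch
import torch.nn.functional as F

weight_1d = torch.randn(30)  # invalid: embedding_bag's weight must be 2-D
try:
    F.embedding_bag(torch.tensor([0, 1]), weight_1d, torch.tensor([0]))
except RuntimeError as e:
    print(e)  # raises a clear shape error instead of misbehaving
```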

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94931
Approved by: https://github.com/albanD
2023-02-16 02:37:47 +00:00
ed54a5d06b enable bf16 emb (#94163)
Merge https://github.com/pytorch/pytorch/pull/89199 and https://github.com/pytorch/pytorch/pull/91949 into one PR.
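Illustratively, this allows bfloat16 embedding bags on CPU (a hedged sketch):

```python
import torch

emb = torch.nn.EmbeddingBag(10, 4, mode="sum", dtype=torch.bfloat16)
out = emb(torch.tensor([1, 2, 4, 5]), torch.tensor([0, 2]))
print(out.dtype)  # torch.bfloat16
```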

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94163
Approved by: https://github.com/jianyuh, https://github.com/malfet, https://github.com/jgong5
2023-02-12 00:05:09 +00:00
53e4fe076a Revert "enable bf16 emb (#94163)"
This reverts commit f3bf46e801dec2637751224fd6e27fbf97453bc6.

Reverted https://github.com/pytorch/pytorch/pull/94163 on behalf of https://github.com/huydhn due to Sorry for reverting your PR. But I suspect that it causes flaky SIGSEGV failure for linux-bionic-py3.8-clang9 / test (crossref) job in trunk.  For example, 05397b1250
2023-02-07 00:32:22 +00:00
f3bf46e801 enable bf16 emb (#94163)
Merge https://github.com/pytorch/pytorch/pull/89199 and https://github.com/pytorch/pytorch/pull/91949 into one PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94163
Approved by: https://github.com/jianyuh, https://github.com/malfet, https://github.com/jgong5
2023-02-06 07:11:40 +00:00
93a810b045 Add dim checks for internal embedding_bag functions (#85433)
Fixes #85213

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85433
Approved by: https://github.com/malfet
2022-12-27 19:27:33 +00:00
9d523616b3 fix segfault for EmbeddingBag on CPU slow path when include_last_offset is true (#90358)
This PR fixes the segfault reported at https://github.com/pytorch/pytorch/issues/89677: a `double free` issue caused by an `invalid read`.

The reported issue breaks in the slow path for `EmbeddingBag` on float32, at [EmbeddingBag.cpp#L451](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/EmbeddingBag.cpp#L451).

The root cause is that, for the reported case, `add_indices` contains an index that exceeds the range of `output_data`.

The offsets are given as
```
{0,  6, 12, 15, 25, 32, 40, 42, 46, 53, 53}
```

`indices` has 55 elements, and `offsets[-1] != indices.size(0)`.

When `include_last_offset` is true, the `output` will have shape {offsets.size(0) - 1, weight.sizes()[1]}, i.e. {10, 5}.
Originally, `add_indices` will be (I rearranged the 1D tensor by rows, so here 10 rows in total):
```
### this is 55 elements
  0 0 0 0 0 0
  1 1 1 1 1 1
  2 2 2
  3 3 3 3 3 3 3 3 3 3
  4 4 4 4 4 4 4
  5 5 5 5 5 5 5 5
  6 6
  7 7 7 7
  8 8 8 8 8 8 8
  10 10
```
The last row has index 10, which is out of range for the output tensor of size [10, 5].

The reason is that `make_offset2bag` at [EmbeddingBag.cpp#L66](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/EmbeddingBag.cpp#L66) gives the following `offset2bag`:
```
### this is 55 + 1 elements:
0 0 0 0 0 0 1
0 0 0 0 0 1
0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 1
0 0 0 0 0 0 0 1
0 1
0 0 0 1
0 0 0 0 0 0 2
0 0
```

Notice that index 53 is added twice.

The fix is to ignore the last element of `offsets` when `include_last_offset` is true. This behavior also aligns with CUDA; see https://github.com/pytorch/pytorch/pull/57208#issuecomment-1021727378
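A hedged sketch of the failing pattern (the weight shape is illustrative; note `offsets[-1] != indices.size(0)`):

```python
import torch
import torch.nn.functional as F

weight = torch.randn(60, 5)
indices = torch.randint(0, 60, (55,))
offsets = torch.tensor([0, 6, 12, 15, 25, 32, 40, 42, 46, 53, 53])

# With include_last_offset=True the output has offsets.size(0) - 1 = 10 bags;
# before the fix, the stray trailing index could write out of bounds.
out = F.embedding_bag(indices, weight, offsets, mode="sum",
                      include_last_offset=True)
print(out.shape)  # torch.Size([10, 5])
```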

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90358
Approved by: https://github.com/ezyang
2022-12-16 02:08:14 +00:00
158a071034 add _freeze for embedding op (#86769)
Fixes #86663
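Assuming the `_freeze` keyword landed as described, usage looks like:

```python
import torch

emb = torch.nn.Embedding(10, 3, _freeze=True)  # weight created with requires_grad=False
print(emb.weight.requires_grad)  # False
```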

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86769
Approved by: https://github.com/albanD
2022-10-13 20:12:52 +00:00
6a5550fca4 [test_nn] split embedding tests from test_nn (#85892)
Ref https://github.com/pytorch/pytorch/issues/63085
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85892
Approved by: https://github.com/albanD
2022-09-30 21:45:40 +00:00