19 Commits

cyy
419a7e197d [6/N] Fix Wextra-semi warning (#139605)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139605
Approved by: https://github.com/ezyang
2024-11-04 13:43:16 +00:00
42994234a6 std::value/std::type -> std::_v/std::_t (#138746)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138746
Approved by: https://github.com/cyyever, https://github.com/malfet
2024-10-26 20:59:24 +00:00
e1e6417d4c Add SVE implementation of embedding_lookup_idx (#133995)
Adds an accelerated version of the embedding_lookup_idx perfkernels. This is done via a Python codegen file, similar to `caffe2/perfkernels/hp_emblookup_codegen.py`.
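
For context, a minimal pure-Python sketch of the offsets-based lookup these kernels compute; the function name, shapes, and inputs here are illustrative, not the actual C++ API:

```python
import numpy as np

def embedding_lookup_idx_ref(weights, indices, offsets):
    # Hypothetical reference: for bag b, sum the rows weights[indices[offsets[b]:offsets[b+1]]]
    num_bags = len(offsets)
    out = np.zeros((num_bags, weights.shape[1]), dtype=weights.dtype)
    for b in range(num_bags):
        start = offsets[b]
        end = offsets[b + 1] if b + 1 < num_bags else len(indices)
        for idx in indices[start:end]:
            out[b] += weights[idx]
    return out

weights = np.random.rand(1000, 64).astype(np.float32)  # embedding table
indices = np.random.randint(0, 1000, size=10)          # flattened row ids for all bags
offsets = np.array([0, 4, 7])                          # bag starts: rows 0-3, 4-6, 7-9
print(embedding_lookup_idx_ref(weights, indices, offsets).shape)  # (3, 64)
```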

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133995
Approved by: https://github.com/malfet, https://github.com/huydhn
2024-10-15 18:52:44 +00:00
dac0b4e62b Revert "Add SVE implementation of embedding_lookup_idx (#133995)"
This reverts commit 770c134998d3422bc2fa3b90baa235ed0c409e62.

Reverted https://github.com/pytorch/pytorch/pull/133995 on behalf of https://github.com/clee2000 due to breaking internal tests; I'm wondering if this just needs a targets change for buck? ([comment](https://github.com/pytorch/pytorch/pull/133995#issuecomment-2414596554))
2024-10-15 17:23:50 +00:00
770c134998 Add SVE implementation of embedding_lookup_idx (#133995)
Adds an accelerated version of the embedding_lookup_idx perfkernels. This is done via a Python codegen file, similar to `caffe2/perfkernels/hp_emblookup_codegen.py`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133995
Approved by: https://github.com/malfet, https://github.com/huydhn
2024-10-14 10:17:27 +00:00
cyy
60e8dc4374 Check function declarations in Caffe2 code (#134925)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134925
Approved by: https://github.com/ezyang
2024-09-09 05:03:29 +00:00
cyy
059cae6176 [Caffe2] Remove Caffe2 proto and other files (#127655)
Remove Caffe2 proto files altogether.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127655
Approved by: https://github.com/ezyang
2024-06-04 14:22:21 +00:00
b5594f7df0 Revert "Use missing-prototypes in torch_cpu (#103725)"
This reverts commit 716b3b893d2826f1e47ab5321f082b48c66c8c92.

Reverted https://github.com/pytorch/pytorch/pull/103725 on behalf of https://github.com/osalpekar due to broken caffe2 builds. More info at [D46920675](https://www.internalfb.com/diff/D46920675) ([comment](https://github.com/pytorch/pytorch/pull/103725#issuecomment-1603129273))
2023-06-22 18:30:31 +00:00
cyy
716b3b893d Use missing-prototypes in torch_cpu (#103725)
This PR enables -Wmissing-prototypes in torch_cpu, except for some generated cpp files and the mps and metal backends.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103725
Approved by: https://github.com/albanD
2023-06-21 13:19:55 +00:00
7cd6e6acad add bf16 in fp32 out fast path for embeddingbag in caffe2 perfkernel (#89198)
Add a BF16-in, FP32-out kernel to the Caffe2 embedding perfkernels, and also update the Python code-gen files to generate the kernel.
The unit test will be covered in the next PR (#89199) in this stack (tested via nn.EmbeddingBag with the BF16 data type).
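
As an illustration of the BF16-in/FP32-out idea, here is a sketch in PyTorch (not the Caffe2 C++ kernel; the table size and bag layout below are assumed):

```python
import torch

weights = torch.randn(1000, 64, dtype=torch.bfloat16)  # BF16 embedding table
indices = torch.randint(0, 1000, (10,))
offsets = torch.tensor([0, 4, 7])                       # start of each bag

bags = []
for b in range(len(offsets)):
    start = int(offsets[b])
    end = int(offsets[b + 1]) if b + 1 < len(offsets) else len(indices)
    rows = weights[indices[start:end]].float()  # upconvert BF16 rows to FP32
    bags.append(rows.sum(dim=0))                # accumulate in FP32
out = torch.stack(bags)
print(out.dtype)  # torch.float32 output from a BF16 table
```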

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89198
Approved by: https://github.com/jgong5, https://github.com/kit1980
2022-11-30 13:06:13 +00:00
06d1be2447 [NOOP][clangformat][codemod] Enable CLANGFORMAT for caffe2/caffe2/* (#67624)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67624

Test Plan: Visual inspection. Sandcastle.

Reviewed By: malfet

Differential Revision: D31986628

fbshipit-source-id: c872bded7325997a2945dbf5d4d052628dcb3659
2021-11-02 22:14:04 -07:00
92770d25cd fix comparison of narrow type with wide type in loop condition (#53951)
Summary:
Fix the Semmle warning: comparison of a narrow type with a wide type in a loop condition.

For example, consider the following piece of code:
for (int i = 0; i < array.size(); ++i) {}

The problem is that array.size() returns size_t, which can be a wider type than int depending on the implementation, so there is a chance that i overflows (for a very large array whose size is beyond the range of int) and the loop never terminates.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53951

Reviewed By: zou3519

Differential Revision: D27181495

Pulled By: malfet

fbshipit-source-id: 0612c5cedcdc656c193085e7fbb87dd163f20688
2021-03-22 16:40:35 -07:00
0ec717c830 Support int32 indices and offsets in nn.EmbeddingBag (#46758)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46758

It is generally helpful to support int32 indices and offsets, especially when such tensors are large and need to be transferred to accelerator backends. Since it may not be very useful to support the combination of int32 indices and int64 offsets, we enforce here that these two must have the same type.
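
A hedged usage sketch of what this enables (the module size and inputs below are made up):

```python
import torch

bag = torch.nn.EmbeddingBag(1000, 16, mode='sum')

# int32 indices and int32 offsets: both tensors share the same dtype
indices = torch.randint(0, 1000, (8,), dtype=torch.int32)
offsets = torch.tensor([0, 3, 5], dtype=torch.int32)
out = bag(indices, offsets)
print(out.shape)  # torch.Size([3, 16])

# Per this change, mixing int32 indices with int64 offsets should be rejected,
# since the two are required to have the same type.
```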

Test Plan: unit tests

Reviewed By: ngimel

Differential Revision: D24470808

fbshipit-source-id: 94b8a1d0b7fc9fe3d128247aa042c04d7c227f0b
2020-11-03 23:33:50 -08:00
3ada2e0d64 [pytorch][embeddingbag] Parallelize the EmbeddingBag operator (#4049)
Summary:
Pull Request resolved: https://github.com/pytorch/glow/pull/4049

Pull Request resolved: https://github.com/pytorch/pytorch/pull/27477

We would like to add intra-op parallelization support for the EmbeddingBag operator.

This should bring a speedup for the DLRM benchmark:
https://github.com/pytorch/pytorch/pull/24385

Benchmark code:
```
from __future__ import absolute_import, division, print_function, unicode_literals

import torch
import time

eb = torch.nn.EmbeddingBag(1000000, 64, mode='sum')

input = torch.LongTensor(1500).random_(0, 1000000)
offsets = torch.zeros(64, dtype=torch.int64)

niter = 10000
s = time.time()
for _ in range(niter):
    out = eb(input, offsets)
time_per_iter = (time.time() - s) / niter
print('time_per_iter', time_per_iter)
print('GB/s', (input.numel() * 64 * 4 + out.numel() * 4) / time_per_iter / 1e9)
```

The following results are single core on Skylake T6:
- Before our change (with the original caffe2::EmbeddingLookup)
time_per_iter 6.313693523406982e-05
GB/s 6.341517821789133

- After our change, using the EmbeddingLookupIdx API, which takes offsets instead of lengths.
time_per_iter 5.7627105712890626e-05
GB/s 6.947841559053659

- With Intel's PR: https://github.com/pytorch/pytorch/pull/24385
time_per_iter 7.393271923065185e-05
GB/s 5.415518381664018

As for multi-core performance: because Clang doesn't work with OpenMP, I can only measure the single-core performance on SKL T6.
ghstack-source-id: 97124557

Test Plan:
With D16990830:
```
buck run mode/dev //caffe2/caffe2/perfkernels:embedding_bench
```

With D17750961:
```
buck run mode/opt //experimental/jianyuhuang/embeddingbag:eb
buck run mode/opt-lto //experimental/jianyuhuang/embeddingbag:eb
```

OSS test
```
python run_test.py -i nn -- TestNNDeviceTypeCPU.test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu
```

Buck test
```
buck test mode/dev-nosan //caffe2/test:nn -- "test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu"

OMP_NUM_THREADS=3 buck test mode/opt -c pytorch.parallel_backend=tbb //caffe2/test:nn -- "test_EmbeddingBag_per_sample_weights_and_new_offsets"  --print-passing-details
```

Generate the AVX2 code for embedding_lookup_idx_avx2.cc:
```
python hp_emblookup_codegen.py --use-offsets
```

Differential Revision: D17768404

fbshipit-source-id: 8dcd15a62d75b737fa97e0eff17f347052675700
2020-01-23 21:29:44 -08:00
cddc147267 Back out "Revert D17826873: Adding support to offsets based Fused8BitRowwiseEmbeddingLookup" (#27728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27728

Original commit changeset: 15ad64e49f92

Test Plan: same as previous one.

Reviewed By: dreamingleo

Differential Revision: D17872553

fbshipit-source-id: fd9d180d5e02e2c17285898c79cdd9509ffb8bbf
2019-10-10 23:52:43 -07:00
b3cb072de7 Revert D17826873: Adding support to offsets based Fused8BitRowwiseEmbeddingLookup
Test Plan: revert-hammer

Differential Revision:
D17826873

Original commit changeset: 23c4a96d9252

fbshipit-source-id: 15ad64e49f922a859abc574b261ac0f857682ff4
2019-10-10 16:16:06 -07:00
ce6287f675 Adding support to offsets based Fused8BitRowwiseEmbeddingLookup (#27635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27635

PyTorch uses `offsets` instead of `lengths` for embedding table lookup. This adds support for that in the fused quantized version.
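
A rough sketch of what "fused 8-bit rowwise" means here, under an assumed layout for illustration (the actual Caffe2 format packs the per-row scale and bias alongside the uint8 data):

```python
import numpy as np

def quantize_rowwise(row):
    # Assumed per-row 8-bit quantization: uint8 codes plus a float scale and bias
    lo, hi = float(row.min()), float(row.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    q = np.round((row - lo) / scale).astype(np.uint8)
    return q, np.float32(scale), np.float32(lo)

def fused8bit_lookup_idx(q_rows, scales, biases, indices, offsets):
    # Offsets-based lookup: dequantize each referenced row and accumulate per bag
    out = np.zeros((len(offsets), q_rows.shape[1]), dtype=np.float32)
    for b in range(len(offsets)):
        end = offsets[b + 1] if b + 1 < len(offsets) else len(indices)
        for i in indices[offsets[b]:end]:
            out[b] += q_rows[i].astype(np.float32) * scales[i] + biases[i]
    return out

table = np.random.rand(100, 8).astype(np.float32)
packed = [quantize_rowwise(r) for r in table]
q_rows = np.stack([p[0] for p in packed])
scales = np.array([p[1] for p in packed])
biases = np.array([p[2] for p in packed])
print(fused8bit_lookup_idx(q_rows, scales, biases,
                           np.array([1, 2, 5, 7]), np.array([0, 2])).shape)  # (2, 8)
```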

AVX2 version is generated with
```
python caffe2/caffe2/perfkernels/hp_emblookup_codegen.py --fused --use-offsets
```

Test Plan:
```
buck test caffe2/torch/fb/sparsenn:test
```

Reviewed By: jianyuh

Differential Revision: D17826873

fbshipit-source-id: 23c4a96d92521deaebc02b688ad735d76a4476df
2019-10-10 10:50:44 -07:00
2bed201190 remove caffe2.pb.h dependency for embedding_lookup_idx.cc (#25670)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25670

This is part of the effort to get rid of protobuf dependency for
libtorch mobile build.

embedding_lookup_idx.cc is used by ATen/EmbeddingBag.cpp. It indirectly
includes caffe2.pb.h but doesn't really need it. Clean up the headers to
unblock no-protobuf mobile build.

The broader problem is that many common headers in pytorch/caffe2 directly
or indirectly include caffe2.pb.h. After landing the stack of changes to
remove protobuf from OSS libtorch mobile build, it's going to constrain
how ATen and other parts of pytorch use caffe2 components: it will break
OSS mobile CI if a PR introduces a dependency to a caffe2 file that
indirectly includes caffe2.pb.h. We will need to tease out caffe2.pb.h
dependencies like in this diff, or do a refactor to replace protobuf
generated types.

Chatted with gchanan and ezyang to confirm that there is no plan to
add more dependencies to caffe2 components from ATen in the near future,
so this should be fine.

Test Plan: - build locally with stacked diffs

Differential Revision: D17191913

Pulled By: ljk53

fbshipit-source-id: 1248fe6424060a8bedcf20e73942b7500ae5e815
2019-09-06 00:54:36 -07:00
ad7250d315 Make EmbeddingLookup APIs take offsets rather than lengths to match PyTorch's EmbeddingBag (#24944)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24944

As the title says, we would like to make the EmbeddingLookup APIs take offsets rather than lengths to match PyTorch's EmbeddingBag.
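
For reference, a small worked example of the lengths vs. offsets encodings of the same bags (values made up):

```python
import numpy as np

# Three bags containing 2, 3, and 1 indices respectively
lengths = np.array([2, 3, 1])        # Caffe2-style: size of each bag

# EmbeddingBag-style offsets: the start position of each bag in the flat indices array
offsets = np.concatenate(([0], np.cumsum(lengths)[:-1]))
print(offsets)  # [0 2 5]
```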
ghstack-source-id: 88883902

Test Plan:
python hp_emblookup_codegen.py --use-offsets
Check the benchmark in D16990830.

Reviewed By: jspark1105

Differential Revision: D16924271

fbshipit-source-id: 7fac640c8587db59fd2304bb8e8d63c413f27cb8
2019-08-23 14:43:56 -07:00