Summary:
fix Semmle warning: Comparison of narrow type with wide type in loop condition
For example, consider the following piece of code:
for (int i=0; i<array.size(); ++i) {}
The problem is that the return type of `array.size()` is `size_t`, which can be wider than `int` depending on the implementation. If the array size exceeds the range of `int`, `i` overflows and the loop never terminates.
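A minimal sketch of the usual fix for this class of warning (the exact change in this PR may differ): give the loop variable a type as wide as `size_t` so the comparison no longer mixes widths:
```
#include <cstddef>
#include <vector>

void process(const std::vector<int>& array) {
  // size_t matches the return type of std::vector::size(), so i can
  // always reach array.size() without overflowing.
  for (size_t i = 0; i < array.size(); ++i) {
    // ... use array[i] ...
  }
}
```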
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53951
Reviewed By: zou3519
Differential Revision: D27181495
Pulled By: malfet
fbshipit-source-id: 0612c5cedcdc656c193085e7fbb87dd163f20688
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46758
It is generally helpful to support int32 indices and offsets, especially when such tensors are large and need to be transferred to accelerator backends. Since supporting mixed combinations such as int32 indices with int64 offsets is unlikely to be useful, we enforce that indices and offsets have the same type.
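For illustration, a minimal libtorch sketch of the resulting contract (hypothetical usage, not taken from this diff):
```
#include <torch/torch.h>

int main() {
  torch::nn::EmbeddingBag eb(
      torch::nn::EmbeddingBagOptions(1000, 16).mode(torch::kSum));
  // int32 indices with int32 offsets: both share the same dtype.
  auto indices = torch::randint(0, 1000, {50}, torch::kInt);
  auto offsets = torch::arange(0, 50, 10).to(torch::kInt);
  auto out = eb(indices, offsets);  // 5 bags of 10 indices -> out is [5, 16]
  // Mixing int32 indices with int64 offsets would be rejected.
}
```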
Test Plan: unit tests
Reviewed By: ngimel
Differential Revision: D24470808
fbshipit-source-id: 94b8a1d0b7fc9fe3d128247aa042c04d7c227f0b
Summary:
Pull Request resolved: https://github.com/pytorch/glow/pull/4049
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27477
We would like to add intra-op parallelization support for the EmbeddingBag operator.
This should bring a speedup for the DLRM benchmark:
https://github.com/pytorch/pytorch/pull/24385
Benchmark code:
```
from __future__ import absolute_import, division, print_function, unicode_literals
import torch
import time
eb = torch.nn.EmbeddingBag(1000000, 64, mode='sum')
input = torch.LongTensor(1500).random_(0, 1000000)
offsets = torch.zeros(64, dtype=torch.int64)
niter = 10000
s = time.time()
for _ in range(niter):
    out = eb(input, offsets)
time_per_iter = (time.time() - s) / niter
print('time_per_iter', time_per_iter)
print('GB/s', (input.numel() * 64 * 4 + out.numel() * 4) / time_per_iter / 1e9)
```
The following results are single core on Skylake T6:
- Before our change (with the original caffe2::EmbeddingLookup)
time_per_iter 6.313693523406982e-05
GB/s 6.341517821789133
- After our change (using the EmbeddingLookupIdx API, which takes offsets instead of lengths)
time_per_iter 5.7627105712890626e-05
GB/s 6.947841559053659
- With Intel's PR: https://github.com/pytorch/pytorch/pull/24385
time_per_iter 7.393271923065185e-05
GB/s 5.415518381664018
As for multi-core performance: because Clang doesn't work with OpenMP, I can only measure single-core performance on SKL T6.
ghstack-source-id: 97124557
Test Plan:
With D16990830:
```
buck run mode/dev //caffe2/caffe2/perfkernels:embedding_bench
```
With D17750961:
```
buck run mode/opt //experimental/jianyuhuang/embeddingbag:eb
buck run mode/opt-lto //experimental/jianyuhuang/embeddingbag:eb
```
OSS test
```
python run_test.py -i nn -- TestNNDeviceTypeCPU.test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu
```
Buck test
```
buck test mode/dev-nosan //caffe2/test:nn -- "test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu"
OMP_NUM_THREADS=3 buck test mode/opt -c pytorch.parallel_backend=tbb //caffe2/test:nn -- "test_EmbeddingBag_per_sample_weights_and_new_offsets" --print-passing-details
```
Generate the AVX2 code for embedding_lookup_idx_avx2.cc:
```
python hp_emblookup_codegen.py --use-offsets
```
Differential Revision: D17768404
fbshipit-source-id: 8dcd15a62d75b737fa97e0eff17f347052675700
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27635
PyTorch uses `offsets` instead of `lengths` for embedding table lookup. This adds `offsets` support to the fused quantized version.
AVX2 version is generated with
```
python caffe2/caffe2/perfkernels/hp_emblookup_codegen.py --fused --use-offsets
```
Test Plan:
```
buck test caffe2/torch/fb/sparsenn:test
```
Reviewed By: jianyuh
Differential Revision: D17826873
fbshipit-source-id: 23c4a96d92521deaebc02b688ad735d76a4476df
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25670
This is part of the effort to get rid of protobuf dependency for
libtorch mobile build.
embedding_lookup_idx.cc is used by ATen/EmbeddingBag.cpp. It indirectly
includes caffe2.pb.h but doesn't really need it. Clean up the headers to
unblock the no-protobuf mobile build.
The broader problem is that many common headers in pytorch/caffe2 directly
or indirectly include caffe2.pb.h. After landing the stack of changes that
removes protobuf from the OSS libtorch mobile build, this will constrain
how ATen and other parts of pytorch use caffe2 components: it will break
OSS mobile CI if a PR introduces a dependency on a caffe2 file that
indirectly includes caffe2.pb.h. We will need to tease out caffe2.pb.h
dependencies as in this diff, or do a refactor to replace the
protobuf-generated types.
Chatted with gchanan and ezyang to confirm that there is no plan to
add more dependencies on caffe2 components from ATen in the near future,
so this should be fine.
Test Plan: - build locally with stacked diffs
Differential Revision: D17191913
Pulled By: ljk53
fbshipit-source-id: 1248fe6424060a8bedcf20e73942b7500ae5e815
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24944
As the title says, we would like to make the EmbeddingLookup APIs take offsets rather than lengths, to match PyTorch's EmbeddingBag.
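For illustration, offsets are the exclusive prefix sum of lengths, e.g. lengths [2, 3, 1] correspond to offsets [0, 2, 5]. A minimal sketch of the conversion (hypothetical helper, not part of this diff):
```
#include <torch/torch.h>

// lengths {2, 3, 1} -> offsets {0, 2, 5}: each offset marks where a bag
// starts in the flattened indices tensor.
torch::Tensor lengths_to_offsets(const torch::Tensor& lengths) {
  auto zero = torch::zeros({1}, lengths.options());
  return torch::cat({zero, lengths.cumsum(0).slice(/*dim=*/0, 0, -1)});
}
```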
ghstack-source-id: 88883902
Test Plan:
```
python hp_emblookup_codegen.py --use-offsets
```
Check the benchmark in D16990830.
Reviewed By: jspark1105
Differential Revision: D16924271
fbshipit-source-id: 7fac640c8587db59fd2304bb8e8d63c413f27cb8