pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 12:54:11 +08:00

Author	SHA1	Message	Date
PyTorch MergeBot	47804ce467	Revert "12/n : Remove fbandroid_compiler_flags (#165558 )" This reverts commit aead9270f56ebc7302c7f5fa7e5dff959f26608e. Reverted https://github.com/pytorch/pytorch/pull/165558 on behalf of https://github.com/clee2000 due to Diff was actually reverted internally D84832629 ([comment](https://github.com/pytorch/pytorch/pull/165558#issuecomment-3420367955))	2025-10-20 03:03:13 +00:00
Ketan Ambati	aead9270f5	12/n : Remove fbandroid_compiler_flags (#165558 ) Summary: Currently `get_c2_fbandroid_xplat_compiler_flags()` is reading the `caffe2.strip_glog` buckconfig which we want to get rid of. This diff removes the `fbandroid_compiler_flags` arg and merges it with compiler_flags with a nested select and the select version of the method The goal is to get rid of all the usages of `get_c2_fbandroid_xplat_compiler_flags()` so that we can get rid of the `caffe2.strip_glog` buckconfig Test Plan: CI bifferential Revision: D84626885 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165558 Approved by: https://github.com/malfet	2025-10-16 20:41:24 +00:00
PyTorch MergeBot	e1d71a6b35	Revert "12/n : Remove fbandroid_compiler_flags (#165558 )" This reverts commit d7ffa8b8a29ba6071c51499c1df3d702d0a26f72. Reverted https://github.com/pytorch/pytorch/pull/165558 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/165558#issuecomment-3411879769))	2025-10-16 17:18:56 +00:00
Ketan Ambati	d7ffa8b8a2	12/n : Remove fbandroid_compiler_flags (#165558 ) Summary: Currently `get_c2_fbandroid_xplat_compiler_flags()` is reading the `caffe2.strip_glog` buckconfig which we want to get rid of. This diff removes the `fbandroid_compiler_flags` arg and merges it with compiler_flags with a nested select and the select version of the method The goal is to get rid of all the usages of `get_c2_fbandroid_xplat_compiler_flags()` so that we can get rid of the `caffe2.strip_glog` buckconfig Test Plan: CI Differential Revision: D84626885 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165558 Approved by: https://github.com/malfet	2025-10-16 05:46:02 +00:00
Mikayla Gawarecki	f37a6523ef	Move version.h to torch/headeronly (#164381 ) Differential Revision: [D83685392](https://our.internmc.facebook.com/intern/diff/D83685392) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164381 Approved by: https://github.com/janeyx99	2025-10-07 17:47:30 +00:00
PyTorch MergeBot	0fb89b84b9	Revert "Consistently use c10_ovrsource in arvr mode everywhere (#164128 )" This reverts commit efd7fd5ed5ac7ec03201a546a09fb19ec59de431. Reverted https://github.com/pytorch/pytorch/pull/164128 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/164128#issuecomment-3352544006))	2025-09-30 14:43:52 +00:00
Edward Yang	efd7fd5ed5	Consistently use c10_ovrsource in arvr mode everywhere (#164128 ) Summary: Previously, many arvr targets transitively depended on c10, not c10_ovrsource, because they either explicitly depended on c10 (because they didn't know better) or they depended on legacy Caffe2, which never got the ovrsource treatment. So we found all these spots (driven by D82283623) and forced them to query arvr mode to figure out which one they should use. The goal is you NEVER have both targets in the same build rule at the same time. This diff could be reverted if D82224960 works out but I haven't gotten it to work yet. Test Plan: sandcastle Reviewed By: EscapeZero Differential Revision: D82390436 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164128 Approved by: https://github.com/albanD, https://github.com/malfet	2025-09-29 20:47:20 +00:00
PyTorch MergeBot	00059db034	Revert "[RELAND] Always build USE_DISTRIBUTED (#160449 ) and Make distributed modules importable even when backend not built (#159889 ) (#162594 )" This reverts commit 09cb34c1dce8fe1b880bbf3115d8ddad3401d871. Reverted https://github.com/pytorch/pytorch/pull/162594 on behalf of https://github.com/malfet due to reverted internally and now can be safely reverted in OSS ([comment](https://github.com/pytorch/pytorch/pull/162594#issuecomment-3334176367))	2025-09-25 13:47:46 +00:00
Brian Hirsh	7d710403b0	Reapply "Make functionalization `ViewMeta` serializable with pickle. (#143712 )" (#163769 ) ### Summary: NOTE: This is a re-export of https://github.com/pytorch/pytorch/pull/161994 ; the changes between these two PRs is exclusively to the buck/build files (Summary from #161994 ) Attempted rebase of https://github.com/pytorch/pytorch/pull/143712. This reverts commit 6c713ccb5e0df227dd5b630057cbccd373cbe7d6. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames Lucaskabela imported-using-ghimport Test Plan: Imported from OSS Differential Revision: D81524507 Pulled By: Lucaskabela Pull Request resolved: https://github.com/pytorch/pytorch/pull/163769 Approved by: https://github.com/dolpm Co-authored-by: Brian Hirsh <hirsheybar@fb.com>	2025-09-25 10:27:37 +00:00
Edward Yang	09cb34c1dc	[RELAND] Always build USE_DISTRIBUTED (#160449 ) and Make distributed modules importable even when backend not built (#159889 ) (#162594 ) Summary: Original: D81957844 and D81957923 Also, https://github.com/pytorch/pytorch/pull/162142 is patched in as well #buildall Test Plan: sandcastle and oss ci Rollback Plan: Reviewed By: H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/162594 Approved by: https://github.com/H-Huang, https://github.com/dcci	2025-09-22 21:12:18 +00:00
PyTorch MergeBot	f0078941cf	Revert "[RELAND] Always build USE_DISTRIBUTED (#160449 ) and Make distributed modules importable even when backend not built (#159889 ) (#162594 )" This reverts commit 6c334885d48725197b5d35e2c1543efc0f4198d0. Reverted https://github.com/pytorch/pytorch/pull/162594 on behalf of https://github.com/wdvr due to reverted internally - @ezyang see D82281294 ([comment](https://github.com/pytorch/pytorch/pull/162594#issuecomment-3317017530))	2025-09-22 05:39:07 +00:00
Jae Ku	5f25dbe7fd	Rm pytorch deps platform args (#163086 ) Summary: Platform args was a buck1 concept that we decided to port over to buck2 in order to make the migration easier. However, platforms args existing in the repo blocks some buck modernization like modefile free efforts, so we're trying to get rid of the usage. Test Plan: CI Rollback Plan: Differential Revision: D82470032 Pull Request resolved: https://github.com/pytorch/pytorch/pull/163086 Approved by: https://github.com/malfet, https://github.com/8Keep	2025-09-19 02:13:03 +00:00
Edward Yang	6c334885d4	[RELAND] Always build USE_DISTRIBUTED (#160449 ) and Make distributed modules importable even when backend not built (#159889 ) (#162594 ) Summary: Original: D81957844 and D81957923 Also, https://github.com/pytorch/pytorch/pull/162142 is patched in as well #buildall Test Plan: sandcastle and oss ci Rollback Plan: Reviewed By: H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/162594 Approved by: https://github.com/H-Huang, https://github.com/dcci	2025-09-12 10:54:42 +00:00
PyTorch MergeBot	6b59a19242	Revert "[RELAND] Always build USE_DISTRIBUTED (#160449 ) and Make distributed modules importable even when backend not built (#159889 ) (#162594 )" This reverts commit 6e8f17c58029e5fa6bc222b2445ebbc0cbdc17c7. Reverted https://github.com/pytorch/pytorch/pull/162594 on behalf of https://github.com/huydhn due to Reverted internally ([comment](https://github.com/pytorch/pytorch/pull/162594#issuecomment-3283985880))	2025-09-12 06:52:03 +00:00
Edward Yang	6e8f17c580	[RELAND] Always build USE_DISTRIBUTED (#160449 ) and Make distributed modules importable even when backend not built (#159889 ) (#162594 ) Summary: Original: D81957844 and D81957923 Also, https://github.com/pytorch/pytorch/pull/162142 is patched in as well #buildall Test Plan: sandcastle and oss ci Rollback Plan: Reviewed By: H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/162594 Approved by: https://github.com/H-Huang, https://github.com/dcci	2025-09-12 03:56:18 +00:00
Edward Yang	dda071587f	Revert "Make distributed modules importable even when backend not built (#159889 )" (#162568 ) This reverts commit a0d026688cd69583d5a4e0c6f3e5fda141a7f4a9. Revert "Always build USE_DISTRIBUTED. (#160449)" This reverts commit d80297a6846f1f2c36fd4f19e22919f2abe8fcea. Pull Request resolved: https://github.com/pytorch/pytorch/pull/162568 Approved by: https://github.com/huydhn	2025-09-10 04:29:42 +00:00
Edward Yang	d80297a684	Always build USE_DISTRIBUTED. (#160449 ) Signed-off-by: Edward Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449 Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/dcci	2025-09-08 19:10:36 +00:00
PyTorch MergeBot	1e0656f063	Revert "Always build USE_DISTRIBUTED. (#160449 )" This reverts commit de893e96c775023aa3be895060848fac3296772c. Reverted https://github.com/pytorch/pytorch/pull/160449 on behalf of https://github.com/jeanschmidt due to internal changes breaks import checks, see [D81845053](https://www.internalfb.com/diff/D81845053) ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3264887002))	2025-09-08 07:04:36 +00:00
Edward Yang	de893e96c7	Always build USE_DISTRIBUTED. (#160449 ) Signed-off-by: Edward Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449 Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/dcci	2025-09-05 20:15:11 +00:00
PyTorch MergeBot	adae7f66aa	Revert "Always build USE_DISTRIBUTED. (#160449 )" This reverts commit c37103234afc832dcad307e9016230810957c9d5. Reverted https://github.com/pytorch/pytorch/pull/160449 on behalf of https://github.com/jeanschmidt due to Breaking internal build rules, see D81756619 ([comment](https://github.com/pytorch/pytorch/pull/160449#issuecomment-3259430011))	2025-09-05 18:58:47 +00:00
Edward Yang	c37103234a	Always build USE_DISTRIBUTED. (#160449 ) Signed-off-by: Edward Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/160449 Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/dcci	2025-09-04 19:43:17 +00:00
Richard Howell	8975cda252	[pt] strip error messages in profile builds (#162076 ) Summary: Profile builds should match production builds, and error messages result in large static initializers running. Omit them for profile builds too. Test Plan: Before: ``` $ buck build //xplat/caffe2:aten_native_cpuApple -c user.sandcastle_build_mode=profile --show-output $ llvm-nm buck-out/v2/gen/fbsource/31fc3668aa0b4012/xplat/caffe2/__aten_native_cpuApple__/libaten_native_cpuApple.pic.a \| grep ZN3c106detail12_str_wrapperIJPKcRKiS3_RKxS3_RKS3_S3_EE4callES9_S5_S9_S7_S9_S9_S9 0000000000003234 T __ZN3c106detail12_str_wrapperIJPKcRKiS3_RKxS3_RKS3_S3_EE4callES9_S5_S9_S7_S9_S9_S9_ ``` After: ``` $ buck build //xplat/caffe2:aten_native_cpuApple -c user.sandcastle_build_mode=profile --show-output $ llvm-nm buck-out/v2/gen/fbsource/31fc3668aa0b4012/xplat/caffe2/__aten_native_cpuApple__/libaten_native_cpuApple.pic.a \| grep ZN3c106detail12_str_wrapperIJPKcRKiS3_RKxS3_RKS3_S3_EE4callES9_S5_S9_S7_S9_S9_S9 ``` Rollback Plan: Reviewed By: yury-dymov, abashyam Differential Revision: D81599582 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162076 Approved by: https://github.com/swolchok	2025-09-04 04:18:27 +00:00
Aidyn-A	3e5b021f21	[ATen][CPU][Sparse] Use Third-Party Eigen for sparse add and addmm (#155357 ) This pull request adds the following ops for sparse matrices using Eigen library: ```python add(a_csr, b_csr) add(a_csc, b_csc) addmm(c_csr, a_csr, b_csr) addmm(c_csr, a_csr, b_csc) addmm(c_csr, a_csc, b_csc) addmm(c_csr, a_csc, b_csr) addmm(c_csc, a_csr, b_csr) addmm(c_csc, a_csr, b_csc) addmm(c_csc, a_csc, b_csc) addmm(c_csc, a_csc, b_csr) ``` Currently, the operations for sparse matrices on CPU are available through MKL only. The non-existence of MKL on `aarch64` causes the unavailability of these ops on any machines with ARM based CPUs, including Apple Silicon, AWS Graviton and NVIDIA Grace. This PR addresses this issue by using Eigen as a backend for the above ops. This is a re-factored version of my previous PR #101814. The main difference with the old one, this does not enable Eigen by default. Pull Request resolved: https://github.com/pytorch/pytorch/pull/155357 Approved by: https://github.com/pearu, https://github.com/eqy Co-authored-by: Eli Uriegas <eliuriegas@meta.com>	2025-08-23 19:03:55 +00:00
Rex Zhang	ce467df5d1	rm platform args xplat/langtech/mobile/BUCK (#161018 ) Differential Revision: D80460691 Pull Request resolved: https://github.com/pytorch/pytorch/pull/161018 Approved by: https://github.com/drisspg	2025-08-22 14:47:36 +00:00
Zhengxu Chen	8460131087	[nativert] Add OSS version of ModelRunner (#159268 ) Summary: Implement a ModelRunner from scratch with the minimum features for OSS only Test Plan: test_export -r NativeRT Rollback Plan: Differential Revision: D78979812 Pull Request resolved: https://github.com/pytorch/pytorch/pull/159268 Approved by: https://github.com/dolpm	2025-07-29 21:08:14 +00:00
Scott Wolchok	fee2377f9e	Reapply D77381084 / #156964 : Rename torch::standalone to headeronly (#157251 ) Was reverted due to internal failure which should be fixed now. I believe Jane wants this reapplied and picked to release, and she's out this week. Original summary: headeronly is more clear, let's change the name before anyone depends on standalone Differential Revision: [D77520173](https://our.internmc.facebook.com/intern/diff/D77520173/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/157251 Approved by: https://github.com/janeyx99, https://github.com/Skylion007, https://github.com/desertfire	2025-06-30 23:25:30 +00:00
PyTorch MergeBot	e290a4c645	Revert "Rename torch::standalone to headeronly (#156964 )" This reverts commit 7e54c02a35b905e758497b856a1953eb009ba836. Reverted https://github.com/pytorch/pytorch/pull/156964 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/156964#issuecomment-3011136947))	2025-06-27 02:20:33 +00:00
Jane Xu	7e54c02a35	Rename torch::standalone to headeronly (#156964 ) Summary: headeronly is more clear, let's change the name before anyone depends on standalone Test Plan: CI should pass! Rollback Plan: Differential Revision: D77381084 Pull Request resolved: https://github.com/pytorch/pytorch/pull/156964 Approved by: https://github.com/swolchok, https://github.com/albanD, https://github.com/desertfire	2025-06-27 01:00:14 +00:00
FFFrog	e8cf5ff564	Fix the Problems About Defining Static Variable in Inline Function (#147095 ) Refer to https://github.com/pytorch/pytorch/issues/125465 for more informations - Remove unused header files - Move common functionality to separate files to reduce dependencies between picklers and unpicklers - Move the inline function that defines the static variable to .cc Differential Revision: [D76266755](https://our.internmc.facebook.com/intern/diff/D76266755) Pull Request resolved: https://github.com/pytorch/pytorch/pull/147095 Approved by: https://github.com/cyyever, https://github.com/albanD Co-authored-by: Edward Yang <ezyang@meta.com>	2025-06-25 01:59:10 +00:00
Xuehai Pan	013dfeabb4	[BE] fix typos in top-level files (#156067 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/156067 Approved by: https://github.com/malfet ghstack dependencies: #156066	2025-06-16 14:56:07 +00:00
Stephen Jia	bee93f9f0d	Move glslc to cas to enable remote execution (#155832 ) Meta: `fbsource//xplat/caffe2:gen_torch_vulkan_spv_cpp` takes on average 2 min to build and is one of topmost slow targets in fbandroid. See: https://fb.workplace.com/groups/2840058936242210/posts/4067730240141734 This target hat to run locally because it uses manifold backend for dotslash. This diff moves the `glslc` to cas backend so that it can run on RE. Here are commands executed: ``` % manifold get dotslash_glslc/flat/glslc-linux-x86_64.tar.gz % manifold get dotslash_glslc/flat/glslc-macos-v2024_4.tar.gz % manifold get dotslash_glslc/flat/glslc-windows-v2024_3.tar % ls -rw-r--r-- 1 navidq staff 2.0M Jun 12 10:02 glslc-linux-x86_64.tar.gz -rw-r--r-- 1 navidq staff 4.7M Jun 12 10:03 glslc-macos-v2024_4.tar.gz -rw-r--r-- 1 navidq staff 4.4M Jun 12 10:03 glslc-windows-v2024_3.tar % frecli --use-case dotslash cas upload-blob --skip-find-missing glslc-linux-x86_64.tar.gz ea5d674e0e7e9782be3f5c309e3484732e5b3a331cbe3258f3e929002811627b:2072937 % frecli --use-case dotslash cas upload-blob --skip-find-missing glslc-macos-v2024_4.tar.gz 1331dc691835e4676832b7c21ef669083a3acc8856981583d0698192f466c51a:4898649 % frecli --use-case dotslash cas upload-blob --skip-find-missing glslc-windows-v2024_3.tar 76181fbb1ce5c62d0c905db26df3a64e999d0baff2e93270775921daa91e3a1a:4585984 ``` Differential Revision: [D76513735](https://our.internmc.facebook.com/intern/diff/D76513735/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/155832 Approved by: https://github.com/GregoryComer	2025-06-13 14:38:51 +00:00
Bin Bao	13044b2b04	Move c10/macros/Export.h to torch/standalone (#154850 ) Summary: The goal of this PR and future follow-up PRs is to group a set of header files required by AOTInductor Standalone in a separate directory, ensuring they are implemented in a header-only manner. Test Plan: CI Bifferential Revision: D75756619 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154850 Approved by: https://github.com/janeyx99	2025-06-03 06:18:59 +00:00
dolpm	66f53889d5	[nativert] port semaphore to c10 util (#153504 ) Summary: nativert RFC: https://github.com/zhxchen17/rfcs/blob/master/RFC-0043-torch-native-runtime.md To land the runtime into PyTorch core, we will gradually land logical parts of the code into the Github issue and get each piece properly reviewed. This diff adds a simple semaphore interface into c10 until c++20 where we get counting_semaphore gonna need a oss build export to take a look at this... Test Plan: CI Differential Revision: D73882656 Pull Request resolved: https://github.com/pytorch/pytorch/pull/153504 Approved by: https://github.com/zhxchen17	2025-05-28 19:17:30 +00:00
Scott Wolchok	f0b2706914	remove sleef_arm target (#154166 ) Summary: X-link: https://github.com/pytorch/executorch/pull/11082 We shouldn't need an ARM-specific variant; we have select() where we should need it. Test Plan: CI Reviewed By: nlutsenko Differential Revision: D74356413 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154166 Approved by: https://github.com/kimishpatel, https://github.com/malfet, https://github.com/Skylion007	2025-05-23 22:16:01 +00:00
Zizeng Meng	a762dd1f67	[Memento] On-demand mode using without torch api (#153171 ) Summary: CUDA Post: https://fb.workplace.com/groups/ai.efficiency.tools.users/permalink/2020094788475989/ # Context In this diff, we want to enable the on-demand mode of memory snapshot to allow user to trace any remote process via dyno command line. # Design decision How do we send on-demand signal to remote process We leverage the dyno-Kineto approach. Since dyno is running on all machine in Meta, it can send a request to the remote machine to start the Kineto. Kineto will start another thread for memoryProfiler (https://fburl.com/code/dxsmmrok) why we use different approach as CUDA On CUDA side, we are using pybind to load torch Module and invoke the python api to start/stop the profiling. However, this requires us to compile the whole torch binary in the predictor which is not recommended by runtime(andruwang) Thus, we decide to use the CPP api directly to avoid un-necessary dependency why the snapshot is saved as json string directly instead of pickle Pickle is primarily designed for use with Python and doesn't have well support in cpp. Also, it is hard for user to download the snapshot file and open locally. Due to the dependency issue, it is hard to import the gzip/pickle library to decode the data. Thus, let's use JSON for now. I will work on the visualizer to fasten the render and support other format later. Plan: * Now, we will encoded file into gz for MTIA ondemand only and update the visualizer to support both type. * Update auto-trace and CUDA side to encode in gzip as well * Fully remove pickle dependency. Test Plan: # Remote cogwheel test Servicelab: https://fburl.com/servicelab/pckux7a3 snapshot file manifold: https://fburl.com/manifold/fnotk18c snapshot file in pastry: P1805522232 Visualization on D74399684 {F1977786422} # Local Predictor Test url: https://fburl.com/pytorch_memory_visualizer/y06kskkm {F1977787329} Differential Revision: D74179606 Pull Request resolved: https://github.com/pytorch/pytorch/pull/153171 Approved by: https://github.com/sraikund16	2025-05-15 06:07:04 +00:00
cyy	9785b32189	Remove unused typing-extensions BUCK target (#153229 ) This target is unused. Pull Request resolved: https://github.com/pytorch/pytorch/pull/153229 Approved by: https://github.com/colesbury	2025-05-13 04:29:59 +00:00
Anthony Shoumikhin	e2f9759bd0	Fix broken URLs (#152237 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/152237 Approved by: https://github.com/huydhn, https://github.com/malfet	2025-04-27 09:56:42 +00:00
Keshav Kolur	c98340e268	[autodeps2] Replace third-party/pyyaml with third-party/pypi/pyyaml (#151668 ) Summary: We should use the pypi version. Test Plan: CI Differential Revision: D73211869 Pull Request resolved: https://github.com/pytorch/pytorch/pull/151668 Approved by: https://github.com/Skylion007	2025-04-22 23:27:13 +00:00
Shivam Raikundalia	ad5e9065ac	[Profiler/Easy] Remove temp flag for on-demand Memory Snapshot (#151068 ) Summary: Now that we have profiler impl in we don't need the temporary flag. submodule update too. Test Plan: CI Reviewed By: sanrise Differential Revision: D72672186 Pull Request resolved: https://github.com/pytorch/pytorch/pull/151068 Approved by: https://github.com/davidberard98	2025-04-11 18:50:25 +00:00
Shivam Raikundalia	99c9a31386	[submodule] [Snapshot/Profiler] Memory Snapshot On Demand (#150559 ) Summary: Profiler side of memory snapshot. 1. Add API to actually do snapshot when client interface is called 2. Add ifdefs to builds so that kineto hooks snapshot correctly. Design Philosophy: There is one interesting part of this implementation and it is during export. For export we are callign the python impl of the export rather than CPP even though we are already in CPP. This is because it is better to simply have one path of export rather than 2. Personally, I want there to be parity between auto-trace and on-demand so it if we can limit the side paths then we will have an easier time maintaining this relationship Test Plan: {F1976563426} Reviewed By: sanrise Differential Revision: D70733247 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150559 Approved by: https://github.com/sanrise	2025-04-07 13:04:38 +00:00
Stepan Hruda	2e23768d25	Expose symbols on macos in the xplat pytorch stack (#150487 ) Summary: X-link: https://github.com/pytorch/executorch/pull/9819 Had to revert D71321310 because it affected way too many targets and build sizes. These changes should expose just enough symbols to be buildable in arvr mode on macOS. Could potentially make narrow it down even more by avoiding eg `get_pt_compiler_flags` Differential Revision: D72255474 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150487 Approved by: https://github.com/drisspg	2025-04-04 23:03:16 +00:00
Stepan Hruda	2a9e737839	[caffe2] Do not use --no-as-needed on macOS (#149421 ) Summary: `--no-as-needed` is not available in ld64.lld Applying this on all macos is potentially too broad? I am not sure if `fbcode//mode/mac` uses a different linker, but arvr mode for sure uses ld64.lld. Test Plan: CI / used for a macOS build on top of the stack. Differential Revision: D71315125 Pull Request resolved: https://github.com/pytorch/pytorch/pull/149421 Approved by: https://github.com/colesbury	2025-03-25 00:41:09 +00:00
Scott Wolchok	ade8fee512	Use c10 version of half/bfloat16 in executorch (#144111 ) Summary: X-link: https://github.com/pytorch/executorch/pull/7040 Accomplished by importing relevant files from c10 into executorch/runtime/core/portable_type/c10, and then using `using` in the top-level ExecuTorch headers. This approach should keep the ExecuTorch build hermetic for embedded use cases. In the future, we should add a CI job to ensure the c10 files stay identical to the PyTorch ones. ghstack-source-id: 260047850 exported-using-ghexport Test Plan: builds Differential Revision: D66106969 Pull Request resolved: https://github.com/pytorch/pytorch/pull/144111 Approved by: https://github.com/malfet	2025-02-08 22:40:14 +00:00
Nikhil Gupta	41b38f755c	Revert "Reverting the PR adding Kleidiai-based int4 kernels (#145392 )" (#145505 ) https://github.com/pytorch/pytorch/pull/134124 was reverted by https://github.com/pytorch/pytorch/pull/145392 due to KleidiAI clone issue. 1. This reverts commit 0940eb6d44f3cf69dd840db990245cbe1f78e770 (https://github.com/pytorch/pytorch/pull/145392 )and Fixes KleidiAI mirror issue. 2. KleidiAI is now cloned from github mirror instead of arm gitlab Change-Id: I7d6eee7214cd117d3057d615936fcc3ee6052fa2 Fixes https://github.com/pytorch/pytorch/issues/145273 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145505 Approved by: https://github.com/malfet	2025-01-23 18:50:59 +00:00
albanD	0940eb6d44	Reverting the PR adding Kleidiai-based int4 kernels (#145392 ) Mitigation for https://github.com/pytorch/pytorch/issues/145273 Reverting https://github.com/pytorch/pytorch/pull/134124 and https://github.com/pytorch/pytorch/pull/144074 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145392 Approved by: https://github.com/ZainRizvi, https://github.com/malfet, https://github.com/atalman, https://github.com/digantdesai	2025-01-22 20:11:49 +00:00
Scott Wolchok	b46d00c1b7	Shard RegisterDispatchKey (#144364 ) Should fix https://github.com/pytorch/pytorch/issues/143952 . Testing: built PyTorch on Raspberry Pi 5; this seemed to alleviate high peak memory requirement. (I did increase shard counts for other generated files along the way, but I need to go back and figure out how much of that was strictly necessary vs. needing to use -j1 or -j2.) Differential Revision: [D67925496](https://our.internmc.facebook.com/intern/diff/D67925496/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144364 Approved by: https://github.com/Skylion007, https://github.com/bdhirsh ghstack dependencies: #144363	2025-01-10 18:21:19 +00:00
Nikhil Gupta	94737e8a2a	[ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124 ) Description: 1. Quantize Linear Layer Weights to 4-bits: Quantize the weights of the Linear layer to 4 bits, using symmetric quantization. Pack two 4-bit weights into one uint8 container. Choose a quantization scheme (channel-wise or group-wise), with the group size being a multiple of 32. 2. Prepare Quantized Weights, Scales, and Optional Bias: After quantizing, obtain the quantized_weights, scales, and groupsize. If the original Linear layer has a bias, prepare it as well. 3. Pack the Weights Efficiently: Use torch.ops.aten._dyn_quant_pack_4bit_weight to optimally pack the weights, scales, and optional bias. ```python packed_weights = torch.ops.aten._dyn_quant_pack_4bit_weight(weight, scales_and_zeros, bias, groupsize, in_features, out_features) ``` Input parameters should include: in_features and out_features (the same as the Linear layer’s corresponding parameters). 4. Perform Dynamic Quantized Matrix Multiplication: Use torch.ops.aten._dyn_quant_matmul_4bit to perform matrix multiplication with quantized weights. ```python output = torch.ops.aten._dyn_quant_matmul_4bit(input, packed_weights, groupsize, in_features, out_features) ``` Inputs required include: The input tensor, packed_weights , groupsize, and the in_features and out_features. API Usage: https://github.com/pytorch/pytorch/issues/143289 Model Perf : 7B Transformer model: Prefill : 340 t/s Decode : 40 t/s 2B Transformer model Prefill : 747 t/s Decode : 80 t/s Tests: python test/test_linalg.py -k test__dyn_quant_pack_4bit_weight Ran 1 test in 0.016s OK python test/test_linalg.py -k test__dyn_quant_matmul_4bit Ran 8 tests in 0.077s OK python test/test_linalg.py -k test_compile_dyn_quant_matmul_4bit Ran 8 tests in 11.454s Change-Id: Ia1672bad5e6ec94e64d8bb1971395d60f4b3a452 Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/134124 Approved by: https://github.com/digantdesai, https://github.com/malfet	2024-12-20 19:32:03 +00:00
PyTorch MergeBot	8136daff5a	Revert "[ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124 )" This reverts commit 4b82251011f85f9d1395b451d61e976af844d9b1. Reverted https://github.com/pytorch/pytorch/pull/134124 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it breaks lots of internal build ([comment](https://github.com/pytorch/pytorch/pull/134124#issuecomment-2555953189))	2024-12-19 23:33:17 +00:00
Nikhil Gupta	4b82251011	[ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124 ) Description: 1. Quantize Linear Layer Weights to 4-bits: Quantize the weights of the Linear layer to 4 bits, using symmetric quantization. Pack two 4-bit weights into one uint8 container. Choose a quantization scheme (channel-wise or group-wise), with the group size being a multiple of 32. 2. Prepare Quantized Weights, Scales, and Optional Bias: After quantizing, obtain the quantized_weights, scales, and groupsize. If the original Linear layer has a bias, prepare it as well. 3. Pack the Weights Efficiently: Use torch.ops.aten._dyn_quant_pack_4bit_weight to optimally pack the weights, scales, and optional bias. ```python packed_weights = torch.ops.aten._dyn_quant_pack_4bit_weight(weight, scales_and_zeros, bias, groupsize, in_features, out_features) ``` Input parameters should include: in_features and out_features (the same as the Linear layer’s corresponding parameters). 4. Perform Dynamic Quantized Matrix Multiplication: Use torch.ops.aten._dyn_quant_matmul_4bit to perform matrix multiplication with quantized weights. ```python output = torch.ops.aten._dyn_quant_matmul_4bit(input, packed_weights, groupsize, in_features, out_features) ``` Inputs required include: The input tensor, packed_weights , groupsize, and the in_features and out_features. API Usage: https://github.com/pytorch/pytorch/issues/143289 Model Perf : 7B Transformer model: Prefill : 340 t/s Decode : 40 t/s 2B Transformer model Prefill : 747 t/s Decode : 80 t/s Tests: python test/test_linalg.py -k test__dyn_quant_pack_4bit_weight Ran 1 test in 0.016s OK python test/test_linalg.py -k test__dyn_quant_matmul_4bit Ran 8 tests in 0.077s OK python test/test_linalg.py -k test_compile_dyn_quant_matmul_4bit Ran 8 tests in 11.454s Change-Id: Ia1672bad5e6ec94e64d8bb1971395d60f4b3a452 Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/134124 Approved by: https://github.com/digantdesai, https://github.com/malfet	2024-12-19 18:51:26 +00:00
PyTorch MergeBot	14fe1f7190	Revert "[ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124 )" This reverts commit d3ff2d42c28a2c187cbedfd8f60b84a4dfa2d6bf. Reverted https://github.com/pytorch/pytorch/pull/134124 on behalf of https://github.com/malfet due to This broke S390 builds, includes cpuinfo unconditionally ([comment](https://github.com/pytorch/pytorch/pull/134124#issuecomment-2552560208))	2024-12-19 01:05:11 +00:00

1 2 3

140 Commits