pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-11-11 22:34:53 +08:00

Author	SHA1	Message	Date
Nikita Shulga	f2ae7084eb	[BE] Use `linux.2xlarge.memory` for ASAN builds (#165164 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/165164 Approved by: https://github.com/janeyx99	2025-10-10 18:13:42 +00:00
Nikita Shulga	12d7cc5cd3	[BE] Set commit hooks to 3.10 viable/strict/1760135060	2025-10-10 11:09:13 -07:00
Sam Larsen	a2e2e1d8c0	Add pytorch_version and mast_application_packages to pt2 compile scuba logging (#165018 ) Summary: Two more fields requested for conda-on-mast jobs Differential Revision: D84214442 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165018 Approved by: https://github.com/c00w	2025-10-10 17:57:40 +00:00
PyTorch MergeBot	b67785d9eb	Revert "C++ API handle optimizer defaults (#161825 )" This reverts commit f33201729416ed17467228e80b04d01d4d02b5f3. Reverted https://github.com/pytorch/pytorch/pull/161825 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/161825#issuecomment-3391506427))	2025-10-10 17:56:11 +00:00
Malay Bag	4cd06dc82c	[PT2 Archive] Use tensor dtype while deduping/grouping weights (state_dict/constants) (#165090 ) Summary: While saving state_dict tensors, deduping is done to reduce number of tensor data. For this storage point is used. But when the tensor is empty, storage pointer is 0. But dtype of the tensors could be different. Existing logic will consider all such tensor as same. This will fail the model later when different dtype is expected. This change will include dtype also while deduping. For non empty tensor, this should not affect as the storage point will be unique. Test Plan: TBD Differential Revision: D84243094 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165090 Approved by: https://github.com/yiming0416	2025-10-10 17:51:43 +00:00
Animesh Jain	41936f4cf6	[dynamo][executorch] Do not trace into exeuctorch LoweredBackendModule (#165126 ) Required for https://github.com/pytorch/pytorch/pull/164691 .. comments inline Pull Request resolved: https://github.com/pytorch/pytorch/pull/165126 Approved by: https://github.com/tugsbayasgalan	2025-10-10 17:41:33 +00:00
Xiao Fu	dec9a59992	[dynamo][logging] Add most recent bytecode to graph break with torch._dynamo.graph_break() and verbose (#164422 ) https://github.com/pytorch/pytorch/issues/162858 The issue described the feature implemented. This adds to the existing graph break log with the latest 20 (or viable user frame) bytecode instructions. The scenario is when the graph_break happens without errors. It happens during the case when user calling torch._dynamo.graph_break(). Meanwhile, in the testing, one can find that the generated frame based on step() is not deterministic as sometimes it reached the maximum amount, sometimes it generated the less than that. The bytecode generation is python version dependent. Thus, the testing plan excludes the bytecode output but generated the total bytecode line count. This is a helpful process to understand bytecode transformation, symbolic convert, and convert frame. It is a helpful task to provide hands-on experience with dynamo workflow. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164422 Approved by: https://github.com/williamwen42, https://github.com/mlazos Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> viable/strict/1760132980	2025-10-10 17:33:06 +00:00
PyTorch MergeBot	f975bd58af	Revert "Warn if AccumulateGrad stream does not match producer node stream (#165065 )" This reverts commit a70ef954b919e990ebaba715b4072e76352867bf. Reverted https://github.com/pytorch/pytorch/pull/165065 on behalf of https://github.com/izaitsevfb due to breaks lint ([comment](https://github.com/pytorch/pytorch/pull/165065#issuecomment-3391387386))	2025-10-10 17:29:29 +00:00
can-gaa-hou	af42256db4	Fix missing brackets (#165138 ) As stated in the title. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165138 Approved by: https://github.com/Aidyn-A, https://github.com/Skylion007	2025-10-10 17:23:31 +00:00
can-gaa-hou	39161e73fc	[Fix] missing lambda in torch._check (#165043 ) Fixes more missing lambda in torch._check in the source code. Inspired by #164225. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165043 Approved by: https://github.com/FFFrog, https://github.com/Skylion007	2025-10-10 17:11:55 +00:00
Avik Chaudhuri	3ed90f5a09	outline various stages from aot stage2 compile (#164808 ) Splits the training and inference paths for aot stage2 compile. 1. Split `aot_stage2_autograd` into `_aot_stage2a_partition`, `_aot_stage2b_fw_compile` and `_aot_stage2b_bw_compile`, and rest. 2. Split `aot_stage2_inference` into `_aot_stage2b_inference_compile` and rest. I'm leaving these as functions with underscore names since the I/O interfaces and the exact boundaries of these splits are somewhat in the air. Differential Revision: D84028203 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164808 Approved by: https://github.com/SherlockNoMad	2025-10-10 17:04:36 +00:00
Aidyn-A	d41aa187ec	Add more B200 smoke test (#165133 ) A follow up to #159494. This PR adds additional `test_scaled_matmul_cuda` to smoke tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165133 Approved by: https://github.com/drisspg	2025-10-10 16:46:26 +00:00
Edward Yang	8b2137e74a	Don't use C++ CIA decomps if there's a Python one (#164970 ) Some more context at https://github.com/pytorch/pytorch/pull/164939 The basic point here is that Python decomps are guaranteed to be functional, whereas C++ ones are not. If we have a Python decomp, we should prefer it over the C++ one. This currently doesn't matter too much as CIA decomps will get functionalized, but it matters after the quoted PR because we now run these decompositions very late (to make it easy for things like aot_eager to get the fused versions of operators in proxy tensor). Signed-off-by: Edward Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/164970 Approved by: https://github.com/bdhirsh	2025-10-10 16:46:09 +00:00
soulitzer	a70ef954b9	Warn if AccumulateGrad stream does not match producer node stream (#165065 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/165065 Approved by: https://github.com/ngimel ghstack dependencies: #162815	2025-10-10 16:46:01 +00:00
Gheorghe-Teodor Bercea	01a2812f48	[ROCm] Adjust grid size for non-unit stride backwards indexing (#165026 ) Adjust grid size for non-unit stride backwards indexing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165026 Approved by: https://github.com/jeffdaily viable/strict/1760129489	2025-10-10 16:36:38 +00:00
bobrenjc93	3f27100d3e	[torchfuzz] remove fixed xfail (#165116 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/165116 Approved by: https://github.com/PaulZhang12	2025-10-10 16:31:27 +00:00
Angel Li	253fd765bd	bf16 support for fake_quantize_learnable_per_channel_affine (#165098 ) Adding bf16 support for `torch._fake_quantize_learnable_per_channel_affine()` op by relaxing the type check on scale TODO: need to add bf16 support to `per_tensor_affine_` as `torch._fake_quantize_learnable_per_tensor_affine_backward` gets called in the backward pass Test Modified unit test in `test_workflow_ops.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/165098 Approved by: https://github.com/jerryzh168, https://github.com/andrewor14	2025-10-10 16:24:52 +00:00
PyTorch MergeBot	abb2f7179e	Revert "Fix truediv numerics between eager and compile (#164144 )" This reverts commit 68913d8f2a953bdbada4033101b04f6e8d49dabe. Reverted https://github.com/pytorch/pytorch/pull/164144 on behalf of https://github.com/malfet due to It breaks CI again, why was it landed for 3 times in a row without any changes? ([comment](https://github.com/pytorch/pytorch/pull/164144#issuecomment-3390973016)) viable/strict/1760127924	2025-10-10 16:10:25 +00:00
Wu, Zhenyu	b57ab9a3f2	Fix #165125 : Type "str" is not assignable to return type "None" (#165128 ) Fixes #165125 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165128 Approved by: https://github.com/malfet	2025-10-10 16:05:07 +00:00
Yuanyuan Chen	fb64da0791	[2/N] Use "is" in python type comparison (#165142 ) This is follow-up of #165037. It generally recommended to use `is/is not` to compare types. Therefore this series of changes apply this suggestion in the code base, and it aims to finally enabling related linter checks. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165142 Approved by: https://github.com/albanD	2025-10-10 15:36:44 +00:00
Thanh Ha	10a9fb641b	Switch build jobs from linux.4xlarge to c7i (#165057 ) Switch build jobs that use linux.4xlarge which uses c5 instance types to c7i variant. This should improve performance by ~15-20% while cutting costs by ~10-15%. Relates to pytorch/test-infra#7175 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165057 Approved by: https://github.com/huydhn	2025-10-10 15:13:40 +00:00
PyTorch MergeBot	9420944033	Revert "[AMP][Refactor] Simplify dtype support logic in autocast context manager (#163446 )" This reverts commit 960b0d5f0d0efb1f1962bddcf62e2a698e26edd2. Reverted https://github.com/pytorch/pytorch/pull/163446 on behalf of https://github.com/izaitsevfb due to breaks autocast tests on linux and mac ([comment](https://github.com/pytorch/pytorch/pull/163446#issuecomment-3390688642))	2025-10-10 15:12:46 +00:00
Chinmay Kuchinad	55f01a48af	[ROCm] Enable and fix several FSDP + Inductor distributed unit tests (#165011 ) This PR enables a number of distributed unit tests and applies necessary fixes to ensure they pass on ROCm platforms. The changes have been successfully tested on both MI200 and MI300 hardware. This work addresses the following issues: https://github.com/ROCm/frameworks-internal/issues/13586 https://github.com/ROCm/frameworks-internal/issues/13578 Enabled Tests The following tests have been enabled and are now passing: 1. test_compiled_autograd_ctx 2. test_simple_mlp_fullgraph_backend_aot_eager 3. test_simple_mlp_fullgraph_backend_aot_eager_decomp_partition 4. test_simple_mlp_fullgraph_backend_inductor 5. test_nested_fully_shard_backend_aot_eager 6. test_nested_fully_shard_backend_aot_eager_decomp_partition 7. test_nested_fully_shard_backend_inductor_fullgraph_True 8. test_nested_fully_shard_backend_inductor_fullgraph_True_graph_partition 9. test_transformer_backend_aot_eager 10. test_transformer_backend_aot_eager_decomp_partition 11. test_storage_resize_zero_gpu 12. test_storage_resize_nonzero_gpu 13. test_fake_distributed_inductor Tests skipped due to upstream issues: 1. test_nested_fully_shard_backend_inductor_fullgraph_False 2. test_transformer_backend_inductor_fullgraph_True 3. test_transformer_backend_inductor_fullgraph_True_graph_partition 4. test_transformer_backend_inductor_fullgraph_False Pull Request resolved: https://github.com/pytorch/pytorch/pull/165011 Approved by: https://github.com/jeffdaily	2025-10-10 14:10:54 +00:00
PaulZhang12	68913d8f2a	Fix truediv numerics between eager and compile (#164144 ) Addresses numeric differences between eager and compile in https://github.com/pytorch/pytorch/issues/141753 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164144 Approved by: https://github.com/eellison, https://github.com/jansel, https://github.com/ngimel	2025-10-10 14:00:46 +00:00
PyTorch MergeBot	b8be796a57	Revert "[2/N] More ruff SIM fixes (#165031 )" This reverts commit 38095fbd1323ee4a9541fbcbb9b28bd20f2cd956. Reverted https://github.com/pytorch/pytorch/pull/165031 on behalf of https://github.com/albanD due to One of the changed line started to fail on trunk ([comment](https://github.com/pytorch/pytorch/pull/165031#issuecomment-3390190870))	2025-10-10 13:42:14 +00:00
Howard Huang	238dd5517d	[PP] Move profiler record_function in schedule (#164976 ) Better engineering to move the `record_function` call to also encompass the custom callback, this line is the only change: https://github.com/pytorch/pytorch/pull/164976/files#diff-1d3d91f53db88fb886901fb178d69e47776e71b8103f85688fa9ca64cc55d068R2147, the rest is just formatting. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164976 Approved by: https://github.com/fegin ghstack dependencies: #162016, #164962	2025-10-10 13:09:23 +00:00
eellison	d272ed4b3e	Fix identity expansion (#165066 ) In some cases, we wrap indexing with `Identity` to prevent expansion from int32 -> int64 range. There are some checks in codegen which intend to check for constants, which did not handle Identity. Update these checks and update Identity so that it recursively prints inputs. Fix for https://github.com/pytorch/pytorch/issues/164700 Replaces https://github.com/pytorch/pytorch/pull/160190 cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @njriasan Pull Request resolved: https://github.com/pytorch/pytorch/pull/165066 Approved by: https://github.com/njriasan, https://github.com/shunting314, https://github.com/jansel	2025-10-10 13:07:15 +00:00
Yuanyuan Chen	70925bdf82	[1/N] Use "is" in python type comparison (#165037 ) It generally recommended to use `is/is not` to compare types. Therefore this series of changes apply this suggestion in the code base, and it aims to finally enabling related linter checks. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165037 Approved by: https://github.com/mlazos	2025-10-10 12:36:50 +00:00
KarhouTam	960b0d5f0d	[AMP][Refactor] Simplify dtype support logic in autocast context manager (#163446 ) ## Description: This PR refactors the autocast context manager in `autocast_mode.py` to simplify and centralize the logic for checking supported dtypes for each device. The previous implementation repeated similar checks for multiple device types. Now, a single mapping `device_supported_dtypes` is used to associate device types with their supported dtypes, and the validation logic is unified. In my view, this makes the code easier to maintain and extend for new devices. Please share any suggestions and comments with me. BTW, in the original `xla` branch, the `supported_dtype` are `[torch.float16, torch.bfloat16]`, `5d8a226e23/torch/amp/autocast_mode.py (L358-L363)` but the warning message has only `torch.bfloat16`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/163446 Approved by: https://github.com/FFFrog, https://github.com/albanD	2025-10-10 12:30:06 +00:00
FFFrog	e0abcee3b5	[Code Clean] Remove support of python3.9 (#163846 ) As the title stated. Pull Request resolved: https://github.com/pytorch/pytorch/pull/163846 Approved by: https://github.com/ezyang	2025-10-10 11:11:56 +00:00
Shangdi Yu	77bf23d85c	Add an option to put store large mmap weights on disk (#164526 ) As title In windows, we cannot modify the .dll to append weights at the end, the windows .dll loader will complain it's not a valid .dll file. So we store the weight blob as a separete file. 1. We add the following API which allows passing in a pointer to the weight blob and get the size of the weight blob. ```cpp AOTI_API AOTIRuntimeError AOTInductorModelContainerGetConstantsBlobSize( AOTInductorModelContainerHandle container_handle, uint64_t* ret_size); // Load weights from a single blob in weight_blob_ptr AOTI_API AOTIRuntimeError AOTInductorModelUpdateConstantsFromBlob( AOTInductorModelContainerHandle container_handle, const uint8_t* weight_blob_ptr); ``` 2. We also add a method in ModelContainerRunner to load the weight: If the runner see that there is a `.blob` file in the package, if will mmap the .blob file and use the content to load the constants. 3. We also add the `USE_MMAP_EXTERNAL` macro. When this macro is defined, the model expects to load the weights from external mmap'd weights. Test Plan: ``` buck run @mode/dev-nosan caffe2/test/inductor:test_aot_inductor -- -r test_large_mmaped_weights_on_disk ``` Also tested for windows-cross compilation with `6542566585/demo/main_voxtral.cpp` ``` Loaded model.dll audio_encoder loaded C:\Users\shangdiy\source\repos\torchnative\demo\token_embedding\data\aotinductor\model\model.wrapper.so Loaded model.dll token_embedding loaded C:\Users\shangdiy\source\repos\torchnative\demo\text_decoder\data\aotinductor\model\model.wrapper.so Loaded model.dll Loading weights from C:\Users\shangdiy\source\repos\torchnative\demo\text_decoder\data\aotinductor\model\model.wrapper_weights.blob text_decoder loaded Load latency (ms): audio_encoder: 1011.234 archive extraction: 0.000 .so loading: 1011.197 token_embedding: 525.773 archive extraction: 0.000 .so loading: 525.704 text_decoder: 3324.130 archive extraction: 0.000 .so loading: 3323.979 Run latency (ms): audio_encoder: 285.958 audio_encoder output: dtype=bfloat16, shape=[1, 1125, 3072], numel=3456000 token_embedding: 6.676 token_embedding output: dtype=bfloat16, shape=[1, 1138, 3072], numel=3495936 text_decoder: 576.519 text_decoder output: dtype=bfloat16, shape=[1, 1138, 131072], numel=149159936 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/164526 Approved by: https://github.com/desertfire	2025-10-10 07:53:57 +00:00
PyTorch MergeBot	d2cb183344	Revert "[inductor] verify determinism with inductor benchmark script (#164904 )" This reverts commit a3c700656f9a666eb33074b60333a23eb7e99a15. Reverted https://github.com/pytorch/pytorch/pull/164904 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but there seems to be some failed vLLM failures coming out of this ([comment](https://github.com/pytorch/pytorch/pull/164904#issuecomment-3388443678))	2025-10-10 06:23:07 +00:00
Yuanyuan Chen	38095fbd13	[2/N] More ruff SIM fixes (#165031 ) This is follow-up of #164695 to apply ruff SIM rules to more files. Most changes are about simplifying dict.get because None is already the default value. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165031 Approved by: https://github.com/mlazos	2025-10-10 05:37:46 +00:00
Yuanyuan Chen	ffc9559d9f	[7/N] Apply ruff UP035 rule (#164653 ) This PR is follow-up of #164438 to continue applying `UP035` rule. All changes are about proper `Callable` importation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164653 Approved by: https://github.com/aorenste viable/strict/1760091921	2025-10-10 05:16:17 +00:00
Simon Layton	172d6ed8b8	Refactor _scaled_grouped_mm_cuda dispatch (#165060 ) Summary: * Clean & simplify different scaling recipe dispatch * Split out recipes into separate dispatch functions Test Plan: ``` pytest -svv -k grouped test/test_scaled_matmul_cuda.py ``` Reviewers: Subscribers: Tasks: Tags: Signed-off-by: Simon Layton <simonlayton@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/165060 Approved by: https://github.com/danielvegamyhre, https://github.com/ngimel	2025-10-10 04:44:25 +00:00
Nikita Shulga	9a3c4b917e	[CMake] Remove forcing of `-O2` from `torch_compile_options` (#164894 ) That was introduced by `75a65ffe0f` Hattip to @jathu for alerting me about the issue. As result, all our PyTorch builds were shipped with `-O2` for almost all of its modern history Partially undo the damage introduced by https://github.com/pytorch/pytorch/pull/128406 that cause cross-ISA symbols leak, to be properly followed up in https://github.com/pytorch/pytorch/issues/165123 Pull Request resolved: https://github.com/pytorch/pytorch/pull/164894 Approved by: https://github.com/ezyang	2025-10-10 04:43:53 +00:00
PyTorch MergeBot	df514a6d5a	Revert "[inductor][eazy] change how torch.use_deterministic_algorithms affect inductor (#164905 )" This reverts commit 344e6365a0068c2d2847fcec0c55dd53291d475e. Reverted https://github.com/pytorch/pytorch/pull/164905 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but there seems to be some failed vLLM failures coming out of this ([comment](https://github.com/pytorch/pytorch/pull/164905#issuecomment-3388258660))	2025-10-10 04:37:09 +00:00
Maggie Moss	48fe858fef	Fix error, remove file from pyrefly checking (#165094 ) Reported issue with formatting and parsing. Removing suppressions and avoiding this file in future type checking until we can get a more complete fix in . Pull Request resolved: https://github.com/pytorch/pytorch/pull/165094 Approved by: https://github.com/albanD	2025-10-10 04:34:51 +00:00
PyTorch MergeBot	7ab00c7c17	Revert "Hotfix test scaled matmul cuda (#165104 )" This reverts commit 9aa92f246fa5fe5cfda17970d41d167b19a0612a. Reverted https://github.com/pytorch/pytorch/pull/165104 on behalf of https://github.com/malfet due to Looks like it broke cuda tests, isn't it, see `44b1ff54e9/1` ([comment](https://github.com/pytorch/pytorch/pull/165104#issuecomment-3388247886)) viable/strict/1760089782	2025-10-10 04:32:18 +00:00
Nikita Shulga	44b1ff54e9	[CD] Do not propagate download.pytorch.org IP into container (#165075 ) Followup after https://github.com/pytorch/pytorch/pull/164969 Should fix binary build test failures Pull Request resolved: https://github.com/pytorch/pytorch/pull/165075 Approved by: https://github.com/seemethere, https://github.com/huydhn ghstack dependencies: #164968, #164969	2025-10-10 04:27:29 +00:00
PyTorch MergeBot	daea35df5c	Revert "[CD] Do not propagate download.pytorch.org IP into container (#165075 )" This reverts commit 6d27a8e5093ee2a21d44dceeeffcb272e6e0f655. Reverted https://github.com/pytorch/pytorch/pull/165075 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ([comment](https://github.com/pytorch/pytorch/pull/165075#issuecomment-3388228013))	2025-10-10 04:20:51 +00:00
Laith Sakka	7f2a902ea2	more sizelike deprecation (#164889 ) remove expext_size c++ bindings and usages Pull Request resolved: https://github.com/pytorch/pytorch/pull/164889 Approved by: https://github.com/mlazos ghstack dependencies: #164884, #164885, #164886, #164887, #164888	2025-10-10 03:45:06 +00:00
Mikayla Gawarecki	9c057d9863	[BE] Refresh documentation for stable ABI / API (#163899 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/163899 Approved by: https://github.com/janeyx99	2025-10-10 03:26:28 +00:00
Yiming Zhou	938869e7d3	[DTensor] Improve sharding propagation error msg in DTensor dispatch (#164623 ) Fixes #164543 This PR improves the `__str__` method of DTensor's `OpSchema` to provide better readable error message when dispatch fails as the error message prints `{op_info.schema}` example 1 `aten.embedding` ``` aten.embedding.default(Spec(f32[2048, 256](S(0))), Spec(i64[16, 2048](S(0)R))) on DeviceMesh((dp=2, tp=2), 'cuda', stride=(2, 1))) ``` example 2 `aten.mm` ``` aten.mm.default(Spec(f32[1024, 512](S(1))), Spec(f32[512, 256](S(0)))) on DeviceMesh((tp=4), 'cuda', stride=(1,))) ``` example 3 `aten._scaled_dot_product_flash_attention` ``` aten._scaled_dot_product_flash_attention.default(Spec(f16[8, 16, 128, 64](RS(1))), Spec(f16[8, 16, 128, 64](RS(1))), Spec(f16[8, 16, 128, 64](RS(1)))) on DeviceMesh((dp=2, tp=4), 'cuda', stride=(4, 1))) ``` Added test ``` python test/distributed/tensor/test_dtensor_ops.py -k test_embedding_error_msg ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/164623 Approved by: https://github.com/zpcore	2025-10-10 03:16:04 +00:00
Yuanyuan Chen	ce6b589545	Enable B904 check of flake8 (#165047 ) The description of `B904` is `Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling. ` Pull Request resolved: https://github.com/pytorch/pytorch/pull/165047 Approved by: https://github.com/Lucaskabela	2025-10-10 03:08:01 +00:00
Dzmitry Huba	ae25dd51fc	Simplifying computation of the final result for equals op on DTensor (#164999 ) Instead of collecting local results using all_gather_object followed by local reduction, with this change we switch to using a single all_reduce with MIN reduction operation to compute the final equals result. This change is needed to enable LocalTensor work (all_gather_object introduces challenges in for DTensor and LocalTensor integration). topic: not user facing Pull Request resolved: https://github.com/pytorch/pytorch/pull/164999 Approved by: https://github.com/ezyang	2025-10-10 03:01:28 +00:00
Simon Fan	a61d0de9f9	[hop] support local_map filtered gradients (#164437 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164437 Approved by: https://github.com/ezyang ghstack dependencies: #164296, #164321, #164419, #164420, #164340, #163602, #164431, #164433	2025-10-10 02:34:27 +00:00
Simon Fan	3ad88924ad	[hop] support local_map None placements (#164433 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164433 Approved by: https://github.com/ezyang ghstack dependencies: #164296, #164321, #164419, #164420, #164340, #163602, #164431	2025-10-10 02:34:27 +00:00
Simon Fan	3241b9c15f	[hop] support local_map None gradients (#164431 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/164431 Approved by: https://github.com/bdhirsh ghstack dependencies: #164296, #164321, #164419, #164420, #164340, #163602	2025-10-10 02:34:27 +00:00
Simon Fan	25d4d5107e	[dynamo] trace local_map with local shapes for AP (#163602 ) Context is in https://www.internalfb.com/excalidraw/EX519691 and https://docs.google.com/document/d/1qnuXLZk_GYt_PksHTwkn7L2ELRDnYlIRPkHAlXTyuhw/edit?tab=t.0. And the description of the previous PR: https://github.com/pytorch/pytorch/pull/164340. The previous PR adds the support on the HOP side for eager execution and AOTAutograd. Dynamo is still passing the HOP a subgraph with wrong shapes. This PR fixes that. This is similar to the HOP implementation, however we additionally need to manually keep the TensorVariable metadata in sync. Pull Request resolved: https://github.com/pytorch/pytorch/pull/163602 Approved by: https://github.com/ydwu4 ghstack dependencies: #164296, #164321, #164419, #164420, #164340	2025-10-10 02:34:27 +00:00

1 2 3 4 5 ...

94290 Commits