pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Author	SHA1	Message	Date
Tiago Quelhas	26740f853e	Remove unnecessary use of ctx.resolve_tools. (#120493 ) In this case, it's simpler to use ctx.actions.run(executable = ...), which already ensures that the runfiles associated with the executable are present. (It's also possible to use ctx.actions.run_shell(tools = ...) with a custom command line, but it's unclear to me that indirecting through the shell is needed here.) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120493 Approved by: https://github.com/ezyang	2024-03-07 22:33:17 +00:00
William Wen	d14d62b7aa	[dynamo] add more refleak tests (#120657 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120657 Approved by: https://github.com/jansel	2024-03-07 22:25:43 +00:00
Edward Z. Yang	6490441d8f	Remove dead get_shape_groups (#120813 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/120813 Approved by: https://github.com/albanD	2024-03-07 22:20:30 +00:00
Oguz Ulgen	18d574a07a	[Inductor] Use indices for constants in triton_meta (#121427 ) @bertmaher pointed out that constants are passed with their indices, not their names. Looking at triton source, this appears to be true `392370b303/python/triton/runtime/jit.py (L381-L385)` I'm guessing both indices and names work here but lets be consistent. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121427 Approved by: https://github.com/aakhundov	2024-03-07 21:59:43 +00:00
Antoni Viros	f61192b014	Fix for Wait kernel lowering in inductor not accepting MultiOutputs from non-collective calls (#121428 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121428 Approved by: https://github.com/yifuwang	2024-03-07 21:29:25 +00:00
Zhengxu Chen	76f1461892	[export] Serialize union fields with single entry dict. (#121263 ) (#121337 ) Summary: remove "$type" and "$value" fields, instead only serialize as {type: value} for union fields directly. bypass-github-export-checks Test Plan: CI Differential Revision: D54600943 Pull Request resolved: https://github.com/pytorch/pytorch/pull/121337 Approved by: https://github.com/tugsbayasgalan	2024-03-07 21:24:28 +00:00
Scott Wolchok	4c58f2b675	[PyTorch] Use uint32_t for ProcessedNode::num_outputs (#121335 ) We already use uint32_t for indexing, and the notion of a single graph node with more than four billion outputs stretches credulity. Differential Revision: [D54598821](https://our.internmc.facebook.com/intern/diff/D54598821/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121335 Approved by: https://github.com/Skylion007	2024-03-07 21:15:05 +00:00
Joel Schlosser	ea8f6e2e54	Subclass view fake-ification via reified ViewFuncs (#118405 ) This PR: * Uses reified ViewFuncs to swap in fake tensors / symbolic SymInts for view replay during subclass view fake-ification * Enables automatic dynamic on view bases -> fakeifies according to the resultant symbolic context instead of the old "all-static" approach * Covers the following view types: * subclass -> dense * dense -> subclass * subclass -> subclass * Dense -> dense views are handled the old way via an `as_strided()` call, as it's likely there is no view func available Differential Revision: [D54269082](https://our.internmc.facebook.com/intern/diff/D54269082) Pull Request resolved: https://github.com/pytorch/pytorch/pull/118405 Approved by: https://github.com/ezyang	2024-03-07 19:56:16 +00:00
Catherine Lee	63ec5cd158	TD Heuristic for tests mentioned in PR body, less verbose TD printing (#120621 ) Move tests that are mentioned in PR body or commit message to front. Also attempts to find any issues/PRs mentioned in the PR body and search for those too (ex if you link a disable issue and that issue contains the test file that it was failing on) looking for: dynamo/test_export_mutations Also removes some printed information in TD Pull Request resolved: https://github.com/pytorch/pytorch/pull/120621 Approved by: https://github.com/osalpekar	2024-03-07 19:36:11 +00:00
Nikita Shulga	c7a65f58b0	[CI] Script to fetch creds from current AWS session (#121426 ) Because some implementations, like OpenDAL does not work with AWS IMDSv2, but this script will bridge the gap and enables more recent `sccache` releases(that switched from simple-s3 to OpenDAL) to work in current CI system When launched it prints something like: ``` export AWS_ACCESS_KEY_ID=XXXXX export AWS_SECRET_ACCESS_KEY=YYYY export AWS_SESSION_TOKEN=ZZZZ ``` which can be `eval`ed and passed then sccache can use those failures. Validated in https://github.com/pytorch/pytorch/pull/121323 Pull Request resolved: https://github.com/pytorch/pytorch/pull/121426 Approved by: https://github.com/Skylion007	2024-03-07 19:25:54 +00:00
PyTorch MergeBot	2b1661c7a0	Revert "[compiled autograd] support custom ops backed by c++ autograd::Function (#120681 )" This reverts commit 05c256849b464deee16ccd70152fd54071c6c79c. Reverted https://github.com/pytorch/pytorch/pull/120681 on behalf of https://github.com/izaitsevfb due to breaking internal builds, see D54617701 ([comment](https://github.com/pytorch/pytorch/pull/120681#issuecomment-1984214079))	2024-03-07 18:53:51 +00:00
Shengbao Zheng	60aaba4128	create function to get ProcessGroupNCCL uid (#121132 ) Summary: expose ProcessGroupNCCL uid Differential Revision: D54446056 Pull Request resolved: https://github.com/pytorch/pytorch/pull/121132 Approved by: https://github.com/aaronenyeshi	2024-03-07 18:34:38 +00:00
Jane Xu	83d095c213	[BE] Remove unnecessary requires_cuda in common_optimizers.py (#121249 ) @mlazos had already added the needed decorator on the test itself. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121249 Approved by: https://github.com/Skylion007, https://github.com/mlazos, https://github.com/albanD ghstack dependencies: #121183	2024-03-07 17:57:02 +00:00
Jane Xu	53bdae736d	Add capturable single tensor Adamax (#121183 ) Finishes the work started in https://github.com/pytorch/pytorch/pull/118697. Thanks @MarouaneMaatouk for the attempt, but due to inactivity I have opened this PR for Adamax. Note that the new capturable implementation is much simpler and I've modified the foreach capturable impl--it now calls fewer kernels and is more easily comparable to forloop. Next steps: * This PR discovered two bugs: #121178 and #121238. * Move the now hefty graph optim tests in test_cuda to use OptimInfo. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121183 Approved by: https://github.com/albanD	2024-03-07 17:57:02 +00:00
Catherine Lee	af88425cdc	Forward fix lint after 121202 (#121425 ) Forward fix after #121202, where the lintrunner job failed due to being unable to checkout the pytorch repo Pull Request resolved: https://github.com/pytorch/pytorch/pull/121425 Approved by: https://github.com/ezyang, https://github.com/aakhundov, https://github.com/malfet	2024-03-07 17:20:13 +00:00
suo	c3c15eb9a6	[export] update docs to not export raw functions (#121272 ) as title Differential Revision: [D54555101](https://our.internmc.facebook.com/intern/diff/D54555101/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121272 Approved by: https://github.com/zhxchen17	2024-03-07 17:18:07 +00:00
PyTorch MergeBot	862b99b571	Revert "[ATen][CUDA][CUBLAS] cublasLtMatmul increase workspace_size (#120925 )" This reverts commit 3239f86a3df133b5977d988324639e0de7af8749. Reverted https://github.com/pytorch/pytorch/pull/120925 on behalf of https://github.com/malfet due to Breaks internal tests, likely due to the increased memory requirements ([comment](https://github.com/pytorch/pytorch/pull/120925#issuecomment-1983875400))	2024-03-07 16:16:07 +00:00
Shengbao Zheng	eea37c6db4	[profiler] record nccl version in distributed info (#121044 ) Summary: Add a field of NCCL version in distributed info if backend is NCCL Differential Revision: D54432888 Pull Request resolved: https://github.com/pytorch/pytorch/pull/121044 Approved by: https://github.com/aaronenyeshi	2024-03-07 15:56:02 +00:00
cyy	3aa512cd72	[Clang-tidy header][23/N] Enable clang-tidy coverage on aten/src/ATen/*.{cpp,h} (#121380 ) This PR finishes the works beginning with #https://github.com/pytorch/pytorch/pull/120763 by enabling clang-tidy on aten/src/ATen. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121380 Approved by: https://github.com/Skylion007	2024-03-07 15:11:07 +00:00
IvanKobzarev	9a45001905	[dynamo] relax missing symbols runtime assert (#121339 ) Differential Revision: [D54603361](https://our.internmc.facebook.com/intern/diff/D54603361) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121339 Approved by: https://github.com/ezyang	2024-03-07 14:53:38 +00:00
Bin Bao	0339f1ca82	[Inductor] Allocate another shard for testing cpp-wrapper JIT (#121310 ) Summary: The ABI-compatible for cpp wrapper has not been turned on as default, so test them separately. Expect to add more tests for the shard. Differential Revision: [D54617287](https://our.internmc.facebook.com/intern/diff/D54617287) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121310 Approved by: https://github.com/chenyang78 ghstack dependencies: #121309	2024-03-07 14:24:21 +00:00
Bin Bao	7e598c0053	[Inductor] Enable ABI-compatible mode for cpp-wrapper JIT (#121309 ) Differential Revision: [D54617284](https://our.internmc.facebook.com/intern/diff/D54617284) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121309 Approved by: https://github.com/chenyang78	2024-03-07 14:22:06 +00:00
Kai Londenberg	57fc35a3af	[ Inductor ] Shape padding honors output stride preservation (#120797 ) This fix makes sure that shape padding honors inductors 'keep_output_strides' setting. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120797 Approved by: https://github.com/eellison	2024-03-07 13:52:29 +00:00
cyy	4305c64fea	Change ATEN generator argument type to const std::optional<Generator>& (#120076 ) This PR proposes to use std::optional<Generator>& for underlying functions to avoid unnecessary copy and move operations. The torchgen code was changed to generate the new type. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120076 Approved by: https://github.com/malfet	2024-03-07 09:52:21 +00:00
Shunting Zhang	1ce5049692	[inuctor] fix the layout problem for nll_loss2d_backward (#121173 ) Fixes https://github.com/pytorch/pytorch/issues/120759 . The CUDA implementation of nll_loss2d_backward.default requires that the 'self' tensor to be contiguous. These implicit assumption may be broken by layout optimizations. The fix here is to add the constraint when we explicitly defining the fallback for the op. Not sure if we can improve the cuda kernel to release the constraints though. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121173 Approved by: https://github.com/jansel, https://github.com/desertfire	2024-03-07 09:05:07 +00:00
mingfeima	b3065f6899	add int8 packed gemm support on CPU device (#118056 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/118056 Approved by: https://github.com/mikekgfb	2024-03-07 08:41:43 +00:00
Andrew Gu	e8e3049f57	[FSDP2] Relaxed check for parent mesh (#121360 ) Mixing 1D and 2D `DTensor`s in the same sharded state dict should be okay, so we can remove the check that a parameter for FSDP to shard must be a `DTensor` if passing a child mesh to FSDP. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121360 Approved by: https://github.com/yifuwang, https://github.com/Skylion007 ghstack dependencies: #120351, #121328	2024-03-07 08:09:25 +00:00
Valentine233	db36d21f5c	Add SDPA pattern for HuggingFace models BF16 (#121202 ) ### Description - Add pattern for bf16 input type with fp32 attention mask. (Example model: ElectraForCausalLM) - Add pattern with batch_size=1 to avoid some clones in graph. (Example model: text-classification+prajjwal1-bert-tiny) ### Newly matched models Dtype: bf16, machine: SPR #### Dynamo HuggingFace models - ElectraForCausalLM (speedup=2.09x) - ElectraForQuestionAnswering (speedup=4.22x) - AlbertForQuestionAnswering (speedup=1.36x) - AlbertForMaskedLM (speedup=1.39x) #### OOB HuggingFace models - multiple-choice+google-electra-base-discriminator - text-classification+prajjwal1-bert-tiny - text-classification+prajjwal1-bert-mini - text-classification+google-electra-base-generator - text-classification+bert-large-cased - casual-language-modeling+xlm-roberta-base - text-classification+roberta-base - text-classification+xlm-roberta-base - text-classification+albert-base-v2 - token-classification+google-electra-base-generator - masked-language-modeling+bert-base-cased Pull Request resolved: https://github.com/pytorch/pytorch/pull/121202 Approved by: https://github.com/jgong5, https://github.com/jansel	2024-03-07 07:40:00 +00:00
Oguz Ulgen	953c6c37cb	Wrap remote cache creation with a try-catch (#121340 ) Summary: In production I am seeing errors like "AttributeError: module 'triton.runtime' has no attribute 'fb_memcache'", likely due to some package skew. Until this is resolved, lets wrap this code with try-catch. Test Plan: CI Differential Revision: D54604339 Pull Request resolved: https://github.com/pytorch/pytorch/pull/121340 Approved by: https://github.com/aakhundov	2024-03-07 07:05:49 +00:00
Chen_Liqing	291ce86a6c	Modify StorageImplCreateHelper (#118459 ) I want to use tensor.untyped_storage()[a:b] for ``PrivateUse1`` backend but fail. The code will go into ``THPStorage_get``: `bb6eba189f/torch/csrc/Storage.cpp (L525-L540)` Here ``torch`` will create a new ``c10::StorageImpl`` but not consider about ``PrivateUse1`` backend. Pull Request resolved: https://github.com/pytorch/pytorch/pull/118459 Approved by: https://github.com/albanD	2024-03-07 06:26:55 +00:00
Xia, Weiwen	f848e9c646	[Quant][Inductor] Fix q/dq per channel lowering with 64-bit qparams (#120984 ) Fixes #120869 Fix lowering of `quantize_per_channel` and `dequantize_per_channel` with float64 scale and int64 zero point. Generated codes are incorrect without explicit type conversion. Add type conversion to the lowering pass, i.e., float64 (double) -> float32 and int64 -> int32. Test plan python test/inductor/test_cpu_repro.py -k test_per_channel_fake_quant_module_uint8 Pull Request resolved: https://github.com/pytorch/pytorch/pull/120984 Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5, https://github.com/jerryzh168	2024-03-07 06:23:52 +00:00
Yeounoh Chung	4f9d4e1ab0	[DTensor][XLA] refactor DTensor _xla API (#113214 ) In response to the change pytorch/xla#5776 and #92909 Pull Request resolved: https://github.com/pytorch/pytorch/pull/113214 Approved by: https://github.com/wanchaol	2024-03-07 06:18:05 +00:00
cyy	c723514ef4	[CUDACachingAllocator] Simplify update_stat and avoid casts (#120964 ) update_stat in CUDACachingAllocator.cpp was split into increase and decrease functions in this PR to simplify the implementation and avoid type casts throughout the code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120964 Approved by: https://github.com/albanD	2024-03-07 05:55:38 +00:00
drisspg	55232c4e1c	Make CausalBias a torch.Tensor subclass again (#121358 ) # Summary This was removed in #116071 in order to enable compile support and re-adding this seems to still work with compile Pull Request resolved: https://github.com/pytorch/pytorch/pull/121358 Approved by: https://github.com/Skylion007, https://github.com/cpuhrsch	2024-03-07 05:20:47 +00:00
Xilun Wu	df2ad1fecc	[dtensor][debug] have visualize_sharding correctly print for sub-mesh DTensor (#121216 ) Summary In `visualize_sharding` we chose to only print on rank 0 (global rank) which means calling `visualize_sharind` will never print anything when the dtensor object's mesh doesn't include rank 0 (i.e. a sub-mesh). This PR has `visualize_sharding` always print on rank whose mesh coordinate is (0, 0, ..., 0) instead of whose global rank is 0. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121216 Approved by: https://github.com/wanchaol ghstack dependencies: #121179, #120260	2024-03-07 04:50:15 +00:00
Xilun Wu	77873f6fe5	[dtensor][1/N] add torchrec even row-wise sharding example (#120260 ) Summary our goal is to demonstrate that DTensor's capability to represent TorchRec's parameter sharding. Currently this is done with `ShardedTensor` and theoretically `DTensor` can replace it with minor change. This PR serves as a start of this effort by adding an example test that represents TorchRec's `ShardingType.ROW_WISE` using DTensor. Note that this PR only covers the even sharding case. Test Run `torchrun --standalone --nnodes=1 --nproc-per-node=4 torch/distributed/_tensor/examples/torchrec_sharding_example.py -e row-wise` Pull Request resolved: https://github.com/pytorch/pytorch/pull/120260 Approved by: https://github.com/wanchaol ghstack dependencies: #121179	2024-03-07 04:50:15 +00:00
Xilun Wu	9cc0f23e5c	[dtensor][debug] allow visualize_sharding to print header (#121179 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121179 Approved by: https://github.com/wanchaol	2024-03-07 04:50:06 +00:00
jmarin	a2854ae904	Bugfix consume_prefix_in_state_dict_if_present function to keep the order of the state_dict (#117464 ) This PR proposes to keep the original order as the original state_dict, as the issue creator proposed. It also removes a bug concerning how ``_metadata`` is handled (see below), as well as other small changes to properly remove the prefix when is present. In the original code, ``_metadata`` was handled as a ``key``. ``` # also strip the prefix in metadata if any. if "_metadata" in state_dict: ``` This is not the case, ``_metadata`` is actually an ``attribute``. Hence, the previous condition is changed to: ``` # also strip the prefix in metadata if any. if hasattr(state_dict, "_metadata"): ``` This PR also includes the necessary test. Fixes #106942 Co-authored-by: mikaylagawarecki <mikaylagawarecki@gmail.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/117464 Approved by: https://github.com/mikaylagawarecki	2024-03-07 04:00:49 +00:00
Aaron Orenstein	edd80f87b8	Prevent infinite recursion within Tensor.__repr__ (#120206 ) `Tensor.__repr__` calls functions which can perform logging which ends up logging `self` (with `__repr__`) causing an infinite loop. Instead of logging all the args in FakeTensor.dispatch log the actual parameters (and use `id` to log the tensor itself). The change to torch/testing/_internal/common_utils.py came up during testing - in some ways of running the test parts was `('test', 'test_testing.py')` and so `i` was 0 and we were doing a join on `()` which was causing an error. Repro: ``` import torch from torch.testing import make_tensor from torch._subclasses.fake_tensor import FakeTensor, FakeTensorMode t = torch.sparse_coo_tensor(((0, 1), (1, 0)), (1, 2), size=(2, 2)) t2 = FakeTensor.from_tensor(t, FakeTensorMode()) print(repr(t2)) ``` and run with `TORCH_LOGS=+all` Pull Request resolved: https://github.com/pytorch/pytorch/pull/120206 Approved by: https://github.com/yanboliang, https://github.com/pearu	2024-03-07 02:24:45 +00:00
laith sakka	eb4d87f237	graph break on sparse tensors constructions (#120458 ) Fix some tests in https://github.com/pytorch/pytorch/issues/119780 sparse_bsc_tensor is not supported https://github.com/pytorch/pytorch/pull/117907 Also more about the issue here. https://docs.google.com/document/d/1EIb4qG88-SjVFn5TloLERliYdxIu2hwYoAA8skjOVfo/edit Pull Request resolved: https://github.com/pytorch/pytorch/pull/120458 Approved by: https://github.com/ezyang	2024-03-07 02:17:41 +00:00
Wanchao Liang	1a28ebffb3	[TP] Introduce Sequence Parallel Style for Laynorm/RMSNorm/Dropout (#121295 ) As titled, this PR introduces a dedicated `ParallelStyle` to shard the nn.LayerNorm/nn.Dropout/RMSNorm layers. We were mainly using a manual distribute_module calls before when sharding the RMSNorm layer, but I think we should have a dedicate TP API to easily shard those layers, instead of user manually using DTensors. I call this SequenceParallel, which might bring some confusion that we technically "deprecated" a SequenceParallel style months ago. But this time the SeuqenceParallel style is significantly different with the previous ones (which used to shard two consecutive Linear layers). I believe making it the right name is the first priority, instead of worrying about the issue of reusing the old name Pull Request resolved: https://github.com/pytorch/pytorch/pull/121295 Approved by: https://github.com/awgu, https://github.com/tianyu-l ghstack dependencies: #121294	2024-03-07 02:04:59 +00:00
Eddie Yan	967dd31621	[cuDNN] Cleanup cuDNN < 8.1 ifdefs (#120862 ) Follow-up of #95722 Pull Request resolved: https://github.com/pytorch/pytorch/pull/120862 Approved by: https://github.com/Skylion007	2024-03-07 01:46:25 +00:00
briancoutinho	b9087f8571	[profiler] Add execution_trace_observer as an optional argument to profiler (#119912 ) # Update Profiler API to collect Execution Traces ## TLDR We would like to simplify collecting Execution Trace and Kineto together. Execution Trace and Kineto both provide meaningful information that can be combined to enable benchmarking, performance analysis and simulating new hardware. ``` import torch def main(): with torch.profiler.profile( activities=[ torch.profiler.ProfilerActivity.CPU, torch.profiler.ProfilerActivity.CUDA, ], … excution_trace_observer=ExecutionTraceObserver() # <<<<<<< NEW ) as prof: ... prof.step() ``` See test/profiler/test_profiler.py 'test_execution_trace_with_kineto' for an example of using this API. ## What are Execution Traces? [Chakra Execution Traces](https://github.com/mlcommons/chakra/wiki) offer a graph based representation of AI/ML workloads. It stands apart from conventional AI/ML frameworks by focusing on replay benchmarks, simulators, and emulators, prioritizing agile performance modeling and adaptable methodologies. - Chakra is part of ML Commons industry standard and is being adopted by other companies besides NVIDIA too. - At Meta we have instrumented PyPer framework to collect Execution Traces. More details on our [PyTorch implementation of Chakra can be found here](https://github.com/mlcommons/chakra/wiki) Chakra essentially enables benchmarking and co-design for ML Models without having to reproduce entier software stacks and helps companies collaborate together [[chakra paper](https://arxiv.org/pdf/2305.14516.pdf)] ## Why correlate Execution Trace with PyTorch/Kineto Trace Both Execution Traces and Kineto/ provide different types of information and combining. While PyTorch ETs focus on CPU operators with explicit dependencies between them, Kineto traces encode GPU operators with their start and end times. In addition, collecting them at different timestamps will be inaccurate as several operations (NCCL, Embedding lookup) are data dependent and may not match correctly. Thus, it makes sense to collect both ET and Kineto together. The problem is that there are two code paths. ## Proposal The proposal is to modify the PyTorch profiler (Kineto) API to enable execution trace to be collected simultaneously, see TLDR section # Testing Updated the unit test for collecting kineto and Execution Trace together. - Check the collected ET has right range of events. - Compare two sets of IDs - record func Ids in ET and external IDs in Kineto. We check if these have a constant difference. ``` pytest test/profiler/test_profiler.py -k test_execution_trace_with_kineto -rP Running 1 items in this shard test/profiler/test_profiler.py [W execution_trace_observer.cpp:682] Enabling Execution Trace Observer STAGE:2024-03-05 09:05:05 1119546:1119546 ActivityProfilerController.cpp:314] Completed Stage: Warm Up [W execution_trace_observer.cpp:694] Disabling Execution Trace Observer STAGE:2024-03-05 09:05:05 1119546:1119546 ActivityProfilerController.cpp:320] Completed Stage: Collection STAGE:2024-03-05 09:05:05 1119546:1119546 ActivityProfilerController.cpp:324] Completed Stage: Post Processing ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/119912 Approved by: https://github.com/sanrise, https://github.com/aaronenyeshi	2024-03-07 01:30:26 +00:00
Lucas Pasqualin	eb1145436a	[DCP] Adds main in format utils (#120128 ) Adds main in format utils. Usage: `python -m torch.distributed.checkpoint.format_utils dcp_to_torch dcp_dir torch_file.pt` or `python -m torch.distributed.checkpoint.format_utils torch_to_dcp torch_file.pt dcp_dir` Differential Revision: [D53791355](https://our.internmc.facebook.com/intern/diff/D53791355/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120128 Approved by: https://github.com/fegin, https://github.com/wz337	2024-03-07 01:18:17 +00:00
cyy	5cc511f72f	Use c10::irange and fix other index types in ForeachReduceOp.cu (#121123 ) This PR follows the suggestions in #121066 and changes most loops to c10::irange. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121123 Approved by: https://github.com/soulitzer	2024-03-07 00:11:27 +00:00
Xiaodong Wang	c268ce4a6d	Make ATen-cpu cuda/rocm agnostic (#121082 ) Summary: This specific rocm logic will make aten-cpu code diverge between rocm and cuda. This is not good because we won't be able to share aten-cpu.so between rocm and cuda. More specifically, this will prevent us build aten-hip by default, which requires us to set up rocm specific rules which is an extra burden for our build system. Test Plan: sandcastle + oss ci Differential Revision: D54453492 Pull Request resolved: https://github.com/pytorch/pytorch/pull/121082 Approved by: https://github.com/jeffdaily, https://github.com/aaronenyeshi, https://github.com/albanD	2024-03-06 23:51:40 +00:00
Yichen Yan	e50ded03a6	Use type check for also `is_not` (#113859 ) Handle `is_not` for: `9647a251cb/torch/_dynamo/variables/builtin.py (L1314-L1317)` I noticed https://github.com/pytorch/pytorch/issues/111713 exists, I think it's no harm to land this first. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113859 Approved by: https://github.com/Skylion007	2024-03-06 23:12:42 +00:00
Wanchao Liang	a88356f45c	[dtensor] make add_.Tensor/div_.Scalar to be linear pointwise instead (#121294 ) add_.Tensor and div_.Scalar should support linearity so that we delay the partial results. This fixes the additional collective in the layernorm layer that we seen Pull Request resolved: https://github.com/pytorch/pytorch/pull/121294 Approved by: https://github.com/tianyu-l	2024-03-06 22:52:18 +00:00
Edward Z. Yang	2f064d895c	Switch TORCH_TRACE to accept a directory by default (#121331 ) Directory is better because it works smoothly with distributed runs; otherwise you'd need to modify torchrun to setup distinct log names for each file. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Differential Revision: [D54597814](https://our.internmc.facebook.com/intern/diff/D54597814) Pull Request resolved: https://github.com/pytorch/pytorch/pull/121331 Approved by: https://github.com/albanD	2024-03-06 22:46:18 +00:00
Andrew Gu	372f192050	[DTensor] Initialized RNG tracker if needed (#121328 ) Since we are already checking if the RNG tracker is initialized, there is no real performance difference between erroring vs. just initializing a default RNG tracker (which we choose to be the `OffsetBasedRNGTracker`). ``` pytest test/distributed/_composable/fsdp/test_fully_shard_init.py -k test_meta ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/121328 Approved by: https://github.com/wanchaol ghstack dependencies: #120351	2024-03-06 22:21:44 +00:00

1 2 3 4 5 ...

70323 Commits