pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Author	SHA1	Message	Date
Nikita Shulga	6fb79b7004	Bump version: 1.14.0->2.0.0 (#90491 ) Except for the usual location, had to update the version in one of ONNX expect patterns, namely here: `43660051d8/test/onnx/expect/TestOperators.test_avg_pool2d.expect (L3)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/90491 Approved by: https://github.com/jansel, https://github.com/albanD	2022-12-09 01:08:08 +00:00
Yuxin Wu	ff5a3592e7	Fix static initialization issue for static build (#90133 ) Fixes #83255 Code comes from #83258 after fixing merge conflicts. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90133 Approved by: https://github.com/soumith, https://github.com/malfet	2022-12-09 01:01:15 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	c8f5c194ca	Fix bug in dynamic shapes multiply (#90336 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90336 Approved by: https://github.com/ezyang	2022-12-09 00:59:50 +00:00
Andrew Gu	2cf703214b	[Composable API][Easy] Fix some follow-ups (#90471 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90471 Approved by: https://github.com/mrshenli	2022-12-09 00:26:38 +00:00
William Wen	eb5b4c21e1	Deepcopy GraphModule in minifier (#90401 ) Fixes https://github.com/pytorch/pytorch/issues/90397. Remove deepcopy calls in minifier tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90401 Approved by: https://github.com/anijain2305, https://github.com/mlazos	2022-12-08 23:59:05 +00:00
Howard Huang	80150788bc	[21/N] Add alltoall_base custom op with CPU/CUDA implementations (#89813 ) Differential Revision: [D41812670](https://our.internmc.facebook.com/intern/diff/D41812670) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89813 Approved by: https://github.com/kwen2501	2022-12-08 23:39:26 +00:00
Howard Huang	e65ee3975f	[20/N] Add recv_any_source custom op with CPU/CUDA implementations (#89505 ) Differential Revision: [D41812671](https://our.internmc.facebook.com/intern/diff/D41812671) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89505 Approved by: https://github.com/kwen2501	2022-12-08 23:39:26 +00:00
Rohan Varma	43660051d8	[Ez] Omit HSDP Z2 from doc (#90503 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90503 Approved by: https://github.com/awgu	2022-12-08 23:05:49 +00:00
Sergii Dymchenko	912a1f7b27	Fix issue 38095 TODOs in test_quantized_tensor.py (#90344 ) Fix TODOs related to https://github.com/pytorch/pytorch/issues/38095 Pull Request resolved: https://github.com/pytorch/pytorch/pull/90344 Approved by: https://github.com/malfet	2022-12-08 22:28:15 +00:00
clee2000	fec39f6310	Don't update vision hash on push (#90498 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90498 Approved by: https://github.com/malfet, https://github.com/seemethere	2022-12-08 22:03:24 +00:00
William Wen	9bb16cd3ca	Track torch.compile calls (#90310 ) Title. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90310 Approved by: https://github.com/colin2328, https://github.com/anijain2305	2022-12-08 21:41:15 +00:00
Michael Lazos	76f440f20a	[dynamo] Rewrite inplace addcdiv and inplace add (#90330 ) Rewrite inplace addcdiv to a div, mul and inplace add to avoid graph break. Rewrite inplace add to a mul and inplace add to avoid graph break. Needed to close optimizer graph breaks. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90330 Approved by: https://github.com/jansel	2022-12-08 21:19:23 +00:00
Stephen Macke	0c972fb5c7	[rfc][pkg] check spec for module source before falling back to file in package exporter (#90258 ) Summary: To get source for a particular module, the "correct" thing to do is to check the module's spec and use `get_source` if it's a SourceFileLoader, since subclasses may look elsewhere than the `__file__`, and the spec will give the source of truth. For torch packager, however, we prefer to use linecache, but the loader could still change the file, so we figure out the file for the module using the spec's loader rather than using `module.__file__`, if possible. Test Plan: This code path will get exercised by CI. Also added a test for remapped files. Differential Revision: D41412983 Pull Request resolved: https://github.com/pytorch/pytorch/pull/90258 Approved by: https://github.com/PaliC	2022-12-08 20:24:45 +00:00
Zheng Yan	e1674d7dc0	avoid fork in torch/__init__.py for deploy/multipy (#90492 ) Summary: We should not fork in deploy when initializing torch. Traceback (most recent call last): File "<string>", line 38, in <module> File "<string>", line 36, in __run File "/usr/local/fbcode/platform010/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/usr/local/fbcode/platform010/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/data/users/zyan/fbsource/buck-out/v2/gen/fbcode/104a4d5c3a690252/multipy/runtime/__test_py__/test_py#link-tree/multipy/runtime/test_py.py", line 61, in <module> import torch # has to be done serially otherwise things will segfault File "/data/users/zyan/fbsource/buck-out/v2/gen/fbcode/104a4d5c3a690252/multipy/runtime/__test_py__/test_py#link-tree/torch/__init__.py", line 158, in <module> platform.system() != 'Windows': File "/usr/local/fbcode/platform010/lib/python3.8/platform.py", line 891, in system return uname().system File "/usr/local/fbcode/platform010/lib/python3.8/platform.py", line 857, in uname processor = _syscmd_uname('-p', '') File "/usr/local/fbcode/platform010/lib/python3.8/platform.py", line 613, in _syscmd_uname output = subprocess.check_output(('uname', option), Test Plan: override a local script run trigger init and set `subprocess.check_output` to None Reviewed By: yinghai, houseroad Differential Revision: D41848592 Pull Request resolved: https://github.com/pytorch/pytorch/pull/90492 Approved by: https://github.com/PaliC	2022-12-08 20:22:01 +00:00
Elias Ellison	b651e06049	Add Pointwise Tag from pointwise set in DTensor, use in aot_autograd partitioner (#90029 ) Takes the pointwise op list from [DTensor](https://github.com/pytorch/pytorch/blob/master/torch/distributed/_tensor/ops/pointwise_ops.py#L36) as an initial starting point for pointwise ops, and feeds them to the aot autograd partitioner. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90029 Approved by: https://github.com/ezyang	2022-12-08 20:21:17 +00:00
Edward Z. Yang	8ca1c910fb	Refactor test_inductor_XXX to reduce code duplication (#90443 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/90443 Approved by: https://github.com/desertfire	2022-12-08 19:58:58 +00:00
Richard Zou	7342251281	functorch.grad support for autograd.Function (#89860 ) Happy to split this PR more if it helps. This PR adds functorch.grad support for autograd.Function. There's a lot going on; here is the high-level picture and there are more details as comments in the code. Mechanism (PyOperator) - Somehow, autograd.Function needs to dispatch with functorch. This is necessary because every layer of functorch needs to see the autograd.Function; grad layers need to preserve the backward pass. - The mechanism for this is via PyOperator. If functorch transforms are active, then we wrap the autograd.Function in a `custom_function_call` PyOperator where we are able to define various rules for functorch transforms. - `custom_function_call` has a rule for the functorch grad transform. autograd.Function changes - I needed to make some changes to autograd.Function to make this work. - First, this PR splits autograd.Function into a _SingleLevelFunction (that works with a single level of functorch transform) and autograd.Function (which works with multiple levels). This is necessary because functorch's grad rule needs some way of specifying a backward pass for that level only. - This PR changes autograd.Function's apply to either call `custom_function_call` (if functorch is active) or super().apply (if functorch isn't active). Testing - Most of this PR is just testing. It creates an autograd.Function OpInfo database that then gets passed to the functorch grad-based tests (grad, vjp, vjpvjp). - Since functorch transform tests are autogenerated from OpInfo tests, this is the easiest way to test various autograd.Function with functorch.
Future - jvp and vmap support coming next - better error message (functorch only supports autograd.Function that have the optional setup_context staticmethod) - documentation to come when we remove the feature flag Pull Request resolved: https://github.com/pytorch/pytorch/pull/89860 Approved by: https://github.com/soulitzer	2022-12-08 19:31:04 +00:00
Richard Zou	eb314f9b1a	Add setup_context staticmethod to autograd.Function (#89859 ) Adds a setup_context staticmethod to autograd.Function. If it exists, then the user splits the ctx-specific logic from the forward() and puts it in the setup_context staticmethod. Docs will come later when we remove the feature flag. Test Plan: - some light tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/89859 Approved by: https://github.com/soulitzer	2022-12-08 19:31:04 +00:00
Richard Zou	103be1f164	Add feature flag for the autograd.Function extension (#89858 ) This PR adds a private runtime feature flag for the feature work we're going to do with extending autograd.Function. The motivation of the feature flag is: - to guard the feature against unsuspecting users - control the release of the feature to when we are ready to release it We might not even need the feature flag (because we hope to have the work done in the next month), but it is good practice and it does touch currently public API (autograd.Function). Concretely, "autograd.Function extension" refers to: - adding an optional `setup_context` staticmethod to autograd.Function - adding an optional `vmap` staticmethod to autograd.Function - autograd.Function support for functorch Test Plan: - new test that the feature flag works Pull Request resolved: https://github.com/pytorch/pytorch/pull/89858 Approved by: https://github.com/soulitzer	2022-12-08 19:31:01 +00:00
Yuxin Wu	1ba5c55992	skip flaky tests (rather than expectedFailure) (#90233 ) They are flaky but don't always fail. So `expectedFailure` is incorrect. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90233 Approved by: https://github.com/mruberry, https://github.com/soumith	2022-12-08 18:29:11 +00:00
PyTorch MergeBot	e89685b0b5	Revert "[inductor] Use decomposition for _to_copy (#90314 )" This reverts commit 3fdb5f2dda7164f6282e80c39799843527d135e7. Reverted https://github.com/pytorch/pytorch/pull/90314 on behalf of https://github.com/desertfire due to regresses performance on hf_Bert	2022-12-08 18:29:06 +00:00
Jiewen Tan	b738da8c8e	[LTC] Tweak LazyTensor Class for XLATensor (#90363 ) Summary: This pull request makes some tweaks to the LazyTensor class so that it's easier for XLATensor to inherit from it. 1. It replaces data_ptr() with data(), which now returns a const shared_ptr& type. 2. It adds a temporary ctor to LazyTensor::Data so that XLATensor::Data can easily inherit it. 3. It moves LazyTensor(std::shared_ptr<Data>) and SetTensorData(at::Tensor) to protected for XLATensor to access. Test Plan: CI. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90363 Approved by: https://github.com/JackCaoG	2022-12-08 18:23:17 +00:00
Denis Vieriu	b71c710db1	Add additional tests for view slice tensors (#86282 ) Fixes https://github.com/pytorch/pytorch/issues/83995 and https://github.com/pytorch/pytorch/issues/84489 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86282 Approved by: https://github.com/kulinseth	2022-12-08 17:59:55 +00:00
PyTorch MergeBot	465005c1e0	Revert "Fix issue 38095 TODO in test_multiprocessing.py (#90335 )" This reverts commit cbb2d5af81dcfaf181db7e9083b9c41b29fdb4eb. Reverted https://github.com/pytorch/pytorch/pull/90335 on behalf of https://github.com/clee2000 due to somehow caused test_multiprocessing to timeout `cbb2d5af81` https://github.com/pytorch/pytorch/actions/runs/3645873711/jobs/6159998523	2022-12-08 17:12:10 +00:00
Driss Guessous	8ea90d926f	Add support to foreach torch empty for bfloat16s (#90437 ) # Summary When training a model with SGD(..., foreach=True), we found that a bfloat16 model was erroring because there was no CUDA support for `empty`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90437 Approved by: https://github.com/soumith	2022-12-08 17:02:06 +00:00
Bin Bao	d2ee94231e	[inductor] Fallback for index with None in the middle of indices (#90022 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90022 Approved by: https://github.com/ngimel	2022-12-08 16:18:57 +00:00
Ankur Verma	b62cfbca84	Remove TORCH_API from inline at::internal::lazy_init_num_thread (#89511 ) The function signature in its current state is ambiguous. It's an inline function that is also declared to be imported from the DLL, which leaves it subject to the compiler's decision to choose one or the other, and depending on what the compiler/linker chooses, we may get one of two behaviors for the `aten::init_num_threads` call: 1. Once-per-dll-in-a-thread (if it's inlined) 2. Once-per-thread (if it's imported) I suspect once-per-dll-in-a-thread is already the current behavior because the function is tagged inline, so removing TORCH_API will simply make it a little more consistent and clear. The function exists to avoid repeated calls to aten::init_num_threads. Being in an "internal" namespace, the function isn't expected to be called by external plugins, which means that the "once-per-dll-in-a-thread" behavior isn't much of a problem anyway. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89511 Approved by: https://github.com/malfet	2022-12-08 16:18:38 +00:00
Rohan Varma	793a999ce0	Hybrid Sharded Data Parallel (#89915 ) Adds 2 new hybrid sharding strategies to FSDP: 1. HYBRID_SHARD: applies zero-3 style sharding within a node, and data parallelism across nodes. 2. HYBRID_SHARD_ZERO2: applies zero-2 style sharding within a node, and data parallelism across nodes. These are useful for medium-sized models and aim to decrease communication volume; tests and benchmarks will be run to understand which workloads are optimal under which sharding strategy. Hybrid sharding in general works by sharding the model using a process group within a single node, and creating inter-node process groups for replication / data parallelism. The user needs to pass in either a tuple of these process groups or None, in which case we generate the process groups appropriately. Acknowledgements - @awgu 's excellent prototype: `5ad3a16d48` - @liangluofb For ideation, feedback, and initial implementation and experimentation Pull Request resolved: https://github.com/pytorch/pytorch/pull/89915 Approved by: https://github.com/awgu	2022-12-08 16:18:03 +00:00
Peter Bell	454361435c	Implement correction argument in torch.masked.{std,var} (#87118 ) This makes the signature of `torch.masked.std` and `var` more consistent with the global namespace variant and also updates the sample inputs to repurpose the existing `sample_inputs_std_var` inputs which fully exercise the `correction` argument. Pull Request resolved: https://github.com/pytorch/pytorch/pull/87118 Approved by: https://github.com/cpuhrsch	2022-12-08 15:59:09 +00:00
Andrew Gu	a6593d6622	[Composable API][Easy] Use `policy=None` since that is supported (#90400 ) I believe that @mrshenli used `ModuleWrapPolicy({UnitModule})` when applying `fully_shard` to `UnitModule`s because `policy=None` was not supported. However, he added that support in a previous PR, so this PR simplifies to using `policy=None` to make the intention more clear. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90400 Approved by: https://github.com/mrshenli	2022-12-08 15:55:20 +00:00
Andrew Gu	21a0e809c2	[Composable API] Match `fully_shard()` comm. schedule with wrapper FSDP (#90387 ) - This PR introduces a new concept, the _communication module_ (denoted `comm_module`), that represents the module responsible for the unshard/reshard pair for a `FlatParamHandle`. This is well-defined because the current design assumes that each `FlatParamHandle` only has _one_ unshard/reshard pair for either the forward or backward pass. - For the wrapper code path, the `comm_module` is exactly the module already being passed to the `FlatParamHandle` constructor. - For the composable code path, the `comm_module` is not necessarily the module already being passed to the `FlatParamHandle`. This is because the module already being passed is always the local FSDP root module to give complete FQNs, instead of local FQNs. Distinguishing the communication module from the local FSDP root module can provide more flexibility for non-recursive wrapping designs in the future. - This PR adds a unit test `test_unshard_reshard_order` that explicitly checks that `_unshard` and `_reshard` are called in exactly the same order across the two code paths. - This PR does not fix `test_checkpoint_fsdp_submodules_use_reentrant`. However, the error message changes, so this PR accommodates that. - The error is now the same as if we used the equivalent wrapper FSDP: ``` test_model.u1 = FSDP(test_model.u1, use_orig_params=True) test_model.u2 = FSDP(test_model.u2, use_orig_params=True) ``` - The error is also the same as if we used wrapper FSDP with `use_orig_params=False`, so it is not unique to `use_orig_params=True`.
--- `comm_module` Example ``` model = Model( seq1: nn.Sequential(nn.Linear, nn.ReLU, nn.Linear, nn.ReLU), seq2: nn.Sequential(nn.Linear, nn.ReLU, nn.Linear, nn.ReLU), ); policy = ModuleWrapPolicy({nn.Sequential}); fully_shard(model, policy=policy); FullyShardedDataParallel(model, auto_wrap_policy=policy) ``` - This policy constructs two `FlatParamHandle`s, one for `seq1` and one for `seq2`. - `FullyShardedDataParallel` will pass `seq1` and `seq2` as the `module` argument to the two `FlatParamHandle`s, respectively. - `fully_shard()` will pass `model` as the `module` argument to every `FlatParamHandle`. - `FullyShardedDataParallel` will pass `seq1` and `seq2` as the `comm_module` argument to the two `FlatParamHandle`s, respectively. - `fully_shard()` will pass `seq1` and `seq2` as the `comm_module` argument to the two `FlatParamHandle`s, respectively. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90387 Approved by: https://github.com/mrshenli	2022-12-08 15:55:20 +00:00
Andrew Gu	4011597dd4	[Composable API] Refactor `test_fully_shard.py` to use common models (#90386 ) Unlike for FSDP, where we already diverged to using per-test-file models, let us try to use the same set of models for the composable API effort. This can improve debugging efficiency because we know which module structures we support and which we do not _across all of our composable APIs_. This PR had to perform some surgery for `test_materialize_meta_module`. Writing a correct parameter initialization function for meta device initialization is not easy, and we should revisit this. The old implementation, which followed the style of the previous unit tests--namely, using `module.to_empty()`--is actually incorrect for nested FSDP applications because `module.to_empty()` will re-initialize already materialized parameters and the module materialization proceeds bottom up. The existing unit test in `test_fsdp_meta.py` passes because it sets every parameter to ones (`self.weight.fill_(1)`), which is idempotent to re-initialization. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90386 Approved by: https://github.com/mrshenli	2022-12-08 15:32:36 +00:00
Andrew Gu	5ca4e95f6c	[Composable API] Move test models to common file (#90385 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90385 Approved by: https://github.com/mrshenli	2022-12-08 15:32:36 +00:00
Bin Bao	3fdb5f2dda	[inductor] Use decomposition for _to_copy (#90314 ) Summary: also contains a fix for https://github.com/pytorch/pytorch/issues/89633 Pull Request resolved: https://github.com/pytorch/pytorch/pull/90314 Approved by: https://github.com/ngimel	2022-12-08 15:25:44 +00:00
yanbing-j	dc40b6d043	Upgrade oneDNN to v2.7.2 (#90051 ) This PR upgrades oneDNN to v2.7.2. ### oneDNN v2.7.1 & 2.7.2 changes: Fixes #89104. Updated ITT API version to 3.23.0. ### Performance Benchmark Used TorchBench tests on ICX with 40 cores; Intel OpenMP & tcmalloc were preloaded. ![image](https://user-images.githubusercontent.com/61222868/205240855-04e2d50f-8b3a-4097-9038-fdd0c0fc93b9.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90051 Approved by: https://github.com/XiaobingSuper, https://github.com/jgong5	2022-12-08 09:41:02 +00:00
Till Hoffmann	b485781440	Add a transform for positive-definite matrices. (#76777 ) The `PositiveDefiniteTransform` is required to transform from an unconstrained space to positive definite matrices, e.g. to support testing the Wishart mode in #76690. It is a simple extension of the `LowerCholeskyTransform`. I've also added a small test that ensures the generated data belong to the domain of the associated transform. Previously, the data generated for the inverse transform of the `LowerCholeskyTransform` wasn't part of the domain, and the test only passed because the comparison uses `equal_nan=True`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/76777 Approved by: https://github.com/lezcano, https://github.com/fritzo, https://github.com/soumith	2022-12-08 09:18:44 +00:00
Yuxin Wu	c00b135adf	Remove deprecated call to tf.io.gfile.get_filesystem (#89832 ) Fixes #30966 . Fixes #47139 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89832 Approved by: https://github.com/soumith	2022-12-08 08:53:27 +00:00
Yuxin Wu	ecd784667c	Avoid overflow in tensorboard image summary (#90423 ) Fix #90419 Added some code such that the test will update the expect files when `expecttest.ACCEPT` is True. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90423 Approved by: https://github.com/soumith	2022-12-08 08:31:52 +00:00
Jiewen Tan	1978773399	[LTC] Overlap data creation and ir_value setting (#90438 ) Summary: Upstreaming changes from torch_xla to lazy tensor core: https://github.com/pytorch/xla/pull/4011. It overlaps data creation and ir_value setting with previous executions. To be noted, this is a clone of https://github.com/pytorch/pytorch/pull/87119, and the author is @aws-rhsoln. Test Plan: CI. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90438 Approved by: https://github.com/JackCaoG	2022-12-08 08:11:01 +00:00
Rohan Varma	9c80f13692	[Resubmit] state_dict_pre_hook (#90435 ) Resubmit of https://github.com/pytorch/pytorch/pull/88541 which got stale. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90435 Approved by: https://github.com/fegin	2022-12-08 07:54:14 +00:00
Jesse Cai	de016b3799	[pruning][core][feature] Implement prune for structured pruning (#89777 ) Summary: This PR implements `prune` in BaseStructuredSparsifier: `prune` is a function that takes in a model with structured sparsity parametrizations (the result of `prepare`) and will return a resized model with the masked-out weights removed. `prune` is defined by a mapping from patterns to different pruning functions. - patterns are just sequences of operations, for example `(nn.Linear, activation, nn.Linear)` - pruning functions are functions that take in a matched pattern as args and will resize the appropriate layer sizes and weights. ``` def prune_linear_activation_linear(linear1, activation, linear2): pass ``` - This is one line in the pattern config `(nn.Linear, activation, nn.Linear): prune_linear_activation_linear` At a high level, `prune` works by finding instances of the graph that match different patterns and then calling the mapped pruning functions on those matched patterns. This is unlike the previous code, which attempted to do both at the same time. There may be some gaps in the patterns compared to the previous implementation, but the conversion functionality support should be the same. Currently we have pruning functions for the following patterns: - linear -> linear - linear -> activation -> linear - conv2d -> conv2d - conv2d -> activation -> conv2d - conv2d -> activation -> pool -> conv2d - conv2d -> pool -> activation -> conv2d - conv2d -> adaptive pool -> flatten -> linear Added MyPy type hints as well for the prune_functions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89777 Approved by: https://github.com/vkuzo	2022-12-08 07:13:24 +00:00
Jiewen Tan	c20d41253f	[LTC] Tweak LazyGraphExecutor for XLA (#90420 ) Summary: This patch moves some of the data structures from private to protected such that XLAGraphExecutor can reuse them. Test Plan: CI. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90420 Approved by: https://github.com/JackCaoG	2022-12-08 06:56:23 +00:00
fduwjj	1a48ae96ba	[PT-D][Easy] Reformat the optim code within PTD code base (#90399 ) Just run two commands: ``` ufmt format torch/distributed/optim/ ufmt format test/distributed/optim/ ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/90399 Approved by: https://github.com/awgu	2022-12-08 06:38:59 +00:00
Sergii Dymchenko	cbb2d5af81	Fix issue 38095 TODO in test_multiprocessing.py (#90335 ) Fix TODO related to https://github.com/pytorch/pytorch/issues/38095 Pull Request resolved: https://github.com/pytorch/pytorch/pull/90335 Approved by: https://github.com/clee2000	2022-12-08 06:27:08 +00:00
titaiwang	06c98e673f	[ONNX] Fix ignored small eps in layer normalization in fp16 (#89869 ) Prior to this change, the symbolic_fn `layer_norm` (before ONNX opset 17) always loses precision when eps is too small for the Float type, while PyTorch always takes eps as a Double. This PR adds `onnx::Cast` into eps-related operations to prevent losing precision during the calculation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/89869 Approved by: https://github.com/BowenBao	2022-12-08 06:13:09 +00:00
PyTorch MergeBot	5f3ca208c5	Revert "add save and load stats in memory_tracker (#90144 )" This reverts commit 1f137c1e2f738d9021b5e22fb6e52d41b780a1a8. Reverted https://github.com/pytorch/pytorch/pull/90144 on behalf of https://github.com/ezyang due to dirty git working copy broke master	2022-12-08 05:16:56 +00:00
PyTorch MergeBot	22a249e44e	Revert "[Inductor] More robust stride and offset extraction from index expressions (#90184 )" This reverts commit 71f27f768839394ec226c37a763bd524d8589f07. Reverted https://github.com/pytorch/pytorch/pull/90184 on behalf of https://github.com/ngimel due to catastrophically regresses performance	2022-12-08 05:04:15 +00:00
Han Qi (qihqi)	25eb7c3ae3	Clean up dependency for flatbuffer_loader (#86041 ) Test Plan: waitforsandcastle Differential Revision: D38445936 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86041 Approved by: https://github.com/cccclai	2022-12-08 03:48:04 +00:00
Edward Z. Yang	37892041a1	Always compile tiny graphs with AOTAutograd (#89775 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/89775 Approved by: https://github.com/anjali411, https://github.com/bdhirsh	2022-12-08 03:41:29 +00:00
Iris	b8b7480065	[Checkpoint][2D][6/N] Add optimizer and update default_planner to core distributed (#90212 ) This is the last PR for integrating 2D into core distributed. This PR does the following: 1. Add optimizer.py: this adds the ability to load a state_dict in conjunction with FSDP sharded optimizer state. 2. Update default_planner.py to support 2D checkpoint. 3. Add test_fsdp_optim_state.py as a unit test for No. 1. 4. Fix a bug in torch/testing/_internal/distributed/checkpoint_utils.py. 5. Rename the files for the APIs that should be private. Will organize and clean up further in following PRs. #90328 Docstring and integration test will be added in the following PRs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90212 Approved by: https://github.com/wanchaol	2022-12-08 02:53:29 +00:00

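The Hybrid Sharded Data Parallel entry (#89915) describes sharding parameters inside each node while replicating them across nodes. Below is a minimal, stdlib-only sketch of how the rank lists for those two kinds of groups could be laid out, assuming ranks are numbered node by node and every node has the same number of ranks. `hybrid_shard_rank_groups` is a hypothetical helper for illustration, not an FSDP API.

```python
from typing import List, Tuple


def hybrid_shard_rank_groups(
    world_size: int, ranks_per_node: int
) -> Tuple[List[List[int]], List[List[int]]]:
    """Hypothetical helper: rank layout for HYBRID_SHARD-style process groups.

    Assumes ranks are numbered node by node (node 0 holds ranks
    0..ranks_per_node-1, node 1 the next block, and so on).
    """
    if ranks_per_node <= 0 or world_size % ranks_per_node != 0:
        raise ValueError("world_size must be a positive multiple of ranks_per_node")
    num_nodes = world_size // ranks_per_node

    # Intra-node groups: the ranks of one node shard the parameters among themselves.
    shard_groups = [
        [node * ranks_per_node + local for local in range(ranks_per_node)]
        for node in range(num_nodes)
    ]
    # Inter-node groups: ranks with the same local index on every node hold the
    # same shard and are kept in sync data-parallel style (replication).
    replicate_groups = [
        [node * ranks_per_node + local for node in range(num_nodes)]
        for local in range(ranks_per_node)
    ]
    return shard_groups, replicate_groups


if __name__ == "__main__":
    shard, replicate = hybrid_shard_rank_groups(world_size=8, ranks_per_node=4)
    print(shard)      # [[0, 1, 2, 3], [4, 5, 6, 7]]
    print(replicate)  # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

In a real job, each rank list could be passed to `torch.distributed.new_group(ranks=...)`; the sketch only computes the layout and does not create any process groups.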

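The `torch.masked.{std,var}` entry (#87118) adds a `correction` argument. Assuming it follows the same convention as the global `torch.var`, the masked variance divides the sum of squared deviations of the unmasked elements by `N - correction`, where `N` counts the unmasked elements. The plain-Python sketch below illustrates only that arithmetic. `masked_var` and `masked_std` are hypothetical names, not the PyTorch implementation.

```python
import math
from typing import Sequence


def masked_var(values: Sequence[float], mask: Sequence[bool], correction: int = 1) -> float:
    """Variance of the elements whose mask entry is True (hypothetical helper).

    The sum of squared deviations is divided by (N - correction), where N is the
    number of unmasked elements: correction=1 gives the sample (Bessel-corrected)
    variance, correction=0 the population variance.
    """
    kept = [v for v, m in zip(values, mask) if m]
    n = len(kept)
    dof = n - correction
    if n == 0 or dof <= 0:
        # Simplification for this sketch: report NaN when no degrees of freedom remain.
        return math.nan
    mean = sum(kept) / n
    return sum((v - mean) ** 2 for v in kept) / dof


def masked_std(values: Sequence[float], mask: Sequence[bool], correction: int = 1) -> float:
    """Standard deviation under the same masking and correction rules."""
    return math.sqrt(masked_var(values, mask, correction))


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]
    print(masked_var(data, [True] * 4))                 # sample variance of all four values
    print(masked_var(data, [True] * 4, correction=0))   # 1.25
    print(masked_var(data, [True, True, False, True]))  # sample variance of [1, 2, 4]
```

Exposing `correction` as an integer rather than a boolean `unbiased` flag lets one function cover both the population and the sample estimators.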
54656 Commits