Summary: Add an option to allow skipping the all-reduce of unused parameters. This can significantly improve training throughput when the model has a large number of unused parameters.
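A minimal sketch of how such an option might be used; the keyword name `skip_all_reduce_unused_params` is an assumption for illustration only, not necessarily the exact API:
```python
# Hypothetical sketch (assumed flag name): skip the all-reduce over parameters
# that receive no gradient. Run under torchrun with a process group available.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.used = torch.nn.Linear(8, 8)
        self.unused = torch.nn.Linear(8, 8)   # never touched in forward()

    def forward(self, x):
        return self.used(x)

dist.init_process_group("gloo")
model = DDP(
    Net(),
    find_unused_parameters=True,            # detect params that get no gradient
    skip_all_reduce_unused_params=True,     # assumed flag name, illustration only
)
loss = model(torch.randn(2, 8)).sum()
loss.backward()
```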
Test Plan: unit tests, CI
Differential Revision: D72282069
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151503
Approved by: https://github.com/mrshenli
Summary:
This diff periodically (e.g., every 30s) logs critical collective progress status to a Scuba table, starting with a few metrics such as the last enqueued sequence ID.
With the Scuba table, we hope to easily detect the straggler of a PG, e.g., a rank whose seq_ has not progressed for X seconds while other ranks in the same PG have a larger seq_.
The implementation needs to make sure that Scuba is used only for FB-internal use cases.
For OSS, we still provide a generic logger data struct and logger that can be
easily extended. If users do not register the logger, nothing will be logged.
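A rough Python sketch of that register-or-no-op pattern (the real implementation is C++ inside c10d; all names below are hypothetical):
```python
# Hypothetical sketch of the "register a logger, otherwise no-op" pattern.
from dataclasses import dataclass
from typing import Optional

@dataclass
class CollectiveProgressData:            # generic logger data struct
    pg_name: str
    last_enqueued_seq: int

class CollectiveLogger:                  # extend this to ship the data anywhere
    def log(self, data: CollectiveProgressData) -> None:
        raise NotImplementedError

_registered_logger: Optional[CollectiveLogger] = None

def register_collective_logger(logger: CollectiveLogger) -> None:
    global _registered_logger
    _registered_logger = logger

def maybe_log(data: CollectiveProgressData) -> None:
    # If no logger was registered, nothing is logged.
    if _registered_logger is not None:
        _registered_logger.log(data)
```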
Test Plan:
Re-use the existing unit tests for the FB side of operations, such as test_register_and_dump in test_c10d_manifold, and change the dump period to a very small number, e.g., 1 ms. Verified that the logs are correctly shown in the Scuba table:
https://fburl.com/scuba/c10d_work_update/9trhwnmy
Reviewed By: wconstab
Differential Revision: D54556219
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121859
Approved by: https://github.com/wconstab
Summary:
In 9cc040fef64154a2424b2ccd2c0909641e245cf0, we accidentally changed some of the environment variable names to only their non-deprecated form. The intent was to support both the deprecated and the new forms of the env variables (with a warning thrown for the deprecated form).
Test Plan:
OSS CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115082
Approved by: https://github.com/zdevito
Previously:
```
[W Utils.hpp:133] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
[W Utils.hpp:133] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
```
With this PR, those warnings disappear. They were introduced in #114077.
This change was generated with the sed script below, applied with `sed -i -f /tmp/x **/*.{py,hpp,cpp,cc}`, and hand-inspected.
```
s/\bNCCL_BLOCKING_WAIT\b/TORCH_NCCL_BLOCKING_WAIT/g
s/\bNCCL_ENABLE_TIMING\b/TORCH_NCCL_ENABLE_TIMING/g
s/\bNCCL_DESYNC_DEBUG\b/TORCH_NCCL_DESYNC_DEBUG/g
s/\bNCCL_ASYNC_ERROR_HANDLING\b/TORCH_NCCL_ASYNC_ERROR_HANDLING/g
s/\bENABLE_NCCL_HEALTH_CHECK\b/TORCH_ENABLE_NCCL_HEALTH_CHECK/g
s/\bNCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK\b/TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK/g
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114880
Approved by: https://github.com/kwen2501
The NCCL_ prefix should only be used for the NCCL library's own environment variables. We currently use a few environment variables in PyTorch with the NCCL_ prefix that the NCCL library does not understand.
This patch renames such environment variables to use the TORCH_NCCL_ prefix instead. We still maintain the old NCCL_ variables, but throw a warning when they are used; a short usage sketch follows the list below.
The following env changes have been made:
- `NCCL_BLOCKING_WAIT` -> `TORCH_NCCL_BLOCKING_WAIT`
- `NCCL_ENABLE_TIMING` -> `TORCH_NCCL_ENABLE_TIMING`
- `NCCL_DESYNC_DEBUG` -> `TORCH_NCCL_DESYNC_DEBUG`
- `NCCL_ASYNC_ERROR_HANDLING` -> `TORCH_NCCL_ASYNC_ERROR_HANDLING`
- `ENABLE_NCCL_HEALTH_CHECK` -> `TORCH_ENABLE_NCCL_HEALTH_CHECK`
- `NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK` -> `TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK`
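A minimal usage sketch under the new names (the old NCCL_* names still work but emit a deprecation warning):
```python
# Set the TORCH_NCCL_* variables before creating the NCCL process group.
import os
import torch.distributed as dist

os.environ["TORCH_NCCL_ASYNC_ERROR_HANDLING"] = "1"   # was NCCL_ASYNC_ERROR_HANDLING
os.environ["TORCH_NCCL_DESYNC_DEBUG"] = "1"           # was NCCL_DESYNC_DEBUG

dist.init_process_group(backend="nccl")
```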
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114077
Approved by: https://github.com/fduwjj
Summary:
The getCvar* functions allow us to provide multiple environment variable names for the same value. This lets us deprecate some variables in favor of others, while still allowing users to keep using the old variables for some time.
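A rough Python illustration of the lookup-with-fallback idea (the real getCvar* helpers are C++, in Utils.hpp; this is only a sketch of the pattern):
```python
# Illustrative-only version of the getCvar* idea: several env var names map to
# one value, listed from most-preferred to deprecated, and reading a deprecated
# name emits a warning.
import os
import warnings

def get_cvar_int(names, default):
    # `names` is ordered from the preferred (new) name to deprecated aliases.
    for i, name in enumerate(names):
        if name in os.environ:
            if i > 0:
                warnings.warn(
                    f"Environment variable {name} is deprecated; use {names[0]} instead"
                )
            return int(os.environ[name])
    return default

blocking_wait = get_cvar_int(["TORCH_NCCL_BLOCKING_WAIT", "NCCL_BLOCKING_WAIT"], 0)
```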
Test Plan: OSS CI
Reviewed By: fduwjj, XilunWu
Differential Revision: D51225487
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113797
Approved by: https://github.com/fduwjj
Currently the logger timer is registered by default for CPU/CUDA. Other backends may or may not register this timer, and the current code reports a warning and returns for them, which is not expected.
That check fails if a backend has registered the timer. For example, the HPU (Habana) backend registers this timer, yet the code still reports a warning and returns, which is incorrect.
The other case is the lazy backend, whose timer is never registered; that warning-and-return is the reason the check was added in the first place, but it breaks backends that do register the timer.
Add a generic check: if the timer is registered, do not report the warning.
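A hedged sketch of that guard (the real check lives in the C++ Reducer/Logger; names below are illustrative only):
```python
# Illustrative sketch: warn only when no timer is registered for the device
# type, instead of warning for every non-CPU/CUDA backend.
_TIMER_REGISTRY = {"cpu": object(), "cuda": object(), "hpu": object()}

def get_timer_or_warn(device_type):
    timer = _TIMER_REGISTRY.get(device_type)   # generic "is a timer registered?" check
    if timer is None:                          # e.g., the lazy backend never registers one
        print(f"Warning: no timer registered for device type {device_type}")
    return timer
```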
Signed-off-by: Jeeja <jeejakp@habana.ai>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91702
Approved by: https://github.com/kit1980
Not only is this change usually shorter and more readable, it can also yield better performance: size() is not always a constant-time operation (e.g., on linked lists), but empty() always is.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93236
Approved by: https://github.com/malfet
Fixes the broken header filter from #90699 and applies a few more relevant clang-tidy fixes from c10 and c10d. The header filter pattern was actually broken and the clang-tidy include pattern was redundant. Also fixes a few bugs in torch/distributed/c10d.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91178
Approved by: https://github.com/ezyang
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88330
### Implementation
Move backend-specific (NCCL, Gloo, etc.) collective implementations to the corresponding `Backend` class. Update ProcessGroup to support multiple backends and use the dispatcher to call backends based on tensor device type.
### Changes
#### c++ changes (ProcessGroup files, `Ops.cpp`, `init.cpp`)
- Update pybind definitions for new process group base class and new backend class
- Update pybinded backend class with collective definitions to keep BC with Python PG instances (e.g. `dist.ProcessGroupGloo`, `dist.ProcessGroupNCCL`) which are used in tests
- Switch `ProcessGroupGloo`, `ProcessGroupNCCL`, `ProcessGroupMPI`, `ProcessGroupUCC` to derive from the `Backend` class.
- Update CPU/CUDA `Ops.cpp` and `OpsImpl.cpp` to perform this dispatching by querying the backend using the device type
- Update the internal dispatched implementation of `barrier` to use a tensor, which allows the operation to be dispatched.
- Update the `allgather` collective to use `TensorList`. For some reason it was using the default implementation of `allgather` rather than dispatching it correctly; I still don't understand why and had originally filed an issue in #85122.
#### python changes (`distributed_c10d.py`, test files)
- Add BackendConfig class to specify the default configurations of backends and `get_backend_config()` API
- `get_backend()` deprecation warning
- `init_process_group` now returns a generic `ProcessGroup` object; it contains a list of backends (the ones stated above) to which it will dispatch operations.
- `new_group` updated to return the same as above
- Update `test_c10d_gloo.py`: update `DistributedDataParallelTest` to use `init_process_group`, update `ReducerTest`, and update `test_broadcast_coalesced_gloo` to move away from the PG instance and Gloo options
- Update `test_c10d_nccl.py`: update `DistributedDataParallelTest` to use `init_process_group`
- Specific tests updated: `test_Backend_enum_class`
### Changes missing
- lazy initialization of backends
- support parsing of BackendConfig
### open questions
- Pure Python PG extensions (https://github.com/pytorch/pytorch/pull/66338)
# Example
This is a basic script (using 2 backends within a process group)
```python
# python -m torch.distributed.run --nnodes=1 --nproc_per_node=2 basic_scenario.py
import torch.distributed as dist
import torch
import os
if __name__ == "__main__":
    rank = os.environ.get("RANK")
    # initialize with both gloo and nccl
    dist.init_process_group()
    # with gloo
    dist.all_reduce(torch.tensor([1.0]))
    print(f"Rank {rank} finished")
    # with nccl
    dist.all_reduce(torch.tensor([1.0], device=f"cuda:{rank}"))
```
Test Plan: Imported from OSS
Differential Revision: D42069829
Pulled By: H-Huang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90997
Approved by: https://github.com/awgu, https://github.com/fduwjj
Headers under torch/csrc/distributed may be referenced with a relative path, e.g., "<c10d/...>". However, relative paths cannot be gracefully handled by the Meta-internal build when the NCCL PG is hipified to support AMD/RCCL, because the hipified header files are generated in other directories. Moreover, using absolute paths for header inclusion is the state of the art in most components of PyTorch. Thus, this patch refactors all header paths in torch/csrc/distributed to be absolute.
See D39835774 for more details about the Meta-internal complication.
**How to test**: commit 9e5d199 removes -I./torch/csrc/distributed from the compile options. Use it to verify that we don't miss any relative-path uses of torch/csrc/distributed headers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85780
Approved by: https://github.com/kumpera, https://github.com/huydhn
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73731
During bucket rebuilding, in addition to syncing bucket_indices, per_bucket_limits should be synced as well before calling initialize_buckets(). However, syncing per_bucket_limits would increase communication volume as well as code complexity. After taking a further look at the code, per_bucket_limits as used inside initialize_buckets() is actually not useful: it assigns the bucket_size_limit property to the bucket struct, but that property is not used anywhere. So it is better to remove this property and avoid syncing per_bucket_limits.
Differential Revision: [D34605513](https://our.internmc.facebook.com/intern/diff/D34605513/)
Approved by: https://github.com/rohan-varma
Summary:
Adds a guard in Logger so that we won't hit the reducer_->timer_ assertion for any non-CPU/CUDA devices, such as lazy, when they try to integrate with DDP or any other distributed APIs.
Test Plan:
WIP.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75247
Approved by: https://github.com/wanchaol
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73166
This PR refactors, cleans up, and optimizes the implementation of `TORCH_DISTRIBUTED_DEBUG`. It also introduces three new user APIs: `get_debug_level()`, `set_debug_level()`, and `set_debug_level_from_env()` to retrieve and modify the debug level after a process has started.
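For example (assuming these APIs are exposed under torch.distributed, as the names above suggest):
```python
# Sketch of the debug-level APIs named above; assumes they live in torch.distributed.
import os
import torch.distributed as dist

print(dist.get_debug_level())                 # current level, e.g. DebugLevel.OFF

dist.set_debug_level(dist.DebugLevel.INFO)    # raise verbosity at runtime

os.environ["TORCH_DISTRIBUTED_DEBUG"] = "DETAIL"
dist.set_debug_level_from_env()               # re-read TORCH_DISTRIBUTED_DEBUG
```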
ghstack-source-id: 149778566
Test Plan: Run the existing unit tests.
Reviewed By: rohan-varma
Differential Revision: D34371226
fbshipit-source-id: e18443b411adcbaf39b2ec999178c198052fcd5b
(cherry picked from commit 26d6bb1584b83a0490d8b766482656a5887fa21d)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72456
It is easier to log whether static graph is set at construction time, now that it is natively supported in the DDP constructor, as opposed to waiting for the first iteration to finish. In some failure cases we're seeing, the first iteration does not finish and thus we don't have this data, which is valuable for debugging.
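For reference, a minimal sketch of setting static graph at construction time (assuming the constructor argument is named `static_graph`):
```python
# static_graph is passed directly to the DDP constructor, so it is known
# (and can be logged) at construction time rather than after iteration 1.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("gloo")
model = DDP(torch.nn.Linear(8, 8), static_graph=True)
```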
ghstack-source-id: 148840679
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D34045204
fbshipit-source-id: 72a187c1ce031db217de4b3ad20a64f2a74995bc
(cherry picked from commit 1d622c88f3571e47209dc754aafc58fd8c0bb89d)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67059
While debugging some workflows, sometimes the training does not finish, but I still want to know whether or not the graph was static. Also, log 0 for the unused parameter size if no unused params were found.
ghstack-source-id: 141428950
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D31846669
fbshipit-source-id: 21763fcdc1b244ba829117da1f15b2271d966983
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66680
Closes https://github.com/pytorch/pytorch/issues/66215. Tracks models with sync BN so we can find workflows that use them and target them for perf optimization.
ghstack-source-id: 140875182
Test Plan: CI
Reviewed By: pritamdamania87
Differential Revision: D31679477
fbshipit-source-id: 0e68cd1a7aabbc5b26227895c53d33b8e98bfb8e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65770
This logging info is printed out in debug mode; make it log the iteration as well for clarity.
ghstack-source-id: 139838595
Test Plan: CI
Reviewed By: zhaojuanmao, wayi1
Differential Revision: D31222132
fbshipit-source-id: 14519aae1ba0b2a35b4b962e7d1a957c9142c8f8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64113
Since there is only one model replica per process, `replicas`
can be simplified from `std::vector<std::vector<at::Tensor>>` to
`std::vector<at::Tensor>` in the Reducer class.
Test Plan:
All tests are passing
`pytest test/distributed/test_c10d_gloo.py -vs`
Imported from OSS
Reviewed By: mrshenli
Differential Revision: D30615965
fbshipit-source-id: d2ec809d99b788c200b01411333e7dbad1269b51
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64411
These are not actually the training iterations, but are offset by how frequently DDP stats collection actually runs (the default being kDDPRuntimeLoggingSampleRate = 100). So with this change, they are actually logged to Scuba every 10, 10 * 100, 40 * 100, etc. iterations.
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D30718274
fbshipit-source-id: 146bd2428753c93363bee37e487f40104fce3c18
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62751
This will help us determine whether the gradient-ready order and bucket indices are aligned among all the ranks. This should always be true for rank 0, as we determine the rebuilt bucket order from the gradient-ready order on rank 0, but we would be interested to see this on different workloads for the other ranks.
ghstack-source-id: 135104369
Test Plan: CI
Reviewed By: SciPioneer, wanchaol
Differential Revision: D30111833
fbshipit-source-id: a0ab38413a45022d953da76384800bee53cbcf9f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62232
Logs the bucket sizes in DDP logging so that we know which workflow ran with what bucket size config. This will be used to verify how changing bucket sizes in DDP affects perf.
Based on the test, we can see an inconsistency in where the "first" bucket actually is (last before rebuilding buckets, first after).
ghstack-source-id: 134663867
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D29922299
fbshipit-source-id: 538b331c96e77048164ad130b377433be100a761
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60543
Now that c10d is part of libtorch, it would also be nice if the sources all lived in one place.
ghstack-source-id: 132306292
Test Plan: It builds
Reviewed By: cbalioglu
Differential Revision: D29062002
fbshipit-source-id: d9e1301e9d73e1643fa0f0119cd2d618f1ad52e6