pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 12:54:11 +08:00

Author	SHA1	Message	Date
Ke Wen	d444384003	[SymmMem] Tiled reduce (#162243 ) Added op: `tile_reduce(Tensor input, Tensor(a!) out, int root, str group_name)` For now supports only: - NVSHMEM backed symmetric tensor; - 2D tensor and tile; - torch.float. Testing on right-bottom quandrant: ``` rank 0: tensor([[0., 0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 1., 1., 1., 1.], [0., 0., 0., 0., 1., 1., 1., 1.], [0., 0., 0., 0., 1., 1., 1., 1.], [0., 0., 0., 0., 1., 1., 1., 1.]], device='cuda:0') PASSED ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/162243 Approved by: https://github.com/ngimel	2025-10-08 02:03:04 +00:00
Jane Xu	2c9e07ecd2	[BE] Remove outdated RPC benchmark (#146716 ) We have lots of outdated unused + uncalled code in our codebase, namely in our benchmarks and examples folders among others. The last change to this directory was 4 years ago and this code looks dead. cc @albanD @H-Huang for feedback Pull Request resolved: https://github.com/pytorch/pytorch/pull/146716 Approved by: https://github.com/Skylion007, https://github.com/H-Huang	2025-03-29 04:44:36 +00:00
PyTorch MergeBot	99f2491af9	Revert "Use absolute path `path.resolve()` -> `path.absolute()` (#129409 )" This reverts commit 45411d1fc9a2b6d2f891b6ab0ae16409719e09fc. Reverted https://github.com/pytorch/pytorch/pull/129409 on behalf of https://github.com/jeanschmidt due to Breaking internal CI, @albanD please help get this PR merged ([comment](https://github.com/pytorch/pytorch/pull/129409#issuecomment-2571316444))	2025-01-04 14:17:20 +00:00
Xuehai Pan	45411d1fc9	Use absolute path `path.resolve()` -> `path.absolute()` (#129409 ) Changes: 1. Always explicit `.absolute()`: `Path(__file__)` -> `Path(__file__).absolute()` 2. Replace `path.resolve()` with `path.absolute()` if the code is resolving the PyTorch repo root directory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129409 Approved by: https://github.com/albanD	2025-01-03 20:03:40 +00:00
PyTorch MergeBot	cc4e70b7c3	Revert "Use absolute path `path.resolve()` -> `path.absolute()` (#129409 )" This reverts commit 135c7db99d646b8bd9603bf969d47d3dec5987b1. Reverted https://github.com/pytorch/pytorch/pull/129409 on behalf of https://github.com/malfet due to need to revert to as dependency of https://github.com/pytorch/pytorch/pull/129374 ([comment](https://github.com/pytorch/pytorch/pull/129409#issuecomment-2562969825))	2024-12-26 17:26:06 +00:00
Xuehai Pan	b77406a9ec	[BE][CI] bump `ruff` to 0.8.4 (#143753 ) Changes: 1. Bump `ruff` from 0.7.4 to 0.8.4 2. Change `%`-formatted strings to f-string 3. Change arguments with the `__`-prefix to positional-only arguments with the `/` separator in function signature. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143753 Approved by: https://github.com/Skylion007	2024-12-24 12:24:10 +00:00
Xuehai Pan	135c7db99d	Use absolute path `path.resolve()` -> `path.absolute()` (#129409 ) Changes: 1. Always explicit `.absolute()`: `Path(__file__)` -> `Path(__file__).absolute()` 2. Replace `path.resolve()` with `path.absolute()` if the code is resolving the PyTorch repo root directory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129409 Approved by: https://github.com/albanD	2024-12-24 08:33:08 +00:00
Tom Ritchford	498a7808ff	Fix unused Python variables outside torch/ and test/ (#136359 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136359 Approved by: https://github.com/albanD	2024-12-11 17:10:23 +00:00
Xuehai Pan	267f82b860	[BE] Format `.ci/` / `.github/` / `benchmarks/` / `functorch/` / `tools/` / `torchgen/` with `ruff format` (#132577 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132577 Approved by: https://github.com/malfet	2024-10-11 18:30:26 +00:00
Xuehai Pan	c0ed38e644	[BE][Easy][3/19] enforce style for empty lines in import segments in `benchmarks/` (#129754 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129754 Approved by: https://github.com/ezyang	2024-07-17 14:34:42 +00:00
Aaron Gokaslan	6c2a8b6b38	[Ez][BE]: Enable new stable ruff rules (#129825 ) Applies a bunch of new ruff lint rules that are now stable. Some of these improve efficiency or readability. Since I already did passes on the codebase for these when they were in preview, there should be relatively few changes to the codebase. This is just more for future hardening of it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129825 Approved by: https://github.com/XuehaiPan, https://github.com/jansel, https://github.com/malfet	2024-07-02 14:47:10 +00:00
Yifu Wang	bbd47f7b2f	Remove ProcessGroupCudaP2P and change async-TP to use SymmetricMemory (#128762 ) This PR removes `ProcessGroupCudaP2P` and changes async-TP to use `SymmetricMemory`. The async-TP implementation is still workspace-based, but it now doesn't require a buffer size to be specified upfront. Pull Request resolved: https://github.com/pytorch/pytorch/pull/128762 Approved by: https://github.com/wanchaol	2024-06-25 22:32:21 +00:00
Ke Wen	01601ebd41	Retire torch.distributed.pipeline (#127354 ) Actually retiring module after deprecation warning for a while. The new supported module is: torch.distributed.pipelining. Please migrate. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127354 Approved by: https://github.com/wconstab	2024-06-07 08:11:58 +00:00
PyTorch MergeBot	0ff60236ab	Revert "Retire torch.distributed.pipeline (#127354 )" This reverts commit b9c058c203ee38032594f898f27cd8404f113a63. Reverted https://github.com/pytorch/pytorch/pull/127354 on behalf of https://github.com/huydhn due to Sorry for reverting your change but the doc build failure looks legit `b9c058c203` ([comment](https://github.com/pytorch/pytorch/pull/127354#issuecomment-2148133982))	2024-06-04 18:19:31 +00:00
Ke Wen	b9c058c203	Retire torch.distributed.pipeline (#127354 ) Actually retiring module after deprecation warning for a while. The new supported module is: torch.distributed.pipelining. Please migrate. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127354 Approved by: https://github.com/wconstab	2024-06-04 07:03:26 +00:00
Xuehai Pan	26f4f10ac8	[5/N][Easy] fix typo for `usort` config in `pyproject.toml` (`kown` -> `known`): sort torch (#127126 ) The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo make `usort` do more and generate the changes in the PR. Except `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126 Approved by: https://github.com/kit1980	2024-05-27 14:49:57 +00:00
PyTorch MergeBot	55c0ab2887	Revert "[5/N][Easy] fix typo for `usort` config in `pyproject.toml` (`kown` -> `known`): sort torch (#127126 )" This reverts commit 7763c83af67eebfdd5185dbe6ce15ece2b992a0f. Reverted https://github.com/pytorch/pytorch/pull/127126 on behalf of https://github.com/XuehaiPan due to Broken CI ([comment](https://github.com/pytorch/pytorch/pull/127126#issuecomment-2133044286))	2024-05-27 09:22:08 +00:00
Xuehai Pan	7763c83af6	[5/N][Easy] fix typo for `usort` config in `pyproject.toml` (`kown` -> `known`): sort torch (#127126 ) The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo make `usort` do more and generate the changes in the PR. Except `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126 Approved by: https://github.com/kit1980 ghstack dependencies: #127122, #127123, #127124, #127125	2024-05-27 04:22:18 +00:00
Yifu Wang	4a09117d16	Introduce ProcessGroupCudaP2P (#122163 ) ## Context This stack prototypes automatic micro-pipelining of `all-gather -> matmul` and `matmul -> reduce-scatter` via Inductor. The idea originates from the paper [Overlap Communication with Dependent Computation via Decomposition in Large Deep Learning Models](https://dl.acm.org/doi/pdf/10.1145/3567955.3567959). The implementation and some key optimizations are heavily influenced by @lw's implementation in xformers. The stack contains several components: - `ProcessGroupCudaP2P` - a thin wrapper around `ProcessGroupNCCL`. It in addition maintains a P2P workspace that enables SM-free, one-sided P2P communication which is needed for optimal micro-pipelining. - `fused_all_gather_matmul` and `fused_matmul_reduce_scatter` dispatcher ops. - Post-grad fx pass that detects `all-gather -> matmul` and `matmul -> reduce-scatter` and replaces them with the fused dispatcher ops. To enable the prototype feature: - Set the distributed backend to `cuda_p2p`. - Set `torch._inductor.config._micro_pipeline_tp` to `True`. NOTE: the prototype sets nothing in stone w.r.t to each component's design. The purpose is to have a performant baseline with reasonable design on which each component can be further improved. ## Benchmark Setup: - 8 x H100 (500W) + 3rd gen NVSwitch. - Llama3 8B training w/ torchtitan. - 8-way TP. Reduced the number of layers from 32 to 8 for benchmarking purpose. Trace (baseline): https://interncache-all.fbcdn.net/manifold/perfetto-artifacts/tree/ui/index.html#!/?url=https://interncache-all.fbcdn.net/manifold/perfetto_internal_traces/tree/shared_trace/yifu_tmpjaz8zgx0 <img width="832" alt="image" src="https://github.com/pytorch/pytorch/assets/4156752/4addba77-5abc-4d2e-93ea-f68078587fe1"> Trace (w/ micro pipelining): https://interncache-all.fbcdn.net/manifold/perfetto-artifacts/tree/ui/index.html#!/?url=https://interncache-all.fbcdn.net/manifold/perfetto_internal_traces/tree/shared_trace/yifu_tmpn073b4wn <img width="963" alt="image" src="https://github.com/pytorch/pytorch/assets/4156752/4f44e78d-8196-43ab-a1ea-27390f07e9d2"> ## This PR `ProcessGroupCudaP2P` is a thin wrapper around `ProcessGroupNCCL`. By default, it routes all collectives to the underlying `ProcessGroupNCCL`. In addition, `ProcessGroupCudaP2P` initializes a P2P workspace that allows direct GPU memory access among the members. The workspace can be used in Python to optimize intra-node communication patterns or to create custom intra-node collectives in CUDA. `ProcessGroupCudaP2P` aims to bridge the gap where certain important patterns can be better optimized via fine-grained P2P memory access than with collectives in the latest version of NCCL. It is meant to complement NCCL rather than replacing it. Usage: ``` # Using ProcessGroupCudaP2P dist.init_process_group(backend="cuda_p2p", ...) # Using ProcessGroupCudaP2P while specifying ProcessGroupCudaP2P.Options pg_options = ProcessGroupCudaP2P.Options() dist.init_process_group(backend="cuda_p2p", pg_options=pg_options, ...) # Using ProcessGroupCudaP2P while specifying ProcessGroupNCCL.Options pg_options = ProcessGroupNCCL.Options() dist.init_process_group(backend="cuda_p2p", pg_options=pg_options, ...) # Using ProcessGroupCudaP2P while specifying both # ProcessGroupCudaP2P.Options and ProcessGroupNCCL.Options pg_options = ProcessGroupCudaP2P.Options() pg_options.nccl_options = ProcessGroupNCCL.Options() dist.init_process_group(backend="cuda_p2p", pg_options=pg_options, ...) # Down-casting the backend to access p2p buffers for cuda_p2p specific # optimizations if is_cuda_p2p_group(group): backend = get_cuda_p2p_backend(group) if required_p2p_buffer_size > backend.get_buffer_size(): # fallback p2p_buffer = backend.get_p2p_buffer(...) else: # fallback ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/122163 Approved by: https://github.com/wanchaol	2024-05-24 18:33:18 +00:00
PyTorch MergeBot	1b29c16e5e	Revert "Introduce ProcessGroupCudaP2P (#122163 )" This reverts commit 2dd269986027ea25c092f769ef8e9524920aaef6. Reverted https://github.com/pytorch/pytorch/pull/122163 on behalf of https://github.com/jithunnair-amd due to This is breaking ROCm distributed CI on trunk ([comment](https://github.com/pytorch/pytorch/pull/122163#issuecomment-2127518473))	2024-05-23 16:06:14 +00:00
Yifu Wang	2dd2699860	Introduce ProcessGroupCudaP2P (#122163 ) ## Context This stack prototypes automatic micro-pipelining of `all-gather -> matmul` and `matmul -> reduce-scatter` via Inductor. The idea originates from the paper [Overlap Communication with Dependent Computation via Decomposition in Large Deep Learning Models](https://dl.acm.org/doi/pdf/10.1145/3567955.3567959). The implementation and some key optimizations are heavily influenced by @lw's implementation in xformers. The stack contains several components: - `ProcessGroupCudaP2P` - a thin wrapper around `ProcessGroupNCCL`. It in addition maintains a P2P workspace that enables SM-free, one-sided P2P communication which is needed for optimal micro-pipelining. - `fused_all_gather_matmul` and `fused_matmul_reduce_scatter` dispatcher ops. - Post-grad fx pass that detects `all-gather -> matmul` and `matmul -> reduce-scatter` and replaces them with the fused dispatcher ops. To enable the prototype feature: - Set the distributed backend to `cuda_p2p`. - Set `torch._inductor.config._micro_pipeline_tp` to `True`. NOTE: the prototype sets nothing in stone w.r.t to each component's design. The purpose is to have a performant baseline with reasonable design on which each component can be further improved. ## Benchmark Setup: - 8 x H100 (500W) + 3rd gen NVSwitch. - Llama3 8B training w/ torchtitan. - 8-way TP. Reduced the number of layers from 32 to 8 for benchmarking purpose. Trace (baseline): https://interncache-all.fbcdn.net/manifold/perfetto-artifacts/tree/ui/index.html#!/?url=https://interncache-all.fbcdn.net/manifold/perfetto_internal_traces/tree/shared_trace/yifu_tmpjaz8zgx0 <img width="832" alt="image" src="https://github.com/pytorch/pytorch/assets/4156752/4addba77-5abc-4d2e-93ea-f68078587fe1"> Trace (w/ micro pipelining): https://interncache-all.fbcdn.net/manifold/perfetto-artifacts/tree/ui/index.html#!/?url=https://interncache-all.fbcdn.net/manifold/perfetto_internal_traces/tree/shared_trace/yifu_tmpn073b4wn <img width="963" alt="image" src="https://github.com/pytorch/pytorch/assets/4156752/4f44e78d-8196-43ab-a1ea-27390f07e9d2"> ## This PR `ProcessGroupCudaP2P` is a thin wrapper around `ProcessGroupNCCL`. By default, it routes all collectives to the underlying `ProcessGroupNCCL`. In addition, `ProcessGroupCudaP2P` initializes a P2P workspace that allows direct GPU memory access among the members. The workspace can be used in Python to optimize intra-node communication patterns or to create custom intra-node collectives in CUDA. `ProcessGroupCudaP2P` aims to bridge the gap where certain important patterns can be better optimized via fine-grained P2P memory access than with collectives in the latest version of NCCL. It is meant to complement NCCL rather than replacing it. Usage: ``` # Using ProcessGroupCudaP2P dist.init_process_group(backend="cuda_p2p", ...) # Using ProcessGroupCudaP2P while specifying ProcessGroupCudaP2P.Options pg_options = ProcessGroupCudaP2P.Options() dist.init_process_group(backend="cuda_p2p", pg_options=pg_options, ...) # Using ProcessGroupCudaP2P while specifying ProcessGroupNCCL.Options pg_options = ProcessGroupNCCL.Options() dist.init_process_group(backend="cuda_p2p", pg_options=pg_options, ...) # Using ProcessGroupCudaP2P while specifying both # ProcessGroupCudaP2P.Options and ProcessGroupNCCL.Options pg_options = ProcessGroupCudaP2P.Options() pg_options.nccl_options = ProcessGroupNCCL.Options() dist.init_process_group(backend="cuda_p2p", pg_options=pg_options, ...) # Down-casting the backend to access p2p buffers for cuda_p2p specific # optimizations if is_cuda_p2p_group(group): backend = get_cuda_p2p_backend(group) if required_p2p_buffer_size > backend.get_buffer_size(): # fallback p2p_buffer = backend.get_p2p_buffer(...) else: # fallback ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/122163 Approved by: https://github.com/wanchaol	2024-05-22 09:33:05 +00:00
Aaron Gokaslan	5a1216bb2e	[BE]: Update ruff to 0.4.1 (#124549 ) Update ruff to 0.4.1 . This version fixes a lot false negatives/false positives, is 20-40% faster, and has various other bug fixes. Below is a before and after table showing the execution time of ruff lint and ruff format in milliseconds courtesy of https://astral.sh/blog/ruff-v0.4.0 \| Repository \| Linter (v0.3) \| Linter (v0.4) \| Formatter (v0.3) \| Formatter (v0.4) \| \|----------------------------------------------------\|---------------\|---------------\|------------------\|------------------\| \| [pytorch/pytorch](https://github.com/pytorch/pytorch) \| 328.7 \| 251.8 \| 351.1 \| 274.9 \| Pull Request resolved: https://github.com/pytorch/pytorch/pull/124549 Approved by: https://github.com/ezyang	2024-04-21 14:06:23 +00:00
Yifu Wang	c58b0ac7c2	IntraNodeComm primitives for allgather_matmul (#118038 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/118038 Approved by: https://github.com/wanchaol	2024-04-04 00:46:08 +00:00
Aaron Gokaslan	6de28e92d2	[BE]: Apply FURB118 (prev): replaces unnecessary lambdas with operator. (#116027 ) This replaces a bunch of unnecessary lambdas with the operator package. This is semantically equivalent, but the operator package is faster, and arguably more readable. When the FURB rules are taken out of preview, I will enable it as a ruff check. Pull Request resolved: https://github.com/pytorch/pytorch/pull/116027 Approved by: https://github.com/malfet	2023-12-20 19:35:08 +00:00
Edward Z. Yang	dd3a77bc96	Apply UFMT to all files in benchmarks/ (#105928 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/105928 Approved by: https://github.com/albanD	2023-07-26 01:18:48 +00:00
Justin Chu	5ef023b05a	[BE] Enable ruff's UP rules and autoformat benchmarks/ (#105429 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105429 Approved by: https://github.com/malfet	2023-07-19 04:46:37 +00:00
Aaron Gokaslan	597b558c51	[BE]: Update flake8 and plugins and fix bugs (#97795 ) Update flake8 and flake8-plugins in lintrunner to a modern version. Enables more checks and makes flake8 checks significantly faster. Added a few additional rule ignores that will need to be fixed in the future. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97795 Approved by: https://github.com/alexsio27444, https://github.com/janeyx99, https://github.com/ezyang	2023-03-28 23:51:55 +00:00
Aaron Gokaslan	67d9790985	[BE] Apply almost all remaining flake8-comprehension checks (#94676 ) Applies the remaining flake8-comprehension fixes and checks. This changes replace all remaining unnecessary generator expressions with list/dict/set comprehensions which are more succinct, performant, and better supported by our torch.jit compiler. It also removes useless generators such as 'set(a for a in b)`, resolving it into just the set call. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94676 Approved by: https://github.com/ezyang	2023-02-12 01:01:25 +00:00
Xuehai Pan	8d45f555d7	[BE] [1/3] Rewrite `super()` calls in caffe2 and benchmarks (#94587 ) Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied. - #94587 - #94588 - #94592 Also, methods with only a `super()` call are removed: ```diff class MyModule(nn.Module): - def __init__(self): - super().__init__() - def forward(self, ...): ... ``` Some cases that change the semantics should be kept unchanged. E.g.: `f152a79be9/caffe2/python/net_printer.py (L184-L190)` `f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/94587 Approved by: https://github.com/ezyang	2023-02-11 18:19:48 +00:00
Xuehai Pan	a229b4526f	[BE] Prefer dash over underscore in command-line options (#94505 ) Preferring dash over underscore in command-line options. Add `--command-arg-name` to the argument parser. The old arguments with underscores `--command_arg_name` are kept for backward compatibility. Both dashes and underscores are used in the PyTorch codebase. Some argument parsers only have dashes or only have underscores in arguments. For example, the `torchrun` utility for distributed training only accepts underscore arguments (e.g., `--master_port`). The dashes are more common in other command-line tools. And it looks to be the default choice in the Python standard library: `argparse.BooleanOptionalAction`: `4a9dff0e5a/Lib/argparse.py (L893-L895)` ```python class BooleanOptionalAction(Action): def __init__(...): if option_string.startswith('--'): option_string = '--no-' + option_string[2:] _option_strings.append(option_string) ``` It adds `--no-argname`, not `--no_argname`. Also typing `_` need to press the shift or the caps-lock key than `-`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94505 Approved by: https://github.com/ezyang, https://github.com/seemethere	2023-02-09 20:16:49 +00:00
Aaron Gokaslan	8fce9a09cd	[BE]: pyupgrade Python to 3.8 - imports and object inheritance only (#94308 ) Apply parts of pyupgrade to torch (starting with the safest changes). This PR only does two things: removes the need to inherit from object and removes unused future imports. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308 Approved by: https://github.com/ezyang, https://github.com/albanD	2023-02-07 21:10:56 +00:00
Yanli Zhao	2004df9097	Remove python ddp (#91663 ) As it is not used by anyone and also it is not maintained by PyTorch Pull Request resolved: https://github.com/pytorch/pytorch/pull/91663 Approved by: https://github.com/rohan-varma	2023-01-04 05:22:30 +00:00
Sergii Dymchenko	30edd39bdc	Fix non-existing parameters in docstrings in benchmarks (#91115 ) This is a continuation of https://github.com/pytorch/pytorch/pull/90505 Pull Request resolved: https://github.com/pytorch/pytorch/pull/91115 Approved by: https://github.com/clee2000	2022-12-20 02:07:32 +00:00
Kazuaki Ishizaki	14d5f139d2	Fix typos under benchmarks, test, and tools directories (#87975 ) This PR fixes typos in `.md` files under benchmarks, test, and tools directories Pull Request resolved: https://github.com/pytorch/pytorch/pull/87975 Approved by: https://github.com/kit1980	2022-10-29 01:26:17 +00:00
Sergii Dymchenko	591222f5d9	Fix use-dict-literal lint (#83718 ) Fix use-dict-literal pylint suggestions by changing `dict()` to `{}`. This PR should do the change for every Python file except test/jit/test_list_dict.py, where I think the intent is to test the constructor. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83718 Approved by: https://github.com/albanD	2022-08-24 00:26:46 +00:00
Yulv-git	ac2d2e3a3d	Fix some typos. Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/75561 Approved by: https://github.com/albanD	2022-04-11 21:55:59 +00:00
Rodrigo Berriel	a0dea074b2	Remove `.data` from benchmarks and tensorboard (#65389 ) Summary: Related to https://github.com/pytorch/pytorch/issues/30987 and https://github.com/pytorch/pytorch/issues/33628. Fix the following tasks: - Remove the use of `.data` in all our internal code: - [x] `benchmarks/` - [x] `torch/utils/tensorboard/` cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang gcramer23 albanD gchanan Pull Request resolved: https://github.com/pytorch/pytorch/pull/65389 Reviewed By: soulitzer Differential Revision: D31093464 Pulled By: albanD fbshipit-source-id: 3a9c8834fd544a59a1cc2b930ae538fd1d46b232	2021-09-22 11:16:59 -07:00
Sean Lawlor	34c9f5a8da	[DDP Communication Hook] Update get_tensor and set_tensor to be cleaner naming conventions (buffer() and set_buffer()) (#62662 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62662 Replaced the methods set_tensor(.) and get_tensor() in the python exposed API from the C++ logic with buffer() and set_buffer(.) to be a cleaner interface. Reviewed By: SciPioneer Differential Revision: D30012869 fbshipit-source-id: bd8efab583dd89c96f9aeb3dd48a12073f0b1482	2021-08-04 09:27:31 -07:00
Bo Wang	e098e9000b	Compare DDP static graph (C++ core) with legacy DDP forward and backward delay. (#61507 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61507 Benchmark Python-only DDP vs production C++ based DistributedDataParallel. - Implemented a pure python DDP: PythonDDP with support of SYNC and ASYNC reduction - Added compare_ddp to measure the difference in forward and backward step Kudos on Shen and Yi for the great idea. Test Plan: Test on DevGPUS with 2 CUDA devices. $python compare_ddp.py Python only DDP has slightly better (-1%) forward performance and slightly slower (2%-20%) backward performance. This suggested that we need to keep C++ Core since the maximum latency increase can be 20%. See README.md for details. Imported from OSS Differential Revision: D29685364 D29685364 Reviewed By: mrshenli Pulled By: bowangbj fbshipit-source-id: 429e4473fac0ec4c70d6db12d946d2636dd6477a	2021-07-15 12:52:22 -07:00
Garrett Cramer	5a5c7f563d	add trainer hook functions (#60785 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60785 This pr adds hook functions for the trainers. Test Plan: Imported from OSS Reviewed By: zou3519 Differential Revision: D29697299 Pulled By: gcramer23 fbshipit-source-id: cc3b991aad0d32503fbfc5acd4fca8b404e74c0f	2021-07-14 13:19:17 -07:00
Garrett Cramer	304c02ee44	refactor ps benchmark (#60784 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60784 This pr refactors the ps benchmark for modular trainers. Test Plan: Imported from OSS Reviewed By: zou3519 Differential Revision: D29697291 Pulled By: gcramer23 fbshipit-source-id: 64579a1f5326d3cd9f32936dcf53bc243d54b71d	2021-07-14 13:19:13 -07:00
Basil Hosmer	cab926b2c0	faster generate_square_subsequent_mask in nn.Transformer (#60631 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60631 Per #48360, speed up `Transformer.generate_square_subsequent_mask`. New impl is informally ~5x faster, though absolute difference is probably small. PR includes Python and C++ versions as well as a couple of places where the previous impl had been copied around. Test Plan: Imported from OSS Reviewed By: jbschlosser, albanD Differential Revision: D29356673 Pulled By: bhosmer fbshipit-source-id: 4c062ba0ead61a445aeef451c78777bf0b3a631e	2021-06-25 16:07:01 -07:00
Garrett Cramer	4ed2d5d9bb	ps sparse rpc (#58003 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58003 adds trainer class DdpTrainer adds trainer class DdpSparseRpcTrainer adds server class ParameterServerBase adds server class AverageParameterServer adds experiment ddp_cpu_sparse_rpc_nccl_allreduce adds experiment ddp_cuda_sparse_rpc_nccl_allreduce quip document https://fb.quip.com/iQUtAeKIxWpF Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D29379696 Pulled By: gcramer23 fbshipit-source-id: 9cf5fb7398ba2fa3eb694afbddc4ed00d97f205f	2021-06-24 17:21:49 -07:00
Zachary Kneupper	b8d56572a1	Open json config file in context manager (#58077 ) Summary: * Open json config file safely using a context manager (using a with block). * This will make sure that the file closed even if an exception is raised. Pull Request resolved: https://github.com/pytorch/pytorch/pull/58077 Reviewed By: anjali411 Differential Revision: D28711177 Pulled By: H-Huang fbshipit-source-id: 597ba578311b1f1d6706e487872db4e784c78c3c	2021-05-26 08:58:40 -07:00
Horace He	79a258f448	s/foward/forward/g (#58497 ) Summary: Annoying typo. Prompted by these profiling results: https://github.com/pytorch/pytorch/issues/56419#issuecomment-825787828 Pull Request resolved: https://github.com/pytorch/pytorch/pull/58497 Reviewed By: malfet Differential Revision: D28521081 Pulled By: Chillee fbshipit-source-id: ab91a2e167dd7d3387fd56106a6cff81f7a32f10	2021-05-19 11:42:42 -07:00
Garrett Cramer	16d617c3e5	test experiment script (#57925 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57925 1. adds test_scripts.py that will run added scripts and verify that there are no errors 2. adds local ddp_nccl_allreduce experiment script test with command `pytest test_scripts.py` Test Plan: Imported from OSS Reviewed By: agolynski Differential Revision: D28382452 Pulled By: gcramer23 fbshipit-source-id: 21028a990ebfedf1aad6b007a723c02403e8bea8	2021-05-12 10:22:47 -07:00
Garrett Cramer	bc2540f0be	benchmark rpc ps (#57454 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57454 DDP with NCCL AllReduce for the entire model experiment from Quip https://fb.quip.com/iQUtAeKIxWpF I have been testing this on the AI cluster. There seem to be some connection problems with RPC when using multiple trainers or parameter servers. ``` Namespace(bconfig_id='3', dconfig_id='DummyData', mconfig_id='DummyModel', pconfig_id='None', tconfig_id='DdpNcclTrainer') benchmark warmup done metrics for trainer=0 +-----------------------------------+----------+---------+----------+------------+-----------+ \| name \| min \| max \| mean \| variance \| stdev \| +===================================+==========+=========+==========+============+===========+ \| backward_metric,backward \| 2.45248 \| 4.18304 \| 3.972 \| 0.097122 \| 0.311644 \| +-----------------------------------+----------+---------+----------+------------+-----------+ \| batch_level_metric,batch_all \| 4.11955 \| 4.58138 \| 4.31439 \| 0.00229848 \| 0.0479424 \| +-----------------------------------+----------+---------+----------+------------+-----------+ \| foward_metric,forward_pass \| 0.141312 \| 1.4807 \| 0.222566 \| 0.0555432 \| 0.235676 \| +-----------------------------------+----------+---------+----------+------------+-----------+ \| hook_future_metric,nccl_allreduce \| 0.191488 \| 3.54099 \| 3.11694 \| 0.557106 \| 0.746395 \| +-----------------------------------+----------+---------+----------+------------+-----------+ metrics for trainer=1 +-----------------------------------+----------+---------+----------+-------------+------------+ \| name \| min \| max \| mean \| variance \| stdev \| +===================================+==========+=========+==========+=============+============+ \| backward_metric,backward \| 2.4617 \| 2.59174 \| 2.51196 \| 0.000938276 \| 0.0306313 \| +-----------------------------------+----------+---------+----------+-------------+------------+ \| batch_level_metric,batch_all \| 4.22605 \| 4.71757 \| 4.27921 \| 0.00468424 \| 0.0684415 \| +-----------------------------------+----------+---------+----------+-------------+------------+ \| foward_metric,forward_pass \| 0.807936 \| 1.50118 \| 0.846008 \| 0.00601693 \| 0.0775688 \| +-----------------------------------+----------+---------+----------+-------------+------------+ \| hook_future_metric,nccl_allreduce \| 0.108544 \| 0.1536 \| 0.11222 \| 2.16726e-05 \| 0.00465538 \| +-----------------------------------+----------+---------+----------+-------------+------------+ metrics for all trainer +-----------------------------------+----------+---------+----------+------------+-----------+ \| name \| min \| max \| mean \| variance \| stdev \| +===================================+==========+=========+==========+============+===========+ \| backward_metric,backward \| 2.45248 \| 4.18304 \| 3.24198 \| 0.584391 \| 0.764455 \| +-----------------------------------+----------+---------+----------+------------+-----------+ \| batch_level_metric,batch_all \| 4.11955 \| 4.71757 \| 4.2968 \| 0.00378467 \| 0.0615197 \| +-----------------------------------+----------+---------+----------+------------+-----------+ \| foward_metric,forward_pass \| 0.141312 \| 1.50118 \| 0.534287 \| 0.128284 \| 0.358167 \| +-----------------------------------+----------+---------+----------+------------+-----------+ \| hook_future_metric,nccl_allreduce \| 0.108544 \| 3.54099 \| 1.61458 \| 2.5456 \| 1.59549 \| +-----------------------------------+----------+---------+----------+------------+-----------+ ``` Test Plan: Imported from OSS Reviewed By: H-Huang, ngimel Differential Revision: D28296175 Pulled By: gcramer23 fbshipit-source-id: 5dd208fc86f8b5558d7c8860d685bb25c2e09fe7	2021-05-07 19:58:40 -07:00
Sam Estep	75024e228c	Add lint for unqualified `type: ignore` (#56290 ) Summary: The other half of https://github.com/pytorch/pytorch/issues/56272. Pull Request resolved: https://github.com/pytorch/pytorch/pull/56290 Test Plan: CI should pass on the tip of this PR, and we know that the lint works because the following CI runs (before this PR was finished) failed: - https://github.com/pytorch/pytorch/runs/2384511062 - https://github.com/pytorch/pytorch/actions/runs/765036024 Reviewed By: seemethere Differential Revision: D27867219 Pulled By: samestep fbshipit-source-id: e648f07b6822867e70833e23ddafe7fb7eaca235	2021-04-21 08:07:23 -07:00
Sam Estep	8c798e0622	Forbid trailing whitespace (#53406 ) Summary: Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857 These are the only hand-written parts of this diff: - the addition to `.github/workflows/lint.yml` - the file endings changed in these four files (to appease FB-internal land-blocking lints): - `GLOSSARY.md` - `aten/src/ATen/core/op_registration/README.md` - `scripts/README.md` - `torch/csrc/jit/codegen/fuser/README.md` The rest was generated by running this command (on macOS): ``` git grep -I -l ' $' -- . ':(exclude)/contrib/' ':(exclude)third_party' \| xargs gsed -i 's/ *$//' ``` I looked over the auto-generated changes and didn't see anything that looked problematic. Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406 Test Plan: This run (after adding the lint but before removing existing trailing spaces) failed: - https://github.com/pytorch/pytorch/runs/2043032377 This run (on the tip of this PR) succeeded: - https://github.com/pytorch/pytorch/runs/2043296348 Reviewed By: walterddr, seemethere Differential Revision: D26856620 Pulled By: samestep fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97	2021-03-05 17:22:55 -08:00
Rohan Varma	5021582fe6	Fix benchmarks/distributed/ddp/benchmark.py (#51095 ) Summary: Fixes the issue reported in https://github.com/pytorch/pytorch/issues/50679 by using built-in object-based collectives. User has verified this patch works Test with: RANK=0 python3 pytorch-dist-benchmark.py --world-size 2 --master-addr 127.0.0.1 --master-port 23456 RANK=1 python3 pytorch-dist-benchmark.py --world-size 2 --master-addr 127.0.0.1 --master-port 23456 Pull Request resolved: https://github.com/pytorch/pytorch/pull/51095 Reviewed By: SciPioneer Differential Revision: D26070275 Pulled By: rohan-varma fbshipit-source-id: 59abcaac9e395bcdd8a018bf6ba07521d94b2fdf	2021-01-29 11:10:13 -08:00

1 2

57 Commits