Mirror of https://github.com/pytorch/pytorch.git, synced 2025-11-02 14:34:54 +08:00
Used [rst2myst tool](https://rst-to-myst.readthedocs.io/en/latest/)

Fixes #155018

Docs comparison (check out the 'new' whenever docs build):

1. distributed.checkpoint ([old](https://docs.pytorch.org/docs/main/distributed.checkpoint.html) vs. [new](https://docs-preview.pytorch.org/pytorch/pytorch/155528/distributed.checkpoint.html))
2. distributed.elastic ([old](https://docs.pytorch.org/docs/main/distributed.elastic.html) vs. [new](https://docs-preview.pytorch.org/pytorch/pytorch/155528/distributed.elastic.html))
3. distributed.fsdp.fully_shard ([old](https://docs.pytorch.org/docs/main/distributed.fsdp.fully_shard.html) vs. [new](https://docs-preview.pytorch.org/pytorch/pytorch/155528/distributed.fsdp.fully_shard.html))
4. distributed.optim ([old](https://docs.pytorch.org/docs/main/distributed.optim.html) vs. [new](https://docs-preview.pytorch.org/pytorch/pytorch/155528/distributed.optim.html))
5. distributed.pipelining ([old](https://docs.pytorch.org/docs/main/distributed.pipelining.html) vs. [new](https://docs-preview.pytorch.org/pytorch/pytorch/155528/distributed.pipelining.html))

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155528
Approved by: https://github.com/wz337, https://github.com/svekars
# Torch Distributed Elastic

Makes distributed PyTorch fault-tolerant and elastic.

## Get Started

```{toctree}
:caption: Usage
:maxdepth: 1

elastic/quickstart
elastic/train_script
elastic/examples
```

## Documentation

```{toctree}
:caption: API
:maxdepth: 1

elastic/run
elastic/agent
elastic/multiprocessing
elastic/errors
elastic/rendezvous
elastic/timer
elastic/metrics
elastic/events
elastic/subprocess_handler
elastic/control_plane
```

```{toctree}
:caption: Advanced
:maxdepth: 1

elastic/customization
```

```{toctree}
:caption: Plugins
:maxdepth: 1

elastic/kubernetes
```