pytorch/docs/source/elastic/multiprocessing.rst
Kurman Karabukaev 67d3e4f2a2 [TorchElastic] Refactoring to support non-default logging strategy (#120691)
Summary:
Pull the logging parameters out into a logs spec that can be overridden (follow-up changes will cover a possible mechanism).

Why?
Right now the logging approach is quite rigid:
- Requires the log directory to exist and not be empty
- Creates a tempdir otherwise
- Creates a subdir for each run
- Creates a subdir for each attempt
- Creates files named stdout.log, stderr.log, and error.json
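
The layout described in the list above can be sketched with the standard library alone. This is only an illustration, not the TorchElastic implementation: the function name `sketch_log_layout`, the `attempt_<n>` directory name, the `torchelastic_` tempdir prefix, and the per-worker subdirectory keyed by local rank are all assumptions made for the example.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional


def sketch_log_layout(
    log_dir: Optional[str], run_id: str, attempt: int, local_rank: int
) -> Dict[str, Path]:
    """Build <base>/<run_id>/attempt_<n>/<local_rank>/ and return the file paths."""
    # Fall back to a fresh temporary directory when no log directory is given.
    base = Path(log_dir) if log_dir else Path(tempfile.mkdtemp(prefix="torchelastic_"))
    # One subdirectory per run, one per attempt, and (assumed) one per worker.
    worker_dir = base / run_id / f"attempt_{attempt}" / str(local_rank)
    worker_dir.mkdir(parents=True, exist_ok=True)
    # The fixed file names mentioned in the list above.
    names = ("stdout.log", "stderr.log", "error.json")
    return {name: worker_dir / name for name in names}
```

The file names are hard-coded here on purpose: that rigidity is what the change sets out to relax.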

In some cases users would like to customize this behavior, including file names, based on context. We already have a mechanism for templating the prefix of multiplexed, teed output.

With these changes, users can create a custom log spec that uses environment variables to change the behavior.
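
As a rough illustration of the idea, the stand-alone sketch below picks log file names from each worker's environment. It deliberately does not import torch: the class `EnvPrefixedLogsSpec`, its `stdout_paths` method, and the `MY_LOG_PREFIX` variable are hypothetical names, and a real custom spec would presumably build on the `LogsSpecs` interface instead.

```python
import os
from typing import Dict


class EnvPrefixedLogsSpec:
    """Hypothetical stand-in for a custom log spec; not the torch API."""

    def __init__(self, log_dir: str, prefix_var: str = "MY_LOG_PREFIX") -> None:
        self.log_dir = log_dir
        self.prefix_var = prefix_var

    def stdout_paths(self, envs: Dict[int, Dict[str, str]]) -> Dict[int, str]:
        """Map each local rank to a stdout path derived from that worker's env."""
        paths = {}
        for local_rank, env in envs.items():
            # Use the worker's env variable if set, else a generic prefix.
            prefix = env.get(self.prefix_var, "worker")
            file_name = f"{prefix}_{local_rank}_stdout.log"
            paths[local_rank] = os.path.join(self.log_dir, file_name)
        return paths
```

The input here is a mapping from local rank to that worker's environment variables, so the file name can vary per worker without changing the launcher code.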

Notes:
Made `LaunchConfig.logs_specs` an optional field that is bound to a `DefaultLogsSpecs` instance when unset. A large number of clients (code) use the API directly, without going through the torchrun API. In those cases, a `LogsSpecs` implementation has to be passed explicitly to override the default. Regular torchrun users can use the pluggable approach proposed in the follow-up change.
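
The "optional field bound to a default instance" pattern can be sketched as follows. All class names here (`SketchLogsSpecs`, `SketchCustomLogsSpecs`, `SketchLaunchConf`) are hypothetical stand-ins so the example runs without torch; it shows the pattern under that assumption, not the actual launcher code.

```python
from dataclasses import dataclass
from typing import Optional


class SketchLogsSpecs:
    """Stand-in for the default log spec."""

    def describe(self) -> str:
        return "default"


class SketchCustomLogsSpecs(SketchLogsSpecs):
    """Stand-in for a user-supplied override."""

    def describe(self) -> str:
        return "custom"


@dataclass
class SketchLaunchConf:
    # Optional so that existing callers do not need to change.
    logs_specs: Optional[SketchLogsSpecs] = None

    def __post_init__(self) -> None:
        # Bind the default implementation when the caller passed nothing.
        if self.logs_specs is None:
            self.logs_specs = SketchLogsSpecs()
```

Callers that construct the config directly opt in to custom behavior by passing their own spec, e.g. `SketchLaunchConf(logs_specs=SketchCustomLogsSpecs())`; everyone else keeps the default.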

Test Plan: CI + unit tests

Differential Revision: D54176265

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120691
Approved by: https://github.com/ezyang
2024-02-29 20:59:17 +00:00


:github_url: https://github.com/pytorch/elastic

Multiprocessing
================

.. automodule:: torch.distributed.elastic.multiprocessing


Starting Multiple Workers
---------------------------

.. autofunction:: torch.distributed.elastic.multiprocessing.start_processes


Process Context
----------------

.. currentmodule:: torch.distributed.elastic.multiprocessing.api

.. autoclass:: PContext

.. autoclass:: MultiprocessContext

.. autoclass:: SubprocessContext

.. autoclass:: RunProcsResult


.. autoclass:: DefaultLogsSpecs
    :members:

.. autoclass:: LogsDest

.. autoclass:: LogsSpecs
    :members: