Mirror of https://github.com/pytorch/pytorch.git (synced 2025-10-20 21:14:14 +08:00)
During comms reordering, the sink-wait iterative pass observed that the previous runtime estimations were fairly inaccurate for collectives and matmuls. This adds optional use of:
- c10d._time_estimator for collectives, which is based on the NCCL estimator
- Benchmark mode for matmuls only, as their runtime is highly dependent on the mm backend

The logic is mostly copied from Ruisi's PR for inductor simple_fsdp: https://github.com/pytorch/pytorch/pull/157572

These estimation corrections are applied in the default `BaseSchedulerNode.estimate_runtime()`.

Differential Revision: [D81152294](https://our.internmc.facebook.com/intern/diff/D81152294)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161405
Approved by: https://github.com/eellison
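The benchmark-mode idea for matmuls can be illustrated with a small, self-contained sketch. The helper below is hypothetical and uses a pure-Python stand-in workload; the real pass benchmarks the actual mm kernel on its backend, which is why an analytic model alone is unreliable:

```python
import time
import statistics


def benchmark_runtime_ms(fn, warmup=3, iters=10):
    # Hypothetical helper: estimate a kernel's runtime by timing it directly
    # instead of predicting it from an analytic model. Warmup runs amortize
    # one-time costs (caching, lazy init) before measurement begins.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1e3)
    # The median is more robust to scheduler noise than the mean.
    return statistics.median(samples)


def fake_mm():
    # Stand-in workload; real code would launch the matmul on its mm backend.
    sum(i * i for i in range(10_000))


est_ms = benchmark_runtime_ms(fake_mm)
print(f"estimated runtime: {est_ms:.3f} ms")
```

Timing the real kernel this way naturally reflects backend-specific behavior that a closed-form estimate misses.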
16 lines
542 B
Python
import sys

from torch.utils._config_module import install_config_module

# Whether to use c10d._time_estimator for collectives runtime estimations.
runtime_estimations_use_nccl_lib_estimations: bool = False

# Config to enable syncing runtime estimations across distributed ranks,
# to prevent passes that use these estimations from making different
# decisions on different distributed ranks.
runtime_estimations_align_across_all_distributed_ranks: bool = False

# adds patch, save_config, etc.
install_config_module(sys.modules[__name__])
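The final comment refers to the config-module machinery that `install_config_module` wires in. A minimal sketch of the `patch` behavior it provides (a simplified stand-in, not the actual `torch.utils._config_module` implementation) looks roughly like:

```python
from contextlib import contextmanager


class SimpleConfig:
    # Simplified stand-in for an installed config module: flags are plain
    # attributes, and patch() temporarily overrides one of them.
    runtime_estimations_use_nccl_lib_estimations: bool = False
    runtime_estimations_align_across_all_distributed_ranks: bool = False

    @contextmanager
    def patch(self, name, value):
        old = getattr(self, name)
        setattr(self, name, value)
        try:
            yield
        finally:
            # Restore the previous value even if the body raises.
            setattr(self, name, old)


config = SimpleConfig()
with config.patch("runtime_estimations_use_nccl_lib_estimations", True):
    assert config.runtime_estimations_use_nccl_lib_estimations
assert not config.runtime_estimations_use_nccl_lib_estimations
```

This scoped-override pattern is what lets tests flip an estimation flag without leaking state into other tests.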
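The rank-alignment flag guards against a real hazard: locally benchmarked estimates are noisy, so a reordering pass could make different decisions on different ranks. One way to align them is to average across ranks; the sketch below is a hypothetical pure-Python stand-in (real code would synchronize via torch.distributed collectives):

```python
def align_estimates(per_rank_estimates):
    # Stand-in for a cross-rank sync: every rank replaces its local estimate
    # with the mean across ranks, so all ranks see identical numbers and
    # downstream passes make identical reordering decisions.
    n = len(per_rank_estimates)
    aligned = [sum(col) / n for col in zip(*per_rank_estimates)]
    return [list(aligned) for _ in range(n)]


# Three ranks with noisy local estimates for two ops:
ranks = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2]]
aligned = align_estimates(ranks)
```

After alignment every rank holds the same per-op estimates, removing the divergence risk the config comment describes.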