Mirror of https://github.com/pytorch/pytorch.git (synced 2025-10-20 21:14:14 +08:00)
Summary:
This diff modifies the elastic agent's API to pass the event log handler to the record function calls. This lets the elastic agent log events to a specific destination, which improves monitoring and debugging of distributed training jobs.

Test Plan:
Ran unit tests and an end-to-end (e2e) training job.

Differential Revision: D75194115

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155457
Approved by: https://github.com/d4l3k
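The commit message does not include the code change itself. The sketch below illustrates the general pattern it describes: an agent stores the name of an event log handler and passes it explicitly to every record call instead of relying on a default destination. It is self-contained and does not import PyTorch. Every name in it (`SketchAgent`, `HANDLERS`, `ListHandler`, `Event`, `record`, and the `event_log_handler` constructor parameter) is a hypothetical stand-in and is not the actual `torch.distributed.elastic` API.

```python
import logging
from dataclasses import dataclass, field
from typing import Dict, List


class ListHandler(logging.Handler):
    """Collects emitted messages in memory (a stand-in destination for testing)."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


# Hypothetical registry mapping destination names to logging handlers.
HANDLERS: Dict[str, logging.Handler] = {
    "null": logging.NullHandler(),
    "memory": ListHandler(),
}


@dataclass
class Event:
    """Hypothetical, minimal event: a name plus string metadata."""

    name: str
    metadata: Dict[str, str] = field(default_factory=dict)


def record(event: Event, destination: str = "null") -> None:
    """Emit `event` through the handler registered under `destination`."""
    logger = logging.getLogger(f"sketch_events.{destination}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep events out of the root logger
    handler = HANDLERS[destination]
    if handler not in logger.handlers:  # attach each handler only once
        logger.addHandler(handler)
    logger.info("%s %s", event.name, event.metadata)


class SketchAgent:
    """Hypothetical agent that forwards its configured destination to record()."""

    def __init__(self, event_log_handler: str = "null") -> None:
        self._event_log_handler = event_log_handler

    def _record_event(self, name: str, **metadata: str) -> None:
        # Pass the destination explicitly rather than using record()'s default.
        record(Event(name, dict(metadata)), self._event_log_handler)

    def run(self) -> None:
        self._record_event("agent_start", state="INIT")
        self._record_event("agent_stop", state="SUCCEEDED")
```

For example, `SketchAgent(event_log_handler="memory").run()` routes both events into the in-memory handler, while `SketchAgent().run()` falls back to the `"null"` destination and discards them. Injecting the destination at construction time keeps the choice of log sink in the caller's hands without changing the record call sites.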