pytorch/_thread_safe_fork.py at 57ba5752423249dd659e76e4d5a3d7b893edc85a - pytorch - Gitea: Git for Me

frozenleaves/pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Files

Sam Larsen 8bb8c3997b [inductor] parallel compile: add import of thread_safe_fork for internal (#137155 )

Summary: We had a report of crashes in parallel compile subprocesses linked to reading justknobs. See https://fburl.com/workplace/14a4mcbh internally. This is a known issue with justknobs. It looks like we don't have a lot of control over evaluating knobs. Some are read in inductor (`"pytorch/remote_cache:autotune_memcache_version`), but many are read by the triton compiler. According to this advice https://fburl.com/workplace/imx9lsx3, we can import thread_safe_fork which installs some functionality to destroy some singletons before forking and re-enable them after. This apporach works for the failing workload.

Test Plan: See D63719673 where the reporting user was kind enough to provide us with a local repro. Without the relevant import, we can reproduce the crash. With the import, the training runs successfully to completion.

Differential Revision: D63736829

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137155
Approved by: https://github.com/xmfan, https://github.com/eellison

2024-10-03 17:37:21 +00:00

0 lines

0 B

Python

Raw Blame History

The file is empty.