Changes:
1. Bump `ruff` from 0.7.4 to 0.8.4
2. Change `%`-formatted strings to f-strings
3. Change arguments with the `__` prefix to positional-only arguments, using the `/` separator in function signatures.
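As a rough illustration of items 2 and 3, the sketch below shows the same function before and after. The function names and message are hypothetical and are not taken from the PR.

```python
# Hypothetical example; not code from the PR.


def describe_old(__name, __code):
    # Old style: the "__" prefix marks parameters as positional-only by
    # convention (honored by type checkers, not enforced at runtime),
    # and the message is built with %-formatting.
    return "process %s exited with code %d" % (__name, __code)


def describe_new(name, code, /):
    # New style: "/" makes the parameters positional-only at runtime
    # (Python 3.8+), and the message is an f-string.
    return f"process {name} exited with code {code}"
```

Both functions return the same string; only `describe_new` rejects keyword use such as `describe_new(name="w0", code=1)` with a `TypeError`.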
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143753
Approved by: https://github.com/Skylion007
When one process fails, the others are immediately killed. This prevents the other processes from performing necessary cleanup or dumping debug information (in particular, the NCCL flight recorder).
This PR adds a grace period. Default behavior is unchanged.
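A minimal sketch of the idea, using `subprocess` from the standard library rather than the PyTorch code itself. The helper name `reap_peers` and its `grace_period` parameter are assumptions made for illustration; with `grace_period=None` the peers are killed immediately, which mirrors the unchanged default.

```python
import subprocess
import sys
import time


def reap_peers(peers, grace_period=None):
    """Stop the surviving peers after one process has failed.

    Hypothetical helper: if grace_period (seconds) is given, peers get that
    long to finish cleanup and exit on their own before being killed.
    """
    if grace_period is not None:
        deadline = time.monotonic() + grace_period
        for p in peers:
            remaining = max(0.0, deadline - time.monotonic())
            try:
                p.wait(timeout=remaining)
            except subprocess.TimeoutExpired:
                pass  # still running; it is killed below
    for p in peers:
        if p.poll() is None:
            p.kill()
        p.wait()
    return [p.returncode for p in peers]
```

A peer that finishes its cleanup within the grace period exits with its own status; one that does not is killed, just as it would be without the grace period.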
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131278
Approved by: https://github.com/albanD
Summary:
This fixes the PyTorch issue https://github.com/pytorch/pytorch/issues/133010.
One way to fix this problem is to start processes in parallel in mp.start_processes.
What else is in the diff:
Refactored the api_test test case, which was repeating many tests because of inheritance.
Added a unit test for forkserver when parallel start is on.
Test Plan: Added unit tests
Differential Revision: D61878552
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134629
Approved by: https://github.com/d4l3k
Summary:
This file is currently failing with
```
file /home/gaoxiang/pytorch-tf32/test/test_multiprocessing_spawn.py, line 13
def test_success_func(i):
E fixture 'i' not found
> available fixtures: cache, capfd, capfdbinary, caplog, capsys, capsysbinary, doctest_namespace, include_metadata_in_junit_xml, json_metadata, metadata, monkeypatch, pytestconfig, record_property, record_testsuite_property, record_xml_attribute, recwarn, tmp_path, tmp_path_factory, tmpdir, tmpdir_factory
> use 'pytest --fixtures [testpath]' for help on them.
/home/gaoxiang/pytorch-tf32/test/test_multiprocessing_spawn.py:13
________________________________________________________________________________________________________________ ERROR at setup of test_success_single_arg_func ________________________________________________________________________________________________________________
file /home/gaoxiang/pytorch-tf32/test/test_multiprocessing_spawn.py, line 17
def test_success_single_arg_func(i, arg):
E fixture 'i' not found
> available fixtures: cache, capfd, capfdbinary, caplog, capsys, capsysbinary, doctest_namespace, include_metadata_in_junit_xml, json_metadata, metadata, monkeypatch, pytestconfig, record_property, record_testsuite_property, record_xml_attribute, recwarn, tmp_path, tmp_path_factory, tmpdir, tmpdir_factory
> use 'pytest --fixtures [testpath]' for help on them.
/home/gaoxiang/pytorch-tf32/test/test_multiprocessing_spawn.py:17
_________________________________________________________________________________________________________________ ERROR at setup of test_exception_single_func _________________________________________________________________________________________________________________
file /home/gaoxiang/pytorch-tf32/test/test_multiprocessing_spawn.py, line 22
def test_exception_single_func(i, arg):
E fixture 'i' not found
> available fixtures: cache, capfd, capfdbinary, caplog, capsys, capsysbinary, doctest_namespace, include_metadata_in_junit_xml, json_metadata, metadata, monkeypatch, pytestconfig, record_property, record_testsuite_property, record_xml_attribute, recwarn, tmp_path, tmp_path_factory, tmpdir, tmpdir_factory
> use 'pytest --fixtures [testpath]' for help on them.
/home/gaoxiang/pytorch-tf32/test/test_multiprocessing_spawn.py:22
__________________________________________________________________________________________________________________ ERROR at setup of test_exception_all_func ___________________________________________________________________________________________________________________
file /home/gaoxiang/pytorch-tf32/test/test_multiprocessing_spawn.py, line 28
def test_exception_all_func(i):
E fixture 'i' not found
> available fixtures: cache, capfd, capfdbinary, caplog, capsys, capsysbinary, doctest_namespace, include_metadata_in_junit_xml, json_metadata, metadata, monkeypatch, pytestconfig, record_property, record_testsuite_property, record_xml_attribute, recwarn, tmp_path, tmp_path_factory, tmpdir, tmpdir_factory
> use 'pytest --fixtures [testpath]' for help on them.
/home/gaoxiang/pytorch-tf32/test/test_multiprocessing_spawn.py:28
_________________________________________________________________________________________________________________ ERROR at setup of test_terminate_signal_func _________________________________________________________________________________________________________________
file /home/gaoxiang/pytorch-tf32/test/test_multiprocessing_spawn.py, line 33
def test_terminate_signal_func(i):
E fixture 'i' not found
> available fixtures: cache, capfd, capfdbinary, caplog, capsys, capsysbinary, doctest_namespace, include_metadata_in_junit_xml, json_metadata, metadata, monkeypatch, pytestconfig, record_property, record_testsuite_property, record_xml_attribute, recwarn, tmp_path, tmp_path_factory, tmpdir, tmpdir_factory
> use 'pytest --fixtures [testpath]' for help on them.
/home/gaoxiang/pytorch-tf32/test/test_multiprocessing_spawn.py:33
__________________________________________________________________________________________________________________ ERROR at setup of test_terminate_exit_func __________________________________________________________________________________________________________________
file /home/gaoxiang/pytorch-tf32/test/test_multiprocessing_spawn.py, line 39
def test_terminate_exit_func(i, arg):
E fixture 'i' not found
> available fixtures: cache, capfd, capfdbinary, caplog, capsys, capsysbinary, doctest_namespace, include_metadata_in_junit_xml, json_metadata, metadata, monkeypatch, pytestconfig, record_property, record_testsuite_property, record_xml_attribute, recwarn, tmp_path, tmp_path_factory, tmpdir, tmpdir_factory
> use 'pytest --fixtures [testpath]' for help on them.
/home/gaoxiang/pytorch-tf32/test/test_multiprocessing_spawn.py:39
___________________________________________________________________________________________________________ ERROR at setup of test_success_first_then_exception_func ___________________________________________________________________________________________________________
file /home/gaoxiang/pytorch-tf32/test/test_multiprocessing_spawn.py, line 45
def test_success_first_then_exception_func(i, arg):
E fixture 'i' not found
> available fixtures: cache, capfd, capfdbinary, caplog, capsys, capsysbinary, doctest_namespace, include_metadata_in_junit_xml, json_metadata, metadata, monkeypatch, pytestconfig, record_property, record_testsuite_property, record_xml_attribute, recwarn, tmp_path, tmp_path_factory, tmpdir, tmpdir_factory
> use 'pytest --fixtures [testpath]' for help on them.
/home/gaoxiang/pytorch-tf32/test/test_multiprocessing_spawn.py:45
___________________________________________________________________________________________________________________ ERROR at setup of test_nested_child_body ___________________________________________________________________________________________________________________
file /home/gaoxiang/pytorch-tf32/test/test_multiprocessing_spawn.py, line 52
def test_nested_child_body(i, ready_queue, nested_child_sleep):
E fixture 'i' not found
> available fixtures: cache, capfd, capfdbinary, caplog, capsys, capsysbinary, doctest_namespace, include_metadata_in_junit_xml, json_metadata, metadata, monkeypatch, pytestconfig, record_property, record_testsuite_property, record_xml_attribute, recwarn, tmp_path, tmp_path_factory, tmpdir, tmpdir_factory
> use 'pytest --fixtures [testpath]' for help on them.
/home/gaoxiang/pytorch-tf32/test/test_multiprocessing_spawn.py:52
_____________________________________________________________________________________________________________________ ERROR at setup of test_infinite_task _____________________________________________________________________________________________________________________
file /home/gaoxiang/pytorch-tf32/test/test_multiprocessing_spawn.py, line 57
def test_infinite_task(i):
E fixture 'i' not found
> available fixtures: cache, capfd, capfdbinary, caplog, capsys, capsysbinary, doctest_namespace, include_metadata_in_junit_xml, json_metadata, metadata, monkeypatch, pytestconfig, record_property, record_testsuite_property, record_xml_attribute, recwarn, tmp_path, tmp_path_factory, tmpdir, tmpdir_factory
> use 'pytest --fixtures [testpath]' for help on them.
/home/gaoxiang/pytorch-tf32/test/test_multiprocessing_spawn.py:57
_____________________________________________________________________________________________________________________ ERROR at setup of test_process_exit ______________________________________________________________________________________________________________________
file /home/gaoxiang/pytorch-tf32/test/test_multiprocessing_spawn.py, line 62
def test_process_exit(idx):
E fixture 'idx' not found
> available fixtures: cache, capfd, capfdbinary, caplog, capsys, capsysbinary, doctest_namespace, include_metadata_in_junit_xml, json_metadata, metadata, monkeypatch, pytestconfig, record_property, record_testsuite_property, record_xml_attribute, recwarn, tmp_path, tmp_path_factory, tmpdir, tmpdir_factory
> use 'pytest --fixtures [testpath]' for help on them.
/home/gaoxiang/pytorch-tf32/test/test_multiprocessing_spawn.py:62
________________________________________________________________________________________________________________________ ERROR at setup of test_nested _________________________________________________________________________________________________________________________
file /home/gaoxiang/pytorch-tf32/test/test_multiprocessing_spawn.py, line 66
def test_nested(i, pids_queue, nested_child_sleep, start_method):
E fixture 'i' not found
> available fixtures: cache, capfd, capfdbinary, caplog, capsys, capsysbinary, doctest_namespace, include_metadata_in_junit_xml, json_metadata, metadata, monkeypatch, pytestconfig, record_property, record_testsuite_property, record_xml_attribute, recwarn, tmp_path, tmp_path_factory, tmpdir, tmpdir_factory
> use 'pytest --fixtures [testpath]' for help on them.
```
when running with pytest. This is because pytest considers anything starting with `test_` as a test, so I renamed these helper functions to `_test_...` to prevent them from being collected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50408
Reviewed By: bdhirsh
Differential Revision: D34118341
Pulled By: VitalyFedyunin
fbshipit-source-id: 7c74843462b79df351e3c60f313ef388a9e0df4e
(cherry picked from commit fd8b66bea0e2c182db0c77cb0c516822559b3cc1)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45174
Introduce different types of exceptions that map to different failures
of torch.multiprocessing.spawn. The change introduces three different exception types, including:
ProcessRaisedException - occurs when the process initiated by spawn raises an exception
ProcessExitedException - occurs when the process initiated by spawn exits
These exception types allow frameworks that use mp.spawn to categorize failures.
This can be helpful for tracking metrics and enhancing logs.
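A sketch of how a framework might categorize failures by exception type. The two classes below are local placeholders that only reuse the names from this change so the example is self-contained; the real classes come from PyTorch, and their constructor arguments and attributes are not shown here. The `categorize` function and its labels are hypothetical.

```python
class ProcessRaisedException(Exception):
    """Placeholder stand-in: the spawned process raised an exception."""


class ProcessExitedException(Exception):
    """Placeholder stand-in: the spawned process exited."""


def categorize(exc):
    # Map an exception from a spawn-like call to a metric label.
    if isinstance(exc, ProcessRaisedException):
        return "child_raised"
    if isinstance(exc, ProcessExitedException):
        return "child_exited"
    return "other"
```

A caller would typically wrap the spawn call in `try`/`except Exception as exc` and log or count `categorize(exc)`.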
Test Plan: Imported from OSS
Reviewed By: taohe
Differential Revision: D23889400
Pulled By: tierex
fbshipit-source-id: 8849624c616230a6a81158c52ce0c18beb437330
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35615
Python 2 has reached end-of-life and is no longer supported by PyTorch.
Now we can clean up a lot of cruft that we put in place to support it.
These changes were all done manually, and I skipped anything that seemed
like it would take more than a few seconds, so I think it makes sense to
review it manually as well (though using side-by-side view and ignoring
whitespace change might be helpful).
Test Plan: CI
Differential Revision: D20842886
Pulled By: dreiss
fbshipit-source-id: 8cad4e87c45895e7ce3938a88e61157a79504aed
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30445
Create distributed and rpc directories under caffe/test for better management
of unit tests.
Differential Revision: D18702786
fbshipit-source-id: e9daeed0cfb846ef68806f6decfcb57c0e0e3606
Summary:
If torch.multiprocessing.spawn is used to launch non-daemonic
processes (the default since #14391), the spawned children won't be
automatically terminated when the parent terminates.
On Linux, we can address this by setting PR_SET_PDEATHSIG, which
delivers a configurable signal to child processes when their parent
terminates.
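For illustration, the same mechanism can be reached from Python through `ctypes`. This is a Linux-only sketch, not the implementation in this change; the helper names are hypothetical, and the constants are the values from `<linux/prctl.h>`.

```python
import ctypes
import os
import signal
import sys

PR_SET_PDEATHSIG = 1  # value from <linux/prctl.h>
PR_GET_PDEATHSIG = 2  # value from <linux/prctl.h>


def _prctl(option, arg2):
    # Thin wrapper around prctl(2); raises OSError on failure.
    if not sys.platform.startswith("linux"):
        raise OSError("prctl is only available on Linux")
    libc = ctypes.CDLL(None, use_errno=True)
    zero = ctypes.c_ulong(0)
    if libc.prctl(ctypes.c_int(option), arg2, zero, zero, zero) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def set_parent_death_signal(sig=signal.SIGTERM):
    # Ask the kernel to deliver `sig` to this process when its parent dies.
    # Passing 0 clears the setting.
    _prctl(PR_SET_PDEATHSIG, ctypes.c_ulong(int(sig)))


def get_parent_death_signal():
    # Read back the currently configured parent-death signal (0 if unset).
    value = ctypes.c_int(0)
    _prctl(PR_GET_PDEATHSIG, ctypes.byref(value))
    return value.value
```

A child would call `set_parent_death_signal()` early in its startup, before doing any real work.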
Fixes #14394.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14491
Differential Revision: D13270374
Pulled By: pietern
fbshipit-source-id: 092c9d3c3cea2622c3766b467957bc27a1bd500c
Summary:
This helper addresses a common pattern where one spawns N processes to
work on some common task (e.g. parallel preprocessing or multiple
training loops).
A straightforward approach is to use the multiprocessing API directly
and then consecutively call join on the resulting processes.
This pattern breaks down in the face of errors. If one of the
processes terminates with an exception or via some signal, and it is
not the first process that was launched, the join call on the first
process won't be affected. This helper seeks to solve this by waiting
on termination from any of the spawned processes. When any process
terminates with a non-zero exit status, it terminates the remaining
processes, and raises an exception in the parent process. If the
process terminated with an exception, it is propagated to the parent.
If the process terminated via a signal (e.g. SIGINT, SIGSEGV), this is
mentioned in the exception as well.
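The waiting strategy can be sketched with the standard library alone. This is not the helper added here: `join_any`, `_worker`, and `run_demo` are hypothetical names, and the sketch assumes a POSIX system where the "fork" start method is available.

```python
import multiprocessing as mp
import sys
import time
from multiprocessing.connection import wait


def _worker(rank, fail_rank):
    # The failing rank exits quickly with a non-zero status;
    # the other ranks would keep running for a long time.
    if rank == fail_rank:
        sys.exit(3)
    time.sleep(30)


def join_any(procs):
    """Wait on all processes at once instead of joining them in order.

    Returns None if every process exits with status 0. Otherwise terminates
    the remaining processes and returns (index, exitcode) of the failure.
    """
    pending = {p.sentinel: i for i, p in enumerate(procs)}
    while pending:
        # wait() returns as soon as any sentinel is ready, i.e. any process exits.
        for sentinel in wait(list(pending)):
            index = pending.pop(sentinel)
            procs[index].join()
            code = procs[index].exitcode  # negative means killed by a signal
            if code != 0:
                for p in procs:
                    if p.is_alive():
                        p.terminate()
                for p in procs:
                    p.join()
                return index, code
    return None


def run_demo(nprocs=3, fail_rank=2):
    ctx = mp.get_context("fork")  # assumption: POSIX only
    procs = [ctx.Process(target=_worker, args=(r, fail_rank)) for r in range(nprocs)]
    for p in procs:
        p.start()
    return join_any(procs)
```

With sequential `join` calls, the failure of rank 2 would go unnoticed until ranks 0 and 1 finished; here `run_demo()` returns as soon as rank 2 exits.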
Requires Python >= 3.4.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13518
Reviewed By: orionr
Differential Revision: D12929045
Pulled By: pietern
fbshipit-source-id: 00df19fa16a568d1e22f37a2ba65677ab0cce3fd