mirror of
https://github.com/pytorch/pytorch.git
synced 2025-11-02 22:51:51 +08:00
Summary: Part of an effort to extract some important error logs (e.g. [#157996](https://github.com/pytorch/pytorch/pull/157996)) that was `tee`'ed to `stdout` and `stderr`. The general idea is to: - Duplicate the `tee`s on `stdout` and `stderr` to a separate file, `filtered_stdout.log` and `filtered_stderr.log`, respectively. - In these files, as its name suggests, only log lines matching a customizable filter. - Later on in another PR, append the contents of these files to the reply file. Outline of changes in this PR: - Enhance `TailLog` to be able to 1) stream to a file, and 2) only write when the line matches the passed filter. - Add `filtered_stdout` and `filtered_stderr` to `LogsDest` and have `LogsSpecs` `reify` them. - In `start_processes()` and `PContext`, add params `duplicate_stdout_filters` and `duplicate_stderr_filters` to filter and write the duplicated stream to the files above. When no filters are passed in, no duplicated streams are created. Test Plan: ``` $ buck test 'fbcode//mode/opt' caffe2/test/distributed/elastic/multiprocessing:api_test ``` ``` Buck UI: https://www.internalfb.com/buck2/f5c6b7da-217d-4a0b-872a-c7cd3d05587f Test UI: https://www.internalfb.com/intern/testinfra/testrun/4222124951617688 Network: Up: 398B Down: 44MiB (reSessionID-a489a961-b602-45be-b851-3490ebb7a26a) Analyzing targets. Remaining 0/200 Executing actions. Remaining 0/12856 0.1s exec time total Command: test. Finished 1 local Time elapsed: 17:37.9s Tests finished: Pass 52. Fail 0. Fatal 0. Skip 0. Build failure 0 ``` ``` $ buck test 'fbcode//mode/opt' caffe2/test/distributed/elastic/multiprocessing:tail_log_test ``` ``` Buck UI: https://www.internalfb.com/buck2/d6d5c1c1-db98-4d9c-b608-7ba6fbb5e3ee Test UI: https://www.internalfb.com/intern/testinfra/testrun/13510798985149262 Network: Up: 94KiB Down: 417MiB (reSessionID-27b46fba-d31c-4c04-8ede-a506454e6922) Analyzing targets. Remaining 0/3 536 actions, 555 artifacts declared Executing actions. Remaining 0/186 1:05.5s exec time total Command: test. Finished 7 local, 1 remote, 115 cache (93% hit) 37.0s exec time cached (56%) Time elapsed: 1:11.5s Tests finished: Pass 7. Fail 0. Fatal 0. Skip 0. Build failure 0 ``` Rollback Plan: Differential Revision: D80188995 Pull Request resolved: https://github.com/pytorch/pytorch/pull/160712 Approved by: https://github.com/fduwjj
29 lines
851 B
Python
Executable File
29 lines
851 B
Python
Executable File
#!/usr/bin/env python3
|
|
|
|
# Copyright (c) Facebook, Inc. and its affiliates.
|
|
# All rights reserved.
|
|
#
|
|
# This source code is licensed under the BSD-style license found in the
|
|
# LICENSE file in the root directory of this source tree.
|
|
|
|
import argparse
|
|
import os
|
|
import sys
|
|
|
|
|
|
if __name__ == "__main__":
|
|
parser = argparse.ArgumentParser(description="test binary, exits with exitcode")
|
|
parser.add_argument("--exitcode", type=int, default=0)
|
|
parser.add_argument("msg", type=str)
|
|
args = parser.parse_args()
|
|
|
|
rank = int(os.environ["RANK"])
|
|
exitcode = args.exitcode
|
|
if exitcode != 0:
|
|
print(f"exit {exitcode} from {rank}", file=sys.stderr)
|
|
sys.exit(exitcode)
|
|
else:
|
|
for msg in args.msg.split(","):
|
|
print(f"{msg} stdout from {rank}")
|
|
print(f"{msg} stderr from {rank}", file=sys.stderr)
|