Commit Graph

16 Commits

Author SHA1 Message Date
3255e7872b Enable all flake8-logging-format rules (#164655)
These rules are enabled by removing existing suppressions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164655
Approved by: https://github.com/janeyx99, https://github.com/mlazos
2025-10-19 00:59:28 +00:00
7457d139c5 Add pyrefly suppressions to torch/distributed (7/n) (#165002)
Adds suppressions to pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283

One more PR after this one.

Test plan:
dmypy restart && python3 scripts/lintrunner.py -a
pyrefly check

step 1: delete lines in the pyrefly.toml file from the project-excludes field
step 2: run pyrefly check
step 3: add suppressions, clean up unused suppressions
before: https://gist.github.com/maggiemoss/4b3bf2037014e116bc00706a16aef199

after:
INFO 0 errors (6,884 ignored)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165002
Approved by: https://github.com/oulgen
2025-10-09 04:08:25 +00:00
3443627e07 Revert "[BE]: Enable RUFF TRY400 rule - log.exception (#153473)"
This reverts commit 4f4ecc583e0f48ad2d062a53bf91c61ab40b4948.

Reverted https://github.com/pytorch/pytorch/pull/153473 on behalf of https://github.com/jeanschmidt due to seems to have broken internal signals, @albanD may I count on you to help the author merge his PR? D74837988 ([comment](https://github.com/pytorch/pytorch/pull/153473#issuecomment-2886017075))
2025-05-16 08:29:26 +00:00
4f4ecc583e [BE]: Enable RUFF TRY400 rule - log.exception (#153473)
Change logging.error to logging.exception to log additional information when relevant.  A few places have slipped in logging.errors in try except since I last did a clean up here and the rule is stabilized so I am enabling it codebase wide. I have NOQA'd much of our custom exception stack trace handling for RPC calls and distributed and tried to a fix a few errors based on whether we immediately reraised it or if we didn't print any exception handling where it could be useful.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153473
Approved by: https://github.com/albanD, https://github.com/cyyever
2025-05-15 13:36:59 +00:00
316808e4e9 PEP585 update - torch/distributed/elastic torch/distributed/checkpoint (#145163)
See #145101 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145163
Approved by: https://github.com/Skylion007
2025-01-19 20:55:59 +00:00
e6d4451ae8 [BE][Easy] enable UFMT for torch/distributed/{algorithms,autograd,benchmarks,checkpoint,elastic}/ (#128866)
Part of #123062

- #123062

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128866
Approved by: https://github.com/fegin
2024-06-18 13:51:53 +00:00
3a0d088517 Flip default value for mypy disallow_untyped_defs [5/11] (#127842)
See #127836 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127842
Approved by: https://github.com/oulgen
2024-06-08 18:49:18 +00:00
1dd42e42c4 [BE]: Try TCH autofixes on torch/ (#125536)
Tries TCH autofixes and see what breaks

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125536
Approved by: https://github.com/ezyang
2024-05-05 23:13:59 +00:00
b6201a60c5 [BE] minor logging cleanup in distributed (#122921)
Summary:
    Minor logging cleanup in distributed library
    1. Don't use "f" formatted strings - address linter issues.
    2. Nits: Make use of unused `e` (error) in a few logs.
    3. Change info->debug as asked in issue #113545
    4. Nit: rename log -> logger in a few files for consistency
    5. Fix a linter error.

    Test Plan:
    1. Local build passes.
    2. Linter is happy.

    Reviewers: wanchaol

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122921
Approved by: https://github.com/wanchaol
2024-03-29 03:34:01 +00:00
51fb99250b Fix missing MAST log when there is Unicode non-decodable text in logs (#119298)
Summary:
## Issue
When there is Unicode non-decodable text in logs, `tail_logger` will stop working afterwards, i.e. f527390102

In the example, the process stopped producing Python logs after 17:20:21 untill the job finished
```
[0]:I0201 17:20:21.338000 3429 gen_ai/genie_projects/llm/metaformers/reward_model_score.py:335] Progress: 118 batches out of 512 total batches. 23.05 % | (gpu mem: 25.8GB, free CPU mem: 1387.8GB)
I0201 17:39:14 Stopping twtask-main.service with Service Result: [success] Exit Code: [exited] Exit Status: [0]
```
At the end, `UnicodeDecodeError` was thrown at the end with no call stack.

## Fix
Use `errors="replace"` to avoid throwing exception when `UnicodeDecodeError` happens.

Test Plan: f528854819

Differential Revision: D53483644

Co-authored-by: Jack Zhang <jackzh@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119298
Approved by: https://github.com/XilunWu
2024-02-07 19:25:43 +00:00
91973e1c31 Issue113185 (#113523)
Fixes #113185

I have fixed the given docstring errors. The followings are the outputs with numbers before and after the changes:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113523
Approved by: https://github.com/kit1980
2023-11-14 22:25:28 +00:00
bae8506589 [TorchElastic] Add option to configure log prefix for each rank (#112357)
Summary:
Add an ability to customize log lines and addtional template like behavior to enrich log information.

Motivation:
a) Log stream processing/aggregation gains additional value when it includes information about the global rank. Extension to that is that it will be easier to map ranks to hosts from log stream information (less relevant at the moment)
b) Users can easily map the failure to the right rank without matching node rank offset+local rank.

Implementation
- BC change - keeps the logs line prefix as `[<role name><local rank>]:`
- Optional env variable TORCHELASTIC_LOG_LINE_HEADER that will be used as a prefix when specified and currently exposes `role_name`, `rank` and `local_rank` variables that will be bound when agent assigns the ranks.

Test Plan:
CI

https://fburl.com/mlhub/mzx5xspv

Differential Revision: D50584590

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112357
Approved by: https://github.com/kiukchung
2023-11-08 01:00:26 +00:00
232b96b6e2 [BE] Enable ruff's UP rules and autoformat distributed/ (#105433)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105433
Approved by: https://github.com/albanD
2023-07-19 14:27:11 +00:00
b8b840be3d Convert logging f-strings to use % format, part five (#98765)
This does some annoying but simple cases by hand.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98765
Approved by: https://github.com/wanchaol
2023-04-11 13:17:59 +00:00
cf2f552cd8 Add __all__ to torch.{fx, distributed, backends} submodules (#85079)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85079
Approved by: https://github.com/rohan-varma
2022-09-20 12:51:08 +00:00
b03c92a9c5 [2/n][torch/elastic][upstream] Move torchelastic/timer torchelastic/multiprocessing to torch/distributed/elastic (#53574)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53574

Upstreams `torchelastic/timer|multiprocessing` to `torch/distributed/elastic/timer|multiprocessing`

Test Plan:
```
buck test mode/dev-nosan //caffe2/torch/distributed/elastic/...
buck test mode/dev-nosan //caffe2/test/distributed/elastic/...
buck test mode/dev-nosan //pytorch/elastic/torchelastic/...
buck test mode/dev-nosan //hpc/...
buck test mode/dev-nosan //caffe2/torch/fb/training_toolkit/...
```

Reviewed By: borovsky-d, wilson100hong

Differential Revision: D26899809

fbshipit-source-id: e6dbc2a78282eac296c262b3206a979e3ef1ff53
2021-03-10 12:32:53 -08:00