19 Commits

Author SHA1 Message Date
beb4d7816d [BE]: ruff PLC0207 - use maxsplit kwarg (#160107)
Automatically replaces split with rsplit when relevant and only performs the split up to the first ( or last value). This allows early return of the split function and improve efficiency.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160107
Approved by: https://github.com/albanD
2025-08-08 03:14:59 +00:00
3443627e07 Revert "[BE]: Enable RUFF TRY400 rule - log.exception (#153473)"
This reverts commit 4f4ecc583e0f48ad2d062a53bf91c61ab40b4948.

Reverted https://github.com/pytorch/pytorch/pull/153473 on behalf of https://github.com/jeanschmidt due to seems to have broken internal signals, @albanD may I count on you to help the author merge his PR? D74837988 ([comment](https://github.com/pytorch/pytorch/pull/153473#issuecomment-2886017075))
2025-05-16 08:29:26 +00:00
4f4ecc583e [BE]: Enable RUFF TRY400 rule - log.exception (#153473)
Change logging.error to logging.exception to log additional information when relevant.  A few places have slipped in logging.errors in try except since I last did a clean up here and the rule is stabilized so I am enabling it codebase wide. I have NOQA'd much of our custom exception stack trace handling for RPC calls and distributed and tried to a fix a few errors based on whether we immediately reraised it or if we didn't print any exception handling where it could be useful.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153473
Approved by: https://github.com/albanD, https://github.com/cyyever
2025-05-15 13:36:59 +00:00
50657120a0 Allow workflows to opt-out of experiments (#153085)
This change adds support to allow workflows to opt-out of experiments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153085
Approved by: https://github.com/ZainRizvi

Co-authored-by: Zain Rizvi <ZainRizvi@users.noreply.github.com>
2025-05-09 16:34:46 +00:00
60f98262f1 PEP585: .github (#145707)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145707
Approved by: https://github.com/huydhn
2025-01-27 21:21:01 +00:00
498a7808ff Fix unused Python variables outside torch/ and test/ (#136359)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136359
Approved by: https://github.com/albanD
2024-12-11 17:10:23 +00:00
b69282c98c Enable opting out of experiments even when they're being rolled out (#140433)
Enables opting out of specific experiments in the runner determinator

To opt out:
1. Go to the tracking issue: https://github.com/pytorch/test-infra/issues/5132
2. In the entry by your name, enter the experiment name, prefixed with a `-`.  For example, to opt out of the LF fleet you could enter `@ZainRIzvi,-lf`

This lets you simultaneously be opted into some experiments and opted out of others.

While the `disable-runner-experiments` label offers an option to disable all experiments on a given PR, this one lets you disable a selected set of experiments across all your PRs.

Fixes https://github.com/pytorch/pytorch/issues/138099

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140433
Approved by: https://github.com/zxiiro, https://github.com/jeanschmidt
2024-11-14 19:18:24 +00:00
09ba38c4b7 Add an opt-out label to runner determinator on PR (#140054)
My sales pitch:  I need to ssh into the runner from time to time on my PR to debug issues, but it's well-known that LF runners don't support SSH login anymore.  So, the propose fix here is to introduce a new label called ~no-runner-determinator~ `no-runner-experiments` that can be attached to the PR.  Whenever `.github/scripts/runner_determinator.py` runs on a PR and sees this label, it will not apply any logic and just straight up use an empty prefix.

### Testing

With the label:

```
python3 runner_determinator.py \
    --github-token "MY_TOKEN" \
    --github-issue "5132" \
    --github-branch "install-torchao-torchtune-et" \
    --github-actor "huydhn" \
    --github-issue-owner "huydhn" \
    --github-ref-type "branch" \
    --github-repo "pytorch/pytorch" \
    --eligible-experiments "" \
    --pr-number "139947"

INFO    : Opt-out runner determinator because #139947 has no-runner-determinator label
WARNING : No env var found for GITHUB_OUTPUT, you must be running this code locally. Falling back to the deprecated print method.
::set-output name=label-type::
```

Without the label:

```
python3 runner_determinator.py \
    --github-token "MY_TOKEN" \
    --github-issue "5132" \
    --github-branch "install-torchao-torchtune-et" \
    --github-actor "huydhn" \
    --github-issue-owner "huydhn" \
    --github-ref-type "branch" \
    --github-repo "pytorch/pytorch" \
    --eligible-experiments "" \
    --pr-number "139947"

INFO    : Based on rollout percentage of 95%, enabling experiment lf.
INFO    : Skipping experiment 'awsa100', as it is not a default experiment
WARNING : No env var found for GITHUB_OUTPUT, you must be running this code locally. Falling back to the deprecated print method.
::set-output name=label-type::lf.
```

Running in trunk commit without a PR number will use the regular logic:

```
python3 runner_determinator.py \
    --github-token "MY_TOKEN" \
    --github-issue "5132" \
    --github-branch "install-torchao-torchtune-et" \
    --github-actor "huydhn" \
    --github-issue-owner "huydhn" \
    --github-ref-type "branch" \
    --github-repo "pytorch/pytorch" \
    --eligible-experiments "" \
    --pr-number ""

INFO    : Based on rollout percentage of 95%, enabling experiment lf.
INFO    : Skipping experiment 'awsa100', as it is not a default experiment
WARNING : No env var found for GITHUB_OUTPUT, you must be running this code locally. Falling back to the deprecated print method.
::set-output name=label-type::lf.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140054
Approved by: https://github.com/malfet, https://github.com/ZainRizvi
2024-11-07 22:55:27 +00:00
2cb983ab97 [CI] Adds support for selecting experiments for workflows on runner determinator (#137614)
adds a `default` tag to experiment configurations, allowing to remove some experiments by default on the random draw:

```
        experiments:
            lf:
                rollout_perc: 25
            otherExp:
                rollout_perc: 25
                default: false
        ---
```

and includes the configuration to filter what experiments are of interest for a particular workflow (comma separated):

```
  get-test-label-type:
    name: get-test-label-type
    uses: ./.github/workflows/_runner-determinator.yml
    with:
      ...
      check_experiments: "awsa100"
```

The end goal, is to enable us to run multiple experiments, that are independent from one another. For example, while we still runs the LF infra experiment, we want to migrate other runners leveraging the current solution. A immediate UC is for the A100 instances, where we want to migrate to AWS.

Those new instances will during the migration period be labeled both `awsa100.linux.gcp.a100` and `linux.aws.a100`. Once the experiment ends, we will remove the first confusing one.

```
jobs:
  get-build-label-type:
    name: get-build-label-type
    uses: ./.github/workflows/_runner-determinator.yml
    with:
      ...

  get-test-label-type:
    name: get-test-label-type
    uses: ./.github/workflows/_runner-determinator.yml
    with:
      ...
      check_experiments: "awsa100"

  linux-focal-cuda12_1-py3_10-gcc9-inductor-build:
    name: cuda12.1-py3.10-gcc9-sm80
    uses: ./.github/workflows/_linux-build.yml
    needs:
      - get-build-label-type
      - get-test-label-type
    with:
      runner_prefix: "${{ needs.get-build-label-type.outputs.label-type }}"
      ...
      test-matrix: |
        { include: [
          { config: "inductor_huggingface_perf_compare", shard: 1, num_shards: 1, runner: "${{ needs.get-test-label-type.outputs.label-type }}linux.gcp.a100" },
          ...
        ]}
      ...
```

```
experiments:
    lf:
        rollout_perc: 50
    awsa100:
        rollout_perc: 50
         default: false
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137614
Approved by: https://github.com/malfet
2024-10-11 19:20:02 +00:00
f0fa460c60 [BE] Add script to keept the runner-determinator scripts in sync (#136794)
Whenever we update runner_determinator.py it needs to be copied over into _runner-determinator.yml.

This is a quick utility script to make that process less tedious
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136794
Approved by: https://github.com/zxiiro, https://github.com/jeanschmidt
2024-10-01 22:26:28 +00:00
d46ebcb31b Enable experiments for protected branches (#136785)
This is to allow the protected branches (like `main` and `nightly`) also run on the LF fleet, now that we've migrated over
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136785
Approved by: https://github.com/jeanschmidt
2024-09-30 20:58:28 +00:00
09519eb195 Support rolling over a percentage of workflows (#134816)
In order to support adding a rollover percentage, this ended up being a complete rewrite of runner_determinator.py.

Details of the new format are in the comments up top.

On the plus side, this now includes some unit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134816
Approved by: https://github.com/PaliC, https://github.com/zxiiro
2024-09-11 18:01:26 +00:00
8f66995459 Revert "Support rolling over a percentage of workflows (#134816)"
This reverts commit fc890b55b51098437b6149abf1026a8b2aaee389.

Reverted https://github.com/pytorch/pytorch/pull/134816 on behalf of https://github.com/malfet due to Causes lint to intermittently fail ([comment](https://github.com/pytorch/pytorch/pull/134816#issuecomment-2332902609))
2024-09-05 23:39:41 +00:00
fc890b55b5 Support rolling over a percentage of workflows (#134816)
In order to support adding a rollover percentage, this ended up being a complete rewrite of runner_determinator.py.

Details of the new format are in the comments up top.

On the plus side, this now includes some unit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134816
Approved by: https://github.com/PaliC, https://github.com/zxiiro
2024-09-05 22:21:45 +00:00
469429b959 Refactor runner determinator (#134796)
Some minor refactorings to make the code easier to parse and easier to add unit tests for.  Keeping this as a separate PR for ease of review, since it should have zero functional behavior changes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134796
Approved by: https://github.com/zxiiro, https://github.com/PaliC
2024-09-03 23:29:04 +00:00
c6582f11cd Add get_optin_feature() to allow opt-in to amz2023 (#131792)
This extends the runner determinator to be able to opt-in to keywords
to provide additional options when determining which systems to run
jobs on. This enables us to support opt-in users to Amazon Linux 2023.

This change creates a generic get_optin_feature() which hopefully will
be useful to handle additional future features that we might want to
experiment with.

This change has kept backwards compatability with the existing issue
userlist format and adds support for the comma-separated list of users
in a backwards compatible way.

The user list has the following rules:

- Users are GitHub usernames with the @ prefix
- If the first line is a "*" then all users will use the new runners
- If the first line is a "!" then all users will use the old runners
- Each user is also a comma-separated list of features/experiments to enable
- A "#" prefix indicates the user is opted out of the new runners but is opting
  into features/experiments.

Example user list:

```
@User1
@User2,amz2023
#@UserOptOutOfNewRunner,amz2023
```

This closes pytorch/ci-infra#249.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131792
Approved by: https://github.com/jeanschmidt, https://github.com/ZainRizvi
2024-08-06 17:54:20 +00:00
3eb9fa5d58 Add support for using LF Canary runners (#131188)
The script is updated such that if a canary build is detected and the label_type is LF runner it will run on an LF Canary runner.

Closes pytorch/ci-infra#245.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131188
Approved by: https://github.com/ZainRizvi
2024-07-22 13:26:46 +00:00
9645eaaaec [BE] Improve logging for runner-determinator (#129679)
This lets us be more flexible about what data we output and throwing exceptions. It's also less likely to break when others make changes (e.g. any print statement would have broken this code before since the printed output was expected to only be a json)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129679
Approved by: https://github.com/zxiiro, https://github.com/jeanschmidt, https://github.com/Skylion007
2024-07-01 22:31:35 +00:00
389492e264 Fix runner determinator bug (#129612)
Currently the runner determinator is buggy and doesn't let anyone's workflows run against the LF runners (it prefixes a "@" to the user names in the issue instead of either stripping it or prefixing it to the incoming names)

This PR fixes the bug so that people opted in to using LF runners can actually use them. It also puts the python code back into the repo.  Even though the code isn't directly invoked, having it there makes testing and linting easier/possible

Also includes lint fixes

Note: if you just review the .yml file you'll see all the relevant diffs

### Testing:
#### Before
```
python .github/scripts/runner_determinator.py --github-token $GH_KEY --github-issue 5132 --github-actor ZainRizvi --github-issue-owner ZainRizvi --github-branch foo
{"label_type": "", "message": "LF Workflows are disabled for ZainRizvi, ZainRizvi. Using meta runners."}
```

#### After
```
python .github/scripts/runner_determinator.py --github-token $GH_KEY --github-issue 5132 --github-actor ZainRizvi --github-issue-owner ZainRizvi --github-branch foo
{"label_type": "lf.", "message": "LF Workflows are enabled for ZainRizvi, ZainRizvi. Using LF runners."}
```

Aside: updated test case after rebase:
```
python .github/scripts/runner_determinator.py --github-token $GH_KEY --github-issue 5132 --github-actor ZainRizvi --github-issue-owner ZainRizvi2 --github-branch foo  --github-repo python/pythonss --github-ref-type branch
{"label_type": "lf.", "message": "LF Workflows are enabled for ZainRizvi. Using LF runners."}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129612
Approved by: https://github.com/zxiiro, https://github.com/jeanschmidt
2024-06-27 17:51:09 +00:00