pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Author	SHA1	Message	Date
Yuanyuan Chen	b2953f5643	[9/N] Apply ruff UP035 rule (#165515) This is a follow-up to #165214 to continue applying the ruff UP035 rule to the code base. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165515 Approved by: https://github.com/Lucaskabela	2025-10-17 00:09:51 +00:00
Catherine Lee	561193e5f2	[CI][testing] Use 3 processes for testing on sm89 and sm90 jobs (#158691) 3 procs were used for sm86, but we switched to sm89 and the check failed, so it switched back to 2. sm90 is H100, but idk what unittests we have running there, but I assume they also have a lot of memory. They use larger runners, which have more GPU memory, so it's usually ok. I think it's ~22GB -> 10GB per proc if 2, 6GB per proc if 3 (cuda context maybe 1GB). I've applied skips to the ones that OOMed. Time decreases from ~2.7hr per test job -> ~2hr. Pull Request resolved: https://github.com/pytorch/pytorch/pull/158691 Approved by: https://github.com/huydhn	2025-07-25 15:26:29 +00:00
PyTorch MergeBot	11ea3736dd	Revert "[CI][testing] Use 3 processes for testing on sm89 and sm90 jobs (#158691)" This reverts commit 0c0fcb53ff5ee1eb5f0d1f535ed3726d01f8abb5. Reverted https://github.com/pytorch/pytorch/pull/158691 on behalf of https://github.com/ZainRizvi due to Sorry but these are causing jobs to fail with out of memory errors on trunk ([comment](https://github.com/pytorch/pytorch/pull/158691#issuecomment-3113922186))	2025-07-24 15:31:53 +00:00
Catherine Lee	0c0fcb53ff	[CI][testing] Use 3 processes for testing on sm89 and sm90 jobs (#158691) 3 procs were used for sm86, but we switched to sm89 and the check failed, so it switched back to 2. sm90 is H100, but idk what unittests we have running there, but I assume they also have a lot of memory. They use larger runners, which have more GPU memory, so it's usually ok. I think it's ~22GB -> 10GB per proc if 2, 6GB per proc if 3 (cuda context maybe 1GB). I've applied skips to the ones that OOMed. Time decreases from ~2.7hr per test job -> ~2hr. Pull Request resolved: https://github.com/pytorch/pytorch/pull/158691 Approved by: https://github.com/huydhn	2025-07-24 01:51:28 +00:00
Xuehai Pan	c73a92fbf5	[BE][CI] bump `ruff` to 0.9.2: multiline `assert` statements (#144546) Reference: https://docs.astral.sh/ruff/formatter/black/#assert-statements > Unlike Black, Ruff prefers breaking the message over breaking the assertion, similar to how both Ruff and Black prefer breaking the assignment value over breaking the assignment target: > > ```python > # Input > assert ( > len(policy_types) >= priority + num_duplicates > ), f"This tests needs at least {priority+num_duplicates} many types." > > > # Black > assert ( > len(policy_types) >= priority + num_duplicates > ), f"This tests needs at least {priority+num_duplicates} many types." > > # Ruff > assert len(policy_types) >= priority + num_duplicates, ( > f"This tests needs at least {priority + num_duplicates} many types." > ) > ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/144546 Approved by: https://github.com/malfet	2025-02-27 20:46:16 +00:00
PyTorch MergeBot	99f2491af9	Revert "Use absolute path `path.resolve()` -> `path.absolute()` (#129409)" This reverts commit 45411d1fc9a2b6d2f891b6ab0ae16409719e09fc. Reverted https://github.com/pytorch/pytorch/pull/129409 on behalf of https://github.com/jeanschmidt due to Breaking internal CI, @albanD please help get this PR merged ([comment](https://github.com/pytorch/pytorch/pull/129409#issuecomment-2571316444))	2025-01-04 14:17:20 +00:00
Xuehai Pan	45411d1fc9	Use absolute path `path.resolve()` -> `path.absolute()` (#129409) Changes: 1. Always explicit `.absolute()`: `Path(__file__)` -> `Path(__file__).absolute()` 2. Replace `path.resolve()` with `path.absolute()` if the code is resolving the PyTorch repo root directory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129409 Approved by: https://github.com/albanD	2025-01-03 20:03:40 +00:00
Xuehai Pan	b6bdb67f82	[BE][Easy] use `pathlib.Path` instead of `dirname` / `".."` / `pardir` (#129374) Changes by apply order: 1. Replace all `".."` and `os.pardir` usage with `os.path.dirname(...)`. 2. Replace nested `os.path.dirname(os.path.dirname(...))` call with `str(Path(...).parent.parent)`. 3. Reorder `.absolute()` ~/ `.resolve()`~ and `.parent`: always resolve the path first. `.parent{...}.absolute()` -> `.absolute().parent{...}` 4. Replace chained `.parent x N` with `.parents[${N - 1}]`: the code is easier to read (see 5.) `.parent.parent.parent.parent` -> `.parents[3]` 5. ~Replace `.parents[${N - 1}]` with `.parents[${N} - 1]`: the code is easier to read and does not introduce any runtime overhead.~ ~`.parents[3]` -> `.parents[4 - 1]`~ 6. ~Replace `.parents[2 - 1]` with `.parent.parent`: because the code is shorter and easier to read.~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/129374 Approved by: https://github.com/justinchuby, https://github.com/malfet	2024-12-29 17:23:13 +00:00
PyTorch MergeBot	475656fd9c	Revert "[BE][Easy] use `pathlib.Path` instead of `dirname` / `".."` / `pardir` (#129374)" This reverts commit 2293fe1024812d6349f6e2b3b7de82c6b73f11e4. Reverted https://github.com/pytorch/pytorch/pull/129374 on behalf of https://github.com/malfet due to failing internal ROCM builds with error: ModuleNotFoundError: No module named hipify ([comment](https://github.com/pytorch/pytorch/pull/129374#issuecomment-2562973920))	2024-12-26 17:32:23 +00:00
PyTorch MergeBot	cc4e70b7c3	Revert "Use absolute path `path.resolve()` -> `path.absolute()` (#129409)" This reverts commit 135c7db99d646b8bd9603bf969d47d3dec5987b1. Reverted https://github.com/pytorch/pytorch/pull/129409 on behalf of https://github.com/malfet due to needing to revert it as a dependency of https://github.com/pytorch/pytorch/pull/129374 ([comment](https://github.com/pytorch/pytorch/pull/129409#issuecomment-2562969825))	2024-12-26 17:26:06 +00:00
Xuehai Pan	135c7db99d	Use absolute path `path.resolve()` -> `path.absolute()` (#129409) Changes: 1. Always explicit `.absolute()`: `Path(__file__)` -> `Path(__file__).absolute()` 2. Replace `path.resolve()` with `path.absolute()` if the code is resolving the PyTorch repo root directory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129409 Approved by: https://github.com/albanD	2024-12-24 08:33:08 +00:00
Xuehai Pan	2293fe1024	[BE][Easy] use `pathlib.Path` instead of `dirname` / `".."` / `pardir` (#129374) Changes by apply order: 1. Replace all `".."` and `os.pardir` usage with `os.path.dirname(...)`. 2. Replace nested `os.path.dirname(os.path.dirname(...))` call with `str(Path(...).parent.parent)`. 3. Reorder `.absolute()` ~/ `.resolve()`~ and `.parent`: always resolve the path first. `.parent{...}.absolute()` -> `.absolute().parent{...}` 4. Replace chained `.parent x N` with `.parents[${N - 1}]`: the code is easier to read (see 5.) `.parent.parent.parent.parent` -> `.parents[3]` 5. ~Replace `.parents[${N - 1}]` with `.parents[${N} - 1]`: the code is easier to read and does not introduce any runtime overhead.~ ~`.parents[3]` -> `.parents[4 - 1]`~ 6. ~Replace `.parents[2 - 1]` with `.parent.parent`: because the code is shorter and easier to read.~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/129374 Approved by: https://github.com/justinchuby, https://github.com/malfet	2024-12-21 22:08:01 +00:00
cyy	82aaf64422	[3/N] Apply py39 ruff fixes (#142115) Pull Request resolved: https://github.com/pytorch/pytorch/pull/142115 Approved by: https://github.com/ezyang	2024-12-11 17:50:10 +00:00
Tom Ritchford	498a7808ff	Fix unused Python variables outside torch/ and test/ (#136359) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136359 Approved by: https://github.com/albanD	2024-12-11 17:10:23 +00:00
Catherine Lee	4fe6a5dc34	Move slow tests to be in repo (#132379) Move the slow test JSON to be in the pytorch/pytorch repo and make a job that will update it weekly. The job uses the same environment as the commit hash. It uses similar code to the hash updates, but the hash update contains a lot of code that is specific to the hash update, so I chose to pick out the parts that are relevant. Remove references to the old file and set up testing to read from the new file instead. The old update cadence was every day; the new one is every week. The auto slow test infra + the lack of pinning between pytorch and test-infra makes it really hard to tell if a test started failing because of a change or because of the slow test JSON changing. While this can have benefits, like disable test issues being effective everywhere immediately, it can also be very confusing, especially since we don't have the same insight into slow tests like we do for disable issues. Example PR made: https://github.com/pytorch/pytorch/pull/132383 (with all the changes from this PR because it was working on top of this). We should just get rid of this at some point in favor of the slowTest decorator, but there are some tests that take 5+ minutes to run and I don't want to track them down right now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132379 Approved by: https://github.com/huydhn	2024-08-07 18:42:56 +00:00
Xuehai Pan	f6838d521a	[BE][Easy][5/19] enforce style for empty lines in import segments in `tools/` and `torchgen/` (#129756) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129756 Approved by: https://github.com/ezyang	2024-07-17 06:44:35 +00:00
Xuehai Pan	8a67daf283	[BE][Easy] enable postponed annotations in `tools` (#129375) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129375 Approved by: https://github.com/malfet	2024-06-29 09:23:35 +00:00
PyTorch MergeBot	3d96217891	Revert "[BE][Easy] use `pathlib.Path` instead of `dirname` / `".."` / `pardir` (#129374)" This reverts commit 9e1f3ecaa710785a1ab03c6ad5093a5566d6c5e5. Reverted https://github.com/pytorch/pytorch/pull/129374 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is still failing with the same error ([comment](https://github.com/pytorch/pytorch/pull/129374#issuecomment-2197801405))	2024-06-29 00:47:15 +00:00
PyTorch MergeBot	a32ce5ce34	Revert "[BE][Easy] enable postponed annotations in `tools` (#129375)" This reverts commit 59eb2897f1745f513edb6c63065ffad481c4c8d0. Reverted https://github.com/pytorch/pytorch/pull/129375 on behalf of https://github.com/huydhn due to Sorry for reverting your change but I need to revert to cleanly revert https://github.com/pytorch/pytorch/pull/129374, please do a rebase and reland this ([comment](https://github.com/pytorch/pytorch/pull/129375#issuecomment-2197800541))	2024-06-29 00:44:25 +00:00
Xuehai Pan	59eb2897f1	[BE][Easy] enable postponed annotations in `tools` (#129375) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129375 Approved by: https://github.com/malfet	2024-06-28 15:37:54 +00:00
Xuehai Pan	9e1f3ecaa7	[BE][Easy] use `pathlib.Path` instead of `dirname` / `".."` / `pardir` (#129374) Changes by apply order: 1. Replace all `".."` and `os.pardir` usage with `os.path.dirname(...)`. 2. Replace nested `os.path.dirname(os.path.dirname(...))` call with `str(Path(...).parent.parent)`. 3. Reorder `.absolute()` ~/ `.resolve()`~ and `.parent`: always resolve the path first. `.parent{...}.absolute()` -> `.absolute().parent{...}` 4. Replace chained `.parent x N` with `.parents[${N - 1}]`: the code is easier to read (see 5.) `.parent.parent.parent.parent` -> `.parents[3]` 5. ~Replace `.parents[${N - 1}]` with `.parents[${N} - 1]`: the code is easier to read and does not introduce any runtime overhead.~ ~`.parents[3]` -> `.parents[4 - 1]`~ 6. ~Replace `.parents[2 - 1]` with `.parent.parent`: because the code is shorter and easier to read.~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/129374 Approved by: https://github.com/justinchuby, https://github.com/malfet	2024-06-28 00:35:15 +00:00
PyTorch MergeBot	895316119d	Revert "[BE][Easy] use `pathlib.Path` instead of `dirname` / `".."` / `pardir` (#129374)" This reverts commit 0314c4c101c44d5d89b4fad9d37a012dc6f31128. Reverted https://github.com/pytorch/pytorch/pull/129374 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it causes lots of internal build failures where they fail to find hipify module ([comment](https://github.com/pytorch/pytorch/pull/129374#issuecomment-2192437052))	2024-06-26 19:03:57 +00:00
Xuehai Pan	0314c4c101	[BE][Easy] use `pathlib.Path` instead of `dirname` / `".."` / `pardir` (#129374) Changes by apply order: 1. Replace all `".."` and `os.pardir` usage with `os.path.dirname(...)`. 2. Replace nested `os.path.dirname(os.path.dirname(...))` call with `str(Path(...).parent.parent)`. 3. Reorder `.absolute()` ~/ `.resolve()`~ and `.parent`: always resolve the path first. `.parent{...}.absolute()` -> `.absolute().parent{...}` 4. Replace chained `.parent x N` with `.parents[${N - 1}]`: the code is easier to read (see 5.) `.parent.parent.parent.parent` -> `.parents[3]` 5. ~Replace `.parents[${N - 1}]` with `.parents[${N} - 1]`: the code is easier to read and does not introduce any runtime overhead.~ ~`.parents[3]` -> `.parents[4 - 1]`~ 6. ~Replace `.parents[2 - 1]` with `.parent.parent`: because the code is shorter and easier to read.~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/129374 Approved by: https://github.com/justinchuby, https://github.com/malfet	2024-06-25 08:28:38 +00:00
Catherine Lee	9689532106	[CI] 3 procs non cuda (#125932) Too lazy to figure out actual time reduction here; I'll figure it out later. Also I'd rather get an average of a couple of runs on trunk rather than just this one PR. Things got faster. Source? Trust me bro. * rel to https://github.com/pytorch/pytorch/pull/125598 Pull Request resolved: https://github.com/pytorch/pytorch/pull/125932 Approved by: https://github.com/ZainRizvi	2024-05-15 16:18:36 +00:00
Catherine Lee	bef7d650c4	[CI] 3 procs on sm86 (#125598) yolo. iirc the a10g/sm86 runners have ~21 GB of GPU memory, so we can increase parallelism on them to 3. This results in about 6GB CUDA mem per proc. The previous calculation + 2 procs resulted in about 8 GB. Also fixes the calc for per proc memory, assuming that CUDA context + anything else take a little under 1GB of space (previous calc was .11 on about 7.5 - 8 GB <= .9GB). Times on main are about 1.9-2.5hr per shard. This commit is around 1.6-2hr per shard. Risks: increase in flaky tests due to OOM. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125598 Approved by: https://github.com/huydhn	2024-05-10 18:48:43 +00:00
Catherine Lee	6801595349	Fix round robin sharding (#121022) Fix round robin sharding when there are no test times and sort_by_time=False. Adds more tests to test_test_selections for sort_by_time=False. Adds more checks to test_split_shards_random for serial/parallel ordering + ordering of tests. Refactoring of dup code. Tested locally by running `python test/run_test.py --shard 3 5` with no test times downloaded and checked that it wasn't an empty list. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121022 Approved by: https://github.com/huydhn, https://github.com/osalpekar	2024-03-11 17:30:12 +00:00
PyTorch MergeBot	9eb8fae02d	Revert "Fix round robin sharding (#121022)" This reverts commit effdea5fc62c6bf13cb8035f7bfcc205f05a8b6a. Reverted https://github.com/pytorch/pytorch/pull/121022 on behalf of https://github.com/clee2000 due to made sharding really uneven ([comment](https://github.com/pytorch/pytorch/pull/121022#issuecomment-1986552662))	2024-03-08 23:16:24 +00:00
Catherine Lee	effdea5fc6	Fix round robin sharding (#121022) Fix round robin sharding when there are no test times and sort_by_time=False. Adds more tests to test_test_selections for sort_by_time=False. Adds more checks to test_split_shards_random for serial/parallel ordering + ordering of tests. Refactoring of dup code. Tested locally by running `python test/run_test.py --shard 3 5` with no test times downloaded and checked that it wasn't an empty list. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121022 Approved by: https://github.com/huydhn, https://github.com/osalpekar	2024-03-08 17:01:34 +00:00
Catherine Lee	c39bbd6def	Numbers based TD (#119901) Convert from a list/bucket based TD system to just a numbers based TD system. Looks like a massive change but a decent amount of it is tests and removing code. Main file of interest is interface.py, which GitHub is collapsing by default due to size. The test files pretty much got rewritten entirely since a lot of the old tests are no longer relevant. Other notable changes: * Use frozenset to make TestRun hashable * Adds tools/test/heuristics/__init__.py to ensure that unittest can discover the tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/119901 Approved by: https://github.com/osalpekar, https://github.com/huydhn	2024-02-26 17:01:19 +00:00
Catherine Lee	cfddfce0d3	Alternate sharding (#119078) Changes sharding to attempt to put all serial tests on as few shards as possible. Parallel tests are then distributed across all shards, with most of them likely ending up on the non-serial shards. Example: 8 minutes of serial tests, 20 minutes of parallel tests, 2 procs per machine, 6 machines -> 8 + 20/2 = 18 total minutes of tests -> 18 / 6 machines = 3 min per machine -> all serial tests should fit on 3 machines (3 min, 3 min, 2 min) -> majority of parallel tests should go on the last 4 machines, one of which is shared with the serial tests. Move serial tests to run first. If I want to move to a purely numbers-based sharding, this ensures that parallel tests are run with parallel tests as much as possible instead of interleaving serial + parallel tests, which decreases effectiveness of parallelization, while also ensuring that test reordering is still mostly effective. See `73e816ee80` for example logs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119078 Approved by: https://github.com/huydhn	2024-02-21 16:40:27 +00:00
PyTorch MergeBot	9b38ee2343	Revert "Alternate sharding (#119078)" This reverts commit 861acda20577739d52dd0bcf09e162192f25020f. Reverted https://github.com/pytorch/pytorch/pull/119078 on behalf of https://github.com/clee2000 due to failing `861acda205` ([comment](https://github.com/pytorch/pytorch/pull/119078#issuecomment-1946583857))	2024-02-15 16:59:50 +00:00
Catherine Lee	861acda205	Alternate sharding (#119078) Changes sharding to attempt to put all serial tests on as few shards as possible. Parallel tests are then distributed across all shards, with most of them likely ending up on the non-serial shards. Example: 8 minutes of serial tests, 20 minutes of parallel tests, 2 procs per machine, 6 machines -> 8 + 20/2 = 18 total minutes of tests -> 18 / 6 machines = 3 min per machine -> all serial tests should fit on 3 machines (3 min, 3 min, 2 min) -> majority of parallel tests should go on the last 4 machines, one of which is shared with the serial tests. Move serial tests to run first. If I want to move to a purely numbers-based sharding, this ensures that parallel tests are run with parallel tests as much as possible instead of interleaving serial + parallel tests, which decreases effectiveness of parallelization, while also ensuring that test reordering is still mostly effective. See `73e816ee80` for example logs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119078 Approved by: https://github.com/huydhn	2024-02-15 01:32:44 +00:00
Aaron Gokaslan	ea7d70aecc	[BE]: ruff FURB136: replace ternary with min/max (preview) (#114382) Replaces ternary if-else statements with simple min/max when appropriate. Pull Request resolved: https://github.com/pytorch/pytorch/pull/114382 Approved by: https://github.com/albanD	2023-11-22 22:10:01 +00:00
Zain Rizvi	a5641bc56b	[TD] Enable Test Class granularity on heuristics (#112161) Changes the heuristic framework to support prioritizing individual classes within a test file. Components of this included: - Updating TestPrioritizations to accept individual test classes being prioritized. Previously, when a heuristic wanted to prioritize a test file it would pass in the test's name; now, to prioritize a class within a test, it uses the notation "test::classname". - Changes are fully backwards compatible with existing heuristics - Test sharding now supports sharding individual tests (for when they're prioritized) - When a TestClass is prioritized, we pass the appropriate "-k" flags down to pytest Pull Request resolved: https://github.com/pytorch/pytorch/pull/112161 Approved by: https://github.com/huydhn	2023-10-31 18:11:05 +00:00
Zain Rizvi	36399d067a	Port existing heuristics to TD framework (#107071) This PR looks big, but it's mostly just refactorings with a bit of dead code deletion. Exceptions are: - Some metric emissions were changed to comply with the new TD format - Some logging changes - We now run tests in three batches (highly_relevant, probably_relevant, unranked_relevance) instead of the previous two (prioritized and general) Refactorings done: - Moves all test reordering code to the new TD framework - Refactors run_test.py to cleanly support multiple levels of test priorities - Deletes some dead code that was originally written for logging Pull Request resolved: https://github.com/pytorch/pytorch/pull/107071 Approved by: https://github.com/clee2000, https://github.com/huydhn	2023-08-23 21:23:23 +00:00
Zain Rizvi	5ddb8ef827	Make emit_metrics importable without having boto3 installed (#107070) Make it so that scripts can import and run the `emit_metrics` function even if they don't have boto3 installed, in which case it will still validate the inputs but skip the actual metric emission part. It's purely a refactor without any real logic changes. Motivation: So that run_test.py and the target determination code can use this library easily without worrying about whether it was imported or whether its dependencies are installed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/107070 Approved by: https://github.com/huydhn	2023-08-21 21:13:01 +00:00
Catherine Lee	f16be5e0d4	Reordering tests experiment (#106347) Companion to https://github.com/pytorch/test-infra/pull/4424. Uses the file rating generated by the test-infra PR to reorder tests. For each test file, sum the file ratings from the changed files in the PR, and put the tests in order of sum. A lot of tests are probably going to end up as "prioritized" since it takes anything with a rating > 0 right now. Sharding is done twice, once on the prioritized tests, and once on the general/non-prioritized tests. Prioritized tests have an order, so they should be sharded according to that order, while general tests don't have an order and are sharded by test time, which should result in more balanced shards. I'll change the metric name before I merge; I want to quarantine my testing stuff from actual results. Pull Request resolved: https://github.com/pytorch/pytorch/pull/106347 Approved by: https://github.com/ZainRizvi	2023-08-16 18:23:09 +00:00
PyTorch MergeBot	9858edd99f	Revert "Reordering tests experiment (#106347)" This reverts commit 7dfab082be9eaeeee95c7b0363e59c824c6a9009. Reverted https://github.com/pytorch/pytorch/pull/106347 on behalf of https://github.com/clee2000 due to probably broke sharding ([comment](https://github.com/pytorch/pytorch/pull/106347#issuecomment-1675542738))	2023-08-11 23:59:48 +00:00
Catherine Lee	7dfab082be	Reordering tests experiment (#106347) Companion to https://github.com/pytorch/test-infra/pull/4424. Uses the file rating generated by the test-infra PR to reorder tests. For each test file, sum the file ratings from the changed files in the PR, and put the tests in order of sum. A lot of tests are probably going to end up as "prioritized" since it takes anything with a rating > 0 right now. Sharding is done twice, once on the prioritized tests, and once on the general/non-prioritized tests. Prioritized tests have an order, so they should be sharded according to that order, while general tests don't have an order and are sharded by test time, which should result in more balanced shards. I'll change the metric name before I merge; I want to quarantine my testing stuff from actual results. Pull Request resolved: https://github.com/pytorch/pytorch/pull/106347 Approved by: https://github.com/ZainRizvi	2023-08-09 20:11:11 +00:00
Justin Chu	14d87bb5ff	[BE] Enable ruff's UP rules and autoformat tools and scripts (#105428) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105428 Approved by: https://github.com/albanD, https://github.com/soulitzer, https://github.com/malfet	2023-07-19 01:24:44 +00:00
Zain Rizvi	c3d3165f16	Enable uploading metrics and upload Test Reordering metrics to dynamodb (#102691) Added a feature to upload test statistics to DynamoDB and Rockset using a new function `emit_metric` in `tools/stats/upload_stats_lib.py`. Added metrics to measure test reordering effectiveness in `tools/testing/test_selections.py`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102691 Approved by: https://github.com/malfet	2023-06-12 23:01:53 +00:00
PyTorch MergeBot	b52ee80cdc	Revert "Add print statements to debug sharding error (#102713)" This reverts commit c7873522c2ceefbc3b747224da1d26d566115c9a. Reverted https://github.com/pytorch/pytorch/pull/102713 on behalf of https://github.com/clee2000 due to issue should be resolved now ([comment](https://github.com/pytorch/pytorch/pull/102713#issuecomment-1583334560))	2023-06-08 21:02:17 +00:00
Catherine Lee	90fd90dd94	Fix rocm sharding (#102871) ROCm queries for the number of processes it should use per machine, which might cause it to be different across shards, which leads to inconsistencies when distributing tests among shards. My solution is to separate the vars used for shard calculations and the actual number of procs that can be used and to ensure that the var used for shard calculations is consistent across all shards for a test config + job. I believe that the only consequence is that ROCm sharding might become unbalanced. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102871 Approved by: https://github.com/huydhn, https://github.com/malfet	2023-06-06 17:29:53 +00:00
Catherine Lee	c7873522c2	Add print statements to debug sharding error (#102713) Sharding on ROCm is broken; I can't replicate it on dummy PRs even though it seems to happen pretty often on main, so I'm adding this to increase my sample size. Hopefully this is enough print statements... Pull Request resolved: https://github.com/pytorch/pytorch/pull/102713 Approved by: https://github.com/huydhn	2023-06-01 22:38:28 +00:00
Zain Rizvi	c84f246c83	Improve time savings calculation math for test reordering (#102411) Use a more accurate method that accounts for tests being run in parallel. Right now we still log results to the console, but later they'll get logged to Rockset for better tracking. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102411 Approved by: https://github.com/huydhn, https://github.com/malfet	2023-05-31 23:51:27 +00:00
Zain Rizvi	686b12c93d	Reduce log output when no tests are prioritized (#101803) 🤖 Generated by Copilot at 733b991: Improve test reordering output in `tools/testing/test_selections.py`. Add a check to only print reordering information when there are tests to prioritize. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101803 Approved by: https://github.com/malfet, https://github.com/kit1980	2023-05-18 20:21:41 +00:00
Zain Rizvi	9a17989b63	Prioritize modified tests when running on `main` (#101618) If a PR modifies a test, prioritize running that test on the default branch so that we get the test signal faster. Fixes https://github.com/pytorch/pytorch/issues/101617 Pull Request resolved: https://github.com/pytorch/pytorch/pull/101618 Approved by: https://github.com/huydhn	2023-05-17 00:49:45 +00:00
Zain Rizvi	b1474019a4	Test Reordering: Run previously failing tests first (#101123) Makes the CI prioritize running any test files that had a failing test in a previous iteration of the given PR. A follow-up to https://github.com/pytorch/pytorch/pull/100522, which makes the `.pytest_cache` available to use here. A concrete example: 1. Person A pushes a new commit and creates a PR. 2. 2 hours later, test_im_now_broken.py fails. 3. Person A attempts to fix the test, but the test is actually still broken. 4. The CI, seeing that test_im_now_broken.py had failed on a previous run, will now prioritize running that test first. Instead of waiting another 2 hours to get a signal, Person A only needs to wait ~15 minutes (which is how long it takes for tests to start running). Testing: I modified a file to make the tests invoking it fail and triggered CI twice with this failure. First run (test step took 1h 9m to run): https://github.com/pytorch/pytorch/actions/runs/4963943209/jobs/8883800811 Second run (test step failed within 2m 27s): https://github.com/pytorch/pytorch/actions/runs/4965016776/jobs/8885657992 Pull Request resolved: https://github.com/pytorch/pytorch/pull/101123 Approved by: https://github.com/malfet, https://github.com/huydhn	2023-05-16 19:57:54 +00:00
Zain Rizvi	ceecccc09e	Bugfix: Correctly detect test changes in PRs (#101304) Fixes a bug where the logic for deciding what tests have been edited by a PR would include all files that had been edited since the merge base, including files that were in main! Now it will only consider the files that are part of the PR itself. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101304 Approved by: https://github.com/seemethere, https://github.com/malfet	2023-05-13 00:59:41 +00:00
Zain Rizvi	95f191a248	Always run prioritized tests first, even if they're expected to run serially (#100748) Today, we prioritize running test files that were edited in the user's PR, with the idea being to run them before we run any other test. However, if the modified test is supposed to run serially, then we still end up running it after all the parallelized tests have finished running. This PR fixes that to _always_ run the prioritized tests before the regular tests, regardless of whether the test is supposed to run serially or in parallel. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100748 Approved by: https://github.com/huydhn	2023-05-08 20:23:46 +00:00
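
The per-process GPU memory estimates in #158691 and #125598 come from simple arithmetic. Split the card's memory evenly across test processes, then subtract roughly 1 GB of CUDA context per process. The sketch below reproduces those numbers. The function names are hypothetical, and the 1 GB overhead is the commits' rough estimate, not a measured value. A fraction computed this way could be passed to `torch.cuda.set_per_process_memory_fraction`, but whether PyTorch CI does exactly that is an assumption.

```python
def per_proc_memory_gb(total_gb: float, procs: int, context_gb: float = 1.0) -> float:
    """Usable GPU memory per test process (hypothetical helper).

    Assumes memory is split evenly and that each process loses about
    `context_gb` to its CUDA context, as estimated in the commit messages.
    """
    return total_gb / procs - context_gb


def per_proc_fraction(total_gb: float, procs: int, context_gb: float = 1.0) -> float:
    """The same per-process budget expressed as a fraction of device memory."""
    return per_proc_memory_gb(total_gb, procs, context_gb) / total_gb


if __name__ == "__main__":
    # ~22 GB runners (#158691): about 10 GB with 2 procs, about 6 GB with 3.
    print(per_proc_memory_gb(22, 2), round(per_proc_memory_gb(22, 3), 2))
    # ~21 GB a10g/sm86 runners (#125598): about 6 GB with 3 procs.
    print(per_proc_memory_gb(21, 3), round(per_proc_fraction(21, 3), 3))
```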

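Items 3 and 4 of #129374 describe two `pathlib` idioms. The first is to make a path absolute before walking up with `.parent`. The second is to replace a chain of N `.parent` calls with `.parents[N - 1]`. The sketch below checks both on pure paths, so it runs without touching the filesystem. The repository location is a made-up example; in real code the starting point would be something like `Path(__file__).absolute()`.

```python
from pathlib import PurePosixPath

# Hypothetical absolute location of a script two directories below the repo root.
script = PurePosixPath("/home/user/pytorch/tools/testing/test_selections.py")

# Three chained .parent calls and .parents[3 - 1] walk up the same three levels.
repo_root_chained = script.parent.parent.parent
repo_root_indexed = script.parents[2]
assert repo_root_chained == repo_root_indexed == PurePosixPath("/home/user/pytorch")

# Why make the path absolute first: .parent is purely lexical, so a bare
# relative file name cannot climb above ".".
relative = PurePosixPath("test_selections.py")
print(relative.parent, relative.parent.parent)  # prints ". ."
```

Note that `Path.absolute()` only prepends the current working directory. `Path.resolve()` also follows symlinks and normalizes `..` segments, so the two are not interchangeable in general.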

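The worked example in #119078 uses 8 minutes of serial tests, 20 minutes of parallel tests, 2 processes per machine, and 6 machines. A small greedy packer reproduces it in three steps:

1. Compute a per-shard time budget.
2. Fill shards with serial tests in order until each reaches the budget.
3. Hand each parallel test to the currently least-loaded shard.

This is a hypothetical sketch of the idea, not the algorithm in PyTorch's `tools/testing` code. The `Shard` class, the function name, and the least-loaded rule for parallel tests are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """One CI machine (hypothetical model)."""

    procs: int
    serial: list[tuple[str, float]] = field(default_factory=list)
    parallel: list[tuple[str, float]] = field(default_factory=list)

    def time(self) -> float:
        # Serial tests occupy the whole machine; parallel tests share `procs` workers.
        serial_time = sum(t for _, t in self.serial)
        parallel_time = sum(t for _, t in self.parallel) / self.procs
        return serial_time + parallel_time


def shard_serial_first(
    serial: list[tuple[str, float]],
    parallel: list[tuple[str, float]],
    num_shards: int,
    procs: int,
) -> list[Shard]:
    """Pack serial tests onto as few shards as possible, then spread parallel tests."""
    total = sum(t for _, t in serial) + sum(t for _, t in parallel) / procs
    budget = total / num_shards
    shards = [Shard(procs) for _ in range(num_shards)]

    i = 0
    for name, t in serial:
        # Advance to the next shard when this test would push the current one over budget.
        if shards[i].serial and shards[i].time() + t > budget and i + 1 < num_shards:
            i += 1
        shards[i].serial.append((name, t))

    for name, t in parallel:
        # Give each parallel test to the least-loaded shard (ties go to the lowest index).
        target = min(shards, key=lambda s: s.time())
        target.parallel.append((name, t))
    return shards


if __name__ == "__main__":
    serial_tests = [("s1", 3), ("s2", 3), ("s3", 2)]
    parallel_tests = [(f"p{k}", 2) for k in range(10)]
    for idx, shard in enumerate(shard_serial_first(serial_tests, parallel_tests, 6, 2)):
        print(idx, len(shard.serial), len(shard.parallel), shard.time())
```

Try it with serial tests of 3, 3, and 2 minutes and ten 2-minute parallel tests. The result matches the layout the commit message predicts. The serial tests land on the first three shards (3, 3, and 2 minutes). The parallel tests fill the third shard's spare minute and the last three shards, so every shard ends at 3 minutes.
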
85 Commits
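
#106347 describes its reordering rule in three parts:

1. For each test file, sum the ratings contributed by every file the PR changes.
2. Treat any test with a positive sum as prioritized.
3. Order the prioritized tests by that sum.

Below is a minimal sketch of that rule. It assumes the ratings arrive as a hypothetical nested dict `{changed_file: {test_file: rating}}`; the real format produced by the test-infra job is not described in the commit.

```python
from collections import defaultdict


def reorder_tests(
    tests: list[str],
    changed_files: list[str],
    ratings: dict[str, dict[str, float]],
) -> tuple[list[str], list[str]]:
    """Split tests into (prioritized, general) using summed file ratings."""
    scores: dict[str, float] = defaultdict(float)
    for changed in changed_files:
        # Each changed file contributes its rating to every related test file.
        for test, rating in ratings.get(changed, {}).items():
            scores[test] += rating

    # Anything with a positive score is prioritized, highest score first;
    # sorted() is stable, so ties keep their original order.
    prioritized = sorted((t for t in tests if scores[t] > 0), key=lambda t: -scores[t])
    general = [t for t in tests if scores[t] <= 0]
    return prioritized, general


if __name__ == "__main__":
    ratings = {
        "torch/foo.py": {"test_b": 0.5, "test_c": 0.2},
        "torch/bar.py": {"test_c": 0.4},
        "torch/baz.py": {"test_a": 1.0},  # not changed in this PR, so ignored
    }
    tests = ["test_a", "test_b", "test_c", "test_d"]
    print(reorder_tests(tests, ["torch/foo.py", "torch/bar.py"], ratings))
```

Per the commit, the two lists are then sharded separately. Prioritized tests keep this order, while general tests are sharded by historical test time.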