695 Commits
e0238577b6 Always import test selection tools (#107644)
https://github.com/pytorch/pytorch/pull/107070 made emit_metrics importable without boto3, so we can now import all the files directly without the try/except.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107644
Approved by: https://github.com/huydhn, https://github.com/malfet
2023-08-22 16:36:20 +00:00
5ddb8ef827 Make emit_metrics importable without having boto3 installed (#107070)
Make it so that scripts can import and run the `emit_metrics` function even if they don't have boto3 installed, in which case it will still validate the inputs but skip the actual metric emission part.

It's purely a refactor without any real logic changes.

Motivation: So that run_test.py and the target determination code can use this library easily without worrying about whether it was imported or whether its dependencies are installed.
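A minimal sketch of the optional-import pattern described above. `emit_metric_sketch` is a hypothetical stand-in, not the real `emit_metrics` signature: inputs are always validated, and the emission step is skipped when `boto3` cannot be imported.

```python
from typing import Any, Dict

try:
    import boto3  # optional: only needed to actually upload metrics
except ImportError:
    boto3 = None  # metrics are still validated, just not emitted


def emit_metric_sketch(metric_name: str, metrics: Dict[str, Any]) -> bool:
    """Validate inputs, then emit only if boto3 is available.

    Returns True if the metric would be emitted, False if emission was skipped.
    """
    if not isinstance(metric_name, str) or not metric_name:
        raise ValueError("metric_name must be a non-empty string")
    if not isinstance(metrics, dict):
        raise ValueError("metrics must be a dict")
    if boto3 is None:
        return False  # validation passed; emission skipped
    # The real upload (e.g. to DynamoDB) would happen here; omitted in this sketch.
    return True
```

With boto3 missing, a call such as `emit_metric_sketch("td_results", {"count": 1})` still raises on bad input but otherwise returns `False` without touching the network.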

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107070
Approved by: https://github.com/huydhn
2023-08-21 21:13:01 +00:00
3b2c5d47c0 Use default build env and test config for test times (#107325)
Redo of #107312

Pairs with https://github.com/pytorch/test-infra/pull/4476

If the build env and test config combination cannot be found in the test times, use the default. Then we don't have to manually change test-times.json whenever a new job is added or existing jobs are updated.
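The fallback can be sketched as a two-level lookup. The nesting `{build_env: {test_config: {test_file: seconds}}}` and the literal `"default"` keys are assumptions about the test-times.json layout, not taken from the PR.

```python
from typing import Dict

# Assumed layout: {build_env: {test_config: {test_file: seconds}}}
TestTimes = Dict[str, Dict[str, Dict[str, float]]]


def get_test_times(
    all_times: TestTimes, build_env: str, test_config: str
) -> Dict[str, float]:
    """Look up per-file test times, falling back to "default" at each level."""
    env_times = all_times.get(build_env)
    if env_times is None:
        env_times = all_times.get("default", {})
    config_times = env_times.get(test_config)
    if config_times is None:
        config_times = env_times.get("default", {})
    return config_times
```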
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107325
Approved by: https://github.com/huydhn
2023-08-21 18:39:55 +00:00
e108f33299 Update distutils.Version to packaging.version due to the deprecation … (#107207)
Update distutils.Version to packaging.version due to the deprecation warning.

```
/root/Git.d/pytorch/pytorch/torch/testing/_internal/common_methods_invocations.py:17136: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  active_if=TEST_SCIPY and LooseVersion(scipy.__version__) < "1.4.0"),
/root/Git.d/pytorch/pytorch/torch/testing/_internal/common_methods_invocations.py:17138: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  active_if=TEST_SCIPY and LooseVersion(scipy.__version__) < "1.4.0"),
/root/Git.d/pytorch/pytorch/torch/testing/_internal/common_methods_invocations.py:17140: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  active_if=TEST_SCIPY and LooseVersion(scipy.__version__) < "1.4.0"),
```
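A sketch of the replacement comparison, assuming the `packaging` library is installed. Both sides are wrapped in `Version`, since `packaging` does not compare a `Version` against a plain string the way `LooseVersion` did; `older_than` is a hypothetical helper.

```python
from packaging.version import Version


def older_than(installed: str, minimum: str) -> bool:
    # Replaces e.g. LooseVersion(scipy.__version__) < "1.4.0"
    return Version(installed) < Version(minimum)
```

Unlike plain string comparison, this orders `1.10.0` after `1.4.0`.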
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107207
Approved by: https://github.com/soulitzer
2023-08-17 11:19:44 +00:00
f16be5e0d4 Reordering tests experiment (#106347)
Companion with https://github.com/pytorch/test-infra/pull/4424

Uses the file rating generated by the test-infra PR to reorder tests. For each test file, sum the file ratings from the changed files in the PR, and put the tests in order of that sum.

A lot of tests are probably going to end up as "prioritized" since it takes anything with a rating > 0 right now.

Sharding is done twice: once on the prioritized tests, and once on the general (non-prioritized) tests. Prioritized tests have an order, so they should be sharded according to that order, while general tests don't have an order and are sharded by test time, which should result in more balanced shards.
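A rough sketch of the scoring and two-pass sharding described above. The rating format, the function names, and the greedy least-loaded-shard assignment are assumptions for illustration; the real logic lives in the test-infra and `tools/testing` code and may differ.

```python
from typing import Dict, List, Tuple

# Assumed rating format: ratings[changed_file][test_file] -> float
Ratings = Dict[str, Dict[str, float]]


def score_tests(ratings: Ratings, changed_files: List[str], tests: List[str]) -> Dict[str, float]:
    """Sum the ratings each test file receives from the PR's changed files."""
    scores = {t: 0.0 for t in tests}
    for changed in changed_files:
        for test, rating in ratings.get(changed, {}).items():
            if test in scores:
                scores[test] += rating
    return scores


def split_tests(scores: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Rating > 0 means prioritized (highest sum first); the rest are general."""
    prioritized = sorted((t for t, s in scores.items() if s > 0), key=lambda t: -scores[t])
    general = [t for t, s in scores.items() if s <= 0]
    return prioritized, general


def shard(tests: List[str], times: Dict[str, float], num_shards: int, keep_order: bool) -> List[List[str]]:
    """Greedily assign each test to the currently least-loaded shard.

    keep_order=True walks tests in the given (priority) order; otherwise the
    longest tests are placed first, which tends to balance shards better.
    """
    order = tests if keep_order else sorted(tests, key=lambda t: -times.get(t, 0.0))
    shards: List[List[str]] = [[] for _ in range(num_shards)]
    loads = [0.0] * num_shards
    for test in order:
        i = loads.index(min(loads))
        shards[i].append(test)
        loads[i] += times.get(test, 0.0)
    return shards
```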

I'll change the metric name before I merge; I want to quarantine my testing stuff from actual results.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106347
Approved by: https://github.com/ZainRizvi
2023-08-16 18:23:09 +00:00
9858edd99f Revert "Reordering tests experiment (#106347)"
This reverts commit 7dfab082be9eaeeee95c7b0363e59c824c6a9009.

Reverted https://github.com/pytorch/pytorch/pull/106347 on behalf of https://github.com/clee2000 due to probably broke sharding ([comment](https://github.com/pytorch/pytorch/pull/106347#issuecomment-1675542738))
2023-08-11 23:59:48 +00:00
b9ad7bc533 Don't run test/autograd/test_fallback.py in parallel (#106866)
Fixes https://github.com/pytorch/pytorch/issues/106754

This PR:
- moves test/autograd/test_fallback.py to test_autograd_fallback.py and
removes it from test_autograd.py (necessary for the next step)
- adds test_autograd_fallback.py to parallel test blocklist.
- lintrunner really wanted to make changes to the files, but other than
that, it is a move.

The problem is that we set a global option (the autograd fallback mode)
during these tests which may cause the tests to interfere with each
other.
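A minimal sketch of splitting tests by a parallel blocklist. The set name and its contents are hypothetical; the idea is that blocklisted files, such as ones that flip a global option like the autograd fallback mode, are pulled out of the parallel group and run one at a time.

```python
from typing import List, Set, Tuple

# Hypothetical blocklist: tests that set global options and may interfere
# with each other when run in parallel.
RUN_PARALLEL_BLOCKLIST: Set[str] = {"test_autograd_fallback"}


def split_serial_parallel(tests: List[str], blocklist: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (parallel, serial), preserving the input order in each group."""
    parallel = [t for t in tests if t not in blocklist]
    serial = [t for t in tests if t in blocklist]
    return parallel, serial
```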

Test Plan:
- python test/run_test.py -i test_autograd_fallback

NOTE to diff train oncall:
- You'll also need to modify the test/autograd/test_fallback.py TARGET in
caffe2/test/TARGETS since we renamed the file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106866
Approved by: https://github.com/soulitzer
2023-08-10 00:26:23 +00:00
7dfab082be Reordering tests experiment (#106347)
Companion with https://github.com/pytorch/test-infra/pull/4424

Uses the file rating generated by the test-infra PR to reorder tests. For each test file, sum the file ratings from the changed files in the PR, and put the tests in order of that sum.

A lot of tests are probably going to end up as "prioritized" since it takes anything with a rating > 0 right now.

Sharding is done twice: once on the prioritized tests, and once on the general (non-prioritized) tests. Prioritized tests have an order, so they should be sharded according to that order, while general tests don't have an order and are sharded by test time, which should result in more balanced shards.

I'll change the metric name before I merge; I want to quarantine my testing stuff from actual results.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106347
Approved by: https://github.com/ZainRizvi
2023-08-09 20:11:11 +00:00
6d43c89f37 [BE]: Update Ruff to 0.0.280 (#105724)
Removes unused loop values in Python dictionary iteration. Automated fix from Ruff master.
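An illustrative before/after of the kind of loop this Ruff fix rewrites (not taken from the actual diff): when only the keys are used, the unused value binding is dropped.

```python
from typing import Dict, List


def names_before(d: Dict[str, int]) -> List[str]:
    names = []
    for name, _value in d.items():  # value is bound but never used
        names.append(name)
    return names


def names_after(d: Dict[str, int]) -> List[str]:
    names = []
    for name in d.keys():  # iterate over only what the loop needs
        names.append(name)
    return names
```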

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105724
Approved by: https://github.com/ezyang, https://github.com/janeyx99
2023-07-22 23:03:34 +00:00
4cc1745b13 [BE] f-stringify torch/ and scripts (#105538)
This PR is a follow up on the pyupgrade series to convert more strings to use f-strings using `flynt`.

- https://docs.python.org/3/reference/lexical_analysis.html#f-strings
- https://pypi.org/project/flynt/

Command used:

```
flynt torch/ -ll 120
flynt scripts/ -ll 120
flynt tools/ -ll 120
```

and excluded `collect_env.py`
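An illustrative example of the conversion `flynt` performs (not taken from the actual diff): `%`-formatting and `str.format` calls become equivalent f-strings.

```python
name, count = "test_ops", 3

# Before: %-formatting and str.format
before_percent = "%s ran %d times" % (name, count)
before_format = "{} ran {} times".format(name, count)

# After: the equivalent f-string
after = f"{name} ran {count} times"
```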

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105538
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-07-21 19:35:24 +00:00
73e1455327 [BE] Enable ruff's UP rules and autoformat test/ (#105434)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105434
Approved by: https://github.com/albanD
2023-07-19 20:36:06 +00:00
ece19bf018 Update run_test.py to use TEST_WITH_SLOW_GRADCHECK flag (#104819)
Finishes the job from #104537. See https://github.com/pytorch/pytorch/pull/104537#pullrequestreview-1520065008
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104819
Approved by: https://github.com/huydhn
2023-07-11 21:58:46 +00:00
40b8d10d5e Re-land: Turn translation validation on for tests and accuracy runs by default. (#104467)
Re-landing: #103611

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104467
Approved by: https://github.com/malfet
2023-07-05 19:01:50 +00:00
ddd7da7546 Enable more tests (#104437)
Remove `test_segment_reductions` from the list of blocklisted tests. Remove the `@onlyCPU` qualifier from test_segment_reductions, as it has CUDA-specific parts.

Fixes https://github.com/pytorch/pytorch/issues/104410

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104437
Approved by: https://github.com/atalman, https://github.com/huydhn
2023-06-30 16:26:11 +00:00
a2a8b4d415 Revert "Turn translation validation on for tests and accuracy runs by default. (#103611)"
This reverts commit e311bed2a8e014f0ccf6fdc3fce11884982ac930.

Reverted https://github.com/pytorch/pytorch/pull/103611 on behalf of https://github.com/malfet due to Broke inductor tests ([comment](https://github.com/pytorch/pytorch/pull/103611#issuecomment-1614850276))
2023-06-30 15:54:18 +00:00
e311bed2a8 Turn translation validation on for tests and accuracy runs by default. (#103611)
This PR turns translation validation on by default for tests and accuracy benchmark
runs. It also installs Z3 on CI.

The main changes are:

- Add `--no-translation-validation` as an option in _test/run_test.py_
    - Set `PYTORCH_TEST_WITH_TV` environment variable
- Add `TEST_WITH_TV` variable in _torch/testing/_internal/common_utils.py_
- Turn translation validation on for accuracy benchmarks in _benchmarks/dynamo/common.py_
- Add Z3 installation on CI scripts
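A sketch of how the opt-out flag could be forwarded to test subprocesses through the environment. Only the flag and variable names come from the list above; treating `"1"` as "on" and the helper names `child_env` and `tv_enabled` are assumptions.

```python
import argparse
import os
from typing import Dict, List, Mapping, Optional


def child_env(argv: List[str], base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Translate the command-line flag into an env var for test subprocesses."""
    parser = argparse.ArgumentParser(prog="run_test")
    # On by default; the flag turns it off (store_false defaults to True).
    parser.add_argument("--no-translation-validation", dest="with_tv", action="store_false")
    args, _unknown = parser.parse_known_args(argv)
    env = dict(os.environ if base_env is None else base_env)
    env["PYTORCH_TEST_WITH_TV"] = "1" if args.with_tv else "0"
    return env


def tv_enabled(env: Mapping[str, str]) -> bool:
    """What a test-side helper might check (assumed: "1" means enabled)."""
    return env.get("PYTORCH_TEST_WITH_TV") == "1"
```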

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103611
Approved by: https://github.com/ezyang
2023-06-30 01:32:21 +00:00
c40f5edf7b Change tools search order (#104214)
Prevents the following cryptic error when one attempts to use `run_test.py` on a system that also has torchaudio installed in dev mode (`tools` from https://github.com/pytorch/audio might take precedence, which is not how the script should behave):
```
Unable to import test_selections from tools/testing. Running without test selection stats.... Reason: No module named 'tools.stats'
Traceback (most recent call last):
  File "/Users/nshulga/git/pytorch/pytorch/test/run_test.py", line 1673, in <module>
    main()
  File "/Users/nshulga/git/pytorch/pytorch/test/run_test.py", line 1604, in main
    selected_tests = get_selected_tests(options)
  File "/Users/nshulga/git/pytorch/pytorch/test/run_test.py", line 1418, in get_selected_tests
    path = os.path.join(str(REPO_ROOT), TEST_TIMES_FILE)
NameError: name 'TEST_TIMES_FILE' is not defined
```

But make sure to remove the added repo root path in the end; otherwise it will not work if torch is installed from a wheel but tests are run from a clean repo checkout.
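A minimal sketch of the add-then-remove search-path trick, assuming a hypothetical `import_from_repo` helper. Because Python caches modules in `sys.modules`, the repo-local package stays bound after `sys.path` is restored.

```python
import importlib
import sys
from types import ModuleType


def import_from_repo(module_name: str, repo_root: str) -> ModuleType:
    """Import a module with repo_root searched first, then restore sys.path.

    Putting repo_root at index 0 lets a repo-local package (e.g. `tools`)
    win over a same-named package installed elsewhere (e.g. by torchaudio).
    """
    sys.path.insert(0, repo_root)
    try:
        return importlib.import_module(module_name)
    finally:
        # Remove the entry again so later imports (e.g. of an installed torch
        # wheel) are not shadowed by the checkout.
        sys.path.remove(repo_root)
```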

### <samp>🤖 Generated by Copilot at dd52521</samp>

> _Sing, O Muse, of the cunning code review_
> _That fixed the tests of the `tools` module_
> _By adding and removing the root path_
> _As a shepherd guides his flock to and fro._
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104214
Approved by: https://github.com/kit1980
2023-06-27 15:54:34 +00:00
925f0a01c7 Do not pass stepcurrent option unless in CI (#104135)
This should allow one to run the same tests multiple times on a local machine.
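A sketch of gating the stepcurrent option on CI. Detecting CI through a non-empty `CI` environment variable is an assumption (GitHub Actions sets `CI=true`), and the `--sc=<test>_<shard>` key shape is modeled on the `--sc=test_ops_jit_1` argument in the CI log from #102107.

```python
import os
from typing import List, Mapping, Optional


def is_ci(env: Optional[Mapping[str, str]] = None) -> bool:
    """Assumption: CI systems such as GitHub Actions set CI to a non-empty value."""
    env = os.environ if env is None else env
    return bool(env.get("CI"))


def pytest_args(test_name: str, shard_id: int, ci: bool) -> List[str]:
    args = ["-v", "-rfEX"]
    if ci:
        # Per-run stepcurrent key; locally it is omitted so reruns start fresh.
        args.append(f"--sc={test_name}_{shard_id}")
    return args
```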

### <samp>🤖 Generated by Copilot at 740a92d</samp>

> _`pytest_args` change_
> _Only add `--sc` on CI_
> _Avoid conflicts - fall_

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104135
Approved by: https://github.com/huydhn, https://github.com/kit1980
2023-06-24 09:34:14 +00:00
63f66d19ea [Tests] Make run_test.py usable without boto3 (#104111)
There is a `HAVE_TEST_SELECTION_TOOLS` conditional, but it turns out it does not really work, so fix it by defining all missing prototypes and making it work as a single-shard instance.

Add a lint rule to test that it would succeed for running only test_cuda with a released version of PyTorch.
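A sketch of the guarded import with a single-shard fallback. The import path `tools.testing.test_selections` and the stub's signature are assumptions for illustration; the point is that every name used later is defined on both branches.

```python
from typing import Dict, List, Tuple

try:
    # Assumed location of the selection helpers; unavailable outside the repo
    # checkout or without their dependencies.
    from tools.testing.test_selections import calculate_shards  # type: ignore
    HAVE_TEST_SELECTION_TOOLS = True
except ImportError:
    HAVE_TEST_SELECTION_TOOLS = False

    def calculate_shards(  # type: ignore[no-redef]
        num_shards: int, tests: List[str], test_times: Dict[str, float]
    ) -> List[Tuple[float, List[str]]]:
        """Hypothetical stub: run everything as a single shard."""
        return [(0.0, list(tests))]
```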

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104111
Approved by: https://github.com/clee2000, https://github.com/ZainRizvi
2023-06-24 03:10:49 +00:00
98d513cabf [BE][Test] Remove --pytest option from run_test.py (#104125)
Because we always run tests with pytest now.

Marking it as `bc-breaking` as there could technically be some scripts depending on it somewhere...

### <samp>🤖 Generated by Copilot at 1760568</samp>

> _`pytest` option gone_
> _simpler test runner script_
> _autumn leaves fall fast_
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104125
Approved by: https://github.com/seemethere
2023-06-24 00:20:20 +00:00
7ac1c64bc4 Exclude _nvfuser from test collection (#104003)
The three files in this folder should instead be run by test_jit_cuda_fuser.py, test_nvfuser_dynamo.py, and test_nvfuser_frontend.py.
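One way to keep pytest from collecting a directory is a `collect_ignore` list in a `conftest.py`; this is an assumption about the mechanism, and the PR may have excluded the folder differently.

```python
# conftest.py in test/ (sketch): pytest skips these paths, resolved relative
# to this conftest's directory, during collection.
collect_ignore = ["_nvfuser"]
```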
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104003
Approved by: https://github.com/huydhn, https://github.com/jjsjann123
2023-06-22 19:46:45 +00:00
c3d3165f16 Enable uploading metrics and upload Test Reordering metrics to dynamodb (#102691)
Added a feature to upload test statistics to DynamoDB and Rockset using a new function `emit_metric` in `tools/stats/upload_stats_lib.py`.

Added metrics to measure test reordering effectiveness in `tools/testing/test_selections.py`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102691
Approved by: https://github.com/malfet
2023-06-12 23:01:53 +00:00
b52ee80cdc Revert "Add print statements to debug sharding error (#102713)"
This reverts commit c7873522c2ceefbc3b747224da1d26d566115c9a.

Reverted https://github.com/pytorch/pytorch/pull/102713 on behalf of https://github.com/clee2000 due to issue should be resolved now ([comment](https://github.com/pytorch/pytorch/pull/102713#issuecomment-1583334560))
2023-06-08 21:02:17 +00:00
591134f2a5 [CI] Enable UCC in CI (#100395)
UCC was temporarily disabled in #98832. This PR re-enables it with the necessary fix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100395
Approved by: https://github.com/atalman
2023-06-08 19:01:22 +00:00
c7873522c2 Add print statements to debug sharding error (#102713)
Sharding on ROCm is broken. I can't replicate it on dummy PRs even though it seems to happen pretty often on main, so I'm adding these print statements to increase my sample size. Hopefully this is enough print statements...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102713
Approved by: https://github.com/huydhn
2023-06-01 22:38:28 +00:00
c84f246c83 Improve time savings calculation math for test reordering (#102411)
Use a more accurate method that accounts for tests being run in parallel

Right now we still log results to the console, but later it'll get logged to Rockset for better tracking
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102411
Approved by: https://github.com/huydhn, https://github.com/malfet
2023-05-31 23:51:27 +00:00
a5ddb72aec Quick fix for keep-going + reruns (#102569)
Currently file level reruns + stepcurrent are incompatible and it's making PRs green when they are actually red, so turn off stepcurrent + file level reruns when keep-going is used until I figure out a better way to do this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102569
Approved by: https://github.com/huydhn, https://github.com/malfet
2023-05-31 04:46:25 +00:00
1a6ab8a5dc Revert "Quick fix for keep-going + reruns (#102569)"
This reverts commit 7f6edcf422d133b6fd747ec0775d1c840a91ee46.

Reverted https://github.com/pytorch/pytorch/pull/102569 on behalf of https://github.com/clee2000 due to broke a ton of stuff ([comment](https://github.com/pytorch/pytorch/pull/102569#issuecomment-1569167673))
2023-05-30 22:04:27 +00:00
7f6edcf422 Quick fix for keep-going + reruns (#102569)
Currently file level reruns + stepcurrent are incompatible and it's making PRs green when they are actually red, so turn off stepcurrent + file level reruns when keep-going is used until I figure out a better way to do this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102569
Approved by: https://github.com/huydhn
2023-05-30 21:29:56 +00:00
6e3e3dd477 Do not collect and skip non-disabled tests when rerunning disabled tests (#102107)
The console log blows up too much when running in rerun disabled tests mode (x50) e132f09e88. Each log is around 1GB and the whole uncompressed log set is ~50GB. After compression, it is around 1GB, still too big. The increase comes mainly from the multiple SKIPPED messages for non-disabled tests, which is expected due to how SkipTest and pytest-flakefinder currently work.

I updated `test/conftest.py` to completely ignore skipped tests when rerunning disabled tests, instead of collecting each one and then skipping it 50 times. The benefit of doing this is much greater than I originally expected:
  * Rerun disabled tests jobs now finish in less than half an hour, as they should
  * Fix OOM runner crash caused by too many collected tests
  * Fix verbosity issue, as now only disabled tests are run x50. There are only a few hundred of them at the moment
  * Fix timeout issues when rerunning disabled distributed and ASAN tests. They are just too slow when running at x50
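A sketch of deselecting at collection time with pytest's `pytest_collection_modifyitems` hook. `RERUN_DISABLED_TESTS` and `DISABLED_TESTS` are hypothetical placeholders for however the mode and the disabled-test list are actually supplied.

```python
from typing import Callable, List, Set, Tuple, TypeVar

T = TypeVar("T")

# Hypothetical inputs: whether we are in rerun-disabled-tests mode and the
# set of disabled test node IDs for this platform.
RERUN_DISABLED_TESTS = False
DISABLED_TESTS: Set[str] = set()


def partition(items: List[T], disabled: Set[str], key: Callable[[T], str]) -> Tuple[List[T], List[T]]:
    """Split items into (disabled tests to keep, everything else to deselect)."""
    keep = [i for i in items if key(i) in disabled]
    drop = [i for i in items if key(i) not in disabled]
    return keep, drop


def pytest_collection_modifyitems(config, items):
    """Drop non-disabled tests at collection time instead of collecting them
    and emitting a SKIPPED line for each of the x50 runs."""
    if not RERUN_DISABLED_TESTS:
        return
    keep, drop = partition(items, DISABLED_TESTS, key=lambda item: item.nodeid)
    items[:] = keep  # pytest expects the list to be modified in place
    if drop:
        config.hook.pytest_deselected(items=drop)
```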

### Testing

When rerunning disabled tests https://github.com/pytorch/pytorch/actions/runs/5084508614, only disabled tests on the platform are run. For example, `test_ops_jit` on https://ossci-raw-job-status.s3.amazonaws.com/log/13770164954 only ran 100 items: (`test_variant_consistency_jit_linalg_lu_cuda_float32` + `test_variant_consistency_jit_linalg_lu_factor_cuda_complex64`) x50.

```
Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops_jit.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '--sc=test_ops_jit_1', '--flake-finder', '--flake-runs=50', '--import-slow-tests', '--import-disabled-tests', '--rerun-disabled-tests'] ... [2023-05-25 21:32:49.763856]

Expand the folded group to see the log file of test_ops_jit 2/2
##[group]PRINTING LOG FILE of test_ops_jit 2/2 (/var/lib/jenkins/workspace/test/test-reports/test_ops_jit_h2wr_t2c.log)
Test results will be stored in test-reports/python-pytest/test_ops_jit/test_ops_jit-51a83bd44549074e.xml
============================= test session starts ==============================
platform linux -- Python 3.10.11, pytest-7.3.1, pluggy-1.0.0 -- /opt/conda/envs/py_3.10/bin/python
cachedir: .pytest_cache
hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
rootdir: /var/lib/jenkins/workspace
configfile: pytest.ini
plugins: hypothesis-5.35.1, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-11.1.2, shard-0.1.2, xdist-3.3.0, xdoctest-1.1.0
collecting ... collected 1084 items
Running 100 items in this shard: test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lu_cuda_float32 (x50), test/test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lu_factor_cuda_complex64 (x50)
stepcurrent: Cannot find last run test, not skipping

test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lu_cuda_float32 PASSED [2.1876s] [  1%]
test_ops_jit.py::TestJitCUDA::test_variant_consistency_jit_linalg_lu_factor_cuda_complex64 PASSED [4.5615s] [  2%]
```

* [pull](https://github.com/pytorch/pytorch/actions/runs/5093566864)
* [trunk](https://github.com/pytorch/pytorch/actions/runs/5095364311)
* [periodic](https://github.com/pytorch/pytorch/actions/runs/5095378850)
* [slow](https://github.com/pytorch/pytorch/actions/runs/5095390285)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102107
Approved by: https://github.com/clee2000, https://github.com/malfet
2023-05-27 12:10:36 +00:00
2232cce69c No cpp + step current (#102001)
Stepcurrent cannot handle xdist, so don't use it for C++ tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102001
Approved by: https://github.com/huydhn
2023-05-24 17:39:32 +00:00
d06802778e No need to run C++ tests under rerun disabled tests mode (#102132)
Per title. I extracted this part out of the draft PR that I'm working on, https://github.com/pytorch/pytorch/pull/102107, because
the remaining issues with rerun disabled tests (log size and unexpected runner failures) require further investigation, while this one is clearly breaking trunk at the moment.

Until we can support disabling C++ tests, there is no need to run them in rerun disabled tests mode.
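A minimal sketch of the skip check, assuming C++ tests are identified by the `cpp/` mock prefix introduced in #99559; `skip_reason` is a hypothetical helper that returns the message seen in the log below.

```python
from typing import Optional

CPP_PREFIX = "cpp/"  # mock path prefix for C++ tests (see #99559)


def skip_reason(test_name: str, rerun_disabled_tests: bool) -> Optional[str]:
    """Return a skip message for C++ tests in rerun mode, else None."""
    if rerun_disabled_tests and test_name.startswith(CPP_PREFIX):
        return "Skipping C++ tests when running under RERUN_DISABLED_TESTS mode"
    return None
```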

### Testing

Coming from https://github.com/pytorch/pytorch/pull/102107, for example https://github.com/pytorch/pytorch/actions/runs/5062224659/jobs/9087747981

```
2023-05-23T22:46:50.1953318Z Running cpp/basic 1/1 ... [2023-05-23 22:46:50.195077]
2023-05-23T22:46:50.1953847Z Skipping C++ tests when running under RERUN_DISABLED_TESTS mode
2023-05-23T22:46:50.2066032Z Running cpp/atest 1/1 ... [2023-05-23 22:46:50.206348]
2023-05-23T22:46:50.2066435Z Skipping C++ tests when running under RERUN_DISABLED_TESTS mode
2023-05-23T22:46:52.2666743Z No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
2023-05-23T22:46:52.2691817Z Ignoring disabled issues:  []
...
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102132
Approved by: https://github.com/clee2000
2023-05-24 07:45:48 +00:00
d26c8f26d1 Lower xdist processes from auto to NUM_PROCS (#102124)
This is to avoid CUDA OOM issues when running C++ tests both regularly and in memory leak check mode.
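A sketch of building the C++ test command with a fixed worker count instead of `-n auto` (pytest-xdist's option for roughly one worker per CPU). The value of `NUM_PROCS` here is a hypothetical choice, not the one `run_test.py` uses.

```python
import multiprocessing
from typing import List

# Hypothetical cap: a small fixed number of workers so several CUDA test
# processes don't exhaust GPU memory at once.
NUM_PROCS = max(1, min(2, multiprocessing.cpu_count()))


def cpp_test_command(binary: str, num_procs: int = NUM_PROCS) -> List[str]:
    # pytest-cpp collects the gtest binary; pytest-xdist's -n sets the worker count.
    return ["pytest", binary, "-v", "-n", str(num_procs)]
```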
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102124
Approved by: https://github.com/clee2000
2023-05-24 06:50:55 +00:00
f3fc531eee Check for pytest extensions in run_test (#100916)
Not very elegant.

Checked on a separate conda env that doesn't have the usual CI dependencies.

The two pytest extensions at fault are pytest-rerunfailures and pytest-shard; I also included pytest-flakefinder just in case.

No idea if this is a good way to do this.

Could also check individually and add flags based on that, but was told that requiring all the CI dependencies to be downloaded was also ok.
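A sketch of the presence check using `importlib.util.find_spec`, which locates a top-level module without importing it. The import names of the three plugins are assumptions.

```python
import importlib.util
from typing import List

# Assumed import names of the pytest plugins mentioned above.
REQUIRED_PLUGINS = ["pytest_rerunfailures", "pytest_shard", "pytest_flakefinder"]


def missing_modules(names: List[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

A caller could then fail early, e.g. `raise RuntimeError(f"Missing pytest plugins: {missing}")` when the list is non-empty.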
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100916
Approved by: https://github.com/huydhn
2023-05-17 20:27:55 +00:00
e3c9a1e5c4 Run dynamo tests in parallel (#101432)
Cuts off ~30 min per shard
(2 shards and 2 Python versions, so 2 hours total)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101432
Approved by: https://github.com/huydhn, https://github.com/desertfire, https://github.com/ZainRizvi
2023-05-17 20:26:24 +00:00
552b712f80 Run C++ testcases in parallel with pytest-xdist (#101440)
After an investigation, running C++ tests with https://github.com/pytest-dev/pytest-cpp is just slower than running them directly, plain and simple. I'm curious about the exact root cause, but that's a story for another day.

`time build/bin/test_lazy` takes half a minute to run 610 tests on `linux-bionic-cuda11.8-py3.10-gcc7 / test (default, 2, 5, linux.4xlarge.nvidia.gpu)` while `time pytest /var/lib/jenkins/workspace/build/bin/test_lazy -v` takes 20+ minutes on the same runner. This is a steep price to pay.

The saving grace here is that https://github.com/pytest-dev/pytest-cpp supports pytest-xdist to run tests in parallel with `-n auto`, so `time pytest /var/lib/jenkins/workspace/build/bin/test_lazy -v -n auto` takes only 3 minutes. This is still not as fast as running C++ tests directly, but it's an order of magnitude faster than running them sequentially.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101440
Approved by: https://github.com/clee2000
2023-05-16 21:52:36 +00:00
35834a405c Run C++ tests on CI with run_test.py (#99956)
After https://github.com/pytorch/pytorch/pull/99559, we can now run C++ tests with `run_test.py`. Although advanced features such as `--import-slow-tests` and `--import-disabled-tests` won't work for now, there will still be a gain in reliability and performance, as C++ tests can now be retried and run in parallel.

This covers all C++ tests in the CI including aten, libtorch, and Vulkan C++ tests across all platforms Linux, Windows, MacOS.

Notes:
* To support C++ test discovery, the env variable `CPP_TESTS_DIR` can be set to where the C++ test binaries are located
* Support pytest -k argument via run_test as this is used by pytest-cpp to replace `--gtest-filter`
* The XML output is in pytest format, but it's ok now because we don't have slow test or flaky test support for C++ tests yet
* ~~I need to figure out why conftest.py doesn't work when I invoke pytest directly for C++ test, so `--sc` is not available for C++ tests at the moment.  Proper pytest plugin like stepwise works fine though.  I'll investigate and fix it in a separate PR~~ Found the cause: `conftest.py` is per directory and needs to be in any arbitrary directory that holds C++ tests
* Two tests `test_api` and `test_tensorexpr` timed out on ASAN, I suspect that ASAN is now used on top of the python executable, which is slower than running native C++ code.  IMO, it's ok to run these tests as before on ASAN for now
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99956
Approved by: https://github.com/clee2000, https://github.com/ZainRizvi
2023-05-09 21:24:12 +00:00
cecfcf1e17 [MPS] Handle MPS failures of test_modules.py in common_modules.py (#95334)
- Also cleaned up `test_modules.py` from skipMPS code.
- Added `skipMPS` for unsupported or failing tests on MPS backend in common_modules.py.
   (We'll remove `skipMPS` from those tests once a fix is available for them.)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95334
Approved by: https://github.com/kulinseth, https://github.com/albanD
2023-05-09 03:55:16 +00:00
95f191a248 Always run prioritized tests first, even if they're expected to run serially (#100748)
Today, we prioritize running test files that were edited in the user's PR, with the idea being to run them before we run any other test.

Except, if the modified test is supposed to run serially, then we still end up running it after all the parallelized tests have finished running.

This PR fixes that to _always_ run the prioritized tests before the regular tests, regardless of whether the test is supposed to run serially or in parallel.
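A sketch of the ordering this PR aims for. Mapping a changed `test/<name>.py` to the test `<name>` and the helper names are assumptions for illustration.

```python
from typing import List, Set


def prioritized_from_changed_files(changed_files: List[str]) -> List[str]:
    """Hypothetical mapping: a changed test/<name>.py marks <name> as prioritized."""
    out = []
    for path in changed_files:
        if path.startswith("test/") and path.endswith(".py"):
            out.append(path[len("test/"):-len(".py")])
    return out


def execution_order(tests: List[str], prioritized: List[str], serial: Set[str]) -> List[str]:
    """Prioritized tests first (parallel, then serial), then the rest (parallel, then serial)."""
    prio_set = set(prioritized)
    first = [t for t in tests if t in prio_set]
    rest = [t for t in tests if t not in prio_set]

    def parallel_then_serial(group: List[str]) -> List[str]:
        return [t for t in group if t not in serial] + [t for t in group if t in serial]

    return parallel_then_serial(first) + parallel_then_serial(rest)
```

Before this change, a serial prioritized test would effectively have landed in the final serial group instead of the first group.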
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100748
Approved by: https://github.com/huydhn
2023-05-08 20:23:46 +00:00
a1f318daba Fix get_reordered_tests in run_test.py (#100752)
I think get_reordered_tests has been broken since the master -> main switch.

Add typing for some functions.

Checked for `prioritized` in the logs.

Limited testing, because I only care about one very small part of the log that's near the beginning.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100752
Approved by: https://github.com/huydhn
2023-05-05 22:46:56 +00:00
e88e92e7a2 Update to reruns + timeouts in run_test.py (#100412)
https://github.com/pytorch/pytorch/pull/100200/files made unknown tests more likely to fail, because they lack test times but still have timeouts, so fix that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100412
Approved by: https://github.com/huydhn
2023-05-01 21:51:53 +00:00
73645a8412 Add CUDA 12.1 CI workflows (#98832)
Adds CUDA 12.1 CI workflows, removes CUDA 11.7.
CC @malfet

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98832
Approved by: https://github.com/atalman
2023-05-01 16:25:53 +00:00
9075e3c2c6 Revert "Run test_fx_to_onnx_with_onnxruntime serially (#100298)"
This reverts commit 3a3f781f6cd90abbceb63a9cb59546d892ef899e.

Reverted https://github.com/pytorch/pytorch/pull/100298 on behalf of https://github.com/huydhn due to No need as https://github.com/pytorch/pytorch/pull/100297 has been landed ([comment](https://github.com/pytorch/pytorch/pull/100298#issuecomment-1528476786))
2023-04-29 02:07:39 +00:00
3a3f781f6c Run test_fx_to_onnx_with_onnxruntime serially (#100298)
This test started to fail out of nowhere in trunk.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100298
Approved by: https://github.com/kit1980
2023-04-29 00:51:25 +00:00
6ab9453ea9 File level rerun changes (#100200)
* change hook so that the test still gets saved in --sc when it fails in test setup (this caused an off-by-one error due to setup being called before the logreport hook)
* allow reruns for all tests now that --sc is used
* increase number of reruns now that --sc is used
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100200
Approved by: https://github.com/huydhn
2023-04-28 20:57:49 +00:00
ae5e1819a5 stepcurrent (#98035)
* add stepcurrent flag (--sc), based on the stepwise flag, that saves the currently running test so that test running can resume from the last successful test after segfaults; it takes an argument for a key so that different test runs don't overwrite each other
* send SIGINT to the process on timeout so that the XML report can still be written
* add currently unused stepcurrent skip flag (--scs), based on the stepwise skip flag, that skips the failing test; was going to use it for the keep-going label but had trouble with CI
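A sketch of the resume logic and the plugin shape, using pytest's cache (`config.cache.get`/`set`). The class and helper names are hypothetical, and this is not the actual `conftest.py` implementation.

```python
from typing import List, Optional


def items_to_run(node_ids: List[str], last_run: Optional[str], skip_last: bool = False) -> List[str]:
    """Resume after a crash from the test that was running when it happened.

    --sc-like behaviour restarts *at* last_run; --scs-like behaviour restarts
    *after* it. If last_run is unknown, nothing is skipped.
    """
    if last_run is None or last_run not in node_ids:
        return list(node_ids)
    start = node_ids.index(last_run) + (1 if skip_last else 0)
    return node_ids[start:]


class StepcurrentSketch:
    """Minimal plugin shape; register with config.pluginmanager.register(...)."""

    def __init__(self, config, key: str) -> None:
        self.config = config
        self.cache_key = f"stepcurrent/{key}"  # per-run key so runs don't clash

    def pytest_collection_modifyitems(self, config, items) -> None:
        last = self.config.cache.get(self.cache_key, None)
        keep = set(items_to_run([i.nodeid for i in items], last))
        items[:] = [i for i in items if i.nodeid in keep]

    def pytest_runtest_protocol(self, item, nextitem) -> None:
        # Record the test *before* it runs, so a segfault leaves it in the cache.
        # Returning None lets pytest's default protocol run the test.
        self.config.cache.set(self.cache_key, item.nodeid)
```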
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98035
Approved by: https://github.com/huydhn
2023-04-25 20:56:04 +00:00
96d3f3dee3 Discover and run C++ tests with run_test.py (#99559)
This depends on [pytest-cpp](https://github.com/pytest-dev/pytest-cpp) to discover and run C++ tests with pytest. C++ tests are built under `${WORKSPACE}/build/bin` directory and copied to the test job under the same path.

* To expose them to `run_test`, I chose to use the mock path prefix `cpp`; for example, `build/bin/c10_Array_test` would be named `cpp/c10_Array_test`, and `python test/run_test.py --cpp -i cpp/c10_Array_test` would run the test in the same way as other Python tests. I could copy them from `build/bin` to `test/cpp`, but they would be mixed with the source code and CMake files, so this looks easier
* Some executables under `build/bin` are not C++ tests, and they are excluded, for example `build/bin/torch_shm_manager`
* C++ tests need to run with pytest directly, as the python command doesn't understand them
* The change is gated by the new `--cpp` argument to `run_test.py`, for example `python test/run_test.py --cpp` will run all available C++ tests
* The tests can be run in parallel
* Failing tests can be retried with `--reruns=2` and `--sw`

```
============================= test session starts ==============================
platform darwin -- Python 3.9.15, pytest-7.2.0, pluggy-1.0.0 -- /Users/huydo/miniconda3/envs/py3.9/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/Users/huydo/Storage/mine/pytorch/test/.hypothesis/examples')
rootdir: /Users/huydo/Storage/mine/pytorch, configfile: pytest.ini
plugins: xdoctest-1.1.0, cpp-2.3.0, rerunfailures-10.3, shard-0.1.2, flakefinder-1.1.0, hypothesis-6.56.4, xdist-3.0.2, repeat-0.9.1
collecting ... collected 3 items / 2 deselected / 1 selected
Running 1 items in this shard: build/bin/scalar_tensor_test::TestScalarTensor.TestScalarTensorMPS
stepwise: skipping 2 already passed items.

../build/bin/scalar_tensor_test::TestScalarTensor::TestScalarTensorMPS RERUN [100%]
../build/bin/scalar_tensor_test::TestScalarTensor::TestScalarTensorMPS RERUN [100%]
../build/bin/scalar_tensor_test::TestScalarTensor::TestScalarTensorMPS FAILED [100%]
```

* `--import-slow-tests` and `--import-disabled-tests` won't work for now and that's ok to have it as a future task.
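A sketch of the discovery and naming scheme described above. The exclusion set and helper names are hypothetical; only the `cpp/` prefix, `build/bin`, and `CPP_TESTS_DIR` come from these commit messages.

```python
import os
from typing import Iterable, List, Set

CPP_PREFIX = "cpp"
# Hypothetical exclusion list: executables in build/bin that are not tests.
NOT_TESTS: Set[str] = {"torch_shm_manager"}


def cpp_test_names(filenames: Iterable[str], exclude: Set[str] = NOT_TESTS) -> List[str]:
    """Map binaries like c10_Array_test to run_test names like cpp/c10_Array_test."""
    return sorted(f"{CPP_PREFIX}/{name}" for name in filenames if name not in exclude)


def discover_cpp_tests(cpp_tests_dir: str) -> List[str]:
    """List executable files in cpp_tests_dir (e.g. build/bin or $CPP_TESTS_DIR)."""
    if not os.path.isdir(cpp_tests_dir):
        return []
    names = [
        entry.name
        for entry in os.scandir(cpp_tests_dir)
        if entry.is_file() and os.access(entry.path, os.X_OK)
    ]
    return cpp_test_names(names)


def to_binary_path(test_name: str, cpp_tests_dir: str) -> str:
    """Reverse mapping used when handing the binary to pytest-cpp."""
    return os.path.join(cpp_tests_dir, test_name[len(CPP_PREFIX) + 1:])
```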

I also added `pytest-cpp==2.3.0` to Linux Docker, MacOS, and Windows.

### Testing

Build PyTorch and run `python test/run_test.py --cpp` on my laptop. The CI change will come later in a separate PR. Also, running `python test/run_test.py --help` now shows all C++ tests discovered under `build/bin`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99559
Approved by: https://github.com/clee2000
2023-04-22 00:23:31 +00:00
7546972565 [BE] Refactoring test execution and improving comments (#99467)
Sharing code between the code that handles test results in parallel vs serial mode.

Note that the original version of this code had an inconsistency between the two versions where it would execute `print_to_stderr(err_message)` on every test that ran in parallel, but for serial tests it would only invoke `print_to_stderr(err_message)` if `continue_on_error` was also specified.  By sharing code, this PR changes that behavior to be consistent between the two modes.

Also adding some comments.

### <samp>🤖 Generated by Copilot at 029342c</samp>

> _Sing, O Muse, of the skillful coder who refined_
> _The PyTorch testing script, `run_test.py`, and shined_
> _A light on its obscure logic, with docstrings and comments_
> _And made it run more smoothly, with better error contents_
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99467
Approved by: https://github.com/huydhn, https://github.com/malfet
2023-04-19 19:29:07 +00:00
d41aa448b8 [ONNX] Run ONNX tests as part of standard run_test script (#99215)
### <samp>🤖 Generated by Copilot at dcbf7e2</samp>

### Summary
📝🧹🚩

<!--
1.  📝 for simplifying the `./scripts/onnx/test.sh` script
2.  🧹 for refactoring the `test/onnx/dynamo/test_exporter_api.py` file
3.  🚩 for adding the `--onnx` flag to `test/run_test.py` and updating the `TESTS` list
-->
This pull request improves the ONNX testing infrastructure in PyTorch by refactoring the test code, normalizing the scope names, adding a flag to run only the ONNX tests, and simplifying the test script.

> _To export PyTorch models to ONNX_
> _We refactored some scripts and contexts_
> _We used `common_utils`_
> _And normalized the scopes_
> _And added a flag to run the tests_

### Walkthrough
*  Simplify `./scripts/onnx/test.sh` to use `run_test.py` with `--onnx` flag instead of `pytest` ([link](https://github.com/pytorch/pytorch/pull/99215/files?diff=unified&w=0#diff-0017f5b22ae1329acb0f54af8d9811c9b6180a72dac70d7a5b89d7c23c958198L44-R46))
*  Remove `onnx` test from `TESTS` list in `test/run_test.py` ([link](https://github.com/pytorch/pytorch/pull/99215/files?diff=unified&w=0#diff-e72503c9e3e8766e2d1bacf3fad7b88aa166e0e90a7e103e7df99357a35df8d7L127-R127)). Replace with `onnx_caffe2`.
*  Add `onnx/test_pytorch_onnx_onnxruntime_cuda` and `onnx/test_models` tests to `blocklisted_tests` list in `test/run_test.py` ([link](https://github.com/pytorch/pytorch/pull/99215/files?diff=unified&w=0#diff-e72503c9e3e8766e2d1bacf3fad7b88aa166e0e90a7e103e7df99357a35df8d7R154-R155))
*  Add `ONNX_SERIAL_LIST` list to `test/run_test.py` to specify ONNX tests that must run serially ([link](https://github.com/pytorch/pytorch/pull/99215/files?diff=unified&w=0#diff-e72503c9e3e8766e2d1bacf3fad7b88aa166e0e90a7e103e7df99357a35df8d7R296-R301))
*  Add `ONNX_TESTS` list to `test/run_test.py` to store all ONNX tests ([link](https://github.com/pytorch/pytorch/pull/99215/files?diff=unified&w=0#diff-e72503c9e3e8766e2d1bacf3fad7b88aa166e0e90a7e103e7df99357a35df8d7R370))
*  Add `--onnx` flag to `parse_args` function in `test/run_test.py` to run only ONNX tests ([link](https://github.com/pytorch/pytorch/pull/99215/files?diff=unified&w=0#diff-e72503c9e3e8766e2d1bacf3fad7b88aa166e0e90a7e103e7df99357a35df8d7R920-R928))
*  Include `ONNX_SERIAL_LIST` in `must_serial` function in `test/run_test.py` to run ONNX tests serially or in parallel based on memory usage ([link](https://github.com/pytorch/pytorch/pull/99215/files?diff=unified&w=0#diff-e72503c9e3e8766e2d1bacf3fad7b88aa166e0e90a7e103e7df99357a35df8d7R1120))
*  Filter selected tests based on `--onnx` flag in `get_selected_tests` function in `test/run_test.py` to exclude non-ONNX tests ([link](https://github.com/pytorch/pytorch/pull/99215/files?diff=unified&w=0#diff-e72503c9e3e8766e2d1bacf3fad7b88aa166e0e90a7e103e7df99357a35df8d7R1158-R1165))
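A minimal sketch of how an `--onnx` flag, an `ONNX_TESTS` list, and an `ONNX_SERIAL_LIST` consulted by `must_serial` can fit together. The list contents are hypothetical (`onnx/test_heavy_model` is an invented name), and the function bodies are illustrative stand-ins, not the actual code in `test/run_test.py`.

```python
import argparse
from typing import List

# Hypothetical test names; the real lists live in test/run_test.py.
TESTS = ["test_nn", "test_torch", "onnx/test_utility_funs", "onnx/test_heavy_model"]
ONNX_TESTS = [t for t in TESTS if t.startswith("onnx/")]
# ONNX tests assumed too memory-hungry to share a machine with other tests.
ONNX_SERIAL_LIST = ["onnx/test_heavy_model"]


def parse_args(argv: List[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument("--onnx", action="store_true", help="Only run ONNX tests")
    return parser.parse_args(argv)


def must_serial(test: str) -> bool:
    # Tests on the serial list are never scheduled alongside other tests.
    return test in ONNX_SERIAL_LIST


def get_selected_tests(options: argparse.Namespace) -> List[str]:
    selected = list(TESTS)
    if options.onnx:
        # Exclude every non-ONNX test when --onnx is passed.
        selected = [t for t in selected if t in ONNX_TESTS]
    return selected
```

Keeping the serial decision in one predicate lets the scheduler split the selected tests into a serial group and a parallel group without knowing anything ONNX-specific.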

### Other minor changes to accommodate this change
*  Replace `unittest` module with `common_utils.TestCase` in `test/onnx/dynamo/test_exporter_api.py` ([link](https://github.com/pytorch/pytorch/pull/99215/files?diff=unified&w=0#diff-4545f0c15c73ebe90a875e9bee6c5ca4b6b92fb1ed0ec5560d1568e0f6339d02L4), [link](https://github.com/pytorch/pytorch/pull/99215/files?diff=unified&w=0#diff-4545f0c15c73ebe90a875e9bee6c5ca4b6b92fb1ed0ec5560d1568e0f6339d02L29-R28), [link](https://github.com/pytorch/pytorch/pull/99215/files?diff=unified&w=0#diff-4545f0c15c73ebe90a875e9bee6c5ca4b6b92fb1ed0ec5560d1568e0f6339d02L71-R70), [link](https://github.com/pytorch/pytorch/pull/99215/files?diff=unified&w=0#diff-4545f0c15c73ebe90a875e9bee6c5ca4b6b92fb1ed0ec5560d1568e0f6339d02L147-R146))
*  Import `TemporaryFileName` class from `common_utils` in `test/onnx/dynamo/test_exporter_api.py` ([link](https://github.com/pytorch/pytorch/pull/99215/files?diff=unified&w=0#diff-4545f0c15c73ebe90a875e9bee6c5ca4b6b92fb1ed0ec5560d1568e0f6339d02L19-R18))
*  Use `common_utils.TemporaryFileName` instead of `TemporaryFileName` in `TestDynamoExportAPI` class in `test/onnx/dynamo/test_exporter_api.py` ([link](https://github.com/pytorch/pytorch/pull/99215/files?diff=unified&w=0#diff-4545f0c15c73ebe90a875e9bee6c5ca4b6b92fb1ed0ec5560d1568e0f6339d02L92-R91), [link](https://github.com/pytorch/pytorch/pull/99215/files?diff=unified&w=0#diff-4545f0c15c73ebe90a875e9bee6c5ca4b6b92fb1ed0ec5560d1568e0f6339d02L110-R109), [link](https://github.com/pytorch/pytorch/pull/99215/files?diff=unified&w=0#diff-4545f0c15c73ebe90a875e9bee6c5ca4b6b92fb1ed0ec5560d1568e0f6339d02L129-R128))
*  Use `common_utils.run_tests` instead of `unittest.main` in `test/onnx/dynamo/test_exporter_api.py` ([link](https://github.com/pytorch/pytorch/pull/99215/files?diff=unified&w=0#diff-4545f0c15c73ebe90a875e9bee6c5ca4b6b92fb1ed0ec5560d1568e0f6339d02L155-R154))
*  Add `re` module to `test/onnx/test_utility_funs.py` ([link](https://github.com/pytorch/pytorch/pull/99215/files?diff=unified&w=0#diff-da71d2c81c9dc7ac0c47ff086fded82e4edcb67ba0cd3d8b5c983d7467343bc7R6))
*  Add `_remove_test_environment_prefix_from_scope_name` function to `test/onnx/test_utility_funs.py` to normalize scope names of ONNX nodes ([link](https://github.com/pytorch/pytorch/pull/99215/files?diff=unified&w=0#diff-da71d2c81c9dc7ac0c47ff086fded82e4edcb67ba0cd3d8b5c983d7467343bc7R32-R58))
*  Use `_remove_test_environment_prefix_from_scope_name` function to compare scope names of ONNX nodes in `TestUtilityFuns` class in `test/onnx/test_utility_funs.py` ([link](https://github.com/pytorch/pytorch/pull/99215/files?diff=unified&w=0#diff-da71d2c81c9dc7ac0c47ff086fded82e4edcb67ba0cd3d8b5c983d7467343bc7L1099-R1133), [link](https://github.com/pytorch/pytorch/pull/99215/files?diff=unified&w=0#diff-da71d2c81c9dc7ac0c47ff086fded82e4edcb67ba0cd3d8b5c983d7467343bc7L1119-R1152), [link](https://github.com/pytorch/pytorch/pull/99215/files?diff=unified&w=0#diff-da71d2c81c9dc7ac0c47ff086fded82e4edcb67ba0cd3d8b5c983d7467343bc7L1170-R1188), [link](https://github.com/pytorch/pytorch/pull/99215/files?diff=unified&w=0#diff-da71d2c81c9dc7ac0c47ff086fded82e4edcb67ba0cd3d8b5c983d7467343bc7L1181-R1199), [link](https://github.com/pytorch/pytorch/pull/99215/files?diff=unified&w=0#diff-da71d2c81c9dc7ac0c47ff086fded82e4edcb67ba0cd3d8b5c983d7467343bc7L1220-R1239), [link](https://github.com/pytorch/pytorch/pull/99215/files?diff=unified&w=0#diff-da71d2c81c9dc7ac0c47ff086fded82e4edcb67ba0cd3d8b5c983d7467343bc7L1235-R1258))
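Scope names differ between test environments because the test module may be imported as `test_utility_funs` or run as `__main__`. A regex-based normalizer of the kind described above might look like this sketch; the name `strip_test_env_prefix`, the prefix list, and the `<locals>` handling are assumptions, not a copy of the function in `test/onnx/test_utility_funs.py`.

```python
import re

# Hypothetical prefixes a test runner may prepend to module-qualified names.
PREFIXES_TO_REMOVE = ["test_utility_funs", "__main__"]


def strip_test_env_prefix(scope_name: str) -> str:
    """Hypothetical sketch: drop environment-specific module prefixes."""
    for prefix in PREFIXES_TO_REMOVE:
        # Remove "<prefix>." plus an optional "<func>.<locals>." qualifier
        # that appears when the module is defined inside a test function.
        scope_name = re.sub(rf"{re.escape(prefix)}\.(.*?<locals>\.)?", "", scope_name)
    return scope_name
```

For example, `strip_test_env_prefix("test_utility_funs.test_abc.<locals>.M")` and `strip_test_env_prefix("__main__.M")` both reduce to `"M"`, so assertions can compare against one expected scope name.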

Fixes #98626

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99215
Approved by: https://github.com/huydhn, https://github.com/titaiwangms
2023-04-19 06:17:47 +00:00
7ff1f3f3f6 Revert "Revert "Expandable blocks in allocator (#96995)"" (#99275)
This reverts commit 851e89c8e817f28270e0fc21d74ced9446bea747.

Differential Revision: [D45034526](https://our.internmc.facebook.com/intern/diff/D45034526)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99275
Approved by: https://github.com/eellison
2023-04-17 23:46:08 +00:00