40 Commits

Author SHA1 Message Date
ffc7552e01 See if we can handle uploading all test data (#165484)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165484
Approved by: https://github.com/izaitsevfb
2025-10-15 19:57:41 +00:00
d88a8c41d5 Fix flaky "Upload test stats" job (#143991)
Test stat uploading was intermittently failing due to certain XML strings being opportunistically converted to numbers, when string output was expected. This PR makes the conversion behavior optional, which should fix the stat uploads.
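A minimal sketch of the opt-in conversion idea (the helper name and signature are illustrative, not the script's actual code):
```
def parse_value(value: str, convert_numbers: bool = False):
    # Leave the raw string alone unless the caller explicitly asks for numbers,
    # so fields that must stay strings (e.g. "1.10") are not mangled.
    if not convert_numbers:
        return value
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value
```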
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143991
Approved by: https://github.com/clee2000, https://github.com/huydhn
2024-12-30 21:40:01 +00:00
0db21a6b23 Remove most rockset references (#139922)
Remove most references to rockset:
* replace comments and docs with a generic "backend database"
* Delete `upload_to_rockset`, so we no longer need to install the package.
* Do not upload perf stats to rockset either (we should be completely on DynamoDB now, right @huydhn?)

According to VSCode, it went from 41 -> 7 instances of "rockset" in the repo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139922
Approved by: https://github.com/huydhn, https://github.com/ZainRizvi
2024-11-12 21:17:43 +00:00
6baee60e3c upload test stats: remove nan/inf when uploading (#136877)
`json.dumps(float("inf"))` returns `Infinity`, which is technically invalid json

This is fine if you load it back with `json.load`, but ClickHouse cannot handle it

Solution here: cast inf and nan to string (which ClickHouse is able to cast back to float)
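A small sketch of the cast (illustrative only; the real helper may walk the document differently):
```
import json
import math

def sanitize(obj):
    # Replace non-finite floats with strings so json.dumps emits valid JSON.
    if isinstance(obj, float) and not math.isfinite(obj):
        return str(obj)  # "inf", "-inf", or "nan"
    if isinstance(obj, dict):
        return {k: sanitize(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [sanitize(v) for v in obj]
    return obj

json.dumps(sanitize({"time": float("inf")}))  # '{"time": "inf"}'
```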
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136877
Approved by: https://github.com/huydhn
2024-10-01 21:47:46 +00:00
8a67daf283 [BE][Easy] enable postponed annotations in tools (#129375)
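Postponed annotations (PEP 563) are enabled per file with a single future import, roughly:
```
from __future__ import annotations  # annotations are stored as strings, evaluated lazily

def job_ids_by_name(jobs: dict[str, int]) -> list[int]:  # valid even on older Pythons
    return sorted(jobs.values())
```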
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129375
Approved by: https://github.com/malfet
2024-06-29 09:23:35 +00:00
a32ce5ce34 Revert "[BE][Easy] enable postponed annotations in tools (#129375)"
This reverts commit 59eb2897f1745f513edb6c63065ffad481c4c8d0.

Reverted https://github.com/pytorch/pytorch/pull/129375 on behalf of https://github.com/huydhn due to Sorry for reverting your change but I need to revert to cleanly revert https://github.com/pytorch/pytorch/pull/129374, please do a rebase and reland this ([comment](https://github.com/pytorch/pytorch/pull/129375#issuecomment-2197800541))
2024-06-29 00:44:25 +00:00
59eb2897f1 [BE][Easy] enable postponed annotations in tools (#129375)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129375
Approved by: https://github.com/malfet
2024-06-28 15:37:54 +00:00
7c00635125 [CI] Move gha artifact download before xml parsing for test stat uploads (#125609)
Move the GHA artifact download to before any XML parsing is done for upload-test-stats.

Do not download GHA artifacts during XML parsing, since they were already uploaded to S3 in the step above and will be picked up when all the artifacts are downloaded from S3.

The previous method resulted in duplicates if you ran the script again.

TODO: write a deduper so we don't have to worry at all
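A hypothetical sketch of such a deduper (the key fields are assumptions):
```
def dedupe(test_cases):
    # Keep the first occurrence of each test case; the key fields are assumed
    # to uniquely identify one run of one test.
    seen = set()
    unique = []
    for case in test_cases:
        key = (case.get("job_id"), case.get("workflow_run_attempt"),
               case.get("invoking_file"), case.get("classname"), case.get("name"))
        if key not in seen:
            seen.add(key)
            unique.append(case)
    return unique
```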
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125609
Approved by: https://github.com/huydhn
2024-05-09 20:35:09 +00:00
fd59554be6 Scripts to compile reruns + td exclusions and upload to s3 (#124312)
Edits upload_test_stats to also upload a condensed version that contains reruns, and one that contains the list of td_exclusions.

Grouped by build name + test config
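Roughly the grouping described above (the field names are assumptions):
```
from collections import defaultdict

def group_by_build_and_config(test_cases):
    # Bucket condensed rerun/exclusion records by (build name, test config).
    grouped = defaultdict(list)
    for case in test_cases:
        grouped[(case["build_name"], case["test_config"])].append(case)
    return dict(grouped)
```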
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124312
Approved by: https://github.com/malfet
2024-04-22 20:19:35 +00:00
0617f7fa75 [ez] Remove unused code in upload_test_stats (#111504)
This is code related to parallelism and test times that isn't used, so remove it.

Tested by running locally with `python3 -m tools.stats.upload_test_stats --workflow-run-id 6551035874 --workflow-run-attempt 1 --head-branch main --head-repository "pytorch/pytorch"` and commenting out parts for uploading to s3.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111504
Approved by: https://github.com/huydhn
2023-10-19 16:09:15 +00:00
c0a278d6f0 Upload all test stats only if the workflow is from pytorch/pytorch main (#105087)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105087
Approved by: https://github.com/huydhn
2023-07-14 17:07:48 +00:00
e9f2921bff Fix rerun disabled test uploading logic (#103476)
After https://github.com/pytorch/pytorch/pull/102107, rerunning disabled tests only collects and runs disabled tests.  A side effect of this change is that the skip message `Test is enabled but --rerun-disabled-tests verification mode is set, so only disabled tests are run` isn't in the test report anymore, as these non-disabled tests are not collected in the first place.  This breaks the logic in the uploading script that depends on this string to know whether the test report belongs to a rerun-disabled-tests workflow.

* This PR updates the logic in the `is_rerun_disabled_tests` check to count the number of times a test is run instead (see the sketch after this list).  In rerun-disabled-tests mode, a test is run 50 times by default and 15 times for distributed tests (to avoid timeout). Both numbers are larger than the max number of retries a test can get normally (3 x 3)
* This also removes the hacky `is_rerun_disabled_tests` check in `tools/stats/upload_test_stats.py`, as rerun-disabled-tests reports are now very small (50 x the number of disabled tests)
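A sketch of the run-count heuristic (thresholds as stated above; the function shape is an assumption):
```
MAX_RETRIES_IN_NORMAL_RUN = 3 * 3  # max reruns a test can get outside rerun-disabled-tests mode

def is_rerun_disabled_tests(run_counts: dict) -> bool:
    # run_counts maps test name -> number of times it appears in the report.
    # Rerun-disabled-tests mode runs each test 50 (or 15) times, well above 3 x 3.
    return any(count > MAX_RETRIES_IN_NORMAL_RUN for count in run_counts.values())
```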

### Testing

* `test_gradgrad_nn_GroupNorm_cuda_float64` now shows up correctly https://github.com/pytorch/pytorch/issues/98678
```
python3 -m tools.stats.check_disabled_tests --workflow-run-id 5229037746 --workflow-run-attempt 1 --repo "pytorch/pytorch"

Using temporary directory: /var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmpdojg5vq5
Downloading test-reports-test-default-1-4-linux.g5.4xlarge.nvidia.gpu_14154925022.zip
Downloading test-reports-test-default-1-4-linux.g5.4xlarge.nvidia.gpu_14154925093.zip
Downloading test-reports-test-default-2-4-linux.g5.4xlarge.nvidia.gpu_14154925167.zip
Downloading test-reports-test-default-2-4-linux.g5.4xlarge.nvidia.gpu_14154925226.zip
Downloading test-reports-test-default-3-4-linux.g5.4xlarge.nvidia.gpu_14154925295.zip
Downloading test-reports-test-default-3-4-linux.g5.4xlarge.nvidia.gpu_14154925371.zip
Downloading test-reports-test-default-4-4-linux.g5.4xlarge.nvidia.gpu_14154925453.zip
Downloading test-reports-test-default-4-4-linux.g5.4xlarge.nvidia.gpu_14154925536.zip
Downloading test-reports-test-slow-1-1-linux.2xlarge_14154853469.zip
Downloading test-reports-test-slow-1-1-linux.rocm.gpu_14154932523.zip
Downloading test-reports-test-slow-1-1-linux.rocm.gpu_14154932563.zip
Downloading test-reports-test-slow-1-2-linux.4xlarge_14154873704.zip
Downloading test-reports-test-slow-1-2-linux.g5.4xlarge.nvidia.gpu_14154931154.zip
Downloading test-reports-test-slow-1-2-linux.g5.4xlarge.nvidia.gpu_14154931186.zip
Downloading test-reports-test-slow-2-2-linux.4xlarge_14154873756.zip
Downloading test-reports-test-slow-2-2-linux.g5.4xlarge.nvidia.gpu_14154931225.zip
Downloading test-reports-test-slow-2-2-linux.g5.4xlarge.nvidia.gpu_14154931267.zip
Extracting test-reports-test-default-1-4-linux.g5.4xlarge.nvidia.gpu_14154925022.zip to unzipped-test-reports-test-default-1-4-linux.g5.4xlarge.nvidia.gpu_14154925022
Extracting test-reports-test-default-1-4-linux.g5.4xlarge.nvidia.gpu_14154925093.zip to unzipped-test-reports-test-default-1-4-linux.g5.4xlarge.nvidia.gpu_14154925093
Extracting test-reports-test-default-2-4-linux.g5.4xlarge.nvidia.gpu_14154925167.zip to unzipped-test-reports-test-default-2-4-linux.g5.4xlarge.nvidia.gpu_14154925167
Extracting test-reports-test-default-2-4-linux.g5.4xlarge.nvidia.gpu_14154925226.zip to unzipped-test-reports-test-default-2-4-linux.g5.4xlarge.nvidia.gpu_14154925226
Extracting test-reports-test-default-3-4-linux.g5.4xlarge.nvidia.gpu_14154925295.zip to unzipped-test-reports-test-default-3-4-linux.g5.4xlarge.nvidia.gpu_14154925295
Extracting test-reports-test-default-3-4-linux.g5.4xlarge.nvidia.gpu_14154925371.zip to unzipped-test-reports-test-default-3-4-linux.g5.4xlarge.nvidia.gpu_14154925371
Extracting test-reports-test-default-4-4-linux.g5.4xlarge.nvidia.gpu_14154925453.zip to unzipped-test-reports-test-default-4-4-linux.g5.4xlarge.nvidia.gpu_14154925453
Extracting test-reports-test-default-4-4-linux.g5.4xlarge.nvidia.gpu_14154925536.zip to unzipped-test-reports-test-default-4-4-linux.g5.4xlarge.nvidia.gpu_14154925536
Extracting test-reports-test-slow-1-1-linux.2xlarge_14154853469.zip to unzipped-test-reports-test-slow-1-1-linux.2xlarge_14154853469
Extracting test-reports-test-slow-1-1-linux.rocm.gpu_14154932523.zip to unzipped-test-reports-test-slow-1-1-linux.rocm.gpu_14154932523
Extracting test-reports-test-slow-1-1-linux.rocm.gpu_14154932563.zip to unzipped-test-reports-test-slow-1-1-linux.rocm.gpu_14154932563
Extracting test-reports-test-slow-1-2-linux.4xlarge_14154873704.zip to unzipped-test-reports-test-slow-1-2-linux.4xlarge_14154873704
Extracting test-reports-test-slow-1-2-linux.g5.4xlarge.nvidia.gpu_14154931154.zip to unzipped-test-reports-test-slow-1-2-linux.g5.4xlarge.nvidia.gpu_14154931154
Extracting test-reports-test-slow-1-2-linux.g5.4xlarge.nvidia.gpu_14154931186.zip to unzipped-test-reports-test-slow-1-2-linux.g5.4xlarge.nvidia.gpu_14154931186
Extracting test-reports-test-slow-2-2-linux.4xlarge_14154873756.zip to unzipped-test-reports-test-slow-2-2-linux.4xlarge_14154873756
Extracting test-reports-test-slow-2-2-linux.g5.4xlarge.nvidia.gpu_14154931225.zip to unzipped-test-reports-test-slow-2-2-linux.g5.4xlarge.nvidia.gpu_14154931225
Extracting test-reports-test-slow-2-2-linux.g5.4xlarge.nvidia.gpu_14154931267.zip to unzipped-test-reports-test-slow-2-2-linux.g5.4xlarge.nvidia.gpu_14154931267
Downloading test-reports-runattempt1-test-slow-1-1-linux.rocm.gpu_14154932523.zip
Downloading test-reports-runattempt1-test-slow-1-1-linux.rocm.gpu_14154932563.zip
Extracting test-reports-runattempt1-test-slow-1-1-linux.rocm.gpu_14154932523.zip to unzipped-test-reports-runattempt1-test-slow-1-1-linux.rocm.gpu_14154932523
Extracting test-reports-runattempt1-test-slow-1-1-linux.rocm.gpu_14154932563.zip to unzipped-test-reports-runattempt1-test-slow-1-1-linux.rocm.gpu_14154932563
The following 32 tests should be re-enabled:
  test_huge_index (__main__.TestCuda) from test_cuda.py
  test_conv_bn_fuse_cpu (__main__.CpuTests) from inductor/test_torchinductor.py
  test_multi_threads (__main__.TestTorchrun) from backends/xeon/test_launch.py
  test_huge_index (__main__.TestCuda) from test_cuda_expandable_segments.py
  test_memory_timeline_no_id (__main__.TestMemoryProfilerE2E) from profiler/test_memory_profiler.py
  test_inverse_errors_large_cuda_float64 (__main__.TestLinalgCUDA) from test_linalg.py
  test_trace_dependencies (__main__.TestAnalyze) from test_package.py
  test_caching_pinned_memory (__main__.TestCuda) from test_cuda_expandable_segments.py
  test_graph_concurrent_replay (__main__.TestCuda) from test_cuda_expandable_segments.py
  test_module_attribute_mutation_violation_negative_1 (__main__.MutationExportTests) from dynamo/test_export_mutations.py
  test_module_attribute_mutation_violation_negative_2 (__main__.MutationExportTests) from dynamo/test_export_mutations.py
  test_module_attribute_mutation_violation_negative_4 (__main__.MutationExportTests) from dynamo/test_export_mutations.py
  test_vmapjvpall_linalg_lu_cuda_float32 (__main__.TestOperatorsCUDA) from functorch/test_ops.py
  test_vmapjvpvjp_linalg_lu_cuda_float32 (__main__.TestOperatorsCUDA) from functorch/test_ops.py
  test_Conv2d_no_bias_cuda_tf32 (__main__.TestNN) from test_nn.py
  test_save_graph_repro (__main__.TestAfterAot) from dynamo/test_after_aot.py
  test_doc_examples (__main__.TestTypeHints) from test_type_hints.py
  test_caching_pinned_memory (__main__.TestCuda) from test_cuda.py
  test_graph_concurrent_replay (__main__.TestCuda) from test_cuda.py
  test_non_contiguous_tensors_nn_ConvTranspose1d_cuda_complex32 (__main__.TestModuleCUDA) from test_modules.py
  test_pickle_nn_RNN_eval_mode_cuda_float64 (__main__.TestModuleCUDA) from test_modules.py
  test_op_has_batch_rule_nn_functional_conv_transpose3d_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA) from functorch/test_vmap.py
  test_geometric_kstest_cuda_float32 (__main__.TestTorchDeviceTypeCUDA) from test_torch.py
  test_profiler_experimental_tree_with_memory (__main__.TestProfilerTree) from profiler/test_profiler_tree.py
  test_fs_pool (__main__.TestMultiprocessing) from test_multiprocessing.py
  test_forward_mode_AD_linalg_lu_factor_ex_cuda_complex128 (__main__.TestFwdGradientsCUDA) from test_ops_fwd_gradients.py
  test_vjp_linalg_lu_cuda_float32 (__main__.TestOperatorsCUDA) from functorch/test_ops.py
  test_inplace_grad_fmod_cuda_float64 (__main__.TestBwdGradientsCUDA) from test_ops_gradients.py
  test_inplace_gradgrad_remainder_cuda_float64 (__main__.TestBwdGradientsCUDA) from test_ops_gradients.py
  test_bottleneck_cuda (__main__.TestBottleneck) from test_utils.py
  test_comprehensive_empty_strided_cuda_int32 (__main__.TestInductorOpInfoCUDA) from inductor/test_torchinductor_opinfo.py
  test_vmapvjpvjp_linalg_lu_cuda_float32 (__main__.TestOperatorsCUDA) from functorch/test_ops.py
The following 11 are still flaky:
  test_transpose_with_norm (__main__.CPUReproTests) from inductor/test_cpu_repro.py, failing 215/215
  test_compare_cpu_linalg_pinv_singular_cuda_float32 (__main__.TestCommonCUDA) from test_ops.py, failing 100/100
  test_conv_bn_fuse_dynamic_shapes_cpu (__main__.DynamicShapesCodegenCpuTests) from inductor/test_torchinductor_codegen_dynamic_shapes.py, failing 115/115
  test_lobpcg (__main__.TestAutograd) from test_autograd.py, failing 50/50
  test_module_attribute_mutation_violation_negative_3 (__main__.MutationExportTests) from dynamo/test_export_mutations.py, failing 2/50
  test_Conv2d_dilated_cuda_tf32 (__main__.TestNN) from test_nn.py, failing 1/50
  test_grad_nn_GroupNorm_cuda_float64 (__main__.TestModuleCUDA) from test_modules.py, failing 50/50
  test_index_add_correctness (__main__.TestTorch) from test_torch.py, failing 22/50
  test_attn_cuda (__main__.TestMin) from functorch/test_dims.py, failing 1/50
  test_open_device_registration (__main__.TestCppExtensionOpenRgistration) from test_cpp_extensions_open_device_registration.py, failing 50/50
  test_gradgrad_nn_GroupNorm_cuda_float64 (__main__.TestModuleCUDA) from test_modules.py, failing 50/50
```

* Uploading test stats for rerunning disabled tests takes only about half a minute of user time

```
time python3 -m tools.stats.upload_test_stats --workflow-run-id 5229037746 --workflow-run-attempt 1 --head-branch main
31.94s user 2.94s system 44% cpu 1:19.07 total
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103476
Approved by: https://github.com/clee2000
2023-06-13 17:07:40 +00:00
2cea2edc27 [easy] Fix upload test stats after master -> main switch (#99924)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99924
Approved by: https://github.com/huydhn
2023-04-24 21:19:09 +00:00
7554c10899 Fix typos under tools directory (#97779)
Fix typos under tools directory

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97779
Approved by: https://github.com/clee2000, https://github.com/kit1980
2023-03-30 08:21:35 +00:00
712bd9ae88 Upload failed and rerun tests (#97304)
Upload data for any test that didn't cleanly succeed to S3 for ingestion by Rockset.

About 0.001% of tests fall under this category, keeping the data usage low.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97304
Approved by: https://github.com/clee2000
2023-03-22 22:03:56 +00:00
3df1a9baca Upload external contribution data to s3 (#95747)
Context: We want to create a metric panel to track external contributions to the PyTorch repo

This PR creates a daily job to track how many external contributions occurred the day before and uploads the count to an S3 collection which is accessible by Rockset.

`upload_external_contrib_stats.py` is a Python script which grabs the necessary stats from GitHub and sticks them into an S3 bucket. It is used here to do daily uploads, but can generally be used for larger queries as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95747
Approved by: https://github.com/huydhn, https://github.com/kit1980
2023-03-02 21:57:28 +00:00
06562529d2 Revert "Upload external contribution data to s3 (#95747)"
This reverts commit f418e1f8b63c0c15f52b373a57bfd9d65d02b172.

Reverted https://github.com/pytorch/pytorch/pull/95747 on behalf of https://github.com/clee2000 due to broke lint on master, merge base is too old, https://github.com/pytorch/pytorch/actions/runs/4315881630/jobs/7531170401 f418e1f8b6 (11721314649)
2023-03-02 17:34:14 +00:00
f418e1f8b6 Upload external contribution data to s3 (#95747)
Context: We want to create a metric panel to track external contributions to the PyTorch repo

This PR creates a daily job to track how many external contributions occurred the day before and uploads the count to an S3 collection which is accessible by Rockset.

`upload_external_contrib_stats.py` is a Python script which grabs the necessary stats from GitHub and sticks them into an S3 bucket. It is used here to do daily uploads, but can generally be used for larger queries as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95747
Approved by: https://github.com/huydhn, https://github.com/kit1980
2023-03-02 16:03:32 +00:00
1dcd2609b5 Add retries for get_workflow_job_id and try catch in upload_test_stats (#93401)
upload_test_stats keeps failing because it can't handle the case when the id is workflow-<workflow_id>, so add a try catch for this.

Add retries to get_workflow_job_id to try and reduce the number of times the id can't be found

Failure to upload test stats and inability to get the job id cause our sharding infra and slow test infra (and probably also flaky test detection) to be less effective.  This does not completely resolve the issue, since we still rely on the job id.

Failure to get the workflow job id happens tragically often; hopefully retries will help.
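The retry idea, as a hedged sketch (the real helper and its error handling may differ):
```
import time

def get_job_id_with_retries(fetch, attempts=5, backoff=2.0):
    # fetch() is whatever call resolves the workflow job id; retry it a few
    # times with a growing delay before giving up.
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(backoff * (attempt + 1))
```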
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93401
Approved by: https://github.com/huydhn
2023-02-01 18:33:32 +00:00
b8d3afd886 Skip upload test stats for test reports from rerun disabled tests workflow (#89548)
I have found the reason why uploading test stats fails for the rerun disabled tests workflow, for example https://github.com/pytorch/pytorch/actions/runs/3522896778/jobs/5917765699.  The problem is that the pytest XML file is now too big to be processed quickly (50x bigger). Unlike unittest, `pytest-flakefinder` used by rerun disabled tests for test_ops includes skipped messages multiple times (50 times by default, retrying and skipping).  This slows down the upload test stats script too much (O(n)) because it tries to gather all the stats. On the other hand, `check_disabled_tests` doesn't suffer from the same issue because it ignores all these skipped messages.

This is a quick fix to skip test reports from rerun disabled tests workflow when trying to upload test stats.

I'll try to fix this properly later in the way we use pytest-flakefinder. From what I see, a zipped test report from rerun disabled tests is only a few MB ([example](https://gha-artifacts.s3.amazonaws.com/pytorch/pytorch/3521687954/1/artifact/test-reports-test-default-1-2-linux.2xlarge_9636028803.zip)), but it balloons into a much bigger XML file after extraction, from a dozen to a few hundred MB of text.  The size of the zipped file is not a big immediate problem.

### Testing

[3521687954](https://github.com/pytorch/pytorch/actions/runs/3521687954) is an example workflow with rerun disabled tests and mem leak check.  The script can now finish when running locally:

* `upload_test_stats` finishes in around 3+ minutes
```
time python -m tools.stats.upload_test_stats --workflow-run-id 3521687954 --workflow-run-attempt 1 --head-branch master
...
Writing 8925 documents to S3
Done!
Writing 1760 documents to S3
Done!
Writing 1675249 documents to S3
Done!
python3 -m tools.stats.upload_test_stats --workflow-run-id 3521687954  1    185.69s user 12.89s system 75% cpu 4:22.82 total
```

* `check_disabled_tests` finishes within 3 minutes
```
time python -m tools.stats.check_disabled_tests --workflow-run-id 3521687954 --workflow-run-attempt 1 --repo pytorch/pytorch
...
python -m tools.stats.check_disabled_tests --workflow-run-id 3521687954  1    154.19s user 4.17s system 97% cpu 2:42.50 total
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89548
Approved by: https://github.com/clee2000
2022-11-23 22:39:39 +00:00
582c0833d5 mac circleci workflows (#82780)
Add Mac and iOS workflows to CircleCI so they can be run on pull requests.

M1 tests are not included because CircleCI doesn't have M1 machines.

Unsure how to get certain environment variables, specifically for arm64 iOS builds that require env vars like `IOS_SIGN_KEY_2022` and `IOS_DEV_TEAM_ID`, which are stored in the org-member context and not accessible by everyone.

doc regarding env vars https://docs.google.com/document/d/1J_3Z9sfu2vlHMF1fjdJfeTuxPXC6dgqJs7aU0KpYSBU/edit#

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82780
Approved by: https://github.com/malfet, https://github.com/huydhn
2022-08-26 18:48:48 +00:00
06a0cfc0ea pytest to run test_ops, test_ops_gradients, test_ops_jit in non linux cuda environments (#79898)
This PR uses pytest to run test_ops, test_ops_gradients, and test_ops_jit in parallel in non-Linux-CUDA environments to decrease TTS.  I am excluding Linux CUDA because running in parallel there results in errors due to running out of memory.

Notes:
* update hypothesis version for compatibility with pytest
* use rerun-failures to rerun tests (similar to flaky tests, although these test files generally don't have flaky tests)
  * reruns are denoted by a rerun tag in the xml.  Failed reruns also have the failure tag.  Successes (meaning that the test is flaky) do not have the failure tag.
* see https://docs.google.com/spreadsheets/d/1aO0Rbg3y3ch7ghipt63PG2KNEUppl9a5b18Hmv2CZ4E/edit#gid=602543594 for info on speedup (or slowdown in the case of slow tests)
  * expecting windows tests to decrease by 60 minutes total
* slow test infra is expected to stay the same - verified by running pytest and unittest on the same job and checking the number of skipped/run tests
* test reports to s3 changed - add an entirely new table to keep track of invoking_file times
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79898
Approved by: https://github.com/malfet, https://github.com/janeyx99
2022-07-19 19:50:57 +00:00
347b036350 Apply ufmt linter to all py files under tools (#81285)
With ufmt in place https://github.com/pytorch/pytorch/pull/81157, we can now use it to gradually format all files. I'm breaking this down into multiple smaller batches to avoid too many merge conflicts later on.

This batch (as copied from the current BLACK linter config):
* `tools/**/*.py`

Upcoming batches:
* `torchgen/**/*.py`
* `torch/package/**/*.py`
* `torch/onnx/**/*.py`
* `torch/_refs/**/*.py`
* `torch/_prims/**/*.py`
* `torch/_meta_registrations.py`
* `torch/_decomp/**/*.py`
* `test/onnx/**/*.py`

Once they are all formatted, BLACK linter will be removed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81285
Approved by: https://github.com/suo
2022-07-13 07:59:22 +00:00
769b446f18 [test stats] record invoking file in test cases (#80792)
See the comment for explanation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80792
Approved by: https://github.com/janeyx99
2022-07-05 17:04:36 +00:00
d6cd130e0b [ci] remove errant comment (#80711)
[skip ci]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80711
Approved by: https://github.com/janeyx99
2022-06-30 17:28:03 +00:00
4dd646d036 [test stats] flush stdout before uploading to rockset (#80487)
as title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80487
Approved by: https://github.com/janeyx99
2022-06-30 16:57:42 +00:00
4e12300e4e [ci] upload test stats to s3 instead of rockset directly (#80593)
Previously we were writing documents to Rockset directly using the write
API. This turned out to be a source of issues, occupying Rockset leaf
CPU and making other queries timeout. Also, we are starting to get rate
limited by Rockset, leading to data loss.

Instead, write test stats to s3 and let Rockset's managed integration
sync it. This appears to be significantly more efficient, and should
solve our throughput issues fundamentally.

Hopefully we can re-enable per-PR stats after this change, but let's see
how it does first.
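A sketch of the S3 write path (the bucket, key layout, and compression here are assumptions for illustration):
```
import gzip
import json
import boto3

def upload_docs_to_s3(bucket, key, docs):
    # Write newline-delimited JSON, gzipped, for the managed integration to sync.
    body = gzip.compress("\n".join(json.dumps(d) for d in docs).encode("utf-8"))
    boto3.resource("s3").Object(bucket, key).put(
        Body=body, ContentEncoding="gzip", ContentType="application/json"
    )
```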
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80593
Approved by: https://github.com/janeyx99
2022-06-30 15:17:56 +00:00
d67ce755ad [ci] only upload summarized test stats for PRs (#80295)
We are uploading too many tests which is stressing out Rockset. We do
not really use granular per-test statistics for PRs, so change to
uploading summary information for PRs.

We will still be uploading per-test stats from master.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80295
Approved by: https://github.com/seemethere, https://github.com/malfet, https://github.com/janeyx99
2022-06-27 18:40:39 +00:00
8878bc7bfe Revert "[ci] only upload summarized test stats for PRs (#80295)"
This reverts commit 5d595182c7151b8dfae3a9b42569e22657bc5fbc.

Reverted https://github.com/pytorch/pytorch/pull/80295 on behalf of https://github.com/suo due to reverting and relanding with a small bugfix
2022-06-27 18:33:05 +00:00
5d595182c7 [ci] only upload summarized test stats for PRs (#80295)
We are uploading too many tests which is stressing out Rockset. We do
not really use granular per-test statistics for PRs, so change to
uploading summary information for PRs.

We will still be uploading per-test stats from master.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80295
Approved by: https://github.com/seemethere, https://github.com/malfet
2022-06-27 18:24:15 +00:00
ea934a580c [ci] Pull out generic upload_test_stats functionality
I want to re-use this to upload sccache stats.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79367

Approved by: https://github.com/janeyx99
2022-06-13 16:09:53 +00:00
03a1bb61fc [ci] improvements to test stats uploading
Two improvements that are useful for `testsuite` uploading:
- When there are multiple tags of the same name, coalesce them into a
  list. This allows us to capture, e.g. when a `testsuite` contains
  multiple `testcase`s.
- Be able to skip a tag. We don't really want the inner `testcase` tags,
  since we upload those separately.
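A hedged sketch of both behaviours (tag names and dict shape are assumptions):
```
import xml.etree.ElementTree as ET

def element_to_dict(element, skip_tags=("testcase",)):
    # Start from the element's attributes, then fold in children; repeated
    # child tags are coalesced into a list, and skipped tags are dropped
    # because they are uploaded separately.
    result = dict(element.attrib)
    for child in element:
        if child.tag in skip_tags:
            continue
        value = element_to_dict(child, skip_tags)
        if child.tag in result:
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result

root = ET.fromstring(
    "<testsuite tests='2'><testcase name='a'/>"
    "<property name='p'/><property name='q'/></testsuite>"
)
element_to_dict(root)
# {'tests': '2', 'property': [{'name': 'p'}, {'name': 'q'}]}
```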

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79364

Approved by: https://github.com/janeyx99
2022-06-13 16:09:51 +00:00
eaaa34daef [ci] write test suites to rockset
Currently we upload all `testcase` elements as individual test runs to
Rockset. It would be nice to also have `testsuite`s as well, which
aggregate high level information.

These aggregations could technically be performed in the backend, but it's
faster to just log the data since we already have it in the XML test
report.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79265

Approved by: https://github.com/seemethere
2022-06-10 15:38:09 +00:00
b26c5b4638 [ci] refactor upload_test_stats + add unit test
Clean some things up and add a unit test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79210

Approved by: https://github.com/janeyx99
2022-06-10 15:38:09 +00:00
e99c8ec3c2 [ci] use TemporaryDirectory in upload_test_stats.py
This replaces some of the manual work we were doing with the Python
standard library.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79152

Approved by: https://github.com/janeyx99
2022-06-09 16:21:28 +00:00
298b9ad708 [ci] harden test stats reporting
1. Fix bug where we were uploading stats to the wrong place in s3
2. Hard error when upload_test_stats.py didn't find any stats in s3
3. Change the name of the workflow job to include the corresponding id,
to make debugging easier in the future.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78763

Approved by: https://github.com/janeyx99, https://github.com/seemethere
2022-06-03 15:21:19 +00:00
a1f0e69519 [ci] improve upload test stats logging
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78179

Approved by: https://github.com/janeyx99
2022-05-24 17:32:20 +00:00
2c0268c41f [ci] fix bugs in test stats upload
- Run attempt detection was broken because it was comparing a str
  (retrieved from the CLI input) to an int (retrieved from the
  filename). Make them both ints so they will actually compare equal.
- `root.findall` only searches direct children, which didn't work for cpp
  unittests and pytest-generated reports. Change to `root.iter` which
  does a recursive search.
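A tiny illustration of the second fix:
```
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<testsuites><testsuite><testcase name='a'/></testsuite></testsuites>"
)
len(root.findall("testcase"))     # 0 - only direct children are searched
len(list(root.iter("testcase")))  # 1 - recursive search finds nested cases
```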

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76982

Approved by: https://github.com/janeyx99
2022-05-06 21:37:49 +00:00
8e95a2cda6 [wip] Correctly use pip3 and python3 for upload-test-stats
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75456

Approved by: https://github.com/ZolotukhinM
2022-04-08 03:36:10 +00:00
9b563d6468 initial test stats work
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74406

Approved by: https://github.com/atalman, https://github.com/janeyx99
2022-04-06 20:40:04 +00:00