Target determination sorts the tests in a PR CI run based on heuristics about which tests are more relevant to the PR's changes. This can provide faster CI signal and help alleviate capacity concerns, since job durations should decrease when failures are caught earlier.
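As a rough illustration (the scoring below is hypothetical, not the heuristics `run_test.py` actually uses), the reordering amounts to sorting test files by a relevance score against the changed files:
```
# Hypothetical sketch of test prioritization: order test files so the ones
# most relevant to the changed files run first.
def prioritize_tests(test_files, changed_files, relevance):
    """relevance(test_file, changed_files) -> float; higher means more relevant."""
    return sorted(test_files, key=lambda t: relevance(t, changed_files), reverse=True)

# Example heuristic: a test that shares path/name tokens with a changed file
# scores higher, so it runs (and can surface a failure) earlier in the job.
def shared_token_score(test_file, changed_files):
    tokens = set(test_file.replace("/", "_").replace(".", "_").split("_"))
    return sum(
        len(tokens & set(c.replace("/", "_").replace(".", "_").split("_")))
        for c in changed_files
    )
```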
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156545
Approved by: https://github.com/jeffdaily, https://github.com/clee2000
Sometimes a test file reports success according to pytest, but fails afterwards, and the rerun logic doesn't handle that correctly.
The name of the last test that ran is saved so reruns can be more efficient: the rerun targets that test instead of rerunning the entire file. This is usually correct. For example, if a test fails and pytest catches it, lastrun is the test that failed; if a test segfaults (which pytest doesn't catch), lastrun is the test that segfaulted. But sometimes pytest reports success while the process exits with a non-zero code; the two cases I know of are hangs and double frees at exit. In that case it's unclear which test caused the failure, so lastrun is set to the first test that ran in that session, so that the next session starts from the beginning in an attempt to reproduce the error (an alternative would be to just fail and not rerun, which might be the better option). But the rerun then happens with runsingle, which prevents lastrun from being reset (I'm not sure why; I'm fairly sure there's normally no difference between resetting and not). So lastrun becomes the last test that ran, which is not necessarily the test that caused the failure, and on the next run it starts from that last test and the process exits cleanly.
Short-term solution here: ensure lastrun is always set to the initial value if the session succeeds. This is correct even in the normal path, because the initial value shouldn't change in that case.
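A rough sketch of that invariant (the names and signature are hypothetical, not the actual variables in the rerun logic):
```
def end_of_session_lastrun(tests, lastrun_initial, exit_code, reported_success, lastrun):
    # Hypothetical sketch of the bookkeeping described above.
    if reported_success and exit_code == 0:
        # Clean session: always restore lastrun to its initial value (the
        # short-term fix). In the normal path this should be a no-op anyway.
        return lastrun_initial
    if reported_success and exit_code != 0:
        # pytest said "success" but the process exited non-zero (hang, double
        # free at exit): we don't know which test is to blame, so point lastrun
        # at the first test and replay the session from the beginning.
        return tests[0]
    # pytest itself saw the failure: lastrun already names the failing test.
    return lastrun
```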
Things that still need to be fixed:
* log says "running single test" which is not true
* no xml reports get generated here
* also no xml reports get generated on segfault
* docs for this
I think I have a PR that fixes the above, but it's old so I need to take another look.
Testing:
This is from when I was based on a commit that had a hang on macs, before I added the skips in inductor array ref:
cc862d2c14
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155853
Approved by: https://github.com/malfet
I guess this is more of an RFC
Goal:
Enable keep-going so that we get information about failures immediately. We want to be aware of failures as soon as possible, especially on the main branch, so that reverts can happen quickly.
Proposal:
A job with `keep-going` will continue through errors in `python run_test.py`. If a test fails, then before the next test runs it will upload a fake log containing enough information that viewing it tells you what failed (including any stack traces/error output) and that the log classifier can parse it to pick out a line.
I am building that log by concatenating the test logs in test/test-reports, which contain all the text output by pytest (unless someone runs with the `ci-verbose-test-logs` label). There are obviously many things this won't catch, e.g. output outside of run_test.py and some output inside of run_test.py, but it should be enough.
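Roughly, the log assembly would look like this sketch (the `.log` extension and directory layout are assumptions, not the final implementation):
```
import glob
import os

def build_fake_log(report_dir="test/test-reports"):
    # Stitch the per-test pytest output captured under test/test-reports into
    # one text blob that the log classifier can scan for a failure line.
    parts = []
    for path in sorted(glob.glob(os.path.join(report_dir, "**", "*.log"), recursive=True)):
        with open(path, errors="replace") as f:
            parts.append(f"=== {path} ===\n{f.read()}")
    return "\n".join(parts)
```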
Once the fake log is produced, it eventually gets uploaded to the ossci-raw-job-status s3 bucket, and the log classifier will read it from there to do classification. This means we will have to change the log classifier to read from this bucket as well.
I'm thinking of just adding an input parameter to the log classifier, like https://github.com/pytorch/test-infra/pull/6723/files
Also upload the temp results to a temp attribute instead of the real one
To overwrite the conclusion on HUD, I'm thinking of a lambda with an s3 put trigger on the fake log being put into s3. It would do something similar to the log classifier, mutating the entry (13a990b678/aws/lambda/log-classifier/src/network.rs (L85)) to add a new field like "will_fail": true, and it would also trigger the log classifier to run.
Then we change HUD/ClickHouse to point the raw log url at the alternate location, use the new "will_fail" field as the conclusion, and use the temp log classifier result if needed.
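A rough sketch of that trigger (the function name, payload shape, and the storage mutation are all assumptions; only the "add a will_fail field and rerun the classifier" idea comes from the proposal):
```
import json
import urllib.parse

import boto3

lambda_client = boto3.client("lambda")

def handler(event, context):
    # Fired on an s3 put of a fake log: flag the job as expected to fail and
    # kick the log classifier so HUD has a classification line to show.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        mark_will_fail(bucket, key)
        lambda_client.invoke(
            FunctionName="log-classifier",  # assumed function name
            InvocationType="Event",
            Payload=json.dumps({"bucket": bucket, "key": key}),
        )

def mark_will_fail(bucket, key):
    # Placeholder: in the real system this would mutate the stored job entry
    # (as the log classifier does in network.rs) to add "will_fail": true.
    print(f"would set will_fail=true for s3://{bucket}/{key}")
```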
Why always write to a temp attribute/column? I am unsure about overwriting the real results with fake ones.
Pros:
Not many changes outside of HUD/UI
Cons:
Lots of moving parts, lots of temp fields that will require adjustment for queries, temp fields never really get deleted
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155371
Approved by: https://github.com/malfet
This is a follow-up to the reverted PR https://github.com/pytorch/pytorch/pull/148981, re-opened for visibility:
Modified TorchInductor’s autotuning flow so that each best_config JSON file also includes the Triton “base32” (or base64) cache key.
Motivation
Debugging & Analysis: With this change, we can quickly identify which compiled binary and IRs belong to a given best config.
The impact is minimal since it is only an extra field in .best_config. It can help advanced performance tuning or kernel-level debugging.
Also, since Triton already stores cubin/hsaco in its cache, developers/researchers can avoid setting store_cubin = True: they can get the cubin/hsaco from the Triton cache, and with the code provided in this PR they can easily match the best_config with the right Triton cache directory for the "best" kernel.
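For illustration, a sketch of how the extra field could be used (the field name and the default cache location here are assumptions, not necessarily what the PR ships):
```
import json
import os

def triton_cache_dir_for(best_config_path):
    # Read the autotuner's best_config JSON and map the stored Triton cache key
    # to the cache directory that holds the matching cubin/hsaco and IRs.
    with open(best_config_path) as f:
        cfg = json.load(f)
    key = cfg["triton_cache_hash"]  # hypothetical field name
    cache_root = os.environ.get("TRITON_CACHE_DIR", os.path.expanduser("~/.triton/cache"))
    return os.path.join(cache_root, key)
```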
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154618
Approved by: https://github.com/jansel
Related: #148920
This PR:
* Provides a helper `install_cpp_extension(extension_root)` for building C++ extensions. This is intended to be used in `TestMyCppExtension.setUpClass()`
* Updates libtorch_agnostic tests to use this
* Deletes preexisting libtorch_agnostic tests from `test/test_cpp_extensions_aot.py`
* Fixes `run_test.py` to actually run the tests in `test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py` so we don't lose coverage. They weren't being run due to logic excluding tests that start with "cpp"; this is fixed now.
After this PR, it is now possible to run:
```
python test/cpp_extensions/libtorch_agnostic_extension/test/test_libtorch_agnostic.py
```
and the test will build the `libtorch_agnostic` extension before running the tests.
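A usage sketch along those lines (the import path is an assumption; only the helper name, its `extension_root` argument, and the setUpClass placement come from this PR):
```
import unittest

# Assumed import location; the PR only specifies the helper's name and that it
# is meant to be called from setUpClass.
from torch.testing._internal.common_utils import install_cpp_extension

class TestLibtorchAgnostic(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Build the extension once for the whole class before any test runs.
        install_cpp_extension("test/cpp_extensions/libtorch_agnostic_extension")

    def test_extension_imports(self):
        import libtorch_agnostic  # built by setUpClass above
        self.assertIsNotNone(libtorch_agnostic)

if __name__ == "__main__":
    unittest.main()
```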
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153264
Approved by: https://github.com/janeyx99
This is a follow-up to the reverted PR https://github.com/pytorch/pytorch/pull/147019:
Modified TorchInductor’s autotuning flow so that each best_config JSON file also includes the Triton “base32” (or base64) cache key.
Motivation
Debugging & Analysis: With this change, we can quickly identify which compiled binary and IRs belong to a given best config.
The impact is minimal since it is only an extra field in .best_config. It can help advanced performance tuning or kernel-level debugging.
Also, since Triton already stores cubin/hsaco in its cache, developers/researchers can avoid setting store_cubin = True: they can get the cubin/hsaco from the Triton cache, and with the code provided in this PR they can easily match the best_config with the right Triton cache directory for the "best" kernel.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148981
Approved by: https://github.com/davidberard98
There are only 118 failures at the moment; mark them all with xfail to avoid new regressions.
Add `xfail_if_mps_unimplemented` decorator to distinguish between tests that call unimplemented eager op vs ones that fail for some other reason.
Added an `aten._scaled_dot_product_attention_math_for_mps` fallback to make test behavior consistent between MacOS-15 (where the fallback is in place) and MacOS-14.
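For illustration, a rough sketch of what an `xfail_if_mps_unimplemented` decorator could look like (hypothetical, not the actual implementation):
```
import functools
import unittest

def xfail_if_mps_unimplemented(test_fn):
    # Hypothetical sketch: treat a NotImplementedError from the eager MPS op
    # as the expected failure, let any other exception surface as a real
    # failure, and flag an unexpected pass so stale xfails get cleaned up.
    @functools.wraps(test_fn)
    def wrapper(self, *args, **kwargs):
        try:
            test_fn(self, *args, **kwargs)
        except NotImplementedError as e:
            raise unittest.SkipTest(f"eager MPS op not implemented: {e}")
        raise RuntimeError("unexpected pass: the eager MPS op appears to be implemented now")
    return wrapper
```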
Weird MacOS-14 specific skips:
- test_torchinductor.py::GPUTests::test_cat_extern_kernel_mps
- test_torchinductor.py::GPUTests::test_sort_transpose_mps (likely an eager bug)
- test_torchinductor.py::GPUTests::test_unaligned_input_mps
Numerous MacOS-13 skips, including a few eager hard crashes; for example, running `test_torchinductor.py::GPUTests::test_scatter5_mps` causes
```
/AppleInternal/Library/BuildRoots/c651a45f-806e-11ed-a221-7ef33c48bc85/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSNDArray/Kernels/MPSNDArrayScatter.mm:309: failed assertion `Rank of destination array (1) must be greater than or equal to inner-most dimension of indices array (3)'
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150821
Approved by: https://github.com/ZainRizvi, https://github.com/dcci
ghstack dependencies: #151224, #151246, #151272, #151282, #151288
Enable TD on distributed cpu; I think the only reason it's not enabled is that I forgot to enable it.
Get rid of some of the statements that are no-ops:
* asan uses default shard
* nogpu got moved to periodic
* no windows cuda testing anymore
The only thing on pull and trunk that doesn't use TD is dynamo_wrapped, but I think it's fast enough to be OK for now; we can take another look after this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150028
Approved by: https://github.com/ZainRizvi
This PR adds two main parts:
- shim.h stable C APIs into torch::Library APIs
- a higher-level API in torch/csrc/stable/library.h that calls into this shim.h and is otherwise self-contained
Goal: custom kernel writers should be able to call the apis in the directories above in order to register their library in a way that allows their custom extension to run with a different libtorch version than it was built with.
Subplots resolved:
- Do we want a whole separate StableLibrary, or do we want to freeze torch::Library and add `m.stable_impl(cstring, void (*fn)(void **, int64_t, int64_t))` into it?
- Yes, we want a separate StableLibrary. We cannot freeze Library and it is NOT header only.
- Should I use uint64_t as the common denominator instead of void* to support 32-bit architectures better?
- Yes, and done
- Should I add a stable `def` and `fragment` when those can be done in python?
- I think we do want these --- and now they're done
- Where should library_stable_impl.cpp live? -- no longer relevant
- I need some solid test cases to make sure everything's going ok. I've intentionally thrown in a bunch of random dtypes into the signature, but I still haven't tested returning multiple things, returning nothing, complex dtypes, etc.
- Have since tested all the torch library endpoints. The others can be tested in a follow-up, to separate the components that need to be in shim.h from those that can be added later.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148124
Approved by: https://github.com/albanD, https://github.com/zou3519, https://github.com/atalman
Modified TorchInductor’s autotuning flow so that each `best_config` JSON file also includes the Triton “base32” (or base64) cache key.
**Motivation**
Debugging & Analysis: With this change, we can quickly identify which compiled binary and IRs belong to a given best config.
The impact is minimal since it is only an extra field in .best_config. It can help advanced performance tuning or kernel-level debugging.
Also, since Triton already stores cubin/hsaco in its cache, developers/researchers can avoid setting `store_cubin = True`: they can get the cubin/hsaco from the Triton cache, and with the code provided in this PR they can easily match the best_config with the right Triton cache directory for the "best" kernel.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147019
Approved by: https://github.com/davidberard98
The distributed tests are executed once for each backend and for each init method.
`$TEST_REPORT_SOURCE_OVERRIDE` is used such that test results from different backends are stored in different files.
The same needs to be done for the init method.
Move the setting of the variable into `test_distributed` and incorporate the init method into the name.
Useful for e.g. https://github.com/pytorch/pytorch/issues/126523
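A minimal sketch of the naming idea (the real change is in the shell test driver; the format string here is a guess, and only the variable name and the backend/init-method pairing come from this description):
```
import os

def set_report_source(backend: str, init_method: str) -> None:
    # Give each (backend, init method) combination its own report source so
    # XML results from one configuration don't overwrite another's.
    os.environ["TEST_REPORT_SOURCE_OVERRIDE"] = f"dist-{backend}-{init_method}"

set_report_source("gloo", "file")
```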
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148325
Approved by: https://github.com/clee2000
Reference: https://docs.astral.sh/ruff/formatter/black/#assert-statements
> Unlike Black, Ruff prefers breaking the message over breaking the assertion, similar to how both Ruff and Black prefer breaking the assignment value over breaking the assignment target:
>
> ```python
> # Input
> assert (
> len(policy_types) >= priority + num_duplicates
> ), f"This tests needs at least {priority+num_duplicates} many types."
>
>
> # Black
> assert (
> len(policy_types) >= priority + num_duplicates
> ), f"This tests needs at least {priority+num_duplicates} many types."
>
> # Ruff
> assert len(policy_types) >= priority + num_duplicates, (
> f"This tests needs at least {priority + num_duplicates} many types."
> )
> ```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144546
Approved by: https://github.com/malfet
Split test_transformers.py into test_transformers.py and test_transformers_privateuse1.py. Currently the privateuse1 test cases in test_transformers.py are skipped since they conflict with the cuda test cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147441
Approved by: https://github.com/drisspg
When the test times are generated, the generation step doesn't know what the build environment is, because it's an environment variable. But when we index into the test times, we (previously) didn't know what the job name was. These are usually the same, but sometimes they differ, and when they differ the lookup falls back to the default entry, which can result in unbalanced sharding.
I think the job name was added to most of the CI environments at some point without me realizing it, so we can now update this code to use the job name instead, so the generation and the indexing match.
also upload stats workflow for mps
Checked that inductor_amx doesn't use default
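Schematically, the lookup becomes something like this sketch (the key names are illustrative; the real test-times file has its own schema):
```
def get_test_time(test_times: dict, job_name: str, test_file: str) -> float:
    # Index by job name first, so sharding uses the same key the test-times
    # file was generated with; only fall back to "default" (which can produce
    # unbalanced shards) when there are no recorded times for this job.
    job_times = test_times.get(job_name) or test_times.get("default", {})
    return job_times.get(test_file, 0.0)
```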
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147154
Approved by: https://github.com/huydhn
Periodically run testsuite for s390x
**Dependencies update**
Package z3-solver is updated from version 4.12.2.0 to version 4.12.6.0. This is a minor version update, so no functional change is expected.
The reason for the update is the build on s390x. pypi doesn't provide binary builds of z3-solver 4.12.2.0 or 4.12.6.0 for s390x. Unfortunately, version 4.12.2.0 fails to build with the newer gcc used on the s390x builders, but those errors are fixed in version 4.12.6.0, so this minor version bump fixes the build on s390x.
```
# pip3 install z3-solver==4.12.2.0
...
In file included from /tmp/pip-install-756iytc6/z3-solver_ce6f750b780b4146a9a7c01e52672071/core/src/util/region.cpp:53:
/tmp/pip-install-756iytc6/z3-solver_ce6f750b780b4146a9a7c01e52672071/core/src/util/region.cpp: In member function ‘void* region::allocate(size_t)’:
/tmp/pip-install-756iytc6/z3-solver_ce6f750b780b4146a9a7c01e52672071/core/src/util/tptr.h:29:62: error: ‘uintptr_t’ does not name a type
29 | #define ALIGN(T, PTR) reinterpret_cast<T>(((reinterpret_cast<uintptr_t>(PTR) >> PTR_ALIGNMENT) + \
| ^~~~~~~~~
/tmp/pip-install-756iytc6/z3-solver_ce6f750b780b4146a9a7c01e52672071/core/src/util/region.cpp:82:22: note: in expansion of macro ‘ALIGN’
82 | m_curr_ptr = ALIGN(char *, new_curr_ptr);
| ^~~~~
/tmp/pip-install-756iytc6/z3-solver_ce6f750b780b4146a9a7c01e52672071/core/src/util/region.cpp:57:1: note: ‘uintptr_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
56 | #include "util/page.h"
+++ |+#include <cstdint>
57 |
```
**Python paths update**
On AlmaLinux 8 s390x, old paths:
```
python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())'
/usr/lib/python3.12/site-packages
```
Total result is `/usr/lib/python3.12/site-packages/torch;/usr/lib/python3.12/site-packages`
New paths:
```
python -c 'import site; print(";".join([x for x in site.getsitepackages()] + [x + "/torch" for x in site.getsitepackages()]))'
/usr/local/lib64/python3.12/site-packages;/usr/local/lib/python3.12/site-packages;/usr/lib64/python3.12/site-packages;/usr/lib/python3.12/site-packages;/usr/local/lib64/python3.12/site-packages/torch;/usr/local/lib/python3.12/site-packages/torch;/usr/lib64/python3.12/site-packages/torch;/usr/lib/python3.12/site-packages/torch
```
```
# python -c 'import torch ; print(torch)'
<module 'torch' from '/usr/local/lib64/python3.12/site-packages/torch/__init__.py'>
```
`pip3 install dist/*.whl` installs torch into `/usr/local/lib64/python3.12/site-packages`, and later it's not found by cmake with old paths:
```
CMake Error at CMakeLists.txt:9 (find_package):
By not providing "FindTorch.cmake" in CMAKE_MODULE_PATH this project has
asked CMake to find a package configuration file provided by "Torch", but
CMake did not find one.
```
https://github.com/pytorch/pytorch/actions/runs/10994060107/job/30521868178?pr=125401
**Builders availability**
Build took 60 minutes
Tests took: 150, 110, 65, 55, 115, 85, 50, 70, 105, 110 minutes (split into 10 shards)
60 + 150 + 110 + 65 + 55 + 115 + 85 + 50 + 70 + 105 + 110 = 975 minutes used. Let's double it. It would be 1950 minutes.
We have 20 machines * 24 hours = 20 * 24 * 60 = 20 * 1440 = 28800 minutes
We currently run 5 nightly binaries builds, each on average 90 minutes build, 15 minutes test, 5 minutes upload, 110 minutes total for each, 550 minutes total. Doubling would be 1100 minutes.
That leaves 28800 - 1100 = 27700 minutes total. Periodic tests would use 1950 minutes, which leaves 25750 minutes.
Nightly binaries builds + periodic tests = 1100 + 1950 = 3050 minutes.
25750 / 3050 = 8.44, so we could run both about 8 more times for additional CI runs for any reason, and that is with a pretty good safety margin.
**Skip test_tensorexpr**
On s390x, pytorch is built without llvm.
Even if it were built with llvm, llvm currently doesn't support the features used on s390x, and the test fails with errors like:
```
JIT session error: Unsupported target machine architecture in ELF object pytorch-jitted-objectbuffer
unknown file: Failure
C++ exception with description "valOrErr INTERNAL ASSERT FAILED at "/var/lib/jenkins/workspace/torch/csrc/jit/tensorexpr/llvm_jit.h":34, please report a bug to PyTorch. Unexpected failure in LLVM JIT: Failed to materialize symbols: { (main, { func }) }
```
**Disable cpp/static_runtime_test on s390x**
Quantization is not fully supported on s390x in pytorch yet.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125401
Approved by: https://github.com/malfet
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>