Commit Graph

674 Commits

Author SHA1 Message Date
674d59359d [ROCm] Enable dist sharded_tensor test suites (#137724)
Following test suites are enabled on ROCm
test_sharded_tensor
test_sharded_tensor_reshard
test_sharding_plan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137724
Approved by: https://github.com/jithunnair-amd, https://github.com/pruthvistony, https://github.com/malfet
2024-10-14 20:20:57 +00:00
47af7cc962 Add compiler bisector (#131936)
This is a utility to aid the torch.compile debugging. You provide a function that returns True on success, False on failure, or do something out of process and run bisect_helper `good | bad`.

The bisector will first go through backends - `eager`, `aot_eager`, `aot_eager_decomp_partition`, `inductor` to find the first failing backend. Then, it will go through subsystems within the backend - currently limited but could be expanded - and try to find the first subsystem for which disabling fixes the problem. Once it has found the failing subsystem, it will find the number of times the subsystem is applied, and then bisect through it.

An example usage of how to hook it up for aot_eager_decomp_partition and decomposition subsystem is :

```
    from torch._inductor.bisect_helper import BisectionManager
    if op in CURRENT_DECOMPOSITION_TABLE:
        if BisectionManager.disable_subsystem("aot_eager_decomp_partition", "decomposition", lambda: repr(op)):
            return NotImplemented
```

Once it has discovered the problematic change, it will print out the associated debug info, and you can set the same limits with `TORCH_BISECT_BACKEND` `TORCH_BISECT_SUBSYSTEM` and `TORCH_BISECT_MAX`.

We could add further options as an automated way of going through a check list for checking divergence - e.g., the mode to emulate amp casts.

Fix for https://github.com/pytorch/pytorch/issues/126546

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131936
Approved by: https://github.com/ezyang
2024-10-09 20:34:11 +00:00
e27c0048db Enable additional tests for MPS CI runs (#134356)
As part of the follow up for https://github.com/pytorch/pytorch/issues/133520, adapting existing unused tests for use in MPS CI runs. Focusing on nhwc & other memory formatting tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134356
Approved by: https://github.com/malfet, https://github.com/eqy, https://github.com/huydhn
2024-10-04 21:52:38 +00:00
a619ced5ed Revert "Update run_test.py"
This reverts commit 193073b4914a7f80758541d391eacbe21194ecdf.
2024-09-26 17:34:52 -07:00
193073b491 Update run_test.py 2024-09-26 16:56:29 -07:00
74fd1bf965 [ROCm] Update to AOTriton 0.7b (#134498)
Notable changes:
1. Enable CudaGraph related tests
2. Fix UT problems
3. EXPERIMENTAL Navi31 support. User should enable Navi31 support with Env Var `TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1`

Know Problem:
1. `test/test_transformers.py` will massive failures and/or NaN outputs with `--use-pytest`
    + Update: Confirmed skip `class TestSDPAPrivateUse1Only` can fix the problem with `--use-pytest`

Note:
AOTriton 0.7b adds support to nestedtenosrs+SDPA but need more work (and consequently a separate PR) to enable it.

Fixes #133540

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134498
Approved by: https://github.com/pruthvistony, https://github.com/jeffdaily, https://github.com/malfet
2024-09-11 20:34:01 +00:00
16b8146c9e Exclude test_transformers and unit tests which require recent GPU arch (#132895)
This PR is to exclude test_transformers on ROCm temporarily and skip some unit tests which require recent GPU arch.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132895
Approved by: https://github.com/jithunnair-amd, https://github.com/pruthvistony, https://github.com/malfet
2024-08-27 20:40:53 +00:00
1565940114 [MPS] Add test/test_nn.py to test suite (#134184)
This PR increases test coverage by including the tests in `test/test_nn.py` in the test suite of MPS.

Some of the tests are decorated with `@expectedFailureMPS` for various reasons. Either that the op is not implemented, or that the outputs do not align. Those tests that contain differing results should be investigated further to rule out any live bugs.

```bash
$ python test/run_test.py --mps --verbose -k TestNN
Running test batch 'tests to run' cost 84.76 seconds
```

Ref #133520

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134184
Approved by: https://github.com/albanD, https://github.com/malfet
2024-08-26 23:48:23 +00:00
28a4db84f2 [ARM] Fix infinite recursion in unwind (#134387)
Fixes #119905

The `TORCH_SHOW_CPP_STACKTRACES=1` setting on ARM causes infinite recursive unwind because on failure a `StackTraceFetcher` attempts to unwind the <ins>failed instruction</ins>: 5ad759ca33/torch/csrc/profiler/combined_traceback.cpp (L25)
then the unwind itself fails:
5ad759ca33/torch/csrc/profiler/unwind/unwind.cpp (L10-L12)
and it causes another attempt to unwind the failure in `unwind()`...

In summary, the executed instruction is equivalent to:
```C++
std::vector<void*> unwind() {
  // some instructions ...
  return unwind();
}
```
This PR replaces `TORCH_CHECK` by `TORCH_WARN_ONCE` as it will not cause an uncontrolled recursion. The only side effect would be an empty back-trace.

Huge thanks to @nWEIdia who found the root cause!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134387
Approved by: https://github.com/eqy, https://github.com/nWEIdia, https://github.com/malfet
2024-08-26 21:02:31 +00:00
99cf567714 Make SCRIBE_GRAPHQL_ACCESS_TOKEN available to test jobs running on main (#133536)
It is possible to write to Meta's internal in-memory database Scuba via the Scribe Graph API: https://www.internalfb.com/intern/wiki/Scribe/users/Knowledge_Base/Interacting_with_Scribe_categories/Graph_API/ This is currently being used by pytorch/benchmark repo to upload torchbench performance results.

I want to make this API generally available to all jobs running on CI in a semi-trusted context. To talk to Scribe, you need a secret access token. I have initially configured an environment prod-branch-main which contains `SCRIBE_GRAPHQL_ACCESS_TOKEN`, and switched a single class of jobs (linux-test) to use this environment when they are running on the main branch. Because we require approvals for running CI on untrusted contributions, we could potentially allow all jobs to run in this environment, including jobs on PRs, but I don't need this for my use case (per-PR benchmark result reporting, and miscellaneous statistics on main.)

If this works, I'll push out this environment to the rest of our test jobs.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133536
Approved by: https://github.com/xuzhao9, https://github.com/malfet, https://github.com/albanD
2024-08-15 19:53:17 +00:00
a6ad834fa8 Fix counting execution time in run_test.py (#133199)
Counting `elapsed_time` immediately after `start_time`, not reflect real execution time of `test_batch`.

Move `elapsed_time` and print method after `run_tests` method call to fix it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133199
Approved by: https://github.com/clee2000
2024-08-15 15:29:44 +00:00
72f2b29bb0 [CI] disable xpu kineto build (#133069)
Due to the xpu kineto support PR https://github.com/pytorch/pytorch/pull/130811 landed, but the xpu ci infra not ready for now. Disable kineto build as a temp WA
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133069
Approved by: https://github.com/seemethere
2024-08-09 23:58:50 +00:00
4226ed1585 [BE] Format uncategorized Python files with ruff format (#132576)
Remove patterns `**`, `test/**`, and `torch/**` in `tools/linter/adapters/pyfmt_linter.py` and run `lintrunner`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132576
Approved by: https://github.com/ezyang, https://github.com/Skylion007
ghstack dependencies: #132574
2024-08-04 17:13:31 +00:00
5cc34f61d1 [CI] add new test config label ci-test-showlocals to control test log verbosity (#131981)
Add a new label `ci-test-showlocals` and add it to test config filter.
If the PR is labeled with `ci-test-showlocals` or "ci-test-showlocals"
present in the PR comment, the test config filter will set a environment
variable `TEST_SHOWLOCALS`. Then `pytest` will show local variables on
failures for better debugging.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131981
Approved by: https://github.com/malfet
ghstack dependencies: #131151
2024-07-29 18:53:14 +00:00
4694ee1ad2 [BE][tests] show local variables on failure in tests (#131151)
------

As per the title, add argument `--locals` for `unittest` and `--showlocals --tb=long` for `pytest` in CI.

Some failures cannot be reproduced on the local machine but exist on cloud CI. This change allows us to investigate the test failure more easily.

Example output: https://github.com/pytorch/pytorch/actions/runs/9961546996/job/27523888353?pr=130710#step:20:3361

```text
/opt/conda/envs/py_3.8/lib/python3.8/site-packages/sympy/core/function.py:307:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

cls = FloorDiv, base = -1.00000000000000, divisor = -1.00000000000000

    @classmethod
    def eval(cls, base, divisor):
        # python test/test_dynamic_shapes.py -k TestDimConstraints.test_dim_constraints_solve_full
        # Assert triggered by inequality solver
        # assert base.is_integer, base
        # assert divisor.is_integer, divisor

        # We don't provide the same error message as in Python because SymPy
        # makes it difficult to check the types.
        if divisor.is_zero:
            raise ZeroDivisionError("division by zero")
        if base in (int_oo, -int_oo, sympy.oo, -sympy.oo) and divisor in (
            int_oo,
            -int_oo,
            sympy.oo,
            -sympy.oo,
        ):
            return sympy.nan
        if base is sympy.nan or divisor is sympy.nan:
            return sympy.nan

        if base.is_zero:
            return sympy.S.Zero
        if base.is_integer and divisor == 1:
            return base
        if base.is_integer and divisor == -1:
            return sympy.Mul(base, -1)
        if (
            isinstance(base, sympy.Number)
            and isinstance(divisor, sympy.Number)
            and (
                base in (int_oo, -int_oo, sympy.oo, -sympy.oo)
                or divisor in (int_oo, -int_oo, sympy.oo, -sympy.oo)
            )
        ):
            r = float(base) / float(divisor)
            if r == math.inf:
                return int_oo
            elif r == -math.inf:
                return -int_oo
            elif math.isnan(r):
                return sympy.nan
            else:
                return sympy.Integer(math.floor(r))
        if isinstance(base, sympy.Integer) and isinstance(divisor, sympy.Integer):
            return sympy.Integer(int(base) // int(divisor))
        if isinstance(base, FloorDiv):
            return FloorDiv(base.args[0], base.args[1] * divisor)

        # Expands (x + y) // b into x // b + y // b.
        # This only works if floor is an identity, i.e. x / b is an integer.
        for term in sympy.Add.make_args(base):
            quotient = term / divisor
            if quotient.is_integer and isinstance(divisor, sympy.Integer):
                # NB: this is correct even if the divisor is not an integer, but it
                # creates rational expressions that cause problems with dynamic
                # shapes.
                return FloorDiv(base - term, divisor) + quotient

        try:
            gcd = sympy.gcd(base, divisor)
            if gcd != 1:
>               return FloorDiv(
                    sympy.simplify(base / gcd), sympy.simplify(divisor / gcd)
                )

base       = -1.00000000000000
cls        = FloorDiv
divisor    = -1.00000000000000
gcd        = 1.00000000000000
quotient   = 1.00000000000000
term       = -1.00000000000000

/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/utils/_sympy/functions.py:159:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

args = (FloorDiv, -1.00000000000000, -1.00000000000000), kwargs = {}

    @wraps(func)
    def wrapper(*args, **kwargs):
        try:
>           retval = cfunc(*args, **kwargs)
E           RecursionError: maximum recursion depth exceeded in comparison
E
E           To execute this test, run the following from the base repo dir:
E               python test/test_sympy_utils.py -k TestValueRanges.test_binary_ref_fn_floordiv_dtype_float
E
E           This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

args       = (FloorDiv, -1.00000000000000, -1.00000000000000)
cfunc      = <functools._lru_cache_wrapper object at 0x7fc5303173a0>
func       = <function Function.__new__ at 0x7fc530317280>
kwargs     = {}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131151
Approved by: https://github.com/ezyang
2024-07-29 18:53:14 +00:00
c35f21e5fc Revert "[BE][tests] show local variables on failure in tests (#131151)"
This reverts commit 14158d892a2bd9b34edb5637f9a05217ea0330bd.

Reverted https://github.com/pytorch/pytorch/pull/131151 on behalf of https://github.com/atalman due to Broke CI: test_testing.py::TestTestingCUDA::test_cuda_assert_should_stop_common_device_type_test_suite_cuda [GH job link](https://github.com/pytorch/pytorch/actions/runs/10131415299/job/28014665693) [HUD commit link](14158d892a) ([comment](https://github.com/pytorch/pytorch/pull/131151#issuecomment-2255921015))
2024-07-29 13:19:38 +00:00
06fe99a097 Revert "[CI] add new test config label ci-test-showlocals to control test log verbosity (#131981)"
This reverts commit dfa18bf3f39c5a90b48baf956e50fa7da4462d3d.

Reverted https://github.com/pytorch/pytorch/pull/131981 on behalf of https://github.com/atalman due to Sorry, need to revert bottom PR, which broke CI: https://github.com/pytorch/pytorch/pull/131151 ([comment](https://github.com/pytorch/pytorch/pull/131981#issuecomment-2255892628))
2024-07-29 13:09:41 +00:00
dfa18bf3f3 [CI] add new test config label ci-test-showlocals to control test log verbosity (#131981)
Add a new label `ci-test-showlocals` and add it to test config filter.
If the PR is labeled with `ci-test-showlocals` or "ci-test-showlocals"
present in the PR comment, the test config filter will set a environment
variable `TEST_SHOWLOCALS`. Then `pytest` will show local variables on
failures for better debugging.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131981
Approved by: https://github.com/malfet
2024-07-29 07:40:42 +00:00
14158d892a [BE][tests] show local variables on failure in tests (#131151)
------

As per the title, add argument `--locals` for `unittest` and `--showlocals --tb=long` for `pytest` in CI.

Some failures cannot be reproduced on the local machine but exist on cloud CI. This change allows us to investigate the test failure more easily.

Example output: https://github.com/pytorch/pytorch/actions/runs/9961546996/job/27523888353?pr=130710#step:20:3361

```text
/opt/conda/envs/py_3.8/lib/python3.8/site-packages/sympy/core/function.py:307:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

cls = FloorDiv, base = -1.00000000000000, divisor = -1.00000000000000

    @classmethod
    def eval(cls, base, divisor):
        # python test/test_dynamic_shapes.py -k TestDimConstraints.test_dim_constraints_solve_full
        # Assert triggered by inequality solver
        # assert base.is_integer, base
        # assert divisor.is_integer, divisor

        # We don't provide the same error message as in Python because SymPy
        # makes it difficult to check the types.
        if divisor.is_zero:
            raise ZeroDivisionError("division by zero")
        if base in (int_oo, -int_oo, sympy.oo, -sympy.oo) and divisor in (
            int_oo,
            -int_oo,
            sympy.oo,
            -sympy.oo,
        ):
            return sympy.nan
        if base is sympy.nan or divisor is sympy.nan:
            return sympy.nan

        if base.is_zero:
            return sympy.S.Zero
        if base.is_integer and divisor == 1:
            return base
        if base.is_integer and divisor == -1:
            return sympy.Mul(base, -1)
        if (
            isinstance(base, sympy.Number)
            and isinstance(divisor, sympy.Number)
            and (
                base in (int_oo, -int_oo, sympy.oo, -sympy.oo)
                or divisor in (int_oo, -int_oo, sympy.oo, -sympy.oo)
            )
        ):
            r = float(base) / float(divisor)
            if r == math.inf:
                return int_oo
            elif r == -math.inf:
                return -int_oo
            elif math.isnan(r):
                return sympy.nan
            else:
                return sympy.Integer(math.floor(r))
        if isinstance(base, sympy.Integer) and isinstance(divisor, sympy.Integer):
            return sympy.Integer(int(base) // int(divisor))
        if isinstance(base, FloorDiv):
            return FloorDiv(base.args[0], base.args[1] * divisor)

        # Expands (x + y) // b into x // b + y // b.
        # This only works if floor is an identity, i.e. x / b is an integer.
        for term in sympy.Add.make_args(base):
            quotient = term / divisor
            if quotient.is_integer and isinstance(divisor, sympy.Integer):
                # NB: this is correct even if the divisor is not an integer, but it
                # creates rational expressions that cause problems with dynamic
                # shapes.
                return FloorDiv(base - term, divisor) + quotient

        try:
            gcd = sympy.gcd(base, divisor)
            if gcd != 1:
>               return FloorDiv(
                    sympy.simplify(base / gcd), sympy.simplify(divisor / gcd)
                )

base       = -1.00000000000000
cls        = FloorDiv
divisor    = -1.00000000000000
gcd        = 1.00000000000000
quotient   = 1.00000000000000
term       = -1.00000000000000

/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/utils/_sympy/functions.py:159:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

args = (FloorDiv, -1.00000000000000, -1.00000000000000), kwargs = {}

    @wraps(func)
    def wrapper(*args, **kwargs):
        try:
>           retval = cfunc(*args, **kwargs)
E           RecursionError: maximum recursion depth exceeded in comparison
E
E           To execute this test, run the following from the base repo dir:
E               python test/test_sympy_utils.py -k TestValueRanges.test_binary_ref_fn_floordiv_dtype_float
E
E           This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

args       = (FloorDiv, -1.00000000000000, -1.00000000000000)
cfunc      = <functools._lru_cache_wrapper object at 0x7fc5303173a0>
func       = <function Function.__new__ at 0x7fc530317280>
kwargs     = {}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131151
Approved by: https://github.com/ezyang
2024-07-27 19:39:40 +00:00
0f9bf208ec Revert "[BE][tests] show local variables on failure in tests (#131151)"
This reverts commit 054d214c504b415b155ef2da1a70764a115e1276.

Reverted https://github.com/pytorch/pytorch/pull/131151 on behalf of https://github.com/jbschlosser due to pollutes test failure output for OpInfo tests ([comment](https://github.com/pytorch/pytorch/pull/131151#issuecomment-2253310448))
2024-07-26 19:03:10 +00:00
054d214c50 [BE][tests] show local variables on failure in tests (#131151)
------

As per the title, add argument `--locals` for `unittest` and `--showlocals --tb=long` for `pytest` in CI.

Some failures cannot be reproduced on the local machine but exist on cloud CI. This change allows us to investigate the test failure more easily.

Example output: https://github.com/pytorch/pytorch/actions/runs/9961546996/job/27523888353?pr=130710#step:20:3361

```text
/opt/conda/envs/py_3.8/lib/python3.8/site-packages/sympy/core/function.py:307:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

cls = FloorDiv, base = -1.00000000000000, divisor = -1.00000000000000

    @classmethod
    def eval(cls, base, divisor):
        # python test/test_dynamic_shapes.py -k TestDimConstraints.test_dim_constraints_solve_full
        # Assert triggered by inequality solver
        # assert base.is_integer, base
        # assert divisor.is_integer, divisor

        # We don't provide the same error message as in Python because SymPy
        # makes it difficult to check the types.
        if divisor.is_zero:
            raise ZeroDivisionError("division by zero")
        if base in (int_oo, -int_oo, sympy.oo, -sympy.oo) and divisor in (
            int_oo,
            -int_oo,
            sympy.oo,
            -sympy.oo,
        ):
            return sympy.nan
        if base is sympy.nan or divisor is sympy.nan:
            return sympy.nan

        if base.is_zero:
            return sympy.S.Zero
        if base.is_integer and divisor == 1:
            return base
        if base.is_integer and divisor == -1:
            return sympy.Mul(base, -1)
        if (
            isinstance(base, sympy.Number)
            and isinstance(divisor, sympy.Number)
            and (
                base in (int_oo, -int_oo, sympy.oo, -sympy.oo)
                or divisor in (int_oo, -int_oo, sympy.oo, -sympy.oo)
            )
        ):
            r = float(base) / float(divisor)
            if r == math.inf:
                return int_oo
            elif r == -math.inf:
                return -int_oo
            elif math.isnan(r):
                return sympy.nan
            else:
                return sympy.Integer(math.floor(r))
        if isinstance(base, sympy.Integer) and isinstance(divisor, sympy.Integer):
            return sympy.Integer(int(base) // int(divisor))
        if isinstance(base, FloorDiv):
            return FloorDiv(base.args[0], base.args[1] * divisor)

        # Expands (x + y) // b into x // b + y // b.
        # This only works if floor is an identity, i.e. x / b is an integer.
        for term in sympy.Add.make_args(base):
            quotient = term / divisor
            if quotient.is_integer and isinstance(divisor, sympy.Integer):
                # NB: this is correct even if the divisor is not an integer, but it
                # creates rational expressions that cause problems with dynamic
                # shapes.
                return FloorDiv(base - term, divisor) + quotient

        try:
            gcd = sympy.gcd(base, divisor)
            if gcd != 1:
>               return FloorDiv(
                    sympy.simplify(base / gcd), sympy.simplify(divisor / gcd)
                )

base       = -1.00000000000000
cls        = FloorDiv
divisor    = -1.00000000000000
gcd        = 1.00000000000000
quotient   = 1.00000000000000
term       = -1.00000000000000

/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/utils/_sympy/functions.py:159:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

args = (FloorDiv, -1.00000000000000, -1.00000000000000), kwargs = {}

    @wraps(func)
    def wrapper(*args, **kwargs):
        try:
>           retval = cfunc(*args, **kwargs)
E           RecursionError: maximum recursion depth exceeded in comparison
E
E           To execute this test, run the following from the base repo dir:
E               python test/test_sympy_utils.py -k TestValueRanges.test_binary_ref_fn_floordiv_dtype_float
E
E           This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

args       = (FloorDiv, -1.00000000000000, -1.00000000000000)
cfunc      = <functools._lru_cache_wrapper object at 0x7fc5303173a0>
func       = <function Function.__new__ at 0x7fc530317280>
kwargs     = {}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131151
Approved by: https://github.com/ezyang
2024-07-25 10:10:58 +00:00
ba48cf6535 [BE][Easy][6/19] enforce style for empty lines in import segments in test/ (#129757)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129757
Approved by: https://github.com/ezyang
2024-07-17 06:42:37 +00:00
4d7bf72d93 [BE][Easy] fix ruff rule needless-bool (SIM103) (#130206)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130206
Approved by: https://github.com/malfet
2024-07-14 08:17:52 +00:00
312652c325 [RFC] Add support for device extension autoloading (#127074)
Fixes #122468

- Load device extensions at the end of `torch/__init__.py`
- Enabled by default, or you can disable it with `TORCH_DEVICE_BACKEND_AUTOLOAD=0`

run test:

```python
python test/run_test.py -i test_autoload_enable
python test/run_test.py -i test_autoload_disable
```

doc:

https://docs-preview.pytorch.org/pytorch/pytorch/127074/miscellaneous_environment_variables.html

co-author:  @jgong5 @bsochack @bkowalskiINTEL @jczaja @FFFrog @hipudding

Co-authored-by: albanD <desmaison.alban@gmail.com>
Co-authored-by: Jiong Gong <jiong.gong@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127074
Approved by: https://github.com/albanD, https://github.com/jgong5
2024-07-09 06:14:13 +00:00
91a8376d47 run_test: Unset cpp stacktraces after reruns (#129004)
Rerun the failing test singly with the env var set.  If it succeeds, start a new process without the cpp stack traces env var

We don't want to waste time generating these if we don't have to

They can also show up in assertion errors, which may cause unexpected failures if a test wants to check these

Adds new --rs (run single) to be used the same way --scs and --sc are.  It will only run the single test in the step current file

https://hud.pytorch.org/pytorch/pytorch/pull/129004?sha=2c349d3557d399020bf1f6a8b7045e2e4957ba46 has some examples of logs

In the above:
* test_checkpoint_valid failed, then passed in another subprocess.  The testing continued in a different new subprocess from the test right after it (test_checkpointing_without_reentrant_early_free)
* test_format_traceback_short failed consistently, but it continued to run because keep-going was set

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129004
Approved by: https://github.com/PaliC
2024-07-03 01:50:15 +00:00
4ee1cb9b95 [BE][Easy] replace import pathlib with from pathlib import Path (#129426)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129426
Approved by: https://github.com/malfet
2024-06-30 01:36:07 +00:00
2effbcfcd8 Revert "[BE][Easy] replace import pathlib with from pathlib import Path (#129426)"
This reverts commit 6d75604ef135925e8c85363c2f4a5e0b6f7fef28.

Reverted https://github.com/pytorch/pytorch/pull/129426 on behalf of https://github.com/XuehaiPan due to recognize `Path` as new exported API ([comment](https://github.com/pytorch/pytorch/pull/129426#issuecomment-2198371625))
2024-06-29 23:24:06 +00:00
6d75604ef1 [BE][Easy] replace import pathlib with from pathlib import Path (#129426)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129426
Approved by: https://github.com/malfet
2024-06-29 15:42:09 +00:00
8892ddaacc [TD] Test removal on sm86 (#127131)
Yolo

I'm excited to break CI :')
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127131
Approved by: https://github.com/huydhn, https://github.com/ZainRizvi
2024-06-07 20:19:18 +00:00
baaa914bf7 [small] test clean up (#128079)
remove unnecessary line: https://github.com/pytorch/pytorch/issues/123733
add main so test can be run `python ...`: https://github.com/pytorch/pytorch/issues/124906

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128079
Approved by: https://github.com/awgu
2024-06-06 21:21:40 +00:00
627d2cd87d [CI] disable td for xpu ci test by default (#127611)
Due to the xpu ci test has been enabled td by default, a lot of test cases (75%) have been skipped in CI tests. It caused some ci failures escaped from the ci tests, for example issue #127539. This PR depends on PR #127595 landed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127611
Approved by: https://github.com/etaf, https://github.com/atalman
2024-06-04 17:15:10 +00:00
a31a60d85b Change run_test.py arg parsing to handle additional args better (#126709)
Do not inherit parser from common_utils
* I don't think we use any variables in run_test that depend on those, and I think all tests except doctests run in a subprocess so they will parse the args in common_utils and set the variables.  I don't think doctests wants any of those variables?

Parse known args, add the extra args as extra, pass the extra ones along to the subprocess
Removes the first instance of `--`

I think I will miss run_test telling me if an arg is valid or not

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126709
Approved by: https://github.com/ZainRizvi, https://github.com/huydhn, https://github.com/Flamefire
2024-05-23 21:08:12 +00:00
ac2c547838 [TD] Upload names of failures to s3 for pytest cache (#126315)
Some tests don't get run through pytest and pytest crashes when a test segfaults, so in both caess, the pytest cache won't have an entry (similar to https://github.com/pytorch/test-infra/pull/5205).

Instead, manually upload/download an extra file that lists the failing test files

Technically this would be more general than the pytest cache
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126315
Approved by: https://github.com/ZainRizvi
2024-05-21 16:29:31 +00:00
8bca0847c2 Revert "[TD] Upload names of failures to s3 for pytest cache (#126315)"
This reverts commit 655038687afd19a4a4c9371b77ff046fd6c84be1.

Reverted https://github.com/pytorch/pytorch/pull/126315 on behalf of https://github.com/clee2000 due to broke inductor ([comment](https://github.com/pytorch/pytorch/pull/126315#issuecomment-2121133045))
2024-05-20 20:15:08 +00:00
655038687a [TD] Upload names of failures to s3 for pytest cache (#126315)
Some tests don't get run through pytest and pytest crashes when a test segfaults, so in both caess, the pytest cache won't have an entry (similar to https://github.com/pytorch/test-infra/pull/5205).

Instead, manually upload/download an extra file that lists the failing test files

Technically this would be more general than the pytest cache
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126315
Approved by: https://github.com/ZainRizvi
2024-05-20 17:36:30 +00:00
762ce6f062 Add Lowering for FlexAttention Backwards (#125515)
# Summary
#### What does this PR do?
It enables Inductor to actually generate the fused flex attention kernel for the backwards

I did some other things along the way:
- Abstract out the 'build_subgraph_buffer' subroutine and make it reusable between flex attention and flex_attention backwards. In total we need too build 3 subgraphs for fwd + bwd. 1 for the fwd graph and then 2 in the bwd. The FAv2 algorithm recomputes the parts of the forward (more efficiently since we already have the row_max via logsumexp), therefore we need to inline both the fwd graph and the joint graph in the bwds kernel.
- The version of the backwards kernel is from a somewhat older version of the triton tutorial implementation. I think that we should update in a follow up to a newer version. Notably the blocks need to be square for this to work as currently implemented. I am sure there are many opportunities for optimization.
- I didnt correctly register the decomp table + IndexMode when I landed: https://github.com/pytorch/pytorch/pull/123902, this remedies that.
- The rel_bias helper func was reversed in terms of causality. I updated and then add a test specific for "future causal" attention.
- This PRs but the main point that I think still needs to be worked out is the store_output call. I have it hacked up to be 'fake' but I dont think we want to land that and likely want to just have a mutated 'dq' and a stored_output 'dk'
- I also needed to update the `TritonTemplateKernel` to actually accept multiple subgraphs (modifications)
- I updated the benchmark to also profile bwds performance

### Benchmark Numbers:
_The current implementation is not parallelizing over ctx length in the bwd_
FWD Speedups

| Type    |   Speedup | shape              | score_mod   | dtype          |
|---------|-----------|--------------------|-------------|----------------|
| Average |     0.991 |                    |             |                |
| Max     |     1.182 | (16, 16, 4096, 64) | noop        | torch.bfloat16 |
| Min     |     0.796 | (2, 16, 512, 256)  | head_bias   | torch.bfloat16 |

BWD Speedups

| Type    |   Speedup | shape              | score_mod   | dtype          |
|---------|-----------|--------------------|-------------|----------------|
| Average |     0.291 |                    |             |                |
| Max     |     0.652 | (8, 16, 512, 64)   | head_bias   | torch.bfloat16 |
| Min     |     0.073 | (2, 16, 4096, 128) | head_bias   | torch.bfloat16 |

<details>

<summary>Full Data</summary>

| shape               | score_mod     | dtype          |   fwd_eager_time |   fwd_compiled_time |   bwd_eager_time |   bwd_compiled_time |   fwd_speedup |   bwd_speedup |
|---------------------|---------------|----------------|------------------|---------------------|------------------|---------------------|---------------|---------------|
| (2, 16, 512, 64)    | noop          | torch.bfloat16 |           19.936 |              19.092 |           57.851 |             193.564 |         1.044 |         0.299 |
| (2, 16, 512, 64)    | causal_mask   | torch.bfloat16 |           19.955 |              19.497 |           57.662 |             206.278 |         1.024 |         0.280 |
| (2, 16, 512, 64)    | relative_bias | torch.bfloat16 |           19.455 |              21.297 |           57.674 |             195.219 |         0.913 |         0.295 |
| (2, 16, 512, 64)    | head_bias     | torch.bfloat16 |           19.958 |              21.289 |           57.674 |             193.859 |         0.938 |         0.298 |
| (2, 16, 512, 128)   | noop          | torch.bfloat16 |           28.157 |              28.615 |           82.831 |             454.211 |         0.984 |         0.182 |
| (2, 16, 512, 128)   | causal_mask   | torch.bfloat16 |           28.154 |              28.444 |           83.091 |             432.083 |         0.990 |         0.192 |
| (2, 16, 512, 128)   | relative_bias | torch.bfloat16 |           28.722 |              27.897 |           83.175 |             446.789 |         1.030 |         0.186 |
| (2, 16, 512, 128)   | head_bias     | torch.bfloat16 |           28.299 |              27.673 |           83.052 |             459.179 |         1.023 |         0.181 |
| (2, 16, 512, 256)   | noop          | torch.bfloat16 |           41.167 |              50.504 |          175.019 |            1083.545 |         0.815 |         0.162 |
| (2, 16, 512, 256)   | causal_mask   | torch.bfloat16 |           41.656 |              51.933 |          175.078 |            1171.176 |         0.802 |         0.149 |
| (2, 16, 512, 256)   | relative_bias | torch.bfloat16 |           41.697 |              50.722 |          175.159 |            1097.312 |         0.822 |         0.160 |
| (2, 16, 512, 256)   | head_bias     | torch.bfloat16 |           41.690 |              52.387 |          175.184 |            1097.336 |         0.796 |         0.160 |
| (2, 16, 1024, 64)   | noop          | torch.bfloat16 |           39.232 |              37.454 |          127.847 |             612.430 |         1.047 |         0.209 |
| (2, 16, 1024, 64)   | causal_mask   | torch.bfloat16 |           39.930 |              39.599 |          127.755 |             665.359 |         1.008 |         0.192 |
| (2, 16, 1024, 64)   | relative_bias | torch.bfloat16 |           39.417 |              41.304 |          127.902 |             614.990 |         0.954 |         0.208 |
| (2, 16, 1024, 64)   | head_bias     | torch.bfloat16 |           39.965 |              42.034 |          127.953 |             613.273 |         0.951 |         0.209 |
| (2, 16, 1024, 128)  | noop          | torch.bfloat16 |           63.964 |              71.024 |          226.510 |            1637.669 |         0.901 |         0.138 |
| (2, 16, 1024, 128)  | causal_mask   | torch.bfloat16 |           63.843 |              72.451 |          226.750 |            1558.949 |         0.881 |         0.145 |
| (2, 16, 1024, 128)  | relative_bias | torch.bfloat16 |           64.301 |              70.487 |          226.651 |            1610.063 |         0.912 |         0.141 |
| (2, 16, 1024, 128)  | head_bias     | torch.bfloat16 |           64.033 |              71.394 |          226.676 |            1668.511 |         0.897 |         0.136 |
| (2, 16, 1024, 256)  | noop          | torch.bfloat16 |          129.348 |             141.390 |          507.337 |            4405.175 |         0.915 |         0.115 |
| (2, 16, 1024, 256)  | causal_mask   | torch.bfloat16 |          129.538 |             145.680 |          507.178 |            4768.874 |         0.889 |         0.106 |
| (2, 16, 1024, 256)  | relative_bias | torch.bfloat16 |          129.438 |             142.782 |          507.004 |            4401.002 |         0.907 |         0.115 |
| (2, 16, 1024, 256)  | head_bias     | torch.bfloat16 |          129.058 |             146.242 |          507.547 |            4434.251 |         0.883 |         0.114 |
| (2, 16, 4096, 64)   | noop          | torch.bfloat16 |          481.606 |             409.120 |         1440.890 |           14147.269 |         1.177 |         0.102 |
| (2, 16, 4096, 64)   | causal_mask   | torch.bfloat16 |          480.227 |             438.847 |         1434.419 |           14973.386 |         1.094 |         0.096 |
| (2, 16, 4096, 64)   | relative_bias | torch.bfloat16 |          480.831 |             458.104 |         1432.935 |           14193.253 |         1.050 |         0.101 |
| (2, 16, 4096, 64)   | head_bias     | torch.bfloat16 |          480.749 |             452.497 |         1437.040 |           14084.869 |         1.062 |         0.102 |
| (2, 16, 4096, 128)  | noop          | torch.bfloat16 |          872.534 |             848.275 |         2600.895 |           35156.849 |         1.029 |         0.074 |
| (2, 16, 4096, 128)  | causal_mask   | torch.bfloat16 |          872.647 |             868.279 |         2587.581 |           31919.531 |         1.005 |         0.081 |
| (2, 16, 4096, 128)  | relative_bias | torch.bfloat16 |          871.484 |             827.644 |         2593.989 |           34805.634 |         1.053 |         0.075 |
| (2, 16, 4096, 128)  | head_bias     | torch.bfloat16 |          871.422 |             856.437 |         2602.482 |           35708.591 |         1.017 |         0.073 |
| (2, 16, 4096, 256)  | noop          | torch.bfloat16 |         1904.497 |            1758.183 |         6122.416 |           66754.593 |         1.083 |         0.092 |
| (2, 16, 4096, 256)  | causal_mask   | torch.bfloat16 |         1911.174 |            1762.821 |         6113.207 |           72759.392 |         1.084 |         0.084 |
| (2, 16, 4096, 256)  | relative_bias | torch.bfloat16 |         1911.254 |            1727.108 |         6123.530 |           66577.988 |         1.107 |         0.092 |
| (2, 16, 4096, 256)  | head_bias     | torch.bfloat16 |         1916.977 |            1801.804 |         6118.158 |           67359.680 |         1.064 |         0.091 |
| (8, 16, 512, 64)    | noop          | torch.bfloat16 |           44.984 |              43.974 |          170.276 |             262.259 |         1.023 |         0.649 |
| (8, 16, 512, 64)    | causal_mask   | torch.bfloat16 |           45.001 |              46.265 |          170.509 |             274.893 |         0.973 |         0.620 |
| (8, 16, 512, 64)    | relative_bias | torch.bfloat16 |           45.466 |              48.211 |          170.606 |             262.759 |         0.943 |         0.649 |
| (8, 16, 512, 64)    | head_bias     | torch.bfloat16 |           45.481 |              48.435 |          170.267 |             261.265 |         0.939 |         0.652 |
| (8, 16, 512, 128)   | noop          | torch.bfloat16 |           72.565 |              74.736 |          313.220 |             773.126 |         0.971 |         0.405 |
| (8, 16, 512, 128)   | causal_mask   | torch.bfloat16 |           72.015 |              75.755 |          313.311 |             775.513 |         0.951 |         0.404 |
| (8, 16, 512, 128)   | relative_bias | torch.bfloat16 |           72.105 |              74.189 |          313.806 |             769.238 |         0.972 |         0.408 |
| (8, 16, 512, 128)   | head_bias     | torch.bfloat16 |           72.005 |              74.364 |          313.509 |             775.237 |         0.968 |         0.404 |
| (8, 16, 512, 256)   | noop          | torch.bfloat16 |          138.656 |             165.453 |          663.707 |            2672.067 |         0.838 |         0.248 |
| (8, 16, 512, 256)   | causal_mask   | torch.bfloat16 |          139.096 |             172.613 |          663.593 |            2926.538 |         0.806 |         0.227 |
| (8, 16, 512, 256)   | relative_bias | torch.bfloat16 |          139.500 |             168.417 |          663.938 |            2658.629 |         0.828 |         0.250 |
| (8, 16, 512, 256)   | head_bias     | torch.bfloat16 |          139.776 |             173.549 |          662.920 |            2667.266 |         0.805 |         0.249 |
| (8, 16, 1024, 64)   | noop          | torch.bfloat16 |          134.883 |             125.004 |          484.706 |            1195.254 |         1.079 |         0.406 |
| (8, 16, 1024, 64)   | causal_mask   | torch.bfloat16 |          134.297 |             132.875 |          485.420 |            1234.953 |         1.011 |         0.393 |
| (8, 16, 1024, 64)   | relative_bias | torch.bfloat16 |          134.839 |             139.231 |          485.470 |            1198.556 |         0.968 |         0.405 |
| (8, 16, 1024, 64)   | head_bias     | torch.bfloat16 |          133.822 |             136.449 |          485.608 |            1189.198 |         0.981 |         0.408 |
| (8, 16, 1024, 128)  | noop          | torch.bfloat16 |          235.470 |             234.765 |          886.094 |            2662.944 |         1.003 |         0.333 |
| (8, 16, 1024, 128)  | causal_mask   | torch.bfloat16 |          236.305 |             241.382 |          886.293 |            2646.984 |         0.979 |         0.335 |
| (8, 16, 1024, 128)  | relative_bias | torch.bfloat16 |          236.414 |             233.980 |          885.250 |            2642.178 |         1.010 |         0.335 |
| (8, 16, 1024, 128)  | head_bias     | torch.bfloat16 |          237.176 |             239.040 |          885.754 |            2665.242 |         0.992 |         0.332 |
| (8, 16, 1024, 256)  | noop          | torch.bfloat16 |          504.445 |             517.855 |         1978.956 |            9592.906 |         0.974 |         0.206 |
| (8, 16, 1024, 256)  | causal_mask   | torch.bfloat16 |          502.428 |             536.002 |         1978.611 |           10607.342 |         0.937 |         0.187 |
| (8, 16, 1024, 256)  | relative_bias | torch.bfloat16 |          503.396 |             523.960 |         1977.993 |            9539.284 |         0.961 |         0.207 |
| (8, 16, 1024, 256)  | head_bias     | torch.bfloat16 |          503.818 |             536.014 |         1980.131 |            9576.262 |         0.940 |         0.207 |
| (8, 16, 4096, 64)   | noop          | torch.bfloat16 |         1970.139 |            1674.930 |         5750.940 |           16724.134 |         1.176 |         0.344 |
| (8, 16, 4096, 64)   | causal_mask   | torch.bfloat16 |         1959.036 |            1775.056 |         5780.512 |           17390.350 |         1.104 |         0.332 |
| (8, 16, 4096, 64)   | relative_bias | torch.bfloat16 |         1947.198 |            1773.869 |         5780.643 |           16779.699 |         1.098 |         0.345 |
| (8, 16, 4096, 64)   | head_bias     | torch.bfloat16 |         1963.935 |            1829.502 |         5780.018 |           16703.259 |         1.073 |         0.346 |
| (8, 16, 4096, 128)  | noop          | torch.bfloat16 |         3582.711 |            3362.623 |        10436.069 |           36415.565 |         1.065 |         0.287 |
| (8, 16, 4096, 128)  | causal_mask   | torch.bfloat16 |         3581.504 |            3499.472 |        10346.869 |           36164.959 |         1.023 |         0.286 |
| (8, 16, 4096, 128)  | relative_bias | torch.bfloat16 |         3589.779 |            3337.849 |        10529.621 |           36261.696 |         1.075 |         0.290 |
| (8, 16, 4096, 128)  | head_bias     | torch.bfloat16 |         3602.265 |            3436.444 |        10458.660 |           36507.790 |         1.048 |         0.286 |
| (8, 16, 4096, 256)  | noop          | torch.bfloat16 |         7695.923 |            7126.275 |        24643.009 |          140949.081 |         1.080 |         0.175 |
| (8, 16, 4096, 256)  | causal_mask   | torch.bfloat16 |         7679.939 |            7186.252 |        24538.105 |          157156.067 |         1.069 |         0.156 |
| (8, 16, 4096, 256)  | relative_bias | torch.bfloat16 |         7681.374 |            6994.832 |        24549.713 |          140077.179 |         1.098 |         0.175 |
| (8, 16, 4096, 256)  | head_bias     | torch.bfloat16 |         7679.822 |            7212.278 |        24627.823 |          140675.003 |         1.065 |         0.175 |
| (16, 16, 512, 64)   | noop          | torch.bfloat16 |           80.126 |              78.291 |          333.719 |             541.165 |         1.023 |         0.617 |
| (16, 16, 512, 64)   | causal_mask   | torch.bfloat16 |           80.065 |              81.696 |          333.779 |             551.113 |         0.980 |         0.606 |
| (16, 16, 512, 64)   | relative_bias | torch.bfloat16 |           80.138 |              86.715 |          333.364 |             542.118 |         0.924 |         0.615 |
| (16, 16, 512, 64)   | head_bias     | torch.bfloat16 |           80.415 |              85.204 |          333.294 |             536.840 |         0.944 |         0.621 |
| (16, 16, 512, 128)  | noop          | torch.bfloat16 |          134.964 |             138.025 |          607.093 |            1333.102 |         0.978 |         0.455 |
| (16, 16, 512, 128)  | causal_mask   | torch.bfloat16 |          134.192 |             141.523 |          606.269 |            1424.318 |         0.948 |         0.426 |
| (16, 16, 512, 128)  | relative_bias | torch.bfloat16 |          135.711 |             138.639 |          606.283 |            1327.974 |         0.979 |         0.457 |
| (16, 16, 512, 128)  | head_bias     | torch.bfloat16 |          135.552 |             140.555 |          607.107 |            1347.370 |         0.964 |         0.451 |
| (16, 16, 512, 256)  | noop          | torch.bfloat16 |          275.113 |             315.144 |         1301.583 |            5268.153 |         0.873 |         0.247 |
| (16, 16, 512, 256)  | causal_mask   | torch.bfloat16 |          274.867 |             328.106 |         1302.513 |            5770.594 |         0.838 |         0.226 |
| (16, 16, 512, 256)  | relative_bias | torch.bfloat16 |          276.052 |             321.770 |         1302.904 |            5241.920 |         0.858 |         0.249 |
| (16, 16, 512, 256)  | head_bias     | torch.bfloat16 |          271.409 |             328.839 |         1302.142 |            5266.037 |         0.825 |         0.247 |
| (16, 16, 1024, 64)  | noop          | torch.bfloat16 |          260.489 |             237.463 |          955.884 |            1817.558 |         1.097 |         0.526 |
| (16, 16, 1024, 64)  | causal_mask   | torch.bfloat16 |          262.378 |             254.350 |          955.280 |            1843.807 |         1.032 |         0.518 |
| (16, 16, 1024, 64)  | relative_bias | torch.bfloat16 |          261.338 |             268.253 |          956.038 |            1820.036 |         0.974 |         0.525 |
| (16, 16, 1024, 64)  | head_bias     | torch.bfloat16 |          262.153 |             264.156 |          956.023 |            1810.076 |         0.992 |         0.528 |
| (16, 16, 1024, 128) | noop          | torch.bfloat16 |          476.475 |             461.413 |         1760.578 |            4306.521 |         1.033 |         0.409 |
| (16, 16, 1024, 128) | causal_mask   | torch.bfloat16 |          473.794 |             479.178 |         1761.277 |            4619.439 |         0.989 |         0.381 |
| (16, 16, 1024, 128) | relative_bias | torch.bfloat16 |          473.839 |             463.282 |         1758.692 |            4290.562 |         1.023 |         0.410 |
| (16, 16, 1024, 128) | head_bias     | torch.bfloat16 |          472.979 |             472.896 |         1763.086 |            4367.931 |         1.000 |         0.404 |
| (16, 16, 1024, 256) | noop          | torch.bfloat16 |         1014.184 |            1026.764 |         3922.997 |           19104.147 |         0.988 |         0.205 |
| (16, 16, 1024, 256) | causal_mask   | torch.bfloat16 |         1013.217 |            1039.046 |         3928.382 |           21086.281 |         0.975 |         0.186 |
| (16, 16, 1024, 256) | relative_bias | torch.bfloat16 |         1008.519 |            1015.278 |         3922.133 |           18980.652 |         0.993 |         0.207 |
| (16, 16, 1024, 256) | head_bias     | torch.bfloat16 |         1011.360 |            1047.542 |         3931.245 |           19069.172 |         0.965 |         0.206 |
| (16, 16, 4096, 64)  | noop          | torch.bfloat16 |         3929.850 |            3325.667 |        11411.704 |           23344.280 |         1.182 |         0.489 |
| (16, 16, 4096, 64)  | causal_mask   | torch.bfloat16 |         3885.262 |            3581.544 |        11390.515 |           23725.639 |         1.085 |         0.480 |
| (16, 16, 4096, 64)  | relative_bias | torch.bfloat16 |         3865.737 |            3537.308 |        11489.901 |           23406.330 |         1.093 |         0.491 |
| (16, 16, 4096, 64)  | head_bias     | torch.bfloat16 |         3880.530 |            3665.249 |        11484.411 |           23299.496 |         1.059 |         0.493 |
| (16, 16, 4096, 128) | noop          | torch.bfloat16 |         7030.306 |            6745.715 |        20621.264 |           57464.096 |         1.042 |         0.359 |
| (16, 16, 4096, 128) | causal_mask   | torch.bfloat16 |         7095.414 |            7034.385 |        20410.656 |           61660.511 |         1.009 |         0.331 |
| (16, 16, 4096, 128) | relative_bias | torch.bfloat16 |         7084.779 |            6686.497 |        20315.161 |           57243.969 |         1.060 |         0.355 |
| (16, 16, 4096, 128) | head_bias     | torch.bfloat16 |         7075.367 |            6863.305 |        20494.385 |           58481.953 |         1.031 |         0.350 |
| (16, 16, 4096, 256) | noop          | torch.bfloat16 |        15612.741 |           14297.482 |        55306.847 |          281161.865 |         1.092 |         0.197 |
| (16, 16, 4096, 256) | causal_mask   | torch.bfloat16 |        15326.592 |           14263.878 |        55227.806 |          313063.232 |         1.075 |         0.176 |
| (16, 16, 4096, 256) | relative_bias | torch.bfloat16 |        15297.963 |           14007.379 |        54558.029 |          279529.175 |         1.092 |         0.195 |
| (16, 16, 4096, 256) | head_bias     | torch.bfloat16 |        15216.160 |           14276.027 |        55081.581 |          280996.826 |         1.066 |         0.196 |

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125515
Approved by: https://github.com/Chillee
2024-05-17 00:41:55 +00:00
14d8e3aec0 Add distributed/_tensor/test_attention to ROCM_BLOCKLIST (#126336)
Fixes #125504
Fixes #126252
Fixes #126296
Fixes #126330

This PR doesn't really fix the RingAttentionTest tests for ROCm, but explicitly adds the whole test file to ROCM_BLOCKLIST to get a clean signal on ROCm distributed CI. We will enable these tests in a follow-up PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126336
Approved by: https://github.com/huydhn, https://github.com/pruthvistony
2024-05-16 16:38:09 +00:00
48f98bcdfc [TD] Enable test removal on most default configs + distributed CUDA for everyone (#125931)
yolo

Add the longest jobs in pull:
* default cpu configs
* non sm86 cuda
* distributed cuda for everyone

Still excluding
* slow, inductor, rocm, onnx, mac, dynamo
* distributed cpu
* windows cuda
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125931
Approved by: https://github.com/huydhn, https://github.com/ZainRizvi
2024-05-14 17:35:12 +00:00
6f619cc727 [ez] functorch/test_vmap and test_dataloader to run in parallel (#125597)
Also mark test_svd serial in linalg to see if it helps with the flakiness
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125597
Approved by: https://github.com/huydhn, https://github.com/ZainRizvi
2024-05-08 15:37:29 +00:00
0e57bbb6d7 Set timeout for C++ tests (#125517)
Looking at the unrelated Windows timeout failure on https://github.com/pytorch/pytorch/pull/125199, it looks like we don't have a timeout value set for C++ tests atm.  In this case, a C++ test on Windows timed out after 2+ hours.

```
2024-05-02T23:35:34.0639067Z Running cpp/c10_TypeList_test 1/1 ... [2024-05-02 23:35:34.059021]
2024-05-02T23:35:34.0641108Z Executing ['pytest', 'C:\\actions-runner\\_work\\pytorch\\pytorch\\build\\win_tmp\\build\\torch\\test\\c10_TypeList_test.exe', '-m', 'not serial', '-v', '-vv', '-rfEX', '-n', '2', '--junit-xml-reruns', 'test-reports\\python-pytest\\test\\run_test\\test\\run_test-c898ddeff8f33cbf.xml', '-x', '--reruns=2'] ... [2024-05-02 23:35:34.062137]
2024-05-03T02:45:33.7862004Z Process SpawnPoolWorker-2:
2024-05-03T02:45:33.7927201Z Traceback (most recent call last):
2024-05-03T02:45:33.7928032Z   File "C:\Jenkins\Miniconda3\lib\multiprocessing\process.py", line 315, in _bootstrap
2024-05-03T02:45:33.7928722Z     self.run()
2024-05-03T02:45:33.7929722Z   File "C:\Jenkins\Miniconda3\lib\multiprocessing\process.py", line 108, in run
2024-05-03T02:45:33.7931639Z     self._target(*self._args, **self._kwargs)
2024-05-03T02:45:33.7932435Z   File "C:\Jenkins\Miniconda3\lib\multiprocessing\pool.py", line 114, in worker
2024-05-03T02:45:33.7933338Z     task = get()
2024-05-03T02:45:33.7933946Z   File "C:\Jenkins\Miniconda3\lib\multiprocessing\queues.py", line 365, in get
2024-05-03T02:45:33.7935219Z     res = self._reader.recv_bytes()
2024-05-03T02:45:33.7935897Z   File "C:\Jenkins\Miniconda3\lib\multiprocessing\connection.py", line 221, in recv_bytes
2024-05-03T02:45:33.7936609Z     buf = self._recv_bytes(maxlength)
2024-05-03T02:45:33.7937302Z   File "C:\Jenkins\Miniconda3\lib\multiprocessing\connection.py", line 310, in _recv_bytes
2024-05-03T02:45:33.7938316Z     waitres = _winapi.WaitForMultipleObjects(
2024-05-03T02:45:33.7938766Z KeyboardInterrupt
```

Retrying was working, but it was already too late to finish the job.  I'm setting the same default `THRESHOLD * 3` timeout value here for C++ tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125517
Approved by: https://github.com/clee2000
2024-05-07 16:41:38 +00:00
848fce35b5 [CI][ez] Don't retry when it says don't retry (#125643)
default arg for retry_shell is retries=1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125643
Approved by: https://github.com/huydhn
2024-05-07 16:20:00 +00:00
1b3fd83ab2 [TD] Enable TD on AVX related configs (#125482)
On test configs `nogpu_AVX512` and `nogpu_NO_AVX2`, which are the next longest jobs on trunk after windows
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125482
Approved by: https://github.com/huydhn
2024-05-06 22:02:16 +00:00
d4727fd4eb [TD][ez] Better check for is pr or not (#125485)
You can trigger ciflow tags on main branch commits, so we should be more conservative when checking to see if a workflow is a PR/on the main branch.

get_pr_number checks for the pr number based on the PR_NUMBER env var or a tag of the for `ciflow/workflow/pr number`

If we fail to find something like this, then assume it is on the main branch

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125485
Approved by: https://github.com/huydhn
2024-05-04 03:08:44 +00:00
e16f1ee4cc [ez][CI] Move test_modules and test_schema_check off CI_SERIAL_LIST (#125193)
* Related https://github.com/pytorch/pytorch/pull/124085

As in title, move test_modules and test_schema_check off CI_SERIAL_LIST
If things fail, they can get the serialTest decorator instead
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125193
Approved by: https://github.com/huydhn
2024-05-01 15:48:48 +00:00
e7631d6eae Revert "CI: add aarch64 linux workflow (#121284)"
This reverts commit 32cf04cb7f7aa14aff4d1cf40517d5de797550e7.

Reverted https://github.com/pytorch/pytorch/pull/121284 on behalf of https://github.com/malfet due to Test only changes has not been reverted ([comment](https://github.com/pytorch/pytorch/pull/121284#issuecomment-2083925890))
2024-04-30 00:24:11 +00:00
4d717cd7c3 [TD] Enable td on cpu windows (#125049)
yolo

Also
* Ensure that at least 1 test always gets run (`//` does truncation which results in 0 if you have too few tests discovered)
* Don't run test removal on slow tests - I'm not touching that yet

I am avoid everything other than pull + trunk workflows, so not doing this on windows CUDA, which runs on periodic
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125049
Approved by: https://github.com/huydhn, https://github.com/ZainRizvi
2024-04-29 23:39:54 +00:00
faee0e5ee8 [ez][CI] Move test_linalg and test_sparse_csr off CI_SERIAL_LIST (#125068)
* https://github.com/pytorch/pytorch/pull/124649 for context

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125068
Approved by: https://github.com/huydhn, https://github.com/ZainRizvi
2024-04-29 21:22:35 +00:00
32cf04cb7f CI: add aarch64 linux workflow (#121284)
aarch64 linux workflow is triggered for ciflow/aarch64 tags.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121284
Approved by: https://github.com/atalman, https://github.com/malfet
2024-04-29 18:25:40 +00:00
8461e7ed9e Add test_cpp_extensions tests for stream_and_event and mita_backend (#123614)
Test the generic torch.Stream/Event with fake device gurad and hooks. Since we added a fake device backend, it is mutual exclusive to other backends. Tests will be skipped if TEST_CUDA or TEST_ROCM is true.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123614
Approved by: https://github.com/albanD
ghstack dependencies: #123611, #123612
2024-04-26 16:17:54 +00:00
4a1299cc0e Revert "Add test_cpp_extensions tests for stream_and_event and mita_backend (#123614)"
This reverts commit 355dc34f865036c4c625fcdafe54db846b2be2c2.

Reverted https://github.com/pytorch/pytorch/pull/123614 on behalf of https://github.com/jeffdaily due to this PR broke ROCm with message RuntimeError: Cannot have MTIA with other devices ([comment](https://github.com/pytorch/pytorch/pull/123612#issuecomment-2077649762))
2024-04-25 16:06:46 +00:00