Compare commits

..

134 Commits

Author SHA1 Message Date
e10ed21af7 Use runner with more memory for ASAN builds 2025-10-08 15:09:43 -07:00
aed5ed1076 Refactor memory estimator to use node storages, add test (#164783)
- Update the Memory Estimator to use node storages for analysis, which simplifies bookkeeping compared to manually inspecting operator schemas. This will also allow me to reuse this component elsewhere.

- Factor out into a separate class, so that this same logic can be used in scheduling (node allocations / aliasing / uses)

- Adds tests for correctness - right now only on fwd/bwd individually, not on both together.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164783
Approved by: https://github.com/ruisizhang123
ghstack dependencies: #164738
2025-10-08 22:07:43 +00:00
af4c29fea8 [dynamo, nested graph breaks] fix nested step graph break related issues (#162737)
It turns out that codegen'ing a nested step graph break is significantly more complicated than first thought. The optimized function should actually:
- call the graph, load values, apply side effects, etc.
- call into the leaf's resume function, but skipped (this is essentially a step graph break for just the leaf function)
- call into all the other resume functions, traced.

This PR also adds `torch._dynamo.step_unsupported()`, which can be used for internal testing purposes to better test step graph break handling.
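
A rough illustration of how the new testing hook might be used (illustrative only; the exact call-site semantics are an assumption, not spelled out in this PR text):

```python
import torch

@torch.compile(backend="eager")
def f(x):
    x = x + 1
    torch._dynamo.step_unsupported()  # testing hook added in this PR; forces a step graph break here
    return x * 2
```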

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162737
Approved by: https://github.com/Lucaskabela
ghstack dependencies: #160601
2025-10-08 22:02:52 +00:00
486b4d2414 [dynamo, nested graph breaks] move cell codegen before side effects codegen (#160601)
This is needed because if we codegen cells for nested frames AFTER side effects, then reconstruction could get messed up. From below:

>The added test case demonstrates the reconstruction failure if we kept cell codegen at the original place (only happens with nested graph breaks since we reconstruct nested frame cells from VariableTracker rather than directly using LOAD_CLOSURE).

>At a high level, what happened before this change was that side_effects was pruning the cells (I don't recall exactly why this happens), and because cells were codegen'd after the side effects were applied, we were unable to properly reconstruct the cell. The error I was seeing was a list/tuple IndexError.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160601
Approved by: https://github.com/mlazos
2025-10-08 22:02:52 +00:00
8f83b3e71c add device generalization support for distributed checkpoint tests (#159242)
## MOTIVATION
To generalize Distributed checkpoint test cases for non-CUDA devices

## CHANGES
Updated 18 test files in test/distributed/checkpoint/ with minimal device-abstraction changes:

- Use device_type from DTensorTestBase wherever appropriate
- Replaced hard-coded device names with torch.accelerator.current_accelerator()
- Extended the multi-GPU decorator to support other devices

test/distributed/checkpoint/test_state_dict_stager.py has a large diff because I renamed cuda_obj to gpu_obj; the functional change is minimal.
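
As a small illustration of the pattern used in these tests (not the test code itself), deriving the device type at runtime instead of hard-coding "cuda":

```python
import torch

# torch.accelerator.current_accelerator() returns the active accelerator device
# (or None on CPU-only builds), so tests can avoid hard-coding "cuda".
acc = torch.accelerator.current_accelerator()
device_type = acc.type if acc is not None else "cpu"
x = torch.ones(4, device=device_type)
```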

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159242
Approved by: https://github.com/guangyey, https://github.com/d4l3k
2025-10-08 21:56:31 +00:00
f0c9f3bddb [PP] [BE] Remove runtime tests (#164962)
BE: cleaning up dead code, since we migrated the multi-stage schedules to use the schedule execution runtime.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164962
Approved by: https://github.com/Skylion007
ghstack dependencies: #162016
2025-10-08 21:42:33 +00:00
1d182dd81c [MPS] sparse norm (#164961)
Adds norms for sparse MPS tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164961
Approved by: https://github.com/malfet
2025-10-08 21:41:42 +00:00
0b15f7ae05 [fr] Enable dynamic path write for FR dump when it comes to torchft (#164752)
For FR dumps in the fault-tolerance case, users want to point the dump path to a different location after a restart, so we enable that case for users here.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164752
Approved by: https://github.com/tushar00jain
2025-10-08 21:36:32 +00:00
f1229b6db9 [BE] Remove manual IP address resolution (#164969)
As https://github.com/pytorch/pytorch/issues/100400 was closed a while back.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164969
Approved by: https://github.com/seemethere
ghstack dependencies: #164968
2025-10-08 21:22:34 +00:00
b1ac252f55 [Replicate][Test] tests that pp model grads are the same as single-device model grads (#164890)
**Summary:** Created a test so that we can verify that a model that has been pipelined + replicated has the same gradients as a reference model. To do this, I mapped the layers and their parameters in each partial model to the original full model and then compared the gradients.
**Test Case**
1. pytest test/distributed/_composable/test_composability/test_pp_composability.py -k test_replicate_pp_grads
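
A generic sketch (not the PR's test code) of the comparison strategy described above, assuming a `name_map` from stage parameter names to full-model names:

```python
import torch

def check_stage_grads(stage_model, ref_model, name_map, rtol=1e-5, atol=1e-5):
    # Map each stage parameter back to its name in the full reference model,
    # then compare gradients after both models have run backward.
    ref_params = dict(ref_model.named_parameters())
    for name, param in stage_model.named_parameters():
        ref_grad = ref_params[name_map[name]].grad
        torch.testing.assert_close(param.grad, ref_grad, rtol=rtol, atol=atol)
```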

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164890
Approved by: https://github.com/H-Huang
2025-10-08 21:07:05 +00:00
5ba11df4f8 [DeviceMesh] Make all members of DeviceMesh private and add public access API (#164954)
This is a mostly mechanical change which makes all DeviceMesh members private and exposes them through public property APIs instead. It is not a BC-breaking change, since the new API still guarantees BC.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164954
Approved by: https://github.com/fegin
ghstack dependencies: #164750
2025-10-08 21:04:07 +00:00
15800888b6 [CI] Print GPU info during setup linux (#164968)
I.e. run `nvidia-smi` if present

This helps detect what driver version the runner is on, which would have helped in debugging some recent issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164968
Approved by: https://github.com/ngimel
2025-10-08 20:58:33 +00:00
e7ed1a00eb Run inductor-perf-test-nightly-h100 once per day (#164967)
To reduce inductor costs, though I'm not sure how much this one matters specifically since h100s are reserved

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164967
Approved by: https://github.com/BoyuanFeng
2025-10-08 20:58:19 +00:00
2982406721 [inductor] ban benchmarking by default in deterministic mode (#164532)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164532
Approved by: https://github.com/eellison
ghstack dependencies: #164801
2025-10-08 20:55:15 +00:00
005c3d449e Support custom callback functions in schedule (#162016)
This is going to be used in https://github.com/pytorch/torchtitan/issues/1682

Add a `register_custom_function` to the `_PipelineScheduleRuntime` which allows users to implement any custom function to replace the runtime operation dynamically.

The signature of the callback should look like:

```python
class _CustomFunctionProtocol(Protocol):
    def __call__(self, action: _Action, ctx: _PipelineContext) -> None: ...
```

`_PipelineContext` contains a reference to the schedule which is executing the operations.
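
A minimal sketch of a callback matching this protocol; the registration call follows the description above, but its exact signature and the context attributes are assumptions here:

```python
from typing import Any

def custom_forward(action: Any, ctx: Any) -> None:
    # `action` is the scheduled _Action; `ctx` is the _PipelineContext, which
    # holds a reference to the schedule executing the operations.
    pass

# Hypothetical wiring on a _PipelineScheduleRuntime instance:
# schedule.register_custom_function(FORWARD, custom_forward)
```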

### Testing

Added a test which adds custom methods for `FORWARD` and `OVERLAP_F_B` which are just the same implementations as those used in the default schedule runtime. Check that the schedule can still run, numerics are correct, and the callbacks are executed the correct number of times.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162016
Approved by: https://github.com/fegin
2025-10-08 20:43:26 +00:00
b2b3947565 [DeviceMesh] Remove private _set_mesh_dim_group_options API (#164750)
We allow passing in the PG option via https://github.com/pytorch/pytorch/pull/159371, and we cleaned up Meta-internal usage of `_set_mesh_dim_group_options`. Since this is a private API with no BC guarantee, we remove it directly so that people use the new behavior from now on.

Also, since we now allow passing the PG option in both the DeviceMesh constructor and the flatten API, we want to get rid of the global PG option override variable as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164750
Approved by: https://github.com/lw, https://github.com/fegin
2025-10-08 20:38:17 +00:00
81994b08a0 [inductor] don't tune xblock for reduction (#164801)
It turns out that tuning XBLOCK for a reduction can also change numerics ( https://github.com/pytorch/pytorch/pull/164525#pullrequestreview-3306235454 ).

This PR skips tuning XBLOCK for a reduction. If we have multiple configs left with different XBLOCKs, the heuristic will pick the config with the second-largest XBLOCK.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164801
Approved by: https://github.com/jansel, https://github.com/mlazos, https://github.com/v0i0
2025-10-08 20:31:39 +00:00
71aefd5595 [reland] Allow setting grad_dtype on leaf tensors (#164751)
ghstack-source-id: e44b3941530be83a630ec93f1478eec741ffca2e
Pull-Request-resolved: https://github.com/pytorch/pytorch/pull/162815

Fixes #ISSUE_NUMBER

Relanding due to internal weirdness. Separate PR to codev w/o ghstack.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164751
Approved by: https://github.com/albanD
2025-10-08 20:23:13 +00:00
001e1d2637 Add memory estimator (#164738)
Original work by @ShatianWang, with lints applied. I am going to make a few changes and add tests in subsequent PRs, but I want to preserve the original commit first.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164738
Approved by: https://github.com/IvanKobzarev
2025-10-08 20:04:33 +00:00
e0cb1848d0 Use TMA loads always for Triton grouped MM kernel (#164256)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164256
Approved by: https://github.com/ngimel
2025-10-08 19:40:06 +00:00
a4110fedcf Use insert_or_assign instead of erase+emplace (#164868)
insert_or_assign does effectively the same thing as erase+emplace, but more efficiently, since the search does not need to be repeated.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164868
Approved by: https://github.com/eqy
2025-10-08 19:13:49 +00:00
37c6087334 Add split-K control to cuBLAS reduced-precision settings (#164766)
## Summary
- add a CuBLASReductionOption enum so the CUDA context can track reduced-precision and split-K options
- extend the Python bindings, backend helpers, and docs to accept an optional allow_splitk argument for fp16/bf16 matmul controls
- update cuBLAS/cuBLASLt call sites plus dynamo guards and tests to respect the new combinations
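
For reference, a minimal sketch of the existing Python-side control this builds on; per the summary above the bindings now also accept an optional allow_splitk argument, but that exact surface is not shown here since it isn't spelled out in this text:

```python
import torch

# Existing boolean control for fp16 reduced-precision reductions in matmuls.
torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = True
print(torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction)  # True
```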

## Testing
- python test/test_cuda.py TestCuda.test_cublas_allow_fp16_reduced_precision_reduction_get_set -v *(fails: ModuleNotFoundError: No module named 'psutil')*

------
https://chatgpt.com/codex/tasks/task_e_68e404623178832f8a3e1d34e1e175da

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164766
Approved by: https://github.com/malfet, https://github.com/albanD
2025-10-08 18:48:45 +00:00
0b85236477 Fix refine_ranges corner case (#164075) (#164846)
Summary:
address https://github.com/pytorch/pytorch/issues/161360

u0 > 0 should update the range of u0 to start from [1, ...]; it was not doing that, and this fixes it.

Test Plan: contbuild & OSS CI, see 27234792ad

D84038721

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164846
Approved by: https://github.com/izaitsevfb, https://github.com/ezyang
2025-10-08 18:42:37 +00:00
4c0fec3e4d [Max Autotune][B200] Skip carveout tests (#164435)
Summary: Skip sm `carveout` tests on B200, as carveout is currently unsupported.

Test Plan:
```
buck2 test 'fbcode//mode/opt' fbcode//caffe2/test/inductor:max_autotune -c fbcode.nvcc_arch=b200a -c fbcode.enable_gpu_sections=true -c fbcode.platform010_cuda_version=12.8 -c fbcode.re_gpu_tests=False -- test_honor_sm_carveout_with_triton_tma
```

Differential Revision: D83395610

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164435
Approved by: https://github.com/eellison
2025-10-08 18:39:43 +00:00
cyy fdc622b513 [CMake] Remove LLVM link code (#134940)
This handling is not needed with recent LLVM APIs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134940
Approved by: https://github.com/ezyang, https://github.com/malfet
2025-10-08 18:39:16 +00:00
91b9484264 [ez] fix small doc error (#164915)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164915
Approved by: https://github.com/svekars
2025-10-08 18:27:44 +00:00
5c827a4133 [SymmMem] Multi-root tile reduction (#164757)
Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.12.0) (oldest at bottom):

Perform multiple tile reductions concurrently, with each tile reduced to a separate root.

- The number of concurrent reductions can be smaller than world size, i.e. roots can be a subset of all ranks. But all ranks are still required to call into this API.

- Currently supports NVLink SHARP scope only.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164757
Approved by: https://github.com/weifengpy, https://github.com/fegin
ghstack dependencies: #162243
2025-10-08 17:28:00 +00:00
83458197d1 [Benchmark] remove old timm models from benchmark (#164805)
Prune models from the TorchInductor dashboard to reduce CI cost. This PR prunes timm models according to the [doc](https://docs.google.com/document/d/1nLPNNAU-_M9Clx9FMrJ1ycdPxe-xRA54olPnsFzdpoU/edit?tab=t.0), reducing from 60 to 14 models.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164805
Approved by: https://github.com/anijain2305, https://github.com/seemethere, https://github.com/huydhn, https://github.com/malfet
2025-10-08 17:14:58 +00:00
0b01ff4de0 [ROCm] Improve non stride-one backwards indexing for small index sets (#164409)
This patch fixes a performance problem which occurs when a small set of indices is used and there are practically no duplicates.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164409
Approved by: https://github.com/jerrymannil, https://github.com/jeffdaily
2025-10-08 17:04:52 +00:00
01f3a43462 [MPS] Update OS version in error message (#164946)
Followup after https://github.com/pytorch/pytorch/pull/159912
Fixes https://github.com/pytorch/pytorch/issues/164943

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164946
Approved by: https://github.com/Camyll
2025-10-08 16:43:50 +00:00
f332017294 C++ API handle optimizer defaults (#161825)
Fixes #141884

This fixes the issue for all optimizers and parameter options.
A member function `overwrite_from` is added to the optimizer base class. Each optimizer then implements this function for comparing their accepted parameters to defaults. A SFINAE approach to handle the different optimizer parameters generically (in optimizer.h only) was evaluated, but I think this is easier to review and maintain.

This mirrors the Python API up to one edge case. An example of the edge case is provided below.

Python can distinguish between 1) Key not present in dict = "not specified"  and 2) Key present in dict = "explicitly set". The C++ implementation cannot.
The issue hinges on whether or not to track if a particular parameter was set by the user explicitly or not (discrepancy in the case when the constructor default is explicitly passed in).

To track this seems like it will take more intervention than would be worth it (modify TORCH_ARG to keep track, use std::optional for the parameter types, use bitset tracking) and was not pursued in the current PR. I'm happy to alter the design if appropriate.

### Example of edge case hinging on CONSTRUCTOR DEFAULTS vs OPTIMIZER DEFAULTS

1. CONSTRUCTOR DEFAULTS:
   These are the values you get when calling AdamOptions()
   AdamOptions().lr() = 0.001
   AdamOptions().weight_decay() = 0
   AdamOptions().eps() = 1e-08

2. OPTIMIZER DEFAULTS:
   These are the values the user chose when creating the optimizer
   User's optimizer defaults:
   optimizer.lr() = 0.005
   optimizer.weight_decay() = 0.1
   optimizer.eps() = 1e-07

3. THE PROBLEM SCENARIO:
   User wants to add a parameter group with explicit weight_decay=0.0
   User sets: weight_decay(0)

4. THE CONFUSION:
   Constructor default weight_decay: 0
   User's explicit weight_decay:     0
   Are they equal? YES

   Since they're equal, our overwrite_from() logic thinks:
   "User didn't set weight_decay explicitly, use optimizer default"

5. CURRENT BEHAVIOR:
   Final weight_decay: 0.1
   User expected:      0
   Match?  NO

=== KEY INSIGHT ===
Constructor defaults are built into the C++ class definition.
Optimizer defaults are chosen by the user at runtime. We want to respect the user intention.
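
For comparison, the Python-side behavior this mirrors: an explicitly passed `weight_decay=0.0` is present in the group dict, i.e. "explicitly set", so it is not overwritten by the optimizer default.

```python
import torch

model = torch.nn.Linear(4, 4)
opt = torch.optim.Adam(model.parameters(), lr=5e-3, weight_decay=0.1, eps=1e-7)

# The key is present in the dict, so Python keeps 0.0 instead of falling back
# to the optimizer default of 0.1.
opt.add_param_group({"params": [torch.nn.Parameter(torch.zeros(4))],
                     "weight_decay": 0.0})
print(opt.param_groups[1]["weight_decay"])  # 0.0
```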
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161825
Approved by: https://github.com/janeyx99
2025-10-08 16:40:45 +00:00
0a3e4e894c [PP]: Optimize memory by early releasing stage inputs' gradients (#164329)
It seems that we can release the input activations' gradients early in `stage_backward()` in PP, which helps reduce peak memory.

I tested this using the `1F1B` and `Interleaved1F1B` PP strategies based on torchtitan (for simplicity, I use 4 decoder layers of llama3, set the PP size to 2, and set num_microbatches to 128).
Run command using torchtitan:
```bash
CUDA_VISIBLE_DEVICES=4,5 LOG_RANK=0,1 NGPU=2 CONFIG_FILE=./torchtitan/models/llama3/train_configs/llama3_8b.toml ./run_train.sh --metrics.log_freq 1 --training.seq_len 8192 --training.steps 10 --parallelism.data_parallel_shard_degree 1 --activation_checkpoint.mode full --model.tokenizer_path /workspace/torchtitan-v0.1.0/torchtitan/torchtitan/datasets/tokenizer/original/tokenizer.model --training.dataset wikipedia --parallelism.pipeline_parallel_degree 2 --training.local_batch_size 128 --parallelism.pipeline_parallel_microbatch_size 1 --training.dataset_path /workspace/wikipedia_subset --training.seed 42 --parallelism.pipeline_parallel_schedule 1F1B
```
## 1F1B torchtitan train results
### before fix
<img width="1526" height="606" alt="b8e281cce1dac15e827c216e7d83f402" src="https://github.com/user-attachments/assets/545c0a80-6276-40c0-893f-fd2df0a53b8d" />

### after fix
<img width="1526" height="594" alt="70d5ceba311a8398d041189bf8897cfc" src="https://github.com/user-attachments/assets/0d606e08-238a-4115-a1c0-b40df101d867" />

After the fix, the memory usage on rank1 (i.e., the non-first stage) saves 6.9 GB compared to before the fix. The memory usage on rank0 remains unchanged (rank0 represents stage0).

## Interleaved1F1B torchtitan train results
### before fix
<img width="1514" height="601" alt="a28b7f9704b9234870619c43194e8a72" src="https://github.com/user-attachments/assets/2c28565f-ffff-4747-a8f5-722b5c65dc7e" />

### after fix
<img width="1526" height="621" alt="2d8d6d956b72885186f8c7059146c41a" src="https://github.com/user-attachments/assets/8c4a4ff2-336b-4e0b-8ac4-014ae22c2ed1" />

After the fix, rank1 saves 14.57 GB of memory (rank1 holds layer1 and layer3) and rank0 saves 7.5 GB (rank0 holds layer0 and layer2).

## Memory snapshot results
Also, I dumped the memory snapshot to observe memory usage under the 1F1B PP strategy.
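
For reference, a minimal sketch of how such a snapshot can be captured (the training loop itself is elided):

```python
import torch

torch.cuda.memory._record_memory_history(max_entries=100_000)
# ... run a few PP training steps here ...
torch.cuda.memory._dump_snapshot("pp_1f1b_snapshot.pickle")
torch.cuda.memory._record_memory_history(enabled=None)  # stop recording
```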

### before fix
<img width="1906" height="918" alt="6fd4e4ba82b8bacf9ca6edee4f3d5581" src="https://github.com/user-attachments/assets/d1b9245c-b09f-43c5-87ce-87ba48533a70" />

We can see the memory increasing as PP step_microbatches runs (the lifetime of the input activation's gradient, i.e., the output of `FusedRMSNormBackward`, lasts too long).

### after fix
<img width="1903" height="918" alt="2e415f25af6750d06e5e647683b212b9" src="https://github.com/user-attachments/assets/b657c8f6-5a56-46bd-8743-f3b8375c81b0" />

After the fix, we get steadier memory usage during training (the input activation's gradient is released back to the allocator sooner).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164329
Approved by: https://github.com/H-Huang
2025-10-08 16:12:00 +00:00
73adac05d1 Triton 3.5.x pin update to 7416ffc (#164587)
Updates triton pin to latest: https://github.com/triton-lang/triton/commits/release/3.5.x/

This update contains one cherry-pick to fix a flex_attention_fwd regression on B200:
- https://github.com/triton-lang/triton/pull/8366

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164587
Approved by: https://github.com/atalman
2025-10-08 16:07:18 +00:00
eqy 0d39ecb2ce [cuDNN][RNN] cuDNN RNN supports BFloat16 inputs since 9.13 (#164411)
seems to work
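
A minimal repro sketch (assumes a CUDA build with cuDNN >= 9.13, per the title):

```python
import torch

rnn = torch.nn.LSTM(input_size=16, hidden_size=32, num_layers=2).to("cuda", torch.bfloat16)
x = torch.randn(5, 3, 16, device="cuda", dtype=torch.bfloat16)
out, (h, c) = rnn(x)  # exercises the cuDNN RNN path with BFloat16 inputs
```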

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164411
Approved by: https://github.com/Skylion007
2025-10-08 15:26:50 +00:00
90c0825e2d [GHF] Allow reverts from pytorch-auto-revert app (#164911)
This is a bit weird: author_login is not a unique field, but author_url is.

Explicitly allow https://github.com/apps/pytorch-auto-revert to issue revert commands

Update mocks by running
```
sed -i -e s/8e262b0495bd934d39dda198d4c09144311c5ddd6cca6a227194bd48dbfe7201/47860a8f57a214a426d1150c29893cbc2aa49507f12b731483b1a1254bca3428/ gql_mocks.json
```

Test plan: Run
```python
from trymerge import GitHubPR
pr=GitHubPR("pytorch", "pytorch", 164660)
print(pr.get_last_comment().author_url, pr.get_comment_by_id(3375785595).author_url)
```
that should produce
```
https://github.com/pytorch-auto-revert https://github.com/apps/pytorch-auto-revert
```
Plus added a regression test that checks two particular comments for revert validity

`pytorch-auto-revert` user is my alter ego :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164911
Approved by: https://github.com/jeanschmidt
2025-10-08 15:15:45 +00:00
fd4bde430a Revert "list_stored_sd_metadata API. (#160610)"
This reverts commit da903b6a8be422529d47649e89c0d50bb95c37ca.

Reverted https://github.com/pytorch/pytorch/pull/160610 on behalf of https://github.com/jeffdaily due to broke ROCm CI, but flaky also on CUDA CI https://hud.pytorch.org/failure?name=periodic%20%2F%20linux-jammy-rocm-py3.10%20%2F%20test%20(distributed%2C%202%2C%203%2C%20linux.rocm.gpu.mi250.4%2C%20module%3Arocm%2C%20oncall%3Adistributed)&jobName=undefined&failureCaptures=distributed%2Fcheckpoint%2Ftest_list_stored_state_dict.py%3A%3ATestListStateDict%3A%3Atest_list_stored_sd_metadata ([comment](https://github.com/pytorch/pytorch/pull/160610#issuecomment-3382023022))
2025-10-08 15:10:38 +00:00
b5e93ffdcf Revert "Limit path search within range (#164581)"
This reverts commit 415e641572473479fc9d9eaea12762e1a223a9e0.

Reverted https://github.com/pytorch/pytorch/pull/164581 on behalf of https://github.com/eellison due to merge sets makes this trickier ([comment](https://github.com/pytorch/pytorch/pull/164581#issuecomment-3381955240))
2025-10-08 14:56:21 +00:00
f8d0d65ddc Revert "Add memory estimator (#164738)"
This reverts commit ab01a0d7d352e7fd07989b8d6bf035bf82aea74e.

Reverted https://github.com/pytorch/pytorch/pull/164738 on behalf of https://github.com/eellison due to merge sets makes this trickier ([comment](https://github.com/pytorch/pytorch/pull/164581#issuecomment-3381955240))
2025-10-08 14:56:21 +00:00
f46ddb1e65 [ROCm][CI] add gfx1150 gfx1151 to docker images for binary builds (#164854)
Fixes #164346.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164854
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-10-08 14:34:22 +00:00
20082d7136 Revert "fix flex attention eager bwd: more rounding (#164317)"
This reverts commit 41808b2ba9a61ab2f4c7af394c1668d09a4a0331.

Reverted https://github.com/pytorch/pytorch/pull/164317 on behalf of https://github.com/jeffdaily due to inductor/test_flex_attention.py::TestFlexAttentionCUDA::test_builtin_score_mods_seqlen_lt_custom_sparse_block_size_score_mod4_cuda_float16 [GH job link](https://github.com/pytorch/pytorch/actions/runs/18330774537/job/52207370954) [HUD commit link](41808b2ba9) ([comment](https://github.com/pytorch/pytorch/pull/164317#issuecomment-3381812090))
2025-10-08 14:29:10 +00:00
7158aa22e8 remove more (#164753)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164753
Approved by: https://github.com/aorenste, https://github.com/mlazos
ghstack dependencies: #164664, #164665, #164667, #164668
2025-10-08 14:23:38 +00:00
2035f6b2e6 use check_size instead of check_is_size in ops.py (#164668)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164668
Approved by: https://github.com/angelayi
ghstack dependencies: #164664, #164665, #164667
2025-10-08 14:23:38 +00:00
2b58adc3bd [inductor][templates] Distinguish between kernel input nodes and codegen input nodes (#163752)
If there is a single autotuner choice, the wrong type of input node is used to instantiate `TritonTemplateBuffer` through `TritonTemplateCaller.output_node`. This PR distinguishes the input nodes used in `AlgorithmSelectorCache.__call__` between the actual inputs passed to the kernel at runtime, vs the possibly viewed inputs that influence scheduling behaviour (e.g. `MemoryDeps`) and codegen. See the added unit test for more detail.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163752
Approved by: https://github.com/eellison
2025-10-08 14:12:14 +00:00
322091d8d8 [opaque_obj] Add make_fx tracing support (#163278)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163278
Approved by: https://github.com/zou3519
ghstack dependencies: #163279, #163277
2025-10-08 09:09:16 +00:00
2bb4e6876c [opaque obj] Error for torch.library.custom_op infer_schema (#163277)
Unsure how we can get infer_schema to infer the ScriptObject type from just the type annotation, so for now we just error clearly and ask users to specify a schema.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163277
Approved by: https://github.com/zou3519
ghstack dependencies: #163279
2025-10-08 09:09:16 +00:00
56ef7743fc [opaque_obj] Add __eq__ and __deepcopy__ (#163279)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163279
Approved by: https://github.com/zou3519
2025-10-08 09:09:16 +00:00
64108bdbed [BC-Breaking] Remove long-deprecated casting functions from native_functions.yaml (#164641)
This PR removes the `torch._cast_XXX` functions from generated ops. They were deprecated in PyTorch 1.
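
The removed ops map onto the regular dtype-conversion API; e.g. (illustrative):

```python
import torch

x = torch.randn(3)
# torch._cast_Int(x) and friends were legacy casting ops; Tensor.to is the
# supported replacement.
y = x.to(torch.int32)
```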

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164641
Approved by: https://github.com/albanD, https://github.com/justinchuby
2025-10-08 08:27:58 +00:00
c855f8632e Pyrefly suppressions 7/n (#164913)
Adds suppressions so pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283

Almost there!

Test plan:
dmypy restart && python3 scripts/lintrunner.py -a
pyrefly check

step 1: delete lines in the pyrefly.toml file from the project-excludes field
step 2: run pyrefly check
step 3: add suppressions, clean up unused suppressions
before: https://gist.github.com/maggiemoss/4b3bf2037014e116bc00706a16aef199

after:
 INFO 0 errors (6,884 ignored)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164913
Approved by: https://github.com/oulgen
2025-10-08 07:27:17 +00:00
12d2ef557f Update round size with 1 division behavior (#162203)
Have round size return the nearest power of 2 greater than or equal to size when using one division.
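
A small illustration of the rounding described above (nearest power of two >= size when one division is configured); this is not the allocator's C++ implementation:

```python
def round_size(size: int) -> int:
    # Round up to the nearest power of two that is >= size.
    return 1 if size <= 1 else 1 << (size - 1).bit_length()

assert round_size(1000) == 1024
assert round_size(1024) == 1024
assert round_size(1025) == 2048
```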

Fixes #161139

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162203
Approved by: https://github.com/ezyang
2025-10-08 06:41:46 +00:00
65aa62d50d Use codegen for the boxed interpreters (#164573)
Authored with Claude Code. The arg parsing is kind of horrible; open to more suggestions.

Signed-off-by: Edward Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164573
Approved by: https://github.com/albanD, https://github.com/jansel
2025-10-08 06:27:44 +00:00
6a09f9306c Fix #164742, all header-impl'd userfacing functions should be inline (#164871)
It is as @mxmpl pointed out; we are missing an inline.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164871
Approved by: https://github.com/mikaylagawarecki
2025-10-08 05:57:19 +00:00
19bf67be32 multimem reduce (#164517)
Modified `multimem_one_shot_all_reduce_out` function to accept a `root` argument, making it a `multimem_reduce` op.

The original `multimem_one_shot_all_reduce` op becomes a caller of the `multimem_reduce`, with each rank providing its own rank id as root.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164517
Approved by: https://github.com/ngimel
2025-10-08 05:25:16 +00:00
1927783aa3 Revert "Reland vision pinned commit hash update (#164492)"
This reverts commit 6861a270624b44954826688f8dad668eb0154452.

Reverted https://github.com/pytorch/pytorch/pull/164492 on behalf of https://github.com/izaitsevfb due to see autorevert msg above, inductor breakage is legit ([comment](https://github.com/pytorch/pytorch/pull/164492#issuecomment-3379537888))
2025-10-08 04:38:26 +00:00
184817c7a8 locks + unit tests (#164636)
Test Plan:
```
buck test fbcode//mode/opt caffe2/test/inductor:caching
```

Reviewed By: aorenste

D83714690

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164636
Approved by: https://github.com/aorenste
2025-10-08 04:34:22 +00:00
da903b6a8b list_stored_sd_metadata API. (#160610)
Summary:
1. Certain checkpoint load use cases are not aware of the properties of the data/tensors they want to load.
2. These use cases include data loader checkpoints and reading data for post-processing (when the original model definition is not available).
3. There, we have to use the saved checkpoint (metadata) as our source of truth.
4. This RFC proposal exposes the checkpoint metadata using a public API.

In this proposal we expose the stored state-dict metadata (minus associated storage/chunk metadata).

Chunk/storage details should not be exposed to users; they are an implementation detail of the storage writer/reader.

Test Plan:
UT.

Rollback Plan:

Differential Revision: D80231457

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160610
Approved by: https://github.com/saumishr
2025-10-08 04:33:51 +00:00
f76fdcaaf8 [Benchmark] cleanup huggingface models (#164815)
Prune models from TorchInductor dashboard to reduce ci cost. This PR prunes for hugging face models according to the [doc](https://docs.google.com/document/d/1nLPNNAU-_M9Clx9FMrJ1ycdPxe-xRA54olPnsFzdpoU/edit?tab=t.0), which reduces from 46 to 27 models.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164815
Approved by: https://github.com/anijain2305, https://github.com/seemethere, https://github.com/huydhn, https://github.com/malfet
2025-10-08 03:21:04 +00:00
608792153f [inductor][codecache] Print bytes in codecache debug output (#164898)
Summary: We have an internal request to help understand why the hash of `post_grad_custom_post_pass` is changing between attempts. We don't get useful info from the debug output, because we just print "<bytes>". Instead, attempt to print at least _some_ of the value in case it contains readable characters.

Test Plan:
Registered a dummy post_grad_custom_pass and printed codecache debug output
`TORCH_LOGS=+torch._inductor.codecache python ~/foo.py`

Yields something like:
```
V1007 16:41:19.024000 3546009 /data/users/slarsen/pytorch-3.10_4/torch/_inductor/codecache.py:989] [0/0] [law2ujt2wzjb5tyiu6jh64r2lxpvl62yvxcsmdouhg3qyelhhdv] post_grad_custom_post_pass: HelloWorld!������...
```

Differential Revision: [D84108770](https://our.internmc.facebook.com/intern/diff/D84108770)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164898
Approved by: https://github.com/oulgen
2025-10-08 02:45:20 +00:00
086dec3235 Pyrefly suppressions 6/n (#164877)
Adds suppressions so pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283

Almost there!

Test plan:
dmypy restart && python3 scripts/lintrunner.py -a
pyrefly check

step 1: delete lines in the pyrefly.toml file from the project-excludes field
step 2: run pyrefly check
step 3: add suppressions, clean up unused suppressions
before: https://gist.github.com/maggiemoss/4b3bf2037014e116bc00706a16aef199

after:

INFO 0 errors (5,064 ignored)

Only four directories left to enable

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164877
Approved by: https://github.com/oulgen
2025-10-08 02:30:57 +00:00
ad7b2bebc6 Use tuples to have a deterministic ordering. (#164851)
When debugging, I noticed some non-deterministic behavior and tracked it down to this literal set. Changed it to a tuple for determinism. Also changed two other small literal sets, because using a set for a small lookup like that is slow.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164851
Approved by: https://github.com/bobrenjc93, https://github.com/bdhirsh
2025-10-08 02:12:03 +00:00
d444384003 [SymmMem] Tiled reduce (#162243)
Added op: `tile_reduce(Tensor input, Tensor(a!) out, int root, str group_name)`

For now supports only:
- NVSHMEM backed symmetric tensor;
- 2D tensor and tile;
- torch.float.

Testing on the right-bottom quadrant:
```
rank 0:
tensor([[0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 1., 1., 1.],
        [0., 0., 0., 0., 1., 1., 1., 1.],
        [0., 0., 0., 0., 1., 1., 1., 1.],
        [0., 0., 0., 0., 1., 1., 1., 1.]], device='cuda:0')
PASSED
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/162243
Approved by: https://github.com/ngimel
2025-10-08 02:03:04 +00:00
3040a5d294 Revert "[dynamo] Support torch.fx.traceback.annotate (#164678)"
This reverts commit 801e282f39e9ef4424dfd3ecfd2b550a44595229.

Reverted https://github.com/pytorch/pytorch/pull/164678 on behalf of https://github.com/izaitsevfb due to breaks executorch internally, see [D84068062](https://www.internalfb.com/diff/D84068062?entry_point=16) ([comment](https://github.com/pytorch/pytorch/pull/164678#issuecomment-3379281844))
2025-10-08 01:49:34 +00:00
97463d4cf3 Revert "Fix double dispatch to Python for detach (#163671)"
This reverts commit c32118dc3e50505fd285e6e448a90883fce11535.

Reverted https://github.com/pytorch/pytorch/pull/163671 on behalf of https://github.com/izaitsevfb due to breaks export tests ([comment](https://github.com/pytorch/pytorch/pull/163671#issuecomment-3379281422))
2025-10-08 01:46:45 +00:00
c813617c53 [PP] Migrate other schedules to use PipelineScheduleRuntime (#164777)
Second fix for https://github.com/pytorch/pytorch/issues/164756

It has been a TODO to make all schedules execute using the same runtime. After this change, schedules share the `_PipelineScheduleRuntime` logic, which adds `UNSHARD` and `RESHARD` operations to the schedules and fixes the issue mentioned above.

<img width="920" height="406" alt="image" src="https://github.com/user-attachments/assets/a4d5bcd0-7dac-43cd-96f9-8ca33cfd8b91" />

A test is failing after the conversion:
- Fixed a gradient scaling issue for dWeight

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164777
Approved by: https://github.com/fegin
ghstack dependencies: #164775
2025-10-08 01:45:57 +00:00
e659661ffa [PP] Fix FSDP unshard/reshard (#164775)
First fix for https://github.com/pytorch/pytorch/issues/164756

In the pipeline IR we call `UNSHARD` and `RESHARD`, but there is a bug: `module.unshard()` does not recurse into nested FSDP modules, which sometimes leads to calling allgather before the module forward.

Since we want the pipeline IR to explicitly handle this, we can call `group.unshard` instead which ensures that all the modules are unsharded.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164775
Approved by: https://github.com/weifengpy
2025-10-08 01:45:57 +00:00
41808b2ba9 fix flex attention eager bwd: more rounding (#164317)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164317
Approved by: https://github.com/drisspg
ghstack dependencies: #163986
2025-10-08 01:17:45 +00:00
c0510dc447 [ContextParallel] add _LoadBalancer classes, and load-balance interface to Context Parallel APIs (#161062)
**Summary**
This PR provides an interface for users to specify how to load-balance the attention
input. The load-balance is essentially a rearrangement of the input tensor(s) over the
seq_dim before sharding and can be specified via an index tensor `rearrange` such
that Q[rearrange] is the balanced Q users want (i.e. `rearrange[i] == j` where `i` is the new
index of `Q[j]` in the balanced Q). An example is the `_generate_round_robin_indices()` added
in https://github.com/pytorch/pytorch/pull/155442.

**New `_LoadBalancer` classes**
New `_LoadBalancer` class (defined in `torch/distributed/tensor/experimental/_load_balancer.py`)
provides one interface for defining load-balance behavior: `_generate_indices(self, restore: bool = False)`.

When `restore == False`, this method should output an index Tensor (namely `rearrange_idx`) such
that QKV will be transformed into Q' K' V' in a way that `Q'[i] == Q[rearrange_idx[i]]` (same applies
to K and V).

When `restore == True`, this method outputs an index Tensor (namely `restore_idx`) such that
`Q'[restore_idx] == Q` (same applies to K and V).
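
A small numeric check of the rearrange/restore contract above (illustrative only, not the `_LoadBalancer` implementation); the permutation below is a head-tail order for 2 ranks with chunks of size 2:

```python
import torch

Q = torch.arange(8)
rearrange_idx = torch.tensor([0, 1, 6, 7, 2, 3, 4, 5])
Q_balanced = Q[rearrange_idx]                    # Q'[i] == Q[rearrange_idx[i]]
restore_idx = torch.argsort(rearrange_idx)
assert torch.equal(Q_balanced[restore_idx], Q)   # Q'[restore_idx] == Q
```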

**Impact**
2 public CP APIs and 1 private CP API are modified. This PR should be backward-compatible:
- For uses w/ SDPA, existing users must be using the `context_parallel()` API, which does not
take in the extra `load_balancer` argument and determines the behavior solely from the global var
`_cp_options.enable_load_balance`.
- For new users, including those who want to try `flex_attention()`, we require using the new API
`_context_parallel_buffers` to explicitly shard the QKV input instead of using `context_parallel()`,
because we no longer rely on TorchDispatchMode or TorchFunctionMode for op replacement. We
also require users to explicitly pass in a `load_balancer` argument if load-balancing is demanded.

**Load-Balance Behavior**
`context_parallel_unshard()`, and `create_cp_block_mask()` APIs now take an extra optional argument
`load_balancer`. This argument is optional because of backward compatibility but we require new users
to explicitly pass in a `load_balancer` if load-balancing is demanded:
- if `load_balancer == None` and `_cp_options.enable_load_balance == False`, CP performs
no load-balancing on input Tensors.
- if `load_balancer == None` and `_cp_options.enable_load_balance == True`, CP performs
head-tail load-balancing (e.g. split a Tensor into 2*N chunks; the first N are called head chunks and
the rest are called tail chunks. Place the first head chunk and the last tail chunk on rank 0, the second
head chunk along with the second-last tail chunk on rank 1, and so on).

`_context_parallel_buffers()` also takes the extra optional argument `load_balancer`, but the behavior
is slightly different from the other 2 APIs -- it doesn't branch on `_cp_options.enable_load_balance`:
- if `load_balancer == None`, no load-balancing will be performed
- otherwise, apply load-balancing using `load_balancer._generate_indices()` before sharding.

**Changes**
This PR moves the index Tensor generation logic into a set of LoadBalancer classes and
make LoadBalancer the common interface for Context Parallel APIs that leverages
load-balancing:
* _context_parallel_buffers
* context_parallel_unshard
* create_cp_block_mask

The `_LoadBalancer` classes added are:
- `_LoadBalancer`: the abstract base class that provides the `_generate_indices` interface for index Tensor generation.
- `_HeadTailLoadBalancer`: Implements head-tail balancing logic.
- `_PerDocumentHeadTailLoadBalancer`: Supports per-document head-tail balancing for batched sequences.

**Test**
`pytest test/distributed/tensor/test_attention.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161062
Approved by: https://github.com/fegin
2025-10-08 01:09:14 +00:00
9ec10dc26a utils + unit tests (#164551)
Test Plan:
```
buck test fbcode//mode/opt caffe2/test/inductor:caching
```

Reviewed By: aorenste

Differential Revision: D83714691

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164551
Approved by: https://github.com/aorenste
2025-10-08 01:05:45 +00:00
43fc859625 Don't return values in void functions (#164809)
This PR fixes returning values in void C++ functions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164809
Approved by: https://github.com/janeyx99
2025-10-08 01:04:14 +00:00
f713abab16 Revert "Enable all flake8-logging-format rules (#164655)"
This reverts commit e98c4e835b1db22092fc93b49d2cddd7b3537d1f.

Reverted https://github.com/pytorch/pytorch/pull/164655 on behalf of https://github.com/malfet due to Looks like it broke lint in trunk, see bd3b98a8a5/1 ([comment](https://github.com/pytorch/pytorch/pull/164655#issuecomment-3379209309))
2025-10-08 00:55:17 +00:00
bd3b98a8a5 [dynamic shapes] make backed_size_oblivious behavior consistent b/w symbolic_shapes/inductor (#164796)
Summary: call guard_or_ directly to enable backed_size_oblivious in inductor calls to guard_or.

Test Plan: CI and unit test added.

Differential Revision: D84009392

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164796
Approved by: https://github.com/laithsakka
2025-10-08 00:19:06 +00:00
e98c4e835b Enable all flake8-logging-format rules (#164655)
These rules are enabled by removing existing suppressions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164655
Approved by: https://github.com/janeyx99
2025-10-08 00:16:13 +00:00
7b15534434 [export] Fix weight sharing when there is no complete tensor (#164857)
Summary: As titled.

Test Plan: CI

Differential Revision: D84079625

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164857
Approved by: https://github.com/yushangdi
2025-10-07 23:40:13 +00:00
c32118dc3e Fix double dispatch to Python for detach (#163671)
This fixes #71725.

Differential Revision: [D83857880](https://our.internmc.facebook.com/intern/diff/D83857880)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163671
Approved by: https://github.com/ezyang, https://github.com/albanD
2025-10-07 23:34:37 +00:00
e3ae80fc03 [PP] Let PP split BlockMask into micro-BlockMask (#164111)
BlockMask has batch dimension information, so PP has to split it as well, just like all other tensors. All the tensors in BlockMask have the batch dimension, so we can split them without too many issues. However, `mask_mod` takes the batch index as an input, and that value changes after the split, so we have to wrap it inside a closure that adjusts the batch index.
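
An illustrative sketch of that closure (not the PR code): a micro-batch starting at global batch offset `start` reuses the original mask_mod by shifting the batch index back.

```python
def shift_batch(mask_mod, start):
    # flex_attention mask_mod signature: (b, h, q_idx, kv_idx) -> bool
    def wrapped(b, h, q_idx, kv_idx):
        return mask_mod(b + start, h, q_idx, kv_idx)
    return wrapped
```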

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164111
Approved by: https://github.com/H-Huang
2025-10-07 23:25:34 +00:00
483f4e0db9 CUDA 13.0 builds fix on Amazon Linux 2023 (#164870)
During 2.9 RC testing I am seeing an issue on Amazon Linux 2023 with CUDA 13.0 builds.

This is related to:
 https://github.com/pytorch/pytorch/issues/152756

Workflow: https://github.com/pytorch/test-infra/actions/runs/18324074610/job/52184079262

Error:
```
WARNING: There was an error checking the latest version of pip.
+ python3.11 .ci/pytorch/smoke_test/smoke_test.py --package torchonly
Traceback (most recent call last):
  File "/usr/local/lib64/python3.11/site-packages/torch/__init__.py", line 333, in _load_global_deps
    ctypes.CDLL(global_deps_lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib64/python3.11/ctypes/__init__.py", line 376, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: libcudart.so.13: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/pytorch/pytorch/.ci/pytorch/smoke_test/smoke_test.py", line 12, in <module>
    import torch
  File "/usr/local/lib64/python3.11/site-packages/torch/__init__.py", line 425, in <module>
    _load_global_deps()
  File "/usr/local/lib64/python3.11/site-packages/torch/__init__.py", line 383, in _load_global_deps
    _preload_cuda_deps(lib_folder, lib_name)
  File "/usr/local/lib64/python3.11/site-packages/torch/__init__.py", line 317, in _preload_cuda_deps
    raise ValueError(f"{lib_name} not found in the system path {sys.path}")
Traceback (most recent call last):
ValueError: libnvToolsExt.so.*[0-9] not found in the system path ['/pytorch/pytorch/.ci/pytorch/smoke_test', '/usr/lib64/python311.zip', '/usr/lib64/python3.11', '/usr/lib64/python3.11/lib-dynload', '/usr/local/lib64/python3.11/site-packages', '/usr/local/lib/python3.11/site-packages', '/usr/lib64/python3.11/site-packages', '/usr/lib/python3.11/site-packages']
  File "/home/ec2-user/actions-runner/_work/test-infra/test-infra/test-infra/.github/scripts/run_with_env_secrets.py", line 102, in <module>
    main()
  File "/home/ec2-user/actions-runner/_work/test-infra/test-infra/test-infra/.github/scripts/run_with_env_secrets.py", line 98, in main
    run_cmd_or_die(f"docker exec -t {container_name} /exec")
  File "/home/ec2-user/actions-runner/_work/test-infra/test-infra/test-infra/.github/scripts/run_with_env_secrets.py", line 39, in run_cmd_or_die
    raise RuntimeError(f"Command {cmd} failed with exit code {exit_code}")
RuntimeError: Command docker exec -t 7d9c5bd403cac9a9ee824d63a1d6f6057ecce89a7daa94a81617dbf8eff0ff2e /exec failed with exit code 1
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164870
Approved by: https://github.com/Camyll

Co-authored-by: Eli Uriegas <1700823+seemethere@users.noreply.github.com>
2025-10-07 22:52:53 +00:00
d1a62c8036 [BE][Ez]: Enable RUF007 Prefer itertools.pairwise over zip slicing (#164856)
Now that our minimum version is 3.10, we can support this rule. This is more concise, readable, and efficient than the previous zip slicing.
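
For example:

```python
from itertools import pairwise

xs = [1, 2, 3, 4]
pairs_old = list(zip(xs[:-1], xs[1:]))  # zip slicing
pairs_new = list(pairwise(xs))          # Python >= 3.10
assert pairs_old == pairs_new == [(1, 2), (2, 3), (3, 4)]
```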

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164856
Approved by: https://github.com/williamwen42
2025-10-07 22:51:17 +00:00
6861a27062 Reland vision pinned commit hash update (#164492)
Redo https://github.com/pytorch/pytorch/pull/154694

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164492
Approved by: https://github.com/yangw-dev
2025-10-07 22:45:05 +00:00
955f21dc2c [ROCm][CI] Add support for gfx1100 in rocm workflow + test skips (#148355)
This PR adds infrastructure support for gfx1100 in the rocm workflow. Nodes have been allocated for this effort.
@dnikolaev-amd contributed all the test skips.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148355
Approved by: https://github.com/jeffdaily

Co-authored-by: Dmitry Nikolaev <dmitry.nikolaev@amd.com>
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-10-07 22:36:25 +00:00
9f5e1beaf3 [multi-kernel] base tensor sizes for shape cache key (#164499)
to match shape key in 3ca09d65f1/torch/_inductor/select_algorithm.py (L3571)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164499
Approved by: https://github.com/ColinPeppler
2025-10-07 21:27:40 +00:00
2e027e8742 [inductor] Improve bound on the number of dims to match for the block (#163755)
- Removes redundant broadcast code when `len(kernel.range_tree_nodes)` is much larger than `len(range_tree.nodes)`. For example:
```python
# before, the broadcast is to [1, 1, XBLOCK, R0_BLOCK]
tmp0 = tl.reshape(tl.broadcast_to(tl.load(block_ptr0, boundary_check=[2], padding_option='zero', eviction_policy='evict_last')[:, None, :, :], [(511 + XBLOCK) // 512, ((1) * ((1) <= ((511 + XBLOCK) // 512)) + ((511 + XBLOCK) // 512) * (((511 + XBLOCK) // 512) < (1))), ((512) * ((512) <= (XBLOCK)) + (XBLOCK) * ((XBLOCK) < (512))), R0_BLOCK]), [XBLOCK, R0_BLOCK])
# after
tmp0 = tl.reshape(tl.load(block_ptr0, boundary_check=[2], padding_option='zero', eviction_policy='evict_last'), [XBLOCK, R0_BLOCK])
```
- Fix: also save range_tree_nodes per subgraph

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163755
Approved by: https://github.com/eellison, https://github.com/blaine-rister
2025-10-07 21:02:37 +00:00
1e42fde45e Revert "[CUDA] Add experimental green context support for SM carveout (#159104)"
This reverts commit 746fe78ecd52f3e9cfddda41f0ac82dada7bdd0b.

Reverted https://github.com/pytorch/pytorch/pull/159104 on behalf of https://github.com/malfet due to Breaks Windows CD build ([comment](https://github.com/pytorch/pytorch/pull/159104#issuecomment-3378675515))
2025-10-07 20:51:22 +00:00
f505caa71b Revert "multimem reduce (#164517)"
This reverts commit d1cbb74fb16406488a174832e1b58b7c242f418d.

Reverted https://github.com/pytorch/pytorch/pull/164517 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/164517#issuecomment-3378529654))
2025-10-07 20:12:38 +00:00
65f10becdf Support OVERLAP_F_B in schedule (#161072)
Previously, we converted the overlap_f_b action into separate forward and backward operations in the plan. This is a small change that includes it in the plan and handles it in the runtime.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/161072
Approved by: https://github.com/fegin, https://github.com/wconstab
2025-10-07 19:55:10 +00:00
df640df68a Revert "Reapply "C++-accessible Placements via pybind11 (#163030)" (#164519)"
This reverts commit 8c0bc879b97bc580aaa0777b2d266bdd068cb528.

Reverted https://github.com/pytorch/pytorch/pull/164519 on behalf of https://github.com/malfet due to Still breaks internal workflows ([comment](https://github.com/pytorch/pytorch/pull/164519#issuecomment-3378469432))
2025-10-07 19:46:17 +00:00
4c3c0ef2f1 [precompile] Load source cache for AOT compile as well. (#164773)
Adds source_get_cache to the AOT compile case as well. Since the guard manager loader code can be shared between AOT and caching, we added a new function, load_guard_manager, to avoid code duplication between the two workflows for loading guards.

Test Plan: test_guard_serialization.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164773
Approved by: https://github.com/yiming0416, https://github.com/dolpm
2025-10-07 18:47:09 +00:00
bc33b10202 fix copy_ for scalar in inductor (#164167)
Fixes #158437

### Summary

- TorchInductor was not properly handling scalar copy operations (`tensor.copy_(scalar_value)`)
- Ensured scalar sources are converted to appropriate tensor representations with correct dtype and device
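
The pattern this targets looks roughly like the following (illustrative, not from the PR):

```python
import torch

@torch.compile
def set_constant(t):
    t.copy_(3.14)  # in-place copy of a Python scalar
    return t

out = set_constant(torch.zeros(4))
```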

### Impact

- Enables compilation of models using `tensor.copy_(scalar)` patterns
- module: inductor

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164167
Approved by: https://github.com/shunting314
2025-10-07 18:31:37 +00:00
2855a045b3 Use sym_eq and sym_and on symbolic shapes in common_meta_baddbmm_bmm (#164781)
Differential Revision: D84005053

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164781
Approved by: https://github.com/Skylion007
2025-10-07 18:25:00 +00:00
9ecd092bd9 Add python bindings for NCCL CTA policies (#164309)
NCCLConfig can now be constructed with non-default [cta policies][1]

```python
import torch
from torch.distributed import ProcessGroupNCCL as nccl

config = nccl.NCCLConfig()
config.cta_policy = nccl.NCCL_CTA_POLICY_ZERO  # NCCL version >= 2.28
```

[1]: https://docs.nvidia.com/deeplearning/nccl/archives/nccl_2283/user-guide/docs/api/flags.html#nccl-communicator-cta-policy-flags

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164309
Approved by: https://github.com/eqy
2025-10-07 18:16:20 +00:00
078d475d3b move partition and compiler fns from stage 1 to stage 2 (#164765)
Differential Revision: D83995689

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164765
Approved by: https://github.com/zhxchen17
2025-10-07 18:02:03 +00:00
f37a6523ef Move version.h to torch/headeronly (#164381)
Differential Revision: [D83685392](https://our.internmc.facebook.com/intern/diff/D83685392)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164381
Approved by: https://github.com/janeyx99
2025-10-07 17:47:30 +00:00
b13cd141b3 Add pyrefly suppressions (#164748)
Adds suppressions so pyrefly will typecheck clean: https://github.com/pytorch/pytorch/issues/163283

Test plan:
dmypy restart && python3 scripts/lintrunner.py -a
pyrefly check

step 1: delete lines in the pyrefly.toml file from the `project-excludes` field
step 2: run pyrefly check
step 3: add suppressions, clean up unused suppressions
before: https://gist.github.com/maggiemoss/4b3bf2037014e116bc00706a16aef199

after:

0 errors (4,263 ignored)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164748
Approved by: https://github.com/oulgen
2025-10-07 17:31:18 +00:00
5e47b4dd60 Remove device_id param from DeviceCachingAllocator::malloc (#164798)
The `malloc` call in DeviceCachingAllocator accepts a DeviceIndex param, which can be confusing because the allocator can only allocate memory for the device it corresponds to. This associated device is fixed at construction time, so the runtime param can be misleading.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164798
Approved by: https://github.com/ngimel, https://github.com/cyyever, https://github.com/eqy
2025-10-07 16:42:04 +00:00
ee5389d520 Enable batch samples in sparse tests (#164677)
The test cases are enabled because the issue was fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164677
Approved by: https://github.com/albanD
2025-10-07 15:58:37 +00:00
ab01a0d7d3 Add memory estimator (#164738)
Original work by @ShatianWang, with lints applied. I am going to make a few changes and add tests in subsequent PRs, but I want to preserve the original commit first.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164738
Approved by: https://github.com/IvanKobzarev
ghstack dependencies: #164568, #164569, #164581
2025-10-07 15:32:27 +00:00
801e282f39 [dynamo] Support torch.fx.traceback.annotate (#164678)
Builds on top of https://github.com/pytorch/pytorch/pull/163673 and https://github.com/pytorch/pytorch/pull/164174. This will be used in the followup PRs to apply regional inductor compilation.

The existing implementation let Dynamo trace into `torch.fx.traceback.annotate`, but that's not what we want. We want Dynamo to essentially run the torch.fx.traceback.annotate function in eager mode, so that every FX node created in the Dynamo FX graph carries the custom node meta.

This does not work with graph breaks yet. But we can solve that problem, if needed, in a separate PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164678
Approved by: https://github.com/SherlockNoMad, https://github.com/jansel, https://github.com/xmfan
2025-10-07 14:54:26 +00:00
87c9fbda22 Follow up to PR 163980 for s390x (#164464)
Now, with the same updates propagated to s390x, it works on s390x runners too.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164464
Approved by: https://github.com/atalman
2025-10-07 12:02:29 +00:00
3cc8af2d67 torch.topk: refactor global histogram/cumsum into a dedicated kernel to eliminate redundant memory access (#164459)
# TLDR
This PR removes the regression in torch.topk introduced in torch 2.7.0 and delivers much better performance for large inputs.

The table below reports execution times on H20 for various input sizes with float32 data, extracting the top-100 values. Results indicate that this PR restores and improves performance, especially on large inputs.
| Input Shape    | torch2.6.0 (ms) | torch2.8.0 (ms) | 2.8.0+this PR (ms) |
| -------------- | --------------- | --------------- | ------------------ |
| (1, 1B)        | 36.6            | 1564.1          | 25.6               |
| (1, 100M)      | 3.56            | 17.4            | 2.54               |
| (1, 1000,000)  | 0.135           | 0.145           | 0.098              |
| (512, 128000)  | 1.33            | 1.33            | 1.32               |
| (8192, 128000) | 19.6            | 19.6            | 19.4               |

# Background
After upgrading PyTorch from 2.6.0 to 2.7.0, we observed a significant GPU performance regression in `torch.topk` on NVIDIA GPUs. For instance, extracting the top-1000 largest values from one billion floats on an NVIDIA H20 increased from **36 ms** to **1.6 s**.

Profiling with Nsight Compute indicates that the slowdown is caused by redundant memory accesses introduced in [PR #145536](https://github.com/pytorch/pytorch/pull/145536).

# Analysis

`torch.topk` relies on **RadixSelect** to find the target values. Each radix pass requires computing a histogram of the input values. For large inputs, histogram computation is split into two stages:

1. **Local histogram**: Each CUDA block processes a subset of the input and writes its local histogram to global memory.
2. **Global reduction**: A single CUDA block reads all local histograms from global memory and reduces them into the final global histogram.

Before [PR #145536](https://github.com/pytorch/pytorch/pull/145536), both stages ran inside a single kernel (`radixFindKthValues`), using a semaphore to ensure that all local histograms were completed before reduction.

In PR #145536, the global histogram computation was merged with subsequent top-k calculations into a single kernel (`computeBlockwiseKthCounts`) to avoid the semaphore. While this simplifies synchronization, it introduces **redundant memory reads**:

- `computeBlockwiseKthCounts` launches `numInputSlices * blocks_per_slice` blocks.
- For each row (slice), `blocks_per_slice` CUDA blocks redundantly reload the same local histograms from global memory.

# This PR

To address this inefficiency, we introduce the following optimizations:

1. **Dedicated kernel**: Refactor global histogram and cumsum computation into a separate GPU kernel, `computeDigitCumSum`.
2. **Loop unrolling**: Apply loop unrolling in `computeDigitCumSum` to speed up local histogram reads.
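
For intuition only, here is a small NumPy sketch of the two-stage scheme that the dedicated kernel performs on the GPU: each block writes a local histogram of the current radix digit once, and a single reduction pass then sums the local histograms and takes an inclusive cumulative sum over the 256 digit counts. The block count and input size below are arbitrary; this is not the CUDA implementation.

```python
import numpy as np

RADIX_BITS = 8
RADIX_DIGITS = 1 << RADIX_BITS

def local_histograms(values, blocks_per_slice):
    # Stage 1: each "block" histograms its chunk of the slice independently.
    chunks = np.array_split(values, blocks_per_slice)
    counts = np.zeros((blocks_per_slice, RADIX_DIGITS), dtype=np.uint32)
    for b, chunk in enumerate(chunks):
        digits = (chunk >> 24) & (RADIX_DIGITS - 1)  # top 8 bits as the current digit
        np.add.at(counts[b], digits, 1)
    return counts

def digit_cumsum(counts):
    # Stage 2 (the dedicated kernel): reduce the local histograms once and
    # take an inclusive prefix sum over the digit counts.
    return np.cumsum(counts.sum(axis=0, dtype=np.uint64))

values = np.random.randint(0, 2**32, size=1 << 16, dtype=np.uint32)
cumsum = digit_cumsum(local_histograms(values, blocks_per_slice=8))
assert cumsum[-1] == values.size  # every value falls into exactly one digit bucket
```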

# Performance
We benchmarked torch.topk on NVIDIA H20 with float32 inputs, extracting the top-100 values across different input sizes. The results in the table below demonstrate that this PR effectively eliminates the performance regression introduced in 2.7.0 and delivers substantial improvements on large inputs.

| Input Shape    | torch2.6.0 (ms) | torch2.8.0 (ms) | 2.8.0+this PR (ms) |
| -------------- | --------------- | --------------- | ------------------ |
| (1, 1B)        | 36.6            | 1564.1          | 25.6               |
| (1, 100M)      | 3.56            | 17.4            | 2.54               |
| (1, 1,000,000) | 0.135           | 0.145           | 0.098              |
| (512, 128000)  | 1.33            | 1.33            | 1.32               |
| (8192, 128000) | 19.6            | 19.6            | 19.4               |

In addition, I have verified the correctness of this PR with different inputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164459
Approved by: https://github.com/ngimel, https://github.com/Skylion007
2025-10-07 11:04:03 +00:00
1fb072ac2a exceptions + unit tests (#164550)
Test Plan:
```
buck test fbcode//mode/opt caffe2/test/inductor:caching
```

Reviewed By: aorenste

Differential Revision: D83714688

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164550
Approved by: https://github.com/aorenste
2025-10-07 10:04:58 +00:00
cac5e13e13 [dynamo] Inline nn module calls using __call__ methods (#164817)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164817
Approved by: https://github.com/SherlockNoMad, https://github.com/mlazos
2025-10-07 08:57:20 +00:00
68350660ee Increase timeout for nightly macOS performance tests to 300 minutes (#164793)
The Test step time recently went up slightly.

Hopefully this fixes https://github.com/pytorch/alerting-infra/issues/263
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164793
Approved by: https://github.com/seemethere
2025-10-07 08:44:07 +00:00
ef7e2ca77e remove check_is_size from test_misc.py (#164667)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164667
Approved by: https://github.com/angelayi
ghstack dependencies: #164664, #164665
2025-10-07 07:33:50 +00:00
cdaaf3e4a3 remove size-like based size-oblivious special max simplifications (#164665)
As we removed guard_size_oblivious, this simplification is no longer relevant. This is part of the process of deprecating guard_size_oblivious and its dependencies.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164665
Approved by: https://github.com/aorenste
ghstack dependencies: #164664
2025-10-07 07:33:50 +00:00
0ea59c3c55 do not suggest torch._check_is_size() (#164664)
The size-like concept for data dependency is no longer relevant, as we removed all guard_size_oblivious calls.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164664
Approved by: https://github.com/angelayi, https://github.com/mlazos
2025-10-07 07:33:50 +00:00
8f705d019a context + unit tests (#164549)
Summary:
The context module provides configurable context selection plus isolation key hashing.

Context selection is broken into runtime context and compile context. Runtime context is decided at call time (inductor configs, precision configs, etc.), and compile context is decided at compile time (hardware type, software hashes).

Callees are given access to SelectedRuntimeContext and SelectedCompileContext, which they can use to determine and select what context is relevant to the function being cached.

These selected contexts are wrapped in an IsolationSchema, which denotes what context should be taken into account when producing an isolation key. The isolation key is essentially a salt of the function signature key: it says that a given function signature key result is valid under a given context (isolation schema).
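
A rough, self-contained sketch of the isolation-key idea, with hypothetical class and field names (the real module lives in the inductor caching code and is not reproduced here):

```python
import hashlib
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class SelectedRuntimeContext:
    inductor_configs: tuple      # e.g. sorted (name, value) pairs
    precision: str

@dataclass(frozen=True)
class SelectedCompileContext:
    hardware: str
    software_hash: str

def isolation_key(function_signature_key: str,
                  runtime: SelectedRuntimeContext,
                  compile_ctx: SelectedCompileContext) -> str:
    # The isolation schema salts the function signature key with the selected
    # context, so a cached result is only considered valid under that context.
    payload = json.dumps(
        {"sig": function_signature_key,
         "runtime": asdict(runtime),
         "compile": asdict(compile_ctx)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

key = isolation_key(
    "fn_sig_v1",
    SelectedRuntimeContext(inductor_configs=(("max_autotune", True),), precision="fp16"),
    SelectedCompileContext(hardware="H100", software_hash="abc123"),
)
```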

Test Plan:
```
buck test fbcode//mode/opt caffe2/test/inductor:caching
```

Reviewed By: aorenste

 D83714689

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164549
Approved by: https://github.com/aorenste
2025-10-07 06:02:10 +00:00
4bcc05777e [torchfuzz] synthesize inputs for data dependent ops (#164716)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164716
Approved by: https://github.com/pianpwk
ghstack dependencies: #164432, #164434, #164514, #164646, #164647, #164649, #164687, #164688, #164693, #164694, #164715
2025-10-07 05:40:32 +00:00
2a6cdba6e5 [torchfuzz] various edge case fixes (#164715)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164715
Approved by: https://github.com/pianpwk
ghstack dependencies: #164432, #164434, #164514, #164646, #164647, #164649, #164687, #164688, #164693, #164694
2025-10-07 05:30:46 +00:00
53f6cc7529 [torchfuzz] make ops_fuzzer deterministic (#164694)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164694
Approved by: https://github.com/pianpwk
ghstack dependencies: #164432, #164434, #164514, #164646, #164647, #164649, #164687, #164688, #164693
2025-10-07 05:30:46 +00:00
ac901bf79a [torchfuzz] consolidate on a base implementation of args_codegen (#164693)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164693
Approved by: https://github.com/pianpwk
ghstack dependencies: #164432, #164434, #164514, #164646, #164647, #164649, #164687, #164688
2025-10-07 05:20:28 +00:00
c965d6dbb2 [torchfuzz] move into experimental dir (#164688)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164688
Approved by: https://github.com/pianpwk
ghstack dependencies: #164432, #164434, #164514, #164646, #164647, #164649, #164687
2025-10-07 05:09:08 +00:00
ac08556f67 [torchfuzz] support more unbacked functions (#164687)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164687
Approved by: https://github.com/pianpwk
ghstack dependencies: #164432, #164434, #164514, #164646, #164647, #164649
2025-10-07 05:00:03 +00:00
5fe7f29b9e [torchfuzz] add support for operator weights (#164649)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164649
Approved by: https://github.com/pianpwk
ghstack dependencies: #164432, #164434, #164514, #164646, #164647
2025-10-07 05:00:03 +00:00
ded099ecbf [torchfuzz] don't use the first gpu in multi process fuzzer (#164647)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164647
Approved by: https://github.com/pianpwk
ghstack dependencies: #164432, #164434, #164514, #164646
2025-10-07 04:59:56 +00:00
63fcc3e6c4 [torchfuzz] update README.md (#164646)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164646
Approved by: https://github.com/pianpwk
ghstack dependencies: #164432, #164434, #164514
2025-10-07 04:59:50 +00:00
fd3e15c14f Fix typo in class definition of bytecodedispatchtable (#164762)
ghstack-source-id: 84f0d7bb7e3780ca75473782abfae530010be56e
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164761

Fixes the typo in the naming of bytecodedispatchtable

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164762
Approved by: https://github.com/StrongerXi, https://github.com/williamwen42
2025-10-07 04:36:09 +00:00
ff5faa744a Remove unused THPXXX macros (#164660)
These macros are not used in OSS.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164660
Approved by: https://github.com/albanD
2025-10-07 04:04:21 +00:00
4725871a81 Return fake mode from export graph capture API (#164730)
This PR temporarily unblocks various experiments that re-use the fake mode Dynamo creates. Note that this is still not what we want as the end state. The end state should look something like:
```
out = fullgraph_capture(mod, inputs)
fake_mode = out.backend_inputs.fake_mode
gm = out.module()
```
This doesn't work today because export requires wrapping the original module to set up a flat module to trace, for easier pytree handling. As a result, we would need to carry an export-specific flag in fullgraph_capture, which seems not ideal.
Regardless, the end state is that we need to give downstream users a graph module and a fake mode in some form, so I think having _dynamo_graph_capture_for_export return the fake mode within the graph module itself via gm.meta is a reasonable interim step.
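
As a rough, self-contained illustration of the storage/consumption pattern described above (the `meta` key name and the manual stashing of the mode are assumptions for illustration; in the PR the graph module comes from the capture API and already carries Dynamo's fake mode):

```python
import torch
from torch._subclasses.fake_tensor import FakeTensorMode

# Stand-in graph module; the real one would be produced by the export capture API.
gm = torch.fx.symbolic_trace(torch.nn.Linear(2, 2))
gm.meta["fake_mode"] = FakeTensorMode()   # key name assumed

# A downstream consumer fetches the mode and fakifies real inputs with it.
fake_mode = gm.meta["fake_mode"]
fake_inp = fake_mode.from_tensor(torch.randn(3, 2))
```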

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164730
Approved by: https://github.com/avikchaudhuri
2025-10-07 03:42:46 +00:00
bcd96cc6ff [annotate] Copy fwd to bwd metadata for subgraphs as well (#164795)
The test is in the next PR. My older PR on dynamo annotate, https://github.com/pytorch/pytorch/pull/164678, is being reverted for reasons that are not yet clear, so it is difficult to add a test in this PR right now. When I reland it, I can add a test for this as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164795
Approved by: https://github.com/yushangdi
ghstack dependencies: #164772
2025-10-07 02:42:47 +00:00
50e077beaa Fix outdated info in requirements-ci.txt (#164441)
Fixes installation instructions and descriptions for `numba` and `scikit-image`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164441
Approved by: https://github.com/albanD
2025-10-07 02:10:41 +00:00
56d66ac0d7 Make custom op alias check consistent (#164576)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164576
Approved by: https://github.com/soulitzer
2025-10-07 02:05:09 +00:00
49f7d8d19d [ROCm] Fix test_cuda_synchronize failure on ROCm (#164735)
This PR skips the hipify step of torch/csrc/jit/ir/ir.h to avoid a build-time error for the JIT cuda namespace.  This fixes two skipped tests in test/jit/test_cuda.py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164735
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-10-07 01:14:24 +00:00
afee8062d5 Revert "Fix mesh.get_local_rank when it is > 1d (#164473)"
This reverts commit 83d71dfb2fd993a6242372b8123549acaa85ffdb.

Reverted https://github.com/pytorch/pytorch/pull/164473 on behalf of https://github.com/izaitsevfb due to appears to be causing vision_maskrcnn regression ([comment](https://github.com/pytorch/pytorch/pull/164473#issuecomment-3374738997))
2025-10-07 00:37:41 +00:00
e89d12bf5d Numpy zerotensor handling (#164487)
Fixes #89034

Updated tensor_to_numpy() function in tensor_numpy.cpp to handle ZeroTensors by throwing an error if force=False and returning an array full of zeros if force=True.
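
A short usage sketch of the behavior described above (assuming `torch._efficientzerotensor` to construct a ZeroTensor; the exact exception type is not stated in this message):

```python
import torch

z = torch._efficientzerotensor((2, 3))   # ZeroTensor: all zeros, no real storage

print(z.numpy(force=True))               # with force=True: a (2, 3) array of zeros

try:
    z.numpy()                            # with force=False: expected to raise
except (RuntimeError, TypeError) as e:   # exception type assumed
    print("refused to convert ZeroTensor:", e)
```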

@ngimel, I just saw that you mentioned PyTorch is not too concerned with this issue but I had already worked on it so I figured I would push it anyways and see what you thought. Feel free to close the PR if you think it is not worth merging.

@albanD

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164487
Approved by: https://github.com/izaitsevfb
2025-10-07 00:34:14 +00:00
d4752bc7f6 [caffe2] tweak Unpickler::readInstruction handling TUPLE (#164764)
Summary: Creating the vector was a bit awkward. Use the natural iterator-pair constructor with move-iterators.

Test Plan: CI.

Reviewed By: dolpm

Differential Revision: D83995108

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164764
Approved by: https://github.com/drisspg
2025-10-07 00:18:10 +00:00
44a5d41993 [ROCm] add gfx1150 gfx1151 to supported gemm lists (#164744)
This is one of a few PRs needed to address https://github.com/pytorch/pytorch/pull/164744 fully.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164744
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-10-07 00:02:23 +00:00
361c5d362c [fx][traceback] Actually disable preservation of node metadata when enable=False (#164772)
This will come in handy when we run graph passes that add new nodes, and
create_proxy can add seq_nr meta.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164772
Approved by: https://github.com/SherlockNoMad
2025-10-06 23:39:12 +00:00
1fc71d1b57 Revert "Numpy zerotensor handling (#164487)"
This reverts commit f7ad6dbad67161333a1473d1e0b478b7475a0ec1.

Reverted https://github.com/pytorch/pytorch/pull/164487 on behalf of https://github.com/malfet due to Did it break torchbench?, see 8c728e129d/1 ([comment](https://github.com/pytorch/pytorch/pull/164487#issuecomment-3374635051))
2025-10-06 23:32:12 +00:00
8f54e27e5d [ROCm][CI] rebuild magma binary for gfx1150 gfx1151 (#164782)
After #164763 added gfx1150 and gfx1151 to the list of targets, this PR triggers a rebuild of the magma binary for ROCm 7 with the new targets.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164782
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-10-06 23:29:21 +00:00
8c0bc879b9 Reapply "C++-accessible Placements via pybind11 (#163030)" (#164519)
This makes Placement data representation available in C++ via pybind11. Reapply with fix for internal errors.

D83788896

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164519
Approved by: https://github.com/Skylion007, https://github.com/ezyang
2025-10-06 23:19:14 +00:00
746fe78ecd [CUDA] Add experimental green context support for SM carveout (#159104)
Low-level PyTorch APIs should be usable/stable enough at this point but we might move the underlying driver API usage a bit from here...

Built on top of @drisspg 's branch

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159104
Approved by: https://github.com/ngimel

Co-authored-by: drisspg <drisspguessous@gmail.com>
2025-10-06 23:11:23 +00:00
b63bbe1661 Remove old ROCm version check in tests (#164245)
This PR removes ROCm<6 version checks.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164245
Approved by: https://github.com/jeffdaily
2025-10-06 22:42:01 +00:00
3912ba3e94 Revert "Fix refine_ranges corner case (#164075)"
This reverts commit 27234792add2ee9bedd84ca02dbf34f8f244bc5c.

Reverted https://github.com/pytorch/pytorch/pull/164075 on behalf of https://github.com/izaitsevfb due to fails executorch builds, see [D83938444](https://www.internalfb.com/diff/D83938444) ([comment](https://github.com/pytorch/pytorch/pull/164075#issuecomment-3374430964))
2025-10-06 22:09:39 +00:00
cfc5cc17dc Revert "[dynamo] Support torch.fx.traceback.annotate (#164678)"
This reverts commit 2883b5ab773daf5861d43ff0b65be49a441ab3f9.

Reverted https://github.com/pytorch/pytorch/pull/164678 on behalf of https://github.com/izaitsevfb due to fails inductor:max_autotune tests internally, see D83948169 ([comment](https://github.com/pytorch/pytorch/pull/164678#issuecomment-3374407009))
2025-10-06 22:03:42 +00:00
fdc8ccc5bc Make Adam, AdamW work with nonzero-dim Tensor betas (#149939)
Fixes #147921

## Changes

- Convert tensor `betas` using `_to_scalar`
- Change annotation of `betas` param
- Change param type in docs
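
A small usage sketch of what this enables (illustrative shapes and values), as exercised by the test below:

```python
import torch

model = torch.nn.Linear(4, 4)
# betas passed as 1-element tensors (nonzero-dim) rather than Python floats
opt = torch.optim.Adam(
    model.parameters(),
    betas=(torch.tensor([0.9]), torch.tensor([0.999])),
)

loss = model(torch.randn(2, 4)).sum()
loss.backward()
opt.step()
```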

## Test Result

```bash
pytest -s test/test_optim.py -k test_tensor_lr -vv
```

![image](https://github.com/user-attachments/assets/312ee045-1e8b-4789-aa6e-ba63e6df7e81)

![image](https://github.com/user-attachments/assets/7e6ec274-645b-46b9-b1a6-2b340a685203)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149939
Approved by: https://github.com/janeyx99

Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
2025-10-06 22:03:25 +00:00
48b54b45d6 Replace pynvml with nvidia-ml-py in win-test.sh (#164681)
The pynvml package was deprecated in favor of nvidia-ml-py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164681
Approved by: https://github.com/Aidyn-A, https://github.com/eqy
2025-10-06 21:57:26 +00:00
677 changed files with 11505 additions and 11217 deletions

View File

@ -344,7 +344,7 @@ docker build \
--build-arg "NINJA_VERSION=${NINJA_VERSION:-}" \
--build-arg "KATEX=${KATEX:-}" \
--build-arg "ROCM_VERSION=${ROCM_VERSION:-}" \
--build-arg "PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH:-gfx90a;gfx942}" \
--build-arg "PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH:-gfx90a;gfx942;gfx1100}" \
--build-arg "IMAGE_NAME=${IMAGE_NAME}" \
--build-arg "UCX_COMMIT=${UCX_COMMIT}" \
--build-arg "UCC_COMMIT=${UCC_COMMIT}" \

View File

@ -1 +1 @@
27664085f804afc83df26f740bb46c365854f2c4
7416ffcb92cdbe98d9f97e4e6f95247e46dfc9fd

View File

@ -46,9 +46,9 @@ case ${DOCKER_TAG_PREFIX} in
BASE_TARGET=rocm
GPU_IMAGE=rocm/dev-ubuntu-22.04:${GPU_ARCH_VERSION}-complete
PYTORCH_ROCM_ARCH="gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201"
# add gfx950 conditionally starting in ROCm 7.0
# add gfx950, gfx115x conditionally starting in ROCm 7.0
if [[ "$GPU_ARCH_VERSION" == *"7.0"* ]]; then
PYTORCH_ROCM_ARCH="${PYTORCH_ROCM_ARCH};gfx950"
PYTORCH_ROCM_ARCH="${PYTORCH_ROCM_ARCH};gfx950;gfx1150;gfx1151"
fi
DOCKER_GPU_BUILD_ARG="--build-arg PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH} --build-arg ROCM_VERSION=${GPU_ARCH_VERSION}"
;;

View File

@ -115,6 +115,9 @@ RUN env GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=True pip3 install grpcio
# cmake-3.28.0 from pip for onnxruntime
RUN python3 -mpip install cmake==3.28.0
ADD ./common/patch_libstdc.sh patch_libstdc.sh
RUN bash ./patch_libstdc.sh && rm patch_libstdc.sh
# build onnxruntime 1.21.0 from sources.
# it is not possible to build it from sources using pip,
# so just build it from upstream repository.

View File

@ -84,9 +84,9 @@ case ${image} in
DEVTOOLSET_VERSION="11"
GPU_IMAGE=rocm/dev-almalinux-8:${GPU_ARCH_VERSION}-complete
PYTORCH_ROCM_ARCH="gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201"
# add gfx950 conditionally starting in ROCm 7.0
# add gfx950, gfx115x conditionally starting in ROCm 7.0
if [[ "$GPU_ARCH_VERSION" == *"7.0"* ]]; then
PYTORCH_ROCM_ARCH="${PYTORCH_ROCM_ARCH};gfx950"
PYTORCH_ROCM_ARCH="${PYTORCH_ROCM_ARCH};gfx950;gfx1150;gfx1151"
fi
DOCKER_GPU_BUILD_ARG="--build-arg ROCM_VERSION=${GPU_ARCH_VERSION} --build-arg PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH} --build-arg DEVTOOLSET_VERSION=${DEVTOOLSET_VERSION}"
;;

View File

@ -120,9 +120,8 @@ ninja==1.11.1.4
numba==0.55.2 ; python_version == "3.10" and platform_machine != "s390x"
numba==0.60.0 ; python_version == "3.12" and platform_machine != "s390x"
#Description: Just-In-Time Compiler for Numerical Functions
#Pinned versions: 0.54.1, 0.49.0, <=0.49.1
#Pinned versions: 0.55.2, 0.60.0
#test that import: test_numba_integration.py
#For numba issue see https://github.com/pytorch/pytorch/issues/51511
#Need release > 0.61.2 for s390x due to https://github.com/numba/numba/pull/10073
#numpy
@ -242,10 +241,9 @@ pygments==2.15.0
#Pinned versions: 14.1.0
#test that import:
scikit-image==0.19.3 ; python_version < "3.10"
scikit-image==0.22.0 ; python_version >= "3.10"
scikit-image==0.22.0
#Description: image processing routines
#Pinned versions:
#Pinned versions: 0.22.0
#test that import: test_nn.py
#scikit-learn

View File

@ -5,7 +5,7 @@ DESIRED_ROCM ?= 7.0
DESIRED_ROCM_SHORT = $(subst .,,$(DESIRED_ROCM))
PACKAGE_NAME = magma-rocm
# inherit this from underlying docker image, do not pass this env var to docker
#PYTORCH_ROCM_ARCH ?= gfx900;gfx906;gfx908;gfx90a;gfx942;gfx950;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201
#PYTORCH_ROCM_ARCH ?= gfx900;gfx906;gfx908;gfx90a;gfx942;gfx950;gfx1030;gfx1100;gfx1101;gfx1102;gfx1150;gfx1151;gfx1200;gfx1201
DOCKER_RUN = set -eou pipefail; ${DOCKER_CMD} run --rm -i \
-v $(shell git rev-parse --show-toplevel)/.ci:/builder \
@ -18,7 +18,6 @@ DOCKER_RUN = set -eou pipefail; ${DOCKER_CMD} run --rm -i \
.PHONY: all
all: magma-rocm70
all: magma-rocm64
all: magma-rocm63
.PHONY:
clean:
@ -34,8 +33,3 @@ magma-rocm70:
magma-rocm64: DESIRED_ROCM := 6.4
magma-rocm64:
$(DOCKER_RUN)
.PHONY: magma-rocm63
magma-rocm63: DESIRED_ROCM := 6.3
magma-rocm63:
$(DOCKER_RUN)

View File

@ -67,7 +67,7 @@ fi
# wheels with cxx11-abi
echo "Checking that the gcc ABI is what we expect"
if [[ "$(uname)" != 'Darwin' && "$(uname -m)" != "s390x" ]]; then
if [[ "$(uname)" != 'Darwin' ]]; then
# We also check that there are cxx11 symbols in libtorch
#
echo "Checking that symbols in libtorch.so have the right gcc abi"

View File

@ -886,7 +886,7 @@ test_inductor_torchbench_smoketest_perf() {
done
# Perform some "warm-start" runs for a few huggingface models.
for test in AlbertForQuestionAnswering AllenaiLongformerBase DistilBertForMaskedLM DistillGPT2 GoogleFnet YituTechConvBert; do
for test in AllenaiLongformerBase DistilBertForMaskedLM DistillGPT2 GoogleFnet YituTechConvBert; do
python benchmarks/dynamo/huggingface.py --accuracy --training --amp --inductor --device cuda --warm-start-latency \
--only $test --output "$TEST_REPORTS_DIR/inductor_warm_start_smoketest_$test.csv"
python benchmarks/dynamo/check_accuracy.py \

View File

@ -38,7 +38,7 @@ if [[ "$BUILD_ENVIRONMENT" == *cuda* ]]; then
fi
# TODO: Move this to .ci/docker/requirements-ci.txt
python -m pip install "psutil==5.9.1" "pynvml==11.4.1" "pytest-shard==0.1.2"
python -m pip install "psutil==5.9.1" nvidia-ml-py "pytest-shard==0.1.2"
run_tests() {
# Run nvidia-smi if available

View File

@ -28,6 +28,10 @@ runs:
echo "instance-type: $(get_ec2_metadata instance-type)"
echo "system info $(uname -a)"
- name: Print GPU info (if present)
shell: bash
run: if [ -f /usr/bin/nvidia-smi ]; then nvidia-smi; fi
- name: Check if in a container runner
shell: bash
id: check_container_runner
@ -82,37 +86,6 @@ runs:
# Prune all of the docker images
docker system prune -af
- name: Manually resolve download.pytorch.org
shell: bash
continue-on-error: true
run: |
set +e
set -x
PT_DOMAIN=download.pytorch.org
# TODO: Flaky access to download.pytorch.org https://github.com/pytorch/pytorch/issues/100400,
# cleaning this up once the issue is fixed. There are more than one resolved IP here, the last
# one is returned at random
RESOLVED_IP=$(dig -4 +short "${PT_DOMAIN}" | tail -n1)
if [ -z "${RESOLVED_IP}" ]; then
echo "Couldn't resolve ${PT_DOMAIN}, retrying with Google DNS..."
RESOLVED_IP=$(dig -4 +short "${PT_DOMAIN}" @8.8.8.8 | tail -n1)
if [ -z "${RESOLVED_IP}" ]; then
echo "Couldn't resolve ${PT_DOMAIN}, exiting..."
exit 1
fi
fi
if grep -r "${PT_DOMAIN}" /etc/hosts; then
# Clean up any old records first
sudo sed -i "/${PT_DOMAIN}/d" /etc/hosts
fi
echo "${RESOLVED_IP} ${PT_DOMAIN}" | sudo tee -a /etc/hosts
cat /etc/hosts
- name: Check that the docker daemon is running
shell: bash
continue-on-error: true

Binary file not shown.

View File

@ -18,6 +18,7 @@ class GitHubComment:
body_text: str
created_at: str
author_login: str
author_url: Optional[str]
author_association: str
editor_login: Optional[str]
database_id: int

Binary file not shown.

View File

@ -38,6 +38,7 @@ def mock_get_comments() -> list[GitHubComment]:
body_text="mock_body_text",
created_at="",
author_login="",
author_url=None,
author_association="",
editor_login=None,
database_id=1,
@ -48,6 +49,7 @@ def mock_get_comments() -> list[GitHubComment]:
body_text=" #" + LABEL_ERR_MSG_TITLE.replace("`", ""),
created_at="",
author_login=BOT_AUTHORS[1],
author_url=None,
author_association="",
editor_login=None,
database_id=2,

View File

@ -32,6 +32,7 @@ from trymerge import (
main as trymerge_main,
MandatoryChecksMissingError,
MergeRule,
PostCommentError,
RE_GHSTACK_DESC,
read_merge_rules,
remove_job_name_suffix,
@ -588,6 +589,23 @@ class TestTryMerge(TestCase):
self.assertEqual(mock_merge_base, pr.get_merge_base())
mocked_gh_fetch_merge_base.assert_called_once()
def test_app_can_revert(self, *args: Any) -> None:
pr = GitHubPR("pytorch", "pytorch", 164660)
repo = DummyGitRepo()
app_comment_id, impostor_comment_id = 3375785595, 3377647892
# Check that app can revert
self.assertIsNotNone(validate_revert(repo, pr, comment_id=app_comment_id))
# But impostor can not
self.assertRaises(
PostCommentError,
lambda: validate_revert(repo, pr, comment_id=impostor_comment_id),
)
# Despite its name being the name of the bot
self.assertEqual(
pr.get_comment_by_id(impostor_comment_id).author_login,
"pytorch-auto-revert",
)
@mock.patch("trymerge.gh_graphql", side_effect=mocked_gh_graphql)
@mock.patch("trymerge.gh_fetch_merge_base", return_value="")

View File

@ -234,6 +234,7 @@ query ($owner: String!, $name: String!, $number: Int!) {
createdAt
author {
login
url
}
authorAssociation
editor {
@ -1093,6 +1094,7 @@ class GitHubPR:
body_text=node["bodyText"],
created_at=node["createdAt"] if "createdAt" in node else "",
author_login=node["author"]["login"],
author_url=node["author"].get("url", None),
author_association=node["authorAssociation"],
editor_login=editor["login"] if editor else None,
database_id=node["databaseId"],
@ -2029,6 +2031,11 @@ def validate_revert(
# For some reason, one can not be a member of private repo, only CONTRIBUTOR
if pr.is_base_repo_private():
allowed_reverters.append("CONTRIBUTOR")
# Special case the pytorch-auto-revert app, which does not have an association
# But should be able to issue revert command
if comment.author_url == "https://github.com/apps/pytorch-auto-revert":
allowed_reverters.append("NONE")
if author_association not in allowed_reverters:
raise PostCommentError(
f"Will not revert as @{author_login} is not one of "

View File

@ -2,7 +2,7 @@ name: inductor-perf-nightly-h100
on:
schedule:
- cron: 15 0,12 * * 1-6
- cron: 15 0 * * 1-6
- cron: 0 7 * * 0
# NB: GitHub has an upper limit of 10 inputs here, so before we can sort it
# out, let try to run torchao cudagraphs_low_precision as part of cudagraphs

View File

@ -63,6 +63,7 @@ jobs:
# Same as the build job
python-version: 3.12.7
test-matrix: ${{ needs.macos-perf-py3-arm64-build.outputs.test-matrix }}
timeout-minutes: 300
disable-monitor: false
monitor-log-interval: 15
monitor-data-collect-interval: 4

View File

@ -128,6 +128,7 @@ jobs:
needs: get-label-type
with:
runner_prefix: "${{ needs.get-label-type.outputs.label-type }}"
runner: linux.2xlarge.memory
build-environment: linux-jammy-py3.10-clang18-asan
docker-image-name: ci-image:pytorch-linux-jammy-py3-clang18-asan
test-matrix: |

View File

@ -59,3 +59,29 @@ jobs:
docker-image: ${{ needs.linux-jammy-rocm-py3_10-build.outputs.docker-image }}
test-matrix: ${{ needs.linux-jammy-rocm-py3_10-build.outputs.test-matrix }}
secrets: inherit
linux-jammy-rocm-py3_10-gfx1100-test:
if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
permissions:
id-token: write
contents: read
name: linux-jammy-rocm-py3_10-gfx1100
uses: ./.github/workflows/_rocm-test.yml
needs:
- linux-jammy-rocm-py3_10-build
- target-determination
with:
build-environment: linux-jammy-rocm-py3.10
docker-image: ${{ needs.linux-jammy-rocm-py3_10-build.outputs.docker-image }}
test-matrix: |
{ include: [
{ config: "default", shard: 1, num_shards: 2, runner: "linux.rocm.gpu.gfx1100" },
{ config: "default", shard: 2, num_shards: 2, runner: "linux.rocm.gpu.gfx1100" },
]}
tests-to-include: >
test_nn test_torch test_cuda test_ops test_unary_ufuncs test_binary_ufuncs
test_autograd inductor/test_torchinductor inductor/test_kernel_benchmark
inductor/test_pad_mm inductor/test_benchmark_fusion inductor/test_aot_inductor
inductor/test_torchinductor inductor/test_decompose_mem_bound_mm
inductor/test_flex_attention inductor/test_max_autotune
secrets: inherit

2
.gitignore vendored
View File

@ -88,7 +88,7 @@ torch_compile_debug/
# Listed manually because some files in this directory are not generated
torch/testing/_internal/generated/annotated_fn_args.py
torch/testing/_internal/data/*.pt
torch/csrc/api/include/torch/version.h
torch/headeronly/version.h
torch/csrc/cudnn/cuDNN.cpp
torch/csrc/generated
torch/csrc/generic/TensorMethods.cpp

View File

@ -28,7 +28,7 @@ exclude_patterns = [
'torch/lib/**',
'venv/**',
'**/*.pyi',
"tools/experimental/dynamic_shapes/torchfuzz/**",
"tools/experimental/torchfuzz/**",
'tools/test/test_selective_build.py',
]
command = [
@ -198,7 +198,7 @@ exclude_patterns = [
'tools/test/gen_operators_yaml_test.py',
'tools/test/gen_oplist_test.py',
'tools/test/test_selective_build.py',
'tools/experimental/dynamic_shapes/torchfuzz/**',
'tools/experimental/torchfuzz/**',
]
command = [
'python3',

View File

@ -13,6 +13,9 @@ load(":build_variables.bzl", "jit_core_sources", "lazy_tensor_ts_sources", "libt
load(":ufunc_defs.bzl", "aten_ufunc_generated_cpu_kernel_sources", "aten_ufunc_generated_cpu_sources", "aten_ufunc_generated_cuda_sources")
load("//:tools/bazel.bzl", "rules")
# Export files for use by torch/headeronly (where version.h generation now lives)
exports_files(["version.txt"])
define_targets(rules = rules)
COMMON_COPTS = [
@ -690,7 +693,9 @@ cc_library(
"torch/csrc/*/generated/*.h",
"torch/csrc/jit/serialization/mobile_bytecode_generated.h",
] + torch_cuda_headers,
) + GENERATED_AUTOGRAD_CPP + [":version_h"],
) + GENERATED_AUTOGRAD_CPP + [
"//torch/headeronly:version_h",
],
includes = [
"third_party/kineto/libkineto/include",
"torch/csrc",

View File

@ -483,8 +483,8 @@ at::BlasBackend Context::blasPreferredBackend() {
#if ROCM_VERSION >= 60300
"gfx1100", "gfx1101", "gfx1200", "gfx1201", "gfx908",
#endif
#if ROCM_VERSION >= 60500
"gfx950"
#if ROCM_VERSION >= 70000
"gfx950", "gfx1150", "gfx1151"
#endif
};
for (auto index: c10::irange(detail::getCUDAHooks().deviceCount())) {
@ -587,20 +587,33 @@ void Context::setROCmFAPreferredBackend(at::ROCmFABackend b) {
rocm_fa_preferred_backend = b;
}
bool Context::allowFP16ReductionCuBLAS() const {
CuBLASReductionOption Context::allowFP16ReductionCuBLAS() const {
return allow_fp16_reduction_cublas;
}
void Context::setAllowFP16ReductionCuBLAS(bool b) {
allow_fp16_reduction_cublas = b;
CuBLASReductionOption inline get_reduction_option(bool allow_reduced_precision, bool allow_splitk) {
TORCH_CHECK(
!(allow_reduced_precision && !allow_splitk),
"allow_splitk=False is not supported when reduced precision reductions are enabled");
if (allow_reduced_precision) {
return CuBLASReductionOption::AllowReducedPrecisionWithSplitK;
} else if (allow_splitk) {
return CuBLASReductionOption::DisallowReducedPrecisionAllowSplitK;
} else {
return CuBLASReductionOption::DisallowReducedPrecisionDisallowSplitK;
}
}
bool Context::allowBF16ReductionCuBLAS() const {
void Context::setAllowFP16ReductionCuBLAS(bool allow_reduced_precision, bool allow_splitk) {
allow_fp16_reduction_cublas = get_reduction_option(allow_reduced_precision, allow_splitk);
}
CuBLASReductionOption Context::allowBF16ReductionCuBLAS() const {
return allow_bf16_reduction_cublas;
}
void Context::setAllowBF16ReductionCuBLAS(bool b) {
allow_bf16_reduction_cublas = b;
void Context::setAllowBF16ReductionCuBLAS(bool allow_reduced_precision, bool allow_splitk) {
allow_bf16_reduction_cublas = get_reduction_option(allow_reduced_precision, allow_splitk);
}
bool Context::allowFP16AccumulationCuBLAS() const {

View File

@ -38,6 +38,12 @@ namespace at {
class Tensor;
enum class TORCH_API Float32MatmulPrecision { HIGHEST, HIGH, MEDIUM };
enum class CuBLASReductionOption : uint8_t {
AllowReducedPrecisionWithSplitK = 0,
DisallowReducedPrecisionAllowSplitK = 1,
DisallowReducedPrecisionDisallowSplitK = 2,
};
enum class TORCH_API Float32Backend { GENERIC, CUDA, MKLDNN };
enum class TORCH_API Float32Op { ALL, CONV, RNN, MATMUL };
enum class TORCH_API Float32Precision { NONE, IEEE, TF32, BF16 };
@ -357,10 +363,14 @@ class TORCH_API Context {
void setAllowTF32CuBLAS(bool);
Float32MatmulPrecision float32MatmulPrecision() const;
Float32Precision float32Precision(Float32Backend backend, Float32Op op) const;
bool allowFP16ReductionCuBLAS() const;
void setAllowFP16ReductionCuBLAS(bool);
bool allowBF16ReductionCuBLAS() const;
void setAllowBF16ReductionCuBLAS(bool);
CuBLASReductionOption allowFP16ReductionCuBLAS() const;
void setAllowFP16ReductionCuBLAS(
bool allow_reduced_precision,
bool allow_splitk = true);
CuBLASReductionOption allowBF16ReductionCuBLAS() const;
void setAllowBF16ReductionCuBLAS(
bool allow_reduced_precision,
bool allow_splitk = true);
bool allowFP16AccumulationCuBLAS() const;
void setAllowFP16AccumulationCuBLAS(bool);
@ -452,8 +462,10 @@ class TORCH_API Context {
: at::Float32MatmulPrecision::HIGHEST;
int benchmark_limit_cudnn = 10;
bool allow_tf32_cudnn = true;
bool allow_fp16_reduction_cublas = true;
bool allow_bf16_reduction_cublas = true;
CuBLASReductionOption allow_fp16_reduction_cublas =
CuBLASReductionOption::AllowReducedPrecisionWithSplitK;
CuBLASReductionOption allow_bf16_reduction_cublas =
CuBLASReductionOption::AllowReducedPrecisionWithSplitK;
bool allow_fp16_accumulation_cublas = false;
std::optional<int32_t> sm_carveout = std::nullopt;
bool enabled_mkldnn = true;

View File

@ -229,14 +229,14 @@ struct TORCH_API SparseTensorImpl : public TensorImpl {
}
void resize_(int64_t sparse_dim, int64_t dense_dim, ArrayRef<int64_t> size) {
return _resize_(sparse_dim, dense_dim, size);
_resize_(sparse_dim, dense_dim, size);
}
void resize_(
int64_t sparse_dim,
int64_t dense_dim,
ArrayRef<c10::SymInt> size) {
return _resize_(sparse_dim, dense_dim, size);
_resize_(sparse_dim, dense_dim, size);
}
// NOTE: this function will resize the sparse tensor and also set `indices`

View File

@ -59,7 +59,7 @@ static inline void set_item(const Tensor& self, ArrayRef<TensorIndex> indices, c
}
}
return set_item(self, indices, value);
set_item(self, indices, value);
}
} // namespace indexing

View File

@ -765,7 +765,8 @@ void TensorIteratorBase::for_each(loop2d_t loop, int64_t grain_size) {
if (numel == 0) {
return;
} else if (numel < grain_size || at::get_num_threads() == 1) {
return serial_for_each(loop, {0, numel});
serial_for_each(loop, {0, numel});
return;
} else {
at::parallel_for(0, numel, grain_size, [&](int64_t begin, int64_t end) {
serial_for_each(loop, {begin, end});

View File

@ -49,7 +49,7 @@ static void check_unique_names(DimnameList names) {
}
void check_names_valid_for(const TensorBase& tensor, DimnameList names) {
return impl::check_names_valid_for(tensor.unsafeGetTensorImpl(), names);
impl::check_names_valid_for(tensor.unsafeGetTensorImpl(), names);
}
void check_names_valid_for(size_t tensor_dim, DimnameList names) {

View File

@ -138,7 +138,7 @@ void Tensor::_backward(TensorList inputs,
const std::optional<Tensor>& gradient,
std::optional<bool> keep_graph,
bool create_graph) const {
return impl::GetVariableHooks()->_backward(*this, inputs, gradient, keep_graph, create_graph);
impl::GetVariableHooks()->_backward(*this, inputs, gradient, keep_graph, create_graph);
}
const TensorBase& TensorBase::requires_grad_(bool _requires_grad) const {
@ -173,4 +173,12 @@ unsigned TensorBase::_register_hook(std::function<TensorBase(const TensorBase&)>
return impl::GetVariableHooks()->_register_hook(*this, std::move(hook));
}
std::optional<ScalarType> TensorBase::grad_dtype() const {
return impl::GetVariableHooks()->grad_dtype(*this);
}
void TensorBase::set_grad_dtype(const std::optional<ScalarType>& grad_dtype) const {
return impl::GetVariableHooks()->set_grad_dtype(*this, grad_dtype);
}
} // namespace at

View File

@ -930,6 +930,10 @@ public:
const TensorBase& requires_grad_(bool _requires_grad=true) const;
std::optional<ScalarType> grad_dtype() const;
void set_grad_dtype(const std::optional<ScalarType>& grad_dtype) const;
// View Variables
//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

View File

@ -68,6 +68,8 @@ struct TORCH_API VariableHooksInterface {
const c10::OperatorHandle& op,
c10::DispatchKeySet dispatch_keys,
torch::jit::Stack* stack) const = 0;
virtual std::optional<c10::ScalarType> grad_dtype(const TensorBase&) const = 0;
virtual void set_grad_dtype(const TensorBase&, const std::optional<c10::ScalarType>&) const = 0;
};
TORCH_API void SetVariableHooks(VariableHooksInterface* hooks);

View File

@ -496,7 +496,7 @@ class TORCH_API OperatorHandle {
}
void checkInvariants() const {
return operatorDef_->op.checkInvariants();
operatorDef_->op.checkInvariants();
}
c10::ArrayRef<at::Tag> getTags() const {
@ -932,7 +932,7 @@ inline void Dispatcher::redispatchBoxed(
}
#endif
const auto& kernel = entry.lookup(dispatchKeySet);
return kernel.callBoxed(op, dispatchKeySet, stack);
kernel.callBoxed(op, dispatchKeySet, stack);
}
} // namespace c10

View File

@ -422,18 +422,34 @@ static inline bool bgemm_internal_cublaslt(CUDABLAS_BGEMM_ARGTYPES_AND_C_DTYPE(D
abType = CUDA_R_16F;
cType = (std::is_same_v<C_Dtype, float>) ? CUDA_R_32F : CUDA_R_16F;
#ifndef USE_ROCM
if (!at::globalContext().allowFP16ReductionCuBLAS()) {
preference.setAttribute(CUBLASLT_MATMUL_PREF_REDUCTION_SCHEME_MASK,
CUBLASLT_REDUCTION_SCHEME_COMPUTE_TYPE | CUBLASLT_REDUCTION_SCHEME_NONE);
auto fp16_reduction = at::globalContext().allowFP16ReductionCuBLAS();
if (fp16_reduction !=
at::CuBLASReductionOption::AllowReducedPrecisionWithSplitK) {
uint32_t mask =
fp16_reduction ==
at::CuBLASReductionOption::DisallowReducedPrecisionAllowSplitK
? (CUBLASLT_REDUCTION_SCHEME_COMPUTE_TYPE |
CUBLASLT_REDUCTION_SCHEME_NONE)
: CUBLASLT_REDUCTION_SCHEME_NONE;
preference.setAttribute(
CUBLASLT_MATMUL_PREF_REDUCTION_SCHEME_MASK, mask);
}
#endif
} else if constexpr (std::is_same_v<Dtype, at::BFloat16>) {
abType = CUDA_R_16BF;
cType = (std::is_same_v<C_Dtype, float>) ? CUDA_R_32F : CUDA_R_16BF;
#ifndef USE_ROCM
if (!at::globalContext().allowBF16ReductionCuBLAS()) {
preference.setAttribute(CUBLASLT_MATMUL_PREF_REDUCTION_SCHEME_MASK,
CUBLASLT_REDUCTION_SCHEME_COMPUTE_TYPE | CUBLASLT_REDUCTION_SCHEME_NONE);
auto bf16_reduction = at::globalContext().allowBF16ReductionCuBLAS();
if (bf16_reduction !=
at::CuBLASReductionOption::AllowReducedPrecisionWithSplitK) {
uint32_t mask =
bf16_reduction ==
at::CuBLASReductionOption::DisallowReducedPrecisionAllowSplitK
? (CUBLASLT_REDUCTION_SCHEME_COMPUTE_TYPE |
CUBLASLT_REDUCTION_SCHEME_NONE)
: CUBLASLT_REDUCTION_SCHEME_NONE;
preference.setAttribute(
CUBLASLT_MATMUL_PREF_REDUCTION_SCHEME_MASK, mask);
}
#endif
} else {
@ -1120,8 +1136,15 @@ inline void gemm_internal_cublas_half_helper(CUDABLAS_GEMM_ARGTYPES_AND_C_DTYPE(
}
if (prop->major >= 5) {
cublasMath_t cublas_flags = CUBLAS_DEFAULT_MATH;
if (!at::globalContext().allowFP16ReductionCuBLAS()) {
cublas_flags = static_cast<cublasMath_t>(cublas_flags | CUBLAS_MATH_DISALLOW_REDUCED_PRECISION_REDUCTION);
auto fp16_reduction = at::globalContext().allowFP16ReductionCuBLAS();
TORCH_CHECK(fp16_reduction !=
at::CuBLASReductionOption::DisallowReducedPrecisionDisallowSplitK,
"torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction("
"..., allow_splitk=False) requires the cuBLASLt backend");
if (fp16_reduction !=
at::CuBLASReductionOption::AllowReducedPrecisionWithSplitK) {
cublas_flags = static_cast<cublasMath_t>(
cublas_flags | CUBLAS_MATH_DISALLOW_REDUCED_PRECISION_REDUCTION);
}
// Disallow fp16 reductions that could lead to unexpected overflow issues.
TORCH_CUDABLAS_CHECK(cublasSetMathMode(handle, cublas_flags));
@ -1180,8 +1203,15 @@ inline void gemm_internal_cublas_bfloat16_helper(CUDABLAS_GEMM_ARGTYPES_AND_C_DT
GEMM_CHECK_ARGVALUES(at::BFloat16);
#ifndef USE_ROCM
cublasMath_t cublas_flags = CUBLAS_DEFAULT_MATH;
if (!at::globalContext().allowBF16ReductionCuBLAS()) {
cublas_flags = static_cast<cublasMath_t>(cublas_flags | CUBLAS_MATH_DISALLOW_REDUCED_PRECISION_REDUCTION);
auto bf16_reduction = at::globalContext().allowBF16ReductionCuBLAS();
TORCH_CHECK(bf16_reduction !=
at::CuBLASReductionOption::DisallowReducedPrecisionDisallowSplitK,
"torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction("
"..., allow_splitk=False) requires the cuBLASLt backend");
if (bf16_reduction !=
at::CuBLASReductionOption::AllowReducedPrecisionWithSplitK) {
cublas_flags = static_cast<cublasMath_t>(
cublas_flags | CUBLAS_MATH_DISALLOW_REDUCED_PRECISION_REDUCTION);
}
#endif
#if defined(USE_ROCM)
@ -1270,7 +1300,7 @@ void gemm_internal<float>(CUDABLAS_GEMM_ARGTYPES(float))
}
#if defined(USE_ROCM) && defined(USE_ROCM_CK_GEMM)
else if (at::globalContext().blasPreferredBackend() == BlasBackend::Ck) {
if (at::detail::getCUDAHooks().isGPUArch({"gfx1100"})) { //no CK GEMM version for gfx1100
if (at::detail::getCUDAHooks().isGPUArch({"gfx11", "gfx12"})) { //no CK GEMM version
gemm_internal_cublaslt<float>(CUDABLAS_GEMM_ARGS(float));
} else{
at::native::gemm_internal_ck<float>(CUDABLAS_GEMM_ARGS(float));
@ -1577,18 +1607,34 @@ bool gemm_and_bias(
abType = CUDA_R_16F;
cType = (std::is_same_v<C_Dtype, float>) ? CUDA_R_32F : CUDA_R_16F;
#ifndef USE_ROCM
if (!at::globalContext().allowFP16ReductionCuBLAS()) {
preference.setAttribute(CUBLASLT_MATMUL_PREF_REDUCTION_SCHEME_MASK,
CUBLASLT_REDUCTION_SCHEME_COMPUTE_TYPE | CUBLASLT_REDUCTION_SCHEME_NONE);
auto fp16_reduction = at::globalContext().allowFP16ReductionCuBLAS();
if (fp16_reduction !=
at::CuBLASReductionOption::AllowReducedPrecisionWithSplitK) {
uint32_t mask =
fp16_reduction ==
at::CuBLASReductionOption::DisallowReducedPrecisionAllowSplitK
? (CUBLASLT_REDUCTION_SCHEME_COMPUTE_TYPE |
CUBLASLT_REDUCTION_SCHEME_NONE)
: CUBLASLT_REDUCTION_SCHEME_NONE;
preference.setAttribute(
CUBLASLT_MATMUL_PREF_REDUCTION_SCHEME_MASK, mask);
}
#endif
} else if constexpr (std::is_same_v<Dtype, at::BFloat16>) {
abType = CUDA_R_16BF;
cType = (std::is_same_v<C_Dtype, float>) ? CUDA_R_32F : CUDA_R_16BF;
#ifndef USE_ROCM
if (!at::globalContext().allowBF16ReductionCuBLAS()) {
preference.setAttribute(CUBLASLT_MATMUL_PREF_REDUCTION_SCHEME_MASK,
CUBLASLT_REDUCTION_SCHEME_COMPUTE_TYPE | CUBLASLT_REDUCTION_SCHEME_NONE);
auto bf16_reduction = at::globalContext().allowBF16ReductionCuBLAS();
if (bf16_reduction !=
at::CuBLASReductionOption::AllowReducedPrecisionWithSplitK) {
uint32_t mask =
bf16_reduction ==
at::CuBLASReductionOption::DisallowReducedPrecisionAllowSplitK
? (CUBLASLT_REDUCTION_SCHEME_COMPUTE_TYPE |
CUBLASLT_REDUCTION_SCHEME_NONE)
: CUBLASLT_REDUCTION_SCHEME_NONE;
preference.setAttribute(
CUBLASLT_MATMUL_PREF_REDUCTION_SCHEME_MASK, mask);
}
#endif
}

View File

@ -326,6 +326,23 @@ bool CUDAHooks::supportsBFloat16ConvolutionWithCuDNNv8() const {
#endif
}
bool CUDAHooks::supportsBFloat16RNNWithCuDNN() const {
#if AT_CUDNN_ENABLED() && (CUDNN_VERSION >= 91300)
if (!hasCUDA()) {
return false;
}
cudaDeviceProp* prop = at::cuda::getCurrentDeviceProperties();
// Requires compute capability 8.0 (Ampere) or newer
if (prop->major >= 8) {
return true;
} else {
return false;
}
#else
return false;
#endif
}
long CUDAHooks::versionCuDNN() const {
#if AT_CUDNN_ENABLED()
return CUDNN_VERSION;

View File

@ -45,6 +45,7 @@ struct CUDAHooks : public at::CUDAHooksInterface {
bool supportsDilatedConvolutionWithCuDNN() const override;
bool supportsDepthwiseConvolutionWithCuDNN() const override;
bool supportsBFloat16ConvolutionWithCuDNNv8() const override;
bool supportsBFloat16RNNWithCuDNN() const override;
bool hasCUDART() const override;
long versionCUDART() const override;
long versionCuDNN() const override;

View File

@ -166,6 +166,10 @@ struct TORCH_API CUDAHooksInterface : AcceleratorHooksInterface {
return false;
}
virtual bool supportsBFloat16RNNWithCuDNN() const {
return false;
}
virtual long versionCuDNN() const {
TORCH_CHECK(false, "Cannot query cuDNN version without ATen_cuda library. ", CUDA_HELP);
}

View File

@ -465,11 +465,11 @@ static void dynamicLayerBack(const c10::OperatorHandle& op, torch::jit::Stack* s
// used for functions that have aliasing operations but should be treated like they're out of place (i.e. lift_fresh)
static void dynamicLayerBackGradSpecialCase(const c10::OperatorHandle& op, torch::jit::Stack* stack) {
return dynamicLayerBack(op, stack, true);
dynamicLayerBack(op, stack, true);
}
static void dynamicLayerBackFallback(const c10::OperatorHandle& op, torch::jit::Stack* stack) {
return dynamicLayerBack(op, stack, false);
dynamicLayerBack(op, stack, false);
}
TORCH_LIBRARY_IMPL(_, FuncTorchDynamicLayerFrontMode, m) {

View File

@ -12,7 +12,7 @@
#define MPS_ERROR_NOT_COMPILED "PyTorch code is not compiled with MPS enabled"
#define MPS_ERROR_RUNTIME_TOO_LOW \
"The MPS backend is supported on MacOS 13.0+.", \
"The MPS backend is supported on MacOS 14.0+. ", \
"Current OS version can be queried using `sw_vers`"
#define MPS_ERROR_DOUBLE_NOT_SUPPORTED "Cannot convert a MPS Tensor to float64 dtype " \
"as the MPS framework doesn't support float64. Please use float32 instead."

View File

@ -375,7 +375,7 @@ static void bf16_gemv_trans(
const at::BFloat16 beta,
at::BFloat16* y,
const int incy) {
return bf16_gemv_trans_stub(kCPU, m, n, alpha, a, lda, x, incx, beta, y, incy);
bf16_gemv_trans_stub(kCPU, m, n, alpha, a, lda, x, incx, beta, y, incy);
}
template <>

View File

@ -70,7 +70,7 @@ inline void searchsorted_maybe_trim_input_tensors(
const Tensor& raw_boundaries) {
Tensor trimmed_sorter;
Tensor raw_sorter;
return searchsorted_maybe_trim_input_tensors(
searchsorted_maybe_trim_input_tensors(
trimmed_input,
trimmed_boundaries,
trimmed_sorter,

View File

@ -93,6 +93,12 @@ inline bool cond_cudnn_grid_sampler(
const TensorBase& input,
const TensorBase& grid
) {
auto st = input.scalar_type();
if (!(st == kDouble || st == kFloat || st == kHalf))
return false;
st = grid.scalar_type();
if (!(st == kDouble || st == kFloat || st == kHalf))
return false;
return (
at::native::cudnn_is_acceptable(input) &&
at::native::cudnn_is_acceptable(grid) &&

View File

@ -108,6 +108,13 @@ bool use_mkldnn(const Tensor& input, TensorList params, TensorList hx) {
return false;
}
bool use_cudnn(const Tensor& t) {
bool acceptable = at::cudnn_is_acceptable(t);
auto st = t.scalar_type();
bool bfloat16_cond = st == kBFloat16 && at::detail::getCUDAHooks().supportsBFloat16RNNWithCuDNN();
return acceptable && (bfloat16_cond || st == kDouble || st == kFloat || st == kHalf);
}
template<typename T>
using pair_of = std::pair<T, T>;
@ -1200,7 +1207,7 @@ std::tuple<Tensor, Tensor, Tensor, Tensor, Tensor> _thnn_fused_lstm_cell_backwar
bool train, \
bool bidirectional, \
bool batch_first) { \
if (at::cudnn_is_acceptable(_input)) { \
if (use_cudnn(_input)) { \
Tensor output, hy; \
NAME##_cudnn_stub( \
_input.device().type(), \
@ -1262,7 +1269,7 @@ std::tuple<Tensor, Tensor, Tensor, Tensor, Tensor> _thnn_fused_lstm_cell_backwar
double dropout_p, \
bool train, \
bool bidirectional) { \
if (at::cudnn_is_acceptable(data)) { \
if (use_cudnn(data)) { \
Tensor output, hy; \
NAME##_packed_cudnn_stub( \
data.device().type(), \
@ -1430,7 +1437,7 @@ std::tuple<Tensor, Tensor, Tensor> lstm(
TensorList _params, bool has_biases,
int64_t num_layers, double dropout_p, bool train, bool bidirectional, bool batch_first) {
TORCH_CHECK(hx.size() == 2, "lstm expects two hidden states");
if (at::cudnn_is_acceptable(_input)) {
if (use_cudnn(_input)) {
Tensor output, hy, cy;
lstm_cudnn_stub(_input.device().type(), output, hy, cy, _input, hx, _params, has_biases,
num_layers, dropout_p, train, bidirectional, batch_first);
@ -1491,7 +1498,7 @@ std::tuple<Tensor, Tensor, Tensor> lstm(
TensorList _params, bool has_biases,
int64_t num_layers, double dropout_p, bool train, bool bidirectional) {
TORCH_CHECK(hx.size() == 2, "lstm expects two hidden states");
if (at::cudnn_is_acceptable(data)) {
if (use_cudnn(data)) {
Tensor output, hy, cy;
lstm_packed_cudnn_stub(data.device().type(), output, hy, cy, data, batch_sizes, hx,
_params, has_biases, num_layers, dropout_p, train, bidirectional);

View File

@ -23,14 +23,6 @@
#include <ATen/Functions.h>
#include <ATen/NativeFunctions.h>
#else
#include <ATen/ops/_cast_Byte_native.h>
#include <ATen/ops/_cast_Char_native.h>
#include <ATen/ops/_cast_Double_native.h>
#include <ATen/ops/_cast_Float_native.h>
#include <ATen/ops/_cast_Half_native.h>
#include <ATen/ops/_cast_Int_native.h>
#include <ATen/ops/_cast_Long_native.h>
#include <ATen/ops/_cast_Short_native.h>
#include <ATen/ops/_dim_arange_native.h>
#include <ATen/ops/_efficientzerotensor_native.h>
#include <ATen/ops/_empty_affine_quantized.h>

View File

@ -91,9 +91,6 @@ bool cudnn_is_acceptable(const TensorBase& self) {
return false;
if (!self.is_cuda())
return false;
auto st = self.scalar_type();
if (!(st == kDouble || st == kFloat || st == kHalf))
return false;
if (!detail::getCUDAHooks().compiledWithCuDNN())
return false;
// cuDNN functions like grid_sampler returns CUDNN_STATUS_BAD_PARAM on empty

View File

@ -25,11 +25,11 @@
namespace at::native {
void _backward(const Tensor& self, TensorList inputs, const std::optional<Tensor>& gradient_opt, std::optional<bool> keep_graph, bool create_graph) {
return self._backward(inputs, gradient_opt, keep_graph, create_graph);
self._backward(inputs, gradient_opt, keep_graph, create_graph);
}
void set_data(Tensor& self, const Tensor& new_data) {
return self.set_data(new_data);
self.set_data(new_data);
}
Tensor data(const Tensor& self) {
@ -54,7 +54,7 @@ Tensor& requires_grad_(Tensor& self, bool _requires_grad) {
}
void retain_grad(Tensor& self) {
return self.retain_grad();
self.retain_grad();
}
bool retains_grad(const Tensor& self) {

View File

@ -300,7 +300,8 @@ void div_floor_kernel(TensorIteratorBase& iter) {
// In the special case of unsigned integer division, floor division is
// equivalent to truncation division (since the signs of the divisor and
// dividend are always the same)
return div_trunc_kernel(iter);
div_trunc_kernel(iter);
return;
} else if (isIntegralType(dtype, /*includeBool*/ false)) {
// There's no SIMD integer division, so don't try to vectorize it.
AT_DISPATCH_INTEGRAL_TYPES(dtype, "div_floor_cpu", [&]() {

View File

@ -749,21 +749,29 @@ void flip_kernel(TensorIterator& iter, const bool quantized) {
// });
if (iter_dtype == kByte) {
return cpu_hflip_vec<uint8_t>(iter);
cpu_hflip_vec<uint8_t>(iter);
return;
} else if (iter_dtype == kChar) {
return cpu_hflip_vec<int8_t>(iter);
cpu_hflip_vec<int8_t>(iter);
return;
} else if (iter_dtype == kInt) {
return cpu_hflip_vec<int32_t>(iter);
cpu_hflip_vec<int32_t>(iter);
return;
} else if (iter_dtype == kLong) {
return cpu_hflip_vec<int64_t>(iter);
cpu_hflip_vec<int64_t>(iter);
return;
} else if (iter_dtype == kShort) {
return cpu_hflip_vec<int16_t>(iter);
cpu_hflip_vec<int16_t>(iter);
return;
} else if (iter_dtype == kBool) {
return cpu_hflip_vec<bool>(iter);
cpu_hflip_vec<bool>(iter);
return;
} else if (iter_dtype == kFloat) {
return cpu_hflip_vec<float>(iter);
cpu_hflip_vec<float>(iter);
return;
} else if (iter_dtype == kDouble) {
return cpu_hflip_vec<double>(iter);
cpu_hflip_vec<double>(iter);
return;
}
}
// other dtypes (float16, bfloat16, complex) are handled by cpu_kernel_vec (see below)
@ -778,10 +786,12 @@ void flip_kernel(TensorIterator& iter, const bool quantized) {
c == input_strides_2[1] &&
c == iter.element_size(0) * iter.shape()[0] // checks if dim=1 is contiguous as well
) {
return cpu_hflip_channels_last_vec(iter);
cpu_hflip_channels_last_vec(iter);
return;
}
// Special case: vertical flip using memcpy (faster than generic cpu_kernel_vec)
return cpu_vflip_memcpy(iter);
cpu_vflip_memcpy(iter);
return;
}
AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3(kBool, kHalf, kBFloat16, iter.dtype(), "flip_cpu",

View File

@ -96,11 +96,14 @@ static void pow_tensor_scalar_kernel(
dtype == kBFloat16 || isComplexType(dtype)) {
// Dispatch to fast specialization for sqrt, rsqrt and reciprocal
if (exp_scalar.equal(.5)) {
return sqrt_kernel(iter);
sqrt_kernel(iter);
return;
} else if (exp_scalar.equal(-0.5)) {
return rsqrt_kernel(iter);
rsqrt_kernel(iter);
return;
} else if (exp_scalar.equal(-1.0)) {
return reciprocal_kernel(iter);
reciprocal_kernel(iter);
return;
}
}

View File

@ -256,10 +256,10 @@ static void norm_kernel_tensor_iterator_impl(
} else {
if (iter.input_dtype() == kHalf && iter.dtype(0) == kFloat) {
// type promotion that does cast and reduction in a single kernel
return norm_kernel_cpu_impl<at::Half, float>(iter, val);
norm_kernel_cpu_impl<at::Half, float>(iter, val); return;
} else if (iter.input_dtype() == kBFloat16 && iter.dtype(0) == kFloat) {
// type promotion that does cast and reduction in a single kernel
return norm_kernel_cpu_impl<at::BFloat16, float>(iter, val);
norm_kernel_cpu_impl<at::BFloat16, float>(iter, val); return;
}
AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND3(kHalf, kBFloat16, kComplexHalf, iter.input_dtype(), "norm_cpu", [&] {

View File

@ -428,10 +428,11 @@ void fp16_gemv_trans(
TORCH_INTERNAL_ASSERT_DEBUG_ONLY(incx == 1 && alpha == 1.0);
#if !defined(__aarch64__) || defined(__ARM_FEATURE_FP16_SCALAR_ARITHMETIC)
if (at::globalContext().allowFP16ReductionCPU()) {
return fp16_gemv_trans_fp16_arith_by_dot_products(m, n, a, lda, x, beta, y, incy);
fp16_gemv_trans_fp16_arith_by_dot_products(m, n, a, lda, x, beta, y, incy);
return;
}
#endif
return fp16_gemv_trans_fp32_arith_by_dot_products(m, n, a, lda, x, beta, y, incy);
fp16_gemv_trans_fp32_arith_by_dot_products(m, n, a, lda, x, beta, y, incy);
}
float bf16_dot_with_fp32_arith(const at::BFloat16* vec1, const at::BFloat16* vec2, int64_t len) {
@ -465,7 +466,7 @@ void bf16_gemv_trans(
at::BFloat16* y,
const int incy) {
TORCH_INTERNAL_ASSERT_DEBUG_ONLY(incx == 1 && alpha == 1.0 && beta == 0.0);
return bf16_gemv_trans_fp32_arith_by_dot_products(m, n, a, lda, x, y, incy);
bf16_gemv_trans_fp32_arith_by_dot_products(m, n, a, lda, x, y, incy);
}
float fp16_dot(

View File

@ -285,8 +285,8 @@ static bool isSupportedHipLtROCmArch(int index) {
#if ROCM_VERSION >= 60300
"gfx1100", "gfx1101", "gfx1200", "gfx1201", "gfx908",
#endif
#if ROCM_VERSION >= 60500
"gfx950"
#if ROCM_VERSION >= 70000
"gfx950", "gfx1150", "gfx1151"
#endif
};
return at::detail::getCUDAHooks().isGPUArch(archs, index);

View File

@ -59,7 +59,7 @@ constexpr uint64_t getDefaultMaxThreadsPerBlock() {
#ifdef USE_ROCM
#define SKIP_SORTED_INDICES 32
template <typename scalar_t, int SZ>
__global__ void indexing_backward_kernel(
__global__ void indexing_backward_kernel_many_indices(
const int64_t* sorted_indices, const int64_t* indices, const scalar_t* grad_output, scalar_t* grad_weight,
int64_t numel, int64_t stride, int64_t stride_before, int64_t outer_dim, bool accumulate) {
using opmath_t = at::opmath_type<scalar_t>;
@ -254,7 +254,8 @@ __global__ void indexing_backward_kernel_stride_1(
}
}
}
#else
#endif
template <typename scalar_t, int SZ>
__global__ void indexing_backward_kernel(
const int64_t* sorted_indices, const int64_t* indices, const scalar_t* grad_output, scalar_t* grad_weight,
@ -333,6 +334,7 @@ __global__ void indexing_backward_kernel(
}
}
#ifndef USE_ROCM
template <typename scalar_t>
__global__ void indexing_backward_kernel_stride_1(
const int64_t* sorted_indices, const int64_t* indices, const scalar_t* grad_output, scalar_t* grad_weight,
@ -780,11 +782,43 @@ void index_put_with_sort_kernel(Tensor & self, const c10::List<std::optional<Ten
kBool,
kBFloat16);
} else {
#ifdef USE_ROCM
if (num_indices >= 200000)
AT_DISPATCH_V2(
expandedValue.scalar_type(),
"indexing_backward_many_indices",
AT_WRAP([&] {
indexing_backward_kernel_many_indices<scalar_t, UNROLL><<<new_grid, block, smem_dups_size, stream>>>(
sorted_indices.const_data_ptr<int64_t>(),
orig_indices.const_data_ptr<int64_t>(),
expandedValue.const_data_ptr<scalar_t>(),
src_.mutable_data_ptr<scalar_t>(),
num_indices,
sliceSize,
strideBefore,
nElemBefore,
accumulate);
C10_CUDA_KERNEL_LAUNCH_CHECK();
}),
AT_EXPAND(AT_ALL_TYPES_AND_COMPLEX),
// AT_EXPAND(AT_FLOAT8_TYPES),
// TODO(#113663): clean up accumulation behavior in float8 dtypes, accumulate=True
// should not be supported here, then reenable AT_FLOAT8_DTYPES
kFloat8_e4m3fn,
kFloat8_e5m2,
kFloat8_e4m3fnuz,
kFloat8_e5m2fnuz,
kComplexHalf,
kHalf,
kBool,
kBFloat16);
else
#endif
AT_DISPATCH_V2(
expandedValue.scalar_type(),
"indexing_backward",
AT_WRAP([&] {
indexing_backward_kernel<scalar_t, UNROLL><<<KERNEL_GRID, block, KERNEL_SMEM, stream>>>(
indexing_backward_kernel<scalar_t, UNROLL><<<grid, block, 0, stream>>>(
sorted_indices.const_data_ptr<int64_t>(),
orig_indices.const_data_ptr<int64_t>(),
expandedValue.const_data_ptr<scalar_t>(),

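A simplified sketch of the dispatch introduced in this hunk of `index_put_with_sort_kernel`: on ROCm builds, once `num_indices` reaches the 200000 threshold from the diff, the many-indices kernel variant is launched; otherwise the generic kernel runs. `launch_many_indices` and `launch_generic` are hypothetical stand-ins for the `AT_DISPATCH_V2` blocks and kernel launches in the real code:

#include <cstdint>
#include <cstdio>

#define USE_ROCM 1  // assumption for the sketch; in PyTorch this comes from the build system

static void launch_many_indices(int64_t n) { std::printf("many-indices kernel, n=%lld\n", (long long)n); }
static void launch_generic(int64_t n)      { std::printf("generic kernel, n=%lld\n", (long long)n); }

void dispatch_indexing_backward(int64_t num_indices) {
#ifdef USE_ROCM
  // Large index counts take the ROCm-specific kernel variant.
  if (num_indices >= 200000) {
    launch_many_indices(num_indices);
    return;
  }
#endif
  launch_generic(num_indices);
}

int main() {
  dispatch_indexing_backward(1000);    // generic path
  dispatch_indexing_backward(500000);  // many-indices path on ROCm builds
  return 0;
}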
View File

@ -121,7 +121,7 @@ void cufft_set_plan_cache_max_size_impl(DeviceIndex device_index, int64_t max_si
"cufft_set_plan_cache_max_size: expected 0 <= device_index < ",
at::detail::getCUDAHooks().deviceCount(), "], but got device_index=",
device_index);
return cufft_get_plan_cache(device_index).resize(max_size);
cufft_get_plan_cache(device_index).resize(max_size);
}
int64_t cufft_get_plan_cache_size_impl(DeviceIndex device_index) {
@ -137,7 +137,7 @@ void cufft_clear_plan_cache_impl(DeviceIndex device_index) {
"cufft_clear_plan_cache: expected 0 <= device_index < ",
at::detail::getCUDAHooks().deviceCount(), "], but got device_index=",
device_index);
return cufft_get_plan_cache(device_index).clear();
cufft_get_plan_cache(device_index).clear();
}
} // namespace at::native::detail

View File

@ -230,7 +230,7 @@ constexpr int BLOCK_THREADS = 256;
constexpr int RADIX_BITS = 8;
constexpr int RADIX_DIGITS = 1 << RADIX_BITS; // 2 ^ RADIX_BITS
constexpr int RADIX_MASK = (RADIX_DIGITS - 1);
static_assert(RADIX_DIGITS <= BLOCK_THREADS, "radixFindKthValues kernel requires RADIX_DIGITS <= BLOCK_THREADS");
static_assert(RADIX_DIGITS <= BLOCK_THREADS, "RADIX_DIGITS must be <= BLOCK_THREADS");
constexpr int MIN_ITEMS_PER_THREAD = 4;
constexpr int MAX_ITEMS_PER_THREAD = 64;
@ -242,11 +242,10 @@ __global__ void fill(T* x, T value, IndexType size) {
}
}
// find the kth smallest value,
// for largest topk, k_to_find = slice_size - k + 1
// compute local histogram for each block
template <typename T, typename IndexType, typename Bitwise, int Dim>
C10_LAUNCH_BOUNDS_1(BLOCK_THREADS)
__global__ void radixFindKthValues(
__global__ void computeBlockDigitCounts(
at::cuda::detail::TensorInfo<const T, IndexType> input,
uint32_t slice_size,
uint32_t* ks_to_find, // size: num_slices, unused arg but for mysterious reasons perf is better when it's present
@ -321,12 +320,51 @@ __global__ void radixFindKthValues(
}
}
// compute global histogram and cumsum for each row
__global__ void computeDigitCumSum(
short* counts,
uint32_t* digit_cum_sum,
uint32_t blocks_per_slice) {
int tidx = threadIdx.x + blockIdx.x * blockDim.x;
int digit_idx = threadIdx.x;
uint32_t slice_idx = blockIdx.x;
typedef cub::BlockScan<uint32_t, RADIX_DIGITS> BlockScan;
__shared__ typename BlockScan::TempStorage scan_storage;
// accumulates counters from multiple blocks
uint32_t digit_count = 0;
if (threadIdx.x < RADIX_DIGITS) {
constexpr int HISTO_ACCUM_TILE = 4;
uint32_t rounds = blocks_per_slice / HISTO_ACCUM_TILE;
for (int iter = 0; iter < rounds; iter++) {
int base = HISTO_ACCUM_TILE * iter;
#pragma unroll
for (int j = 0; j < HISTO_ACCUM_TILE; j++) {
int blk = base + j;
digit_count += counts[(slice_idx * blocks_per_slice + blk) * RADIX_DIGITS + digit_idx];
}
}
for (int blk = HISTO_ACCUM_TILE * rounds; blk < blocks_per_slice; blk++) {
digit_count += counts[(slice_idx * blocks_per_slice + blk) * RADIX_DIGITS + digit_idx];
}
}
// compute the block-wide inclusive prefix sum
uint32_t digit_count_cumsum;
BlockScan(scan_storage).InclusiveSum(digit_count, digit_count_cumsum);
__syncthreads();
if (threadIdx.x < RADIX_DIGITS) {
digit_cum_sum[tidx] = digit_count_cumsum;
}
}
// Assumption: k can not be larger than UINT32_MAX
template <typename Bitwise, typename T>
C10_LAUNCH_BOUNDS_1(RADIX_DIGITS) // one thread per digit
__global__ void computeBlockwiseWithinKCounts(
Bitwise* desires_in, // size: num_slices
short* counts, // size: num_slices * blocks_per_slice * radix_digits
uint32_t* digit_cum_sum,
uint32_t* ks_to_find_in, // size: num_slices
uint32_t blocks_per_slice,
int current_bit,
@ -338,7 +376,7 @@ __global__ void computeBlockwiseWithinKCounts(
Bitwise* desires_out,
uint32_t num_blocks
) {
// This kernel should be launched with the same number of blocks as the `radixFindKthValues` kernel.
// This kernel should be launched with the same number of blocks as the `computeBlockDigitCounts` kernel.
int tidx = threadIdx.x;
uint32_t block_idx = getLinearBlockId<uint32_t>();
uint32_t slice_idx = block_idx / blocks_per_slice;
@ -351,36 +389,15 @@ __global__ void computeBlockwiseWithinKCounts(
if (block_idx >= num_blocks) {
return;
}
typedef cub::BlockScan<uint32_t, BLOCK_THREADS> BlockScan;
union __align__(16) TempStorage {
uint32_t digit_count_cumsum[RADIX_DIGITS]; // only used if this it the last block for this slice
typename BlockScan::TempStorage scan_storage;
};
__shared__ TempStorage temp_storage;
// accumulates counters from multiple blocks
uint32_t digit_count = 0;
if (tidx < RADIX_DIGITS) {
for (int blk = 0; blk < blocks_per_slice; ++blk) {
digit_count += counts[(slice_idx * blocks_per_slice + blk) * RADIX_DIGITS + tidx];
}
}
// compute the block-wide inclusive prefix sum
uint32_t digit_count_cumsum;
BlockScan(temp_storage.scan_storage).InclusiveSum(digit_count, digit_count_cumsum);
__syncthreads();
// every thread also need the perfix_sum of it's left value for comparison, so save a copy in shared mem
if (tidx < RADIX_DIGITS) {
temp_storage.digit_count_cumsum[tidx] = digit_count_cumsum;
}
__syncthreads();
__shared__ Bitwise desired;
uint32_t k_to_find = ks_to_find_in[slice_idx];
if (tidx < RADIX_DIGITS) {
uint32_t digit_count_cumsum_left = (tidx == 0) ? 0 : temp_storage.digit_count_cumsum[tidx - 1];
uint32_t position = slice_idx * RADIX_DIGITS + tidx;
uint32_t digit_count_cumsum = digit_cum_sum[position];
uint32_t digit_count_cumsum_left = (tidx == 0) ? 0 : digit_cum_sum[position - 1];
// if not the last pass: update desired and ks_to_find
// if last pass: write out the kth value
@ -466,7 +483,7 @@ template <typename Bitwise>
__global__ void computeBlockwiseKthCounts(
Bitwise* desires, // size: num_slices
short* counts, // size: num_slices * blocks_per_slice * radix_digits
uint32_t num_blocks, // the number of blocks used by `radixFindKthValues` kernel
uint32_t num_blocks, // the number of blocks used by `computeBlockDigitCounts` kernel
uint32_t blocks_per_slice,
// outputs:
uint32_t* kthCounts // size: num_slices * blocks_per_slice == num_blocks
@ -649,9 +666,7 @@ void launch(
T* kthValues = reinterpret_cast<T*>(kthValues_buffer.get());
TORCH_CHECK(blocks_per_slice <= std::numeric_limits<uint32_t>::max(), "blocks_per_slice larger than uint32 maximum is not supported");
auto semaphores_buffer = allocator.allocate(numInputSlices * sizeof(uint32_t));
uint32_t* semaphores = reinterpret_cast<uint32_t*>(semaphores_buffer.get());
AT_CUDA_CHECK(cudaMemsetAsync(semaphores, 0, numInputSlices * sizeof(uint32_t), stream));
auto ks_to_find_buffer = allocator.allocate(2 * numInputSlices * sizeof(uint32_t));
uint32_t* ks_to_find = reinterpret_cast<uint32_t*>(ks_to_find_buffer.get());
@ -668,6 +683,10 @@ void launch(
static_assert(MAX_ITEMS_PER_THREAD * BLOCK_THREADS < std::numeric_limits<short>::max(),
"blockwise counter too large");
auto digit_cum_sum_buffer = allocator.allocate(numInputSlices * RADIX_DIGITS * sizeof(uint32_t));
uint32_t* digit_cum_sum = reinterpret_cast<uint32_t*>(digit_cum_sum_buffer.get());
AT_CUDA_CHECK(cudaMemsetAsync(digit_cum_sum, 0, numInputSlices * RADIX_DIGITS * sizeof(uint32_t), stream));
#if CUB_SUPPORTS_SCAN_BY_KEY()
auto withinKCounts_buffer = allocator.allocate(num_blocks * sizeof(uint32_t));
uint32_t* withinKCounts = reinterpret_cast<uint32_t*>(withinKCounts_buffer.get());
@ -691,7 +710,7 @@ void launch(
// iterate radix bits for multiple passes
for (int current_bit = sizeof(T) * 8 - RADIX_BITS; current_bit >= 0; current_bit -= RADIX_BITS) {
radixFindKthValues<T, IndexType, Bitwise, Dim><<<grid, block, 0, stream>>>(
computeBlockDigitCounts<T, IndexType, Bitwise, Dim><<<grid, block, 0, stream>>>(
input,
inputSliceSize,
ks_to_find_in, // unused arg
@ -704,10 +723,14 @@ void launch(
desired_in,
counts);
C10_CUDA_KERNEL_LAUNCH_CHECK();
computeDigitCumSum<<<numInputSlices, RADIX_DIGITS, 0, stream>>>(counts, digit_cum_sum, blocks_per_slice);
C10_CUDA_KERNEL_LAUNCH_CHECK();
// we unconditionally call this kernel to update desired/ks_to_find/kthValues
// if cub supports scan_by_key we additionally do k counts
computeBlockwiseWithinKCounts<Bitwise, T><<<grid, RADIX_DIGITS, 0, stream>>>(
desired_in, counts, ks_to_find_in, blocks_per_slice, current_bit, largest, withinKCounts, kthValues, ks_to_find_out, desired_out, num_blocks);
desired_in, counts, digit_cum_sum, ks_to_find_in, blocks_per_slice, current_bit, largest, withinKCounts, kthValues, ks_to_find_out, desired_out, num_blocks);
C10_CUDA_KERNEL_LAUNCH_CHECK();
// swap desired/ks_to_find in and out for next iter
auto tmp_desired = desired_in;

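The refactor above moves the per-slice digit-count accumulation and prefix sum out of `computeBlockwiseWithinKCounts` into the new `computeDigitCumSum` kernel, which writes `digit_cum_sum` to global memory. A host-side sketch of the same idea, using `std::inclusive_scan` instead of `cub::BlockScan` and made-up counts, shows how the cumulative sums locate the digit of the k-th value and the k to search for in the next pass (mirroring the desired/ks_to_find update in the kernel):

#include <array>
#include <cstdint>
#include <cstdio>
#include <numeric>

int main() {
  // Total count of elements whose current radix digit equals d, for d = 0..7
  // (RADIX_DIGITS is 256 in the kernel; 8 keeps the example small).
  std::array<uint32_t, 8> digit_count{3, 0, 5, 2, 7, 1, 0, 4};
  std::array<uint32_t, 8> digit_cum_sum{};
  std::inclusive_scan(digit_count.begin(), digit_count.end(), digit_cum_sum.begin());

  uint32_t k_to_find = 9;  // looking for the k-th smallest under the current prefix
  for (size_t d = 0; d < digit_cum_sum.size(); ++d) {
    uint32_t left = (d == 0) ? 0 : digit_cum_sum[d - 1];
    if (left < k_to_find && k_to_find <= digit_cum_sum[d]) {
      // The k-th value has digit d; the next pass searches within this bucket
      // with k reduced by the cumulative count to its left.
      std::printf("digit=%zu, k for next pass=%u\n", d, k_to_find - left);
      break;
    }
  }
  return 0;
}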
View File

@ -1107,10 +1107,14 @@ void ldl_factor_kernel(
auto preferred_backend = at::globalContext().linalgPreferredBackend();
switch (preferred_backend) {
case at::LinalgBackend::Cusolver:
return ldl_factor_cusolver(
{ ldl_factor_cusolver(
LD, pivots, info, upper, hermitian);
return;
}
case at::LinalgBackend::Magma:
return ldl_factor_magma(LD, pivots, info, upper, hermitian);
{ ldl_factor_magma(LD, pivots, info, upper, hermitian);
return;
}
default:
// By default use cusolver if available and magma otherwise.
// If cusolver and magma 2.5.4+ are both available and hermitian=true,
@ -1122,8 +1126,10 @@ void ldl_factor_kernel(
LD, pivots, info, upper, hermitian);
}
#endif
return ldl_factor_cusolver(
LD, pivots, info, upper, hermitian);
{ ldl_factor_cusolver(
LD, pivots, info, upper, hermitian);
return;
}
#else
return ldl_factor_magma(LD, pivots, info, upper, hermitian);
#endif
@ -1839,11 +1845,14 @@ void geqrf_kernel(const Tensor& input, const Tensor& tau) {
// For the benchmarks see
// https://github.com/pytorch/pytorch/pull/56253#discussion_r622851107
if (input.size(-2) <= 256 && batchCount(input) >= std::max<int64_t>(2, input.size(-2) / 16)) {
return geqrf_batched_cublas(input, tau);
geqrf_batched_cublas(input, tau);
return;
} else {
return geqrf_cusolver(input, tau);
geqrf_cusolver(input, tau);
return;
}
return geqrf_batched_cublas(input, tau);
geqrf_batched_cublas(input, tau);
return;
};
auto preferred_backend = at::globalContext().linalgPreferredBackend();
@ -1856,10 +1865,14 @@ void geqrf_kernel(const Tensor& input, const Tensor& tau) {
// - ?geqrf_gpu allows fast computation of Q via ?orgqr_gpu, but doesn't give R properly.
// - ?geqrf2_gpu gives correct R, but doesn't allow computation of Q via ?orgqr_gpu
case at::LinalgBackend::Magma:
return geqrf_magma(input, tau);
{ geqrf_magma(input, tau);
return;
}
case at::LinalgBackend::Cusolver:
default:
return geqrf_cusolver_backend(input, tau);
{ geqrf_cusolver_backend(input, tau);
return;
}
}
#else
return geqrf_magma(input, tau);
@ -2703,13 +2716,17 @@ void gels_looped(const Tensor& a, Tensor& b, Tensor& infos) {
auto preferred_backend = at::globalContext().linalgPreferredBackend();
switch (preferred_backend) {
case at::LinalgBackend::Magma:
return gels_magma(a, b, infos);
{ gels_magma(a, b, infos);
return;
}
case at::LinalgBackend::Cusolver:
default:
// linalg_lstsq_gels is a generic function that is implemented using
// geqrf_stub, ormqr_stub, and triangular_solve_stub
// It dispatches to cuSOLVER for CUDA inputs if USE_LINALG_SOLVER is defined
return linalg_lstsq_gels(a, b, infos);
{ linalg_lstsq_gels(a, b, infos);
return;
}
}
#else
return gels_magma(a, b, infos);

View File

@ -337,8 +337,7 @@ struct BenchmarkCache {
engine_cache_order.begin(), engine_cache_order, it->second.second);
}
} else {
engine_cache.erase(key);
engine_cache.emplace(
engine_cache.insert_or_assign(
key,
std::make_pair(results, engine_cache_order.end())); // dummy iterator
}

View File

@ -371,8 +371,7 @@ struct MHAGraphCache {
}
void update(const KeyType& key, T& results) {
engine_cache.erase(key);
engine_cache.emplace(key, std::move(results));
engine_cache.insert_or_assign(key, std::move(results));
}
};
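Both cache updates above (`BenchmarkCache` and `MHAGraphCache`) switch from `erase()` followed by `emplace()` to a single `insert_or_assign()` call. A minimal sketch of the semantics on a `std::unordered_map` (C++17):

#include <string>
#include <unordered_map>

int main() {
  std::unordered_map<std::string, int> cache;
  cache.insert_or_assign("plan", 1);  // key absent: inserts
  cache.insert_or_assign("plan", 2);  // key present: overwrites the mapped value
  // Equivalent to the old two-step form:
  //   cache.erase("plan");
  //   cache.emplace("plan", 2);
  return 0;
}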

View File

@ -1222,7 +1222,7 @@ cudnnRNNAlgo_t get_algo(
}
cudnnDataType_t promote_rnn_math_type(cudnnDataType_t dtype) {
if (dtype == CUDNN_DATA_HALF) {
if (dtype == CUDNN_DATA_HALF || dtype == CUDNN_DATA_BFLOAT16) {
return CUDNN_DATA_FLOAT;
}
return dtype;

View File

@ -772,13 +772,21 @@ void dispatch_bfloat16_gemm_wmma(CUDABLAS_GEMM_ARGTYPES(at::BFloat16)) {
template <>
void gemm_internal_ck<at::BFloat16>(CUDABLAS_GEMM_ARGTYPES(at::BFloat16)) {
auto dprops = at::cuda::getCurrentDeviceProperties();
std::string_view arch(dprops->gcnArchName);
if (arch == "gfx1100") {
static const std::vector<std::string> wmma_archs = {
"gfx1100", "gfx1101", "gfx1102", "gfx1200", "gfx1201",
#if ROCM_VERSION >= 70000
"gfx1150", "gfx1151"
#endif
};
if (at::detail::getCUDAHooks().isGPUArch(wmma_archs)) {
dispatch_bfloat16_gemm_wmma(CUDABLAS_GEMM_ARGS(at::BFloat16));
} else{
}
else if (at::detail::getCUDAHooks().isGPUArch({"gfx9"})) {
dispatch_bfloat16_gemm(CUDABLAS_GEMM_ARGS(at::BFloat16));
}
else {
TORCH_CHECK(false, "gemm_internal_ck<at::BFloat16> unsupported gfx arch");
}
}
} // namespace at::native

View File

@ -599,11 +599,21 @@ void dispatch_half_gemm_wmma(CUDABLAS_GEMM_ARGTYPES(at::Half)) {
template <>
void gemm_internal_ck<at::Half>(CUDABLAS_GEMM_ARGTYPES(at::Half)) {
if (at::detail::getCUDAHooks().isGPUArch({"gfx1100"})) {
static const std::vector<std::string> wmma_archs = {
"gfx1100", "gfx1101", "gfx1102", "gfx1200", "gfx1201",
#if ROCM_VERSION >= 70000
"gfx1150", "gfx1151"
#endif
};
if (at::detail::getCUDAHooks().isGPUArch(wmma_archs)) {
dispatch_half_gemm_wmma(CUDABLAS_GEMM_ARGS(at::Half));
} else{
}
else if (at::detail::getCUDAHooks().isGPUArch({"gfx9"})) {
dispatch_half_gemm(CUDABLAS_GEMM_ARGS(at::Half));
}
else {
TORCH_CHECK(false, "gemm_internal_ck<at::Half> unsupported gfx arch");
}
}
} // namespace at::native
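Both the `at::Half` and `at::BFloat16` CK GEMM entry points now share the same arch-based dispatch: the WMMA kernels cover the RDNA parts (gfx11xx/gfx12xx, with gfx1150/gfx1151 gated on ROCm 7.0), the existing CK path covers gfx9, and anything else is rejected. A rough sketch of that selection, where `is_arch_in` is a hypothetical stand-in for `at::detail::getCUDAHooks().isGPUArch()`:

#include <cstdio>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

static bool is_arch_in(std::string_view arch, const std::vector<std::string>& prefixes) {
  for (const auto& p : prefixes) {
    if (arch.substr(0, p.size()) == std::string_view(p)) return true;  // prefix match for the sketch
  }
  return false;
}

void gemm_dispatch(std::string_view arch) {
  static const std::vector<std::string> wmma_archs = {
      // gfx1150/gfx1151 are gated on ROCM_VERSION >= 70000 in the real code.
      "gfx1100", "gfx1101", "gfx1102", "gfx1200", "gfx1201", "gfx1150", "gfx1151"};
  if (is_arch_in(arch, wmma_archs)) {
    std::printf("WMMA path\n");
  } else if (is_arch_in(arch, {"gfx9"})) {
    std::printf("CK gfx9 path\n");
  } else {
    throw std::runtime_error("unsupported gfx arch");
  }
}

int main() {
  gemm_dispatch("gfx1101");  // WMMA path
  gemm_dispatch("gfx942");   // CK gfx9 path
  return 0;
}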

View File

@ -373,59 +373,67 @@ void addmm_out_sparse_csr(
if (mat2.layout() == kSparseCsr) {
if (result.layout() == kStrided) {
// TODO: Add native CSC support via cuSPARSE if supported.
return addmm_dense_result(
addmm_dense_result(
mat2.transpose(0, 1).to_sparse_csr(),
mat1.transpose(0, 1),
beta,
alpha,
result.transpose(0, 1));
return;
}
}
if (mat2.layout() == kSparseCsc) {
if (result.layout() == kStrided) {
return addmm_dense_result(
addmm_dense_result(
mat2.transpose(-2, -1),
mat1.transpose(-2, -1),
beta,
alpha,
result.transpose(-2, -1));
return;
}
}
if (mat2.layout() == kSparseBsc) {
if (result.layout() == kStrided) {
return addmm_dense_result(
addmm_dense_result(
mat2.transpose(-2, -1),
mat1.transpose(-2, -1),
beta,
alpha,
result.transpose(-2, -1));
return;
}
}
}
if (mat1.layout() == kSparseCsr) {
if (mat2.layout() == kStrided) {
if (result.layout() == kStrided) {
return addmm_dense_result(mat1, mat2, beta, alpha, result);
addmm_dense_result(mat1, mat2, beta, alpha, result);
return;
}
}
if (mat2.layout() == kSparseCsr) {
if (result.layout() == kStrided) {
return addmm_sparse_input_dense_result(mat1, mat2, beta, alpha, result);
addmm_sparse_input_dense_result(mat1, mat2, beta, alpha, result);
return;
}
if (result.layout() == kSparseCsr) {
return addmm_sparse_result(mat1, mat2, beta, alpha, result);
addmm_sparse_result(mat1, mat2, beta, alpha, result);
return;
}
}
if (mat2.layout() == kSparseCsc) {
if (result.layout() == kStrided) {
// TODO: CSR @ CSC kernel would be very fast due to format alignment
return addmm_sparse_input_dense_result(
mat1, mat2.to_sparse_csr(), beta, alpha, result);
addmm_sparse_input_dense_result(
mat1, mat2.to_sparse_csr(), beta, alpha, result);
return;
}
if (result.layout() == kSparseCsr) {
// TODO: CSR @ CSC kernel would be very fast due to format alignment
return addmm_sparse_result(
mat1, mat2.to_sparse_csr(), beta, alpha, result);
addmm_sparse_result(
mat1, mat2.to_sparse_csr(), beta, alpha, result);
return;
}
}
}
@ -433,56 +441,62 @@ void addmm_out_sparse_csr(
if (mat2.layout() == kStrided) {
if (result.layout() == kStrided) {
// TODO: avoid csc->csr conversion with native csc support
return addmm_dense_result(
mat1.to_sparse_csr(), mat2, beta, alpha, result);
addmm_dense_result(
mat1.to_sparse_csr(), mat2, beta, alpha, result);
return;
}
}
if (mat2.layout() == kSparseCsr) {
if (result.layout() == kSparseCsr) {
// TODO: avoid csc->csr conversion with native csc support
return addmm_sparse_result(
mat1.to_sparse_csr(), mat2, beta, alpha, result);
addmm_sparse_result(
mat1.to_sparse_csr(), mat2, beta, alpha, result);
return;
}
}
if (mat2.layout() == kSparseCsc) {
if (result.layout() == kStrided) {
return addmm_sparse_input_dense_result(
mat2.transpose(-2, -1),
mat1.transpose(-2, -1),
beta,
alpha,
result.transpose(-2, -1));
addmm_sparse_input_dense_result(
mat2.transpose(-2, -1),
mat1.transpose(-2, -1),
beta,
alpha,
result.transpose(-2, -1));
return;
}
if (result.layout() == kSparseCsr) {
// TODO avoid csc->csr
return addmm_sparse_result(
mat1.to_sparse_csr(), mat2.to_sparse_csr(), beta, alpha, result);
addmm_sparse_result(
mat1.to_sparse_csr(), mat2.to_sparse_csr(), beta, alpha, result);
return;
}
if (result.layout() == kSparseCsc) {
return addmm_sparse_result(
mat2.transpose(-2, -1),
mat1.transpose(-2, -1),
beta,
alpha,
result.transpose(-2, -1));
addmm_sparse_result(
mat2.transpose(-2, -1),
mat1.transpose(-2, -1),
beta,
alpha,
result.transpose(-2, -1));
return;
}
}
}
if (mat1.layout() == kSparseBsr) {
if (mat2.layout() == kStrided) {
if (result.layout() == kStrided) {
return addmm_dense_result(mat1, mat2, beta, alpha, result);
addmm_dense_result(mat1, mat2, beta, alpha, result);
return;
}
}
}
TORCH_CHECK(
false,
"addmm: computation on CPU is not implemented for ",
result.layout(),
" + ",
mat1.layout(),
" @ ",
mat2.layout());
false,
"addmm: computation on CPU is not implemented for ",
result.layout(),
" + ",
mat1.layout(),
" @ ",
mat2.layout());
}
/*
@ -496,16 +510,16 @@ void addmm_out_sparse_csr(
[out] result of the operation.
*/
void addmv_out_sparse_csr(
const Tensor& mat,
const Tensor& vec,
const Scalar& beta,
const Scalar& alpha,
const Tensor& result) {
const Tensor& mat,
const Tensor& vec,
const Scalar& beta,
const Scalar& alpha,
const Tensor& result) {
#if !AT_USE_MKL_SPARSE()
TORCH_CHECK(
false,
"Calling addmv on a sparse CPU tensor requires Linux platform. ",
"Please use PyTorch built with MKL on Linux.");
false,
"Calling addmv on a sparse CPU tensor requires Linux platform. ",
"Please use PyTorch built with MKL on Linux.");
#else
c10::MaybeOwned<Tensor> result_ = prepare_dense_vector_for_mkl(result);
c10::MaybeOwned<Tensor> vec_ = prepare_dense_vector_for_mkl(vec);

View File

@ -5,38 +5,6 @@
# representing ScalarType's. They are now superseded by usage of
# `aten::to()`. The ops remain here for backward compatibility purposes.
# DEPRECATED. DO NOT USE
- func: _cast_Byte(Tensor self, bool non_blocking=False) -> Tensor
variants: function
# DEPRECATED. DO NOT USE
- func: _cast_Char(Tensor self, bool non_blocking=False) -> Tensor
variants: function
# DEPRECATED. DO NOT USE
- func: _cast_Double(Tensor self, bool non_blocking=False) -> Tensor
variants: function
# DEPRECATED. DO NOT USE
- func: _cast_Float(Tensor self, bool non_blocking=False) -> Tensor
variants: function
# DEPRECATED. DO NOT USE
- func: _cast_Int(Tensor self, bool non_blocking=False) -> Tensor
variants: function
# DEPRECATED. DO NOT USE
- func: _cast_Long(Tensor self, bool non_blocking=False) -> Tensor
variants: function
# DEPRECATED. DO NOT USE
- func: _cast_Short(Tensor self, bool non_blocking=False) -> Tensor
variants: function
# DEPRECATED. DO NOT USE
- func: _cast_Half(Tensor self, bool non_blocking=False) -> Tensor
variants: function
# Computes the gradient of current tensor w.r.t. graph leaves.
- func: _backward(Tensor self, Tensor[] inputs, Tensor? gradient=None, bool? retain_graph=None, bool create_graph=False) -> ()
manual_cpp_binding: True
@ -6725,12 +6693,12 @@
- func: native_norm(Tensor self, Scalar p=2) -> Tensor
dispatch:
SparseCPU, SparseCUDA: norm_sparse
SparseCPU, SparseCUDA, SparseMPS: norm_sparse
autogen: native_norm.out
- func: native_norm.ScalarOpt_dim_dtype(Tensor self, Scalar? p, int[1] dim, bool keepdim, ScalarType? dtype) -> Tensor
dispatch:
SparseCPU, SparseCUDA: norm_sparse
SparseCPU, SparseCUDA, SparseMPS: norm_sparse
autogen: native_norm.ScalarOpt_dim_dtype_out
- func: _batch_norm_with_update(Tensor input, Tensor? weight, Tensor? bias, Tensor(a!) running_mean, Tensor(b!) running_var, float momentum, float eps) -> (Tensor, Tensor, Tensor, Tensor)
@ -6856,14 +6824,14 @@
device_check: NoCheck # TensorIterator
variants: function, method
dispatch:
SparseCPU, SparseCUDA: sparse_dtype_norm
SparseCPU, SparseCUDA, SparseMPS: sparse_dtype_norm
- func: norm.ScalarOpt_dim(Tensor self, Scalar? p, int[1] dim, bool keepdim=False) -> Tensor
structured_delegate: norm.out
device_check: NoCheck # TensorIterator
variants: function, method
dispatch:
SparseCPU, SparseCUDA: sparse_norm
SparseCPU, SparseCUDA, SparseMPS: sparse_norm
- func: norm.dtype_out(Tensor self, Scalar? p, int[1] dim, bool keepdim, *, ScalarType dtype, Tensor(a!) out) -> Tensor(a!)
structured: True

View File

@ -810,7 +810,8 @@ void addmm_out_sparse_csr(
if (mat1.layout() == kSparseBsr) {
if (mat2.layout() == kStrided) {
if (result.layout() == kStrided)
return block_sparse_mm(input, mat1, mat2, beta, alpha, result);
{ block_sparse_mm(input, mat1, mat2, beta, alpha, result); return;
}
}
}
@ -819,13 +820,13 @@ void addmm_out_sparse_csr(
if (result.layout() == kStrided) {
auto result_t = result.transpose(-2, -1);
auto input_t = (result.is_same(input) ? result_t : input.transpose(-2, -1));
return block_sparse_mm(
block_sparse_mm(
input_t,
mat2.transpose(-2, -1),
mat1.transpose(-2, -1),
beta,
alpha,
result_t);
result_t); return;
}
}
}
@ -840,41 +841,41 @@ void addmm_out_sparse_csr(
if (mat2.layout() == kSparseCsr) {
if (result.layout() == kStrided) {
// TODO: Add native CSC support via cuSPARSE if supported.
return spmm(
spmm(
mat2.transpose(0, 1).to_sparse_csr(),
mat1.transpose(0, 1),
beta,
alpha,
result.transpose(0, 1));
result.transpose(0, 1)); return;
}
}
if (mat2.layout() == kSparseCsc) {
if (result.layout() == kStrided) {
return spmm(
spmm(
mat2.transpose(-2, -1),
mat1.transpose(-2, -1),
beta,
alpha,
result.transpose(-2, -1));
result.transpose(-2, -1)); return;
}
}
}
if (mat1.layout() == kSparseCsr) {
if (mat2.layout() == kStrided) {
if (result.layout() == kStrided) {
return spmm(mat1, mat2, beta, alpha, result);
spmm(mat1, mat2, beta, alpha, result); return;
}
}
if (mat2.layout() == kSparseCsr) {
if (result.layout() == kSparseCsr) {
return spgemm(mat1, mat2, beta, alpha, result);
spgemm(mat1, mat2, beta, alpha, result); return;
}
}
if (mat2.layout() == kSparseCsc) {
if (result.layout() == kSparseCsr) {
// TODO: Add native CSC support via cuSPARSE if supported.
// CSR @ CSC kernel would be very fast due to format alignment
return spgemm(mat1, mat2.to_sparse_csr(), beta, alpha, result);
spgemm(mat1, mat2.to_sparse_csr(), beta, alpha, result); return;
}
}
}
@ -882,27 +883,28 @@ void addmm_out_sparse_csr(
if (mat2.layout() == kStrided) {
if (result.layout() == kStrided) {
// TODO: Add native CSC support via cuSPARSE if supported.
return spmm(mat1.to_sparse_csr(), mat2, beta, alpha, result);
spmm(mat1.to_sparse_csr(), mat2, beta, alpha, result); return;
}
}
if (mat2.layout() == kSparseCsr) {
if (result.layout() == kSparseCsr)
// TODO: Add native CSC support via cuSPARSE if supported.
return spgemm(mat1.to_sparse_csr(), mat2, beta, alpha, result);
{ spgemm(mat1.to_sparse_csr(), mat2, beta, alpha, result); return;
}
}
if (mat2.layout() == kSparseCsc) {
if (result.layout() == kSparseCsr) {
// TODO: Add native CSC support via cuSPARSE if supported.
return spgemm(
mat1.to_sparse_csr(), mat2.to_sparse_csr(), beta, alpha, result);
spgemm(
mat1.to_sparse_csr(), mat2.to_sparse_csr(), beta, alpha, result); return;
}
if (result.layout() == kSparseCsc) {
return spgemm(
spgemm(
mat2.transpose(-2, -1),
mat1.transpose(-2, -1),
beta,
alpha,
result.transpose(-2, -1));
result.transpose(-2, -1)); return;
}
}
}
@ -933,7 +935,7 @@ void addmv_out_sparse_csr(
const Scalar& alpha,
const Tensor& result) {
if (mat.layout() == kSparseBsr) {
return block_sparse_mv(mat, vec, beta, alpha, result);
block_sparse_mv(mat, vec, beta, alpha, result); return;
}
cusparseOperation_t opA = CUSPARSE_OPERATION_NON_TRANSPOSE;
@ -1213,9 +1215,9 @@ void triangular_solve_out_sparse_csr(
}
if (A.layout() == kSparseBsr) {
if (B.size(-1) == 1) {
return block_sparse_triangular_solve_vec(A, B, X, upper, transpose, unitriangular);
block_sparse_triangular_solve_vec(A, B, X, upper, transpose, unitriangular); return;
} else {
return block_sparse_triangular_solve_mat(A, B, X, upper, transpose, unitriangular);
block_sparse_triangular_solve_mat(A, B, X, upper, transpose, unitriangular); return;
}
}
#ifdef USE_ROCM

View File

@ -117,7 +117,7 @@ class FwdKernel:
def get_all(cls) -> list["FwdKernel"]:
kernels: list[FwdKernel] = []
for aligned, dtype, (sm, sm_max) in itertools.product(
[True, False], DTYPES.keys(), zip(SM, SM[1:])
[True, False], DTYPES.keys(), itertools.pairwise(SM)
):
# Remove some kernels we don't use
if dtype == "bf16" and sm < 80:
@ -228,7 +228,7 @@ class BwdKernel:
for aligned, dtype, (sm, sm_max), apply_dropout, max_k in itertools.product(
[True, False],
DTYPES.keys(),
zip(SM, SM[1:]),
itertools.pairwise(SM),
[True, False],
[32, 64, 128, 2**16],
):

View File

@ -0,0 +1,191 @@
#!/usr/bin/env python3
"""
Benchmark for NVSHMEM tile reduce operations.
Usage:
python benchmarks/distributed/bench_nvshmem_tile_reduce.py
This benchmark measures the performance of tile reduce operations across different
matrix sizes and tile configurations.
"""
import time
import torch
import torch.distributed as dist
import torch.distributed._symmetric_memory as symm_mem
from torch.testing._internal.common_distributed import MultiProcContinuousTest
from torch.testing._internal.common_utils import (
requires_cuda_p2p_access,
skip_but_pass_in_sandcastle_if,
skipIfRocm,
)
# Decorator
def requires_nvshmem():
return skip_but_pass_in_sandcastle_if(
not symm_mem.is_nvshmem_available(),
"bench_nvshmem_tile_reduce requires NVSHMEM, skipping benchmark",
)
# So that benchmarks are written in device-agnostic way
device_type = "cuda"
device_module = torch.get_device_module(device_type)
@requires_nvshmem()
@requires_cuda_p2p_access()
class NVSHMEMTileReduceBenchmark(MultiProcContinuousTest):
def _init_device(self) -> None:
# TODO: relax this requirement (seems to hang without it)
device_module.set_device(self.device)
# Set NVSHMEM as SymmMem backend
symm_mem.set_backend("NVSHMEM")
@property
def device(self) -> torch.device:
return torch.device(device_type, self.rank)
def _benchmark_tile_reduce_single(
self,
full_size: int,
tile_size: int,
warmup_iters: int = 5,
bench_iters: int = 10,
) -> dict:
"""
Benchmark a single configuration of tile reduce.
Args:
full_size: Size of the full matrix (full_size x full_size)
tile_size: Size of the tile (tile_size x tile_size)
warmup_iters: Number of warmup iterations
bench_iters: Number of benchmark iterations
Returns:
Dictionary with benchmark results
"""
self._init_device()
group_name = dist.group.WORLD.group_name
symm_mem.enable_symm_mem_for_group(group_name)
dtype = torch.float
# Allocate full matrices
full_inp = symm_mem.empty(
full_size, full_size, dtype=dtype, device=self.device
).fill_(self.rank)
full_out = symm_mem.empty(
full_size, full_size, dtype=dtype, device=self.device
).fill_(0)
slice_ut = slice(0, tile_size)
inp_tile = full_inp[slice_ut, slice_ut]
out_tile = full_out[slice_ut, slice_ut]
root = 0
# Warmup iterations
for _ in range(warmup_iters):
torch.ops.symm_mem.tile_reduce(inp_tile, out_tile, root, group_name)
torch.cuda.synchronize(self.device)
# Benchmark iterations
times = []
dist.barrier()
torch.cuda.synchronize(self.device)
start_time = time.perf_counter()
for _ in range(bench_iters):
torch.ops.symm_mem.tile_reduce(inp_tile, out_tile, root, group_name)
torch.cuda.synchronize(self.device)
end_time = time.perf_counter()
times.append((end_time - start_time) / bench_iters)
# Calculate statistics
times = torch.tensor(times, dtype=torch.float64)
tile_elements = tile_size * tile_size
tile_bytes = (
tile_elements * dtype.itemsize
if hasattr(dtype, "itemsize")
else tile_elements * 4
)
results = {
"full_size": full_size,
"tile_size": tile_size,
"tile_elements": tile_elements,
"tile_bytes": tile_bytes,
"world_size": self.world_size,
"mean_time_ms": times.mean().item() * 1000,
"std_time_ms": times.std().item() * 1000,
"min_time_ms": times.min().item() * 1000,
"max_time_ms": times.max().item() * 1000,
"throughput_gb_s": tile_bytes / (times.mean().item() * 1e9),
"elements_per_sec": tile_elements / times.mean().item(),
}
return results
@skipIfRocm
def test_benchmark_tile_reduce_various_sizes(self) -> None:
"""
Benchmark tile reduce across various matrix sizes.
"""
# Test various matrix sizes
tile_sizes = [512, 1024, 2048, 4096, 8192, 16384]
full_size = tile_sizes[-1]
warmup_iters = 5
bench_iters = 20
results = []
for tile_size in tile_sizes:
try:
result = self._benchmark_tile_reduce_single(
full_size, tile_size, warmup_iters, bench_iters
)
results.append(result)
if self.rank == 0:
print(
f"Matrix Size: {full_size}x{full_size}, Tile Size: {tile_size}x{tile_size}"
)
print(
f" Mean Time: {result['mean_time_ms']:.3f} ± {result['std_time_ms']:.3f} ms"
)
print(f" Throughput: {result['throughput_gb_s']:.2f} GB/s")
print(f" Bytes: {result['tile_bytes']:.0f}")
print()
except Exception as e:
if self.rank == 0:
print(f"Failed to benchmark matrix size {full_size}: {e}")
# Print summary
if self.rank == 0 and results:
print("=== BENCHMARK SUMMARY ===")
print(
f"{'Matrix Size':<12} {'Tile Size':<10} {'Time (ms)':<12} {'Throughput (GB/s)':<18} {'Bytes':<15}"
)
print("-" * 70)
for result in results:
print(
f"{result['full_size']}x{result['full_size']:<7} "
f"{result['tile_size']}x{result['tile_size']:<5} "
f"{result['mean_time_ms']:<12.3f} "
f"{result['throughput_gb_s']:<18.2f} "
f"{result['tile_bytes']:<15.0f}"
)
if __name__ == "__main__":
# For standalone usage, you'd need to set up distributed environment
# For now, this is meant to be run via the PyTorch test framework
from torch.testing._internal.common_utils import run_tests
run_tests()

View File

@ -10,7 +10,6 @@ import pandas as pd
flaky_models = {
"yolov3",
"gluon_inception_v3",
"detectron2_maskrcnn_r_101_c4",
"timm_efficientnet", # see https://github.com/pytorch/pytorch/issues/148699
"XGLMForCausalLM", # discovered in https://github.com/pytorch/pytorch/pull/128148
@ -36,15 +35,11 @@ def check_accuracy(actual_csv, expected_csv, expected_filename):
{
"Background_Matting",
"alexnet",
"cait_m36_384",
"dla102",
"demucs",
"densenet121",
"detectron2_fcos_r_50_fpn",
"doctr_det_predictor",
"doctr_reco_predictor",
"dpn107",
"fbnetv3_b",
"hf_BigBird",
"hf_Longformer",
"hf_Reformer",
@ -52,7 +47,6 @@ def check_accuracy(actual_csv, expected_csv, expected_filename):
"hf_T5",
"hf_T5_base",
"hf_T5_generate",
"levit_128",
"llava",
"microbench_unbacked_tolist_sum",
"mnasnet1_0",
@ -69,7 +63,6 @@ def check_accuracy(actual_csv, expected_csv, expected_filename):
"squeezenet1_1",
"stable_diffusion_text_encoder",
"stable_diffusion_unet",
"swsl_resnext101_32x16d",
"timm_efficientdet",
"timm_efficientnet",
"timm_nfnet",

View File

@ -10,7 +10,6 @@ import pandas as pd
flaky_models = {
"yolov3",
"gluon_inception_v3",
"detectron2_maskrcnn_r_101_c4",
"XGLMForCausalLM", # discovered in https://github.com/pytorch/pytorch/pull/128148
"detectron2_fcos_r_50_fpn",
@ -32,7 +31,6 @@ def check_graph_breaks(actual_csv, expected_csv, expected_filename):
flaky_models.update(
{
"alexnet",
"cait_m36_384",
"demucs",
"densenet121",
"detectron2_fcos_r_50_fpn",
@ -44,7 +42,6 @@ def check_graph_breaks(actual_csv, expected_csv, expected_filename):
"hf_Roberta_base",
"hf_T5",
"hf_T5_base",
"levit_128",
"llava",
"microbench_unbacked_tolist_sum",
"resnet50",

View File

@ -6,10 +6,6 @@ AlbertForMaskedLM,pass,0
AlbertForQuestionAnswering,pass,0
AllenaiLongformerBase,pass,4
@ -18,50 +14,22 @@ BartForCausalLM,pass,0
BartForConditionalGeneration,pass,0
BertForMaskedLM,pass,0
BertForQuestionAnswering,pass,0
BlenderbotForCausalLM,pass_due_to_skip,0
BlenderbotSmallForCausalLM,pass,0
BlenderbotSmallForConditionalGeneration,pass,0
CamemBert,pass,0
DebertaV2ForMaskedLM,pass_due_to_skip,0
DebertaV2ForQuestionAnswering,pass,0
DistilBertForMaskedLM,pass,0
DistilBertForQuestionAnswering,pass,0
DistillGPT2,pass,2
@ -70,10 +38,6 @@ ElectraForCausalLM,pass,0
ElectraForQuestionAnswering,pass,0
GPT2ForSequenceClassification,pass,0
@ -86,10 +50,6 @@ LayoutLMForMaskedLM,pass,0
LayoutLMForSequenceClassification,pass,0
M2M100ForConditionalGeneration,pass,0
@ -98,10 +58,6 @@ MBartForCausalLM,pass,0
MBartForConditionalGeneration,pass,0
MT5ForConditionalGeneration,pass,0
@ -110,18 +66,10 @@ MegatronBertForCausalLM,pass,0
MegatronBertForQuestionAnswering,pass,0
MobileBertForMaskedLM,pass,0
MobileBertForQuestionAnswering,pass,0
OPTForCausalLM,pass,0
@ -130,26 +78,14 @@ PLBartForCausalLM,pass,0
PLBartForConditionalGeneration,pass,0
PegasusForCausalLM,pass,0
PegasusForConditionalGeneration,pass,0
RobertaForCausalLM,pass,0
RobertaForQuestionAnswering,pass,0
T5ForConditionalGeneration,pass,0


View File

@ -6,10 +6,6 @@ AlbertForMaskedLM,pass,4
AlbertForQuestionAnswering,pass,5
AllenaiLongformerBase,pass,9
@ -18,50 +14,22 @@ BartForCausalLM,pass,6
BartForConditionalGeneration,pass,8
BertForMaskedLM,pass,5
BertForQuestionAnswering,pass,5
BlenderbotForCausalLM,eager_fail_to_run,0
BlenderbotSmallForCausalLM,pass,6
BlenderbotSmallForConditionalGeneration,pass,8
CamemBert,pass,5
DebertaV2ForMaskedLM,pass_due_to_skip,0
DebertaV2ForQuestionAnswering,eager_1st_run_OOM,0
DistilBertForMaskedLM,pass,5
DistilBertForQuestionAnswering,pass,5
DistillGPT2,pass,7
@ -70,10 +38,6 @@ ElectraForCausalLM,pass,4
ElectraForQuestionAnswering,pass,5
GPT2ForSequenceClassification,pass,6
@ -86,10 +50,6 @@ LayoutLMForMaskedLM,pass,5
LayoutLMForSequenceClassification,pass,6
M2M100ForConditionalGeneration,pass,4
@ -98,10 +58,6 @@ MBartForCausalLM,pass,6
MBartForConditionalGeneration,pass,8
MT5ForConditionalGeneration,pass,5
@ -110,18 +66,10 @@ MegatronBertForCausalLM,pass,5
MegatronBertForQuestionAnswering,pass,5
MobileBertForMaskedLM,pass,3
MobileBertForQuestionAnswering,pass,3
OPTForCausalLM,pass,8
@ -130,26 +78,14 @@ PLBartForCausalLM,pass,6
PLBartForConditionalGeneration,pass,8
PegasusForCausalLM,pass,6
PegasusForConditionalGeneration,pass,7
RobertaForCausalLM,pass,5
RobertaForQuestionAnswering,pass,5
T5ForConditionalGeneration,pass,5


View File

@ -10,126 +10,22 @@ beit_base_patch16_224,pass,0
botnet26t_256,pass,0
cait_m36_384,pass,0
coat_lite_mini,pass,0
convit_base,pass,0
convmixer_768_32,pass,0
convnext_base,pass,0
crossvit_9_240,pass,0
cspdarknet53,pass,0
deit_base_distilled_patch16_224,pass,0
dla102,pass,0
dm_nfnet_f0,pass,0
dpn107,pass,0
eca_botnext26ts_256,pass,0
eca_halonext26ts,pass,0
ese_vovnet19b_dw,pass,0
fbnetc_100,pass,0
fbnetv3_b,pass,0
gernet_l,pass,0
ghostnet_100,pass,0
gluon_inception_v3,pass,0
gmixer_24_224,pass,0
gmlp_s16_224,pass,0
hrnet_w18,pass,0
inception_v3,pass,0
jx_nest_base,pass,0
lcnet_050,pass,0
levit_128,pass,0
mixer_b16_224,pass,0
mixnet_l,pass,0
mnasnet_100,pass,0
mobilenetv2_100,pass,0
@ -146,100 +42,16 @@ nfnet_l0,pass,0
pit_b_224,pass,0
pnasnet5large,pass,0
poolformer_m36,pass,0
regnety_002,pass,0
repvgg_a2,pass,0
res2net101_26w_4s,pass,0
res2net50_14w_8s,pass,0
res2next50,pass,0
resmlp_12_224,pass,0
resnest101e,pass,0
rexnet_100,pass,0
sebotnet33ts_256,pass,0
selecsls42b,pass,0
spnasnet_100,pass,0
swin_base_patch4_window7_224,pass,0
swsl_resnext101_32x16d,pass,0
tf_efficientnet_b0,pass,0
tf_mixnet_l,pass,0
tinynet_a,pass,0
tnt_s_patch16_224,pass,0
twins_pcpvt_base,pass,0
visformer_small,pass,0
vit_base_patch16_224,pass,0
volo_d1_224,pass,0
xcit_large_24_p8_224,pass,0


View File

@ -10,126 +10,22 @@ beit_base_patch16_224,pass,7
botnet26t_256,pass,6
cait_m36_384,eager_fail_to_run,0
coat_lite_mini,pass,6
convit_base,pass,7
convmixer_768_32,pass,5
convnext_base,pass,7
crossvit_9_240,pass,7
cspdarknet53,pass,7
deit_base_distilled_patch16_224,pass,7
dla102,pass,7
dm_nfnet_f0,pass,6
dpn107,pass,6
eca_botnext26ts_256,pass,7
eca_halonext26ts,pass,7
ese_vovnet19b_dw,pass,7
fbnetc_100,pass,7
fbnetv3_b,pass,6
gernet_l,pass,6
ghostnet_100,pass,6
gluon_inception_v3,pass,7
gmixer_24_224,pass,6
gmlp_s16_224,pass,7
hrnet_w18,pass,5
inception_v3,pass,6
jx_nest_base,pass,7
lcnet_050,fail_accuracy,6
levit_128,pass,7
mixer_b16_224,pass,7
mixnet_l,pass,6
mnasnet_100,pass,7
mobilenetv2_100,pass,7
@ -146,100 +42,16 @@ nfnet_l0,pass,7
pit_b_224,pass,6
pnasnet5large,pass,5
poolformer_m36,pass,6
regnety_002,pass,6
repvgg_a2,pass,7
res2net101_26w_4s,pass,6
res2net50_14w_8s,pass,6
res2next50,pass,6
resmlp_12_224,pass,6
resnest101e,pass,6
rexnet_100,pass,7
sebotnet33ts_256,pass,6
selecsls42b,pass,6
spnasnet_100,pass,7
swin_base_patch4_window7_224,pass,7
swsl_resnext101_32x16d,pass,6
tf_efficientnet_b0,pass,6
tf_mixnet_l,pass,6
tinynet_a,pass,6
tnt_s_patch16_224,pass,7
twins_pcpvt_base,pass,7
visformer_small,pass,7
vit_base_patch16_224,pass,7
volo_d1_224,pass,7
xcit_large_24_p8_224,pass_due_to_skip,7


View File

@ -6,58 +6,26 @@ AlbertForMaskedLM,pass,0
AlbertForQuestionAnswering,pass,0
BartForCausalLM,pass,0
BartForConditionalGeneration,pass,0
BertForMaskedLM,pass,0
BertForQuestionAnswering,pass,0
BlenderbotForCausalLM,pass_due_to_skip,0
BlenderbotSmallForCausalLM,pass,0
BlenderbotSmallForConditionalGeneration,pass,0
CamemBert,pass,0
DebertaV2ForMaskedLM,pass_due_to_skip,0
DebertaV2ForQuestionAnswering,pass,0
DistilBertForMaskedLM,pass,0
DistilBertForQuestionAnswering,pass,0
DistillGPT2,pass,0
@ -66,10 +34,6 @@ ElectraForCausalLM,pass,0
ElectraForQuestionAnswering,pass,0
GPT2ForSequenceClassification,pass,0
@ -82,10 +46,6 @@ LayoutLMForMaskedLM,pass,0
LayoutLMForSequenceClassification,pass,0
M2M100ForConditionalGeneration,pass,0
@ -94,10 +54,6 @@ MBartForCausalLM,pass,0
MBartForConditionalGeneration,pass,0
MT5ForConditionalGeneration,pass,0
@ -106,18 +62,10 @@ MegatronBertForCausalLM,pass,0
MegatronBertForQuestionAnswering,pass,0
MobileBertForMaskedLM,pass,0
MobileBertForQuestionAnswering,pass,0
OPTForCausalLM,pass,0
@ -126,26 +74,14 @@ PLBartForCausalLM,pass,0
PLBartForConditionalGeneration,pass,0
PegasusForCausalLM,pass,0
PegasusForConditionalGeneration,pass,0
RobertaForCausalLM,pass,0
RobertaForQuestionAnswering,pass,0
T5ForConditionalGeneration,pass,0


View File

@ -10,126 +10,22 @@ beit_base_patch16_224,pass,0
botnet26t_256,pass,0
cait_m36_384,pass,0
coat_lite_mini,pass,0
convit_base,pass,0
convmixer_768_32,pass,0
convnext_base,pass,0
crossvit_9_240,pass,0
cspdarknet53,pass,0
deit_base_distilled_patch16_224,pass,0
dla102,pass,0
dm_nfnet_f0,pass,0
dpn107,pass,0
eca_botnext26ts_256,pass,0
eca_halonext26ts,pass,0
ese_vovnet19b_dw,pass,0
fbnetc_100,pass,0
fbnetv3_b,pass,0
gernet_l,pass,0
ghostnet_100,pass,0
gluon_inception_v3,pass,0
gmixer_24_224,pass,0
gmlp_s16_224,pass,0
hrnet_w18,pass,0
inception_v3,pass,0
jx_nest_base,pass,0
lcnet_050,pass,0
levit_128,pass,0
mixer_b16_224,pass,0
mixnet_l,pass,0
mnasnet_100,pass,0
mobilenetv2_100,pass,0
@ -146,100 +42,16 @@ nfnet_l0,pass,0
pit_b_224,pass,0
pnasnet5large,pass,0
poolformer_m36,pass,0
regnety_002,pass,0
repvgg_a2,pass,0
res2net101_26w_4s,pass,0
res2net50_14w_8s,pass,0
res2next50,pass,0
resmlp_12_224,pass,0
resnest101e,pass,0
rexnet_100,pass,0
sebotnet33ts_256,pass,0
selecsls42b,pass,0
spnasnet_100,pass,0
swin_base_patch4_window7_224,pass,0
swsl_resnext101_32x16d,pass,0
tf_efficientnet_b0,pass,0
tf_mixnet_l,pass,0
tinynet_a,pass,0
tnt_s_patch16_224,pass,0
twins_pcpvt_base,pass,0
visformer_small,pass,0
vit_base_patch16_224,pass,0
volo_d1_224,pass,0
xcit_large_24_p8_224,pass,0


View File

@ -6,58 +6,26 @@ AlbertForMaskedLM,pass,0
AlbertForQuestionAnswering,pass,0
BartForCausalLM,pass,0
BartForConditionalGeneration,pass,0
BertForMaskedLM,pass,0
BertForQuestionAnswering,pass,0
BlenderbotForCausalLM,pass_due_to_skip,0
BlenderbotSmallForCausalLM,pass,0
BlenderbotSmallForConditionalGeneration,pass,0
CamemBert,pass,0
DebertaV2ForMaskedLM,pass_due_to_skip,0
DebertaV2ForQuestionAnswering,pass,0
DistilBertForMaskedLM,pass,0
DistilBertForQuestionAnswering,pass,0
DistillGPT2,pass,0
@ -66,10 +34,6 @@ ElectraForCausalLM,pass,0
ElectraForQuestionAnswering,pass,0
GPT2ForSequenceClassification,pass,0
@ -82,10 +46,6 @@ LayoutLMForMaskedLM,pass,0
LayoutLMForSequenceClassification,pass,0
M2M100ForConditionalGeneration,pass,0
@ -94,10 +54,6 @@ MBartForCausalLM,pass,0
MBartForConditionalGeneration,pass,0
MT5ForConditionalGeneration,pass,0
@ -106,18 +62,10 @@ MegatronBertForCausalLM,pass,0
MegatronBertForQuestionAnswering,pass,0
MobileBertForMaskedLM,pass,0
MobileBertForQuestionAnswering,pass,0
OPTForCausalLM,pass,0
@ -126,26 +74,14 @@ PLBartForCausalLM,pass,0
PLBartForConditionalGeneration,pass,0
PegasusForCausalLM,pass,0
PegasusForConditionalGeneration,pass,0
RobertaForCausalLM,pass,0
RobertaForQuestionAnswering,pass,0
T5ForConditionalGeneration,pass,0


View File

@ -10,126 +10,22 @@ beit_base_patch16_224,pass,0
botnet26t_256,pass,0
cait_m36_384,pass,0
coat_lite_mini,pass,0
convit_base,pass,0
convmixer_768_32,pass,0
convnext_base,pass,0
crossvit_9_240,pass,0
cspdarknet53,pass,0
deit_base_distilled_patch16_224,pass,0
dla102,pass,0
dm_nfnet_f0,pass,0
dpn107,pass,0
eca_botnext26ts_256,pass,0
eca_halonext26ts,pass,0
ese_vovnet19b_dw,pass,0
fbnetc_100,pass,0
fbnetv3_b,pass,0
gernet_l,pass,0
ghostnet_100,pass,0
gluon_inception_v3,pass,0
gmixer_24_224,pass,0
gmlp_s16_224,pass,0
hrnet_w18,pass,0
inception_v3,pass,0
jx_nest_base,pass,0
lcnet_050,pass,0
levit_128,pass,0
mixer_b16_224,pass,0
mixnet_l,pass,0
mnasnet_100,pass,0
mobilenetv2_100,pass,0
@ -146,100 +42,16 @@ nfnet_l0,pass,0
pit_b_224,pass,0
pnasnet5large,pass,0
poolformer_m36,pass,0
regnety_002,pass,0
repvgg_a2,pass,0
res2net101_26w_4s,pass,0
res2net50_14w_8s,pass,0
res2next50,pass,0
resmlp_12_224,pass,0
resnest101e,pass,0
rexnet_100,pass,0
sebotnet33ts_256,pass,0
selecsls42b,pass,0
spnasnet_100,pass,0
swin_base_patch4_window7_224,pass,0
swsl_resnext101_32x16d,pass,0
tf_efficientnet_b0,pass,0
tf_mixnet_l,pass,0
tinynet_a,pass,0
tnt_s_patch16_224,pass,0
twins_pcpvt_base,pass,0
visformer_small,pass,0
vit_base_patch16_224,pass,0
volo_d1_224,pass,0
xcit_large_24_p8_224,pass,0


View File

@ -6,10 +6,6 @@ AlbertForMaskedLM,pass,0
AlbertForQuestionAnswering,pass,0
AllenaiLongformerBase,pass,4
@ -18,50 +14,22 @@ BartForCausalLM,pass,0
BartForConditionalGeneration,pass,0
BertForMaskedLM,pass,0
BertForQuestionAnswering,pass,0
BlenderbotForCausalLM,pass_due_to_skip,0
BlenderbotSmallForCausalLM,pass,0
BlenderbotSmallForConditionalGeneration,pass,0
CamemBert,pass,0
DebertaV2ForMaskedLM,pass_due_to_skip,0
DebertaV2ForQuestionAnswering,pass,0
DistilBertForMaskedLM,pass,0
DistilBertForQuestionAnswering,pass,0
DistillGPT2,pass,2
@ -70,10 +38,6 @@ ElectraForCausalLM,pass,0
ElectraForQuestionAnswering,pass,0
GPT2ForSequenceClassification,pass,0
@ -86,10 +50,6 @@ LayoutLMForMaskedLM,pass,0
LayoutLMForSequenceClassification,pass,0
M2M100ForConditionalGeneration,pass,0
@ -98,10 +58,6 @@ MBartForCausalLM,pass,0
MBartForConditionalGeneration,pass,0
MT5ForConditionalGeneration,pass,0
@ -110,18 +66,10 @@ MegatronBertForCausalLM,pass,0
MegatronBertForQuestionAnswering,pass,0
MobileBertForMaskedLM,pass,0
MobileBertForQuestionAnswering,pass,0
OPTForCausalLM,pass,0
@ -130,26 +78,14 @@ PLBartForCausalLM,pass,0
PLBartForConditionalGeneration,pass,0
PegasusForCausalLM,pass,0
PegasusForConditionalGeneration,pass,0
RobertaForCausalLM,pass,0
RobertaForQuestionAnswering,pass,0
T5ForConditionalGeneration,pass,0


View File

@ -10,126 +10,22 @@ beit_base_patch16_224,pass,0
botnet26t_256,pass,0
cait_m36_384,pass,0
coat_lite_mini,pass,0
convit_base,pass,0
convmixer_768_32,pass,0
convnext_base,pass,0
crossvit_9_240,pass,0
cspdarknet53,pass,0
deit_base_distilled_patch16_224,pass,0
dla102,timeout,0
dm_nfnet_f0,pass,0
dpn107,pass,0
eca_botnext26ts_256,pass,0
eca_halonext26ts,pass,0
ese_vovnet19b_dw,pass,0
fbnetc_100,pass,0
fbnetv3_b,pass,0
gernet_l,pass,0
ghostnet_100,pass,0
gluon_inception_v3,pass,0
gmixer_24_224,pass,0
gmlp_s16_224,pass,0
hrnet_w18,pass,0
inception_v3,pass,0
jx_nest_base,pass,0
lcnet_050,pass,0
levit_128,pass,0
mixer_b16_224,pass,0
mixnet_l,pass,0
mnasnet_100,pass,0
mobilenetv2_100,pass,0
@ -146,100 +42,16 @@ nfnet_l0,pass,0
pit_b_224,pass,0
pnasnet5large,pass,0
poolformer_m36,pass,0
regnety_002,pass,0
repvgg_a2,pass,0
res2net101_26w_4s,pass,0
res2net50_14w_8s,pass,0
res2next50,pass,0
resmlp_12_224,pass,0
resnest101e,pass,0
rexnet_100,pass,0
sebotnet33ts_256,pass,0
selecsls42b,pass,0
spnasnet_100,pass,0
swin_base_patch4_window7_224,pass,0
swsl_resnext101_32x16d,pass,0
tf_efficientnet_b0,pass,0
tf_mixnet_l,pass,0
tinynet_a,pass,0
tnt_s_patch16_224,pass,0
twins_pcpvt_base,pass,0
visformer_small,pass,0
vit_base_patch16_224,pass,0
volo_d1_224,pass,0
xcit_large_24_p8_224,pass,0


View File

@ -6,10 +6,6 @@ AlbertForMaskedLM,pass,0
AlbertForQuestionAnswering,pass,0
AllenaiLongformerBase,pass,4
@ -18,50 +14,22 @@ BartForCausalLM,pass,0
BartForConditionalGeneration,pass,0
BertForMaskedLM,pass,0
BertForQuestionAnswering,pass,0
BlenderbotForCausalLM,pass_due_to_skip,0
BlenderbotSmallForCausalLM,pass,0
BlenderbotSmallForConditionalGeneration,pass,0
CamemBert,pass,0
DebertaV2ForMaskedLM,pass_due_to_skip,0
DebertaV2ForQuestionAnswering,pass,0
DistilBertForMaskedLM,pass,0
DistilBertForQuestionAnswering,pass,0
DistillGPT2,pass,2
@ -70,10 +38,6 @@ ElectraForCausalLM,pass,0
ElectraForQuestionAnswering,pass,0
GPT2ForSequenceClassification,pass,0
@ -86,10 +50,6 @@ LayoutLMForMaskedLM,pass,0
LayoutLMForSequenceClassification,pass,0
M2M100ForConditionalGeneration,pass,0
@ -98,10 +58,6 @@ MBartForCausalLM,pass,0
MBartForConditionalGeneration,pass,0
MT5ForConditionalGeneration,pass,0
@ -110,18 +66,10 @@ MegatronBertForCausalLM,pass,0
MegatronBertForQuestionAnswering,pass,0
MobileBertForMaskedLM,pass,0
MobileBertForQuestionAnswering,pass,0
OPTForCausalLM,pass,0
@ -130,26 +78,14 @@ PLBartForCausalLM,pass,0
PLBartForConditionalGeneration,pass,0
PegasusForCausalLM,pass,0
PegasusForConditionalGeneration,pass,0
RobertaForCausalLM,pass,0
RobertaForQuestionAnswering,pass,0
T5ForConditionalGeneration,pass,0


View File

@ -10,126 +10,22 @@ beit_base_patch16_224,pass,0
botnet26t_256,pass,0
cait_m36_384,pass,0
coat_lite_mini,pass,0
convit_base,pass,0
convmixer_768_32,pass,0
convnext_base,pass,0
crossvit_9_240,pass,0
cspdarknet53,pass,0
deit_base_distilled_patch16_224,pass,0
dla102,timeout,0
dm_nfnet_f0,pass,0
dpn107,pass,0
eca_botnext26ts_256,pass,0
eca_halonext26ts,pass,0
ese_vovnet19b_dw,pass,0
fbnetc_100,pass,0
fbnetv3_b,pass,0
gernet_l,pass,0
ghostnet_100,pass,0
gluon_inception_v3,pass,0
gmixer_24_224,pass,0
gmlp_s16_224,pass,0
hrnet_w18,pass,0
inception_v3,pass,0
jx_nest_base,pass,0
lcnet_050,pass,0
levit_128,pass,0
mixer_b16_224,pass,0
mixnet_l,pass,0
mnasnet_100,pass,0
mobilenetv2_100,pass,0
@ -146,100 +42,16 @@ nfnet_l0,pass,0
pit_b_224,pass,0
pnasnet5large,pass,0
poolformer_m36,pass,0
regnety_002,pass,0
repvgg_a2,pass,0
res2net101_26w_4s,pass,0
res2net50_14w_8s,pass,0
res2next50,pass,0
resmlp_12_224,pass,0
resnest101e,pass,0
rexnet_100,pass,0
sebotnet33ts_256,pass,0
selecsls42b,pass,0
spnasnet_100,pass,0
swin_base_patch4_window7_224,pass,0
swsl_resnext101_32x16d,pass,0
tf_efficientnet_b0,pass,0
tf_mixnet_l,pass,0
tinynet_a,pass,0
tnt_s_patch16_224,pass,0
twins_pcpvt_base,pass,0
visformer_small,pass,0
vit_base_patch16_224,pass,0
volo_d1_224,pass,0
xcit_large_24_p8_224,pass,0


View File

@ -6,10 +6,6 @@ AlbertForMaskedLM,pass,0
AlbertForQuestionAnswering,pass,0
AllenaiLongformerBase,pass,4
@ -18,50 +14,22 @@ BartForCausalLM,pass,0
BartForConditionalGeneration,pass,0
BertForMaskedLM,pass,0
BertForQuestionAnswering,pass,0
BlenderbotForCausalLM,pass_due_to_skip,0
BlenderbotSmallForCausalLM,pass,0
BlenderbotSmallForConditionalGeneration,pass,0
CamemBert,pass,0
DebertaV2ForMaskedLM,pass_due_to_skip,0
DebertaV2ForQuestionAnswering,pass,0
DistilBertForMaskedLM,pass,0
DistilBertForQuestionAnswering,pass,0
DistillGPT2,pass,2
@ -70,10 +38,6 @@ ElectraForCausalLM,pass,0
ElectraForQuestionAnswering,pass,0
GPT2ForSequenceClassification,pass,0
@ -86,10 +50,6 @@ LayoutLMForMaskedLM,pass,0
LayoutLMForSequenceClassification,pass,0
M2M100ForConditionalGeneration,pass,0
@ -98,10 +58,6 @@ MBartForCausalLM,pass,0
MBartForConditionalGeneration,pass,0
MT5ForConditionalGeneration,pass,0
@ -110,18 +66,10 @@ MegatronBertForCausalLM,pass,0
MegatronBertForQuestionAnswering,pass,0
MobileBertForMaskedLM,pass,0
MobileBertForQuestionAnswering,pass,0
OPTForCausalLM,pass,0
@ -130,26 +78,14 @@ PLBartForCausalLM,pass,0
PLBartForConditionalGeneration,pass,0
PegasusForCausalLM,pass,0
PegasusForConditionalGeneration,pass,0
RobertaForCausalLM,pass,0
RobertaForQuestionAnswering,pass,0
T5ForConditionalGeneration,pass,0


View File

@ -10,126 +10,22 @@ beit_base_patch16_224,pass,0
botnet26t_256,pass,0
cait_m36_384,pass,0
coat_lite_mini,pass,0
convit_base,pass,0
convmixer_768_32,pass,0
convnext_base,pass,0
crossvit_9_240,pass,0
cspdarknet53,pass,0
deit_base_distilled_patch16_224,pass,0
dla102,pass,0
dm_nfnet_f0,pass,0
dpn107,pass,0
eca_botnext26ts_256,pass,0
eca_halonext26ts,pass,0
ese_vovnet19b_dw,pass,0
fbnetc_100,pass,0
fbnetv3_b,pass,0
gernet_l,pass,0
ghostnet_100,pass,0
gluon_inception_v3,pass,0
gmixer_24_224,pass,0
gmlp_s16_224,pass,0
hrnet_w18,pass,0
inception_v3,pass,0
jx_nest_base,pass,0
lcnet_050,pass,0
levit_128,pass,0
mixer_b16_224,pass,0
mixnet_l,pass,0
mnasnet_100,pass,0
mobilenetv2_100,pass,0
@ -146,100 +42,16 @@ nfnet_l0,pass,0
pit_b_224,pass,0
pnasnet5large,pass,0
poolformer_m36,pass,0
regnety_002,pass,0
repvgg_a2,pass,0
res2net101_26w_4s,pass,0
res2net50_14w_8s,pass,0
res2next50,pass,0
resmlp_12_224,pass,0
resnest101e,pass,0
rexnet_100,pass,0
sebotnet33ts_256,pass,0
selecsls42b,pass,0
spnasnet_100,pass,0
swin_base_patch4_window7_224,pass,0
swsl_resnext101_32x16d,pass,0
tf_efficientnet_b0,pass,0
tf_mixnet_l,pass,0
tinynet_a,pass,0
tnt_s_patch16_224,pass,0
twins_pcpvt_base,pass,0
visformer_small,pass,0
vit_base_patch16_224,pass,0
volo_d1_224,pass,0
xcit_large_24_p8_224,pass,0


View File

@ -6,10 +6,6 @@ AlbertForMaskedLM,pass,0
AlbertForQuestionAnswering,pass,0
AllenaiLongformerBase,pass,4
@ -18,50 +14,22 @@ BartForCausalLM,pass,0
BartForConditionalGeneration,pass,0
BertForMaskedLM,pass,0
BertForQuestionAnswering,pass,0
BlenderbotForCausalLM,pass_due_to_skip,0
BlenderbotSmallForCausalLM,pass,0
BlenderbotSmallForConditionalGeneration,pass,0
CamemBert,pass,0
DebertaV2ForMaskedLM,pass_due_to_skip,0
DebertaV2ForQuestionAnswering,pass,0
DistilBertForMaskedLM,pass,0
DistilBertForQuestionAnswering,pass,0
DistillGPT2,pass,2
@ -70,10 +38,6 @@ ElectraForCausalLM,pass,0
ElectraForQuestionAnswering,pass,0
GPT2ForSequenceClassification,pass,0
@ -86,10 +50,6 @@ LayoutLMForMaskedLM,pass,0
LayoutLMForSequenceClassification,pass,0
M2M100ForConditionalGeneration,pass,0
@ -98,10 +58,6 @@ MBartForCausalLM,pass,0
MBartForConditionalGeneration,pass,0
MT5ForConditionalGeneration,pass,0
@ -110,18 +66,10 @@ MegatronBertForCausalLM,pass,0
MegatronBertForQuestionAnswering,pass,0
MobileBertForMaskedLM,pass,0
MobileBertForQuestionAnswering,pass,0
OPTForCausalLM,pass,0
@ -130,26 +78,14 @@ PLBartForCausalLM,pass,0
PLBartForConditionalGeneration,pass,0
PegasusForCausalLM,pass,0
PegasusForConditionalGeneration,pass,0
RobertaForCausalLM,pass,0
RobertaForQuestionAnswering,pass,0
T5ForConditionalGeneration,pass,0


View File

@ -6,10 +6,6 @@ AlbertForMaskedLM,pass,4
AlbertForQuestionAnswering,pass,5
AllenaiLongformerBase,pass,9
@ -18,50 +14,22 @@ BartForCausalLM,pass,6
BartForConditionalGeneration,pass,8
BertForMaskedLM,pass,5
BertForQuestionAnswering,pass,5
BlenderbotForCausalLM,eager_fail_to_run,0
BlenderbotSmallForCausalLM,pass,6
BlenderbotSmallForConditionalGeneration,pass,8
CamemBert,pass,5
DebertaV2ForMaskedLM,pass_due_to_skip,0
DebertaV2ForQuestionAnswering,eager_1st_run_OOM,0
DistilBertForMaskedLM,pass,5
DistilBertForQuestionAnswering,pass,5
DistillGPT2,pass,7
@ -70,10 +38,6 @@ ElectraForCausalLM,pass,4
ElectraForQuestionAnswering,pass,5
GPT2ForSequenceClassification,pass,6
@ -86,10 +50,6 @@ LayoutLMForMaskedLM,pass,5
LayoutLMForSequenceClassification,pass,6
M2M100ForConditionalGeneration,pass,4
@ -98,10 +58,6 @@ MBartForCausalLM,pass,6
MBartForConditionalGeneration,pass,8
MT5ForConditionalGeneration,pass,5
@ -110,18 +66,10 @@ MegatronBertForCausalLM,pass,5
MegatronBertForQuestionAnswering,pass,5
MobileBertForMaskedLM,pass,3
MobileBertForQuestionAnswering,pass,3
OPTForCausalLM,pass,8
@ -130,26 +78,14 @@ PLBartForCausalLM,pass,6
PLBartForConditionalGeneration,pass,8
PegasusForCausalLM,pass,6
PegasusForConditionalGeneration,pass,7
RobertaForCausalLM,pass,5
RobertaForQuestionAnswering,pass,5
T5ForConditionalGeneration,pass,5


View File

@ -10,126 +10,22 @@ beit_base_patch16_224,pass,0
botnet26t_256,pass,0
cait_m36_384,pass,0
coat_lite_mini,pass,0
convit_base,pass,0
convmixer_768_32,pass,0
convnext_base,pass,0
crossvit_9_240,pass,0
cspdarknet53,pass,0
deit_base_distilled_patch16_224,pass,0
dla102,pass,0
dm_nfnet_f0,pass,0
dpn107,pass,0
eca_botnext26ts_256,pass,0
eca_halonext26ts,pass,0
ese_vovnet19b_dw,pass,0
fbnetc_100,pass,0
fbnetv3_b,pass,0
gernet_l,pass,0
ghostnet_100,pass,0
gluon_inception_v3,pass,0
gmixer_24_224,pass,0
gmlp_s16_224,pass,0
hrnet_w18,pass,0
inception_v3,pass,0
jx_nest_base,pass,0
lcnet_050,pass,0
levit_128,pass,0
mixer_b16_224,pass,0
mixnet_l,pass,0
mnasnet_100,pass,0
mobilenetv2_100,pass,0
@ -146,100 +42,16 @@ nfnet_l0,pass,0
pit_b_224,pass,0
pnasnet5large,pass,0
poolformer_m36,pass,0
regnety_002,pass,0
repvgg_a2,pass,0
res2net101_26w_4s,pass,0
res2net50_14w_8s,pass,0
res2next50,pass,0
resmlp_12_224,pass,0
resnest101e,pass,0
rexnet_100,pass,0
sebotnet33ts_256,pass,0
selecsls42b,pass,0
spnasnet_100,pass,0
swin_base_patch4_window7_224,pass,0
swsl_resnext101_32x16d,pass,0
tf_efficientnet_b0,pass,0
tf_mixnet_l,pass,0
tinynet_a,pass,0
tnt_s_patch16_224,pass,0
twins_pcpvt_base,pass,0
visformer_small,pass,0
vit_base_patch16_224,pass,0
volo_d1_224,pass,0
xcit_large_24_p8_224,pass,0


View File

@ -10,126 +10,22 @@ beit_base_patch16_224,pass,7
botnet26t_256,pass,6
cait_m36_384,eager_fail_to_run,0
coat_lite_mini,pass,6
convit_base,pass,7
convmixer_768_32,pass,5
convnext_base,pass,7
crossvit_9_240,pass,7
cspdarknet53,pass,7
deit_base_distilled_patch16_224,pass,7
dla102,pass,7
dm_nfnet_f0,pass,6
dpn107,pass,6
eca_botnext26ts_256,pass,7
eca_halonext26ts,pass,7
ese_vovnet19b_dw,pass,7
fbnetc_100,pass,7
fbnetv3_b,pass,6
gernet_l,pass,6
ghostnet_100,pass,6
gluon_inception_v3,pass,7
gmixer_24_224,pass,6
gmlp_s16_224,pass,7
hrnet_w18,pass,5
inception_v3,pass,6
jx_nest_base,pass,7
lcnet_050,fail_accuracy,6
levit_128,pass,7
mixer_b16_224,pass,7
mixnet_l,pass,6
mnasnet_100,pass,7
mobilenetv2_100,pass,7
@ -146,100 +42,16 @@ nfnet_l0,pass,7
pit_b_224,pass,6
pnasnet5large,pass,5
poolformer_m36,pass,6
regnety_002,pass,6
repvgg_a2,pass,7
res2net101_26w_4s,pass,6
res2net50_14w_8s,pass,6
res2next50,pass,6
resmlp_12_224,pass,6
resnest101e,pass,6
rexnet_100,pass,7
sebotnet33ts_256,pass,6
selecsls42b,pass,6
spnasnet_100,pass,7
swin_base_patch4_window7_224,pass,7
swsl_resnext101_32x16d,pass,6
tf_efficientnet_b0,pass,6
tf_mixnet_l,pass,6
tinynet_a,pass,6
tnt_s_patch16_224,pass,7
twins_pcpvt_base,pass,7
visformer_small,pass,7
vit_base_patch16_224,pass,7
volo_d1_224,pass,7
xcit_large_24_p8_224,pass_due_to_skip,7


View File

@ -6,10 +6,6 @@ AlbertForMaskedLM,pass,0
AlbertForQuestionAnswering,pass,0
AllenaiLongformerBase,pass,4
@ -18,50 +14,22 @@ BartForCausalLM,pass,0
BartForConditionalGeneration,pass,0
BertForMaskedLM,pass,0
BertForQuestionAnswering,pass,0
BlenderbotForCausalLM,pass_due_to_skip,0
BlenderbotSmallForCausalLM,pass,0
BlenderbotSmallForConditionalGeneration,pass,0
CamemBert,pass,0
DebertaV2ForMaskedLM,pass_due_to_skip,0
DebertaV2ForQuestionAnswering,pass,0
DistilBertForMaskedLM,pass,0
DistilBertForQuestionAnswering,pass,0
DistillGPT2,pass,2
@ -70,10 +38,6 @@ ElectraForCausalLM,pass,0
ElectraForQuestionAnswering,pass,0
GPT2ForSequenceClassification,pass,0
@ -86,10 +50,6 @@ LayoutLMForMaskedLM,pass,0
LayoutLMForSequenceClassification,pass,0
M2M100ForConditionalGeneration,pass,0
@ -98,10 +58,6 @@ MBartForCausalLM,pass,0
MBartForConditionalGeneration,pass,0
MT5ForConditionalGeneration,pass,0
@ -110,18 +66,10 @@ MegatronBertForCausalLM,pass,0
MegatronBertForQuestionAnswering,pass,0
MobileBertForMaskedLM,pass,0
MobileBertForQuestionAnswering,pass,0
OPTForCausalLM,pass,0
@ -130,26 +78,14 @@ PLBartForCausalLM,pass,0
PLBartForConditionalGeneration,pass,0
PegasusForCausalLM,pass,0
PegasusForConditionalGeneration,pass,0
RobertaForCausalLM,pass,0
RobertaForQuestionAnswering,pass,0
T5ForConditionalGeneration,pass,0


View File

@ -10,126 +10,22 @@ beit_base_patch16_224,pass,0
botnet26t_256,pass,0
cait_m36_384,pass,0
coat_lite_mini,pass,0
convit_base,pass,0
convmixer_768_32,pass,0
convnext_base,pass,0
crossvit_9_240,pass,0
cspdarknet53,pass,0
deit_base_distilled_patch16_224,pass,0
dla102,pass,0
dm_nfnet_f0,pass,0
dpn107,pass,0
eca_botnext26ts_256,pass,0
eca_halonext26ts,pass,0
ese_vovnet19b_dw,pass,0
fbnetc_100,pass,0
fbnetv3_b,pass,0
gernet_l,pass,0
ghostnet_100,pass,0
gluon_inception_v3,pass,0
gmixer_24_224,pass,0
gmlp_s16_224,pass,0
hrnet_w18,pass,0
inception_v3,pass,0
jx_nest_base,pass,0
lcnet_050,pass,0
levit_128,pass,0
mixer_b16_224,pass,0
mixnet_l,pass,0
mnasnet_100,pass,0
mobilenetv2_100,pass,0
@ -146,100 +42,16 @@ nfnet_l0,pass,0
pit_b_224,pass,0
pnasnet5large,pass,0
poolformer_m36,pass,0
regnety_002,pass,0
repvgg_a2,pass,0
res2net101_26w_4s,pass,0
res2net50_14w_8s,pass,0
res2next50,pass,0
resmlp_12_224,pass,0
resnest101e,pass,0
rexnet_100,pass,0
sebotnet33ts_256,pass,0
selecsls42b,pass,0
spnasnet_100,pass,0
swin_base_patch4_window7_224,pass,0
swsl_resnext101_32x16d,pass,0
tf_efficientnet_b0,pass,0
tf_mixnet_l,pass,0
tinynet_a,pass,0
tnt_s_patch16_224,pass,0
twins_pcpvt_base,pass,0
visformer_small,pass,0
vit_base_patch16_224,pass,0
volo_d1_224,pass,0
xcit_large_24_p8_224,pass,0


View File

@ -6,10 +6,6 @@ AlbertForMaskedLM,pass,0
AlbertForQuestionAnswering,pass,0
AllenaiLongformerBase,pass,4
@ -18,50 +14,22 @@ BartForCausalLM,pass,0
BartForConditionalGeneration,pass,0
BertForMaskedLM,pass,0
BertForQuestionAnswering,pass,0
BlenderbotForCausalLM,pass_due_to_skip,0
BlenderbotSmallForCausalLM,pass,0
BlenderbotSmallForConditionalGeneration,pass,0
CamemBert,pass,0
DebertaV2ForMaskedLM,pass_due_to_skip,0
DebertaV2ForQuestionAnswering,pass,0
DistilBertForMaskedLM,pass,0
DistilBertForQuestionAnswering,pass,0
DistillGPT2,pass,2
@ -70,10 +38,6 @@ ElectraForCausalLM,pass,0
ElectraForQuestionAnswering,pass,0
GPT2ForSequenceClassification,pass,0
@ -86,10 +50,6 @@ LayoutLMForMaskedLM,pass,0
LayoutLMForSequenceClassification,pass,0
M2M100ForConditionalGeneration,pass,0
@ -98,10 +58,6 @@ MBartForCausalLM,pass,0
MBartForConditionalGeneration,pass,0
MT5ForConditionalGeneration,pass,0
@ -110,18 +66,10 @@ MegatronBertForCausalLM,pass,0
MegatronBertForQuestionAnswering,pass,0
MobileBertForMaskedLM,pass,0
MobileBertForQuestionAnswering,pass,0
OPTForCausalLM,pass,0
@ -130,26 +78,14 @@ PLBartForCausalLM,pass,0
PLBartForConditionalGeneration,pass,0
PegasusForCausalLM,pass,0
PegasusForConditionalGeneration,pass,0
RobertaForCausalLM,pass,0
RobertaForQuestionAnswering,pass,0
T5ForConditionalGeneration,pass,0


View File

@ -10,126 +10,22 @@ beit_base_patch16_224,pass,0
botnet26t_256,pass,0
cait_m36_384,pass,0
coat_lite_mini,pass,0
convit_base,pass,0
convmixer_768_32,pass,0
convnext_base,pass,0
crossvit_9_240,pass,0
cspdarknet53,pass,0
deit_base_distilled_patch16_224,pass,0
dla102,pass,0
dm_nfnet_f0,pass,0
dpn107,pass,0
eca_botnext26ts_256,pass,0
eca_halonext26ts,pass,0
ese_vovnet19b_dw,pass,0
fbnetc_100,pass,0
fbnetv3_b,pass,0
gernet_l,pass,0
ghostnet_100,pass,0
gluon_inception_v3,pass,0
gmixer_24_224,pass,0
gmlp_s16_224,pass,0
hrnet_w18,pass,0
inception_v3,pass,0
jx_nest_base,pass,0
lcnet_050,pass,0
levit_128,pass,0
mixer_b16_224,pass,0
mixnet_l,pass,0
mnasnet_100,pass,0
mobilenetv2_100,pass,0
@ -146,100 +42,16 @@ nfnet_l0,pass,0
pit_b_224,pass,0
pnasnet5large,pass,0
poolformer_m36,pass,0
regnety_002,pass,0
repvgg_a2,pass,0
res2net101_26w_4s,pass,0
res2net50_14w_8s,pass,0
res2next50,pass,0
resmlp_12_224,pass,0
resnest101e,pass,0
rexnet_100,pass,0
sebotnet33ts_256,pass,0
selecsls42b,pass,0
spnasnet_100,pass,0
swin_base_patch4_window7_224,pass,0
swsl_resnext101_32x16d,pass,0
tf_efficientnet_b0,pass,0
tf_mixnet_l,pass,0
tinynet_a,pass,0
tnt_s_patch16_224,pass,0
twins_pcpvt_base,pass,0
visformer_small,pass,0
vit_base_patch16_224,pass,0
volo_d1_224,pass,0
xcit_large_24_p8_224,pass,0


View File

@ -6,10 +6,6 @@ AlbertForMaskedLM,pass,0
AlbertForQuestionAnswering,pass,0
AllenaiLongformerBase,pass,4
@ -18,50 +14,22 @@ BartForCausalLM,pass,0
BartForConditionalGeneration,pass,0
BertForMaskedLM,pass,0
BertForQuestionAnswering,pass,0
BlenderbotForCausalLM,pass_due_to_skip,0
BlenderbotSmallForCausalLM,pass,0
BlenderbotSmallForConditionalGeneration,pass,0
CamemBert,pass,0
DebertaV2ForMaskedLM,pass_due_to_skip,0
DebertaV2ForQuestionAnswering,pass,0
DistilBertForMaskedLM,pass,0
DistilBertForQuestionAnswering,pass,0
DistillGPT2,pass,2
@ -70,10 +38,6 @@ ElectraForCausalLM,pass,0
ElectraForQuestionAnswering,pass,0
GPT2ForSequenceClassification,pass,0
@ -86,10 +50,6 @@ LayoutLMForMaskedLM,pass,0
LayoutLMForSequenceClassification,pass,0
M2M100ForConditionalGeneration,pass,0
@ -98,10 +58,6 @@ MBartForCausalLM,pass,0
MBartForConditionalGeneration,pass,0
MT5ForConditionalGeneration,pass,0
@ -110,18 +66,10 @@ MegatronBertForCausalLM,pass,0
MegatronBertForQuestionAnswering,pass,0
MobileBertForMaskedLM,pass,0
MobileBertForQuestionAnswering,pass,0
OPTForCausalLM,pass,0
@ -130,26 +78,14 @@ PLBartForCausalLM,pass,0
PLBartForConditionalGeneration,pass,0
PegasusForCausalLM,pass,0
PegasusForConditionalGeneration,pass,0
RobertaForCausalLM,pass,0
RobertaForQuestionAnswering,pass,0
T5ForConditionalGeneration,pass,0


View File

@ -6,10 +6,6 @@ AlbertForMaskedLM,pass,4
AlbertForQuestionAnswering,pass,5
AllenaiLongformerBase,pass,9
@ -18,50 +14,22 @@ BartForCausalLM,pass,6
BartForConditionalGeneration,pass,8
BertForMaskedLM,pass,5
BertForQuestionAnswering,pass,5
BlenderbotForCausalLM,eager_fail_to_run,0
BlenderbotSmallForCausalLM,pass,6
BlenderbotSmallForConditionalGeneration,pass,8
CamemBert,pass,5
DebertaV2ForMaskedLM,pass_due_to_skip,0
DebertaV2ForQuestionAnswering,eager_1st_run_OOM,0
DistilBertForMaskedLM,pass,5
DistilBertForQuestionAnswering,pass,5
DistillGPT2,pass,7
@ -70,10 +38,6 @@ ElectraForCausalLM,pass,4
ElectraForQuestionAnswering,pass,5
GPT2ForSequenceClassification,pass,6
@ -86,10 +50,6 @@ LayoutLMForMaskedLM,pass,5
LayoutLMForSequenceClassification,pass,6
M2M100ForConditionalGeneration,pass,4
@ -98,10 +58,6 @@ MBartForCausalLM,pass,6
MBartForConditionalGeneration,pass,8
MT5ForConditionalGeneration,pass,5
@ -110,18 +66,10 @@ MegatronBertForCausalLM,pass,5
MegatronBertForQuestionAnswering,pass,5
MobileBertForMaskedLM,pass,3
MobileBertForQuestionAnswering,pass,3
OPTForCausalLM,pass,8
@ -130,26 +78,14 @@ PLBartForCausalLM,pass,6
PLBartForConditionalGeneration,pass,8
PegasusForCausalLM,pass,6
PegasusForConditionalGeneration,pass,7
RobertaForCausalLM,pass,5
RobertaForQuestionAnswering,pass,5
T5ForConditionalGeneration,pass,5


View File

@ -10,126 +10,22 @@ beit_base_patch16_224,pass,0
botnet26t_256,pass,0
cait_m36_384,pass,0
coat_lite_mini,pass,0
convit_base,pass,0
convmixer_768_32,pass,0
convnext_base,pass,0
crossvit_9_240,pass,0
cspdarknet53,pass,0
deit_base_distilled_patch16_224,pass,0
dla102,pass,0
dm_nfnet_f0,pass,0
dpn107,pass,0
eca_botnext26ts_256,pass,0
eca_halonext26ts,pass,0
ese_vovnet19b_dw,pass,0
fbnetc_100,pass,0
fbnetv3_b,pass,0
gernet_l,pass,0
ghostnet_100,pass,0
gluon_inception_v3,pass,0
gmixer_24_224,pass,0
gmlp_s16_224,pass,0
hrnet_w18,pass,0
inception_v3,pass,0
jx_nest_base,pass,0
lcnet_050,pass,0
levit_128,fail_to_run,0
mixer_b16_224,pass,0
mixnet_l,pass,0
mnasnet_100,pass,0
mobilenetv2_100,pass,0
@ -146,100 +42,16 @@ nfnet_l0,pass,0
pit_b_224,pass,0
pnasnet5large,pass,0
poolformer_m36,pass,0
regnety_002,pass,0
repvgg_a2,pass,0
res2net101_26w_4s,pass,0
res2net50_14w_8s,pass,0
res2next50,pass,0
resmlp_12_224,pass,0
resnest101e,pass,0
rexnet_100,pass,0
sebotnet33ts_256,pass,0
selecsls42b,pass,0
spnasnet_100,pass,0
swin_base_patch4_window7_224,pass,0
swsl_resnext101_32x16d,pass,0
tf_efficientnet_b0,pass,0
tf_mixnet_l,pass,0
tinynet_a,pass,0
tnt_s_patch16_224,pass,0
twins_pcpvt_base,pass,0
visformer_small,pass,0
vit_base_patch16_224,pass,0
volo_d1_224,pass,0
xcit_large_24_p8_224,pass,0


View File

@ -10,126 +10,22 @@ beit_base_patch16_224,pass,7
botnet26t_256,pass,6
cait_m36_384,eager_fail_to_run,0
coat_lite_mini,pass,6
convit_base,pass,7
convmixer_768_32,pass,5
convnext_base,pass,7
crossvit_9_240,pass,7
cspdarknet53,pass,7
deit_base_distilled_patch16_224,pass,7
dla102,pass,7
dm_nfnet_f0,pass,6
dpn107,pass,6
eca_botnext26ts_256,pass,7
eca_halonext26ts,pass,7
ese_vovnet19b_dw,pass,7
fbnetc_100,pass,7
fbnetv3_b,pass,6
gernet_l,pass,6
ghostnet_100,pass,6
gluon_inception_v3,pass,7
gmixer_24_224,pass,6
gmlp_s16_224,pass,7
hrnet_w18,pass,5
inception_v3,pass,6
jx_nest_base,pass,7
lcnet_050,pass,6
levit_128,pass,7
mixer_b16_224,pass,7
mixnet_l,pass,6
mnasnet_100,pass,7
mobilenetv2_100,pass,7
@ -146,100 +42,16 @@ nfnet_l0,pass,7
pit_b_224,pass,6
pnasnet5large,pass,5
poolformer_m36,pass,6
regnety_002,pass,6
repvgg_a2,pass,7
res2net101_26w_4s,pass,6
res2net50_14w_8s,pass,6
res2next50,pass,6
resmlp_12_224,pass,6
resnest101e,pass,6
rexnet_100,pass,7
sebotnet33ts_256,pass,6
selecsls42b,pass,6
spnasnet_100,pass,7
swin_base_patch4_window7_224,pass,7
swsl_resnext101_32x16d,pass,6
tf_efficientnet_b0,pass,6
tf_mixnet_l,pass,6
tinynet_a,pass,6
tnt_s_patch16_224,pass,7
twins_pcpvt_base,pass,7
visformer_small,pass,7
vit_base_patch16_224,pass,7
volo_d1_224,pass,7
xcit_large_24_p8_224,pass_due_to_skip,7


View File

@ -6,10 +6,6 @@ AlbertForMaskedLM,pass,0
AlbertForQuestionAnswering,pass,0
AllenaiLongformerBase,pass,4
@ -18,50 +14,22 @@ BartForCausalLM,pass,0
BartForConditionalGeneration,pass,0
BertForMaskedLM,pass,0
BertForQuestionAnswering,pass,0
BlenderbotForCausalLM,pass_due_to_skip,0
BlenderbotSmallForCausalLM,pass,0
BlenderbotSmallForConditionalGeneration,pass,0
CamemBert,pass,0
DebertaV2ForMaskedLM,pass_due_to_skip,0
DebertaV2ForQuestionAnswering,pass,0
DistilBertForMaskedLM,pass,0
DistilBertForQuestionAnswering,pass,0
DistillGPT2,pass,2
@ -70,10 +38,6 @@ ElectraForCausalLM,pass,0
ElectraForQuestionAnswering,pass,0
GPT2ForSequenceClassification,pass,0
@ -86,10 +50,6 @@ LayoutLMForMaskedLM,pass,0
LayoutLMForSequenceClassification,pass,0
M2M100ForConditionalGeneration,pass,0
@ -98,10 +58,6 @@ MBartForCausalLM,pass,0
MBartForConditionalGeneration,pass,0
MT5ForConditionalGeneration,pass,0
@ -110,18 +66,10 @@ MegatronBertForCausalLM,pass,0
MegatronBertForQuestionAnswering,pass,0
MobileBertForMaskedLM,pass,0
MobileBertForQuestionAnswering,pass,0
OPTForCausalLM,pass,0
@ -130,26 +78,14 @@ PLBartForCausalLM,pass,0
PLBartForConditionalGeneration,pass,0
PegasusForCausalLM,pass,0
PegasusForConditionalGeneration,pass,0
RobertaForCausalLM,pass,0
RobertaForQuestionAnswering,pass,0
T5ForConditionalGeneration,pass,0


View File

@@ -6,10 +6,6 @@ AlbertForMaskedLM,pass,4
AlbertForQuestionAnswering,pass,5
AllenaiLongformerBase,pass,9
@@ -18,50 +14,22 @@ BartForCausalLM,pass,6
BartForConditionalGeneration,pass,8
BertForMaskedLM,pass,5
BertForQuestionAnswering,pass,5
BlenderbotForCausalLM,eager_fail_to_run,0
BlenderbotSmallForCausalLM,pass,6
BlenderbotSmallForConditionalGeneration,pass,8
CamemBert,pass,5
DebertaV2ForMaskedLM,pass_due_to_skip,0
DebertaV2ForQuestionAnswering,eager_1st_run_OOM,0
DistilBertForMaskedLM,pass,5
DistilBertForQuestionAnswering,pass,5
DistillGPT2,pass,7
@@ -70,10 +38,6 @@ ElectraForCausalLM,pass,4
ElectraForQuestionAnswering,pass,5
GPT2ForSequenceClassification,pass,6
@@ -86,10 +50,6 @@ LayoutLMForMaskedLM,pass,5
LayoutLMForSequenceClassification,pass,6
M2M100ForConditionalGeneration,pass,4
@@ -98,10 +58,6 @@ MBartForCausalLM,pass,6
MBartForConditionalGeneration,pass,8
MT5ForConditionalGeneration,pass,5
@@ -110,18 +66,10 @@ MegatronBertForCausalLM,pass,5
MegatronBertForQuestionAnswering,pass,5
MobileBertForMaskedLM,pass,3
MobileBertForQuestionAnswering,pass,3
OPTForCausalLM,pass,8
@@ -130,26 +78,14 @@ PLBartForCausalLM,pass,6
PLBartForConditionalGeneration,pass,8
PegasusForCausalLM,pass,6
PegasusForConditionalGeneration,pass,7
RobertaForCausalLM,pass,5
RobertaForQuestionAnswering,pass,5
T5ForConditionalGeneration,pass,5

View File

@@ -10,126 +10,22 @@ beit_base_patch16_224,pass,0
botnet26t_256,pass,0
cait_m36_384,pass,0
coat_lite_mini,pass,0
convit_base,pass,0
convmixer_768_32,pass,0
convnext_base,pass,0
crossvit_9_240,pass,0
cspdarknet53,pass,0
deit_base_distilled_patch16_224,pass,0
dla102,pass,0
dm_nfnet_f0,pass,0
dpn107,pass,0
eca_botnext26ts_256,pass,0
eca_halonext26ts,pass,0
ese_vovnet19b_dw,pass,0
fbnetc_100,pass,0
fbnetv3_b,pass,0
gernet_l,pass,0
ghostnet_100,pass,0
gluon_inception_v3,pass,0
gmixer_24_224,pass,0
gmlp_s16_224,pass,0
hrnet_w18,pass,0
inception_v3,pass,0
jx_nest_base,pass,0
lcnet_050,pass,0
levit_128,pass,0
mixer_b16_224,pass,0
mixnet_l,pass,0
mnasnet_100,pass,0
mobilenetv2_100,pass,0
@@ -146,100 +42,16 @@ nfnet_l0,pass,0
pit_b_224,pass,0
pnasnet5large,pass,0
poolformer_m36,pass,0
regnety_002,pass,0
repvgg_a2,pass,0
res2net101_26w_4s,pass,0
res2net50_14w_8s,pass,0
res2next50,pass,0
resmlp_12_224,pass,0
resnest101e,pass,0
rexnet_100,pass,0
sebotnet33ts_256,pass,0
selecsls42b,pass,0
spnasnet_100,pass,0
swin_base_patch4_window7_224,pass,0
swsl_resnext101_32x16d,pass,0
tf_efficientnet_b0,pass,0
tf_mixnet_l,pass,0
tinynet_a,pass,0
tnt_s_patch16_224,pass,0
twins_pcpvt_base,pass,0
visformer_small,pass,0
vit_base_patch16_224,pass,0
volo_d1_224,pass,0
xcit_large_24_p8_224,pass,0

View File

@@ -10,126 +10,22 @@ beit_base_patch16_224,pass,7
botnet26t_256,pass,6
cait_m36_384,eager_fail_to_run,0
coat_lite_mini,pass,6
convit_base,pass,7
convmixer_768_32,pass,5
convnext_base,pass,7
crossvit_9_240,pass,7
cspdarknet53,pass,7
deit_base_distilled_patch16_224,pass,7
dla102,pass,7
dm_nfnet_f0,pass,6
dpn107,pass,6
eca_botnext26ts_256,pass,7
eca_halonext26ts,pass,7
ese_vovnet19b_dw,pass,7
fbnetc_100,pass,7
fbnetv3_b,pass,6
gernet_l,pass,6
ghostnet_100,pass,6
gluon_inception_v3,pass,7
gmixer_24_224,pass,6
gmlp_s16_224,pass,7
hrnet_w18,pass,5
inception_v3,pass,6
jx_nest_base,pass,7
lcnet_050,pass,6
levit_128,pass,7
mixer_b16_224,pass,7
mixnet_l,pass,6
mnasnet_100,pass,7
mobilenetv2_100,pass,7
@@ -146,100 +42,16 @@ nfnet_l0,pass,7
pit_b_224,pass,6
pnasnet5large,pass,5
poolformer_m36,pass,6
regnety_002,pass,6
repvgg_a2,pass,7
res2net101_26w_4s,pass,6
res2net50_14w_8s,pass,6
res2next50,pass,6
resmlp_12_224,pass,6
resnest101e,pass,6
rexnet_100,pass,7
sebotnet33ts_256,pass,6
selecsls42b,pass,6
spnasnet_100,pass,7
swin_base_patch4_window7_224,pass,7
swsl_resnext101_32x16d,pass,6
tf_efficientnet_b0,pass,6
tf_mixnet_l,pass,6
tinynet_a,pass,6
tnt_s_patch16_224,pass,7
twins_pcpvt_base,pass,7
visformer_small,pass,7
vit_base_patch16_224,pass,7
volo_d1_224,pass,7
xcit_large_24_p8_224,pass_due_to_skip,7

View File

@@ -6,10 +6,6 @@ AlbertForMaskedLM,pass,0
AlbertForQuestionAnswering,pass,0
AllenaiLongformerBase,pass,4
@@ -18,50 +14,22 @@ BartForCausalLM,pass,0
BartForConditionalGeneration,pass,0
BertForMaskedLM,pass,0
BertForQuestionAnswering,pass,0
BlenderbotForCausalLM,pass_due_to_skip,0
BlenderbotSmallForCausalLM,pass,0
BlenderbotSmallForConditionalGeneration,pass,0
CamemBert,pass,0
DebertaV2ForMaskedLM,pass_due_to_skip,0
DebertaV2ForQuestionAnswering,pass,0
DistilBertForMaskedLM,pass,0
DistilBertForQuestionAnswering,pass,0
DistillGPT2,pass,2
@@ -70,10 +38,6 @@ ElectraForCausalLM,pass,0
ElectraForQuestionAnswering,pass,0
GPT2ForSequenceClassification,pass,0
@@ -86,10 +50,6 @@ LayoutLMForMaskedLM,pass,0
LayoutLMForSequenceClassification,pass,0
M2M100ForConditionalGeneration,pass,0
@@ -98,10 +58,6 @@ MBartForCausalLM,pass,0
MBartForConditionalGeneration,pass,0
MT5ForConditionalGeneration,pass,0
@@ -110,18 +66,10 @@ MegatronBertForCausalLM,pass,0
MegatronBertForQuestionAnswering,pass,0
MobileBertForMaskedLM,pass,0
MobileBertForQuestionAnswering,pass,0
OPTForCausalLM,pass,0
@@ -130,26 +78,14 @@ PLBartForCausalLM,pass,0
PLBartForConditionalGeneration,pass,0
PegasusForCausalLM,pass,0
PegasusForConditionalGeneration,pass,0
RobertaForCausalLM,pass,0
RobertaForQuestionAnswering,pass,0
T5ForConditionalGeneration,pass,0

Some files were not shown because too many files have changed in this diff.
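
For reference, every accuracy CSV in this compare view uses the same three-column schema (name, accuracy, graph_breaks), where accuracy is a status string such as pass, pass_due_to_skip, eager_fail_to_run, or eager_1st_run_OOM. Below is a minimal sketch, assuming that header row and plain CSV formatting, of a hypothetical helper for tallying statuses locally; it is not part of this PR or of the benchmark tooling.

import csv
import sys
from collections import Counter

def summarize_accuracy_csv(path):
    # Tally the accuracy column of a name,accuracy,graph_breaks CSV
    # (e.g. how many models report pass vs. eager_fail_to_run).
    counts = Counter()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):  # assumes the header row: name,accuracy,graph_breaks
            counts[row["accuracy"]] += 1
    return counts

if __name__ == "__main__":
    for csv_path in sys.argv[1:]:
        print(csv_path, dict(summarize_accuracy_csv(csv_path)))

Running it against a downloaded copy of one of these CSVs prints a per-status count for that file.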