34f681c13b
[CI] Remove inductor skip list for timm_models ( #98840 )
...
Summary: check against the expected csv file instead of skipping tests
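As a rough illustration of the approach (a hedged sketch with hypothetical file and column names, not the actual CI code), the check loads an expected-results CSV and compares each model's actual status against it instead of skipping the model outright:
```python
import csv

def load_expected(csv_path):
    # Map model name -> expected accuracy status (e.g. "pass", "fail_accuracy").
    with open(csv_path, newline="") as f:
        return {row["name"]: row["accuracy"] for row in csv.DictReader(f)}

def check_against_expected(actual_results, csv_path="expected_timm_accuracy.csv"):
    # Flag any model whose actual status diverges from the recorded expectation.
    expected = load_expected(csv_path)
    mismatches = {
        name: (expected.get(name), status)
        for name, status in actual_results.items()
        if expected.get(name) != status
    }
    if mismatches:
        raise AssertionError(f"Results diverge from expected CSV: {mismatches}")
```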
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98840
Approved by: https://github.com/ezyang
2023-04-15 13:54:41 +00:00
c4de7fdef5
[CI] Mark sebotnet33ts_256 as nondeterministic ( #98356 )
...
Summary: The goal is to make sure the new dashboard doesn't give noisy alerts on this test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98356
Approved by: https://github.com/ezyang
2023-04-05 12:05:47 +00:00
d305d4a57f
[Dynamo] Fix TIMM benchmark compute_loss ( #97423 )
...
Fixes #97382
#95416 fixed a critical bug in the dynamo benchmarks: before that PR, AMP tests fell back to eager mode. However, after that PR, we found [a list of TIMM models that fail amp + eager + training testing](https://docs.google.com/spreadsheets/d/1DEhirVOkj15Lu4UNawIUon9MqkVLaWqyT-DQPif5NHk/edit#gid=0).
We have now identified the root cause: high loss values make gradient checking harder, because small changes in accumulation order upset the accuracy checks. We should switch to the helper function `reduce_to_scalar_loss`, which the TorchBench tests already use.
After switching to `reduce_to_scalar_loss`, the TIMM accuracy pass rate grows from 67.74% to 91.94% in my local test. The remaining 5 failing models (ese_vovnet19b_dw, fbnetc_100, mnasnet_100, mobilevit_s, sebotnet33ts_256) need further investigation and handling, but I suspect the reason is similar.
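For context, a minimal sketch of what such a reduction helper does (the actual `reduce_to_scalar_loss` in the benchmark suite handles more output types):
```python
import torch

def reduce_to_scalar_loss(out):
    # Collapse arbitrary model outputs to one scalar suitable for backward().
    if isinstance(out, torch.Tensor):
        # Averaging keeps the loss magnitude small, so gradient comparisons
        # are less sensitive to floating-point accumulation order.
        return out.float().mean()
    if isinstance(out, (list, tuple)):
        return sum(reduce_to_scalar_loss(x) for x in out) / len(out)
    if isinstance(out, dict):
        return sum(reduce_to_scalar_loss(v) for v in out.values()) / len(out)
    raise NotImplementedError(f"Cannot reduce {type(out)} to a scalar loss")
```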
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97423
Approved by: https://github.com/Chillee
2023-03-24 16:50:28 +00:00
60a68477a6
Bump black version to 23.1.0 ( #96578 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96578
Approved by: https://github.com/ezyang
2023-03-15 06:27:59 +00:00
02792ff16f
[CI] Make inductor-perf-test-nightly produce data for dashboard ( #95685 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95685
Approved by: https://github.com/ezyang , https://github.com/huydhn
2023-03-06 03:14:03 +00:00
8d45f555d7
[BE] [1/3] Rewrite super() calls in caffe2 and benchmarks ( #94587 )
...
Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied.
- #94587
- #94588
- #94592
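For illustration (a hedged example, not lifted from the PR diff), the typical rewrite replaces the explicit two-argument form of `super()` with the zero-argument form:
```python
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        # Before the rewrite: super(MyModule, self).__init__()
        # After the rewrite (zero-argument form, same behavior):
        super().__init__()
```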
Also, methods with only a `super()` call are removed:
```diff
 class MyModule(nn.Module):
-    def __init__(self):
-        super().__init__()
-
     def forward(self, ...):
         ...
```
Some cases that change the semantics should be kept unchanged. E.g.:
f152a79be9/caffe2/python/net_printer.py (L184-L190)
f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94587
Approved by: https://github.com/ezyang
2023-02-11 18:19:48 +00:00
333e771394
Add benchmarks.py to run all benchmarks, add new file with all torchbench model names ( #94146 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94146
Approved by: https://github.com/ezyang
2023-02-08 01:18:38 +00:00
498c6ed8d8
Add missing format string ( #93866 )
...
Signed-off-by: Edward Z. Yang <ezyang@meta.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93866
Approved by: https://github.com/albanD , https://github.com/Skylion007
2023-02-01 20:56:46 +00:00
f646126ecd
Running timm benchmarks no longer silently retries ( #93030 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93030
Approved by: https://github.com/eellison
2023-01-26 03:44:38 +00:00
c52567ec18
Switch CI exclusions to use exact match. ( #92761 )
...
Since the CI exclusions are hard-coded in our script, we might as well require them to match exactly. This solved some head-scratching where I was like, "this model is not obviously excluded, why is it not showing up in CI?"
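A hedged sketch of the difference (list contents hypothetical): with exact matching, a model is excluded only when its full name appears in the list, rather than whenever a listed entry loosely matches it:
```python
# Hypothetical exclusion list; the real one is hard-coded in the CI script.
CI_SKIP = {"hf_T5_large", "tacotron2"}

def is_excluded(model_name):
    # Exact match: skip only when the full name is listed.
    return model_name in CI_SKIP

def is_excluded_loosely(model_name):
    # The looser behavior this change avoids: a listed entry that merely
    # appears inside the model name would also exclude it.
    return any(skip in model_name for skip in CI_SKIP)
```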
Signed-off-by: Edward Z. Yang <ezyang@meta.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92761
Approved by: https://github.com/jansel
2023-01-22 17:10:20 +00:00
0c1777acec
Dynamo benchmark: add CPU specific changes ( #88477 )
...
This PR adds some CPU-specific changes:
- Add support for IPEX backend
- https://github.com/pytorch/torchdynamo/issues/1618
- https://github.com/pytorch/torchdynamo/issues/1534
- Enable CPU launcher in runner.py.
- Fix the issue that some environment variables are not supported on CPU
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88477
Approved by: https://github.com/jgong5 , https://github.com/jansel
2023-01-07 09:26:06 +00:00
84e73e1269
[inductor] small CI improvements ( #91140 )
...
Summary: 1) Increase the retry count for timm model downloads; 2) Skip certain random Triton failures.
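A hedged sketch of the retry idea (hypothetical helper, not the CI script itself):
```python
import time

def download_with_retries(download_fn, model_name, retries=5, delay=5.0):
    # Retry transient download failures a few more times before giving up.
    for attempt in range(1, retries + 1):
        try:
            return download_fn(model_name)
        except Exception as exc:  # e.g. network hiccups or rate limits
            if attempt == retries:
                raise
            print(f"Download of {model_name} failed ({exc}); retry {attempt}/{retries}")
            time.sleep(delay)
```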
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91140
Approved by: https://github.com/williamwen42
2022-12-20 17:26:12 +00:00
7ebc45eadd
[dynamo] Better error message for bad timm model name ( #91049 )
...
Fixes https://github.com/pytorch/torchdynamo/issues/1995
Running `python benchmarks/dynamo/timm_models.py --performance --float32 -dcuda --output=out.csv --training --inductor --only bad_model_name` gives
```
Traceback (most recent call last):
File "benchmarks/dynamo/timm_models.py", line 338, in <module>
main(TimmRunnner())
File "/scratch/williamwen/work/pytorch/benchmarks/dynamo/common.py", line 1660, in main
return maybe_fresh_cache(run, args.cold_start_latency and args.only)(
File "/scratch/williamwen/work/pytorch/benchmarks/dynamo/common.py", line 833, in inner
return fn(*args, **kwargs)
File "/scratch/williamwen/work/pytorch/benchmarks/dynamo/common.py", line 2000, in run
) = runner.load_model(device, model_name, batch_size=batch_size)
File "benchmarks/dynamo/timm_models.py", line 215, in load_model
raise RuntimeError(f"Failed to load model '{model_name}'")
RuntimeError: Failed to load model 'bad_model_name'
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91049
Approved by: https://github.com/ezyang
2022-12-19 22:37:34 +00:00
7c524221ba
[reland3][dynamo] Revert "Revert "[reland][dynamo] use optimizers correctly in benchmar… ( #90956 )
...
…king (#87492 )" (#90746 )"
This reverts commit ff1bbc2773a31ab839438966266ed8ee206cb8c5.
This should be okay to merge now. The flakiness of HF models will be fixed by seeding the rng (https://github.com/pytorch/pytorch/pull/90936 ), and the numeric mismatch was root-caused to three decomps (still investigating why those decomps cause this); see https://github.com/pytorch/torchdynamo/issues/1985 for more detail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90956
Approved by: https://github.com/desertfire
2022-12-17 06:27:15 +00:00
6bc6fb21db
Revert "[reland2][dynamo] Revert "Revert "[reland][dynamo] use optimizers correctly in benchmar… ( #90956 )"
...
This reverts commit 8bc38ae4e2037ae42813d552e5d412db77167bc0.
Reverted https://github.com/pytorch/pytorch/pull/90956 on behalf of https://github.com/desertfire due to Causing TIMM model failures
2022-12-16 19:28:05 +00:00
8bc38ae4e2
[reland2][dynamo] Revert "Revert "[reland][dynamo] use optimizers correctly in benchmar… ( #90956 )
...
…king (#87492 )" (#90746 )"
This reverts commit ff1bbc2773a31ab839438966266ed8ee206cb8c5.
This should be okay to merge now. The flakiness of HF models will be fixed by seeding the rng (https://github.com/pytorch/pytorch/pull/90936 ), and the numeric mismatch was root-caused to three decomps (still investigating why those decomps cause this); see https://github.com/pytorch/torchdynamo/issues/1985 for more detail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90956
Approved by: https://github.com/desertfire
2022-12-16 13:33:38 +00:00
57e2090e21
[Dynamo][TIMM][Benchmarks] Fix TIMM 0.8.0dev breaking the timm_models.py script's data config ( #90404 )
...
It seems `0.8.0dev` breaks the current argument passing: after 0dadb4a6e9, it expects a dictionary instead of a namespace.
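A hedged sketch of the fix, assuming timm's `resolve_data_config` helper is the call that changed (the exact call site in timm_models.py may differ): convert the argparse namespace to a plain dict before resolving the data config.
```python
import argparse

import timm
from timm.data import resolve_data_config

parser = argparse.ArgumentParser()
parser.add_argument("--img-size", type=int, default=None)
args = parser.parse_args([])

model = timm.create_model("resnet18", pretrained=False)
# Newer timm expects a dict of overrides rather than an argparse Namespace,
# so pass vars(args) instead of args.
data_config = resolve_data_config(vars(args), model=model)
```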
CC @desertfire @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90404
Approved by: https://github.com/ngimel
2022-12-15 22:21:19 +00:00
ff1bbc2773
Revert "[reland][dynamo] use optimizers correctly in benchmarking ( #87492 )" ( #90746 )
...
This reverts commit d91d7a322172da4d92672301f3cfa3344d544a9e.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90746
Approved by: https://github.com/anijain2305
2022-12-13 11:37:16 +00:00
d91d7a3221
[reland][dynamo] use optimizers correctly in benchmarking ( #87492 )
...
Reland https://github.com/pytorch/pytorch/pull/87311
mlazos: updated to use SGD to not add a bunch of additional memory allocations (like Adam)
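A hedged sketch of the pattern (hypothetical scaffolding, not the benchmark harness itself): create the optimizer once, then zero gradients and step on every iteration; SGD keeps per-parameter state minimal compared to Adam.
```python
import torch

def run_training_iterations(model, example_inputs, n_iters=3):
    # SGD avoids the extra state tensors an optimizer like Adam allocates.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    for _ in range(n_iters):
        optimizer.zero_grad(set_to_none=True)
        out = model(*example_inputs)
        loss = out.sum() if isinstance(out, torch.Tensor) else sum(o.sum() for o in out)
        loss.backward()
        optimizer.step()
```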
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87492
Approved by: https://github.com/desertfire
2022-12-09 20:32:53 +00:00
f7cdd3a7a0
[inductor] Use a large tolerance for botnet26t_256 ( #90383 )
...
Summary: botnet26t_256 shows random tolerance failures on CI. The root cause of this randomness is still to be investigated, but let's use a larger tolerance for now.
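A hedged sketch of per-model tolerance handling (names hypothetical; the benchmark suite keeps its own list):
```python
import torch

# Models whose accuracy comparison needs a looser tolerance, e.g. due to
# nondeterministic kernels or reduction-order sensitivity.
REQUIRE_HIGHER_TOLERANCE = {"botnet26t_256"}

def pick_tolerance(model_name, default=1e-3, relaxed=4e-2):
    return relaxed if model_name in REQUIRE_HIGHER_TOLERANCE else default

def outputs_match(model_name, eager_out, compiled_out):
    tol = pick_tolerance(model_name)
    return torch.allclose(eager_out, compiled_out, rtol=tol, atol=tol)
```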
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90383
Approved by: https://github.com/ezyang
2022-12-07 19:35:06 +00:00
3162a48a77
[dynamo][benchmarks] Call zero grad ( #90026 )
...
Hoping that it might reduce some flakiness
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90026
Approved by: https://github.com/williamwen42
2022-12-02 04:05:57 +00:00
68805b08d1
[benchmarks][dynamo] Trying CI - Set train() for TIMM models accuracy tests ( #89780 )
...
Moving to train mode for TIMM models and also raising the batch size for accuracy testing.
Raising the batch size seems to remove a lot of noise/instability coming from the batch_norm decomposition.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89780
Approved by: https://github.com/ngimel
2022-11-30 12:57:35 +00:00
1b575782a0
[dynamo][benchmarks] use fresh inductor cache and raise batch size wherever possible ( #88044 )
...
cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88044
Approved by: https://github.com/ngimel
2022-10-30 17:10:17 +00:00
57b36bf353
Bring back TIMM model inductor CI test ( #87730 )
...
Summary: https://github.com/pytorch/pytorch/pull/87588 has solved the inductor compilation speed regression, so we can try to run TIMM models with fewer shards and also enable pretrained model downloading, which should resolve the flakiness we have seen previously.
cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87730
Approved by: https://github.com/anijain2305
2022-10-26 00:15:35 +00:00
f047dadab9
Enable inductor CI for TIMM ( #87462 )
...
cc @jansel @lezcano @fdrocha @mlazos @soumith @voznesenskym @yanboliang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87462
Approved by: https://github.com/anijain2305
2022-10-22 05:50:00 +00:00
f38a88c4dd
Revert "[dynamo] use optimizers correctly in benchmarking ( #87311 )"
...
This reverts commit 703c19008df4700b6a522b0ae5c4b6d5ffc0906f.
Reverted https://github.com/pytorch/pytorch/pull/87311 on behalf of https://github.com/anijain2305 due to Bin (desertfire) is trying to get torchbench models in CI, and this PR prevents that. I will bring this back after models are in CI.
2022-10-20 22:01:51 +00:00
703c19008d
[dynamo] use optimizers correctly in benchmarking ( #87311 )
...
We were not setting up optimizers correctly
* This hid the issue that we see here - https://github.com/pytorch/torchdynamo/issues/1687
* This has also revealed that we are activating profilers for every dynamo-optimized model call. This could affect speedups
cc @jansel @lezcano @fdrocha
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87311
Approved by: https://github.com/mlazos , https://github.com/yanboliang
2022-10-20 05:46:25 +00:00
c30cfb07ab
[dynamo][dashboard] Run 2 iterations for the correctness runs ( #87104 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87104
Approved by: https://github.com/soumith
2022-10-18 15:53:40 +00:00
c7c09722ad
Move TorchDynamo into PyTorch core ( #86461 )
...
Context:
https://github.com/pytorch/torchdynamo/issues/1588
This PR moves [TorchDynamo](https://github.com/pytorch/torchdynamo ) and TorchInductor into PyTorch core.
- `torchdynamo` becomes `torch._dynamo`
- `torchinductor` becomes `torch._inductor`
This PR was generated by running `copy_to_core.sh` in https://github.com/pytorch/torchdynamo/pull/1538
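A hedged sketch of what the rename means for code that imported the standalone packages:
```python
# Before the move (standalone repositories):
#   import torchdynamo
#   compiled_fn = torchdynamo.optimize("inductor")(fn)
# After the move into PyTorch core:
import torch._dynamo

compiled_fn = torch._dynamo.optimize("inductor")(lambda x: x * 2)
```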
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86461
Approved by: https://github.com/voznesenskym
2022-10-13 23:18:06 +00:00