Commit Graph

18 Commits

Author SHA1 Message Date
5cb05a82b4 [BC breaking] move benchmarking + prefer inductor path (#132827)
move benchmarking out of `torch._inductor.runtime.runtime_utils` and into `torch._inductor.runtime.benchmarking`, and prefer this path over directly accessing Triton's benchmarking

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132827
Approved by: https://github.com/eellison
2024-08-08 00:47:45 +00:00
c0ed38e644 [BE][Easy][3/19] enforce style for empty lines in import segments in benchmarks/ (#129754)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129754
Approved by: https://github.com/ezyang
2024-07-17 14:34:42 +00:00
7b5a8424a1 [GPT-fast] Update micro benchmark numbers as A100-50G (#129799)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129799
Approved by: https://github.com/Chillee
2024-06-29 04:36:07 +00:00
9554a9af87 [GPT-benchmark] Distinguish LLM models and mirco-benchmarks (#129498)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129498
Approved by: https://github.com/huydhn
2024-06-26 00:25:05 +00:00
9e8443b56f Remove dtype from gpt-fast micro benchmark experiments model name (#128789)
Per comments on https://github.com/pytorch/test-infra/pull/5344, we already have a dtype column with the same information

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128789
Approved by: https://github.com/yanboliang
2024-06-18 01:26:45 +00:00
f37121bb74 Add model name, quantization and device to gpt_fast micro benchmark output (#128091)
A small enhancement to https://hud.pytorch.org/benchmark/llms with these columns in the output.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128091
Approved by: https://github.com/yanboliang
2024-06-15 01:39:48 +00:00
1fb4effe7a [GPT-fast benchmark] Add MLP, gather + gemv, gemv micro benchmark (#128002)
Output example:
```
| name                         | metric                    | target  | actual  |
|------------------------------|---------------------------|---------|---------|
| layer_norm_bfloat16          | memory_bandwidth(GB/s)    | 1017    | 1000.01 |
| mlp_layer_norm_gelu_bfloat16 | flops_utilization         | 0.71    | 0.71    |
| gemv_int8                    | memory_bandwidth(GB/s)    | 990     | 984.06 |
| gemv_bfloat16                | memory_bandwidth(GB/s)    | 1137    | 1137.92 |
| gather_gemv_int8             | memory_bandwidth(GB/s)    | 1113    | 1111.09 |
| gather_gemv_bfloat16         | memory_bandwidth(GB/s)    | 1249    | 1248.15 |

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128002
Approved by: https://github.com/Chillee
2024-06-14 17:03:22 +00:00
0be06b08fc [GPT-fast benchmark] Merge GPT-fast and micro benchmark output as one CSV file (#127586)
Consolidate GPT-fast models benchmark with micro-benchmark, and save output as one CSV file with the same format as https://github.com/pytorch/pytorch/pull/126754#issue-2307296847.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127586
Approved by: https://github.com/Chillee
2024-05-31 18:50:49 +00:00
26f4f10ac8 [5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo make `usort` do more and generate the changes in the PR. Except `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
2024-05-27 14:49:57 +00:00
55c0ab2887 Revert "[5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)"
This reverts commit 7763c83af67eebfdd5185dbe6ce15ece2b992a0f.

Reverted https://github.com/pytorch/pytorch/pull/127126 on behalf of https://github.com/XuehaiPan due to Broken CI ([comment](https://github.com/pytorch/pytorch/pull/127126#issuecomment-2133044286))
2024-05-27 09:22:08 +00:00
7763c83af6 [5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo make `usort` do more and generate the changes in the PR. Except `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
ghstack dependencies: #127122, #127123, #127124, #127125
2024-05-27 04:22:18 +00:00
a174c536f8 GPT-fast benchmark: adding memory bandwidth and use A100-40GB as target (#125881)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125881
Approved by: https://github.com/Chillee
2024-05-11 10:46:54 +00:00
9dee3ef919 Ingest gpt-fast benchmark results from S3 to Rockset (#125891)
A follow-up of https://github.com/pytorch/pytorch/pull/125450, this extends the `tools/stats/upload_dynamo_perf_stats.py` script to upload arbitrary benchmark results in CSV format.

* Upload gpt-fast benchmarks to a new Rockset collection `benchmarks/oss_ci_benchmark`.  The file is in the following format:
```
$ cat test/test-reports/gpt_fast_benchmark.csv
name,mode,target,actual,percentage
Llama-2-7b-chat-hf,bfloat16,104,104.754128,100.73%
```
* The CSV output needs to be kept in `test/test-reports` directory.
* Re-use the existing `.github/workflows/upload-test-stats.yml` workflow

### Testing

Run the commands manually

```
(py3.11) huydo@huydo-mbp pytorch % python3 -m tools.stats.upload_artifacts --workflow-run-id 9026179545 --workflow-run-attempt 1 --repo "pytorch/pytorch"
Using temporary directory: /var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmp6eug3cdz
Downloading test-jsons-runattempt1-test-inductor-micro-benchmark-1-1-linux.gcp.a100_24803987212.zip
Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmp6eug3cdz/test-jsons-runattempt1-test-inductor-micro-benchmark-1-1-linux.gcp.a100_24803987212.zip to s3://gha-artifacts/pytorch/pytorch/9026179545/1/artifact/test-jsons-test-inductor-micro-benchmark-1-1-linux.gcp.a100_24803987212.zip
Downloading test-reports-runattempt1-test-inductor-micro-benchmark-1-1-linux.gcp.a100_24803987212.zip
Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmp6eug3cdz/test-reports-runattempt1-test-inductor-micro-benchmark-1-1-linux.gcp.a100_24803987212.zip to s3://gha-artifacts/pytorch/pytorch/9026179545/1/artifact/test-reports-test-inductor-micro-benchmark-1-1-linux.gcp.a100_24803987212.zip

(py3.11) huydo@huydo-mbp pytorch % python3 -m tools.stats.upload_dynamo_perf_stats --workflow-run-id 9026179545 --workflow-run-attempt 1 --repo "pytorch/pytorch" --head-branch "ciflow/inductor-micro-benchmark/125891" --rockset-collection oss_ci_benchmark --rockset-workspace benchmarks --match-filename "^gpt_fast_benchmark"
Using temporary directory: /var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmp8xr4sdxk
Downloading test-reports-test-inductor-micro-benchmark-1-1-linux.gcp.a100_24803987212.zip
Extracting test-reports-test-inductor-micro-benchmark-1-1-linux.gcp.a100_24803987212.zip to unzipped-test-reports-test-inductor-micro-benchmark-1-1-linux.gcp.a100_24803987212
Processing gpt_fast_benchmark from test-reports-test-inductor-micro-benchmark-1-1-linux.gcp.a100_24803987212.zip
Writing 3 documents to Rockset
Done!
```

Also run a sanity check on ingesting inductor benchmark results:

```
(py3.11) huydo@huydo-mbp pytorch % python -m tools.stats.upload_dynamo_perf_stats --workflow-run-id 8997654356 --workflow-run-attempt 1 --repo pytorch/pytorch --head-branch main --rockset-collection torch_dynamo_perf_stats --rockset-workspace inductor --match-filename "^inductor_"
...
Writing 4904 documents to Rockset
Done!
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125891
Approved by: https://github.com/yanboliang
2024-05-11 04:16:36 +00:00
f87fbfdb01 GPT-fast benchmark: remove Embedding layer from model size (#125901)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125901
Approved by: https://github.com/Chillee
2024-05-10 08:18:13 +00:00
8c74162074 Reduce the number of layers for mixtral moe model to adapt CI memory limitation (#125608)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125608
Approved by: https://github.com/Chillee, https://github.com/huydhn
2024-05-06 21:52:25 +00:00
1d6c5972c1 [BE]: Optimize min/max/sum comprehensions C419 (#123960)
Automatic fixes that replaces certain list comprehensions with generator ones where appropriate so that they are immediately consumed. This is preview functionality in ruff for rule C419 and it was automatically applied.

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123960
Approved by: https://github.com/malfet
2024-04-12 23:54:15 +00:00
ed37fbdf60 made gpt_fast benchmark run faster (#122872)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122872
Approved by: https://github.com/msaroufim, https://github.com/yifuwang
ghstack dependencies: #122848
2024-03-29 03:49:19 +00:00
43e243180b Add gpt-fast as a static benchmark (#121886)
Run:
```
python benchmarks/gpt_fast/benchmark.py
```
It generated a cvs file ```gpt_fast_benchmark.csv``` with the content like:
```
name,mode,target,actual,percentage
Llama-2-7b-chat-hf,bfloat16,104,103.458618,99.48%
Llama-2-7b-chat-hf,int8,155,158.964615,102.56%
Mixtral-8x7B-v0.1,int8,97,99.760132,102.85%
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121886
Approved by: https://github.com/Chillee
2024-03-14 21:46:59 +00:00