39 Commits

fdab48a7c1 Enable all PIE rules on ruff (#165814)
This PR enables all PIE rules on ruff. Some rules from this family were already enabled; the newly added rules are:
```
PIE796  Enum contains duplicate value: {value}
PIE808  Unnecessary start argument in range
```
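
For illustration, a minimal sketch (hypothetical code, not from this PR) of what each rule flags:
```
from enum import Enum

class Color(Enum):
    RED = 1
    CRIMSON = 1  # PIE796: enum contains a duplicate value

for i in range(0, 10):  # PIE808: unnecessary start argument, range(10) suffices
    print(i)
```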

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165814
Approved by: https://github.com/ezyang
2025-10-18 07:36:18 +00:00
24520b8386 Revert "Enable all PIE rules on ruff (#165814)"
This reverts commit c79dfdc6550e872783aa5cb5fc9e86589bf18872.

Reverted https://github.com/pytorch/pytorch/pull/165814 on behalf of https://github.com/cyyever due to Need to cover more files ([comment](https://github.com/pytorch/pytorch/pull/165814#issuecomment-3417931863))
2025-10-18 07:21:08 +00:00
c79dfdc655 Enable all PIE rules on ruff (#165814)
This PR enables all PIE rules on ruff. Some rules from this family were already enabled; the newly added rules are:
```
PIE796  Enum contains duplicate value: {value}
PIE808  Unnecessary start argument in range
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165814
Approved by: https://github.com/ezyang
2025-10-18 06:40:12 +00:00
b2953f5643 [9/N] Apply ruff UP035 rule (#165515)
This is a follow-up of #165214, continuing to apply the ruff UP035 rule to the code base.
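
For illustration, the kind of change UP035 (deprecated import) makes; a hypothetical snippet, not taken from this PR:
```
# before: imports flagged by UP035
from typing import Callable, Sequence

# after: these now live in collections.abc
from collections.abc import Callable, Sequence
```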

Pull Request resolved: https://github.com/pytorch/pytorch/pull/165515
Approved by: https://github.com/Lucaskabela
2025-10-17 00:09:51 +00:00
42015db6a9 [BE] fix typos in benchmarks/ (#156077)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156077
Approved by: https://github.com/Skylion007, https://github.com/malfet
ghstack dependencies: #156069
2025-06-17 13:12:18 +00:00
c73a92fbf5 [BE][CI] bump ruff to 0.9.2: multiline assert statements (#144546)
Reference: https://docs.astral.sh/ruff/formatter/black/#assert-statements

> Unlike Black, Ruff prefers breaking the message over breaking the assertion, similar to how both Ruff and Black prefer breaking the assignment value over breaking the assignment target:
>
> ```python
> # Input
> assert (
>     len(policy_types) >= priority + num_duplicates
> ), f"This tests needs at least {priority+num_duplicates} many types."
>
>
> # Black
> assert (
>     len(policy_types) >= priority + num_duplicates
> ), f"This tests needs at least {priority+num_duplicates} many types."
>
> # Ruff
> assert len(policy_types) >= priority + num_duplicates, (
>     f"This tests needs at least {priority + num_duplicates} many types."
> )
> ```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144546
Approved by: https://github.com/malfet
2025-02-27 20:46:16 +00:00
eb553ae3cf Fix broken gpt_fast micro benchmark after #144315 (#145235)
The benchmark is failing with the following error:

```
  File "/var/lib/jenkins/workspace/benchmarks/gpt_fast/benchmark.py", line 333, in <module>
    main(output_file=args.output, only_model=args.only)
  File "/var/lib/jenkins/workspace/benchmarks/gpt_fast/benchmark.py", line 308, in main
    lst = func(device)
  File "/var/lib/jenkins/workspace/benchmarks/gpt_fast/benchmark.py", line 66, in run_mlp_layer_norm_gelu
    us_per_iter = benchmarker.benchmark(compiled_mod, (x,)) * 1000
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/_inductor/runtime/benchmarking.py", line 39, in wrapper
    return fn(self, *args, **kwargs)
TypeError: benchmark() missing 1 required positional argument: 'fn_kwargs'
```

An example error is https://github.com/pytorch/pytorch/actions/runs/12862761823/job/35858912555
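
A likely call-site fix (a sketch only, not necessarily the exact change in this PR) is to pass the positional `fn_kwargs` argument explicitly:
```
us_per_iter = benchmarker.benchmark(compiled_mod, (x,), {}) * 1000
```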

I also assign `oncall: pt2` as the owner of this job going forward.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145235
Approved by: https://github.com/nmacchioni
2025-01-21 17:42:24 +00:00
07669ed960 PEP585 update - benchmarks tools torchgen (#145101)
This is one of a series of PRs to update us to PEP585 (changing Dict -> dict, List -> list, etc).  Most of the PRs were completely automated with RUFF as follows:
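
For example, a hypothetical annotation before and after the UP006 fix (not a snippet from this PR):
```
from typing import Dict, List  # pre-PEP585 spelling


def count_words_old(words: List[str]) -> Dict[str, int]:
    return {w: words.count(w) for w in words}


# After the fix (Python 3.9+): builtin generics, no typing import needed.
def count_words_new(words: list[str]) -> dict[str, int]:
    return {w: words.count(w) for w in words}
```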

Since RUFF UP006 is considered an "unsafe" fix, we first need to enable unsafe fixes:

```
--- a/tools/linter/adapters/ruff_linter.py
+++ b/tools/linter/adapters/ruff_linter.py
@@ -313,6 +313,7 @@
                     "ruff",
                     "check",
                     "--fix-only",
+                    "--unsafe-fixes",
                     "--exit-zero",
                     *([f"--config={config}"] if config else []),
                     "--stdin-filename",
```

Then we need to tell RUFF to allow UP006 (a final PR will make this permanent once all of these have landed):

```
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -40,7 +40,7 @@

 [tool.ruff]
-target-version = "py38"
+target-version = "py39"
 line-length = 88
 src = ["caffe2", "torch", "torchgen", "functorch", "test"]

@@ -87,7 +87,6 @@
     "SIM116", # Disable Use a dictionary instead of consecutive `if` statements
     "SIM117",
     "SIM118",
-    "UP006", # keep-runtime-typing
     "UP007", # keep-runtime-typing
 ]
 select = [
```

Finally running `lintrunner -a --take RUFF` will fix up the deprecated uses.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145101
Approved by: https://github.com/bobrenjc93
2025-01-18 05:05:07 +00:00
4375c2c534 Cleanup gpt_fast benchmark (#144517)
This is an exact copy of https://github.com/pytorch/pytorch/pull/144484; I bricked the last PR running ghstack land :(

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144517
Approved by: https://github.com/davidberard98, https://github.com/huydhn
2025-01-10 05:22:13 +00:00
fcf9dc3b11 Migrate from Tuple -> tuple in benchmarks (#144259)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144259
Approved by: https://github.com/yanboliang
2025-01-07 04:09:52 +00:00
792e6184c5 [GPT-fast] Support running a specific model or micro-benchmark (#143607)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143607
Approved by: https://github.com/BoyuanFeng, https://github.com/jerryzh168, https://github.com/huydhn
2024-12-20 19:58:07 +00:00
498a7808ff Fix unused Python variables outside torch/ and test/ (#136359)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136359
Approved by: https://github.com/albanD
2024-12-11 17:10:23 +00:00
fe68f61c59 Migrate micro benchmark results to benchmark database schema v3 (#141745)
Similar to https://github.com/pytorch/pytorch/pull/141087, this uploads the micro benchmark results to the benchmark database with its new schema v3. The data can then be queried.

~I'm testing with `inductor-micro-benchmark-x86` which should be sufficient because `inductor-micro-benchmark` is broken atm.  The CSV output stays for now until the dashboard is migrated to schema v3.~ https://github.com/pytorch/pytorch/issues/141747 has been resolved, so inductor-micro-benchmark should work now

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141745
Approved by: https://github.com/yanboliang
2024-12-02 19:45:51 +00:00
a962ae511d Extend gpt-fast LLM dashboard to support torchao autoquant (#140627)
Summary:
We want to test autoquant on relevant LLM models

Right now this covers only llama2 and mixtral, but we want to extend to more models like https://github.com/vllm-project/vllm/tree/main/vllm/model_executor/models
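
For context, autoquant is applied roughly like this (a sketch based on torchao's documented usage, with a toy module standing in for the LLM):
```
import torch
import torchao

# Toy stand-in for the real model; autoquant wraps a compiled module and picks
# a quantization scheme per layer the first time it runs.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024)
).half().cuda()

model = torchao.autoquant(torch.compile(model, mode="max-autotune"))
out = model(torch.randn(8, 1024, dtype=torch.float16, device="cuda"))
```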

Test Plan:

```
                      Llama-2-7b-chat-hf    Mixtral-8x7B-v0.1
gpt-fast int8                     112.98               147.92
torchao autoquant                  87.41                85.90
torchao autoquantv2               131.12                79.59
```

https://hud.pytorch.org/benchmark/llms?repoName=pytorch%2Fpytorch

in pytorch/benchmarks/gpt_fast
```
python benchmark.py
```

output:
```
Loading model Llama-2-7b-chat-hf
Using int8 weight-only quantization!
Time to load model: 2.80 seconds
Compilation time: 170.24 seconds
Average tokens/sec: 112.98 tokens/sec
Average bandwidth achieved: 746.86 GB/s
Memory used: 7.95 GB

Loading model Mixtral-8x7B-v0.1
Using int8 weight-only quantization!
Time to load model: 0.24 seconds
Compilation time: 181.81 seconds
Average tokens/sec: 147.92 tokens/sec
Average bandwidth achieved: 953.06 GB/s
Memory used: 32.45 GB

Loading model Llama-2-7b-chat-hf
Time to load model: 0.11 seconds
Using autoquant
Compilation time: 109.31 seconds
Average tokens/sec: 87.17 tokens/sec
Average bandwidth achieved: 1151.86 GB/s
Memory used: 32.45 GB

Loading model Llama-2-7b-chat-hf
Time to load model: 0.11 seconds
Compilation time: 48.08 seconds
Average tokens/sec: 87.41 tokens/sec
Average bandwidth achieved: 1155.05 GB/s
Memory used: 36.86 GB

Loading model Mixtral-8x7B-v0.1
Time to load model: 0.20 seconds
Using autoquant
Compilation time: 47.32 seconds
Average tokens/sec: 85.90 tokens/sec
Average bandwidth achieved: 1106.37 GB/s
Memory used: 66.81 GB

local test (autoquant v2):
Loading model Mixtral-8x7B-v0.1
Compilation time: 124.40 seconds
Average tokens/sec: 90.41 tokens/sec
Average bandwidth achieved: 1164.47 GB/s
Memory used: 53.91 GB

Loading model Llama-2-7b-chat-hf
TODO

```

gpt_fast_benchmark.csv:
```
name,metric,target,actual,dtype,device,arch,is_model
Llama-2-7b-chat-hf,token_per_sec,144,112.98,int8,cuda,NVIDIA PG509-210,True
Llama-2-7b-chat-hf,memory_bandwidth(GB/s),957,746.86,int8,cuda,NVIDIA PG509-210,True
Llama-2-7b-chat-hf,compilation_time(s),136,170.24,int8,cuda,NVIDIA PG509-210,True
Mixtral-8x7B-v0.1,token_per_sec,175,147.92,int8,cuda,NVIDIA PG509-210,True
Mixtral-8x7B-v0.1,memory_bandwidth(GB/s),1130,953.06,int8,cuda,NVIDIA PG509-210,True
Mixtral-8x7B-v0.1,compilation_time(s),133,181.81,int8,cuda,NVIDIA PG509-210,True
gemv,memory_bandwidth(GB/s),870,867.06,int8,cuda,NVIDIA PG509-210,False
gemv,memory_bandwidth(GB/s),990,1092.43,bfloat16,cuda,NVIDIA PG509-210,False
layer_norm,memory_bandwidth(GB/s),950,573.57,bfloat16,cuda,NVIDIA PG509-210,False
Llama-2-7b-chat-hf,token_per_sec,144,87.17,autoquant,cuda,NVIDIA PG509-210,True
Llama-2-7b-chat-hf,memory_bandwidth(GB/s),957,1151.86,autoquant,cuda,NVIDIA PG509-210,True
Llama-2-7b-chat-hf,compilation_time(s),136,109.31,autoquant,cuda,NVIDIA PG509-210,True
gather_gemv,memory_bandwidth(GB/s),990,945.38,int8,cuda,NVIDIA PG509-210,False
gather_gemv,memory_bandwidth(GB/s),1060,1188.29,bfloat16,cuda,NVIDIA PG509-210,False
mlp_layer_norm_gelu,flops_utilization,0.8,0.82,bfloat16,cuda,NVIDIA PG509-210,False
Llama-2-7b-chat-hf,token_per_sec,94,87.41,bfloat16,cuda,NVIDIA PG509-210,True
Llama-2-7b-chat-hf,memory_bandwidth(GB/s),1253,1155.05,bfloat16,cuda,NVIDIA PG509-210,True
Llama-2-7b-chat-hf,compilation_time(s),133,48.08,bfloat16,cuda,NVIDIA PG509-210,True
Mixtral-8x7B-v0.1,token_per_sec,175,85.90,autoquant,cuda,NVIDIA PG509-210,True
Mixtral-8x7B-v0.1,memory_bandwidth(GB/s),1130,1106.37,autoquant,cuda,NVIDIA PG509-210,True
Mixtral-8x7B-v0.1,compilation_time(s),133,47.32,autoquant,cuda,NVIDIA PG509-210,True
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140627
Approved by: https://github.com/huydhn
2024-11-27 21:57:48 +00:00
267f82b860 [BE] Format .ci/ / .github/ / benchmarks/ / functorch/ / tools/ / torchgen/ with ruff format (#132577)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132577
Approved by: https://github.com/malfet
2024-10-11 18:30:26 +00:00
c30042fbeb [GPT-fast] Update compilation time target for Llama & Mixtral (#135817)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135817
Approved by: https://github.com/xmfan, https://github.com/huydhn
2024-09-12 07:13:44 +00:00
24a223c49d Run inductor micro benchmark on x86 metal runner (#135042)
This enables the inductor micro benchmark on CPU (x86):

* Running on an AWS metal runner for more accurate benchmarks
* I add a new `arch` column, which will be either x86_64 or arm64 for CPU, or the GPU name for GPU. We can use this later to differentiate between different setups, e.g. cuda (a100) vs cuda (a10g), or cpu (x86_64) vs cpu (arm64)

The next step would be to run this on cpu (arm64) and cuda (a10g).

### Testing
Here is the CSV results from my test run https://github.com/pytorch/pytorch/actions/runs/10709344180

```
name,metric,target,actual,dtype,device,arch,is_model
mlp_layer_norm_gelu,flops_utilization,0.8,17.36,bfloat16,cpu,x86_64,False
gather_gemv,memory_bandwidth(GB/s),990,170.80,int8,cpu,x86_64,False
gather_gemv,memory_bandwidth(GB/s),1060,204.78,bfloat16,cpu,x86_64,False
Mixtral-8x7B-v0.1,token_per_sec,175,26.68,int8,cpu,x86_64,True
Mixtral-8x7B-v0.1,memory_bandwidth(GB/s),1130,171.91,int8,cpu,x86_64,True
Mixtral-8x7B-v0.1,compilation_time(s),162,47.36,int8,cpu,x86_64,True
gemv,memory_bandwidth(GB/s),870,236.36,int8,cpu,x86_64,False
gemv,memory_bandwidth(GB/s),990,305.71,bfloat16,cpu,x86_64,False
Llama-2-7b-chat-hf,token_per_sec,94,14.01,bfloat16,cpu,x86_64,True
Llama-2-7b-chat-hf,memory_bandwidth(GB/s),1253,185.18,bfloat16,cpu,x86_64,True
Llama-2-7b-chat-hf,compilation_time(s),162,74.99,bfloat16,cpu,x86_64,True
Llama-2-7b-chat-hf,token_per_sec,144,25.09,int8,cpu,x86_64,True
Llama-2-7b-chat-hf,memory_bandwidth(GB/s),957,165.83,int8,cpu,x86_64,True
Llama-2-7b-chat-hf,compilation_time(s),172,70.69,int8,cpu,x86_64,True
layer_norm,memory_bandwidth(GB/s),950,172.03,bfloat16,cpu,x86_64,False
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135042
Approved by: https://github.com/yanboliang
2024-09-05 21:31:36 +00:00
5cb05a82b4 [BC breaking] move benchmarking + prefer inductor path (#132827)
Move benchmarking out of `torch._inductor.runtime.runtime_utils` and into `torch._inductor.runtime.benchmarking`, and prefer this path over directly accessing Triton's benchmarking.
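
A sketch of the preferred path after this change, assuming the module-level `benchmarker` instance exported by the new module and a `benchmark(fn, fn_args, fn_kwargs)` signature:
```
import torch
from torch._inductor.runtime.benchmarking import benchmarker

x = torch.randn(1024, 1024, device="cuda")

# Instead of calling Triton's do_bench directly, go through the inductor benchmarker.
ms_per_iter = benchmarker.benchmark(torch.relu, (x,), {})
```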

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132827
Approved by: https://github.com/eellison
2024-08-08 00:47:45 +00:00
fd4b649e6c [BE]: Simplify some list comps to generators C419 (#132578)
Simplifies some list comprehensions to generators, which is more efficient. The diffs were for the most part applied automatically with ruff.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132578
Approved by: https://github.com/ezyang
2024-08-04 17:46:26 +00:00
c0ed38e644 [BE][Easy][3/19] enforce style for empty lines in import segments in benchmarks/ (#129754)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by the linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```
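
For context, the import-segment style being enforced looks roughly like this (an illustrative sketch, not taken from the PR): one empty line between the standard-library, third-party, and first-party segments.
```
import os
import sys

import numpy as np

import torch
from torch import nn
```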

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129754
Approved by: https://github.com/ezyang
2024-07-17 14:34:42 +00:00
7b5a8424a1 [GPT-fast] Update micro benchmark numbers as A100-50G (#129799)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129799
Approved by: https://github.com/Chillee
2024-06-29 04:36:07 +00:00
9554a9af87 [GPT-benchmark] Distinguish LLM models and micro-benchmarks (#129498)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129498
Approved by: https://github.com/huydhn
2024-06-26 00:25:05 +00:00
9e8443b56f Remove dtype from gpt-fast micro benchmark experiments model name (#128789)
Per comments on https://github.com/pytorch/test-infra/pull/5344, we already have a dtype column with the same information

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128789
Approved by: https://github.com/yanboliang
2024-06-18 01:26:45 +00:00
a489792bb2 [GPT-benchmark] Fix memory bandwidth for MoE (#128783)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128783
Approved by: https://github.com/Chillee
ghstack dependencies: #128768
2024-06-17 21:04:57 +00:00
8c06eae17e [GPT-benchmark] Add metric: compilation time for GPT models (#128768)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128768
Approved by: https://github.com/Chillee
2024-06-17 21:04:57 +00:00
f37121bb74 Add model name, quantization and device to gpt_fast micro benchmark output (#128091)
A small enhancement to https://hud.pytorch.org/benchmark/llms with these columns in the output.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128091
Approved by: https://github.com/yanboliang
2024-06-15 01:39:48 +00:00
1fb4effe7a [GPT-fast benchmark] Add MLP, gather + gemv, gemv micro benchmark (#128002)
Output example:
```
| name                         | metric                    | target  | actual  |
|------------------------------|---------------------------|---------|---------|
| layer_norm_bfloat16          | memory_bandwidth(GB/s)    | 1017    | 1000.01 |
| mlp_layer_norm_gelu_bfloat16 | flops_utilization         | 0.71    | 0.71    |
| gemv_int8                    | memory_bandwidth(GB/s)    | 990     | 984.06  |
| gemv_bfloat16                | memory_bandwidth(GB/s)    | 1137    | 1137.92 |
| gather_gemv_int8             | memory_bandwidth(GB/s)    | 1113    | 1111.09 |
| gather_gemv_bfloat16         | memory_bandwidth(GB/s)    | 1249    | 1248.15 |

```
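
For reference, a memory-bandwidth number like the gemv rows above can be derived as bytes moved per second. A hedged sketch (not the benchmark's actual code):
```
import torch


def gemv_bandwidth_gb_s(n: int = 8192, dtype=torch.bfloat16, iters: int = 100) -> float:
    A = torch.randn(n, n, dtype=dtype, device="cuda")
    x = torch.randn(n, dtype=dtype, device="cuda")
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        A.mv(x)
    end.record()
    torch.cuda.synchronize()
    seconds_per_iter = start.elapsed_time(end) / 1000 / iters  # elapsed_time is in ms
    bytes_moved = A.numel() * A.element_size()  # the matrix read dominates traffic
    return bytes_moved / seconds_per_iter / 1e9


print(f"{gemv_bandwidth_gb_s():.2f} GB/s")
```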

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128002
Approved by: https://github.com/Chillee
2024-06-14 17:03:22 +00:00
0be06b08fc [GPT-fast benchmark] Merge GPT-fast and micro benchmark output as one CSV file (#127586)
Consolidate the GPT-fast model benchmarks with the micro-benchmarks, and save the output as one CSV file with the same format as https://github.com/pytorch/pytorch/pull/126754#issue-2307296847.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127586
Approved by: https://github.com/Chillee
2024-05-31 18:50:49 +00:00
26f4f10ac8 [5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo makes `usort` do more and generates the changes in this PR. Except for `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
2024-05-27 14:49:57 +00:00
55c0ab2887 Revert "[5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)"
This reverts commit 7763c83af67eebfdd5185dbe6ce15ece2b992a0f.

Reverted https://github.com/pytorch/pytorch/pull/127126 on behalf of https://github.com/XuehaiPan due to Broken CI ([comment](https://github.com/pytorch/pytorch/pull/127126#issuecomment-2133044286))
2024-05-27 09:22:08 +00:00
7763c83af6 [5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo makes `usort` do more and generates the changes in this PR. Except for `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
ghstack dependencies: #127122, #127123, #127124, #127125
2024-05-27 04:22:18 +00:00
66c23cb021 Add micro-benchmark framework and multi_layer_norm as an example (#126754)
```micro_benchmark.py``` output CSV example (all numbers are fake, just for demo):
```
name,metric,target,actual
multi_layer_norm,inference_time(s),20,19.87
multi_layer_norm,memory_bandwidth(GB/s),108,108.04
llama2-int8,token_per_sec,155,156
llama2-int8,memory_bandwidth(GB/s),92,92.7
```
Expected dashboard looks like:
```
| name             | metric                 | target | actual | change |
|------------------|------------------------|--------|--------|--------|
| multi_layer_norm | inference_time(s)      | 20     | 19.87  | 99%    |
|                  | memory_bandwidth(GB/s) | 108    | 108.04 | 101%   |
| llama2-int8      | token_per_sec          | 155    | 156    | 100%   |
|                  | memory_bandwidth(GB/s) | 92     | 92.7   | 101%   |

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126754
Approved by: https://github.com/Chillee
2024-05-22 01:27:37 +00:00
a174c536f8 GPT-fast benchmark: adding memory bandwidth and use A100-40GB as target (#125881)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125881
Approved by: https://github.com/Chillee
2024-05-11 10:46:54 +00:00
9dee3ef919 Ingest gpt-fast benchmark results from S3 to Rockset (#125891)
A follow-up of https://github.com/pytorch/pytorch/pull/125450, this extends the `tools/stats/upload_dynamo_perf_stats.py` script to upload arbitrary benchmark results in CSV format.

* Upload gpt-fast benchmarks to a new Rockset collection `benchmarks/oss_ci_benchmark`.  The file is in the following format:
```
$ cat test/test-reports/gpt_fast_benchmark.csv
name,mode,target,actual,percentage
Llama-2-7b-chat-hf,bfloat16,104,104.754128,100.73%
```
* The CSV output needs to be kept in the `test/test-reports` directory.
* Re-use the existing `.github/workflows/upload-test-stats.yml` workflow

### Testing

Run the commands manually

```
(py3.11) huydo@huydo-mbp pytorch % python3 -m tools.stats.upload_artifacts --workflow-run-id 9026179545 --workflow-run-attempt 1 --repo "pytorch/pytorch"
Using temporary directory: /var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmp6eug3cdz
Downloading test-jsons-runattempt1-test-inductor-micro-benchmark-1-1-linux.gcp.a100_24803987212.zip
Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmp6eug3cdz/test-jsons-runattempt1-test-inductor-micro-benchmark-1-1-linux.gcp.a100_24803987212.zip to s3://gha-artifacts/pytorch/pytorch/9026179545/1/artifact/test-jsons-test-inductor-micro-benchmark-1-1-linux.gcp.a100_24803987212.zip
Downloading test-reports-runattempt1-test-inductor-micro-benchmark-1-1-linux.gcp.a100_24803987212.zip
Upload /private/var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmp6eug3cdz/test-reports-runattempt1-test-inductor-micro-benchmark-1-1-linux.gcp.a100_24803987212.zip to s3://gha-artifacts/pytorch/pytorch/9026179545/1/artifact/test-reports-test-inductor-micro-benchmark-1-1-linux.gcp.a100_24803987212.zip

(py3.11) huydo@huydo-mbp pytorch % python3 -m tools.stats.upload_dynamo_perf_stats --workflow-run-id 9026179545 --workflow-run-attempt 1 --repo "pytorch/pytorch" --head-branch "ciflow/inductor-micro-benchmark/125891" --rockset-collection oss_ci_benchmark --rockset-workspace benchmarks --match-filename "^gpt_fast_benchmark"
Using temporary directory: /var/folders/x4/2kd9r0fn5b9bf_sbcw16fxsc0000gn/T/tmp8xr4sdxk
Downloading test-reports-test-inductor-micro-benchmark-1-1-linux.gcp.a100_24803987212.zip
Extracting test-reports-test-inductor-micro-benchmark-1-1-linux.gcp.a100_24803987212.zip to unzipped-test-reports-test-inductor-micro-benchmark-1-1-linux.gcp.a100_24803987212
Processing gpt_fast_benchmark from test-reports-test-inductor-micro-benchmark-1-1-linux.gcp.a100_24803987212.zip
Writing 3 documents to Rockset
Done!
```

Also run a sanity check on ingesting inductor benchmark results:

```
(py3.11) huydo@huydo-mbp pytorch % python -m tools.stats.upload_dynamo_perf_stats --workflow-run-id 8997654356 --workflow-run-attempt 1 --repo pytorch/pytorch --head-branch main --rockset-collection torch_dynamo_perf_stats --rockset-workspace inductor --match-filename "^inductor_"
...
Writing 4904 documents to Rockset
Done!
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125891
Approved by: https://github.com/yanboliang
2024-05-11 04:16:36 +00:00
f87fbfdb01 GPT-fast benchmark: remove Embedding layer from model size (#125901)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125901
Approved by: https://github.com/Chillee
2024-05-10 08:18:13 +00:00
8c74162074 Reduce the number of layers for mixtral moe model to adapt CI memory limitation (#125608)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125608
Approved by: https://github.com/Chillee, https://github.com/huydhn
2024-05-06 21:52:25 +00:00
1d6c5972c1 [BE]: Optimize min/max/sum comprehensions C419 (#123960)
Automatic fixes that replace certain list comprehensions with generator expressions where appropriate, so that they are consumed immediately. This is preview functionality in ruff for rule C419, and it was applied automatically.
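
For illustration, the kind of rewrite C419 applies (a hypothetical snippet, not from this PR):
```
data = [3, 1, 4, 1, 5]

# Before: builds a temporary list just to reduce it
total = sum([x * x for x in data])

# After: the generator expression is consumed directly
total = sum(x * x for x in data)
```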

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123960
Approved by: https://github.com/malfet
2024-04-12 23:54:15 +00:00
ed37fbdf60 made gpt_fast benchmark run faster (#122872)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122872
Approved by: https://github.com/msaroufim, https://github.com/yifuwang
ghstack dependencies: #122848
2024-03-29 03:49:19 +00:00
43e243180b Add gpt-fast as a static benchmark (#121886)
Run:
```
python benchmarks/gpt_fast/benchmark.py
```
It generated a CSV file ```gpt_fast_benchmark.csv``` with content like:
```
name,mode,target,actual,percentage
Llama-2-7b-chat-hf,bfloat16,104,103.458618,99.48%
Llama-2-7b-chat-hf,int8,155,158.964615,102.56%
Mixtral-8x7B-v0.1,int8,97,99.760132,102.85%
```
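
The `percentage` column appears to be `actual` relative to `target`; a small sketch of that relationship (checked against the rows above):
```
def pct(target: float, actual: float) -> str:
    return f"{actual / target * 100:.2f}%"


print(pct(104, 103.458618))  # 99.48%
print(pct(155, 158.964615))  # 102.56%
```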

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121886
Approved by: https://github.com/Chillee
2024-03-14 21:46:59 +00:00