2 Commits

8c74162074 Reduce the number of layers in the Mixtral MoE model to fit within the CI memory limit (#125608)
Fixes #ISSUE_NUMBER
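
A minimal sketch of the kind of change this implies, assuming a gpt-fast-style model config dataclass; the field names and values below are illustrative only, not the actual benchmark code:
```python
from dataclasses import dataclass

@dataclass
class ModelArgs:
    # Hypothetical config fields for illustration; the real benchmark
    # defines its own Mixtral MoE configuration.
    n_layer: int = 32       # full-depth model
    n_head: int = 32
    dim: int = 4096
    num_experts: int = 8

# Shrink the layer count so the test model fits within the CI runner's memory.
ci_config = ModelArgs(n_layer=4)
```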

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125608
Approved by: https://github.com/Chillee, https://github.com/huydhn
2024-05-06 21:52:25 +00:00
43e243180b Add gpt-fast as a static benchmark (#121886)
Run:
```
python benchmarks/gpt_fast/benchmark.py
```
It generates a CSV file ```gpt_fast_benchmark.csv``` with content like:
```
name,mode,target,actual,percentage
Llama-2-7b-chat-hf,bfloat16,104,103.458618,99.48%
Llama-2-7b-chat-hf,int8,155,158.964615,102.56%
Mixtral-8x7B-v0.1,int8,97,99.760132,102.85%
```
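
The sample rows are consistent with the ```percentage``` column being ```actual / target``` (e.g. 103.458618 / 104 ≈ 99.48%). As a hedged illustration, a small script like the one below could read the generated CSV and recompute that ratio; the file name and column names come from the sample above, everything else is an assumption (in particular, the units of ```target``` and ```actual``` are not stated in the commit):
```python
import csv

# Read the CSV emitted by benchmarks/gpt_fast/benchmark.py and recompute
# the reported percentage as actual / target for each row.
with open("gpt_fast_benchmark.csv", newline="") as f:
    for row in csv.DictReader(f):
        target = float(row["target"])
        actual = float(row["actual"])
        ratio = actual / target * 100
        print(f'{row["name"]:<24} {row["mode"]:<9} '
              f'{ratio:6.2f}% (reported {row["percentage"]})')
```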

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121886
Approved by: https://github.com/Chillee
2024-03-14 21:46:59 +00:00