96 Commits

Author SHA1 Message Date
79caae1c04 Update email address (#7624)
Update contact address

Signed-off-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
2025-10-07 17:15:49 +00:00
2b68bbc594 Blog of zenflow binding study (#7614)
This PR adds a blog/lab studying ZenFlow and ZeRO-Offload performance
with DeepSpeed CPU core binding.

---------

Signed-off-by: Guokai Ma <guokai.ma@gmail.com>
Signed-off-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
Signed-off-by: Junjie Mao <junjie.mao@linux.alibaba.com>
Co-authored-by: Xinyu Lian <lian7@illinois.edu>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: zhengchenyu <zhengchenyu16@163.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
Co-authored-by: Junjie Mao <junjie.mao@linux.alibaba.com>
2025-10-06 11:38:44 -04:00
65322e103c Super offload blog Chinese version (#7620)
This is the Chinese version of the SuperOffload blog.

---------

Signed-off-by: Guokai Ma <guokai.ma@gmail.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
2025-10-04 12:58:51 +00:00
330f738cd7 Minor fix in the SuperOffload blog (#7612)
Polish SuperOffload blog post; minor grammar and style fixes

---------

Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
2025-10-01 11:02:31 +00:00
462d28c5e6 Add blog for SuperOffload (#7594)
This PR adds a blog post for SuperOffload. More specifically, the blog
covers the design and motivation behind SuperOffload, comparisons with
previous approaches, key experiences and insights, and guidance on
enabling and using SuperOffload.

See also:
[PR#7559](https://github.com/deepspeedai/DeepSpeed/pull/7559) -
SuperOffload implementation.
[PR#990](https://github.com/deepspeedai/DeepSpeedExamples/pull/990) -
Examples.
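As a rough, non-authoritative illustration of the "enabling and using SuperOffload" guidance the blog covers, a config sketch follows. The `super_offload` key is a hypothetical name used only for illustration; the blog and tutorial contain the actual option names.

```python
# Hypothetical DeepSpeed config sketch for SuperOffload-style training.
# "super_offload" is an assumed key name, NOT verified API -- see the blog
# for the real options. ZeRO-3 with CPU optimizer offload is the usual base.
ds_config = {
    "train_batch_size": 16,
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "super_offload": True,  # hypothetical flag enabling SuperOffload
    },
}
```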

---------

Signed-off-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
2025-09-30 13:59:34 -04:00
cda3f9628c Add blog for ZenFlow (#7463)
This PR adds a blog post and images for ZenFlow, introducing its design,
benefits, and usage. The blog explains how ZenFlow improves GPU
utilization by overlapping computation and communication during
offloaded training.

See also:
#7391 – core ZenFlow implementation.
[#982](https://github.com/deepspeedai/DeepSpeedExamples/pull/982) – benchmarking and fine-tuning example.
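As a hedged sketch of what "overlapping computation and communication during offloaded training" looks like from the user side, consider a config fragment like the one below. The key names under `zenflow` are illustrative assumptions, not verified API; consult the blog for the real settings.

```python
# Hypothetical DeepSpeed config enabling ZenFlow on top of ZeRO-2 CPU
# offload. The "zenflow" sub-keys are illustrative guesses, not verified API.
ds_config = {
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},
        "zenflow": {                # hypothetical section
            "topk_ratio": 0.1,      # update the most important gradients on GPU
            "update_interval": 4,   # defer the rest to periodic CPU-side updates
        },
    },
}
```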

---------

Signed-off-by: Tingfeng Lan <erc8gx@virginia.edu>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Hongwei Chen <33092912+hwchen2017@users.noreply.github.com>
2025-08-10 08:50:34 -04:00
9ac9441400 Fix 404s (#7363)
Signed-off-by: Olatunji Ruwase <tjruwase@gmail.com>
2025-06-16 18:54:36 -04:00
bb293aea5d Update folder name (#7343)
Sync folder name with release date

---------

Signed-off-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
2025-06-09 09:15:18 -07:00
24a1d8f936 DeepNVMe update (#7215)
- FastPersist
- ZeRO-Inference+SGLang

---------

Signed-off-by: Olatunji Ruwase <olruwase@microsoft.com>
Signed-off-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: jerryyangli <jerryyangli@gmail.com>
Co-authored-by: Yang Li <yangli2@microsoft.com>
Co-authored-by: Guanhua Wang <alexwgh333@gmail.com>
Co-authored-by: Connor Holmes <connorholmes@microsoft.com>
Co-authored-by: Bing Xie <67908712+xiexbing@users.noreply.github.com>
Co-authored-by: cassieesvelt <73311224+cassieesvelt@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: swli <47371259+lucasleesw@users.noreply.github.com>
Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Molly Smith <112220543+molly-smith@users.noreply.github.com>
Co-authored-by: Ubuntu <jomayeri@microsoft.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Zhipeng Wang <zhipeng.rainbowserie@gmail.com>
2025-06-06 18:49:41 -04:00
fff77bd293 Update README.md (#7246)
Make the sentence sound more human and less robotic.
2025-04-25 15:15:16 +00:00
962a8f0ad7 Recommend using latest (#7233)
Add a sentence to DeepCompile blog to recommend using the latest
version.

Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>
2025-04-18 16:35:49 +00:00
8f93f8b9b0 Fix release links (#7219)
Fix DS release links

---------

Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>
Signed-off-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Masahiro Tanaka <mtanaka@microsoft.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
Co-authored-by: zafarsadiq <zafarsadiq120@gmail.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2025-04-16 08:41:55 -07:00
227a60c0c4 DeepCompile for enhanced compiler integration (#7154)
This PR introduces *DeepCompile*, a new feature that efficiently
integrates compiler optimizations with other DeepSpeed features.
DeepCompile utilizes torch's dynamo to capture the computation graph and
modifies it to incorporate DeepSpeed’s optimizations seamlessly.

Currently, DeepCompile supports ZeRO-1 and ZeRO-3, with enhancements
such as proactive prefetching and selective unsharding to improve
performance.
(More details will be added later.)
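A minimal usage sketch, assuming the config flag and `engine.compile()` entry point suggested by the DeepCompile description; the flag name is an assumption and should be verified against the tutorial.

```python
# Sketch: enabling DeepCompile (flag name assumed, not verified here).
# DeepCompile captures the computation graph via torch's Dynamo inside the
# DeepSpeed engine and rewrites it with DeepSpeed's optimization passes.
ds_config = {
    "zero_optimization": {"stage": 3},
    "compile": {"deepcompile": True},  # assumed flag
}

# With a real model this would be roughly:
#   engine, _, _, _ = deepspeed.initialize(model=model, config=ds_config)
#   engine.compile()  # Dynamo capture + DeepSpeed graph passes
```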

---------

Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>
Signed-off-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: zafarsadiq <zafarsadiq120@gmail.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2025-04-16 04:33:53 +00:00
29fa95a819 update dependencies version info (#7206)
The release versions are now available. Update from the master branch to
use the minimum required versions instead.
Also link the example:
https://github.com/deepspeedai/DeepSpeedExamples/pull/964

---------

Signed-off-by: inkcherry <mingzhi.liu@intel.com>
2025-04-08 15:22:58 +00:00
ac295aa06c Fix typos in GDS blog (#7177)
Signed-off-by: Logan Adams <loadams@microsoft.com>
2025-03-26 23:32:17 +00:00
1ca83a6bb9 hf tp+zero training doc. (#7151)
@tjruwase Don't merge yet, I will leave a comment when it is ready for
merge. Thank you.

---------

Signed-off-by: Olatunji Ruwase <olruwase@microsoft.com>
Signed-off-by: inkcherry <mingzhi.liu@intel.com>
Signed-off-by: Logan Adams <loadams@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <loadams@microsoft.com>
2025-03-20 23:23:43 +00:00
c2c8199394 Update references to new X/Twitter handle (#7110)
As a part of joining the Linux Foundation AI&Data it makes sense to
rename the X/Twitter accounts associated with DeepSpeed.

---------

Signed-off-by: Logan Adams <loadams@microsoft.com>
2025-03-04 23:22:38 +00:00
e637677766 Add chinese blog for deepspeed windows, and fix format (#7035)
Fix #7029 
- Add Chinese blog for deepspeed windows
- Fix format in README.md

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2025-02-13 17:10:30 -08:00
83f5deed41 add gds chinese blog (#7034)
cc @tjruwase @jomayeri

---------

Co-authored-by: root <root@ftqtmec25000000.taxzvufipdhelhupulxcbvr15f.ux.internal.cloudapp.net>
2025-02-13 19:38:36 +00:00
fd40516923 Update GH org references (#6998)
Signed-off-by: Olatunji Ruwase <olruwase@microsoft.com>
Signed-off-by: Logan Adams <loadams@microsoft.com>
Signed-off-by: Fabien Dupont <fdupont@redhat.com>
Co-authored-by: Fabien Dupont <fabiendupont@fabiendupont.fr>
2025-02-05 00:56:50 +00:00
53fb5795a1 Fix windows blog examples (#6934) 2025-01-08 12:54:19 -08:00
0e92f9b41f Update README.md (#6824)
Fix broken tutorial link
2024-12-05 11:31:52 -08:00
0b0fef3d41 Ulyssess offload blog (#6814)
Ulysses-Offload (FPDT) blog; see the corresponding tutorial page in
[PR#6813](https://github.com/microsoft/DeepSpeed/pull/6813).

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Logan Adams <loadams@microsoft.com>
2024-12-05 16:39:44 +00:00
ec6cc49034 Domino Blog (#6776)
This PR adds the Domino blog to our public site.

cc @tjruwase

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-11-25 11:59:04 -08:00
5e16f255a6 docs: fix HF links (#6780)
The current link
https://huggingface.co/docs/transformers/main_classes/deepspeed is very
unhelpful.

It turns out in the past it had some guides:
https://huggingface.co/docs/transformers/v4.27.1/main_classes/deepspeed#shared-configuration

Later it's refreshed and moved to
https://huggingface.co/docs/transformers/deepspeed
2024-11-25 10:10:08 -08:00
5df12a4a85 DeepNVMe tutorial (#6449)
Co-authored-by: Logan Adams <loadams@microsoft.com>
Co-authored-by: jomayeri <deepspeed@H100-VM2.shlnn55tgwve1eacvp21ie45dg.jx.internal.cloudapp.net>
2024-09-04 15:31:31 +00:00
649b078571 Add Japanese translation of Windows support blog (#6394)
This PR adds the Japanese translation of the release blog of Windows
support.
2024-08-21 18:24:27 -07:00
01fe65b300 DeepSpeed on Window blog (#6364)
DeepSpeed on Windows blog

---------

Co-authored-by: Logan Adams <loadams@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-08-19 11:16:22 -07:00
ade7149db4 Add Japanese translation of DeepNVMe blog (#5845)
This PR adds the Japanese translation of the DeepNVMe blog.
2024-08-06 12:21:31 -07:00
2ef8223210 Fix NV references (#5821)
Fix NVIDIA references and typos.

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-08-02 10:18:01 -07:00
029bb5274a Link GDS blog to site (#5820) 2024-08-01 13:35:26 -07:00
249c1db2fb Add fp16 support of Qwen1.5MoE models (A2.7B) to DeepSpeed-FastGen (#5403)
This PR adds support for Qwen1.5MoE-A2.7B models.

Resolves https://github.com/microsoft/DeepSpeed-MII/issues/457.

### Test Code

for mii pipeline:
```python
import mii

pipe = mii.pipeline("/data/zonepg/models/Qwen/Qwen1.5-MoE-A2.7B")
responses = pipe("DeepSpeed is", max_new_tokens=128, do_sample=False)
if pipe.is_rank_0:
    print(responses[0])
```
for huggingface:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig
import torch
tokenizer = AutoTokenizer.from_pretrained("/data/zonepg/models/Qwen/Qwen1.5-MoE-A2.7B")
model = AutoModelForCausalLM.from_pretrained("/data/zonepg/models/Qwen/Qwen1.5-MoE-A2.7B", device_map="auto", torch_dtype=torch.float16, trust_remote_code=True).eval()
print(model)
inputs = tokenizer('DeepSpeed is', return_tensors='pt')
inputs = inputs.to(model.device)
pred = model.generate(**inputs, max_new_tokens=128, do_sample=False, repetition_penalty=1.0)
test = tokenizer.decode(pred.cpu()[0], skip_special_tokens=False)
print(test)
```

### Qwen1.5-MoE-A2.7B
Huggingface output with prompt "DeepSpeed is":
```
 a deep learning framework that is designed to accelerate the training of large-scale neural networks. It is built on top of PyTorch and provides a set of tools and techniques for optimizing the performance of deep learning models.

DeepSpeed supports a variety of hardware accelerators, including GPUs, TPUs, and FPGAs, and can be used to train models on distributed systems, such as clusters of GPUs or TPUs.

One of the key features of DeepSpeed is its ability to automatically parallelize the training of deep learning models across multiple GPUs or TPUs. This can significantly reduce the time required to train large models, as it allows the
```
DeepSpeed-FastGen output with prompt "DeepSpeed is":
```
 a deep learning framework that is designed to accelerate the training of large-scale neural networks. It is built on top of PyTorch and provides a set of tools and techniques for optimizing the performance of deep learning models.

DeepSpeed supports a variety of hardware accelerators, including GPUs, TPUs, and FPGAs, and can be used to train models on distributed systems, such as clusters of GPUs or TPUs.

One of the key features of DeepSpeed is its ability to automatically parallelize the training of deep learning models across multiple GPUs or TPUs. This can significantly reduce the time required to train large models, as it allows the
```

DeepSpeed-FastGen output with prompt "DeepSpeed is" with 8-way sharding:
```
 a deep learning framework that is designed to accelerate the training of large-scale neural networks. It is built on top of PyTorch and provides a set of tools and techniques for optimizing the performance of deep learning models.

DeepSpeed supports a variety of hardware accelerators, including GPUs, TPUs, and FPGAs, and can be used to train models on distributed systems, such as clusters of GPUs or TPUs.

One of the key features of DeepSpeed is its ability to automatically parallelize the training of deep learning models across multiple GPUs or TPUs. This can significantly reduce the time required to train large models, as it allows the
```

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Heyang Qin <heyangqin@microsoft.com>
Co-authored-by: Abhishek Kulkarni <11399+adk9@users.noreply.github.com>
2024-08-01 10:27:24 -07:00
324ee65cb0 GDS AIO Blog (#5817)
README and media for the GDS blog.

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-08-01 09:15:10 -04:00
6a163e03f4 Add support for Microsoft Phi-3 model to DeepSpeed-FastGen (#5559)
This PR adds support for Microsoft Phi-3 model to FastGen.

DeepSpeed-FastGen output with prompt "DeepSpeed is":
```
an AI-powered platform designed to optimize and scale distributed deep learning models across clusters.**

DeepSpeed is a cutting-edge AI-driven toolkit that empowers users to enhance and scale deep learning models across distributed computing environments. By harnessing the power of artificial intelligence, DeepSpeed provides innovative solutions for optimizing resource allocation, managing data synchronization, and improving model parallelism. This enables efficient scaling and execution of complex deep learning tasks, unlocking the full potential of distributed computing systems.

### Key Features of DeepSpeed:

1.
```

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-07-20 03:18:36 +00:00
78c6c449c9 Update the list of supported models in the Chinese README of fastgen (#5773)
Adds the three models supported in DeepSpeed-FastGen since the last
Chinese README update.

Co-authored-by: weifangyuan <i.weifangyuan@yuewen.com>
2024-07-16 13:32:16 +00:00
dd7a5be53d UCP Chinese Blog (#5713)
Co-authored-by: Sam Ade Jacobs <samjacobs@microsoft.com>
Co-authored-by: Masahiro Tanaka <mtanaka@microsoft.com>
2024-07-01 15:57:52 -07:00
121efdbd5c DeepSpeed Universal Checkpointing: Blog and Tutorial (#5711)
Train {GPT, LLaMA, Phi}-like models (or any model) at ultra-low cost with
DeepSpeed Universal Checkpointing (UCP). UCP abstracts away the
complexities of saving and loading model states. See arxiv paper, blog
and tutorial in this PR for details.
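A rough sketch of the UCP workflow as described, with placeholder paths; the conversion script invocation and the config key are taken from the tutorial's description and should be verified there.

```python
# Universal Checkpointing workflow sketch. Paths are placeholders.
# Step 1 (offline): convert a ZeRO checkpoint to the universal format, e.g.
#   python -m deepspeed.checkpoint.ds_to_universal \
#       --input_folder  ckpt/global_step100 \
#       --output_folder ckpt/global_step100_universal
# Step 2: resume on a different parallelism layout with universal-checkpoint
# loading enabled in the DeepSpeed config (key name per the tutorial):
ds_config = {
    "checkpoint": {"load_universal": True},
}
```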

---------

Co-authored-by: Masahiro Tanaka <mtanaka@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <loadams@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-07-01 14:37:24 -07:00
d86a68c3d4 [fix] fix typo s/simultanenously /simultaneously (#5359)
fix typo s/simultanenously /simultaneously
         s/Colosal /Colossal
detail info 
        modified:   blogs/deepspeed-fp6/03-05-2024/README.md
        modified:   blogs/deepspeed-ulysses/README.md
2024-04-03 15:54:45 +00:00
d1536e4494 Fp6 blog chinese (#5239) 2024-03-07 17:33:50 -08:00
0a979f8bc1 FP6 blog (#5235)
Co-authored-by: Zhen Zheng <zhengzhen@microsoft.com>
Co-authored-by: Xiaoxia Wu <xiaoxiawu@microsoft.com>
Co-authored-by: Haojun Xia <xhjustc@mail.ustc.edu.cn>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Leon Song <leonsong@microsoft.com>

---------

Co-authored-by: xiaoxiawu-microsoft <xiaoxiawu@microsoft.com>
Co-authored-by: Arash Bakhtiari <arash@bakhtiari.org>
Co-authored-by: Xiaoxia (Shirley) Wu <94406484+xiaoxiawu-microsoft@users.noreply.github.com>
Co-authored-by: ZHENG, Zhen <zhengzhen.z@qq.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
2024-03-07 17:00:15 -08:00
8d0150d917 Fix typos in blogs/ (#5172)
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-02-22 18:40:17 +00:00
4738a5e61f Fix placeholder value in FastGen Blog (#5000) 2024-01-23 09:59:36 -08:00
b81bed69a8 fix some typo under blogs/ (#4988)
Modified files:
	modified:   blogs/comm-opt/README.md
	modified:   blogs/deepspeed-fastgen/README.md
	modified:   blogs/deepspeed-offloadpp/README.md
	modified:   blogs/deepspeed-triton/README.md
	modified:   blogs/deepspeed-ulysses/README.md
	modified:   blogs/deepspeed-visualchat/10-03-2023/README-Japanese.md
	modified:   blogs/deepspeed-visualchat/10-03-2023/README.md
2024-01-23 02:09:25 +00:00
7fb5bade3e Update FastGen blog title (#4983) 2024-01-19 18:03:43 -06:00
1ac843a372 Update README.md 2024-01-19 15:00:46 -08:00
79564203c6 FastGen Jan 2024 blog (#4980)
Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
Co-authored-by: Arash Bakhtiari <arash@bakhtiari.org>
Co-authored-by: Heyang Qin <heyangqin@microsoft.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
2024-01-19 14:58:27 -08:00
ed10cc7382 Add support of Qwen models (7b, 14b, 72b) to DeepSpeed-FastGen (#4913)
This PR adds support for Qwen models 7b, 14b and 72b.

### Test Code

for mii pipeline:
```python
from mii import pipeline
pipe = pipeline("Qwen/Qwen-7B-Chat")
pipe.tokenizer.tokenizer.eos_token_id = 151643
output = pipe(["DeepSpeed is"], max_new_tokens=128, do_sample=False)
print(output)
```
for huggingface:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True, fp16=True).eval()
inputs = tokenizer('DeepSpeed is', return_tensors='pt')
inputs = inputs.to(model.device)
pred = model.generate(**inputs, max_new_tokens=128, do_sample=False, repetition_penalty=1.0)
test = tokenizer.decode(pred.cpu()[0], skip_special_tokens=False)
print(test)
```

### Qwen 7B
Huggingface output with prompt "DeepSpeed is":
```
 a high-performance, low-latency database management system designed for real-time analytics and machine learning applications. It is built on top of Apache Arrow, a columnar in-memory data format, and is optimized for processing large volumes of data in parallel.\n\nDeepSpeed offers several key features that make it well-suited for real-time analytics and machine learning applications:\n\n1. High Performance: DeepSpeed is designed to deliver high performance by leveraging parallel processing and optimized data structures. It can process large volumes of data in real-time, making it ideal for applications that require real-time analytics.\n\n2. Low Latency: DeepSpeed is designed to minimize latency by
```
DeepSpeed-FastGen output with prompt "DeepSpeed is":
```
 a high-performance, low-latency database management system designed for real-time analytics and machine learning applications. It is built on top of Apache Arrow, a columnar in-memory data format, and is optimized for processing large volumes of data in parallel.\n\nDeepSpeed offers several key features that make it well-suited for real-time analytics and machine learning applications:\n\n1. High Performance: DeepSpeed is designed to deliver high performance by leveraging parallel processing and optimized data structures. It can process large volumes of data in real-time, making it ideal for applications that require real-time analytics.\n\n2. Low Latency: DeepSpeed is designed to minimize latency by
```

### Qwen 72B
Huggingface output with prompt "DeepSpeed is":
```
是一个开源的深度学习优化库,它提供了多种优化技术,包括模型并行、数据并行、混合并行、ZeRO内存优化等。它可以帮助用户在大规模GPU集群上训练深度学习模型,提高训练速度,减少内存使用。\n在Deepspeed中,模型并行是一种将模型的不同部分分配到不同的GPU上的技术。这样可以处理模型太大,无法放在一个GPU上的问题。数据并行是将数据集分成多个部分,每个部分在不同的GPU上进行训练。混合并行则是结合了模型并行和数据并行,以更有效地利用GPU资源
```
DeepSpeed-FastGen output with prompt "DeepSpeed is" with 8-way sharding:
```
是一个开源的深度学习优化库,它提供了多种优化技术,包括模型并行、数据并行、混合并行、ZeRO内存优化等。它可以帮助用户在大规模GPU集群上训练深度学习模型,提高训练速度,减少内存使用。\n在Deepspeed中,模型并行是一种将模型的不同部分分配到不同的GPU上的技术。这样可以处理模型太大,无法放在一个GPU上的问题。数据并行是将数据集分成多个部分,每个部分在不同的GPU上进行训练。混合并行则是结合了模型并行和数据并行,以更有效地利用GPU资源
```

---------

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-01-10 23:05:27 +00:00
834272531a Add support of Microsoft Phi-2 model to DeepSpeed-FastGen (#4812)
This PR adds support for Microsoft Phi-2 model.

HF output with prompt "DeepSpeed is":
```
a company that helps make videos and movies look really good. They have a special way of making videos that makes them look like they were made in a movie theater. This is called "4K Ultra HD" and it makes the videos look very clear and detailed. DeepSpeed also has a special way of making videos that makes them look like they were made in a movie theater. This is called "4K Ultra HD" and it makes the videos look very clear and detailed. DeepSpeed also has a special way of making videos that makes them look like they were made in a movie theater. This is called "4K Ultra HD"
```

DeepSpeed-FastGen output with prompt "DeepSpeed is":
```
a company that helps make videos and movies look really good. They have a special way of making videos that makes them look like they were made in a movie theater. This is called "4K Ultra HD" and it makes the videos look very clear and detailed. DeepSpeed also has a special way of making videos that makes them look like they were made in a movie theater. This is called "4K Ultra HD" and it makes the videos look very clear and detailed. DeepSpeed also has a special way of making videos that makes them look like they were made in a movie theater. This is called "4K Ultra HD"
```

---------

Co-authored-by: Connor Holmes <connorholmes@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
2024-01-03 13:27:06 -08:00
a7900bcc3d Add support of Falcon models (7b, 40b, 180b) to DeepSpeed-FastGen (#4790) 2023-12-12 13:31:44 -08:00
6b8103b46e [docs] Intel inference blog (#4734) 2023-11-28 08:27:54 -08:00