96 Commits

Author SHA1 Message Date
79caae1c04 Update email address (#7624)
Update contact address

Signed-off-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
2025-10-07 17:15:49 +00:00
2b68bbc594 Blog of zenflow binding study (#7614)
This PR adds a blog/lab studying ZenFlow and ZeRO-Offload performance
with DeepSpeed CPU core binding.

---------

Signed-off-by: Guokai Ma <guokai.ma@gmail.com>
Signed-off-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
Signed-off-by: Junjie Mao <junjie.mao@linux.alibaba.com>
Co-authored-by: Xinyu Lian <lian7@illinois.edu>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: zhengchenyu <zhengchenyu16@163.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
Co-authored-by: Junjie Mao <junjie.mao@linux.alibaba.com>
2025-10-06 11:38:44 -04:00
65322e103c Super offload blog Chinese version (#7620)
This is the Chinese version of the SuperOffload blog.

---------

Signed-off-by: Guokai Ma <guokai.ma@gmail.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
2025-10-04 12:58:51 +00:00
330f738cd7 Minor fix in the SuperOffload blog (#7612)
Polish SuperOffload blog post; minor grammar and style fixes

---------

Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
2025-10-01 11:02:31 +00:00
462d28c5e6 Add blog for SuperOffload (#7594)
This PR adds a blog post for SuperOffload. More specifically, the blog
covers the design and motivation behind SuperOffload, comparisons with
previous approaches, key experiences and insights, and guidance on
enabling and using SuperOffload.

See also:
[PR#7559](https://github.com/deepspeedai/DeepSpeed/pull/7559) -
SuperOffload implementation.
[PR#990](https://github.com/deepspeedai/DeepSpeedExamples/pull/990) -
Examples.
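As a rough, non-authoritative illustration of the "enabling and using SuperOffload" guidance the blog covers, a config sketch follows. The `super_offload` key is a hypothetical name used only for illustration; the blog and tutorial contain the actual option names.

```python
# Hypothetical DeepSpeed config sketch for SuperOffload-style training.
# "super_offload" is an assumed key name, NOT verified API -- see the blog
# for the real options. ZeRO-3 with CPU optimizer offload is the usual base.
ds_config = {
    "train_batch_size": 16,
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "super_offload": True,  # hypothetical flag enabling SuperOffload
    },
}
```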

---------

Signed-off-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
2025-09-30 13:59:34 -04:00
cda3f9628c Add blog for ZenFlow (#7463)
This PR adds a blog post and images for ZenFlow, introducing its design,
benefits, and usage. The blog explains how ZenFlow improves GPU
utilization by overlapping computation and communication during
offloaded training.

See also:
#7391 – core ZenFlow implementation.
[#982](https://github.com/deepspeedai/DeepSpeedExamples/pull/982) – benchmarking and fine-tuning example.
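As a hedged sketch of what "overlapping computation and communication during offloaded training" looks like from the user side, consider a config fragment like the one below. The key names under `zenflow` are illustrative assumptions, not verified API; consult the blog for the real settings.

```python
# Hypothetical DeepSpeed config enabling ZenFlow on top of ZeRO-2 CPU
# offload. The "zenflow" sub-keys are illustrative guesses, not verified API.
ds_config = {
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},
        "zenflow": {                # hypothetical section
            "topk_ratio": 0.1,      # update the most important gradients on GPU
            "update_interval": 4,   # defer the rest to periodic CPU-side updates
        },
    },
}
```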

---------

Signed-off-by: Tingfeng Lan <erc8gx@virginia.edu>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Hongwei Chen <33092912+hwchen2017@users.noreply.github.com>
2025-08-10 08:50:34 -04:00
9ac9441400 Fix 404s (#7363)
Signed-off-by: Olatunji Ruwase <tjruwase@gmail.com>
2025-06-16 18:54:36 -04:00
bb293aea5d Update folder name (#7343)
Sync folder name with release date

---------

Signed-off-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
2025-06-09 09:15:18 -07:00
24a1d8f936 DeepNVMe update (#7215)
- FastPersist
- ZeRO-Inference+SGLang

---------

Signed-off-by: Olatunji Ruwase <olruwase@microsoft.com>
Signed-off-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: jerryyangli <jerryyangli@gmail.com>
Co-authored-by: Yang Li <yangli2@microsoft.com>
Co-authored-by: Guanhua Wang <alexwgh333@gmail.com>
Co-authored-by: Connor Holmes <connorholmes@microsoft.com>
Co-authored-by: Bing Xie <67908712+xiexbing@users.noreply.github.com>
Co-authored-by: cassieesvelt <73311224+cassieesvelt@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: swli <47371259+lucasleesw@users.noreply.github.com>
Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Molly Smith <112220543+molly-smith@users.noreply.github.com>
Co-authored-by: Ubuntu <jomayeri@microsoft.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Zhipeng Wang <zhipeng.rainbowserie@gmail.com>
2025-06-06 18:49:41 -04:00
fff77bd293 Update README.md (#7246)
Make the sentence sound more human and less robotic.
2025-04-25 15:15:16 +00:00
962a8f0ad7 Recommend using latest (#7233)
Add a sentence to DeepCompile blog to recommend using the latest
version.

Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>
2025-04-18 16:35:49 +00:00
8f93f8b9b0 Fix release links (#7219)
Fix DS release links

---------

Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>
Signed-off-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Masahiro Tanaka <mtanaka@microsoft.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
Co-authored-by: zafarsadiq <zafarsadiq120@gmail.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2025-04-16 08:41:55 -07:00
227a60c0c4 DeepCompile for enhanced compiler integration (#7154)
This PR introduces *DeepCompile*, a new feature that efficiently
integrates compiler optimizations with other DeepSpeed features.
DeepCompile utilizes torch's dynamo to capture the computation graph and
modifies it to incorporate DeepSpeed’s optimizations seamlessly.

Currently, DeepCompile supports ZeRO-1 and ZeRO-3, with enhancements
such as proactive prefetching and selective unsharding to improve
performance.
(More details will be added later.)
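A minimal usage sketch, assuming the config flag and `engine.compile()` entry point suggested by the DeepCompile description; the flag name is an assumption and should be verified against the tutorial.

```python
# Sketch: enabling DeepCompile (flag name assumed, not verified here).
# DeepCompile captures the computation graph via torch's Dynamo inside the
# DeepSpeed engine and rewrites it with DeepSpeed's optimization passes.
ds_config = {
    "zero_optimization": {"stage": 3},
    "compile": {"deepcompile": True},  # assumed flag
}

# With a real model this would be roughly:
#   engine, _, _, _ = deepspeed.initialize(model=model, config=ds_config)
#   engine.compile()  # Dynamo capture + DeepSpeed graph passes
```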

---------

Signed-off-by: Masahiro Tanaka <mtanaka@microsoft.com>
Signed-off-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: zafarsadiq <zafarsadiq120@gmail.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2025-04-16 04:33:53 +00:00
29fa95a819 update dependencies version info (#7206)
The release versions are now available. Update from the master branch to
use the minimum required versions instead.
Also link the example:
https://github.com/deepspeedai/DeepSpeedExamples/pull/964

---------

Signed-off-by: inkcherry <mingzhi.liu@intel.com>
2025-04-08 15:22:58 +00:00
ac295aa06c Fix typos in GDS blog (#7177)
Signed-off-by: Logan Adams <loadams@microsoft.com>
2025-03-26 23:32:17 +00:00
1ca83a6bb9 hf tp+zero training doc. (#7151)
@tjruwase Don't merge yet, I will leave a comment when it is ready for
merge. Thank you.

---------

Signed-off-by: Olatunji Ruwase <olruwase@microsoft.com>
Signed-off-by: inkcherry <mingzhi.liu@intel.com>
Signed-off-by: Logan Adams <loadams@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <loadams@microsoft.com>
2025-03-20 23:23:43 +00:00
c2c8199394 Update references to new X/Twitter handle (#7110)
As a part of joining the Linux Foundation AI&Data it makes sense to
rename the X/Twitter accounts associated with DeepSpeed.

---------

Signed-off-by: Logan Adams <loadams@microsoft.com>
2025-03-04 23:22:38 +00:00
e637677766 Add chinese blog for deepspeed windows, and fix format (#7035)
Fix #7029 
- Add Chinese blog for deepspeed windows
- Fix format in README.md

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2025-02-13 17:10:30 -08:00
83f5deed41 add gds chinese blog (#7034)
cc @tjruwase @jomayeri

---------

Co-authored-by: root <root@ftqtmec25000000.taxzvufipdhelhupulxcbvr15f.ux.internal.cloudapp.net>
2025-02-13 19:38:36 +00:00
fd40516923 Update GH org references (#6998)
Signed-off-by: Olatunji Ruwase <olruwase@microsoft.com>
Signed-off-by: Logan Adams <loadams@microsoft.com>
Signed-off-by: Fabien Dupont <fdupont@redhat.com>
Co-authored-by: Fabien Dupont <fabiendupont@fabiendupont.fr>
2025-02-05 00:56:50 +00:00
53fb5795a1 Fix windows blog examples (#6934) 2025-01-08 12:54:19 -08:00
0e92f9b41f Update README.md (#6824)
Fix broken tutorial link
2024-12-05 11:31:52 -08:00
0b0fef3d41 Ulyssess offload blog (#6814)
Ulysses-Offload (FPDT) blog; see the corresponding tutorial page in
[PR#6813](https://github.com/microsoft/DeepSpeed/pull/6813).

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Logan Adams <loadams@microsoft.com>
2024-12-05 16:39:44 +00:00
ec6cc49034 Domino Blog (#6776)
This PR adds the Domino blog to our public site.

cc @tjruwase

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-11-25 11:59:04 -08:00
5e16f255a6 docs: fix HF links (#6780)
The current link
https://huggingface.co/docs/transformers/main_classes/deepspeed is very
unhelpful.

It turns out in the past it had some guides:
https://huggingface.co/docs/transformers/v4.27.1/main_classes/deepspeed#shared-configuration

Later it's refreshed and moved to
https://huggingface.co/docs/transformers/deepspeed
2024-11-25 10:10:08 -08:00
5df12a4a85 DeepNVMe tutorial (#6449)
Co-authored-by: Logan Adams <loadams@microsoft.com>
Co-authored-by: jomayeri <deepspeed@H100-VM2.shlnn55tgwve1eacvp21ie45dg.jx.internal.cloudapp.net>
2024-09-04 15:31:31 +00:00
649b078571 Add Japanese translation of Windows support blog (#6394)
This PR adds the Japanese translation of the release blog of Windows
support.
2024-08-21 18:24:27 -07:00
01fe65b300 DeepSpeed on Window blog (#6364)
DeepSpeed on Windows blog

---------

Co-authored-by: Logan Adams <loadams@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-08-19 11:16:22 -07:00
ade7149db4 Add Japanese translation of DeepNVMe blog (#5845)
This PR adds the Japanese translation of the DeepNVMe blog.
2024-08-06 12:21:31 -07:00
2ef8223210 Fix NV references (#5821)
Fix NVIDIA references and typos.

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-08-02 10:18:01 -07:00
029bb5274a Link GDS blog to site (#5820) 2024-08-01 13:35:26 -07:00
249c1db2fb Add fp16 support of Qwen1.5MoE models (A2.7B) to DeepSpeed-FastGen (#5403)
This PR adds support for Qwen1.5MoE-A2.7B models.

Resolves https://github.com/microsoft/DeepSpeed-MII/issues/457.

### Test Code

for mii pipeline:
```python
import mii

pipe = mii.pipeline("/data/zonepg/models/Qwen/Qwen1.5-MoE-A2.7B")
responses = pipe("DeepSpeed is", max_new_tokens=128, do_sample=False)
if pipe.is_rank_0:
    print(responses[0])
```
for huggingface:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig
import torch
tokenizer = AutoTokenizer.from_pretrained("/data/zonepg/models/Qwen/Qwen1.5-MoE-A2.7B")
model = AutoModelForCausalLM.from_pretrained("/data/zonepg/models/Qwen/Qwen1.5-MoE-A2.7B", device_map="auto", torch_dtype=torch.float16, trust_remote_code=True).eval()
print(model)
inputs = tokenizer('DeepSpeed is', return_tensors='pt')
inputs = inputs.to(model.device)
pred = model.generate(**inputs, max_new_tokens=128, do_sample=False, repetition_penalty=1.0)
test = tokenizer.decode(pred.cpu()[0], skip_special_tokens=False)
print(test)
```

### Qwen1.5-MoE-A2.7B
Huggingface output with prompt "DeepSpeed is":
```
 a deep learning framework that is designed to accelerate the training of large-scale neural networks. It is built on top of PyTorch and provides a set of tools and techniques for optimizing the performance of deep learning models.

DeepSpeed supports a variety of hardware accelerators, including GPUs, TPUs, and FPGAs, and can be used to train models on distributed systems, such as clusters of GPUs or TPUs.

One of the key features of DeepSpeed is its ability to automatically parallelize the training of deep learning models across multiple GPUs or TPUs. This can significantly reduce the time required to train large models, as it allows the
```
DeepSpeed-FastGen output with prompt "DeepSpeed is":
```
 a deep learning framework that is designed to accelerate the training of large-scale neural networks. It is built on top of PyTorch and provides a set of tools and techniques for optimizing the performance of deep learning models.

DeepSpeed supports a variety of hardware accelerators, including GPUs, TPUs, and FPGAs, and can be used to train models on distributed systems, such as clusters of GPUs or TPUs.

One of the key features of DeepSpeed is its ability to automatically parallelize the training of deep learning models across multiple GPUs or TPUs. This can significantly reduce the time required to train large models, as it allows the
```

DeepSpeed-FastGen output with prompt "DeepSpeed is" with 8-way sharding:
```
 a deep learning framework that is designed to accelerate the training of large-scale neural networks. It is built on top of PyTorch and provides a set of tools and techniques for optimizing the performance of deep learning models.

DeepSpeed supports a variety of hardware accelerators, including GPUs, TPUs, and FPGAs, and can be used to train models on distributed systems, such as clusters of GPUs or TPUs.

One of the key features of DeepSpeed is its ability to automatically parallelize the training of deep learning models across multiple GPUs or TPUs. This can significantly reduce the time required to train large models, as it allows the
```

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Heyang Qin <heyangqin@microsoft.com>
Co-authored-by: Abhishek Kulkarni <11399+adk9@users.noreply.github.com>
2024-08-01 10:27:24 -07:00
324ee65cb0 GDS AIO Blog (#5817)
README and media for the GDS blog.

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-08-01 09:15:10 -04:00
6a163e03f4 Add support for Microsoft Phi-3 model to DeepSpeed-FastGen (#5559)
This PR adds support for Microsoft Phi-3 model to FastGen.

DeepSpeed-FastGen output with prompt "DeepSpeed is":
```
an AI-powered platform designed to optimize and scale distributed deep learning models across clusters.**

DeepSpeed is a cutting-edge AI-driven toolkit that empowers users to enhance and scale deep learning models across distributed computing environments. By harnessing the power of artificial intelligence, DeepSpeed provides innovative solutions for optimizing resource allocation, managing data synchronization, and improving model parallelism. This enables efficient scaling and execution of complex deep learning tasks, unlocking the full potential of distributed computing systems.

### Key Features of DeepSpeed:

1.
```

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-07-20 03:18:36 +00:00
78c6c449c9 Update the list of supported models in the Chinese README of fastgen (#5773)
Adds the three models supported in DeepSpeed-FastGen since the last
Chinese README update.

Co-authored-by: weifangyuan <i.weifangyuan@yuewen.com>
2024-07-16 13:32:16 +00:00
dd7a5be53d UCP Chinese Blog (#5713)
Co-authored-by: Sam Ade Jacobs <samjacobs@microsoft.com>
Co-authored-by: Masahiro Tanaka <mtanaka@microsoft.com>
2024-07-01 15:57:52 -07:00
121efdbd5c DeepSpeed Universal Checkpointing: Blog and Tutorial (#5711)
Train {GPT, LLaMA, Phi}-like models (or any model) at ultra-low cost with
DeepSpeed Universal Checkpointing (UCP). UCP abstracts away the
complexities of saving and loading model states. See arxiv paper, blog
and tutorial in this PR for details.
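A rough sketch of the UCP workflow as described, with placeholder paths; the conversion script invocation and the config key are taken from the tutorial's description and should be verified there.

```python
# Universal Checkpointing workflow sketch. Paths are placeholders.
# Step 1 (offline): convert a ZeRO checkpoint to the universal format, e.g.
#   python -m deepspeed.checkpoint.ds_to_universal \
#       --input_folder  ckpt/global_step100 \
#       --output_folder ckpt/global_step100_universal
# Step 2: resume on a different parallelism layout with universal-checkpoint
# loading enabled in the DeepSpeed config (key name per the tutorial):
ds_config = {
    "checkpoint": {"load_universal": True},
}
```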

---------

Co-authored-by: Masahiro Tanaka <mtanaka@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <loadams@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-07-01 14:37:24 -07:00
d86a68c3d4 [fix] fix typo s/simultanenously /simultaneously (#5359)
fix typo s/simultanenously /simultaneously
         s/Colosal /Colossal
detail info 
        modified:   blogs/deepspeed-fp6/03-05-2024/README.md
        modified:   blogs/deepspeed-ulysses/README.md
2024-04-03 15:54:45 +00:00
d1536e4494 Fp6 blog chinese (#5239) 2024-03-07 17:33:50 -08:00
0a979f8bc1 FP6 blog (#5235)
Co-authored-by: Zhen Zheng <zhengzhen@microsoft.com>
Co-authored-by: Xiaoxia Wu <xiaoxiawu@microsoft.com>
Co-authored-by: Haojun Xia <xhjustc@mail.ustc.edu.cn>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Leon Song <leonsong@microsoft.com>

---------

Co-authored-by: xiaoxiawu-microsoft <xiaoxiawu@microsoft.com>
Co-authored-by: Arash Bakhtiari <arash@bakhtiari.org>
Co-authored-by: Xiaoxia (Shirley) Wu <94406484+xiaoxiawu-microsoft@users.noreply.github.com>
Co-authored-by: ZHENG, Zhen <zhengzhen.z@qq.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
2024-03-07 17:00:15 -08:00
8d0150d917 Fix typos in blogs/ (#5172)
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-02-22 18:40:17 +00:00
4738a5e61f Fix placeholder value in FastGen Blog (#5000) 2024-01-23 09:59:36 -08:00
b81bed69a8 fix some typo under blogs/ (#4988)
Modified files:
	modified:   blogs/comm-opt/README.md
	modified:   blogs/deepspeed-fastgen/README.md
	modified:   blogs/deepspeed-offloadpp/README.md
	modified:   blogs/deepspeed-triton/README.md
	modified:   blogs/deepspeed-ulysses/README.md
	modified:   blogs/deepspeed-visualchat/10-03-2023/README-Japanese.md
	modified:   blogs/deepspeed-visualchat/10-03-2023/README.md
2024-01-23 02:09:25 +00:00
7fb5bade3e Update FastGen blog title (#4983) 2024-01-19 18:03:43 -06:00
1ac843a372 Update README.md 2024-01-19 15:00:46 -08:00
79564203c6 FastGen Jan 2024 blog (#4980)
Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
Co-authored-by: Arash Bakhtiari <arash@bakhtiari.org>
Co-authored-by: Heyang Qin <heyangqin@microsoft.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
2024-01-19 14:58:27 -08:00
ed10cc7382 Add support of Qwen models (7b, 14b, 72b) to DeepSpeed-FastGen (#4913)
This PR adds support for Qwen models 7b, 14b and 72b.

### Test Code

for mii pipeline:
```python
from mii import pipeline
pipe = pipeline("Qwen/Qwen-7B-Chat")
pipe.tokenizer.tokenizer.eos_token_id = 151643
output = pipe(["DeepSpeed is"], max_new_tokens=128, do_sample=False)
print(output)
```
for huggingface:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True, fp16=True).eval()
inputs = tokenizer('DeepSpeed is', return_tensors='pt')
inputs = inputs.to(model.device)
pred = model.generate(**inputs, max_new_tokens=128, do_sample=False, repetition_penalty=1.0)
test = tokenizer.decode(pred.cpu()[0], skip_special_tokens=False)
print(test)
```

### Qwen 7B
Huggingface output with prompt "DeepSpeed is":
```
 a high-performance, low-latency database management system designed for real-time analytics and machine learning applications. It is built on top of Apache Arrow, a columnar in-memory data format, and is optimized for processing large volumes of data in parallel.\n\nDeepSpeed offers several key features that make it well-suited for real-time analytics and machine learning applications:\n\n1. High Performance: DeepSpeed is designed to deliver high performance by leveraging parallel processing and optimized data structures. It can process large volumes of data in real-time, making it ideal for applications that require real-time analytics.\n\n2. Low Latency: DeepSpeed is designed to minimize latency by
```
DeepSpeed-FastGen output with prompt "DeepSpeed is":
```
 a high-performance, low-latency database management system designed for real-time analytics and machine learning applications. It is built on top of Apache Arrow, a columnar in-memory data format, and is optimized for processing large volumes of data in parallel.\n\nDeepSpeed offers several key features that make it well-suited for real-time analytics and machine learning applications:\n\n1. High Performance: DeepSpeed is designed to deliver high performance by leveraging parallel processing and optimized data structures. It can process large volumes of data in real-time, making it ideal for applications that require real-time analytics.\n\n2. Low Latency: DeepSpeed is designed to minimize latency by
```

### Qwen 72B
Huggingface output with prompt "DeepSpeed is":
```
是一个开源的深度学习优化库,它提供了多种优化技术,包括模型并行、数据并行、混合并行、ZeRO内存优化等。它可以帮助用户在大规模GPU集群上训练深度学习模型,提高训练速度,减少内存使用。\n在Deepspeed中,模型并行是一种将模型的不同部分分配到不同的GPU上的技术。这样可以处理模型太大,无法放在一个GPU上的问题。数据并行是将数据集分成多个部分,每个部分在不同的GPU上进行训练。混合并行则是结合了模型并行和数据并行,以更有效地利用GPU资源
```
DeepSpeed-FastGen output with prompt "DeepSpeed is" with 8-way sharding:
```
是一个开源的深度学习优化库,它提供了多种优化技术,包括模型并行、数据并行、混合并行、ZeRO内存优化等。它可以帮助用户在大规模GPU集群上训练深度学习模型,提高训练速度,减少内存使用。\n在Deepspeed中,模型并行是一种将模型的不同部分分配到不同的GPU上的技术。这样可以处理模型太大,无法放在一个GPU上的问题。数据并行是将数据集分成多个部分,每个部分在不同的GPU上进行训练。混合并行则是结合了模型并行和数据并行,以更有效地利用GPU资源
```

---------

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-01-10 23:05:27 +00:00
834272531a Add support of Microsoft Phi-2 model to DeepSpeed-FastGen (#4812)
This PR adds support for Microsoft Phi-2 model.

HF output with prompt "DeepSpeed is":
```
a company that helps make videos and movies look really good. They have a special way of making videos that makes them look like they were made in a movie theater. This is called "4K Ultra HD" and it makes the videos look very clear and detailed. DeepSpeed also has a special way of making videos that makes them look like they were made in a movie theater. This is called "4K Ultra HD" and it makes the videos look very clear and detailed. DeepSpeed also has a special way of making videos that makes them look like they were made in a movie theater. This is called "4K Ultra HD"
```

DeepSpeed-FastGen output with prompt "DeepSpeed is":
```
a company that helps make videos and movies look really good. They have a special way of making videos that makes them look like they were made in a movie theater. This is called "4K Ultra HD" and it makes the videos look very clear and detailed. DeepSpeed also has a special way of making videos that makes them look like they were made in a movie theater. This is called "4K Ultra HD" and it makes the videos look very clear and detailed. DeepSpeed also has a special way of making videos that makes them look like they were made in a movie theater. This is called "4K Ultra HD"
```

---------

Co-authored-by: Connor Holmes <connorholmes@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
2024-01-03 13:27:06 -08:00
a7900bcc3d Add support of Falcon models (7b, 40b, 180b) to DeepSpeed-FastGen (#4790) 2023-12-12 13:31:44 -08:00
6b8103b46e [docs] Intel inference blog (#4734) 2023-11-28 08:27:54 -08:00