Commit Graph

10 Commits

Author SHA1 Message Date
fd40516923 Update GH org references (#6998)
Signed-off-by: Olatunji Ruwase <olruwase@microsoft.com>
Signed-off-by: Logan Adams <loadams@microsoft.com>
Signed-off-by: Fabien Dupont <fdupont@redhat.com>
Co-authored-by: Fabien Dupont <fabiendupont@fabiendupont.fr>
2025-02-05 00:56:50 +00:00
249c1db2fb Add fp16 support of Qwen1.5MoE models (A2.7B) to DeepSpeed-FastGen (#5403)
This PR adds support for the Qwen1.5-MoE-A2.7B model, addressing the request in https://github.com/microsoft/DeepSpeed-MII/issues/457.

### Test Code

For the MII pipeline:
```python
import mii

# Non-persistent pipeline; shards across all ranks when launched with deepspeed.
pipe = mii.pipeline("/data/zonepg/models/Qwen/Qwen1.5-MoE-A2.7B")
responses = pipe("DeepSpeed is", max_new_tokens=128, do_sample=False)
if pipe.is_rank_0:
    print(responses[0])
```
For Hugging Face:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("/data/zonepg/models/Qwen/Qwen1.5-MoE-A2.7B")
# Load in fp16 to match the precision used by FastGen.
model = AutoModelForCausalLM.from_pretrained("/data/zonepg/models/Qwen/Qwen1.5-MoE-A2.7B", device_map="auto", torch_dtype=torch.float16, trust_remote_code=True).eval()
print(model)
inputs = tokenizer('DeepSpeed is', return_tensors='pt')
inputs = inputs.to(model.device)
pred = model.generate(**inputs, max_new_tokens=128, do_sample=False, repetition_penalty=1.0)
text = tokenizer.decode(pred.cpu()[0], skip_special_tokens=False)
print(text)
```

### Qwen1.5-MoE-A2.7B
Hugging Face output with prompt "DeepSpeed is":
```
 a deep learning framework that is designed to accelerate the training of large-scale neural networks. It is built on top of PyTorch and provides a set of tools and techniques for optimizing the performance of deep learning models.

DeepSpeed supports a variety of hardware accelerators, including GPUs, TPUs, and FPGAs, and can be used to train models on distributed systems, such as clusters of GPUs or TPUs.

One of the key features of DeepSpeed is its ability to automatically parallelize the training of deep learning models across multiple GPUs or TPUs. This can significantly reduce the time required to train large models, as it allows the
```
DeepSpeed-FastGen output with prompt "DeepSpeed is":
```
 a deep learning framework that is designed to accelerate the training of large-scale neural networks. It is built on top of PyTorch and provides a set of tools and techniques for optimizing the performance of deep learning models.

DeepSpeed supports a variety of hardware accelerators, including GPUs, TPUs, and FPGAs, and can be used to train models on distributed systems, such as clusters of GPUs or TPUs.

One of the key features of DeepSpeed is its ability to automatically parallelize the training of deep learning models across multiple GPUs or TPUs. This can significantly reduce the time required to train large models, as it allows the
```

DeepSpeed-FastGen output with prompt "DeepSpeed is" with 8-way sharding:
```
 a deep learning framework that is designed to accelerate the training of large-scale neural networks. It is built on top of PyTorch and provides a set of tools and techniques for optimizing the performance of deep learning models.

DeepSpeed supports a variety of hardware accelerators, including GPUs, TPUs, and FPGAs, and can be used to train models on distributed systems, such as clusters of GPUs or TPUs.

One of the key features of DeepSpeed is its ability to automatically parallelize the training of deep learning models across multiple GPUs or TPUs. This can significantly reduce the time required to train large models, as it allows the
```
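
For the sharded run, a persistent deployment is an alternative to the one-off pipeline. A minimal sketch using mii.serve (the deployment name here is illustrative; tensor_parallel=8 matches the 8-way sharding above):
```python
import mii

# Persistent FastGen deployment sharded across 8 GPUs.
client = mii.serve(
    "/data/zonepg/models/Qwen/Qwen1.5-MoE-A2.7B",
    deployment_name="qwen1.5-moe-a2.7b",  # illustrative name
    tensor_parallel=8,
)
responses = client.generate("DeepSpeed is", max_new_tokens=128)
print(responses[0])
client.terminate_server()
```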

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Heyang Qin <heyangqin@microsoft.com>
Co-authored-by: Abhishek Kulkarni <11399+adk9@users.noreply.github.com>
2024-08-01 10:27:24 -07:00
6a163e03f4 Add support for Microsoft Phi-3 model to DeepSpeed-FastGen (#5559)
This PR adds support for the Microsoft Phi-3 model to FastGen.

DeepSpeed-FastGen output with prompt "DeepSpeed is":
```
an AI-powered platform designed to optimize and scale distributed deep learning models across clusters.**

DeepSpeed is a cutting-edge AI-driven toolkit that empowers users to enhance and scale deep learning models across distributed computing environments. By harnessing the power of artificial intelligence, DeepSpeed provides innovative solutions for optimizing resource allocation, managing data synchronization, and improving model parallelism. This enables efficient scaling and execution of complex deep learning tasks, unlocking the full potential of distributed computing systems.

### Key Features of DeepSpeed:

1.
```
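
For reference, a smoke test along the lines of the test code in the other model PRs (a sketch; the checkpoint id assumes the public microsoft/Phi-3-mini-4k-instruct release):
```python
import mii

# Hypothetical quick check of the Phi-3 path in FastGen.
pipe = mii.pipeline("microsoft/Phi-3-mini-4k-instruct")  # assumed checkpoint id
responses = pipe("DeepSpeed is", max_new_tokens=128, do_sample=False)
if pipe.is_rank_0:
    print(responses[0])
```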

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-07-20 03:18:36 +00:00
ed10cc7382 Add support of Qwen models (7b, 14b, 72b) to DeepSpeed-FastGen (#4913)
This PR adds support for the Qwen 7B, 14B, and 72B models.

### Test Code

For the MII pipeline:
```python
from mii import pipeline

pipe = pipeline("Qwen/Qwen-7B-Chat")
# 151643 is the id of Qwen's <|endoftext|> token; set it explicitly so
# generation stops at the right place.
pipe.tokenizer.tokenizer.eos_token_id = 151643
output = pipe(["DeepSpeed is"], max_new_tokens=128, do_sample=False)
print(output)
```
For Hugging Face:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
# fp16=True is handled by Qwen's custom modeling code (trust_remote_code).
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True, fp16=True).eval()
inputs = tokenizer('DeepSpeed is', return_tensors='pt')
inputs = inputs.to(model.device)
pred = model.generate(**inputs, max_new_tokens=128, do_sample=False, repetition_penalty=1.0)
text = tokenizer.decode(pred.cpu()[0], skip_special_tokens=False)
print(text)
```

### Qwen 7B
Hugging Face output with prompt "DeepSpeed is":
```
 a high-performance, low-latency database management system designed for real-time analytics and machine learning applications. It is built on top of Apache Arrow, a columnar in-memory data format, and is optimized for processing large volumes of data in parallel.\n\nDeepSpeed offers several key features that make it well-suited for real-time analytics and machine learning applications:\n\n1. High Performance: DeepSpeed is designed to deliver high performance by leveraging parallel processing and optimized data structures. It can process large volumes of data in real-time, making it ideal for applications that require real-time analytics.\n\n2. Low Latency: DeepSpeed is designed to minimize latency by
```
DeepSpeed-FastGen output with prompt "DeepSpeed is":
```
 a high-performance, low-latency database management system designed for real-time analytics and machine learning applications. It is built on top of Apache Arrow, a columnar in-memory data format, and is optimized for processing large volumes of data in parallel.\n\nDeepSpeed offers several key features that make it well-suited for real-time analytics and machine learning applications:\n\n1. High Performance: DeepSpeed is designed to deliver high performance by leveraging parallel processing and optimized data structures. It can process large volumes of data in real-time, making it ideal for applications that require real-time analytics.\n\n2. Low Latency: DeepSpeed is designed to minimize latency by
```

### Qwen 72B
Hugging Face output with prompt "DeepSpeed is" (the model responds in Chinese; translated here):
```
 is an open-source deep learning optimization library that provides a variety of optimization techniques, including model parallelism, data parallelism, hybrid parallelism, ZeRO memory optimization, and more. It helps users train deep learning models on large-scale GPU clusters, improving training speed and reducing memory usage.\nIn Deepspeed, model parallelism is a technique that assigns different parts of a model to different GPUs. This handles the case where a model is too large to fit on a single GPU. Data parallelism splits the dataset into multiple parts, each trained on a different GPU. Hybrid parallelism combines model parallelism and data parallelism to make more efficient use of GPU resources
```
DeepSpeed-FastGen output with prompt "DeepSpeed is" with 8-way sharding (identical Chinese output; translated here):
```
 is an open-source deep learning optimization library that provides a variety of optimization techniques, including model parallelism, data parallelism, hybrid parallelism, ZeRO memory optimization, and more. It helps users train deep learning models on large-scale GPU clusters, improving training speed and reducing memory usage.\nIn Deepspeed, model parallelism is a technique that assigns different parts of a model to different GPUs. This handles the case where a model is too large to fit on a single GPU. Data parallelism splits the dataset into multiple parts, each trained on a different GPU. Hybrid parallelism combines model parallelism and data parallelism to make more efficient use of GPU resources
```
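
The 8-way sharded run uses the same pipeline script, launched across 8 GPUs with the deepspeed launcher; mii.pipeline then shards the model over all visible ranks. A sketch (the script name is illustrative):
```python
# Launch with: deepspeed --num_gpus 8 qwen72b_pipeline.py
from mii import pipeline

pipe = pipeline("Qwen/Qwen-72B-Chat")
pipe.tokenizer.tokenizer.eos_token_id = 151643  # Qwen's <|endoftext|> id
output = pipe(["DeepSpeed is"], max_new_tokens=128, do_sample=False)
if pipe.is_rank_0:
    print(output)
```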

---------

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2024-01-10 23:05:27 +00:00
834272531a Add support of Microsoft Phi-2 model to DeepSpeed-FastGen (#4812)
This PR adds support for the Microsoft Phi-2 model.

Hugging Face output with prompt "DeepSpeed is":
```
a company that helps make videos and movies look really good. They have a special way of making videos that makes them look like they were made in a movie theater. This is called "4K Ultra HD" and it makes the videos look very clear and detailed. DeepSpeed also has a special way of making videos that makes them look like they were made in a movie theater. This is called "4K Ultra HD" and it makes the videos look very clear and detailed. DeepSpeed also has a special way of making videos that makes them look like they were made in a movie theater. This is called "4K Ultra HD"
```

DeepSpeed-FastGen output with prompt "DeepSpeed is":
```
a company that helps make videos and movies look really good. They have a special way of making videos that makes them look like they were made in a movie theater. This is called "4K Ultra HD" and it makes the videos look very clear and detailed. DeepSpeed also has a special way of making videos that makes them look like they were made in a movie theater. This is called "4K Ultra HD" and it makes the videos look very clear and detailed. DeepSpeed also has a special way of making videos that makes them look like they were made in a movie theater. This is called "4K Ultra HD"
```
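
These model PRs validate support by checking that greedy Hugging Face and FastGen outputs match. A hedged sketch of automating that comparison (the checkpoint id assumes the public microsoft/phi-2 release, and the generated_text attribute is assumed from current MII Response objects):
```python
import mii
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "microsoft/phi-2"  # assumed public checkpoint id
PROMPT = "DeepSpeed is"

def hf_generate() -> str:
    tok = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL, torch_dtype=torch.float16, device_map="auto"
    ).eval()
    ids = tok(PROMPT, return_tensors="pt").to(model.device)
    out = model.generate(**ids, max_new_tokens=128, do_sample=False)
    # Return only the continuation, to match FastGen's output.
    return tok.decode(out[0][ids["input_ids"].shape[1]:], skip_special_tokens=True)

def fastgen_generate() -> str:
    pipe = mii.pipeline(MODEL)
    responses = pipe(PROMPT, max_new_tokens=128, do_sample=False)
    return responses[0].generated_text  # assumed MII Response attribute

# With greedy decoding, the two continuations should match.
print(hf_generate() == fastgen_generate())
```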

---------

Co-authored-by: Connor Holmes <connorholmes@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
2024-01-03 13:27:06 -08:00
a7900bcc3d Add support of Falcon models (7b, 40b, 180b) to DeepSpeed-FastGen (#4790) 2023-12-12 13:31:44 -08:00
ab6b1e16bb Add Japanese blog for DeepSpeed-FastGen (#4651)
This PR adds a Japanese version of the DeepSpeed-FastGen blog.
(It also includes small fixes for typos in the original blog.)

---------

Co-authored-by: Conglong Li <conglong.li@gmail.com>
2023-11-07 10:10:45 -08:00
d89027be61 Fix figure in FastGen blog (#4624)
Fix the latency-throughput figure for 13B.
2023-11-05 10:23:46 -08:00
ff53c22485 Add number for latency comparison (#4612)
This PR adds numbers for the latency comparison.
2023-11-03 16:57:00 -07:00
1d9e256c03 DeepSpeed-FastGen blog (#4607)
Co-authored-by: Connor Holmes <connorholmes@microsoft.com>
Co-authored-by: Masahiro Tanaka <mtanaka@microsoft.com>
Co-authored-by: Heyang Qin <heyangqin@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
2023-11-03 15:32:40 -07:00