pytest 8.4.0 seems to break a number of our tests. Rather than pinning it
in each workflow individually, we should just pin it in the requirements
file until we resolve the issue.
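A minimal illustration of the intended pin (the exact requirements file
and version bound are assumptions, not taken from this PR):
```
# requirements/requirements-dev.txt (illustrative)
pytest<8.4.0
```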
---------
Co-authored-by: Olatunji Ruwase <tjruwase@gmail.com>
This fix is required to prevent the error below:
```
=================================== FAILURES ===================================
__________________ TestFp8ComposabilityAcrossZero.test[fp16] ___________________
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/opt/conda/envs/py_3.10/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/opt/conda/envs/py_3.10/lib/python3.10/multiprocessing/pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/root/PR/test/DeepSpeed/tests/unit/common.py", line 322, in _dist_run
    raise e
  File "/root/PR/test/DeepSpeed/tests/unit/common.py", line 314, in _dist_run
    self.run(**self._fixture_kwargs)
  File "/root/PR/test/DeepSpeed/tests/unit/common.py", line 470, in run
    self._current_test(**fixture_kwargs)
  File "/root/PR/test/DeepSpeed/tests/unit/runtime/half_precision/test_fp8.py", line 88, in test
    loss = run_zero(stage, model_dtype)
  File "/root/PR/test/DeepSpeed/tests/unit/runtime/half_precision/test_fp8.py", line 74, in run_zero
    model.step()
  File "/root/PR/test/DeepSpeed/deepspeed/runtime/engine.py", line 2387, in step
    self._take_model_step(lr_kwargs)
  File "/root/PR/test/DeepSpeed/deepspeed/runtime/engine.py", line 2290, in _take_model_step
    self.optimizer.step()
  File "/root/PR/test/DeepSpeed/deepspeed/runtime/fp16/fused_optimizer.py", line 255, in step
    self.timers(OVERFLOW_CHECK_TIMER).start()
TypeError: 'NoneType' object is not callable
"""
```
Co-authored-by: Olatunji Ruwase <tjruwase@gmail.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Fixes this warning:
```
/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.12/site-packages/deepspeed/runtime/config_utils.py:100: PydanticDeprecatedSince211: Accessing the 'model_fields' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.
fields = self.model_fields
```
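The warning is about instance-level access; the fix is to read the
attribute from the model class instead. A minimal sketch of the change
(the surrounding method is elided):
```python
# Pydantic >= 2.11 deprecates reading model_fields off an instance;
# reading it off the class returns the same mapping without the warning.
fields = type(self).model_fields   # was: fields = self.model_fields
```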
Co-authored-by: Olatunji Ruwase <tjruwase@gmail.com>
Fix minor indentation, typo, and list-numbering issues in the Ulysses
Plus tutorial.
---------
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
This is the DeepSpeed counterpart of
https://github.com/snowflakedb/ArcticTraining/pull/45, as the new
feature(s) require changes on both sides.
For PR reviewers:
Readiness status:
- [x] Code
- [x] Tests
- [ ] Docs - working on it
Features:
- [x] add support for delaying grad addition via
`param.ds_grad_is_ready` flag (used when performing tiled compute in an
autograd function)
- [x] add light sp-only mpu version (Jeff Rasley)
- [x] improved debug
- [x] added `all_gather_object` to `dist`
- [x] `UlyssesSPAttentionHF` (port of UlyssesAttention from
Megatron-Deepspeed plus modern MHA-variations)
- [x] `UlyssesSPDataLoaderAdapter` - DL adapter to shard the normal DL
batches to be used by `UlyssesSPAttentionHF`
- [x] `SequenceTiledCompute` - generic autograd function to perform
compute after tiling on the sequence dimension
- [x] `TiledMLP` - a specific autograd function to perform tiled MLP
(it's much easier to understand before trying to grok
`SequenceTiledCompute`; a minimal sketch of the tiling idea follows this list)
- [x] added a differentiable `_DimZeroAllToAll` (Samyam Rajbhandari)
- [x] torch-dist-check now allows `torch.distributed.nn` (which is
needed since deepspeed's dist is not up to date with
`torch.distributed.nn`)
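To make the tiled-compute idea concrete, here is a minimal, hypothetical
sketch (not the DeepSpeed implementation; `TiledMLPSketch` and
`num_shards` are illustrative names): the forward runs the MLP shard by
shard under `no_grad`, and the backward recomputes one shard at a time,
so only one shard's intermediate activations are ever live.
```python
import torch

class TiledMLPSketch(torch.autograd.Function):

    @staticmethod
    def forward(ctx, x, mlp, num_shards):
        ctx.mlp = mlp
        ctx.num_shards = num_shards
        ctx.save_for_backward(x)
        with torch.no_grad():
            # Forward each sequence shard independently so a shard's
            # intermediate activations are freed before the next shard runs.
            out = torch.cat([mlp(s) for s in x.chunk(num_shards, dim=1)], dim=1)
        return out

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        grad_x = torch.empty_like(x)
        offset = 0
        for xs, gs in zip(x.chunk(ctx.num_shards, dim=1),
                          grad_out.chunk(ctx.num_shards, dim=1)):
            xs = xs.detach().requires_grad_(True)
            with torch.enable_grad():
                ys = ctx.mlp(xs)  # recompute this shard's activations
            torch.autograd.backward(ys, gs)  # grads accumulate into mlp params
            grad_x[:, offset:offset + xs.shape[1]] = xs.grad
            offset += xs.shape[1]
        return grad_x, None, None
```
Used as `out = TiledMLPSketch.apply(hidden_states, mlp, 4)`, this trades
one extra MLP recompute for a roughly shard-count reduction in peak
activation memory.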
---------
Signed-off-by: Stas Bekman <stas.bekman@snowflake.com>
Signed-off-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas.bekman@snowflake.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
This is a follow-up to https://github.com/deepspeedai/DeepSpeed/pull/923 -
my original code was copied from transformers, which has a different
filesystem layout, and I missed that. This PR fixes it to actually do the
right thing.
Now you can have multiple clones of deepspeed and the tests will use the
local repo automatically and not the pre-installed deepspeed.
These days fp16 is barely ever used, so we should be testing bf16
instead of fp16 where possible.
I had to fix a bunch of tests to adapt to this change, and fixed a few
bugs along the way.
---------
Signed-off-by: Stas Bekman <stas.bekman@snowflake.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Stas Bekman <stas.bekman@snowflake.com>
## Description
This PR fixes an issue where gradient clipping modifications are not
reflected in the global gradient norm calculation when CPU offloading is
enabled. The issue occurs because the `averaged_gradients` are not being
updated with the clipped gradients when CPU offloading is active.
## Problem
When using CPU offloading with gradient clipping:
1. The gradients are successfully clipped using `safe_set_local_grad`.
2. However, the `_global_grad_norm` calculation still uses the original,
unclipped gradients.
3. This leads to incorrect gradient norm reporting and can undermine the
effectiveness of gradient clipping.
## Solution
The fix ensures that the `averaged_gradients` are properly updated with
the clipped gradients when CPU offloading is enabled, similar to how it
works when CPU offloading is disabled.
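For context, a hedged repro sketch of the pattern this fixes, using
DeepSpeed's gradient access API (`model` and `clip_value` are assumed
names):
```python
from deepspeed.utils import safe_get_local_grad, safe_set_local_grad

# Clip each local gradient partition in place; after the fix, the engine's
# reported global grad norm reflects these clipped values even when CPU
# offloading is enabled.
for p in model.parameters():
    grad = safe_get_local_grad(p)
    if grad is not None:
        safe_set_local_grad(p, grad.clamp(-clip_value, clip_value))
```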
## Testing
The fix has been tested with:
- CPU offloading enabled and disabled
- Different gradient clipping values
- A simple model with linear layers
- Both FP16 and BF16
## Related Issues
Fixes #7292
---------
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
Signed-off-by: Armin Zhu <mingzhengzhu1998@gmail.com>
Fix the memory usage of ZeRO-Offload with stages 1 and 2. Before the fix,
GPU memory usage was about 3x the size of the FP16 params. This was
caused by the H2D data copy using a different data type. Now GPU memory
usage is about 1x the FP16 params, and the H2D copy needs a 16-bit pinned
memory buffer.
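Illustratively (this is a standalone sketch, not the DeepSpeed
internals), copying through a pinned staging buffer of the same 16-bit
dtype avoids materializing an intermediate tensor of a wider dtype on
the device:
```python
import torch

params_fp16 = torch.randn(1 << 20, dtype=torch.float16)  # host-side FP16 params
pinned = torch.empty_like(params_fp16).pin_memory()      # 16-bit pinned staging buffer
pinned.copy_(params_fp16)
device_buf = torch.empty_like(pinned, device='cuda')
device_buf.copy_(pinned, non_blocking=True)              # async H2D, no dtype conversion
```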
Some params are one-dimensional; this PR adds support for them.
Resolves #7249
```log
param.shape torch.Size([768, 1536])
param.shape torch.Size([768])
...
```
```log
with deepspeed.module_inject.layers.GatherReplacedLayerParams([param], model, enabled=True):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "deepspeed/module_inject/layers.py", line 359, in __enter__
self.params[0].gather_params(self.params)
File "torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "deepspeed/module_inject/layers.py", line 473, in gather_params
param.shape[1],
~~~~~~~~~~~^^^
IndexError: tuple index out of range
```
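A minimal sketch of the guard this implies (illustrative, not the exact
patch): only index `param.shape[1]` when the param actually has two
dimensions, since biases and norm weights are 1-D.
```python
# 1-D params (e.g. biases, LayerNorm weights) have no shape[1]; guard the
# second-dim access instead of indexing past the end of the shape tuple.
second_dim = param.shape[1] if param.dim() > 1 else 1
```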
---------
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Signed-off-by: inkcherry <mingzhi.liu@intel.com>
Co-authored-by: Hongwei Chen <33092912+hwchen2017@users.noreply.github.com>
Co-authored-by: inkcherry <mingzhi.liu@intel.com>
XCCL will be used for the XPU device on PyTorch 2.8. With this support we
can remove torch-ccl on the XPU device, while also preserving the old
path for enabling torch-CCL.
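For illustration, a sketch of the two paths, under the assumption that
PyTorch 2.8 exposes the built-in XCCL process-group backend for XPU under
the backend name `xccl`:
```python
import torch
import torch.distributed as dist

# On PyTorch >= 2.8 with an XPU device, the native XCCL backend can be used
# directly, without importing the external torch-ccl package.
if torch.xpu.is_available() and hasattr(dist, 'is_xccl_available') \
        and dist.is_xccl_available():
    dist.init_process_group(backend='xccl')
else:
    import oneccl_bindings_for_pytorch  # noqa: F401  # legacy torch-ccl path
    dist.init_process_group(backend='ccl')
```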
---------
Signed-off-by: yisheng <yi.sheng@intel.com>
Co-authored-by: Ma, Guokai <guokai.ma@gmail.com>
With the current code, `extra_repr_str` will be undefined if
`self.weight` is None.
In addition, under ZeRO-3 the shape is stored in `ds_shape`, so we also
need to check for that (although AutoTP currently doesn't support
ZeRO-3).
```logs
File "deepspeed/__init__.py", line 394, in tp_model_init
model = TpTrainingManager(model=model, tp_size=tp_size, dtype=dtype).module
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "deepspeed/runtime/tensor_parallel/tp_manager.py", line 35, in __init__
self._apply_policies(parser_dict)
File "deepspeed/runtime/tensor_parallel/tp_manager.py", line 47, in _apply_policies
self._apply_injection_policy(self.config, client_module)
File "deepspeed/runtime/tensor_parallel/tp_manager.py", line 53, in _apply_injection_policy
replace_transformer_layer(client_module, self.module, None, self.config, self.model_config)
File "deepspeed/module_inject/replace_module.py", line 400, in replace_transformer_layer
replaced_module = replace_module(model=model,
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "deepspeed/module_inject/replace_module.py", line 653, in replace_module
replaced_module, _ = _replace_module(model, policy, state_dict=sd)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "deepspeed/module_inject/replace_module.py", line 713, in _replace_module
_, layer_id = _replace_module(child,
^^^^^^^^^^^^^^^^^^^^^^
File "deepspeed/module_inject/replace_module.py", line 713, in _replace_module
_, layer_id = _replace_module(child,
^^^^^^^^^^^^^^^^^^^^^^
File "deepspeed/module_inject/replace_module.py", line 689, in _replace_module
replaced_module = policies[child.__class__][0](child,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "deepspeed/module_inject/replace_module.py", line 333, in replace_fn
new_module = replace_wo_policy(child, _policy, prefix=prefix, state_dict=state_dict)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "deepspeed/module_inject/replace_module.py", line 316, in replace_wo_policy
return _autotp._replace_module(module)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "deepspeed/module_inject/auto_tp.py", line 481, in _replace_module
self._replace_module(child, name, class_name)
File "deepspeed/module_inject/auto_tp.py", line 466, in _replace_module
setattr(r_module, name, self.linear_policies[child.__class__](child, prev_name + '.' + name,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "deepspeed/module_inject/auto_tp.py", line 361, in _replace
if 'Yuan' in str(self.module):
^^^^^^^^^^^^^^^^
File "torch/nn/modules/module.py", line 2940, in __repr__
mod_str = repr(module)
^^^^^^^^^^^^
File "torch/nn/modules/module.py", line 2940, in __repr__
mod_str = repr(module)
^^^^^^^^^^^^
File "torch/nn/modules/module.py", line 2934, in __repr__
extra_repr = self.extra_repr()
^^^^^^^^^^^^^^^^^
File "deepspeed/module_inject/layers.py", line 267, in extra_repr
out_features, in_features = self.weight.shape[-2:] if self.weight is not None else (None, None)
^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: not enough values to unpack (expected 2, got 1)
```
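A hedged sketch of shape handling that avoids both failure modes
(illustrative, not the exact patch): always return a string even when the
weight is None, prefer `ds_shape` when the param is ZeRO-3 partitioned,
and tolerate 1-D shapes.
```python
def extra_repr(self):
    # Always produce a repr string, even when self.weight is None.
    if self.weight is None:
        return 'in_features=None, out_features=None, bias=None'
    # Under ZeRO-3 the real shape lives in ds_shape, not weight.shape.
    shape = getattr(self.weight, 'ds_shape', self.weight.shape)
    if len(shape) >= 2:
        out_features, in_features = shape[-2:]
    else:
        out_features, in_features = shape[0], None
    return f'in_features={in_features}, out_features={out_features}, bias={self.bias is not None}'
```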
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Until we sort out the new license situation, disable this check so that
new code not owned by MSFT can be added.
---------
Signed-off-by: Stas Bekman <stas@stason.org>
# PR Summary
This small PR resolves deprecation warnings caused by the use of
`distutils.spawn.find_executable`:
```python
DeprecationWarning: Use shutil.which instead of find_executable
```
Please note that `find_executable` has been deprecated since Python 3.10
and was removed in Python 3.12, while `shutil.which` has been available
since Python 3.3.
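The migration is a one-line, drop-in swap (the `nvcc` lookup is just an
illustrative example):
```python
import shutil

# was: from distutils.spawn import find_executable; find_executable('nvcc')
nvcc_path = shutil.which('nvcc')  # returns the path or None, same contract
```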
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>