Currently, the DeepSpeed engine does not enable the grad scaler for the ZeRO-0 and `torch.autocast` path, even when dtype is set to `fp16`. This leads to test failures when we replace our hard-coded tolerances with PyTorch's [standard tolerances](https://docs.pytorch.org/docs/stable/testing.html#torch.testing.assert_close) (thank you @stas00 for your suggestion regarding the previous PR).

This PR enables the grad scaler for this path to improve accuracy, and it refactors the tests to simplify validation by using `torch.testing.assert_close`. The tests now rely on PyTorch's standard (and stricter) tolerances, and they still pass.
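For context, a minimal sketch of the standard PyTorch `GradScaler` pattern that the fixed path now follows; the model, optimizer, and training loop below are illustrative placeholders, not DeepSpeed engine code:

```python
import torch

# Illustrative sketch only: under fp16 autocast, small gradients can
# underflow, so a GradScaler multiplies the loss by a dynamic scale factor
# before backward and unscales gradients before the optimizer step.
model = torch.nn.Linear(8, 8).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.amp.GradScaler("cuda")

for _ in range(3):
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = model(torch.randn(4, 8, device="cuda")).pow(2).mean()
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # unscales grads; skips the step on inf/nan
    scaler.update()                # adjusts the scale factor for the next step
```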
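And a before/after sketch of the test simplification; the tensors and the old hand-tuned tolerance values here are made-up placeholders:

```python
import torch

expected = torch.randn(10, dtype=torch.float16)
actual = expected.clone()

# Before: hand-tuned, comparatively loose tolerances.
assert torch.allclose(actual, expected, rtol=1e-2, atol=1e-2)

# After: assert_close picks dtype-based default tolerances (per the PyTorch
# docs, rtol=1e-3 and atol=1e-5 for float16) and reports a readable diff on
# mismatch, so the hard-coded values can be dropped.
torch.testing.assert_close(actual, expected)
```

---------

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>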