Commit Graph

2933 Commits

Author SHA1 Message Date
47b3fb5e7f Fixed the problem of loading universal checkpoint error in multi-machine mode. (#7601)
In a multi-machine environment, loading the stage3 universal checkpoint
will produce incorrect results, causing the loss to increase abnormally.
2025-09-28 20:26:11 +00:00
66c70312f2 Change current_device() to current_device_name() (#7600)
This PR fixes a bug where, in some places, get_accelerator().current_device()
is used instead of get_accelerator().current_device_name(). This is mostly
fine, but on CPU it won't work.

`torch.empty(3, device=get_accelerator().current_device())` <-- won't work on anything other than a CUDA device
`torch.empty(3, device=torch.device(get_accelerator().current_device()))` <-- works for GPU devices, but won't work for CPU
`torch.empty(3, device=torch.device(get_accelerator().current_device_name()))` <-- works for both GPU and CPU
`torch.empty(3, device=get_accelerator().current_device_name())` <-- this also works, but is less formal than the previous one.

This bug was exposed when I tried to run AutoTP training on a Xeon server
for debugging purposes.
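
A minimal sketch of the portable pattern this PR standardizes on, assuming DeepSpeed is installed; the tensor shape is illustrative:

```python
import torch
from deepspeed.accelerator import get_accelerator

# current_device_name() returns a device string such as "cuda:0" or "cpu",
# which the device= kwarg accepts on every backend.
buf = torch.empty(3, device=get_accelerator().current_device_name())
print(buf.device)

# current_device() returns a bare integer index, which torch only interprets
# as a CUDA-style device, so passing it directly breaks on CPU.
```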

---------

Signed-off-by: Guokai Ma <guokai.ma@gmail.com>
2025-09-28 10:19:49 -07:00
91d14527b6 Fix the universal checkpoint issue for stage3 when there are multiple subgroups. (#7585)
**Describe the bug**

When the model is large and there are multiple subgroups, running
ds_to_universal.py fails. The error log is below:

```
*** 1. Extracting ZeRO fragments
  0%|                                                     | 0/1 [00:03<?, ?it/s]
Traceback (most recent call last):
  File "/work/zhengchenyu/ai-project/qwen3/scripts/ds_to_universal_example.py", line 21, in <module>
    main()
  File "/work/zhengchenyu/ai-project/qwen3/scripts/ds_to_universal_example.py", line 18, in main
    ds_to_universal_main(args)
  File "/opt/conda/lib/python3.11/site-packages/deepspeed/checkpoint/ds_to_universal.py", line 523, in main
    _extract_zero_shard_files_stage3(args, optim_files, param_shapes, dp_degree, temp_dir)
  File "/opt/conda/lib/python3.11/site-packages/deepspeed/checkpoint/ds_to_universal.py", line 375, in _extract_zero_shard_files_stage3
    _do_parallel_work(do_work, list(range(dp_degree)), args.num_extract_workers)
  File "/opt/conda/lib/python3.11/site-packages/deepspeed/checkpoint/ds_to_universal.py", line 359, in _do_parallel_work
    results.append(do_work(work))
                   ^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/deepspeed/checkpoint/ds_to_universal.py", line 167, in extract_zero_shards_stage3
    dump_param_fragment(temp_dir, 0, dp_index, state_key, flat_state[state_key], name, offset,
  File "/opt/conda/lib/python3.11/site-packages/deepspeed/checkpoint/ds_to_universal.py", line 194, in dump_param_fragment
    state_flat_tensor = state_flat_tensor.narrow(0, offset, numel).clone()
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: start (0) + length (155582464) exceeds dimension size (74499072).

```
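
A minimal CPU-only illustration of the failure mode above, using the sizes from the log (the tensors are stand-ins for the flat optimizer shard of one sub-group; allocating the full-size stand-in takes roughly 300 MB):

```python
import torch

flat_state = torch.zeros(74499072)   # size of the flat shard actually present for this sub-group
offset, numel = 0, 155582464         # fragment extent computed without accounting for sub-groups
try:
    fragment = flat_state.narrow(0, offset, numel).clone()
except RuntimeError as err:
    print(err)  # start (0) + length (155582464) exceeds dimension size (74499072)
```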

**To Reproduce**
Steps to reproduce the behavior:
1. Run with a large model, or set sub_group_size to a lower value. Then train and save the model.
2. Run ds_to_universal.py.

**The reason**

I found that the previous stage3 universal checkpoint implementation did
not take subgroups into account. I also found the following problems
during debugging:

* It is unable to handle multiple sub-groups, which results in data loss.
* When load_checkpoint is True, all processes save to the same zero model
checkpoint file. If multiple processes write at the same time, the file
gets corrupted; file corruption was occasionally observed during testing.

Related issue: #7584

---------

Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
2025-09-27 17:39:43 +00:00
6ea345ae27 Simplify leaf module hook (#7592)
This PR simplifies hooks for leaf module using PyTorch's API.

---------

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
2025-09-27 13:12:15 -04:00
b75654001a disables ZeRO checkpoint loading path when stage=0 (#7586)
Fixes #7571 

When ZeRO is disabled (stage 0) and bf16 is enabled, the current guard
sets `load_zero_checkpoint=True`, which leads to `_load_zero_checkpoint`
and `_restore_from_bit16_weights()` being called even though no ZeRO
state exists.

This PR removes the `self.bfloat16_enabled()` condition so that
load_zero_checkpoint is tied strictly to `self.zero_optimization()`.

Stage 0 (BF16/FP16/FP32): cleanly skips ZeRO checkpoint path.

Stage ≥ 1: loads ZeRO partitioned optimizer state as before.
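
A minimal sketch of the guard change, with an illustrative stand-in for the engine methods quoted above (not the actual engine code):

```python
def should_load_zero_checkpoint(zero_stage: int, bf16_enabled: bool) -> bool:
    # Old behavior (buggy): stage 0 with bf16 still took the ZeRO loading path.
    # return zero_stage > 0 or bf16_enabled
    # New behavior: strictly tied to ZeRO being enabled.
    return zero_stage > 0
```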

cc @sfc-gh-truwase

Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
2025-09-25 20:31:14 +00:00
16c1bf429f Include init file for superoffload folder (#7591)
This PR just fixes a tiny error from PR
[7559](https://github.com/deepspeedai/DeepSpeed/pull/7559), reported in the comment
[here](https://github.com/deepspeedai/DeepSpeed/pull/7559#issuecomment-3329036699).

```
[rank1]:   File "/usr/local/lib/python3.11/dist-packages/deepspeed/runtime/engine.py", line 1462, in _configure_optimizer
[rank1]:     self.optimizer = self._configure_zero_optimizer(basic_optimizer)
[rank1]:                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.11/dist-packages/deepspeed/runtime/engine.py", line 1835, in _configure_zero_optimizer
[rank1]:     from deepspeed.runtime.superoffload.superoffload_stage3 import SuperOffloadOptimizer_Stage3
[rank1]: ModuleNotFoundError: No module named 'deepspeed.runtime.superoffload'
```

Create `__init__.py` for the superoffload folder to avoid an import error
when the folder is ignored by pip installation.

---------

Signed-off-by: nguyen599 <pnvmanh2123@gmail.com>
2025-09-24 16:50:17 +00:00
af56ed4d37 SuperOffload Release (#7559)
This PR introduces **SuperOffload**—an optimizer designed for Superchips
(Nvidia GH200 & GB200, AMD MI300A) with high CPU–GPU bandwidth. It
enables **full fine-tuning** of **GPT-OSS-20B, Qwen3-14B, and Phi-4** on
a single GH200 GPU, achieving up to **~500 TFLOPS**, using Hugging Face
Transformers and DeepSpeed—no custom modeling code required.

SuperOffload extends ZeRO-Offload with fine-grained control and CPUAdam
rollback utilities, allowing GPU execution to overlap with CPUAdam. This
reduces GPU idle time and improves overall efficiency.

Key changes:
- New SuperOffloadOptimizer_Stage3 optimizer.
- C++/CUDA binding for adam_rollback to revert one optimization step.
- Config additions including super_offload and cpuadam_cores_perc.

A detailed blog and tutorial will be available soon.

---------

Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
2025-09-24 13:09:23 +00:00
17d80ce440 Deepcompile: Make size of activation to free configurable (#7582)
In deepcompile free-activation mode, only activations larger than a
threshold are eagerly freed. The threshold is hardcoded today and thus
may not be suitable in all cases.

This PR first generalizes the dc.init() interface to take the whole
compile_config object, and then converts the threshold into a config
item.

This corresponds to issue 3 of #7577.

---------

Signed-off-by: Junjie Mao <junjie.mao@linux.alibaba.com>
2025-09-24 01:37:46 +00:00
bc9ed477e9 Broadcast fp16 overflow in Z1 (#7580)
Fix #7568

Signed-off-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
2025-09-23 15:51:43 +00:00
8c7c56a932 Deepcompile: Fix bugs when applying deepcompile to VLA-like models (#7569)
**Describe the bug**

When applying deepcompile to the OpenVLA model (which is composed of two
vision transformers and a llama-7B), I met the following issues:

a. Not all parameters are trained, which leads to compile-time
exceptions as well as incorrect invocation of `endBackward()`.
b. `release_param()` can be passed a tuple, not a tensor.
c. A use-before-define error in `fast_free_schedule()`.

This PR attempts to fix all of those issues. Patches 1-2 resolve a, patch 3
resolves b, and patch 4 resolves c.

**To Reproduce the issues**
Use this script:
https://gist.github.com/eternalNight/3c2cf8c703f1e9e7742d3b7f9e1edae3

1. `deepspeed --num_gpus=N openvla-like.py -c`

---------

Signed-off-by: Junjie Mao <junjie.mao@linux.alibaba.com>
2025-09-23 07:27:15 +00:00
35de2030be logging: Also set log level of logger handlers (#7576)
After #7526 the default logger passes logs to a StreamHandler, which has
its own log level. Changing the log level of the logger alone does not
take effect in such a case.

Update the log level of all handlers when changing the parent logger's.
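
A minimal self-contained sketch of the idea, using a plain `logging` setup (names are illustrative, not the DeepSpeed code):

```python
import logging

logger = logging.getLogger("example")
handler = logging.StreamHandler()
handler.setLevel(logging.WARNING)       # the handler keeps its own threshold
logger.addHandler(handler)

def set_level(level: int) -> None:
    logger.setLevel(level)
    for h in logger.handlers:           # without this loop the handler still filters at WARNING
        h.setLevel(level)

set_level(logging.INFO)
logger.info("now emitted")              # would be dropped by the handler if only the logger level changed
```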

Signed-off-by: Junjie Mao <junjie.mao@linux.alibaba.com>
2025-09-23 03:32:37 +00:00
325c6c5e9c DeepCompile ZeRO-3: robust allgather for uneven shards; fix profiling… (#7489)
… meta key (max_mem)

---------

Signed-off-by: Abhishek <dalakotiashu150@gmail.com>
Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
Co-authored-by: Abhishek <dalakotiashu150@gmail.com>
Co-authored-by: Masahiro Tanaka <mtanaka@anyscale.com>
2025-09-22 16:45:00 -07:00
80033a8293 Update version.txt post 0.17.6 release (#7572) 2025-09-19 14:33:22 -07:00
e4f6da9685 [bugfix] fix partition context unpatch (#7566)
## Fix asymmetric patching/unpatching in
InsertPostInitMethodToModuleSubClasses

### Problem Description

The `InsertPostInitMethodToModuleSubClasses` context manager patches
`__init__` methods of model classes during entry and unpatches them
during exit.

However, asymmetric condition checks between patching and unpatching can
introduce subtle inheritance bugs.

### Root Cause Analysis

The issue occurs with classes that have multiple inheritance where:
1. **Child class A** does not override `__init__`
2. **Parent class B** does not inherit from `nn.Module`
3. **Parent class C** inherits from `nn.Module`

**Current asymmetric logic:**
```python
# Patching (entry): Only patch classes with explicit __init__
def _enable_class(cls):
    if '__init__' in cls.__dict__:  #  Strict check
        cls._old_init = cls.__init__
        cls.__init__ = partition_after(cls.__init__)

# Unpatching (exit): Restore any class with _old_init
def _disable_class(cls):
    if hasattr(cls, '_old_init'):  #  Permissive check
        cls.__init__ = cls._old_init
```

**Execution flow:**
1. **During entry**: Child A is skipped (no explicit `__init__`), Parent
C is patched
2. **During exit**: Child A inherits `_old_init` from Parent C and gets
incorrectly "restored"

**Result**: Child A's `__init__` points to Parent C's original
`__init__`, bypassing Parent B and breaking the inheritance chain.

### Reproduction Case

This pattern is common in Hugging Face models:
```python
class Qwen3ForSequenceClassification(GenericForSequenceClassification, Qwen3PreTrainedModel):
    pass  # No explicit __init__

# GenericForSequenceClassification - not a nn.Module subclass
# Qwen3PreTrainedModel - inherits from nn.Module
```

### Solution

Apply symmetric condition checking in both patch and unpatch operations:

```python
def _disable_class(cls):
    # Match the patching condition: only restore classes we explicitly patched
    if '__init__' in cls.__dict__ and hasattr(cls, '_old_init'):
        cls.__init__ = cls._old_init
        delattr(cls, '_old_init')  # Optional cleanup
```

This ensures that only classes that were explicitly patched during entry
get restored during exit.

### Testing

The fix has been validated against the Qwen3ForSequenceClassification
reproduction case and resolves the inheritance chain corruption.

### Related Issues
- External issue: https://github.com/modelscope/ms-swift/pull/5820

Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
v0.17.6
2025-09-19 07:24:33 +00:00
6b731c5c96 scripts: Check .is_cuda only in non-C++ files (#7561)
Today, check-torchcuda.py searches for all occurrences of .is_cuda in the
repository even when a commit only modifies C++ headers and sources, which
I believe is not intended.

Check usage of .is_cuda only when a commit modifies any non-C++ file.

Signed-off-by: Junjie Mao <junjie.mao@linux.alibaba.com>
2025-09-19 05:01:50 +00:00
2585881ae9 Make Muon optimizer easier to enable (#7555)
The original Muon optimizer PR
(https://github.com/deepspeedai/DeepSpeed/pull/7509) requires the user to
explicitly set the `use_muon` flag on `model.parameters()`, as shown in the test
https://github.com/deepspeedai/DeepSpeed/blob/master/tests/unit/ops/muon/test_muon.py#L27.

This PR integrates the setting of `use_muon` into DeepSpeed before engine
initialization, which makes the Muon optimizer easier to use. The user only
needs to change the optimizer in `config.json` from `AdamW` to `Muon`; no
code changes are needed. It solves the following issue:
https://github.com/deepspeedai/DeepSpeed/issues/7552
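
A hedged sketch of the intended usage: selecting Muon purely through the DeepSpeed config. The optimizer `params` shown here are illustrative placeholders, not the documented Muon options:

```python
ds_config = {
    "train_batch_size": 8,
    "zero_optimization": {"stage": 2},
    "optimizer": {
        "type": "Muon",            # previously "AdamW"; no model-code changes needed
        "params": {"lr": 2e-4},    # illustrative only
    },
}
```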

---------

Signed-off-by: Ma, Guokai <guokai.ma@intel.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
2025-09-17 09:52:11 -04:00
aa539c6dd5 fix npu device_id AttributeError issue (#7560)
## Environment
```
torch        2.7.1
torch_npu    2.7.1rc1
deepspeed    0.17.3
```
## Issue
An `AttributeError` is raised by `init_process_group` on NPU devices
since deepspeed v0.17.3.
The issue is similar to
https://github.com/deepspeedai/DeepSpeed/pull/7488.

Trace:
```
Traceback (most recent call last):
  File "/home/welsper/.local/lib/python3.10/site-packages/swift/cli/sft.py", line 10, in <module>
    sft_main()
  File "/home/welsper/.local/lib/python3.10/site-packages/swift/llm/train/sft.py", line 331, in sft_main
    return SwiftSft(args).main()
  File "/home/welsper/.local/lib/python3.10/site-packages/swift/llm/train/sft.py", line 27, in __init__
    super().__init__(args)
  File "/home/welsper/.local/lib/python3.10/site-packages/swift/llm/base.py", line 19, in __init__
    self.args = self._parse_args(args)
  File "/home/welsper/.local/lib/python3.10/site-packages/swift/llm/base.py", line 31, in _parse_args
    args, remaining_argv = parse_args(self.args_class, args)
  File "/home/welsper/.local/lib/python3.10/site-packages/swift/utils/utils.py", line 152, in parse_args
    args, remaining_args = parser.parse_args_into_dataclasses(argv, return_remaining_strings=True)
  File "/home/welsper/.local/lib/python3.10/site-packages/transformers/hf_argparser.py", line 358, in parse_args_into_dataclasses
    obj = dtype(**inputs)
  File "<string>", line 325, in __init__
  File "/home/welsper/.local/lib/python3.10/site-packages/swift/llm/argument/train_args.py", line 175, in __post_init__
    self.training_args = TrainerFactory.get_training_args(self)
  File "/home/welsper/.local/lib/python3.10/site-packages/swift/trainers/trainer_factory.py", line 70, in get_training_args
    return training_args_cls(**args_dict)
  File "<string>", line 167, in __init__
  File "/home/welsper/.local/lib/python3.10/site-packages/swift/trainers/arguments.py", line 152, in __post_init__
    super().__post_init__()
  File "/home/welsper/.local/lib/python3.10/site-packages/swift/trainers/arguments.py", line 133, in __post_init__
    super().__post_init__()
  File "/home/welsper/.local/lib/python3.10/site-packages/transformers/training_args.py", line 1803, in __post_init__
    self.device
  File "/home/welsper/.local/lib/python3.10/site-packages/transformers/training_args.py", line 2332, in device
    return self._setup_devices
  File "/home/welsper/.local/lib/python3.10/site-packages/transformers/utils/generic.py", line 74, in __get__
    cached = self.fget(obj)
  File "/home/welsper/.local/lib/python3.10/site-packages/transformers/training_args.py", line 2259, in _setup_devices
    self.distributed_state = PartialState(**accelerator_state_kwargs)
  File "/home/welsper/.local/lib/python3.10/site-packages/accelerate/state.py", line 216, in __init__
    dist.init_distributed(dist_backend=self.backend, auto_mpi_discovery=False, **kwargs)
  File "/home/welsper/.local/lib/python3.10/site-packages/deepspeed/comm/comm.py", line 854, in init_distributed
    cdb = TorchBackend(dist_backend, timeout, init_method, rank, world_size)
  File "/home/welsper/.local/lib/python3.10/site-packages/deepspeed/comm/torch.py", line 120, in __init__
    self.init_process_group(backend, timeout, init_method, rank, world_size)
  File "/home/welsper/.local/lib/python3.10/site-packages/deepspeed/comm/torch.py", line 163, in init_process_group
    torch.distributed.init_process_group(backend, **kwargs)
  File "/home/welsper/.local/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 81, in wrapper
    return func(*args, **kwargs)
  File "/home/welsper/.local/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 95, in wrapper
    func_return = func(*args, **kwargs)
  File "/home/welsper/.local/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1717, in init_process_group
    default_pg, _ = _new_process_group_helper(
  File "/home/welsper/.local/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1831, in _new_process_group_helper
    if device_id is not None and (device_id.index is None or device_id.type == "cpu"):
AttributeError: 'device' object has no attribute 'index'
```

## Fix
Switch `torch.npu.device(device_index)` to `torch.device('npu',
device_index)`.

Now:

d40a0f5de8/accelerator/npu_accelerator.py (L47-L48)

After fix:
```python
 def device(self, device_index=None): 
     return torch.device('npu', device_index) 
```

Signed-off-by: welsper <welsper@qq.com>
Co-authored-by: welsper <xinyuyang@cmbchina.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Ma, Guokai <guokai.ma@gmail.com>
2025-09-17 15:46:33 +08:00
2d84be8159 deepcompile: Create a full list of no-copy ops (#7562)
The list of torch no-copy ops is hard-coded and does not include all
operations that may alias inputs in their outputs.

Instead of using a fixed list, iterate over all ops under torch.ops.aten
and identify those with aliasing behavior by inspecting their schema (see
the sketch after the list below).

With PyTorch 2.7.1, the ops whose default overload is identified by the
updated logic include:

  - _nested_view_from_buffer
  - _reshape_alias
  - alias
  - as_strided
  - conj
  - detach
  - diagonal
  - expand
  - imag
  - lift_fresh
  - narrow
  - permute
  - pin_memory
  - positive
  - real
  - reshape
  - squeeze
  - t
  - unfold
  - unsqueeze
  - view
  - view_as_complex
  - view_as_real
  - most operations whose name ends with an underscore
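
A hedged sketch of the schema inspection described above. It relies on PyTorch's private `torch._C._jit_get_all_schemas()` helper and the `alias_info` field on schema returns, which may differ across versions; the actual DeepCompile change iterates `torch.ops.aten` directly:

```python
import torch

# Collect aten ops whose schema marks any output as aliasing an input.
aliasing_ops = sorted({
    schema.name
    for schema in torch._C._jit_get_all_schemas()
    if schema.name.startswith("aten::")
    and any(ret.alias_info is not None for ret in schema.returns)
})
print(len(aliasing_ops), aliasing_ops[:10])
```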

Signed-off-by: Junjie Mao <junjie.mao@linux.alibaba.com>
2025-09-16 09:05:11 -07:00
e9d5d416cc deepcompile: Record graph order using OrderedDict (#7563)
On clear, GraphOrder does not clear ordered_frames. That may confuse
subsequent passes after the first iteration.

Use an OrderedDict to record the mapping from frame IDs to other
graph-related information.

Also fix the type annotation of graph_order, which is a list of (int, bool)
tuples rather than a list of int.

Signed-off-by: Junjie Mao <junjie.mao@linux.alibaba.com>
2025-09-16 05:25:32 +00:00
660ee89529 deepcompile: Create dummy inputs using empty_strided (#7564)
CUDA tensors may have a larger storage than numel() * dtype.itemsize due
to alignment considerations. Creating dummy tensors with
torch.zeros().as_strided() leads to out-of-bounds errors in such cases.

Create dummy inputs with empty_strided().zero_() instead.
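
A small CPU illustration of the storage-size mismatch (the CUDA alignment case in this PR behaves analogously; the sizes here are arbitrary):

```python
import torch

size, stride = (4, 4), (8, 1)                       # row stride 8 needs 3*8 + 4 = 28 elements
dummy = torch.empty_strided(size, stride).zero_()   # buffer sized from the strides
print(dummy.shape, dummy.stride(), dummy.untyped_storage().nbytes())

try:
    torch.zeros(16).as_strided(size, stride)        # only numel() = 16 elements of storage
except RuntimeError as err:
    print(err)                                      # out of bounds for that storage
```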

Signed-off-by: Junjie Mao <junjie.mao@linux.alibaba.com>
2025-09-15 14:19:06 -07:00
d40a0f5de8 Add dependency for deepcompile test (#7558)
This PR adds dependency to CI tests for DeepCompile.

---------

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
2025-09-13 00:45:08 -07:00
b9bd03a2ec Move modal tests to tests/v1 (#7557)
This PR moves active tests under `tests/unit/v1` to clarify which tests
are run on modal.

---------

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
2025-09-12 17:28:47 -04:00
0e859aa0d3 Fix gradient buffer access for DeepCompile Z1/2 (#7548)
The initialization of DeepCompile+Z1/2 now fails due to the change
introduced in #7509.

This PR resolves the issue by:
- Adding an argument to optimizer.get_flat_partition
- Skipping the entire allreduce function in the engine

---------

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
2025-09-10 18:12:02 +00:00
0012ff6ea8 Limit random seed range in tests (#7553)
`pytest-randomly` often passes a large seed value to `set_random_seed`
and causes an error
([example](https://github.com/deepspeedai/DeepSpeed/actions/runs/17620450004/job/50064585974))
```
E ValueError: Seed must be between 0 and 2**32 - 1
```

This PR limits the range of seed values by taking a modulo.
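
A minimal sketch of the clamp, with `set_random_seed` standing in for the test utility:

```python
import random

import numpy
import torch

def set_random_seed(seed: int) -> None:
    seed %= 2**32                 # keep any pytest-randomly value within [0, 2**32 - 1]
    random.seed(seed)
    numpy.random.seed(seed)
    torch.manual_seed(seed)

set_random_seed(2**37 + 123)      # would raise ValueError in numpy without the modulo
```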

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
2025-09-10 17:45:37 +00:00
8cbbbb539d [MoE] Fix misuse of num_experts as expert parallel group size (ep_size) (#7551)
Fixes #7535 

## Description
This PR fixes a bug in inference/engine.py where num_experts
(moe_experts) was incorrectly passed as the expert parallel group size
(ep_size) when creating expert parallel groups.

Currently:
```
if moe and dist.get_world_size() > 1:
    self._create_ep_parallel_group(config.moe.moe_experts)
```
This causes **invalid** behavior whenever `num_experts > world_size`,
because `_create_ep_parallel_group` expects a group size, not the total
number of experts as pointed out by @Arnoochka

## Root Cause

num_experts = number of experts inside the MoE layer.

ep_size = how many GPUs to group together for expert parallelism.

These were mixed up in the code.

## Fix

Replaced the incorrect call with the proper ep_size argument:
```
if moe and dist.get_world_size() > 1:
    self._create_ep_parallel_group(config.moe.ep_size)
```


Additionally, added a safety check in _create_ep_parallel_group to catch
invalid configurations:

```
num_ep_groups = dist.get_world_size() // moe_ep_size
if num_ep_groups == 0:
    raise ValueError(
        f"Invalid ep_size={moe_ep_size} for world_size={dist.get_world_size()}"
    )
```
## Backward compatibility
- If a user was already running with ep_size >= num_experts, the old code
happened to work and continues to work.
- Only the previously broken case (num_experts > world_size) now behaves
correctly.

Signed-off-by: Flakes342 <ayushtanwar1729@gmail.com>
2025-09-09 22:31:44 -07:00
533e834b0a [alstn tutorial] support bs>1 (#7550)
Edit the tutorial's demo code to support bs>1 and prevent division by zero
2025-09-09 12:51:42 -07:00
450b965efb Revert "Add index to HPU devices (#7497)" (#7545)
This reverts commit 047a7599d24622dfb37fa5e5a32c671b1bb44233.

Unfortunately, the above required a substantial redesign of the existing HPU
stack, which is currently not feasible, so it is being reverted.

Signed-off-by: Max Kovalenko <mkovalenko@habana.ai>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
2025-09-08 18:07:55 -04:00
b82ef716c8 Improve error message and reduce validation in autocast test (#7547)
This PR improves error logging and relaxes loss value checks in the
autocast test.

Previously, the test displayed error messages and mismatched loss values
on all ranks, even if the mismatch only occurred on some ranks. This was
confusing, since logs from other ranks could appear correct. This PR
changes the behavior so that error messages are shown only on the ranks
where the mismatch occurs.

Additionally, this PR skips loss value validation for
`test_lower_precision_model`, where we intentionally use a different
communication dtype from the baseline (standard PyTorch autocast).

---------

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2025-09-05 07:04:18 +00:00
08879a3916 avoid setting device_id to init_process_group (#7542)
In some use cases, such as vLLM, we need to create new distributed groups
not only on GPU but also on CPU. If we set `device_id` here, it prevents us
from creating a new distributed group on CPU:
[L230](https://github.com/vllm-project/vllm/blob/main/vllm/distributed/parallel_state.py#L230).
This PR fixes this bug.

---------

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2025-09-05 06:06:26 +00:00
78a74874b2 fix get_cuda_compile_flag (#7521)
Command: `python3 -c 'import deepspeed;deepspeed.ops.adam.cpu_adam.CPUAdamBuilder().load()'`
When running on the ROCm platform, it encounters an error:

```
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 538, in load
    return self.jit_load(verbose)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 570, in jit_load
    cxx_args = self.strip_empty_entries(self.cxx_args())
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 401, in strip_empty_entries
    return [x for x in args if len(x) > 0]
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 401, in <listcomp>
    return [x for x in args if len(x) > 0]
TypeError: object of type 'NoneType' has no len()
```

Compare with version 0.16.5:
https://github.com/deepspeedai/DeepSpeed/blob/v0.16.5/op_builder/builder.py#L435
The current version of the code is missing a return when
self.is_rocm_pytorch() is True; adding `return '-D__DISABLE_CUDA__'`
fixes it.
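
A hedged sketch of the one-line fix, following the v0.16.5 builder linked above; the surrounding OpBuilder class and the CUDA branch are elided:

```python
def get_cuda_compile_flag(self):
    if self.is_rocm_pytorch():
        return '-D__DISABLE_CUDA__'   # previously missing, so cxx_args() contained None
    ...                               # CUDA path unchanged (elided)
```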

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
2025-09-04 12:34:17 -04:00
43537d0a60 Autotune ZenFlow affinity (#7506)
This PR addresses the following ZenFlow optimizer core binding issue:
https://github.com/deepspeedai/DeepSpeed/issues/7478

With this PR, the ZenFlow optimizer worker derives its core binding from
the DeepSpeed core binding mechanism. The algorithm is as follows (see the
sketch after this list):
1. Each DeepSpeed rank gets its core binding via the DeepSpeed command line
flag `--bind_cores_to_rank`, which assigns the CPU physical cores to the
different workers.
2. When spawning the ZenFlow optimizer worker, DeepSpeed splits the current
CPU affinity list into two sublists: pt_affinity and zf_affinity.
3. zf_affinity is used to set the affinity of the ZenFlow optimizer worker;
pt_affinity is used for the current PyTorch process.
4. By default, one core is reserved for each PyTorch process and the rest
are used by the ZenFlow optimizer worker. The number of cores reserved for
the PyTorch process can be changed with the ZenFlow config variable
`pt_reserved_cores`.
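
A hedged, Linux-only sketch of the affinity split described in the list above; the function and variable names follow the description and are not the actual DeepSpeed implementation:

```python
import os

def split_affinity(pt_reserved_cores: int = 1):
    # Cores assigned to this rank, e.g. by `deepspeed --bind_cores_to_rank`.
    cores = sorted(os.sched_getaffinity(0))
    pt_affinity = cores[:pt_reserved_cores]      # kept by the PyTorch process
    zf_affinity = cores[pt_reserved_cores:]      # handed to the ZenFlow optimizer worker
    return pt_affinity, zf_affinity

pt, zf = split_affinity()
os.sched_setaffinity(0, pt)                      # pin the current process to its share
print("pytorch cores:", pt, "zenflow cores:", zf)
```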

---------

Signed-off-by: Guokai Ma <guokai.ma@gmail.com>
Signed-off-by: Ma, Guokai <guokai.ma@intel.com>
Signed-off-by: aeeeeeep <aeeeeeep@proton.me>
Signed-off-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: aeeeeeep <aeeeeeep@proton.me>
Co-authored-by: Zhipeng Wang <zhipeng.rainbowserie@gmail.com>
Co-authored-by: Zhipeng Wang <zwanga@wustl.edu>
Co-authored-by: Peng Du <pedu@linkedin.com>
Co-authored-by: pengdurice <pengduhit@gmail.com>
Co-authored-by: Zhipeng Wang <zhipengbayern@gmail.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2025-09-04 07:10:39 -04:00
66bf2a642d Relax restrictions of torch.autocast integration (#7543)
This PR relaxes two restrictions on torch.autocast in the DeepSpeed
engine:

1) Nesting torch.autocast
Currently, we do not expect `torch.autocast` to be used outside the
DeepSpeed engine. Here is the current behavior:
- If `torch.autocast` is enabled in the DeepSpeed config and the engine
detects it is also enabled outside, a warning is displayed.
- If it is disabled in the config, the engine raises an error.

This design prevents the following usage:
```python
with torch.autocast(...):
    logits = deepspeed_model(...)
    loss = criteria_fn(logits)
```
In this case, we also want to apply autocast to `criteria_fn`. With the
current behavior, we would need to move `deepspeed_model(...)` outside the
`torch.autocast` context, leading to inconsistent code between DeepSpeed
and non-DeepSpeed setups (this cannot be handled with the `enabled` arg of
`torch.autocast`).

Change in this PR:
`torch.autocast` outside the DeepSpeed engine is ignored, and
- If `torch_autocast` is enabled in the config, DeepSpeed will follow
that setting.
- If it is disabled, DeepSpeed falls back to its own mixed-precision
support (or FP32).

In these cases, the DeepSpeed engine shows a message explaining the
behavior.

2) Model’s dtype

Previously, DeepSpeed assumed the model’s dtype must be FP32 when
`torch.autocast` was enabled. However, models with lower-precision
parameters (e.g., BF16) can also be used with autocast. For example, if
both the model and `torch.autocast` use BF16, autocast will upcast
precision-sensitive ops as needed.

Change in this PR:
Removed the assertion that restricted the model’s dtype to FP32.

This PR also adds and updates tests to cover these new behaviors.

---------

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
2025-09-03 12:15:10 -07:00
8af75487f4 Fix zenflow_torch_adam.py (#7544)
The `_disable_dynamo_if_unsupported` fallback wasn't getting created under
certain conditions. This PR fixes that and also removes a debug print.

Fixes an issue installing deepspeed on torch 2.4 and 2.1 that triggered
this:
```
#42 15.84       Traceback (most recent call last):
#42 15.84         File "<string>", line 2, in <module>
#42 15.84         File "<pip-setuptools-caller>", line 34, in <module>
#42 15.84         File "/tmp/pip-install-qgzd6ybt/deepspeed_b3b4858a062d49c7b8d6ef31332a96cf/setup.py", line 40, in <module>
#42 15.84           from op_builder import get_default_compute_capabilities, OpBuilder
#42 15.84         File "/tmp/pip-install-qgzd6ybt/deepspeed_b3b4858a062d49c7b8d6ef31332a96cf/op_builder/__init__.py", line 18, in <module>
#42 15.84           import deepspeed.ops.op_builder  # noqa: F401 # type: ignore
#42 15.84         File "/tmp/pip-install-qgzd6ybt/deepspeed_b3b4858a062d49c7b8d6ef31332a96cf/deepspeed/__init__.py", line 25, in <module>
#42 15.84           from . import ops
#42 15.84         File "/tmp/pip-install-qgzd6ybt/deepspeed_b3b4858a062d49c7b8d6ef31332a96cf/deepspeed/ops/__init__.py", line 6, in <module>
#42 15.84           from . import adam
#42 15.84         File "/tmp/pip-install-qgzd6ybt/deepspeed_b3b4858a062d49c7b8d6ef31332a96cf/deepspeed/ops/adam/__init__.py", line 9, in <module>
#42 15.84           from .zenflow_torch_adam import ZenFlowSelectiveAdamW
#42 15.84         File "/tmp/pip-install-qgzd6ybt/deepspeed_b3b4858a062d49c7b8d6ef31332a96cf/deepspeed/ops/adam/zenflow_torch_adam.py", line 685, in <module>
#42 15.84           @_disable_dynamo_if_unsupported(single_tensor_fn=_single_tensor_adamw)
#42 15.84       NameError: name '_disable_dynamo_if_unsupported' is not defined
#42 15.84       [WARNING] ZenFlow disabled: torch internal optimizer symbols could not be imported: cannot import name '_disable_dynamo_if_unsupported' from 'torch.optim.optimizer' (/usr/local/lib/python3.10/dist-packages/torch/optim/optimizer.py)
```

---------

Signed-off-by: Stas Bekman <stas@stason.org>
2025-09-03 18:14:18 +00:00
1e183a6a9d Fix scaling and allgather with torch.autocast (#7534)
This PR includes these two fixes:
- Use GradScaler only for FP16 (not for BF16)
- Fix dtype conversion for ZeRO3 allgather
  - The reduce hook should be called only once, even when a parameter is
shared across multiple layers (tied parameters).
  - Currently, the hook is triggered at each tied layer because we
temporarily set `.data` with a different dtype.
  - The fix ensures that the parameter consistently retains the same
dtype.

---------

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
Signed-off-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Signed-off-by: Stas Bekman <stas@stason.org>
Signed-off-by: jakehemmerle <jakehemmerle@protonmail.com>
Signed-off-by: Qi Bin <qibin0506@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: digger yu <digger-yu@outlook.com>
Co-authored-by: Jake Hemmerle <jakehemmerle@gmail.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Qi Bin <qibin0506@users.noreply.github.com>
2025-09-03 01:22:19 +00:00
c07b3abf9a fixed DeepSpeedCPULion with ZeRO-Offload bug (#7531)
fixed DeepSpeedCPULion with ZeRO-Offload bug
[issues/7524](https://github.com/deepspeedai/DeepSpeed/issues/7524)

Signed-off-by: Qi Bin <qibin0506@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2025-09-02 21:40:14 +00:00
4d83f3fe13 docs typo: lrrt.md, reference to cycle_min_lr should be cycle_max_lr (#7530)
Signed-off-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Signed-off-by: jakehemmerle <jakehemmerle@protonmail.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2025-09-02 21:17:22 +00:00
9e4957eb30 [doc] fixing moe tutorial (#7538)
MoE tutorial fixes:
1. cifar example has been moved - fix the url
2. fixing text and improving markup

---------

Signed-off-by: Stas Bekman <stas@stason.org>
2025-09-02 16:53:15 -04:00
066d912052 [logging] less startup noise (#7526)
This PR removes some startup noise and enables removing more - especially
messages that are replicated rank-times and don't carry any informative
payload.

1. Add a `--log_level` flag which sets the launcher's logger to a desired
setting - defaulting to `logging.INFO` for now for BC, but this will change
to `logging.WARNING` in v1.
2. Add a `--quiet/-q` flag which sets the launcher's logger to
`logging.ERROR`, essentially disabling startup info messages.
3. Change the logging defaults elsewhere to `logging.WARNING` (the main
impact is `accelerator.py`). Once DeepSpeed has started, the framework
controls the log level for each rank, so the tricky part is these pre-start
info logs. This part breaks BC, as there is no machinery to set the logger
level for `real_accelerator.py`.
4. The builder is changed to non-verbose (BC breaking).

---------

Signed-off-by: Stas Bekman <stas@stason.org>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
2025-09-02 19:14:57 +00:00
411e20a3f7 undo the revert (#7536)
replay https://github.com/deepspeedai/DeepSpeed/pull/3019 as it got
reverted
2025-09-02 14:24:48 -04:00
902e78c989 fix typo s/1014 /1024 (#7528)
fix typo s/1014 /1024  
         s/was_interruptted /was_interrupted

detail info 
        modified:   deepspeed/autotuning/scheduler.py
        modified:   deepspeed/autotuning/utils.py

Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
2025-09-01 01:12:40 +00:00
eabb687ac1 ZeRO3: Improve mismatch detection (#7525)
ZeRO3 tracks DDP (SPMD) behavior by matching the values of different
training states across ranks. Some of these states are represented as
lists, and mismatches sometimes manifest as hangs during error detection.
This PR improves error detection by first validating the list lengths
across ranks before validating the list contents.
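
A hedged sketch of the length-first validation, assuming an initialized process group; the collective usage here is illustrative, not the engine's actual code:

```python
import torch
import torch.distributed as dist

def validate_list_across_ranks(values: list, name: str = "state") -> None:
    # Exchange only the lengths first: this fixed-size all_gather cannot hang on
    # ragged inputs, so a divergence surfaces as an explicit error instead of a stall.
    length = torch.tensor([len(values)], dtype=torch.long)
    gathered = [torch.zeros_like(length) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, length)
    lengths = [int(t.item()) for t in gathered]
    if len(set(lengths)) != 1:
        raise RuntimeError(f"{name} length mismatch across ranks: {lengths}")
    # Only once lengths agree is it safe to compare the list contents element-wise.
```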

Motivated by
https://github.com/deepspeedai/DeepSpeed/issues/7461#issuecomment-3235146207

---------

Signed-off-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2025-08-31 17:57:10 -04:00
9bf215d213 Add riscv64 cpu support in deepspeed_shm_comm op (#7519)
This patch adds riscv64 support for the deepspeed_shm_comm operator,
enabling DeepSpeed to perform CPU training/inference on RISC-V 64 hosts for
research purposes. Based on the discussion in pull #7387, this patch
refactors some of the original code to support multiple CPU architectures.

Related tests pass on x86 and RISC-V CPUs, and I successfully ran Qwen2.5
on a RISC-V CPU:
```bash
(myenv) [root@openeuler-riscv64 DeepSpeed ]$ pytest tests/unit/comm/test_dist.py::TestDistInferenceAllReduce -vv
====================================================================== test session starts =======================================================================
platform linux -- Python 3.11.4, pytest-7.2.0, pluggy-1.6.0 -- /root/myenv/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default'
rootdir: /root/ecosystem/DeepSpeed/tests, configfile: pytest.ini
plugins: mock-3.14.1, hypothesis-6.135.14, forked-1.6.0
collected 3 items

tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype0] PASSED                                                                              [ 33%]
tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype1] PASSED                                                                              [ 66%]
tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype2] PASSED                                                                              [100%]

(myenv) root@ubuntu-2204:~/soft-working-dir/DeepSpeed# pytest tests/unit/comm/test_dist.py::TestDistInferenceAllReduce -vv
====================================================================== test session starts =======================================================================
platform linux -- Python 3.12.3, pytest-7.2.0, pluggy-1.6.0 -- /root/soft-working-dir/myenv/bin/python3
cachedir: .pytest_cache
rootdir: /root/soft-working-dir/DeepSpeed/tests, configfile: pytest.ini
plugins: forked-1.6.0
collected 3 items

tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype0] PASSED                                                                              [ 33%]
tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype1] PASSED                                                                              [ 66%]
tests/unit/comm/test_dist.py::TestDistInferenceAllReduce::test[dtype2] PASSED                                                                              [100%]

```

---------

Signed-off-by: heyujiao99 <he.yujiao@sanechips.com.cn>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Ma, Guokai <guokai.ma@gmail.com>
2025-08-29 23:41:25 +08:00
e04fa3e679 Update README with ZenFlow release blog featured by PyTorch. (#7520)
**Main change:**
Add a post bullet and a link to the ZenFlow release blog in the latest news.

**Blog link:**

https://pytorch.org/blog/zenflow-stall-free-offloading-engine-for-llm-training/

---------

Signed-off-by: Tingfeng Lan <erc8gx@virginia.edu>
2025-08-28 13:28:08 -04:00
889f0ead27 Enable non-ZeRO mode (#7515)
Enabled via `stage=0`, which corresponds to DDP.
- Remove hardwired path to bf16_optimizer.
- Enable `torch.autocast` for DDP training.
- Enable native mixed-precision DDP for bfloat16.
- Update torch.autocast and native mixed-precision UTs.

(Screenshot: https://github.com/user-attachments/assets/92904cdc-e312-46a4-943f-011eb5ab146a)

---------

Signed-off-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2025-08-27 14:07:29 -04:00
66ad278048 Enabling Muon Optimizer in DeepSpeed (#7509)
Authorship: @pengdurice and @PKUWZP 

Related Issue: #7438

# Introduction

[Muon](https://arxiv.org/abs/2502.16982), a new optimizer that has recently
attracted the community's attention, shows promising results in training
large language models. Adding the Muon optimizer to DeepSpeed, a popular
OSS framework for large-scale training and inference, is critically
important for DeepSpeed users and developers. There has been a
[PR](https://github.com/deepspeedai/DeepSpeed/pull/7454) attempting the
adoption (huge thanks to @qimcis), which is a good starting point. It still
requires substantial effort to make it fully compatible with and work
within DeepSpeed. We are publishing this PR to fully enable Muon optimizer
capabilities in DeepSpeed.

# Issues and solutions
## Issues
1. With stage 1, 2 or 3, the optimizer states will be partitioned within
the same data parallel group. This means that each process is already
handling only parts of the model parameters and there is no need to use
the DP solution as in the
[code](https://github.com/KellerJordan/Muon/blob/master/muon.py#L195).
2. The parameters (and the gradients) will be flattened to a 1D vector
before being used in the optimizer, thus nullifying the major hypothesis
of the Muon optimizer: it works by orthogonalizing the updates for each
matrix (dim >= 2).

## Solutions
To solve the issues, we propose this new PR in which: 
1. We simplify the Muon code by
[removing](https://github.com/deepspeedai/DeepSpeed/compare/master...pengdurice:DeepSpeed:peng-add-muon-v1#diff-c9052994e41caee9ca88363749c10af08655f8019f08dc971c018663d25a3712R22)
the partitioning and muon updates logics.

2. We
[move](https://github.com/deepspeedai/DeepSpeed/compare/master...pengdurice:DeepSpeed:peng-add-muon-v1#diff-99dcf26ea2876ff5bbf05b5165c4133eaa0d0f36b170685643c2f7e2eb566addR1867)
the muon update to the
[get_flat_partition](https://github.com/deepspeedai/DeepSpeed/compare/master...pengdurice:DeepSpeed:peng-add-muon-v1#diff-99dcf26ea2876ff5bbf05b5165c4133eaa0d0f36b170685643c2f7e2eb566addR1848)
function of stage 1 and 2 DeepSpeedZeroOptimizer in which per parameter
gradients are collected before being flattened and used by the optimizer
to update the model parameters. Since each parameter is still in its
original shape, we can easily apply the muon updates.
3. We also save the momentum buffer into the optimizer's state so that
convergence remains smooth after resuming from saved checkpoints.
4. We added comprehensive unit tests to validate Muon Optimizer's
correctness and functionality.

# Future directions and roadmap
In the future, several follow up works are of interests:
- [ ] Create a CPU offload version.
- [ ] Apply Muon to Stage 3
- [ ] Use the highly optimized version of Adam for the Adam part of
MuonWithAuxAdam optimizer.
- [ ] More efficient implementations e.g. a) add specialized kernels for
Newton-Schulz iteration and muon updates; b) parallelize updates for the
parameters (currently, each parameter is updated separately and
sequentially)

---------

Co-authored-by: Peng Du <pedu@linkedin.com>
Co-authored-by: pengdurice <pengduhit@gmail.com>
Co-authored-by: Zhipeng Wang <zhipengbayern@gmail.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
2025-08-26 18:34:35 -07:00
e4662faffd Update TSC Committers (#7517)
Update the affiliations and the TSC Committers.

Co-authored-by: Zhipeng Wang <zwanga@wustl.edu>
2025-08-26 07:24:12 -04:00
38d1a9eb64 Fix assert when 'pp_int' object has no attribute 'custom_print_str' (#7507)
Fix the assert `'pp_int' object has no attribute 'custom_print_str'` when
tracking the deepspeed module with tracing/debug tools like
[objwatch](https://github.com/aeeeeeep/objwatch):

```python3
    import objwatch
    objwatch.watch(targets=[deepspeed], framework="torch.distributed", indexes=[0,], with_locals=True)
```

Signed-off-by: aeeeeeep <aeeeeeep@proton.me>
2025-08-25 10:57:08 -04:00
d9cb78683e CI funding shout out to modal.com (#7503)
modal.com has been sponsoring our CI - thank you, Modal! Add a shout
out.
2025-08-21 10:03:49 -07:00
bc8c0db3b4 Support DeepSpeed offload and reload states with ZeRO1 and ZeRO2 (#7421)
Please refer to https://github.com/deepspeedai/DeepSpeed/issues/7251

---------

Signed-off-by: lym <letusgo126@126.com>
Signed-off-by: Max Kovalenko <mkovalenko@habana.ai>
Signed-off-by: Alex Kiefer <alexkiefer51@gmail.com>
Signed-off-by: Stas Bekman <stas@stason.org>
Signed-off-by: Sam Foreman <saforem2@gmail.com>
Signed-off-by: Stas Bekman <stas.bekman@snowflake.com>
Signed-off-by: huanyuqu <yc37960@um.edu.mo>
Signed-off-by: weeknan <zhounan0431@163.com>
Signed-off-by: WoosungMyung <dntjd517@naver.com>
Signed-off-by: Nir Sonnenschein <nsonnenschein@habana.ai>
Signed-off-by: Junjie Mao <banxing.mjj@alibaba-inc.com>
Signed-off-by: vinceliu <lpnpcs@gmail.com>
Signed-off-by: Tingfeng Lan <erc8gx@virginia.edu>
Signed-off-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Signed-off-by: Olatunji Ruwase <tjruwase@gmail.com>
Signed-off-by: Tunji Ruwase <tunji.ruwase@snowflake.com>
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Signed-off-by: Yusen Wu <xrn4ub@virginia.edu>
Signed-off-by: cyy <cyyever@outlook.com>
Co-authored-by: Max Kovalenko <mkovalenko@habana.ai>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
Co-authored-by: Alexander Kiefer <56556451+alexk101@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Hongwei Chen <33092912+hwchen2017@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Sam Foreman <saforem2@gmail.com>
Co-authored-by: Stas Bekman <stas.bekman@snowflake.com>
Co-authored-by: huanyuqu <55744355+huanyuqu@users.noreply.github.com>
Co-authored-by: weeknan <57584045+weeknan@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <tjruwase@gmail.com>
Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>
Co-authored-by: Logan Adams <loadams@microsoft.com>
Co-authored-by: Zhipeng Wang <zhipeng.rainbowserie@gmail.com>
Co-authored-by: WoosungMyung <115716986+WoosungMyung@users.noreply.github.com>
Co-authored-by: Nir Sonnenschein <nsonnenschein@habana.ai>
Co-authored-by: Junjie Mao <junjie.mao@hotmail.com>
Co-authored-by: Junjie Mao <banxing.mjj@alibaba-inc.com>
Co-authored-by: lpnpcs <lpnpcs@vip.qq.com>
Co-authored-by: Ma, Guokai <guokai.ma@gmail.com>
Co-authored-by: Tingfeng Lan <tafflann@outlook.com>
Co-authored-by: Rui Yan <49115835+yanrui27@users.noreply.github.com>
Co-authored-by: Feng Yunlong <20281571+AlongWY@users.noreply.github.com>
Co-authored-by: Yao Matrix <matrix.yao@intel.com>
Co-authored-by: Tingfeng Lan <erc8gx@virginia.edu>
Co-authored-by: Yusen Wu <xrn4ub@virginia.edu>
Co-authored-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Michael Wyatt <michael.wyatt@snowflake.com>
2025-08-20 22:03:26 +00:00
f45159e415 Update version.txt after 0.17.5 release (#7502) 2025-08-20 21:41:57 +00:00