[ALST] fix typo in the url (#7444)

Fixing the misspelled URL.

---------

Signed-off-by: Stas Bekman <stas@stason.org>
Stas Bekman authored 2025-07-23 12:33:23 -07:00, committed by GitHub
parent 3bf53451e5
commit 1d10d48291
3 changed files with 3 additions and 3 deletions


@@ -21,7 +21,7 @@ ALST features found in this module:
This module implements Arctic Long Sequence Training: Scalable And Efficient Training For Multi-Million Token Sequences: https://arxiv.org/abs/2506.13996
-For integration docs see: https://www.deepspeed.ai/tutorials/ulysses-alst-sequence-pallellism/
+For integration docs see: https://www.deepspeed.ai/tutorials/ulysses-alst-sequence-parallelism/
The other ALST features live inside
https://github.com/snowflakedb/ArcticTraining/blob/main/projects/sequence-parallelism/


@@ -122,7 +122,7 @@ lnav:
- title: 'Transformer Kernel'
url: /tutorials/transformer_kernel/
- title: 'Arctic Long Sequence Training (ALST) for HF Transformers integration'
-    url: /tutorials/ulysses-alst-sequence-pallellism
+    url: /tutorials/ulysses-alst-sequence-parallelism
- title: 'ZeRO-Offload'
url: /tutorials/zero-offload/
- title: 'ZeRO'


@@ -5,7 +5,7 @@ tags: training sequence-parallelism
In this tutorial we describe how to enable DeepSpeed-Ulysses for Megatron-DeepSpeed. DeepSpeed-Ulysses is a simple but highly communication- and memory-efficient sequence parallelism approach for training large transformer models with massive sequence lengths. It partitions input tensors along the sequence dimension and uses a communication-efficient all-to-all collective for distributed attention computation. Additionally, DeepSpeed-Ulysses incorporates advanced modeling and system optimizations, such as FlashAttention, sparse attention, and the ZeRO optimizer, to improve both computational efficiency and memory usage. Training with DeepSpeed sequence parallelism allows both model size and sequence length to scale nearly indefinitely, unbounded by single-GPU memory limitations, at a high fraction of peak compute performance. Currently, DeepSpeed-Ulysses can handle sequences up to 1 million tokens in length (10 times the size of a complete Harry Potter book!) on 64 A100 GPUs. Please read our [DeepSpeed-Ulysses blog](https://github.com/deepspeedai/DeepSpeed/tree/master/blogs/deepspeed-ulysses) to learn more!
-If you're interested in a newer version that works with HF Transformers, please see https://www.deepspeed.ai/tutorials/ulysses-alst-sequence-pallellism
+If you're interested in a newer version that works with HF Transformers, please see https://www.deepspeed.ai/tutorials/ulysses-alst-sequence-parallelism
## 1. Installation
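
The tutorial paragraph in the hunk above describes the core idea: shard inputs along the sequence dimension and use all-to-all collectives so each rank runs full-sequence attention over a subset of heads. Below is a minimal, hypothetical sketch of that pattern, not DeepSpeed's actual implementation. It assumes an initialized `torch.distributed` process group, ranks holding contiguous sequence chunks in rank order, a head count divisible by the world size, and only indicates the reverse all-to-all in a comment.

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F


def seq_to_heads_all_to_all(x: torch.Tensor, world_size: int) -> torch.Tensor:
    """Trade a local sequence shard for a local head shard.

    x: [batch, seq_local, heads, head_dim] on every rank.
    Returns: [batch, seq_local * world_size, heads // world_size, head_dim].
    """
    b, s, h, d = x.shape
    # Split the head dimension into world_size chunks so chunk j is sent to rank j.
    x = x.reshape(b, s, world_size, h // world_size, d).permute(2, 0, 1, 3, 4).contiguous()
    out = torch.empty_like(x)
    dist.all_to_all_single(out, x)  # out[i] is rank i's sequence chunk for our local heads
    # Concatenate the received sequence chunks in rank order to recover the full sequence.
    return out.permute(1, 0, 2, 3, 4).reshape(b, world_size * s, h // world_size, d)


def ulysses_style_attention(q, k, v):
    """q, k, v: [batch, seq_local, heads, head_dim], sequence-sharded across ranks."""
    world_size = dist.get_world_size()
    q, k, v = (seq_to_heads_all_to_all(t, world_size) for t in (q, k, v))
    # Ordinary full-sequence attention, but only over this rank's subset of heads.
    out = F.scaled_dot_product_attention(
        q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)
    ).transpose(1, 2)
    # A second (reverse) all-to-all would restore sequence sharding here.
    return out
```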