[ALST] fix typo in the url (#7444)

Fixing the misspelled URL.

---------

Signed-off-by: Stas Bekman <stas@stason.org>
Stas Bekman authored 2025-07-23 12:33:23 -07:00, committed by GitHub
parent 3bf53451e5
commit 1d10d48291
3 changed files with 3 additions and 3 deletions


@@ -21,7 +21,7 @@ ALST features found in this module:
This module implements Arctic Long Sequence Training: Scalable And Efficient Training For Multi-Million Token Sequences: https://arxiv.org/abs/2506.13996
-For integration docs see: https://www.deepspeed.ai/tutorials/ulysses-alst-sequence-pallellism/
+For integration docs see: https://www.deepspeed.ai/tutorials/ulysses-alst-sequence-parallelism/
The other ALST features live inside
https://github.com/snowflakedb/ArcticTraining/blob/main/projects/sequence-parallelism/


@@ -122,7 +122,7 @@ lnav:
- title: 'Transformer Kernel'
url: /tutorials/transformer_kernel/
- title: 'Arctic Long Sequence Training (ALST) for HF Transformers integration'
-    url: /tutorials/ulysses-alst-sequence-pallellism
+    url: /tutorials/ulysses-alst-sequence-parallelism
- title: 'ZeRO-Offload'
url: /tutorials/zero-offload/
- title: 'ZeRO'


@@ -5,7 +5,7 @@ tags: training sequence-parallelism
In this tutorial we describe how to enable DeepSpeed-Ulysses for Megatron-DeepSpeed. DeepSpeed-Ulysses is a simple but highly communication- and memory-efficient sequence parallelism approach for training large transformer models with massive sequence lengths. It partitions input tensors along the sequence dimension and uses a communication-efficient all-to-all collective for distributed attention computation. Additionally, DeepSpeed-Ulysses incorporates advanced modeling and system optimizations, such as FlashAttention, sparse attention, and the ZeRO optimizer, to improve both computational efficiency and memory usage. Training with DeepSpeed sequence parallelism allows both model size and sequence length to scale nearly indefinitely, unbounded by single-GPU memory limitations, at a high fraction of peak compute performance. Currently, DeepSpeed-Ulysses can handle sequences up to 1 million tokens in length (10 times the size of a complete Harry Potter book!) on 64 A100 GPUs. Please read our [DeepSpeed-Ulysses blog](https://github.com/deepspeedai/DeepSpeed/tree/master/blogs/deepspeed-ulysses) to learn more!
-If you're interested in a newer version that works with HF Transformers, please see https://www.deepspeed.ai/tutorials/ulysses-alst-sequence-pallellism
+If you're interested in a newer version that works with HF Transformers, please see https://www.deepspeed.ai/tutorials/ulysses-alst-sequence-parallelism
## 1. Installation
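
The tutorial paragraph in the hunk above describes the core idea: shard inputs along the sequence dimension and use all-to-all collectives so each rank runs full-sequence attention over a subset of heads. Below is a minimal, hypothetical sketch of that pattern, not DeepSpeed's actual implementation. It assumes an initialized `torch.distributed` process group, ranks holding contiguous sequence chunks in rank order, a head count divisible by the world size, and only indicates the reverse all-to-all in a comment.

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F


def seq_to_heads_all_to_all(x: torch.Tensor, world_size: int) -> torch.Tensor:
    """Trade a local sequence shard for a local head shard.

    x: [batch, seq_local, heads, head_dim] on every rank.
    Returns: [batch, seq_local * world_size, heads // world_size, head_dim].
    """
    b, s, h, d = x.shape
    # Split the head dimension into world_size chunks so chunk j is sent to rank j.
    x = x.reshape(b, s, world_size, h // world_size, d).permute(2, 0, 1, 3, 4).contiguous()
    out = torch.empty_like(x)
    dist.all_to_all_single(out, x)  # out[i] is rank i's sequence chunk for our local heads
    # Concatenate the received sequence chunks in rank order to recover the full sequence.
    return out.permute(1, 0, 2, 3, 4).reshape(b, world_size * s, h // world_size, d)


def ulysses_style_attention(q, k, v):
    """q, k, v: [batch, seq_local, heads, head_dim], sequence-sharded across ranks."""
    world_size = dist.get_world_size()
    q, k, v = (seq_to_heads_all_to_all(t, world_size) for t in (q, k, v))
    # Ordinary full-sequence attention, but only over this rank's subset of heads.
    out = F.scaled_dot_product_attention(
        q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)
    ).transpose(1, 2)
    # A second (reverse) all-to-all would restore sequence sharding here.
    return out
```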