Stas Bekman c2bb53f20f TiledMLP + SequenceTiledCompute: improve the bs>1 use-case (#7422)
Improved TiledMLP and SequenceTiledCompute for bs>1

This PR:
- extends the testing utils to add `CaptureStd*`, `CaptureLogger`
context managers
- extends the test to run both bs=1 and bs=2
- uses an uneven seqlen to test varlen shards
- flattens the bs and seqlen dims to avoid problems with grad tensor
strides when bs>1: the MLP doesn't care about the bs dimension, so it
uses a pretend seqlen of `bs*seqlen` instead and restores the original
shape at the end for the grad.
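The flattening can be illustrated with a minimal pure-Python sketch. The assumptions are:

- `toy_mlp` and `tiled_apply` are hypothetical helpers, not DeepSpeed's TiledMLP API.
- Nested lists stand in for a `[bs, seqlen, hidden]` tensor.
- Only forward-pass shape handling is shown. The grad computation and tensor strides are out of scope.

The sequence length is the uneven value 5, split into 3 shards, and the sketch runs with both bs=1 and bs=2.

```python
from typing import Callable, List

Vector = List[float]


def toy_mlp(x: Vector) -> Vector:
    # Hypothetical per-token MLP: it sees one hidden vector at a time,
    # so it has no notion of batch index or sequence position.
    return [2.0 * v + 1.0 for v in x]


def tiled_apply(
    hidden: List[List[Vector]], fn: Callable[[Vector], Vector], num_shards: int
) -> List[List[Vector]]:
    bs, seqlen = len(hidden), len(hidden[0])
    # Flatten [bs, seqlen, hidden] -> [bs*seqlen, hidden], i.e. treat the
    # whole batch as one pretend sequence of length bs*seqlen.
    tokens = [tok for seq in hidden for tok in seq]
    # Split into contiguous shards whose sizes differ by at most 1, so an
    # uneven total length still works (varlen shards).
    base, extra = divmod(len(tokens), num_shards)
    out: List[Vector] = []
    start = 0
    for i in range(num_shards):
        size = base + (1 if i < extra else 0)
        out.extend(fn(tok) for tok in tokens[start:start + size])
        start += size
    # Restore the original [bs, seqlen, hidden] shape at the end.
    return [out[b * seqlen:(b + 1) * seqlen] for b in range(bs)]


if __name__ == "__main__":
    for bs in (1, 2):
        seqlen = 5  # uneven: 5 or 10 tokens split into 3 shards
        hidden = [[[float(b * 100 + s * 10 + h) for h in range(3)]
                   for s in range(seqlen)] for b in range(bs)]
        direct = [[toy_mlp(tok) for tok in seq] for seq in hidden]
        assert tiled_apply(hidden, toy_mlp, num_shards=3) == direct
```

The flattening is valid only because the MLP is applied independently per token. Shard boundaries can fall in the middle of a batch row without changing the result.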
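For the testing-utils bullet, here is a rough sketch of what stdout- and logger-capturing context managers can look like. Only the names echo the commit message. `CaptureStdout` and `CaptureLogger` below are hypothetical stdlib-only stand-ins, not DeepSpeed's actual implementations.

```python
import contextlib
import io
import logging


class CaptureStdout:
    """Hypothetical sketch: collect everything printed to stdout into `self.out`."""

    def __enter__(self):
        self._buf = io.StringIO()
        self._redirect = contextlib.redirect_stdout(self._buf)
        self._redirect.__enter__()
        self.out = ""
        return self

    def __exit__(self, *exc):
        self._redirect.__exit__(*exc)
        self.out = self._buf.getvalue()
        return False  # do not swallow exceptions raised inside the block


class CaptureLogger:
    """Hypothetical sketch: collect records emitted by `logger` into `self.out`."""

    def __init__(self, logger: logging.Logger):
        self.logger = logger
        self._buf = io.StringIO()
        self._handler = logging.StreamHandler(self._buf)
        self.out = ""

    def __enter__(self):
        self.logger.addHandler(self._handler)
        return self

    def __exit__(self, *exc):
        self.logger.removeHandler(self._handler)
        self.out = self._buf.getvalue()
        return False


if __name__ == "__main__":
    with CaptureStdout() as cs:
        print("hello")
    assert cs.out == "hello\n"
```

A test can then assert on `cs.out` or `cl.out` after the `with` block exits.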

---------

Signed-off-by: Stas Bekman <stas@stason.org>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2025-07-16 09:30:08 -07:00