DeepSpeed/deepspeed
Stas Bekman 1b08325da3 [TiledMLP] moe support (#7622)
MoE routers seem to drop the `bs` dimension in `x`, so the `[bs, seqlen,
hidden_size]` shape can no longer be assumed. Support that use case.

Signed-off-by: Stas Bekman <stas@stason.org>
2025-10-07 13:33:34 +00:00