4 Commits

Author SHA1 Message Date
d40a0f5de8 Add dependency for deepcompile test (#7558)
This PR adds a dependency to the CI tests for DeepCompile.

---------

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
2025-09-13 00:45:08 -07:00
b9bd03a2ec Move modal tests to tests/v1 (#7557)
This PR moves active tests under `tests/unit/v1` to make it clear which tests
run on Modal.

---------

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
2025-09-12 17:28:47 -04:00
889f0ead27 Enable non-ZeRO mode (#7515)
Enabled via `stage=0`, which corresponds to DDP.
- Remove the hardwired path to `bf16_optimizer`.
- Enable `torch.autocast` for DDP training.
- Enable native mixed-precision DDP for bfloat16.
- Update the `torch.autocast` and native mixed-precision unit tests.
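As a rough illustration, the sketch below builds a DeepSpeed config dictionary that selects the non-ZeRO (DDP) path with `stage=0` and requests bfloat16 mixed precision. The `zero_optimization.stage` and `bf16.enabled` keys are standard DeepSpeed config fields. The batch size, optimizer settings, and the `uses_ddp_mode` helper are hypothetical placeholders and are not part of this PR.

```python
import json

# Hypothetical minimal DeepSpeed config. Only "zero_optimization.stage" and
# "bf16.enabled" relate to this change; the other values are placeholders.
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "gradient_accumulation_steps": 1,
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
    # Stage 0 disables ZeRO partitioning, i.e. plain data parallelism (DDP).
    "zero_optimization": {"stage": 0},
    # Request bfloat16 mixed precision.
    "bf16": {"enabled": True},
}


def uses_ddp_mode(config: dict) -> bool:
    """Hypothetical helper: True if the config selects the non-ZeRO path.

    A missing "zero_optimization" section is treated as stage 0, which is
    assumed to match DeepSpeed's default.
    """
    return config.get("zero_optimization", {}).get("stage", 0) == 0


print(json.dumps(ds_config, indent=2))
```

In a training script, a dict like this would typically be passed as `config=ds_config` to `deepspeed.initialize`. That call needs a model and a distributed launcher, so it is omitted here.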

---------

Signed-off-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2025-08-27 14:07:29 -04:00
a12de38db6 Modal CI (#7289)
This is an initial effort to migrate CI onto Modal infrastructure. This PR
creates two new workflows that run on Modal:
1. `modal-torch-latest`: a subset of `nv-torch-latest-v100` that includes
`tests/unit/runtime/zero/test_zero.py`.
2. `modal-accelerate`: a full copy of `nv-accelerate-v100`.

Follow-up PRs will selectively migrate relevant workflows onto Modal.

---------

Signed-off-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Signed-off-by: Olatunji Ruwase <tjruwase@gmail.com>
Signed-off-by: Tunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Logan Adams <loadams@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <tjruwase@gmail.com>
Co-authored-by: Stas Bekman <stas.bekman@snowflake.com>
2025-08-11 20:13:39 +00:00