Today check-torchcuda.py searches the entire repository for occurrences of
`.is_cuda` even when a commit modifies only C++ headers and sources,
which I believe is not intended.
Check usage of `.is_cuda` only when a commit modifies at least one non-C++ file.
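For context, a minimal sketch of the intended gating, assuming a git-based
file listing (the helper names and the extension set here are illustrative,
not the script's actual code):

```python
import os
import subprocess

# Hypothetical extension set; the real script's notion of "C++ files" may differ.
CPP_EXTENSIONS = {".c", ".cc", ".cpp", ".cu", ".cuh", ".h", ".hpp"}

def modified_files(base: str, head: str = "HEAD") -> list[str]:
    # Files touched by the commit range under inspection.
    out = subprocess.check_output(["git", "diff", "--name-only", base, head], text=True)
    return [f for f in out.splitlines() if f]

def should_check_is_cuda(files: list[str]) -> bool:
    # Scan for `.is_cuda` only if at least one modified file is not a
    # C++ header or source.
    return any(os.path.splitext(f)[1] not in CPP_EXTENSIONS for f in files)
```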
Signed-off-by: Junjie Mao <junjie.mao@linux.alibaba.com>
This is the DeepSpeed counterpart of
https://github.com/snowflakedb/ArcticTraining/pull/45, as the new
features require changes on both sides.
For PR reviewers:
Readiness status:
- [x] Code
- [x] Tests
- [ ] Docs - working on it
Features:
- [x] add support for delaying grad addition via the
`param.ds_grad_is_ready` flag (used when performing tiled compute in an
autograd function; see the `TiledMLP` sketch after this list)
- [x] add a lightweight SP-only `mpu` version (Jeff Rasley)
- [x] improved debug utilities
- [x] added `all_gather_object` to `dist`
- [x] `UlyssesSPAttentionHF` (a port of UlyssesAttention from
Megatron-DeepSpeed plus support for modern MHA variations)
- [x] `UlyssesSPDataLoaderAdapter` - a DataLoader adapter that shards
normal DataLoader batches for consumption by `UlyssesSPAttentionHF`
- [x] `SequenceTiledCompute` - generic autograd function to perform
compute after tiling on the sequence dimension
- [x] `TiledMLP` - a specific autograd function that performs a tiled MLP
(it's much easier to understand before trying to grok
`SequenceTiledCompute`; a simplified sketch follows this list)
- [x] added a differentiable `_DimZeroAllToAll` (Samyam Rajbhandari) -
also sketched after this list
- [x] torch-dist-check now allows `torch.distributed.nn` (which is
needed since DeepSpeed's dist is not yet up to date with
`torch.distributed.nn`)
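To make the tiled-compute items concrete, here is a heavily simplified
sketch in the spirit of `TiledMLP` (not the PR's implementation;
`TiledMLPSketch`, the `fn` argument, and the fixed `dim=1` sequence
dimension are assumptions for illustration). Because backward recomputes
one tile at a time, the MLP's weight grads accumulate across tiles, which
is exactly why the real code delays grad addition via
`param.ds_grad_is_ready` until the last tile:

```python
import torch

class TiledMLPSketch(torch.autograd.Function):
    # Illustrative only: apply `fn` (an MLP used position-wise) over
    # sequence-dimension tiles so that only one tile's activations are alive
    # at a time; backward recomputes each tile. The real TiledMLP also
    # coordinates with ZeRO via `param.ds_grad_is_ready`.

    @staticmethod
    def forward(ctx, fn, x, num_shards):
        ctx.fn = fn
        ctx.num_shards = num_shards
        ctx.save_for_backward(x)
        with torch.no_grad():
            # dim=1 is assumed to be the sequence dimension: (batch, seq, hidden)
            shards = torch.chunk(x, num_shards, dim=1)
            return torch.cat([fn(s) for s in shards], dim=1)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        x_grad = torch.zeros_like(x)
        in_shards = torch.chunk(x, ctx.num_shards, dim=1)
        grad_shards = torch.chunk(grad_output, ctx.num_shards, dim=1)
        in_grad_shards = torch.chunk(x_grad, ctx.num_shards, dim=1)  # views
        for xs, gs, xg in zip(in_shards, grad_shards, in_grad_shards):
            xs = xs.detach().requires_grad_(True)
            with torch.enable_grad():
                out = ctx.fn(xs)  # recompute this tile's forward
            torch.autograd.backward(out, gs)  # weight grads accumulate per tile
            xg.copy_(xs.grad)  # writes into x_grad through the view
        return None, x_grad, None  # grads for (fn, x, num_shards)

# Hypothetical usage: `mlp` weight grads are accumulated tile by tile.
mlp = torch.nn.Sequential(
    torch.nn.Linear(16, 64), torch.nn.GELU(), torch.nn.Linear(64, 16)
)
x = torch.randn(2, 128, 16, requires_grad=True)  # (batch, seq, hidden)
y = TiledMLPSketch.apply(mlp, x, 4)
y.sum().backward()
```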
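And a minimal sketch of a differentiable dim-0 all-to-all in the spirit of
`_DimZeroAllToAll` (class and argument names are assumptions; requires an
initialized `torch.distributed` process group and a dim-0 size divisible by
the group size). An all-to-all like this is the core primitive that
Ulysses-style attention uses to re-shard tensors between the sequence and
head dimensions:

```python
import torch
import torch.distributed as dist

class DimZeroAllToAllSketch(torch.autograd.Function):
    # Illustrative differentiable all-to-all over dim 0: rank r's i-th dim-0
    # chunk is sent to rank i, and rank i's r-th chunk is received in its
    # place. The communication pattern is its own inverse, so backward is
    # just another all-to-all applied to the incoming gradient.

    @staticmethod
    def forward(ctx, group, input):
        ctx.group = group
        output = torch.empty_like(input)
        dist.all_to_all_single(output, input.contiguous(), group=group)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        grad_input = torch.empty_like(grad_output)
        dist.all_to_all_single(grad_input, grad_output.contiguous(), group=ctx.group)
        return None, grad_input  # grads for (group, input)
```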
---------
Signed-off-by: Stas Bekman <stas.bekman@snowflake.com>
Signed-off-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas.bekman@snowflake.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Until we sort out the new license situation, disable this check so that
new code not owned by MSFT can be added.
---------
Signed-off-by: Stas Bekman <stas@stason.org>