DeepSpeed/.gitignore

*.pyc
.idea/
*~
*.swp
*.log
deepspeed/git_version_info_installed.py
__pycache__

# Build + installation data
build/
dist/
*.so
deepspeed.egg-info/
build.txt

# Website
docs/_site/
docs/build
docs/code-docs/source/_build
docs/code-docs/_build
docs/code-docs/build
.sass-cache/
.jekyll-cache/
.jekyll-metadata

# Testing data
tests/unit/saved_checkpoint/

# Dev/IDE data
.vscode
.theia
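
A quick way to sanity-check these patterns is git's built-in check-ignore command, which prints the matching .gitignore line and pattern for each path; the file paths below are hypothetical examples, not files from the repository:

# Run from the repository root; -v shows which pattern matched each path.
git check-ignore -v foo.pyc docs/_site/index.html
# Output format is source:line:pattern<TAB>path, e.g.:
#   .gitignore:<line>:*.pyc        foo.pyc
#   .gitignore:<line>:docs/_site/  docs/_site/index.html
# A path with no output (exit status 1) is not ignored.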