14 Commits

SHA1 Message Date
fd40516923 Update GH org references (#6998)
Signed-off-by: Olatunji Ruwase <olruwase@microsoft.com>
Signed-off-by: Logan Adams <loadams@microsoft.com>
Signed-off-by: Fabien Dupont <fdupont@redhat.com>
Co-authored-by: Fabien Dupont <fabiendupont@fabiendupont.fr>
2025-02-05 00:56:50 +00:00
7f3d669b40 Remove Duplicate Declaration of pandas in Dockerfile (#6959)
### Description

This pull request removes the redundant installation of `pandas` from
the `Dockerfile`. The package was declared twice; removing the
duplicate entry makes the `Dockerfile` clearer and easier to maintain.


018ece5af2/docker/Dockerfile (L124)


018ece5af2/docker/Dockerfile (L135)

### Changes

Removed the duplicate `pandas` installation line from the
`RUN pip install` command.
2025-01-17 17:44:49 +00:00
a1b0c35a1d Switch what versions of python are supported (#5676)
Add support for testing compilation with Python 3.11 and 3.12.

Also add the Dockerfiles used to build those images.

---------

Co-authored-by: Michael Wyatt <michael.wyatt@snowflake.com>
2024-11-06 20:37:52 -08:00
15ed83a9a6 Update dockerfile with updated versions (#4780)
Fixes #4763
2023-12-07 19:25:57 +00:00
e31b40411f fix: remove unnecessary # punct in the second sed command (#4061) 2023-07-31 16:58:31 +00:00
45cecc05fb fix "ERROR: failed to solve: nvidia/cuda:11.7.0-devel-ubuntu18.04: docker.io/nvidia/cuda:11.7.0-devel-ubuntu18.04: not found" (#3930)
Update the NVIDIA CUDA Docker base image version.

Fix "ERROR: failed to solve: nvidia/cuda:11.7.0-devel-ubuntu18.04: docker.io/nvidia/cuda:11.7.0-devel-ubuntu18.04: not found"

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2023-07-13 12:27:29 -07:00
a65f6b9e9b Update Dockerfile with newer cuda and torch. (#3716)
* Make the build non-interactive; the interactive prompt was causing issues for some users

* Update the PyTorch version too
2023-06-09 12:31:03 -07:00
ab1d2f826b Update Dockerfile (#3298)
Line 98 should be `curl -O https://bootstrap.pypa.io/pip/3.6/get-pip.py && \`
to avoid the following error:
`#16 106.9 ERROR: This script does not work on Python 3.6 The minimum supported Python version is 3.7. Please use https://bootstrap.pypa.io/pip/3.6/get-pip.py instead.`

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
2023-04-20 12:06:07 -07:00
7e2103f8e2 Use rocm/pytorch:latest (#2613) 2022-12-15 14:03:09 -08:00
7bcb4fabeb Enable CG headers on ROCm (#1821)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
2022-03-11 12:06:41 -08:00
ac71a1a461 [docker] simplify and update rocm dockerfile (#1819) 2022-03-09 15:23:27 -08:00
c3c8d5dd93 AMD support (#1430)
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jithun Nair <jithun.nair@amd.com>
Co-authored-by: rraminen <rraminen@amd.com>
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Co-authored-by: okakarpa <okakarpa@amd.com>
Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
Co-authored-by: Ramya Ramineni <62723901+rraminen@users.noreply.github.com>
2022-03-03 01:53:35 +00:00
599258f979 ZeRO 3 Offload (#834)
* Squash stage3 v1 (#146)

Co-authored-by: Samyam <samyamr@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>

* Fix correctness bug (#147)

* formatting fix (#150)

* stage3 bugfix (API) update and simplified FP16 Z3 tests (#151)

* fp16 Z3 API update and bugfix

* revert debug change

* ZeRO-3 detach and race condition bugfixes (#149)

* trying out ZeRO-3 race condition fix

* CUDA sync instead of stream

* reduction stream sync

* remove commented code

* Fix optimizer state_dict KeyError (#148)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* fix for smaller SGS sizes, ensures each grad is backed by unique tensors (#152)

* Simplifying the logic for getting averaged gradients (#153)

* skip for now

* Z3 Docs redux (#154)

* removing some TODOs and commented code (#155)

* New Z3 defaults (#156)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* formatting

* megatron external params

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>
2021-03-08 12:54:54 -08:00
b29229bf52 update docker image and bump DSE 2020-09-10 17:18:18 +00:00