pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Author	SHA1	Message	Date
Anthony Shoumikhin	7d39e73c57	Fix more URLs (#153277 ) Or ignore them. Found by running the lint_urls.sh script locally with https://github.com/pytorch/pytorch/pull/153246 Pull Request resolved: https://github.com/pytorch/pytorch/pull/153277 Approved by: https://github.com/malfet	2025-05-14 16:23:50 +00:00
Catherine Lee	4b8b7c7fb9	[CI] Use cmake from pip instead of conda in CI docker images (#152537 ) As in title. idk how the install_cmake script is used, because I see it being called with 3.18, but when I look at the build jobs some say 3.18 and others 3.31. Just make everything install cmake via the requirements-ci.txt. I don't know if the comment at `5d36485b4a/.ci/docker/common/install_conda.sh (L78)` still holds, but pretty much every build has CONDA_CMAKE set to true, so I'm just defaulting to installing through pip. Also defaulting to 4.0.0 everywhere except the executorch docker build, because executorch reinstalls 3.31.something. Pull Request resolved: https://github.com/pytorch/pytorch/pull/152537 Approved by: https://github.com/cyyever, https://github.com/atalman, https://github.com/malfet	2025-05-08 18:58:10 +00:00
PyTorch MergeBot	a7ea115494	Revert "[CI] Use cmake from pip instead of conda in CI docker images (#152537 )" This reverts commit 941062894a1accfd472d0acd2716493e1f173bd7. Reverted https://github.com/pytorch/pytorch/pull/152537 on behalf of https://github.com/malfet due to Sorry to revert this PR, but it broke doc builds, see `4976b1a3a8/1` ([comment](https://github.com/pytorch/pytorch/pull/152537#issuecomment-2863337268))	2025-05-08 14:53:34 +00:00
Catherine Lee	941062894a	[CI] Use cmake from pip instead of conda in CI docker images (#152537 ) As in title. idk how the install_cmake script is used, because I see it being called with 3.18, but when I look at the build jobs some say 3.18 and others 3.31. Just make everything install cmake via the requirements-ci.txt. I don't know if the comment at `5d36485b4a/.ci/docker/common/install_conda.sh (L78)` still holds, but pretty much every build has CONDA_CMAKE set to true, so I'm just defaulting to installing through pip. Also defaulting to 4.0.0 everywhere except the executorch docker build, because executorch reinstalls 3.31.something. Pull Request resolved: https://github.com/pytorch/pytorch/pull/152537 Approved by: https://github.com/cyyever, https://github.com/atalman, https://github.com/malfet	2025-05-08 10:10:27 +00:00
Catherine Lee	6d28d61323	[CI] Remove protobuf from docker image (#151933 ) Pretty sure the source should be the one in third-party Pull Request resolved: https://github.com/pytorch/pytorch/pull/151933 Approved by: https://github.com/huydhn	2025-04-23 10:29:09 +00:00
Jeff Daily	2bd5bfa3ce	[ROCm] use magma-rocm tarball for CI/CD (#149986 ) Follow-up to #149902. Pull Request resolved: https://github.com/pytorch/pytorch/pull/149986 Approved by: https://github.com/malfet	2025-03-28 19:28:50 +00:00
Nikita Shulga	c41196a4d0	[EZ][Docker] Remove `install_db.sh` (#149360 ) Which is a vestige of caffe2 days and was no-op since https://github.com/pytorch/pytorch/pull/125092 Pull Request resolved: https://github.com/pytorch/pytorch/pull/149360 Approved by: https://github.com/atalman, https://github.com/cyyever, https://github.com/seemethere, https://github.com/Skylion007	2025-03-18 16:07:47 +00:00
Xinya Zhang	bc576355a2	Let aotriton.cmake detect the best binary package to use, and deprecate aotriton_version.txt (#137443 ) We do not need `install_aotriton.sh` and `aotriton_version.txt` any more since `aotriton.cmake` now installs the best binary release package as the default option when building pytorch. This should resolve the issue of needing a pre-installed aotriton package when building PyTorch for ROCm from source, which is not feasible if building PyTorch outside a CI docker image. With this change, a user can have a pre-installed AOTriton in their environment, if desired, and have the build pick it up by specifying the `AOTRITON_INSTALLED_PREFIX` env var, or have the build automatically detect and install the compatible version. As a third option, the user can also force AOTriton to build from source instead, using the `AOTRITON_INSTALL_FROM_SOURCE` env var. Also, with the changes in this PR, the cmake build process handles the tasks of copying aotriton .so and images directory from `torch/lib` to the installation path. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137443 Approved by: https://github.com/jithunnair-amd, https://github.com/jeffdaily Co-authored-by: Jithun Nair <jithun.nair@amd.com>	2025-01-09 00:00:02 +00:00
Jack Taylor	034717a029	[ROCm] remove triton-rocm commit pin and merge pins with triton.txt (#133438 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133438 Approved by: https://github.com/jithunnair-amd, https://github.com/malfet Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>	2024-09-05 20:36:45 +00:00
PyTorch MergeBot	a1ba8e61d1	Revert "[ROCm] remove triton-rocm commit pin and merge pins with triton.txt (#133438 )" This reverts commit 5e8bf29148a590318f678620f84be8f4d5ffff5c. Reverted https://github.com/pytorch/pytorch/pull/133438 on behalf of https://github.com/ZainRizvi due to This still breaks linux binary builds. Added the appropriate labels to ensure tests can pass. See [GH job link](https://github.com/pytorch/pytorch/actions/runs/10626427003/job/29460479554) [HUD commit link](`5e8bf29148`) ([comment](https://github.com/pytorch/pytorch/pull/133438#issuecomment-2322246198))	2024-08-30 20:00:41 +00:00
Jack Taylor	5e8bf29148	[ROCm] remove triton-rocm commit pin and merge pins with triton.txt (#133438 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133438 Approved by: https://github.com/jithunnair-amd, https://github.com/malfet Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>	2024-08-30 03:38:35 +00:00
PyTorch MergeBot	4648848696	Revert "[ROCm] remove triton-rocm commit pin and merge pins with triton.txt (#133438 )" This reverts commit f71c3d265ab52589f983dd252d61461db4e7dbbd. Reverted https://github.com/pytorch/pytorch/pull/133438 on behalf of https://github.com/jeanschmidt due to seems to have introduced breakages in linux binary builds ([comment](https://github.com/pytorch/pytorch/pull/133438#issuecomment-2308787310))	2024-08-25 11:20:30 +00:00
Jack Taylor	f71c3d265a	[ROCm] remove triton-rocm commit pin and merge pins with triton.txt (#133438 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133438 Approved by: https://github.com/jithunnair-amd, https://github.com/malfet	2024-08-24 18:26:49 +00:00
Xinya Zhang	d34075e0bd	Add Efficient Attention support on ROCM (#124885 ) This patch implements `with sdpa_kernel(SDPBackend.EFFICIENT_ATTENTION):` by reusing AOTriton's accelerated SDPA implementation Known limitations: - Only supports MI200/MI300X GPUs - Does not support varlen - Does not support `CausalVariant` - Optional arguments `causal_diagonal` and `seqlen_k` in `_efficient_attention_forward/backward` must be null - Does not work well with inductor's SDPA rewriter. The rewriter has been updated to only use math and flash attention on ROCM. This PR also uses a different approach of installing AOTriton binary instead of building it from source in the base docker image. More details on motivation: https://github.com/pytorch/pytorch/pull/124885#issuecomment-2153229129 `PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TESTING_DEVICE_ONLY_FOR="cuda" python test/test_transformers.py` yields "55028 passed, 20784 skipped" results with this change. [Previous result](https://hud.pytorch.org/pr/127528) of `test_transformers.py` was 0 error, 0 failure, 55229 skipped out of 75517 tests in total (the XML report does not contain total number of passed tests). Pull Request resolved: https://github.com/pytorch/pytorch/pull/124885 Approved by: https://github.com/malfet	2024-06-08 22:41:05 +00:00
Xinya Zhang	ef9451ac8d	Move the build of AOTriton to base ROCM docker image. (#127012 ) Mitigates #126111. AOTriton, as a math library, takes a long time to build. However, this library is not moving as fast as PyTorch itself, and it is not cost-efficient to build it for every CI check. This PR moves the build of AOTriton from PyTorch to its base docker image, avoiding duplicated and long build times. Pre-this-PR: * PyTorch base docker build job duration: 1.1-1.3h * PyTorch build job duration: 1.4-1.5hr (includes AOTriton build time of 1hr6min on a linux.2xlarge node) Post-this-PR: * PyTorch base docker build job duration: 1.3h (includes AOTriton build time of 20min on a linux.12xlarge node) * PyTorch build job duration: <20 min Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/127012 Approved by: https://github.com/jithunnair-amd, https://github.com/pruthvistony, https://github.com/huydhn	2024-06-03 20:35:22 +00:00
Jack Taylor	d30cdc4321	[ROCm] amdsmi library integration (#119182 ) Adds monitoring support for ROCm using amdsmi in place of pynvml. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119182 Approved by: https://github.com/jeffdaily, https://github.com/malfet, https://github.com/xw285cornell	2024-05-21 01:59:26 +00:00
cyy	3f11958d39	Remove FFMPEG from CI scripts (#125546 ) Because FFMPEG was solely used by Caffe2. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125546 Approved by: https://github.com/r-barnes, https://github.com/kit1980, https://github.com/albanD, https://github.com/malfet, https://github.com/seemethere	2024-05-11 16:46:13 +00:00
PyTorch MergeBot	4dad988822	Revert "Remove vision packages from CI scripts (#125546 )" This reverts commit f42ea14c3f795082138421fcef90d24f64c6fd35. Reverted https://github.com/pytorch/pytorch/pull/125546 on behalf of https://github.com/huydhn due to I think we are using vision in inductor tests with their various models there ([comment](https://github.com/pytorch/pytorch/pull/125546#issuecomment-2105174723))	2024-05-10 19:43:23 +00:00
cyy	f42ea14c3f	Remove vision packages from CI scripts (#125546 ) Because they were solely used by Caffe2. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125546 Approved by: https://github.com/r-barnes, https://github.com/kit1980, https://github.com/albanD	2024-05-10 17:53:48 +00:00
PyTorch MergeBot	0d4fdb0bb7	Revert "[ROCm] amdsmi library integration (#119182 )" This reverts commit 85447c41e32b1e43a025ea19ac812a0c7f88ff57. Reverted https://github.com/pytorch/pytorch/pull/119182 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but the ROCm failed test is legit `85447c41e3` ([comment](https://github.com/pytorch/pytorch/pull/119182#issuecomment-2103433197))	2024-05-09 21:18:21 +00:00
Jack Taylor	85447c41e3	[ROCm] amdsmi library integration (#119182 ) Adds monitoring support for ROCm using amdsmi in place of pynvml. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119182 Approved by: https://github.com/jeffdaily, https://github.com/malfet, https://github.com/xw285cornell	2024-05-09 18:21:38 +00:00
Prachi Gupta	07123bc198	[ROCm] Build Triton in Centos for ROCm (#112050 ) Triton build for centos-based ROCm Dockerfile was missing. This brings centos Dockerfile up-to-date with ubuntu Dockerfile. No CI job covers this change; this change is independently verified by ROCm QA team. Pull Request resolved: https://github.com/pytorch/pytorch/pull/112050 Approved by: https://github.com/jataylo, https://github.com/malfet	2023-11-05 20:43:56 +00:00
Huy Do	85bd6bc010	Cache pretrained mobilenet_v2 and mobilenet_v3_large models in Docker (#100302 ) Following the example I did for ONNX in https://github.com/pytorch/pytorch/pull/96793, this caches the pretrained `mobilenet_v2` and `mobilenet_v3_large` models used by CI jobs. I think there might be an issue either with AWS or with the domain download.pytorch.org, as the connection to the latter has been failing a lot in the past few days. Related flaky jobs: * https://github.com/pytorch/pytorch/actions/runs/4835873487/jobs/8618836446 * https://github.com/pytorch/pytorch/actions/runs/4835783539/jobs/8618404639 * https://github.com/pytorch/pytorch/actions/runs/4835783539/jobs/8618404639 ``` Downloading: "https://download.pytorch.org/models/mobilenet_v2-b0353104.pth" to /var/lib/jenkins/.cache/torch/hub/checkpoints/mobilenet_v2-b0353104.pth Traceback (most recent call last): File "/opt/conda/envs/py_3.8/lib/python3.8/urllib/request.py", line 1354, in do_open h.request(req.get_method(), req.selector, req.data, headers, File "/opt/conda/envs/py_3.8/lib/python3.8/http/client.py", line 1256, in request self._send_request(method, url, body, headers, encode_chunked) File "/opt/conda/envs/py_3.8/lib/python3.8/http/client.py", line 1302, in _send_request self.endheaders(body, encode_chunked=encode_chunked) File "/opt/conda/envs/py_3.8/lib/python3.8/http/client.py", line 1251, in endheaders self._send_output(message_body, encode_chunked=encode_chunked) File "/opt/conda/envs/py_3.8/lib/python3.8/http/client.py", line 1011, in _send_output self.send(msg) File "/opt/conda/envs/py_3.8/lib/python3.8/http/client.py", line 951, in send self.connect() File "/opt/conda/envs/py_3.8/lib/python3.8/http/client.py", line 1418, in connect super().connect() File "/opt/conda/envs/py_3.8/lib/python3.8/http/client.py", line 922, in connect self.sock = self._create_connection( File "/opt/conda/envs/py_3.8/lib/python3.8/socket.py", line 808, in create_connection raise err File "/opt/conda/envs/py_3.8/lib/python3.8/socket.py", line 796, in create_connection sock.connect(sa) OSError: [Errno 99] Cannot assign requested address ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/100302 Approved by: https://github.com/ZainRizvi	2023-05-01 19:31:37 +00:00
Huy Do	371f587c92	Dockerize lint jobs (#94255 ) This is to minimize network flakiness when running lint jobs. I create a new Docker image for linter and install all linter dependencies there. After that, all linter jobs are converted to use Nova generic Linux job https://github.com/pytorch/test-infra/blob/main/.github/workflows/linux_job.yml with the new image. For the future task: I encounter this issue with the current mypy version we are using and Python 3.11 https://github.com/python/mypy/issues/13627. Fixing this requires upgrading mypy to a newer version, but that can be done separately (require formatting/fixing `*.py` files with the newer mypy version) `collect_env` linter job is currently not included here as it needs older Python versions (3.5). It could also be converted to use the same mechanism (with another Docker image, probably). This one rarely fails though. ### Testing BEFORE https://github.com/pytorch/pytorch/actions/runs/4130366955 took a total of ~14m AFTER https://github.com/pytorch/pytorch/actions/runs/4130712385 also takes a total of ~14m Pull Request resolved: https://github.com/pytorch/pytorch/pull/94255 Approved by: https://github.com/ZainRizvi	2023-02-11 21:56:19 +00:00
pramenku	dddc0b41db	[ROCm] centos update endpoint repo and fix sudo (#92034 ) * Update ROCm centos Dockerfile * Update install_user.sh for centos sudo issue Fixes ROCm centos Dockerfile due to https://packages.endpoint.com/rhel/7/os/x86_64/endpoint-repo-1.9-1.x86_64.rpm file is not accessible. Pull Request resolved: https://github.com/pytorch/pytorch/pull/92034 Approved by: https://github.com/malfet	2023-02-09 21:30:58 +00:00
Nikita Shulga	6c4dc98b9d	[CI][BE] Move docker folder to `.ci` (#93104 ) Follow up after https://github.com/pytorch/pytorch/pull/92569 Pull Request resolved: https://github.com/pytorch/pytorch/pull/93104 Approved by: https://github.com/huydhn, https://github.com/seemethere, https://github.com/ZainRizvi	2023-02-03 12:25:33 +00:00

26 Commits