7 Commits

Author SHA1 Message Date
bcf50636ba [CI] Removing --user flag from all pip install commands (#154900)
Related to https://github.com/pytorch/pytorch/issues/148335

python virtualenv doesn't support using `--user` flag:

```
ERROR: Can not perform a '--user' install. User site-packages are not visible in this virtualenv.
+ python3 -m pip install --progress-bar off --user ninja==1.10.2
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154900
Approved by: https://github.com/jeffdaily

Co-authored-by: Jithun Nair <jithun.nair@amd.com>
2025-07-14 21:09:42 +00:00
7d39e73c57 Fix more URLs (#153277)
Or ignore them.
Found by running the lint_urls.sh script locally with https://github.com/pytorch/pytorch/pull/153246

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153277
Approved by: https://github.com/malfet
2025-05-14 16:23:50 +00:00
bcf1031cb8 [ROCm] Fixes to enable VM-based MI300 CI runners (#152133)
New VM-based MI300 CI runners tested in https://github.com/pytorch/pytorch/pull/151708 exposed some issues in CI that this PR fixes:

* HSAKMT_DEBUG_LEVEL is a debug env var that was introduced to debug driver issues. However, in the new MI300 runners being tested, since they run inside a VM, the driver emits a debug message `Failed to map remapped mmio page on gpu_mem 0` when calling `rocminfo` or doing other GPU-related work. This results in multiple PyTorch unit tests failing when doing a string match on the stdout vs expected output.

* HSA_FORCE_FINE_GRAIN_PCIE was relevant for rccl performance improvement, but is not required now.

* amdsmi doesn't return metrics like [power_info](https://rocm.docs.amd.com/projects/amdsmi/en/latest/reference/amdsmi-py-api.html#amdsmi-get-power-cap-info) and [clock_info](https://rocm.docs.amd.com/projects/amdsmi/en/latest/reference/amdsmi-py-api.html#amdsmi-get-clock-info) in a VM ("Guest") environment. Return 0 as the default in cases where amdsmi returns "N/A"

* amdsmi throws an exception when calling `amdsmi.amdsmi_get_clock_info` on the VM-based runners. Temporarily skipping the unit test for MI300 until we find a resolution.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152133
Approved by: https://github.com/jeffdaily
2025-04-25 18:06:48 +00:00
0ecb071fc4 [BE][CI] change references from .jenkins to .ci (#92624)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92624
Approved by: https://github.com/ZainRizvi, https://github.com/huydhn
2023-01-30 22:50:07 +00:00
b453adc945 [BE][CI] rename .jenkins (#92845)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92845
Approved by: https://github.com/clee2000
2023-01-25 23:47:38 +00:00
afe6ea884f Revert "[BE][CI] rename .jenkins to .ci, add symlink (#92621)"
This reverts commit 8972a9fe6aa8be8f8035c83094ed371973bfbe73.

Reverted https://github.com/pytorch/pytorch/pull/92621 on behalf of https://github.com/atalman due to breaks shipit
2023-01-23 15:04:58 +00:00
8972a9fe6a [BE][CI] rename .jenkins to .ci, add symlink (#92621)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92621
Approved by: https://github.com/huydhn, https://github.com/ZainRizvi
2023-01-21 02:40:18 +00:00