Commit Graph

652 Commits

Author SHA1 Message Date
4dce5b71a0 [build] modernize build-frontend: python setup.py develop/install -> [uv ]pip install --no-build-isolation [-e ]. (#156027)
Modernize the development installation:

```bash
# python setup.py develop
python -m pip install --no-build-isolation -e .

# python setup.py install
python -m pip install --no-build-isolation .
```

Now, the `python setup.py develop` is a wrapper around `python -m pip install -e .` since `setuptools>=80.0`:

- pypa/setuptools#4955

`python setup.py install` is deprecated and will emit a warning during run. The warning will become an error on October 31, 2025.

- 9c4d383631/setuptools/command/install.py (L58-L67)

> ```python
> SetuptoolsDeprecationWarning.emit(
>     "setup.py install is deprecated.",
>     """
>     Please avoid running ``setup.py`` directly.
>     Instead, use pypa/build, pypa/installer or other
>     standards-based tools.
>     """,
>     see_url="https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html",
>     due_date=(2025, 10, 31),
> )
> ```

- pypa/setuptools#3849

Additional Resource:

- [Why you shouldn't invoke setup.py directly](https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156027
Approved by: https://github.com/ezyang
2025-07-09 11:24:27 +00:00
4cc8b60d1b [BE][1/16] fix typos in torch/ (#156311)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/156311
Approved by: https://github.com/albanD
2025-07-09 11:02:22 +00:00
ed6df0e324 correctly import torch.version (#157584)
The structure is

```
torch/
  __init__.py
  version.py
```

When we import torch, only `torch/__init__.py` is executed by default.

The submodules like `version.py` are not automatically imported or attached to the torch module.

So without anything in `__init__.py`, `torch.version` may not be found. So in this PR, we make the import explicit.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/157584
Approved by: https://github.com/ezyang
2025-07-07 21:43:35 +00:00
9968edd002 Fix #153942 (#153943)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153943
Approved by: https://github.com/malfet
2025-07-04 18:25:18 +00:00
19f851ce10 Revert "Simplify nvtx3 CMake handling, always use nvtx3 (#153784)"
This reverts commit 099d0d6121125062ebc05771c8330cb7cd8d053a.

Reverted https://github.com/pytorch/pytorch/pull/153784 on behalf of https://github.com/Camyll due to breaking internal tests and cuda 12.4 builds still used in CI ([comment](https://github.com/pytorch/pytorch/pull/153784#issuecomment-3001702310))
2025-06-24 20:02:07 +00:00
cyy
099d0d6121 Simplify nvtx3 CMake handling, always use nvtx3 (#153784)
Fall back to third-party NVTX3 if system NVTX3 doesn't exist. We also reuse the `CUDA::nvtx3` target for better interoperability.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153784
Approved by: https://github.com/ezyang
2025-06-23 06:12:46 +00:00
7f0cddfb55 [dynamo] Add documentation for guard_filter_fn (#156114)
Summary: Adding a section of doc for guard_filter_fn.

Test Plan:
CI

Rollback Plan:

Differential Revision: D76756743

Pull Request resolved: https://github.com/pytorch/pytorch/pull/156114
Approved by: https://github.com/jansel
2025-06-19 16:13:12 +00:00
c4b93e6579 Replace frame_traced_fn hook with get_traced_code() util (#155249)
#153622 introduced a hook for getting the relevant code objects after frame tracing. The idea is to have vLLM use this instead of monkey-patching `inline_call_()` to determine the source code files to hash. Unfortunately, the hook runs too late; the vLLM backend needs access to the set of source code filenames while it's running.

This PR replaces the newly-added hook with a utility function that a backend can call to get this information. I've made the change in vLLM and can verify that this allows the information to be queried at the right time.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155249
Approved by: https://github.com/zou3519
2025-06-10 22:40:58 +00:00
694028f502 update get_default_device to also respect torch.device ctx manager (#148621)
Fixes https://github.com/pytorch/pytorch/issues/131328
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148621
Approved by: https://github.com/ezyang
2025-06-07 14:26:17 +00:00
30387ab2e4 [ROCm] Adds initialization support for PyTorch when built from ROCm wheels. (#155285)
AMD is beginning to roll out ROCm distribution via Python wheels. This patch adds the `__init__.py` hook that is necessary to bootstrap ROCm correctly on Linux and Windows when built from these wheels.

See draft, developer documentation describing the mechanism here: https://github.com/ROCm/TheRock/blob/main/docs/packaging/python_packaging.md

This operates to similar effect as how Torch can depend on CUDA wheels, with some differences:

* ROCm libraries and checks are delegated to helpers in the `rocm_sdk` module, which knows how to find and configure access to the installed libraries. This limits the amount of plumbing and path machinations that must match up between the framework and ROCm.
* When building torch against ROCm, no ROCm system install is needed: instead the proper SDK development wheel is installed and the `CMAKE_PREFIX_PATH` is obtained via `rocm-sdk path --cmake`.
* It is expected that whoever produces such a build will also place a generated `_rocm_init.py` in the `torch` module with initialization logic to preload libraries, check versions, verify GPU compatibility, etc.
* See [build_prod_wheels.py](https://github.com/ROCm/TheRock/blob/main/external-builds/pytorch/build_prod_wheels.py) for an example build script that is being used to generate nightlies in this configuration.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155285
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2025-06-07 02:59:03 +00:00
9db7bcb3fe [Dynamo] Introduce hook receiving list of traced code objects (#153622)
This PR:
* Expands `Hooks` with a new, optional `frame_traced_fn` field. It should be a callable receiving the list of traced code objects
* Maintains a list of `traced_code` objects in the `TracingContext` of an `OutputGraph`
    *  Whenever an `inline_call()` is encountered, the corresponding code object is added to this set
    * `OutputGraph`'s associated `f_code` is added to the list just before the hook is called

I believe use of this hook should enable the source code hashing that vLLM does in a better way than monkey-patching `inline_call()`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153622
Approved by: https://github.com/jansel
2025-05-28 15:40:09 +00:00
f55f2f42a7 Add missing docstring for sym_ite (#154201)
`sym_ite` is listed in [the reference page](https://docs.pytorch.org/docs/stable/torch.html) and has no document.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/154201
Approved by: https://github.com/Skylion007
2025-05-26 15:59:21 +00:00
756fd80734 [BE] Improve the typing related to model input argument of torch.compile() (#153559)
Summary: Match the `overload` typing with the original typing in function definition and adjust the corresponding comments.

Test Plan: contbuild & OSS CI

Differential Revision: D74746243

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153559
Approved by: https://github.com/Skylion007
2025-05-15 04:49:26 +00:00
5bf0c3518c Detect NVSHMEM location (#153010)
### Changes
- Detect NVSHMEM install location via `sysconfig.get_path("purelib")`, which typically resolves to `<conda_env>/lib/python/site-packages`, and NVSHMEM include and lib live under `nvidia/nvshmem`
- Added link dir via `target_link_directories`
- Removed direct dependency on mlx5
- Added preload rule (following other other NVIDIA libs)

### Plan of Record
1. End user experience: link against NVSHMEM dynamically (NVSHMEM lib size is 100M, similar to NCCL, thus we'd like users to `pip install nvshmem` than torch carrying the bits)
2. Developer experience: at compile time, prefers wheel dependency than using Git submodule
General rule: submodule for small lib that torch can statically link with
If user pip install a lib, our CI build process should do the same, rather than building from Git submodule (just for its header, for example)
3. Keep `USE_NVSHMEM` to gate non-Linux platforms, like Windows, Mac
4. At configuration time, we should be able to detect whether nvshmem is available, if not, we don't build `NVSHMEMSymmetricMemory` at all.

For now, we have symbol dependency on two particular libs from NVSHMEM:
- libnvshmem_host.so: contains host side APIs;
- libnvshmem_device.a: contains device-side global variables AND device function impls.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153010
Approved by: https://github.com/ngimel, https://github.com/fduwjj, https://github.com/Skylion007
2025-05-07 23:35:04 +00:00
f5f8f637a5 [Typing] Improve device typing for torch.set_default_device() (#153028)
Part of: #152952

Here is the definition of `torch.types.Device`:

ab997d9ff5/torch/types.py (L74)

So `_Optional[_Union["torch.device", str, builtins.int]]` is equivalent to it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153028
Approved by: https://github.com/Skylion007
2025-05-07 19:31:43 +00:00
7d205b22b5 [profiler][retry] don't disable CUPTI_LAZY_REINIT for cuda >= 12.6 (#151124)
Retry of https://github.com/pytorch/pytorch/pull/150957, which was reverted due to internal meta failures

Credit to @mgmtea who wrote the initial version of this PR: https://github.com/pytorch/pytorch/pull/146604

Context: CUPTI is the NVIDIA library that Kineto uses for collecting GPU-side info during profiling. The intended usage is to register a callback while you want profiling to occur, and then unregister the callback when you want profiling to stop. But a bug would cause crashes if CUPTI callbacks were de-registered when used with cudagraphs. The workaround was to disable "CUPTI_LAZY_REINIT" and "CUPTI_TEARDOWN" in Kineto - which prevents crashes, but can result in slower execution after profiling has occurred and completed.

This bug is believed to be fixed in CUDA >= 12.6, so this PR qualifies that DISABLE_CUPTI_LAZY_REINIT=1 and CUPTI_TEARDOWN=0 should only be applied if CUDA >= 12.6. Additionally, `profiler_allow_cudagraph_cupti_lazy_reinit_cuda12()` is added as an escape hatch so that we can add a killswitch in case we see more crashes related to this.

Differential Revision: [D72842114](https://our.internmc.facebook.com/intern/diff/D72842114/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D72842114/)!

Differential Revision: [D72842114](https://our.internmc.facebook.com/intern/diff/D72842114)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151124
Approved by: https://github.com/sraikund16
2025-04-15 16:11:49 +00:00
101c4f482a Docs: Fix typos in the Symbolic Numbers docstrings (#151181)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151181
Approved by: https://github.com/soulitzer
2025-04-14 01:46:02 +00:00
44ed0c9fbb Revert "[profiler] don't disable CUPTI_LAZY_REINIT for cuda >= 12.6 (#150957)"
This reverts commit 37812009fd123d5c4a038ce798eedd4a89eeffad.

Reverted https://github.com/pytorch/pytorch/pull/150957 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/150957#issuecomment-2795878848))
2025-04-11 05:38:58 +00:00
86370fd658 [dynamo] Allow guards to be dropped with custom filter functions. (#150936)
Summary: A follow up of https://github.com/pytorch/pytorch/pull/150689.

Test Plan: test_dynamo -k test_guard_filter_fn

Differential Revision: D72722322

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150936
Approved by: https://github.com/jansel
2025-04-11 03:06:34 +00:00
37812009fd [profiler] don't disable CUPTI_LAZY_REINIT for cuda >= 12.6 (#150957)
Credit to @mgmtea who wrote the initial version of this PR: https://github.com/pytorch/pytorch/pull/146604

Context: CUPTI is the NVIDIA library that Kineto uses for collecting GPU-side info during profiling. The intended usage is to register a callback while you want profiling to occur, and then unregister the callback when you want profiling to stop. But a bug would cause crashes if CUPTI callbacks were de-registered when used with cudagraphs. The workaround was to disable "CUPTI_LAZY_REINIT" and "CUPTI_TEARDOWN" in Kineto - which prevents crashes, but can result in slower execution after profiling has occurred and completed.

This bug is believed to be fixed in CUDA >= 12.6, so this PR qualifies that DISABLE_CUPTI_LAZY_REINIT=1 and CUPTI_TEARDOWN=0 should only be applied if CUDA >= 12.6. Additionally, `profiler_allow_cudagraph_cupti_lazy_reinit_cuda12()` is added as an escape hatch so that we can add a killswitch in case we see more crashes related to this.

Differential Revision: [D72745929](https://our.internmc.facebook.com/intern/diff/D72745929)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150957
Approved by: https://github.com/aaronenyeshi, https://github.com/Skylion007
2025-04-10 17:45:01 +00:00
5471e80fb4 Remove guard_size_oblivious from vector_norm decomposition. (#148809)
This PR remove the usage of guard_size_oblivious in vector_norm by inlining it in the runtime check,
this prevent any data dependent error from ever appearing here at the locations where guard_size_oblivious
used to exist. Before this PR it used to break potentially. This is NOT BC breaking or changing of semantics from eager.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148809
Approved by: https://github.com/bobrenjc93
2025-04-10 16:19:00 +00:00
68b327341c Fix #149806 : Fix path lookup in _preload_cuda_deps (#149808)
@pytorchbot label "bug"

Fixes #149806

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149808
Approved by: https://github.com/jansel
2025-03-25 23:03:47 +00:00
5a7588f183 [Build] Remove pre-CXX11 ABI logic from build script (#149888)
Only keep one in check_binary_symbols to make sure there are no pre-CXX11 ABI symbols in the library
Pull Request resolved: https://github.com/pytorch/pytorch/pull/149888
Approved by: https://github.com/atalman, https://github.com/seemethere
ghstack dependencies: #149887
2025-03-25 03:17:16 +00:00
FEI
59d5cf083b update torch.nn.RelicationPad{1,2,3}d deternimistic documentation (#148633)
https://github.com/pytorch/pytorch/issues/115395
This issue mentioned that when deterministic mode is turned on, added a decomp for replication_pad_{1,2,3}d
to make the backward function deterministic.
@malfet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148633
Approved by: https://github.com/isuruf
2025-03-25 02:01:31 +00:00
2975664fb0 add python root bin to windows load path. (#146573)
This PR is extend python root bin path to dll load list.
It makes PyTorch robust and compatible to more dependency libraries, such as `intel-pti`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146573
Approved by: https://github.com/EikanWang, https://github.com/albanD
2025-03-21 00:48:43 +00:00
a66a9581da [dynamo] support Python 3.13t (#149549)
A few bug fixes to get Dynamo mostly working with 3.13 nogil. Dynamo encounters internal CPython assert errors in older versions of 3.13. The fix has been landed on [CPython's 3.13 branch](https://github.com/python/cpython/tree/3.13) and will be included in 3.13.3 (https://peps.python.org/pep-0719/ - april 8). If you wish to try `torch.compile` on the latest 3.13 branch, you can comment out the error checking (i.e. 70b6cd4e11/torch/__init__.py (L2535) and 70b6cd4e11/torch/_dynamo/eval_frame.py (L899)).

We will work on getting PyTorch CI up for Dynamo/dynamo-wrapped/inductor once 3.13.3 is available.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149549
Approved by: https://github.com/jansel
2025-03-20 09:49:27 +00:00
14dc6e732d Cache the get_device_module result (#149207)
Summary: As title.

Test Plan: OSS CIs.

Reviewed By: chaos5958

Differential Revision: D71084180

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149207
Approved by: https://github.com/jansel
2025-03-19 03:20:38 +00:00
230a3b0f83 Add cuda 11.8 guard for cufile preload (#148184)
Follow up after https://github.com/pytorch/pytorch/pull/148137
Make sure we don't try to load cufile on CUDA 11.8

Test:
```
>>> import torch
/usr/local/lib64/python3.9/site-packages/torch/_subclasses/functional_tensor.py:276: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:81.)
  cpu = _conversion_method_template(device=torch.device("cpu"))
>>> torch.__version__
'2.7.0.dev20250227+cu118'
>>>
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148184
Approved by: https://github.com/mikaylagawarecki
2025-03-01 01:01:04 +00:00
5a14ff8ace Add cufile to list of libraries to preload (#148137)
Fixes: https://github.com/pytorch/pytorch/issues/148120

Test with almalinux/9-base:latest :
```
>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib64/python3.9/site-packages/torch/__init__.py", line 401, in <module>
    from torch._C import *  # noqa: F403
ImportError: libcufile.so.0: cannot open shared object file: No such file or directory
>>> exit()
[root@18b37257e416 /]# vi /usr/local/lib64/python3.9/site-packages/torch/__init__.py
[root@18b37257e416 /]# python3
Python 3.9.19 (main, Sep 11 2024, 00:00:00)
[GCC 11.5.0 20240719 (Red Hat 11.5.0-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
/usr/local/lib64/python3.9/site-packages/torch/_subclasses/functional_tensor.py:276: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:81.)
  cpu = _conversion_method_template(device=torch.device("cpu"))
>>> torch.__version__
'2.7.0.dev20250227+cu126'
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148137
Approved by: https://github.com/malfet
2025-02-28 00:35:47 +00:00
754fb834db [BE][CI] bump ruff to 0.9.0: string quote styles (#144569)
Reference: https://docs.astral.sh/ruff/formatter/#f-string-formatting

- Change the outer quotes to double quotes for nested f-strings

```diff
- f'{", ".join(args)}'
+ f"{', '.join(args)}"
```

- Change the inner quotes to double quotes for triple f-strings

```diff
  string = """
-     {', '.join(args)}
+     {", ".join(args)}
  """
```

- Join implicitly concatenated strings

```diff
- string = "short string " "short string " f"{var}"
+ string = f"short string short string {var}"
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144569
Approved by: https://github.com/Skylion007
ghstack dependencies: #146509
2025-02-24 19:56:09 +00:00
6a0138fcc1 Torch device backend autoload fix (#145611)
This causes an import failure if an external backend imports a module that uses `torch._as_tensor_fullprec` when it is being loaded.

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145611
Approved by: https://github.com/albanD
2025-01-31 19:27:42 +00:00
2a70de7e92 [CUDA] Change slim-wheel libraries load order (#145638)
There is no libnvjitlink in  CUDA-11.x , so attempts to load it first will abort the execution and prevent the script from preloading nvrtc

Fixes issues reported in https://github.com/pytorch/pytorch/pull/145614#issuecomment-2613107072

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145638
Approved by: https://github.com/atalman, https://github.com/kit1980, https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-01-24 22:00:56 +00:00
9752c7c1c8 [CD] Fix slim-wheel cuda_nvrtc import problem (#145582)
Similar fix as: https://github.com/pytorch/pytorch/pull/144816

Fixes: https://github.com/pytorch/pytorch/issues/145580

Found during testing of https://github.com/pytorch/pytorch/issues/138340

Please note both nvrtc and nvjitlink exist for cuda 11.8, 12.4 and 12.6 hence we can safely remove if statement. Preloading can apply to all supporting cuda versions.

CUDA 11.8 path:
```
(.venv) root@b4ffe5c8ac8c:/pytorch/.ci/pytorch/smoke_test# ls /.venv/lib/python3.12/site-packages/torch/lib/../../nvidia/cuda_nvrtc/lib
__init__.py  __pycache__  libnvrtc-builtins.so.11.8  libnvrtc-builtins.so.12.4  libnvrtc.so.11.2  libnvrtc.so.12
(.venv) root@b4ffe5c8ac8c:/pytorch/.ci/pytorch/smoke_test# ls /.venv/lib/python3.12/site-packages/torch/lib/../../nvidia/nvjitlink/lib
__init__.py  __pycache__  libnvJitLink.so.12
```

Test with rc 2.6 and CUDA 11.8:
```
python cudnn_test.py
2.6.0+cu118
---------------------------------------------SDPA-Flash---------------------------------------------
ALL GOOD
---------------------------------------------SDPA-CuDNN---------------------------------------------
ALL GOOD
```

Thank you @nWEIdia for discovering this issue

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145582
Approved by: https://github.com/nWEIdia, https://github.com/eqy, https://github.com/kit1980, https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2025-01-24 04:47:57 +00:00
f2cfe8b59f PEP585 update - mostly toplevels (#145178)
See #145101 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145178
Approved by: https://github.com/bobrenjc93
2025-01-22 02:21:14 +00:00
323fb4dad0 Unconditionally exclude upper bound in all size oblivious tests (#144867)
I was thinking about https://github.com/pytorch/pytorch/pull/144471 some more and I thought, "Hmm, why not just always exclude the constant upper bound." So here it is.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144867
Approved by: https://github.com/bobrenjc93
2025-01-21 20:44:09 +00:00
505ade7471 [inductor] Simplify mode options, only apply CompilerBisector changes once (#145232)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145232
Approved by: https://github.com/yanboliang
2025-01-21 19:25:46 +00:00
4eea2f7496 [inductor] Fix ignored options for torch.compile (#145131)
#139833 broke `torch.compile(options=...)` so that many (all?) options passed in get completely ignored.  @alexreinking pointed this out when `options={"cpu_backend":"halide"}` did nothing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145131
Approved by: https://github.com/exclamaforte
2025-01-18 03:39:49 +00:00
d21738f24a Revert "Fix torch.normal ignores default_device (#144070)"
This reverts commit 184549b2d7e59acfc6e47d121e9ebb50648945b3.

Reverted https://github.com/pytorch/pytorch/pull/144070 on behalf of https://github.com/ezyang due to broken a specific use case ([comment](https://github.com/pytorch/pytorch/pull/144070#issuecomment-2590681953))
2025-01-14 17:41:58 +00:00
f2975717f3 [CD] Fix slim-wheel nvjit-link import problem (#141063)
When other toolkit (say CUDA-12.3)  is installed and `LD_LIBRARY_PATH` points to there, import torch will fail with
```
ImportError: /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12: undefined symbol: __nvJitLinkComplete_12_4, version libnvJitLink.so.12
```
It could not be worked around by tweaking rpath, as it also depends on the library load order, which are not guaranteed by any linker. Instead solve this by preloading `nvjitlink` right after global deps are loaded, by running something along the lines of the following
```python
        if version.cuda in ["12.4", "12.6"]:
            with open("/proc/self/maps") as f:
                _maps = f.read()
            # libtorch_global_deps.so always depends in cudart, check if its installed via wheel
            if "nvidia/cuda_runtime/lib/libcudart.so" in _maps:
                # If all abovementioned conditions are met, preload nvjitlink
                _preload_cuda_deps("nvjitlink", "libnvJitLink.so.*[0-9]")
```

Fixes https://github.com/pytorch/pytorch/issues/140797

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141063
Approved by: https://github.com/kit1980

Co-authored-by: Sergii Dymchenko <sdym@meta.com>
2025-01-14 17:33:07 +00:00
ffb3f32693 Add max kwarg to torch._check with alternate size oblivious semantics (#144471)
Fixes https://github.com/pytorch/pytorch/issues/120288 for the static bound case

I had been tying myself in knots in the original issue about the fact that we can't really do symbolic bounds like u0 < s0. But then I realized, "Wait, but the static bounds are easy!" So this makes it so you can also exclude a specific upper bound when doing size oblivious tests, which is enough to solve https://github.com/pytorch/pytorch/issues/123592#issuecomment-2574556708

It's written very dirtily, maybe there's some cleanup. Bikeshed on the public API name also welcome.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144471
Approved by: https://github.com/avikchaudhuri
2025-01-14 15:10:57 +00:00
184549b2d7 Fix torch.normal ignores default_device (#144070)
Fixes #122886

1. Enable `torch.normal` working with `DeviceContext` to get default device which set via `set_default_device`.
2. Add hint in `set_default_device` doc, suggest use `torch.Tensor.to` method move to desired device explicitly.

**Test Result**
1. **Doc Preview**
![image](https://github.com/user-attachments/assets/eb69c334-be2b-4dc5-bdce-567da21e1635)

2. **Local Test**
```python
>>> import torch
>>> torch.normal(0.,1., (10,10)).device
device(type='cpu')
>>> torch.set_default_device('cuda')
>>> torch.normal(0.,1., (10,10)).device
device(type='cuda', index=0)
```

```bash
pytest test/test_tensor_creation_ops.py
```

![image](https://github.com/user-attachments/assets/8b466b55-f162-4b83-8b20-71de2c1d0914)

```bash
lintrunner
```
![image](https://github.com/user-attachments/assets/5b269c50-da57-47ed-8500-4edf2c2295e4)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144070
Approved by: https://github.com/ezyang
2025-01-10 08:19:55 +00:00
2b241a8206 Amazon Linux 2023: Preload cusparseLt.so (#144477)
Fixes https://github.com/pytorch/pytorch/issues/144433

Test with some debug statements added:

```
>>> import torch
trying to load libcublas.so.*[0-9] from ['/usr/local/lib/python3.9/site-packages/nvidia/cublas/lib/libcublas.so.12']
trying to load libcublas.so.*[0-9] from /usr/local/lib/python3.9/site-packages/nvidia/cublas/lib/libcublas.so.12
trying to load libcudnn.so.*[0-9] from ['/usr/local/lib/python3.9/site-packages/nvidia/cudnn/lib/libcudnn.so.9']
trying to load libcudnn.so.*[0-9] from /usr/local/lib/python3.9/site-packages/nvidia/cudnn/lib/libcudnn.so.9
trying to load libnvrtc.so.*[0-9] from ['/usr/local/lib/python3.9/site-packages/nvidia/cuda_nvrtc/lib/libnvrtc.so.12']
trying to load libnvrtc.so.*[0-9] from /usr/local/lib/python3.9/site-packages/nvidia/cuda_nvrtc/lib/libnvrtc.so.12
trying to load libcudart.so.*[0-9] from ['/usr/local/lib/python3.9/site-packages/nvidia/cuda_runtime/lib/libcudart.so.12']
trying to load libcudart.so.*[0-9] from /usr/local/lib/python3.9/site-packages/nvidia/cuda_runtime/lib/libcudart.so.12
trying to load libcupti.so.*[0-9] from ['/usr/local/lib/python3.9/site-packages/nvidia/cuda_cupti/lib/libcupti.so.12']
trying to load libcupti.so.*[0-9] from /usr/local/lib/python3.9/site-packages/nvidia/cuda_cupti/lib/libcupti.so.12
trying to load libcufft.so.*[0-9] from ['/usr/local/lib/python3.9/site-packages/nvidia/cufft/lib/libcufft.so.11']
trying to load libcufft.so.*[0-9] from /usr/local/lib/python3.9/site-packages/nvidia/cufft/lib/libcufft.so.11
trying to load libcurand.so.*[0-9] from ['/usr/local/lib/python3.9/site-packages/nvidia/curand/lib/libcurand.so.10']
trying to load libcurand.so.*[0-9] from /usr/local/lib/python3.9/site-packages/nvidia/curand/lib/libcurand.so.10
trying to load libnvJitLink.so.*[0-9] from ['/usr/local/lib/python3.9/site-packages/nvidia/nvjitlink/lib/libnvJitLink.so.12']
trying to load libnvJitLink.so.*[0-9] from /usr/local/lib/python3.9/site-packages/nvidia/nvjitlink/lib/libnvJitLink.so.12
trying to load libcusparse.so.*[0-9] from ['/usr/local/lib/python3.9/site-packages/nvidia/cusparse/lib/libcusparse.so.12']
trying to load libcusparse.so.*[0-9] from /usr/local/lib/python3.9/site-packages/nvidia/cusparse/lib/libcusparse.so.12
trying to load libcusparseLt.so.*[0-9] from []
trying to load libcusparseLt.so.*[0-9] from /usr/local/lib/python3.9/site-packages/cusparselt/lib/libcusparseLt.so.0
trying to load libcusolver.so.*[0-9] from ['/usr/local/lib/python3.9/site-packages/nvidia/cusolver/lib/libcusolver.so.11']
trying to load libcusolver.so.*[0-9] from /usr/local/lib/python3.9/site-packages/nvidia/cusolver/lib/libcusolver.so.11
trying to load libnccl.so.*[0-9] from ['/usr/local/lib/python3.9/site-packages/nvidia/nccl/lib/libnccl.so.2']
trying to load libnccl.so.*[0-9] from /usr/local/lib/python3.9/site-packages/nvidia/nccl/lib/libnccl.so.2
trying to load libnvToolsExt.so.*[0-9] from ['/usr/local/lib/python3.9/site-packages/nvidia/nvtx/lib/libnvToolsExt.so.1']
trying to load libnvToolsExt.so.*[0-9] from /usr/local/lib/python3.9/site-
packages/nvidia/nvtx/lib/libnvToolsExt.so.1
/usr/local/lib64/python3.9/site-packages/torch/_subclasses/functional_tensor.py:275: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:81.)
  cpu = _conversion_method_template(device=torch.device("cpu"))
>>> exit()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144477
Approved by: https://github.com/Skylion007, https://github.com/nWEIdia
2025-01-09 20:04:11 +00:00
f700035090 [3.13t] use sysconfig to check for Python nogil builds (#144361)
`sys._is_gil_enabled()` wasn't working in certain cases, according to @atalman

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144361
Approved by: https://github.com/atalman
2025-01-08 13:00:32 +00:00
dc55704b48 Rename cache limit to recompile limit in configs (#143709)
This PR renames every cache_limit to recompile_limit via sed.

Old config options are maintained via Config(alias='xyz')

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143709
Approved by: https://github.com/jansel
2024-12-22 10:03:57 +00:00
e1e83015d2 [dynamo, 3.13t] raise error if torch.compile is attempted in 3.13t (nogil) (#143404)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143404
Approved by: https://github.com/colesbury, https://github.com/atalman
2024-12-19 18:10:01 +00:00
f8c212a925 Transform unbacked int expressions into a fresh unbacked int. (#141917)
Fix: #141419

This PR introduces the `torch.sym_fresh_size` API, which transforms an unbacked int
expression into a fresh unbacked int.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141917
Approved by: https://github.com/ezyang
2024-12-05 16:53:44 +00:00
416f500bfe [CI, 3.13] enable 3.13 CI (#139533)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139533
Approved by: https://github.com/atalman, https://github.com/malfet
ghstack dependencies: #141409, #142003, #141572, #141577, #141605, #141621, #141623, #141673, #141674, #141858, #141862
2024-12-05 00:25:03 +00:00
ee7eaad5c3 [dynamo] add SymNode bitwise and/or (#138777)
Fixes [T203472723](https://www.internalfb.com/intern/tasks/?t=203472723)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138777
Approved by: https://github.com/ezyang
2024-11-22 23:36:16 +00:00
2239d1a7a3 Revert "[CI, 3.13] enable 3.13 CI (#139533)"
This reverts commit b7a25c1ee7cdb559516db2b10279c996742a1708.

Reverted https://github.com/pytorch/pytorch/pull/139533 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing test_cpp_extensions_open_device_registration. The test was wrongly excluded by TD ([comment](https://github.com/pytorch/pytorch/pull/139533#issuecomment-2494328806))
2024-11-22 17:18:49 +00:00
b7a25c1ee7 [CI, 3.13] enable 3.13 CI (#139533)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139533
Approved by: https://github.com/atalman, https://github.com/malfet
2024-11-22 14:43:02 +00:00