This PR enables all PIE rules on ruff, there are already some enabled rules from this family, the new added rules are
```
PIE796 Enum contains duplicate value: {value}
PIE808 Unnecessary start argument in range
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165814
Approved by: https://github.com/ezyang
This PR enables all PIE rules on ruff, there are already some enabled rules from this family, the new added rules are
```
PIE796 Enum contains duplicate value: {value}
PIE808 Unnecessary start argument in range
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165814
Approved by: https://github.com/ezyang
This change addresses confusing error messages users encounter when using the ONNX exporter with default settings. Previously, `fallback=True` was the default, which would attempt to fall back to the TorchScript exporter when the dynamo path failed, leading to mixed error messages that obscured the actual issues.
## Problem
When `fallback=True` by default:
- Users get confusing error messages mixing dynamo and TorchScript export failures
- Error messages tell users to provide the `f` argument unnecessarily
- Dynamo error messages get flushed with TorchScript errors when both paths fail
- Users expecting the dynamo path get unexpected fallback behavior
## Solution
Changed the default from `fallback=True` to `fallback=False` in both:
- `torch.onnx.export()` function
- `torch.onnx._internal.exporter._compat.export_compat()` function
## Impact
**Before:**
```python
# Would fallback to TorchScript on dynamo failure, causing mixed error messages
torch.onnx.export(model, args)
```
**After:**
```python
# Clean dynamo-only errors by default
torch.onnx.export(model, args)
# Advanced users can still opt-in to fallback behavior
torch.onnx.export(model, args, fallback=True)
```
Fixes#162697
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162726
Approved by: https://github.com/titaiwangms, https://github.com/xadupre
## Summary
This PR improves typing in ONNX-related modules by replacing TypeVar bound to Callable[..., Any] with ParamSpec to preserve parameter types and avoid type erasure in decorator functions.
## Changes
- `torch/onnx/_internal/exporter/_flags.py`: Replace TCallable TypeVar with ParamSpec
- `torch/onnx/ops/_impl.py`: Replace _T TypeVar with ParamSpec for _onnx_op decorator
- `torch/onnx/_internal/exporter/_torchlib/_torchlib_registry.py`: Replace _T TypeVar with ParamSpec
## Motivation
The previous implementation used TypeVar bound to Callable which erased parameter type information to Any. ParamSpec preserves the exact parameter types and return types, providing better type safety and IDE support.
## Testing
- Verified all changes compile and import correctly
- Created comprehensive test suite to validate ParamSpec functionality
- No linting errors introduced
- Maintains backward compatibility
Fixes#142306
Pull Request resolved: https://github.com/pytorch/pytorch/pull/162332
Approved by: https://github.com/Skylion007
Refactor torchscript based exporter logic to move them to a single (private) location for better code management. Original public module and method apis are preserved.
- Updated module paths in `torch/csrc/autograd/python_function.cpp` accordingly
- Removed `check_onnx_broadcast` from `torch/autograd/_functions/utils.py` because it is private&unused
@albanD / @soulitzer could you review changes in `torch/csrc/autograd/python_function.cpp` and
`torch/autograd/_functions/utils.py`? Thanks!
## BC Breaking
- **Deprecated members in `torch.onnx.verification` are removed**
Differential Revision: [D81236421](https://our.internmc.facebook.com/intern/diff/D81236421)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161323
Approved by: https://github.com/titaiwangms, https://github.com/angelayi
If onnx exporter fallbacks to draft_export with big models, this is taking forever for users, and possibly spam the printout, which keeps users from their stack trace with strict=False.
We could consider make another API for draft_export as debugging tool, or combine it with report=True when "model is small"?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161454
Approved by: https://github.com/justinchuby
Remove enable_fake_mode and exporter_legacy entirely. Even though this is bc breaking, `enable_fake_mode` is no longer compatible with the latest version of transformers, and so it is no longer useful.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161222
Approved by: https://github.com/titaiwangms
TL;DR: Cuts vLLM cudagraph collection from 80s -> 24s
Stop garbage collecting by default on every cudagraph recording. The old behavior can be re-enabled by setting `TORCH_CUDAGRAPH_GC=1` or the config `force_cudagraph_gc`.
We were previously garbage collecting at the beginning of each cudagraph
capture. vLLM collects 5427 graphs and most of those garbage collections weren't
actually collecting any memory (CPU or GPU). This changes it to not collect more
than every 10s so if we're capturing in a loop we don't burn all our cycles
looking for garbage.
(These number have a lot of variance from run to run but give the correct
general scale)
```
| calls | total | synchronize | gcs | collect | empty cache | sys freed | cuda freed |
-------+-------+-------+-------------+------+---------+-------------+-----------+------------+
before | 5427 | 78s | 1.48s | 5427 | 53.22s | 1.21s | 145855 | 1539309568 |
-------+-------+-------+-------------+------+---------+-------------+-----------+------------+
after | 5427 | 24s | 0s | 3 | 1.53s | 0.84s | 592 | 1539309568 |
-------+-------+-------+-------------+------+---------+-------------+-----------+------------+
```
total - this is the total time reported by vLLM's "Graph capturing finished" log.
The rest of these are measured in torch.cuda.graphs.graph.__enter__():
calls - number of times torch.cuda.graphs.graph.__enter__ was called
synchronize - this is the duration taken by the cuda.synchronize call
gcs - number of times gc.collect was called
collect - this is the duration taken by the gc.collect call
empty cache - this is the duration taken by the torch.cuda.empty_cache call
sys freed - the number of bytes reported freed by gc.collect
cuda freed - the number of bytes reported freed by torch.cuda.memory_reserved
So it seems like the heavy lifting is done by torch.cuda.empty_cache() which is
fairly quick.
Cudagraph results from the TorchInductor Performance DashBoard (this is from the original version using the GC clock so the real results will be slightly better than this):
<img width="1494" height="382" alt="image" src="https://github.com/user-attachments/assets/69b705ef-47ce-4b6e-9733-1ec941cad93d" />
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158193
Approved by: https://github.com/ngimel
Differential Revision: D78431075
For #158366
- Calls runtime asserts pass for HOP subgraphs (in reenter_make_fx)
- For while_loop only (can be expanded), clones input tensors for subgraph tracing, so unbacked memos (item, nonzero, etc.) aren't reused
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158467
Approved by: https://github.com/ydwu4
Fix PyTorch tensor copying warning in ONNX export
## Problem
PyTorch ONNX exporter was generating a warning about incorrect tensor copying method:
```
UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158835
Approved by: https://github.com/justinchuby