It's not fully clear why these are not being created, but you can definitely
reproduce this in code. `__name__` is fun, since there appears to be no way to
explicitly set it on the pybind11 layer or c++ layer. I've set this in the
python wrapper code (which works correctly). But let me know if people feel
strongly and want us to go explicitly cast to python within the cpp functions
and set it there.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147906
Approved by: https://github.com/jansel
ghstack dependencies: #147894
Fixes#10536
Reattempt of #61467. Thank you so much to @mskoh52 for your excellent work!
As I was trying to create a more efficient LLM data collator, I realized that `pad_sequence` only supports right padding, even though left padding is a very common format for LLMs, like Llama and Mistral.
The proposed alternative implementation was to use multiple flips, which tends to be 1.5x-2x slower. Instead we can add a [`padding_side` parameter as there is for for Hugging Face tokenizers](9d6c0641c4/src/transformers/tokenization_utils_base.py (L1565)), which requires only a very small change in the C++ code.
Here are the benchmarks of the new implementation!
`float32`:

`bool`:

Code:
```python
from __future__ import annotations
import random
import time
from typing import Literal
import numpy as np
import torch
def pad_sequence_with_flips(
sequences: list[torch.Tensor],
batch_first: bool = False,
padding_value: int | float | bool = 0.0,
padding_side: Literal["left", "right"] | str = "left",
) -> torch.Tensor:
if padding_side == 'right':
padded_sequence = torch._C._nn.pad_sequence([t.flatten() for t in sequences], batch_first=batch_first, padding_value=padding_value)
elif padding_side=='left':
padded_sequence = torch._C._nn.pad_sequence([t.flatten().flip(0) for t in sequences], batch_first=batch_first, padding_value=padding_value) # pyright: ignore[reportArgumentType]
padded_sequence = padded_sequence.flip(int(batch_first))
else:
raise ValueError(f"padding_side should be either 'right' or 'left', but got {padding_side}")
return padded_sequence
sequence_lengths: list[int] = []
flip_left_pad_times: list[float] = []
flip_left_pad_times_std: list[float] = []
left_pad_times: list[float] = []
left_pad_times_std: list[float] = []
RUNS_PER_LOOP: int = 100
for i in range(1, 7):
sequence_length = i * int(1e6) // 6
sequence_lengths.append(sequence_length)
sequences = [torch.randint(0, 2, (random.randint(1, sequence_length),), dtype=torch.bool) for _ in range(64)]
inner_left_pad_times: list[float] = []
inner_right_pad_times: list[float] = []
inner_flip_left_pad_times: list[float] = []
inner_flip_right_pad_times: list[float] = []
for _ in range(RUNS_PER_LOOP):
start = time.perf_counter()
torch._C._nn.pad_sequence(sequences, batch_first=True, padding_value=False, padding_side="left")
end = time.perf_counter()
inner_left_pad_times.append(end - start)
start = time.perf_counter()
pad_sequence_with_flips(sequences, batch_first=True, padding_value=False, padding_side="left")
end = time.perf_counter()
inner_flip_left_pad_times.append(end - start)
left_pad_times.append(sum(inner_left_pad_times) / len(inner_left_pad_times))
left_pad_times_std.append(np.std(inner_left_pad_times))
flip_left_pad_times.append(sum(inner_flip_left_pad_times) / len(inner_flip_left_pad_times))
flip_left_pad_times_std.append(np.std(inner_flip_left_pad_times))
print(f"Sequence Length: {sequence_length}, Left Pad Time: {left_pad_times[-1]}, Left with Flips Pad Time: {flip_left_pad_times[-1]}")
import matplotlib.pyplot as plt
plt.plot(sequence_lengths, left_pad_times, label="new pad_sequence left")
plt.scatter(sequence_lengths, left_pad_times)
plt.errorbar(sequence_lengths, left_pad_times, yerr=left_pad_times_std, linestyle='None', marker='^')
plt.plot(sequence_lengths, flip_left_pad_times, label="old pad_sequence left (2 flips)")
plt.scatter(sequence_lengths, flip_left_pad_times)
plt.errorbar(sequence_lengths, flip_left_pad_times, yerr=flip_left_pad_times_std, linestyle='None', marker='^')
plt.xlabel("Sequence Length")
plt.ylabel("Time (s)")
plt.legend(loc="upper right")
# Sequence Length: 166666, Left Pad Time: 0.06147645162009212, Left with Flips Pad Time: 0.09842291727001794
# Sequence Length: 333333, Left Pad Time: 0.08933195920990329, Left with Flips Pad Time: 0.15597836187991562
# Sequence Length: 500000, Left Pad Time: 0.08863158334006585, Left with Flips Pad Time: 0.15224887342999863
# Sequence Length: 666666, Left Pad Time: 0.10524682551997103, Left with Flips Pad Time: 0.18177212480995877
# Sequence Length: 833333, Left Pad Time: 0.11801802741003485, Left with Flips Pad Time: 0.20821274195001024
# Sequence Length: 1000000, Left Pad Time: 0.131894061660023, Left with Flips Pad Time: 0.23223503091008751
```
Co-authored-by: mskoh52 <mskoh52@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131884
Approved by: https://github.com/ezyang
Add similar semantics for creating a buffer object similar to creating a parameter. This is done by introducing a new Buffer class that can be used for type disambiguation. The underlying functionality of registering a buffer remains the same as the register_buffer method has not been changed. The persistent parameter in the Buffer type is to indicate whether a buffer object should be persistent or not. Other non-test changes have to do with getting the new Buffer type recognized by inductor and dynamo. Remaining changes are test changes to make sure that the Buffer type can be used as a drop in replacement for register_buffer as it just leads to register_buffer being called. The addition of this new functionality still allows for normal tensors to be used as buffers so these changes are intended to be backwards compatible.
Fixes#35735
Co-authored-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125971
Approved by: https://github.com/albanD, https://github.com/anijain2305, https://github.com/mlazos
We don't care about the Dynamo x TorchScript composition, so I'm
disabling these tests (so they don't get reported as flaky). Not
disabling all of the TorchScript tests yet because they have been useful
to catch random bugs.
Test Plan:
- CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128731
Approved by: https://github.com/williamwen42
Adds a ruff lint rule to ban raising raw exceptions. Most of these should at the very least be runtime exception, value errors, type errors or some other errors. There are hundreds of instance of these bad exception types already in the codebase, so I have noqa'd most of them. Hopefully this error code will get commiters to rethink what exception type they should raise when they submit a PR.
I also encourage people to gradually go and fix all the existing noqas that have been added so they can be removed overtime and our exception typing can be improved.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124570
Approved by: https://github.com/ezyang
Fixes#115331.
This PR increases the number of valid GPU devices to 512 (from 64) in order to future-proof PyTorch for providers that offer [single nodes with a large device count](https://www.tensorwave.com/). Until now, `DeviceIndex` was an `int8_t`, thus multiple changes were necessary:
- `DeviceIndex` changed to `int16_t`. Updated consumers that assume it to be an `int8_t`.
- Updated bounds checking for `torch.device()` in the Python frontend. Right now, we allow funny things like `torch.device('cpu', 200).index == -56`, which is undefined behavior. I inserted some checks to only allow values between 0 and `c10::Device::MAX_NUM_DEVICES - 1`.
- Updated the `ArgumentInfo` struct as it hardcodes the device index as 8 bit field [^1]. Might be a breaking change, not sure if users rely on this.
- Introduced `c10::Device::MAX_NUM_DEVICES` as a replacement for the old `C10_COMPILE_TIME_MAX_GPUS`
[^1]: This field was unsigned, so I guess this has also been undef behavior the whole time? Our default device index is -1, so this always wrapped around to 255 when written to the `ArgumentInfo` struct. When I switched the `DeviceIndex` to `int16_t`, it actually stayed 255 after unpacking from `ArgumentInfo` again, as the `DeviceIndex` was now wide enough that it didn't wrap back to -1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119639
Approved by: https://github.com/cyyever, https://github.com/albanD, https://github.com/huydhn
Fixes#115331.
This PR increases the number of valid GPU devices to 512 (from 64) in order to future-proof PyTorch for providers that offer [single nodes with a large device count](https://www.tensorwave.com/). Until now, `DeviceIndex` was an `int8_t`, thus multiple changes were necessary:
- `DeviceIndex` changed to `int16_t`. Updated consumers that assume it to be an `int8_t`.
- Updated bounds checking for `torch.device()` in the Python frontend. Right now, we allow funny things like `torch.device('cpu', 200).index == -56`, which is undefined behavior. I inserted some checks to only allow values between 0 and `c10::Device::MAX_NUM_DEVICES - 1`.
- Updated the `ArgumentInfo` struct as it hardcodes the device index as 8 bit field [^1]. Might be a breaking change, not sure if users rely on this.
- Introduced `c10::Device::MAX_NUM_DEVICES` as a replacement for the old `C10_COMPILE_TIME_MAX_GPUS`
[^1]: This field was unsigned, so I guess this has also been undef behavior the whole time? Our default device index is -1, so this always wrapped around to 255 when written to the `ArgumentInfo` struct. When I switched the `DeviceIndex` to `int16_t`, it actually stayed 255 after unpacking from `ArgumentInfo` again, as the `DeviceIndex` was now wide enough that it didn't wrap back to -1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119639
Approved by: https://github.com/cyyever, https://github.com/albanD
This PR relands #108238 that was closed as stale due to CLA issues and also because the CI check has marked the PR as not mergeable.
Repro 1:
```python
import torch
def f(x):
return x[x > 0]
jf = torch.jit.trace(f, torch.tensor(2., device="cuda"))
```
Error:
```bash
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/pytorch/torch/jit/_trace.py", line 874, in trace
traced = torch._C._create_function_from_trace(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<stdin>", line 2, in f
RuntimeError: NYI: Named tensors are not supported with the tracer
```
Repro2:
```python
import torch
import torch.nn.functional as F
from torch import nn
import copy
class Net(nn.Module):
def __init__(self):
super().__init__()
def forward(self, inputs):
x = copy.deepcopy(inputs) # RuntimeError: NYI: Named tensors are not supported with the tracer
x = F.relu(x)
return x
model = Net()
images = torch.randn(8, 28, 28)
torch.jit.trace(model, images)
```
Error 2:
```bash
Traceback (most recent call last):
File "/opt/pytorch/test_deepcopy.py", line 18, in <module>
File "/opt/pytorch/torch/jit/_trace.py", line 806, in trace
return trace_module(
^^^^^^^^^^^^^
File "/opt/pytorch/torch/jit/_trace.py", line 1074, in trace_module
module._c._create_method_from_trace(
File "/opt/pytorch/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/pytorch/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/pytorch/torch/nn/modules/module.py", line 1501, in _slow_forward
result = self.forward(*input, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/pytorch/test_deepcopy.py", line 12, in forward
x = F.relu(x)
^^^^^^^^^^
File "/opt/conda/envs/ptca/lib/python3.11/copy.py", line 153, in deepcopy
y = copier(memo)
^^^^^^^^^^^^
File "/opt/pytorch/torch/_tensor.py", line 122, in __deepcopy__
new_storage = self._typed_storage()._deepcopy(memo)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/pytorch/torch/storage.py", line 847, in _deepcopy
return self._new_wrapped_storage(copy.deepcopy(self._untyped_storage, memo))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/envs/ptca/lib/python3.11/copy.py", line 153, in deepcopy
y = copier(memo)
^^^^^^^^^^^^
File "/opt/pytorch/torch/storage.py", line 112, in __deepcopy__
new_storage = self.clone()
^^^^^^^^^^^^
File "/opt/pytorch/torch/storage.py", line 126, in clone
return type(self)(self.nbytes(), device=self.device).copy_(self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: NYI: Named tensors are not supported with the tracer
```
----
#48054 RuntimeError: NYI: Named tensors are not supported with the tracer
#49538 jit tracer doesn't work with unflatten layer
#31591 when i try to export a pytorch model to ONNX, got RuntimeError: output of traced region did not have observable data dependence with trace inputs; this probably indicates your program cannot be understood by the tracer.
- This bug was closed but exists. Multiple comments on it still showing error. This is addressed
Likely fixes the following issues (but untested)
#63297 Named tensor in tracer
#2323 [Bug] torch.onnx.errors.UnsupportedOperatorError when convert mask2former to onnx
Fix zero dimensioned tensors when used with jit.trace They are currently assigned an empty set for names {} this is not the same as "no name" so jit.trace bails with
"NYI: Named tensors are not supported with the tracer"
This happens when I am trying to save a non-trivial model as onnx but the simplest repro I have seen is 48054 above which has been added as test/jit/test_zero_dim_tensor_trace.py
Test plan:
New unit test added
Broken scenarios tested locally
CI
Fixes#48054
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118393
Approved by: https://github.com/zou3519
* Enable PERF402. Makes code more efficient and succinct by removing useless list copies that could be accomplished either via a list constructor or extend call. All test cases have noqa added since performance is not as sensitive in that folder.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115505
Approved by: https://github.com/malfet