mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00


Matthew Hoffman 258f47fc0b Add padding_side to pad_sequence with "left" and "right" options ("right" as default) (#131884)

Fixes #10536

Reattempt of #61467. Thank you so much to @mskoh52 for your excellent work!

As I was trying to create a more efficient LLM data collator, I realized that `pad_sequence` only supports right padding, even though left padding is a very common format for LLMs, like Llama and Mistral.

The proposed alternative implementation was to use multiple flips, which tends to be 1.5x-2x slower. Instead, we can add a [`padding_side` parameter as there is for Hugging Face tokenizers](9d6c0641c4/src/transformers/tokenization_utils_base.py (L1565)), which requires only a very small change in the C++ code.
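To make the two approaches concrete, here is a minimal pure-Python sketch on plain lists. It only illustrates the semantics of left versus right padding and why the flip-based workaround gives the same result; it is not the C++ change in this PR, and the helper names `pad_lists` and `pad_left_with_flips` are hypothetical.

```python
from __future__ import annotations


def pad_lists(sequences, padding_value=0, padding_side="right"):
    """Pad variable-length lists to a common length (illustration only)."""
    if padding_side not in ("left", "right"):
        raise ValueError(
            f"padding_side should be either 'right' or 'left', but got {padding_side}"
        )
    max_len = max(len(s) for s in sequences)
    padded = []
    for s in sequences:
        fill = [padding_value] * (max_len - len(s))
        # Left padding puts the fill before the data, right padding after it.
        padded.append(fill + list(s) if padding_side == "left" else list(s) + fill)
    return padded


def pad_left_with_flips(sequences, padding_value=0):
    """Emulate left padding with right padding plus two flips (the slower workaround)."""
    flipped = [list(reversed(s)) for s in sequences]
    right_padded = pad_lists(flipped, padding_value, "right")
    return [list(reversed(row)) for row in right_padded]


batch = [[1, 2, 3], [4, 5], [6]]
print(pad_lists(batch, padding_side="left"))  # [[1, 2, 3], [0, 4, 5], [0, 0, 6]]
print(pad_left_with_flips(batch))             # same result, with two extra passes over the data
```

The flip version touches every element two extra times, which is consistent with the overhead the benchmarks below measure on real tensors.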

Here are the benchmarks of the new implementation!

`float32`:

![eaaa95ef-9384-45d2-be56-6898bc1d3514](https://github.com/user-attachments/assets/3b0eb309-e5a0-4a4d-97bb-4e3298783dbb)

`bool`:

![892f32da-8d9a-492b-9507-18d3f0a41e8e](https://github.com/user-attachments/assets/6824ea15-7d4e-4b89-95f0-8546635f0c2e)

Code:

```python
from __future__ import annotations

import random
import time
from typing import Literal

import numpy as np
import torch

def pad_sequence_with_flips(
    sequences: list[torch.Tensor],
    batch_first: bool = False,
    padding_value: int | float | bool = 0.0,
    padding_side: Literal["left", "right"] | str = "left",
) -> torch.Tensor:
    if padding_side == 'right':
        padded_sequence = torch._C._nn.pad_sequence([t.flatten() for t in sequences], batch_first=batch_first, padding_value=padding_value)
    elif padding_side=='left':
        padded_sequence = torch._C._nn.pad_sequence([t.flatten().flip(0) for t in sequences], batch_first=batch_first, padding_value=padding_value)  # pyright: ignore[reportArgumentType]
        padded_sequence = padded_sequence.flip(int(batch_first))
    else:
        raise ValueError(f"padding_side should be either 'right' or 'left', but got {padding_side}")

    return padded_sequence

sequence_lengths: list[int] = []

flip_left_pad_times: list[float] = []
flip_left_pad_times_std: list[float] = []

left_pad_times: list[float] = []
left_pad_times_std: list[float] = []

RUNS_PER_LOOP: int = 100

for i in range(1, 7):
    sequence_length = i * int(1e6) // 6
    sequence_lengths.append(sequence_length)

    sequences = [torch.randint(0, 2, (random.randint(1, sequence_length),), dtype=torch.bool) for _ in range(64)]

    inner_left_pad_times: list[float] = []
    inner_right_pad_times: list[float] = []

    inner_flip_left_pad_times: list[float] = []
    inner_flip_right_pad_times: list[float] = []

    for _ in range(RUNS_PER_LOOP):

        start = time.perf_counter()
        torch._C._nn.pad_sequence(sequences, batch_first=True, padding_value=False, padding_side="left")
        end = time.perf_counter()
        inner_left_pad_times.append(end - start)

        start = time.perf_counter()
        pad_sequence_with_flips(sequences, batch_first=True, padding_value=False, padding_side="left")
        end = time.perf_counter()
        inner_flip_left_pad_times.append(end - start)

    left_pad_times.append(sum(inner_left_pad_times) / len(inner_left_pad_times))
    left_pad_times_std.append(np.std(inner_left_pad_times))

    flip_left_pad_times.append(sum(inner_flip_left_pad_times) / len(inner_flip_left_pad_times))
    flip_left_pad_times_std.append(np.std(inner_flip_left_pad_times))

    print(f"Sequence Length: {sequence_length}, Left Pad Time: {left_pad_times[-1]}, Left with Flips Pad Time: {flip_left_pad_times[-1]}")

import matplotlib.pyplot as plt

plt.plot(sequence_lengths, left_pad_times, label="new pad_sequence left")
plt.scatter(sequence_lengths, left_pad_times)
plt.errorbar(sequence_lengths, left_pad_times, yerr=left_pad_times_std, linestyle='None', marker='^')

plt.plot(sequence_lengths, flip_left_pad_times, label="old pad_sequence left (2 flips)")
plt.scatter(sequence_lengths, flip_left_pad_times)
plt.errorbar(sequence_lengths, flip_left_pad_times, yerr=flip_left_pad_times_std, linestyle='None', marker='^')

plt.xlabel("Sequence Length")
plt.ylabel("Time (s)")
plt.legend(loc="upper right")

# Sequence Length: 166666, Left Pad Time: 0.06147645162009212, Left with Flips Pad Time: 0.09842291727001794
# Sequence Length: 333333, Left Pad Time: 0.08933195920990329, Left with Flips Pad Time: 0.15597836187991562
# Sequence Length: 500000, Left Pad Time: 0.08863158334006585, Left with Flips Pad Time: 0.15224887342999863
# Sequence Length: 666666, Left Pad Time: 0.10524682551997103, Left with Flips Pad Time: 0.18177212480995877
# Sequence Length: 833333, Left Pad Time: 0.11801802741003485, Left with Flips Pad Time: 0.20821274195001024
# Sequence Length: 1000000, Left Pad Time: 0.131894061660023, Left with Flips Pad Time: 0.23223503091008751
```

Co-authored-by: mskoh52 <mskoh52@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131884
Approved by: https://github.com/ezyang

2024-08-07 15:53:07 +00:00

any.cpp

…

autograd.cpp

[9/N] Replace c10::optional with std::optional (#130674)

2024-07-15 00:48:43 +00:00

CMakeLists.txt

Remove more unused variables in tests (#127510)

2024-05-31 03:39:45 +00:00

dataloader.cpp

Fix typo under test directory (#111304)

2023-10-16 23:06:06 +00:00

dispatch.cpp

…

enum.cpp

[2/N] Move c10::variant to std::variant (#109723)

2023-09-24 02:47:43 +00:00

expanding-array.cpp

…

fft.cpp

…

functional.cpp

Add torch check for dtype within bilinear (#118900)

2024-02-03 00:02:00 +00:00

grad_mode.cpp

…

inference_mode.cpp

Add Warning class and refactor C++ warnings to use it (#84101)

2022-10-18 20:02:42 +00:00

init_baseline.h

…

init_baseline.py

[BE][Easy][7/19] enforce style for empty lines in import segments in test/[a-c]*/ and test/[q-z]*/ (#129758)

2024-07-31 10:54:03 +00:00

init.cpp

…

integration.cpp

…

ivalue.cpp

Use object identity for deepcopy memo (#126126)

2024-05-17 00:06:26 +00:00

jit.cpp

…

memory.cpp

[codemod] c10:optional -> std::optional (#126135)

2024-05-14 19:35:51 +00:00

meta_tensor.cpp

…

misc.cpp

…

module.cpp

[nn] zero_grad() set_to_none default True (#92731)

2023-01-26 01:04:28 +00:00

moduledict.cpp

[fix] nn c++ : segfault in modulelist and moduledict (#93074)

2023-01-27 12:20:19 +00:00

modulelist.cpp

[fix] nn c++ : segfault in modulelist and moduledict (#93074)

2023-01-27 12:20:19 +00:00

modules.cpp

[structural binding][11/N] Replace std::tie with structural binding (#130830)

2024-07-18 00:45:06 +00:00

namespace.cpp

…

nested_int.cpp

Rename singleton int to nested int (#119661)

2024-02-16 19:21:17 +00:00

nested.cpp

Add python nested_tensor and as_nested_tensor constructors in torch.nested (#85593)

2022-09-28 20:15:02 +00:00

nn_utils.cpp

Add padding_side to pad_sequence with "left" and "right" options ("right" as default) (#131884)

2024-08-07 15:53:07 +00:00

operations.cpp

…

optim_baseline.h

…

optim_baseline.py

UFMT formatting on test/autograd test/ao test/cpp test/backends (#123369)

2024-04-05 18:51:38 +00:00

optim.cpp

include scheduler_on_plateau in optim.h (#121722)

2024-03-27 19:45:25 +00:00

ordered_dict.cpp

…

parallel_benchmark.cpp

…

parallel.cpp

…

parameterdict.cpp

…

parameterlist.cpp

…

README.md

…

rnn.cpp

Fixed crash when calling pad_packed_tensor when packed with cuda tensors and ensure_sorted=false due to indexing with tensors on different devices (#115028)

2023-12-07 18:09:18 +00:00

sequential.cpp

…

serialize.cpp

Remove more unused variables in tests (#127510)

2024-05-31 03:39:45 +00:00

special.cpp

…

static.cpp

Remove unused type traits in torch/csrc/utils (#128799)

2024-06-27 23:51:18 +00:00

support.cpp

…

support.h

[BE] Add missing override to remove build warning spam (#107191)

2023-08-15 17:32:34 +00:00

tensor_cuda.cpp

…

tensor_flatten.cpp

Fix typo under test directory (#111304)

2023-10-16 23:06:06 +00:00

tensor_indexing.cpp

…

tensor_options_cuda.cpp

…

tensor_options.cpp

…

tensor.cpp

Extend TensorImpl with BackendMeta (#97429)

2023-04-04 23:47:03 +00:00

torch_include.cpp

…

transformer.cpp

…

README.md

C++ Frontend Tests

In this folder live the tests for PyTorch's C++ Frontend. They use the GoogleTest test framework.

CUDA Tests

To make a test runnable only on platforms with CUDA, you should suffix your test with _CUDA, e.g.

TEST(MyTestSuite, MyTestCase_CUDA) { }

To make it runnable only on platforms with at least two CUDA devices, suffix it with _MultiCUDA instead of _CUDA, e.g.

TEST(MyTestSuite, MyTestCase_MultiCUDA) { }

There is logic in main.cpp that detects the availability and number of CUDA devices and supplies the appropriate negative filters to GoogleTest.
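As a rough illustration of what "negative filters" means here, the following Python sketch composes a GoogleTest filter string from a device count. It is not the code in main.cpp; the function names and the exact patterns `*_CUDA` and `*_MultiCUDA` are assumptions derived from the suffix convention described above. It relies only on GoogleTest's documented filter syntax, where patterns after a `-` are excluded and multiple patterns are separated by `:`.

```python
def add_negative_filter(current_filter: str, patterns: str) -> str:
    """Append negative patterns to a GoogleTest --gtest_filter value.

    If the filter already has a negative section (contains '-'), extend it
    with ':'; otherwise start the negative section with '-'.
    """
    separator = ":" if "-" in current_filter else "-"
    return current_filter + separator + patterns


def cuda_test_filter(current_filter: str, cuda_device_count: int) -> str:
    """Exclude CUDA-suffixed tests that the current machine cannot run."""
    if cuda_device_count == 0:
        # No CUDA at all: skip both single- and multi-device tests.
        return add_negative_filter(current_filter, "*_CUDA:*_MultiCUDA")
    if cuda_device_count < 2:
        # One device: only the multi-device tests must be skipped.
        return add_negative_filter(current_filter, "*_MultiCUDA")
    return current_filter


print(cuda_test_filter("*", 0))  # *-*_CUDA:*_MultiCUDA
print(cuda_test_filter("*", 1))  # *-*_MultiCUDA
print(cuda_test_filter("*", 2))  # *
```

The resulting string has the same form as a value passed manually via `--gtest_filter`, so a user-supplied filter and the automatic exclusions can coexist in one expression.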

Integration Tests

Integration tests use the MNIST dataset. You must download it by running the following command from the PyTorch root folder:

$ python tools/download_mnist.py -d test/cpp/api/mnist

The required paths will be referenced as test/cpp/api/mnist/... in the test code, so you must run the integration tests from the PyTorch root folder.
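Because the paths are relative, a quick pre-flight check can catch a wrong working directory before the tests fail. The sketch below is a hypothetical helper, not part of the PyTorch repository; it only checks that the directory named above exists and is non-empty, and makes no assumption about the file names the download script produces.

```python
from pathlib import Path

# Relative location the integration tests expect, per the README text above.
MNIST_DIR = Path("test/cpp/api/mnist")


def mnist_available(root: Path) -> bool:
    """Return True if root/test/cpp/api/mnist exists and contains at least one entry."""
    target = root / MNIST_DIR
    return target.is_dir() and any(target.iterdir())


if not mnist_available(Path.cwd()):
    print(
        "MNIST data not found under the current directory; run "
        "`python tools/download_mnist.py -d test/cpp/api/mnist` from the PyTorch root folder."
    )
```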