PyTorch Data Benchmarks
This directory contains benchmarks for the torch.utils.data module components, focusing on the performance of samplers.
Dependencies
The benchmarks require the following dependencies:
- numpy
- tabulate
You can install them using pip:
pip install numpy tabulate
Running the benchmarks
To run the BatchSampler benchmark:
python samplers_benchmark.py
Sampler Benchmark
The samplers_benchmark.py script benchmarks the performance of PyTorch's BatchSampler against an example alternative implementation. It tests the following parameters:
- Batch sizes: 4, 8, 64, 640, 6400, 64000
- Drop last options: True, False
- Each configuration is run 10 times and the timings are averaged (see the sketch after this list)
- Results include speedup percentage calculations
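The measurement pattern implied by these parameters is roughly the following. This is a minimal, hypothetical sketch (the helper name time_sampler and the dataset length are illustrative; the actual script may structure its loop differently):

```python
import time

from torch.utils.data import BatchSampler, SequentialSampler


def time_sampler(sampler_cls, data_len, batch_size, drop_last, runs=10):
    """Average wall-clock time to fully iterate the sampler, over `runs` repeats."""
    total = 0.0
    for _ in range(runs):
        sampler = sampler_cls(SequentialSampler(range(data_len)), batch_size, drop_last)
        start = time.perf_counter()
        for _ in sampler:
            pass
        total += time.perf_counter() - start
    return total / runs


for batch_size in (4, 8, 64, 640, 6400, 64000):
    for drop_last in (True, False):
        elapsed = time_sampler(BatchSampler, data_len=1_000_000,
                               batch_size=batch_size, drop_last=drop_last)
        print(f"batch_size={batch_size:<6} drop_last={drop_last!s:<5} {elapsed:.4f}s")
```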
Output
The benchmark outputs a table with the following columns:
- Batch Size
- Drop Last
- Original (s): Time taken by the original implementation
- New (s): Time taken by the alternative implementation
- Speedup: Percentage reduction in runtime of the new implementation relative to the original
Example output:
+------------+-----------+---------------+----------+---------+
| Batch Size | Drop Last | Original (s) | New (s) | Speedup |
+============+===========+===============+==========+=========+
| 4 | True | 0.1234 | 0.1000 | 18.96% |
+------------+-----------+---------------+----------+---------+
| 4 | False | 0.1345 | 0.1100 | 18.22% |
+------------+-----------+---------------+----------+---------+
...
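Based on the example values, the Speedup column is the relative reduction in runtime, (Original − New) / Original × 100. A quick check against the first row above (assuming this is how the script derives it):

```python
original, new = 0.1234, 0.1000
speedup = (original - new) / original * 100
print(f"{speedup:.2f}%")  # 18.96%, matching the first row of the example table
```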
Extending the Benchmark
To benchmark a different implementation:
- Modify the NewBatchSampler class in samplers_benchmark.py with your implementation (a hypothetical skeleton is sketched below), and similarly replace BatchSampler with the corresponding PyTorch implementation
- Ensure you pass all required inputs, such as replacement for RandomSampler and its variations
- Run the benchmark to compare its performance against the original
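A replacement only needs to follow the same iterable protocol as BatchSampler: yield lists of indices and report its length. The class below is a hypothetical skeleton showing the expected shape, not the implementation shipped in samplers_benchmark.py:

```python
from torch.utils.data import Sampler


class NewBatchSampler(Sampler):
    """Hypothetical alternative that builds each batch with a list slice."""

    def __init__(self, sampler, batch_size, drop_last):
        self.sampler = sampler
        self.batch_size = batch_size
        self.drop_last = drop_last

    def __iter__(self):
        indices = list(self.sampler)
        for start in range(0, len(indices), self.batch_size):
            batch = indices[start:start + self.batch_size]
            if self.drop_last and len(batch) < self.batch_size:
                break  # drop the final, incomplete batch
            yield batch

    def __len__(self):
        if self.drop_last:
            return len(self.sampler) // self.batch_size
        return (len(self.sampler) + self.batch_size - 1) // self.batch_size
```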