PyTorch Data Benchmarks
This directory contains benchmarks for the torch.utils.data module components, focusing on the performance of samplers.
Dependencies
The benchmarks require the following dependencies:
- numpy
- tabulate
You can install them using pip:
pip install numpy tabulate
Running the benchmarks
To run the BatchSampler benchmark:
python samplers_benchmark.py
Sampler Benchmark
The samplers_benchmark.py script benchmarks the performance of PyTorch's BatchSampler against an example alternative implementation. It tests the following parameters:
- Batch sizes: 4, 8, 64, 640, 6400, 64000
- Drop last options: True, False
- Each configuration is run 10 times and the timings are averaged (see the sketch after this list)
- Results include speedup percentage calculations
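The measurement pattern implied by these parameters is roughly the following. This is a minimal, hypothetical sketch (the helper name time_sampler and the dataset length are illustrative; the actual script may structure its loop differently):

```python
import time

from torch.utils.data import BatchSampler, SequentialSampler


def time_sampler(sampler_cls, data_len, batch_size, drop_last, runs=10):
    """Average wall-clock time to fully iterate the sampler, over `runs` repeats."""
    total = 0.0
    for _ in range(runs):
        sampler = sampler_cls(SequentialSampler(range(data_len)), batch_size, drop_last)
        start = time.perf_counter()
        for _ in sampler:
            pass
        total += time.perf_counter() - start
    return total / runs


for batch_size in (4, 8, 64, 640, 6400, 64000):
    for drop_last in (True, False):
        elapsed = time_sampler(BatchSampler, data_len=1_000_000,
                               batch_size=batch_size, drop_last=drop_last)
        print(f"batch_size={batch_size:<6} drop_last={drop_last!s:<5} {elapsed:.4f}s")
```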
Output
The benchmark outputs a table with the following columns:
- Batch Size
- Drop Last
- Original (s): Time taken by the original implementation
- New (s): Time taken by the alternative implementation
- Speedup: Percentage reduction in runtime of the new implementation relative to the original
Example output:
+------------+-----------+---------------+----------+---------+
| Batch Size | Drop Last | Original (s) | New (s) | Speedup |
+============+===========+===============+==========+=========+
| 4 | True | 0.1234 | 0.1000 | 18.96% |
+------------+-----------+---------------+----------+---------+
| 4 | False | 0.1345 | 0.1100 | 18.22% |
+------------+-----------+---------------+----------+---------+
...
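Based on the example values, the Speedup column is the relative reduction in runtime, (Original − New) / Original × 100. A quick check against the first row above (assuming this is how the script derives it):

```python
original, new = 0.1234, 0.1000
speedup = (original - new) / original * 100
print(f"{speedup:.2f}%")  # 18.96%, matching the first row of the example table
```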
Extending the Benchmark
To benchmark a different implementation:
- Modify the NewBatchSampler class in samplers_benchmark.py with your implementation (a hypothetical skeleton is sketched below), and similarly replace BatchSampler with the corresponding PyTorch implementation
- Ensure you pass all required inputs, such as replacement for RandomSampler and its variations
- Run the benchmark to compare its performance against the original
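A replacement only needs to follow the same iterable protocol as BatchSampler: yield lists of indices and report its length. The class below is a hypothetical skeleton showing the expected shape, not the implementation shipped in samplers_benchmark.py:

```python
from torch.utils.data import Sampler


class NewBatchSampler(Sampler):
    """Hypothetical alternative that builds each batch with a list slice."""

    def __init__(self, sampler, batch_size, drop_last):
        self.sampler = sampler
        self.batch_size = batch_size
        self.drop_last = drop_last

    def __iter__(self):
        indices = list(self.sampler)
        for start in range(0, len(indices), self.batch_size):
            batch = indices[start:start + self.batch_size]
            if self.drop_last and len(batch) < self.batch_size:
                break  # drop the final, incomplete batch
            yield batch

    def __len__(self):
        if self.drop_last:
            return len(self.sampler) // self.batch_size
        return (len(self.sampler) + self.batch_size - 1) // self.batch_size
```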