Mirror of https://github.com/pytorch/pytorch.git, synced 2025-10-21 05:34:18 +08:00
Summary: This speeds up "advanced" indexing (indexing a tensor by a tensor) on CPU and GPU. There's still a bunch of work to do, including speeding up indexing by a byte (boolean) mask and speeding up the derivative calculation for advanced indexing.

Here are some speed comparisons against indexing on master, using a little [benchmark script](https://gist.github.com/colesbury/c369db72aad594e5e032c8fda557d909) with 16 OpenMP threads and on a P100. The test cases are listed as (input shape -> output shape).

| Test case             | CPU (old vs. new)     | CUDA (old vs. new)   |
|-----------------------|-----------------------|----------------------|
| 1024x1024 -> 512x1024 | 225 us vs. **57 us**  | 297 us vs. **47 us** |
| 1024x1024 -> 1024x512 | 208 us vs. **153 us** | 335 us vs. **54 us** |
| 50x50 -> 20000x50     | 617 us vs. **77 us**  | 239 us vs. **54 us** |
| 50x50 -> 50x20000     | 575 us vs. **236 us** | 262 us vs. **58 us** |
| 2x5x10 -> 10          | 65 us vs. **18 us**   | 612 us vs. **93 us** |

See #11647

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13420
Reviewed By: soumith
Differential Revision: D13088936
Pulled By: colesbury
fbshipit-source-id: 0a5c2ee9aa54e15f96d06692d1694c3b24b924e2
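The operation being benchmarked is "advanced" (tensor-by-tensor) indexing. A minimal sketch of what the first two table rows exercise, using shapes from the table (the linked gist is the authoritative benchmark script; the tensor names here are illustrative, not from the PR):

```python
import torch

# Advanced indexing: select rows of a 2-D tensor with a LongTensor of indices.
# Shapes mirror the first benchmark case (1024x1024 -> 512x1024).
src = torch.randn(1024, 1024)
rows = torch.randint(0, 1024, (512,))
out_rows = src[rows]            # gathers 512 (possibly repeated) rows
assert out_rows.shape == (512, 1024)

# Indexing along dim 1 instead mirrors the second case (1024x1024 -> 1024x512).
cols = torch.randint(0, 1024, (512,))
out_cols = src[:, cols]
assert out_cols.shape == (1024, 512)
```

Row selection is contiguous per gathered row, while column selection is strided, which is consistent with the row-indexing cases showing the larger speedups in the table.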
11 lines
244 B
Python
```python
import torch
from test_indexing import *

if __name__ == '__main__':
    if torch.cuda.is_available():
        torch.set_default_tensor_type(torch.cuda.FloatTensor)
        run_tests()
    else:
        print("Skipping test_indexing_cuda.py")
```