
Files

mikeiovine 98b4a4100d [SR] Add a copy variant for fused_split_and_squeeze

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75660

The outputs of `split_and_squeeze` are passed to `VarStack` in models we care about. `VarStack` has a [fast path](https://www.internalfb.com/code/fbsource/[893193f5277184fd17f4ea3f28fe415a4df37707]/fbcode/caffe2/aten/src/ATen/native/TensorShape.cpp?lines=296-298) for when all of its inputs have the same strides.

Hitting the slow path adds so much overhead that it is worth copying in `split_and_squeeze` to force all of `VarStack`'s inputs to be contiguous, so that the fast path can be taken.
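The layout effect this change relies on can be illustrated in eager PyTorch. This is a minimal sketch, not Static Runtime's implementation: it uses the public `torch.split`, `Tensor.squeeze`, `Tensor.contiguous`, and `torch.stack` in place of the internal `fused_split_and_squeeze` and `VarStack` ops, and the input shape is an arbitrary assumption chosen for illustration.

```python
import torch

# Assumed example input: 2 rows, each holding 3 feature vectors of width 4.
x = torch.arange(24.0).reshape(2, 3, 4)

# Split along dim 1 into size-1 chunks, then squeeze that dim away.
# Each result is a view into x's storage with shape (2, 4) but stride (12, 1),
# so it is not contiguous.
views = [chunk.squeeze(1) for chunk in torch.split(x, 1, dim=1)]

# The "copy variant" idea: materialize each output so it is contiguous
# (shape (2, 4), stride (4, 1)) before it is stacked.
copies = [v.contiguous() for v in views]

stacked = torch.stack(copies, dim=0)  # shape (3, 2, 4)

for v, c in zip(views, copies):
    print(v.stride(), v.is_contiguous(), "->", c.stride(), c.is_contiguous())
```

The copy costs one extra memory pass per output; per the commit message, that is cheaper than the overhead of `VarStack`'s slow path in the models in question.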

Differential Revision: [D35513777](https://our.internmc.facebook.com/intern/diff/D35513777/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D35513777/)!

Approved by: https://github.com/hlu1

2022-04-13 20:02:01 +00:00

cpp

Add cuda-11.3+clang9 build workflow (take 2)

2022-04-11 17:13:01 +00:00

distributed

Fix some typos.

2022-04-11 21:55:59 +00:00

fastrnns

Clean up profiling mode and profiling executor strategy (#73875)

2022-03-29 18:38:51 +00:00

framework_overhead_benchmark

Remove py2 compatible future imports (#44735)

2020-09-16 12:55:57 -07:00

functional_autograd_benchmark

Extend autograd functional benchmarking to run vectorized tasks (#67045)

2021-12-21 17:20:29 -08:00

fuser

Benchmarks for various fusers (#67622)

2021-11-04 18:57:17 -07:00

instruction_counts

Allow instruction counting to use shared memory as a staging ground. (And a couple other tweaks.) (#56711)

2021-05-12 20:37:41 -07:00

operator_benchmark

[TorchArrow][AIBench] Add AIBench Metrics for TorchArrow Inference Benchmark Test (#75035)

2022-04-01 00:35:42 +00:00

overrides_benchmark

Use classmethods for overrides (#64841)

2021-09-17 08:32:49 -07:00

profiler_benchmark

Use libkineto in profiler (#46470)

2020-11-25 04:32:16 -08:00

record_function_benchmark

Fix D23995953 import.

2020-09-29 19:30:23 -07:00

serialization

[JIT] Make new zip serialization for torch save/load significantly (~70%) faster (#38379)

2020-05-29 01:56:18 -07:00

sparse

Add CSR (compressed sparse row) layout for sparse tensors (#50937)

2021-04-12 10:09:12 -07:00

static_runtime

[SR] Add a copy variant for fused_split_and_squeeze

2022-04-13 20:02:01 +00:00

tensorexpr

Fix some typos.

2022-04-11 21:55:59 +00:00

compare-fastrnn-results.py

Benchmarks: add scripts for FastRNNs results comparison. (#44134)

2020-09-03 13:44:42 -07:00

compare.sh

Benchmarks: add scripts for FastRNNs results comparison. (#44134)

2020-09-03 13:44:42 -07:00

README.md

Add CSR (compressed sparse row) layout for sparse tensors (#50937)

2021-04-12 10:09:12 -07:00

upload_scribe.py

Fix benchmark's import module and remove its usage of tools.stats.scribe (#61808)

2021-07-19 09:45:05 -07:00

README.md

PyTorch Benchmarks

This folder contains scripts that produce reproducible timings of various PyTorch features.

It also provides mechanisms to compare PyTorch with other frameworks.

Set up the environment

Make sure you're on a machine with CUDA, torchvision, and PyTorch installed. Install them in the following order:

```bash
# Install torchvision. It comes with the pytorch stable release binary
conda install pytorch torchvision -c pytorch

# Install the latest pytorch master from source.
# It should supersede the installation from the release binary.
cd $PYTORCH_HOME
python setup.py build develop

# Check the pytorch installation version
python -c "import torch; print(torch.__version__)"
```
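Beyond printing the version, it can help to confirm all of the prerequisites at once before running a suite. The following is a minimal sketch that assumes only the Python standard library (plus `torch`, if it is installed); the helper name `check_benchmark_env` is hypothetical and is not part of the benchmarks folder.

```python
import importlib
import importlib.util


def check_benchmark_env(packages=("torch", "torchvision")):
    """Report which packages are importable and, if torch is, whether CUDA is usable."""
    # find_spec returns None for packages that are not installed, without importing them.
    report = {name: importlib.util.find_spec(name) is not None for name in packages}
    if report.get("torch"):
        torch = importlib.import_module("torch")
        report["torch_version"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    else:
        # Without torch there is no way to use CUDA from the benchmarks.
        report["cuda_available"] = False
    return report


if __name__ == "__main__":
    for key, value in check_benchmark_env().items():
        print(f"{key}: {value}")
```

Using `importlib.util.find_spec` keeps the check from failing outright when a package is missing, so every gap is reported in a single run.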

Benchmark List

Refer to each subfolder for details on the benchmark suite it contains.

Fast RNNs benchmarks