mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Files

Natalia Gimelshein 3875e1ba45 try to make at::cat in mm_tree_reduction operate on contig tensors (#18816)

Summary:
Sometimes at::cat gets transposed inputs and goes on a slow path. Also, make jit_premul lstm benchmark add bias to the whole input tensor to avoid separate reduction kernels in the backward pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18816

Differential Revision: D15013576

Pulled By: wanchaol

fbshipit-source-id: bcfa1cf44180b11b05b0f55f034707012f66281a

2019-04-24 23:44:25 -07:00

File             Last commit                                                                  Date
__init__.py      Turn on F401: Unused import warning. (#18598)                                2019-03-30 09:01:17 -07:00
bench.py         try to make at::cat in mm_tree_reduction operate on contig tensors (#18816)  2019-04-24 23:44:25 -07:00
cells.py         try to make at::cat in mm_tree_reduction operate on contig tensors (#18816)  2019-04-24 23:44:25 -07:00
custom_lstms.py  Move fast rnn benchmark to pytorch/pytorch                                   2019-03-27 14:46:09 -07:00
factory.py       try to make at::cat in mm_tree_reduction operate on contig tensors (#18816)  2019-04-24 23:44:25 -07:00
profile.py       Turn on F401: Unused import warning. (#18598)                                2019-03-30 09:01:17 -07:00
README.md        Move fast rnn benchmark to pytorch/pytorch                                   2019-03-27 14:46:09 -07:00
runner.py        try to make at::cat in mm_tree_reduction operate on contig tensors (#18816)  2019-04-24 23:44:25 -07:00
scratch.py       Move fast rnn benchmark to pytorch/pytorch                                   2019-03-27 14:46:09 -07:00
test.py          Turn on F401: Unused import warning. (#18598)                                2019-03-30 09:01:17 -07:00

README.md

Fast RNN benchmarks

Benchmarks for TorchScript models

For the most stable results, do the following:

Set the CPU governor to performance mode (as opposed to energy save).
Turn off turbo for all CPUs (assuming Intel CPUs).
Shield CPUs via cset shield when running benchmarks.
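
The first two settings above can be sanity-checked before a run. The following is a minimal sketch, not part of fastrnns: the function names are made up for illustration, and it assumes a Linux machine that exposes the usual cpufreq files under /sys/devices/system/cpu and uses the intel_pstate driver (where no_turbo set to 1 means turbo is off). It only reads state and does not change anything.

```python
from pathlib import Path


def read_sysfs(path):
    """Return the stripped contents of a sysfs file, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def check_benchmark_environment(sysfs_root="/sys/devices/system/cpu"):
    """Hypothetical helper: return a list of warnings about governor and turbo state."""
    root = Path(sysfs_root)
    warnings = []

    # Each logical CPU has its own scaling_governor file (cpu0, cpu1, ...).
    governors = {}
    for gov_file in sorted(root.glob("cpu[0-9]*/cpufreq/scaling_governor")):
        cpu_name = gov_file.parent.parent.name
        governors[cpu_name] = read_sysfs(gov_file)

    not_performance = sorted(
        cpu for cpu, governor in governors.items() if governor != "performance"
    )
    if not governors:
        warnings.append("could not read any CPU governor")
    elif not_performance:
        warnings.append(
            "governor is not 'performance' on: " + ", ".join(not_performance)
        )

    # Assumption: intel_pstate driver; "1" in no_turbo means turbo is disabled.
    no_turbo = read_sysfs(root / "intel_pstate" / "no_turbo")
    if no_turbo is None:
        warnings.append("could not read intel_pstate/no_turbo")
    elif no_turbo != "1":
        warnings.append("turbo appears to be enabled")

    return warnings
```

Calling check_benchmark_environment() with no arguments inspects the live system; an empty list means both settings look right. The sysfs_root parameter exists only so the check can be pointed at a different directory, for example in a test.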

Some of these scripts accept command-line arguments, but most of them do not because I was lazy. More arguments will probably be added sometime in the future, but the default sizes are pretty reasonable.

Test fastrnns (fwd + bwd) correctness

Test the fastrnns benchmarking scripts with the following:

python -m fastrnns.test

or run the test for specific implementations only:

python -m fastrnns.test --rnns jit

Run benchmarks

python -m fastrnns.bench

should give a good comparison, or you can specify the type of model to run:

python -m fastrnns.bench --rnns cudnn aten jit --group rnns
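
To make the shape of that command line concrete, here is a small argparse sketch of how a space-separated list such as --rnns cudnn aten jit can be parsed. This is an illustration under assumptions, not the actual parser in bench.py: the build_parser name, the help strings, and the None defaults are hypothetical, and only the two flags shown in the command above are modelled.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the two flags used in the command above."""
    parser = argparse.ArgumentParser(prog="fastrnns.bench (sketch)")
    # nargs="*" collects every value after the flag into a list.
    parser.add_argument("--rnns", nargs="*", default=None,
                        help="implementations to run, e.g. cudnn aten jit")
    parser.add_argument("--group", nargs="*", default=None,
                        help="benchmark group(s) to run, e.g. rnns")
    return parser


args = build_parser().parse_args(
    ["--rnns", "cudnn", "aten", "jit", "--group", "rnns"]
)
print(args.rnns)   # ['cudnn', 'aten', 'jit']
print(args.group)  # ['rnns']
```

With nargs="*", each flag takes all following values up to the next flag, which is why the three implementation names and the single group name end up in separate lists.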

Run model profiling (calls nvprof)

python -m fastrnns.profile

should generate nvprof files for all models somewhere. You can also specify the models for which to generate nvprof files:

python -m fastrnns.profile --rnns aten jit

Caveats

Use Linux for the most accurate timing. A lot of these tests only run on CUDA.