Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60919
Update make_mnist_db.cc and make_image_db.cc to work with the DB API changes
in D29204425 (00896cb9ed). This is similar to the changes to make_cifar_db.cc landed in
D29374754 (394f60b0fc).
ghstack-source-id: 132621346
Test Plan: buck build caffe2/binaries/...
Reviewed By: valmikir
Differential Revision: D29447314
fbshipit-source-id: 33aff85c24d8b785211287de23d46704c7eb0726
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60692
Update make_cifar_db.cc to work with the DB API changes in D29204425 (00896cb9ed).
Test Plan: buck build caffe2/binaries:make_cifar_db
Differential Revision: D29374754
fbshipit-source-id: 23d2acd24031d11071791e398433b537215ffd38
Summary:
Some machines don't have a versionless `python` on their PATH, which breaks these existing shebangs.
I'm assuming that all the existing versionless `python` shebangs are meant to be `python3` and not `python2`; please let me know if my assumption was incorrect for any of these.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58275
Test Plan: CI.
Reviewed By: zhouzhuojie
Differential Revision: D28428143
Pulled By: samestep
fbshipit-source-id: 6562be3d12924db72a92a0207b060ef740f61ebf
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857
These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
  - `GLOSSARY.md`
  - `aten/src/ATen/core/op_registration/README.md`
  - `scripts/README.md`
  - `torch/csrc/jit/codegen/fuser/README.md`
The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```
I looked over the auto-generated changes and didn't see anything that looked problematic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406
Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377
This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348
Reviewed By: walterddr, seemethere
Differential Revision: D26856620
Pulled By: samestep
fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49408
Nearly every non-test callsite doesn't need to capture any variables anyway, and this saves 48 bytes per callback.
ghstack-source-id: 118665808
Test Plan:
Wait for GitHub CI, since we had C++14-specific issues with
this one in the previous PR https://github.com/pytorch/pytorch/pull/48629
Reviewed By: malfet
Differential Revision: D25563207
fbshipit-source-id: 6a2831205917d465f8248ca37429ba2428d5626d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48629
Nearly every non-test callsite doesn't need to capture any variables anyway, and this saves 48 bytes per callback.
ghstack-source-id: 118568240
Test Plan: CI
Reviewed By: dhruvbird
Differential Revision: D25135415
fbshipit-source-id: 5e92dc79da6473ed15d1e381a21ed315879168f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48620
In preparation for storing a bare function pointer (8 bytes)
instead of a std::function (32 bytes).
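As a rough illustration of the size difference motivating this stack (a standalone sketch, not the actual RecordFunction code; exact sizes are implementation-dependent):
```
// Sketch only: 8 vs. 32 bytes matches the typical 64-bit libstdc++ layout
// referenced above; other implementations may differ.
#include <cstdio>
#include <functional>

using RawCallback = void (*)();               // bare function pointer
using BoxedCallback = std::function<void()>;  // type-erased wrapper

int main() {
  std::printf("function pointer: %zu bytes\n", sizeof(RawCallback));
  std::printf("std::function:    %zu bytes\n", sizeof(BoxedCallback));
}
```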
ghstack-source-id: 118568242
Test Plan: CI
Reviewed By: ezyang
Differential Revision: D25132183
fbshipit-source-id: 3790cfb5d98479a46cf665b14eb0041a872c13da
Summary:
This PR aims to reduce the include overhead and symbol noise from the `windows.h` headers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48009
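For context, the sketch below shows the common way to trim `windows.h`; whether this PR relies on exactly these macros is an assumption, not a quote from the diff:
```
// Common technique (sketch): slim down windows.h before including it.
#ifndef WIN32_LEAN_AND_MEAN
#define WIN32_LEAN_AND_MEAN  // exclude rarely-used APIs (GDI, winsock, ...)
#endif
#ifndef NOMINMAX
#define NOMINMAX  // keep min/max macros from clobbering std::min/std::max
#endif
#include <windows.h>
```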
Reviewed By: gchanan
Differential Revision: D25045840
Pulled By: ezyang
fbshipit-source-id: 01fda70f433ba2dd0cd2d7cd676ab6ffe9d98b90
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46112
### Summary
This PR adds support for running TorchScript models on iOS GPU via Metal (inference only). The feature is currently in a prototype state; API changes are expected. The tutorial and documents will be added once it goes to beta.
- Users API
```
auto module = torch::jit::load(model);
module.eval();
at::Tensor input = at::ones({1,3,224,224}, at::ScalarType::Float).metal();
auto output = module.forward({input}).toTensor().cpu();
```
- Supported Models
  - Person Segmentation v106 (FB Internal)
  - Mobilenetv2
- Supported Operators
  - aten::conv2d
  - aten::addmm
  - aten::add.Tensor
  - aten::sub.Tensor
  - aten::mul.Tensor
  - aten::relu
  - aten::hardtanh
  - aten::hardtanh_
  - aten::sigmoid
  - aten::max_pool2d
  - aten::adaptive_avg_pool2d
  - aten::reshape
  - aten::t
  - aten::view
  - aten::log_softmax.int
  - aten::upsample_nearest2d.vec
- Supported Devices
  - Apple A9 and above
  - iOS 10.2 and above
- CMake scripts
  - `IOS_ARCH=arm64 ./scripts/build_ios.sh -DUSE_METAL=ON`
### Test Plan
- Circle CI
ghstack-source-id: 114155638
Test Plan:
1. Sandcastle CI
2. Circle CI
Reviewed By: dreiss
Differential Revision: D23236555
fbshipit-source-id: 98ffc48b837e308bc678c37a9a5fd8ae72d11625
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45364
Also add some more comments about the usage, limitations, and cons.
Test Plan: Build and run benchmark binary.
Reviewed By: gchanan
Differential Revision: D23944193
fbshipit-source-id: 30d4f4991d2185a0ab768d94c846d73730fc0835
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44970
Right now, when RecordFunction is not active (the usual case),
we do two TLS accesses (a check for thread-local callbacks, and a check for
a thread-local boolean).
This experiments with reducing the number of TLS accesses in the
RecordFunction constructor.
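As a rough illustration of the idea (a hypothetical sketch with made-up names, not the actual RecordFunction code), two thread-local variables can be folded into one struct so the inactive path pays for a single TLS lookup:
```
// Hypothetical sketch: fold two thread_local variables into one struct so
// the hot path does one TLS access instead of two.
#include <functional>
#include <vector>

struct RecordFunctionTLS {  // made-up name
  bool tls_record_function_enabled = true;
  std::vector<std::function<void()>> tls_callbacks;
};

thread_local RecordFunctionTLS rf_tls;  // single TLS slot

bool record_function_active() {
  const RecordFunctionTLS& tls = rf_tls;  // one TLS access
  return tls.tls_record_function_enabled && !tls.tls_callbacks.empty();
}
```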
Test Plan: record_function_benchmark
Reviewed By: dzhulgakov
Differential Revision: D23791165
Pulled By: ilia-cher
fbshipit-source-id: 6137ce4bface46f540ece325df9864fdde50e0a4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43721
We can combine the optimization pass and save_for_mobile to reduce friction. Since a lite interpreter model can also be used in full JIT, I don't think we need the option to save it as a full JIT model.
Also:
- improved the usage message
- print the op list before and after the optimization pass
Test Plan:
```
buck run //xplat/caffe2:optimize_for_mobile -- --model=/home/linbin/sparkspot.pt
Building: finished in 12.4 sec (100%) 2597/2597 jobs, 2 updated
Total time: 12.5 sec
pt_operator_library(
    name = "old_op_library",
    ops = [
        "aten::_convolution",
        "aten::adaptive_avg_pool2d",
        "aten::add_.Tensor",
        "aten::batch_norm",
        "aten::mul.Tensor",
        "aten::relu_",
        "aten::softplus",
        "aten::sub.Tensor",
    ],
)
pt_operator_library(
    name = "new_op_library",
    ops = [
        "aten::adaptive_avg_pool2d",
        "aten::add_.Tensor",
        "aten::batch_norm",
        "aten::mul.Tensor",
        "aten::relu_",
        "aten::softplus",
        "aten::sub.Tensor",
        "prepacked::conv2d_clamp_run",
    ],
)
The optimized model for lite interpreter was saved to /home/linbin/sparkspot_mobile_optimized.bc
```
```
buck run //xplat/caffe2:optimize_for_mobile -- --model=/home/linbin/sparkspot.pt --backend=vulkan
```
Reviewed By: kimishpatel
Differential Revision: D23363533
fbshipit-source-id: f7fd61aaeda5944de5bf198e7f93cacf8368babd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42006
This PR introduces a simple CPU caching allocator. This is specifically
intended for mobile use cases and for inference. There is nothing in the
implementation that prevents other use cases; however, its simplicity may
not be suitable everywhere.
It simply tracks allocations by size and relies on deterministic, repeatable
behavior where allocations of the same sizes are made on every inference.
Thus, after the first allocation, when a pointer is returned to the
allocator, instead of returning it to the system, the allocator caches it
for subsequent use.
Memory is freed automatically at the end of the process, or it can be
explicitly freed.
This is enabled at the moment in DefaultMobileCPUAllocator only.
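A minimal sketch of that scheme (hypothetical code, not the actual DefaultMobileCPUAllocator):
```
// Hypothetical sketch of the scheme described above: freed pointers are
// cached in per-size free lists instead of being returned to the system.
#include <cstdlib>
#include <unordered_map>
#include <vector>

class SimpleCachingAllocator {
 public:
  void* allocate(std::size_t size) {
    auto& cached = free_blocks_[size];
    if (!cached.empty()) {
      void* ptr = cached.back();  // reuse a block of the same size
      cached.pop_back();
      return ptr;
    }
    void* ptr = std::malloc(size);
    block_size_[ptr] = size;  // remember the size for free()
    return ptr;
  }

  void free(void* ptr) {
    // Cache by size for the next inference instead of freeing.
    free_blocks_[block_size_[ptr]].push_back(ptr);
  }

  ~SimpleCachingAllocator() {
    // Memory is actually released when the allocator goes away.
    for (auto& entry : free_blocks_)
      for (void* ptr : entry.second) std::free(ptr);
  }

 private:
  std::unordered_map<std::size_t, std::vector<void*>> free_blocks_;
  std::unordered_map<void*, std::size_t> block_size_;
};
```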
Test Plan:
android test: cpu_caching_allocator_test
Imported from OSS
Reviewed By: dreiss
Differential Revision: D22726976
fbshipit-source-id: 9a38b1ce34059d5653040a1c3d035bfc97609e6c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40758
Currently we flip a coin for each sampled callback each time
we run RecordFunction. This PR is an attempt to skip most of the coin
flips (for the low-probability observers) while keeping the distribution
close to the original one.
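One standard way to do this (sketched below under the assumption of a geometric-skip scheme, which I have not verified against the diff) is to draw how many calls to skip until the next hit, instead of flipping a coin on every call:
```
// Sketch of a geometric-skip sampler (an assumed approach, not the exact
// PyTorch code): statistically equivalent to flipping a Bernoulli(p) coin
// on every call, but the RNG is only touched once per "hit".
#include <random>

thread_local std::mt19937 rng{std::random_device{}()};

struct GeoSampler {
  double p;   // observer's sampling probability
  long skip;  // failures remaining before the next success
  explicit GeoSampler(double prob)
      : p(prob), skip(std::geometric_distribution<long>(prob)(rng)) {}

  bool should_run() {
    if (skip > 0) {
      --skip;  // fast path: no RNG work
      return false;
    }
    skip = std::geometric_distribution<long>(p)(rng);  // redraw after a hit
    return true;
  }
};
```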
Test Plan:
CI and record_function_benchmark
```
(python_venv) iliacher@devgpu151:~/local/pytorch (reduce_coin_flops)$ ./build/bin/record_function_benchmark
Warmup time: 30108 us.
Time per iteration (1x1): 1496.78 us.
Time per iteration (16x16): 2142.46 us.
Pure RecordFunction runtime of 10000000 iterations 687929 us, number of callback invocations: 978
(python_venv) iliacher@devgpu151:~/local/pytorch (reduce_coin_flops)$ ./build/bin/record_function_benchmark
Warmup time: 19051 us.
Time per iteration (1x1): 1581.89 us.
Time per iteration (16x16): 2195.67 us.
Pure RecordFunction runtime of 10000000 iterations 682402 us, number of callback invocations: 1023
(python_venv) iliacher@devgpu151:~/local/pytorch (reduce_coin_flops)$ ./build/bin/record_function_benchmark
Warmup time: 18715 us.
Time per iteration (1x1): 1566.11 us.
Time per iteration (16x16): 2131.17 us.
Pure RecordFunction runtime of 10000000 iterations 693571 us, number of callback invocations: 963
(python_venv) iliacher@devgpu151:~/local/pytorch (reduce_coin_flops)$
(python_venv) iliacher@devgpu151:~/local/pytorch (reduce_coin_flops)$ ./build/bin/record_function_benchmark
Warmup time: 18814 us.
Time per iteration (1x1): 1536.2 us.
Time per iteration (16x16): 1985.82 us.
Pure RecordFunction runtime of 10000000 iterations 944959 us, number of callback invocations: 1015
(python_venv) iliacher@devgpu151:~/local/pytorch (reduce_coin_flops)$ ./build/bin/record_function_benchmark
Warmup time: 18278 us.
Time per iteration (1x1): 1526.32 us.
Time per iteration (16x16): 2093.77 us.
Pure RecordFunction runtime of 10000000 iterations 985307 us, number of callback invocations: 1013
(python_venv) iliacher@devgpu151:~/local/pytorch (reduce_coin_flops)$ ./build/bin/record_function_benchmark
Warmup time: 18545 us.
Time per iteration (1x1): 1524.65 us.
Time per iteration (16x16): 2080 us.
Pure RecordFunction runtime of 10000000 iterations 952835 us, number of callback invocations: 1048
```
Reviewed By: dzhulgakov
Differential Revision: D22320879
Pulled By: ilia-cher
fbshipit-source-id: 2193f07d2f7625814fe7bc3cc85ba4092fe036bc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37462
Instead of running all the optimization passes in the optimizeForMobile
method, introduce an optimizer whitelist dictionary as a second parameter.
When it is not passed, the method runs all the optimization passes;
otherwise, the method reads the dict and only runs the passes whose value
is True.
ghstack-source-id: 106104503
Test Plan:
python test/test_mobile_optimizer.py
Imported from OSS
Differential Revision: D22096029
fbshipit-source-id: daa9370c0510930f4c032328b225df0bcf97880f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39076
Adds a `--vulkan` argument to run the torch benchmark on the Vulkan backend.
If it is true, inputs will be converted to the Vulkan backend before module.forward.
Usage for mobilenetv2 fp32:
```
./build/bin/speed_benchmark_torch --model=mn-fp32.pt --input_type=float --input_dims=1,3,224,224 --warmup=1 --iter=5 --vulkan=true
```
Test Plan: Imported from OSS
Differential Revision: D21962428
Pulled By: IvanKobzarev
fbshipit-source-id: 3136af5386b6bce9ea53ba4a9019af2d312544b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37548
Moving RecordFunction from torch::autograd::profiler into the at namespace.
Test Plan:
CI
Imported from OSS
Differential Revision: D21315852
fbshipit-source-id: 4a4dbabf116c162f9aef0da8606590ec3f3847aa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37491
This PR modernizes the RecordFunction API and adds thread-local callbacks
in addition to the global ones.
Changes:
- support for TLS callbacks; this is going to be the foundation of the profiler and other tools
- modernize the interface around a simple set of functions, (add|remove|has|clear)(Global|ThreadLocal)(Callback), and add RecordFunctionCallback to easily construct the callbacks to be passed (see the sketch below)
- we also add `.setShouldRun` to the callback interface to support cases when simple uniform sampling is not enough
- to properly support add/remove, introduce the idea of a callback handle returned by add
- the internal implementation still uses SmallVector to store intermediate state (as before) - in this case these are vectors of handles of the callbacks that were picked to run
- to speed up the runtime we keep these vectors sorted; this way we can quickly enumerate the callbacks that need to be run
- added tests for the new functionality
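A hedged sketch of the resulting registration flow (the function names follow the (add|remove)(Global|ThreadLocal)(Callback) scheme above; exact signatures at this revision are approximated, not quoted):
```
// Sketch only: signatures are approximated from the description above.
#include <ATen/record_function.h>

void install_and_remove_observer() {
  // RecordFunctionCallback bundles start/end callbacks; add returns a handle.
  auto handle = at::addGlobalCallback(at::RecordFunctionCallback(
      [](const at::RecordFunction& fn) { /* start: e.g. inspect fn */ },
      [](const at::RecordFunction& fn) { /* end */ }));
  // ... later, the handle makes removal well-defined:
  at::removeCallback(handle);
}
```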
Test Plan:
BUILD_BINARY=1 USE_BLAS=MKL USE_MKLDNN=0 USE_CUDA=0 python setup.py
develop install
./build/bin/test_jit
CI
record_function_benchmark: https://gist.github.com/ilia-cher/f1e094dae47fe23e55e7672ac4dcda2f
Imported from OSS
Differential Revision: D21300448
fbshipit-source-id: 6d55c26dbf20b33d35c3f1604dcc07bb063c8c43
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37721
Even though we disabled caffe2 test configs in Python, the BUILD_TEST
option was still building caffe2 test cpp binaries and various CI
configurations were running them (since they just run every binary in
`torch/test`).
This PR adds a caffe2-specific BUILD_TEST option (BUILD_CAFFE2_TEST),
which defaults to OFF, and gates the compilation of caffe2 test cpp
binaries under it.
Test Plan: Imported from OSS
Differential Revision: D21369541
Pulled By: suo
fbshipit-source-id: 669cff70c5b53f016e8e016bcb3a99bf3617e1f9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36765
We recently added support for bundling inputs with models. Now add
support to the benchmarker to use those inputs. This frees users from
having to look up the proper input format for each model.
Test Plan:
- Ran on a model without bundled inputs. Saw a clear error.
- Ran on a model with too few bundled inputs. Saw a clear error.
- Ran on a proper bundled input. Model executed.
Differential Revision: D21142659
Pulled By: dreiss
fbshipit-source-id: d23c1eb9d1de882345b007bf2bfbbbd6f964f6fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35279
Support benchmarking self-contained PyTorch JIT models.
* By specifying the flag `--no_inputs=True`, the binary supports benchmarking a self-contained TorchScript model (the model runs without inputs: `model.forward()`)
* This allows moving the data preparation part outside of this binary.
Reviewed By: kimishpatel
Differential Revision: D20585639
fbshipit-source-id: c28e50503534c90023c1430479d26f1c1ce740b1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35904
Currently this optimization transforms conv2d and linear ops into their
prepacked (XNNPACK) equivalents.
Test Plan: buck run fbsource//xplat/caffe2:optimize_for_mobile -- --model="/tmp/inpainting_fbnet.pt"
Reviewed By: AshkanAliabadi
Differential Revision: D20824433
fbshipit-source-id: 88d5c0d21b77911f95f018b03398b0df758ab0d7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34710
Extending the RecordFunction API to support new recording scopes (such as TorchScript functions), as well as giving more flexibility in setting the sampling rate.
Test Plan: unit test (test_misc.cpp/testRecordFunction)
Reviewed By: gdankel, dzhulgakov
Differential Revision: D20158523
fbshipit-source-id: a9e0819d21cc06f4952d92d43246587c36137582
Summary:
Ignore mixed upper-case/lower-case style for now.
Fix violations of the space-between-function-and-its-arguments rule.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35574
Test Plan: CI
Differential Revision: D20712969
Pulled By: malfet
fbshipit-source-id: 0012d430aed916b4518599a0b535e82d15721f78
Summary:
There are three guards related to mobile build:
* AutoGradMode
* AutoNonVariableTypeMode
* GraphOptimizerEnabledGuard
Today we need to set some of these guards before calling libtorch APIs, because we customized the mobile build to only support inference (for both OSS and most FB use cases) to optimize binary size.
Several changes have been made since the 1.3 release, so there are already inconsistent uses of these guards in the codebase. I did a sweep of all mobile-related model loading & forward() call sites, trying to unify the use of these guards:
Full JIT: still set all three guards. More specifically:
* OSS: Fixed a bug of not setting the guard at model load time correctly in Android JNI.
* FB: Not covered by this diff (as we are using mobile interpreter for most internal builds).
Lite JIT (mobile interpreter): only needs the AutoNonVariableTypeMode guard. AutoGradMode doesn't seem to be relevant (so it was removed from a few places) and GraphOptimizerEnabledGuard is definitely not relevant (only full JIT has a graph optimizer). More specifically:
* OSS: At this point we are not committed to support Lite-JIT. For Android it shares the same code with FB JNI callsites.
* FB:
  * JNI callsites: use the unified LiteJITCallGuard.
  * For iOS/C++: manually set AutoNonVariableTypeMode for _load_for_mobile() & forward() callsites.
Ideally we should avoid having to set AutoNonVariableTypeMode for the mobile interpreter. It's currently needed for dynamic dispatch + inference-only mobile builds (where variable kernels are not registered) - without the guard it will try to run `variable_fallback_kernel` and crash (PR #34038). The proper fix will take some time, so this workaround unblocks the selective BUCK build, which depends on dynamic dispatch.
PS. The current status (of having to set AutoNonVariableTypeMode) should not block running an FL model on the mobile interpreter - if all necessary variable kernels are registered, then it can call _load_for_mobile()/forward() against the FL model without setting the AutoNonVariableTypeMode guard. It's still inconvenient for Java callsites, as it's set unconditionally inside the JNI methods.
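For reference, a minimal sketch of a full-JIT inference call with all three guards set (the guard names come from the list above; the header and exact constructor arguments are assumptions):
```
// Sketch of a guarded full-JIT inference call; details are assumptions.
#include <torch/script.h>

c10::IValue guarded_forward(torch::jit::script::Module& module,
                            std::vector<c10::IValue> inputs) {
  torch::autograd::AutoGradMode guard(false);            // inference only
  at::AutoNonVariableTypeMode non_var_type_mode(true);   // no variable kernels
  torch::jit::GraphOptimizerEnabledGuard no_opt(false);  // skip graph optimizer
  return module.forward(std::move(inputs));
}
```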
Test Plan: - CI
Reviewed By: xta0
Differential Revision: D20498017
fbshipit-source-id: ba6740f66839a61790873df46e8e66e4e141c728
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34515
Once upon a time we thought this was necessary. In reality it is not, so
removing it.
For backcompat, our public interface (defined in `api/`) still has
typedefs to the old `script::` names.
There was only one collision: `Pass` as a `Stmt` and `Pass` as a graph
transform. I renamed one of them.
Test Plan: Imported from OSS
Differential Revision: D20353503
Pulled By: suo
fbshipit-source-id: 48bb911ce75120a8c9e0c6fb65262ef775dfba93
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34598
as above
Test Plan:
test.txt
```
what time is it now
could you set a reminder at 7 am
waht is the weather today
```
example json
```
{
  "model": {
    "category": "CNN",
    "description": "Assistant Mobile Inference",
    "files": {
      "model": {
        "filename": "model.pt1",
        "location": "//everstore/GICWmAB2Znbi_mAAAB0P51IPW8UrbllgAAAP/model.pt1",
        "md5": "c0f4b29c442bbaeb0007fb0ce513ccb3"
      },
      "data": {
        "filename": "input.txt",
        "location": "/home/pengxia/test/input.txt",
        "md5": "c0f4b29c442bbaeb0007fb0ce513ccb3"
      }
    },
    "format": "pytorch",
    "framework": "pytorch",
    "kind": "deployment",
    "name": "Assistant Mobile Inference"
  },
  "tests": [
    {
      "command": "{program} --model {files.model} --input_dims \"1\" --input_type NLUType --warmup {warmup} --iter 5 --input_file {files.data} --report_pep true",
      "identifier": "{ID}",
      "metric": "delay",
      "iter": 15,
      "warmup": 2,
      "log_output": true
    }
  ]
}
```
iter = 5 (`--iter 5`) * 3 (3 lines in test.txt) = 15
arbabu123: I will provide a wrapper to compute the iter in the future.
Run the following command:
```
buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/fbnet/assistant_mobile_inference.json --platform android/full_jit --framework pytorch --remote --devices SM-G960U-8.0.0-26
```
results
https://our.intern.facebook.com/intern/aibench/details/275259559594003
**Note: this is compatible with the existing examples.**
Reviewed By: kimishpatel, ljk53
Differential Revision: D20389285
fbshipit-source-id: 80165ef394439a307ac7986cf540a80fdf3d85d6