Summary:
[The Gradle Wrapper documentation](https://docs.gradle.org/current/userguide/gradle_wrapper.html) states:
`The recommended way to execute any Gradle build is with the help of the Gradle Wrapper`
It took me a little while to prepare Gradle (the right version, etc.) for the `pytorch_android` build.
I think using the Gradle wrapper will make the `pytorch_android` build more seamless.
Gradle wrapper version: 4.10.3
250c71121b/.circleci/scripts/build_android_gradle.sh (L13)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51067
Reviewed By: izdeby
Differential Revision: D26315718
Pulled By: IvanKobzarev
fbshipit-source-id: f8077d7b28dc0b03ee48bcdac2f5e47d9c1f04d9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49996
According to section 5.2.1 of the Snapdragon Profiler User Guide
(https://developer.qualcomm.com/qfile/30580/snapdragon_profiler_user_guide_reva.pdf),
OpenGL ES, Vulkan, and OpenCL apps must include
`android.permission.INTERNET` in the app's AndroidManifest.xml to enable
API tracing and GPU metrics.
Test Plan: Imported from OSS
Reviewed By: SS-JIA
Differential Revision: D25809555
Pulled By: AshkanAliabadi
fbshipit-source-id: c4d88a7ea98d9166efbc4157df7d822d99ba0df9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48620
In preparation for storing a bare function pointer (8 bytes)
instead of a std::function (32 bytes).
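For context, a minimal standalone illustration of the size gap motivating this change; exact sizes are implementation-defined, and 8 / 32 bytes are the typical 64-bit libstdc++ values:
```
// Illustration only: compares the footprint of a bare function pointer with
// that of std::function. Exact sizes depend on the standard library; 8 and 32
// bytes are typical for 64-bit Linux with libstdc++.
#include <cstdio>
#include <functional>

using KernelFn = void (*)();

int main() {
  std::printf("bare function pointer: %zu bytes\n", sizeof(KernelFn));
  std::printf("std::function<void()>: %zu bytes\n", sizeof(std::function<void()>));
  return 0;
}
```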
ghstack-source-id: 118568242
Test Plan: CI
Reviewed By: ezyang
Differential Revision: D25132183
fbshipit-source-id: 3790cfb5d98479a46cf665b14eb0041a872c13da
Summary:
### Java, CPP
Introducing an additional `device` parameter on LiteModuleLoader to specify the device on which `forward` will run.
On the Java side this is an enum containing CPU and VULKAN; it is passed as a jint to the JNI side and stored as a member field at the same level as the module.
In pytorch_jni_lite.cpp, all input tensors are converted to Vulkan before `forward`.
In pytorch_jni_common.cpp (which also goes to OSS), if the result Tensor is not a CPU tensor, `.cpu()` is called on it (at the moment the only non-CPU case is Vulkan); see the sketch below.
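As a rough, hypothetical sketch of the conversions described above (the real logic lives in pytorch_jni_lite.cpp / pytorch_jni_common.cpp and differs in detail; the helper names here are made up for illustration):
```
// Hypothetical sketch of the device handling described above.
#include <ATen/ATen.h>

// Before forward(): when the module was loaded with the VULKAN device,
// move CPU inputs to the Vulkan backend.
at::Tensor to_device_for_forward(const at::Tensor& input, bool use_vulkan) {
  return use_vulkan ? input.vulkan() : input;
}

// After forward(): if the result is not a CPU tensor (currently only Vulkan),
// copy it back to CPU before handing it to Java.
at::Tensor to_cpu_for_java(const at::Tensor& output) {
  return output.device().is_cpu() ? output : output.cpu();
}
```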
### BUCK
Introducing a `pytorch_jni_lite_with_vulkan` target that depends on `pytorch_jni_lite` and adds `aten_vulkan`.
In that case `pytorch_jni_lite_with_vulkan` can be used in place of `pytorch_jni_lite` for apps that need the Vulkan backend.
Test Plan:
After the following diff with the aidemos segmentation demo:
```
buck install -r aidemos-android
```
{F296224521}
Reviewed By: dreiss
Differential Revision: D23198335
fbshipit-source-id: 95328924e398901d76718c4d828f96e112dfa1b0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44202
In preparation for changing mobile run_method() to be variadic, this diff:
* Implements get_method() for mobile Module, which is similar to find_method but expects the method to exist.
* Replaces calls to the current non-variadic implementation of run_method() by calling get_method() and then invoking the operator() overload on Method objects (see the sketch below).
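A hedged sketch of the calling pattern this introduces (header paths and exact signatures are recalled from memory and may differ slightly from the diff):
```
// Sketch: replacing the non-variadic run_method() call with
// get_method() + Method::operator().
#include <string>
#include <vector>
#include <torch/csrc/jit/mobile/import.h>
#include <torch/csrc/jit/mobile/module.h>

c10::IValue run_forward(const std::string& path, std::vector<c10::IValue> inputs) {
  torch::jit::mobile::Module m = torch::jit::_load_for_mobile(path);
  // Previously: m.run_method("forward", inputs);
  torch::jit::mobile::Method forward = m.get_method("forward");  // expects "forward" to exist
  return forward(std::move(inputs));
}
```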
ghstack-source-id: 111848222
Test Plan: CI, plus all the unit tests that currently use run_method() and are being changed.
Reviewed By: iseeyuan
Differential Revision: D23436351
fbshipit-source-id: 4655ed7182d8b6f111645d69798465879b67a577
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40785
The main goal of this change is to support creating Tensors from a blob in NHWC (ChannelsLast) format.
ChannelsLast is supported only for 4-dim tensors; this is enforced on the LibTorch side. I have not added asserts on the Java side, both to avoid duplicate asserts and in case this limitation changes in the future.
Additional changes in `aten/src/ATen/templates/Functions.h`:
`from_blob` creates an `at::empty({0}, options)` tensor first and sets its Storage with sizes and strides afterwards.
But since ChannelsLast is only for 4-dim tensors, that creation fails because dim==1.
I've added a `zero_sizes()` function that returns `{0, 0, 0, 0}` for ChannelsLast and ChannelsLast3d; a sketch follows.
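For illustration, the helper might look roughly like the following (a sketch based on this summary, not necessarily the exact code in Functions.h):
```
// Sketch of zero_sizes(): the initial empty tensor used by from_blob must
// already have the rank required by the requested memory format.
#include <cstdint>
#include <vector>
#include <ATen/ATen.h>

inline std::vector<int64_t> zero_sizes(const at::TensorOptions& options) {
  auto format = options.memory_format_opt();
  if (format == at::MemoryFormat::ChannelsLast ||
      format == at::MemoryFormat::ChannelsLast3d) {
    return {0, 0, 0, 0};
  }
  return {0};
}
```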
Test Plan: Imported from OSS
Reviewed By: dreiss
Differential Revision: D22396244
Pulled By: IvanKobzarev
fbshipit-source-id: 02582d748a554e0f859aefe71cd2c1e321fb8979
Summary:
These were added accidentally (probably by an IDE) during a refactor.
These files have always been Open Source.
Test Plan: CI
Reviewed By: xcheng16
Differential Revision: D23250761
fbshipit-source-id: 4974430c0e28dd3269424d38edb36f4f71508157
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40199
Mobile custom selective build is already covered by `test/mobile/custom_build/build.sh`.
It builds a CLI binary with the host toolchain and runs it on the host machine to
check the correctness of the result.
But that custom build test doesn't cover the android/gradle build part,
and we cannot use it to measure and track the in-APK size of the custom
build library.
So this PR adds selective build test coverage for the Android NDK build.
It also integrates with CI to upload the custom build size to Scuba.
TODO:
Ideally it should build android/test_app and measure the in-APK size,
but the test_app isn't covered by any CI yet and is currently
broken, so build & measure the AAR instead (which can be inaccurate, as we
plan to pack C++ header files into the AAR soon).
Sample result: https://fburl.com/scuba/pytorch_binary_size/skxwb1gh
```
+---------------------+-------------+-------------------+-----------+----------+
| build_mode | arch | lib | Build Num | Size |
+---------------------+-------------+-------------------+-----------+----------+
| custom-build-single | armeabi-v7a | libpytorch_jni.so | 5901579 | 3.68 MiB |
| prebuild | armeabi-v7a | libpytorch_jni.so | 5901014 | 6.23 MiB |
| prebuild | x86_64 | libpytorch_jni.so | 5901014 | 7.67 MiB |
+---------------------+-------------+-------------------+-----------+----------+
```
Test Plan: Imported from OSS
Differential Revision: D22111115
Pulled By: ljk53
fbshipit-source-id: 11d24efbc49a85f851ecd0e481d14123f405b3a9
Summary:
1. Modularize some bzl files to break circular buck load
2. Use query-based on instrumentation_tests
(Note: this ignores all push blocking failures!)
Test Plan: CI
Reviewed By: kwanmacher
Differential Revision: D22188728
fbshipit-source-id: affbabd333c51c8b1549af6602c6bb79fabb7236
Summary:
edit: apparently we hardcode a lot more versions than I would've anticipated.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40519
Differential Revision: D22221280
Pulled By: seemethere
fbshipit-source-id: ba15a910a6755ec08c10f7783ed72b1e06e6b570
Summary:
This re-applies D21232894 (b9d3869df3) and D22162524, plus updates jni_deps in a few places
to avoid breaking host JNI tests.
Test Plan: `buck test @//fbandroid/mode/server //fbandroid/instrumentation_tests/com/facebook/caffe2:host-test`
Reviewed By: xcheng16
Differential Revision: D22199952
fbshipit-source-id: df13eef39c01738637ae8cf7f581d6ccc88d37d5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40442
Problem:
Nightly builds do not include the libtorch headers the way local builds do.
The reason is that on the Docker images the path is different from the local path when building with `scripts/build_pytorch_android.sh`.
Solution:
Introducing a Gradle property to be able to specify the headers path, and adding it to the Gradle build job and the snapshot publishing job, which run on the same Docker image.
Test:
ci-all jobs check https://github.com/pytorch/pytorch/pull/40443
Checking that the Gradle build results in headers inside the AAR.
Test Plan: Imported from OSS
Differential Revision: D22190955
Pulled By: IvanKobzarev
fbshipit-source-id: 9379458d8ab024ee991ca205a573c21d649e5f8a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37243
*** Why ***
As it stands, we have two thread pool solutions concurrently in use in PyTorch mobile: (1) the open source pthreadpool library under third_party, and (2) Caffe2's implementation of pthreadpool under caffe2/utils/threadpool. Since the primary use-case of the latter has been to act as a drop-in replacement for the third party version so as to enable integration and usage from within NNPACK and QNNPACK, Caffe2's implementation is intentionally written to the exact same interface as the third party version.
The original argument in favor of C2's implementation has been improved performance as a result of using spin locks, as opposed to relinquishing the thread's time slot and putting it to sleep - a less expensive operation up to a point. That seems to have given C2's implementation the upper hand in performance, hence justifying the added maintenance complexity, until the third party version improved in parallel surpassing the efficiency of C2's implementation as I have verified in benchmarks. With that advantage gone, there is no reason to continue using C2's implementation in PyTorch mobile either from the perspective of performance or code hygiene. As a matter of fact, there is considerable performance benefit to be had as a result of using the third party version as it currently stands.
This is a tricky change though, mainly because, in order to avoid potential performance regressions (of which I have witnessed none, but out of an abundance of caution), we have decided to continue using C2's internal implementation whenever building for Caffe2. Again, this is mainly to avoid potential performance regressions in production C2 use cases, even if doing so results in reduced performance as far as I can tell.
So to summarize, today, and as it currently stands, we are using C2's implementation for (1) NNPACK, (2) PyTorch QNNPACK, and (3) ATen parallel_for on mobile builds, while using the third party version of pthreadpool for XNNPACK as XNNPACK does not provide any build options to link against an external implementation unlike NNPACK and QNNPACK do.
The goal of this PR then, is to unify all usage on mobile to the third party implementation both for improved performance and better code hygiene. This applies to PyTorch's use of NNPACK, QNNPACK, XNNPACK, and mobile's implementation of ATen parallel_for, all getting routed to the
exact same third party implementation in this PR.
Considering that NNPACK, QNNPACK, and XNNPACK are not mobile specific, these benefits carry over to non-mobile builds of PyTorch (but not Caffe2) as well. The implementation of ATen parallel_for on non-mobile builds remains unchanged.
*** How ***
This is where things get tricky.
A good deal of the build system complexity in this PR arises from our desire to maintain C2's implementation intact for C2's use.
pthreadpool is a C library with no concept of namespaces, which means two copies of the library cannot exist in the same binary or symbol collision will occur, violating the ODR. This means that somehow, and based on some condition, we must decide on the choice of a pthreadpool implementation. In practice, this has become more complicated as a result of all the possible combinations that USE_NNPACK, USE_QNNPACK, USE_PYTORCH_QNNPACK, USE_XNNPACK, USE_SYSTEM_XNNPACK, USE_SYSTEM_PTHREADPOOL and other variables can result in. Having said that, I have done my best in this PR to surgically cut through this complexity in a way that minimizes the side effects, considering the significance of the performance we are leaving on the table, yet, as a result of the combinatorial explosion explained above, I cannot guarantee that every single combination will work as expected on the first try. I am heavily relying on CI to find any issues, as local testing can only go so far.
Having said that, this PR provides a simple, non-mobile-specific C++ thread pool implementation on top of pthreadpool, namely caffe2::PThreadPool, that automatically routes to C2's implementation or the third party version depending on the build configuration. This simplifies the logic at the cost of pushing the complexity to the build scripts. From there on, this thread pool is used in ATen parallel_for and in NNPACK and family, again routing all usage of threading to C2's or the third party pthreadpool depending on the build configuration.
When all is said and done, the layering will look like this (a usage sketch follows the list):
a) aten::parallel_for, uses
b) caffe2::PThreadPool, which uses
c) pthreadpool C API, which delegates to
c-1) third_party implementation of pthreadpool if that's what the build has requested, and the rabbit hole ends here.
c-2) C2's implementation of pthreadpool if that's what the build has requested, which itself delegates to
c-2-1) caffe2::ThreadPool, and the rabbit hole ends here.
NNPACK, and (PyTorch) QNNPACK directly hook into (c). They never go through (b).
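As a usage-level illustration of layer (a): callers are unaffected, and the routing underneath is purely a build-time decision.
```
// Callers only see at::parallel_for (a). On mobile builds the work is
// dispatched through caffe2::PThreadPool (b) down to the pthreadpool
// C API (c); which pthreadpool backs it is chosen at build time.
#include <ATen/Parallel.h>

void scale_in_place(float* data, int64_t n, float alpha) {
  at::parallel_for(/*begin=*/0, /*end=*/n, /*grain_size=*/2048,
      [&](int64_t begin, int64_t end) {
        for (int64_t i = begin; i < end; ++i) {
          data[i] *= alpha;
        }
      });
}
```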
Differential Revision: D21232894
Test Plan: Imported from OSS
Reviewed By: dreiss
Pulled By: AshkanAliabadi
fbshipit-source-id: 8b3de86247fbc3a327e811983e082f9d40081354
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39587
Adds an example of linking directly to the pytorch_jni library from the AAR, and updates android/README.md with a tutorial on how to do it.
Adds a `nativeBuild` dimension to `test_app` that uses direct AAR dependencies; as headers packaging has not landed yet, `nativeBuild` is excluded from building by default on CI.
Additional change to `scripts/build_pytorch_android.sh`:
Skipping the clean task here, as externalNativeBuild in Android Gradle plugin 3.3.2 has problems with it when abiFilters are specified.
It will be brought back in following diffs that upgrade the Gradle and Android Gradle plugin versions.
Test Plan: Imported from OSS
Differential Revision: D22118945
Pulled By: IvanKobzarev
fbshipit-source-id: 31c54b49b1f262cbe5f540461d3406f74851db6c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39588
Before this diff we used c++_static linking.
Users will dynamically link to libpytorch_jni.so and have at least one more shared library of their own that probably uses the STL.
We must have no more than one STL per app ( https://developer.android.com/ndk/guides/cpp-support#one_stl_per_app ).
To have only one STL per app, ANDROID_STL is changed to c++_shared, which adds libc++_shared.so to the packaging.
Test Plan: Imported from OSS
Differential Revision: D22118031
Pulled By: IvanKobzarev
fbshipit-source-id: ea1e5085ae207a2f42d1fa9f6ab8ed0a21768e96
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39507
Adding a Gradle task that runs after `assemble` to add a `headers` folder to the AAR.
Headers are taken from the first specified ABI; they should be the same for all ABIs.
Adding the headers works by temporarily unpacking the AAR into the Gradle `$buildDir`, copying the headers into it, and zipping the AAR back up with the headers.
Test Plan: Imported from OSS
Differential Revision: D22118009
Pulled By: IvanKobzarev
fbshipit-source-id: 52e5b1e779eb42d977c67dba79e278f1922b8483
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39999
Cleaned up the Android build scripts. Consolidated common functions into
common.sh. Also made a few minor fixes:
- We should trust build_android.sh to do the right thing about reusing an existing
`build_android_$abi` directory;
- We should clean up `pytorch_android/src/main/jniLibs/` to remove
broken symbolic links in case the custom ABI list has changed since the last build;
Test Plan: Imported from OSS
Differential Revision: D22036926
Pulled By: ljk53
fbshipit-source-id: e93915ee4f195111b6171cdabc667fa0135d5195
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39691
After switching to the fbjni-java-only dependency, we no longer need the fbjni Gradle subproject.
Test Plan: Imported from OSS
Differential Revision: D22054575
Pulled By: IvanKobzarev
fbshipit-source-id: 331478a57dd0d0aa06a5ce96278b6c897cb0ac78
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39188
Extracting the Vulkan_LIBS and Vulkan_INCLUDES setup from `cmake/Dependencies.cmake` into `cmake/VulkanDependencies.cmake` and reusing it in android/pytorch_android/CMakeLists.txt.
Adding control to build with Vulkan by setting the env variable `USE_VULKAN` for `scripts/build_android.sh` and `scripts/build_pytorch_android.sh`.
We do not use the Vulkan backend in pytorch_android, but with this build option we can track how the Android AAR changes when `USE_VULKAN` is added.
Currently the difference is 88 KB.
Test Plan: Imported from OSS
Differential Revision: D21770892
Pulled By: IvanKobzarev
fbshipit-source-id: a39433505fdcf43d3b524e0fe08062d5ebe0d872
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37548
Moving RecordFunction from torch::autograd::profiler into at namespace
Test Plan:
CI
Imported from OSS
Differential Revision: D21315852
fbshipit-source-id: 4a4dbabf116c162f9aef0da8606590ec3f3847aa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34710
Extending the RecordFunction API to support new recording scopes (such as TorchScript functions), as well as giving more flexibility in setting the sampling rate.
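For orientation, a hedged sketch of instrumenting a scope with RecordFunction as it looked around this time (the macro and header path are recalled from this era and may differ slightly):
```
// Sketch: marking a scope so that registered callbacks, subject to the
// configured sampling rate, can observe it.
#include <ATen/ATen.h>
#include <torch/csrc/autograd/record_function.h>

at::Tensor my_op(const at::Tensor& input) {
  RECORD_FUNCTION("my_op", std::vector<c10::IValue>({input}));
  return input * 2;  // actual work, visible to profiler/observer callbacks
}
```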
Test Plan: unit test (test_misc.cpp/testRecordFunction)
Reviewed By: gdankel, dzhulgakov
Differential Revision: D20158523
fbshipit-source-id: a9e0819d21cc06f4952d92d43246587c36137582
Summary:
Ignore mixed upper-case/lower-case style for now.
Fix violations of spacing between a function and its arguments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35574
Test Plan: CI
Differential Revision: D20712969
Pulled By: malfet
fbshipit-source-id: 0012d430aed916b4518599a0b535e82d15721f78
Summary:
Since we've done the branch cut for 1.5.0, we should bump nightlies to 1.6.0.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35495
Differential Revision: D20697043
Pulled By: seemethere
fbshipit-source-id: 3646187a5e729994138bf2c68625f25f11430b3a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32313
`torch::autograd::profiler::pushCallback()` and `torch::jit::setPrintHandler` should be called only once, not before every model load.
`JITCallGuard guard;` is not needed before loading a module and has no effect there.
Test Plan: Imported from OSS
Differential Revision: D20559676
Pulled By: IvanKobzarev
fbshipit-source-id: 70cce5d2dda20a00b378639725294cb3c440bad2
Summary:
There are three guards related to mobile build:
* AutoGradMode
* AutoNonVariableTypeMode
* GraphOptimizerEnabledGuard
Today we need to set some of these guards before calling libtorch APIs because we customized the mobile build to support inference only (for both OSS and most FB use cases) to optimize binary size.
Several changes have been made since the 1.3 release, so there are already inconsistent uses of these guards in the codebase. I did a sweep of all mobile-related model loading & forward() call sites, trying to unify the use of these guards:
Full JIT: still set all three guards. More specifically:
* OSS: Fixed a bug where the guard was not set correctly at model load time in the Android JNI.
* FB: Not covered by this diff (as we are using mobile interpreter for most internal builds).
Lite JIT (mobile interpreter): only needs the AutoNonVariableTypeMode guard. AutoGradMode doesn't seem to be relevant (so it was removed from a few places), and GraphOptimizerEnabledGuard is definitely not relevant (only the full JIT has a graph optimizer). More specifically:
* OSS: At this point we are not committed to supporting Lite-JIT. For Android it shares the same code as the FB JNI call sites.
* FB:
** JNI callsites: Use the unified LiteJITCallGuard.
** For iOS/C++: manually set AutoNonVariableTypeMode for _load_for_mobile() & forward() callsites.
Ideally we should avoid having to set AutoNonVariableTypeMode for the mobile interpreter. It's currently needed for dynamic dispatch + inference-only mobile builds (where variable kernels are not registered): without the guard it will try to run `variable_fallback_kernel` and crash (PR #34038). The proper fix will take some time, so we are using this workaround to unblock the selective BUCK build, which depends on dynamic dispatch.
PS: The current status (of having to set AutoNonVariableTypeMode) should not block running an FL model on the mobile interpreter: if all the necessary variable kernels are registered, then it can call _load_for_mobile()/forward() against the FL model without setting the AutoNonVariableTypeMode guard. It's still inconvenient for Java call sites, as the guard is set unconditionally inside the JNI methods. A sketch of the pattern is below.
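For reference, a hedged sketch of the Lite-JIT call-site pattern described above (the real LiteJITCallGuard lives in the JNI layer; this is illustrative rather than the exact code):
```
// Sketch: inference-only mobile builds do not register variable kernels, so
// the AutoNonVariableTypeMode guard must be active around load and forward().
#include <string>
#include <vector>
#include <ATen/core/LegacyTypeDispatch.h>
#include <torch/csrc/jit/mobile/import.h>
#include <torch/csrc/jit/mobile/module.h>

struct LiteJITCallGuard {
  at::AutoNonVariableTypeMode non_var_guard{true};
};

c10::IValue run_mobile_inference(const std::string& path,
                                 std::vector<c10::IValue> inputs) {
  LiteJITCallGuard guard;
  auto module = torch::jit::_load_for_mobile(path);
  return module.forward(std::move(inputs));
}
```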
Test Plan: - CI
Reviewed By: xta0
Differential Revision: D20498017
fbshipit-source-id: ba6740f66839a61790873df46e8e66e4e141c728