pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Author	SHA1	Message	Date
Sebastian Messmer	6706e9af19	Make C10_MOBILE consistent with how feature macros are usually used (#17481 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17481 Usually, feature macros are either defined or undefined and checked accordingly. C10_MOBILE was a weird special case that was always defined but either defined to 1 or to 0. This caused a lot of confusion for me when trying to disable something from mobile build and it also disabled it from the server build (because I was using ifdef). Also, I found a place in the existing code base that made that wrong assumption and used the macro wrongly, see https://fburl.com/y4icohts Reviewed By: dzhulgakov Differential Revision: D14214825 fbshipit-source-id: f3a155b6d43d334e8839e2b2e3c40ed2c773eab6	2019-02-27 17:57:51 -08:00
peterjc123	ee18448138	Don't install PDB for Windows static build of caffe2_observers (#16420 ) Summary: Fixes #16292. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16420 Differential Revision: D13833704 Pulled By: soumith fbshipit-source-id: 482ad6ce103bed7206e924e8c82454fbb1bfac42	2019-01-27 12:29:49 -08:00
peter	f7733526aa	Generate PDB files for better debugging on Windows (#16008 ) Summary: 1. Unify `build_pytorch_libs.bat`, `setup.py` and `torch/CMakeLists.txt` on the debugging flags with the `CMAKE_BUILD_TYPE` being `Debug`, `Release` and `RelWithDebInfo`. 2. Install PDBs through CMake if they are generated. Reference: 1. CMake PDB install: https://gitlab.kitware.com/cmake/cmake/issues/18393#note_459199 2. About debugging flags https://stackoverflow.com/a/4662345 3. MSDN page about /DEBUG flag: https://docs.microsoft.com/en-us/cpp/build/reference/debug-generate-debug-info?view=vs-2017 4. MSDN page about /Z{i/I/7}: https://docs.microsoft.com/en-us/cpp/build/reference/z7-zi-zi-debug-information-format?view=vs-2017 Work to do: - [x] Test the changes work in Release config through this PR - [ ] <del> Test debug build through https://github.com/pytorch/pytorch/pull/16009 </del> - [x] Test release build with debugging symbols through #16013 Difficulties: - [x] Replace /Zi flags with /Z7 (which will be added if DEBUG or RelWithDebInfo is used), as it is not supported by sccache - [x] Resolve `LINK : fatal error LNK1210: exceeded internal ILK size limit; link with /INCREMENTAL:NO` in the debug build - [ ] DEBUG build blocked by a MSVC bug. In order to resolve it, we'll need to update the MSVC in CI: https://developercommunity.visualstudio.com/content/problem/225957/fatal-error-lnk1318-unexpected-pdb-error-ok-0.html Pull Request resolved: https://github.com/pytorch/pytorch/pull/16008 Differential Revision: D13709527 Pulled By: ezyang fbshipit-source-id: e8365bc75d9ec64099093f7001f83d99a06b196b	2019-01-16 23:34:32 -08:00
Jerry Zhang	0c32e1b43e	use C10_MOBILE/ANDROID/IOS (#15363 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15363 Didn't define C10_MOBILE in the numa file move diff: D13380559 move CAFFE2_MOBILE/ANDROID/IOS to c10 ``` codemod -m -d caffe2 --extensions h,hpp,cc,cpp,mm "CAFFE2_MOBILE" "C10_MOBILE" codemod -m -d caffe2 --extensions h,hpp,cc,cpp,mm "CAFFE2_ANDROID" "C10_ANDROID" codemod -m -d caffe2 --extensions h,hpp,cc,cpp,mm "CAFFE2_IOS" "C10_IOS" ``` i-am-not-moving-c2-to-c10 Reviewed By: marcinkwiatkowski Differential Revision: D13490020 fbshipit-source-id: c4f01cacbefc0f16d5de94155c26c92fd5d780e4	2019-01-09 15:08:20 -08:00
ArutyunovG	8e91da4cb3	Windows shared build (#13550 ) Summary: Hi guys, I'd like to build Caffe2 with more supported options in Windows with Microsoft Visual Studios. This is the first pull request. Running scripts/build_windows_shared.bat is able to build Caffe2 with both CMAKE_BUILD_TYPE=Debug and CMAKE_BUILD_TYPE=Release with Visual Studio 14 2015. CUDA is 9.0, cudnn is 7.0.5, glog, gflags and lmdb are supported on my system. Python is 3.5, Detectron works from python interface as well. It was even possible to debug detectron code and step into caffe2_gpu.dll with pdbs built. What is disappointing, that c10/experimental ops don't build with this Visual Studio generator, I added special option INCLUDE_EXPERIMENTAL_C10_OPS (default ON) to deal with it in build_windows_shared.bat. After this pull request the next step is to add Visual Studio 2017 support in the script. Pull Request resolved: https://github.com/pytorch/pytorch/pull/13550 Reviewed By: ezyang Differential Revision: D13042597 Pulled By: orionr fbshipit-source-id: f313f909f599cd582a1d000eff766eef3a9fc4fc	2018-11-16 12:16:28 -08:00
Yangqing Jia	7d5f7ed270	Using c10 namespace across caffe2. (#12714 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12714 This is a short change to enable c10 namespace in caffe2. We did not enable it before due to gflags global variable confusion, but it should have been mostly cleaned now. Right now, the plan on record is that namespace caffe2 and namespace aten will fully be supersets of namespace c10. Most of the diff is codemod, and only two places of non-codemod is in caffe2/core/common.h, where ``` using namespace c10; ``` is added, and in Flags.h, where instead of creating aliasing variables in c10 namespace, we directly put it in the global namespace to match gflags (and same behavior if gflags is not being built with). Reviewed By: dzhulgakov Differential Revision: D10390486 fbshipit-source-id: 5e2df730e28e29a052f513bddc558d9f78a23b9b	2018-10-17 12:57:19 -07:00
Yangqing Jia	38f3d1fc40	move flags to c10 (#12144 ) Summary: still influx. Pull Request resolved: https://github.com/pytorch/pytorch/pull/12144 Reviewed By: smessmer Differential Revision: D10140176 Pulled By: Yangqing fbshipit-source-id: 1a313abed022039333e3925d19f8b3ef2d95306c	2018-10-04 02:09:56 -07:00
Yangqing Jia	a6f1ae7f20	set up c10 scaffolding. Move macros proper first. Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11939 Reviewed By: orionr, dzhulgakov Differential Revision: D10004629 Pulled By: Yangqing fbshipit-source-id: ba50a96820d35c7922d81c78c4cbe849c85c251c	2018-09-24 11:09:59 -07:00
Lingyi Liu	958ba4e913	Aibench for asr decoder Summary: as title Reviewed By: sf-wind Differential Revision: D9738021 fbshipit-source-id: 98f570484bca6486ad99207732efd534ec7e3251	2018-09-12 14:25:19 -07:00
Fei Sun	14d4bdb406	Reformat output data format to make it more general for other binaries (#9555 ) Summary: This is to simplify the data format during benchmarking. After this change, we can use the same benchmarking harness data conversion method to parse data from multiple binaries. This change should be coordinated with the PR: https://github.com/facebook/FAI-PEP/pull/63 Pull Request resolved: https://github.com/pytorch/pytorch/pull/9555 Reviewed By: pjh5 Differential Revision: D8903024 Pulled By: sf-wind fbshipit-source-id: 61cabcff99f0873729142ec6cb6dc230c685d13a	2018-07-23 11:11:26 -07:00
Brad Stocks	c4bff25282	Additional operator information values (#9153 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9153 Closes https://github.com/pytorch/pytorch/pull/9153 Modified the values reported by the benchmarking platform to include tensor_shape and op_args. These values have a different naming scheme to values like flops and latency. Reviewed By: sf-wind Differential Revision: D8729791 fbshipit-source-id: f050200be01c6d0794bf5faaa6e8cef12a00affe	2018-07-16 17:40:44 -07:00
Hao Lu	af107c4d16	Fix shape inference bug (#9199 ) Summary: Closes https://github.com/pytorch/pytorch/pull/9199 The input shapes are not logged correctly in production because `PerfNetObserver::Stop()` only gets called after the inference is done for the net and in the mobile models, it's common practice to reuse the blobs as much as possible to save memory. And the shapes of the blobs keep changing during inference. By the time you you query `InputTensorShapes()` in `PerfNetObserver::Stop()`, you only get the final shape of the blobs. To fix this bug, I moved the 'InputTensorShapes()' query from `PerfNetObserver::Stop()` to `PerfOperatorObserver::Stop()`. The latter gets called at the end of operator->run() whereas `PerfNetObserver::Stop()` gets called at the end of net->run(). Also remove `PerfOperatorObserver::getAnalyticalCost()` since it's now done on the server side and no longer needed on mobile Reviewed By: Maratyszcza Differential Revision: D8743346 fbshipit-source-id: 5d2d0132e3f5e084be7d0173863e695e62a6b4a0	2018-07-06 15:15:17 -07:00
llyfacebook	681964cc47	output each operator separately due to logcat truncation (#8456 ) as title	2018-06-13 21:05:05 -04:00
llyfacebook	0c9b5f0825	Change the output format of caffe2 observers (#8261 ) as title	2018-06-07 17:30:43 -07:00
llyfacebook	7cace7219a	Change the benchmark log format and also log flops (#8215 ) as title	2018-06-06 17:04:54 -07:00
Bram Wasti	82b981e4db	Update from facebook 1ee4edd286a3 (#8040 ) * Adding instance weight to batch distill loss as title * add bfloat 16-31 added bfloat 16-31 and their respective unit tests * [CUDA9] Upgrade - fbcode CUDA9 upgrade diff D5654023 has been out for a while thanks to Pieter. But with time growing it's becoming quite hard to rebase, because of the symlinks and auto-generated build/config files in tp2. Break D5654023 into two diffs, one touching tp2 config files, and another one touching fbcode TARGETS file (adding nvcc flag). These two should be a bit easier to rebase (for detailed procedure see "Test Plan"). This diff can only be committed if: 1. CUDA 9 rpm is rolled out fleet-wide (TBD) 2. NVidia driver 390.40 is rolled out fleet-wide (done) 3. Upgrade CUDA 9.1, cudnn 7.1, nccl 2.1 (done) 4. Make sure all dependents are built (done) 5. Test all C2 operators, PyTorch (see test plan) * Share intermediate int32 buffer across Conv ops Adding a known type * [C2 fix] infer function for ensure_cpu_output_op this is adding the missing device funtion for ensure_cpu_output_op * [int8] Add blob serializer/deserializer for Int8TensorCPU To export to logfiledb * [nomnigraph] Add try catch block to optimization passes in predictor This will catch failures that happen in the optimization pass. * Caffe2: avoid static initialization order fiasco for CAFFE_ENFORCE CAFFE_ENFORCE uses strack trace fetcher. Which is currently a global static variable. If at static initialization time CAFFE_ENFORCE is used, this is a SIOF. Recently CAFFE_ENFORCE was added into init functions registration, so we started to see this. Meyers singleton is going to provide safety here. If stacktrace fetcher was not registered yet, it will just use a dummy one. * NUMA support in SparseNN CPU benchmark Adding support for NUMA in SparseNN CPU benchmark * [mobile-roofline] Add logging needed for roofline model This should be all that's needed * Let the operators using the same input if the operators are not chained or else, we have to change the input data dims * fix null-pointer-use UBSAN errors in in reshape_op.h * revert previous fix on input blob name as title * Adding flag to let MineHardNegative automatically extract single value from dict Model exporter requires the output of the model to be a struct. This makes it convenient to use those models directly in MineHardNegative by allow automatic extraction of the single element of dict, which is a common use case. * Reverting change that broke internal tests back to OSS compatible state	2018-06-01 17:41:09 -04:00
xkszltl	89ba9dc44f	Import/export observer symbols for DLL, which fixes the linking error in Visual Studio. (#6834 ) * Import/export observer symbols for DLL, which fixes the linking error in Visual Studio. * Add support of all default cmake build types for release to cuda.	2018-05-31 10:22:21 -07:00
Lu Fang	664fe34e0a	[Caffe2][fbcode=>GH sync] Update from facebook 4323b18ce13c (#7116 ) * [fix] Re-enable events in RNN ops We have earlier added event disabling in RNN ops as back then we didn't use events, with current use cases this is no longer true (https://fburl.com/8vd0lp8y) * use ops with cude impl * Revert D7729695: [caffe2][fix] Re-enable events in RNN ops This reverts commit 4b215c7496fb724656ff4c776933a15bdbbcde5e @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! @cause_a_sev_many_files * [observer] Clean up observer_config.h #accept2ship * [1/n] Refactor dataio_test.py Replace code duplication with a common function * Add barrier net that runs before training nets Add a synchonize barrier net that is run before training nets. With this net, shards that are faster will wait for other shards before start training. This reduce chances of the faster shards timing out during GLOO AllReduce. Removed explicit data_parallel_model.py.synchronize call in holmes workflow. Similar change in speech/asr_training workflow will come in another diff. * Support the dnnlowp backend in caffe2_benchmark This is for SHARE operator latency evaluation * Migrate integral_image_op to main caffe2 migrate integral_image_op(GPU version) given by https://fburl.com/yvqezigi to caffe2/caffe2/operators and implement its CPU version. Write up a test using the hypothesis_test mechanism * [pos_disc, fbcode] Implement unjoined lr loss As explained in https://our.intern.facebook.com/intern/wiki/Model_Based_Calibration/, when the dataset is an joined data set, where labels might change later, we need to use unjoined logloss. The implementation is almost the same as in Sigrid (https://fburl.com/1trngsls), where loss = y (log(p) - log(1-p)) + (1-y)(log(1-p)) = xy - (1-y)x - (1-y)log(1+exp(-x)) For x < 0, to ensure stability and avoid overflow, we reformulate the above exp as loss = xy - (1-y)x - (1-y)x + (1-y)log(1+exp(x)) = xy + (1-y)log(1+exp(x)) Then the final expression becomes loss = xy + (y - 1) x (x >= 0) - (1 - y) log(1 + exp(x - 2 x (x >= 0))) where y is the true label, x is the dot product and p = logistic(x). This kind of implementation is align with the current implementation of the original cross entropy in https://phabricator.intern.facebook.com/diffusion/FBS/browse/master/fbcode/caffe2/caffe2/operators/cross_entropy_op.cc;0bae3b5d0f825897c5e0dd0ff10f489d7271bf25$7-13 * Keep the array to fix the conflict * [C2] Compute Adagrad effective LR The AdagradWithLR op outputs an extra blob which is contains the average effective learning rate across all weights in this blob. * Open-source extractMetaNetDef & runGlobalInitialization, add new Predictor constructor from db file, and add run_map_outputs 1. Open-source extractMetaNetDef and runGlobalInitialization, for use in 2. new Predictor constructor from db file. 3. Add new run function that returns outputs as TensorMap * Disable eigen cpu Disable eigen cpu in transpose and reduce * Introduce request_only/object_only property of ModelLayer by default this is False * A simple TC Caffe2 benchmark We can run tunner, get MappingOptions and then use them to compare against cuBLAS currently broken due to LLVM issues. How to run: hg checkout eec1ab31b59c03b8deded1c755a9abaf8c45be01 add D7401202 add D7434625 add D7506031 add D7540728 buck run @mode/dev-nosan tc/tc/benchmarks_python:caffe2_benchmark * Move Caffe2 feature_maps_ops to open source Need feature maps operators in open source project facebookresearch/BlueWhale * Manually fix the conflicts in channel shuffle op * Fix the inconsistency between different gh and fbcode * Skip Adagrad GPU Test (Because some gpu implementation is missing) * Fix another test to make sure it won't run on gpu when implementation is not available yet	2018-05-01 20:49:00 -07:00
Bram Wasti	aa56a1211d	Update from facebook (#6871 ) * Track checkpoint performance in scuba As title. * [C2/CUDA]: fix cross entropy sigmoid with logits when adding log_d_trick, I forgot to add it to the cuda impl; this diff fixes it. * Back out "[caffe2] Unregister MKL fallbacks for NCHW conversions" Original commit changeset: 8918dd40205a Will land after @jongsoo's diff https://phabricator.intern.facebook.com/D7596315 lands * [Easy][C2] Don't add blob to external outputs from output_record if it's already external output As desc. * On Mobile phones, call GlobalInit with no arguments in predictor in case we need to perform initialization FACEBOOK: The QPL logger needs the initialization code. In the past, the initialization code is put in the pipeline calling Caffe2. However, those places become obsolete quickly, as the product teams change places to call Caffe2 from time to time. We also need to track which teams use Caffe2 so that we can put the initialization code there. With this diff, the initialization code is put in the predictor constructor, only enabled for mobile phones. This way, we can always enable QPL logging. Once we do this, we can check how many times Caffe2 inference is called in production, and which models are more popular in production. This way, we can prioritize our effort supporting those models. Will clean up the old code calling the init in the product in a separate diff. * add padding op for sparse length tensor to pad length-based sparse tensor with padding_value * Add conv_op with cudaconvnet engine Add conv_op with cudaconvnet engine * [numa] Fix simple NUMA copy benchmark Move XavierFill into init_net and also compute BW * call roundf (device function) instead of round (host function) * [caffe2_benchmark][observer] Make caffe2_benchmark use its own observer 1. Add ClearGlobalNetObservers() 2. Make caffe2_benchmark use its own observer and observer_reporter * [detectron] Use roundf instead of round in the detectron module ops * allow K larger than number of elements in top k op one use case is to use this op together with PackSegments for sparse tensors, where the number of elements in each slice is not statistically defined. * add ChannelShuffle DNNLOWP op * fixup math_cpu.cc break	2018-04-23 15:01:56 -07:00
Martin Schatz	8baa563daf	Change observer copy() method to take id parameter This diff is added to support the ProfileObserver in order to differentiate operators in the stepnet properly. Since copy() is only used in the context of RNNs, the name has been changed to reflect that.	2018-03-27 18:10:39 -07:00
Alexander Sidorov	e431c98205	Caffe2: Add support for several auto-created observers and move net summary to (#2304 ) a separate observer This allows to support several auto-attached observers.	2018-03-18 18:23:40 -07:00
Yangqing Jia	dd1564b061	Caffe2 module update: move observers as well as binaries. (#2145 ) * Caffe2 module update: move observers as well as binaries. * Add threads linkage * Add Threads dependency to public interface	2018-03-06 14:45:21 -08:00

22 Commits