211 Commits

Author SHA1 Message Date
a1ea7dbe40 Adjust the API call to deserilize the tensorproto (#14132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14132

as title

Reviewed By: jerryzh168

Differential Revision: D13110697

fbshipit-source-id: 822c9079de11951f90aec3d26f0e4108847e7dac
2018-12-10 22:54:42 -08:00
9b272c08cf Remove partially initialized Tensor in Deserialization (#14197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14197

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13642

Previously we pass in a patially initialized Tensor to Deserialize and it will fill
it with the result of deserialization of a tensor proto. Now we want it to return
a Tensor directly since it's just a shared pointer to TensorImpl.

Reviewed By: dzhulgakov

Differential Revision: D12874357

fbshipit-source-id: 12b80a763375da23cfa64a74d6bc186d8d03b94f
2018-12-10 17:17:29 -08:00
e9db9595d2 Add crop argument, can crop rec as well, first resize and then crop
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14894

Reviewed By: llyfacebook

Differential Revision: D13377604

Pulled By: sf-wind

fbshipit-source-id: 333d0d864e6c2dc85f405baa25ed58029d62750f
2018-12-08 11:14:56 -08:00
738fc7054b Report timer in benchmarking when requested
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14570

Reviewed By: llyfacebook

Differential Revision: D13264904

Pulled By: sf-wind

fbshipit-source-id: fd05bc32202b7734dc911e3c792357ddf9ecedee
2018-11-30 13:17:29 -08:00
8e91da4cb3 Windows shared build (#13550)
Summary:
Hi guys,

I'd like to build Caffe2 with more supported options in Windows with Microsoft Visual Studios.
This is the first pull request.
Running scripts/build_windows_shared.bat is able to build Caffe2 with both CMAKE_BUILD_TYPE=Debug and CMAKE_BUILD_TYPE=Release with Visual Studio 14 2015.
CUDA is 9.0, cudnn is 7.0.5, glog, gflags and lmdb are supported on my system.
Python is 3.5, Detectron works from python interface as well.
It was even possible to debug detectron code and step into caffe2_gpu.dll with pdbs built.

What is disappointing, that c10/experimental ops don't build with this Visual Studio generator, I added special option INCLUDE_EXPERIMENTAL_C10_OPS (default ON) to deal with it in build_windows_shared.bat.

After this pull request the next step is to add Visual Studio 2017 support in the script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13550

Reviewed By: ezyang

Differential Revision: D13042597

Pulled By: orionr

fbshipit-source-id: f313f909f599cd582a1d000eff766eef3a9fc4fc
2018-11-16 12:16:28 -08:00
0d7a986da1 Change hip filename extension to .hip (#14036)
Summary:
xw285cornell

- To make hip files to have unique filename extension we change hip files from _hip.cc to .hip (it's the only blessing option other than .cu in hipcc 3d51a1fb01/bin/hipcc (L552)).
- Change to use host compiler to compile .cc|.cpp files. Previously we use hcc to compile them which is unnecessary
- Change the hipify script to not replace "gpu" with "hip" in the filename of the generated hipified files. Previously we do this because hcc has a bug when linking files that have same filename. We have now changed to use host linker to do linking so this is unnecessary anymore.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14036

Reviewed By: xw285cornell

Differential Revision: D13091813

Pulled By: bddppq

fbshipit-source-id: ea3d887751d8abb39d75f5d5104aa66ce66b9ee0
2018-11-16 11:55:59 -08:00
5639332a28 fix the deeptext issue (#14005)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14005

the partial initialization of tensor is no longer supported, we need to fix multiple places

Reviewed By: hl475

Differential Revision: D13078206

fbshipit-source-id: a1be2bd2a9f573db54e1366a0d7a17cc2e0db0c9
2018-11-15 12:13:45 -08:00
a0e783768f Do not fill in new data in every iteration if the input data only has one entry (#13495)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13495

If the user has one data file to put in, the data is filled up in every iteration,
which actually flushes the caches. The retrieved latency is larger than the latency
when the caches are warm. Instead of doing that, we should only rely on wipe_cache
variable to wipe the caches.

The change is to skip filling the data if the input only has one size and it is
not the first iteration

Reviewed By: hl475

Differential Revision: D12897946

fbshipit-source-id: ee54ed09b8ec85fcefe930858420b90d494ad972
2018-11-01 22:06:09 -07:00
8444ed951d add sleep time between runs (#12347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12347

add sleep time between net and operator runs, and between each iteration.

Reviewed By: sf-wind

Differential Revision: D10209308

fbshipit-source-id: 9a42b47e1fdc14b42dba6bb3ff048fe8e2934615
2018-11-01 00:25:22 -07:00
eea2ee6d29 Renaming size() to numel() - 1/17
Summary: Codemod generated with clangr shard mode, 25 files per diff

Reviewed By: li-roy

Differential Revision: D10866237

fbshipit-source-id: 020fcfdf52083430c5b674eda8e07ad3adfcc838
2018-10-26 15:36:59 -07:00
a6949abb15 Guard all Caffe2 protobuf string serializations with CAFFE_ENFORCE (fixed reverted bug) (#12848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12848

Updated all non-test uses of protobuf::MessageLite::SerializeAsString to call
SerializeAsString_EnforceCheck so that the return value is checked and can
throw an exception if failing.

Most of the affected code was called from classes derived from  BlobSerializeBase.
Didn't touch most tests and ENFORCE calls because they usually do checks
anyway.

Original commit changeset: c0760e73ecc7

Reviewed By: dzhulgakov

Differential Revision: D10453456

fbshipit-source-id: d2f2b7b4578e721924354149f08f627c7e3bf070
2018-10-23 16:21:26 -07:00
805f4d5cb8 Revert D10416438: Guard all Caffe2 protobuf string serializations with CAFFE_ENFORCE
Differential Revision:
D10416438

Original commit changeset: cb842e3e26b0

fbshipit-source-id: c0760e73ecc76ca9b1b74f6844e243c2df5260a2
2018-10-18 13:46:33 -07:00
63cd051867 Guard all Caffe2 protobuf string serializations with CAFFE_ENFORCE (#12799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12799

Updated all non-test uses of protobuf::MessageLite::SerializeAsString to call
SerializeAsString_EnforceCheck so that the return value is checked and can
throw an exception if failing.

Most of the affected code was called from classes derived from  BlobSerializeBase.
Didn't touch most tests and ENFORCE calls because they usually do checks
anyway.

Reviewed By: ezyang

Differential Revision: D10416438

fbshipit-source-id: cb842e3e26b0918829d71267a375d4dd40600d58
2018-10-18 12:49:01 -07:00
7d5f7ed270 Using c10 namespace across caffe2. (#12714)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12714

This is a short change to enable c10 namespace in caffe2. We did not enable
it before due to gflags global variable confusion, but it should have been
mostly cleaned now. Right now, the plan on record is that namespace caffe2 and
namespace aten will fully be supersets of namespace c10.

Most of the diff is codemod, and only two places of non-codemod is in caffe2/core/common.h, where

```
using namespace c10;
```

is added, and in Flags.h, where instead of creating aliasing variables in c10 namespace, we directly put it in the global namespace to match gflags (and same behavior if gflags is not being built with).

Reviewed By: dzhulgakov

Differential Revision: D10390486

fbshipit-source-id: 5e2df730e28e29a052f513bddc558d9f78a23b9b
2018-10-17 12:57:19 -07:00
01a333fd7f OpenCV 4.0 Compatibility fix (#9966)
Summary:
caffe2 compiles with latest opencv 4.0 after committed changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9966

Differential Revision: D10369130

Pulled By: ezyang

fbshipit-source-id: 9a104803edca5a22e27e140a794e4b8c878ca416
2018-10-15 17:42:04 -07:00
f1f521f71b make bench_gen.py work for 3d conv (#12433)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12433

To test 3d conv, we need to pass lists in spec argument. We also don't want to set use_cudnn=True which is the default in brew.

Reviewed By: llyfacebook, csummersea

Differential Revision: D10234315

fbshipit-source-id: 96a39992a97e020d6e9dac103e6d64df0cc1020b
2018-10-08 12:24:43 -07:00
38f3d1fc40 move flags to c10 (#12144)
Summary:
still influx.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12144

Reviewed By: smessmer

Differential Revision: D10140176

Pulled By: Yangqing

fbshipit-source-id: 1a313abed022039333e3925d19f8b3ef2d95306c
2018-10-04 02:09:56 -07:00
07bb79bd8b Use caffe2::int8::Int8TensorCPU when input type is uint8_t (#12274)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12274

We use caffe2::int8::Int8TensorCPU for quantized tensor with uint8_t element type.

Reviewed By: llyfacebook

Differential Revision: D10156452

fbshipit-source-id: 52cf2bedc9dbb433cd5d03f0b76723f7df6a7361
2018-10-03 19:26:16 -07:00
74dc4460eb New in StaticContext returns at::DataPtr (#12029)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12029

In order to remove New() function in StaticContext(to remove StaticContext) and converge to the Allocator design, we'll first change the return type of New to at::DataPtr.

Reviewed By: ezyang

Differential Revision: D9889990

fbshipit-source-id: 3257c763530b987025f428741bdd2e089d11bad4
2018-10-03 19:10:07 -07:00
557015fd93 wipe cache with writes (#12279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12279

By some reason if we don't write to the wipe buffer, it doesn't really wipe out everything from caches in x86.
We also need to wipe out cache after initializing input blobs.

Reviewed By: Maratyszcza

Differential Revision: D10161211

fbshipit-source-id: c34414dd8b83947805010d7d57e4134d56de1430
2018-10-03 17:12:23 -07:00
c0ed48a57e Add support to the accuracy metric (#12211)
Summary:
The code that reads a blob from input files are broken. Fixing them. Also, add a binary that converts input files to blobs that can be used by Caffe2 directly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12211

Reviewed By: llyfacebook

Differential Revision: D10121845

Pulled By: sf-wind

fbshipit-source-id: 6e48bb594680bcb3186d8d43276b602041c30d3e
2018-10-03 02:10:51 -07:00
a76216b8ed Back out "[aibench] Use caffe2::int8::Int8TensorCPU when input type is uint8_t"
Summary: Original commit changeset: b63cd3a75f87

Reviewed By: bddppq

Differential Revision: D10154512

fbshipit-source-id: 039dfd295c5d1de799993a20e708915be65e9d76
2018-10-02 16:25:11 -07:00
04b0774964 Use caffe2::int8::Int8TensorCPU when input type is uint8_t (#12250)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12250

We use caffe2::int8::Int8TensorCPU for quantized tensor with uint8_t element type.

Reviewed By: llyfacebook

Differential Revision: D10121216

fbshipit-source-id: b63cd3a75f87e043cc3c83de4f3520b6ffbf1d07
2018-10-02 14:57:28 -07:00
f3c32a4b54 dnnlowp_16 -> dnnlowp_acc16 (#12205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12205

We're more interested in testing the performance of DNNLOWP_ACC16 engine.

Reviewed By: llyfacebook

Differential Revision: D10121080

fbshipit-source-id: 7def38be838feb7636f7dd0c8ed352c2df398ec1
2018-10-01 09:40:13 -07:00
65cbb8226b IValue can store Blob (#11414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11414

caffe2::Blob can be stored in an IValue. This is a precondition for caffe2 to switch from Blob to IValue.

Reviewed By: ezyang

Differential Revision: D9731326

fbshipit-source-id: 462a39d2d9ab6f85b99b1670848c6976a3de417c
2018-09-26 01:12:31 -07:00
8f0db9bbbb Removing some dependency edges from Blob to other caffe2 (#12043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12043

Re-trying D9979976, this time with all call sites fixed.

D9979976 got reverted because there was a call site that wasn't covered by sandcastle it seems.
I fixed it and used 'grep' to ensure there aren't any more call sites in fbsource.

Reviewed By: ezyang

Differential Revision: D10026392

fbshipit-source-id: cd341514a8e53a40147ea0ee3e52f63bb6444157
2018-09-25 11:40:24 -07:00
2cdf98a74d Back out "Removing some dependency edges from Blob to other caffe2"
Summary: The controller you requested could not be found. Original commit changeset: 2ea17724e223

Differential Revision:
D10026321
Ninja: stable broken

fbshipit-source-id: faf87cb7cc0f78c2c10d4aa6fceea279cd27acd6
2018-09-25 01:11:14 -07:00
17a65bf9b6 Removing some dependency edges from Blob to other caffe2 (#11923)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11923

This is pre-work to allow moving Blob to ATen/core, which cannot depend on caffe2 anymore.
(1) Removing the Blob -> Tensor dependency allows us to move Blob to ATen/core and use it inside IValue without having to wait for the Tensor merge to be complete.
(2) In the final Blob design, we want it to be a very small class that doesn't have any special treatment for Tensor (or to be more correct, doesn't allow storing Tensor anymore), so this is anyhow the direction we want to go.

This changes call sites that will have to be moved to IValue later, but they cannot be moved to IValue directly, because for that, IValue first needs to be able to store Blob, which in turn first needs this diff and some other changes coming up in future diffs.

Codemods:
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)\\.IsTensorType\\(" "BlobIsTensorType(\\1, "
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)->IsTensorType\\(" "BlobIsTensorType(*\\1, "
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)\\.GetMutableTensor\\(" "BlobGetMutableTensor(\\1, "
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)->GetMutableTensor\\(" "BlobGetMutableTensor(*\\1, "

It is, however, not only these codemods because regex based refactoring was only able to match a small amount of the call sites. To catch more, I wouldn've needed a AST aware tool like clangr, which I didn't figure out how to use.

Reviewed By: ezyang

Differential Revision: D9979976

fbshipit-source-id: 2ea17724e223b5b73b44f99362727759ca689e61
2018-09-24 22:57:05 -07:00
a6630e25af Remove many caffe2::TIndex and replace them with int64_t (#11943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11943

See title

Reviewed By: ezyang

Differential Revision: D9992645

fbshipit-source-id: e8f80d6ea762971513e5e8072975ceea53e1f11a
2018-09-22 18:11:04 -07:00
b2b05b7c20 Move blob serialization to free functions (#11817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11817

Blob::Serialize() and Blob::Deserialize() are now free functions SerializeBlob(), DeserializeBlob() instead.
This takes away access to Blob internals from them and makes future refactorings easier.

Reviewed By: ezyang

Differential Revision: D9882726

fbshipit-source-id: 3251ebd4b53fc12f5e6924a6e4a8db3846ab3729
2018-09-20 23:27:34 -07:00
9f4bcdf075 caffe2::DeviceType -> at::DeviceType (#11254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11254
Previously we use DeviceType in caffe2.proto directly, but it's an `enum` and have implicit conversion to int, which does not have type safety, e.g. we have to explicitly check for a device type is valid in event.h:
```
template <int d>
struct EventCreateFunctionRegisterer {
  explicit EventCreateFunctionRegisterer(EventCreateFunction f) {
    static_assert(d < MaxDeviceTypes, "");
    Event::event_creator_[d] = f;
  }
};
```
at::DeviceType is an `enum class`, and it does not have implicit conversion to int, and provides better type safety guarantees. In this diff we have done the following refactor(taking CPU as an example):

    1. caffe2::DeviceType → caffe2::DeviceTypeProto
    2. caffe2::CPU → caffe2::PROTO_CPU
    3. caffe2::DeviceType = at::DeviceType
    4. caffe2::CPU = at::DeviceType::CPU

codemod -d caffe2/caffe2 --extensions h,cc,cpp 'device_type\(\), ' 'device_type(), PROTO_'
+ some manual changes

In short, after this diff, in c++, caffe2::CPU refers to the at::DeviceType::CPU and the old proto caffe2::CPU will be caffe2::PROTO_CPU.
In python side, we have a temporary workaround that alias `caffe2_pb2.CPU = caffe2_pb2.PROOT_CPU` to make the change easier to review and this will be removed later.

Reviewed By: ezyang

Differential Revision: D9545704

fbshipit-source-id: 461a28a4ca74e616d3ee183a607078a717fd38a7
2018-09-05 16:28:09 -07:00
26409a4300 Caffe2 flags needs to be used after the GlobalInit function is called
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11120

Reviewed By: llyfacebook

Differential Revision: D9598430

Pulled By: sf-wind

fbshipit-source-id: 468f0ed7880339c9c4467d1cef29f5bc9fc80a2a
2018-08-30 19:10:39 -07:00
e7195431e0 Add benchmarking functionality to the benchmark app (#10976)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10976

The app can run in XCode with the benchmark metrics collected.
It can also run when building with buck

Reviewed By: llyfacebook

Differential Revision: D9546755

fbshipit-source-id: 60ad0112946f8cf57138417f6838a58ed6d2c90f
2018-08-30 09:54:55 -07:00
91797c0672 Replace direct include of caffe2.pb.h with an intermediary header caffe2_pb.h (#10946)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10946

```
codemod -d . --extensions cc,cpp,cu,cuh,h caffe2/proto/caffe2.pb.h caffe2/proto/caffe2_pb.h
```

Reviewed By: houseroad

Differential Revision: D9539945

fbshipit-source-id: 497d04720e8e7e61c05ffe1b23733d0cb774de7e
2018-08-28 11:57:08 -07:00
ddc37d7487 Update mobile predictor caller's interface
Summary: Update all the caller for the new interface

Reviewed By: highker

Differential Revision: D9323167

fbshipit-source-id: a39335ceb402db0719f5f2314085ba9a81380308
2018-08-24 23:40:05 -07:00
319fefe9e6 Support benchmark on windows machines
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10564

Reviewed By: llyfacebook

Differential Revision: D9356389

Pulled By: sf-wind

fbshipit-source-id: f6c58e68d3eaf3a39c9f89b8f04e6039c75b4cd9
2018-08-16 10:56:23 -07:00
099a545376 Hipify Caffe2 binaries (#10468)
Summary:
petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10468

Reviewed By: yinghai

Differential Revision: D9301178

Pulled By: bddppq

fbshipit-source-id: 5da88aa4d79a5142f8e744cdcd8ae85951bc387c
2018-08-13 20:56:28 -07:00
40109b16d0 Remove caffe1 specific proto (#10380)
Summary:
This was used as a convenient way for us to convert c1 models. Now that conversion is more or less done, we should probably require any users who need to convert c1 models to explicitly install c1. This PR removes the explicit c1 proto (which was copied from c1) in favor of explicit installation.

Note that caffe_translator would still work properly, only difference is that now users need to install c1 separately.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10380

Differential Revision: D9267981

Pulled By: Yangqing

fbshipit-source-id: a6ce5d9463e6567976da83f2d08b2c3d94d14390
2018-08-10 11:10:26 -07:00
267c397c5b Add the ocr_det model for benchmarking (#10245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10245

as title

Reviewed By: sf-wind

Differential Revision: D9176654

fbshipit-source-id: 3339d2aa6a0ceb0e751745c06dcfd025ccbf5449
2018-08-05 16:45:35 -07:00
7f2e43a084 Add the ocr_rec model json (#10240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10240

as title

Reviewed By: sf-wind

Differential Revision: D9176522

fbshipit-source-id: 5b92c0b4ed24f96fe7b1321a3ab5ad26dcd3318d
2018-08-05 16:45:23 -07:00
aebf3b47ae Remove template parameter from Tensor (#9939)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9939

Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13

Pull Request resolved: https://github.com/pytorch/translate/pull/166

Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125

Closes https://github.com/pytorch/pytorch/pull/9125

Use inheritance for polymorphism, and remove template parameter
This is to change the templating in call sites, the core implementations will change later

Before Caffe2 Tensor class was compile-time fixed to bind to a particular device/context. With this change, we're making it a runtime property (stored inside the tensor), but preserve the same semantics. For example, one has to specify device type in order to create a Tensor - there are no uninitialized tensors. More specifically the changes are:

1. We added an extra argument *DeviceType* to most of the constructors of the tensor, e.g. (Tensor(DeviceType type)),
2. Semantics of constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context); is changed, in this constructor, the second context is passed in to enable us to call the templated Copy function, it could be in a different context as source and target previously, now we'll enforce that the context should have same device type as src, if it is provided.
3. To preserve 'get-or-construct' semantics of Blob, we added specialized getter Blob::GetMutableTensor that verifies both that Blob contains a Tensor and that it's of a correct type
4. Specifically, Tensor type is not default-constructible any more (as we don't have unknown device tensors) and thus some of the code handling STL containers needs to change

Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.

Reviewed By: ezyang, houseroad

Differential Revision: D9024330

fbshipit-source-id: e0b8295d2dc6ebe2963383ded5af799ad17164ba
2018-07-27 10:56:39 -07:00
dfa0af093d Move predictor into caffe2/caffe2/predictor (#9548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9548

Pull Request resolved: https://github.com/pytorch/translate/pull/157

One part of refactor predictor. Move all the files into predictor dir.

Reviewed By: highker

Differential Revision: D8845276

fbshipit-source-id: 1e917464b0c8a042f025128a082c784eaa3b7013
2018-07-26 19:03:40 -07:00
d1260d26fe Sleep before run (#9891)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9891

Add an argument to benchmark binary to specify the seconds to sleep before the run and after the warmup.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9880

Reviewed By: llyfacebook

Differential Revision: D9014254

Pulled By: sf-wind

fbshipit-source-id: d5566186c8ed768f1e170e9266c5f2d6077391e0
2018-07-26 14:39:17 -07:00
969b62f276 Revert D8121878: Remove template parameter from Tensor
Differential Revision:
D8121878

Original commit changeset: 4a5e9a677ba4

fbshipit-source-id: d8e2c0bb145b52fbcca323b22d1d3346f0b3249e
2018-07-26 14:02:04 -07:00
cd5adc7b5f Remove template parameter from Tensor (#13)
Summary:
Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13

Pull Request resolved: https://github.com/pytorch/translate/pull/166

Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125

Closes https://github.com/pytorch/pytorch/pull/9125

Use inheritance for polymorphism, and remove template parameter
This is to change the templating in call sites, the core implementations will change later

Before Caffe2 Tensor class was compile-time fixed to bind to a particular device/context. With this change, we're making it a runtime property (stored inside the tensor), but preserve the same semantics. For example, one has to specify device type in order to create a Tensor - there are no uninitialized tensors. More specifically the changes are:

1. We added an extra argument *DeviceType* to most of the constructors of the tensor, e.g. (Tensor(DeviceType type)),
2. Semantics of constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context); is changed, in this constructor, the second context is passed in to enable us to call the templated Copy function, it could be in a different context as source and target previously, now we'll enforce that the context should have same device type as src, if it is provided.
3. To preserve 'get-or-construct' semantics of Blob, we added specialized getter Blob::GetMutableTensor that verifies both that Blob contains a Tensor and that it's of a correct type
4. Specifically, Tensor type is not default-constructible any more (as we don't have unknown device tensors) and thus some of the code handling STL containers needs to change

Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.

Reviewed By: xw285cornell

Differential Revision: D8121878

fbshipit-source-id: 4a5e9a677ba4ac82095df959851a054c81eccf81
2018-07-26 10:25:23 -07:00
edb88b5f3a Update from Facebook (#8887)
* add opencl + fpga context

adds an opencl context inside caffe2/fb which can be used for fpga access

* [Caffe2] Force tensor inference checks to be triggered during testing

We've started to rely on TensorInference functions more for different analysis.  This diff ensures that the TensorInference function's result matches what is expected from the definition of the operator.

* Enable building //caffe2:torch with @mode/opt

In @mode/opt, python runs out of a PAR, which breaks a lot of
assumptions in the code about where templates/ folders live relative
to __file__. Rather than introduce hacks with parutil, I simply turn
template_path into a parameter for all the relevant functions and
thread it through from the top level.

* [Caffe2] Fix cost models for DotProduct and Div.  Update Tensor Inference for dot product

As title.  DotProduct states that output is a 1-D tensor (https://caffe2.ai/docs/operators-catalogue.html#dotproduct) though code suggests it is either 0- or 1-D depending on inputs.  TensorInference defined to support implementation.

* [SG-MoE] Add an option to make the experts NOT as components

* [nomnigraph] Rename and fixup convertToNeuralNetOperator API

This will make things a bit cleaner

* no longer symlink THNN.h and THCUNN.h

* forced decoder network (onnx export)

Closes https://github.com/pytorch/translate/pull/95

Add networks in ensemble_export.py to create a forced decoding network from PyTorch NMT checkpoints. This network takes an arbitrary numberized (source, target) pair and returns the model score for the translation, including penalties.

Vocabulary reduction networks are also supported, but note that target indices which are not in the possible_translation_tokens generated for the source input will be trea

* Revert schema change to fix production models

Revert schema change to fix production models

* MockLogDeviceReader - rebase on FIX

# Goal

1), Build a make_mock_log_device_reader using make_mock_reader

2), Replace the real log_device_reader here: https://fburl.com/raihwf1p

# Log by D8151734

Real log_device_reader:
```
I0529 20:29:05.373108 954994 tensor.h:839] Tensor print_net/log of type std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >. Dims: (): read_net/ParseOpenTrainingRow:0
I0529 20:29:05.373244 954994 tensor.h:839] Tensor read_net/ParseOpenTrainin

* [C2/D2][1/n]: Nonnegative-Constrained Optimization -- log barrier

implement log barrier as a regularization method

* Add teacher weight screening.

Add teacher weight sceening according to teacher labels. If teacher label is zero, we do not use the distill loss in the objective function.

* Add NormalizerContext

See task for more detail. This implementation is a copy of what exists for RegularizerContext except for how the parameters are defined in the model_definition thrift file.

I'll try an alternative implementation which overrides the default arguments of functions instead like for argscopes in tensorflow.

https://github.com/pytorch/pytorch/compare/master...MaximeBoucher:update-from-facebook-0939578c068c?expand=1

* Adding cosine similarity option in dot processor

Add pairwise cosine similarity option in dot product.
Add an option to concate dot product and cosine similarity.
Add test cases.

* [nomnigraph][redo] Concat elim for sparseNN

Same as D7962948, which was reverted because Operator Schema was not
defined

* [pytorch] Revert pytorch/pytorch#7918 'Release GIL when copying to shared memory', breaks ASAN

Revert this pytorch diff that breaks ASAN when running Filament in dev mode; in opt mode it gives "bad file descriptor" errors. Looks like a race when copying tensors to shared memory in multiple mp.Queue's (which spawn separate threads).

https://github.com/pytorch/pytorch/pull/7918/files

* [nomnigraph][mobile] Enable nomnigraph by default, use -Oz on nomnigraph related code to reduce code size

enables nomnigraph and reduces codesize

* [Warmup] Allow both offline incremental training and online training

Change plan name on saving side and reading side to support both training type

This diff depends on D8128530 and D8168651.

* Revert D7802642: [Warmup] Allow both offline incremental training and online training

This reverts commit afc213cf9b36cecf75333a788391c4d09f4afccc

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* Add legacy grad logic to fix div op on old graphs.

Add legacy grad logic to fix div op on old graphs.

* Correctly propagate operator failures

Propagate errors from operators that throw exceptions and return false

* Revert D8374829: [caffe2][nomnigraph][redo] Concat elim for sparseNN

This reverts commit 6dda028c463e54bb5c32188bbbe9202107e188a5

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* [Caffe2] Added extra_info to core.DeviceOption(), enforced extra_info to be inherited in scope.DeviceScope

extra_info is a newly defined field in DeviceOption proto. This diff added extra_info to the core.DeviceOption().  And, In scope.DeviceScope(), this diff enforce the new scope to inherit the extra_info from old scope.

* [opt] hgdirsync wasn't enabled, merge diverged code

Here's the damage, P59732616 basically xplat was left behind but had
the change from assert to CAFFE_ENFORCE

* OMP parallelism over RoIs for RoIAlign op

Simpler to parallelize over RoIs. Shouldn't affect other uses as it relies on
the number of OMP threads set during startup.

PR: https://github.com/pytorch/pytorch/pull/8562

* Use int64_t for shape in FillOps

to avoid overflow of int32

* Implement Rotated RoIAlign op

Based on Rotated RPNs as explained in https://arxiv.org/abs/1703.01086.
The idea is simple - orientation/angle is added as an RPN
anchor parameter and then the angle is further regressed similar to bbox
coords. There are some additional changes related to NMS and IoU, but besides
that it's a direct extension to Faster-RCNN. Further details in https://fb.quip.com/sZHlA1iMfWPZ.

RoIs are represented in [center_x, center_y, width, height, angle] format.
`angle` repre

* Rotated RoIAlign op CUDA forward implementation

CUDA forward impl for D8415490

* RoIAlignRotated op CUDA backward pass implementation

TSIA

* All remaining fixes to eliminate process_github.sh

Most of this diff has already been reviewed separately, except for the parts relating to _thnn/utils.py and _utils._internal.py

remove skipIf(True, 'Fbcode') line from process_github.sh

replace sed of cpp file with #ifdef to control cudnnDestroy use

undo sync-time deletion of .gitattributes, remove process_github.sh

switch to using _utils._internal rather than try-import-except

This diff also fixes the open-source bug where rebuilds have

* Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"

Original commit changeset: 7707d2efe60e The original diff is backout becuase the online trainer package is backed out. This code would only work with new online trainer package

* [easy] improve error log in adagrad op

as title

* re-allow use of thnn_h_path

This fixes cffi usage in OSS

* [4/4] [tum] paralyzing layerNorm for GPU full sync

as title

* add compile=False to pytorch tests, remove hack with pyc

* Add shape and type inference for RowWiseArgMax operator

See title

* Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"

This reverts commit 78167eeef0af16b60f72c82f9dcdda9b41b4dcbd

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* [fix-flaky-test] mock_hive_reader_test flaky, because GlobalCounter collects local counts intervally

# Problem

`MockHiveReader` uses `GlobalCounter` to limit `max_examples`.

GlobalCounter on server node collect local counts from worker nodes every 1 sec.

This 1 sec delay makes it impossible to limit exactly to the `max_examples`, it will definitely exceed `max_examples`.

# Plan

Given,
```
Expected num_examples = max_examples + num_examples/sec (Read Speed) x 1 sec (GlobalCounter Sync Int

* [Caffe2] Fix FCGradient cost inference.  Prevent overflow in cost inference

FCGradient missed a factor 2 in the `num_outputs == 3` case.  Overflow was occurring with flop calculation for FC.  Changed types to `uint64_t` to prevent future problems.

* Fix binary ops with empty inputs

Fix binary ops with empty inputs

* Support the filling of input blob with provided data

as title for Biz Integrity case

* Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training""

Original commit changeset: 30c55dd38816 Original diff is reverted due to introducing bad integration test. Fixed the integration test.

* [c2][easy] improve pack ops error loggings

as desc.

* Add ShapeTypeInference for LpNorm operator

As desc

* Shard test_nn to reduce runtime for each test target

Closes https://github.com/pytorch/pytorch/pull/8793

The current test_nn would time out and be disabled in GreenWarden, and we need to have an option to split it up in order to pass the stress test. Right now GreenWarden roughly allows running 100 test cases in test_nn before timing out, and here we have an option to divide test_nn into 30 shards (with ~40 tests in each shard) to allow for some test suite growth in the future.

* Change default caffe2_streams_per_gpu to 1

* Remove IN_SANDCASTLE from common.py and test_nn.py

We prefer to disable the failing tests through Sandcastle UI instead.

* Add a new class for an updated prof_dag.proto

This diff contains:
- An updated prof_dag.proto that contains blob profiles.
- A class to deserialize this information (serialization is in a follow up diff)
- Update to separate profiling information from NeuralNet (and use it as part of the class above).
- Unit tests

* Lambdarank for SparseNN

This diff adds a lambda_rank_layer for SparseNN.
 changes include
1) Adds support for multi sessions in c2 op
2) Adds support for two different loss functions in c2 op
3) Unit tests for op

* Revert D8586950: Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training""

This reverts commit 012220ed63eccc35659a57b31d16a3625da6317b

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* [easy] A few fixups to multithread predictor benchmark

(1) support perf on T6 server
(2) remove dead code

* fix a bug about the map size

as title

* Fix reduce sum on in-place case.

Fix reduce sum on in-place case.

* [Warmup] Reland reverted diff Allow both offline incremental training and online training

Closes https://github.com/pytorch/pytorch/pull/8827

fix net transform integration test. Allow offline and online trainer to coexist D7802642.

* Add StoreHandlerNotAvailableException

Add an exception for a store that is not available or has been
deleted.

* Use exception handling for fault tolerance, missing KV store

Remove status blobs to communication ops so that exceptions propagate on
failure.

* [C2/D2][2/n]: Nonnegative-Constrained Optimization -- bounded grad proj

for simple bounded constrained optimization, incl non-negative box constraints.

* [GanH]: Adaptive Weighting with More Estimations

With implemented postivity optimization, we now learn adaptive weights with different
parameterizations.

This improves parameter estimation and training stability.

* Revert some changes for landing

* Remove AutoNoGIL in StorageSharing

* Temporarily disable net_tests

* Revert "[Caffe2] Force tensor inference checks to be triggered during testing"

This reverts commit 67ef05c22b2f71b4a489695384932f968384a2a4.

* Revert "Fix reduce sum on in-place case."

This reverts commit 6cb8a8e1b3db7b6d20941b0053e3f3836068eb64.

* Revert "Revert "Fix reduce sum on in-place case.""

This reverts commit 130a257c0893dc09f4bd6e6a45d112261807fd2c.
2018-06-26 14:55:48 -07:00
4d025a6a54 add wipe_cache option (#8204)
as title
2018-06-06 13:08:39 -07:00
82b981e4db Update from facebook 1ee4edd286a3 (#8040)
* Adding instance weight to batch distill loss

as title

* add bfloat 16-31

added bfloat 16-31 and their respective unit tests

* [CUDA9] Upgrade - fbcode

CUDA9 upgrade diff D5654023 has been out for a while thanks to Pieter. But with time growing it's becoming quite hard to rebase, because of the symlinks and auto-generated build/config files in tp2. Break D5654023 into two diffs, one touching tp2 config files, and another one touching fbcode TARGETS file (adding nvcc flag). These two should be a bit easier to rebase (for detailed procedure see "Test Plan").

This diff can only be committed if:
1. CUDA 9 rpm is rolled out fleet-wide (TBD)
2. NVidia driver 390.40 is rolled out fleet-wide (done)
3. Upgrade CUDA 9.1, cudnn 7.1, nccl 2.1 (done)
4. Make sure all dependents are built (done)
5. Test all C2 operators, PyTorch (see test plan)

* Share intermediate int32 buffer across Conv ops

Adding a known type

* [C2 fix] infer function for ensure_cpu_output_op

this is adding the missing device funtion for ensure_cpu_output_op

* [int8] Add blob serializer/deserializer for Int8TensorCPU

To export to logfiledb

* [nomnigraph] Add try catch block to optimization passes in predictor

This will catch failures that happen in the optimization pass.

* Caffe2: avoid static initialization order fiasco for CAFFE_ENFORCE

CAFFE_ENFORCE uses strack trace fetcher. Which is currently a
global static variable. If at static initialization time CAFFE_ENFORCE
is used, this is a SIOF. Recently CAFFE_ENFORCE was added into init
functions registration, so we started to see this.

Meyers singleton is going to provide safety here. If stacktrace
fetcher was not registered yet, it will just use a dummy one.

* NUMA support in SparseNN CPU benchmark

Adding support for NUMA in SparseNN CPU benchmark

* [mobile-roofline] Add logging needed for roofline model

This should be all that's needed

* Let the operators using the same input if the operators are not chained

or else, we have to change the input data dims

* fix null-pointer-use UBSAN errors in in reshape_op.h

* revert previous fix on input blob name

as title

* Adding flag to let MineHardNegative automatically extract single value from dict

Model exporter requires the output of the model to be a struct. This makes it convenient to use those models directly in MineHardNegative by allow automatic extraction of the single element of dict, which is a common use case.

* Reverting change that broke internal tests back to OSS compatible state
2018-06-01 17:41:09 -04:00
49f8581745 Update from facebook (#7855)
* [mpscnn] MPSCNNChannelShuffle

att

* [Easy] Adding tags as an argument to the functional layer

Without it "tags" would be added as an argument to the operator.

The change here is based on the assumption that there is no operator that takes "tags" as an argument.

* Fix locally_connected_op schema check.

Fix locally_connected_op schema check.

* [C2] Add TypeAndShape inference for few more operators

As desc

* [c2] Shape inference should support 0 as dimension

Tensors can have 0 in their dimension.

* Make MockHiveReader loop over and support max_examples

Replace DatasetReader with RandomDatasetReader.

So that Mock Hive Reader can simulate a large data input using a small sample file as source.

* Utility function to wipe cache between benchmark runs

Caffe2 benchmark does not wipe out cache between runs, and this potentially creates an unrealistically optimistic picture of performance. This diff adds utility function to wipe out the cache.

* Allow caffe2 GlobalInit to be invoked multiple times

Allow caffe2 GlobalInit to be invoked multiple times. Will re-parse gflags and update logging levels on successive invocations, but will not re-run init functions or perform other one-time initialization.

* Add Caffe2 GlobalInitIsCalledGuard to base net and operator classes

Warn if caffe2's GlobalInit function has not been invoked before creating an operator or net object. This is based on discussion here: https://fb.quip.com/kqGIAbmK7vNG

* Rethrow current exception on failure

Rethrow current exception instead of copy constructing a new one on op failure.

* Make `clone()` return subclass of List/Struct

`clone()` is not working correctly when we subclass those classes

* Wipe the cache before the net run

the util function is copied from D7409424
will rebase once D7409424 is landed.

* [Caffe2] [Mobile] Support utils/cast.h::GetCastDataType with LITE_PROTO builds

* Correct includes

async_polling include -> async_base include

* Prepare execution flags for executor migration

Making async_scheduling aware of underlying net type to prepare for executor
migration

* Add operator level observers into async executor

Adding operator level observers into RunAsync operators' calls

* Cleanup TEST_Benchmark

Remove duplicate code and provide default implementation in NetBase

* [C2] Fix type and shape inference for binary comparison ops

As desc.

* Add GlobalInit to predictor to ensure initialization is always done before prediction

FACEBOOK:

Redo D7651453 the correct way.

Now use a static variable for the arguments passed to GLog

* Remove spammy log message

This method is currently used in various places inside Caffe itself.

* Disable events for operators inside a chain

We don't need to use events in operators within a chain because the chain is
always scheduled on a single stream, keeping only first and last event for
scheduling purposes

* Ensure correct finish run order

In rare cases we might call finishRun and trigger net's destruction while
another worker is still holding shared_ptr to a thread pool, that can cause
thread pool destruction from within a worker thread in case no other nets are
using the pool. This diff fixes the order of calling finishRun and also changes
pool() to return raw pointer to keep pool's ownership within the net

* Reduce unnecessary polling

Make sure we don't waste CPU by polling operators that we can set an efficient
callbacks on

* Squash commit of syncing 9506eeb from github to fbcode

Patch xplat buck fix

add virtual destructor to OptimizationPass

add virtual destructor to OptimizationPass

build fixes for sync

build fixes for sync

* Fix net tracing

Fix net tracing from async_scheduling

* Fix logging
2018-05-29 11:38:02 -07:00
f94ae3ba1d Update from facebook (#7696)
* Fix handling of empty batches in SumReduceDimsOp

As titled

* Deferrable async_scheduling finishRun fix

Proper order of finishing run operations in deferrable_async_scheduling net

* Simplify exception handling in async_scheduling

Simplify exception handling, no need to busy wait, thread that processes the
last task can finish the run

* [C2]worker_coordinator_memorize_worker_ids

As titled. This is related to T28689868, where the number of blobs we want to create is equal to the number of worker ids

* Add unit test for nets with no type set

* Ignore total length argument in sympolic_pad_packed_sequence

1- There was a mistake in the code that total_length was added to the wrong symbolic function (pack_padded_sequence) instead of (pad_packed_sequence)
2- No need to throw an exception if total_length is given since it is only used to enable data_parallel training on multi-gpus and doesn't have anything to do with onnx export, so just ignore it. https://fburl.com/tk4gciqp

* Add support for MKLDNN to async_scheduling

Just add MKLDNN as a possible CPU option to async_scheduling's pool function

* [AuFL][ensemble] support branch output for prediction

This diff supports using predictions from different branches and thus enables model ensembling (not fully independent).

* Fix a bug in add_loss in layer_model_helper

As titled.

* Support lradaption for adam

1.lr adaption operator
2.apply to dense adam

* Perf tweaks for async_scheduling

Restore single pool option + remove unnecessary (no-ops) calls

* add quantization to SparseSimdAdagradOp

add a bunch of quantization signatures to SparseSimdAdagradOp, implementations to come next

* [sr] [codemod] Change all SR callsites to use new API

@allow-large-files

This diff refactors all callsites of SR to use the slightly changed API introduced in the diff below. Really what this means is that you need to include the correct header. Also if you were using `ClientFactory::newFactory` you need to not prefix it with `ClientFactory::`.

```
cd ~/fbsource/fbcode
find ./ -type f -exec sed -i -e 's:#include "servicerouter/client/cpp2/ClientFactory.h":#include "servicerouter/client/cpp2/ServiceRouter.h":' -e 's:#include <servicerouter/client/cpp2/ClientFactory.h>:#include <servicerouter/client/cpp2/ServiceRouter.h>:' -e 's/ClientFactory::newFactory(/newFactory(/g' {} \;
```

Also manually fixed spots that couldn't be done automatically (or broke because they depended on transitive includes).

* Back out "Fix handling of empty batches in SumReduceDimsOp"

Original commit changeset: 282da1730cc2 This commit is blocking the
Github->fbcode sync, which really needs to get merged ASAP. D7881937 which this
diff depends on will be reverted in the sync D7990948 which causes this to
break. The sync diff cannot be patched with this reversion because it must be
landed against base revision 5c8c099 , and D7881937 must not be included in the
sync diff because it is breaking GPU tests that are not available in sandcastle
: https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-cuda8.0-cudnn6-ubuntu16.04-test/3638/console
for one example.

* Add the flow to support operator benchmark

1) generate model with the operator 2) upload to everstore 3) generate model spec into json file 4) start running the benchmark

* [tum][gpu] Connect DPM trainer with flow and unit tests

This diff:
- Fix some small bugs for Yiming's recent changes to parallelizer, so it suits real use cases.
- Add correct tags to the TUM code, so we can do data parallel transform
- pass extra info when instantiation.
- add unit test for using DPM in TUM model

After this diff, we can do simple box, multi-gpu fully-sync trainer for TUM in Fblearner workflow, but may still need to do speed benchmarking.

* w/o normalized lradaption for adam dense only

The previous lr adaption includes a normalization step when performing the dot product operation. This is not exactly same as what is proposed in the paper. I add normalization as an option. Without it, the operator performs exactly what the paper proposed. With the option, we add the normalization step

* [fb] Use SharedPromise in DeferrableAsyncSchedulingNet

This code is to simplify DeferrableAsyncSchedulingNet by removing condition
variable + small fixes

* [tum] implement cuda sparseLengthsMean and LengthsMean

as title

* Adding an optional parameter to allow use of protobufs in InferShapesAndTypes function.

Adding an optional parameter to allow use of protobufs in InferShapesAndTypes function.

* Move feature_to_index to FeatureSpec.feature_to_index

move feature_to_index to FeatureSpec.feature_to_index to avoid override other fields

* [Caffe2] Rename bytes_moved to bytes_written

Just a rename in preparation for supporting bytes_read.

* [c2] fix ReduceFrontSumOp for empty case by setting 0

otherwise, it may use the results from last iteration when it's empty batch.

* [Caffe2] [Int8] Improve Intel CPU performance

* [Easy] Improve PrependDim op logging

as titled

* DBFileReader expand db_path using os.path.expanduser(..)

Since there are a lot of possible use cases of `DBFileReader` to read from user home path, like `~/local/sample.db`, I want to save people's trouble of calling `os.path.expanduser(db_path)` themselves.

* [Caffe2] Add bytes_read to cost structure

We're adding analytical read bytes to cost functions.  This extends the structure accordingly for all CostInference defined operators.
Additionally, some small bug fixes were performed:
1) Cost functions now extract type information of operands instead of assuming float

* Fix sleef on aarch64 for hhvm

@bypass-lint

Rename flag

* Remove duplicated part in caffe2/ideep/operators/conv_op.cc

should be sync error

* Rename test helper function test_adagrad_sparse_helper to adagrad_sparse_test_helper to avoid confusing pytest
2018-05-19 23:10:48 -07:00