Update from facebook 1ee4edd286a3 (#8040)

* Adding instance weight to batch distill loss

as title

* add bfloat 16-31

added bfloat 16-31 and their respective unit tests

* [CUDA9] Upgrade - fbcode

The CUDA9 upgrade diff D5654023 has been out for a while, thanks to Pieter. As time passes it is becoming hard to rebase because of the symlinks and auto-generated build/config files in tp2. This change breaks D5654023 into two diffs: one touching the tp2 config files, and another touching the fbcode TARGETS file (adding an nvcc flag). These two should be a bit easier to rebase (for the detailed procedure see "Test Plan").

This diff can only be committed if:
1. CUDA 9 rpm is rolled out fleet-wide (TBD)
2. NVidia driver 390.40 is rolled out fleet-wide (done)
3. Upgrade CUDA 9.1, cudnn 7.1, nccl 2.1 (done)
4. Make sure all dependents are built (done)
5. Test all C2 operators, PyTorch (see test plan)

* Share intermediate int32 buffer across Conv ops

Adding a known type

* [C2 fix] infer function for ensure_cpu_output_op

This adds the missing device function for ensure_cpu_output_op

* [int8] Add blob serializer/deserializer for Int8TensorCPU

To export to logfiledb

* [nomnigraph] Add try catch block to optimization passes in predictor

This will catch failures that happen in the optimization pass.

* Caffe2: avoid static initialization order fiasco for CAFFE_ENFORCE

CAFFE_ENFORCE uses a stack trace fetcher, which is currently a
global static variable. If CAFFE_ENFORCE is used at static
initialization time, this is a SIOF. CAFFE_ENFORCE was recently added
to init function registration, so we started to see this.

Meyers singleton is going to provide safety here. If stacktrace
fetcher was not registered yet, it will just use a dummy one.
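
The fix itself is a C++ Meyers singleton; the following is only a rough Python analogue of the same idea (create the storage on first access, and fall back to a dummy when nothing is registered yet). All names here (`_fetcher_slot`, `set_stack_trace_fetcher`, `get_stack_trace`) are hypothetical and are not part of Caffe2.

```python
from typing import Callable, Dict, Optional


def _fetcher_slot() -> Dict[str, Optional[Callable[[], str]]]:
    # The storage is created on the first call instead of at import time,
    # so callers never observe it in an uninitialized state.
    if not hasattr(_fetcher_slot, "_slot"):
        _fetcher_slot._slot = {"fetcher": None}
    return _fetcher_slot._slot


def set_stack_trace_fetcher(fetcher: Callable[[], str]) -> None:
    # Hypothetical registration hook, called once the real fetcher exists.
    _fetcher_slot()["fetcher"] = fetcher


def get_stack_trace() -> str:
    fetcher = _fetcher_slot()["fetcher"]
    if fetcher is None:
        # Nothing registered yet: behave like a dummy fetcher.
        return ""
    return fetcher()
```

Calling `get_stack_trace()` before any registration returns an empty string rather than failing, which is the behaviour the message describes for early use.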

* NUMA support in SparseNN CPU benchmark

Adding support for NUMA in SparseNN CPU benchmark
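
As a sketch of how the new `numa_node_id` argument to `DeviceOption` (see the diff in this commit) might be used to build one option per NUMA node: the block below does not import Caffe2. `CPU`, `CUDA`, `device_option` and `per_node_options` are stand-ins written for illustration, and only the CPU-only assertion mirrors the actual change.

```python
from types import SimpleNamespace

# Stand-ins for caffe2_pb2.CPU and caffe2_pb2.CUDA (assumption, illustration only).
CPU, CUDA = 0, 1


def device_option(device_type, numa_node_id=None):
    # Stand-in for core.DeviceOption, reduced to the fields relevant here.
    option = SimpleNamespace(device_type=device_type, numa_node_id=None)
    if numa_node_id is not None:
        # Same restriction as in the diff: NUMA placement is CPU-only.
        assert device_type == CPU
        option.numa_node_id = numa_node_id
    return option


def per_node_options(num_numa_nodes):
    # Hypothetical helper: one CPU device option pinned to each NUMA node.
    return [device_option(CPU, numa_node_id=n) for n in range(num_numa_nodes)]
```

Passing `numa_node_id` together with a non-CPU device type trips the assertion, as it would in the patched `DeviceOption`.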

* [mobile-roofline] Add logging needed for roofline model

This should be all that's needed

* Let the operators use the same input if the operators are not chained

or else, we have to change the input data dims

* fix null-pointer-use UBSAN errors in reshape_op.h

* revert previous fix on input blob name

as title

* Adding flag to let MineHardNegative automatically extract single value from dict

Model exporter requires the output of the model to be a struct. This makes it convenient to use those models directly in MineHardNegative by allowing automatic extraction of the single element of a dict, which is a common use case.
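
A minimal sketch of the extraction behaviour described above, assuming the flag simply unwraps a one-entry dict. The function name `extract_single_value` and the `auto_extract` parameter are hypothetical and are not taken from MineHardNegative.

```python
def extract_single_value(output, auto_extract=True):
    """Unwrap a one-entry dict when auto_extract is set; otherwise pass through."""
    if not auto_extract or not isinstance(output, dict):
        return output
    if len(output) != 1:
        # Ambiguous: refuse to guess which entry the caller wants.
        raise ValueError("expected exactly one entry, got %d" % len(output))
    return next(iter(output.values()))
```

With the flag off, the dict is returned unchanged, so existing callers that expect a struct are unaffected.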

* Reverting change that broke internal tests back to OSS compatible state
Bram Wasti
2018-06-01 14:41:09 -07:00
committed by Soumith Chintala
parent 9060b7f4e2
commit 82b981e4db
23 changed files with 244 additions and 60 deletions


@@ -82,7 +82,7 @@ def IsOperatorWithEngine(op_type, engine):
     return C.op_registry_key(op_type, engine) in _REGISTERED_OPERATORS
-def DeviceOption(device_type, cuda_gpu_id=0, random_seed=None, node_name=None):
+def DeviceOption(device_type, cuda_gpu_id=0, random_seed=None, node_name=None, numa_node_id=None):
     option = caffe2_pb2.DeviceOption()
     option.device_type = device_type
     option.cuda_gpu_id = cuda_gpu_id
@@ -90,6 +90,9 @@ def DeviceOption(device_type, cuda_gpu_id=0, random_seed=None, node_name=None):
         option.node_name = node_name
     if random_seed is not None:
         option.random_seed = random_seed
+    if numa_node_id is not None:
+        assert device_type == caffe2_pb2.CPU
+        option.numa_node_id = numa_node_id
     return option
@@ -2256,6 +2259,8 @@ def InjectCrossDeviceCopies(net, blob_to_device=None, blob_remap=None,
     Assumptions:
       1. every external inputs of this net is already in blob_to_device!
       2. if not, this function will use net device option
+      3. InferOpBlobDevices might fail to get the correct inference for ops like
+         EnsureCPUOutput that could take in input from multiple places.
     '''
     new_net = net.Clone(net._net.name + '_cross_device', keep_schema=True)
     del new_net._net.op[:]