Compare commits

...

360 Commits

Author SHA1 Message Date
4c5b1cc026 version bump to 1.1 (#15554)
Summary:
version bump to 1.1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15554

Differential Revision: D13550818

Pulled By: soumith

fbshipit-source-id: 8a28582c98b42c081e103581551a01fd96c9f42d
2018-12-26 15:44:25 -08:00
8c6ff91d57 In README.md CMAKE_PREFIX_PATH should be CONDA_PREFIX when using a conda virtual environment (#15548)
Summary:
In the current README.md, `CMAKE_PREFIX_PATH` is set to the conda root even when you have activated a virtual environment. When a conda virtualenv is activated, packages are installed in `CONDA_PREFIX`, not the conda root. I think `CMAKE_PREFIX_PATH` should also be set to `CONDA_PREFIX` in this case. I think some build issues can be solved with the new instruction. Maybe something like #14954.

soumith,
When I made PR #15335 I was confused and made a wrong point. I think this PR could be the real solution.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15548

Differential Revision: D13549681

Pulled By: soumith

fbshipit-source-id: 42d855b6e49ee58d735d2f4715d3e5752a748693
2018-12-26 12:57:07 -08:00
cdb8edce75 add from_pretrained method to EmbeddingBag (#15273)
Summary:
The `EmbeddingBag` module does not include a `from_pretrained` method like the `Embedding` module.  I added it for consistency between the two modules.
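A minimal usage sketch of the added method, mirroring `Embedding.from_pretrained` (values illustrative):
```
import torch
import torch.nn as nn

weight = torch.tensor([[1.0, 2.0, 3.0],
                       [4.0, 5.0, 6.0]])
bag = nn.EmbeddingBag.from_pretrained(weight)  # weights are frozen by default
input = torch.tensor([0, 1])
offsets = torch.tensor([0])
print(bag(input, offsets))  # tensor([[2.5, 3.5, 4.5]]) with the default 'mean' mode
```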
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15273

Differential Revision: D13547842

Pulled By: soumith

fbshipit-source-id: 8ffde51ff0c1e8fc8310263b6f375da88089ff7d
2018-12-26 08:35:39 -08:00
5ac95758e2 Make argument size checking consistent across CPU and CUDA for torch.gesv (#15430)
Summary:
There is an inconsistency between CPU and CUDA in the argument size checks for gesv, which is fixed in this PR.

Changelog:
- Replicate check in CPU as done for CUDA
- Fix argument ordering (minor) in CUDA checking

Fixes #15328

Differential Revision: D13531167

Pulled By: soumith

fbshipit-source-id: c4b4e4fc12880208d08e88d1e47e730ac98c2ad3
2018-12-26 08:32:28 -08:00
f636dc9276 clang format world (#15524)
Summary:
The PR clang-formats everything in `torch/csrc/jit/` and adds it to the pre-commit hook.

Here is a list of non-mechanical changes:
- I went over each file and fixed up whenever I could tell that clang-format was clobbering comment formatting.
- Made the macros in register_prim_ops a little more clang-format friendly by omitting trailing commas
- Refactored autodiff.cpp to use a helper class with explicit state rather than a bunch of capturing lambdas
- Small improvements to the precommit hook clang-format
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15524

Differential Revision: D13547989

Pulled By: suo

fbshipit-source-id: 3ff1541bb06433ccfe6de6e33f29227a2b5bb493
2018-12-26 06:55:01 -08:00
d4712ee218 Added correct isinf handling for Integral tensors (#15489)
Summary:
Currently, torch.isinf on an integral tensor raises `RuntimeError: value cannot be converted to type int16_t without overflow: inf`.
This PR suppresses the error and returns false (0) for all integral tensors. The behavior is then also consistent with np.isinf.
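A quick illustration of the fixed behavior (a sketch, not from the PR's test suite):
```
import torch

t = torch.tensor([1, 2, 3], dtype=torch.int16)
print(torch.isinf(t))  # previously raised a RuntimeError; now all-False, like np.isinf
```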
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15489

Reviewed By: zou3519

Differential Revision: D13540786

Pulled By: flashhack

fbshipit-source-id: e730dea849da6a59f3752d347bcfbadfd12c6483
2018-12-26 06:36:09 -08:00
d602ddcda3 Trivial comment update in autograd/function.h (#15529)
Summary:
I removed the explanation of the `num_inputs` parameter. This parameter was removed in #8168

colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15529

Differential Revision: D13547854

Pulled By: soumith

fbshipit-source-id: 8a9ac58f2c93a2533b82ec63089477166ed0bcb9
2018-12-26 02:25:54 -08:00
6e4be0af2e Fix failed type cast in Windows Debug Build (#15333)
Summary:
Fixes #15330
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15333

Differential Revision: D13531317

Pulled By: soumith

fbshipit-source-id: b956f27bd7fa33cbdf405338fcbcbc7df2fd629f
2018-12-26 00:48:58 -08:00
12e0ed55b4 Upgrade MKL-DNN to version 0.17 and static build MKL-DNN (#15504)
Summary:
Upgrade MKL-DNN to 0.17 and statically build MKL-DNN to fix potential build errors due to an old mkldnn version on the host system.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15504

Differential Revision: D13547885

Pulled By: soumith

fbshipit-source-id: 46f790a3d9289c1e153e51c62be17c5206ea8f9a
2018-12-25 22:56:51 -08:00
2fe5c29d81 remove legacy from docs (#15112)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/15062
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15112

Differential Revision: D13547845

Pulled By: soumith

fbshipit-source-id: 61e3e6c6b0f6b6b3d571bee02db2938ea9698c99
2018-12-25 21:57:54 -08:00
60b13d1f71 Use at::zeros instead of torch::zeros in non-differentiable example (#15527)
Summary:
There was a typo in C++ docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15527

Differential Revision: D13547858

Pulled By: soumith

fbshipit-source-id: 1f5250206ca6e13b1b1443869b1e1c837a756cb5
2018-12-25 21:50:17 -08:00
2ed95c5871 Fix the compare logic in function overflows for MSVC (#15499)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/15497.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15499

Differential Revision: D13547835

Pulled By: soumith

fbshipit-source-id: a674da93bf905a0b81f0cc60449ccb97c2746926
2018-12-25 21:50:15 -08:00
521894c490 Allow converting char tensor to numpy; add [fi]info.min (#15046)
Summary:
https://github.com/pytorch/pytorch/pull/14710 with test fixed.

Also added `finfo.min` and `iinfo.min` to get castable tensors.
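A short sketch of the additions described above (values illustrative):
```
import torch

print(torch.iinfo(torch.int8).min)     # -128
print(torch.finfo(torch.float32).min)  # most negative finite float32
arr = torch.tensor([1, 2], dtype=torch.int8).numpy()  # char tensor -> numpy, now allowed
```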

cc soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15046

Reviewed By: soumith

Differential Revision: D13429388

Pulled By: SsnL

fbshipit-source-id: 9a08004419c83bc5ef51d03b6df3961a9f5dbf47
2018-12-24 09:11:24 -08:00
b7bc49ad70 Port replication_pad1d to ATen (#15507)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15507

Pull Request resolved: https://github.com/pytorch/pytorch/pull/15485

port replication_pad1d

Reviewed By: ezyang

Differential Revision: D13531920

fbshipit-source-id: dcd64ebd2c24b7431996231b8d5addfb600b1072
2018-12-24 06:34:02 -08:00
ad6799537e Support stateful dataset (#15096)
Summary:
Currently re-implements the dataloader for stateful datasets. Outstanding work:
- Refactor DataLoader and DataLoader2 to have common base classes and only differ in specific pieces of logic,
- Figure out how to not duplicate the `MapDataset` logic for stateful vs. non-stateful
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15096

Differential Revision: D13522043

Pulled By: goldsborough

fbshipit-source-id: 08e461ca51783047f11facc4d27dfa2e4f1e4c2a
2018-12-24 06:26:40 -08:00
8cd917812b put interactive prompt in bash (#15521)
Summary:
This makes compatibility with different versions of python a little bit simpler, and fixes a problem where stdin wasn't being read from the terminal properly in the prompt.

zdevito This should fix your EOF exception.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15521

Differential Revision: D13546358

Pulled By: suo

fbshipit-source-id: fb7551a86c888196831c046d9d9848e7ff05b925
2018-12-24 05:37:46 -08:00
f8a56bf476 Fix the iterator category for torch::data::Iterator (#15500)
Summary:
Tries to fix https://github.com/pytorch/pytorch/issues/14410.
Additional info: per this [page](https://stackoverflow.com/questions/14062297/canonical-way-to-define-forward-output-iterator), changing the category to `input_iterator_tag` doesn't mean the `output_iterator_tag` capability is lost.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15500

Differential Revision: D13545773

Pulled By: soumith

fbshipit-source-id: 327bfb7be83d53e42925e0e391b2a4277e3a1b36
2018-12-23 19:49:44 -08:00
c07647814b Precommit hook: just warn if no clang-tidy (#15514)
Summary:
The precommit hook shouldn't hard-fail if there's no `clang-tidy`; it should just warn and omit the check.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15514

Differential Revision: D13545776

Pulled By: suo

fbshipit-source-id: 9bf3f8ee18703c6d1a39eb7776092fb5e120d2a1
2018-12-23 14:38:13 -08:00
4a716250cc Add torch.rot90 to torch.rst
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15512

Differential Revision: D13545775

Pulled By: soumith

fbshipit-source-id: 2a8896571745630cff4aaf3d5469ef646bdcddb4
2018-12-23 14:31:11 -08:00
51f1c4fea5 fix parallelization detection for CPU foreach_reduced_elt (#15483)
Summary:
This does two things:

(1): revert #15114 , which is incorrect and actually just completely disables parallelization in this function (because `at::get_num_threads` returns `-1` unless it has been set explicitly)

(2): Fix our (FB-internal) failing tests that #15114 was intended to fix, by still working correctly in a setup where `#ifdef _OPENMP` is set and `omp_get_max_threads() > 1` , but `#pragma omp parallel` only launches one thread. I believe such an unusual situation only exists in certain unit tests within FB infra but we still need it to work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15483

Differential Revision: D13538940

Pulled By: umanwizard

fbshipit-source-id: a3362c7ac7327ced350d127bb426f82c59e42732
2018-12-23 12:51:40 -08:00
4e4ef0cffb add rowwise adagrad lp test (#15082)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15082

We didn't have a unit test for low-precision rowwise adagrad.

Reviewed By: chocjy

Differential Revision: D13300732

fbshipit-source-id: 46e7bdfc82c5a6855eeb6f653c0a96b0b3a20546
2018-12-22 10:25:39 -08:00
e012b183dd handle empty inputs to SparseLengthsMean correctly (#15389)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15389

SparseLengthsMean was generating uninitialized data for empty inputs (lengths == 0). We should return zeros.
The unit tests were also not covering this special case which is fixed by this diff.

Reviewed By: salexspb

Differential Revision: D13515970

fbshipit-source-id: 3c35265638f64f13f0262cee930c94f8628005da
2018-12-21 22:20:14 -08:00
58a7f2aed1 Add pthreadpool_create and pthreadpool_destroy (#15492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15492

Add pthreadpool_create and pthreadpool_destroy, which are used by NNPACK tests.

Reviewed By: Maratyszcza

Differential Revision: D13540997

fbshipit-source-id: 628c599df87b552ca1a3703854ec170243f04d2e
2018-12-21 20:28:18 -08:00
90aa21e795 Metadata for input/output formats in model file proto. (#15252)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15252

We would like to extend the model file format to include strongly typed, semantic information
about the model inputs and outputs.

The goal is for a user to be able to consider a model file like a function with
a well defined API describing what the inputs and outputs would be.

Reviewed By: dzhulgakov

Differential Revision: D13009915

fbshipit-source-id: 5df124a876ad03c05fbdaacae0eab659637734c1
2018-12-21 17:42:38 -08:00
f3a588fede add len to nativeResolver (#15488)
Summary:
(otherwise len is not resolvable using torch::jit::compile)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15488

Differential Revision: D13539991

Pulled By: zdevito

fbshipit-source-id: 3ba85fa7b1adb163f9229c568f7997d22321903d
2018-12-21 16:47:15 -08:00
934fc28656 Remove NoneGenerator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15335

Differential Revision: D13540357

Pulled By: driazati

fbshipit-source-id: a289e5944b65872103f68faac74e18f10e7c6fff
2018-12-21 16:33:37 -08:00
1dcf2ea096 Add self to Python printer reserved words (#15318)
Summary:
This adds `self` to the list of reserved words and also sorts the lines and prevents the tracer from naming values 'self' (which happens in torch/tensor.py)

Fixes #15240
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15318

Differential Revision: D13540192

Pulled By: driazati

fbshipit-source-id: 46ae02e51b1b31d5c62110fa83ba258ea6bada27
2018-12-21 16:02:07 -08:00
70aafad08a AD support for adaptive_avg_pool2d (#15459)
Summary:
This adds AD support for adaptive_avg_pool2d, which is necessary for resnet50 in pytorch/vision:master. cc: soumith asuhan dlibenzi

apaszke I saw the autodiff bug you fixed in #15403; it doesn't prevent this PR from passing, so I'll leave it for your PR to fix. :)
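A minimal eager-mode sketch of the op whose gradient formula this PR adds (the PR itself targets the JIT's autodiff, so this is only illustrative):
```
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8, requires_grad=True)
y = F.adaptive_avg_pool2d(x, (2, 2))
y.sum().backward()
print(x.grad.shape)  # torch.Size([1, 3, 8, 8])
```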
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15459

Differential Revision: D13534732

Pulled By: ailzhang

fbshipit-source-id: 4e48b93e35d5ecfe7bd64b6a132a55b07843f206
2018-12-21 15:38:24 -08:00
01be9b7292 Handling nullptr case
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15467

Reviewed By: Maratyszcza

Differential Revision: D13536504

fbshipit-source-id: ab46ff6bb4b6ce881c3e29d7e6a095ea62289db4
2018-12-21 15:08:00 -08:00
235d47760b Relax check on outputs (#15458)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15458

Many nets in the wild seem to have outputs that are never produced by the net.

Reviewed By: ZolotukhinM

Differential Revision: D13534185

fbshipit-source-id: 2b23b39c28404c53f68868f3bf6df53c5fea9eab
2018-12-21 14:19:37 -08:00
6bf05bfde6 allow non-final returns (#15463)
Summary:
This PR allows a subclass of programs that have return statements that are not final in the graph.

`final_returns.h` contains a comment describing how this is accomplished.
To minimize complexity in `compiler.cpp`, this pass is done as an AST-to-AST rewrite before the compiler runs.
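A sketch of the newly allowed pattern (assuming every path still returns a value):
```
import torch

@torch.jit.script
def absolute(x):
    if bool(x.sum() > 0):
        return x  # a return that is not final in the graph, now accepted
    return -x
```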
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15463

Differential Revision: D13538962

Pulled By: zdevito

fbshipit-source-id: 67105ca873351825b4a364092ab1873779f3e462
2018-12-21 14:01:33 -08:00
3da4a04733 Fixed trivial typos in Dropout2D and Dropout3D classes (#15200)
Summary:
Fixed trivial typos in Dropout2D and Dropout3D classes

weiyangfb
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15200

Differential Revision: D13537888

Pulled By: ezyang

fbshipit-source-id: 8fb06027ca663a2e4bfa016af400698ae3c88ad1
2018-12-21 11:58:10 -08:00
ff8fbc4f23 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 59d7a5b82fb78bc2d2285d0896e35c262512ffb9
2018-12-21 11:47:05 -08:00
7e2ec24886 eq_fixes (#15475)
Summary:
Fixes #15464.
cc: ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15475

Differential Revision: D13537812

Pulled By: ezyang

fbshipit-source-id: 127adf612ac8b3d3a64baa3d12a53daba7d3e4b8
2018-12-21 11:43:06 -08:00
d9cad71b36 Enable running collect_env.py without building PyTorch (#15468)
Summary: Closes #15346

Differential Revision: D13537873

Pulled By: ezyang

fbshipit-source-id: 7765ce4108dae9479d8900c0815cc2f174596a83
2018-12-21 11:37:43 -08:00
ac506f5820 Back out "[nomnigraph][executor] computeChains with nomnigraph" (#15451)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15451

Original commit changeset: ccd050bfead6

Reviewed By: ilia-cher

Differential Revision: D13533161

fbshipit-source-id: 1d0dcd54c2e3875aab015f3e996693e67a449b87
2018-12-21 11:09:27 -08:00
acbd9c49b0 Direct FBGEMM integraton into ATen (#13777)
Summary:
This PR implements infrastructure for post-processing a model to apply int8 quantization to its `nn.Linear` modules. Highlights of the implementation:

1) Inputs and outputs are `float` (quantized and packed internally), but the weight is quantized and packed ahead of time for efficiency. This implementation performs well in small-batch size GEMM calls. It should not be considered a general-purpose quantized GEMM kernel.
2) Weight packing is dependent on machine architecture (e.g. vector register width), so it is done just-in-time. Concretely, it is done on model load for the weights and it is done during operator execution for the input value.
3) Biases are unquantized
4) We fail loudly if we are attempting to run this on a machine that does not support FBGEMM. This is because we do not want a model's numerics to differ based on which machine it is run on. A model containing these FBGEMM ops *must* be run with FBGEMM

The API can be seen in the added test case. Highlights are:
1) `torch.jit.quantized.quantize_linear_modules` walks the module hierarchy of the passed-in Module and replaces all `nn.Linear` modules with a new `QuantizedLinear` module, which encapsulates the behavior described above.
2) `_pack()` and `_unpack()` script methods are present on `QuantizedLinear` modules. These methods should be called before serialization and after deserialization, respectively. This ensures that the weight matrix is properly packed for the running machine's architecture. Note that in the long term, we would like to move toward a more Pickle-style serialization technique, rather than having these explicit methods that mutate member values. This is blocked on being able to assign attributes in a ScriptMethod, among other things.
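A hedged usage sketch based on the API described above (assumes an FBGEMM-capable machine, per point 4):
```
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 4))
qmodel = torch.jit.quantized.quantize_linear_modules(model)  # nn.Linear -> QuantizedLinear
out = qmodel(torch.randn(2, 16))  # float in, float out; int8 GEMM internally
```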
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13777

Differential Revision: D13383276

Pulled By: jamesr66a

fbshipit-source-id: 00f29c9f34544add2b90107e3cf55a287802c344
2018-12-21 10:35:51 -08:00
614121c1ef Replace getargspec with getfullargspec (#15396)
Summary:
Replace `getargspec` with `getfullargspec` to resolve test warnings. Fixes #15344 .
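For reference, the replacement handles keyword-only arguments that `getargspec` cannot:
```
import inspect

def f(a, b=1, *args, c=None, **kwargs):
    pass

spec = inspect.getfullargspec(f)
print(spec.args, spec.kwonlyargs)  # ['a', 'b'] ['c']
```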
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15396

Differential Revision: D13529548

Pulled By: zou3519

fbshipit-source-id: 50d3be92423a9ce89bc4895b67569663e1abbaa6
2018-12-21 09:40:33 -08:00
2b23ba8ef0 The benchmark binary support multiple batches in one run (#15443)
Summary:
It is sometimes beneficial to run multiple batches in one benchmark and check the aggregated results.

This PR enables this functionality.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15443

Reviewed By: llyfacebook

Differential Revision: D13531129

Pulled By: sf-wind

fbshipit-source-id: 553a762a5cbadf5a3d9fd6af767ae34899bc1aa2
2018-12-21 08:45:41 -08:00
433db13b48 Move torch.logspace to ATen and parallelize on CPU.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15438

Reviewed By: ezyang

Differential Revision: D13529626

Pulled By: gchanan

fbshipit-source-id: 896e8afee3d6b5a706c4f5815b91ba6bd8af6672
2018-12-21 08:24:33 -08:00
61cc701dd7 Fix cudnn dropout (#15473)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15473

Revert accidental changes introduced in D13335176

IntList is a range, and copying it just copies pointers. Thus the pointers would point either to deallocated memory or to the same memory, causing the equality check to always pass.

Reviewed By: ezyang

Differential Revision: D13537131

fbshipit-source-id: c97b3533be689bb4cdadd9e612f1284ac50e4bda
2018-12-21 08:15:44 -08:00
f52f68bcf9 format specialized_segment_ops_test.py to prepare D13515970 (#15408)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15408

Applied formatting to specialized_segment_ops_test.py to prepare D13515970

Reviewed By: salexspb

Differential Revision: D13520300

fbshipit-source-id: c3250b6abe8087c607f65ae60d1da61bd46c342b
2018-12-20 23:44:47 -08:00
cb79e1b3a5 Clean up onnxifi transformation code (#15453)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15453

Just move things around to facilitate further development. No logic change.

Reviewed By: rdzhabarov

Differential Revision: D13533959

fbshipit-source-id: eebab1306939e802aacffb24a711d372fd67916c
2018-12-20 22:06:47 -08:00
26b04523b1 Record Caffe2's current stream ID in c10_cuda. (#15174)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15174

Previously, Caffe2 maintained a separate per-thread per-device
current logical CUDA stream ID.  In this PR, we switch Caffe2 over
to using c10::Stream to manage the current stream, and also
manage the allocation of cudaStream_t objects.

This results in a slight behavior change: previously, Caffe2
would have been willing to allocate an arbitrary number of
CUDA streams, depending on how high the logical stream IDs
went.  The c10::Stream pool has a fixed number of streams, once
you exceed it, it wraps around.

Reviewed By: dzhulgakov

Differential Revision: D13451550

fbshipit-source-id: da6cf33ee026932a2d873835f6e090f7b8a7d8dc
2018-12-20 21:54:05 -08:00
3353064060 Add option to automatically handle unsorted variable-length sequences in RNNs (#15225)
Summary:
Fixes #3584.

Motivation: manually sorting sequences, packing them, and then unsorting them
is something a lot of users have complained about doing, especially when we can
offer library support for them.

Overview: we internally sort sequences before packing them and store a list of
`unsorted_indices` that represent how to unsort the sequences inside
PackedSequence. The packing helper functions return PackedSequence with the
`permutation` field and the unpacking helper functions use it to unsort.

To implement this, the following changes were made:
- PackedSequence now keeps `sorted_indices` and `unsorted_indices`.
  These two can be thought of as permutations and are inverses of each other.
  `sorted_indices` is how the sequences were sorted; `unsorted_indices` is how
  to unsort the sequences.
- Added an `enforce_sorted` argument to pack_sequence and pack_padded_sequence
  that maintains the legacy behavior of error-ing out on unsorted-sequences.
  When `enforce_sorted=True`, these functions maintain their ONNX exportability.
- pack_sequence(sequences, enforce_sorted) takes in unsorted sequences.
- pack_padded_sequence can take in a padded tensor that represents padded,
  unsorted sequences.
- pad_packed_sequence unsorts the PackedSequence such that it is still the
  inverse operation of packed_padded_sequence.
- RNNs apply `sort_indices` to their input hidden state and apply
  `unsort_indices` to their output hidden state. This is to ensure that the
  hidden state batches correspond to the user's ordering of input sequences.

NOT BC-Breaking
- The default for pack_sequence and pack_padded_sequence is
  `enforce_sorted=True` to avoid breaking ONNX export. To use the new
  functionality, pass in `enforce_sorted=False` (see the sketch below).
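A minimal sketch of the new API (shapes illustrative):
```
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

padded = torch.randn(3, 5, 8)      # (batch, max_len, features)
lengths = torch.tensor([2, 5, 3])  # not sorted by decreasing length
packed = pack_padded_sequence(padded, lengths, batch_first=True,
                              enforce_sorted=False)
unpacked, out_lengths = pad_packed_sequence(packed, batch_first=True)
assert torch.equal(out_lengths, lengths)  # unsorting restores the original order
```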

Testing Plan
- Modified TestNN.test_pack_sequence, TestNN.test_packed_padded_sequence,
  and TestNN.test_variable_sequence (RNN test) to check the behavior
  of unsorted sequences, sorted sequences, and sorted sequences with
  enforce_sorted=True
- test/test_jit.py has a test to see if RNNs are exportable with
  enforce_sorted=True

cc colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15225

Reviewed By: soumith

Differential Revision: D13507138

Pulled By: zou3519

fbshipit-source-id: b871dccd6abefffca81bc4e3efef1873faa242ef
2018-12-20 17:37:18 -08:00
52699f0754 Change default value of unique to 'sorted=True'
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15379
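A quick illustration of the new default (sketch only):
```
import torch

print(torch.unique(torch.tensor([3, 1, 2, 1, 3])))
# tensor([1, 2, 3]) -- results now come back sorted by default
```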

Differential Revision: D13531287

Pulled By: ezyang

fbshipit-source-id: 1512da7d660dc413688d99264e6434897c3ac78c
2018-12-20 17:09:08 -08:00
4ee1c2c632 add denormal options (ftz and daz)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15423

Reviewed By: yinghai

Differential Revision: D13526340

fbshipit-source-id: de2ecc717b4f778f33a8bf940ed144dbb230c7a8
2018-12-20 17:04:39 -08:00
3a6d473b49 collect_env fix (#15447)
Summary:
fixes #15214
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15447

Differential Revision: D13531523

Pulled By: ezyang

fbshipit-source-id: 8f24f5ae9f3e78f6c5c9ee702ba14faca7aa297a
2018-12-20 16:56:34 -08:00
a178f0a316 Remove unused field in jit script module deserializer (#15439)
Summary:
A little bit of cleanup.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15439

Reviewed By: zrphercule

Differential Revision: D13532015

Pulled By: houseroad

fbshipit-source-id: 2fb1e01fc28549c7e78af6c65ee68339950bc7da
2018-12-20 16:18:40 -08:00
8883ac4b58 Revert D13494873: [pytorch][PR] Fixing ONNX export of logical ops to have correct output datatype
Differential Revision:
D13494873

Original commit changeset: 069d2f956a5a

fbshipit-source-id: 80ef10b2eb623a63da51dc2e4874f2ee446f426d
2018-12-20 15:56:31 -08:00
95a0e2c421 Fix ASAN div by zero error in rotated GenerateProposals op (#15415)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15415

Was introduced in D13429770

Reviewed By: SuperIRabbit

Differential Revision: D13524114

fbshipit-source-id: a890eb3b97c24952c361155d1432a801499f4ddd
2018-12-20 15:44:15 -08:00
ed5b584f65 Tensor construction codemod(ResizeLike) - 7/7 (#15087)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15087

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: ezyang

Differential Revision: D13419765

fbshipit-source-id: 34d695309a66723281429610a12544598c507d74
2018-12-20 15:33:07 -08:00
d6cbcb43c5 allow numpy-like boolean-list indexing in pytorch (#14932)
Summary:
Suggested fix for issue #6773; the fix allows numpy-like boolean-list indexing in PyTorch.
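A small sketch of the added indexing form:
```
import torch

t = torch.arange(4)
print(t[[True, False, True, False]])  # tensor([0, 2]), as in NumPy
```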
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14932

Differential Revision: D13398795

Pulled By: ezyang

fbshipit-source-id: 67f8daf9829db2550ff76d2bde673be6dd2708cd
2018-12-20 15:33:06 -08:00
f56217af3b Doc improvement on DDP (#15440)
Summary:
I noticed that some users don't even know we have this support. Adding it to the docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15440

Differential Revision: D13531045

Pulled By: teng-li

fbshipit-source-id: 9757c400c0010608758c754df04e603b36035a10
2018-12-20 14:51:57 -08:00
cde26c659e Fix type annotation error. (#15448)
Summary:
According to mypy, the trailing -> None is mandatory.
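An illustrative (hypothetical) example of the annotation mypy expects:
```
class Counter:
    def __init__(self) -> None:  # the trailing '-> None' is mandatory for mypy
        self.count = 0
```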

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15448

Differential Revision: D13532179

Pulled By: ezyang

fbshipit-source-id: e8972f8c9ada4657c518cd7bcd46e489ab8ddf5f
2018-12-20 14:47:57 -08:00
c24a124fa0 Add launch bounds needed for ROCm 2.0 (#15400)
Summary:
ROCm 2.0's compiler requires launch_bounds annotations if flat work group sizes are larger than the default of 256.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15400

Differential Revision: D13531239

Pulled By: ezyang

fbshipit-source-id: c0b40600a8c332823da6c7113c644d8dba424a9c
2018-12-20 14:39:13 -08:00
1a2ec10bd4 Support enough of closures to write autograd functions (#15411)
Summary:
This PR adds enough of the infra for supporting closures (inner script functions) to allow us to express symbolic gradients using them. We do not actually ever run graphs that contain these closures. The symbolic_script infrastructure just extracts them out of the original forward graph and turns them into discrete forward/backward pairs. This cuts down on the type annotations necessary to write forward/backward pairs and aligns closely with the "differentiator" function approach to expressing reverse-mode AD.

Example:

This code:
```
import torch

r = torch.jit.CompilationUnit(
'''
def mul_forward(self, other):
    def backward(grad_output):
        grad_self = (grad_output * other).sum_to_size(self.size())
        grad_other = (grad_output * self).sum_to_size(other.size())
        return grad_self, grad_other
    return self * other, backward
''')

print(r.module.code)
```

Will produce this graph (pretty printed for clarity):

```
def mul_forward(self,
    self: Tensor,
    other: Tensor) -> Tuple[Tensor, Tuple[None, Tuple[Tensor, Tensor]]]:
  backward = (self.__lambda, (other, self))
  return (torch.mul(self, other), backward)

def __lambda(self,
    context: Tuple[Tensor, Tensor],
    grad_output: Tensor) -> Tuple[Tensor, Tensor]:
  other, self, = context
  grad_self = torch.sum_to_size(torch.mul(grad_output, other), torch.size(self))
  grad_other = torch.sum_to_size(torch.mul(grad_output, self), torch.size(other))
  return (grad_self, grad_other)
```

symbolic_script will then do some modifications to remove the unsupported prim::Function node, yielding:

```
def mul_forward(self,
    self: Tensor,
    other: Tensor) -> Tuple[Tensor, Tuple[None, Tuple[Tensor, Tensor]]]:
  return (torch.mul(self, other), (other, self))

def backward(self,
    context: Tuple[Tensor, Tensor],
    grad_output: Tensor) -> Tuple[Tensor, Tensor]:
  other, self, = context
  grad_self = torch.sum_to_size(torch.mul(grad_output, other), torch.size(self))
  grad_other = torch.sum_to_size(torch.mul(grad_output, self), torch.size(other))
  return (grad_self, grad_other)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15411

Differential Revision: D13523340

Pulled By: zdevito

fbshipit-source-id: 4d4a269460e595b16802c00ec55ae00e3e682d49
2018-12-20 14:39:11 -08:00
3fdf567752 Adding CUDA version for C2 operators generate proposals and nms (#13694)
Summary:
Related to issue #13684
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13694

Reviewed By: wat3rBro

Differential Revision: D13017791

Pulled By: newstzpz

fbshipit-source-id: 4bdc58e474d8e1f6cd73a02bf51f91542a2b9d0b
2018-12-20 14:39:09 -08:00
a47749cb28 Add at::one_hot (#15208)
Summary: Closes: https://github.com/pytorch/pytorch/issues/15060

Differential Revision: D13528014

Pulled By: ezyang

fbshipit-source-id: 5a18689a4c5638d92f9390c91517f741e5396293
2018-12-20 14:24:58 -08:00
2a64a78e7b Extract arguments to its own file and pass arguments to ios apps (#15413)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15413

In order to pass arguments to the iOS app, we need to extract the arguments
into their own file. Also, in the iOS app, do not use benchmark.json, which
parses the arguments.

This is an incompatible change and needs a hotfix for the tests.

Reviewed By: llyfacebook

Differential Revision: D13523240

fbshipit-source-id: b559cc7f52d8f50ee206a7ff8d7b59292d855197
2018-12-20 13:31:48 -08:00
f0f9277c3c Fixing ONNX export of logical ops to have correct output datatype (#15185)
Summary:
Currently the PyTorch ONNX exporter exports the logical ops (`lt`, `gt`, `le`, `ge`, `eq`) with the output type of the corresponding ONNX ops set to `tensor(uint8)`. But the ONNX spec allows only `tensor(bool)`, which is why models that have these ops fail to load properly.

This issue is captured in https://github.com/pytorch/pytorch/issues/11339. Part of this issue, relating to the allowed input types, has been fixed in ONNX spec by houseroad. This PR fixes the other part pertaining to output type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15185

Differential Revision: D13494873

Pulled By: houseroad

fbshipit-source-id: 069d2f956a5ae9bf0ac2540a32594a31b01adef8
2018-12-20 12:37:27 -08:00
cb0b096f2b Miscellaneous small doc fixes (#15373)
Summary:
This PR makes some small changes for better consistency in our README and
CONTRIBUTING docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15373

Differential Revision: D13512753

Pulled By: driazati

fbshipit-source-id: 44398ad1894eef521d5f5acb1d06acaad67728cf
2018-12-20 12:33:40 -08:00
cac02034f6 Extend README for ATen/native/cpu (#15437)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15437

Differential Revision: D13529436

Pulled By: ezyang

fbshipit-source-id: 2e2193d54ea7f7626fe7392e4d0c130c2f87a76f
2018-12-20 11:17:00 -08:00
06a7cb5901 Implementing cuda kernel for tril_indices and triu_indices (#15203)
Summary:
Followup PR of #14904, and the stretch goal of #12653.

Directly calculate coordinates in the original tensor using column index in the result tensor. Every GPU thread takes care of a column (two numbers) in the output tensor.

The implementation detects and handles precision loss during calculating the square root of a `int64_t` variable, and supports tensors with up to `row * column = 2 ^ 59` numbers.

Algorithm details are described in [comments of TensorFactories.cu](23ddb6f58a/aten/src/ATen/native/cuda/TensorFactories.cu (L109-L255)).
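A minimal usage sketch (assumes a CUDA device is available):
```
import torch

idx = torch.tril_indices(4, 3, device='cuda')  # computed by the new kernel
print(idx.shape)  # torch.Size([2, 9]) -- one column per lower-triangle element
```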

zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15203

Reviewed By: zou3519

Differential Revision: D13517695

Pulled By: mrshenli

fbshipit-source-id: 86b305d22cac08c8962a3b0cf8e9e620b7ec33ea
2018-12-20 10:23:38 -08:00
5c66662e58 Revert D13498974: [pytorch][PR] [jit] Add self to Python printer reserved words
Differential Revision:
D13498974

Original commit changeset: 488efb661476

fbshipit-source-id: 3b991bccf4cf2ffdafe70f145aff0ae2837e31f8
2018-12-20 10:02:37 -08:00
8db44eda01 Add support for batched pdist (#12302)
Summary:
This updates pdist to work for batched inputs, and updates the
documentation to reflect issues raised.

closes #9406
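For reference, the non-batched form (the PR additionally allows a leading batch dimension, per the summary):
```
import torch

x = torch.randn(10, 3)
d = torch.pdist(x)  # condensed distance vector of length 10*9/2 == 45
print(d.shape)      # torch.Size([45])
```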
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12302

Reviewed By: ezyang

Differential Revision: D13528485

Pulled By: erikbrinkman

fbshipit-source-id: 63d93a6e1cc95b483fb58e9ff021758b341cd4de
2018-12-20 09:41:08 -08:00
7a764fe270 multi-dim standard deviation for CUDA. (#14990)
Summary:
This is the CUDA version of #14535 .
It refactors Reduce.cuh to allow more general classes of reductions to be performed -- we no longer assume that the temporary data returned during reduction is just one scalar, and instead allow an arbitrary accumulate type.
We also allow 64-bit indexing when necessary, since in general we will no longer be able to accumulate directly in the output. (In the cases when we can, we continue to split the tensors until they can be addressed with 32-bits, as before).
As an initial use-case, we implement `std` in multiple dimensions.
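A short sketch of the new use case (assumes a CUDA device):
```
import torch

x = torch.randn(4, 5, 6, device='cuda')
print(x.std(dim=(1, 2)).shape)  # torch.Size([4]) -- std over multiple dims at once
```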
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14990

Differential Revision: D13405097

Pulled By: umanwizard

fbshipit-source-id: a56c24dc2fd5326d417632089bd3f5c4f9f0d2cb
2018-12-20 08:56:32 -08:00
5e624948b6 Add self to Python printer reserved words (#15318)
Summary:
This adds `self` to the list of reserved words and also sorts the lines and prevents the tracer from naming values 'self' (which happens in torch/tensor.py)

Fixes #15240
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15318

Differential Revision: D13498974

Pulled By: driazati

fbshipit-source-id: 488efb661476cdcdb8ecb9cb48942f02e3c1e611
2018-12-20 02:29:09 -08:00
eb5d28ecef Pretty printing of C++ modules (#15326)
Summary:
A long outstanding nicety: pretty printing of C++ modules. E.g.
```
  Sequential sequential(
      Linear(10, 3),
      Conv2d(1, 2, 3),
      Dropout(0.5),
      BatchNorm(5),
      Embedding(4, 10),
      LSTM(4, 5));
std::cout << sequential;
```
prints
```
torch::nn::Sequential(
  (0): torch::nn::Linear(in=10, out=3, with_bias=true)
  (1): torch::nn::Conv2d(input_channels=1, output_channels=2, kernel_size=[3, 3], stride=[1, 1])
  (2): torch::nn::Dropout(rate=0.5)
  (3): torch::nn::BatchNorm(features=5, eps=1e-05, momentum=0.1, affine=true, stateful=true)
  (4): torch::nn::Embedding(count=4, dimension=10)
  (5): torch::nn::LSTM(input_size=4, hidden_size=5, layers=1, dropout=0)
)
```

apaszke ebetica ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15326

Differential Revision: D13518986

Pulled By: goldsborough

fbshipit-source-id: 63bf753672f0e348951de3645208f263581de5fb
2018-12-19 21:55:49 -08:00
2ef0f1222a Restructuring prof dag counters (#13321)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13321

This diff simply refactors the `ProfDAGCounters` into two:
* `ProfDAGCounters` that gathers stats at runtime.
* `ProfDAGReport` which holds the report from the gathered stats once stats collection is done.

This refactoring allows us to implement `+=` for `ProfDAGReport`, which can be used for aggregating same-net reports on each host.

Reviewed By: donglimm

Differential Revision: D12837988

fbshipit-source-id: 0470c5fd6437f12711cab25a15a12965d79b2a91
2018-12-19 21:48:30 -08:00
b89b46abfb Remove python_default_init from ATen and use Optional (#15234)
Summary:
Optional cleanup. This PR removes python_default_init from the yaml files and the code-gen, and uses optional types to do the work.

This also fixes the bug in #13149 to correctly adopt as_strided backward.

Fixes #9941
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15234

Differential Revision: D13502044

Pulled By: wanchaol

fbshipit-source-id: 774b61fc4414482cf11d56e22bd0275aefb352a4
2018-12-19 21:38:50 -08:00
3fc889e976 Tensor construction codemod(ResizeLike) - 1/7 (#15073)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15073

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: dzhulgakov

Differential Revision: D13419563

fbshipit-source-id: 8c284405fa3a867303216df876ee6b20d8a46551
2018-12-19 21:38:48 -08:00
2db742fc95 Do not use fork to invoke test scripts in pytorch rocm CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14600

Differential Revision: D13523937

Pulled By: bddppq

fbshipit-source-id: 1493fdd051283650081d7944bb2bd7f0c4c44990
2018-12-19 21:35:16 -08:00
1071e92335 Replace Vec256<T>::size with constexpr method (#15406)
Summary:

See Note [constexpr static function to avoid odr-usage compiler bug]
for detailed justification.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15406

Differential Revision: D13523774

Pulled By: ezyang

fbshipit-source-id: c0ab44298bb2ef3d68a66d026fc6bc156a909a6b
2018-12-19 20:33:45 -08:00
9abd755a76 Make cpuinfo logging less verbose (#15405)
Summary:
Log only errors in cpuinfo.

Fixes #15401 and #15398.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15405

Differential Revision: D13526251

Pulled By: Maratyszcza

fbshipit-source-id: 4d9eba0912f7b45093bed2e343cd77a151ffa8c4
2018-12-19 20:23:36 -08:00
88bf683cbc Support error handling in forked threads (#14523)
Summary:
Save error info in the future for the parent thread to pick up. Throw the error
when the thread is the root thread.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14523

Differential Revision: D13251756

Pulled By: highker

fbshipit-source-id: b40f9a45665e1a934743f131ec5e8bad5622ce67
2018-12-19 18:54:46 -08:00
5dd5ef3214 default options for OutputTensorCopyFrom (#15248)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15248

OutputTensorCopyFrom takes four arguments: an index, a source Tensor, TensorOptions, and whether we want to perform an async call.
We want to provide defaults for the TensorOptions: (1) default the device to context_.device(); (2) default the dtype to input.dtype(). Users can also explicitly provide these options to override the default values.

The next diff will change the order of the TensorOptions parameter so that users don't need to write down tensor options unless they want to override them.

Reviewed By: dzhulgakov

Differential Revision: D13453824

fbshipit-source-id: 87401f81c7c3f9fd3d8936c710e6c2e04a59b689
2018-12-19 18:14:47 -08:00
a00cfd1e9b Fix Module::copy_into
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15393

Differential Revision: D13519477

Pulled By: highker

fbshipit-source-id: d62928597ec0700b550e7cf481c8febae57b200d
2018-12-19 17:09:59 -08:00
0b219538cf add unpack_outputs to inlineCallTo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15382

Differential Revision: D13518844

Pulled By: zdevito

fbshipit-source-id: 981936988080af80629b70bf5f6dfa52ceb09c2f
2018-12-19 15:11:59 -08:00
07d20b1e7c Fix documentation (#15372)
Summary:
Current documentation example doesn't compile. This fixes the doc so the example works.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15372

Differential Revision: D13522167

Pulled By: goldsborough

fbshipit-source-id: 5171a5f8e165eafabd9d1a28d23020bf2655f38b
2018-12-19 15:04:24 -08:00
055de167d5 computeChains with nomnigraph (#15366)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15366

Swap the old implementation for one that is slightly easier to understand.

I ran the tests and compared the number of chains against the old algorithm. This one outperforms on every test, but we have yet to see whether that impacts performance at all.

old chain 34 nomnigraph chain 25
old chain 46 nomnigraph chain 34
old chain 228 nomnigraph chain 188
old chain 397 nomnigraph chain 338

Reviewed By: ilia-cher

Differential Revision: D13057451

fbshipit-source-id: ccd050bfead6eb94ab9c7b0a70b09a22c2b9e499
2018-12-19 15:04:23 -08:00
9217bde807 Refactor dataloader.py (#15331)
Summary:
Same as #14668, and was approved there.

ailzhang , please apply this patch to Horizon's `data_streamer.py`: https://gist.github.com/SsnL/020fdb3d6b7016d81b6ba1d04cc41459 Thank you!

Below is the original description at #14668:

As I am working on tasks in https://github.com/pytorch/pytorch/issues/13023, I realized how unreadable the code is because all functions to be run in multiprocessing must be at the top (module-global) level. Adding more functionalities to `dataloader.py` will only make things worse.

So in this PR, I refactor `dataloader.py` and move much of it into `data._utils`. E.g., the `_worker_loop` and related methods are now in `data._utils.worker`, signal handling code in `data._utils.signal_handling`, collating code in `data._utils.collate`, etc. This split, IMHO, makes code much clearer. I will base my future changes to DataLoader on top of this.

No functionality is changed, except that  I added `torch._six.queue`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15331

Reviewed By: yf225

Differential Revision: D13503120

Pulled By: ailzhang

fbshipit-source-id: 94df16b4d80ad1102c437cde0d5a2e62cffe1f8e
2018-12-19 12:36:03 -08:00
41e7e1bc40 Rename potrs to cholesky_solve (#15334)
Summary:
Changelog:
- Renames `potrs` to `cholesky_solve` to remain consistent with TensorFlow and SciPy (not really, they call their function chol_solve)
- The default argument for upper in cholesky_solve is False. This allows a seamless interface between `cholesky` and `cholesky_solve`, since the `upper` argument in both functions is the same (see the sketch after this list).
- Rename all tests
- Create a tentative alias for `cholesky_solve` under the name `potrs`, and add a deprecation warning to discourage usage.
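A usage sketch of the renamed function (values illustrative):
```
import torch

A = torch.randn(3, 3)
A = A @ A.t() + 3 * torch.eye(3)  # make symmetric positive definite
u = torch.cholesky(A)             # lower factor, since upper=False is the default
b = torch.randn(3, 2)
x = torch.cholesky_solve(b, u)    # replaces the old `potrs` name
```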
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15334

Differential Revision: D13507724

Pulled By: soumith

fbshipit-source-id: b826996541e49d2e2bcd061b72a38c39450c76d0
2018-12-19 12:31:24 -08:00
33018e4e09 centralize side effects ops as node method (#15188)
Summary:
A number of different passes rely on whether a node has side effects. This centralizes the list of side effectful ops in one place.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15188

Differential Revision: D13508438

Pulled By: eellison

fbshipit-source-id: 2143e782b787731ce007b6dcd50cbde30e1b8dd0
2018-12-19 10:52:54 -08:00
560530aeec Optional ScalarType support for native functions & JIT (#15154)
Summary:
For #6593 and #9515

This completes the support for optional<ScalarType> in native, JIT and autograd.

Note: Mostly following the existing implementation for optional<Scalar> that was added in https://github.com/pytorch/pytorch/pull/12582.

This PR introduces a way to make functions accept an optional dtype and it will unblock #9515 by allowing the `dtype` param for type promotion interface:
```
func: name(inputs, *, ScalarType? dtype=None, Casting casting=same_kind)
```

An alternative approach could have been using `ScalarType::Undefined` for the same purpose but without optional, though it would have been a bit hacky.
```
func: name(inputs, *, ScalarType dtype=Undefined, Casting casting=same_kind)
```

Here's an example use of this in action: 971f69eac6

There are already a bunch of native functions that were getting optional `dtype` through function overloading. https://github.com/pytorch/pytorch/pull/15133 is the attempt to migrate all of those. I will send those changes separately after this since some functions (e.g. sum) need quite a bit of change in the codebase. See the commits over there.
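A concrete sketch of an op that already takes an optional dtype this way (sum is named in the summary above):
```
import torch

t = torch.ones(3, dtype=torch.int32)
print(t.sum(dtype=torch.float64))  # explicit dtype flows through as ScalarType?
print(t.sum())                     # None -> default accumulation dtype
```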
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15154

Differential Revision: D13457760

Pulled By: tugrulates

fbshipit-source-id: 706134f0bd578683edd416b96329b49a1ba8ab48
2018-12-19 10:45:35 -08:00
54d4fe3f49 Implement 'to' on ScriptModules (#15340)
Summary:
Following #6008
Fixes "Implement 'to' on ScriptModules #7354"

cc zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15340

Differential Revision: D13506646

Pulled By: zdevito

fbshipit-source-id: 318fea2e8e51a37ce9844efa4c8db67d45a66317
2018-12-19 10:41:23 -08:00
1d94a2bee3 Update cpuinfo submodule (#15385)
Summary:
Pull cpuinfo changes that should make it work on AWS Lambda servers (which don't have `/sys/devices/system/cpu/{possible,present}` files, and probably don't mount sysfs at all).

I'm not 100% sure it will fix the issue, but getting this update in would make it easier for users to test using a nightly build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15385

Reviewed By: soumith

Differential Revision: D13517467

Pulled By: Maratyszcza

fbshipit-source-id: e8e544cd1f9dad304172ebb7b6ba7a8ad7d34e66
2018-12-19 07:31:45 -08:00
cbde820bc3 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: dfbdae40e505c46cd64751c6ec107c84f9434131
2018-12-18 23:37:34 -08:00
cd8dd49fba race condition fix of using mutable_data inside OPENMP region for batched matmul (#15371)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15371

Similar to D13387692:

Never call mutable_data from an OpenMP region!!!

Reviewed By: jspark1105

Differential Revision: D13511259

fbshipit-source-id: 100812d2a547c0a1d5018749d5fdc88162375673
2018-12-18 23:22:56 -08:00
6ca1d93473 add whitelisted clang-format checks (#15254)
Summary:
This PR adds clang-format automation:
- It only checks on whitelisted files, so we can enable incrementally without noise
- There is a pre-commit hook provided that will do the same check, plus prompt users to apply the clang-format changes (no change is made without the user agreeing).

My plan is to migrate over whole files at a time, clang-formatting them and then adding them to the whitelist. Doing it this way should avoid too many merge pains (the most you'll have to do is run clang-format on the affected file before rebasing).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15254

Differential Revision: D13515888

Pulled By: suo

fbshipit-source-id: d098eabcc97aa228c4dfce8fc096c3b5a45b591f
2018-12-18 22:34:20 -08:00
122b4ef41d build fix
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15384

Differential Revision: D13515708

Pulled By: zdevito

fbshipit-source-id: ea077cfec30edf41b85dc83c0a969d1146434145
2018-12-18 22:11:44 -08:00
0368054a6d Split up compiler.cpp (#15355)
Summary:
This separates the different parts of compiler.cpp to make their relationship more clear. In particular it adds:

* sugared_value.{h,cpp} - all the public SugaredValues that the compiler defines and a few that were inside compiler.cpp
* type_parser.{h, cpp} - Turns TreeRef's defining types into TypePtr
* schema_matching.{h, cpp} - infrastructure for matching arguments against overloaded schema and emitting builtin operators with a particular schema.
Retains:
* compiler.{h, cpp} - now responsible simply for the `defineMethodsInModule` infrastructure.

Some utility functions like inlineCallTo have moved to ir.h.

Only thing that is not a move is some changes in module.h/cpp that remove multiple returns from `Method::emit_call_to`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15355

Reviewed By: suo, wanchaol

Differential Revision: D13507524

Pulled By: zdevito

fbshipit-source-id: 69ec936a9ff1a383c12a883616346b219c72e393
2018-12-18 19:43:35 -08:00
6ab2e7442d Autograd using torchscript (#14604)
Summary:
This PR enables autodiff to use the forward/backward graph compiled from Python code, instead of using symbolic gradients (modifying the original graph directly).

We put the map in a separate .h file for now to wait for the native_functions.yaml and derivatives.yaml merge. This should ideally go into native_functions.yaml eventually.

This PR should be enough to unblock us for now, we can start writing gradients for aten functions in python.

Differential Revision: D13494635

Pulled By: ailzhang

fbshipit-source-id: f8d51a15243ac46afd09d930c573ccdfcd9fdaaf
2018-12-18 19:10:57 -08:00
4928c76415 Minor clean up for test_jit (#15368)
Summary:
* remove None args in functional tests
* remove some expect files that are not necessary
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15368

Differential Revision: D13512349

Pulled By: wanchaol

fbshipit-source-id: 304cffff966487d15c373057ae8ad114ef8aa7f9
2018-12-18 18:26:37 -08:00
f3bff2d500 Add RNNCell modules to Script standard library (#14695)
Summary:
Adds RNNCell modules to script standard lib

cc apaszke for argument_spec changes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14695

Differential Revision: D13467680

Pulled By: driazati

fbshipit-source-id: 13a14da87714325cc4c3d49e5fde8a850d5d757b
2018-12-18 17:28:28 -08:00
f3cc9b2218 Remove fully qualified weak script names (#15364)
Summary:
Cleanup to make references to `weak_script` consistent across codebase
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15364

Differential Revision: D13509676

Pulled By: driazati

fbshipit-source-id: 93dbbbe57e9b9b6587895f3cc6fac678babd21de
2018-12-18 16:48:52 -08:00
096ee8467c Redefine scheduler to set learning rate using recursive formula (#14010)
Summary:
Modified step_lr for StepLR, MultiStepLR, ExponentialLR and CosineAnnealingLR. In this way, multiple schedulers can be used simultaneously to modify the learning rates.

Related issue: https://github.com/pytorch/pytorch/issues/13022

Added unit tests combining multiple schedulers.
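A minimal sketch of composing two schedulers on one optimizer (hypothetical training loop):
```
import torch

model = torch.nn.Linear(2, 2)
opt = torch.optim.SGD(model.parameters(), lr=1.0)
step = torch.optim.lr_scheduler.StepLR(opt, step_size=2, gamma=0.1)
exp = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.9)
for epoch in range(4):
    # ... train for one epoch, then step both schedulers ...
    opt.step()
    step.step()
    exp.step()
    print(opt.param_groups[0]['lr'])  # both schedulers now compose
```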
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14010

Reviewed By: ezyang

Differential Revision: D13494941

Pulled By: chandlerzuo

fbshipit-source-id: 7561270245639ba1f2c00748f8e4a5f7dec7160c
2018-12-18 16:44:31 -08:00
5e97720100 Replace resize_dim() with set_sizes_and_strides() in (#15348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15348

We have a function resize_dim() on TensorImpl in c10/core/TensorImpl.h which lets you change the dimensionality of a tensor, resizing both sizes and strides. Unfortunately, this API is fairly easy to misuse, because it fills in the new entries with garbage when you size it larger. We want to refactor the call sites to use set_sizes_and_strides() instead, so that there is never an intermediate tensor state where the sizes/strides don't make sense. In this diff, resize_dim() is
replaced with set_sizes_and_strides() in aten/src/TH/THTensor.hpp.

Reviewed By: ezyang

Differential Revision: D13505512

fbshipit-source-id: 193bab89f0018c13ca07488be336d8e967746b76
2018-12-18 16:38:36 -08:00
5667af3880 Minor cleanup for TestFuser tests (#15134)
Summary:
Changelog:
- change some expect tests that didn't have to be expect tests,
  instead use self.assertAllFused
- Some of the fuser tests weren't using self.assertAllFused.
- Minor test renames

cc apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15134

Differential Revision: D13507481

Pulled By: zou3519

fbshipit-source-id: dd0788530a60bb5ed2f42b961fae3db2b4404b64
2018-12-18 16:33:59 -08:00
3681bf7cff add dense vector to id_list operator (#15090)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15090

as title
step 2 of the linked task

Reviewed By: ellie-wen

Differential Revision: D13425977

fbshipit-source-id: f3538ed68f42470ba39c5b779af764d4a5591a9d
2018-12-18 16:27:38 -08:00
f5da198236 fix clang-tidy script for python 3
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15360

Differential Revision: D13509668

Pulled By: suo

fbshipit-source-id: a3448a115eaac8dd4c3f179901a23bdbc5098408
2018-12-18 15:06:14 -08:00
2469f7e02e Port torch.linspace to ATen and parallelize it on CPU.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15320

Reviewed By: ezyang

Differential Revision: D13498995

Pulled By: gchanan

fbshipit-source-id: fba655d51d978fffaa53a5e4cae4a99ebfb0eddc
2018-12-18 15:01:49 -08:00
3118124cd6 Add (Un)Fold modules to standard library (#14759)
Summary:
Depends on #14597 for the corresponding aten ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14759

Differential Revision: D13325356

Pulled By: driazati

fbshipit-source-id: 99e39449c1ccfa293de05672c31a11e580bdd11f
2018-12-18 12:03:08 -08:00
f4c504593c Fix the (reduce)min and (reduce)max ONNX exporting (#15241)
Summary:
max and reducemax are smashed together; we need to support the one-input case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15241

Reviewed By: yinghai

Differential Revision: D13473312

Pulled By: houseroad

fbshipit-source-id: 9b8c847286a2631b006ca900271bc0d26574101a
2018-12-18 11:48:06 -08:00
056cfaf3ff Method returns a single argument (#15289)
Summary:
This PR changes Method (just Method, not all graphs) to always have a single
return argument.

This is part 1 in a set of changes that will enable us to have better handling of early return statements.
The simplification that this change provides greatly reduces the work for the next step.

This change makes it so that Method and Python handle multiple returns in the same way:
* 0 - None
* 1 - <single value>
* many - Tuple[...]

The result is that a lot of special-case handling in compiler.cpp and its
bindings can be removed. It also fixes several bugs in return handling,
including one where return values were not always checked against their
attributed values.

Notes:
* inferTypeFrom is renamed to be more accurate and discourage use.
* This has uncovered some bugs in other components, which are noted in
  the diff.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15289

Differential Revision: D13481649

Pulled By: zdevito

fbshipit-source-id: 0e2242a40bb28cca2d0e8be48bede96195e4858c
2018-12-18 10:44:09 -08:00
12cf5178aa caffe2 mobile opengl (#15322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15322

The caffe2 mobile OpenGL code is not used; deleting it reduces complications when we perform other changes.

Reviewed By: Maratyszcza

Differential Revision: D13499943

fbshipit-source-id: 6479f6b9f50f08b5ae28f8f0bc4a1c4fc3f3c3c2
2018-12-18 08:20:52 -08:00
54d8ce94ee Revert D13383102: [pytorch][PR] Upgrade MKL-DNN to version 0.17
Differential Revision:
D13383102

Original commit changeset: c434f0e0ddff

fbshipit-source-id: 690f46ca0710954fa591a5ea77535e9759db4de5
2018-12-18 07:39:20 -08:00
bb9b7de831 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 4bf66581d07d839f459869bc9c6428011063cc5b
2018-12-17 21:25:36 -08:00
3a98462f2c improve script/no script save error (#15321)
Summary:
Improves the error message for #15116
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15321

Differential Revision: D13499379

Pulled By: zdevito

fbshipit-source-id: b8dc0a83efabff74199f4aab2ee98aa41c42608b
2018-12-17 21:13:58 -08:00
e37a22128e Allow tracing with fork/wait (#15184)
Summary:
There is still a limitation on this: if a script module is somewhere
in the trace, the inputs/outputs can only be tensors or tuples of
tensors.

resolves #15052
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15184

Differential Revision: D13457691

Pulled By: highker

fbshipit-source-id: 8fe46afc41357a0eb8eadd83f687b31d074deb0e
2018-12-17 20:34:26 -08:00
Jie
bd958cde68 [TensorIterator fixing mean to output correct result for half precisi… (#14878)
Summary:
…on (#12115)

mean is calculated in two steps, sum()/numel(). For half precision, data gets
cast back to half after sum().
We fused the division into the reduction kernel by adding pre_op/post_op.

This allows torch.ones(65536).cuda().half().mean() to return the correct
result.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14878

Differential Revision: D13491159

Pulled By: soumith

fbshipit-source-id: e83802e1628b6d2615c45e18d7acf991d143a09e
2018-12-17 20:13:30 -08:00
71ee882157 Reenable OpenMP by reverting the following two commits. (#15315)
Summary:
Revert "Put back linker flag for OpenMP to prevent build break on ppc64le (#14569)"

This reverts commit a84e873bb156080ea76ab182171b1f3b4d5395f6.

Revert "Update OpenMP cmake setting for xcode 9 compiler(AppleClang 9.0) (#14473)"

This reverts commit 8901935ad42fe9bf093d1106ea43606008a4024d.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15315

Differential Revision: D13495852

Pulled By: ezyang

fbshipit-source-id: bcd3f60088b14831c53d3c171f10cd1ab6b35dee
2018-12-17 19:54:41 -08:00
aec9fdf0a4 Fix _apply in nn.Module (#15305)
Summary:
Fixes an issue that arose from https://github.com/pytorch/pytorch/pull/13481 where `.shared_memory()` couldn't be called. Effectively undoes all changes to `nn.Module` from that PR and solves the relevant problem in a different way (the goal was to be able to call `._apply()` on the Python wrapper for a C++ module).

soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15305

Differential Revision: D13493937

Pulled By: goldsborough

fbshipit-source-id: 4cb8687f90fc8709a536c5e7eacd0dc8edf6f750
2018-12-17 16:22:21 -08:00
2f38ffbcb3 Add a correctness check for C++ types to custom operators (#15247)
Summary:
The JIT uses `int64_t` for its integer type and `double` for its floating point type, but users quite often want to write `int` or `float` and that currently fails in not-so-nice ways for custom ops. This PR adds a simple `static_assert` to catch these common failure cases.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15247

Differential Revision: D13493941

Pulled By: goldsborough

fbshipit-source-id: c1cd0d10ab5838c75f167c0bdb57e45a0bc1344e
2018-12-17 16:17:27 -08:00
e650a84872 caffe2/python/task: added __repr__ methods to all task definitions (#15250)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15250

This adds `__repr__` methods to all of the classes under task.py. This makes the objects much easier to interact with when using them in an interactive manner, such as in a Jupyter notebook.

The default `__repr__` method just returns the object ID, which is very unhelpful.
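
A minimal sketch of the pattern (hypothetical field names, not the actual task.py code):

```python
class Task(object):
    def __init__(self, name, node):
        self.name = name
        self.node = node

    def __repr__(self):
        # Surface the interesting fields instead of the default object ID.
        return "Task(name={!r}, node={!r})".format(self.name, self.node)

print(Task("reader", "worker:0"))  # Task(name='reader', node='worker:0')
```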

Reviewed By: hanli0612

Differential Revision: D13475758

fbshipit-source-id: 6e1b166ec35163b9776c797b6a2e0d002560cd29
2018-12-17 16:02:16 -08:00
e0b261a35b Port nn fold and unfold to c++
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14597

Reviewed By: ezyang

Differential Revision: D13272227

fbshipit-source-id: 6eccab5ff5830a977398a96393b778095120edc6
2018-12-17 15:46:37 -08:00
c66adfc16b Allow future type parsing
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14887

Differential Revision: D13490984

Pulled By: highker

fbshipit-source-id: 165fe995867be273793f983154aa6cbce13e4396
2018-12-17 15:39:52 -08:00
efb37e86eb Removing BUILD_C10_EXPERIMENTAL_OPS option and unglobbing experimental/c10d ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15064

Reviewed By: orionr

Differential Revision: D13474801

Pulled By: pjh5

fbshipit-source-id: 9d3664c3a3a1b6c2d9f083f8476fe3b037296b98
2018-12-17 15:35:41 -08:00
59d71b9664 Bicubic interpolation for nn.functional.interpolate (#9849)
Summary:
Addresses #918, interpolation results should be similar to tf

* Adds bicubic interpolation operator to `nn.functional.interpolate`
* Corresponding test in `test_nn.py`

The operator is added in legacy `TH` to be aligned with the other upsampling operators; they can be refactored/moved to ATen all at once when #10482 is resolved
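
Example usage of the new mode:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)   # bicubic operates on 4-D (N, C, H, W) input
y = F.interpolate(x, scale_factor=2, mode='bicubic', align_corners=False)
print(y.shape)                # torch.Size([1, 3, 16, 16])
```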
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9849

Differential Revision: D9007525

Pulled By: driazati

fbshipit-source-id: 93ef49a34ce4e5ffd4bda94cd9a6ddc939f0a4cc
2018-12-17 15:31:48 -08:00
c5dd91c4ae add isinstance static type checking for jit (#15076)
Summary:
This PR adds isinstance to do static type checking in the JIT.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15076

Differential Revision: D13471067

Pulled By: wanchaol

fbshipit-source-id: d39b7ed5db9fcca4b503659d02cf7795950ea8ea
2018-12-17 15:21:49 -08:00
216ab259fb Fix the missing caffe2 proto files for Windows (#15157)
Summary:
Fixes #15156
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15157

Differential Revision: D13490420

Pulled By: orionr

fbshipit-source-id: 4387d707f634a5975238af915b1befb2277f8ec7
2018-12-17 15:21:47 -08:00
f4c59c5fdf Replace SwitchToDevice(0) with SwitchToDevice() (#15126)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15126

I want to make people stop manufacturing StreamId from thin air,
and a first step is to make people use the default stream.

Reviewed By: dzhulgakov

Differential Revision: D13432922

fbshipit-source-id: 9f0d8d70646c50d979bde5ba3c3addeebac48a3d
2018-12-17 15:15:00 -08:00
df4c9471ec Don't enforce docstrings on bool dispatch (#15306)
Summary:
Allows 2 functions that are boolean dispatched to have no docstrings (the only case that will fail now is if both functions have docstrings)

Fixes #15281
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15306

Differential Revision: D13494884

Pulled By: driazati

fbshipit-source-id: 65fec39ae03a7d6a68ad617c9b270faeb1617930
2018-12-17 14:41:05 -08:00
95d3fed68f Fix for issue 14829 (#14908)
Summary:
* Modify the testcase as outlined in the issue
   * Issue url: https://github.com/pytorch/pytorch/issues/14829
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14908

Differential Revision: D13490360

Pulled By: ezyang

fbshipit-source-id: ff11a72e19b49223652182e82c2b4e65fe444ca7
2018-12-17 14:28:50 -08:00
e07fc114a0 Minor fixes in .jenkins/caffe2/bench.sh
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15304

Differential Revision: D13493876

Pulled By: bddppq

fbshipit-source-id: 7146eb2587e526af65b4b0290c25bd55653a3088
2018-12-17 13:53:55 -08:00
700271d0e9 Adding ONNX export for torch.expand and torch.ne (#15050)
Summary:
`torch.expand` and `torch.ne` are used often in models and this PR adds ONNX export support for them. ArmenAg has created issue https://github.com/pytorch/pytorch/issues/10882 for this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15050

Differential Revision: D13453036

Pulled By: houseroad

fbshipit-source-id: 4724b4ffcebda6cd6b2acac51d6733cb27318daf
2018-12-17 13:48:14 -08:00
3df79f403e Tighten up invariants regarding StreamId. (#15125)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15125

I realized that it is really bad juju if you fake a StreamId
out of thin air, because in general this isn't going to work.
So, make the constructor a lot scarier.

Most "faking StreamId out of thin air" happens because someone
just wants to put something on the default stream.

Reviewed By: dzhulgakov

Differential Revision: D13432800

fbshipit-source-id: a86991d6fc1d8aa4e54e8175e5f06f90856238e6
2018-12-17 13:30:54 -08:00
1dbc7cff3e Fix tensor printing bug in Python 2 (#12732)
Summary:
`rsplit` doesn't accept keyword arguments in Python 2, so this line raises an error
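
For reference, the incompatibility looks like this:

```python
s = "a\nb\nc"
s.rsplit("\n", 1)                # positional: fine on Python 2 and 3
s.rsplit(sep="\n", maxsplit=1)   # TypeError on Python 2: rsplit takes no keyword arguments
```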

Fixes #15135
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12732

Differential Revision: D10458630

Pulled By: driazati

fbshipit-source-id: a63e42fbc0e39e4291480775b516c98122ec05a1
2018-12-17 13:17:51 -08:00
d71fac20eb Refactor hotpatch_vars and apply it to libtorch (#14976)
Summary:
Fixes #14801.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14976

Differential Revision: D13485381

Pulled By: soumith

fbshipit-source-id: 0af3c2e1b90988d56f6f85632328d1e4b788ffd2
2018-12-16 21:53:31 -08:00
656b565a0f Trivial comment correction in dataloader (#15276)
Summary:
Trivial comment correction in dataloader
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15276

Differential Revision: D13477324

Pulled By: soumith

fbshipit-source-id: 2a74a014999655d129311d611f2a09411339cb13
2018-12-15 10:59:00 -08:00
c51c825efe Delete ffi documentation (#15220)
Summary: Deleting FFI documentation since it's deprecated.

Differential Revision: D13477329

Pulled By: soumith

fbshipit-source-id: 0b3d485eb7cef1f05b6b397dff50f21a49d6409e
2018-12-15 09:49:02 -08:00
60badccd10 Fix a typo in the assert
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15265

Reviewed By: llyfacebook

Differential Revision: D13477029

Pulled By: sf-wind

fbshipit-source-id: 9c5571a583c01f9701625541ebec0c836cb923f2
2018-12-15 09:09:09 -08:00
4bcb425490 fix cholesky call in potrs example (#15215)
Summary:
Cholesky by default returns the lower triangular matrix, see [docs](https://pytorch.org/docs/stable/torch.html#torch.cholesky).

However, `torch.potrs` by default requires the upper triangular matrix. The naming of the variable `u` suggests that the example expects the upper factor to be returned, so I've added the flag to make that happen in the example.
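
The corrected call sequence looks like this (a sketch using a random positive-definite matrix):

```python
import torch

A = torch.randn(3, 3)
A = A @ A.t() + 3 * torch.eye(3)    # make A positive definite
b = torch.randn(3, 2)

u = torch.cholesky(A, upper=True)   # upper factor, matching potrs' default
x = torch.potrs(b, u)               # solves A x = b
```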
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15215

Differential Revision: D13476468

Pulled By: soumith

fbshipit-source-id: 7b68035f435a2b1be4d363b3f63e407394af949d
2018-12-15 04:43:34 -08:00
2b57bd4107 value-based mark and sweep DCE (#14910)
Summary:
This makes DCE more granular by tracking live values/aliases through the graph (rather than just nodes). So we can be more aggressive in DCE around control flow blocks. For example, in:
```
%a0 = aten::foo()
%b = aten::foo()
%a2, %b2 = prim::If(%cond) {
  block0() {
    %a1 = aten::foo(%a0)
    %b1 = aten::foo(%b)
  } -> (%a1, %b1)
}
return (%a2)
```
we will now dce all the `%b` stuff.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14910

Differential Revision: D13476445

Pulled By: suo

fbshipit-source-id: 2bf5db19711c07dde946697a4f4b270bd8baf791
2018-12-15 01:16:44 -08:00
df614371c7 Mention Jacobian-vector product in the doc of torch.autograd (#15197)
Summary:
A friend of me is learning deep learning and pytorch, and he is confused by the following piece of code from the tutorial https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html#gradients :

```python
x = torch.randn(3, requires_grad=True)

y = x * 2
while y.data.norm() < 1000:
    y = y * 2

print(y)

gradients = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(gradients)

print(x.grad)
```

He doesn't know where the following line comes from:
```python
gradients = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
```

What are we computing? Why don't we compute "the gradient of `y` w.r.t `x`"?

In the tutorial, it only says
> You can do many crazy things with autograd!

This does not explain anything. It can be hard for beginners of deep learning to understand why we would ever call backward with an external gradient fed in and what doing so means. So I modified the tutorial in https://github.com/pytorch/tutorials/pull/385
and the docstring correspondingly in this PR, explaining the Jacobian-vector product. Please review this PR and https://github.com/pytorch/tutorials/pull/385 together.
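
A small sketch of the point being documented: `backward(v)` computes the vector-Jacobian product v^T J, not the full Jacobian:

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2                              # Jacobian of y w.r.t. x is 2 * I
v = torch.tensor([0.1, 1.0, 0.0001])
y.backward(v)                          # accumulates v^T J into x.grad
print(x.grad)                          # equals 2 * v
```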
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15197

Differential Revision: D13476513

Pulled By: soumith

fbshipit-source-id: bee62282e9ab72403247384e4063bcdf59d40c3c
2018-12-15 00:10:30 -08:00
5b542a755f Tensor method rename dims()->sizes() (#15246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15246

Codemod generated with clangr shard mode, 25 files per diff,

Reviewed By: igorsugak

Differential Revision: D13470369

fbshipit-source-id: ce995beab7c64bebe8b234fb5e6d015940ec2952
2018-12-14 21:11:02 -08:00
f118568662 Create parser.cpp (#15238)
Summary:
Moves the implementation into a .cpp file. The parser was getting included in several compilation units.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15238

Differential Revision: D13474635

Pulled By: zdevito

fbshipit-source-id: 7dc824eea8f506d6c8ae1aa67aeec0c34d5285fc
2018-12-14 19:31:36 -08:00
e1808be37d Add several features to converting images to blobs (#15204)
Summary:
Several enhancements are implemented:

* Resize the images to be within a boundary between min-size and max-size (which can be height or width); a sketch follows this list. It tries to resize the minimum side to match the min-size while keeping the aspect ratio. However, if in that case the maximum side exceeds the max-size, it instead resizes the maximum side to be equal to the max-size (and the minimum side ends up less than min-size). The min/max sizes are specified in the scale argument, in comma-separated form. If one of the sizes is -1, then that size is not a restriction.

* Change the OpenCV resize function arguments from using cv::Size() to the x, y scales. Theoretically they should be the same, but in reality the two ways of specifying them may result in different resized outputs.

* Once the image is read in, change the data to floats. That means, after resize and other preprocessing steps, the float values are preserved (not truncated to int).

* It is possible to convert data in text format to the blob format.
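
A minimal sketch of the resize rule from the first bullet (hypothetical helper, not the operator's actual code; sizes are (width, height)):

```python
def bounded_scale(w, h, min_size, max_size):
    lo, hi = min(w, h), max(w, h)
    scale = min_size / float(lo)        # match the minimum side to min_size
    if max_size > 0 and hi * scale > max_size:
        scale = max_size / float(hi)    # but never exceed max_size on the other side
    return w * scale, h * scale

print(bounded_scale(640, 480, min_size=400, max_size=500))  # (500.0, 375.0)
```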
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15204

Reviewed By: llyfacebook

Differential Revision: D13467225

Pulled By: sf-wind

fbshipit-source-id: 7da34a72d43a9603cd7ab953f5821c1222d0178f
2018-12-14 17:37:21 -08:00
717496e6c1 Supply static shape info to Reshape when doing onnxGetCompatibility (#15242)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15242

Newer versions of ONNX Reshape get shape info from a tensor. Hence, for the static backend, we need to provide this info when doing `onnxGetCompatibility` too.

Reviewed By: jackm321

Differential Revision: D13471959

fbshipit-source-id: 8a58e28edd900b6ad54a1dbd63ff2579fbe0e820
2018-12-14 16:37:39 -08:00
763b9954f3 FP16MomentumSGDUpdate Op fix and enable for ROCm (#15150)
Summary:
1. Fix a bug in FP16MomentumSGDUpdate operator
2. Enable operator for ROCm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15150

Differential Revision: D13473145

Pulled By: bddppq

fbshipit-source-id: 4c5c5f30cb9bba658e3639dbe193fa08a304d306
2018-12-14 16:33:45 -08:00
e596d23137 Start unittesting our main observer (#15191)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15191

OSS:

Just splitting out basic flags from a unit test, so I can extend them in another test where I need to add additional flags.

Reviewed By: yinghai

Differential Revision: D13159184

fbshipit-source-id: 9823e792cf0ed8d0379235c44564862b7d784845
2018-12-14 16:24:38 -08:00
34f1f2208b Build c10 HIP test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15233

Reviewed By: ezyang

Differential Revision: D13471002

Pulled By: bddppq

fbshipit-source-id: b42c3bc2b9db672ce50a52eb700cc6ed13d3535f
2018-12-14 15:36:38 -08:00
5e09c7bc80 record unit time in torch.cuda.event (#15221)
Summary: Record unit of time for torch.cuda.Event's elapsed_time

Differential Revision: D13467646

Pulled By: zou3519

fbshipit-source-id: 4f1f4ef5fa4bc5a1b4775dfcec6ab155e5bf8d6e
2018-12-14 15:29:06 -08:00
054456eb93 Preserve module hierarchy on traced modules (#15101)
Summary:
We need this, for example, to properly call `_unpack` when we have a traced module in the hierarchy
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15101

Differential Revision: D13468467

Pulled By: jamesr66a

fbshipit-source-id: c2b6740b12cde6e23395d12e42d4fc2c4c7ca3f2
2018-12-14 15:07:51 -08:00
60f02b87be fix an issue where two rules build the same .py files
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15230

Differential Revision: D13471625

Pulled By: zdevito

fbshipit-source-id: a982413a308c7a9bb5b6a82fe96fd3de44f555aa
2018-12-14 14:52:52 -08:00
bd368b867d Do not ifdef __launch_bounds__ out for ROCm. (#15228)
Summary:
The compiler understands it and profits from knowing it by not using too many
VGPRs, as it otherwise defaults to a workgroup size of 256.

Fixes a problem in the bringup of ROCm 2.0 on gfx906.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15228

Differential Revision: D13470950

Pulled By: bddppq

fbshipit-source-id: f9aa44c7c95299a099c0ea9317b9044cc056acc5
2018-12-14 14:47:32 -08:00
dcd1685282 Revert D13440858: [pytorch][PR] Use a pool of per-thread cudnn handles for each device, updated
Differential Revision:
D13440858

Original commit changeset: 1c6af5c53538

fbshipit-source-id: fda42ea75000d4a4e9c4a8eeaaa5518f7ad9c298
2018-12-14 14:35:01 -08:00
9f1d8f2eeb enabled tests in test_nn, test_cuda and test_sparse (#15232)
Summary:
tests work on ROCm 1.9.2 as present on CI (fp16 bringup, hipMemset and sparse improvements)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15232

Differential Revision: D13470991

Pulled By: bddppq

fbshipit-source-id: 45acc4f9ea5baaaf7672b86eb022948055779925
2018-12-14 14:27:57 -08:00
e9fb4d1f11 Fix jit doc codeblocks and tables (#15227)
Summary:
Some of the codeblocks were showing up as normal text and the "unsupported modules" table was formatted incorrectly
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15227

Differential Revision: D13468847

Pulled By: driazati

fbshipit-source-id: eb7375710d4f6eca1d0f44dfc43c7c506300cb1e
2018-12-14 14:27:56 -08:00
b316e44a46 Remove __forceinline__ hipification step. (#15229)
Summary:
The HIP definition now correctly contains the inline attribute.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15229

Differential Revision: D13470962

Pulled By: bddppq

fbshipit-source-id: 34f8361bda5f3dce20a2eeb530c3a25d1b1bdd06
2018-12-14 14:24:05 -08:00
7a61306031 Enable all clang-tidy performance checks (#15198)
Summary:
This PR adds the final set of clang-tidy checks we should add for our codebase: a last set of performance-related checks. Most fixes here are around changing `auto` to `const auto&` in a few places where unnecessary copies were made, and adding `reserve()` calls before loops doing repeated `push_back()`. Also a few cases of calling `std::string::find` with a single-character string literal instead of a single char, which uses a less efficient string search algorithm meant for searching larger substrings.

![image](https://user-images.githubusercontent.com/6429851/49978940-adc1a780-ff01-11e8-99da-a4e431361f07.png)

ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15198

Differential Revision: D13468797

Pulled By: goldsborough

fbshipit-source-id: 2bed1ea1c7c162b7f3e0e1026f17125e88c4d5b2
2018-12-14 13:32:47 -08:00
fc2856e9aa Refactor caffe2 CI scripts and add benchmark scripts
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14575

Differential Revision: D13468049

Pulled By: bddppq

fbshipit-source-id: e73bc8742c8a03f498816eee8a72b06a3e19fe48
2018-12-14 13:19:33 -08:00
4327a2d70a Better tests/support for Python/C++ inter-op (#15193)
Summary:
Methods like `module.named_modules()` returns a container of `shared_ptr<nn::Module>`. Currently the `nn::Module` base class does  not have Python bindings. This PR fixes this, and adds more unit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15193

Differential Revision: D13458713

Pulled By: goldsborough

fbshipit-source-id: 4091fe1b96a1be8db14c6a4307fbacc2b41ff6fe
2018-12-14 08:42:10 -08:00
fb8487d708 Tensor construction codemod(ResizeLike) - 3/7 (#15122)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15122

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: dzhulgakov

Differential Revision: D13419643

fbshipit-source-id: 65b5a037b94d458b944d51f790ba2829db1fb530
2018-12-14 02:08:37 -08:00
78bf1a9065 Revert D13407930: [pytorch][PR] Support torch.tensor in script
Differential Revision:
D13407930

Original commit changeset: d17f1195a221

fbshipit-source-id: f4458872c48ec4a2c9983b21ed90bcdc0ae665b7
2018-12-13 22:13:07 -08:00
331c4b5b4d caffe2 - make DataRandomFiller usable in unit tests (#15027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15027

- Make DataRandomFiller able to accept input_dims and input_types for only non-intermediate inputs. Add a helper to fill inputs directly into a workspace

Reviewed By: highker

Differential Revision: D13408345

fbshipit-source-id: 5fc54d33da12e3f0a200e79380d4c695b0339b17
2018-12-13 20:45:52 -08:00
66b26806fc caffe2 - easy - utils to set argument of operator (#15022)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15022

Add setArgument testing utils to make it easy to set argument for an operator

Reviewed By: yinghai

Differential Revision: D13405225

fbshipit-source-id: b5c1859c6819d53c1a44718e2868e3137067df36
2018-12-13 20:45:50 -08:00
9726651d1e caffe2 - easy - test utils for tensor assertion (#15020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15020

Add test utils for assertion of a tensor (sizes and values)

Reviewed By: salexspb

Differential Revision: D13401146

fbshipit-source-id: bc385df074043e03ea884940b5631b96de4a607e
2018-12-13 20:45:48 -08:00
d0b4ae835d caffe2 - easy - test utils to compare tensors in two workspaces (#15181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15181

Add test utils to compare tensors in two workspaces

Reviewed By: ZolotukhinM

Differential Revision: D13387212

fbshipit-source-id: e19d932a1ecc696bd0a08ea14d9a7485cce67bb2
2018-12-13 20:45:46 -08:00
a0f68646ac caffe2 - easy - test utils to fill tensors (#15019)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15019

Put some utils to fill tensors to test_utils

Reviewed By: salexspb

Differential Revision: D13386691

fbshipit-source-id: 51d891aad1ca12dc5133c0352df65b8db4f96edb
2018-12-13 20:45:44 -08:00
8fedde5530 caffe2 - easy - test utils to create operator (#15180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15180

Test utils to create an operator

On top of D13370461

Reviewed By: ZolotukhinM

Differential Revision: D13382773

fbshipit-source-id: a88040ed5a60f31d3e73f1f958219cd7338dc52e
2018-12-13 20:45:42 -08:00
eb6fec3652 caffe2 - easy - Create test_util to make it easier to write C++ unit tests (#15014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15014

Currently it looks like many of the simple operations, such as comparing tensors, creating tensors, fetching tensors..., are too verbose and take effort to write correctly in unit tests.
Easy-to-use utilities are often important for productivity when writing unit tests. While caffe2 Python unit tests are relatively easy to write at the moment, the C++ side seems lacking.
In this change I create a test_util, starting with assertsTensorEquals, getTensor, createTensor, and we can start putting more easy-to-use utilities there.

Reviewed By: salexspb

Differential Revision: D13370461

fbshipit-source-id: bee467a127e1d032ef19482f98aa5c776cf508c0
2018-12-13 20:45:41 -08:00
81644ed9ab Fix derivative for mvlgamma (#15049)
Summary:
Fixes #15015.

Added tests to validate derivative.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15049

Reviewed By: soumith

Differential Revision: D13434117

Pulled By: zou3519

fbshipit-source-id: 4a292600af9eb08b67c0f8b5482e9512aac95e72
2018-12-13 20:32:57 -08:00
0b9b965c1a Fix numpy conversion for int8 tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15194

Differential Revision: D13459270

Pulled By: li-roy

fbshipit-source-id: 605534add263860a3ad9a7fa70888301ee0bf8e4
2018-12-13 19:38:09 -08:00
fb140c7828 add erf and erfc to fuser/autodiff
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15139

Differential Revision: D13455690

Pulled By: soumith

fbshipit-source-id: b06e5f5d362869c2e5fa11a52f9450d77c30d4cb
2018-12-13 19:17:40 -08:00
bb8ee2de0f Move TensorImpl::CopyFrom to caffe2::Tensor (2/2) (#14858)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14858

This diff doesn't change logic but just takes the existing code and moves it to caffe2::Tensor

Reviewed By: ezyang

Differential Revision: D13365817

fbshipit-source-id: bc73b27a793602cb14200dcdf357aa63233da43c
2018-12-13 18:41:24 -08:00
070f33f154 Move TensorImpl::CopyFrom to caffe2::Tensor (1/2) (#14656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14656

This diff doesn't move it yet, but prepares it to be moved, i.e. removes all access to class internals.

dzhulgakov: Please comment on if you think it still makes sense to land this even though it's not blocking anymore since we're going to move at::CopyBytes anyhow.

ezyang: There's some changes in the implementation, especially handling undefined dest tensors. Please review carefully.

Reviewed By: ezyang

Differential Revision: D13287688

fbshipit-source-id: 17800ca8a79ab1633f23be58d96f99a160d8ed24
2018-12-13 18:41:23 -08:00
dc72a5e02c For rotated proposals, replace cv::rotatedRectangleIntersection with a correct version that doesn't have underflow problem (#15113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15113

cv::rotatedRectangleIntersection has a known float underflow bug that would cause failure in ```CV_Assert(intersection.size() <= 8)```

For rotated proposals, replace cv::rotatedRectangleIntersection with a correct version that doesn't have underflow problem.

Otherwise, when ```USE_CPP_GENERATE_PROPOSALS = true```, the training would fail.

Reviewed By: viswanathgs

Differential Revision: D13429770

fbshipit-source-id: 5e95d059f3c668f14059a0a83e8e53d8554cdb99
2018-12-13 18:13:46 -08:00
aecab53778 Support torch.tensor in script (#14913)
Summary:
Adding support for torch.tensor in script.

The input list is typed as t[], because it can be arbitrarily nested. I added a compile-time check that the inner type of the list is a bool, float, or int.

Also adds specialization for Boolean lists, which already existed at the IValue level but had not been added to the compiler yet.
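
A minimal sketch of what this enables (illustrative):

```python
import torch

# torch.tensor called inside TorchScript with a nested list literal; the
# compiler checks that the innermost element type is bool, float, or int.
@torch.jit.script
def make():
    return torch.tensor([[1.0, 2.0], [3.0, 4.0]])

print(make())
```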
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14913

Differential Revision: D13407930

Pulled By: eellison

fbshipit-source-id: d17f1195a22149d5b0d08d76c89a7fab8444f7c5
2018-12-13 17:38:38 -08:00
bbbfda72a0 Remove TensorImpl -> Type dependency
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15086

Reviewed By: dzhulgakov

Differential Revision: D13425628

fbshipit-source-id: 08a8a774d17b071367454e027012a02f96d177d4
2018-12-13 17:10:59 -08:00
1e9c384afb Enable performance-unnecessary-value-param in .clang-tidy (#15026)
Summary:
This PR fixes around 250 places in the codebase where we were making unnecessary copies of objects (some large, some small).

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15026

Differential Revision: D13458784

Pulled By: goldsborough

fbshipit-source-id: be5148b2ce09493588d70952e6f6d6ff5ec5199b
2018-12-13 16:15:35 -08:00
bdfff2f8c2 Add missing caffe2_hip extension in setup.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15189

Reviewed By: orionr

Differential Revision: D13457644

Pulled By: bddppq

fbshipit-source-id: c2363e9b8fd21709b62777e5b2199f01ec1c65f8
2018-12-13 15:59:51 -08:00
de0784510d Remove disabled_features in hipify
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15098

Reviewed By: ezyang

Differential Revision: D13453762

Pulled By: bddppq

fbshipit-source-id: e177042c78f5bf393163d660c25b80285353853d
2018-12-13 15:43:57 -08:00
855d9e1f19 Run ONNX cuda backend test cases via ROCm
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15069

Differential Revision: D13427757

Pulled By: bddppq

fbshipit-source-id: ba0273d75986cd5b146f7041a83c63ddf9c6c0cf
2018-12-13 15:10:00 -08:00
6911ce19d7 Remove _finfo; replace _finfo usage with torch.finfo (#15165)
Summary:
This PR removes the usage of _finfo defined in torch.distributions.utils and changes the call sites
to use torch.finfo instead
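
Example of the replacement API:

```python
import torch

# torch.finfo exposes the same floating-point limits _finfo provided.
print(torch.finfo(torch.float32).eps)    # ~1.19e-07
print(torch.finfo(torch.float32).tiny)   # smallest positive normal number
```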

Differential Revision: D13451936

Pulled By: soumith

fbshipit-source-id: 6dbda3a6179d9407bc3396bf1a2baf3e85bc4cf2
2018-12-13 14:30:27 -08:00
f1f7c16c90 Tensor construction codemod(ResizeLike) - 4/7 (#15088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15088

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: ezyang

Differential Revision: D13419682

fbshipit-source-id: 3e59403bc1c0e71e5cb66df932ed0c6a0a72e643
2018-12-13 13:39:56 -08:00
cbd1c519c4 Replace non-printable-ascii characters in ProtoDebugString (#14918)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14918

When ProtoBuf-Lite is in use, ProtoDebugString just calls SerializeAsString.
This produces binary output, which is not a very suitable "debug" string.
Specifically, we've observed it causing problems when calling code tries to
add the debug string to a Java exception message (which requires valid UTF-8).
Now, we replace all non-ASCII bytes with "?".

This is not a very fast implementation, but generating debug strings shouldn't
be a performance-sensitive operation in any application.
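
The replacement rule, sketched in Python for illustration (the actual change is in C++):

```python
def ascii_safe(raw):
    # Keep ASCII bytes (< 0x80); replace everything else with '?'.
    return "".join(chr(b) if b < 0x80 else "?" for b in raw)

print(ascii_safe(b"ok \xff\xfe"))  # 'ok ??'
```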

Reviewed By: dzhulgakov

Differential Revision: D13385540

fbshipit-source-id: 8868172baf20efaf53fecf7d666a6980f59b64f5
2018-12-13 13:16:24 -08:00
994f72ee3e Tensor construction codemod(ResizeLike) - 6/7 (#15137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15137

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: ezyang

Differential Revision: D13419736

fbshipit-source-id: f4ad7b9582c2f809258169b7fef9adbca7063d99
2018-12-13 12:47:33 -08:00
43c0b50c2e Tensor construction codemod(ResizeLike) - 5/7 (#15084)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15084

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: ezyang

Differential Revision: D13419711

fbshipit-source-id: dd2b740c3f13d8087085bafc5571aaf908d1af42
2018-12-13 12:42:52 -08:00
86fbf17ba6 Use std::vector instead of alloca to work around hcc crash
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15175

Differential Revision: D13453708

Pulled By: bddppq

fbshipit-source-id: f8c147ae9f679e395fee9d4c73ebcca052c9a752
2018-12-13 12:34:36 -08:00
f61612206c Fix old tensor OutputTensorCopyFrom usage in ImageInput operator (#15094)
Summary:
cc jerryzh168
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15094

Differential Revision: D13451898

Pulled By: bddppq

fbshipit-source-id: 27906be62fb88aaa13c257441a2e35a285b445ee
2018-12-13 11:48:19 -08:00
e5bd6fe86d Kill non-forward, non-backward functions generated from nn.yaml (#15127)
Summary:
Updating binding to legacy functions.
Remove unused declarations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15127

Differential Revision: D13433405

Pulled By: VitalyFedyunin

fbshipit-source-id: 58544d38affd20818742338c9eb789d9d14ccbaa
2018-12-13 11:34:50 -08:00
bc80deea1b Delete defunct USE_SIMPLE_BASE_CTOR_DTOR (#15144)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15144

Differential Revision: D13440872

Pulled By: ezyang

fbshipit-source-id: 2b1d73fac0c63729ba01d8f129642334ae9d9cf3
2018-12-13 11:20:37 -08:00
e51092a2b8 Fix typo (#15045)
Summary:
Simple typo fix
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15045

Reviewed By: dzhulgakov

Differential Revision: D13413509

Pulled By: houseroad

fbshipit-source-id: be66700c30d038368b1433232a4e3fd9299c83d6
2018-12-13 11:13:19 -08:00
ca4358c8f5 Use a pool of per-thread cudnn handles for each device, updated (#15080)
Summary:
Rebased version of https://github.com/pytorch/pytorch/pull/14861, hopefully addressing ezyang's comments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15080

Differential Revision: D13440858

Pulled By: ezyang

fbshipit-source-id: 1c6af5c53538b81c6b92cf1dda231ed333f28035
2018-12-13 10:24:06 -08:00
214f46faf5 Fix bincount for non-contiguous inputs on CPU (#15109)
Summary:
Fixes #15058.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15109

Differential Revision: D13447448

Pulled By: soumith

fbshipit-source-id: 56e8d42934538fb00465105a2c5ccfeb7c18a651
2018-12-13 09:44:20 -08:00
bf7a2b9125 Unify SparseTensorImpl::size_ and TensorImpl::sizes_
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15130

Differential Revision: D13434981

Pulled By: VitalyFedyunin

fbshipit-source-id: 98bd4d66834a3c3d2ea577adb0c8413852da095d
2018-12-13 08:55:35 -08:00
0bf1383f0a Python <-> C++ Frontend inter-op (#13481)
Summary:
This PR enables C++ frontend modules to be bound into Python and added as submodules of Python modules. For this, I added lots of pybind11 bindings for the `torch::nn::Module` class, and modified the `torch.nn.Module` class in Python to have a new Metaclass that makes `isinstance(m, torch.nn.Module)` return true when `m` is a C++ frontend module. The methods and fields of C++ modules are bound in such a way that they work seamlessly as submodules of Python modules for most operations (one exception I know of: calling `.to()` ends up calling `.apply()` on each submodule with a Python lambda, which cannot be used in C++ -- this may require small changes on Python side).

I've added quite a bunch of tests to verify the bindings and equality with Python. I think I should also try out adding a C++ module as part of some large PyTorch module, like a WLM or something, and see if everything works smoothly.

The next step for inter-op across our system is ScriptModule <-> C++ Frontend Module inter-op. I think this will then also allow using C++ frontend modules from TorchScript.

apaszke zdevito

CC dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13481

Differential Revision: D12981996

Pulled By: goldsborough

fbshipit-source-id: 147370d3596ebb0e94c82cec92993a148fee50a7
2018-12-13 08:04:02 -08:00
b14d6d730a Reuse KernelSpec for FusionGroups with equivalent graphs (#14541)
Summary:
Before this PR, loop unrolling + the graph fuser was creating multiple
FusionGroups with the same bodies (with different variable names) for
JIT LSTMs. Each FusionGroup got registered to a separate fusion key;
each key resulted in a different compilation for the same
specializations.

This PR makes it so that when registering FusionGroups with the fusion
compiler, the compiler first checks the KernelSpec cache to see if the
FusionGroup's graph exists already. If it does, then return the
corresponding KernelSpec's key to share compiled kernels.

In addition, graphs in the KernelSpec cache are canonicalized before
being cached. I added a flag to the canonicalize pass to remove unique
names of values.
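
An illustrative sketch of the caching idea (not the fuser's real data structures):

```python
import re

kernel_cache = {}

def canonicalize(graph):
    # Rename %values in order of first appearance so that graphs differing
    # only in variable names compare equal.
    names = {}
    def fresh(match):
        return names.setdefault(match.group(0), "%v" + str(len(names)))
    return re.sub(r"%\w+", fresh, graph)

def fusion_key(graph):
    # One compiled kernel per canonical graph form.
    return kernel_cache.setdefault(canonicalize(graph), len(kernel_cache))

assert fusion_key("%a = aten::add(%x, %y)") == fusion_key("%b = aten::add(%p, %q)")
```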

This shortens the compile time for a JIT LSTM (seq_len of 100, loop
unroll factor of 8) from 5.3s to 2.3s. Most of this compile time is
running the graph fuser and/or fusion compiler; while this PR
makes it so that there is only one unique kernel in the forward pass,
there are a lot of different kernels (6) in the backward pass
(after loop unrolling) that should be investigated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14541

Differential Revision: D13324487

Pulled By: zou3519

fbshipit-source-id: b841d82ed35a959b5cfc72db033bf5a7b42cc4fb
2018-12-13 07:54:35 -08:00
aa022313cb Removes THCNumerics usages in RNN.cu (#15085)
Summary:
We don't need THCNumerics here since at::Half can be implicitly converted to float and the cuda math dispatches are handled by `/usr/local/cuda/include/crt/math_functions.hpp` and `cmath`. ATen should be free of THCNumerics after this and when porting kernels from THC, one should not use THCNumerics.

Should close: https://github.com/pytorch/pytorch/issues/11878
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15085

Differential Revision: D13447558

Pulled By: soumith

fbshipit-source-id: 4ff5cbf838edcd01e2d1397e4d7f4f920e9e9fc3
2018-12-13 00:24:17 -08:00
1e0eab5df8 minimize header file includes from _avx2.cc (#14950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14950

Minimize the number of headers included from _avx2.cc files to avoid accidental compilation of functions defined in the header files reused by other translation units, which can lead to illegal instruction errors.

Reviewed By: dskhudia

Differential Revision: D13394483

fbshipit-source-id: 67149a6fb51f7f047e745bfe395cb6dd4ae7c1ae
2018-12-13 00:18:11 -08:00
4b97a46421 Disable strict-overflow flag to avoid compilation error (#14977)
Summary:
Disable strict-overflow flag to avoid compilation error
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14977

Differential Revision: D13447577

Pulled By: soumith

fbshipit-source-id: 1957bd5aa3c7b79219da3dd53560464977c89526
2018-12-12 22:41:33 -08:00
1e93317b99 Remove "early-release beta" disclaimer from README (#15136)
Summary:
Now that PyTorch 1.0 is out, this should be updated :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15136

Differential Revision: D13447377

Pulled By: soumith

fbshipit-source-id: bd4e662c53d0699f25d4d90c1b4c1e182b4427c2
2018-12-12 22:14:14 -08:00
fabd23cb2d support casting to string (#15110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15110

support casting to string on CPU

Reviewed By: intermilan

Differential Revision: D13429381

fbshipit-source-id: b737a1ba1237b10f692d5c42b42a544b94ba9fd1
2018-12-12 21:33:58 -08:00
1717ea1da0 Implementation of ChannelShuffle Op for MKLDNN (#15106)
Summary:
The speed-up of a single operation is up to 3X.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15106

Differential Revision: D13429596

Pulled By: bddppq

fbshipit-source-id: f8d987cafeac9bef9c3daf7e43ede8c6a4ee2ce5
2018-12-12 20:25:12 -08:00
895cb8fcea Fix resize for edge case tensors (#14874)
Summary:
Certain tensor shapes failed when being resized. This pull request addresses the bug found in #13404.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14874

Differential Revision: D13429788

Pulled By: soumith

fbshipit-source-id: 8aa6451dbadce46d6d1c47a01cb26e6559bcfc8c
2018-12-12 19:56:23 -08:00
78a77667dd Autoformat build_variables.py (#15152)
Summary:
autoformat `tools/build_variables.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15152

Differential Revision: D13445343

Pulled By: goldsborough

fbshipit-source-id: fd63588de114cb92deda03fa1a0b36f5f9082b2f
2018-12-12 19:30:17 -08:00
fab78827d6 don't compile dnnlowp.cc in avx2 option (#15147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15147

Forgot to take out dnnlowp.cc from avx2 list in a previous diff.

Reviewed By: dskhudia

Differential Revision: D13440686

fbshipit-source-id: 9ada98b6e885c7d5f22c91a735ff60304480b4cb
2018-12-12 18:57:09 -08:00
d8260239a0 docs: minor spelling tweaks
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15148

Differential Revision: D13443708

Pulled By: suo

fbshipit-source-id: 5e3ec0afd3416ab8ce207f2d04105c49e1c04611
2018-12-12 18:17:14 -08:00
2211a283d2 Export defs.bzl to open source for pytorch (#15132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15132

Pull Request resolved: https://github.com/facebook/fbshipit/pull/64

Reviewed By: dzhulgakov

Differential Revision: D13424093

fbshipit-source-id: bbebef964b9f3aef8f59cd394eca068680c36b5a
2018-12-12 17:40:29 -08:00
107c9ef518 Add back c2 string_utils include header to benchmark_helper
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15143

Differential Revision: D13439694

fbshipit-source-id: 78698b66d52a0178118cbf3e79a7a5ad1763d47b
2018-12-12 16:38:00 -08:00
6610ace28b use ROCm 1.9.2 fp16 capabilities in rocBLAS and MIOpen interfaces (#14994)
Summary:
* relax the MIOpen if statement to allow fp16/fp32 mixed-precision training, now supported by ROCm 1.9.2
* use the gemm_ex API of rocBLAS in ROCm 1.9.2 instead of the previous hgemm API
* with this: enable all but one half test in test_nn

While there, also fix:
* a group convolution issue with MIOpen, pertaining to properly initializing MIOpen on multi-GPU systems, that we detected while working on this
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14994

Differential Revision: D13439869

Pulled By: bddppq

fbshipit-source-id: 75e4eb51a59488882e64b5eabdc30555b25be25e
2018-12-12 16:16:47 -08:00
f34d827007 Optimize CPU GenerateProposals op by lazily generating anchors (3-5x faster) (#15103)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15103

There are two main optimizations in this diff:
1. Previously, we generated all anchors for every spatial grid cell first, and then
applied NMS to pick 2000 anchors according to RPN_PRE_NMS_TOP_N. First sorting the
scores, picking the top 2000, and then lazily generating only the
corresponding anchors is much faster (see the sketch after this list).
2. Transposing bbox_deltas from (num_anchors * 4, H, W) to
(H, W, num_anchors * 4) was also quite slow, taking about 20ms in the RRPN
case when there are lots of anchors, while it's negligible for the RPN case (like
0.1 ms). Instead of transposing, performing all operations in the
(num_anchors, H, W) format speeds things up.
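
A sketch of optimization (1); generate_anchor is a hypothetical stand-in for the per-cell anchor computation:

```python
import numpy as np

def top_k_anchors(scores, k, generate_anchor):
    order = np.argsort(-scores)[:k]              # indices of the k highest scores
    return [generate_anchor(int(i)) for i in order]

scores = np.random.rand(35 * 38 * 38)            # e.g. 35 anchors per grid cell
picked = top_k_anchors(scores, 2000, lambda i: ("anchor", i))
```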

For regular RPN scenario, this gives 5x speedup from 5.84ms to 1.18ms a case
with 35 anchors over a 600x600 image.

For rotated boxes with 245 anchors, the runtime down from 80ms to 27ms per
iter.

Reviewed By: newstzpz

Differential Revision: D13428688

fbshipit-source-id: 6006b332925e01a7c9433ded2ff5dc9e6d96f7d3
2018-12-12 15:53:52 -08:00
90f9e8103c Implement torch.tril_indices and torch.triu_indices (#12653) (#14904)
Summary:
This is an optimized implementation that does the following:

1. Create an empty Tensor of the correct size.
2. Fill the Tensor with the correct values.

The following three designs to fill in the Tensor result in roughly the same performance. Hence, the 2nd option is taken for simpler code, and to return contiguous tensors.

1. Sequential: fill row coordinates first, then columns. This results in two for-loop and more arithmetic operations.
2. Interleaved: fill in index coordinates one by one, which jumps between the two output Tensor rows in every iteration.
3. Transpose: create a n X 2 Tensor, fill the Tensor sequentially, and then transpose it.

<img width="352" alt="screen shot 2018-12-10 at 3 54 39 pm" src="https://user-images.githubusercontent.com/16999635/49769172-07bd3580-fc94-11e8-8164-41839185e9f9.png">

NOTE:

This implementation returns a 2D tensor, instead of a tuple of two tensors. It means that users will not be able to do the following:

```python
x = torch.ones(3, 3)
i = torch.tril_indices(3, 3)
x[i]  # need to first convert the 2D tensor into a tuple of two 1D tensors.
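x[i[0], i[1]]  # works: the two rows act as separate row/column index tensors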
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14904

Reviewed By: zou3519

Differential Revision: D13433027

Pulled By: mrshenli

fbshipit-source-id: 41c876aafcf584832d7069f7c5929ffb59e0ae6a
2018-12-12 15:40:14 -08:00
342e62f1e3 Minor documentation mistake (#15068)
Summary:
keepdim is an optional parameter for torch.max()
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15068

Differential Revision: D13437745

Pulled By: zou3519

fbshipit-source-id: b5198c7d4ae17758cd136f6e5aecc6cb5838f174
2018-12-12 15:24:26 -08:00
5837320b70 Add script standard library documentation + cleanup (#14912)
Summary:
Documents what is supported in the script standard library.

* Adds `my_script_module._get_method('forward').schema()` method to get function schema from a `ScriptModule`
* Removes `torch.nn.functional` from the list of builtins. The only functions not supported are `nn.functional.fold` and `nn.functional.unfold`, but those currently just dispatch to their corresponding aten ops, so from a user's perspective it looks like they work.
* Allow printing of `IValue::Device` by getting its string representation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14912

Differential Revision: D13385928

Pulled By: driazati

fbshipit-source-id: e391691b2f87dba6e13be05d4aa3ed2f004e31da
2018-12-12 12:30:13 -08:00
64b3364209 Move adaptive avg pooling 2d to ATen native (#14714)
Summary:
adaptive_avg_pool1d, adaptive_avg_pool2d, and adaptive_avg_pool3d are neural network functions that are currently implemented in our legacy THNN (CPU) / THCUNN (CUDA) libraries.  It is generally better if these live in our new library ATen, since it is more feature-complete and reduces cognitive overhead.

This change moves adaptive_avg_pool1d and adaptive_avg_pool2d to ATen.

timed relevant cpu tests with this change:
```
[ialex@devgpu064.ash5 ~/pytorch] time python test/test_nn.py
test_AdaptiveAvgPool1d (__main__.TestNN)
test_AdaptiveAvgPool1d_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_single (__main__.TestNN)
test_AdaptiveAvgPool2d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_single (__main__.TestNN)
test_AdaptiveAvgPool3d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none_cuda (__main__.TestNN)
test_adaptive_log_softmax (__main__.TestNN)
test_adaptive_pooling_input_size (__main__.TestNN)
test_adaptive_pooling_size_none (__main__.TestNN)
.s.s.s.s.s.s.s...
----------------------------------------------------------------------
Ran 17 tests in 6.273s

OK (skipped=7)

real	0m7.164s
user	3m1.289s
sys	0m0.905s
```

compared to master:
```
[ialex@devgpu064.ash5 ~/pytorch] time python test/test_nn.py
test_AdaptiveAvgPool1d (__main__.TestNN)
test_AdaptiveAvgPool1d_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_single (__main__.TestNN)
test_AdaptiveAvgPool2d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_single (__main__.TestNN)
test_AdaptiveAvgPool3d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none_cuda (__main__.TestNN)
test_adaptive_log_softmax (__main__.TestNN)
test_adaptive_pooling_input_size (__main__.TestNN)
test_adaptive_pooling_size_none (__main__.TestNN)
.s.s.s.s.s.s.s...
----------------------------------------------------------------------
Ran 17 tests in 7.232s

OK (skipped=7)

real	0m8.065s
user	3m34.714s
sys	0m2.440s
```

also timed relevant cuda tests with this change:
```
[ialex@devgpu064.ash5 ~/pytorch] time python test/test_nn.py
test_AdaptiveAvgPool1d (__main__.TestNN)
test_AdaptiveAvgPool1d_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_single (__main__.TestNN)
test_AdaptiveAvgPool2d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_single (__main__.TestNN)
test_AdaptiveAvgPool3d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none_cuda (__main__.TestNN)
test_adaptive_log_softmax (__main__.TestNN)
test_adaptive_pooling_input_size (__main__.TestNN)
test_adaptive_pooling_size_none (__main__.TestNN)
.................
----------------------------------------------------------------------
Ran 17 tests in 21.049s

OK

real	0m24.106s
user	0m20.890s
sys	0m4.026s
```

compared to master
```
[ialex@devgpu064.ash5 ~/pytorch] time python test/test_nn.py
test_AdaptiveAvgPool1d (__main__.TestNN)
test_AdaptiveAvgPool1d_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_single (__main__.TestNN)
test_AdaptiveAvgPool2d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool2d_tuple_none_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_single (__main__.TestNN)
test_AdaptiveAvgPool3d_single_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_cuda (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none (__main__.TestNN)
test_AdaptiveAvgPool3d_tuple_none_cuda (__main__.TestNN)
test_adaptive_log_softmax (__main__.TestNN)
test_adaptive_pooling_input_size (__main__.TestNN)
test_adaptive_pooling_size_none (__main__.TestNN)
.................
----------------------------------------------------------------------
Ran 17 tests in 23.021s

OK

real	0m27.095s
user	0m20.121s
sys	0m3.668s
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14714

Differential Revision: D13384084

Pulled By: xnder

fbshipit-source-id: 344442103ccbbda72d3c010d2feea00e9985d226
2018-12-12 12:25:22 -08:00
63e77ab6c4 Move numa.{h, cc} to c10/util (#15024)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15024

Pull Request resolved: https://github.com/pytorch/pytorch/pull/14393

att

Reviewed By: dzhulgakov

Differential Revision: D13380559

fbshipit-source-id: abc3fc7321cf37323f756dfd614c7b41978734e4
2018-12-12 12:21:10 -08:00
b34ab435ef Stop erroneously running aten::warn (#15124)
Summary:
Fixes #15119. Before this PR, we were propagating constants through
aten::warn AND running it as part of shape analysis.
This caused aten::warn to be run regardless of whether it is
supposed to be run dynamically. This PR adds an exclusion for aten::warn
in constant propagation and shape analysis, similar to that of prim::RaiseException.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15124

Differential Revision: D13432815

Pulled By: zou3519

fbshipit-source-id: 15ab533ce2accb2da3fd4e569070c7979ce61708
2018-12-12 11:35:23 -08:00
2d485ffb17 Move CUDAGuard, CUDAStream and CUDAGuardImpl to c10/cuda (#14248)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14248

This diff also introduces a horrifying hack to override CUDA's DeviceGuardImpl
with a HIPGuardImplMasqueradingAsCUDA, to accommodate PyTorch's current
behavior of pretending CUDA is HIP when you build with ROCm enabled.

Reviewed By: bddppq

Differential Revision: D13145293

fbshipit-source-id: ee0e207b6fd132f0d435512957424a002d588f02
2018-12-12 11:24:26 -08:00
9943cf2378 Kill Type.storage. (#15075)
Summary:
It's not used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15075

Reviewed By: ezyang

Differential Revision: D13422487

Pulled By: gchanan

fbshipit-source-id: 272aa0a10e96f3ffb97d571490b517f972b9dcf7
2018-12-12 10:57:54 -08:00
9d2955c39c fix infinite loop when get_max_threads is nonzero but num_threads is 1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15114

Differential Revision: D13431891

Pulled By: umanwizard

fbshipit-source-id: f968b8e50cf776c346d4a28d72b12e7856c95839
2018-12-12 10:04:18 -08:00
68ad9ae5be Ensure there aren't variables in checked_tensor_unwrap, checked_tensor_list_unwrap (#15105)
Summary:

These functions use unsafeGetTensorImpl(), which doesn't work with Variables (in a silent way that may blow up later).
So let's do early checking.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15105

Reviewed By: ezyang

Differential Revision: D13429149

Pulled By: gchanan

fbshipit-source-id: b85f6f5b7cdb9a6dd0c40205b924c840a3920ba0
2018-12-12 09:58:03 -08:00
0ad39ec5c1 Add better support for bools in the graph fuser (#15057)
Summary:
Fixes #15038.

aten::_cast_Float(tensor, non_blocking) support was added in #14336.
Its second argument is a bool, but because we don't support generating values
of type bool in the fuser codegen, the codegen errored out.

aten::_cast_Float in the fuser never actually uses its non_blocking
argument, so another way to fix this would be to have a special op for a
fused cast. But I thought that we might have fusible ops that do take
bool arguments in the future, so this would be good to have.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15057

Differential Revision: D13432091

Pulled By: zou3519

fbshipit-source-id: 455fe574f5f080aca9a112e346b841a2534a8dc3
2018-12-12 09:39:44 -08:00
f36a84b71b fix some tests that I accidentally disabled (#15077)
Summary:
While moving these scenarios into `_test_dim_ops` I accidentally left an empty loop in the actual tests, causing them to do nothing.
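
The bug pattern, reconstructed for illustration (not the actual test code):

```python
def test_sum_dim(self):
    for dim in range(3):
        pass  # body was moved into _test_dim_ops; the loop now tests nothing
```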
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15077

Differential Revision: D13428759

Pulled By: umanwizard

fbshipit-source-id: 08f53068981d9192c1408878b168e9053f4dc92e
2018-12-12 09:25:34 -08:00
3ae684266a Don't setup x86_64-linux-gnu-gcc as an sccache wrapper. (#15078)
Summary:
When I do this setup in a local Docker development environment,
I get the following error:

    x86_64-linux-gnu-gcc: error trying to exec 'cc1plus': execvp: No such file or directory

Somehow, gcc seems to get confused when it gets run from the wrong
directory.  Best not to do it.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15078

Differential Revision: D13432143

Pulled By: ezyang

fbshipit-source-id: b18e15f493503a4c8205c85f92a214e49762a7bc
2018-12-12 08:01:03 -08:00
00a4c8d41c Use c10::to_string that works cross platform (#15117)
Summary:
Fix master breakage introduced in #15108
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15117

Differential Revision: D13430568

Pulled By: bddppq

fbshipit-source-id: ce10bc552f085d1bf0afbc13119991bee014ac95
2018-12-12 02:58:49 -08:00
1423c0d9f1 Add EmptyNameScope to allow you jump out from current scope. (#14631)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14631

Adds an empty name scope to allow people to jump out of the current namescope.

This could be useful when you want to access a blob from a parent or sibling scope.
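
A hedged usage sketch (assuming the context-manager spelling this diff introduces; `core.ScopedName` prepends the current scope):

```python
from caffe2.python import core

with core.NameScope("gpu_0"):
    print(core.ScopedName("w"))        # "gpu_0/w"
    with core.EmptyNameScope():
        print(core.ScopedName("w"))    # "w": back at the root scope
```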

 Facebook:

e.g.: we encountered a potential use case in D13124249 (it's a large diff, please search for EmptyNameScope in that diff); we need to access a blob declared in the root namescope from a device namescope (device namescope has been used by the parallel_GPU API). `EmptyNameScope` can help us do that with ease.

I referenced `EmptyDeviceScope` (D6103412) while implementing this one.

Reviewed By: yinghai

Differential Revision: D13272240

fbshipit-source-id: d4cde5abcc2336e456b6c6ef086266ef94d86da8
2018-12-12 01:39:50 -08:00
479481b6cb Remove linker and dlopen flags that allowed undefined symbols in rocm build (#15091)
Summary:
Previously the undefined symbols were caused by disabled_modules in tools/amd_build/disabled_features.json (now it's cleared).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15091

Differential Revision: D13429595

Pulled By: bddppq

fbshipit-source-id: b341e83f9e5a8d16440a364e837b045a8a4fd6e1
2018-12-11 23:23:47 -08:00
0dade9862c Fix serialization (#15033)
Summary:
Fixes a bug where a hierarchy of submodules in which one submodule has no parameters, but its own submodules do, didn't get properly (de-)serialized. This had to do with the fact that the old protobuf format couldn't store empty parameters.

Fixes https://github.com/pytorch/pytorch/issues/14891

soumith ezyang ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15033

Differential Revision: D13411322

Pulled By: goldsborough

fbshipit-source-id: 2ef73b2aa93fa9e46b1cbe1fd47d9f134d6016d5
2018-12-11 22:43:36 -08:00
e20f9bbead Update the output format for benchmark_helper. It outputs the dimension first and all the values in the next line (#15108)
Summary:
This way, it can output an arbitrary blob.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15108

Reviewed By: llyfacebook

Differential Revision: D13429346

Pulled By: sf-wind

fbshipit-source-id: 5e0bba2a46fbe8d997dfc3d55a698484552e3af8
2018-12-11 22:24:56 -08:00
b07ee44f40 Pre-commit flake8/clang-tidy (#15102)
Summary:
Provide a pre-commit hook that does flake8 and clang tidy checks. Enables the clang-tidy script to run in parallel to make it fast enough to be used in a pre-commit hook.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15102

Reviewed By: soumith

Differential Revision: D13429629

Pulled By: zdevito

fbshipit-source-id: bd52fe5652f29b033de8d9926d78350b2da4c2fc
2018-12-11 22:18:18 -08:00
f8455ed754 add gloo support for gather on GPU (#14916)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14916

as titled

Reviewed By: pietern

Differential Revision: D13267832

fbshipit-source-id: 3b89d08af93f74941f17ff892c33fc2a4a023c19
2018-12-11 21:21:10 -08:00
3fa53da61a Fix include paths for UndefinedTensorImpl.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14818

Reviewed By: ezyang

Differential Revision: D13348042

fbshipit-source-id: 11bdfc755767ce9d0a6fa95b2cf49d50adde8d60
2018-12-11 21:01:45 -08:00
63db95dd11 Move UndefinedTensorImpl to c10 (meh) (#14817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14817

unfortunately, we still need this.

Reviewed By: ezyang

Differential Revision: D13348041

fbshipit-source-id: e8dcc89f5c71bd1ea2c9813990dac6e58e63b1fd
2018-12-11 21:01:42 -08:00
2dfdbef91d Fix include paths for TensorImpl.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14816

Reviewed By: ezyang

Differential Revision: D13348040

fbshipit-source-id: a7204d89c2dd277d13093b0ed862f40b53dee82f
2018-12-11 21:01:40 -08:00
9e9e87c19e Move TensorImpl to c10 (yay!)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14795

Reviewed By: ezyang

Differential Revision: D13336856

fbshipit-source-id: 5375d0e42312ff7564f4df06210a5e49542d59e3
2018-12-11 21:01:38 -08:00
bff6d42cef Add at::scalar_tensor factory function, use it instead of Type.scalar… (#15074)
Summary:
…_tensor.

This is part of a long series of paring down the Type interface.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15074

Differential Revision: D13421482

Pulled By: gchanan

fbshipit-source-id: 84010ee71fef2cb74d32d5de7858d8ed9f36b885
2018-12-11 20:37:41 -08:00
b710642969 Make ATen HIPify out-of-place, but still reuse CUDA names. (#14866)
Summary:
```
    This diff changes the HIPification of ATen to be out-of-place.
    We now have the following mappings:

    - ATen/cuda => ATen/hip
    - ATen/native/cuda => ATen/native/hip
    - ATen/native/sparse/cuda => ATen/native/sparse/hip
    - THC => THH
    - THCUNN => THHUNN

    The build system is adjusted to know about these new build paths,
    and HIPify is taught how to adjust include paths and
    THC_GENERIC_FILE appropriately.  ATen_hip is now built as
    the ATen_hip library, rather than reusing ATen_cuda.

    However, despite these new filepaths, none of the identifiers in ATen
    have actually changed.  So, e.g., THHGeneral.h still defines functions
    named THC_blahblah, and HIP still shows up as CUDA in PyTorch itself.
    We'll tackle this in a subsequent PR; this diff is just to get the files
    out-of-place.

    Minor extra improvements:

    - Don't edit tmp_install when hipifying
    - HIP no longer builds native_cudnn_cpp; it was unnecessary
    - Caffe2_HIP_INCLUDES is now Caffe2_HIP_INCLUDE, for consistency
      with all the other variables.
    - HIP build now properly respects ATEN_CUDA_FILES_GEN_LIB (it
      did not previously.)
    - You can now override file extension matching in pyHIPIFY
      by explicitly specifying its full name in the matching list.
      This is used so we can HIPify CMakeLists.txt in some situations.

    A little bit of string and ceiling wax:

    - gen.py grows a --rocm flag so that it knows to generate CUDA
      files which actually refer to the HIP headers (e.g., THH.h)
      We'll get rid of this eventually and generate real HIP files,
      but not for this PR.
    - Management of HIP dependencies is now completely deleted
      from the ATen CMakeLists.txt.  The old code was dead (because
      it was shoveled in ATen_CUDA_DEPENDENCY_LIBS and promptly
      ignored by the Caffe2 build system) and didn't actually work.
```
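
For illustration, here is a small Python sketch of the directory mappings listed above (the real rewriting lives in the pyHIPIFY scripts; this is not the actual implementation):

```
# Illustrative only: the mappings are taken from the list above.
HIP_PATH_MAP = [
    # more specific prefixes first, so "THCUNN" is not
    # clobbered by the shorter "THC" rule
    ("ATen/native/sparse/cuda", "ATen/native/sparse/hip"),
    ("ATen/native/cuda", "ATen/native/hip"),
    ("ATen/cuda", "ATen/hip"),
    ("THCUNN", "THHUNN"),
    ("THC", "THH"),
]

def hipify_path(path):
    for cuda_dir, hip_dir in HIP_PATH_MAP:
        if cuda_dir in path:
            return path.replace(cuda_dir, hip_dir)
    return path

assert hipify_path("aten/src/ATen/native/cuda/Loss.cu") == \
    "aten/src/ATen/native/hip/Loss.cu"
```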

Stacked on https://github.com/pytorch/pytorch/pull/14849 review last commit only
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14866

Differential Revision: D13419475

Pulled By: ezyang

fbshipit-source-id: cb4c843df69a1d8369314c9fab1b7719520fa3db
2018-12-11 19:15:27 -08:00
5c2c40ad87 Add error type to raise statement
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15039

Differential Revision: D13419566

Pulled By: zou3519

fbshipit-source-id: f67a3aebce937e3e640e91e81eb3e184cfdf269c
2018-12-11 17:41:44 -08:00
73ee7fda4c Remove deprecated variable_tensor_functions (#15003)
Summary:
Removing the deprecated functions in `torch/csrc/variable_tensor_functions.h` (like `torch::CPU`) and corresponding implementations from `torch/csrc/torch.cpp` from master after the release.

ezyang gchanan soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15003

Differential Revision: D13418086

Pulled By: goldsborough

fbshipit-source-id: a0accdf6f7b0efa1ec07ac7b74b86ff2da37543f
2018-12-11 17:16:11 -08:00
0552326846 add gloo scatter support on GPU (#14917)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14917

as titled

Reviewed By: pietern

Differential Revision: D13271560

fbshipit-source-id: 0187a3390f8ebd72a2c074e7a651432159d427c0
2018-12-11 17:11:13 -08:00
92314c83fa re-enable copy of python files, but be careful that the copy is only … (#14982)
Summary:
…done once

This allow no-op build to work correctly even when BUILD_CAFFE2_OPS is on.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14982

Differential Revision: D13413960

Pulled By: zdevito

fbshipit-source-id: 6e5412a8c375af8a47c76f548cdd31cff15f3853
2018-12-11 16:54:08 -08:00
71e0cb505c Split off fuser tests in test_jit.py to their own test case (#15072)
Summary:
This PR creates TestFuser inside test_jit.py to be a home for graph fuser
specific tests.

This was a useful exercise because now that all the fuser tests are in
one place, I can spot redundant and bitrotting tests for cleanup in a
future PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15072

Differential Revision: D13421458

Pulled By: zou3519

fbshipit-source-id: 80b1a7712feff75a0c186d1664601c4edbbca694
2018-12-11 14:55:06 -08:00
7408ce2f80 Supress warnings on generated tests
Summary: Removes all warnings spew for the TestJitGenerated tests

Differential Revision: D13420919

fbshipit-source-id: f251c12f923088ccc5daa2984c15003a67cbd1c1
2018-12-11 14:00:41 -08:00
04b65dfd1f Issue 14984: Remove divide by zero error in index_put_ (#14986)
Summary:
No check for a zero-element index tensor was done in the accumulate=True (serial) case in the new TensorIterator code since https://github.com/pytorch/pytorch/pull/13420.

https://github.com/pytorch/pytorch/issues/14984
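
A minimal repro sketch of the fixed case (a zero-element index should make the call a no-op, not a divide-by-zero):

```
import torch

t = torch.zeros(5)
idx = torch.empty(0, dtype=torch.long)  # zero-element index tensor
vals = torch.empty(0)
t.index_put_((idx,), vals, accumulate=True)  # no-op after this fix
```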
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14986

Differential Revision: D13417861

Pulled By: colesbury

fbshipit-source-id: e6ed1af8f708b53a35803fc157ed1f043169ec89
2018-12-11 13:38:12 -08:00
109c8d22dc Update onnx coverage script for more accurate result (#15029)
Summary:
The coverage of scalar-input test cases was not accurate. This patch fixes that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15029

Differential Revision: D13419764

Pulled By: zrphercule

fbshipit-source-id: a14a5cbef432bea8c9126156f5deb1125e1aeb47
2018-12-11 13:14:35 -08:00
f2f47de5ad tox.ini -> .flake8 (#15065)
Summary:
We were only using this file to configure flake8, and fbcode linters do not recognize tox.ini, which causes spurious linter warnings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15065

Differential Revision: D13420774

Pulled By: suo

fbshipit-source-id: e43a46befa36862c8b3c0a90074aec6a66531492
2018-12-11 13:14:34 -08:00
ca7f8fed60 silence unreachable code warnings (#15036)
Summary:
Stack:
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; **#15036 silence unreachable code warnings**&nbsp;&nbsp;[💛](https://our.intern.facebook.com/intern/diff/D13411100/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15036

Differential Revision: D13414712

Pulled By: li-roy

fbshipit-source-id: d4aa84571fa94c66f3c5bfa9575a10c6ee398f9e
2018-12-11 13:09:04 -08:00
d825b39061 improve deep equality check in alias annotation test (#15031)
Summary:
Previously we were returning true if either IValue wasn't a tensor, which…is bad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15031

Differential Revision: D13409759

Pulled By: suo

fbshipit-source-id: f8bdcd05d334c1276ce46f55812065d358c1ff5d
2018-12-11 12:14:00 -08:00
02d149b767 Fix race condition in ThreadPool::workOnTasksUntilCompleted (#14833)
Summary:
Resolves #14704
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14833

Differential Revision: D13405211

Pulled By: highker

fbshipit-source-id: 8552d51eeb5d3af0ed66c461e5ddfeb9ae2926bd
2018-12-11 11:46:58 -08:00
c2a754c58b Fix CMakeLists.txt for Int8 python bindings (#15047)
Summary:
Currently in caffe2, one cannot properly fetch the content of Int8 blobs.

Upon digging the source code, it turns out that the relevant source code is not being compiled. Adding the source to CMakeLists.txt fixes this issue.

First time ever doing a pull request. Please let me know if there's any rule I should follow. Thanks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15047

Differential Revision: D13417583

Pulled By: bddppq

fbshipit-source-id: dd39575971a3012635edbf97a045d80e4b62a8eb
2018-12-11 10:48:47 -08:00
687834dcb4 Install cpp tests when built (#15000)
Summary:
This is broken out of https://github.com/pytorch/pytorch/pull/13733/

We want to install cpp tests so they can ultimately be runnable from that location for Caffe2 tests run from PyTorch builds.

cc pjh5 yf225 anderspapitto
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15000

Reviewed By: pjh5

Differential Revision: D13416253

Pulled By: orionr

fbshipit-source-id: 51280be0a22557a742f90c9f303c58c35cbd4a38
2018-12-11 10:07:48 -08:00
5d3a347685 Stashing checkpointing RNG states based on devices of arg tensors (#14518)
Summary:
This PR intends to address apaszke's concerns in https://github.com/pytorch/pytorch/pull/14253#issuecomment-441740016.  Preserving the rng state is now controlled by a kwarg rather than a global state, hopefully in a python 2.7-compatible way.

Additionally, the checkpointing function stashes and restores the RNG states of
1. devices associated with all input tensor args to run_fn as well as
2. the current device.

I could easily change this to only save and restore the RNG states associated with 1. alone.  This would simplify the logic to create a [deduplicated, ordered](https://github.com/pytorch/pytorch/compare/master...mcarilli:checkpointing_rng_touchup?expand=1#diff-58da227fc9b1d56752b7dfad90428fe0R37) list of devices considered active.

I'm wondering if the [get_device_states](https://github.com/pytorch/pytorch/compare/master...mcarilli:checkpointing_rng_touchup?expand=1#diff-58da227fc9b1d56752b7dfad90428fe0R32) and [set_device_states](https://github.com/pytorch/pytorch/compare/master...mcarilli:checkpointing_rng_touchup?expand=1#diff-58da227fc9b1d56752b7dfad90428fe0R47) functions are general enough to reside elsewhere (presumably torch/random.py).  I'm also wondering if the check on [torch.cuda._initialized](https://github.com/pytorch/pytorch/compare/master...mcarilli:checkpointing_rng_touchup?expand=1#diff-58da227fc9b1d56752b7dfad90428fe0R47) would be better placed within `get_device_states`.
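
A rough Python sketch of the two helpers described above, simplified from what the checkpointing code does:

```
import torch

def get_device_states(*args):
    # deduplicated, ordered list of CUDA devices appearing in the tensor args
    devices = []
    for arg in args:
        if torch.is_tensor(arg) and arg.is_cuda and arg.get_device() not in devices:
            devices.append(arg.get_device())
    states = []
    for dev in devices:
        with torch.cuda.device(dev):
            states.append(torch.cuda.get_rng_state())
    return devices, states

def set_device_states(devices, states):
    for dev, state in zip(devices, states):
        with torch.cuda.device(dev):
            torch.cuda.set_rng_state(state)
```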
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14518

Differential Revision: D13356210

Pulled By: ezyang

fbshipit-source-id: afa4cc21ce7862142d5cb1dec3750018df222039
2018-12-11 09:48:45 -08:00
25ddd659c9 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: d39b31f12ab2ab570548f3e8a65949332a64a0ff
2018-12-11 07:40:37 -08:00
bf1d411dbf Switch Int8Softmax, Int8Relu, and Int8LeakyRelu to QNNPACK (#14933)
Summary:
Int8Softmax: 4x-5x speedup compared to previous implementation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14933

Differential Revision: D13406820

Pulled By: Maratyszcza

fbshipit-source-id: ea8cbe1b861ddb7ff1b851d06d52c6fd6d04ed01
2018-12-11 00:49:06 -08:00
a1ea7dbe40 Adjust the API call to deserilize the tensorproto (#14132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14132

as title

Reviewed By: jerryzh168

Differential Revision: D13110697

fbshipit-source-id: 822c9079de11951f90aec3d26f0e4108847e7dac
2018-12-10 22:54:42 -08:00
27d5ae7afb use datatype dependent tolerance in data parallel tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14856

Differential Revision: D13413560

Pulled By: soumith

fbshipit-source-id: b3a0cfe93477ed332e6eaa2e39ef5f4cc8b36481
2018-12-10 22:50:27 -08:00
81dc78d871 Update pooling.py (#14998)
Summary:
Strange line in the documentation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14998

Differential Revision: D13413235

Pulled By: soumith

fbshipit-source-id: 80d05ec1185719b785f0aac914bc2369c1174f2f
2018-12-10 22:36:20 -08:00
48a361cc62 Clean up casting ops (#14947)
Summary:
This removes FloatToInt-style names, replacing them with just the destination
name (e.g. FloatToInt -> Float). This makes it more consistent with the
syntax and makes it easier to add type conversions (just add a new
prim::Int op, for instance).

None of these ops get serialized, so this should not affect loading of
old models.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14947

Differential Revision: D13408409

Pulled By: zdevito

fbshipit-source-id: d773fe863f14d9de893f686832769f8cc8903a8e
2018-12-10 22:15:08 -08:00
cff509e2b1 share code between adagrad and rowwise adagrad tests (#14692)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14692

Remove some code duplication

Reviewed By: chocjy

Differential Revision: D13296731

fbshipit-source-id: 5924e037ca64fc4b89234be922bc5ca47fb8bd32
2018-12-10 22:10:39 -08:00
c48b15e41a TBB task graph (#15041)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15041

Adding an alternative implementation of a task graph based on TBB

Reviewed By: dmudiger

Differential Revision: D13412517

fbshipit-source-id: f5efedd680bbe0072bf38d504e5682ab51dd630f
2018-12-10 21:35:04 -08:00
45dfc6764e Enable more caffe2 fp16 rocm tests (#15040)
Summary:
cc rohithkrn petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15040

Reviewed By: houseroad

Differential Revision: D13413068

Pulled By: bddppq

fbshipit-source-id: b2967f16f8da0b9e80083138fb8632c14e9e9b63
2018-12-10 21:30:21 -08:00
5022f9d6ef Enable the build of tests in ATen/core (#15032)
Summary:
Otherwise they won't build
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15032

Reviewed By: yinghai

Differential Revision: D13409801

Pulled By: houseroad

fbshipit-source-id: 95464aa8f3604835997ba1bb7f3c3e51485d1686
2018-12-10 21:24:54 -08:00
962b82dd81 More scaffolding for LegacyTHDispatch. (#14852)
Summary:
1) at::functions are now also exposed in the at::legacy::th namespace and we move relevant calls over to use them (to avoid merge conflicts)
2) LegacyTHDispatch now handles device-type initialization
3) We generate derived LegacyTHDispatchers, e.g. THLegacyCPULongDispatcher, although they are currently empty.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14852

Reviewed By: ezyang

Differential Revision: D13360852

Pulled By: gchanan

fbshipit-source-id: af6705aeba3593ea5dba9bfc62890e5257bc81f8
2018-12-10 19:57:01 -08:00
e9cd781681 Back out "Revert D13043261: [caffe2] Task graph and task future abstractions in executor"
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15030

Reviewed By: bddppq

Differential Revision: D13408998

fbshipit-source-id: 9eb675e09fbc4829eab34df7aa660a0590816feb
2018-12-10 19:30:58 -08:00
83f32eebd9 Tensor construction codemod - 2/3 (#14836)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14836

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: bddppq

Differential Revision: D13335176

fbshipit-source-id: 8d89510670e2cf70559d2f75e68f7181feb0b6d9
2018-12-10 19:30:56 -08:00
5222a1b190 Fixing reading of FBGEMM from env variables
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15023

Reviewed By: orionr

Differential Revision: D13406778

Pulled By: pjh5

fbshipit-source-id: 2265f01170fb7969cbdf4e44ca6ef183f5d8017d
2018-12-10 18:18:38 -08:00
a97cf568a4 Alignas Array struct (#14920)
Summary:
This PR aligns the Array struct such that cuda vector performance improvements can be utilized.

I tested this by using it on our Philox header. Note how the vector store instruction gets used for CUDA vector types and when using alignas on Array, vs. when not using alignas on Array.

With cuda vector type (uint4, uint2, float4): https://godbolt.org/z/UaWOmR
With alignas: https://godbolt.org/z/Eeh0t5
Without alignas: https://godbolt.org/z/QT63gq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14920

Differential Revision: D13406751

Pulled By: soumith

fbshipit-source-id: 685b1010ef1f576dde30c278b1e9b642f87c843d
2018-12-10 17:58:03 -08:00
7e2b074219 Integrate rocBLAS fp16 api into Caffe2 (#14882)
Summary:
This PR integrates rocBLAS half and mixed precision APIs in to Caffe2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14882

Differential Revision: D13407840

Pulled By: bddppq

fbshipit-source-id: 75cb0d74da066776fa66575f1d255e879d36121e
2018-12-10 17:54:06 -08:00
92f3616f36 Fix old tensor CopyFrom usage in boolean mask operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15025

Differential Revision: D13407323

Pulled By: bddppq

fbshipit-source-id: 1bc1d28ad0c6c71d25d788549be18917e393ee50
2018-12-10 17:23:45 -08:00
4fcc2fffc3 unit test with multiple omp threads (#14958)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14958

Test with multiple threads

Reviewed By: jianyuh

Differential Revision: D13394791

fbshipit-source-id: 931a6c3bda15ebc816807e537dd0841c383e7a6f
2018-12-10 17:23:44 -08:00
9b272c08cf Remove partially initialized Tensor in Deserialization (#14197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14197

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13642

Previously we passed in a partially initialized Tensor to Deserialize and it filled
it with the result of deserializing a tensor proto. Now we want it to return
a Tensor directly, since it's just a shared pointer to TensorImpl.

Reviewed By: dzhulgakov

Differential Revision: D12874357

fbshipit-source-id: 12b80a763375da23cfa64a74d6bc186d8d03b94f
2018-12-10 17:17:29 -08:00
4a145cd95c Revert D13043261: [caffe2] Task graph and task future abstractions in executor
Differential Revision:
D13043261

Original commit changeset: d89424354aea

fbshipit-source-id: b307e3281c4d83b60ba2bfadcbcf69afb7a41412
2018-12-10 16:03:59 -08:00
0a36fe565d apply() for ScriptModules (#14655)
Summary:
This can be used to initialize state that is not necessarily eligible for serialization or is implementation-specific. Concretely, I'm going to use this to pack the weight matrices for quantized Linear modules according to the FBGEMM APIs.
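
A hedged usage sketch; `pack_weights` and the `_pack` hook are hypothetical stand-ins for the FBGEMM packing described above:

```
import torch

def pack_weights(module):
    if hasattr(module, "_pack"):
        module._pack()  # hypothetical repacking hook

class M(torch.jit.ScriptModule):
    def __init__(self):
        super(M, self).__init__()
        self.fc = torch.nn.Linear(4, 4)

m = M()
m.apply(pack_weights)  # visits submodules first, then m itself
```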
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14655

Differential Revision: D13404438

Pulled By: jamesr66a

fbshipit-source-id: 2d327cef5520fdd716b5b1b29effd60a049e8a4a
2018-12-10 15:40:31 -08:00
9bbb3efe2f Simplify THPPointer implementation for Storage. (#14897)
Summary:
We've virtualized the destructor for storage, so we
no longer have to forward to a particular backend.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14897

Differential Revision: D13399216

Pulled By: ezyang

fbshipit-source-id: 531d29c3f278477cfa8759f30ab4f304d695b659
2018-12-10 15:18:49 -08:00
23cc3daabd Disable getNumGPUs rewrite (#14993)
Summary:
cc iotamudelta

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14993

Differential Revision: D13405804

Pulled By: ezyang

fbshipit-source-id: c4aa9ed29ee2a4f3abf76c1e0fa8babfd738db35
2018-12-10 15:13:55 -08:00
6ad9f7b798 Fix include path for WrapDimMinimal.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14794

Reviewed By: dzhulgakov

Differential Revision: D13336842

fbshipit-source-id: ca49a9fd1d409d8a75e43eeb9b9b02c305ebb79a
2018-12-10 15:10:03 -08:00
279ec9ef7a Move WrapDimMinimal to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14793

Reviewed By: ezyang

Differential Revision: D13336841

fbshipit-source-id: 4365a799e1856cc68dd94a273e97663fee5f51db
2018-12-10 15:10:01 -08:00
66315ab323 Stop disabling maybeOverlappingIndices (#14999)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

cc iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14999

Differential Revision: D13405754

Pulled By: ezyang

fbshipit-source-id: 98459496494390ad1115b4f1f6738d53c14f0745
2018-12-10 15:02:08 -08:00
483ba553bd add gloo allgather support on GPU (#14576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14576

as titled

Reviewed By: pietern

Differential Revision: D13266063

fbshipit-source-id: e262f77d63724a7504a7112907bbfba49612fe75
2018-12-10 14:32:54 -08:00
029600813e Task graph and task future abstractions in executor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14116

Reviewed By: dmudiger

Differential Revision: D13043261

fbshipit-source-id: d89424354aea14d1d14eb8320fb3aa34908a4e81
2018-12-10 14:28:56 -08:00
a51fe386c8 caffe2/caffe2/contrib/script (#15007)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15007

Pull Request resolved: https://github.com/pytorch/pytorch/pull/14979

att

Reviewed By: dzhulgakov

Differential Revision: D13286191

fbshipit-source-id: b8a6bc7aea44487aea4dcf7f44c858fd30c6293c
2018-12-10 14:23:31 -08:00
25144c8a09 s/Torch Script/TorchScript/g (#15011)
Summary:
pls
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15011

Differential Revision: D13404158

Pulled By: suo

fbshipit-source-id: e906281463d65c86e4e9073eb0c0a26f4f29e307
2018-12-10 13:48:24 -08:00
110ccbb689 Improve the docs of interpolate(align_corners=) (#14806)
Summary:
ailzhang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14806

Reviewed By: ailzhang

Differential Revision: D13366332

Pulled By: ppwwyyxx

fbshipit-source-id: 08fcea95d5c86b11cdfe464fdd9daa50050871f1
2018-12-10 12:50:38 -08:00
e77de07448 Improve build time of register_symbols.cpp without compiler hacks (#14911)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14911

In optimized modes the compiler tries to inline all the
`unordered_map::operator[]` calls, creating a massive amount of code
which takes several minutes to optimize. Instead, create a table of
PODs and populate the maps using a simple loop.

Reviewed By: soumith, luciang

Differential Revision: D13382948

fbshipit-source-id: b6752921e0f7213595d26b39e4397f6a3897960b
2018-12-10 11:57:11 -08:00
18c93b87c2 Delete defunct THP_API.h header. (#14899)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14899

Differential Revision: D13383687

Pulled By: ezyang

fbshipit-source-id: f2a08a769cc3775ba55f9c58d622a83df622d816
2018-12-10 10:47:24 -08:00
1989157eb6 Disable test_leaf_variable_sharing on ASAN runs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15001

Reviewed By: orionr

Differential Revision: D13399119

fbshipit-source-id: 6b1d098e55a67b1f5bc6d08a8ee3c1be8234a654
2018-12-10 10:43:05 -08:00
d30b6bf3b6 Revert D13306052: [pytorch][PR] Allow converting CharTensor to np arrays
Differential Revision:
D13306052

Original commit changeset: 202d038f139c

fbshipit-source-id: 11f6bdd687f8ea5ce2e5f28f48d19449a5c403eb
2018-12-10 10:36:17 -08:00
dc1e6d0b98 Non-INTERFACE AT_LINK_STYLE is dead code (#14822)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14822

Differential Revision: D13355574

Pulled By: ezyang

fbshipit-source-id: a7173084f8735424619b2e393df2715a05918b44
2018-12-10 09:42:53 -08:00
54d5c53826 Support torch.load with encoding (#14743)
Summary:
Addresses a common compatibility issue regarding bytes when loading Py2 checkpoints in Py3.

E.g.,
[1] https://github.com/pytorch/pytorch/issues/5994,
[2] https://github.com/CSAILVision/places365/issues/25,
[3] https://discuss.pytorch.org/t/how-to-load-a-saved-model-trained-on-pytorch-0-3-1-python-2-7-on-pyorch-1-0-python-3-7/31212
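
Typical usage after this change (the checkpoint path is a placeholder):

```
import torch

# checkpoint saved under Python 2, loaded under Python 3
state = torch.load('model_py2.pt', encoding='latin1')  # or encoding='bytes'
```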
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14743

Reviewed By: weiyangfb

Differential Revision: D13350888

Pulled By: soumith

fbshipit-source-id: 2df4e828a8b70509118a355307ca3ebe51e108f6
2018-12-10 08:07:36 -08:00
9b2bd284b3 Convert int8 numpy array to CharTensor (#14700)
Summary:
When rewriting `default_collate`, I noticed that `from_numpy`, `as_tensor`, and `tensor` all fail on `np.int8` arrays.
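
With this change, all three conversion paths accept `np.int8` arrays (a minimal sketch):

```
import numpy as np
import torch

a = np.array([1, -2, 3], dtype=np.int8)
t1 = torch.from_numpy(a)  # shares memory with a
t2 = torch.as_tensor(a)
t3 = torch.tensor(a)      # copies
assert t1.dtype == torch.int8
```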
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14700

Reviewed By: weiyangfb

Differential Revision: D13305297

Pulled By: soumith

fbshipit-source-id: 2937110f65ed714ee830d50098db292238e9b2a9
2018-12-10 07:39:06 -08:00
e1b5dbf699 Allow converting CharTensor to np arrays (#14710)
Summary:
The other direction of #14700

cc soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14710

Reviewed By: weiyangfb

Differential Revision: D13306052

Pulled By: soumith

fbshipit-source-id: 202d038f139cf05e01069ff8d05268c66354c983
2018-12-10 07:35:28 -08:00
b039a715ce pre-pack operation of dnnlowp conv with 16-bit accumulation (#14881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14881

This diff allows us to pre-quantize and pre-pack the weight matrix used in DNNLOWP_ACC16.
The intended use pattern is to run Int8ConvPackWeight in the init_net to generate a packed weight, which Int8Conv with the DNNLOWP_ACC16 engine then consumes, as sketched below.
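
A hedged sketch of that pattern; the operator inputs/outputs and argument names here are assumptions, not the exact schema:

```
from caffe2.python import core

init_net = core.Net("init")
# pre-quantize and pre-pack the weight once, at init time
init_net.Int8ConvPackWeight(["W"], ["W_packed"], engine="DNNLOWP_ACC16")

predict_net = core.Net("predict")
# the conv consumes the packed weight at inference time
predict_net.Int8Conv(["X", "W_packed", "b"], ["Y"],
                     kernel=3, engine="DNNLOWP_ACC16")
```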

Reviewed By: csummersea

Differential Revision: D13374662

fbshipit-source-id: dd02b9a4eb7af1fe208aa857fcd0b445e6e395af
2018-12-10 01:08:21 -08:00
e747acbebb Respect -q of setup.py (#14972)
Summary:
1. Changes the prints along the 'rebuild' pathway to respect the '-q' flag of setup.py.
A clean rebuild now prints only:

    [zdevito@devgpu172.prn2 /data/users/zdevito/pytorch] python setup.py -q rebuild develop
    [0/1] Install the project...
    -- Install configuration: "RelWithDebInfo"
    ninja: no work to do.
    ninja: no work to do.
    ninja: no work to do.
    ninja: no work to do.
    ninja: no work to do.
    ninja: no work to do.

2. Deletes apparently dead calls to `generate_code`. Now that CMake builds these files,
it appears that it is getting called twice and the second version is never used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14972

Reviewed By: soumith

Differential Revision: D13396330

Pulled By: zdevito

fbshipit-source-id: 83c45143bbc6a6d2c1cfee929291ec059f2b5dc3
2018-12-09 22:47:49 -08:00
fab8085111 _get_device_index supports parsing device strings
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14929

Reviewed By: weiyangfb

Differential Revision: D13394498

Pulled By: soumith

fbshipit-source-id: 948c6118abdf6c1e1a8a17709333954cafb2345e
2018-12-09 21:12:46 -08:00
5fd69e7551 remove mingfeima mkldnn reference from README, as no longer necessary (#14975)
Summary: we now get mkldnn automatically from third_party/ideep

Differential Revision: D13396480

Pulled By: soumith

fbshipit-source-id: 20f819ba4b78cbe9c7d0baeab1c575669cbf6c20
2018-12-09 20:44:10 -08:00
aefc83f46d fixing some rebuild issues (#14969)
Summary:
This fixes rebuild issues with the ninja part of the build. With this patch all ninja files will now report `nothing to do` if nothing has changed assuming `BUILD_CAFFE2_OPS=0`.

1. This only does the python file processing for caffe2 when BUILD_CAFFE2_OPS=1; this part of the build file is written in such a way that it always has to rerun, and it can take substantial time moving files around in the no-op build. In the future this part should be rewritten to use a faster method of copying the files, or should treat copying the files as part of the build rules and only run when the files are out of date.

2. This points `sleef` to a patched version that fixes a dead build output that is causing everything to relink all the time. See https://github.com/shibatch/sleef/pull/231#partial-pull-merging for the upstream change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14969

Reviewed By: soumith

Differential Revision: D13395998

Pulled By: zdevito

fbshipit-source-id: ca85b7be9e99c5c578103c144ef0f2c3b927e724
2018-12-09 16:32:19 -08:00
fc30e2782c Remove deprecated info argument in btrifact (#14935)
Summary:
As specified in title.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14935

Differential Revision: D13394449

Pulled By: soumith

fbshipit-source-id: 569d59414f3a1a43ea641bded4b5433eb53e3490
2018-12-09 15:59:30 -08:00
86e03b8a30 add fix for CUDA 10 (#14971)
Summary:
Linux binaries-only fix for CUDA10
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14971

Differential Revision: D13395932

Pulled By: soumith

fbshipit-source-id: a72d6ab6b98c6c936e6391d55d2e4e45b9f1e6dd
2018-12-09 15:54:27 -08:00
5f2736b84a Fix mismatched test_{full,ones,zeros}_like onnx expect files (#14956)
Summary:
master broken #14903
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14956

Differential Revision: D13395363

Pulled By: bddppq

fbshipit-source-id: 31f0913843292e557807fd5a976f8907fa6cae4b
2018-12-09 08:57:14 -08:00
a1494efdfa fix auto grad summing for IfOp where intermediate output needs renaming (#14772)
Summary:
fix auto grad summing for IfOp where an intermediate output needs renaming.

Bug before this diff:
- we only renamed the output of IfOp without changing the subnet ops' outputs
- this resulted in a "blob not found" error

The unit test provides an example; this diff fixes that for IfOp.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14772

Differential Revision: D13327090

Pulled By: harouwu

fbshipit-source-id: ec40ee88526ace3619c54551e223dd71158a02f8
2018-12-09 08:26:46 -08:00
fa12e1e4d4 Export ones_like, zeros_like and full_like using ONNX ConstantLike op. (#14903)
Summary:
This PR does the following:
1) Updates the ONNX export for `torch.zeros_like` and `torch.full_like` ops to use the ONNX op `ConstantLike`. This reduces the export of the experimental op `ConstantFill`, which may be removed in the future (see https://github.com/onnx/onnx/pull/1434).
2) It also adds export support for `torch.ones_like`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14903

Differential Revision: D13383700

Pulled By: houseroad

fbshipit-source-id: 566d00a943e9497172fcd5a034b638a650ab13a2
2018-12-08 22:49:02 -08:00
517c7c9861 Canonicalize all includes in PyTorch. (#14849)
Summary:
Anywhere we used #include "foo.h", we now say #include <foo.h>
Paths are adjusted to be rooted out of aten/src, torch/lib, or
the root level directory.

I modified CMakeLists.txt by hand to remove TH and THC from
the include paths.

I used the following script to do the canonicalization:

```
  import subprocess
  import re
  import os.path

  files = subprocess.check_output(['git', 'ls-files']).decode('utf-8').rstrip().split('\n')
  for fn in files:
      if not any(fn.endswith(suff) for suff in ['.cu', '.cpp', '.in', '.h', '.hpp', '.cu', '.cuh', '.cc']):
          continue
      if not any(fn.startswith(pref) for pref in ["aten/", "torch/"]):
          continue
      with open(fn, 'r') as f:
          c = f.read()
      def fmt(p):
          return "#include <{}>".format(p)
      def repl(m):
          p = m.group(1)
          if p in ["dlfcn.h", "unistd.h", "nvrtc.h", "cuda.h", "cuda_runtime.h", "cstdint", "cudnn.h", "Python.h", "cusparse.h", "cuda_runtime_api.h", "cuda_fp16.h", "cublas_v2.h", "stdint.h", "curand_kernel.h"]:
              return fmt(p)
          if any(p.startswith(pref) for pref in ["torch/csrc", "c10/", "ATen/", "caffe2/", "TH/", "THC/", "Eigen/", "gtest/", "zdl/", "gloo/", "onnx/", "miopen/"]):
              return fmt(p)
          for root in ["aten/src", "torch/lib", ""]:
              for bad_root in [os.path.dirname(fn), "aten/src/TH", "aten/src/THC", "torch/csrc"]:
                  new_p = os.path.relpath(os.path.join(bad_root, p), root)
                  if not new_p.startswith("../") and (os.path.exists(os.path.join(root, new_p)) or os.path.exists(os.path.join(root, new_p + ".in"))):
                      return fmt(new_p)
          print("ERROR: ", fn, p)
          return m.group(0)
      new_c = re.sub(r'#include "([^"]+)"', repl, c)
      if new_c != c:
          print(fn)
          with open(fn, 'w') as f:
              f.write(new_c)
```

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14849

Reviewed By: dzhulgakov

Differential Revision: D13363445

Pulled By: ezyang

fbshipit-source-id: 52361f878a672785f9306c9e9ab2513128092b68
2018-12-08 19:38:30 -08:00
a7b3197b2d race condition fix of calling mutable_data inside a openmp region (#14921)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14921

Fix race condition introduced in D13188595.
Let's remind ourselves: "never call mutable_data from an OpenMP region!!!"

Reviewed By: jianyuh

Differential Revision: D13387692

fbshipit-source-id: 6a3aeedeeda55a9ede660de8f1f44d4eee76ae2b
2018-12-08 18:17:20 -08:00
e9db9595d2 Add crop argument, can crop rec as well, first resize and then crop
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14894

Reviewed By: llyfacebook

Differential Revision: D13377604

Pulled By: sf-wind

fbshipit-source-id: 333d0d864e6c2dc85f405baa25ed58029d62750f
2018-12-08 11:14:56 -08:00
b0909ea6a0 Switch Int8Sigmoid to QNNPACK (#14883)
Summary:
50x-100x speedup compared to current version.
Also, fixes a bug in the current version when batch size exceeds 1 (current version processes only the first image in this case).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14883

Differential Revision: D13390655

Pulled By: Maratyszcza

fbshipit-source-id: 1b33a97bf2d0866d38faa2b42e64fd2859017898
2018-12-08 02:47:29 -08:00
5e06fa0baf ONNX changes to use int32_t (instead of enum) to store data type
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14926

Reviewed By: houseroad

Differential Revision: D13390642

Pulled By: bddppq

fbshipit-source-id: c2314b24d9384f188fda2b9a5cc16465ad39581e
2018-12-08 01:06:08 -08:00
c8a5ec14dd Remove at references from c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14432

Reviewed By: dzhulgakov

Differential Revision: D13223904

fbshipit-source-id: 43b06e33e088e7789ccea6d92267936fe30d8571
2018-12-08 00:28:35 -08:00
25110d61fb Implement std for multiple dimensions on CPU devices. (#14535)
Summary:
Tested on a tensor with 1 billion elements and 3 dimensions on a powerful, highly
multi-core Linux machine.

parallelized: All operations (e.g., `t.std(1)`) that could be done in the old code are now several times faster. All
new operations (e.g., `t.std((0,2))`) are significantly faster than the NumPy equivalents.
`t.std((0, 1, 2))`, a new operation, is logically equivalent to the
old `t.std()`, but faster.

serial: The above comment about old operations now being faster still
holds, but `t.std((d1, ..., dn))` over all dimensions is now a few
times slower than `t.std()`. If this turns out to be important, we can
special-case that to use the old algorithm.

The approach is to create a new method, `TensorIterator::foreach_reduced_elt`,
valid for `TensorIterator`s that represent a dimension reduction. This
method calls a supplied function for each element in the output,
supplying it with the input elements that correspond to that output.

Given that primitive, we can implement reductions like the following pseudocode:

If there is more than one output element:
```
PARALLEL FOR EACH element IN output:
    accumulator = identity
    SERIAL FOR EACH data_point IN element.corresponding_input:
        accumulator.update(data_point)
    element = accumulator.to_output()
```

If there is only one output element, we still want to parallelize, so we
do so along the *input* instead:

```
accumulators[n_threads]
PARALLEL FOR EACH input_chunk IN input.chunks():
    accumulators[thread_num()] = identity
    SERIAL FOR EACH data_point IN input_chunk:
        accumulators[thread_num()].update_with_data(data_point)
accumulator = identity
SERIAL FOR EACH acc in accumulators:
    accumulator.update_with_other_accumulator(acc)
output_element = accumulator.to_output()
```

Note that accumulators and data points do not have to be the same type
in general, since it might be necessary to track arbitrary amounts of
data at intermediate stages.

For example, for `std`, we use a parallel version of Welford's
algorithm, which requires us to track the mean, second moment, and number
of elements, so the accumulator type for `std` contains three pieces of
data.
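
For concreteness, here is a small Python sketch of such a Welford accumulator, including the merge step used when combining per-thread accumulators (illustrative only; the real implementation is in C++):

```
class Welford(object):
    def __init__(self):
        self.mean, self.m2, self.n = 0.0, 0.0, 0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other):
        # Chan et al. parallel combination of two accumulators
        n = self.n + other.n
        if n == 0:
            return
        delta = other.mean - self.mean
        self.mean = (self.n * self.mean + other.n * other.mean) / n
        self.m2 += other.m2 + delta * delta * self.n * other.n / n
        self.n = n

    def std(self, unbiased=True):
        # caller must ensure n > 1 for the unbiased estimate
        return (self.m2 / (self.n - 1 if unbiased else self.n)) ** 0.5
```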
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14535

Differential Revision: D13283887

Pulled By: umanwizard

fbshipit-source-id: 8586b7bf00bf9f663c55d6f8323301e257f5ec3f
2018-12-07 20:16:04 -08:00
c2a75926ca Add CAFFE2_API to video processing functions (#14900)
Summary:
Extracted from https://github.com/pytorch/pytorch/pull/13733

Some tests were failing because these methods didn't have an export.

cc pjh5 yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14900

Reviewed By: pjh5

Differential Revision: D13381130

Pulled By: orionr

fbshipit-source-id: 030536f8fb09765c09a7b0bd45400161053f2e18
2018-12-07 19:55:21 -08:00
52942e1f09 Enable unit tests known to work on ROCm (#14011)
Summary:
* Enable unit tests known to work on ROCm.
* Disable a few that are known to be flaky for the time being.
* Use std::abs for Half
* No more special casing for ROCm in TensorMathReduce
* Document an important detail for a hardcoded block size w.r.t. ROCm in TensorMathReduce

ezyang bddppq for awareness
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14011

Differential Revision: D13387679

Pulled By: bddppq

fbshipit-source-id: 4177f2a57b09d866ccbb82a24318f273e3292f71
2018-12-07 18:57:32 -08:00
5be28ade66 Automatic update of fbcode/onnx to aca8473a40cf43f01958c81b648efcee7f3a755a (#14865)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14865

Previous import was 42804705bdbf179d1a98394008417e1392013547

Included changes:
- **[aca8473](https://github.com/onnx/onnx/commit/aca8473)**: Add Erf operator for computing error function (#1675) <bddppq>
- **[3fc82ca](https://github.com/onnx/onnx/commit/3fc82ca)**: Add IsNaN operator. (#1656) <Pranav Sharma>
- **[0685f01](https://github.com/onnx/onnx/commit/0685f01)**: Add Sign Op (#1658) <Rui Zhu>
- **[2a8fae8](https://github.com/onnx/onnx/commit/2a8fae8)**: Fix unused var warning (#1669) <Yinghai Lu>
- **[e212833](https://github.com/onnx/onnx/commit/e212833)**: Update scan (#1653) <G. Ramalingam>

Reviewed By: zrphercule

Differential Revision: D13370727

fbshipit-source-id: 13a93d5acc8d4758f682278ea162ec9124ced22d
2018-12-07 17:37:42 -08:00
11a9248d01 Enable fp16 for MIOPEN operators in Caffe2 (#14905)
Summary:
This PR enables fp16 MIOPEN operators in Caffe2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14905

Differential Revision: D13383439

Pulled By: bddppq

fbshipit-source-id: 840afa8d08bef2952ca0039dee2423f1542bb330
2018-12-07 17:26:44 -08:00
70598740ec Upgrade MKL-DNN to version 0.17 (#14308)
Summary:
upgrade MKL-DNN to version 0.17
update mkldnn bridge to latest.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14308

Differential Revision: D13383102

Pulled By: yinghai

fbshipit-source-id: c434f0e0ddff2ee2c86db2d6c44a37298fd005a3
2018-12-07 16:44:50 -08:00
478eb70c07 Fix build with OpenCV 4.0 (#14356)
Summary:
Fixes #14355
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14356

Differential Revision: D13356237

Pulled By: bddppq

fbshipit-source-id: 2bf6ee21995c2c7b617c4e78ea7341f975f1b937
2018-12-07 16:40:31 -08:00
4453a1ff88 Remove unused TensorImpl dependencies
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14792

Reviewed By: ezyang

Differential Revision: D13336843

fbshipit-source-id: 12f84799a70c2e90a8b934dd8dc031c09a6782f0
2018-12-07 16:23:48 -08:00
65aa11a876 Remove TensorImpl -> context_base dependency (#14658)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14658

Remove this dependency by moving at::CopyBytes to c10.
The implementations for at::CopyBytes will have to live in aten/caffe2 for now because they're not unified for CUDA yet.
They'll be moved into c10/backend/xxx later.

Reviewed By: dzhulgakov

Differential Revision: D13288655

fbshipit-source-id: 1c92379345308b3cd39a402779d7b7999613fc0d
2018-12-07 16:23:46 -08:00
086a37876b Fix include paths for TensorOptions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14747

Reviewed By: ezyang

Differential Revision: D13318645

fbshipit-source-id: f5ba77a93f6019fbf5faffb47a2837c95fad474d
2018-12-07 16:23:44 -08:00
459aac4f24 Update graph printouts in JIT docs (#14914)
Summary:
Tracing records variable names and we have new types and stuff in the IR, so this updates the graph printouts in the docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14914

Differential Revision: D13385101

Pulled By: jamesr66a

fbshipit-source-id: 6477e4861f1ac916329853763c83ea157be77f23
2018-12-07 15:08:53 -08:00
5734e96775 Improve hub documentation (#14862)
Summary:
Added a few examples and explanations of how to publish/load models.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14862

Differential Revision: D13384790

Pulled By: ailzhang

fbshipit-source-id: 008166e84e59dcb62c0be38a87982579524fb20e
2018-12-07 14:59:01 -08:00
65da7ddad6 USE_FBGEMM=True by default
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14868

Differential Revision: D13383390

Pulled By: jamesr66a

fbshipit-source-id: 1880c07dfd239e19153bd4fde2ab2c8d0604f956
2018-12-07 14:22:55 -08:00
a0ee3a279c USE_TENSORRT support and TensorRT 5 compatibility
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13945

Differential Revision: D13317525

Pulled By: yinghai

fbshipit-source-id: 8630dfec1bbc5aac19539e344e7c38a7fd8b051d
2018-12-07 14:01:11 -08:00
febc7ff99f Add __init__.py so files get picked up on install (#14898)
Summary:
This will let us install tests and other Caffe2 python code as a part of running Caffe2 tests in PyTorch.

Broken out of https://github.com/pytorch/pytorch/pull/13733/

cc pjh5 yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14898

Reviewed By: pjh5

Differential Revision: D13381123

Pulled By: orionr

fbshipit-source-id: 0ec96629b0570f6cc2abb1d1d6fce084e7464dbe
2018-12-07 13:40:23 -08:00
efc5e9f71a Replace calls of Type::_th_tensor. (#14877)
Summary:
_th_tensor is moving off Type, so these calls need to be replaced.

Unfortunately, replacing these with a full-fledged solution [e.g. from_storage(..., TensorOptions)] is a bit complicated because the storage itself fully defines the Type (modulo variable).  It's simpler to just wait for the Variable/Tensor merge rather than to solve this now, so instead I changed the call sites to: at::empty({0}, type.options()).set_(storage...).

This isn't great because we are also trying to get rid of Type::options, but this seems to be the lesser of two evils.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14877

Differential Revision: D13374310

Pulled By: gchanan

fbshipit-source-id: eb953ed041507e6190d6f32e383912e5a08311cd
2018-12-07 13:04:48 -08:00
d6c53328f9 Large scale fix of python-related files in torch/csrc/
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14515

Differential Revision: D13247966

Pulled By: goldsborough

fbshipit-source-id: 7a127c508fc576a7a92626dd6b729f660162d628
2018-12-07 13:04:46 -08:00
939877bf4b Implementation of WeightedSum op for mkl-dnn and fix FC op output shape issue.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14407

Reviewed By: yinghai

Differential Revision: D13364364

Pulled By: wesolwsk

fbshipit-source-id: e69bcd1bc52e35b2f0e45e5dc40184f1bd66605d
2018-12-07 12:35:19 -08:00
265b55d028 Revert D13205604: Move numa.{h, cc} to c10/util
Differential Revision:
D13205604

Original commit changeset: 54166492d318

fbshipit-source-id: 89b6833518c0b554668c88ae38d97fbc47e2de17
2018-12-07 10:01:25 -08:00
1c9df7facf Expose torch.roll function and method (#14880)
Summary: Fixes #14859.
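
Example usage (elements shifted past the last position wrap around to the first):

```
import torch

x = torch.arange(6).view(2, 3)
torch.roll(x, shifts=1, dims=1)
# tensor([[2, 0, 1],
#         [5, 3, 4]])
```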

Differential Revision: D13376915

Pulled By: zou3519

fbshipit-source-id: f1fc0e8492a159431a3fc0a19a41aa10429ecc80
2018-12-07 07:42:47 -08:00
6651fae827 Make autograd engine compatible with hip
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14873

Differential Revision: D13375053

Pulled By: bddppq

fbshipit-source-id: f3051640386667bbf0566856ed433eb83276c39e
2018-12-07 00:12:06 -08:00
6e453e56f9 Fixed ConvT docstring (#14876)
Summary:
Fixes #14099

I attempted to be as consistent as possible with the formatting, which is why my equation reads d*(k - 1) instead of (k - 1)*d.

Also there is an unused variable on line 46: `n = self.in_channels`. I could fix that here too if that's not too out of scope.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14876

Differential Revision: D13374317

Pulled By: soumith

fbshipit-source-id: a9f110acafa58cdb4206956dbe3ab4738d48292d
2018-12-06 23:57:30 -08:00
51d26e76f7 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 7da015701f18f8a0b5a8092aae02a42ede7bfd44
2018-12-06 22:52:22 -08:00
4655b7bc4b Remove weak module test expect files (#14871)
Summary:
This PR removes some expect files that aren't really testing anything
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14871

Differential Revision: D13373762

Pulled By: driazati

fbshipit-source-id: e3537ee83df23b3b3b854f9b1253fd0cc8e9dd33
2018-12-06 21:55:12 -08:00
1a247f872f gradcheck (#14596)
Summary:
- allow gradcheck to take sparse tensors as input
- sparse outputs are not allowed yet in gradcheck
- add backward for `to_dense()` to get around sparse outputs
- call gradcheck from test_sparse, so that we can use `_gen_sparse()` and also easily cover coalesced/uncoalesced test cases (see the sketch below)
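
A minimal sketch of the pattern this enables (a small COO tensor in double precision so the numerical checks pass; exact invocation details may differ by version):

```
import torch
from torch.autograd import gradcheck

i = torch.tensor([[0, 1], [1, 0]])
v = torch.tensor([3.0, 4.0], dtype=torch.double)
x = torch.sparse_coo_tensor(i, v, (2, 2), requires_grad=True)

# route through to_dense() since sparse outputs are not supported yet
assert gradcheck(lambda t: t.to_dense(), (x,))
```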
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14596

Differential Revision: D13271904

Pulled By: weiyangfb

fbshipit-source-id: 5317484104404fd38058884c86e987546011dd86
2018-12-06 18:03:38 -08:00
bfa666eb0d Skipping two c10d tests only if there are multi-GPUs (#14860)
Summary:
Otherwise, these tests will fail, even though they are never meant to run on single-GPU machines.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14860

Differential Revision: D13369060

Pulled By: teng-li

fbshipit-source-id: 8a637a6d57335491ba8602cd09927700b2bbf8a0
2018-12-06 17:28:07 -08:00
ada8f828f9 Move TensorOptions, DefaultTensorOptions to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14746

Reviewed By: ezyang

Differential Revision: D13318644

fbshipit-source-id: b703d7dc67e75d9e9571c80d62a100c5fc4e84df
2018-12-06 15:59:04 -08:00
bd3eb87258 Switch Int8MaxPool operator to QNNPACK (#14832)
Summary:
1.6-2.4X speedup on ARM when compiled with gcc
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14832

Differential Revision: D13358160

Pulled By: Maratyszcza

fbshipit-source-id: 39e9791886fac62650bb53a9df341889f0bb5d49
2018-12-06 15:14:28 -08:00
e6a420114f collect_env.py: get conda magma and mkl information (#14854)
Summary:
Fixes #12371
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14854

Differential Revision: D13363635

Pulled By: zou3519

fbshipit-source-id: f8b5d05038bf5ce451399dfeed558ae298178128
2018-12-06 14:58:14 -08:00
ddca0442b6 Add LogSigmoid support in ONNX symbolic (#14830)
Summary:
Add LogSigmoid:

torch.LogSigmoid(x) = onnx.Log(onnx.Sigmoid(x))
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14830

Differential Revision: D13353891

Pulled By: zrphercule

fbshipit-source-id: bf456170b9e6c4edad07b3333cd5797f8e0fa97f
2018-12-06 14:17:33 -08:00
5f0bff9639 Kill GPU memory logs in normal runs (#14838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14838

The GPU memory tracking logs are incredibly annoying and merely serve
to pollute output. I `VLOG(1)`ed them. Hopefully, this is non-controversial.

Reviewed By: kuttas

Differential Revision: D13343290

fbshipit-source-id: b3cae99346c97b66e97ea660061e15dc5c99b9fc
2018-12-06 13:51:14 -08:00
f82f4de229 Stop inserting static casts in Hipify (#14853)
Summary:
Latest hcc can now properly cast to the correct type internally, so there is no need to insert static_cast in the hipify scripts anymore.
However, the hcc included in the latest ROCm release (1.9.2) doesn't have this fix, so we leave a flag to continue doing static_cast for those using the official ROCm releases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14853

Differential Revision: D13363171

Pulled By: bddppq

fbshipit-source-id: a36476a8511222ff3c933d31788e8a0ffb04f5ca
2018-12-06 13:19:33 -08:00
b5db6ac9f1 Tensor construction codemod - 3/3 (#14835)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14835

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: bddppq

Differential Revision: D13335184

fbshipit-source-id: 26d8247e16b30bdff045530034af9b72c76d066f
2018-12-06 11:50:59 -08:00
20d1bff292 Tensor construction codemod - 1/3 (#14828)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14828

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: bddppq

Differential Revision: D13335160

fbshipit-source-id: a3ae4c5a86bfbdaf2d5aa14e0eef57255e829fd4
2018-12-06 11:47:32 -08:00
1d111853ae Move numa.{h, cc} to c10/util (#14393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14393

att

Reviewed By: ezyang

Differential Revision: D13205604

fbshipit-source-id: 54166492d31827b0343ed070cc36a825dd86e2ed
2018-12-06 11:30:13 -08:00
75a2d8e2de Upgrade CI to ROCm 1.9.2 (#14216)
Summary:
Drop custom hcc/hip as the 1.9.2 release should contain the relevant patches therein.

Most notable feature in 1.9.2 is mixed precision support in rocBLAS and MIOpen. These features will be enabled by subsequent PRs.

bddppq ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14216

Differential Revision: D13354294

Pulled By: bddppq

fbshipit-source-id: 2541d4a196af21c9432c1aff7f6e65b572628028
2018-12-06 10:13:39 -08:00
1c8d41a08d Allow linspace and logspace with steps=1 and start != end like numpy (#14748)
Summary:
`torch.linspace(0, 1, 1)` fails with `RuntimeError: invalid argument 3: invalid number of points at ../aten/src/TH/generic/THTensorMoreMath.cpp:2119`, while `np.linspace(0, 1, 1)` works fine.
Looking at the code, there is even a comment by gchanan asking: "NumPy allows you to pass different points even if n <= 1 -- should we?"
I would say "yes". Currently, I would need to handle the case of `steps == 1` or `steps == 0` separately, making sure to change the `end` when calling `torch.linspace`. This is impractical. If we support `start != end`, there are two possibilities for the result: Either we ensure the first value in the resulting sequence always equals `start`, or we ensure the last value in the resulting sequence always equals `end`. Numpy chose the former, which also allows it to support a boolean `endpoint` flag. I'd say we should follow numpy.

This PR adapts `linspace` and `logspace` to mimic the behavior of numpy, adapts the tests accordingly, and extends the docstrings to make clear what happens when passing `steps=1`.

If you decide against this PR, the error message should become explicit about what I did wrong, and the documentation should be extended to mention this restriction.
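
After this change, mirroring NumPy, the first value of the sequence equals `start` when `steps=1`:

```
import torch

torch.linspace(0, 1, steps=1)  # tensor([0.])
torch.logspace(0, 3, steps=1)  # tensor([1.])
```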
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14748

Differential Revision: D13356136

Pulled By: ezyang

fbshipit-source-id: db85b8f0a98a5e24b3acd766132ab71c91794a82
2018-12-06 09:30:55 -08:00
Jie
d2fdc33411 (#14580)
Summary:
Removes the cast of half to float in torch.sum with a float16 input tensor and
float32 output tensor; instead we cast the data when loading the input in the kernel.

This supposedly saves a kernel launch as well as a full global memory load
of the promoted data type (float).
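
The promoted-dtype path in question, for illustration (assumes a CUDA device):

```
import torch

x = torch.randn(1024, device='cuda', dtype=torch.float16)
s = x.sum(dtype=torch.float32)  # fp16 input, fp32 accumulation and output
```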
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14580

Differential Revision: D13356203

Pulled By: ezyang

fbshipit-source-id: 85e91225b880a65fe3ceb493371b9b36407fdf48
2018-12-06 09:03:46 -08:00
eb3cabffd6 Consistent formatting in losses' docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14739

Differential Revision: D13356143

Pulled By: ezyang

fbshipit-source-id: 9ae8316dd8ba6e910247b64cec22db63df10e11c
2018-12-06 09:01:24 -08:00
2e7cc86a62 Add (partial) autodiff support for nll_loss (#14305)
Summary:
Not ready yet; I need some comments/help with this. It's good enough for the immediate goals of https://github.com/pytorch/xla (forward + backward trace fusion), but there are at least two issues with it:

1. If we don't allow it, `test/test_jit.py` fails to cover the change.
2. If we allow the weight to be set, running `test/test_jit.py TestJitGenerated.test_nn_nll_loss` fails with:

```
======================================================================
ERROR: test_nn_nll_loss (__main__.TestJitGenerated)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_jit.py", line 10001, in do_test
    fn, f_args_variable, kwargs_variable, no_grad=no_grad)
  File "test/test_jit.py", line 9360, in check_against_reference
    outputs_test = self.runAndSaveRNG(func, recording_inputs, kwargs)
  File "test/test_jit.py", line 425, in runAndSaveRNG
    results = func(*inputs, **kwargs)
  File "test/test_jit.py", line 9298, in script_fn
    self.assertExportImport(CU.the_method.graph, tensors)
  File "test/test_jit.py", line 415, in assertExportImport
    self.assertExportImportModule(m, inputs)
  File "test/test_jit.py", line 419, in assertExportImportModule
    self.assertEqual(self.runAndSaveRNG(m.forward, inputs),
  File "test/test_jit.py", line 425, in runAndSaveRNG
    results = func(*inputs, **kwargs)
RuntimeError:
arguments for call are not valid:

  for operator aten::nll_loss_backward(Tensor grad_output, Tensor self, Tensor target, Tensor? weight, int reduction, int ignore_index, Tensor total_weight, *, Tensor out) -> Tensor:
  expected a value of type Tensor for argument 'total_weight' but found bool
  <internally-created-node>
  ~ <--- HERE

  for operator aten::nll_loss_backward(Tensor grad_output, Tensor self, Tensor target, Tensor? weight, int reduction, int ignore_index, Tensor total_weight) -> Tensor:
  expected a value of type Tensor for argument 'total_weight' but found bool
  <internally-created-node>
  ~ <--- HERE
for call at:
<internally-created-node>
~ <--- HERE
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14305

Differential Revision: D13356265

Pulled By: ezyang

fbshipit-source-id: 504d783b2d87f923e698a6a4efc0fd9935a94a41
2018-12-06 08:58:54 -08:00
e7bd8457a6 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 2adbb6f97d4b8f067a2538fec855063510b0ca3f
2018-12-06 08:58:53 -08:00
6039c7611f Updating submodules
Reviewed By: yns88

fbshipit-source-id: e0509413215f3b7578b825c52365fec4da625bd5
2018-12-06 02:55:47 -08:00
12addc64a6 Fixed MIOpen RNN Segfault issue and enabled RNN test (#14810)
Summary:
This pull request contains changes for:
1. Added MIOpen RNN API miopenGetRNNLayerBiasSize and miopenGetRNNLayerParamSize.
2. Fixed usage of API miopenGetRNNLayerParam.
3. Modifying the RNN test to run using MIOpen engine.

Differential Revision: D13355699

Pulled By: bddppq

fbshipit-source-id: 6f750657f8049c5446eca893880b397804120b69
2018-12-05 23:54:31 -08:00
39d50ef4f6 Export complete subgraph io info when calling onnxGetBackendCompatibility (#14827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14827

We need to send complete IO info when doing `onnxGetBackendCompatibility` to backends like Glow. Previously we were missing some info because sometimes we generate more than one node from one C2 op. This fixes the issue.

Reviewed By: jackm321

Differential Revision: D13352049

fbshipit-source-id: 8d8ac70656a0ac42f3a0ccecad61456a4f3b2435
2018-12-05 23:52:06 -08:00
ba287eebca Fix clip gradient with empty input (#14709)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14709

As titled

Reviewed By: Wakeupbuddy

Differential Revision: D13305554

fbshipit-source-id: 380062d4b0e4f9dc0207a27766cac7b8d05384d5
2018-12-05 22:53:25 -08:00
997df9a6ec Remove protobuf dependency in pytorch cmake file. (#14182)
Summary:
Currently, PyTorch doesn't depend on protobuf, so we don't need to include the protobuf dir in the PyTorch CMake file.
Moreover, if we build Caffe2 without the custom protobuf[1], we run into a protobuf mismatch problem.

[1]
92dbd0219f/CMakeLists.txt (L65)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14182

Differential Revision: D13356273

Pulled By: ezyang

fbshipit-source-id: 8120c3452d158dc51d70156433d7b9076c6aed47
2018-12-05 22:49:50 -08:00
3799d32b7b Optimize images (#14084)
Summary:
This is a PR that [ImgBot](https://imgbot.net/) opened on my fork https://github.com/zasdfgbnm/pytorch/pull/1; I am forwarding it here. ImgBot performs lossless compression on images to reduce file size.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14084

Differential Revision: D13356293

Pulled By: ezyang

fbshipit-source-id: 731236d95ad870db8ccb99b03ed306704365242c
2018-12-05 22:46:32 -08:00
e27d77815d Prevent profile_observer_test from being run by CPU test (#14168)
Summary:
Fix CMakeLists.txt so that the CPU test suite won't run profile_observer_test.cc, which currently only supports GPU.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14168

Differential Revision: D13356274

Pulled By: ezyang

fbshipit-source-id: 7d105f2e18675e5fab129864958148b0f18d582c
2018-12-05 22:34:29 -08:00
14fb651b5f CAFFE2_INCLUDE_DIRS points to invalid path (#14306)
Summary:
I know that including CAFFE2_INCLUDE_DIRS in the include paths is not necessary with newer CMake versions. But I had this in one of my old projects, and **CMake gave me an error that "/usr/lib/include" is an invalid path**.

It seems like "${_INSTALL_PREFIX}/lib/include" should be changed to "${_INSTALL_PREFIX}/include", as all Caffe2 headers are in /include rather than /lib/include/.

Please correct me if I am wrong.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14306

Differential Revision: D13356246

Pulled By: ezyang

fbshipit-source-id: e2d5d3c42352e59b245714ad90fd7a9ef48170d7
2018-12-05 22:32:04 -08:00
5e307bd1be use "Extension" instead of the unimported "setuptools.Extension" (#14475)
Summary:
use "Extension" instead of the unimported "setuptools.Extension"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14475

Differential Revision: D13356219

Pulled By: ezyang

fbshipit-source-id: 5a3e7eb73a32d6bf09676efd9eddded5586435cd
2018-12-05 22:18:47 -08:00
d393dd0744 generate ATen core files with LF. (#14667)
Summary:
On Windows, some ATen core files (Type.h, Tensor.h, TensorMethods.h) are generated with CRLF line endings (this may be environment dependent).
As a result, the file comparison in generate_outputs() fails and compilation stops.
This patch forces these files to be generated with LF line endings.
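As an illustration (not the patch itself), Python's `newline` argument is one way to force LF when writing generated files:

```python
# Hypothetical sketch: newline="\n" forces LF line endings even on
# Windows, so a regenerated file compares byte-for-byte with the
# previously generated one.
generated_source = "// generated ATen header\n"
with open("Type.h", "w", newline="\n") as f:
    f.write(generated_source)
```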
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14667

Differential Revision: D13356170

Pulled By: ezyang

fbshipit-source-id: ef8cc3a6cc8bf3c45b78e9eb3df98cf47c0d33bb
2018-12-05 22:14:29 -08:00
2d60afbc90 Remove outdated css file and refs in cpp conf.py (#14779)
Summary:
pytorch_theme.css is no longer necessary for the cpp or html docs site build. The new theme styles are located at https://github.com/pytorch/pytorch_sphinx_theme. The Lato font is also no longer used in the new theme.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14779

Differential Revision: D13356125

Pulled By: ezyang

fbshipit-source-id: c7635eb7512c7dcaddb9cad596ab3dbc96480144
2018-12-05 21:55:45 -08:00
82903dda9b Fixes for some Windows compiler warnings (#14490)
Summary:
Implement some simple fixes to clean up the Windows build by fixing compiler warnings. Three main types of warnings were fixed:

1. GCC-specific pragmas were changed to not be used on Windows.
2. CMake flags that don't exist on Windows were removed from the Windows build.
3. A macro that was defined multiple times on Windows was fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14490

Differential Revision: D13241988

Pulled By: ezyang

fbshipit-source-id: 38da8354f0e3a3b9c97e33309cdda9fd23c08247
2018-12-05 21:27:07 -08:00
a6399121da Shut up "address will always evaluate to 'true'" warnings (#14774)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14774

Differential Revision: D13327969

Pulled By: ezyang

fbshipit-source-id: 43380c89eedaaa89467952401b8fd3f5a9ad754a
2018-12-05 21:18:31 -08:00
f9446e0c94 HIPify less files in PyTorch (#14804)
Summary:
Stacked on #14803
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14804

Differential Revision: D13347986

Pulled By: ezyang

fbshipit-source-id: c93177b4ad51855660d0de36d042bfc542bd4be0
2018-12-05 20:52:38 -08:00
ba0ebe33c1 Unify device argument parsing between torch and c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14786

Differential Revision: D13334501

Pulled By: bddppq

fbshipit-source-id: ae3536be1fe0dcd6a1552ec93629ecc9554c0d7c
2018-12-05 18:37:32 -08:00
252e9058d4 Improve assertion failure message (#14813)
Summary:
See #14554.

I can't figure out how the reported issue can happen. The next best
thing is to have more information when this happens again.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14813

Differential Revision: D13351908

Pulled By: pietern

fbshipit-source-id: 61b30fcae2e34da54329d0893ca4921b6ad60f0d
2018-12-05 17:20:25 -08:00
83ad52634a Add FunctionSchema based Operator Registry (#13789)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13789

This enables creation of operators with FunctionSchema and IValue

Reviewed By: smessmer

Differential Revision: D13008791

fbshipit-source-id: 151efc88ac315f4a0ab0171a99774caaf767ef1e
2018-12-05 17:20:24 -08:00
67dcf10631 Increase test timeout (#14814)
Summary:
It is possible that some sort of contention causes process scheduling
delays, which in turn cause the timeout to *not* be hit.

Increased sleep here will decrease the probability of this happening.

Fixes #14555.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14814

Differential Revision: D13351924

Pulled By: pietern

fbshipit-source-id: 1222cf0855408dfcb79f30f94694c790ee998cf9
2018-12-05 17:18:11 -08:00
c02b3e7cea Retry test on address already in use error (#14815)
Summary:
Thanks nairbv for the suggestion.

Also see #14589.

Fixes #14703.
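A rough sketch of the retry pattern (hypothetical names, not the PR's code):

```python
import errno
import socket
import time

def bind_with_retry(port, retries=3):
    # Retry when another process still holds the port (EADDRINUSE),
    # the flaky failure mode this PR works around in the tests.
    for attempt in range(retries):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind(("127.0.0.1", port))
            return s
        except OSError as e:
            s.close()
            if e.errno != errno.EADDRINUSE or attempt == retries - 1:
                raise
            time.sleep(0.5 * (attempt + 1))  # back off before retrying
```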
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14815

Differential Revision: D13351913

Pulled By: pietern

fbshipit-source-id: d11a4152505d0ce15592b13e417bb80551476a61
2018-12-05 17:09:46 -08:00
6fccca4278 improve ONNX tests on torch.Linear
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14821

Reviewed By: zrphercule

Differential Revision: D13348773

Pulled By: houseroad

fbshipit-source-id: 611ca6e28f715e5518649c8c16f702ac3433308c
2018-12-05 17:07:10 -08:00
1921 changed files with 41644 additions and 40585 deletions

View File

@ -1,7 +1,7 @@
# IMPORTANT: To update Docker image version, please search and update ":{previous_version}"
# in this file to the new version number, and **ALSO** update the version number below:
# PyTorchDockerVersion:262
# Caffe2DockerVersion:230
# Caffe2DockerVersion:238
docker_config_defaults: &docker_config_defaults
user: jenkins
@ -117,7 +117,7 @@ pytorch_linux_test_defaults: &pytorch_linux_test_defaults
<<: *setup_ci_environment
- run:
name: Test
no_output_timeout: "90m"
no_output_timeout: "1h"
command: |
set -e
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
@ -800,7 +800,7 @@ jobs:
caffe2_py2_cuda8_0_cudnn6_ubuntu16_04_build:
environment:
JOB_BASE_NAME: caffe2-py2-cuda8.0-cudnn6-ubuntu16.04-build
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda8.0-cudnn6-ubuntu16.04:230"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda8.0-cudnn6-ubuntu16.04:238"
CUDA_VERSION: "8"
BUILD_ENVIRONMENT: "py2-cuda8.0-cudnn6-ubuntu16.04"
<<: *caffe2_linux_build_defaults
@ -808,7 +808,7 @@ jobs:
caffe2_py2_cuda8_0_cudnn6_ubuntu16_04_test:
environment:
JOB_BASE_NAME: caffe2-py2-cuda8.0-cudnn6-ubuntu16.04-test
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda8.0-cudnn6-ubuntu16.04:230"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda8.0-cudnn6-ubuntu16.04:238"
CUDA_VERSION: "8"
BUILD_ENVIRONMENT: "py2-cuda8.0-cudnn6-ubuntu16.04"
resource_class: gpu.medium
@ -817,7 +817,7 @@ jobs:
caffe2_py2_cuda9_0_cudnn7_ubuntu16_04_build:
environment:
JOB_BASE_NAME: caffe2-py2-cuda9.0-cudnn7-ubuntu16.04-build
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda9.0-cudnn7-ubuntu16.04:230"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda9.0-cudnn7-ubuntu16.04:238"
CUDA_VERSION: "9"
BUILD_ENVIRONMENT: "py2-cuda9.0-cudnn7-ubuntu16.04"
<<: *caffe2_linux_build_defaults
@ -825,7 +825,7 @@ jobs:
caffe2_py2_cuda9_0_cudnn7_ubuntu16_04_test:
environment:
JOB_BASE_NAME: caffe2-py2-cuda9.0-cudnn7-ubuntu16.04-test
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda9.0-cudnn7-ubuntu16.04:230"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda9.0-cudnn7-ubuntu16.04:238"
CUDA_VERSION: "9"
BUILD_ENVIRONMENT: "py2-cuda9.0-cudnn7-ubuntu16.04"
resource_class: gpu.medium
@ -834,7 +834,7 @@ jobs:
caffe2_py2_cuda9_1_cudnn7_ubuntu16_04_build:
environment:
JOB_BASE_NAME: caffe2-py2-cuda9.1-cudnn7-ubuntu16.04-build
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda9.1-cudnn7-ubuntu16.04:230"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda9.1-cudnn7-ubuntu16.04:238"
CUDA_VERSION: "9.1"
BUILD_ENVIRONMENT: "py2-cuda9.1-cudnn7-ubuntu16.04"
<<: *caffe2_linux_build_defaults
@ -842,7 +842,7 @@ jobs:
caffe2_py2_cuda9_1_cudnn7_ubuntu16_04_test:
environment:
JOB_BASE_NAME: caffe2-py2-cuda9.1-cudnn7-ubuntu16.04-test
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda9.1-cudnn7-ubuntu16.04:230"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda9.1-cudnn7-ubuntu16.04:238"
CUDA_VERSION: "9.1"
BUILD_ENVIRONMENT: "py2-cuda9.1-cudnn7-ubuntu16.04"
resource_class: gpu.medium
@ -851,14 +851,14 @@ jobs:
caffe2_py2_mkl_ubuntu16_04_build:
environment:
JOB_BASE_NAME: caffe2-py2-mkl-ubuntu16.04-build
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-mkl-ubuntu16.04:230"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-mkl-ubuntu16.04:238"
BUILD_ENVIRONMENT: "py2-mkl-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_py2_mkl_ubuntu16_04_test:
environment:
JOB_BASE_NAME: caffe2-py2-mkl-ubuntu16.04-test
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-mkl-ubuntu16.04:230"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-mkl-ubuntu16.04:238"
BUILD_ENVIRONMENT: "py2-mkl-ubuntu16.04"
resource_class: large
<<: *caffe2_linux_test_defaults
@ -866,14 +866,14 @@ jobs:
caffe2_py2_gcc4_8_ubuntu14_04_build:
environment:
JOB_BASE_NAME: caffe2-py2-gcc4.8-ubuntu14.04-build
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-gcc4.8-ubuntu14.04:230"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-gcc4.8-ubuntu14.04:238"
BUILD_ENVIRONMENT: "py2-gcc4.8-ubuntu14.04"
<<: *caffe2_linux_build_defaults
caffe2_py2_gcc4_8_ubuntu14_04_test:
environment:
JOB_BASE_NAME: caffe2-py2-gcc4.8-ubuntu14.04-test
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-gcc4.8-ubuntu14.04:230"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-gcc4.8-ubuntu14.04:238"
BUILD_ENVIRONMENT: "py2-gcc4.8-ubuntu14.04"
resource_class: large
<<: *caffe2_linux_test_defaults
@ -881,14 +881,14 @@ jobs:
caffe2_onnx_py2_gcc5_ubuntu16_04_build:
environment:
JOB_BASE_NAME: caffe2-onnx-py2-gcc5-ubuntu16.04-build
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-gcc5-ubuntu16.04:230"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-gcc5-ubuntu16.04:238"
BUILD_ENVIRONMENT: "onnx-py2-gcc5-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_onnx_py2_gcc5_ubuntu16_04_test:
environment:
JOB_BASE_NAME: caffe2-onnx-py2-gcc5-ubuntu16.04-test
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-gcc5-ubuntu16.04:230"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-gcc5-ubuntu16.04:238"
BUILD_ENVIRONMENT: "onnx-py2-gcc5-ubuntu16.04"
resource_class: large
<<: *caffe2_linux_test_defaults
@ -896,7 +896,7 @@ jobs:
caffe2_py2_cuda8_0_cudnn7_ubuntu16_04_build:
environment:
JOB_BASE_NAME: caffe2-py2-cuda8.0-cudnn7-ubuntu16.04-build
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda8.0-cudnn7-ubuntu16.04:230"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda8.0-cudnn7-ubuntu16.04:238"
BUILD_ENVIRONMENT: "py2-cuda8.0-cudnn7-ubuntu16.04"
BUILD_ONLY: "1"
<<: *caffe2_linux_build_defaults
@ -904,7 +904,7 @@ jobs:
caffe2_py2_gcc4_9_ubuntu14_04_build:
environment:
JOB_BASE_NAME: caffe2-py2-gcc4.9-ubuntu14.04-build
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-gcc4.9-ubuntu14.04:230"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-gcc4.9-ubuntu14.04:238"
BUILD_ENVIRONMENT: "py2-gcc4.9-ubuntu14.04"
BUILD_ONLY: "1"
<<: *caffe2_linux_build_defaults
@ -912,7 +912,7 @@ jobs:
caffe2_py2_clang3_8_ubuntu16_04_build:
environment:
JOB_BASE_NAME: caffe2-py2-clang3.8-ubuntu16.04-build
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-clang3.8-ubuntu16.04:230"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-clang3.8-ubuntu16.04:238"
BUILD_ENVIRONMENT: "py2-clang3.8-ubuntu16.04"
BUILD_ONLY: "1"
<<: *caffe2_linux_build_defaults
@ -920,7 +920,7 @@ jobs:
caffe2_py2_clang3_9_ubuntu16_04_build:
environment:
JOB_BASE_NAME: caffe2-py2-clang3.9-ubuntu16.04-build
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-clang3.9-ubuntu16.04:230"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-clang3.9-ubuntu16.04:238"
BUILD_ENVIRONMENT: "py2-clang3.9-ubuntu16.04"
BUILD_ONLY: "1"
<<: *caffe2_linux_build_defaults
@ -928,7 +928,7 @@ jobs:
caffe2_py2_clang7_ubuntu16_04_build:
environment:
JOB_BASE_NAME: caffe2-py2-clang7-ubuntu16.04-build
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-clang7-ubuntu16.04:230"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-clang7-ubuntu16.04:238"
BUILD_ENVIRONMENT: "py2-clang7-ubuntu16.04"
BUILD_ONLY: "1"
<<: *caffe2_linux_build_defaults
@ -936,7 +936,7 @@ jobs:
caffe2_py2_android_ubuntu16_04_build:
environment:
JOB_BASE_NAME: caffe2-py2-android-ubuntu16.04-build
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-android-ubuntu16.04:230"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-android-ubuntu16.04:238"
BUILD_ENVIRONMENT: "py2-android-ubuntu16.04"
BUILD_ONLY: "1"
<<: *caffe2_linux_build_defaults
@ -944,14 +944,14 @@ jobs:
caffe2_py2_cuda9_0_cudnn7_centos7_build:
environment:
JOB_BASE_NAME: caffe2-py2-cuda9.0-cudnn7-centos7-build
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda9.0-cudnn7-centos7:230"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda9.0-cudnn7-centos7:238"
BUILD_ENVIRONMENT: "py2-cuda9.0-cudnn7-centos7"
<<: *caffe2_linux_build_defaults
caffe2_py2_cuda9_0_cudnn7_centos7_test:
environment:
JOB_BASE_NAME: caffe2-py2-cuda9.0-cudnn7-centos7-test
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda9.0-cudnn7-centos7:230"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda9.0-cudnn7-centos7:238"
CUDA_VERSION: "9.0"
BUILD_ENVIRONMENT: "py2-cuda9.0-cudnn7-centos7"
resource_class: gpu.medium

View File

@ -3,26 +3,29 @@
Checks: '
-*
,bugprone-*
,-bugprone-macro-parentheses
,-bugprone-forward-declaration-namespace
,-bugprone-macro-parentheses
,cppcoreguidelines-*
,-cppcoreguidelines-pro-bounds-array-to-pointer-decay
,-cppcoreguidelines-pro-type-static-cast-downcast
,-cppcoreguidelines-pro-bounds-pointer-arithmetic
,-cppcoreguidelines-pro-bounds-constant-array-index
,-cppcoreguidelines-pro-type-cstyle-cast
,-cppcoreguidelines-pro-type-reinterpret-cast
,-cppcoreguidelines-pro-type-vararg
,-cppcoreguidelines-special-member-functions
,-cppcoreguidelines-interfaces-global-init
,-cppcoreguidelines-owning-memory
,hicpp-signed-bitwise
,-cppcoreguidelines-pro-bounds-array-to-pointer-decay
,-cppcoreguidelines-pro-bounds-constant-array-index
,-cppcoreguidelines-pro-bounds-pointer-arithmetic
,-cppcoreguidelines-pro-type-cstyle-cast
,-cppcoreguidelines-pro-type-reinterpret-cast
,-cppcoreguidelines-pro-type-static-cast-downcast
,-cppcoreguidelines-pro-type-union-access
,-cppcoreguidelines-pro-type-vararg
,-cppcoreguidelines-special-member-functions
,hicpp-exception-baseclass
,hicpp-avoid-goto
,modernize-*
,-modernize-use-default-member-init
,-modernize-return-braced-init-list
,-modernize-use-auto
,-modernize-use-default-member-init
,-modernize-use-using
,performance-*
,-performance-noexcept-move-constructor
'
WarningsAsErrors: '*'
HeaderFilterRegex: 'torch/csrc/.*'

View File

1
.gitignore vendored
View File

@ -23,6 +23,7 @@ aten/build/
aten/src/ATen/Config.h
aten/src/ATen/cuda/CUDAConfig.h
build/
caffe2/cpp_test/
dist/
docs/src/**/*
docs/cpp/build

4
.gitmodules vendored
View File

@ -60,10 +60,10 @@
url = https://github.com/onnx/onnx.git
[submodule "third_party/onnx-tensorrt"]
path = third_party/onnx-tensorrt
url = https://github.com/onnx/onnx-tensorrt
url = https://github.com/bddppq/onnx-tensorrt
[submodule "third_party/sleef"]
path = third_party/sleef
url = https://github.com/shibatch/sleef
url = https://github.com/zdevito/sleef
[submodule "third_party/ideep"]
path = third_party/ideep
url = https://github.com/intel/ideep

23
.jenkins/caffe2/bench.sh Executable file
View File

@ -0,0 +1,23 @@
#!/bin/bash
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# Anywhere except $ROOT_DIR should work
cd "$INSTALL_PREFIX"
if [[ $BUILD_ENVIRONMENT == *-cuda* ]]; then
num_gpus=$(nvidia-smi -L | wc -l)
elif [[ $BUILD_ENVIRONMENT == *-rocm* ]]; then
num_gpus=$(rocminfo | grep 'Device Type.*GPU' | wc -l)
else
num_gpus=0
fi
cmd="$PYTHON $CAFFE2_PYPATH/python/examples/resnet50_trainer.py --train_data null --batch_size 64 --epoch_size 6400 --num_epochs 2"
if (( $num_gpus == 0 )); then
cmd="$cmd --use_cpu"
else
cmd="$cmd --num_gpus 1"
fi
eval "$cmd"

View File

@ -2,6 +2,14 @@
set -ex
# TODO: Migrate all centos jobs to use proper devtoolset
if [[ "$BUILD_ENVIRONMENT" == "py2-cuda9.0-cudnn7-centos7" ]]; then
# There is a bug in the pango package on Centos7 that causes undefined
# symbols, upgrading glib2 to >=2.56.1 solves the issue. See
# https://bugs.centos.org/view.php?id=15495
sudo yum install -y -q glib2-2.56.1
fi
pip install --user --no-cache-dir hypothesis==3.59.0
# The INSTALL_PREFIX here must match up with test.sh
@ -124,7 +132,24 @@ CMAKE_ARGS+=("-DCMAKE_INSTALL_PREFIX=${INSTALL_PREFIX}")
if [[ $BUILD_ENVIRONMENT == *mkl* ]]; then
CMAKE_ARGS+=("-DBLAS=MKL")
CMAKE_ARGS+=("-DUSE_MKLDNN=ON")
fi
if [[ $BUILD_ENVIRONMENT == py2-cuda9.0-cudnn7-ubuntu16.04 ]]; then
# removing http:// duplicate in favor of nvidia-ml.list
# which is https:// version of the same repo
sudo rm -f /etc/apt/sources.list.d/nvidia-machine-learning.list
curl -o ./nvinfer-runtime-trt-repo-ubuntu1604-5.0.2-ga-cuda9.0_1-1_amd64.deb https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/nvinfer-runtime-trt-repo-ubuntu1604-5.0.2-ga-cuda9.0_1-1_amd64.deb
sudo dpkg -i ./nvinfer-runtime-trt-repo-ubuntu1604-5.0.2-ga-cuda9.0_1-1_amd64.deb
sudo apt-key add /var/nvinfer-runtime-trt-repo-5.0.2-ga-cuda9.0/7fa2af80.pub
sudo apt-get -qq update
sudo apt-get install libnvinfer5 libnvinfer-dev
rm ./nvinfer-runtime-trt-repo-ubuntu1604-5.0.2-ga-cuda9.0_1-1_amd64.deb
CMAKE_ARGS+=("-DUSE_TENSORRT=ON")
fi
if [[ $BUILD_ENVIRONMENT == *cuda* ]]; then
CMAKE_ARGS+=("-DUSE_CUDA=ON")
CMAKE_ARGS+=("-DCUDA_ARCH_NAME=Maxwell")
@ -204,6 +229,11 @@ if [[ -z "$INTEGRATED" ]]; then
exit 1
fi
# This is to save test binaries for testing
mv "$INSTALL_PREFIX/test/" "$INSTALL_PREFIX/cpp_test/"
ls $INSTALL_PREFIX
else
# sccache will be stuck if all cores are used for compiling
@ -212,10 +242,12 @@ else
export MAX_JOBS=`expr $(nproc) - 1`
fi
USE_LEVELDB=1 USE_LMDB=1 USE_OPENCV=1 BUILD_BINARY=1 python setup.py install --user
USE_LEVELDB=1 USE_LMDB=1 USE_OPENCV=1 BUILD_TEST=1 BUILD_BINARY=1 python setup.py install --user
# This is to save test binaries for testing
cp -r torch/lib/tmp_install $INSTALL_PREFIX
mkdir -p "$INSTALL_PREFIX/cpp_test/"
cp -r caffe2/test/* "$INSTALL_PREFIX/cpp_test/"
ls $INSTALL_PREFIX

22
.jenkins/caffe2/common.sh Normal file
View File

@ -0,0 +1,22 @@
set -ex
LOCAL_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
ROOT_DIR=$(cd "$LOCAL_DIR"/../.. && pwd)
# Figure out which Python to use
PYTHON="python"
if [[ "${BUILD_ENVIRONMENT}" =~ py((2|3)\.?[0-9]?\.?[0-9]?) ]]; then
PYTHON="python${BASH_REMATCH[1]}"
fi
# Find where Caffe2 is installed. This will be the absolute path to the
# site-packages of the active Python installation
INSTALL_PREFIX="/usr/local/caffe2"
SITE_DIR=$($PYTHON -c "from distutils import sysconfig; print(sysconfig.get_python_lib(prefix=''))")
INSTALL_SITE_DIR="${INSTALL_PREFIX}/${SITE_DIR}"
CAFFE2_PYPATH="$INSTALL_SITE_DIR/caffe2"
# Set PYTHONPATH and LD_LIBRARY_PATH so that python can find the installed
# Caffe2.
export PYTHONPATH="${PYTHONPATH}:$INSTALL_SITE_DIR"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${INSTALL_PREFIX}/lib"

View File

@ -1,23 +1,6 @@
#!/bin/bash
set -ex
LOCAL_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
ROOT_DIR=$(cd "$LOCAL_DIR"/../.. && pwd)
TEST_DIR=$ROOT_DIR/caffe2_tests
# Figure out which Python to use
PYTHON="python"
if [[ "${BUILD_ENVIRONMENT}" =~ py((2|3)\.?[0-9]?\.?[0-9]?) ]]; then
PYTHON="python${BASH_REMATCH[1]}"
fi
# The prefix must mirror the setting from build.sh
INSTALL_PREFIX="/usr/local/caffe2"
# Add the site-packages in the caffe2 install prefix to the PYTHONPATH
SITE_DIR=$($PYTHON -c "from distutils import sysconfig; print(sysconfig.get_python_lib(prefix=''))")
INSTALL_SITE_DIR="${INSTALL_PREFIX}/${SITE_DIR}"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# Skip tests in environments where they are not built/applicable
if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
@ -25,41 +8,34 @@ if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
exit 0
fi
# Set PYTHONPATH and LD_LIBRARY_PATH so that python can find the installed
# Caffe2.
export PYTHONPATH="${PYTHONPATH}:$INSTALL_SITE_DIR"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${INSTALL_PREFIX}/lib"
cd "$ROOT_DIR"
if [ -d $TEST_DIR ]; then
echo "Directory $TEST_DIR already exists; please remove it..."
exit 1
fi
mkdir -p $TEST_DIR/{cpp,python}
TEST_DIR="$ROOT_DIR/caffe2_tests"
rm -rf "$TEST_DIR" && mkdir -p "$TEST_DIR"
cd "${WORKSPACE}"
# C++ tests
#############
# C++ tests #
#############
echo "Running C++ tests.."
gtest_reports_dir="${TEST_DIR}/cpp"
junit_reports_dir="${TEST_DIR}/junit_reports"
mkdir -p "$gtest_reports_dir" "$junit_reports_dir"
for test in $(find "${INSTALL_PREFIX}/test" -executable -type f); do
mkdir -p "$gtest_reports_dir"
for test in $(find "${INSTALL_PREFIX}/cpp_test" -executable -type f); do
case "$test" in
# skip tests we know are hanging or bad
*/mkl_utils_test|*/aten/integer_divider_test)
continue
;;
*/scalar_tensor_test|*/basic|*/native_test)
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
continue
else
"$test"
fi
;;
*)
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
continue
else
"$test"
fi
;;
*)
# Currently, we use a mixture of gtest (caffe2) and Catch2 (ATen). While
# planning to migrate to gtest as the common PyTorch c++ test suite, we
# currently do NOT use the xml test reporter, because Catch doesn't
@ -70,14 +46,17 @@ for test in $(find "${INSTALL_PREFIX}/test" -executable -type f); do
# output than it is to have XML output for Jenkins.
# Note: in the future, if we want to use xml test reporter once we switch
# to all gtest, one can simply do:
# "$test" --gtest_output=xml:"$gtest_reports_dir/$(basename $test).xml"
"$test"
"$test" --gtest_output=xml:"$gtest_reports_dir/$(basename $test).xml"
;;
esac
done
# Get the relative path to where the caffe2 python module was installed
CAFFE2_PYPATH="$INSTALL_SITE_DIR/caffe2"
################
# Python tests #
################
pytest_reports_dir="${TEST_DIR}/python"
mkdir -p "$pytest_reports_dir"
# Collect additional tests to run (outside caffe2/python)
EXTRA_TESTS=()
@ -98,7 +77,6 @@ if [[ $BUILD_ENVIRONMENT == *-rocm* ]]; then
rocm_ignore_test+=("--ignore $CAFFE2_PYPATH/python/operator_test/unique_ops_test.py")
fi
# Python tests
# NB: Warnings are disabled because they make it harder to see what
# the actual erroring test is
echo "Running Python tests.."
@ -108,7 +86,7 @@ pip install --user pytest-sugar
-x \
-v \
--disable-warnings \
--junit-xml="$TEST_DIR/python/result.xml" \
--junit-xml="$pytest_reports_dir/result.xml" \
--ignore "$CAFFE2_PYPATH/python/test/executor_test.py" \
--ignore "$CAFFE2_PYPATH/python/operator_test/matmul_op_test.py" \
--ignore "$CAFFE2_PYPATH/python/operator_test/pack_ops_test.py" \

View File

@ -14,18 +14,8 @@ clang --version
# symbolize=1: Gives us much better errors when things go wrong
export ASAN_OPTIONS=detect_leaks=0:symbolize=1
# FIXME: Remove the hardcoded "-pthread" option.
# With asan build, the cmake thread CMAKE_HAVE_LIBC_CREATE[1] checking will
# succeed because "pthread_create" is in libasan.so. However, libasan doesn't
# have the full pthread implementation. Other advanced pthread functions doesn't
# exist in libasan.so[2]. If we need some pthread advanced functions, we still
# need to link the pthread library.
# [1] https://github.com/Kitware/CMake/blob/8cabaaf054a16ea9c8332ce8e9291bd026b38c62/Modules/FindThreads.cmake#L135
# [2] https://wiki.gentoo.org/wiki/AddressSanitizer/Problems
#
# TODO: Make the ASAN flags a more unified env var
CC="clang" CXX="clang++" LDSHARED="clang --shared" \
CFLAGS="-fsanitize=address -fsanitize=undefined -fno-sanitize-recover=all -shared-libasan -pthread" \
CXX_FLAGS="-pthread" \
CFLAGS="-fsanitize=address -fsanitize=undefined -fno-sanitize-recover=all -shared-libasan" \
NO_CUDA=1 USE_MKLDNN=0 \
python setup.py install

View File

@ -65,7 +65,7 @@ if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
fi
# Setup wrapper scripts
for compiler in cc c++ gcc g++ x86_64-linux-gnu-gcc; do
for compiler in cc c++ gcc g++; do
(
echo "#!/bin/sh"
echo "exec $SCCACHE $(which $compiler) \"\$@\""

View File

@ -141,6 +141,11 @@ if not "%USE_CUDA%"=="0" (
sccache --show-stats
sccache --zero-stats
rd /s /q %CONDA_PARENT_DIR%\\Miniconda3\\Lib\\site-packages\\torch
for /f "delims=" %%i in ('where /R caffe2\proto *.py') do (
IF NOT "%%i" == "%CD%\caffe2\proto\__init__.py" (
del /S /Q %%i
)
)
copy %CD%\\tmp_bin\\sccache.exe tmp_bin\\nvcc.exe
)

View File

@ -28,7 +28,7 @@ matrix:
script: mypy @mypy-files.txt
- env: CPP_DOC_CHECK
python: "3.6"
install:
install:
- sudo apt-get install -y doxygen
- pip install -r requirements.txt
script: cd docs/cpp/source && ./check-doxygen.sh
@ -41,3 +41,4 @@ matrix:
- llvm-toolchain-trusty
packages: clang-tidy
script: tools/run-clang-tidy-in-ci.sh

View File

@ -65,7 +65,6 @@ option(BUILD_DOCS "Build Caffe2 documentation" OFF)
option(BUILD_CUSTOM_PROTOBUF "Build and use Caffe2's own protobuf under third_party" ON)
option(BUILD_PYTHON "Build Python binaries" ON)
option(BUILD_CAFFE2_OPS "Build Caffe2 operators" ON)
option(BUILD_C10_EXPERIMENTAL_OPS "Build c10 experimental operators" ON)
option(BUILD_SHARED_LIBS "Build libcaffe2.so" ON)
cmake_dependent_option(
CAFFE2_LINK_LOCAL_PROTOBUF "If set, build protobuf inside libcaffe2.so." ON
@ -75,7 +74,7 @@ cmake_dependent_option(
"NOT BUILD_SHARED_LIBS" OFF)
option(BUILD_TEST "Build C++ test binaries (need gtest and gbenchmark)" OFF)
cmake_dependent_option(
INSTALL_TEST "Install test binaries if BUILD_TEST is on" OFF
INSTALL_TEST "Install test binaries if BUILD_TEST is on" ON
"BUILD_TEST" OFF)
option(USE_ACL "Use ARM Compute Library" OFF)
option(USE_ASAN "Use Address Sanitizer" OFF)
@ -93,7 +92,6 @@ option(USE_LEVELDB "Use LEVELDB" ON)
option(USE_LITE_PROTO "Use lite protobuf instead of full." OFF)
option(USE_LMDB "Use LMDB" ON)
option(USE_METAL "Use Metal for iOS build" ON)
option(USE_MOBILE_OPENGL "Use OpenGL for mobile code" ON)
option(USE_NATIVE_ARCH "Use -march=native" OFF)
option(USE_NCCL "Use NCCL" ON)
option(USE_SYSTEM_NCCL "Use system-wide NCCL" OFF)
@ -200,6 +198,10 @@ include(ExternalProject)
# ---[ Dependencies
include(cmake/Dependencies.cmake)
if(USE_FBGEMM)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DUSE_FBGEMM")
endif()
# ---[ Whitelist file if whitelist is specified
include(cmake/Whitelist.cmake)

View File

@ -3,15 +3,15 @@
If you are interested in contributing to PyTorch, your contributions will fall
into two categories:
1. You want to propose a new Feature and implement it
- post about your intended feature, and we shall discuss the design and
- Post about your intended feature, and we shall discuss the design and
implementation. Once we agree that the plan looks good, go ahead and implement it.
2. You want to implement a feature or bug-fix for an outstanding issue
- Look at the outstanding issues here: https://github.com/pytorch/pytorch/issues
- Especially look at the Low Priority and Medium Priority issues
- Pick an issue and comment on the task that you want to work on this feature
- Especially look at the Low Priority and Medium Priority issues.
- Pick an issue and comment on the task that you want to work on this feature.
- If you need more context on a particular issue, please ask and we shall provide.
Once you finish implementing a feature or bugfix, please send a Pull Request to
Once you finish implementing a feature or bug-fix, please send a Pull Request to
https://github.com/pytorch/pytorch
If you are not familiar with creating a Pull Request, here are some guides:
@ -24,7 +24,7 @@ If you are not familiar with creating a Pull Request, here are some guides:
To develop PyTorch on your machine, here are some tips:
1. Uninstall all existing PyTorch installs:
```
```bash
conda uninstall pytorch
pip uninstall torch
pip uninstall torch # run this command twice
@ -32,80 +32,81 @@ pip uninstall torch # run this command twice
2. Clone a copy of PyTorch from source:
```
```bash
git clone https://github.com/pytorch/pytorch
cd pytorch
```
3. Install PyTorch in `build develop` mode:
A full set of instructions on installing PyTorch from Source are here:
A full set of instructions on installing PyTorch from source is here:
https://github.com/pytorch/pytorch#from-source
The change you have to make is to replace
```
```bash
python setup.py install
```
with
```
```bash
python setup.py build develop
```
This is especially useful if you are only changing Python files.
This mode will symlink the python files from the current local source tree into the
python install.
This mode will symlink the Python files from the current local source tree into the
Python install.
Hence, if you modify a python file, you do not need to reinstall pytorch again and again.
Hence, if you modify a Python file, you do not need to reinstall PyTorch again and again.
For example:
- Install local pytorch in `build develop` mode
- modify your python file `torch/__init__.py` (for example)
- Install local PyTorch in `build develop` mode
- modify your Python file `torch/__init__.py` (for example)
- test functionality
- modify your python file `torch/__init__.py`
- modify your Python file `torch/__init__.py`
- test functionality
- modify your python file `torch/__init__.py`
- modify your Python file `torch/__init__.py`
- test functionality
You do not need to repeatedly install after modifying python files.
You do not need to repeatedly install after modifying Python files.
In case you want to reinstall, make sure that you uninstall pytorch first by running `pip uninstall torch`
In case you want to reinstall, make sure that you uninstall PyTorch first by running `pip uninstall torch`
and `python setup.py clean`. Then you can install in `build develop` mode again.
## Codebase structure
* [c10](c10) - Core library files that work everywhere, both server
and mobile. We are slowly moving pieces from ATen/core here.
This library is intended only to contain essential functionality,
and appropriate to use in settings where binary size matters. (But
and mobile. We are slowly moving pieces from [ATen/core](aten/src/ATen/core)
here. This library is intended only to contain essential functionality,
and appropriate to use in settings where binary size matters. (But
you'll have a lot of missing functionality if you try to use it
directly.)
* [aten](aten) - C++ tensor library for PyTorch (no autograd support)
* src
* [src](aten/src)
* [TH](aten/src/TH)
[THC](aten/src/THC)
[THNN](aten/src/THNN)
[THCUNN](aten/src/THCUNN) - Legacy library code from the original
Torch. Try not to add things here; we're slowly porting these to
native.
Torch. Try not to add things here; we're slowly porting these to
[native](aten/src/ATen/native).
* generic - Contains actual implementations of operators,
parametrized over `scalar_t`. Files here get compiled N times
parametrized over `scalar_t`. Files here get compiled N times
per supported scalar type in PyTorch.
* ATen
* [core](aten/src/ATen/core) - Core functionality of ATen. This
* [ATen](aten/src/ATen)
* [core](aten/src/ATen/core) - Core functionality of ATen. This
is migrating to top-level c10 folder.
* [native](aten/src/ATen/native) - Modern implementations of
operators. If you want to write a new operator, here is where
it should go. Most CPU operators go in the top level directory,
operators. If you want to write a new operator, here is where
it should go. Most CPU operators go in the top level directory,
except for operators which need to be compiled specially; see
cpu below.
* [cpu](aten/src/ATen/native/cpu) - Not actually CPU
implementations of operators, but specifically implementations
which are compiled with processor-specific instructions, like
AVX. See the README for more details.
AVX. See the [README](aten/src/ATen/native/cpu/README.md) for more
details.
* [cuda](aten/src/ATen/native/cuda) - CUDA implementations of
operators.
* [sparse](aten/src/ATen/native/sparse) - CPU and CUDA
@ -114,34 +115,34 @@ and `python setup.py clean`. Then you can install in `build develop` mode again.
[miopen](aten/src/ATen/native/miopen) [cudnn](aten/src/ATen/native/cudnn)
- implementations of operators which simply bind to some
backend library.
* [torch](torch) - The actual PyTorch library. Everything that is not
in csrc is Python modules, following the PyTorch Python frontend
module structure.
* [csrc](torch/csrc) - C++ files composing the PyTorch library. Files
* [torch](torch) - The actual PyTorch library. Everything that is not
in [csrc](torch/csrc) is a Python module, following the PyTorch Python
frontend module structure.
* [csrc](torch/csrc) - C++ files composing the PyTorch library. Files
in this directory tree are a mix of Python binding code, and C++
heavy lifting. Consult `setup.py` for the canonical list of Python
heavy lifting. Consult `setup.py` for the canonical list of Python
binding files; conventionally, they are often prefixed with
`python_`.
* [jit](torch/csrc/jit) - Compiler and frontend for TorchScript JIT
frontend.
* [autograd](torch/csrc/autograd) - Implementation of reverse-mode automatic
differentation
differentiation.
* [api](torch/csrc/api) - The PyTorch C++ frontend.
* [distributed](torch/csrc/distributed) - Distributed training
support for PyTorch.
* [tools](tools) - Code generation scripts for the PyTorch library.
See README of this directory for more details.
* [test](tests) - Python unit tests for PyTorch Python frontend
See [README](tools/README.md) of this directory for more details.
* [test](tests) - Python unit tests for PyTorch Python frontend.
* [test_torch.py](test/test_torch.py) - Basic tests for PyTorch
functionality
functionality.
* [test_autograd.py](test/test_autograd.py) - Tests for non-NN
automatic differentiation support
automatic differentiation support.
* [test_nn.py](test/test_nn.py) - Tests for NN operators and
their automatic differentiation
their automatic differentiation.
* [test_jit.py](test/test_jit.py) - Tests for the JIT compiler
and TorchScript
and TorchScript.
* ...
* [cpp](test/cpp) - C++ unit tests for PyTorch C++ frontend
* [cpp](test/cpp) - C++ unit tests for PyTorch C++ frontend.
* [expect](test/expect) - Automatically generated "expect" files
which are used to compare against expected output.
* [onnx](test/onnx) - Tests for ONNX export functionality,
@ -149,15 +150,15 @@ and `python setup.py clean`. Then you can install in `build develop` mode again.
* [caffe2](caffe2) - The Caffe2 library.
* [core](caffe2/core) - Core files of Caffe2, e.g., tensor, workspace,
blobs, etc.
* [operators](caffe2/operators) - Operators of Caffe2
* [python](caffe2/python) - Python bindings to Caffe2
* [operators](caffe2/operators) - Operators of Caffe2.
* [python](caffe2/python) - Python bindings to Caffe2.
* ...
## Unit testing
PyTorch's testing is located under `test/`. Run the entire test suite with
```
```bash
python test/run_test.py
```
@ -169,7 +170,7 @@ a number of useful features for local developing. Install it via `pip install py
If you want to just run tests that contain a specific substring, you can use the `-k` flag:
```
```bash
pytest test/test_nn.py -k Loss -v
```
@ -198,16 +199,16 @@ commands. To run this check locally, run `./check-doxygen.sh` from inside
## Managing multiple build trees
One downside to using `python setup.py develop` is that your development
version of pytorch will be installed globally on your account (e.g., if
version of PyTorch will be installed globally on your account (e.g., if
you run `import torch` anywhere else, the development version will be
used.
If you want to manage multiple builds of PyTorch, you can make use of
[conda environments](https://conda.io/docs/using/envs.html) to maintain
separate Python package environments, each of which can be tied to a
specific build of PyTorch. To set one up:
specific build of PyTorch. To set one up:
```
```bash
conda create -n pytorch-myfeature
source activate pytorch-myfeature
# if you run python now, torch will NOT be installed
@ -219,7 +220,7 @@ python setup.py build develop
If you are working on the C++ code, there are a few important things that you
will want to keep in mind:
1. How to rebuild only the code you are working on, and
1. How to rebuild only the code you are working on.
2. How to make rebuilds in the absence of changes go faster.
### Build only what you need.
@ -229,10 +230,10 @@ not very optimized for incremental rebuilds, this will actually be very slow.
Far better is to only request rebuilds of the parts of the project you are
working on:
- Working on the Python bindings? Run `python setup.py develop` to rebuild
- Working on the Python bindings? Run `python setup.py develop` to rebuild
(NB: no `build` here!)
- Working on `torch/csrc` or `aten`? Run `python setup.py rebuild_libtorch` to
- Working on `torch/csrc` or `aten`? Run `python setup.py rebuild_libtorch` to
rebuild and avoid having to rebuild other dependent libraries we
depend on.
@ -240,18 +241,19 @@ working on:
targets are listed in `dep_libs` in `setup.py`. prepend `build_` to
get a target, and run as e.g. `python setup.py build_gloo`.
- Working on a test binary? Run `(cd build && ninja bin/test_binary_name)` to
rebuild only that test binary (without rerunning cmake). (Replace `ninja` with
- Working on a test binary? Run `(cd build && ninja bin/test_binary_name)` to
rebuild only that test binary (without rerunning cmake). (Replace `ninja` with
`make` if you don't have ninja installed).
On the initial build, you can also speed things up with the environment
variables `DEBUG` and `NO_CUDA`.
- `DEBUG=1` will enable debug builds (-g -O0)
- `REL_WITH_DEB_INFO=1` will enable debug symbols with optimizations (-g -O3)
- `NO_CUDA=1` will disable compiling CUDA (in case you are developing on something not CUDA related), to save compile time.
For example:
```
```bash
NO_CUDA=1 DEBUG=1 python setup.py build develop
```
@ -270,9 +272,9 @@ information for the code in `torch/csrc`. More information at:
#### Use Ninja
Python `setuptools` is pretty dumb, and always rebuilds every C file in a
project. If you install the ninja build system with `pip install ninja`,
project. If you install the ninja build system with `pip install ninja`,
then PyTorch will use it to track dependencies correctly.
If pytorch was already built, you will need to run `python setup.py clean` once
If PyTorch was already built, you will need to run `python setup.py clean` once
after installing ninja for builds to succeed.
#### Use CCache
@ -283,9 +285,9 @@ compilation was exactly the same.
Using ccache in a situation like this is a real time-saver. However, by
default, ccache does not properly support CUDA stuff, so here are the
instructions for installing a custom `ccache` fork that has CUDA support:
instructions for installing a custom ccache fork that has CUDA support:
```
```bash
# install and export ccache
if ! ls ~/ccache/bin/ccache
then
@ -339,13 +341,13 @@ than Linux, which are worth keeping in mind when fixing these problems.
1. Symbols are NOT exported by default on Windows; instead, you have to explicitly
mark a symbol as exported/imported in a header file with `__declspec(dllexport)` /
`__declspec(dllimport)`. We have codified this pattern into a set of macros
`__declspec(dllimport)`. We have codified this pattern into a set of macros
which follow the convention `*_API`, e.g., `CAFFE2_API` inside Caffe2 and ATen.
(Every separate shared library needs a unique macro name, because symbol visibility
is on a per shared library basis. See c10/macros/Macros.h for more details.)
The upshot is if you see an "unresolved external" error in your Windows build, this
is probably because you forgot to mark a function with `*_API`. However, there is
is probably because you forgot to mark a function with `*_API`. However, there is
one important counterexample to this principle: if you want a *templated* function
to be instantiated at the call site, do NOT mark it with `*_API` (if you do mark it,
you'll have to explicitly instantiate all of the specializations used by the call
@ -353,7 +355,7 @@ than Linux, which are worth keeping in mind when fixing these problems.
2. If you link against a library, this does not make its dependencies transitively
visible. You must explicitly specify a link dependency against every library whose
symbols you use. (This is different from Linux where in most environments,
symbols you use. (This is different from Linux where in most environments,
transitive dependencies can be used to fulfill unresolved symbols.)
3. If you have a Windows box (we have a few on EC2 which you can request access to) and
@ -363,10 +365,10 @@ than Linux, which are worth keeping in mind when fixing these problems.
Even if you don't know anything about MSVC, you can use cmake to build simple programs on
Windows; this can be helpful if you want to learn more about some peculiar linking behavior
by reproducing it on a small example. Here's a simple example cmake file that defines
by reproducing it on a small example. Here's a simple example cmake file that defines
two dynamic libraries, one linking with the other:
```
```CMake
project(myproject CXX)
set(CMAKE_CXX_STANDARD 11)
add_library(foo SHARED foo.cpp)
@ -378,7 +380,7 @@ target_link_libraries(bar PUBLIC foo)
You can build it with:
```
```bash
mkdir build
cd build
cmake ..
@ -392,44 +394,44 @@ these exciting features lead to exciting bugs in Windows compilers.
To add insult to injury, the error messages will often not tell you
which line of code actually induced the erroring template instantiation.
I've found the most effective way to debug these problems is to
We've found the most effective way to debug these problems is to
carefully read over diffs, keeping in mind known bugs in MSVC/NVCC.
Here are a few well known pitfalls and workarounds:
* This is not actually a bug per se, but in general, code generated by MSVC
is more sensitive to memory errors; you may have written some code
that does a use-after-free or stack overflows; on Linux the code
might work, but on Windows your program will crash. ASAN may not
might work, but on Windows your program will crash. ASAN may not
catch all of these problems: stay vigilant to the possibility that
your crash is due to a real memory problem.
* (NVCC) `c10::optional` does not work when used from device code. Don't use
it from kernels. Upstream issue: https://github.com/akrzemi1/Optional/issues/58
* (NVCC) `c10::optional` does not work when used from device code. Don't use
it from kernels. Upstream issue: https://github.com/akrzemi1/Optional/issues/58
and our local issue #10329.
* `constexpr` generally works less well on MSVC.
* The idiom `static_assert(f() == f())` to test if `f` is constexpr
does not work; you'll get "error C2131: expression did not evaluate
to a constant". Don't use these asserts on Windows.
to a constant". Don't use these asserts on Windows.
(Example: `c10/util/intrusive_ptr.h`)
* (NVCC) Code you access inside a `static_assert` will eagerly be
evaluated as if it were device code, and so you might get an error
that the code is "not accessible".
```
```cpp
class A {
static A singleton_;
static constexpr inline A* singleton() {
return &singleton_;
}
};
static_assert(std::is_same(A*, decltype(A::singelton()))::value, "hmm");
static_assert(std::is_same(A*, decltype(A::singleton()))::value, "hmm");
```
* The compiler will run out of heap if you attempt to compile files that
are too large. Splitting such files into separate files helps.
* The compiler will run out of heap space if you attempt to compile files that
are too large. Splitting such files into separate files helps.
(Example: `THTensorMath`, `THTensorMoreMath`, `THTensorEvenMoreMath`.)
### Running Clang-Tidy
@ -453,8 +455,8 @@ have more checks than older versions. In our CI, we run clang-tidy-6.0.
git revision (you may want to replace `HEAD~1` with `HEAD` to pick up
uncommitted changes). Changes are picked up based on a `git diff` with the
given revision:
```sh
$ python tools/clang_tidy.py -d build -p torch/csrc --diff 'HEAD~1'
```bash
python tools/clang_tidy.py -d build -p torch/csrc --diff 'HEAD~1'
```
Above, it is assumed you are in the PyTorch root folder. `path/to/build` should
@ -463,26 +465,36 @@ root folder if you used `setup.py build`. You can use `-c <clang-tidy-binary>`
to change the clang-tidy this script uses. Make sure you have PyYaml installed,
which is in PyTorch's `requirements.txt`.
### Pre-commit Tidy/Linting Hook
We use clang-tidy and flake8 to perform additional formatting and semantic checking
of code. We provide a pre-commit git hook for performing these checks, before
a commit is created:
```bash
ln -s ../../tools/git-pre-commit .git/hooks/pre-commit
```
## Caffe2 notes
In 2018, we merged Caffe2 into the PyTorch source repository. While the
In 2018, we merged Caffe2 into the PyTorch source repository. While the
steady state aspiration is that Caffe2 and PyTorch share code freely,
in the meantime there will be some separation.
If you submit a PR to only PyTorch or only Caffe2 code, CI will only
run for the project you edited. The logic for this is implemented
run for the project you edited. The logic for this is implemented
in `.jenkins/pytorch/dirty.sh` and `.jenkins/caffe2/dirty.sh`; you
can look at this to see what path prefixes constitute changes.
This also means if you ADD a new top-level path, or you start
sharing code between projects, you need to modify these files.
There are a few "unusual" directories which, for historical reasons,
are Caffe2/PyTorch specific. Here they are:
are Caffe2/PyTorch specific. Here they are:
- `CMakeLists.txt`, `Makefile`, `binaries`, `cmake`, `conda`, `modules`,
`scripts` are Caffe2-specific. Don't put PyTorch code in them without
`scripts` are Caffe2-specific. Don't put PyTorch code in them without
extra coordination.
- `mypy*`, `requirements.txt`, `setup.py`, `test`, `tools` are
PyTorch-specific. Don't put Caffe2 code in them without extra
PyTorch-specific. Don't put Caffe2 code in them without extra
coordination.

View File

@ -8,8 +8,6 @@ PyTorch is a Python package that provides two high-level features:
You can reuse your favorite Python packages such as NumPy, SciPy and Cython to extend PyTorch when needed.
We are in an early-release beta. Expect some adventures and rough edges.
- [More about PyTorch](#more-about-pytorch)
- [Installation](#installation)
- [Binaries](#binaries)
@ -33,7 +31,7 @@ We are in an early-release beta. Expect some adventures and rough edges.
See also the [ci.pytorch.org HUD](https://ezyang.github.io/pytorch-ci-hud/build/pytorch-master).
## More about PyTorch
## More About PyTorch
At a granular level, PyTorch is a library that consists of the following components:
@ -44,12 +42,11 @@ At a granular level, PyTorch is a library that consists of the following compone
| **torch.nn** | a neural networks library deeply integrated with autograd designed for maximum flexibility |
| **torch.multiprocessing** | Python multiprocessing, but with magical memory sharing of torch Tensors across processes. Useful for data loading and Hogwild training |
| **torch.utils** | DataLoader, Trainer and other utility functions for convenience |
| **torch.legacy(.nn/.optim)** | legacy code that has been ported over from torch for backward compatibility reasons |
Usually one uses PyTorch either as:
- a replacement for NumPy to use the power of GPUs.
- a deep learning research platform that provides maximum flexibility and speed
- a deep learning research platform that provides maximum flexibility and speed.
Elaborating further:
@ -117,7 +114,7 @@ We've written custom memory allocators for the GPU to make sure that
your deep learning models are maximally memory efficient.
This enables you to train bigger deep learning models than before.
### Extensions without Pain
### Extensions Without Pain
Writing new neural network modules, or interfacing with PyTorch's Tensor API was designed to be straightforward
and with minimal abstractions.
@ -133,7 +130,6 @@ There is no wrapper code that needs to be written. You can see [a tutorial here]
### Binaries
Commands to install from binaries via Conda or pip wheels are on our website:
[https://pytorch.org](https://pytorch.org)
### From Source
@ -154,31 +150,20 @@ If you want to build on Windows, Visual Studio 2017 14.11 toolset and NVTX are a
Especially, for CUDA 8 build on Windows, there will be an additional requirement for VS 2015 Update 3 and a patch for it.
The details of the patch can be found out [here](https://support.microsoft.com/en-gb/help/4020481/fix-link-exe-crashes-with-a-fatal-lnk1000-error-when-you-use-wholearch).
#### Install optional dependencies
#### Install Dependencies
Common
```
conda install numpy pyyaml mkl mkl-include setuptools cmake cffi typing
```
On Linux
```bash
export CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" # [anaconda root directory]
# Install basic dependencies
conda install numpy pyyaml mkl mkl-include setuptools cmake cffi typing
conda install -c mingfeima mkldnn
# Add LAPACK support for the GPU
# Add LAPACK support for the GPU if needed
conda install -c pytorch magma-cuda92 # or [magma-cuda80 | magma-cuda91] depending on your cuda version
```
On macOS
```bash
export CMAKE_PREFIX_PATH=[anaconda root directory]
conda install numpy pyyaml mkl mkl-include setuptools cmake cffi typing
```
On Windows
```cmd
conda install numpy pyyaml mkl mkl-include setuptools cmake cffi typing
```
#### Get the PyTorch source
#### Get the PyTorch Source
```bash
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
@ -187,11 +172,13 @@ cd pytorch
#### Install PyTorch
On Linux
```bash
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
python setup.py install
```
On macOS
```bash
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py install
```
@ -210,9 +197,9 @@ call "%VS150COMNTOOLS%\vcvarsall.bat" x64 -vcvars_ver=14.11
python setup.py install
```
### Docker image
### Docker Image
Dockerfile is supplied to build images with cuda support and cudnn v7. You can pass `-e PYTHON_VERSION=x.y` flag to specify which python version is to be used by Miniconda, or leave it unset to use the default. Build as usual
Dockerfile is supplied to build images with cuda support and cudnn v7. You can pass `-e PYTHON_VERSION=x.y` flag to specify which Python version is to be used by Miniconda, or leave it unset to use the default. Build as usual
```
docker build -t pytorch -f docker/pytorch/Dockerfile .
```
@ -259,8 +246,7 @@ Three pointers to get you started:
## Releases and Contributing
PyTorch has a 90 day release cycle (major releases).
Its current state is Beta, we expect no obvious bugs. Please let us know if you encounter a bug by [filing an issue](https://github.com/pytorch/pytorch/issues).
PyTorch has a 90 day release cycle (major releases). Please let us know if you encounter a bug by [filing an issue](https://github.com/pytorch/pytorch/issues).
We appreciate all contributions. If you are planning to contribute back bug-fixes, please do so without any further discussion.

View File

@ -21,9 +21,14 @@ set(ATen_THIRD_PARTY_INCLUDE)
set(ATen_CUDA_SRCS)
set(ATen_CUDA_TEST_SRCS)
set(ATen_CUDA_INCLUDE)
set(ATen_HIP_SRCS)
set(ATen_HIP_TEST_SRCS)
set(ATen_HIP_INCLUDE)
set(ATen_CPU_DEPENDENCY_LIBS)
set(ATen_CUDA_DEPENDENCY_LIBS)
set(ATen_HIP_DEPENDENCY_LIBS)
set(ATen_PUBLIC_CUDA_DEPENDENCY_LIBS)
set(ATen_PUBLIC_HIP_DEPENDENCY_LIBS)
SET(ATEN_INSTALL_BIN_SUBDIR "bin" CACHE PATH "ATen install binary subdirectory")
SET(ATEN_INSTALL_LIB_SUBDIR "lib" CACHE PATH "ATen install library subdirectory")
SET(ATEN_INSTALL_INCLUDE_SUBDIR "include" CACHE PATH "ATen install include subdirectory")
@ -35,22 +40,11 @@ endif()
set(TH_LINK_STYLE STATIC)
add_subdirectory(src/TH)
set(TH_CPU_INCLUDE
# dense
${CMAKE_CURRENT_SOURCE_DIR}/src/TH
${CMAKE_CURRENT_BINARY_DIR}/src/TH
${CMAKE_CURRENT_SOURCE_DIR}/src
${CMAKE_CURRENT_BINARY_DIR}/src
${CMAKE_BINARY_DIR}/aten/src)
list(APPEND ATen_CPU_INCLUDE ${TH_CPU_INCLUDE})
if(USE_CUDA OR USE_ROCM)
set(TH_CUDA_INCLUDE
# dense
${CMAKE_CURRENT_SOURCE_DIR}/src/THC
${CMAKE_CURRENT_BINARY_DIR}/src/THC)
list(APPEND ATen_CUDA_INCLUDE ${TH_CUDA_INCLUDE})
endif()
add_subdirectory(src/THNN)
# Find the HIP package, set the HIP paths, load the HIP CMake.
@ -69,9 +63,11 @@ IF(MSVC)
ENDIF(MSVC)
if(USE_ROCM)
# TODO: AT_HIP_ENABLED (change this once we represent HIP as HIP in
# ATen proper)
SET(AT_CUDA_ENABLED 1)
add_subdirectory(src/THC)
add_subdirectory(src/THCUNN)
add_subdirectory(src/THH)
add_subdirectory(src/THHUNN)
message("ROCm is enabled.")
elseif(USE_CUDA)
SET(AT_CUDA_ENABLED 1)
@ -82,24 +78,23 @@ else()
SET(AT_CUDA_ENABLED 0)
endif()
list(APPEND ATen_CPU_INCLUDE
${CMAKE_CURRENT_SOURCE_DIR}/src/THNN
${CMAKE_CURRENT_SOURCE_DIR}/src/THCUNN)
list(APPEND ATen_CPU_INCLUDE
${CMAKE_CURRENT_SOURCE_DIR}/src
${CMAKE_CURRENT_SOURCE_DIR}/../third_party/catch/single_include
${CMAKE_CURRENT_BINARY_DIR}/src/ATen)
${CMAKE_CURRENT_SOURCE_DIR}/../third_party/catch/single_include)
add_subdirectory(src/ATen)
# Pass source, includes, and libs to parent
set(ATen_CPU_SRCS ${ATen_CPU_SRCS} PARENT_SCOPE)
set(ATen_CUDA_SRCS ${ATen_CUDA_SRCS} PARENT_SCOPE)
set(ATen_HIP_SRCS ${ATen_HIP_SRCS} PARENT_SCOPE)
set(ATen_CPU_TEST_SRCS ${ATen_CPU_TEST_SRCS} PARENT_SCOPE)
set(ATen_CUDA_TEST_SRCS ${ATen_CUDA_TEST_SRCS} PARENT_SCOPE)
set(ATen_HIP_TEST_SRCS ${ATen_HIP_TEST_SRCS} PARENT_SCOPE)
set(ATen_CPU_INCLUDE ${ATen_CPU_INCLUDE} PARENT_SCOPE)
set(ATen_CUDA_INCLUDE ${ATen_CUDA_INCLUDE} PARENT_SCOPE)
set(ATen_HIP_INCLUDE ${ATen_HIP_INCLUDE} PARENT_SCOPE)
set(ATen_THIRD_PARTY_INCLUDE ${ATen_THIRD_PARTY_INCLUDE} PARENT_SCOPE)
set(ATen_CPU_DEPENDENCY_LIBS ${ATen_CPU_DEPENDENCY_LIBS} PARENT_SCOPE)
set(ATen_CUDA_DEPENDENCY_LIBS ${ATen_CUDA_DEPENDENCY_LIBS} PARENT_SCOPE)
set(ATen_HIP_DEPENDENCY_LIBS ${ATen_HIP_DEPENDENCY_LIBS} PARENT_SCOPE)
set(ATen_CORE_TEST_SRCS ${ATen_CORE_TEST_SRCS} PARENT_SCOPE)

View File

@ -1,24 +1,24 @@
#pragma once
#include "ATen/Allocator.h"
#include "ATen/CPUGeneral.h"
#include "ATen/Context.h"
#include "ATen/Device.h"
#include "ATen/DeviceGuard.h"
#include "ATen/DimVector.h"
#include "ATen/Dispatch.h"
#include "ATen/Formatting.h"
#include "ATen/Functions.h"
#include "ATen/ScalarOps.h"
#include "ATen/Tensor.h"
#include "ATen/TensorGeometry.h"
#include "ATen/TensorOperators.h"
#include "ATen/Type.h"
#include "ATen/core/ATenGeneral.h"
#include "ATen/core/Generator.h"
#include <ATen/Allocator.h>
#include <ATen/CPUGeneral.h>
#include <ATen/Context.h>
#include <ATen/Device.h>
#include <ATen/DeviceGuard.h>
#include <ATen/DimVector.h>
#include <ATen/Dispatch.h>
#include <ATen/Formatting.h>
#include <ATen/Functions.h>
#include <ATen/ScalarOps.h>
#include <ATen/Tensor.h>
#include <ATen/TensorGeometry.h>
#include <ATen/TensorOperators.h>
#include <ATen/Type.h>
#include <ATen/core/ATenGeneral.h>
#include <ATen/core/Generator.h>
#include <c10/core/Layout.h>
#include "ATen/core/Scalar.h"
#include <ATen/core/Scalar.h>
#include <c10/core/Storage.h>
#include "ATen/core/TensorMethods.h"
#include "ATen/core/TensorOptions.h"
#include <ATen/core/TensorMethods.h>
#include <c10/core/TensorOptions.h>
#include <c10/util/Exception.h>

View File

@ -1,6 +1,6 @@
#pragma once
#include "ATen/Config.h"
#include "ATen/core/Half.h"
#include <ATen/Config.h>
#include <ATen/core/Half.h>
// Defines the accumulation type for a scalar type.
// Example:

View File

@ -17,7 +17,14 @@ IF(NOT AT_INSTALL_BIN_DIR OR NOT AT_INSTALL_LIB_DIR OR NOT AT_INSTALL_INCLUDE_DI
ENDIF()
CONFIGURE_FILE(Config.h.in "${CMAKE_CURRENT_SOURCE_DIR}/Config.h")
CONFIGURE_FILE(cuda/CUDAConfig.h.in "${CMAKE_CURRENT_SOURCE_DIR}/cuda/CUDAConfig.h")
# TODO: Don't unconditionally generate CUDAConfig.h.in. Unfortunately,
# this file generates AT_ROCM_ENABLED() which is required by the miopen
# files, which are compiled even if we are doing a vanilla CUDA build.
# Once we properly split CUDA and HIP in ATen, we can remove this code.
configure_file(cuda/CUDAConfig.h.in "${CMAKE_CURRENT_SOURCE_DIR}/cuda/CUDAConfig.h")
if(USE_ROCM)
configure_file(hip/HIPConfig.h.in "${CMAKE_CURRENT_SOURCE_DIR}/hip/HIPConfig.h")
endif()
# NB: If you edit these globs, you'll have to update setup.py package_data as well
FILE(GLOB base_h "*.h" "detail/*.h" "cpu/*.h")
@ -28,21 +35,33 @@ FILE(GLOB cuda_cpp "cuda/*.cpp" "cuda/detail/*.cpp")
FILE(GLOB cuda_cu "cuda/*.cu" "cuda/detail/*.cu")
FILE(GLOB cudnn_h "cudnn/*.h" "cudnn/*.cuh")
FILE(GLOB cudnn_cpp "cudnn/*.cpp")
FILE(GLOB hip_h "hip/*.h" "hip/detail/*.h" "hip/*.cuh" "hip/detail/*.cuh")
FILE(GLOB hip_cpp "hip/*.cpp" "hip/detail/*.cpp" "hip/impl/*.cpp")
FILE(GLOB hip_hip "hip/*.hip" "hip/detail/*.hip" "hip/impl/*.hip")
FILE(GLOB miopen_h "miopen/*.h")
FILE(GLOB miopen_cpp "miopen/*.cpp")
FILE(GLOB mkl_cpp "mkl/*.cpp")
FILE(GLOB mkldnn_cpp "mkldnn/*.cpp")
FILE(GLOB native_cpp "native/*.cpp")
FILE(GLOB native_sparse_cpp "native/sparse/*.cpp")
FILE(GLOB native_sparse_cuda_cu "native/sparse/cuda/*.cu")
FILE(GLOB native_sparse_cuda_cpp "native/sparse/cuda/*.cpp")
FILE(GLOB native_cudnn_cpp "native/cudnn/*.cpp")
FILE(GLOB native_miopen_cpp "native/miopen/*.cpp")
FILE(GLOB native_cuda_cu "native/cuda/*.cu")
FILE(GLOB native_cuda_cpp "native/cuda/*.cpp")
FILE(GLOB native_mkl_cpp "native/mkl/*.cpp")
FILE(GLOB native_mkldnn_cpp "native/mkldnn/*.cpp")
FILE(GLOB native_sparse_cpp "native/sparse/*.cpp")
FILE(GLOB native_cuda_cu "native/cuda/*.cu")
FILE(GLOB native_cuda_cpp "native/cuda/*.cpp")
FILE(GLOB native_cudnn_cpp "native/cudnn/*.cpp")
FILE(GLOB native_sparse_cuda_cu "native/sparse/cuda/*.cu")
FILE(GLOB native_sparse_cuda_cpp "native/sparse/cuda/*.cpp")
FILE(GLOB native_hip_hip "native/hip/*.hip")
FILE(GLOB native_hip_cpp "native/hip/*.cpp")
FILE(GLOB native_miopen_cpp "native/miopen/*.cpp")
FILE(GLOB native_cudnn_hip_cpp "native/cudnn/hip/*.cpp")
FILE(GLOB native_sparse_hip_hip "native/sparse/hip/*.hip")
FILE(GLOB native_sparse_hip_cpp "native/sparse/hip/*.cpp")
set(all_cpu_cpp ${base_cpp} ${ATen_CORE_SRCS} ${native_cpp} ${native_sparse_cpp} ${native_mkl_cpp} ${native_mkldnn_cpp} ${generated_cpp} ${ATen_CPU_SRCS} ${cpu_kernel_cpp})
if(AT_MKL_ENABLED)
@ -52,22 +71,32 @@ if(AT_MKLDNN_ENABLED)
set(all_cpu_cpp ${all_cpu_cpp} ${mkldnn_cpp})
endif()
IF(USE_CUDA OR USE_ROCM)
if(USE_CUDA AND USE_ROCM)
message(FATAL_ERROR "ATen doesn't not currently support simultaneously building with CUDA and ROCM")
endif()
IF(USE_CUDA)
list(APPEND ATen_CUDA_INCLUDE ${CMAKE_CURRENT_SOURCE_DIR}/cuda)
set(ATen_CUDA_SRCS ${ATen_CUDA_SRCS} ${cuda_cu} ${native_cuda_cu} ${native_sparse_cuda_cu})
set(all_cuda_cpp ${native_sparse_cuda_cpp} ${cuda_cpp} ${native_cuda_cpp} ${cuda_generated_cpp} ${ATen_CUDA_SRCS})
IF(USE_CUDA)
SET(all_cuda_cpp ${native_cudnn_cpp} ${native_miopen_cpp} ${all_cuda_cpp})
IF(CUDNN_FOUND)
SET(all_cuda_cpp ${all_cuda_cpp} ${cudnn_cpp})
ENDIF()
ELSEIF(USE_ROCM)
SET(all_cuda_cpp ${native_cudnn_cpp} ${native_miopen_cpp} ${miopen_cpp} ${all_cuda_cpp})
SET(all_cuda_cpp ${native_cudnn_cpp} ${native_miopen_cpp} ${all_cuda_cpp})
IF(CUDNN_FOUND)
SET(all_cuda_cpp ${all_cuda_cpp} ${cudnn_cpp})
ENDIF()
endif()
IF(USE_ROCM)
list(APPEND ATen_HIP_INCLUDE ${CMAKE_CURRENT_SOURCE_DIR}/hip)
set(ATen_HIP_SRCS ${ATen_HIP_SRCS} ${hip_hip} ${native_hip_hip} ${native_sparse_hip_hip})
# TODO: Codegen separate files for HIP and use those (s/cuda_generated_cpp/hip_generated_cpp)
set(all_hip_cpp ${native_sparse_hip_cpp} ${hip_cpp} ${native_hip_cpp} ${cuda_generated_cpp} ${ATen_HIP_SRCS})
set(all_hip_cpp ${native_miopen_cpp} ${native_cudnn_hip_cpp} ${miopen_cpp} ${all_hip_cpp})
endif()
filter_list(generated_h generated_cpp "\\.h$")
filter_list(cuda_generated_h cuda_generated_cpp "\\.h$")
# TODO: When we have hip_generated_cpp
#filter_list(hip_generated_h hip_generated_cpp "\\.h$")
list(APPEND ATen_CPU_INCLUDE ${CMAKE_CURRENT_SOURCE_DIR}/..)
# so the build can find the generated header files
@ -81,21 +110,28 @@ IF(BLAS_FOUND)
MESSAGE(STATUS "TH_BINARY_BUILD detected. Enabling special linkage.")
list(APPEND ATen_CPU_DEPENDENCY_LIBS
"${BLAS_LIBRARIES};${BLAS_LIBRARIES};${BLAS_LIBRARIES}")
if(USE_CUDA OR USE_ROCM)
if(USE_CUDA)
list(APPEND ATen_CUDA_DEPENDENCY_LIBS
"${BLAS_LIBRARIES};${BLAS_LIBRARIES};${BLAS_LIBRARIES}")
endif()
if(USE_ROCM)
list(APPEND ATen_HIP_DEPENDENCY_LIBS
"${BLAS_LIBRARIES};${BLAS_LIBRARIES};${BLAS_LIBRARIES}")
endif()
ELSE ($ENV{TH_BINARY_BUILD})
list(APPEND ATen_CPU_DEPENDENCY_LIBS ${BLAS_LIBRARIES})
if(USE_CUDA OR USE_ROCM)
if(USE_CUDA)
list(APPEND ATen_CUDA_DEPENDENCY_LIBS "${BLAS_LIBRARIES}")
endif()
if(USE_ROCM)
list(APPEND ATen_HIP_DEPENDENCY_LIBS "${BLAS_LIBRARIES}")
endif()
ENDIF ($ENV{TH_BINARY_BUILD})
ENDIF(BLAS_FOUND)
IF(LAPACK_FOUND)
list(APPEND ATen_CPU_DEPENDENCY_LIBS ${LAPACK_LIBRARIES})
if(USE_CUDA OR USE_ROCM)
if(USE_CUDA)
# Although Lapack provides CPU implementations (and thus, one might expect
# that ATen_cuda would not need this at all), some of our libraries (magma
# in particular) fall back to CPU BLAS/LAPACK implementations, and so it is very important
@ -104,6 +140,11 @@ IF(LAPACK_FOUND)
# This caused https://github.com/pytorch/pytorch/issues/7353
list(APPEND ATen_CUDA_DEPENDENCY_LIBS ${LAPACK_LIBRARIES})
endif()
if(USE_ROCM)
# It's not altogether clear that HIP behaves the same way, but it
# seems safer to assume that it needs it too
list(APPEND ATen_HIP_DEPENDENCY_LIBS ${LAPACK_LIBRARIES})
endif()
ENDIF(LAPACK_FOUND)
IF (UNIX AND NOT APPLE)
@ -206,6 +247,12 @@ IF(USE_CUDA AND NOT USE_ROCM)
--generate-code arch=compute_50,code=sm_50
--generate-code arch=compute_60,code=sm_60
--generate-code arch=compute_70,code=sm_70)
elseif(${CUDA_VERSION_MAJOR} EQUAL "10")
SET(CUFFT_FAKELINK_OPTIONS
--generate-code arch=compute_35,code=sm_35
--generate-code arch=compute_50,code=sm_50
--generate-code arch=compute_60,code=sm_60
--generate-code arch=compute_70,code=sm_70)
else()
MESSAGE(FATAL_ERROR "Unhandled major cuda version ${CUDA_VERSION_MAJOR}")
endif()
@ -252,22 +299,21 @@ IF(USE_CUDA AND NOT USE_ROCM)
ENDIF($ENV{ATEN_STATIC_CUDA})
ENDIF()
IF(USE_ROCM)
### Link in the ROCm libraries BLAS / RNG.
FIND_LIBRARY(ROCBLAS_LIBRARY rocblas HINTS ${ROCBLAS_PATH}/lib)
FIND_LIBRARY(HIPRAND_LIBRARY hiprand HINTS ${HIPRAND_PATH}/lib)
# NB: We're relying on cmake/Dependencies.cmake to appropriately set up HIP dependencies.
# In principle we could duplicate them, but handling the rocblas
# dependency is nontrivial. So better not to copy-paste.
# Look for Note [rocblas cmake bug]
list(APPEND ATen_CUDA_DEPENDENCY_LIBS ${ROCBLAS_LIBRARY} ${HIPRAND_LIBRARY})
ENDIF()
# Include CPU paths for CUDA as well
# Include CPU paths for CUDA/HIP as well
list(APPEND ATen_CUDA_INCLUDE ${ATen_CPU_INCLUDE})
list(APPEND ATen_HIP_INCLUDE ${ATen_CPU_INCLUDE})
# We have two libraries: libATen_cpu.so and libATen_cuda.so,
# with libATen_cuda.so depending on libATen_cpu.so. The CPU library
# contains CPU code only. libATen_cpu.so is invariant to the setting
# of USE_CUDA (it always builds the same way); libATen_cuda.so is only
# built when USE_CUDA=1 and CUDA is available.
# built when USE_CUDA=1 and CUDA is available. (libATen_hip.so works
# the same way as libATen_cuda.so)
set(ATen_CPU_SRCS ${all_cpu_cpp})
if(AT_LINK_STYLE STREQUAL "INTERFACE")
# Source code can't be added to an interface library, so it is
@ -291,7 +337,7 @@ else()
set(ATen_CPU_SRCS)
endif()
if(USE_CUDA OR USE_ROCM)
if(USE_CUDA)
set(ATen_CUDA_SRCS ${all_cuda_cpp})
if(AT_LINK_STYLE STREQUAL "INTERFACE")
# Source code can't be added to an interface library, so it is
@ -299,42 +345,25 @@ if(USE_CUDA OR USE_ROCM)
add_library(ATen_cuda INTERFACE)
list(APPEND ATen_CUDA_DEPENDENCY_LIBS ATEN_CUDA_FILES_GEN_LIB)
else()
# A hack to deal with cuda library dependencies and modern CMake: the
# CUDA_ADD_LIBRARY includes a target_link_libraries, and as a result,
# one cannot use PUBLIC/PRIVATE/INTERFACE for the target anymore. This
# hack adds the PRIVATE keywords to CUDA_LIBRARIES so we can deal with
# it. We will then manually add the cudart library as interface libs.
set(__tmp ${CUDA_LIBRARIES})
set(CUDA_LIBRARIES PRIVATE ${CUDA_LIBRARIES})
torch_cuda_based_add_library(ATen_cuda ${AT_LINK_STYLE} ${ATen_CUDA_SRCS})
set(CUDA_LIBRARIES ${__tmp})
target_link_libraries(ATen_cuda INTERFACE caffe2::cudart)
target_include_directories(
ATen_cuda INTERFACE $<INSTALL_INTERFACE:include>)
target_include_directories(
ATen_cuda PRIVATE ${ATen_THIRD_PARTY_INCLUDE})
target_include_directories(
ATen_cuda PRIVATE ${ATen_CUDA_INCLUDE})
target_link_libraries(
ATen_cuda PRIVATE ${ATen_CUDA_DEPENDENCY_LIBS} ATEN_CUDA_FILES_GEN_LIB)
# These public dependencies must go after the previous dependencies, as the
# order of the libraries in the linker call matters here when statically
# linking; libculibos and cublas must be last.
target_link_libraries(
ATen_cuda PUBLIC ATen_cpu ${ATen_PUBLIC_CUDA_DEPENDENCY_LIBS})
# Set standard properties on the target
torch_set_target_props(ATen_cuda)
caffe2_interface_library(ATen_cuda ATen_cuda_library)
# Make sure these don't get built by parent
set(ATen_CUDA_SRCS)
message(FATAL_ERROR "Non-INTERFACE AT_LINK_STYLE no longer supported")
endif()
endif()
if(USE_ROCM)
set(ATen_HIP_SRCS ${all_hip_cpp})
if(AT_LINK_STYLE STREQUAL "INTERFACE")
# Source code can't be added to an interface library, so it is
# passed back to be compiled into the containing library
add_library(ATen_hip INTERFACE)
# NB: Instead of adding it to this list, we add it by hand
# to caffe2_hip, because it needs to be a PRIVATE dependency
# list(APPEND ATen_HIP_DEPENDENCY_LIBS ATEN_CUDA_FILES_GEN_LIB)
else()
message(FATAL_ERROR "Non-INTERFACE AT_LINK_STYLE not (yet) supported for ROCm build")
endif()
endif()
if(NOT AT_LINK_STYLE STREQUAL "INTERFACE")
if(USE_CUDA)
if (NOT $ENV{ATEN_STATIC_CUDA})
@ -345,16 +374,22 @@ if(NOT AT_LINK_STYLE STREQUAL "INTERFACE")
if(NOT MSVC)
torch_compile_options(ATen_cpu)
if(USE_CUDA OR USE_ROCM)
if(USE_CUDA)
torch_compile_options(ATen_cuda)
endif()
if(USE_ROCM)
torch_compile_options(ATen_hip)
endif()
endif()
if(NOT ${CMAKE_VERSION} VERSION_LESS "3.1")
set_property(TARGET ATen_cpu PROPERTY CXX_STANDARD 11)
if(USE_CUDA OR USE_ROCM)
if(USE_CUDA)
set_property(TARGET ATen_cuda PROPERTY CXX_STANDARD 11)
endif()
if(USE_ROCM)
set_property(TARGET ATen_hip PROPERTY CXX_STANDARD 11)
endif()
endif()
endif()
@ -364,11 +399,12 @@ INSTALL(FILES "${CMAKE_CURRENT_BINARY_DIR}/cmake-exports/ATenConfig.cmake"
DESTINATION "${AT_INSTALL_SHARE_DIR}/cmake/ATen")
# https://stackoverflow.com/questions/11096471/how-can-i-install-a-hierarchy-of-files-using-cmake
FOREACH(HEADER ${base_h} ${ATen_CORE_HEADERS} ${cuda_h} ${cudnn_h})
FOREACH(HEADER ${base_h} ${ATen_CORE_HEADERS} ${cuda_h} ${cudnn_h} ${hip_h} ${miopen_h})
string(REPLACE "${CMAKE_CURRENT_SOURCE_DIR}/" "" HEADER_SUB ${HEADER})
GET_FILENAME_COMPONENT(DIR ${HEADER_SUB} DIRECTORY)
INSTALL(FILES ${HEADER} DESTINATION ${AT_INSTALL_INCLUDE_DIR}/ATen/${DIR})
ENDFOREACH()
# TODO: Install hip_generated_h when we have it
FOREACH(HEADER ${generated_h} ${cuda_generated_h})
# NB: Assumed to be flat
INSTALL(FILES ${HEADER} DESTINATION ${AT_INSTALL_INCLUDE_DIR}/ATen)
@ -386,10 +422,15 @@ endif()
set(ATen_CORE_SRCS ${ATen_CORE_SRCS} PARENT_SCOPE)
set(ATen_CPU_SRCS ${ATen_CPU_SRCS} PARENT_SCOPE)
set(ATen_CUDA_SRCS ${ATen_CUDA_SRCS} PARENT_SCOPE)
set(ATen_HIP_SRCS ${ATen_HIP_SRCS} PARENT_SCOPE)
set(ATen_CPU_TEST_SRCS ${ATen_CPU_TEST_SRCS} PARENT_SCOPE)
set(ATen_CUDA_TEST_SRCS ${ATen_CUDA_TEST_SRCS} PARENT_SCOPE)
set(ATen_CORE_TEST_SRCS ${ATen_CORE_TEST_SRCS} PARENT_SCOPE)
set(ATen_HIP_TEST_SRCS ${ATen_HIP_TEST_SRCS} PARENT_SCOPE)
set(ATen_CPU_INCLUDE ${ATen_CPU_INCLUDE} PARENT_SCOPE)
set(ATen_THIRD_PARTY_INCLUDE ${ATen_THIRD_PARTY_INCLUDE} PARENT_SCOPE)
set(ATen_CUDA_INCLUDE ${ATen_CUDA_INCLUDE} PARENT_SCOPE)
set(ATen_HIP_INCLUDE ${ATen_HIP_INCLUDE} PARENT_SCOPE)
set(ATen_CPU_DEPENDENCY_LIBS ${ATen_CPU_DEPENDENCY_LIBS} PARENT_SCOPE)
set(ATen_CUDA_DEPENDENCY_LIBS ${ATen_CUDA_DEPENDENCY_LIBS} PARENT_SCOPE)
set(ATen_HIP_DEPENDENCY_LIBS ${ATen_HIP_DEPENDENCY_LIBS} PARENT_SCOPE)

View File

@ -1,7 +1,7 @@
#pragma once
#include "ATen/Parallel.h"
#include "ATen/TensorUtils.h"
#include <ATen/Parallel.h>
#include <ATen/TensorUtils.h>
#include <limits>
#include <utility>
#include <cstring>

View File

@ -1,7 +1,7 @@
#pragma once
#include "TH/TH.h"
#include "c10/util/Exception.h"
#include <TH/TH.h>
#include <c10/util/Exception.h>
// This file creates a fake allocator that just throws exceptions if
// it is actually used.

View File

@ -4,7 +4,7 @@
// linking errors using MSVC
// See https://msdn.microsoft.com/en-us/library/a90k134d.aspx
// This header adds this if using CAFFE2_API
#include "ATen/core/ATenGeneral.h"
#include <ATen/core/ATenGeneral.h>
namespace at {
CAFFE2_API void set_num_threads(int);

View File

@ -1,4 +1,4 @@
#include "ATen/CPUGenerator.h"
#include <ATen/CPUGenerator.h>
#define const_generator_cast(generator) \
dynamic_cast<const CPUGenerator&>(generator)

View File

@ -1,8 +1,8 @@
#pragma once
#include "ATen/Utils.h"
#include "ATen/core/Generator.h"
#include "c10/util/Exception.h"
#include <ATen/Utils.h>
#include <ATen/core/Generator.h>
#include <c10/util/Exception.h>
namespace at {

View File

@ -1,8 +1,8 @@
#include "ATen/Config.h"
#include <ATen/Config.h>
#include "Context.h"
#include <ATen/Context.h>
#include <ATen/core/TensorOptions.h>
#include <c10/core/TensorOptions.h>
#include <thread>
#include <mutex>
@ -10,12 +10,12 @@
#include <string>
#include <stdexcept>
#include "ATen/CPUGenerator.h"
#include "ATen/RegisterCPU.h"
#include "ATen/Tensor.h"
#include <ATen/CPUGenerator.h>
#include <ATen/RegisterCPU.h>
#include <ATen/Tensor.h>
#include <ATen/cpu/FlushDenormal.h>
#include "TH/TH.h" // for USE_LAPACK
#include <TH/TH.h> // for USE_LAPACK
namespace at {

View File

@ -1,19 +1,19 @@
#pragma once
#include <ATen/CPUGeneral.h>
#include "ATen/Type.h"
#include "ATen/TypeExtendedInterface.h"
#include "ATen/Utils.h"
#include "ATen/LegacyTHDispatch.h"
#include "ATen/LegacyTHDispatcher.h"
#include "ATen/core/ATenGeneral.h"
#include "ATen/core/Generator.h"
#include "ATen/core/LegacyTypeDispatch.h"
#include "ATen/core/VariableHooksInterface.h"
#include "ATen/detail/CUDAHooksInterface.h"
#include "ATen/detail/HIPHooksInterface.h"
#include "ATen/detail/ComplexHooksInterface.h"
#include "c10/util/Exception.h"
#include <ATen/Type.h>
#include <ATen/TypeExtendedInterface.h>
#include <ATen/Utils.h>
#include <ATen/LegacyTHDispatch.h>
#include <ATen/LegacyTHDispatcher.h>
#include <ATen/core/ATenGeneral.h>
#include <ATen/core/Generator.h>
#include <ATen/core/LegacyTypeDispatch.h>
#include <ATen/core/VariableHooksInterface.h>
#include <ATen/detail/CUDAHooksInterface.h>
#include <ATen/detail/HIPHooksInterface.h>
#include <ATen/detail/ComplexHooksInterface.h>
#include <c10/util/Exception.h>
#include <memory>
#include <mutex>

View File

@ -1,5 +1,5 @@
#include "ATen/DLConvertor.h"
#include "ATen/Functions.h"
#include <ATen/DLConvertor.h>
#include <ATen/Functions.h>
#include <iostream>
#include <sstream>

View File

@ -1,8 +1,8 @@
#pragma once
#include "ATen/Tensor.h"
#include "ATen/ATen.h"
#include "ATen/dlpack.h"
#include <ATen/Tensor.h>
#include <ATen/ATen.h>
#include <ATen/dlpack.h>
// this convertor will:
// 1) take a Tensor object and wrap it in the DLPack tensor
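A minimal round-trip sketch of the convertor described by this comment, assuming the `at::toDLPack`/`at::fromDLPack` entry points this header declares:
```
#include <ATen/ATen.h>
#include <ATen/DLConvertor.h>

int main() {
  at::Tensor t = at::ones({2, 3});
  // Wrap the ATen tensor in a DLPack managed tensor (shares storage, no copy).
  DLManagedTensor* managed = at::toDLPack(t);
  // Rebuild an ATen tensor from the DLPack tensor; it adopts the deleter.
  at::Tensor u = at::fromDLPack(managed);
  return u.equal(t) ? 0 : 1;
}
```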

View File

@ -1629,8 +1629,7 @@
- arg: THTensor* result
output: True
- THTensor* self
- arg: real p
python_default_init: AS_REAL(2)
- real p
- arg: long dim
wrap_dim: self
- arg: bool keepdim
@ -1882,44 +1881,6 @@
- THTensor* end
- real weight
]]
[[
name: _th_linspace
cname: linspace
types:
- Float
- Double
backends:
- CPU
- CUDA
variants:
- function
return: argument 0
arguments:
- arg: THTensor* result
output: True
- real start
- real end
- long steps
]]
[[
name: _th_logspace
cname: logspace
types:
- Float
- Double
backends:
- CPU
- CUDA
variants:
- function
return: argument 0
arguments:
- arg: THTensor* result
output: True
- real start
- real end
- long steps
]]
[[
name: _th_histc
cname: histc

View File

@ -1,4 +1,4 @@
#include "ATen/ExpandUtils.h"
#include <ATen/ExpandUtils.h>
namespace at {

View File

@ -1,7 +1,7 @@
#pragma once
#include "ATen/Tensor.h"
#include "c10/util/Exception.h"
#include <ATen/Tensor.h>
#include <c10/util/Exception.h>
#include <functional>
#include <sstream>

View File

@ -1,6 +1,6 @@
#pragma once
#include <ATen/core/TensorOptions.h>
#include <c10/core/TensorOptions.h>
namespace at {

View File

@ -38,6 +38,7 @@
#include <c10/core/Backend.h>
#include <c10/core/ScalarType.h>
#include <ATen/core/LegacyDeviceTypeInit.h>
#include <ATen/LegacyTHDispatcher.h>
namespace at {
@ -69,16 +70,51 @@ class CAFFE2_API LegacyTHDispatch {
dispatcher_registry[static_cast<int>(b)][static_cast<int>(s)] = std::move(t);
}
LegacyTHDispatcher & getLegacyTHDispatcher(Backend p, ScalarType s) {
auto* dispatcher = getLegacyTHDispatcherOpt(p, s);
if (!dispatcher) AT_ERROR(toString(p), toString(s), "THDispatcher is not enabled.");
return *dispatcher;
}
private:
LegacyTHDispatcher* getLegacyTHDispatcherRaw(Backend p, ScalarType s) {
return dispatcher_registry[static_cast<int>(p)][static_cast<int>(s)].get();
}
LegacyTHDispatcher & getLegacyTHDispatcher(Backend p, ScalarType s) {
auto* type = getLegacyTHDispatcherRaw(p, s);
if (!type) AT_ERROR(toString(p), toString(s), "THDispatcher is not enabled.");
return *type;
LegacyTHDispatcher* getLegacyTHDispatcherOpt(Backend p, ScalarType s) {
if (p != Backend::Undefined) {
initForDeviceType(backendToDeviceType(p));
// NB: there is no Complex for TH, so no initialization to be done.
}
auto dispatcher = getLegacyTHDispatcherRaw(p, s);
if(!dispatcher) {
if (p == Backend::Undefined || s == ScalarType::Undefined) {
AT_ERROR("Requested Undefined THDispatcher which is invalid. Backend:",
toString(p), "ScalarType: ", toString(s));
}
}
return dispatcher;
}
private:
void initForDeviceType(DeviceType p) {
static std::once_flag cpu_once;
static std::once_flag cuda_once;
if (p == DeviceType::CPU) {
std::call_once(cpu_once, [] {
getLegacyDeviceTypeInit().initCPU();
});
} else if (p == DeviceType::CUDA) {
std::call_once(cuda_once, [] {
getLegacyDeviceTypeInit().initCUDA();
});
} else if (p == DeviceType::HIP) {
std::call_once(cuda_once, [] {
getLegacyDeviceTypeInit().initHIP();
});
}
}
// NB: dispatcher_registry has nullptr for all CUDA backends until
// CUDA initialization has occurred
LegacyTHDispatcherUniquePtr dispatcher_registry

View File

@ -1,7 +1,7 @@
#pragma once
#include <c10/core/Scalar.h>
#include "ATen/Tensor.h"
#include <ATen/Tensor.h>
// This is in the c10 namespace because we use ADL to find the functions in it.
namespace c10 {
@ -10,10 +10,10 @@ namespace c10 {
// to implement this without going through Derived Types (which are not part of core).
inline at::Tensor scalar_to_tensor(Scalar s) {
if (s.isFloatingPoint()) {
return at::CPU(kDouble).scalarTensor(s);
return at::scalar_tensor(s, at::CPU(kDouble).options());
} else {
AT_ASSERT(s.isIntegral());
return at::CPU(kLong).scalarTensor(s);
return at::scalar_tensor(s, at::CPU(kLong).options());
}
}
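For reference, a sketch of what the updated helper produces — both branches yield 0-dimensional CPU tensors typed by the scalar's kind:
```
// Illustrative only; c10::scalar_to_tensor as defined above.
at::Tensor d = c10::scalar_to_tensor(at::Scalar(2.5));         // 0-dim, kDouble
at::Tensor l = c10::scalar_to_tensor(at::Scalar(int64_t(7)));  // 0-dim, kLong
```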

View File

@ -32,14 +32,13 @@ namespace {
// values tensor for such an empty tensor.
SparseTensorImpl::SparseTensorImpl(at::TensorTypeId type_id, const caffe2::TypeMeta& data_type)
: TensorImpl(type_id, data_type, nullptr, false)
, size_{0}
, sparse_dim_(1)
, dense_dim_(0)
, indices_(at::empty({1, 0}, at::initialTensorOptions().device(sparseTensorIdToDeviceType(type_id)).dtype(ScalarType::Long)))
, values_(at::empty({0}, at::initialTensorOptions().device(sparseTensorIdToDeviceType(type_id)).dtype(data_type))) {}
IntList SparseTensorImpl::sizes() const {
return size_;
return sizes_;
}
IntList SparseTensorImpl::strides() const {
AT_ERROR("sparse tensors do not have strides");
@ -47,10 +46,6 @@ IntList SparseTensorImpl::strides() const {
bool SparseTensorImpl::is_contiguous() const {
AT_ERROR("sparse tensors do not have is_contiguous");
}
int64_t SparseTensorImpl::size(int64_t d) const {
d = at::maybe_wrap_dim(d, dim(), false);
return size_[d];
}
int64_t SparseTensorImpl::stride(int64_t d) const {
AT_ERROR("sparse tensors do not have strides");
}

View File

@ -1,8 +1,8 @@
#pragma once
#include "ATen/Tensor.h"
#include "ATen/core/TensorImpl.h"
#include "c10/util/Exception.h"
#include <ATen/Tensor.h>
#include <c10/core/TensorImpl.h>
#include <c10/util/Exception.h>
namespace at {
struct CAFFE2_API SparseTensorImpl : public TensorImpl {
@ -14,11 +14,6 @@ struct CAFFE2_API SparseTensorImpl : public TensorImpl {
// _indices.shape: dimensionality: 2, shape: (sparse_dim, nnz)
// _values.shape: dimensionality: 1 + dense_dim. shape: (nnz, shape[sparse_dim:])
// The true size of the sparse tensor (e.g., if you called to_dense()
// on it). When THTensor merges into TensorImpl, this field
// should move to the parent class.
std::vector<int64_t> size_;
int64_t sparse_dim_ = 0; // number of sparse dimensions
int64_t dense_dim_ = 0; // number of dense dimensions
@ -48,7 +43,6 @@ public:
IntList sizes() const override;
IntList strides() const override;
bool is_contiguous() const override;
int64_t size(int64_t d) const override;
int64_t stride(int64_t d) const override;
void resize_dim(int64_t ndim) override;
void set_size(int64_t dim, int64_t new_size) override;
@ -63,7 +57,7 @@ public:
// WARNING: This function does NOT preserve invariants of sparse_dim/dense_dim with
// respect to indices and values
void raw_resize_(int64_t sparse_dim, int64_t dense_dim, IntList size) {
size_ = size.vec();
sizes_ = size.vec();
sparse_dim_ = sparse_dim;
dense_dim_ = dense_dim;
refresh_numel();
@ -132,7 +126,7 @@ public:
"shrinking the size of dense dimensions (from ", dense_size_original, " to ", dense_size_new, ") on a non-empty sparse tensor is not supported.\n", alt_options_msg);
}
if ((!size.equals(size_)) || (sparse_dim != sparse_dim_) || (dense_dim != dense_dim_)) {
if ((!size.equals(sizes_)) || (sparse_dim != sparse_dim_) || (dense_dim != dense_dim_)) {
auto nnz = values().size(0);
std::vector<int64_t> values_size = {nnz};
auto dense_size = size.slice(sparse_dim);
@ -141,7 +135,7 @@ public:
indices_.resize_({sparse_dim, nnz});
}
size_ = size.vec();
sizes_ = size.vec();
sparse_dim_ = sparse_dim;
dense_dim_ = dense_dim;
refresh_numel();
@ -151,7 +145,7 @@ public:
void resize_and_clear_(int64_t sparse_dim, int64_t dense_dim, IntList size) {
AT_CHECK(sparse_dim + dense_dim == size.size(), "number of dimensions must be sparse_dim (", sparse_dim, ") + dense_dim (", dense_dim, "), but got ", size.size());
size_ = size.vec();
sizes_ = size.vec();
sparse_dim_ = sparse_dim;
dense_dim_ = dense_dim;
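To make the indices/values invariant documented above concrete, a hypothetical example:
```
// For a sparse tensor of logical shape (3, 4, 5) with sparse_dim = 2,
// dense_dim = 1 and nnz = 7, the invariant gives:
//   indices_: int64 tensor of shape (2, 7)   // sparse_dim x nnz
//   values_:  tensor of shape (7, 5)         // nnz x shape[sparse_dim:]
```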

View File

@ -1,8 +1,8 @@
#pragma once
#include <c10/core/Scalar.h>
#include "ATen/Tensor.h"
#include "ATen/Type.h"
#include <ATen/Tensor.h>
#include <ATen/Type.h>
#include <string>
#include <stdexcept>

View File

@ -1,2 +1,2 @@
#pragma once
#include <ATen/core/TensorOptions.h>
#include <c10/core/TensorOptions.h>

View File

@ -1,7 +1,7 @@
#include "ATen/Config.h"
#include "ATen/TensorUtils.h"
#include <ATen/Config.h>
#include <ATen/TensorUtils.h>
#include "ATen/ATen.h"
#include <ATen/ATen.h>
#include <ostream>
#include <sstream>

View File

@ -1,8 +1,8 @@
#pragma once
#include "ATen/Tensor.h"
#include "ATen/TensorGeometry.h"
#include "ATen/Utils.h"
#include <ATen/Tensor.h>
#include <ATen/TensorGeometry.h>
#include <ATen/Utils.h>
// These functions are NOT in Utils.h, because this file has a dep on Tensor.h

View File

@ -1,5 +1,5 @@
#include "ATen/UndefinedType.h"
#include "c10/util/Exception.h"
#include <ATen/UndefinedType.h>
#include <c10/util/Exception.h>
namespace at {
@ -23,12 +23,6 @@ Device UndefinedType::getDeviceFromPtr(void*) const {
AT_ERROR("getDeviceFromPtr not defined for UndefinedType");
}
Storage UndefinedType::storage(bool resizable) const {
AT_ERROR("storage not defined for UndefinedType");
}
Storage UndefinedType::storage(size_t size, bool resizable) const {
AT_ERROR("storage(size_t) not defined for UndefinedType");
}
Storage UndefinedType::storageFromBlob(void * data, int64_t size, const std::function<void(void*)> & deleter) const {
AT_ERROR("storageFromBlob not defined for UndefinedType");
}

View File

@ -1,7 +1,7 @@
#pragma once
#include "ATen/TypeDefault.h"
#include "ATen/CheckGenerator.h"
#include <ATen/TypeDefault.h>
#include <ATen/CheckGenerator.h>
#ifdef _MSC_VER
#ifdef Type
@ -18,8 +18,6 @@ struct UndefinedType final : public TypeDefault {
virtual Backend backend() const override;
virtual Allocator* allocator() const override;
virtual Device getDeviceFromPtr(void* data) const override;
virtual Storage storage(bool resizable = false) const override;
virtual Storage storage(size_t size, bool resizable = false) const override;
virtual Storage storageFromBlob(void * data, int64_t size, const std::function<void(void*)> & deleter) const override;
virtual Storage storageWithAllocator(int64_t size, Allocator* allocator) const override;
virtual std::unique_ptr<Generator> generator() const override;

View File

@ -1,4 +1,4 @@
#include "ATen/Utils.h"
#include <ATen/Utils.h>
#include <stdarg.h>
#include <stdexcept>
#include <typeinfo>

View File

@ -1,11 +1,11 @@
#pragma once
#include "ATen/core/ATenGeneral.h"
#include <ATen/core/ATenGeneral.h>
#include <c10/core/StorageImpl.h>
#include "ATen/core/UndefinedTensorImpl.h"
#include <c10/core/UndefinedTensorImpl.h>
#include <c10/core/ScalarType.h>
#include "ATen/Formatting.h"
#include <ATen/Formatting.h>
#include <c10/util/ArrayRef.h>
#include <c10/util/Exception.h>
@ -73,6 +73,9 @@ static inline TensorImpl* checked_tensor_unwrap(const Tensor& expr, const char *
AT_ERROR("Expected object of scalar type ", scalar_type, " but got scalar type ", expr.scalar_type(),
" for argument #", pos, " '", name, "'");
}
if (expr.is_variable()) {
AT_ERROR("Expected Tensor (not Variable) for argument #", pos, " '", name, "'");
}
return expr.unsafeGetTensorImpl();
}
@ -88,7 +91,11 @@ static inline std::vector<TensorImpl*> checked_tensor_list_unwrap(ArrayRef<Tenso
}
if (expr.scalar_type() != scalar_type) {
AT_ERROR("Expected object of scalar type ", scalar_type, " but got scalar type ", expr.scalar_type(),
" for sequence elment ", i , " in sequence argument at position #", pos, " '", name, "'");
" for sequence element ", i , " in sequence argument at position #", pos, " '", name, "'");
}
if (expr.is_variable()) {
AT_ERROR("Expected Tensor (not Variable) for sequence element ",
i , " in sequence argument at position #", pos, " '", name, "'");
}
unwrapped.emplace_back(expr.unsafeGetTensorImpl());
}

View File

@ -1,10 +1,14 @@
#pragma once
#include "ATen/core/WrapDimMinimal.h"
#include "ATen/core/TensorImpl.h"
#include <c10/core/WrapDimMinimal.h>
#include <c10/core/TensorImpl.h>
namespace at {
static inline int64_t maybe_wrap_dim(int64_t dim, int64_t dim_post_expr, bool wrap_scalar=true) {
return c10::maybe_wrap_dim(dim, dim_post_expr, wrap_scalar);
}
static inline int64_t maybe_wrap_dim(int64_t dim, TensorImpl *tensor) {
return maybe_wrap_dim(dim, tensor->dim());
}
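A small sketch of the wrapping behavior these helpers delegate to — negative dims index from the end:
```
#include <ATen/WrapDimUtils.h>

// For a tensor with 4 dims, -1 wraps to 3 and 0 stays 0;
// out-of-range dims raise an error.
int64_t last = at::maybe_wrap_dim(/*dim=*/-1, /*dim_post_expr=*/4); // == 3
```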

View File

@ -1,7 +1,7 @@
#pragma once
#include "ATen/core/TensorImpl.h"
#include "ATen/WrapDimUtils.h"
#include <c10/core/TensorImpl.h>
#include <ATen/WrapDimUtils.h>
#include <sstream>
#include <bitset>

View File

@ -1,3 +1,3 @@
#pragma once
#include "c10/macros/Macros.h"
#include <c10/macros/Macros.h>

View File

@ -1,2 +1,2 @@
#include "c10/util/Backtrace.h"
#include "c10/util/Type.h"
#include <c10/util/Backtrace.h>
#include <c10/util/Type.h>

View File

@ -6,12 +6,6 @@ FILE(GLOB ATen_CORE_SRCS "*.cpp")
FILE(GLOB ATen_CORE_TEST_SRCS "*_test.cpp")
EXCLUDE(ATen_CORE_SRCS "${ATen_CORE_SRCS}" ${ATen_CORE_TEST_SRCS})
# see the source file for explanation
set_source_files_properties(
${CMAKE_CURRENT_SOURCE_DIR}/register_symbols.cpp
PROPERTIES COMPILE_FLAGS -O0
)
# Pass to parent
set(ATen_CORE_HEADERS ${ATen_CORE_HEADERS} PARENT_SCOPE)
set(ATen_CORE_SRCS ${ATen_CORE_SRCS} PARENT_SCOPE)

View File

@ -1,12 +0,0 @@
#pragma once
#include <c10/macros/Macros.h>
namespace caffe2 {
class TypeMeta;
} // namespace caffe2
namespace at {
CAFFE2_API void set_default_dtype(caffe2::TypeMeta dtype);
CAFFE2_API const caffe2::TypeMeta& get_default_dtype();
} // namespace at

View File

@ -1,4 +1,4 @@
#include "ATen/core/Formatting.h"
#include <ATen/core/Formatting.h>
#include <cmath>
#include <cstdint>

View File

@ -1,2 +1,2 @@
#pragma once
#include "c10/Half.h"
#include <c10/Half.h>

View File

@ -28,7 +28,7 @@
#include <ATen/core/VariableHooksInterface.h>
#include <c10/util/Exception.h>
#include <ATen/core/LegacyDeviceTypeInit.h>
#include <ATen/core/TensorImpl.h>
#include <c10/core/TensorImpl.h>
namespace at {

View File

@ -1,2 +1,2 @@
#pragma once
#include "c10/macros/Macros.h"
#include <c10/macros/Macros.h>

View File

@ -4,23 +4,29 @@
#include <c10/core/Layout.h>
#include <c10/core/Scalar.h>
#include <c10/core/ScalarType.h>
#include "ATen/core/SparseTensorRef.h"
#include <ATen/core/SparseTensorRef.h>
#include <c10/core/Storage.h>
#include "ATen/core/TensorAccessor.h"
#include "ATen/core/TensorImpl.h"
#include "ATen/core/UndefinedTensorImpl.h"
#include <ATen/core/TensorAccessor.h>
#include <c10/core/TensorImpl.h>
#include <c10/core/UndefinedTensorImpl.h>
#include <c10/util/Exception.h>
#include <c10/util/Optional.h>
#include <ATen/core/LegacyTypeDispatch.h>
namespace c10 {
struct TensorOptions;
}
namespace at {
struct Generator;
struct Type;
class Tensor;
struct TensorOptions;
} // namespace at
namespace at {
class Tensor;
using TensorList = ArrayRef<Tensor>;
// Tensor is a "generic" object holding a pointer to the underlying TensorImpl object, which
// has an embedded reference count. In this way, Tensor is similar to boost::intrusive_ptr.
//
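A short illustration of the intrusive-refcount semantics the comment describes (a sketch, not part of this diff):
```
at::Tensor a = at::zeros({2});
at::Tensor b = a;   // copies the handle and bumps the TensorImpl refcount
b.add_(1);          // mutates the storage both handles share
// a observes the update too: one TensorImpl, two Tensor handles
```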
@ -292,10 +298,8 @@ public:
Tensor argmax() const;
Tensor argmin(int64_t dim, bool keepdim=false) const;
Tensor argmin() const;
Tensor as_strided(IntList size, IntList stride) const;
Tensor & as_strided_(IntList size, IntList stride);
Tensor as_strided(IntList size, IntList stride, int64_t storage_offset) const;
Tensor & as_strided_(IntList size, IntList stride, int64_t storage_offset);
Tensor as_strided(IntList size, IntList stride, c10::optional<int64_t> storage_offset=c10::nullopt) const;
Tensor & as_strided_(IntList size, IntList stride, c10::optional<int64_t> storage_offset=c10::nullopt);
Tensor asin() const;
Tensor & asin_();
Tensor atan() const;
@ -443,17 +447,18 @@ public:
Tensor & squeeze_();
Tensor & squeeze_(int64_t dim);
Tensor sspaddmm(const Tensor & mat1, const Tensor & mat2, Scalar beta=1, Scalar alpha=1) const;
Tensor stft(int64_t n_fft, int64_t hop_length, int64_t win_length, const Tensor & window={}, bool normalized=false, bool onesided=true) const;
Tensor stft(int64_t n_fft, c10::optional<int64_t> hop_length=c10::nullopt, c10::optional<int64_t> win_length=c10::nullopt, const Tensor & window={}, bool normalized=false, bool onesided=true) const;
int64_t stride(int64_t dim) const;
Tensor sum(ScalarType dtype) const;
Tensor sum() const;
Tensor sum(IntList dim, bool keepdim, ScalarType dtype) const;
Tensor sum(IntList dim, bool keepdim=false) const;
Tensor sum(IntList dim, ScalarType dtype) const;
Tensor sum_to_size(IntList size) const;
Tensor sqrt() const;
Tensor & sqrt_();
Tensor std(bool unbiased=true) const;
Tensor std(int64_t dim, bool unbiased=true, bool keepdim=false) const;
Tensor std(IntList dim, bool unbiased=true, bool keepdim=false) const;
Tensor prod(ScalarType dtype) const;
Tensor prod() const;
Tensor prod(int64_t dim, bool keepdim, ScalarType dtype) const;
@ -480,7 +485,7 @@ public:
Tensor view_as(const Tensor & other) const;
Tensor where(const Tensor & condition, const Tensor & other) const;
Tensor norm(Scalar p=2) const;
Tensor norm(Scalar p, int64_t dim, bool keepdim=false) const;
Tensor norm(c10::optional<Scalar> p, int64_t dim, bool keepdim=false) const;
Tensor clone() const;
Tensor & resize_as_(const Tensor & the_template);
Tensor pow(Scalar exponent) const;
@ -627,7 +632,7 @@ public:
std::tuple<Tensor,Tensor> eig(bool eigenvectors=false) const;
std::tuple<Tensor,Tensor,Tensor> svd(bool some=true, bool compute_uv=true) const;
Tensor cholesky(bool upper=false) const;
Tensor potrs(const Tensor & input2, bool upper=true) const;
Tensor cholesky_solve(const Tensor & input2, bool upper=false) const;
Tensor potri(bool upper=true) const;
std::tuple<Tensor,Tensor> pstrf(bool upper=true, Scalar tol=-1) const;
std::tuple<Tensor,Tensor> qr() const;
@ -732,4 +737,4 @@ Tensor make_tensor(Args&&... args) {
} // namespace at
#include "ATen/core/TensorMethods.h"
#include <ATen/core/TensorMethods.h>

View File

@ -1,5 +1,5 @@
#include "gtest/gtest.h"
#include "caffe2/core/tensor.h"
#include <gtest/gtest.h>
#include <caffe2/core/tensor.h>
TEST(TensorImplTest, Caffe2Constructor) {
caffe2::Tensor tensor(caffe2::CPU);

View File

@ -1,10 +1,11 @@
#pragma once
#include "ATen/core/Tensor.h"
#include <ATen/core/Tensor.h>
#include <c10/core/Scalar.h>
#include "ATen/core/SparseTensorRef.h"
#include "ATen/core/Type.h"
#include "ATen/core/TensorOptions.h"
#include <c10/macros/Macros.h>
#include <ATen/core/SparseTensorRef.h>
#include <ATen/core/Type.h>
#include <c10/core/TensorOptions.h>
namespace at {
@ -114,16 +115,10 @@ inline Tensor Tensor::argmin(int64_t dim, bool keepdim) const {
inline Tensor Tensor::argmin() const {
return type().argmin(*this);
}
inline Tensor Tensor::as_strided(IntList size, IntList stride) const {
return type().as_strided(*this, size, stride);
}
inline Tensor & Tensor::as_strided_(IntList size, IntList stride) {
return type().as_strided_(*this, size, stride);
}
inline Tensor Tensor::as_strided(IntList size, IntList stride, int64_t storage_offset) const {
inline Tensor Tensor::as_strided(IntList size, IntList stride, c10::optional<int64_t> storage_offset) const {
return type().as_strided(*this, size, stride, storage_offset);
}
inline Tensor & Tensor::as_strided_(IntList size, IntList stride, int64_t storage_offset) {
inline Tensor & Tensor::as_strided_(IntList size, IntList stride, c10::optional<int64_t> storage_offset) {
return type().as_strided_(*this, size, stride, storage_offset);
}
inline Tensor Tensor::asin() const {
@ -567,7 +562,7 @@ inline Tensor & Tensor::squeeze_(int64_t dim) {
inline Tensor Tensor::sspaddmm(const Tensor & mat1, const Tensor & mat2, Scalar beta, Scalar alpha) const {
return type().sspaddmm(*this, mat1, mat2, beta, alpha);
}
inline Tensor Tensor::stft(int64_t n_fft, int64_t hop_length, int64_t win_length, const Tensor & window, bool normalized, bool onesided) const {
inline Tensor Tensor::stft(int64_t n_fft, c10::optional<int64_t> hop_length, c10::optional<int64_t> win_length, const Tensor & window, bool normalized, bool onesided) const {
return type().stft(*this, n_fft, hop_length, win_length, window, normalized, onesided);
}
inline int64_t Tensor::stride(int64_t dim) const {
@ -588,6 +583,9 @@ inline Tensor Tensor::sum(IntList dim, bool keepdim) const {
inline Tensor Tensor::sum(IntList dim, ScalarType dtype) const {
return type().sum(*this, dim, dtype);
}
inline Tensor Tensor::sum_to_size(IntList size) const {
return type().sum_to_size(*this, size);
}
inline Tensor Tensor::sqrt() const {
return type().sqrt(*this);
}
@ -597,7 +595,7 @@ inline Tensor & Tensor::sqrt_() {
inline Tensor Tensor::std(bool unbiased) const {
return type().std(*this, unbiased);
}
inline Tensor Tensor::std(int64_t dim, bool unbiased, bool keepdim) const {
inline Tensor Tensor::std(IntList dim, bool unbiased, bool keepdim) const {
return type().std(*this, dim, unbiased, keepdim);
}
inline Tensor Tensor::prod(ScalarType dtype) const {
@ -678,7 +676,7 @@ inline Tensor Tensor::where(const Tensor & condition, const Tensor & other) cons
inline Tensor Tensor::norm(Scalar p) const {
return type().norm(*this, p);
}
inline Tensor Tensor::norm(Scalar p, int64_t dim, bool keepdim) const {
inline Tensor Tensor::norm(c10::optional<Scalar> p, int64_t dim, bool keepdim) const {
return type().norm(*this, p, dim, keepdim);
}
inline Tensor Tensor::clone() const {
@ -1119,8 +1117,8 @@ inline std::tuple<Tensor,Tensor,Tensor> Tensor::svd(bool some, bool compute_uv)
inline Tensor Tensor::cholesky(bool upper) const {
return type().cholesky(*this, upper);
}
inline Tensor Tensor::potrs(const Tensor & input2, bool upper) const {
return type().potrs(*this, input2, upper);
inline Tensor Tensor::cholesky_solve(const Tensor & input2, bool upper) const {
return type().cholesky_solve(*this, input2, upper);
}
inline Tensor Tensor::potri(bool upper) const {
return type().potri(*this, upper);

View File

@ -1,18 +1,18 @@
#pragma once
#include "ATen/core/ATenGeneral.h"
#include <ATen/core/ATenGeneral.h>
#include <c10/core/Allocator.h>
#include "ATen/core/Deprecated.h"
#include "ATen/core/Generator.h"
#include <ATen/core/Deprecated.h>
#include <ATen/core/Generator.h>
#include <c10/core/Layout.h>
#include <c10/core/Scalar.h>
#include <c10/core/ScalarType.h>
#include "ATen/core/SparseTensorRef.h"
#include <ATen/core/SparseTensorRef.h>
#include <c10/util/ArrayRef.h>
#include <c10/Half.h>
#include <c10/core/TensorTypeIdRegistration.h>
#include "ATen/core/Reduction.h"
#include "ATen/core/TensorOptions.h"
#include <ATen/core/Reduction.h>
#include <c10/core/TensorOptions.h>
#include <c10/util/Optional.h>
@ -35,9 +35,11 @@ struct Storage;
namespace at {
class Tensor;
using TensorList = ArrayRef<Tensor>;
class Context;
struct Generator;
class Tensor;
static inline void noop_deleter(void*) {}
@ -97,8 +99,6 @@ struct CAFFE2_API Type {
bool is_undefined() const noexcept { return is_undefined_; }
virtual Allocator * allocator() const = 0;
virtual Device getDeviceFromPtr(void * data) const = 0;
virtual Storage storage(bool resizable = false) const = 0;
virtual Storage storage(size_t size, bool resizable = false) const = 0;
virtual Storage storageFromBlob(void * data, int64_t size, const std::function<void(void*)> & deleter=noop_deleter) const = 0;
virtual Storage storageWithAllocator(int64_t size, Allocator* allocator) const = 0;
virtual std::unique_ptr<Generator> generator() const = 0;
@ -135,7 +135,10 @@ struct CAFFE2_API Type {
return backendToDeviceType(backend());
}
virtual Tensor copy(const Tensor & src, bool non_blocking=false, c10::optional<Device> to_device={}) const = 0;
virtual Tensor copy(
const Tensor& src,
bool non_blocking = false,
c10::optional<Device> to_device = {}) const = 0;
virtual Tensor & copy_(Tensor & self, const Tensor & src, bool non_blocking=false) const = 0;
virtual void backward(
@ -149,7 +152,6 @@ struct CAFFE2_API Type {
virtual Tensor tensorFromBlob(void * data, IntList sizes, IntList strides, const std::function<void(void*)> & deleter=noop_deleter) const = 0;
virtual Tensor tensorWithAllocator(IntList sizes, Allocator* allocator) const = 0;
virtual Tensor tensorWithAllocator(IntList sizes, IntList strides, Allocator* allocator) const = 0;
virtual Tensor scalarTensor(Scalar s) const = 0;
bool operator==(const Type& other) const {
return this == &other;
@ -168,7 +170,7 @@ struct CAFFE2_API Type {
/// Constructs the `TensorOptions` from a type and a Device. Asserts that
/// the device type matches the device type of the type.
TensorOptions options(optional<Device> device_opt) const {
TensorOptions options(c10::optional<Device> device_opt) const {
if (!device_opt.has_value()) {
return options(-1);
} else {
@ -203,10 +205,8 @@ struct CAFFE2_API Type {
virtual Tensor argmax(const Tensor & self) const = 0;
virtual Tensor argmin(const Tensor & self, int64_t dim, bool keepdim) const = 0;
virtual Tensor argmin(const Tensor & self) const = 0;
virtual Tensor as_strided(const Tensor & self, IntList size, IntList stride) const = 0;
virtual Tensor & as_strided_(Tensor & self, IntList size, IntList stride) const = 0;
virtual Tensor as_strided(const Tensor & self, IntList size, IntList stride, int64_t storage_offset) const = 0;
virtual Tensor & as_strided_(Tensor & self, IntList size, IntList stride, int64_t storage_offset) const = 0;
virtual Tensor as_strided(const Tensor & self, IntList size, IntList stride, c10::optional<int64_t> storage_offset) const = 0;
virtual Tensor & as_strided_(Tensor & self, IntList size, IntList stride, c10::optional<int64_t> storage_offset) const = 0;
virtual Tensor asin(const Tensor & self) const = 0;
virtual Tensor & asin_(Tensor & self) const = 0;
virtual Tensor atan(const Tensor & self) const = 0;
@ -354,17 +354,18 @@ struct CAFFE2_API Type {
virtual Tensor & squeeze_(Tensor & self) const = 0;
virtual Tensor & squeeze_(Tensor & self, int64_t dim) const = 0;
virtual Tensor sspaddmm(const Tensor & self, const Tensor & mat1, const Tensor & mat2, Scalar beta, Scalar alpha) const = 0;
virtual Tensor stft(const Tensor & self, int64_t n_fft, int64_t hop_length, int64_t win_length, const Tensor & window, bool normalized, bool onesided) const = 0;
virtual Tensor stft(const Tensor & self, int64_t n_fft, c10::optional<int64_t> hop_length, c10::optional<int64_t> win_length, const Tensor & window, bool normalized, bool onesided) const = 0;
virtual int64_t stride(const Tensor & self, int64_t dim) const = 0;
virtual Tensor sum(const Tensor & self, ScalarType dtype) const = 0;
virtual Tensor sum(const Tensor & self) const = 0;
virtual Tensor sum(const Tensor & self, IntList dim, bool keepdim, ScalarType dtype) const = 0;
virtual Tensor sum(const Tensor & self, IntList dim, bool keepdim) const = 0;
virtual Tensor sum(const Tensor & self, IntList dim, ScalarType dtype) const = 0;
virtual Tensor sum_to_size(const Tensor & self, IntList size) const = 0;
virtual Tensor sqrt(const Tensor & self) const = 0;
virtual Tensor & sqrt_(Tensor & self) const = 0;
virtual Tensor std(const Tensor & self, bool unbiased) const = 0;
virtual Tensor std(const Tensor & self, int64_t dim, bool unbiased, bool keepdim) const = 0;
virtual Tensor std(const Tensor & self, IntList dim, bool unbiased, bool keepdim) const = 0;
virtual Tensor prod(const Tensor & self, ScalarType dtype) const = 0;
virtual Tensor prod(const Tensor & self) const = 0;
virtual Tensor prod(const Tensor & self, int64_t dim, bool keepdim, ScalarType dtype) const = 0;
@ -391,7 +392,7 @@ struct CAFFE2_API Type {
virtual Tensor view_as(const Tensor & self, const Tensor & other) const = 0;
virtual Tensor where(const Tensor & condition, const Tensor & self, const Tensor & other) const = 0;
virtual Tensor norm(const Tensor & self, Scalar p) const = 0;
virtual Tensor norm(const Tensor & self, Scalar p, int64_t dim, bool keepdim) const = 0;
virtual Tensor norm(const Tensor & self, c10::optional<Scalar> p, int64_t dim, bool keepdim) const = 0;
virtual Tensor clone(const Tensor & self) const = 0;
virtual Tensor & resize_as_(Tensor & self, const Tensor & the_template) const = 0;
virtual Tensor pow(const Tensor & self, Scalar exponent) const = 0;
@ -538,7 +539,7 @@ struct CAFFE2_API Type {
virtual std::tuple<Tensor,Tensor> eig(const Tensor & self, bool eigenvectors) const = 0;
virtual std::tuple<Tensor,Tensor,Tensor> svd(const Tensor & self, bool some, bool compute_uv) const = 0;
virtual Tensor cholesky(const Tensor & self, bool upper) const = 0;
virtual Tensor potrs(const Tensor & self, const Tensor & input2, bool upper) const = 0;
virtual Tensor cholesky_solve(const Tensor & self, const Tensor & input2, bool upper) const = 0;
virtual Tensor potri(const Tensor & self, bool upper) const = 0;
virtual std::tuple<Tensor,Tensor> pstrf(const Tensor & self, bool upper, Scalar tol) const = 0;
virtual std::tuple<Tensor,Tensor> qr(const Tensor & self) const = 0;
@ -588,4 +589,4 @@ protected:
} // namespace at
#include "ATen/core/Tensor.h"
#include <ATen/core/Tensor.h>

View File

@ -1,34 +1 @@
#pragma once
#include "ATen/core/TensorImpl.h"
namespace at {
struct CAFFE2_API UndefinedTensorImpl final : public TensorImpl {
public:
// Without this, we get:
// error: identifier "at::UndefinedTensorImpl::_singleton" is undefined in device code
// (ostensibly because the constexpr tricks MSVC into trying to compile this
// function for device as well).
#ifdef _WIN32
static inline TensorImpl * singleton() {
#else
static constexpr inline TensorImpl * singleton() {
#endif
return &_singleton;
}
IntList sizes() const override;
IntList strides() const override;
int64_t size(int64_t d) const override;
int64_t stride(int64_t d) const override;
int64_t dim() const override;
const Storage& storage() const override;
int64_t storage_offset() const override;
private:
UndefinedTensorImpl();
static UndefinedTensorImpl _singleton;
public:
friend struct UndefinedType;
};
} // namespace at
#include <c10/core/UndefinedTensorImpl.h>

View File

@ -2,7 +2,7 @@
#include <unordered_set>
#include <vector>
#include <ATen/core/interned_strings.h>
#include "c10/util/Exception.h"
#include <c10/util/Exception.h>
namespace c10 {
class AliasInfo {

View File

@ -43,6 +43,7 @@ _(aten, _cast_Short) \
_(aten, _cat) \
_(aten, _ceil) \
_(aten, _cholesky_helper) \
_(aten, _cholesky_solve_helper) \
_(aten, _convolution) \
_(aten, _convolution_double_backward) \
_(aten, _convolution_nogroup) \
@ -102,7 +103,6 @@ _(aten, _pack_padded_sequence_backward) \
_(aten, _pad_packed_sequence) \
_(aten, _pdist_backward) \
_(aten, _pdist_forward) \
_(aten, _potrs_helper) \
_(aten, _prod) \
_(aten, _prodall) \
_(aten, _range) \
@ -242,6 +242,7 @@ _(aten, ceil) \
_(aten, celu) \
_(aten, chain_matmul) \
_(aten, cholesky) \
_(aten, cholesky_solve) \
_(aten, chunk) \
_(aten, clamp) \
_(aten, clamp_max) \
@ -523,7 +524,6 @@ _(aten, pixel_shuffle) \
_(aten, poisson) \
_(aten, polygamma) \
_(aten, potri) \
_(aten, potrs) \
_(aten, pow) \
_(aten, prelu) \
_(aten, prelu_backward) \
@ -626,6 +626,7 @@ _(aten, sub) \
_(aten, sub_) \
_(aten, rsub) \
_(aten, sum) \
_(aten, sum_to_size) \
_(aten, svd) \
_(aten, symeig) \
_(aten, t) \
@ -683,6 +684,9 @@ _(aten, unsqueeze) \
_(aten, upsample_bilinear2d) \
_(aten, upsample_bilinear2d_backward) \
_(aten, upsample_bilinear2d_forward) \
_(aten, upsample_bicubic2d) \
_(aten, upsample_bicubic2d_backward) \
_(aten, upsample_bicubic2d_forward) \
_(aten, upsample_linear1d) \
_(aten, upsample_linear1d_backward) \
_(aten, upsample_linear1d_forward) \

View File

@ -11,49 +11,6 @@ C10_DEFINE_TYPED_REGISTRY(
std::unique_ptr,
at::Device);
// First dimension of the array is `bool async`: 0 is sync,
// 1 is async (non-blocking)
static CopyBytesFunction g_copy_bytes[2][COMPILE_TIME_MAX_DEVICE_TYPES]
[COMPILE_TIME_MAX_DEVICE_TYPES];
_CopyBytesFunctionRegisterer::_CopyBytesFunctionRegisterer(
DeviceType fromType,
DeviceType toType,
CopyBytesFunction func_sync,
CopyBytesFunction func_async) {
auto from = static_cast<int>(fromType);
auto to = static_cast<int>(toType);
if (!func_async) {
// default to the sync function
func_async = func_sync;
}
CHECK(
g_copy_bytes[0][from][to] == nullptr &&
g_copy_bytes[1][from][to] == nullptr)
<< "Duplicate registration for device type pair "
<< c10::DeviceTypeName(fromType) << ", " << c10::DeviceTypeName(toType);
g_copy_bytes[0][from][to] = func_sync;
g_copy_bytes[1][from][to] = func_async;
}
void CopyBytes(
size_t nbytes,
const void* src,
Device src_device,
void* dst,
Device dst_device,
bool async) {
auto ptr = g_copy_bytes[async ? 1 : 0][static_cast<int>(src_device.type())]
[static_cast<int>(dst_device.type())];
CAFFE_ENFORCE(
ptr,
"No function found for copying from ",
c10::DeviceTypeName(src_device.type()),
" to ",
c10::DeviceTypeName(dst_device.type()));
ptr(nbytes, src, src_device, dst, dst_device);
}
} // namespace at
namespace caffe2 {

View File

@ -11,6 +11,7 @@
#include <c10/util/typeid.h>
#include <c10/util/Exception.h>
#include <c10/util/Registry.h>
#include <c10/core/CopyBytes.h>
namespace caffe2 {
class Event;
@ -156,39 +157,6 @@ inline std::unique_ptr<at::BaseContext> CreateContext(
} // namespace at
// TODO: move it to a separate file in c10 if possible
namespace at {
using CopyBytesFunction = void (*)(
size_t nbytes,
const void* src,
Device src_device,
void* dst,
Device dst_device);
struct CAFFE2_API _CopyBytesFunctionRegisterer {
_CopyBytesFunctionRegisterer(
DeviceType from,
DeviceType to,
CopyBytesFunction func_sync,
CopyBytesFunction func_async = nullptr);
};
#define REGISTER_COPY_BYTES_FUNCTION(from, to, ...) \
namespace { \
static _CopyBytesFunctionRegisterer C10_ANONYMOUS_VARIABLE( \
g_copy_function)(from, to, __VA_ARGS__); \
}
CAFFE2_API void CopyBytes(
size_t nbytes,
const void* src,
Device src_device,
void* dst,
Device dst_device,
bool async);
} // namespace at
namespace caffe2 {
using at::BaseContext;
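The registrar removed above (relocated to c10/core/CopyBytes.h in this diff) follows a common static-registration pattern; a hedged sketch of how a device pair might hook in, assuming the API keeps the shape shown:
```
#include <cstring>
#include <c10/core/CopyBytes.h> // assumed new home of the API per this diff

namespace at { // so the unqualified registrar name in the macro resolves

static void cpu_to_cpu(size_t nbytes, const void* src, Device,
                       void* dst, Device) {
  std::memcpy(dst, src, nbytes);
}

// Sync variant only: per the registrar's constructor, async copies fall
// back to the sync function when no async copier is supplied. Illustrative
// only — a real build already registers the CPU/CPU pair.
REGISTER_COPY_BYTES_FUNCTION(DeviceType::CPU, DeviceType::CPU, cpu_to_cpu);

} // namespace at
```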

View File

@ -1,4 +1,4 @@
#include "ATen/core/interned_strings.h"
#include <ATen/core/interned_strings.h>
#include <cstdint>
#include <cstring>
#include <iostream>
@ -7,9 +7,9 @@
#include <string>
#include <unordered_map>
#include <vector>
#include "ATen/core/interned_strings_class.h"
#include "c10/util/Exception.h"
#include "c10/util/Optional.h"
#include <ATen/core/interned_strings_class.h>
#include <c10/util/Exception.h>
#include <c10/util/Optional.h>
namespace c10 {

View File

@ -52,14 +52,11 @@ namespace c10 {
_(prim, TupleSlice) \
_(prim, ListConstruct) \
_(prim, ListUnpack) \
_(prim, BoolToTensor) \
_(prim, NumToTensor) \
_(prim, TensorToNum) \
_(prim, ImplicitTensorToNum) \
_(prim, TensorToBool) \
_(prim, IntToFloat) \
_(prim, FloatToInt) \
_(prim, StringToFloat) \
_(prim, Bool) \
_(prim, Int) \
_(prim, Float) \
_(prim, device) \
_(prim, dtype) \
_(prim, shape) \
@ -70,7 +67,6 @@ namespace c10 {
_(prim, AnyDefined) \
_(prim, FusedConcat) \
_(prim, ConstantChunk) \
_(prim, NoneGenerator) \
_(prim, MMTreeReduce) \
_(prim, MMBatchSide) \
_(aten, warn) \
@ -78,6 +74,7 @@ namespace c10 {
_(aten, __round_to_zero_floordiv)\
_(prim, fork) \
_(prim, RaiseException) \
_(prim, Function) \
_(aten, append) \
_(aten, format) \
_(aten, __not__) \
@ -87,6 +84,7 @@ namespace c10 {
_(aten, _set_item) \
_(aten, index_put_) \
_(aten, device) \
_(aten, len) \
FORALL_ATEN_BASE_SYMBOLS(_) \
_(onnx, Add) \
_(onnx, Concat) \

View File

@ -6,8 +6,8 @@
#include <string>
#include <unordered_map>
#include <vector>
#include "ATen/core/interned_strings.h"
#include "c10/util/Exception.h"
#include <ATen/core/interned_strings.h>
#include <c10/util/Exception.h>
namespace c10 {

View File

@ -80,4 +80,8 @@ std::ostream& operator<<(std::ostream & out, const IValue & v) {
#undef TORCH_FORALL_TAGS
void IValue::dump() const {
std::cout << *this << "\n";
}
} // namespace c10

View File

@ -2,8 +2,8 @@
#include <ATen/core/Scalar.h>
#include <ATen/core/Tensor.h>
#include <ATen/core/TensorImpl.h>
#include <ATen/core/UndefinedTensorImpl.h>
#include <c10/core/TensorImpl.h>
#include <c10/core/UndefinedTensorImpl.h>
#include <ATen/core/blob.h>
#include <c10/util/intrusive_ptr.h>
#include <ATen/core/thread_pool.h>
@ -134,6 +134,8 @@ struct CAFFE2_API IValue final {
return *this;
}
void dump() const;
bool isAliasOf(const IValue& rhs) const {
if (this->tag != rhs.tag) {
// Trivially don't alias if the type is different
@ -510,14 +512,49 @@ struct C10_EXPORT ivalue::Future final : c10::intrusive_ptr_target {
}
public:
struct CAFFE2_API FutureError final : public std::exception {
FutureError(std::string&& error_msg_)
: error_msg(std::move(error_msg_)) {}
FutureError() = default;
const char* what() const noexcept override {
return error_msg.c_str();
}
std::string error_msg;
};
/**
* Wait on the future until it completes.
*/
void wait() {
if (completed()) {
return;
}
c10::global_work_queue().workOnTasksUntilCompleted(intrusive_from_this());
std::condition_variable finished;
bool fired = false;
// Add a callback to notify the current thread
// when the current future completes.
addCallback([&] {
std::unique_lock<std::mutex> lock(mutex_);
finished.notify_all();
fired = true;
});
// The current thread will be blocked until the above callback fires.
std::unique_lock<std::mutex> lock(mutex_);
while (!fired) {
finished.wait(lock);
}
AT_ASSERT(completed());
}
/**
* Explicitly mark the future as completed with the output value.
*/
void markCompleted(IValue value) {
{
// This is not to protect completed_ but to create a barrier
@ -528,21 +565,39 @@ struct C10_EXPORT ivalue::Future final : c10::intrusive_ptr_target {
value_ = std::move(value);
}
// There is no need to protect callbacks anymore.
// Once completed_ is set to true, no one can add new callback to the list.
for (auto& callback : callbacks) {
callback();
fireCallbacks();
}
void markCompleted(FutureError&& error_) {
{
// This is not to protect completed_ but to create a barrier
// from possible addCallback() calls
std::unique_lock<std::mutex> lock(mutex_);
AT_ASSERT(!completed());
completed_ = true;
has_error = true;
error = std::move(error_);
}
callbacks.clear();
fireCallbacks();
}
// Get the result of the current future.
IValue value() {
std::unique_lock<std::mutex> lock(mutex_);
AT_ASSERT(completed());
if (has_error) {
throw error;
}
return value_;
}
/**
* Add a callback to the future.
* The callbacks will be executed once the future completes.
* If the future has already completed,
* this function will execute the callback immediately.
*/
void addCallback(std::function<void(void)> callback) {
std::unique_lock<std::mutex> lock(mutex_);
if (completed()) {
@ -558,23 +613,43 @@ struct C10_EXPORT ivalue::Future final : c10::intrusive_ptr_target {
return completed_;
}
std::mutex& get_mutex() {
return mutex_;
}
CAFFE2_API friend std::ostream& operator<<(
std::ostream& out,
const Future& v);
private:
void fireCallbacks() {
AT_ASSERT(completed());
// There is no need to protect callbacks with the lock.
// Once completed_ is set to true, no one can add a new callback to the list.
for (auto& callback : callbacks) {
callback();
}
callbacks.clear();
}
std::mutex mutex_;
IValue value_; // when finished the value
std::atomic_bool completed_ = {false}; // is this future complete
std::vector<std::function<void(void)>> callbacks;
bool has_error = false;
FutureError error;
};
#undef TORCH_FORALL_TAGS
namespace detail {
struct _guarded_unsigned_long_unique_dummy final {
_guarded_unsigned_long_unique_dummy(int64_t){};
};
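// On platforms where `unsigned long` is the same type as uint32_t or
// uint64_t, a DEFINE_TO(unsigned long, ...) below would collide with the
// fixed-width specializations; this dummy type stands in so the alias
// below only names a distinct type when one actually exists.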
using _guarded_unsigned_long = c10::guts::conditional_t<
std::is_same<unsigned long, uint32_t>::value ||
std::is_same<unsigned long, uint64_t>::value,
_guarded_unsigned_long_unique_dummy,
unsigned long>;
} // namespace detail
#define DEFINE_TO(type, method_name) \
template<> \
@ -587,7 +662,16 @@ inline type IValue::to<type>() const & { \
}
DEFINE_TO(at::Tensor, toTensor)
DEFINE_TO(c10::intrusive_ptr<ivalue::Tuple>, toTuple)
DEFINE_TO(float, toDouble)
DEFINE_TO(double, toDouble)
DEFINE_TO(unsigned char, toInt)
DEFINE_TO(signed char, toInt)
DEFINE_TO(unsigned short, toInt)
DEFINE_TO(short, toInt)
DEFINE_TO(int, toInt)
DEFINE_TO(uint32_t, toInt)
DEFINE_TO(uint64_t, toInt)
DEFINE_TO(detail::_guarded_unsigned_long, toInt)
DEFINE_TO(int64_t, toInt)
DEFINE_TO(bool, toBool)
DEFINE_TO(c10::intrusive_ptr<ivalue::DoubleList>, toDoubleList)
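// ---------------------------------------------------------------------------
// [editor's sketch, not part of this diff] Why the _guarded_unsigned_long
// trick above is needed: on platforms where `unsigned long` is the very same
// type as uint32_t or uint64_t, a direct DEFINE_TO(unsigned long, toInt)
// would collide with the uint32_t/uint64_t specialization. Routing it
// through a unique dummy type keeps the set of specializations distinct on
// every platform. All names below are illustrative only.
#include <cstdint>
#include <type_traits>

struct dummy_ulong {
  dummy_ulong(int64_t) {}
};

using guarded_ulong = typename std::conditional<
    std::is_same<unsigned long, uint32_t>::value ||
        std::is_same<unsigned long, uint64_t>::value,
    dummy_ulong,           // alias case: specialize on the dummy instead
    unsigned long>::type;  // distinct type: specialize on it directly

template <typename T> int to();  // primary template, intentionally undefined
template <> int to<uint32_t>() { return 32; }
template <> int to<uint64_t>() { return 64; }
template <> int to<guarded_ulong>() { return 99; }  // never a duplicate

int main() { return to<uint64_t>() == 64 ? 0 : 1; }
// ---------------------------------------------------------------------------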

View File

@ -532,6 +532,9 @@ struct CAFFE2_API FutureType : public SingleElementType<TypeKind::FutureType, Fu
ss << "Future[" << getElementType()->python_str() << "]";
return ss.str();
}
TypePtr createWithContained(std::vector<TypePtr> contained_types) const override {
return create(contained_types.at(0));
}
private:
FutureType(TypePtr elem) : SingleElementType(elem) {}
};
@ -868,7 +871,6 @@ inline TypePtr unshapedType(const TypePtr& type) {
}
inline TypePtr CompleteTensorType::fromNumberType(TypePtr typ) {
AT_ASSERT(typ->isSubtypeOf(NumberType::get()));
if (typ->isSubtypeOf(IntType::get())) {
return CompleteTensorType::create(at::kLong, at::kCPU, {});
} else if (typ->isSubtypeOf(FloatType::get())) {
@ -902,7 +904,6 @@ TypePtr getTypePtr() {
" could not be converted to any of the known types { ",
C10_FORALL_TYPES(TYPE_STR) "}");
#undef TYPE_STR
return nullptr;
}
template<> inline TypePtr getTypePtr<at::Tensor>() { return DynamicType::get(); }
@ -915,7 +916,7 @@ template<> inline TypePtr getTypePtr<std::vector<at::Tensor>>() { return ListTyp
template<> inline TypePtr getTypePtr<std::vector<double>>() { return ListType::ofFloats(); }
template<> inline TypePtr getTypePtr<std::vector<int64_t>>() { return ListType::ofInts(); }
CAFFE2_API TypePtr inferTypeFrom(const IValue& value);
CAFFE2_API TypePtr incompleteInferTypeFrom(const IValue& value);
using TypeEnv = std::unordered_map<std::string, TypePtr>;
struct MatchTypeReturn {

View File

@ -1,16 +1,38 @@
#include "ATen/core/interned_strings_class.h"
// This file is compiled with -O0 because the fully-macro-expanded
// function is huge and only called once at startup.
#include <ATen/core/interned_strings_class.h>
namespace c10 {
namespace {
struct Entry {
const char* const qual_name;
const char* const unqual_name;
const Symbol sym;
const Symbol ns_sym;
};
constexpr Entry entries[] = {
#define SYMBOL_ENTRY(n, s) {#n "::" #s, #s, n::s, namespaces::n},
FORALL_NS_SYMBOLS(SYMBOL_ENTRY)
#undef SYMBOL_ENTRY
};
} // namespace
InternedStrings::InternedStrings()
: sym_to_info_(static_cast<size_t>(_keys::num_symbols)) {
#define REGISTER_SYMBOL(n, s) \
string_to_sym_[#n "::" #s] = n::s; \
sym_to_info_[n::s] = {namespaces::n, #n "::" #s, #s};
FORALL_NS_SYMBOLS(REGISTER_SYMBOL)
#undef REGISTER_SYMBOL
// Instead of a loop, this could be done by expanding the
// assignments directly into FORALL_NS_SYMBOLS, but that would create
// a huge function (thanks to all the std::string constructors and
// operator[]s) which would take several minutes to optimize. A
// static C array of constexpr-constructible structs, by contrast,
// takes no time at all to compile.
for (const auto& entry : entries) {
string_to_sym_[entry.qual_name] = entry.sym;
sym_to_info_[entry.sym] = {
entry.ns_sym, entry.qual_name, entry.unqual_name};
}
}
} // namespace c10
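// ---------------------------------------------------------------------------
// [editor's sketch, not part of this diff] The compile-time pattern adopted
// above, in miniature: a macro expands into a constexpr table once, and a
// small runtime loop fills the map, instead of macro-expanding thousands of
// std::string constructions and map insertions. The symbol list is made up.
#include <iostream>
#include <string>
#include <unordered_map>

#define FORALL_SYMS(_) _(add) _(mul) _(relu)  // stand-in symbol list
enum Sym { add, mul, relu, num_syms };

struct Entry {
  const char* name;
  Sym sym;
};
constexpr Entry entries[] = {
#define SYMBOL_ENTRY(s) {#s, s},
    FORALL_SYMS(SYMBOL_ENTRY)
#undef SYMBOL_ENTRY
};

int main() {
  std::unordered_map<std::string, Sym> string_to_sym;
  for (const auto& e : entries) {  // cheap-to-compile runtime loop
    string_to_sym[e.name] = e.sym;
  }
  std::cout << string_to_sym.at("mul") << '\n';  // prints 1
}
// ---------------------------------------------------------------------------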

View File

@ -65,20 +65,6 @@ void ThreadPool::waitWorkComplete() {
}
}
void ThreadPool::workOnTasksUntilCompleted(
c10::intrusive_ptr<ivalue::Future> future) {
if (future->completed()) {
return;
}
std::condition_variable finished;
future->addCallback([&] { finished.notify_all(); });
std::unique_lock<std::mutex> future_lock(future->get_mutex());
while (!future->completed()) {
finished.wait(future_lock);
}
}
void ThreadPool::main_loop(std::size_t index) {
init_thread();

View File

@ -53,7 +53,7 @@ class CAFFE2_API ThreadPool : public c10::TaskThreadPoolBase {
std::mutex mutex_;
std::condition_variable condition_;
std::condition_variable completed_;
bool running_;
std::atomic_bool running_;
bool complete_;
std::size_t available_;
std::size_t total_;
@ -89,9 +89,6 @@ class CAFFE2_API ThreadPool : public c10::TaskThreadPoolBase {
/// @brief Wait for queue to be empty
void waitWorkComplete();
// @brief Wait for the specific future to finish in the queue
void workOnTasksUntilCompleted(c10::intrusive_ptr<ivalue::Future> future);
protected:
virtual void init_thread() {}

View File

@ -116,7 +116,13 @@ ListTypePtr ListType::ofBools() {
return value;
}
TypePtr inferTypeFrom(const IValue& value) {
// Why "incomplete"? You cannot always completely recover a type from
// an IValue: List[List[int]] and List[List[Tensor]] will both
// report ivalue.isGenericList(), so the element type cannot be recovered.
// The only appropriate place to use this is where you know you are
// only dealing with a subset of objects for which the type can be
// recovered, like in the tracer.
TypePtr incompleteInferTypeFrom(const IValue& value) {
if (value.isTensor()) {
return CompleteTensorType::create(value.toTensor());
} else if (value.isDouble()) {
@ -136,11 +142,11 @@ TypePtr inferTypeFrom(const IValue& value) {
} else if (value.isDoubleList()) {
return ListType::ofFloats();
} else if (value.isTuple()) {
return TupleType::create(fmap(value.toTuple()->elements(), inferTypeFrom));
return TupleType::create(fmap(value.toTuple()->elements(), incompleteInferTypeFrom));
} else if (value.isDevice()) {
return DeviceObjType::get();
}
AT_ASSERTM(false, "Unhandled IValue kind in inferTypeFrom");
AT_ERROR("Type cannot be accurately recovered from this IValue.");
}
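// ---------------------------------------------------------------------------
// [editor's sketch, not part of this diff] Why recovery is "incomplete" in
// the generic-list case: once nested lists are type-erased into a generic
// element, nothing distinguishes List[List[int]] from List[List[Tensor]]
// without probing a live element (impossible for an empty list). Types below
// are illustrative stand-ins, not the real IValue machinery.
#include <iostream>
#include <memory>
#include <vector>

struct Erased {  // stand-in for a type-erased generic IValue element
  virtual ~Erased() = default;
};
template <typename T>
struct Holder : Erased {
  std::vector<T> v;
};

using GenericList = std::vector<std::shared_ptr<Erased>>;

int main() {
  GenericList ints{std::make_shared<Holder<int>>()};
  GenericList floats{std::make_shared<Holder<float>>()};
  GenericList empty;  // element type is unrecoverable here, no matter what
  // All three are just GenericList at this point; the static element type
  // is gone -- hence the tracer-only caveat in the comment above.
  std::cout << ints.size() + floats.size() + empty.size() << '\n';
}
// ---------------------------------------------------------------------------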
c10::optional<TypePtr> unifyTypes(const TypePtr& t1, const TypePtr& t2) {

View File

@ -1,5 +1,5 @@
#pragma once
#include "vec256.h"
#include <ATen/cpu/vec256/vec256.h>
namespace at { namespace vec256 {
@ -10,10 +10,10 @@ inline scalar_t vec_reduce_all(
vec256::Vec256<scalar_t> acc_vec,
int64_t size) {
using Vec = vec256::Vec256<scalar_t>;
scalar_t acc_arr[Vec::size];
scalar_t acc_arr[Vec::size()];
acc_vec.store(acc_arr);
for (int64_t i = 1; i < size; i++) {
scalar_t acc_arr_next[Vec::size];
scalar_t acc_arr_next[Vec::size()];
acc_arr_next[0] = acc_arr[i];
Vec acc_vec_next = Vec::loadu(acc_arr_next);
acc_vec = vec_fun(acc_vec, acc_vec_next);
@ -25,11 +25,11 @@ inline scalar_t vec_reduce_all(
template <typename scalar_t, typename Op>
inline scalar_t reduce_all(const Op& vec_fun, scalar_t* data, int64_t size) {
using Vec = vec256::Vec256<scalar_t>;
if (size < Vec::size)
if (size < Vec::size())
return vec_reduce_all(vec_fun, Vec::loadu(data, size), size);
int64_t d = Vec::size;
int64_t d = Vec::size();
Vec acc_vec = Vec::loadu(data);
for (; d < size - (size % Vec::size); d += Vec::size) {
for (; d < size - (size % Vec::size()); d += Vec::size()) {
Vec data_vec = Vec::loadu(data + d);
acc_vec = vec_fun(acc_vec, data_vec);
}
@ -37,7 +37,7 @@ inline scalar_t reduce_all(const Op& vec_fun, scalar_t* data, int64_t size) {
Vec data_vec = Vec::loadu(data + d, size - d);
acc_vec = Vec::set(acc_vec, vec_fun(acc_vec, data_vec), size - d);
}
return vec_reduce_all(vec_fun, acc_vec, Vec::size);
return vec_reduce_all(vec_fun, acc_vec, Vec::size());
}
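// ---------------------------------------------------------------------------
// [editor's sketch, not part of this diff] A scalar rendering of the loop
// structure above, with a made-up width W standing in for Vec256<T>::size():
// a main loop over full W-wide chunks, then a tail pass over the remaining
// size % W elements (which the real code handles with Vec::set).
#include <cstdio>

constexpr int W = 8;  // hypothetical vector width

float reduce_sum(const float* data, int size) {
  float acc = 0.f;
  int d = 0;
  for (; d < size - (size % W); d += W) {  // same bound as the loops above
    for (int i = 0; i < W; ++i) acc += data[d + i];
  }
  for (; d < size; ++d) {  // tail: fewer than W elements remain
    acc += data[d];
  }
  return acc;
}

int main() {
  float v[19];
  for (int i = 0; i < 19; ++i) v[i] = 1.f;
  std::printf("%g\n", reduce_sum(v, 19));  // prints 19
}
// ---------------------------------------------------------------------------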
template <typename scalar_t, typename MapOp, typename ReduceOp>
@ -47,11 +47,11 @@ inline scalar_t map_reduce_all(
scalar_t* data,
int64_t size) {
using Vec = vec256::Vec256<scalar_t>;
if (size < Vec::size)
if (size < Vec::size())
return vec_reduce_all(red_fun, map_fun(Vec::loadu(data, size)), size);
int64_t d = Vec::size;
int64_t d = Vec::size();
Vec acc_vec = map_fun(Vec::loadu(data));
for (; d < size - (size % Vec::size); d += Vec::size) {
for (; d < size - (size % Vec::size()); d += Vec::size()) {
Vec data_vec = Vec::loadu(data + d);
data_vec = map_fun(data_vec);
acc_vec = red_fun(acc_vec, data_vec);
@ -61,7 +61,7 @@ inline scalar_t map_reduce_all(
data_vec = map_fun(data_vec);
acc_vec = Vec::set(acc_vec, red_fun(acc_vec, data_vec), size - d);
}
return vec_reduce_all(red_fun, acc_vec, Vec::size);
return vec_reduce_all(red_fun, acc_vec, Vec::size());
}
template <typename scalar_t, typename MapOp, typename ReduceOp>
@ -72,15 +72,15 @@ inline scalar_t map2_reduce_all(
const scalar_t* data2,
int64_t size) {
using Vec = vec256::Vec256<scalar_t>;
if (size < Vec::size) {
if (size < Vec::size()) {
Vec data_vec = Vec::loadu(data, size);
Vec data2_vec = Vec::loadu(data2, size);
data_vec = map_fun(data_vec, data2_vec);
return vec_reduce_all(red_fun, data_vec, size);
}
int64_t d = Vec::size;
int64_t d = Vec::size();
Vec acc_vec = map_fun(Vec::loadu(data), Vec::loadu(data2));
for (; d < size - (size % Vec::size); d += Vec::size) {
for (; d < size - (size % Vec::size()); d += Vec::size()) {
Vec data_vec = Vec::loadu(data + d);
Vec data2_vec = Vec::loadu(data2 + d);
data_vec = map_fun(data_vec, data2_vec);
@ -92,7 +92,7 @@ inline scalar_t map2_reduce_all(
data_vec = map_fun(data_vec, data2_vec);
acc_vec = Vec::set(acc_vec, red_fun(acc_vec, data_vec), size - d);
}
return vec_reduce_all(red_fun, acc_vec, Vec::size);
return vec_reduce_all(red_fun, acc_vec, Vec::size());
}
template <typename scalar_t, typename Op>
@ -103,7 +103,7 @@ inline void map(
int64_t size) {
using Vec = vec256::Vec256<scalar_t>;
int64_t d = 0;
for (; d < size - (size % Vec::size); d += Vec::size) {
for (; d < size - (size % Vec::size()); d += Vec::size()) {
Vec output_vec = vec_fun(Vec::loadu(input_data + d));
output_vec.store(output_data + d);
}
@ -122,7 +122,7 @@ inline void map2(
int64_t size) {
using Vec = vec256::Vec256<scalar_t>;
int64_t d = 0;
for (; d < size - (size % Vec::size); d += Vec::size) {
for (; d < size - (size % Vec::size()); d += Vec::size()) {
Vec data_vec = Vec::loadu(input_data + d);
Vec data_vec2 = Vec::loadu(input_data2 + d);
Vec output_vec = vec_fun(data_vec, data_vec2);

View File

@ -1,11 +1,11 @@
#pragma once
#include "intrinsics.h"
#include <ATen/cpu/vec256/intrinsics.h>
#include "vec256_base.h"
#include "vec256_float.h"
#include "vec256_double.h"
#include "vec256_int.h"
#include <ATen/cpu/vec256/vec256_base.h>
#include <ATen/cpu/vec256/vec256_float.h>
#include <ATen/cpu/vec256/vec256_double.h>
#include <ATen/cpu/vec256/vec256_int.h>
#include <algorithm>
#include <cstddef>
@ -15,14 +15,24 @@
namespace at {
namespace vec256 {
// Note [Acceptable use of anonymous namespace in header]
// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Yes, you saw right: this is an anonymous namespace in a header. This header,
// and all of its subheaders, REQUIRE their code to be entirely inlined into
// the compilation unit that uses them. It's important that these functions have
// internal linkage so that kernels for different architectures don't get
// combined during linking. It's sufficient to label free functions "static",
// but class methods must live in an unnamed namespace to have internal linkage
// (since "static" means something different in the context of classes).
namespace {
template <typename T>
std::ostream& operator<<(std::ostream& stream, const Vec256<T>& vec) {
T buf[Vec256<T>::size];
T buf[Vec256<T>::size()];
vec.store(buf);
stream << "vec[";
for (int i = 0; i != Vec256<T>::size; i++) {
for (int i = 0; i != Vec256<T>::size(); i++) {
if (i != 0) {
stream << ", ";
}

View File

@ -6,8 +6,8 @@
#include <type_traits>
#include <bitset>
#include "ATen/Utils.h"
#include "ATen/native/Copy.h"
#include <ATen/Utils.h>
#include <ATen/native/Copy.h>
#include <c10/util/C++17.h>
#if defined(__GNUC__)
@ -20,6 +20,7 @@
namespace at {
namespace vec256 {
// See Note [Acceptable use of anonymous namespace in header]
namespace {
template<size_t n> struct int_of_size;
@ -45,15 +46,49 @@ struct Vec256 {
private:
T values[32 / sizeof(T)] = {0};
public:
static constexpr int size = 32 / sizeof(T);
// Note [constexpr static function to avoid odr-usage compiler bug]
// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Why, you might ask, is size defined to be a static constexpr function,
// rather than a more ordinary 'static constexpr int size;' variable?
// The problem lies in the ODR rules for static constexpr members versus
// static constexpr functions. First, recall that this class (along with all
// of its derivations) lives in an anonymous namespace: it is intended to be
// *completely* inlined at its use-sites, because we need to compile this
// code multiple times for different instruction sets.
//
// Because of this constraint, we CANNOT provide a single definition for
// any static members in this class; since we want to compile the class
// multiple times, there wouldn't actually be any good place to put the
// definition. Now here is the problem: if we ODR-use a static constexpr
// member, we are *obligated* to provide a definition. Without the
// definition, you get a compile error like:
//
// relocation R_X86_64_PC32 against undefined symbol
// `_ZN2at6vec25612_GLOBAL__N_16Vec256IdE4sizeE' can not be used when making
// a shared object; recompile with -fPIC
//
// If this were C++17, we could replace the static constexpr variable with
// an inline variable, which doesn't require a separate definition. But we
// are not on C++17. So the next best thing is to replace the member with a
// static constexpr (and therefore inline) function, which is not subject
// to the definition requirement either.
//
// Also, technically according to the C++ standard, we don't have to define
// a constexpr variable if we never odr-use it. But it seems that some
// versions of GCC/Clang make buggy determinations of whether or not an
// identifier is odr-used, and in any case it's hard to tell if a variable
// is odr-used or not. So it's best to just cut the problem off at the root.
static constexpr int size() {
return 32 / sizeof(T);
}
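// ---------------------------------------------------------------------------
// [editor's sketch, not part of this diff] The workaround above, reduced to a
// standalone pre-C++17 example. Both structs are made up for illustration.
#include <iostream>

struct WithVariable {
  static constexpr int size = 4;   // odr-used below, so C++11/14 demand a
};                                 // separate out-of-line definition...
constexpr int WithVariable::size;  // ...namely this line; omit it and the
                                   // link fails with the relocation error
                                   // quoted in the note above.

struct WithFunction {
  static constexpr int size() { return 4; }  // implicitly inline: no
};                                           // out-of-line definition needed

void take(const int& v) { std::cout << v << '\n'; }  // forces an odr-use

int main() {
  take(WithVariable::size);    // links only thanks to the definition above
  take(WithFunction::size());  // always links; the pattern this diff adopts
}
// ---------------------------------------------------------------------------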
Vec256() {}
Vec256(T val) {
for (int i = 0; i != size; i++) {
for (int i = 0; i != size(); i++) {
values[i] = val;
}
}
template<typename... Args,
typename = c10::guts::enable_if_t<(sizeof...(Args) == size)>>
typename = c10::guts::enable_if_t<(sizeof...(Args) == size())>>
Vec256(Args... vals) {
values = { vals... };
}
@ -61,7 +96,7 @@ public:
static Vec256<T> blend(const Vec256<T>& a, const Vec256<T>& b) {
int64_t mask = mask_;
Vec256 vec;
for (int64_t i = 0; i < size; i++) {
for (int64_t i = 0; i < size(); i++) {
if (mask & 0x01) {
vec[i] = b[i];
} else {
@ -74,9 +109,9 @@ public:
static Vec256<T> blendv(const Vec256<T>& a, const Vec256<T>& b,
const Vec256<T>& mask) {
Vec256 vec;
int_same_size_t<T> buffer[size];
int_same_size_t<T> buffer[size()];
mask.store(buffer);
for (int64_t i = 0; i < size; i++) {
for (int64_t i = 0; i < size(); i++) {
if (buffer[i] & 0x01)
{
vec[i] = b[i];
@ -88,14 +123,14 @@ public:
}
static Vec256<T> arange(T base = static_cast<T>(0), T step = static_cast<T>(1)) {
Vec256 vec;
for (int64_t i = 0; i < size; i++) {
for (int64_t i = 0; i < size(); i++) {
vec.values[i] = base + i * step;
}
return vec;
}
static Vec256<T> set(const Vec256<T>& a, const Vec256<T>& b, int64_t count = size) {
static Vec256<T> set(const Vec256<T>& a, const Vec256<T>& b, int64_t count = size()) {
Vec256 vec;
for (int64_t i = 0; i < size; i++) {
for (int64_t i = 0; i < size(); i++) {
if (i < count) {
vec[i] = b[i];
} else {
@ -114,7 +149,7 @@ public:
std::memcpy(vec.values, ptr, count * sizeof(T));
return vec;
}
void store(void* ptr, int count = size) const {
void store(void* ptr, int count = size()) const {
std::memcpy(ptr, values, count * sizeof(T));
}
const T& operator[](int idx) const {
@ -125,14 +160,14 @@ public:
}
Vec256<T> map(T (*f)(T)) const {
Vec256<T> ret;
for (int64_t i = 0; i != size; i++) {
for (int64_t i = 0; i != size(); i++) {
ret[i] = f(values[i]);
}
return ret;
}
Vec256<T> abs() const {
Vec256<T> ret;
for (int64_t i = 0; i < size; i++) {
for (int64_t i = 0; i < size(); i++) {
ret[i] = values[i] < 0 ? -values[i] : values[i];
}
return ret;
@ -214,7 +249,7 @@ public:
}
Vec256<T> pow(const Vec256<T> &exp) const {
Vec256<T> ret;
for (int64_t i = 0; i < size; i++) {
for (int64_t i = 0; i < size(); i++) {
ret[i] = std::pow(values[i], exp[i]);
}
return ret;
@ -222,7 +257,7 @@ public:
#define DEFINE_COMP(binary_pred) \
Vec256<T> operator binary_pred(const Vec256<T> &other) const { \
Vec256<T> vec; \
for (int64_t i = 0; i != size; i++) { \
for (int64_t i = 0; i != size(); i++) { \
if (values[i] binary_pred other.values[i]) { \
std::memset(static_cast<void*>(vec.values + i), 0xFF, sizeof(T)); \
} else { \
@ -242,7 +277,7 @@ public:
template <class T> Vec256<T> inline operator+(const Vec256<T> &a, const Vec256<T> &b) {
Vec256<T> c = Vec256<T>();
for (int i = 0; i != Vec256<T>::size; i++) {
for (int i = 0; i != Vec256<T>::size(); i++) {
c[i] = a[i] + b[i];
}
return c;
@ -250,7 +285,7 @@ template <class T> Vec256<T> inline operator+(const Vec256<T> &a, const Vec256<T
template <class T> Vec256<T> inline operator-(const Vec256<T> &a, const Vec256<T> &b) {
Vec256<T> c = Vec256<T>();
for (int i = 0; i != Vec256<T>::size; i++) {
for (int i = 0; i != Vec256<T>::size(); i++) {
c[i] = a[i] - b[i];
}
return c;
@ -258,7 +293,7 @@ template <class T> Vec256<T> inline operator-(const Vec256<T> &a, const Vec256<T
template <class T> Vec256<T> inline operator*(const Vec256<T> &a, const Vec256<T> &b) {
Vec256<T> c = Vec256<T>();
for (int i = 0; i != Vec256<T>::size; i++) {
for (int i = 0; i != Vec256<T>::size(); i++) {
c[i] = a[i] * b[i];
}
return c;
@ -266,7 +301,7 @@ template <class T> Vec256<T> inline operator*(const Vec256<T> &a, const Vec256<T
template <class T> Vec256<T> inline operator/(const Vec256<T> &a, const Vec256<T> &b) __ubsan_ignore_float_divide_by_zero__ {
Vec256<T> c = Vec256<T>();
for (int i = 0; i != Vec256<T>::size; i++) {
for (int i = 0; i != Vec256<T>::size(); i++) {
c[i] = a[i] / b[i];
}
return c;
@ -276,7 +311,7 @@ template <class T> Vec256<T> inline operator/(const Vec256<T> &a, const Vec256<T
// either input is a NaN.
template <class T> Vec256<T> inline maximum(const Vec256<T> &a, const Vec256<T> &b) {
Vec256<T> c = Vec256<T>();
for (int i = 0; i != Vec256<T>::size; i++) {
for (int i = 0; i != Vec256<T>::size(); i++) {
c[i] = (a[i] > b[i]) ? a[i] : b[i];
if (std::is_floating_point<T>::value && std::isnan(a[i])) {
// If either input is NaN, propagate a NaN.
@ -301,7 +336,7 @@ inline T maximum(const T& a, const T& b) {
// either input is a NaN.
template <class T> Vec256<T> inline minimum(const Vec256<T> &a, const Vec256<T> &b) {
Vec256<T> c = Vec256<T>();
for (int i = 0; i != Vec256<T>::size; i++) {
for (int i = 0; i != Vec256<T>::size(); i++) {
c[i] = (a[i] < b[i]) ? a[i] : b[i];
if (std::is_floating_point<T>::value && std::isnan(a[i])) {
// If either input is NaN, propagate a NaN.
@ -327,8 +362,8 @@ inline T minimum(const T& a, const T& b) {
template <class T> \
Vec256<T> inline operator op(const Vec256<T> &a, const Vec256<T> &b) { \
using iT = int_same_size_t<T>; \
iT buffer[Vec256<T>::size]; \
for (int64_t i = 0; i != Vec256<T>::size; i++) { \
iT buffer[Vec256<T>::size()]; \
for (int64_t i = 0; i != Vec256<T>::size(); i++) { \
auto a_val = a[i]; \
auto b_val = b[i]; \
iT *i_a_ptr = reinterpret_cast<iT*>(&a_val); \
@ -350,7 +385,7 @@ inline T fmadd(const T& a, const T& b, const T& c) {
template <int64_t scale = 1, typename T = void>
c10::guts::enable_if_t<scale == 1 || scale == 2 || scale == 4 || scale == 8, Vec256<T>>
inline gather(T const* base_addr, const Vec256<int_same_size_t<T>>& vindex) {
static constexpr int size = Vec256<T>::size;
static constexpr int size = Vec256<T>::size();
int_same_size_t<T> index_arr[size];
vindex.store(static_cast<void*>(index_arr));
T buffer[size];
@ -364,7 +399,7 @@ template <int64_t scale = 1, typename T = void>
c10::guts::enable_if_t<scale == 1 || scale == 2 || scale == 4 || scale == 8, Vec256<T>>
inline mask_gather(const Vec256<T>& src, T const* base_addr,
const Vec256<int_same_size_t<T>>& vindex, Vec256<T>& mask) {
static constexpr int size = Vec256<T>::size;
static constexpr int size = Vec256<T>::size();
T src_arr[size];
int_same_size_t<T> mask_arr[size]; // use int type so we can logical and
int_same_size_t<T> index_arr[size];
@ -392,7 +427,7 @@ namespace {
template<typename dst_t, typename src_t>
struct CastImpl {
static inline Vec256<dst_t> apply(const Vec256<src_t>& src) {
src_t src_arr[Vec256<src_t>::size];
src_t src_arr[Vec256<src_t>::size()];
src.store(static_cast<void*>(src_arr));
return Vec256<dst_t>::loadu(static_cast<const void*>(src_arr));
}
@ -412,7 +447,7 @@ Vec256<dst_t> cast(const Vec256<src_t>& src) {
template <typename T>
inline Vec256<int_same_size_t<T>> convert_to_int_of_same_size(const Vec256<T>& src) {
static constexpr int size = Vec256<T>::size;
static constexpr int size = Vec256<T>::size();
T src_arr[size];
src.store(static_cast<void*>(src_arr));
int_same_size_t<T> buffer[size];
@ -427,9 +462,9 @@ inline Vec256<int_same_size_t<T>> convert_to_int_of_same_size(const Vec256<T>& s
// returns: Vec256<float> = {a0, a1, a2, a3, a4, a5, a6, a7}
// Vec256<float> = {b0, b1, b2, b3, b4, b5, b6, b7}
template <typename T>
inline c10::guts::enable_if_t<Vec256<T>::size % 2 == 0, std::pair<Vec256<T>, Vec256<T>>>
inline c10::guts::enable_if_t<Vec256<T>::size() % 2 == 0, std::pair<Vec256<T>, Vec256<T>>>
deinterleave2(const Vec256<T>& a, const Vec256<T>& b) {
static constexpr int size = Vec256<T>::size;
static constexpr int size = Vec256<T>::size();
static constexpr int half_size = size / 2;
T a_arr[size];
T b_arr[size];
@ -453,9 +488,9 @@ deinterleave2(const Vec256<T>& a, const Vec256<T>& b) {
// returns: Vec256<float> = {a0, b0, a1, b1, a2, b2, a3, b3}
// Vec256<float> = {a4, b4, a5, b5, a6, b6, a7, b7}
template <typename T>
inline c10::guts::enable_if_t<Vec256<T>::size % 2 == 0, std::pair<Vec256<T>, Vec256<T>>>
inline c10::guts::enable_if_t<Vec256<T>::size() % 2 == 0, std::pair<Vec256<T>, Vec256<T>>>
interleave2(const Vec256<T>& a, const Vec256<T>& b) {
static constexpr int size = Vec256<T>::size;
static constexpr int size = Vec256<T>::size();
static constexpr int half_size = size / 2;
T a_arr[size];
T b_arr[size];
@ -475,7 +510,9 @@ interleave2(const Vec256<T>& a, const Vec256<T>& b) {
template <typename src_T, typename dst_T>
void convert(const src_T *src, dst_T *dst, int64_t n) {
#pragma unroll
#ifndef _MSC_VER
# pragma unroll
#endif
for (int64_t i = 0; i < n; i++) {
*dst = static_cast<dst_T>(
static_cast<at::native::inter_copy_type_t<dst_T>>(*src));

View File

@ -1,13 +1,14 @@
#pragma once
#include "intrinsics.h"
#include "vec256_base.h"
#include <ATen/cpu/vec256/intrinsics.h>
#include <ATen/cpu/vec256/vec256_base.h>
#if defined(__AVX__) && !defined(_MSC_VER)
#include <sleef.h>
#endif
namespace at {
namespace vec256 {
// See Note [Acceptable use of anonymous namespace in header]
namespace {
#if defined(__AVX__) && !defined(_MSC_VER)
@ -16,7 +17,9 @@ template <> class Vec256<double> {
private:
__m256d values;
public:
static constexpr int size = 4;
static constexpr int size() {
return 4;
}
Vec256() {}
Vec256(__m256d v) : values(v) {}
Vec256(double val) {
@ -40,7 +43,7 @@ public:
return Vec256<double>(base, base + step, base + 2 * step, base + 3 * step);
}
static Vec256<double> set(const Vec256<double>& a, const Vec256<double>& b,
int64_t count = size) {
int64_t count = size()) {
switch (count) {
case 0:
return a;
@ -53,22 +56,22 @@ public:
}
return b;
}
static Vec256<double> loadu(const void* ptr, int64_t count = size) {
if (count == size)
static Vec256<double> loadu(const void* ptr, int64_t count = size()) {
if (count == size())
return _mm256_loadu_pd(reinterpret_cast<const double*>(ptr));
__at_align32__ double tmp_values[size];
__at_align32__ double tmp_values[size()];
std::memcpy(
tmp_values,
reinterpret_cast<const double*>(ptr),
count * sizeof(double));
return _mm256_load_pd(tmp_values);
}
void store(void* ptr, int count = size) const {
if (count == size) {
void store(void* ptr, int count = size()) const {
if (count == size()) {
_mm256_storeu_pd(reinterpret_cast<double*>(ptr), values);
} else if (count > 0) {
double tmp_values[size];
double tmp_values[size()];
_mm256_storeu_pd(reinterpret_cast<double*>(tmp_values), values);
std::memcpy(ptr, tmp_values, count * sizeof(double));
}
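// ---------------------------------------------------------------------------
// [editor's sketch, not part of this diff] The partial load/store pattern
// used above, without intrinsics: when fewer than a full vector of elements
// is available, memcpy through an aligned temporary instead of reading or
// writing out of bounds. kWidth and load_partial are made-up names.
#include <cstdio>
#include <cstring>

constexpr int kWidth = 4;  // stand-in for Vec256<double>::size()

void load_partial(const double* src, int count, double out[kWidth]) {
  alignas(32) double tmp[kWidth] = {0};           // plays __at_align32__
  std::memcpy(tmp, src, count * sizeof(double));  // only `count` are valid
  std::memcpy(out, tmp, sizeof(tmp));             // stands in for the full
}                                                 // -width _mm256_load_pd

int main() {
  double data[2] = {1.5, 2.5};
  double v[kWidth];
  load_partial(data, 2, v);
  std::printf("%g %g %g %g\n", v[0], v[1], v[2], v[3]);  // 1.5 2.5 0 0
}
// ---------------------------------------------------------------------------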
@ -252,7 +255,7 @@ template <>
void convert(const double* src, double* dst, int64_t n) {
int64_t i;
#pragma unroll
for (i = 0; i <= (n - Vec256<double>::size); i += Vec256<double>::size) {
for (i = 0; i <= (n - Vec256<double>::size()); i += Vec256<double>::size()) {
_mm256_storeu_pd(dst + i, _mm256_loadu_pd(src + i));
}
#pragma unroll

View File

@ -1,13 +1,14 @@
#pragma once
#include "intrinsics.h"
#include "vec256_base.h"
#include <ATen/cpu/vec256/intrinsics.h>
#include <ATen/cpu/vec256/vec256_base.h>
#if defined(__AVX__) && !defined(_MSC_VER)
#include <sleef.h>
#endif
namespace at {
namespace vec256 {
// See Note [Acceptable use of anonymous namespace in header]
namespace {
#if defined(__AVX__) && !defined(_MSC_VER)
@ -16,7 +17,9 @@ template <> class Vec256<float> {
private:
__m256 values;
public:
static constexpr int size = 8;
static constexpr int size() {
return 8;
}
Vec256() {}
Vec256(__m256 v) : values(v) {}
Vec256(float val) {
@ -43,7 +46,7 @@ public:
base + 4 * step, base + 5 * step, base + 6 * step, base + 7 * step);
}
static Vec256<float> set(const Vec256<float>& a, const Vec256<float>& b,
int64_t count = size) {
int64_t count = size()) {
switch (count) {
case 0:
return a;
@ -64,19 +67,19 @@ public:
}
return b;
}
static Vec256<float> loadu(const void* ptr, int64_t count = size) {
if (count == size)
static Vec256<float> loadu(const void* ptr, int64_t count = size()) {
if (count == size())
return _mm256_loadu_ps(reinterpret_cast<const float*>(ptr));
__at_align32__ float tmp_values[size];
__at_align32__ float tmp_values[size()];
std::memcpy(
tmp_values, reinterpret_cast<const float*>(ptr), count * sizeof(float));
return _mm256_loadu_ps(tmp_values);
}
void store(void* ptr, int64_t count = size) const {
if (count == size) {
void store(void* ptr, int64_t count = size()) const {
if (count == size()) {
_mm256_storeu_ps(reinterpret_cast<float*>(ptr), values);
} else if (count > 0) {
float tmp_values[size];
float tmp_values[size()];
_mm256_storeu_ps(reinterpret_cast<float*>(tmp_values), values);
std::memcpy(ptr, tmp_values, count * sizeof(float));
}
@ -260,7 +263,7 @@ template <>
void convert(const float* src, float* dst, int64_t n) {
int64_t i;
#pragma unroll
for (i = 0; i <= (n - Vec256<float>::size); i += Vec256<float>::size) {
for (i = 0; i <= (n - Vec256<float>::size()); i += Vec256<float>::size()) {
_mm256_storeu_ps(dst + i, _mm256_loadu_ps(src + i));
}
#pragma unroll

View File

@ -1,7 +1,7 @@
#pragma once
#include "intrinsics.h"
#include "vec256_base.h"
#include <ATen/cpu/vec256/intrinsics.h>
#include <ATen/cpu/vec256/vec256_base.h>
namespace at {
namespace vec256 {
@ -22,7 +22,9 @@ public:
template <>
struct Vec256<int64_t> : public Vec256i {
static constexpr int size = 4;
static constexpr int size() {
return 4;
}
using Vec256i::Vec256i;
Vec256() {}
Vec256(int64_t v) { values = _mm256_set1_epi64x(v); }
@ -31,7 +33,7 @@ struct Vec256<int64_t> : public Vec256i {
}
template <int64_t mask>
static Vec256<int64_t> blend(Vec256<int64_t> a, Vec256<int64_t> b) {
__at_align32__ int64_t tmp_values[size];
__at_align32__ int64_t tmp_values[size()];
a.store(tmp_values);
if (mask & 0x01)
tmp_values[0] = _mm256_extract_epi64(b.values, 0);
@ -51,7 +53,7 @@ struct Vec256<int64_t> : public Vec256i {
return Vec256<int64_t>(base, base + step, base + 2 * step, base + 3 * step);
}
static Vec256<int64_t>
set(Vec256<int64_t> a, Vec256<int64_t> b, int64_t count = size) {
set(Vec256<int64_t> a, Vec256<int64_t> b, int64_t count = size()) {
switch (count) {
case 0:
return a;
@ -68,15 +70,15 @@ struct Vec256<int64_t> : public Vec256i {
return _mm256_loadu_si256(reinterpret_cast<const __m256i*>(ptr));
}
static Vec256<int64_t> loadu(const void* ptr, int64_t count) {
__at_align32__ int64_t tmp_values[size];
__at_align32__ int64_t tmp_values[size()];
std::memcpy(tmp_values, ptr, count * sizeof(int64_t));
return loadu(tmp_values);
}
void store(void* ptr, int count = size) const {
if (count == size) {
void store(void* ptr, int count = size()) const {
if (count == size()) {
_mm256_storeu_si256(reinterpret_cast<__m256i*>(ptr), values);
} else if (count > 0) {
__at_align32__ int64_t tmp_values[size];
__at_align32__ int64_t tmp_values[size()];
_mm256_storeu_si256(reinterpret_cast<__m256i*>(tmp_values), values);
std::memcpy(ptr, tmp_values, count * sizeof(int64_t));
}
@ -117,7 +119,9 @@ struct Vec256<int64_t> : public Vec256i {
template <>
struct Vec256<int32_t> : public Vec256i {
static constexpr int size = 8;
static constexpr int size() {
return 8;
}
using Vec256i::Vec256i;
Vec256() {}
Vec256(int32_t v) { values = _mm256_set1_epi32(v); }
@ -139,7 +143,7 @@ struct Vec256<int32_t> : public Vec256i {
base + 4 * step, base + 5 * step, base + 6 * step, base + 7 * step);
}
static Vec256<int32_t>
set(Vec256<int32_t> a, Vec256<int32_t> b, int32_t count = size) {
set(Vec256<int32_t> a, Vec256<int32_t> b, int32_t count = size()) {
switch (count) {
case 0:
return a;
@ -164,15 +168,15 @@ struct Vec256<int32_t> : public Vec256i {
return _mm256_loadu_si256(reinterpret_cast<const __m256i*>(ptr));
}
static Vec256<int32_t> loadu(const void* ptr, int32_t count) {
__at_align32__ int32_t tmp_values[size];
__at_align32__ int32_t tmp_values[size()];
std::memcpy(tmp_values, ptr, count * sizeof(int32_t));
return loadu(tmp_values);
}
void store(void* ptr, int count = size) const {
if (count == size) {
void store(void* ptr, int count = size()) const {
if (count == size()) {
_mm256_storeu_si256(reinterpret_cast<__m256i*>(ptr), values);
} else if (count > 0) {
__at_align32__ int32_t tmp_values[size];
__at_align32__ int32_t tmp_values[size()];
_mm256_storeu_si256(reinterpret_cast<__m256i*>(tmp_values), values);
std::memcpy(ptr, tmp_values, count * sizeof(int32_t));
}
@ -212,13 +216,17 @@ template <>
void convert(const int32_t *src, float *dst, int64_t n) {
int64_t i;
// int32_t and float have the same size
#pragma unroll
for (i = 0; i <= (n - Vec256<int32_t>::size); i += Vec256<int32_t>::size) {
#ifndef _MSC_VER
# pragma unroll
#endif
for (i = 0; i <= (n - Vec256<int32_t>::size()); i += Vec256<int32_t>::size()) {
auto input_vec = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(src + i));
auto output_vec = _mm256_cvtepi32_ps(input_vec);
_mm256_storeu_ps(reinterpret_cast<float*>(dst + i), output_vec);
}
#pragma unroll
#ifndef _MSC_VER
# pragma unroll
#endif
for (; i < n; i++) {
dst[i] = static_cast<float>(src[i]);
}
@ -228,13 +236,17 @@ template <>
void convert(const int32_t *src, double *dst, int64_t n) {
int64_t i;
// int32_t has half the size of double
#pragma unroll
for (i = 0; i <= (n - Vec256<double>::size); i += Vec256<double>::size) {
#ifndef _MSC_VER
# pragma unroll
#endif
for (i = 0; i <= (n - Vec256<double>::size()); i += Vec256<double>::size()) {
auto input_128_vec = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
auto output_vec = _mm256_cvtepi32_pd(input_128_vec);
_mm256_storeu_pd(reinterpret_cast<double*>(dst + i), output_vec);
}
#pragma unroll
#ifndef _MSC_VER
# pragma unroll
#endif
for (; i < n; i++) {
dst[i] = static_cast<double>(src[i]);
}
@ -242,7 +254,9 @@ void convert(const int32_t *src, double *dst, int64_t n) {
template <>
struct Vec256<int16_t> : public Vec256i {
static constexpr int size = 16;
static constexpr int size() {
return 16;
}
using Vec256i::Vec256i;
Vec256() {}
Vec256(int16_t v) { values = _mm256_set1_epi16(v); }
@ -255,7 +269,7 @@ struct Vec256<int16_t> : public Vec256i {
}
template <int64_t mask>
static Vec256<int16_t> blend(Vec256<int16_t> a, Vec256<int16_t> b) {
__at_align32__ int16_t tmp_values[size];
__at_align32__ int16_t tmp_values[size()];
a.store(tmp_values);
if (mask & 0x01)
tmp_values[0] = _mm256_extract_epi16(b.values, 0);
@ -303,7 +317,7 @@ struct Vec256<int16_t> : public Vec256i {
base + 12 * step, base + 13 * step, base + 14 * step, base + 15 * step);
}
static Vec256<int16_t>
set(Vec256<int16_t> a, Vec256<int16_t> b, int16_t count = size) {
set(Vec256<int16_t> a, Vec256<int16_t> b, int16_t count = size()) {
switch (count) {
case 0:
return a;
@ -344,15 +358,15 @@ struct Vec256<int16_t> : public Vec256i {
return _mm256_loadu_si256(reinterpret_cast<const __m256i*>(ptr));
}
static Vec256<int16_t> loadu(const void* ptr, int16_t count) {
__at_align32__ int16_t tmp_values[size];
__at_align32__ int16_t tmp_values[size()];
std::memcpy(tmp_values, ptr, count * sizeof(int16_t));
return loadu(tmp_values);
}
void store(void* ptr, int count = size) const {
if (count == size) {
void store(void* ptr, int count = size()) const {
if (count == size()) {
_mm256_storeu_si256(reinterpret_cast<__m256i*>(ptr), values);
} else if (count > 0) {
__at_align32__ int16_t tmp_values[size];
__at_align32__ int16_t tmp_values[size()];
_mm256_storeu_si256(reinterpret_cast<__m256i*>(tmp_values), values);
std::memcpy(ptr, tmp_values, count * sizeof(int16_t));
}
@ -454,11 +468,11 @@ Vec256<int16_t> inline operator*(const Vec256<int16_t>& a, const Vec256<int16_t>
template <typename T>
Vec256<T> inline intdiv_256(const Vec256<T>& a, const Vec256<T>& b) {
T values_a[Vec256<T>::size];
T values_b[Vec256<T>::size];
T values_a[Vec256<T>::size()];
T values_b[Vec256<T>::size()];
a.store(values_a);
b.store(values_b);
for (int i = 0; i != Vec256<T>::size; i++) {
for (int i = 0; i != Vec256<T>::size(); i++) {
values_a[i] /= values_b[i];
}
return Vec256<T>::loadu(values_a);

View File

@ -1,9 +1,9 @@
#pragma once
#include "ATen/Config.h"
#include "ATen/Parallel.h"
#include "ATen/cpu/vec256/functional.h"
#include "ATen/cpu/vec256/vec256.h"
#include <ATen/Config.h>
#include <ATen/Parallel.h>
#include <ATen/cpu/vec256/functional.h>
#include <ATen/cpu/vec256/vec256.h>
// This header implements various unary operations using a MKL VML style
// interface.

View File

@ -7,7 +7,11 @@
namespace at { namespace cuda {
template <typename T, int size>
#ifndef __HIP_PLATFORM_HCC__
struct alignas(16) Array {
#else
struct Array {
#endif
T data[size];
C10_HOST_DEVICE T operator[](int i) const {

View File

@ -1,9 +1,9 @@
#pragma once
#include "ATen/cuda/detail/IndexUtils.cuh"
#include "ATen/TensorUtils.h"
#include "THC/THCAtomics.cuh"
#include "ATen/cuda/CUDAContext.h"
#include <ATen/cuda/detail/IndexUtils.cuh>
#include <ATen/TensorUtils.h>
#include <THC/THCAtomics.cuh>
#include <ATen/cuda/CUDAContext.h>
#include <math.h>
@ -271,7 +271,7 @@ template <typename Op,
typename IndexType,
int ADims,
int step>
#if __CUDA_ARCH__ >= 350
#if __CUDA_ARCH__ >= 350 || defined __HIP_PLATFORM_HCC__
__launch_bounds__(AT_APPLY_THREADS_PER_BLOCK, AT_APPLY_BLOCKS_PER_SM)
#endif
__global__ void kernelPointwiseApply1(detail::TensorInfo<scalar, IndexType> a,
@ -355,7 +355,7 @@ template <typename Op,
typename IndexType,
int ADims, int BDims,
int step>
#if __CUDA_ARCH__ >= 350
#if __CUDA_ARCH__ >= 350 || defined __HIP_PLATFORM_HCC__
__launch_bounds__(AT_APPLY_THREADS_PER_BLOCK, AT_APPLY_BLOCKS_PER_SM)
#endif
__global__ void
@ -464,7 +464,7 @@ template <typename Op,
typename IndexType,
int ADims, int BDims, int CDims,
int step>
#if __CUDA_ARCH__ >= 350
#if __CUDA_ARCH__ >= 350 || defined __HIP_PLATFORM_HCC__
__launch_bounds__(AT_APPLY_THREADS_PER_BLOCK, AT_APPLY_BLOCKS_PER_SM)
#endif
__global__ void
@ -587,7 +587,7 @@ template <typename Op,
typename IndexType,
int ADims, int BDims, int CDims, int DDims,
int step>
#if __CUDA_ARCH__ >= 350
#if __CUDA_ARCH__ >= 350 || defined __HIP_PLATFORM_HCC__
__launch_bounds__(AT_APPLY_THREADS_PER_BLOCK, AT_APPLY_BLOCKS_PER_SM)
#endif
__global__ void

View File

@ -1,5 +1,7 @@
#include "ATen/cuda/CUDAContext.h"
#include "THC/THCGeneral.hpp"
#include <ATen/cuda/CUDAContext.h>
#include <THC/THCGeneral.hpp>
#include <ATen/cuda/CUDAConfig.h>
namespace at { namespace cuda {

View File

@ -1,16 +1,16 @@
#pragma once
#include "ATen/core/ATenGeneral.h"
#include "ATen/Context.h"
#include "ATen/cuda/CUDAStream.h"
#include "ATen/cuda/Exceptions.h"
#include "c10/cuda/CUDAFunctions.h"
#include <ATen/core/ATenGeneral.h>
#include <ATen/Context.h>
#include <c10/cuda/CUDAStream.h>
#include <ATen/cuda/Exceptions.h>
#include <c10/cuda/CUDAFunctions.h>
#include <cstdint>
#include "cuda_runtime_api.h"
#include "cusparse.h"
#include "cublas_v2.h"
#include <cuda_runtime_api.h>
#include <cusparse.h>
#include <cublas_v2.h>
namespace at {
namespace cuda {

View File

@ -1,8 +1,8 @@
#pragma once
#include "ATen/cuda/Exceptions.h"
#include <ATen/cuda/Exceptions.h>
#include "cuda.h"
#include <cuda.h>
namespace at {
namespace cuda {

View File

@ -1,13 +1,13 @@
#pragma once
#include "ATen/cuda/ATenCUDAGeneral.h"
#include "ATen/cuda/CUDAContext.h"
#include "ATen/cuda/CUDAStream.h"
#include "ATen/cuda/CUDAGuard.h"
#include "ATen/cuda/Exceptions.h"
#include "c10/util/Exception.h"
#include <ATen/cuda/ATenCUDAGeneral.h>
#include <ATen/cuda/CUDAContext.h>
#include <c10/cuda/CUDAStream.h>
#include <c10/cuda/CUDAGuard.h>
#include <ATen/cuda/Exceptions.h>
#include <c10/util/Exception.h>
#include "cuda_runtime_api.h"
#include <cuda_runtime_api.h>
#include <cstdint>
#include <utility>
@ -35,7 +35,7 @@ struct AT_CUDA_API CUDAEvent {
~CUDAEvent() {
try {
if (is_created_) {
at::cuda::CUDAGuard device_guard(static_cast<int16_t>(device_index_));
CUDAGuard device_guard(static_cast<int16_t>(device_index_));
cudaEventDestroy(event_);
}
} catch (...) { /* No throw */ }
@ -74,7 +74,7 @@ struct AT_CUDA_API CUDAEvent {
// Note: cudaEventRecord must be called on the same device as the stream.
void record(const CUDAStream& stream) {
at::cuda::CUDAGuard guard(static_cast<int16_t>(stream.device_index()));
CUDAGuard guard(static_cast<int16_t>(stream.device_index()));
if (is_created_) {
AT_ASSERT(device_index_ == stream.device_index());
@ -92,7 +92,7 @@ struct AT_CUDA_API CUDAEvent {
// The event has no actual GPU resources associated with it.
void block(const CUDAStream& stream) {
if (is_created_) {
at::cuda::CUDAGuard guard(static_cast<int16_t>(stream.device_index()));
CUDAGuard guard(static_cast<int16_t>(stream.device_index()));
AT_CUDA_CHECK(cudaStreamWaitEvent(stream, event_, 0));
}
}

View File

@ -1,8 +1,8 @@
#include "ATen/Config.h"
#include <ATen/Config.h>
#include "ATen/CUDAGenerator.h"
#include "ATen/Context.h"
#include "THCTensorRandom.h"
#include <ATen/CUDAGenerator.h>
#include <ATen/Context.h>
#include <THC/THCTensorRandom.h>
#include <stdexcept>
// There is only one CUDAGenerator instance. Calls to seed(), manualSeed(),

View File

@ -1,7 +1,7 @@
#pragma once
#include <c10/util/ArrayRef.h>
#include <ATen/cuda/CUDAStream.h>
#include <c10/cuda/CUDAStream.h>
#include <ATen/cuda/CUDAContext.h>
#include <vector>

View File

@ -1,7 +1,7 @@
#pragma once
#include "ATen/Tensor.h"
#include "ATen/core/Half.h"
#include <ATen/Tensor.h>
#include <ATen/core/Half.h>
#include <cuda.h>
#include <cuda_runtime.h>

View File

@ -1,7 +1,7 @@
#pragma once
#include "c10/util/Exception.h"
#include "c10/cuda/CUDAException.h"
#include <c10/util/Exception.h>
#include <c10/cuda/CUDAException.h>
// See Note [CHECK macro]
#define AT_CUDNN_CHECK(EXPR) \

View File

@ -9,11 +9,11 @@
#include <ATen/native/cuda/CuFFTPlanCache.h>
#include <c10/util/Exception.h>
#include "THC/THC.h"
#include <THC/THC.h>
#include <THC/THCGeneral.hpp>
#if AT_CUDNN_ENABLED()
#include "ATen/cudnn/cudnn-wrapper.h"
#include <ATen/cudnn/cudnn-wrapper.h>
#endif
#include <cuda.h>

View File

@ -1,4 +1,5 @@
#include "IndexUtils.cuh"
#include <ATen/cuda/detail/IndexUtils.cuh>
#include <vector>
namespace at {
namespace cuda {
@ -35,7 +36,7 @@ within the next one.
*/
bool maybeOverlappingIndices(const Tensor& t) {
/* Extract size/stride arrays; only consider size >1 dims. */
SizeAndStride *info = (SizeAndStride *)alloca(sizeof(SizeAndStride) * t.dim());
std::vector<SizeAndStride> info(t.dim());
int dims = t.dim();
int nonSize1Dims = 0;
for (int i = 0; i < dims; ++i) {
@ -58,7 +59,7 @@ bool maybeOverlappingIndices(const Tensor& t) {
}
/* Ascending order (innermost dimension in sorted view is at [0]) */
qsort(info, nonSize1Dims, sizeof(SizeAndStride), compareSizeAndStride);
qsort(info.data(), nonSize1Dims, sizeof(SizeAndStride), compareSizeAndStride);
for (int i = 0; i < (nonSize1Dims - 1); ++i) {
if (((info[i].size - 1) * info[i].stride) >= info[i + 1].stride) {
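// ---------------------------------------------------------------------------
// [editor's sketch, not part of this diff] The shape of the fix above: a raw
// alloca buffer becomes a std::vector, and .data() feeds the unchanged qsort
// call. Values below are made up; only the before/after pattern matters.
#include <cstdlib>
#include <vector>

struct SizeAndStride {
  long size;
  long stride;
};

int compareSizeAndStride(const void* a, const void* b) {
  auto* x = static_cast<const SizeAndStride*>(a);
  auto* y = static_cast<const SizeAndStride*>(b);
  return (x->stride < y->stride) ? -1 : (x->stride > y->stride);
}

int main() {
  std::vector<SizeAndStride> info{{4, 12}, {3, 1}, {2, 4}};  // was alloca'd
  std::qsort(info.data(), info.size(), sizeof(SizeAndStride),
             compareSizeAndStride);  // ascending by stride, as in the diff
  return info[0].stride == 1 ? 0 : 1;
}
// ---------------------------------------------------------------------------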

View File

@ -1,7 +1,7 @@
#pragma once
#include "ATen/ATen.h"
#include "TensorInfo.cuh"
#include <ATen/ATen.h>
#include <ATen/cuda/detail/TensorInfo.cuh>
#include <limits>
namespace at {

View File

@ -1,6 +1,6 @@
#pragma once
#include "ATen/ATen.h"
#include <ATen/ATen.h>
// Contents of this file are copied from THCUNN/common.h for the ease of porting
// THCUNN functions into ATen.

View File

@ -9,20 +9,20 @@
/// OffsetCalculator calculates the offset in bytes of a linear index for NARGS
/// operands that share the same shape, but may have different strides.
template <int NARGS>
template <int NARGS, typename index_t = uint32_t>
struct OffsetCalculator {
static constexpr int MAX_DIMS = 25;
// The offset for each argument (in bytes). Wrapper around fixed-size array.
using offset_type = at::cuda::Array<uint32_t, NARGS>;
using offset_type = at::cuda::Array<index_t, NARGS>;
OffsetCalculator(int dims, const int64_t* sizes, const int64_t* const* strides) : dims(dims) {
AT_CHECK(dims <= MAX_DIMS, "tensor has too many (>25) dims");
for (int i = 0; i < MAX_DIMS; ++i) {
if (i < dims) {
sizes_[i] = IntDivider<uint32_t>(sizes[i]);
sizes_[i] = IntDivider<index_t>(sizes[i]);
} else {
sizes_[i] = IntDivider<uint32_t>(1);
sizes_[i] = IntDivider<index_t>(1);
}
for (int arg = 0; arg < NARGS; arg++) {
strides_[i][arg] = i < dims ? strides[arg][i] : 0;
@ -30,7 +30,7 @@ struct OffsetCalculator {
}
}
C10_HOST_DEVICE offset_type get(uint32_t linear_idx) const {
C10_HOST_DEVICE offset_type get(index_t linear_idx) const {
offset_type offsets;
#pragma unroll
for (int arg = 0; arg < NARGS; arg++) {
@ -55,6 +55,6 @@ struct OffsetCalculator {
}
int dims;
IntDivider<uint32_t> sizes_[MAX_DIMS];
uint32_t strides_[MAX_DIMS][NARGS];
IntDivider<index_t> sizes_[MAX_DIMS];
index_t strides_[MAX_DIMS][NARGS];
};
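// ---------------------------------------------------------------------------
// [editor's sketch, not part of this diff] What OffsetCalculator computes,
// simplified: decompose a linear index into per-dimension coordinates and
// accumulate each operand's stride-weighted byte offset. The real class
// replaces '%' and '/' with IntDivider and fixes MAX_DIMS; everything here
// is an illustrative stand-in.
#include <array>
#include <cstdint>
#include <cstdio>

template <int NARGS>
std::array<uint32_t, NARGS> offsets(uint32_t linear, int dims,
                                    const uint32_t* sizes,
                                    const uint32_t (*strides)[NARGS]) {
  std::array<uint32_t, NARGS> off{};
  for (int d = 0; d < dims; ++d) {
    uint32_t coord = linear % sizes[d];  // IntDivider does this without '%'
    linear /= sizes[d];
    for (int a = 0; a < NARGS; ++a) off[a] += coord * strides[d][a];
  }
  return off;
}

int main() {
  uint32_t sizes[2] = {3, 4};            // a 3x4 tensor, innermost dim first
  uint32_t strides[2][1] = {{4}, {12}};  // bytes: element = 4, row = 12
  auto o = offsets<1>(7, 2, sizes, strides);
  std::printf("%u\n", o[0]);  // linear 7 -> coords (1, 2) -> 1*4 + 2*12 = 28
}
// ---------------------------------------------------------------------------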

View File

@ -1,4 +1,4 @@
#include "Descriptors.h"
#include <ATen/cudnn/Descriptors.h>
#include <ATen/ATen.h>

View File

@ -1,12 +1,12 @@
#pragma once
#include "ATen/cuda/CUDAContext.h"
#include "ATen/cuda/Exceptions.h"
#include <ATen/cuda/CUDAContext.h>
#include <ATen/cuda/Exceptions.h>
#include "cudnn-wrapper.h"
#include <ATen/cudnn/cudnn-wrapper.h>
#include <ATen/ATen.h>
#include <ATen/TensorUtils.h>
#include "ATen/cuda/ATenCUDAGeneral.h"
#include <ATen/cuda/ATenCUDAGeneral.h>
#include <cuda.h>
#if CUDNN_VERSION < 7000

View File

@ -1,6 +1,6 @@
#include "Handle.h"
#include <ATen/cudnn/Handle.h>
#include "ATen/cuda/Exceptions.h"
#include <ATen/cuda/Exceptions.h>
#include <unordered_map>
#include <mutex>

View File

@ -1,7 +1,7 @@
#pragma once
#include "cudnn-wrapper.h"
#include "ATen/cuda/ATenCUDAGeneral.h"
#include <ATen/cudnn/cudnn-wrapper.h>
#include <ATen/cuda/ATenCUDAGeneral.h>
namespace at { namespace native {

Some files were not shown because too many files have changed in this diff.