Commit Graph

641 Commits

Author SHA1 Message Date
e855317370 Make dirichlet_grad and standard_gamma match ATen declarations (#4722)
The Python function has an underscore (_) prefix so the C++
IMPLEMENT_STATELESS call should have an underscore prefix as well.
2018-01-18 16:49:18 -05:00
1061d7970d Move broadcast and broadcast_coalesced to C++ 2018-01-18 11:16:45 +01:00
57549b7e44 Bind functions with out= arguments in VariableType (#4565)
This adds overrides in VariableType for the xxx_out ATen functions and
implements Python bindings. There is no support for automatic
differentiation. If any of the inputs (or outputs) requires grad, then the
function will throw an exception unless it's running in "no-grad" mode.

The bindings for calling torch.xxx functions on Variables are moved to a
different object. Previously, they were static method on VariableBase.
This change prevents users from accidentally calling static methods as if
they were instance methods.
2018-01-17 18:27:42 -05:00
f4a75deccf Fix the inconsistency of polygamma on Tensor and Variable, for issue #4466 (#4527)
* Fix the inconsistency of `polygamma` on Tensor and Variable.

Signed-off-by: HE, Tao <sighingnow@gmail.com>

* Regression test for #4466, polygamma works on variables.

Signed-off-by: HE, Tao <sighingnow@gmail.com>

* Add macro IMPLEMENT_STATELESS_SWAP to dispatch stateless methods on Variables correctly.

When call stateless methods with more than one arguments and the `self` comes second,
the `self` argument needs to be swapped to the first position before dispatching.

The macro `IMPLEMENT_STATELESS_ADDXX` is still reserved for deprecated `add**`
methods.

Signed-off-by: HE, Tao <sighingnow@gmail.com>
2018-01-09 10:39:09 -05:00
35abc4efa2 Add low-precision digamma() and polygamma() functions (#4399) 2018-01-02 11:53:23 +01:00
e519ef5337 Adding torch.expm1() and its inplace function (#4350) 2017-12-28 18:56:03 +09:00
658d4c7ea8 allow optional int tensor 2017-12-24 03:08:28 +08:00
5f7c5502b8 Further improvements to ATen convolution (#4287)
- Rename THNN convolution to have thnn_ prefix.
- Propagate CuDNN benchmark and deterministic to at::Context
- Add 'convolution', 'convNd' and 'conv_transposeNd' native wrappers, with defaults
  The conv_transposeNd wrappers are updated to have the same argument
  order as Python.
- torch.nn.functional directly dispatches to the native wrappers
- Make it possible to turn off tracing for some native wrappers, so I don't
  have to write symbolics for all the functions above
- Spectral ops can now make use of CuDNN convolution if possible
- Better commentary on cudnn_batch_norm
- Turn on DCE for all JIT tests.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-12-21 13:03:43 -05:00
9bf5e40dfa Refactor cudnn code layout / make build more robust. (#4201)
* Refactor cudnn code layout / make build more robust.

When I previously moved cuDNN into ATen, I wasn't too familiar with the
ATen native function directory layout, and so I did a number of
suboptimal things.  This commit fixes those problems.

- If NO_CUDA was set but cuDNN is installed on your system, we'd incorrectly
  assume that CUDNN was enabled, to hilarious effect.

- We now distinguish between cudnn implementation files and cudnn
  native function files.  The native files now live in ATen/native/cudnn,
  and are *unconditionally compiled*, even when we are not building with cuDNN.
  This means that we can unconditionally declare cudnn functions in yaml
  and they are always available, even if they are broken.  The cuDNN specific
  files live in 'cudnn', they are *never* installed, and they are used
  purely for implementation purposes.  I had to add stub implementations of
  all ATen functions to achieve this.

- I had written headers for at::native functions manually, but codegen
  will generate them for me automatically.  So I deleted the headers.
  That lets me get rid of some header install logic as well.

- There's a new note about ATen preprocessor philosophy.
2017-12-18 16:47:57 -05:00
d605058212 Replace Variable.volatile with torch.no_grad() (#3970)
This removes volatile from Variable. The functionality is mostly
replaced by a global (thread-local) flag, which is controlled by
torch.set_grad_enabled() and the context manager torch.no_grad().

In C++, the flag is exposed through GradMode::is_enabled() and GradMode::set_enabled()

Fixes #3627
2017-12-18 15:46:13 -05:00
0876bab8b7 Support CPU Apply in ATen and implement standard_gamma using it (#4161)
* Support CPU Apply directly in ATen and implement standard_gamma using it.

Main changes in this PR:
1) Added a TH_APPLY-style templatized function for CPU apply calls (currently only 2 and 3 tensor argument
versions are supported, but more are easy to add).  In fact, this is basically identical to TH_APPLY, except
it uses ATen functions and the API is a template instead of a macro.  The template takes an operation that
is performed on the data (and an indicator to signal early termination); i.e. you don't need to know that
x_data is a pointer to the current data location of x.

2) Refactors the ATen dispatch code to easily generate dispatch code for different subsets of the scalar types.
This is in preference to the template_scalar path, which requires valid specialization of each scalar type.  Valid
specializations are  particularly annoying with CUDA because you most likely can't put the specializations
in a header so need to write some sort of for-all-scalar-type macro to get the correct specializations.
Currently, we only generate dispatch_all (all scalar types, the equivalent existed already), and
dispatch_cpu_floating_types (which is used by standard_gamma).

3) Implements standard_gamma using the above changes (this is an arbitrary choice, it was the latest
apply macro to be committed).  The forward is bound via Declarations.yaml,
the backward via the Apply template, and then they are hooked together in derivatives.yaml.  This eliminates
needing to change TH at all going forward, which means one can write idiomatic C++ instead of the TH-style macros
(e.g. TH_MATH_NAME).

* Generate Dispatch code with nicer spacing.

* Small cleanups.

* Fix typo.

* Add TODOs for changing macros, remove dead code.

* Use a lambda function.

* Get rid of early exit.

* Rename Scalar,ScalarType template parameters to CScalar.

* Reorder _standard_gamma_grad parameters.

* Add comments explaining calling convention.

* Don't generate Dispatch.h anymore.

* Get rid of backend specific checks in dispatch.

* Fix empty/scalar check.
2017-12-18 15:45:01 -05:00
ee98e7a82e Implement Dirichlet and Beta distributions (#4117) 2017-12-18 19:11:37 +01:00
d8c5f2ae21 Fix a bug where from_dlpack failes if cuda is not initialized. (#4182) 2017-12-14 21:54:36 -05:00
787b9c5202 Propagate CuDNN enabled to ATen library. (#4104)
This is not currently used by anything, but eventually ATen
will need to make decisions about whether or not to use
CuDNN functions or not, which means we need to propagate
this variable to ATen.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-12-14 11:29:25 -05:00
05ebd21a36 Implement reparameterized gradient for Gamma sampler (#3978) 2017-12-11 03:32:15 -08:00
1c96809cf8 Bind cauchy_, exponential_, normal_, uniform_ functions to THPVariable. (#3945)
* Bind cauchy_, exponential_, normal_, uniform_ functions to THPVariable.

Also changes the error messages around Generator parser; previously, you'd get an error
like: torch._C.Generator is not a torch.Generator; now the check is proper but returns
that only None is supported.

* Support passing Generators to ATen Variable-bound methods.

This involves changing THPGenerator to have an at::Generator rather than a THGenerator.
TH getRNGState, setRNGState are still called directly because they are not bound from ATen yet;
they should probably be on the Generators and return (opaque) GenerateState objects.

* Fix default values.

* Properly use THRandom_initialSeed.

* update standard gamma to use new default generator.
2017-12-07 14:34:51 -08:00
d0cabbde74 Implement Variable.from_numpy (#4043)
Implements from_numpy using ATen tensors. Variable.from_numpy is a
convenient placeholder for the variant that returns Variables until we
merge Tensor and Variable.

The behavior is slightly changed:

 - from_numpy() on an empty array now returns an empty tensor instead of
   throwing an exception. The shape may not be preserved.
 - CharTensor(ndarray) used to throw an exception. It now copies the
   ndarray. Copying is implemented via ATen toType.
2017-12-06 14:08:56 -05:00
165d0897e4 Implement distributions.Gamma (#3841) 2017-12-02 01:10:08 +01:00
1c0fbd27a1 CuDNN bindings rewrite (into ATen) (#3666)
* Comprehensive rewrite of Torch CuDNN bindings / a bit of ATen infra

The executive summary is that this moves the torch/csrc/cudnn
library into ATen, adding a number of new cudnn_ methods to ATen
for batchnorm, convolution, affine grid generator and grid sampler.

ATen infra changes:

- TensorGeometry was moved to ATen
- TensorGeometry was modified to make its interface resemble that of
  Tensor; in particular, sizes is no longer a field, it's a method.
- AT_CUDA_ENABLED macro is set via ATen/Config.h header which is
  generated at cmake configure time.
  Fixes https://github.com/zdevito/ATen/issues/168
- Change AT_CUDA_ENABLED macro to be a function macro, so that we
  error if it is not defined
- Introduce a new TensorArg class, which is a Tensor plus a little
  metadata.  This helps us give good error messages when checking
  dimensions/shapes of tensors.
  Fixes https://github.com/zdevito/ATen/issues/169
- Also introduce a TensorGeometryArg class, for when you don't
  need the actual tensor data (which is most of the time.)
- Add ATen/Check.h, which contains a number of utility functions
  for testing shapes, types and devices of input tensors.  This
  will be particulary useful for native methods, which don't get
  code generated input testing code.  These functions take a
  'CheckedFrom' argument, at the moment just a string, which
  specifies some extra information about what function was
  doing the actual checking; this greatly improves error messages.
    - Many check functions take initializer lists, which let you
      test that all tensors have some property.  This API is
      peculiar, in that we IGNORE undefined tensors in this case.
      This is handled by filterDefined.
- Add AT_CUDNN_ENABLED macro
- CuDNN linking from ATen was improved; for example, we now actually
  add the CuDNN headers to our include path.
- Add some missing override specifiers to some methods
- We now actually build tests with CUDA functionality accessible
  (previously, AT_CUDA_ENABLED was not defined, meaning that
  the headers were missing all CUDA-only functionality.)
- Native functions now support giving explicit names to return
  outputs in yaml.  This makes it possible to hook into the NN
  autogenerated derivatives codepath using native functions.

CuDNN rewrite changes:

- torch/csrc/cudnn now uses ATen (rather than passing around
  THVoidTensor) and lives in ATen.  This lets us remove tensorPointer
  shenanigans.  The functions are exposed to ATen as native functions
  described in aten/src/ATen/cudnn/cuDNN.yaml
- ATen now builds and links against CuDNN when enabled.  The cmake
  package script was taken from Caffe2.
- Some header reorganization was done to help reduce dependencies
  on headers (this reorg is no longer used but I've kept it)
- Rename CHECK to CUDNN_CHECK
- Rip out old shape/type testing code in favor of modern ATen/Check.h
  interface using TensorArg.  In many cases, increase the robustness of
  the checking code.
- Change the inputs of the public facing functions, so that they can
  be bound by ATen
  - Delete THCState*; this is retrieved from the global ATen context
  - Delete cudnnHandle_t, this is retrieved from the global Handles.h
  - Delete cudnnDataType_t, this is retrieved from the Tensor type
  - Delete Convolution class, instead its constituent arguments are
    passed individually
- Change functions to return tensors, rather than take an appropriately
  sized output tensor as an input.
- Redo how transposed convolution / backward convolution is implemented
  (knock on effect of returning tensors).  Previously it was assumed
  that you would always pass an appropriately sized output tensor, but
  we don't want to do this anymore.  For backwards, we instead give
  the desired output tensor (input, really) size, because that is
  readily available.  For *transposed* convolution, however, we take
  output_padding, and otherwise do the shape calculation.
- Redo how legacy group convolution is implemented (knock on effect from
  porting cudnn to ATen.)  Previously, group convolution was implemented
  by manually constructing sizes and strides and then outputting
  appropriate, with macros switching between individual groups and
  all-at-once based on CuDNN version.  Now, the code looks exactly what
  you'd expect: there's a top-level wrapping function that supports
  group convolution no matter the version of CuDNN, and a low-level
  wrapper which supports only what CuDNN supports.  The top-level
  function conditions on CuDNN version, and invokes the low-level
  interface 1 or n times.
- There is now a debugging printer for tensor descriptors.
- Convolution struct is replaced with ConvolutionArgs, which is not
  part of the public API but is used internally to conveniently
  pass around all of the arguments needed for Convolution.
- Add some constexprs for well-known dimensions, reduce amount of
  magic numbers in code.
- Put 'deterministic' in to ConvParams.  Fixes #3659
- Lots more comments.
- Some pessimizations, in the name of code clarity:
  - The descriptors are initialized on every invocation of convolution
    forward/backward.  Previously, the descriptors were cached, so that
    you didn't have to initialize them again on backwards.  This is
    difficult to support in the ATen interface so I didn't support it.
  - Legacy group convolution initializes its workspace for *every* group
    it performs.  I did not feel motivated to fix this because the
    legacy codepath is already quite slow.
- Affine grid generator and grid sampler automatically call contiguous
  on their arguments as necessary.
- Batchnorm input checking is greatly beefed up, it now checks for
  the following input characteristics:
    - Definedness
    - GPU location
    - Type
    - Contiguity
    - Size

PyTorch binding code changes

- batchnorm now uses consistent var/data naming
- batchnorm and convolution make use of new ATen bindings
- Affine grid generator and grid sampler make use of ATen CuDNN
  bindings via derivatives.yaml.  This means I had to restructure
  the code a little, since the THNN bindings still go through
  a legacy Python class.
- I fixed some warnings:
  - s/friend class/friend struct/ on InterpreterStateImpl
  - Removed pessimizing move 'detached' in torch/csrc/autograd/variable.cpp
  - Removed unused pack_list on Scalar

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

GCC 4.8 buildfix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Add TensorGeometry to ATen.h

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

CUDNN_CHECK

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Update TODO comment

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Delete return in cudnn_grid_sampler

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

s/cudnnSetStreamToCurrent/setCuDNNStreamToCurrent/g

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Don't allocate a new vector when filtering defined.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Remove Check overloads, convert to pass references.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Some more microbenchmarking.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-30 23:06:58 -05:00
1661370ac5 Signal handling in DataLoader workers; Timeout option (#3474) 2017-11-29 23:52:14 +01:00
4bce69be22 Implement Variable.storage() (#3765)
This still uses THPStorage, but avoids touching THPTensor
2017-11-20 14:18:07 -05:00
aa911939a3 Improve Windows Compatibility (for csrc/scripts) (#2941) 2017-11-08 19:51:35 +01:00
5e382894be add numpy() and from_numpy() to HalfTensor (#2953) 2017-11-08 15:01:29 +01:00
7c0b16c140 Add torch.take and Tensor.put_ (#3263)
* Add torch.take and Tensor.put_

These are similar to numpy.take and numpy.put. The take function allows
you to linearly index into a tensor without viewing it as a 1D tensor
first. The output has the same shape as the indices. The put function
copies value into a tensor also using linear indices.
2017-11-01 06:04:44 -04:00
a65db4e956 Use ATen for torch.cat, torch.addmm, and friends on Variables. (#3286)
This includes some changes to the dispatch code for torch.xxx functions:

 - Since Variable.addmm is an instance-method, the self argument has to
   come first. The dispatch code swaps the first two arguments if
   necessary to suppor the deprecated signatures where 'alpha' or 'beta'
   comes before the 'self' tensor.
 - Delete IMPLEMENT_STATELESS_REVERSED. These functions require output
   arguments to be passed in using the keyword 'out'. They were meant to
   handle torch.gt(out, a, b), but we haven't allowed that for a while.
2017-10-25 14:27:45 -04:00
f1f64c8d07 Generate autograd functions for NN / more refactors (#3136)
Generate autograd functions for NN and implement more derivatives in derivatives.yaml

A big refactor of gen_variable_type.py
2017-10-19 15:03:26 -04:00
f9ee52efa9 Update DLPack bindings 2017-10-19 10:06:53 -04:00
47beb64b5c Use ATen generator as default CPU generator (#3135)
ATen has it's own default CPU RNG. Use this as the default in PyTorch so
that random functions called through ATen have the same behavior as
random functions called through TensorMethods
2017-10-16 14:22:58 -04:00
756ab3f24f Adding conversion from python tensor to dlpack tensor (#2933) 2017-10-04 08:35:42 -04:00
b3bc5fe302 refactor THCP method defs into cuda/Module.cpp 2017-09-30 13:14:35 -07:00
2b9765ad02 Erf and erfinv (#2799) 2017-09-20 21:23:45 -04:00
8dae433de8 Move JIT passes to a separate directory 2017-09-19 10:53:32 -04:00
2e266837f5 Port TracingState to pybind11, new export() method.
Along the way I added converters for Variable and TracingInput.  Variable should
probably be moved to a more widely known spot.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
594f98ce16 Support multi-stage AutogradClosures 2017-09-05 17:48:55 -04:00
a3fdb281d1 Python wrapper for Node IR using pybind11
Supports almost all of the IR API.
2017-09-05 17:48:55 -04:00
bdcbbeaf68 Remove GlobalTracingState 2017-09-05 17:48:55 -04:00
e186d16e6b Apply JIT optimizations form Python 2017-09-05 17:48:55 -04:00
ea05ac8f41 Move JIT-related files to jit dir. Remove IR interpreter 2017-09-05 17:48:55 -04:00
a797ab9343 Rewrite AST to a new, more functional representation.
Previously, our AST was a DAG, where shared Nodes indicated a computation
should be reused.  This commit rewrites the IR into a new functional
representation which represents sharing explicitly using variable
bindings.

We offer a few justifications for this new style:

1. The new representation is not all that different from the
old one; it is about as easy to construct, and the lack of an
explicit graph doesn't negatively impact our ability to interpret
the graph, since we've chosen, as a matter of design, to NOT have
the IR participate in the actual execution of a graph.

2. The new let-binding representation has an implicit ordering,
which we can use to conveniently keep track of the original order
the trace showed up as.  This automatically gives us a topsort,
and gives us an easier to read textual representation of our
IR:

  %14 = Embedding %11, %0, -1, None, 2, False, False
  %15 = Dropout %14, 0.2, True, False
  %16 = Index %12, 0
  %17 = Index %12, 1
  %18 = Index %13, 0
  %19 = Index %13, 1
  %20 = Index %15, 0
  %21 = Linear %20, %1, %3
  %22 = Linear %16, %2, %4

3. It moves us closer to a Futhark style language
(http://futhark-lang.org/publications/pldi17.pdf).

Major aspects of the diff

- Node is replaced with Expr and Arg, a pair of mutually recursive
  structures which represent our new language.  In BNF, the language
  looks like this:

    a ::= c | %i
    e ::= %i, ... = e
        | PyOp e, ...
        | Ret %i, ...

  Technically, Ret is not actually a return (no control flow is involved),
  it just tuples up a series of tensors (identified by variables).

  One important invariant is that locals are always tensors; they
  are never constants (this is asymmetric with Args.)

- Arguments support Python constants.  This is an important piece because
  many operators take extra Python literals like integers and tuples in
  order to specify extra parameters about how an operator operates.  Adding
  this was essential to getting word_language_model to work.

- As both Expr and Arg have multiple variants, there is new infrastructure
  for doing case on the variants using ExprVisitor and ArgVisitor.  The
  strategy here is adapted from WebAssembly's visitors, although we have
  generalized to permit arbitrary argument forwarding, which is necessary
  to support tail-recursive visitor calls.  TCO is important because our
  interpreter may recurse arbitrarily deep into a stack of nested lets.
  If users wish, they can also manually case on the type tag.

- Tracing is now turned on and off using _tracer_enter/_tracer_exit in
  torch._C.  _tracer_enter accepts a list of variables which are to be
  treated as arguments; _tracer_exit accepts the list of traced variables
  which should be returned when you reexecute the trace, and returns
  the trace expression which can be reexecuted.  GlobalTracingState
  is a global variable which tracks whether or not we are tracing or not.

- You use run_forward to execute a trace on some set of parameters.

- When under tracing, variables keep track, via trace_local, what the
  name of their variables in the IR are.

Here is a simple runner which leaks memory but can be used to JIT models:

  import torch.autograd.function as F
  import torch._C

  def jit(model):
      import types
      real_forward = model.forward
      def forward(self, *args):
          def flatten(x):
              return tuple(F._iter_variables(x))
          if not hasattr(self, "saved_trace"):
              torch._C._tracer_enter(tuple(self.parameters()) + flatten(args))
              out = real_forward(*args)
              self.saved_trace = torch._C._tracer_exit(flatten(out))
              self.saved_outs = out
              return out
          else:
              flat_out = Variable._execution_engine.run_forward(self.saved_trace, tuple(self.parameters()) + flatten(args))
              return F._unflatten(flat_out, self.saved_outs)

Major problems:

- Sanity checking is spotty at best, especially when users pass in variables.

- The interpreter leaks tensor memory from the store.  When we add back def-use
  we should be able to deallocate tensors as soon as we know they are no longer
  necessary.

- The interpreter needs to reach feature parity with the old execution engine.
  From there, we need to see if backwards can be subsumed as well.

- I still have no confidence in having memory managed everything correctly.
  This requires a close look.

- Rather than return an *open* expression as a trace, we should return a
  *lambda* instead, which knows about how many formal parameters it
  requires.

- The IR is not introspectable from Python at the moment, but this is simply a
  matter of implementing all the binding code.

- The tracer is NOT reentrant (you can't trace while you're inside a trace.)
  Furthermore, no sanity checking is done if you try to incorrectly reuse
  things from one trace in another.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
e1b7872fc2 Make it possible to access IR from Python.
Also, add a new trace_fn field to attach forward IR to Variables.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
94b5990201 Add torch.cuda.get_device_name function (#2540) 2017-08-26 15:06:37 -04:00
eb58740651 add ones_like and zeros_like 2017-08-25 14:11:04 -04:00
c000d15058 Properly use Py_RETURN_True, Py_RETURN_False in back compatibility warnings. (#2345) 2017-08-08 21:54:20 -04:00
9d8cff9bc1 initialize aten and pytorch to share the same THCState 2017-07-11 10:35:03 -04:00
714351ff39 Officially enable process-group mode 2017-06-12 22:02:11 -04:00
4f602a52b5 Use THPUtils_assert rather than THError in torch/csrc/Module. 2017-06-11 05:37:59 -04:00
ffd808768e Remove raiseErrors from THTensor functions, have THStorage functions take an error_buffer to return a proper error message while being able to handle memory management correctly from calling function. 2017-06-11 05:37:59 -04:00
177785eecf explicit Ptr constructors, fast transposed copy. 2017-06-11 05:37:59 -04:00
be65f46c76 Add optional warning for backwards incompatible keepdim. Setting torch.utils.backcompat.keepdim.warning.enabled=True will cause Python warnings in the case where the default value of keepdim is used for 1-d reductions.
Also specify keepdim via kwargs in library so these warnings have less
noise.
2017-06-11 05:37:59 -04:00
3556d1b8a3 Add optional warning for backwards incompatible broadcast.
Setting torch.utils.backcompat.broadcast.warning.enabled=True
will cause Python warnings in the case where broadcast occurs
but previously 1-d view style pointwise ops occured.
2017-06-11 05:37:59 -04:00