pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-28 18:54:57 +08:00

Author	SHA1	Message	Date
Orion Reblitz-Richardson	9ec0a2aef4	fbshipit-source-id: ba600fcd2b5cefc7621357bdeb05e24cea02e5af	2018-06-27 04:50:56 -07:00
Peter Goldsborough	290d20b094	Replace max_pool with max_pool_with_indices (#8892 ) * Create max_poolXd_with_indices * Match ATen names in ONNX symbolic	2018-06-26 17:09:30 -07:00
Tongzhou Wang	e6c7b38f94	Cache cufft plans (#8344 ) * cache cufft plans * use an LRU cache * suffix CuFFTParams members with _ * import print_function for py2 * lint * fix potential race; add dummy impl for CPU only builds * cpp formatting; remove nccl makefile change * Use CUDA hooks instead * comments and doc * update the error message * move LRU cache to a separate file and native::detail namespace * update comment * specify NOTE location in CuFFTPlanCache.h * update disabled_features.yaml to make amd ci work * another fix for AMD CI in disabled_features.yaml * Wrap cufft_plan_cache_* methods in __HIP_PLATFORM_HCC__ * improve the notes * lint * revert onnx change * put back inlining for CUFFT_CHECK	2018-06-22 13:02:34 -04:00
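The LRU plan cache named in the commit above can be pictured with a small sketch. This is plain Python rather than the ATen code; `PlanCache`, `make_plan`, and the parameter tuples are hypothetical names chosen for illustration, and the sketch leaves out the locking that the "fix potential race" item implies.

```python
from collections import OrderedDict


class PlanCache:
    """Hypothetical LRU cache: maps FFT parameters to an expensive 'plan' object."""

    def __init__(self, max_size, make_plan):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.max_size = max_size
        self.make_plan = make_plan      # callable that builds a plan from its params
        self._plans = OrderedDict()     # oldest entry first, most recent last

    def get(self, params):
        if params in self._plans:
            # Cache hit: mark the entry as most recently used.
            self._plans.move_to_end(params)
            return self._plans[params]
        # Cache miss: build the plan, then evict the least recently used entry if full.
        plan = self.make_plan(params)
        self._plans[params] = plan
        if len(self._plans) > self.max_size:
            self._plans.popitem(last=False)
        return plan

    def __len__(self):
        return len(self._plans)
```

A caller would key the cache on whatever uniquely determines a plan (sizes, transform type, and so on), so repeated transforms with the same parameters reuse one plan instead of rebuilding it.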
Peter Goldsborough	372d1d6735	Create ATen tensors via TensorOptions (#7869 ) * Created TensorOptions Storing the type in TensorOptions to solve the Variable problem Created convenience creation functions for TensorOptions and added tests Converted zeros to TensorOptions Converted rand to TensorOptions Fix codegen for TensorOptions and multiple arguments Put TensorOptions convenience functions into torch namespace too All factory functions except _like support TensorOptions Integrated with recent JIT changes Support _like functions Fix in place modification Some cleanups and fixes Support sparse_coo_tensor Fix bug in Type.cpp Fix .empty calls in C++ API Fix bug in Type.cpp Trying to fix device placement Make AutoGPU CPU compatible Remove some auto_gpu.h uses Fixing some headers Fix some remaining CUDA/AutoGPU issues Fix some AutoGPU uses Fixes to dispatch_tensor_conversion Reset version of new variables to zero Implemented parsing device strings Random fixes to tests Self review cleanups flake8 Undo changes to variable.{h,cpp} because they fail on gcc7.2 Add [cuda] tag to tensor_options_cuda.cpp Move AutoGPU::set_index_from into .cpp file because Windows is stupid and sucks Fix linker error in AutoGPU.cpp Fix bad merge conflict in native_functions.yaml Fixed caffe2/contrib/aten Fix new window functions added to TensorFactories.cpp * Removed torch::TensorOptions Added code to generate wrapper functions for factory methods Add implicit constructor from Backend to TensorOptions Remove Var() from C++ API and use torch:: functions Use torch:: functions more subtly in C++ API Make AutoGPU::set_device more exception safe Check status directly in DynamicCUDAHooksInterface Rename AutoGPU to DeviceGuard Removed set_requires_grad from python_variables.h and warn appropriately in Variable::set_requires_grad remove python_default_init: self.type() Add back original factory functions, but with deprecation warnings Disable DeviceGuard for a couple functions in ATen Remove print 
statement Fix DeviceGuard construction from undefined tensor Fixing CUDA device compiler issues Moved as many methods as possible into header files Dont generate python functions for deprecated factories Remove merge conflict artefact Fix tensor_options_cuda.cpp Fix set_requires_grad not being checked Fix tensor_new.h TEMPORARILY put some methods in .cpp files to see if it solves issues on windows and mac Fix bug in DeviceGuard.h Missing includes TEMPORARILY moving a few more methods into .cpp to see if it fixes windows Fixing linker errors * Fix up SummaryOps to use new factories Undo device agnostic behavior of DeviceGuard Use -1 instead of optional for default device index Also move DeviceGuard methods into header Fixes around device index after optional -> int32_t switch Fix use of DeviceGuard in new_with_tensor_copy Fix tensor_options.cpp * Fix Type::copy( * Remove test_non_float_params from ONNX tests * Set requires_grad=False in ONNX tests that use ints * Put layout/dtype/device on Tensor * Post merge fixes * Change behavior of DeviceGuard to match AutoGPU * Fix C++ API integration tests * Fix flip functions	2018-06-16 00:40:35 -07:00
Edward Z. Yang	711e5a6ceb	Port THS to ATen. (#8409 ) * Port THS to ATen. The basic structure of the patch: - All kernels in aten/src/THS got rewritten as native functions in aten/src/ATen/native/sparse I took the liberty to rename some of the kernels, opting for a longer, more transparent names than things like 'spaddcmul'. - Instead of holding fields for sparse tensor in the TH C struct THSTensor, they are now held in a C++ class SparseTensorImpl (this explains why I had to do this all in one go; I can't have two reps for sparse tensors!) Along the way, we change a key internal representation invariant: an "empty" sparse tensor has dimI == 1 and dimV == 0 (this is different from dimI == 0 and dimV == 0 we had before); this ensures that we maintain the invariant that dim == dimI + dimV. "Scalar" sparse tensors are made illegal, because there really is no way to properly express them in COO format. - Because we haven't ported THCS or any of the traditional dense TH implementations, there is a new set of adapter functions in native/LegacyBridge.cpp exclusively devoted to deciding whether or not to go to the new native implementation or back to the legacy TH binding (prefixed with th_). The intent is that when everything gets ported, we can delete this file. - I've kept the stubs for all the THS functions, but they now all error if you try to actually call them. Eventually, we should replace these with calls to ATen so that everything keeps working. - I gobbled up SparseMM (SparseMM.cpp is no more). It was tasty. There are some miscellaneous improvements which were needed for other changes in this patch: - There is now AT_FORALL_SCALAR_TYPES_EXCEPT_HALF, which does what it says on the tin. - axpy templated function moved to TH/BlasUtils.h, there's a new macro which lets you easily forward to all of the TH functions. We also expose THBlas_copy. I'm not terribly pleased with these functions but they seem to serve a purpose they need. 
- New method on Tensor to get TensorImpl, unsafeGetTensorImpl - accessor() is now this-const, since const-correctness on Tensor is a lie - New toSparse()/toDense() methods on Type; now you can call these directly without having to manually apply at::toSparse/toDense on the Backend and then running toBackend yourself. Changes to the kernels: - Previously, the whole body of all kernels was compiled for every supported scalar type. In our new implementation, the scalar dispatch has been pushed into the smallest extent which (1) is not in a type loop and (2) requires statically knowing the scalar type. These sites all use AT_DISPATCH_ALL_TYPES. I tried to use lambdas as much as possible, but sometimes it was not possible when an OpenMP pragma was used. - Anywhere we tested if the nDimension of a tensor was zero, we replaced with a test that numel is zero. Because, as we know, nDimension of zero-size tensors in TH is zero, and that's wrong wrong wrong (and not done this way in ATen). Some subtleties: - Places where previously fastget1d was used, I now use a TensorAccessor. However, you have to be careful about grabbing the accessor, because sometimes you will be accessor'ing indices/values and they are empty, which means they will be 1D* ("oh, aren't indices always 2D?" Nope. Nyet.) So, essentially, it is only safe to grab an accessor after you have checked that nnz != 0. All of these shenanigans will go away when we properly support zero-size dimensions. A few places, we test for this case just by wrapping the loop in a conditional on nnz. Some other places this is not so easy, so we instead short-circuit the function with a special case for when nnz == 0 (usually, these implementations are degenerate). - There is a very subtle but important difference between _sparse_get_impl(self)->indices() and self._indices(); the latter may return a view! 
This is because nnz is not guaranteed to match the dimensions of indices/values; you can "truncate" a sparse tensor by setting the nnz. Actually, I think this is not a good idea and we should enforce a stronger invariant, but for this patch I slavishly adhere to the old ways, and as such I have to be very careful if I want to resize something, I had better use the former and not the latter. - I had to reimplement broadcasting by hand (thus the s_ and non-s_ functions in the sparse native files). There is a very important distinction between foo_out and foo_, so it is important that the LegacyBridge function always call to the lower layer, and not try to avoid boilerplate by calling to another LegacyBridge function first. I did NOT put broadcasting in LegacyBridge (even though, ultimately, that's where it must live), because the th_ functions which are invoked from LegacyBridge handle broadcasting themselves, and I don't want to broadcast twice. - Sparse function MUST explicitly specify the Type they dispatch from, otherwise Variable wrapping/unwrapping will not work correctly. If you use _get_sparse_impl, that is sufficient to levy this requirement. - The "has native" tests in LegacyBridge.cpp are not 100%, because some of the functions are mixed dense-sparse functions, and so you can't just say, "Oh, if it's sparse and CPU, call the native sparse implementation." This is handled on a case by case basis. There is some especially complex logic for add(), which has dense-dense, sparse-sparse and dense-sparse implementations. - I added some uses of SparseTensorRef in native_functions.yaml, but you will notice that these are all on native_* functions, and not the actual, top-level functions. So the SparseTensorRef is purely documentary (helping you not call the wrong overload) but there is no magic; we do the wrapping ourselves the hard way. (This is in contrast to the TH binding code which is magical.) Except for _sparse_mask; _sparse_mask is magical. 
- There is a raw_copy_sparse_ method, which is really my way of getting around the fact that copy_ has never been implemented for sparse tensors (even before this patch), but there IS a super secret, internal way of doing these copies that the THS code used, and which I needed to get my hands on when I did this port. We should refactor so that either (a) copy_ does support sparse-sparse copy natively, or (b) we do this other ways. - Irritatingly, I must explicitly resize_as_ before copy_ into a tensor. This was not the case with THTensor_(copy) but I don't have any direct binding that doesn't have this requirement. - For some reason, the sparse tensor constructor accepts a scalar tensor for the values tensor. This is kind of weird because you always need an nnz-dimension. However, the old code supported this and just expanded it into a 1D size 0 tensor; so we need some explicit code to do this. There are maybe a bit more AT_ASSERTs in some of the kernels than is wise. I added them all when I was debugging and was loathe to remove them. Some last mile fixes after this commit went into PR - Move expand outside of dispatch so autograd works (it used to be inside and then we lost all of the recorded broadcasts). - Hack to duplicate the derivatives for our now two definitions TH and native. Mercifully the derivatives are short. - Apparently, TH has a special case to make foo_ functions method only, and if you don't do this the Python arg parsing is wrong. We carefully work around this in the native bindings - Apply DCE to a test_jit case, fixes wobbling due to DCE trick in tracing - Update test_function's output - Some last mile fixes for dispatch confusion in sparse_coo_tensor functions. - New simplified regression test based on failures I saw in ONNX - Increase tolerance on super resolution test - More robust dynamic_type normalization, fixes ONNX bug. The dynamic_type situation is very delicate; probably need to stop having both Scalar and real. 
- Make new_with_tensor_sparse more CUDA safe - Note about CUDA-safety in SparseTensorImpl - Rename dimI/dimV to sparseDims/denseDims. - Make localScalar on SparseTensorImpl work. - Make numel uniformly supported on all types, not just dense types - Add tests for is_nonzero() method (which exercises localScalar) - Disable constant JIT autogenerated tests, which are fragile and broken by this change, but being fixed in a parallel track. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2018-06-15 17:52:21 -04:00
anderspapitto	fcd9af8a25	changes to support ATen code generation inside fbcode (#8397 ) * Back out "Back out "Add support for generating ATen files during fbcode build"" Original commit changeset: 7b8de22d1613 I'm re-sending this diff exactly as it was approved and committed. Fixes to support @mode/opt will be sent separately for ease of review. * Enable building //caffe2:torch with @mode/opt In @mode/opt, python runs out of a PAR, which breaks a lot of assumptions in the code about where templates/ folders live relative to __file__. Rather than introduce hacks with parutil, I simply turn template_path into a parameter for all the relevant functions and thread it through from the top level.	2018-06-12 14:57:29 -07:00
Edward Z. Yang	7ed361a466	Rename SparseTensor to SparseTensorRef. (#8237 ) I want to introduce using SparseTensor = Tensor (as a documentary type alias for Tensor), but the name is already taken. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2018-06-07 11:03:49 -04:00
Seth Hendrickson	e9c33e91d9	Remove python bindings for `torch.slice` (#7924 ) * skip python bindings for slice * remove tests * convert slice test to indexing	2018-05-31 13:42:49 -04:00
Thomas Viehmann	8f97cbcf4e	remove index from python bindings (fixes: #7639 ) (#7690 )	2018-05-19 20:04:07 +02:00
Richard Zou	71626491c4	Add batched linear solver to torch.gesv() (#6100 ) * Add batched linear solver to torch.gesv() Fixes #3164 Picks up from #4502 I moved `gesv` to ATen. Adds bindings for MAGMA's `gesv_batched` function for CUDA. For CPU, runs `THLapack(gesv)` in a for loop. The new function supports arbitrary batch dimensions (and broadcasting of those dimensions). For example, the 4-d tensor `A x B x M x M` should be treated as having batch-size `(A x B)`. The overhead of creating the magma_queue_t is: ~350000 microseconds the first time it's called and ~6 microseconds every time after that. * Tests and docs * Address comments * Address comments * Rebase * Address comments * Fix rebase * Addressed comments * Address comments * Address comments * Addressed comments	2018-05-08 17:06:27 -04:00
Adam Paszke	0829d4502d	Trace size-dependent expressions correctly (#6554 ) This makes the JIT tracer much more robust, by allowing it to record dependencies on tensor sizes. For example, if you were to trace this function def fn(x): return x.view(x.size(1), -1) before this patch, then it would embed the actual value of x.size(1) in the trace as a constant, making it very hard to have e.g. batch size independent traces. Now, this will correctly record the dependency, and will retrieve the size of x at every run.	2018-05-04 10:55:39 +02:00
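The difference the commit above describes can be modeled without PyTorch by tracking shapes only. The sketch below is a toy, not the JIT tracer: `infer_view`, `trace_baked`, and `trace_symbolic` are hypothetical helpers that mimic what `x.view(x.size(1), -1)` does to a shape when `x.size(1)` is either frozen at trace time or re-read on every run.

```python
from math import prod


def infer_view(shape, target):
    """Resolve the single -1 in `target` the way a reshape would (assumes exactly one -1)."""
    numel = prod(shape)
    known = prod(d for d in target if d != -1)
    if known == 0 or numel % known != 0:
        raise ValueError(f"cannot view shape {shape} as {target}")
    return tuple(numel // known if d == -1 else d for d in target)


def trace_baked(example_shape):
    """Old behavior (toy model): x.size(1) is recorded as a constant at trace time."""
    dim1 = example_shape[1]
    return lambda shape: infer_view(shape, (dim1, -1))


def trace_symbolic(example_shape):
    """New behavior (toy model): the trace records 'size of x along dim 1' and looks it up per run.

    `example_shape` is accepted only for symmetry with trace_baked; it is not used.
    """
    return lambda shape: infer_view(shape, (shape[1], -1))


baked = trace_baked((4, 3))
symbolic = trace_symbolic((4, 3))
print(baked((8, 6)))     # reuses the stale size 3 from the example input
print(symbolic((8, 6)))  # reads the size from the actual input
```

With the baked trace, an input whose element count is not divisible by the stale size raises an error, and one that is divisible silently produces a shape that differs from the eager result; the symbolic trace matches eager behavior for any input.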
gchanan	681baa9254	Restore warning to torch.range. (#7194 ) Also, get rid of warning specification in Declarations.cwrap, which currently has no effect.	2018-05-02 21:53:00 -04:00
gchanan	2a18e7c45b	Have python dispatch respect 'auto_gpu' and 'with_gil'. (#7137 )	2018-05-01 13:51:02 -04:00
gchanan	a6bfa16c17	torch.arange: add numpy-style type inference. (#7016 ) * torch.arange: add numpy-style type inference. This is a backwards-compatibility breaking change. * Fix flake8. * Use at::optional. * Remove unneeded header files. * Use reference wrapper. * Update arange for test. * Address review comments.	2018-04-27 15:11:45 -04:00
gchanan	3d907ef78e	Consistently check 'out' variants against specified dtype/layout/device parameters. (#6973 ) We were previously doing this in the most common cases, but not consistently.	2018-04-25 22:46:42 -04:00
Adam Paszke	d26ab68485	Sort declarations when generating Python bindings (#6701 ) * Sort declarations when generating Python bindings This helps resolve ambiguities in argument parsing according to any rules we will need. For now, this allows us to make scalar operations more conservative wrt. argument types, but makes them commutative again. * Fix inconsistencies between mod with tensor and scalar * Fix a stupid mistake	2018-04-18 21:51:35 -04:00
Thomas Viehmann	bd0cc7d364	Implement torch.einsum (fixes #1889 ) (#6307 ) * start at generic trilinear * Implement einsum (fixes #1889) This provides a simple implementation of einsum. It is built on top of the work for computing bilinear (#6110). It uses a naive left-to-right resolution at the moment. Autograd is able to differentiate by itself. The obvious unsupported feature is taking diagonals (einsum('ii->i',(a,)). * add tests and docs * fix flake8 * clean diff * rebase on current master to resolve conflicting String wrapping * clean up after rebase * better commentary in einsum and sumproduct_pair * don't say fixme if it's fixed and rename num_outputs to num_output_dims * adapt python wrapper to use std::string instead of String to avoid typedef at::String * typos and some vector to array conversion * fix accidental python<->python3 change * really fix bad rebase	2018-04-18 13:41:27 +02:00
gchanan	5ed3f3347a	Add dtypes (with reasonable defaults) to sum, prod, cumsum, cumprod. (#6573 ) * Add dtypes (with reasonable defaults) to sum, prod, cumsum, cumprod. This adds optional dtypes to torch.sum, torch.prod, torch.cumsum, torch.cumprod. By default, the dtype is torch.int64 for integral types, and the dtype of the input for floating point types. * Don't use optional<ScalarType>, because the jit can't handle it yet. Instead, we manually build the overloads. This is fairly painful because of default arguments, but should be easy to pull out once the jit can handle optional<ScalarType>. * Fix keepdim with out parameters. * Fix _cudnn_rnn_flatten_weight. * If dtype is provided to an out function, make sure it matches the dtype of the result. * Fix typo.	2018-04-16 23:52:59 -04:00
gchanan	749d51414a	Separate cuda-ness from dtype. (#6470 ) * Separate cuda-ness from dtype. There are no longer torch.cuda.int64, etc; only torch.int64 that correspond to at::ScalarType. At the python arg parser level, the corresponding ATen type is selected from the combination of (ScalarType, Layout, Device). There is also currently unused code in here for support ScalarType in native_functions; this will be used for specifying aggregate types on reduction functions. * Fix test_autograd. * Add defaults to randint_like. * Track is_cuda in py tensor types. * Fix test_sparse. * Fix multiprocessing. * Fix rnn. * Fix test_nn. * Fix flake8.	2018-04-12 14:05:44 -04:00
gchanan	87e369111a	Add string-style devices to all tensors. (#6283 ) * Add string-style devices to all tensors. Previously, tensors only had a 'get_device' method which would throw an exception on a CPU tensor. This made it necessary to if/else code that was meant to be device agnostic. This PR implements the following: 1) Adds a 'device' property to all tensors that returns a string representation of the device for all tensors. For cpu tensors this is 'cpu'. For cuda tensors this is 'cuda:X', where X is the cuda device ordinal. 2) Adds a DeviceSpec class. This is just a helper class for separating device_type and device_index specification and to allow partial specification. For example, you can call DeviceSpec('cuda'), DeviceSpec('cuda:0'), DeviceSpec('cuda', 1). Also has backwards compatibility support for specifying integers, which are treated as cuda devices. DeviceSpecs have the following properties: a) device_type: string representation of the device type (i.e. 'cpu' or 'cuda') b) device_index: integer for the device index (None if not specified) c) cuda_device_index: for backwards compatibility; behaves roughly like `get_device` did previously. I.e. if a function previously took integers for cuda devices, it can now take DeviceSpecs (or strings), and can maintain the old functionality by calling `old_index = DeviceSpec(old).cuda_device_index`. 3) tensor methods and torch. functions that took integer devices can now take integers, strings, or DeviceSpecs. For example: torch.randn((2,3), dtype=torch.cuda.float32, device='cuda:1') TODO in future PRs: A) Split out cuda from dtype so you don't need to overspecify cuda-ness B) We currently only support strings/DeviceSpecs in tensor methods and torch. functions. We should have equivalents torch.cuda.device(...), torch.cuda.device_of, etc. at the torch. level that work on strings/DeviceSpecs * Add deviceInt64 to python arg parser. * device_str. * Remove device_str. * remove device prefix from attributes. 
* Use const char * instead of string. * Move autogpu index out of Device. * comment on is_default. * Rename torch.DeviceSpec to torch.device. * comment. * Fix tests. * Fix flake8. * Fix sparse_coo_tensor parameter name. * Improve error message. * Remove device_ prefix from C++ device object. * Allocate static strings. * Return not implemented from rich compare. * Move torch::Device to THPDevice. * Remove cuda index. * Py_RETURN_NOTIMPLEMENTED doesn't exist in python2.	2018-04-06 15:12:05 -04:00
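The device specification rules listed in the commit above (a bare type, `type:index`, a type plus a separate index, or a bare integer treated as a CUDA device) can be sketched in plain Python. This is an illustration of the described behavior, not the torch implementation: `parse_device` is a hypothetical helper, and the field names follow the `device_type` / `device_index` wording of the commit description, which later items in the same commit shorten.

```python
from typing import NamedTuple, Optional, Union


class DeviceSpec(NamedTuple):
    device_type: str             # 'cpu' or 'cuda'
    device_index: Optional[int]  # None when the index is not specified


def parse_device(spec: Union[str, int], index: Optional[int] = None) -> DeviceSpec:
    """Hypothetical parser for the device forms described in the commit message."""
    if isinstance(spec, int):
        # Backwards compatibility: a bare integer means a CUDA device ordinal.
        if index is not None:
            raise ValueError("an integer device cannot be combined with an index")
        return DeviceSpec("cuda", spec)
    device_type, sep, idx = spec.partition(":")
    if device_type not in ("cpu", "cuda"):
        raise ValueError(f"unknown device type: {device_type!r}")
    if sep:
        # The string already carries an index, e.g. 'cuda:0'.
        if index is not None:
            raise ValueError("device index given twice")
        if not idx.isdigit():
            raise ValueError(f"invalid device index: {idx!r}")
        index = int(idx)
    return DeviceSpec(device_type, index)


print(parse_device("cuda"))
print(parse_device("cuda:0"))
print(parse_device("cuda", 1))
print(parse_device(1))
```

Keeping the index optional is what allows partial specification: `'cuda'` names the device type and leaves the choice of ordinal to the current default.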
Peter Goldsborough	9ba70856a1	Add max_values and argmax convenience functions to ATen (#6201 ) * Add max_values and argmax convenience functions to ATen * Add documentation for torch.argmax/argmin and skip max_values * Add tests for argmax/argmin * Dont default the dim argument * Use dim=0 in test_torch.py for argmax tests * Implement argmin() and argmax() without dim * Call .contiguous() before .view(-1)	2018-04-04 15:53:26 -04:00
gchanan	4c81282c33	Introduce torch.layout and split layout from dtypes. (#6145 ) * Introduce torch.layout and split layout from dtypes. Tensors (and tensor types) now have a 'layout' attribute that returns either 'torch.strided' or 'torch.sparse_coo'. Previously, dtypes were 1-to-1 with ATen types/PyTensorTypes; the impetus behind this decision was to make things easy in the common case (i.e. specifying a type in a factory function). But this doesn't really follow for sparsity, which isn't a common case. It also doesn't properly represent the concept of a dtype, which in numpy are proper scalar types (i.e. roughly the type returned from indexing the last dimension of an n-d array). But this should be the same whether or not the tensor is represented via strides, sparsity, etc. This is accomplished by: 1) having the dtype of tensor return the (device-type, scalar-type) combination, i.e. torch.cuda.float32, so both torch.cuda.FloatTensor and torch.cuda.sparse.FloatTensor have the same dtype 2) Adding a layout parameter to python functions, where the combination of (dtype, layout) maps to an ATen type that is used for dispatch. * Formatting, make init throw python_error. * Fix cuda not enabled error message. * Fix test.	2018-04-02 14:07:50 -04:00
Tongzhou Wang	d2c0f8bb57	avoid generating torch.*_backward_(input\|weight\|bias) (#6114 )	2018-03-30 15:23:56 -04:00
gchanan	df039e2998	Unify handling of type_dispatched_args in gen_python_functions. (#6088 ) This is just to simplify the handling, there is no generated code difference.	2018-03-28 22:23:20 -04:00
Richard Zou	e3e0c34390	Unify error checking for tensor.index_copy_ (#5642 )	2018-03-22 20:07:15 -04:00
gchanan	a3442f62bc	Support native namespace functions with type dispatch. (#5576 ) * Support native namespace functions with type dispatch. Use 'ones' as an example. Note this is a "halfway" solution; i.e. the call chain is: at::ones(shape, dtype) -> dtype.ones(shape, dtype) -> CPUFloatType.ones(shape, dtype) -> at::native::ones(shape, dtype) The "nicer" solution would probably be something like: at::ones(shape, dtype) -> dtype.ones(shape) -> CPUFloatType.ones(shape) -> at::native::ones(shape, this) * Fix type inference. * Fix test install. * Fix extensions. * Put dtype argument at the beginning. * Fix extension.cpp. * Fix rnn. * Move zeros in the same manner. * Fix cuda. * Change randn. * Change rand. * Change randperm. * Fix aten contrib. * Resize in randperm_out. * Implement eye. * Fix sparse zeros. * linspace, logspace. * arange. * range. * Remove type dispatch from gen_python_functions. * Properly generate maybe_init_cuda for type dispatch functions not named type. * Don't duplicate dtype, this parameters for native type dispatched functions. * Call VariableType factory methods from the base type so it gets version number 0. * Address review comments.	2018-03-09 10:52:53 -05:00
gchanan	0f86f64398	Add support for device python arguments with constructors. (#5384 ) * Add support for device python arguments with constructors. * Fix flake8. * Simplify device handling. * Dont use torch._C._VariableFunctions. * Handle default values for functions that have tensor args (e.g. ones_like).	2018-02-28 14:41:57 -05:00
Sam Gross	ebd32f7bcd	Check that parsed_args contains enough space for all parameters (#5467 )	2018-02-28 14:34:04 -05:00
gchanan	f4cfd9bbfc	Don't python bind 'tensor' or 'sparse_coo_tensor'. (#5390 ) These are internal ATen functions; we have better python APIs.	2018-02-26 11:06:25 -05:00
Sam Gross	30ec06c140	Merge Variable and Tensor classes (#5225 ) This replaces the torch.Tensor constructors with factories that produce Variables. Similarly, functions on the torch module (e.g. torch.randn) now return Variables. To keep the PR to a reasonable size, I've left most of the unused tensor code. Subsequent PRs will remove the dead code, clean-up calls to torch.autograd.Variable, and rename Variable to Tensor everywhere. There are some breaking changes because Variable and Tensors had slightly different semantics. There's a list of those changes here: https://github.com/pytorch/pytorch/wiki/Breaking-Changes-from-Variable-and-Tensor-merge	2018-02-23 18:03:31 -05:00
gchanan	0878c6d4d7	Various dtype improvements. (#5321 ) * Various dtype improvements. 1) Add dtypes to the new data-based constructors: Variable.new_tensor and torch.autograd.variable. 2) In the python signatures, use Type instead of Dtype to match the C++ signatures; the error messages still print as dtype. 3) Handle / add a better error message when a dtype is used when ATen was not compiled with that type (e.g. cuda types). 4) Move cuda_lazy_init to its own file. A later commit will add support to the legacy constructors as well. * Move implementation of lazy_init to cpp. * Fix parsed_arg size.	2018-02-21 17:37:59 -05:00
gchanan	5edf6b2037	Add numpy-style dtypes to Variable factories. (#5245 ) * Add numpy-style dtypes to Variable factories. 1) Add numpy-style dtypes corresponding to torch tensor types. These are: torch.float16, torch.float32, torch.float64, torch.uint8, torch.int8, torch.int16, torch.int32, torch.int64 as well as torch.cuda, torch.sparse, and torch.cuda.sparse equivalents. 2) Adds "legacy" names for the above dtypes that correspond more closely to existing tensor names. These are: torch.half, torch.float, torch.double, torch.short, torch.int, torch.long. torch.byte and torch.char don't exist because they either don't match numpy semantics or differ on different architectures. 3) Adds a "dtype" parameter to Variable factories (e.g. zeros, ones) that allows the user to specify the type without changing the default tensor type. 4) Adds a "dtype" getter to Variables that return the canonical dtype from 1) This PR is missing the following useful features that should be added in the future: A) We only add the "dtype" parameter to auto-generated factories; hand-written factories like in tensor_new.cpp don't support this yet. B) We don't allow type conversions to use dtypes; that should be added to type(param) or a new function. C) We don't yet have a "device" parameter for these factories; right now, they will only create Variables on the default device. * backend_to_string can be private. * Define python binding argument indexes in a more simple way. * add all_declared_types, still need to hook it up to THPDType. * Fix all_declared_types for missing types (it's Sparse + Half). * Ensure cuda dtypes are created even if compiled with NO_CUDA=1. * Fix case where dtype is provided but dispatch is via namespace. This happens in ones_like, empty_like, randn_like. 
There is some question if we should do: 1) at::ones_like(tensor).toType(dtype) 2) at::ones_like(tensor.toType(dtype)) I did the former because this matches with the numpy documentation, i.e.: "Overrides the data type of the result." and it's easier to implement. Note that the above causes an extra copy, either of the input or output. Here's a better implementation: 1) Make zeros_like, ones_like native functions that take an optional type (named dtype?). 2) Match the type argument with the dtype, so we don't have two different parameters. 3) Call at::zeros_like(input, type) -> at::native::zeros_like(input, type) -> type.zeros(input.sizes()) * Don't return from maybe_initialize_cuda. * Don't leak DType name. * Address cpp review comments. * Share code between sparse and non-sparse test_dtypes. * Rewrite _like functions as native function with explicit type parameter. * Use type 'Type' instead of 'dtype' for consistency. * Address review comments. * Handle arg_idx when there is requires_grad but no dtype in python_binding_arguments.	2018-02-20 11:04:14 -05:00
Sam Gross	0509f26d41	Speed-up nn.Linear for the 3d input case (#5279 ) This adds at::_unsafe_view and uses it in matmul. The _unsafe_view function is identical to view except that the output is not treated like a view by the automatic differentiation code. This avoids in-place modifications triggering the more expensive CopySlices/AsStridedBackward behavior. The _unsafe_view function is only safe to use on temporaries that will be immediately discarded and that do not alias other tensors. Otherwise, in-place modifications may trigger incorrect gradients. The function is not exposed to Python. See #5169	2018-02-19 19:47:20 -05:00
Sam Gross	c1b98f0841	Add deprecated add_out overload (#5088 ) We have a few calls that use this signature on Tensors. This also updates the binding code to support deprecated xxx_out signatures.	2018-02-06 17:08:23 -05:00
Edward Z. Yang	7bd2db997e	Port cuDNN RNN bindings to ATen (#4881 ) * Add transpose() to TensorGeometry. This code is dead; I briefly used it in my RNN patchset but eventually rewrote it to not be necessary. However, it seemed like a useful gadget so I kept it. In general, it seems that it would be useful for TensorGeometry to support all operations that Tensor does, but it only computes the changes to sizes/strides instead of actually doing the computation. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Turn on wrap_dim behavior for TensorGeometry Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Support for hard-coded differentiable outputs. Some outputs of functions are nondifferentiable, and should always be returned with requires_grad=False. Traditionally, we have used the presence of 'grad' to signal that only the first output is differentiable, and the rest are not, but cudnn_rnn (to be implemented) breaks this pattern; its first three outputs are differentiable, but its last output is a buffer that is just consumed by backwards. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * TensorGeometry constructor from just sizes The sizes are assumed to form a contiguous tensor, and we compute the strides we would get in that case. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Support saving TensorList for backwards. There is some back story here. Saved TensorList in backwards will be used by cudnn_rnn, and it is worth asking, why is it necessary to save a list of tensors? Indeed, technically speaking a list of tensors is not necessary, we only need to save the sizes of each of the weight tensors. (We need the sizes because cuDNN is only going to blast the derivative of weights into a flat buffer, but we need to match the sizes of the views into the buffer when we eventually return the derivatives.) 
However, it was surprisingly awful trying to implement passing just sizes, because as non-Tensor arguments, the JIT interpreter generation code is expected to handle all non-Tensor arguments as attributes in the trace, and our attributes struct doesn't actually know how to do arrays of arrays. Saved TensorList code was much easier to get working, so that's what this patch does. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * MatrixRef - an ArrayRef with a stride, making it a 2D ArrayRef. Like ArrayRef, this class does not own the underlying data, it is expected to be used in situations where the data resides in some other buffer. This is intended to be trivially copyable, so it should be passed by value. For now, 2D only (so the copies are actually cheap, without having to write a SmallVector class) and contiguous only (so we can return non-strided ArrayRef on index). The intended use-case (not in this commit) is to make it easier to work with RNN weights, which are num_weights x num_layers matrix of parameters. P.S. dimension 0 indexes rows, dimension 1 indexes columns Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Generalize getDataType in Descriptors.h Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Change copy_range to take Tensor, and change cat_tensors_backward accordingly Should a backward function return a Variable or a Tensor? For the most part, all of our backward functions return Tensor, except cat_tensors_backward, which returns a variable_list (which is really the only thing that matters, because Tensor and Variable are interconvertible). But this is kind of weird, because it means that you can't implement a backwards in ATen that returns a std::vector<Tensor>, and then hook it up transparently with the derivatives code. So I switched it over. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Support 5-ary return Tensor tuple. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Support code generation with mixed Tensor/TensorList in output. 
I don't think I ended up using this in cudnn_rnn, but this seems it might be useful for someone else later. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Support 4-ary boolean array Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Add support for retain_variables in tools/autograd/derivatives.yaml 'retain_variables', a bool which is true if a user has specified that saved variables should be retained in case the backwards is run again later. This allows an optimization where we can destroy saved buffers if we know variables are not going to be retained, e.g., it is (will be) used by _cudnn_rnn Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Lazily initialize cuDNN descriptors Previously, cuDNN descriptors were eagerly allocated as soon as a FooDescriptor object was created. However, in some uses of TensorDescriptor, this is problematic: some tensors are optional and cuDNN's API expects to be given a nullptr TensorDescriptor in this case, not an uninitialized (but allocated) descriptor. Lazily initializing the descriptors makes it less likely for us to use uninitialized memory and matches the usual semantics of unique_ptr. It's good sense! Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Port cuDNN RNNs to ATen. This brings three new functions: - _cudnn_rnn_flatten_weight: flatten a matrix of weight tensors into a single contiguous weight buffer as required by cuDNN - _cudnn_rnn: run RNN forwards - _cudnn_rnn_backward: run RNN backwards RNNs have a lot of parameters, so we restructured what was previously a single 'fn' object that recorded all the parameters into three objects: RNNDescriptorParams, TensorDescriptorListParams and DropoutDescriptorParams. We make use of MatrixRef to organize the weight tensors (which are weight/bias x number of layers), but I did not teach the codegen how to pass these as arguments/return values natively, so instead a MatrixRef is passed as its constituent ArrayRef and int64_t stride0. 
cudnn_rnn has three differentiable outputs and one nondifferentiable one, so it makes use of the support for hard-coded differentiable outputs. I haven't deleted all of the descriptor code from Python, because dropout initialization still goes through this codepath, that should be fixed soon but I don't see it as essential for this PR. This commit also removes the last use of NestedIOFunction from PyTorch. There are some shenanigans with cuDNN dropout descriptor initialization, see below: Note [cuDNN dropout descriptor initialization] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In most cases, setting descriptors in cuDNN is cheap (e.g., cudnnSetTensorNdDescriptor). However, this is not the case for cudnnSetDropoutDescriptor: in cuDNN 6/7 (and possibly others) it does an expensive precomputation to initialize the random number generator states. In cuDNN 6, this is the ONLY official mechanism to initialize a dropout descriptor, which means that law-abiding clients were expected to generate a dropout descriptor once and cache it. However, our ATen interface is (1) stateless (so we can't cache the descriptors) and (2) does not accept arbitrary user types in its interface (so we can't pass the descriptor in). This puts us in a pickle. In cuDNN 7, a new function, cudnnRestoreDropoutDescriptor was added, which forgoes the expensive initialization process, and can initialize the descriptor with a pre-initialized state CUDA tensor. This is great, because it means we can simply pass in the state tensor and then initialize the descriptor internally. Unfortunately, this function is not available in cuDNN 6. To work around this, we break the cuDNN abstraction barrier, and have the struct layout of the underlying dropout descriptor. With this struct, we can reimplement cudnnRestoreDropoutDescriptor from scratch. Great! Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Fix cuDNN 7 behavior. Signed-off-by: Edward Z. 
Yang <ezyang@fb.com> * Delete some unused, controversial methods from MatrixRef. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Add missing filter_dim_a slice Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Replace nested for-loop with itertools.chain. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * CR comment on mut_desc() Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Refactor DropoutDescriptor API. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Use cached CurrentDeviceProperties from Context. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Document _cudnn_rnn outputs. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Improve fmap docs, convert some functions to use it. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Move IndexRange to autograd/function.h Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Elaborate on CUDNN_STATUS_INVALID_VALUE return some more. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Add an all-in-one setter for RNNDescriptorParams. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Print what the unrecognized RNN mode was Signed-off-by: Edward Z. Yang <ezyang@fb.com> * RNN TensorDescriptor improvements - Have an explicit size/stride overload for set TensorDescriptor, so you don't have to create a goofy view to feed in. - Change the padding to 3D rather than 5D, which is all you actually need (it's just 2D that is not supported by cuDNN API.) Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Fix implementation of cudnnRestoreDropoutDescriptor, plus test. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Better comments about input layout. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Add comment about no-DropoutDescriptor argument RNNDescriptor function. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Rename vocab_size back to input_size. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Don't use backslash in comment. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Bugfix for contiguous TensorGeometry calculation. Signed-off-by: Edward Z. 
Yang <ezyang@fb.com> * Don't allocate a dummy tensor when setting TensorDescriptor for flatten_weight. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Make contiguity errors more user-friendly. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * s/fn.dropout.train/fn_train/ Signed-off-by: Edward Z. Yang <ezyang@fb.com> * s/_cudnn_rnn_backward_grad/_cudnn_rnn_backward_input/ Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Make dcx properly undefined when not required. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Remove old TODO. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Add state size check in cudnnRestoreDropoutDescriptor Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Explicitly narrow int64_t to size_t Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Restore copyParams comment. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Update benchmark numbers, and slight engineering improvements. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Typofix. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2018-02-05 13:54:11 -05:00
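The commit above describes two small data-structure ideas: building a TensorGeometry from sizes alone by assuming a contiguous layout, and MatrixRef, a non-owning 2D view made of a flat array plus a row stride. The following is a minimal Python sketch of both ideas, not the ATen C++ code. The function name `contiguous_strides`, the Python `MatrixRef` class, and the example weight layout are illustrative assumptions, and special handling of zero-sized dimensions is ignored.

```python
from typing import List, Sequence


def contiguous_strides(sizes: Sequence[int]) -> List[int]:
    """Row-major strides (in elements) a contiguous tensor of `sizes` would have.

    Illustrative helper only; zero-sized dimensions get no special treatment.
    """
    strides = [1] * len(sizes)
    # Walk from the second-to-last dimension outwards: each stride is the
    # product of all sizes to its right.
    for i in range(len(sizes) - 2, -1, -1):
        strides[i] = strides[i + 1] * sizes[i + 1]
    return strides


class MatrixRef:
    """Hypothetical Python analogue of a 2D, contiguous-only view.

    Dimension 0 indexes rows and dimension 1 indexes columns, as in the
    commit message. The view does not own `data`; it only keeps a reference.
    """

    def __init__(self, data: Sequence, stride0: int) -> None:
        if stride0 <= 0 or len(data) % stride0 != 0:
            raise ValueError("len(data) must be a positive multiple of stride0")
        self._data = data
        self._stride0 = stride0

    def size(self, dim: int) -> int:
        if dim == 0:
            return len(self._data) // self._stride0
        if dim == 1:
            return self._stride0
        raise IndexError("MatrixRef is 2D only")

    def __getitem__(self, row: int):
        if not 0 <= row < self.size(0):
            raise IndexError("row out of range")
        start = row * self._stride0
        # Note: slicing a Python list copies the row, whereas the C++ class
        # described above would hand back a non-owning ArrayRef.
        return self._data[start:start + self._stride0]


if __name__ == "__main__":
    print(contiguous_strides([2, 3, 4]))
    # Assumed layout for illustration: two parameters per row.
    weights = MatrixRef(["w_ih_0", "b_ih_0", "w_ih_1", "b_ih_1"], stride0=2)
    print(weights.size(0), weights.size(1), weights[1])
```

Passing the flat sequence and `stride0` separately mirrors how the commit says a MatrixRef crosses the codegen boundary: as its constituent ArrayRef and an int64_t stride0.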
Richard Zou	bc11511cda	Restore sparse variable transpose_() and t_() (#4779 ) * Restore sparse variable transpose_() and t_() * Add dimension wrapping to transpose_, t_ * Don't expose sparse_raw_resize_ to python	2018-01-23 21:32:40 -05:00
gchanan	c49f0279a6	Add kwarg-only 'requires_grad' parameter to Variable factories. (#4748 ) * Add kwarg-only 'requires_grad' parameter to Variable factories. Functions that create variables, e.g. torch.ones_like currently always return Variables with requires_grad=False; this is less convenient than the existing Variable constructor that has a requires_grad parameter. This commit adds the parameter at the python binding level. * Fix flake8. * Address review comments. * Match set_requires_grad implementation with tensor_new version.	2018-01-22 19:15:11 -05:00
Sam Gross	57549b7e44	Bind functions with out= arguments in VariableType (#4565 ) This adds overrides in VariableType for the xxx_out ATen functions and implements Python bindings. There is no support for automatic differentiation. If any of the inputs (or outputs) requires grad, then the function will throw an exception unless it's running in "no-grad" mode. The bindings for calling torch.xxx functions on Variables are moved to a different object. Previously, they were static methods on VariableBase. This change prevents users from accidentally calling static methods as if they were instance methods.	2018-01-17 18:27:42 -05:00
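The last point of the commit above, static methods being called as if they were instance methods, can be illustrated with a small pure-Python sketch. It is not the actual binding code: `OldStyleBase`, `NewStyleBase`, and `Functions` are hypothetical stand-ins for a base class that carries function-style static methods and for a layout where those functions live on a separate object.

```python
class OldStyleBase:
    """Hypothetical: function-style bindings exposed as static methods."""

    def __init__(self, value: float) -> None:
        self.value = value

    @staticmethod
    def neg(tensor: "OldStyleBase") -> "OldStyleBase":
        # Function-style call: the operand is the explicit argument.
        return OldStyleBase(-tensor.value)


class NewStyleBase:
    """Hypothetical: only true instance methods remain on the class."""

    def __init__(self, value: float) -> None:
        self.value = value

    def neg(self) -> "NewStyleBase":
        return NewStyleBase(-self.value)


class Functions:
    """Hypothetical separate holder for the function-style bindings."""

    @staticmethod
    def neg(tensor: NewStyleBase) -> NewStyleBase:
        return NewStyleBase(-tensor.value)


if __name__ == "__main__":
    x, y = OldStyleBase(1.0), OldStyleBase(5.0)
    # Pitfall: looks like an instance call on x, but x is silently ignored
    # because a static method receives no implicit self.
    print(x.neg(y).value)

    a, b = NewStyleBase(1.0), NewStyleBase(5.0)
    print(a.neg().value, Functions.neg(b).value)
    try:
        a.neg(b)  # now a loud error instead of a silent surprise
    except TypeError as err:
        print("TypeError:", err)
```

With the functions on a separate object, a mistaken instance-style call fails immediately rather than returning a result computed from the wrong operand.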
Sam Gross	f8a4b1a266	Split off load_derivatives and gen_autograd_functions from gen_variable_type (#4370 )	2017-12-27 18:59:41 -05:00
Tongzhou Wang	d8b2e5d091	Add python only default init expression; Implement stft, hann/hamming/bartlett window. (#4095 ) * implement stft * addressed comments; implemented window functions; added support for python only default initialization	2017-12-18 12:28:23 -05:00
Sam Gross	c813ce3787	Implement Variable._sparse_mask (#4124 ) * Implement Variable._sparse_mask * Use SparseTensor as the dynamic_type	2017-12-15 17:25:20 -05:00
Sam Gross	aeb7a3668d	Implement Variable.new (#4080 )	2017-12-11 15:45:43 -05:00
Tongzhou Wang	c681b03d37	Add determinant function on variable; Add backward on svd (#3816 ) * determinant on variable * svd bwd	2017-12-01 13:22:46 -05:00
Adam Paszke	65e0d5bad8	Fix void* wrapping in autograd codegen Also, add assertions here and there to make sure bad things never happen again.	2017-11-24 13:33:13 +01:00
Sam Gross	9cb8b43778	Split off in-place NN functions (#3683 ) For example, this splits threshold into threshold(), which is now never in-place, and threshold_() which is always in-place. This simplifies the in-place vs. non-in-place logic in gen_variable_type.py, which was bug-prone.	2017-11-14 12:59:06 -05:00
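The split described in the commit above, `threshold()` never in-place and `threshold_()` always in-place, follows the trailing-underscore naming convention. The sketch below shows that contract on plain Python lists rather than tensors. The parameter names `thresh` and `fill` are illustrative, and the element rule (keep `x` when `x > thresh`, otherwise substitute `fill`) is assumed to match the documented behaviour of the threshold function.

```python
from typing import List


def threshold(xs: List[float], thresh: float, fill: float) -> List[float]:
    """Out-of-place variant: returns a new list and leaves `xs` untouched."""
    return [x if x > thresh else fill for x in xs]


def threshold_(xs: List[float], thresh: float, fill: float) -> List[float]:
    """In-place variant: overwrites `xs` and returns the same object."""
    for i, x in enumerate(xs):
        if not x > thresh:
            xs[i] = fill
    return xs


if __name__ == "__main__":
    data = [-1.0, 0.5, 2.0]
    out = threshold(data, 0.0, 0.0)
    print(out, data)           # data is unchanged by the out-of-place call
    same = threshold_(data, 0.0, 0.0)
    print(same is data, data)  # the in-place call mutated data itself
```

Keeping the two behaviours in separately named functions means neither needs a flag to decide whether to mutate its input, which is the kind of in-place vs. non-in-place branching the commit says was bug-prone.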
Zach DeVito	5aa5b572e4	update build so that all of TH* is in libATen	2017-11-02 19:53:36 -04:00
Edward Z. Yang	53fe804322	Make ONNX work with new C++ autograd world. The general strategy is there is a new module, torch.onnx.symbolic, which contains a function for every ATen method name with the ONNX translation. While implementing this, I took the opportunity to expunge all references of 'g' from the public API; instead, it is managed by a global variable in torch.onnx which tracks the "current graph". Other changes: - If you pass a Tensor to op as an argument, it will now automatically be converted into a Constant ONNX node. This lets us remove needing to implement ONNX - Rename value to other, wherever there is both a Scalar and Tensor overload. This way, keyword dispatch can work uniformly in both cases. - Deleted any autograd Function classes that both had a symbolic and were ported to the new C++ autograd implementation. There may still be some straggling classes that didn't have symbolic. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-10-20 15:38:01 -04:00
Sam Gross	f1f64c8d07	Generate autograd functions for NN / more refactors (#3136 ) Generate autograd functions for NN and implement more derivatives in derivatives.yaml A big refactor of gen_variable_type.py	2017-10-19 15:03:26 -04:00
Sam Gross	f29bcab67e	Use Declarations.yaml to generate python bindings	2017-10-07 00:41:29 -04:00
Sam Gross	558d26a69e	Fix argument indices	2017-10-07 00:41:29 -04:00


351 Commits