* Moved torch headers copy to build_deps
PR #5706 initially moved the header copy to build_ext to fix bdist_wheel and
build develop. This broke install, and #5755 moved it back to install, which
in turn broke bdist_wheel and build develop. It looks like build_ext is called
from install only after install has already tried to copy the headers to the
Python install dir, so the headers were not installed correctly. Copying the
headers in build_deps works correctly with all of setup.py install, bdist_wheel,
and build develop.
* Comment about the auto-generated files
Added a comment noting that the current solution does not include the
auto-generated files, which may be a problem if somebody needs to use them.
* Add torch.sparse_coo_tensor factory.
Notes:
1) I didn't add Tensor.new_sparse_coo_tensor; it didn't seem particularly useful, but it's easy to add.
2) This doesn't do type inference, i.e. torch.sparse_coo_tensor(indices=LongTensor, values=IntTensor)
will return a sparse tensor of the default type rather than a sparse IntTensor. We can add
type inference later when we add it to other factories.
A usage sketch follows.
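A minimal sketch of the new factory, assuming the eventual (indices, values, size) argument order: indices is a 2 x nnz LongTensor of coordinates and values holds one entry per coordinate.
```python
import torch

# Hypothetical usage sketch of torch.sparse_coo_tensor: indices give the
# coordinates of each non-zero entry, values give the corresponding entries.
i = torch.LongTensor([[0, 1, 1],
                      [2, 0, 2]])
v = torch.FloatTensor([3.0, 4.0, 5.0])
s = torch.sparse_coo_tensor(i, v, torch.Size([2, 3]))
print(s.to_dense())
# tensor([[0., 0., 3.],
#         [4., 0., 5.]])
```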
* Fix merge.
* Use type_conversion function from python_variable_methods.
#5481 was reverted due to a strange test bug. This PR attempts to fix that.
This diff adds vectorization to ATen. It uses Intel intrinsics to build a general Vec256 class that represents vector types of 256-bit width; these can then be treated like regular variables. Using them it implements torch.sum() for the contiguous case. It uses Intel TBB for multithreading, which allows work stealing, and chunks the reduction operations based on an experimentally chosen value (_THRESHOLD). It uses cpuinfo to pick the right code path depending on the host's capabilities.
The kernels are implemented under native/cpu. Each .cpp file is compiled three times: with -avx, with -avx2, and with no additional flags. A macro is used to append AVX, AVX2, or NONE to the function names, so the header needs to declare each function three times, once per capability. This could be improved by tweaking the CMake file a bit, or possibly by generating the source code with a Python script, etc.
For the non-contiguous case this defaults to the current implementation within TH. For CUDA it entirely defaults to the implementation within THC.
There probably needs to be a bit of a debate around the design decisions here: the additional dependencies, parallelization strategy, clarity, etc. The numerical results also diverge from numpy for larger tensors, which is expected, since we're summing, for example, 8 numbers at a time and then adding the result to the running sum, instead of adding each number one by one (a small illustration follows). But there might be something to be said about accumulating into a double for floats, the acceptable degree of divergence, the behavior with respect to CUDA, etc.
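A quick, standalone illustration of that divergence (not the ATen kernel itself), assuming float32 inputs:
```python
import numpy as np

# float32 addition is not associative, so keeping eight running lane sums
# (as a 256-bit vector of floats would) and combining them at the end gives
# a slightly different result than strictly sequential accumulation.
x = np.random.randn(2 ** 16).astype(np.float32)

sequential = np.float32(0.0)
for v in x:
    sequential += v                       # one element at a time

lanes = x.reshape(-1, 8).sum(axis=0)      # eight running sums, like one vector of floats
vectorized = lanes.sum()

reference = x.astype(np.float64).sum()    # higher-precision reference
print(abs(sequential - reference), abs(vectorized - reference))
```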
I wrote a [small Python script]( https://github.com/cpuhrsch/benchmark/blob/sumall/benchmarks/sum_bench.py) to compare the results with numpy numerically as well as on timing. I ran this script to create timings both on master and this branch.
Here is the command for 1 core:
`OMP_NUM_THREADS=1 taskset -c 0 python sum_bench.py --enable_numpy 200`
Here is the command for all cores:
`python sum_bench.py --enable_numpy 200`
Here are the results of each:
[Master, 1 core](https://paste.fedoraproject.org/paste/Nho9JzHpPVK9av8a6mByjQ)
[This branch, 1 core](https://paste.fedoraproject.org/paste/6xLHkYvcVJx9z~5MoHxN4w)
[Master, all cores](https://paste.fedoraproject.org/paste/5l3V1d5zGqvJcMXIUteMRw)
[This branch, all cores](https://paste.fedoraproject.org/paste/J4RuDU-0Drz0aZwtphQwEA)
To test, the command is:
`python sum_bench.py --test 200`
[This branch, test results](https://paste.fedoraproject.org/paste/kTEoUC~oWgXA6XWMAfNfNw)
For this test we look at the average absolute value of the differences. This does not take into account the relative magnitude of the numbers, which are sampled from a standard normal distribution.
In terms of performance, this diff should bring PyTorch on par with NumPy and usually exceed it by 1.5 to 2x.
* Revert "ATen ReduceOps (#5481)"
This reverts commit 310c3735b9eb97f30cee743b773e5bb054989edc.
* Revert "Check that new cpuinfo and tbb submodules exist (#5714)"
This reverts commit 1a23c9901dbfee295bf5b3dad36e4d3ee7e86366.
Add script::Module C++ class to represent script modules
Switch AST -> IR conversion to work on Modules/Methods rather than raw graphs;
function-only AST -> IR conversion is just a simplified case where there is
only one module with a single method and no parameters.
Introduce SugaredValue in compiler.h to represent values in scope in a script
function that are not first-class and that get desugared. This is used to
represent the module's self parameter, as well as Python function calls
and method calls on tensors.
Provide a Python ScriptModule that provides a nice API on top of script::Module,
allowing for the definition of script modules with methods, parameters,
and submodules.
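A rough sketch of the intended Python-facing usage, written against the torch.jit names this work eventually shipped under; the exact decorator and registration mechanics in this PR may differ.
```python
import torch

class MyModule(torch.jit.ScriptModule):
    def __init__(self):
        super(MyModule, self).__init__()
        self.weight = torch.nn.Parameter(torch.randn(3, 3))

    @torch.jit.script_method
    def forward(self, x):
        # compiled through the AST -> IR path; `self` is a sugared value, so
        # `self.weight` desugars to the module parameter rather than being a
        # first-class IR value
        return torch.mm(x, self.weight)

m = MyModule()
print(m(torch.randn(2, 3)))
```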
Not in this PR but intended for the future:
ScriptModule actually subclassing nn.Module, with most methods implemented.
Unification of TracedModule and ScriptModule functionality into one container class.
Detailed changelog:
* Switch compiler over to using Modules, but don't use them yet.
* Remove intermediate attribute encoding in compiler
* Create SugaredValue object to handle resolution
of compiled module.
* Switch to_ir to modules, implement Select
* Hacky python wrappers
* Private ScriptModule
* Add `define` to script module
* Attributes use TK_LIST_LITERAL
this anticipates adding a real list literal expression to the language.
* Add a metaclass to make sure script stubs are registered
* Add a test
* Doc createResolutionCallback
* Docs and minor editing
* Address PR comments
* Document
* Fix unicode issue
The header files needed for the C++ extensions were only copied to
torch/lib/include by the install command. With bdist_wheel or build develop,
for example, the files were not copied and the cpp_extensions test fails:
```
Running test_cpp_extensions.py ...
running install
running build
running build_ext
/home/moni/src/ibm/AI/pytorch/torch/utils/cpp_extension.py:79: UserWarning:
Your compiler (g++) may be ABI-incompatible with PyTorch.
Please use a compiler that is ABI-compatible with GCC 4.9 and above.
See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html.
warnings.warn(ABI_INCOMPATIBILITY_WARNING.format(compiler))
building 'torch_test_cpp_extension' extension
creating build
creating build/temp.linux-x86_64-3.6
gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/moni/src/ibm/AI/pytorch/torch/lib/include -I/home/moni/src/ibm/AI/pytorch/torch/lib/include/TH -I/home/moni/src/ibm/AI/pytorch/torch/lib/include/THC -I/home/moni/miniconda3/envs/pytorch/include/python3.6m -c extension.cpp -o build/temp.linux-x86_64-3.6/extension.o -g -DTORCH_EXTENSION_NAME=torch_test_cpp_extension -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
extension.cpp:1:25: fatal error: torch/torch.h: No such file or directory
#include <torch/torch.h>
^
compilation terminated.
error: command 'gcc' failed with exit status 1
```
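For reference, here is a minimal sketch of the kind of setup.py such an extension uses; it only builds if the packaged headers (torch/torch.h and friends) were actually copied, which is what the change above fixes.
```python
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

# extension.cpp includes <torch/torch.h>, which must be found under the
# installed torch/lib/include directory for this build to succeed.
setup(
    name='torch_test_cpp_extension',
    ext_modules=[CppExtension('torch_test_cpp_extension', ['extension.cpp'])],
    cmdclass={'build_ext': BuildExtension},
)
```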
* PyObject* <--> at::Tensor conversion no longer unwraps variables; instead we expect end users to always work with variable types, and we will only unwrap the variables when we optimize.
* Add torch::CPU, torch::CUDA and torch::getType
* at::CPU -> torch::CPU in extensions
* Revert "Fix wrong argument name (#5366)"
This reverts commit cc9d3b265d7e688865fde055ee3a2f9b77b5714a.
* Fix wrong argument naming
* Revert "Wrap torch::cuda::lazy_init with WITH_CUDA flag"
This reverts commit a8fa37f8fac5aef09eb7fe54d84de6126618c262.
* Revert "Solves the linking error related to lazy_init for MSVC"
This reverts commit 63913a102f274865a76e7c40ffdf6b40c277d5ff.
* Better solution for the linking error related to lazy_init for MSVC
* Naming changes
* Namespace changes and further comment
* Rebasing onto current master
* Remove code that is useless
* Fix linting
* Remove rebasing bugs
This deletes most of the dead Tensor code paths, including the TensorMethods cwrap and generic/Tensor.cpp.
This also moves the THNN.cwrap/.cpp generation to generate_code which can use ninja if installed.
This replaces the torch.Tensor constructors with factories that produce
Variables. Similarly, functions on the torch module (e.g. torch.randn)
now return Variables.
To keep the PR to a reasonable size, I've left most of the unused tensor
code. Subsequent PRs will remove the dead code, clean up calls to
torch.autograd.Variable, and rename Variable to Tensor everywhere.
There are some breaking changes because Variables and Tensors had
slightly different semantics. There's a list of those changes here:
https://github.com/pytorch/pytorch/wiki/Breaking-Changes-from-Variable-and-Tensor-merge
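A small sketch of the user-visible effect; note that the requires_grad keyword on factories may postdate this exact PR.
```python
import torch

# Factory functions now return Variables, so their results participate in
# autograd directly instead of needing an explicit Variable wrapper.
x = torch.randn(3, requires_grad=True)
y = (x * x).sum()
y.backward()
print(x.grad)   # gradient of sum(x^2) w.r.t. x, i.e. 2 * x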
* Revert "Fix wrong argument name (#5366)"
This reverts commit cc9d3b265d7e688865fde055ee3a2f9b77b5714a.
* Solves the linking error related to lazy_init for MSVC
* Fix wrong argument naming
* Wrap torch::cuda::lazy_init with WITH_CUDA flag
* Also pass torch includes to nvcc build
* Export ATen/cuda headers with install
* Refactor flags common to C++ and CUDA
* Improve tests for C++/CUDA extensions
* Export .cuh files under THC
* Refactor and clean cpp_extension.py slightly
* Include ATen in cuda extension test
* Clarifying comment in cuda_extension.cu
* Replace cuda_extension.cu with cuda_extension_kernel.cu in setup.py
* Copy compile args in C++ extension and add second kernel
* Conditionally add -std=c++11 to cuda_flags
* Also export cuDNN headers
* Add comment about deepcopy
* Various dtype improvements.
1) Add dtypes to the new data-based constructors: Variable.new_tensor and torch.autograd.variable.
2) In the python signatures, use Type instead of Dtype to match the C++ signatures; the error messages still print as dtype.
3) Handle, and add a better error message for, the case where a dtype is used but ATen was not compiled with that type (e.g. CUDA types).
4) Move cuda_lazy_init to its own file.
A later commit will add support to the legacy constructors as well.
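A sketch of item (1), using the dtype names that a later commit in this changelog introduces:
```python
import torch

# new_tensor copies the given data; the dtype argument lets the result's
# type differ from the source tensor's type.
base = torch.ones(2, 2)                                   # default float type
x = base.new_tensor([[1, 2], [3, 4]], dtype=torch.int32)
print(x.dtype)  # torch.int32
```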
* Move implementation of lazy_init to cpp.
* Fix parsed_arg size.
* Document env vars and properly propagate MAX_JOBS down.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Apply CFLAGS and LDFLAGS environment variables to cmake builds.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Test that running the built program works; fixes #5151.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* CMake CR.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add numpy-style dtypes to Variable factories.
1) Add numpy-style dtypes corresponding to torch tensor types. These are:
torch.float16, torch.float32, torch.float64, torch.uint8, torch.int8, torch.int16, torch.int32, torch.int64
as well as torch.cuda, torch.sparse, and torch.cuda.sparse equivalents.
2) Adds "legacy" names for the above dtypes that correspond more closely to existing tensor names. These are:
torch.half, torch.float, torch.double, torch.short, torch.int, torch.long.
torch.byte and torch.char don't exist because they either don't match numpy semantics or differ on different architectures.
3) Adds a "dtype" parameter to Variable factories (e.g. zeros, ones) that allows the user to specify the type without changing the default tensor type.
4) Adds a "dtype" getter to Variables that returns the canonical dtype from 1) (see the example after the notes below).
This PR is missing the following useful features that should be added in the future:
A) We only add the "dtype" parameter to auto-generated factories; hand-written factories like in tensor_new.cpp don't support this yet.
B) We don't allow type conversions to use dtypes; that should be added to type(param) or a new function.
C) We don't yet have a "device" parameter for these factories; right now, they will only create Variables on the default device.
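A quick sketch of the resulting dtype surface, as it behaves in current PyTorch:
```python
import torch

# numpy-style dtype on a factory, the .dtype getter, and a legacy alias.
x = torch.zeros(2, 3, dtype=torch.float64)
print(x.dtype)                     # torch.float64
print(torch.long == torch.int64)   # True: legacy name for the same dtype
```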
* backend_to_string can be private.
* Define python binding argument indexes in a simpler way.
* Add all_declared_types; it still needs to be hooked up to THPDType.
* Fix all_declared_types for missing types (it's Sparse + Half).
* Ensure cuda dtypes are created even if compiled with NO_CUDA=1.
* Fix case where dtype is provided but dispatch is via namespace.
This happens in ones_like, empty_like, randn_like.
There is some question whether we should do:
1) at::ones_like(tensor).toType(dtype)
2) at::ones_like(tensor.toType(dtype))
I did the former because it matches the numpy documentation, i.e.
"Overrides the data type of the result.", and it's easier to implement.
Note that the above causes an extra copy, either of the input or the output.
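A small example of the chosen semantics, as they look through the Python factories in current PyTorch:
```python
import torch

# The dtype overrides the result's type, matching numpy's ones_like.
t = torch.zeros(2, 2)                        # default floating type
u = torch.ones_like(t, dtype=torch.int64)    # same shape, overridden dtype
print(u.dtype, u.shape)
```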
Here's a better implementation:
1) Make zeros_like, ones_like native functions that take an optional type (named dtype?).
2) Match the type argument with the dtype, so we don't have two different parameters.
3) Call at::zeros_like(input, type) -> at::native::zeros_like(input, type) -> type.zeros(input.sizes())
* Don't return from maybe_initialize_cuda.
* Don't leak DType name.
* Address cpp review comments.
* Share code between sparse and non-sparse test_dtypes.
* Rewrite _like functions as native function with explicit type parameter.
* Use type 'Type' instead of 'dtype' for consistency.
* Address review comments.
* Handle arg_idx when there is requires_grad but no dtype in python_binding_arguments.
* Improve Variable interface
* Address comments from @apaszke and @colesbury
* string ::operator= is not noexcept
* Remove ir.h from tracer_state.h to improve build times
* Make Variable a struct and pack SavedVariable fields
* Implement as_variable_ref
* grad_fn_ptr() -> grad_fn_unsafe()
* Reduce hackiness of set_type hack
* Include variable.h and edge.h in tracer_state.h because it uses them
* class Variable -> struct Variable because Windows can't even
* Make Variable::output_nr uint32_t instead of int
* Add comment about tracing state
* Replace more static_cast<Variable&> uses and improve docs
* Remove SavedVariable destructor and construct members in init list
* Clarify docs for Variable
* Variable::set_version -> set_version_counter
This adds the initial implementation of the graph executor for the new JIT design. It includes a few python tests ensuring that the no-grad, backward, and double-backward cases work for simple examples and some corner cases. More work needs to be done to optimize performance, as there are many extra copies and places where we hold onto variables longer than we should. These are noted in the comments.
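For illustration, the double-backward pattern referred to above looks roughly like this at the public autograd level:
```python
import torch

# y = sum(x^3); the first gradient is 3*x^2, and differentiating the sum of
# that gradient again gives 6*x.
x = torch.randn(3, requires_grad=True)
y = (x ** 3).sum()
(g,) = torch.autograd.grad(y, x, create_graph=True)  # keep graph for double-backward
(gg,) = torch.autograd.grad(g.sum(), x)
print(torch.allclose(gg, 6 * x))                      # True
```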
Suppose you are given a list of arguments, each of which may be Tensor or
TensorList. How can you write a function that can treat these arguments
uniformly as a list of tensors? This patch solves the problem using
variadic templates.
Why variadic templates? Use of variadic templates means anyone working
with this code has to understand universal references, perfect
forwarding, parameter packs, and some idioms of C++ template design.
However, I argue that variadic templates are the *right* tool for
supporting the implementation of functions which must take an
arbitrarily heterogeneous set of inputs. We were able to limp by
in old code because, for the most part, tensor inputs were homogeneous,
but this is no longer the case for some non-primitively differentiable
functions; and with the upcoming cuDNN RNN in ATen PR, it will no longer be
the case for primitively differentiable functions either.
There are two parts to the PR.
First, we add torch/csrc/utils/variadic.h, which defines a mix-in
IterArgs that takes any class which supports operator() and augments it
with a new variadic function apply() which calls operator() on each
argument passed to it. In an original draft of the patch, I wrote the
recursion for each parameter pack from scratch for each function;
however, it turns out there are no fewer than seven instances where we
need this idiom, and the mix-in reduces the lines of code, and also
helps centralize the most important (and easy to forget) boilerplate
for perfect forwarding.
To verify that IterArgs is compiled away into an unrolled form per
call site, I inspected the assembly on some synthetic examples.
Next, we modify the following functions to make use of IterArgs:
- compute_requires_grad
- Function::flags (Variable and Tensor variants)
- flatten
- isTracing
- count_tensors / count_variables
Finally, the tuple packer is rewritten to be variadic, although we
cannot make use of IterArgs (since we are given a tuple). It might
make sense to refactor the code into a generic piece which invokes
a function with the arguments specified by a tuple, and then an
appropriate IterArgs, but we leave this for future work.
One thing to note: we cannot write a function with overloads for both
Tensor and Variable, because both ArrayRef<Variable> and Tensor have
implicit conversions from Variable, making such an overload ambiguous.
It may be interesting to remove the implicit conversion from ArrayRef.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
This pass splits differentiable subgraphs out into their own Nodes,
similar to fusion groups.
This initial implementation does not create optimal subgraphs, but
it works well in the case where most things are differentiable,
and it has the building blocks (`mergeNodes`) needed to extend it to a
better implementation.
* Enable scalars if compiled with WITH_SCALAR environment variable.
We are pretty close to enabling scalars (0-dimensional arrays); this allows turning them on
for development purposes and to be able to write code that works both with and without scalars enabled.
WITH_SCALARS is currently broken with distributions, but should work for test_torch, test_autograd, test_nn.
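For clarity, "scalars" here means 0-dimensional tensors, e.g. the result of a full reduction; the sketch below shows the behavior once scalars are enabled.
```python
import torch

# With scalars enabled, a full reduction yields a 0-dimensional tensor
# rather than a 1-element 1-dimensional one.
s = torch.ones(3).sum()
print(s.dim(), s.shape)   # 0 torch.Size([])
```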
* Fix unsqueeze.
* Fix wrap dim, wrapping with Scalar.
This adds overrides in VariableType for the xxx_out ATen functions and
implements Python bindings. There is no support for automatic
differentiation. If any of the inputs (or outputs) requires grad, then the
function will throw an exception unless it's running in "no-grad" mode.
The bindings for calling torch.xxx functions on Variables are moved to a
different object. Previously, they were static methods on VariableBase.
This change prevents users from accidentally calling static methods as if
they were instance methods.
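A small usage sketch of the out= bindings described above:
```python
import torch

# The result is written into a preallocated tensor; per the description
# above, inputs that require grad make the out= variant raise instead of
# being differentiated.
a, b = torch.randn(3), torch.randn(3)
result = torch.empty(3)
torch.add(a, b, out=result)
print(result)
```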
Implement MM fusion (MM with add reduction tree)
A tree where leaves are matrix multiplies and inner
vertices are adds can be computed as a single mm.
Such subgraphs often appear in backward if a single weight
is reused multiple times (e.g. in RNNs).
NOTE: this seems to be slightly slower on the GPU than the
naive implementation, but it's a huge win on the CPU
(think 100x lower overhead)
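The algebraic identity such a fusion can exploit, sketched with plain tensor ops:
```python
import torch

# An add-tree over matrix multiplies sharing the same output shape equals a
# single mm on concatenated operands.
A1, A2 = torch.randn(4, 5), torch.randn(4, 7)
B1, B2 = torch.randn(5, 3), torch.randn(7, 3)
tree = A1 @ B1 + A2 @ B2
fused = torch.cat([A1, A2], dim=1) @ torch.cat([B1, B2], dim=0)
print(torch.allclose(tree, fused, atol=1e-5))   # True, up to float error
```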
* Delete obsolete basic ops.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* More deletion.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Delete some unused utilities.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Delete dead apply_fn
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Delete CppFunction symbolic support.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Delete ForwardFunction
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Batchnorm is 'working'
Signed-off-by: Edward Z. Yang <ezyang@fb.com>