Compare commits


3 Commits

SHA1        Message                          Date
ccd5f4dbfc  version bump                     2017-05-01 15:55:29 -04:00
3cc21b5a46  fix OSX build                    2017-04-29 09:29:21 -04:00
27fb8750ad  fix NCCL makefile for CUDA 7.5   2017-04-28 20:08:07 -04:00
731 changed files with 12115 additions and 62664 deletions

.gitignore

@ -5,7 +5,6 @@ torch.egg-info/
torch/version.py
torch/csrc/generic/TensorMethods.cpp
torch/lib/*.so*
torch/lib/*.a*
torch/lib/*.dylib*
torch/lib/*.h
torch/lib/build
@ -20,7 +19,6 @@ torch/csrc/nn/THCUNN.cpp
torch/csrc/nn/THNN_generic.cwrap
torch/csrc/nn/THNN_generic.cpp
torch/csrc/nn/THNN_generic.h
torch/csrc/generated
docs/src/**/*
test/data/legacy_modules.t7
test/data/gpu_tensors.pt
@ -35,17 +33,3 @@ test/.coverage
*/**/*.so*
*/**/*.dylib*
test/data/legacy_serialized.pt
test/data/linear.pt
# IPython notebook checkpoints
.ipynb_checkpoints
# Editor temporaries
*.swn
*.swo
*.swp
*~
# OSX dir files
.DS_Store


@ -1,8 +1,7 @@
# https://travis-ci.org/pytorch/pytorch
language: python
dist: trusty
python:
- 2.7.9
- 2.7.8
- 2.7
- 3.5
- 3.6


@ -44,9 +44,7 @@ https://github.com/pytorch/pytorch#from-source
The change you have to make is to replace
```
python setup.py install
```
`python setup.py install`
with
@ -63,73 +61,18 @@ Hence, if you modify a python file, you do not need to reinstall pytorch again a
For example:
- Install local pytorch in `build develop` mode
- modify your python file `torch/__init__.py` (for example)
- modify your python file torch/__init__.py (for example)
- test functionality
- modify your python file `torch/__init__.py`
- modify your python file torch/__init__.py
- test functionality
- modify your python file `torch/__init__.py`
- modify your python file torch/__init__.py
- test functionality
You do not need to repeatedly install after modifying python files.
#### C++ Development tips
## Writing documentation
PyTorch uses [Google style](http://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html)
for formatting docstrings. Lines inside a docstring block must be limited to 80 characters so they
fit into Jupyter documentation popups.
## Managing multiple build trees
One downside to using `python setup.py develop` is that your development
version of pytorch will be installed globally on your account (e.g., if
you run `import torch` anywhere else, the development version will be
used).
If you want to manage multiple builds of PyTorch, you can make use of
[conda environments](https://conda.io/docs/using/envs.html) to maintain
separate Python package environments, each of which can be tied to a
specific build of PyTorch. To set one up:
```
conda create -n pytorch-myfeature
source activate pytorch-myfeature
# if you run python now, torch will NOT be installed
python setup.py build develop
```
## C++ Development tips
If you are working on the C++ code, there are a few important things that you
will want to keep in mind:
1. How to rebuild only the code you are working on, and
2. How to make rebuilds in the absence of changes go faster.
### Build only what you need.
`python setup.py build` will build everything, but since our build system is
not very optimized for incremental rebuilds, this will actually be very slow.
Far better is to only request rebuilds of the parts of the project you are
working on:
- Working on `torch/csrc`? Run `python setup.py develop` to rebuild
(NB: no `build` here!)
- Working on `torch/lib/TH`, did not make any cmake changes, and just want to
see if it compiles? Run `(cd torch/lib/build/TH && make install -j$(getconf _NPROCESSORS_ONLN))`. This
applies for any other subdirectory of `torch/lib`. **Warning: Changes you
make here will not be visible from Python.** See below.
- Working on `torch/lib` and want to run your changes / rerun cmake? Run
`python setup.py build_deps`. Note that this will rerun cmake for
every subdirectory in TH; if you are only working on one project,
consider editing `torch/lib/build_all.sh` and commenting out the
`build` lines of libraries you are not working on.
On the initial build, you can also speed things up with the environment
variables `DEBUG` and `NO_CUDA`.
When you are developing on the C++ side of things, the environment variables `DEBUG` and `NO_CUDA` are helpful.
- `DEBUG=1` will enable debug builds (-g -O0)
- `NO_CUDA=1` will disable compiling CUDA (in case you are developing on something not CUDA related), to save compile time.
@ -139,15 +82,7 @@ For example:
NO_CUDA=1 DEBUG=1 python setup.py build develop
```
Make sure you continue to pass these flags on subsequent builds.
### Make no-op build fast.
Python `setuptools` is pretty dumb, and always rebuilds every C file in a
project. Using ccache in a situation like this is a real time-saver. However, by
default, ccache does not properly support CUDA stuff, so here are the
instructions for installing a custom `ccache` fork that has CUDA support:
Also, if you are developing a lot, using ccache is a real time-saver. By default, ccache does not properly support CUDA stuff, so here are the instructions for installing a custom `ccache` fork that has CUDA support:
```
# install and export ccache
if ! ls ~/ccache/bin/ccache


@ -1,16 +1,18 @@
FROM nvidia/cuda:8.0-cudnn6-devel-ubuntu16.04
FROM nvidia/cuda:8.0-devel-ubuntu16.04
RUN echo "deb http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64 /" > /etc/apt/sources.list.d/nvidia-ml.list
ENV CUDNN_VERSION 6.0.20
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
cmake \
git \
curl \
vim \
ca-certificates \
libjpeg-dev \
libpng-dev &&\
libpng-dev \
libcudnn6=$CUDNN_VERSION-1+cuda8.0 \
libcudnn6-dev=$CUDNN_VERSION-1+cuda8.0 && \
rm -rf /var/lib/apt/lists/*
RUN curl -o ~/miniconda.sh -O https://repo.continuum.io/miniconda/Miniconda3-4.2.12-Linux-x86_64.sh && \
@ -28,9 +30,7 @@ COPY . .
RUN TORCH_CUDA_ARCH_LIST="3.5 5.2 6.0 6.1+PTX" TORCH_NVCC_FLAGS="-Xfatbin -compress-all" \
CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" \
pip install -v .
RUN git clone https://github.com/pytorch/vision.git && cd vision && pip install -v .
python setup.py install
WORKDIR /workspace
RUN chmod -R a+w /workspace


@ -2,30 +2,29 @@
--------------------------------------------------------------------------------
PyTorch is a Python package that provides two high-level features:
- Tensor computation (like NumPy) with strong GPU acceleration
- Deep neural networks built on a tape-based autograd system
PyTorch is a python package that provides two high-level features:
- Tensor computation (like numpy) with strong GPU acceleration
- Deep Neural Networks built on a tape-based autograd system
You can reuse your favorite Python packages such as NumPy, SciPy and Cython to extend PyTorch when needed.
You can reuse your favorite python packages such as numpy, scipy and Cython to extend PyTorch when needed.
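For instance, a minimal sketch of the NumPy bridge (nothing beyond `torch.from_numpy` and `Tensor.numpy` is assumed):

```python
import numpy as np
import torch

a = np.ones((2, 3))
t = torch.from_numpy(a)   # the Tensor and the ndarray share the same memory
t.mul_(2)                 # in-place ops on the Tensor are reflected in `a`
b = t.numpy()             # and back again, without copying
```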
We are in an early-release beta. Expect some adventures and rough edges.
We are in an early-release Beta. Expect some adventures and rough edges.
- [More about PyTorch](#more-about-pytorch)
- [More About PyTorch](#more-about-pytorch)
- [Installation](#installation)
- [Binaries](#binaries)
- [From Source](#from-source)
- [Docker Image](#docker-image)
- [From source](#from-source)
- [Docker image](#docker-image)
- [Getting Started](#getting-started)
- [Communication](#communication)
- [Releases and Contributing](#releases-and-contributing)
- [The Team](#the-team)
| System | 2.7 | 3.5 |
| System | Python | Status |
| --- | --- | --- |
| Linux CPU | [![Build Status](https://travis-ci.org/pytorch/pytorch.svg?branch=master)](https://travis-ci.org/pytorch/pytorch) | [![Build Status](https://travis-ci.org/pytorch/pytorch.svg?branch=master)](https://travis-ci.org/pytorch/pytorch) |
| Linux GPU | [![Build Status](http://build.pytorch.org:8080/buildStatus/icon?job=pytorch-master-py2-linux)](https://build.pytorch.org/job/pytorch-master-py2-linux) | [![Build Status](http://build.pytorch.org:8080/buildStatus/icon?job=pytorch-master-py3-linux)](https://build.pytorch.org/job/pytorch-master-py3-linux) |
| macOS CPU | [![Build Status](http://build.pytorch.org:8080/buildStatus/icon?job=pytorch-master-py2-osx-cpu)](https://build.pytorch.org/job/pytorch-master-py2-osx-cpu) | [![Build Status](http://build.pytorch.org:8080/buildStatus/icon?job=pytorch-master-py3-osx-cpu)](https://build.pytorch.org/job/pytorch-master-py3-osx-cpu) |
| Linux CPU | 2.7.8, 2.7, 3.5, nightly | [![Build Status](https://travis-ci.org/pytorch/pytorch.svg?branch=master)](https://travis-ci.org/pytorch/pytorch) |
| Linux GPU | 2.7 | [![Build Status](http://build.pytorch.org:8080/buildStatus/icon?job=pytorch-master-py2)](https://build.pytorch.org/job/pytorch-master-py2) |
| Linux GPU | 3.5 | [![Build Status](http://build.pytorch.org:8080/buildStatus/icon?job=pytorch-master-py3)](https://build.pytorch.org/job/pytorch-master-py3) |
## More about PyTorch
@ -38,7 +37,7 @@ At a granular level, PyTorch is a library that consists of the following compone
</tr>
<tr>
<td><b> torch.autograd </b></td>
<td> a tape-based automatic differentiation library that supports all differentiable Tensor operations in torch </td>
<td> a tape based automatic differentiation library that supports all differentiable Tensor operations in torch </td>
</tr>
<tr>
<td><b> torch.nn </b></td>
@ -46,7 +45,7 @@ At a granular level, PyTorch is a library that consists of the following compone
</tr>
<tr>
<td><b> torch.multiprocessing </b></td>
<td> Python multiprocessing, but with magical memory sharing of torch Tensors across processes. Useful for data loading and Hogwild training. </td>
<td> python multiprocessing, but with magical memory sharing of torch Tensors across processes. Useful for data loading and hogwild training. </td>
</tr>
<tr>
<td><b> torch.utils </b></td>
@ -60,14 +59,14 @@ At a granular level, PyTorch is a library that consists of the following compone
Usually one uses PyTorch either as:
- a replacement for NumPy to use the power of GPUs.
- A replacement for numpy to use the power of GPUs.
- a deep learning research platform that provides maximum flexibility and speed
Elaborating further:
### A GPU-Ready Tensor Library
### A GPU-ready Tensor library
If you use NumPy, then you have used Tensors (a.k.a ndarray).
If you use numpy, then you have used Tensors (a.k.a ndarray).
<p align=center><img width="30%" src="docs/source/_static/img/tensor_illustration.png" /></p>
@ -78,15 +77,15 @@ We provide a wide variety of tensor routines to accelerate and fit your scientif
such as slicing, indexing, math operations, linear algebra, reductions.
And they are fast!
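A small, illustrative sketch of a few such routines (GPU use is optional and guarded):

```python
import torch

x = torch.rand(5, 3)
y = torch.rand(5, 3)

z = x + y                # element-wise math
col = x[:, 1]            # indexing and slicing
s = x.sum(0)             # reductions
m = torch.mm(x, y.t())   # linear algebra

if torch.cuda.is_available():
    m = torch.mm(x.cuda(), y.cuda().t())   # the same routines run on the GPU
```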
### Dynamic Neural Networks: Tape-Based Autograd
### Dynamic Neural Networks: Tape based Autograd
PyTorch has a unique way of building neural networks: using and replaying a tape recorder.
Most frameworks such as TensorFlow, Theano, Caffe and CNTK have a static view of the world.
Most frameworks such as `TensorFlow`, `Theano`, `Caffe` and `CNTK` have a static view of the world.
One has to build a neural network, and reuse the same structure again and again.
Changing the way the network behaves means that one has to start from scratch.
With PyTorch, we use a technique called reverse-mode auto-differentiation, which allows you to
With PyTorch, we use a technique called Reverse-mode auto-differentiation, which allows you to
change the way your network behaves arbitrarily with zero lag or overhead. Our inspiration comes
from several research papers on this topic, as well as current and past work such as
[autograd](https://github.com/twitter/torch-autograd),
@ -98,45 +97,45 @@ You get the best of speed and flexibility for your crazy research.
<p align=center><img width="80%" src="docs/source/_static/img/dynamic_graph.gif" /></p>
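As a rough sketch of this define-by-run style (using the `Variable` API from `torch.autograd`), ordinary Python control flow shapes the graph on every run:

```python
import torch
from torch.autograd import Variable

x = Variable(torch.randn(3), requires_grad=True)
y = x * 2
while y.data.norm() < 100:   # the loop itself becomes part of this run's graph
    y = y * 2
y.sum().backward()           # gradients flow through however many iterations ran
print(x.grad)
```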
### Python First
### Python first
PyTorch is not a Python binding into a monolithic C++ framework.
PyTorch is not a Python binding into a monolothic C++ framework.
It is built to be deeply integrated into Python.
You can use it naturally like you would use NumPy / SciPy / scikit-learn etc.
You can use it naturally like you would use numpy / scipy / scikit-learn etc.
You can write your new neural network layers in Python itself, using your favorite libraries
and use packages such as Cython and Numba.
Our goal is to not reinvent the wheel where appropriate.
### Imperative Experiences
### Imperative experiences
PyTorch is designed to be intuitive, linear in thought and easy to use.
When you execute a line of code, it gets executed. There isn't an asynchronous view of the world.
When you drop into a debugger, or receive error messages and stack traces, understanding them is straightforward.
The stack trace points to exactly where your code was defined.
When you drop into a debugger, or receive error messages and stack traces, understanding them is straight-forward.
The stack-trace points to exactly where your code was defined.
We hope you never spend hours debugging your code because of bad stack traces or asynchronous and opaque execution engines.
### Fast and Lean
PyTorch has minimal framework overhead. We integrate acceleration libraries
such as Intel MKL and NVIDIA (cuDNN, NCCL) to maximize speed.
At the core, its CPU and GPU Tensor and neural network backends
PyTorch has minimal framework overhead. We integrate acceleration libraries
such as Intel MKL and NVIDIA (CuDNN, NCCL) to maximize speed.
At the core, its CPU and GPU Tensor and Neural Network backends
(TH, THC, THNN, THCUNN) are written as independent libraries with a C99 API.
They are mature and have been tested for years.
Hence, PyTorch is quite fast whether you run small or large neural networks.
Hence, PyTorch is quite fast -- whether you run small or large neural networks.
The memory usage in PyTorch is extremely efficient compared to Torch or some of the alternatives.
We've written custom memory allocators for the GPU to make sure that
your deep learning models are maximally memory efficient.
This enables you to train bigger deep learning models than before.
### Extensions without Pain
### Extensions without pain
Writing new neural network modules, or interfacing with PyTorch's Tensor API was designed to be straightforward
Writing new neural network modules, or interfacing with PyTorch's Tensor API was designed to be straight-forward
and with minimal abstractions.
You can write new neural network layers in Python using the torch API
[or your favorite NumPy-based libraries such as SciPy](http://pytorch.org/tutorials/advanced/numpy_extensions_tutorial.html).
[or your favorite numpy based libraries such as SciPy](http://pytorch.org/tutorials/advanced/numpy_extensions_tutorial.html).
If you want to write your layers in C/C++, we provide an extension API based on
[cffi](http://cffi.readthedocs.io/en/latest/) that is efficient and with minimal boilerplate.
@ -150,16 +149,16 @@ Commands to install from binaries via Conda or pip wheels are on our website:
[http://pytorch.org](http://pytorch.org)
### From Source
### From source
If you are installing from source, we highly recommend installing an [Anaconda](https://www.continuum.io/downloads) environment.
You will get a high-quality BLAS library (MKL) and you get a controlled compiler version regardless of your Linux distro.
Once you have [Anaconda](https://www.continuum.io/downloads) installed, here are the instructions.
Once you have [anaconda](https://www.continuum.io/downloads) installed, here are the instructions.
If you want to compile with CUDA support, install
- [NVIDIA CUDA](https://developer.nvidia.com/cuda-downloads) 7.5 or above
- [NVIDIA cuDNN](https://developer.nvidia.com/cudnn) v5.x or above
- [NVIDIA CuDNN](https://developer.nvidia.com/cudnn) v5.x or above
If you want to disable CUDA support, export environment variable `NO_CUDA=1`.
@ -167,7 +166,7 @@ If you want to disable CUDA support, export environment variable `NO_CUDA=1`.
On Linux
```bash
export CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" # [anaconda root directory]
export CMAKE_PREFIX_PATH=[anaconda root directory]
# Install basic dependencies
conda install numpy pyyaml mkl setuptools cmake gcc cffi
@ -197,21 +196,15 @@ MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py install
Dockerfile is supplied to build images with cuda support and cudnn v6. Build as usual
```
docker build -t pytorch .
docker build -t pytorch-cudnnv6 .
```
Alternatively, if you want a runtime image, build with
and run with nvidia-docker:
```
docker build -t pytorch . -f tools/docker/Dockerfile_runtime
nvidia-docker run --rm -ti --ipc=host pytorch-cudnnv6
```
and run with nvidia-docker:
```
nvidia-docker run --rm -ti --ipc=host pytorch
```
Please note that PyTorch uses shared memory to share data between processes, so if torch multiprocessing is used (e.g.
Please note that pytorch uses shared memory to share data between processes, so if torch multiprocessing is used (e.g.
for multithreaded data loaders) the default shared memory segment size that container runs with is not enough, and you
should increase shared memory size either with `--ipc=host` or `--shm-size` command line options to `nvidia-docker run`.
should increase shared memory size either with --ipc=host or --shm-size command line options to nvidia-docker run.
## Getting Started
@ -223,13 +216,13 @@ Three pointers to get you started:
## Communication
* forums: discuss implementations, research, etc. http://discuss.pytorch.org
* GitHub issues: bug reports, feature requests, install issues, RFCs, thoughts, etc.
* Slack: general chat, online discussions, collaboration etc. https://pytorch.slack.com/ . If you need a slack invite, ping us at soumith@pytorch.org
* github issues: bug reports, feature requests, install issues, RFCs, thoughts, etc.
* slack: general chat, online discussions, collaboration etc. https://pytorch.slack.com/ . If you need a slack invite, ping us at soumith@pytorch.org
* newsletter: no-noise, one-way email newsletter with important announcements about pytorch. You can sign-up here: http://eepurl.com/cbG0rv
## Releases and Contributing
PyTorch has a 90 day release cycle (major releases).
Its current state is Beta; we expect no obvious bugs. Please let us know if you encounter a bug by [filing an issue](https://github.com/pytorch/pytorch/issues).
We appreciate all contributions. If you are planning to contribute back bug-fixes, please do so without any further discussion.


@ -112,7 +112,3 @@ footer p {
nav .hidden-section {
display: inherit;
}
.wy-side-nav-search>div.version {
color: #000;
}


@ -9,8 +9,6 @@ Automatic differentiation package - torch.autograd
.. autofunction:: backward
.. autofunction:: grad
Variable
--------
@ -40,8 +38,8 @@ All :class:`Variable` s keep track of in-place operations applied to them, and
if the implementation detects that a variable was saved for backward in one of
the functions, but it was modified in-place afterwards, an error will be raised
once backward pass is started. This ensures that if you're using in-place
functions and not seeing any errors, you can be sure that the computed
gradients are correct.
functions and not seing any errors, you can be sure that the computed gradients
are correct.
.. autoclass:: Variable


@ -75,10 +75,10 @@ author = 'Torch Contributors'
#
# The short X.Y version.
# TODO: change to [:2] at v1.0
version = 'master (' + torch.__version__ + ' )'
version = '.'.join(torch.__version__.split('+')[0].split('.')[:3])
# The full version, including alpha/beta/rc tags.
# TODO: verify this works as expected
release = 'master'
release = torch.__version__.split('+')[0]
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
@ -113,7 +113,7 @@ html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
#
html_theme_options = {
'collapse_navigation': False,
'display_version': True,
'display_version': False,
'logo_only': True,
}
@ -205,7 +205,7 @@ from sphinx import addnodes
def patched_make_field(self, types, domain, items, **kw):
# `kw` catches `env=None` needed for newer sphinx while maintaining
# `kw` catches `env=None` needed for newer sphinx while maingaining
# backwards compatibility when passed along further down!
# type: (List, unicode, Tuple) -> nodes.field


@ -25,10 +25,3 @@ Streams and events
.. autoclass:: Event
:members:
NVIDIA Tools Extension (NVTX)
-----------------------------
.. autofunction:: torch.cuda.nvtx.mark
.. autofunction:: torch.cuda.nvtx.range_push
.. autofunction:: torch.cuda.nvtx.range_pop


@ -10,4 +10,3 @@ torch.utils.data
.. autoclass:: torch.utils.data.sampler.RandomSampler
.. autoclass:: torch.utils.data.sampler.SubsetRandomSampler
.. autoclass:: torch.utils.data.sampler.WeightedRandomSampler
.. autoclass:: torch.utils.data.distributed.DistributedSampler


@ -1,165 +0,0 @@
.. role:: hidden
:class: hidden-section
Distributed communication package - torch.distributed
=====================================================
.. automodule:: torch.distributed
.. currentmodule:: torch.distributed
Currently torch.distributed supports three backends, each with
different capabilities. The table below shows which functions are available
for use with CPU / CUDA tensors.
MPI supports CUDA only if the implementation used to build PyTorch supports it.
+------------+-----------+-----------+-----------+
| Backend | ``tcp`` | ``gloo`` | ``mpi`` |
+------------+-----+-----+-----+-----+-----+-----+
| Device | CPU | GPU | CPU | GPU | CPU | GPU |
+============+=====+=====+=====+=====+=====+=====+
| send | ✓ | ✘ | ✘ | ✘ | ✓ | ? |
+------------+-----+-----+-----+-----+-----+-----+
| recv | ✓ | ✘ | ✘ | ✘ | ✓ | ? |
+------------+-----+-----+-----+-----+-----+-----+
| broadcast | ✓ | ✘ | ✓ | ✓ | ✓ | ? |
+------------+-----+-----+-----+-----+-----+-----+
| all_reduce | ✓ | ✘ | ✓ | ✓ | ✓ | ? |
+------------+-----+-----+-----+-----+-----+-----+
| reduce | ✓ | ✘ | ✘ | ✘ | ✓ | ? |
+------------+-----+-----+-----+-----+-----+-----+
| all_gather | ✓ | ✘ | ✘ | ✘ | ✓ | ? |
+------------+-----+-----+-----+-----+-----+-----+
| gather | ✓ | ✘ | ✘ | ✘ | ✓ | ? |
+------------+-----+-----+-----+-----+-----+-----+
| scatter | ✓ | ✘ | ✘ | ✘ | ✓ | ? |
+------------+-----+-----+-----+-----+-----+-----+
| barrier | ✓ | ✘ | ✓ | ✓ | ✓ | ? |
+------------+-----+-----+-----+-----+-----+-----+
Initialization
--------------
The package needs to be initialized using the :func:`torch.distributed.init_process_group`
function before calling any other methods.
.. autofunction:: init_process_group
.. autofunction:: get_rank
.. autofunction:: get_world_size
--------------------------------------------------------------------------------
Currently three initialization methods are supported:
TCP initialization
^^^^^^^^^^^^^^^^^^
Initialization will utilize a network address reachable from all processes.
If the address belongs to one of the machines, initialization requires that all processes
have manually specified ranks.
Alternatively, the address has to be a valid IP multicast address, in which case,
ranks can be assigned automatically. Multicast initialization also supports
a ``group_name`` argument, which allows you to use the same address for multiple jobs,
as long as they use different group names.
::
import torch.distributed as dist
# Use address of one of the machines
dist.init_process_group(init_method='tcp://10.1.1.20:23456', rank=args.rank, world_size=4)
# or a multicast address - rank will be assigned automatically if unspecified
dist.init_process_group(init_method='tcp://[ff15:1e18:5d4c:4cf0:d02d:b659:53ba:b0a7]:23456',
                        world_size=4)
Shared file-system initialization
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Another initialization method makes use of a file system shared and visible from
all machines in a group. The URL should start with ``file://`` and contain a path
to a non-existent file (in an existing directory) on a shared file system.
This initialization method also supports a ``group_name`` argument, which allows you to
use the same shared file path for multiple jobs, as long as they use different
group names.
.. warning::
This method assumes that the file system supports locking using ``fcntl`` - most
local systems and NFS support it.
::
import torch.distributed as dist
# Rank will be assigned automatically if unspecified
dist.init_process_group(init_method='file:///mnt/nfs/sharedfile', world_size=4,
                        group_name=args.group)
Environment variable initialization
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This method will read the configuration from environment variables, allowing
one to fully customize how the information is obtained. The variables to be set
are:
* ``MASTER_PORT`` - required; has to be a free port on machine with rank 0
* ``MASTER_ADDR`` - required (except for rank 0); address of rank 0 node
* ``WORLD_SIZE`` - required; can be set either here, or in a call to init function
* ``RANK`` - required; can be set either here, or in a call to init function
The machine with rank 0 will be used to set up all connections.
This is the default method, meaning that ``init_method`` does not have to be specified (or
can be ``env://``).
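For example, a minimal sketch following the call style of the examples above, assuming all four variables are already exported in the environment::

    import torch.distributed as dist

    # MASTER_ADDR, MASTER_PORT, RANK and WORLD_SIZE are read from the environment
    dist.init_process_group(init_method='env://')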
Groups
------
By default collectives operate on the default group (also called the world) and
require all processes to enter the distributed function call. However, some workloads can benefit
from more fine-grained communication. This is where distributed groups come
into play. :func:`~torch.distributed.new_group` function can be
used to create new groups, with arbitrary subsets of all processes. It returns
an opaque group handle that can be given as a ``group`` argument to all collectives
(collectives are distributed functions to exchange information in certain well-known programming patterns).
.. autofunction:: new_group
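For example, a sketch that restricts a collective to ranks 0 and 1 (``tensor`` is a placeholder for a tensor you already hold on every participating process)::

    import torch.distributed as dist

    group = dist.new_group([0, 1])         # an opaque handle covering ranks 0 and 1
    dist.all_reduce(tensor, group=group)   # only ranks 0 and 1 take part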
Point-to-point communication
----------------------------
.. autofunction:: send
.. autofunction:: recv
:func:`~torch.distributed.isend` and :func:`~torch.distributed.irecv`
return distributed request objects when used. In general, the type of this object is unspecified
as they should never be created manually, but they are guaranteed to support two methods:
* ``is_completed()`` - returns True if the operation has finished
* ``wait()`` - will block the process until the operation is finished.
``is_completed()`` is guaranteed to return True once it returns.
.. autofunction:: isend
.. autofunction:: irecv
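A short sketch of a non-blocking exchange between ranks 0 and 1::

    import torch
    import torch.distributed as dist

    t = torch.zeros(1)
    if dist.get_rank() == 0:
        req = dist.isend(tensor=t, dst=1)   # returns a distributed request object
    else:
        req = dist.irecv(tensor=t, src=0)
    req.wait()                              # block until the transfer is finished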
Collective functions
--------------------
.. autofunction:: broadcast
.. autofunction:: all_reduce
.. autofunction:: reduce
.. autofunction:: all_gather
.. autofunction:: gather
.. autofunction:: scatter
.. autofunction:: barrier


@ -30,7 +30,6 @@ PyTorch is an optimized tensor library for deep learning using GPUs and CPUs.
optim
torch.autograd <autograd>
torch.multiprocessing <multiprocessing>
torch.distributed <distributed>
torch.legacy <legacy>
cuda
ffi


@ -83,6 +83,6 @@ the current process group, and will keep track of all shared memory allocations.
Once all processes connected to it exit, it will wait a moment to ensure there
will be no new connections, and will iterate over all shared memory files
allocated by the group. If it finds that any of them still exist, they will be
deallocated. We've tested this method and it proved to be robust to various
deallocated. We've tested this method and it prooved to be robust to various
failures. Still, if your system has high enough limits, and ``file_descriptor``
is a supported strategy, we do not recommend switching to this one.
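If you do have to opt in, a minimal sketch of switching strategies::

    import torch.multiprocessing as mp

    print(mp.get_all_sharing_strategies())   # strategies available on this platform
    mp.set_sharing_strategy('file_system')   # only if file_descriptor is unusable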


@ -160,7 +160,7 @@ Pooling Layers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: AdaptiveMaxPool2d
:members:
:members:
:hidden:`AdaptiveAvgPool1d`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@ -174,41 +174,7 @@ Pooling Layers
.. autoclass:: AdaptiveAvgPool2d
:members:
Padding Layers
--------------
:hidden:`ReflectionPad2d`
~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: ReflectionPad2d
:members:
:hidden:`ReplicationPad2d`
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: ReplicationPad2d
:members:
:hidden:`ReplicationPad3d`
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: ReplicationPad3d
:members:
:hidden:`ZeroPad2d`
~~~~~~~~~~~~~~~~~~~
.. autoclass:: ZeroPad2d
:members:
:hidden:`ConstantPad2d`
~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: ConstantPad2d
:members:
Non-linear Activations
----------------------------------
@ -230,12 +196,6 @@ Non-linear Activations
.. autoclass:: ELU
:members:
:hidden:`SELU`
~~~~~~~~~~~~~~
.. autoclass:: SELU
:members:
:hidden:`PReLU`
~~~~~~~~~~~~~~~
@ -343,19 +303,19 @@ Normalization layers
:members:
:hidden:`InstanceNorm1d`
~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: InstanceNorm1d
:members:
:hidden:`InstanceNorm2d`
~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: InstanceNorm2d
:members:
:hidden:`InstanceNorm3d`
~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: InstanceNorm3d
:members:
@ -430,12 +390,6 @@ Dropout layers
.. autoclass:: Dropout3d
:members:
:hidden:`AlphaDropout`
~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: AlphaDropout
:members:
Sparse layers
----------------------------------
@ -446,21 +400,9 @@ Sparse layers
.. autoclass:: Embedding
:members:
:hidden:`EmbeddingBag`
~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: EmbeddingBag
:members:
Distance functions
----------------------------------
:hidden:`CosineSimilarity`
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: CosineSimilarity
:members:
:hidden:`PairwiseDistance`
~~~~~~~~~~~~~~~~~~~~~~~~~~
@ -495,12 +437,6 @@ Loss functions
.. autoclass:: NLLLoss
:members:
:hidden:`PoissonNLLLoss`
~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: PoissonNLLLoss
:members:
:hidden:`NLLLoss2d`
~~~~~~~~~~~~~~~~~~~
@ -519,12 +455,6 @@ Loss functions
.. autoclass:: BCELoss
:members:
:hidden:`BCEWithLogitsLoss`
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: BCEWithLogitsLoss
:members:
:hidden:`MarginRankingLoss`
~~~~~~~~~~~~~~~~~~~~~~~~~~~
@ -573,12 +503,6 @@ Loss functions
.. autoclass:: MultiMarginLoss
:members:
:hidden:`TripletMarginLoss`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: TripletMarginLoss
:members:
Vision layers
----------------
@ -589,12 +513,6 @@ Vision layers
.. autoclass:: PixelShuffle
:members:
:hidden:`Upsample`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: Upsample
:members:
:hidden:`UpsamplingNearest2d`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@ -608,21 +526,15 @@ Vision layers
:members:
DataParallel layers (multi-GPU, distributed)
--------------------------------------------
Multi-GPU layers
----------------
:hidden:`DataParallel`
~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: DataParallel
:members:
:hidden:`DistributedDataParallel`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: torch.nn.parallel.DataParallel
:members:
Utilities
---------
@ -632,16 +544,6 @@ Utilities
.. autofunction:: torch.nn.utils.clip_grad_norm
:hidden:`weight_norm`
~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: torch.nn.utils.weight_norm
:hidden:`remove_weight_norm`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: torch.nn.utils.remove_weight_norm
.. currentmodule:: torch.nn.utils.rnn
@ -774,7 +676,7 @@ Pooling functions
.. autofunction:: adaptive_avg_pool2d
Non-linear activation functions
-------------------------------
@ -804,11 +706,6 @@ Non-linear activation functions
.. autofunction:: elu
:hidden:`selu`
~~~~~~~~~~~~~~
.. autofunction:: selu
:hidden:`leaky_relu`
~~~~~~~~~~~~~~~~~~~~
@ -887,11 +784,6 @@ Normalization functions
.. autofunction:: batch_norm
:hidden:`normalize`
~~~~~~~~~~~~~~~~~~~~
.. autofunction:: normalize
Linear functions
----------------
@ -908,21 +800,6 @@ Dropout functions
.. autofunction:: dropout
:hidden:`alpha_dropout`
~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: alpha_dropout
:hidden:`dropout2d`
~~~~~~~~~~~~~~~~~~~
.. autofunction:: dropout2d
:hidden:`dropout3d`
~~~~~~~~~~~~~~~~~~~
.. autofunction:: dropout3d
Distance functions
----------------------------------
@ -931,100 +808,36 @@ Distance functions
.. autofunction:: pairwise_distance
:hidden:`cosine_similarity`
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: cosine_similarity
Loss functions
--------------
:hidden:`binary_cross_entropy`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: binary_cross_entropy
:hidden:`poisson_nll_loss`
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: poisson_nll_loss
:hidden:`cosine_embedding_loss`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: cosine_embedding_loss
:hidden:`cross_entropy`
~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: cross_entropy
:hidden:`hinge_embedding_loss`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: hinge_embedding_loss
:hidden:`kl_div`
~~~~~~~~~~~~~~~~
.. autofunction:: kl_div
:hidden:`l1_loss`
~~~~~~~~~~~~~~~~~
.. autofunction:: l1_loss
:hidden:`mse_loss`
~~~~~~~~~~~~~~~~~~
.. autofunction:: mse_loss
:hidden:`margin_ranking_loss`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: margin_ranking_loss
:hidden:`multilabel_margin_loss`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: multilabel_margin_loss
:hidden:`multilabel_soft_margin_loss`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: multilabel_soft_margin_loss
:hidden:`multi_margin_loss`
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: multi_margin_loss
:hidden:`nll_loss`
~~~~~~~~~~~~~~~~~~
.. autofunction:: nll_loss
:hidden:`binary_cross_entropy_with_logits`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: binary_cross_entropy_with_logits
:hidden:`kl_div`
~~~~~~~~~~~~~~~~
.. autofunction:: kl_div
:hidden:`cross_entropy`
~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: cross_entropy
:hidden:`binary_cross_entropy`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: binary_cross_entropy
:hidden:`smooth_l1_loss`
~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: smooth_l1_loss
:hidden:`soft_margin_loss`
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: soft_margin_loss
:hidden:`triplet_margin_loss`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: triplet_margin_loss
Vision functions
----------------
@ -1038,32 +851,6 @@ Vision functions
.. autofunction:: pad
:hidden:`upsample`
~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: upsample
:hidden:`upsample_nearest`
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: upsample_nearest
:hidden:`upsample_bilinear`
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: upsample_bilinear
:hidden:`grid_sample`
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: grid_sample
:hidden:`affine_grid`
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: affine_grid
torch.nn.init
=============


@ -67,18 +67,18 @@ model. ``volatile`` also determines that ``requires_grad is False``.
Volatile differs from :ref:`excluding-requires_grad` in how the flag propagates.
If there's even a single volatile input to an operation, its output is also
going to be volatile. Volatility spreads across the graph much easier than
going to be volatile. Volatility spreads accross the graph much easier than
non-requiring gradient - you only need a **single** volatile leaf to have a
volatile output, while you need **all** leaves to not require gradient to
have an output that doesn't require gradient. Using volatile flag you don't
have an output the doesn't require gradient. Using volatile flag you don't
need to change any settings of your model parameters to use it for
inference. It's enough to create a volatile input, and this will ensure that
no intermediate states are saved.
.. code::
>>> regular_input = Variable(torch.randn(1, 3, 227, 227))
>>> volatile_input = Variable(torch.randn(1, 3, 227, 227), volatile=True)
>>> regular_input = Variable(torch.randn(5, 5))
>>> volatile_input = Variable(torch.randn(5, 5), volatile=True)
>>> model = torchvision.models.resnet18(pretrained=True)
>>> model(regular_input).requires_grad
True
@ -86,28 +86,21 @@ no intermediate states are saved.
False
>>> model(volatile_input).volatile
True
>>> model(volatile_input).grad_fn is None
>>> model(volatile_input).creator is None
True
How autograd encodes the history
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Autograd is a reverse automatic differentiation system. Conceptually,
autograd records a graph recording all of the operations that created
the data as you execute operations, giving you a directed acyclic graph
whose leaves are the input variables and roots are the output variables.
By tracing this graph from roots to leaves, you can automatically
compute the gradients using the chain rule.
Internally, autograd represents this graph as a graph of
:class:`Function` objects (really expressions), which can be
:meth:`~torch.autograd.Function.apply` ed to compute the result of
evaluating the graph. When computing the forwards pass, autograd
simultaneously performs the requested computations and builds up a graph
representing the function that computes the gradient (the ``.grad_fn``
attribute of each :class:`Variable` is an entry point into this graph).
When the forwards pass is completed, we evaluate this graph in the
backwards pass to compute the gradients.
Each Variable has a ``.creator`` attribute, that points to the function, of
which it is an output. This is an entry point to a directed acyclic graph (DAG)
consisting of :class:`Function` objects as nodes, and references between them
being the edges. Every time an operation is performed, a new :class:`Function`
representing it is instantiated, its :meth:`~torch.autograd.Function.forward`
method is called, and its output :class:`Variable` s creators are set to it.
Then, by following the path from any :class:`Variable` to the leaves, it is
possible to reconstruct the sequence of operations that has created the data,
and automatically compute the gradients.
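As a tiny sketch of inspecting this graph (the entry-point attribute is ``.creator`` in this release and ``.grad_fn`` in later sources)::

    import torch
    from torch.autograd import Variable

    x = Variable(torch.ones(2, 2), requires_grad=True)
    y = x + 2
    z = (y * y).sum()
    print(z.creator)   # the Function that produced z (``z.grad_fn`` in later versions)
    z.backward()
    print(x.grad)      # filled in by walking the graph from z back to x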
An important thing to note is that the graph is recreated from scratch at every
iteration, and this is exactly what allows for using arbitrary Python control


@ -1,113 +0,0 @@
.. _broadcasting-semantics:
Broadcasting semantics
======================
Many PyTorch operations support :any:`NumPy Broadcasting Semantics <numpy.doc.broadcasting>`.
In short, if a PyTorch operation supports broadcast, then its Tensor arguments can be
automatically expanded to be of equal sizes (without making copies of the data).
General semantics
-----------------
Two tensors are "broadcastable" if the following rules hold:
- Each tensor has at least one dimension.
- When iterating over the dimension sizes, starting at the trailing dimension,
the dimension sizes must either be equal, one of them is 1, or one of them
does not exist.
For Example::
>>> x=torch.FloatTensor(5,7,3)
>>> y=torch.FloatTensor(5,7,3)
# same shapes are always broadcastable (i.e. the above rules always hold)
>>> x=torch.FloatTensor()
>>> y=torch.FloatTensor(2,2)
# x and y are not broadcastable, because x does not have at least 1 dimension
# can line up trailing dimensions
>>> x=torch.FloatTensor(5,3,4,1)
>>> y=torch.FloatTensor( 3,1,1)
# x and y are broadcastable.
# 1st trailing dimension: both have size 1
# 2nd trailing dimension: y has size 1
# 3rd trailing dimension: x size == y size
# 4th trailing dimension: y dimension doesn't exist
# but:
>>> x=torch.FloatTensor(5,2,4,1)
>>> y=torch.FloatTensor( 3,1,1)
# x and y are not broadcastable, because in the 3rd trailing dimension 2 != 3
If two tensors :attr:`x`, :attr:`y` are "broadcastable", the resulting tensor size
is calculated as follows:
- If the number of dimensions of :attr:`x` and :attr:`y` are not equal, prepend 1
to the dimensions of the tensor with fewer dimensions to make them equal length.
- Then, for each dimension size, the resulting dimension size is the max of the sizes of
:attr:`x` and :attr:`y` along that dimension.
For Example::
# can line up trailing dimensions to make reading easier
>>> x=torch.FloatTensor(5,1,4,1)
>>> y=torch.FloatTensor( 3,1,1)
>>> (x+y).size()
torch.Size([5, 3, 4, 1])
# but not necessary:
>>> x=torch.FloatTensor(1)
>>> y=torch.FloatTensor(3,1,7)
>>> (x+y).size()
torch.Size([3, 1, 7])
>>> x=torch.FloatTensor(5,2,4,1)
>>> y=torch.FloatTensor(3,1,1)
>>> (x+y).size()
RuntimeError: The size of tensor a (2) must match the size of tensor b (3) at non-singleton dimension 1
In-place semantics
------------------
One complication is that in-place operations do not allow the in-place tensor to change shape
as a result of the broadcast.
For Example::
>>> x=torch.FloatTensor(5,3,4,1)
>>> y=torch.FloatTensor(3,1,1)
>>> (x.add_(y)).size()
torch.Size([5, 3, 4, 1])
# but:
>>> x=torch.FloatTensor(1,3,1)
>>> y=torch.FloatTensor(3,1,7)
>>> (x.add_(y)).size()
RuntimeError: The expanded size of the tensor (1) must match the existing size (7) at non-singleton dimension 2.
Backwards compatibility
-----------------------
Prior versions of PyTorch allowed certain pointwise functions to execute on tensors with different shapes,
as long as the number of elements in each tensor was equal. The pointwise operation would then be carried
out by viewing each tensor as 1-dimensional. PyTorch now supports broadcasting and the "1-dimensional"
pointwise behavior is considered deprecated and will generate a Python warning in cases where tensors are
not broadcastable, but have the same number of elements.
Note that the introduction of broadcasting can cause backwards incompatible changes in the case where
two tensors do not have the same shape, but are broadcastable and have the same number of elements.
For Example::
>>> torch.add(torch.ones(4,1), torch.randn(4))
would previously produce a Tensor with size: torch.Size([4,1]), but now produces a Tensor with size: torch.Size([4,4]).
In order to help identify cases in your code where backwards incompatibilities introduced by broadcasting may exist,
you may set `torch.utils.backcompat.broadcast_warning.enabled` to `True`, which will generate a python warning
in such cases.
For Example::
>>> torch.utils.backcompat.broadcast_warning.enabled=True
>>> torch.add(torch.ones(4,1), torch.ones(4))
__main__:1: UserWarning: self and other do not have the same shape, but are broadcastable, and have the same number of elements.
Changing behavior in a backwards incompatible manner to broadcasting rather than viewing as 1-dimensional.


@ -12,7 +12,7 @@ of your selected device, and the results will be always placed in on the same
device as the tensor.
Cross-GPU operations are not allowed by default, with the only exception of
:meth:`~torch.Tensor.copy_`. Unless you enable peer-to-peer memory accesses,
:meth:`~torch.Tensor.copy_`. Unless you enable peer-to-peer memory accesses
any attempts to launch ops on tensors spread across different devices will
raise an error.
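For example, a sketch assuming at least two GPUs::

    import torch

    x = torch.cuda.FloatTensor(1)        # allocated on the current (default) device
    with torch.cuda.device(1):
        y = torch.cuda.FloatTensor(1)    # allocated on GPU 1
        z = x + y                        # raises an error: x and y are on different devices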


@ -13,28 +13,31 @@ Extending :mod:`torch.autograd`
Adding operations to :mod:`~torch.autograd` requires implementing a new
:class:`Function` subclass for each operation. Recall that :class:`Function` s
are what :mod:`~torch.autograd` uses to compute the results and gradients, and
encode the operation history. Every new function requires you to implement 2
encode the operation history. Every new function requires you to implement 3
methods:
- ``__init__`` (*optional*) - if your operation is parametrized by/uses
objects different than :class:`Variable` s, you should pass them as arguments
to ``__init__``. For example, ``AddConstant`` function takes a scalar to add,
while ``Transpose`` requires specifying which two dimensions to swap. If your
function doesn't require any additional parameters, you can skip it.
- :meth:`~Function.forward` - the code that performs the operation. It can take
as many arguments as you want, with some of them being optional, if you
specify the default values. All kinds of Python objects are accepted here.
:class:`Variable` arguments will be converted to :class:`Tensor` s before the
call, and their use will be registered in the graph. Note that this logic won't
traverse lists/dicts/any other data structures and will only consider Variables
that are direct arguments to the call. You can return either a single
:class:`Tensor` output, or a :class:`tuple` of :class:`Tensor` s if there are
multiple outputs. Also, please refer to the docs of :class:`Function` to find
descriptions of useful methods that can be called only from :meth:`~Function.forward`.
as many arguments as you want, with some of them being
optional, if you specify the default values. Keep in mind that only
:class:`Variable` s will be passed in here. You can return either a single
:class:`Variable` output, or a :class:`tuple` of :class:`Variable` s if there
are multiple. Also, please refer to the docs of :class:`Function` to find
descriptions of useful methods that can be called only from
:meth:`~Function.forward`.
- :meth:`~Function.backward` - gradient formula. It will be given
as many :class:`Variable` arguments as there were outputs, with each of them
representing gradient w.r.t. that output. It should return as many
:class:`Variable` s as there were inputs, with each of them containing the
gradient w.r.t. its corresponding input. If your inputs didn't require
gradient (see :attr:`~Variable.needs_input_grad`), or were non-:class:`Variable`
objects, you can return :class:`python:None`. Also, if you have optional
arguments to :meth:`~Variable.forward` you can return more gradients than there
were inputs, as long as they're all :any:`python:None`.
as many arguments as there were outputs, with each of them representing
gradient w.r.t. that output. It should return as many :class:`Tensor` s as
there were inputs, with each of them containing the gradient w.r.t.
corresponding input. If your inputs didn't require gradient (see
:attr:`~Variable.needs_input_grad`), or it was non-differentiable, you
can return :class:`None`. Also, if you have optional arguments to
:meth:`~Variable.forward` you can return more gradients than there were
inputs, as long as they're all :any:`python:None`.
Below you can find code for a ``Linear`` function from :mod:`torch.nn`, with
additional comments::
@ -42,25 +45,22 @@ additional comments::
# Inherit from Function
class Linear(Function):
# Note that both forward and backward are @staticmethods
@staticmethod
# bias is an optional argument
def forward(ctx, input, weight, bias=None):
ctx.save_for_backward(input, weight, bias)
def forward(self, input, weight, bias=None):
self.save_for_backward(input, weight, bias)
output = input.mm(weight.t())
if bias is not None:
output += bias.unsqueeze(0).expand_as(output)
return output
# This function has only a single output, so it gets only one gradient
@staticmethod
def backward(ctx, grad_output):
def backward(self, grad_output):
# This is a pattern that is very convenient - at the top of backward
# unpack saved_tensors and initialize all gradients w.r.t. inputs to
# None. Thanks to the fact that additional trailing Nones are
# ignored, the return statement is simple even when the function has
# optional inputs.
input, weight, bias = ctx.saved_variables
input, weight, bias = self.saved_tensors
grad_input = grad_weight = grad_bias = None
# These needs_input_grad checks are optional and there only to
@ -76,39 +76,27 @@ additional comments::
return grad_input, grad_weight, grad_bias
Now, to make it easier to use these custom ops, we recommend aliasing their
``apply`` method::
Now, to make it easier to use these custom ops, we recommend wrapping them in
small helper functions::
linear = Linear.apply
Here, we give an additional example of a function that is parametrized by
non-Variable arguments::
class MulConstant(Function):
    @staticmethod
    def forward(ctx, tensor, constant):
        # ctx is a context object that can be used to stash information
        # for backward computation
        ctx.constant = constant
        return tensor * constant

    @staticmethod
    def backward(ctx, grad_output):
        # We return as many input gradients as there were arguments.
        # Gradients of non-Tensor arguments to forward must be None.
        return grad_output * ctx.constant, None
def linear(input, weight, bias=None):
    # First braces create a Function object. Any arguments given here
    # will be passed to __init__. Second braces will invoke the __call__
    # operator, that will then use forward() to compute the result and
    # return it.
    return Linear()(input, weight, bias)
You probably want to check if the backward method you implemented actually
computes the derivatives of your function. It is possible by comparing with
numerical approximations using small finite differences::
from torch.autograd import gradcheck
# gradcheck takes a tuple of tensors as input, checks if your gradient
# evaluated with these tensors is close enough to numerical
# approximations and returns True if they all verify this condition.
input = (Variable(torch.randn(20,20).double(), requires_grad=True), Variable(torch.randn(30,20).double(), requires_grad=True),)
test = gradcheck(Linear.apply, input, eps=1e-6, atol=1e-4)
input = (Variable(torch.randn(20,20).double(), requires_grad=True),)
test = gradcheck(Linear(), input, eps=1e-6, atol=1e-4)
print(test)
Extending :mod:`torch.nn`


@ -114,21 +114,3 @@ Algorithms
:members:
.. autoclass:: SGD
:members:
How to adjust Learning Rate
---------------------------
:mod:`torch.optim.lr_scheduler` provides several methods to adjust the learning
rate based on the number of epochs. :class:`torch.optim.lr_scheduler.ReduceLROnPlateau`
allows dynamic learning rate reducing based on some validation measurements.
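For example, a brief sketch of plateau-based scheduling (``model`` and ``validate`` are placeholders for your own module and evaluation loop)::

    import torch.optim as optim
    from torch.optim import lr_scheduler

    optimizer = optim.SGD(model.parameters(), lr=0.1)
    scheduler = lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', patience=10)

    for epoch in range(100):
        # ... train for one epoch ...
        val_loss = validate()
        scheduler.step(val_loss)   # reduce the lr when the validation loss plateaus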
.. autoclass:: torch.optim.lr_scheduler.LambdaLR
:members:
.. autoclass:: torch.optim.lr_scheduler.StepLR
:members:
.. autoclass:: torch.optim.lr_scheduler.MultiStepLR
:members:
.. autoclass:: torch.optim.lr_scheduler.ExponentialLR
:members:
.. autoclass:: torch.optim.lr_scheduler.ReduceLROnPlateau
:members:


@ -1,7 +1,7 @@
.. currentmodule:: torch.sparse
torch.sparse
============
Sparse tensors
==============
.. warning::
@ -12,13 +12,16 @@ efficiently store and process tensors for which the majority of elements
are zeros.
A sparse tensor is represented as a pair of dense tensors: a tensor
of values and a tensor of indices. A sparse tensor can be constructed
which contains the actual values :class:`torch.sparse.values`, and a
tensor which contains the coordinates of those values
:class:`torch.sparse.indices`. A sparse tensor can be constructed
by providing these two tensors, as well as the size of the sparse tensor
(which cannot be inferred from these tensors!)
>>> i = torch.LongTensor([[0, 1], [2, 0]])
>>> v = torch.FloatTensor([3, 4])
>>> torch.sparse.FloatTensor(i, v, torch.Size([2,3])).to_dense()
0 0 3
4 0 0
[torch.FloatTensor of size 2x3]
@ -29,6 +32,7 @@ dimensions are sparse, and the rest of the dimensions are dense.
>>> i = torch.LongTensor([[2, 4]])
>>> v = torch.FloatTensor([[1, 3], [5, 7]])
>>> torch.sparse.FloatTensor(i, v).to_dense()
0 0
0 0
1 3
@ -44,71 +48,42 @@ An empty sparse tensor can be constructed by specifying its size:
and values:
[torch.FloatTensor with no dimension]
.. note::
Our sparse tensor format permits *uncoalesced* sparse tensors, where
there may be duplicate coordinates in the indices; in this case,
the interpretation is that the value at that index is the sum of all
duplicate value entries. Uncoalesced tensors permit us to implement
certain operators more efficiently.
For the most part, you shouldn't have to care whether or not a
sparse tensor is coalesced or not, as most operations will work
identically given a coalesced or uncoalesced sparse tensor.
However, there are two cases in which you may need to care.
First, if you repeatedly perform an operation that can produce
duplicate entries (e.g., :func:`torch.sparse.FloatTensor.add`), you
should occasionally coalesce your sparse tensors to prevent
them from growing too large.
Second, some operators will produce different values depending on
whether or not they are coalesced or not (e.g.,
:func:`torch.sparse.FloatTensor._values` and
:func:`torch.sparse.FloatTensor._indices`, as well as
:func:`torch.Tensor._sparse_mask`). These operators are
prefixed by an underscore to indicate that they reveal internal
implementation details and should be used with care, since code
that works with coalesced sparse tensors may not work with
uncoalesced sparse tensors; generally speaking, it is safest
to explicitly coalesce before working with these operators.
For example, suppose that we wanted to implement an operator
by operating directly on :func:`torch.sparse.FloatTensor._values`.
Multiplication by a scalar can be implemented in the obvious way,
as multiplication distributes over addition; however, square root
cannot be implemented directly, since ``sqrt(a + b) != sqrt(a) +
sqrt(b)`` (which is what would be computed if you were given an
uncoalesced tensor.)
Sparse tensors can have duplicate entries for an index; such a tensor is
called non-coalesced. Duplicate entries are summed together when
coalescing (or converting to another representation). Some operations
(for example, :func:`torch.FloatTensor.add`) produce duplicate entries;
if you repeatedly perform these operations, you should coalesce your
sparse tensors to prevent them from growing too large.
.. class:: FloatTensor()
.. method:: add
.. method:: add_
.. method:: clone
.. method:: dim
.. method:: div
.. method:: div_
.. method:: get_device
.. method:: hspmm
.. method:: mm
.. method:: mul
.. method:: mul_
.. method:: resizeAs_
.. method:: size
.. method:: spadd
.. method:: spmm
.. method:: sspaddmm
.. method:: sspmm
.. method:: sub
.. method:: sub_
.. method:: t_
.. method:: toDense
.. method:: transpose
.. method:: transpose_
.. method:: zero_
.. method:: coalesce
.. method:: is_coalesced
.. method:: _indices
.. method:: _values
.. method:: _nnz
.. automethod:: add
.. automethod:: add_
.. automethod:: clone
.. automethod:: contiguous
.. automethod:: dim
.. automethod:: div
.. automethod:: div_
.. automethod:: get_device
.. automethod:: hspmm
.. automethod:: indices
.. automethod:: is_contiguous
.. automethod:: mm
.. automethod:: mul
.. automethod:: mul_
.. automethod:: nnz
.. automethod:: resizeAs_
.. automethod:: size
.. automethod:: spadd
.. automethod:: sparse_mask
.. automethod:: spmm
.. automethod:: sspaddmm
.. automethod:: sspmm
.. automethod:: sub
.. automethod:: sub_
.. automethod:: t_
.. automethod:: toDense
.. automethod:: transpose
.. automethod:: transpose_
.. automethod:: values
.. automethod:: zero_


@ -13,7 +13,7 @@ Data type CPU tensor GPU tensor
======================== =========================== ================================
32-bit floating point :class:`torch.FloatTensor` :class:`torch.cuda.FloatTensor`
64-bit floating point :class:`torch.DoubleTensor` :class:`torch.cuda.DoubleTensor`
16-bit floating point :class:`torch.HalfTensor` :class:`torch.cuda.HalfTensor`
16-bit floating point N/A :class:`torch.cuda.HalfTensor`
8-bit integer (unsigned) :class:`torch.ByteTensor` :class:`torch.cuda.ByteTensor`
8-bit integer (signed) :class:`torch.CharTensor` :class:`torch.cuda.CharTensor`
16-bit integer (signed) :class:`torch.ShortTensor` :class:`torch.cuda.ShortTensor`
@ -196,10 +196,9 @@ view of a storage and defines numeric operations on it.
.. automethod:: lt
.. automethod:: lt_
.. automethod:: map_
.. automethod:: masked_scatter_
.. automethod:: masked_copy_
.. automethod:: masked_fill_
.. automethod:: masked_select
.. automethod:: matmul
.. automethod:: max
.. automethod:: mean
.. automethod:: median


@ -170,7 +170,6 @@ BLAS and LAPACK Operations
.. autofunction:: ger
.. autofunction:: gesv
.. autofunction:: inverse
.. autofunction:: matmul
.. autofunction:: mm
.. autofunction:: mv
.. autofunction:: orgqr


@ -1,78 +1,129 @@
torchvision.datasets
====================
All datasets are subclasses of :class:`torch.utils.data.Dataset`
i.e., they have ``__getitem__`` and ``__len__`` methods implemented.
Hence, they can all be passed to a :class:`torch.utils.data.DataLoader`
which can load multiple samples in parallel using ``torch.multiprocessing`` workers.
For example: ::
imagenet_data = torchvision.datasets.ImageFolder('path/to/imagenet_root/')
data_loader = torch.utils.data.DataLoader(imagenet_data,
                                           batch_size=4,
                                           shuffle=True,
                                           num_workers=args.nThreads)
The following dataset loaders are available:
The following datasets are available:
- `MNIST`_
- `COCO (Captioning and Detection)`_
- `LSUN Classification`_
- `ImageFolder`_
- `Imagenet-12`_
- `CIFAR10 and CIFAR100`_
- `STL10`_
.. contents:: Datasets
:local:
Datasets have the API:
All the datasets have almost similar API. They all have two common arguments:
``transform`` and ``target_transform`` to transform the input and target respectively.
- ``__getitem__``
- ``__len__``
They all subclass from ``torch.utils.data.Dataset``
Hence, they can all be multi-threaded (python multiprocessing) using
standard torch.utils.data.DataLoader.
For example:
.. currentmodule:: torchvision.datasets
``torch.utils.data.DataLoader(coco_cap, batch_size=args.batchSize, shuffle=True, num_workers=args.nThreads)``
In the constructor, each dataset has a slightly different API as needed,
but they all take the keyword args:
- ``transform`` - a function that takes in an image and returns a
transformed version
- common stuff like ``ToTensor``, ``RandomCrop``, etc. These can be
composed together with ``transforms.Compose`` (see transforms section
below)
- ``target_transform`` - a function that takes in the target and
transforms it. For example, take in the caption string and return a
tensor of word indices.
MNIST
~~~~~
.. autoclass:: MNIST
``dset.MNIST(root, train=True, transform=None, target_transform=None, download=False)``
- ``root`` : root directory of dataset where ``processed/training.pt`` and ``processed/test.pt`` exist.
- ``train`` : ``True`` = Training set, ``False`` = Test set
- ``download`` : ``True`` = downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, place the processed dataset (the processing function is available in ``mnist.py``) in the ``processed`` folder.
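
A minimal usage sketch; the ``./data`` path and batch size below are placeholders:

.. code:: python

    import torch
    import torchvision.datasets as dset
    import torchvision.transforms as transforms

    # Downloads MNIST into ./data on first use, then reads the processed files.
    mnist = dset.MNIST(root='./data', train=True, download=True,
                       transform=transforms.ToTensor())
    loader = torch.utils.data.DataLoader(mnist, batch_size=64, shuffle=True)

    images, labels = next(iter(loader))
    print(images.size())   # torch.Size([64, 1, 28, 28])
    print(labels.size())   # torch.Size([64])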
COCO
~~~~
.. note ::
These require the `COCO API to be installed`_
This requires the `COCO API to be installed`_
.. _COCO API to be installed: https://github.com/pdollar/coco/tree/master/PythonAPI
Captions
^^^^^^^^
.. autoclass:: CocoCaptions
:members: __getitem__
:special-members:
Detection
Captions:
^^^^^^^^^
.. autoclass:: CocoDetection
:members: __getitem__
:special-members:
``dset.CocoCaptions(root="dir where images are", annFile="json annotation file", [transform, target_transform])``
Example:
.. code:: python
import torchvision.datasets as dset
import torchvision.transforms as transforms
cap = dset.CocoCaptions(root = 'dir where images are',
annFile = 'json annotation file',
transform=transforms.ToTensor())
print('Number of samples: ', len(cap))
img, target = cap[3] # load 4th sample
print("Image Size: ", img.size())
print(target)
Output:
::
Number of samples: 82783
Image Size: (3L, 427L, 640L)
[u'A plane emitting smoke stream flying over a mountain.',
u'A plane darts across a bright blue sky behind a mountain covered in snow',
u'A plane leaves a contrail above the snowy mountain top.',
u'A mountain that has a plane flying overheard in the distance.',
u'A mountain view with a plume of smoke in the background']
Detection:
^^^^^^^^^^
``dset.CocoDetection(root="dir where images are", annFile="json annotation file", [transform, target_transform])``
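
Usage mirrors ``CocoCaptions`` above; here ``target`` is the list of annotation dictionaries for the image. The paths are placeholders, and the COCO API must be installed:

.. code:: python

    import torchvision.datasets as dset
    import torchvision.transforms as transforms

    det = dset.CocoDetection(root='dir where images are',
                             annFile='json annotation file',
                             transform=transforms.ToTensor())

    img, target = det[0]
    print(img.size())    # e.g. (3, H, W)
    print(len(target))   # number of annotated objects in this image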
LSUN
~~~~
.. autoclass:: LSUN
:members: __getitem__
:special-members:
``dset.LSUN(db_path, classes='train', [transform, target_transform])``
- ``db_path`` : root directory for the database files
- ``classes`` = ``train`` (all categories, training set), ``val`` (all categories, validation set), ``test`` (all categories, test set)
- [``bedroom_train``, ``church_train``, …] : a list of categories to load
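
A short sketch, assuming the LSUN LMDB files have already been downloaded into ``./lsun`` (a placeholder path):

.. code:: python

    import torchvision.datasets as dset
    import torchvision.transforms as transforms

    # Load only the bedroom training category; pass classes='train' for all categories.
    lsun = dset.LSUN(db_path='./lsun', classes=['bedroom_train'],
                     transform=transforms.ToTensor())

    img, label = lsun[0]   # label is the index of the category in `classes`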
ImageFolder
~~~~~~~~~~~
.. autoclass:: ImageFolder
:members: __getitem__
:special-members:
A generic data loader where the images are arranged in this way:
::
root/dog/xxx.png
root/dog/xxy.png
root/dog/xxz.png
root/cat/123.png
root/cat/nsdf3.png
root/cat/asd932_.png
``dset.ImageFolder(root="root folder path", [transform, target_transform])``
It has the members:
- ``self.classes`` - The class names as a list
- ``self.class_to_idx`` - Corresponding class indices
- ``self.imgs`` - The list of (image path, class-index) tuples
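
A sketch of how these members are typically used with the directory layout above (``path/to/root`` is a placeholder):

.. code:: python

    import torchvision.datasets as dset
    import torchvision.transforms as transforms

    folder = dset.ImageFolder(root='path/to/root', transform=transforms.ToTensor())

    print(folder.classes)        # e.g. ['cat', 'dog']
    print(folder.class_to_idx)   # e.g. {'cat': 0, 'dog': 1}
    print(folder.imgs[0])        # e.g. ('path/to/root/cat/123.png', 0)
    img, label = folder[0]       # transformed image tensor and its class index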
Imagenet-12
~~~~~~~~~~~
This should simply be implemented with an ``ImageFolder`` dataset.
This is simply implemented with an ImageFolder dataset.
The data is preprocessed `as described
here <https://github.com/facebook/fb.resnet.torch/blob/master/INSTALL.md#download-the-imagenet-dataset>`__
@ -82,31 +133,30 @@ example <https://github.com/pytorch/examples/blob/27e2a46c1d1505324032b1d94fc6ce
CIFAR
~~~~~
.. autoclass:: CIFAR10
:members: __getitem__
:special-members:
``dset.CIFAR10(root, train=True, transform=None, target_transform=None, download=False)``
``dset.CIFAR100(root, train=True, transform=None, target_transform=None, download=False)``
- ``root`` : root directory of the dataset where the ``cifar-10-batches-py`` folder exists
- ``train`` : ``True`` = Training set, ``False`` = Test set
- ``download`` : ``True`` = downloads the dataset from the internet and
puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
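
A short sketch for CIFAR-10 (``./data`` is a placeholder; CIFAR-100 works the same way with ``dset.CIFAR100``):

.. code:: python

    import torchvision.datasets as dset
    import torchvision.transforms as transforms

    cifar10 = dset.CIFAR10(root='./data', train=True, download=True,
                           transform=transforms.ToTensor())

    print(len(cifar10))       # 50000 training images
    img, label = cifar10[0]   # 3x32x32 tensor and an integer label in [0, 9]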
STL10
~~~~~
``dset.STL10(root, split='train', transform=None, target_transform=None, download=False)``
.. autoclass:: STL10
:members: __getitem__
:special-members:
SVHN
~~~~~
.. autoclass:: SVHN
:members: __getitem__
:special-members:
PhotoTour
~~~~~~~~~
.. autoclass:: PhotoTour
:members: __getitem__
:special-members:
- ``root`` : root directory of the dataset where the ``stl10_binary`` folder exists
- ``split`` : ``'train'`` = Training set, ``'test'`` = Test set, ``'unlabeled'`` = Unlabeled set, ``'train+unlabeled'`` = Training + Unlabeled set (missing labels are marked as ``-1``)
- ``download`` : ``True`` = downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
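
A short sketch (``./data`` is a placeholder path); unlabeled samples carry the label ``-1`` as noted above:

.. code:: python

    import torchvision.datasets as dset
    import torchvision.transforms as transforms

    stl10 = dset.STL10(root='./data', split='train+unlabeled', download=True,
                       transform=transforms.ToTensor())

    img, label = stl10[0]   # 3x96x96 tensor; unlabeled samples have label == -1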
.. _MNIST: #mnist
.. _COCO (Captioning and Detection): #coco
.. _LSUN Classification: #lsun
.. _ImageFolder: #imagefolder
.. _Imagenet-12: #imagenet-12
.. _CIFAR10 and CIFAR100: #cifar
.. _STL10: #stl10
.. _COCO API to be installed: https://github.com/pdollar/coco/tree/master/PythonAPI

View File

@ -1,12 +1,11 @@
torchvision.models
===================
.. currentmodule:: torchvision.models
.. automodule:: torchvision.models
:members: alexnet, resnet18, resnet34, resnet50, resnet101, resnet152,
vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19,
vgg19_bn, inception_v3, squeezenet1_0, squeezenet1_1, densenet121,
densenet169, densenet201, densenet161
vgg19_bn
:undoc-members:

View File

@ -3,8 +3,6 @@ torchvision.transforms
.. currentmodule:: torchvision.transforms
Transforms are common image transforms. They can be chained together using :class:`Compose`
.. autoclass:: Compose
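
A minimal sketch of chaining transforms with ``Compose``; the crop size and normalization statistics below are placeholder values:

.. code:: python

    import torchvision.transforms as transforms

    # Each transform is applied to the output of the previous one.
    preprocess = transforms.Compose([
        transforms.RandomCrop(24),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
    ])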
Transforms on PIL.Image
@ -26,20 +24,14 @@ Transforms on torch.\*Tensor
----------------------------
.. autoclass:: Normalize
:members: __call__
:special-members:
Conversion Transforms
---------------------
.. autoclass:: ToTensor
:members: __call__
:special-members:
.. autoclass:: ToPILImage
:members: __call__
:special-members:
Generic Transforms
------------------

View File

@ -15,24 +15,12 @@ import os
from tools.setup_helpers.env import check_env_flag
from tools.setup_helpers.cuda import WITH_CUDA, CUDA_HOME
from tools.setup_helpers.cudnn import WITH_CUDNN, CUDNN_LIB_DIR, CUDNN_INCLUDE_DIR
from tools.setup_helpers.split_types import split_types
DEBUG = check_env_flag('DEBUG')
WITH_DISTRIBUTED = not check_env_flag('NO_DISTRIBUTED')
WITH_DISTRIBUTED = check_env_flag('WITH_DISTRIBUTED')
WITH_DISTRIBUTED_MW = WITH_DISTRIBUTED and check_env_flag('WITH_DISTRIBUTED_MW')
WITH_NCCL = WITH_CUDA and platform.system() != 'Darwin'
SYSTEM_NCCL = False
################################################################################
# Workaround setuptools -Wstrict-prototypes warnings
# I lifted this code from https://stackoverflow.com/a/29634231/23845
################################################################################
import distutils.sysconfig
cfg_vars = distutils.sysconfig.get_config_vars()
for key, value in cfg_vars.items():
if type(value) == str:
cfg_vars[key] = value.replace("-Wstrict-prototypes", "")
################################################################################
# Monkey-patch setuptools to compile in parallel
################################################################################
@ -156,10 +144,6 @@ class build_ext(setuptools.command.build_ext.build_ext):
print('-- Building NCCL library')
else:
print('-- Not using NCCL')
if WITH_DISTRIBUTED:
print('-- Building with distributed package ')
else:
print('-- Building without distributed package')
# cwrap depends on pyyaml, so we can't import it earlier
from tools.cwrap import cwrap
@ -171,14 +155,10 @@ class build_ext(setuptools.command.build_ext.build_ext):
from tools.cwrap.plugins.NullableArguments import NullableArguments
from tools.cwrap.plugins.CuDNNPlugin import CuDNNPlugin
from tools.cwrap.plugins.WrapDim import WrapDim
from tools.cwrap.plugins.AssertNDim import AssertNDim
from tools.cwrap.plugins.Broadcast import Broadcast
from tools.cwrap.plugins.ProcessorSpecificPlugin import ProcessorSpecificPlugin
thp_plugin = THPPlugin()
cwrap('torch/csrc/generic/TensorMethods.cwrap', plugins=[
ProcessorSpecificPlugin(), BoolOption(), thp_plugin,
AutoGPU(condition='IS_CUDA'), ArgcountSortPlugin(), KwargsPlugin(),
AssertNDim(), WrapDim(), Broadcast()
BoolOption(), thp_plugin, AutoGPU(condition='IS_CUDA'),
ArgcountSortPlugin(), KwargsPlugin(), WrapDim()
])
cwrap('torch/csrc/cudnn/cuDNN.cwrap', plugins=[
CuDNNPlugin(), NullableArguments()
@ -225,10 +205,11 @@ class clean(distutils.command.clean.clean):
include_dirs = []
library_dirs = []
extra_link_args = []
extra_compile_args = ['-std=c++11', '-Wno-write-strings',
# Python 2.6 requires -fno-strict-aliasing, see
# http://legacy.python.org/dev/peps/pep-3123/
'-fno-strict-aliasing']
extra_compile_args = ['-std=c++11', '-Wno-write-strings']
if os.getenv('PYTORCH_BINARY_BUILD') and platform.system() == 'Linux':
print('PYTORCH_BINARY_BUILD found. Static linking libstdc++ on Linux')
extra_compile_args += ['-static-libstdc++']
extra_link_args += ['-static-libstdc++']
cwd = os.path.dirname(os.path.abspath(__file__))
lib_path = os.path.join(cwd, "torch", "lib")
@ -241,7 +222,6 @@ include_dirs += [
tmp_install_path + "/include/TH",
tmp_install_path + "/include/THPP",
tmp_install_path + "/include/THNN",
tmp_install_path + "/include/ATen",
]
library_dirs.append(lib_path)
@ -254,10 +234,7 @@ THCS_LIB = os.path.join(lib_path, 'libTHCS.so.1')
THNN_LIB = os.path.join(lib_path, 'libTHNN.so.1')
THCUNN_LIB = os.path.join(lib_path, 'libTHCUNN.so.1')
THPP_LIB = os.path.join(lib_path, 'libTHPP.so.1')
ATEN_LIB = os.path.join(lib_path, 'libATen.so.1')
GLOO_LIB = os.path.join(lib_path, 'libgloo.a')
GLOO_CUDA_LIB = os.path.join(lib_path, 'libgloo_cuda.a')
THD_LIB = os.path.join(lib_path, 'libTHD.a')
THD_LIB = os.path.join(lib_path, 'libTHD.so.1')
NCCL_LIB = os.path.join(lib_path, 'libnccl.so.1')
if platform.system() == 'Darwin':
TH_LIB = os.path.join(lib_path, 'libTH.1.dylib')
@ -267,26 +244,26 @@ if platform.system() == 'Darwin':
THNN_LIB = os.path.join(lib_path, 'libTHNN.1.dylib')
THCUNN_LIB = os.path.join(lib_path, 'libTHCUNN.1.dylib')
THPP_LIB = os.path.join(lib_path, 'libTHPP.1.dylib')
ATEN_LIB = os.path.join(lib_path, 'libATen.1.dylib')
THD_LIB = os.path.join(lib_path, 'libTHD.1.dylib')
NCCL_LIB = os.path.join(lib_path, 'libnccl.1.dylib')
if WITH_NCCL and subprocess.call('ldconfig -p | grep libnccl >/dev/null', shell=True) == 0:
SYSTEM_NCCL = True
SYSTEM_NCCL = True
main_compile_args = ['-D_THP_CORE']
main_libraries = ['shm']
main_link_args = [TH_LIB, THS_LIB, THPP_LIB, THNN_LIB, ATEN_LIB]
main_link_args = [TH_LIB, THS_LIB, THPP_LIB, THNN_LIB]
main_sources = [
"torch/csrc/PtrWrapper.cpp",
"torch/csrc/Module.cpp",
"torch/csrc/Generator.cpp",
"torch/csrc/Size.cpp",
"torch/csrc/Exceptions.cpp",
"torch/csrc/Tensor.cpp",
"torch/csrc/Storage.cpp",
"torch/csrc/DynamicTypes.cpp",
"torch/csrc/byte_order.cpp",
"torch/csrc/utils.cpp",
"torch/csrc/expand_utils.cpp",
"torch/csrc/utils/object_ptr.cpp",
"torch/csrc/utils/tuple_parser.cpp",
"torch/csrc/allocators.cpp",
@ -295,7 +272,7 @@ main_sources = [
"torch/csrc/autograd/engine.cpp",
"torch/csrc/autograd/function.cpp",
"torch/csrc/autograd/variable.cpp",
"torch/csrc/autograd/input_buffer.cpp",
"torch/csrc/autograd/grad_buffer.cpp",
"torch/csrc/autograd/python_function.cpp",
"torch/csrc/autograd/python_cpp_function.cpp",
"torch/csrc/autograd/python_variable.cpp",
@ -303,14 +280,9 @@ main_sources = [
"torch/csrc/autograd/python_hook.cpp",
"torch/csrc/autograd/functions/batch_normalization.cpp",
"torch/csrc/autograd/functions/convolution.cpp",
"torch/csrc/autograd/functions/basic_ops.cpp",
"torch/csrc/autograd/functions/tensor.cpp",
"torch/csrc/autograd/functions/accumulate_grad.cpp",
"torch/csrc/autograd/functions/utils.cpp",
"torch/csrc/autograd/functions/init.cpp",
"torch/csrc/nn/THNN_generic.cpp",
]
main_sources += split_types("torch/csrc/Tensor.cpp")
try:
import numpy as np
@ -331,11 +303,8 @@ if WITH_DISTRIBUTED:
"torch/csrc/distributed/Tensor.cpp",
"torch/csrc/distributed/Storage.cpp",
]
extra_compile_args += ['-DWITH_DISTRIBUTED_MW']
include_dirs += [tmp_install_path + "/include/THD"]
main_link_args += [THD_LIB]
if platform.system() == 'Linux':
main_link_args += [GLOO_LIB]
if WITH_CUDA:
cuda_lib_dirs = ['lib64', 'lib']
@ -350,20 +319,17 @@ if WITH_CUDA:
extra_link_args.append('-Wl,-rpath,' + cuda_lib_path)
extra_compile_args += ['-DWITH_CUDA']
extra_compile_args += ['-DCUDA_LIB_PATH=' + cuda_lib_path]
main_libraries += ['cudart', 'nvToolsExt']
main_libraries += ['cudart']
main_link_args += [THC_LIB, THCS_LIB, THCUNN_LIB]
if platform.system() == 'Linux':
main_link_args += [GLOO_CUDA_LIB]
main_sources += [
"torch/csrc/cuda/Module.cpp",
"torch/csrc/cuda/Storage.cpp",
"torch/csrc/cuda/Stream.cpp",
"torch/csrc/cuda/Tensor.cpp",
"torch/csrc/cuda/AutoGPU.cpp",
"torch/csrc/cuda/utils.cpp",
"torch/csrc/cuda/expand_utils.cpp",
"torch/csrc/cuda/serialization.cpp",
]
main_sources += split_types("torch/csrc/cuda/Tensor.cpp")
if WITH_NCCL:
if SYSTEM_NCCL:
@ -380,8 +346,6 @@ if WITH_CUDNN:
"torch/csrc/cudnn/BatchNorm.cpp",
"torch/csrc/cudnn/Conv.cpp",
"torch/csrc/cudnn/cuDNN.cpp",
"torch/csrc/cudnn/GridSampler.cpp",
"torch/csrc/cudnn/AffineGridGenerator.cpp",
"torch/csrc/cudnn/Types.cpp",
"torch/csrc/cudnn/Handles.cpp",
]
@ -391,18 +355,6 @@ if DEBUG:
extra_compile_args += ['-O0', '-g']
extra_link_args += ['-O0', '-g']
if os.getenv('PYTORCH_BINARY_BUILD') and platform.system() == 'Linux':
print('PYTORCH_BINARY_BUILD found. Static linking libstdc++ on Linux')
# get path of libstdc++ and link manually.
# for reasons unknown, -static-libstdc++ doesn't fully link some symbols
CXXNAME = os.getenv('CXX', 'g++')
STDCPP_LIB = subprocess.check_output([CXXNAME, '-print-file-name=libstdc++.a'])
STDCPP_LIB = STDCPP_LIB[:-1]
if type(STDCPP_LIB) != str: # python 3
STDCPP_LIB = STDCPP_LIB.decode(sys.stdout.encoding)
main_link_args += [STDCPP_LIB]
version_script = os.path.abspath("tools/pytorch.version")
extra_link_args += ['-Wl,--version-script=' + version_script]
def make_relative_rpath(path):
if platform.system() == 'Darwin':
@ -415,7 +367,7 @@ def make_relative_rpath(path):
################################################################################
extensions = []
packages = find_packages(exclude=('tools', 'tools.*',))
packages = find_packages(exclude=('tools.*',))
C = Extension("torch._C",
libraries=main_libraries,
@ -462,7 +414,7 @@ if WITH_CUDA:
)
extensions.append(THCUNN)
version = '0.2.0'
version = '0.1.12'
if os.getenv('PYTORCH_BUILD_VERSION'):
assert os.getenv('PYTORCH_BUILD_NUMBER') is not None
version = os.getenv('PYTORCH_BUILD_VERSION') \
@ -495,5 +447,5 @@ setup(name="torch", version=version,
'lib/*.h',
'lib/include/TH/*.h', 'lib/include/TH/generic/*.h',
'lib/include/THC/*.h', 'lib/include/THC/generic/*.h']},
install_requires=['pyyaml', 'numpy'],
install_requires=['pyyaml'],
)

View File

@ -15,28 +15,15 @@ from torch.autograd import Variable
torch.set_default_tensor_type('torch.DoubleTensor')
SEED = 0
SEED_SET = 0
def parse_set_seed_once():
global SEED
global SEED_SET
def run_tests():
parser = argparse.ArgumentParser(add_help=False)
parser.add_argument('--seed', type=int, default=123)
args, remaining = parser.parse_known_args()
if SEED_SET == 0:
torch.manual_seed(args.seed)
if torch.cuda.is_available():
torch.cuda.manual_seed_all(args.seed)
SEED = args.seed
SEED_SET = 1
torch.manual_seed(args.seed)
if torch.cuda.is_available():
torch.cuda.manual_seed_all(args.seed)
remaining = [sys.argv[0]] + remaining
return remaining
def run_tests():
remaining = parse_set_seed_once()
unittest.main(argv=remaining)
@ -90,7 +77,7 @@ def to_gpu(obj, type_map={}):
elif torch.is_storage(obj):
return obj.new().resize_(obj.size()).copy_(obj)
elif isinstance(obj, Variable):
assert obj.is_leaf
assert obj.creator is None
t = type_map.get(type(obj.data), get_gpu_type(type(obj.data)))
return Variable(obj.data.clone().type(t), requires_grad=obj.requires_grad)
elif isinstance(obj, list):
@ -131,11 +118,6 @@ def is_iterable(obj):
class TestCase(unittest.TestCase):
precision = 1e-5
def setUp(self):
torch.manual_seed(SEED)
if torch.cuda.is_available():
torch.cuda.manual_seed_all(SEED)
def assertTensorsSlowEqual(self, x, y, prec=None, message=''):
max_err = 0
self.assertEqual(x.size(), y.size())
@ -147,7 +129,7 @@ class TestCase(unittest.TestCase):
tc = t.coalesce()
value_map = {}
for idx, val in zip(t._indices().t(), t._values()):
for idx, val in zip(t.indices().t(), t.values()):
idx_tup = tuple(idx)
if idx_tup in value_map:
value_map[idx_tup] += val
@ -156,31 +138,26 @@ class TestCase(unittest.TestCase):
new_indices = sorted(list(value_map.keys()))
new_values = [value_map[idx] for idx in new_indices]
if t._values().ndimension() < 2:
new_values = t._values().new(new_values)
if t.values().ndimension() < 2:
new_values = t.values().new(new_values)
else:
new_values = torch.stack(new_values)
new_indices = t._indices().new(new_indices).t()
new_indices = t.indices().new(new_indices).t()
tg = t.new(new_indices, new_values, t.size())
self.assertEqual(tc._indices(), tg._indices())
self.assertEqual(tc._values(), tg._values())
self.assertEqual(tc.indices(), tg.indices())
self.assertEqual(tc.values(), tg.values())
return tg
def unwrapVariables(self, x, y):
if isinstance(x, Variable) and isinstance(y, Variable):
return x.data, y.data
elif isinstance(x, Variable) or isinstance(y, Variable):
raise AssertionError("cannot compare {} and {}".format(type(x), type(y)))
return x, y
def assertEqual(self, x, y, prec=None, message=''):
if prec is None:
prec = self.precision
x, y = self.unwrapVariables(x, y)
if isinstance(x, Variable) and isinstance(y, Variable):
x = x.data
y = y.data
if torch.is_tensor(x) and torch.is_tensor(y):
def assertTensorsEqual(a, b):
@ -201,16 +178,13 @@ class TestCase(unittest.TestCase):
if x.is_sparse:
x = self.safeCoalesce(x)
y = self.safeCoalesce(y)
assertTensorsEqual(x._indices(), y._indices())
assertTensorsEqual(x._values(), y._values())
assertTensorsEqual(x.indices(), y.indices())
assertTensorsEqual(x.values(), y.values())
else:
assertTensorsEqual(x, y)
elif type(x) == str and type(y) == str:
super(TestCase, self).assertEqual(x, y)
elif type(x) == set and type(y) == set:
super(TestCase, self).assertEqual(x, y)
elif is_iterable(x) and is_iterable(y):
super(TestCase, self).assertEqual(len(x), len(y))
for x_, y_ in zip(x, y):
self.assertEqual(x_, y_, prec, message)
else:
@ -225,7 +199,9 @@ class TestCase(unittest.TestCase):
if prec is None:
prec = self.precision
x, y = self.unwrapVariables(x, y)
if isinstance(x, Variable) and isinstance(y, Variable):
x = x.data
y = y.data
if torch.is_tensor(x) and torch.is_tensor(y):
if x.size() != y.size():
@ -259,33 +235,24 @@ class TestCase(unittest.TestCase):
return
raise AssertionError("object not found in iterable")
if sys.version_info < (3, 2):
# assertRaisesRegexp renamed assertRaisesRegex in 3.2
assertRaisesRegex = unittest.TestCase.assertRaisesRegexp
def download_file(url, binary=True):
def download_file(url, path, binary=True):
if sys.version_info < (3,):
from urlparse import urlsplit
import urllib2
request = urllib2
error = urllib2
else:
from urllib.parse import urlsplit
from urllib import request, error
filename = os.path.basename(urlsplit(url)[2])
data_dir = os.path.join(os.path.dirname(__file__), 'data')
path = os.path.join(data_dir, filename)
import urllib.request
import urllib.error
request = urllib.request
error = urllib.error
if os.path.exists(path):
return path
return True
try:
data = request.urlopen(url, timeout=15).read()
with open(path, 'wb' if binary else 'w') as f:
f.write(data)
return path
except error.URLError:
msg = "could not download test file '{}'".format(url)
warnings.warn(msg, RuntimeWarning)
raise unittest.SkipTest(msg)
return True
except error.URLError as e:
return False

View File

@ -53,31 +53,29 @@ module_tests = [
dict(
module_name='ReLU',
input_size=(2, 3, 4, 5),
check_inplace=True,
check_inplace=True
),
dict(
module_name='ReLU6',
input_size=(2, 3, 4, 5),
check_inplace=True,
check_inplace=True
),
dict(
module_name='RReLU',
input_size=(1, 2, 2),
test_cuda=False,
check_gradgrad=False,
test_cuda=False
),
dict(
module_name='RReLU',
constructor_args=(0.1, 0.9),
input_size=(4, 4, 5),
desc='with_up_down',
test_cuda=False,
check_gradgrad=False,
test_cuda=False
),
dict(
module_name='Hardtanh',
input_size=(3, 2, 5),
reference_fn=lambda i, _: i.clamp(-1, 1),
reference_fn=lambda i, _: i.clamp(-1, 1)
),
dict(
module_name='Sigmoid',
@ -90,35 +88,35 @@ module_tests = [
dict(
module_name='Softmax',
input_size=(10, 20),
reference_fn=lambda i, _: torch.exp(i).div(torch.exp(i).sum(1, True).expand(10, 20)),
reference_fn=lambda i, _: torch.exp(i).div(torch.exp(i).sum(1).expand(10, 20))
),
dict(
module_name='Softmax2d',
input_size=(1, 3, 10, 20),
reference_fn=lambda i, _: torch.exp(i).div(torch.exp(i).sum(1, False)),
reference_fn=lambda i, _: torch.exp(i).div(torch.exp(i).sum(1).expand_as(i))
),
dict(
module_name='LogSoftmax',
input_size=(10, 20),
reference_fn=lambda i, _: torch.exp(i).div_(torch.exp(i).sum(1, True).expand(10, 20)).log_(),
reference_fn=lambda i, _: torch.exp(i).div_(torch.exp(i).sum(1).expand(10, 20)).log_()
),
dict(
module_name='LogSoftmax',
input_size=(1, 3, 10, 20),
reference_fn=lambda i, _: torch.exp(i).div_(torch.exp(i).sum(1, False)).log_(),
desc='multiparam',
reference_fn=lambda i, _: torch.exp(i).div_(torch.exp(i).sum(1).expand_as(i)).log_(),
desc='multiparam'
),
dict(
module_name='ELU',
constructor_args=(2.,),
input_size=(3, 2, 5),
check_inplace=True
),
# TODO: reference function
dict(
module_name='Hardshrink',
constructor_args=(2.,),
input_size=(4, 3, 2, 4),
check_gradgrad=False,
input_size=(4, 3, 2, 4)
),
dict(
module_name='LeakyReLU',
@ -135,40 +133,34 @@ module_tests = [
dict(
module_name='LogSigmoid',
input_size=(2, 3, 4),
reference_fn=lambda i, _: i.sigmoid().log(),
check_gradgrad=False,
reference_fn=lambda i, _: i.sigmoid().log()
),
dict(
module_name='Softplus',
input_size=(10, 20),
reference_fn=lambda i, _: torch.log(1 + torch.exp(i)),
check_gradgrad=False,
reference_fn=lambda i, _: torch.log(1 + torch.exp(i))
),
dict(
module_name='Softplus',
constructor_args=(2,),
input_size=(10, 20),
reference_fn=lambda i, _: 1. / 2. * torch.log(1 + torch.exp(2 * i)),
desc='beta',
check_gradgrad=False,
desc='beta'
),
dict(
module_name='Softshrink',
input_size=(3, 2, 5),
check_gradgrad=False,
input_size=(3, 2, 5)
),
dict(
module_name='Softshrink',
constructor_args=(1,),
input_size=(3, 2, 5),
desc='lambda',
check_gradgrad=False,
desc='lambda'
),
dict(
module_name='CrossMapLRN2d',
constructor_args=(5, 5e-3, 1e-3, 2),
input_size=(2, 3, 6, 6),
check_gradgrad=False,
input_size=(2, 3, 6, 6)
),
dict(
module_name='PReLU',
@ -212,12 +204,11 @@ module_tests = [
dict(
module_name='Softsign',
input_size=(3, 2, 5),
reference_fn=lambda i, _: i.div(1 + torch.abs(i)),
reference_fn=lambda i, _: i.div(1 + torch.abs(i))
),
dict(
module_name='Softmin',
input_size=(10, 20),
check_gradgrad=False,
input_size=(10, 20)
),
dict(
module_name='Tanhshrink',
@ -225,32 +216,19 @@ module_tests = [
),
]
criterion_tests = [
dict(module_name='L1Loss',
input_size=(2, 3, 4),
target=torch.randn(2, 3, 4),
reference_fn=lambda i, t, _: 1. / i.numel() *
sum((a - b).abs().sum() for a, b in zip(i, t)),
sum((a - b).abs().sum() for a, b in zip(i, t))
),
dict(
module_name='NLLLoss',
input=torch.rand(15, 10).log(),
target=torch.Tensor(15).uniform_().mul(10).floor().long(),
),
dict(
module_name='NLLLoss',
constructor_args=(None, False),
input=torch.rand(15, 10).log(),
target=torch.Tensor(15).uniform_().mul(10).floor().long(),
desc='no_size_average'
),
dict(
module_name='NLLLoss',
constructor_args=(None, True, 2),
input=torch.rand(15, 10).log(),
target=torch.Tensor(15).uniform_().mul(10).floor().long(),
desc='ignore_index'
),
dict(
module_name='NLLLoss',
constructor_args=(torch.rand(10),),
@ -258,159 +236,120 @@ criterion_tests = [
target=torch.Tensor(15).uniform_().mul(10).floor().long(),
desc='weights',
),
dict(
module_name='NLLLoss',
constructor_args=(torch.rand(10), True, 2),
input=torch.rand(15, 10).add(1e-2).log(),
target=torch.Tensor(15).uniform_().mul(10).floor().long(),
desc='weights_ignore_index'
),
dict(
module_name='NLLLoss',
constructor_args=(torch.rand(10), True, -1),
input=torch.rand(15, 10).add(1e-2).log(),
target=torch.Tensor(15).uniform_().mul(10 + 1).floor().long() - 1,
desc='weights_ignore_index_neg'
),
dict(
module_name='KLDivLoss',
input=torch.rand(10, 10).log(),
target=torch.rand(10, 10),
check_gradgrad=False,
target=torch.rand(10, 10)
),
dict(
module_name='MSELoss',
input=torch.randn(2, 3, 4, 5),
target=torch.randn(2, 3, 4, 5),
reference_fn=lambda i, t, _: (i - t).abs().pow(2).sum() / i.numel(),
check_gradgrad=False,
reference_fn=lambda i, t, _: (i - t).abs().pow(2).sum() / i.numel()
),
dict(
module_name='BCELoss',
input=torch.rand(15, 10).clamp_(1e-2, 1 - 1e-2),
target=torch.randn(15, 10).gt(0).double(),
check_gradgrad=False,
target=torch.randn(15, 10).gt(0).double()
),
dict(
module_name='BCELoss',
constructor_args=(torch.rand(10),),
input=torch.rand(15, 10).clamp_(1e-2, 1 - 1e-2),
target=torch.randn(15, 10).gt(0).double(),
desc='weights',
check_gradgrad=False,
desc='weights'
),
dict(
module_name='CrossEntropyLoss',
input=torch.randn(15, 10),
target=torch.Tensor(15).uniform_().mul(10).floor().long(),
check_gradgrad=False,
target=torch.Tensor(15).uniform_().mul(10).floor().long()
),
dict(
module_name='CrossEntropyLoss',
constructor_args=(torch.rand(10),),
input=torch.randn(15, 10),
target=torch.Tensor(15).uniform_().mul(10).floor().long(),
desc='weights',
check_gradgrad=False,
desc='weights'
),
dict(
module_name='NLLLoss2d',
input_size=(2, 3, 5, 5),
target=torch.rand(2, 5, 5).mul(3).floor().long(),
target=torch.rand(2, 5, 5).mul(3).floor().long()
),
dict(
module_name='NLLLoss2d',
constructor_args=(torch.rand(3),),
input_size=(2, 3, 5, 5),
target=torch.rand(2, 5, 5).mul(3).floor().long(),
desc='weights',
),
dict(
module_name='NLLLoss2d',
constructor_args=(None, True, 3),
input_size=(2, 3, 5, 5),
target=torch.rand(2, 5, 5).mul(4).floor().long(),
desc='ignore_index',
desc='weights'
),
dict(
module_name='HingeEmbeddingLoss',
input=torch.rand(10),
target=torch.randn(10).gt(0).double().mul_(2).sub(1),
check_gradgrad=False,
target=torch.randn(10).gt(0).double().mul_(2).sub(1)
),
dict(
module_name='HingeEmbeddingLoss',
constructor_args=(0.5,),
input=torch.rand(10),
target=torch.randn(10).gt(0).double().mul_(2).sub(1),
desc='margin',
check_gradgrad=False,
desc='margin'
),
dict(
module_name='MultiLabelMarginLoss',
input_size=(5, 10),
target=torch.rand(5, 10).mul(10).floor().long(),
check_gradgrad=False,
target=torch.rand(5, 10).mul(10).floor().long()
),
dict(
module_name='MultiLabelSoftMarginLoss',
input_size=(5, 10),
target=torch.rand(5, 10).mul(2).floor(),
check_gradgrad=False,
target=torch.rand(5, 10).mul(2).floor()
),
dict(
module_name='MultiLabelSoftMarginLoss',
constructor_args=(torch.rand(10),),
input_size=(5, 10),
target=torch.rand(5, 10).mul(2).floor(),
desc='weights',
check_gradgrad=False,
desc='weights'
),
dict(
module_name='MultiMarginLoss',
input_size=(5, 10),
target=torch.rand(5).mul(8).floor().long(),
check_gradgrad=False,
target=torch.rand(5).mul(8).floor().long()
),
dict(
module_name='SmoothL1Loss',
input_size=(5, 10),
target=torch.randn(5, 10),
check_gradgrad=False,
target=torch.randn(5, 10)
),
dict(
module_name='SoftMarginLoss',
input_size=(5, 5),
target=torch.randn(5, 5).sign(),
check_gradgrad=False,
target=torch.randn(5, 5).sign()
),
dict(
module_name='CosineEmbeddingLoss',
input=(torch.rand(15, 10), torch.rand(15, 10)),
target=torch.randn(15).sign(),
check_gradgrad=False,
target=torch.randn(15).sign()
),
dict(
module_name='CosineEmbeddingLoss',
constructor_args=(0.7,),
input=(torch.rand(15, 10), torch.rand(15, 10)),
target=torch.randn(15).sign(),
desc='margin',
check_gradgrad=False,
desc='margin'
),
dict(
module_name='MarginRankingLoss',
input=(torch.randn(50).mul(10), torch.randn(50).mul(10)),
target=torch.randn(50).sign(),
check_gradgrad=False,
target=torch.randn(50).sign()
),
dict(
module_name='MarginRankingLoss',
constructor_args=(2,),
input=(torch.randn(50).mul(10), torch.randn(50).mul(10)),
target=torch.randn(50).sign(),
desc='margin',
check_gradgrad=False,
desc='margin'
),
]
@ -440,7 +379,6 @@ class NNTestCase(TestCase):
if isinstance(input, Variable):
if input.requires_grad and input.grad is not None:
input.grad.data.zero_()
input.grad.detach_()
elif torch.is_tensor(input):
return
else:
@ -501,6 +439,7 @@ class NNTestCase(TestCase):
return out
res = tuple()
# TODO: enable non-contig tests
input = contiguous(input)
if jacobian_input:
res += get_numerical_jacobian(fw, input, input, eps=1e-6),
@ -752,7 +691,6 @@ class CriterionTest(TestBase):
test_case.assertEqual(out, expected_out)
test_case.check_criterion_jacobian(module, input, self.target)
self._do_extra_tests(test_case, module, input, self.target)
def test_cuda(self, test_case):
if not TEST_CUDA or not self.should_test_cuda:
@ -779,6 +717,3 @@ class CriterionTest(TestBase):
test_case.assertEqual(cpu_gradInput, gpu_gradInput, 4e-4)
except NotImplementedError:
pass
def _do_extra_tests(self, test_case, module, input, target):
pass

View File

@ -55,37 +55,32 @@ $PYCMD test_cuda.py $@
echo "Running NCCL tests"
$PYCMD test_nccl.py $@
distributed_set_up() {
export TEMP_DIR="$(mktemp -d)"
rm -rf "$TEMP_DIR/"*
mkdir "$TEMP_DIR/barrier"
mkdir "$TEMP_DIR/test_dir"
}
################################################################################
if [[ "$TEST_DISTRIBUTED" -eq 1 ]]; then
distributed_set_up() {
export TEMP_DIR="$(mktemp -d)"
rm -rf "$TEMP_DIR/"*
mkdir "$TEMP_DIR/barrier"
mkdir "$TEMP_DIR/test_dir"
}
distributed_tear_down() {
rm -rf "$TEMP_DIR"
}
distributed_tear_down() {
rm -rf "$TEMP_DIR"
}
trap distributed_tear_down EXIT SIGHUP SIGINT SIGTERM
trap distributed_tear_down EXIT SIGHUP SIGINT SIGTERM
echo "Running distributed tests for the TCP backend"
distributed_set_up
BACKEND=tcp WORLD_SIZE=3 $PYCMD ./test_distributed.py
distributed_tear_down
echo "Running distributed tests for the TCP backend"
distributed_set_up
BACKEND=tcp WORLD_SIZE=3 $PYCMD ./test_distributed.py
distributed_tear_down
echo "Running distributed tests for the Gloo backend"
distributed_set_up
BACKEND=gloo WORLD_SIZE=3 $PYCMD ./test_distributed.py
distributed_tear_down
if [ -x "$(command -v mpiexec)" ]; then
echo "Running distributed tests for the MPI backend"
distributed_set_up
BACKEND=mpi mpiexec -n 3 $PYCMD ./test_distributed.py
distributed_tear_down
else
echo "Skipping MPI backend tests (MPI not found)"
echo "Running distributed tests for the MPI backend"
distributed_set_up
BACKEND=mpi mpiexec -n 3 $PYCMD ./test_distributed.py
distributed_tear_down
fi
################################################################################
if [[ $COVERAGE -eq 1 ]]; then
coverage combine

File diff suppressed because it is too large

View File

@ -94,7 +94,7 @@ def small_3d_positive(t):
def small_3d_unique(t):
return t(S, S, S).copy_(torch.arange(1, S * S * S + 1).view(S, S, S))
return t(S, S, S).copy_(torch.arange(1, S * S * S + 1))
def small_1d_lapack(t):
@ -113,10 +113,6 @@ def small_2d_lapack_fat(t):
return t(4, 3).copy_(torch.arange(1, 13).view(4, 3))
def large_2d_lapack(t):
return t(1000, 1000).normal_()
def new_t(*sizes):
def tmp(t):
return t(*sizes).copy_(torch.randn(*sizes))
@ -284,8 +280,6 @@ tests = [
('qr', small_2d_lapack, lambda t: [], 'square', float_types),
('qr', small_2d_lapack_skinny, lambda t: [], 'skinny', float_types),
('qr', small_2d_lapack_fat, lambda t: [], 'fat', float_types),
('qr', large_2d_lapack, lambda t: [], 'big', float_types),
('inverse', new_t(20, 20), lambda t: [], None, float_types),
]
@ -300,7 +294,6 @@ custom_precision = {
'baddbmm': 1e-4,
'rsqrt': 1e-4,
'cumprod': 1e-4,
'qr': 3e-4,
}
simple_pointwise = [
@ -602,17 +595,6 @@ class TestCuda(TestCase):
cuda_type = get_gpu_type(t)
self.assertEqual(cuda_type(seq), reference)
def test_torch_manual_seed_seeds_cuda_devices(self):
with freeze_rng_state():
x = torch.zeros(4, 4).float().cuda()
torch.manual_seed(2)
self.assertEqual(torch.cuda.initial_seed(), 2)
x.uniform_()
torch.manual_seed(2)
y = x.clone().uniform_()
self.assertEqual(x, y)
self.assertEqual(torch.cuda.initial_seed(), 2)
def test_manual_seed(self):
with freeze_rng_state():
x = torch.zeros(4, 4).float().cuda()
@ -841,60 +823,12 @@ class TestCuda(TestCase):
self.assertEqual(gpu_tensor1[0], 1)
self.assertEqual(gpu_tensor0[0], 2)
@staticmethod
def _select_broadcastable_dims(dims_full=None):
return TestTorch._select_broadcastable_dims(dims_full)
def test_broadcast(self):
TestTorch._test_broadcast(self, lambda t: t.cuda())
def test_broadcast_fallback(self):
TestTorch._test_broadcast_fallback(self, lambda t: t.cuda())
def test_broadcast_fused_matmul(self):
TestTorch._test_broadcast_fused_matmul(self, lambda t: t.cuda())
def test_broadcast_batched_matmul(self):
TestTorch._test_broadcast_batched_matmul(self, lambda t: t.cuda())
def test_advancedindex(self):
TestTorch._test_advancedindex(self, lambda t: t.cuda())
def test_advancedindex_big(self):
TestTorch._test_advancedindex_big(self, lambda t: t.cuda())
def test_btrifact(self):
TestTorch._test_btrifact(self, lambda t: t.cuda())
def test_btrisolve(self):
TestTorch._test_btrisolve(self, lambda t: t.cuda())
def test_tensor_gather(self):
TestTorch._test_gather(self, lambda t: t.cuda(), False)
def test_tensor_scatter(self):
TestTorch._test_scatter_base(self, lambda t: t.cuda(), 'scatter_', test_bounds=False)
def test_tensor_scatterAdd(self):
TestTorch._test_scatter_base(self, lambda t: t.cuda(), 'scatter_add_', test_bounds=False)
def test_tensor_scatterFill(self):
TestTorch._test_scatter_base(self, lambda t: t.cuda(), 'scatter_', True, test_bounds=False)
def test_arange(self):
for t in ['IntTensor', 'LongTensor', 'FloatTensor', 'DoubleTensor']:
a = torch.cuda.__dict__[t]()
torch.arange(0, 10, out=a)
b = torch.__dict__[t]()
torch.arange(0, 10, out=b)
self.assertEqual(a, b.cuda())
def test_nvtx(self):
# Just making sure we can see the symbols
torch.cuda.nvtx.range_push("foo")
torch.cuda.nvtx.mark("bar")
torch.cuda.nvtx.range_pop()
if HAS_CUDA:
for decl in tests:

View File

@ -3,7 +3,7 @@ import sys
import torch
import traceback
import unittest
from torch.utils.data import Dataset, TensorDataset, DataLoader, ConcatDataset
from torch.utils.data import Dataset, TensorDataset, DataLoader
from common import TestCase, run_tests, TEST_NUMPY
from common_nn import TEST_CUDA
@ -31,38 +31,6 @@ class TestTensorDataset(TestCase):
self.assertEqual(l[i], source[i][1])
class TestConcatDataset(TestCase):
def test_concat_two_singletons(self):
result = ConcatDataset([[0], [1]])
self.assertEqual(2, len(result))
self.assertEqual(0, result[0])
self.assertEqual(1, result[1])
def test_concat_two_non_singletons(self):
result = ConcatDataset([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
self.assertEqual(10, len(result))
self.assertEqual(0, result[0])
self.assertEqual(5, result[5])
def test_concat_two_non_singletons_with_empty(self):
# Adding an empty dataset somewhere is correctly handled
result = ConcatDataset([[0, 1, 2, 3, 4],
[],
[5, 6, 7, 8, 9]])
self.assertEqual(10, len(result))
self.assertEqual(0, result[0])
self.assertEqual(5, result[5])
def test_concat_raises_index_error(self):
result = ConcatDataset([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
with self.assertRaises(IndexError):
# this one goes to 11
result[11]
class ErrorDataset(Dataset):
def __init__(self, size):
@ -109,7 +77,7 @@ class TestDataLoader(TestCase):
errors = 0
while True:
try:
next(it)
it.next()
except NotImplementedError:
errors += 1
except StopIteration:
@ -123,14 +91,6 @@ class TestDataLoader(TestCase):
def test_sequential_batch(self):
self._test_sequential(DataLoader(self.dataset, batch_size=2))
def test_growing_dataset(self):
dataset = [torch.ones(4) for _ in range(4)]
dataloader_seq = DataLoader(dataset, shuffle=False)
dataloader_shuffle = DataLoader(dataset, shuffle=True)
dataset.append(torch.ones(4))
self.assertEqual(len(dataloader_seq), 5)
self.assertEqual(len(dataloader_shuffle), 5)
@unittest.skipIf(not TEST_CUDA, "CUDA unavailable")
def test_sequential_pin_memory(self):
loader = DataLoader(self.dataset, batch_size=2, pin_memory=True)
@ -156,29 +116,6 @@ class TestDataLoader(TestCase):
def test_shuffle_batch_workers(self):
self._test_shuffle(DataLoader(self.dataset, batch_size=2, shuffle=True, num_workers=4))
def _test_batch_sampler(self, **kwargs):
# [(0, 1), (2, 3, 4), (5, 6), (7, 8, 9), ...]
batches = []
for i in range(0, 100, 5):
batches.append(tuple(range(i, i + 2)))
batches.append(tuple(range(i + 2, i + 5)))
dl = DataLoader(self.dataset, batch_sampler=batches, **kwargs)
self.assertEqual(len(dl), 40)
for i, (input, _target) in enumerate(dl):
if i % 2 == 0:
offset = i * 5 // 2
self.assertEqual(len(input), 2)
self.assertEqual(input, self.data[offset:offset + 2])
else:
offset = i * 5 // 2
self.assertEqual(len(input), 3)
self.assertEqual(input, self.data[offset:offset + 3])
def test_batch_sampler(self):
self._test_batch_sampler()
self._test_batch_sampler(num_workers=4)
@unittest.skipIf(not TEST_CUDA, "CUDA unavailable")
def test_shuffle_pin_memory(self):
loader = DataLoader(self.dataset, batch_size=2, shuffle=True, num_workers=4, pin_memory=True)

View File

@ -14,12 +14,7 @@ from common import TestCase
BACKEND = os.environ['BACKEND']
TEMP_DIR = os.environ['TEMP_DIR']
MASTER_PORT = '29500'
MASTER_ADDR = '127.0.0.1'
if not dist.is_available():
print('Distributed not available, skipping tests')
sys.exit(0)
MASTER_ADDR = '127.0.0.1:' + MASTER_PORT
@contextmanager
@ -69,7 +64,7 @@ class Barrier(object):
data = f.read()
if int(data) >= cls.barrier_id:
arrived += 1
if arrived == dist.get_world_size():
if arrived == dist.get_num_processes():
break
if time.time() - start_time > timeout:
@ -92,7 +87,7 @@ class _DistTestBase(object):
return (group, group_id, rank)
def _init_global_test(self):
group = [i for i in range(0, dist.get_world_size())]
group = [i for i in range(0, dist.get_num_processes())]
group_id = dist.group.WORLD
rank = dist.get_rank()
return (group, group_id, rank)
@ -101,7 +96,7 @@ class _DistTestBase(object):
def test_get_rank(self):
test_dir = os.path.join(TEMP_DIR, 'test_dir')
pid = str(os.getpid())
num_processes = dist.get_world_size()
num_processes = dist.get_num_processes()
with open(os.path.join(test_dir, pid), 'w') as f:
f.write(str(dist.get_rank()))
@ -122,16 +117,15 @@ class _DistTestBase(object):
self._barrier()
# SEND RECV
@unittest.skipIf(BACKEND == 'gloo', "Gloo does not support send/recv")
def test_send_recv(self):
rank = dist.get_rank()
tensor = _build_tensor(rank + 1)
for dest in range(0, dist.get_world_size()):
for dest in range(0, dist.get_num_processes()):
if dest == rank:
continue
dist.send(tensor, dest)
for src in range(0, dist.get_world_size()):
for src in range(0, dist.get_num_processes()):
if src == rank:
continue
tensor = _build_tensor(src + 1, value=-1)
@ -142,32 +136,29 @@ class _DistTestBase(object):
self._barrier()
# SEND RECV ANY SOURCE
@unittest.skipIf(BACKEND == 'gloo',
"Gloo does not support send/recv from any source")
def test_send_recv_any_source(self):
rank = dist.get_rank()
tensor = _build_tensor(10, rank)
for dest in range(0, dist.get_world_size()):
for dest in range(0, dist.get_num_processes()):
if dest == rank:
continue
dist.send(tensor, dest)
recv_ranks = set()
for src in range(0, dist.get_world_size()):
for src in range(0, dist.get_num_processes()):
if src == rank:
continue
tensor = _build_tensor(10, value=-1)
dist.recv(tensor)
recv_ranks.add(tensor.resize_(1)[0])
self.assertEqual(len(recv_ranks), dist.get_world_size() - 1)
self.assertEqual(len(recv_ranks), dist.get_num_processes() - 1)
self._barrier()
# ISEND
@unittest.skipIf(BACKEND == 'gloo', "Gloo does not support isend")
def test_isend(self):
rank = dist.get_rank()
world_size = dist.get_world_size()
world_size = dist.get_num_processes()
if rank == 0:
requests = [
@ -184,10 +175,9 @@ class _DistTestBase(object):
self._barrier()
# IRECV
@unittest.skipIf(BACKEND == 'gloo', "Gloo does not support irecv")
def test_irecv(self):
rank = dist.get_rank()
world_size = dist.get_world_size()
world_size = dist.get_num_processes()
if rank == 0:
expected_tensors = [_build_tensor(src, -1) for src in range(1, world_size)]
@ -206,17 +196,13 @@ class _DistTestBase(object):
self._barrier()
# BROADCAST
def _test_broadcast_helper(self, group, group_id, rank, cuda=False):
def _test_broadcast_helper(self, group, group_id, rank):
for src in group:
expected_tensor = _build_tensor(src + 1)
if cuda:
expected_tensor = expected_tensor.cuda()
if rank == src:
dist.broadcast(expected_tensor, src, group_id)
else:
tensor = _build_tensor(src + 1, -1)
if cuda:
tensor = tensor.cuda()
dist.broadcast(tensor, src, group_id)
self.assertEqual(tensor, expected_tensor)
@ -226,11 +212,6 @@ class _DistTestBase(object):
group, group_id, rank = self._init_global_test()
self._test_broadcast_helper(group, group_id, rank)
@unittest.skipIf(BACKEND != 'gloo', "Only Gloo backend supports CUDA allReduce")
def test_broadcast_cuda(self):
group, group_id, rank = self._init_global_test()
self._test_broadcast_helper(group, group_id, rank, True)
def test_broadcast_group(self):
group, group_id, rank = self._init_group_test()
self._test_broadcast_helper(group, group_id, rank)
@ -248,14 +229,12 @@ class _DistTestBase(object):
self._barrier()
@unittest.skipIf(BACKEND == 'gloo', "Gloo does not support reduce")
def test_reduce_sum(self):
group, group_id, rank = self._init_global_test()
self._test_reduce_helper(
group, group_id, rank, dist.reduce_op.SUM, 2, 10, 2 + (10 * (len(group) - 1))
)
@unittest.skipIf(BACKEND == 'gloo', "Gloo does not support reduce")
def test_reduce_product(self):
group, group_id, rank = self._init_global_test()
self._test_reduce_helper(
@ -263,28 +242,24 @@ class _DistTestBase(object):
2, 10, reduce((lambda x, y: x * y), [10] * (len(group) - 1), 2)
)
@unittest.skipIf(BACKEND == 'gloo', "Gloo does not support reduce")
def test_reduce_min(self):
group, group_id, rank = self._init_global_test()
self._test_reduce_helper(
group, group_id, rank, dist.reduce_op.MIN, 1010, 1, 1
)
@unittest.skipIf(BACKEND == 'gloo', "Gloo does not support reduce")
def test_reduce_max(self):
group, group_id, rank = self._init_global_test()
self._test_reduce_helper(
group, group_id, rank, dist.reduce_op.MAX, -1, 10, 10
)
@unittest.skipIf(BACKEND == 'gloo', "Gloo does not support reduce")
def test_reduce_group_sum(self):
group, group_id, rank = self._init_group_test()
self._test_reduce_helper(
group, group_id, rank, dist.reduce_op.SUM, 2, 10, 2 + (10 * (len(group) - 1))
)
@unittest.skipIf(BACKEND == 'gloo', "Gloo does not support reduce")
def test_reduce_group_product(self):
group, group_id, rank = self._init_group_test()
self._test_reduce_helper(
@ -292,14 +267,12 @@ class _DistTestBase(object):
2, 10, reduce((lambda x, y: x * y), [10] * (len(group) - 1), 2)
)
@unittest.skipIf(BACKEND == 'gloo', "Gloo does not support reduce")
def test_reduce_group_min(self):
group, group_id, rank = self._init_group_test()
self._test_reduce_helper(
group, group_id, rank, dist.reduce_op.MIN, 1010, 1, 1
)
@unittest.skipIf(BACKEND == 'gloo', "Gloo does not support reduce")
def test_reduce_group_max(self):
group, group_id, rank = self._init_group_test()
self._test_reduce_helper(
@ -307,19 +280,14 @@ class _DistTestBase(object):
)
# ALL REDUCE
def _test_all_reduce_helper(self, group, group_id, rank, op, master_value,
worker_value, expected_value, cuda=False):
def _test_all_reduce_helper(self, group, group_id, rank, op, master_value, worker_value, expected_value):
for src in group:
if rank == src:
tensor = _build_tensor(src + 1).fill_(master_value)
if cuda:
tensor = tensor.cuda()
dist.all_reduce(tensor, op, group_id)
self.assertEqual(tensor, _build_tensor(src + 1, expected_value))
else:
tensor = _build_tensor(src + 1).fill_(worker_value)
if cuda:
tensor = tensor.cuda()
dist.all_reduce(tensor, op, group_id)
self.assertEqual(tensor, _build_tensor(src + 1, expected_value))
@ -331,13 +299,6 @@ class _DistTestBase(object):
group, group_id, rank, dist.reduce_op.SUM, 2, 10, 2 + (10 * (len(group) - 1))
)
@unittest.skipIf(BACKEND != 'gloo', "Only Gloo backend supports CUDA allReduce")
def test_all_reduce_sum_cuda(self):
group, group_id, rank = self._init_global_test()
self._test_all_reduce_helper(
group, group_id, rank, dist.reduce_op.SUM, 2, 10, 2 + (10 * (len(group) - 1)), True
)
def test_all_reduce_product(self):
group, group_id, rank = self._init_global_test()
self._test_all_reduce_helper(
@ -387,18 +348,20 @@ class _DistTestBase(object):
for dest in group:
tensor = _build_tensor(dest + 1, -1)
expected_tensor = _build_tensor(dest + 1, rank)
tensors = [_build_tensor(dest + 1, i) for i in group] if rank == dest else []
dist.scatter(tensor, src=dest, scatter_list=tensors, group=group_id)
self.assertEqual(tensor, expected_tensor)
if rank == dest:
tensors = [_build_tensor(dest + 1, i) for i in group]
dist.scatter_send(tensors, tensor, group_id)
self.assertEqual(tensor, expected_tensor)
else:
dist.scatter_recv(tensor, dest, group_id)
self.assertEqual(tensor, expected_tensor)
self._barrier()
@unittest.skipIf(BACKEND == 'gloo', "Gloo does not support scatter")
def test_scatter(self):
group, group_id, rank = self._init_global_test()
self._test_scatter_helper(group, group_id, rank)
@unittest.skipIf(BACKEND == 'gloo', "Gloo does not support scatter")
def test_scatter_group(self):
group, group_id, rank = self._init_group_test()
self._test_scatter_helper(group, group_id, rank)
@ -407,21 +370,22 @@ class _DistTestBase(object):
def _test_gather_helper(self, group, group_id, rank):
for dest in group:
tensor = _build_tensor(dest + 1, rank)
tensors = [_build_tensor(dest + 1, -1) for i in group] if rank == dest else []
dist.gather(tensor, dst=dest, gather_list=tensors, group=group_id)
if rank == dest:
tensors = [_build_tensor(dest + 1, -1) for i in group]
dist.gather_recv(tensors, tensor, group_id)
expected_tensors = [_build_tensor(dest + 1, i) for i in group]
for t1, t2 in zip(tensors, expected_tensors):
self.assertEqual(t1, t2)
else:
dist.gather_send(tensor, dest, group_id)
self._barrier()
@unittest.skipIf(BACKEND == 'gloo', "Gloo does not support gather")
def test_gather(self):
group, group_id, rank = self._init_global_test()
self._test_gather_helper(group, group_id, rank)
@unittest.skipIf(BACKEND == 'gloo', "Gloo does not support gather")
def test_gather_group(self):
group, group_id, rank = self._init_group_test()
self._test_gather_helper(group, group_id, rank)
@ -473,13 +437,13 @@ class _DistTestBase(object):
group, group_id, rank = self._init_group_test()
self._test_barrier_helper(group, group_id, rank)
if BACKEND == 'tcp' or BACKEND == 'gloo':
if BACKEND == 'tcp':
WORLD_SIZE = os.environ['WORLD_SIZE']
class TestTCPOrGloo(TestCase, _DistTestBase):
class TestTCP(TestCase, _DistTestBase):
MANAGER_PROCESS_RANK = -1
JOIN_TIMEOUT = 10
JOIN_TIMEOUT = 5
@staticmethod
def manager_join(fn):
@ -522,11 +486,7 @@ if BACKEND == 'tcp' or BACKEND == 'gloo':
def _run(self, rank):
self.rank = rank
try:
dist.init_process_group(backend=BACKEND)
except RuntimeError as e:
if 'recompile' in e.args[0]:
sys.exit(0)
dist.init_process_group(backend=BACKEND)
# self.id() == e.g. '__main__.TestDistributed.test_get_rank'
# We're retrieving a corresponding test and executing it.
getattr(self, self.id().split(".")[2])()

View File

@ -184,16 +184,16 @@ tests = [
OldModuleTest(nn.Sum,
(1,),
input_size=(2, 4, 5),
reference_fn=lambda i, _: i.sum(1, keepdim=False)),
reference_fn=lambda i, _: i.sum(1).squeeze(1)),
OldModuleTest(nn.Sum,
(1, True),
input_size=(2, 4, 5),
reference_fn=lambda i, _: i.sum(1, keepdim=False).div(i.size(1)),
reference_fn=lambda i, _: i.sum(1).div(i.size(1)).squeeze(1),
desc='sizeAverage'),
OldModuleTest(nn.Mean,
(1,),
input_size=(2, 4, 5),
reference_fn=lambda i, _: torch.mean(i, 1, keepdim=False)),
reference_fn=lambda i, _: torch.mean(i, 1).squeeze(1)),
OldModuleTest(lambda: nn.Sequential().add(nn.GradientReversal()).add(nn.GradientReversal()),
input_size=(4, 3, 2, 2),
fullname='GradientReversal'),
@ -233,19 +233,19 @@ tests = [
reference_fn=lambda i, _: torch.bmm(i[0], i[1].view(i[1].size(0), i[1].size(1), 1)).squeeze()),
OldModuleTest(nn.Max,
input_size=(4, 5, 3),
reference_fn=lambda i, _: torch.max(i, 0, False)[0]),
reference_fn=lambda i, _: torch.max(i, 0)[0].squeeze()),
OldModuleTest(nn.Max,
(1,),
input_size=(4, 5, 3),
reference_fn=lambda i, _: torch.max(i, 1, False)[0],
reference_fn=lambda i, _: torch.max(i, 1)[0].squeeze(),
desc='with_dimension'),
OldModuleTest(nn.Min,
input_size=(4, 5, 3),
reference_fn=lambda i, _: torch.min(i, 0, False)[0]),
reference_fn=lambda i, _: torch.min(i, 0)[0].squeeze()),
OldModuleTest(nn.Min,
(1,),
input_size=(4, 5, 3),
reference_fn=lambda i, _: torch.min(i, 1, False)[0],
reference_fn=lambda i, _: torch.min(i, 1)[0].squeeze(),
desc='with_dimension'),
OldModuleTest(nn.MixtureTable,
tuple(),
@ -532,7 +532,7 @@ for p in (1, 2, 1.5):
(p,),
input_size=(4, 5),
# Eh, we need to use p as a default, so it's passed by value
reference_fn=lambda i, _, p=p: i.div(i.norm(p, 1, True).expand_as(i)),
reference_fn=lambda i, _, p=p: i.div(i.norm(p, 1).expand_as(i)),
desc=str(p)),
)
for p in range(1, 4 + 1):
@ -807,14 +807,14 @@ class TestNN(NNTestCase):
str(m)
output = m.forward(input)
output2 = input.sum(1, True).expand(4, 5).repeat(num_modules, 1)
output2 = input.sum(1).expand(4, 5).repeat(num_modules, 1)
self.assertEqual(output2, output)
gradInput = m.backward(input, torch.ones(output2.size()))
gradInput2 = torch.ones(4, 2).fill_(num_modules * 5)
self.assertEqual(gradInput, gradInput2)
gradWeight = input.sum(0, keepdim=True).expand(5, 2)
gradWeight = input.sum(0).expand(5, 2)
for l in linears:
self.assertEqual(gradWeight, l.gradWeight)
@ -884,8 +884,8 @@ class TestNN(NNTestCase):
output2 = [input, input, input]
self.assertEqual(output2, output)
gradInput = module.backward(input, gradOutput)
gradInput2 = [_gradOutput[0].sum(0, keepdim=False), _gradOutput[1].sum(
0, keepdim=False), [_gradOutput[2].sum(0, keepdim=False)]]
gradInput2 = [_gradOutput[0].sum(0).squeeze(0), _gradOutput[1].sum(
0).squeeze(0), [_gradOutput[2].sum(0).squeeze(0)]]
self.assertTrue(isinstance(gradInput, list))
self.assertFalse(isinstance(gradInput[0], list))
self.assertFalse(isinstance(gradInput[1], list))

View File

@ -112,10 +112,9 @@ class leak_checker(object):
# test is no more than 4 higher than the 10th available at the
# start. This attempts to catch file descriptor leaks, but allows
# one-off initialization that may use up a file descriptor
# TODO: Disabled because this check is too flaky
# available_fds = self._get_next_fds(10)
# self.test_case.assertLessEqual(
# available_fds[-1] - self.next_fds[-1], 5)
available_fds = self._get_next_fds(10)
self.test_case.assertLessEqual(
available_fds[-1] - self.next_fds[-1], 5)
self.test_case.assertFalse(self.has_shm_files())
return False
@ -297,8 +296,7 @@ class TestMultiprocessing(TestCase):
ctx = mp.get_context('spawn')
tensors = []
for i in range(5):
device = i % 2
tensors += [torch.arange(i * 5, (i + 1) * 5).cuda(device)]
tensors += [torch.arange(i * 5, (i + 1) * 5).cuda()]
inq = ctx.Queue()
outq = ctx.Queue()
@ -314,7 +312,7 @@ class TestMultiprocessing(TestCase):
for i, tensor in enumerate(tensors):
v, device, tensor_size, storage_size = results[i]
self.assertEqual(v, torch.arange(i * 5, (i + 1) * 5).sum())
self.assertEqual(device, i % 2)
self.assertEqual(device, 0)
self.assertEqual(tensor_size, 5)
self.assertEqual(storage_size, 5)
@ -393,10 +391,6 @@ class TestMultiprocessing(TestCase):
param = Parameter(torch.arange(1, 26).view(5, 5))
self._test_autograd_sharing(param)
def test_empty_shared(self):
t = torch.Tensor()
t.share_memory_()
def _test_is_shared(self):
t = torch.randn(5, 5)
self.assertFalse(t.is_shared())

File diff suppressed because it is too large

View File

@ -4,11 +4,8 @@ from copy import deepcopy
import torch
import torch.optim as optim
import torch.legacy.optim as old_optim
import torch.nn.functional as F
from torch.optim import SGD
from torch.autograd import Variable
from torch import sparse
from torch.optim.lr_scheduler import LambdaLR, StepLR, MultiStepLR, ExponentialLR, ReduceLROnPlateau
from common import TestCase, run_tests
@ -61,49 +58,6 @@ class TestOptim(TestCase):
self.assertLessEqual(params.data.dist(solution), initial_dist)
def _test_rosenbrock_sparse(self, constructor):
params_t = torch.Tensor([1.5, 1.5])
params = Variable(torch.Tensor([1.5, 1.5]), requires_grad=True)
params_c = Variable(torch.Tensor([1.5, 1.5]), requires_grad=True)
optimizer = constructor([params])
optimizer_c = constructor([params_c])
solution = torch.Tensor([1, 1])
initial_dist = params.data.dist(solution)
def eval(params, sparse_grad, w):
# Depending on w, provide only the x or y gradient
optimizer.zero_grad()
loss = rosenbrock(params)
loss.backward()
grad = drosenbrock(params.data)
# NB: We torture test the optimizer by returning an
# uncoalesced sparse tensor
if w:
i = torch.LongTensor([[0, 0]])
x = grad[0]
v = torch.DoubleTensor([x / 4., x - x / 4.])
else:
i = torch.LongTensor([[1, 1]])
y = grad[1]
v = torch.DoubleTensor([y - y / 4., y / 4.])
x = sparse.DoubleTensor(i, v, torch.Size([2]))
if sparse_grad:
params.grad.data = x
else:
params.grad.data = x.to_dense()
return loss
for i in range(2000):
# Do cyclic coordinate descent
w = i % 2
optimizer.step(functools.partial(eval, params, True, w))
optimizer_c.step(functools.partial(eval, params_c, False, w))
self.assertEqual(params.data, params_c.data)
self.assertLessEqual(params.data.dist(solution), initial_dist)
def _test_basic_cases_template(self, weight, bias, input, constructor):
weight = Variable(weight, requires_grad=True)
bias = Variable(bias, requires_grad=True)
@ -201,9 +155,6 @@ class TestOptim(TestCase):
def _build_params_dict(self, weight, bias, **kwargs):
return [dict(params=[weight]), dict(params=[bias], **kwargs)]
def _build_params_dict_single(self, weight, bias, **kwargs):
return [dict(params=bias, **kwargs)]
def test_sgd(self):
self._test_rosenbrock(
lambda params: optim.SGD(params, lr=1e-3),
@ -223,11 +174,6 @@ class TestOptim(TestCase):
self._build_params_dict(weight, bias, lr=1e-2),
lr=1e-3)
)
self._test_basic_cases(
lambda weight, bias: optim.SGD(
self._build_params_dict_single(weight, bias, lr=1e-2),
lr=1e-3)
)
def test_adam(self):
self._test_rosenbrock(
@ -290,11 +236,6 @@ class TestOptim(TestCase):
lr=1e-1)
)
def test_adagrad_sparse(self):
self._test_rosenbrock_sparse(
lambda params: optim.Adagrad(params, lr=1e-1)
)
def test_adamax(self):
self._test_rosenbrock(
lambda params: optim.Adamax(params, lr=1e-1),
@ -402,157 +343,5 @@ class TestOptim(TestCase):
optim.SGD(Variable(torch.randn(5, 5)), lr=3)
class SchedulerTestNet(torch.nn.Module):
def __init__(self):
super(SchedulerTestNet, self).__init__()
self.conv1 = torch.nn.Conv2d(1, 1, 1)
self.conv2 = torch.nn.Conv2d(1, 1, 1)
def forward(self, x):
return self.conv2(F.relu(self.conv1(x)))
class TestLRScheduler(TestCase):
def setUp(self):
self.net = SchedulerTestNet()
self.opt = SGD(
[{'params': self.net.conv1.parameters()}, {'params': self.net.conv2.parameters(), 'lr': 0.5}],
lr=0.05)
def test_step_lr(self):
# lr = 0.05     if epoch < 3
# lr = 0.005    if 3 <= epoch < 6
# lr = 0.0005   if 6 <= epoch < 9
# lr = 0.00005  if epoch >= 9
single_targets = [0.05] * 3 + [0.005] * 3 + [0.0005] * 3 + [0.00005] * 3
targets = [single_targets, list(map(lambda x: x * 10, single_targets))]
scheduler = StepLR(self.opt, gamma=0.1, step_size=3)
epochs = 10
self._test(scheduler, targets, epochs)
def test_multi_step_lr(self):
# lr = 0.05 if epoch < 2
# lr = 0.005 if 2 <= epoch < 5
# lr = 0.0005 if 5 <= epoch < 9
# lr = 0.00005 if epoch >= 9
single_targets = [0.05] * 2 + [0.005] * 3 + [0.0005] * 4 + [0.00005] * 3
targets = [single_targets, list(map(lambda x: x * 10, single_targets))]
scheduler = MultiStepLR(self.opt, gamma=0.1, milestones=[2, 5, 9])
epochs = 10
self._test(scheduler, targets, epochs)
def test_exp_lr(self):
single_targets = [0.05 * (0.9 ** x) for x in range(10)]
targets = [single_targets, list(map(lambda x: x * 10, single_targets))]
scheduler = ExponentialLR(self.opt, gamma=0.9)
epochs = 10
self._test(scheduler, targets, epochs)
def test_reduce_lr_on_plateau1(self):
for param_group in self.opt.param_groups:
param_group['lr'] = 0.5
targets = [[0.5] * 20]
metrics = [10 - i * 0.0167 for i in range(20)]
scheduler = ReduceLROnPlateau(self.opt, threshold_mode='abs', mode='min',
threshold=0.01, patience=5, cooldown=5)
epochs = 10
self._test_reduce_lr_on_plateau(scheduler, targets, metrics, epochs)
def test_reduce_lr_on_plateau2(self):
for param_group in self.opt.param_groups:
param_group['lr'] = 0.5
targets = [[0.5] * 6 + [0.05] * 7 + [0.005] * 7 + [0.0005] * 2]
metrics = [10 - i * 0.0165 for i in range(22)]
scheduler = ReduceLROnPlateau(self.opt, patience=5, cooldown=0, threshold_mode='abs',
mode='min', threshold=0.1)
epochs = 22
self._test_reduce_lr_on_plateau(scheduler, targets, metrics, epochs)
def test_reduce_lr_on_plateau3(self):
for param_group in self.opt.param_groups:
param_group['lr'] = 0.5
targets = [[0.5] * (2 + 6) + [0.05] * (5 + 6) + [0.005] * 4]
metrics = [-0.8] * 2 + [-0.234] * 20
scheduler = ReduceLROnPlateau(self.opt, mode='max', patience=5, cooldown=5,
threshold_mode='abs')
epochs = 22
self._test_reduce_lr_on_plateau(scheduler, targets, metrics, epochs)
def test_reduce_lr_on_plateau4(self):
for param_group in self.opt.param_groups:
param_group['lr'] = 0.5
targets = [[0.5] * 20]
metrics = [1.5 * (1.025 ** i) for i in range(20)] # 1.025 > 1.1**0.25
scheduler = ReduceLROnPlateau(self.opt, mode='max', patience=3,
threshold_mode='rel', threshold=0.1)
epochs = 20
self._test_reduce_lr_on_plateau(scheduler, targets, metrics, epochs)
def test_reduce_lr_on_plateau5(self):
for param_group in self.opt.param_groups:
param_group['lr'] = 0.5
targets = [[0.5] * 6 + [0.05] * (5 + 6) + [0.005] * 4]
metrics = [1.5 * (1.005 ** i) for i in range(20)]
scheduler = ReduceLROnPlateau(self.opt, mode='max', threshold_mode='rel',
threshold=0.1, patience=5, cooldown=5)
epochs = 20
self._test_reduce_lr_on_plateau(scheduler, targets, metrics, epochs)
def test_reduce_lr_on_plateau6(self):
for param_group in self.opt.param_groups:
param_group['lr'] = 0.5
targets = [[0.5] * 20]
metrics = [1.5 * (0.85 ** i) for i in range(20)]
scheduler = ReduceLROnPlateau(self.opt, mode='min', threshold_mode='rel',
threshold=0.1)
epochs = 20
self._test_reduce_lr_on_plateau(scheduler, targets, metrics, epochs)
def test_reduce_lr_on_plateau7(self):
for param_group in self.opt.param_groups:
param_group['lr'] = 0.5
targets = [[0.5] * 6 + [0.05] * (5 + 6) + [0.005] * 4]
metrics = [1] * 7 + [0.6] + [0.5] * 12
scheduler = ReduceLROnPlateau(self.opt, mode='min', threshold_mode='rel',
threshold=0.1, patience=5, cooldown=5)
epochs = 20
self._test_reduce_lr_on_plateau(scheduler, targets, metrics, epochs)
def test_reduce_lr_on_plateau8(self):
for param_group in self.opt.param_groups:
param_group['lr'] = 0.5
targets = [[0.5] * 6 + [0.4] * 14, [0.5] * 6 + [0.3] * 14]
metrics = [1.5 * (1.005 ** i) for i in range(20)]
scheduler = ReduceLROnPlateau(self.opt, mode='max', threshold_mode='rel', min_lr=[0.4, 0.3],
threshold=0.1, patience=5, cooldown=5)
epochs = 20
self._test_reduce_lr_on_plateau(scheduler, targets, metrics, epochs)
def test_lambda_lr(self):
self.opt.param_groups[0]['lr'] = 0.05
self.opt.param_groups[1]['lr'] = 0.4
targets = [[0.05 * (0.9 ** x) for x in range(10)], [0.4 * (0.8 ** x) for x in range(10)]]
scheduler = LambdaLR(self.opt,
lr_lambda=[lambda x1: 0.9 ** x1, lambda x2: 0.8 ** x2])
epochs = 10
self._test(scheduler, targets, epochs)
def _test(self, scheduler, targets, epochs=10):
for epoch in range(epochs):
scheduler.step(epoch)
for param_group, target in zip(self.opt.param_groups, targets):
self.assertAlmostEqual(target[epoch], param_group['lr'],
msg='LR is wrong in epoch {}: expected {}, got {}'.format(
epoch, target[epoch], param_group['lr']), delta=1e-5)
def _test_reduce_lr_on_plateau(self, scheduler, targets, metrics, epochs=10, verbose=False):
for epoch in range(epochs):
scheduler.step(metrics[epoch])
if verbose:
print('epoch{}:\tlr={}'.format(epoch, self.opt.param_groups[0]['lr']))
for param_group, target in zip(self.opt.param_groups, targets):
self.assertAlmostEqual(target[epoch], param_group['lr'],
msg='LR is wrong in epoch {}: expected {}, got {}'.format(
epoch, target[epoch], param_group['lr']), delta=1e-5)
if __name__ == '__main__':
run_tests()

View File

@ -8,63 +8,28 @@ from common import TestCase, run_tests
from common_nn import TEST_CUDA
from numbers import Number
# triplet := (index type, value type, sparse type)
cpu_triplet = (
torch.LongTensor,
torch.DoubleTensor,
torch.sparse.DoubleTensor)
def cpu_only(inner):
def outer(self, *args, **kwargs):
if self.is_cuda:
raise unittest.SkipTest("Test is CPU-only")
inner(self, *args, **kwargs)
return outer
def cuda_only(inner):
def outer(self, *args, **kwargs):
if not self.is_cuda:
raise unittest.SkipTest("Test is GPU-only")
inner(self, *args, **kwargs)
return outer
if TEST_CUDA:
cuda_triplet = (
torch.cuda.LongTensor,
torch.cuda.DoubleTensor,
torch.cuda.sparse.DoubleTensor)
class TestSparse(TestCase):
def setUp(self):
# These parameters control the various ways we can run the test.
# We will subclass and override this method to implement CUDA
# tests
self.is_cuda = False
self.is_uncoalesced = False
self.IndexTensor = torch.LongTensor
self.ValueTensor = torch.DoubleTensor
self.SparseTensor = torch.sparse.DoubleTensor
def _gen_sparse(self, d, nnz, with_size):
# TODO: Consider implementing this in the CUDA case by directly
# performing the operations on the GPU. You won't be able to
# use torch.rand/torch.randn in this case because they are
# CPU-only. If you do this, you can remove the is_cuda branch
# at the end.
#
# If you do this, be sure to update assert_uncoalesced too
@staticmethod
def _gen_sparse(d, nnz, with_size, is_cuda=False):
if isinstance(with_size, Number):
with_size = [with_size] * d
if self.is_uncoalesced:
# We want to generate a tensor with a lot of uncoalesced
# entries to stress test whether or not we handle this
# (subtle) case correctly
v_size = [nnz * 2] + list(with_size[d:])
v = torch.randn(*v_size)
r = torch.rand(d, nnz)
# Repeat the indexes, so every position shows up twice
i = torch.cat([r, r], dim=1) * \
torch.Tensor(with_size[:d]).repeat(nnz * 2, 1).transpose(0, 1)
i = i.type(torch.LongTensor)
x = torch.sparse.DoubleTensor(i, v, torch.Size(with_size))
self.assert_uncoalesced(x)
v = torch.randn(nnz)
i = (torch.rand(d, nnz) * with_size).type(torch.LongTensor)
x = torch.sparse.DoubleTensor(i, v)
else:
# Generate a sparse tensor with d sparse dimensions; the
# rest of the dimensions with_size[d:] are dense.
v_size = [nnz] + list(with_size[d:])
v = torch.randn(*v_size)
i = torch.rand(d, nnz) * \
@ -72,62 +37,49 @@ class TestSparse(TestCase):
i = i.type(torch.LongTensor)
x = torch.sparse.DoubleTensor(i, v, torch.Size(with_size))
if self.is_cuda:
if is_cuda:
return x.cuda(), i.cuda(), v.cuda()
else:
return x, i.clone(), v.clone()
def assert_uncoalesced(self, x):
"""
Test if a CPU tensor is uncoalesced. This is used to ensure
correctness of the uncoalesced tensor generation algorithm.
"""
assert not x.is_coalesced()
# Strategy: construct a new sparse tensor with the raw value
# field overwritten to a tensor of ones, coalesce it, and then
# check if any value entries are > 1 (which indicates that the
# original was uncoalesced.)
i = x._indices().clone()
v = x._values().clone().fill_(1)
y = torch.sparse.DoubleTensor(i, v, x.size())
z = self.safeCoalesce(y)
assert (z._values() > 1).sum() > 0
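To make the strategy above concrete, here is a minimal standalone sketch (illustrative only; it assumes the `torch.sparse.DoubleTensor` constructor and the `_values()` / `coalesce()` accessors exercised by these tests):

```python
import torch

# A duplicated index (position 0 appears twice) makes the tensor uncoalesced.
i = torch.LongTensor([[0, 0, 1]])
v = torch.DoubleTensor([1, 1, 1])   # values overwritten with ones
y = torch.sparse.DoubleTensor(i, v, torch.Size([3]))
z = y.coalesce()
# coalesce() sums the values at duplicated positions, so any entry > 1
# proves the original tensor was uncoalesced.
assert (z._values() > 1).sum() > 0
```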
def _test_basic(self, is_cuda):
x, i, v = self._gen_sparse(3, 10, 100, is_cuda)
def randn(self, *args, **kwargs):
"""
Variant of torch.randn that also works in the TEST_CUDA case.
"""
# TODO: Put this in torch.cuda.randn
return self.ValueTensor(*args, **kwargs).normal_()
self.assertEqual(i, x.indices())
self.assertEqual(v, x.values())
def test_basic(self):
x, i, v = self._gen_sparse(3, 10, 100)
self.assertEqual(i, x._indices())
self.assertEqual(v, x._values())
x, i, v = self._gen_sparse(3, 10, [100, 100, 100])
self.assertEqual(i, x._indices())
self.assertEqual(v, x._values())
x, i, v = self._gen_sparse(3, 10, [100, 100, 100], is_cuda)
self.assertEqual(i, x.indices())
self.assertEqual(v, x.values())
self.assertEqual(x.ndimension(), 3)
self.assertEqual(x.coalesce()._nnz(), 10)
self.assertEqual(x.nnz(), 10)
for i in range(3):
self.assertEqual(x.size(i), 100)
SparseTensor = (cuda_triplet if is_cuda else cpu_triplet)[2]
# Make sure we can access empty indices / values
x = self.SparseTensor()
self.assertEqual(x._indices().numel(), 0)
self.assertEqual(x._values().numel(), 0)
x = SparseTensor()
self.assertEqual(x.indices().numel(), 0)
self.assertEqual(x.values().numel(), 0)
def test_to_dense(self):
i = self.IndexTensor([
def test_basic(self):
self._test_basic(False)
@unittest.skipIf(not TEST_CUDA, 'CUDA not available')
def test_basic_cuda(self):
self._test_basic(True)
def _test_to_dense(self, is_cuda):
IndexTensor, ValueTensor, SparseTensor = \
cuda_triplet if is_cuda else cpu_triplet
i = IndexTensor([
[0, 1, 2, 2],
[0, 0, 0, 3],
[0, 0, 1, 4],
])
v = self.ValueTensor([2, 1, 3, 4])
x = self.SparseTensor(i, v, torch.Size([3, 4, 5]))
res = self.ValueTensor([
v = ValueTensor([2, 1, 3, 4])
x = SparseTensor(i, v, torch.Size([3, 4, 5]))
res = ValueTensor([
[[2, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
@ -147,23 +99,23 @@ class TestSparse(TestCase):
x.to_dense()
self.assertEqual(res, x.to_dense())
def test_shared(self):
i = self.IndexTensor([[2]])
v = self.ValueTensor([5])
x = self.SparseTensor(i, v, torch.Size([3]))
v[0] = 6
self.assertEqual(self.ValueTensor([0, 0, 6]), x.to_dense())
i[0][0] = 0
self.assertEqual(self.ValueTensor([6, 0, 0]), x.to_dense())
def test_to_dense(self):
self._test_to_dense(False)
def test_to_dense_hybrid(self):
i = self.IndexTensor([
@unittest.skipIf(not TEST_CUDA, 'CUDA not available')
def test_to_dense_cuda(self):
self._test_to_dense(True)
def _test_to_dense_hybrid(self, is_cuda):
IndexTensor, ValueTensor, SparseTensor = \
cuda_triplet if is_cuda else cpu_triplet
i = IndexTensor([
[0, 1, 2, 2],
[0, 0, 0, 3],
])
v = self.ValueTensor([[2, 3], [1, 2], [3, 4], [4, 5]])
x = self.SparseTensor(i, v, torch.Size([3, 4, 2]))
res = self.ValueTensor([
v = ValueTensor([[2, 3], [1, 2], [3, 4], [4, 5]])
x = SparseTensor(i, v, torch.Size([3, 4, 2]))
res = ValueTensor([
[[2, 3],
[0, 0],
[0, 0],
@ -183,131 +135,145 @@ class TestSparse(TestCase):
x.to_dense()
self.assertEqual(res, x.to_dense())
def test_contig(self):
i = self.IndexTensor([
def test_to_dense_hybrid(self):
self._test_to_dense_hybrid(False)
@unittest.skipIf(not TEST_CUDA, 'CUDA not available')
def test_to_dense_hybrid_cuda(self):
self._test_to_dense_hybrid(True)
def _test_contig(self, is_cuda):
IndexTensor, ValueTensor, SparseTensor = \
cuda_triplet if is_cuda else cpu_triplet
i = IndexTensor([
[1, 0, 35, 14, 39, 6, 71, 66, 40, 27],
[92, 31, 62, 50, 22, 65, 89, 74, 56, 34],
])
v = self.ValueTensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
x = self.SparseTensor(i, v, torch.Size([100, 100]))
exp_i = self.IndexTensor([
v = ValueTensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
x = SparseTensor(i, v, torch.Size([100, 100]))
exp_i = IndexTensor([
[0, 1, 6, 14, 27, 35, 39, 40, 66, 71],
[31, 92, 65, 50, 34, 62, 22, 56, 74, 89],
])
exp_v = self.ValueTensor([2, 1, 6, 4, 10, 3, 5, 9, 8, 7])
exp_v = ValueTensor([2, 1, 6, 4, 10, 3, 5, 9, 8, 7])
x = self.safeCoalesce(x)
self.assertEqual(exp_i, x._indices())
self.assertEqual(exp_v, x._values())
self.assertEqual(exp_i, x.indices())
self.assertEqual(exp_v, x.values())
i = self.IndexTensor([
i = IndexTensor([
[2, 0, 2, 1],
[0, 0, 3, 0],
[1, 0, 4, 0],
])
v = self.ValueTensor([3, 2, 4, 1])
x = self.SparseTensor(i, v, torch.Size([3, 4, 5]))
exp_i = self.IndexTensor([
v = ValueTensor([3, 2, 4, 1])
x = SparseTensor(i, v, torch.Size([3, 4, 5]))
exp_i = IndexTensor([
[0, 1, 2, 2],
[0, 0, 0, 3],
[0, 0, 1, 4],
])
exp_v = self.ValueTensor([2, 1, 3, 4])
exp_v = ValueTensor([2, 1, 3, 4])
x = self.safeCoalesce(x)
self.assertEqual(exp_i, x._indices())
self.assertEqual(exp_v, x._values())
self.assertEqual(exp_i, x.indices())
self.assertEqual(exp_v, x.values())
# Duplicate indices
i = self.IndexTensor([
i = IndexTensor([
[0, 0, 2, 0],
[0, 0, 3, 0],
[0, 0, 4, 0],
])
v = self.ValueTensor([3, 2, 4, 1])
x = self.SparseTensor(i, v, torch.Size([3, 4, 5]))
exp_i = self.IndexTensor([
v = ValueTensor([3, 2, 4, 1])
x = SparseTensor(i, v, torch.Size([3, 4, 5]))
exp_i = IndexTensor([
[0, 2],
[0, 3],
[0, 4],
])
exp_v = self.ValueTensor([6, 4])
exp_v = ValueTensor([6, 4])
x = self.safeCoalesce(x)
self.assertEqual(exp_i, x._indices())
self.assertEqual(exp_v, x._values())
self.assertEqual(exp_i, x.indices())
self.assertEqual(exp_v, x.values())
def test_contig_hybrid(self):
i = self.IndexTensor([
def test_contig(self):
self._test_contig(False)
@unittest.skipIf(not TEST_CUDA, 'CUDA not available')
def test_contig_cuda(self):
self._test_contig(True)
def _test_contig_hybrid(self, is_cuda):
IndexTensor, ValueTensor, SparseTensor = \
cuda_triplet if is_cuda else cpu_triplet
i = IndexTensor([
[1, 0, 35, 14, 39, 6, 71, 66, 40, 27],
[92, 31, 62, 50, 22, 65, 89, 74, 56, 34],
])
v = self.ValueTensor([
v = ValueTensor([
[1, 2], [2, 3], [3, 4], [4, 5], [5, 6],
[6, 7], [7, 8], [8, 9], [9, 10], [10, 11],
])
x = self.SparseTensor(i, v, torch.Size([100, 100, 2]))
exp_i = self.IndexTensor([
x = SparseTensor(i, v, torch.Size([100, 100, 2]))
exp_i = IndexTensor([
[0, 1, 6, 14, 27, 35, 39, 40, 66, 71],
[31, 92, 65, 50, 34, 62, 22, 56, 74, 89],
])
exp_v = self.ValueTensor([
exp_v = ValueTensor([
[2, 3], [1, 2], [6, 7], [4, 5], [10, 11],
[3, 4], [5, 6], [9, 10], [8, 9], [7, 8],
])
x = self.safeCoalesce(x)
self.assertEqual(exp_i, x._indices())
self.assertEqual(exp_v, x._values())
self.assertEqual(exp_i, x.indices())
self.assertEqual(exp_v, x.values())
i = self.IndexTensor([
i = IndexTensor([
[2, 0, 2, 1],
[0, 0, 3, 0],
[1, 0, 4, 0],
])
v = self.ValueTensor([[3, 3, 3], [2, 2, 2], [4, 4, 4], [1, 1, 1]])
x = self.SparseTensor(i, v, torch.Size([3, 4, 5, 3]))
exp_i = self.IndexTensor([
v = ValueTensor([[3, 3, 3], [2, 2, 2], [4, 4, 4], [1, 1, 1]])
x = SparseTensor(i, v, torch.Size([3, 4, 5, 3]))
exp_i = IndexTensor([
[0, 1, 2, 2],
[0, 0, 0, 3],
[0, 0, 1, 4],
])
exp_v = self.ValueTensor([[2, 2, 2], [1, 1, 1], [3, 3, 3], [4, 4, 4]])
exp_v = ValueTensor([[2, 2, 2], [1, 1, 1], [3, 3, 3], [4, 4, 4]])
x = self.safeCoalesce(x)
self.assertEqual(exp_i, x._indices())
self.assertEqual(exp_v, x._values())
self.assertEqual(exp_i, x.indices())
self.assertEqual(exp_v, x.values())
# Duplicate indices
i = self.IndexTensor([
i = IndexTensor([
[0, 0, 2, 0],
[0, 0, 3, 0],
[0, 0, 4, 0],
])
v = self.ValueTensor([[3, 2, 3], [2, 1, 1], [4, 3, 4], [1, 1, 1]])
x = self.SparseTensor(i, v, torch.Size([3, 4, 5, 3]))
exp_i = self.IndexTensor([
v = ValueTensor([[3, 2, 3], [2, 1, 1], [4, 3, 4], [1, 1, 1]])
x = SparseTensor(i, v, torch.Size([3, 4, 5, 3]))
exp_i = IndexTensor([
[0, 2],
[0, 3],
[0, 4],
])
exp_v = self.ValueTensor([[6, 4, 5], [4, 3, 4]])
exp_v = ValueTensor([[6, 4, 5], [4, 3, 4]])
x = self.safeCoalesce(x)
self.assertEqual(exp_i, x._indices())
self.assertEqual(exp_v, x._values())
self.assertEqual(exp_i, x.indices())
self.assertEqual(exp_v, x.values())
def test_clone(self):
x, _, _ = self._gen_sparse(4, 20, 5)
if self.is_uncoalesced:
self.assertFalse(x.is_coalesced())
y = x.clone()
self.assertFalse(y.is_coalesced())
x = x.coalesce()
self.assertTrue(x.is_coalesced())
y = x.clone()
self.assertTrue(y.is_coalesced())
def test_contig_hybrid(self):
self._test_contig_hybrid(False)
def test_transpose(self):
x = self._gen_sparse(4, 20, 5)[0]
@unittest.skipIf(not TEST_CUDA, 'CUDA not available')
def test_contig_hybrid_cuda(self):
self._test_contig_hybrid(True)
def _test_transpose(self, is_cuda):
x = self._gen_sparse(4, 20, 5, is_cuda=is_cuda)[0]
y = x.to_dense()
for i, j in itertools.combinations(range(4), 2):
@ -319,7 +285,13 @@ class TestSparse(TestCase):
y = y.transpose(i, j)
self.assertEqual(x.to_dense(), y)
@cpu_only
def test_transpose(self):
self._test_transpose(False)
@unittest.skipIf(not TEST_CUDA, 'CUDA not available')
def test_transpose_cuda(self):
self._test_transpose(True)
def test_mm(self):
def test_shape(di, dj, dk):
x, _, _ = self._gen_sparse(2, 20, [di, dj])
@ -344,7 +316,6 @@ class TestSparse(TestCase):
test_shape(100, 1000, 200)
test_shape(64, 10000, 300)
@cpu_only
def test_saddmm(self):
def test_shape(di, dj, dk):
x = self._gen_sparse(2, 20, [di, dj])[0]
@ -369,10 +340,12 @@ class TestSparse(TestCase):
test_shape(1000, 100, 100)
test_shape(3000, 64, 300)
def test_dsmm(self):
def _test_dsmm(self, is_cuda):
def test_shape(di, dj, dk):
x = self._gen_sparse(2, 20, [di, dj])[0]
y = self.randn(dj, dk)
x = self._gen_sparse(2, 20, [di, dj], is_cuda)[0]
y = torch.randn(dj, dk)
if is_cuda:
y = y.cuda()
res = torch.dsmm(x, y)
expected = torch.mm(x.to_dense(), y)
@ -382,10 +355,19 @@ class TestSparse(TestCase):
test_shape(1000, 100, 100)
test_shape(3000, 64, 300)
def test_hsmm(self):
def test_dsmm(self):
self._test_dsmm(False)
@unittest.skipIf(not TEST_CUDA, 'CUDA not available')
def test_dsmm_cuda(self):
self._test_dsmm(True)
def _test_hsmm(self, is_cuda):
def test_shape(di, dj, dk):
x = self._gen_sparse(2, 20, [di, dj])[0]
y = self.randn(dj, dk)
x = self._gen_sparse(2, 20, [di, dj], is_cuda)[0]
y = torch.randn(dj, dk)
if is_cuda:
y = y.cuda()
res = torch.hsmm(x, y)
expected = torch.mm(x.to_dense(), y)
@ -395,10 +377,19 @@ class TestSparse(TestCase):
test_shape(1000, 100, 100)
test_shape(3000, 64, 300)
def _test_spadd_shape(self, shape_i, shape_v=None):
def test_hsmm(self):
self._test_hsmm(False)
@unittest.skipIf(not TEST_CUDA, 'CUDA not available')
def test_hsmm_cuda(self):
self._test_hsmm(True)
def _test_spadd_shape(self, is_cuda, shape_i, shape_v=None):
shape = shape_i + (shape_v or [])
x, _, _ = self._gen_sparse(len(shape_i), 10, shape)
y = self.randn(*shape)
x, _, _ = self._gen_sparse(len(shape_i), 10, shape, is_cuda)
y = torch.randn(*shape)
if is_cuda:
y = y.cuda()
r = random.random()
res = torch.add(y, r, x)
@ -410,7 +401,9 @@ class TestSparse(TestCase):
s = list(shape)
s[0] = shape[-1]
s[-1] = shape[0]
y = self.randn(*s)
y = torch.randn(*s)
if is_cuda:
y = y.cuda()
y.transpose_(0, len(s) - 1)
r = random.random()
@ -419,22 +412,36 @@ class TestSparse(TestCase):
self.assertEqual(res, expected)
def _test_spadd(self, is_cuda):
self._test_spadd_shape(is_cuda, [5, 6])
self._test_spadd_shape(is_cuda, [10, 10, 10])
self._test_spadd_shape(is_cuda, [50, 30, 20])
self._test_spadd_shape(is_cuda, [5, 5, 5, 5, 5, 5])
def test_spadd(self):
self._test_spadd_shape([5, 6])
self._test_spadd_shape([10, 10, 10])
self._test_spadd_shape([50, 30, 20])
self._test_spadd_shape([5, 5, 5, 5, 5, 5])
self._test_spadd(False)
@unittest.skipIf(not TEST_CUDA, 'CUDA not available')
def test_spadd_cuda(self):
self._test_spadd(True)
def _test_spadd_hybrid(self, is_cuda):
self._test_spadd_shape(is_cuda, [5, 6], [2, 3])
self._test_spadd_shape(is_cuda, [10, 10, 10], [3])
self._test_spadd_shape(is_cuda, [50, 30, 20], [2])
self._test_spadd_shape(is_cuda, [5, 5, 5, 5, 5, 5], [2])
def test_spadd_hybrid(self):
self._test_spadd_shape([5, 6], [2, 3])
self._test_spadd_shape([10, 10, 10], [3])
self._test_spadd_shape([50, 30, 20], [2])
self._test_spadd_shape([5, 5, 5, 5, 5, 5], [2])
self._test_spadd_hybrid(False)
def _test_basic_ops_shape(self, shape_i, shape_v=None):
@unittest.skipIf(not TEST_CUDA, 'CUDA not available')
def test_spadd_hybrid_cuda(self):
self._test_spadd_hybrid(True)
def _test_basic_ops_shape(self, is_cuda, shape_i, shape_v=None):
shape = shape_i + (shape_v or [])
x1, _, _ = self._gen_sparse(len(shape_i), 9, shape)
x2, _, _ = self._gen_sparse(len(shape_i), 12, shape)
x1, _, _ = self._gen_sparse(len(shape_i), 9, shape, is_cuda)
x2, _, _ = self._gen_sparse(len(shape_i), 12, shape, is_cuda)
y1 = x1 + x2
y2 = x1.clone()
@ -491,25 +498,39 @@ class TestSparse(TestCase):
self.assertTrue(y.is_coalesced())
self.assertEqual(x1, y)
# check that coalesce is out of place
y._values().add_(1)
self.assertEqual(z._values() + 1, y._values())
y.values().add_(1)
self.assertEqual(z.values() + 1, y.values())
def _test_basic_ops(self, is_cuda):
self._test_basic_ops_shape(is_cuda, [5, 6])
self._test_basic_ops_shape(is_cuda, [10, 10, 10])
self._test_basic_ops_shape(is_cuda, [50, 30, 20])
self._test_basic_ops_shape(is_cuda, [5, 5, 5, 5, 5, 5])
def test_basic_ops(self):
self._test_basic_ops_shape([5, 6])
self._test_basic_ops_shape([10, 10, 10])
self._test_basic_ops_shape([50, 30, 20])
self._test_basic_ops_shape([5, 5, 5, 5, 5, 5])
self._test_basic_ops(False)
@unittest.skipIf(not TEST_CUDA, 'CUDA not available')
def test_basic_ops_cuda(self):
self._test_basic_ops(True)
def _test_basic_ops_hybrid(self, is_cuda):
self._test_basic_ops_shape(is_cuda, [5, 6], [2, 3])
self._test_basic_ops_shape(is_cuda, [10, 10, 10], [3])
self._test_basic_ops_shape(is_cuda, [50, 30, 20], [2])
self._test_basic_ops_shape(is_cuda, [5, 5, 5, 5, 5, 5], [2])
def test_basic_ops_hybrid(self):
self._test_basic_ops_shape([5, 6], [2, 3])
self._test_basic_ops_shape([10, 10, 10], [3])
self._test_basic_ops_shape([50, 30, 20], [2])
self._test_basic_ops_shape([5, 5, 5, 5, 5, 5], [2])
self._test_basic_ops_hybrid(False)
def _test_sparse_mask_shape(self, shape_i, shape_v=None):
@unittest.skipIf(not TEST_CUDA, 'CUDA not available')
def test_basic_ops_hybrid_cuda(self):
self._test_basic_ops_hybrid(True)
def _test_sparse_mask_shape(self, is_cuda, shape_i, shape_v=None):
shape = shape_i + (shape_v or [])
x1, _, _ = self._gen_sparse(len(shape_i), 9, shape)
x2, _, _ = self._gen_sparse(len(shape_i), 12, shape)
x1, _, _ = self._gen_sparse(len(shape_i), 9, shape, is_cuda)
x2, _, _ = self._gen_sparse(len(shape_i), 12, shape, is_cuda)
y1 = x1 + x2
y2 = x1.clone()
@ -518,108 +539,78 @@ class TestSparse(TestCase):
self.assertEqual(y1.to_dense(), expected)
self.assertEqual(y2.to_dense(), expected)
def _test_sparse_mask_fixed(self):
i = self.IndexTensor([
[1, 3, 0, 4],
[2, 1, 2, 3],
def _test_sparse_mask_fixed(self, is_cuda):
IndexTensor, ValueTensor, SparseTensor = \
cuda_triplet if is_cuda else cpu_triplet
i = IndexTensor([
[1, 3, 3, 0, 4],
[2, 1, 1, 2, 3],
])
v = self.ValueTensor([1, 2, 3, 4])
x = self.SparseTensor(i, v, torch.Size([5, 4])).coalesce()
dense = self.ValueTensor([
v = ValueTensor([1, 2, 3, 4, 5])
x = SparseTensor(i, v, torch.Size([5, 4]))
dense = ValueTensor([
[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16],
[17, 18, 19, 20],
])
exp_v = self.ValueTensor([7, 14, 3, 20])
res = dense._sparse_mask(x)
expected = self.SparseTensor(i, exp_v, torch.Size([5, 4]))
exp_v = ValueTensor([7, 14, 14, 3, 20])
res = dense.sparse_mask(x)
expected = SparseTensor(i, exp_v, torch.Size([5, 4]))
self.assertEqual(res, expected)
def _test_sparse_mask(self, is_cuda):
self._test_sparse_mask_fixed(is_cuda)
self._test_sparse_mask_shape(is_cuda, [5, 6])
self._test_sparse_mask_shape(is_cuda, [10, 10, 10])
self._test_sparse_mask_shape(is_cuda, [50, 30, 20])
self._test_sparse_mask_shape(is_cuda, [5, 5, 5, 5, 5, 5])
def test_sparse_mask(self):
self._test_sparse_mask_fixed()
self._test_sparse_mask(False)
self._test_sparse_mask_shape([5, 6])
self._test_sparse_mask_shape([10, 10, 10])
self._test_sparse_mask_shape([50, 30, 20])
self._test_sparse_mask_shape([5, 5, 5, 5, 5, 5])
@unittest.skipIf(not TEST_CUDA, 'CUDA not available')
def test_sparse_mask_cuda(self):
self._test_sparse_mask(True)
def _test_sparse_mask_hybrid_fixed(self):
i = self.IndexTensor([
[1, 3, 0, 4],
[2, 1, 2, 3],
def _test_sparse_mask_hybrid_fixed(self, is_cuda):
IndexTensor, ValueTensor, SparseTensor = \
cuda_triplet if is_cuda else cpu_triplet
i = IndexTensor([
[1, 3, 3, 0, 4],
[2, 1, 1, 2, 3],
])
v = self.ValueTensor([[1, 2], [2, 3], [3, 4], [4, 5]])
# TODO: This is also testing that, if coalesce is a no-op,
# the indices don't get permuted. I don't know if we actually
# want to give this invariant.
x = self.SparseTensor(i, v, torch.Size([5, 4, 2])).coalesce()
dense = self.ValueTensor([
v = ValueTensor([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]])
x = SparseTensor(i, v, torch.Size([5, 4, 2]))
dense = ValueTensor([
[[1, 3], [2, 2], [3, 3], [4, 2]],
[[5, 7], [6, 7], [7, 9], [8, 9]],
[[9, 2], [10, 4], [11, 1], [12, 3]],
[[13, 5], [14, 1], [15, 1], [16, 6]],
[[17, 7], [18, 2], [19, 7], [20, 1]],
])
res = dense._sparse_mask(x)
exp_v = self.ValueTensor([[7, 9], [14, 1], [3, 3], [20, 1]])
expected = self.SparseTensor(i, exp_v, torch.Size([5, 4, 2]))
res = dense.sparse_mask(x)
exp_v = ValueTensor([[7, 9], [14, 1], [14, 1], [3, 3], [20, 1]])
expected = SparseTensor(i, exp_v, torch.Size([5, 4, 2]))
self.assertEqual(res, expected)
def _test_sparse_mask_hybrid(self, is_cuda):
self._test_sparse_mask_hybrid_fixed(is_cuda)
self._test_sparse_mask_shape(is_cuda, [5, 6], [2, 3])
self._test_sparse_mask_shape(is_cuda, [10, 10, 10], [3])
self._test_sparse_mask_shape(is_cuda, [50, 30, 20], [2])
self._test_sparse_mask_shape(is_cuda, [5, 5, 5, 5, 5, 5], [2])
def test_sparse_mask_hybrid(self):
self._test_sparse_mask_hybrid_fixed()
self._test_sparse_mask_hybrid(False)
self._test_sparse_mask_shape([5, 6], [2, 3])
self._test_sparse_mask_shape([10, 10, 10], [3])
self._test_sparse_mask_shape([50, 30, 20], [2])
self._test_sparse_mask_shape([5, 5, 5, 5, 5, 5], [2])
@unittest.skipIf(not TEST_CUDA, 'CUDA not available')
def test_sparse_mask_hybrid_cuda(self):
self._test_sparse_mask_hybrid(True)
@cuda_only
def test_storage_not_null(self):
x = torch.cuda.sparse.FloatTensor(2)
self.assertNotEqual(x.get_device(), -1)
@cuda_only
@unittest.skipIf(torch.cuda.device_count() < 2, "only one GPU detected")
def test_same_gpu(self):
i = self.IndexTensor([[2]]).cuda(1)
v = self.ValueTensor([5]).cuda(1)
x = self.SparseTensor(i, v, torch.Size([3]), device=1)
self.assertEqual(x.get_device(), 1)
self.assertEqual(x._values().get_device(), 1)
self.assertEqual(x._indices().get_device(), 1)
x = self.SparseTensor(3, device=1)
self.assertEqual(x.get_device(), 1)
self.assertEqual(x._values().get_device(), 1)
self.assertEqual(x._indices().get_device(), 1)
v = self.ValueTensor([5]).cuda(0)
self.assertRaises(RuntimeError, lambda: self.SparseTensor(i, v, torch.Size([3])))
class TestUncoalescedSparse(TestSparse):
def setUp(self):
super(TestUncoalescedSparse, self).setUp()
self.is_uncoalesced = True
@unittest.skipIf(not TEST_CUDA, 'CUDA not available')
class TestCudaSparse(TestSparse):
def setUp(self):
super(TestCudaSparse, self).setUp()
self.is_cuda = True
self.IndexTensor = torch.cuda.LongTensor
self.ValueTensor = torch.cuda.DoubleTensor
self.SparseTensor = torch.cuda.sparse.DoubleTensor
@unittest.skipIf(not TEST_CUDA, 'CUDA not available')
class TestCudaUncoalescedSparse(TestCudaSparse):
def setUp(self):
super(TestCudaUncoalescedSparse, self).setUp()
self.is_uncoalesced = True
if __name__ == '__main__':
run_tests()

File diff suppressed because it is too large

View File

@ -336,11 +336,16 @@ class TestLuaReader(TestCase):
@classmethod
def init(cls):
try:
path = download_file('https://download.pytorch.org/test_data/legacy_modules.t7')
except unittest.SkipTest:
DATA_URL = 'https://download.pytorch.org/test_data/legacy_modules.t7'
data_dir = os.path.join(os.path.dirname(__file__), 'data')
test_file_path = os.path.join(data_dir, 'legacy_modules.t7')
succ = download_file(DATA_URL, test_file_path)
if not succ:
warnings.warn(("Couldn't download the test file for TestLuaReader! "
"Tests will be incomplete!"), RuntimeWarning)
return
tests = load_lua(path)
tests = load_lua(test_file_path)
for name, test in tests['modules'].items():
test_name = 'test_' + name.replace('nn.', '')
setattr(cls, test_name, cls._module_test(name, test))

View File

@ -4,7 +4,6 @@ from string import Template
from copy import deepcopy
from .plugins import ArgcountChecker, OptionalArguments, ArgumentReferences, \
BeforeAfterCall, ConstantArguments, ReturnArguments, GILRelease
from ..shared import cwrap_common
class cwrap(object):
@ -36,11 +35,11 @@ class cwrap(object):
DEFAULT_PLUGIN_CLASSES = [ArgcountChecker, ConstantArguments, OptionalArguments,
ArgumentReferences, BeforeAfterCall, ReturnArguments, GILRelease]
def __init__(self, source, destination=None, plugins=None, default_plugins=True):
def __init__(self, source, destination=None, plugins=[], default_plugins=True):
if destination is None:
destination = source.replace('.cwrap', '.cpp')
self.plugins = [] if plugins is None else plugins
self.plugins = plugins
if default_plugins:
defaults = [cls() for cls in self.DEFAULT_PLUGIN_CLASSES]
self.plugins = defaults + self.plugins
@ -52,10 +51,7 @@ class cwrap(object):
with open(source, 'r') as f:
declarations = f.read()
# wrap all the declarations in the source .cwrap file
wrapper = self.wrap_declarations(declarations)
# let each plugin do any post-processing of the wrapped file
for plugin in self.plugins:
wrapper = plugin.process_full_file(wrapper)
@ -77,7 +73,7 @@ class cwrap(object):
elif line == ']]':
in_declaration = False
declaration = yaml.load('\n'.join(declaration_lines))
cwrap_common.set_declaration_defaults(declaration)
self.set_declaration_defaults(declaration)
# Pass declaration in a list - maybe some plugins want to add
# multiple wrappers
@ -105,6 +101,24 @@ class cwrap(object):
return '\n'.join(output)
def set_declaration_defaults(self, declaration):
declaration.setdefault('arguments', [])
declaration.setdefault('return', 'void')
if 'cname' not in declaration:
declaration['cname'] = declaration['name']
# Simulate multiple dispatch, even if it's not necessary
if 'options' not in declaration:
declaration['options'] = [{'arguments': declaration['arguments']}]
del declaration['arguments']
# Parse arguments (some of them can be strings)
for option in declaration['options']:
option['arguments'] = self.parse_arguments(option['arguments'])
# Propagate defaults from declaration to options
for option in declaration['options']:
for k, v in declaration.items():
if k != 'name' and k != 'options':
option.setdefault(k, v)
def parse_arguments(self, args):
new_args = []
for arg in args:
@ -122,10 +136,6 @@ class cwrap(object):
return new_args
def search_plugins(self, fnname, args, fallback):
"""Search plugins for the given function to call with args.
If not found, call fallback with args.
"""
for plugin in self.plugins:
wrapper = getattr(plugin, fnname)(*args)
if wrapper is not None:

View File

@ -1,6 +1,4 @@
import os
from . import CWrapPlugin
from ...shared import cwrap_common
class ArgcountSortPlugin(CWrapPlugin):
@ -9,7 +7,8 @@ class ArgcountSortPlugin(CWrapPlugin):
self.descending = descending
def process_declarations(self, declarations):
def num_checked_args(option):
return sum(map(lambda a: not a.get('ignore_check', False), option['arguments']))
for declaration in declarations:
cwrap_common.sort_by_number_of_options(declaration,
self.descending)
declaration['options'].sort(key=num_checked_args, reverse=self.descending)
return declarations

View File

@ -1,29 +0,0 @@
from . import CWrapPlugin
from string import Template
class AssertNDim(CWrapPlugin):
PRE_CODE_TEMPLATE = Template(
"""if(THTensor_(nDimension)(LIBRARY_STATE ${arg_op}) != ${dim_value}) {
THError("Expected argument %s to have %d dimension(s), but has %d",
"${op}", ${dim_value}, THTensor_(nDimension)(LIBRARY_STATE ${arg_op}));
}
""")
def process_option_code_template(self, template, option):
new_code_pre = []
for _, arg in enumerate(option['arguments']):
if 'assert_ndim' not in arg:
continue
dim_value = arg.get('assert_ndim')
op = arg.get('assign_name', arg['name'])
arg_op = "arg_" + op
new_code_pre.append(self.PRE_CODE_TEMPLATE.substitute(op=op,
arg_op=arg_op,
dim_value=dim_value))
template = new_code_pre + template
return template

View File

@ -1,12 +1,6 @@
from . import CWrapPlugin
from string import Template
import sys
if sys.version_info[0] == 3:
string_type = str
else:
string_type = basestring
class BoolOption(CWrapPlugin):
@ -21,8 +15,7 @@ class BoolOption(CWrapPlugin):
for arg in option['arguments']:
if self.is_bool_option(arg):
arg['is_bool_option'] = True
if isinstance(arg['if_true'], string_type):
arg['type'] = 'const char*'
arg['type'] = 'const char*'
return declarations
def get_type_check(self, arg, option):

View File

@ -1,318 +0,0 @@
from . import CWrapPlugin
from string import Template
# Arguments to the Broadcast Plugin:
# broadcast: args_to_broadcast_against [inplace] [fallback]
# [args_to_broadcast_against]: either a single argument (e.g. "arg1") or a comma-separated
# list of two arguments (e.g. "tensor1,tensor2") indicating
# arguments to broadcast specified argument (usually "self") against
# [inplace] will generate code for in-place function, which doesn't allow the in-place
# argument to be broadcast
# [fallback] if tensors aren't broadcastable, preserves "element number" pointwise behavior,
# where only the number of elements needs to match, and tensors are viewed as 1-dimensional.
# [dims] specifies that the tensors shouldn't be broadcast to a specific tensor or tensors, but to a combination
# of individual dimension sizes of a set of tensors. For example: addbmm(C,A,B) a.k.a. [C + A @ B]
# broadcasts C to the first dimension of A and the second dimension of B. Each dimension is specified as
# [arg].dim[#] and dimensions are comma-separated. So, to specify that the tensor should be
# broadcast to 3-dimensions with sizes:
# tensor0->size[0] x tensor1->size[1] x tensor2->size[2]
# you would write:
# dims:tensor0.dim0,tensor1.dim1,tensor2.dim2
# [types] if the tensors should be of different types than THTensor, specify as X where
# the actual type to use is THXTensor (i.e. Byte for THByteTensor). If the type
# should be THTensor, use 'Real'
# For out of place:
# Two args: expand the two args together
# Three args (fused kernels): (e.g. addcmul) expand all three args together
# Sketch of proof that this is the same:
# consider addcmul, under expansion we want: a + (b * c) = (a + b * c) [all expanded together]
# Let e(i, j) be the expansion of i with j, e(i, j, k) be the expansion of i with j,k
#
# Then a + (b * c) = e(a, e(b,c) * e(c,b)) + e(e(b,c) * e(c,b), a)
# = e(a, e(b,c)) + e(e(b,c) * e(c,b), a) (only size matters for second param)
# = e(a,b,c) + e(e(b,c) * e(c,b), a) (by associativity of max in expand)
# = e(a,b,c) + e(b,c,a) * e(c,b,a) (see L1)
# which is a + b * c all expanded together
#
# L1: Show e(i * j, a) = e(i,a) * e(j,a) where i,j have same size
# Consider any index _{ s_0, ..., s_n}
# e(i * j, a) = (i*j)_{f(s_0), ...,f(s_n)} where f is the expansion of that dimension with a
# = i_{f(s_0), ..., f(s_n)} * j_{f(s_0), ..., f(s_n)} by definition of pointwise operator
# = e(i,a) * e(j,a)
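The identity L1 can be spot-checked numerically; a small sketch (illustrative only, not part of the plugin):

```python
import torch

# L1: expanding a pointwise product equals the product of the expansions,
# when i and j have the same size.
i = torch.randn(3, 1)
j = torch.randn(3, 1)
lhs = (i * j).expand(3, 4)
rhs = i.expand(3, 4) * j.expand(3, 4)
assert (lhs - rhs).abs().max() < 1e-6
```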
class Broadcast(CWrapPlugin):
# Save and restore passed-in arguments in case later plugins use them.
POST_TEMPLATE = Template(
"""${arg_op_other} = ${arg_op_other}_save;\n""")
def getPreArgStringTemplate(self, type=None):
if type is None:
ret = """THTensor *${arg_op_other}_save = ${arg_op_other};
THTensorPtr ${arg_op_other}_guard(THTensor_(new)(LIBRARY_STATE_NOARGS));\n"""
else:
cpu_t = "TH" + type + "Tensor"
gpu_t = "THCuda" + type + "Tensor"
ret = ("#if !IS_CUDA\n" +
cpu_t + " *${arg_op_other}_save = ${arg_op_other};\n" +
cpu_t + "Ptr ${arg_op_other}_guard(" + cpu_t + "_new(LIBRARY_STATE_NOARGS));\n" +
"#else\n" +
gpu_t + " *${arg_op_other}_save = ${arg_op_other};\n" +
"THPPointer<" + gpu_t + "> ${arg_op_other}_guard(\n" + gpu_t + "_new(LIBRARY_STATE_NOARGS));\n" +
"#endif\n")
return Template(ret)
def getExpandTemplate(self, expand_call, success_code, raise_errors):
if not raise_errors:
return Template(
"bool expand_success = false;\n" +
"try {\n" +
expand_call +
"\nexpand_success = true;\n" +
"}\n"
"catch (std::exception &e) {}\n" +
"if(expand_success) {\n" +
success_code +
"\n}\n")
else:
return Template(
expand_call + "\n" +
success_code + "\n")
def getOutPlacePreExpand2Template(self, raise_errors):
expand_code = """expand_outplace2(LIBRARY_STATE ${arg_op_a}_guard.get(), ${arg_op_other}_guard.get(),
${arg_op_a}, ${arg_op_other},
\"${op_a}\", \"${op_other}\", !${raise_errors});"""
success_code = """${arg_op_a} = ${arg_op_a}_guard.get();
${arg_op_other} = ${arg_op_other}_guard.get();"""
return self.getExpandTemplate(expand_code, success_code, raise_errors)
def getOutPlacePreExpand3Template(self, raise_errors):
expand_code = """expand_outplace3(LIBRARY_STATE ${arg_op_a}_guard.get(),
${arg_op_other1}_guard.get(), ${arg_op_other2}_guard.get(),
${arg_op_a}, ${arg_op_other1}, ${arg_op_other2},
\"${op_a}\", \"${op_other1}\", \"${op_other2}\", !${raise_errors});"""
success_code = """${arg_op_a} = ${arg_op_a}_guard.get();
${arg_op_other1} = ${arg_op_other1}_guard.get();
${arg_op_other2} = ${arg_op_other2}_guard.get();"""
return self.getExpandTemplate(expand_code, success_code, raise_errors)
OUT_PLACE_PRE_EXPAND_PRE_DIM_TEMPLATE = Template(
"""if(THTensor_(nDimension)(LIBRARY_STATE ${arg_op_dim}) <= ${arg_op_dim_value}) {
THError("Argument %s requires at least %d dimensions, but only has %d",
"${op_dim}", ${arg_op_dim_value} + 1, THTensor_(nDimension)(LIBRARY_STATE ${arg_op_dim}));
}
long ${arg_op_a}_dim${idx}_size = THTensor_(size)(LIBRARY_STATE ${arg_op_dim}, ${arg_op_dim_value});\n""")
OUT_PLACE_PRE_EXPAND1_DIM_TEMPLATE = Template(
"""THLongStoragePtr ${arg_op_a}_storage(THLongStorage_newWithSize1(${arg_op_a}_dim0_size));\n""")
OUT_PLACE_PRE_EXPAND2_DIM_TEMPLATE = Template(
"""THLongStoragePtr ${arg_op_a}_storage(
THLongStorage_newWithSize2(${arg_op_a}_dim0_size, ${arg_op_a}_dim1_size));\n""")
OUT_PLACE_PRE_EXPAND3_DIM_TEMPLATE = Template(
"""THLongStoragePtr ${arg_op_a}_storage(
THLongStorage_newWithSize3(${arg_op_a}_dim0_size, ${arg_op_a}_dim1_size, ${arg_op_a}_dim2_size));\n""")
def getOutPlacePreExpandPostDimTemplate(self, raise_errors):
expand_code = """expand(LIBRARY_STATE ${arg_op_a}_guard.get(), ${arg_op_a}, ${arg_op_a}_storage);"""
success_code = """${arg_op_a} = ${arg_op_a}_guard.get();"""
return self.getExpandTemplate(expand_code, success_code, raise_errors)
OUT_PLACE_PRE_TEMPLATE = Template(
"""${code_arg_op_a}${code_arg_op_other1}${code_arg_op_other2}
${expand_code}""")
def getInPlacePreExpand1Template(self, raise_errors):
expand_code = """expand_inplace1(LIBRARY_STATE ${arg_op_other}_guard.get(), ${arg_op_other}, ${arg_op_a},
\"${op_other}\", \"${op_a}\", !${raise_errors});"""
success_code = """${arg_op_other} = ${arg_op_other}_guard.get();"""
return self.getExpandTemplate(expand_code, success_code, raise_errors)
def getInPlacePreExpand2Template(self, raise_errors):
expand_code = """expand_inplace2(LIBRARY_STATE ${arg_op_other1}_guard.get(), ${arg_op_other2}_guard.get(),
${arg_op_other1}, ${arg_op_other2}, ${arg_op_a},
\"${op_other1}\", \"${op_other2}\", \"${op_a}\", !${raise_errors});"""
success_code = """${arg_op_other1} = ${arg_op_other1}_guard.get();
${arg_op_other2} = ${arg_op_other2}_guard.get();"""
return self.getExpandTemplate(expand_code, success_code, raise_errors)
IN_PLACE_PRE_TEMPLATE = Template(
"""${code_arg_op_other1}${code_arg_op_other2}
${expand_code}""")
def initialize(self, cwrap):
self.cwrap = cwrap
# Arguments:
# [0]: name of tensor to broadcast with (possibly two comma separated)
# [1] inplace (optional). In place operations only broadcast on second tensor argument
# [2] fallback (optional). Will fallback to applying to tensor of equal nElem if broadcast fails
def process_option_code_template(self, template, option):
new_code_pre = []
new_code_post = []
for _, arg in enumerate(option['arguments']):
if 'broadcast' not in arg:
continue
params = arg.get('broadcast').split(" ")
op_a = arg.get('assign_name', arg['name'])
in_place = "inplace" in params
raise_errors = "false" if "fallback" in params else "true"
param_others = params[0].split(",")
if len(param_others) > 2:
raise ValueError('Broadcast only supports up to 2 secondary parameters')
op_b = param_others[0]
op_c = param_others[1] if len(param_others) == 2 else None
arg_op_b = "arg_" + op_b
arg_op_a = "arg_" + op_a
arg_op_c = ("arg_" + op_c) if op_c else None
dims_kvs = []
for p in params:
if p.startswith("dims:"):
assert(raise_errors == "true")
if len(dims_kvs) != 0:
raise ValueError("multiple specifications of dims")
dims = p[len("dims:"):].split(",")
for dim in dims:
batchdim = dim.split(".")
assert len(batchdim) == 2
assert batchdim[1].startswith("dim")
dim_val = batchdim[1][len("dim"):]
dims_kvs.append({"op": batchdim[0], "arg_op": "arg_" + batchdim[0], "val": dim_val})
assert len(dims_kvs) <= 3
for p in params[1:]:
if p != "inplace" and p != "fallback" and not p.startswith("dims:") and not p.startswith("types:"):
raise ValueError("invalid parameter {}".format(p))
type_op_b = None
type_op_c = None
for p in params:
if p.startswith("types:"):
if not in_place and len(dims_kvs) > 0:
raise ValueError("type specification not supported yet for out-of-place functions "
"that specify explicit dimensions")
types = p[len("types:"):].split(",")
assert(len(types) == (2 if op_c else 1))
type_op_b = None if types[0] == "Real" else types[0]
if op_c:
type_op_c = None if types[1] == "Real" else types[1]
op_b_mapping = {
"op_a": op_a,
"op_other": op_b,
"arg_op_a": arg_op_a,
"arg_op_other": arg_op_b,
"raise_errors": raise_errors
}
op_c_mapping = {
"op_a": op_a,
"op_other": op_c,
"arg_op_a": arg_op_a,
"arg_op_other": arg_op_c,
"raise_errors": raise_errors
}
if in_place:
code_arg_op_other1 = self.getPreArgStringTemplate(type=type_op_b).substitute(op_b_mapping)
code_arg_op_other2 = (
self.getPreArgStringTemplate(type=type_op_c).substitute(op_c_mapping) if op_c else "")
if op_c:
expand_code = self.getInPlacePreExpand2Template(raise_errors == "true").substitute(
op_b_mapping,
op_other1=op_b,
op_other2=op_c,
arg_op_other1=arg_op_b,
arg_op_other2=arg_op_c)
else:
expand_code = self.getInPlacePreExpand1Template(raise_errors == "true").substitute(op_b_mapping)
new_code_pre.append(self.IN_PLACE_PRE_TEMPLATE.substitute(
arg_op_a=arg_op_a,
code_arg_op_other1=code_arg_op_other1,
code_arg_op_other2=code_arg_op_other2,
expand_code=expand_code,
raise_errors=raise_errors))
new_code_pre.append("")
post_code = self.POST_TEMPLATE.substitute(op_b_mapping)
if op_c:
post_code += self.POST_TEMPLATE.substitute(op_c_mapping)
new_code_post.append(post_code)
new_code_post.append("")
else:
if len(dims_kvs) != 0:
code_arg_op_a = self.getPreArgStringTemplate().substitute(arg_op_other=arg_op_a)
code_arg_op_other1 = ""
code_arg_op_other2 = ""
expand_code = ""
for idx, kv in enumerate(dims_kvs):
expand_code += self.OUT_PLACE_PRE_EXPAND_PRE_DIM_TEMPLATE.substitute(
arg_op_a=arg_op_a,
op_dim=kv["op"],
arg_op_dim=kv["arg_op"],
arg_op_dim_value=kv["val"],
idx=idx)
if len(dims_kvs) == 1:
expand_code += self.OUT_PLACE_PRE_EXPAND1_DIM_TEMPLATE.substitute(
arg_op_a=arg_op_a,
arg_op_dim0=dims_kvs[0]["arg_op"])
elif len(dims_kvs) == 2:
expand_code += self.OUT_PLACE_PRE_EXPAND2_DIM_TEMPLATE.substitute(
arg_op_a=arg_op_a,
arg_op_dim0=dims_kvs[0]["arg_op"],
arg_op_dim1=dims_kvs[1]["arg_op"])
else:
expand_code += self.OUT_PLACE_PRE_EXPAND3_DIM_TEMPLATE.substitute(
arg_op_a=arg_op_a,
arg_op_dim0=dims_kvs[0]["arg_op"],
arg_op_dim1=dims_kvs[1]["arg_op"],
arg_op_dim2=dims_kvs[2]["arg_op"])
expand_code += self.getOutPlacePreExpandPostDimTemplate(raise_errors == "true").substitute(
arg_op_a=arg_op_a,
raise_errors=raise_errors)
post_code = self.POST_TEMPLATE.substitute(arg_op_other=arg_op_a)
else:
code_arg_op_a = self.getPreArgStringTemplate().substitute(arg_op_other=arg_op_a)
code_arg_op_other1 = self.getPreArgStringTemplate(type=type_op_b).substitute(op_b_mapping)
code_arg_op_other2 = (self.getPreArgStringTemplate(type=type_op_c).substitute(op_c_mapping)
if op_c else "")
if op_c:
expand_code = self.getOutPlacePreExpand3Template(raise_errors == "true").substitute(
op_b_mapping,
op_other1=op_b,
op_other2=op_c,
arg_op_other1=arg_op_b,
arg_op_other2=arg_op_c)
else:
expand_code = self.getOutPlacePreExpand2Template(
raise_errors == "true").substitute(op_b_mapping)
post_code = self.POST_TEMPLATE.substitute(arg_op_other=arg_op_a)
post_code += self.POST_TEMPLATE.substitute(op_b_mapping)
post_code += self.POST_TEMPLATE.substitute(op_c_mapping) if op_c else ""
new_code_pre.append(self.OUT_PLACE_PRE_TEMPLATE.substitute(
code_arg_op_a=code_arg_op_a,
code_arg_op_other1=code_arg_op_other1,
code_arg_op_other2=code_arg_op_other2,
expand_code=expand_code))
new_code_pre.append("")
new_code_post.append(post_code)
new_code_post.append("")
template = new_code_pre + template + new_code_post
return template

View File

@ -135,7 +135,7 @@ static PyObject * $name(PyObject *self, PyObject *args, PyObject *kwargs)
if arg['name'] in ['self', 'state', 'dataType', 'handle']:
arg['ignore_check'] = True
declaration['options'] = self.filter_unique_options(declaration['options'])
return [d for d in declarations if not d.get('only_register', False)]
return declarations
def filter_unique_options(self, options):
def signature(option):

View File

@ -23,8 +23,6 @@ class GILRelease(CWrapPlugin):
]
def process_option_code_template(self, template, option):
if option.get('with_gil', False):
return template
call_idx = template.index('$call')
template.insert(call_idx, self.BEFORE_CALL)
template.insert(call_idx + 2, self.AFTER_CALL)

View File

@ -30,8 +30,10 @@ class KwargsPlugin(CWrapPlugin):
for option in declaration['options']:
offset = 0
for arg in option['arguments']:
if arg.get('kwarg_only'):
arg['no_idx'] = True
if arg.get('kwarg_only') and not arg.get('ignore_check', False):
offset += 1
else:
arg['kwarg_offset'] = offset
return declarations
def get_arg_accessor(self, arg, option):
@ -39,14 +41,14 @@ class KwargsPlugin(CWrapPlugin):
return
if arg.get('kwarg_only'):
return self.KWARG_ONLY_ACCESSOR_TEMPLATE.substitute(name=arg['name'])
return self.ACCESSOR_TEMPLATE.substitute(idx=arg['idx'], name=arg['name'])
return self.ACCESSOR_TEMPLATE.substitute(idx=arg['idx'] - arg['kwarg_offset'], name=arg['name'])
def process_single_check(self, code, arg, arg_accessor):
if arg.get('no_kwargs'):
return code
if arg.get('kwarg_only'):
return self.KWARG_ONLY_CHECK_TEMPLATE.substitute(name=arg['name'], code=code)
return self.CHECK_TEMPLATE.substitute(idx=arg['idx'], name=arg['name'], code=code)
return self.CHECK_TEMPLATE.substitute(idx=arg['idx'] - arg['kwarg_offset'], name=arg['name'], code=code)
def process_wrapper(self, code, declaration):
if declaration.get('no_kwargs'):

View File

@ -1,18 +1,58 @@
import os
from copy import deepcopy
from . import CWrapPlugin
from itertools import product
from ...shared import cwrap_common
class OptionalArguments(CWrapPlugin):
def process_declarations(self, declarations):
new_options = []
for declaration in declarations:
cwrap_common.enumerate_options_due_to_default(
declaration,
allow_kwarg=True,
type_to_signature={},
remove_self=False)
for option in declaration['options']:
optional_args = []
for i, arg in enumerate(option['arguments']):
if 'default' in arg:
optional_args.append(i)
for permutation in product((True, False), repeat=len(optional_args)):
option_copy = deepcopy(option)
for i, bit in zip(optional_args, permutation):
arg = option_copy['arguments'][i]
if not bit:
arg['type'] = 'CONSTANT'
arg['ignore_check'] = True
# PyYAML interprets NULL as None...
arg['name'] = 'NULL' if arg['default'] is None else arg['default']
new_options.append(option_copy)
declaration['options'] = self.filter_unique_options(new_options)
return declarations
def filter_unique_options(self, options):
def signature(option, kwarg_only_count):
if kwarg_only_count == 0:
kwarg_only_count = None
else:
kwarg_only_count = -kwarg_only_count
arg_signature = '#'.join(
arg['type']
for arg in option['arguments'][:kwarg_only_count]
if not arg.get('ignore_check'))
if kwarg_only_count is None:
return arg_signature
kwarg_only_signature = '#'.join(
arg['name'] + '#' + arg['type']
for arg in option['arguments'][kwarg_only_count:]
if not arg.get('ignore_check'))
return arg_signature + "#-#" + kwarg_only_signature
seen_signatures = set()
unique = []
for option in options:
for num_kwarg_only in range(0, len(option['arguments']) + 1):
sig = signature(option, num_kwarg_only)
if sig not in seen_signatures:
if num_kwarg_only > 0:
for arg in option['arguments'][-num_kwarg_only:]:
arg['kwarg_only'] = True
unique.append(option)
seen_signatures.add(sig)
break
return unique

View File

@ -1,90 +0,0 @@
from copy import deepcopy
from . import CWrapPlugin
import yaml
class ProcessorSpecificPlugin(CWrapPlugin):
def process_declarations(self, declarations):
# In order to move Torch's random functions into the same cwrap
# declaration, we need to be able to handle the fact that on the CPU
# these functions take a generator argument, while on the GPU, they
# do not. As such, we would like to split those declarations at cwrap
# runtime into two separate declarations, one for the CPU (unchanged),
# and one for the GPU (with the generator argument removed).
#
# For example, the declaration arguments:
# arguments:
# - THTensor* self
# - arg: THGenerator* generator
# default: THPDefaultGenerator->cdata
# kwarg_only: True
#
# Would have the generator argument removed when generating for the GPU
# backend.
def arg_contains_generator(arg):
return (arg['type'] == 'THGenerator*' or (arg.get('default', None)
is not None and 'THPDefaultGenerator' in
str(arg.get('default', ""))))
def split_candidate(declaration):
# First, check and see if it is a declaration for both CPU/GPU
if all([proc in declaration['backends'] for
proc in ['CPU', 'CUDA']]):
for option in declaration['options']:
for argument in option['arguments']:
if arg_contains_generator(argument):
return True
return False
def can_we_handle_the_split(declaration):
# hook into here if the split cannot happen for some reason
return True
def generator_split(declaration):
# the split must make two changes: 1. remove the generator argument
# for the GPU, and 2. assign the correct backends/types to the
# split declaration
dec_cpu = declaration
dec_gpu = deepcopy(declaration)
# Remove GPU backend and types from dec_cpu
dec_cpu['backends'].remove('CUDA')
if dec_cpu.get('backend_type_pairs', False):
dec_cpu['backend_type_pairs'] = (
[pair for pair in dec_cpu['backend_type_pairs'] if
pair[1] == 'CPU'])
# also need to reach into options
for option in dec_cpu['options']:
option['backends'].remove('CUDA')
# Remove CPU backend and types from dec_gpu
dec_gpu['backends'].remove('CPU')
if dec_gpu.get('backend_type_pairs', False):
dec_gpu['backend_type_pairs'] = (
[pair for pair in dec_gpu['backend_type_pairs'] if
pair[1] == 'CUDA'])
# also need to reach into options
for option in dec_gpu['options']:
option['backends'].remove('CPU')
# Remove generator arguments from dec_gpu options
for option in dec_gpu['options']:
option['arguments'] = (
[arg for arg in option['arguments'] if
not arg_contains_generator(arg)])
return [dec_cpu, dec_gpu]
decs = []
for declaration in declarations:
if split_candidate(declaration):
assert(can_we_handle_the_split(declaration))
newdecs = generator_split(declaration)
decs.extend(newdecs)
else:
decs.append(declaration)
return decs

View File

@ -127,7 +127,7 @@ PyObject * $name(PyObject *self, PyObject *args, PyObject *kwargs)
""")
ALLOCATE_TMPL = Template("""\
THP${type}TensorPtr _${name}_guard((THP${type}Tensor*) THP${type}Tensor_NewEmpty());
THP${type}TensorPtr _${name}_guard = (THP${type}Tensor*) THP${type}Tensor_NewEmpty();
if (!_${name}_guard.get()) return NULL;
THP${type}Tensor* $name = _${name}_guard.get();
""")
@ -334,92 +334,8 @@ ${cpu}
for option in declaration['options']
for arg in option['arguments'])
def backends_types_to_defined_if_string(declaration):
# A declaration has two fields: 'backend', which stores a list of
# backends (currently 'cpu' and 'cuda') the declaration applies
# to, and 'types', which stores a list of real types the
# declaration applies to. In PyTorch, when a function is only
# supported by a subset of types, we wrap it in macro definition
# checks.
#
# Previously, we manually required the cwrap declaration to
# specify for which backend/type combinations a function was
# defined. Now, we explicitly list the types and backends for
# a declaration, if it should only be supported for a specific
# subset of types, backends, or type-backend pairs.
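# For example (illustrative): backends == ['CUDA'] and
# types == ['floating_point'] would produce the guard string
# "CUDA_DOUBLE || CUDA_FLOAT || CUDA_HALF".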
types = declaration.get('types', [])
backends = declaration['backends']
all_backends = ['CPU', 'CUDA']
def get_defined_string(backend, real):
if backend == 'CUDA':
if real == 'all':
return "IS_CUDA"
else:
return 'CUDA_{0}'.format(real.upper())
else:
if real == 'all':
return "!IS_CUDA"
else:
return 'defined(TH_REAL_IS_{0})'.format(real.upper())
def expand_composite_type(p, t):
if t == 'floating_point':
result = ['double', 'float']
if p == 'CUDA':
result.append('half')
elif t == 'integral':
result = ['byte', 'char', 'short', 'int', 'long']
else:
result = [t]
return result
defineds = []
# The logic below does not handle corner cases well. We allow the
# declaration to have a field 'backend_type_pairs' that stores a
# dictionary from type --> backend representing allowed
# combinations. Let's use these first.
for pair in declaration.get('backend_type_pairs', []):
p, t = pair
defineds.extend([get_defined_string(p, et) for et in
expand_composite_type(p, t)])
# In the base case, types is empty and backends contains both
# 'CPU' and 'CUDA' --> this means we support all types, and our
# string should be empty, or simply the list of explicit type
# backend pairs
if (len(types) == 0 and all([proc in backends for proc in
all_backends])):
return " || ".join(defineds)
# Case 2: types is empty, but only one backend type is specified
if len(types) == 0 and len(backends) == 1:
defineds.append('IS_CUDA' if backends[0] == 'CUDA' else
"!IS_CUDA")
return " || ".join(defineds)
# Else, we loop over all of the backend, type pairs and add
# them
for p in backends:
for t in types:
defineds.extend([get_defined_string(p, et) for et in
expand_composite_type(p, t)])
return " || ".join(defineds)
for declaration in declarations:
# Disable all methods for THHalfTensor, unless cpu_half is True
dfstr = backends_types_to_defined_if_string(declaration)
if len(dfstr) > 0:
# for now, need to check for distributed defined if as well
if 'defined_if' in declaration:
declaration['defined_if'] += ' && (' + dfstr + ')'
else:
declaration['defined_if'] = dfstr
if not declaration.get('cpu_half', False):
defined_if = '!defined(TH_REAL_IS_HALF)'
if 'defined_if' in declaration:
@ -439,23 +355,15 @@ ${cpu}
declaration['variables'] += ['PyObject *__out;']
self.generate_out_options(declaration)
if has_long_args(declaration):
for option in declaration['options']:
for arg in option['arguments']:
if arg.get('long_args', False):
arg['no_kwargs'] = True
declaration['no_kwargs'] = True
for option in declaration['options']:
option['cname'] = 'TH{}Tensor_({})'.format(
'S' if option.get('sparse', False) else '', option['cname'])
if option.get('sparse', False):
defined_if = option.get('defined_if', '')
option['defined_if'] = '!IS_DISTRIBUTED' + (' && ' if defined_if else '') + defined_if
variants = declaration.get('variants', ['method'])
if 'function' in variants:
if declaration.get('with_stateless', False) or declaration.get('only_stateless', False):
stateless_declaration = self.make_stateless(declaration)
new_declarations.append(stateless_declaration)
self.stateless_declarations.append(stateless_declaration)
if 'method' not in variants:
if declaration.get('only_stateless', False):
continue
self.declarations.append(declaration)
@ -468,13 +376,9 @@ ${cpu}
register_only = [d for d in declarations if d.get('only_register', False)]
declarations = [d for d in declarations
if (('method' in d.get('variants', ['method'])) and
(not d.get('only_register', False)))]
self.declarations.extend(filter(lambda x: 'method' in x.get('variants',
['method']), register_only))
self.stateless_declarations.extend(filter(lambda x: 'method' not in
x.get('variants', ['method']),
register_only))
if (not d.get('only_stateless', False)) and (not d.get('only_register', False))]
self.declarations.extend(filter(lambda x: not x.get('only_stateless', False), register_only))
self.stateless_declarations.extend(filter(lambda x: x.get('only_stateless', False), register_only))
self.process_docstrings()
@ -516,7 +420,7 @@ ${cpu}
sparse=('' if not sparse else 'S'),
)
if sparse:
generated = '#if !defined(TH_REAL_IS_HALF) && !IS_DISTRIBUTED\n' + generated + '\n#endif\n\n'
generated = '#ifndef TH_REAL_IS_HALF\n' + generated + '\n#endif\n\n'
return generated
def process_full_file(self, code):
@ -557,24 +461,11 @@ ${cpu}
if any(arg.get('long_args', False) for arg in option['arguments']):
code = code.replace('__argcount ==', '__argcount >=')
expected = str(int(option.get('output_provided', False)) +
sum(not arg.get('no_kwargs', False) and not arg.get('ignore_check', False)
for arg in option['arguments']))
expected = str(int(option.get('output_provided', False)))
code = '__dictcount == ' + expected + ' &&\n ' + code
return code
def process_option_code(self, code, option):
if option.get('defined_if', ''):
defined_if = option['defined_if']
placeholder = ''
# This means that it's a first option, so we need a dummy if,
# so the next option can be an else if.
if 'else if' not in code:
placeholder = '\n #else\n if (false) {'
return '#if ' + defined_if + '\n ' + code + placeholder + '\n #endif\n'
return code
def process_pre_arg_assign(self, template, option):
new_args = []
for arg in option['arguments']:

View File

@ -1,422 +1,55 @@
class CWrapPlugin(object):
"""Base class from which all cwrap plugins should inherit.
Override any of the following methods to implement the desired wrapping
behavior.
"""
def initialize(self, cwrap):
"""Initialize the Plugin class prior to calling any other functions.
It is used to give the Plugin access to the cwrap object's helper
functions and state.
Args:
cwrap: the cwrap object performing the wrapping.
"""
pass
def get_type_check(self, arg, option):
"""Used to generate code for runtime checks of object types.
The type can be found in arg['type']. For example, it could be
THTensor*. If this Plugin recognizes the type in arg, it should
return a Template string containing code that checks whether a
Python object is of this type. For example, the return type in
this case would be:
Template('(PyObject*)Py_TYPE($arg) == THPTensorClass')
As a simpler example, if the type == 'bool' then we would return:
Template('PyBool_Check($arg)')
Note that the name of the identifier that will be substituted must be
$arg.
Args:
arg: a Python object with a 'type' field representing the type
to generate a check string for.
option: dictionary containing the information for this specific
option.
Returns:
A Template string as described above, or None if this Plugin does
not have a corresponding type check for the passed type.
"""
pass
def get_type_unpack(self, arg, option):
"""Used to generate code unpacking of Python objects into C types.
Similar to get_type_check, but for unpacking Python objects into their
corresponding C types. The type is once again accessible via
arg['type']. This time we return a Template string that unpacks an
object. For a THTensor*, we know that the corresponding PyTorch type is
a THPTensor*, so we need to get the cdata from the object. So we would
return:
Template('((THPTensor*)$arg)->cdata')
For a simpler type, such as a long, we could do:
Template('PyLong_AsLong($arg)')
though in practice we will use our own custom unpacking code. Once
again, $arg must be used as the identifier.
Args:
arg: a Python object with a 'type' field representing the type
to generate an unpack string for.
option: dictionary containing the information for this specific
option.
Returns:
A Template string as described above, or None if this Plugin does
not have a corresponding type unpack for the passed type.
"""
pass
def get_return_wrapper(self, option):
"""Used to generate code wrapping a function's return value.
Wrapped functions should always return a PyObject *. However,
internally, the code will be working with C objects or primitives.
Therefore, if a function has a return value we need to convert it back
to a PyObject * before the function returns. Plugins can override this
function to generate wrapper code for returning specific C types. The
type is accessible via option['return'].
Continuing on with our THTensor* example, we might do something like:
Template('return THPTensor_(New)($result);')
In general, you want to do return <statement>; In this case, we call
into THP's library routine that takes a THTensor* (the $result
identifier) and returns a PyObject *.
For a bool, we could do Template('return PyBool_FromLong($result);').
Note that in other cases, our logic might be more complicated. For
example, if our return value is also an argument to the function call,
we could need to increase the reference count prior to returning.
Args:
option: dictionary containing the information for this specific
option.
Returns:
A Template string as described above, or None if this Plugin does
not have a corresponding return wrapper for the functions return
type or specifier.
"""
pass
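
As a concrete illustration of the type-related hooks above (get_type_check, get_type_unpack, get_return_wrapper), here is a minimal, hypothetical plugin sketch for a `bool` argument/return type; the class name and C expressions are illustrative and not part of this diff.

```
from string import Template

# Hypothetical sketch only: a plugin handling a 'bool' type, following the
# Template conventions described in the docstrings above.
class BoolPlugin(object):  # would subclass CWrapPlugin in practice
    def get_type_check(self, arg, option):
        if arg['type'] == 'bool':
            # $arg is later substituted with the accessor for this argument
            return Template('PyBool_Check($arg)')

    def get_type_unpack(self, arg, option):
        if arg['type'] == 'bool':
            return Template('($arg == Py_True)')

    def get_return_wrapper(self, option):
        if option.get('return') == 'bool':
            # $result is substituted with the C expression holding the result
            return Template('return PyBool_FromLong($result);')
```
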
def get_wrapper_template(self, declaration):
"""Used to create a code template to wrap the options.
This function returns a Template string that contains the function call
for the overall declaration, including the method definition, opening
and closing brackets, and any additional code within the method body.
Look through the examples to get a sense of what this might look like.
The only requirements are that it contains unsubstituted template
identifiers for anything the cwrap engine expects.
Note that for any declaration only one Plugin can generate the wrapper
template.
Args:
declaration: the declaration for the wrapped method.
Returns:
A template string representing the entire function declaration,
with identifiers as necessary.
"""
pass
def get_assign_args(self, arguments):
"""Used to modify argument metadata prior to assignment.
We have already setup argument checking, and how to unpack arguments.
This function allows you to modify the metadata of an argument prior to
actually performing the assignment. For example, you might want to
check that an argument is of a specific type, but when unpacking it you
might want to treat it as a different type. This function will allow
you to do stuff like that --> e.g. you could set the 'type' field for a
particular argument to be something else.
Args:
arguments: a list of argument metadata dictionaries.
Returns:
The same list of arguments, with any modifications as you see fit.
"""
pass
def get_arg_accessor(self, arg, option):
"""Used to generate a string for accessing the passed arg.
One of the key components of the YAML definition for a method to be
wrapped are the arguments to that method. Override this function to
show how to access that specific arg in the code. For example, you
might do something different if the argument is a keyword argument, or
a constant, or self. The base cwrap plugin has a fallback arg accessor
for loading elements from the args PyObject * tuple passed to the
function.
It's best to look at some of the existing Plugins to get a sense of what
one might do.
Args:
arg: a dictionary specifying attributes of the arg to be accessed
option: dictionary containing the information for this specific
option.
Returns:
A string (note: not a Template string!) of code that can be used
to access the given arg. If the plugin does not know how to access
the arg, return None.
"""
pass
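
A hedged sketch of what an arg accessor override might look like — the fallback described above loads positional items from the args tuple, while a plugin could special-case `self`; the field names used below are illustrative only.

```
# Hypothetical sketch only: an accessor that special-cases 'self' and falls
# back to positional items of the args tuple, as described above.
class SelfArgPlugin(object):  # would subclass CWrapPlugin in practice
    def get_arg_accessor(self, arg, option):
        if arg.get('name') == 'self':
            return '(PyObject*)self'
        if 'idx' in arg:  # hypothetical positional-index field
            return 'PyTuple_GET_ITEM(args, {})'.format(arg['idx'])
        return None  # let other plugins / the default accessor handle it
```
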
def process_full_file(self, code):
"""Used to modify the code for the entire output file.
The last thing any plugin can do. Code contains the results of wrapping
all the declarations. The plugin can do things like adding header
guards, include statements, etc.
Args:
code: a string source code for the wrapped declarations.
Returns:
The same code, modified as the plugin sees fit.
"""
return code
def process_single_check(self, code, arg, arg_accessor):
"""Used to postprocess a type check.
Above we defined a function get_type_check that returns a Template
string that allows for type checking a PyObject * for a specific type.
In this function, the passed "code" is a combination of that type check
along with a specific arg_accessor pasted in. For example:
'(PyObject*)Py_TYPE(PyTuple_GET_ITEM(args, 1)) == THPTensorClass'
This function can be overridden to support modifying this check string.
For example, if an argument can be null, we might want to check and see
if the type is Py_None, as well.
Args:
code: The string code representing a type check for a specific
argument being accessed.
arg: dictionary containing properties of that specific argument
arg_accessor: the arg_accessor string for that specific argument.
Note that this is likely also embedded in code, but if you want to
be able to access this arg and throw away the other code, you can
do so.
Returns:
A string representing the processed check/access string for this
arg. If the plugin does not know how to modify a specific input, it
should return the original code.
"""
return code
def process_all_checks(self, code, option):
"""Used to generate additional checks based on all the individual ones.
After individually processing each argument with get_type_check,
get_arg_accessor, process_single_check, this function allows you to
inspect the combined checks and do any additional checking/modify that
string as you see fit. In particular, given code is a string like:
CHECK_TYPE(GET_ARG(0)) && CHECK_TYPE(GET_ARG(1)) && ..
We can process it as we see fit. For example, we may want to add a
check at the beginning that we have the specified number of arguments.
Args:
code: A string representing each argument check separated by an
'&&'. code can be None if there are no arguments to be checked.
option: dictionary containing the information for this specific
option.
Returns:
The modified code string with any additional checks, or just the
existing code if no modifications are to be made.
"""
return code
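
For the two check hooks above, a hedged sketch: a single check can be relaxed to also accept Py_None for nullable arguments, and the combined check string can be prefixed with an argument-count guard; the `nullable` flag and identifiers below are illustrative.

```
# Hypothetical sketch only, following the descriptions above.
class CheckTweaksPlugin(object):  # would subclass CWrapPlugin in practice
    def process_single_check(self, code, arg, arg_accessor):
        if arg.get('nullable', False):
            return '({} || {} == Py_None)'.format(code, arg_accessor)
        return code

    def process_all_checks(self, code, option):
        if code is None:
            return code
        nargs = len(option['arguments'])
        return '__argcount == {} &&\n          {}'.format(nargs, code)
```
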
def process_single_unpack(self, code, arg, arg_accessor):
"""Used to postprocess a type unpack.
Same as process_single_check above, but for type unpacking. E.g. an
example code could be:
PyLong_FromLong(PyTuple_GET_ITEM(args, 0))
And this code could modify that as it sees fit. For example, if the
result of accessing the argument is None, we would not want to call the
unpacking code.
Args:
code: The string code representing a type unpack for a specific
argument being accessed.
arg: dictionary containing properties of that specific argument
arg_accessor: the arg_accessor string for that specific argument.
Note that this is likely also embedded in code, but if you want to
be able to access this arg and throw away the other code, you can
do so.
Returns:
A string representing the processed unpack/access string for this
arg. If the plugin does not know how to modify a specific input, it
should return the original code.
"""
return code
def process_all_call_arg(self, code, option):
"""Used to modify the arguments to the underlying C function call.
Code is the string of comma-separated arguments that will be passed to
the wrapped C function. You can use this function to modify that string
as you see fit. For example, THP prepends the LIBRARY_STATE definition
so that the generated code will follow the conventions it uses for
writing one function for both TH/THC calls.
Args:
code: A string as described above.
option: dictionary containing the information for this specific
option.
Returns:
The same code, modified as the plugin sees fit.
"""
return code
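
Similarly, a hedged sketch of the unpack/call hooks just described — guarding an unpack against None and prepending LIBRARY_STATE to the call arguments; both behaviors are only illustrative here.

```
# Hypothetical sketch only, following the descriptions above.
class CallTweaksPlugin(object):  # would subclass CWrapPlugin in practice
    def process_single_unpack(self, code, arg, arg_accessor):
        if arg.get('nullable', False):
            return '({} == Py_None ? NULL : {})'.format(arg_accessor, code)
        return code

    def process_all_call_arg(self, code, option):
        return 'LIBRARY_STATE ' + code
```
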
def process_option_code(self, code, option):
"""Used to modify the entire code body for an option.
Code in this case is a string containing the entire generated code for
a specific option. Note that this body includes the checks for each
option, i.e. if (type checks for one permutation) { ... } else if (type
checks for another permutation) { ... } etc.
Args:
code: string representing the generated code for the option
option: dictionary containing the information for this specific
option.
Returns:
The same code, modified as the plugin sees fit.
"""
return code
def process_wrapper(self, code, declaration):
"""Used to modify the entire code body for a declaration.
Code in this case is a string containing the entire generated code for
a specific declaration. This code can be modified as the plugin sees
fit. For example, we might want to wrap the function in preprocessor
guards if it is only enabled for floats.
Args:
code: string representing the generated code for the declaration
declaration: the declaration metadata.
Returns:
The same code, modified as the plugin sees fit.
"""
return code
def process_declarations(self, declarations):
"""Used to process/modify the function's declaration.
Cwrap loads the YAML of a function to be cwrap'd into a dictionary.
This is known as the declaration. The cwrap code sets some defaults as
necessary, and then passes this dictionary to process_declarations.
Overriding this code allows the plugin to modify this declaration as it
sees fit prior to any code generation. The plugin may add, remove or
modify the fields of the declaration dictionary. It can also save state
to the Plugin for use in subsequent function overrides.
It's best to look at some of the existing Plugins to get a sense of what
one might do.
Args:
declarations: a list of declarations, i.e. dictionaries that define
the function(s) being wrapped. Note that this can be plural, so the
function must take care to modify each input declaration.
Returns:
Those same declarations, modified as the Plugin sees fit. Note that
you could insert a declaration, if you wanted to take an input
declaration and e.g. wrap it multiple times.
"""
return declarations
def process_option_code_template(self, template, option):
"""Used to modify the code template for the option.
The "code template" can be thought of the actual body implementing the
wrapped function call --> i.e. it is not the argument check,
assignment, etc. but the actual logic of the function. The template is
a list containing two operations: the $call, and the $return_result.
These represent the "locations" where the function call will happen,
and the function will return.
This function can modify the list to insert arbitrary code around the
$call and $return_result. For example, one might want to wrap the code
in a try/catch, or post-process the result in some way. This allows a
plugin to do that.
Args:
template: a list containing $call and $return_result, in addition
to any arbitrary code inserted by other plugins.
option: dictionary containing the information for this specific
option.
Returns:
The same "code template", possibly modified by this plugin.
"""
return template
def process_pre_arg_assign(self, template, option):
"""Used to include any code before argument assignment.
This function can be used to insert any code that will be part of the
resulting function. The code is inserted after argument checks occur,
but before argument assignment.
Args:
template: String representing the code to be inserted. If other
plugins have included code for pre_arg_assign, it will be included
here.
option: dictionary containing the information for this specific
option.
Returns:
template, with any additional code if needed.
"""
return template
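
Finally, a hedged sketch of the wrapper- and template-level hooks above — guarding a wrapper with a preprocessor condition and inserting code around `$call` in the option template; all names and guards are illustrative.

```
# Hypothetical sketch only, following the descriptions above.
class WrapperTweaksPlugin(object):  # would subclass CWrapPlugin in practice
    def process_wrapper(self, code, declaration):
        guard = declaration.get('defined_if')
        if guard:
            return '#if {}\n{}\n#endif\n'.format(guard, code)
        return code

    def process_option_code_template(self, template, option):
        out = []
        for line in template:
            if '$call' in line:
                # e.g. release the GIL around the underlying call
                out += ['Py_BEGIN_ALLOW_THREADS', line, 'Py_END_ALLOW_THREADS']
            else:
                out.append(line)
        return out
```
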
@ -433,4 +66,3 @@ from .AutoGPU import AutoGPU
from .CuDNNPlugin import CuDNNPlugin
from .GenericNN import GenericNN
from .WrapDim import WrapDim
from .Broadcast import Broadcast

View File

@ -1,27 +0,0 @@
FROM ubuntu:16.04
LABEL com.nvidia.volumes.needed="nvidia_driver"
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
git \
curl \
ca-certificates \
libjpeg-dev \
libpng-dev && \
rm -rf /var/lib/apt/lists/*
RUN curl -o ~/miniconda.sh -O https://repo.continuum.io/miniconda/Miniconda3-4.2.12-Linux-x86_64.sh && \
chmod +x ~/miniconda.sh && \
~/miniconda.sh -b -p /opt/conda && \
rm ~/miniconda.sh && \
/opt/conda/bin/conda install conda-build && \
/opt/conda/bin/conda create -y --name pytorch-py35 python=3.5.2 numpy pyyaml scipy ipython mkl&& \
/opt/conda/bin/conda clean -ya
ENV PATH /opt/conda/envs/pytorch-py35/bin:$PATH
RUN conda install --name pytorch-py35 -c soumith magma-cuda80 && /opt/conda/bin/conda clean -ya
RUN conda install --name pytorch-py35 pytorch torchvision cuda80 -c soumith && /opt/conda/bin/conda clean -ya
ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64
WORKDIR /workspace
RUN chmod -R a+w /workspace

View File

@ -3,13 +3,26 @@ import sys
from string import Template, ascii_lowercase
from ..cwrap import cwrap
from ..cwrap.plugins import StandaloneExtension, GenericNN, NullableArguments, AutoGPU
from ..shared import import_module
BASE_PATH = os.path.realpath(os.path.join(__file__, '..', '..', '..'))
WRAPPER_PATH = os.path.join(BASE_PATH, 'torch', 'csrc', 'nn')
THNN_UTILS_PATH = os.path.join(BASE_PATH, 'torch', '_thnn', 'utils.py')
def import_module(name, path):
if sys.version_info >= (3, 5):
import importlib.util
spec = importlib.util.spec_from_file_location(name, path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
return module
elif sys.version_info >= (3, 0):
from importlib.machinery import SourceFileLoader
return SourceFileLoader(name, path).load_module()
else:
import imp
return imp.load_source(name, path)
thnn_utils = import_module('torch._thnn.utils', THNN_UTILS_PATH)
FUNCTION_TEMPLATE = Template("""\
@ -75,17 +88,14 @@ def wrap_function(name, type, arguments):
cname = 'THNN_' + type + name
declaration = ''
declaration += 'extern "C" void ' + cname + \
'(' + ', '.join(TYPE_TRANSFORMS[type].get(arg.type, arg.type)
for arg in arguments) + ');\n'
'(' + ', '.join(TYPE_TRANSFORMS[type].get(arg.type, arg.type) for arg in arguments) + ');\n'
declaration += FUNCTION_TEMPLATE.substitute(name=type + name, cname=cname)
indent = ' ' * 4
dict_indent = ' ' * 6
prefix = indent + '- '
for arg in arguments:
if not arg.is_optional:
declaration += prefix + \
TYPE_TRANSFORMS[type].get(
arg.type, arg.type) + ' ' + arg.name + '\n'
declaration += prefix + TYPE_TRANSFORMS[type].get(arg.type, arg.type) + ' ' + arg.name + '\n'
else:
t = TYPE_TRANSFORMS[type].get(arg.type, arg.type)
declaration += prefix + 'type: ' + t + '\n' + \
@ -130,7 +140,6 @@ def wrap_cunn():
AutoGPU(has_self=False),
])
GENERIC_FUNCTION_TEMPLATE = Template("""\
[[
name: $name
@ -159,7 +168,7 @@ def wrap_generic():
defs = OrderedDict()
def should_wrap_function(name):
if name.startswith('LookupTable_'):
if name.startswith('LookupTable'):
return False
return (name.endswith('updateOutput') or
name.endswith('updateGradInput') or

View File

@ -1,12 +0,0 @@
{
global:
_TH*;
TH*;
*THP*;
*THCP*;
PyInit*;
init*;
state;
local:
*;
};

View File

@ -1,39 +1,17 @@
import os
import platform
import ctypes.util
from subprocess import Popen, PIPE
import os
from .env import check_env_flag
def find_nvcc():
proc = Popen(['which', 'nvcc'], stdout=PIPE, stderr=PIPE)
out, err = proc.communicate()
out = out.decode().strip()
if len(out) > 0:
return os.path.dirname(out)
else:
return None
if check_env_flag('NO_CUDA'):
WITH_CUDA = False
CUDA_HOME = None
else:
CUDA_HOME = os.getenv('CUDA_HOME', '/usr/local/cuda')
if not os.path.exists(CUDA_HOME):
# We use nvcc path on Linux and cudart path on macOS
osname = platform.system()
if osname == 'Linux':
cuda_path = find_nvcc()
else:
cudart_path = ctypes.util.find_library('cudart')
if cudart_path is not None:
cuda_path = os.path.dirname(cudart_path)
else:
cuda_path = None
if cuda_path is not None:
CUDA_HOME = os.path.dirname(cuda_path)
cudart_path = ctypes.util.find_library('cudart')
if cudart_path is not None:
CUDA_HOME = os.path.dirname(cudart_path)
else:
CUDA_HOME = None
WITH_CUDA = CUDA_HOME is not None

View File

@ -1,5 +1,4 @@
import os
import sys
import glob
from itertools import chain
@ -10,8 +9,6 @@ from .cuda import WITH_CUDA, CUDA_HOME
def gather_paths(env_vars):
return list(chain(*(os.getenv(v, '').split(':') for v in env_vars)))
is_conda = 'conda' in sys.version or 'Continuum' in sys.version
conda_dir = os.path.join(os.path.dirname(sys.executable), '..')
WITH_CUDNN = False
CUDNN_LIB_DIR = None
@ -22,7 +19,6 @@ if WITH_CUDA and not check_env_flag('NO_CUDNN'):
os.path.join(CUDA_HOME, 'lib'),
os.path.join(CUDA_HOME, 'lib64'),
'/usr/lib/x86_64-linux-gnu/',
'/usr/lib/powerpc64le-linux-gnu/',
] + gather_paths([
'LIBRARY_PATH',
])))
@ -35,9 +31,6 @@ if WITH_CUDA and not check_env_flag('NO_CUDNN'):
'C_INCLUDE_PATH',
'CPLUS_INCLUDE_PATH',
])))
if is_conda:
lib_paths.append(os.path.join(conda_dir, 'lib'))
include_paths.append(os.path.join(conda_dir, 'include'))
for path in lib_paths:
if path is None or not os.path.exists(path):
continue

View File

@ -1,58 +0,0 @@
import os
this_file = os.path.dirname(os.path.abspath(__file__))
generated_dir = os.path.abspath(os.path.join(this_file, '..', '..', 'torch', 'csrc', 'generated'))
line_start = '//generic_include '
types = [
'Double',
'Float',
'Half',
'Long',
'Int',
'Short',
'Char',
'Byte'
]
generic_include = '#define {lib}_GENERIC_FILE "{path}"'
generate_include = '#include "{lib}/{lib}Generate{type}Type.h"'
def split_types(file_name):
assert file_name.startswith('torch/csrc/')
if not os.path.exists(generated_dir):
os.makedirs(generated_dir)
with open(file_name, 'r') as f:
lines = f.read().split('\n')
# Find //generic_include
for i, l in enumerate(lines):
if l.startswith(line_start):
args = l[len(line_start):]
lib_prefix, generic_file = filter(bool, args.split())
break
else:
raise RuntimeError("generic include not found")
gen_name_prefix = file_name[len('torch/csrc/'):].replace('/', '_').replace('.cpp', '')
gen_path_prefix = os.path.join(generated_dir, gen_name_prefix)
prefix = '\n'.join(lines[:i])
suffix = '\n'.join(lines[i + 1:])
to_build = []
g_include = generic_include.format(lib=lib_prefix, path=generic_file)
for t in types:
t_include = generate_include.format(lib=lib_prefix, type=t)
gen_path = gen_path_prefix + t + '.cpp'
to_build.append(gen_path)
with open(gen_path, 'w') as f:
f.write(prefix + '\n' +
g_include + '\n' +
t_include + '\n' +
suffix)
return to_build

View File

@ -1,3 +0,0 @@
from .module_loader import import_module
from .cwrap_common import set_declaration_defaults, \
sort_by_number_of_options, enumerate_options_due_to_default

View File

@ -1 +0,0 @@
../../torch/lib/ATen/common_with_cwrap.py

View File

@ -1,16 +0,0 @@
import sys
def import_module(name, path):
if sys.version_info >= (3, 5):
import importlib.util
spec = importlib.util.spec_from_file_location(name, path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
return module
elif sys.version_info >= (3, 0):
from importlib.machinery import SourceFileLoader
return SourceFileLoader(name, path).load_module()
else:
import imp
return imp.load_source(name, path)

View File

@ -5,7 +5,7 @@ Additionally, it provides many utilities for efficient serializing of
Tensors and arbitrary types, and other useful utilities.
It has a CUDA counterpart, that enables you to run your tensor computations
on an NVIDIA GPU with compute capability >= 3.0.
on an NVIDIA GPU with compute capability >= 2.0.
"""
import sys
@ -15,7 +15,7 @@ from .version import __version__
__all__ = [
'typename', 'is_tensor', 'is_storage', 'set_default_tensor_type',
'set_rng_state', 'get_rng_state', 'manual_seed', 'initial_seed',
'save', 'load', 'set_printoptions', 'chunk', 'split', 'stack', 'matmul',
'save', 'load', 'set_printoptions', 'chunk', 'split', 'stack',
'DoubleStorage', 'FloatStorage', 'LongStorage', 'IntStorage',
'ShortStorage', 'CharStorage', 'ByteStorage',
'DoubleTensor', 'FloatTensor', 'LongTensor', 'IntTensor',
@ -129,9 +129,6 @@ def manual_seed(seed):
Args:
seed (int or long): The desired seed.
"""
if torch.cuda.is_available() and not torch.cuda._in_bad_fork:
torch.cuda.manual_seed_all(seed)
return default_generator.manual_seed(seed)
@ -268,12 +265,12 @@ class ByteTensor(_C.ByteTensorBase, _TensorBase):
_storage_classes = {
DoubleStorage, FloatStorage, LongStorage, IntStorage, ShortStorage,
CharStorage, ByteStorage, HalfStorage
CharStorage, ByteStorage,
}
_tensor_classes = {
DoubleTensor, FloatTensor, LongTensor, IntTensor, ShortTensor,
CharTensor, ByteTensor, HalfTensor
CharTensor, ByteTensor,
}
@ -339,9 +336,8 @@ import torch.nn
import torch.optim
import torch.multiprocessing
import torch.sparse
import torch.utils.backcompat
_C._init_names(list(torch._tensor_classes) + list(torch._storage_classes))
# attach docstrings to torch and tensor functions
from . import _torch_docs, _tensor_docs, _storage_docs
del _torch_docs, _tensor_docs, _storage_docs
from . import _torch_docs, _tensor_docs
del _torch_docs, _tensor_docs

View File

@ -1,31 +0,0 @@
# Copyright (c) 2010-2017 Benjamin Peterson
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
def with_metaclass(meta, *bases):
"""Create a base class with a metaclass."""
# This requires a bit of explanation: the basic idea is to make a dummy
# metaclass for one level of class instantiation that replaces itself with
# the actual metaclass.
class metaclass(meta):
def __new__(cls, name, this_bases, d):
return meta(name, bases, d)
return type.__new__(metaclass, 'temporary_class', (), {})
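
A brief usage sketch of the helper above (the metaclass and attribute are made up for illustration): the same class statement picks up the metaclass under both Python 2 and Python 3.

```
class Meta(type):
    def __new__(mcls, name, bases, namespace):
        namespace.setdefault('tag', name.lower())
        return super(Meta, mcls).__new__(mcls, name, bases, namespace)

class Tagged(with_metaclass(Meta, object)):  # uses the helper defined above
    pass

assert type(Tagged) is Meta and Tagged.tag == 'tagged'
```
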

View File

@ -1,43 +0,0 @@
"""Adds docstrings to Storage functions"""
import torch._C
from torch._C import _add_docstr as add_docstr
storage_classes = [
'DoubleStorageBase',
'FloatStorageBase',
'LongStorageBase',
'IntStorageBase',
'ShortStorageBase',
'CharStorageBase',
'ByteStorageBase',
]
def add_docstr_all(method, docstr):
for cls_name in storage_classes:
cls = getattr(torch._C, cls_name)
try:
add_docstr(getattr(cls, method), docstr)
except AttributeError:
pass
add_docstr_all('from_file',
"""
from_file(filename, shared=False, size=0) -> Storage
If shared is True then memory is shared between all processes. All changes are
written to the file. If shared is False then the changes on the storage do not
affect the file.
Size is the number of elements in the storage. If shared is False then the file
must contain at least `size * sizeof(Type)` bytes (`Type` is the type of
storage). If shared is True the file will be created if needed.
Args:
filename (str): file name to map
shared (bool): whether to share memory
size (int): number of elements in the storage
""")

File diff suppressed because it is too large

View File

@ -67,7 +67,7 @@ def set_printoptions(
def _number_format(tensor, min_sz=-1):
min_sz = max(min_sz, 2)
tensor = torch.DoubleTensor(tensor.size()).copy_(tensor).abs_().view(tensor.nelement())
tensor = torch.DoubleTensor(tensor.nelement()).copy_(tensor).abs_()
pos_inf_mask = tensor.eq(float('inf'))
neg_inf_mask = tensor.eq(float('-inf'))

File diff suppressed because it is too large

View File

@ -3,8 +3,7 @@ import importlib
def _type(self, new_type=None, async=False):
"""Returns the type if `new_type` is not provided, else casts this object to
the specified type.
"""Casts this object to the specified type.
If this is already of the correct type, no copy is performed and the
original object is returned.
@ -28,8 +27,8 @@ def _type(self, new_type=None, async=False):
raise RuntimeError("Cannot cast sparse tensor to dense tensor")
new_type_name = new_type.__module__ + '.' + new_type.__name__
new_values_type_name = new_type_name.replace('.sparse', '')
new_values = self._values().type(new_values_type_name, async)
return new_type(self._indices(), new_values, self.size())
new_values = self.values().type(new_values_type_name, async)
return new_type(self.indices(), new_values, self.size())
if new_type.is_sparse:
raise RuntimeError("Cannot cast dense tensor to sparse tensor")
return new_type(self.size()).copy_(self, async)
@ -58,8 +57,8 @@ def _cuda(self, device=None, async=False):
with torch.cuda.device(device):
if self.is_sparse:
new_type = getattr(torch.cuda.sparse, self.__class__.__name__)
indices = self._indices().cuda(device, async)
values = self._values().cuda(device, async)
indices = self.indices().cuda(device, async)
values = self.values().cuda(device, async)
return new_type(indices, values, self.size())
else:
new_type = getattr(torch.cuda, self.__class__.__name__)
@ -99,47 +98,3 @@ def _accumulate(iterable, fn=lambda x, y: x + y):
for element in it:
total = fn(total, element)
yield total
def _flatten_tensors(tensors):
"""Flatten tensors into a single contiguous 1D buffer"""
if len(tensors) == 1:
return tensors[0].contiguous().view(-1)
numels = [tensor.numel() for tensor in tensors]
size = sum(numels)
offset = 0
flat = tensors[0].new(size)
for tensor, numel in zip(tensors, numels):
flat.narrow(0, offset, numel).copy_(tensor, broadcast=False)
offset += numel
return flat
def _unflatten_tensors(flat, tensors):
"""View a flat buffer using the sizes of tensors"""
outputs = []
offset = 0
for tensor in tensors:
numel = tensor.numel()
outputs.append(flat.narrow(0, offset, numel).view_as(tensor))
offset += numel
return tuple(outputs)
def _take_tensors(tensors, size_limit):
"""Groups tensors into lists of up to size_limit bytes"""
buf = []
size = 0
last_type = type(tensors[0]) if len(tensors) > 0 else None
for tensor in tensors:
t = type(tensor)
param_size = tensor.numel() * tensor.element_size()
if t is not last_type or (size + param_size > size_limit and size > 0):
yield buf
last_type = t
size = 0
buf = []
buf.append(tensor)
size += param_size
if len(buf) > 0:
yield buf
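
A hedged round-trip sketch of the two buffer helpers above (they are private utilities, so this only illustrates the intended contract):

```
import torch

tensors = [torch.ones(2, 3), torch.zeros(4)]
flat = _flatten_tensors(tensors)           # single contiguous 1-D buffer
views = _unflatten_tensors(flat, tensors)  # views shaped like the originals
assert flat.numel() == sum(t.numel() for t in tensors)
assert views[0].size() == tensors[0].size()
```
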

View File

@ -5,7 +5,6 @@ changes to the existing code - you only need to wrap all tensors in
:class:`.Variable` objects.
"""
import torch
import warnings
from .variable import Variable
from .function import Function, NestedIOFunction
@ -15,41 +14,13 @@ from .gradcheck import gradcheck
__all__ = ['Variable', 'Function', 'StochasticFunction', 'backward']
def _make_grads(outputs, grads, user_create_graph):
if user_create_graph is not None:
create_graph = user_create_graph
else:
create_graph = any(isinstance(grad, Variable) and not grad.volatile
for grad in grads)
new_grads = []
for out, grad in zip(outputs, grads):
if isinstance(grad, Variable):
new_grads.append(grad)
elif torch.is_tensor(grad):
new_grads.append(Variable(grad, volatile=not create_graph))
elif grad is None:
if out.requires_grad:
if out.numel() != 1:
raise RuntimeError("grad can be implicitly created only for scalar outputs")
data = out.data
new_grads.append(
Variable(data.new().resize_as_(data).fill_(1), volatile=not create_graph))
else:
new_grads.append(None)
else:
raise TypeError("gradients can be either Tensors, Variables or None, but got " +
type(grad).__name__)
return tuple(new_grads), create_graph
def backward(variables, grad_variables=None, retain_graph=None, create_graph=None, retain_variables=None):
def backward(variables, grad_variables, retain_variables=False):
"""Computes the sum of gradients of given variables w.r.t. graph leaves.
The graph is differentiated using the chain rule. If any of ``variables``
are non-scalar (i.e. their data has more than one element) and require
gradient, the function additionaly requires specifying ``grad_variables``.
It should be a sequence of matching length, that contains gradient of
It should be a sequence of matching length, that containins gradient of
the differentiated function w.r.t. corresponding variables (``None`` is an
acceptable value for all variables that don't need gradient tensors).
@ -59,98 +30,15 @@ def backward(variables, grad_variables=None, retain_graph=None, create_graph=None, retain_variables=None):
Arguments:
variables (sequence of Variable): Variables of which the derivative will be
computed.
grad_variables (sequence of (Tensor, Variable or None)): Gradients w.r.t.
each element of corresponding variables. Any tensors will be
automatically converted to Variables that are volatile unless
``create_graph`` is True. None values can be specified for scalar
Variables or ones that don't require grad. If a None value would
be acceptable for all grad_variables, then this argument is optional.
retain_graph (bool, optional): If False, the graph used to compute the grad
will be freed. Note that in nearly all cases setting this option to True
is not needed and often can be worked around in a much more efficient
way. Defaults to the value of ``create_graph``.
create_graph (bool, optional): If true, graph of the derivative will
be constructed, allowing to compute higher order derivative products.
Defaults to False, unless ``grad_variables`` contains at least one
non-volatile Variable.
grad_variables (sequence of Tensor): Gradients w.r.t. each element of
corresponding variables. Required only for non-scalar variables that
require gradient.
retain_variables (bool): If ``True``, buffers necessary for computing
gradients won't be freed after use. It is only necessary to
specify ``True`` if you want to differentiate some subgraph multiple
times.
"""
variables = (variables,) if isinstance(variables, Variable) else tuple(variables)
if grad_variables is None:
grad_variables = [None] * len(variables)
elif isinstance(grad_variables, Variable) or torch.is_tensor(grad_variables):
grad_variables = [grad_variables]
else:
grad_variables = list(grad_variables)
grad_variables, create_graph = _make_grads(variables, grad_variables, create_graph)
if retain_variables is not None:
if retain_graph is not None:
raise ValueError("only one of retain_graph and retain_variables can be specified")
retain_graph = retain_variables
warnings.warn("retain_variables option is deprecated and will be removed in 0.3. "
"Use retain_graph instead.")
elif retain_graph is None:
retain_graph = create_graph
Variable._execution_engine.run_backward(
variables, grad_variables, retain_graph)
tuple(variables), tuple(grad_variables), retain_variables)
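
A hedged usage sketch for the variant shown in this hunk whose grad_variables defaults to None (values are illustrative):

```
import torch
from torch.autograd import Variable, backward

x = Variable(torch.ones(3), requires_grad=True)
y = (x * 2).sum()        # scalar output, so grad_variables can stay None
backward(y)              # same effect as y.backward()
print(x.grad)            # gradient of 2 for every element of x
```
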
def grad(outputs, inputs, grad_outputs=None, retain_graph=None, create_graph=None, only_inputs=True):
"""Computes and returns the sum of gradients of outputs w.r.t. the inputs.
``grad_outputs`` should be a sequence of length matching ``output``
containing the pre-computed gradients w.r.t. each of the outputs. If an
output doesn't require_grad, then the gradient can be ``None``).
Gradients can be given as Tensors when one doesn't need the graph of the
derivative, or as Variables, in which case the graph will be created.
If ``only_inputs`` is True, the function will only return a list of gradients
w.r.t the specified inputs. If it's False, then gradient w.r.t. all remaining
leaves will still be computed, and will be accumulated into their ``.grad``
attribute.
Arguments:
outputs (sequence of Variable): outputs of the differentiated function.
inputs (sequence of Variable): Inputs w.r.t. which the gradient will be
returned (and not accumulated into ``.grad``).
grad_outputs (sequence of Tensor or Variable): Gradients w.r.t. each output.
Any tensors will be automatically converted to Variables that are
volatile unless ``create_graph`` is True. None values can be
specified for scalar Variables or ones that don't require grad.
If a None value would be acceptable for all grad_variables, then
this argument is optional.
retain_graph (bool, optional): If False, the graph used to compute the grad
will be freed. Note that in nearly all cases setting this option to True
is not needed and often can be worked around in a much more efficient
way. Defaults to the value of ``create_graph``.
create_graph (bool, optional): If True, graph of the derivative will
be constructed, allowing to compute higher order derivative products.
Defaults to False, unless ``grad_variables`` contains at least one
non-volatile Variable.
only_inputs (bool, optional): If True, gradient w.r.t. leaves that are
part of the graph, but don't appear in ``inputs`` won't be computed
and accumulated. Defaults to True.
"""
outputs = (outputs,) if isinstance(outputs, Variable) else tuple(outputs)
inputs = (inputs,) if isinstance(inputs, Variable) else tuple(inputs)
if grad_outputs is None:
grad_outputs = [None] * len(outputs)
elif isinstance(grad_outputs, Variable) or torch.is_tensor(grad_outputs):
grad_outputs = [grad_outputs]
else:
grad_outputs = list(grad_outputs)
grad_outputs, create_graph = _make_grads(outputs, grad_outputs, create_graph)
if retain_graph is None:
retain_graph = create_graph
return Variable._execution_engine.run_backward(
outputs, grad_outputs, retain_graph,
inputs, only_inputs)
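
And a hedged sketch of torch.autograd.grad as documented above (the function appears on only one side of this comparison); values are illustrative.

```
import torch
from torch.autograd import Variable, grad

x = Variable(torch.ones(2), requires_grad=True)
y = (x ** 2).sum()
gx, = grad(y, x)         # returned directly, not accumulated into x.grad
print(gx)                # 2 * x, i.e. [2, 2]
```
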
if not torch._C._autograd_init():
raise RuntimeError("autograd initialization failed")
assert torch._C._autograd_init()

View File

@ -1,228 +1,200 @@
import torch
from ..function import Function, InplaceFunction
from .utils import maybe_unexpand, maybe_unexpand_or_view
import math
def maybe_view(tensor, size):
if tensor.size() == size:
return tensor
return tensor.contiguous().view(size)
class Add(InplaceFunction):
@staticmethod
def forward(ctx, a, b, inplace=False):
ctx.a_size = a.size()
ctx.b_size = b.size()
if inplace:
ctx.mark_dirty(a)
def forward(self, a, b):
self.b_size = b.size()
if self.inplace:
self.mark_dirty(a)
return a.add_(b)
else:
return a.add(b)
@staticmethod
def backward(ctx, grad_output):
return maybe_unexpand(grad_output, ctx.a_size), maybe_unexpand_or_view(grad_output, ctx.b_size), None
def backward(self, grad_output):
return grad_output, maybe_view(grad_output, self.b_size)
class Sub(InplaceFunction):
@staticmethod
def forward(ctx, a, b, inplace=False):
ctx.a_size = a.size()
ctx.b_size = b.size()
if inplace:
ctx.mark_dirty(a)
def forward(self, a, b):
self.b_size = b.size()
if self.inplace:
self.mark_dirty(a)
return a.sub_(b)
else:
return a.sub(b)
@staticmethod
def backward(ctx, grad_output):
return maybe_unexpand(grad_output, ctx.a_size), maybe_unexpand_or_view(grad_output.neg(), ctx.b_size), None
def backward(self, grad_output):
return grad_output, maybe_view(grad_output.neg(), self.b_size)
class Mul(Function):
@staticmethod
def forward(ctx, a, b):
ctx.a_size = a.size()
ctx.b_size = b.size()
ctx.save_for_backward(a, b)
def forward(self, a, b):
self.b_size = b.size()
self.save_for_backward(a, b)
return a.mul(b)
@staticmethod
def backward(ctx, grad_output):
a, b = ctx.saved_variables
return maybe_unexpand(grad_output.mul(b), ctx.a_size), maybe_unexpand_or_view(grad_output.mul(a), ctx.b_size)
def backward(self, grad_output):
a, b = self.saved_tensors
return grad_output.mul(b), maybe_view(grad_output.mul(a), self.b_size)
class Div(Function):
@staticmethod
def forward(ctx, a, b):
ctx.a_size = a.size()
ctx.b_size = b.size()
ctx.save_for_backward(a, b)
def forward(self, a, b):
self.b_size = b.size()
self.save_for_backward(a, b)
return a.div(b)
@staticmethod
def backward(ctx, grad_output):
a, b = ctx.saved_variables
b_rec = b.reciprocal()
grad_a = grad_output.mul(b_rec)
grad_b = grad_output.neg().mul(a).mul(b_rec).mul(b_rec)
return maybe_unexpand(grad_a, ctx.a_size), maybe_unexpand_or_view(grad_b, ctx.b_size)
def backward(self, grad_output):
a, b = self.saved_tensors
return grad_output.div(b), maybe_view(grad_output.neg().mul(a).div_(b).div_(b), self.b_size)
class Pow(Function):
@staticmethod
def forward(ctx, a, b):
ctx.a_size = a.size()
ctx.b_size = b.size()
ctx.save_for_backward(a, b)
def forward(self, a, b):
self.b_size = b.size()
self.save_for_backward(a, b)
return a.pow(b)
@staticmethod
def backward(ctx, grad_output):
a, b = ctx.saved_variables
grad_a = grad_output.mul(b).mul(a.pow(b - 1))
grad_b = grad_output.mul(a.pow(b)).mul(a.log())
return maybe_unexpand(grad_a, ctx.a_size), maybe_unexpand_or_view(grad_b, ctx.b_size)
def sort_args(a, b):
return (a, b, True) if torch.is_tensor(a) else (b, a, False)
def backward(self, grad_output):
a, b = self.saved_tensors
return grad_output.mul(b).mul_(a.pow(b - 1)), maybe_view(grad_output.mul(a.pow(b)).mul_(a.log()), self.b_size)
class AddConstant(InplaceFunction):
@staticmethod
def forward(ctx, a, b, inplace=False):
tensor, constant, ctx.tensor_first = sort_args(a, b)
if inplace:
ctx.mark_dirty(tensor)
return tensor.add_(constant)
else:
return tensor.add(constant)
def __init__(self, constant, inplace=False):
super(AddConstant, self).__init__(inplace)
self.constant = constant
@staticmethod
def backward(ctx, grad_output):
if ctx.tensor_first:
return grad_output, None, None
def forward(self, a):
if self.inplace:
self.mark_dirty(a)
return a.add_(self.constant)
else:
return None, grad_output, None
return a.add(self.constant)
def backward(self, grad_output):
return grad_output
class SubConstant(InplaceFunction):
@staticmethod
def forward(ctx, a, b, inplace=False):
tensor, constant, ctx.tensor_first = sort_args(a, b)
if ctx.tensor_first:
if inplace:
ctx.mark_dirty(tensor)
return tensor.sub_(constant)
else:
return tensor.sub(constant)
else:
if inplace:
ctx.mark_dirty(tensor)
return tensor.neg_().add_(constant)
else:
return tensor.neg().add_(constant)
def __init__(self, constant, sub_tensor=False, inplace=False):
super(SubConstant, self).__init__(inplace)
self.constant = constant
self.sub_tensor = sub_tensor
@staticmethod
def backward(ctx, grad_output):
if ctx.tensor_first:
return grad_output, None, None
def forward(self, a):
if self.sub_tensor:
if a.is_signed() and self.inplace:
self.mark_dirty(a)
return a.neg_().add_(self.constant)
else:
assert not self.inplace, "can't perform (constant - tensor) " \
"subtraction in-place on an unsigned type"
return a.new().resize_as_(a).fill_(self.constant).sub_(a)
else:
return None, grad_output.neg(), None
if self.inplace:
self.mark_dirty(a)
return a.sub_(self.constant)
else:
return a.sub(self.constant)
def backward(self, grad_output):
if self.sub_tensor:
return grad_output.neg()
else:
return grad_output
class MulConstant(InplaceFunction):
@staticmethod
def forward(ctx, a, b, inplace=False):
tensor, ctx.constant, ctx.tensor_first = sort_args(a, b)
if inplace:
ctx.mark_dirty(tensor)
return tensor.mul_(ctx.constant)
else:
return tensor.mul(ctx.constant)
def __init__(self, constant, inplace=False):
super(MulConstant, self).__init__(inplace)
self.constant = constant
@staticmethod
def backward(ctx, grad_output):
grad_input = grad_output.mul(ctx.constant)
if ctx.tensor_first:
return grad_input, None, None
def forward(self, a):
if self.inplace:
self.mark_dirty(a)
return a.mul_(self.constant)
else:
return None, grad_input, None
return a.mul(self.constant)
def backward(self, grad_output):
return grad_output.mul(self.constant)
class DivConstant(InplaceFunction):
@staticmethod
def forward(ctx, a, b, inplace=False):
tensor, ctx.constant, ctx.tensor_first = sort_args(a, b)
ctx.inplace = inplace
if ctx.tensor_first:
if inplace:
ctx.mark_dirty(tensor)
return tensor.div_(ctx.constant)
else:
return tensor.div(ctx.constant)
else:
ctx.save_for_backward(tensor)
if inplace:
ctx.mark_dirty(tensor)
return tensor.reciprocal_().mul_(ctx.constant)
else:
return tensor.reciprocal().mul_(ctx.constant)
def __init__(self, constant, div_by_tensor=False, inplace=False):
super(DivConstant, self).__init__(inplace)
self.constant = constant
self.div_by_tensor = div_by_tensor
if self.inplace and self.div_by_tensor:
# TODO: actually, as long as the type is floating point, we can
raise RuntimeError("can't perform (constant / tensor) division in-place")
@staticmethod
def backward(ctx, grad_output):
if ctx.tensor_first:
return grad_output.div(ctx.constant), None, None
def forward(self, a):
if self.div_by_tensor:
self.save_for_backward(a)
return a.new().resize_as_(a).fill_(self.constant).div_(a)
else:
v, = ctx.saved_variables
if ctx.inplace:
return None, grad_output.mul(v).mul(v).div_(-ctx.constant), None
if self.inplace:
return a.div_(self.constant)
else:
v_rep = v.reciprocal()
return None, grad_output.mul(v_rep).mul(v_rep).mul_(-ctx.constant), None
return a.div(self.constant)
def backward(self, grad_output):
if self.div_by_tensor:
a = self.saved_tensors[0]
return grad_output.neg().mul_(self.constant).div_(a).div_(a)
else:
return grad_output.div(self.constant)
class PowConstant(Function):
@staticmethod
def forward(ctx, a, b):
tensor, ctx.constant, ctx.tensor_first = sort_args(a, b)
if ctx.tensor_first:
ctx.save_for_backward(tensor)
return tensor.pow(ctx.constant)
else:
result = torch.pow(ctx.constant, tensor)
ctx.save_for_backward(result)
return result
def __init__(self, constant, tensor_power=False):
super(PowConstant, self).__init__()
self.constant = constant
self.tensor_power = tensor_power
@staticmethod
def backward(ctx, grad_output):
if ctx.tensor_first:
var, = ctx.saved_variables
return grad_output.mul(ctx.constant).mul(var.pow(ctx.constant - 1)), None
def forward(self, a):
if self.tensor_power:
self.fw_result = torch.pow(self.constant, a)
return self.fw_result
else:
var_result, = ctx.saved_variables
return None, grad_output.mul(var_result).mul_(math.log(ctx.constant))
self.save_for_backward(a)
return a.pow(self.constant)
def backward(self, grad_output):
if self.tensor_power:
return grad_output.mul(self.fw_result).mul_(math.log(self.constant))
else:
a = self.saved_tensors[0]
return grad_output.mul(self.constant).mul_(a.pow(self.constant - 1))
class Negate(InplaceFunction):
@staticmethod
def forward(ctx, i, inplace=False):
if inplace:
ctx.mark_dirty(i)
def forward(self, i):
if self.inplace:
return i.neg_()
else:
return i.neg()
@staticmethod
def backward(ctx, grad_output):
return grad_output.neg(), None
def backward(self, grad_output):
return grad_output.neg()
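
The hunks above move these arithmetic functions between the self-based style and the ctx-based static-method style; below is a hedged, self-contained sketch of the ctx style, in which functions are invoked via .apply (the Square function itself is made up for illustration).

```
import torch
from torch.autograd import Function, Variable

class Square(Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)     # tensors in, tensors out
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_variables     # Variables during backward
        return grad_output * 2 * x

v = Variable(torch.Tensor([1.0, 2.0, 3.0]), requires_grad=True)
out = Square.apply(v)                # ctx-style functions are called via .apply
out.sum().backward()
```
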

View File

@ -1,224 +1,195 @@
import torch
from ..function import Function, InplaceFunction
from .utils import maybe_unexpand
# TODO: no need to save all args if the grad w.r.t. some of them is not needed
def _get_output(ctx, arg, inplace=False):
if inplace:
ctx.mark_dirty(arg)
return arg
else:
return arg.new().resize_as_(arg)
class _BlasBase(InplaceFunction):
def __init__(self, alpha=1, beta=1, inplace=False):
super(_BlasBase, self).__init__(inplace)
self.alpha = alpha
self.beta = beta
def _get_output(self, arg):
if self.inplace:
self.mark_dirty(arg)
return arg
else:
return arg.new().resize_as_(arg)
class Addmm(InplaceFunction):
class Addmm(_BlasBase):
@staticmethod
def forward(ctx, add_matrix, matrix1, matrix2, alpha=1, beta=1, inplace=False):
ctx.alpha = alpha
ctx.beta = beta
ctx.add_matrix_size = add_matrix.size()
ctx.save_for_backward(matrix1, matrix2)
output = _get_output(ctx, add_matrix, inplace=inplace)
return torch.addmm(alpha, add_matrix, beta,
def forward(self, add_matrix, matrix1, matrix2):
self.save_for_backward(matrix1, matrix2)
output = self._get_output(add_matrix)
return torch.addmm(self.alpha, add_matrix, self.beta,
matrix1, matrix2, out=output)
@staticmethod
def backward(ctx, grad_output):
matrix1, matrix2 = ctx.saved_variables
def backward(self, grad_output):
matrix1, matrix2 = self.saved_tensors
grad_add_matrix = grad_matrix1 = grad_matrix2 = None
if ctx.needs_input_grad[0]:
grad_add_matrix = maybe_unexpand(grad_output, ctx.add_matrix_size)
if ctx.alpha != 1:
grad_add_matrix = grad_add_matrix.mul(ctx.alpha)
if self.needs_input_grad[0]:
grad_add_matrix = grad_output
if self.alpha != 1:
grad_add_matrix = grad_add_matrix.mul(self.alpha)
if ctx.needs_input_grad[1]:
if matrix1.stride() == (1, matrix1.size(0)):
# column major gradient if input is column major
grad_matrix1 = torch.mm(matrix2, grad_output.t()).t()
else:
grad_matrix1 = torch.mm(grad_output, matrix2.t())
if ctx.beta != 1:
grad_matrix1 *= ctx.beta
if self.needs_input_grad[1]:
grad_matrix1 = torch.mm(grad_output, matrix2.t())
if self.beta != 1:
grad_matrix1 *= self.beta
if ctx.needs_input_grad[2]:
if matrix2.stride() == (1, matrix2.size(0)):
# column major gradient if input is column major
grad_matrix2 = torch.mm(grad_output.t(), matrix1).t()
else:
grad_matrix2 = torch.mm(matrix1.t(), grad_output)
if ctx.beta != 1:
grad_matrix2 *= ctx.beta
if self.needs_input_grad[2]:
grad_matrix2 = torch.mm(matrix1.t(), grad_output)
if self.beta != 1:
grad_matrix2 *= self.beta
return grad_add_matrix, grad_matrix1, grad_matrix2, None, None, None
return grad_add_matrix, grad_matrix1, grad_matrix2
class Addbmm(InplaceFunction):
class Addbmm(_BlasBase):
@staticmethod
def forward(ctx, add_matrix, batch1, batch2, alpha=1, beta=1, inplace=False):
ctx.alpha = alpha
ctx.beta = beta
ctx.add_matrix_size = add_matrix.size()
ctx.save_for_backward(batch1, batch2)
output = _get_output(ctx, add_matrix, inplace=inplace)
return torch.addbmm(alpha, add_matrix, beta,
def forward(self, add_matrix, batch1, batch2):
self.save_for_backward(batch1, batch2)
output = self._get_output(add_matrix)
return torch.addbmm(self.alpha, add_matrix, self.beta,
batch1, batch2, out=output)
@staticmethod
def backward(ctx, grad_output):
batch1, batch2 = ctx.saved_variables
def backward(self, grad_output):
batch1, batch2 = self.saved_tensors
grad_add_matrix = grad_batch1 = grad_batch2 = None
if ctx.needs_input_grad[0]:
grad_add_matrix = maybe_unexpand(grad_output, ctx.add_matrix_size)
if ctx.alpha != 1:
grad_add_matrix = grad_add_matrix.mul(ctx.alpha)
if self.needs_input_grad[0]:
grad_add_matrix = grad_output
if self.alpha != 1:
grad_add_matrix = grad_add_matrix.mul(self.alpha)
if any(ctx.needs_input_grad[1:]):
if any(self.needs_input_grad[1:]):
batch_grad_output = (grad_output
.unsqueeze(0)
.expand(batch1.size(0), batch1.size(1), batch2.size(2)))
if ctx.needs_input_grad[1]:
if self.needs_input_grad[1]:
grad_batch1 = torch.bmm(batch_grad_output, batch2.transpose(1, 2))
if ctx.beta != 1:
grad_batch1 *= ctx.beta
if self.beta != 1:
grad_batch1 *= self.beta
if ctx.needs_input_grad[2]:
if self.needs_input_grad[2]:
grad_batch2 = torch.bmm(batch1.transpose(1, 2), batch_grad_output)
if ctx.beta != 1:
grad_batch2 *= ctx.beta
if self.beta != 1:
grad_batch2 *= self.beta
return grad_add_matrix, grad_batch1, grad_batch2, None, None, None
return grad_add_matrix, grad_batch1, grad_batch2
class Baddbmm(InplaceFunction):
class Baddbmm(_BlasBase):
@staticmethod
def forward(ctx, add_batch, batch1, batch2, alpha=1, beta=1, inplace=False):
ctx.alpha = alpha
ctx.beta = beta
ctx.add_batch_size = add_batch.size()
ctx.save_for_backward(batch1, batch2)
output = _get_output(ctx, add_batch, inplace=inplace)
return torch.baddbmm(alpha, add_batch, beta,
def forward(self, add_batch, batch1, batch2):
self.save_for_backward(batch1, batch2)
output = self._get_output(add_batch)
return torch.baddbmm(self.alpha, add_batch, self.beta,
batch1, batch2, out=output)
@staticmethod
def backward(ctx, grad_output):
batch1, batch2 = ctx.saved_variables
def backward(self, grad_output):
batch1, batch2 = self.saved_tensors
grad_add_batch = grad_batch1 = grad_batch2 = None
if ctx.needs_input_grad[0]:
grad_add_batch = maybe_unexpand(grad_output, ctx.add_batch_size)
if ctx.alpha != 1:
grad_add_batch = grad_add_batch.mul(ctx.alpha)
if self.needs_input_grad[0]:
grad_add_batch = grad_output
if self.alpha != 1:
grad_add_batch = grad_add_batch.mul(self.alpha)
if ctx.needs_input_grad[1]:
if self.needs_input_grad[1]:
grad_batch1 = torch.bmm(grad_output, batch2.transpose(1, 2))
if ctx.beta != 1:
grad_batch1 *= ctx.beta
if self.beta != 1:
grad_batch1 *= self.beta
if ctx.needs_input_grad[2]:
if self.needs_input_grad[2]:
grad_batch2 = torch.bmm(batch1.transpose(1, 2), grad_output)
if ctx.beta != 1:
grad_batch2 *= ctx.beta
if self.beta != 1:
grad_batch2 *= self.beta
return grad_add_batch, grad_batch1, grad_batch2, None, None, None
return grad_add_batch, grad_batch1, grad_batch2
class Addmv(InplaceFunction):
class Addmv(_BlasBase):
@staticmethod
def forward(ctx, add_vector, matrix, vector, alpha=1, beta=1, inplace=False):
ctx.alpha = alpha
ctx.beta = beta
ctx.add_vector_size = add_vector.size()
ctx.save_for_backward(matrix, vector)
output = _get_output(ctx, add_vector, inplace=inplace)
return torch.addmv(alpha, add_vector, beta,
def forward(self, add_vector, matrix, vector):
self.save_for_backward(matrix, vector)
output = self._get_output(add_vector)
return torch.addmv(self.alpha, add_vector, self.beta,
matrix, vector, out=output)
@staticmethod
def backward(ctx, grad_output):
matrix, vector = ctx.saved_variables
def backward(self, grad_output):
matrix, vector = self.saved_tensors
grad_add_vector = grad_matrix = grad_vector = None
if ctx.needs_input_grad[0]:
grad_add_vector = maybe_unexpand(grad_output, ctx.add_vector_size)
if ctx.alpha != 1:
grad_add_vector = grad_add_vector.mul(ctx.alpha)
if self.needs_input_grad[0]:
grad_add_vector = grad_output
if self.alpha != 1:
grad_add_vector = grad_add_vector.mul(self.alpha)
if ctx.needs_input_grad[1]:
if self.needs_input_grad[1]:
grad_matrix = torch.ger(grad_output, vector)
if ctx.beta != 1:
grad_matrix *= ctx.beta
if self.beta != 1:
grad_matrix *= self.beta
if ctx.needs_input_grad[2]:
if self.needs_input_grad[2]:
grad_vector = torch.mv(matrix.t(), grad_output)
if ctx.beta != 1:
grad_vector *= ctx.beta
if self.beta != 1:
grad_vector *= self.beta
return grad_add_vector, grad_matrix, grad_vector, None, None, None
return grad_add_vector, grad_matrix, grad_vector
class Addr(InplaceFunction):
class Addr(_BlasBase):
@staticmethod
def forward(ctx, add_matrix, vector1, vector2, alpha=1, beta=1, inplace=False):
ctx.alpha = alpha
ctx.beta = beta
ctx.add_matrix_size = add_matrix.size()
ctx.save_for_backward(vector1, vector2)
output = _get_output(ctx, add_matrix, inplace=inplace)
return torch.addr(alpha, add_matrix, beta,
def forward(self, add_matrix, vector1, vector2):
self.save_for_backward(vector1, vector2)
output = self._get_output(add_matrix)
return torch.addr(self.alpha, add_matrix, self.beta,
vector1, vector2, out=output)
@staticmethod
def backward(ctx, grad_output):
vector1, vector2 = ctx.saved_variables
def backward(self, grad_output):
vector1, vector2 = self.saved_tensors
grad_add_matrix = grad_vector1 = grad_vector2 = None
if ctx.needs_input_grad[0]:
grad_add_matrix = maybe_unexpand(grad_output, ctx.add_matrix_size)
if ctx.alpha != 1:
grad_add_matrix = grad_add_matrix.mul(ctx.alpha)
if self.needs_input_grad[0]:
grad_add_matrix = grad_output
if self.alpha != 1:
grad_add_matrix = grad_add_matrix.mul(self.alpha)
if ctx.needs_input_grad[1]:
if self.needs_input_grad[1]:
grad_vector1 = torch.mv(grad_output, vector2)
if ctx.beta != 1:
grad_vector1 *= ctx.beta
if self.beta != 1:
grad_vector1 *= self.beta
if ctx.needs_input_grad[2]:
if self.needs_input_grad[2]:
# TODO: maybe it's better to do transpose + mv + transpose
grad_vector2 = torch.mm(vector1.unsqueeze(0), grad_output).squeeze(0)
if ctx.beta != 1:
grad_vector2 *= ctx.beta
if self.beta != 1:
grad_vector2 *= self.beta
return grad_add_matrix, grad_vector1, grad_vector2, None, None, None
return grad_add_matrix, grad_vector1, grad_vector2
class Dot(Function):
@staticmethod
def forward(ctx, vector1, vector2):
ctx.save_for_backward(vector1, vector2)
ctx.sizes = (vector1.size(), vector2.size())
def forward(self, vector1, vector2):
self.save_for_backward(vector1, vector2)
self.sizes = (vector1.size(), vector2.size())
return vector1.new((vector1.dot(vector2),))
@staticmethod
def backward(ctx, grad_output):
vector1, vector2 = ctx.saved_variables
def backward(self, grad_output):
vector1, vector2 = self.saved_tensors
grad_vector1 = grad_vector2 = None
if ctx.needs_input_grad[0]:
grad_vector1 = vector2.mul(grad_output.expand(ctx.sizes[1])).view(ctx.sizes[0])
if self.needs_input_grad[0]:
grad_vector1 = vector2.mul(grad_output[0]).view(self.sizes[0])
if ctx.needs_input_grad[1]:
grad_vector2 = vector1.mul(grad_output.expand(ctx.sizes[0])).view(ctx.sizes[1])
if self.needs_input_grad[1]:
grad_vector2 = vector1.mul(grad_output[0]).view(self.sizes[1])
return grad_vector1, grad_vector2
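
As a quick numeric illustration of the matrix-multiply gradients implemented above (the alpha = beta = 1 case): for C = add + A.mm(B), dL/dA = dL/dC.mm(B.t()) and dL/dB = A.t().mm(dL/dC). The values below are illustrative only.

```
import torch

A, B = torch.randn(2, 3), torch.randn(3, 4)
G = torch.ones(2, 4)        # pretend upstream gradient dL/dC
grad_A = G.mm(B.t())        # matches grad_matrix1 above
grad_B = A.t().mm(G)        # matches grad_matrix2 above
```
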

View File

@ -1,28 +1,19 @@
import torch
from ..function import Function
from .utils import maybe_unexpand, maybe_unexpand_or_view
# TODO: once Cpp-style functions are implemented we can detach a and b
# before calling forward.
class _CompareOp(Function):
@classmethod
def forward(cls, ctx, a, b):
ctx.a_size = a.size()
ctx.b_tensor = torch.is_tensor(b)
ctx.b_size = b.size() if ctx.b_tensor else None
ctx.input_type = type(a)
mask = getattr(a, cls.fn_name)(b)
ctx.mark_non_differentiable(mask)
return mask
def __init__(self, scalar=None):
super(_CompareOp, self).__init__()
self.scalar = scalar
@staticmethod
def backward(ctx, grad_output):
grad_input = (grad_output * 0).type(ctx.input_type)
return (maybe_unexpand(grad_input, ctx.a_size),
maybe_unexpand_or_view(grad_input, ctx.b_size) if ctx.b_tensor else None)
def forward(self, tensor1, tensor2=None):
other = tensor2 if tensor2 is not None else self.scalar
mask = getattr(tensor1, self.fn_name)(other)
self.mark_non_differentiable(mask)
return mask
class Eq(_CompareOp):

View File

@ -1,104 +1,71 @@
import torch
from ..function import Function
from ..variable import Variable
class Diag(Function):
@staticmethod
def forward(ctx, input, diagonal_idx=0):
ctx.diagonal_idx = diagonal_idx
return input.diag(ctx.diagonal_idx)
def __init__(self, diagonal_idx=0):
super(Diag, self).__init__()
self.diagonal_idx = diagonal_idx
@staticmethod
def backward(ctx, grad_output):
return grad_output.diag(ctx.diagonal_idx), None
def forward(self, input):
return input.diag(self.diagonal_idx)
def backward(self, grad_output):
return grad_output.diag(self.diagonal_idx)
class Tril(Function):
@staticmethod
def forward(ctx, input, diagonal_idx=0):
ctx.diagonal_idx = diagonal_idx
return input.tril(ctx.diagonal_idx)
def __init__(self, diagonal_idx=0):
super(Tril, self).__init__()
self.diagonal_idx = diagonal_idx
@staticmethod
def backward(ctx, grad_output):
return grad_output.tril(ctx.diagonal_idx), None
def forward(self, input):
return input.tril(self.diagonal_idx)
def backward(self, grad_output):
return grad_output.tril(self.diagonal_idx)
class Triu(Function):
@staticmethod
def forward(ctx, input, diagnoal_idx=0):
ctx.diagonal_idx = diagnoal_idx
return input.triu(ctx.diagonal_idx)
def __init__(self, diagonal_idx=0):
super(Triu, self).__init__()
self.diagonal_idx = diagonal_idx
@staticmethod
def backward(ctx, grad_output):
return grad_output.triu(ctx.diagonal_idx), None
def forward(self, input):
return input.triu(self.diagonal_idx)
def backward(self, grad_output):
return grad_output.triu(self.diagonal_idx)
class Trace(Function):
@staticmethod
def forward(ctx, input):
ctx.isize = input.size()
return input.new((input.trace(), ))
def forward(self, input):
self.isize = input.size()
return input.new((input.trace(),))
@staticmethod
def backward(ctx, grad_output):
isize = ctx.isize
min_size = min(isize)
grad_input = Variable(grad_output.data.new(isize).zero_()).view(-1)
grad_input[::(isize[1] + 1)] = grad_output.expand(min_size)
return grad_input.view(isize)
def backward(self, grad_output):
isize = self.isize
grad_input = grad_output.new(isize).zero_()
grad_input.view(-1)[::(isize[1] + 1)] = grad_output[0]
return grad_input
class Cross(Function):
@staticmethod
def forward(ctx, input, other, dim=-1):
ctx.dim = dim
ctx.save_for_backward(input, other)
return torch.cross(input, other, ctx.dim)
def __init__(self, dim=-1):
self.dim = dim
@staticmethod
def backward(ctx, grad_output):
input, other = ctx.saved_variables
grad_input = other.cross(grad_output, ctx.dim)
grad_other = grad_output.cross(input, ctx.dim)
return grad_input, grad_other, None
def forward(self, input, other):
self.save_for_backward(input, other)
return torch.cross(input, other, self.dim)
class Inverse(Function):
@staticmethod
def forward(ctx, input):
inverse = torch.inverse(input)
ctx.save_for_backward(inverse)
return inverse
@staticmethod
def backward(ctx, grad_output):
inverse, = ctx.saved_variables
return -torch.mm(inverse.t(), torch.mm(grad_output, inverse.t()))
class Gesv(Function):
@staticmethod
def forward(ctx, b, a):
# TODO see if one can backprop through LU
X, LU = torch.gesv(b, a)
ctx.save_for_backward(X, a)
ctx.mark_non_differentiable(LU)
return X, LU
@staticmethod
def backward(ctx, grad_output, grad_LU=None):
X, a = ctx.saved_variables
grad_b, _ = torch.gesv(grad_output, a.t())
grad_a = -torch.mm(grad_b, X.t())
return grad_b, grad_a
def backward(self, grad_output):
input, other = self.saved_tensors
grad_input = torch.cross(other, grad_output, self.dim)
grad_other = torch.cross(grad_output, input, self.dim)
return grad_input, grad_other

View File

@ -1,351 +1,282 @@
from itertools import repeat
from ..._thnn import type2backend
from ..function import Function, InplaceFunction
from ..variable import Variable
from .utils import maybe_unexpand, maybe_unexpand_or_view
class Exp(InplaceFunction):
@staticmethod
def forward(ctx, i, inplace=False):
if inplace:
ctx.mark_dirty(i)
def forward(self, i):
if self.inplace:
self.mark_dirty(i)
result = i.exp_()
else:
result = i.exp()
ctx.save_for_backward(result)
self.save_for_backward(result)
return result
@staticmethod
def backward(ctx, grad_output):
result, = ctx.saved_variables
return grad_output * result, None
def backward(self, grad_output):
return self.saved_tensors[0] * grad_output
class Log(Function):
@staticmethod
def forward(ctx, i):
ctx.save_for_backward(i)
def forward(self, i):
self.save_for_backward(i)
return i.log()
@staticmethod
def backward(ctx, grad_output):
i, = ctx.saved_variables
return grad_output.div(i)
def backward(self, grad_output):
return grad_output.div(self.saved_tensors[0])
class Log1p(Function):
@staticmethod
def forward(ctx, i):
ctx.save_for_backward(i)
def forward(self, i):
self.save_for_backward(i)
return i.log1p()
@staticmethod
def backward(ctx, grad_output):
i, = ctx.saved_variables
return grad_output.div(i.add(1))
def backward(self, grad_output):
return grad_output.div(self.saved_tensors[0].add(1))
class Tanh(InplaceFunction):
@staticmethod
def forward(ctx, i, inplace=False):
if inplace:
ctx.mark_dirty(i)
def forward(self, i):
if self.inplace:
self.mark_dirty(i)
result = i.tanh_()
else:
result = i.tanh()
ctx.save_for_backward(result)
self.save_for_backward(result)
return result
@staticmethod
def backward(ctx, grad_output):
result, = ctx.saved_variables
if grad_output.volatile:
grad_input = Variable(grad_output.data.new(grad_output.size()), volatile=True)
backend = type2backend[type(result.data)]
backend.Tanh_updateGradInput(backend.library_state, None, grad_output.data,
grad_input.data, result.data)
else:
grad_input = grad_output * (1 - result * result)
return grad_input, None
def backward(self, grad_output):
result, = self.saved_tensors
return grad_output * (1 - result * result)
class Sigmoid(InplaceFunction):
@staticmethod
def forward(ctx, i, inplace=False):
if inplace:
ctx.mark_dirty(i)
def forward(self, i):
if self.inplace:
self.mark_dirty(i)
result = i.sigmoid_()
else:
result = i.sigmoid()
ctx.save_for_backward(result)
self.save_for_backward(result)
return result
@staticmethod
def backward(ctx, grad_output):
result, = ctx.saved_variables
if grad_output.volatile:
grad_input = Variable(grad_output.data.new(grad_output.size()), volatile=True)
backend = type2backend[type(result.data)]
backend.Sigmoid_updateGradInput(backend.library_state, None, grad_output.data,
grad_input.data, result.data)
else:
grad_input = grad_output * ((1 - result) * result)
return grad_input, None
def backward(self, grad_output):
result, = self.saved_tensors
return grad_output * ((1 - result) * result)
class Sinh(Function):
@staticmethod
def forward(ctx, i):
ctx.save_for_backward(i)
def forward(self, i):
self.save_for_backward(i)
return i.sinh()
@staticmethod
def backward(ctx, grad_output):
i, = ctx.saved_variables
def backward(self, grad_output):
i, = self.saved_tensors
return grad_output * i.cosh()
class Cosh(Function):
@staticmethod
def forward(ctx, i):
ctx.save_for_backward(i)
def forward(self, i):
self.save_for_backward(i)
return i.cosh()
@staticmethod
def backward(ctx, grad_output):
i, = ctx.saved_variables
def backward(self, grad_output):
i, = self.saved_tensors
return grad_output * i.sinh()
class Abs(Function):
@staticmethod
def forward(ctx, i):
ctx.save_for_backward(i)
def forward(self, i):
self.save_for_backward(i)
return i.abs()
@staticmethod
def backward(ctx, grad_output):
i, = ctx.saved_variables
def backward(self, grad_output):
i, = self.saved_tensors
return grad_output * i.sign()
class Clamp(Function):
@staticmethod
def forward(ctx, i, min_val, max_val):
ctx._mask = (i.ge(min_val) * i.le(max_val))
return i.clamp(min_val, max_val)
def __init__(self, min_val, max_val):
super(Clamp, self).__init__()
self.min_val = min_val
self.max_val = max_val
@staticmethod
def backward(ctx, grad_output):
mask = Variable(ctx._mask.type_as(grad_output.data))
return grad_output * mask, None, None
def forward(self, i):
self.save_for_backward(i)
return i.clamp(self.min_val, self.max_val)
def backward(self, grad_output):
i, = self.saved_tensors
mask = i.ge(self.min_val) * i.le(self.max_val)
return grad_output * mask.type_as(grad_output)
class Sqrt(Function):
@staticmethod
def forward(ctx, i):
ctx.save_for_backward(i)
def forward(self, i):
self.save_for_backward(i)
return i.sqrt()
@staticmethod
def backward(ctx, grad_output):
i, = ctx.saved_variables
return grad_output.mul(i.pow(-0.5)).div_(2)
def backward(self, grad_output):
i, = self.saved_tensors
return grad_output.mul(i.pow(-0.5)).div(2)
class Sin(Function):
@staticmethod
def forward(ctx, i):
ctx.save_for_backward(i)
def forward(self, i):
self.save_for_backward(i)
return i.sin()
@staticmethod
def backward(ctx, grad_output):
i, = ctx.saved_variables
def backward(self, grad_output):
i, = self.saved_tensors
return grad_output * i.cos()
class Cos(Function):
@staticmethod
def forward(ctx, i):
ctx.save_for_backward(i)
def forward(self, i):
self.save_for_backward(i)
return i.cos()
@staticmethod
def backward(ctx, grad_output):
i, = ctx.saved_variables
def backward(self, grad_output):
i, = self.saved_tensors
return grad_output.mul(i.sin()).neg_()
class Tan(Function):
@staticmethod
def forward(ctx, i):
ctx.save_for_backward(i)
def forward(self, i):
self.save_for_backward(i)
return i.tan()
@staticmethod
def backward(ctx, grad_output):
i, = ctx.saved_variables
def backward(self, grad_output):
i, = self.saved_tensors
return grad_output.div(i.cos().pow(2))
class Asin(Function):
@staticmethod
def forward(ctx, i):
ctx.save_for_backward(i)
def forward(self, i):
self.save_for_backward(i)
return i.asin()
@staticmethod
def backward(ctx, grad_output):
i, = ctx.saved_variables
return grad_output * (1 - i.mul(i)).sqrt().reciprocal()
def backward(self, grad_output):
i, = self.saved_tensors
return grad_output * (1 - i.mul(i)).sqrt_().reciprocal_()
class Acos(Function):
@staticmethod
def forward(ctx, i):
ctx.save_for_backward(i)
def forward(self, i):
self.save_for_backward(i)
return i.acos()
@staticmethod
def backward(ctx, grad_output):
i, = ctx.saved_variables
return grad_output.mul((1 - i.mul(i)).sqrt().reciprocal()).neg_()
def backward(self, grad_output):
i, = self.saved_tensors
return grad_output.mul((1 - i.mul(i)).sqrt_().reciprocal_()).neg_()
class Atan(Function):
@staticmethod
def forward(ctx, i):
ctx.save_for_backward(i)
def forward(self, i):
self.save_for_backward(i)
return i.atan()
@staticmethod
def backward(ctx, grad_output):
i, = ctx.saved_variables
return grad_output * i.mul(i).add_(1).reciprocal()
def backward(self, grad_output):
i, = self.saved_tensors
return grad_output * i.mul(i).add_(1).reciprocal_()
class Atan2(Function):
@staticmethod
def forward(ctx, y, x):
ctx.save_for_backward(y, x)
return y.atan2(x)
@staticmethod
def backward(ctx, grad_output):
y, x, = ctx.saved_variables
denominator = y.mul(y).add(x.mul(x)).reciprocal()
return grad_output * x.mul(denominator), grad_output * y.neg().mul(denominator)
# TODO: make inplace and update grad formulas
class Reciprocal(Function):
@staticmethod
def forward(ctx, i):
def forward(self, i):
result = i.reciprocal()
ctx.save_for_backward(result)
self.save_for_backward(result)
return result
@staticmethod
def backward(ctx, grad_output):
result, = ctx.saved_variables
def backward(self, grad_output):
result, = self.saved_tensors
return grad_output * result.mul(result).neg_()
class Cmax(Function):
@staticmethod
def forward(ctx, a, b):
ctx._a_size = a.size()
ctx._b_size = b.size()
ctx._mask = a.gt(b)
def forward(self, a, b):
self._max_buffer = a.gt(b).type_as(a)
return a.max(b)
@staticmethod
def backward(ctx, grad_output):
mask = Variable(ctx._mask.type_as(grad_output.data))
def backward(self, grad_output):
return (
maybe_unexpand(grad_output * mask, ctx._a_size),
maybe_unexpand_or_view(grad_output * Variable(ctx._mask.eq(0).type_as(grad_output.data)), ctx._b_size)
grad_output * self._max_buffer,
grad_output * self._max_buffer.eq(0).type_as(grad_output)
)
class CmaxConstant(Function):
@staticmethod
def forward(ctx, i, constant):
ctx._mask = i.gt(constant)
return i.clamp(min=constant)
def __init__(self, constant):
super(CmaxConstant, self).__init__()
self.constant = constant
@staticmethod
def backward(ctx, grad_output):
mask = Variable(ctx._mask.type_as(grad_output.data))
return grad_output * mask, None
def forward(self, i):
self._max_buffer = i.gt(self.constant).type_as(i)
return i.clamp(min=self.constant)
def backward(self, grad_output):
return grad_output * self._max_buffer
class Cmin(Function):
@staticmethod
def forward(ctx, a, b):
ctx._a_size = a.size()
ctx._b_size = b.size()
ctx._mask = a.lt(b).type_as(a)
def forward(self, a, b):
self._min_buffer = a.lt(b).type_as(a)
return a.min(b)
@staticmethod
def backward(ctx, grad_output):
mask = Variable(ctx._mask.type_as(grad_output.data))
def backward(self, grad_output):
return (
maybe_unexpand(grad_output * mask, ctx._a_size),
maybe_unexpand_or_view(grad_output * Variable(ctx._mask.eq(0).type_as(grad_output.data)), ctx._b_size)
grad_output * self._min_buffer,
grad_output * self._min_buffer.eq(0).type_as(grad_output)
)
class CminConstant(Function):
@staticmethod
def forward(ctx, i, constant):
ctx._mask = i.lt(constant)
return i.clamp(max=constant)
def __init__(self, constant):
super(CminConstant, self).__init__()
self.constant = constant
@staticmethod
def backward(ctx, grad_output):
mask = Variable(ctx._mask.type_as(grad_output.data))
return grad_output * mask, None
def forward(self, i):
self._min_buffer = i.lt(self.constant).type_as(i)
return i.clamp(max=self.constant)
def backward(self, grad_output):
return grad_output * self._min_buffer
class _ConstantGrad(Function):
grad_value = 0
@classmethod
def forward(cls, ctx, *args):
ctx._num_args = len(args)
ctx._args0_size = args[0].size()
return getattr(args[0], cls.__name__.lower())(*args[1:])
def __init__(self, *args):
super(_ConstantGrad, self).__init__()
self.args = args
@classmethod
def backward(cls, ctx, grad_output):
return (maybe_unexpand(grad_output.mul(cls.grad_value), ctx._args0_size),) + (ctx._num_args - 1) * (None,)
def forward(self, i):
return getattr(i, type(self).__name__.lower())(*self.args)
def backward(self, grad_output):
grad_input = grad_output.new(*repeat(1, grad_output.dim()))
grad_input = grad_input.fill_(self.grad_value).expand_as(grad_output)
return grad_input.mul(grad_output)
class Floor(_ConstantGrad):
@ -382,96 +313,91 @@ class Remainder(_ConstantGrad):
class Lerp(Function):
@staticmethod
def forward(ctx, a, b, weight):
ctx._a_size = a.size()
ctx._b_size = b.size()
ctx._weight = float(weight)
return a.lerp(b, ctx._weight)
def __init__(self, weight):
super(Lerp, self).__init__()
self.weight = float(weight)
@staticmethod
def backward(ctx, grad_output):
return (maybe_unexpand(grad_output.mul(1 - ctx._weight), ctx._a_size),
maybe_unexpand_or_view(grad_output.mul(ctx._weight), ctx._b_size), None)
def forward(self, a, b):
return a.lerp(b, self.weight)
def backward(self, grad_output):
return grad_output.mul(1 - self.weight), grad_output.mul(self.weight)
class Rsqrt(InplaceFunction):
@staticmethod
def forward(ctx, i, inplace=False):
if inplace:
ctx.mark_dirty(i)
result = i.rsqrt_()
def forward(self, input):
if self.inplace:
self.mark_dirty(input)
result = input.rsqrt_()
else:
result = i.rsqrt()
ctx.save_for_backward(result)
result = input.rsqrt()
self.save_for_backward(result)
return result
@staticmethod
def backward(ctx, grad_output):
result, = ctx.saved_variables
return result.pow(3).div_(-2).mul(grad_output), None
def backward(self, grad_output):
result, = self.saved_tensors
return result.pow(3).div_(-2).mul_(grad_output)
class Addcmul(InplaceFunction):
@staticmethod
def forward(ctx, add_tensor, mul_tensor1, mul_tensor2, scale=1.0, inplace=False):
ctx._scale = scale
ctx._add_tensor_size = add_tensor.size()
ctx.save_for_backward(mul_tensor1, mul_tensor2)
if inplace:
ctx.mark_dirty(add_tensor)
return add_tensor.addcmul_(scale, mul_tensor1, mul_tensor2)
def __init__(self, scale=1, inplace=False):
super(Addcmul, self).__init__(inplace)
self.scale = scale
def forward(self, add_tensor, mul_tensor1, mul_tensor2):
self.save_for_backward(mul_tensor1, mul_tensor2)
if self.inplace:
return add_tensor.addcmul_(self.scale, mul_tensor1, mul_tensor2)
else:
return add_tensor.addcmul(scale, mul_tensor1, mul_tensor2)
return add_tensor.addcmul(self.scale, mul_tensor1, mul_tensor2)
@staticmethod
def backward(ctx, grad_output):
def backward(self, grad_output):
grad_add = grad_mul1 = grad_mul2 = None
mul_tensor1, mul_tensor2 = ctx.saved_variables
mul_tensor1, mul_tensor2 = self.saved_tensors
if ctx.needs_input_grad[0]:
grad_add = maybe_unexpand(grad_output, ctx._add_tensor_size)
if self.needs_input_grad[0]:
grad_add = grad_output
if ctx.needs_input_grad[1]:
grad_mul1 = maybe_unexpand_or_view(grad_output.mul(mul_tensor2).mul_(ctx._scale), mul_tensor1.size())
if self.needs_input_grad[1]:
grad_mul1 = grad_output.mul(mul_tensor2).mul(self.scale)
if ctx.needs_input_grad[2]:
grad_mul2 = maybe_unexpand_or_view(grad_output.mul(mul_tensor1).mul_(ctx._scale), mul_tensor2.size())
if self.needs_input_grad[2]:
grad_mul2 = grad_output.mul(mul_tensor1).mul(self.scale)
return grad_add, grad_mul1, grad_mul2, None, None
return grad_add, grad_mul1, grad_mul2
class Addcdiv(InplaceFunction):
@staticmethod
def forward(ctx, add_tensor, div_tensor1, div_tensor2, scale=1.0, inplace=False):
ctx._scale = scale
ctx._add_tensor_size = add_tensor.size()
ctx.save_for_backward(div_tensor1, div_tensor2)
if inplace:
ctx.mark_dirty(add_tensor)
return add_tensor.addcdiv_(ctx._scale, div_tensor1, div_tensor2)
def __init__(self, scale=1, inplace=False):
super(Addcdiv, self).__init__(inplace)
self.scale = scale
def forward(self, add_tensor, div_tensor1, div_tensor2):
self.save_for_backward(div_tensor1, div_tensor2)
if self.inplace:
return add_tensor.addcdiv_(self.scale, div_tensor1, div_tensor2)
else:
return add_tensor.addcdiv(ctx._scale, div_tensor1, div_tensor2)
return add_tensor.addcdiv(self.scale, div_tensor1, div_tensor2)
@staticmethod
def backward(ctx, grad_output):
def backward(self, grad_output):
grad_add = grad_div1 = grad_div2 = None
div_tensor1, div_tensor2 = ctx.saved_variables
div_tensor1, div_tensor2 = self.saved_tensors
if ctx.needs_input_grad[0]:
grad_add = maybe_unexpand(grad_output, ctx._add_tensor_size)
if self.needs_input_grad[0]:
grad_add = grad_output
if ctx.needs_input_grad[1]:
grad_div1 = maybe_unexpand_or_view(grad_output.div(div_tensor2).mul_(ctx._scale), div_tensor1.size())
if self.needs_input_grad[1]:
grad_div1 = grad_output.div(div_tensor2).mul(self.scale)
if ctx.needs_input_grad[2]:
if self.needs_input_grad[2]:
div_tensor2_sq = div_tensor2.mul(div_tensor2)
grad_div2 = maybe_unexpand_or_view(grad_output.mul(div_tensor1).div(div_tensor2_sq).mul(-ctx._scale),
div_tensor2.size())
grad_div2 = grad_output.mul(div_tensor1).div_(div_tensor2_sq)
grad_div2.neg_().mul_(self.scale)
return grad_add, grad_div1, grad_div2
return grad_add, grad_div1, grad_div2, None, None
# TODO: atan2 + inplace

View File

@ -1,141 +1,110 @@
from functools import reduce
from ..function import Function
from ..variable import Variable
import torch
class Sum(Function):
class _DimReduceFunction(Function):
@staticmethod
def forward(ctx, input, dim=None, keepdim=None):
ctx.dim = dim
ctx.keepdim = False if keepdim is None else keepdim
ctx.input_size = input.size()
if dim is None:
return input.new((input.sum(),))
def __init__(self, dim=None):
super(_DimReduceFunction, self).__init__()
self.dim = dim
def forward(self, input):
self.input_size = input.size()
fn = getattr(input, self.fn_name)
if self.dim is None:
return input.new((fn(),))
else:
if keepdim is not None:
return input.sum(dim, keepdim=keepdim)
else:
return input.sum(dim)
return fn(self.dim)
@staticmethod
def backward(ctx, grad_output):
if ctx.dim is None:
return grad_output.expand(ctx.input_size), None, None
class Sum(_DimReduceFunction):
fn_name = 'sum'
def backward(self, grad_output):
if self.dim is None:
return grad_output.new(self.input_size).fill_(grad_output[0])
else:
if ctx.keepdim is False and len(ctx.input_size) != 1:
grad_output = grad_output.unsqueeze(ctx.dim)
repeats = [1 for _ in ctx.input_size]
repeats[ctx.dim] = ctx.input_size[ctx.dim]
return grad_output.repeat(*repeats), None, None
repeats = [1 for _ in self.input_size]
repeats[self.dim] = self.input_size[self.dim]
return grad_output.repeat(*repeats),
class Prod(Function):
class Prod(_DimReduceFunction):
@staticmethod
def forward(ctx, input, dim=None, keepdim=None):
ctx.dim = dim
ctx.keepdim = False if keepdim is None else keepdim
ctx.input_size = input.size()
if dim is None:
ctx.result = input.prod()
ctx.save_for_backward(input)
return input.new((ctx.result,))
def forward(self, input):
self.input_size = input.size()
if self.dim is None:
self.result = input.prod()
self.save_for_backward(input)
return input.new((self.result,))
else:
if keepdim is not None:
output = input.prod(dim, keepdim=keepdim)
else:
output = input.prod(dim)
ctx.save_for_backward(input, output)
output = input.prod(self.dim)
self.save_for_backward(input, output)
return output
@staticmethod
def backward(ctx, grad_output):
def safe_zeros_backward(inp, dim):
# note that the gradient is equivalent to:
# cumprod(exclusive, normal) * cumprod(exclusive, reverse), e.g.:
# input: [ a, b, c]
# cumprod(exclusive, normal): [1 , a, a * b]
# cumprod(exclusive, reverse): [b * c, c, 1]
# product: [b * c, a * c, a * b]
# and this is safe under input with 0s.
if inp.size(dim) == 1:
return grad_output
def backward(self, grad_output):
if self.dim is None:
input, = self.saved_tensors
zero_idx = (input == 0).nonzero()
if zero_idx.dim() == 0:
return grad_output.mul(self.result).expand_as(input).div(input)
elif zero_idx.size(0) > 1:
return grad_output.new(self.input_size).zero_()
else:
grad_input = grad_output.new(self.input_size).zero_()
zero_idx = tuple(zero_idx[0].cpu())
input_copy = input.clone()
input_copy[zero_idx] = 1.
grad_input[zero_idx] = grad_output[0] * input_copy.prod()
return grad_input
else:
input, output = self.saved_tensors
dim = self.dim if self.dim >= 0 else self.dim + input.dim()
zero_mask = input == 0
slice_zero_count = zero_mask.sum(dim)
total_zeros = slice_zero_count.sum()
grad_input = grad_output.mul(output).expand_as(input).div(input)
if total_zeros == 0:
return grad_input
ones_size = torch.Size((inp.size()[:dim] + (1,) + inp.size()[dim + 1:]))
ones = Variable(grad_output.data.new(ones_size).fill_(1))
exclusive_normal_nocp = torch.cat((ones, inp.narrow(dim, 0, inp.size(dim) - 1)), dim)
exclusive_normal = exclusive_normal_nocp.cumprod(dim)
some_zeros = slice_zero_count.gt(0).expand_as(grad_input)
grad_input[some_zeros] = 0
def reverse_dim(var, dim):
return var.index_select(dim, Variable(torch.arange(var.size(dim) - 1, -1, -1)).long())
single_zero_idx = slice_zero_count.eq(1).nonzero()
narrow_reverse = reverse_dim(inp.narrow(dim, 1, inp.size(dim) - 1), dim)
exclusive_reverse_nocp = torch.cat((ones, narrow_reverse), dim)
exclusive_reverse = reverse_dim(exclusive_reverse_nocp.cumprod(dim), dim)
if len(single_zero_idx) == 0:
return grad_input
for idx in single_zero_idx:
idx_tuple = tuple(idx.cpu())
input_idx_tuple = idx_tuple[:dim] + (slice(0, None),) + idx_tuple[dim + 1:]
# slice_mask and input_copy are 1D
slice_mask = zero_mask[input_idx_tuple]
input_copy = input[input_idx_tuple].clone()
zero_idx = slice_mask.nonzero()[0, 0]
input_copy[zero_idx] = 1.
grad_idx_tuple = idx_tuple[:dim] + (zero_idx,) + idx_tuple[dim + 1:]
grad_input[grad_idx_tuple] = grad_output[idx_tuple] * input_copy.prod()
grad_input = grad_output.expand_as(exclusive_normal).mul(exclusive_normal.mul(exclusive_reverse))
return grad_input
if ctx.dim is None:
input, = ctx.saved_variables
zero_idx = (input.data == 0).nonzero()
if zero_idx.dim() == 0:
return grad_output.mul(ctx.result).expand_as(input).div(input), None, None
elif zero_idx.size(0) > 1:
return (grad_output * 0).expand_as(input), None, None
else:
return safe_zeros_backward(input.contiguous().view(-1), 0).view_as(input), None, None
class Mean(_DimReduceFunction):
fn_name = 'mean'
def backward(self, grad_output):
if self.dim is None:
grad_input_val = grad_output[0]
grad_input_val /= reduce(lambda x, y: x * y, self.input_size, 1)
return grad_output.new(*self.input_size).fill_(grad_input_val)
else:
input, output = ctx.saved_variables
dim = ctx.dim if ctx.dim >= 0 else ctx.dim + input.dim()
if ctx.keepdim is False and len(ctx.input_size) != 1:
grad_output = grad_output.unsqueeze(dim)
output = output.unsqueeze(dim)
zero_mask = input == 0
slice_zero_count = zero_mask.sum(dim, True)
total_zeros = slice_zero_count.data.sum()
if total_zeros == 0:
grad_input = grad_output.mul(output).expand_as(input).div(input)
else:
grad_input = safe_zeros_backward(input, dim)
return grad_input, None, None
class Mean(Function):
@staticmethod
def forward(ctx, input, dim=None, keepdim=None):
ctx.dim = dim
ctx.keepdim = False if keepdim is None else keepdim
ctx.input_size = input.size()
if dim is None:
return input.new((input.mean(),))
else:
if keepdim is not None:
return input.mean(dim, keepdim=keepdim)
else:
return input.mean(dim)
@staticmethod
def backward(ctx, grad_output):
if ctx.dim is None:
grad_input_val = grad_output / reduce(lambda x, y: x * y, ctx.input_size, 1)
return grad_input_val.expand(ctx.input_size), None, None
else:
if ctx.keepdim is False and len(ctx.input_size) != 1:
grad_output = grad_output.unsqueeze(ctx.dim)
repeats = [1 for _ in ctx.input_size]
dim_size = ctx.input_size[ctx.dim]
repeats[ctx.dim] = dim_size
return grad_output.repeat(*repeats).div_(dim_size), None, None
repeats = [1 for _ in self.input_size]
dim_size = self.input_size[self.dim]
repeats[self.dim] = dim_size
return grad_output.repeat(*repeats).div_(dim_size)
class _SelectionFunction(Function):
@ -143,53 +112,44 @@ class _SelectionFunction(Function):
# additional_args is prepended before dim when calling the tensor
# function. It's a no-op for subclasses other than kthvalue.
# kthvalue not only requires us to pass a dim, but also precede it with k.
additional_args = tuple()
@classmethod
def forward(cls, ctx, input, dim=None, keepdim=None, additional_args=tuple()):
fn = getattr(input, cls.__name__.lower())
ctx.dim = dim
ctx.keepdim = False if keepdim is None else keepdim
ctx.additional_args = additional_args
ctx.input_size = input.size()
if ctx.dim is None and cls.has_all_reduce:
value = fn(*additional_args)
ctx.indices_tuple = tuple(input.eq(value).nonzero()[0])
def __init__(self, dim=None):
super(_SelectionFunction, self).__init__()
self.dim = dim
def forward(self, input):
fn = getattr(input, type(self).__name__.lower())
self.input_size = input.size()
if self.dim is None and self.has_all_reduce:
value = fn(*self.additional_args)
self.indices = tuple(input.eq(value).nonzero()[0])
return input.new((value,))
else:
if ctx.dim is None:
if self.dim is None:
dim = input.dim() - 1
else:
dim = ctx.dim
dim = self.dim
args = (dim,)
if additional_args:
args = additional_args + args
if keepdim is not None:
output, indices = fn(*args, keepdim=keepdim)
else:
output, indices = fn(*args)
ctx.save_for_backward(indices)
ctx.mark_non_differentiable(indices)
if self.additional_args:
args = self.additional_args + args
output, indices = fn(*args)
self.save_for_backward(indices)
self.mark_non_differentiable(indices)
return output, indices
@classmethod
def backward(cls, ctx, grad_output, grad_indices=None):
grad_input = Variable(grad_output.data.new(*ctx.input_size).zero_())
if ctx.dim is None and cls.has_all_reduce:
grad_input[ctx.indices_tuple] = grad_output
def backward(self, grad_output, grad_indices=None):
grad_input = grad_output.new(*self.input_size).zero_()
if self.dim is None and self.has_all_reduce:
grad_input[self.indices] = grad_output[0]
else:
if ctx.dim is None:
dim = len(ctx.input_size) - 1
if self.dim is None:
dim = len(self.input_size) - 1
else:
dim = ctx.dim
indices, = ctx.saved_variables
if ctx.keepdim is False and len(ctx.input_size) != 1:
grad_output = grad_output.unsqueeze(dim)
grad_indices = grad_indices.unsqueeze(dim)
indices = indices.unsqueeze(dim)
dim = self.dim
indices, = self.saved_tensors
grad_input.scatter_(dim, indices, grad_output)
return grad_input, None, None, None
return grad_input
class Max(_SelectionFunction):
@ -205,63 +165,53 @@ class Mode(_SelectionFunction):
class Median(_SelectionFunction):
pass
has_all_reduce = False
class Kthvalue(_SelectionFunction):
has_all_reduce = False
@classmethod
def forward(cls, ctx, input, k, dim=None, keepdim=None):
return super(Kthvalue, cls).forward(ctx, input, dim, keepdim, (k,))
def __init__(self, k, dim=None):
super(Kthvalue, self).__init__(dim)
self.additional_args = (k,)
class Norm(Function):
@staticmethod
def forward(ctx, input, p=2, dim=None, keepdim=None):
ctx.p = p
ctx.dim = dim
ctx.keepdim = False if keepdim is None else keepdim
def __init__(self, norm_type=2, dim=None):
super(Norm, self).__init__()
self.norm_type = norm_type
self.dim = dim
if dim is None:
ctx.norm = input.norm(p)
ctx.save_for_backward(input)
return input.new((ctx.norm,))
def forward(self, input):
if self.dim is None:
self.norm = input.norm(self.norm_type)
self.save_for_backward(input)
return input.new((self.norm,))
else:
if keepdim is not None:
output = input.norm(p, dim, keepdim=keepdim)
else:
output = input.norm(p, dim)
ctx.save_for_backward(input, output)
output = input.norm(self.norm_type, self.dim)
self.save_for_backward(input, output)
return output
@staticmethod
def backward(ctx, grad_output):
if ctx.dim is None:
input, = ctx.saved_variables
if ctx.p == 2:
scale_v = (grad_output / ctx.norm).expand_as(input)
return input.mul(scale_v), None, None, None
def backward(self, grad_output):
if self.dim is None:
input, = self.saved_tensors
if self.norm_type == 2:
return input.mul(grad_output[0] / self.norm)
else:
pow = input.abs().pow(ctx.p - 2)
scale_v = (grad_output / ctx.norm ** (ctx.p - 1)).expand_as(input)
return input.mul(pow).mul(scale_v), None, None, None
pow = input.abs().pow(self.norm_type - 2)
scale = grad_output[0] / self.norm ** (self.norm_type - 1)
return input.mul(pow).mul(scale)
else:
input, output = ctx.saved_variables
if ctx.keepdim is False and input.dim() != 1:
grad_output = grad_output.unsqueeze(ctx.dim)
output = output.unsqueeze(ctx.dim)
input, output = self.saved_tensors
big_grad_output = grad_output.expand_as(input)
if ctx.p == 2:
if self.norm_type == 2:
big_output = output.expand_as(input)
return input.mul(big_grad_output).div(big_output), None, None, None
return input.mul(big_grad_output).div(big_output)
else:
pow = input.abs().pow(ctx.p - 2)
big_output = output.pow(ctx.p - 1).expand_as(input)
return input.mul(pow).mul(big_grad_output).div(big_output), None, None, None
pow = input.abs().pow(self.norm_type - 2)
big_output = output.pow(self.norm_type - 1).expand_as(input)
return input.mul(pow).mul(big_grad_output).div(big_output)
# TODO: renorm
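
The `safe_zeros_backward` comment in `Prod.backward` above describes the gradient of a product as the elementwise product of an exclusive forward cumprod and an exclusive reverse cumprod. The following is a minimal, self-contained sketch of that identity on plain tensors; the values are arbitrary and this is an illustration, not the diffed code itself:

```
import torch

# For x = [a, b, c], d prod(x) / dx = [b*c, a*c, a*b]. Computing it as
# cumprod(exclusive, normal) * cumprod(exclusive, reverse) avoids dividing
# prod(x) by x, so it stays well defined when x contains zeros.
x = torch.Tensor([2.0, 0.0, 5.0])
n = x.size(0)

ones = torch.ones(1)
rev_idx = torch.arange(n - 1, -1, -1).long()

exclusive_normal = torch.cat([ones, x[:n - 1]], 0).cumprod(0)        # [1, a, a*b]
reversed_tail = x.index_select(0, rev_idx)[:n - 1]                   # [c, b]
exclusive_reverse = torch.cat([ones, reversed_tail], 0).cumprod(0)   # [1, c, c*b]
exclusive_reverse = exclusive_reverse.index_select(0, rev_idx)       # [b*c, c, 1]

grad = exclusive_normal * exclusive_reverse                          # [b*c, a*c, a*b]
print(grad)  # -> 0, 10, 0 for x = [2, 0, 5]
```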

View File

@ -1,3 +0,0 @@
%s/self/ctx/g
%s/\s\+def forward/ @staticmethod\r def forward/g
%s/\s\+def backward/ @staticmethod\r @once_differentiable\r def backward/g

View File

@ -23,9 +23,8 @@ class Multinomial(StochasticFunction):
if probs.dim() == 1:
probs = probs.unsqueeze(0)
samples = samples.unsqueeze(0)
reward = reward.unsqueeze(0)
# normalize probs (multinomial accepts weights)
probs /= probs.sum(1, True).expand_as(probs)
probs /= probs.sum(1).expand_as(probs)
grad_probs = probs.new().resize_as_(probs).zero_()
output_probs = probs.gather(1, samples)
output_probs.add_(1e-6).reciprocal_()

File diff suppressed because it is too large

View File

@ -1,39 +0,0 @@
import torch
def maybe_view(variable, size):
if variable.size() == size:
return variable
return variable.contiguous().view(size)
def maybe_unexpand(variable, old_size):
num_unsqueezed = variable.dim() - len(old_size)
expanded_dims = [dim for dim, (expanded, original)
in enumerate(zip(variable.size()[num_unsqueezed:], old_size))
if expanded != original]
for _ in range(num_unsqueezed):
variable = variable.sum(0, keepdim=False)
for dim in expanded_dims:
variable = variable.sum(dim, keepdim=True)
return variable
def variable_expandable(variable, old_size):
try:
torch._C._infer_size(variable.size(), old_size)
except RuntimeError:
return False
return True
def maybe_unexpand_or_view(variable, old_size):
var_expanded = True
if maybe_view:
var_expanded = variable_expandable(variable, old_size)
if var_expanded:
return maybe_unexpand(variable, old_size)
else:
return maybe_view(variable, old_size)

85
torch/autograd/engine.py Normal file
View File

@ -0,0 +1,85 @@
from collections import deque, defaultdict
from torch._C import _ImperativeEngine as ImperativeEngine
from .variable import Variable
class BasicEngine(object):
def _compute_dependencies(self, function):
dependencies = defaultdict(int)
seen = {function}
queue = [function]
while len(queue) > 0:
fn = queue.pop()
for prev_fn, output_nr in fn.previous_functions:
if not prev_fn.requires_grad or isinstance(prev_fn, Variable):
continue
dependencies[prev_fn] += 1
if prev_fn not in seen:
queue.append(prev_fn)
seen.add(prev_fn)
return dependencies
def _free_backward_dependency(self, dependencies, prev_fn):
dependencies[prev_fn] -= 1
if dependencies[prev_fn] == 0:
del dependencies[prev_fn]
return True
return False
def _add_grad(self, need_copy, prev_grad, output_nr, d_prev_fn):
copy_id = (id(prev_grad), output_nr)
if not prev_grad[output_nr]:
prev_grad[output_nr] = d_prev_fn
need_copy.add(copy_id)
else:
grad_tensor = prev_grad[output_nr]
if copy_id in need_copy:
need_copy.remove(copy_id)
grad_tensor = grad_tensor.clone()
prev_grad[output_nr] = grad_tensor
grad_tensor.add_(d_prev_fn)
def run_backward(self, variable, grad, retain_variables):
if variable.creator is None:
variable._do_backward((grad,), retain_variables)
return
initial_grad = [None for _ in range(variable.creator.num_outputs)]
initial_grad[variable.output_nr] = grad
ready = deque([(variable.creator, initial_grad)])
not_ready = {}
need_copy = set()
dependencies = self._compute_dependencies(variable.creator)
while len(ready) > 0:
fn, grad = ready.pop()
grad_input = fn._do_backward(tuple(grad), retain_variables)
for (prev_fn, output_nr), d_prev_fn in zip(fn.previous_functions, grad_input):
if not prev_fn.requires_grad:
# TODO: check that d_prev_fn is None and warn otherwise
continue
if isinstance(prev_fn, Variable):
prev_fn._do_backward((d_prev_fn,), retain_variables)
continue
is_ready = self._free_backward_dependency(dependencies, prev_fn)
if is_ready:
if prev_fn in not_ready:
prev_grad = not_ready[prev_fn]
self._add_grad(need_copy, prev_grad, output_nr, d_prev_fn)
else:
if prev_fn.num_outputs != 1:
raise RuntimeError("one of the function outputs "
"wasn't used - this is an error not, but "
"it's going to be fixed soon")
prev_grad = (d_prev_fn,)
ready.appendleft((prev_fn, prev_grad))
else:
if prev_fn in not_ready:
prev_grad = not_ready[prev_fn]
else:
prev_grad = [None for _ in range(prev_fn.num_outputs)]
self._add_grad(need_copy, prev_grad, output_nr, d_prev_fn)
not_ready[prev_fn] = prev_grad

View File

@ -1,12 +1,47 @@
import torch
import torch._C as _C
import torch.utils.hooks as hooks
from torch._six import with_metaclass
import functools
from collections import OrderedDict
class _ContextMethodMixin(object):
class Function(_C._FunctionBase):
"""Records operation history and defines formulas for differentiating ops.
Every operation performed on :class:`Variable` s creates a new function
object, that performs the computation, and records that it happened.
The history is retained in the form of a DAG of functions, with edges
denoting data dependencies (``input <- output``). Then, when backward is
called, the graph is processed in the topological ordering, by calling
:func:`backward` methods of each :class:`Function` object, and passing
returned gradients on to next :class:`Function` s.
Normally, the only way users interact with functions is by creating
subclasses and defining new operations. This is a recommended way of
extending torch.autograd.
Since Function logic is a hotspot in most scripts, almost all of it
was moved to our C backend, to ensure that the framework overhead is
minimal.
Each function is meant to be used only once (in the forward pass).
Attributes:
saved_tensors: Tuple of Tensors that were saved in the call to
:func:`forward`.
needs_input_grad: Tuple of booleans of length :attr:`num_inputs`,
indicating whether a given input requires gradient. This can be
used to optimize buffers saved for backward, and ignoring gradient
computation in :func:`~Function.backward`.
num_inputs: Number of inputs given to :func:`forward`.
num_outputs: Number of tensors returned by :func:`forward`.
requires_grad: Boolean indicating whether the :func:`backward` will
ever need to be called.
previous_functions: Tuple of (int, Function) pairs of length
:attr:`num_inputs`. Each entry contains a reference to a
:class:`Function` that created corresponding input, and an index
of the previous function output that's been used.
"""
__call__ = _C._FunctionBase._do_forward
def save_for_backward(self, *tensors):
"""Saves given tensors for a future call to :func:`~Function.backward`.
@ -15,10 +50,9 @@ class _ContextMethodMixin(object):
:func:`forward` **method.**
Later, saved tensors can be accessed through the :attr:`saved_tensors`
attribute; or, if the corresponding Variable is needed (e.g. for double
backwards), those can be accessed through the :attr:`saved_variables`
attribute. Before returning them to the user, a check is made, to ensure
they weren't used in any in-place operation that modified their content.
attribute. Before returning them to the user, a check is made, to
ensure they weren't used in any in-place operation that modified
their content.
Arguments can also be ``None``.
"""
@ -31,7 +65,7 @@ class _ContextMethodMixin(object):
:func:`forward` **method, and all arguments should be inputs.**
Every tensor that's been modified in-place in a call to :func:`forward`
should be given to this function, to ensure correctness of our checks.
should be given to this function, to ensure correcness of our checks.
It doesn't matter whether the function is called before or after
modification.
"""
@ -72,9 +106,6 @@ class _ContextMethodMixin(object):
"""
self.non_differentiable = args
class _HookMixin(object):
@staticmethod
def _register_hook(backward_hooks, hook):
if backward_hooks is None:
@ -83,84 +114,7 @@ class _HookMixin(object):
backward_hooks[handle.id] = hook
return backward_hooks, handle
class BackwardCFunction(_C._FunctionBase, _ContextMethodMixin, _HookMixin):
_is_legacy = False
def apply(self, *args):
return self._forward_cls.backward(self, *args)
class FunctionMeta(type):
"""Function metaclass.
This metaclass sets up the following properties:
_is_legacy: True if forward is not defined as a static method.
_backward_cls: The Function class corresponding to the differentiated
version of this function (which is generated on the fly by this
metaclass).
"""
def __init__(cls, name, bases, attrs):
for super_cls in cls.mro():
forward = super_cls.__dict__.get('forward')
if forward is not None:
has_static_forward = isinstance(forward, staticmethod) or isinstance(forward, classmethod)
break
setattr(cls, '_is_legacy', not has_static_forward)
# old-style functions
if not has_static_forward:
return super(FunctionMeta, cls).__init__(name, bases, attrs)
backward_fn = type(name + 'Backward', (BackwardCFunction,), {'_forward_cls': cls})
setattr(cls, '_backward_cls', backward_fn)
return super(FunctionMeta, cls).__init__(name, bases, attrs)
class Function(with_metaclass(FunctionMeta, _C._FunctionBase, _ContextMethodMixin, _HookMixin)):
"""Records operation history and defines formulas for differentiating ops.
Every operation performed on :class:`Variable` s creates a new function
object, that performs the computation, and records that it happened.
The history is retained in the form of a DAG of functions, with edges
denoting data dependencies (``input <- output``). Then, when backward is
called, the graph is processed in the topological ordering, by calling
:func:`backward` methods of each :class:`Function` object, and passing
returned gradients on to next :class:`Function` s.
Normally, the only way users interact with functions is by creating
subclasses and defining new operations. This is a recommended way of
extending torch.autograd.
Since Function logic is a hotspot in most scripts, almost all of it
was moved to our C backend, to ensure that the framework overhead is
minimal.
Each function is meant to be used only once (in the forward pass).
Attributes:
saved_tensors: Tuple of Tensors that were saved in the call to
:func:`forward`.
saved_variables: Tuple of Variables that correspond to the tensors
saved in the call to :func:`forward`.
needs_input_grad: Tuple of booleans of length :attr:`num_inputs`,
indicating whether a given input requires gradient. This can be
used to optimize buffers saved for backward, and ignoring gradient
computation in :func:`~Function.backward`.
num_inputs: Number of inputs given to :func:`forward`.
num_outputs: Number of tensors returned by :func:`forward`.
requires_grad: Boolean indicating whether the :func:`backward` will
ever need to be called.
"""
# only for backward compatibility
__call__ = _C._FunctionBase._do_forward
@staticmethod
def forward(*args, **kwargs):
def forward(self, *input):
"""Performs the operation.
This function is to be overridden by all subclasses.
@ -169,8 +123,7 @@ class Function(with_metaclass(FunctionMeta, _C._FunctionBase, _ContextMethodMixi
"""
raise NotImplementedError
@staticmethod
def backward(*grad_outputs):
def backward(self, *grad_output):
"""Defines a formula for differentiating the operation.
This function is to be overridden by all subclasses.
@ -184,41 +137,6 @@ class Function(with_metaclass(FunctionMeta, _C._FunctionBase, _ContextMethodMixi
raise NotImplementedError
def once_differentiable(fn):
from .variable import Variable
@functools.wraps(fn)
def wrapper(ctx, *args):
tensor_args = [arg.data if isinstance(arg, Variable) else arg
for arg in args]
outputs = fn(ctx, *tensor_args)
# XXX: this is only an approximation of these flags - there's no way
# to figure out if fn didn't use ctx.saved_variables and as a result
# some Variables might require grad, even if no args do.
# Unfortunately, this leads to unexpected error messages ("no nodes
# require computing gradients"), but I don't have a better idea.
# These functions would raise an error in backward anyway.
volatile = any(arg.volatile if isinstance(arg, Variable) else False
for arg in args)
requires_grad = any(arg.requires_grad if isinstance(arg, Variable) else False
for arg in args)
if volatile:
def err_fn(*args):
return args
kwargs = {'volatile': True}
else:
err_fn = torch._C._functions.DelayedError(
b"trying to differentiate twice a function that was marked"
b"with @once_differentiable")
kwargs = {'requires_grad': requires_grad}
if not isinstance(outputs, tuple):
var = Variable(outputs, **kwargs) if outputs is not None else None
return err_fn(var)
return err_fn(*[Variable(o, **kwargs) if o is not None else None
for o in outputs])
return wrapper
class InplaceFunction(Function):
def __init__(self, inplace=False):
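
The docstrings and metaclass above describe two ways of writing an autograd Function: a legacy instance style (state on `self`, the instance itself is called) and a static-method style (state on `ctx`, invoked through `.apply`). Below is a minimal sketch of both, using a made-up scaling op that is not part of this diff; depending on which side of the diff you are on, only one of the two styles is actually available.

```
import torch
from torch.autograd import Function, Variable

# Static-method style: forward/backward take a context object; non-tensor
# state is stashed on ctx and tensors go through ctx.save_for_backward.
class Scale(Function):

    @staticmethod
    def forward(ctx, input, factor):
        ctx.factor = factor
        ctx.save_for_backward(input)
        return input.mul(factor)

    @staticmethod
    def backward(ctx, grad_output):
        # one gradient per forward input; None for the non-differentiable factor
        return grad_output * ctx.factor, None

# Legacy instance style: constructor arguments carry the state and the
# instance itself is called on Variables.
class ScaleLegacy(Function):

    def __init__(self, factor):
        super(ScaleLegacy, self).__init__()
        self.factor = factor

    def forward(self, input):
        return input.mul(self.factor)

    def backward(self, grad_output):
        return grad_output.mul(self.factor)

x = Variable(torch.randn(3), requires_grad=True)
y = Scale.apply(x, 2.0)      # static-method style
z = ScaleLegacy(2.0)(x)      # legacy style
```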

View File

@ -1,41 +1,31 @@
import torch
from torch.autograd import Variable
from collections import Iterable
def iter_variables(x):
def iter_gradients(x):
if isinstance(x, Variable):
if x.requires_grad:
yield (x.grad.data, x.data) if x.grad is not None else (None, None)
elif isinstance(x, Iterable):
yield x.grad.data if x.grad is not None else None
else:
for elem in x:
for result in iter_variables(elem):
for result in iter_gradients(elem):
yield result
def zero_gradients(x):
if isinstance(x, Variable):
if x.grad is not None:
x.grad.detach_()
x.grad.data.zero_()
elif isinstance(x, Iterable):
for elem in x:
zero_gradients(elem)
def zero_gradients(i):
for t in iter_gradients(i):
if t is not None:
t.zero_()
def make_jacobian(input, num_out):
if isinstance(input, Variable) and not input.requires_grad:
return None
elif torch.is_tensor(input) or isinstance(input, Variable):
if torch.is_tensor(input) or isinstance(input, Variable):
return torch.zeros(input.nelement(), num_out)
elif isinstance(input, Iterable):
jacobians = list(filter(
lambda x: x is not None, (make_jacobian(elem, num_out) for elem in input)))
if not jacobians:
return None
return type(input)(jacobians)
else:
return None
return type(input)(filter(lambda x: x is not None,
(make_jacobian(elem, num_out) for elem in input)))
def iter_tensors(x, only_requiring_grad=False):
@ -44,7 +34,7 @@ def iter_tensors(x, only_requiring_grad=False):
elif isinstance(x, Variable):
if x.requires_grad or not only_requiring_grad:
yield x.data
elif isinstance(x, Iterable):
else:
for elem in x:
for result in iter_tensors(elem, only_requiring_grad):
yield result
@ -55,9 +45,8 @@ def contiguous(input):
return input.contiguous()
elif isinstance(input, Variable):
return input.contiguous()
elif isinstance(input, Iterable):
else:
return type(input)(contiguous(e) for e in input)
return input
def get_numerical_jacobian(fn, input, target, eps=1e-3):
@ -81,9 +70,9 @@ def get_numerical_jacobian(fn, input, target, eps=1e-3):
for i in range(flat_tensor.nelement()):
orig = flat_tensor[i]
flat_tensor[i] = orig - eps
outa.copy_(fn(input), broadcast=False)
outa.copy_(fn(input))
flat_tensor[i] = orig + eps
outb.copy_(fn(input), broadcast=False)
outb.copy_(fn(input))
flat_tensor[i] = orig
outb.add_(-1, outa).div_(2 * eps)
@ -94,31 +83,21 @@ def get_numerical_jacobian(fn, input, target, eps=1e-3):
def get_analytical_jacobian(input, output):
jacobian = make_jacobian(input, output.numel())
jacobian_reentrant = make_jacobian(input, output.numel())
grad_output = output.data.clone().zero_()
flat_grad_output = grad_output.view(-1)
reentrant = True
correct_grad_sizes = True
for i in range(flat_grad_output.numel()):
flat_grad_output.zero_()
flat_grad_output[i] = 1
for jacobian_c in (jacobian, jacobian_reentrant):
zero_gradients(input)
output.backward(grad_output, create_graph=True)
for jacobian_x, (d_x, x) in zip(jacobian_c, iter_variables(input)):
if d_x is None:
jacobian_x[:, i].zero_()
else:
if d_x.size() != x.size():
correct_grad_sizes = False
jacobian_x[:, i] = d_x.to_dense() if d_x.is_sparse else d_x
zero_gradients(input)
output.backward(grad_output, retain_variables=True)
for jacobian_x, d_x in zip(jacobian, iter_gradients(input)):
if d_x is None:
jacobian_x[:, i].zero_()
else:
jacobian_x[:, i] = d_x.to_dense() if d_x.is_sparse else d_x
for jacobian_x, jacobian_reentrant_x in zip(jacobian, jacobian_reentrant):
if (jacobian_x - jacobian_reentrant_x).abs().max() != 0:
reentrant = False
return jacobian, reentrant, correct_grad_sizes
return jacobian
def _as_tuple(x):
@ -161,65 +140,21 @@ def gradcheck(func, inputs, eps=1e-6, atol=1e-5, rtol=1e-3):
def fn(input):
return _as_tuple(func(*input))[i].data
analytical, reentrant, correct_grad_sizes = get_analytical_jacobian(_as_tuple(inputs), o)
numerical = get_numerical_jacobian(fn, inputs, inputs, eps)
analytical = get_analytical_jacobian(_as_tuple(inputs), o)
for a, n in zip(analytical, numerical):
if not ((a - n).abs() <= (atol + rtol * n.abs())).all():
return False
if not reentrant:
return False
if not correct_grad_sizes:
return False
# check if the backward multiplies by grad_output
zero_gradients(inputs)
output = _as_tuple(func(*inputs))
torch.autograd.backward(output, [o.data.new(o.size()).zero_() for o in output])
var_inputs = list(filter(lambda i: isinstance(i, Variable), inputs))
if not var_inputs:
raise RuntimeError("no Variables found in input")
for i in var_inputs:
for i in inputs:
if i.grad is None:
continue
if not i.grad.data.eq(0).all():
return False
return True
def gradgradcheck(func, inputs, grad_outputs, eps=1e-6, atol=1e-5, rtol=1e-3):
"""Check gradients of gradients computed via small finite differences
against analytical gradients
This function checks that backpropagating through the gradients computed
with respect to the given grad_outputs is correct.
The check between numerical and analytical has the same behaviour as
numpy.allclose https://docs.scipy.org/doc/numpy/reference/generated/numpy.allclose.html
meaning it checks that
absolute(a - n) <= (atol + rtol * absolute(n))
is true for all elements of analytical gradient a and numerical gradient n.
Args:
func: Python function that takes Variable inputs and returns
a tuple of Variables
inputs: tuple of Variables
grad_outputs: tuple of Variables
eps: perturbation for finite differences
atol: absolute tolerance
rtol: relative tolerance
Returns:
True if all differences satisfy allclose condition
"""
def new_func(*input_args):
input_args = input_args[:-len(grad_outputs)]
outputs = func(*input_args)
outputs = _as_tuple(outputs)
input_args = tuple(x for x in input_args if isinstance(x, Variable) and x.requires_grad)
grad_inputs = torch.autograd.grad(outputs, input_args, grad_outputs)
return grad_inputs
return gradcheck(new_func, inputs + grad_outputs, eps, atol, rtol)
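
As a quick usage sketch for the checkers above (the function under test is arbitrary, and double precision keeps the finite differences accurate), `gradcheck` compares the analytical Jacobian against central finite differences using the allclose-style criterion quoted in the docstring; on the side of this diff that has `gradgradcheck`, second derivatives can be checked the same way by also passing explicit grad_outputs.

```
import torch
from torch.autograd import Variable
from torch.autograd.gradcheck import gradcheck

# |a - n| <= atol + rtol * |n| must hold for every element of the analytical
# gradient a and the numerical gradient n.
def fn(x, w):
    return (x.mm(w).tanh(),)

inputs = (Variable(torch.randn(4, 3).double(), requires_grad=True),
          Variable(torch.randn(3, 2).double(), requires_grad=True))

print(gradcheck(fn, inputs, eps=1e-6, atol=1e-4))  # True if gradients match
```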

View File

@ -1,11 +1,10 @@
import sys
import torch
import torch._C as _C
from collections import OrderedDict
import torch.sparse as sparse
import torch.utils.hooks as hooks
import warnings
import weakref
from ._functions import *
class Variable(_C._VariableBase):
@ -14,7 +13,7 @@ class Variable(_C._VariableBase):
Variable is a thin wrapper around a Tensor object, that also holds
the gradient w.r.t. to it, and a reference to a function that created it.
This reference allows retracing the whole chain of operations that
created the data. If the Variable has been created by the user, its grad_fn
created the data. If the Variable has been created by the user, its creator
will be ``None`` and we call such objects *leaf* Variables.
Since autograd only supports scalar valued function differentiation, grad
@ -34,9 +33,8 @@ class Variable(_C._VariableBase):
inference mode, i.e. don't save the history. See
:ref:`excluding-subgraphs` for more details.
Can be changed only on leaf Variables.
is_leaf: Boolean indicating if the Variable is a graph leaf (i.e
if it was created by the user).
grad_fn: Gradient function graph trace.
creator: Function of which the variable was an output. For leaf
(user created) variables it's ``None``. Read-only attribute.
Parameters:
data (any tensor class): Tensor to wrap.
@ -62,30 +60,29 @@ class Variable(_C._VariableBase):
def __getattr__(self, name):
if name in self._fallthrough_methods:
return getattr(self.data, name)
return object.__getattribute__(self, name)
raise AttributeError(name)
def __getitem__(self, key):
if torch.is_tensor(key):
key = Variable(key) # auto-wrap tensors
if isinstance(key, Variable):
if type(key.data).__name__ == 'ByteTensor':
return MaskedSelect.apply(self, key)
elif type(key.data).__name__ == 'LongTensor':
return IndexSelect.apply(self, 0, key)
# else fall through and raise an error in Index
return Index.apply(self, key)
if (isinstance(key, Variable) and
type(key.data).__name__ == 'ByteTensor'):
return MaskedSelect()(self, key)
return Index(key)(self)
def __setitem__(self, key, value):
if isinstance(key, Variable) and type(key.data).__name__ == 'ByteTensor':
if (isinstance(key, Variable) and
type(key.data).__name__ == 'ByteTensor'):
if isinstance(value, Variable):
return MaskedScatter.apply(self, key, value, True)
return MaskedCopy(inplace=True)(self, key, value)
else:
return MaskedFill.apply(self, key, value, True)
return MaskedFill(value, inplace=True)(self, key)
else:
return SetItem.apply(self, key, value)
if isinstance(value, Variable):
return SetItem(key)(self, value)
else:
return SetItem(key, value)(self)
def __deepcopy__(self, memo):
if not self.is_leaf:
if self.creator is not None:
raise RuntimeError("Only Variables created explicitly by the user "
"(graph leaves) support the deepcopy protocol at the moment")
result = type(self)(self.data.clone())
@ -109,22 +106,14 @@ class Variable(_C._VariableBase):
# legacy serialization of Variable
self.data = state[0]
state = (state[3], state[4], state[2])
if not self.is_leaf:
if self.creator is not None:
raise RuntimeError('__setstate__ can be only called on leaf variables')
self.requires_grad, self.volatile, self._backward_hooks = state
def __repr__(self):
return 'Variable containing:' + self.data.__repr__()
def __bool__(self):
if self.data.numel() == 0:
return False
raise RuntimeError("bool value of Variable objects containing non-empty " +
torch.typename(self.data) + " is ambiguous")
__nonzero__ = __bool__
def backward(self, gradient=None, retain_graph=None, create_graph=None, retain_variables=None):
def backward(self, gradient=None, retain_variables=False):
"""Computes the gradient of current variable w.r.t. graph leaves.
The graph is differentiated using the chain rule. If the variable is
@ -133,27 +122,28 @@ class Variable(_C._VariableBase):
It should be a tensor of matching type and location, that contains
the gradient of the differentiated function w.r.t. ``self``.
This function accumulates gradients in the leaves - you might need to
zero them before calling it.
This function accumulates gradients in the leaves - you might need to zero
them before calling it.
Arguments:
grad_variables (Tensor, Variable or None): Gradient w.r.t. the
variable. If it is a tensor, it will be automatically converted
to a Variable that is volatile unless ``create_graph`` is True.
None values can be specified for scalar Variables or ones that
don't require grad. If a None value would be acceptable then
this argument is optional.
retain_graph (bool, optional): If False, the graph used to compute
the grads will be freed. Note that in nearly all cases setting
this option to True is not needed and often can be worked around
in a much more efficient way. Defaults to the value of
``create_graph``.
create_graph (bool, optional): If true, graph of the derivative will
be constructed, allowing to compute higher order derivative
products. Defaults to False, unless ``gradient`` is a volatile
Variable.
gradient (Tensor): Gradient of the differentiated function
w.r.t. the data. Required only if the data has more than one
element. Type and location should match these of ``self.data``.
retain_variables (bool): If ``True``, buffers necessary for computing
gradients won't be freed after use. It is only necessary to
specify ``True`` if you want to differentiate some subgraph multiple
times (in some cases it will be much more efficient to use
`autograd.backward`).
"""
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
if self.volatile:
raise RuntimeError('calling backward on a volatile variable')
if gradient is None and self.requires_grad:
if self.data.numel() != 1:
raise RuntimeError(
'backward should be called only on a scalar (i.e. 1-element tensor) '
'or with gradient w.r.t. the variable')
gradient = self.data.new().resize_as_(self.data).fill_(1)
self._execution_engine.run_backward((self,), (gradient,), retain_variables)
def register_hook(self, hook):
"""Registers a backward hook.
@ -187,8 +177,8 @@ class Variable(_C._VariableBase):
"doesn't require gradient")
if self._backward_hooks is None:
self._backward_hooks = OrderedDict()
if self.grad_fn is not None:
self.grad_fn._register_hook_dict(self)
if self.creator is not None:
self.creator._register_hook_dict(self)
handle = hooks.RemovableHandle(self._backward_hooks)
self._backward_hooks[handle.id] = hook
return handle
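
A quick sketch of the leaf-Variable / backward / hook workflow the docstrings above describe; the values are arbitrary and this is an illustration, not code from the diff:

```
import torch
from torch.autograd import Variable

x = Variable(torch.ones(2, 2), requires_grad=True)   # leaf Variable
y = (x * 3).sum()                                     # 1-element result

grads = []
h = x.register_hook(grads.append)   # hook receives the gradient w.r.t. x

y.backward()        # no explicit gradient needed for a 1-element Variable
print(x.grad)       # accumulated into the leaf; zero it before reusing
h.remove()
```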
@ -204,10 +194,10 @@ class Variable(_C._VariableBase):
reward(Tensor): Tensor with per-element rewards. It has to match
the device location and shape of Variable's data.
"""
if not isinstance(self.grad_fn, StochasticFunction):
if not isinstance(self.creator, StochasticFunction):
raise RuntimeError("reinforce() can be only called on outputs "
"of stochastic functions")
self.grad_fn._reinforce(reward)
self.creator._reinforce(reward)
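
A rough sketch of the `reinforce()` pattern documented above. It assumes the stochastic `multinomial` Variable method (backed by the `Multinomial` StochasticFunction that appears earlier in this diff) and `torch.nn.functional.softmax`; the logits and reward values are placeholders, not part of this diff:

```
import torch
import torch.nn.functional as F
from torch.autograd import Variable

logits = Variable(torch.zeros(1, 2), requires_grad=True)
probs = F.softmax(logits)            # action probabilities
action = probs.multinomial(1)        # stochastic output

reward = torch.Tensor([[1.0]])       # per-element reward, same shape/device as action.data
action.reinforce(reward)

# Stochastic outputs get their "gradient" from the reward, so None is passed.
torch.autograd.backward([action], [None])
print(logits.grad)
```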
def detach(self):
"""Returns a new Variable, detached from the current graph.
@ -222,61 +212,35 @@ class Variable(_C._VariableBase):
errors in correctness checks.
"""
result = NoGrad()(self) # this is needed, because it merges version counters
result._grad_fn = None
result._creator = None
return result
def detach_(self):
"""Detaches the Variable from the graph that created it, making it a
leaf.
"""
self._grad_fn = None
"""Detaches the Variable from the graph that created it, making it a leaf."""
self._creator = None
self.requires_grad = False
def retain_grad(self):
"""Enables .grad attribute for non-leaf Variables."""
if self.grad_fn is None: # no-op for leaves
return
if not self.requires_grad:
raise RuntimeError("can't retain_grad on Variable that has requires_grad=False")
if hasattr(self, 'retains_grad'):
return
weak_self = weakref.ref(self)
def retain_grad_hook(grad):
var = weak_self()
if var is None:
return
if var._grad is None:
var._grad = grad.clone()
else:
var._grad = var._grad + grad
self.register_hook(retain_grad_hook)
self.retains_grad = True
def contiguous(self):
self.data = self.data.contiguous()
return self
def clone(self):
return Clone.apply(self)
return Clone()(self)
def type(self, t):
if t != type(self.data):
return Type.apply(self, t)
return Type(t)(self)
return self
def type_as(self, t):
if isinstance(t, Variable):
t = t.data
return self.type(type(t))
return self.type(type(t.data))
def _get_type(self, name):
module = torch._import_dotted_name(self.data.__module__)
return getattr(module, name)
def cuda(self, device_id=None, async=False):
return CudaTransfer.apply(self, device_id, async)
return CudaTransfer(device_id, async)(self)
def cpu(self):
return self.type(getattr(torch, type(self.data).__name__))
@ -310,10 +274,10 @@ class Variable(_C._VariableBase):
def _add(self, other, inplace):
if isinstance(other, Variable):
return Add.apply(self, other, inplace)
return Add(inplace)(self, other)
else:
assert not torch.is_tensor(other)
return AddConstant.apply(self, other, inplace)
return AddConstant(other, inplace)(self)
def add(self, other):
return self._add(other, False)
@ -323,10 +287,10 @@ class Variable(_C._VariableBase):
def _sub(self, other, inplace):
if isinstance(other, Variable):
return Sub.apply(self, other, inplace)
return Sub(inplace=inplace)(self, other)
else:
assert not torch.is_tensor(other)
return SubConstant.apply(self, other, inplace)
return SubConstant(other, inplace=inplace)(self)
def sub(self, other):
return self._sub(other, False)
@ -336,181 +300,178 @@ class Variable(_C._VariableBase):
def mul(self, other):
if isinstance(other, Variable):
return Mul.apply(self, other)
return Mul()(self, other)
else:
assert not torch.is_tensor(other)
return MulConstant.apply(self, other)
return MulConstant(other)(self)
def mul_(self, other):
if not isinstance(other, Variable) and not torch.is_tensor(other):
return MulConstant.apply(self, other, True)
return MulConstant(other, inplace=True)(self)
raise RuntimeError("mul_ only supports scalar multiplication")
def div(self, other):
if isinstance(other, Variable):
return Div.apply(self, other)
return Div()(self, other)
else:
assert not torch.is_tensor(other)
return DivConstant.apply(self, other)
return DivConstant(other)(self)
def div_(self, other):
if not isinstance(other, Variable) and not torch.is_tensor(other):
return DivConstant.apply(self, other, True)
return DivConstant(other, inplace=True)(self)
raise RuntimeError("div_ only supports scalar multiplication")
def pow(self, other):
if isinstance(other, Variable):
return Pow.apply(self, other)
return Pow()(self, other)
else:
assert not torch.is_tensor(other)
return PowConstant.apply(self, other)
return PowConstant(other)(self)
def exp(self):
return Exp.apply(self)
return Exp()(self)
def exp_(self):
return Exp.apply(self, True)
return Exp(inplace=True)(self)
def log(self):
return Log.apply(self)
return Log()(self)
def log1p(self):
return Log1p.apply(self)
return Log1p()(self)
def neg(self):
return Negate.apply(self)
return Negate()(self)
def neg_(self):
return Negate.apply(self, True)
return Negate(inplace=True)(self)
def tanh(self):
return Tanh.apply(self)
return Tanh()(self)
def tanh_(self):
return Tanh.apply(self, True)
return Tanh(True)(self)
def sigmoid(self):
return Sigmoid.apply(self)
return Sigmoid()(self)
def sigmoid_(self):
return Sigmoid.apply(self, True)
return Sigmoid(True)(self)
def sin(self):
return Sin.apply(self)
return Sin()(self)
def cos(self):
return Cos.apply(self)
return Cos()(self)
def tan(self):
return Tan.apply(self)
return Tan()(self)
def asin(self):
return Asin.apply(self)
return Asin()(self)
def acos(self):
return Acos.apply(self)
return Acos()(self)
def atan(self):
return Atan.apply(self)
def atan2(self, x):
return Atan2.apply(self, x)
return Atan()(self)
def sinh(self):
return Sinh.apply(self)
return Sinh()(self)
def cosh(self):
return Cosh.apply(self)
return Cosh()(self)
def abs(self):
return Abs.apply(self)
return Abs()(self)
def clamp(self, min=None, max=None):
if min is None and max is None:
raise ValueError("clamp requires specifying at least one of "
"min and max arguments")
elif min is None and max is not None:
return CminConstant.apply(self, max)
return CminConstant(max)(self)
elif min is not None and max is None:
return CmaxConstant.apply(self, min)
return CmaxConstant(min)(self)
else:
return Clamp.apply(self, min, max)
return Clamp(min, max)(self)
def reciprocal(self):
return Reciprocal.apply(self)
return Reciprocal()(self)
def floor(self):
return Floor.apply(self)
return Floor()(self)
def ceil(self):
return Ceil.apply(self)
return Ceil()(self)
def frac(self):
return Frac.apply(self)
return Frac()(self)
def sqrt(self):
return Sqrt.apply(self)
return Sqrt()(self)
def round(self):
return Round.apply(self)
return Round()(self)
def sign(self):
return Sign.apply(self)
return Sign()(self)
def trunc(self):
return Trunc.apply(self)
return Trunc()(self)
def fmod(self, value):
return Fmod.apply(self, value)
return Fmod(value)(self)
def remainder(self, value):
return Remainder.apply(self, value)
return Remainder(value)(self)
def lerp(self, tensor, weight):
return Lerp.apply(self, tensor, weight)
return Lerp(weight)(self, tensor)
def rsqrt(self):
return Rsqrt.apply(self)
return Rsqrt()(self)
def sum(self, dim=None, keepdim=None):
return Sum.apply(self, dim, keepdim)
def sum(self, dim=None):
return Sum(dim)(self)
def prod(self, dim=None, keepdim=None):
return Prod.apply(self, dim, keepdim)
def prod(self, dim=None):
return Prod(dim)(self)
def mean(self, dim=None, keepdim=None):
return Mean.apply(self, dim, keepdim)
def mean(self, dim=None):
return Mean(dim)(self)
def max(self, dim=None, keepdim=None):
def max(self, dim=None):
if isinstance(dim, Variable):
return Cmax.apply(self, dim)
return Max.apply(self, dim, keepdim)
return Cmax()(self, dim)
return Max(dim)(self)
def min(self, dim=None, keepdim=None):
def min(self, dim=None):
if isinstance(dim, Variable):
return Cmin.apply(self, dim)
return Min.apply(self, dim, keepdim)
return Cmin()(self, dim)
return Min(dim)(self)
def mode(self, dim=None, keepdim=None):
return Mode.apply(self, dim, keepdim)
def mode(self, dim):
return Mode(dim)(self)
def median(self, dim=None, keepdim=None):
return Median.apply(self, dim, keepdim)
def median(self, dim):
return Median(dim)(self)
def kthvalue(self, k, dim=None, keepdim=None):
return Kthvalue.apply(self, k, dim, keepdim)
def kthvalue(self, dim):
return Kthvalue(dim)(self)
def sort(self, dim=None, descending=False):
return Sort.apply(self, dim, descending, True)
return Sort(dim, descending)(self)
def topk(self, k, dim=None, largest=True, sorted=True):
return Topk.apply(self, k, dim, largest, sorted, True)
return Topk(k, dim, largest, sorted)(self)
def view(self, *sizes):
return View.apply(self, sizes)
return View(*sizes)(self)
def view_as(self, tensor):
return View.apply(self, tensor.size())
return View(*tensor.size())(self)
def split(self, split_size, dim=0):
return torch.split(self, split_size, dim)
@ -520,45 +481,32 @@ class Variable(_C._VariableBase):
repeats = repeats[0]
else:
repeats = torch.Size(repeats)
return Repeat.apply(self, repeats)
return Repeat(repeats)(self)
def cumsum(self, dim):
return Cumsum.apply(self, dim)
return Cumsum(dim)(self)
def cumprod(self, dim):
return Cumprod.apply(self, dim)
def unfold(self, dim, size, step):
return Unfold.apply(self, dim, size, step)
def var(self, dim=None, keepdim=None, unbiased=True):
keepdim_ = False if keepdim is None else keepdim
mean = self.mean(dim, keepdim)
def var(self, dim=None, unbiased=True):
mean = self.mean(dim)
if dim is None:
mean = mean.view(*(1 for s in self.size()))
# we could just set keepdim to True, but this preserves some fidelity
elif keepdim_ is False and self.dim() != 1:
mean = mean.unsqueeze(dim)
mean_expanded = mean.expand_as(self)
zero_centered = self.sub(mean_expanded)
var = zero_centered.mul(zero_centered).sum(dim, keepdim=keepdim_)
var = zero_centered.mul(zero_centered).sum(dim)
numel = self.numel() if dim is None else self.size(dim)
return var.div(numel - int(unbiased))
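A quick numeric check of the variance formula implemented above (divide the sum of squared deviations by N-1 when unbiased, by N otherwise); the sample values are arbitrary:

```python
import torch
from torch.autograd import Variable

x = Variable(torch.Tensor([1.0, 2.0, 3.0, 4.0]))  # mean 2.5, squared deviations sum to 5.0
print(x.var().data[0])                 # 5.0 / (4 - 1) = 1.6667 (unbiased, the default)
print(x.var(unbiased=False).data[0])   # 5.0 / 4       = 1.25
```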
def std(self, dim=None, keepdim=None, unbiased=True):
return self.var(dim, keepdim, unbiased).sqrt()
def std(self, dim=None, unbiased=True):
return self.var(dim, unbiased).sqrt()
def renorm(self, p, dim, maxnorm):
t = self.transpose(dim, 0)
flat = t.contiguous().view(self.size(0), -1)
norms = flat.norm(p, 1, True)
norms = flat.norm(p, 1)
norms = norms.clamp(max=maxnorm).div(norms.add(1e-7))
flat_out = flat.mul(norms.expand_as(flat))
return flat_out.view(t.size()).transpose(dim, 0)
def matmul(self, other):
return torch.matmul(self, other)
@staticmethod
def _static_blas(cls, args, inplace):
num_args = len(args)
@ -569,14 +517,14 @@ class Variable(_C._VariableBase):
alpha, beta = args[1:3]
if num_args == 4:
alpha = args[1]
return cls.apply(*(args[:1] + args[-2:] + (alpha, beta, inplace)))
return cls(alpha, beta, inplace)(*(args[:1] + args[-2:]))
def _blas(self, cls, args, inplace):
return self._static_blas(cls, (self,) + args, inplace)
def mm(self, matrix):
output = Variable(self.data.new(self.data.size(0), matrix.data.size(1)))
return Addmm.apply(output, self, matrix, 0, 1, True)
return self._static_blas(Addmm, (output, 0, 1, self, matrix), False)
def bmm(self, batch):
output = Variable(self.data.new(self.data.size(0), self.data.size(1),
@ -592,10 +540,10 @@ class Variable(_C._VariableBase):
return self._static_blas(Addr, (output, 0, 1, self, vector), False)
def resize(self, *sizes):
return Resize.apply(self, sizes)
return Resize(*sizes)(self)
def resize_as(self, variable):
return Resize.apply(self, variable.size())
return Resize(*variable.size())(self)
def addmm(self, *args):
return self._blas(Addmm, args, False)
@ -628,186 +576,170 @@ class Variable(_C._VariableBase):
return self._blas(Addr, args, True)
def dot(self, other):
return Dot.apply(self, other)
return Dot()(self, other)
def _addcop(self, op, args, inplace):
def _addcop(self, op, args):
if len(args) == 3:
# args == [scale, tensor1, tensor2]
return op.apply(self, args[1], args[2], args[0], inplace)
# scale, tensor1, tensor2
return op(args[0])(self, *args[1:])
else:
# args == [tensor1, tensor2]
return op.apply(self, args[0], args[1], 1.0, inplace)
# tensor1, tensor2
return op()(self, *args)
def addcmul(self, *args):
return self._addcop(Addcmul, args, False)
return self._addcop(Addcmul, args)
def addcdiv(self, *args):
return self._addcop(Addcdiv, args, False)
return self._addcop(Addcdiv, args)
def addcmul_(self, *args):
return self._addcop(Addcmul, args, True)
def addcdiv_(self, *args):
return self._addcop(Addcdiv, args, True)
def norm(self, p=2, dim=None, keepdim=None):
return Norm.apply(self, p, dim, keepdim)
def norm(self, p=2, dim=None):
return Norm(p, dim)(self)
def dist(self, tensor, p=2):
return Norm.apply(self - tensor, p)
return Norm(p)(self - tensor)
def index_add(self, dim, index, tensor):
return IndexAdd.apply(self, dim, index, tensor)
def _advanced_index_add(self, index, tensor):
return AdvancedIndexAdd.apply(self, index, tensor)
return IndexAdd(dim)(self, index, tensor)
def index_add_(self, dim, index, tensor):
return IndexAdd.apply(self, dim, index, tensor, True)
return IndexAdd(dim, True)(self, index, tensor)
def index_copy(self, dim, index, tensor):
return IndexCopy.apply(self, dim, index, tensor)
return IndexCopy(dim)(self, index, tensor)
def index_copy_(self, dim, index, tensor):
return IndexCopy.apply(self, dim, index, tensor, True)
return IndexCopy(dim, True)(self, index, tensor)
def index_fill(self, dim, index, value):
return IndexFill.apply(self, dim, index, value)
return IndexFill(dim, value)(self, index)
def index_fill_(self, dim, index, value):
return IndexFill.apply(self, dim, index, value, True)
return IndexFill(dim, value, True)(self, index)
def index_select(self, dim, index):
return IndexSelect.apply(self, dim, index)
return IndexSelect(dim)(self, index)
def gather(self, dim, index):
return Gather.apply(self, dim, index)
return Gather(dim)(self, index)
def scatter(self, dim, index, source):
return Scatter.apply(self, dim, index, source)
return Scatter(dim)(self, index, source)
def scatter_(self, dim, index, source):
return Scatter.apply(self, dim, index, source, True)
def scatter_add(self, dim, index, source):
return ScatterAdd.apply(self, dim, index, source)
def scatter_add_(self, dim, index, source):
return ScatterAdd.apply(self, dim, index, source, True)
return Scatter(dim, True)(self, index, source)
def masked_copy(self, mask, variable):
warnings.warn("masked_copy is deprecated and renamed to masked_scatter, and will be removed in v0.3")
return MaskedScatter.apply(self, mask, variable)
return MaskedCopy()(self, mask, variable)
def masked_copy_(self, mask, variable):
warnings.warn("masked_copy_ is deprecated and renamed to masked_scatter_, and will be removed in v0.3")
return MaskedScatter.apply(self, mask, variable, True)
def masked_scatter(self, mask, variable):
return MaskedScatter.apply(self, mask, variable)
def masked_scatter_(self, mask, variable):
return MaskedScatter.apply(self, mask, variable, True)
return MaskedCopy(True)(self, mask, variable)
def masked_fill(self, mask, value):
return MaskedFill.apply(self, mask, value)
return MaskedFill(value)(self, mask)
def masked_fill_(self, mask, value):
return MaskedFill.apply(self, mask, value, True)
return MaskedFill(value, True)(self, mask)
def masked_select(self, mask):
return MaskedSelect.apply(self, mask)
return MaskedSelect()(self, mask)
def expand(self, *sizes):
return Expand.apply(self, sizes)
if isinstance(sizes[0], torch.Size):
if len(sizes) > 1:
raise ValueError("expand expects a several ints or a single "
"torch.Size argument")
sizes = sizes[0]
return Expand(sizes)(self)
def expand_as(self, tensor):
return Expand.apply(self, (tensor.size(),))
return Expand(tensor.size())(self)
def t(self):
if self.dim() != 2:
raise RuntimeError("t() expects a 2D Variable, but self is {}D".format(self.dim()))
return Transpose.apply(self, 0, 1)
return Transpose(0, 1)(self)
def transpose(self, dim1, dim2):
return Transpose.apply(self, dim1, dim2)
return Transpose(dim1, dim2)(self)
def select(self, dim, _index):
dim = dim if dim >= 0 else dim + self.dim()
index = tuple(slice(None, None) for _ in range(dim)) + (_index,)
return Index.apply(self, index)
return Index(index)(self)
def narrow(self, dim, start_index, length):
dim = dim if dim >= 0 else dim + self.dim()
index = tuple(slice(None, None) for _ in range(dim)) + \
(slice(start_index, start_index + length),)
return Index.apply(self, index)
return Index(index)(self)
def chunk(self, num_chunks, dim=0):
return Chunk.apply(self, num_chunks, dim)
return Chunk(num_chunks, dim)(self)
def squeeze(self, dim=None):
return Squeeze.apply(self, dim)
def squeeze_(self, dim=None):
return Squeeze.apply(self, dim, True)
return Squeeze(dim)(self)
def unsqueeze(self, dim):
return Unsqueeze.apply(self, dim)
return Unsqueeze(dim)(self)
def permute(self, *permutation):
return Permute.apply(self, permutation)
return Permute(permutation)(self)
def diag(self, diagonal=0):
return Diag.apply(self, diagonal)
def diag(self, diagonal_idx=0):
return Diag(diagonal_idx)(self)
def tril(self, diagonal=0):
return Tril.apply(self, diagonal)
def tril(self, diagonal_idx=0):
return Tril(diagonal_idx)(self)
def triu(self, diagonal=0):
return Triu.apply(self, diagonal)
def triu(self, diagonal_idx=0):
return Triu(diagonal_idx)(self)
def trace(self):
return Trace.apply(self)
return Trace()(self)
def cross(self, other, dim=-1):
return Cross.apply(self, other)
return Cross(dim)(self, other)
def inverse(self):
return Inverse.apply(self)
def gesv(self, a):
return Gesv.apply(self, a)
def multinomial(self, num_samples=1, replacement=False):
return Multinomial(num_samples, replacement)(self)
def multinomial(self, num_samples=1, with_replacement=False):
return Multinomial(num_samples, with_replacement)(self)
def bernoulli(self):
return Bernoulli()(self)
def eq(self, other):
if isinstance(other, Variable):
return Eq()(self, other)
assert not torch.is_tensor(other), "can't compare Variable and tensor"
return Eq.apply(self, other)
return Eq(other)(self)
def ne(self, other):
if isinstance(other, Variable):
return Ne()(self, other)
assert not torch.is_tensor(other), "can't compare Variable and tensor"
return Ne.apply(self, other)
return Ne(other)(self)
def gt(self, other):
if isinstance(other, Variable):
return Gt()(self, other)
assert not torch.is_tensor(other), "can't compare Variable and tensor"
return Gt.apply(self, other)
return Gt(other)(self)
def ge(self, other):
if isinstance(other, Variable):
return Ge()(self, other)
assert not torch.is_tensor(other), "can't compare Variable and tensor"
return Ge.apply(self, other)
return Ge(other)(self)
def lt(self, other):
if isinstance(other, Variable):
return Lt()(self, other)
assert not torch.is_tensor(other), "can't compare Variable and tensor"
return Lt.apply(self, other)
return Lt(other)(self)
def le(self, other):
if isinstance(other, Variable):
return Le()(self, other)
assert not torch.is_tensor(other), "can't compare Variable and tensor"
return Le.apply(self, other)
return Le(other)(self)
def __add__(self, other):
return self.add(other)
@ -823,7 +755,7 @@ class Variable(_C._VariableBase):
return self.sub_(other)
def __rsub__(self, other):
return SubConstant.apply(other, self)
return SubConstant(other, sub_tensor=True)(self)
def __mul__(self, other):
return self.mul(other)
@ -833,16 +765,28 @@ class Variable(_C._VariableBase):
return self.mul_(other)
def __matmul__(self, other):
if not isinstance(other, Variable):
dim_self = self.dim()
try:
dim_other = other.dim()
except AttributeError: # not a Variable
return NotImplemented
return self.matmul(other)
if dim_self == 1 and dim_other == 1:
return self.dot(other)
if dim_self == 2 and dim_other == 1:
return self.mv(other)
if dim_self == 1 and dim_other == 2:
return self.unsqueeze(0).mm(other).squeeze(0)
elif dim_self == 2 and dim_other == 2:
return self.mm(other)
raise ValueError("both arguments to __matmul__ need to be 1D or 2D, "
"but they are {}D and {}D".format(dim_self, dim_other))
def __div__(self, other):
return self.div(other)
__truediv__ = __div__
def __rdiv__(self, other):
return DivConstant.apply(other, self)
return DivConstant(other, div_by_tensor=True)(self)
__rtruediv__ = __rdiv__
def __idiv__(self, other):
@ -855,10 +799,10 @@ class Variable(_C._VariableBase):
raise NotImplementedError("in-place pow not implemented")
def __rpow__(self, other):
return PowConstant.apply(other, self)
return PowConstant(other, tensor_power=True)(self)
def __neg__(self):
return Negate.apply(self)
return Negate()(self)
def __len__(self):
return len(self.data)
@ -894,7 +838,7 @@ class Variable(_C._VariableBase):
@staticmethod
def cat(iterable, dim=0):
return Concat.apply(dim, *iterable)
return Concat(dim)(*iterable)
@staticmethod
def normal(means, std=1):
@ -917,7 +861,7 @@ class Variable(_C._VariableBase):
tensors = args[1:]
else:
tensors = args
return cls.apply(*(tensors + (alpha, beta, inplace)))
return cls(alpha, beta, inplace)(*tensors)
@classmethod
def addmm(cls, *args):
@ -951,6 +895,5 @@ for method in dir(Variable):
setattr(Variable._torch, method, as_static)
from ._functions import *
from torch._C import _ImperativeEngine as ImperativeEngine
from .engine import ImperativeEngine
Variable._execution_engine = ImperativeEngine()

View File

@ -17,12 +17,6 @@ def _libcudnn():
if hasattr(lib, 'cudnnGetErrorString'):
lib.cudnnGetErrorString.restype = ctypes.c_char_p
__cudnn_version = lib.cudnnGetVersion()
compile_version = torch._C._cudnn_version()
# Check that cuDNN major and minor versions match
if (__cudnn_version // 100) != (compile_version // 100):
raise RuntimeError(
'cuDNN version mismatch: PyTorch was compiled against {} '
'but linked against {}'.format(compile_version, __cudnn_version))
else:
lib = None
return lib

View File

@ -163,9 +163,9 @@ def get_parameters(fn, handle, weight_buf):
# might as well merge the CUDNN ones into a single tensor as well
if linear_id == 0 or linear_id == num_linear_layers / 2:
assert filter_dim_a.prod() == filter_dim_a[0]
size = (filter_dim_a[0] * num_linear_layers // 2, filter_dim_a[2])
param = fn.weight_buf.new().set_(
weight_buf.storage(), offset, size)
weight_buf.storage(), offset,
filter_dim_a[0] * num_linear_layers // 2, filter_dim_a[2])
layer_params.append(param)
else:
assert cur_offset == offset
@ -178,13 +178,10 @@ def get_parameters(fn, handle, weight_buf):
def _copyParams(params_from, params_to):
assert len(params_from) == len(params_to)
for layer_params_from, layer_params_to in zip(params_from, params_to):
# NOTE: these lists have all weights before all biases, so if the layer doesn't
# use biases, zip will terminate once layer_params_from ends and ignore them.
for param_from, param_to in zip(layer_params_from, layer_params_to):
assert param_from.type() == param_to.type()
param_to.copy_(param_from, broadcast=False)
param_to.copy_(param_from)
def forward(fn, input, hx, weight, output, hy):
@ -245,21 +242,17 @@ def forward(fn, input, hx, weight, output, hy):
fn.cy_desc = cudnn.descriptor(cx) if cx is not None else None
# create the weight buffer and copy the weights into it
if fn.weight_buf is None:
num_weights = get_num_weights(
handle, fn.rnn_desc, fn.x_descs[0], fn.datatype)
fn.weight_buf = x.new(num_weights)
fn.w_desc = init_weight_descriptor(fn, fn.weight_buf)
w = fn.weight_buf
# this zero might not seem necessary, but it is in the case
# where biases are disabled; then they won't be copied and must be zero'd.
# Alternatively, _copyParams could be written more carefully.
w.zero_()
params = get_parameters(fn, handle, w)
_copyParams(weight, params)
else:
fn.w_desc = init_weight_descriptor(fn, fn.weight_buf)
w = fn.weight_buf
num_weights = get_num_weights(
handle, fn.rnn_desc, fn.x_descs[0], fn.datatype)
fn.weight_buf = x.new(num_weights)
fn.w_desc = init_weight_descriptor(fn, fn.weight_buf)
w = fn.weight_buf
# this zero might not seem necessary, but it is in the case
# where biases are disabled; then they won't be copied and must be zero'd.
# Alternatively, _copyParams could be written more carefully.
w.zero_()
params = get_parameters(fn, handle, w)
_copyParams(weight, params)
if tuple(hx.size()) != hidden_size:
raise RuntimeError('Expected hidden size {}, got {}'.format(
@ -276,9 +269,7 @@ def forward(fn, input, hx, weight, output, hy):
fn.x_descs,
ctypes.byref(workspace_size)
))
fn.workspace_size = workspace_size.value
with torch.cuda.device_of(input):
workspace = torch.cuda.ByteTensor(fn.workspace_size)
fn.workspace = torch.cuda.ByteTensor(workspace_size.value)
if fn.requires_grad:
reserve_size = ctypes.c_long()
check_error(lib.cudnnGetRNNTrainingReserveSize(
@ -301,7 +292,7 @@ def forward(fn, input, hx, weight, output, hy):
fn.y_descs, ctypes.c_void_p(y.data_ptr()),
fn.hy_desc, ctypes.c_void_p(hy.data_ptr()),
fn.cy_desc, ctypes.c_void_p(cy.data_ptr()) if cx is not None else None,
ctypes.c_void_p(workspace.data_ptr()), workspace.size(0),
ctypes.c_void_p(fn.workspace.data_ptr()), fn.workspace.size(0),
ctypes.c_void_p(fn.reserve.data_ptr()), fn.reserve.size(0)
))
else: # inference
@ -316,7 +307,7 @@ def forward(fn, input, hx, weight, output, hy):
fn.y_descs, ctypes.c_void_p(y.data_ptr()),
fn.hy_desc, ctypes.c_void_p(hy.data_ptr()),
fn.cy_desc, ctypes.c_void_p(cy.data_ptr()) if cx is not None else None,
ctypes.c_void_p(workspace.data_ptr()), workspace.size(0)
ctypes.c_void_p(fn.workspace.data_ptr()), fn.workspace.size(0)
))
if fn.batch_first and not is_input_packed:
@ -381,8 +372,6 @@ def backward_grad(fn, input, hx, weight, output, grad_output, grad_hy, grad_inpu
if not dhy.is_cuda or not dy.is_cuda or (dcy is not None and not dcy.is_cuda):
raise RuntimeError('Gradients aren\'t CUDA tensors')
with torch.cuda.device_of(input):
workspace = torch.cuda.ByteTensor(fn.workspace_size)
check_error(cudnn.lib.cudnnRNNBackwardData(
handle,
fn.rnn_desc,
@ -397,7 +386,7 @@ def backward_grad(fn, input, hx, weight, output, grad_output, grad_hy, grad_inpu
fn.x_descs, ctypes.c_void_p(dx.data_ptr()),
fn.hx_desc, ctypes.c_void_p(dhx.data_ptr()),
fn.cx_desc, ctypes.c_void_p(dcx.data_ptr()) if cx is not None else None,
ctypes.c_void_p(workspace.data_ptr()), workspace.size(0),
ctypes.c_void_p(fn.workspace.data_ptr()), fn.workspace.size(0),
ctypes.c_void_p(fn.reserve.data_ptr()), fn.reserve.size(0)
))
@ -450,8 +439,6 @@ def backward_weight(fn, input, hx, output, weight, grad_weight):
y = output
dw = fn.weight_buf.new().resize_as_(fn.weight_buf).zero_()
with torch.cuda.device_of(input):
workspace = torch.cuda.ByteTensor(fn.workspace_size)
check_error(cudnn.lib.cudnnRNNBackwardWeights(
handle,
fn.rnn_desc,
@ -459,7 +446,7 @@ def backward_weight(fn, input, hx, output, weight, grad_weight):
fn.x_descs, ctypes.c_void_p(x.data_ptr()),
fn.hx_desc, ctypes.c_void_p(hx.data_ptr()),
fn.y_descs, ctypes.c_void_p(y.data_ptr()),
ctypes.c_void_p(workspace.data_ptr()), workspace.size(0),
ctypes.c_void_p(fn.workspace.data_ptr()), fn.workspace.size(0),
fn.w_desc, ctypes.c_void_p(dw.data_ptr()),
ctypes.c_void_p(fn.reserve.data_ptr()), fn.reserve.size(0)
))

View File

@ -58,53 +58,18 @@ static std::unordered_map<std::string, Type> type_names = {
{"Int", Type::INT},
{"Long", Type::LONG},
};
static std::unordered_map<std::string, at::ScalarType> attype_names = {
{"Float", at::kFloat},
{"Double", at::kDouble},
{"Half", at::kHalf},
{"Byte", at::kByte},
{"Char", at::kChar},
{"Short", at::kShort},
{"Int", at::kInt},
{"Long", at::kLong},
};
static std::unordered_map<PyTypeObject*, TensorType> pytype_to_tensortype;
static std::unordered_map<TensorType, PyTypeObject*, TensorTypeHasher> tensortype_to_pytype;
static std::unordered_map<PyTypeObject*, at::Type*> pytype_to_attype;
static std::unordered_map<at::Type*, PyTypeObject*> attype_to_pytype;
void registerPyTypeObject(PyTypeObject *pytype, const std::string& name, bool is_cuda, bool is_sparse)
{
TensorType type;
at::Backend device;
if(is_cuda) {
if(is_sparse){
device = at::kSparseCUDA;
} else {
device = at::kCUDA;
}
} else {
if(is_sparse){
device = at::kSparseCPU;
} else {
device = at::kCPU;
}
}
type.data_type = type_names.at(name);
type.is_cuda = is_cuda;
type.is_sparse = is_sparse;
pytype_to_tensortype[pytype] = type;
tensortype_to_pytype[type] = pytype;
if(!(is_sparse && name == "Half")) {
at::Type * attype = &at::getType(device,attype_names.at(name));
pytype_to_attype[pytype] = attype;
attype_to_pytype[attype] = pytype;
}
}
PyTypeObject* getPyTypeObject(const thpp::Tensor& tensor)
@ -116,12 +81,6 @@ PyTypeObject* getPyTypeObject(const thpp::Tensor& tensor)
return tensortype_to_pytype.at(type);
}
PyTypeObject* getPyTypeObject(const at::Tensor& tensor)
{
if(attype_to_pytype.count(&tensor.type()) == 0)
throw std::invalid_argument("unsupported Tensor type.");
return attype_to_pytype.at(&tensor.type());
}
static std::unique_ptr<Tensor> createTensor(void *tensor, Type type, bool is_cuda, bool is_sparse)
{
@ -208,22 +167,6 @@ std::unique_ptr<Tensor> createTensor(PyObject *data)
wrapper->retain();
return wrapper;
}
//rename to createTensor when THPP is removed
at::Tensor createTensorAT(PyObject *data)
{
auto tensor_type = pytype_to_attype.at(Py_TYPE(data));
auto tensor = ((THPVoidTensor *)data)->cdata;
return tensor_type->unsafeTensorFromTH(tensor, true);
}
PyObject* createPyObject(at::Tensor tensor)
{
auto type = getPyTypeObject(tensor);
PyObject *obj = type->tp_alloc(type, 0);
if (obj) {
((THPVoidTensor*)obj)->cdata = (THVoidTensor *)tensor.detach()->unsafeGetTH(true);
}
return obj;
}
PyObject* createPyObject(const thpp::Tensor& tensor)
{

View File

@ -2,10 +2,9 @@
// Provides conversions between Python tensor objects and thpp::Tensors.
#include <Python.h>
#include <memory>
#include <Python.h>
#include <THPP/THPP.h>
#include <ATen/ATen.h>
namespace torch {
@ -23,9 +22,4 @@ std::unique_ptr<thpp::Tensor> createTensor(PyObject *data);
// Creates Python tensor object from a Tensor
PyObject* createPyObject(const thpp::Tensor& tensor);
PyObject* createPyObject(at::Tensor tensor);
PyTypeObject* getPyTypeObject(const at::Tensor& tensor);
//rename to createPyObject when THPP is removed
at::Tensor createTensorAT(PyObject *data);
} // namespace torch

View File

@ -48,7 +48,6 @@ struct python_error : public std::exception {
/** Sets the current Python error from this exception */
inline void restore() {
if (!type) return;
// PyErr_Restore steals references
AutoGIL gil;
Py_XINCREF(type);
@ -64,6 +63,22 @@ struct python_error : public std::exception {
#ifdef _THP_CORE
struct THException: public std::exception {
THException(const char* msg): msg(msg) {};
virtual const char* what() const throw() {
return msg.c_str();
}
std::string msg;
};
struct THArgException: public THException {
THArgException(const char* msg, int argNumber): THException(msg), argNumber(argNumber) {};
const int argNumber;
};
bool THPException_init(PyObject *module);
#endif

View File

@ -33,7 +33,7 @@ static PyObject * THPGenerator_pynew(PyTypeObject *type, PyObject *args, PyObjec
THPUtils_setError("torch.Generator constructor doesn't accept any arguments");
return NULL;
}
THPGeneratorPtr self((THPGenerator *)type->tp_alloc(type, 0));
THPGeneratorPtr self = (THPGenerator *)type->tp_alloc(type, 0);
self->cdata = THGenerator_new();
return (PyObject*)self.release();
@ -44,7 +44,7 @@ static PyObject * THPGenerator_getState(THPGenerator *self)
{
HANDLE_TH_ERRORS
THGenerator *generator = self->cdata;
THPByteTensorPtr res((THPByteTensor *)THPByteTensor_NewEmpty());
THPByteTensorPtr res = (THPByteTensor *)THPByteTensor_NewEmpty();
if (!res) return NULL;
THByteTensor_getRNGState(generator, res->cdata);
return (PyObject *)res.release();

View File

@ -6,7 +6,6 @@
#include <unordered_map>
#include <libshm.h>
#include <TH/TH.h>
#include <ATen/ATen.h>
#include "torch/csrc/utils/python_strings.h"
@ -64,7 +63,7 @@ static PyObject * THPModule_initNames(PyObject *self, PyObject *arg)
{
static std::vector<std::string> names;
THPObjectPtr types(PySequence_Fast(arg, "expected a sequence"));
THPObjectPtr types = PySequence_Fast(arg, "expected a sequence");
if (!types) return NULL;
int num_classes = PySequence_Fast_GET_SIZE(types.get());
@ -74,7 +73,7 @@ static PyObject * THPModule_initNames(PyObject *self, PyObject *arg)
THPUtils_assert(PyType_Check(obj), "expected a PyTypeObject");
PyTypeObject* type = (PyTypeObject*)obj;
THPObjectPtr module_name(PyObject_GetAttrString(obj, "__module__"));
THPObjectPtr module_name = PyObject_GetAttrString(obj, "__module__");
if (!module_name) return NULL;
THPUtils_assert(THPUtils_checkString(module_name.get()),
"expected __module__ to be a string");
@ -214,7 +213,6 @@ dispatch: \
IMPLEMENT_STATELESS(sigmoid)
IMPLEMENT_STATELESS(log)
IMPLEMENT_STATELESS(log1p)
IMPLEMENT_STATELESS(lgamma)
IMPLEMENT_STATELESS(exp)
IMPLEMENT_STATELESS(cos)
IMPLEMENT_STATELESS(acos)
@ -467,64 +465,6 @@ PyObject *THPModule_addDocStr(PyObject *_unused, PyObject *args)
Py_RETURN_NONE;
}
PyObject *THPModule_inferSize(PyObject *_unused, PyObject *args)
{
HANDLE_TH_ERRORS
Py_ssize_t num_args = args ? PyTuple_Size(args) : 0;
THPUtils_assert(num_args == 2, "expected exactly 2 arguments");
PyObject *arg1 = PyTuple_GET_ITEM(args, 0);
THPUtils_assert(THPSize_Check(arg1), "expected a torch.Size as argument 1");
PyObject *arg2 = PyTuple_GET_ITEM(args, 1);
THPUtils_assert(THPSize_Check(arg2), "expected a torch.Size as argument 2");
THLongStoragePtr size1_guard = THPUtils_unpackSize(arg1);
THLongStorage *size1 = size1_guard.get();
THLongStoragePtr size2_guard = THPUtils_unpackSize(arg2);
THLongStorage *size2 = size2_guard.get();
THLongStoragePtr sizes_guard(THLongStorage_new());
THLongStorage *sizes = sizes_guard.get();
char error_buffer[1024];
int ret = THLongStorage_inferSize2(sizes, size1->data, size1->size, size2->data, size2->size, error_buffer, 1024);
THPUtils_assert(ret == 0, error_buffer);
return THPSize_New(sizes->size, sizes->data);
END_HANDLE_TH_ERRORS
}
static PyObject *THPModule_setBackcompatBroadcastWarn(PyObject *module, PyObject *arg) {
THPUtils_assert(PyBool_Check(arg), "set_backcompat_broadcast_warn expects a bool, "
"but got %s", THPUtils_typename(arg));
setBackCompatBroadcastWarn(arg == Py_True);
Py_RETURN_NONE;
}
static PyObject *THPModule_getBackcompatBroadcastWarn(PyObject *module)
{
return getBackCompatBroadcastWarn() ? Py_True : Py_False;
}
static PyObject *THPModule_setBackcompatKeepdimWarn(PyObject *module, PyObject *arg) {
THPUtils_assert(PyBool_Check(arg), "set_backcompat_keepdim_warn expects a bool, "
"but got %s", THPUtils_typename(arg));
setBackCompatKeepdimWarn(arg == Py_True);
Py_RETURN_NONE;
}
static PyObject *THPModule_getBackcompatKeepdimWarn(PyObject *module)
{
return getBackCompatKeepdimWarn() ? Py_True : Py_False;
}
PyObject *THPModule_hasDistributed(PyObject *_unused)
{
#ifdef WITH_DISTRIBUTED
Py_RETURN_TRUE;
#else
Py_RETURN_FALSE;
#endif
}
#ifdef WITH_CUDA
extern PyObject * THCPModule_initExtension(PyObject *self);
extern PyObject * THCPModule_setDevice_wrap(PyObject *self, PyObject *arg);
@ -557,7 +497,6 @@ static PyMethodDef TorchMethods[] = {
{"_add_docstr", (PyCFunction)THPModule_addDocStr, METH_VARARGS, NULL},
{"_sparse_init", (PyCFunction)THSPModule_initExtension, METH_NOARGS, NULL},
{"_init_names", (PyCFunction)THPModule_initNames, METH_O, NULL},
{"_has_distributed",(PyCFunction)THPModule_hasDistributed, METH_NOARGS, NULL},
#ifdef WITH_CUDA
{"_cuda_init", (PyCFunction)THCPModule_initExtension, METH_NOARGS, NULL},
{"_cuda_setDevice", (PyCFunction)THCPModule_setDevice_wrap, METH_O, NULL},
@ -584,11 +523,6 @@ static PyMethodDef TorchMethods[] = {
#endif
{"_safe_call", (PyCFunction)THPModule_safeCall, METH_VARARGS | METH_KEYWORDS, NULL},
{"_set_default_tensor_type", (PyCFunction)THPModule_setDefaultTensorType, METH_O, NULL},
{"_infer_size", (PyCFunction)THPModule_inferSize, METH_VARARGS, NULL},
{"_set_backcompat_broadcast_warn", (PyCFunction)THPModule_setBackcompatBroadcastWarn, METH_O, NULL},
{"_get_backcompat_broadcast_warn", (PyCFunction)THPModule_getBackcompatBroadcastWarn, METH_NOARGS, NULL},
{"_set_backcompat_keepdim_warn", (PyCFunction)THPModule_setBackcompatKeepdimWarn, METH_O, NULL},
{"_get_backcompat_keepdim_warn", (PyCFunction)THPModule_getBackcompatKeepdimWarn, METH_NOARGS, NULL},
{"get_num_threads", (PyCFunction)THPModule_getNumThreads, METH_NOARGS, NULL},
{"set_num_threads", (PyCFunction)THPModule_setNumThreads, METH_O, NULL},
{"from_numpy", (PyCFunction)THPModule_fromNumpy, METH_O, NULL},
@ -596,7 +530,6 @@ static PyMethodDef TorchMethods[] = {
{"sigmoid", (PyCFunction)THPModule_sigmoid, METH_VARARGS | METH_KEYWORDS, NULL},
{"log", (PyCFunction)THPModule_log, METH_VARARGS | METH_KEYWORDS, NULL},
{"log1p", (PyCFunction)THPModule_log1p, METH_VARARGS | METH_KEYWORDS, NULL},
{"lgamma", (PyCFunction)THPModule_lgamma, METH_VARARGS | METH_KEYWORDS, NULL},
{"exp", (PyCFunction)THPModule_exp, METH_VARARGS | METH_KEYWORDS, NULL},
{"cos", (PyCFunction)THPModule_cos, METH_VARARGS | METH_KEYWORDS, NULL},
{"acos", (PyCFunction)THPModule_acos, METH_VARARGS | METH_KEYWORDS, NULL},
@ -719,6 +652,22 @@ static PyMethodDef TorchMethods[] = {
{NULL, NULL, 0, NULL}
};
static void errorHandler(const char *msg, void *data)
{
throw THException(msg);
}
static void errorHandlerArg(int argNumber, const char *msg, void *data)
{
throw THArgException(msg, argNumber);
}
static void updateErrorHandlers()
{
THSetDefaultErrorHandler(errorHandler, NULL);
THSetDefaultArgErrorHandler(errorHandlerArg, NULL);
}
bool THCPDoubleStorage_init(PyObject *module);
bool THCPFloatStorage_init(PyObject *module);
bool THCPHalfStorage_init(PyObject *module);
@ -778,7 +727,6 @@ PyMODINIT_FUNC init_C()
PyMODINIT_FUNC PyInit__C()
#endif
{
THInferNumThreads();
#if PY_MAJOR_VERSION == 2
#define ASSERT_TRUE(cmd) if (!(cmd)) {PyErr_SetString(PyExc_ImportError, "initialization error"); return;}
@ -883,7 +831,8 @@ PyMODINIT_FUNC PyInit__C()
Py_INCREF(has_cudnn);
ASSERT_TRUE(PyModule_AddObject(module, "has_cudnn", has_cudnn) == 0);
#ifdef WITH_DISTRIBUTED_MW
// TODO THD: enable once master-worker mode is implemented
#if 0 && defined(WITH_DISTRIBUTED)
// See comment on CUDA objects
ASSERT_TRUE(THDPDoubleStorage_init(module));
ASSERT_TRUE(THDPFloatStorage_init(module));
@ -908,9 +857,7 @@ PyMODINIT_FUNC PyInit__C()
ASSERT_TRUE(THPDefaultGenerator != nullptr);
ASSERT_TRUE(PyModule_AddObject(module, "default_generator", (PyObject*)THPDefaultGenerator) == 0);
// force ATen to initialize because it handles
// setting up TH Errors so that they throw C++ exceptions
at::init();
updateErrorHandlers();
#ifdef WITH_NUMPY
import_array();

View File

@ -1,100 +0,0 @@
# csrc
The csrc directory contains all of the code concerned with integration
with Python. This is in contrast to lib, which contains the Torch
libraries that are Python agnostic. csrc depends on lib, but not vice
versa.
There are a number of utilities for easing integration with Python which
are worth knowing about, and which we briefly describe here. But first, the
most important gotchas:
* DO NOT forget to take out the GIL with `AutoGIL` before calling the Python
API or bringing a `THPObjectPtr` into scope.
* Make sure you include `Python.h` first in your header files, before
any system headers; otherwise, you will get an `error: "_XOPEN_SOURCE" redefined`
error. If you pay attention to warnings, you will see where you need to
do this.
## Notes
### Note [Storage is not NULL]
Historically, Torch supported NULL storage, as a minor optimization to
avoid having to allocate a storage object when it would be empty.
However, this is actually a confusing special case to deal with, so
by and large, PyTorch assumes that, in fact, storage is never NULL.
One important case where this assumption matters is when tracking
the CUDA device a tensor is stored on: this information lives
solely in the storage, so if a storage is NULL, we lose it.
Although storage is never NULL, the data field of THStorage may be NULL. This
mostly occurs when we want to pre-allocate an output tensor struct, but then
have it be resized and filled with data by some operator: there's no point in
allocating data for it in this case!
## Files
### `Exceptions.h`
Frequently when working with the Python API, you may call a function
which returns an error. In this case, we want to return directly to the
Python interpreter, so that this exception can be propagated
accordingly; however, because the Python API is C-based, what actually
happens is that control returns to whatever C++ code called it.
Similarly, if we raise a C++ exception, prior to returning to the Python
interpreter, we must set the Python error flags so that it surfaces as a
Python exception.
Exceptions defines some useful helpers: `HANDLE_TH_ERRORS`, `END_HANDLE_TH_ERRORS`
and an exception class `python_error`. You call them like this:
```
// Entry point from Python interpreter
PyObject* run() {
HANDLE_TH_ERRORS
...
if (!x) throw python_error();
...
END_HANDLE_TH_ERRORS
}
```
The `HANDLE_TH_ERRORS` macro will catch all exceptions and convert them
into an appropriate Python signal. `python_error` is a special
exception which doesn't contain any info; instead it says, "An error
occurred in the Python API; if you return to the interpreter, Python
will raise that exception; nothing else needs to be done."
### `utils/auto_gil.h`
Whenever you make any calls to the Python API, you must have taken out
the Python GIL, as none of these calls are thread safe. `AutoGIL` is
a RAII struct which handles taking and releasing the GIL. Use it like
this:
```
void iWantToUsePython() {
AutoGIL gil;
...
}
```
In general, the compiler will NOT warn you if you use Python
functionality without taking out the GIL, so DO NOT FORGET this call.
### `utils/object_ptr.h`
`THPPointer` is a smart pointer class analogous to `std::shared_ptr`,
but which is overloaded to handle reference counting scheme of various
objects which are not based on `shared_ptr`. The most important overloads are:
* `PyObject` (so important we've aliased it as `THPObjectPtr`), which
hooks into Python reference counting. (By the way, that means you
MUST take out the GIL before bringing one of these into scope!)
* The various TH tensor and storage types (e.g., `THTensor`), which
hook into TH's reference counting. (TH's reference counting
IS thread safe, no locks necessary.)

View File

@ -25,7 +25,7 @@ PyObject * THPSize_New(int dim, long *sizes)
static PyObject * THPSize_pynew(PyTypeObject *type, PyObject *args, PyObject *kwargs)
{
THPObjectPtr self(PyTuple_Type.tp_new(type, args, kwargs));
THPObjectPtr self = PyTuple_Type.tp_new(type, args, kwargs);
if (self) {
for (Py_ssize_t i = 0; i < PyTuple_Size(self); ++i) {
PyObject *item = PyTuple_GET_ITEM(self.get(), i);
@ -56,12 +56,13 @@ extern PyTypeObject THPSizeType;
template<typename FnType, FnType fn, typename ...Args>
static PyObject* wrap_tuple_fn(Args ... args)
{
THPObjectPtr result((*fn)(std::forward<Args>(args)...));
PyObject *result = (*fn)(std::forward<Args>(args)...);
if (!result) return NULL;
if (PyTuple_Check(result.get())) {
return PyObject_CallFunctionObjArgs((PyObject*)&THPSizeType, result.get(), NULL);
if (PyTuple_Check(result)) {
return PyObject_CallFunctionObjArgs((PyObject*)&THPSizeType, result, NULL);
}
return result.release();
Py_INCREF(result);
return result;
}
static auto sq_concat = PyTuple_Type.tp_as_sequence->sq_concat;

View File

@ -1,14 +1,12 @@
#ifndef THP_H
#define THP_H
#include <Python.h>
#include <stdbool.h>
#include <TH/TH.h>
#include <THS/THS.h>
// Back-compatibility macros, Thanks to http://cx-oracle.sourceforge.net/
// define PyInt_* macros for Python 3.x. NB: We must include Python.h first,
// otherwise we'll incorrectly conclude PyInt_Check isn't defined!
// define PyInt_* macros for Python 3.x
#ifndef PyInt_Check
#define PyInt_Check PyLong_Check
#define PyInt_FromLong PyLong_FromLong
@ -20,7 +18,6 @@
#define LIBRARY_STATE
#define LIBRARY_STATE_NOARGS
#define LIBRARY_STATE_TYPE
#define LIBRARY_STATE_TYPE_NOARGS
#define THP_API extern "C"

View File

@ -9,8 +9,12 @@
#include <tuple>
#include <TH/THMath.h>
#include "torch/csrc/THP.h"
#include "torch/csrc/copy_utils.h"
#include "torch/csrc/DynamicTypes.h"
#include "THP.h"
#include "copy_utils.h"
#include "DynamicTypes.h"
//generic_include TH torch/csrc/generic/Tensor.cpp
#include "generic/Tensor.cpp"
#include <TH/THGenerateAllTypes.h>
#include "generic/Tensor.cpp"
#include <TH/THGenerateHalfType.h>

View File

@ -29,33 +29,11 @@ void StorageWeakRefAllocator::free(void* ptr) {
#ifdef WITH_NUMPY
/**
* Note [Numpy memory management]
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* For efficiency reasons, when a user converts to/from numpy arrays,
* we want to share the underlying storage. This means that if we
* turn a Numpy array into a Torch tensor, the Torch tensor must
* keep the Numpy array alive, and vice versa for conversions in
* the other direction.
*
* A Torch tensor keeps its backing Numpy array alive using the custom allocator
* THNumpyArrayAllocator (backed by NumpyArrayAllocator), which holds a
* THPObjectPtr to the Numpy PyArrayObject, and nulls it out upon free.
* The relevant code is in torch/csrc/generic/Tensor.cpp.
*
* A Numpy array keeps its backing Torch tensor alive using the base object
* <https://docs.scipy.org/doc/numpy-dev/reference/c-api.array.html#c.PyArray_SetBaseObject>
* field of Numpy, which is Numpy's hook for allowing an external user to
* manage memory. The relevant code is in
* torch/csrc/generic/methods/TensorSerialization.cwrap
*/
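A Python-level sketch of the sharing behaviour the note describes (variable names are illustrative):

```python
import numpy as np
import torch

arr = np.ones(3)
t = torch.from_numpy(arr)   # the tensor shares arr's buffer and keeps arr alive
arr[0] = 5.0
print(t[0])                 # 5.0 -- both views see the same memory
back = t.numpy()            # converting back also shares storage with the tensor
```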
// See Note [Numpy memory management]
void* NumpyArrayAllocator::realloc(void* ptr, ptrdiff_t size) {
PyArrayObject *array_ptr = (PyArrayObject*)object.get();
if (array_ptr && ptr == PyArray_DATA(array_ptr)) {
void* newPtr = this->malloc(size);
memcpy(newPtr, ptr, std::min((size_t) size, (size_t) PyArray_NBYTES(array_ptr)));
memcpy(newPtr, ptr, std::min(size, PyArray_NBYTES(array_ptr)));
// Whee! We're done!
object = nullptr;
return newPtr;
@ -63,7 +41,7 @@ void* NumpyArrayAllocator::realloc(void* ptr, ptrdiff_t size) {
return allocator->realloc(allocatorContext, ptr, size);
}
// See Note [Numpy memory management]
void NumpyArrayAllocator::free(void* ptr) {
PyArrayObject *array_ptr = (PyArrayObject*)object.get();
if (!array_ptr || ptr != PyArray_DATA(array_ptr))
@ -101,7 +79,6 @@ THAllocator THStorageWeakRefAllocator = {
};
#ifdef WITH_NUMPY
// See Note [Numpy memory management]
THAllocator THNumpyArrayAllocator = {
malloc_wrapper<NumpyArrayAllocator>,
realloc_wrapper<NumpyArrayAllocator>,

View File

@ -1,33 +0,0 @@
## Autograd
Autograd is a hotspot for PyTorch performance, so most of the heavy lifting is
implemented in C++. This implies that we have to do some shuffling between
Python and C++; and in general, we want data to be in a form that is convenient
to manipulate from C++.
Our general model is that for any key data type that autograd manipulates,
there are two implementations: a C++ type and a Python object type. For
example, consider variables in autograd: we have both `Variable` in `variable.h`
(the C++ type) and `THPVariable` in `python_variable.h` (the Python type.)
(By the way, THP stands for TorcH Python, not to be confused with THPP, TorcH
C++). `Variable` contains the payload of a variable, while `THPVariable` just
contains a `shared_ptr` reference to `Variable`, as well as references to other
Python objects which the Python runtime needs to know about. A lot of
data accessor implementations in `python_variable.cpp` simply reach through
to the underlying `Variable` and return the appropriate value.
The most complicated application of this principle is Function, which also
supports users implementing custom behavior in Python. We have the following
classes:
* `Function` in `function.h`, the C++ type.
* `THPFunction` in `python_function.h`, the Python object type. In
`python_function.cpp`, you can see the boilerplate that tells the Python
interpreter about this object.
* `PyFunction` in `python_function.h`, a subclass of `Function` which forwards
`apply` to a Python `THPFunction`. (NOT a Python object, despite its name!)
Outside of `PyFunction`, the C++ objects largely avoid referencing Python
objects (there are a few exceptions, like `pyobj` in `Variable`, and
`PyFunction`, whose whole point is to let C++ call into Python). Another
exception is `pyobj` in `Function`, which exists to ensure uniqueness of the
associated Python wrapper (if it exists).

View File

@ -1,11 +1,8 @@
#include "torch/csrc/autograd/engine.h"
#include "torch/csrc/autograd/functions/basic_ops.h"
#include "torch/csrc/utils/auto_gpu.h"
#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <iostream>
#include <mutex>
#include <set>
@ -25,25 +22,15 @@ using thpp::Tensor;
namespace torch { namespace autograd {
// XXX: Changes to the way multithreading works in execute should be done with
// great care. Right now the implementation guarantees that a single function's
// apply will never be entered concurrently (even if multiple graphs are
// executed at the same time). Adding multiple threads per-device or removing
// engine thread affinity to the device can break this invariant, and we depend
// on it in a few places (e.g. AccumulateGrad function).
struct FunctionTask {
GraphTask* base;
BackwardTask* base;
std::shared_ptr<Function> fn;
// This buffer serves as an implicit "addition" node for all of the
// gradients flowing here. Once all the dependencies are finished, we
// use the contents of this buffer to run the function.
InputBuffer inputs;
GradBuffer grad;
FunctionTask(GraphTask* base, std::shared_ptr<Function> fn, InputBuffer inputs)
FunctionTask(BackwardTask* base, std::shared_ptr<Function> fn, GradBuffer grad)
: base(base)
, fn(fn)
, inputs(std::move(inputs)) {}
, grad(std::move(grad)) {}
};
struct ReadyQueue {
@ -55,32 +42,26 @@ struct ReadyQueue {
FunctionTask pop_back();
};
struct GraphTask {
struct BackwardTask {
std::exception_ptr exception;
// Indicates if an error occurred while executing any task. When this is
// true, it signals all threads to stop executing.
std::atomic_bool has_error;
std::atomic<uint64_t> outstanding_tasks;
bool keep_graph;
bool has_any_work;
bool retain_variables;
bool node_requires_grad;
std::mutex mutex;
// Notified when a task finishes executing. Check outstanding_tasks to see
// if all tasks are done.
std::condition_variable not_done;
const Engine::callback_map& function_callbacks;
std::unordered_map<Function*, InputBuffer> not_ready;
std::unordered_map<Function*, GradBuffer> not_ready;
std::unordered_map<Function*, int> dependencies;
GraphTask(bool keep_graph, const Engine::callback_map& function_callbacks)
BackwardTask(bool retain_variables)
: exception()
, has_error(false)
, outstanding_tasks(0)
, keep_graph(keep_graph)
, has_any_work(false)
, retain_variables(retain_variables)
, node_requires_grad(false)
, mutex()
, not_done()
, function_callbacks(function_callbacks)
, not_ready()
, dependencies() {}
};
@ -107,9 +88,7 @@ Engine::Engine() : ready_queues() {
// This Engine's ReadyQueues and their corresponding threads are leaked here
Engine::~Engine() = default;
auto Engine::thread_main(std::shared_ptr<ReadyQueue> queue, int device) -> void {
THInferNumThreads();
AutoGPU guard(device);
auto Engine::thread_main(std::shared_ptr<ReadyQueue> queue) -> void {
while (1) {
FunctionTask task = queue->pop_back();
if (!task.base->has_error.load()) {
@ -134,73 +113,78 @@ auto Engine::thread_on_exception(FunctionTask& task, std::exception& e) -> void
}
}
static variable_list call_pre_hooks(Function& fn, variable_list inputs) {
static variable_list call_pre_hooks(Function& fn, variable_list grad_output) {
for (auto& hook : fn.pre_hooks) {
inputs = (*hook)(inputs);
grad_output = (*hook)(grad_output);
}
return inputs;
return grad_output;
}
static variable_list call_post_hooks(Function& fn, variable_list outputs, variable_list inputs) {
static variable_list call_post_hooks(Function& fn, variable_list grad_input, variable_list grad_output) {
for (auto& hook : fn.post_hooks) {
outputs = (*hook)(outputs, inputs);
grad_input = (*hook)(grad_input, grad_output);
}
return outputs;
return grad_input;
}
static variable_list call_function(FunctionTask& task) {
auto& fn = *task.fn;
auto inputs = call_pre_hooks(fn, InputBuffer::variables(std::move(task.inputs)));
auto& function_callbacks = task.base->function_callbacks;
auto callback_it = function_callbacks.find(&fn);
if (callback_it != function_callbacks.end()) {
auto& callback = callback_it->second;
if (!callback(&fn, inputs)) return variable_list(fn.next_functions.size());
}
auto fn_outputs = fn.apply(inputs);
return call_post_hooks(fn, std::move(fn_outputs), std::move(inputs));
auto grad_output = call_pre_hooks(*task.fn, GradBuffer::variables(std::move(task.grad)));
auto grad_input = task.fn->apply(grad_output);
return call_post_hooks(*task.fn, std::move(grad_input), std::move(grad_output));
}
auto Engine::evaluate_function(FunctionTask& task) -> void {
auto outputs = call_function(task);
auto grad_inputs = call_function(task);
auto& fn = *task.fn;
if (!task.base->keep_graph) {
if (!task.base->retain_variables) {
fn.releaseVariables();
}
if (outputs.size() != fn.next_functions.size()) {
if (grad_inputs.size() != fn.previous_functions.size()) {
std::stringstream ss;
ss << "Function '" << fn.name() << "' returned an invalid number of outputs - expected ";
ss << fn.next_functions.size() << ", but got " << outputs.size();
ss << "Function '" << fn.name() << "' returned an invalid number of gradients - expected ";
ss << fn.previous_functions.size() << ", but got " << grad_inputs.size();
throw std::runtime_error(ss.str());
}
int num_outputs = outputs.size();
for (int i = 0; i < num_outputs; ++i) {
auto& output = outputs[i];
auto& next_fn = fn.next_functions[i].first;
int input_nr = fn.next_functions[i].second;
int size = grad_inputs.size();
for (int i = 0; i < size; ++i) {
auto& grad_input = grad_inputs[i];
auto& prev_fn = fn.previous_functions[i].first;
int output_nr = fn.previous_functions[i].second;
if (!next_fn) {
// null inputs have no previous_function and we skip them here
if (!prev_fn) {
continue;
}
// Stochastic functions are placed in the ready queue by
// compute_dependencies, so we have to skip them here.
if (next_fn->is_stochastic || !next_fn->is_executable) {
// compute_dependencies, so we can skip them here.
if (prev_fn->is_stochastic || !prev_fn->requires_grad) {
continue;
}
std::lock_guard<std::mutex> lock(task.base->mutex);
// Check if the next function is ready to be computed
if (auto var = dynamic_cast<Variable*>(prev_fn.get())) {
if (!grad_input) {
// NOTE: grad_input can be NULL if the function returns None for a
// non_differentiable input. We may need to track additional information
// at the function level to determine if a NULL grad_input is an error.
std::stringstream ss;
ss << "Function '" << fn.name() << "' missing gradient at " << i;
throw std::runtime_error(ss.str());
}
var->backward(grad_input);
continue;
}
// Check if the function is ready for backward
bool is_ready = false;
auto& dependencies = task.base->dependencies;
auto it = dependencies.find(next_fn.get());
auto it = dependencies.find(prev_fn.get());
if (it == dependencies.end()) {
auto name = next_fn->name();
auto name = prev_fn->name();
throw std::runtime_error(std::string("dependency not found for ") + name);
} else if (--it->second == 0) {
dependencies.erase(it);
@ -208,24 +192,24 @@ auto Engine::evaluate_function(FunctionTask& task) -> void {
}
auto& not_ready = task.base->not_ready;
auto not_ready_it = not_ready.find(next_fn.get());
auto not_ready_it = not_ready.find(prev_fn.get());
if (not_ready_it == not_ready.end()) {
// No buffers have been allocated for the function
InputBuffer input_buffer(next_fn->num_inputs);
input_buffer.add(input_nr, std::move(output));
GradBuffer prev_buffer(prev_fn->num_outputs);
prev_buffer.addGrad(output_nr, std::move(grad_input));
if (is_ready) {
auto& queue = ready_queue(input_buffer.device());
queue.push_front(FunctionTask(task.base, next_fn, std::move(input_buffer)));
auto& queue = ready_queue(prev_buffer.device());
queue.push_front(FunctionTask(task.base, prev_fn, std::move(prev_buffer)));
} else {
not_ready.emplace(next_fn.get(), std::move(input_buffer));
not_ready.emplace(prev_fn.get(), std::move(prev_buffer));
}
} else {
// The function already has a buffer
auto &input_buffer = not_ready_it->second;
input_buffer.add(input_nr, std::move(output));
auto &prev_buffer = not_ready_it->second;
prev_buffer.addGrad(output_nr, std::move(grad_input));
if (is_ready) {
auto& queue = ready_queue(input_buffer.device());
queue.push_front(FunctionTask(task.base, next_fn, std::move(input_buffer)));
auto& queue = ready_queue(prev_buffer.device());
queue.push_front(FunctionTask(task.base, prev_fn, std::move(prev_buffer)));
not_ready.erase(not_ready_it);
}
}
@ -233,30 +217,30 @@ auto Engine::evaluate_function(FunctionTask& task) -> void {
}
/** Finds all stochastic functions and appends them to the queue */
auto Engine::find_stochastic_functions(function_queue& queue, Function* graph_root, GraphTask& task) -> void {
std::unordered_set<Function*> seen {graph_root};
function_queue search_queue {graph_root};
auto Engine::find_stochastic_functions(function_queue& queue, BackwardTask& task) -> void {
std::unordered_set<Function*> seen;
function_queue search_queue(queue);
while (search_queue.size() > 0) {
auto fn = search_queue.back(); search_queue.pop_back();
for (auto& next_fn_pair : fn->next_functions) {
auto& next_fn = next_fn_pair.first;
Function* next_ptr = next_fn.get();
if (!next_ptr) continue;
if (next_ptr->is_stochastic && next_ptr->is_executable && seen.count(next_ptr) == 0) {
ready_queue(-1).push_front(FunctionTask(&task, next_fn, InputBuffer(0)));
queue.push_back(next_ptr);
task.has_any_work = true;
for (auto& prev_fn_pair : fn->previous_functions) {
auto& prev_fn = prev_fn_pair.first;
Function* prev_ptr = prev_fn.get();
if (!prev_ptr) continue;
if (prev_ptr->is_stochastic && prev_ptr->requires_grad && seen.count(prev_ptr) == 0) {
ready_queue(-1).push_front(FunctionTask(&task, prev_fn, GradBuffer(0)));
queue.push_back(prev_ptr);
task.node_requires_grad = true;
}
if (seen.count(next_ptr) == 0) {
seen.insert(next_ptr);
search_queue.push_back(next_ptr);
if (seen.count(prev_ptr) == 0) {
seen.insert(prev_ptr);
search_queue.push_back(prev_ptr);
}
}
}
}
/** Computes the number of dependencies for each function which requires grad */
auto Engine::compute_dependencies(function_queue queue, GraphTask& task) -> void {
auto Engine::compute_dependencies(function_queue queue, BackwardTask& task) -> void {
// Just to make sure that they will never be added to the queue again
std::unordered_set<Function*> seen(queue.begin(), queue.end());
@ -265,97 +249,99 @@ auto Engine::compute_dependencies(function_queue queue, GraphTask& task) -> void
auto& dependencies = task.dependencies;
while (queue.size() > 0) {
auto fn = std::move(queue.back()); queue.pop_back();
for (auto& next_fn_pair : fn->next_functions) {
Function* next_ptr = next_fn_pair.first.get();
if (!next_ptr) continue;
if (!next_ptr->is_executable) continue;
if (next_ptr->is_stochastic) continue; // Stochastic nodes were in the queue already
dependencies[next_ptr] += 1;
if (seen.count(next_ptr) == 0) {
seen.insert(next_ptr);
queue.push_back(next_ptr);
// This is needed only to filter out backward roots that don't require grad
if (!fn->requires_grad) continue;
for (auto& prev_fn_pair : fn->previous_functions) {
Function* prev_ptr = prev_fn_pair.first.get();
if (!prev_ptr) continue;
if (dynamic_cast<Variable*>(prev_ptr)) continue;
if (!prev_ptr->requires_grad) continue;
if (prev_ptr->is_stochastic) continue; // Stochastic nodes were in the queue already
dependencies[prev_ptr] += 1;
if (seen.count(prev_ptr) == 0) {
seen.insert(prev_ptr);
queue.push_back(prev_ptr);
}
}
}
}
struct ClearCallbacks {
ClearCallbacks(std::vector<std::function<void()>>& callbacks,
std::mutex &callbacks_lock)
: callbacks(callbacks)
, callbacks_lock(callbacks_lock) { clear(); }
~ClearCallbacks() { clear(); }
void clear() {
std::lock_guard<std::mutex> lock(callbacks_lock);
callbacks.clear();
}
std::vector<std::function<void()>>& callbacks;
std::mutex& callbacks_lock;
};
auto Engine::execute(const function_list& input_roots,
variable_list& inputs,
bool keep_graph,
const callback_map& callbacks) -> void {
std::call_once(start_threads_flag, &Engine::start_threads, this);
// Callbacks are only valid for the duration of this run and should always be cleared
ClearCallbacks _cb_guard(post_callbacks, post_callbacks_lock);
GraphTask graph_task(keep_graph, callbacks);
std::unique_lock<std::mutex> lock(graph_task.mutex);
auto graph_root = std::make_shared<GraphRoot>(input_roots, inputs);
function_queue roots;
for (auto entry : input_roots) {
if (entry.first->is_executable) {
graph_task.has_any_work = true;
roots.push_back(graph_root.get());
ready_queue(-1).push_front(FunctionTask(&graph_task, graph_root, InputBuffer(0)));
break;
auto Engine::find_creators(const variable_list& variables,
tensor_list& grad_variables,
BackwardTask& task) -> function_queue {
function_queue creators;
std::unordered_map<std::shared_ptr<Function>, std::unique_ptr<GradBuffer>> creator_grad;
int size = variables.size();
for (int i = 0; i < size; ++i) {
auto& var = variables[i];
auto& grad = grad_variables[i];
if (!var->creator) {
// If someone calls .backward() on a leaf, it's simple...
if (var->requires_grad) {
var->backward(std::make_shared<Variable>(std::move(grad), false, true));
task.node_requires_grad = true;
}
} else {
auto& creator = var->creator;
auto& buf = creator_grad[creator];
if (creator->requires_grad) {
if (!buf) buf.reset(new GradBuffer(creator->num_outputs));
buf->addGrad(var->output_nr, Variable::of(std::move(grad)));
}
}
}
// Search the graph and find all stochastic functions. Append them to the queue.
find_stochastic_functions(roots, graph_root.get(), graph_task);
for (auto& entry: creator_grad) {
const auto& creator = entry.first;
creators.push_back(creator.get());
if (creator->requires_grad) {
// NOTE: buf is null if creator doesn't require gradient
auto& buf = entry.second;
auto& queue = ready_queue(buf->device());
queue.push_front(FunctionTask(&task, creator, std::move(*buf)));
task.node_requires_grad = true;
}
}
if (!graph_task.has_any_work) {
return creators;
}
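
The grouping step in find_creators exists because several output variables can share one creator: their incoming gradients are first accumulated into a single per-creator buffer (indexed by output_nr) and the creator is scheduled only once. A simplified sketch of that grouping, using toy Var/Creator types and plain doubles in place of GradBuffer:

```
#include <memory>
#include <unordered_map>
#include <vector>

struct Creator { int num_outputs = 0; };

struct Var {
  std::shared_ptr<Creator> creator;   // null for leaf variables
  int output_nr = 0;                  // which output of the creator this variable is
  double grad = 0.0;                  // toy gradient payload
};

// Collect the gradient of every variable into one buffer per unique creator.
std::unordered_map<std::shared_ptr<Creator>, std::vector<double>>
group_by_creator(const std::vector<Var>& vars) {
  std::unordered_map<std::shared_ptr<Creator>, std::vector<double>> buffers;
  for (const auto& v : vars) {
    if (!v.creator) continue;         // leaves are handled separately
    auto& buf = buffers[v.creator];
    if (buf.empty()) buf.resize(v.creator->num_outputs, 0.0);
    buf[v.output_nr] += v.grad;       // slot the gradient into the right output position
  }
  return buffers;
}
```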
auto Engine::backward(const variable_list& variables,
tensor_list& grad_variables,
bool retain_variables) -> void {
static std::once_flag once_flag;
std::call_once(once_flag, &Engine::start_threads, this);
BackwardTask backward_task(retain_variables);
std::unique_lock<std::mutex> lock(backward_task.mutex);
// Find the unique creators and backprop into variables which don't have creators.
auto creators = find_creators(variables, grad_variables, backward_task);
// Search the graph and find all stochastic functions. Append them to the queue.
find_stochastic_functions(creators, backward_task);
if (!backward_task.node_requires_grad) {
throw std::runtime_error(
"there are no graph nodes that require computing gradients");
}
// Now compute the dependencies for all executable functions
compute_dependencies(std::move(roots), graph_task);
// Now compute the dependencies for each function which requires grad
compute_dependencies(std::move(creators), backward_task);
// Wait for all tasks to complete
graph_task.not_done.wait(lock, [&graph_task]{
return graph_task.outstanding_tasks.load() == 0;
// wait for all tasks to complete
backward_task.not_done.wait(lock, [&backward_task]{
return backward_task.outstanding_tasks.load() == 0;
});
// Check for an exception while running backwards
if (graph_task.has_error.load()) {
std::rethrow_exception(graph_task.exception);
// check for an exception while running backwards
if (backward_task.has_error.load()) {
std::rethrow_exception(backward_task.exception);
}
if (!graph_task.not_ready.empty()) {
if (!backward_task.not_ready.empty()) {
throw std::runtime_error("could not compute gradients for some functions");
}
// Unlocking is necessary because the callback can register
// more callbacks (or they can be registered from other threads
// while it's waiting).
std::unique_lock<std::mutex> cb_lock(post_callbacks_lock);
for (std::size_t i = 0; i < post_callbacks.size(); ++i) {
cb_lock.unlock();
post_callbacks[i]();
cb_lock.lock();
}
}
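
Both execute and backward end with the same synchronization: the calling thread waits on a condition variable until an atomic counter of outstanding tasks reaches zero, and any exception captured on a worker thread is rethrown on the caller. A reduced sketch of that pattern, assuming workers simply call finish_task when they are done:

```
#include <atomic>
#include <condition_variable>
#include <exception>
#include <mutex>

struct TaskState {
  std::mutex mutex;
  std::condition_variable not_done;
  std::atomic<int> outstanding_tasks {0};
  std::atomic<bool> has_error {false};
  std::exception_ptr exception;
};

// Called by a worker thread when one task finishes (successfully or not).
void finish_task(TaskState& s, std::exception_ptr err = nullptr) {
  if (err) {
    std::lock_guard<std::mutex> lock(s.mutex);
    if (!s.has_error.exchange(true)) s.exception = err;   // keep only the first error
  }
  if (--s.outstanding_tasks == 0) {
    std::lock_guard<std::mutex> lock(s.mutex);             // lock so the wakeup can't be missed
    s.not_done.notify_all();
  }
}

// Called by the thread that launched the backward pass.
void wait_for_completion(TaskState& s) {
  std::unique_lock<std::mutex> lock(s.mutex);
  s.not_done.wait(lock, [&s] { return s.outstanding_tasks.load() == 0; });
  if (s.has_error.load()) std::rethrow_exception(s.exception);
}
```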
void Engine::queue_callback(std::function<void()> callback) {
std::lock_guard<std::mutex> lock(post_callbacks_lock);
post_callbacks.emplace_back(std::move(callback));
}
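
queue_callback and the loop at the end of execute cooperate: callbacks are appended under the lock, and the loop iterates by index with the lock dropped around each call, because a running callback may itself queue further callbacks. A sketch of that iteration, with the callback copied out before the lock is released so a reallocation of the vector cannot invalidate it:

```
#include <functional>
#include <mutex>
#include <vector>

// Runs every queued callback; callbacks may enqueue more while running.
void run_post_callbacks(std::vector<std::function<void()>>& callbacks,
                        std::mutex& callbacks_lock) {
  std::unique_lock<std::mutex> lock(callbacks_lock);
  for (std::size_t i = 0; i < callbacks.size(); ++i) {
    auto cb = callbacks[i];   // copy: the vector may grow and reallocate once unlocked
    lock.unlock();
    cb();
    lock.lock();
  }
}
```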
auto Engine::ready_queue(int device) -> ReadyQueue& {
@ -371,12 +357,10 @@ auto Engine::start_threads() -> void {
num_devices = 0;
}
#endif
int num_threads = num_devices + 1;
ready_queues = std::vector<std::shared_ptr<ReadyQueue>>(num_threads);
for (int i = 0; i < num_threads; ++i) {
auto& queue = ready_queues[i];
ready_queues = std::vector<std::shared_ptr<ReadyQueue>>(num_devices + 1);
for (auto& queue : ready_queues) {
queue.reset(new ReadyQueue());
std::thread t(&Engine::thread_main, this, queue, i - 1);
std::thread t(&Engine::thread_main, this, queue);
t.detach();
}
}
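
start_threads builds one ready queue (and one worker thread) per CUDA device plus one extra slot for the CPU; the older variant also hands each thread its device index. The indexing convention used elsewhere in the engine appears to be device + 1, so device -1 (CPU) maps to slot 0 — a toy sketch of that layout, noted as an assumption since the ready_queue body itself is not part of this hunk:

```
#include <cassert>
#include <memory>
#include <vector>

struct ReadyQueue { /* queue of FunctionTasks in the real engine */ };

struct EngineQueues {
  std::vector<std::unique_ptr<ReadyQueue>> queues;

  explicit EngineQueues(int num_devices) {
    queues.resize(num_devices + 1);             // slot 0 is reserved for the CPU
    for (auto& q : queues) q.reset(new ReadyQueue());
  }

  // device == -1 selects the CPU queue; devices 0..N-1 select per-GPU queues.
  ReadyQueue& ready_queue(int device) {
    assert(device + 1 >= 0 && device + 1 < (int)queues.size());
    return *queues[device + 1];
  }
};
```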

View File

@ -3,22 +3,20 @@
// Engine implements backpropagation from output variables and their gradients
// to "root" variables (variables created by the user with requires_grad=True).
#include <Python.h>
#include <deque>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>
#include <functional>
#include "torch/csrc/autograd/function.h"
#include "torch/csrc/autograd/input_buffer.h"
#include "torch/csrc/autograd/grad_buffer.h"
namespace torch { namespace autograd {
struct ReadyQueue;
struct FunctionTask;
struct GraphTask;
struct BackwardTask;
// A single instance of this struct should exist for the whole lifetime of the process.
// The worker thread creation logic and Engine's destructor rely on this.
@ -26,39 +24,31 @@ struct Engine {
Engine();
virtual ~Engine();
using ready_queue_type = std::deque<std::pair<std::shared_ptr<Function>, InputBuffer>>;
using ready_queue_type = std::deque<std::pair<std::shared_ptr<Function>, GradBuffer>>;
using function_queue = std::vector<Function*>;
using dependencies_type = std::unordered_map<Function*, int>;
using callback_type = std::function<bool (Function*, variable_list&)>;
using callback_map = std::unordered_map<Function*, callback_type>;
// Given a list of (Function, input number) pairs, computes the value of the graph
// by following next_function references.
void execute(
const function_list& roots,
variable_list& inputs,
bool keep_graph,
const callback_map& callbacks = callback_map());
void queue_callback(std::function<void()> callback);
// Given a list of output variables and their gradients, computes the
// gradients of "root" variables by backpropagation.
void backward(
const variable_list& variables,
tensor_list& grad_variables,
bool retain_variables);
protected:
function_queue find_roots(
const function_list& roots,
variable_list& inputs,
GraphTask& task);
void find_stochastic_functions(function_queue& queue, Function* graph_root, GraphTask& task);
void compute_dependencies(function_queue queue, GraphTask& task);
function_queue find_creators(
const variable_list& variables,
tensor_list& grad_variables,
BackwardTask& task);
void find_stochastic_functions(function_queue& queue, BackwardTask& task);
void compute_dependencies(function_queue queue, BackwardTask& task);
void evaluate_function(FunctionTask& task);
ReadyQueue& ready_queue(int device);
void start_threads();
virtual void thread_main(std::shared_ptr<ReadyQueue> queue, int device);
virtual void thread_main(std::shared_ptr<ReadyQueue> queue);
virtual void thread_on_exception(FunctionTask& task, std::exception& e);
std::once_flag start_threads_flag;
std::vector<std::shared_ptr<ReadyQueue>> ready_queues;
std::vector<std::function<void()>> post_callbacks;
std::mutex post_callbacks_lock;
};
}} // namespace torch::autograd
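
The callback_map declared in the newer header is keyed by Function* and holds std::function<bool (Function*, variable_list&)> values; the assumed semantics (not shown in this hunk) are that a callback can inspect or rewrite the gradients flowing through a function and return false to stop propagation there. A hypothetical illustration of that shape with stand-in types:

```
#include <functional>
#include <memory>
#include <unordered_map>
#include <vector>

struct Function;                  // stand-in for torch::autograd::Function
struct Variable;                  // stand-in for torch::autograd::Variable
using variable_list = std::vector<std::shared_ptr<Variable>>;
using callback_type = std::function<bool (Function*, variable_list&)>;
using callback_map  = std::unordered_map<Function*, callback_type>;

callback_map make_callbacks(Function* watched) {
  callback_map callbacks;
  callbacks[watched] = [](Function* /*fn*/, variable_list& /*grads*/) {
    // inspect or modify the gradients here before they propagate further
    return true;                  // returning false would stop at this function (assumed)
  };
  return callbacks;
}
```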

View File

@ -10,22 +10,22 @@ namespace torch { namespace autograd {
auto Function::flags(const variable_list& inputs) -> FunctionFlags {
int num_inputs = inputs.size();
FunctionFlags f;
f.is_executable = false;
f.requires_grad = false;
f.is_volatile = false;
f.next_functions.resize(num_inputs);
f.previous_functions.resize(num_inputs);
for (int i = 0; i != num_inputs; ++i) {
auto& var = inputs[i];
if (var) {
f.is_executable |= var->requires_grad;
f.requires_grad |= var->requires_grad;
f.is_volatile |= var->is_volatile;
if (var->grad_fn) {
f.next_functions[i] = std::make_pair<>(var->grad_fn, var->output_nr);
if (var->creator) {
f.previous_functions[i] = std::make_pair<>(var->creator, var->output_nr);
} else {
f.next_functions[i] = std::make_pair<>(var->get_grad_accumulator(), 0);
f.previous_functions[i] = std::make_pair<>(var, 0);
}
}
}
f.is_executable &= !f.is_volatile;
f.requires_grad &= !f.is_volatile;
return f;
}
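
The rule Function::flags implements is a simple fold over the inputs: the result requires grad if any input does, is volatile if any input is, and volatility takes precedence by masking requires_grad (is_executable in the other naming) at the end. A compact standalone restatement of just that rule:

```
#include <vector>

struct Flags {
  bool requires_grad = false;
  bool is_volatile = false;
};

struct Input {
  bool requires_grad = false;
  bool is_volatile = false;
};

Flags merge_flags(const std::vector<Input>& inputs) {
  Flags f;
  for (const auto& in : inputs) {
    f.requires_grad |= in.requires_grad;  // any input needing grad taints the output
    f.is_volatile   |= in.is_volatile;    // any volatile input makes the output volatile
  }
  f.requires_grad &= !f.is_volatile;      // volatility wins and disables grad computation
  return f;
}
```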

View File

@ -1,19 +1,18 @@
#pragma once
// Function is an abstract class that represents a single operation from one or
// more variables to one or more variables.
// more variables to one more or varaibles.
//
// Subclasses may represent "forward" or "backward" operations (i.e. functions
// and their derivatives). Some functions may be used as both.
#include <Python.h>
#include "torch/csrc/autograd/function_hook.h"
#include <THPP/THPP.h>
#include <memory>
#include <THPP/THPP.h>
#include <vector>
#include "torch/csrc/autograd/saved_variable.h"
#include "torch/csrc/autograd/function_hook.h"
namespace torch { namespace autograd {
struct Function;
@ -25,37 +24,30 @@ using function_list = std::vector<std::pair<std::shared_ptr<Function>, int>>;
// State used to create "backward" functions
struct FunctionFlags {
// Roughly speaking, is_executable corresponds to requires_grad.
// See http://pytorch.org/docs/notes/autograd.html for more details:
// both is_executable and is_volatile specify whether or not backwards
// gradient computation will be performed for a function, but they differ in
// their precedence.
bool is_executable = false;
bool requires_grad = false;
bool is_volatile = false;
// What functions take the output of this function as input.
// There is one function per output of this function.
function_list next_functions;
function_list previous_functions;
};
struct Function {
Function()
: num_inputs(0)
, next_functions()
, is_executable(false)
: num_outputs(0)
, previous_functions()
, requires_grad(false)
, is_volatile(false)
, is_stochastic(false)
, pre_hooks()
, post_hooks()
, pyobj(nullptr)
{}
Function(FunctionFlags&& flags)
: num_inputs(0)
, next_functions(std::move(flags.next_functions))
, is_executable(flags.is_executable)
: num_outputs(0)
, previous_functions(std::move(flags.previous_functions))
, requires_grad(flags.requires_grad)
, is_volatile(flags.is_volatile)
, is_stochastic(false)
, pre_hooks()
, post_hooks()
, pyobj(nullptr)
{}
Function(const Function& other) = delete;
@ -65,7 +57,7 @@ struct Function {
// Implements the operation
virtual variable_list apply(const variable_list& inputs) = 0;
// Computes is_executable, is_volatile, and next_functions from a list
// Computes requires_grad, is_volatile, and previous_functions from a list
// of input variables
static FunctionFlags flags(const variable_list& inputs);
@ -75,24 +67,21 @@ struct Function {
// Function name for debugging
virtual std::string name();
inline bool should_compute_output(int i) const {
auto& fn = next_functions[i].first;
return fn && fn->is_executable;
inline bool needs_input_grad(int i) const {
auto& fn = previous_functions[i].first;
return fn && fn->requires_grad;
}
inline void set_flags(FunctionFlags&& flags) {
is_executable = flags.is_executable;
next_functions = std::move(flags.next_functions);
}
int num_inputs;
function_list next_functions;
bool is_executable;
// These variables are usually only meaningful for "backward" functions.
// num_outputs is the number of outputs of the corresponding "forward" function;
// it's actually the number of inputs of this function.
int num_outputs;
function_list previous_functions;
bool requires_grad;
bool is_volatile;
bool is_stochastic;
std::vector<std::shared_ptr<FunctionPreHook>> pre_hooks;
std::vector<std::shared_ptr<FunctionPostHook>> post_hooks;
PyObject *pyobj; // weak reference
};
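
The pre_hooks and post_hooks vectors on Function hold user-registered transforms; the assumed call order (the evaluate_function hunk is not part of this diff) is that pre-hooks rewrite the incoming gradients, the function runs, and post-hooks then see both its outputs and the original inputs. A toy sketch of that ordering with plain std::function hooks instead of the real hook classes:

```
#include <functional>
#include <vector>

using value_list = std::vector<double>;   // toy stand-in for variable_list
using pre_hook   = std::function<value_list(const value_list&)>;
using post_hook  = std::function<value_list(const value_list&, const value_list&)>;

// Assumed order: pre-hooks, then the function itself, then post-hooks.
value_list call_with_hooks(const std::vector<pre_hook>& pre,
                           const std::vector<post_hook>& post,
                           const std::function<value_list(const value_list&)>& fn,
                           value_list inputs) {
  for (const auto& h : pre)  inputs  = h(inputs);
  value_list outputs = fn(inputs);
  for (const auto& h : post) outputs = h(outputs, inputs);
  return outputs;
}
```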

Some files were not shown because too many files have changed in this diff