Summary:
When tracing scripted functions, we used to only allow Tensor arguments.
This enables tracing script modules with List[Tensor] or Tuple[Tensor, Tensor] arguments (passing
tuples).
Fixes: #13566
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13597
Differential Revision: D12990464
Pulled By: soumith
fbshipit-source-id: fdce3afcb1e09f3c26d6ce834c01bf18d261f47c
Summary:
Also, add an assertion in the GraphExecutor to make sure we don't
access memory out of bounds.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13553
Differential Revision: D12924796
Pulled By: soumith
fbshipit-source-id: ea2a134084538484178b8ebad33d6716a8e1d633
Summary:
Enables almost all `modernize-*` checks in clang-tidy. This warns against things such as:
- Use of `const std::string&` instead of new-style `std::string` + move,
- Using old-style loops instead of range-for loops,
- Use of raw `new`
- Use of `push_back` instead of `emplace_back`
- Use of `virtual` together with `override` (`override` is sufficient)
ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13196
Differential Revision: D12891837
Pulled By: goldsborough
fbshipit-source-id: 4d0f782a09eb391ee718d3d66f74c095ee121c09
Summary:
This PR changes the compiler to correctly emit in-place operators for augmented assignments (`+=` and friends).
- To better match the Python AST structure, add an `AugAssign` tree view and make `Assign` apply only to `=` assignments.
- Emit those `AugAssign` exprs in the compiler, dispatching to in-place aten ops for tensors and lowering to simple assignments for scalar types.
- In order to preserve (suspect) ONNX export semantics, add a pass to lower the in-place operators to out-of-place operators.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13364
Differential Revision: D12899734
Pulled By: suo
fbshipit-source-id: bec83be0062cb0235eb129aed78d6110a9e2c146
Summary:
Goodbye, World! This PR removes the world tokens and associated pass and switches lists over to the new mutability/aliasing annotations.
Should resolve#12780 since we are disabling optimization pending alias analysis.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13406
Differential Revision: D12886463
Pulled By: suo
fbshipit-source-id: e64e55905aebdcad273b39862df3209f823f5408
Summary:
This PR principally redesigns the fuser's logical flow to be hierarchical, with device-independent logic directing (relatively little) device-specific logic. This design is based on reviews of XLA, TVM, internal design review at NVIDIA and discussions with fuser owners at Facebook. To further vet the design I have begun developing the next significant PR (extended fusion logic) on top of this architecture and it has made the work significantly easier. This PR also improves fuser modularity, which should make it easier for others to contribute to. Unfortunately, this PR is large and its nature has made breaking it into smaller pieces challenging. Future PRs should be smaller.
The fusion flow is now:
- Fusions are "registered" and "upfront compilation" occurs. The fusion specifications, which includes the graph, go into a thread-safe device-independent cache. Upfront compilation generates some information used later during shape inference.
- Fusions are run, which passes them to an executor that performs shape inference, requests an instantiated fusion from the specification's thread-safe store, and launches them. Launch logic eventually defers to device-specific logic.
- Fusions not previously instantiated are compiled. Compilation is device-specific and arg-specific. Compilation logic eventually defers to device-specific logic.
- If the fusion could not be run because fusion on the requested device is disabled or shape inference fails a fallback is invoked.
This flow can be thought of as PyTorch IR -> Device-Independent Fusion Logic -> Device-Specific Fusion Logic. The current upstream logic is, by contrast, PyTorch IR -> Device-Specific Logic -> Device-Independent Logic, which results in needless code duplication and lack of conceptual clarity. That was my mistake when splitting the fuser off from the rest of the jit and our reviews since then have been incredibly helpful in understanding why the approach in this PR is better.
This PR does not only move code around. It also fixes few couple bugs and makes some logical/code changes.
Bug fixes:
- thread-safety is improved with caches preventing concurrent access
- the nvrtc version is now reviewed to determine the appropriate compute architecture to compile for, fixing a bug that would cause runtime errors if a user's nvrtc didn't support the compute architecture their gpu reported
- an issue with DeviceGuard not setting the device properly and failing silently is worked-around (ezyang mentioned he was reviewing the dynamic registration DeviceGuard uses, which may resolve the issue)
Code/Logical changes:
- "const" now appears many more places (note: I cast const away in operator.h because of some obscure build issues -- I think we should be able to fix this and will take a look while this goes through testing)
- The new flow allowed some redundant code to be removed (AnnotatedGraph is gone, for example, and the more straightforward flow eliminated duplication of effort elsewhere)
- Fallback logic is now also invoked if a fusion is requested on a device that cannot handle fusions
- Use of macros to determine which files are compiled is reduced (though they may come back if the Windows build is unhappy)
- There is no more "common" code or folder, the device-independent logic being at the forefront of the fuser replaces and improves upon the goal of sharing code
apaszke who I promised naming rights to
zdevito who correctly pointed out that the device-independent logic should be the bulk of what the fuser is doing
ngimel who contributed to the design of this architecture
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13108
Reviewed By: gchanan, fmassa
Differential Revision: D12850608
Pulled By: soumith
fbshipit-source-id: 24e2df6dfa97591ee36aeca8944519678c301fa3
Summary:
This commit is a minimial initial pass at adding inplace and _out variants to the JIT.
It changes gen_jit_dispatch.py to add bindings for these operators, and it also
supplements the FunctionSchema with alias information for these operators and for
viewing operators.
Tests are very minimal and will need to be improved in future commits.
Notes:
* Custom operator tests needed to be changed since _out variants add overloads, which
the custom operator pipeline does not handle when called from python. This commit
registers special test ops in the _test namespace for this purpose.
* Extends the schema parser to parse alias annotations more robustly.
* Extends FunctionSchema with `writes()` a set of alias set names that the op will write to,
and `annotatedType()` which will return AnnotatedType objects which contain the alias_set
information that was parsed from the schema.
* Disables all optimizations in graph executor when a mutable operator is found. This
is something that will be improved in the future but is necessary for correctness now.
* Adds annotate_ops to gen_jit_dispatch which adds aliasing information to all of the
aten ops.
* Adds AnnotatedType to the type hierarchy which is used to mark List and Tensor types
with their alias_set. These types only appear in schema when you call annotatedType
and are erased from types in normal use.
* Extends jit::Type with .containedTypes() and .withContained(new_types). The first returns all types contained
within the type (e.g. T for T[], or {T,L} for a tuple (T, L)). The second constructs a new
version of the same type, replacing the contained types with new_types. This simplifies
a lot of logic for recursively cleaning up types.
* Refactor List[T] into a common part that is shared with Annotated[T] and can be shared
with Optional[T] and Future[T] when they are merged.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13093
Differential Revision: D10848176
Pulled By: zdevito
fbshipit-source-id: d057f23eeb99cde8881129b42d3f151ed5e7655d
Summary:
Add a pass to move all constants to the beginning of the graph, and deduplicate.
This extends https://github.com/pytorch/pytorch/pull/10231 to also handle constants introduced in inlining, constant propagation, etc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12222
Reviewed By: driazati
Differential Revision: D10201616
Pulled By: eellison
fbshipit-source-id: bc9c5be26868c8b5414257a0d4462de025aeb9bd
Summary:
This PR implements the design that we discussed. Changes:
- Added a World token IValue and type. The IValue is basically a dummy struct for now, in the future we may extend it (say, add thread-local state).
- Effectful ops explicitly declare they are mutable by having World tokens as inputs and outputs in their schema.
- Purely functional ops that use mutable values will get "fenced" and the world token will be threaded through the fences
- AnnotateEffects pass which wires up all the world tokens together.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10700
Reviewed By: eellison
Differential Revision: D9547881
Pulled By: michaelsuo
fbshipit-source-id: ebbd786c31f15bf45e2ddb0c188438ff2f5f3c88
Summary:
This PR does a few things:
Previously test_jit.py only tested autograd on backward graphs.
This is because we borrow from test_autograd and construct graphs with a small
number of nodes. Because the number of nodes is small (typically 1-2), those graph
do not end up containing autodiff subgraphs, so autodiff never gets tested.
This PR enables autodiff testing by doing the following:
- added disableDebugAutodiffSubgraphInlining fn to graph_executor to disable
autodiff subgraph inlining.
- (implementation) added autodiffSubgraphNodeThreshold and autodiffSubgraphInlineThreshold.
These are set to their default values (2, 5) but disableDebugAutodiffSubgraphInlining()
sets both to 1, disabling subgraph inlining and allowing 1-node autodiff subgraphs.
- The relevant backward jit tests disable autodiff subgraph inlining so they
will test the autodiff versions of the operators instead of autograd whenever
an autodiff variant exists.
- We don't run the tests that do inline autodiff subgraphs anymore.
This has no impact on testing correctness because the assumption is
that autograd functions are correct and are tested in test_autograd.py
This allows the graph fuser to work better because a lot of these ops were previously not autodiff-compatible but fusible. On a more concrete example, lstm backward contains a lot of tensor-scalar operations; these autodiff formulas help its double backward pass.
Included:
- arithmetic overloads
- abs, acos, asin, atan, ceil, cos, cosh, exp, expm1, floor, fmod, frac, log, log10, log1p, log2 reciprocal, remainder, round, sin, sinh, tan, trunc, rsqrt
TestJitGenerated tests autodiff for all of the added operations.
cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11832
Differential Revision: D10031256
Pulled By: zou3519
fbshipit-source-id: 9daf9900a5ad187743609cd0fbbd10b15411ad93
Summary:
This is pretty important because a common situation of passing LSTM hidden states as a tuple completely trashes performance of a network.
Cleans up all our propagation/undef specialization passes, at a cost of increased complexity of `ArgumentSpec` and `GraphExecutor`. An alternative would be to simply flatten all tuple inputs to a graph ahead of time, but that might just end up being confusing in the future (you never know if you're working with a graph that can have tuple or not).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11863
Differential Revision: D9992814
Pulled By: apaszke
fbshipit-source-id: 0a565a3b23e32f8fa72c0534e07c1ce6187739fc
Summary:
Previously, we would pretty much assume that all floating point tensors do require grad, which might result in some unnecessary compute.
I don't really like the fact that `TensorType` uses `tensor.is_variable() && tensor.requires_grad()` to infer the value of `requires_grad`, but changing constants to keep variables turns out to be pretty hard. I got halfway there, but it would still need some more work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11586
Reviewed By: ezyang
Differential Revision: D9813648
Pulled By: apaszke
fbshipit-source-id: 77f77756d18ff7632fca3aa68ce855e1d7f3bdb8
Summary:
**Review last commit only.** Stacked on top of #10949.
This commit fixes a number of issues connected to caching
differentiability status of graphs inside graph executors,
and changes the rules for optimization of differentiable subgraphs.
Previously every one of those was instantiated as a separate graph
executor, but now they are simply heavier-optimized graph regions,
and graph executors are only instantiated for their backward.
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10977
Differential Revision: D9600626
Pulled By: apaszke
fbshipit-source-id: dad09a0f586e396afbd5406319c1cd54fbb8a3d3
Summary:
Operators like aten::chunk used to return a number of tensors, but
now return a list. To make it easier to do shape prop through
aten::chunk and fuse it, I've also introduced prim::ConstantChunk,
which behaves like the previous implementation (has a variable length
output list).
The downside of this PR is that the introduction of more lists to the IR causes the LSTM and MiLSTM graphs to be considered as non-differentiable by the graph executor. I verified that they are still optimize correctly, and my next patch (that changes how the specializations/differentiation works) will restore those.
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10949
Reviewed By: zdevito
Differential Revision: D9556823
Pulled By: apaszke
fbshipit-source-id: 33e63b17fc7247cac6cfc05eb7eb9bf069b499ee
Summary:
Fixes#10032
When capturing an output, GraphExecutorAutogradFunction creates
SavedVariable with is_output=False and owns it:
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/graph_executor.cpp#L87
Constructing SavedVariable with is_output=False makes it own a copy of
the shared_ptr<GraphExecutorAutogradFunction>, which causes a reference
cycle:
6456b944fd/torch/csrc/autograd/saved_variable.cpp (L27)
The solution in this PR is to construct the SavedVariable with
is_output=True if the captured value is an output.
Test Plan
Turn on cuda memory checking for JitTestCase. If the test's name
includes "cuda" or "gpu" in it, the cuda memory checking test happens.
cc zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10222
Reviewed By: ezyang
Differential Revision: D9162995
Pulled By: zou3519
fbshipit-source-id: aeace85a09160c7a7e79cf35f6ac61eac87cbf66
Summary:
This lays out initial support for taking and returning a richer set
of types than only tensors. Floats and ints are already valid, lists are
straightforward to add, tuples need some discussion.
Based on top of #9948. Review only the last commit.
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9969
Reviewed By: zdevito
Differential Revision: D9076973
Pulled By: apaszke
fbshipit-source-id: 5a1fe912ea6b79ab2bfd0dcce265eb05855b5ff0
Summary:
Adding a constant propagation pass to the JIT. I have added examples to the expect files.
There are a couple of special cases which have not been implemented here. IF nodes with constant conditions can be inlined with the correct block. WHILE nodes can be removed if the condition is false. I have added a test for each case in test_jit.py file as expected failures.
To be consistent with DCE, python ops & CPP ops are treated as not having side-effects.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8808
Reviewed By: wanchaol
Differential Revision: D8906770
Pulled By: eellison
fbshipit-source-id: 10ad796d89f80b843566c9ddad6a0abd1f3dc74c
Summary:
More clang tidy cleanups in `torch/csrc`. This time:
1. `hicpp-use-equals-default` recommends `= default` instead of `{}` for constructors/destructors. This is better practice because it expresses the intent better (https://stackoverflow.com/questions/6502828/what-does-default-mean-after-a-class-function-declaration)
2. `readability-inconsistent-declaration-parameter-name` enforces that parameter names in the declaration match parameter names in the definition. This is just generally useful and can prevent confusion and bugs.
Also updated my script a little bit.
apaszke ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9737
Differential Revision: D9069069
Pulled By: goldsborough
fbshipit-source-id: f7b3f3a4eb4c9fadc30425a153566d3b613a41ae
Summary:
Based on top of #9763 (first 3 commits belong to that PR). The first commits from this PR are "Stop using attributes ..."
I tried to separate the changes into fairly meaningful commits. I can't split them up into smaller PRs, because everything starts working and all tests pass only after the whole sequence, but hopefully this will make reviewing somewhat easier.
Known issues/regressions/future tasks:
- `aten::lerp` and `aten::clamp` are no longer fusable
- `CreateAutodiffSubgraphs` needs a rewrite
- It is much more strict now, and will miss a lot of opportunities, especially when viewing ops are involved. Our previous approach was "ignore the assumption on shape availability in gradient formulas to determine differentiability, and hope that shape prop will be robust enough to actually deliver them before we differentiate", which obviously doesn't scale well to more complex cases. We should either work on reducing the size dependency of grad formulas (feasible e.g. for `view`/`reshape`, unfeasible for `squeeze`/`unsqueeze`), or make `CreateAutodiffSubgraphs` integrate some kind of "I could integrate this node into an AD subgraph, but will I be able to infer the shape of its input" reasoning (kind of like a limited shape prop, that doesn't infer anything, and only tells if it *could* infer something).
- It sometimes creates constant-only (or constants + one node) graphs, which is useless
- Broken `aten::add` in auto-batching, because it gained a non-tensor input. I changed the test for pointwise operations to use `aten::mul` instead, but I needed to disable the LSTM cell test. I'm not sure how scalar constants should be implemented in this case, because I don't fully understand our format. cc: ChunliF
- Graph import does some hacks to recover type of constants. This code should be removed once we'll gain the ability to export the IR along with value types.
- There's still a fair amount of dead code that can be removed. I didn't want to make this diff any bigger, and removing it is an easy task.
- Graph fuser could be improved to use signature matching (possibly using `OperatorSet`) instead of basing on node kinds.
- Manual constant propagation for the `ListConstruct` node in `torch/onnx/utils.py` should be replaced with a proper constant propagation pass (or we should ensure that the one we have handles at least this case before we remove this code).
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9807
Reviewed By: ezyang
Differential Revision: D9004285
Pulled By: apaszke
fbshipit-source-id: fe88026a765f6b687354add034c86402362508b7
Summary:
This is blocking the IR operator unification, because I need to be able to pass scalars to backward functions.
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9763
Reviewed By: zou3519
Differential Revision: D8978457
Pulled By: apaszke
fbshipit-source-id: 570b4c3409322459cb0f2592069730a7d586ab20
Summary:
I got some tensor->variable conversion exceptions from `torch/csrc/autograd/variable.h`, which used the `TORCH_ASSERTM` macros instead of `AT_CHECK`, so they didn't have backtraces. This was such a substantial loss for debugability that I decided to update the whole codebase to use the backtrace-enabled ATen macros instead of `TORCH_ASSERT` and `JIT_ASSERT`, the latter having been an alias of the former.
ezyang apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9575
Differential Revision: D8924566
Pulled By: goldsborough
fbshipit-source-id: 7a4013b13eec9dbf024cef94cf49fca72f61d441
Summary:
As in the title. Lets us simplify a lot of code.
Depends on #9363, so please review only the last commit.
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9414
Reviewed By: zdevito
Differential Revision: D8836496
Pulled By: apaszke
fbshipit-source-id: 9b3c3d1f001a9dc522f8478abc005b6b86cfa3e3
Summary:
IValue is short for interpreter value. It is used frequently so a short name is important.
This will allow us to implement more non-tensor types in an efficient way and remove
many hacks from the compiler.
This PR is limited. It only introduces IValue and changes interpreter to use it.
Follow up PRs will:
* Change the way aten_ops consume non-tensor types so that integer lists,
are no longer represented as Tensors.
* Introduce TensorList as a fundamental type and remove all vararg handling in gen_jit_dispatch
* Change the compiler to implement math on primitive numbers rather than converting to tensors.
jamesr66a apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9368
Reviewed By: ezyang
Differential Revision: D8817598
Pulled By: zdevito
fbshipit-source-id: 29dce80611ce5f6384234de9d12a67861d2b112f
This commit implements the solution proposed in https://github.com/pytorch/pytorch/issues/8410
to workaround the need to create zero tensors with the same shape as inputs.
It introduces the concept of a LinearBlock which marks places in the code
where we know if all the inputs to the node are zero, then the outputs
to the node are also zero. Autodiff introduces LinearBlocks around
backwards functions, which have this property. specializeUndef then
propagates Undef nodes using this information.
Notes:
* Since we do not always specialize, we have a pass LowerLinearBlocks
that replaces the block with an if statement that dynamically guards
the Undef case.
* We introduce AutogradAdd which is addition that still works when
its inputs might be undefined. In cases where we specialize this will
get removed in favor of a normal add, but there are cases where
gradient graphs do not specialize (e.g. when they are not differentiable,
but a derivative is required) so it is important for this op to be executable.
Addresses #8177
A design doc can be found here: [gist](https://gist.github.com/zou3519/4b7f13f03cc9f3612bd9363e6405fa0a) version or [quip](https://fb.quip.com/azL1AqUckBdo) version
General approach:
- Add NumberType, FloatType, IntType to represent Python numbers, floats and ints.
- Emit these types for python literals
- Change aten_schema such that Scalars are NumberType, int64_t and bool are IntType.
- Emit aten::type_as, prim::NumToTensor, and prim::TensorToNum nodes for tensor-number math. (see examples below)
- Erase NumberType, prim::NumToTensor, and prim::TensorToNum for ONNX export
### Tensor/number math
```
import torch
@torch.jit.script
def fn(x):
return x + 1
```
```
graph(%x : Dynamic) {
%1 : int = prim::Constant[value={1}]()
%2 : Dynamic = prim::NumToTensor(%1)
%3 : Dynamic = aten::type_as(%2, %x)
%4 : Dynamic = aten::add[alpha={1}](%x, %4)
return (%5);
}
```
### Number/Number Math
```
import torch
@torch.jit.script
def fn(zero):
c = 1 + 1
return zero + c
```
```
graph(%zero : Dynamic) {
%1 : int = prim::Constant[value={1}]()
%2 : int = prim::Constant[value={1}]()
%3 : Dynamic = prim::num_to_tensor(%1)
%4 : Dynamic = prim::num_to_tensor(%2)
%5 : Dynamic = aten::add[alpha={1}](%3, %4)
%c : int = prim::TensorToNum(%6) # this is the result of the addition
...
return (%13);
}
```
List of squashed commits:
* Introduce Python Number types
Added: IntType, FloatType, NumberType with
IntType <: NumberType
FloatType <: NumberType
Changed aten_schema so arguments have corresponding types
* Emit a NumberType for python literals.
Also emit a NumberType for Scalar default values.
* Add prim::NumToTensor and prim::TensorToNum
* Add DynamicType -> NumberType implicit cast for bc
* Better ensureTensor error message
* Add ensureTensorOrNumber. Allow passing Number to some functions
Like the range() construct and slices
* Patch IntList to work.
IntList is still a DynamicType in the frontend: a tensor gets built from
a List[int].
Also, IntList[1] is a "union between int and IntList" the way it is
implemented. If the frontend sees an int being passed for an IntList[1]
arg, it converts it to a tensor as well.
* Enforce some order on schemas to avoid overload ambiguity
add(Tensor, Tensor) should appear earlier than add(Tensor, Scalar). This
matches the order in which python_arg_parser parses its arguments.
* Disable std_dim and var_dim tests.
With the new schema information, std(input, keepdim) and std(input, dim)
are ambiguous. This will need to be fixed at a later date.
* Add NumberType erasure pass.
This is used for ONNX export and to ensure that NumberType information
doesn't reach the interpreter
* Add support for mixed tensor/number math ops.
* Tests for new functionality.
Includes:
- Tensor/number math
- number/number math
- EraseNumberTypes pass test
* Patch tests
Update expect tests for:
- decompose_addmm
- loop unrolling tests
Because python numbers are now NumberType, they cannot be returned by
functions anymore. Work around this by using "torch.full", or by adding
a tensor([0]) (taken from FIXME_zerol()). Both approaches are used
because torch.full is more readable, but it is broken in some cases.
* Add erase_number_types to torch/CMakeLists.txt
* Move math back to emitSimpleExpr from emitSugaredExpr
* Remove some dead lines
* Renable some excluded script/trace tests that are fixed.
* Move some tests to expected failure
* Address some comments (more addressing to come)
* Erase relevant aten::type_as nodes in EraseNumberTypes
I also changed it so that EraseNumberTypes is only called for ONNX
export. It is no longer used to prevent
prim::NumToTensor/prim::TensorToNum from reaching shape_analysis or
interpreter.cpp.
shape_analysis infers the type of the output of these nodes to be the
same as their input.
intepreter.cpp treats both of these nodes as no-ops.
* Add reminder to fix std/var
* Call EraseNumberTypes only when exporting a script module
* Update expects after rebase
* add comparison operators to jit
* try to fix CI
* address review comments
* fix type of comparison ops result
* address review comments
* fix indentation
* add comments
* require type_as to have non-dynamic tensor arg
* Typo (should check if template argument of type_as, inputs()[1], is tensor)
* Use .at() instead of []
* Use .at() again
* this removes the flag controlling whether the interpreter works on variables.
* now the interpreter _always_ works on variables
* constants in the IR are still _always_ non-variables, and an assert was added to ensure this.
* as_tensor was split into as_variable and as_tensor since it is sometimes used
to construct constants in the IR
* I tried changing the IR to also always use variables but that change was much more
cross cutting and fragile and I never got it working
When tracing we record expand nodes. This is useful in some cases because
it makes it clear a broadcast happened. However, in future runs
the broadcast may be different or not needed. This change adds an
attribute to expand to track if it was implicitly added. This
takes the form of an unused input to expand with a default value.
The execution engine then removes implicit expands before execution.
Note that shape_analysis will re-add expands when it can prove by
shape analysis that they will exist and this is useful for the fuser,
so this change should not affect fusion passes.
* Enable WERROR in tests
* Also set WERROR=1 for cpp_build in CI
* Enable Werror after the compiler checks
* Remove -DWERROR because its picked up from the env var
* Had to fix some errors in aten/contrib/data
* Allow an uninitialized variable in ReduceOpsKernel.cpp
* Use CUDNN_DATA_UINT8 in cuDNN type string conversion
* Fixes and use target_compile_options
* Fix uninitialized variables in THNN
* Include Python.h earlier in tensor_types.cpp
* Use CUDNN_VERSION 7100 instead of 7000?
* More Python.h includes
* Make switch case in common_subexpression_elimination.cpp exhaustive
* Build with WERROR=0 just to see all the warnings
* Remove some Python includes
* Enable WERROR=1 again
* Bring back switch case default
* Workaround in onnx to get transposes into init_nets
This adds a pass to ONNX so that it can speculate Transpose
operators so that ONNX's split pass can put them into an init_net
Also fixes a potential bug in onnx peephole where an optimization
across blocks might move a Value and violate scoping.
* Perform shape propagation when embedding a program into a trace.
This ensures the trace still has type information specific to that trace, which will help onnx export succeed in more cases.
This adds the ability to trace script functions while preserving their
control flow. When the trace encounters a script function it inlines
the graph of the function into the trace rather than tracing the
function itself.
* Fixes to the way script handles multiple values, and other minor fixes.
This commit improves our handling of operators that return multiple values.
Builtins are now checked so that they return the right number of values,
and support for TupleValue is extended to all things that can return
multiple values.
This resolves issues where the compiler accepted things like:
a, b = c + c
This would cause the interpreter to crash. Now each operator knows
how many results it will produce and can check it against the number
of requested inputs.
Notes:
* Allow True/False literals in constant expressions
* make handling of keyword constants more consistent to support True/False
* make parsing constants match the way we construct constants from python
* improve the error messages when accessing bad graph attributes.
* switch findTensorOp to return an optional.
* check that attribute types are correct in findTensorOp
* Check the correct number of outputs for builtins
This also changes emitExpr to return a single SugaredValue
Rather than possibly returning multiple values, emitExpr now
always returns a single value, which _might_ be a tuple. This approach
more closely follows python making the code easier to follow.
Checks for returning the right number of values are now located in
the assignment operator, and occur when unpacking the tuple.
We still pass `n_binders` to function calls so that calls into python
know how many values they should return.
Like `__slots__` the `__constants__` property changes the set/getattr behavior of a script module for the keys listed so they behave as constants.
This enables script methods to use them in way that are otherwise not allowed.
* Python numbers/bools can be inlined as constants in script code.
* List of numbers can be iterated over using for loops
* nn.ModuleLists can be used in for loops as well, unrolling their content.
* Namespaced symbols
- Our interned strings now have structure, "ns::symname" rather than just
"symname" before. We support efficient namespace testing for uniques
by encoding the namespace in one byte in the Symbol internal representation.
See torch/csrc/jit/interned_strings.h for a more in-depth implementation
discussion.
- All uses of ksymbol are now attr::symbol (or some appropriate namespace).
The valid namespaces are prim, attr, onnx and aten.
- Symbol is bound in Python as a qualified string "attr::symbol", EXCEPT for the
attribute setting/getting API, whose symbols must always be attr
symbols; they get special cased to assume strings are passed.
There's a little bit of naughtiness in the implementation, maybe you know
how to solve it.
- However, the g.op() convenience function assumes that you're generating
ONNX operators, unless you explicitly qualify.
- All ATen operators and nodes have built-in interned strings generated
for them, so you should never have to write a string literal ever again.
The tracing code is adjusted to use it.
- ONNX exporter now properly tests to see that all operators are in
onnx namespace before accepting the export. This is way more
robust than the previous exporter, which would be willing to
export capitalized operators which were not actually ONNX operators.
- A slight organizational change for symbolic.py; this module now ONLY
contains aten operators. In particular, the exporter for Constant
has moved into utils.py (along with Undefined, from the C++ side),
since primitive ops get "special treatment."
- The un-inplacing logic in recording is more robust, so that we don't
delete a trailing underscore from __and__. This never affected us
before because we didn't have any tests for it.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
This PR adds the possibility to build the C++ parts of autograd and jit, with no dependency on Python.
The goal is to allow taking a PyTorch IR representation (a tree s-expr) and running it with provided inputs.
Prerequisite: build PyTorch so that codegen runs once.
Instructions:
cd tools/cpp_build
bash build_all.sh
This will build libtorchjit and torchjit_test in tools/cpp_build/build/torchjit-build. The latter basically runs the code in test_jit.cpp for now.
While writing the PR, it turned out that a few of Python.h includes were redundant. They were removed here (PyTorch tests still pass on my machine, we'll see CI).
* Introduce Python-free builds of autograd and jit
* Remove NO_PYTHON ifdef in functions/special
* Add Python function calls to script
* Script compiler gains a `Resolver` object that runs when it does not understand a function call. This decouples the python resolution from the conversion to IR.
* Use stacks in the interpreter/aten_dispatch
Rather than have separate input/output lists,
the interpreter now works using a single stack.
Operators in the interpreter push/pop from the stack.
This allows ownership of tensors to transfer directly to an operator,
and an operator can drop the reference to a tensors as soon as it is
no longer needed. This is important for the GraphExecutor op,
which recursively runs the interpreter.
Once autograd is updated to pass variables to Function by value,
we will be able to ensure that we release ownership as soon as possible.
This commit also switches the interpreter to use a fake
tensor 'ContainerTensor' rather than at::Retainable to hold non-tensor
data in the interpreter. This allows us to use std::vector<at::Tensor>
for all registers, which is significantly less confusing than the
OwnedRetainables struct it was replacing.
* Add If and Loop to interpreter
* Preprocess loop to calculate where references to tensor should be dropped
* Add control instructions JumpZ/JumpNZ/Jump
* Switch from explicitly having stage structs to having a single list
of instructions with Store/Load instructions to take values off the
initial stack
* Make the interpreter tests executable rather than use expect files
* add a flag to interpreter code so that constants are variables
if the interpreter is running on variables.
* Add tensor_as to its own file
* Improve Function interface
* Undo tracer changes
* Fix bug in VariableType.set_history
* Rename function_counter and sequence_number to sequence_nr
* Clarify Function documentation
* Replace swap_next_edges with next_edges() getter
* Bring back set_gradient_edge
* Simplify special.cpp
* add_gradient_edge -> create_gradient_edge
* Add mutable getters for pre/post hooks
* Use make_variable with Edge
* Remove remove_gradient_edge in favor of detach_
* Fix documentation and remove create_gradient_edge friend method
* Canonicalize some includes
* Improve Variable interface
* Address comments from @apaszke and @colesbury
* string ::operator= is not noexcept
* Remove ir.h from tracer_state.h to improve build times
* Make Variable a struct and pack SavedVariable fields
* Implement as_variable_ref
* grad_fn_ptr() -> grad_fn_unsafe()
* Reduce hackiness of set_type hack
* Include variable.h and edge.h in tracer_state.h because it uses them
* class Variable -> struct Variable because Windows cant even
* Make Variable::output_nr uint32_t instead of int
* Add comment about tracing state
* Replaced more static_cast<Variable&> and improve docs
* Remove SavedVariable destructor and construct members in init list
* Clarify docs for Variable
* Variable::set_version -> set_version_counter