Summary:
Currently, when the argument to isinf and isfinite is not a tensor, a ValueError is raised. This, however, should be a TypeError, because the error is a type mismatch.
In the error message, "str(tensor)" is replaced by "repr(tensor)" because, when an error occurs, a printable representation of the object is likely more useful than the "informal" string version of the object.
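A minimal sketch of the new behavior (illustrative, not taken from the patch itself):
```python
import torch

# Passing a non-tensor argument now raises TypeError instead of ValueError.
try:
    torch.isfinite([1.0, 2.0])  # a list, not a torch.Tensor
except TypeError as err:
    print(err)
```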
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20817
Differential Revision: D15495624
Pulled By: ezyang
fbshipit-source-id: 514198dcd723a7031818e50a87e187b22d51af73
Summary:
Attention mask should be of shape `(L, S)` since it is added to `attn_output_weights`.
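A minimal sketch of the shape convention (illustrative; not part of the patch):
```python
import torch
import torch.nn as nn

L, S, N, E = 4, 6, 2, 8            # target len, source len, batch size, embed dim
mha = nn.MultiheadAttention(embed_dim=E, num_heads=2)
query = torch.randn(L, N, E)
key = torch.randn(S, N, E)
value = torch.randn(S, N, E)
attn_mask = torch.zeros(L, S)      # additive mask, added to attn_output_weights
out, attn_weights = mha(query, key, value, attn_mask=attn_mask)
print(out.shape)                   # torch.Size([4, 2, 8])
print(attn_weights.shape)          # torch.Size([2, 4, 6])
```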
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20850
Differential Revision: D15495587
Pulled By: ezyang
fbshipit-source-id: 61d6801da5291df960daab273e874df28aedbf6e
Summary:
Resubmit #20698 which got messed up.
The idea is that when PyTorch is used in a custom build environment (e.g. Facebook), it's useful to track usage of various APIs centrally. This PR introduces a simple, very lightweight mechanism to do so: only the first invocation of a trigger point is logged. This is significantly more lightweight than #18235, so we can afford to put logging in e.g. TensorImpl.
It also adds an initial list of trigger points. Trigger points are added in such a way that no static initialization triggers them, i.e. just linking with libtorch.so will not cause any logging. Further suggestions of what to log are welcome.
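The actual mechanism lives in C++; the Python sketch below only illustrates the log-once idea, and the names in it are hypothetical:
```python
import threading

_logged = set()
_lock = threading.Lock()

def log_api_usage_once(event: str) -> None:
    """Log the given trigger point only on its first invocation."""
    with _lock:
        if event in _logged:
            return
        _logged.add(event)
    # Hypothetical hook; a custom build environment would route this to its
    # central usage-tracking service.
    print("API usage:", event)

log_api_usage_once("tensor.create")  # logged
log_api_usage_once("tensor.create")  # ignored, already logged once
```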
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20745
Differential Revision: D15429196
Pulled By: dzhulgakov
fbshipit-source-id: a5e41a709a65b7ebccc6b95f93854e583cf20aca
Summary:
As part of the Variable/Tensor merge work: https://github.com/pytorch/pytorch/issues/13638, we make the following changes in this PR:
1. Remove the `Variable::Impl` class and the `DifferentiableViewImpl` class
2. Change all `Variable.data()` call sites to either use `Variable` directly, or use `Variable.tensor_data()`
3. Remove the `Variable.data()` API
4. Add `Variable.variable_data()` that matches `tensor.data` in the Python API, which creates a new `Variable` that shares the same storage and tensor metadata with the original `Variable`, but with a completely new autograd history.
After this PR, Variable doesn't wrap a Tensor internally anymore, and both Variable and Tensor use the same TensorImpl class as their `impl_`. The only difference is that Variable always has AutogradMeta in its TensorImpl, but Tensor doesn't.
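For reference, a sketch of the Python-level `tensor.data` semantics that `Variable.variable_data()` mirrors (shared storage, fresh autograd history):
```python
import torch

x = torch.ones(3, requires_grad=True)
y = x.data               # shares storage and metadata with x
y[0] = 5.0               # the change is visible through x
print(x)                 # tensor([5., 1., 1.], requires_grad=True)
print(y.requires_grad)   # False: y carries no autograd history
```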
**Note that this PR is BC-breaking in the following use cases:**
**Use Case 1:**
Previously, `x.data = y` works even if `x` and `y` are of different TensorImpl type (e.g. `x` is a CPU dense tensor whose impl is of type TensorImpl, while `y` is a CPU sparse tensor whose impl is of type SparseTensorImpl). However, after this PR, `x.data = y` doesn't work anymore if `x` and `y` are of different TensorImpl type, because the underlying implementation `variable.set_data(tensor)` no longer works if `variable` and `tensor` have different TensorImpl type.
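A sketch of this case (assuming a CPU build; the exact error message may differ):
```python
import torch

x = torch.zeros(2, 2)                                     # dense, TensorImpl
y = torch.sparse_coo_tensor(torch.tensor([[0], [0]]),
                            torch.tensor([1.0]), (2, 2))  # SparseTensorImpl
x.data = y  # worked before this PR; raises an error afterwards
```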
**Use Case 2:**
If a tensor `x`'s `grad` is sparse, accumulating dense gradients to `x` will change the tensor that `x.grad` is pointing to. This is better illustrated with the following example:
```python
params = torch.tensor([1.5, 1.5]).requires_grad_()
with torch.no_grad():
    # Change gradient to a sparse tensor
    params.grad = torch.sparse_coo_tensor(torch.tensor([[1, 1]]).long(), torch.tensor([1., 1.]))
grad_saved = params.grad
params.backward(torch.tensor([1.5, 1.5]))
assert id(grad_saved) == id(params.grad) # This will fail after this PR
```
The assertion in the last line will fail after this PR, because adding dense gradients to sparse gradients will change the `params.grad` tensor reference.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17072
Differential Revision: D14075257
Pulled By: yf225
fbshipit-source-id: 0e681df641270dea586042dd26db59f2e76b5957
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20669
Before, Dict was a value type, i.e. copying it did a deep copy.
Unfortunately, this doesn't work well with storing and passing Dicts around in IValues because IValues are reference types.
This diff changes Dict to be a reference type.
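An illustrative contrast using plain Python dicts as a stand-in (this is not the c10::Dict API, just the value-vs-reference distinction):
```python
import copy

d = {"a": 1}
value_copy = copy.deepcopy(d)   # old behavior: a copy owns independent data
reference = d                   # new behavior: both names see the same data
reference["a"] = 2
print(d["a"])            # 2  (shared)
print(value_copy["a"])   # 1  (independent)
```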
Reviewed By: dzhulgakov
Differential Revision: D15404911
fbshipit-source-id: dc990d3eb7cae044b74dd0253f8b704dde6a6c86
Summary:
When adding custom scalars like this
```python
from torch.utils.tensorboard import SummaryWriter
with SummaryWriter() as writer:
    writer.add_custom_scalars({'Stuff': {
        'Losses': ['MultiLine', ['loss/(one|two)']],
        'Metrics': ['MultiLine', ['metric/(three|four)']],
    }})
```
This error is raised:
```
TypeError: Parameter to MergeFrom() must be instance of same class: expected tensorboard.SummaryMetadata.PluginData got list.
```
Removing the square brackets around `SummaryMetadata.PluginData(plugin_name='custom_scalars')` should be enough to fix it.
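A hedged sketch of the fix (the exact surrounding code in torch/utils/tensorboard/summary.py may differ):
```python
from tensorboard.compat.proto.summary_pb2 import SummaryMetadata

# Before: wrapping PluginData in a list makes protobuf's MergeFrom() fail,
# because plugin_data expects a PluginData message, not a list.
# smd = SummaryMetadata(
#     plugin_data=[SummaryMetadata.PluginData(plugin_name='custom_scalars')])

# After: pass the message directly.
smd = SummaryMetadata(
    plugin_data=SummaryMetadata.PluginData(plugin_name='custom_scalars'))
```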
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20580
Differential Revision: D15469700
Pulled By: orionr
fbshipit-source-id: 7ce58034bc2a74ab149fee6419319db68d8abafe
Summary:
Fixes #20781, #20757.
Hmm, I don't know an easy way to add a test to make sure it runs against a package installed as .egg, but I tested it locally with torchvision.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20782
Differential Revision: D15443600
Pulled By: ailzhang
fbshipit-source-id: 285eb0d9a44d6edb8e93618fa293f4feb431d2ae
Summary:
The duplicated `_optimize_trace` code in _pytorch_graph.py is used to bypass some optimization steps that cause missing scopes.
It seems that most of the problematic steps have been fixed recently. Standard models implemented in torchvision were visually inspected before the commit. However, the `+=` in 50d54a82d1/torchvision/models/resnet.py (L63) will let f4d9bfaa4d/torch/onnx/utils.py (L159) produce a bad result. It can be fixed by replacing it with `out = out + identity`. This also implies that `+=` has non-intuitive behavior.
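A simplified sketch of that workaround, in the style of torchvision's BasicBlock.forward (not the actual source):
```python
def forward(self, x):
    identity = x
    out = self.conv1(x)
    out = self.bn1(out)
    out = self.relu(out)
    out = self.conv2(out)
    out = self.bn2(out)
    # out += identity        # in-place add loses scope info in the trace
    out = out + identity     # out-of-place add keeps scopes intact
    out = self.relu(out)
    return out
```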
cc orionr ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20394
Reviewed By: NarineK
Differential Revision: D15452204
Pulled By: orionr
fbshipit-source-id: eaa4c13f16551c78dc6419f1e22eb2c560af4cc5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20816
Previously, the c10 dispatcher expected ops to be called with Variables and unwrapped them to Tensors before calling into the kernel.
The kernel was expected to return Tensors that were re-wrapped into Variables before passing them on into the system.
However, that doesn't work with kernels that call other operators. One recent example was a kernel that returned the result of `torch::ones()` as output.
Now, with this diff, the c10 dispatcher still passes Tensors to the kernel and Variables back into the system, but it accepts ops being called with either Tensors or Variables, and kernels are also allowed to return either.
After https://github.com/pytorch/pytorch/pull/17072 , we should be able to get rid of the whole wrapping/unwrapping logic.
Reviewed By: hl475
Differential Revision: D15453963
fbshipit-source-id: 7602b7f2bc43e8ceb8a8c0e97aafcc53d4c47b6c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20787
Set requires_grad=False for bias: otherwise it will block JIT tracing.
The as_type fix: the input tensor shape and the output tensor shape will be different, which triggers the assertion failure at https://fburl.com/0m8xy7tc.
Reviewed By: jamesr66a
Differential Revision: D15445092
fbshipit-source-id: 22da41a56ecb9ac092585d0cc1ff0658fb9d631b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20045
This pass adds quant-dequant nodes for bias. It requires the quant-dequant pass for activations and weights to have run first, as those results are required to compute the qparams for bias.
Differential Revision: D15179141
fbshipit-source-id: 3aab9fceefcadc3fa42a4e802d9b1e18addad78a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20770
Add the dict type since it's part of the PyTorch built-in type system, and sparse features and text features will be converted to Dict.
Reviewed By: pritamdamania87
Differential Revision: D15436255
fbshipit-source-id: 239adbd6a8f68be29020fe656d790f6872f1f0e9
Summary:
The previous implementation of magic methods extended from BuiltinOperators, but it should be able to work with other sugared values, such as casts.
I also considered making CastValues and BuiltinOperators extend from a MagicMethod superclass, and having them try to call into the superclass before their own call. However, not all builtin operators have corresponding magic methods, so I did it this way instead (although there are workarounds for that).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20654
Differential Revision: D15434469
Pulled By: eellison
fbshipit-source-id: 813fa00bf8b5b9ada46505075ebf984d8eee6aef
Summary:
This PR implements auto-conversion of GPU arrays that support the `__cuda_array_interface__` protocol (fixes #15601).
If an object exposes the `__cuda_array_interface__` attribute, `torch.as_tensor()` and `torch.tensor()` will use the exposed device memory.
#### Zero-copy
When using `torch.as_tensor(...,device=D)` where `D` is the same device as the one used in `__cuda_array_interface__`.
#### Implicit copy
When using `torch.as_tensor(...,device=D)` where `D` is the CPU or another non-CUDA device.
#### Explicit copy
When using `torch.tensor()`.
#### Exception
When using `torch.as_tensor(...,device=D)` where `D` is a CUDA device not used in `__cuda_array_interface__`.
#### Lifetime
The tensor created by `torch.as_tensor(obj)` grabs a reference to `obj` so that the lifetime of `obj` exceeds the lifetime of the tensor.
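A minimal sketch, assuming a CUDA device and Numba installed (any object exposing `__cuda_array_interface__` behaves the same way):
```python
import numpy as np
import torch
from numba import cuda

gpu_array = cuda.to_device(np.asarray([1.0, 2.0, 3.0], dtype=np.float32))

t0 = torch.as_tensor(gpu_array, device='cuda')  # zero-copy: same CUDA device
t1 = torch.as_tensor(gpu_array, device='cpu')   # implicit copy to the CPU
t2 = torch.tensor(gpu_array, device='cuda')     # explicit copy, always
```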
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20584
Differential Revision: D15435610
Pulled By: ezyang
fbshipit-source-id: c423776ba2f2c073b902e0a0ce272d54e9005286
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20468
A ScalarType node is mandatory for activations and parameters now.
This change inserts a ScalarType node for all the quant-dequant nodes. For the activations, the default value is currently at::ScalarType::Undefined; remove this and explicitly pass the at::ScalarType::QUint8 dtype.
Differential Revision: D15331600
fbshipit-source-id: 5b51e0b42e694bf409026af4783a12da6d7e234b
Summary:
Symbols are given hidden visibility by default on Linux to emulate the behavior on Windows. This helps developers catch visibility issues in their streamlined Linux dev environment before being surprised, late in the process, by Windows errors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20461
Reviewed By: kostmo
Differential Revision: D15410410
Pulled By: dzhulgakov
fbshipit-source-id: 1d684b5a9a80b692966a775c3f1c56b7c72ffc95
Summary:
Fixes #20017
This wraps the `torch._C.Function` currently returned from `torch.jit.script` and `torch.jit.trace` in a `ScriptFunction` and `TracedFunction` respectively, both of which are just wrappers to hold the function.
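An illustrative check of the wrapping (class names per the summary above):
```python
import torch

def add(a, b):
    return a + b

scripted = torch.jit.script(add)
traced = torch.jit.trace(add, (torch.ones(2), torch.ones(2)))

# Both are now thin Python wrappers around torch._C.Function
# (ScriptFunction and TracedFunction, respectively).
print(type(scripted).__name__, type(traced).__name__)
```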
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20386
Pulled By: driazati
Differential Revision: D15403161
fbshipit-source-id: 94fb9f32929e62a00be6cf7512ea144ec9b91e0b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20677
With the new changes in the IR, it is possible to insert nodes after param nodes in the graph. Thus we do not need two separate methods for inserting q-dq nodes at the inputs and outputs of quantizable nodes.
Differential Revision: D15406354
fbshipit-source-id: 1963762f434fd82877fa76a272e8520c342b6069
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20040
Add end-to-end support for the feature store example in the FBLearner PyTorch predictor.
Reviewed By: dzhulgakov
Differential Revision: D15177897
fbshipit-source-id: 0f6df8b064eb9844fc9ddae61e978d6574c22916
Summary:
load_state_dict includes a recursive inner function `load` that captures Tensors through the closed-over variable `state_dict`. Because it's recursive, it also captures itself, leading to a reference cycle.
This breaks the reference cycle so that any Tensors in state_dict can be
collected immediately instead of waiting until the next GC cycle.
Alternatively, we could have passed `state_dict` and `metadata` as arguments to `load` to prevent the capture of Tensors. (That would still result in cyclic garbage, but not cyclic garbage that includes Tensors.)
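A simplified sketch of the pattern and the cycle-breaking fix (not the actual torch.nn.Module source; the helper call is hypothetical):
```python
def load_state_dict(module, state_dict):
    def load(mod, prefix=''):
        # `load` closes over state_dict (which holds Tensors) and, being
        # recursive, over itself as well.
        mod._load_from_state_dict(state_dict, prefix)  # hypothetical signature
        for name, child in mod._modules.items():
            if child is not None:
                load(child, prefix + name + '.')

    load(module)
    load = None  # break the load -> load reference cycle so captured Tensors
                 # can be freed immediately instead of waiting for GC
```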
See:
https://github.com/pytorch/pytorch/issues/20199#issuecomment-491089004
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20397
Differential Revision: D15414834
Pulled By: colesbury
fbshipit-source-id: 4c2275a08b2d8043deb3779db28be03bda15872d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20607
Add a new method, SummaryWriter.flush(), that iterates through all of the FileWriters and flushes them.
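Usage is straightforward:
```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()
writer.add_scalar('loss', 0.25, global_step=0)
writer.flush()   # pushes pending events from every FileWriter to disk
writer.close()
```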
Reviewed By: orionr
Differential Revision: D15380124
fbshipit-source-id: 1975f3f61c5ae3754552bfdb23f2cd78f687d19f