Commit Graph

131 Commits

8f1c3c68d3 [BE] Use nested namespaces in .cpp/.cu files (#92100)
Since we now live in a C++17 world.

This is a functional no-op, just:
- `s/namespace at { namespace native {/namespace at::native {/`
- `s/namespace torch { namespace jit {/namespace torch::jit {/`
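
For illustration, a hedged before/after of the substitution (the function name is made up):

```cpp
// Before (pre-C++17 nesting):
namespace at { namespace native {
void example_kernel();
}} // namespace at::native

// After (C++17 nested namespace definition):
namespace at::native {
void example_kernel();
} // namespace at::native
```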

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92100
Approved by: https://github.com/izaitsevfb
2023-01-13 16:32:34 +00:00
3ac6106523 Add out of bounds checks inside irparser.cpp and unpickler.cpp (#91401)
Hi!

I've been fuzzing different PyTorch modules and found a few crashes.

Inside unpickler.cpp and irparser.cpp there are a few places where `.at()` and `.pop_back()` are called without first checking the target container's size. The missing checks lead to an out-of-bounds element access in the case of `.at()`, and to undefined behavior when `.pop_back()`/`.pop()` is called on an empty `stack_`.

Crash-files:

1. Crash location: `unpickler.cpp:439` (Call to `.at(idx)` with idx that exceeds `memo_table_` size).
    - Reproduce the crash: `/message_deserialize_fuzz /homedir/crash-5695ad5b2921127775d4137ee02e23834a0bedc4`
    - Crash file: [crash-5695ad5b2921127775d4137ee02e23834a0bedc4.zip](https://github.com/pytorch/pytorch/files/10308463/crash-5695ad5b2921127775d4137ee02e23834a0bedc4.zip)
    - ASAN report: [asan-report-crash-5695ad5b2921127775d4137ee02e23834a0bedc4.log](https://github.com/pytorch/pytorch/files/10308612/asan-report-crash-5695ad5b2921127775d4137ee02e23834a0bedc4.log)

2. Crash location: `irparser.cpp:504` (Call to `.at(idx)` with idx that exceeds `schema->returns()` size).
    - Reproduce the crash: `/irparser_fuzz /homedir/crash-779ecab3d637c8c87de21e23dddb9def82a26792`
    - Crash file: [crash-779ecab3d637c8c87de21e23dddb9def82a26792.zip](https://github.com/pytorch/pytorch/files/10308475/crash-779ecab3d637c8c87de21e23dddb9def82a26792.zip)
    - ASAN report: [asan-report-crash-779ecab3d637c8c87de21e23dddb9def82a26792.log](https://github.com/pytorch/pytorch/files/10308611/asan-report-crash-779ecab3d637c8c87de21e23dddb9def82a26792.log)

3. Crash location: `unpickler.cpp:451` (Call to `.pop_back()` with empty `stack_`).
    - Reproduce the crash: `/message_deserialize_fuzz /homedir/crash-735acc19c9f39b9bbb5667878af995c9167da37f`
    - Crash file: [crash-735acc19c9f39b9bbb5667878af995c9167da37f.zip](https://github.com/pytorch/pytorch/files/10308565/crash-735acc19c9f39b9bbb5667878af995c9167da37f.zip)
    - ASAN report: [asan-report-crash-735acc19c9f39b9bbb5667878af995c9167da37f.log](https://github.com/pytorch/pytorch/files/10308558/asan-report-crash-735acc19c9f39b9bbb5667878af995c9167da37f.log)

4. Crash location: `unpickler.cpp:469` (Call to `.pop()` with empty `stack_`).
    - Reproduce the crash: `/message_deserialize_fuzz /homedir/crash-b552f1a2bbba5eab0f6aeba58475175b18e5b1b9`
    - Crash file: [crash-b552f1a2bbba5eab0f6aeba58475175b18e5b1b9.zip](https://github.com/pytorch/pytorch/files/10308568/crash-b552f1a2bbba5eab0f6aeba58475175b18e5b1b9.zip)
    - ASAN report: [asan-report-crash-b552f1a2bbba5eab0f6aeba58475175b18e5b1b9.log](https://github.com/pytorch/pytorch/files/10308555/asan-report-crash-b552f1a2bbba5eab0f6aeba58475175b18e5b1b9.log)

The provided patch adds missing size checks.
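
For illustration, a hedged sketch of the kind of guard such a patch adds (names are made up, not the exact diff):

```cpp
#include <c10/util/Exception.h>
#include <vector>

// Validate the container size before calling .at()/.pop_back(), so
// malformed input fails with an error instead of undefined behavior.
void pop_two(std::vector<int>& stack) {
  TORCH_CHECK(
      stack.size() >= 2,
      "Parsing error: expected at least 2 elements on the stack");
  stack.pop_back();
  stack.pop_back();
}
```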

### How to reproduce

1. To reproduce the crashes, use the provided Docker setup: [Dockerfile](https://github.com/ispras/oss-sydr-fuzz/blob/master/projects/pytorch/Dockerfile)

2. Build the container: `docker build -t oss-sydr-fuzz-pytorch-reproduce .`

3. Copy the crash file to the current directory

4. Run the container: ``docker run --privileged --network host -v `pwd`:/homedir --rm -it oss-sydr-fuzz-pytorch-reproduce /bin/bash``

5. Execute the fuzz targets with the given arguments

After execution completes you will see the ASAN reports.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91401
Approved by: https://github.com/davidberard98
2022-12-29 19:58:29 +00:00
3916d7a575 Apply modernize-use-emplace to aten, c10, torch (#91077)
Apply the clang-tidy check modernize-use-emplace. Constructing elements in place is slightly more efficient and is the recommended style in the parts of the codebase covered by clang-tidy; this change manually applies the check to the rest of the codebase. Pinging @ezyang as this is related to my other PRs he reviewed, like #89000
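
For illustration, a minimal standalone before/after (not taken from the diff):

```cpp
#include <string>
#include <vector>

void demo() {
  std::vector<std::string> names;
  // Before: constructs a temporary string, then moves it into the vector.
  names.push_back(std::string("alpha"));
  // After (modernize-use-emplace): constructs the element in place.
  names.emplace_back("alpha");
}
```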

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91077
Approved by: https://github.com/ezyang
2022-12-19 07:49:56 +00:00
f74946324e [fix] allow saving python attr on Tensor and Parameter via torch.save (#81616)
Fixes: https://github.com/pytorch/pytorch/issues/72129

TODO:
* [x] Fix for Parameter

Benchmark
(Measurable diff for small tensors)
```
[-------------- Save and Load --------------]
                    |  After PR  |  Before PR
1 threads: ----------------------------------
      ()            |    111.7   |     106.9
      (4, 4)        |    114.4   |     109.2
      (128, 128)    |    135.2   |     128.3
      (1024, 1024)  |   1431.9   |    1431.3

Times are in microseconds (us).
```

<details>

<summary> Benchmark Script </summary>

```python
import torch
from torch.testing._internal.common_utils import BytesIOContext
from torch.utils import benchmark
import pickle

shapes = ((), (4, 4), (128, 128), (1024, 1024))

sizes = [1, 64, 1024, 10000]
results = []

def save_load_fn(t):
    with BytesIOContext() as f:
        torch.save(t, f)
        f.seek(0)
        torch.load(f)

for shape in shapes:
    t = torch.randn(shape)
    label = 'Save and Load'
    sub_label = f'{shape}'
    results.append(benchmark.Timer(
        stmt='save_load_fn(t)',
        globals={'t': t, 'save_load_fn':save_load_fn},
        label=label,
        sub_label=sub_label,
        description='Before PR',
    ).blocked_autorange(min_run_time=2))

compare = benchmark.Compare(results)
compare.print()

with open('before_pr.pkl', 'wb') as f:
    pickle.dump(results, f)

# with open('after_pr.pkl', 'rb') as f:
#     after_pr = pickle.load(f)

# with open('before_pr.pkl', 'rb') as f:
#     before_pr = pickle.load(f)

# compare = benchmark.Compare(after_pr + before_pr)
# compare.print()
```

</details>

NOTE: **BC-Breaking**: After this PR, all tensors (including regular tensors) will be serialized using `_rebuild_from_type_v2`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81616
Approved by: https://github.com/albanD, https://github.com/kurtamohler
2022-11-11 21:11:12 +00:00
eb9b156019 [fix] MathBits: serialization (#88182)
Fixes #81690

TODO:

* [x] C++ Unpickler Fix (locally tested pickled in Python and unpickled in C++)
* [x] C++ Pickler Fix (locally tested pickled in C++ and unpickled in Python)
* [x] Do quant_tensor, sparse_tensor, etc. require similar changes? (Sparse and Quant don't need this)
* [x] Add Comments
* [x] How to make sure C++ and Python are in sync? (Functions in `pickler.h` help in getting and setting Tensor Metadata (math-bits for now) on a tensor. They are the only place which should handle this.)

Notes:
Quantized tensors don't support complex dtypes, and for float they segfault with `_neg_view`: https://github.com/pytorch/pytorch/issues/88484

Sparse Tensor:
```python
>>> a = torch.tensor([[0, 2.], [3j, 0]]).to_sparse()
>>> a.conj().is_conj()
False
>>> a._neg_view()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NotImplementedError: Cannot access storage of SparseTensorImpl
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88182
Approved by: https://github.com/ezyang, https://github.com/anjali411
2022-11-09 17:15:12 +00:00
78a0ca29d9 Revert "[fix] allow saving python attr on Tensor and Parameter via torch.save (#81616)"
This reverts commit 54b6188cc6dee45b775d688223b847dc8ea85bff.

Reverted https://github.com/pytorch/pytorch/pull/81616 on behalf of https://github.com/mehtanirav due to Internal publishing is broken
2022-11-07 18:51:16 +00:00
54b6188cc6 [fix] allow saving python attr on Tensor and Parameter via torch.save (#81616)
Fixes: https://github.com/pytorch/pytorch/issues/72129

TODO:
* [x] Fix for Parameter

Benchmark
(Measurable diff for small tensors)
```
[-------------- Save and Load --------------]
                    |  After PR  |  Before PR
1 threads: ----------------------------------
      ()            |    111.7   |     106.9
      (4, 4)        |    114.4   |     109.2
      (128, 128)    |    135.2   |     128.3
      (1024, 1024)  |   1431.9   |    1431.3

Times are in microseconds (us).
```

<details>

<summary> Benchmark Script </summary>

```python
import torch
from torch.testing._internal.common_utils import BytesIOContext
from torch.utils import benchmark
import pickle

shapes = ((), (4, 4), (128, 128), (1024, 1024))

sizes = [1, 64, 1024, 10000]
results = []

def save_load_fn(t):
    with BytesIOContext() as f:
        torch.save(t, f)
        f.seek(0)
        torch.load(f)

for shape in shapes:
    t = torch.randn(shape)
    label = 'Save and Load'
    sub_label = f'{shape}'
    results.append(benchmark.Timer(
        stmt='save_load_fn(t)',
        globals={'t': t, 'save_load_fn':save_load_fn},
        label=label,
        sub_label=sub_label,
        description='Before PR',
    ).blocked_autorange(min_run_time=2))

compare = benchmark.Compare(results)
compare.print()

with open('before_pr.pkl', 'wb') as f:
    pickle.dump(results, f)

# with open('after_pr.pkl', 'rb') as f:
#     after_pr = pickle.load(f)

# with open('before_pr.pkl', 'rb') as f:
#     before_pr = pickle.load(f)

# compare = benchmark.Compare(after_pr + before_pr)
# compare.print()
```

</details>

NOTE: **BC-Breaking**: After this PR, all tensors (including regular tensors) will be serialized using `_rebuild_from_type_v2`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81616
Approved by: https://github.com/albanD, https://github.com/kurtamohler
2022-11-03 09:57:47 +00:00
c141f28b64 Fix compilation warning and spurious print (#87297)
Fixes a compilation warning, makes this warning an error, and removes a spurious print.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87297
Approved by: https://github.com/malfet
2022-10-19 20:56:37 +00:00
61b4e8a7bf More SymFloat support (#85411)
- Support storing SymFloat in IValue
- Add SymFloat to JIT type system (erases to float)
- Printing support for SymFloat
- add/sub/mul/truediv operator support for SymFloat (see the sketch after this list)
- Support truediv on integers; it returns a SymFloat
- Support parsing SymFloat from Python object
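
A minimal sketch of the plain (non-symbolic) arithmetic from the list above, assuming only the public `c10::SymFloat` header:

```cpp
#include <c10/core/SymFloat.h>
#include <iostream>

int main() {
  // Plain doubles wrapped as SymFloat; no symbolic node involved.
  c10::SymFloat a(3.0);
  c10::SymFloat b(2.0);
  c10::SymFloat c = a * b + a / b;  // operator support from this PR
  std::cout << c << "\n";          // printing support from this PR
  return 0;
}
```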

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85411
Approved by: https://github.com/albanD
2022-09-22 08:07:22 +00:00
d438e86719 Add assertions to fix torch::jit::load bugs (#79192)
Fixes #77561, #77563, #77573 and #77575

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79192
Approved by: https://github.com/Gamrix
2022-08-11 18:03:00 +00:00
6592259ea5 [HPU] Enable torch.jit.load for HPU (#81759)
Per the torch.jit.load documentation, all previously saved modules,
irrespective of their device, are first loaded onto the CPU and then
moved to the devices they were saved from. So far, the supported
devices included only CPU and CUDA. To enable torch.jit.load for
HPU, an additional check for HPU is introduced.
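
For illustration, a hedged C++ sketch (`model.pt` is a placeholder path):

```cpp
#include <torch/script.h>

int main() {
  // Modules are first deserialized onto CPU; passing an explicit device
  // pins the load there, while omitting it moves each tensor back to
  // the device it was saved from (now including HPU).
  torch::jit::Module module = torch::jit::load("model.pt", torch::kCPU);
  return 0;
}
```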

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81759
Approved by: https://github.com/eellison
2022-08-01 09:28:44 +00:00
0b9eb93fe9 Make type_resolver_ null error have more useful info (#81466)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81466
Approved by: https://github.com/yinghai
2022-07-15 05:58:37 +00:00
94eba341f8 Revert RPC Meta device support
This reverts commit 058be5f16293357b4bd2bc087f1f54cd8c17f468 and 2e2200d76c611eed8d0aed2ff93e0adc344407d2.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77875

Approved by: https://github.com/mrshenli
2022-05-19 23:47:47 +00:00
0a14a4c280 Register prims as operators.
This makes prims look as if they were defined in native_functions.yaml
but they're still all written in Python.  You now need to give a full
schema string for your prims.  The returned prim object is now a
torch.ops.prim overload (prims are not allowed to be overloaded,
so we return the overload, not the overload packet, for speed).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77117

Approved by: https://github.com/mruberry, https://github.com/albanD
2022-05-11 16:38:14 +00:00
2e2200d76c RPC Meta device support
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76882

Approved by: https://github.com/jamesr66a, https://github.com/mrshenli
2022-05-10 01:26:59 +00:00
5177f95d21 Introducing SymInt to Pytorch (for tracing size arithmetic) (master rebase) (#74861)
Summary:
This PR introduces the `SymInt` type to PyTorch, which will be used by LTC and AOTAutograd for tracing size arithmetic, along with tests.
`SymInt` is a C++ union structure [int64_t, SymbolicIntNode*]: it wraps an int64_t field whose value is either a real int or an index into a list of `shared_ptr<SymbolicIntNode>`.
This PR doesn't add any support for actually tracing symbolic ints, i.e. `data_` for now can only contain real ints.
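
For illustration only, a toy sketch of the int-or-index idea (this is not the real `SymInt` class):

```cpp
#include <cstdint>

// A value is either a plain int64_t or an index into a side table of
// shared_ptr<SymbolicIntNode>, distinguished here by a high tag bit.
class ToySymInt {
 public:
  explicit ToySymInt(int64_t v) : data_(v) {}
  static ToySymInt fromNodeIndex(int64_t idx) { return ToySymInt(idx | kTag); }

  bool is_symbolic() const { return (data_ & kTag) != 0; }
  int64_t as_int() const { return data_; }              // valid when !is_symbolic()
  int64_t node_index() const { return data_ & ~kTag; }  // valid when is_symbolic()

 private:
  static constexpr int64_t kTag = int64_t(1) << 62;
  int64_t data_;
};
```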

```
Goal 1: just to show we can add a type to PyTorch core. (wraps int) LANDEABLE
Finalize the naming - symint
Want the name to be short
Does invoke “size” - NO
SInt/SymInt/SymbolicInt
SInt could mean signed int
sym_int or symint or SymInt (originally it was “int”; capitalized implies object semantics, whereas lowercase implies value semantics)
JIT schema - symint
C++ - symint
```

See more details here: https://docs.google.com/document/d/1iiLNwR5ohAsw_ymfnOpDsyF6L9RTUaHMpD8d843f63f2aYLw-jxEw

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74861

Reviewed By: qihqi, ngimel

Differential Revision: D35226230

Pulled By: Krovatkin

fbshipit-source-id: 34acf342bd50fcaa4d8d5dd49c2fd6a98823a5b3
(cherry picked from commit 218643f63ef181cabb92d13a6e837eb64f2dda3c)
2022-03-31 21:59:59 +00:00
99db53eaa7 Jit save/load meta tensors (#73435)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73435

Add support for torch.jit.save and torch.jit.load for meta tensors, for use in meta-tensor-based XL weights.

Test Plan:
```
buck test //caffe2/test:jit -- -r .*save_load_meta_tensors.*
```

Reviewed By: houseroad

Differential Revision: D34479511

fbshipit-source-id: 117ccb12e9e427290a17297204508ec85495e3be
(cherry picked from commit ee9aaaf8208d6c9530c828a4b9f28cf2cca05630)
2022-03-10 19:48:29 +00:00
fe277b8717 [jit][edge] Migrate to TypeFactory for jit types on mobile (#71516)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71516

Mobile should be able to construct dynamic types by default.
ghstack-source-id: 147498365

Test Plan:
CI.

**-48KB** binary size reduction for igios BSB.
UMBEX link: https://www.internalfb.com/intern/unigraph/explorer/?jsgq_traversal_spec=%7B%22builds%22%3A[%22bsb%3A422553426218394%5Cu0040base%22%2C%22bsb%3A422553426218394%5Cu0040diff%22]%7D&unigraph_project=UnigraphProjectMbex&is_mbex_redirected

Reviewed By: iseeyuan

Differential Revision: D33673958

fbshipit-source-id: 8600c04ae929283681971aae264d3774188df9cd
(cherry picked from commit 64ebcec09e69d2eff64fdbf926fb43d3b67f99b2)
2022-01-26 07:32:04 +00:00
4f35b9144c [jit][edge] Migrate ListType to DynamicType on mobile. (#70212)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70212

Use DynamicType instead of ListType all over the place in Lite Interpreter. Namely we need to modify the following places:
1. Type parser which produces the Type constants.
2. IValue::type() which returns reflected Type from IValues.
3. Helper functions to construct the container value.
4. Typechecks which test whether a type instance is a particular container type.
ghstack-source-id: 146818619

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D33176931

fbshipit-source-id: 9144787f5fc4778538e5c665946974eb6171a2e6
2022-01-11 10:57:53 -08:00
b12ca69179 [jit][edge] Migrate DictType to DynamicType on mobile. (#70202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70202

Use DynamicType instead of DictType all over the place in Lite Interpreter. Namely we need to modify the following places:
1. Type parser which produces the Type constants.
2. IValue::type() which returns reflected Type from IValues.
3. Helper functions to construct the container value.
4. Typechecks which test whether a type instance is a particular container type.
ghstack-source-id: 146735648

Test Plan: no behavior change.

Reviewed By: iseeyuan

Differential Revision: D33137257

fbshipit-source-id: 971bf431658c422ea9353cc32cdab66e98876e9d
2022-01-10 15:55:29 -08:00
30699cbfd5 Reland D33284352: [jit][edge] Do not reuse mobile type parser for all unpicklers. (#71048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71048

reland D33284352 (0a921ba0d0)
ghstack-source-id: 146735646

Test Plan: All Github CI: ciflow rerun -l ciflow/all

Reviewed By: gmagogsfm

Differential Revision: D33489731

fbshipit-source-id: 3e160209a1abb193ad3eed3018054aa7d331025e
2022-01-10 12:42:23 -08:00
9762aa0fdc Revert D33284352: [jit][edge] Do not reuse mobile type parser for all unpicklers.
Test Plan: revert-hammer

Differential Revision:
D33284352 (0a921ba0d0)

Original commit changeset: 997c4f110b36

Original Phabricator Diff: D33284352 (0a921ba0d0)

fbshipit-source-id: af316727442a64f1ae40d53d7a9d26ec550d634e
2022-01-07 19:58:03 -08:00
0a921ba0d0 [jit][edge] Do not reuse mobile type parser for all unpicklers. (#70338)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70338

Today Unpickler is used by both server and mobile for deserializing models, and it always falls back to the mobile type parser when the user provides no type resolver. However, this is not intended, as the server and mobile type parsers support different things. In this diff we provide a default fallback using the script parser and opt out of it for all mobile cases.
ghstack-source-id: 146727330

(Note: this ignores all push blocking failures!)

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D33284352

fbshipit-source-id: 997c4f110b36eee6596e8f23f6a87bf91a4197ed
2022-01-07 18:35:32 -08:00
649dda9fee [jit] Implement DynamicType for TorchScript runtime. (#68136)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68136

DynamicType is an extension to existing server JIT types. Today using normal server types on Edge is a bit problematic because in embedded environments we don't need the full spectrum of types but we still build with these unneeded dependencies.

Is it possible to just get rid of unneeded JIT types from Edge builds? It's not easy to do at this moment. For example, on Edge we don't support Union type, but we have to pull in the dependency of Union type because Optional type, which inherits from Union type, is supported, so Union type has to be included in the build. Although we could split Union type and Optional type, it could be argued that the root cause is that every time we use anything inheriting from `c10::Type`, we don't have direct evidence of how many dependencies we pull in, because we make virtual calls and don't know what exactly we're calling with server JIT types. If we don't know, it's highly possible that the linker doesn't know either, so it cannot effectively strip unused methods.

To address this problem, one option is to implement a separate `DynamicType` which has simpler behavior and doesn't store different types as different symbols in binary but rather raw data (or "tag"). This could increase the binary size by several KBs, so I included several binary size reductions in the same stack, hoping at least we don't regress the binary size.

Currently `DynamicType` inherits from `c10::Type` because I want to reduce the migration cost of `DynamicType` by making it interfacing with existing server JIT types. In the future `DynamicType` should be implemented as a separate class without relying on `c10::Type` to make things both simpler and leaner.
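
For illustration only, a toy sketch of the tag-plus-data idea (not the real `DynamicType`):

```cpp
#include <cstdint>

// One concrete class holds a tag plus contained types as plain data,
// instead of one virtual subclass per JIT type, so the dependency
// surface is a single type the linker can reason about.
enum class ToyTag : uint8_t { Int, Float, List, Optional };

struct ToyDynamicType {
  ToyTag tag;
  const ToyDynamicType* contained = nullptr;  // element type for List/Optional

  bool equals(const ToyDynamicType& other) const {
    if (tag != other.tag) return false;
    if (contained == nullptr || other.contained == nullptr)
      return contained == other.contained;
    return contained->equals(*other.contained);
  }
};
```
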
ghstack-source-id: 146670522

Test Plan: in the next diff.

Reviewed By: VitalyFedyunin

Differential Revision: D32264615

fbshipit-source-id: 180eb0998a14eacc1d8b28db39870d84fcc17d5b
2022-01-07 11:23:07 -08:00
41959ce77f [JIT] scripting, freezing, serialization for sparse csr (#69555)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69555

1. Implement pickling/unpickling
2. Add `test_freeze_sparse_csr, tests_serialize_sparse_csr` tests

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D33181367

Pulled By: davidberard98

fbshipit-source-id: a15d5193a7b1b1625a27e4af003cec33cdbc8071
2021-12-20 11:13:34 -08:00
b0817e19e0 [PyTorch] Avoid reading file from stream for 0 byte Tensor storage (#67787)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67787

First noticed in https://fb.workplace.com/groups/pytorch.edge.team/posts/952737705280969/ - basically, one of the speech models has ~400 zero-byte tensor files, so we're paying the cost of looking each one up in the archive and reading nothing from it.

Turns out that there's a fairly simple fix to avoid reading a 0 byte tensor. Once we notice that it's 0 bytes, just use the default `DataPtr` instead of initializing it with 0 bytes read in from the input file stream.
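
A hedged sketch of the shape of the fix (the function name is made up, not the actual patch):

```cpp
#include <c10/core/Allocator.h>
#include <c10/core/CPUAllocator.h>
#include <c10/core/Device.h>
#include <cstddef>

// For a 0-byte storage, skip the archive lookup/read entirely and hand
// back a default DataPtr.
c10::DataPtr read_storage(size_t nbytes, const c10::Device& device) {
  if (nbytes == 0) {
    return c10::DataPtr(nullptr, device);  // nothing to read
  }
  c10::DataPtr ptr = c10::GetCPUAllocator()->allocate(nbytes);
  // ... in the real reader, the bytes are then copied in from the archive ...
  return ptr;
}
```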

ghstack-source-id: 142025211

Test Plan: CI, and manually ran a couple of production mobile models with bundled inputs. CI will run all production mobile models with bundled inputs.

Reviewed By: swolchok

Differential Revision: D32054983

fbshipit-source-id: 919b0cdbc44bccb8f6cfe0da10ff5474af37fd99
2021-11-09 21:45:05 -08:00
82f7f8d471 [PyTorch] Adopt IValue::toTupleRef() where obvious (#65505)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65505

Generated with

`fastmod -m 'toTuple\(\)(\s*)->' 'toTupleRef()${1}.'`

, followed by

`fastmod '(std::move\(.*)toTupleRef\(\).' '${1}toTuple()->'`

to unbreak 2 callsites.
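
For illustration, a hedged example of the adopted pattern:

```cpp
#include <ATen/core/ivalue.h>
#include <cstddef>

size_t tuple_size(const c10::IValue& iv) {
  // Before: iv.toTuple() materializes a new intrusive_ptr (refcount bump)
  // that is immediately dereferenced and discarded.
  // After: toTupleRef() borrows the existing Tuple, taking no ownership.
  auto& tuple = iv.toTupleRef();
  return tuple.elements().size();
}
```
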
ghstack-source-id: 142065835

Test Plan: CI

Reviewed By: gchanan

Differential Revision: D31131025

fbshipit-source-id: 54457ae5bbeb38db9c7f196d469b98521c3d3f34
2021-11-02 10:22:18 -07:00
f65b4b7a4c [PyTorch] Avoid refcount bump in UnionType::canHoldType (#66693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66693

Passing a `TypePtr` by value causes an unnecessary refcount
bump. We don't need to take ownership, so `const Type&` is all we
need.

I considered providing a compatibility shim that takes `const
TypePtr&`, but doing so is dangerous because a
copy is required to convert from a more specific pointer like
`NoneTypePtr`.
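
A hedged example of a call under the new signature:

```cpp
#include <ATen/core/jit_type.h>

bool holds_none(const c10::UnionType& u) {
  // Dereferencing the singleton NoneTypePtr costs nothing; passing a
  // TypePtr by value would have bumped (and then dropped) its refcount.
  return u.canHoldType(*c10::NoneType::get());
}
```
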
ghstack-source-id: 140737081

Test Plan: CI

Reviewed By: suo

Differential Revision: D31691869

fbshipit-source-id: f766ce3234a28771c2a9ca4c284eb3f96993a3d0
2021-10-18 17:39:59 -07:00
e88d1c4f10 [PyTorch] Add tuple inline storage (#64066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64066

I noticed a bunch of time being spent heap-allocating Tuples
in the unpickler. 1-, 2-, and 3-element Tuples are apparently common
enough that they get their own bytecode instructions, so I decided to
try also giving them their own representation. We store up to 3
IValues inline in `Tuple` rather than doing a second heap allocation
for a `std::vector<IValue>`.
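
For illustration only, a toy small-buffer container in the same spirit (not the real `TupleElements`):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Up to kInline elements live in an inline array; larger tuples fall
// back to a heap-allocated vector, mirroring the two-allocation case.
template <typename T>
class SmallTuple {
 public:
  explicit SmallTuple(std::vector<T> v) {
    if (v.size() <= kInline) {
      size_ = v.size();
      for (size_t i = 0; i < size_; ++i) inline_[i] = std::move(v[i]);
    } else {
      overflow_ = std::move(v);
      size_ = overflow_.size();
    }
  }
  const T& operator[](size_t i) const {
    return size_ <= kInline ? inline_[i] : overflow_[i];
  }
  size_t size() const { return size_; }

 private:
  static constexpr size_t kInline = 3;
  T inline_[kInline]{};
  std::vector<T> overflow_;
  size_t size_ = 0;
};
```
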
ghstack-source-id: 140695395

Test Plan:
Added automated tests for TupleElements.

Pixel 3 before: https://www.internalfb.com/intern/aibench/details/761596366576284
Pixel 3 after: https://www.internalfb.com/intern/aibench/details/591414145082422
We went from 347 ms to 302 ms.

Reviewed By: dhruvbird

Differential Revision: D30592622

fbshipit-source-id: 93625c54c9dca5f765ef6d5c191944179cb281a8
2021-10-15 12:16:51 -07:00
176d3c6fb4 [PyTorch] Fix many Tuple::elements() callsites (#64065)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64065

It is only safe to mutate Tuple elements if you are the sole owner
of the tuple. The most efficient way to do this, then, is
`std::move(*std::move(tupleIValue).toTuple()).elements()` (the
innermost move allows `IValue::toTuple()` to avoid a refcount bump and
the outermost move allows the element vector to be moved out of the
tuple), but many callsites write simply
`tupleIValue.toTuple().elements()`, which incurs many extra refcount
bumps.
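
For illustration, the cheap pattern as a hedged standalone snippet:

```cpp
#include <ATen/core/ivalue.h>

void consume(c10::IValue tuple_ivalue) {
  // Sole-owner fast path: the inner move lets toTuple() skip a refcount
  // bump, and the outer move lets the element storage be moved out of
  // the tuple instead of copied.
  auto elems = std::move(*std::move(tuple_ivalue).toTuple()).elements();
  (void)elems;
}
```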

ghstack-source-id: 139468088

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D30592621

fbshipit-source-id: e8312de866de09b9ea2a62e5128cbf403ee16f09
2021-10-01 11:36:05 -07:00
6831d8e379 Support Union in TorchScript (#64234)
Summary:
This PR was created to replace the https://github.com/pytorch/pytorch/pull/53180 PR stack, which has all the review discussions. The replacement was needed because of a messy Sandcastle issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64234

Reviewed By: gmagogsfm

Differential Revision: D30656444

Pulled By: ansley

fbshipit-source-id: 77536c8bcc88162e2c72636026ca3c16891d669a
2021-09-03 06:12:24 -07:00
16ecdbbaa2 [PyTorch] Fix missing move in unpickler (#63974)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63974

Saw some time spent in this during model loading; no reason not to use a move here.
ghstack-source-id: 136760979

Test Plan: Re-profile model loading on devserver; IValue copy ctor time has gone down

Reviewed By: dhruvbird

Differential Revision: D30548923

fbshipit-source-id: 42000f2e18582762b43353cca10ae094833de3b3
2021-08-30 09:38:55 -07:00
7ebdbf82dc add support for sending cpu sparse tensors over rpc (#62794)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62794

This PR updates JIT serialization to support pickling sparse COO tensors.
It also updates message.cpp to support sparse COO tensors.
A bug was filed about this a few years ago: https://github.com/pytorch/pytorch/issues/30807.

I tested the fix by adding sparse tensor tests to rpc_test.py and dist_autograd_test.py.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23 gmagogsfm

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D30608848

Pulled By: gcramer23

fbshipit-source-id: 629ba8e4a3d8365875a709c9b87447c7a71204fb
2021-08-29 11:35:00 -07:00
0dd90cceaf [package] track storages across lifetime of PackageExporter (#59735)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59735

1. Fixes the ABA storage-identity problem during serialization for `torch.package` by keeping references to serialized storages for the lifetime of `PackageExporter`, preventing reuse of memory addresses. Achieved by extending the logic used to solve the same issue on mobile.
2. Adds determinism to naming scheme of serialized storages in export code paths which utilize `tensor_cdata_naming_scheme`(introduced 2nd mapping in `StorageContext`, now maps `storage cdata ptr` -> `unique id`, `unique id` -> `c10::Storage`)
3. Additionally uses presence of a storage in the `StorageContext` instance as marker for if a storage has been serialized or not, removing the need to scan the `PythonStreamWriter` for presence of the storage's serialization file

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D29075276

Pulled By: Lilyjjo

fbshipit-source-id: 15a5c30b1de99c5bd7079388f2db9b6ece2eca12
2021-06-29 14:16:54 -07:00
9403fe17ce [torch.package/TorchScript] logic to enable sharing of tensors on load (#57573)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57573

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D28226975

Pulled By: Lilyjjo

fbshipit-source-id: bc8cb3e8052fa18336c437e0601d8b0028fd1895
2021-05-14 08:21:43 -07:00
4cb534f92e Make PyTorch code-base clang-tidy compliant (#56892)
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os

def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files

def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname,"-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit","--all", "-m", f"NOLINT stubs for {fname}"])

def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)

if __name__ == "__main__":
    main()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892

Reviewed By: H-Huang

Differential Revision: D27991944

Pulled By: malfet

fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
2021-04-28 14:10:25 -07:00
2130f4ccc4 Use c10::ArrayRef instead of std::vector for the jit::unpickle's tensor_table. (#54428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54428

Using c10::ArrayRef as the parameter type makes the API more flexible and allows the caller to leverage small-buffer optimizations (e.g. c10::SmallVector, std::array) for performance-critical cases.
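
For illustration, a hedged example of the flexibility this buys (`sum` is a made-up function):

```cpp
#include <c10/util/ArrayRef.h>
#include <c10/util/SmallVector.h>
#include <array>
#include <cstdint>
#include <vector>

// ArrayRef is a non-owning view, so one signature accepts many containers.
int64_t sum(c10::ArrayRef<int64_t> xs) {
  int64_t total = 0;
  for (auto x : xs) total += x;
  return total;
}

void demo() {
  std::vector<int64_t> v{1, 2, 3};
  std::array<int64_t, 2> a{4, 5};
  c10::SmallVector<int64_t, 4> s{6, 7};
  sum(v);
  sum(a);
  sum(s);
  sum({8, 9});  // even a braced list works
}
```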

Test Plan: No behavioral changes. Run the existing unit and integration tests.

Reviewed By: suo

Differential Revision: D27232222

fbshipit-source-id: 7b13bc6bd02257097ca119077028fbccc68cc925
2021-03-22 15:31:47 -07:00
d8730194e7 use device methods (#52899)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52899

Reviewed By: zou3519

Differential Revision: D26752203

Pulled By: albanD

fbshipit-source-id: eaef89377999b20655fe85d5a38ca7a2c5882de7
2021-03-02 20:14:23 -08:00
0e2520baae [PyTorch] Don't read 1 char per iteration in Unpickler::readString (#51901)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51901

It's much more efficient to read multiple chars with 1 memcpy than to call `read<char>` multiple times.
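
For illustration only, a toy contrast of the two strategies (not the actual Unpickler code):

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// One bulk memcpy out of a buffered input.
std::string read_string_fast(const char* buf, size_t len) {
  std::string result(len, '\0');
  std::memcpy(&result[0], buf, len);
  return result;
}

// One "read" per character: same result, far more per-byte overhead.
std::string read_string_slow(const char* buf, size_t len) {
  std::string result;
  for (size_t i = 0; i < len; ++i) {
    result.push_back(buf[i]);
  }
  return result;
}
```
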
ghstack-source-id: 121278774

Test Plan:
Run WireSerializerBench before/after for small tensors:

```
/tmp/WireSerializerBench.Reader --real_data /mnt/homedir/hwwang/test_serialized_api_request --real_pytorch_api_request --bm_regex '[Ss]mall'
```

Before:
```
DeSerializeWire(Small)                                       7.65us  130.65K
DeSerializeWire(small_Zstd)                      100.49%     7.62us  131.29K
DeSerializeWire(small_Snappy)                    100.49%     7.62us  131.29K
DeSerializeWireIValue(Small)                      82.89%     9.23us  108.30K
DeSerializeWireIValue(small_Zstd)                 82.87%     9.24us  108.27K
DeSerializeWireIValue(small_Snappy)               82.33%     9.30us  107.57K
DeSerializeC2ToBlob(small_NoCompress)           1150.28%   665.39ns    1.50M
DeSerializeC2ToBlob(small_Zstd)                 1149.70%   665.72ns    1.50M
DeSerializeC2ToBlob(small_Zstd_Fast)            1150.94%   665.00ns    1.50M
DeSerializeC2ToBlob(Small_Snappy)               1151.70%   664.57ns    1.50M
DeSerializeC2ToString(small)                    9297.81%    82.32ns   12.15M
```

After:
```
DeSerializeWire(Small)                                       6.86us  145.84K
DeSerializeWire(small_Zstd)                      100.52%     6.82us  146.60K
DeSerializeWire(small_Snappy)                    100.13%     6.85us  146.03K
DeSerializeWireIValue(Small)                      83.94%     8.17us  122.42K
DeSerializeWireIValue(small_Zstd)                 84.00%     8.16us  122.50K
DeSerializeWireIValue(small_Snappy)               84.53%     8.11us  123.28K
DeSerializeC2ToBlob(small_NoCompress)           1019.48%   672.58ns    1.49M
DeSerializeC2ToBlob(small_Zstd)                 1020.03%   672.23ns    1.49M
DeSerializeC2ToBlob(small_Zstd_Fast)            1020.59%   671.85ns    1.49M
DeSerializeC2ToBlob(Small_Snappy)               1020.30%   672.05ns    1.49M
DeSerializeC2ToString(small)                    7709.63%    88.94ns   11.24M
```

Second run after to demonstrate it wasn't just variance:

```
DeSerializeWire(Small)                                       6.92us  144.57K
DeSerializeWire(small_Zstd)                       99.24%     6.97us  143.47K
DeSerializeWire(small_Snappy)                     99.58%     6.95us  143.97K
DeSerializeWireIValue(Small)                      84.83%     8.15us  122.63K
DeSerializeWireIValue(small_Zstd)                 84.72%     8.16us  122.49K
DeSerializeWireIValue(small_Snappy)               84.59%     8.18us  122.29K
DeSerializeC2ToBlob(small_NoCompress)           1031.03%   670.89ns    1.49M
DeSerializeC2ToBlob(small_Zstd)                 1030.64%   671.14ns    1.49M
DeSerializeC2ToBlob(small_Zstd_Fast)            1013.39%   682.57ns    1.47M
DeSerializeC2ToBlob(Small_Snappy)               1013.95%   682.19ns    1.47M
DeSerializeC2ToString(small)                    8155.98%    84.81ns   11.79M
```

By the way, this gets us closer to deserialization parity for the real data sample included in D26049387:

baseline:
```
DeSerializeWire(RealData)                                    7.34ms   136.24
DeSerializeWire(RealData_Zstd)                    99.95%     7.34ms   136.17
DeSerializeWire(RealData_Snappy)                 100.09%     7.33ms   136.36
DeSerializeWireIValue(RealData)                   82.69%     8.88ms   112.65
DeSerializeWireIValue(RealData_Zstd)              82.76%     8.87ms   112.75
DeSerializeWireIValue(RealData_Snappy)            82.68%     8.88ms   112.64
DeSerializeC2ToBlob(RealData_NoCompress)         116.87%     6.28ms   159.23
DeSerializeC2ToBlob(RealData_Zstd)               117.33%     6.26ms   159.85
DeSerializeC2ToBlob(RealData_Zstd_Fast)          117.38%     6.25ms   159.91
DeSerializeC2ToBlob(RealData_Snappy)             117.61%     6.24ms   160.23
DeSerializeC2ToString(RealData)                 4571.81%   160.55us    6.23K
```

with this diff:
```
DeSerializeWire(RealData)                                    6.57ms   152.17
DeSerializeWire(RealData_Zstd)                   100.17%     6.56ms   152.43
DeSerializeWire(RealData_Snappy)                 100.09%     6.57ms   152.31
DeSerializeWireIValue(RealData)                   83.06%     7.91ms   126.40
DeSerializeWireIValue(RealData_Zstd)              83.16%     7.90ms   126.54
DeSerializeWireIValue(RealData_Snappy)            83.22%     7.90ms   126.64
DeSerializeC2ToBlob(RealData_NoCompress)         104.02%     6.32ms   158.29
DeSerializeC2ToBlob(RealData_Zstd)               103.46%     6.35ms   157.43
DeSerializeC2ToBlob(RealData_Zstd_Fast)          104.64%     6.28ms   159.23
DeSerializeC2ToBlob(RealData_Snappy)             104.65%     6.28ms   159.25
DeSerializeC2ToString(RealData)                 4051.03%   162.22us    6.16K
```

Reviewed By: qizzzh

Differential Revision: D26321083

fbshipit-source-id: 92d45e760580bb290078ddac84128174daef0e55
2021-02-17 11:00:48 -08:00
680c4ce1dd [PyTorch] Avoid some extra intrusive_ptr<Tuple> copies in Unpickler (#51902)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51902

These seem like straightforward improvements. (I don't have measurements; feel free to reject if you're skeptical)
ghstack-source-id: 121278775

Test Plan: CI

Reviewed By: qizzzh

Differential Revision: D26322438

fbshipit-source-id: d393a32cc34bb68bc4f804f4b1cc5a8af27763c9
2021-02-17 07:31:58 -08:00
18a7ec7d7d Update the JIT complex type name to be consistent with Python (#51476)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51476

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D26179237

Pulled By: anjali411

fbshipit-source-id: 6a5c60c8545eb42416583836b8038ceffd3f3244
2021-02-03 09:59:08 -08:00
7328710cbc [PyTorch][codemod] Replace immediately-dereferenced cast calls w/castRaw (#50229)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50229

`fastmod -m 'cast(<((at|c10)::)?\w+Type>\(\)\s*)->' 'castRaw${1}->'`

Presuming it builds, this is a safe change: the result of `cast()`
wasn't being saved anywhere, so we didn't need it and can use a raw
pointer instead of a new `shared_ptr`.
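
A hedged example of the pattern (the function is made up):

```cpp
#include <ATen/core/jit_type.h>

bool is_tensor_type(const c10::TypePtr& t) {
  // Before: t->cast<c10::TensorType>() allocates a shared_ptr that is
  // immediately checked and thrown away.
  // After: castRaw returns a plain pointer (or nullptr), no refcounting.
  return t->castRaw<c10::TensorType>() != nullptr;
}
```
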
ghstack-source-id: 120769170

Test Plan: CI

Reviewed By: SplitInfinity

Differential Revision: D25837494

fbshipit-source-id: 46319100dc0dfc78f6d2b45148207f83481f2ada
2021-02-01 23:12:07 -08:00
f9f22c8b5c Add serialization logic for complex numbers (#51287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51287

This reverts commit dfdb1547b9c1934904bfd137b4007d6a46a6f597.

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D26131165

Pulled By: anjali411

fbshipit-source-id: 047167fac594ddb670c5e169446e90e74991679a
2021-01-28 17:25:35 -08:00
dfdb1547b9 Revert D26094906: Add serialization logic for complex numbers
Test Plan: revert-hammer

Differential Revision:
D26094906 (2de4ecd4eb)

Original commit changeset: 7b2614f3ee4a

fbshipit-source-id: 6f32a9fc6bb2a904ca1a282bbc6b2df0aee50068
2021-01-27 19:44:26 -08:00
2de4ecd4eb Add serialization logic for complex numbers (#50885)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50885

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D26094906

Pulled By: anjali411

fbshipit-source-id: 7b2614f3ee4a30c4b4cf04aaa3432988b38a0721
2021-01-27 15:19:36 -08:00
5a5bca8ef0 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D26043955

fbshipit-source-id: 0a5740a82bdd3ac7bd1665a325ff7fe79488ccea
2021-01-25 04:20:03 -08:00
9ac30d96aa Add complex IValues (#50883)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50883

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26003682

Pulled By: anjali411

fbshipit-source-id: f02967d2d236d740cd8647891f732f1d63098d3e
2021-01-22 09:44:40 -08:00
4a8ef4525e Add new backend type for Intel heterogeneous computation platform. (#49786)
Summary:
Add a new device type, 'XPU' ('xpu' in lowercase), to PyTorch. Changes are needed in code related to the device model and kernel dispatch, e.g. DeviceType, Backend, and DispatchKey.

https://github.com/pytorch/pytorch/issues/48246
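
For illustration, a hedged example of addressing the new device type:

```cpp
#include <c10/core/Device.h>
#include <iostream>

int main() {
  c10::Device dev(c10::DeviceType::XPU, 0);
  std::cout << dev << "\n";  // prints "xpu:0"
  return 0;
}
```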

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49786

Reviewed By: mrshenli

Differential Revision: D25893962

Pulled By: ezyang

fbshipit-source-id: 7ff0a316ee34cf0ed6fc7ead08ecdeb7df4b0052
2021-01-20 08:15:18 -08:00
480a756194 [PyTorch] IValue::toTensor can now return const Tensor& (#48868)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48868

Building on the previous diff, we can make `toTensor()` return a
`const Tensor&`, which should make it easier to avoid reference
counting.
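
A hedged example of the borrow this enables (the function is made up):

```cpp
#include <ATen/core/ivalue.h>

int64_t first_dim(const c10::IValue& iv) {
  // Binding to const Tensor& uses the new const-lvalue overload:
  // no Tensor copy, no refcount bump.
  const at::Tensor& t = iv.toTensor();
  return t.dim() > 0 ? t.size(0) : 0;
}
```
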
ghstack-source-id: 119327372

Test Plan: internal benchmarks.

Reviewed By: bwasti

Differential Revision: D25325379

fbshipit-source-id: ca699632901691bcee432f595f75b0a4416d55dd
2021-01-06 08:40:50 -08:00
c619892482 Fix errata (#49903)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49903

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D25718411

Pulled By: ansley

fbshipit-source-id: 0cc365c5a53077752dc1c5a5c4a65b873baa3604
2020-12-28 20:40:41 -08:00