Commit Graph

209 Commits

Author SHA1 Message Date
785e98783b Delete links to non-existing run_plan_mpi.cc (#136204)
The file was deleted by https://github.com/pytorch/pytorch/pull/125092.

Fixes https://github.com/pytorch/pytorch/issues/136199

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136204
Approved by: https://github.com/albanD, https://github.com/seemethere
2024-09-17 17:51:56 +00:00
cyy
2fd75667b4 [Caffe2]Remove Caffe2 scripts and benchmarks (#126747)
Due to the removal of Caffe2.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126747
Approved by: https://github.com/ezyang, https://github.com/malfet
2024-06-05 23:46:31 +00:00
ed327876f5 [codemod] c10::optional -> std::optional (#126135)
Generated by running the following from PyTorch root:
```
find . -regex ".*\.\(cpp\|h\|cu\|hpp\|cc\|cxx\)$" | grep -v "build/" | xargs -n 50 -P 4 perl -pi -e 's/c10::optional/std::optional/'
```

`c10::optional` is just an alias for `std::optional`. This removes usages of that alias in preparation for eliminating it entirely.
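For illustration, a minimal sketch of why the rename is purely mechanical, assuming the simplified alias declaration shown (the real c10 header carries extra compatibility scaffolding):

```cpp
#include <optional>

// Simplified form of the alias being removed.
namespace c10 {
template <typename T>
using optional = std::optional<T>;
}

// Before the codemod:
c10::optional<int> before = 1;
// After the codemod -- identical semantics, different spelling:
std::optional<int> after = 1;
```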

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126135
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/albanD, https://github.com/aaronenyeshi
2024-05-14 19:35:51 +00:00
b9e7b35912 Remove caffe2 from more build files (#125898)
Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125898
Approved by: https://github.com/Skylion007
2024-05-13 18:37:59 +00:00
cyy
2ed17e0b1e Remove binaries using caffe2 functionality (#125885)
This PR removes some binaries that use deleted or soon-to-be-deleted Caffe2 functions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125885
Approved by: https://github.com/r-barnes, https://github.com/Chillee
2024-05-10 06:21:10 +00:00
cyy
83845a7c78 [1/2] Remove caffe2 db and distributed from build system (#125092)
This PR decomposes https://github.com/pytorch/pytorch/pull/122527 into a smaller change: Caffe2 db, distributed, and some binaries have been removed.
Of note, this was inspired by and co-developed with @r-barnes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125092
Approved by: https://github.com/malfet
2024-05-04 06:48:46 +00:00
cyy
5d5990fc49 Remaining replacement of c10::stoi with std::stoi (#109482)
PR #109179 replaced c10::stoi with std::stoi; however, some files were missed. This patch fixes them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109482
Approved by: https://github.com/vfdev-5, https://github.com/Skylion007
2023-09-18 16:05:09 +00:00
f70844bec7 Enable UFMT on a bunch of low traffic Python files outside of main files (#106052)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106052
Approved by: https://github.com/albanD, https://github.com/Skylion007
2023-07-27 01:01:17 +00:00
a229b4526f [BE] Prefer dash over underscore in command-line options (#94505)
Prefer dashes over underscores in command-line options: add `--command-arg-name` to the argument parsers. The old underscore arguments (`--command_arg_name`) are kept for backward compatibility.

Both dashes and underscores are used in the PyTorch codebase, and some argument parsers accept only one or the other. For example, the `torchrun` utility for distributed training only accepts underscore arguments (e.g., `--master_port`). Dashes are more common in other command-line tools, and they appear to be the default choice in the Python standard library:

`argparse.BooleanOptionalAction`: 4a9dff0e5a/Lib/argparse.py (L893-L895)

```python
class BooleanOptionalAction(Action):
    def __init__(...):
            if option_string.startswith('--'):
                option_string = '--no-' + option_string[2:]
                _option_strings.append(option_string)
```

It adds `--no-argname`, not `--no_argname`. Typing `_` also requires the Shift or Caps Lock key, unlike `-`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94505
Approved by: https://github.com/ezyang, https://github.com/seemethere
2023-02-09 20:16:49 +00:00
df1cc0ef47 [Vulkan] Add Vulkan Rewrite to Transfer Inputs and Outputs to Vulkan and CPU Backends Respectively (#87432)
With this change, we no longer have to manually transfer inputs and outputs to the appropriate backends when running Vulkan models.

Graph rewrite code based off of:
- 32efff45ba (diff-a473bddb458dc24225866a45092d6eca064eddd256245d93020e48e216eee4d5R160-R179)

Differential Revision: [D39519168](https://our.internmc.facebook.com/intern/diff/D39519168/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39519168/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87432
Approved by: https://github.com/mcr229, https://github.com/digantdesai
2022-10-31 14:18:45 +00:00
bc68625151 [Vulkan] Add support for Optimization Blocklist to Vulkan Rewrite (#87431)
The optimization blocklist will be used in a future diff (D40315730) to make the input/output backend-transfer rewrite optional.

Differential Revision: [D40315729](https://our.internmc.facebook.com/intern/diff/D40315729/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87431
Approved by: https://github.com/mcr229, https://github.com/digantdesai
2022-10-31 14:15:51 +00:00
e0229d6517 Remove caffe2 mobile (#84338)
We're no longer building Caffe2 mobile as part of our CI, and it adds a lot of clutter to our makefiles. Any lingering internal dependencies will use the buck build and so won't be affected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84338
Approved by: https://github.com/dreiss
2022-09-08 01:49:55 +00:00
4f34cd6d1e Replace all CHECK_ and DCHECK_ with TORCH_* macros (#82032)
Avoid exposing defines that conflict with google logging, since this blocks external usage of libtorch in certain cases.

All the 'interesting' changes should be in these two files, and the rest should just be mechanical changes via sed.
c10/util/logging_is_not_google_glog.h
c10/util/logging_is_google_glog.h
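
A hedged sketch of what a mechanical call-site change looks like (the `TORCH_CHECK_EQ`/`TORCH_DCHECK_GT` spellings are inferred from the PR title, not verified against the final headers):

```cpp
#include <cstdint>

#include <c10/util/Logging.h>

void validate(int64_t dim, int64_t n) {
  // Before (glog-style names that collide with google logging):
  //   CHECK_EQ(dim, 4);
  //   DCHECK_GT(n, 0);
  // After: TORCH_-prefixed equivalents with the same semantics.
  TORCH_CHECK_EQ(dim, 4);
  TORCH_DCHECK_GT(n, 0);
}
```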

Fixes https://github.com/pytorch/pytorch/issues/81415

cc @miladm @malfet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82032
Approved by: https://github.com/soumith, https://github.com/miladm
2022-07-26 01:20:44 +00:00
cb630c775e Add multithreading test to model compare binary (#80958)
This diff adds an option (`--nthreads`) that launches the specified number of threads to load the models and execute the correctness check on them.
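
A rough sketch of what the option does, with a hypothetical `load_models_and_compare()` standing in for the binary's existing load-and-compare loop:

```cpp
#include <thread>
#include <vector>

void load_models_and_compare();  // hypothetical: the existing per-run logic

// Launch nthreads workers, each loading the models and running the
// correctness check concurrently (mirrors the new --nthreads flag).
void run_multithreaded_check(int nthreads) {
  std::vector<std::thread> workers;
  workers.reserve(nthreads);
  for (int i = 0; i < nthreads; ++i) {
    workers.emplace_back(load_models_and_compare);
  }
  for (auto& w : workers) {
    w.join();
  }
}
```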

Differential Revision: [D37465661](https://our.internmc.facebook.com/intern/diff/D37465661/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80958
Approved by: https://github.com/manuelcandales
2022-07-06 21:02:04 +00:00
c083489f46 [kineto] Optimize getStepCallbacks for common case of no active callbacks
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77804

IIUC, the result of this function will be empty and unused if there are no sampled callbacks, which is the common case. We can accelerate this case by wrapping the result in an optional, avoiding the initialization of an empty SmallVector.
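
A sketch of the shape of that optimization, with simplified types (the real result type lives in the RecordFunction machinery):

```cpp
#include <optional>
#include <utility>

#include <c10/util/SmallVector.h>

struct StepCallbacks {
  // Simplified stand-in for the real payload.
  c10::SmallVector<std::pair<void*, void*>, 8> callbacks;
};

// Returning an optional lets the common no-callback case hand back
// std::nullopt without ever constructing the (empty) SmallVector.
std::optional<StepCallbacks> getStepCallbacksUnlessEmpty(bool any_active) {
  if (!any_active) {
    return std::nullopt;
  }
  return StepCallbacks{};
}
```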

Differential Revision: [D36497279](https://our.internmc.facebook.com/intern/diff/D36497279/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36497279/)!

Approved by: https://github.com/robieta
2022-05-24 19:38:01 +00:00
a5e338a826 [RecordFunction] More efficient machinery to determine which callbacks to run. (#75807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75807

There is a tension in RecordFunction between two use cases:
1) In the normal eager path we don't run any callbacks, so we need to bail out of the profiling path as soon as possible to minimize eager overhead.
2) When profiling we want to determine which callbacks to run as efficiently as possible to minimize instrumentation overhead.

The confounding factor in all of this is sampling callbacks, because they change which callbacks will run on each call, even in steady-state operation. This has traditionally been handled with a two-stage procedure: first we flip a coin to determine if a sampled callback *might* run. If false (which it usually is), do nothing. This solves (1). If true, check whether we need to build the full callback set or whether it was a false positive. This procedure has two negative effects:
* It forces us to rebuild the set of callbacks to run on every step when profiling.
* It leaks the sampling abstraction, requiring other parts of the code to bump certain values, and forces RecordFunction to initialize lazily.

This change introduces a multi-level cache which can (in the common case) quickly determine which callbacks *will* run, rather than if callbacks *might* run. This means that rather than call `shouldRunRecordFunction`, we can simply get the callbacks for an invocation and check if they are empty. (And completely removes the pre-sampling heuristic.) Another major benefit of the new cache structure is that it allows thread-safe registration and unregistration of global callbacks.

It's worth briefly discussing how this maintains eager performance. In the standard eager case (only sampling callbacks registered), the cache first checks that the global callbacks haven't changed (atomic read), decrements a counter to see if a sampling callback fired, and then returns the active callbacks, which are simply a SmallVector of pointer pairs and a couple of POD values (scope; needs inputs/outputs/ids). The biggest cost according to perf is the SmallVector logic. We could consider adopting a hard limit on active callbacks; more than half a dozen callbacks *running* in a single step would be quite a lot. But the total cost relative to `PYTORCH_DISABLE_PER_OP_PROFILING` is only ~10ns, so it's debatable whether switching to `std::array` is worth it.

The primary change is in `record_function.cpp`, which has a more detailed description of the new cache structure. `record_function.h` has some minor changes to align with the new calling convention and the remaining files are simply changes to the call sites.
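
A very loose sketch of the steady-state fast path described above (all names and layout assumed; the real structure is in `record_function.cpp`):

```cpp
#include <atomic>
#include <cstdint>

struct CallbackCacheSketch {
  std::atomic<uint64_t>& global_version;  // bumped on (un)registration
  uint64_t seen_version{0};
  int64_t sampling_countdown{1};  // steps until a sampled callback fires

  // Steady state: one relaxed atomic read plus a counter decrement,
  // after which the cached set of callbacks that WILL run is returned.
  bool callbacks_will_run_this_step() {
    if (global_version.load(std::memory_order_relaxed) != seen_version) {
      rebuild();  // rare: the global callback set changed
    }
    return --sampling_countdown <= 0;
  }

  void rebuild() {
    seen_version = global_version.load(std::memory_order_acquire);
    // ...recompute the cached callback set and reset the countdown...
  }
};
```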

Future work:
  * RecordFunction no longer needs to be lazily initialized.
  * We can deprecate the disable/reenable APIs, since we can now safely add and remove global callbacks.

Test Plan:
I tested eager-mode performance using the overhead benchmark and found that the non-profiled path was unaffected. However, the no-op observer case dropped from 0.41us to 0.37us (0.25us if no observers are active), which is roughly a one-third reduction in the cost of the callback-selection machinery.

I also added several C++ unit tests, as the core RecordFunction machinery (especially sampling) was largely untested.

Reviewed By: swolchok, davidberard98

Differential Revision: D35276158

fbshipit-source-id: 35135f444724fba4eb97c0ae7f3f710f0f9016fd
(cherry picked from commit 9e359b87422c18f2a195185f32e7e85c82f956fd)
2022-04-19 20:46:16 +00:00
9bbe1d632e Fix ONNX ATen fallback for non-caffe2 engines
This PR introduces 3 BC changes:

First, this PR propagates the `BUILD_CAFFE2` flag to `libtorch` and `libtorch_python`, which is necessary for non-Caffe2 ONNX runtimes when using the `ONNX_ATEN_FALLBACK` operator export type.

Second, complementing https://github.com/pytorch/pytorch/pull/68490, this PR refactors Caffe2's ATen op symbolics to consider not only the `operator_export_type` (aka `ONNX_ATEN_FALLBACK`) when emitting Caffe2 ATen ops, but also whether `BUILD_CAFFE2` (exposed as `torch.onnx._CAFFE2_ATEN_FALLBACK` in the Python binding) is set.

Lastly, it renames `onnx::ATen` to `aten::ATen` for ONNX spec consistency, in a BC fashion.
ONNX doesn't have an `ATen` op in its spec, but the PyTorch ONNX converter emits them, so non-Caffe2 backends would be misled by such an operator's name/domain. A non-ideal workaround would be to handle ATen ops based on name alone and ignore the (non-compliant) domain. Moreover, users could incorrectly file bugs against either ONNX or ONNX Runtime after inspecting the model and noticing the presence of an unspecified ONNX operator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73954
Approved by: https://github.com/BowenBao, https://github.com/malfet, https://github.com/garymm, https://github.com/jiafatom
2022-04-14 23:18:45 +00:00
be7177751e Add binary to benchmark model load speed (#74700)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74700

Test Plan:
Imported from OSS

Some results running this benchmark for a quantized CPU xirp14b model on a Pixel 5:

```
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "46749"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "19261"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "19235"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "19396"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "19486"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "19562"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "19566"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "19559"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "19632"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "19938"}
```

Some results running this benchmark for the Vulkan xirp20a model on Pixel 5, after pre-loading the Context:

```
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "38664"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "19921"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "20316"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "20255"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "20219"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "20329"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "20463"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "21072"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "20668"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "20889"}
```

Without pre-loading Context:

```
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "70850"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "19867"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "20211"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "20039"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "20082"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "20268"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "20363"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "21103"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "20511"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "20528"}

```

Reviewed By: mrshenli

Differential Revision: D35124881

Pulled By: SS-JIA

fbshipit-source-id: 0f093e4aa45d69c538a4fe2003e0d5617d72b97a
(cherry picked from commit 96f991420ad720300aea51cc0a1a6c0f79d2820b)
2022-03-30 20:22:57 +00:00
ac97e953b4 Add dynamic shape support to AOT driver & compiler (#72995)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72995

Add the ability to specify which input dimensions need to be dynamic.
Example: if the dimension of size 115 in input sizes "1,115;1" can be dynamic, then specify dynamic_dims as "115".

Also recompile and update the CI models and some asm code, since the old ones don't compile with the compiler changes in context.cpp.

Test Plan: - Compiles and runs BI Bytedoc model with and without dynamic inputs.

Reviewed By: ZolotukhinM

Differential Revision: D34233121

fbshipit-source-id: 35095e549ebd6d3bec98b9abb3f0764366a0ff6f
(cherry picked from commit 33166a9f9ac9194b5df0a35280b57708df255ebd)
2022-02-24 04:30:48 +00:00
52175307e2 [vulkan] Allow benchmark binary to handle non-single tensor inputs/outputs for Vulkan models (#73109)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73109

This change updates the Vulkan model runner in `speed_benchmark_torch` to be able to generate inputs for models that have input/output types other than just a single tensor. Input elements are processed depending on their type.

Test Plan: Imported from OSS

Reviewed By: mikaylagawarecki

Differential Revision: D34354839

Pulled By: SS-JIA

fbshipit-source-id: 993e55372d2664fa7eddb16146deba264727f399
(cherry picked from commit 4a140202acb336412676ac090a38d7b93ae49898)
2022-02-19 01:33:51 +00:00
c32b74cecb [nnc][aot_compiler] Memory formats args to aot_compiler (#72873)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72873

Test Plan: Imported from OSS

Reviewed By: priyaramani

Differential Revision: D34250984

Pulled By: IvanKobzarev

fbshipit-source-id: e723ee64b024883eef78853e1b185b7040cafb09
(cherry picked from commit e9908df045acf33aa3cd0aec6784f15421236787)
2022-02-16 18:39:31 +00:00
444191de56 Use default value on empty llvm_code_path (#72758)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72758

Bug: the FLAGS_output_llvm option was introduced recently to specify the LLVM assembly code file. Without the previous default value, the LLVM code is no longer saved to a file when the asmfile input is unspecified, which makes the compiled output unusable.

Fix: use a default value when the output_llvm/asmfile input is not specified.
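
A sketch of the fix under assumed names (the actual flag handling lives in the compiler driver; the ".compiled.ll" default is inferred from the logs in related commits):

```cpp
#include <string>

// Fall back to a default ".compiled.ll" path next to the model when the
// output_llvm/asmfile flag is empty.
std::string resolveLlvmCodePath(
    const std::string& output_llvm_flag,
    const std::string& model_path_prefix) {
  return output_llvm_flag.empty()
      ? model_path_prefix + ".compiled.ll"
      : output_llvm_flag;
}
```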

Test Plan: Verified that the output is saved to the default .ll file path

Reviewed By: IvanKobzarev

Differential Revision: D34189107

fbshipit-source-id: ee51e8c17de92d3045690ca871fb9569fc3164d6
(cherry picked from commit 46352d446b3e9d7df0eddf0a29790de6f7757d26)
2022-02-12 00:35:24 +00:00
a60e2ae037 [TensorExpr] Move AOT compilation logic from aot_compiler.cpp to NNC's to_backend (#70375)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70375

Differential Revision: D33303645

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin, priyaramani

Pulled By: ZolotukhinM

fbshipit-source-id: 01ab9fab9bb0d63f89b06a146d3c5fb6ed7fe52d
(cherry picked from commit aac8e0ed900d1b760606b0b50eb064e6b00f8b7a)
2022-02-02 02:34:55 +00:00
4868907cf3 [binaries] fix dump_operator_name binary (#71246)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71246

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D33555962

Pulled By: IvanKobzarev

fbshipit-source-id: 2b386e52fa8e76c877fec5b6b15d99f7d280801f
(cherry picked from commit f6d60fdff68964f77aa46ca2c51327cb66566194)
2022-01-20 17:33:08 +00:00
3c0c5bde0e [cmake] Uncomment binaries (#71157)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71157

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D33528259

Pulled By: IvanKobzarev

fbshipit-source-id: b8c216558ca612bedd4c37205f38ed29c2c82b3c
2022-01-12 15:01:44 -08:00
29d759948e use irange for loops 2 (#66746)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66746

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop-upgrader script (D28874212), with some modifications to exclude all files under /torch/jit, plus a number of hand-written reversions and unused-variable warning suppressions.
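
For concreteness, the transformation looks like this (a minimal compilable example, not taken from the diff itself):

```cpp
#include <cstdint>

#include <c10/util/irange.h>

void scale(float* data, int64_t n, float k) {
  // Old format:
  //   for (int64_t i = 0; i < n; i++) { data[i] *= k; }
  // New format, as produced by the loop-upgrader script:
  for (const auto i : c10::irange(n)) {
    data[i] *= k;
  }
}
```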

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D31705361

fbshipit-source-id: 33fd22eb03086d114e2c98e56703e8ec84460268
2021-12-10 04:26:23 -08:00
8cc9ec2f6b Add option to get input dtype from user (#68751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68751

Add option to get input dtype from user for AOT compilation

Test Plan:
BI model compiles and runs fine
```
(pytorch)  ~/fbsource/fbcode/caffe2/fb/nnc
└─ $ buck run //caffe2/binaries:aot_model_compiler -- --model=bi.pt --model_name=pytorch_dev_bytedoc --model_version=v1 '--input_dims=1,115;1' --input_types='int64;int64'
Building... 8.3 sec (99%) 7673/7674 jobs, 0/7674 updated
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1116 14:32:44.632536 1332111 TensorImpl.h:1418] Warning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (function operator())
E1116 14:32:44.673710 1332111 huge_pages_allocator.cc:287] Not using huge pages because not linked with jemalloc
The compiled llvm assembly code was saved to bi.compiled.ll
The compiled model was saved to bi.compiled.pt
```

> Error thrown when input dims and input types sizes don't match

```
(pytorch)  ~/fbsource/fbcode/caffe2/fb/nnc
└─ $ buck run //caffe2/binaries:aot_model_compiler -- --model=bi.pt --model_name=pytorch_dev_bytedoc --model_version=v1 '--input_dims=1,115;1' --input_types='int64;int64;int64'
.
.
terminate called after throwing an instance of 'c10::Error'
  what():  [enforce fail at aot_model_compiler.cc:208] split(';', FLAGS_input_dims).size() == split(';', FLAGS_input_types).size(). Number of input_dims and input_types should be the same
.
.
.
```

Reviewed By: ljk53

Differential Revision: D32477001

fbshipit-source-id: 8977b0b59cf78b3a2fec0c8428f83a16ad8685c5
2021-11-29 21:39:49 -08:00
d71092f668 [android][fbjni] Update fbjni to 0.2.2 (#68400)
Summary:
ghstack-source-id: caeb8df3a18a6fa48d591af126ac59d8e41494b5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68400

Fixes #{issue number}

CI-all check:
https://github.com/pytorch/pytorch/pull/68497

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68495

Reviewed By: linbinyu

Differential Revision: D32481451

Pulled By: IvanKobzarev

fbshipit-source-id: b19ce05ff9d63b3f701d718eefbf1e9d66e11639
2021-11-17 16:54:22 -08:00
82f7f8d471 [PyTorch] Adopt IValue::toTupleRef() where obvious (#65505)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65505

Generated with

`fastmod -m 'toTuple\(\)(\s*)->' 'toTupleRef()${1}.'`

, followed by

`fastmod '(std::move\(.*)toTupleRef\(\).' '${1}toTuple()->'`

to unbreak 2 callsites.
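
The effect of the codemod on a typical call site, sketched (illustrative example, not from the diff):

```cpp
#include <ATen/core/ivalue.h>

int64_t first_element(const c10::IValue& iv) {
  // Before: toTuple() returns an owning intrusive_ptr, so peeking at the
  // elements paid a refcount bump.
  //   return iv.toTuple()->elements()[0].toInt();
  // After: toTupleRef() borrows the tuple instead.
  return iv.toTupleRef().elements()[0].toInt();
}
```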
ghstack-source-id: 142065835

Test Plan: CI

Reviewed By: gchanan

Differential Revision: D31131025

fbshipit-source-id: 54457ae5bbeb38db9c7f196d469b98521c3d3f34
2021-11-02 10:22:18 -07:00
7fbcf79684 [tensorexpr][nnc] Support quantization (#66676)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66676

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31676329

Pulled By: IvanKobzarev

fbshipit-source-id: 288b41ff4ed603dfaacb465f296997f14bb23c22
2021-10-31 22:49:30 -07:00
fa70d72e95 Set kernel func name from AOT Compiler (#67229)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67229

Right now, the assembly code generated for a given method of the model is named `wrapper` or `func` by default, and the function name is only replaced with a proper kernel_func_name after the target-specific assembly is generated.
This PR propagates the desired kernel_func_name from the aotCompiler API so that the generated function gets the needed name up front and doesn't have to be renamed later.

Note: Most of this change landed in https://github.com/pytorch/pytorch/pull/66337, which had to be reverted because it broke `test_profiler` in `test_jit_fuser_te` by replacing the name generated for the graph with the default kernel_func_name value. This PR fixes that as well.

```
(pytorch)  ~/local/pytorch kname
└─ $ python3 test/test_jit_fuser_te.py
CUDA not available, skipping tests
monkeytype is not installed. Skipping tests for Profile-Directed Typing
........................................<string>:3: UserWarning: torch.cholesky is deprecated in favor of torch.linalg.cholesky and will be removed in a future PyTorch release.
L = torch.cholesky(A)
should be replaced with
L = torch.linalg.cholesky(A)
and
.
.
.
......................<string>:3: UserWarning: torch.symeig is deprecated in favor of torch.linalg.eigh and will be removed in a future PyTorch release.
The default behavior has changed from using the upper triangular portion of the matrix by default to using the lower triangular portion.
L, _ = torch.symeig(A, upper=upper)
should be replaced with
L = torch.linalg.eigvalsh(A, UPLO='U' if upper else 'L')
and
L, V = torch.symeig(A, eigenvectors=True)
should be replaced with
L, V = torch.linalg.eigh(A, UPLO='U' if upper else 'L') (Triggered internally at  ../aten/src/ATen/native/BatchLinearAlgebra.cpp:2492.)
......[W pybind_utils.cpp:35] Warning: Using sparse tensors in TorchScript is experimental. Many optimization pathways have not been thoroughly tested with sparse tensors. Please include the fact that the network is running sparse tensors in any bug reports submitted. (function operator())
/data/users/priyaramani/pytorch/torch/testing/_internal/common_utils.py:403: UserWarning: Using sparse tensors in TorchScript is experimental. Many optimization pathways have not been thoroughly tested with sparse tensors. Please include the fact that the network is running sparse tensors in any bug reports submitted. (Triggered internally at  ../torch/csrc/jit/python/pybind_utils.h:691.)
  return callable(*args, **kwargs)
.....................................................................[W Resize.cpp:23] Warning: An output with one or more elements was resized since it had shape [1], which does not match the required output shape [].This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (function resize_output_check)
[W Resize.cpp:23] Warning: An output with one or more elements was resized since it had shape [1, 5], which does not match the required output shape [5].This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (function resize_output_check)
........................................................................s.......s...s.s....s......s..sss............................
----------------------------------------------------------------------
Ran 503 tests in 37.536s

OK (skipped=10)
```

Test Plan: Imported from OSS

Reviewed By: navahgar, pbelevich

Differential Revision: D31945713

Pulled By: priyaramani

fbshipit-source-id: f2246946f0fd51afba5cb6186d9743051e3b096b
2021-10-27 13:10:49 -07:00
b55a2500d2 [jit] Remove graph() call from abstract Function interface. (#65967)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65967

Graph is an implementation detail. If user wants to get access to the
underlying graph, they should be able to explicitly dynamic cast instead.
ghstack-source-id: 141659819

Test Plan: no behavior change.

Reviewed By: gmagogsfm

Differential Revision: D31326153

fbshipit-source-id: a0e984f57c6013494b92a7095bf5bb660035eb84
2021-10-27 11:54:26 -07:00
ecf7e96969 [Light] Remove ambiguity from compile_spec names, use actual output type (#67209)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67209

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67198

Fixes a couple of instances where parameters were named method_compile_spec when they were actually compile_specs that could contain multiple method_compile_specs.
Also uses the output dtype from the buffer.

Test Plan:
Mobilenetv3 compiles and runs fine
```
(pytorch)  ~/fbsource/fbcode/caffe2/fb/nnc
└─ $ PYTORCH_JIT_LOG_LEVEL="aot_compiler" buck run //caffe2/binaries:aot_model_compiler -- --model mobilenetv3.pt --model_name=pytorch_dev_mobilenetv3 --model_version=v1 --input_dims="1,3,224,224"
Downloaded 4501/6195 artifacts, 433.89 Mbytes, 14.3% cache miss (for updated rules)
Building: finished in 06:34.6 min (100%) 20233/20233 jobs, 5467/20233 updated
  Total time: 06:35.0 min
BUILD SUCCEEDED
The compiled llvm assembly code was saved to mobilenetv3.compiled.ll
The compiled model was saved to mobilenetv3.compiled.pt

└─ $ ./compile_model.sh -m pytorch_dev_mobilenetv3 -p /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/mobilenetv3.pt -v v1 -i "1,3,224,224"
+ VERSION=v1
+ getopts m:p:v:i:h opt
+ case $opt in
+ MODEL=pytorch_dev_mobilenetv3
.
.
Columns 961 to 970
1e-11 *
-4.2304 -3.9674  2.4473 -0.8664 -0.7513  1.2140  0.0010  3.8675  1.2714  2.2989

Columns 971 to 980
1e-11 *
-2.7203  1.6772 -0.7460 -0.6936  4.4421 -0.9865 -0.5186 -1.4441  1.3047 -1.6112

Columns 981 to 990
1e-11 *
 0.1275 -1.8815  2.5105 -0.4871 -2.2342  0.8520  0.8658  1.6180  3.8901 -0.2454

Columns 991 to 1000
1e-11 *
-1.4896  4.1337 -2.6640  0.8226  0.2441 -1.4830 -1.7430  1.8758  0.5481  0.5093
[ CPUFloatType{1,1000} ]
Starting benchmark.
Running warmup runs.
Main runs.
Main run finished. Milliseconds per iter: 276.255. Iters per second: 3.61984
Memory usage before main runs: 104366080 bytes
Memory usage after main runs: 343441408 bytes
Average memory increase per iter: 2.39075e+07 bytes
0 value means "not available" in above
```

Reviewed By: ljk53

Differential Revision: D31698338

fbshipit-source-id: da6c74c1321ec02e0652f3afe6f97bf789d3361b
2021-10-25 17:44:05 -07:00
b6fa998892 Revert D31514095: Use kernel_func_name from aotCompiler
Test Plan: revert-hammer

Differential Revision:
D31514095 (7b55dc8340)

Original commit changeset: b70c8e2c7336

fbshipit-source-id: ad4d828f33506e612b51c276149fa0e12b0565d5
2021-10-23 17:17:53 -07:00
7b55dc8340 Use kernel_func_name from aotCompiler (#66337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66337

Right now, the assembly code generated for a given method of the model is named `wrapper` or `func` by default, and the function name is only replaced with a proper kernel_func_name after the target-specific assembly is generated.
This PR propagates the desired kernel_func_name from the aotCompiler API so that the generated function gets the needed name up front and doesn't have to be renamed later.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31514095

Pulled By: priyaramani

fbshipit-source-id: b70c8e2c733600a435cd4e8b32092d37b7bf7de5
2021-10-23 02:20:45 -07:00
9e3a2babfa Make aotCompile support multiple input sizes (#66727)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66727

Make aotCompile support multiple input sizes
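
A sketch of how the semicolon-separated flag decomposes into per-input shapes (parser assumed for illustration; the real one lives in the compiler driver):

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// "2,2,2;2,2,2" -> two inputs, each of shape [2, 2, 2].
std::vector<std::vector<int64_t>> parseInputDims(const std::string& flag) {
  std::vector<std::vector<int64_t>> shapes;
  std::istringstream per_input(flag);
  std::string shape_str;
  while (std::getline(per_input, shape_str, ';')) {
    std::vector<int64_t> shape;
    std::istringstream per_dim(shape_str);
    std::string dim;
    while (std::getline(per_dim, dim, ',')) {
      shape.push_back(std::stoll(dim));
    }
    shapes.push_back(std::move(shape));
  }
  return shapes;
}
```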

Test Plan:
Able to compile and run a model with multiple inputs
```
(pytorch)  ~/fbsource/fbcode/caffe2/fb/nnc
└─ $ PYTORCH_JIT_LOG_LEVEL=aot_compiler buck run //caffe2/binaries:aot_model_compiler -- --model aot_test_model.pt --model_name=aot_test_model --model_version=v1 --input_dims="2,2,2;2,2,2"
Building: finished in 3.2 sec (100%) 7461/7461 jobs, 0/7461 updated
  Total time: 3.4 sec
BUILD SUCCEEDED
[DUMP aot_compiler.cpp:097] graph before shape propagation
[DUMP aot_compiler.cpp:097] graph(%x.1 : Tensor,
[DUMP aot_compiler.cpp:097]       %y.1 : Tensor):
[DUMP aot_compiler.cpp:097]   %3 : int = prim::Constant[value=1]() # :0:0
[DUMP aot_compiler.cpp:097]   %4 : Tensor = aten::add(%x.1, %y.1, %3) # /data/users/priyaramani/fbsource/fbcode/caffe2/test/mobile/nnc/aot_test_model.py:10:15
[DUMP aot_compiler.cpp:097]   return (%4)
(1,.,.) =
  0.3357  0.6137
  0.8472  0.0858

(2,.,.) =
  0.8406  0.2959
  0.6012  0.7184
[ CPUFloatType{2,2,2} ]
(1,.,.) =
  0.7086  0.6398
  0.0579  0.1913

(2,.,.) =
  0.8598  0.3641
  0.5925  0.0200
[ CPUFloatType{2,2,2} ]
here
2
2
graph 0x6130001ee2d0
[DUMP aot_compiler.cpp:118] graph after shape propagation
[DUMP aot_compiler.cpp:118] graph(%x.1 : Float(2, 2, 2, strides=[4, 2, 1], requires_grad=0, device=cpu),
[DUMP aot_compiler.cpp:118]       %y.1 : Float(2, 2, 2, strides=[4, 2, 1], requires_grad=0, device=cpu)):
[DUMP aot_compiler.cpp:118]   %3 : int = prim::Constant[value=1]() # :0:0
[DUMP aot_compiler.cpp:118]   %4 : Tensor(2, 2, 2) = aten::add(%x.1, %y.1, %3) # /data/users/priyaramani/fbsource/fbcode/caffe2/test/mobile/nnc/aot_test_model.py:10:15
[DUMP aot_compiler.cpp:118]   return (%4)
The compiled llvm assembly code was saved to aot_test_model.compiled.ll
The compiled model was saved to aot_test_model.compiled.pt

└─ $ ./compile_model.sh -m aot_test_model -p /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.pt -v v1 -i "2,2,2;2,2,2"
+ VERSION=v1
+ getopts m:p:v:i:h opt
+ case $opt in
+ MODEL=aot_test_model
+ getopts m:p:v:i:h opt
+ case $opt in
+ MODEL_PATH=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.pt
+ getopts m:p:v:i:h opt
+ case $opt in
+ VERSION=v1
+ getopts m:p:v:i:h opt
+ case $opt in
+ INPUT_DIMS='2,2,2;2,2,2'
+ getopts m:p:v:i:h opt
+ require_arg m aot_test_model
+ '[' -n aot_test_model ']'
+ require_arg p /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.pt
+ '[' -n /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.pt ']'
+ require_arg i '2,2,2;2,2,2'
+ '[' -n '2,2,2;2,2,2' ']'
+ '[' '!' -f /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.pt ']'
+++ dirname ./compile_model.sh
++ cd .
++ pwd -P
+ SRC_DIR=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc
+ FBCODE_DIR=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/../../..
+ FBSOURCE_DIR=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/../../../..
+ KERNEL_DIR=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/../../../../xplat/pytorch_models/build/aot_test_model/v1/nnc
++ echo /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.pt
++ sed 's/.pt.*//'
+ MODEL_PATH_PREFIX=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model
+ LLVM_CODE_PATH=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.compiled.ll
+ ASSEMBLY_CODE_PATH=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.compiled.s
+ COMPILED_MODEL_FILE_PATH=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.compiled.pt
+ KERNEL_FUNC_NAME=nnc_aot_test_model_v1_forward
+ cd /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/../../../..
+ buck run //xplat/caffe2/fb/lite_predictor:lite_predictor_nnc -- --model /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.compiled.pt --print_output true --input_dims '2,2,2;2,2,2' --input_type 'float;float' --input_memory_format 'contiguous_format;contiguous_format'
clang-9: warning: argument unused during compilation: '-pthread' [-Wunused-command-line-argument]

Downloaded 1/4 artifacts, 2.11 Kbytes, 50.0% cache miss (for updated rules)
Building: finished in 12.2 sec (100%) 4572/4572 jobs, 3/4572 updated
  Total time: 12.2 sec
BUILD SUCCEEDED
Run with 56 threads
Run with 56 threads
Loading model...
Model loaded: /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.compiled.pt
Running forward ...
(1,.,.) =
 -0.7451 -0.7451
 -0.7451 -0.7451

(2,.,.) =
 -0.7451 -0.7451
 -0.7451 -0.7451
[ CPUFloatType{2,2,2} ]
Starting benchmark.
Running warmup runs.
Main runs.
Main run finished. Milliseconds per iter: 0.0887. Iters per second: 11274
Memory usage before main runs: 71262208 bytes
Memory usage after main runs: 71573504 bytes
Average memory increase per iter: 31129.6 bytes
0 value means "not available" in above
```

Reviewed By: ljk53

Differential Revision: D31631975

fbshipit-source-id: 7956787b3e121f9c14f4733398a64c2f7ae84373
2021-10-16 20:04:52 -07:00
962c6476da Refactor: move method to func compilation work to compileMethod, add option to specify method name (#66726)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66726

Move the method-to-function compilation work into compileMethod.

Test Plan:
Mobilenetv3 compiles and runs successfully
```
(pytorch)  ~/fbsource/fbcode/caffe2/fb/nnc
└─ $ buck run //caffe2/binaries:aot_model_compiler -- --model mobilenetv3.pt --model_name=pytorch_dev_mobilenetv3 --model_version=v1 --input_dims="1,3,224,224"
Downloaded 0/4 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 13.2 sec (100%) 18719/18719 jobs, 2/18719 updated
  Total time: 13.5 sec
BUILD SUCCEEDED
The compiled llvm assembly code was saved to mobilenetv3.compiled.ll
The compiled model was saved to mobilenetv3.compiled.pt
```

Reviewed By: ljk53, IvanKobzarev

Differential Revision: D31624342

fbshipit-source-id: 233a6e94ea05ba8d6fc166d2414034c9e58cb076
2021-10-16 20:03:24 -07:00
2f099c7555 Revert D30652629: use irange for loops
Test Plan: revert-hammer

Differential Revision:
D30652629 (687c2267d4)

Original commit changeset: 0ae6c4bbbb55

fbshipit-source-id: 5c4f067b584a021c8c9656454d1ee60999600fb3
2021-10-15 15:23:10 -07:00
687c2267d4 use irange for loops (#66234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop-upgrader script (D28874212), with some modifications to exclude all files under /torch/jit, plus a number of hand-written reversions and unused-variable warning suppressions.

bypass_size_limit
allow-large-files

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D30652629

fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
2021-10-15 13:50:33 -07:00
0cad2c0615 Move intraop_launch_future from Parallel.h (#64166)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64166

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30728585

Pulled By: dagitses

fbshipit-source-id: 75a41418ae9218bec9bac27597051295222b6eee
2021-10-08 09:07:35 -07:00
78209b93b3 Don't build shared library for AOT Compiler (#66227)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66227

Building a shared library for the AOT Compiler is not necessary, as it's included in libtorch. Having it built as a shared library was also affecting Android builds, and we don't need to build the AOT Compiler for mobile builds.

Before fix:
```
(pytorch)  ~/local/pytorch master
└─ $ ANDROID_NDK=/opt/android_ndk/r20/ BUILD_PYTORCH_MOBILE=1 ANDROID_ABI=armeabi-v7a ./scripts/build_android.sh -DBUILD_BINARY=ON
Build with ANDROID_ABI[armeabi-v7a], ANDROID_NATIVE_API_LEVEL[21]
Bash: GNU bash, version 5.0.11(1)-release (x86_64-redhat-linux-gnu)
Python: 3.9.7 (default, Sep 16 2021, 13:09:58)
[GCC 7.5.0]
Caffe2 path: /data/users/priyaramani/pytorch
Using Android NDK at /opt/android_ndk/r20/
.
.
FAILED: lib/libaot_compiler.so
: && /opt/android_ndk/r20/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++ --target=armv7-none-linux-androideabi21 --gcc-toolchain=/opt/android_ndk/r20/toolchains/llvm/prebuilt/linux-x86_64 --sysroot=/opt/android_ndk/r20/toolchains/llvm/prebuilt/linux-x86_64/sysroot -fPIC -g -DANDROID -fdata-sections -ffunction-sections -funwind-tables -fstack-protector-strong -no-canonical-prefixes -fno-addrsig -march=armv7-a -mthumb -Wa,--noexecstack -Wformat -Werror=format-security -frtti -fexceptions -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DBUILD_LITE_INTERPRETER -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-typedef-redefinition -Wno-unknown-warning-option -Wno-unused-private-field -Wno-inconsistent-missing-override -Wno-aligned-allocation-unavailable -Wno-c++14-extensions -Wno-constexpr-not-const -Wno-missing-braces -Qunused-arguments -fcolor-diagnostics -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -g0 -Oz -DNDEBUG -Wl,--exclude-libs,libgcc.a -Wl,--exclude-libs,libatomic.a -static-libstdc++ -Wl,--build-id -Wl,--warn-shared-textrel -Wl,--fatal-warnings -Wl,--exclude-libs,libunwind.a -Wl,--no-undefined -Qunused-arguments -Wl,-z,noexecstack -rdynamic -shared -Wl,-soname,libaot_compiler.so -o lib/libaot_compiler.so caffe2/torch/CMakeFiles/aot_compiler.dir/csrc/jit/mobile/nnc/aot_compiler.cpp.o -latomic -lm && :
caffe2/torch/CMakeFiles/aot_compiler.dir/csrc/jit/mobile/nnc/aot_compiler.cpp.o:aot_compiler.cpp:function at::from_blob(void*, c10::ArrayRef<long long>, c10::TensorOptions const&): error: undefined reference to 'at::TensorMaker::make_tensor()'
.
.
caffe2/torch/CMakeFiles/aot_compiler.dir/csrc/jit/mobile/nnc/aot_compiler.cpp.o:aot_compiler.cpp:function torch::jit::mobile::nnc::Function::Function(): error: undefined reference to 'c10::AnyType::get()'
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
```

After fix:
```
(pytorch)  ~/local/pytorch master
└─ $ ANDROID_NDK=/opt/android_ndk/r20/ BUILD_PYTORCH_MOBILE=1 ANDROID_ABI=armeabi-v7a ./scripts/build_android.sh -DBUILD_BINARY=ON
Build with ANDROID_ABI[armeabi-v7a], ANDROID_NATIVE_API_LEVEL[21]
Bash: GNU bash, version 5.0.11(1)-release (x86_64-redhat-linux-gnu)
Python: 3.9.7 (default, Sep 16 2021, 13:09:58)
[GCC 7.5.0]
Caffe2 path: /data/users/priyaramani/pytorch
Using Android NDK at /opt/android_ndk/r20/
.
.
-- Build files have been written to: /data/users/priyaramani/pytorch/build_android
Will install headers and libs to /data/users/priyaramani/pytorch/build_android/install for further Android project usage.
[2/3] Install the project...
-- Install configuration: "Release"
Installation completed, now you can copy the headers/libs from /data/users/priyaramani/pytorch/build_android/install to your Android project directory.
```

Test Plan: Imported from OSS

Reviewed By: ljk53, axitkhurana

Differential Revision: D31450970

Pulled By: priyaramani

fbshipit-source-id: 87e48033f1db46fef112bae1239a09a2365620d2
2021-10-06 15:57:32 -07:00
df475aa1dc Update Vulkan runner in benchmark binary to handle non-tensor inputs (#66123)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66123

Some models take a list of tensors as input, so the bundled inputs will contain `IValue`s of type `c10::List`. For Vulkan models, every tensor in the `IValue` list has to be converted to a Vulkan tensor first, a case not currently handled by the Vulkan model wrapper in the benchmark binary.

This diff introduces `IValue` type checking to the input processor of the Vulkan model wrapper, and adds support for Tensor and List types.
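
A loose sketch of the type dispatch described above (function name and exact handling assumed):

```cpp
#include <ATen/ATen.h>
#include <ATen/core/ivalue.h>

// Tensors move to the Vulkan backend directly; tensor lists convert
// element-wise. Anything else is rejected.
c10::IValue toVulkanInput(const c10::IValue& iv) {
  if (iv.isTensor()) {
    return iv.toTensor().vulkan();
  }
  if (iv.isTensorList()) {
    c10::List<at::Tensor> converted;
    for (at::Tensor t : iv.toTensorList()) {
      converted.push_back(t.vulkan());
    }
    return converted;
  }
  TORCH_CHECK(false, "Unsupported input IValue type for the Vulkan wrapper");
}
```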

Test Plan:
```
# Build the binary
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:ptmobile_compareAndroid\#android-arm64 --show-output
# Push it to the device
adb push buck-out/gen/xplat/caffe2/ptmobile_compareAndroid\#android-arm64 /data/local/tmp/compare_models

# Run the benchmark binary
BENCH_CMD="/data/local/tmp/compare_models"
BENCH_CMD+=" --model=$PATH_TO_MODEL"
BENCH_CMD+=" --refmodel=$PATH_TO_REFERENCE_MODEL"
BENCH_CMD+=" --input_type=float --input_dims=$MODEL_INPUT_SIZE"
BENCH_CMD+=" --iter=100"
BENCH_CMD+=" --tolerance 1e-5"
```

Reviewed By: beback4u

Differential Revision: D31276862

fbshipit-source-id: 1d9abf958963da6ecad641202f0458402bee5ced
2021-10-05 07:59:56 -07:00
63bb7c6dba Refactor AotCompile to return a pair (#65707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65707

Refactoring aotCompile to return a pair of the compiled function and the LLVM assembly, instead of updating an incoming string with the assembly code.

Testing: Gives expected results when compiled and run
```
(pytorch)  ~/local/pytorch refactor_aot
└─ $ build/bin/aot_model_compiler --model mobilenetv3.pt --model_name=pytorch_dev_mobilenetv3 --model_version=v1 --input_dims="2,2,2"
The compiled model was saved to mobilenetv3.compiled.pt
```

Test Plan: Imported from OSS

Reviewed By: qihqi

Differential Revision: D31220452

Pulled By: priyaramani

fbshipit-source-id: f957c53ba83f876a2e7dbdd4b4571a760b3b6a9a
2021-09-27 18:56:04 -07:00
f101070587 Small improvements to compare_models_torch binary (#65171)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65171

Add the model comparison binary to BUCK, and also add some quality-of-life features such as control over the input range.
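
One such quality-of-life feature, sketched as a hypothetical helper (the binary's actual flag plumbing differs): scaling uniform noise into a caller-specified range.

```cpp
#include <torch/torch.h>

// Generate an input tensor with values in [range_min, range_max].
at::Tensor make_input(
    at::IntArrayRef dims, double range_min, double range_max) {
  return torch::rand(dims) * (range_max - range_min) + range_min;
}
```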

Test Plan:
```
# Build the binary
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:ptmobile_compareAndroid\#android-arm64 --show-ou
# Push it to the device
adb push buck-out/gen/xplat/caffe2/ptmobile_compareAndroid\#android-arm64 /data/local/tmp/compare_models

# Run the benchmark binary
BENCH_CMD="/data/local/tmp/compare_models"
BENCH_CMD+=" --model=$PATH_TO_MODEL"
BENCH_CMD+=" --refmodel=$PATH_TO_REFERENCE_MODEL"
BENCH_CMD+=" --input_type=float --input_dims=$MODEL_INPUT_SIZE"
BENCH_CMD+=" --iter=100"
BENCH_CMD+=" --tolerance 1e-5"

```

Reviewed By: beback4u

Differential Revision: D30371322

fbshipit-source-id: 5e520aaf119c90985a1d5a135f76e4057148333b
2021-09-17 08:32:45 -07:00
206646d6ed Add NNC AOT Compiler executable (#63994)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63994

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30582149

Pulled By: priyaramani

fbshipit-source-id: 3bbf085428824c3cb308e006c18bb0a57f50fef6
2021-09-15 19:18:24 -07:00
07e41cf2d7 [easy]Unbreak caffe2benchmarking build (#63655)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63655

ghstack-source-id: 136324310

Test Plan: buck build //fbobjc/Apps/Internal/Caffe2Benchmarking:Caffe2Benchmarking fbobjc/mode/iphonesimulator

Reviewed By: hl475, JacobSzwejbka

Differential Revision: D30455659

fbshipit-source-id: b6da6be4f89b6e84753ef0849ffedea04785034a
2021-08-20 12:57:27 -07:00
1022443168 Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: revert-hammer

Differential Revision:
D30279364 (b004307252)

Original commit changeset: c1ed77dfe43a

fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e
2021-08-12 11:45:01 -07:00
b004307252 [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: manual inspection & sandcastle

Reviewed By: zertosh

Differential Revision: D30279364

fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a
2021-08-12 10:58:35 -07:00
3a0801f960 [skip ci] Fix "arugment" typos (#61459)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61455.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61459

Reviewed By: soulitzer

Differential Revision: D29636559

Pulled By: samestep

fbshipit-source-id: 9ad65265c0491d9e81bb303abe3a07c6843bfa4a
2021-07-15 15:20:18 -07:00
808d0e3353 [caffe2] update make_mnist_db and make_image_db to move strings into DB::Put() (#60919)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60919

Update make_mnist_db.cc and make_image_db.cc to work with the DB API changes
in D29204425 (00896cb9ed).  This is similar to the changes to make_cifar_db.cc landed in
D29374754 (394f60b0fc).
ghstack-source-id: 132621346

Test Plan: buck build caffe2/binaries/...

Reviewed By: valmikir

Differential Revision: D29447314

fbshipit-source-id: 33aff85c24d8b785211287de23d46704c7eb0726
2021-06-29 11:52:43 -07:00