Summary:
1. Add one more model lib dep.
2. Add an error message for when TorchScript fails to find a class in a Python compilation unit.
Test Plan: CI
Reviewed By: jingsh
Differential Revision: D59243250
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129897
Approved by: https://github.com/jingsh
The important comment:
```
# Whenever we allocate a fresh unbacked Symbol, we add it to this
# pending list. Unbacked symbol allocation can occur at unpredictable
# points during meta tensor propagation, but at some point, we
# have to know what the binding site for an unbacked symbol is, and
# this is computed when we actually place the node in the graph. The
# important thing is that we always actually handle every unaccounted
# for unbacked symbol, so this list helps us keep track of them and
# then make sure they are all accounted for.
#
# We could potentially give rise to errors earlier by lexically
# scoping when we do propagation, and only allowing unbacked symbols
# to be allocated at this point in time. However this is inconvenient
# to do in Dynamo, because fake tensor propagation is far from when we
# analyze binding sites (set_example_value), so we do it in a more
# mutatey way.
#
# NB: fresh unbacked symbols NEVER get substitutions applied to them,
# they are binding sites!
```
`compute_unbacked_bindings` is the other half of the equation: the thing that actually consumes `pending_fresh_unbacked_symbols` and does something with them. Important comment:
```
After having run fake tensor propagation and producing example_value
result, traverse example_value looking for freshly bound unbacked
symbols and record their paths for later. It is an error if
we have allocated an unbacked SymInt but it cannot be found in
example_value. (NB: this means if you have a multi-output
function, you must call this on the tuple of tensor output, you
cannot wait!)
```
For example, if I return a tensor with size `[u0, u1]`, and u1 is a fresh unbacked SymInt, then I'll have `{u1: KeyPath(".size(1)")}`, telling me I can get u1 by running `size(1)` on the result of this node. u0 is not fresh (it probably flowed in as an argument), so I don't generate a binding for it.
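As a hedged, standalone sketch of how such a recorded binding could be consumed (the dict and lambda here are illustrative stand-ins for the real keypaths, not Dynamo internals):
```py
import torch

# Hypothetical binding for the example above: u1 is recovered by calling
# .size(1) on the node's example_value.
unbacked_bindings = {"u1": lambda t: t.size(1)}

example_value = torch.empty(3, 5)  # pretend size(1) is the fresh u1
resolved = {s: path(example_value) for s, path in unbacked_bindings.items()}
print(resolved)  # {'u1': 5}
```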
I eventually intend to propagate this information all the way to Inductor lowering, where extra metadata about unbacked symbol binding will be canonically used for codegen, instead of trying to infer it from defs/uses.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124290
Approved by: https://github.com/lezcano
We override the `__call__` method and register default implementations for the fake, functional, and proxy dispatch modes in its `python_key_mode_table`.
The idea is:
1. when the inputs contain a FakeScriptObject, we dispatch through the `_get_dispatch` mechanism. The dispatch mode keys are implemented automatically in the operator's constructor.
2. when the inputs are not fakified, we dispatch through the original C++ dispatcher.
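A minimal sketch of that split (FakeScriptObject and the mode table here are local stand-ins, not the real torch internals):
```py
class FakeScriptObject:  # local stand-in for the fakified script object
    pass

def call_op(mode_table, cxx_dispatch, current_mode, *args):
    # Inputs containing a FakeScriptObject route through the Python-side
    # dispatch mode table...
    if any(isinstance(a, FakeScriptObject) for a in args):
        return mode_table[current_mode](*args)
    # ...otherwise fall through to the original C++ dispatcher.
    return cxx_dispatch(*args)

table = {"fake": lambda *a: "handled by fake-mode impl"}
print(call_op(table, lambda *a: "handled by C++", "fake", FakeScriptObject()))
print(call_op(table, lambda *a: "handled by C++", "fake", 1, 2))
```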
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123367
Approved by: https://github.com/zou3519
This PR also contains a basket of fixes that turned up now that we test more arguments with SymInt. I fixed as many of the easy ones as I could earlier in this stack, plus a bunch here, but there are some more annoying ones that I xfailed.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113452
Approved by: https://github.com/Chillee
ghstack dependencies: #113877, #113911
This moves the `overloaded_args` field from FunctionSignature to PythonArgs. FunctionSignature is shared by all calls and should be immutable. PythonArgs contains the parsing results for a single call to the PyTorch API.
I did not measure a difference in performance in the "overrides_benchmark", although I expect there to be a bit more work in the common case. Note that the noise factor for the benchmark is much larger than the differences reported below:
Before:
```
Type tensor had a minimum time of 2.3615360260009766 us and a standard deviation of 0.7833134150132537 us.
Type SubTensor had a minimum time of 10.473251342773438 us and a standard deviation of 0.1973132457351312 us.
Type WithTorchFunction had a minimum time of 5.484819412231445 us and a standard deviation of 0.13305981701705605 us.
Type SubWithTorchFunction had a minimum time of 11.098146438598633 us and a standard deviation of 0.15598918253090233 us.
```
After:
```
Type tensor had a minimum time of 2.2134780883789062 us and a standard deviation of 0.802064489107579 us.
Type SubTensor had a minimum time of 10.625839233398438 us and a standard deviation of 0.15155907021835446 us.
Type WithTorchFunction had a minimum time of 5.520820617675781 us and a standard deviation of 0.23115111980587244 us.
Type SubWithTorchFunction had a minimum time of 11.227846145629883 us and a standard deviation of 0.23032321769278497 us.
```
Fixes #106974
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106983
Approved by: https://github.com/zou3519, https://github.com/ezyang, https://github.com/albanD
arguments() returns a reference to a vector member of the object returned by the schema() call.
When the object returned by schema() is destroyed, the vector is deallocated as well;
its lifetime isn't extended.
This issue was detected while running `pytest -v test/mobile/test_lite_script_type.py -k test_nest_typing_namedtuple_custom_classtype` with ASAN.
<details>
<summary>ASAN output</summary>
```
==1134126==ERROR: AddressSanitizer: heap-use-after-free on address 0x60d0005a5790 at pc 0x03ff844488d8 bp 0x03fff584afe8 sp 0x03fff584afd8
READ of size 8 at 0x60d0005a5790 thread T0
#0 0x3ff844488d7 in __gnu_cxx::__normal_iterator<c10::Argument const*, std::vector<c10::Argument, std::allocator<c10::Argument> > >::__normal_iterator(c10::Argument const* const&) /usr/lib/gcc/s390x-i
bm-linux-gnu/11/include/g++-v11/bits/stl_iterator.h:1028
#1 0x3ff8444293f in std::vector<c10::Argument, std::allocator<c10::Argument> >::begin() const /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/stl_vector.h:821
#2 0x3ff84d807d1 in torch::jit::toPyObject(c10::IValue) /home/user/pytorch/torch/csrc/jit/python/pybind_utils.cpp:617
#3 0x3ff84d80305 in torch::jit::toPyObject(c10::IValue) /home/user/pytorch/torch/csrc/jit/python/pybind_utils.cpp:604
#4 0x3ff84856871 in pybind11::detail::type_caster<c10::IValue, void>::cast(c10::IValue, pybind11::return_value_policy, pybind11::handle) /home/user/pytorch/torch/csrc/jit/python/pybind.h:138
#5 0x3ff85318191 in pybind11::cpp_function::initialize<torch::jit::initJitScriptBindings(_object*)::$_45, c10::IValue, torch::jit::mobile::Module&, pybind11::tuple const&, pybind11::name, pybind11::is
_method, pybind11::sibling, pybind11::arg>(torch::jit::initJitScriptBindings(_object*)::$_45&&, c10::IValue (*)(torch::jit::mobile::Module&, pybind11::tuple const&), pybind11::name const&, pybind11::is_me
thod const&, pybind11::sibling const&, pybind11::arg const&)::{lambda(pybind11::detail::function_call&)#1}::operator()(pybind11::detail::function_call&) const /home/user/pytorch/cmake/../third_party/pybin
d11/include/pybind11/pybind11.h:249
#6 0x3ff85317cfd in pybind11::cpp_function::initialize<torch::jit::initJitScriptBindings(_object*)::$_45, c10::IValue, torch::jit::mobile::Module&, pybind11::tuple const&, pybind11::name, pybind11::is
_method, pybind11::sibling, pybind11::arg>(torch::jit::initJitScriptBindings(_object*)::$_45&&, c10::IValue (*)(torch::jit::mobile::Module&, pybind11::tuple const&), pybind11::name const&, pybind11::is_me
thod const&, pybind11::sibling const&, pybind11::arg const&)::{lambda(pybind11::detail::function_call&)#1}::__invoke(pybind11::detail::function_call&) /home/user/pytorch/cmake/../third_party/pybind11/incl
ude/pybind11/pybind11.h:224
#7 0x3ff82ee52e9 in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) /home/user/pytorch/cmake/../third_party/pybind11/include/pybind11/pybind11.h:929
#8 0x3ffab002903 in cfunction_call Objects/methodobject.c:543
#9 0x3ffaaf8a933 in _PyObject_MakeTpCall Objects/call.c:215
#10 0x3ffaaf8e919 in _PyObject_VectorcallTstate Include/cpython/abstract.h:112
#11 0x3ffaaf8eddd in method_vectorcall Objects/classobject.c:53
#12 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#13 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#14 0x3ffab105447 in call_function Python/ceval.c:5891
#15 0x3ffab0ff779 in _PyEval_EvalFrameDefault Python/ceval.c:4181
#16 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#17 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
#18 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#19 0x3ffaaf8a615 in _PyObject_FastCallDictTstate Objects/call.c:142
#20 0x3ffaaf8b271 in _PyObject_Call_Prepend Objects/call.c:431
#21 0x3ffab03f307 in slot_tp_call Objects/typeobject.c:7494
#22 0x3ffaaf8a933 in _PyObject_MakeTpCall Objects/call.c:215
#23 0x3ffab0f0081 in _PyObject_VectorcallTstate Include/cpython/abstract.h:112
#24 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#25 0x3ffab105447 in call_function Python/ceval.c:5891
#26 0x3ffab0ff905 in _PyEval_EvalFrameDefault Python/ceval.c:4213
#27 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#28 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
#29 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#30 0x3ffaaf8e941 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#31 0x3ffaaf8eddd in method_vectorcall Objects/classobject.c:53
#32 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#33 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#34 0x3ffab105447 in call_function Python/ceval.c:5891
#35 0x3ffab0ff905 in _PyEval_EvalFrameDefault Python/ceval.c:4213
#36 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#37 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
#38 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#39 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#40 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#41 0x3ffab105447 in call_function Python/ceval.c:5891
#42 0x3ffab0ff7d7 in _PyEval_EvalFrameDefault Python/ceval.c:4198
#43 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#44 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
#45 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#46 0x3ffaaf8e941 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#47 0x3ffaaf8eddd in method_vectorcall Objects/classobject.c:53
#48 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#49 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#50 0x3ffab105447 in call_function Python/ceval.c:5891
#51 0x3ffab0ffa57 in _PyEval_EvalFrameDefault Python/ceval.c:4231
#52 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#53 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
#54 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#55 0x3ffaaf8e941 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#56 0x3ffaaf8eddd in method_vectorcall Objects/classobject.c:53
#57 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#58 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#59 0x3ffab105447 in call_function Python/ceval.c:5891
#60 0x3ffab0ffa57 in _PyEval_EvalFrameDefault Python/ceval.c:4231
#61 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#62 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
#63 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#64 0x3ffaaf8e941 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#65 0x3ffaaf8eddd in method_vectorcall Objects/classobject.c:53
#66 0x3ffaaf8ab9b in PyVectorcall_Call Objects/call.c:267
#67 0x3ffaaf8ac65 in _PyObject_Call Objects/call.c:290
#68 0x3ffaaf8ada9 in PyObject_Call Objects/call.c:317
#69 0x3ffab1059c7 in do_call_core Python/ceval.c:5943
#70 0x3ffab0ffd39 in _PyEval_EvalFrameDefault Python/ceval.c:4277
#71 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#72 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
#73 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#74 0x3ffaaf8a695 in _PyObject_FastCallDictTstate Objects/call.c:153
#75 0x3ffaaf8b271 in _PyObject_Call_Prepend Objects/call.c:431
#76 0x3ffab03f307 in slot_tp_call Objects/typeobject.c:7494
#77 0x3ffaaf8a933 in _PyObject_MakeTpCall Objects/call.c:215
#78 0x3ffab0f0081 in _PyObject_VectorcallTstate Include/cpython/abstract.h:112
#79 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#80 0x3ffab105447 in call_function Python/ceval.c:5891
#81 0x3ffab0ffa57 in _PyEval_EvalFrameDefault Python/ceval.c:4231
#82 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#83 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
#84 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#85 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#86 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#87 0x3ffab105447 in call_function Python/ceval.c:5891
#88 0x3ffab0ff7d7 in _PyEval_EvalFrameDefault Python/ceval.c:4198
#89 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#90 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
#91 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#92 0x3ffaaf8ab15 in PyVectorcall_Call Objects/call.c:255
#93 0x3ffaaf8ac65 in _PyObject_Call Objects/call.c:290
#94 0x3ffaaf8ada9 in PyObject_Call Objects/call.c:317
#95 0x3ffab1059c7 in do_call_core Python/ceval.c:5943
#96 0x3ffab0ffd39 in _PyEval_EvalFrameDefault Python/ceval.c:4277
#97 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#98 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
#99 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#100 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#101 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#102 0x3ffab105447 in call_function Python/ceval.c:5891
#103 0x3ffab0ff779 in _PyEval_EvalFrameDefault Python/ceval.c:4181
#104 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#105 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
#106 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#107 0x3ffaaf8e941 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#108 0x3ffaaf8eddd in method_vectorcall Objects/classobject.c:53
#109 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#110 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#111 0x3ffab105447 in call_function Python/ceval.c:5891
#112 0x3ffab0ff779 in _PyEval_EvalFrameDefault Python/ceval.c:4181
#113 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#114 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
#115 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#116 0x3ffaaf8a695 in _PyObject_FastCallDictTstate Objects/call.c:153
#117 0x3ffaaf8b271 in _PyObject_Call_Prepend Objects/call.c:431
#118 0x3ffab03f307 in slot_tp_call Objects/typeobject.c:7494
#119 0x3ffaaf8ad17 in _PyObject_Call Objects/call.c:305
#120 0x3ffaaf8ada9 in PyObject_Call Objects/call.c:317
#121 0x3ffab1059c7 in do_call_core Python/ceval.c:5943
#122 0x3ffab0ffd39 in _PyEval_EvalFrameDefault Python/ceval.c:4277
#123 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#124 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
#125 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#126 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#127 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#128 0x3ffab105447 in call_function Python/ceval.c:5891
#129 0x3ffab0ff905 in _PyEval_EvalFrameDefault Python/ceval.c:4213
#130 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#131 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
#132 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#133 0x3ffaaf8e941 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#134 0x3ffaaf8eddd in method_vectorcall Objects/classobject.c:53
#135 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#136 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#137 0x3ffab105447 in call_function Python/ceval.c:5891
#138 0x3ffab0ffa57 in _PyEval_EvalFrameDefault Python/ceval.c:4231
#139 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#140 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
#141 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#142 0x3ffaaf8ab15 in PyVectorcall_Call Objects/call.c:255
#143 0x3ffaaf8ac65 in _PyObject_Call Objects/call.c:290
#144 0x3ffaaf8ada9 in PyObject_Call Objects/call.c:317
#145 0x3ffab1059c7 in do_call_core Python/ceval.c:5943
#146 0x3ffab0ffd39 in _PyEval_EvalFrameDefault Python/ceval.c:4277
#147 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#148 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
#149 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#150 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#151 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#152 0x3ffab105447 in call_function Python/ceval.c:5891
#153 0x3ffab0ff905 in _PyEval_EvalFrameDefault Python/ceval.c:4213
#154 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#155 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
#156 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#157 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#158 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#159 0x3ffab105447 in call_function Python/ceval.c:5891
#160 0x3ffab0ffa57 in _PyEval_EvalFrameDefault Python/ceval.c:4231
#161 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#162 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
#163 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#164 0x3ffaaf8ab15 in PyVectorcall_Call Objects/call.c:255
#165 0x3ffaaf8ac65 in _PyObject_Call Objects/call.c:290
#166 0x3ffaaf8ada9 in PyObject_Call Objects/call.c:317
#167 0x3ffab1059c7 in do_call_core Python/ceval.c:5943
#168 0x3ffab0ffd39 in _PyEval_EvalFrameDefault Python/ceval.c:4277
#169 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#170 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
#171 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#172 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#173 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#174 0x3ffab105447 in call_function Python/ceval.c:5891
#175 0x3ffab0ff779 in _PyEval_EvalFrameDefault Python/ceval.c:4181
#176 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#177 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
#178 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#179 0x3ffaaf8e941 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#180 0x3ffaaf8eddd in method_vectorcall Objects/classobject.c:53
#181 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#182 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#183 0x3ffab105447 in call_function Python/ceval.c:5891
#184 0x3ffab0ff779 in _PyEval_EvalFrameDefault Python/ceval.c:4181
#185 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#186 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
#187 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#188 0x3ffaaf8a695 in _PyObject_FastCallDictTstate Objects/call.c:153
#189 0x3ffaaf8b271 in _PyObject_Call_Prepend Objects/call.c:431
#190 0x3ffab03f307 in slot_tp_call Objects/typeobject.c:7494
#191 0x3ffaaf8a933 in _PyObject_MakeTpCall Objects/call.c:215
#192 0x3ffab0f0081 in _PyObject_VectorcallTstate Include/cpython/abstract.h:112
#193 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#194 0x3ffab105447 in call_function Python/ceval.c:5891
#195 0x3ffab0ffa57 in _PyEval_EvalFrameDefault Python/ceval.c:4231
#196 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#197 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
#198 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#199 0x3ffaaf8ab15 in PyVectorcall_Call Objects/call.c:255
#200 0x3ffaaf8ac65 in _PyObject_Call Objects/call.c:290
#201 0x3ffaaf8ada9 in PyObject_Call Objects/call.c:317
#202 0x3ffab1059c7 in do_call_core Python/ceval.c:5943
#203 0x3ffab0ffd39 in _PyEval_EvalFrameDefault Python/ceval.c:4277
#204 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#205 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
#206 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#207 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#208 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#209 0x3ffab105447 in call_function Python/ceval.c:5891
#210 0x3ffab0ff779 in _PyEval_EvalFrameDefault Python/ceval.c:4181
#211 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#212 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
#213 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#214 0x3ffaaf8e941 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#215 0x3ffaaf8eddd in method_vectorcall Objects/classobject.c:53
#216 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#217 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#218 0x3ffab105447 in call_function Python/ceval.c:5891
#219 0x3ffab0ff779 in _PyEval_EvalFrameDefault Python/ceval.c:4181
#220 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#221 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
#222 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#223 0x3ffaaf8a695 in _PyObject_FastCallDictTstate Objects/call.c:153
#224 0x3ffaaf8b271 in _PyObject_Call_Prepend Objects/call.c:431
#225 0x3ffab03f307 in slot_tp_call Objects/typeobject.c:7494
#226 0x3ffaaf8a933 in _PyObject_MakeTpCall Objects/call.c:215
#227 0x3ffab0f0081 in _PyObject_VectorcallTstate Include/cpython/abstract.h:112
#228 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#229 0x3ffab105447 in call_function Python/ceval.c:5891
#230 0x3ffab0ffa57 in _PyEval_EvalFrameDefault Python/ceval.c:4231
#231 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#232 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
#233 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#234 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#235 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#236 0x3ffab105447 in call_function Python/ceval.c:5891
#237 0x3ffab0ff905 in _PyEval_EvalFrameDefault Python/ceval.c:4213
#238 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#239 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
#240 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#241 0x3ffab0f00a9 in _PyObject_VectorcallTstate Include/cpython/abstract.h:114
#242 0x3ffab0f013d in PyObject_Vectorcall Include/cpython/abstract.h:123
#243 0x3ffab105447 in call_function Python/ceval.c:5891
#244 0x3ffab0ff905 in _PyEval_EvalFrameDefault Python/ceval.c:4213
#245 0x3ffab0f052b in _PyEval_EvalFrame Include/internal/pycore_ceval.h:46
#246 0x3ffab102b67 in _PyEval_Vector Python/ceval.c:5065
#247 0x3ffaaf8aec1 in _PyFunction_Vectorcall Objects/call.c:342
#248 0x3ffaaf8ab15 in PyVectorcall_Call Objects/call.c:255
#249 0x3ffaaf8ac65 in _PyObject_Call Objects/call.c:290
0x60d0005a5790 is located 80 bytes inside of 136-byte region [0x60d0005a5740,0x60d0005a57c8)
freed by thread T0 here:
#0 0x3ffab537de5 in operator delete(void*) /var/tmp/portage/sys-devel/gcc-11.3.1_p20230303/work/gcc-11-20230303/libsanitizer/asan/asan_new_delete.cpp:160
#1 0x3ff55984fdb in __gnu_cxx::new_allocator<std::_Sp_counted_ptr_inplace<c10::FunctionSchema, std::allocator<c10::FunctionSchema>, (__gnu_cxx::_Lock_policy)2> >::deallocate(std::_Sp_counted_ptr_inplace<c10::FunctionSchema, std::allocator<c10::FunctionSchema>, (__gnu_cxx::_Lock_policy)2>*, unsigned long) /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/ext/new_allocator.h:145
previously allocated by thread T0 here:
#0 0x3ffab53734f in operator new(unsigned long) /var/tmp/portage/sys-devel/gcc-11.3.1_p20230303/work/gcc-11-20230303/libsanitizer/asan/asan_new_delete.cpp:99
#1 0x3ff5598443f in __gnu_cxx::new_allocator<std::_Sp_counted_ptr_inplace<c10::FunctionSchema, std::allocator<c10::FunctionSchema>, (__gnu_cxx::_Lock_policy)2> >::allocate(unsigned long, void const*) /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/ext/new_allocator.h:127
#2 0x3fff5849ecf ([stack]+0xb2ecf)
SUMMARY: AddressSanitizer: heap-use-after-free /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/stl_iterator.h:1028 in __gnu_cxx::__normal_iterator<c10::Argument const*, std::vector<c10::Argument, std::allocator<c10::Argument> > >::__normal_iterator(c10::Argument const* const&)
Shadow bytes around the buggy address:
0x100c1a000b4aa0: fd fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa
0x100c1a000b4ab0: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
0x100c1a000b4ac0: fd fd fd fd fd fa fa fa fa fa fa fa fa fa fd fd
0x100c1a000b4ad0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
0x100c1a000b4ae0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
=>0x100c1a000b4af0: fd fd[fd]fd fd fd fd fd fd fa fa fa fa fa fa fa
0x100c1a000b4b00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x100c1a000b4b10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x100c1a000b4b20: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x100c1a000b4b30: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x100c1a000b4b40: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==1134126==ABORTING
```
Additional backtraces (not full):
Allocation:
```
#0 __memset_z196 () at ../sysdeps/s390/memset-z900.S:144
#1 0x000003ff96f3072a in __asan::Allocator::Allocate (this=this@entry=0x3ff97041eb8 <__asan::instance>, size=size@entry=136, alignment=8, alignment@entry=0, stack=<optimized out>,
stack@entry=0x3ffdbb45d78, alloc_type=<optimized out>, can_fill=true) at /var/tmp/portage/sys-devel/gcc-11.3.1_p20230303/work/gcc-11-20230303/libsanitizer/asan/asan_allocator.cpp:599
#2 0x000003ff96f2c088 in __asan::asan_memalign (alignment=alignment@entry=0, size=size@entry=136, stack=stack@entry=0x3ffdbb45d78, alloc_type=alloc_type@entry=__asan::FROM_NEW)
at /var/tmp/portage/sys-devel/gcc-11.3.1_p20230303/work/gcc-11-20230303/libsanitizer/asan/asan_allocator.cpp:1039
#3 0x000003ff96fb73b0 in operator new (size=136) at /var/tmp/portage/sys-devel/gcc-11.3.1_p20230303/work/gcc-11-20230303/libsanitizer/asan/asan_new_delete.cpp:99
#4 0x000003ff41404440 in __gnu_cxx::new_allocator<std::_Sp_counted_ptr_inplace<c10::FunctionSchema, std::allocator<c10::FunctionSchema>, (__gnu_cxx::_Lock_policy)2> >::allocate (this=0x3ffdbb468c0,
__n=1) at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/ext/new_allocator.h:127
#5 0x000003ff414042a0 in std::allocator_traits<std::allocator<std::_Sp_counted_ptr_inplace<c10::FunctionSchema, std::allocator<c10::FunctionSchema>, (__gnu_cxx::_Lock_policy)2> > >::allocate (__a=...,
__n=1) at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/alloc_traits.h:464
#6 0x000003ff41403b66 in std::__allocate_guarded<std::allocator<std::_Sp_counted_ptr_inplace<c10::FunctionSchema, std::allocator<c10::FunctionSchema>, (__gnu_cxx::_Lock_policy)2> > > (__a=...)
at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/allocated_ptr.h:98
#7 0x000003ff4140372a in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<c10::FunctionSchema, std::allocator<c10::FunctionSchema>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<c10::Argument, std::allocator<c10::Argument> >, std::vector<c10::Argument, std::allocator<c10::Argument> > > (this=0x3ffdbb47888, __p=@0x3ffdbb47880: 0x0, __a=..., __args=..., __args=..., __args=..., __args=...)
at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/shared_ptr_base.h:648
#8 0x000003ff41403328 in std::__shared_ptr<c10::FunctionSchema, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<c10::FunctionSchema>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<c10::Argument, std::allocator<c10::Argument> >, std::vector<c10::Argument, std::allocator<c10::Argument> > > (this=0x3ffdbb47880, __tag=..., __args=..., __args=..., __args=..., __args=...) at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/shared_ptr_base.h:1342
#9 0x000003ff41402f06 in std::shared_ptr<c10::FunctionSchema>::shared_ptr<std::allocator<c10::FunctionSchema>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<c10::Argument, std::allocator<c10::Argument> >, std::vector<c10::Argument, std::allocator<c10::Argument> > > (
this=0x3ffdbb47880, __tag=..., __args=..., __args=..., __args=..., __args=...) at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/shared_ptr.h:409
#10 0x000003ff41402b6e in std::allocate_shared<c10::FunctionSchema, std::allocator<c10::FunctionSchema>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<c10::Argument, std::allocator<c10::Argument> >, std::vector<c10::Argument, std::allocator<c10::Argument> > > (__a=...,
__args=..., __args=..., __args=..., __args=...) at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/shared_ptr.h:862
#11 0x000003ff4140215c in std::make_shared<c10::FunctionSchema, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<c10::Argument, std::allocator<c10::Argument> >, std::vector<c10::Argument, std::allocator<c10::Argument> > > (__args=..., __args=..., __args=..., __args=...)
at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/shared_ptr.h:878
#12 0x000003ff413d180c in c10::TupleType::createWithSpec<c10::basic_string_view<char> > (qualName=..., field_names=std::vector of length 1, capacity 1 = {...},
field_types=std::vector of length 1, capacity 1 = {...}, field_defaults=std::vector of length 0, capacity 0) at /home/user/pytorch/aten/src/ATen/core/type.cpp:769
#13 0x000003ff413b9ca6 in c10::TupleType::createNamed (qualName=..., field_names=std::vector of length 1, capacity 1 = {...}, field_types=std::vector of length 1, capacity 1 = {...})
at /home/user/pytorch/aten/src/ATen/core/type.cpp:725
#14 0x000003ff4115fbac in c10::ivalue::TupleTypeFactory<c10::TupleType>::fallback (type=...) at /home/user/pytorch/aten/src/ATen/core/dynamic_type.cpp:383
#15 0x000003ff708217fe in c10::ivalue::Tuple::type<c10::TupleType> (this=0x6080004b8520) at /home/user/pytorch/aten/src/ATen/core/ivalue_inl.h:781
#16 0x000003ff70800740 in torch::jit::toPyObject (ivalue=...) at /home/user/pytorch/torch/csrc/jit/python/pybind_utils.cpp:613
#17 0x000003ff70800306 in torch::jit::toPyObject (ivalue=...) at /home/user/pytorch/torch/csrc/jit/python/pybind_utils.cpp:604
#18 0x000003ff702d6872 in pybind11::detail::type_caster<c10::IValue, void>::cast (src=...) at /home/user/pytorch/torch/csrc/jit/python/pybind.h:138
#19 0x000003ff70d98192 in pybind11::cpp_function::initialize<torch::jit::initJitScriptBindings(_object*)::$_45, c10::IValue, torch::jit::mobile::Module&, pybind11::tuple const&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg>(torch::jit::initJitScriptBindings(_object*)::$_45&&, c10::IValue (*)(torch::jit::mobile::Module&, pybind11::tuple const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&)::{lambda(pybind11::detail::function_call&)#1}::operator()(pybind11::detail::function_call&) const (this=0x3ffdbb4ca20, call=...)
at /home/user/pytorch/cmake/../third_party/pybind11/include/pybind11/pybind11.h:249
#20 0x000003ff70d97cfe in pybind11::cpp_function::initialize<torch::jit::initJitScriptBindings(_object*)::$_45, c10::IValue, torch::jit::mobile::Module&, pybind11::tuple const&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg>(torch::jit::initJitScriptBindings(_object*)::$_45&&, c10::IValue (*)(torch::jit::mobile::Module&, pybind11::tuple const&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&)::{lambda(pybind11::detail::function_call&)#1}::__invoke(pybind11::detail::function_call&) (call=...)
at /home/user/pytorch/cmake/../third_party/pybind11/include/pybind11/pybind11.h:224
#21 0x000003ff6e9652ea in pybind11::cpp_function::dispatcher (self=<PyCapsule at remote 0x3ff83e27720>,
args_in=(<torch._C.LiteScriptModule at remote 0x3ff811844b0>, (<Tensor at remote 0x3ff814efb00>,)), kwargs_in=0x0) at /home/user/pytorch/cmake/../third_party/pybind11/include/pybind11/pybind11.h:929
```
Deallocation:
```
#0 operator delete (ptr=0x60d0005a5740) at /var/tmp/portage/sys-devel/gcc-11.3.1_p20230303/work/gcc-11-20230303/libsanitizer/asan/asan_new_delete.cpp:160
#1 0x000003ff44904fdc in __gnu_cxx::new_allocator<std::_Sp_counted_ptr_inplace<c10::FunctionSchema, std::allocator<c10::FunctionSchema>, (__gnu_cxx::_Lock_policy)2> >::deallocate (this=0x3ffc5dc8020,
__p=0x60d0005a5740, __t=1) at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/ext/new_allocator.h:145
#2 0x000003ff44904fa8 in std::allocator_traits<std::allocator<std::_Sp_counted_ptr_inplace<c10::FunctionSchema, std::allocator<c10::FunctionSchema>, (__gnu_cxx::_Lock_policy)2> > >::deallocate (
__a=..., __p=0x60d0005a5740, __n=1) at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/alloc_traits.h:496
#3 0x000003ff449041f2 in std::__allocated_ptr<std::allocator<std::_Sp_counted_ptr_inplace<c10::FunctionSchema, std::allocator<c10::FunctionSchema>, (__gnu_cxx::_Lock_policy)2> > >::~__allocated_ptr (
this=0x3ffc5dc8030) at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/allocated_ptr.h:74
#4 0x000003ff44904888 in std::_Sp_counted_ptr_inplace<c10::FunctionSchema, std::allocator<c10::FunctionSchema>, (__gnu_cxx::_Lock_policy)2>::_M_destroy (this=0x60d0005a5740)
at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/shared_ptr_base.h:538
#5 0x000003ff43895a62 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x60d0005a5740) at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/shared_ptr_base.h:184
#6 0x000003ff43895420 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x611000c40648) at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/shared_ptr_base.h:705
#7 0x000003ff4466e7f4 in std::__shared_ptr<c10::FunctionSchema, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x611000c40640)
at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/shared_ptr_base.h:1154
#8 0x000003ff4466d820 in std::shared_ptr<c10::FunctionSchema>::~shared_ptr (this=0x611000c40640) at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/shared_ptr.h:122
#9 0x000003ff448d82f6 in c10::TupleType::~TupleType (this=0x611000c40580) at /home/user/pytorch/aten/src/ATen/core/jit_type.h:1142
#10 0x000003ff448d8346 in c10::TupleType::~TupleType (this=0x611000c40580) at /home/user/pytorch/aten/src/ATen/core/jit_type.h:1142
#11 0x000003ff731296a4 in std::_Sp_counted_ptr<c10::TupleType*, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x603000c43ae0)
at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/shared_ptr_base.h:348
#12 0x000003ff71eaf666 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x603000c43ae0) at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/shared_ptr_base.h:168
#13 0x000003ff71eaf330 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x3ffc5dc9368) at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/shared_ptr_base.h:705
#14 0x000003ff73129ee4 in std::__shared_ptr<c10::TupleType, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x3ffc5dc9360)
at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/shared_ptr_base.h:1154
#15 0x000003ff73122390 in std::shared_ptr<c10::TupleType>::~shared_ptr (this=0x3ffc5dc9360) at /usr/lib/gcc/s390x-ibm-linux-gnu/11/include/g++-v11/bits/shared_ptr.h:122
#16 0x000003ff73d00788 in torch::jit::toPyObject (ivalue=...) at /home/user/pytorch/torch/csrc/jit/python/pybind_utils.cpp:613
#17 0x000003ff73d00306 in torch::jit::toPyObject (ivalue=...) at /home/user/pytorch/torch/csrc/jit/python/pybind_utils.cpp:604
```
</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101400
Approved by: https://github.com/zou3519
We want to make TorchRec sharded models TorchScriptable.
TorchRec sharded models use the generic types Awaitable[W] and LazyAwaitable[W] (https://github.com/pytorch/torchrec/blob/main/torchrec/distributed/types.py#L212).
In a sharded model those types are used in place of the contained type W, holding an initialization function that produces an object of type W.
At the moment the first attribute of W is requested, `LazyAwaitable[W]` calls its initialization function (on the same stack), caches the result inside, and from then on works transparently as an object of W. So we can think of it as delayed object initialization.
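As a toy illustration of that delayed initialization (a sketch, not TorchRec's actual implementation):
```py
class LazyAwaitable:
    """Toy version: the first attribute access runs init_fn and caches the result."""

    def __init__(self, init_fn):
        self._init_fn = init_fn
        self._result = None

    def __getattr__(self, name):
        # Only reached for attributes not found on LazyAwaitable itself,
        # i.e. attributes of the wrapped W.
        if self._result is None:
            self._result = self._init_fn()
        return getattr(self._result, name)

class W:
    def forward(self):
        return "ran"

lazy = LazyAwaitable(W)  # nothing is initialized yet
print(lazy.forward())    # first access builds W, then delegates to it
```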
To support this behavior in TorchScript, we propose a new TorchScript type: `Await`.
In eager mode it works the same as `LazyAwaitable[W]` in TorchRec: dynamically typed, it acts as a `W` while it is an `Await[W]`.
Within TorchScript it is an `Await[W]` and can only be explicitly converted to `W` using the special function `torch.jit._awaitable_wait(aw)`.
Creation of an `Await[W]` is done via another special function, `torch.jit._awaitable(func, *args)`.
The semantics are close to `torch.jit.Future` with fork/wait, and use the same JIT mechanics (inlined fork closures), with the difference that the function is not started in parallel on fork. It is only stored as a lambda inside the IValue and is called on the same thread when `torch.jit._awaitable_wait` is called.
For example (more examples in `test/jit/test_await.py` in this PR):
```
def delayed(z: int) -> int:
    return z * 3

@torch.jit.script
def fn(x: Tensor):
    aw: Await[int] = torch.jit._awaitable(delayed, 99)
    a = torch.eye(2)
    b = torch.jit._awaitable_wait(aw)
    return a + b + x
```
Functions semantics:
`_awaitable(func: Callable[..., W], *args, **kwargs) -> Await[W]`
Creates an Await object that owns args and kwargs. Once `_awaitable_wait` is called, it executes func and owns the result. Subsequent `_awaitable_wait` calls return the result of that first call.
`_awaitable_wait(Await[W]) -> W`
Returns the cached result if this is not the first `_awaitable_wait` call on this Await object; otherwise executes the stored function.
`_awaitable_nowait(W) -> Await[W]`
Creates a trivial `Await[W]` wrapper around the specified object, to be type compliant for the corner cases.
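A hedged usage sketch combining the three functions (written against this PR's private entry points; the type-inference details are illustrative):
```py
import torch
from torch import Tensor

def delayed(z: int) -> int:
    return z * 3

@torch.jit.script
def fn(x: Tensor) -> Tensor:
    aw = torch.jit._awaitable(delayed, 99)
    y = torch.jit._awaitable_wait(aw)       # runs delayed(99) here, caches 297
    y2 = torch.jit._awaitable_wait(aw)      # returns the cached 297
    ready = torch.jit._awaitable_nowait(5)  # trivial wrapper, already computed
    return x + y + y2 + torch.jit._awaitable_wait(ready)

print(fn(torch.zeros(2)))  # tensor([599., 599.])
```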
Differential Revision: [D42502706](https://our.internmc.facebook.com/intern/diff/D42502706)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90863
Approved by: https://github.com/davidberard98
Not only is this change usually shorter and more readable, it can also yield better performance: size() is not always a constant-time operation (e.g., on linked lists), but empty() always is.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93236
Approved by: https://github.com/malfet
As we now live in a C++17 world, this is a functional no-op, just:
- `s/namespace at { namespace native {/namespace at::native {/`
- `s/namespace torch { namespace jit {/namespace torch::jit {/`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92100
Approved by: https://github.com/izaitsevfb
#75854
A naive attempt at working around the limitations of using a single 64-bit integer to pack `stream_id`, `device_index`, and `device_type`.
Still needs sanity checks, testing, and minimization of BC-breaking changes.
Currently a Holder for the `StreamData3` struct is used for `IValue` compatibility. While this seems to work for `ivalue.h` and `ivalue_inl.h`, it doesn't seem to work naively for the JIT CUDA stream wrapper (something about ambiguous calls when an `intrusive_ptr` to `c10::ivalue::StreamData3Holder` is used as the return type for `pack()`). It turns out that the methods required to access the fields for rematerializing a CUDA stream are basically already present anyway, so `pack()` is simply removed in the wrapper for now and the methods to access the required fields are called directly.
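For context, a hedged sketch of the kind of single-word packing being worked around (the 48/8/8 bit layout here is illustrative, not the exact c10 layout):
```py
MASK48 = (1 << 48) - 1

def pack(stream_id: int, device_index: int, device_type: int) -> int:
    # The whole point of StreamData3: a large stream_id no longer fits here.
    assert stream_id <= MASK48
    return stream_id | (device_index << 48) | (device_type << 56)

def unpack(word: int):
    return word & MASK48, (word >> 48) & 0xFF, (word >> 56) & 0xFF

print(unpack(pack(12345, 0, 1)))  # (12345, 0, 1)
```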
CC @ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81596
Approved by: https://github.com/ezyang
Noticed that the toSymFloat/toSymInt overloads always copied the internal pointer of an IValue even if it was an rvalue, unlike other overloads (like toTensor). This fixes that issue by adding the appropriate methods needed to facilitate the move.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91405
Approved by: https://github.com/ezyang
This refactor was prompted by challenges handling mixed int/float
operations in C++. A previous version of this patch
added overloads for each permutation of int/float and was unwieldy
https://github.com/pytorch/pytorch/pull/87722/ This PR takes a different
approach.
The general outline of the patch is to combine the C++ types SymIntNode
and SymFloatNode into a single type, SymNode. This is type erased; we
no longer know statically at C++ if we have an int/float and have to test
it with the is_int()/is_float() virtual methods. This has a number of
knock on effects.
- We no longer have C++ classes to bind to Python. Instead, we take an
entirely new approach to our Python API, where we have a SymInt/SymFloat
class defined entirely in Python, which hold a SymNode (which corresponds
to the C++ SymNode). However, SymNode is not pybind11-bound; instead,
it lives as-is in Python, and is wrapped into C++ SymNode using PythonSymNode
when it goes into C++. This implies a userland rename.
In principle, it is also possible for the canonical implementation of SymNode
to be written in C++, and then bound to Python with pybind11 (we have
this code, although it is commented out.) However, I did not implement
this as we currently have no C++ implementations of SymNode.
Because we do return SymInt/SymFloat from C++ bindings, the C++ binding
code needs to know how to find these classes. Currently, this is done
just by manually importing torch and getting the attributes.
- Because SymInt/SymFloat are easy Python wrappers, __sym_dispatch__ now
takes SymInt/SymFloat, rather than SymNode, bringing it in line with how
__torch_dispatch__ works.
Some miscellaneous improvements:
- SymInt now has a constructor that takes SymNode. Note that this
constructor is ambiguous if you pass in a subclass of SymNode,
so an explicit downcast is necessary. This means toSymFloat/toSymInt
are no more. This is a mild optimization as it means rvalue reference
works automatically.
- We uniformly use the caster for c10::SymInt/SymFloat, rather than
going the long way via the SymIntNode/SymFloatNode.
- Removed some unnecessary toSymInt/toSymFloat calls in normalize_*
functions, pretty sure this doesn't do anything.
- guard_int is now a free function, since to guard on an int you cannot
assume the method exists. A function can handle both int and SymInt
inputs.
- We clean up the magic method definition code for SymInt/SymFloat/SymNode.
ONLY the user classes (SymInt/SymFloat) get magic methods; SymNode gets
plain methods; this is to help avoid confusion between the two types.
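A toy Python rendering of that layering (a sketch with made-up expression strings, not the real classes):
```py
class SymNode:                      # type-erased node: plain methods only
    def __init__(self, expr):
        self.expr = expr
    def is_int(self):               # queried dynamically, not statically
        return True
    def add(self, other):
        return SymNode(f"({self.expr} + {other.expr})")

class SymInt:                       # user-facing wrapper: magic methods
    def __init__(self, node):
        self.node = node
    def __add__(self, other):
        return SymInt(self.node.add(other.node))
    def __repr__(self):
        return f"SymInt({self.node.expr})"

print(SymInt(SymNode("s0")) + SymInt(SymNode("s1")))  # SymInt((s0 + s1))
```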
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87817
Approved by: https://github.com/albanD, https://github.com/anjali411
Previously, bool scalars were cast to `int64`, which broke quite a few
casting rules, for example in `ops.aten`.
Quite a vintage bug, circa 2020.
With this fix, the following code prints `torch.bool`, rather than `torch.int64`.
```python
import torch
msk = torch.tensor([False])
b = torch.tensor([False])
print(torch.ops.aten.where.ScalarSelf(msk, True, b).dtype)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87179
Approved by: https://github.com/albanD
We have logic that says if you ask for a SymIntList from an IValue, but the IValue is actually an IntList, we will still give it to you in that case (check ivalue_to_arg in aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h). However, we also need the *inverse* version of this logic, which says that if you construct an IValue from a SymIntArrayRef, and it is actually integer only, we need to store it as an IntList, so that toIntList on the IValue will work.
The way this works is a bit twisty, but our basic strategy is to disable construction of IValue from list container types that contain SymInt directly, and then directly implement variants of these constructors by hand, which iterate over the elements of the list and test whether there are any SymInts to decide what type to use for the underlying List. These variants have to be templated; otherwise we run afoul of ambiguous overloads. I only did the overloads that actually occurred in practice; you may need to add more if you SymIntify more stuff.
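A small Python model of that tagging decision (stand-in types; the real logic lives in the hand-written templated IValue constructors):
```py
class SymInt:  # stand-in for a genuinely symbolic integer
    pass

def make_list_ivalue(values):
    # If no element is actually symbolic, store a plain IntList so that
    # toIntList on the resulting IValue still works.
    if all(type(v) is int for v in values):
        return ("IntList", list(values))
    return ("SymIntList", list(values))

print(make_list_ivalue([1, 2, 3]))         # ('IntList', [1, 2, 3])
print(make_list_ivalue([1, SymInt(), 3]))  # tagged as SymIntList
```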
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86094
Approved by: https://github.com/anjali411, https://github.com/albanD
- Make toIValue accept SymIntNode and SymFloatNode where number (aka Scalar) is
expected
- Binding for symintlistOptional in python arg parser
- Teach translate to convert from IntArrayRef to ArrayRef<int64_t>
- Don't query _symint function for meta info in LTC unless LTC is
code generating a symint function
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86042
Approved by: https://github.com/Chillee
- Support storing SymFloat in IValue
- Add SymFloat to JIT type system (erases to float)
- Printing support for SymFloat
- add/sub/mul/truediv operator support for SymFloat
- Support truediv on integers, it returns a SymFloat
- Support parsing SymFloat from Python object
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85411
Approved by: https://github.com/albanD
Also Back out "Revert D39075159: [acc_tensor] Use SymIntArrayRef for overloaded empty.memory_format's signature"
Original commit changeset: dab4a9dba4fa
Original commit changeset: dcaf16c037a9
Original Phabricator Diff: D38984222
Original Phabricator Diff: D39075159
Also update Metal registrations for C++ registration changes.
Also update NNPI registration to account for tightened schema checking
Differential Revision: [D39084762](https://our.internmc.facebook.com/intern/diff/D39084762/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39084762/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84173
Approved by: https://github.com/Krovatkin
This PR adds an nvfuser-specific primitive, `var_mean`.
The interpretation `torch.var_mean` -> `torch.ops.nvprims.var_mean` is handled by the `TorchRefsNvfuserCapabilityMode` context manager.
I moved some helper code from `_prims/__init__.py` to `_prims_common`. Correctness is tested with OpInfo tests (see `PythonRefInfo("ops.nvprims.var_mean")`).
The layer norm reference now uses `torch.var_mean` instead of `torch._refs.var_mean` to allow interception. Here's a simple comparison of performance with this PR and master (on a 3080 Ti):
```py
import torch
from torch._prims.context import TorchRefsNvfuserCapabilityMode
from torch.fx.experimental.proxy_tensor import make_fx
from torch._prims.executor import execute
def func(a):
    return torch.native_layer_norm(a, (1024,), None, None, 1e-6)

a = torch.randn(10, 512, 1024, dtype=torch.float16, device="cuda")

with TorchRefsNvfuserCapabilityMode():
    gm = make_fx(func)(a)

for _ in range(10):
    execute(gm, a, executor="strictly_nvfuser")
```
run with `PYTORCH_NVFUSER_DUMP=dump_eff_bandwidth python script.py`
```py
# WITH THIS PR
# kernel1 run in 0.032768 ms, achieved: 641.25 GB/s
# kernel1 run in 0.033792 ms, achieved: 621.818 GB/s
# kernel1 run in 0.032768 ms, achieved: 641.25 GB/s
# kernel1 run in 0.032608 ms, achieved: 644.396 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s
# kernel1 run in 0.032768 ms, achieved: 641.25 GB/s
# kernel1 run in 0.03072 ms, achieved: 684 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s
# ON MASTER
# kernel1 run in 0.05632 ms, achieved: 373.091 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.043808 ms, achieved: 479.649 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
```
So this PR gives about a 35% performance improvement using the nvfuser executor with this specific normalized shape.
Also this PR fixes https://github.com/pytorch/pytorch/issues/83506 (see the change in `torch/csrc/jit/python/pybind_utils.cpp`).
Ref. https://github.com/pytorch/pytorch/issues/80187
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83508
Approved by: https://github.com/ngimel
Previously, we introduced new SymInt overloads for every function we wanted. This led to a lot of boilerplate, and also a lot of confusion about how the overloads needed to be implemented.
This PR takes a simpler but more risky approach: just take the original function and change its ints to SymInts.
This is BC-breaking in the following ways:
* The C++ API for registering implementations for aten operators will change from int64_t to SymInt whenever you make this change. Code-generated registrations in PyTorch do not change as codegen handles the translation automatically, but manual registrations will need to follow the change. Typically, if you now accept a SymInt where you previously only took int64_t, you have to convert it back manually. This will definitely break XLA; see the companion PR https://github.com/pytorch/xla/pull/3914. Note that not all dispatch keys get the automatic translation; all the composite keys and Meta keys are modified to take SymInt directly (because they should handle them directly), and so there are adjustments for this.
This is not BC-breaking in the following ways:
* The user-facing C++ API remains compatible. Even if a function changes from int to SymInt, the default C++ binding still takes only ints (e.g., `at::empty(IntArrayRef, ...)`). To call with SymInts, you must call `at::empty_symint` instead. This involved adding two more signatures to CppSignatureGroup; in many cases I refactored code to iterate over all signatures in the group instead of hard-coding the two that previously existed.
* This is TorchScript compatible; internally we treat SymInts as ints so there is no change to what happens at runtime in TorchScript. In particular, it's OK to reference an empty schema by its old type (using int types); as long as you're not doing string equality (which you shouldn't be), these parse to the same underlying type.
Structure of the PR:
* The general strategy of this PR is that, even when you write `SymInt` inside `native_functions.yaml`, sometimes, we will treat it *as if* it were an `int`. This idea pervades the codegen changes, where we have a translation from SymInt to c10::SymInt or int64_t, and this is controlled by a symint kwarg which I added and then audited all call sites to decide which I wanted. Here are some of the major places where we pick one or the other:
* The C++ FunctionSchema representation represents `SymInt` as `int`. There are a few places we do need to know that we actually have a SymInt and we consult `real_type()` to get the real type in this case. In particular:
* When we do schema validation of C++ operator registration, we must compare against the true schema (the C++ API will provide `c10::SymInt`, and this will only be accepted if the schema is `SymInt`). This is handled with cloneWithRealTypes before we check for schema differences.
* In `toIValue` argument parsing, we parse against the true schema value. For backwards compatibility reasons, I do still accept ints in many places where Layout/SymInt/etc were expected. (Well, accepting int where SymInt is expected is not BC, it's just the right logic!)
* In particular, because SymInt never shows up as type() in FunctionSchema, this means that we no longer need a dedicated Tag::SymInt. This is good, because SymInts never show up in mobile anyway.
* Changes to functorch/aten are mostly about tracking changes to the C++ API registration convention. Additionally, since SymInt overloads no longer exist, registrations for SymInt implementations are deleted. In many cases, the old implementations did not properly support SymInts; I did not add any new functionality with this PR, but I did try to annotate with TODOs where there is work to do. Finally, because the signature of the `native::` API changed from int to SymInt, I needed to find alternative APIs for people who were directly calling these functions. Typically, I insert a new dispatch call when perf doesn't matter, or use the `at::compositeexplicitautograd` namespace to handle other cases.
* The change to `make_boxed_from_unboxed_functor.h` is so that we accept a plain IntList IValue anywhere a SymIntList is expected; these are read-only arguments so covariant typing is OK.
* I changed how the unboxing logic works slightly. Previously, we interpreted the C++ type for Layout/etc. directly as the IntType JIT type, which worked well because the incoming IValue is tagged as an integer. Now, we interpret the C++ type for Layout as its true type, e.g., LayoutType (change to `jit_type.h`), but then accept an int IValue for it anyway. This makes it symmetric with SymInt, where we interpret the C++ type as SymIntType and then accept both SymInt and int IValues for it (a minimal sketch of this covariant acceptance follows this list).
* I renamed the `empty.names` overload to `empty_names` to make it less confusing (I kept mixing it up with the real empty overload).
* I deleted the `empty.SymInt` overload, which ended up killing a pile of functions. (This was originally a separate PR but the profiler expect test was giving me grief so I folded it in.)
* I deleted the LazyDynamicOpsTest tests. These were failing after these changes, and I couldn't figure out why they used to be passing: they make use of `narrow_copy` which didn't actually support SymInts; they were immediately converted to ints.
* I bashed LTC into working. The patches made here are not the end of the story. The big problem is that SymInt translates into Value, but what if you have a list of SymInt? This cannot be conveniently represented in the IR today, since variadic Values are not supported. To work around this, I translate SymInt[] into plain int[] (this is fine for tests because LTC dynamic shapes never actually worked); but this will need to be fixed for proper LTC SymInt support. The LTC codegen also looked somewhat questionable; I added comments based on my code reading.
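To illustrate the covariant acceptance mentioned above, here is a minimal sketch using the IValue accessors as I understand them (not code from this PR):
```
#include <ATen/core/ivalue.h>
#include <c10/core/SymInt.h>

void demo() {
  // An IValue tagged as a plain int...
  c10::IValue iv(static_cast<int64_t>(3));
  // ...can still be read out where a SymInt is expected: toSymInt() accepts
  // both Int-tagged and SymInt-tagged IValues, which is what lets read-only
  // SymInt (and SymIntList) arguments be fed plain ints.
  c10::SymInt s = iv.toSymInt();
  TORCH_INTERNAL_ASSERT(s.expect_int() == 3);
}
```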
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83628
Approved by: https://github.com/albanD, https://github.com/bdhirsh
### Description
Adding a custom caster for `c10::SymInt`. This simplifies handling of c10::SymInt on the C++/PyTorch boundary; namely, it removes the if statements needed to handle the union nature (e.g., SymIntNode vs. int) of c10::SymInt.
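For reference, a generic pybind11 custom-caster skeleton of this shape looks roughly like the following (illustrative only, not the caster added in this PR; the symbolic branch is elided):
```
#include <pybind11/pybind11.h>
#include <c10/core/SymInt.h>

namespace pybind11 {
namespace detail {

template <>
struct type_caster<c10::SymInt> {
 public:
  PYBIND11_TYPE_CASTER(c10::SymInt, _("SymInt"));

  // Python -> C++: accept a plain Python int; a real caster would also
  // unwrap a SymIntNode-backed Python object here.
  bool load(handle src, bool /*convert*/) {
    if (PyLong_Check(src.ptr())) {
      value = c10::SymInt(src.cast<int64_t>());
      return true;
    }
    return false;
  }

  // C++ -> Python: only the concrete case is sketched; the symbolic case
  // would wrap and return the underlying node instead.
  static handle cast(const c10::SymInt& si, return_value_policy, handle) {
    return int_(si.expect_int()).release();
  }
};

}  // namespace detail
}  // namespace pybind11
```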
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82692
Approved by: https://github.com/ezyang
Done via
```
git grep -l 'SymbolicIntNode' | xargs sed -i 's/SymbolicIntNode/SymIntNodeImpl/g'
```
Reasoning for the change:
* Sym is shorter than Symbolic, and consistent with SymInt
* You will usually deal in `shared_ptr<...>`, so we're going to reserve the shorter name (SymIntNode) for the shared pointer (a minimal sketch of the intended convention follows).
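That convention would look roughly like this (a hedged sketch; the actual class body and typedef live in c10 and may differ):
```
#include <memory>

namespace c10 {
// The Impl suffix goes on the class...
class SymIntNodeImpl { /* ... */ };
// ...while the short name is reserved for the shared pointer to it.
using SymIntNode = std::shared_ptr<SymIntNodeImpl>;
}  // namespace c10
```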
But I don't want to update the Python name, so afterwards I ran
```
git grep -l _C.SymIntNodeImpl | xargs sed -i 's/_C.SymIntNodeImpl/_C.SymIntNode/'
```
and manually fixed up the binding code.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82350
Approved by: https://github.com/Krovatkin