This PR applies clang-tidy readability checks to jit sources and all headers in the code base.
`readability-redundant-inline-specifier` is suppressed because it incurs too many changes. `readability-redundant-inline-specifier` is used to detect redundant inline specifiers on function and variable declarations. There are many in-class method definitions that are marked inline.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/164652
Approved by: https://github.com/Skylion007
Number of OSS PR were reverted, because new signed-unsigned comparison warnings, which are treated as errors in some internal builds.
Not sure how those selective rules are applied, but this PR removes `-Wno-sign-compare` from PyTorch codebase.
The only tricky part in this PR, as making sure that non-ASCII character detection works for both signed and unsigned chars here:
6e3d51b08a/torch/csrc/jit/serialization/python_print.cpp (L926)
Exclude several files from sign-compare if flash attention is used, due to the violation in cutlass, to be fixed by https://github.com/NVIDIA/cutlass/pull/869
Do not try to fix sign compare violations in caffe2 codebase
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96723
Approved by: https://github.com/albanD
Summary: `IValue::toString()` creates a `new c10::intrusive_ptr` (like `std::shared_ptr`) and `->string()` immediately accesses it, creating an atomic reference increment/decrement. We can skip both of these operations by calling `IValue::toStringRef()`.
Test Plan: CI
Reviewed By: jaybean-dev
Differential Revision: D39605242
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85437
Approved by: https://github.com/jfix71
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68037
Right now mobile::Code doesn't outlive its enclosing Function, and all accesses to Code happens inside interpreter loop which doesn't outlive the module, so we don't need to use std::shared_ptr here. This also should saves us 1-2 KB for binary size, because shared_ptr seems to bloat on arm64 android.
ghstack-source-id: 145818696
Test Plan: eyes.
Reviewed By: qihqi, tugsbayasgalan
Differential Revision: D32264616
fbshipit-source-id: d83f538d6604cf75fd7728a25127b4849ce7ab2a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64066
I noticed a bunch of time being spent heap-allocating Tuples
in the unpickler. 1-, 2-, and 3-element Tuples are apparently common
enough that they get their own bytecode instructions, so I decided to
try also giving them their own representation. We store up to 3
IValues inline in `Tuple` rather than doing a second heap allocation
for a `std::vector<IValue>`.
ghstack-source-id: 140695395
Test Plan:
Added automated tests for TupleElements.
Pixel 3 before: https://www.internalfb.com/intern/aibench/details/761596366576284
Pixel 3 after: https://www.internalfb.com/intern/aibench/details/591414145082422
We went from 347 ms to 302 ms.
Reviewed By: dhruvbird
Differential Revision: D30592622
fbshipit-source-id: 93625c54c9dca5f765ef6d5c191944179cb281a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64065
It is only safe to mutate Tuple elements if you are the sole owner
of the tuple. The most efficient way to do this, then, is
`std::move(*std::move(tupleIValue).toTuple()).elements()` (the
innermost move allows `IValue::toTuple()` to avoid a refcount bump and
the outermost move allows the element vector to be moved out of the
tuple), but many callsites write simply
`tupleIValue.toTuple().elements()`, which incurs many extra refcount
bumps.
ghstack-source-id: 139468088
Test Plan: CI
Reviewed By: ezyang
Differential Revision: D30592621
fbshipit-source-id: e8312de866de09b9ea2a62e5128cbf403ee16f09
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61862
Modularize functions of parsing bytecode tables so that they can be used as needed in situations other than mobile lite interpreter.
* The decoupled functions are re-used by current lite interpreter loader.
* The bytecode can be serialized/deserialized from other formats.
* The decoupled functions have minimum dependencies on other PyTorch components.
Next:
Build a driver binary to include the parser and interpreter, but only has necessary dependency on other PyTorch components.
ghstack-source-id: 137867287
Test Plan:
As an example, a simple bytecode is parsed to a mobile function, and directly run in the added unit test, `RunTimeTest:ParseBytecode`. It contains basic control flow (if, else) and basic data orchestration (list construction).
CI
Reviewed By: larryliu0820
Differential Revision: D29798382
Pulled By: iseeyuan
fbshipit-source-id: 1c173a5f5d37097e3a97baec3f3e48e1eea1400f