Automatically replaces `split` with `rsplit` when relevant, and performs the split only up to the first (or last) value. This allows the split to stop early and improves efficiency.
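A minimal sketch of the kind of rewrite this enables (the strings and variable names here are illustrative, not taken from the PR):
```python
path = "torch.ops.aten.add.Tensor"

# Before: splits on every "." and builds a list of all five segments,
# even though only the last one is used.
overload = path.split(".")[-1]

# After: rsplit with maxsplit=1 performs a single split from the right,
# so the call can stop early instead of splitting the whole string.
overload = path.rsplit(".", 1)[-1]
assert overload == "Tensor"
```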
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160107
Approved by: https://github.com/albanD
Adds `c_shim_aten.{h/cpp}` and uses this for `fill_`.
This is the generated `c_shim_aten.cpp`, for reference:
```cpp
// WARNING: THIS FILE IS AUTOGENERATED BY torchgen. DO NOT MODIFY BY HAND.
// See 7e86a7c015/torchgen/gen.py (L2424-L2436) for details
// This file corresponds to the aten_shimified_ops list in torchgen/aoti/fallback_ops.py
#include <torch/csrc/inductor/aoti_torch/generated/c_shim_aten.h>
#include <torch/csrc/inductor/aoti_torch/utils.h>
#ifndef AT_PER_OPERATOR_HEADERS
#include <ATen/Functions.h>
#include <ATen/CompositeExplicitAutogradFunctions.h>
#include <ATen/CompositeExplicitAutogradNonFunctionalFunctions.h>
#include <ATen/CompositeImplicitAutogradFunctions.h>
#else
#include <ATen/ops/fill.h>
#endif // AT_PER_OPERATOR_HEADERS
using namespace torch::aot_inductor;
AOTITorchError aoti_torch_aten_fill__Scalar(AtenTensorHandle self, double value) {
  AOTI_TORCH_CONVERT_EXCEPTION_TO_ERROR_CODE({
    at::fill_(
        *tensor_handle_to_tensor_pointer(self), value
    );
  });
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158974
Approved by: https://github.com/albanD, https://github.com/janeyx99
Most of the added ops are backward ops, which had not been well-tested previously (which is why they were missed). The necessary ops were identified by manually examining the return values in torch/_meta_registrations.py.
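For context, a small illustration (mine, not from the PR) of how these meta registrations get exercised: running an op on meta tensors invokes only its meta function, which computes output shapes and dtypes without touching real data.
```python
import torch

# Meta tensors carry only shape/dtype metadata; this call runs the op's
# meta function from torch/_meta_registrations.py, not a real kernel.
x = torch.randn(3, 4, device="meta")
y = torch.ops.aten.abs.default(x)
print(y.shape, y.device)  # torch.Size([3, 4]) meta
```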
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158073
Approved by: https://github.com/desertfire
This word appears often in class descriptions and is not consistently spelled. Update comments and some function names to use the correct spelling consistently; this facilitates searching the codebase.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155944
Approved by: https://github.com/Skylion007
The op `_weight_int4pack_mm_with_scales_and_zeros` is for Intel GPU. It is functionally equivalent to the CUDA/CPU op `_weight_int4pack_mm` (with the constraint that oneDNN only supports integer zero points, which is why we need this separate API). Since `_weight_int4pack_mm` is already included in AOTI's fallback list, this PR adds the corresponding XPU support.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155780
Approved by: https://github.com/jansel
## Background
Task: [T222738229](https://www.internalfb.com/intern/tasks/?t=222738229)
It's the first starter task in the project **_Enabling TorchNative Standalone on Whisper_**. We are using cshim to create a layer of abstraction between _**libtorch**_ and **_AOTInductor-generated artifacts_**, so an entry has to be added to the cshim for every libtorch API surface we rely on. We only care about operators that AOTInductor does not handle, and for this task we only wanted to add entries for the following ops.
## What I've done
Four new fallback ops that show up in the Whisper model were added in `torchgen/aoti/fallback_ops.py` (a sketch of the resulting entries follows the list):
- aten.permute (default)
- aten.squeeze (dim)
- aten.abs (default)
- aten.hann_window (default)
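A hedged sketch of what the new entries might look like, assuming the list in question is the `inductor_fallback_ops` dict in `torchgen/aoti/fallback_ops.py` (the dict name, type, and empty-dict values are assumptions on my part, not copied from the PR):
```python
# Assumed shape of the torchgen/aoti/fallback_ops.py entries.
inductor_fallback_ops: dict[str, dict[str, list[str]]] = {
    "aten.abs.default": {},
    "aten.hann_window.default": {},
    "aten.permute.default": {},
    "aten.squeeze.dim": {},
}
```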
Then I ran the command below to generate the new C shim header files, as described [here](7e86a7c015/torchgen/gen.py (L2424-L2436)):
`python torchgen/gen.py --update-aoti-c-shim`
Then, `python setup.py develop` to rebuild PyTorch.
## Testing
Four new tests have also been added in test/inductor/test_aot_inductor.py:
- test_proxy_executor_permute
- test_proxy_executor_abs
- test_proxy_executor_squeeze
- test_proxy_executor_hann
I ran these commands to test it (inside the local pytorch root folder):
`python test/inductor/test_aot_inductor.py -k test_proxy_executor_permute`
`python test/inductor/test_aot_inductor.py -k test_proxy_executor_abs`
`python test/inductor/test_aot_inductor.py -k test_proxy_executor_squeeze`
`python test/inductor/test_aot_inductor.py -k test_proxy_executor_hann`
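For readers who want a self-contained check outside the test suite, here is a rough approximation of what these tests verify, written against the public `aoti_compile_and_package`/`aoti_load_package` APIs (this is my sketch, not the actual test code):
```python
import torch

class M(torch.nn.Module):
    def forward(self, x):
        # permute and abs are two of the newly added fallback ops
        return torch.abs(x.permute(1, 0))

model = M()
x = torch.randn(3, 4)

# Export, AOT-compile to a package, reload it, and compare against eager.
ep = torch.export.export(model, (x,))
pkg = torch._inductor.aoti_compile_and_package(ep)
compiled = torch._inductor.aoti_load_package(pkg)
torch.testing.assert_close(compiled(x), model(x))
```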
## NOTE:
I didn't see any particular ordering of the tests inside _test/inductor/test_aot_inductor.py_; that's why I added the new tests just after the test given in the example.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154251
Approved by: https://github.com/angelayi
Summary:
# Context
See the first PR https://github.com/pytorch/pytorch/pull/153670
# This PR
1. Migrate 3 clamp ops from out-of-tree to in-tree (the 3 ops had to be migrated together, because clamp.out calls all 3 stubs, which are also called by the other 2 ops):
- clamp.out
- clamp_min.out
- clamp_max.out
2. Also enabled structured kernel codegen for MTIA, which is needed by clamp.
3. Also introduced the `--mtia` flag in torchgen to prevent OSS builds from generating MTIA code. (Otherwise we hit link errors such as `lib/libtorch_cpu.so: undefined reference to at::detail::empty_mtia`.)
Differential Revision: D74674418
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154015
Approved by: https://github.com/albanD, https://github.com/nautsimon
Added AOTIModelContainerRunnerMps and a shim for mps fallback ops.
I also added an MPS-specific shim which contains one function, to be used to set arguments being passed to the Metal kernel:
```cpp
AOTI_TORCH_EXPORT AOTITorchError aoti_torch_mps_set_arg(
AOTIMetalKernelFunctionHandle func,
unsigned idx,
AtenTensorHandle tensor);
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153964
Approved by: https://github.com/malfet, https://github.com/desertfire
Summary:
# Context
The MTIA New Aten Backend work is essentially to move MTIA operators from PyTorch out-of-tree to in-tree, with the following benefits:
1. Avoid duplicating code copied from PyTorch, e.g. view op implementations and util functions.
2. Utilize TensorIterator and structured kernel codegen, avoiding manual implementation of broadcasting, dtype casting, asserting, etc.
3. Eliminate MTIA's own codegen flow, which is unnecessary complexity.
4. Overall, make MTIA's aten backend more PyTorch-native.
Differential Revision: D74672464
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153670
Approved by: https://github.com/albanD, https://github.com/nautsimon
https://github.com/pytorch/pytorch/pull/129001#discussion_r1645126801 is the motivation for the whole stack of PRs. In `torch/__init__.py`, `torch._C.Type` shadows `from typing import Type`, and there is no type stub for `torch._C.Type` in `torch/_C/__init__.pyi`. So we need to use `from typing import Type as _Type`. After enabling [Generic TypeAlias (PEP 585)](https://peps.python.org/pep-0585) in the `.pyi` type stub files, we can use `type` instead of `typing.Type` or `from typing import Type as _Type`.
------
- [Generic TypeAlias (PEP 585)](https://peps.python.org/pep-0585): e.g. `typing.List[T] -> list[T]`, `typing.Dict[KT, VT] -> dict[KT, VT]`, `typing.Type[T] -> type[T]`.
- [Union Type (PEP 604)](https://peps.python.org/pep-0604): e.g. `Union[X, Y] -> X | Y`, `Optional[X] -> X | None`, `Optional[Union[X, Y]] -> X | Y | None`.
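For illustration, here is how a stub signature changes under these two PEPs (a made-up example, not one from the PR):
```python
# Before: typing aliases, as previously required in the .pyi stubs.
from typing import Dict, List, Optional, Type

def find(registry: Dict[str, Type[object]], names: Optional[List[str]]) -> Optional[Type[object]]: ...

# After: PEP 585 builtin generics plus PEP 604 unions. Stubs are never
# executed, so this is valid regardless of the runtime Python version.
def find(registry: dict[str, type[object]], names: list[str] | None) -> type[object] | None: ...
```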
Note that in `.pyi` stub files, we do not need `from __future__ import annotations`. So this PR does not violate issue #117449:
- #117449
------
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150727
Approved by: https://github.com/aorenste
ghstack dependencies: #150726
This change enables basic NestedTensor operations on HPU,
fixing the runtime error when creating a NestedTensor on HPU.
- Extended `NestedTensorImpl` to recognize `hpu` as a valid storage device.
- Added `NestedTensorHPU` to `DispatchKey` parsing in `DispatchKey.cpp`.
- Updated `torchgen/model.py` to include `NestedTensorHPU` in `dispatch_keys`.
- Modified `native_functions.yaml` to enable `NestedTensorHPU` support for various ops.
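A hedged usage sketch of the call this change fixes (it requires a Habana build; the `hpu` device is not available on a stock PyTorch build):
```python
import torch

# On an HPU-enabled build, this previously raised a runtime error and
# should now construct a NestedTensor backed by HPU storage.
nt = torch.nested.nested_tensor(
    [torch.randn(2, 3), torch.randn(4, 3)],
    device="hpu",
)
print(nt.device)
```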
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148659
Approved by: https://github.com/jeromean, https://github.com/albanD, https://github.com/sujoysaraswati
Fix: https://github.com/pytorch/xla/issues/8755
This PR introduces the `TORCH_DISABLE_FUNCTIONALIZATION_META_REFERENCE`
environment variable. Setting this variable makes the functionalization
kernels skip the meta reference, which is used to propagate expected
sizes and strides.
Currently, PyTorch/XLA doesn't actually propagate the correct strides
to its tensors. It was also shown that calling these meta functions may
incur significant overhead.
Running the provided minimal reproducer (see issue), we see a speedup
close to 4.3x:
- Baseline: 0.0747s
- `XLA_DISABLE_FUNCTIONALIZATION=1`: 0.0159s
- `TORCH_DISABLE_FUNCTIONALIZATION_META_REFERENCE=1`: 0.0175s
In summary, this PR:
- Creates the `disable_meta_reference()` function, which checks whether
the environment variable is set
- Modifies codegen for functionalization kernels, adding a call to the
`disable_meta_reference()` function under the appropriate conditions
- Creates a new bash function for running `lazy/test_ts_opinfo.py` with
the environment variable set
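As a usage illustration (my own, not from the PR), a workload can be run with the meta reference disabled by setting the variable in the child environment; the script name here is hypothetical:
```python
import os
import subprocess

# Run a hypothetical PyTorch/XLA workload with the functionalization
# meta reference disabled. "1" as the accepted value is an assumption;
# the PR only documents that setting the variable skips the reference.
env = dict(os.environ, TORCH_DISABLE_FUNCTIONALIZATION_META_REFERENCE="1")
subprocess.run(["python", "train.py"], env=env, check=True)
```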
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148822
Approved by: https://github.com/bdhirsh
Motivation
===
This PR is part of the OneDNN upstreaming plan, as stated in #114848 [(comment)](https://github.com/pytorch/pytorch/issues/114848#issuecomment-2451553203). SDPA support is added via the overrideable variant on the XPU backend. Besides the added `Attention.cpp` file, `Graph.h` is added to hold utils for the OneDNN graph, including those for kernel/compiled graph caching. In addition, a selection of test cases in `test/test_transformers.py` are copied into the new `test/xpu/test_transformers.py` and modified accordingly, to provide additional tests beyond `./third_party/torch-xpu-ops/test/xpu/test_ops_xpu.py`.
Depends on OneDNN version v3.7 upgrade in #147498
Depends on BUILD_GRAPH switch in #147608
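A hedged usage sketch (it assumes a PyTorch build with XPU support and available XPU hardware):
```python
import torch
import torch.nn.functional as F

# On an XPU-enabled build, SDPA should dispatch to the oneDNN-backed
# implementation added by this PR.
q, k, v = (torch.randn(2, 8, 128, 64, device="xpu", dtype=torch.float16)
           for _ in range(3))
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 8, 128, 64])
```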
Pull Request resolved: https://github.com/pytorch/pytorch/pull/147614
Approved by: https://github.com/jansel, https://github.com/EikanWang
Fixes https://github.com/pytorch/executorch/issues/8711
In ExecuTorch, when we try to parse the following schema:
```
aten::__lshift__.Scalar(Tensor self, Scalar other) -> Tensor
```
Repro:
```python
from torchgen.model import FunctionSchema
native_schema = FunctionSchema.parse("aten::__lshift__.Scalar(Tensor self, Scalar other) -> Tensor")
```
It fails because `BaseOperatorName` categorizes it as an
in-place operator.
I understand we are not supposed to pass the namespace "aten::" into
`FunctionSchema.parse()`, but unfortunately ExecuTorch requires this
feature to work.
This PR adds a new `namespace` attribute to `BaseOperatorName` and makes
sure the rest of the stack works as before if a schema without a
namespace is passed in.
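A hedged sketch of the intended behavior after this change (`inplace` is an existing field on `BaseOperatorName`; the non-in-place classification is what the PR description implies, not something I have verified):
```python
from torchgen.model import FunctionSchema

schema = FunctionSchema.parse(
    "aten::__lshift__.Scalar(Tensor self, Scalar other) -> Tensor"
)
# With the namespace handled explicitly, the dunder name should no
# longer be misclassified as an in-place op.
assert not schema.name.name.inplace
```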
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148038
Approved by: https://github.com/bdhirsh
Reference: https://docs.astral.sh/ruff/formatter/black/#assert-statements
> Unlike Black, Ruff prefers breaking the message over breaking the assertion, similar to how both Ruff and Black prefer breaking the assignment value over breaking the assignment target:
>
> ```python
> # Input
> assert (
> len(policy_types) >= priority + num_duplicates
> ), f"This tests needs at least {priority+num_duplicates} many types."
>
>
> # Black
> assert (
> len(policy_types) >= priority + num_duplicates
> ), f"This tests needs at least {priority+num_duplicates} many types."
>
> # Ruff
> assert len(policy_types) >= priority + num_duplicates, (
> f"This tests needs at least {priority + num_duplicates} many types."
> )
> ```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144546
Approved by: https://github.com/malfet