12 Commits

SHA1 Message Date
c55e72bea1 [Re-land][Inductor] Support native Inductor as backend for MTIA (#159211)
The previous [diff/PR](https://github.com/pytorch/pytorch/pull/158526) was reverted due to this docstring lint error:
<img width="1736" height="722" alt="image" src="https://github.com/user-attachments/assets/216b1720-4002-48da-b5f3-32b5d48aaa54" />
I didn't add the docstring because I thought I wasn't supposed to add a docstring for an EXISTING function.

So this diff/PR is an exact copy of the previous one, except that it adds the docstring.

-------------
This diff/PR includes the changes needed to support native Inductor integration for MTIA. The goal is to support `torch.compile(backend="inductor")` for MTIA. Inductor should generate code (Triton kernels + Python wrapper code) similar to what it generates for CUDA, and the Triton kernels can be launched eagerly.
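
A minimal sketch of the intended user-facing flow is below. It assumes an MTIA-enabled PyTorch build with at least one available MTIA device; the module and tensor shapes are made up for illustration.

```python
import torch

# Sketch only: assumes an MTIA-enabled PyTorch build and an available device.
class SmallModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(64, 64)

    def forward(self, x):
        return torch.relu(self.linear(x))

if torch.mtia.is_available():
    device = torch.device("mtia")
    model = SmallModel().to(device)
    # Inductor generates Triton kernels plus a Python wrapper, and the
    # kernels are launched eagerly on the MTIA device.
    compiled = torch.compile(model, backend="inductor")
    out = compiled(torch.randn(8, 64, device=device))
```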

The changes include:
- Add the MTIA device interfaces used by Dynamo and Inductor, including device, stream, and event APIs.
- Add the required torch.mtia APIs, such as is_bf16_supported, memory_allocated, and set_stream_by_id (see the sketch after this list).
- Add MTIA-specific codegen logic, for example loading the MTIA dynamic_library.
- Other necessary changes to integrate with Inductor codegen, following other devices such as CUDA and XPU.
- Integrate with the [empty_strided_mtia](https://www.internalfb.com/code/fbsource/[0d017d3a4a1bdff7253f9c66a9f38e77bd62166b]/fbcode/caffe2/aten/src/ATen/native/mtia/EmptyTensor.cpp?lines=49%2C63%2C71%2C74%2C78) API that we’ve added for the new MTIA ATen backend.
- A change in the Inductor runtime to avoid re-initializing MTIADriver.
- BUCK changes to include ATen-mtia in Inductor and to use the -USE_MTIA preprocessor flag.
- Update `test_mnist_e2e.py` to cover native Inductor as a backend, using the `--use_native_inductor` flag.
- Add a personal script (`scripts/anwang/run_native_inductor_script.py`) for testing purposes.
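
A hedged sketch of how some of the torch.mtia APIs mentioned above might be exercised; it assumes an MTIA-enabled PyTorch build, and the exact set of available helpers may differ by version.

```python
import torch

# Sketch only: assumes an MTIA-enabled PyTorch build with at least one device.
if torch.mtia.is_available():
    print("device count:", torch.mtia.device_count())
    print("bf16 supported:", torch.mtia.is_bf16_supported())

    x = torch.randn(1024, 1024, device="mtia")
    # memory_allocated reports the bytes currently allocated on the device.
    print("allocated bytes:", torch.mtia.memory_allocated())
```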

Note:
- This approach (option 3) aims to provide a PyTorch-native path for Inductor integration on MTIA, minimizing onboarding overhead. The downside of this approach is that it doesn't leverage MTIA-specific graph optimizations and is limited by eager launch overhead.
- MTIA will support another approach (option 2), based on WrapperFxCodegen, to provide the best performance. We should be able to reuse the fundamental changes of this diff for option 2, such as the device interfaces and stream/event APIs, especially since WrapperFxCodegen inherits from PythonWrapperCodegen.

Internal:
References:
- [post for context](https://fb.workplace.com/groups/mtiasw/permalink/1718377262384606/)
- [Inductor integration discussion(option 1/2/3)](https://docs.google.com/document/d/1p6363OXtVIRv1hPoaKlRSK3j-iir3QIbDd5bjyqCNig/edit?tab=t.0#heading=h.7s4ns6wcnhmb)
- [Project design doc(option 3)](https://docs.google.com/document/d/1jXUmhgoV9WvkMf-bcY3Od_kK9K_RDOdgHdt1LoQ5Tc4/edit?tab=t.0#heading=h.y43gwdqlv46w)
- [early prototyping diff](https://www.internalfb.com/diff/D75110196)
- [MPS integration PR](https://github.com/pytorch/pytorch/pull/153959)
- [empty_strided_xpu PR](https://github.com/pytorch/pytorch/pull/126678)

Differential Revision: [D79040806](https://our.internmc.facebook.com/intern/diff/D79040806/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/159211
Approved by: https://github.com/eellison, https://github.com/blaine-rister, https://github.com/jansel
2025-07-29 17:03:24 +00:00
fe0ff12dab Revert "[Inductor] Support native Inductor as backend for MTIA (#158526)"
This reverts commit cd68559d0451185f8521912c23e77b83d76b87cf.

Reverted https://github.com/pytorch/pytorch/pull/158526 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/158526#issuecomment-3122186057))
2025-07-26 17:58:00 +00:00
cd68559d04 [Inductor] Support native Inductor as backend for MTIA (#158526)
This diff/PR includes the changes needed to support native Inductor integration for MTIA. The goal is to support `torch.compile(backend="inductor")` for MTIA. Inductor should generate code (Triton kernels + Python wrapper code) similar to what it generates for CUDA, and the Triton kernels can be launched eagerly.

The changes include:
- Add the MTIA device interfaces used by Dynamo and Inductor, including device, stream, and event APIs.
- Add the required torch.mtia APIs, such as is_bf16_supported, memory_allocated, and set_stream_by_id.
- Add MTIA-specific codegen logic, for example loading the MTIA dynamic_library.
- Other necessary changes to integrate with Inductor codegen, following other devices such as CUDA and XPU.
- Integrate with the [empty_strided_mtia](https://www.internalfb.com/code/fbsource/[0d017d3a4a1bdff7253f9c66a9f38e77bd62166b]/fbcode/caffe2/aten/src/ATen/native/mtia/EmptyTensor.cpp?lines=49%2C63%2C71%2C74%2C78) API that we’ve added for the new MTIA ATen backend.
- A change in the Inductor runtime to avoid re-initializing MTIADriver.
- BUCK changes to include ATen-mtia in Inductor and to use the -USE_MTIA preprocessor flag.
- Update `test_mnist_e2e.py` to cover native Inductor as a backend, using the `--use_native_inductor` flag.
- Add a personal script (`scripts/anwang/run_native_inductor_script.py`) for testing purposes.

Note:
- This approach (option 3) aims to provide a PyTorch-native path for Inductor integration on MTIA, minimizing onboarding overhead. The downside of this approach is that it doesn't leverage MTIA-specific graph optimizations and is limited by eager launch overhead.
- MTIA will support another approach (option 2), based on WrapperFxCodegen, to provide the best performance. We should be able to reuse the fundamental changes of this diff for option 2, such as the device interfaces and stream/event APIs, especially since WrapperFxCodegen inherits from PythonWrapperCodegen.

Internal:
References:
- [post for context](https://fb.workplace.com/groups/mtiasw/permalink/1718377262384606/)
- [Inductor integration discussion(option 1/2/3)](https://docs.google.com/document/d/1p6363OXtVIRv1hPoaKlRSK3j-iir3QIbDd5bjyqCNig/edit?tab=t.0#heading=h.7s4ns6wcnhmb)
- [Project design doc(option 3)](https://docs.google.com/document/d/1jXUmhgoV9WvkMf-bcY3Od_kK9K_RDOdgHdt1LoQ5Tc4/edit?tab=t.0#heading=h.y43gwdqlv46w)
- [early prototyping diff](https://www.internalfb.com/diff/D75110196)
- [MPS integration PR](https://github.com/pytorch/pytorch/pull/153959)
- [empty_strided_xpu PR](https://github.com/pytorch/pytorch/pull/126678)

Differential Revision: [D78458745](https://our.internmc.facebook.com/intern/diff/D78458745/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/158526
Approved by: https://github.com/blaine-rister, https://github.com/jansel, https://github.com/eellison
2025-07-26 08:16:34 +00:00
001ebbf734 [MTIA] (4/n) Implement PyTorch APIs to query/reset device peak memory usage (#146751)
Summary: Public summary (shared with GitHub): This diff updates the unit test for the PyTorch API "reset_peak_memory_stats".
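
For context, a minimal usage sketch of the API under test (assuming an MTIA-enabled PyTorch build with an available device):

```python
import torch

# Sketch only: assumes an MTIA-enabled PyTorch build with an available device.
if torch.mtia.is_available():
    x = torch.randn(4096, 4096, device="mtia")
    peak_before = torch.mtia.max_memory_allocated()
    del x
    # Reset the peak counter so subsequent measurements start from the
    # current allocation level rather than the historical high-water mark.
    torch.mtia.reset_peak_memory_stats()
    peak_after = torch.mtia.max_memory_allocated()
    assert peak_after <= peak_before
```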

Test Plan:
```
buck2 test //mtia/host_runtime/torch_mtia/tests:test_torch_mtia_api -- -r test_reset_peak_memory_stats
```

https://www.internalfb.com/intern/testinfra/testrun/9007199321947161

Reviewed By: yuhc

Differential Revision: D68989900

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146751
Approved by: https://github.com/nautsimon
2025-02-11 03:51:48 +00:00
dcac3c3e06 [MTIA] (2/n) Implement PyTorch APIs to query/reset device peak memory usage (#146659)
Summary:
Public summary (shared with GitHub): This diff implements the correct version of the PyTorch API "max_memory_allocated".

Nit: The file previously contained two unit tests with the same name (due to an incorrect revert); I deleted the deprecated one so the correct version could be restored.

Test Plan:
```
buck2 test //mtia/host_runtime/torch_mtia/tests:test_torch_mtia_api -- -r test_max_memory_allocated
```

https://www.internalfb.com/intern/testinfra/testrun/12103424065182810

Reviewed By: yuhc

Differential Revision: D68988435

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146659
Approved by: https://github.com/nautsimon
2025-02-07 23:06:35 +00:00
805c4b597a PEP585 update - torch/_higher_order_ops torch/_subclasses torch/backends torch/compiler torch/cuda torch/masked torch/mtia torch/nested (#145202)
See #145101 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145202
Approved by: https://github.com/bobrenjc93
2025-01-20 22:37:26 +00:00
c7d7eff798 Revert "[MTIA] (3/n) Implement PyTorch APIs to query/reset device peak memory usage (#143347)"
This reverts commit efe21ee59dfdd6642cc693e69e07aa9d8be13eb9.

Reverted https://github.com/pytorch/pytorch/pull/143347 on behalf of https://github.com/huydhn due to D67118173 has been backed out internally ([comment](https://github.com/pytorch/pytorch/pull/143347#issuecomment-2557983266))
2024-12-21 04:04:16 +00:00
dabc9566c4 Revert "(MTIA) Move "empty_cache" API (#143402)"
This reverts commit c7d9f298072a3f59b39517e367c7d3d2ea30e6d9.

Reverted https://github.com/pytorch/pytorch/pull/143402 on behalf of https://github.com/huydhn due to The internal diff D67148738 has been reverted ([comment](https://github.com/pytorch/pytorch/pull/143402#issuecomment-2557982597))
2024-12-21 04:01:23 +00:00
c7d9f29807 (MTIA) Move "empty_cache" API (#143402)
Summary: This diff moves one of the memory-related APIs to its consolidated location, `mtia/memory.py`.
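
A small hedged sketch of calling the API from its consolidated location (assuming an MTIA-enabled PyTorch build with an available device):

```python
import torch

# Sketch only: assumes an MTIA-enabled PyTorch build with an available device.
if torch.mtia.is_available():
    x = torch.randn(2048, 2048, device="mtia")
    del x
    # After this move the API lives in mtia/memory.py; per the memory_stats
    # refactor it stays reachable through the torch.mtia.memory namespace.
    torch.mtia.memory.empty_cache()
```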

Test Plan:
```
buck2 test //mtia/host_runtime/torch_mtia/tests:test_torch_mtia_api
```

https://www.internalfb.com/intern/testinfra/testrun/13510798943184259

Reviewed By: nautsimon

Differential Revision: D67148738

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143402
Approved by: https://github.com/nautsimon
2024-12-20 17:39:06 +00:00
efe21ee59d [MTIA] (3/n) Implement PyTorch APIs to query/reset device peak memory usage (#143347)
Summary: This diff implements the "max_memory_allocated" PyTorch API for MTIA devices, which returns the peak device DRAM usage.
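
A hedged usage sketch of the implemented API (assuming an MTIA-enabled PyTorch build with an available device):

```python
import torch

# Sketch only: assumes an MTIA-enabled PyTorch build with an available device.
if torch.mtia.is_available():
    x = torch.randn(1024, 1024, device="mtia")
    # Returns the peak device DRAM usage, in bytes, observed so far
    # (or since the last reset of the peak stats).
    peak_bytes = torch.mtia.max_memory_allocated()
    print(f"peak device memory: {peak_bytes} bytes")
```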

Test Plan:
Passed the local unit test
```
buck2 test //mtia/host_runtime/torch_mtia/tests:test_torch_mtia_api -- -r test_max_memory_allocated
```

https://www.internalfb.com/intern/testinfra/testrun/8444249544807192

Reviewed By: yuhc, egienvalue

Differential Revision: D67118173

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143347
Approved by: https://github.com/nautsimon
2024-12-17 23:37:03 +00:00
92cc345683 Implement "torch.mtia.max_memory_allocated" API (#142406)
Summary: This diff implements the interface of the "torch.mtia.max_memory_allocated" API. The internal implementation will be addressed in a separate diff.

Test Plan:
Passed a local unit test: `buck run //mtia/host_runtime/torch_mtia/tests:test_torch_mtia_api`

```
----------------------------------------------------------------------
Ran 15 tests in 16.862s

OK
I1127 11:31:14.613909 2272144 afg_bindings.cpp:943] afg-aten::mul.out-dtype_Float-uqJKuNc0 executable has been unloaded
I1127 11:31:14.615438 2272144 afg_bindings.cpp:943] afg-add-dtype_Float-fa37JncC executable has been unloaded
```

Reviewed By: ttrung149, nautsimon

Differential Revision: D66553954

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142406
Approved by: https://github.com/nautsimon
2024-12-11 03:06:18 +00:00
005c5694eb Refactor "torch.mtia.memory_stats" API (#141723)
Summary:
This diff refactors the code for the "torch.mtia.memory_stats" API to maintain the same file hierarchy as its CUDA counterpart:
- All device memory APIs are now located under ".../mtia/memory.py".
- Device memory APIs can be accessed using either "torch.mtia.XYZ" or "torch.mtia.memory.XYZ" (both access paths are sketched below).
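
A minimal sketch of the two equivalent access paths (assuming an MTIA-enabled PyTorch build with an available device):

```python
import torch

# Sketch only: assumes an MTIA-enabled PyTorch build with an available device.
if torch.mtia.is_available():
    # Both calls reach the same underlying implementation after the refactor.
    stats_a = torch.mtia.memory_stats()
    stats_b = torch.mtia.memory.memory_stats()
    print(sorted(stats_a) == sorted(stats_b))
```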

Test Plan:
Passed a local unit test: `buck run //mtia/host_runtime/torch_mtia/tests:test_torch_mtia_api`

```
Ran 14 tests in 16.657s

OK
I1127 11:06:06.505201 2133030 afg_bindings.cpp:943] afg-aten::mul.out-dtype_Float-bBtLGD6Y executable has been unloaded
I1127 11:06:06.506654 2133030 afg_bindings.cpp:943] afg-add-dtype_Float-fa37JncC executable has been unloaded
W1127 11:06:08.731138 2133030 HazptrDomain.h:148] Tagged objects remain. This may indicate a higher-level leak of object(s) that use hazptr_obj_cohort.
```

Differential Revision: D66549179

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141723
Approved by: https://github.com/nautsimon
2024-12-09 19:19:19 +00:00