mirror of
https://github.com/pytorch/pytorch.git
synced 2025-10-20 21:14:14 +08:00
* Added a cpp loader, `AOTIModelPackageLoader`, which can load the .pt2, build the .so, and create a runner. The Python-facing API is that users can directly call the `run` function, whereas in cpp, users who are more familiar with it can directly access the `runner_`. I couldn't figure out how to bind the `get_runner()` function to python...
* Added a new config, `aot_inductor.package_cpp_only`, which will **not** package the .so. This means that whenever the package is loaded, we will need to build the .so. This is turned off by default so that new environments do not need to rebuild their .so. `package_cpp_only` is a feature which torchchat intends to use to provide flexibility to users.
* Added a new config, `aot_inductor.metadata`, which stores user-provided metadata, serialized to the .pt2 as a JSON file. It also stores the device used when exporting, "cuda" or "cpu", so that at load time we can use that data to determine which `AOTIModelContainerRunner` to use. The metadata can be accessed through `loader.get_metadata()`. TODO is to move this metadata to the top-level `package_aoti` function so that we can remove the metadata as a config.
* Separated out `package_aoti` as a standalone function, instead of it automatically being called in Inductor. This prepares for the case where users compile multiple models and want to bundle them in one package. The specific use case is in torchchat, where we want to package the separately-exported encoder and decoder layers. An example of how to use this is in `test_multiple_methods`.
* `load_package` will load a singular model, given the model name.
* The loader doesn't support Windows for now; I think I need to add some more casing to make the build commands work on Windows?

Differential Revision: [D62329906](https://our.internmc.facebook.com/intern/diff/D62329906)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135374

Approved by: https://github.com/desertfire, https://github.com/malfet
24 lines
681 B
Python
from ctypes import c_void_p

from torch import Tensor

# Defined in torch/csrc/inductor/aoti_runner/pybind.cpp

# Tensor to AtenTensorHandle
def unsafe_alloc_void_ptrs_from_tensors(tensors: list[Tensor]) -> list[c_void_p]: ...
def unsafe_alloc_void_ptr_from_tensor(tensor: Tensor) -> c_void_p: ...

# AtenTensorHandle to Tensor
def alloc_tensors_by_stealing_from_void_ptrs(
    handles: list[c_void_p],
) -> list[Tensor]: ...
def alloc_tensor_by_stealing_from_void_ptr(
    handle: c_void_p,
) -> Tensor: ...

class AOTIModelContainerRunnerCpu: ...
class AOTIModelContainerRunnerCuda: ...

# Defined in torch/csrc/inductor/aoti_package/pybind.cpp
class AOTIModelPackageLoader: ...