Commit Graph

6 Commits

cd9ee49a69 [aoti] Add cpp loader (#135374)
* Added a cpp loader, `AOTIModelPackageLoader`, which can load the .pt2, build the .so, and create a runner. The Python-facing API is that users can directly call the `run` function (see the sketch after this list), whereas in cpp users can directly access the `runner_` if they are more familiar with that. I couldn't figure out how to bind the `get_runner()` function to Python...
* Added a new config, `aot_inductor.package_cpp_only`, which will **not** package the .so. This means that whenever the package is loaded, we will need to build the .so. It is turned off by default so that new environments do not need to rebuild their .so. `package_cpp_only` is a feature that torchchat intends to use to give users flexibility.
* Added a new config, `aot_inductor.metadata`, which stores user-provided metadata, serialized into the .pt2 as a JSON file. It also stores the device used when exporting ("cuda" or "cpu") so that at load time we can determine which AOTIModelContainerRunner to use. The metadata can be accessed through `loader.get_metadata()`. TODO: move this metadata to the top-level `package_aoti` function so that we can remove the metadata config.
* Separated out `package_aoti` as a standalone function, instead of having it called automatically in inductor. This prepares for the case where users compile multiple models and want to bundle them in one package. The specific use case is in torchchat, where we want to package the separately-exported encoder and decoder layers. An example of how to use this is in `test_multiple_methods`.
* `load_package` will load a single model, given the model name.
* The loader doesn't support Windows for now; I think I need to add some more special-casing to make the build commands work on Windows.
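A minimal sketch of that Python-facing flow, assuming the loader is bound under `torch._C._aoti` (the binding location and the `run` signature are assumptions based on the description above):

```python
import torch

# Hypothetical inputs, for illustration only.
example_inputs = [torch.randn(2, 3)]

# The loader reads the .pt2, builds the .so, and creates a runner internally;
# the binding path below is an assumption.
loader = torch._C._aoti.AOTIModelPackageLoader("model.pt2")

print(loader.get_metadata())          # user-provided metadata, including the export device
outputs = loader.run(example_inputs)  # the Python-facing `run` entry point
```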

Differential Revision: [D62329906](https://our.internmc.facebook.com/intern/diff/D62329906)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135374
Approved by: https://github.com/desertfire, https://github.com/malfet
2024-09-11 03:00:01 +00:00
09f9c256ad Add basic mypy annotations to inductor (#132416)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132416
Approved by: https://github.com/XuehaiPan, https://github.com/jamesjwu
ghstack dependencies: #132415
2024-08-04 18:43:37 +00:00
f2ddd5e9e0 Revert "Add basic mypy annotations to inductor (#132416)"
This reverts commit 78927d37f6085a0b30269cceb731d8097302c091.

Reverted https://github.com/pytorch/pytorch/pull/132416 on behalf of https://github.com/ZainRizvi due to Sorry, this PR has entered a weird state in the diff train. Trying to revert it to skip it, and then we can try relanding it ([comment](https://github.com/pytorch/pytorch/pull/132415#issuecomment-2267631785))
2024-08-04 18:39:29 +00:00
78927d37f6 Add basic mypy annotations to inductor (#132416)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132416
Approved by: https://github.com/XuehaiPan, https://github.com/jamesjwu
ghstack dependencies: #132415
2024-08-01 20:14:25 +00:00
b6d477fd56 [BE][Easy][16/19] enforce style for empty lines in import segments in torch/_i*/ (#129768)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by the linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```
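For reference, a hypothetical illustration of the enforced style: single blank lines separating the `__future__`/stdlib, third-party, and first-party import segments (the module names here are illustrative only):

```python
from __future__ import annotations

import os
import sys

import sympy

import torch
from torch._inductor import config
```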

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129768
Approved by: https://github.com/jansel
2024-07-20 16:20:58 +00:00
cbf274d4a7 [aoti] Add packaging solution (#129895)
In this PR, I added support for packaging the AOTI-generated files into a zipfile and loading it in Python.

`compile_so` takes the path to the package, a device, and a desired .so location; it compiles the package into a .so and saves it to the specified location.
`load_package` takes a path to the package and a device, calls `_extract_so`, and then creates a callable to run the compiled model. A sketch of both entry points follows.
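A hedged sketch of those two entry points; the module path matches the workflow below, while the argument names are assumptions from the description above:

```python
from torch._inductor.package import compile_so, load_package  # compile_so location assumed

# Compile the package into a .so and save it to the requested location
# (argument names are assumptions).
so_path = compile_so("my_path.pt2", device="cuda", so_path="model.so")

# Extract/build the .so and wrap it in a callable that runs the compiled model.
compiled_model = load_package("my_path.pt2", "cuda")
```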

The zipfile generated looks like the following:
```
|- version
|- archive_format
|- data
   |- aotinductor
      |- cbtnafqaqrhvwztv7xudlal4xs6sofxa5oxccyuaqtrt6aozaklx.cubin  # AOTI cuda generated cubin files
      |- cskkqtna23bty2v3aq7g2q37cxrgufehlkuaaolhlgug5zg6fuwe.cpp  # AOTI generated cpp file
      |- cskkqtna23bty2v3aq7g2q37cxrgufehlkuaaolhlgug5zg6fuwe_compile_flags  # Flags for compiling the .o
      |- c6qqtnpgwfi3dv5nb76ai773kt45ezoxfwdmd7q37lvq6fs2tnoi.o  # AOTI saved const.o
      |- cskkqtna23bty2v3aq7g2q37cxrgufehlkuaaolhlgug5zg6fuwe_linker_flags  # Flags for linking the files to form the .so
   |- constants
      |- constants.pt  # Constants saved using torch.save, can be loaded using mmap
```
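Since the package is an ordinary zipfile, its contents can be inspected directly; a minimal sketch assuming only the layout shown above:

```python
import zipfile

# List the archive contents, e.g. version, archive_format,
# data/aotinductor/*.cpp, data/constants/constants.pt.
with zipfile.ZipFile("my_path.pt2") as zf:
    for name in zf.namelist():
        print(name)
```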

The workflow is something like:
```python
with torch.no_grad():
    ep = torch.export.export(
        model,
        example_inputs,
        dynamic_shapes=dynamic_shapes,
        strict=False,
    )
    gm = ep.module()
    package_path = torch._inductor.aot_compile(
        gm,
        example_inputs,
        options={
            "aot_inductor.output_path": "my_path.pt2",  # or a directory
            "aot_inductor.package": True,
        },
    )
compiled_model = torch._inductor.package.load_package(package_path, device)
return compiled_model
```

I tried turning on mmap-based loading of the weights by default, but had some trouble with it, so that is left as a TODO (see the sketch below).
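For reference, `torch.load` exposes an `mmap=True` flag for files saved with `torch.save`, which is presumably the mechanism the TODO refers to (the path here is illustrative):

```python
import torch

# Memory-map the packaged constants instead of reading them eagerly into RAM.
constants = torch.load("constants.pt", mmap=True, weights_only=True)
```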

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129895
Approved by: https://github.com/malfet
2024-07-17 13:56:58 +00:00