Using the existing WrapperFxCodegen backend, this PR prototypes an AOT version of it which will directly return a graph module.
How to use:
```python
exported_gm = torch.export.export(model, inp, dynamic_shapes=dynamic_shapes).module()
compiled_gm = torch._inductor.aot_compile(
exported_gm, inp, options={"fx_wrapper": True, "compile_threads": 1}
)
assert torch.allclose(model(*inp), compiled_gm(*inp))
```
The motivation behind this is that backends like ExecuTorch/MTIA would like to use inductor's optimization technologies, but might have their own graph lowering pipelines so they might not want to use AOTI (which generates an so).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/160765
Approved by: https://github.com/jansel
We package the weights and save them in `data/weights/` (`WEIGHTS_DIR`). In addition, we store a `weights_config.json` in the model folder for each model to specify which weight file corresponding to which weight name.
Models can share weights. We dedup the weights based on their underlying storage (`tensor.untyped_storate()`).
- Use `"aot_inductor.package_constants_on_disk": True` config to produce the `Weights` in aot_compile
- If we see `Weights` in aoti_files, we'll automatically package them to disk
- `"aot_inductor.package_constants_on_disk"` config and `"aot_inductor.package_constants_in_so"` config work independently.
- Use `load_pt2(package_path, load_weights_from_disk=True)` to load the weights from disk. `load_weights_from_disk` defaults to False.
Test Plan:
```
buck2 run @//mode/dev-nosan //caffe2/test/inductor:aot_inductor_package -- -r "test_package_shared_weights"
```
Tested with whisper at https://github.com/pytorch-labs/torchnative/pull/7
Rollback Plan:
Differential Revision: D74747190
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155241
Approved by: https://github.com/desertfire
For better tracking, we need to make maybe aliasing/mutating ops with proper tag. We need to special case native_batch_norm because it is not a CIA but has a wrong schema. I guess native_batch_norm will be removed at some point, so until then we just keep it around.
D60347117
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131990
Approved by: https://github.com/bdhirsh
# Why?
I want the following code to work.
minimal repro:
```
class M(torch.nn.Module):
def forward(self, dilate_flag):
return dilate_flag.item()
input1 = (torch.tensor([1], dtype=torch.bool, device="cuda"),)
model = M().cuda()
ep = torch.export.export(model, input1, strict=True)
path = torch._inductor.aot_compile(ep.module(), input1)
aot_model = torch._export.aot_load(path, device="cuda")
actual_output = aot_model(*input1)
```
error: AssertionError: Encountered an unsupported object of type <class 'torch.SymBool'> while writing the metadata for exported program
second error will be handled by https://github.com/pytorch/pytorch/pull/138760
# Motivation
I could technically bypass it with a torch.int tensor. However, it doesn't work with torch.cond. I want the following to work. It would also require https://github.com/pytorch/pytorch/pull/138760 for aot compile to work.
```
class M(torch.nn.Module):
def __init__(self) -> None:
super().__init__()
self.dilate_flag = 0
def forward(self, dilate_flag):
self.dilate_flag = dilate_flag.item()
def true_fn(dilate_flag):
return dilate_flag.clone()
def false_fn(dilate_flag):
return dilate_flag.clone()
torch.cond(
self.dilate_flag,
true_fn,
false_fn,
(dilate_flag,),
)
return self.dilate_flag
input1 = (torch.tensor([1], dtype=torch.bool, device="cuda"),)
input2 = (torch.tensor([0], dtype=torch.bool, device="cuda"),)
inputs = (input1, input2)
model = M().cuda()
for input in inputs:
expected_output = model(*input)
ep = torch.export.export(model, input, strict=False)
path = torch._inductor.aot_compile(ep.module(), input)
aot_model = torch._export.aot_load(path, device="cuda")
actual_output = aot_model(*input)
assert (
expected_output == actual_output
), f"henry they are not equal {expected_output} != {actual_output}"
```
Differential Revision: D64867504
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138765
Approved by: https://github.com/ydwu4
Summary:
Skip test_prepare_qat_conv_bn_fusion_getitem_placeholder when we use training ir, since it's only for bn-getitem pattern, but the pattern doesn't exist in training ir.
Remove BLOCK_LIST since it's empty.
Now all internal unittests will use training ir.
Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' caffe2/test/quantization:test_quantization -- -r test_prepare_qat_conv_bn_fusion_getitem_placeholder
buck2 run 'fbcode//mode/dev-nosan' caffe2/test:quantization_pt2e_qat -- -r test_prepare_qat_conv_bn_fusion_getitem_placeholder
```
Differential Revision: D62387987
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135729
Approved by: https://github.com/tugsbayasgalan
Summary:
Make quantization tests compatible with the new training IR.
With the new batch norm node `torch.ops.aten.batch_norm.default`, we don't need an additional getitem node after the bn node, so tests need to be fixed to not check for the getitem node.
We added a capture_pre_autograd_graph_using_training_ir() function, which returns True when we are using the training ir, and False otherwise. This way, the code supports both training ir and the old ir.
For now, we are just rolling out the training ir for fbcode internal tests.
Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_preserve_source_fn_stack
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_update_shared_qspec
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_conv2d
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_conv_bn_relu_fusion
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_conv_bn_fusion
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_conv_bn_fusion_literal_args
```
Reviewed By: andrewor14, tugsbayasgalan
Differential Revision: D61292102
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134259
Approved by: https://github.com/tugsbayasgalan
Summary:
Change ReorderConvertTest to work with the new `capture_pre_autograd_graph` implementation using D61175223.
Note that now `ReorderConvertTest` doesn't work with the old `capture_pre_autograd_graph` anymore.
Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//bolt/nn/executorch/passes/tests:optimize_test -- -r ReorderConvertTest
```
Differential Revision: D61507772
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134010
Approved by: https://github.com/tugsbayasgalan