Summary: AOTI already embeds the weights in the .so file, so the initial load does not need to load them again. This allows lowered modules to have different sets of weights on different hardware.
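A rough illustration of the idea, not the code in this change: the sketch below skips the external weight copy on the initial load and keeps whatever the artifact already carries. `LoweredModuleSketch`, `load_weights`, and the `is_initial_load` flag are hypothetical names, and the handling of later loads is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class LoweredModuleSketch:
    """Hypothetical stand-in for a lowered module; not a PyTorch class."""
    # Weights compiled into the artifact (standing in for the AOTI .so file).
    embedded_weights: dict
    # Weights the module currently runs with.
    active_weights: dict = field(default_factory=dict)
    # Counts how many times weights were copied in from outside the artifact.
    external_loads: int = 0


def load_weights(module, external_weights, is_initial_load):
    """Hypothetical loader: skip the external copy on the initial load."""
    if is_initial_load:
        # The artifact already carries its weights, so nothing is copied in.
        module.active_weights = module.embedded_weights
        return module
    # Assumed behavior for later loads: replace the weights explicitly.
    module.active_weights = dict(external_weights)
    module.external_loads += 1
    return module


# Two builds of the same module, each with its own embedded weights.
a100 = load_weights(LoweredModuleSketch({"w": "a100-layout"}), {"w": "generic"}, True)
h100 = load_weights(LoweredModuleSketch({"w": "h100-layout"}), {"w": "generic"}, True)
```

Because the initial load never overwrites the embedded weights, each hardware-specific build keeps its own set.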
Test Plan:
```
MODEL_TYPE=ads_mtml_offsite_cvr_oba_optout_dedicated_model
MODEL_ENTITY_ID=895279202
SNAPSHOT_ID=0
MODULE=merge
buck2 run mode/dev-nosan -c fbcode.nvcc_arch=a100,h100 -c fbcode.enable_gpu_sections=true fbcode//caffe2/torch/fb/model_transform/fx2trt/packaging:load_net_predictor -- --loadMode=Benchmark --inputNetFile=/data/users/$USER/models/${MODEL_ENTITY_ID}/${SNAPSHOT_ID}/${MODEL_ENTITY_ID}_${SNAPSHOT_ID}.predictor.disagg.gpu.${MODULE} --moduleName ${MODULE} --predictor-hardware-type 1 --submodToDevice "" --benchmarkDontRebatchSamples=true --benchmarkNumIterations 1000
```
Rollback Plan:
Differential Revision: D78383881
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158416
Approved by: https://github.com/henryoier, https://github.com/SherlockNoMad
Summary:
Moves the DelegateExecutor base class into PyTorch core. It provides the extension point for backend delegation in NativeRT.
Torch Native Runtime RFC: pytorch/rfcs#72
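As a loose sketch of what such an extension point looks like (written in Python for brevity; the actual class is C++ and its interface is not reproduced here), a runtime can hold a reference to an abstract executor and let each backend supply its own subclass. `DelegateExecutorSketch`, `DoublingDelegate`, `execute_delegated_node`, and the `run` method name are all hypothetical.

```python
from abc import ABC, abstractmethod


class DelegateExecutorSketch(ABC):
    """Hypothetical abstract base; a backend subclasses it to plug in."""

    @abstractmethod
    def run(self, inputs: list) -> list:
        """Execute the delegated subgraph on the backend."""


class DoublingDelegate(DelegateExecutorSketch):
    """Toy backend that doubles every input value."""

    def run(self, inputs: list) -> list:
        return [x * 2 for x in inputs]


def execute_delegated_node(delegate: DelegateExecutorSketch, inputs: list) -> list:
    # The runtime only depends on the abstract interface, not on the backend.
    return delegate.run(inputs)


outputs = execute_delegated_node(DoublingDelegate(), [1, 2, 3])
```

The base class itself cannot be instantiated, which mirrors why a change that only adds the interface has little behavior to test on its own.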
Test Plan:
This is only a virtual base class, so relying on internal CI is sufficient.
Rollback Plan:
Differential Revision: D76351984
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155581
Approved by: https://github.com/zhxchen17