4742080ed9
[AOTI XPU] Enable Cpp wrapper for Intel GPU. ( #135318 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135318
Approved by: https://github.com/jgong5 , https://github.com/EikanWang , https://github.com/guangyey , https://github.com/desertfire
2024-11-26 11:51:32 +00:00
7b2138b864
[inductor] fix uncaught exception when checking for openmp on macos ( #141208 )
...
Based on #133776
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141208
Approved by: https://github.com/Skylion007
2024-11-21 22:17:52 +00:00
12e95aa4ee
[BE]: Apply PERF401 autofixes from ruff ( #140980 )
...
* Automatically applies ruff rule PERF401. Turns loops into equivalent list comprehensions, which are faster and do not leak the loop variables into the enclosing scope.
* List comprehensions not only often have better typing, but are also 50+% faster than for loops in overhead. They also preserve length information and are easier for the interpreter to optimize.
* Manually went back and made mypy happy after the change.
* Also fixed style lints in files covered by flake8 but not by pyfmt
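As a hedged illustration (the function and variable names below are ours, not from the PR), the rewrite PERF401 applies looks like:

```python
def squares_loop(values):
    # Loop form: appends one element at a time and leaks `item`
    # into the enclosing scope after the loop ends.
    result = []
    for item in values:
        result.append(item * item)
    return result

def squares_comprehension(values):
    # Comprehension form: same output, less per-iteration interpreter
    # overhead, and `item` stays scoped to the comprehension.
    return [item * item for item in values]

print(squares_comprehension([1, 2, 3]))  # prints [1, 4, 9]
```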
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140980
Approved by: https://github.com/justinchuby , https://github.com/malfet
2024-11-20 17:52:07 +00:00
263a5bf95e
[cpu] Modify inductor opt flag --- ftree-loop-vectorize ( #136827 )
...
Reopen https://github.com/pytorch/pytorch/pull/121782 , as more optimizations have landed.
Fixes https://github.com/pytorch/pytorch/issues/115261 , https://github.com/pytorch/pytorch/issues/113017 .
For CPU inductor path, remove -ftree-loop-vectorize from optimization flags to fix functional issues.
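A minimal sketch of the change (the helper name and flag list are hypothetical, not the actual inductor code):

```python
def cpu_optimization_flags(base_flags):
    # Hypothetical helper: filter the problematic auto-vectorization flag
    # out of the optimization flags used for the CPU inductor path.
    return [flag for flag in base_flags if flag != "-ftree-loop-vectorize"]

print(cpu_optimization_flags(["-O3", "-ftree-loop-vectorize", "-ffast-math"]))
# prints ['-O3', '-ffast-math']
```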
### Validation on 3 benchmark suites
#### FP32

Outlier models (speedup<0.8, single socket): None.
#### BF16

Outlier models (speedup<0.8, single socket multi threads):
- functorch_dp_cifar10 0.58
- opacus_cifar10 0.57
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136827
Approved by: https://github.com/jansel , https://github.com/jgong5
2024-11-12 01:26:18 +00:00
347f96061f
Revert "[cpu] Modify inductor opt flag --- ftree-loop-vectorize ( #136827 )"
...
This reverts commit cf0bb6c435c58db4c72e489f462e1a0ebe310f14.
Reverted https://github.com/pytorch/pytorch/pull/136827 on behalf of https://github.com/ZainRizvi due to Sorry but this breaks internally. See D65605094 for more details ([comment](https://github.com/pytorch/pytorch/pull/136827#issuecomment-2465805271 ))
2024-11-08 21:52:33 +00:00
cf0bb6c435
[cpu] Modify inductor opt flag --- ftree-loop-vectorize ( #136827 )
...
Reopen https://github.com/pytorch/pytorch/pull/121782 , as more optimizations have landed.
Fixes https://github.com/pytorch/pytorch/issues/115261 , https://github.com/pytorch/pytorch/issues/113017 .
For CPU inductor path, remove -ftree-loop-vectorize from optimization flags to fix functional issues.
### Validation on 3 benchmark suites
#### FP32

Outlier models (speedup<0.8, single socket): None.
#### BF16

Outlier models (speedup<0.8, single socket multi threads):
- functorch_dp_cifar10 0.58
- opacus_cifar10 0.57
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136827
Approved by: https://github.com/jansel , https://github.com/jgong5
2024-11-07 02:49:52 +00:00
b021486405
Enable Windows Arm64 ( #133088 )
...
This PR enables PyTorch for Windows on Arm64 - CPU only.
Currently, there aren't any checks in place to build and test for Windows on Arm64, but we're working to implement those as soon as possible.
We recommend using [Arm Performance Libraries (APL)](https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Libraries ) as a BLAS option, which is introduced in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133088
Approved by: https://github.com/malfet
Co-authored-by: cristian panaite <panaite.cristian2000@gmail.com >
Co-authored-by: Stefan-Alin Pahontu <56953855+alinpahontu2912@users.noreply.github.com >
Co-authored-by: Ozan Aydin <148207261+ozanMSFT@users.noreply.github.com >
2024-10-24 16:10:44 +00:00
2414c3f534
AOTI fixes for MI300 lowering ( #137939 )
...
Summary:
1) Add sleef back to enable SIMD on AMD
2) Add kpack to the triton compute_meta for AMD triton, since user-defined triton kernels will use this for k-dim packing
Test Plan:
```
HIP_VISIBLE_DEVICES=0 TORCHINDUCTOR_UNIQUE_KERNEL_NAMES=1 TORCH_LOGS="output_code,graph_code" buck run mode/{opt,amd-gpu} -c fbcode.triton_backend=amd -c fbcode.enable_gpu_sections=true //hpc/new/models/feed/benchmark:feed_lower_benchmark -- --skip-flop-estimation --skip-trt --skip-ait --enable-aot-inductor --sync-mode=0 --gpu-trace --sample-input-tile-factor=1 --load="manifold://ads_storage_fblearner/tree/user/facebook/fblearner/predictor/925729118/0/gpu_lowering/input.merge" --lowering-input-str='{"serialized_inference_model_input_path":"ads_storage_fblearner/tree/user/facebook/fblearner/predictor/925729118/0/gpu_lowering/input.merge","serialized_inference_model_output_path":"ads_storage_fblearner/tree/user/facebook/fblearner/predictor/925729118/0/gpu_lowering/mi300_output.merge","submodule_names_to_lower":["merge"],"inductor_lowering_context":{"aot_inductor_lowering_settings":{"use_scripting":true,"preset_lowerer":"ifu_cint;disable_new_lowering_weights;disable_dper_passes:passes=fuse_parallel_linear_no_weight_change","precision":3,"output_precision":3, "remove_unexpected_type_cast":false, "sample_input_tile_factor":32}},"model_entity_id":925729118,"model_snapshot_id":0,"add_sample_inputs":false,"hardware_type":0,"platform_arch":1,"dense_in_place_format":2}' --precision=bf16 2>&1 | tee local_benchmark_log.txt
```
Differential Revision: D64262924
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137939
Approved by: https://github.com/frank-wei
2024-10-17 16:09:04 +00:00
fe43f72be7
[AOTI] Remove the non-ABI-compatible mode (part 2) ( #138047 )
...
Summary: Continue to clean up non-ABI-compatible mode related code.
Differential Revision: [D64444327](https://our.internmc.facebook.com/intern/diff/D64444327 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138047
Approved by: https://github.com/chenyang78
ghstack dependencies: #137982 , #138016 , #138009
2024-10-17 02:54:24 +00:00
a0a978ce23
[aoti config] add raise_error_on_ignored_optimization ( #138035 )
...
Summary: Unfortunately this means adding another config.
Test Plan: ci
Differential Revision: D64437699
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138035
Approved by: https://github.com/chenyang78 , https://github.com/desertfire
2024-10-16 18:38:47 +00:00
c04b35a5ae
[AOTI] Add standalone version of TORCH_CHECK ( #136873 )
...
Summary: In the standalone mode, TORCH_CHECK throws std::runtime_error, instead of c10::Error. The goal is to cut dependency on libtorch. Specifically, AOTI generates CPU code which may call ATen vectorization ops and we need to make sure those ops are self-contained.
Differential Revision: [D63911928](https://our.internmc.facebook.com/intern/diff/D63911928 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136873
Approved by: https://github.com/albanD , https://github.com/chenyang78
2024-10-08 15:30:01 +00:00
b3972ee19a
[triton] Unify build_paths.py for NV & AMD, fix typing ( #136952 )
...
Summary: Some build improvements.
Test Plan: CI
Differential Revision: D63583959
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136952
Approved by: https://github.com/bertmaher
2024-09-30 21:51:45 +00:00
2a178a6982
Avoid changing FTZ/DAZ flags in CPP builder ( #136466 )
...
Fixes https://github.com/pytorch/pytorch/issues/136273
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136466
Approved by: https://github.com/ezyang
2024-09-24 14:39:17 +00:00
67735d1ee8
[Inductor] Generalize is_cuda to specific device_type to make cpp_wrapper mode extensible ( #134693 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134693
Approved by: https://github.com/ezyang , https://github.com/EikanWang , https://github.com/jansel
2024-09-10 10:11:13 +00:00
29d72c1100
[inductor] check intel compiler minimal version ( #135209 )
...
On Windows: early versions of icx have a `-print-file-name` issue and cannot preload correctly for inductor. Add a minimal version check for the Intel compiler.
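A hedged sketch of such a check; the minimum version and the parsing are illustrative, the real threshold and lookup live in the PR:

```python
import re

MIN_ICX_VERSION = (2024, 0, 0)  # hypothetical minimum, not the PR's value

def parse_icx_version(output):
    # Pull a dotted version like "2024.1.0" out of `icx-cl --version` output.
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise RuntimeError(f"cannot parse Intel compiler version from {output!r}")
    return tuple(int(part) for part in match.groups())

def check_icx_minimal_version(version):
    # Tuple comparison orders (major, minor, patch) lexicographically.
    if version < MIN_ICX_VERSION:
        raise RuntimeError(
            f"Intel compiler {version} is below the minimum {MIN_ICX_VERSION}; "
            "older icx has a -print-file-name issue and cannot preload correctly"
        )

check_icx_minimal_version(parse_icx_version("Intel oneAPI DPC++/C++ Compiler 2024.1.0"))
```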
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135209
Approved by: https://github.com/ezyang
2024-09-06 03:21:07 +00:00
6448d351db
[inductor] clean up cpp_builder code. ( #134909 )
...
Clean up duplicated code in cpp_builder.
Hi @henrylhtsang , could you please help land this internally?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134909
Approved by: https://github.com/henrylhtsang
2024-09-04 05:29:08 +00:00
c40e622966
[inductor] add openmp config for intel compiler on Linux. ( #134973 )
...
Configure `openmp` for the Intel compiler on Linux.
Based on this PR, we can confirm the Intel-optimized libraries are built and working well.
<img width="1039" alt="image" src="https://github.com/user-attachments/assets/838d5114-c778-4961-9cfe-39a814647089 ">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134973
Approved by: https://github.com/jgong5 , https://github.com/jansel
2024-09-03 20:10:21 +00:00
136badae64
[inductor] preload icx built in math libs ( #134870 )
...
The Intel compiler implements more math libraries than clang, for better performance.
We need to preload them like the openmp library.
Reproduce the UT:
```cmd
pytest test/inductor/test_cpu_cpp_wrapper.py -v -k test_silu_cpu_dynamic_shapes_cpp_wrapper
```
Dependent modules:
<img width="804" alt="Image" src="https://github.com/user-attachments/assets/9a672e03-ebf5-4ebb-b182-09180e6f7841 ">
Local test pass:
<img width="857" alt="image" src="https://github.com/user-attachments/assets/afbb8c1c-8fcc-4d64-a3ad-c8521b137d2d ">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134870
Approved by: https://github.com/jansel
2024-08-31 04:50:31 +00:00
15f5a4858b
[inductor] enable Intel Compiler(icx-cl) for inductor windows ( #134772 )
...
This PR enables the Intel compiler (`icx-cl`) for Windows inductor, like the previous PR https://github.com/pytorch/pytorch/pull/134444 which enabled clang.
Changes:
1. Fix an icx-cl crash caused by wrongly decoded args; the correct decoding is "utf-8".
2. Add an Intel compiler check and an Intel compiler Windows driver check (icx-cl).
3. Add the Intel compiler openmp args config.
4. Add the Intel compiler openmp binary preload.
For intel compiler openmp binary path:
<img width="788" alt="image" src="https://github.com/user-attachments/assets/54c76356-018d-4bef-a9b7-0ea150fd7aba ">
For performance, the Intel compiler (`icx-cl`) performs much better than MSVC (`cl`):
<img width="875" alt="image" src="https://github.com/user-attachments/assets/67865faf-b1de-4535-917a-486b72527204 ">
Append `clang-cl` performance data:
<img width="821" alt="image" src="https://github.com/user-attachments/assets/476f4568-bf58-457f-b73d-4e57f49be384 ">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134772
Approved by: https://github.com/jgong5 , https://github.com/jansel
2024-08-30 17:51:46 +00:00
8b4c487581
Fix AOTInductor compilation on ROCm ( #134522 )
...
Summary:
The original PR (https://github.com/pytorch/pytorch/pull/124123 ) was broken by the cpp_builder refactoring, so resubmit it as a fix.
Test Plan: Test with command here: https://www.internalfb.com/phabricator/paste/view/P1549765548
Differential Revision: D61827208
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134522
Approved by: https://github.com/frank-wei
2024-08-29 21:59:04 +00:00
1dd4b9221b
[inductor] enable clang for Windows inductor ( #134444 )
...
Changes:
1. Add Windows clang-cl compiler check.
2. Add openmp config for clang-cl.
3. Preload libomp.dll when using clang.
4. Add compiler flags syntax check for `clang` and `clang++`.
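The preload step can be sketched like this; the helper name and search path are assumptions, the real lookup derives the path from the clang installation:

```python
import ctypes
import os

def preload_libomp(llvm_bin_dir):
    # Hypothetical helper: load the LLVM OpenMP runtime into the process
    # up front so compiled extensions find it when they are loaded later.
    dll_path = os.path.join(llvm_bin_dir, "libomp.dll")
    if os.path.exists(dll_path):
        ctypes.CDLL(dll_path)

# No-op when the runtime is absent, e.g. on a machine without LLVM:
preload_libomp(os.path.join("C:\\", "LLVM", "bin"))
```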
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134444
Approved by: https://github.com/jgong5 , https://github.com/jansel , https://github.com/malfet
2024-08-26 18:19:59 +00:00
98d6a6eb7d
[inductor] clean up TODO comments. ( #133718 )
...
clean up TODO comments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133718
Approved by: https://github.com/henrylhtsang
2024-08-16 22:12:01 +00:00
89795da5e3
[inductor] process compile_only case in all build options class. ( #129975 )
...
Optimize the `compile_only` logic. The original code only applied to `CppTorchCudaOptions`; this PR makes it apply to all build option classes.
Changes:
1. Remove `libraries_dirs` and `libraries` settings when `compile_only` is set.
2. Remove `compile_only` from `CppTorchCudaOptions`.
3. Make `compile_only` apply to all classes.
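A hedged sketch of the idea (class and field names are illustrative, not the actual cpp_builder API): when only compiling to an object file, linker inputs are irrelevant, so every build-options class drops them.

```python
class BuildOptions:
    # Illustrative base class; cpp_builder's real classes differ.
    def __init__(self, libraries=None, libraries_dirs=None, compile_only=False):
        self.compile_only = compile_only
        if compile_only:
            # `-c` stops before linking, so -L/-l settings would be unused.
            self.libraries = []
            self.libraries_dirs = []
        else:
            self.libraries = list(libraries or [])
            self.libraries_dirs = list(libraries_dirs or [])

opts = BuildOptions(libraries=["torch"], libraries_dirs=["/opt/lib"], compile_only=True)
print(opts.libraries, opts.libraries_dirs)  # prints [] []
```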
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129975
Approved by: https://github.com/henrylhtsang
2024-08-13 16:45:27 +00:00
9f0d90655d
[inductor] cpp_builder add dynamo time trace for compile_file ( #133103 )
...
trace `compile_file` time for cpp_builder.
Ref: https://github.com/pytorch/pytorch/pull/132328/files#diff-c9b517f8db609ffa866804dfa2689188a4fee20abacaa0b0dca91625c1b5cb8dR2224
<img width="994" alt="image" src="https://github.com/user-attachments/assets/862c7943-79dc-4d06-b398-a09595ad1295 ">
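The shape of such a timing trace can be sketched with a generic context manager; this helper is ours, the PR hooks `compile_file` into dynamo's existing timing instrumentation:

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def timed(name):
    # Record the wall-clock duration of the wrapped region, the way
    # dynamo's compile-time tracing records compile_file.
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start

with timed("compile_file"):
    sum(range(100000))  # stand-in for invoking the C++ compiler
```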
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133103
Approved by: https://github.com/ezyang
2024-08-10 04:55:02 +00:00
e98eac76b3
[inductor] switch AotCodeCompiler to new cpp_builder. (take 3) ( #132766 )
...
Summary: This is basically https://github.com/pytorch/pytorch/pull/131304 together with https://github.com/pytorch/pytorch/pull/132594 and absolute path fix for fbcode.
Test Plan: ci
Differential Revision: D60773405
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132766
Approved by: https://github.com/xuhancn , https://github.com/chenyang78 , https://github.com/desertfire
2024-08-06 23:56:34 +00:00
a672f6c84e
[inductor] unify SUBPROCESS_DECODE_ARGS variable in cpp_builder.py ( #132615 )
...
[inductor] unify SUBPROCESS_DECODE_ARGS variable in cpp_builder.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132615
Approved by: https://github.com/jgong5 , https://github.com/desertfire
2024-08-05 16:00:35 +00:00
7f8a384a8f
[inductor] add msvc_cl compiler check ( #132571 )
...
add `msvc_cl` compiler check.
Local test:
<img width="880" alt="image" src="https://github.com/user-attachments/assets/fe4da5e0-dd52-4dbc-831e-c32479e27a29 ">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132571
Approved by: https://github.com/ezyang
2024-08-04 03:48:25 +00:00
36ec0fdf10
[inductor] check compiler exist on Windows. ( #132533 )
...
In the current Windows env, if the MSVC environment is not activated, no clear error points to the compiler:
<img width="904" alt="image" src="https://github.com/user-attachments/assets/725ea608-d181-40b1-8930-42fe2b32643a ">
With this PR, we help users see that the issue comes from the compiler.
<img width="1034" alt="image" src="https://github.com/user-attachments/assets/8515a796-e3e9-4909-a68f-8a14d4864951 ">
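A hedged sketch of the check (the helper name and error text are illustrative):

```python
import shutil

def check_compiler_exist(compiler="cl"):
    # Raise a clear, actionable error instead of a confusing downstream
    # failure when the compiler is not found on PATH.
    if shutil.which(compiler) is None:
        raise RuntimeError(
            f"Compiler '{compiler}' not found. On Windows, activate the MSVC "
            "environment (e.g. run vcvars64.bat) before compiling."
        )

try:
    check_compiler_exist("definitely-not-a-compiler")
except RuntimeError as e:
    print(e)
```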
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132533
Approved by: https://github.com/jansel
2024-08-03 07:47:11 +00:00
475da800c7
[inductor] optimize cflags for Windows. ( #131980 )
...
Changes:
1. Optimize cflags for Windows. Ref: https://github.com/pytorch/pytorch/blob/v2.4.0/torch/utils/cpp_extension.py#L215
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131980
Approved by: https://github.com/jgong5 , https://github.com/jansel
2024-07-30 02:59:51 +00:00
28fd2e905d
[inductor] enhance cpp_builder lint check. ( #131752 )
...
enhance cpp_builder `mypy` check.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131752
Approved by: https://github.com/jgong5 , https://github.com/jansel
2024-07-27 02:46:27 +00:00
72d17d95d7
[inductor] Enable dynamo for Windows. RC1 ( #131286 )
...
Changes:
1. Enable Windows in `check_if_inductor_supported`.
2. Disable Windows in `AotCodeCompiler`.
3. Force Windows inductor to `c++20` to support `std::enable_if_t`.
4. Temporarily disable the `test_x86inductor_quantizer` UT on `Windows`; some issues still need to be fixed: https://github.com/pytorch/pytorch/pull/131308 .
Based on this PR, I have successfully run the first model, `resnet18`, on Windows inductor.
<img width="1036" alt="image" src="https://github.com/user-attachments/assets/2642bda1-1845-417a-aaba-39bdf22e65d6 ">
TODO:
1. Upgrade pytorch Windows build to `c++20`.
2. Fix and re-enable `test_x86inductor_quantizer` UT on `Windows`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131286
Approved by: https://github.com/jgong5 , https://github.com/jansel
2024-07-24 15:26:55 +00:00
b6d477fd56
[BE][Easy][16/19] enforce style for empty lines in import segments in torch/_i*/
( #129768 )
...
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501 . Most changes are auto-generated by linter.
You can review these PRs via:
```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129768
Approved by: https://github.com/jansel
2024-07-20 16:20:58 +00:00
6e7b9ee8a0
[inductor] adapt Windows file path ( #130713 )
...
This PR depends on https://github.com/pytorch/pytorch/pull/130132 landing successfully.
Detailed log: https://github.com/pytorch/pytorch/issues/124245#issuecomment-2211889758
After the file path was adapted for Windows, the first Windows inductor case ran successfully.
```python
import torch

def foo(x, y):
    a = torch.sin(x)
    b = torch.cos(x)
    return a + b

opt_foo1 = torch.compile(foo)
print(opt_foo1(torch.randn(10, 10), torch.randn(10, 10)))
```
Result:

Co-authored-by: Jiong Gong <jiong.gong@intel.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130713
Approved by: https://github.com/jgong5 , https://github.com/jansel , https://github.com/desertfire
2024-07-18 23:19:38 +00:00
41f5d5dcaf
Revert "[inductor] adapt Windows file path ( #130713 )"
...
This reverts commit e51e971a8675826e517a78bf2a97f8e2df5f4abd.
Reverted https://github.com/pytorch/pytorch/pull/130713 on behalf of https://github.com/clee2000 due to sorry but I think its still failing, this time on windows CUDA https://github.com/pytorch/pytorch/actions/runs/9971126834/job/27552761451 bb62e9d7c3
. It was not run on PR due to being on the periodic workflow, which isnt usually run on PRs due to capacity issues for windows CUDA machines. I will add ciflow/periodic to the PR to ensure the test gets run ([comment](https://github.com/pytorch/pytorch/pull/130713#issuecomment-2234092078 ))
2024-07-17 19:37:16 +00:00
cbf274d4a7
[aoti] Add packaging solution ( #129895 )
...
In this PR, I added support for packaging the AOTI-generated files into a zipfile and loading it in Python.
`compile_so` takes the path to the package, a device, and a desired so_path location; it compiles the package into a .so and saves it to the specified location.
`load_package` takes a path to the package and a device, calls _extract_so, and then creates a callable to run the compiled model.
The zipfile generated looks like the following:
```
|- version
|- archive_format
|- data
   |- aotinductor
      |- cbtnafqaqrhvwztv7xudlal4xs6sofxa5oxccyuaqtrt6aozaklx.cubin  # AOTI cuda generated cubin files
      |- cskkqtna23bty2v3aq7g2q37cxrgufehlkuaaolhlgug5zg6fuwe.cpp  # AOTI generated cpp file
      |- cskkqtna23bty2v3aq7g2q37cxrgufehlkuaaolhlgug5zg6fuwe_compile_flags  # Flags for compiling the .o
      |- c6qqtnpgwfi3dv5nb76ai773kt45ezoxfwdmd7q37lvq6fs2tnoi.o  # AOTI saved const.o
      |- cskkqtna23bty2v3aq7g2q37cxrgufehlkuaaolhlgug5zg6fuwe_linker_flags  # Flags for linking the files to form the .so
   |- constants
      |- constants.pt  # Constants saved using torch.save, can be loaded using mmap
```
The workflow is something like:
```
with torch.no_grad():
    ep = torch.export.export(
        model,
        example_inputs,
        dynamic_shapes=dynamic_shapes,
        strict=False,
    )
    gm = ep.module()
    package_path = torch._inductor.aot_compile(
        gm,
        example_inputs,
        options={
            "aot_inductor.output_path": "my_path.pt2",  # or a directory
            "aot_inductor.package": True,
        },
    )
    compiled_model = torch._inductor.package.load_package(package_path, device)
    return compiled_model
```
I tried turning on loading the weights using mmap by default, but had some trouble with it, so that is left as a TODO.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129895
Approved by: https://github.com/malfet
2024-07-17 13:56:58 +00:00
e51e971a86
[inductor] adapt Windows file path ( #130713 )
...
This PR depends on https://github.com/pytorch/pytorch/pull/130132 landing successfully.
Detailed log: https://github.com/pytorch/pytorch/issues/124245#issuecomment-2211889758
After the file path was adapted for Windows, the first Windows inductor case ran successfully.
```python
import torch

def foo(x, y):
    a = torch.sin(x)
    b = torch.cos(x)
    return a + b

opt_foo1 = torch.compile(foo)
print(opt_foo1(torch.randn(10, 10), torch.randn(10, 10)))
```
Result:

Co-authored-by: Jiong Gong <jiong.gong@intel.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130713
Approved by: https://github.com/jgong5 , https://github.com/jansel , https://github.com/desertfire
2024-07-17 06:36:11 +00:00
5f3c356a56
Revert "[inductor] adapt Windows file path ( #130713 )"
...
This reverts commit 69e99172450e40536bf2e6c110183d34a0e283e2.
Reverted https://github.com/pytorch/pytorch/pull/130713 on behalf of https://github.com/clee2000 due to broke functorch\test_eager_transforms.py on windows https://github.com/pytorch/pytorch/actions/runs/9958208725/job/27530132704 69e9917245
. Test failure on PR is real, possibly force merged to get around lint error? ([comment](https://github.com/pytorch/pytorch/pull/130713#issuecomment-2231901793 ))
2024-07-16 22:07:55 +00:00
69e9917245
[inductor] adapt Windows file path ( #130713 )
...
This PR depends on https://github.com/pytorch/pytorch/pull/130132 landing successfully.
Detailed log: https://github.com/pytorch/pytorch/issues/124245#issuecomment-2211889758
After the file path was adapted for Windows, the first Windows inductor case ran successfully.
```python
import torch

def foo(x, y):
    a = torch.sin(x)
    b = torch.cos(x)
    return a + b

opt_foo1 = torch.compile(foo)
print(opt_foo1(torch.randn(10, 10), torch.randn(10, 10)))
```
Result:

Co-authored-by: Jiong Gong <jiong.gong@intel.com >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130713
Approved by: https://github.com/jgong5 , https://github.com/jansel , https://github.com/desertfire
2024-07-16 13:53:39 +00:00
e235db98c9
[Inductor] Add aot_mode UT to new cpp_builder. ( #130105 )
...
Changes:
1. Add `aot_mode` parameter to `validate_new_cpp_commands` UT.
2. Switch AotCodeCompiler vec isa command gen to new cpp_builder.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130105
Approved by: https://github.com/jgong5 , https://github.com/jansel
2024-07-09 04:08:35 +00:00
f9bb258892
Revert "[Inductor] Add aot_mode UT to new cpp_builder. ( #130105 )"
...
This reverts commit 21eeedb4554edab22b42bcb2f75f19e85652b72e.
Reverted https://github.com/pytorch/pytorch/pull/130105 on behalf of https://github.com/izaitsevfb due to Breaks 46 tests internally at meta with: OSError: CUDA_HOME environment variable is not set ([comment](https://github.com/pytorch/pytorch/pull/130105#issuecomment-2215392198 ))
2024-07-08 21:40:03 +00:00
5e467604c3
Revert "[inductor] switch AotCodeCompiler to new cpp_builder ( #130127 )"
...
This reverts commit dc5f37193f8d144d3de8525bf64eb1775d91e932.
Reverted https://github.com/pytorch/pytorch/pull/130127 on behalf of https://github.com/izaitsevfb due to Depends on #130105 which has to be reverted ([comment](https://github.com/pytorch/pytorch/pull/130127#issuecomment-2215355259 ))
2024-07-08 21:25:28 +00:00
dc5f37193f
[inductor] switch AotCodeCompiler to new cpp_builder ( #130127 )
...
Changes:
1. Switch `AotCodeCompiler` to new cpp_builder.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130127
Approved by: https://github.com/jgong5 , https://github.com/jansel
2024-07-06 18:44:13 +00:00
01ec03bac6
[inductor] switch HalideCodeCache to new cpp_builder. ( #130146 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130146
Approved by: https://github.com/jgong5 , https://github.com/jansel
2024-07-06 17:35:17 +00:00
21eeedb455
[Inductor] Add aot_mode UT to new cpp_builder. ( #130105 )
...
Changes:
1. Add `aot_mode` parameter to `validate_new_cpp_commands` UT.
2. Switch AotCodeCompiler vec isa command gen to new cpp_builder.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130105
Approved by: https://github.com/jgong5 , https://github.com/jansel
2024-07-04 19:08:56 +00:00
2926655761
[inductor] optimize cpp builder configuration code ( #129577 )
...
Changes:
1. Combine the choose-ISA condition dispatch code.
2. Unify the macOS openmp configuration code.
3. Clean up useless code.
Co-authored-by: Jason Ansel <jansel@jansel.net >
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129577
Approved by: https://github.com/jgong5 , https://github.com/jansel
2024-07-02 14:41:59 +00:00
567dd1a3ca
[inductor] unify toolchain code. ( #129816 )
...
This PR implements plan 2 of https://github.com/pytorch/pytorch/issues/124245#issuecomment-2197778902 and is a continuation of https://github.com/pytorch/pytorch/pull/129816
Changes:
1. Unify cpp builder's toolchain code.
2. Move all build-related code to `cpp_builder.py`.
3. Optimize the `codecache.py`, `cpp_builder.py`, and `cpu_vec_isa.py` import logic following: https://github.com/pytorch/pytorch/issues/124245#issuecomment-2197778902
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129816
Approved by: https://github.com/jansel
2024-07-02 09:52:06 +00:00
76259ebfdd
[inductor] split cpu vec isa into a dedicated file (keep git history) ( #129789 )
...
This PR implements plan 1 of https://github.com/pytorch/pytorch/issues/124245#issuecomment-2197778902
Changes:
1. Duplicate `codecache.py` to `cpu_vec_isa.py` with its `git history`.
<img width="745" alt="image" src="https://github.com/pytorch/pytorch/assets/8433590/106533da-ce80-4825-8271-35ffb3141f92 ">
2. Make `cpu_vec_isa.py` a dedicated file for CPU vec ISA; it is also easy to extend to more architectures and vec ISAs.
3. Update code for above changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129789
Approved by: https://github.com/jgong5 , https://github.com/jansel
2024-07-02 05:29:05 +00:00
433b691f98
Revert "[inductor] optimize cpp builder configuration code ( #129577 )"
...
This reverts commit 2e3ff394bf94d3b9cbab0fe8a93a9ea7c9cb4267.
Reverted https://github.com/pytorch/pytorch/pull/129577 on behalf of https://github.com/jeanschmidt due to Breaking internal builds, see D59181128 ([comment](https://github.com/pytorch/pytorch/pull/129577#issuecomment-2200554824 ))
2024-07-01 16:14:06 +00:00
19e17216a2
Revert "[inductor] split cpu vec isa into a dedicated file (keep git history) ( #129789 )"
...
This reverts commit 58f346c874a8a982679b4d4f3876602cc05d66d4.
Reverted https://github.com/pytorch/pytorch/pull/129789 on behalf of https://github.com/jeanschmidt due to Need to revert in order to revert https://github.com/pytorch/pytorch/pull/129577 ([comment](https://github.com/pytorch/pytorch/pull/129789#issuecomment-2200545144 ))
2024-07-01 16:08:44 +00:00
b6dc37bb4e
Revert "[inductor] unify toolchain code. ( #129816 )"
...
This reverts commit 67c9ec2b6d12ffd0e83861dcc16c1cd1a9b74d35.
Reverted https://github.com/pytorch/pytorch/pull/129816 on behalf of https://github.com/jeanschmidt due to Need to revert in order to revert #129577 ([comment](https://github.com/pytorch/pytorch/pull/129816#issuecomment-2200539687 ))
2024-07-01 16:06:22 +00:00