# torch._inductor.analysis

Contains scripts for inductor performance analysis.

## Analysis
This mode analyzes a Chrome trace to create a table useful for performance work. We mainly care about achieved flops and achieved memory bandwidth. Currently, it computes the flops and the memory reads of a kernel via formula (it is not looking at program counters or anything similar). These, combined with the kernel duration, can be used to calculate achieved flops, achieved memory bandwidth, and roofline estimates.
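The derived quantities follow directly from the per-kernel counters. A minimal sketch, assuming hypothetical device peaks (`peak_flops` and `peak_bw` here are made-up inputs, not values the tool reads from the profile):

```python
def achieved_metrics(flops, bytes_accessed, duration_s, peak_flops, peak_bw):
    """Derive achieved throughput and a roofline bound for one kernel.

    flops / bytes_accessed are the formula-derived counters; duration_s is
    the measured kernel time.  peak_flops / peak_bw are the device's
    theoretical peaks (hypothetical inputs for illustration).
    """
    achieved_flops = flops / duration_s        # FLOP/s actually sustained
    achieved_bw = bytes_accessed / duration_s  # bytes/s actually sustained
    intensity = flops / bytes_accessed         # arithmetic intensity (FLOP/byte)
    # Classic roofline: performance is capped by compute or by memory traffic.
    roofline = min(peak_flops, intensity * peak_bw)
    return achieved_flops, achieved_bw, roofline

# Example: 1e9 FLOPs over 4e8 bytes in 1 ms, on a device with
# 100 TFLOP/s peak compute and 2 TB/s peak bandwidth.
af, abw, roof = achieved_metrics(1e9, 4e8, 1e-3, 100e12, 2e12)
```

Here the kernel sustains 1 TFLOP/s against a 5 TFLOP/s memory-bound roofline, i.e. it is far from the achievable bound.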
### Usage

```shell
python profile_analysis.py --analysis <input_json_profile> <default_dtype>
```
### Arguments

- `input_json_profile`: The JSON profile file generated by `torch.profiler.profile(...).export_chrome_trace()`.
- `default_dtype`: The default dtype of the model. Sometimes the dtypes of the kernel inputs are not available in the profile, so we use the default dtype to infer the dtypes of the inputs.
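The input profile is an ordinary Chrome trace. As a rough sketch of its shape, here is stdlib-only code that totals on-device time per kernel name; the event fields (`traceEvents`, `cat`, `dur`) follow the Chrome trace format, while the kernel names are made up for illustration:

```python
# A minimal Chrome-trace fragment of the shape export_chrome_trace() emits.
trace = {
    "traceEvents": [
        {"name": "triton_mm_1", "cat": "kernel", "ph": "X", "ts": 0, "dur": 120},
        {"name": "triton_red_2", "cat": "kernel", "ph": "X", "ts": 200, "dur": 80},
        {"name": "aten::mm", "cat": "cpu_op", "ph": "X", "ts": 0, "dur": 300},
    ]
}

def kernel_durations(trace):
    """Total on-device time per kernel name, in microseconds."""
    totals = {}
    for ev in trace["traceEvents"]:
        if ev.get("cat") == "kernel":  # skip CPU-side operator events
            totals[ev["name"]] = totals.get(ev["name"], 0) + ev["dur"]
    return totals
```

A real profile would be loaded with `json.load()` from the exported file instead of the inline dict above.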
## Diff
This mode diffs two profiles and outputs a table of the differences. It groups by kernel name, which can fail to match kernels properly across hardware vendors. More intelligent grouping is coming soon.
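The name-based grouping can be sketched as follows; `diff_profiles` is a hypothetical helper, not the tool's actual implementation:

```python
def diff_profiles(prof_a, prof_b):
    """Group kernels by name and report the per-name duration delta.

    prof_a / prof_b map kernel name -> total duration (us).  A kernel that
    appears in only one profile gets a one-sided row with no delta, which
    is why name-based matching can fail across hardware vendors: the same
    logical kernel may carry a different name on each device.
    """
    rows = []
    for name in sorted(set(prof_a) | set(prof_b)):
        a = prof_a.get(name)
        b = prof_b.get(name)
        delta = (b - a) if (a is not None and b is not None) else None
        rows.append((name, a, b, delta))
    return rows

rows = diff_profiles({"k1": 100, "k2": 50}, {"k1": 90, "k3": 10})
```

Here `k1` matches across both profiles and gets a delta, while `k2` and `k3` produce one-sided rows.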
### Usage

```shell
python profile_analysis.py --diff <json_profile_1> <profile_name_1> <json_profile_2> <profile_name_2> <default_dtype> --name_limit 50
```
### Arguments

- `json_profile_1`, `json_profile_2`: The JSON profile files generated by `torch.profiler.profile(...).export_chrome_trace()`.
- `profile_name_1`, `profile_name_2`: The names of the profiles, used to identify each profile in the output table.
- `default_dtype`: The default dtype of the model. Sometimes the dtypes of the kernel inputs are not available in the profile, so we use the default dtype to infer the dtypes of the inputs.
- `name_limit`: The maximum number of characters in a kernel name (they can be quite lengthy and hard to read).
## Augment
This mode adds post-hoc analysis to a profile. Currently, it adds the flops and the memory reads of a kernel via formula (it is not looking at program counters or anything similar). These, combined with the kernel duration, can be used to calculate achieved flops, achieved memory bandwidth, and roofline estimates.
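As an illustration of computing counters "via formula", here is the standard analytic count for a dense matmul; these are hypothetical helpers, not the tool's internals:

```python
def mm_flops(M, N, K):
    # Analytic FLOP count for C[M, N] = A[M, K] @ B[K, N]:
    # each of the M*N outputs needs K multiplies and K adds.
    return 2 * M * N * K

def mm_bytes(M, N, K, dtype_size):
    # Memory-traffic lower bound: read A and B once, write C once.
    # dtype_size comes from the profiled dtype, falling back to
    # default_dtype when the profile does not record input dtypes.
    return (M * K + K * N + M * N) * dtype_size
```

For example, a 2x4 by 4x3 fp16 matmul counts 48 FLOPs and a 52-byte traffic lower bound.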
### Usage

```shell
python profile_analysis.py --augment_trace <input_json_profile> <output_json_profile> <default_dtype>
```
### Arguments

- `input_json_profile`: The JSON profile file generated by `torch.profiler.profile(...).export_chrome_trace()`.
- `output_json_profile`: Where the augmented profile is written.
- `default_dtype`: The default dtype of the model. Sometimes the dtypes of the kernel inputs are not available in the profile, so we use the default dtype to infer the dtypes of the inputs.
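A minimal sketch of what augmentation looks like on raw trace events; the `args` field names and the estimator callbacks below are illustrative stand-ins, not the tool's actual output schema:

```python
def augment(events, flops_of, bytes_of):
    """Attach analytic counters to each kernel event in place.

    flops_of / bytes_of are per-kernel formula callbacks (hypothetical
    stand-ins for the tool's internal estimators).
    """
    for ev in events:
        if ev.get("cat") == "kernel":
            f = flops_of(ev)
            b = bytes_of(ev)
            dur_s = ev["dur"] * 1e-6  # Chrome trace durations are in us
            ev.setdefault("args", {}).update({
                "kernel_flop": f,            # illustrative field names
                "kernel_num_gb": b / 1e9,
                "achieved_flops": f / dur_s,
            })
    return events

events = [{"name": "triton_mm_1", "cat": "kernel", "ph": "X", "ts": 0, "dur": 100}]
augment(events, flops_of=lambda ev: 1_000_000, bytes_of=lambda ev: 400_000)
```

The augmented events are then written back out as JSON so downstream trace viewers can display the extra fields.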