Commit Graph

67 Commits

Author SHA1 Message Date
b0cf287b46 [export][training ir migration] Fix getitem not exist (#134259)
Summary:
Make quantization tests compatible with the new training IR.

With the new batch norm node `torch.ops.aten.batch_norm.default`, we don't need an additional getitem node after the bn node, so tests need to be fixed to not check for the getitem node.

We added a capture_pre_autograd_graph_using_training_ir() function, which returns True when we are using the training ir, and False otherwise. This way, the code supports both training ir and the old ir.

For now, we are just rolling out the training ir for fbcode internal tests.

Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_preserve_source_fn_stack
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_update_shared_qspec
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_conv2d
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_conv_bn_relu_fusion

buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_conv_bn_fusion
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_conv_bn_fusion_literal_args
```

Reviewed By: andrewor14, tugsbayasgalan

Differential Revision: D61292102

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134259
Approved by: https://github.com/tugsbayasgalan
2024-08-22 22:00:14 +00:00
3c5485fb7f [Retry] Log chromium events to scuba (#134118)
Summary:
This diff implements a bunch of views for internal scuba viewing.

TODOS that I might punt to another diff:
- Saving cache stats via counter is definitely sus here, but there's not really a good way to track "fx graph cache hit for this compile phase" right now. Will think about this more.
- We should definitely log frame id, compile id, etc
- We should definitely be logging configs. That way, we can A/B test based on whether a config is turned on.
- idk what I'm doing with compile_uuid yet, but it's useful when you want to look at samples for a single run. I think if we had mast job info this field is not needed, but it's nice to be able to drill down to a single run and get its chrome trace view or icicle view, so idk

Test Plan:
All of the above views are run with nanogpt benchmark:

```
buck run mode/opt caffe2/benchmarks/dynamo:torchbench -- --training --backend=inductor --only nanogpt --performance
```

Differential Revision: D61603243

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134118
Approved by: https://github.com/oulgen
2024-08-22 14:59:45 +00:00
8b6b1721c8 remove StrobelightCompileTimeProfiler.profile_compile_time from stacktrace when strobelight profiling not enabled (#133831)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133831
Approved by: https://github.com/oulgen
2024-08-19 09:14:52 +00:00
fa36eba77d Turn off remote caching in unit tests unless explicitly on (#133258)
Summary: This PR turns off remote caching in unit tests unless the unit test explicitly turns it on.

Test Plan: existing tests

Differential Revision: D61152154

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133258
Approved by: https://github.com/masnesral
2024-08-13 02:49:43 +00:00
eee76c86a8 Write trace_structured events to scuba (#130955)
Summary: https://fb.workplace.com/groups/1286739428954016/posts/1287192258908733

Test Plan: Run test with tlparse and inspect https://www.internalfb.com/intern/scuba/query/?dataset=pt2_trace_structured_events

Differential Revision: D59866096

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130955
Approved by: https://github.com/ezyang
2024-07-19 06:02:47 +00:00
37d4d04309 [torchscript] Add logging for model id. (#130118)
Summary: as title.

Test Plan: CI

Reviewed By: angelayi

Differential Revision: D59348256

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130118
Approved by: https://github.com/BoyuanFeng
2024-07-09 22:24:16 +00:00
f85d1e845a [BE] enable UFMT for torch/nn/*.py (#128593)
Part of #123062

- #123062
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128593
Approved by: https://github.com/mikaylagawarecki
2024-06-23 16:05:13 +00:00
cc8193c707 Revert "[BE] enable UFMT for torch/nn/functional.py (#128592)"
This reverts commit f6e6e55fa7d883a89ba99584f8632c260519ba73.

Reverted https://github.com/pytorch/pytorch/pull/128592 on behalf of https://github.com/fbgheith due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/128592#issuecomment-2181783936))
2024-06-21 00:44:16 +00:00
f6e6e55fa7 [BE] enable UFMT for torch/nn/functional.py (#128592)
Part of #123062

- #123062

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128592
Approved by: https://github.com/mikaylagawarecki
ghstack dependencies: #128596, #128594
2024-06-17 16:29:29 +00:00
afe15d2d2f Flip default value for mypy disallow_untyped_defs [3/11] (#127840)
See #127836 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127840
Approved by: https://github.com/oulgen
2024-06-08 18:28:01 +00:00
cdf2133186 Add compile time profiler for non fbcode targets (#126904)
This is a tool that allow profiling compile time using strobelight profiler, its a meta only tool.
but works on non-fbcode targets.

A follow up diff will unify this with caffe2/fb/strobelight/compile_time_profiler.py.
example test:

```
run  python tools/strobelight/examples/compile_time_profile_example.py
```

```
python torch/utils/_strobelight/examples/compile_time_profile_example.py
strobelight_compile_time_profiler, line 61, 2024-05-23 10:49:28,101, INFO: compile time strobelight profiling enabled
strobelight_compile_time_profiler, line 93, 2024-05-23 10:49:28,102, INFO: Unique sample tag for this run is: 2024-05-23-10:49:282334638devvm4561.ash0.facebook.com
strobelight_compile_time_profiler, line 94, 2024-05-23 10:49:28,102, INFO: You can use the following link to access the strobelight profile at the end of the run: https://www.internalfb.com/intern/scuba/query/?dataset=pyperf_experimental%2Fon_demand&drillstate=%7B%22purposes%22%3A[]%2C%22end%22%3A%22now%22%2C%22start%22%3A%22-30%20days%22%2C%22filterMode%22%3A%22DEFAULT%22%2C%22modifiers%22%3A[]%2C%22sampleCols%22%3A[]%2C%22cols%22%3A[%22namespace_id%22%2C%22namespace_process_id%22]%2C%22derivedCols%22%3A[]%2C%22mappedCols%22%3A[]%2C%22enumCols%22%3A[]%2C%22return_remainder%22%3Afalse%2C%22should_pivot%22%3Afalse%2C%22is_timeseries%22%3Afalse%2C%22hideEmptyColumns%22%3Afalse%2C%22timezone%22%3A%22America%2FLos_Angeles%22%2C%22compare%22%3A%22none%22%2C%22samplingRatio%22%3A%221%22%2C%22metric%22%3A%22count%22%2C%22aggregation_field%22%3A%22async_stack_complete%22%2C%22top%22%3A10000%2C%22aggregateList%22%3A[]%2C%22param_dimensions%22%3A[%7B%22dim%22%3A%22py_async_stack%22%2C%22op%22%3A%22edge%22%2C%22param%22%3A%220%22%2C%22anchor%22%3A%220%22%7D]%2C%22order%22%3A%22weight%22%2C%22order_desc%22%3Atrue%2C%22constraints%22%3A[[%7B%22column%22%3A%22sample_tags%22%2C%22op%22%3A%22all%22%2C%22value%22%3A[%22[%5C%222024-05-23-10:49:282334638devvm4561.ash0.facebook.com%5C%22]%22]%7D]]%2C%22c_constraints%22%3A[[]]%2C%22b_constraints%22%3A[[]]%2C%22ignoreGroupByInComparison%22%3Afalse%7D&view=GraphProfilerView&&normalized=1712358002&pool=uber
strobelight_function_profiler, line 241, 2024-05-23 10:49:34,943, INFO: strobelight run id is: 3507039740348330
strobelight_function_profiler, line 243, 2024-05-23 10:50:00,907, INFO: strobelight profiling running
strobelight_function_profiler, line 224, 2024-05-23 10:50:02,741, INFO: strobelight profiling stopped
strobelight_function_profiler, line 215, 2024-05-23 10:50:06,173, INFO: Total samples: 7
strobelight_function_profiler, line 215, 2024-05-23 10:50:06,173, INFO: GraphProfiler (python stack): https://fburl.com/scuba/pyperf_experimental/on_demand/75cxdro3
strobelight_function_profiler, line 215, 2024-05-23 10:50:06,173, INFO: Icicle view (python stack): https://fburl.com/scuba/pyperf_experimental/on_demand/qsgydsee
strobelight_compile_time_profiler, line 120, 2024-05-23 10:50:06,174, INFO: 1 strobelight success runs out of 1 non-recursive compilation events.
strobelight_function_profiler, line 241, 2024-05-23 10:50:08,137, INFO: strobelight run id is: 8721740011604497
strobelight_function_profiler, line 243, 2024-05-23 10:50:34,801, INFO: strobelight profiling running
strobelight_function_profiler, line 224, 2024-05-23 10:50:36,803, INFO: strobelight profiling stopped
strobelight_function_profiler, line 215, 2024-05-23 10:50:41,289, INFO: Total samples: 3
strobelight_function_profiler, line 215, 2024-05-23 10:50:41,289, INFO: GraphProfiler (python stack): https://fburl.com/scuba/pyperf_experimental/on_demand/qmi2ucwp
strobelight_function_profiler, line 215, 2024-05-23 10:50:41,289, INFO: Icicle view (python stack): https://fburl.com/scuba/pyperf_experimental/on_demand/7fjkhs9i
strobelight_compile_time_profiler, line 120, 2024-05-23 10:50:41,289, INFO: 2 strobelight success runs out of 2 non-recursive compilation events.
strobelight_function_profiler, line 241, 2024-05-23 10:50:43,597, INFO: strobelight run id is: 1932476082259558
strobelight_function_profiler, line 243, 2024-05-23 10:51:09,791, INFO: strobelight profiling running
strobelight_function_profiler, line 224, 2024-05-23 10:51:11,883, INFO: strobelight profiling stopped
strobelight_function_profiler, line 215, 2024-05-23 10:51:16,218, INFO: Total samples: 3
strobelight_function_profiler, line 215, 2024-05-23 10:51:16,218, INFO: GraphProfiler (python stack): https://fburl.com/scuba/pyperf_experimental/on_demand/vy1ujxec
strobelight_function_profiler, line 215, 2024-05-23 10:51:16,218, INFO: Icicle view (python stack): https://fburl.com/scuba/pyperf_experimental/on_demand/2xgadviv
strobelight_compile_time_profiler, line 120, 2024-05-23 10:51:16,219, INFO: 3 strobelight success runs out of 3 non-recursive compilation events.
```

or pass TORCH_COMPILE_STROBELIGHT=TRUE for any torch compile python program.
ex running on XLNetLMHeadModel.
```
 TORCH_COMPILE_STROBELIGHT=TRUE TORCHINDUCTOR_FORCE_DISABLE_CACHES=1 time python benchmarks/dynamo/huggingface.py --ci --accuracy --timing --explain --inductor --device cuda --training --amp  --only XLNetLMHeadModel
 ```
 result:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126904
Approved by: https://github.com/aorenste
ghstack dependencies: #126444
2024-05-29 05:06:37 +00:00
9521528f71 Log export result of torch.jit.trace to scuba (#126900)
Summary: We want to track how well torch.jit.trace can be converted to export in large scale. As a first step, we log all of torch.jit.trace unittests whether we can convert the traced module to export module OR we can export the model directly

Test Plan: CI

Differential Revision: D57629682

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126900
Approved by: https://github.com/SherlockNoMad
2024-05-28 17:49:34 +00:00
7121ea6f70 Revert "Add compile time profiler for non fbcode targets (#126904)"
This reverts commit 575cb617db4043dd7a76aaf523dc3ab7ee07e7a5.

Reverted https://github.com/pytorch/pytorch/pull/126904 on behalf of https://github.com/atalman due to Broke nightly smoke test ([comment](https://github.com/pytorch/pytorch/pull/126904#issuecomment-2133418687))
2024-05-27 12:52:09 +00:00
575cb617db Add compile time profiler for non fbcode targets (#126904)
This is a tool that allow profiling compile time using strobelight profiler, its a meta only tool.
but works on non-fbcode targets.

A follow up diff will unify this with caffe2/fb/strobelight/compile_time_profiler.py.
example test:

```
run  python tools/strobelight/examples/compile_time_profile_example.py
```

```
python torch/utils/_strobelight/examples/compile_time_profile_example.py
strobelight_compile_time_profiler, line 61, 2024-05-23 10:49:28,101, INFO: compile time strobelight profiling enabled
strobelight_compile_time_profiler, line 93, 2024-05-23 10:49:28,102, INFO: Unique sample tag for this run is: 2024-05-23-10:49:282334638devvm4561.ash0.facebook.com
strobelight_compile_time_profiler, line 94, 2024-05-23 10:49:28,102, INFO: You can use the following link to access the strobelight profile at the end of the run: https://www.internalfb.com/intern/scuba/query/?dataset=pyperf_experimental%2Fon_demand&drillstate=%7B%22purposes%22%3A[]%2C%22end%22%3A%22now%22%2C%22start%22%3A%22-30%20days%22%2C%22filterMode%22%3A%22DEFAULT%22%2C%22modifiers%22%3A[]%2C%22sampleCols%22%3A[]%2C%22cols%22%3A[%22namespace_id%22%2C%22namespace_process_id%22]%2C%22derivedCols%22%3A[]%2C%22mappedCols%22%3A[]%2C%22enumCols%22%3A[]%2C%22return_remainder%22%3Afalse%2C%22should_pivot%22%3Afalse%2C%22is_timeseries%22%3Afalse%2C%22hideEmptyColumns%22%3Afalse%2C%22timezone%22%3A%22America%2FLos_Angeles%22%2C%22compare%22%3A%22none%22%2C%22samplingRatio%22%3A%221%22%2C%22metric%22%3A%22count%22%2C%22aggregation_field%22%3A%22async_stack_complete%22%2C%22top%22%3A10000%2C%22aggregateList%22%3A[]%2C%22param_dimensions%22%3A[%7B%22dim%22%3A%22py_async_stack%22%2C%22op%22%3A%22edge%22%2C%22param%22%3A%220%22%2C%22anchor%22%3A%220%22%7D]%2C%22order%22%3A%22weight%22%2C%22order_desc%22%3Atrue%2C%22constraints%22%3A[[%7B%22column%22%3A%22sample_tags%22%2C%22op%22%3A%22all%22%2C%22value%22%3A[%22[%5C%222024-05-23-10:49:282334638devvm4561.ash0.facebook.com%5C%22]%22]%7D]]%2C%22c_constraints%22%3A[[]]%2C%22b_constraints%22%3A[[]]%2C%22ignoreGroupByInComparison%22%3Afalse%7D&view=GraphProfilerView&&normalized=1712358002&pool=uber
strobelight_function_profiler, line 241, 2024-05-23 10:49:34,943, INFO: strobelight run id is: 3507039740348330
strobelight_function_profiler, line 243, 2024-05-23 10:50:00,907, INFO: strobelight profiling running
strobelight_function_profiler, line 224, 2024-05-23 10:50:02,741, INFO: strobelight profiling stopped
strobelight_function_profiler, line 215, 2024-05-23 10:50:06,173, INFO: Total samples: 7
strobelight_function_profiler, line 215, 2024-05-23 10:50:06,173, INFO: GraphProfiler (python stack): https://fburl.com/scuba/pyperf_experimental/on_demand/75cxdro3
strobelight_function_profiler, line 215, 2024-05-23 10:50:06,173, INFO: Icicle view (python stack): https://fburl.com/scuba/pyperf_experimental/on_demand/qsgydsee
strobelight_compile_time_profiler, line 120, 2024-05-23 10:50:06,174, INFO: 1 strobelight success runs out of 1 non-recursive compilation events.
strobelight_function_profiler, line 241, 2024-05-23 10:50:08,137, INFO: strobelight run id is: 8721740011604497
strobelight_function_profiler, line 243, 2024-05-23 10:50:34,801, INFO: strobelight profiling running
strobelight_function_profiler, line 224, 2024-05-23 10:50:36,803, INFO: strobelight profiling stopped
strobelight_function_profiler, line 215, 2024-05-23 10:50:41,289, INFO: Total samples: 3
strobelight_function_profiler, line 215, 2024-05-23 10:50:41,289, INFO: GraphProfiler (python stack): https://fburl.com/scuba/pyperf_experimental/on_demand/qmi2ucwp
strobelight_function_profiler, line 215, 2024-05-23 10:50:41,289, INFO: Icicle view (python stack): https://fburl.com/scuba/pyperf_experimental/on_demand/7fjkhs9i
strobelight_compile_time_profiler, line 120, 2024-05-23 10:50:41,289, INFO: 2 strobelight success runs out of 2 non-recursive compilation events.
strobelight_function_profiler, line 241, 2024-05-23 10:50:43,597, INFO: strobelight run id is: 1932476082259558
strobelight_function_profiler, line 243, 2024-05-23 10:51:09,791, INFO: strobelight profiling running
strobelight_function_profiler, line 224, 2024-05-23 10:51:11,883, INFO: strobelight profiling stopped
strobelight_function_profiler, line 215, 2024-05-23 10:51:16,218, INFO: Total samples: 3
strobelight_function_profiler, line 215, 2024-05-23 10:51:16,218, INFO: GraphProfiler (python stack): https://fburl.com/scuba/pyperf_experimental/on_demand/vy1ujxec
strobelight_function_profiler, line 215, 2024-05-23 10:51:16,218, INFO: Icicle view (python stack): https://fburl.com/scuba/pyperf_experimental/on_demand/2xgadviv
strobelight_compile_time_profiler, line 120, 2024-05-23 10:51:16,219, INFO: 3 strobelight success runs out of 3 non-recursive compilation events.
```

or pass TORCH_COMPILE_STROBELIGHT=TRUE for any torch compile python program.
ex running on XLNetLMHeadModel.
```
 TORCH_COMPILE_STROBELIGHT=TRUE TORCHINDUCTOR_FORCE_DISABLE_CACHES=1 time python benchmarks/dynamo/huggingface.py --ci --accuracy --timing --explain --inductor --device cuda --training --amp  --only XLNetLMHeadModel
 ```
 result:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126904
Approved by: https://github.com/aorenste
ghstack dependencies: #126693
2024-05-24 01:39:40 +00:00
4644611b14 [cprofile] log manifold link instead of raw data to trace_structured (#126451)
Internal D57459752 returns manifold URL and this PR adds to tlparse payload

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126451
Approved by: https://github.com/jamesjwu
2024-05-21 00:44:55 +00:00
b2d9b80fba Also remove compile_time_strobelight_meta frame when generating stack (#126289)
I think I also need to fix this in fbcode, leaving that for future work.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126289
Approved by: https://github.com/yanboliang
2024-05-15 23:55:37 +00:00
b7d67e476d upload pt2 cprofile stats to manifold (#125162)
Summary:
https://fb.workplace.com/groups/257735836456307/permalink/657458576484029/

upload cprofile to manifold

D56696397 has a script to convert profiler stats to dot graphs (see its test plan)

Test Plan:
non-MAST
`TORCH_COMPILE_CPROFILE=1 buck2 run mode/opt mode/inplace //pytorch/benchmark:run -- ads_mc_igctr_mc3_v0 -d cuda -t train --torchdynamo inductor --profile --profile-export-chrome-trace`

https://www.internalfb.com/manifold/explorer/pyper_traces/tree/compilation_cprofile/test/20240428_234002_7562397568

MAST
`buck2 run mode/opt aps_models/ads/icvr:icvr_launcher -- mode=mast_ctr_cvr_cmf_rep launcher.fbl_entitlement=ai_infra_training_rnd_tc features=ctr_cvr_conso_cmf_pipeline_features_455876776_3teach model=ctr_cvr_cmf_when_rep_config_msmn_3teach model_name=ctr_cvr_when model.when_arch.use_extended_residual_contexts=True optimizers.dense_default.lr_schedule.0.max_iters=20000 training.planner.storage_reservation_policy=FixedPercentage training.planner.storage_reservation_percentage=0.72 data_loader.dataset.batch_size=2048 trainer.garbage_collection.garbage_collection_interval=100 model.when_arch.layer_norm_init_weight=0.3 optimizers.dense_default.lr_schedule.0.value=0.001 model.when_arch.customized_mlp_init_scale=0.3 launcher.num_workers=128 launcher.max_retries=10 launcher.data_project=oncall_ads_model_platform launcher.hardware=ZIONEX_80G data_loader.dataset.table_ds="[2024-01-01]" launcher.job_name=test_inductor_logging`

https://www.internalfb.com/manifold/explorer/pyper_traces/tree/compilation_cprofile/aps-test_inductor_logging-745febb51a

Generating dotty files from D56696397
```
Generating dot file from cprofile stats /home/daohang/aps-test_inductor_logging-745febb51a/0/0/_compile1.profile ...
P1225733598: https://www.internalfb.com/intern/paste/P1225733598/
Dotty: https://www.internalfb.com/intern/graphviz/?paste=1225733598
Generating dot file from cprofile stats /home/daohang/aps-test_inductor_logging-745febb51a/0/0/_compile10.profile ...
P1225733629: https://www.internalfb.com/intern/paste/P1225733629/
Dotty: https://www.internalfb.com/intern/graphviz/?paste=1225733629
Generating dot file from cprofile stats /home/daohang/aps-test_inductor_logging-745febb51a/0/0/_compile0.profile ...
P1225733649: https://www.internalfb.com/intern/paste/P1225733649/
Dotty: https://www.internalfb.com/intern/graphviz/?paste=1225733649
```

Differential Revision: D56679561

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125162
Approved by: https://github.com/anijain2305
2024-04-30 15:05:01 +00:00
4b586a434f [ROCm] Triton upstream AMD backend integration (#121801)
Update ROCm-triton to use the AMD backend from https://github.com/openai/triton

Note: `test__int_mm` can be enabled after https://github.com/pytorch/pytorch/pull/122431 is landed

Co-authored-by: Pruthvi Madugundu <pruthvigithub@gmail.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121801
Approved by: https://github.com/nmacchioni, https://github.com/malfet
2024-04-25 20:44:27 +00:00
3890848ec2 Revert "[ROCm] Triton upstream AMD backend integration (#121801)"
This reverts commit 9888d7495ece6b6df3b7334fc7c2a9d869359250.

Reverted https://github.com/pytorch/pytorch/pull/121801 on behalf of https://github.com/jeanschmidt due to need to revert so I can revert https://github.com/pytorch/pytorch/pull/124592 ([comment](https://github.com/pytorch/pytorch/pull/121801#issuecomment-2076951327))
2024-04-25 11:22:19 +00:00
9888d7495e [ROCm] Triton upstream AMD backend integration (#121801)
Update ROCm-triton to use the AMD backend from https://github.com/openai/triton

Note: `test__int_mm` can be enabled after https://github.com/pytorch/pytorch/pull/122431 is landed

Co-authored-by: Pruthvi Madugundu <pruthvigithub@gmail.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121801
Approved by: https://github.com/nmacchioni, https://github.com/malfet
2024-04-24 17:28:12 +00:00
8cf54929e3 compiletime->compile_time (#124579)
Summary: title.

Test Plan: run strobelight profiler.

Reviewed By: oulgen

Differential Revision: D56395415

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124579
Approved by: https://github.com/oulgen
2024-04-23 20:50:53 +00:00
acbf888a13 rename sl to strobelight (#124455)
Summary:
TORCH_COMPILE_SL_PROFILE ->TORCH_COMPILE_STROBELIGHT
SL_MAX_STACK_LENGTH -> COMPILE_STROBELIGHT_MAX_STACK_LENGTH
SL_MAX_PROFILE_TIME -> COMPILE_STROBELIGHT_MAX_PROFILE_TIME
profile_with_sl() -> strobelight()
compiletime_sl_profile_meta() -> compiletime_strobelight_meta()

Test Plan:
1. run and verify
```
TORCH_COMPILE_STROBELIGHT=TRUE buck2 run  @//mode/inplace  @//mode/opt  //caffe2/fb/strobelight:compiletime_profiler_example
```
2. run and verify
```
buck2 run  @//mode/inplace  @//mode/opt  //caffe2/fb/strobelight:function_profiler_example --local-only
```
3. run and verify truncated stack for
```
TORCH_COMPILE_STROBELIGHT=TRUE COMPILE_STROBELIGHT_MAX_STACK_LENGTH=1 buck2 run  @//mode/inplace  @//mode/opt  //caffe2/fb/strobelight:compiletime_profiler_example
```
4. add infinite loop in _verify and verify samples for
```
COMPILE_STROBELIGHT_MAX_PROFILE_TIME=30 TORCH_COMPILE_STROBELIGHT=TRUE buck2 run  @//mode/inplace  @//mode/opt  //caffe2/fb/strobelight:compiletime_profiler_example
```

Reviewed By: oulgen

Differential Revision: D56327139

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124455
Approved by: https://github.com/oulgen
2024-04-19 22:50:13 +00:00
d1e1d671ef Stop requiring a pystub for register_fake by default (#124064)
Previously, if someone used `register_fake` to add a fake impl for an
operator defined in C++, we would require them to add a
`m.set_python_module(<module>)` call to C++. This was to avoid
situations where a user imported the C++ operator without importing the
fake impl.

This "breaks" open registration: there's no way to add a fake impl
outside of a repository that defines an operator, so we want to turn
this behavior off by default in open source.

Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124064
Approved by: https://github.com/albanD
ghstack dependencies: #123937
2024-04-17 23:51:20 +00:00
47dbfecd37 Rename impl_abstract to register_fake, part 1/2 (#123937)
This PR:
- adds a new torch.library.register_fake and deprecates
  torch.library.impl_abstract. The motivation is that we have a lot of
  confusion around the naming so we are going to align the naming with
  the actual subsystem (FakeTensor).
- renames `m.impl_abstract_pystub("fbgemm_gpu.sparse_ops")` to
  `m.has_python_registration("fbgemm_gpu.sparse_ops")`. No deprecation
  here yet; I need to test how this works with static initialization.
- Renames a bunch of internals to match (e.g. abstractimplpystub ->
  pystub)

I'm scared to rename the Python-side internal APIs (e.g.
torch._library.abstract_impl) because of torch.package concerns. I'll do
that in its own isolated PR next just in case it causes problems.

DEPRECATION NOTE: torch.library.impl_abstract was renamed to to
torch.library.register_fake. Please use register_fake. We'll delete
impl_abstract in a future version of PyTorch.

Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123937
Approved by: https://github.com/albanD
2024-04-17 12:46:01 +00:00
caed7f6727 profile pt2 compile time with strobelight (#123311)
For oss this diff adds a decorator @profile_sb_fbcode that is a nop for non meta workload.

Facebook:
With this diff someone can generate a strobelight profile for pt2 compilation.
users need to set the env variable TORCH_COMPILE_SL_PROFILE =TRUE .

For example:
```
TORCH_COMPILE_SL_PROFILE =TRUE buck2 run  @//mode/inplace  @//mode/opt  //caffe2/fb/strobelight:compiletime_profile_example
```
see sample output bellow, at the end of summary.

The way this works, is that a unique id is generated and associated with all samples that are collected for functions that are decorated with profile_sb_fbcode.
This id can then be used to combine different strobe light profile into one. (for example three compilation events happens in the code bellow).

Right now the following two functions are annotated with  profile_sb_fbcode.  bw_compiler and _compile. if two profile_sl_fbcode is called recursively, recursive invocations are ignored and a log is printed.

The output is:
```
Strobelight is enabled for pt2 compilation
Unique user-id for this run is: 2024-04-03-13:59:49147091devvm4561.ash0.facebook.com
You can use the following link to access the strobelight profile at the end of the run:
https://www.internalfb.com/intern/scuba/query/?dataset=pyperf_experimental%2Fon_demand&drillstate=%7B%22purposes%22%3A[]%2C%22end%22%3A%22now%22%2C%22start%22%3A%22-30%20days%22%2C%22filterMode%22%3A%22DEFAULT%22%2C%22modifiers%22%3A[]%2C%22sampleCols%22%3A[]%2C%22cols%22%3A[%22namespace_id%22%2C%22namespace_process_id%22]%2C%22derivedCols%22%3A[]%2C%22mappedCols%22%3A[]%2C%22enumCols%22%3A[]%2C%22return_remainder%22%3Afalse%2C%22should_pivot%22%3Afalse%2C%22is_timeseries%22%3Afalse%2C%22hideEmptyColumns%22%3Afalse%2C%22timezone%22%3A%22America%2FLos_Angeles%22%2C%22compare%22%3A%22none%22%2C%22samplingRatio%22%3A%221%22%2C%22metric%22%3A%22count%22%2C%22aggregation_field%22%3A%22async_stack_complete%22%2C%22top%22%3A10000%2C%22aggregateList%22%3A[]%2C%22param_dimensions%22%3A[%7B%22dim%22%3A%22py_async_stack%22%2C%22op%22%3A%22edge%22%2C%22param%22%3A%220%22%2C%22anchor%22%3A%220%22%7D]%2C%22order%22%3A%22weight%22%2C%22order_desc%22%3Atrue%2C%22constraints%22%3A[[%7B%22column%22%3A%22run_user%22%2C%22op%22%3A%22eq%22%2C%22value%22%3A[%22[%5C%222024-04-03-13:59:49147091devvm4561.ash0.facebook.com%5C%22]%22]%7D]]%2C%22c_constraints%22%3A[[]]%2C%22b_constraints%22%3A[[]]%2C%22ignoreGroupByInComparison%22%3Afalse%7D&view=GraphProfilerView&&pool=uber&graphprofiler_filter=&graphprofiler_column_to_sort_by=exclusive
the link below takes you to the collected strobelight profile
https://www.internalfb.com/intern/scuba/query/?dataset=pyperf_experimental%2Fon_demand&drillstate=%7B%22dimensions%22%3A%5B%5D%2C%22param_dimensions%22%3A%5B%7B%22anchor%22%3A%220%22%2C%22param%22%3A%220%22%2C%22op%22%3A%22edge%22%2C%22dim%22%3A%22py_async_stack%22%7D%5D%2C%22constraints%22%3A%5B%5B%7B%22value%22%3A%5B%22%5B%5C%22-6800545191281321%5C%22%5D%22%5D%2C%22op%22%3A%22eq%22%2C%22column%22%3A%22run_id%22%7D%2C%7B%22value%22%3A%5B%22%5B%5C%222024-04-03-13%3A59%3A49147091devvm4561.ash0.facebook.com%5C%22%5D%22%5D%2C%22op%22%3A%22eq%22%2C%22column%22%3A%22run_user%22%7D%5D%5D%2C%22top%22%3A10000%2C%22end%22%3A%221712181610%22%2C%22start%22%3A%221712174410%22%7D&view=GraphProfilerView&
1 storbelight success runs out of 1 non-ignored runs.
strobelight run id is: 6181728288420687
the link below takes you to the collected strobelight profile
https://www.internalfb.com/intern/scuba/query/?dataset=pyperf_experimental%2Fon_demand&drillstate=%7B%22dimensions%22%3A%5B%5D%2C%22param_dimensions%22%3A%5B%7B%22anchor%22%3A%220%22%2C%22param%22%3A%220%22%2C%22op%22%3A%22edge%22%2C%22dim%22%3A%22py_async_stack%22%7D%5D%2C%22constraints%22%3A%5B%5B%7B%22value%22%3A%5B%22%5B%5C%226181728288420687%5C%22%5D%22%5D%2C%22op%22%3A%22eq%22%2C%22column%22%3A%22run_id%22%7D%2C%7B%22value%22%3A%5B%22%5B%5C%222024-04-03-13%3A59%3A49147091devvm4561.ash0.facebook.com%5C%22%5D%22%5D%2C%22op%22%3A%22eq%22%2C%22column%22%3A%22run_user%22%7D%5D%5D%2C%22top%22%3A10000%2C%22end%22%3A%221712181621%22%2C%22start%22%3A%221712174421%22%7D&view=GraphProfilerView&
2 storbelight success runs out of 2 non-ignored runs.
strobelight run id is: -1026103682715688
the link below takes you to the collected strobelight profile
https://www.internalfb.com/intern/scuba/query/?dataset=pyperf_experimental%2Fon_demand&drillstate=%7B%22dimensions%22%3A%5B%5D%2C%22param_dimensions%22%3A%5B%7B%22anchor%22%3A%220%22%2C%22param%22%3A%220%22%2C%22op%22%3A%22edge%22%2C%22dim%22%3A%22py_async_stack%22%7D%5D%2C%22constraints%22%3A%5B%5B%7B%22value%22%3A%5B%22%5B%5C%22-1026103682715688%5C%22%5D%22%5D%2C%22op%22%3A%22eq%22%2C%22column%22%3A%22run_id%22%7D%2C%7B%22value%22%3A%5B%22%5B%5C%222024-04-03-13%3A59%3A49147091devvm4561.ash0.facebook.com%5C%22%5D%22%5D%2C%22op%22%3A%22eq%22%2C%22column%22%3A%22run_user%22%7D%5D%5D%2C%22top%22%3A10000%2C%22end%22%3A%221712181647%22%2C%22start%22%3A%221712174447%22%7D&view=GraphProfilerView&
3 storbelight success runs out of 3 non-ignored runs.
```

Test Plan:
Was tested on buck2 run  @//mode/inplace  @//mode/opt  //caffe2/fb/strobelight:compiletime_profile_example

This was also tested in one of the ads benchmarks
```
TORCH_COMPILE_SL_PROFILE =TRUE buck2 run mode/opt mode/inplace //pytorch/benchmark:run -- ads_mc_igctr_mc3_v0 -d cuda -t train --torchdynamo inductor
```
The results matches the results reported in
https://fb.workplace.com/groups/257735836456307/permalink/657458576484029

Differential Revision: D55672271

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123311
Approved by: https://github.com/aorenste
2024-04-06 18:57:44 +00:00
b1fa0ce4aa [export] build the infra to rollout predispatch export. (#122326)
Test Plan:
fbcode:caffe2/test/quantization:test_quantization
fbcode:bolt/nn/executorch/backends/tests:qnn_test
fbcode:on_device_ai/helios/compiler_tests/...
fbcode:pyspeech/tests:pyspeech_utils_test_oss
fbcode:caffe2/test:quantization_pt2e_qat
fbcode:on_device_ai/Assistant/Jarvis/tests:test_custom_ops
fbcode:modai/test:test_modai
fbcode:executorch/exir/backend/test:test_partitioner

Differential Revision: D55133846

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122326
Approved by: https://github.com/tugsbayasgalan
2024-03-22 00:55:10 +00:00
ba9a1d96a4 Add scuba logging for TorchScript usage (#121936)
Summary: Infra to log live usage of TorchScript internally

Test Plan: manually tested

Differential Revision: D54923510

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121936
Approved by: https://github.com/zhxchen17
2024-03-19 17:38:27 +00:00
a04e7fca8e Use memcache versioning for autotune remote cache (#121748)
Summary: Internal training platform doesn't get updated very frequently, so lets use versioning for memcache.

Test Plan: existing tests

Differential Revision: D54818197

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121748
Approved by: https://github.com/aakhundov, https://github.com/jansel
2024-03-14 00:36:10 +00:00
02a410ee12 Enable TORCH_TRACE by default in all Tupperware like environments (#120915)
Summary:
This is a reimplemented version of the FB specific code in https://www.internalfb.com/diff/D54230697

The new strategy is that we unconditionally install an FB handler to trace_log logger (and always set level to DEBUG). When the first log message is emitted, we check the JK/filesystem to see if we should actually do logging. If we decide we don't do logging, we remove the handler from trace_log and are done.

build_only[github-export-checks,executorch,pytorch_benchmark,pytorch_quantization,pytorch_distributed,pytorch_distributed_gpu,pytorch_dynamo_inductor,pytorch_functorch,pytorch_fx2trt,pytorch_diff_train_tests_ads,glow_fb_pytorch_tests,training_platform,training_platform_compatibility,training_toolkit_applications,training_toolkit_examples,training_toolkit_model_optimization,dper3_pytorch,xplat_caffe2,pytorch_dev,android-pytorch-instrumentation-tests,smartpytorchgithub_first_try_merge,frl-target-determinator,f6-buck,training_platform_for_github,sigmoid_cpu,sigmoid_gpu,aiplatform_modelprocessing_for_github,accelerators_workloads_models_slimdsnn,ae_aotinductor_benchmark_test,aps_,aps_deterministic_ne_tests,dper_lib_silvertorch,torchrec,torchrec_fb,deeplearning_aot_inductor]

Test Plan:
sandcastle

```
buck2 test 'fbcode//mode/dev-nosan' fbcode//torchrec/inference/tests:test_single_gpu_executor -- --exact 'torchrec/inference/tests:test_single_gpu_executor - TorchDeployGPUTest.NestedModelSingleGPU'
buck2 test 'fbcode//mode/dev-nosan' fbcode//dper_lib/silvertorch/modules/dynamic_stats/tests:accumulators_test -- --exact 'dper_lib/silvertorch/modules/dynamic_stats/tests:accumulators_test - test_global_fixed_interval_accumulator (dper_lib.silvertorch.modules.dynamic_stats.tests.accumulators_test.GlobalFixedIntervalUnivalentAcculumatorTest)'
```

Also running a test flow with/without JK enabled

Differential Revision: D54275086

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120915
Approved by: https://github.com/yanboliang
2024-03-01 04:47:13 +00:00
f3dd2a544c Revert "Add structured trace logs (#120289)"
This reverts commit 9dfaef962cda5f65eec53e5fd6f07b5226ea65cb.

Reverted https://github.com/pytorch/pytorch/pull/120289 on behalf of https://github.com/kit1980 due to breaking internal builds, see D54230697 ([comment](https://github.com/pytorch/pytorch/pull/120289#issuecomment-1967477120))
2024-02-27 19:49:05 +00:00
9dfaef962c Add structured trace logs (#120289)
Overall design: https://docs.google.com/document/d/1CX_hJ0PNy9f3R1y8TJrfkSeLkvGjjjLU84BSXgS2AZ8/edit

How to read the diff:
* Most files are me augmenting pre-existing logging with structured variants. For the most part it's simple (esp FX graphs, which have a canonical string representation); it gets more complicated when I decided to JSON-ify some data structure instead of keeping the ad hoc printing (notably, guards and dynamo output graph sizes)
* torch/_functorch/_aot_autograd/collect_metadata_analysis.py is some unrelated fixes I noticed while auditing artifact logs
* torch/_logging/_internal.py has the actual trace log implementation. The trace logger is implement as a logger named torch.__trace which is disconnected from the logging hierarchy. It gets its own handler and formatter (TorchLogsFormatter with _is_trace True). There's a teensy bit of FB specific code to automatically enable trace logging if a /logs directory exists. `trace_structured` is the main way to emit a trace log. Unusually, there's a separate "metadata" and "payload" field. The metadata field should not be too long (as it is serialized as a single line) and is always JSON (we put contextual things like compile id in it); the payload field can be long and is emitted after the metadata log line and can span multiple lines.
* torch/_logging/structured.py contains some helpers for converting Python data structures into JSON form. Notably, we have a string interning implementation here, which helps reduce the cost of serializing filenames into the log.
* test/dynamo/test_structured_trace.py the tests are cribbed from test_logging.py, but all rewritten to use expect tests on munged versions of what we'd actually output. Payloads are never tested, since they tend not be very stable.

https://github.com/ezyang/tlparse is a POC Rust program that can interpret these logs.

Testing that the fbcode detection works at https://www.internalfb.com/mlhub/pipelines/runs/fblearner/534553450 (Meta-only)

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120289
Approved by: https://github.com/Skylion007
2024-02-27 00:04:23 +00:00
7b1f5c874f [PT2][Optimus][Observability] Log the optimus graph transformation to the scuba (#119745)
Summary: Current everstore upload logging may cuase excessive compilation time when the model has lots of graph breaks (post: https://fb.workplace.com/groups/257735836456307/permalink/633533465543207/), we here log the transformation only when the graph changed

Test Plan:
timeout flows:
f528209775
f530084719

Differential Revision: D53692344

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119745
Approved by: https://github.com/jackiexu1992
2024-02-16 21:32:04 +00:00
563f1b9fef [inductor] Use torch.cuda.clock_rate instead of triton.testing.nvsmi (#118662)
`triton.testing.nvsmi` invokes `nvidia-smi` as a subprocess, and Meta
prod usually doesn't make nvidia-smi available.  Might as well just use
something that's native to torch.

Differential Revision: [D53235814](https://our.internmc.facebook.com/intern/diff/D53235814/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118662
Approved by: https://github.com/jansel
2024-02-14 03:23:49 +00:00
8069b29603 [export] Implement logging for scuba. (#119585)
Summary: As we're growing the user surface of torch.export, we'd like to understand better how people are using our APIs. It's also possible to analyze the usages based on static analysis, but due to the fact that there could be many creative ways to call things in Python, I think just building some logging infra will benefit us in the short term and gain us some insights.

Test Plan:
buck test caffe2/test:test_export
{F1454519846}

Reviewed By: tugsbayasgalan

Differential Revision: D53618220

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119585
Approved by: https://github.com/avikchaudhuri
2024-02-12 17:28:14 +00:00
da0635d17c Add pytorch-distributed justknobs helper (#118568)
Summary:
Sets up a helper that checks any JKs relevent to pytorch distributed,
and propagates their values to ENV.

Test Plan: Added unit test

Differential Revision: D53192406

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118568
Approved by: https://github.com/zdevito
2024-01-30 08:13:52 +00:00
bb55970e5b Revert "Add justknobs env helper for pytorch distributed (#118451)"
This reverts commit 4d1bb2175a49e9b4440085a3dc2e2b211e5cf99e.

Reverted https://github.com/pytorch/pytorch/pull/118451 on behalf of https://github.com/wconstab due to Broke internal tests ([comment](https://github.com/pytorch/pytorch/pull/118451#issuecomment-1915369013))
2024-01-29 19:01:05 +00:00
4d1bb2175a Add justknobs env helper for pytorch distributed (#118451)
Summary:
Adds a JK killswitch check and configures the env for enabling pytorch
nccl flight recorder.  Note- this only enables recording events in memory, not
dumping them.

Test Plan: CI test

Reviewed By: zdevito

Differential Revision: D52920092

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118451
Approved by: https://github.com/malfet
2024-01-29 08:57:16 +00:00
93b1e47586 [inductor][Observability] Add log for Optimus to enable easier debug (#110452)
Summary: The log breaks one of ads-model export flows, and we change the log to debug

Test Plan: see details in D49710166

Differential Revision: D49844303

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110452
Approved by: https://github.com/jackiexu1992
2023-12-01 18:25:56 +00:00
d1c092ae1b Update impl_abstract_pystub to be less boilerplatey (#113182)
Summary:

We've made the following changes:
- The new way to use the API is `m.impl_abstract_pystub(module, context)`.
  Every subsequent m.def of an op inside the TORCH_LIBRARY block gives
  the op the `impl_abstract_pystub`.
- Added a mechanism to determine if an operator was defined in Python or C++.
  Library.define in Python appends the op to a global set, which is analogous
  to what we do for tracking Library.impl.
- If someone does `torch.library.impl_abstract` in Python for an operator, then
  we require that it has an `impl_abstract_pystub` specified and we also check
  that the module in the `impl_abstract_pystub` is the same as the module where
  the call to `torch.library.impl_abstract` exists.
- Unfortunately we can't check the "context" (which is the buck target on
  buck-based systems) because buck sits above us.

bypass-github-export-checks

Test Plan: - existing tests

Differential Revision: D51080493

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113182
Approved by: https://github.com/ezyang
2023-11-08 00:39:00 +00:00
bc3e2e03cd Revert "Update impl_abstract_pystub to be less boilerplatey (#112851)"
This reverts commit 6ae4e3a8d249a96d9a8bbfba389d0509783e11e1.

Reverted https://github.com/pytorch/pytorch/pull/112851 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/112851#issuecomment-1799539354))
2023-11-07 18:53:13 +00:00
6ae4e3a8d2 Update impl_abstract_pystub to be less boilerplatey (#112851)
Summary:
We've made the following changes:
- The new way to use the API is `m.impl_abstract_pystub(module, context)`.
  Every subsequent m.def of an op inside the TORCH_LIBRARY block gives
  the op the `impl_abstract_pystub`.
- Added a mechanism to determine if an operator was defined in Python or C++.
  Library.define in Python appends the op to a global set, which is analogous
  to what we do for tracking Library.impl.
- If someone does `torch.library.impl_abstract` in Python for an operator, then
  we require that it has an `impl_abstract_pystub` specified and we also check
  that the module in the `impl_abstract_pystub` is the same as the module where
  the call to `torch.library.impl_abstract` exists.
- Unfortunately we can't check the "context" (which is the buck target on
  buck-based systems) because buck sits above us.

Test Plan: - existing tests

Differential Revision: D50972148

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112851
Approved by: https://github.com/ezyang
2023-11-07 16:07:42 +00:00
8124a6c40c [TORCH_LIBRARY] Add impl_abstract_pystub (#109529)
We want users to be able to define custom ops in C++ but put the
abstract impl in Python (since it is easier to write them in Python and
the abstract impl better models device semantics and data-dependent
operators).

`m.impl_abstract_pystub(opname, python_module, context)` declares the
abstract_impl of the operator to exist in the given python module.
When the abstract_impl needs to be accessed (either via FakeTensor or
Meta), and it does not exist, the PyTorch Dispatcher will yell
with a descriptive error message.

Some details:
- We construct a new global AbstractImplPyStub mapping in
  Dispatcher.cpp. Read/write to this map is protected by the Dispatcher
  lock.
- We add a new Meta Tensor fallback kernel. The fallback errors out if there is
  no meta kernel, but also offers a nicer error message if we see that there is
  a pystub.
- We create a `torch._utils_internal.throw_abstract_impl_not_imported_error`
  helper function to throw errors. This way, we can throw different error
  messages in OSS PyTorch vs internal PyTorch. To invoke this from C++, we
  added a PyInterpreter::throw_abstract_impl_not_imported_error.

Differential Revision: [D49464753](https://our.internmc.facebook.com/intern/diff/D49464753/)

Differential Revision: [D49464753](https://our.internmc.facebook.com/intern/diff/D49464753)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109529
Approved by: https://github.com/ezyang, https://github.com/bdhirsh
2023-09-22 04:55:36 +00:00
49b18ae546 Revert "python functionalization: add helpers, functionalize_sync and mirror_autograd_meta (#107917)"
This reverts commit 0ad595954a1766f26aa55b0f72814d55865bb1dc.

Reverted https://github.com/pytorch/pytorch/pull/107917 on behalf of https://github.com/clee2000 due to breaking internal builds D49346637 ([comment](https://github.com/pytorch/pytorch/pull/107917#issuecomment-1722566885))
2023-09-17 20:57:41 +00:00
0ad595954a python functionalization: add helpers, functionalize_sync and mirror_autograd_meta (#107917)
Added two new utils to help with turning python functionalization on in AOTAutograd (next PR):

(1) updated `torch._sync()`. Previously, this API could only handle `torch.Tensor` instances that had a `FunctionalTensorWrapper` TensorImpl. It now needs to handle python `FunctionalTensor`'s. In theory I can probably break BC and change this API (since it's private?), but I decided not to do it in this PR stack do minimize the chance of reverts. Instead of updating that API directly (which is in C++), I just added a python shim that first tries to unwrap the python `FunctionalTensor` if there is one, then calls the existing C++ logic

(2) `mirror_autograd_meta` is now a standalone API that tries to mirror the `requires_grad` and `is_leaf` autograd metadata from one tensor to another. Previously this was hardcoded into `torch._to_functional_tensor()`. But I now need to use it in a more standalone way: later in AOTAutograd when we unwrap and re-wrap a tensor subclasses, we need to manually mirror the autograd metadata from the original to the updated version of the subclass.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107917
Approved by: https://github.com/ezyang
ghstack dependencies: #106404
2023-09-15 20:19:25 +00:00
fbfb9a1648 [Dynamo] Improve PT2 fbcode logging observability (#106932)
Summary:
https://docs.google.com/document/d/1D5K3_ELsda3tIUeSyNL_2yee-M3jVWbirqSQ5BDNvHQ/edit

This is the revamped version of D47908299.

For each frame, we will record a list of compilation metrics: e.g, backend_compile time, entire_frame_compile time, cache_size, co_filename, co_firstlineno, co_name, guards, graph input_count, graph node_count, graph op_count.

With the help of job info: mast_job_name, global_rank, we can satisfy the requirements from `Things I’ve used/wanted to use our logging to determine` in https://docs.google.com/document/d/1D5K3_ELsda3tIUeSyNL_2yee-M3jVWbirqSQ5BDNvHQ/edit (or add more metrics for this framework)

Test Plan:
```
buck2 test //caffe2/test:test_dynamo
```

Differential Revision: D48142400

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106932
Approved by: https://github.com/anijain2305
2023-08-11 20:46:04 +00:00
59dff01319 Add top level function to check if running with deploy (#101420)
Also not sure if this should be a public function or not. Leaving it private for now but let me know if you prefer for it to be public.

FYI @nikitaved this will logically conflict with your triton kernel PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101420
Approved by: https://github.com/malfet
2023-05-16 16:05:49 +00:00
1e807f1189 Log PT2 compile to Scuba (#98790)
Summary:
Modeled off of https://www.internalfb.com/code/fbsource/[5f363eaeab1b5d620b9df83ba0de65adfd96771b]/fbcode/caffe2/torch/fb/trainer/profilers/gpu_mem_signpost.py?lines=106-115

I didn't use the Scuba integration in torch/_inductor/fb/logging.py to avoid
having to make a new Scuba table; probably should do this.

Test Plan:
```
buck2 test //caffe2/test:test_dynamo
```

Differential Revision: D44850903

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98790
Approved by: https://github.com/desertfire, https://github.com/bertmaher
2023-04-11 20:10:35 +00:00
12cb26509a Apply ufmt to torch internal (#81643)
This is a big bang PR, merge conflicts are probably expected and will be addressed at merge.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81643
Approved by: https://github.com/ezyang
2022-07-22 02:19:50 +00:00
1022443168 Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: revert-hammer

Differential Revision:
D30279364 (b004307252)

Original commit changeset: c1ed77dfe43a

fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e
2021-08-12 11:45:01 -07:00
b004307252 [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: manual inspection & sandcastle

Reviewed By: zertosh

Differential Revision: D30279364

fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a
2021-08-12 10:58:35 -07:00