pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-21 05:34:18 +08:00

Author	SHA1	Message	Date
Edward Z. Yang	a6630bcf87	Profile guided optimization for automatic_dynamic (#139001 ) Previously: https://github.com/pytorch/pytorch/pull/138052 but the implementation is done from scratch, so I open a new PR. This implements the ability to save and load profiles of automatic dynamic decisions, so on subsequent runs we can directly make something automatically dynamic. Unlike the previous implementation, this cache is never enabled by default; instead, you have to specify a "job id" that says it's OK to share results. We will be able to automatically populate this id for internal MAST jobs but for generic OSS users you will have to explicitly opt into it. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Differential Revision: [D65065497](https://our.internmc.facebook.com/intern/diff/D65065497) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139001 Approved by: https://github.com/oulgen	2024-11-01 21:43:25 +00:00
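A minimal sketch of the opt-in behaviour described in #139001, assuming a hypothetical `ProfileStore` class that keeps one JSON file per job id. None of these names come from PyTorch; the point is only that nothing is saved or loaded unless a job id is supplied.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Optional


class ProfileStore:
    """Hypothetical store for automatic-dynamic decisions, keyed by job id."""

    def __init__(self, root: Path, job_id: Optional[str] = None) -> None:
        self.root = root
        self.job_id = job_id  # None means the cache is disabled

    def _path(self) -> Optional[Path]:
        if self.job_id is None:
            return None
        return self.root / f"{self.job_id}.json"

    def save(self, decisions: Dict[str, str]) -> bool:
        """Persist decisions; returns False when no job id was given."""
        path = self._path()
        if path is None:
            return False
        path.write_text(json.dumps(decisions))
        return True

    def load(self) -> Dict[str, str]:
        """Return saved decisions, or an empty dict when disabled or absent."""
        path = self._path()
        if path is None or not path.exists():
            return {}
        return json.loads(path.read_text())


root = Path(tempfile.mkdtemp())
first_run = ProfileStore(root, job_id="my-job")
first_run.save({"x.size(0)": "dynamic"})

second_run = ProfileStore(root, job_id="my-job")  # same job id: sees the profile
no_opt_in = ProfileStore(root)                    # no job id: cache stays off
```

Here a second run with the same job id can start from the earlier decisions, while a run that never opted in behaves as if no profile existed.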
Colin L. Rice	abc5d59dcb	config: create Config objects with JK support (#138766 ) This teaches install_config_module (and the underlying code) to understand Config objects. Additionally we've added a JK option to this which resolves the JK. This config gets stored within the _ConfigEntry class and is evaluated when __getattr__ is called. If justknobs is set, it'll call justknobs_check to see the result. Due to preceding work, basically everything works correctly here and we had to update a couple of tests, and modify the getattr behaviour. Note that we are updating the justknob_check function to support a default option, to make default work. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138766 Approved by: https://github.com/ezyang	2024-11-01 19:20:37 +00:00
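The lazy evaluation described in #138766 can be sketched as follows. This is an illustration only: `ConfigNamespace`, `fake_justknobs_check`, and the knob names are hypothetical stand-ins, not the real `install_config_module` or `justknobs_check` code. What it shows is that the knob is resolved on each attribute access, falling back to a default.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class Config:
    """A config entry: a default plus an optional knob name (hypothetical)."""
    default: Any
    justknob: Optional[str] = None


class ConfigNamespace:
    """Resolves entries lazily, at attribute-access time."""

    def __init__(
        self,
        entries: Dict[str, Config],
        knob_check: Callable[[str, Any], Any],
    ) -> None:
        self._entries = dict(entries)
        self._knob_check = knob_check

    def __getattr__(self, name: str) -> Any:
        # Only reached when normal attribute lookup fails.
        entries = self.__dict__.get("_entries", {})
        if name not in entries:
            raise AttributeError(name)
        entry = entries[name]
        if entry.justknob is not None:
            # Evaluated on every access, so knob changes are picked up.
            return self.__dict__["_knob_check"](entry.justknob, entry.default)
        return entry.default


# Stand-in for a remote knob service; the real check is not modelled here.
KNOBS = {"pytorch/compiler:enable_thing": False}


def fake_justknobs_check(name: str, default: Any) -> Any:
    return KNOBS.get(name, default)


config = ConfigNamespace(
    {
        "enable_thing": Config(default=True, justknob="pytorch/compiler:enable_thing"),
        "unset_knob": Config(default=True, justknob="pytorch/compiler:missing"),
        "max_depth": Config(default=8),
    },
    fake_justknobs_check,
)
```

In this sketch `config.enable_thing` follows the knob, `config.unset_knob` falls back to its default because the knob is unknown, and `config.max_depth` is a plain value.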
James Wu	a16476b671	Add support for adding extra metadata to chromium events, log to separate columns (#138477 ) This diff does a few things: ## Add metadata to events in progress Adds the ability to add extra metadata to Chromium Events via `add_event_data`. Metadata can only be added to chromium events that have started, but not ended (so, in progress events) - When you add the data, the metadata is appended to the metadata when you call log_event_end(). - The metadata appears in chromium events in tlparse. It also gets logged to scuba. ## New `dynamo` chromium event We add a new `dynamo` chromium event to the top of the stack, where we collect various metadata found in dynamo_compile. So the new order of events goes: ``` __start__ -> dynamo (dynamo compile metrics) -> entire_frame_compile (compile.inner) -> backend_compile (i.e. aotdispatch) -> create_aot_dispatch_function -> inductor_compile -> ... ``` BackwardCompilationMetrics doesn't have any dynamo specific information (as it's mostly inductor timings). So we don't include that here. FAQ: Why can't we use `entire_frame_compile` as the event? This is mostly due to backward compatibility with `dynamo_compile`. `dynamo_compile` collects CompilationMetrics outside of `compile.compile_inner`, and uses `dynamo_timed` to grab timings from phases of the compiler, including `entire_frame_compile`. So we don't have a CompilationMetric object until after an `entire_frame_compile` event ends! Separately, `dynamo` as a name for all of dynamo compile is more descriptive than `entire_frame_compile`, imo. ## Log metadata as separate columns (Meta only): Separately, this also changes the `metadata` column in PT2 Compile Events. Instead of logging a single metadata column in JSON, it separates the JSON into separate columns. This is much better for data analysis. 
Now that this table is more mature, I think logging keys to separate columns is a better system. Differential Revision: [D64696287](https://our.internmc.facebook.com/intern/diff/D64696287/) NOTE FOR REVIEWERS: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D64696287/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/138477 Approved by: https://github.com/aorenste	2024-10-22 21:17:44 +00:00
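The "in progress only" rule from #138477 can be illustrated with a small stand-alone logger. The method names `add_event_data` and `log_event_end` come from the commit message; the class, the signatures, and the metadata keys below are assumptions made for the sketch, not the actual chromium event logger.

```python
import time
from typing import Any, Dict, List


class EventLogger:
    """Hypothetical logger: metadata may only be attached to open events."""

    def __init__(self) -> None:
        self._in_progress: Dict[str, Dict[str, Any]] = {}
        self.finished: List[Dict[str, Any]] = []

    def log_event_start(self, name: str) -> None:
        self._in_progress[name] = {"start_ns": time.time_ns(), "metadata": {}}

    def add_event_data(self, name: str, **data: Any) -> None:
        # Events that never started, or already ended, are rejected.
        if name not in self._in_progress:
            raise RuntimeError(f"event {name!r} is not in progress")
        self._in_progress[name]["metadata"].update(data)

    def log_event_end(self, name: str, **data: Any) -> None:
        event = self._in_progress.pop(name)
        # Metadata added while the event was open is merged in at the end.
        event["metadata"].update(data)
        self.finished.append(
            {
                "name": name,
                "duration_ns": time.time_ns() - event["start_ns"],
                "metadata": event["metadata"],
            }
        )


logger = EventLogger()
logger.log_event_start("dynamo")
logger.add_event_data("dynamo", frame_id=0)        # hypothetical key
logger.log_event_end("dynamo", graph_op_count=3)   # hypothetical key
```

After `log_event_end`, a further `add_event_data("dynamo", ...)` raises, since the event is no longer in progress.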
Aaron Orenstein	07cc4bd3e2	typing compile_fx.py (#138033 ) Type annotations for compile_fx. - Some of the stuff here is pretty complicated (functions which return functions that take functions) so I bailed on those and used `Any` just to get the rest landed. - There are also changes to type signatures in other files which I did just to let mypy know more about the types in compile_fx.py. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138033 Approved by: https://github.com/Skylion007	2024-10-21 18:14:59 +00:00
James Wu	295de00908	[PT2 Compile Events] Revamp PT2 Compile/chromium event logging [1/?] (#138093 ) This diff is the starting steps of https://docs.google.com/document/u/2/d/1kAEBt4AyW7HTAhXHbjoz8FBFHNyyEA2Qo2mPn7v3WUQ/edit?usp=drive_web&ouid=113555078003219714709 It implements the following changes: - Only log spans to scuba, so no start events are ever logged - Log events as the full event name, without "START" or "END" - Only log to scuba major phases from chromium events. These are: - entire_frame_compile (dynamo) - backend_compile (aotdispatch) - inductor_compile (inductor) - codegen (inductor codegen) Tlparse chromium events stay basically the same. But I implemented a few changes to clean that up as well: - When there's a phase name available, log the phase name instead of the function name as the event name. This simplifies the trace to not have two identical rows. The fn_name is available as metadata on the chromium event, if interested - Log new events for pre and post grad passes. These do not log to scuba. By making the phases much simpler in Scuba, with only categories for major phases of PT2 Compilation, we pave the way to add much more metadata and information to each individual event type. Diffs for that will come later. IMPLEMENTATION NOTES: - The logic for `log_chromium_event_internal` (which is the function that logs to Scuba) lives in chromium_events for now, but in the future as we add more metadata, it may belong independently in dynamo_timed or even outside of dynamo_timed. I haven't explored in detail what the refactor will look like. Once we start logging metadata for dynamo, aotdispatch, inductor, I suspect we will call log_pt2_compile_event directly, instead of making chromium event logger handle the pt2_compile_event logic. But that refactor is left for another PR on top of this one. - There's an interesting space after pre grad passes within AOT autograd logic, that's between create_aot_dispatcher_function and pre grad passes. 
I'm not sure what we're spending time doing in that time, but I'll find out with a profile later. Differential Revision: [D64479033](https://our.internmc.facebook.com/intern/diff/D64479033/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138093 Approved by: https://github.com/ezyang	2024-10-18 20:36:08 +00:00
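The span-only logging from #138093 can be sketched as a two-step transform: pair each START with its END to form a span, then keep only the major phases. The four phase names are taken from the commit message; the event tuples, the `pre_grad_passes` name, and both helper functions are hypothetical.

```python
from typing import Dict, List, Tuple

# Major phases listed in the commit message.
MAJOR_PHASES = {
    "entire_frame_compile",
    "backend_compile",
    "inductor_compile",
    "codegen",
}


def to_spans(events: List[Tuple[str, str, int]]) -> List[Dict[str, object]]:
    """Turn (name, 'START' | 'END', timestamp_ns) records into spans."""
    open_starts: Dict[str, List[int]] = {}
    spans: List[Dict[str, object]] = []
    for name, kind, ts in events:
        if kind == "START":
            open_starts.setdefault(name, []).append(ts)
        elif kind == "END":
            start = open_starts[name].pop()
            # The span carries the bare event name, without START or END.
            spans.append({"name": name, "start_ns": start, "duration_ns": ts - start})
    return spans


def major_phase_spans(spans: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Keep only spans that belong to a major compilation phase."""
    return [span for span in spans if span["name"] in MAJOR_PHASES]


events = [
    ("entire_frame_compile", "START", 0),
    ("backend_compile", "START", 10),
    ("pre_grad_passes", "START", 12),  # hypothetical minor event
    ("pre_grad_passes", "END", 20),
    ("backend_compile", "END", 90),
    ("entire_frame_compile", "END", 100),
]
all_spans = to_spans(events)
scuba_spans = major_phase_spans(all_spans)
```

In this sketch the full span list would feed the trace view, while only `scuba_spans` would go to the aggregated table.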
James Wu	3bf6594d13	Log compile ids to pt2_remote_cache and pt2_compile_events (#137431 ) Log the current compilation id for all relevant samples for these two tables, so we can have a 1:1 analog with dynamo_compile. Differential Revision: [D63900826](https://our.internmc.facebook.com/intern/diff/D63900826/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137431 Approved by: https://github.com/oulgen	2024-10-08 18:04:48 +00:00
Tugsbayasgalan Manlaibaatar	97634e4f82	Rollout infra for executorch migration to training IR (#132703 ) Title Differential Revision: [D60432217](https://our.internmc.facebook.com/intern/diff/D60432217/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132703 Approved by: https://github.com/tarun292	2024-10-04 04:33:08 +00:00
PyTorch MergeBot	357b7fb579	Revert "[Pytorch] Consolidate Strobelight compile time profiler between OSS and fbcode (#135953 )" This reverts commit b8637503c036abb898f6b880b325aeffe6f09c03. Reverted https://github.com/pytorch/pytorch/pull/135953 on behalf of https://github.com/kollasb due to Broke internal module factory compatibility, revert from Phabricator failed ([comment](https://github.com/pytorch/pytorch/pull/135953#issuecomment-2351381777))	2024-09-15 05:32:38 +00:00
Suresh Babu Kolla	b8637503c0	[Pytorch] Consolidate Strobelight compile time profiler between OSS and fbcode (#135953 ) Summary: Move towards consolidating strobelight profiler implementations between OSS and fbcode. This change is a first step towards that. - Created a new function to abstract out compile time profiling enablement. This function allows profiler to switch between different function profilers (e.g. Thrift based or CLI based) - Both OSS and Fbcode now use one compile time profiler in torch/_strobelight Test Plan: Tested OSS with following commands: ``` python torch/_strobelight/examples/compile_time_profile_example.py python torch/_strobelight/examples/cli_function_profiler_example.py TORCH_COMPILE_STROBELIGHT=TRUE TORCHINDUCTOR_FORCE_DISABLE_CACHES=1 python benchmarks/dynamo/huggingface.py --ci --accuracy --timing --explain --inductor --device cuda --training --amp --only XLNetLMHeadModel ``` See test commands for fbcode in comments. Differential Revision: D62444551 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135953 Approved by: https://github.com/laithsakka	2024-09-14 16:35:22 +00:00
Oguz Ulgen	2dadc2c8fc	Log fx graph cache bypass reasons (#134792 ) Summary: Let's track when we bypass and why Test Plan: unit tests Differential Revision: D61994739 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134792 Approved by: https://github.com/jamesjwu	2024-09-01 19:02:09 +00:00
Animesh Jain	7a694f6683	[justknobs] Override __bool__ method (#134799 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/134799 Approved by: https://github.com/ezyang	2024-08-30 04:54:02 +00:00
Colin L. Rice	cf11fc0dcb	dynamo: Only log if we've disabled eval_frame once. (#134529 ) This spams logs pretty badly otherwise Pull Request resolved: https://github.com/pytorch/pytorch/pull/134529 Approved by: https://github.com/chuanhaozhuge, https://github.com/oulgen	2024-08-30 00:35:25 +00:00
Colin L. Rice	9dc4bd7466	Create a JustknobConfig for use in config (#134161 ) This is designed to be a more ergonomic interface on top of justknob_feature (see https://github.com/pytorch/pytorch/pull/134151 for just the PR with the base commits). The idea is that people stop having to think about this as much, and can just do JustkobsConfig("//the:thing", "FORCE_THING") and it'll do the right thing. Primarily sending this to see how people feel about the API, and using it for new config changes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134161 Approved by: https://github.com/ezyang	2024-08-27 16:07:33 +00:00
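One way to read the two-argument form in #134161 is "a knob name plus an environment variable that can force the value". That precedence is an assumption, as is everything else here: `KnobWithEnvOverride`, the `lookup` callable, and the accepted truthy strings are invented for illustration. The `__bool__` method mirrors the idea in #134799 of letting the object be used directly in an `if`.

```python
import os
from typing import Callable, Mapping, Optional


class KnobWithEnvOverride:
    """Hypothetical config value: env var wins, otherwise ask the knob lookup."""

    def __init__(
        self,
        knob_name: str,
        env_name: str,
        lookup: Callable[[str, bool], bool],
        environ: Optional[Mapping[str, str]] = None,
        default: bool = True,
    ) -> None:
        self._knob_name = knob_name
        self._env_name = env_name
        self._lookup = lookup
        self._environ = os.environ if environ is None else environ
        self._default = default

    def get(self) -> bool:
        raw = self._environ.get(self._env_name)
        if raw is not None:
            # Assumed convention for truthy strings.
            return raw.strip().lower() in ("1", "true", "yes")
        return self._lookup(self._knob_name, self._default)

    def __bool__(self) -> bool:
        # Lets callers write `if knob:` instead of `if knob.get():`.
        return self.get()


def fake_lookup(name: str, default: bool) -> bool:
    # Stand-in for a remote knob service.
    return {"//the:thing": False}.get(name, default)


plain = KnobWithEnvOverride("//the:thing", "FORCE_THING", fake_lookup, environ={})
forced = KnobWithEnvOverride(
    "//the:thing", "FORCE_THING", fake_lookup, environ={"FORCE_THING": "1"}
)
```

Passing `environ` explicitly keeps the example independent of the real process environment.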
Shangdi Yu	b0cf287b46	[export][training ir migration] Fix getitem not exist (#134259 ) Summary: Make quantization tests compatible with the new training IR. With the new batch norm node `torch.ops.aten.batch_norm.default`, we don't need an additional getitem node after the bn node, so tests need to be fixed to not check for the getitem node. We added a capture_pre_autograd_graph_using_training_ir() function, which returns True when we are using the training ir, and False otherwise. This way, the code supports both training ir and the old ir. For now, we are just rolling out the training ir for fbcode internal tests. Test Plan: ``` buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_preserve_source_fn_stack buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_update_shared_qspec buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_conv2d buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_conv_bn_relu_fusion buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_conv_bn_fusion buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_conv_bn_fusion_literal_args ``` Reviewed By: andrewor14, tugsbayasgalan Differential Revision: D61292102 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134259 Approved by: https://github.com/tugsbayasgalan	2024-08-22 22:00:14 +00:00
James Wu	3c5485fb7f	[Retry] Log chromium events to scuba (#134118 ) Summary: This diff implements a bunch of views for internal scuba viewing. TODOS that I might punt to another diff: - Saving cache stats via counter is definitely sus here, but there's not really a good way to track "fx graph cache hit for this compile phase" right now. Will think about this more. - We should definitely log frame id, compile id, etc - We should definitely be logging configs. That way, we can A/B test based on whether a config is turned on. - idk what I'm doing with compile_uuid yet, but it's useful when you want to look at samples for a single run. I think if we had mast job info this field is not needed, but it's nice to be able to drill down to a single run and get its chrome trace view or icicle view, so idk Test Plan: All of the above views are run with nanogpt benchmark: ``` buck run mode/opt caffe2/benchmarks/dynamo:torchbench -- --training --backend=inductor --only nanogpt --performance ``` Differential Revision: D61603243 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134118 Approved by: https://github.com/oulgen	2024-08-22 14:59:45 +00:00
Laith Sakka	8b6b1721c8	remove StrobelightCompileTimeProfiler.profile_compile_time from stacktrace when strobelight profiling not enabled (#133831 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133831 Approved by: https://github.com/oulgen	2024-08-19 09:14:52 +00:00
Oguz Ulgen	fa36eba77d	Turn off remote caching in unit tests unless explicitly on (#133258 ) Summary: This PR turns off remote caching in unit tests unless the unit test explicitly turns it on. Test Plan: existing tests Differential Revision: D61152154 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133258 Approved by: https://github.com/masnesral	2024-08-13 02:49:43 +00:00
Oguz Ulgen	eee76c86a8	Write trace_structured events to scuba (#130955 ) Summary: https://fb.workplace.com/groups/1286739428954016/posts/1287192258908733 Test Plan: Run test with tlparse and inspect https://www.internalfb.com/intern/scuba/query/?dataset=pt2_trace_structured_events Differential Revision: D59866096 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130955 Approved by: https://github.com/ezyang	2024-07-19 06:02:47 +00:00
Zhengxu Chen	37d4d04309	[torchscript] Add logging for model id. (#130118 ) Summary: as title. Test Plan: CI Reviewed By: angelayi Differential Revision: D59348256 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130118 Approved by: https://github.com/BoyuanFeng	2024-07-09 22:24:16 +00:00
Xuehai Pan	f85d1e845a	[BE] enable UFMT for `torch/nn/*.py` (#128593 ) Part of #123062 - #123062 Pull Request resolved: https://github.com/pytorch/pytorch/pull/128593 Approved by: https://github.com/mikaylagawarecki	2024-06-23 16:05:13 +00:00
PyTorch MergeBot	cc8193c707	Revert "[BE] enable UFMT for `torch/nn/functional.py` (#128592 )" This reverts commit f6e6e55fa7d883a89ba99584f8632c260519ba73. Reverted https://github.com/pytorch/pytorch/pull/128592 on behalf of https://github.com/fbgheith due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/128592#issuecomment-2181783936))	2024-06-21 00:44:16 +00:00
Xuehai Pan	f6e6e55fa7	[BE] enable UFMT for `torch/nn/functional.py` (#128592 ) Part of #123062 - #123062 Pull Request resolved: https://github.com/pytorch/pytorch/pull/128592 Approved by: https://github.com/mikaylagawarecki ghstack dependencies: #128596, #128594	2024-06-17 16:29:29 +00:00
Aaron Orenstein	afe15d2d2f	Flip default value for mypy disallow_untyped_defs [3/11] (#127840 ) See #127836 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127840 Approved by: https://github.com/oulgen	2024-06-08 18:28:01 +00:00
laithsakka	cdf2133186	Add compile time profiler for non fbcode targets (#126904 ) This is a tool that allow profiling compile time using strobelight profiler, its a meta only tool. but works on non-fbcode targets. A follow up diff will unify this with caffe2/fb/strobelight/compile_time_profiler.py. example test: ``` run python tools/strobelight/examples/compile_time_profile_example.py ``` ``` python torch/utils/_strobelight/examples/compile_time_profile_example.py strobelight_compile_time_profiler, line 61, 2024-05-23 10:49:28,101, INFO: compile time strobelight profiling enabled strobelight_compile_time_profiler, line 93, 2024-05-23 10:49:28,102, INFO: Unique sample tag for this run is: 2024-05-23-10:49:282334638devvm4561.ash0.facebook.com strobelight_compile_time_profiler, line 94, 2024-05-23 10:49:28,102, INFO: You can use the following link to access the strobelight profile at the end of the run: https://www.internalfb.com/intern/scuba/query/?dataset=pyperf_experimental%2Fon_demand&drillstate=%7B%22purposes%22%3A[]%2C%22end%22%3A%22now%22%2C%22start%22%3A%22-30%20days%22%2C%22filterMode%22%3A%22DEFAULT%22%2C%22modifiers%22%3A[]%2C%22sampleCols%22%3A[]%2C%22cols%22%3A[%22namespace_id%22%2C%22namespace_process_id%22]%2C%22derivedCols%22%3A[]%2C%22mappedCols%22%3A[]%2C%22enumCols%22%3A[]%2C%22return_remainder%22%3Afalse%2C%22should_pivot%22%3Afalse%2C%22is_timeseries%22%3Afalse%2C%22hideEmptyColumns%22%3Afalse%2C%22timezone%22%3A%22America%2FLos_Angeles%22%2C%22compare%22%3A%22none%22%2C%22samplingRatio%22%3A%221%22%2C%22metric%22%3A%22count%22%2C%22aggregation_field%22%3A%22async_stack_complete%22%2C%22top%22%3A10000%2C%22aggregateList%22%3A[]%2C%22param_dimensions%22%3A[%7B%22dim%22%3A%22py_async_stack%22%2C%22op%22%3A%22edge%22%2C%22param%22%3A%220%22%2C%22anchor%22%3A%220%22%7D]%2C%22order%22%3A%22weight%22%2C%22order_desc%22%3Atrue%2C%22constraints%22%3A[[%7B%22column%22%3A%22sample_tags%22%2C%22op%22%3A%22all%22%2C%22value%22%3A[%22[%5C%222024-05-23-10:49:282
334638devvm4561.ash0.facebook.com%5C%22]%22]%7D]]%2C%22c_constraints%22%3A[[]]%2C%22b_constraints%22%3A[[]]%2C%22ignoreGroupByInComparison%22%3Afalse%7D&view=GraphProfilerView&&normalized=1712358002&pool=uber strobelight_function_profiler, line 241, 2024-05-23 10:49:34,943, INFO: strobelight run id is: 3507039740348330 strobelight_function_profiler, line 243, 2024-05-23 10:50:00,907, INFO: strobelight profiling running strobelight_function_profiler, line 224, 2024-05-23 10:50:02,741, INFO: strobelight profiling stopped strobelight_function_profiler, line 215, 2024-05-23 10:50:06,173, INFO: Total samples: 7 strobelight_function_profiler, line 215, 2024-05-23 10:50:06,173, INFO: GraphProfiler (python stack): https://fburl.com/scuba/pyperf_experimental/on_demand/75cxdro3 strobelight_function_profiler, line 215, 2024-05-23 10:50:06,173, INFO: Icicle view (python stack): https://fburl.com/scuba/pyperf_experimental/on_demand/qsgydsee strobelight_compile_time_profiler, line 120, 2024-05-23 10:50:06,174, INFO: 1 strobelight success runs out of 1 non-recursive compilation events. strobelight_function_profiler, line 241, 2024-05-23 10:50:08,137, INFO: strobelight run id is: 8721740011604497 strobelight_function_profiler, line 243, 2024-05-23 10:50:34,801, INFO: strobelight profiling running strobelight_function_profiler, line 224, 2024-05-23 10:50:36,803, INFO: strobelight profiling stopped strobelight_function_profiler, line 215, 2024-05-23 10:50:41,289, INFO: Total samples: 3 strobelight_function_profiler, line 215, 2024-05-23 10:50:41,289, INFO: GraphProfiler (python stack): https://fburl.com/scuba/pyperf_experimental/on_demand/qmi2ucwp strobelight_function_profiler, line 215, 2024-05-23 10:50:41,289, INFO: Icicle view (python stack): https://fburl.com/scuba/pyperf_experimental/on_demand/7fjkhs9i strobelight_compile_time_profiler, line 120, 2024-05-23 10:50:41,289, INFO: 2 strobelight success runs out of 2 non-recursive compilation events. 
strobelight_function_profiler, line 241, 2024-05-23 10:50:43,597, INFO: strobelight run id is: 1932476082259558 strobelight_function_profiler, line 243, 2024-05-23 10:51:09,791, INFO: strobelight profiling running strobelight_function_profiler, line 224, 2024-05-23 10:51:11,883, INFO: strobelight profiling stopped strobelight_function_profiler, line 215, 2024-05-23 10:51:16,218, INFO: Total samples: 3 strobelight_function_profiler, line 215, 2024-05-23 10:51:16,218, INFO: GraphProfiler (python stack): https://fburl.com/scuba/pyperf_experimental/on_demand/vy1ujxec strobelight_function_profiler, line 215, 2024-05-23 10:51:16,218, INFO: Icicle view (python stack): https://fburl.com/scuba/pyperf_experimental/on_demand/2xgadviv strobelight_compile_time_profiler, line 120, 2024-05-23 10:51:16,219, INFO: 3 strobelight success runs out of 3 non-recursive compilation events. ``` or pass TORCH_COMPILE_STROBELIGHT=TRUE for any torch compile python program. ex running on XLNetLMHeadModel. ``` TORCH_COMPILE_STROBELIGHT=TRUE TORCHINDUCTOR_FORCE_DISABLE_CACHES=1 time python benchmarks/dynamo/huggingface.py --ci --accuracy --timing --explain --inductor --device cuda --training --amp --only XLNetLMHeadModel ``` result: Pull Request resolved: https://github.com/pytorch/pytorch/pull/126904 Approved by: https://github.com/aorenste ghstack dependencies: #126444	2024-05-29 05:06:37 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar	9521528f71	Log export result of torch.jit.trace to scuba (#126900 ) Summary: We want to track how well torch.jit.trace can be converted to export at large scale. As a first step, we log, for all torch.jit.trace unit tests, whether we can convert the traced module to an export module or export the model directly. Test Plan: CI Differential Revision: D57629682 Pull Request resolved: https://github.com/pytorch/pytorch/pull/126900 Approved by: https://github.com/SherlockNoMad	2024-05-28 17:49:34 +00:00
PyTorch MergeBot	7121ea6f70	Revert "Add compile time profiler for non fbcode targets (#126904 )" This reverts commit 575cb617db4043dd7a76aaf523dc3ab7ee07e7a5. Reverted https://github.com/pytorch/pytorch/pull/126904 on behalf of https://github.com/atalman due to Broke nightly smoke test ([comment](https://github.com/pytorch/pytorch/pull/126904#issuecomment-2133418687))	2024-05-27 12:52:09 +00:00
laithsakka	575cb617db	Add compile time profiler for non fbcode targets (#126904 ) This is a tool that allow profiling compile time using strobelight profiler, its a meta only tool. but works on non-fbcode targets. A follow up diff will unify this with caffe2/fb/strobelight/compile_time_profiler.py. example test: ``` run python tools/strobelight/examples/compile_time_profile_example.py ``` ``` python torch/utils/_strobelight/examples/compile_time_profile_example.py strobelight_compile_time_profiler, line 61, 2024-05-23 10:49:28,101, INFO: compile time strobelight profiling enabled strobelight_compile_time_profiler, line 93, 2024-05-23 10:49:28,102, INFO: Unique sample tag for this run is: 2024-05-23-10:49:282334638devvm4561.ash0.facebook.com strobelight_compile_time_profiler, line 94, 2024-05-23 10:49:28,102, INFO: You can use the following link to access the strobelight profile at the end of the run: https://www.internalfb.com/intern/scuba/query/?dataset=pyperf_experimental%2Fon_demand&drillstate=%7B%22purposes%22%3A[]%2C%22end%22%3A%22now%22%2C%22start%22%3A%22-30%20days%22%2C%22filterMode%22%3A%22DEFAULT%22%2C%22modifiers%22%3A[]%2C%22sampleCols%22%3A[]%2C%22cols%22%3A[%22namespace_id%22%2C%22namespace_process_id%22]%2C%22derivedCols%22%3A[]%2C%22mappedCols%22%3A[]%2C%22enumCols%22%3A[]%2C%22return_remainder%22%3Afalse%2C%22should_pivot%22%3Afalse%2C%22is_timeseries%22%3Afalse%2C%22hideEmptyColumns%22%3Afalse%2C%22timezone%22%3A%22America%2FLos_Angeles%22%2C%22compare%22%3A%22none%22%2C%22samplingRatio%22%3A%221%22%2C%22metric%22%3A%22count%22%2C%22aggregation_field%22%3A%22async_stack_complete%22%2C%22top%22%3A10000%2C%22aggregateList%22%3A[]%2C%22param_dimensions%22%3A[%7B%22dim%22%3A%22py_async_stack%22%2C%22op%22%3A%22edge%22%2C%22param%22%3A%220%22%2C%22anchor%22%3A%220%22%7D]%2C%22order%22%3A%22weight%22%2C%22order_desc%22%3Atrue%2C%22constraints%22%3A[[%7B%22column%22%3A%22sample_tags%22%2C%22op%22%3A%22all%22%2C%22value%22%3A[%22[%5C%222024-05-23-10:49:282
334638devvm4561.ash0.facebook.com%5C%22]%22]%7D]]%2C%22c_constraints%22%3A[[]]%2C%22b_constraints%22%3A[[]]%2C%22ignoreGroupByInComparison%22%3Afalse%7D&view=GraphProfilerView&&normalized=1712358002&pool=uber strobelight_function_profiler, line 241, 2024-05-23 10:49:34,943, INFO: strobelight run id is: 3507039740348330 strobelight_function_profiler, line 243, 2024-05-23 10:50:00,907, INFO: strobelight profiling running strobelight_function_profiler, line 224, 2024-05-23 10:50:02,741, INFO: strobelight profiling stopped strobelight_function_profiler, line 215, 2024-05-23 10:50:06,173, INFO: Total samples: 7 strobelight_function_profiler, line 215, 2024-05-23 10:50:06,173, INFO: GraphProfiler (python stack): https://fburl.com/scuba/pyperf_experimental/on_demand/75cxdro3 strobelight_function_profiler, line 215, 2024-05-23 10:50:06,173, INFO: Icicle view (python stack): https://fburl.com/scuba/pyperf_experimental/on_demand/qsgydsee strobelight_compile_time_profiler, line 120, 2024-05-23 10:50:06,174, INFO: 1 strobelight success runs out of 1 non-recursive compilation events. strobelight_function_profiler, line 241, 2024-05-23 10:50:08,137, INFO: strobelight run id is: 8721740011604497 strobelight_function_profiler, line 243, 2024-05-23 10:50:34,801, INFO: strobelight profiling running strobelight_function_profiler, line 224, 2024-05-23 10:50:36,803, INFO: strobelight profiling stopped strobelight_function_profiler, line 215, 2024-05-23 10:50:41,289, INFO: Total samples: 3 strobelight_function_profiler, line 215, 2024-05-23 10:50:41,289, INFO: GraphProfiler (python stack): https://fburl.com/scuba/pyperf_experimental/on_demand/qmi2ucwp strobelight_function_profiler, line 215, 2024-05-23 10:50:41,289, INFO: Icicle view (python stack): https://fburl.com/scuba/pyperf_experimental/on_demand/7fjkhs9i strobelight_compile_time_profiler, line 120, 2024-05-23 10:50:41,289, INFO: 2 strobelight success runs out of 2 non-recursive compilation events. 
strobelight_function_profiler, line 241, 2024-05-23 10:50:43,597, INFO: strobelight run id is: 1932476082259558 strobelight_function_profiler, line 243, 2024-05-23 10:51:09,791, INFO: strobelight profiling running strobelight_function_profiler, line 224, 2024-05-23 10:51:11,883, INFO: strobelight profiling stopped strobelight_function_profiler, line 215, 2024-05-23 10:51:16,218, INFO: Total samples: 3 strobelight_function_profiler, line 215, 2024-05-23 10:51:16,218, INFO: GraphProfiler (python stack): https://fburl.com/scuba/pyperf_experimental/on_demand/vy1ujxec strobelight_function_profiler, line 215, 2024-05-23 10:51:16,218, INFO: Icicle view (python stack): https://fburl.com/scuba/pyperf_experimental/on_demand/2xgadviv strobelight_compile_time_profiler, line 120, 2024-05-23 10:51:16,219, INFO: 3 strobelight success runs out of 3 non-recursive compilation events. ``` or pass TORCH_COMPILE_STROBELIGHT=TRUE for any torch compile python program. ex running on XLNetLMHeadModel. ``` TORCH_COMPILE_STROBELIGHT=TRUE TORCHINDUCTOR_FORCE_DISABLE_CACHES=1 time python benchmarks/dynamo/huggingface.py --ci --accuracy --timing --explain --inductor --device cuda --training --amp --only XLNetLMHeadModel ``` result: Pull Request resolved: https://github.com/pytorch/pytorch/pull/126904 Approved by: https://github.com/aorenste ghstack dependencies: #126693	2024-05-24 01:39:40 +00:00
dshi7	4644611b14	[cprofile] log manifold link instead of raw data to trace_structured (#126451 ) Internal D57459752 returns manifold URL and this PR adds to tlparse payload Pull Request resolved: https://github.com/pytorch/pytorch/pull/126451 Approved by: https://github.com/jamesjwu	2024-05-21 00:44:55 +00:00
Edward Z. Yang	b2d9b80fba	Also remove compile_time_strobelight_meta frame when generating stack (#126289 ) I think I also need to fix this in fbcode, leaving that for future work. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/126289 Approved by: https://github.com/yanboliang	2024-05-15 23:55:37 +00:00
Daohang Shi	b7d67e476d	upload pt2 cprofile stats to manifold (#125162 ) Summary: https://fb.workplace.com/groups/257735836456307/permalink/657458576484029/ upload cprofile to manifold D56696397 has a script to convert profiler stats to dot graphs (see its test plan) Test Plan: non-MAST `TORCH_COMPILE_CPROFILE=1 buck2 run mode/opt mode/inplace //pytorch/benchmark:run -- ads_mc_igctr_mc3_v0 -d cuda -t train --torchdynamo inductor --profile --profile-export-chrome-trace` https://www.internalfb.com/manifold/explorer/pyper_traces/tree/compilation_cprofile/test/20240428_234002_7562397568 MAST `buck2 run mode/opt aps_models/ads/icvr:icvr_launcher -- mode=mast_ctr_cvr_cmf_rep launcher.fbl_entitlement=ai_infra_training_rnd_tc features=ctr_cvr_conso_cmf_pipeline_features_455876776_3teach model=ctr_cvr_cmf_when_rep_config_msmn_3teach model_name=ctr_cvr_when model.when_arch.use_extended_residual_contexts=True optimizers.dense_default.lr_schedule.0.max_iters=20000 training.planner.storage_reservation_policy=FixedPercentage training.planner.storage_reservation_percentage=0.72 data_loader.dataset.batch_size=2048 trainer.garbage_collection.garbage_collection_interval=100 model.when_arch.layer_norm_init_weight=0.3 optimizers.dense_default.lr_schedule.0.value=0.001 model.when_arch.customized_mlp_init_scale=0.3 launcher.num_workers=128 launcher.max_retries=10 launcher.data_project=oncall_ads_model_platform launcher.hardware=ZIONEX_80G data_loader.dataset.table_ds="[2024-01-01]" launcher.job_name=test_inductor_logging` https://www.internalfb.com/manifold/explorer/pyper_traces/tree/compilation_cprofile/aps-test_inductor_logging-745febb51a Generating dotty files from D56696397 ``` Generating dot file from cprofile stats /home/daohang/aps-test_inductor_logging-745febb51a/0/0/_compile1.profile ... 
P1225733598: https://www.internalfb.com/intern/paste/P1225733598/ Dotty: https://www.internalfb.com/intern/graphviz/?paste=1225733598 Generating dot file from cprofile stats /home/daohang/aps-test_inductor_logging-745febb51a/0/0/_compile10.profile ... P1225733629: https://www.internalfb.com/intern/paste/P1225733629/ Dotty: https://www.internalfb.com/intern/graphviz/?paste=1225733629 Generating dot file from cprofile stats /home/daohang/aps-test_inductor_logging-745febb51a/0/0/_compile0.profile ... P1225733649: https://www.internalfb.com/intern/paste/P1225733649/ Dotty: https://www.internalfb.com/intern/graphviz/?paste=1225733649 ``` Differential Revision: D56679561 Pull Request resolved: https://github.com/pytorch/pytorch/pull/125162 Approved by: https://github.com/anijain2305	2024-04-30 15:05:01 +00:00
Jack Taylor	4b586a434f	[ROCm] Triton upstream AMD backend integration (#121801 ) Update ROCm-triton to use the AMD backend from https://github.com/openai/triton Note: `test__int_mm` can be enabled after https://github.com/pytorch/pytorch/pull/122431 is landed Co-authored-by: Pruthvi Madugundu <pruthvigithub@gmail.com> Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/121801 Approved by: https://github.com/nmacchioni, https://github.com/malfet	2024-04-25 20:44:27 +00:00
PyTorch MergeBot	3890848ec2	Revert "[ROCm] Triton upstream AMD backend integration (#121801 )" This reverts commit 9888d7495ece6b6df3b7334fc7c2a9d869359250. Reverted https://github.com/pytorch/pytorch/pull/121801 on behalf of https://github.com/jeanschmidt due to need to revert so I can revert https://github.com/pytorch/pytorch/pull/124592 ([comment](https://github.com/pytorch/pytorch/pull/121801#issuecomment-2076951327))	2024-04-25 11:22:19 +00:00
Jack Taylor	9888d7495e	[ROCm] Triton upstream AMD backend integration (#121801 ) Update ROCm-triton to use the AMD backend from https://github.com/openai/triton Note: `test__int_mm` can be enabled after https://github.com/pytorch/pytorch/pull/122431 is landed Co-authored-by: Pruthvi Madugundu <pruthvigithub@gmail.com> Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/121801 Approved by: https://github.com/nmacchioni, https://github.com/malfet	2024-04-24 17:28:12 +00:00
Laith Sakka	8cf54929e3	compiletime->compile_time (#124579 ) Summary: title. Test Plan: run strobelight profiler. Reviewed By: oulgen Differential Revision: D56395415 Pull Request resolved: https://github.com/pytorch/pytorch/pull/124579 Approved by: https://github.com/oulgen	2024-04-23 20:50:53 +00:00
Laith Sakka	acbf888a13	rename sl to strobelight (#124455 ) Summary: TORCH_COMPILE_SL_PROFILE ->TORCH_COMPILE_STROBELIGHT SL_MAX_STACK_LENGTH -> COMPILE_STROBELIGHT_MAX_STACK_LENGTH SL_MAX_PROFILE_TIME -> COMPILE_STROBELIGHT_MAX_PROFILE_TIME profile_with_sl() -> strobelight() compiletime_sl_profile_meta() -> compiletime_strobelight_meta() Test Plan: 1. run and verify ``` TORCH_COMPILE_STROBELIGHT=TRUE buck2 run @//mode/inplace @//mode/opt //caffe2/fb/strobelight:compiletime_profiler_example ``` 2. run and verify ``` buck2 run @//mode/inplace @//mode/opt //caffe2/fb/strobelight:function_profiler_example --local-only ``` 3. run and verify truncated stack for ``` TORCH_COMPILE_STROBELIGHT=TRUE COMPILE_STROBELIGHT_MAX_STACK_LENGTH=1 buck2 run @//mode/inplace @//mode/opt //caffe2/fb/strobelight:compiletime_profiler_example ``` 4. add infinite loop in _verify and verify samples for ``` COMPILE_STROBELIGHT_MAX_PROFILE_TIME=30 TORCH_COMPILE_STROBELIGHT=TRUE buck2 run @//mode/inplace @//mode/opt //caffe2/fb/strobelight:compiletime_profiler_example ``` Reviewed By: oulgen Differential Revision: D56327139 Pull Request resolved: https://github.com/pytorch/pytorch/pull/124455 Approved by: https://github.com/oulgen	2024-04-19 22:50:13 +00:00
rzou	d1e1d671ef	Stop requiring a pystub for register_fake by default (#124064 ) Previously, if someone used `register_fake` to add a fake impl for an operator defined in C++, we would require them to add a `m.set_python_module(<module>)` call to C++. This was to avoid situations where a user imported the C++ operator without importing the fake impl. This "breaks" open registration: there's no way to add a fake impl outside of a repository that defines an operator, so we want to turn this behavior off by default in open source. Test Plan: - existing tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/124064 Approved by: https://github.com/albanD ghstack dependencies: #123937	2024-04-17 23:51:20 +00:00
rzou	47dbfecd37	Rename impl_abstract to register_fake, part 1/2 (#123937 ) This PR: - adds a new torch.library.register_fake and deprecates torch.library.impl_abstract. The motivation is that we have a lot of confusion around the naming so we are going to align the naming with the actual subsystem (FakeTensor). - renames `m.impl_abstract_pystub("fbgemm_gpu.sparse_ops")` to `m.has_python_registration("fbgemm_gpu.sparse_ops")`. No deprecation here yet; I need to test how this works with static initialization. - Renames a bunch of internals to match (e.g. abstractimplpystub -> pystub) I'm scared to rename the Python-side internal APIs (e.g. torch._library.abstract_impl) because of torch.package concerns. I'll do that in its own isolated PR next just in case it causes problems. DEPRECATION NOTE: torch.library.impl_abstract was renamed to torch.library.register_fake. Please use register_fake. We'll delete impl_abstract in a future version of PyTorch. Test Plan: - existing tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/123937 Approved by: https://github.com/albanD	2024-04-17 12:46:01 +00:00
Laith Sakka	caed7f6727	profile pt2 compile time with strobelight (#123311 ) For OSS, this diff adds a decorator @profile_sb_fbcode that is a no-op for non-Meta workloads. Facebook: With this diff someone can generate a strobelight profile for pt2 compilation. Users need to set the env variable TORCH_COMPILE_SL_PROFILE=TRUE. For example: ``` TORCH_COMPILE_SL_PROFILE=TRUE buck2 run @//mode/inplace @//mode/opt //caffe2/fb/strobelight:compiletime_profile_example ``` See the sample output below, at the end of the summary. The way this works is that a unique id is generated and associated with all samples that are collected for functions that are decorated with profile_sb_fbcode. This id can then be used to combine different strobelight profiles into one (for example, three compilation events happen in the code below). Right now the following two functions are annotated with profile_sb_fbcode: bw_compiler and _compile. If profile_sl_fbcode is called recursively, the recursive invocations are ignored and a log is printed. 
The output is: ``` Strobelight is enabled for pt2 compilation Unique user-id for this run is: 2024-04-03-13:59:49147091devvm4561.ash0.facebook.com You can use the following link to access the strobelight profile at the end of the run: https://www.internalfb.com/intern/scuba/query/?dataset=pyperf_experimental%2Fon_demand&drillstate=%7B%22purposes%22%3A[]%2C%22end%22%3A%22now%22%2C%22start%22%3A%22-30%20days%22%2C%22filterMode%22%3A%22DEFAULT%22%2C%22modifiers%22%3A[]%2C%22sampleCols%22%3A[]%2C%22cols%22%3A[%22namespace_id%22%2C%22namespace_process_id%22]%2C%22derivedCols%22%3A[]%2C%22mappedCols%22%3A[]%2C%22enumCols%22%3A[]%2C%22return_remainder%22%3Afalse%2C%22should_pivot%22%3Afalse%2C%22is_timeseries%22%3Afalse%2C%22hideEmptyColumns%22%3Afalse%2C%22timezone%22%3A%22America%2FLos_Angeles%22%2C%22compare%22%3A%22none%22%2C%22samplingRatio%22%3A%221%22%2C%22metric%22%3A%22count%22%2C%22aggregation_field%22%3A%22async_stack_complete%22%2C%22top%22%3A10000%2C%22aggregateList%22%3A[]%2C%22param_dimensions%22%3A[%7B%22dim%22%3A%22py_async_stack%22%2C%22op%22%3A%22edge%22%2C%22param%22%3A%220%22%2C%22anchor%22%3A%220%22%7D]%2C%22order%22%3A%22weight%22%2C%22order_desc%22%3Atrue%2C%22constraints%22%3A[[%7B%22column%22%3A%22run_user%22%2C%22op%22%3A%22eq%22%2C%22value%22%3A[%22[%5C%222024-04-03-13:59:49147091devvm4561.ash0.facebook.com%5C%22]%22]%7D]]%2C%22c_constraints%22%3A[[]]%2C%22b_constraints%22%3A[[]]%2C%22ignoreGroupByInComparison%22%3Afalse%7D&view=GraphProfilerView&&pool=uber&graphprofiler_filter=&graphprofiler_column_to_sort_by=exclusive the link below takes you to the collected strobelight profile 
https://www.internalfb.com/intern/scuba/query/?dataset=pyperf_experimental%2Fon_demand&drillstate=%7B%22dimensions%22%3A%5B%5D%2C%22param_dimensions%22%3A%5B%7B%22anchor%22%3A%220%22%2C%22param%22%3A%220%22%2C%22op%22%3A%22edge%22%2C%22dim%22%3A%22py_async_stack%22%7D%5D%2C%22constraints%22%3A%5B%5B%7B%22value%22%3A%5B%22%5B%5C%22-6800545191281321%5C%22%5D%22%5D%2C%22op%22%3A%22eq%22%2C%22column%22%3A%22run_id%22%7D%2C%7B%22value%22%3A%5B%22%5B%5C%222024-04-03-13%3A59%3A49147091devvm4561.ash0.facebook.com%5C%22%5D%22%5D%2C%22op%22%3A%22eq%22%2C%22column%22%3A%22run_user%22%7D%5D%5D%2C%22top%22%3A10000%2C%22end%22%3A%221712181610%22%2C%22start%22%3A%221712174410%22%7D&view=GraphProfilerView& 1 storbelight success runs out of 1 non-ignored runs. strobelight run id is: 6181728288420687 the link below takes you to the collected strobelight profile https://www.internalfb.com/intern/scuba/query/?dataset=pyperf_experimental%2Fon_demand&drillstate=%7B%22dimensions%22%3A%5B%5D%2C%22param_dimensions%22%3A%5B%7B%22anchor%22%3A%220%22%2C%22param%22%3A%220%22%2C%22op%22%3A%22edge%22%2C%22dim%22%3A%22py_async_stack%22%7D%5D%2C%22constraints%22%3A%5B%5B%7B%22value%22%3A%5B%22%5B%5C%226181728288420687%5C%22%5D%22%5D%2C%22op%22%3A%22eq%22%2C%22column%22%3A%22run_id%22%7D%2C%7B%22value%22%3A%5B%22%5B%5C%222024-04-03-13%3A59%3A49147091devvm4561.ash0.facebook.com%5C%22%5D%22%5D%2C%22op%22%3A%22eq%22%2C%22column%22%3A%22run_user%22%7D%5D%5D%2C%22top%22%3A10000%2C%22end%22%3A%221712181621%22%2C%22start%22%3A%221712174421%22%7D&view=GraphProfilerView& 2 storbelight success runs out of 2 non-ignored runs. 
strobelight run id is: -1026103682715688 the link below takes you to the collected strobelight profile https://www.internalfb.com/intern/scuba/query/?dataset=pyperf_experimental%2Fon_demand&drillstate=%7B%22dimensions%22%3A%5B%5D%2C%22param_dimensions%22%3A%5B%7B%22anchor%22%3A%220%22%2C%22param%22%3A%220%22%2C%22op%22%3A%22edge%22%2C%22dim%22%3A%22py_async_stack%22%7D%5D%2C%22constraints%22%3A%5B%5B%7B%22value%22%3A%5B%22%5B%5C%22-1026103682715688%5C%22%5D%22%5D%2C%22op%22%3A%22eq%22%2C%22column%22%3A%22run_id%22%7D%2C%7B%22value%22%3A%5B%22%5B%5C%222024-04-03-13%3A59%3A49147091devvm4561.ash0.facebook.com%5C%22%5D%22%5D%2C%22op%22%3A%22eq%22%2C%22column%22%3A%22run_user%22%7D%5D%5D%2C%22top%22%3A10000%2C%22end%22%3A%221712181647%22%2C%22start%22%3A%221712174447%22%7D&view=GraphProfilerView& 3 storbelight success runs out of 3 non-ignored runs. ``` Test Plan: Was tested on buck2 run @//mode/inplace @//mode/opt //caffe2/fb/strobelight:compiletime_profile_example This was also tested in one of the ads benchmarks ``` TORCH_COMPILE_SL_PROFILE =TRUE buck2 run mode/opt mode/inplace //pytorch/benchmark:run -- ads_mc_igctr_mc3_v0 -d cuda -t train --torchdynamo inductor ``` The results matches the results reported in https://fb.workplace.com/groups/257735836456307/permalink/657458576484029 Differential Revision: D55672271 Pull Request resolved: https://github.com/pytorch/pytorch/pull/123311 Approved by: https://github.com/aorenste	2024-04-06 18:57:44 +00:00
Zhengxu Chen	b1fa0ce4aa	[export] build the infra to rollout predispatch export. (#122326 ) Test Plan: fbcode:caffe2/test/quantization:test_quantization fbcode:bolt/nn/executorch/backends/tests:qnn_test fbcode:on_device_ai/helios/compiler_tests/... fbcode:pyspeech/tests:pyspeech_utils_test_oss fbcode:caffe2/test:quantization_pt2e_qat fbcode:on_device_ai/Assistant/Jarvis/tests:test_custom_ops fbcode:modai/test:test_modai fbcode:executorch/exir/backend/test:test_partitioner Differential Revision: D55133846 Pull Request resolved: https://github.com/pytorch/pytorch/pull/122326 Approved by: https://github.com/tugsbayasgalan	2024-03-22 00:55:10 +00:00
Yanan Cao (PyTorch)	ba9a1d96a4	Add scuba logging for TorchScript usage (#121936 ) Summary: Infra to log live usage of TorchScript internally Test Plan: manually tested Differential Revision: D54923510 Pull Request resolved: https://github.com/pytorch/pytorch/pull/121936 Approved by: https://github.com/zhxchen17	2024-03-19 17:38:27 +00:00
Oguz Ulgen	a04e7fca8e	Use memcache versioning for autotune remote cache (#121748 ) Summary: Internal training platform doesn't get updated very frequently, so let's use versioning for memcache. Test Plan: existing tests Differential Revision: D54818197 Pull Request resolved: https://github.com/pytorch/pytorch/pull/121748 Approved by: https://github.com/aakhundov, https://github.com/jansel	2024-03-14 00:36:10 +00:00
Edward Yang	02a410ee12	Enable TORCH_TRACE by default in all Tupperware like environments (#120915 ) Summary: This is a reimplemented version of the FB specific code in https://www.internalfb.com/diff/D54230697 The new strategy is that we unconditionally install an FB handler to trace_log logger (and always set level to DEBUG). When the first log message is emitted, we check the JK/filesystem to see if we should actually do logging. If we decide we don't do logging, we remove the handler from trace_log and are done. build_only[github-export-checks,executorch,pytorch_benchmark,pytorch_quantization,pytorch_distributed,pytorch_distributed_gpu,pytorch_dynamo_inductor,pytorch_functorch,pytorch_fx2trt,pytorch_diff_train_tests_ads,glow_fb_pytorch_tests,training_platform,training_platform_compatibility,training_toolkit_applications,training_toolkit_examples,training_toolkit_model_optimization,dper3_pytorch,xplat_caffe2,pytorch_dev,android-pytorch-instrumentation-tests,smartpytorchgithub_first_try_merge,frl-target-determinator,f6-buck,training_platform_for_github,sigmoid_cpu,sigmoid_gpu,aiplatform_modelprocessing_for_github,accelerators_workloads_models_slimdsnn,ae_aotinductor_benchmark_test,aps_,aps_deterministic_ne_tests,dper_lib_silvertorch,torchrec,torchrec_fb,deeplearning_aot_inductor] Test Plan: sandcastle ``` buck2 test 'fbcode//mode/dev-nosan' fbcode//torchrec/inference/tests:test_single_gpu_executor -- --exact 'torchrec/inference/tests:test_single_gpu_executor - TorchDeployGPUTest.NestedModelSingleGPU' buck2 test 'fbcode//mode/dev-nosan' fbcode//dper_lib/silvertorch/modules/dynamic_stats/tests:accumulators_test -- --exact 'dper_lib/silvertorch/modules/dynamic_stats/tests:accumulators_test - test_global_fixed_interval_accumulator (dper_lib.silvertorch.modules.dynamic_stats.tests.accumulators_test.GlobalFixedIntervalUnivalentAcculumatorTest)' ``` Also running a test flow with/without JK enabled Differential Revision: D54275086 Pull Request resolved: 
https://github.com/pytorch/pytorch/pull/120915 Approved by: https://github.com/yanboliang	2024-03-01 04:47:13 +00:00
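The strategy described in #120915 (install a handler unconditionally, decide on the first log message whether to keep it) can be illustrated with the standard `logging` module alone. This is a minimal sketch, not the PyTorch implementation: the logger name, `LazyDecisionHandler`, `install_lazy_handler`, and the `should_log` callback (standing in for the JK/filesystem check) are all hypothetical.

```python
import logging


class LazyDecisionHandler(logging.Handler):
    """Handler that decides on the first record whether logging stays enabled."""

    def __init__(self, logger, should_log, sink):
        super().__init__()
        self._logger = logger          # logger this handler is attached to
        self._should_log = should_log  # hypothetical stand-in for the JK/filesystem check
        self._sink = sink              # list collecting formatted records (for the demo)
        self._decided = False

    def emit(self, record):
        if not self._decided:
            self._decided = True
            if not self._should_log():
                # Decision is "no": detach ourselves and drop this record.
                self._logger.removeHandler(self)
                return
        self._sink.append(self.format(record))


def install_lazy_handler(name, should_log):
    """Attach the handler unconditionally; the check runs on the first message."""
    logger = logging.getLogger(name)
    logger.propagate = False           # keep records out of the root logger
    logger.setLevel(logging.DEBUG)     # always DEBUG, as in the commit message
    sink = []
    logger.addHandler(LazyDecisionHandler(logger, should_log, sink))
    return logger, sink


if __name__ == "__main__":
    log, captured = install_lazy_handler("example.trace.demo", lambda: True)
    log.debug("first message triggers the check")
    print(captured)
```

Deferring the check keeps import time cheap: processes that never emit a trace record never pay for the lookup.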
PyTorch MergeBot	f3dd2a544c	Revert "Add structured trace logs (#120289 )" This reverts commit 9dfaef962cda5f65eec53e5fd6f07b5226ea65cb. Reverted https://github.com/pytorch/pytorch/pull/120289 on behalf of https://github.com/kit1980 due to breaking internal builds, see D54230697 ([comment](https://github.com/pytorch/pytorch/pull/120289#issuecomment-1967477120))	2024-02-27 19:49:05 +00:00
Edward Z. Yang	9dfaef962c	Add structured trace logs (#120289 ) Overall design: https://docs.google.com/document/d/1CX_hJ0PNy9f3R1y8TJrfkSeLkvGjjjLU84BSXgS2AZ8/edit How to read the diff: * Most files are me augmenting pre-existing logging with structured variants. For the most part it's simple (esp FX graphs, which have a canonical string representation); it gets more complicated when I decided to JSON-ify some data structure instead of keeping the ad hoc printing (notably, guards and dynamo output graph sizes) * torch/_functorch/_aot_autograd/collect_metadata_analysis.py is some unrelated fixes I noticed while auditing artifact logs * torch/_logging/_internal.py has the actual trace log implementation. The trace logger is implemented as a logger named torch.__trace which is disconnected from the logging hierarchy. It gets its own handler and formatter (TorchLogsFormatter with _is_trace True). There's a teensy bit of FB specific code to automatically enable trace logging if a /logs directory exists. `trace_structured` is the main way to emit a trace log. Unusually, there's a separate "metadata" and "payload" field. The metadata field should not be too long (as it is serialized as a single line) and is always JSON (we put contextual things like compile id in it); the payload field can be long and is emitted after the metadata log line and can span multiple lines. * torch/_logging/structured.py contains some helpers for converting Python data structures into JSON form. Notably, we have a string interning implementation here, which helps reduce the cost of serializing filenames into the log. * test/dynamo/test_structured_trace.py the tests are cribbed from test_logging.py, but all rewritten to use expect tests on munged versions of what we'd actually output. Payloads are never tested, since they tend not to be very stable. https://github.com/ezyang/tlparse is a POC Rust program that can interpret these logs. 
Testing that the fbcode detection works at https://www.internalfb.com/mlhub/pipelines/runs/fblearner/534553450 (Meta-only) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/120289 Approved by: https://github.com/Skylion007	2024-02-27 00:04:23 +00:00
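The metadata/payload split and the string interning described in #120289 can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in torch/_logging: `format_trace_record` and `StringInterner` are hypothetical names, and tab-indenting payload lines is an assumed convention chosen here so that a parser can tell payload from the next metadata line.

```python
import json


def format_trace_record(name, metadata, payload=None):
    """Render one record: a single-line JSON metadata header, then the payload."""
    # json.dumps without `indent` escapes newlines, so the header is one line.
    header = json.dumps({name: metadata}, sort_keys=True)
    if payload is None:
        return header
    # Assumption: prefix each payload line with a tab to mark it as payload.
    body = "\n".join("\t" + line for line in payload.split("\n"))
    return header + "\n" + body


class StringInterner:
    """Map repeated strings (e.g. filenames) to small integer ids."""

    def __init__(self):
        self._ids = {}

    def intern(self, value):
        """Return (id, is_new); only new strings need to be written out in full."""
        if value in self._ids:
            return self._ids[value], False
        new_id = len(self._ids)
        self._ids[value] = new_id
        return new_id, True


if __name__ == "__main__":
    interner = StringInterner()
    file_id, _ = interner.intern("model.py")
    print(format_trace_record(
        "example_graph",
        {"compile_id": "0/0", "file": file_id},
        "line one\nline two",
    ))
```

With interning, a long filename is serialized once and referenced by id afterwards, which is the cost reduction the commit message mentions.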
Menglu Yu	7b1f5c874f	[PT2][Optimus][Observability] Log the optimus graph transformation to the scuba (#119745 ) Summary: The current everstore upload logging may cause excessive compilation time when the model has lots of graph breaks (post: https://fb.workplace.com/groups/257735836456307/permalink/633533465543207/), so here we log the transformation only when the graph changed. Test Plan: timeout flows: f528209775 f530084719 Differential Revision: D53692344 Pull Request resolved: https://github.com/pytorch/pytorch/pull/119745 Approved by: https://github.com/jackiexu1992	2024-02-16 21:32:04 +00:00
Bert Maher	563f1b9fef	[inductor] Use torch.cuda.clock_rate instead of triton.testing.nvsmi (#118662 ) `triton.testing.nvsmi` invokes `nvidia-smi` as a subprocess, and Meta prod usually doesn't make nvidia-smi available. Might as well just use something that's native to torch. Differential Revision: [D53235814](https://our.internmc.facebook.com/intern/diff/D53235814/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/118662 Approved by: https://github.com/jansel	2024-02-14 03:23:49 +00:00
Zhengxu Chen	8069b29603	[export] Implement logging for scuba. (#119585 ) Summary: As we're growing the user surface of torch.export, we'd like to understand better how people are using our APIs. It's also possible to analyze the usages based on static analysis, but due to the fact that there could be many creative ways to call things in Python, I think just building some logging infra will benefit us in the short term and gain us some insights. Test Plan: buck test caffe2/test:test_export {F1454519846} Reviewed By: tugsbayasgalan Differential Revision: D53618220 Pull Request resolved: https://github.com/pytorch/pytorch/pull/119585 Approved by: https://github.com/avikchaudhuri	2024-02-12 17:28:14 +00:00
Will Constable	da0635d17c	Add pytorch-distributed justknobs helper (#118568 ) Summary: Sets up a helper that checks any JKs relevant to pytorch distributed, and propagates their values to ENV. Test Plan: Added unit test Differential Revision: D53192406 Pull Request resolved: https://github.com/pytorch/pytorch/pull/118568 Approved by: https://github.com/zdevito	2024-01-30 08:13:52 +00:00
PyTorch MergeBot	bb55970e5b	Revert "Add justknobs env helper for pytorch distributed (#118451 )" This reverts commit 4d1bb2175a49e9b4440085a3dc2e2b211e5cf99e. Reverted https://github.com/pytorch/pytorch/pull/118451 on behalf of https://github.com/wconstab due to Broke internal tests ([comment](https://github.com/pytorch/pytorch/pull/118451#issuecomment-1915369013))	2024-01-29 19:01:05 +00:00
Will Constable	4d1bb2175a	Add justknobs env helper for pytorch distributed (#118451 ) Summary: Adds a JK killswitch check and configures the env for enabling pytorch nccl flight recorder. Note: this only enables recording events in memory, not dumping them. Test Plan: CI test Reviewed By: zdevito Differential Revision: D52920092 Pull Request resolved: https://github.com/pytorch/pytorch/pull/118451 Approved by: https://github.com/malfet	2024-01-29 08:57:16 +00:00


130 Commits