## Overview
This PR allows profiler users to access `Kineto` and `TorchOp` event metadata as a JSON string through a new `metadata_json` attribute on `FunctionEvent` objects, enabled by a new `expose_kineto_event_metadata` flag in `ExperimentalConfig`.
## Testing
A unit test was added to validate functionality.
## Documentation
Added/updated function doc strings where appropriate.
## Example output
```python
import torch
from torch.profiler import profile
with profile(experimental_config=torch._C._profiler._ExperimentalConfig(expose_kineto_event_metadata=True)) as prof:
    res = torch.mm(torch.rand(1024, 1024), torch.rand(1024, 1024))

for event in prof.events():
    print(f'name: {event.key}, metadata: {event.metadata_json}')
```
```
name: aten::rand, metadata: "Ev Idx": 0
name: aten::empty, metadata: "Ev Idx": 1
name: aten::uniform_, metadata: "Ev Idx": 2
name: aten::rand, metadata: "Ev Idx": 3
name: aten::empty, metadata: "Ev Idx": 4
name: aten::uniform_, metadata: "Ev Idx": 5
name: aten::mm, metadata: "Ev Idx": 6
name: aten::resolve_conj, metadata: "Ev Idx": 7
name: aten::resolve_conj, metadata: "Ev Idx": 8
name: aten::resolve_conj, metadata: "Ev Idx": 9
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161624
Approved by: https://github.com/sraikund16
Summary:
Add parsing for lists of strings. This is needed by AOTInductor profiling to record input information for Triton kernels.
Test Plan:
Included in the commit: test_profiler_op_event_kwargs_list_of_strings
Pull Request resolved: https://github.com/pytorch/pytorch/pull/163593
Approved by: https://github.com/sraikund16
Summary:
Profiler side of memory snapshot.
1. Add an API to actually take the snapshot when the client interface is called.
2. Add ifdefs to the builds so that Kineto hooks the snapshot correctly.
Design philosophy: there is one interesting part of this implementation, and it is the export step. For export we call the Python implementation rather than the C++ one, even though we are already in C++, because it is better to have a single export path than two. Personally, I want parity between auto-trace and on-demand, so if we can limit the side paths we will have an easier time maintaining this relationship.
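For reference, the memory-snapshot export on the Python side is driven by the public allocator-history API; a minimal usage sketch, assuming a CUDA build (this is the existing public path, not the new client-interface hook):
```python
import torch

# Record allocator events, run some work, then export a snapshot.
# Per the design note above, export goes through the Python
# implementation so auto-trace and on-demand share a single path.
torch.cuda.memory._record_memory_history()
x = torch.randn(1024, 1024, device="cuda")
y = x @ x
torch.cuda.memory._dump_snapshot("snapshot.pickle")
```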
Test Plan: {F1976563426}
Reviewed By: sanrise
Differential Revision: D70733247
Pull Request resolved: https://github.com/pytorch/pytorch/pull/150559
Approved by: https://github.com/sanrise
Currently, recorded profiler events for aten ops do not store overload names. Knowing which overloads are actually called would be useful for analysing performance.
For example, consider the following dispatch trace, which occurs if there is a fallthrough kernel registered for aten::add:
```
[call] op=[aten::add.Tensor], key=[AutogradCPU]
[redispatch] op=[aten::add.Tensor], key=[Undefined]
[call] op=[aten::empty.memory_format], key=[BackendSelect]
[redispatch] op=[aten::empty.memory_format], key=[CPU]
[call] op=[aten::add.out], key=[CPU]
```
In this case, aten::add.out is a child of aten::add.Tensor; however, the current profiler trace provides no way to distinguish between overloads of an aten op call.
See the added unit test for a more detailed example.
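As an illustration, with overload names recorded, a simple profile can tell overloads apart directly (a minimal sketch; the assumption here is that the overload name is surfaced through the event name):
```python
import torch
from torch.profiler import profile

with profile() as prof:
    torch.add(torch.ones(2), torch.ones(2))

for evt in prof.events():
    # With overload names stored, an event can appear as
    # "aten::add.Tensor" rather than an ambiguous "aten::add".
    print(evt.key)
```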
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143114
Approved by: https://github.com/sraikund16
In some situations we want to profile calls coming from all threads (similar to on-demand), not just the thread that started profiling and the spawned threads that inherit its KinetoThreadLocal state.
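A minimal usage sketch, assuming the behavior is exposed via an experimental-config flag (the `profile_all_threads` name here is an assumption based on this description):
```python
import torch
from torch.profiler import profile

# Assumed flag name; the real knob may differ.
cfg = torch._C._profiler._ExperimentalConfig(profile_all_threads=True)
with profile(experimental_config=cfg) as prof:
    pass  # ops running on unrelated threads are captured as well
```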
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143659
Approved by: https://github.com/sraikund16
Summary: The original diff was reverted because its base commit was on a broken version of pytorch that was failing rocm tests; there is no indication that this diff had any effect on rocm. I had trouble rebasing the GH PR after the revert and accidentally closed it, so I am submitting it again.
Test Plan: See the original PR with the same name.
Differential Revision: D67293040
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143314
Approved by: https://github.com/leitian, https://github.com/aaronenyeshi
Summary:
We were ignoring the with_escaped_quotes parameter in the format_list inline function in utils.cpp when we had to truncate a list of more than kTruncatelength items.
In that case we would truncate the list into a string but always wrap it in escaped quotes. This causes issues if the string is then added to another list that also goes through formatting, producing output like `"["[a, b, c, ...]"]"`.
Now the above is correctly formatted as `"[[a, b, c, ...]]"`, since the escape-quote request is honored.
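A minimal Python sketch of the intended behavior (hypothetical names mirroring the C++ helper):
```python
K_TRUNCATE_LENGTH = 3  # stand-in for kTruncatelength

def format_list(items, with_escaped_quotes):
    shown = items[:K_TRUNCATE_LENGTH]
    suffix = ", ..." if len(items) > K_TRUNCATE_LENGTH else ""
    body = "[" + ", ".join(shown) + suffix + "]"
    # The fix: honor the caller's request instead of always escaping
    # the truncated form.
    return f'"{body}"' if with_escaped_quotes else body

inner = format_list(["a", "b", "c", "d"], with_escaped_quotes=False)
print(format_list([inner], with_escaped_quotes=True))  # "[[a, b, c, ...]]"
```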
Differential Revision: D66521676
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141626
Approved by: https://github.com/sraikund16
Summary: Add a log that warns users that disabling only CUDA events can cause incorrect correlation IDs.
Test Plan: Verified the log is printed in the correct scenario.
Differential Revision: D65762576
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140285
Approved by: https://github.com/sanrise
Summary:
This PR adds the Auto-Trace implementation for Trace ID. By default, the Python side generates a UUID in the same format as the one set in the backend by Kineto. When an auto-trace runs, the Python-generated trace ID overwrites the one set in Kineto via the Config variable. Since we don't expect users to generate on-demand traces after an auto-trace, we can simply keep overwriting the backend trace ID whenever auto-trace runs. If we ever want to support that, we only have to add a call in Kineto on the backend to generate a new ID when profiling starts.
We also implement a custom callback on the frontend so that users can generate their own trace IDs if they wish; a sketch follows. This works similarly to the default, the only difference being that the callback must be set manually after a profiler is created. We use a dedicated setter rather than the frontend initializer in case users want to change the trace_id across repeats.
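A minimal sketch of the custom-callback flow (the setter name is an assumption based on the description above):
```python
import uuid
from torch.profiler import profile

def my_trace_id() -> str:
    # Return any string; a fresh UUID per call gives each repeat its
    # own trace ID.
    return uuid.uuid4().hex

prof = profile()
# Assumed setter: called after construction, per the description above,
# so the callback can be changed between repeats.
prof.set_custom_trace_id_callback(my_trace_id)
```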
Test Plan: Tested both the default and custom callbacks using the verbose prints added. The trace IDs on the frontend matched the backend prints for the manifold upload.
Differential Revision: D65178308
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139310
Approved by: https://github.com/shengfukevin
Summary:
In S445839, HTA broke because of the "stream" parameter that was added to GPU traces. This brought up discussions about hardening our post-processing of these inputs so as not to break the JSON schema or downstream tools. For this reason, this diff does the following (a sketch of the rules follows the list):
1. Only allow int, double, bool and string values to be processed as kwinputs for JSON output. We can handle lists if needed in the future.
2. Make sure that any boolean is lowercased when stringified so that parsing the JSON does not break.
3. Force the stream parameter to be an int.
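A Python sketch of these rules (the helper is hypothetical; the real implementation lives in the C++ post-processing):
```python
def sanitize_kwinput(key, value):
    if key == "stream":
        return int(value)          # rule 3: stream is always an int
    if isinstance(value, bool):
        return str(value).lower()  # rule 2: "true"/"false" for JSON
    if isinstance(value, (int, float, str)):
        return value               # rule 1: allowed scalar types
    return None                    # everything else is dropped

print(sanitize_kwinput("stream", 7.0))  # 7
print(sanitize_kwinput("flag", True))   # true
```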
Test Plan: Added unit tests to ensure that the requirements listed above hold for kwargs.
Differential Revision: D62304843
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135365
Approved by: https://github.com/aaronenyeshi
Summary: When we process keyword arguments in the profiler today, we assume that all values are strings. This breaks HTA, which assumes that "stream" and similar values are ints. To fix this we only put quotes around string ivalues.
Test Plan: Added chrome trace export to unit tests and checked that stream does not have quotes around it.
Differential Revision: D62056059
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134893
Approved by: https://github.com/sanrise, https://github.com/izaitsevfb
Summary: Add tests that check function events for dynamic activity toggling, for both GPU and CPU events. Also added code comments requested in earlier GH review comments.
Test Plan: Make sure all tests pass
Differential Revision: D61617514
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134149
Approved by: https://github.com/aaronenyeshi
Summary: On the autograd side we are currently saving the kwinputs, but we aren't doing anything with them on the profiler side. This diff enables the use of kwinputs for both FunctionEvents and Chrome traces.
Test Plan: Added unit tests for both Chrome traces and FunctionEvents. Used RecordFunctionFast to test kwinputs, since that test already passed kwargs in but did not check them.
Differential Revision: D59472345
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130373
Approved by: https://github.com/davidberard98
Summary:
In a C++ program, if we have child threads doing GPU work, it would be nice to get traces of those threads as well. The problem is that pushProfilingCallbacks() is not called on child threads, so no observer traces are collected on them and they are entirely missing from the final output.
This diff provides a new API that a child thread may elect to call to register itself with the profiler that was started on the main thread (or whichever Python thread manages the profiler).
Test Plan:
```
buck2 test @mode/opt //caffe2/test:profiler_test_cpp_thread
```
Reviewed By: aaronenyeshi
Differential Revision: D56669942
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128743
Approved by: https://github.com/aaronenyeshi
Summary: There are clang errors in profiler_kineto. It would probably be a good idea to fix them, as the file is already quite dense.
Test Plan: Make sure all tests under static_tests/lint_root on Phabricator pass.
Differential Revision: D58431005
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128464
Approved by: https://github.com/aaronenyeshi
Summary:
1. Package Kineto's public headers if USE_KINETO is set so that they can be used by PrivateUse1 users.
2. Add a PrivateUse1 key to ActivityType.
3. Support the PrivateUse1 key in the deviceTypeFromActivity function and in _supported_activities (see the sketch below).
4. Fix some bugs in the processing of profiler results.
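For item 3, a short usage sketch: with a PrivateUse1 backend registered, its activity type should be reported alongside the built-in ones (an assumption based on the change described above):
```python
import torch
from torch.profiler import supported_activities

# CPU is always reported; CUDA appears if available. After this change,
# a registered PrivateUse1 backend's activity shows up here as well.
print(supported_activities())
```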
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124818
Approved by: https://github.com/aaronenyeshi
Summary:
1. Package Kineto's public headers if USE_KINETO is set so that they can be used by PrivateUse1 users.
2. Add a PrivateUse1 key to ActivityType.
3. Support the PrivateUse1 key in the deviceTypeFromActivity function and in _supported_activities.
4. Fix some bugs in the processing of profiler results.
Co-authored-by: albanD <desmaison.alban@gmail.com>
Co-authored-by: Aaron Shi <enye.shi@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120556
Approved by: https://github.com/aaronenyeshi
Summary:
Kineto traces use microsecond granularity because Chrome tracing defaults to that precision. Fix this by adding a preprocessor flag to the TARGETS and BUCK files, and remove the now-unnecessary ns-to-us conversions in the profiler itself.
This diff contains the profiler changes only; the libkineto changes are in D54964435.
Test Plan:
Checked the JSON and Chrome tracing to make sure values are as expected. Traces with the flag enabled should have ns precision; traces without it should match master.
Zoomer: https://www.internalfb.com/intern/zoomer/?profiling_run_fbid=796886748550189
Ran key_averages() to make sure the FunctionEvent code works as expected:
```
-----------------------------------------  ----------  ---------  -----------  ---------  ------------  ---------  -----------  ----------  -------------  ----------
Name                                       Self CPU %  Self CPU   CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
ProfilerStep*                              0.74%       3.976ms    64.40%       346.613ms  69.323ms      0.000us    0.00%        61.710ms    12.342ms       5
Optimizer.zero_grad#SGD.zero_grad          0.76%       4.109ms    0.76%        4.109ms    821.743us     0.000us    0.00%        0.000us     0.000us        5
## forward ##                              6.89%       37.057ms   27.19%       146.320ms  29.264ms      0.000us    0.00%        58.708ms    11.742ms       5
aten::conv2d                               0.22%       1.176ms    7.74%        41.658ms   157.199us     0.000us    0.00%        27.550ms    103.962us      265
aten::convolution                          0.79%       4.273ms    7.52%        40.482ms   152.762us     0.000us    0.00%        27.550ms    103.962us      265
aten::_convolution                         0.69%       3.688ms    6.73%        36.209ms   136.637us     0.000us    0.00%        27.550ms    103.962us      265
aten::cudnn_convolution                    6.04%       32.520ms   6.04%        32.520ms   122.719us     27.550ms   8.44%        27.550ms    103.962us      265
aten::add_                                 2.42%       13.045ms   2.42%        13.045ms   30.694us      12.700ms   3.89%        12.700ms    29.882us       425
aten::batch_norm                           0.19%       1.027ms    8.12%        43.717ms   164.971us     0.000us    0.00%        16.744ms    63.185us       265
aten::_batch_norm_impl_index               0.31%       1.646ms    7.93%        42.691ms   161.096us     0.000us    0.00%        16.744ms    63.185us       265
-----------------------------------------  ----------  ---------  -----------  ---------  ------------  ---------  -----------  ----------  -------------  ----------
```
Differential Revision: D55925068
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123650
Approved by: https://github.com/aaronenyeshi
Summary:
Kineto traces use microsecond granularity because Chrome tracing defaults to that precision. Fix this by adding a preprocessor flag to the TARGETS and BUCK files, and remove the now-unnecessary ns-to-us conversions in the profiler itself.
This diff contains the profiler changes only; the libkineto changes are in D54964435.
Test Plan:
Checked the JSON and Chrome tracing to make sure values are as expected. Traces with the flag enabled should have ns precision; traces without it should match master.
Tracing with flags enabled: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/devvm2185.cco0.facebook.com/rank-0.Mar_18_14_37_22.4155151.pt.trace.json.gz&bucket=gpu_traces
Tracing without flags enabled: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/devvm2185.cco0.facebook.com/rank-0.Mar_18_14_39_15.4166047.pt.trace.json.gz&bucket=gpu_traces
Tracing on main: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/devvm2185.cco0.facebook.com/rank-0.Mar_18_14_42_43.4177559.pt.trace.json.gz&bucket=gpu_traces
Ran key_averages() to make sure the FunctionEvent code works as expected:
```
-----------------------------------------  ----------  ---------  -----------  ---------  ------------  ---------  -----------  ----------  -------------  ----------
Name                                       Self CPU %  Self CPU   CPU total %  CPU total  CPU time avg  Self CUDA  Self CUDA %  CUDA total  CUDA time avg  # of Calls
ProfilerStep*                              0.74%       3.976ms    64.40%       346.613ms  69.323ms      0.000us    0.00%        61.710ms    12.342ms       5
Optimizer.zero_grad#SGD.zero_grad          0.76%       4.109ms    0.76%        4.109ms    821.743us     0.000us    0.00%        0.000us     0.000us        5
## forward ##                              6.89%       37.057ms   27.19%       146.320ms  29.264ms      0.000us    0.00%        58.708ms    11.742ms       5
aten::conv2d                               0.22%       1.176ms    7.74%        41.658ms   157.199us     0.000us    0.00%        27.550ms    103.962us      265
aten::convolution                          0.79%       4.273ms    7.52%        40.482ms   152.762us     0.000us    0.00%        27.550ms    103.962us      265
aten::_convolution                         0.69%       3.688ms    6.73%        36.209ms   136.637us     0.000us    0.00%        27.550ms    103.962us      265
aten::cudnn_convolution                    6.04%       32.520ms   6.04%        32.520ms   122.719us     27.550ms   8.44%        27.550ms    103.962us      265
aten::add_                                 2.42%       13.045ms   2.42%        13.045ms   30.694us      12.700ms   3.89%        12.700ms    29.882us       425
aten::batch_norm                           0.19%       1.027ms    8.12%        43.717ms   164.971us     0.000us    0.00%        16.744ms    63.185us       265
aten::_batch_norm_impl_index               0.31%       1.646ms    7.93%        42.691ms   161.096us     0.000us    0.00%        16.744ms    63.185us       265
-----------------------------------------  ----------  ---------  -----------  ---------  ------------  ---------  -----------  ----------  -------------  ----------
```
Differential Revision: D55087993
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122425
Approved by: https://github.com/aaronenyeshi
Summary:
Move the profiler's Approximate Clock from libtorch to libc10. The main reason is to allow c10 features to get the time.
The clock uses TSC when available, for performance. In a subsequent diff, the CUDA Caching Allocator's memory-snapshot implementation will timestamp memory events with this same clock.
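Conceptually, an approximate clock calibrates a cheap, fast counter (such as TSC) against wall time once, then converts counter readings into timestamps; a minimal Python sketch of the idea (not the c10 implementation):
```python
import time

# Calibrate once: pair a wall-clock reading with a fast-counter reading.
t0_wall = time.time_ns()
t0_fast = time.perf_counter_ns()

def approx_now_ns() -> int:
    # Convert the cheap counter's delta into an approximate wall-clock
    # timestamp; assumes the counter rate is stable, as TSC is on
    # modern CPUs.
    return t0_wall + (time.perf_counter_ns() - t0_fast)
```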
Test Plan: CI
Differential Revision: D50601935
Pulled By: aaronenyeshi
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111972
Approved by: https://github.com/davidberard98