pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-21 13:44:15 +08:00

Author	SHA1	Message	Date
Yuanyuan Chen	36871622f1	[2/N] Mark unused parameters in C++ code (#165121 ) This is follow-up of #164912 to mark unused C++ parameters to improve code readability. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165121 Approved by: https://github.com/Skylion007	2025-10-15 03:04:39 +00:00
PyTorch MergeBot	c6329524d8	Revert "Add magic TORCH_MAKE_PYBIND_ENUM_FASTER macro (#163527 )" This reverts commit 50c0550f5a5b1e35885d892081a7d5115d8b4489. Reverted https://github.com/pytorch/pytorch/pull/163527 on behalf of https://github.com/swolchok due to breaking import torch in debug builds, see #164297 ([comment](https://github.com/pytorch/pytorch/pull/163527#issuecomment-3361919142))	2025-10-02 15:42:42 +00:00
Scott Wolchok	50c0550f5a	Add magic TORCH_MAKE_PYBIND_ENUM_FASTER macro (#163527 ) See comment on the macro definition. In short, pybind11 3.x added `py::native_enum`, and also had to add overhead for that new way to bind enums on the critical path for calling functions that take regular old `py::enum_`s as arguments (for example, `__eq__`). Differential Revision: [D82873169](https://our.internmc.facebook.com/intern/diff/D82873169/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/163527 Approved by: https://github.com/ezyang	2025-09-26 17:59:22 +00:00
Karhou Tam	39df24fe04	[Code Clean] Replace `std::runtime_error` with `TORCH_CHECK` (#163610 ) Including: - `torch/csrc/instruction_counter` - `torch/csrc/lazy` - `torch/csrc/monitor` - `torch/csrc/profiler` - `torch/csrc/dynamo` Fixes part of #148114 Personal mistake about (PR #163317), this PR does the same thing and PR #163317 has already been approved by @albanD. This is a personal mistake on my part, and I'm so sorry about that. Hope you won't mind @albanD. 🥹 Pull Request resolved: https://github.com/pytorch/pytorch/pull/163610 Approved by: https://github.com/albanD, https://github.com/Skylion007	2025-09-26 04:52:48 +00:00
Mihai Polceanu	6fa3715c12	Expose Kineto event metadata in PyTorch Profiler events (#161624 ) ## Overview This PR allows the profiler users to access `Kineto` and `TorchOp` metadata in JSON string format through a new `metadata_json` attribute in `FunctionEvent` objects, which is triggered through a new `expose_kineto_event_metadata` flag in `ExperimentalConfig`. ## Testing A unit test was added to validate functionality. ## Documentation Added/updated function doc strings where appropriate. ## Example output ```python import torch from torch.profiler import profile with profile(experimental_config=torch._C._profiler._ExperimentalConfig(expose_kineto_event_metadata=True)) as prof: res = torch.mm(torch.rand(1024, 1024), torch.rand(1024, 1024)) for event in prof.events(): print(f'name: {event.key}, metadata: {event.metadata_json}') ``` ``` name: aten::rand, metadata: "Ev Idx": 0 name: aten::empty, metadata: "Ev Idx": 1 name: aten::uniform_, metadata: "Ev Idx": 2 name: aten::rand, metadata: "Ev Idx": 3 name: aten::empty, metadata: "Ev Idx": 4 name: aten::uniform_, metadata: "Ev Idx": 5 name: aten::mm, metadata: "Ev Idx": 6 name: aten::resolve_conj, metadata: "Ev Idx": 7 name: aten::resolve_conj, metadata: "Ev Idx": 8 name: aten::resolve_conj, metadata: "Ev Idx": 9 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/161624 Approved by: https://github.com/sraikund16	2025-09-25 14:58:30 +00:00
Mu-Chu Lee	6b5ad5f211	[Kineto] Add list of string parsing for profiler (#163593 ) Summary: We add the parsing for list of string. This is needed for AOTInductor profiling for input information of Triton kernels. Test Plan: Included in commit. test_profiler_op_event_kwargs_list_of_strings Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/163593 Approved by: https://github.com/sraikund16	2025-09-23 22:45:49 +00:00
Shivam Raikundalia	dae5beae8e	[RecordFunction] Add Scope for Record Function Fast (#162661 ) Differential Revision: D82164587 Pull Request resolved: https://github.com/pytorch/pytorch/pull/162661 Approved by: https://github.com/davidberard98	2025-09-15 21:01:47 +00:00
Shivam Raikundalia	3373b074f5	[Profiler] Add GC Events to Python Stack Tracer (#161209 ) Summary: Adds Python Garbage Collection to Kineto Traces and Profiler FunctionEvents. Create custom cpp callback in profiler_python.cpp. Then define a python function with cpp and register that callback for all python garbage collection. We don't worry about thread safety in this case because we are only doing init/teardown for main thread while holding GIL. Currently we are hiding this behind experimental config because python tracing tends to be unstable especially when adding any new feature. If this is found to not add too much overhead we can set this to on by default. NOTE: To enable this you need both with_stack=True and the experimental config on! Test Plan: Ran trace with GC induced and saw it on trace Also added a test Rollback Plan: Differential Revision: D80491146 Pull Request resolved: https://github.com/pytorch/pytorch/pull/161209 Approved by: https://github.com/ngimel	2025-08-22 22:11:25 +00:00
Denghui Dong	e92e3eaf4e	[Profiler] the doc of _ExperimentalConfig is incorrectly truncated by commas (#156586 ) Hi team, Please help review this trivial fix. Without this change: ``` python >>> import torch >>> print(torch._C._profiler._ExperimentalConfig.__init__.__doc__) __init__(self: torch._C._profiler._ExperimentalConfig, profiler_metrics: list[str] = [], profiler_measure_per_kernel: bool = False, verbose: bool = False, performance_events: list[str] = [], enable_cuda_sync_events: bool = False, adjust_profiler_step: bool = False, disable_external_correlation: bool = False, profile_all_threads: bool = False, capture_overload_names: bool = False) -> None capture_overload_names (bool) : whether to include ATen overload names in the profile ``` With this change: ```python >>> import torch >>> print(torch._C._profiler._ExperimentalConfig.__init__.__doc__) __init__(self: torch._C._profiler._ExperimentalConfig, profiler_metrics: list[str] = [], profiler_measure_per_kernel: bool = False, verbose: bool = False, performance_events: list[str] = [], enable_cuda_sync_events: bool = False, adjust_profiler_step: bool = False, disable_external_correlation: bool = False, profile_all_threads: bool = False, capture_overload_names: bool = False) -> None An experimental config for Kineto features. Please note thatbackward compatibility is not guaranteed. profiler_metrics : a list of CUPTI profiler metrics used to measure GPU performance events. If this list contains values Kineto runs in CUPTI profiler mode profiler_measure_per_kernel (bool) : whether to profile metrics per kernel or for the entire measurement duration. verbose (bool) : whether the trace file has `Call stack` field or not. performance_events : a list of profiler events to be used for measurement. enable_cuda_sync_events : for CUDA profiling mode, enable adding CUDA synchronization events that expose CUDA device, stream and event synchronization activities. This feature is new and currently disabled by default. 
adjust_profiler_step (bool) : whether to adjust the profiler step to match the parent python event duration. This feature is new and currently disabled by default. disable_external_correlation (bool) : whether to disable external correlation profile_all_threads (bool) : whether to profile all threads capture_overload_names (bool) : whether to include ATen overload names in the profile ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/156586 Approved by: https://github.com/sraikund16, https://github.com/cyyever	2025-07-16 04:10:49 +00:00
fuwenguang	f860992db5	Add a custom profiler configuration option (#151656 ) We aim to pass some configuration options to our custom Kineto backend via ExperimentalConfig, so we added a `custom_profiler_config` parameter. Requires https://github.com/pytorch/kineto/pull/1077. Pull Request resolved: https://github.com/pytorch/pytorch/pull/151656 Approved by: https://github.com/sraikund16	2025-07-01 00:36:09 +00:00
Xuehai Pan	ced90016c1	[BE][7/16] fix typos in torch/ (torch/csrc/) (#156317 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/156317 Approved by: https://github.com/albanD ghstack dependencies: #156313, #156314, #156315, #156316	2025-06-23 02:57:41 +00:00
PyTorch MergeBot	035a68d25a	Revert "[BE][7/16] fix typos in torch/ (torch/csrc/) (#156317 )" This reverts commit ee72815f1180fe2d8bcdb23493999256169ac2fa. Reverted https://github.com/pytorch/pytorch/pull/156317 on behalf of https://github.com/atalman due to export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_input_aliasing_contents_backend_aot_eager [GH job link](https://github.com/pytorch/pytorch/actions/runs/15804799771/job/44548489912) [HUD commit link](`c95f7fa874`) ([comment](https://github.com/pytorch/pytorch/pull/156313#issuecomment-2994171213))	2025-06-22 12:31:56 +00:00
Xuehai Pan	ee72815f11	[BE][7/16] fix typos in torch/ (torch/csrc/) (#156317 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/156317 Approved by: https://github.com/albanD ghstack dependencies: #156313, #156314, #156315, #156316	2025-06-22 08:43:41 +00:00
Nikita Shulga	c4d1ff02f8	[Lint] Update clang-format to 19.1.4 (#153889 ) All changes other than the one to `tools/linter/adapters/s3_init_config.json` are generated by newer clang-format Pull Request resolved: https://github.com/pytorch/pytorch/pull/153889 Approved by: https://github.com/cyyever, https://github.com/atalman	2025-05-20 14:12:46 +00:00
Zizeng Meng	a762dd1f67	[Memento] On-demand mode using without torch api (#153171 ) Summary: CUDA Post: https://fb.workplace.com/groups/ai.efficiency.tools.users/permalink/2020094788475989/ # Context In this diff, we want to enable the on-demand mode of memory snapshot to allow users to trace any remote process via the dyno command line. # Design decision How do we send the on-demand signal to a remote process? We leverage the dyno-Kineto approach. Since dyno is running on all machines in Meta, it can send a request to the remote machine to start Kineto. Kineto will start another thread for memoryProfiler (https://fburl.com/code/dxsmmrok). Why do we use a different approach from CUDA? On the CUDA side, we are using pybind to load the torch module and invoke the Python API to start/stop the profiling. However, this requires us to compile the whole torch binary in the predictor, which is not recommended by runtime (andruwang). Thus, we decided to use the C++ API directly to avoid unnecessary dependencies. Why is the snapshot saved as a JSON string directly instead of pickle? Pickle is primarily designed for use with Python and is not well supported in C++. Also, it is hard for users to download the snapshot file and open it locally. Due to the dependency issue, it is hard to import the gzip/pickle library to decode the data. Thus, let's use JSON for now. I will work on the visualizer to speed up rendering and support other formats later. Plan: * Now, we will encode the file into gz for MTIA on-demand only and update the visualizer to support both types. * Update auto-trace and CUDA side to encode in gzip as well * Fully remove pickle dependency. 
Test Plan: # Remote cogwheel test Servicelab: https://fburl.com/servicelab/pckux7a3 snapshot file manifold: https://fburl.com/manifold/fnotk18c snapshot file in pastry: P1805522232 Visualization on D74399684 {F1977786422} # Local Predictor Test url: https://fburl.com/pytorch_memory_visualizer/y06kskkm {F1977787329} Differential Revision: D74179606 Pull Request resolved: https://github.com/pytorch/pytorch/pull/153171 Approved by: https://github.com/sraikund16	2025-05-15 06:07:04 +00:00
cyy	8fa81a6066	Enable misc-use-internal-linkage check and apply fixes (#148948 ) Enables clang-tidy rule [`misc-use-internal-linkage`](https://clang.llvm.org/extra/clang-tidy/checks/misc/use-internal-linkage.html). This new check was introduced in Clang-Tidy 18 and is available due to the recent update to Clang-Tidy 19. The check marks functions and variables used only in the translation unit as static. Therefore undesired symbols are not leaked into other units, more link time optimisations are possible and the resulting binaries may be smaller. The detected violations were mostly fixed by using static. In other cases, the symbols were indeed consumed by other files, so their declaring headers were included. Still, some declarations were wrong and have been fixed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/148948 Approved by: https://github.com/Skylion007	2025-03-12 14:22:56 +00:00
wdziurdz	edc3ca577e	[Profiler] Add profiler activity for HPU devices (#148182 ) Fixes #148181 Pull Request resolved: https://github.com/pytorch/pytorch/pull/148182 Approved by: https://github.com/sraikund16	2025-03-05 01:37:48 +00:00
Mwiza Kunda	b5873292c6	Add overload names to profiler trace (#143114 ) Currently, recorded profiler events for aten ops do not store overload names. It would be useful to know which overloads are actually called to analyse performance. For example, consider the following dispatch trace which occurs if there is a fallthrough kernel registered for aten::add: ``` [call] op=[aten::add.Tensor], key=[AutogradCPU] [redispatch] op=[aten::add.Tensor], key=[Undefined] [call] op=[aten::empty.memory_format], key=[BackendSelect] [redispatch] op=[aten::empty.memory_format], key=[CPU] [call] op=[aten::add.out], key=[CPU] ``` In this case, aten::add.out is a child of aten::add.Tensor, however the current profiler trace provides no way to differentiate aten op calls. See the added unit test for a more detailed example. Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/143114 Approved by: https://github.com/sraikund16	2025-03-05 01:00:29 +00:00
dilililiwhy	7c52c97a65	Expose several APIs to public (torch python APIs) (#144525 ) Fixes #144302 Try to expose several APIs to public for privateuse1 scenario. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144525 Approved by: https://github.com/cyyever, https://github.com/albanD	2025-01-15 14:34:45 +00:00
cyy	b0be30dd79	[19/N] Fix extra warnings brought by clang-tidy-17 (#144448 ) Apply more clang-tidy fixes. There was a bug introduced by #144014 due to incorrect namespace concatenation which is reverted here. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144448 Approved by: https://github.com/albanD	2025-01-09 15:58:05 +00:00
Natalia Gimelshein	2ab698e708	allow profiling on all threads via experimentalConfig (#143659 ) In some situations we want to profile calls coming from all threads (similar to on-demand), not just the thread that started profiling and the spawned threads that would inherit KinetoThreadLocal state. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143659 Approved by: https://github.com/sraikund16	2024-12-23 20:41:27 +00:00
Shivam Raikundalia	ff373171d0	[Profiler] Add Optional Flag to turn off external correlations v2 (#143314 ) Summary: The original diff got reverted because its base commit was on a broken version of pytorch that was failing rocm tests. There is no indication that this diff had any effect on rocm. Had trouble rebasing the GH PR after the revert and accidentally closed the PR, so submitting again. Test Plan: See original PR with same name Differential Revision: D67293040 Pull Request resolved: https://github.com/pytorch/pytorch/pull/143314 Approved by: https://github.com/leitian, https://github.com/aaronenyeshi	2024-12-16 23:49:13 +00:00
PyTorch MergeBot	9ed045eae9	Revert "[Profiler] Add Optional Flag to turn off external correlations (#142516 )" This reverts commit b29fc52f827cc4b4336ecd24cc0a019ec9cf24b6. Reverted https://github.com/pytorch/pytorch/pull/142516 on behalf of https://github.com/huydhn due to Sorry for reverting your change but the test is failing on ROCm ([comment](https://github.com/pytorch/pytorch/pull/142516#issuecomment-2543431758))	2024-12-15 03:34:37 +00:00
Shivam Raikundalia	b29fc52f82	[Profiler] Add Optional Flag to turn off external correlations (#142516 ) Summary: External Correlations are super spammy and oftentimes not even useful. Add flag during init to remove them entirely Test Plan: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/devvm2185.cco0.facebook.com/rank-0.Dec_10_12_33_31.531106.pt.trace.json.gz&bucket=gpu_traces Differential Revision: D67048206 Pull Request resolved: https://github.com/pytorch/pytorch/pull/142516 Approved by: https://github.com/ngimel	2024-12-13 22:32:09 +00:00
cyy	40fb738197	Use Wextra-semi (#140236 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/140236 Approved by: https://github.com/ezyang	2024-11-13 02:15:16 +00:00
Shivam Raikundalia	ac7acfb894	[Profiler] Create Auto-Trace Frontend for Trace ID (#139310 ) Summary: This PR adds Auto-Trace implementation for Trace ID. By default, the python side will generate a uuid in the same format as the one set in the backend by kineto. Upon running an auto-trace, the python generated trace id will overwrite the one set in kineto using the Config variable. Since we don't expect users to generate on-demand traces after an auto-trace we can simply keep overwriting the backend trace id whenever autotrace is run. If we eventually want to do something like this, we simply have to add a call in kineto on the backend to generate a new ID upon start of profiling. We also implement a custom callback in the frontend such that users can generate their own trace ids if they wish to. This works similarly to the default, the only difference being that they have to manually set this callback after a profiler is generated. We use a specific call to set this rather than putting it in the frontend initializer in case users want to change the trace_id for different repeats. Test Plan: Tested both default and custom callbacks using the verbose prints added. Trace ids on the frontend and the prints on the backend for the manifold upload matched. Differential Revision: D65178308 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139310 Approved by: https://github.com/shengfukevin	2024-10-31 19:02:57 +00:00
Shivam Raikundalia	8486d3df69	[Profiler] Hide ProfilerStep Alignment behind Experimental Config (#137668 ) Summary: Aligning ProfilerStep# annotation can be useful for visual purposes but it causes downstream tools like HTA to misreport how long each step took. For this reason, let's give users the option to turn on this alignment manually but keep it off by default. Test Plan: Alignment off: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/devvm2185.cco0.facebook.com/rank-0.Oct_09_16_11_48.2543945.pt.trace.json.gz&bucket=gpu_traces Alignment on: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/devvm2185.cco0.facebook.com/rank-0.Oct_09_16_08_27.2518391.pt.trace.json.gz&bucket=gpu_traces Differential Revision: D64146115 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137668 Approved by: https://github.com/aaronenyeshi	2024-10-11 22:57:05 +00:00
Xuehai Pan	8962610247	[BE][clang-format] make macro `PyObject_HEAD_INIT(type)` and `PyVarObject_HEAD_INIT(type, size)` have its own line (#136949 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136949 Approved by: https://github.com/albanD, https://github.com/eqy ghstack dependencies: #136945	2024-10-02 18:39:22 +00:00
Xuehai Pan	89c37be6b7	[BE][clang-format] make macro `PyObject_HEAD` have its own line (#136945 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136945 Approved by: https://github.com/albanD	2024-10-02 18:39:21 +00:00
Shivam Raikundalia	9ffcca7060	[Profiler] Handle Tensor Sizes/Strides Parsing Error (#134862 ) Summary: Currently some jobs are encountering the following trace, P1539415198. This suggests that when we are parsing through tensors the path is prone to encountering an invalid address. This is possibly occurring because for some reason the sizes() and strides() of a Tensor seem to not be of the same dimensions. We assume they are when iterating through the shapes to get the Ivalue generator. When browsing some of the tensor implementations, I found that some of the size and stride paths are different, which could be the cause of this issue. Regardless, the profiler should be flexible enough to handle such issues without bringing down the whole main thread. If the crashes still persist, it will still give us a data point as to where they are occurring and we can rule out the strides/sizes as the culprit. Test Plan: This change doesn't break anything in the happy path, just makes sure the bad path is not exited abruptly. We should use this to debug which events have mismatching dimensions between sizes and strides. Differential Revision: D62008788 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134862 Approved by: https://github.com/aaronenyeshi	2024-09-03 23:46:38 +00:00
cyy	f4dcf2ae93	[1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/128301 Approved by: https://github.com/ezyang, https://github.com/r-barnes	2024-07-08 07:03:53 +00:00
PyTorch MergeBot	846bb30e13	Revert "[1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301 )" This reverts commit bd72e28314d8d63bb347becb8309f5ac7761c6b5. Reverted https://github.com/pytorch/pytorch/pull/128301 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it fails XLA build `bd72e28314`. Please rebase your PR before relanding because I think the failure is hidden by an unrelated broken trunk XLA failure from your current base commit ([comment](https://github.com/pytorch/pytorch/pull/128301#issuecomment-2169035822))	2024-06-15 01:58:20 +00:00
cyy	bd72e28314	[1/N] Change #include <c10/util/Optional.h> to #include <optional> (#128301 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/128301 Approved by: https://github.com/ezyang	2024-06-14 23:21:01 +00:00
cyy	e2a72313e8	Concat namespaces of torch/csrc/profiler code and other fixes (#128606 ) Improve namespaces and modernize codebase of torch/csrc/profiler code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/128606 Approved by: https://github.com/Skylion007, https://github.com/aaronenyeshi	2024-06-13 16:46:34 +00:00
FEI	b950217f19	Support third-party devices emit a range for each autograd operator (#125822 ) Fixes #125752 Pull Request resolved: https://github.com/pytorch/pytorch/pull/125822 Approved by: https://github.com/aaronenyeshi	2024-05-15 05:06:24 +00:00
zdevito	352a893b0c	Fast standalone symbolize for unwinding (#123966 ) We've had issues using addr2line. On certain versions of CentOS it is on a version that has a performance regression making it very slow, and even normally it is not that fast, taking several seconds even when parallelized for a typical memory trace dump. Folly Symbolize or LLVMSymbolize are fast but it requires PyTorch take a dependency on those libraries to do this, and given the number of environments we run stuff in, we end up hitting cases where we fall back to slow addr2line behavior. This adds a standalone symbolizer to PyTorch similar to the unwinder which has no external dependencies and is ~20x faster than addr2line for unwinding PyTorch frames. I've tested this on some memory profiling runs using all combinations of {gcc, clang} x {dwarf4, dwarf5} and it seems to do a good job at getting line numbers and function names right. It is also careful to route all reads of library data through the `CheckedLexer` object, which ensures it is not reading out of bounds of the section. Errors are routed through UnwindError so that those exceptions get caught and we produce a ?? frame rather than crash. I also added a fuzz test which gives all our symbolizer options random addresses in the process to make sure they do not crash. Differential Revision: [D56828968](https://our.internmc.facebook.com/intern/diff/D56828968) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123966 Approved by: https://github.com/ezyang, https://github.com/aaronenyeshi	2024-05-14 19:39:17 +00:00
albanD	b119e1bcc2	Fix refcount handling for dtype, layout and memory format (#125271 ) Finish fixing https://github.com/pytorch/pytorch/issues/124868 re-use our wrap() utils as much as possible and NewRef in other places. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125271 Approved by: https://github.com/colesbury	2024-05-02 02:34:34 +00:00
PyTorch MergeBot	c0fd7894cc	Revert "Fast standalone symbolize for unwinding (#123966 )" This reverts commit 772ae6da1eb9be1f4238ff993830c56488ecae13. Reverted https://github.com/pytorch/pytorch/pull/123966 on behalf of https://github.com/jeanschmidt due to Breaking internal builds, check D56522678 ([comment](https://github.com/pytorch/pytorch/pull/123966#issuecomment-2076821043))	2024-04-25 10:04:48 +00:00
Florian	7ad6dc2cf3	[Profiler][PrivateUse1] Profiler support PrivateUse1 key (#124818 ) Summary: 1.Package public headers of kineto if USE_KINETO so that they can be used by PrivateUse1 user. 2.Add PrivateUse1 key to ActivityType. 3. Support PrivateUse1 key in function deviceTypeFromActivity and _supported_activities. 4. Fix some bugs when processing profiler results. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124818 Approved by: https://github.com/aaronenyeshi	2024-04-24 18:52:08 +00:00
zdevito	772ae6da1e	Fast standalone symbolize for unwinding (#123966 ) We've had issues using addr2line. On certain versions of CentOS it is on a version that has a performance regression making it very slow, and even normally it is not that fast, taking several seconds even when parallelized for a typical memory trace dump. Folly Symbolize or LLVMSymbolize are fast but it requires PyTorch take a dependency on those libraries to do this, and given the number of environments we run stuff in, we end up hitting cases where we fall back to slow addr2line behavior. This adds a standalone symbolizer to PyTorch similar to the unwinder which has no external dependencies and is ~20x faster than addr2line for unwinding PyTorch frames. I've tested this on some memory profiling runs using all combinations of {gcc, clang} x {dwarf4, dwarf5} and it seems to do a good job at getting line numbers and function names right. It is also careful to route all reads of library data through the `CheckedLexer` object, which ensures it is not reading out of bounds of the section. Errors are routed through UnwindError so that those exceptions get caught and we produce a ?? frame rather than crash. I also added a fuzz test which gives all our symbolizer options random addresses in the process to make sure they do not crash. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123966 Approved by: https://github.com/ezyang	2024-04-23 15:27:18 +00:00
PyTorch MergeBot	36f6928a37	Revert "[Profiler][PrivateUse1] Profiler support PrivateUse1 key (#120556 )" This reverts commit 41613a0803f7cde7956f039bc80f94253b0843f9. Reverted https://github.com/pytorch/pytorch/pull/120556 on behalf of https://github.com/aaronenyeshi due to Breaks GPU Chrome trace UI ([comment](https://github.com/pytorch/pytorch/pull/120556#issuecomment-2061578951))	2024-04-17 15:38:14 +00:00
Florian	41613a0803	[Profiler][PrivateUse1] Profiler support PrivateUse1 key (#120556 ) Summary: 1.Package public headers of kineto if USE_KINETO so that they can be used by PrivateUse1 user. 2.Add PrivateUse1 key to ActivityType. 3. Support PrivateUse1 key in function deviceTypeFromActivity and _supported_activities. 4. Fix some bugs when processing profiler results. Co-authored-by: albanD <desmaison.alban@gmail.com> Co-authored-by: Aaron Shi <enye.shi@gmail.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/120556 Approved by: https://github.com/aaronenyeshi	2024-04-12 14:28:19 +00:00
Shivam Raikundalia	c9c099b271	Add kwargs to RecordFunctionFast (#123600 ) Differential Revision: [D55897888](https://our.internmc.facebook.com/intern/diff/D55897888/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123600 Approved by: https://github.com/davidberard98	2024-04-10 18:17:50 +00:00
sraikund16	6fa72480d3	Enhance RecordFunctionFast input args and use input args in triton_heuristics.py (#123459 ) Summary: Now that we can input shapes as input args for RecordFunctionFast, let's add that to the triton heuristics. Also, lets add the ability to pass in a tuple into the RecordFunctionFast constructor. Test Plan: Ran both the _inductor/test_profile.py and profiler/test_profiler.py unit tests. Also added tuple based unit test to profiler/test_profiler.py Ran record_function_fast.py from the following branch https://github.com/pytorch/pytorch/compare/sraikund/record_funct_test?expand=1 No shape or args: tests function fast with no args and profile without record_shapes With shape tests: tests function fast with args and profile with record_shapes true Args no shape: tests function fast with args inputted but record_shapes set to false Args shape tuple: tests function fast with args inputted in form of tuple and record_shapes true Stdout: No shape or args:: 1.8491458892822266 us With shape:: 2.211381196975708 us Args no shape:: 1.9212646484375 us With shape tuple:: 2.245788335800171 us Differential Revision: D55809967 Pull Request resolved: https://github.com/pytorch/pytorch/pull/123459 Approved by: https://github.com/davidberard98	2024-04-06 02:44:06 +00:00
Shivam Raikundalia	4732375042	make RecordFunctionFast take inputs (#123208 ) Summary: RECORD_FUNCTION in C++ and torch.profiler.record_function already support recording inputs. Let's do the same for RecordFunctionFast. Test Plan: Add tests in test_profiler.py that take args and also do not take args so we can support it being an optional parameter Differential Revision: D55648870 Pull Request resolved: https://github.com/pytorch/pytorch/pull/123208 Approved by: https://github.com/davidberard98	2024-04-03 21:58:09 +00:00
dujinhang	9990d1bc22	Add 'profiler/python' to the package.' (#121892 ) Fixes #ISSUE_NUMBER expose the `py_symbolize` interface for use. thank you Pull Request resolved: https://github.com/pytorch/pytorch/pull/121892 Approved by: https://github.com/zdevito	2024-03-16 11:11:26 +00:00
zdevito	5395331644	Avoid GIL during exit (#116709 ) Stacks recorded when tensors are being freed during exit could try to acquire the GIL. Py_IsInitialized can be used to check if we are post Python exit and should not attempt to acquire the GIL. Pull Request resolved: https://github.com/pytorch/pytorch/pull/116709 Approved by: https://github.com/aaronenyeshi	2024-01-04 01:56:44 +00:00
cyy	ff82dcd8fa	[2/N] Enable clang-tidy checks in torch/csrc/profiler (#113439 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/113439 Approved by: https://github.com/Skylion007	2023-11-14 00:39:54 +00:00
cyy	41e8632ca4	[1/N] Fix clang-tidy warnings in torch/csrc/profiler (#112360 ) This PR fixes some clang-tidy warnings in torch/csrc/profiler Pull Request resolved: https://github.com/pytorch/pytorch/pull/112360 Approved by: https://github.com/ezyang	2023-11-10 07:37:23 +00:00
cyy	168f516fae	[3/N] Move c10::variant to std::variant (#110141 ) This PR moves more c10::variant calls to std::variant Pull Request resolved: https://github.com/pytorch/pytorch/pull/110141 Approved by: https://github.com/Skylion007	2023-09-28 18:43:55 +00:00
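
Note on commit b5873292c6 (#143114) in the log above: its message quotes a dispatch trace in which `aten::add.out` runs as a child of `aten::add.Tensor`, and argues that the profiler should record overload names so the two can be told apart. The following is a minimal sketch of that distinction, not code from the PR. It parses the trace text quoted in the commit message using only the standard library. The names `DispatchRecord` and `parse_dispatch_trace` are hypothetical helpers introduced here for illustration, and the assumption that the overload name is whatever follows the first `.` in the operator name is inferred from the quoted trace only.

```python
import re
from typing import List, NamedTuple, Optional


class DispatchRecord(NamedTuple):
    """One entry of the dispatch trace quoted in the commit message (hypothetical type)."""
    kind: str                # "call" or "redispatch"
    op: str                  # base operator name, e.g. "aten::add"
    overload: Optional[str]  # overload name, e.g. "Tensor"; None when absent
    key: str                 # dispatch key, e.g. "AutogradCPU"


# Matches entries of the form: [call] op=[aten::add.Tensor], key=[AutogradCPU]
_RECORD = re.compile(r"\[(call|redispatch)\]\s+op=\[([^\]]+)\],\s+key=\[([^\]]+)\]")


def parse_dispatch_trace(text: str) -> List[DispatchRecord]:
    """Split each qualified op name into base name and overload name (assumed '.'-separated)."""
    records = []
    for kind, qualified, key in _RECORD.findall(text):
        op, sep, overload = qualified.partition(".")
        records.append(DispatchRecord(kind, op, overload if sep else None, key))
    return records


# Trace text copied from the commit message of #143114.
TRACE = (
    "[call] op=[aten::add.Tensor], key=[AutogradCPU] "
    "[redispatch] op=[aten::add.Tensor], key=[Undefined] "
    "[call] op=[aten::empty.memory_format], key=[BackendSelect] "
    "[redispatch] op=[aten::empty.memory_format], key=[CPU] "
    "[call] op=[aten::add.out], key=[CPU]"
)

records = parse_dispatch_trace(TRACE)

# Without overload names, every "aten::add" entry looks identical; with them,
# the Tensor and out overloads are distinguishable.
add_overloads = sorted({r.overload for r in records if r.op == "aten::add"})
print(add_overloads)
```

Grouping by `op` alone would merge three of the five entries under `aten::add`, which is the ambiguity the commit message describes. In the profiler itself, the docstring quoted in e92e3eaf4e (#156586) lists a `capture_overload_names` flag on `_ExperimentalConfig` for this purpose.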

99 Commits