Applies some more fixes to headers that may have been missed before, for performance optimization.

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @EikanWang @ezyang, since this is the next in the series of clang-tidy fixups.
This PR fixes 3 main issues (the first three items are sketched in the example after this list):
1. Use emplacement more in headers.
1. Avoid unnecessary copies and use const refs where possible.
1. Default special member functions where possible, to make them potentially trivial and more readable.
1. There is also one change in this PR that tries to prevent unnecessary math promotion; the rest of those changes are in another PR.
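A minimal self-contained sketch of the three patterns, assuming a hypothetical `Item` type (none of this is code from the PR):

```cpp
#include <string>
#include <utility>
#include <vector>

struct Item {
  std::string name;
  int value;

  Item(std::string n, int v) : name(std::move(n)), value(v) {}

  // Defaulted special member functions: potentially trivial, and the
  // intent is readable at a glance.
  Item(const Item&) = default;
  Item& operator=(const Item&) = default;
  ~Item() = default;
};

// Take a const ref instead of copying the vector at the call site.
int total(const std::vector<Item>& items) {
  int sum = 0;
  for (const auto& item : items) { // const ref per element, no copies
    sum += item.value;
  }
  return sum;
}

int main() {
  std::vector<Item> items;
  items.emplace_back("alpha", 1); // construct in place vs. push_back(Item(...))
  items.emplace_back("beta", 2);
  return total(items) == 3 ? 0 : 1;
}
```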
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91445
Approved by: https://github.com/ezyang
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56830
Opt into formatting on GitHub and format everything. This is a trial run before turning formatting on for more of the codebase, and eventually all of it.
Test Plan: CI
Reviewed By: zertosh
Differential Revision: D27979080
fbshipit-source-id: a80f0c48691c08ae8ca0af06377b87e6a2351151
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55976
- Define a concrete `DebugInfo` to collect Param comms.
- Add a macro to easily log `DebugInfo` (a sketch of both pieces follows this list)
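A self-contained sketch of the shape of this change, with simplified stand-ins for `DebugInfoBase` and the logging macro (the names, fields, and macro are illustrative, not the actual implementation):

```cpp
#include <cstdio>
#include <string>
#include <utility>

struct DebugInfoBase {
  virtual ~DebugInfoBase() = default;
};

// A concrete DebugInfo carrying communication parameters, analogous in
// spirit to the Param comms info described above (fields are made up).
struct ParamCommsDebugInfo : DebugInfoBase {
  std::string collective; // e.g. "allreduce"
  int world_size = 0;
  ParamCommsDebugInfo(std::string c, int ws)
      : collective(std::move(c)), world_size(ws) {}
};

// A macro that makes logging the DebugInfo a one-liner at call sites.
#define LOG_PARAM_COMMS(info)                    \
  std::printf("param_comms: %s world_size=%d\n", \
              (info).collective.c_str(), (info).world_size)

int main() {
  ParamCommsDebugInfo info("allreduce", 8);
  LOG_PARAM_COMMS(info);
  return 0;
}
```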
Test Plan:
Tested on `ads:simplified_launcher` with `dyno gputrace`
Locally tested in `libkinetoObserver` that it can collect the `DebugInfoBase`.
Reviewed By: kingchc, ilia-cher
Differential Revision: D26773447
fbshipit-source-id: a8eeede2d6dbf34d7a1b3614843b4a1baba94448
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48762
In distributed inference, we want to use a new info type to pass some information to operators. Add a new key to `ThreadLocalDebugInfo` to unblock that development.
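As a rough illustration, the change has this shape (the enum and value names are assumptions for the sketch, not necessarily the actual ones):

```cpp
#include <cstdint>

// Illustrative only: a new kind is appended to the thread-local debug
// info key enum so distributed inference can stash its own info object.
enum class DebugInfoKind : std::uint8_t {
  PRODUCER_INFO = 0,
  MOBILE_RUNTIME_INFO,
  PROFILER_STATE,
  INFERENCE_CONTEXT, // the new key
};
```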
Test Plan: Only adds a new key; should have no effect on the current build.
Reviewed By: dzhulgakov
Differential Revision: D25291242
fbshipit-source-id: c71565ff7a38cc514d7cd65246c7d5f6b2ce3b8b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47796
`ThreadLocalDebugInfo::get()` is a hot function. For example, it is called by `DefaultCPUAllocator::allocate()`. Most callers do not even bother to keep the returned `shared_ptr` around, proving that they have no lifetime issues currently. For the rest, it appears that the only way that the returned pointer could become invalid is if they then called a function that swapped out `ThreadLocalDebugInfo` using `ThreadLocalStateGuard`. There are very few such paths, and it doesn't look like any current callers of `ThreadLocalDebugInfo::get()` needed a `shared_ptr` at all.
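A simplified before/after sketch of the rationale, using stand-in types rather than the actual `c10` classes:

```cpp
#include <memory>

struct DebugInfoBase {
  virtual ~DebugInfoBase() = default;
};

struct ThreadLocalDebugInfo {
  static std::shared_ptr<DebugInfoBase>& slot() {
    thread_local std::shared_ptr<DebugInfoBase> info;
    return info;
  }

  // Before: every call on the hot path paid an atomic refcount bump.
  static std::shared_ptr<DebugInfoBase> get_shared() { return slot(); }

  // After: a raw pointer suffices as long as the caller does not hold it
  // across a swap of the thread-local state (e.g. via a state guard).
  static DebugInfoBase* get() { return slot().get(); }
};

int main() {
  ThreadLocalDebugInfo::slot() = std::make_shared<DebugInfoBase>();
  return ThreadLocalDebugInfo::get() != nullptr ? 0 : 1;
}
```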
ghstack-source-id: 116979577
Test Plan:
1) reviewers to double-check audit of safety
2) run framework overhead benchmarks
Reviewed By: dzhulgakov
Differential Revision: D24902978
fbshipit-source-id: d684737cc2568534cac7cd3fb8d623b971c2fd28
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44653
This changes the profiler, per an offline discussion with ilia-cher, so that the `disableProfiler()` event consolidation logic can be called from different threads (i.e., threads where the profiler was not explicitly enabled). This is needed to support the functionality enabled by D23638387, where we defer profiling event collection until executing an async callback that can run on a different thread, to support RPC async function profiling.
This is done by introducing 2 flags, `cleanupTLSState` and `consolidate`, which control whether we should clean up thread-local settings (we don't do this when calling `disableProfiler()` on non-main threads) and whether we should consolidate all profiled events. Backwards compatibility is ensured since both options default to true.
Added a test in `test_misc.cpp` to cover this.
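A self-contained stub sketching the two flags' contract (only the flag names and their true defaults come from this summary; the signature and types are assumptions, not the real profiler API):

```cpp
#include <vector>

struct Event {};
thread_local bool tls_profiler_enabled = false;

std::vector<Event> disableProfiler(bool cleanupTLSState = true,
                                   bool consolidate = true) {
  std::vector<Event> events;
  if (cleanupTLSState) {
    // Only the thread that enabled the profiler tears down its
    // thread-local settings; async callback threads pass false.
    tls_profiler_enabled = false;
  }
  if (consolidate) {
    // Merge events recorded across threads; deferred (false) until the
    // final call, once every async callback has finished.
  }
  return events;
}

int main() {
  // Non-main thread: keep TLS settings, defer consolidation.
  auto partial = disableProfiler(/*cleanupTLSState=*/false, /*consolidate=*/false);
  // Main thread: both flags default to true, preserving backward compatibility.
  auto all = disableProfiler();
  return (partial.size() + all.size()) == 0 ? 0 : 1;
}
```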
ghstack-source-id: 112605620
Reviewed By: mrshenli
Differential Revision: D23638499
fbshipit-source-id: f5bbb0d41ef883c5e5870bc27e086b8b8908f46b
Summary:
When building, my log was being spammed with:
```
warning: attribute "__visibility__" does not apply here
```
which, at least on gcc 7.4, isn't suppressed by silencing `-Wattributes`. The warning suggests enums don't need to be exported on Linux, so I just `ifdef` the attribute out instead.
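A sketch of the `ifdef` pattern, with hypothetical macro names rather than the ones in the actual change:

```cpp
// EXAMPLE_API / EXAMPLE_API_ENUM are made-up names for this sketch.
// On gcc (at least 7.4), putting a visibility attribute on an enum
// triggers: warning: attribute "__visibility__" does not apply here.
// Enums don't need to be exported on Linux, so compile the attribute out
// there and keep the export macro only where it matters (e.g. Windows).
#ifdef _WIN32
#define EXAMPLE_API __declspec(dllexport)
#define EXAMPLE_API_ENUM EXAMPLE_API
#else
#define EXAMPLE_API __attribute__((visibility("default")))
#define EXAMPLE_API_ENUM
#endif

enum class EXAMPLE_API_ENUM MyKind { kFirst, kSecond };
```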
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38988
Differential Revision: D21722032
Pulled By: ezyang
fbshipit-source-id: ed4cfebc187dceaa9e748d85f756611fd7eda4b4