Summary:
It seems like this issues is due to leftover cupti events during warmup staying persistent in the queue during profiling. These events start before our actual time window and therefore have a timestamp lower than our basetime. This makes the delta become negative which results in unsigned overflow. This then creates a large number which later gets sign added which creates the signed overflow.
Solution: If a raw timestamp is less than the base timestamp, just mark the process timestamp as -1 so we can mark these events as "to ignore". In Kineto, add a special case to ignore timestamps that are negative.
Test Plan: Test with ASAN
Differential Revision: D65835650
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140441
Approved by: https://github.com/davidberard98
Summary:
Move the profiler's Approximate Clock from libtorch to libc10. The main reason is to allow c10 features to get time.
The clock is using TSC when available for performance. CUDA Caching Allocator's implementation of memory snapshot will add the timestamps to memory events with this same clock in subsequent diff.
Test Plan: CI
Differential Revision: D50601935
Pulled By: aaronenyeshi
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111972
Approved by: https://github.com/davidberard98