pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Author	SHA1	Message	Date
xinan.lin	61d9a5180e	[Fix XPU CI] [Inductor UT] Fix test cases broken by community. (#165714 ) Fixes #165719, Fixes #165771 Pull Request resolved: https://github.com/pytorch/pytorch/pull/165714 Approved by: https://github.com/jansel	2025-10-19 23:59:04 +00:00
eellison	b3f6d49b69	Overlap scheduler improvements (#165318 ) Bucketing a number of smallish improvements: - Account for bucketing in overlap calculation: if an in-flight collective exists with the same bucket key, reduce new collectives estimated time by its latency time - Update compute domination so we are ordering based on compute idx, as opposed to compute depth, so we never reorder compute. this makes it a bit easier to reason about memory, and pre-fetching, although we can exploring reordering in the future. - When we wait on a collective, force all collectives on the same process group as it that were enqueued prior to the collective to wait as well. Better Memory Handling: - Pre-fetch limiting - when scheduling collectives for overlap, only pre-fetch up to a certain distance, then schedule off-path collectives (which are typically memory reducing). - When we are above peak memory, schedule waits. TODO: - for each compute node, we know its original memory in the graph. we could limit pre-fetching that goes across peak memory - By scheduling off-path collectives for overlap, we reduce memory, but if there weren't enough compute for overlap, we need to proactively schedule them. not an issue yet on examples. - config some hard coded constants, clean up enablement (can do in subsequent pr) On small llama 2d backward : 578 of 618 potentially hideable collectives hidden original mem 14.4GB, rescheduled mem, 15.9GB on forward: 254/256 potentially hideable collectives hidden original mem 5.8 gb, reshceduled mem 5.8GB WIP: adding tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/165318 Approved by: https://github.com/ezyang, https://github.com/IvanKobzarev ghstack dependencies: #164738, #164783, #164944, #164945, #165059	2025-10-15 21:58:47 +00:00
eellison	2b71b62045	Add Memory Estimation Tracker (#165059 ) Add Memory Tracker utility, which will track live memory given alternate ordering of nodes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/165059 Approved by: https://github.com/ezyang, https://github.com/IvanKobzarev ghstack dependencies: #164738, #164783, #164944, #164945	2025-10-15 19:44:29 +00:00
eellison	aed5ed1076	Refactor memory estimator to use node storages, add test (#164783 ) - Update the Memory Estimator to use node storages for analysis, which simplifies book keeping, as opposed to manually looking at operator schema. This will also allow me to reuse this component elsewhere. - Factor out into separate class, so that this same logic can be used in scheduling (node allocations / aliasing / uses) - Adds Tests for correctness - right now only on fwd/bwd by itself, not with both. Pull Request resolved: https://github.com/pytorch/pytorch/pull/164783 Approved by: https://github.com/ruisizhang123 ghstack dependencies: #164738	2025-10-08 22:07:43 +00:00

4 Commits