pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-21 05:34:18 +08:00

Author	SHA1	Message	Date
Xuan Zhang	9c2ffce71a	add condition for freeable input buffer (#139480) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139480 Approved by: https://github.com/yf225 ghstack dependencies: #139396	2024-11-01 21:15:40 +00:00
Xuan Zhang	86602a66d7	[orm] fix live_memory computation in lpmf algorithm (#139396) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139396 Approved by: https://github.com/yf225	2024-10-31 23:45:30 +00:00
Xuan Zhang	2980aed65b	[inductor][memory] restructuring memory.py and turn on the flag (#137205) Addressing additional comments given in PR https://github.com/pytorch/pytorch/pull/134874 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137205 Approved by: https://github.com/eellison	2024-10-25 17:19:34 +00:00
Xuan Zhang	c9d12f6360	[inductor][memory] add signpost event for memory pass (#136538) Add logging to the scuba table for internal models. For verification, I triggered a sample workflow internally and checked the scuba table to make sure the `Paramaters` column has the expected entries, see [here](https://fburl.com/scuba/workflow_signpost/39h7qo9s). Pull Request resolved: https://github.com/pytorch/pytorch/pull/136538 Approved by: https://github.com/yf225	2024-09-25 21:47:46 +00:00
Xuan Zhang	03957efa5d	[inductor][scheduler] reorder scheduler nodes after fusion to reduce peak memory (#134874)

Motivations: A topological order of the scheduler nodes that optimizes the liveness of buffers can reduce peak memory utilization. This has been observed and studied, e.g., [here](https://arxiv.org/pdf/1910.02653) and [here](https://proceedings.mlr.press/v202/steiner23a/steiner23a.pdf).

Solutions:
1. implement a peak memory estimator via liveness analysis
2. implement a few memory-aware topological sorting algorithms and pick the one with the lowest peak memory

Results: On some models we can reduce the peak memory significantly:

| model | batch size | peak_memory baseline | peak_memory new | ratio |
|:-----------------------------:|:----------:|:--------------------:|:---------------:|:-----:|
| alexnet | 128 | 1.17 | 0.99 | 1.19 |
| vgg16 | 64 | 4.10 | 3.57 | 1.15 |
| DebertaV2ForQuestionAnswering | 1 | 11.60 | 10.56 | 1.10 |

In the presence of compiler-based AC, peak memory can be further reduced:

| model | batch size | peak_memory baseline | peak_memory new | ratio |
|:------------------------------:|:----------:|:--------------------:|:---------------:|:-----:|
| AlbertForMaskedLM | 4 | 6.87 | 6.43 | 1.07 |
| AlbertForQuestionAnswering | 4 | 8.69 | 7.76 | 1.12 |
| MobileBertForQuestionAnswering | 128 | 4.67 | 3.90 | 1.20 |

[Here](https://fb.workplace.com/groups/1075192433118967/posts/1499920537312819/?comment_id=1499938843977655&reply_comment_id=1499951630643043) is an internal use case.

Other info:
* neutral model runtime, because the reordering happens after fusion, so the memory saving is _for free_.
* minimal compile time overhead, as the algorithm is linear in the number of edges of the inductor graph. For all huggingface benchmark models, the additional compile time is less than 1 second.
* no peak memory regression, since we only adopt a new order if the estimator predicts a lower peak memory. However, the estimator does not account for operators' working memory; for large models, the working memory should be negligible. We haven't observed any significant regressions in any of our tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134874 Approved by: https://github.com/yf225	2024-09-21 16:28:38 +00:00
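The two steps described in the commit above (estimate peak memory from buffer liveness, then keep a reordering only if the estimate improves) can be illustrated with a toy sketch. This is not the inductor implementation: the function names `estimate_peak_memory`, `greedy_order`, and `pick_order`, the one-buffer-per-node graph encoding, and the greedy heuristic are all assumptions made for illustration, and working memory is ignored just as the commit message notes.

```python
from collections import defaultdict

def estimate_peak_memory(order, size, deps):
    """Peak live bytes for a schedule, assuming each node writes one buffer
    that stays live until its last consumer has run (hypothetical model)."""
    pos = {n: i for i, n in enumerate(order)}
    last_use = {}
    for n in order:
        for d in deps.get(n, ()):
            last_use[d] = max(last_use.get(d, -1), pos[n])
    free_at = defaultdict(list)
    for n in order:
        # Buffers nobody reads are treated as graph outputs: live to the end.
        free_at[last_use.get(n, len(order) - 1)].append(n)
    live = peak = 0
    for i, n in enumerate(order):
        live += size[n]              # allocate this node's output buffer
        peak = max(peak, live)
        for b in free_at[i]:         # release buffers whose last reader ran
            live -= size[b]
    return peak

def greedy_order(nodes, size, deps):
    """Kahn-style topological sort that always schedules the ready node with
    the smallest net memory change (bytes allocated minus bytes freed)."""
    consumers = defaultdict(list)
    indeg = {n: len(deps.get(n, ())) for n in nodes}
    for n in nodes:
        for d in deps.get(n, ()):
            consumers[d].append(n)
    pending = {n: len(consumers[n]) for n in nodes}  # unscheduled readers
    ready = [n for n in nodes if indeg[n] == 0]
    order = []

    def net_change(n):
        freed = sum(size[d] for d in deps.get(n, ()) if pending[d] == 1)
        return size[n] - freed

    while ready:
        n = min(ready, key=net_change)
        ready.remove(n)
        order.append(n)
        for d in deps.get(n, ()):
            pending[d] -= 1
        for c in consumers[n]:
            indeg[c] -= 1
            if indeg[c] == 0:
                ready.append(c)
    return order

def pick_order(baseline, size, deps):
    """Keep the baseline unless a candidate has a strictly lower estimate."""
    candidates = [baseline, greedy_order(baseline, size, deps)]
    return min(candidates, key=lambda o: estimate_peak_memory(o, size, deps))

# Toy graph: two large intermediates, each reduced to a small tensor.
size = {"a": 10, "b": 10, "a2": 1, "b2": 1, "out": 1}
deps = {"a2": ["a"], "b2": ["b"], "out": ["a2", "b2"]}
baseline = ["a", "b", "a2", "b2", "out"]
chosen = pick_order(baseline, size, deps)
print(estimate_peak_memory(baseline, size, deps))  # 21
print(chosen, estimate_peak_memory(chosen, size, deps))
```

In the toy graph the baseline keeps both large buffers alive at once, while the greedy order consumes `a` before producing `b`. Because `min` returns the first minimal candidate, a tie keeps the baseline, which mirrors the "only adopt a new order if the peak memory is reduced" rule. The greedy pass here rescans the ready list at every step, so it does not have the linear-time behaviour the commit reports for its own algorithms.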

5 Commits