pytorch

frozenleaves/pytorch

Fork 0

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-21 05:34:18 +08:00

Commit Graph

Author SHA1 Message Date

Author	SHA1	Message	Date
Xuehai Pan	6d5c789ad5	[BE][PYFMT] migrate PYFMT for `test/[a-h]*/` to `ruff format` (#144555 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144555 Approved by: https://github.com/ezyang ghstack dependencies: #144551, #144554	2025-06-24 04:53:54 +00:00
Basil Wong	1250106630	[pytorch] Remove numpy dependency from Knapsack Evaluator (#150825 ) Summary: The two implementations are functionally equivalent. They both calculate the memory budget at the knee point in the Pareto frontier using the same algorithm. 1. np.linspace -> basic list comprehension 2. runtime and memory values -> lists instead of numpy arrays 3. np.ptp -> max - min 4. np.norm -> diff with min value / range 5. np.sqrt -> *0.5 5. np.argmin -> .index(min(_)) Test Plan: # Unit Testing ``` buck test mode/opt //caffe2/test/functorch:test_ac_knapsack; pingme "tests done" Buck UI: https://www.internalfb.com/buck2/f4e41eb8-e775-4f04-b4e7-8e567599deb8 Test UI: https://www.internalfb.com/intern/testinfra/testrun/10133099236155875 Network: Up: 24KiB Down: 1.9GiB (reSessionID-7cd11487-f3e7-43ab-982a-805510771c8d) Executing actions. Remaining 0/259826 98:15:40.5s exec time total Command: test. Finished 3 local, 5 remote, 103467 cache (99% hit) 98:15:14.8s exec time cached (99%) Time elapsed: 1:09.9s Tests finished: Pass 15. Fail 0. Fatal 0. Skip 0. Build failure 0 ``` # End to End Testing ### Baseline Run with DP Let's confirm everything we are running on works. - Optimization Algo: DP - Memory Budget: 0.05 - AIX Link: apf_local-basilwong-2025-03-22_20:39:10 - TLParse rank 0: https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpDJaWp5/rank_0/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10000 - TLParse rank 1: https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpDJaWp5/rank_1/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10000 ### Dynamic Memory Budget (Before Change) - Revision: 2c95489b7f79 - Optimization Algo: Dynamic Memory Budget - Memory Budget: 0.05 - AIX Link: https://www.internalfb.com/mlhub/pipeline/4088035428184866 - TLParse: - https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpykEy8U/rank_0/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10000 - https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpykEy8U/rank_1/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10000 ### Dynamic Memory Budget (After Change) - Revision: 14353eef3c9e - Optimization Algo: Dynamic Memory Budget - Memory Budget: 0.05 - AIX Link: https://www.internalfb.com/mlhub/pipeline/1613558749306737 - TLParse Links: - https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpZKNWFw/rank_0/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10000 - https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpZKNWFw/rank_1/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10000 As a sanity check lets take the AC information for the following compile id: 7_0_0 from the rank 0 of each TLParse. {F1976883124} Baseline: P1779400819 * Saved node values show we are storing much more compared to dynamic memory: ``` "Knapsack Saved Nodes": [ 16, 17, 19, 20, 21, 22, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60 ] ``` * Before Change: P1779401775 * Saved nodes are similar to after change but not exactly. ``` "Knapsack Saved Nodes": [ 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 49, 50 ] ``` * After Change: P1779402106 * Here we se the largest nodes that are saved are around the same, but there is a small discrepancy for the smallest nodes. ``` "Knapsack Saved Nodes": [ 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 50, 51, 57, 58, 59, 60, 61, 62 ], ``` The discrepancy can be explained by looking at the estimated memory values. This is the non-deterministic part(below are the top 5 memory values for considered candidates): ``` 0.05774741703905514, 0.007333005338292718, 0.007333005338292718, 0.007333005338292718, 0.007333005338292718, ``` vs ``` 0.049254204820440746, 0.006254502199421049, 0.006254502199421049, 0.006254502199421049, 0.006254502199421049, ``` Based on that the dynamic memory implementations performed similarly in an E2E test and that memory is non-deterministic we should be good to go to land. Differential Revision: D71692245 Pull Request resolved: https://github.com/pytorch/pytorch/pull/150825 Approved by: https://github.com/seemethere, https://github.com/jansel	2025-04-10 17:07:03 +00:00
Basil Wong	7b2af25f80	[1/n] Support Dynamic Memory Budget in Auto AC (#143539 ) # Summary: Full Context: https://docs.google.com/document/d/1-j5KSbfGFJQcH4sYh7BIeJXso3zYzl5G5yFQqXdKx_o/edit?usp=sharing tl;dr This change introduces classes which help determine a dynamic memory budget. This will mostly be helpful for models with many implicit graph breaks. --- New Classes: GraphInfoProvider * Takes the joint_graph as well as the input memories and runtimes and parses the graph + values into usable forms for the SolverEvaluator. KnapsackEvaluator * Provides a function: Given all of the four inputs (solver function as a callable, max_dynamic_memory_budget, min_dynamic_memory_budget, dynamic_memory_budget_pareto_granularity) it returns an approximation of the knee point of the pareto distribution. # Test Plan: ### LintRunner LintRunner Output: P1700445547 ### Unit Tests ``` $ buck test @mode/opt //caffe2/test/functorch:test_ac_knapsack `@mode/opt` was specified, but not found. Using file at `//mode/opt`. This behavior is being deprecated. Please use `"@//mode/opt"` instead File changed: fbcode//caffe2/.ruff_cache/0.7.4/.tmpB6PmDS File changed: fbsource//xplat/caffe2/test/functorch/test_ac_knapsack.py File changed: fbcode//caffe2/.ruff_cache/0.7.4/.tmpyjCiPn 20 additional file change events Buck UI: https://www.internalfb.com/buck2/414ead46-9ede-4192-8e1a-5d3c52bdb9cc Test UI: https://www.internalfb.com/intern/testinfra/testrun/6473924710342830 Network: Up: 0B Down: 0B (reSessionID-159794b9-9d61-477e-8e63-9bdeaa537dca) Analyzing targets. Remaining 0/214 Executing actions. Remaining 0/6933 0.1s exec time total Command: test. Finished 1 local Time elapsed: 18.5s Tests finished: Pass 15. Fail 0. Fatal 0. Skip 0. Build failure 0 ``` ### Test Run Updated the config: ``` activation_memory_budget_solver: DYNAMIC_MEMORY_BUDGET_DP ``` Confirming proper execution via: [aps-fb_fm_v4_768_01_dynamic-2a792ba8af](https://www.internalfb.com/mlhub/pipelines/runs/mast/aps-fb_fm_v4_768_01_dynamic-2a792ba8af?job_attempt=0&version=0&env=PRODUCTION) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143539 Approved by: https://github.com/jansel	2024-12-21 07:38:52 +00:00

Xuehai Pan

6d5c789ad5

[BE][PYFMT] migrate PYFMT for test/[a-h]*/ to ruff format (#144555 )

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144555
Approved by: https://github.com/ezyang
ghstack dependencies: #144551, #144554

2025-06-24 04:53:54 +00:00

Basil Wong

1250106630

[pytorch] Remove numpy dependency from Knapsack Evaluator (#150825 )

Summary:
The two implementations are functionally equivalent. They both calculate the memory budget at the knee point in the Pareto frontier using the same algorithm.

1. np.linspace -> basic list comprehension
2. runtime and memory values -> lists instead of numpy arrays
3. np.ptp -> max - min
4. np.norm -> diff with min value / range
5. np.sqrt -> **0.5
5. np.argmin -> .index(min(_))

Test Plan:
# Unit Testing

```
buck test mode/opt //caffe2/test/functorch:test_ac_knapsack; pingme "tests done"
Buck UI: https://www.internalfb.com/buck2/f4e41eb8-e775-4f04-b4e7-8e567599deb8
Test UI: https://www.internalfb.com/intern/testinfra/testrun/10133099236155875
Network: Up: 24KiB  Down: 1.9GiB  (reSessionID-7cd11487-f3e7-43ab-982a-805510771c8d)
Executing actions. Remaining      0/259826                                                                                                  98:15:40.5s exec time total
Command: test.     Finished 3 local, 5 remote, 103467 cache (99% hit)                                                                       98:15:14.8s exec time cached (99%)
Time elapsed: 1:09.9s
Tests finished: Pass 15. Fail 0. Fatal 0. Skip 0. Build failure 0
```

# End to End Testing

### Baseline Run with DP

Let's confirm everything we are running on works.

- Optimization Algo: DP
- Memory Budget: 0.05
- AIX Link: apf_local-basilwong-2025-03-22_20:39:10
- TLParse rank 0: https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpDJaWp5/rank_0/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10000
- TLParse rank 1:  https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpDJaWp5/rank_1/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10000

### Dynamic Memory Budget (Before Change)

- Revision: 2c95489b7f79
- Optimization Algo: Dynamic Memory Budget
- Memory Budget: 0.05
- AIX Link: https://www.internalfb.com/mlhub/pipeline/4088035428184866
- TLParse:
   - https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpykEy8U/rank_0/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10000
   - https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpykEy8U/rank_1/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10000

### Dynamic Memory Budget (After Change)

- Revision: 14353eef3c9e
- Optimization Algo: Dynamic Memory Budget
- Memory Budget: 0.05
- AIX Link: https://www.internalfb.com/mlhub/pipeline/1613558749306737
- TLParse Links:
   - https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpZKNWFw/rank_0/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10000
    - https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpZKNWFw/rank_1/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10000

As a sanity check lets take the AC information for the following compile id: 7_0_0 from the rank 0 of each TLParse.

 {F1976883124}

* Baseline: P1779400819
   * Saved node values show we are storing much more compared to dynamic memory:

```
  "Knapsack Saved Nodes": [
    16,
    17,
    19,
    20,
    21,
    22,
    24,
    25,
    26,
    27,
    28,
    29,
    30,
    31,
    32,
    33,
    34,
    35,
    36,
    37,
    38,
    39,
    40,
    41,
    42,
    43,
    44,
    45,
    46,
    47,
    49,
    50,
    51,
    52,
    53,
    54,
    55,
    56,
    57,
    58,
    59,
    60
  ]
```

* Before Change: P1779401775
   * Saved nodes are similar to after change but not exactly.

```
  "Knapsack Saved Nodes": [
    24,
    25,
    26,
    27,
    28,
    29,
    30,
    31,
    32,
    33,
    34,
    35,
    36,
    37,
    38,
    39,
    40,
    41,
    42,
    43,
    44,
    45,
    46,
    47,
    49,
    50
  ]
```

* After Change: P1779402106
   * Here we se the largest nodes that are saved are around the same, but there is a small discrepancy for the smallest nodes.

```
  "Knapsack Saved Nodes": [
    24,
    25,
    26,
    27,
    28,
    29,
    30,
    31,
    32,
    33,
    34,
    35,
    36,
    37,
    38,
    39,
    40,
    41,
    42,
    43,
    44,
    45,
    46,
    47,
    50,
    51,
    57,
    58,
    59,
    60,
    61,
    62
  ],
```

The discrepancy can be explained by looking at the estimated memory values. This is the non-deterministic part(below are the top 5 memory values for considered candidates):

```
    0.05774741703905514,
    0.007333005338292718,
    0.007333005338292718,
    0.007333005338292718,
    0.007333005338292718,
```

vs

```
    0.049254204820440746,
    0.006254502199421049,
    0.006254502199421049,
    0.006254502199421049,
    0.006254502199421049,
```

Based on that the dynamic memory implementations performed  similarly in an E2E test and that memory is non-deterministic we should be good to go to land.

Differential Revision: D71692245

Pull Request resolved: https://github.com/pytorch/pytorch/pull/150825
Approved by: https://github.com/seemethere, https://github.com/jansel

2025-04-10 17:07:03 +00:00

Basil Wong

7b2af25f80

[1/n] Support Dynamic Memory Budget in Auto AC (#143539 )

# Summary:
Full Context: https://docs.google.com/document/d/1-j5KSbfGFJQcH4sYh7BIeJXso3zYzl5G5yFQqXdKx_o/edit?usp=sharing

tl;dr

This change introduces classes which help determine a dynamic memory budget. This will mostly be helpful for models with many implicit graph breaks.

---

New Classes:

*GraphInfoProvider*
* Takes the joint_graph as well as the input memories and runtimes and parses the graph + values into usable forms for the SolverEvaluator.

*KnapsackEvaluator*
* Provides a function: Given all of the four inputs (solver function as a callable, max_dynamic_memory_budget, min_dynamic_memory_budget, dynamic_memory_budget_pareto_granularity) it returns an approximation of the knee point of the pareto distribution.

# Test Plan:

### LintRunner

LintRunner Output: P1700445547

### Unit Tests

```
$ buck test @mode/opt //caffe2/test/functorch:test_ac_knapsack
`@mode/opt` was specified, but not found. Using file at `//mode/opt`.
This behavior is being deprecated. Please use `"@//mode/opt"` instead
File changed: fbcode//caffe2/.ruff_cache/0.7.4/.tmpB6PmDS
File changed: fbsource//xplat/caffe2/test/functorch/test_ac_knapsack.py
File changed: fbcode//caffe2/.ruff_cache/0.7.4/.tmpyjCiPn
20 additional file change events
Buck UI: https://www.internalfb.com/buck2/414ead46-9ede-4192-8e1a-5d3c52bdb9cc
Test UI: https://www.internalfb.com/intern/testinfra/testrun/6473924710342830
Network: Up: 0B Down: 0B (reSessionID-159794b9-9d61-477e-8e63-9bdeaa537dca)
Analyzing targets. Remaining 0/214
Executing actions. Remaining 0/6933 0.1s exec time total
Command: test. Finished 1 local
Time elapsed: 18.5s
Tests finished: Pass 15. Fail 0. Fatal 0. Skip 0. Build failure 0
```

### Test Run

Updated the config:

```
activation_memory_budget_solver: DYNAMIC_MEMORY_BUDGET_DP
```

Confirming proper execution via: [aps-fb_fm_v4_768_01_dynamic-2a792ba8af](https://www.internalfb.com/mlhub/pipelines/runs/mast/aps-fb_fm_v4_768_01_dynamic-2a792ba8af?job_attempt=0&version=0&env=PRODUCTION)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143539
Approved by: https://github.com/jansel

2024-12-21 07:38:52 +00:00

3 Commits