AutoHeuristic: mixed_mm documentation (#133410)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133410
Approved by: https://github.com/Chillee
ghstack dependencies: #133409
Alnis Murtovi
2024-08-13 22:38:59 -07:00
committed by PyTorch MergeBot
parent 142353eca3
commit f32a9e953f
3 changed files with 49 additions and 2 deletions


@@ -0,0 +1,16 @@
If you just want to re-generate the existing mixed_mm heuristics for A100/H100 from the already collected data, run the following scripts:
`bash get_mixedmm_dataset.sh # Downloads A100 and H100 datasets`
`bash gen_mixedmm_heuristic_a100.sh # Generates A100 heuristic`
`bash gen_mixedmm_heuristic_h100.sh # Generates H100 heuristic`
If you want to collect new data, or generate a heuristic for another GPU, use the `generate_heuristic.sh` script:
First, open `generate_heuristic.sh` and modify the variables according to the comments.
Then run the script to perform benchmarks and collect training data:
`bash generate_heuristic.sh collect`
Depending on how many GPUs you are using, this might take a day.
Afterwards, run the script to learn the heuristic:
`bash generate_heuristic.sh generate`
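The `generate` step trains a decision tree and emits it as Python code. As an illustration only, here is a hypothetical sketch of the shape of such a learned heuristic — the function name, feature thresholds, and choice labels are all invented, not taken from the generated files:

```python
# Hypothetical sketch of the kind of decision tree the generate step emits:
# nested threshold checks on the matmul shape that return a kernel choice.
def mixed_mm_choice(m: int, k: int, n: int) -> str:
    if m <= 16:
        # small m: assume the specialized Triton mixed-mm kernel wins here
        return "triton"
    if k >= 4096 and n >= 4096:
        return "triton"
    # otherwise fall back to the default choice (cast + regular mm)
    return "fallback"
```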


@@ -12,6 +12,7 @@ from benchmark_runner import BenchmarkRunner  # type: ignore[import-not-found]
 from benchmark_utils import (  # type: ignore[import-not-found]
     fits_in_memory,
     get_mm_tensors,
+    get_random_between_pow2,
 )
 
 import torch
@@ -95,7 +96,7 @@ class BenchmarkRunnerMixedMM(BenchmarkRunner):  # type: ignore[misc, no-any-unimported]
         if distr_type == "pow2":
             return self.get_random_pow2(min_power2=10, max_power2=17)
         elif distr_type == "uniform-between-pow2":
-            return self.get_random_between_pow2(min_power2=10, max_power2=17)
+            return get_random_between_pow2(min_power2=10, max_power2=17)
         elif distr_type == "uniform":
             return random.randint(1024, 131072)
         print(f"random_type {distr_type} not supported")
@@ -106,7 +107,7 @@ class BenchmarkRunnerMixedMM(BenchmarkRunner):  # type: ignore[misc, no-any-unimported]
         if pow2:
             return 2 ** random.randint(1, 7)
         else:
-            return self.get_random_between_pow2(1, 7)
+            return get_random_between_pow2(1, 7)
 
     def get_m_k_n(self, dtype: Any) -> Tuple[int, int, int]:
         numel_max = 2**31
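The diff above switches from a removed method to the free function `get_random_between_pow2` now imported from `benchmark_utils`. A minimal sketch of what such a helper plausibly does — pick a power-of-two bucket, then sample a size strictly between the two powers of two — with the implementation assumed rather than copied from `benchmark_utils`:

```python
import random

def get_random_between_pow2(min_power2: int, max_power2: int) -> int:
    # Choose a bucket [2^i, 2^(i+1)] uniformly at random, then sample a
    # value strictly between the two powers of two (assumed behavior),
    # so the benchmark covers non-power-of-two shapes as well.
    i = random.randint(min_power2, max_power2 - 1)
    return random.randint(2**i + 1, 2 ** (i + 1) - 1)
```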


@@ -0,0 +1,30 @@
#!/bin/bash
if [ $# -ne 1 ]; then
    echo "Error: This script requires exactly one argument."
    echo "'bash generate_heuristic_mixedmm.sh collect' to run benchmarks and collect training data."
    echo "'bash generate_heuristic_mixedmm.sh generate' to use the collected data to learn a heuristic."
    exit 1
fi
MODE=$1
# !!! SPECIFY THE GPUs THAT YOU WANT TO USE HERE !!!
GPU_DEVICE_IDS="4,5"
# !!! SPECIFY THE CONDA ENVIRONMENT THAT YOU WANT TO BE ACTIVATED HERE !!!
CONDA_ENV=heuristic-pr
NUM_SAMPLES=2000
# This is where AutoHeuristic will store autotuning results
OUTPUT_DIR="a100"
# !!! CHANGE THE NAME OF THE HEURISTIC IF YOU WANT TO LEARN A HEURISTIC FOR A GPU THAT IS NOT A100 !!!
HEURISTIC_NAME="MixedMMA100"
BENCHMARK_SCRIPT="gen_data_mixed_mm.py"
TRAIN_SCRIPT="train_decision_mixedmm.py"
bash ../generate_heuristic.sh ${MODE} ${GPU_DEVICE_IDS} ${CONDA_ENV} ${NUM_SAMPLES} ${OUTPUT_DIR} ${HEURISTIC_NAME} ${BENCHMARK_SCRIPT} ${TRAIN_SCRIPT}
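This wrapper only sets the variables and forwards them to the shared `../generate_heuristic.sh`, which dispatches on `MODE`. A hypothetical sketch of that dispatch contract — the real script also handles GPU assignment and conda activation, and these echo strings are invented:

```shell
# Hypothetical sketch of the MODE dispatch inside ../generate_heuristic.sh
run_mode() {
    case "$1" in
        collect)
            echo "collect: run the benchmark script and store autotuning data"
            ;;
        generate)
            echo "generate: train a heuristic from the collected data"
            ;;
        *)
            echo "unknown mode: $1" >&2
            return 1
            ;;
    esac
}
```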