Expand documentation around running run_alphafold.py

PiperOrigin-RevId: 704634652
Author: James Spencer
Date: 2024-12-10 15:45:44 +00:00
Parent: 9b59a237a8
Commit: 2e562355ba


@@ -212,11 +212,13 @@ installing on local SSD. We recommend running the following in a `screen` or
 ```sh
 cd alphafold3 # Navigate to the directory with cloned AlphaFold 3 repository.
-./fetch_databases.sh <DB_DIR>
+./fetch_databases.sh [<DB_DIR>]
 ```

 This script downloads the databases from a mirror hosted on GCS, with all
-versions being the same as used in the AlphaFold 3 paper.
+versions being the same as used in the AlphaFold 3 paper, to the directory
+`<DB_DIR>`. If not specified, the default `<DB_DIR>` is
+`$HOME/public_databases`.

 :ledger: **Note: The download directory `<DB_DIR>` should *not* be a
 subdirectory in the AlphaFold 3 repository directory.** If it is, the Docker
@@ -250,13 +252,14 @@ uniref90_2022_05.fa

 Optionally, after the script finishes, you may want to copy databases to an SSD.
 You can use these two scripts:

-* `src/scripts/gcp_mount_ssd.sh <SSD_MOUNT_PATH>` Mounts and formats an
-  unmounted GCP SSD drive. It will skip the either step if the disk is either
-  already formatted or already mounted. The default `<SSD_MOUNT_PATH>` is
-  `/mnt/disks/ssd`.
-* `src/scripts/copy_to_ssd.sh <DB_DIR> <SSD_DB_DIR>` this will copy as many
-  files that it can fit on to the SSD. The default `<DATABASE_DIR>` is
-  `$HOME/public_databases` and the default `<SSD_DB_DIR>` is
+* `src/scripts/gcp_mount_ssd.sh [<SSD_MOUNT_PATH>]` Mounts and formats an
+  unmounted GCP SSD drive at the specified path. It skips either step if the
+  disk is already formatted or already mounted. The default `<SSD_MOUNT_PATH>`
+  is `/mnt/disks/ssd`.
+* `src/scripts/copy_to_ssd.sh [<DB_DIR>] [<SSD_DB_DIR>]` Copies as many files
+  as it can fit onto the SSD. The default `<DB_DIR>` is
+  `$HOME/public_databases`, which must match the path used in the
+  `fetch_databases.sh` command above, and the default `<SSD_DB_DIR>` is
   `/mnt/disks/ssd/public_databases`.
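The "copy as many files as fit" idea behind `copy_to_ssd.sh` can be illustrated as below. This is only a sketch over temporary directories, not the script's actual implementation; the real script targets `<DB_DIR>` and `<SSD_DB_DIR>`:

```shell
# Illustration only: copy each file to the destination only if it fits in
# the free space reported there (copy_to_ssd.sh may behave differently).
src=$(mktemp -d)
dst=$(mktemp -d)
echo "dummy database contents" > "$src/tiny_db.fa"

for f in "$src"/*; do
  # Free space (KiB) on the destination filesystem.
  free_kb=$(df -Pk "$dst" | awk 'NR==2 {print $4}')
  # File size rounded up to whole KiB.
  size_kb=$(( ($(stat -c %s "$f") + 1023) / 1024 ))
  if [ "$size_kb" -le "$free_kb" ]; then
    cp "$f" "$dst/"
  fi
done
```

Files that do not fit are simply skipped, so the fastest disk holds as much of the data as possible.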
## Obtaining Model Parameters
@@ -268,6 +271,11 @@ business days. You may only use AlphaFold 3 model parameters if received
 directly from Google. Use is subject to these
 [terms of use](https://github.com/google-deepmind/alphafold3/blob/main/WEIGHTS_TERMS_OF_USE.md).
+
+Once access has been granted, download the model parameters to a directory of
+your choosing, referred to as `<MODEL_PARAMETERS_DIR>` in the following
+instructions. As with the databases, this should *not* be a subdirectory in the
+AlphaFold 3 repository directory.

 ## Building the Docker Container That Will Run AlphaFold 3

 Then, build the Docker container. This builds a container with all the right
@@ -277,7 +285,11 @@ python dependencies:
 ```sh
 docker build -t alphafold3 -f docker/Dockerfile .
 ```

-You can now run AlphaFold 3!
+Create an input JSON file, using either the example in the
+[README](https://github.com/google-deepmind/alphafold3?tab=readme-ov-file#installation-and-running-your-first-prediction)
+or a
+[custom input](https://github.com/google-deepmind/alphafold3/blob/main/docs/input.md),
+and place it in a directory, e.g. `$HOME/af_input`. You can now run AlphaFold 3!
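As a sketch, a minimal single-protein input file could be written like this. The name and sequence are placeholders, and the schema shown (`modelSeeds`, `dialect`, `version`, etc.) should be checked against the input documentation linked above:

```shell
# Illustration: write a minimal input JSON to $HOME/af_input. The name and
# sequence are placeholders; consult docs/input.md for the authoritative schema.
mkdir -p "$HOME/af_input"
cat > "$HOME/af_input/fold_input.json" <<'EOF'
{
  "name": "example_protein",
  "modelSeeds": [1],
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
      }
    }
  ],
  "dialect": "alphafold3",
  "version": 1
}
EOF
# Sanity-check that the file parses as JSON.
python3 -m json.tool "$HOME/af_input/fold_input.json" > /dev/null
```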
```sh
docker run -it \
@@ -293,13 +305,27 @@ docker run -it \
     --output_dir=/root/af_output
 ```

 where `$HOME/af_input` is the directory containing the input JSON file;
 `$HOME/af_output` is the directory where the output will be written to; and
 `<DB_DIR>` and `<MODEL_PARAMETERS_DIR>` are the directories containing the
-databases and model parameters.
+databases and model parameters. The values of these directories must match the
+directories used in previous steps for downloading databases and model weights,
+and for the input file.
+
+:ledger: Note: You may also need to create the output directory
+`$HOME/af_output` before running the `docker` command, and to make it and the
+input directory writable from the docker container, e.g. by running
+`chmod 755 $HOME/af_input $HOME/af_output`. In most cases `docker` and
+`run_alphafold.py` will create the output directory if it does not exist.
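The directory setup described in the note amounts to the following commands, using the same paths as the `docker run` example:

```shell
# Create the input and output directories ahead of time and make them
# accessible to the container (paths match the docker run example).
mkdir -p "$HOME/af_input" "$HOME/af_output"
chmod 755 "$HOME/af_input" "$HOME/af_output"
```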
 :ledger: **Note: In the example above the databases have been placed on the
 persistent disk, which is slow.** If you want better genetic and template search
 performance, make sure all databases are placed on a local SSD.

-If you have databases on SSD in `<SSD_DB_DIR>` you can use uses it as the
-location to look for databases but allowing for a multiple fallbacks with
-`--db_dir` which can be specified multiple times.
+If you have some databases on an SSD in the `<SSD_DB_DIR>` directory and some
+databases on a slower disk in the `<DB_DIR>` directory, you can mount both
+directories and specify `--db_dir` multiple times. This will enable fast
+access to databases with a fallback to the larger, slower disk:

 ```sh
 docker run -it \