Expand documentation around running run_alphafold.py

PiperOrigin-RevId: 704634652
Author: James Spencer
Date: 2024-12-10 15:45:44 +00:00
Parent: 9b59a237a8
Commit: 2e562355ba


@@ -212,11 +212,13 @@ installing on local SSD. We recommend running the following in a `screen` or
 ```sh
 cd alphafold3 # Navigate to the directory with cloned AlphaFold 3 repository.
-./fetch_databases.sh <DB_DIR>
+./fetch_databases.sh [<DB_DIR>]
 ```

 This script downloads the databases from a mirror hosted on GCS, with all
-versions being the same as used in the AlphaFold 3 paper.
+versions being the same as used in the AlphaFold 3 paper, to the directory
+`<DB_DIR>`. If not specified, the default `<DB_DIR>` is
+`$HOME/public_databases`.

 :ledger: **Note: The download directory `<DB_DIR>` should *not* be a
 subdirectory in the AlphaFold 3 repository directory.** If it is, the Docker
@@ -250,13 +252,14 @@ uniref90_2022_05.fa

 Optionally, after the script finishes, you may want to copy databases to an SSD.
 You can use these two scripts:

-* `src/scripts/gcp_mount_ssd.sh <SSD_MOUNT_PATH>` Mounts and formats an
-  unmounted GCP SSD drive. It will skip the either step if the disk is either
-  already formatted or already mounted. The default `<SSD_MOUNT_PATH>` is
-  `/mnt/disks/ssd`.
-* `src/scripts/copy_to_ssd.sh <DB_DIR> <SSD_DB_DIR>` this will copy as many
-  files that it can fit on to the SSD. The default `<DATABASE_DIR>` is
-  `$HOME/public_databases` and the default `<SSD_DB_DIR>` is
+* `src/scripts/gcp_mount_ssd.sh [<SSD_MOUNT_PATH>]` Mounts and formats an
+  unmounted GCP SSD drive at the specified path. It skips either step if the
+  disk is already formatted or already mounted. The default `<SSD_MOUNT_PATH>`
+  is `/mnt/disks/ssd`.
+* `src/scripts/copy_to_ssd.sh [<DB_DIR>] [<SSD_DB_DIR>]` Copies as many files
+  as it can fit onto the SSD. The default `<DB_DIR>` is
+  `$HOME/public_databases`, which must match the path used in the
+  `fetch_databases.sh` command above, and the default `<SSD_DB_DIR>` is
   `/mnt/disks/ssd/public_databases`.
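The "copy as many files as fit" idea behind `copy_to_ssd.sh` can be illustrated as below. This is only a sketch over temporary directories, not the script's actual implementation; the real script targets `<DB_DIR>` and `<SSD_DB_DIR>`:

```shell
# Illustration only: copy each file to the destination only if it fits in
# the free space reported there (copy_to_ssd.sh may behave differently).
src=$(mktemp -d)
dst=$(mktemp -d)
echo "dummy database contents" > "$src/tiny_db.fa"

for f in "$src"/*; do
  # Free space (KiB) on the destination filesystem.
  free_kb=$(df -Pk "$dst" | awk 'NR==2 {print $4}')
  # File size rounded up to whole KiB.
  size_kb=$(( ($(stat -c %s "$f") + 1023) / 1024 ))
  if [ "$size_kb" -le "$free_kb" ]; then
    cp "$f" "$dst/"
  fi
done
```

Files that do not fit are simply skipped, so the fastest disk holds as much of the data as possible.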
## Obtaining Model Parameters
@@ -268,6 +271,11 @@ business days. You may only use AlphaFold 3 model parameters if received
 directly from Google. Use is subject to these
 [terms of use](https://github.com/google-deepmind/alphafold3/blob/main/WEIGHTS_TERMS_OF_USE.md).
+
+Once access has been granted, download the model parameters to a directory of
+your choosing, referred to as `<MODEL_PARAMETERS_DIR>` in the following
+instructions. As with the databases, this should *not* be a subdirectory in the
+AlphaFold 3 repository directory.

 ## Building the Docker Container That Will Run AlphaFold 3

 Then, build the Docker container. This builds a container with all the right
@@ -277,7 +285,11 @@ python dependencies:
 ```sh
 docker build -t alphafold3 -f docker/Dockerfile .
 ```

-You can now run AlphaFold 3!
+Create an input JSON file, using either the example in the
+[README](https://github.com/google-deepmind/alphafold3?tab=readme-ov-file#installation-and-running-your-first-prediction)
+or a
+[custom input](https://github.com/google-deepmind/alphafold3/blob/main/docs/input.md),
+and place it in a directory, e.g. `$HOME/af_input`. You can now run AlphaFold 3!
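As a sketch, a minimal single-protein input file could be written like this. The name and sequence are placeholders, and the schema shown (`modelSeeds`, `dialect`, `version`, etc.) should be checked against the input documentation linked above:

```shell
# Illustration: write a minimal input JSON to $HOME/af_input. The name and
# sequence are placeholders; consult docs/input.md for the authoritative schema.
mkdir -p "$HOME/af_input"
cat > "$HOME/af_input/fold_input.json" <<'EOF'
{
  "name": "example_protein",
  "modelSeeds": [1],
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
      }
    }
  ],
  "dialect": "alphafold3",
  "version": 1
}
EOF
# Sanity-check that the file parses as JSON.
python3 -m json.tool "$HOME/af_input/fold_input.json" > /dev/null
```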
```sh
docker run -it \
@@ -293,13 +305,27 @@ docker run -it \
     --output_dir=/root/af_output
 ```

 where `$HOME/af_input` is the directory containing the input JSON file;
 `$HOME/af_output` is the directory where the output will be written to; and
 `<DB_DIR>` and `<MODEL_PARAMETERS_DIR>` are the directories containing the
-databases and model parameters.
+databases and model parameters. The values of these directories must match the
+directories used in previous steps for downloading databases and model weights,
+and for the input file.
+
+:ledger: Note: You may also need to create the output directory
+`$HOME/af_output` before running the `docker` command, and to make it and the
+input directory writable from the docker container, e.g. by running
+`chmod 755 $HOME/af_input $HOME/af_output`. In most cases `docker` and
+`run_alphafold.py` will create the output directory if it does not exist.
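The directory setup described in the note amounts to the following commands, using the same paths as the `docker run` example:

```shell
# Create the input and output directories ahead of time and make them
# accessible to the container (paths match the docker run example).
mkdir -p "$HOME/af_input" "$HOME/af_output"
chmod 755 "$HOME/af_input" "$HOME/af_output"
```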
 :ledger: **Note: In the example above the databases have been placed on the
 persistent disk, which is slow.** If you want better genetic and template search
 performance, make sure all databases are placed on a local SSD.

-If you have databases on SSD in `<SSD_DB_DIR>` you can use uses it as the
-location to look for databases but allowing for a multiple fallbacks with
-`--db_dir` which can be specified multiple times.
+If you have some databases on an SSD in the `<SSD_DB_DIR>` directory and some
+databases on a slower disk in the `<DB_DIR>` directory, you can mount both
+directories and specify `--db_dir` multiple times. This will enable fast
+access to databases with a fallback to the larger, slower disk:

 ```sh
 docker run -it \