Skip to content

7.2. DeepDriveMD

lukemartinlogan edited this page Aug 29, 2023 · 46 revisions

Dependencies

You can setup environments two ways

Prepare Conda Environment from Config Files

1. Prepare Conda

Get the miniconda3 installation script and run it

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh.sh

The current conda version tested work that works conda 23.3.1.

2. First git clone this repo and save it to $DDMD_PATH

export CONDA_OPENMM=openmm7_ddmd
export CONDA_PYTORCH=ddmd_pytorch
export DDMD_PATH=${PWD}/deepdrivemd
export MOLECULES_PATH=$DDMD_PATH/submodules/molecules
git clone --recursive https://github.com/candiceT233/deepdrivemd_pnnl.git $DDMD_PATH
cd $DDMD_PATH

3. Create the two conda environments

Name your two environment names $CONDA_OPENMM $CONDA_PYTORCH.

cd $DDMD_PATH
conda env create -f ${DDMD_PATH}/docs/conda_env/ddmd_openmm7.yaml --name=${CONDA_OPENMM}
conda env create -f ${DDMD_PATH}/docs/conda_env/ddmd_pytorch.yaml --name=${CONDA_PYTORCH}

If mdtools fails to install, that's ok. It will be handled in step 4.

4. Update python packages in both conda environments

Update CONDA_OPENMM

source activate $CONDA_OPENMM
cd $DDMD_PATH/submodules/MD-tools
pip install .
cd $DDMD_PATH/submodules/molecules
pip install .
conda deactivate

Update CONDA_PYTORCH

source activate $CONDA_PYTORCH
cd $DDMD_PATH/submodules/MD-tools
pip install .
cd $DDMD_PATH/submodules/molecules
pip install .
conda deactivate

Hermes Dependencies

In Ares:

module load hermes/pnnl-tz3s7yx

this automatically loads the Hermes build with VFD, and it's HDF5 dependency.

Personal Machine:

If building Hermes yourself:

  • Sequential HDF5 >= 1.14.0
  • Hermes>=1.0 with VFD and POSIX Adaptor support

Build HDF5

scspkg create hdf5
cd `scspkg pkg-src hdf5`
git clone https://github.com/HDFGroup/hdf5.git -b hdf5_1_14_0
cd hdf5
mkdir build
cd build
cmake ../ -DHDF5_BUILD_HL_LIB=ON -DCMAKE_INSTALL_PREFIX=`scspkg pkg-root hdf5`
make -j8
make install

Install Hermes with Custom HDF5

NOTE: this only needs to be done for the CONDA_OPENMM environment, since both environment use the same exact python version. HDF5 will be compiled the same. However, these commands must be executed before source active $CONDA_PYTORCH to avoid overriding the python version.

Installation

  • h5py==3.8.0 is required for hdf5-1.14.0 and Hermes>=1.0
  • pip install h5py==3.8.0 should be run after deepdrivemd installation due to version restriction with pip
  • makesure you have hdf5-1.14.0 installed and added to $PATH before installing h5py (otherwise it will download hdf5-1.12.0 by default)
module load hdf5

cd $DDMD_PATH
source activate $CONDA_OPENMM
pip install -e .
pip uninstall h5py; pip install h5py==3.8.0
conda deactivate

source activate $CONDA_PYTORCH
pip install -e .
pip uninstall h5py; pip install h5py==3.8.0
conda deactivate

Usage

Below describes running one iteration of the 4-stages pipeline.
Set up experiment path in $EXPERIMENT_PATH, this will store all output files and log files from all stages.

EXPERIMENT_PATH=~/ddmd_runs
mkdir -p $EXPERIMENT_PATH

Stage 1 : OPENMM

Run code:

source activate $CONDA_OPENMM

PYTHONPATH=$DDMD_PATH:$MOLECULES_PATH python $DDMD_PATH/deepdrivemd/sim/openmm/run_openmm.py -c $YAML_PATH/molecular_dynamics_stage_test.yaml

This stage runs simulation, minimally you have to run 12 simulation tasks for stage 3 & 4 to work. So you must run the above command at least 12 times and each time with a different TASK_IDX_FORMAT.

Environment variables note

  • TASK_IDX_FORMAT : give a different task ID format for each openmm task, starts with task0000 up to task0011 for 12 tasks.
  • SIM_LENGTH : The simulation size, must be at least 0.1 for stage 3 & 4 to work.
  • GPU_IDX : set it to 0 since GPU is not used
  • YAML_PATH : The yaml file that contains the test configuration for the first stage

Setup environment variables and paths

SIM_LENGTH=0.1
GPU_IDX=0
TASK_IDX_FORMAT="task0000"
STAGE_IDX=0
OUTPUT_PATH=$EXPERIMENT_PATH/molecular_dynamics_runs/stage0000/$TASK_IDX_FORMAT
YAML_PATH=$DDMD_PATH/test/bba
mkdir -p $OUTPUT_PATH

In the yaml file molecular_dynamics_stage_test.yaml, makesure to modify the following fields accordingly:

experiment_directory: $EXPERIMENT_PATH
stage_idx: $STAGE_IDX
output_path: $OUTPUT_PATH
pdb_file: $DDMD_PATH/data/bba/system/1FME-unfolded.pdb
initial_pdb_dir: $DDMD_PATH/data/bba
simulation_length_ns: $SIM_LENGTH
reference_pdb_file: $DDMD_PATH/data/bba/1FME-folded.pdb
gpu_idx: $GPU_IDX

Sample output under one task folder (total 12 tasks folders):

ls -l $OUTPUT_PATH
-rw-rw-r-- 1 username username  722 Aug 11 01:08 aggregate_stage_test.yaml
-rw-rw-r-- 1 username username  786 Aug 10 21:50 molecular_dynamics_stage_test.yaml
-rw-rw-r-- 1 username username 599K Aug 10 21:56 stage0000_task0000.dcd
-rw-rw-r-- 1 username username 164K Aug 10 21:56 stage0000_task0000.h5
-rw-rw-r-- 1 username username  39K Aug 10 21:50 system__1FME-unfolded.pdb

Stage 2 : AGGREGATE

Run code:

source activate $CONDA_OPENMM

PYTHONPATH=$DDMD_PATH/ python $DDMD_PATH/deepdrivemd/aggregation/basic/aggregate.py -c $YAML_PATH/aggregate_stage_test.yaml

This stage only need to be run one time, it aggregates all the stage0000_task0000.h5 files from simulation into a single aggregated.h5 file.

Setup a different output path to the first openmm task folder:

OUTPUT_PATH=$EXPERIMENT_PATH/machine_learning_runs/stage0000/task0000

mkdir -p $OUTPUT_PATH

In the yaml file aggregate_stage_test.yaml, makesure to modify the following fields accordingly:

experiment_directory: $EXPERIMENT_PATH
stage_idx: $STAGE_IDX
pdb_file: $DDMD_PATH/data/bba/system/1FME-unfolded.pdb
reference_pdb_file: $DDMD_PATH/data/bba/1FME-folded.pdb

Expected output:

ls -l $OUTPUT_PATH | grep aggregated
-rw-rw-r-- 1 username username 1.6M Aug 11 01:08 aggregated.h5

Stage 3 : TRAINING

Run code:

source activate $CONDA_PYTORCH

PYTHONPATH=$DDMD_PATH/:$MOLECULES_PATH python $DDMD_PATH/deepdrivemd/models/aae/train.py -c $YAML_PATH/training_stage_test.yaml

When the code run, python might show warning messages that can be ignored.

Setup a different output path:

OUTPUT_PATH=$EXPERIMENT_PATH/machine_learning_runs/stage000$STAGE_IDX/$TASK_IDX_FORMAT

mkdir -p $OUTPUT_PATH

In the yaml file training_stage_test.yaml, makesure to modify the following fields accordingly:

experiment_directory: $EXPERIMENT_PATH
output_path: $OUTPUT_PATH

Expected output:

ls -l $OUTPUT_PATH
drwxrwxr-x 2 username username 4.0K Aug 11 01:09 checkpoint
-rw-rw-r-- 1 username username 1.5M Aug 11 01:10 discriminator-weights.pt
drwxrwxr-x 2 username username 4.0K Aug 11 01:10 embeddings
-rw-rw-r-- 1 username username 2.0M Aug 11 01:10 encoder-weights.pt
-rw-rw-r-- 1 username username 2.7M Aug 11 01:10 generator-weights.pt
-rw-rw-r-- 1 username username 1.2K Aug 11 01:10 loss.json
-rw-rw-r-- 1 username username  495 Aug 11 01:08 model-hparams.json
-rw-rw-r-- 1 username username   82 Aug 11 01:08 optimizer-hparams.json
-rw-rw-r-- 1 username username  884 Aug 11 01:08 training_stage_test.yaml
-rw-rw-r-- 1 username username 1.3K Aug 11 01:08 virtual-h5-metadata.json
-rw-rw-r-- 1 username username  10K Aug 11 01:08 virtual_stage0000_task0000.h5

Stage 4 : INFERENCE

Run code:

source activate $CONDA_PYTORCH

OMP_NUM_THREADS=4 PYTHONPATH=$DDMD_PATH/:$MOLECULES_PATH python $DDMD_PATH/deepdrivemd/agents/lof/lof.py -c $YAML_PATH/inference_stage_test.yaml

OMP_NUM_THREADS can be changed.

Update environment variables:

STAGE_IDX=3

OUTPUT_PATH=$EXPERIMENT_PATH/inference_runs/stage0000/$TASK_IDX_FORMAT

mkdir -p $OUTPUT_PATH

In the yaml file inference_stage_test.yaml, makesure to modify the following fields accordingly:

experiment_directory: $EXPERIMENT_PATH
stage_idx: $STAGE_IDX
output_path: $OUTPUT_PATH

Expected output files:

ls -l $OUTPUT_PATH
-rw-rw-r-- 1 username username  479 Aug 11 01:10 inference_stage_test.yaml
-rw-rw-r-- 1 username username 1.5K Aug 11 01:10 virtual-h5-metadata.json
-rw-rw-r-- 1 username username  18K Aug 11 01:10 virtual_stage0003_task0000.h5
Clone this wiki locally