SMI-TED Inference for SMILES

License MIT Code style: black Docs
Linux macOS Python

About:

SMILES-based Transformer Encoder-Decoder (SMI-TED) is an encoder-decoder model pre-trained on a curated dataset of 91 million SMILES samples sourced from PubChem, equivalent to 4 billion molecular tokens. SMI-TED supports various complex tasks, including quantum property prediction, and comes in two main variants (289M and 8×289M).

This repository provides a Python-based tool to access the SMI-TED models via a REST API. Once the API is set up, use the OpenAD Toolkit to easily access the inference functions of SMI-TED.

More information on SMI-TED can be found at:

More information on the OpenAD Toolkit and OpenAD Service Utilities:


Note:

The current version of the code in app.py exposes only the QM9 properties. To include QM8 and MoleculeNet properties, provided their checkpoints are available, change the selected algorithms list.

Deployment Options

Note: All of these deployment options will allow you to access SMI-TED functions using the OpenAD Toolkit. You may want to first install the Toolkit in its own Python environment before proceeding with deploying the SMI-TED utility.


Deployment locally using a Python virtual environment


You will need Python 3.11 to follow these installation directions:

  1. Use your favorite Python environment manager (e.g., Conda, Pyenv) to create a new Python 3.11.10 environment

  2. Activate the new environment and install the required Python modules for SMI-TED per the "Getting Started" instructions at this site (ignore the requirement to use Python 3.9):
    https://github.com/IBM/materials/

  3. Install the OpenAD Service Utilities in the new environment with the following command:
    pip install git+https://github.com/acceleratedscience/openad_service_utils.git@nested_implementation

  4. Clone this repo into a new directory:
    git clone https://github.com/acceleratedscience/openad_smi_ted

  5. Change directory to openad_smi_ted.

  6. Add the necessary environment variables used by the OpenAD service utilities:
    source ./openad_smi_ted_bash_env.sh

  7. Start the server with the following command:
    python app.py

  8. Open a new terminal session.

  9. In the new terminal session start the OpenAD Toolkit:
    openad

  10. At the OpenAD Toolkit command line execute the following command to create a new service for accessing the local server started in step 7:
    catalog model service from remote 'http://127.0.0.1:8080/' as sm

  11. At the OpenAD Toolkit command line execute the following command to view the available commands:
    sm ?

Note: The first time you request a particular property (e.g., "qm8-e1-cam") the OpenAD Service Utility will take some time to download and locally cache the correct models. Requests will proceed much faster once models are cached locally.
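Because the server started in step 7 may need a moment before it answers, a small readiness check can be run before cataloging the service. The following is a minimal sketch, not part of this repository: it assumes the local server exposes the same `/health` path that the OpenShift request test uses, and `wait_for_health` and its parameters are hypothetical names chosen for illustration.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url, attempts=30, delay=2.0, opener=urllib.request.urlopen):
    """Poll `url` until it answers with HTTP 200 or the attempts run out.

    `opener` is injectable so the function can be exercised without a server.
    """
    for attempt in range(attempts):
        try:
            with opener(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not reachable yet; fall through and retry
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Example call (assumed URL: local server from step 7 plus the /health path):
# wait_for_health("http://127.0.0.1:8080/health")
```

If the call returns `True`, the server is accepting requests and the `catalog model service` command in step 10 should be able to reach it.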

Deployment locally via container


Prerequisites: Make sure you have Docker and the Docker Buildx plugin installed on your system.

  1. Clone this repo into a new directory:
    git clone https://github.com/acceleratedscience/openad_smi_ted

  2. To build the Docker image, change directory to openad_smi_ted then start the build with the following command:
    docker build -t smi-ted-app .

  3. After the build is complete, execute the following command to run the container and have the server available on port 8080:
    docker run -p 8080:8080 smi-ted-app

  4. Open a new terminal session and start the OpenAD Toolkit:
    openad

  5. At the OpenAD Toolkit command line execute the following command to create a new service for accessing the local server started in step 3:
    catalog model service from remote 'http://127.0.0.1:8080/' as sm

  6. At the OpenAD Toolkit command line execute the following command to view the available commands:
    sm ?
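If port 8080 is already taken on the host, the container can be published on a different host port, and the service URL given to the OpenAD Toolkit has to change to match. The sketch below shows how the two commands relate. The helper names are hypothetical, the container is assumed to keep listening on 8080 internally, and the URL uses 127.0.0.1 as in the virtual-environment instructions.

```python
import shlex


def docker_run_command(host_port=8080, image="smi-ted-app", container_port=8080):
    """Return the argv for `docker run`, publishing container_port on host_port."""
    return ["docker", "run", "-p", f"{host_port}:{container_port}", image]


def catalog_command(host_port=8080, name="sm"):
    """Return the OpenAD Toolkit command that registers the local service."""
    return f"catalog model service from remote 'http://127.0.0.1:{host_port}/' as {name}"


# Print both commands for a server published on host port 9090.
print(shlex.join(docker_run_command(host_port=9090)))
print(catalog_command(host_port=9090))
```

Only the host side of the `-p` mapping changes; the image name and the container port stay as in the steps above.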

Deployment on OpenShift

Install Helm Chart

helm install smi-ted ./helm-chart

Start a new build

oc start-build smi-ted-build

Wait for the build to complete

LATEST_BUILD=$(oc get builds | grep 'smi-ted-build-' | awk '{print $1}' | sort -V | tail -n 1)

oc wait --for=condition=Complete build/$LATEST_BUILD --timeout=15m

Run a request test. The pod may take some time to initialize, so the curl request may fail at first; if it does, try again.

curl "http://$(oc get route smi-ted-openad-model -o jsonpath='{.spec.host}')/health"
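The `sort -V | tail -n 1` part of the `LATEST_BUILD` pipeline matters because build names end in a counter, and a plain string sort would rank `smi-ted-build-9` above `smi-ted-build-10`. The following sketch reproduces that selection in Python to make the logic explicit. `latest_build` is a hypothetical helper, and the build names are made-up sample data rather than real cluster output.

```python
import re


def latest_build(names, prefix="smi-ted-build-"):
    """Return the build name with the highest numeric suffix, or None."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)$")
    numbered = []
    for name in names:
        match = pattern.match(name)
        if match:
            # Compare on the integer suffix, not on the raw string.
            numbered.append((int(match.group(1)), name))
    return max(numbered)[1] if numbered else None


builds = ["smi-ted-build-9", "smi-ted-build-10", "smi-ted-build-2"]
print(latest_build(builds))  # numeric comparison selects smi-ted-build-10
print(sorted(builds)[-1])    # plain string sort would select smi-ted-build-9
```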

Deployment locally via Compose


On the command line, run `mkdir -p ~/.openad_models`

Note:
The initial model download, which is triggered by your first request, may take some time. To pre-load the models, you can run the following:

mkdir -p ~/.openad_models/properties/molecules && aws s3 sync s3://ad-prod-biomed/molecules/smi_ted/ /tmp/.openad_models/properties/molecules/smi_ted --no-sign-request --exact-timestamps
This requires the AWS CLI; installation instructions can be found here:

https://docs.aws.amazon.com/cli/latest/userguide/getting-started-quickstart.html
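Note that the pre-load one-liner above creates a directory under `~/.openad_models` but syncs into `/tmp/.openad_models`; which of the two the compose file mounts is not stated here. As a minimal sketch, assuming both steps are meant to use the same cache root, the commands can be derived from a single path so they cannot drift apart. `preload_command` is a hypothetical helper; the bucket path and flags are copied from the command above.

```python
import shlex
from pathlib import Path

# Bucket path taken from the pre-load command above.
S3_SOURCE = "s3://ad-prod-biomed/molecules/smi_ted/"


def preload_command(cache_root):
    """Return (target_dir, argv) for syncing the SMI-TED models into cache_root."""
    target = Path(cache_root) / "properties" / "molecules" / "smi_ted"
    argv = [
        "aws", "s3", "sync", S3_SOURCE, str(target),
        "--no-sign-request", "--exact-timestamps",
    ]
    return target, argv


# Assumption: the cache root is ~/.openad_models; adjust if your setup differs.
target, argv = preload_command(Path.home() / ".openad_models")
print(f"mkdir -p {shlex.quote(str(target))}")
print(shlex.join(argv))
```

The script only prints the two commands; check the printed paths against the volume configured in compose.yaml before running them.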

Then, using Podman or Docker, run the following in the same directory as the compose.yaml file:

Step 1:

(podman or docker) compose create

Step 2:

(podman or docker) compose start

The service will start on port 8080. Change this in the compose file if you want it to run on another port.

Step 3:

In OpenAD, run the following command:

catalog model service from remote 'http://127.0.0.1:8080/' as sm

Notes

https://github.com/acceleratedscience/openad_smi_ted/blob/main/compose.yaml
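Before starting the Compose service, it can help to confirm that nothing else is listening on the port it will use. The following is a minimal sketch using only the Python standard library; `port_is_free` is a hypothetical helper and 8080 is the default port mentioned in this section.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds, i.e. the port is in use.
        return probe.connect_ex((host, port)) != 0


if port_is_free(8080):
    print("port 8080 is available")
else:
    print("port 8080 is in use; pick another port in the compose file")
```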

Deployment via SkyPilot


Support for SkyPilot on AWS is coming soon.