SMILES-based Transformer Encoder-Decoder (SMILES-TED) is an encoder-decoder model pre-trained on a curated dataset of 91 million SMILES samples sourced from PubChem, equivalent to 4 billion molecular tokens. SMI-TED supports various complex tasks, including quantum property prediction, with two main variants ( 289 M and 8 × 289 M ).
This repository provides a Python-based tool to access the SMI-TED models via a REST API. Once the API is set up, use the OpenAD Toolkit to easily access the inference functions of SMI-TED.
More information on SMI-TED can be found at:
- https://huggingface.co/ibm/materials.smi-ted
- https://github.com/IBM/materials/tree/main
- https://arxiv.org/abs/2407.20267
More information on the OpenAD Toolkit and OpenAD Service Utilities:
- https://accelerate.science/projects/openad
- https://github.com/acceleratedscience/open-ad-toolkit
- https://github.com/acceleratedscience/openad_service_utils
this current version of the code in app.py excludes all other properties other than QM9 properties. you can change this to include QM8 and Molecule Net if checkpoints avsilable by simply changing the selected algorithms list.
Note: All of these deployment options will allow you to access SMI-TED functions using the OpenAD Toolkit. You may want to first install the Toolkit in its own Python environment before proceeding with deploying the SMI-TED utility.
You will need a Python level of 3.11 & to follow these installation directions:
-
Use your favorite Python environment manager (e.g., Conda, Pyenv) to create a new Python 3.11.10 environment
-
Activate the new environment and install the required Python modules for SMI-TED per the "Getting Started" instructions at this site (ignore the requirement to use Python 3.9):
https://github.com/IBM/materials/ -
Install the OpenAD Service Utilities in the new environment with the following command:
pip install git+https://github.com/acceleratedscience/openad_service_utils.git@nested_implementation
-
Clone this repo into a new directory:
git clone https://github.com/acceleratedscience/openad_smi_ted
-
Change directory to
openad_smi_ted
. -
Add the necessary environment variables used by the OpenAD service utilities:
source ./openad_smi_ted_bash_env.sh
-
Start the server with the following command:
python app.py
-
Open a new terminal session.
-
In the new terminal session start the OpenAD Toolkit:
openad
-
At the OpenAD Toolkit command line execute the following command to create a new service for accessing the local server started in step 7:
catalog model service from remote 'http://127.0.0.1:8080/' as sm
-
At the OpenAD Toolkit command line execute the following command to view the available commands:
sm ?
Note: The first time you request a particular property (e.g., "qm8-e1-cam") the OpenAD Service Utility will take some time to download and locally cache the correct models. Requests will proceed much faster once models are cached locally.
Prerequisites: Make sure you have Docker and the Docker Buildx plugin installed on your system.
-
Clone this repo into a new directory:
git clone https://github.com/acceleratedscience/openad_smi_ted
-
To build the Docker image, change directory to
openad_smi_ted
then start the build with the following command:
docker build -t smi-ted-app .
-
After the build is complete, execute the following command to run the container and have the server available on port 8080:
docker run -p 8080:8080 smi-ted-app
-
Open a new terminal session and start the OpenAD Toolkit:
openad
-
At the OpenAD Toolkit command line execute the following command to create a new service for accessing the local server started in step 3:
catalog model service from remote 'http://0.0.0.0:8080/' as sm
-
At the OpenAD Toolkit command line execute the following command to view the available commands:
sm ?
Install Helm Chart
helm install smi-ted ./helm-chart
Start a new build
oc start-build smi-ted-build
Wait for the build to complete
LATEST_BUILD=$(oc get builds | grep 'smi-ted-build-' | awk '{print $1}' | sort -V | tail -n 1)
oc wait --for=condition=Complete build/$LATEST_BUILD --timeout=15m
Run request test (pod may take some time to initialize so curl request may fail. try again.)
curl "http://$(oc get route smi-ted-openad-model -o jsonpath='{.spec.host}')/health"
run on the command line `mkdir -p ~/.openad_models`
Note:
Initially downloading models may take some time, this will be prompted by your first request. To pre-load models you can run the following
mkdir -p ~/.openad_models/properties/molecules && aws s3 sync s3://ad-prod-biomed/molecules/smi_ted/ /tmp/.openad_models/properties/molecules/smi_ted --no-sign-request --exact-timestamps
it does require installing the AWS cli which can be found here..
https://docs.aws.amazon.com/cli/latest/userguide/getting-started-quickstart.html
then using Podman or Docker run the following in the same directory as the compose.yaml file:
(podman or docker) compose create
(podman or docker) compose start
the service will start on poert 8080
change this in the compose file if you wish it to run on another port.
In openad run the following command
catalog model service from remote 'http://127.0.0.1:8080/' as sm
-
The container used is https://quay.io/ibmdpdev/openad_smi_ted:latest
-
You can use the compose.yaml file rather than download the entire repository
https://github.com/acceleratedscience/openad_smi_ted/blob/main/compose.yaml
Support for skypilot on AWS is coming soon