New LLMs and remote service (#48)
* #38 addressed

* #38 problem addressed

* #38 addressed

* integration of models

* update with catalog services

* update with catalog services

* combined properties and generator

* grammar cleanup

* namespacing added

* addressed error checking, tab forward, Help and display formatting

* addressed error checking, tab forward, Help and display formatting

* model service api

* added model mgt services

* model service api updates

* bugfix services

* using inheritance

* working on bug fix for save

* servicer updates

* added skypilot

* Phil's changes

* download models

* pytest tests added

* new tests

* updated servicer

* change default port

* updated capabilities

* updated capabilities

* #42 addresses issue of canonical SMILES being overwritten with the 2D canonical form from PubChem

* testing and fixes

* bugs for launching services

* partially implemented context-sensitive output

* temp feat gpu

* create config from service file

* remove unused grammar

* set default workdir

* version bump servicing

* print config specs

* try to remove service

* rename to cfg

* format config print

* fixes

* skip test until next servicing

* updated tests and servicing

* -c implementation for command line raw

* implemented api

* implemented api

* reduce interval for spinner

* wait for service to ping

* add status

* cast vars

* bug fix

* updates for merge data

* fix cfg file types

* updated tests

* bug fix status

* scaffold for local service

* more fixes

* high speed merge of property output

* improvements to handling data only modes in api

* better status

* major services update

* spinner function hints

* more hints

* add remote service

* added grammar for remote service catalog

* fix NoneType error in returns

* start without gpu

* fix url

* setup pre-commit

* chore format and lint

* added append and fixed messaging

* fixed adding or displaying molecules not on PubChem

* demonstration testing and new notebooks

* fix return_val parameter typo

* llm model update to latest granite

* llm model update to latest granite

* update on doco and demo

* update notebooks and for demonstration

* working remote service defs

* servicing version bump

* remote service working

* chore: linter

* fix logger

* notebook update

* notebook update

* better logging

* remove lower()

* one more log :-)

* change save

* expand user path

* remove sentence transformer

* fix remote fetch

* service grammar instant refresh

* save when necessary

* caching service definitions

* clean up and debug

* update service defs

* use python cache decorator

* reduce sleep

* fix tests

* timeout on catalog query increased from 1 to 3 seconds

* timeout on catalog query increased from 1 to 3 seconds

* increase linter line length

* fix lru cache update

* readme update

* temporary fix for url fetch

* updates

* readme updates

* readme updates

* linter

* corrected Readme and added plugin loader

* updated pnd files

* updated version

* merge

* merge

* merge

* merge

* llm ollama support

* tuning model output and changing embeddings

* finalise on llama3 for ollama and granite-chat for BAM

* readme update for ollama

* Feat auth api (#47)

* authentication grammar

* auth with api key

* chore: sort imports

* auth model inference

* rename headers

* file lock + optimize + api obfuscate

* working proxy

* add headers to proxy

* can add bearer token in USING quotes

---------

Co-authored-by: Brian Duenas <[email protected]>

* llm instructions

* chore: lint

* updated version

---------

Co-authored-by: Phil Downey <[email protected]>
Co-authored-by: Phil Downey <[email protected]>
3 people authored Jun 14, 2024
1 parent 7eed012 commit 0f2b748
Showing 29 changed files with 1,154 additions and 609 deletions.
50 changes: 41 additions & 9 deletions README.md
@@ -39,7 +39,7 @@ The goal of openAD is to provide a common language for scientists to interact wi

---
> **Pre-install Note:**
-For updating to 0.3.0 first remove toolkits `remove toolkit DS4SD` and `remove toolkit RXN` prior to updating
+Before updating to 0.3.0 or above, first remove the toolkits with `remove toolkit DS4SD` and `remove toolkit RXN`.

> **What's New?**
- `%Openadd` has been added to the magic commands to provide pure data-type results for data-returning commands
@@ -367,24 +367,56 @@ To run a command in bash mode, prepend it with `openad` and make sure to escape

# AI Assistant

-To enable our AI assistant, you'll need an account with OpenAI. There is a one month free trial.
+To enable our AI assistant, you'll need access to [IBM BAM](https://bam.res.ibm.com/auth/signin), or you can use a free open-source LLM via [Ollama](https://ollama.com).

-This is available for IBM BAM service and Openai.
+**Note:** Ollama will require an 8 GB GPU.

-> **Note:** watsonx coming soon
+## IBM BAM Setup
+For IBM BAM, simply use your supplied API key if you have BAM access.

-For OpenAI
+### Run BAM LLM
-run `tell me` to be prompted for your OpenAI API credentials
```
>> set llm bam
>> tell me <enter prompt>
```

-1. Go to [platform.openai.com](https://platform.openai.com) and create an account
+## Ollama Setup
+Install Ollama on your platform from [here](https://ollama.com/download).

-2. Click on the profile icon in the top right and choose "View API keys"
+Download the appropriate models:
```
ollama pull llama3:latest
ollama pull nomic-embed-text
```

-3. Create a new key
+Start the server if it's not already running:
```
ollama serve
```
That's it for local usage. If you want to run Ollama remotely, continue on.
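As an optional sanity check (assuming Ollama's default port 11434), you can confirm the local server is responding by listing the installed models:
```
curl http://localhost:11434/api/tags
```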

-4. Run `tell me` to be prompted for your OpenAI API credentials
+### Ollama remote setup with SkyPilot
Check out our configuration file for launching Ollama on SkyPilot: [ollama_setup.yaml](./ollama_setup.yaml)
```
sky serve up ollama_setup.yaml
```

Set up local environment variables:

1. For Windows: `setx OLLAMA_HOST <sky-server-ip>:11434`
2. For Linux and macOS: `export OLLAMA_HOST=<sky-server-ip>:11434`
3. To reset to local usage: `OLLAMA_HOST=0.0.0.0:11434`
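With `OLLAMA_HOST` pointing at the SkyPilot endpoint, the same sanity check works remotely (a sketch; substitute the server IP reported by SkyPilot if the variable isn't set):
```
curl http://$OLLAMA_HOST/api/tags
```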


### Run Ollama with the OpenAD toolkit
> If prompted for an API key and none was set up, just leave it empty.
```
>> set llm ollama
>> tell me <enter prompt>
```
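A typical exchange then looks like this (the prompt below is just an illustration; any natural-language question about the toolkit works):
```
>> set llm ollama
>> tell me how to add a molecule to my working list
```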

<br>

@@ -495,4 +527,4 @@ You will need to restart your Linux session before running `pip install openad`

If you get an error when running `init_magic`, you may first need to set up the default IPython profile for magic commands.

-ipython profile create
+`ipython profile create`
13 changes: 7 additions & 6 deletions _for_testing/Demonstration.ipynb
@@ -280,7 +280,7 @@
}
],
"source": [
"%openad list files "
"%openad list files"
]
},
{
@@ -456,7 +456,7 @@
"metadata": {},
"outputs": [],
"source": [
"result = %openad SEARCH FOR ' Sibofimloc' USING (index_key='pubchem') SHOW (data) return as data\n"
"result = %openad SEARCH FOR ' Sibofimloc' USING (index_key='pubchem') SHOW (data) return as data"
]
},
{
@@ -505,7 +505,7 @@
"metadata": {},
"outputs": [],
"source": [
"target_string=result_mol.get_selection().loc[0][\"SMILES\"]\n",
"target_string = result_mol.get_selection().loc[0][\"SMILES\"]\n",
"display(target_string)"
]
},
@@ -556,8 +556,9 @@
"outputs": [],
"source": [
"import pandas\n",
"lst = ['BrBr.c1ccc2cc3ccccc3cc2c1CCO' ,'Cl.CCC(=O)NCCC.O'] \n",
"batch_list = pandas.DataFrame(lst,columns=['reactions'])\n",
"\n",
"lst = [\"BrBr.c1ccc2cc3ccccc3cc2c1CCO\", \"Cl.CCC(=O)NCCC.O\"]\n",
"batch_list = pandas.DataFrame(lst, columns=[\"reactions\"])\n",
"%openad predict reaction topn in batch from dataframe batch_list using (topn=6)"
]
},
@@ -613,7 +614,7 @@
"metadata": {},
"outputs": [],
"source": [
"%openad predict reaction in batch from list ['BrBr.c1ccc2cc3ccccc3cc2c1CCO' ,'Cl.CCC(=O)NCCC.O'] "
"%openad predict reaction in batch from list ['BrBr.c1ccc2cc3ccccc3cc2c1CCO' ,'Cl.CCC(=O)NCCC.O']"
]
},
{
13 changes: 7 additions & 6 deletions _for_testing/test_everything.ipynb
@@ -23,19 +23,20 @@
"from IPython.display import Markdown\n",
"from openad.app.global_var_lib import _repo_dir\n",
"\n",
"os.chdir(_repo_dir + '/../../') # todo: _repo_dir should point to repo folder, not app\n",
"os.chdir(_repo_dir + \"/../../\") # todo: _repo_dir should point to repo folder, not app\n",
"os.getcwd()\n",
"\n",
"success = %openad import from '_for_testing/runs' to '_runs'\n",
"success_str = success.data\n",
"if 'Import failed' in success_str:\n",
" print('Please rename or delete your current `_runs` folder first.')\n",
"if \"Import failed\" in success_str:\n",
" print(\"Please rename or delete your current `_runs` folder first.\")\n",
"else:\n",
" print('Successfully copied over _runs folder to your workspace.')\n",
" print(\"Successfully copied over _runs folder to your workspace.\")\n",
"\n",
"\n",
"# To display separators\n",
"def sep():\n",
" print('\\n\\n\\n' + '- '*60 + '\\n\\n\\n')"
" print(\"\\n\\n\\n\" + \"- \" * 60 + \"\\n\\n\\n\")"
]
},
{
@@ -145,7 +146,7 @@
"%openad set context rxn\n",
"%openad run help_tk_rxn\n",
"\n",
"#ST4SD\n",
"# ST4SD\n",
"%openad set context st4sd\n",
"%openad run help_tk_st4sd\n",
"\n",
62 changes: 62 additions & 0 deletions ollama_setup.yaml
@@ -0,0 +1,62 @@
envs:
MODEL_NAME: llama3 # mistral, phi, other ollama supported models
EMBEDDINGS_MODEL_NAME: nomic-embed-text # mistral, phi, other ollama supported models
OLLAMA_HOST: 0.0.0.0:8888 # Host and port for Ollama to listen on

resources:
cpus: 8+
memory: 16+ # 8 GB+ for 7B models, 16 GB+ for 13B models, 32 GB+ for 33B models
accelerators: V100:1 # No GPUs necessary for Ollama, but you can use them to run inference faster
ports: 8888

service:
replicas: 2
# An actual request for readiness probe.
readiness_probe:
path: /v1/chat/completions
post_data:
model: $MODEL_NAME
messages:
- role: user
content: Hello! What is your name?
max_tokens: 1

setup: |
# Install Ollama
if [ "$(uname -m)" == "aarch64" ]; then
# For apple silicon support
sudo curl -L https://ollama.com/download/ollama-linux-arm64 -o /usr/bin/ollama
else
sudo curl -L https://ollama.com/download/ollama-linux-amd64 -o /usr/bin/ollama
fi
sudo chmod +x /usr/bin/ollama
# Start `ollama serve` and capture PID to kill it after pull is done
ollama serve &
OLLAMA_PID=$!
# Wait for ollama to be ready
IS_READY=false
for i in {1..20};
do ollama list && IS_READY=true && break;
sleep 5;
done
if [ "$IS_READY" = false ]; then
echo "Ollama was not ready after 100 seconds. Exiting."
exit 1
fi
# Pull the model
ollama pull $EMBEDDINGS_MODEL_NAME
echo "Model $EMBEDDINGS_MODEL_NAME pulled successfully."
# Pull the model
ollama pull $MODEL_NAME
echo "Model $MODEL_NAME pulled successfully."
# Kill `ollama serve` after pull is done
kill $OLLAMA_PID
run: |
# Run `ollama serve` in the foreground
echo "Serving model $MODEL_NAME"
ollama serve
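Once the service is up, you can check on it and tear it down when finished (a sketch; the service name is whatever `sky serve status` reports):
```
sky serve status
sky serve down <service-name>
```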
3 changes: 3 additions & 0 deletions openad/app/main.py
@@ -201,6 +201,9 @@ def __init__(self, completekey="Tab", api=False):
if self.settings["env_vars"]["refresh_help_ai"] is True:
self.refresh_vector = True
self.refresh_train = True
else:
self.refresh_vector = False
self.refresh_train = False
except Exception: # pylint: disable=broad-exception-caught # if LLM not initiated move on
pass
# Try to load variables for llm. If missing, just pass and move on.
16 changes: 16 additions & 0 deletions openad/app/main_lib.py
@@ -22,6 +22,11 @@
service_up,
local_service_up,
model_service_config,
add_service_auth_group,
remove_service_auth_group,
attach_service_auth_group,
detach_service_auth_group,
list_auth_services,
)

# molecules
@@ -171,12 +176,23 @@ def lang_parse(cmd_pointer, parser):
return local_service_up(cmd_pointer, parser)
elif parser.getName() == "service_down":
return service_down(cmd_pointer, parser)
elif parser.getName() == "add_service_auth_group":
return add_service_auth_group(cmd_pointer, parser)
elif parser.getName() == "remove_service_auth_group":
return remove_service_auth_group(cmd_pointer, parser)
elif parser.getName() == "attach_service_auth_group":
return attach_service_auth_group(cmd_pointer, parser)
elif parser.getName() == "detach_service_auth_group":
return detach_service_auth_group(cmd_pointer, parser)
elif parser.getName() == "list_auth_services":
return list_auth_services(cmd_pointer, parser)

# Language Model How To
elif parser.getName() == "how_do_i":
result = how_do_i(cmd_pointer, parser)
if result is False:
return False
cmd_pointer.settings["env_vars"]["refresh_help_ai"] = False
update_main_registry_env_var(cmd_pointer, "refresh_help_ai", False)
write_registry(cmd_pointer.settings, cmd_pointer)
return result
2 changes: 1 addition & 1 deletion openad/app/metadata.json
@@ -1,7 +1,7 @@
{
"banner": "OPENAD",
"title": "Welcome to the Open Accelerated Discovery CLI",
"version": "0.3.1",
"version": "0.3.2",
"commands": {
"intro": "If this is your first time, let us help you getting started.",
"? toolkits": "Learn about the toolkits.",
6 changes: 4 additions & 2 deletions openad/core/grammar.py
@@ -1296,7 +1296,8 @@ def output_train_statements(cmd_pointer):
Load: load a file from project directory to Target system
pyparsing_statement: a statement defined using pyparsing for the domain specific language
help_text: description of the Domain Specific language statement defined in a pyparsing_statement
-toolkit: these are contextual plugins that are available one at a time for providing specific functionality to the user. Valid toolkits are DS4SD (deepSearch), RXN (retro synthesis), ST4SD(simulation toolkit)
+toolkit: these are contextual plugins that are available one at a time for providing specific functionality to the user. Valid toolkits are DS4SD (Deep Search), RXN (retrosynthesis), ST4SD (simulation toolkit)
+The Deep Search and RXN toolkits have separate help outlining the specific commands available to the user
History: History of DSL commands for a given Workspace
run: list of sequential commands saved by the user
working list: is a set of molecules in memory that can be added to using the 'add molecule' command and also loaded from a molecule-set and manipulated by commands such as 'display molecule', 'add molecule', 'create molecule', 'remove molecule', 'merge mol-set'
@@ -1309,9 +1310,10 @@
molecule-set: a molecule-set is a copy of a working list of molecules that has been stored on disk under a molecule-set name and can be loaded into the working list of molecules in a user's session
The short form of 'molecule-set' is 'molset'
The short form of 'molecule' is 'mol'
The Model Service is a capability for registering and launching model services for property prediction and data-set generation; you can launch services you have cataloged yourself, or remotely catalog services that are already running.
If a user asks for parameters or options this refers to the parameters that can be given to a function. Make sure all parameters are provided to the user
The Following commands are used to work with working list of molecules:
- add molecule <name> | <smiles> | <inchi> | <inchikey> | <cid> [as '<name>' ] [ basic ] [force ]
7 changes: 5 additions & 2 deletions openad/llm_assist/llm_interface.py
@@ -14,14 +14,13 @@
from openad.app.global_var_lib import _meta_dir
from openad.helpers.credentials import load_credentials


from openad.helpers.credentials import write_credentials, get_credentials
from openad.app.global_var_lib import GLOBAL_SETTINGS


# Constants
TRAINING_LLM_DIR = "/prompt_train/"
SUPPORTED_LLMS = ["WATSONX", "OPENAI", "BAM"]
SUPPORTED_LLMS = ["WATSONX", "OPENAI", "BAM", "OLLAMA"]
PROMPT_DIR = "~/.chat_embedded"
STANDARD_FILE_TYPES_EMBED = ["*.txt", "*.ipynb", "*.run", "*.cdoc", "*.pdf"]
EXTENDED_FILE_TYPES_EMBED = ["**/*.txt", "**/*.ipynb", "**/*.run", "**/*.cdoc", "**/*.pdf"]
@@ -200,6 +199,10 @@ def clean_up_llm_text(cmd_pointer, old_text):
text = re.sub(r"\`([a-z]*[\s\S]*?)\`", r" <cmd>\1</cmd> ", text)
text = re.sub(r"\`(\n*?)(\s*?)(\%*?)([a-z]\n*[\s\S]*?)(\n*?)(\s*?)\`", r" <cmd>\3\4</cmd> ", text)
text = text.replace("<br>", "\n")
text = text.replace("&lt;", "<")
text = text.replace("&gt;", ">")
text = text.replace("<cmd><cmd>", "<cmd>")
text = text.replace("</cmd></cmd>", "</cmd>")

# nuance of llm instructed to use markdown

