Merge pull request #23 from alkem-io/develop

* fully refactored code

* updated outstanding tasks in README

* fix ingestion and make file paths fully configurable

* update docker compose file to add rabbitmq server and persist vector database

* Update .openai-template.env

* add llm usage and cost info

* updates after debugging session

* make 'docker on Windows'-proof and update docker compose file

* refactor query chain so a chatLLM is used for the query and 'normal' llm for condensing the question

* improve language capabilities

* updated generate website script to run on windows - no dash in filename, absolute path for hugo; additional logging

* fix additional path issue

* moved from shell script to python for the generation

* fixed additional path issues; reverted order in app.py

* re-enable website generation

* misc tidy up

* reverted to loading hugo from the path + setting path properly

* Refactoring

* Made naming more consistent

* revert some temperature changes

some LLMs need zero temperature.

* Update ai_utils.py

fix typo in prompt

* finetuning and dependency updates

---------

Co-authored-by: Rene Honig <[email protected]>
Co-authored-by: Neil Smyth <[email protected]>
3 people authored Sep 19, 2023
2 parents c5776d2 + 4dd79ba commit 23db36e
Showing 13 changed files with 932 additions and 937 deletions.
5 changes: 4 additions & 1 deletion .azure-template.env
@@ -8,4 +8,7 @@ RABBITMQ_PASSWORD=super-secure-pass
AI_MODEL_TEMPERATURE=0.3
AI_MODEL_NAME=gpt-35-turbo
AI_DEPLOYMENT_NAME=deploy-gpt-35-turbo
AI_EMBEDDINGS_DEPLOYMENT_NAME=embedding
AI_SOURCE_WEBSITE=https://www.alkemio.org
AI_LOCAL_PATH=~/alkemio/data
AI_WEBSITE_REPO=https://github.com/alkem-io/website.git
3 changes: 2 additions & 1 deletion .gitignore
@@ -3,4 +3,5 @@
openai.env
azure.env
/__pycache__/*
/local_index/*
/vectordb/*
local.env
12 changes: 11 additions & 1 deletion .openai-template.env
@@ -1 +1,11 @@
OPENAI_API_KEY=api-key
AI_SOURCE_WEBSITE=https://www.alkemio.org
AI_LOCAL_PATH=~/alkemio/data
RABBITMQ_HOST=localhost
RABBITMQ_USER=admin
RABBITMQ_PASSWORD=super-secure-pass
AI_MODEL_TEMPERATURE=0.3
AI_MODEL_NAME=gpt-35-turbo
AI_WEBSITE_REPO=https://github.com/alkem-io/website.git
14 changes: 11 additions & 3 deletions Dockerfile
@@ -4,15 +4,23 @@ FROM python:3-slim-bookworm
# Set the working directory in the container to /app
WORKDIR /app

ARG GO_VERSION=1.21.1
ARG HUGO_VERSION=0.118.2
ARG ARCHITECTURE=amd64

# install git, go and hugo
RUN apt update && apt upgrade -y && apt install git wget -y
RUN wget https://go.dev/dl/go${GO_VERSION}.linux-${ARCHITECTURE}.tar.gz && tar -C /usr/local -xzf go${GO_VERSION}.linux-${ARCHITECTURE}.tar.gz
RUN export PATH=$PATH:/usr/local/go/bin:/usr/local && go version
RUN wget https://github.com/gohugoio/hugo/releases/download/v${HUGO_VERSION}/hugo_extended_${HUGO_VERSION}_linux-${ARCHITECTURE}.tar.gz && tar -C /usr/local -xzf hugo_extended_${HUGO_VERSION}_linux-${ARCHITECTURE}.tar.gz && ls -al /usr/local
RUN /usr/local/hugo version

# Install Poetry
RUN pip install poetry

# Copy the current directory contents into the container at /app
COPY . /app

# install chromium-driver
RUN apt update && apt install chromium-driver -y

# Use Poetry to install dependencies
RUN poetry config virtualenvs.create true && poetry install --no-interaction --no-ansi

66 changes: 53 additions & 13 deletions README.md
@@ -24,23 +24,50 @@ Training an LLM is prohibitively expensive for most organisations, but for most
The project has been implemented as a container-based micro-service with a RabbitMQ RPC. There is one RabbitMQ queue:
- `alkemio-chat-guidance` - queue for submitting requests to the microservice

The request payload consists of JSON with the following structure `{"operation" : "*operation type*", "param": "*additional request data*"}`
The request payload consists of JSON with the following structure (example for a query):
```
{
    "data": {
        "userId": "userID",
        "question": "What are the key Alkemio concepts?",
        "language": "UK"
    },
    "pattern": {
        "cmd": "query"
    }
}
```

The operation types are:
- `ingest`: data collection from the Alkemio foundation website and embedding using the [OpenAI Ada text model](https://openai.com/blog/new-and-improved-embedding-model), no *additional request data*.
- `reset`: reset the chat history for the ongoing chat, no *additional request data*.
- `query`: post the next question in a chat sequence, with the user question as *additional request data*
- `ingest`: data collection from the Alkemio foundation website (through the Github source) and embedding using the [OpenAI Ada text model](https://openai.com/blog/new-and-improved-embedding-model), no *additional request data*.
- `reset`: reset the chat history for the ongoing chat; requires the `userId`.
- `query`: post the next question in a chat sequence, see the example above

The response is published in an auto-generated, exclusive, unnamed queue.
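
As an illustration, here is a minimal client sketch using `pika`. It is a sketch only: it assumes the service follows the standard RabbitMQ RPC convention of a `reply_to` queue with a `correlation_id`, which is not spelled out in this README.
```
import json
import uuid

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()

# auto-generated, exclusive reply queue for the response
reply_queue = channel.queue_declare(queue="", exclusive=True).method.queue

request = {
    "data": {
        "userId": "userID",
        "question": "What are the key Alkemio concepts?",
        "language": "UK",
    },
    "pattern": {"cmd": "query"},
}

channel.basic_publish(
    exchange="",
    routing_key="alkemio-chat-guidance",
    properties=pika.BasicProperties(
        reply_to=reply_queue,
        correlation_id=str(uuid.uuid4()),
    ),
    body=json.dumps(request),
)

# block until the service publishes its answer to the reply queue
for _method, _properties, body in channel.consume(reply_queue, auto_ack=True):
    print(json.loads(body))
    break

connection.close()
```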

There is a draft implementation for the interaction language of the model (this needs significant improvement). If no language code is specified, English will be assumed. Choices are:
'EN': 'English',
'US': 'English',
'UK': 'English',
'FR': 'French',
'DE': 'German',
'ES': 'Spanish',
'NL': 'Dutch',
'BG': 'Bulgarian',
'UA': "Ukranian"

*note: there is an earlier (outdated) RESTful implementation available at https://github.com/alkem-io/guidance-engine/tree/http-api*

### Docker
The following command can be used to build the container from the Docker CLI:
`docker build -t guidance-engine . `
The following command can be used to build the container from the Docker CLI (the default architecture is amd64, so add `--build-arg ARCHITECTURE=arm64` for arm64 builds):
`docker build --build-arg ARCHITECTURE=arm64 --no-cache -t alkemio/guidance-engine:v0.2.0 .`
`docker build --no-cache -t alkemio/guidance-engine:v0.2.0 .`
The Dockerfile has some self-explanatory configuration arguments.

The following command can be used to start the container from the Docker CLI:
`docker run --name guidance-engine -v /dev/shm:/dev/shm -v .env guidance-engine`
`docker run --name guidance-engine -v /dev/shm:/dev/shm --env-file .env guidance-engine`
where `.env` is based on `.azure-template.env`.
Alternatively use `docker-compose up -d`.

with:
- `OPENAI_API_KEY`: a valid OpenAI API key
@@ -54,6 +54,9 @@ with:
- `AI_MODEL_NAME`: the model name in Azure
- `AI_DEPLOYMENT_NAME`: the AI gpt model deployment name in Azure
- `AI_EMBEDDINGS_DEPLOYMENT_NAME`: the AI embeddings model deployment name in Azure
- `AI_SOURCE_WEBSITE`: the URL of the website that contains the source data (for references only)
- `AI_LOCAL_PATH`: local file path for storing data
- `AI_WEBSITE_REPO`: URL of the Git repository containing the Hugo-based website source data

You can find sample values in `.azure-template.env` and `.openai-template.env`. Configure them and create `.env` file with the updated settings.
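
For reference, an illustrative `docker-compose.yml` sketch is shown below. The service names, image tag, port and volume path are assumptions for illustration only, not the repository's actual compose file:
```
# illustrative sketch only; names and paths are assumptions
version: "3.8"
services:
  rabbitmq:
    image: rabbitmq:3-management
    environment:
      RABBITMQ_DEFAULT_USER: admin
      RABBITMQ_DEFAULT_PASS: super-secure-pass
    ports:
      - "5672:5672"
  guidance-engine:
    image: alkemio/guidance-engine:v0.2.0
    env_file: .env
    depends_on:
      - rabbitmq
    volumes:
      - vectordb:/app/vectordb   # persist the vector database across restarts
volumes:
  vectordb:
```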

@@ -62,9 +62,19 @@ The project requires Python & Poetry installed. The minimum version dependencies
After installing Python & Poetry, you simply need to run `poetry run python app.py`

### Linux
The project requires Python 3.10 as a minimum, and the chromium driver is required for scraping the Alkemio website:
install chromium-driver: `sudo apt-get install chromium-driver`

Note: make sure the version of chromium-driver is compatible with your Chrome version. Otherwise, in order for this to work, you will need to:
- re-install Chrome / chromium-driver so the versions match
- uninstall Chrome
The project requires Python 3.11 as a minimum and needs Go and Hugo installed for creating a local version of the website. See the Go and Hugo documentation for installation instructions (only needed when running outside a container).


## Outstanding
The following tasks are still outstanding:
- clean up code and add more comments.
- improve interaction language.
- assess overall quality and performance of the model and make improvements as and when required.
- assess the need to summarize the chat history to avoid exceeding the prompt token limit.
- update the yaml manifest.
- add error handling.
- perform extensive testing, in particular in multi-user scenarios.
- look at improvements of the ingestion. As a minimum the service engine should not consume queries whilst the ingestion is ongoing, as that will lead to errors.
- look at the use of `temperature` for the `QARetrievalChain`. It is not so obvious how this is handled.
- look at the possibility to implement reinforcement learning.
- return the actual LLM costs and token usage for queries.
121 changes: 100 additions & 21 deletions ai_utils.py
@@ -1,39 +1,118 @@
from langchain.prompts.prompt import PromptTemplate
from langchain.embeddings import OpenAIEmbeddings
from langchain.chains import ConversationalRetrievalChain
from langchain.vectorstores import FAISS
from langchain.llms import AzureOpenAI
from langchain.prompts import PromptTemplate
from langchain.chat_models import AzureChatOpenAI
from langchain.chains import ConversationalRetrievalChain, LLMChain
from langchain.chains.question_answering import load_qa_chain
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT, QA_PROMPT

import os

# define internal configuration parameters
# token limit for retrieval chain
max_token_limit = 2000
# verbose output for LLMs
verbose_models = True
# does the chain return the source documents?
return_source_document = True

# Set Context for response
TEMPLATE = """
- Act as a product and innovation expert.
- Your task is to answer user questions.
- Return your response in markdown, and highlight important elements.
- If the answer cannot be found within the context, write 'I could not find an answer to your question'.
- Provide concise replies that are polite and professional.
- Use the following context to answer the query.

# Define a dictionary containing country codes as keys and related languages as values
language_mapping = {
    'EN': 'English',
    'US': 'English',
    'UK': 'English',
    'FR': 'French',
    'DE': 'German',
    'ES': 'Spanish',
    'NL': 'Dutch',
    'BG': 'Bulgarian',
    'UA': 'Ukrainian'
}

# function to retrieve language from country
def get_language_by_code(language_code):
"""Returns the language associated with the given code. If no match is found, it returns 'English'."""
return language_mapping.get(language_code, 'English')
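# Illustrative usage (not part of the original file):
#   get_language_by_code('FR') -> 'French'
#   get_language_by_code('XX') -> 'English' (fallback)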


chat_template = """
You are a conversational agent. Use the following step-by-step instructions to respond to user inputs.
1 - The text provided in the context delimited by triple pluses may contain questions. Remove those questions from the context.
2 - Provide a single paragraph answer that is polite and professional, taking into account the context delimited by triple pluses. If the answer cannot be found within the context, write 'I could not find an answer to your question'.
+++
Context:
{context}
+++
Question: {question}
"""

Question:
{question}
custom_question_template = """"
Combine the chat history and follow up question into a standalone question.
+++
Chat History: {chat_history}
+++
Follow up question: {question}
+++
Standalone question:
"""

QA_PROMPT = PromptTemplate(template=TEMPLATE, input_variables=["question", "context"])
translate_template = """"
Act as a professional translator. Use the following step-by-step instructions:
1. Assess in what language the input below delimited by triple pluses is written.
2. Carry out one of tasks A or B below:
A: if the input language is different from {language}, then translate the input below delimited by triple pluses to natural {language} language, maintaining the tone of voice and length.
B: if the input language is the same as {language}, there is no need for translation; simply return the original input below delimited by triple pluses as the answer.
3. Only return the answer from step 2; do not show any code or additional information.
+++
input:
{answer}
+++
Translated input:
"""

custom_question_prompt = PromptTemplate(
    template=custom_question_template, input_variables=["chat_history", "question"]
)

translation_prompt = PromptTemplate(
    template=translate_template, input_variables=["language", "answer"]
)

# prompt to be used by the retrieval chain; note this is the default prompt name, so it is not explicitly assigned anywhere
QA_PROMPT = PromptTemplate(
    template=chat_template, input_variables=["question", "context"]
)

def translate_answer(answer, language):
    translate_llm = AzureOpenAI(deployment_name=os.environ["AI_DEPLOYMENT_NAME"], model_name=os.environ["AI_MODEL_NAME"],
                                temperature=0, verbose=verbose_models)
    prompt = translation_prompt.format(answer=answer, language=language)
    return translate_llm(prompt)


def setup_chain(db_path):
    generic_llm = AzureOpenAI(deployment_name=os.environ["AI_DEPLOYMENT_NAME"], model_name=os.environ["AI_MODEL_NAME"],
                              temperature=0, verbose=verbose_models)

def setup_chain():
    llm = AzureOpenAI(deployment_name=os.environ["AI_DEPLOYMENT_NAME"], model_name=os.environ["AI_MODEL_NAME"], temperature=os.environ["AI_MODEL_TEMPERATURE"])
    embeddings = OpenAIEmbeddings(deployment=os.environ["AI_EMBEDDINGS_DEPLOYMENT_NAME"], chunk_size=1)
    vectorstore = FAISS.load_local("local_index", embeddings)

    chain = ConversationalRetrievalChain.from_llm(
        llm, vectorstore.as_retriever(), return_source_documents=True
    )
    print("\n\nchain:\n", chain)
    vectorstore = FAISS.load_local(db_path, embeddings)
    retriever = vectorstore.as_retriever()

    chat_llm = AzureChatOpenAI(deployment_name=os.environ["AI_DEPLOYMENT_NAME"],
                               model_name=os.environ["AI_MODEL_NAME"], temperature=os.environ["AI_MODEL_TEMPERATURE"],
                               max_tokens=max_token_limit)

    return chain
    conversation_chain = ConversationalRetrievalChain.from_llm(
        llm=chat_llm,
        retriever=retriever,
        condense_question_prompt=custom_question_prompt,
        chain_type="stuff",
        verbose=verbose_models,
        condense_question_llm=generic_llm,
        return_source_documents=True,
        combine_docs_chain_kwargs={"prompt": QA_PROMPT}
    )
    return conversation_chain
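
As an illustration, a minimal, hypothetical usage sketch of the refactored chain. The actual call site lives in `app.py`, which is not shown in this diff; the `db_path` value and the calling pattern are assumptions:
```
# hypothetical usage sketch, not part of the diff above;
# assumes the FAISS index was previously persisted to "vectordb"
chain = setup_chain(db_path="vectordb")
result = chain({"question": "What are the key Alkemio concepts?", "chat_history": []})
answer = translate_answer(result["answer"], get_language_by_code("UK"))
print(answer)
print(result["source_documents"])  # available because return_source_documents=True
```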