
Commit

- Add emojis to the README.md file. (#8)
- Add examples of how to use the integration with the OpenAI Assistant.
jirispilka authored Oct 9, 2024
1 parent 51399cd commit 70dbc2a
Showing 4 changed files with 171 additions and 9 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,10 @@
# Change Log

## 0.2.2 (2024-10-09)

- Add emojis to the README.md file.
- Add examples of how to use the integration with the OpenAI Assistant.

## 0.2.1 (2024-07-02)

- Fix issue with pagination when listing files in the OpenAI Assistant.
21 changes: 12 additions & 9 deletions README.md
@@ -15,7 +15,7 @@ You can easily run the [OpenAI Vector Store Integration](https://apify.com/jiri.

Read a detailed guide in [How we built an enterprise support assistant using OpenAI and the Apify platform](https://blog.apify.com/enterprise-support-openai-assistant/).

## How does OpenAI Assistant Integration work?
## ֎ How does OpenAI Assistant Integration work?

Data for the Vector Store and Assistant are provided by various [Apify Actors](https://apify.com/store) and include web content, DOCX, PDF, PPTX, and other files.

@@ -27,19 +27,22 @@ The integration process includes:
- Adding the newly created files to the vector store.
- _[Optional]_ Deleting existing files from OpenAI (specified by `fileIdsToDelete` and/or `filePrefix`).
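The steps above boil down to assembling a run input for the integration Actor. Here is a minimal sketch; the `build_integration_input` helper and its defaults are illustrative, not part of the integration, and the field names follow the input schema referenced below:

```python
from __future__ import annotations


def build_integration_input(
    dataset_id: str,
    vector_store_id: str,
    openai_api_key: str,
    dataset_fields: list[str] | None = None,
    file_prefix: str | None = None,
    file_ids_to_delete: list[str] | None = None,
) -> dict:
    """Assemble a run input for the OpenAI Vector Store Integration Actor."""
    run_input = {
        "datasetId": dataset_id,
        "vectorStoreId": vector_store_id,
        "openaiApiKey": openai_api_key,
        # Fields of each dataset item to include in the uploaded file
        "datasetFields": dataset_fields or ["text", "url"],
    }
    # Optional cleanup step: delete previously uploaded files by prefix and/or by ID
    if file_prefix:
        run_input["filePrefix"] = file_prefix
    if file_ids_to_delete:
        run_input["fileIdsToDelete"] = file_ids_to_delete
    return run_input
```

The resulting dictionary can then be passed to the integration Actor via the Apify client, as the examples further down show.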

## How much does it cost?
## 💰 How much does it cost?

Find the average usage cost for this Actor on the [pricing page](https://apify.com/pricing) under the `Which plan do I need?` section.
Additional costs are associated with the use of OpenAI Assistant. Please refer to their [pricing](https://openai.com/pricing) for details.

## Before you start
Since the integration is designed to upload the entire dataset as a single OpenAI file, the cost is minimal, typically less than $0.01 per run.

## ✅ Before you start

To utilize this integration, ensure you have:
To use this integration, ensure you have:

- An OpenAI account and an `OpenAI API KEY`. Create a free account at [OpenAI](https://beta.openai.com/).
- Created an [OpenAI Vector Store](https://platform.openai.com/docs/assistants/tools/file-search/vector-stores). You will need `vectorStoreId` to run this integration.
- Created an [OpenAI Assistant](https://platform.openai.com/docs/assistants/overview).

## Inputs
## ➡️ Inputs

Refer to [input schema](.actor/input_schema.json) for details.

@@ -55,11 +58,11 @@ Refer to [input schema](.actor/input_schema.json) for details.
- `keyValueStoreId`: _[Debug]_ Apify Key-Value Store ID (when running the Actor standalone, without the integration).
- `saveInApifyKeyValueStore`: _[Debug]_ Save all created files in the Apify Key-Value Store to easily check and retrieve all files (typically used when debugging).
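Putting the fields together, a typical input might look like the following (a hedged illustration: the IDs and API key are placeholders, the optional fields can be omitted, and the input schema remains the authoritative reference):

```json
{
  "datasetId": "YOUR-DATASET-ID",
  "vectorStoreId": "YOUR-VECTOR-STORE-ID",
  "openaiApiKey": "YOUR-OPENAI-API-KEY",
  "datasetFields": ["text", "url"],
  "saveInApifyKeyValueStore": false
}
```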

## Outputs
## ⬅️ Outputs

This integration saves selected `datasetFields` from your Actor to the OpenAI Assistant and optionally to the Apify Key-Value Store (useful for debugging).

## Save data from Website Content Crawler to OpenAI Vector Store
## 💾 Save data from Website Content Crawler to OpenAI Vector Store

To use this integration, you need an OpenAI account and an `OpenAI API KEY`.
Additionally, you need to create an OpenAI Vector Store (`vectorStoreId`).
@@ -92,7 +95,7 @@ Specify which fields you want to save to the OpenAI Vector Store, e.g., `["text"
}
```

### Update existing files in the OpenAI Vector Store
### 🔄 Update existing files in the OpenAI Vector Store

There are two ways to update existing files in the OpenAI Vector Store.
You can either delete all files with a specific prefix or delete specific files by their IDs.
@@ -111,7 +114,7 @@ The settings for the integration are as follows:
}
```
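The two update modes described above map to two input fields. A sketch with placeholder values (in practice you would usually supply one or the other, not both):

```json
{
  "filePrefix": "dataset",
  "fileIdsToDelete": ["file-YOUR-FILE-ID"]
}
```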

### Save Amazon Products to OpenAI Vector Store
### 📦 Save Amazon Products to OpenAI Vector Store

You can also save Amazon products to the OpenAI Vector Store.
Again, you need to have an OpenAI account and an `OpenAI API KEY` with a created OpenAI Vector Store (`vectorStoreId`).
93 changes: 93 additions & 0 deletions examples/2024-10-08-docs_assistant_rag_web_browser.py
@@ -0,0 +1,93 @@
# ruff: noqa:T201,SIM115

"""
- Create OpenAI Assistant and add tools
- Create a thread and a message
- Run the Assistant and poll for the results
- Submit tool outputs
- Get assistant answer
"""

from __future__ import annotations

import json
from typing import TYPE_CHECKING

from apify_client import ApifyClient
from openai import OpenAI, Stream
from openai.types.beta.threads.run_submit_tool_outputs_params import ToolOutput

if TYPE_CHECKING:
    from openai.types.beta import AssistantStreamEvent
    from openai.types.beta.threads import Run

client = OpenAI(api_key="YOUR-OPENAI-API-KEY")
apify_client = ApifyClient("YOUR-APIFY-API-TOKEN")

INSTRUCTIONS = """
You are a smart and helpful assistant. Maintain an expert, friendly, and informative tone in your responses.
Your task is to answer questions based on information from the internet.
Always call the call_rag_web_browser function to retrieve the latest and most relevant online results.
Never provide answers based solely on your own knowledge.
For each answer, always include relevant sources whenever possible.
"""

rag_web_browser_function = {
    "type": "function",
    "function": {
        "name": "call_rag_web_browser",
        "description": "Query Google search, scrape the top N pages from the results, and return their cleaned content as markdown",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Use regular search words or enter Google Search URLs."},
                "maxResults": {"type": "integer", "description": "The number of top organic search results to return and scrape text from"},
            },
            "required": ["query"],
        },
    },
}

my_assistant = client.beta.assistants.retrieve("asst_7GXx3q9lWLmhSf9yexA7J1WX")


def call_rag_web_browser(query: str, max_results: int) -> list[dict]:
    """
    Query Google search, scrape the top N pages from the results, and return their cleaned content as markdown.
    First start the Actor and wait for it to finish. Then fetch results from the Actor run's default dataset.
    """
    actor_call = apify_client.actor("apify/rag-web-browser").call(run_input={"query": query, "maxResults": max_results})
    return apify_client.dataset(actor_call["defaultDatasetId"]).list_items().items


def submit_tool_outputs(run_: Run) -> Run | Stream[AssistantStreamEvent]:
    """Submit tool outputs to continue the run."""
    tool_output = []
    for tool in run_.required_action.submit_tool_outputs.tool_calls:
        if tool.function.name == "call_rag_web_browser":
            d = json.loads(tool.function.arguments)
            output = call_rag_web_browser(query=d["query"], max_results=d["maxResults"])
            tool_output.append(ToolOutput(tool_call_id=tool.id, output=json.dumps(output)))
            print("RAG-Web-Browser added as a tool output.")

    return client.beta.threads.runs.submit_tool_outputs_and_poll(thread_id=run_.thread_id, run_id=run_.id, tool_outputs=tool_output)


# Runs are asynchronous, which means you'll want to monitor their status by polling the Run object until a terminal status is reached.
thread = client.beta.threads.create()
message = client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="What are the latest LLM news?"
)

# Run with assistant and poll for the results
run = client.beta.threads.runs.create_and_poll(thread_id=thread.id, assistant_id=my_assistant.id)

if run.status == "requires_action":
    run = submit_tool_outputs(run)

print("Assistant response:")
for m in client.beta.threads.messages.list(thread_id=run.thread_id):
    print(m.content[0].text.value)

# Delete the thread
client.beta.threads.delete(thread.id)
61 changes: 61 additions & 0 deletions examples/2024-10-08-docs_assistant_vector_store.py
@@ -0,0 +1,61 @@
# ruff: noqa:T201,SIM115
"""
- Create an OpenAI Assistant and a Vector Store
- Update the Assistant to use the new Vector Store
- Call Website Content Crawler and crawl docs.apify.com
- Use the OpenAI Vector Store Integration to upload dataset items to the Vector Store
- Create a thread and a message and get assistant answer
"""
from apify_client import ApifyClient
from openai import OpenAI

client = OpenAI(api_key="YOUR-OPENAI-API-KEY")
apify_client = ApifyClient("YOUR-APIFY-API-TOKEN")

my_assistant = client.beta.assistants.create(
    instructions="As a customer support agent at Apify, your role is to assist customers",
    name="Support assistant",
    tools=[{"type": "file_search"}],
    model="gpt-4o-mini",
)

# Create a vector store
vector_store = client.beta.vector_stores.create(name="Support assistant vector store")

# Update the assistant to use the new Vector Store
assistant = client.beta.assistants.update(
    assistant_id=my_assistant.id,
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)

run_input = {"startUrls": [{"url": "https://docs.apify.com/platform"}], "maxCrawlPages": 10, "crawlerType": "cheerio"}
actor_call_website_crawler = apify_client.actor("apify/website-content-crawler").call(run_input=run_input)

dataset_id = actor_call_website_crawler["defaultDatasetId"]

run_input_vs = {
    "datasetId": dataset_id,
    "assistantId": my_assistant.id,
    "datasetFields": ["text", "url"],
    "openaiApiKey": "YOUR-OPENAI-API-KEY",
    "vectorStoreId": vector_store.id,
}

apify_client.actor("jiri.spilka/openai-vector-store-integration").call(run_input=run_input_vs)

# Create a thread and a message
thread = client.beta.threads.create()
message = client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="How can I scrape a website using Apify?"
)

# Run with assistant and poll for the results
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id,
    tool_choice={"type": "file_search"},
)

print("Assistant response:")
for m in client.beta.threads.messages.list(thread_id=run.thread_id):
    print(m.content[0].text.value)

