
Commit

- Add emojis to the README.md file. (#8)
- Add examples of how to use the integration with the OpenAI Assistant.
jirispilka authored Oct 9, 2024
1 parent 51399cd commit 70dbc2a
Showing 4 changed files with 171 additions and 9 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,10 @@
# Change Log

## 0.2.2 (2024-10-09)

- Add emojis to the README.md file.
- Add examples of how to use the integration with the OpenAI Assistant.

## 0.2.1 (2024-07-02)

- Fix issue with pagination when listing files in the OpenAI Assistant.
21 changes: 12 additions & 9 deletions README.md
@@ -15,7 +15,7 @@ You can easily run the [OpenAI Vector Store Integration](https://apify.com/jiri.

Read a detailed guide in [How we built an enterprise support assistant using OpenAI and the Apify platform](https://blog.apify.com/enterprise-support-openai-assistant/).

## How does OpenAI Assistant Integration work?
## ֎ How does OpenAI Assistant Integration work?

Data for the Vector Store and Assistant are provided by various [Apify Actors](https://apify.com/store) and include web content, DOCX, PDF, PPTX, and other files.

@@ -27,19 +27,22 @@ The integration process includes:
- Adding the newly created files to the vector store.
- _[Optional]_ Deleting existing files from OpenAI (specified by `fileIdsToDelete` and/or `filePrefix`).
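The steps above boil down to assembling a run input for the integration Actor. Here is a minimal sketch; the `build_integration_input` helper and its defaults are illustrative, not part of the integration, and the field names follow the input schema referenced below:

```python
from __future__ import annotations


def build_integration_input(
    dataset_id: str,
    vector_store_id: str,
    openai_api_key: str,
    dataset_fields: list[str] | None = None,
    file_prefix: str | None = None,
    file_ids_to_delete: list[str] | None = None,
) -> dict:
    """Assemble a run input for the OpenAI Vector Store Integration Actor."""
    run_input = {
        "datasetId": dataset_id,
        "vectorStoreId": vector_store_id,
        "openaiApiKey": openai_api_key,
        # Fields of each dataset item to include in the uploaded file
        "datasetFields": dataset_fields or ["text", "url"],
    }
    # Optional cleanup step: delete previously uploaded files by prefix and/or by ID
    if file_prefix:
        run_input["filePrefix"] = file_prefix
    if file_ids_to_delete:
        run_input["fileIdsToDelete"] = file_ids_to_delete
    return run_input
```

The resulting dictionary can then be passed to the integration Actor via the Apify client, as the examples further down show.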

## How much does it cost?
## 💰 How much does it cost?

Find the average usage cost for this Actor on the [pricing page](https://apify.com/pricing) under the `Which plan do I need?` section.
Additional costs are associated with the use of OpenAI Assistant. Please refer to their [pricing](https://openai.com/pricing) for details.

## Before you start
Since the integration is designed to upload the entire dataset as a single OpenAI file, the cost is minimal, typically less than $0.01 per run.

## ✅ Before you start

To utilize this integration, ensure you have:
To use this integration, ensure you have:

- An OpenAI account and an `OpenAI API KEY`. Create a free account at [OpenAI](https://beta.openai.com/).
- Created an [OpenAI Vector Store](https://platform.openai.com/docs/assistants/tools/file-search/vector-stores). You will need `vectorStoreId` to run this integration.
- Created an [OpenAI Assistant](https://platform.openai.com/docs/assistants/overview).

## Inputs
## ➡️ Inputs

Refer to [input schema](.actor/input_schema.json) for details.

@@ -55,11 +58,11 @@ Refer to [input schema](.actor/input_schema.json) for details.
- `keyValueStoreId`: _[Debug]_ Apify Key-Value Store ID (when running the Actor standalone, without the integration).
- `saveInApifyKeyValueStore`: _[Debug]_ Save all created files in the Apify Key-Value Store to easily check and retrieve all files (typically used when debugging).
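Putting the fields together, a typical input might look like the following (a hedged illustration: the IDs and API key are placeholders, the optional fields can be omitted, and the input schema remains the authoritative reference):

```json
{
  "datasetId": "YOUR-DATASET-ID",
  "vectorStoreId": "YOUR-VECTOR-STORE-ID",
  "openaiApiKey": "YOUR-OPENAI-API-KEY",
  "datasetFields": ["text", "url"],
  "saveInApifyKeyValueStore": false
}
```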

## Outputs
## ⬅️ Outputs

This integration saves selected `datasetFields` from your Actor to the OpenAI Assistant and optionally to the Apify Key-Value Store (useful for debugging).

## Save data from Website Content Crawler to OpenAI Vector Store
## 💾 Save data from Website Content Crawler to OpenAI Vector Store

To use this integration, you need an OpenAI account and an `OpenAI API KEY`.
Additionally, you need to create an OpenAI Vector Store (`vectorStoreId`).
@@ -92,7 +95,7 @@ Specify which fields you want to save to the OpenAI Vector Store, e.g., `["text"
}
```

### Update existing files in the OpenAI Vector Store
### 🔄 Update existing files in the OpenAI Vector Store

There are two ways to update existing files in the OpenAI Vector Store.
You can either delete all files with a specific prefix or delete specific files by their IDs.
@@ -111,7 +114,7 @@ The settings for the integration are as follows:
}
```
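The two update modes described above map to two input fields. A sketch with placeholder values (in practice you would usually supply one or the other, not both):

```json
{
  "filePrefix": "dataset",
  "fileIdsToDelete": ["file-YOUR-FILE-ID"]
}
```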

### Save Amazon Products to OpenAI Vector Store
### 📦 Save Amazon Products to OpenAI Vector Store

You can also save Amazon products to the OpenAI Vector Store.
Again, you need to have an OpenAI account and an `OpenAI API KEY` with a created OpenAI Vector Store (`vectorStoreId`).
93 changes: 93 additions & 0 deletions examples/2024-10-08-docs_assistant_rag_web_browser.py
@@ -0,0 +1,93 @@
# ruff: noqa:T201,SIM115

"""
- Create OpenAI Assistant and add tools
- Create a thread and a message
- Run the Assistant and poll for the results
- Submit tool outputs
- Get assistant answer
"""

from __future__ import annotations

import json
from typing import TYPE_CHECKING

from apify_client import ApifyClient
from openai import OpenAI, Stream
from openai.types.beta.threads.run_submit_tool_outputs_params import ToolOutput

if TYPE_CHECKING:
    from openai.types.beta import AssistantStreamEvent
    from openai.types.beta.threads import Run

client = OpenAI(api_key="YOUR-OPENAI-API-KEY")
apify_client = ApifyClient("YOUR-APIFY-API-TOKEN")

INSTRUCTIONS = """
You are a smart and helpful assistant. Maintain an expert, friendly, and informative tone in your responses.
Your task is to answer questions based on information from the internet.
Always call the call_rag_web_browser function to retrieve the latest and most relevant online results.
Never provide answers based solely on your own knowledge.
For each answer, always include relevant sources whenever possible.
"""

rag_web_browser_function = {
    "type": "function",
    "function": {
        "name": "call_rag_web_browser",
        "description": "Query Google search, scrape the top N pages from the results, and return their cleaned content as markdown",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Use regular search words or enter Google Search URLs."},
                "maxResults": {"type": "integer", "description": "The number of top organic search results to return and scrape text from"},
            },
            "required": ["query"],
        },
    },
}

my_assistant = client.beta.assistants.retrieve("asst_7GXx3q9lWLmhSf9yexA7J1WX")


def call_rag_web_browser(query: str, max_results: int) -> list[dict]:
    """
    Query Google search, scrape the top N pages from the results, and return their cleaned content as markdown.
    First start the Actor and wait for it to finish. Then fetch results from the Actor run's default dataset.
    """
    actor_call = apify_client.actor("apify/rag-web-browser").call(run_input={"query": query, "maxResults": max_results})
    return apify_client.dataset(actor_call["defaultDatasetId"]).list_items().items


def submit_tool_outputs(run_: Run) -> Run | Stream[AssistantStreamEvent]:
    """Submit tool outputs to continue the run."""
    tool_output = []
    for tool in run_.required_action.submit_tool_outputs.tool_calls:
        if tool.function.name == "call_rag_web_browser":
            d = json.loads(tool.function.arguments)
            output = call_rag_web_browser(query=d["query"], max_results=d["maxResults"])
            tool_output.append(ToolOutput(tool_call_id=tool.id, output=json.dumps(output)))
            print("RAG-Web-Browser added as a tool output.")

    return client.beta.threads.runs.submit_tool_outputs_and_poll(thread_id=run_.thread_id, run_id=run_.id, tool_outputs=tool_output)


# Runs are asynchronous, which means you'll want to monitor their status by polling the Run object until a terminal status is reached.
thread = client.beta.threads.create()
message = client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="What are the latest LLM news?"
)

# Run with assistant and poll for the results
run = client.beta.threads.runs.create_and_poll(thread_id=thread.id, assistant_id=my_assistant.id)

if run.status == "requires_action":
    run = submit_tool_outputs(run)

print("Assistant response:")
for m in client.beta.threads.messages.list(thread_id=run.thread_id):
    print(m.content[0].text.value)

# Delete the thread
client.beta.threads.delete(thread.id)
61 changes: 61 additions & 0 deletions examples/2024-10-08-docs_assistant_vector_store.py
@@ -0,0 +1,61 @@
# ruff: noqa:T201,SIM115
"""
- Create an OpenAI Assistant and a Vector Store
- Update the Assistant to use the new Vector Store
- Call Website Content Crawler and crawl docs.apify.com
- Use the OpenAI Vector Store Integration to upload dataset items to the Vector Store
- Create a thread and a message and get assistant answer
"""
from apify_client import ApifyClient
from openai import OpenAI

client = OpenAI(api_key="YOUR-OPENAI-API-KEY")
apify_client = ApifyClient("YOUR-APIFY-API-TOKEN")

my_assistant = client.beta.assistants.create(
    instructions="As a customer support agent at Apify, your role is to assist customers",
    name="Support assistant",
    tools=[{"type": "file_search"}],
    model="gpt-4o-mini",
)

# Create a vector store
vector_store = client.beta.vector_stores.create(name="Support assistant vector store")

# Update the assistant to use the new Vector Store
assistant = client.beta.assistants.update(
    assistant_id=my_assistant.id,
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)

run_input = {"startUrls": [{"url": "https://docs.apify.com/platform"}], "maxCrawlPages": 10, "crawlerType": "cheerio"}
actor_call_website_crawler = apify_client.actor("apify/website-content-crawler").call(run_input=run_input)

dataset_id = actor_call_website_crawler["defaultDatasetId"]

run_input_vs = {
    "datasetId": dataset_id,
    "assistantId": my_assistant.id,
    "datasetFields": ["text", "url"],
    "openaiApiKey": "YOUR-OPENAI-API-KEY",
    "vectorStoreId": vector_store.id,
}

apify_client.actor("jiri.spilka/openai-vector-store-integration").call(run_input=run_input_vs)

# Create a thread and a message
thread = client.beta.threads.create()
message = client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="How can I scrape a website using Apify?"
)

# Run with assistant and poll for the results
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id,
    tool_choice={"type": "file_search"},
)

print("Assistant response:")
for m in client.beta.threads.messages.list(thread_id=run.thread_id):
    print(m.content[0].text.value)

