---
jupyter: python3
---
# Union.ai Serverless Workshop
Welcome to the Union.ai Serverless Workshop! In this workshop, we will cover:
1. Setting up a workspace and connecting to Union.ai Serverless.
2. Creating a GPT-based retrieval augmented generation (RAG) workflow.
3. Deploying a Streamlit app to interact with the Union workflow.
4. Transitioning to an open weights LLM-based RAG workflow with Ollama.
## Setup Workspace
1. If you are not signed into Google, sign in by clicking the "Sign in" button at the upper right of this page.
2. If you have not already, sign up for Union Serverless at: https://signup.union.ai/?group=workshop
3. Navigate to https://serverless.union.ai
**Note:** running the cell below will cause the Colab session to restart due
to a reinstalled dependency.
```{python}
try:
    import google.colab
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    !git clone https://github.com/unionai-oss/union-rag.git
    %cd union-rag
    %pip install -r _workshop/requirements.lock.txt
```
## Authenticate on Colab
To authenticate with Serverless, run the command below and follow the link it prints to authenticate your CLI.
```{python}
%cd /content/union-rag
!union create login device-flow
```
To make sure everything is working, run this sample workflow:
```{python}
!union run --remote _workshop/starter.py main
```
Go to the link provided to see your execution on Union!
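For reference, the starter module defines a simple hello-world-style workflow. Here's a sketch of what it might contain; the actual contents of `_workshop/starter.py` may differ:

```python
# Hypothetical sketch of the starter workflow; the real _workshop/starter.py
# in the repo may differ.
from flytekit import task, workflow


@task
def say_hello(name: str) -> str:
    return f"Hello, {name}!"


@workflow
def main(name: str = "Union") -> str:
    return say_hello(name=name)
```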
## GPT-based RAG workflow
In this first part, we're going to run a RAG workflow defined as a Flyte workflow,
using LangChain as the LLM pipelining tool and GPT as the underlying LLM.
### Create secrets
Let's create a Union secret for an OpenAI API key. The workshop runner will
provide a limited key for the purposes of this workshop.
> **Note**: Don't get too excited 🙂, this key has a limit of $50. We will delete
> this key after the workshop. Alternatively, you can use your own API key.
Once you have the key, create a secret on Union with:
```{python}
!union create secret openai_api_key
```
To see all of your secrets, run:
```{python}
!union get secret
```
If you are running into an issue with secrets, uncomment the following code to
delete the secret and try again:
```{python}
# !union delete secret openai_api_key
```
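For context on how this secret gets used: tasks request it through flytekit's `Secret` API and read it from the task context at runtime. A minimal sketch, assuming a recent flytekit that supports key-only secrets (the repo's actual task definitions may differ):

```python
import flytekit
from flytekit import Secret, task


@task(secret_requests=[Secret(key="openai_api_key")])
def call_llm(question: str) -> str:
    # The secret value is mounted at runtime and read via the task context.
    api_key = flytekit.current_context().secrets.get(key="openai_api_key")
    # ... use api_key to call the OpenAI API ...
    return f"(answer to {question!r})"
```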
### Create the knowledge base
The first stage in building a RAG application is to create a knowledge base,
which we represent as a vector store.
```{python}
!union run --copy-all --remote union_rag/simple_rag.py create_knowledge_base \
--exclude_patterns '["/api/", "/_tags/"]' \
--limit 100
```
This step will take about 5-10 minutes, so in the meantime, let's review how a
RAG application works:
![RAG application](./static/rag-application.png)
This workflow builds the knowledge base part of the RAG application, and it
highlights a few important capabilities that make it easy for us to build a
production-grade RAG app (a sketch follows the list below):
- `ImageSpec`: abstracting the Docker container
- Tasks: the core building block of Flyte
- Workflows: composing building blocks together
- Caching and cache versions
- Resource requests
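Here's a minimal sketch of how those pieces fit together in flytekit. The package list, task body, and names are illustrative assumptions, not the repo's actual code:

```python
from typing import List

from flytekit import ImageSpec, Resources, task, workflow

# ImageSpec abstracts the Docker container: dependencies are declared in
# Python, and Union's remote image builder produces the image for us.
image = ImageSpec(packages=["langchain", "faiss-cpu"])  # illustrative packages


@task(
    container_image=image,
    cache=True,
    cache_version="1.0",  # bump this to invalidate previously cached outputs
    requests=Resources(cpu="2", mem="4Gi"),  # per-task resource requests
)
def embed_documents(limit: int) -> List[str]:
    # Placeholder body: the real task fetches the Flyte docs and embeds them.
    return [f"document-{i}" for i in range(limit)]


@workflow
def create_knowledge_base_sketch(limit: int = 100) -> List[str]:
    # Workflows compose tasks into a DAG.
    return embed_documents(limit=limit)
```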
And it also highlights Union features:
- Artifacts and partitioning
- Remote image-building service
Let's also explore the UI to get some insights into what's going on in this workflow:
- List view
- Graph view
- Timeline view
- View utilization
- Live logs
Once the `create_knowledge_base` workflow is complete, we can take a look at the
artifact that was created through the UI.
### Ask a Flyte-related question
We can then use the `VectorStore` knowledge base we created to ask a Flyte-related
question:
```{python}
!union run --copy-all --remote union_rag/simple_rag.py ask --question 'what is flytekit?'
```
There are a few things to note in this workflow:
- We're using the `openai_api_key` secret that we made earlier to use GPT.
- We rehydrate the `FAISS` vector store to use for RAG.
- We create a question-answering pipeline with `load_qa_with_sources_chain` from LangChain
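Put together, the core of the `ask` task looks roughly like this sketch. The local path, model, and retrieval parameters are assumptions for illustration:

```python
from langchain.chains import load_qa_with_sources_chain
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

question = "what is flytekit?"

# Rehydrate the FAISS vector store produced by create_knowledge_base
# (path is hypothetical; the task downloads the artifact first).
vector_store = FAISS.load_local(
    "vector_store",
    OpenAIEmbeddings(),  # requires OPENAI_API_KEY in the environment
    allow_dangerous_deserialization=True,
)

# Retrieve the most relevant documents and run the QA-with-sources chain.
docs = vector_store.similarity_search(question, k=4)
chain = load_qa_with_sources_chain(ChatOpenAI(), chain_type="stuff")
result = chain.invoke({"input_documents": docs, "question": question})
```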
We also use `Artifact`s and partitions to consume the correct vector store:
- We use the `partition_keys` argument to define the artifact partitions
- We define the partitioned artifact as a task output with `Artifact(partition=Inputs.arg)`
- We can consume artifact partitions in a downstream workflow with `Artifact.query(partition="<value>")`
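A minimal sketch of that pattern, with hypothetical artifact and task names:

```python
from typing import Annotated

from flytekit import task, workflow
from flytekit.core.artifact import Artifact, Inputs

# Declare a partitioned artifact keyed on the embedding type.
VectorStoreArtifact = Artifact(name="vector-store", partition_keys=["embedding_type"])


@task
def build_store(
    embedding_type: str,
) -> Annotated[str, VectorStoreArtifact(embedding_type=Inputs.embedding_type)]:
    # Placeholder: the real task serializes a FAISS index.
    return f"store built with {embedding_type}"


@task
def answer(store: str) -> str:
    return f"answering with {store}"


# A downstream workflow consumes a specific partition via Artifact.query,
# used here as a default input.
@workflow
def ask_sketch(
    store: str = VectorStoreArtifact.query(embedding_type="openai"),
) -> str:
    return answer(store=store)
```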
Evaluation is also a big part of the RAG lifecycle, and to support it, it's
helpful to collect user feedback. We can use gate nodes to do this:
```{python}
!union run --copy-all --remote union_rag/simple_rag.py ask_with_feedback --question 'what is flytekit?'
```
Go to the execution in the UI and you'll see that the `ask_with_feedback` workflow
has a final step where you can give it some feedback. In this case, it's
an unstructured text field, so you can type something like "thumbs-up" or "thumbs-down".
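Under the hood, a gate node like this is a `wait_for_input` call inside the workflow definition. A rough sketch (the real workflow's names and timeout will differ):

```python
from datetime import timedelta

from flytekit import task, wait_for_input, workflow


@task
def answer_question(question: str) -> str:
    # Placeholder for the RAG pipeline.
    return f"(answer to {question!r})"


@workflow
def ask_with_feedback_sketch(question: str) -> str:
    response = answer_question(question=question)
    # Gate node: the execution pauses here until a human supplies a value
    # in the UI (or a signal is set programmatically).
    feedback = wait_for_input("feedback", timeout=timedelta(hours=1), expected_type=str)
    response >> feedback  # ensure the gate runs after the answer is ready
    return feedback
```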
In the next step of this workshop, we'll see how you can interact with this
RAG workflow through a Streamlit app.
## Build a Streamlit App
First, create a Union API key by creating an app:
```{python}
!union create app streamlit-rag-app
```
Save the API key somewhere secure!
Then, follow these steps to deploy the Streamlit app that you can use to interact
with the RAG workflow:
- Sign up for an account on Streamlit Community Cloud: https://share.streamlit.io/signup
- Fork the `union-rag` repo: https://github.com/unionai-oss/union-rag
- Click on **Create App** > **Yup, I have an app**, then add the information
needed to deploy:
- **Repository**: `<github_username>/union-rag`
- **Branch**: `main`
- **Main file path**: `streamlit/app.py`
![streamlit-advanced](./static/streamlit-deploy-init.png){width=350}
- Click **Advanced settings**
![streamlit-advanced](./static/streamlit-deploy.png){width=350}
- Then provide the Union secret under the **Secrets** input field:
```
UNIONAI_SERVERLESS_API_KEY = "<MY_SECRET>"
```
![streamlit-advanced](./static/streamlit-secrets.png){width=350}
- Then click **Deploy!**
After a few seconds, you should be redirected to the Streamlit app as it builds.
You should see a chat input box at the bottom of the screen.
Try typing in a Flyte-related question! Note: responses will take about 20-30
seconds.
Let's go through how this application works:
- It uses `UnionRemote` to connect to Union Serverless.
- When a user types a question into the chat box, this kicks off an execution with `UnionRemote.execute`.
- When the response comes back, we keep track of the execution ids associated
with each response.
- When a user clicks one of the 👍 or 👎 buttons, we use `UnionRemote.set_signal`
  to programmatically provide feedback on the response.
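Sketched in code, the app's core logic might look like the following. The import path, workflow name, and signal ID are assumptions; see `streamlit/app.py` in the repo for the real implementation:

```python
from union.remote import UnionRemote  # import path is an assumption

remote = UnionRemote()  # picks up UNIONAI_SERVERLESS_API_KEY from the environment

# Kick off the RAG workflow when the user submits a question.
wf = remote.fetch_workflow(name="union_rag.simple_rag.ask_with_feedback")
execution = remote.execute(wf, inputs={"question": "what is flytekit?"})

# Keep the execution id so feedback can be attached to the right run.
execution_id = execution.id.name

# When the user clicks 👍 or 👎, resolve the gate node by setting its signal.
remote.set_signal("feedback", execution_id, "thumbs-up")
```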
## Ollama-based RAG workflow
First, we need to re-create the knowledge base to use an open-weights embedder.
In this case, we'll use the `all-MiniLM-L6-v2` sentence transformer available
through Hugging Face.
```{python}
!union run --copy-all --remote union_rag/simple_rag.py create_knowledge_base \
--exclude_patterns '["/api/", "/_tags/"]' \
--embedding_type "huggingface" \
--limit 100
```
As we wait for this new knowledge base to build, let's dive into a few
aspects of this workflow that are worth noting.
Since we're using Union to host an Ollama server, we need to provision a GPU
to use it effectively. We do this by:
- Specifying the `requests=Resources(cpu="4", mem="24Gi", gpu="1")` argument.
- Defining the `accelerator=accelerators.T4` argument.
It's also important to know how we're setting up the `Ollama` dependencies and
server using `ImageSpec` and task decorators (sketched after this list):
- We use `with_apt_packages` to install the additional system dependencies that
  Ollama needs.
- We use `with_commands` to run additional commands that install Ollama and
  preload the llama3 model into the container.
- The `ollama_server` task function decorator starts an Ollama server to run
the RAG pipeline and tears it down when it's done.
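Putting those pieces together, the setup might look like the sketch below. The Python packages, install commands, and task body are illustrative assumptions, not the repo's actual code:

```python
from flytekit import ImageSpec, Resources, task
from flytekit.extras.accelerators import T4

image = (
    ImageSpec(packages=["langchain", "requests"])  # illustrative Python deps
    .with_apt_packages(["curl"])  # system deps needed to install Ollama
    .with_commands(
        [
            # Install Ollama at build time; the repo also preloads the model
            # so the weights are baked into the container.
            "curl -fsSL https://ollama.com/install.sh | sh",
        ]
    )
)


@task(
    container_image=image,
    requests=Resources(cpu="4", mem="24Gi", gpu="1"),
    accelerator=T4,  # provision a T4 GPU for the Ollama server
)
def ask_ollama_sketch(question: str) -> str:
    # The repo wraps this in an ollama_server decorator that starts the
    # server before the task body runs and tears it down afterwards.
    return f"(answer to {question!r})"
```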
Next, we invoke the `ask_ollama` workflow to run a RAG workflow that uses `phi3`
to answer our question:
```{python}
!union run --copy-all --remote union_rag/simple_rag.py ask_ollama --question 'what is flytekit?'
```
Why does this take a lot longer than the GPT RAG workflow?
- We're spinning up an ephemeral T4 GPU, starting an Ollama server, and then running our RAG task.
- With a regular `@task`, this resource is spun down after the task completes.
- The GPT workflow simply makes an API call from a Union CPU resource.
- We're working on ways to initialize long-running resources so we don't incur
  the additional overhead of ephemeral resources (stay tuned!).
### Customizing the prompts
As you can see, the smaller LLMs are less useful than the off-the-shelf SOTA
models like GPT that you can access via an API.
The iterative part of the RAG lifecycle is to do prompt engineering to customize
how the RAG system interprets the retrieved information to help it answer your
question.
This involves creating a [RAG Evaluation pipeline](https://huggingface.co/learn/cookbook/en/rag_evaluation),
which we won't have time to build in today's workshop. To wrap up, let's go
through how we can customize the prompts used in the LangChain pipeline
that implements our RAG system.
```{python}
!union run --copy-all --remote union_rag/simple_rag.py ask_ollama --question 'what is flytekit?' --prompt_template "$(cat PROMPT_TEMPLATE.txt)"
```
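For reference, a prompt template for the QA-with-sources chain looks roughly like this sketch; the template wording is the thing to iterate on, while the variable names (`summaries`, `question`) are what the "stuff" chain type expects:

```python
from langchain.prompts import PromptTemplate

# The QA-with-sources chain fills in {summaries} (retrieved context)
# and {question} at query time.
template = """Use the following pieces of context to answer the question.
If you don't know the answer, just say so instead of making one up.

{summaries}

Question: {question}
Helpful answer:"""

prompt = PromptTemplate(template=template, input_variables=["summaries", "question"])
print(prompt.format(summaries="flytekit is Flyte's Python SDK.", question="what is flytekit?"))
```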
## Conclusion
In this workshop, you learned how to:
- Set up a development environment on Union Serverless
- Run a RAG workflow using LangChain and GPT
- Deploy a Streamlit app that interacts with Union Serverless
- Swap out GPT for open-weights models like `all-MiniLM-L6-v2` for sentence
  embeddings and `phi3` as the response model
- Customize prompts via `PromptTemplate`s in LangChain
Thank you for attending!
## Bonus: Create a Slackbot
If you're interested in high-latency chat use cases, check out the Slack app
deployment section in the repo [README](https://github.com/unionai-oss/union-rag?tab=readme-ov-file#slack-app-deployment).