forked from spiceai/docs
-
Notifications
You must be signed in to change notification settings - Fork 0
/
spicepod.yaml
executable file
·177 lines (161 loc) · 6.7 KB
/
spicepod.yaml
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
version: v1beta1
kind: Spicepod
name: spice-oss-docs
datasets:
- from: github:github.com/spiceai/docs/files/trunk
name: spiceai.docs
description: Spice.ai OSS documentation and reference, from https://docs.spiceai.org
metadata:
instructions: |
Documents are stored in Markdown. Always provide citations.
When generating reference links for docs, use https://docs.spiceai.org/<docs_path> as a template
Exclude `spiceaidocs/docs` prefix from docs path and `.md` file extension.
Also replace `/index.md` with `/` in the path.
reference_base_url: https://docs.spiceai.org/<docs_path>
params:
file_format: md
github_client_id: ${secrets:GITHUB_CLIENT_ID}
github_private_key: ${secrets:GITHUB_PRIVATE_KEY}
github_installation_id: ${secrets:GITHUB_INSTALLATION_ID}
include: "spiceaidocs/docs/**/*.md"
acceleration:
enabled: true
refresh_check_interval: 4h
refresh_jitter_enabled: true
refresh_jitter_max: 30m
columns:
- name: content
embeddings:
- from: openai_embeddings
row_id:
- path
chunking:
enabled: true
target_chunk_size: 512
overlap_size: 128
trim_whitespace: true
- from: github:github.com/spiceai/samples/files/trunk
name: spiceai.samples
description: Spice.ai OSS samples
metadata:
instructions: Documents are stored in Markdown. Always provide citations.
reference_base_url: https://github.com/spiceai/samples/tree/trunk/<sample_path>
params:
github_client_id: ${secrets:GITHUB_CLIENT_ID}
github_private_key: ${secrets:GITHUB_PRIVATE_KEY}
github_installation_id: ${secrets:GITHUB_INSTALLATION_ID}
include: "**/*.md"
acceleration:
enabled: true
refresh_check_interval: 4h
refresh_jitter_enabled: true
refresh_jitter_max: 30m
columns:
- name: content
embeddings:
- from: openai_embeddings
row_id:
- path
chunking:
enabled: true
target_chunk_size: 512
overlap_size: 128
trim_whitespace: true
- from: github:github.com/spiceai/quickstarts/files/trunk
name: spiceai.quickstarts
description: Spice.ai OSS quickstarts
metadata:
instructions: Documents are stored in Markdown. Always provide citations.
reference_base_url: https://github.com/spiceai/quickstarts/tree/trunk/<quickstart_path>
params:
file_format: md
github_client_id: ${secrets:GITHUB_CLIENT_ID}
github_private_key: ${secrets:GITHUB_PRIVATE_KEY}
github_installation_id: ${secrets:GITHUB_INSTALLATION_ID}
include: "**/*.md"
acceleration:
enabled: true
refresh_check_interval: 4h
refresh_jitter_enabled: true
refresh_jitter_max: 30m
columns:
- name: content
embeddings:
- from: openai_embeddings
row_id:
- path
chunking:
enabled: true
target_chunk_size: 512
overlap_size: 128
trim_whitespace: true
- from: github:github.com/spiceai/blog/files/trunk
name: spiceai.blog
description: Spice.ai OSS blog posts
metadata:
instructions: |
This dataset provides access to the Spice.ai OSS project blog posts in Markdown format. The content is sourced from the Spice.ai OSS blog repository at https://github.com/spiceai/blog.
reference_base_url: https://github.com/spiceai/blog/tree/trunk/content/posts/<post_path>
params:
file_format: md
github_client_id: ${secrets:GITHUB_CLIENT_ID}
github_private_key: ${secrets:GITHUB_PRIVATE_KEY}
github_installation_id: ${secrets:GITHUB_INSTALLATION_ID}
include: "content/posts/**/*.md"
acceleration:
enabled: true
refresh_check_interval: 4h
refresh_jitter_enabled: true
refresh_jitter_max: 30m
columns:
- name: content
embeddings:
- from: openai_embeddings
row_id:
- path
chunking:
enabled: true
target_chunk_size: 512
overlap_size: 128
trim_whitespace: true
- from: github:github.com/spiceai/spiceai/issues
name: spiceai.issues
description: Spice.ai OSS issues from https://github.com/spiceai/spiceai/issues
params:
github_client_id: ${secrets:GITHUB_CLIENT_ID}
github_private_key: ${secrets:GITHUB_PRIVATE_KEY}
github_installation_id: ${secrets:GITHUB_INSTALLATION_ID}
acceleration:
enabled: true
refresh_check_interval: 4h
refresh_jitter_enabled: true
refresh_jitter_max: 30m
embeddings:
- from: openai
name: openai_embeddings
params:
openai_api_key: ${ secrets:OPENAI_API_KEY }
models:
- name: openai
from: openai:gpt-4o
params:
tools: auto
openai_api_key: ${secrets:OPENAI_API_KEY}
system_prompt: |
You are an AI assistant assisting engineers with the Spice.ai OSS Project.
Always strive to be accurate, concise, and helpful in your responses.
Apply instructions and reference_base_url metadata from the datasets to provide accurate and relevant information.
Prefer "docs" dataset for documentation and reference information questions.
Prefer "samples" and "quickstarts" datasets for use cases, sample code, and configuration questions. Always include links to relevant samples or quickstarts.
Use the SQL tool (sql_query) when:
1. The query involves precise numerical data, statistics, or aggregations.
2. The user asks for specific counts, sums, averages, or other calculations.
3. The query requires joining or comparing data from multiple related tables.
If the SQL tool returns a query, syntax, or planning error, call the `list_datasets` tool to get the available tables and continue to refine and retry the query until it succeeds. If the query fails after 5 attempts, on each subsequent run `EXPLAIN <query>` to better understand what went wrong. If it continues to fail after 10 attempts, fall back to other available tools.
When returning results from datasets, always provide citations and reference links if possible.
Use the document search tool when:
1. The query is about unstructured text information, such as policies, reports, or articles.
2. The user is looking for qualitative information or explanations.
3. The query requires understanding context or interpreting written content.
General guidelines:
1. If a query could be answered by either tool, prefer SQL for more precise, quantitative answers.