You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is there an existing issue for the same feature request?
I have checked the existing issues.
Is your feature request related to a problem?
We need a plugin framework in SQL that allows to execute a custom plugin executable that can manipulate the data from (datalink or string) in database. The result generated by the plugin can then be shown as a result in SQL.
For application,
1. generate embedding from LLM vector with source data (datalink or string) stored in database and then insert embedding into the table in database.
2. perform RAG application ask() function to LLM in SQL. You can create a plugin executable to implement the ask function. The plugin will do the following:
a. Read the question from stdin
b. generate the embedding of the question from LLM
c. perform SQL with the LLM index from our database to get the N-best results
d. send N-best results and question to LLM to get the RAG answer
e. send the RAG answer to database
f. show the answer as result
Describe the feature you'd like
Plugin is a framework to run an executable with input data (datalink or string) from stdin and
output must be a JSON array
Function signature
Plugin needs two input columns
command line with arguments in JSON, e.g. ["cat", "filename"]. Note: file path must be an absolute path
input data with datalink or string. With offset and size parameter in datalink, only portion
of data is read and send to stdin.
Return:
output buffer in JSON Array format
Create Embedding in Database
One of the example is wikipedia dumps. Wiki dump is a multisteam file which combine multiple bzip2
data chunks and its index file has the (offset, ID, title). The advantage is we don't need to unpack
the whole file before getting the required data.
Also, for LLM application, you can run the table_function plugin_exec() to convert the data from datalink
into the embedding by calling external LLM vendor.
All you have to do it to write a plugin executable in any language that
read the data from stdin, unzip data with bzip2, parse xml to get all pages
For each page, cut into multiple chunks
send each chunk to LLM to convert into embedding
format list of chunk embedding into JSON Array format
Write the JSON to stdout
Insert the embedding into table with the JSON array returned from stdout
To do this, simple run the SQL like this:
INSERT INTO t1 SELECT json_unquote(json_extract(result, '$.chunk')), json_unquote(json_extract(result, '$.e')) from
plugin_exec('["python3", "gen_embedding.py", "arg1", "arg2"]', cast('stage://mys3/wiki.bz2?offset=0&size=123456' as datalink)) as f;
Example,
create stage llmstage URL='file:///$resources/plugin/';
create table src (pkey int primary key, dlink datalink);
create table embed (pkey int, chunk int, e vecf32(3), t varchar);
insert into src values
(0, 'stage://llmstage/multistream.json?offset=0&size=155'),
(1, 'stage://llmstage/multistream.json?offset=155&size=164');
insert into embed select src.pkey, json_unquote(json_extract(f.result, "$.chunk")),
json_unquote(json_extract(f.result, "$.e")), json_unquote(json_extract(f.result, "$.t"))
from src CROSS APPLY plugin_exec('["cat"]', src.dlink) as f;
select * from embed;
pkey chunk e t
0 0 [1.1, 2.2, 3.3] text0
0 1 [2.5, 3.5, 4.5] text1
0 2 [9.1, 8.2, 7.3] text2
1 3 [10.1, 20.2, 30.3] text3
1 4 [20.5, 30.5, 40.5] text4
1 5 [90.1, 80.2, 70.3] text5
drop stage llmstage;
drop table src;
drop table embed;
LLM Ask function
You can also do ASK function with plugin "ask" command like this:
SELECT * FROM plugin_exec('["ask", "index_table"]', 'this is a question?') as f;
the plugin ask do the following:
get the question from stdin
get the embedding of the question from LLM
query the database with index_table. SELECT * from index_table order by L2_DISTANCE(e, "[1.3,2.3,3.4]") LIMIT 3;
send the result from database to LLM and get the answer
return answer to stdout
Example
select * from plugin_exec('["python3", "$resources/plugin/ask.py", "index_table"]', 'this is question to LLM') as f;
result
"this is answer from LLM with index table name = index_table. Question: this is question to LLM"
Describe implementation you've considered
No response
Documentation, Adoption, Use Case, Migration Strategy
No response
Additional information
No response
The text was updated successfully, but these errors were encountered:
Is there an existing issue for the same feature request?
Is your feature request related to a problem?
Describe the feature you'd like
Plugin is a framework to run an executable with input data (datalink or string) from stdin and
output must be a JSON array
Function signature
Plugin needs two input columns
of data is read and send to stdin.
Return:
output buffer in JSON Array format
Create Embedding in Database
One of the example is wikipedia dumps. Wiki dump is a multisteam file which combine multiple bzip2
data chunks and its index file has the (offset, ID, title). The advantage is we don't need to unpack
the whole file before getting the required data.
Also, for LLM application, you can run the table_function plugin_exec() to convert the data from datalink
into the embedding by calling external LLM vendor.
All you have to do it to write a plugin executable in any language that
To do this, simple run the SQL like this:
INSERT INTO t1 SELECT json_unquote(json_extract(result, '$.chunk')), json_unquote(json_extract(result, '$.e')) from
plugin_exec('["python3", "gen_embedding.py", "arg1", "arg2"]', cast('stage://mys3/wiki.bz2?offset=0&size=123456' as datalink)) as f;
Example,
LLM Ask function
You can also do ASK function with plugin "ask" command like this:
SELECT * FROM plugin_exec('["ask", "index_table"]', 'this is a question?') as f;
the plugin ask do the following:
Example
Describe implementation you've considered
No response
Documentation, Adoption, Use Case, Migration Strategy
No response
Additional information
No response
The text was updated successfully, but these errors were encountered: