Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let virtual table implementations know which SQL string is being executed #87

Merged
merged 4 commits into from
Dec 21, 2022

Conversation

lucasvr
Copy link
Contributor

@lucasvr lucasvr commented Dec 12, 2022

SQLite provides an interface to expose different data sources (such as files or other database systems) as virtual tables. There are several callbacks one must implement to enable e.g., connection handling and pagination, but one key attribute is missing on all callbacks: the query string being executed by the application.

The query string is especially relevant when virtual tables are used to mirror data from external databases hosted on different machines. A virtual table implementation that knows about the SQL string is able to reduce the amount of data transferred over the network when an application is only concerned about a subset of the columns.

This patch adds a new opcode VPrepareSql that's called immediately before VFilter when dispatching queries to a virtual table. The new opcode triggers the execution of the new xPrepareSql method which provides the SQL text to the virtual table implementation.

I am open to suggestions regarding the implementation approach and the new method's name. I'm looking forward to reading your feedback. Thanks!

This commit adds a new method named xPrepareSql to sqlite3_module and an
associated opcode VPrepareSql. That opcode is emitted immediately before
a VFilter opcode when querying virtual tables.

xPrepareSql takes two arguments: a pointer to the virtual table's cursor
and, missing from existing sqlite3_module methods, the SQL string being
executed by the application.

Prior to the introduction of this method, a virtual table that mirrored
a remote table had no way to tell which columns of the remote server
were requested by the application. Therefore, such an implementation
would typically `SELECT *` from the remote table and let callbacks such
as xColumn() and xNext() decide which data to send back to the
application. The problem with this approach is that queries that SELECT
just a subset of the remote columns trigger the transfer of data over
the network even though they won't be used.

xPrepareSql gives an opportunity for the virtual table implementation to
inspect the query string and selectively choose which columns from the
remote server to pull and cache locally.

Because this change introduces a new method to sqlite3_module, new
modules wishing to implement a callback for xPrepareSql must declare
a module version (iVersion) 4 or above.
@psarna psarna self-requested a review December 12, 2022 19:47
Copy link
Collaborator

@psarna psarna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Left a few minor comments.

src/sqlite.h.in Outdated
@@ -7076,6 +7076,8 @@ struct sqlite3_module {
/* The methods above are in versions 1 and 2 of the sqlite_module object.
** Those below are for version 3 and greater. */
int (*xShadowName)(const char*);
/** The methods below are for version 4 and greater. */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's going to get ambiguous if/when SQLite adds some other field and bump the current version to 4, and we'd like to keep libSQL compatible for as long as possible. There's no perfect solution here, but in an (unmerged yet) pull request I used version "700", to indicate that it's a libSQL extension, and it makes the collision less likely to happen in the next few years. Perhaps the same trick could be applied here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. HDF5 adopts a similar approach when dealing with I/O filters: a certain range of numbers is reserved for filters contributed by the community. I will follow your suggestion and will check for iVersion >= 700.

src/vdbe.c Outdated
}
}

if( rc ) goto abort_due_to_error;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line looks redundant - it's already checked in line 8101, and line 8100 is the only one in this block that assigns to rc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, indeed. Thanks for spotting that!

src/vdbe.c Outdated
if( pModule->iVersion>=4 ){
if( pModule->xPrepareSql && p->zSql ){
rc = pModule->xPrepareSql(pVCur, p->zSql);
if( rc ) goto abort_due_to_error; // TODO set and test error msg
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you elaborate on the TODO? As long as the implementation of xPrepareSql assigns a proper error code, it will be properly handled and printed in the code that follows abort_due_to_error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was a reminder to test the error path, but I ended up leaving it behind by mistake. I will remove it.

src/vdbe.c Outdated

/* Invoke the xPrepareSql method */
if( pModule->iVersion>=4 ){
if( pModule->xPrepareSql && p->zSql ){
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question - in which cases is p->zSql null? I wonder if the responsibility for checking and handling NULL shouldn't be pushed to the implementation of xPrepareSql - perhaps somebody would like to do something specific even for cursors that don't have the SQL text?

This is a genuine question, as I haven't implemented any virtual SQLite tables in my life (: if it doesn't make sense, then feel free to leave the code as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added that as a sanity check. A NULL value for p->zsql would be possible when handling the Init or Trace opcodes, but that is not expected when processing a valid SQL statement.

@psarna
Copy link
Collaborator

psarna commented Dec 19, 2022

@lucasvr just to prevent this PR from going stale - would you like us to take over and apply the fixes or would you prefer working on v2 yourself?

@lucasvr
Copy link
Contributor Author

lucasvr commented Dec 19, 2022

@lucasvr just to prevent this PR from going stale - would you like us to take over and apply the fixes or would you prefer working on v2 yourself?

Apologies for the late response! Let me work on v2 - I will send you an updated PR in the next few hours. Also, I appreciate your review and comments -- thanks for that!

This commit also removes a comment that was left over on the previous
changeset.
Following @psarna's advice, use iVersion >= 700 for
community-contributed methods.
This name change is motivated to prevent misunderstandings: the callback
is not meant to output a prepared SQL statement. Rather, the callback is
being informed of the SQL statement prepared by SQLite.
@lucasvr
Copy link
Contributor Author

lucasvr commented Dec 19, 2022

@psarna I've added 2 commits on top of the original one that address your comments. I have also introduced a name change: the method is now calledPreparedSql (as opposed to PrepareSql) to prevent misunderstandings about what the callback is supposed to do. Thanks again for your prompt review.

Copy link
Collaborator

@psarna psarna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks good to me. @penberg could you take a cursory look just to have a second pair of eyes on it? And then we can go ahead and merge

@penberg penberg merged commit 7beaa40 into tursodatabase:main Dec 21, 2022
@penberg
Copy link
Collaborator

penberg commented Dec 21, 2022

Merged, thanks! @lucasvr @psarna perhaps a note about this in doc/libsql_extensions.md is in order?

@asg017
Copy link

asg017 commented Dec 21, 2022

For reference, the current SQLite vtab implementation does already give information about how the table is being queried, in the xBestIndex method.

Specifically, the sqlite3_index_info.colUsed field states which columns are requested (ex "only these 2 out of 10 columns are requested"), which can be used for projection pushdowns. Also the sqlite3_index_info.aConstraint[] fields state all the WHERE filters that are applied to the query (ex "only returns rows with created_at >= '2022-01-01'"), which can be use for predicate pushdowns.

It's not the friendliest API, but I find dealing with those fields much easier than parsing raw SQL. It also doesn't handle GROUP BY aggregates well, ORDER BY is awkward, and only the first 63 columns are mentioned in colUsed.

@lucasvr
Copy link
Contributor Author

lucasvr commented Dec 21, 2022

@asg017 xBestIndex is indeed helpful. In my particular case I wanted to use SQLite as a trampoline to an external database, so forwarding the raw SQL query string made sense. I wasn't aware of the limitations of xBestIndex that you've mentioned, though -- thanks for sharing!

@lucasvr
Copy link
Contributor Author

lucasvr commented Dec 21, 2022

Merged, thanks! @lucasvr @psarna perhaps a note about this in doc/libsql_extensions.md is in order?

Will do. I'll send a new PR soon.

lucasvr added a commit to lucasvr/libsql that referenced this pull request Dec 22, 2022
Enhance documentation by describing the new xPreparedSql callback for
virtual tables. Refs tursodatabase#87
psarna pushed a commit that referenced this pull request Dec 23, 2022
Enhance documentation by describing the new xPreparedSql callback for
virtual tables. Refs #87
MarinPostma pushed a commit to MarinPostma/libsql that referenced this pull request Jan 2, 2023
Enhance documentation by describing the new xPreparedSql callback for
virtual tables. Refs tursodatabase#87
MarinPostma pushed a commit that referenced this pull request Oct 17, 2023
Fix data row serialization for the PostgreSQL wire protocol:

- Send column values as part of a data row instead a data row per
  column.
- Fix column name and type information in row description.

Refs #87
MarinPostma added a commit that referenced this pull request Oct 17, 2023
87: box parser r=psarna a=MarinPostma

This pr puts the parser on the heap. It's quite chunky, and caused stackoverflows in the past. I'm working to get a nicer patch to upstream lemon-rs, but it requires more work for somilar effects.

I also remove the dependency on my patch


Co-authored-by: ad hoc <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants