Support for Postgres array slice syntax #1290
@jmhain Cool! I'm going to test this branch on the datafusion patch. Does this PR change user-facing APIs (ASTs), or does it only fix the parsing logic internally?
It's ready at apache/datafusion#10392. It seems the AST changes are non-trivial to adapt to. I suppose we should treat the DataFusion case as a regression test for this PR :D
This patch cannot parse
And we may need to pack
/// An access of nested data using subscript syntax, for example `array[2]`.
Subscript {
    expr: Box<Expr>,
    subscript: Box<Subscript>,
},
For the last comment, DataFusion has a method:
fn plan_indices(
    &self,
    expr: SQLExpr,
    schema: &DFSchema,
    planner_context: &mut PlannerContext,
) -> Result<GetFieldAccess> {
which takes only the subscript part. And:
fn plan_indexed(
    &self,
    expr: Expr,
    mut keys: Vec<SQLExpr>,
    schema: &DFSchema,
    planner_context: &mut PlannerContext,
) -> Result<Expr> {
where keys is the subscript part (vec![subscript]). The logic is shared with MapAccess, so I don't think it's easy to refactor.
We may still have:
ArrayIndex {
    obj: Box<Expr>,
    indexes: Vec<Expr>,
},
with a new variant:
Subscript { ... } // [1] or [1:2] or [1:2:3]
that becomes the indexes part.
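To make the shape under discussion concrete, here is a minimal, self-contained sketch of an expression type with a subscript access and a Display implementation that round-trips the bracket syntax. It is an illustration only, not sqlparser's actual definitions: the three-variant Expr is a stand-in, and the variant names Index and Slice and the field names lower, upper, and stride are assumptions made for this example.

```rust
use std::fmt;

// Hypothetical stand-in for the real expression type; it has only
// the variants this sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Identifier(String),
    Number(i64),
    // An access such as `arr[2]` or `arr[1:2]`.
    Subscript { expr: Box<Expr>, subscript: Box<Subscript> },
}

// Assumed shape for the part between the brackets:
// `[i]`, `[lower:upper]`, or `[lower:upper:stride]`.
#[derive(Debug, Clone, PartialEq)]
pub enum Subscript {
    Index(Expr),
    Slice {
        lower: Option<Expr>,
        upper: Option<Expr>,
        stride: Option<Expr>,
    },
}

impl fmt::Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Identifier(name) => write!(f, "{}", name),
            Expr::Number(n) => write!(f, "{}", n),
            Expr::Subscript { expr, subscript } => write!(f, "{}[{}]", expr, subscript),
        }
    }
}

impl fmt::Display for Subscript {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Subscript::Index(index) => write!(f, "{}", index),
            Subscript::Slice { lower, upper, stride } => {
                // Omitted bounds print as nothing, so `[:2]` and `[1:]` survive.
                if let Some(lower) = lower {
                    write!(f, "{}", lower)?;
                }
                write!(f, ":")?;
                if let Some(upper) = upper {
                    write!(f, "{}", upper)?;
                }
                // The second colon is only printed when a stride is present.
                if let Some(stride) = stride {
                    write!(f, ":{}", stride)?;
                }
                Ok(())
            }
        }
    }
}
```

Keeping the index and slice forms in one enum means a consumer that plans subscripts can match on a single type instead of inspecting a list of expressions.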
I have an idea how to fix this -- working on it
Another previous pattern: it was formerly parsed as:
Sorry -- I accidentally pushed the commit 4df6dc8 directly to this branch. We can revert it if we don't like it. I am going to see if I can make datafusion work with this
@alamb No worries, thank you for taking care of this! Quick question: is this stride syntax DataFusion-specific? If so, it'd be good for us to add some comments as such because currently the code makes it seem like this is a Postgres feature.
This seems to have been enough to unblock our upgrade here: apache/datafusion#10392
I was going to say "I hope not as DataFusion tries not to invent new syntax".... But I think it might be 🤦 -- I took a quick google around and couldn't find any other databases that permit the stride syntax. I'll add a note to this PR
@jmhain are you ok with the approach in this PR? If so I will merge it into main (and likely see if I can release an incremental update to sqlparser with just this change)
@alamb Do you test this kind of SQL? I can't find cases that guard against regressions on slice combinations.
I also added an explicit test in def916c
Agreed -- I'll just make a normal release maybe
This is related to #1283, though I don't know what SQL dialect DataFusion uses so I'm not sure whether this alone resolves it. cc @alamb @tisonkun
Prior to 0.46, we were able to parse an expression such as select make_array(1, 2, 3)[1:2], but this was only because we were incorrectly interpreting the 1:2 as a JSON access (Snowflake syntax) of a property 2 on the value 1. (Snowflake doesn't actually allow the field name to start with a number.)
On PostgreSQL, this syntax represents an array slice access. An important detail here is that these two syntaxes are mutually exclusive, since the array slice syntax can accept arbitrary expressions for either bound. For example, array[foo:bar] on Snowflake means: "access the property of array whose name is the value of the bar field in foo", while on Postgres it means "return the subarray starting at the index from column foo until the index from the column bar".
As part of this change, I also renamed ArrayIndex to Subscript. My intention is to remove the MapAccess variant and subsume that functionality into Subscript and CompositeAccess in a follow-up, as it's entirely redundant.
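As a rough illustration of the distinction described above, the sketch below classifies the text between the brackets as either a single index or a slice. It is a simplified stand-in, not the parser in this PR: it splits a string on colons and accepts only integer bounds, whereas the real parser works on tokens and allows arbitrary expressions for each bound. The function name parse_subscript and the Subscript shape are assumptions made for this example, and the third (stride) component is included only because the discussion above mentions it, where it appears to be DataFusion-specific rather than Postgres syntax.

```rust
// Hypothetical result type for this sketch; bounds are plain integers here.
#[derive(Debug, PartialEq)]
pub enum Subscript {
    Index(i64),
    Slice {
        lower: Option<i64>,
        upper: Option<i64>,
        stride: Option<i64>,
    },
}

/// Parses the text between `[` and `]`, e.g. "2", "1:2", ":2", or "1:2:3".
pub fn parse_subscript(body: &str) -> Result<Subscript, String> {
    // Parses one component; an empty component means the bound was omitted.
    fn bound(s: &str) -> Result<Option<i64>, String> {
        if s.is_empty() {
            Ok(None)
        } else {
            s.parse::<i64>()
                .map(Some)
                .map_err(|e| format!("invalid bound {:?}: {}", s, e))
        }
    }

    let parts: Vec<&str> = body.split(':').map(str::trim).collect();
    match parts.as_slice() {
        // No colon: a plain index, which must be present.
        [index] => match bound(index)? {
            Some(i) => Ok(Subscript::Index(i)),
            None => Err("empty subscript".to_string()),
        },
        // One colon: a slice with optional lower and upper bounds.
        [lower, upper] => Ok(Subscript::Slice {
            lower: bound(lower)?,
            upper: bound(upper)?,
            stride: None,
        }),
        // Two colons: a slice with an optional stride as well.
        [lower, upper, stride] => Ok(Subscript::Slice {
            lower: bound(lower)?,
            upper: bound(upper)?,
            stride: bound(stride)?,
        }),
        _ => Err(format!("too many ':' in subscript {:?}", body)),
    }
}
```

With this shape, the body 1:2 from make_array(1, 2, 3)[1:2] becomes a slice with both bounds set and no stride, rather than a property access.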