Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Postgres array slice syntax #1290

Merged
merged 5 commits into from
May 31, 2024
Merged

Support for Postgres array slice syntax #1290

merged 5 commits into from
May 31, 2024

Conversation

jmhain
Copy link
Contributor

@jmhain jmhain commented May 27, 2024

This is related to #1283, though I don't know what SQL dialect DataFusion uses so I'm not sure whether this alone resolves it. cc @alamb @tisonkun

Prior to 0.46, we were able to parse an expression such as select make_array(1, 2, 3)[1:2], but this was only because we were incorrectly interpreting the 1:2 as a JSON access (Snowflake syntax) of a property 2 on the value 1. (Snowflake doesn't actually allow the field name to start with a number.)

On PostgreSQL, this syntax represents an array slice access. An important detail here is that these two syntaxes are mutually exclusive, since the array slice syntax can accept arbitrary expressions for either bound. For example, array[foo:bar] on Snowflake means: "access the property of array whose name is the value of the bar field in foo", while on Postgres it means "return the subarray starting at the index from column foo until the index from the column bar".

As part of this change, I also renamed ArrayIndex to Subscript. My intention is to remove the MapAccess variant and subsume that functionality into Subscript and CompositeAccess in a follow-up, as it's entirely redundant.

@coveralls
Copy link

coveralls commented May 27, 2024

Pull Request Test Coverage Report for Build 9324507459

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 185 of 207 (89.37%) changed or added relevant lines in 5 files are covered.
  • 1466 unchanged lines in 10 files lost coverage.
  • Overall coverage decreased (-0.7%) to 88.775%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/parser/mod.rs 41 44 93.18%
src/ast/mod.rs 17 21 80.95%
tests/sqlparser_postgres.rs 107 122 87.7%
Files with Coverage Reduction New Missed Lines %
tests/sqlparser_hive.rs 2 99.01%
tests/sqlparser_sqlite.rs 4 98.18%
src/ast/ddl.rs 12 85.92%
tests/sqlparser_snowflake.rs 16 98.09%
src/ast/query.rs 67 84.1%
tests/sqlparser_bigquery.rs 131 90.07%
tests/sqlparser_common.rs 246 89.5%
tests/sqlparser_postgres.rs 262 87.83%
src/parser/mod.rs 330 92.98%
src/ast/mod.rs 396 80.85%
Totals Coverage Status
Change from base Build 9212358328: -0.7%
Covered Lines: 25513
Relevant Lines: 28739

💛 - Coveralls

@tisonkun
Copy link
Member

tisonkun commented May 28, 2024

@jmhain Cool! I'm going to test this branch on the datafusion patch.

Does this PR change user-facing APIs (ASTs) or it fixes the parsing logic internally?

@tisonkun
Copy link
Member

It's ready at apache/datafusion#10392. Seems the AST changes a non-trivial to adapt. I suppose we should treat the datafusion case as a regression test for this PR :D

@tisonkun
Copy link
Member

This patch cannot parse select make_array(1, 2, 3)[1:2:2], which is previously:

ArrayIndex {
    obj: "...",
    indexes: [
        JsonAccess {
            left: JsonAccess {
                left: Value(Number("1", false)),
                operator: Colon,
                right: Value(Number("2", false)),
            },
            operator: Colon,
            right: Value(Number("2", false)),
        },
    ],
},

@tisonkun
Copy link
Member

And we may need to pack Subscipt as a variant in Expr, so that we can translate it into a logical plan like ListSlices. The planner method generally accepts an Expr.

Comment on lines +748 to 752
/// An access of nested data using subscript syntax, for example `array[2]`.
Subscript {
expr: Box<Expr>,
subscript: Box<Subscript>,
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the last comment, datafusion has a method:

    fn plan_indices(
        &self,
        expr: SQLExpr,
        schema: &DFSchema,
        planner_context: &mut PlannerContext,
    ) -> Result<GetFieldAccess> {

where takes only the subscript part. And:

    fn plan_indexed(
        &self,
        expr: Expr,
        mut keys: Vec<SQLExpr>,
        schema: &DFSchema,
        planner_context: &mut PlannerContext,
    ) -> Result<Expr> {

where the keys is the subscript part (vec![subscript]). The logic is shared with MapAccess so I don't think it's easy to refactor.

Copy link
Member

@tisonkun tisonkun May 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may still have:

    ArrayIndex {
        obj: Box<Expr>,
        indexes: Vec<Expr>,
    },

with a new variant:

Subscript { ... } // [1] or [1:2] or [1:2:3]

that becomes the indexes part.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have an idea how to fix this -- working on it

@tisonkun
Copy link
Member

tisonkun commented May 30, 2024

Another previously pattern: select make_array(1, 2, 3)[1:2][2] now panic with: called `Result::unwrap()` on an `Err` value: ParserError("Expected variant object key name, found: 2 at Line: 1, Column 30").

It's formerly parsed as:

ArrayIndex {
    obj: "Function ...",
    indexes: [
        JsonAccess {
            left: Value(Number("1", false)),
            operator: Colon,
            right: Value(Number("2", false)),
        },
        Value(Number("2", false)),
    ],
},

@alamb
Copy link
Contributor

alamb commented May 30, 2024

Sorry -- I accidentally pushed the commit 4df6dc8 directly to this branch. We can revert it if we don't like it it.

I am going to see if I can make datafusion work with this

@jmhain
Copy link
Contributor Author

jmhain commented May 30, 2024

@alamb No worries, thank you for taking care of this! Quick question: is this stride syntax DataFusion-specific? If so, it'd be good for us to add some comments as such because currently the code makes it seem like this is a Postgres feature.

@alamb
Copy link
Contributor

alamb commented May 30, 2024

This seems to have been enough to unblock our upgrade here: apache/datafusion#10392

@alamb No worries, thank you for taking care of this! Quick question: is this stride syntax DataFusion-specific? If so, it'd be good for us to add some comments as such because currently the code makes it seem like this is a Postgres feature.

I was going to say "I hope not as DataFusion tries not to invest new syntax".... But I think it might be 🤦 -- I took a quick google around and couldn't find any other databases that permit the stride syntax. I'll add a note to this PR

@alamb
Copy link
Contributor

alamb commented May 31, 2024

@jmhain are you ok with the approach in this PR? If so I will merge it in to main (and likely see if I can release an incremental update to sqlparser with just this change)

@tisonkun
Copy link
Member

@alamb do you test this kind of SQL select make_array(1, 2, 3)[1:2][2]?

I don't find cases to guard regressions on slices combo.

@jmhain
Copy link
Contributor Author

jmhain commented May 31, 2024

@tisonkun There's this double slice round-trip test and iirc there are a few existing tests with multiple subscripts.

@alamb Yeah this is all good with me. It is an API break though, so I don't think we can get away with a version number like 0.46.1.

@alamb
Copy link
Contributor

alamb commented May 31, 2024

@tisonkun There's this double slice round-trip test and iirc there are a few existing tests with multiple subscripts.

I also added an explicit test in def916c

@alamb Yeah this is all good with me. It is an API break though, so I don't think we can get away with a version number like 0.46.1.

Agreed -- I'll just make a normal release maybe

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @jmhain and @tisonkun

@alamb alamb merged commit afa5f08 into apache:main May 31, 2024
10 checks passed
@jmhain jmhain deleted the subscript branch May 31, 2024 21:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants