-
Notifications
You must be signed in to change notification settings - Fork 406
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: implement consistent formatting for constraint expressions #1985
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great addition!
Left only one comment just for my own understanding
let expr = into_expr(expr, &schema, &state)?; | ||
let expr_str = fmt_expr_to_sql(&expr)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this going from SQL string expr to Datafusion Expr and then back to sql string expression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes exactly.
If the provided expression is already a Datafusion expression then into_expr
will simply return it otherwise it will parse the string expression.
Either way if a String is provided or a proper DF expression it needs to be normalized somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which SQL dialect does Datafusion actually use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
postgres ... but the actual dialect expressions in the delta log should comply with is as of yet unspecified :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we put SQLglot in between on the python side then, so we can allow more flexibility there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use the generic dialect which in my mind is similar to postgres. Given we are only parsing expressions and not DML or DDL statements we should be fine.
https://docs.rs/sqlparser/latest/sqlparser/dialect/struct.GenericDialect.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we put SQLglot in between on the python side then, so we can allow more flexibility there?
I'm indifferent to that feature. My only concern is if we start exposing operations to other languages (i.e delta-core) then we should have fairly consistent interfaces.
A decision / documentation needs to be made on which sql dialect must be used when exposing expressions to the log. If I had to guess they would pick hive sql.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
Description
Implements consistent formatting for constraint expressions so something like
value < 1000
is normalized tovalue < 1000
Also includes drive by improvements.
Related Issue(s)