-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Isthmus: Improve the API for Substrait to Sql #276
Comments
It predates my time working with the library, but it's been on my list of thing to address. I don't think there's a good reason for As for the API in public RelRoot convert(Root plan) { From the RelRoot docs, the The SubstraitRelVisitor (which converts from Calcite to Substrait) also does the wrong thing here IMO and throws away the RelRoot information: substrait-java/isthmus/src/main/java/io/substrait/isthmus/SubstraitRelVisitor.java Lines 382 to 389 in 8537dca
These should return a Substrait
I think it should be instance based, because I would like to see more configuration for the conversion. I'm actually not super happy with the SQL conversion API as is, and aside from it being useful as an internal utility for fast testing I'm not sure how useful it is generally. Core Isthmus can handle converting from Calcite into Substrait (via the poorly named SubstraitRelVisitor) or from Substrait to Calcite (using SubstraitToCalcite). The SQL stuff feels tacked on and it's also not specific to Isthmus (or Substrait) because it should (ideally) only depend on Calcite if we layer the APIs correctly. |
Thanks @vbarua - admit that I would agree with your views and opinions; things aren't quite lined-up. It will have evolved overtime as the standards and requirements have emerged. I'll look in more detail at the UPDATED: Add this as an update based on more digging
However.... for the next step getting back to SQL, not so easy. And I think this is more Calcite than Substrait. There's no support in |
Background: this came out of our recent work using Substrait - and found that converting to SQL was bit cumbersome. I've outlined a suggestion below based on the code we've got. The implementation is the first pass and fully expect to get suggestions for improvement. More than happy to submit a PR for this and see it through to completion. Thanks!
SubstraitToSql
This currently works at the Relation level not the Plan Level. As such the final SQL statement doesn't contain the expected final names of the columns. What was a
select id as identifier
is lost to become aselect id as $f1
The proposal here is work at the plan level where this information can be added via modifying the final relational. The information is in the plan and therefore could be used.
This would give symmetry to the APIs - and make them easier to use externally. Having to drop one level in the proto structure is cumbersome.
The downside of the approach is that in some cases calcite is creating an additional project relation. Have attempted to rationalise this via optimisers, but it felt better to leave the optimisation out. And get an SQL statement that was correct.
On the API structure..
Should the API for Isthmus be
io.substrait.plan.Plan
orio.substrait.proto.Plan
. As this is an external API, then it would be useful to have consistency - especially with the return types that can't be overloaded? SqltoSubstrait is currently returning the protobuf versions. Is there a reason that Isthmus does not return theio.substrait.plan.Plan
versions - this would seem to be the higher-level API?Code
Add to the
SubtraitToCalcite
classIn SubstraitToSql, this new
convert()
is then called by an updatedtoSql
Externally this would look like
Notes:
io.substrait.proto
or the high-levelio.substrait.plan
calls?The text was updated successfully, but these errors were encountered: