Roadmap of being a "gateway" for other data sources #8386

Open
1 of 3 tasks
jerryleooo opened this issue Jun 25, 2021 · 6 comments


@jerryleooo
Member

jerryleooo commented Jun 25, 2021

First, thanks so much to the community for all the hard work. As we all know, Trino is "SQL on Everything", but I think some work remains before it can serve as a general query engine or "gateway" for other data sources. Since I could not find an overall plan for this, I am creating this issue as a roadmap to track progress, and all ideas are welcome.

The important features, in my opinion, include:

  • trino-parser can tell whether the function it encounters belongs to Trino or to a data source. The benefit is that we could use data-source-specific functions that Trino does not support. I am not sure whether "Function and type namespaces" (#8) covers this; if not, a namespace approach could be considered: when we write druid.func1(c1), Trino knows the function belongs to the Druid data source, so it can stop parsing, keep the expression as-is, and later push it down to Druid.

  • Pushdown. As far as I can see, "Allow connectors to participate in query optimization" (#18) covers general pushdown, and "ConnectorExpression pushdown" (#7994) is specific to function pushdown.

  • Dynamic filtering. With dynamic filtering enabled, many full scans can be avoided. I have a PR for JDBC dynamic filtering, "Enable dynamic filtering in JDBC connector" (#8137), and I hope the community can give more feedback on it.
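
To make the dynamic filtering point concrete, here is a minimal sketch of the general idea: join keys collected at runtime from the build side are turned into a predicate on the query sent to the remote source, so the remote side no longer needs a full scan. The class, the method, the `orders`/`customer_id` names, and the `MAX_IN_LIST_SIZE` threshold are all hypothetical; this is not the code from #8137 or from any Trino connector.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, NOT Trino or JDBC connector code: it only illustrates how
// join keys collected at runtime could narrow the query sent to a remote source.
final class DynamicFilterSketch {
    // Assumed limit: above this many distinct keys, send a min/max range instead.
    static final int MAX_IN_LIST_SIZE = 3;

    static String buildRemoteQuery(String table, String column, List<Long> buildSideKeys) {
        String base = "SELECT * FROM " + table;
        List<Long> keys = buildSideKeys.stream().distinct().sorted().collect(Collectors.toList());
        if (keys.isEmpty()) {
            // An inner join against an empty build side can match no rows.
            return base + " WHERE 1 = 0";
        }
        if (keys.size() > MAX_IN_LIST_SIZE) {
            // keys is sorted, so the first and last entries are the min and max.
            return base + " WHERE " + column + " BETWEEN " + keys.get(0) + " AND " + keys.get(keys.size() - 1);
        }
        String inList = keys.stream().map(k -> Long.toString(k)).collect(Collectors.joining(", "));
        return base + " WHERE " + column + " IN (" + inList + ")";
    }

    public static void main(String[] args) {
        System.out.println(buildRemoteQuery("orders", "customer_id", Arrays.asList(7L, 3L, 7L)));
    }
}
```

The range fallback trades precision for a bounded predicate size; a real connector would also need to bind values as parameters rather than concatenate them into SQL.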

Forgive me if I have used the wrong terms, misunderstood something, or missed earlier discussions; please point them out!

@kokosing
Member

> trino-parser can tell whether the function it encounters belongs to Trino or to a data source

If the parser finds a function, then it is surely a Trino function. Where possible, we can then try to push down the execution of that function to the remote data source. Connectors currently support this for aggregation functions only.

@jerryleooo
Member Author

Hi @kokosing, thanks for the reply.

> If the parser finds a function, then it is surely a Trino function.

I know that is the current situation, but from #8140 (comment) I think there is a future plan to support datasource-specific functions? If so, how about adding function namespaces to make the parser's job easier?
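
A minimal sketch of what the namespace idea could look like, assuming a function name is qualified with a catalog prefix such as `druid.`: unqualified names resolve to Trino built-ins, and a known prefix marks the function as owned by that data source. The class, the catalog set, and the `"trino"` owner label are hypothetical and only illustrate the proposal; nothing here reflects how Trino resolves functions today.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical resolver, NOT Trino code: illustrates the proposed namespace idea only.
final class FunctionNamespaceSketch {
    // Assumed set of configured catalogs that may own functions.
    static final Set<String> CATALOGS = new HashSet<>(Arrays.asList("druid", "mysql"));

    // Returns "trino" for unqualified (built-in) names, otherwise the owning catalog.
    static String ownerOf(String functionName) {
        int dot = functionName.indexOf('.');
        if (dot < 0) {
            return "trino";
        }
        String prefix = functionName.substring(0, dot);
        if (!CATALOGS.contains(prefix)) {
            throw new IllegalArgumentException("Unknown function namespace: " + prefix);
        }
        return prefix;
    }

    public static void main(String[] args) {
        System.out.println(ownerOf("druid.func1")); // owned by the Druid catalog
        System.out.println(ownerOf("lower"));       // unqualified, so a Trino built-in
    }
}
```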

> Connectors currently support this for aggregation functions only.

I know that is the current situation, but from #7994 I think there is an effort to push other functions down?

@kokosing
Member

> I think there is a future plan to support datasource-specific functions?

Not exactly. It depends on the connector: if a Trino function has an equivalent in the remote data source, we can try to push its execution down there, but that does not mean data-source-specific functions will be available in Trino SQL.

> I think there is an effort to push other functions down

A function is just one form of expression. We do not push down expressions, but operations such as filter, join, projection, aggregation, limit, and sort. An operation contains expressions, but not every expression form is supported everywhere. Currently, functions are supported in aggregation pushdown only. I am not aware of any other active work on pushing down functions in projections or filters. @hashhar @wendigo ?
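
As a rough illustration of "operations are pushed down, and functions are only supported inside aggregation pushdown", the sketch below models a connector that is offered an aggregation and either accepts it (returning the remote query) or declines (returning empty, so the engine aggregates itself). The class, the method shape, and the list of remote aggregates are hypothetical and do not reproduce the Trino SPI.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical model, NOT the Trino SPI: a connector that accepts or declines
// an aggregation operation offered to it by the engine.
final class AggregationPushdownSketch {
    // Assumed: aggregate functions this imaginary remote source can evaluate itself.
    static final Set<String> REMOTE_AGGREGATES = new HashSet<>(Arrays.asList("count", "sum", "min", "max"));

    // Returns the remote query if the aggregation is pushed down,
    // or empty if the engine has to compute the aggregation itself.
    static Optional<String> applyAggregation(String table, String function, String column) {
        if (!REMOTE_AGGREGATES.contains(function)) {
            return Optional.empty();
        }
        return Optional.of("SELECT " + function + "(" + column + ") FROM " + table);
    }

    public static void main(String[] args) {
        System.out.println(applyAggregation("orders", "sum", "price"));
        System.out.println(applyAggregation("orders", "approx_distinct", "customer_id"));
    }
}
```

Declining is always safe in this model: the query still runs, only with more data pulled into the engine.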

@jerryleooo
Member Author

cc @martint

@martint
Member

martint commented Jul 1, 2021

> I know that is the current situation, but from #8140 (comment) I think there is a future plan to support datasource-specific functions? If so, how about adding function namespaces to make the parser's job easier?

Yes, that's the goal. Namespaces are not strictly needed, but the work leading up to them is needed so that the engine and connectors can do the handshake when reasoning about functions.

Overall, this is the approach we're shooting for:

  • Connectors can expose functions. These can be real (i.e., Trino will call them during query processing) or synthetic (the engine can reason about them during query analysis and planning, but they are expected to get pushed down; if they are not, the query will fail at runtime).
  • Optimizers can push down complex expressions that contain those functions, for example, in filters, projections, join clauses, etc. The initial steps for this are in "ConnectorExpression pushdown" (#7994).
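
The real versus synthetic distinction can be sketched as a simple check: a real function may stay in the engine, while a synthetic one that was not pushed down makes the query fail. Every name below (`Kind`, `FunctionCall`, `verifyExecutable`, `druid_func1`) is hypothetical and invented for illustration; the actual design in Trino may differ.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model, NOT the Trino SPI: all names below are invented for illustration.
final class SyntheticFunctionSketch {
    enum Kind { REAL, SYNTHETIC }

    static final class FunctionCall {
        final String name;
        final Kind kind;
        final boolean pushedDown;

        FunctionCall(String name, Kind kind, boolean pushedDown) {
            this.name = name;
            this.kind = kind;
            this.pushedDown = pushedDown;
        }
    }

    // Rejects any synthetic function that was left for the engine to execute.
    static void verifyExecutable(List<FunctionCall> calls) {
        for (FunctionCall call : calls) {
            if (call.kind == Kind.SYNTHETIC && !call.pushedDown) {
                throw new IllegalStateException("Function " + call.name + " must be pushed down to the connector");
            }
        }
    }

    public static void main(String[] args) {
        verifyExecutable(Arrays.asList(
                new FunctionCall("lower", Kind.REAL, false),
                new FunctionCall("druid_func1", Kind.SYNTHETIC, true)));
        System.out.println("plan is executable");
    }
}
```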

@jerryleooo
Member Author

Hi @martint, we are interested in the "connectors can expose functions" work. Could you share more material with us if you have any? Thanks!
