You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Some connectors support a wide range of table layouts / on-demand partitioning.
For such connectors, we can skip the initial exchange that wraps TableScanNode when it's not partitioned properly for its parent nodes (for example a join)
As far as I can tell, this isn't trivial to solve in Presto currently.
Some similar logic was added to avoid exchanges for Hive partitionings that are compatible. See this PR. Unfortunately, this logic isn't flexible enough, i.e. the initial partitioning columns needs to match the desired exchange partitioning.
Another related primitive is the multi-layout support in Connector's metadata: connectors are able to provide different layouts depending on requested columns, unfortunately this logic in PickTableLayout & Metadata#GetLayout doesn't tell the connector which partitioning is desired, therefore the connector can't pick the right partitioning to avoid an exchange...
It looks like there's been quite a bit of discussion on such topics, for example #12674 suggested a path forward was to allow the Connectors to participate in the planning process instead of adding smart layouts APIs. I guess that's what ConnectorPlanOptimizerProvider solved.
I guess I could write a ConnectorPlanOptimizer that detects which nodes trigger exchanges and select the right partitioning to rewrite its TableScanNodes, but this comes with a couple problems:
it feels expensive to write and it feels like we're duplicating the AddExchanges logic
I'm not sure it would work in all cases, if I read ConnectorPlanOptimizer javadocs correctly, ConnectorPlanOptimizer won't reach nodes whose children involve different connectors, for example the JoinNode for query some_connector JOIN other_connector wouldn't get processed by a ConnectorPlanOptimizer. Here, even though different connectors are involved, we can still pick the proper partitioning to avoid the exchange on the TableScanNode.
Did I miss something here ? What would be the idiomatic way to solve this problem ? Can we extend the Connector API and add callbacks that rewrite the partitioning scheme in PickTableLayout / AddExchanges ?
The text was updated successfully, but these errors were encountered:
Some connectors support a wide range of table layouts / on-demand partitioning.
For such connectors, we can skip the initial exchange that wraps
TableScanNode
when it's not partitioned properly for its parent nodes (for example a join)As far as I can tell, this isn't trivial to solve in Presto currently.
Some similar logic was added to avoid exchanges for Hive partitionings that are compatible. See this PR. Unfortunately, this logic isn't flexible enough, i.e. the initial partitioning columns needs to match the desired exchange partitioning.
Another related primitive is the multi-layout support in Connector's metadata: connectors are able to provide different layouts depending on requested columns, unfortunately this logic in
PickTableLayout
&Metadata#GetLayout
doesn't tell the connector which partitioning is desired, therefore the connector can't pick the right partitioning to avoid an exchange...presto/presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/rule/PickTableLayout.java
Lines 211 to 217 in 087f4ff
It looks like there's been quite a bit of discussion on such topics, for example #12674 suggested a path forward was to allow the Connectors to participate in the planning process instead of adding smart layouts APIs. I guess that's what
ConnectorPlanOptimizerProvider
solved.I guess I could write a
ConnectorPlanOptimizer
that detects which nodes trigger exchanges and select the right partitioning to rewrite itsTableScanNodes
, but this comes with a couple problems:AddExchanges
logicConnectorPlanOptimizer
won't reach nodes whose children involve different connectors, for example the JoinNode for querysome_connector JOIN other_connector
wouldn't get processed by aConnectorPlanOptimizer
. Here, even though different connectors are involved, we can still pick the proper partitioning to avoid the exchange on the TableScanNode.Did I miss something here ? What would be the idiomatic way to solve this problem ? Can we extend the Connector API and add callbacks that rewrite the partitioning scheme in
PickTableLayout
/AddExchanges
?The text was updated successfully, but these errors were encountered: