MicroPlans - Allow choosing between different plans on the split level #13534
@assaf2 can you please add an example describing what a micro plan is?
@findepi it would be an empty interface like
Second optimizer phase -
Then, for each split, the connector will use

@martint do you wish to add something?
Why does it need to be a new abstraction if all it does is get passed along with the table handle?
Because each MicroPlan will have a different plan in the engine (that's why the engine needs to know which MicroPlan was chosen for each split).
Some connectors can't always make pushdown guarantees at the coordinator level. For example, ORC files may contain metadata describing the min and max values within each row group. We want to give the Hive and Iceberg connectors the ability to tell the engine, for each split, whether a certain filter is supported. Suppose the engine pushes down the filter `col > 1`. Hive/Iceberg could respond with 2 plans: one that consumes the filter and another that doesn't. Then, for each split in which the filter is always true (in our example, when the min value is greater than 1), Hive/Iceberg would use the plan that consumes the filter. This approach is a step toward an exploratory optimizer.
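A minimal sketch of the per-split decision described above, assuming the connector has already returned two plans for `col > threshold` (ordinal 0 keeps the filter as a residual, ordinal 1 consumes it). `SplitStats` and `GreaterThanPlanChooser` are hypothetical names, not Trino or ORC APIs. The `hasNull` check is an extra condition this sketch adds: a NULL value never satisfies `col > 1`, so a split containing nulls cannot use the plan that drops the filter even when its min is large enough.

```java
// Hypothetical per-split statistics for the filtered column, e.g. read from
// ORC row-group metadata (min, max, and whether any value is NULL).
record SplitStats(long min, long max, boolean hasNull) {}

final class GreaterThanPlanChooser
{
    // Ordinal 0: broadest residual, the engine still evaluates `col > threshold`.
    static final int KEEP_FILTER = 0;
    // Ordinal 1: the connector consumes the filter for this split.
    static final int CONSUME_FILTER = 1;

    private GreaterThanPlanChooser() {}

    // Returns the ordinal of the micro plan to use for one split.
    // The filter is always true for the split only if every value is non-NULL
    // and the smallest value is already greater than the threshold.
    static int choosePlan(SplitStats stats, long threshold)
    {
        boolean alwaysTrue = !stats.hasNull() && stats.min() > threshold;
        return alwaysTrue ? CONSUME_FILTER : KEEP_FILTER;
    }
}
```

For a split with min 5 and no nulls, `choosePlan(stats, 1)` picks the consuming plan; a split whose min is exactly 1 falls back to the residual plan, because the value 1 does not satisfy `col > 1`.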
In general, we want to give connectors the ability to choose between different plans on the split level.
A new abstraction will be introduced: MicroPlanHandle (temporary name), a handle to a transformation of a table.
For certain metadata APIs, connectors will have the ability to return several MicroPlanHandles.
These APIs will be limited to those that affect the plan at the stage level (for example, they can't affect any Exchange operation).
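A minimal sketch of the abstraction, following the comment in this thread that MicroPlanHandle "would be an empty interface". The concrete handle `ExamplePushdownMicroPlan` and the method `ExampleMetadata.microPlansForPredicate` are hypothetical; the actual list of metadata APIs that gain this ability is not given here, so the sketch does not mirror any real `ConnectorMetadata` signature.

```java
import java.util.List;

// Marker interface: the engine treats implementations as opaque handles.
interface MicroPlanHandle {}

// Hypothetical connector-side handle describing one transformation of a table:
// the pushed-down predicates (as strings, for simplicity) this plan enforces.
record ExamplePushdownMicroPlan(List<String> enforcedPredicates) implements MicroPlanHandle
{
    ExamplePushdownMicroPlan
    {
        // Defensive copy so the handle stays immutable.
        enforcedPredicates = List.copyOf(enforcedPredicates);
    }
}

// Hypothetical stand-in for a metadata API that may now return several
// MicroPlanHandles instead of a single result.
final class ExampleMetadata
{
    private ExampleMetadata() {}

    static List<MicroPlanHandle> microPlansForPredicate(String predicate)
    {
        return List.of(
                // Enforces nothing: the engine keeps the predicate as a residual.
                new ExamplePushdownMicroPlan(List.of()),
                // Enforces the predicate, so the engine may skip it.
                new ExamplePushdownMicroPlan(List.of(predicate)));
    }
}
```

Keeping the interface empty leaves all plan-specific state inside the connector, which matches the idea that only the connector knows what each transformation means.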
The optimization process will run in 2 phases. The first phase is the current optimization process. The second phase is where MicroPlanHandles are taken into consideration. The connector will never know which phase is running.
Changes in ConnectorMetadata
The signatures of the following APIs will be deprecated (and eventually removed) and replaced with:
In the first phase, the engine won't pass a MicroPlanHandle and will only take the first element of the returned list. Trino might create a new table handle using a new API, `combine(TableHandle, MicroPlanHandle) -> TableHandle`. Another option would be to embed the MicroPlanHandle inside the TableHandle.

In the second phase, all elements of the returned list will be used. The engine might limit the number of MicroPlanHandles a connector can generate by pruning MicroPlanHandles from the end of the list.
Therefore, the connector should place the element with the broadest required residual first, since it is the only element used in the first phase and is never pruned, followed by all other elements in order of priority.
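A minimal sketch of how the engine might treat the returned list in each phase, under the rules above. `OptimizerPhase`, `MicroPlanSelection`, `NamedPlan`, and the `maxMicroPlans` limit are hypothetical names; the proposal does not specify how the limit is configured.

```java
import java.util.List;

interface MicroPlanHandle {}

// Hypothetical handle used only to make the example observable.
record NamedPlan(String name) implements MicroPlanHandle {}

enum OptimizerPhase { FIRST, SECOND }

final class MicroPlanSelection
{
    private MicroPlanSelection() {}

    // First phase: only the first element (broadest residual) is used.
    // Second phase: all elements are used, but the engine may cap their number
    // by pruning from the end of the list, dropping the lowest priority first.
    static List<MicroPlanHandle> select(List<MicroPlanHandle> returned, OptimizerPhase phase, int maxMicroPlans)
    {
        if (returned.isEmpty()) {
            throw new IllegalArgumentException("connector must return at least one micro plan");
        }
        if (maxMicroPlans < 1) {
            throw new IllegalArgumentException("maxMicroPlans must be at least 1");
        }
        return switch (phase) {
            case FIRST -> List.of(returned.get(0));
            case SECOND -> List.copyOf(returned.subList(0, Math.min(maxMicroPlans, returned.size())));
        };
    }
}
```

Because pruning only ever removes trailing elements, a connector that follows the ordering rule is guaranteed that its broadest-residual plan survives in both phases.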
Worker SPI
`ConnectorPageSourceProvider#createPageSource` will take a new argument, `Optional<List<Pair<MicroPlanHandle, List<ColumnHandle>>>>`, instead of the existing `List<ColumnHandle>` argument.

`ConnectorPageSource` will contain a new method, `Optional<Integer> getChosenMicroPlan()`, which returns the ordinal number of the MicroPlanHandle the connector has chosen.

Plans that are not used by any split won't be compiled and cached on the worker (lazy approach).
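A minimal, single-threaded sketch of the worker-side lazy approach, assuming the engine compiles a plan the first time some page source reports choosing it. `MicroPlanColumns` stands in for the proposed `Pair<MicroPlanHandle, List<ColumnHandle>>` (with columns as strings), and `MicroPlanPageSource` reduces `ConnectorPageSource` to the proposed method; both, along with `LazyMicroPlanCompiler`, are hypothetical. Falling back to ordinal 0 when no micro plan was chosen is also an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.IntFunction;

interface MicroPlanHandle {}

// Hypothetical stand-in for Pair<MicroPlanHandle, List<ColumnHandle>>.
record MicroPlanColumns(MicroPlanHandle microPlan, List<String> columns) {}

// Hypothetical, heavily reduced page source exposing only the proposed method.
interface MicroPlanPageSource
{
    Optional<Integer> getChosenMicroPlan();
}

// Compiles each micro plan at most once, and only when a split actually chooses it.
// Not thread-safe: a real worker would need a concurrent cache.
final class LazyMicroPlanCompiler<C>
{
    private final IntFunction<C> compiler;
    private final Map<Integer, C> compiled = new HashMap<>();

    LazyMicroPlanCompiler(IntFunction<C> compiler)
    {
        this.compiler = compiler;
    }

    C planFor(MicroPlanPageSource pageSource, List<MicroPlanColumns> offered)
    {
        // Assumption: an empty result means the default (first) plan.
        int ordinal = pageSource.getChosenMicroPlan().orElse(0);
        if (ordinal < 0 || ordinal >= offered.size()) {
            throw new IllegalStateException("connector chose unknown micro plan " + ordinal);
        }
        return compiled.computeIfAbsent(ordinal, compiler::apply);
    }

    int compiledCount()
    {
        return compiled.size();
    }
}
```

Plans that no split ever reports are never passed to `compiler`, so they cost nothing on the worker.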