Implement dynamic partition pruning #1072

Merged (3 commits) on Jun 10, 2020

Conversation

raunaqmorarka
Member

@raunaqmorarka raunaqmorarka commented Jul 2, 2019

Introduce collecting DynamicFilterSummary on build side of Join

  • Introduce DynamicFilterSummary
  • Collect DynamicFilterSummary before HashBuilder

Expose DynamicFilterResource endpoint on coordinator for collecting DynamicFilterSummaries from worker nodes. Feed coordinator with DynamicFilterSummaries from worker nodes

  • Add DynamicFilterResource
  • Add InMemoryDynamicFilterClient

Use DynamicFilterSummaries collected on coordinator to filter out splits in SplitManager

  • Register tasks in DynamicFilterService
  • Add DynamicFilterDescription
  • Pass Future with DynamicFilterDescription to the SplitManager
  • Add dynamic partition pruning to Hive connector
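
The three stages described above (collect on the build side, merge on the coordinator, prune in the split manager) can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the `DynamicFilterSummary` shown here is a simplified stand-in that only holds a set of string values for one filter, and `PartitionPruningSketch`, `merge`, and `prunePartitions` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-in for the PR's DynamicFilterSummary (hypothetical shape):
// the distinct join-key values seen on the build side of one join.
final class DynamicFilterSummary
{
    private final Set<String> values;

    DynamicFilterSummary(Set<String> values)
    {
        this.values = Set.copyOf(values);
    }

    Set<String> getValues()
    {
        return values;
    }

    // Coordinator side: combine summaries reported by different workers.
    DynamicFilterSummary merge(DynamicFilterSummary other)
    {
        Set<String> union = new HashSet<>(values);
        union.addAll(other.values);
        return new DynamicFilterSummary(union);
    }
}

final class PartitionPruningSketch
{
    private PartitionPruningSketch() {}

    // Split-manager side: keep only the partitions whose key value can match
    // some build-side row; all other partitions are skipped entirely.
    static List<String> prunePartitions(List<String> partitionValues, DynamicFilterSummary summary)
    {
        List<String> kept = new ArrayList<>();
        for (String value : partitionValues) {
            if (summary.getValues().contains(value)) {
                kept.add(value);
            }
        }
        return kept;
    }
}
```

In this sketch the probe-side table is never scanned for partitions that cannot join, which is where the savings in a partitioned Hive table would come from.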

@cla-bot cla-bot bot added the cla-signed label Jul 2, 2019
@raunaqmorarka raunaqmorarka mentioned this pull request Jul 2, 2019
@sopel39
Member

sopel39 commented Jul 3, 2019

@rzeyde-varada could you also assist reviewing it?

@sopel39
Member

sopel39 commented Jul 3, 2019

Test failures seem relevant to this feature.

@raunaqmorarka raunaqmorarka force-pushed the runtime-partition-pruning branch 4 times, most recently from 7818da6 to 79d9a73 Compare July 3, 2019 17:35
@raunaqmorarka
Member Author

> Test failures seem relevant to this feature.

I've resolved the test failures now

Contributor

@rzeyde-varada rzeyde-varada left a comment

Reviewed most of the code - looks good overall, a few questions/suggestions inside.
Will continue the review tomorrow.

Contributor

@rzeyde-varada rzeyde-varada left a comment

Finished the review - many thanks for the contribution!

A small question about integration tests:
Would it be possible to adapt MemorySplitManager, so we can run a smoke integration test for dynamic partition pruning?

@raunaqmorarka
Member Author

> Finished the review - many thanks for the contribution!
>
> A small question about integration tests:
> Would it be possible to adapt MemorySplitManager, so we can run a smoke integration test for dynamic partition pruning?

Thanks for the thorough review :)
It doesn't look like the memory connector has an equivalent of Hive partitions.
@sopel39 Is it possible to test partition pruning through the memory connector? Also, would it be useful to add product tests with partitioned Hive tables for this change?

@raunaqmorarka raunaqmorarka force-pushed the runtime-partition-pruning branch 3 times, most recently from f042881 to 715227e Compare October 8, 2019 22:02
@raunaqmorarka raunaqmorarka force-pushed the runtime-partition-pruning branch 3 times, most recently from 058f13b to b760b51 Compare October 10, 2019 20:56
@martint martint self-requested a review October 16, 2019 15:29
Member

@martint martint left a comment

Some initial comments. I'm looking at the rest of the code now.

Member

@sopel39 sopel39 left a comment

tests left to review

@@ -54,22 +57,25 @@
    // Mapping from dynamic filter ID to its build channel indices.
    private final Map<String, Integer> buildChannels;

    private final TypeProvider types;
    // Mapping from dynamic filter ID to its build channel type.
    private final Map<String, Type> filterBuildTypes;
Member

nit: we should probably have a dedicated class for the filter ID at this point instead of a String
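
As a rough illustration of that suggestion, a dedicated ID type could be a small immutable value class like the one below. The class name `DynamicFilterId` and its shape are assumptions made for this sketch; nothing in this conversation shows such a class in the PR.

```java
import java.util.Objects;

// Hypothetical typed wrapper around the string filter ID. With it, a map such
// as Map<DynamicFilterId, Integer> cannot be indexed by an unrelated String.
final class DynamicFilterId
{
    private final String id;

    DynamicFilterId(String id)
    {
        this.id = Objects.requireNonNull(id, "id is null");
    }

    @Override
    public boolean equals(Object o)
    {
        if (this == o) {
            return true;
        }
        if (!(o instanceof DynamicFilterId)) {
            return false;
        }
        return id.equals(((DynamicFilterId) o).id);
    }

    @Override
    public int hashCode()
    {
        return id.hashCode();
    }

    @Override
    public String toString()
    {
        return id;
    }
}
```

Because `equals` and `hashCode` delegate to the wrapped string, the type works as a map key exactly like the `String` it replaces, while the compiler rejects accidental mix-ups with other string-typed identifiers.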

        return Futures.transform(resultFuture, this::convertTupleDomain, directExecutor());
    }

    public ListenableFuture<Map<Symbol, Domain>> getNodeLocalDynamicFilterForSymbols()
Member

nit: local-specific logic should eventually be moved out of LocalDynamicFilter

Member

@sopel39 sopel39 left a comment

LGTM % test improvement

@sopel39
Member

sopel39 commented Jun 7, 2020

fixes: #52

@raunaqmorarka raunaqmorarka force-pushed the runtime-partition-pruning branch 2 times, most recently from 19be880 to 2f13dfb Compare June 8, 2020 11:36
@sopel39
Member

sopel39 commented Jun 9, 2020

benchmark results with partitions (significant CPU and duration improvement)
dynamic_filtering_partitions.pdf

benchmark results without partitions:
dynamic_filtering_no_partitions.pdf
There is some CPU improvement visible, but it's bogus: the baseline run was unstable (I'm rerunning it now).

@martint
Member

martint commented Jun 9, 2020

[Image: Screen Shot 2020-06-09 at 8 34 28 AM]

@@ -88,4 +88,29 @@ public boolean isFailure()
    {
        return failureState;
    }

    public boolean noMoreTasks()
Member

This should be named something else. noMoreSplits sounds like a command, and this simply interrogates the current state.

            case RUNNING:
            case FINISHED:
            case CANCELED:
                // no more workers will be added to the query
Member

The comments don't seem to match the code... this doesn't change anything

Member Author

This code was extracted as-is from io.prestosql.execution.scheduler.SqlQueryScheduler.StageLinkage#processScheduleResults
#1072 (comment)
Are we guaranteed that no more tasks will be added (e.g. when new workers arrive for co-located joins) once we reach the SCHEDULING_SPLITS state?
If not, we still need some way for DynamicFilterService to know when it's safe to assume that no new tasks will be added that could produce more values from the build side of the join. Only then can we use the dynamic filters reported by the completed build-side operators to filter the probe side.
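
To make the requirement concrete, here is a minimal sketch of a collector that publishes a filter only after both conditions hold: the task set is known to be final, and every registered task has reported. `DynamicFilterCollectorSketch` and all of its methods are hypothetical and do not reflect how DynamicFilterService is implemented in this PR.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

// Hypothetical coordinator-side tracker for a single dynamic filter.
final class DynamicFilterCollectorSketch
{
    private final Set<String> expectedTasks = new HashSet<>();
    private final Set<String> reportedTasks = new HashSet<>();
    private final Set<String> values = new HashSet<>();
    private final CompletableFuture<Set<String>> result = new CompletableFuture<>();
    private boolean taskSetFinal;

    synchronized void registerTask(String taskId)
    {
        if (taskSetFinal) {
            throw new IllegalStateException("task set is already final");
        }
        expectedTasks.add(taskId);
    }

    // Called once the scheduler guarantees that no new build-side tasks will appear.
    synchronized void markTaskSetFinal()
    {
        taskSetFinal = true;
        completeIfReady();
    }

    // Called when a build-side task finishes and reports the values it saw.
    synchronized void report(String taskId, Set<String> buildSideValues)
    {
        reportedTasks.add(taskId);
        values.addAll(buildSideValues);
        completeIfReady();
    }

    CompletableFuture<Set<String>> getResult()
    {
        return result;
    }

    private void completeIfReady()
    {
        // Publishing earlier could drop probe-side rows that match values
        // produced by a task that has not been scheduled or has not reported yet.
        if (taskSetFinal && reportedTasks.containsAll(expectedTasks)) {
            result.complete(Set.copyOf(values));
        }
    }
}
```

The open question in this thread is which scheduler state justifies calling something like `markTaskSetFinal()`; the sketch deliberately leaves that decision to the caller.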

Member

@dain

Looking at SourcePartitionedScheduler#finalizeTaskCreationIfNecessary, it calls splitPlacementPolicy.lockDownNodes(); and sets the state to SCHEDULING_SPLITS. It seems that at this point no more nodes will participate in the query.

However, can the stage state change to RUNNING via SqlStageExecution#schedulingComplete() and skip the splitPlacementPolicy.lockDownNodes(); call?

I guess SqlQueryScheduler.StageLinkage#processScheduleResults would also break then.

Member

I was referring to the tone of the comments, not the contents. The comments imply that a change will occur. For example, the comment below says DO NOT complete a FAILED or ABORTED stage, but this is not a command. The code where this was extracted was making changes to the stage, so the comments were appropriate.
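
A sketch of what that feedback might look like in code: a method that only interrogates state gets a question-style name and comments that describe rather than instruct. The enum `StageStateSketch` and the name `isTaskSetFinal` are hypothetical. The three states in the `true` branch are the ones visible in the diff hunk above; whether they are the right set is the question still being discussed in this thread, and returning `false` for every other state is an assumption of this sketch.

```java
// Hypothetical subset of stage states, for illustration only.
enum StageStateSketch
{
    SCHEDULING,
    SCHEDULING_SPLITS,
    RUNNING,
    FINISHED,
    CANCELED,
    FAILED,
    ABORTED;

    // Pure query: reports on the current state and changes nothing, so it is
    // phrased as a question instead of a command such as noMoreTasks().
    boolean isTaskSetFinal()
    {
        switch (this) {
            case RUNNING:
            case FINISHED:
            case CANCELED:
                // in these states the set of tasks is already fixed
                return true;
            default:
                // assumption of this sketch: other states are treated as not final
                return false;
        }
    }
}
```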

@raunaqmorarka raunaqmorarka requested a review from dain June 10, 2020 06:35
@sopel39 sopel39 dismissed martint’s stale review June 10, 2020 09:31

comments applied

@sopel39 sopel39 merged commit 7f278db into trinodb:master Jun 10, 2020
@sopel39
Member

sopel39 commented Jun 10, 2020

merged, thanks!

@sopel39 sopel39 mentioned this pull request Jun 10, 2020
9 tasks
@raunaqmorarka raunaqmorarka deleted the runtime-partition-pruning branch June 11, 2020 03:27
@bitsondatadev
Member

@simpligility I think this would be a great PR to showcase for the Presto Twitch show. @raunaqmorarka any objections to us showcasing your work? Also, would you be interested in joining the show?

9 participants