Improve bucketed table write parallelism for Presto on Spark #15934

Merged

5 commits merged on Apr 15, 2021

Commits on Apr 15, 2021

  1. 5443ee2
  2. 84f347e
  3. f054a1d
  4. Refactor PrestoSparkRddFactory (9a9c1bd)

    Move the partitioning assignment to PrestoSparkQueryExecutionFactory. This allows the Spark partitioner to simply follow the number of partitions already set in bucketToPartition, instead of running the partition-count assignment logic twice. (See the first sketch after this commit list.)

    arhimondr committed Apr 15, 2021
  5. Optimize partitioned table write for Presto on Spark (68689dc)

    When writing to a partitioned (bucketed) table, ensure that each writer node is assigned enough buckets so that all of its available concurrent writer threads are utilized efficiently. (See the second sketch after this commit list.)

    arhimondr committed Apr 15, 2021
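First sketch: a minimal illustration of what "following the number of partitions set in bucketToPartition" can look like when creating a Spark partitioner. This is not the actual Presto code from commit 9a9c1bd; the class name is made up, and keys are assumed to already be bucket ids.

```java
import org.apache.spark.Partitioner;

// Sketch only: a Spark Partitioner that reuses the partition assignment already
// computed upstream (bucketToPartition) instead of re-deriving a partition count.
public class BucketToPartitionPartitioner
        extends Partitioner
{
    private final int[] bucketToPartition;
    private final int numPartitions;

    public BucketToPartitionPartitioner(int[] bucketToPartition)
    {
        this.bucketToPartition = bucketToPartition.clone();
        // the number of Spark partitions is whatever was assigned upstream
        int max = 0;
        for (int partition : this.bucketToPartition) {
            max = Math.max(max, partition + 1);
        }
        this.numPartitions = max;
    }

    @Override
    public int numPartitions()
    {
        return numPartitions;
    }

    @Override
    public int getPartition(Object key)
    {
        // assumption for this sketch: the shuffle key is the bucket id
        int bucket = (Integer) key;
        return bucketToPartition[bucket];
    }
}
```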
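Second sketch: the sizing idea behind commit 68689dc, expressed as a tiny helper. All names, parameters, and thresholds here are illustrative assumptions, not the actual Presto configuration or code; the point is only that the writer partition count is capped so each writer node receives enough buckets to keep its writer threads busy.

```java
// Sketch only: pick a writer partition count so that every writer node gets at
// least `writerThreadsPerNode` buckets and can keep all writer threads busy.
public final class BucketedWriteParallelism
{
    private BucketedWriteParallelism() {}

    public static int computeWriterPartitionCount(int bucketCount, int maxWriterNodes, int writerThreadsPerNode)
    {
        // at most one writer node per `writerThreadsPerNode` buckets,
        // never more than the configured maximum, never fewer than one
        int nodesWithEnoughBuckets = bucketCount / writerThreadsPerNode;
        return Math.max(1, Math.min(maxWriterNodes, nodesWithEnoughBuckets));
    }

    public static void main(String[] args)
    {
        // example: 128 buckets, up to 100 nodes, 8 writer threads per node
        // -> 16 writer partitions, each handling 8 buckets
        System.out.println(computeWriterPartitionCount(128, 100, 8));
    }
}
```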