Implement TPCH Query 2 in TpchQueryBuilder #9825

Closed

deepthydavis commented May 15, 2024

This PR adds TPC-H Query 2 to TpchQueryBuilder and extends TpchBenchmark and ParquetTpchTest to cover it. It also compares performance against DuckDB on the Parquet file format and includes the PrintPlanWithStats output for detailed analysis.
The scale factor used is 1.
A PowerPoint presentation with a detailed description for each driver and thread is available here: https://ibm.box.com/s/sau464qdfac45aainwpj6pyvkvbtlsat

Performance Comparison

Chip: Apple M1 Pro
Total Number of Cores: 10 (8 performance and 2 efficiency)
Memory: 32 GB

The following table summarizes the performance comparison between Velox and DuckDB (with Parquet file format) across various numbers of threads/drivers:

| Num Threads/Drivers | Velox (ms) | DuckDB (ms) |
| --- | --- | --- |
| 1 | 27 | 88.4 |
| 4 | 23 | 84.1 |
| 8 | 25 | 82.8 |
| 16 | 30 | 84 |
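
As a reading aid, the ratios implied by the table can be computed with a few lines of Python. This is only a minimal sketch: the `timings_ms` dictionary and the `speedups` helper are illustrative names that are not part of Velox or this PR, and the timings are copied from the table above.

```python
# Timings in milliseconds copied from the table above:
# number of threads/drivers -> (Velox, DuckDB)
timings_ms = {
    1: (27, 88.4),
    4: (23, 84.1),
    8: (25, 82.8),
    16: (30, 84),
}


def speedups(timings):
    """Return DuckDB time divided by Velox time for each thread/driver count."""
    return {n: duckdb / velox for n, (velox, duckdb) in timings.items()}


ratios = speedups(timings_ms)
for n, ratio in sorted(ratios.items()):
    print(f"{n:>2} threads/drivers: DuckDB/Velox = {ratio:.2f}x")
```

On these numbers the ratio stays between roughly 2.8x and 3.7x, and neither engine's time changes much with the thread count at scale factor 1.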

@facebook-github-bot

Hi @deepthydavis!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

netlify bot commented May 15, 2024

Deploy Preview for meta-velox canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 3ccf32d |
| 🔍 Latest deploy log | https://app.netlify.com/sites/meta-velox/deploys/665eb61fa7e4ea0008280d6c |

deepthydavis changed the title from "sample query plan" to "sample query plan[Draft]" on May 16, 2024
deepthydavis marked this pull request as draft on May 16, 2024 03:49
deepthydavis changed the title from "sample query plan[Draft]" to "Implement TPCH Query 11 in TpchQueryBuilder" on May 17, 2024
deepthydavis changed the title from "Implement TPCH Query 11 in TpchQueryBuilder" to "Implement TPCH Query 2 in TpchQueryBuilder" on May 17, 2024

deepthydavis commented May 17, 2024

Execution Plan Statistics

Output of PrintPlanWithStats for 4 drivers:

Execution time: 23ms
Splits total: 90, finished: 90
-- Limit[100] -> s_acctbal:DOUBLE, s_name:VARCHAR, n_name:VARCHAR, p_partkey:BIGINT, p_mfgr:VARCHAR, s_address:VARCHAR, s_phone:VARCHAR, s_comment:VARCHAR
   Output: 100 rows (343.44KB, 1 batches), Cpu time: 3.50us, Blocked wall time: 0ns, Peak memory: 0B, Memory allocations: 0, Threads: 1
      runningAddInputWallNanos     sum: 84ns, count: 1, min: 84ns, max: 84ns
      runningFinishWallNanos       sum: 291ns, count: 1, min: 291ns, max: 291ns
      runningGetOutputWallNanos    sum: 1.21us, count: 1, min: 1.21us, max: 1.21us
  -- Project[expressions: (s_acctbal:DOUBLE, ROW["s_acctbal"]), (s_name:VARCHAR, ROW["s_name"]), (n_name:VARCHAR, ROW["n_name"]), (p_partkey:BIGINT, ROW["p_partkey"]), (p_mfgr:VARCHAR, ROW["p_mfgr"]), (s_address:VARCHAR, ROW["s_address"]), (s_phone:VARCHAR, ROW["s_phone"]), (s_comment:VARCHAR, ROW["s_comment"])] -> s_acctbal:DOUBLE, s_name:VARCHAR, n_name:VARCHAR, p_partkey:BIGINT, p_mfgr:VARCHAR, s_address:VARCHAR, s_phone:VARCHAR, s_comment:VARCHAR
     Output: 460 rows (343.44KB, 1 batches), Cpu time: 4.00us, Blocked wall time: 0ns, Peak memory: 0B, Memory allocations: 0, Threads: 1
        runningAddInputWallNanos     sum: 375ns, count: 1, min: 375ns, max: 375ns
        runningFinishWallNanos       sum: 0ns, count: 1, min: 0ns, max: 0ns
        runningGetOutputWallNanos    sum: 2.04us, count: 1, min: 2.04us, max: 2.04us
    -- OrderBy[s_acctbal DESC NULLS LAST, n_name ASC NULLS LAST, s_name ASC NULLS LAST, p_partkey ASC NULLS LAST] -> s_acctbal:DOUBLE, s_name:VARCHAR, s_address:VARCHAR, s_phone:VARCHAR, s_comment:VARCHAR, s_suppkey:BIGINT, s_nationkey:BIGINT, p_partkey:BIGINT, p_mfgr:VARCHAR, n_name:VARCHAR
       Output: 460 rows (351.44KB, 1 batches), Cpu time: 147.83us, Blocked wall time: 0ns, Peak memory: 545.88KB, Memory allocations: 29, Threads: 1
          runningAddInputWallNanos     sum: 79.00us, count: 1, min: 79.00us, max: 79.00us
          runningFinishWallNanos       sum: 0ns, count: 1, min: 0ns, max: 0ns
          runningGetOutputWallNanos    sum: 63.12us, count: 1, min: 63.12us, max: 63.12us
      -- HashJoin[INNER ps_partkey=p_partkey, filter: eq(ROW["ps_supplycost"],ROW["min_supplycost"])] -> s_acctbal:DOUBLE, s_name:VARCHAR, s_address:VARCHAR, s_phone:VARCHAR, s_comment:VARCHAR, s_suppkey:BIGINT, s_nationkey:BIGINT, p_partkey:BIGINT, p_mfgr:VARCHAR, n_name:VARCHAR
         Output: 460 rows (3.45MB, 12 batches), Cpu time: 975.91us, Blocked wall time: 43.19ms, Peak memory: 2.07MB, Memory allocations: 161
         HashBuild: Input: 642 rows (22.66MB, 80 batches), Output: 0 rows (0B, 0 batches), Cpu time: 240.45us, Blocked wall time: 25.84ms, Peak memory: 1.77MB, Memory allocations: 5, Threads: 4
            blockedWaitForJoinBuildTimes        sum: 3, count: 3, min: 1, max: 1
            blockedWaitForJoinBuildWallNanos    sum: 25.84ms, count: 3, min: 8.57ms, max: 8.64ms
            distinctKey0                        sum: 461, count: 1, min: 461, max: 461
            hashtable.buildWallNanos            sum: 187.04us, count: 1, min: 187.04us, max: 187.04us
            hashtable.capacity                  sum: 199231, count: 1, min: 199231, max: 199231
            hashtable.numDistinct               sum: 642, count: 1, min: 642, max: 642
            hashtable.numRehashes               sum: 1, count: 1, min: 1, max: 1
            queuedWallNanos                     sum: 1.18ms, count: 4, min: 283.00us, max: 309.00us
            rangeKey0                           sum: 199231, count: 1, min: 199231, max: 199231
            runningAddInputWallNanos            sum: 198.55us, count: 4, min: 0ns, max: 198.55us
            runningFinishWallNanos              sum: 0ns, count: 4, min: 0ns, max: 0ns
            runningGetOutputWallNanos           sum: 5.42us, count: 4, min: 250ns, max: 4.50us
         HashProbe: Input: 117422 rows (2.26MB, 12 batches), Output: 460 rows (3.45MB, 12 batches), Cpu time: 735.46us, Blocked wall time: 17.35ms, Peak memory: 304.12KB, Memory allocations: 156, Threads: 1
            blockedWaitForJoinBuildTimes        sum: 1, count: 1, min: 1, max: 1
            blockedWaitForJoinBuildWallNanos    sum: 17.35ms, count: 1, min: 17.35ms, max: 17.35ms
            queuedWallNanos                     sum: 37.00us, count: 1, min: 37.00us, max: 37.00us
            runningAddInputWallNanos            sum: 347.88us, count: 1, min: 347.88us, max: 347.88us
            runningFinishWallNanos              sum: 54.38us, count: 1, min: 54.38us, max: 54.38us
            runningGetOutputWallNanos           sum: 325.87us, count: 1, min: 325.87us, max: 325.87us
        -- Aggregation[FINAL [ps_partkey] min_supplycost := min("min_supplycost")] -> ps_partkey:BIGINT, min_supplycost:DOUBLE
           Output: 117422 rows (2.26MB, 12 batches), Cpu time: 5.92ms, Blocked wall time: 0ns, Peak memory: 7.47MB, Memory allocations: 23, Threads: 1
              distinctKey0                 sum: 61089, count: 1, min: 61089, max: 61089
              hashtable.capacity           sum: 397279, count: 1, min: 397279, max: 397279
              hashtable.numDistinct        sum: 117422, count: 1, min: 117422, max: 117422
              hashtable.numRehashes        sum: 4, count: 1, min: 4, max: 4
              hashtable.numTombstones      sum: 0, count: 1, min: 0, max: 0
              rangeKey0                    sum: 397279, count: 1, min: 397279, max: 397279
              runningAddInputWallNanos     sum: 5.17ms, count: 1, min: 5.17ms, max: 5.17ms
              runningFinishWallNanos       sum: 1.62us, count: 1, min: 1.62us, max: 1.62us
              runningGetOutputWallNanos    sum: 858.50us, count: 1, min: 858.50us, max: 858.50us
          -- LocalPartition[REPARTITION HASH(ps_partkey)] -> ps_partkey:BIGINT, min_supplycost:DOUBLE
             Output: 234844 rows (4.53MB, 24 batches), Cpu time: 44.91us, Blocked wall time: 5.29ms, Peak memory: 0B, Memory allocations: 0
             LocalPartition: Input: 117422 rows (2.26MB, 12 batches), Output: 117422 rows (2.26MB, 12 batches), Cpu time: 34.17us, Blocked wall time: 0ns, Peak memory: 0B, Memory allocations: 0, Threads: 4
                queuedWallNanos              sum: 124.00us, count: 4, min: 24.00us, max: 41.00us
                runningAddInputWallNanos     sum: 22.83us, count: 4, min: 0ns, max: 22.83us
                runningFinishWallNanos       sum: 0ns, count: 4, min: 0ns, max: 0ns
                runningGetOutputWallNanos    sum: 2.08us, count: 4, min: 291ns, max: 958ns
             LocalExchange: Input: 117422 rows (2.26MB, 12 batches), Output: 117422 rows (2.26MB, 12 batches), Cpu time: 10.75us, Blocked wall time: 5.29ms, Peak memory: 0B, Memory allocations: 0, Threads: 1
                blockedWaitForProducerTimes        sum: 1, count: 1, min: 1, max: 1
                blockedWaitForProducerWallNanos    sum: 5.29ms, count: 1, min: 5.29ms, max: 5.29ms
                queuedWallNanos                    sum: 22.00us, count: 1, min: 22.00us, max: 22.00us
                runningAddInputWallNanos           sum: 0ns, count: 1, min: 0ns, max: 0ns
                runningFinishWallNanos             sum: 417ns, count: 1, min: 417ns, max: 417ns
                runningGetOutputWallNanos          sum: 7.62us, count: 1, min: 7.62us, max: 7.62us
            -- Aggregation[PARTIAL [ps_partkey] min_supplycost := min(ROW["ps_supplycost"])] -> ps_partkey:BIGINT, min_supplycost:DOUBLE
               Output: 117422 rows (2.26MB, 12 batches), Cpu time: 7.84ms, Blocked wall time: 0ns, Peak memory: 10.13MB, Memory allocations: 56, Threads: 4
                  distinctKey0                 sum: 70908, count: 1, min: 70908, max: 70908
                  hashtable.capacity           sum: 473699, count: 1, min: 473699, max: 473699
                  hashtable.numDistinct        sum: 117422, count: 1, min: 117422, max: 117422
                  hashtable.numRehashes        sum: 7, count: 1, min: 7, max: 7
                  hashtable.numTombstones      sum: 0, count: 1, min: 0, max: 0
                  rangeKey0                    sum: 473699, count: 1, min: 473699, max: 473699
                  runningAddInputWallNanos     sum: 7.19ms, count: 4, min: 0ns, max: 7.19ms
                  runningFinishWallNanos       sum: 1.33us, count: 4, min: 250ns, max: 417ns
                  runningGetOutputWallNanos    sum: 783.96us, count: 4, min: 4.63us, max: 768.41us
              -- HashJoin[INNER ps_suppkey=s_suppkey] -> ps_supplycost:DOUBLE, ps_partkey:BIGINT
                 Output: 158960 rows (2.49MB, 80 batches), Cpu time: 192.96us, Blocked wall time: 24.15ms, Peak memory: 80.00KB, Memory allocations: 2
                 HashBuild: Input: 1987 rows (15.91KB, 1 batches), Output: 0 rows (0B, 0 batches), Cpu time: 99.75us, Blocked wall time: 2.28ms, Peak memory: 80.00KB, Memory allocations: 2, Threads: 4
                    blockedWaitForJoinBuildTimes        sum: 3, count: 3, min: 1, max: 1
                    blockedWaitForJoinBuildWallNanos    sum: 2.28ms, count: 3, min: 736.00us, max: 796.00us
                    distinctKey0                        sum: 1988, count: 1, min: 1988, max: 1988
                    hashtable.buildWallNanos            sum: 29.67us, count: 1, min: 29.67us, max: 29.67us
                    hashtable.capacity                  sum: 9995, count: 1, min: 9995, max: 9995
                    hashtable.numDistinct               sum: 1987, count: 1, min: 1987, max: 1987
                    hashtable.numRehashes               sum: 1, count: 1, min: 1, max: 1
                    queuedWallNanos                     sum: 190.00us, count: 4, min: 21.00us, max: 68.00us
                    rangeKey0                           sum: 9995, count: 1, min: 9995, max: 9995
                    runningAddInputWallNanos            sum: 173.62us, count: 4, min: 0ns, max: 173.62us
                    runningFinishWallNanos              sum: 0ns, count: 4, min: 0ns, max: 0ns
                    runningGetOutputWallNanos           sum: 1.42us, count: 4, min: 209ns, max: 626ns
                 HashProbe: Input: 158960 rows (9.98MB, 80 batches), Output: 158960 rows (2.49MB, 80 batches), Cpu time: 93.20us, Blocked wall time: 21.87ms, Peak memory: 0B, Memory allocations: 0, Threads: 4
                    blockedWaitForJoinBuildTimes        sum: 4, count: 4, min: 1, max: 1
                    blockedWaitForJoinBuildWallNanos    sum: 21.87ms, count: 4, min: 5.22ms, max: 5.56ms
                    dynamicFiltersProduced              sum: 4, count: 4, min: 1, max: 1
                    queuedWallNanos                     sum: 48.00us, count: 4, min: 10.00us, max: 16.00us
                    replacedWithDynamicFilterRows       sum: 158960, count: 80, min: 1983, max: 1992
                    runningAddInputWallNanos            sum: 6.95us, count: 4, min: 0ns, max: 6.95us
                    runningFinishWallNanos              sum: 2.13us, count: 4, min: 334ns, max: 625ns
                    runningGetOutputWallNanos           sum: 34.83us, count: 4, min: 250ns, max: 33.66us
                -- TableScan[table: partsupp] -> ps_partkey:BIGINT, ps_suppkey:BIGINT, ps_supplycost:DOUBLE
                   Input: 158960 rows (9.98MB, 80 batches), Raw Input: 800000 rows (2.38MB), Output: 158960 rows (9.98MB, 80 batches), Cpu time: 12.32ms, Blocked wall time: 0ns, Peak memory: 17.58MB, Memory allocations: 529, Threads: 4, Splits: 10, DynamicFilter producer plan nodes: 16
                      dataSourceAddSplitWallNanos      sum: 2.07ms, count: 4, min: 436.00us, max: 621.00us
                      dataSourceReadWallNanos          sum: 8.78ms, count: 4, min: 4.00us, max: 8.77ms
                      dynamicFiltersAccepted           sum: 4, count: 4, min: 1, max: 1
                      flattenStringDictionaryValues    sum: 0, count: 4, min: 0, max: 0
                      ioWaitNanos                      sum: 1.45ms, count: 4, min: 0ns, max: 1.45ms
                      localReadBytes                   sum: 0B, count: 4, min: 0B, max: 0B
                      maxSingleIoWaitNanos             sum: 991.00us, count: 4, min: 0ns, max: 991.00us
                      numLocalRead                     sum: 0, count: 4, min: 0, max: 0
                      numPrefetch                      sum: 0, count: 4, min: 0, max: 0
                      numRamRead                       sum: 0, count: 4, min: 0, max: 0
                      numStorageRead                   sum: 2, count: 4, min: 0, max: 2
                      overreadBytes                    sum: 118B, count: 4, min: 0B, max: 118B
                      prefetchBytes                    sum: 0B, count: 4, min: 0B, max: 0B
                      preloadedSplits                  sum: 6, count: 6, min: 1, max: 1
                      queryThreadIoLatency             sum: 3, count: 4, min: 0, max: 3
                      ramReadBytes                     sum: 0B, count: 4, min: 0B, max: 0B
                      readyPreloadedSplits             sum: 1, count: 1, min: 1, max: 1
                      runningAddInputWallNanos         sum: 0ns, count: 4, min: 0ns, max: 0ns
                      runningFinishWallNanos           sum: 4.88us, count: 4, min: 542ns, max: 2.54us
                      runningGetOutputWallNanos        sum: 14.61ms, count: 4, min: 1.54ms, max: 9.29ms
                      skippedSplitBytes                sum: 0B, count: 4, min: 0B, max: 0B
                      skippedSplits                    sum: 0, count: 4, min: 0, max: 0
                      skippedStrides                   sum: 9, count: 4, min: 0, max: 4
                      storageReadBytes                 sum: 7.97MB, count: 4, min: 0B, max: 7.97MB
                      totalRemainingFilterTime         sum: 0ns, count: 4, min: 0ns, max: 0ns
                      totalScanTime                    sum: 463.00us, count: 4, min: 0ns, max: 463.00us
                -- HashJoin[INNER s_nationkey=n_nationkey] -> s_suppkey:BIGINT
                   Output: 1987 rows (15.91KB, 1 batches), Cpu time: 162.92us, Blocked wall time: 11.18ms, Peak memory: 64.00KB, Memory allocations: 2
                   HashBuild: Input: 5 rows (96B, 1 batches), Output: 0 rows (0B, 0 batches), Cpu time: 10.96us, Blocked wall time: 1.56ms, Peak memory: 64.00KB, Memory allocations: 2, Threads: 4
                      blockedWaitForJoinBuildTimes        sum: 3, count: 3, min: 1, max: 1
                      blockedWaitForJoinBuildWallNanos    sum: 1.56ms, count: 3, min: 435.00us, max: 627.00us
                      distinctKey0                        sum: 6, count: 1, min: 6, max: 6
                      hashtable.buildWallNanos            sum: 7.00us, count: 1, min: 7.00us, max: 7.00us
                      hashtable.capacity                  sum: 19, count: 1, min: 19, max: 19
                      hashtable.numDistinct               sum: 5, count: 1, min: 5, max: 5
                      hashtable.numRehashes               sum: 1, count: 1, min: 1, max: 1
                      queuedWallNanos                     sum: 521.00us, count: 4, min: 47.00us, max: 197.00us
                      rangeKey0                           sum: 19, count: 1, min: 19, max: 19
                      runningAddInputWallNanos            sum: 4.67us, count: 4, min: 0ns, max: 4.67us
                      runningFinishWallNanos              sum: 0ns, count: 4, min: 0ns, max: 0ns
                      runningGetOutputWallNanos           sum: 1.79us, count: 4, min: 208ns, max: 708ns
                   HashProbe: Input: 1987 rows (111.81KB, 1 batches), Output: 1987 rows (15.91KB, 1 batches), Cpu time: 151.96us, Blocked wall time: 9.63ms, Peak memory: 0B, Memory allocations: 0, Threads: 4
                      blockedWaitForJoinBuildTimes        sum: 4, count: 4, min: 1, max: 1
                      blockedWaitForJoinBuildWallNanos    sum: 9.63ms, count: 4, min: 2.31ms, max: 2.50ms
                      dynamicFiltersProduced              sum: 4, count: 4, min: 1, max: 1
                      queuedWallNanos                     sum: 179.00us, count: 4, min: 23.00us, max: 89.00us
                      replacedWithDynamicFilterRows       sum: 1987, count: 1, min: 1987, max: 1987
                      runningAddInputWallNanos            sum: 167ns, count: 4, min: 0ns, max: 167ns
                      runningFinishWallNanos              sum: 197.38us, count: 4, min: 1.12us, max: 193.92us
                      runningGetOutputWallNanos           sum: 2.58us, count: 4, min: 250ns, max: 1.71us
                  -- TableScan[table: supplier] -> s_suppkey:BIGINT, s_nationkey:BIGINT
                     Input: 1987 rows (111.81KB, 1 batches), Raw Input: 10000 rows (101.73KB), Output: 1987 rows (111.81KB, 1 batches), Cpu time: 6.08ms, Blocked wall time: 0ns, Peak memory: 9.40MB, Memory allocations: 29, Threads: 4, Splits: 10, DynamicFilter producer plan nodes: 4
                        dataSourceAddSplitWallNanos      sum: 2.13ms, count: 4, min: 385.00us, max: 749.00us
                        dataSourceReadWallNanos          sum: 174.00us, count: 4, min: 2.00us, max: 165.00us
                        dynamicFiltersAccepted           sum: 4, count: 4, min: 1, max: 1
                        flattenStringDictionaryValues    sum: 0, count: 4, min: 0, max: 0
                        ioWaitNanos                      sum: 5.47ms, count: 4, min: 673.00us, max: 1.75ms
                        localReadBytes                   sum: 0B, count: 4, min: 0B, max: 0B
                        maxSingleIoWaitNanos             sum: 3.81ms, count: 4, min: 498.00us, max: 1.60ms
                        numLocalRead                     sum: 0, count: 4, min: 0, max: 0
                        numPrefetch                      sum: 5, count: 4, min: 1, max: 2
                        numRamRead                       sum: 0, count: 4, min: 0, max: 0
                        numStorageRead                   sum: 12, count: 4, min: 2, max: 5
                        overreadBytes                    sum: 0B, count: 4, min: 0B, max: 0B
                        prefetchBytes                    sum: 7.82MB, count: 4, min: 1.56MB, max: 3.13MB
                        preloadedSplits                  sum: 6, count: 6, min: 1, max: 1
                        queryThreadIoLatency             sum: 12, count: 4, min: 2, max: 5
                        ramReadBytes                     sum: 0B, count: 4, min: 0B, max: 0B
                        runningAddInputWallNanos         sum: 0ns, count: 4, min: 0ns, max: 0ns
                        runningFinishWallNanos           sum: 3.33us, count: 4, min: 542ns, max: 1.25us
                        runningGetOutputWallNanos        sum: 14.12ms, count: 4, min: 2.09ms, max: 4.87ms
                        skippedSplitBytes                sum: 0B, count: 4, min: 0B, max: 0B
                        skippedSplits                    sum: 0, count: 4, min: 0, max: 0
                        skippedStrides                   sum: 9, count: 4, min: 2, max: 3
                        storageReadBytes                 sum: 15.73MB, count: 4, min: 3.13MB, max: 4.79MB
                        totalRemainingFilterTime         sum: 0ns, count: 4, min: 0ns, max: 0ns
                        totalScanTime                    sum: 61.00us, count: 4, min: 0ns, max: 61.00us
                  -- HashJoin[INNER n_regionkey=r_regionkey] -> n_nationkey:BIGINT
                     Output: 5 rows (96B, 1 batches), Cpu time: 334.71us, Blocked wall time: 7.52ms, Peak memory: 64.00KB, Memory allocations: 2
                     HashBuild: Input: 1 rows (134B, 1 batches), Output: 0 rows (0B, 0 batches), Cpu time: 18.04us, Blocked wall time: 950.00us, Peak memory: 64.00KB, Memory allocations: 2, Threads: 4
                        blockedWaitForJoinBuildTimes        sum: 3, count: 3, min: 1, max: 1
                        blockedWaitForJoinBuildWallNanos    sum: 950.00us, count: 3, min: 310.00us, max: 321.00us
                        distinctKey0                        sum: 2, count: 1, min: 2, max: 2
                        hashtable.buildWallNanos            sum: 66.04us, count: 1, min: 66.04us, max: 66.04us
                        hashtable.capacity                  sum: 2, count: 1, min: 2, max: 2
                        hashtable.numDistinct               sum: 1, count: 1, min: 1, max: 1
                        hashtable.numRehashes               sum: 1, count: 1, min: 1, max: 1
                        queuedWallNanos                     sum: 2.26ms, count: 4, min: 249.00us, max: 761.00us
                        rangeKey0                           sum: 2, count: 1, min: 2, max: 2
                        runningAddInputWallNanos            sum: 14.08us, count: 4, min: 0ns, max: 14.08us
                        runningFinishWallNanos              sum: 0ns, count: 4, min: 0ns, max: 0ns
                        runningGetOutputWallNanos           sum: 1.75us, count: 4, min: 167ns, max: 1.08us
                     HashProbe: Input: 5 rows (384B, 1 batches), Output: 5 rows (96B, 1 batches), Cpu time: 316.67us, Blocked wall time: 6.57ms, Peak memory: 0B, Memory allocations: 0, Threads: 4
                        blockedWaitForJoinBuildTimes        sum: 4, count: 4, min: 1, max: 1
                        blockedWaitForJoinBuildWallNanos    sum: 6.57ms, count: 4, min: 1.51ms, max: 1.75ms
                        dynamicFiltersProduced              sum: 4, count: 4, min: 1, max: 1
                        queuedWallNanos                     sum: 38.00us, count: 4, min: 8.00us, max: 12.00us
                        replacedWithDynamicFilterRows       sum: 5, count: 1, min: 5, max: 5
                        runningAddInputWallNanos            sum: 583ns, count: 4, min: 0ns, max: 583ns
                        runningFinishWallNanos              sum: 671.29us, count: 4, min: 7.33us, max: 575.33us
                        runningGetOutputWallNanos           sum: 5.88us, count: 4, min: 249ns, max: 4.63us
                    -- TableScan[table: nation] -> n_nationkey:BIGINT, n_regionkey:BIGINT
                       Input: 5 rows (384B, 1 batches), Raw Input: 25 rows (0B), Output: 5 rows (384B, 1 batches), Cpu time: 1.18ms, Blocked wall time: 0ns, Peak memory: 24.25KB, Memory allocations: 27, Threads: 4, Splits: 10, DynamicFilter producer plan nodes: 2
                          dataSourceAddSplitWallNanos      sum: 260.00us, count: 4, min: 36.00us, max: 121.00us
                          dataSourceReadWallNanos          sum: 43.00us, count: 4, min: 4.00us, max: 30.00us
                          dynamicFiltersAccepted           sum: 4, count: 4, min: 1, max: 1
                          flattenStringDictionaryValues    sum: 0, count: 4, min: 0, max: 0
                          ioWaitNanos                      sum: 78.00us, count: 4, min: 15.00us, max: 25.00us
                          localReadBytes                   sum: 0B, count: 4, min: 0B, max: 0B
                          maxSingleIoWaitNanos             sum: 42.00us, count: 4, min: 7.00us, max: 19.00us
                          numLocalRead                     sum: 0, count: 4, min: 0, max: 0
                          numPrefetch                      sum: 0, count: 4, min: 0, max: 0
                          numRamRead                       sum: 0, count: 4, min: 0, max: 0
                          numStorageRead                   sum: 11, count: 4, min: 2, max: 3
                          overreadBytes                    sum: 538B, count: 4, min: 0B, max: 538B
                          prefetchBytes                    sum: 0B, count: 4, min: 0B, max: 0B
                          preloadedSplits                  sum: 6, count: 6, min: 1, max: 1
                          queryThreadIoLatency             sum: 12, count: 4, min: 2, max: 4
                          ramReadBytes                     sum: 0B, count: 4, min: 0B, max: 0B
                          readyPreloadedSplits             sum: 5, count: 5, min: 1, max: 1
                          runningAddInputWallNanos         sum: 0ns, count: 4, min: 0ns, max: 0ns
                          runningFinishWallNanos           sum: 259.00us, count: 4, min: 584ns, max: 171.04us
                          runningGetOutputWallNanos        sum: 1.89ms, count: 4, min: 348.25us, max: 581.87us
                          skippedSplitBytes                sum: 0B, count: 4, min: 0B, max: 0B
                          skippedSplits                    sum: 0, count: 4, min: 0, max: 0
                          skippedStrides                   sum: 9, count: 4, min: 1, max: 3
                          storageReadBytes                 sum: 40.56KB, count: 4, min: 8.03KB, max: 12.04KB
                          totalRemainingFilterTime         sum: 0ns, count: 4, min: 0ns, max: 0ns
                          totalScanTime                    sum: 0ns, count: 4, min: 0ns, max: 0ns
                    -- TableScan[table: region, range filters: [(r_name, Filter(BytesValues, deterministic, null not allowed))]] -> r_regionkey:BIGINT, r_name:VARCHAR
                       Input: 1 rows (134B, 1 batches), Raw Input: 5 rows (0B), Output: 1 rows (134B, 1 batches), Cpu time: 1.79ms, Blocked wall time: 0ns, Peak memory: 1.94KB, Memory allocations: 18, Threads: 4, Splits: 10
                          dataSourceAddSplitWallNanos      sum: 3.76ms, count: 4, min: 789.00us, max: 1.14ms
                          dataSourceReadWallNanos          sum: 66.00us, count: 4, min: 2.00us, max: 57.00us
                          flattenStringDictionaryValues    sum: 0, count: 4, min: 0, max: 0
                          ioWaitNanos                      sum: 51.00us, count: 4, min: 9.00us, max: 21.00us
                          localReadBytes                   sum: 0B, count: 4, min: 0B, max: 0B
                          maxSingleIoWaitNanos             sum: 27.00us, count: 4, min: 5.00us, max: 9.00us
                          numLocalRead                     sum: 0, count: 4, min: 0, max: 0
                          numPrefetch                      sum: 0, count: 4, min: 0, max: 0
                          numRamRead                       sum: 0, count: 4, min: 0, max: 0
                          numStorageRead                   sum: 11, count: 4, min: 2, max: 4
                          overreadBytes                    sum: 101B, count: 4, min: 0B, max: 101B
                          prefetchBytes                    sum: 0B, count: 4, min: 0B, max: 0B
                          preloadedSplits                  sum: 6, count: 6, min: 1, max: 1
                          queryThreadIoLatency             sum: 12, count: 4, min: 2, max: 5
                          ramReadBytes                     sum: 0B, count: 4, min: 0B, max: 0B
                          readyPreloadedSplits             sum: 1, count: 1, min: 1, max: 1
                          runningAddInputWallNanos         sum: 0ns, count: 4, min: 0ns, max: 0ns
                          runningFinishWallNanos           sum: 302.79us, count: 4, min: 666ns, max: 296.96us
                          runningGetOutputWallNanos        sum: 5.98ms, count: 4, min: 1.45ms, max: 1.52ms
                          skippedSplitBytes                sum: 0B, count: 4, min: 0B, max: 0B
                          skippedSplits                    sum: 0, count: 4, min: 0, max: 0
                          skippedStrides                   sum: 9, count: 4, min: 2, max: 3
                          storageReadBytes                 sum: 16.78KB, count: 4, min: 3.31KB, max: 5.20KB
                          totalRemainingFilterTime         sum: 0ns, count: 4, min: 0ns, max: 0ns
                          totalScanTime                    sum: 0ns, count: 4, min: 0ns, max: 0ns
        -- HashJoin[INNER ps_suppkey=s_suppkey] -> s_acctbal:DOUBLE, s_name:VARCHAR, s_address:VARCHAR, s_phone:VARCHAR, s_comment:VARCHAR, s_suppkey:BIGINT, s_nationkey:BIGINT, ps_supplycost:DOUBLE, p_partkey:BIGINT, p_mfgr:VARCHAR, n_name:VARCHAR
           Output: 642 rows (22.66MB, 80 batches), Cpu time: 1.23ms, Blocked wall time: 24.02ms, Peak memory: 4.74MB, Memory allocations: 599
           HashBuild: Input: 1987 rows (753.97KB, 2 batches), Output: 0 rows (0B, 0 batches), Cpu time: 476.42us, Blocked wall time: 710.00us, Peak memory: 4.50MB, Memory allocations: 11, Threads: 4
              blockedWaitForJoinBuildTimes        sum: 3, count: 3, min: 1, max: 1
              blockedWaitForJoinBuildWallNanos    sum: 710.00us, count: 3, min: 142.00us, max: 391.00us
              distinctKey0                        sum: 1988, count: 1, min: 1988, max: 1988
              hashtable.buildWallNanos            sum: 44.88us, count: 1, min: 44.88us, max: 44.88us
              hashtable.capacity                  sum: 9995, count: 1, min: 9995, max: 9995
              hashtable.numDistinct               sum: 1987, count: 1, min: 1987, max: 1987
              hashtable.numRehashes               sum: 1, count: 1, min: 1, max: 1
              queuedWallNanos                     sum: 1.98ms, count: 4, min: 479.00us, max: 505.00us
              rangeKey0                           sum: 9995, count: 1, min: 9995, max: 9995
              runningAddInputWallNanos            sum: 495.29us, count: 4, min: 0ns, max: 495.29us
              runningFinishWallNanos              sum: 0ns, count: 4, min: 0ns, max: 0ns
              runningGetOutputWallNanos           sum: 1.17us, count: 4, min: 251ns, max: 333ns
           HashProbe: Input: 642 rows (3.82MB, 80 batches), Output: 642 rows (22.66MB, 80 batches), Cpu time: 757.04us, Blocked wall time: 23.31ms, Peak memory: 249.50KB, Memory allocations: 588, Threads: 4
              blockedWaitForJoinBuildTimes        sum: 4, count: 4, min: 1, max: 1
              blockedWaitForJoinBuildWallNanos    sum: 23.31ms, count: 4, min: 5.81ms, max: 5.86ms
              dynamicFiltersProduced              sum: 4, count: 4, min: 1, max: 1
              queuedWallNanos                     sum: 28.00us, count: 4, min: 3.00us, max: 9.00us
              runningAddInputWallNanos            sum: 35.59us, count: 4, min: 0ns, max: 35.59us
              runningFinishWallNanos              sum: 243.17us, count: 4, min: 667ns, max: 240.42us
              runningGetOutputWallNanos           sum: 426.91us, count: 4, min: 416ns, max: 425.50us
          -- HashJoin[INNER ps_partkey=p_partkey] -> ps_suppkey:BIGINT, ps_supplycost:DOUBLE, p_partkey:BIGINT, p_mfgr:VARCHAR
             Output: 642 rows (3.82MB, 80 batches), Cpu time: 512.54us, Blocked wall time: 25.80ms, Peak memory: 1.70MB, Memory allocations: 92
             HashBuild: Input: 747 rows (72.64KB, 20 batches), Output: 0 rows (0B, 0 batches), Cpu time: 185.87us, Blocked wall time: 18.47ms, Peak memory: 1.65MB, Memory allocations: 3, Threads: 4
                blockedWaitForJoinBuildTimes        sum: 3, count: 3, min: 1, max: 1
                blockedWaitForJoinBuildWallNanos    sum: 18.47ms, count: 3, min: 6.01ms, max: 6.24ms
                distinctKey0                        sum: 748, count: 1, min: 748, max: 748
                hashtable.buildWallNanos            sum: 183.17us, count: 1, min: 183.17us, max: 183.17us
                hashtable.capacity                  sum: 199476, count: 1, min: 199476, max: 199476
                hashtable.numDistinct               sum: 747, count: 1, min: 747, max: 747
                hashtable.numRehashes               sum: 1, count: 1, min: 1, max: 1
                queuedWallNanos                     sum: 1.48ms, count: 4, min: 304.00us, max: 463.00us
                rangeKey0                           sum: 199476, count: 1, min: 199476, max: 199476
                runningAddInputWallNanos            sum: 172.21us, count: 4, min: 0ns, max: 172.21us
                runningFinishWallNanos              sum: 0ns, count: 4, min: 0ns, max: 0ns
                runningGetOutputWallNanos           sum: 2.29us, count: 4, min: 208ns, max: 1.62us
             HashProbe: Input: 642 rows (7.53MB, 80 batches), Output: 642 rows (3.82MB, 80 batches), Cpu time: 326.67us, Blocked wall time: 7.33ms, Peak memory: 54.81KB, Memory allocations: 89, Threads: 4
                blockedWaitForJoinBuildTimes        sum: 4, count: 4, min: 1, max: 1
                blockedWaitForJoinBuildWallNanos    sum: 7.33ms, count: 4, min: 1.80ms, max: 1.85ms
                dynamicFiltersProduced              sum: 4, count: 4, min: 1, max: 1
                queuedWallNanos                     sum: 89.00us, count: 4, min: 13.00us, max: 28.00us
                runningAddInputWallNanos            sum: 133.25us, count: 4, min: 0ns, max: 133.25us
                runningFinishWallNanos              sum: 873ns, count: 4, min: 166ns, max: 333ns
                runningGetOutputWallNanos           sum: 142.25us, count: 4, min: 166ns, max: 141.46us
            -- TableScan[table: partsupp] -> ps_partkey:BIGINT, ps_suppkey:BIGINT, ps_supplycost:DOUBLE
               Input: 642 rows (7.53MB, 80 batches), Raw Input: 800000 rows (2.38MB), Output: 642 rows (7.53MB, 80 batches), Cpu time: 10.59ms, Blocked wall time: 0ns, Peak memory: 17.52MB, Memory allocations: 533, Threads: 4, Splits: 10, DynamicFilter producer plan nodes: 13,14
                  dataSourceAddSplitWallNanos      sum: 1.55ms, count: 4, min: 181.00us, max: 549.00us
                  dataSourceReadWallNanos          sum: 7.22ms, count: 4, min: 3.00us, max: 7.21ms
                  dynamicFiltersAccepted           sum: 8, count: 8, min: 1, max: 1
                  flattenStringDictionaryValues    sum: 0, count: 4, min: 0, max: 0
                  ioWaitNanos                      sum: 1.26ms, count: 4, min: 0ns, max: 1.26ms
                  localReadBytes                   sum: 0B, count: 4, min: 0B, max: 0B
                  maxSingleIoWaitNanos             sum: 882.00us, count: 4, min: 0ns, max: 882.00us
                  numLocalRead                     sum: 0, count: 4, min: 0, max: 0
                  numPrefetch                      sum: 0, count: 4, min: 0, max: 0
                  numRamRead                       sum: 0, count: 4, min: 0, max: 0
                  numStorageRead                   sum: 2, count: 4, min: 0, max: 2
                  overreadBytes                    sum: 118B, count: 4, min: 0B, max: 118B
                  prefetchBytes                    sum: 0B, count: 4, min: 0B, max: 0B
                  preloadedSplits                  sum: 6, count: 6, min: 1, max: 1
                  queryThreadIoLatency             sum: 3, count: 4, min: 0, max: 3
                  ramReadBytes                     sum: 0B, count: 4, min: 0B, max: 0B
                  runningAddInputWallNanos         sum: 0ns, count: 4, min: 0ns, max: 0ns
                  runningFinishWallNanos           sum: 3.50us, count: 4, min: 458ns, max: 1.71us
                  runningGetOutputWallNanos        sum: 12.22ms, count: 4, min: 1.36ms, max: 7.84ms
                  skippedSplitBytes                sum: 0B, count: 4, min: 0B, max: 0B
                  skippedSplits                    sum: 0, count: 4, min: 0, max: 0
                  skippedStrides                   sum: 9, count: 4, min: 0, max: 3
                  storageReadBytes                 sum: 7.97MB, count: 4, min: 0B, max: 7.97MB
                  totalRemainingFilterTime         sum: 0ns, count: 4, min: 0ns, max: 0ns
                  totalScanTime                    sum: 380.00us, count: 4, min: 0ns, max: 380.00us
            -- Filter[expression: eq(cast ROW["p_size"] as BIGINT,15)] -> p_partkey:BIGINT, p_mfgr:VARCHAR, p_size:INTEGER, p_type:VARCHAR
               Output: 747 rows (72.64KB, 20 batches), Cpu time: 902.88us, Blocked wall time: 0ns, Peak memory: 109.88KB, Memory allocations: 23, Threads: 4
                  runningAddInputWallNanos     sum: 3.00us, count: 4, min: 0ns, max: 3.00us
                  runningFinishWallNanos       sum: 337.08us, count: 4, min: 542ns, max: 275.21us
                  runningGetOutputWallNanos    sum: 624.87us, count: 4, min: 125ns, max: 624.46us
              -- TableScan[table: part, remaining filter: (like(ROW["p_type"],"%BRASS"))] -> p_partkey:BIGINT, p_mfgr:VARCHAR, p_size:INTEGER, p_type:VARCHAR
                 Input: 40058 rows (3.81MB, 20 batches), Raw Input: 200000 rows (1.79MB), Output: 40058 rows (3.81MB, 20 batches), Cpu time: 9.53ms, Blocked wall time: 0ns, Peak memory: 7.51MB, Memory allocations: 239, Threads: 4, Splits: 10
                    dataSourceAddSplitWallNanos      sum: 2.00ms, count: 4, min: 411.00us, max: 606.00us
                    dataSourceReadWallNanos          sum: 6.01ms, count: 4, min: 6.00us, max: 5.98ms
                    flattenStringDictionaryValues    sum: 0, count: 4, min: 0, max: 0
                    ioWaitNanos                      sum: 870.00us, count: 4, min: 0ns, max: 870.00us
                    localReadBytes                   sum: 0B, count: 4, min: 0B, max: 0B
                    maxSingleIoWaitNanos             sum: 543.00us, count: 4, min: 0ns, max: 543.00us
                    numLocalRead                     sum: 0, count: 4, min: 0, max: 0
                    numPrefetch                      sum: 0, count: 4, min: 0, max: 0
                    numRamRead                       sum: 0, count: 4, min: 0, max: 0
                    numStorageRead                   sum: 2, count: 4, min: 0, max: 2
                    overreadBytes                    sum: 123.12KB, count: 4, min: 0B, max: 123.12KB
                    prefetchBytes                    sum: 0B, count: 4, min: 0B, max: 0B
                    preloadedSplits                  sum: 6, count: 6, min: 1, max: 1
                    queryThreadIoLatency             sum: 4, count: 4, min: 0, max: 4
                    ramReadBytes                     sum: 0B, count: 4, min: 0B, max: 0B
                    readyPreloadedSplits             sum: 4, count: 4, min: 1, max: 1
                    runningAddInputWallNanos         sum: 0ns, count: 4, min: 0ns, max: 0ns
                    runningFinishWallNanos           sum: 500ns, count: 4, min: 83ns, max: 209ns
                    runningGetOutputWallNanos        sum: 12.88ms, count: 4, min: 1.88ms, max: 6.57ms
                    skippedSplitBytes                sum: 0B, count: 4, min: 0B, max: 0B
                    skippedSplits                    sum: 0, count: 4, min: 0, max: 0
                    skippedStrides                   sum: 9, count: 4, min: 0, max: 3
                    storageReadBytes                 sum: 2.20MB, count: 4, min: 0B, max: 2.20MB
                    totalRemainingFilterTime         sum: 803.00us, count: 4, min: 0ns, max: 803.00us
                    totalScanTime                    sum: 543.00us, count: 4, min: 0ns, max: 543.00us
          -- HashJoin[INNER s_nationkey=n_nationkey] -> s_acctbal:DOUBLE, s_name:VARCHAR, s_address:VARCHAR, s_phone:VARCHAR, s_comment:VARCHAR, s_suppkey:BIGINT, s_nationkey:BIGINT, s_suppkey:BIGINT, n_name:VARCHAR
             Output: 1987 rows (753.97KB, 2 batches), Cpu time: 278.37us, Blocked wall time: 11.49ms, Peak memory: 302.00KB, Memory allocations: 16
             HashBuild: Input: 5 rows (236B, 1 batches), Output: 0 rows (0B, 0 batches), Cpu time: 24.58us, Blocked wall time: 1.43ms, Peak memory: 128.00KB, Memory allocations: 3, Threads: 4
                blockedWaitForJoinBuildTimes        sum: 3, count: 3, min: 1, max: 1
                blockedWaitForJoinBuildWallNanos    sum: 1.43ms, count: 3, min: 466.00us, max: 495.00us
                distinctKey0                        sum: 6, count: 1, min: 6, max: 6
                hashtable.buildWallNanos            sum: 8.12us, count: 1, min: 8.12us, max: 8.12us
                hashtable.capacity                  sum: 19, count: 1, min: 19, max: 19
                hashtable.numDistinct               sum: 5, count: 1, min: 5, max: 5
                hashtable.numRehashes               sum: 1, count: 1, min: 1, max: 1
                queuedWallNanos                     sum: 2.44ms, count: 4, min: 491.00us, max: 670.00us
                rangeKey0                           sum: 19, count: 1, min: 19, max: 19
                runningAddInputWallNanos            sum: 19.58us, count: 4, min: 0ns, max: 19.58us
                runningFinishWallNanos              sum: 0ns, count: 4, min: 0ns, max: 0ns
                runningGetOutputWallNanos           sum: 876ns, count: 4, min: 168ns, max: 291ns
             HashProbe: Input: 1987 rows (598.13KB, 1 batches), Output: 1987 rows (753.97KB, 2 batches), Cpu time: 253.79us, Blocked wall time: 10.06ms, Peak memory: 174.00KB, Memory allocations: 13, Threads: 4
                blockedWaitForJoinBuildTimes        sum: 4, count: 4, min: 1, max: 1
                blockedWaitForJoinBuildWallNanos    sum: 10.06ms, count: 4, min: 2.42ms, max: 2.58ms
                dynamicFiltersProduced              sum: 4, count: 4, min: 1, max: 1
                queuedWallNanos                     sum: 311.00us, count: 4, min: 3.00us, max: 152.00us
                runningAddInputWallNanos            sum: 20.96us, count: 4, min: 0ns, max: 20.96us
                runningFinishWallNanos              sum: 167.71us, count: 4, min: 1.00us, max: 152.08us
                runningGetOutputWallNanos           sum: 90.17us, count: 4, min: 167ns, max: 89.29us
            -- TableScan[table: supplier] -> s_acctbal:DOUBLE, s_name:VARCHAR, s_address:VARCHAR, s_phone:VARCHAR, s_comment:VARCHAR, s_suppkey:BIGINT, s_nationkey:BIGINT
               Input: 1987 rows (598.13KB, 1 batches), Raw Input: 10000 rows (0B), Output: 1987 rows (598.13KB, 1 batches), Cpu time: 7.16ms, Blocked wall time: 0ns, Peak memory: 6.27MB, Memory allocations: 81, Threads: 4, Splits: 10, DynamicFilter producer plan nodes: 11
                  dataSourceAddSplitWallNanos      sum: 4.18ms, count: 4, min: 842.00us, max: 1.27ms
                  dataSourceReadWallNanos          sum: 1.39ms, count: 4, min: 4.00us, max: 1.37ms
                  dynamicFiltersAccepted           sum: 4, count: 4, min: 1, max: 1
                  flattenStringDictionaryValues    sum: 0, count: 4, min: 0, max: 0
                  ioWaitNanos                      sum: 5.22ms, count: 4, min: 547.00us, max: 2.36ms
                  localReadBytes                   sum: 0B, count: 4, min: 0B, max: 0B
                  maxSingleIoWaitNanos             sum: 3.16ms, count: 4, min: 547.00us, max: 1.15ms
                  numLocalRead                     sum: 0, count: 4, min: 0, max: 0
                  numPrefetch                      sum: 6, count: 4, min: 1, max: 2
                  numRamRead                       sum: 0, count: 4, min: 0, max: 0
                  numStorageRead                   sum: 11, count: 4, min: 2, max: 4
                  overreadBytes                    sum: 653B, count: 4, min: 0B, max: 653B
                  prefetchBytes                    sum: 9.38MB, count: 4, min: 1.56MB, max: 3.13MB
                  preloadedSplits                  sum: 6, count: 6, min: 1, max: 1
                  queryThreadIoLatency             sum: 17, count: 4, min: 2, max: 8
                  ramReadBytes                     sum: 0B, count: 4, min: 0B, max: 0B
                  readyPreloadedSplits             sum: 1, count: 1, min: 1, max: 1
                  runningAddInputWallNanos         sum: 0ns, count: 4, min: 0ns, max: 0ns
                  runningFinishWallNanos           sum: 1.79us, count: 4, min: 291ns, max: 625ns
                  runningGetOutputWallNanos        sum: 14.92ms, count: 4, min: 2.23ms, max: 6.42ms
                  skippedSplitBytes                sum: 0B, count: 4, min: 0B, max: 0B
                  skippedSplits                    sum: 0, count: 4, min: 0, max: 0
                  skippedStrides                   sum: 9, count: 4, min: 0, max: 4
                  storageReadBytes                 sum: 17.19MB, count: 4, min: 3.12MB, max: 6.25MB
                  totalRemainingFilterTime         sum: 0ns, count: 4, min: 0ns, max: 0ns
                  totalScanTime                    sum: 0ns, count: 4, min: 0ns, max: 0ns
            -- HashJoin[INNER n_regionkey=r_regionkey] -> n_nationkey:BIGINT, n_name:VARCHAR
               Output: 5 rows (236B, 1 batches), Cpu time: 326.46us, Blocked wall time: 7.00ms, Peak memory: 68.00KB, Memory allocations: 2
               HashBuild: Input: 1 rows (134B, 1 batches), Output: 0 rows (0B, 0 batches), Cpu time: 9.21us, Blocked wall time: 1.24ms, Peak memory: 68.00KB, Memory allocations: 2, Threads: 4
                  blockedWaitForJoinBuildTimes        sum: 3, count: 3, min: 1, max: 1
                  blockedWaitForJoinBuildWallNanos    sum: 1.24ms, count: 3, min: 333.00us, max: 456.00us
                  distinctKey0                        sum: 2, count: 1, min: 2, max: 2
                  hashtable.buildWallNanos            sum: 41.17us, count: 1, min: 41.17us, max: 41.17us
                  hashtable.capacity                  sum: 2, count: 1, min: 2, max: 2
                  hashtable.numDistinct               sum: 1, count: 1, min: 1, max: 1
                  hashtable.numRehashes               sum: 1, count: 1, min: 1, max: 1
                  queuedWallNanos                     sum: 5.85ms, count: 4, min: 495.00us, max: 2.22ms
                  rangeKey0                           sum: 2, count: 1, min: 2, max: 2
                  runningAddInputWallNanos            sum: 6.38us, count: 4, min: 0ns, max: 6.38us
                  runningFinishWallNanos              sum: 0ns, count: 4, min: 0ns, max: 0ns
                  runningGetOutputWallNanos           sum: 540ns, count: 4, min: 83ns, max: 166ns
               HashProbe: Input: 5 rows (524B, 1 batches), Output: 5 rows (236B, 1 batches), Cpu time: 317.25us, Blocked wall time: 5.76ms, Peak memory: 0B, Memory allocations: 0, Threads: 4
                  blockedWaitForJoinBuildTimes        sum: 4, count: 4, min: 1, max: 1
                  blockedWaitForJoinBuildWallNanos    sum: 5.76ms, count: 4, min: 1.37ms, max: 1.52ms
                  dynamicFiltersProduced              sum: 4, count: 4, min: 1, max: 1
                  queuedWallNanos                     sum: 629.00us, count: 4, min: 14.00us, max: 504.00us
                  replacedWithDynamicFilterRows       sum: 5, count: 1, min: 5, max: 5
                  runningAddInputWallNanos            sum: 333ns, count: 4, min: 0ns, max: 333ns
                  runningFinishWallNanos              sum: 787.96us, count: 4, min: 20.96us, max: 483.33us
                  runningGetOutputWallNanos           sum: 3.50us, count: 4, min: 209ns, max: 2.62us
              -- TableScan[table: nation] -> n_nationkey:BIGINT, n_name:VARCHAR, n_regionkey:BIGINT
                 Input: 5 rows (524B, 1 batches), Raw Input: 25 rows (0B), Output: 5 rows (524B, 1 batches), Cpu time: 1.28ms, Blocked wall time: 0ns, Peak memory: 27.19KB, Memory allocations: 33, Threads: 4, Splits: 10, DynamicFilter producer plan nodes: 9
                    dataSourceAddSplitWallNanos      sum: 144.00us, count: 2, min: 39.00us, max: 105.00us
                    dataSourceReadWallNanos          sum: 47.00us, count: 3, min: 3.00us, max: 35.00us
                    dynamicFiltersAccepted           sum: 4, count: 4, min: 1, max: 1
                    flattenStringDictionaryValues    sum: 0, count: 3, min: 0, max: 0
                    ioWaitNanos                      sum: 52.00us, count: 3, min: 13.00us, max: 23.00us
                    localReadBytes                   sum: 0B, count: 3, min: 0B, max: 0B
                    maxSingleIoWaitNanos             sum: 18.00us, count: 3, min: 4.00us, max: 7.00us
                    numLocalRead                     sum: 0, count: 3, min: 0, max: 0
                    numPrefetch                      sum: 0, count: 3, min: 0, max: 0
                    numRamRead                       sum: 0, count: 3, min: 0, max: 0
                    numStorageRead                   sum: 11, count: 3, min: 2, max: 6
                    overreadBytes                    sum: 178B, count: 3, min: 0B, max: 178B
                    prefetchBytes                    sum: 0B, count: 3, min: 0B, max: 0B
                    preloadedSplits                  sum: 8, count: 8, min: 1, max: 1
                    queryThreadIoLatency             sum: 13, count: 3, min: 2, max: 6
                    ramReadBytes                     sum: 0B, count: 3, min: 0B, max: 0B
                    readyPreloadedSplits             sum: 5, count: 5, min: 1, max: 1
                    runningAddInputWallNanos         sum: 0ns, count: 4, min: 0ns, max: 0ns
                    runningFinishWallNanos           sum: 190.00us, count: 4, min: 13.00us, max: 60.21us
                    runningGetOutputWallNanos        sum: 2.18ms, count: 4, min: 38.88us, max: 852.04us
                    skippedSplitBytes                sum: 0B, count: 3, min: 0B, max: 0B
                    skippedSplits                    sum: 0, count: 3, min: 0, max: 0
                    skippedStrides                   sum: 9, count: 3, min: 1, max: 6
                    storageReadBytes                 sum: 40.91KB, count: 3, min: 8.03KB, max: 24.08KB
                    totalRemainingFilterTime         sum: 0ns, count: 3, min: 0ns, max: 0ns
                    totalScanTime                    sum: 0ns, count: 3, min: 0ns, max: 0ns
              -- TableScan[table: region, range filters: [(r_name, Filter(BytesValues, deterministic, null not allowed))]] -> r_regionkey:BIGINT, r_name:VARCHAR
                 Input: 1 rows (134B, 1 batches), Raw Input: 5 rows (0B), Output: 1 rows (134B, 1 batches), Cpu time: 1.23ms, Blocked wall time: 0ns, Peak memory: 1.94KB, Memory allocations: 14, Threads: 4, Splits: 10
                    dataSourceAddSplitWallNanos      sum: 1.43ms, count: 2, min: 510.00us, max: 922.00us
                    dataSourceReadWallNanos          sum: 31.00us, count: 2, min: 5.00us, max: 26.00us
                    flattenStringDictionaryValues    sum: 0, count: 2, min: 0, max: 0
                    ioWaitNanos                      sum: 44.00us, count: 2, min: 20.00us, max: 24.00us
                    localReadBytes                   sum: 0B, count: 2, min: 0B, max: 0B
                    maxSingleIoWaitNanos             sum: 14.00us, count: 2, min: 6.00us, max: 8.00us
                    numLocalRead                     sum: 0, count: 2, min: 0, max: 0
                    numPrefetch                      sum: 0, count: 2, min: 0, max: 0
                    numRamRead                       sum: 0, count: 2, min: 0, max: 0
                    numStorageRead                   sum: 11, count: 2, min: 5, max: 6
                    overreadBytes                    sum: 101B, count: 2, min: 0B, max: 101B
                    prefetchBytes                    sum: 0B, count: 2, min: 0B, max: 0B
                    preloadedSplits                  sum: 8, count: 8, min: 1, max: 1
                    queryThreadIoLatency             sum: 12, count: 2, min: 5, max: 7
                    ramReadBytes                     sum: 0B, count: 2, min: 0B, max: 0B
                    readyPreloadedSplits             sum: 3, count: 3, min: 1, max: 1
                    runningAddInputWallNanos         sum: 0ns, count: 4, min: 0ns, max: 0ns
                    runningFinishWallNanos           sum: 429.12us, count: 4, min: 709ns, max: 425.54us
                    runningGetOutputWallNanos        sum: 2.51ms, count: 4, min: 11.75us, max: 1.29ms
                    skippedSplitBytes                sum: 0B, count: 2, min: 0B, max: 0B
                    skippedSplits                    sum: 0, count: 2, min: 0, max: 0
                    skippedStrides                   sum: 9, count: 2, min: 4, max: 5
                    storageReadBytes                 sum: 16.78KB, count: 2, min: 8.27KB, max: 8.51KB
                    totalRemainingFilterTime         sum: 0ns, count: 2, min: 0ns, max: 0ns
                    totalScanTime                    sum: 0ns, count: 2, min: 0ns, max: 0ns

Collaborator

@pramodsatya left a comment

Thanks for the changes @deepthydavis, LGTM overall, left some comments.

CMakeLists.txt
@@ -98,13 +98,13 @@ option(VELOX_ENABLE_EXAMPLES
 "Build examples. This will enable VELOX_ENABLE_EXPRESSION automatically."
 OFF)
 option(VELOX_ENABLE_SUBSTRAIT "Build Substrait-to-Velox converter." OFF)
-option(VELOX_ENABLE_BENCHMARKS "Enable Velox top level benchmarks." OFF)
+option(VELOX_ENABLE_BENCHMARKS "Enable Velox top level benchmarks." ON)
 option(VELOX_ENABLE_BENCHMARKS_BASIC "Enable Velox basic benchmarks." OFF)
Collaborator

Please revert the changes to this file and enable the option for testing only.

Contributor Author

Reverted the changes in CMakeLists.txt

@facebook-github-bot added the CLA Signed label May 20, 2024
@facebook-github-bot
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

Collaborator

@pramodsatya left a comment

Thanks for adding the query plan @deepthydavis.

Hi @aditi-pandit, could you please take a look?

@aditi-pandit
Collaborator

Note: It's a bit odd to me that the TableScan timings for nation jumped so much between the runs when changing numDrivers. But the main node contributing to the time difference is again the TableScan for partsupp, likely on account of the dynamic filters.
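
A comparison like the one described in this note can be done mechanically on the PrintPlanWithStats text. The following is a minimal sketch, assuming the output format shown in the plan above (a `TableScan[table: ...]` header followed by a summary line containing `Cpu time: <value><unit>`). The helper names `table_scan_cpu_nanos` and `cpu_delta` are hypothetical and not part of Velox; the sample text is copied from the 4-driver plan above, truncated after the blocked wall time.

```python
import re

# Multipliers to nanoseconds for the unit suffixes seen in the plan output.
_UNITS_NS = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}

_SCAN_RE = re.compile(r"TableScan\[table: (\w+)")
_CPU_RE = re.compile(r"Cpu time: ([\d.]+)(ns|us|ms|s)\b")


def table_scan_cpu_nanos(plan_text):
    """Map table name -> CPU time in ns, read from the line after each TableScan header."""
    result = {}
    current = None
    for line in plan_text.splitlines():
        scan = _SCAN_RE.search(line)
        if scan:
            current = scan.group(1)
            continue
        if current is not None:
            cpu = _CPU_RE.search(line)
            if cpu:
                result[current] = float(cpu.group(1)) * _UNITS_NS[cpu.group(2)]
                current = None
    return result


def cpu_delta(run_a, run_b):
    """Per-table difference (run_b - run_a) for tables present in both runs."""
    return {t: run_b[t] - run_a[t] for t in run_a.keys() & run_b.keys()}


# Two scans copied from the plan above (summary lines truncated).
SAMPLE = """\
-- TableScan[table: partsupp] -> ps_partkey:BIGINT, ps_suppkey:BIGINT, ps_supplycost:DOUBLE
   Input: 642 rows (7.53MB, 80 batches), Raw Input: 800000 rows (2.38MB), Output: 642 rows (7.53MB, 80 batches), Cpu time: 10.59ms, Blocked wall time: 0ns
-- TableScan[table: nation] -> n_nationkey:BIGINT, n_name:VARCHAR, n_regionkey:BIGINT
   Input: 5 rows (524B, 1 batches), Raw Input: 25 rows (0B), Output: 5 rows (524B, 1 batches), Cpu time: 1.28ms, Blocked wall time: 0ns
"""

if __name__ == "__main__":
    for table, nanos in table_scan_cpu_nanos(SAMPLE).items():
        print(f"{table}: {nanos / 1e6:.2f} ms CPU")
```

Feeding the full plan text from two runs with different driver counts into `table_scan_cpu_nanos` and passing both results to `cpu_delta` would show which scans changed the most. CPU time is only one of the reported figures; wall-time stats such as `runningGetOutputWallNanos` may tell a different story.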

@aditi-pandit
Collaborator

@deepthydavis : Changes look good.

@aditi-pandit marked this pull request as ready for review June 4, 2024 05:16
Collaborator

@aditi-pandit left a comment

Thanks @deepthydavis

@aditi-pandit
Collaborator

@deepthydavis: Could you please rebase your code and get a clean build for this PR? After that we can pass it on to the Meta team for merging.

@aditi-pandit
Collaborator

Thanks @deepthydavis

@kgpai : Another missing micro-benchmark test for merge.

@aditi-pandit added the ready-to-merge label Jun 6, 2024
@facebook-github-bot
Contributor

@kevinwilfong has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@kevinwilfong merged this pull request in 20470a0.


Conbench analyzed the 1 benchmark run on commit 20470a09.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

deepashreeraghu pushed a commit to deepashreeraghu/velox that referenced this pull request Jun 13, 2024
Summary:
This PR introduces TPC-H Query 2 into the TpchQueryBuilder and extends the TpchBenchmark and ParquetTpchTest to include this query. It also provides a performance comparison with DuckDB using the Parquet file format and includes the output of PrintPlanWithStats for detailed analysis.
The scale factor used is 1.
Here is the link to the PowerPoint presentation, which contains a detailed description for each driver and thread: https://ibm.box.com/s/sau464qdfac45aainwpj6pyvkvbtlsat

### Performance Comparison
  Chip: Apple M1 Pro
  Total Number of Cores: 10 (8 performance and 2 efficiency)
  Memory: 32 GB

The following table summarizes the performance comparison between Velox and DuckDB (with Parquet file format) across various numbers of threads/drivers:

| Num Threads/Drivers | Velox (ms) | DuckDB (ms) |
|:-------------------:|:----------:|:-----------:|
|          1          |     27     |    88.4     |
|          4          |     23     |    84.1     |
|          8          |     25     |    82.8     |
|         16          |     30     |     84      |
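
To read the table as relative speedups, the ratio DuckDB time / Velox time can be computed per thread count. This is a minimal sketch using only the numbers in the table above; the names `TIMINGS_MS` and `speedups` are illustrative. The ratios describe this one query at scale factor 1 on the machine listed above, not a general comparison of the two engines.

```python
# Timings in milliseconds copied from the table above,
# keyed by number of threads/drivers: (velox_ms, duckdb_ms).
TIMINGS_MS = {
    1: (27.0, 88.4),
    4: (23.0, 84.1),
    8: (25.0, 82.8),
    16: (30.0, 84.0),
}


def speedups(timings):
    """Return {num_drivers: duckdb_ms / velox_ms}."""
    return {n: duckdb / velox for n, (velox, duckdb) in timings.items()}


if __name__ == "__main__":
    for n, ratio in speedups(TIMINGS_MS).items():
        print(f"{n:>2} threads/drivers: DuckDB time / Velox time = {ratio:.2f}")
```

The ratio is largest at 4 threads/drivers, which is also where the Velox time is lowest. Velox time rises again at 8 and 16, which fits the reviewer's observation that the scans behave differently as numDrivers changes.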

Pull Request resolved: facebookincubator#9825

Reviewed By: bikramSingh91

Differential Revision: D58244304

Pulled By: kevinwilfong

fbshipit-source-id: 4d216b10b49847ab692a394783bd5c59a59c9eb2