Optimize row hash #6065

Conversation
Performance improvement, benchmarked on a 2.4 GHz 8-Core Intel Core i9.
```rust
.map(|array| array.slice(offsets[0], offsets[1] - offsets[0]))
.collect();
let null_array = Arc::new(NullArray::new(0)) as ArrayRef;
let mut sliced_arrays: Vec<ArrayRef> = vec![null_array; aggr_array.len()];
```
I don't see why this should be faster? 🤔
The key point of this PR is to get rid of extra allocations (see #5969 (comment)), which allows a 20-25% speed gain.
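As a hedged illustration of the allocation-saving idea (not the PR's actual code, which operates on Arrow `ArrayRef`s; the function name and `Vec<i32>` stand-in below are hypothetical): instead of building an intermediate vector of slices and then collecting again after filtering, fill one preallocated output vector in a single pass.

```rust
// Hypothetical sketch using Vec<i32> in place of Arrow arrays.
// One outer allocation, sized up front; slice and filter happen
// in the same pass over the input, with no intermediate Vec of slices.
fn slice_and_filter_once(arrays: &[Vec<i32>], lo: usize, hi: usize) -> Vec<Vec<i32>> {
    let mut out: Vec<Vec<i32>> = vec![Vec::new(); arrays.len()];
    for (i, arr) in arrays.iter().enumerate() {
        // "Filter" here is a toy predicate (keep even values) standing in
        // for Arrow's filter kernel.
        out[i] = arr[lo..hi].iter().copied().filter(|x| x % 2 == 0).collect();
    }
    out
}
```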
Another optimization is that we traverse the input collection only once, instead of twice as in the original implementation.
See #5969 (comment) -- reported performance improvement.
I will do some tests locally tomorrow.
```diff
 if let Some(f) = filter_opt {
     let sliced = f.slice(offsets[0], offsets[1] - offsets[0]);
     let filter_array = as_boolean_array(&sliced)?;

-    sliced_arrays
-        .iter()
-        .map(|array| filter(array, filter_array).unwrap())
-        .collect::<Vec<ArrayRef>>()
+    for (i, arr) in aggr_array.iter().enumerate() {
+        let sliced = &arr.slice(offsets[0], offsets[1] - offsets[0]);
+        sliced_arrays[i] = filter(sliced, filter_array).unwrap();
+    }
+} else {
+    for (i, arr) in aggr_array.iter().enumerate() {
+        sliced_arrays[i] = arr.slice(offsets[0], offsets[1] - offsets[0]);
+    }
 }
-None => sliced_arrays,
-};
 Ok(filtered_arrays)
}
```
I think writing these loops as a zip of `aggr_array.iter()` and `sliced_arrays.iter_mut()`, and avoiding the `sliced_arrays[i]` access inside the loop, can make the code (1) a little more idiomatic, and (2) may result in less implicit bounds-checking at run-time.
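A minimal sketch of the suggested shape, using plain `Vec<i32>` in place of Arrow's `ArrayRef` (the function name `fill_doubled` and the doubling operation are hypothetical stand-ins for the slice/filter work):

```rust
// Zip a mutable iterator over the destination with the source iterator,
// so each write goes through the iterator rather than dst[i] indexing.
// The compiler can then elide per-element bounds checks.
fn fill_doubled(src: &[i32], dst: &mut Vec<i32>) {
    for (out, x) in dst.iter_mut().zip(src.iter()) {
        *out = x * 2;
    }
}
```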
With `zip` we might even use `collect` into a `Vec` again to avoid initializing the vec.
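A hedged sketch of that variant (again with `Vec<i32>` standing in for `ArrayRef`; `zip_add` is a hypothetical name): collecting directly means the destination never needs to be pre-filled with placeholder values, which is what the `vec![null_array; aggr_array.len()]` step in the PR does.

```rust
// Zip two sources and collect the combined result directly.
// No dummy initialization of the output vector is needed, and the
// exact size_hint from the slice iterators lets collect() allocate once.
fn zip_add(a: &[i32], b: &[i32]) -> Vec<i32> {
    a.iter().zip(b.iter()).map(|(x, y)| x + y).collect()
}
```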
@comphead

Before this PR: q17 (sf=1), q17 (sf=10). This PR: q17 (sf=10).

How this was tested: change the magic number in

```rust
if matches!(self.mode, AggregateMode::Partial | AggregateMode::Single)
    && normal_aggr_input_values.is_empty()
    && normal_filter_values.is_empty()
    && groups_with_rows.len() >= batch.num_rows() / 10
```

to

```rust
if matches!(self.mode, AggregateMode::Partial | AggregateMode::Single)
    && normal_aggr_input_values.is_empty()
    && normal_filter_values.is_empty()
    && groups_with_rows.len() >= batch.num_rows() / 1
```

so that updating the accumulators will use this method.

```shell
cargo run --release --bin tpch -- benchmark datafusion --iterations 3 --path ./parquet_data10 --format parquet --query 17 -n 1 --disable-statistics
cargo run --release --bin tpch -- benchmark datafusion --iterations 3 --path ./parquet_data --format parquet --query 17 -n 1 --disable-statistics
```

My Mac:
```rust
for (i, arr) in aggr_array.iter().enumerate() {
    let sliced = &arr.slice(offsets[0], offsets[1] - offsets[0]);
    sliced_arrays[i] = filter(sliced, filter_array).unwrap();
}
```
Actually, I don't think it's a good idea to do more than one thing in a loop, especially since the `slice` of an Array is not such a lightweight operation. I would prefer the previous implementation.

From the flame graph, the hot path is this method:

```rust
impl<T, I> SpecFromIterNested<T, I> for Vec<T>
where
    I: TrustedLen<Item = T>,
{
    fn from_iter(iterator: I) -> Self {
        let mut vector = match iterator.size_hint() {
            (_, Some(upper)) => Vec::with_capacity(upper),
            // TrustedLen contract guarantees that `size_hint() == (_, None)` means that there
            // are more than `usize::MAX` elements.
            // Since the previous branch would eagerly panic if the capacity is too large
            // (via `with_capacity`) we do the same here.
            _ => panic!("capacity overflow"),
        };
        // reuse extend specialization for TrustedLen
        vector.spec_extend(iterator);
        vector
    }
}
```
https://doc.rust-lang.org/std/iter/trait.TrustedLen.html#impl-TrustedLen-for-Iter%3C'_,+T%3E-1
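A minimal sketch of why `TrustedLen` matters here (assumptions: plain `u32` data standing in for Arrow values): a slice iterator reports an exact `size_hint`, which is what lets `Vec`'s `FromIterator` path call `with_capacity` once instead of growing the vector repeatedly.

```rust
// Slice iterators report an exact size_hint (lower == upper bound),
// so collect() can allocate the full capacity up front.
fn main() {
    let data = [1u32, 2, 3, 4];
    assert_eq!(data.iter().size_hint(), (4, Some(4)));

    let doubled: Vec<u32> = data.iter().map(|x| x * 2).collect();
    assert!(doubled.capacity() >= 4); // allocated up front, no regrowth
    assert_eq!(doubled, vec![2, 4, 6, 8]);
}
```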
Thanks folks for the feedback. @mingmwang this change was tested before your PR #6003 merged, and only for q32; I will retest the latest codebase soon with other benchmarks. @ozankabak mutating by dereferencing sounds good to me, I will test it out. @yahoNanJing I'm not sure I follow: are you saying that 2 iterations, each doing one operation, are faster than 1 iteration doing 2 ops per iteration? I will retest it soon and share results. If there is still no perf benefit after #6003, I will close the PR.
Can someone else help to test and verify this on other machines?
I will test this PR using https://github.com/alamb/datafusion-benchmarking/blob/main/bench.sh
Here are some benchmark results:
For reference, here is the same benchmark run against
I have also double checked from my side with the latest main, which also includes the row_hash optimizations from @mingmwang (datafusion-cli built in release mode).

With this PR:

Without the PR:

Before #6003, DF took 62 and 58 sec respectively.
Thanks @comphead and @mingmwang and @ozankabak and @Dandandan |
Which issue does this PR close?
Closes #6064.
Rationale for this change
`slice_and_maybe_filter` spends some CPU cycles on vector allocations and can be improved
What changes are included in this PR?
Rewrite `slice_and_maybe_filter` to avoid excessive vector allocations
Are these changes tested?
Yes
Are there any user-facing changes?
No