
Optimize row hash #6065 (Closed)

wants to merge 5 commits

Conversation

comphead (Contributor):

Which issue does this PR close?

Closes #6064.

Rationale for this change

slice_and_maybe_filter spends a noticeable share of CPU cycles on vector allocations and can be improved.

What changes are included in this PR?

Rewrite slice_and_maybe_filter to avoid excessive vector allocations

Are these changes tested?

Yes

Are there any user-facing changes?

No

@comphead (Contributor Author):

The performance improvement with the optimized datafusion::physical_plan::aggregates::row_hash::slice_and_maybe_filter is ~25%: it was 68 sec, now 50 sec.

2.4 GHz 8-Core Intel Core i9

Query: #5969 (comment)

```diff
-            .map(|array| array.slice(offsets[0], offsets[1] - offsets[0]))
-            .collect();
+    let null_array = Arc::new(NullArray::new(0)) as ArrayRef;
+    let mut sliced_arrays: Vec<ArrayRef> = vec![null_array; aggr_array.len()];
```
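As an aside, the pre-fill trick above can be shown with a minimal standalone sketch (not the PR's code; a toy Arc&lt;str&gt; stands in for ArrayRef): allocate the Vec once, seeded with a cheap shared placeholder, then overwrite each slot in place.

```rust
use std::sync::Arc;

fn main() {
    // One cheap placeholder; `vec![elem; n]` allocates the Vec once and
    // clones the Arc n times (a refcount bump, not a deep copy).
    let placeholder: Arc<str> = Arc::from("");
    let mut slots: Vec<Arc<str>> = vec![placeholder; 3];

    // Overwrite each slot in place; no further Vec allocations occur.
    for (i, slot) in slots.iter_mut().enumerate() {
        *slot = Arc::from(format!("value-{i}"));
    }
    assert_eq!(&*slots[2], "value-2");
}
```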
Contributor:

I don't see why this should be faster? 🤔

@comphead (Contributor Author), Apr 19, 2023:

The key point of this PR is to get rid of extra allocations (#5969 (comment)), which allows a 20-25% speed gain.

@comphead (Contributor Author):

Another optimization: we traverse the input collection only once, instead of twice as in the original implementation.

Contributor:

See #5969 (comment) -- reported performance improvement

@mingmwang (Contributor):

I will do some test locally tomorrow.

@comphead (Contributor Author):

> I will do some test locally tomorrow.

Hi @mingmwang, did you get a chance to test it?

Comment on lines +779 to +791
```diff
         if let Some(f) = filter_opt {
             let sliced = f.slice(offsets[0], offsets[1] - offsets[0]);
             let filter_array = as_boolean_array(&sliced)?;
-            sliced_arrays
-                .iter()
-                .map(|array| filter(array, filter_array).unwrap())
-                .collect::<Vec<ArrayRef>>()
+            for (i, arr) in aggr_array.iter().enumerate() {
+                let sliced = &arr.slice(offsets[0], offsets[1] - offsets[0]);
+                sliced_arrays[i] = filter(sliced, filter_array).unwrap();
+            }
         } else {
+            for (i, arr) in aggr_array.iter().enumerate() {
+                sliced_arrays[i] = arr.slice(offsets[0], offsets[1] - offsets[0]);
+            }
+        }
-            None => sliced_arrays,
-        };
         Ok(filtered_arrays)
     }
```
Contributor:

I think writing these loops as a zip of aggr_array.iter() and sliced_arrays.iter_mut(), avoiding the sliced_arrays[i] access inside the loop, can make the code (1) a little more idiomatic and (2) possibly faster, since it may avoid implicit bounds-checking at run-time.
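For illustration, here is a minimal sketch of what that zip-based loop could look like; it uses the arrow crate's filter kernel, and the toy inputs (two Int32 columns, a two-row slice window) are made up to stand in for the PR's aggr_array, offsets, and filter:

```rust
use std::sync::Arc;

use arrow::array::{Array, ArrayRef, BooleanArray, Int32Array, NullArray};
use arrow::compute::filter;
use arrow::error::ArrowError;

fn main() -> Result<(), ArrowError> {
    // Toy stand-ins for the PR's inputs.
    let aggr_array: Vec<ArrayRef> = vec![
        Arc::new(Int32Array::from(vec![1, 2, 3, 4])),
        Arc::new(Int32Array::from(vec![10, 20, 30, 40])),
    ];
    let offsets = [1usize, 3];
    let filter_array = BooleanArray::from(vec![true, false]);

    // Pre-fill once, as in the PR...
    let placeholder: ArrayRef = Arc::new(NullArray::new(0));
    let mut sliced_arrays: Vec<ArrayRef> = vec![placeholder; aggr_array.len()];

    // ...then write through a zip of iter() and iter_mut(): no indexed
    // access, so no implicit bounds check inside the loop body.
    for (arr, slot) in aggr_array.iter().zip(sliced_arrays.iter_mut()) {
        let sliced = arr.slice(offsets[0], offsets[1] - offsets[0]);
        *slot = filter(&sliced, &filter_array)?;
    }

    assert_eq!(sliced_arrays[0].len(), 1); // one row survives the filter
    Ok(())
}
```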

Contributor:

With zip, one might even use collect into a Vec again, which avoids initializing the vec at all.
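A sketch of that variant, reusing the toy names from the example above (aggr_array, offsets, filter_array); because collect on a slice iterator sizes the destination Vec from the iterator's length up front, the placeholder pre-fill disappears entirely:

```rust
// Fragment, not a full program: assumes the bindings from the sketch
// above are in scope. One pass does both the slice and the filter, and
// `collect` performs the single up-front allocation.
let sliced_arrays: Vec<ArrayRef> = aggr_array
    .iter()
    .map(|arr| {
        let sliced = arr.slice(offsets[0], offsets[1] - offsets[0]);
        filter(&sliced, &filter_array).unwrap()
    })
    .collect();
```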

@mingmwang (Contributor):

@comphead
Sorry, I tested this PR locally on my Mac and do not see any performance improvement on TPC-H q17 (high-cardinality aggregation); there is about a 10% regression.

Before this PR:

```text
q17 (sf=1)
Running benchmarks with the following options: DataFusionBenchmarkOpt { query: Some(17), debug: false, iterations: 3, partitions: 1, batch_size: 8192, path: "./parquet_data", file_format: "parquet", mem_table: false, output_path: None, disable_statistics: true, enable_scheduler: false }
Query 17 iteration 0 took 1942.3 ms and returned 1 rows
Query 17 iteration 1 took 1918.5 ms and returned 1 rows
Query 17 iteration 2 took 1940.6 ms and returned 1 rows
Query 17 avg time: 1933.79 ms

q17 (sf=10)
Running benchmarks with the following options: DataFusionBenchmarkOpt { query: Some(17), debug: false, iterations: 3, partitions: 1, batch_size: 8192, path: "./parquet_data10", file_format: "parquet", mem_table: false, output_path: None, disable_statistics: true, enable_scheduler: false }
Query 17 iteration 0 took 32352.1 ms and returned 1 rows
Query 17 iteration 1 took 32210.5 ms and returned 1 rows
Query 17 iteration 2 took 32003.7 ms and returned 1 rows
Query 17 avg time: 32188.74 ms
```

With this PR:

```text
q17 (sf=1)
Running benchmarks with the following options: DataFusionBenchmarkOpt { query: Some(17), debug: false, iterations: 3, partitions: 1, batch_size: 8192, path: "./parquet_data", file_format: "parquet", mem_table: false, output_path: None, disable_statistics: true, enable_scheduler: false }
Query 17 iteration 0 took 2159.3 ms and returned 1 rows
Query 17 iteration 1 took 2106.7 ms and returned 1 rows
Query 17 iteration 2 took 2111.4 ms and returned 1 rows
Query 17 avg time: 2125.82 ms

q17 (sf=10)
Running benchmarks with the following options: DataFusionBenchmarkOpt { query: Some(17), debug: false, iterations: 3, partitions: 1, batch_size: 8192, path: "./parquet_data10", file_format: "parquet", mem_table: false, output_path: None, disable_statistics: true, enable_scheduler: false }
Query 17 iteration 0 took 34505.3 ms and returned 1 rows
Query 17 iteration 1 took 34240.6 ms and returned 1 rows
Query 17 iteration 2 took 33967.1 ms and returned 1 rows
Query 17 avg time: 34237.68 ms
```


How this was tested: take the threshold check

```rust
if matches!(self.mode, AggregateMode::Partial | AggregateMode::Single)
    && normal_aggr_input_values.is_empty()
    && normal_filter_values.is_empty()
    && groups_with_rows.len() >= batch.num_rows() / 10
```

and change the magic number from 10 to 1:

```rust
if matches!(self.mode, AggregateMode::Partial | AggregateMode::Single)
    && normal_aggr_input_values.is_empty()
    && normal_filter_values.is_empty()
    && groups_with_rows.len() >= batch.num_rows() / 1
```

so that updating the accumulators goes through update_accumulators_using_batch(), which calls slice_and_maybe_filter().

@mingmwang (Contributor) commented Apr 23, 2023:

```shell
cargo run --release --bin tpch -- benchmark datafusion --iterations 3 --path ./parquet_data10 --format parquet --query 17 -n 1 --disable-statistics

cargo run --release --bin tpch -- benchmark datafusion --iterations 3 --path ./parquet_data --format parquet --query 17 -n 1 --disable-statistics
```

My Mac:
MacBook Pro (16-inch, 2021)
Apple M1 Max
Memory 64 GB

@mingmwang (Contributor):

@yahoNanJing

```diff
-                .collect::<Vec<ArrayRef>>()
+            for (i, arr) in aggr_array.iter().enumerate() {
+                let sliced = &arr.slice(offsets[0], offsets[1] - offsets[0]);
+                sliced_arrays[i] = filter(sliced, filter_array).unwrap();
```
@yahoNanJing (Contributor), Apr 23, 2023:

Actually, I don't think it's a good idea to do more than one thing in a loop, especially since slicing an Array is not such a lightweight operation. I would prefer the previous implementation.
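For contrast, a sketch (same toy names as the example above) of the two-pass shape being preferred here, where each pass does exactly one thing:

```rust
// Fragment: pass 1 slices every array; pass 2 applies the filter.
let sliced: Vec<ArrayRef> = aggr_array
    .iter()
    .map(|arr| arr.slice(offsets[0], offsets[1] - offsets[0]))
    .collect();
let filtered: Vec<ArrayRef> = sliced
    .iter()
    .map(|arr| filter(arr, &filter_array).unwrap())
    .collect();
```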

@mingmwang (Contributor) commented Apr 23, 2023:

From the flame graph, the hot path of slice_and_maybe_filter is SpecFromIter::from_iter(), and the hot path inside SpecFromIter::from_iter() is the slicing of the Arrow arrays, not the memory allocations of the Vec.

```rust
impl<T, I> SpecFromIterNested<T, I> for Vec<T>
where
    I: TrustedLen<Item = T>,
{
    fn from_iter(iterator: I) -> Self {
        let mut vector = match iterator.size_hint() {
            (_, Some(upper)) => Vec::with_capacity(upper),
            // TrustedLen contract guarantees that `size_hint() == (_, None)` means that there
            // are more than `usize::MAX` elements.
            // Since the previous branch would eagerly panic if the capacity is too large
            // (via `with_capacity`) we do the same here.
            _ => panic!("capacity overflow"),
        };
        // reuse extend specialization for TrustedLen
        vector.spec_extend(iterator);
        vector
    }
}
```

@mingmwang (Contributor) commented Apr 23, 2023:

https://doc.rust-lang.org/std/iter/trait.TrustedLen.html#impl-TrustedLen-for-Iter%3C'_,+T%3E-1

std::slice::Iter implements the TrustedLen trait, so SpecFromIter::from_iter() dispatches to a specialized implementation.
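A standalone demonstration (not from the PR) of this point:

```rust
// Because std::slice::Iter implements TrustedLen, `collect` learns the
// exact element count from `size_hint()` and allocates the Vec once,
// rather than growing it incrementally.
fn main() {
    let src = vec![1u32, 2, 3, 4, 5];
    let doubled: Vec<u32> = src.iter().map(|x| x * 2).collect();
    // `Vec::with_capacity` reserves at least this much in one shot
    // (and in practice exactly `src.len()` here).
    assert!(doubled.capacity() >= doubled.len());
    assert_eq!(doubled, [2, 4, 6, 8, 10]);
}
```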

@comphead (Contributor Author):

Thanks folks for the feedback.

@mingmwang this change was tested before your PR #6003 was merged, and only for q32. I will retest on the latest codebase soon with other benchmarks.

@ozankabak mutating by dereferencing sounds good to me, I will test it out.

@yahoNanJing I'm not following your point: are you saying two passes, each doing one operation per iteration, are faster than one pass doing two operations per iteration?

I will retest soon and share the results. If there is still no perf benefit after #6003, I will close the PR.

@mingmwang (Contributor):

Can someone else help test and verify this on other machines? I suspect it is very machine-specific, and I cannot explain the performance regression either.

@alamb (Contributor) commented Apr 24, 2023:

Here are some benchmark results:

```text
++ echo '****** TPCH SF1 (Parquet) ******'
****** TPCH SF1 (Parquet) ******
++ python3 /home/alamb/arrow-datafusion/benchmarks/compare.py /home/alamb/benchmarking/optimize_row_hash/tpch_sf1_parquet_main.json /home/alamb/benchmarking/optimize_row_hash/tpch_sf1_parquet_branch.json
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃ /home/alamb… ┃ /home/alamb… ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1     │    1641.50ms │    1668.52ms │    no change │
│ QQuery 2     │     433.09ms │     457.15ms │ 1.06x slower │
│ QQuery 3     │     551.05ms │     560.14ms │    no change │
│ QQuery 4     │     212.82ms │     214.71ms │    no change │
│ QQuery 5     │     722.18ms │     725.48ms │    no change │
│ QQuery 6     │     454.35ms │     459.32ms │    no change │
│ QQuery 7     │    1277.69ms │    1256.32ms │    no change │
│ QQuery 8     │     714.88ms │     714.78ms │    no change │
│ QQuery 9     │    1333.88ms │    1365.51ms │    no change │
│ QQuery 10    │     792.20ms │     818.54ms │    no change │
│ QQuery 11    │     345.96ms │     351.47ms │    no change │
│ QQuery 12    │     333.29ms │     336.69ms │    no change │
│ QQuery 13    │    1375.21ms │    1462.85ms │ 1.06x slower │
│ QQuery 14    │     465.92ms │     453.73ms │    no change │
│ QQuery 15    │     432.71ms │     458.58ms │ 1.06x slower │
│ QQuery 16    │     340.73ms │     358.66ms │ 1.05x slower │
│ QQuery 17    │    3961.24ms │    4538.85ms │ 1.15x slower │
│ QQuery 18    │    3493.70ms │    3737.18ms │ 1.07x slower │
│ QQuery 19    │     761.64ms │     765.50ms │    no change │
│ QQuery 20    │    1304.52ms │    1478.34ms │ 1.13x slower │
│ QQuery 21    │    1621.47ms │    1718.01ms │ 1.06x slower │
│ QQuery 22    │     188.55ms │     189.26ms │    no change │
└──────────────┴──────────────┴──────────────┴──────────────┘
++ echo '****** TPCH SF1 (mem) ******'
****** TPCH SF1 (mem) ******
++ python3 /home/alamb/arrow-datafusion/benchmarks/compare.py /home/alamb/benchmarking/optimize_row_hash/tpch_sf1_mem_main.json /home/alamb/benchmarking/optimize_row_hash/tpch_sf1_mem_branch.json
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃           -o ┃           -o ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │     929.87ms │     927.61ms │     no change │
│ QQuery 2     │     312.32ms │     335.58ms │  1.07x slower │
│ QQuery 3     │     187.35ms │     174.29ms │ +1.07x faster │
│ QQuery 4     │      99.81ms │     104.31ms │     no change │
│ QQuery 5     │     475.38ms │     465.07ms │     no change │
│ QQuery 6     │      38.34ms │      37.14ms │     no change │
│ QQuery 7     │    1076.25ms │    1165.47ms │  1.08x slower │
│ QQuery 8     │     254.25ms │     242.57ms │     no change │
│ QQuery 9     │     631.52ms │     611.30ms │     no change │
│ QQuery 10    │     335.70ms │     346.45ms │     no change │
│ QQuery 11    │     297.94ms │     293.44ms │     no change │
│ QQuery 12    │     154.28ms │     153.63ms │     no change │
│ QQuery 13    │     842.44ms │     983.07ms │  1.17x slower │
│ QQuery 14    │      55.30ms │      58.83ms │  1.06x slower │
│ QQuery 15    │     128.91ms │     123.69ms │     no change │
│ QQuery 16    │     265.96ms │     258.43ms │     no change │
│ QQuery 17    │    3514.38ms │    4024.28ms │  1.15x slower │
│ QQuery 18    │    3158.26ms │    3294.42ms │     no change │
│ QQuery 19    │     145.06ms │     150.17ms │     no change │
│ QQuery 20    │    1093.86ms │    1163.19ms │  1.06x slower │
│ QQuery 21    │    1437.97ms │    1483.74ms │     no change │
│ QQuery 22    │     151.64ms │     139.58ms │ +1.09x faster │
└──────────────┴──────────────┴──────────────┴───────────────┘
```

For reference, here is the same benchmark run against main itself:

```text
****** TPCH SF1 (Parquet) ******
+ python3 /home/alamb/arrow-datafusion/benchmarks/compare.py /home/alamb/benchmarking/alamb-main/tpch_sf1_parquet_main.json /home/alamb/benchmarking/alamb-main/tpch_sf1_parquet_branch.json
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ /home/alamb… ┃ /home/alamb… ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │    1430.86ms │    1423.29ms │     no change │
│ QQuery 2     │     399.75ms │     405.00ms │     no change │
│ QQuery 3     │     520.40ms │     525.56ms │     no change │
│ QQuery 4     │     218.29ms │     223.87ms │     no change │
│ QQuery 5     │     693.57ms │     685.46ms │     no change │
│ QQuery 6     │     416.62ms │     423.02ms │     no change │
│ QQuery 7     │    1258.17ms │    1243.79ms │     no change │
│ QQuery 8     │     690.25ms │     687.29ms │     no change │
│ QQuery 9     │    1304.02ms │    1288.01ms │     no change │
│ QQuery 10    │     770.91ms │     748.94ms │     no change │
│ QQuery 11    │     356.32ms │     336.55ms │ +1.06x faster │
│ QQuery 12    │     335.14ms │     329.12ms │     no change │
│ QQuery 13    │    1170.83ms │    1146.78ms │     no change │
│ QQuery 14    │     422.25ms │     421.47ms │     no change │
│ QQuery 15    │     391.14ms │     381.71ms │     no change │
│ QQuery 16    │     348.38ms │     344.13ms │     no change │
│ QQuery 17    │    2860.96ms │    2838.27ms │     no change │
│ QQuery 18    │    3726.11ms │    3734.67ms │     no change │
│ QQuery 19    │     728.53ms │     737.35ms │     no change │
│ QQuery 20    │    1250.75ms │    1208.06ms │     no change │
│ QQuery 21    │    1688.40ms │    1757.45ms │     no change │
│ QQuery 22    │     192.36ms │     190.43ms │     no change │
└──────────────┴──────────────┴──────────────┴───────────────┘
+ echo '****** TPCH SF1 (mem) ******'
****** TPCH SF1 (mem) ******
+ python3 /home/alamb/arrow-datafusion/benchmarks/compare.py /home/alamb/benchmarking/alamb-main/tpch_sf1_mem_main.json /home/alamb/benchmarking/alamb-main/tpch_sf1_mem_branch.json
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃           -o ┃           -o ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │     759.07ms │     770.73ms │     no change │
│ QQuery 2     │     269.81ms │     291.05ms │  1.08x slower │
│ QQuery 3     │     180.67ms │     161.61ms │ +1.12x faster │
│ QQuery 4     │     105.16ms │     105.46ms │     no change │
│ QQuery 5     │     467.46ms │     466.11ms │     no change │
│ QQuery 6     │      38.08ms │      42.72ms │  1.12x slower │
│ QQuery 7     │    1170.05ms │    1147.43ms │     no change │
│ QQuery 8     │     249.06ms │     238.74ms │     no change │
│ QQuery 9     │     613.38ms │     609.98ms │     no change │
│ QQuery 10    │     342.67ms │     327.23ms │     no change │
│ QQuery 11    │     279.84ms │     281.69ms │     no change │
│ QQuery 12    │     143.94ms │     146.57ms │     no change │
│ QQuery 13    │     676.22ms │     668.79ms │     no change │
│ QQuery 14    │      53.06ms │      51.73ms │     no change │
│ QQuery 15    │      98.47ms │      92.68ms │ +1.06x faster │
│ QQuery 16    │     244.93ms │     257.73ms │  1.05x slower │
│ QQuery 17    │    2473.33ms │    2503.47ms │     no change │
│ QQuery 18    │    3150.26ms │    3169.32ms │     no change │
│ QQuery 19    │     154.90ms │     150.53ms │     no change │
│ QQuery 20    │     969.12ms │     929.69ms │     no change │
│ QQuery 21    │    1476.10ms │    1457.34ms │     no change │
│ QQuery 22    │     148.71ms │     143.20ms │     no change │
└──────────────┴──────────────┴──────────────┴───────────────┘
```

@comphead (Contributor Author):

I have also double-checked on my side against the latest main, which includes the row_hash optimizations from @mingmwang. Testing is against a 13 GB hits.parquet file, with 2 runs (a cold run and a warm run).

datafusion-cli built in release mode.
Machine: 2.4 GHz 8-Core Intel Core i9, macOS.

With this PR

```text
DataFusion CLI v23.0.0
❯ CREATE EXTERNAL TABLE hits STORED AS PARQUET location 'hits.parquet';
0 rows in set. Query took 0.079 seconds.
❯ SELECT "WatchID", "ClientIP", COUNT(*) AS c, SUM("IsRefresh"), AVG("ResolutionWidth") FROM hits GROUP BY "WatchID", "ClientIP" ORDER BY c DESC LIMIT 10;
+---------------------+-------------+---+---------------------+---------------------------+
| WatchID             | ClientIP    | c | SUM(hits.IsRefresh) | AVG(hits.ResolutionWidth) |
+---------------------+-------------+---+---------------------+---------------------------+
| 6655575552203051303 | 1611957945  | 2 | 0                   | 1638.0                    |
| 7904046282518428963 | 1509330109  | 2 | 0                   | 1368.0                    |
| 8566928176839891583 | -1402644643 | 2 | 0                   | 1368.0                    |
| 7224410078130478461 | -776509581  | 2 | 0                   | 1368.0                    |
| 7968574085024155935 | -986722817  | 1 | 0                   | 1990.0                    |
| 8683116696854507598 | -1887352109 | 1 | 0                   | 1917.0                    |
| 6071982018954122379 | 1154898388  | 1 | 0                   | 1638.0                    |
| 7044330683984323480 | -765736418  | 1 | 0                   | 1750.0                    |
| 5170668904757974782 | 580435115   | 1 | 0                   | 1087.0                    |
| 7121372218861667575 | -888761092  | 1 | 0                   | 1368.0                    |
+---------------------+-------------+---+---------------------+---------------------------+
10 rows in set. Query took 51.300 seconds.
❯ SELECT "WatchID", "ClientIP", COUNT(*) AS c, SUM("IsRefresh"), AVG("ResolutionWidth") FROM hits GROUP BY "WatchID", "ClientIP" ORDER BY c DESC LIMIT 10;
+---------------------+-------------+---+---------------------+---------------------------+
| WatchID             | ClientIP    | c | SUM(hits.IsRefresh) | AVG(hits.ResolutionWidth) |
+---------------------+-------------+---+---------------------+---------------------------+
| 6655575552203051303 | 1611957945  | 2 | 0                   | 1638.0                    |
| 7904046282518428963 | 1509330109  | 2 | 0                   | 1368.0                    |
| 8566928176839891583 | -1402644643 | 2 | 0                   | 1368.0                    |
| 7224410078130478461 | -776509581  | 2 | 0                   | 1368.0                    |
| 5811527790243312578 | 56896075    | 1 | 0                   | 1368.0                    |
| 5998597912391672099 | 807012274   | 1 | 0                   | 1996.0                    |
| 6071982018954122379 | 1154898388  | 1 | 0                   | 1638.0                    |
| 7044330683984323480 | -765736418  | 1 | 0                   | 1750.0                    |
| 5170668904757974782 | 580435115   | 1 | 0                   | 1087.0                    |
| 7121372218861667575 | -888761092  | 1 | 0                   | 1368.0                    |
+---------------------+-------------+---+---------------------+---------------------------+
10 rows in set. Query took 47.066 seconds.
```

Without this PR

```text
DataFusion CLI v23.0.0
❯ CREATE EXTERNAL TABLE hits STORED AS PARQUET location 'hits.parquet';
0 rows in set. Query took 0.066 seconds.
❯ SELECT "WatchID", "ClientIP", COUNT(*) AS c, SUM("IsRefresh"), AVG("ResolutionWidth") FROM hits GROUP BY "WatchID", "ClientIP" ORDER BY c DESC LIMIT 10;
+---------------------+-------------+---+---------------------+---------------------------+
| WatchID             | ClientIP    | c | SUM(hits.IsRefresh) | AVG(hits.ResolutionWidth) |
+---------------------+-------------+---+---------------------+---------------------------+
| 6655575552203051303 | 1611957945  | 2 | 0                   | 1638.0                    |
| 7904046282518428963 | 1509330109  | 2 | 0                   | 1368.0                    |
| 8566928176839891583 | -1402644643 | 2 | 0                   | 1368.0                    |
| 7224410078130478461 | -776509581  | 2 | 0                   | 1368.0                    |
| 7876723297163966966 | -1495669395 | 1 | 0                   | 1750.0                    |
| 9012818526311489736 | 1663619136  | 1 | 0                   | 1638.0                    |
| 6071982018954122379 | 1154898388  | 1 | 0                   | 1638.0                    |
| 7044330683984323480 | -765736418  | 1 | 0                   | 1750.0                    |
| 5170668904757974782 | 580435115   | 1 | 0                   | 1087.0                    |
| 7121372218861667575 | -888761092  | 1 | 0                   | 1368.0                    |
+---------------------+-------------+---+---------------------+---------------------------+
10 rows in set. Query took 50.776 seconds.
❯ SELECT "WatchID", "ClientIP", COUNT(*) AS c, SUM("IsRefresh"), AVG("ResolutionWidth") FROM hits GROUP BY "WatchID", "ClientIP" ORDER BY c DESC LIMIT 10;
+---------------------+-------------+---+---------------------+---------------------------+
| WatchID             | ClientIP    | c | SUM(hits.IsRefresh) | AVG(hits.ResolutionWidth) |
+---------------------+-------------+---+---------------------+---------------------------+
| 6655575552203051303 | 1611957945  | 2 | 0                   | 1638.0                    |
| 7904046282518428963 | 1509330109  | 2 | 0                   | 1368.0                    |
| 8566928176839891583 | -1402644643 | 2 | 0                   | 1368.0                    |
| 7224410078130478461 | -776509581  | 2 | 0                   | 1368.0                    |
| 5981193970486754997 | -1725568709 | 1 | 0                   | 1917.0                    |
| 6089584153122015492 | 1209163516  | 1 | 0                   | 1996.0                    |
| 6071982018954122379 | 1154898388  | 1 | 0                   | 1638.0                    |
| 7044330683984323480 | -765736418  | 1 | 0                   | 1750.0                    |
| 5170668904757974782 | 580435115   | 1 | 0                   | 1087.0                    |
| 7121372218861667575 | -888761092  | 1 | 0                   | 1368.0                    |
+---------------------+-------------+---+---------------------+---------------------------+
10 rows in set. Query took 48.886 seconds.
```

Before #6003, DataFusion took 62 and 58 sec respectively.
That means @mingmwang's fixes already addressed the performance issue and slice_and_maybe_filter is no longer a bottleneck, so this PR doesn't bring any benefit. I will close it. Thanks all for participating.

@comphead comphead closed this Apr 24, 2023
@alamb (Contributor) commented Apr 24, 2023:

Thanks @comphead, @mingmwang, @ozankabak, and @Dandandan
