Implement `filter` kernel for byte view arrays. #5624

RinChanNOWWW · 2024-04-11T02:11:44Z

Which issue does this PR close?

Closes #5510.

Rationale for this change

Byte view arrays had no `filter` kernel (see #5510).

What changes are included in this PR?

Implement the filter kernel for StringViewArray and BinaryViewArray.
Add unit tests for the kernel.
Add string view arrays to benchmark filter_kernels.
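
For readers new to the view layout, here is a minimal, self-contained sketch of why filtering a view array can be cheap: only the fixed-size views are filtered, while the variable-length data buffers are shared with the input. It does not use the arrow crate and is not the code from this PR. `View`, `ViewArray`, `from_strs`, `value`, and `filter` are hypothetical names, the boolean slice stands in for arrow's filter predicate, and null handling is left out. The 12-byte inline threshold follows the Arrow view layout.

```rust
use std::sync::Arc;

/// Hypothetical, simplified view: values of at most 12 bytes are stored
/// inline; longer values point into a shared data buffer.
#[derive(Clone, Copy, Debug, PartialEq)]
enum View {
    Inline { len: usize, data: [u8; 12] },
    Buffered { len: usize, buffer: usize, offset: usize },
}

/// Hypothetical stand-in for a string view array (no null bitmap).
struct ViewArray {
    views: Vec<View>,
    buffers: Vec<Arc<Vec<u8>>>,
}

impl ViewArray {
    /// Build an array, spilling long values into a single data buffer.
    fn from_strs(values: &[&str]) -> Self {
        let mut views = Vec::with_capacity(values.len());
        let mut data = Vec::new();
        for v in values {
            let bytes = v.as_bytes();
            if bytes.len() <= 12 {
                let mut inline = [0u8; 12];
                inline[..bytes.len()].copy_from_slice(bytes);
                views.push(View::Inline { len: bytes.len(), data: inline });
            } else {
                views.push(View::Buffered { len: bytes.len(), buffer: 0, offset: data.len() });
                data.extend_from_slice(bytes);
            }
        }
        ViewArray { views, buffers: vec![Arc::new(data)] }
    }

    /// Resolve the i-th view to its string value.
    fn value(&self, i: usize) -> &str {
        let bytes = match &self.views[i] {
            View::Inline { len, data } => &data[..*len],
            View::Buffered { len, buffer, offset } => {
                &self.buffers[*buffer][*offset..*offset + *len]
            }
        };
        std::str::from_utf8(bytes).expect("values were built from valid UTF-8")
    }

    /// Keep the views whose mask entry is true. The data buffers are not
    /// copied or compacted: cloning an `Arc` only bumps a reference count.
    fn filter(&self, mask: &[bool]) -> ViewArray {
        assert_eq!(mask.len(), self.views.len());
        let views: Vec<View> = self
            .views
            .iter()
            .zip(mask)
            .filter(|(_, keep)| **keep)
            .map(|(v, _)| *v)
            .collect();
        ViewArray { views, buffers: self.buffers.clone() }
    }
}
```

Calling `array.filter(&[false, true, true, false])` on a four-element array yields a two-element array whose `buffers` point at the same allocations as the input. The trade-off in this sketch is that a heavily filtered result keeps the full data buffers alive.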

Are there any user-facing changes?

Yes.

ArrowPrimitiveType::get_byte_width is deprecated. The method now lives on the native type as ArrowNativeType::get_byte_width.

Code that works with an ArrowPrimitiveType can get the byte width through its associated native type: ArrowPrimitiveType::Native::get_byte_width.
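
A sketch of the migration pattern, using stand-in traits rather than the real arrow types so that it compiles on its own. `NativeType`, `PrimitiveType`, and `byte_width` are hypothetical names that mirror `ArrowNativeType` and `ArrowPrimitiveType`; only the shape of the call is the point here.

```rust
use std::mem::size_of;

/// Stand-in for `ArrowNativeType`: the byte width is a property of the
/// native Rust type itself.
trait NativeType: Sized {
    fn get_byte_width() -> usize {
        size_of::<Self>()
    }
}

impl NativeType for i32 {}
impl NativeType for i128 {}

/// Stand-in for `ArrowPrimitiveType`: the old method stays for
/// compatibility but is marked deprecated.
trait PrimitiveType {
    type Native: NativeType;

    #[deprecated(note = "use `Self::Native::get_byte_width()` instead")]
    fn get_byte_width() -> usize {
        size_of::<Self::Native>()
    }
}

struct Int32Type;
impl PrimitiveType for Int32Type {
    type Native = i32;
}

struct Decimal128Type;
impl PrimitiveType for Decimal128Type {
    type Native = i128;
}

/// Generic caller after the migration: go through the associated
/// native type instead of the deprecated trait method.
fn byte_width<T: PrimitiveType>() -> usize {
    T::Native::get_byte_width()
}
```

In generic code the call reads `T::Native::get_byte_width()`. For a concrete type, the fully qualified form `<Int32Type as PrimitiveType>::Native::get_byte_width()` should be used, since Rust rejects `Int32Type::Native` as an ambiguous associated type.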

alamb

Thank you for this contribution @RinChanNOWWW 🙏 This looks really nice.

The only thing I think we need to do to merge this PR is to run the filter_kernels benchmarks to show that this doesn't regress existing performance.

It would be a nice bonus to add a new benchmark for filtering BinaryView and StringView arrays.

arrow-array/src/types.rs

RinChanNOWWW · 2024-04-14T02:28:06Z

Benchmark

Test Machine

MacBook M1 Pro (10 Cores, 32G RAM)

Bench Results

Before this PR

filter optimize (kept 1/2)
                        time:   [108.85 µs 109.10 µs 109.38 µs]
Found 14 outliers among 100 measurements (14.00%)
  12 (12.00%) high mild
  2 (2.00%) high severe

filter optimize high selectivity (kept 1023/1024)
                        time:   [1.2401 µs 1.2460 µs 1.2546 µs]
Found 13 outliers among 100 measurements (13.00%)
  3 (3.00%) low mild
  5 (5.00%) high mild
  5 (5.00%) high severe

filter optimize low selectivity (kept 1/1024)
                        time:   [1.1569 µs 1.1718 µs 1.1925 µs]
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) high mild
  10 (10.00%) high severe

filter u8 (kept 1/2)    time:   [107.89 µs 108.18 µs 108.63 µs]
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

filter u8 high selectivity (kept 1023/1024)
                        time:   [2.3864 µs 2.3994 µs 2.4165 µs]
Found 18 outliers among 100 measurements (18.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  6 (6.00%) high mild
  9 (9.00%) high severe

filter u8 low selectivity (kept 1/1024)
                        time:   [1.2988 µs 1.3011 µs 1.3036 µs]
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild

filter context u8 (kept 1/2)
                        time:   [10.775 µs 10.803 µs 10.835 µs]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

filter context u8 high selectivity (kept 1023/1024)
                        time:   [1.3241 µs 1.3297 µs 1.3371 µs]
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

filter context u8 low selectivity (kept 1/1024)
                        time:   [218.56 ns 219.12 ns 219.68 ns]
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild

filter i32 (kept 1/2)   time:   [109.58 µs 109.70 µs 109.83 µs]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

filter i32 high selectivity (kept 1023/1024)
                        time:   [6.9736 µs 6.9930 µs 7.0150 µs]
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  3 (3.00%) high severe

filter i32 low selectivity (kept 1/1024)
                        time:   [1.2492 µs 1.2539 µs 1.2607 µs]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

filter context i32 (kept 1/2)
                        time:   [11.006 µs 11.039 µs 11.088 µs]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

filter context i32 high selectivity (kept 1023/1024)
                        time:   [6.4559 µs 6.5295 µs 6.6090 µs]
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

filter context i32 low selectivity (kept 1/1024)
                        time:   [187.45 ns 189.07 ns 190.80 ns]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

filter context i32 w NULLs (kept 1/2)
                        time:   [39.095 µs 39.134 µs 39.174 µs]
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe

filter context i32 w NULLs high selectivity (kept 1023/1024)
                        time:   [9.8810 µs 9.9135 µs 9.9479 µs]
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

filter context i32 w NULLs low selectivity (kept 1/1024)
                        time:   [378.16 ns 379.77 ns 381.76 ns]
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

filter context u8 w NULLs (kept 1/2)
                        time:   [38.769 µs 39.277 µs 40.111 µs]
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  5 (5.00%) high severe

filter context u8 w NULLs high selectivity (kept 1023/1024)
                        time:   [5.2810 µs 5.2918 µs 5.3034 µs]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high severe

filter context u8 w NULLs low selectivity (kept 1/1024)
                        time:   [396.76 ns 398.03 ns 399.52 ns]
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

filter f32 (kept 1/2)   time:   [225.86 µs 226.22 µs 226.62 µs]
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  2 (2.00%) high severe

filter context f32 (kept 1/2)
                        time:   [39.001 µs 39.063 µs 39.133 µs]
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe

filter context f32 high selectivity (kept 1023/1024)
                        time:   [9.8491 µs 9.8671 µs 9.8865 µs]
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild

filter context f32 low selectivity (kept 1/1024)
                        time:   [382.18 ns 384.13 ns 387.05 ns]
Found 10 outliers among 100 measurements (10.00%)
  7 (7.00%) high mild
  3 (3.00%) high severe

filter decimal128 (kept 1/2)
                        time:   [114.21 µs 114.59 µs 115.03 µs]
Found 14 outliers among 100 measurements (14.00%)
  3 (3.00%) low mild
  6 (6.00%) high mild
  5 (5.00%) high severe

filter decimal128 high selectivity (kept 1023/1024)
                        time:   [19.883 µs 19.931 µs 19.979 µs]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

filter decimal128 low selectivity (kept 1/1024)
                        time:   [1.2756 µs 1.2800 µs 1.2872 µs]
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) low mild
  6 (6.00%) high mild
  2 (2.00%) high severe

filter context decimal128 (kept 1/2)
                        time:   [15.669 µs 15.728 µs 15.792 µs]
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

filter context decimal128 high selectivity (kept 1023/1024)
                        time:   [19.516 µs 19.576 µs 19.657 µs]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

filter context decimal128 low selectivity (kept 1/1024)
                        time:   [198.30 ns 199.32 ns 200.72 ns]
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

filter context string (kept 1/2)
                        time:   [248.81 µs 249.42 µs 250.03 µs]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

filter context string high selectivity (kept 1023/1024)
                        time:   [101.29 µs 102.20 µs 103.12 µs]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

filter context string low selectivity (kept 1/1024)
                        time:   [789.42 ns 791.92 ns 794.52 ns]
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) low mild
  4 (4.00%) high mild

filter context string dictionary (kept 1/2)
                        time:   [11.479 µs 11.491 µs 11.504 µs]
Found 17 outliers among 100 measurements (17.00%)
  3 (3.00%) low severe
  5 (5.00%) low mild
  6 (6.00%) high mild
  3 (3.00%) high severe

filter context string dictionary high selectivity (kept 1023/1024)
                        time:   [6.7771 µs 6.8049 µs 6.8496 µs]
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) low mild
  3 (3.00%) high mild
  5 (5.00%) high severe

filter context string dictionary low selectivity (kept 1/1024)
                        time:   [486.21 ns 1.0693 µs 2.3499 µs]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

filter context string dictionary w NULLs (kept 1/2)
                        time:   [39.316 µs 39.395 µs 39.481 µs]
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe

filter context string dictionary w NULLs high selectivity (kept 1023/1024)
                        time:   [10.224 µs 10.256 µs 10.289 µs]
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  9 (9.00%) high mild
  3 (3.00%) high severe

filter context string dictionary w NULLs low selectivity (kept 1/1024)
                        time:   [683.57 ns 685.92 ns 689.36 ns]
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  10 (10.00%) high severe

filter single record batch
                        time:   [108.48 µs 108.71 µs 108.96 µs]
Found 14 outliers among 100 measurements (14.00%)
  2 (2.00%) low mild
  10 (10.00%) high mild
  2 (2.00%) high severe

This PR

filter optimize (kept 1/2)
                        time:   [113.58 µs 113.77 µs 113.95 µs]
                        change: [+4.0851% +4.3738% +4.6620%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild

filter optimize high selectivity (kept 1023/1024)
                        time:   [1.2711 µs 1.2777 µs 1.2846 µs]
                        change: [+2.5568% +3.0976% +3.5958%] (p = 0.00 < 0.05)
                        Performance has regressed.

filter optimize low selectivity (kept 1/1024)
                        time:   [1.1929 µs 1.1983 µs 1.2040 µs]
                        change: [-2.1008% +0.3841% +2.6277%] (p = 0.76 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

filter u8 (kept 1/2)    time:   [109.53 µs 109.89 µs 110.32 µs]
                        change: [+1.6143% +2.4601% +3.8865%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

filter u8 high selectivity (kept 1023/1024)
                        time:   [2.4081 µs 2.4106 µs 2.4134 µs]
                        change: [+0.2962% +0.7727% +1.1562%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  5 (5.00%) high mild
  2 (2.00%) high severe

filter u8 low selectivity (kept 1/1024)
                        time:   [1.2950 µs 1.2997 µs 1.3044 µs]
                        change: [-0.8837% -0.6003% -0.3008%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 16 outliers among 100 measurements (16.00%)
  1 (1.00%) low mild
  12 (12.00%) high mild
  3 (3.00%) high severe

filter context u8 (kept 1/2)
                        time:   [10.877 µs 10.920 µs 10.969 µs]
                        change: [+1.4152% +2.4277% +3.2334%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  7 (7.00%) high mild

filter context u8 high selectivity (kept 1023/1024)
                        time:   [1.3538 µs 1.3555 µs 1.3575 µs]
                        change: [+1.5450% +2.0398% +2.4813%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  1 (1.00%) high severe

filter context u8 low selectivity (kept 1/1024)
                        time:   [220.61 ns 221.62 ns 223.23 ns]
                        change: [+0.8775% +1.3357% +1.8448%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe

filter i32 (kept 1/2)   time:   [108.89 µs 109.18 µs 109.51 µs]
                        change: [-1.0341% -0.7383% -0.4345%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe

filter i32 high selectivity (kept 1023/1024)
                        time:   [7.0566 µs 7.0849 µs 7.1286 µs]
                        change: [+0.6780% +1.0743% +1.4971%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  5 (5.00%) high mild
  3 (3.00%) high severe

filter i32 low selectivity (kept 1/1024)
                        time:   [1.2507 µs 1.2528 µs 1.2547 µs]
                        change: [-1.2179% -0.5290% +0.0010%] (p = 0.08 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe

filter context i32 (kept 1/2)
                        time:   [11.041 µs 11.066 µs 11.098 µs]
                        change: [-0.2388% +0.3357% +1.0170%] (p = 0.38 > 0.05)
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

filter context i32 high selectivity (kept 1023/1024)
                        time:   [6.4224 µs 6.4283 µs 6.4349 µs]
                        change: [-0.6789% -0.0051% +0.6111%] (p = 0.99 > 0.05)
                        No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  4 (4.00%) high mild
  6 (6.00%) high severe

filter context i32 low selectivity (kept 1/1024)
                        time:   [182.22 ns 182.91 ns 183.73 ns]
                        change: [-4.8992% -3.8138% -2.7259%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

filter context i32 w NULLs (kept 1/2)
                        time:   [39.046 µs 39.130 µs 39.231 µs]
                        change: [-0.5928% -0.3519% -0.1009%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe

filter context i32 w NULLs high selectivity (kept 1023/1024)
                        time:   [9.8509 µs 9.8910 µs 9.9429 µs]
                        change: [-0.1213% +0.3332% +0.7471%] (p = 0.15 > 0.05)
                        No change in performance detected.
Found 18 outliers among 100 measurements (18.00%)
  1 (1.00%) low mild
  10 (10.00%) high mild
  7 (7.00%) high severe

filter context i32 w NULLs low selectivity (kept 1/1024)
                        time:   [380.64 ns 381.56 ns 382.44 ns]
                        change: [-0.1487% +0.5091% +1.1435%] (p = 0.12 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

filter context u8 w NULLs (kept 1/2)
                        time:   [38.937 µs 38.988 µs 39.043 µs]
                        change: [-1.0588% -0.0070% +0.7384%] (p = 0.99 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low mild
  1 (1.00%) high severe

filter context u8 w NULLs high selectivity (kept 1023/1024)
                        time:   [5.2514 µs 5.2719 µs 5.3039 µs]
                        change: [-1.0082% -0.7196% -0.3732%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe

filter context u8 w NULLs low selectivity (kept 1/1024)
                        time:   [402.92 ns 403.73 ns 404.51 ns]
                        change: [+0.1940% +0.9968% +1.5659%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) low mild

filter f32 (kept 1/2)   time:   [226.90 µs 227.84 µs 229.27 µs]
                        change: [+0.3733% +0.6710% +1.0071%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

filter context f32 (kept 1/2)
                        time:   [39.199 µs 39.251 µs 39.309 µs]
                        change: [+0.2787% +0.5129% +0.7989%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe

filter context f32 high selectivity (kept 1023/1024)
                        time:   [9.9815 µs 10.002 µs 10.028 µs]
                        change: [+0.8736% +1.4786% +2.3604%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low severe
  3 (3.00%) high mild
  3 (3.00%) high severe

filter context f32 low selectivity (kept 1/1024)
                        time:   [387.26 ns 390.03 ns 393.57 ns]
                        change: [+1.0072% +1.8188% +2.9038%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high severe

filter decimal128 (kept 1/2)
                        time:   [114.29 µs 114.51 µs 114.73 µs]
                        change: [-0.0174% +0.2914% +0.5825%] (p = 0.06 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low mild
  6 (6.00%) high mild
  1 (1.00%) high severe

filter decimal128 high selectivity (kept 1023/1024)
                        time:   [20.367 µs 20.461 µs 20.567 µs]
                        change: [+3.6664% +4.5560% +5.4165%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

filter decimal128 low selectivity (kept 1/1024)
                        time:   [1.2728 µs 1.2763 µs 1.2824 µs]
                        change: [-0.4061% +0.0696% +0.6427%] (p = 0.80 > 0.05)
                        No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  4 (4.00%) low mild
  3 (3.00%) high mild
  6 (6.00%) high severe

filter context decimal128 (kept 1/2)
                        time:   [16.352 µs 16.549 µs 16.769 µs]
                        change: [+6.2700% +7.8273% +9.5264%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  9 (9.00%) high mild

filter context decimal128 high selectivity (kept 1023/1024)
                        time:   [19.889 µs 19.964 µs 20.043 µs]
                        change: [+1.9809% +2.3829% +2.8013%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

filter context decimal128 low selectivity (kept 1/1024)
                        time:   [194.91 ns 195.69 ns 196.91 ns]
                        change: [-1.3852% +0.1304% +2.3070%] (p = 0.92 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe

filter context string (kept 1/2)
                        time:   [252.81 µs 254.08 µs 255.76 µs]
                        change: [+1.5679% +2.2820% +2.8509%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

filter context string high selectivity (kept 1023/1024)
                        time:   [100.14 µs 101.27 µs 102.43 µs]
                        change: [-2.3776% -0.6880% +0.8152%] (p = 0.39 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

filter context string low selectivity (kept 1/1024)
                        time:   [787.56 ns 793.59 ns 802.22 ns]
                        change: [-0.4104% +0.3498% +1.3407%] (p = 0.55 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

filter context string dictionary (kept 1/2)
                        time:   [11.471 µs 11.496 µs 11.524 µs]
                        change: [-0.1016% +0.1368% +0.3608%] (p = 0.26 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

filter context string dictionary high selectivity (kept 1023/1024)
                        time:   [6.8123 µs 6.8485 µs 6.8901 µs]
                        change: [+0.2411% +0.6646% +1.1653%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  3 (3.00%) high severe

filter context string dictionary low selectivity (kept 1/1024)
                        time:   [521.82 ns 522.37 ns 522.94 ns]
                        change: [-51.588% -23.655% +7.4035%] (p = 0.64 > 0.05)
                        No change in performance detected.
Found 14 outliers among 100 measurements (14.00%)
  3 (3.00%) low mild
  10 (10.00%) high mild
  1 (1.00%) high severe

filter context string dictionary w NULLs (kept 1/2)
                        time:   [39.575 µs 39.638 µs 39.704 µs]
                        change: [+0.4186% +0.6957% +0.9821%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

filter context string dictionary w NULLs high selectivity (kept 1023/1024)
                        time:   [10.262 µs 10.291 µs 10.322 µs]
                        change: [+0.1850% +0.5708% +1.0183%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

filter context string dictionary w NULLs low selectivity (kept 1/1024)
                        time:   [718.68 ns 719.49 ns 720.33 ns]
                        change: [+4.3311% +4.8665% +5.3129%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

filter single record batch
                        time:   [109.58 µs 109.96 µs 110.63 µs]
                        change: [+0.9978% +1.3005% +1.6416%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  3 (3.00%) high severe

filter context short string view (kept 1/2)
                        time:   [43.870 µs 43.968 µs 44.075 µs]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

filter context short string view high selectivity (kept 1023/1024)
                        time:   [22.188 µs 22.286 µs 22.398 µs]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

filter context short string view low selectivity (kept 1/1024)
                        time:   [384.87 ns 385.69 ns 386.51 ns]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  3 (3.00%) high severe

filter context mixed string view (kept 1/2)
                        time:   [44.397 µs 44.475 µs 44.565 µs]
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

filter context mixed string view high selectivity (kept 1023/1024)
                        time:   [23.370 µs 23.507 µs 23.641 µs]

filter context mixed string view low selectivity (kept 1/1024)
                        time:   [891.97 ns 894.09 ns 895.96 ns]

RinChanNOWWW · 2024-04-14T02:42:06Z

Do the benchmark results meet expectation? @alamb

alamb · 2024-04-15T19:05:49Z

> Do the benchmark results meet expectation? @alamb

Thanks @RinChanNOWWW -- much appreciated

Thanks for the merge @tustvold

github-actions bot added the arrow Changes to the arrow crate label Apr 11, 2024

RinChanNOWWW marked this pull request as ready for review April 11, 2024 11:39

alamb mentioned this pull request Apr 12, 2024

DataFusion weekly project plan (Andrew Lamb) - April 8, 2024 apache/datafusion#10002

Closed

9 tasks

alamb reviewed Apr 12, 2024


RinChanNOWWW and others added 4 commits April 14, 2024 10:12

Implement filter kernel for byte view arrays.

c438b16

Add unit tests and fix.

70819a1

Deprecate ArrowPrimitiveType::get_byte_width.

ef4f44f

Add string view filter benchmark.

9a6fd81

RinChanNOWWW marked this pull request as draft April 14, 2024 02:14

RinChanNOWWW force-pushed the view-filter branch from dd74d49 to 9a6fd81 Compare April 14, 2024 02:15

RinChanNOWWW marked this pull request as ready for review April 14, 2024 02:40

tustvold merged commit e88e5aa into apache:master Apr 15, 2024
26 checks passed

RinChanNOWWW deleted the view-filter branch April 15, 2024 13:29
