Improve performance of casting `DictionaryArray` to `StringViewArray` #5871

XiangpengHao · 2024-06-11T14:29:11Z

Which issue does this PR close?

Part of #5861 .

Rationale for this change

Casting from DictionaryArray to String/BinaryView was previously handled by unpack_dictionary: https://github.com/apache/arrow-rs/blob/master/arrow-cast/src/cast/dictionary.rs#L93-L105

which incurs an unnecessary copy of the value buffer. This PR handles Utf8View and BinaryView explicitly, so that the cast reuses the dictionary's value buffer instead of creating a new one.
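To make the rationale concrete, here is a minimal std-only sketch of the difference between copying and reusing the value buffer. It does not use the arrow-rs types: `Dict`, `Views`, `cast_by_copy`, `cast_by_sharing` and `get` are hypothetical names invented for illustration, and views are modeled as plain (offset, length) pairs rather than packed 16-byte views.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a string dictionary:
/// `offsets[i]..offsets[i + 1]` delimits value `i` inside `values`.
struct Dict {
    keys: Vec<Option<usize>>,
    offsets: Vec<usize>,
    values: Arc<[u8]>,
}

/// Hypothetical stand-in for a view array: each element is an
/// (offset, length) pair into `buffer`, or `None` for null.
struct Views {
    views: Vec<Option<(usize, usize)>>,
    buffer: Arc<[u8]>,
}

/// Copying approach: every element is materialized into a new buffer,
/// so repeated keys duplicate their bytes.
fn cast_by_copy(dict: &Dict) -> Views {
    let mut buffer = Vec::new();
    let mut views = Vec::with_capacity(dict.keys.len());
    for key in &dict.keys {
        views.push(key.map(|k| {
            let bytes = &dict.values[dict.offsets[k]..dict.offsets[k + 1]];
            let start = buffer.len();
            buffer.extend_from_slice(bytes);
            (start, bytes.len())
        }));
    }
    Views { views, buffer: buffer.into() }
}

/// Sharing approach: keep the dictionary's value buffer (a cheap
/// reference-count bump) and only compute one view per key.
fn cast_by_sharing(dict: &Dict) -> Views {
    let views = dict
        .keys
        .iter()
        .map(|key| key.map(|k| (dict.offsets[k], dict.offsets[k + 1] - dict.offsets[k])))
        .collect();
    Views { views, buffer: Arc::clone(&dict.values) }
}

/// Read element `i` back as a string slice.
fn get(v: &Views, i: usize) -> Option<&str> {
    v.views[i].map(|(off, len)| std::str::from_utf8(&v.buffer[off..off + len]).unwrap())
}
```

Both functions produce the same logical values; only the sharing variant leaves the output pointing at the original allocation, which is the effect this PR is after.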

What changes are included in this PR?

Are there any user-facing changes?

No

alamb

Thank you @XiangpengHao -- This is looking great.

arrow-cast/src/cast/dictionary.rs

alamb · 2024-06-11T15:35:28Z

arrow-cast/src/cast/dictionary.rs

+                StringViewArray::new_unchecked(
+                    view_buffer,
+                    vec![value_buffer],
+                    dict_array.nulls().cloned(),


I think calling nulls() doesn't handle the case where the dictionary value itself (rather than the key) was null

arrow-rs/arrow-array/src/array/dictionary_array.rs

Line 727 in cf59b6c

fn nulls(&self) -> Option<&NullBuffer> {

I think this should call logical_nulls() instead:

arrow-rs/arrow-array/src/array/dictionary_array.rs

Line 731 in cf59b6c

fn logical_nulls(&self) -> Option<NullBuffer> {

Also, it would be good to create a test case that covers this too

Nice catch! added a new test to cover this

(fyi there seem to be no new commits to this PR)
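For readers following the nulls() versus logical_nulls() point, the distinction can be sketched without arrow-rs. The `Dict` type and its two methods below are hypothetical and only model the idea from the review comment: key-level validity misses elements whose key is valid but whose dictionary value is null.

```rust
/// Hypothetical simplified dictionary: `keys[i]` indexes into `values`,
/// and either side may be null (`None`).
struct Dict<'a> {
    keys: Vec<Option<usize>>,
    values: Vec<Option<&'a str>>,
}

impl<'a> Dict<'a> {
    /// Validity derived from the keys alone
    /// (the behavior the review comment attributes to `nulls()`).
    fn key_validity(&self) -> Vec<bool> {
        self.keys.iter().map(|k| k.is_some()).collect()
    }

    /// Validity of the decoded elements: a slot is valid only if the key
    /// is non-null AND the value it points to is non-null.
    fn logical_validity(&self) -> Vec<bool> {
        self.keys
            .iter()
            .map(|k| k.map_or(false, |i| self.values[i].is_some()))
            .collect()
    }
}
```

With keys `[0, 1, null, 1, 0]` and values `["a", null]`, the key-only mask reports slots 1 and 3 as valid even though they decode to null, which is the case the requested test needs to cover.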

alamb · 2024-06-11T15:37:50Z

arrow-cast/src/cast/mod.rs

+    #[test]
+    fn test_dict_to_view() {
+        let string_view_array = StringViewArray::from_iter(VIEW_TEST_DATA);
+        let string_dict_array: DictionaryArray<Int8Type> = VIEW_TEST_DATA.into_iter().collect();


Can you please update this test to have a dictionary array that has:

Repeated use of dictionary values

keys that are not all increasing

Nulls in the values (as well as the keys)

Perhaps what you can do is create a StringArray from VIEW_TEST_DATA and then create the indexes manually

Like this example https://docs.rs/arrow/latest/arrow/array/struct.DictionaryArray.html#example-from-existing-arrays

Does that make sense?

alamb · 2024-06-11T15:40:53Z

arrow-cast/src/cast/dictionary.rs

+                let length = end - offset;
+                let value_buf = &value_buffer[offset as usize..end as usize];
+
+                if length <= 12 {


Instead of creating the views directly, what do you think about using try_append_view and try_append_block added in this PR: #5796

Maybe as bonus points you could add a benchmark for this casting operation, and see if adding append_view_unchecked would make any difference

Unfortunately, adding append_view_unchecked improved the performance by 50%

dict to view            time:   [38.116 µs 38.122 µs 38.127 µs]
                        change: [-50.618% -50.455% -50.290%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  4 (4.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

I think it justifies adding append_view_unchecked, what do you think?

Also, should I add the benchmark to the repo? It's a bit tricky to setup two versions of the cast implementation..

I think it justifies adding append_view_unchecked, what do you think?

Makes sense to me

Also, should I add the benchmark to the repo? It's a bit tricky to setup two versions of the cast implementation..

I do think we should add the benchmark to the repo (so we can use it for future optimizations)

In terms of justifying append_view_unchecked, I think running the benchmark on a local checkout that calls append_view, then reverting and running the same benchmark again, is fine (which is presumably what you did)
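As background for the `length <= 12` branch in the diff above, here is a std-only sketch of how a 16-byte view is laid out according to the Arrow columnar format: a 4-byte little-endian length, followed either by up to 12 inline bytes (zero-padded) or by a 4-byte prefix, a 4-byte buffer index and a 4-byte offset. The helper names `make_view` and `view_len` are invented for this illustration and are not the arrow-rs API.

```rust
/// Encode one 16-byte view for `data`, assumed to live at `offset`
/// inside data buffer number `buffer_index`.
fn make_view(data: &[u8], buffer_index: u32, offset: u32) -> u128 {
    let len = data.len() as u32;
    let mut bytes = [0u8; 16];
    // Bytes 0..4: length, little-endian.
    bytes[0..4].copy_from_slice(&len.to_le_bytes());
    if len <= 12 {
        // Short value: stored inline; the remaining bytes stay zero.
        bytes[4..4 + data.len()].copy_from_slice(data);
    } else {
        // Long value: 4-byte prefix, then buffer index, then offset.
        bytes[4..8].copy_from_slice(&data[0..4]);
        bytes[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        bytes[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    u128::from_le_bytes(bytes)
}

/// The length occupies the low 32 bits of the little-endian view.
fn view_len(view: u128) -> u32 {
    view as u32
}
```

Short values never reference a data buffer at all, which is why the cast only needs the shared value buffer for strings longer than 12 bytes.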

XiangpengHao · 2024-06-11T20:08:09Z

BTW, with this pull request, casting a dictionary array to string view with 10_000 items improved performance by 25%

dict to view            time:   [37.979 µs 37.990 µs 38.001 µs]
                        change: [-25.271% -25.253% -25.237%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild
  4 (4.00%) high severe

alamb

Looks good to me -- I think this could be made significantly faster but I think we should add a benchmark and optimize as a follow on PR

Thank you @XiangpengHao

alamb · 2024-06-11T20:08:42Z

arrow-cast/src/cast/mod.rs

+        let typed_dict = string_dict_array.downcast_dict::<StringArray>().unwrap();
+
+        let string_view_array = {
+            let mut builder = StringViewBuilder::new().with_block_size(8); // multiple buffers.


alamb · 2024-06-11T20:10:29Z

arrow-cast/src/cast/dictionary.rs

+    let value_offsets = array.value_offsets();
+    let mut builder = GenericByteViewBuilder::<T>::with_capacity(keys.len());
+    builder.append_block(value_buffer.clone());
+    for i in keys.iter() {


Another potential optimization is a separate loop if there are no nulls in keys (so we can avoid the branch)

Another potential idea is to use value_offsets.windows(2) as an iterator to avoid the bounds checks in value_offsets

However, I think we should merge this basic PR in as is, and then add a benchmark and optimize this kernel as a follow on PR (if we care). I can file a ticket if @tustvold agrees

Oops, I checked in an append_view_unchecked.

But if we do try_append_view, it is even slower than the unpack_dictionary approach

I can also move append_view_unchecked to a follow-up PR and potentially clean up other use cases where ByteViews are manually constructed.
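The two optimization ideas raised in this thread (a separate loop when the keys have no nulls, and iterating offsets with `windows(2)`) could look roughly like the std-only sketch below. This is one possible reading of the suggestion, not code from the PR: `value_spans` and `gather` are hypothetical helpers, and views are again modeled as (start, length) pairs.

```rust
/// Turn an offsets slice into one (start, length) pair per dictionary value.
/// `windows(2)` yields adjacent pairs, so the loop indexes a two-element
/// window instead of doing `offsets[i]` / `offsets[i + 1]` on the full slice.
fn value_spans(offsets: &[usize]) -> Vec<(usize, usize)> {
    offsets.windows(2).map(|w| (w[0], w[1] - w[0])).collect()
}

/// Gather one span per key. `validity`, when present, marks which keys
/// are non-null; `None` means the keys contain no nulls.
fn gather(
    keys: &[usize],
    validity: Option<&[bool]>,
    spans: &[(usize, usize)],
) -> Vec<Option<(usize, usize)>> {
    match validity {
        // Fast path: no null keys, so the loop body has no per-key null check.
        None => keys.iter().map(|&k| Some(spans[k])).collect(),
        // General path: consult the validity flag for every key.
        Some(valid) => keys
            .iter()
            .zip(valid)
            .map(|(&k, &ok)| if ok { Some(spans[k]) } else { None })
            .collect(),
    }
}
```

Whether either change pays off in the real kernel would need to be confirmed with the benchmark discussed above.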

alamb

Thanks @XiangpengHao --

This looks really nice to me. Can you please make a separate PR with the cast dictionary --> view benchmark? Then we can use that benchmark to ensure that this approach is faster than "unpack dictionary" (which I am sure it will be)

alamb · 2024-06-12T14:44:03Z

arrow-array/src/builder/generic_bytes_view_builder.rs

-            };
-            self.views_builder.append(view.into());
+        unsafe {
+            self.append_view_unchecked(block, offset, len);
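The diff above follows a common Rust pattern: the checked method validates its arguments and then delegates to an unsafe unchecked variant that callers with already-validated data can call directly. A minimal std-only sketch of that pattern is shown here; `ViewBuilder` is a hypothetical simplified type that stores (block, offset, len) triples, not the actual `GenericByteViewBuilder`, and it omits the UTF-8 validation a real string builder would also need.

```rust
/// Hypothetical, simplified view builder: stores (block, offset, len)
/// triples instead of packed 16-byte views.
struct ViewBuilder {
    blocks: Vec<Vec<u8>>,
    views: Vec<(u32, u32, u32)>,
}

impl ViewBuilder {
    fn new() -> Self {
        ViewBuilder { blocks: Vec::new(), views: Vec::new() }
    }

    /// Register a data block and return its index.
    fn append_block(&mut self, block: Vec<u8>) -> u32 {
        self.blocks.push(block);
        (self.blocks.len() - 1) as u32
    }

    /// Append a view without validating it.
    ///
    /// # Safety
    /// The caller must guarantee that `block` is a valid block index and
    /// that `offset..offset + len` lies within that block.
    unsafe fn append_view_unchecked(&mut self, block: u32, offset: u32, len: u32) {
        self.views.push((block, offset, len));
    }

    /// Checked variant: validate the arguments, then delegate.
    fn try_append_view(&mut self, block: u32, offset: u32, len: u32) -> Result<(), String> {
        let data = self
            .blocks
            .get(block as usize)
            .ok_or_else(|| format!("no block with index {}", block))?;
        let end = offset
            .checked_add(len)
            .ok_or_else(|| String::from("offset + len overflows u32"))?;
        if end as usize > data.len() {
            return Err(format!("range {}..{} exceeds block of {} bytes", offset, end, data.len()));
        }
        // SAFETY: the block index and byte range were validated just above.
        unsafe { self.append_view_unchecked(block, offset, len) };
        Ok(())
    }
}
```

A caller such as the dictionary cast, which derives every offset and length from an already valid offsets buffer, can skip the per-element checks by calling the unchecked method, which is the saving reported in the benchmark discussion.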


alamb · 2024-06-13T10:05:27Z

My plan is to merge this branch up to main, run the benchmarks added in #5874 on it and then assuming they look good merge it in

alamb · 2024-06-13T11:45:20Z

The benchmark I ran suggests this kernel is almost 2x as fast as the existing approach (which makes sense as it avoids a copy).

Nice work @XiangpengHao

++ critcmp master view-to-dict
group                       master                                 view-to-dict
-----                       ------                                 ------------
cast dict to string view    1.79    123.2±1.68µs        ? ?/sec    1.00     68.8±2.76µs        ? ?/sec
cast string view to dict    1.00    249.9±0.68µs        ? ?/sec    1.07    268.3±0.53µs        ? ?/sec

zero-copy dict to view

63db275

github-actions bot added the arrow Changes to the arrow crate label Jun 11, 2024

alamb reviewed Jun 11, 2024

refactor to use try_append_view

0308dd4

alamb approved these changes Jun 11, 2024

XiangpengHao added 2 commits June 11, 2024 16:17

unchecked view

c27d548

make fmt happy

fdebc0e

XiangpengHao mentioned this pull request Jun 11, 2024

Improve performance of casting StringView/BinaryView to DictionaryArray #5872

Merged

alamb approved these changes Jun 12, 2024

XiangpengHao mentioned this pull request Jun 12, 2024

Benchmark for casting view to dict arrays (and the reverse) #5874

Merged

alamb added 2 commits June 13, 2024 06:06

Merge remote-tracking branch 'apache/master' into view-to-dict

ca11706

Merge remote-tracking branch 'apache/master' into view-to-dict

52c28c1

alamb changed the title ~~zero copy cast dict array to view type arrays~~ Improve performance of casting DictionaryArray to StringViewArray Jun 13, 2024

alamb merged commit 77271c4 into apache:master Jun 13, 2024
26 checks passed
