Improve performance of `trim` for string view (10%) #12395

Rachelint · 2024-09-09T13:51:08Z

Which issue does this PR close?

Closes #12387

Rationale for this change

Similar to the string view version of substr, we can implement a string view version of trim to improve performance.

What changes are included in this PR?

Implement a string view version of trim, which avoids copying the whole string when trimming a long (> 12 bytes) string.
Introduce basic unit tests for trim.
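
To illustrate why no copy is needed for long strings, here is a minimal sketch of how a 16-byte view can be rebuilt for a trimmed substring, following the Arrow string view layout (length, then either inline bytes or prefix + buffer index + offset). The function name `make_view` and its parameters are hypothetical and are not the `make_and_append_view` helper from this PR; only the byte layout is taken from the Arrow format.

```rust
/// Build a 16-byte view for `substr`, which starts at byte `offset`
/// of the data buffer `buffer_index`. Hypothetical helper for illustration.
fn make_view(substr: &[u8], buffer_index: u32, offset: u32) -> u128 {
    let len = substr.len() as u32;
    let mut bytes = [0u8; 16];
    // Bytes 0..4 always hold the length.
    bytes[0..4].copy_from_slice(&len.to_le_bytes());
    if len <= 12 {
        // Short string: the data is stored inline in the view itself.
        bytes[4..4 + substr.len()].copy_from_slice(substr);
    } else {
        // Long string: keep a 4-byte prefix and point back into the
        // existing data buffer, so the string bytes are not copied.
        bytes[4..8].copy_from_slice(&substr[..4]);
        bytes[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        bytes[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    u128::from_le_bytes(bytes)
}

fn main() {
    let data = b"xxxx a string longer than twelve bytes";
    // Pretend the original view pointed at (buffer 0, offset 0)
    // and the leading "xxxx" was trimmed away.
    let trimmed = &data[4..];
    let view = make_view(trimmed, 0, 4).to_le_bytes();
    let len = u32::from_le_bytes([view[0], view[1], view[2], view[3]]);
    let offset = u32::from_le_bytes([view[12], view[13], view[14], view[15]]);
    println!("len = {}, offset = {}", len, offset);
}
```

In the long-string case only the view changes; the trimmed result still refers to the original data buffer.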

Are these changes tested?

Tested by new unit tests and other existing tests.

Are there any user-facing changes?

No.

Kev1n8 · 2024-09-09T16:03:36Z

FYI @Rachelint, #12383 is modifying make_and_append_view; the original implementation is not correct, which is my fault.

Rachelint · 2024-09-09T16:16:36Z

> FYI @Rachelint that #12383 is modifying make_and_append_view, the original implementation is not correct, which is my fault.

Thanks! I will push this forward once #12383 is merged.

alamb

Thank you @Rachelint -- this looks really nice and quite close 🙏

I left some comments, but I don't think they are required to merge this.

I do think we should have benchmark numbers showing this makes things faster in order to merge it. Could you please make a StringView based benchmark for trim -- perhaps in datafusion/datafusion/functions/benches/ltrim.rs (as of a08f923)?

Then we can run that benchmark and show that this PR improves the performance.

Thanks again!

alamb · 2024-09-16T19:15:41Z

datafusion/functions/src/string/btrim.rs

@@ -82,7 +82,11 @@ impl ScalarUDFImpl for BTrimFunc {
    }

    fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
-        utf8_to_str_type(&arg_types[0], "btrim")
+        if arg_types[0] == DataType::Utf8View {
+            Ok(DataType::Utf8View)


👍

Eventually it would also be possible to return Utf8View when the input is Utf8, saving a copy as well.

alamb · 2024-09-16T19:21:03Z

datafusion/functions/src/string/common.rs

 use datafusion_common::cast::{as_generic_string_array, as_string_view_array};
 use datafusion_common::Result;
 use datafusion_common::{exec_err, ScalarValue};
 use datafusion_expr::ColumnarValue;

+/// Make a `u128` based on the given substr, start(offset to view.offset), and
+/// push into to the given buffers
+pub(crate) fn make_and_append_view(


🤔 I wonder if we should (as a follow on PR) propose adding this upstream to arrow-rs as it seems valuable for any trim related kernels on stringview

That sounds great! And #12383 (comment) could be solved if this were a function in arrow-rs.

It seems like what would be really useful is a StringViewBuilder that could be modified perhaps 🤔

I started to write a ticket in arrow-rs but I didn't know exactly what API to suggest. I think we would have to try it out

datafusion/functions/src/string/ltrim.rs

Rachelint · 2024-09-17T12:04:50Z

I think maybe we should put LTrim/RTrim/BTrim in the same place (like trim.rs)?

Kev1n8 · 2024-09-17T15:40:06Z

For benchmarking, I would recommend PR #12111, for what it's worth.

Rachelint · 2024-09-17T16:59:57Z

> For benchmarking, I would recommend this PR #12111. for what it's worth

Thanks, it is really helpful!

Rachelint · 2024-09-17T19:02:18Z

I ran the benchmark introduced in #12513 and saw about a 10-20% improvement for long strings (64 bytes).

Highlights: as we expected, the string view trim mainly reduces copying when the trimmed result is longer than 12 bytes:

group                                                                                    main                                   string-view-trim
-----                                                                                    ----                                   ----------------
INPUT LEN > 12, OUTPUT LEN > 12/string_view [size=1024, len_before=64, len_after=60]     1.16     41.2±0.19µs        ? ?/sec    1.00     35.6±0.21µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string_view [size=4096, len_before=64, len_after=60]     1.25    173.1±5.68µs        ? ?/sec    1.00    138.5±0.78µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string_view [size=8192, len_before=64, len_after=60]     1.24    341.3±3.67µs        ? ?/sec    1.00    276.1±1.17µs        ? ?/sec
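
As a reading aid for the table above: the leading number in each column appears to be the time relative to the faster of the two sides (this is an assumption about how critcmp reports results, not something stated in this thread). A small sketch, with a hypothetical helper `relative`, reproduces those ratios from the reported mean times:

```rust
/// Relative time of each side versus the faster one.
/// Hypothetical helper, used only to re-derive the ratios in the table.
fn relative(main_us: f64, branch_us: f64) -> (f64, f64) {
    let fastest = main_us.min(branch_us);
    (main_us / fastest, branch_us / fastest)
}

fn main() {
    // (size, mean time on main in µs, mean time on string-view-trim in µs),
    // copied from the three highlighted rows.
    let rows = [(1024, 41.2, 35.6), (4096, 173.1, 138.5), (8192, 341.3, 276.1)];
    for &(size, main_us, branch_us) in rows.iter() {
        let (m, b) = relative(main_us, branch_us);
        println!("size={}: main {:.2}, string-view-trim {:.2}", size, m, b);
    }
}
```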

The detailed, sorted benchmark results:

group                                                                                    main                                   string-view-trim
-----                                                                                    ----                                   ----------------
INPUT LEN <= 12/large_string [size=1024, len_before=12, len_after=8]                     1.00     35.9±0.07µs        ? ?/sec    1.01     36.1±0.36µs        ? ?/sec
INPUT LEN <= 12/large_string [size=4096, len_before=12, len_after=8]                     1.00    139.6±0.51µs        ? ?/sec    1.00    139.1±0.49µs        ? ?/sec
INPUT LEN <= 12/large_string [size=8192, len_before=12, len_after=8]                     1.01    281.2±2.01µs        ? ?/sec    1.00    278.4±2.06µs        ? ?/sec
INPUT LEN <= 12/string [size=1024, len_before=12, len_after=8]                           1.00     35.9±0.31µs        ? ?/sec    1.00     35.9±0.14µs        ? ?/sec
INPUT LEN <= 12/string [size=4096, len_before=12, len_after=8]                           1.00    138.5±0.41µs        ? ?/sec    1.01    139.4±0.52µs        ? ?/sec
INPUT LEN <= 12/string [size=8192, len_before=12, len_after=8]                           1.00    279.1±3.72µs        ? ?/sec    1.00    278.6±1.07µs        ? ?/sec
INPUT LEN <= 12/string_view [size=1024, len_before=12, len_after=8]                      1.00     36.2±1.13µs        ? ?/sec    1.00     36.1±1.98µs        ? ?/sec
INPUT LEN <= 12/string_view [size=4096, len_before=12, len_after=8]                      1.00    139.7±1.54µs        ? ?/sec    1.00    139.0±2.41µs        ? ?/sec
INPUT LEN <= 12/string_view [size=8192, len_before=12, len_after=8]                      1.01    277.5±1.31µs        ? ?/sec    1.00    275.5±2.25µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/large_string [size=1024, len_before=64, len_after=4]    1.03    135.5±4.86µs        ? ?/sec    1.00    131.6±1.33µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/large_string [size=4096, len_before=64, len_after=4]    1.00    522.5±2.32µs        ? ?/sec    1.00    522.1±2.30µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/large_string [size=8192, len_before=64, len_after=4]    1.00   1039.3±3.48µs        ? ?/sec    1.00   1040.9±3.07µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string [size=1024, len_before=64, len_after=4]          1.01    132.5±1.17µs        ? ?/sec    1.00    131.3±0.92µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string [size=4096, len_before=64, len_after=4]          1.01    527.6±3.43µs        ? ?/sec    1.00    522.2±1.72µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string [size=8192, len_before=64, len_after=4]          1.00   1043.3±2.28µs        ? ?/sec    1.00   1040.7±3.50µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string_view [size=1024, len_before=64, len_after=4]     1.01    131.3±0.40µs        ? ?/sec    1.00    130.5±0.60µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string_view [size=4096, len_before=64, len_after=4]     1.01    524.0±2.79µs        ? ?/sec    1.00    519.3±2.52µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string_view [size=8192, len_before=64, len_after=4]     1.00   1041.1±3.21µs        ? ?/sec    1.00   1040.1±9.73µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/large_string [size=1024, len_before=64, len_after=60]    1.00     41.2±0.30µs        ? ?/sec    1.00     41.2±0.16µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/large_string [size=4096, len_before=64, len_after=60]    1.01    169.9±4.30µs        ? ?/sec    1.00    168.1±1.83µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/large_string [size=8192, len_before=64, len_after=60]    1.01   345.1±10.96µs        ? ?/sec    1.00    342.5±4.26µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string [size=1024, len_before=64, len_after=60]          1.02     41.8±0.62µs        ? ?/sec    1.00     41.0±0.12µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string [size=4096, len_before=64, len_after=60]          1.01    171.6±1.73µs        ? ?/sec    1.00    169.2±2.07µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string [size=8192, len_before=64, len_after=60]          1.00    343.0±6.30µs        ? ?/sec    1.00    341.8±6.00µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string_view [size=1024, len_before=64, len_after=60]     1.16     41.2±0.19µs        ? ?/sec    1.00     35.6±0.21µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string_view [size=4096, len_before=64, len_after=60]     1.25    173.1±5.68µs        ? ?/sec    1.00    138.5±0.78µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string_view [size=8192, len_before=64, len_after=60]     1.24    341.3±3.67µs        ? ?/sec    1.00    276.1±1.17µs        ? ?/sec

alamb · 2024-09-18T22:00:33Z

I merged this PR up to main and am running another round of benchmarks. Thank you @Rachelint

alamb · 2024-09-18T23:23:44Z

++ critcmp main string-view-trim
group                                                                                    main                                   string-view-trim
-----                                                                                    ----                                   ----------------
INPUT LEN <= 12/large_string [size=1024, len_before=12, len_after=8]                     1.07     45.0±0.03µs        ? ?/sec    1.00     42.0±0.03µs        ? ?/sec
INPUT LEN <= 12/large_string [size=4096, len_before=12, len_after=8]                     1.07    173.8±0.20µs        ? ?/sec    1.00    163.1±0.18µs        ? ?/sec
INPUT LEN <= 12/large_string [size=8192, len_before=12, len_after=8]                     1.07    345.0±0.15µs        ? ?/sec    1.00    321.9±0.34µs        ? ?/sec
INPUT LEN <= 12/string [size=1024, len_before=12, len_after=8]                           1.00     42.3±0.02µs        ? ?/sec    1.02     43.3±0.02µs        ? ?/sec
INPUT LEN <= 12/string [size=4096, len_before=12, len_after=8]                           1.00    162.9±0.12µs        ? ?/sec    1.03    167.2±0.06µs        ? ?/sec
INPUT LEN <= 12/string [size=8192, len_before=12, len_after=8]                           1.00    323.2±0.17µs        ? ?/sec    1.03    332.3±0.49µs        ? ?/sec
INPUT LEN <= 12/string_view [size=1024, len_before=12, len_after=8]                      1.04     42.1±0.08µs        ? ?/sec    1.00     40.5±0.04µs        ? ?/sec
INPUT LEN <= 12/string_view [size=4096, len_before=12, len_after=8]                      1.02    163.1±0.14µs        ? ?/sec    1.00    159.4±0.16µs        ? ?/sec
INPUT LEN <= 12/string_view [size=8192, len_before=12, len_after=8]                      1.02    323.0±0.25µs        ? ?/sec    1.00    317.1±0.14µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/large_string [size=1024, len_before=64, len_after=4]    1.00    184.2±0.17µs        ? ?/sec    1.01    186.2±0.22µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/large_string [size=4096, len_before=64, len_after=4]    1.00   740.0±16.98µs        ? ?/sec    1.00    741.3±0.94µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/large_string [size=8192, len_before=64, len_after=4]    1.00   1464.2±2.95µs        ? ?/sec    1.01   1482.4±2.26µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string [size=1024, len_before=64, len_after=4]          1.00    181.8±0.09µs        ? ?/sec    1.03    187.6±0.06µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string [size=4096, len_before=64, len_after=4]          1.00    722.9±0.86µs        ? ?/sec    1.03    746.0±0.42µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string [size=8192, len_before=64, len_after=4]          1.00   1440.5±1.30µs        ? ?/sec    1.04   1491.1±3.38µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string_view [size=1024, len_before=64, len_after=4]     1.00    182.3±0.19µs        ? ?/sec    1.01    184.0±3.93µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string_view [size=4096, len_before=64, len_after=4]     1.00    724.5±1.23µs        ? ?/sec    1.01    732.5±0.40µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string_view [size=8192, len_before=64, len_after=4]     1.00   1443.6±1.95µs        ? ?/sec    1.02  1465.6±24.73µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/large_string [size=1024, len_before=64, len_after=60]    1.07     46.6±0.07µs        ? ?/sec    1.00     43.4±0.05µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/large_string [size=4096, len_before=64, len_after=60]    1.06    179.5±0.27µs        ? ?/sec    1.00    168.9±0.19µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/large_string [size=8192, len_before=64, len_after=60]    1.06    363.9±0.73µs        ? ?/sec    1.00    341.8±0.64µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string [size=1024, len_before=64, len_after=60]          1.00     44.2±0.11µs        ? ?/sec    1.02     45.3±0.17µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string [size=4096, len_before=64, len_after=60]          1.00    168.6±0.13µs        ? ?/sec    1.03    174.0±0.30µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string [size=8192, len_before=64, len_after=60]          1.00    343.9±0.93µs        ? ?/sec    1.02    352.1±0.62µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string_view [size=1024, len_before=64, len_after=60]     1.07     44.6±0.05µs        ? ?/sec    1.00     41.7±0.04µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string_view [size=4096, len_before=64, len_after=60]     1.05    170.7±0.11µs        ? ?/sec    1.00    163.0±1.12µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string_view [size=8192, len_before=64, len_after=60]     1.07    348.4±0.87µs        ? ?/sec    1.00    324.8±0.51µs        ? ?/sec

Looks like a reasonable improvement to me

alamb

Thanks @Rachelint -- I went through this PR again and it looks good

Since I had this PR checked out locally for review, I went ahead and removed the unsafe pointer calculation to try and move this PR along (I know it has been outstanding for too long)

Thanks again!

alamb · 2024-09-23T17:37:26Z

datafusion/functions/src/string/common.rs

+    let views_buf = ScalarBuffer::from(views_buf);
+    let nulls_buf = null_builder.finish();
+
+    // Safety:


Related discussion: apache/arrow-rs#6430

alamb · 2024-09-23T17:47:47Z

datafusion/functions/src/string/common.rs

+        // Safety:
+        // `trim_str` is computed from `str::trim_xxx_matches`,
+        // and its addr is ensured to be >= `origin_str`'s
+        let start = unsafe { trim_str.as_ptr().offset_from(src_str.as_ptr()) as u32 };


I ran this diff:

diff --git a/datafusion/functions/src/string/common.rs b/datafusion/functions/src/string/common.rs
index 4f70374b7..f796d10c2 100644
--- a/datafusion/functions/src/string/common.rs
+++ b/datafusion/functions/src/string/common.rs
@@ -204,10 +204,7 @@ fn trim_and_append_str<'a>(
     if let (Some(src_str), Some(characters)) = (src_str_opt, trim_characters_opt) {
         let trim_str = trim_func(src_str, characters);
-        // Safety:
-        // `trim_str` is computed from `str::trim_xxx_matches`,
-        // and its addr is ensured to be >= `origin_str`'s
-        let start = unsafe { trim_str.as_ptr().offset_from(src_str.as_ptr()) as u32 };
+        let start = (src_str.as_bytes().len() - trim_str.as_bytes().len()) as u32;
         make_and_append_view(views_buf, null_builder, raw, trim_str, start);
     } else {

And all tests passed.

alamb · 2024-09-23T17:52:00Z

How about we merge this PR and then you can continue work on the optimizations as follow on PRs?

Rachelint · 2024-09-23T18:02:34Z

> How about we merge this PR and then you can continue work on the optimizations as follow on PRs?

I am checking #12395 (comment).
Please give me a minute.

alamb · 2024-09-23T18:29:43Z

> > How about we merge this PR and then you can continue work on the optimizations as follow on PRs?

> I am checking about #12395 (comment) Just wait a minute for me.

Sure -- no worries -- we can wait too. I just feel bad about how long this PR has been outstanding

Hmm, some newly added tests seem to be failing

Rachelint · 2024-09-23T18:34:30Z

> > > How about we merge this PR and then you can continue work on the optimizations as follow on PRs?

> > I am checking about #12395 (comment) Just wait a minute for me.

> Sure -- no worries -- we can wait too. I just feel bad about how long this PR has been outstanding

Thanks. I added some cases to check it. Unfortunately, I found that we may not be able to remove the unsafe code currently.

alamb · 2024-09-23T18:54:26Z

> Thanks. I added some case to check it. And unfortunately, I found maybe we can't remove the unsafe codes currently.

I just can't explain why pointer arithmetic is needed -- I think it is important to fix (or really understand) before merging

Rachelint · 2024-09-23T19:07:14Z

> > Thanks. I added some case to check it. And unfortunately, I found maybe we can't remove the unsafe codes currently.

> I just can't explain why pointer arithmetic is needed -- I think it is important to fix (or really understand) before merging

Maybe the discussion in #12387 can help.

The logic of str::trim_xxx_matches is well explained by @Kev1n8:

> I've looked into the implementation of [general_trim](https://github.com/apache/datafusion/blob/f5c47fa274d53c1d524a1fb788d9a063bf5240ef/datafusion/functions/src/string/common.rs#L51), it uses the str::trim_xxx_matches methods to obtain the "substring". Furthermore, inside the str::trim_xxx_matches method, it first computes the [start, end) boundary and slices the str.

But unfortunately, the Pattern feature needed to get the start index in safe code is still unstable... @Kev1n8 mentioned that, too:

> The index here is useful for modifying views. Unfortunately, currently the feature Pattern it uses is unstable.

So for now, we can only get the start index through pointer arithmetic...

Update:
The unsafe code has been removed; we can actually get the needed index in a safe way.

Rachelint · 2024-09-23T19:11:57Z

I have filed issue #12597 to track the introduced unsafe code.

I also added a todo that mentions this issue, to track it and to explain why we introduce unsafe code here.

Rachelint · 2024-09-25T16:50:29Z

@alamb I found that we indeed don't need the unsafe pointer arithmetic to get the start_offset, and I have switched to a safe way here. Thanks a lot for the suggestion!

https://github.com/Rachelint/arrow-datafusion/blob/f8174626e47d147e90c6715f5052ccfa269f0493/datafusion/functions/src/string/common.rs#L80
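
A minimal sketch of how the start offset can be obtained in safe code, assuming the approach is to left-trim first and take the length difference at that point. The helper names `ltrim_with_offset` and `btrim_with_offset` are hypothetical and this is not necessarily the exact code at the linked commit. Note that a length difference taken after trimming the right side as well would no longer equal the start offset, which is why the offset is computed right after the left trim here.

```rust
/// Left-trim `input` and report the byte offset at which the result starts.
fn ltrim_with_offset<'a>(input: &'a str, pattern: &[char]) -> (&'a str, u32) {
    let trimmed = input.trim_start_matches(pattern);
    // `trimmed` is `input[start..]`, so start = len(input) - len(trimmed).
    let start = input.len() - trimmed.len();
    (trimmed, start as u32)
}

/// Trim both sides: the offset comes from the left trim only.
fn btrim_with_offset<'a>(input: &'a str, pattern: &[char]) -> (&'a str, u32) {
    let (left, start) = ltrim_with_offset(input, pattern);
    // Right-trimming only shortens the tail, so `start` is unchanged.
    (left.trim_end_matches(pattern), start)
}

fn main() {
    let pattern = ['x', ' '];
    let input = "xx hello world xx";
    let (out, start) = btrim_with_offset(input, &pattern);
    // The trimmed result is exactly the slice of `input` beginning at `start`.
    let begin = start as usize;
    assert_eq!(&input[begin..begin + out.len()], out);
    println!("{:?} starts at byte {}", out, start);
}
```

For a right trim alone, the start offset is simply 0, since only the tail is shortened.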

alamb

Thanks @Rachelint -- very nice. Thank you for sticking with it

I reran the benchmarks one more time and they look good to me. Nice work.

I merged up from main and removed some redundant tests and plan to merge this PR when it passes CI.

++ critcmp main string-view-trim
group                                                                                    main                                   string-view-trim
-----                                                                                    ----                                   ----------------
INPUT LEN <= 12/large_string [size=1024, len_before=12, len_after=8]                     1.03     42.9±0.05µs        ? ?/sec    1.00     41.7±0.06µs        ? ?/sec
INPUT LEN <= 12/large_string [size=4096, len_before=12, len_after=8]                     1.02    165.5±0.56µs        ? ?/sec    1.00    161.5±0.06µs        ? ?/sec
INPUT LEN <= 12/large_string [size=8192, len_before=12, len_after=8]                     1.02    327.8±0.19µs        ? ?/sec    1.00    320.2±0.19µs        ? ?/sec
INPUT LEN <= 12/string [size=1024, len_before=12, len_after=8]                           1.01     41.6±0.03µs        ? ?/sec    1.00     41.2±0.12µs        ? ?/sec
INPUT LEN <= 12/string [size=4096, len_before=12, len_after=8]                           1.01    160.3±0.11µs        ? ?/sec    1.00    159.4±0.09µs        ? ?/sec
INPUT LEN <= 12/string [size=8192, len_before=12, len_after=8]                           1.01    318.5±1.89µs        ? ?/sec    1.00    316.4±0.48µs        ? ?/sec
INPUT LEN <= 12/string_view [size=1024, len_before=12, len_after=8]                      1.06     41.6±0.01µs        ? ?/sec    1.00     39.4±0.02µs        ? ?/sec
INPUT LEN <= 12/string_view [size=4096, len_before=12, len_after=8]                      1.03    160.5±0.07µs        ? ?/sec    1.00    155.1±0.13µs        ? ?/sec
INPUT LEN <= 12/string_view [size=8192, len_before=12, len_after=8]                      1.03    318.0±0.20µs        ? ?/sec    1.00    309.3±1.15µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/large_string [size=1024, len_before=64, len_after=4]    1.10    184.3±0.27µs        ? ?/sec    1.00    167.4±0.10µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/large_string [size=4096, len_before=64, len_after=4]    1.09    727.0±0.67µs        ? ?/sec    1.00    665.8±0.41µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/large_string [size=8192, len_before=64, len_after=4]    1.09   1448.7±1.76µs        ? ?/sec    1.00   1329.6±1.77µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string [size=1024, len_before=64, len_after=4]          1.09    181.8±0.18µs        ? ?/sec    1.00    167.4±0.11µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string [size=4096, len_before=64, len_after=4]          1.09   725.8±11.87µs        ? ?/sec    1.00    664.9±0.62µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string [size=8192, len_before=64, len_after=4]          1.09   1443.8±2.64µs        ? ?/sec    1.00   1324.8±1.25µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string_view [size=1024, len_before=64, len_after=4]     1.11    181.8±0.11µs        ? ?/sec    1.00    163.8±0.22µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string_view [size=4096, len_before=64, len_after=4]     1.11    724.6±0.84µs        ? ?/sec    1.00    651.6±0.22µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN <= 12/string_view [size=8192, len_before=64, len_after=4]     1.11   1444.8±0.63µs        ? ?/sec    1.00   1302.7±0.98µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/large_string [size=1024, len_before=64, len_after=60]    1.01     44.4±0.12µs        ? ?/sec    1.00     44.0±0.07µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/large_string [size=4096, len_before=64, len_after=60]    1.01    171.7±0.30µs        ? ?/sec    1.00    170.4±0.15µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/large_string [size=8192, len_before=64, len_after=60]    1.01    349.8±0.83µs        ? ?/sec    1.00    347.1±0.53µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string [size=1024, len_before=64, len_after=60]          1.00     43.0±0.06µs        ? ?/sec    1.00     43.1±0.20µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string [size=4096, len_before=64, len_after=60]          1.00    166.2±0.21µs        ? ?/sec    1.01    167.5±0.32µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string [size=8192, len_before=64, len_after=60]          1.00    336.5±1.03µs        ? ?/sec    1.01    341.1±0.42µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string_view [size=1024, len_before=64, len_after=60]     1.08     43.8±0.17µs        ? ?/sec    1.00     40.6±0.03µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string_view [size=4096, len_before=64, len_after=60]     1.06    169.0±0.55µs        ? ?/sec    1.00    159.0±0.21µs        ? ?/sec
INPUT LEN > 12, OUTPUT LEN > 12/string_view [size=8192, len_before=64, len_after=60]     1.08    342.9±0.96µs        ? ?/sec    1.00    316.5±0.57µs        ? ?/sec

alamb · 2024-09-25T18:56:10Z

datafusion/sqllogictest/test_files/string/string_view.slt

@@ -982,5 +982,93 @@ logical_plan
 01)Projection: temp.column2 || temp.column3
 02)--TableScan: temp projection=[column2, column3]

+################################################


I double checked and @goldmedal 's recent changes (I think created after this PR) make these tests redundant

datafusion/datafusion/sqllogictest/test_files/string/string_view.slt

Line 56 in 91c8a47

include ./string_query.slt.part

Which then runs the tests in https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/string/string_query.slt.part

I removed these tests from this PR to keep things moving

alamb · 2024-09-25T20:54:31Z

🚀

github-actions bot added the functions label Sep 9, 2024

Rachelint mentioned this pull request Sep 9, 2024

Fix: substr() on StringView column's behavior is inconsistent with the old version #12383

Merged

Rachelint force-pushed the string-view-trim branch from f6c83bf to 325fac6 Compare September 9, 2024 17:18

Rachelint added 5 commits September 10, 2024 23:44

draft.

736eb11

add unit tests for xTrim.

a0da2d0

fix fmt.

3c8b035

tmp copy for ci.

06d104d

move make_and_append_view to common.

48cb4db

Rachelint force-pushed the string-view-trim branch from 325fac6 to 48cb4db Compare September 11, 2024 01:38

Rachelint added 2 commits September 11, 2024 21:08

fix sting view trim about the process of empty string.

863e9b7

fix compile.

36a8125

Rachelint marked this pull request as ready for review September 11, 2024 13:13

eliminate some repeated codes.

aa2c131

This was referenced Sep 11, 2024

DataFusion weekly project plan (Andrew Lamb) - Sep 9, 2024 #12391

Closed

DataFusion weekly project plan (Andrew Lamb) - Sep 16, 2024 #12494

Closed

alamb reviewed Sep 16, 2024

add sql test case about string view trim.

e3e9b53

github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Sep 17, 2024

Merge branch 'main' into string-view-trim

dbd0f25

Rachelint force-pushed the string-view-trim branch from 4e092d4 to dbd0f25 Compare September 17, 2024 18:45

remove unused imports.

6d5660f

Rachelint mentioned this pull request Sep 18, 2024

Improve benchmark for ltrim #12513

Merged

remove stale file.

840ec46

alamb mentioned this pull request Sep 23, 2024

Add into_builder methods for Arrays apache/arrow-rs#6430

Open

alamb added 2 commits September 23, 2024 13:39

Merge remote-tracking branch 'apache/main' into string-view-trim

307850a

Avoid unecessary unsafe

064450f

alamb previously approved these changes Sep 23, 2024

add unit test cases with a unlined string view output.

c2510de

Rachelint added 3 commits September 24, 2024 02:36

fix tests.

38790b2

improve comments.

20197d9

add todo and the related issue.

2112bc5

use the safe way to get start_offset after trimming.

790f7a9

github-actions bot added the physical-expr Physical Expressions label Sep 25, 2024

fix comments.

f817462

Rachelint and others added 3 commits September 26, 2024 01:22

Merge branch 'main' into string-view-trim

148a991

Remove redundant test

f9c1543

Merge remote-tracking branch 'apache/main' into string-view-trim

e39f916

github-actions bot removed the sqllogictest SQL Logic Tests (.slt) label Sep 25, 2024

alamb approved these changes Sep 25, 2024

alamb changed the title ~~Improve performance of trim for string view~~ Improve performance of trim for string view (10%) Sep 25, 2024

alamb mentioned this pull request Sep 25, 2024

Minor: improve documentation to StringView trim #12629

Merged

alamb merged commit dbfde67 into apache:main Sep 25, 2024
25 checks passed
