memcpy/memset not inlined when __builtin_assume guarantees the data is short enough #106895

mapleFU · 2024-09-01T09:07:25Z

The code below is from Apache Arrow C++ [1]. arrow-rs shows a similar phenomenon [2].

In short, when size is guaranteed to be less than or equal to 12, GCC inlines the memcpy and memset,
but Clang doesn't optimize this. See the godbolt link [3]. The problem still exists when -ffreestanding is enabled.

c_type makeInline1(const char* data, int32_t size) {
  ARROW_COMPILER_ASSUME(size <= kInlineSize); // __builtin_assume
  c_type out;
  out.inlined = {size, {}};
  // memcpy of 0 to 12 bytes
  memcpy(&out.inlined.data, data, size);
  return out;
}
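One source-level workaround, not proposed in the thread and offered only as a sketch, is to avoid the variable-length memcpy altogether: split the 0..12 range into a few cases and cover each case with two overlapping constant-size copies, which GCC and Clang normally lower to plain loads and stores. The helper name copy_le12 is hypothetical and not part of Arrow.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of Arrow): copies n bytes, 0 <= n <= 12,
// using only constant-size memcpy calls and single-byte moves. Two
// overlapping fixed-size copies cover every length within a range, so no
// variable-length memcpy call remains for the compiler to lower.
inline void copy_le12(void* dst_v, const void* src_v, std::int32_t n) {
  assert(n >= 0 && n <= 12);
  auto* dst = static_cast<unsigned char*>(dst_v);
  const auto* src = static_cast<const unsigned char*>(src_v);
  if (n >= 8) {         // 8..12 bytes: first 8 bytes + last 4 bytes
    std::memcpy(dst, src, 8);
    std::memcpy(dst + n - 4, src + n - 4, 4);
  } else if (n >= 4) {  // 4..7 bytes: first 4 bytes + last 4 bytes
    std::memcpy(dst, src, 4);
    std::memcpy(dst + n - 4, src + n - 4, 4);
  } else if (n > 0) {   // 1..3 bytes: first, middle and last byte
    dst[0] = src[0];
    dst[n / 2] = src[n / 2];
    dst[n - 1] = src[n - 1];
  }
}
```

In makeInline1 this would presumably replace the memcpy line with copy_le12(&out.inlined.data, data, size). No byte outside [0, n) is read or written, so it is safe for inputs shorter than 12 bytes.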

Is this a problem? If it can be fixed with some compiler flag, which flag should I use?

[1] https://github.com/apache/arrow/blob/63b34c97c5d3ca6d20dacb9e92b404986f1d7d62/cpp/src/arrow/util/binary_view_util.h#L28
[2] apache/arrow-rs#6034
[3] https://godbolt.org/z/47T8s69xK


the8472 · 2024-09-01T14:28:35Z

Yeah, we often see this in Rust: short but variable-length memcpys don't optimize well.

https://rust.godbolt.org/z/4P9xxeYsc

With AVX it should be possible to do masked moves for multiple-of-word-size types. On AVX512 it would even work for byte-sized ones.
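The masked-move idea can be sketched by hand with AVX2 intrinsics. This is only an illustration under assumptions: it uses _mm256_maskload_epi32 / _mm256_maskstore_epi32 (vpmaskmovd) for 32-bit elements, the GCC/Clang target("avx2") attribute, and __builtin_cpu_supports for runtime dispatch; it is not what LLVM currently emits, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

// Copies n (0..8) int32_t elements with one AVX2 masked load and one masked
// store. Lanes whose mask bit is clear are neither read nor written, so no
// memory past src[n-1] / dst[n-1] is touched.
__attribute__((target("avx2")))
static void masked_copy_i32_avx2(std::int32_t* dst, const std::int32_t* src,
                                 std::size_t n) {
  // Lane i is enabled (all bits set) exactly when i < n.
  const __m256i lane = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
  const __m256i mask =
      _mm256_cmpgt_epi32(_mm256_set1_epi32(static_cast<int>(n)), lane);
  const __m256i v =
      _mm256_maskload_epi32(reinterpret_cast<const int*>(src), mask);
  _mm256_maskstore_epi32(reinterpret_cast<int*>(dst), mask, v);
}
#endif

// Uses the masked AVX2 path when the CPU supports it, else a scalar loop.
void masked_copy_i32(std::int32_t* dst, const std::int32_t* src,
                     std::size_t n) {
  assert(n <= 8);
#if defined(__x86_64__) || defined(__i386__)
  if (__builtin_cpu_supports("avx2")) {
    masked_copy_i32_avx2(dst, src, n);
    return;
  }
#endif
  for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
}
```

Masking here is per 32-bit element, which matches the "multiple-of-word-size" restriction above; byte-granular masked loads and stores need AVX-512BW.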

This frequently comes up when attempting to vectorize code by chunking slices into arrays. The variable-length tail then ends up being a small but variable-length copy.
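The chunking pattern described above looks roughly like this in C++ (a sketch; the chunk width kChunk and the byte-summing body are hypothetical stand-ins for real vectorized work). Full chunks use a constant-size memcpy, while the tail needs a copy whose length is only known at run time to be below kChunk.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kChunk = 8;  // hypothetical chunk width

// Sums bytes by splitting the input into fixed-size chunks so that the loop
// body always works on a std::array of known size.
std::uint64_t sum_bytes_chunked(const std::uint8_t* data, std::size_t len) {
  std::uint64_t total = 0;
  std::size_t i = 0;
  for (; i + kChunk <= len; i += kChunk) {
    std::array<std::uint8_t, kChunk> chunk;
    std::memcpy(chunk.data(), data + i, kChunk);  // constant-size copy
    for (std::uint8_t b : chunk) total += b;
  }
  // Tail: zero-pad into a full chunk so the same fixed-width body applies.
  const std::size_t rest = len - i;
  assert(rest < kChunk);  // the bound a compiler could exploit
  std::array<std::uint8_t, kChunk> tail{};
  if (rest > 0) {
    std::memcpy(tail.data(), data + i, rest);  // small, variable-length copy
  }
  for (std::uint8_t b : tail) total += b;
  return total;
}
```

The tail memcpy is the same shape as the one in makeInline1: its length is bounded by a small constant, yet it is not a compile-time constant.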

dtcxzyw added llvm:optimizations missed-optimization labels Sep 1, 2024

dtcxzyw self-assigned this Sep 1, 2024

mapleFU mentioned this issue Sep 1, 2024

Improve performance of constructing ByteViews for small strings apache/arrow-rs#6034

Closed
