The code in question is in Apache Arrow C++ [1], and arrow-rs shows a similar phenomenon [2].
In short, when the size is guaranteed to be at most 12, GCC inlines the `memcpy` and `memset` calls,
but Clang does not optimize them. See the Godbolt link [3]. The problem still exists when `-ffreestanding` is enabled.
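A minimal sketch of the pattern being described, assuming a simplified 16-byte layout modeled on Arrow's binary view (a 4-byte length followed by up to 12 inline bytes). `InlineView` and `MakeInlineView` are hypothetical names, not Arrow's actual API, and `__builtin_unreachable()` (a GCC/Clang builtin) is used here only to state the "at most 12" guarantee to the optimizer. Per the report, GCC inlines both calls in code like this while Clang emits library calls; that is worth confirming on Godbolt for a given compiler version.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical, simplified view modeled on Arrow's 16-byte binary-view layout:
// a 4-byte length followed by up to 12 inline bytes.
struct InlineView {
  int32_t size;
  uint8_t data[12];
};

// Precondition: 0 <= size <= 12. Bytes after the payload are zeroed.
InlineView MakeInlineView(const void* src, int32_t size) {
  assert(size >= 0 && size <= 12);
  // Tell the optimizer about the bound, so it can see that both
  // copies below are at most 12 bytes long.
  if (size < 0 || size > 12) __builtin_unreachable();
  InlineView out;
  out.size = size;
  const std::size_t n = static_cast<std::size_t>(size);
  std::memcpy(out.data, src, n);                       // variable length, <= 12
  std::memset(out.data + n, 0, sizeof(out.data) - n);  // variable length, <= 12
  return out;
}
```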
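With AVX it should be possible to do masked moves for types whose size is a multiple of 4 bytes (AVX/AVX2 masked loads and stores such as `vmaskmovps` and `vpmaskmovd` operate on 32- and 64-bit lanes). With AVX-512 (AVX512BW) it would even work for byte-sized ones.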
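The masked-move idea can be sketched with AVX-512 intrinsics. This assumes an x86 target and GCC or Clang: `__attribute__((target(...)))` compiles one function for AVX512BW/VL, `__builtin_cpu_supports` selects it at runtime, and every other case falls back to `std::memcpy`. `SmallCopy` and `MaskedCopyAvx512` are hypothetical names, and the sketch has not been benchmarked against the code GCC inlines.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

// Copies n (0..16) bytes with one masked load and one masked store.
// Lanes whose mask bit is 0 are not loaded or stored, and faults on
// masked-off lanes are suppressed, so bytes past src + n cannot fault.
__attribute__((target("avx512bw,avx512vl")))
static void MaskedCopyAvx512(uint8_t* dst, const uint8_t* src, uint32_t n) {
  const __mmask16 mask = static_cast<__mmask16>((1u << n) - 1u);  // low n bits set
  const __m128i v = _mm_maskz_loadu_epi8(mask, src);
  _mm_mask_storeu_epi8(dst, mask, v);
}
#endif

// Hypothetical dispatcher: masked moves when the CPU supports them,
// plain memcpy otherwise.
void SmallCopy(uint8_t* dst, const uint8_t* src, uint32_t n) {
  assert(n <= 16);
#if defined(__x86_64__) || defined(__i386__)
  if (__builtin_cpu_supports("avx512bw") && __builtin_cpu_supports("avx512vl")) {
    MaskedCopyAvx512(dst, src, n);
    return;
  }
#endif
  std::memcpy(dst, src, n);
}
```

In real code the CPU check would normally be done once and cached rather than on every call.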
This comes up frequently when vectorizing code by splitting slices into fixed-size arrays: the tail then becomes a small copy whose length is only known at runtime.
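The chunking pattern from that comment looks roughly like this in C++ (a sketch with a hypothetical `SumBytes` function, not code from the thread): full chunks use a constant-size `memcpy`, while the tail needs a copy of `len % kChunk` bytes, which is the small variable-length case from the issue.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kChunk = 16;

// Sums one fixed-size chunk; the constant trip count makes this loop
// easy for the compiler to vectorize.
static uint32_t SumChunk(const std::array<uint8_t, kChunk>& chunk) {
  uint32_t acc = 0;
  for (uint8_t b : chunk) acc += b;
  return acc;
}

// Sums len bytes starting at data (data must be non-null).
uint32_t SumBytes(const uint8_t* data, std::size_t len) {
  uint32_t total = 0;
  std::size_t i = 0;
  std::array<uint8_t, kChunk> chunk;
  for (; i + kChunk <= len; i += kChunk) {
    std::memcpy(chunk.data(), data + i, kChunk);  // constant size
    total += SumChunk(chunk);
  }
  // Tail: fewer than kChunk bytes remain, a count known only at runtime.
  const std::size_t rem = len - i;
  assert(rem < kChunk);
  chunk.fill(0);                              // zero padding leaves the sum unchanged
  std::memcpy(chunk.data(), data + i, rem);   // small, variable-length copy
  total += SumChunk(chunk);
  return total;
}
```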
Would this be a problem? If it can be fixed with some compiler flags, which flag should I use?
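The thread does not name a flag that makes Clang inline these calls. One flag-independent workaround is to split the length into size classes and use only constant-size `memcpy` calls, with two overlapping copies per class. This is a sketch under that assumption: `CopyUpTo16` is a hypothetical helper, and whether Clang emits straight-line code for it should be checked on Godbolt. For Arrow's case the length is at most 12, which falls inside the helper's 0 to 16 range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies n bytes (0 <= n <= 16) using only fixed-size memcpy calls, which
// compilers generally lower to plain loads and stores. Two overlapping
// copies cover every length within a size class without a loop.
inline void CopyUpTo16(uint8_t* dst, const uint8_t* src, std::size_t n) {
  assert(n <= 16);
  if (n >= 8) {
    uint64_t lo, hi;  // bytes [0, 8) and [n - 8, n)
    std::memcpy(&lo, src, 8);
    std::memcpy(&hi, src + n - 8, 8);
    std::memcpy(dst, &lo, 8);
    std::memcpy(dst + n - 8, &hi, 8);
  } else if (n >= 4) {
    uint32_t lo, hi;  // bytes [0, 4) and [n - 4, n)
    std::memcpy(&lo, src, 4);
    std::memcpy(&hi, src + n - 4, 4);
    std::memcpy(dst, &lo, 4);
    std::memcpy(dst + n - 4, &hi, 4);
  } else if (n > 0) {
    // 1..3 bytes: first, middle and last byte (some indices may coincide).
    const uint8_t a = src[0], b = src[n / 2], c = src[n - 1];
    dst[0] = a;
    dst[n / 2] = b;
    dst[n - 1] = c;
  }
}
```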
[1] https://github.com/apache/arrow/blob/63b34c97c5d3ca6d20dacb9e92b404986f1d7d62/cpp/src/arrow/util/binary_view_util.h#L28
[2] apache/arrow-rs#6034
[3] https://godbolt.org/z/47T8s69xK