You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi! Thanks for the awesome work here. I've been tinkering with it for the past couple of evenings.
I've been testing for potential performance improvements by switching from std::vector & your allocator to using blocks of memory allocated on the stack (via alloca). I'm seeing a 25%-ish improvement for short strings, but about 25%-ish degradation for longer strings. I'm wondering if you have any ideas why that might be?
Also, I haven't yet done a proper multi-thread test. I suspect the contention on _mm_malloc will yield a higher performance improvement when using alloca, since the threads don't need to sync on heap allcoation. If I get some time next week I will try to do that and I'd love to hear what you think of this approach.
The changes I made to achieve the stack allocation are here... NickStrupat@58ee960
The text was updated successfully, but these errors were encountered:
I like the idea, but switching to alloca unconditionally might not be ideal, because it’s completely possible to bring things to a stack overflow using a single call that way…
What we do in Node.js core a lot is using a fixed-size stack-allocated buffer for small strings, and deferring to a larger, heap-allocated buffer for longer ones.
Also, this library is generally better at handling long strings – for short ones, vectorization doesn’t really pay out, and there’s the additional overhead of JS → C++ calls.
Hi! Thanks for the awesome work here. I've been tinkering with it for the past couple of evenings.
I've been testing for potential performance improvements by switching from
std::vector
& your allocator to using blocks of memory allocated on the stack (viaalloca
). I'm seeing a 25%-ish improvement for short strings, but about 25%-ish degradation for longer strings. I'm wondering if you have any ideas why that might be?Also, I haven't yet done a proper multi-thread test. I suspect the contention on
_mm_malloc
will yield a higher performance improvement when usingalloca
, since the threads don't need to sync on heap allcoation. If I get some time next week I will try to do that and I'd love to hear what you think of this approach.The changes I made to achieve the stack allocation are here...
NickStrupat@58ee960
The text was updated successfully, but these errors were encountered: