Make safe/unsafe foreign calls size dependent #282

WJWH · 2020-09-09T16:41:21Z

In https://github.com/haskell/bytestring/blob/master/Data/ByteString/Internal.hs#L747, we import memcpy as an unsafe FFI call, which means that it runs "inline" and blocks all other Haskell threads running on the same capability in the threaded runtime. If the ByteString being copied is very large, this might block those threads for longer than is desired.

In the crypto libraries, the same C function is imported both safe and unsafe, and the code switches between the two imports depending on the size of the input. This trick might be useful here as well.
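A minimal sketch of that trick applied to memcpy, not the actual bytestring code. The names `copyBytesWithCutoff`, `c_memcpy_unsafe`, `c_memcpy_safe` and `demoRoundTrip` are hypothetical, and the cutoff is left as a parameter because no value has been agreed on. It assumes both buffers are pinned (as ByteString payloads are), since the garbage collector can run during a safe call.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Control.Monad (void)
import Data.Word (Word8)
import Foreign.C.Types (CSize (..))
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Marshal.Array (peekArray, pokeArray)
import Foreign.Ptr (Ptr)

-- The same C function imported twice; only the call safety differs.
foreign import ccall unsafe "string.h memcpy"
  c_memcpy_unsafe :: Ptr Word8 -> Ptr Word8 -> CSize -> IO (Ptr Word8)

foreign import ccall safe "string.h memcpy"
  c_memcpy_safe :: Ptr Word8 -> Ptr Word8 -> CSize -> IO (Ptr Word8)

-- | Copy @n@ bytes from @src@ to @dst@ (hypothetical helper).
-- Below the cutoff the cheap unsafe call is used; at or above it the safe
-- call is used, so other Haskell threads can keep running during the copy.
-- Both pointers must refer to pinned or malloc'd memory.
copyBytesWithCutoff :: Int -> Ptr Word8 -> Ptr Word8 -> Int -> IO ()
copyBytesWithCutoff cutoff dst src n
  | n < cutoff = void (c_memcpy_unsafe dst src (fromIntegral n))
  | otherwise  = void (c_memcpy_safe dst src (fromIntegral n))

-- | Small usage check: copy a known pattern and compare the result.
-- allocaBytes hands out pinned memory, so the safe path is fine here too.
demoRoundTrip :: Int -> Int -> IO Bool
demoRoundTrip cutoff n =
  allocaBytes n $ \src ->
    allocaBytes n $ \dst -> do
      let payload = take n (cycle [0 .. 255]) :: [Word8]
      pokeArray src payload
      copyBytesWithCutoff cutoff dst src n
      copied <- peekArray n dst
      pure (copied == payload)
```

Loaded in GHCi, `demoRoundTrip 1024 100` exercises the unsafe path and `demoRoundTrip 1024 4096` the safe one. The only cost added to small copies is the size comparison.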

I'm interested to hear your opinions about this; I can pick it up if this is something desired.

vdukhovni · 2020-09-09T18:42:49Z

One might say that, for large enough buffers, performance-sensitive applications should not be using strict bytestrings, and should use lazy (chunked) ones or streams instead.

I am not sure that we should introduce automatic switching based on size, but we could perhaps leave the choice to the caller. If we were to provide a switch, the cutoff buffer size should probably be larger than with SHA and the like, because copying data is much cheaper than computing a cryptographic hash.

WJWH · 2020-09-09T19:15:54Z

I agree that high performance applications probably should use something other than large strict bytestrings, and the documentation does already mention that it'll take O(N) time. It doesn't mention that (for example) cons on a strict bytestring will block all other Haskell threads on the capability in multithreaded programs, and this is probably not what most users would expect.

Apparently the difference between safe and unsafe is only a few hundred nanoseconds, so we could set the switchover value suitably high that the safe path would only be called when the overhead is less than a few percent?
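To make that estimate concrete, here is a rough back-of-the-envelope sketch. If a safe call costs a fixed extra `o` nanoseconds and the copy runs at `B` bytes per nanosecond, copying `N` bytes takes `N / B` ns, so the relative overhead is `o * B / N`, and keeping it below a fraction `f` needs `N >= o * B / f`. The function name and the sample inputs (300 ns overhead, 10 bytes/ns copy rate, 1% budget) are assumptions for illustration, not measurements.

```haskell
-- | Smallest copy size in bytes at which a fixed per-call overhead stays
-- at or below the given fraction of the copy time (hypothetical helper).
minBytesForOverhead
  :: Double -- ^ extra cost of the safe call, in nanoseconds
  -> Double -- ^ copy rate, in bytes per nanosecond
  -> Double -- ^ largest acceptable overhead, as a fraction of copy time
  -> Int
minBytesForOverhead overheadNs bytesPerNs maxFraction =
  ceiling (overheadNs * bytesPerNs / maxFraction)

-- Illustrative inputs only: 300 ns overhead, 10 bytes/ns, 1% budget.
exampleCutoff :: Int
exampleCutoff = minBytesForOverhead 300 10 0.01
```

With those assumed inputs the cutoff comes out at roughly 300 kB, so a threshold in the megabyte range would leave a comfortable margin. The real figure depends on the measured overhead and memory bandwidth of the target platform.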

vdukhovni · 2020-09-09T19:52:57Z

Right, I'm just hesitant to assume that the cutoff is comparable on all platforms, so would be fairly conservative with the buffer size (>= 128k? More?) to be sure that we're not too strongly biased to just X86_64 platforms.

Do you have measurements of how long it takes to copy N bytes for a few values of N on X86 and perhaps ARM? How big are the samples that motivated this issue?
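One way to collect such numbers is sketched below, using only `base`. The names `timeCopyNs`, `sampleSizes` and `report` are hypothetical. It times a single `copyBytes` per size, so the readings are noisy; a proper benchmark harness would repeat each measurement. It measures the copy itself, not the difference between safe and unsafe calls.

```haskell
import Control.Exception (bracket)
import Data.Word (Word64, Word8)
import Foreign.Marshal.Alloc (free, mallocBytes)
import Foreign.Marshal.Utils (copyBytes, fillBytes)
import Foreign.Ptr (Ptr)
import GHC.Clock (getMonotonicTimeNSec)

-- | Time one copy of @n@ bytes between two malloc'd buffers, in nanoseconds.
timeCopyNs :: Int -> IO Word64
timeCopyNs n =
  bracket (mallocBytes n) free $ \src ->
    bracket (mallocBytes n) free $ \dst -> do
      -- Write to both buffers first so page faults are not timed.
      fillBytes (src :: Ptr Word8) 0xAB n
      fillBytes (dst :: Ptr Word8) 0x00 n
      start <- getMonotonicTimeNSec
      copyBytes dst src n
      end <- getMonotonicTimeNSec
      pure (end - start)

-- | Sizes from 1 KiB to 32 MiB, in steps of 8x (an arbitrary choice).
sampleSizes :: [Int]
sampleSizes = [2 ^ k | k <- [10, 13 .. 25 :: Int]]

-- | Print one line per sample size.
report :: IO ()
report = mapM_ line sampleSizes
  where
    line n = do
      ns <- timeCopyNs n
      putStrLn (show n ++ " bytes: " ++ show ns ++ " ns")
```

Running `report` on both an X86 and an ARM machine would give a first idea of how far the copy times diverge across platforms.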

vdukhovni · 2020-09-09T19:54:42Z

Also what is the cost of moving all the other threads off the capability so they get a chance elsewhere while the copy is happening? How does that scale with the number of threads?

WJWH · 2020-09-09T20:00:31Z

I was thinking more >512 MB or something like that 😅. The cutoff in the crypto example is really very low since hashing is so CPU intensive even for fairly small chunks of data. For memcmp and memcpy I'd expect the cutoff to be much higher. To me, it's not so much about squeezing every last bit of performance out of this, just preventing some worst-case scenarios when people are accidentally being stupid.

vdukhovni · 2020-09-09T20:19:12Z

OK, so you're on board for a fairly conservative cutoff, perhaps not quite that high, but some multiple MB sounds about right. We'll have to see what others think of this idea. I'm not opposed, though it will cost an extra branch on every dispatch; I don't know whether that's ever an issue...

Bodigrim · 2020-09-10T20:37:21Z

memcpy is used in several functions. I would like to see an analysis of which of these functions are mostly used with smaller inputs and which with larger ones, as well as benchmarks of the performance costs of branching and of safe calls.

hsyl20 · 2020-09-17T14:36:59Z

We should really add memcpy/memset/memmove primops into GHC. GHC already has thresholds to inline internal calls (see https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/CmmToAsm/X86/CodeGen.hs#L2194 for memcpy on X86). It could insert unsafe/safe calls with new thresholds if needed.

vdukhovni · 2020-09-17T19:05:55Z

> We should really add memcpy/memset/memmove primops into GHC.

Indeed, see #274

Bodigrim mentioned this issue Apr 2, 2021

Should C functions be called FFI-safely on big inputs? haskell/text#327

Open
