-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Builder: unsound reuse of buffers #690
Comments
`toLazyByteString :: Builder -> LazyByteString` had a race condition that could generate wrong results if two threads concurrently evaluated the result. This bug was introduced in haskell#581 (5c4d236) and first present in release 0.11.5.0 (as 0c030bb). Due to the use of `unsafeDupablePerformIO` for performance, it is critical that any IO actions executed when running a `Builder` can be interrupted or executed multiple times. In principle, filling a buffer is safe provided the buffer is used only once and the same bytes are written each time. However, `wrapChunk` in `buildStepToCIOS` would re-use a buffer in the trimming case after copying its contents to produce a new trimmed chunk. This is safe when run in a single thread, but if two threads simultaneously execute the code, one of them may still be copying the contents while the other starts overwriting the buffer. This patch fixes `wrapChunk` to unconditionally allocate a new buffer after trimming, rather than re-using the old buffer. This will presumably come at a slight performance cost for builders inserting many trimmed chunks.
In practice, I think the only practical ways for a client to guarantee this are:
Well, option 2 could have reasonable applications, but I'm not entirely sure about the merits of documenting that this is possible. It seems less dangerous to for such applications to use their own analogue of the |
`toLazyByteString :: Builder -> LazyByteString` had a race condition that could generate wrong results if two threads concurrently evaluated the result. This bug was introduced in #581 (5c4d236) and first present in release 0.11.5.0 (as 0c030bb). Due to the use of `unsafeDupablePerformIO` for performance, it is critical that any IO actions executed when running a `Builder` can be interrupted or executed multiple times. In principle, filling a buffer is safe provided the buffer is used only once and the same bytes are written each time. However, `wrapChunk` in `buildStepToCIOS` would re-use a buffer in the trimming case after copying its contents to produce a new trimmed chunk. This is safe when run in a single thread, but if two threads simultaneously execute the code, one of them may still be copying the contents while the other starts overwriting the buffer. This patch fixes `wrapChunk` to unconditionally allocate a new buffer after trimming, rather than re-using the old buffer. This will presumably come at a slight performance cost for builders inserting many trimmed chunks.
* Clarify documentation of 'customStrategy' based on #690 * Remove outdated comments on AllocationStrategy (These were not visible in the Haddock output anyway.)
`toLazyByteString :: Builder -> LazyByteString` had a race condition that could generate wrong results if two threads concurrently evaluated the result. This bug was introduced in #581 (5c4d236) and first present in release 0.11.5.0 (as 0c030bb). Due to the use of `unsafeDupablePerformIO` for performance, it is critical that any IO actions executed when running a `Builder` can be interrupted or executed multiple times. In principle, filling a buffer is safe provided the buffer is used only once and the same bytes are written each time. However, `wrapChunk` in `buildStepToCIOS` would re-use a buffer in the trimming case after copying its contents to produce a new trimmed chunk. This is safe when run in a single thread, but if two threads simultaneously execute the code, one of them may still be copying the contents while the other starts overwriting the buffer. This patch fixes `wrapChunk` to unconditionally allocate a new buffer after trimming, rather than re-using the old buffer. This will presumably come at a slight performance cost for builders inserting many trimmed chunks.
* Clarify documentation of 'customStrategy' based on #690 * Remove outdated comments on AllocationStrategy (These were not visible in the Haddock output anyway.)
`toLazyByteString :: Builder -> LazyByteString` had a race condition that could generate wrong results if two threads concurrently evaluated the result. This bug was introduced in #581 (5c4d236) and first present in release 0.11.5.0 (as 0c030bb). Due to the use of `unsafeDupablePerformIO` for performance, it is critical that any IO actions executed when running a `Builder` can be interrupted or executed multiple times. In principle, filling a buffer is safe provided the buffer is used only once and the same bytes are written each time. However, `wrapChunk` in `buildStepToCIOS` would re-use a buffer in the trimming case after copying its contents to produce a new trimmed chunk. This is safe when run in a single thread, but if two threads simultaneously execute the code, one of them may still be copying the contents while the other starts overwriting the buffer. This patch fixes `wrapChunk` to unconditionally allocate a new buffer after trimming, rather than re-using the old buffer. This will presumably come at a slight performance cost for builders inserting many trimmed chunks.
* Clarify documentation of 'customStrategy' based on #690 * Remove outdated comments on AllocationStrategy (These were not visible in the Haddock output anyway.)
This is meant to be both 0.11.5.4 and 0.12.2.0, but GitHub allows only one milestone. |
Summary
toLazyByteString :: Builder -> LazyByteString
has a race condition that may generate wrong results if two threads concurrently evaluate the result. This originally manifested itself in an application using Cassava, which would produce CSV files in which part of the file was overwritten with data from later in the file. The underlying cause appears to be an interaction betweenunsafeDupablePerformIO
and buffer reuse inbuildStepToCIOS
arising since version 0.11.5.0 (specifically #581).Reproducer
Since the race condition is non-deterministic, it is not entirely trivial to reproduce, but the following module races two threads parsing the same lazy bytestring (which should get the same result) and typically hits a "bad" case after a few iterations.
Full code
Diagnosis
Due to the use of
unsafeDupablePerformIO
for performance, it is critical that any IO actions executed when running aBuilder
can be interrupted or executed multiple times. In principle, filling a buffer is safe provided the buffer is used only once and the same bytes are written each time. However,wrapChunk
inbuildStepToCIOS
re-uses a buffer in the trimming case.More precisely, I believe what is happening is the following:
Builder
includes large chunks usinginsertChunk
that hit thetrim
case ofwrapChunk
.fpbuf
.fpbuf'
and starts copying bytes fromfpbuf
.fpbuf'
, copies it, yieldsbs
and then continues using the same buffer.fpbuf
with more data from later in the output.fpbuf
, so it copies bogus data from later in the output to earlier in (its version of) the output.It looks like the bug was introduced in #581 which is first present in 0.11.5.0 (see 5c4d236 and 0c030bb).
Solution
An easy fix is for the
trim
case ofwrapChunk
to unconditionally allocate a new buffer after trimming, rather than re-using the old buffer. This appears to fix the problem in my reproducer. (I think it is safe to use the same buffer if the chunk is actually empty, though.) I don't know how much impact this will have on performance, but presumably it will slightly reduce it for builders inserting many trimmed chunks.Replacing the
unsafeDupablePerformIO
calls withunsafePerformIO
should also fix the issue, but is likely to significantly impact performance (4ac4be0 claimed introducingunsafeDupablePerformIO
"increases the performance of bytestring chunk insertion by 20%.").Perhaps users who know that they will not evaluate the resulting bytestring in multiple threads concurrently could somehow indicate this and opt in to more aggressive buffer reuse, e.g. via a custom
AllocationStrategy
?I notice that using
customStrategy
forAllocationStrategy
can in principle choose to reuse buffers "if it can guarantee that this referentially transparent" (sic). It's not clear to me what the caller needs to guarantee, but this issue suggests that (amongst other things?) the caller must ensure that the resulting lazy bytestring is evaluated only by a single thread. Could we document this more clearly?The text was updated successfully, but these errors were encountered: