Binary decoding is slow #1804
I've also tried looking at the Core, but it's fairly impenetrable to me. @Gabriel439 do you have any other suggestions for what we could do except ask for help on the […]

@sjakobi: I ran into the same issue and I also don't know how to make progress […]

I've opened well-typed/cborg#236.
While looking at `replicateM`, which is currently implemented as:

```haskell
replicateM :: Int -> Decoder s a -> Decoder s [a]
replicateM n d = Vector.toList <$> Vector.replicateM n d
```

I've also just tried this version:

```haskell
replicateM :: Int -> Decoder s a -> Decoder s [a]
replicateM n d = go n
  where
    go 0 = return []
    go n = do
      !x <- d
      (x :) <$> go (n - 1)
```

That seems to improve speed in the range of 5-10%. PR incoming. Given the sheer size of […]
@sjakobi: Another thing that might further improve performance is eliminating the intermediate list. In many cases the list returned by […]
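The idea of eliminating the intermediate list can be sketched as follows. This is a hedged illustration, not dhall's API: `decodeSetOf` is a hypothetical name, and IO again stands in for the `Decoder` monad. Instead of decoding n elements into a list and then calling `Set.fromList`, each element is inserted into the accumulator as it is decoded.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.IORef
import qualified Data.Set as Set

-- Hypothetical sketch: decode n elements straight into a Set, skipping
-- the intermediate list that Set.fromList would otherwise consume.
decodeSetOf :: Ord a => Int -> IO a -> IO (Set.Set a)
decodeSetOf n d = go n Set.empty
  where
    go 0 !acc = return acc
    go k !acc = do
      x <- d
      go (k - 1) (Set.insert x acc)

main :: IO ()
main = do
  ref <- newIORef [3, 1, 2, 2 :: Int]
  let next = atomicModifyIORef' ref (\(x:xs) -> (xs, x))
  s <- decodeSetOf 4 next
  print (Set.toList s)
```

Note this trades the list allocation for insertion order: `Set.insert` costs O(log n) per element regardless of input order, whereas `Set.fromList` can hit its O(n) fast path on sorted input.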
@Gabriel439 Yeah. For […] Regarding […]
@sjakobi: Well, if it is an expression protected by an integrity check, we could rely on the input being sorted. The […]
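If the sortedness guarantee holds, `containers` already exposes a constructor that exploits it: `Data.Map.fromDistinctAscList` builds the map in O(n) without comparing keys, versus `fromList`'s O(n log n) in the general case. A minimal sketch:

```haskell
import qualified Data.Map.Strict as Map

main :: IO ()
main = do
  let kvs = [("a", 1), ("b", 2), ("c", 3)] :: [(String, Int)]
      -- General case: no ordering assumption, O(n log n).
      general = Map.fromList kvs
      -- If an integrity check guarantees the keys arrive strictly
      -- ascending, the map can be built in O(n) with no key comparisons.
      sorted = Map.fromDistinctAscList kvs
  print (general == sorted)
```

The caveat is in the name: `fromDistinctAscList` requires strictly ascending, duplicate-free keys, and silently builds a broken map otherwise, so it is only safe where the input's order is actually guaranteed.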
This reduces timings for decoding the cache for https://raw.githubusercontent.com/vmchale/cpkg/0003ec5af2859008424aec06c2db0437a6a22ff9/pkgs/pkg-set.dhall with `dhall decode --quiet` by 8.7% to 12.3%. Context: #1804
I've tried this. It seems to actually reduce performance for my […]
@sjakobi: Also, if the profile is not very informative, you can fix that by adding extra cost annotations to the code using the […] Also, I highly recommend using the […]
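Manual cost annotations in GHC are written with the `SCC` pragma; the names given there show up in the `.prof` report when the program is built with profiling, which can break up an otherwise flat profile. A minimal sketch (the function is a stand-in, not dhall code):

```haskell
-- Manually placed cost centres: "sum_part" and "len_part" appear as
-- separate rows in the profiling report, attributing time and allocation
-- to each subexpression. Without -prof the pragmas are simply ignored.
costly :: [Int] -> Int
costly xs = {-# SCC "sum_part" #-} sum xs
          + {-# SCC "len_part" #-} length xs

main :: IO ()
main = print (costly [1, 2, 3])
```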
I've experimented a bit with the map decoding (the branch is here), but haven't seen any substantial speed-ups yet. I suspect that maybe benchmarking this stuff with […] I'll try some more profiling too, although I do wonder how the profiling might affect the structure of the generated code and its optimizations.
Profiling with the cost centres added in #1808 reveals what kind of expressions we spend most decoding time on:

* Prelude: […]
* cpkg: […]

Note that for […]

We could try optimizing this by […]
I also wonder why the builtins are encoded in this string format at all. It would be much cheaper if they were encoded with simple integer tags like the operators. Regarding the other strings, for variables, map keys, etc. we'd probably also profit from switching to UTF-8, for example by using […] For […]
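The integer-tag idea can be sketched as below. This is purely illustrative: the `Builtin` constructors are real dhall builtins, but the tag values are made up here, and the actual encoding uses strings like `"Natural/build"`.

```haskell
-- Hypothetical sketch: decoding builtins from small integer tags, the
-- way operators are encoded, instead of comparing decoded strings.
-- A tag lookup is a single bounds check and jump, with no string
-- allocation or byte-by-byte comparison.
data Builtin = NaturalBuild | NaturalFold | Sort
  deriving (Show, Eq)

decodeBuiltinTag :: Int -> Maybe Builtin
decodeBuiltinTag 0 = Just NaturalBuild
decodeBuiltinTag 1 = Just NaturalFold
decodeBuiltinTag 2 = Just Sort
decodeBuiltinTag _ = Nothing

main :: IO ()
main = print (map decodeBuiltinTag [0, 2, 99])
```

The trade-off is that integer tags would be a breaking change to the binary format, whereas the string encoding keeps builtins interchangeable with free variables.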
@sjakobi: The original rationale for encoding built-ins in that way was to treat them essentially the same as free variables. I agree that not decoding built-ins to UTF8 is probably a performance win […]
This seems to speed up decoding of Prelude and cpkg caches by 1-2%. Context: #1804.
* Use decodeUtf8ByteArray to avoid UTF16-encoding the scrutinee.
* Optimize the pattern matching by grouping the patterns by length.

GHC currently doesn't produce static length information for string literals. Consequently the pattern matching worked somewhat like this:

```haskell
s <- decodeString
let len_s = length s
if len_s == length "Natural/build" && sameBytes s "Natural/build"
    then return NaturalBuild
    else if len_s == length "Natural/fold" && sameBytes s "Natural/fold"
    ...
```

Decoding `Sort`, the most extreme case, would involve a total of 32 conditional jumps as a consequence of length comparisons alone. Judging by the Core, we can get that number down to 8 by grouping the patterns by length: one to check the length of the decoded string, and (unfortunately) still one each for the 7 candidate literals of length 4. The number of string content comparisons should be unchanged.

The result of these optimizations is that the time to decode the cache for cpkg is reduced by 7-9%. Decoding time for the Prelude goes down by 13-16%.

This also changes the builtin encoding to use encodeUtf8ByteArray in order to avoid UTF16-encoding and decoding the builtin strings. I didn't check the performance implications though.

Context: #1804.
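The length-grouping transformation described in that commit can be sketched with plain `String` and `case`/guards (the real code works on decoded byte arrays, and the `Builtin` type here is illustrative, though the builtin names are real):

```haskell
-- Sketch of grouping string patterns by length before comparing
-- contents: a single length dispatch replaces the per-candidate length
-- checks, and only candidates of the matching length are compared
-- character by character.
data Builtin = NaturalBuild | NaturalFold | Sort | Text'
  deriving (Show, Eq)

decodeBuiltin :: String -> Maybe Builtin
decodeBuiltin s = case length s of
  4  | s == "Sort"          -> Just Sort
     | s == "Text"          -> Just Text'
  12 | s == "Natural/fold"  -> Just NaturalFold
  13 | s == "Natural/build" -> Just NaturalBuild
  _                         -> Nothing

main :: IO ()
main = print (map decodeBuiltin ["Sort", "Natural/build", "Nope"])
```

When all guards on a length alternative fail, evaluation falls through to the final wildcard, so unknown names of a known length still decode to `Nothing`.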
In my test (which only involved […]
There have been a few reports of slow binary decoding performance on Discourse recently:
In general, people seem to get only about 10 MB/s.
I've tried profiling `dhall decode` with the new `--quiet` option, but the output only points at two functions from `cborg`: […]

Things to investigate: […]
Things to try:

* `Map` and `Set` deserialization in the style of haskell/containers#405 (Deserializing maps and sets)