-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compute length at compile time for literal strings #191
Compute length at compile time for literal strings #191
Conversation
On gitlab, @bgamari asked why
In the previous comment, I presented the resulting Core for
If we switch to
Even though the lengths have been computed at compile time, GHC floats the call to But maybe that's not convincing enough. One might argue that that's just a bug in the optimizer, that really GHC should just do a better job optimizing code with unsound productions of
With
I won't bother showing the assembly since I'm no good at reading or analyzing it, but I can offer a hand-wavy approximation in the binary bloat that
The delta here is 4528 bytes. Divide by ten and round a little, and we can approximate that every There's another angle to that I don't have any hard numbers on, another way that
So with 1000 literal bytestrings that actually got used at some point in time as a program ran, you'd be paying 16KB that you didn't need to. And then maybe on top of that, during garbage-collection ... mumble mumble ... GC roots ... something something ... mutable list ... errrr ... longer pause times, maybe? My understanding of the garbage collector is not great. Perhaps someone else might be able to predict whether or not it would be reasonable to expect any improvement there. Hopefully, this has been somewhat convincing. I suspect that at least one of these three arguments (more Core optimization opportunities, small binaries, smaller heap at runtime) will resonate with most readers. I don't have any numbers from any real-world applications, but if that's important, I can try to come up with something. |
Have you considered combining this with (rebasing onto) #175 ? Ideally, both optimizations would happen, and one or the other may yield more benefits depending on the application, and the two should not a priori be in conflict. Perhaps #175 should happen first, #175 requires at least GHC 8.0 to provide compatibility in the Internal module, and this PR will require 8.10 or later, so it may be fine to start with #175 as a base. |
I've not rebased onto that because it's not clear to me that it will actually be merged. Both PRs improve performance, but #175 will break a lot of downstream dependencies, and progress appears to have stalled on it. I'm not convinced that it would be a good idea that it would be good to chain this PR to that one given that they are orthogonal, and given that this one may very well end up being merged first. |
It was my impression that #175 remains backwards-compatible provided pattern-synonyms (8.0) are available. If progress has stalled, it perhaps mostly reticence. If, as it seems, this PR requires 8.10 or later, when that is released, it would IMHO be a reasonable opportunity to drop support for GHC < 8.0, and then both could be merged? I do agree that the two are orthogonal in their feature set, and either or both could be adopted, but I really think that if we're focused on optimizing ByteString overhead it should be both. As for who goes first and who then needs to rebase, that's less clear, perhaps this PR does go first in the end, but it seemed to me like in the grand scheme of things, if I were merging both, I'd first merge #175 . |
I'm going to guard with CCP so that GHCs older than 8.12 can still build |
For the record, #175 looks compatible with 8.0, it is 7.10 and earlier support that would have to be dropped. Dropping 7.10 by the time 8.12 ships seems plausible... |
@andrewthad On a separate note, I was curious (ignorant) of how the proposed GHC MR handles string literals with embedded NULs. At first glance they are truncated unexpectedly, but I may be missing something that excludes such strings from the proposed string literal optimisation: |
What's the compatibility problem with 7.10 and earlier? |
The internal representation of ByteStrings changes (dropping the offset), which means that |
Ah, I had thought that you were pointing out that #175 itself wouldn't build with GHC <= 7.10. Sorry for the noise! |
No, #175 will build fine with GHC 7.x, just won't offer a backwards compatible FWIW, on a whim I rebuilt my DANE survey engine with ByteString patched a la #175 and GHC 8.8.2. This meant also rebuilding AttoParsec, Hasql, Conduit, DNS, TLS, ... all to use the new ByteStrings. Everything built cleanly and runs just fine. So #175 just works across a decent swath of the library ecosystem with 8.x. My code does not support 7.x, so I can't speak to what would break without attempting a backport that is likely not a good use of my time. The engine spends most of its time waiting for DNS replies from the network (~4k DNS qps, going much faster would likely trigger remote rate limits), so no visible change in throughput. Testing just a simple DNS RRset -> Builder codebase streaming out cooked DNS records, I see about a 3.5% reduction in runtime with #175, but this does not feel much memory pressure, it runs in a small constant space, so dramatic savings are not expected. Given all the other costs, a measurable ~3.5% is not bad at all. |
Having finally resolved my confusion about MR2165, I should say that I support this pull request, once the pre-requisite MR lands. Whether that would also be a good opportunity to revisit #175 is I guess a separate decision. FWIW, in the 2019 survey we see that GHC 7.x was used by just 4% of responders, and the accompanying commentary was:
So, as a separate matter, it is perhaps time to consider merging #175 at some point. Cc: @cartazio |
Add unsafePackLiteral to Data.ByteString.Internal. With GHC-8.10+, use known-key variant of C `strlen` from `GHC.CString` that supports constant folding. Also in GHC 8.10, another data constructor of ForeignPtrContents becomes available: LiteralPtr. For string literals, this is now used. It saves space when there are lots of literals, and it improves opportunities for case-of-known data constructor optimizations when a function scrutinizes the length of a ByteString.
cdc5d36
to
103727b
Compare
MR 2165 has landed in GHC. This PR is ready for final review. |
Good to see that nothing breaks tests with GHC 8.10.1 and earlier, but sadly we don't yet have tests for (yesterday's) GHC 8.11, so I am not sure this can be merged just yet (except of course by code inspection without tests). Speaking of tests, I've built GHC 8.11 for myself, and bytestring with this PR, but I'm having a bit of a struggle getting the dependencies needed for testing built... |
After a bunch of effort I got tests built, and ran into a problem. The |
@andrewthad What would it take to extend this optimization to |
Separately, when I built GHC-itself with the patched |
Good news for |
@sjakobi The best way to get better @vdukhovni I've addressed the requested changes. That GHC panic is a bummer, but I'm sure that'll get sorted out soon. |
Thanks for working on this stuff @andrewthad! :) I was curious about EDIT: The PR including these patches has some more info: dhall-lang/dhall-haskell#1810 |
Back to this MR, are you looking to have it merged into 0.10.10.1 (a bug-fix + CI + backlog of safe/backwards-compatible PRs) or just reviewed and approved for a later release? If releases of |
Co-authored-by: Simon Jakobi <[email protected]>
Looks good to me, but since the code depends on yet unreleased features of GHC, I would rather delay merging until GHC 9.0. |
Agreed. At the least, I think we should wait until there's a release candidate, but I don't mind waiting until an actual release either. |
While this did not land in 0.10.12.0 (makes sense), now that 0.11.0.0 is coming up, perhaps this is ready? |
Except for the change log merge conflict, this PR is ready to be merged. Let me know if you're ready to merge it into master, and if so, I'll resolve the conflict and squash. |
@andrewthad yes, please proceed. |
@andrewthad sorry for reminding, but I intend to cut a new release soon, ideally by the end of the week. Actually I can rebase your PR myself, if you are busy. |
Go ahead and do it. I tried to this morning but then got stuck because there’s a merge commit in there. |
Speaking of merge-commits. Can we please BAN THEM from this project. They make rebasing patches essentially impossible, and make it difficult to understand the commit history. There's a pull request for OpenSSL I've been unable to help the author to progress, because he updated it a few times by "merging", rather than rebasing, and now it is completely unusable. In the actual OpenSSL repository, merge commits are disallowed. The history is (correctly) forced to be linear. |
I've addressed the merge conflicts with yet another merge commit – I hope I got it right. In other repositories, I usually try to keep |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few copy-and-paste mistakes on my part. :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This got merged into master on 2020-08-25, and the necessary supporting features are going to be in GHC 9.0.1, but whether this is an oversight or an explicit choice, this MR is not (presently) in the I am guessing that at this point it may be too late to change that, but if not, or if there's something we should have done or should still do to facilitate its inclusion at the most appropriate time, I thought it may be sensible to bring this up for discussion... Should this have been handled differently? Is this MR deliberately postponed to 9.2? Just lack of cycles? ... |
@vdukhovni I was expecting this patch to be bundled with GHC 9.0.1 as a part of However, when we discussed using I think I have underestimated how long it would take to update the other core libraries for compatibility with |
Do you remember which packages were blockers? At this point both It would sure be great if at least the boot packages were mutually responsive to support ongoing evolution amongst their peers. Perhaps helping to coordinate this better is a worthy topic for the foundation to explore... |
I know that
Yeah, it would be great if the Haskell foundation could help with this. |
@vdukhovni You can check the progress regarding using |
Do not merge this unless MR 2165 lands.
Add unsafePackLiteral to Data.ByteString.Internal. With GHC 8.10+, use known-key variant of C
strlen
fromGHC.CString
that supports constant folding. Also in GHC 8.10, another data constructor of ForeignPtrContents becomes available: LiteralPtr. For string literals, this is now used. It saves space when there are lots of literals, and it improves opportunities for case-of-known data constructor optimizations when a function scrutinizes the length of aByteString
.This can result in massive optimization opportunities that were previously missed. The following example is contrived but illustrative:
After this PR, GHC is able to optimize the Core to this (I've omitted the irrelevant parts like the module name):
In
bytestring
today, this is what you get when you compile this code:In practice, I don't expect many users will experience the big win that shows up in the contrived example of
biggestStringLength
. I think the more common win (a smaller one) is going to be thatByteString
s that originate as literals result in less generated code. So, slightly smaller binaries, better opportunities for the the memory caches to be used effectively, the usual jazz. But maybe I'm wrong. Maybe there are a bunch of people that will get some scrutinize-a-known-int optimizations kicking in.