-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace jemalloc with mimalloc #92249
Conversation
(rust-highfive has picked a reviewer for you, use r? to override) |
This comment has been minimized.
This comment has been minimized.
@bors try |
⌛ Trying commit b90cfc8 with merge e7033aafc2d47cc9d3ebe315ec20e2b5a9e586da... |
Why replace jemalloc? |
@@ -582,9 +582,9 @@ changelog-seen = 2 | |||
# Map debuginfo paths to `/rust/$sha/...`, generally only set for releases | |||
#remap-debuginfo = false | |||
|
|||
# Link the compiler against `jemalloc`, where on Linux and OSX it should | |||
# Link the compiler against `mimalloc`, where on Linux and OSX it should |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use mimalloc on Windows too? It is created by MS so it should be supported, probably.
https://github.com/microsoft/mimalloc#override-on-windows
@NateLing It can be faster, probably. However, PR really lacks description yet. |
In my program (a web server based on hyper and tower), wrk test result could be 270000 rps +-, but mimalloc rps is 260000+- |
In the last PR the results were all green on perf.rlo. In general there is no best allocator and it depends on the type of the program, |
Did you disabled MiMallocs secure mode? It comes with cost and it is not necessary for correct Rust program. Also, MiMalloc works on Windows too so we would be able to get +1 platform with better allocator in compiler. |
☀️ Try build successful - checks-actions |
@AngelicosPhosphoros yep, i disabled MiMallocs secure mode. |
Assuming @dtolnay wanted a perf run: @rust-timer build e7033aafc2d47cc9d3ebe315ec20e2b5a9e586da Regarding windows support: I will investigate how to do this in the next few days. |
Queued e7033aafc2d47cc9d3ebe315ec20e2b5a9e586da with parent 59337cd, future comparison URL. |
Finished benchmarking commit (e7033aafc2d47cc9d3ebe315ec20e2b5a9e586da): comparison url. Summary: This change led to very large relevant improvements 🎉 in compiler performance.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf. @bors rollup=never |
FWIW it looks like the crates.io libmimalloc-sys is pretty out of date (2 years old), so while the perf results look good I'd suggest experimenting with a fork that brings mimalloc to a more recent release. In general though results look pretty promising! |
You should be looking at https://crates.io/crates/libmimalloc-sys which was updated just a month ago which I believe is the latest release. |
The perf results still show a 35% regression in memory usage. |
Seems to be microsoft/mimalloc#393. They have suggested using |
// registers itself with the allocator's zone APIs in a ctor. However, | ||
// the linker doesn't seem to consider ctors as "used" when statically | ||
// linking, so we need to explicitly depend on the function. | ||
#[cfg(target_os = "macos")] | ||
{ | ||
extern "C" { | ||
fn _rjem_je_zone_register(); | ||
fn _mi_macos_override_malloc(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the superseded PR, great to see this is progressing! Out of curiosity, I tried enabling debugs for libmimalloc-sys on macOS BigSur and I could not get the debug statistics increasing with env MIMALLOC_VERBOSE=1 MIMALLOC_SHOW_STATS=1 ./x.py build
with this commit:
...
heap stats: peak total freed current unit count
normal: 0 0 0 0 ok
huge: 0 0 0 0 ok
giant: 0 0 0 0 ok
total: 0 0 0 0 ok
malloc req: 0 0 0 0 ok
reserved: 0 0 0 0 ok
committed: 0 0 0 0 ok
reset: 0 0 0 0 ok
touched: 0 0 0 0 ok
segments: 0 0 0 0 ok
-abandoned: 0 0 0 0 ok
-cached: 0 0 0 0 ok
pages: 0 0 0 0 ok
-abandoned: 0 0 0 0 ok
-extended: 0
-noretire: 0
mmaps: 0
commits: 0
threads: 0 0 0 0 ok
searches: 0.0 avg
mimalloc: using 1 numa regions
numa nodes: 1
elapsed: 11.095 s
process: user: 11.671 s, system: 0.478 s, faults: 100, rss: 223.9 MiB
mimalloc: process done: 0x10df93e00
...
Maybe something more is missing? This is where I got stuck earlier. I can try again if someone has these working, maybe something related to my debug-enable? On Linux however, these do show values when override works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I did not test this in macOS because I don't have a machine accessible. Windows is not supported because static linking cannot be used with override. Not sure what the next steps are...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, @thomcc may be interested to give new ideas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have to enable stats when building or they aren't tracked, (see: https://github.com/microsoft/mimalloc/blob/master/include/mimalloc-types.h#L32-L33). I don't remember if theres support in the mimalloc (or -sys) crates directly, but setting env CFLAGS="-DMI_STAT=1"
or env CFLAGS="-DMI_STAT=2"
when building should make cc
do it, hopefully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thomcc Thanks, I checked this in detail. In my debug build I enable MI_DEBUG_FULL
in CMakeLists.txt
and this will enable MI_DEBUG=3
definition. This then again will set MI_STAT=2
https://github.com/microsoft/mimalloc/blob/master/include/mimalloc-types.h#L421-L427. I cannot easily see what could go wrong, but to be sure, I'll try with a separate forced setting to MI_STAT
.
If the statistics are valid, then mi_macos_override_malloc
seems not to be called, perhaps due to similar reasons jemalloc
originally did not work (#82423), where the linker is optimizing away some needed call paths. We would need to figure out what is still missing for mimalloc
. My skills need sharpening here, so any hints on how to do it best are welcome :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have your integration code published anywhere? If not I can take a stab, but would like to avoid redundant work.
I think the mi_macos_override_malloc stuff is requires a little bit of extra work. most of the documentatoin mimalloc has about what's required to override on macos is a bit dated (or wrong) too (interposing happens by default, flat namespace should not be required).
Also, the zone override is not intended to be used on its own, but that's fixable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thomcc Thank you for taking a look! My current changes are just testing and stats debug related, which we discussed above. I do not have anything else outside this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking more of a look, it's very unsurprising that you can't call _mi_macos_override_malloc
directly, it's a static private function. It's also only included when mimalloc is built as a dynamic library (there are various reasons for this, but I think the main one is that mimalloc attempts interposing, and prior to macOS 10.12, this could only be done as a dynamic lib with some DYLD_
environment vars defined).
There are at least a few issues with the way rustc will need to use it. Nothing that seems enormous, thankfully, I'll see if I can come up with a fixed version we can test out, and upstream if it works fine.
The biggest issue is probably that I don't have an easy way to test on older versions of macOS, so will be taking a stab in them. I also don't know how far back rustc supports for the host (I know libstd supports back to 10.7, though). Also, does rustc fork itself during execution? If so, that may complicate things a little, at least in terms of testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, does rustc fork itself during execution?
No. It only spawns a new thread to run the compilation on.
I don't know how I can move forward with this. Perhaps this could be left as an option? It uses more memory but is faster.. |
Rebase and bump mimalloc version? Maybe something new will be over that time. |
I think at this time we're unlikely to want to take a ~6-35% max-rss regression for the instruction count wins from adopting this PR. That said, I'd love to see this rebased on a more recent version of mimalloc to see if they've managed to improve the memory overhead since then. If mimalloc is just making a different tradeoff on the memory/performance curve, then I'm not sure it's something we're willing to go for, given that rustc already uses a large amount of memory and we get some amount of complaints about that. |
@fee1-dead: this was tested with libmimalloc-sys 0.1.23, which uses mimalloc 1.7.0. libmimalloc-sys 0.1.30 is now available, which uses mimalloc 1.7.6. That's not 2.0.x, and the release notes suggest that the difference between mimalloc 1.7.0 and 1.7.6 is mostly improved Mac support. Still, is it worth trying again? @octavonce, @thomcc: Is there any prospect of libmimalloc-sys gaining support for 2.0.x? |
It already has support for it (As I note in purpleprotocol/mimalloc_rust#62 (comment), the mimalloc_rust submodule point towards 2.0.6 -- might be even newer now). A big concern is that mimalloc's macOS override support is a lot less robust (and also a lot more dodgy) than jemalloc's. On other platforms it should be fine, though. |
But since this would be included as a static dependency to the compiler, and not overriding the OS allocator, I don't see how this is relevant in this case? Maybe try to test again with the latest version? |
That's not actually true, rustc needs to override malloc (as documented here and here). I've experimented with making it use |
One of the 2.x betas was tried in #92317 but we should probably try to have up-to-date numbers. |
@thomcc Oh, alright. Thank you for the explanation, I thought it was straightforward enough |
I think the API which allows using |
I have a prototype version that fixes the issues with mimalloc's zone integration I can dig out. It's late now and I have work tomorrow, but I'll try to resurrect this weekend. Please ping me if I do not, or feel free to DM me on zulip if anybody wants more information on the issues with mimalloc's current malloc_zone implementation (TLDR the handling of zone free is wrong, and the interposing is highly dubious). |
Thanks @thomcc. To elaborate a bit, I think some form of public |
I think it's implementation is at least as much of a problem than the privacy (it will behave very badly if any system APIs manage zones on their own, and I think we/LLVM are big enough that we probably need to worry about that). |
supersedes #81782