Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace jemalloc with mimalloc #92249

Closed
wants to merge 1 commit into from

Conversation

fee1-dead
Copy link
Member

@fee1-dead fee1-dead commented Dec 24, 2021

supersedes #81782

@rustbot rustbot added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. labels Dec 24, 2021
@rust-highfive
Copy link
Collaborator

r? @matthewjasper

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 24, 2021
@rust-log-analyzer

This comment has been minimized.

@dtolnay
Copy link
Member

dtolnay commented Dec 24, 2021

@bors try

@bors
Copy link
Contributor

bors commented Dec 24, 2021

⌛ Trying commit b90cfc8 with merge e7033aafc2d47cc9d3ebe315ec20e2b5a9e586da...

@takkuumi
Copy link

Why replace jemalloc?

@@ -582,9 +582,9 @@ changelog-seen = 2
# Map debuginfo paths to `/rust/$sha/...`, generally only set for releases
#remap-debuginfo = false

# Link the compiler against `jemalloc`, where on Linux and OSX it should
# Link the compiler against `mimalloc`, where on Linux and OSX it should
Copy link
Contributor

@AngelicosPhosphoros AngelicosPhosphoros Dec 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use mimalloc on Windows too? It is created by MS so it should be supported, probably.

https://github.com/microsoft/mimalloc#override-on-windows

@AngelicosPhosphoros
Copy link
Contributor

@NateLing It can be faster, probably.

However, PR really lacks description yet.

@takkuumi
Copy link

@NateLing It can be faster, probably.

However, PR really lacks description yet.

In my program (a web server based on hyper and tower), wrk test result could be 270000 rps +-, but mimalloc rps is 260000+-

@fee1-dead
Copy link
Member Author

In my program (a web server based on hyper and tower), wrk test result could be 270000 rps +-, but mimalloc rps is 260000+-

In the last PR the results were all green on perf.rlo. In general there is no best allocator and it depends on the type of the program,

@AngelicosPhosphoros
Copy link
Contributor

AngelicosPhosphoros commented Dec 24, 2021

In my program (a web server based on hyper and tower), wrk test result could be 270000 rps +-, but mimalloc rps is 260000+-

Did you disabled MiMallocs secure mode? It comes with cost and it is not necessary for correct Rust program.

Also, MiMalloc works on Windows too so we would be able to get +1 platform with better allocator in compiler.

@bors
Copy link
Contributor

bors commented Dec 24, 2021

☀️ Try build successful - checks-actions
Build commit: e7033aafc2d47cc9d3ebe315ec20e2b5a9e586da (e7033aafc2d47cc9d3ebe315ec20e2b5a9e586da)

@takkuumi
Copy link

@AngelicosPhosphoros yep, i disabled MiMallocs secure mode.
@fee1-dead i agree you. i dont care about 10000 rps. i think there could be a guide doc, so user can choose the right malloc in rust program.

@fee1-dead
Copy link
Member Author

Assuming @dtolnay wanted a perf run:

@rust-timer build e7033aafc2d47cc9d3ebe315ec20e2b5a9e586da

Regarding windows support: I will investigate how to do this in the next few days.

@rust-timer
Copy link
Collaborator

Queued e7033aafc2d47cc9d3ebe315ec20e2b5a9e586da with parent 59337cd, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (e7033aafc2d47cc9d3ebe315ec20e2b5a9e586da): comparison url.

Summary: This change led to very large relevant improvements 🎉 in compiler performance.

  • Very large improvement in instruction counts (up to -10.3% on incr-unchanged builds of deep-vector)

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

@Mark-Simulacrum
Copy link
Member

FWIW it looks like the crates.io libmimalloc-sys is pretty out of date (2 years old), so while the perf results look good I'd suggest experimenting with a fork that brings mimalloc to a more recent release.

In general though results look pretty promising!

@fee1-dead
Copy link
Member Author

it looks like the crates.io libmimalloc-sys is pretty out of date (2 years old)

You should be looking at https://crates.io/crates/libmimalloc-sys which was updated just a month ago which I believe is the latest release.

@nikic
Copy link
Contributor

nikic commented Dec 26, 2021

The perf results still show a 35% regression in memory usage.

@fee1-dead
Copy link
Member Author

fee1-dead commented Dec 26, 2021

The perf results still show a 35% regression in memory usage.

Seems to be microsoft/mimalloc#393. They have suggested using v2.x but those versions are currently in beta and AFAIK there is no crate that currently package that major version. I'm going to make a fork and test it out.

// registers itself with the allocator's zone APIs in a ctor. However,
// the linker doesn't seem to consider ctors as "used" when statically
// linking, so we need to explicitly depend on the function.
#[cfg(target_os = "macos")]
{
extern "C" {
fn _rjem_je_zone_register();
fn _mi_macos_override_malloc();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the superseded PR, great to see this is progressing! Out of curiosity, I tried enabling debugs for libmimalloc-sys on macOS BigSur and I could not get the debug statistics increasing with env MIMALLOC_VERBOSE=1 MIMALLOC_SHOW_STATS=1 ./x.py build with this commit:

...
heap stats:    peak      total      freed    current       unit      count   
    normal:      0          0          0          0                            ok
      huge:      0          0          0          0                            ok
     giant:      0          0          0          0                            ok
     total:      0          0          0          0                            ok
malloc req:      0          0          0          0                            ok

  reserved:      0          0          0          0                            ok
 committed:      0          0          0          0                            ok
     reset:      0          0          0          0                            ok
   touched:      0          0          0          0                            ok
  segments:      0          0          0          0                            ok
-abandoned:      0          0          0          0                            ok
   -cached:      0          0          0          0                            ok
     pages:      0          0          0          0                            ok
-abandoned:      0          0          0          0                            ok
 -extended:      0    
 -noretire:      0    
     mmaps:      0    
   commits:      0    
   threads:      0          0          0          0                            ok
  searches:     0.0 avg
mimalloc: using 1 numa regions
numa nodes:       1
   elapsed:      11.095 s
   process: user: 11.671 s, system: 0.478 s, faults: 100, rss: 223.9 MiB
mimalloc: process done: 0x10df93e00
...

Maybe something more is missing? This is where I got stuck earlier. I can try again if someone has these working, maybe something related to my debug-enable? On Linux however, these do show values when override works.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I did not test this in macOS because I don't have a machine accessible. Windows is not supported because static linking cannot be used with override. Not sure what the next steps are...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The results look really good to me, so I think we should try to find a way to fix this, if possible. I am happy to test this, but I am not proficient with macOS either. Maybe @sfackler or @ehuss have ideas what to try next (as you fixed jemalloc for macOS earlier)?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, @thomcc may be interested to give new ideas.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have to enable stats when building or they aren't tracked, (see: https://github.com/microsoft/mimalloc/blob/master/include/mimalloc-types.h#L32-L33). I don't remember if theres support in the mimalloc (or -sys) crates directly, but setting env CFLAGS="-DMI_STAT=1" or env CFLAGS="-DMI_STAT=2" when building should make cc do it, hopefully.

Copy link

@jq-rs jq-rs Jan 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thomcc Thanks, I checked this in detail. In my debug build I enable MI_DEBUG_FULL in CMakeLists.txt and this will enable MI_DEBUG=3definition. This then again will set MI_STAT=2 https://github.com/microsoft/mimalloc/blob/master/include/mimalloc-types.h#L421-L427. I cannot easily see what could go wrong, but to be sure, I'll try with a separate forced setting to MI_STAT.

If the statistics are valid, then mi_macos_override_malloc seems not to be called, perhaps due to similar reasons jemalloc originally did not work (#82423), where the linker is optimizing away some needed call paths. We would need to figure out what is still missing for mimalloc. My skills need sharpening here, so any hints on how to do it best are welcome :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have your integration code published anywhere? If not I can take a stab, but would like to avoid redundant work.

I think the mi_macos_override_malloc stuff is requires a little bit of extra work. most of the documentatoin mimalloc has about what's required to override on macos is a bit dated (or wrong) too (interposing happens by default, flat namespace should not be required).

Also, the zone override is not intended to be used on its own, but that's fixable.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thomcc Thank you for taking a look! My current changes are just testing and stats debug related, which we discussed above. I do not have anything else outside this PR.

Copy link
Member

@thomcc thomcc Jan 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking more of a look, it's very unsurprising that you can't call _mi_macos_override_malloc directly, it's a static private function. It's also only included when mimalloc is built as a dynamic library (there are various reasons for this, but I think the main one is that mimalloc attempts interposing, and prior to macOS 10.12, this could only be done as a dynamic lib with some DYLD_ environment vars defined).

There are at least a few issues with the way rustc will need to use it. Nothing that seems enormous, thankfully, I'll see if I can come up with a fixed version we can test out, and upstream if it works fine.

The biggest issue is probably that I don't have an easy way to test on older versions of macOS, so will be taking a stab in them. I also don't know how far back rustc supports for the host (I know libstd supports back to 10.7, though). Also, does rustc fork itself during execution? If so, that may complicate things a little, at least in terms of testing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, does rustc fork itself during execution?

No. It only spawns a new thread to run the compilation on.

@fee1-dead
Copy link
Member Author

I don't know how I can move forward with this. Perhaps this could be left as an option? It uses more memory but is faster..

@klensy
Copy link
Contributor

klensy commented Apr 12, 2022

I don't know how I can move forward with this. Perhaps this could be left as an option? It uses more memory but is faster.

Rebase and bump mimalloc version? Maybe something new will be over that time.

@Mark-Simulacrum Mark-Simulacrum added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 24, 2022
@Mark-Simulacrum
Copy link
Member

I think at this time we're unlikely to want to take a ~6-35% max-rss regression for the instruction count wins from adopting this PR.

That said, I'd love to see this rebased on a more recent version of mimalloc to see if they've managed to improve the memory overhead since then. If mimalloc is just making a different tradeoff on the memory/performance curve, then I'm not sure it's something we're willing to go for, given that rustc already uses a large amount of memory and we get some amount of complaints about that.

@nnethercote
Copy link
Contributor

@fee1-dead: this was tested with libmimalloc-sys 0.1.23, which uses mimalloc 1.7.0. libmimalloc-sys 0.1.30 is now available, which uses mimalloc 1.7.6. That's not 2.0.x, and the release notes suggest that the difference between mimalloc 1.7.0 and 1.7.6 is mostly improved Mac support. Still, is it worth trying again?

@octavonce, @thomcc: Is there any prospect of libmimalloc-sys gaining support for 2.0.x?

@thomcc
Copy link
Member

thomcc commented Oct 26, 2022

It already has support for it (As I note in purpleprotocol/mimalloc_rust#62 (comment), the mimalloc_rust submodule point towards 2.0.6 -- might be even newer now).

A big concern is that mimalloc's macOS override support is a lot less robust (and also a lot more dodgy) than jemalloc's. On other platforms it should be fine, though.

@octavonce
Copy link

octavonce commented Oct 26, 2022

@thomcc @nnethercote

A big concern is that mimalloc's macOS override support is a lot less robust (and also a lot more dodgy) than jemalloc's. On other platforms it should be fine, though.

But since this would be included as a static dependency to the compiler, and not overriding the OS allocator, I don't see how this is relevant in this case?

Maybe try to test again with the latest version?

@thomcc
Copy link
Member

thomcc commented Oct 26, 2022

But since this would be included as a static dependency to the compiler, and not overriding the OS allocator, I don't see how this is relevant in this case?

That's not actually true, rustc needs to override malloc (as documented here and here). I've experimented with making it use #[global_allocator], but it's not exactly viable.

@lqd
Copy link
Member

lqd commented Oct 26, 2022

One of the 2.x betas was tried in #92317 but we should probably try to have up-to-date numbers.

@octavonce
Copy link

@thomcc Oh, alright. Thank you for the explanation, I thought it was straightforward enough

@jq-rs
Copy link

jq-rs commented Oct 27, 2022

I think the API which allows using mimalloc in rustc, similar to jemalloc's one, is missing at the moment for the macOS part of mimalloc. Probably it should be implemented first. At least I am able to test the API if someone has a proto version available.

@thomcc
Copy link
Member

thomcc commented Oct 27, 2022

I have a prototype version that fixes the issues with mimalloc's zone integration I can dig out. It's late now and I have work tomorrow, but I'll try to resurrect this weekend. Please ping me if I do not, or feel free to DM me on zulip if anybody wants more information on the issues with mimalloc's current malloc_zone implementation (TLDR the handling of zone free is wrong, and the interposing is highly dubious).

@jq-rs
Copy link

jq-rs commented Oct 27, 2022

Thanks @thomcc. To elaborate a bit, I think some form of public _mi_macos_override_malloc is probably needed as a next step, it is now a static private function.

@thomcc
Copy link
Member

thomcc commented Oct 27, 2022

I think it's implementation is at least as much of a problem than the privacy (it will behave very badly if any system APIs manage zones on their own, and I think we/LLVM are big enough that we probably need to worry about that).

@fee1-dead fee1-dead closed this Nov 21, 2022
@fee1-dead fee1-dead deleted the mimalloc branch November 21, 2022 14:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.