
Miri: Skip over GlobalAllocs when sweeping #118080

Closed
wants to merge 1 commit

Conversation

saethlin (Member) commented Nov 20, 2023

Global allocations in the interpreter are never deallocated, so their AllocId is always in-use and cannot be removed by the GC. So when we are checking whether an AllocId can be removed, we really want to fast-path out if we know that the AllocId refers to a Global allocation.

In the interpreter we have a few HashMaps that map AllocId to something else, so in each of those maps I've stuck a bool alongside the value as a local cache of whether the AllocId is global. LiveAllocs::is_live now also takes a &mut bool so that it can set the flag if it deduces that the AllocId is in use because it is global.
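A minimal sketch of that caching pattern, with hypothetical names standing in for the real machine state and for tcx.try_get_global_alloc:

```rust
use std::collections::HashSet;

type AllocId = u64;

/// Stand-in for the GC's view: AllocIds referenced in the machine,
/// plus a stand-in for the tcx's set of global allocations.
struct LiveAllocs {
    machine_refs: HashSet<AllocId>,
    tcx_globals: HashSet<AllocId>,
}

impl LiveAllocs {
    /// Returns whether `id` is still in use. If liveness follows from `id`
    /// being a global allocation, the caller's cached flag is set so that
    /// future GC runs can skip both lookups.
    fn is_live(&self, id: AllocId, is_global: &mut bool) -> bool {
        if *is_global {
            return true; // cached fast path: globals are never deallocated
        }
        if self.machine_refs.contains(&id) {
            return true;
        }
        if self.tcx_globals.contains(&id) {
            *is_global = true; // populate the local cache
            return true;
        }
        false
    }
}
```

The flag starts out false for every map entry and only ever transitions to true, which is what makes caching it sound: a global allocation never stops being global.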

InterpCx::is_alloc_live now returns a Liveness enum which is alive, dead, or global, so that we can populate the local bool if we ever look up whether a global allocation is live.
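A sketch of what that return type might look like; the names are taken from the description above, and the two sets are stand-ins for the real machine and tcx state:

```rust
use std::collections::HashSet;

type AllocId = u64;

/// Liveness of an AllocId as seen by the GC.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Liveness {
    Alive,
    Dead,
    Global,
}

struct InterpCx {
    alloc_map: HashSet<AllocId>,     // allocations live in the machine
    global_allocs: HashSet<AllocId>, // stand-in for the tcx side
}

impl InterpCx {
    fn is_alloc_live(&self, id: AllocId) -> Liveness {
        if self.alloc_map.contains(&id) {
            Liveness::Alive
        } else if self.global_allocs.contains(&id) {
            // The caller can use this variant to populate its cached bool.
            Liveness::Global
        } else {
            Liveness::Dead
        }
    }
}
```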

r? RalfJung


Ideally we could set this flag upon insertion into these maps, but eagerly setting the value for base_ptr_tag doesn't seem possible. It's also tempting to pack the bit we need here into AllocId, but users of these maps don't know if they are looking up a GlobalAlloc so we would need Hash+PartialEq that ignores the bit which seems cursed.
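For illustration, the "cursed" bit-packing variant would need something like the following hypothetical wrapper, whose Hash and PartialEq both ignore the flag bit so that tagged and untagged ids find the same map entry (real AllocIds are NonZero indices; this is only a sketch):

```rust
use std::hash::{Hash, Hasher};

/// AllocId with an "is global" flag packed into the low bit.
#[derive(Debug, Clone, Copy, Eq)]
struct TaggedAllocId(u64);

impl TaggedAllocId {
    fn id(self) -> u64 {
        self.0 >> 1
    }
    fn is_global(self) -> bool {
        self.0 & 1 == 1
    }
}

// Equality and hashing must ignore the flag; otherwise a lookup with an
// untagged id would miss an entry that was inserted with the tagged one.
impl PartialEq for TaggedAllocId {
    fn eq(&self, other: &Self) -> bool {
        self.id() == other.id()
    }
}

impl Hash for TaggedAllocId {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.id().hash(state);
    }
}
```

The cursedness is that two values compare equal while differing in an observable bit, which silently violates the usual expectation that equal keys are interchangeable.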

@rustbot rustbot added A-testsuite Area: The testsuite used to check the correctness of rustc S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. labels Nov 20, 2023
@saethlin saethlin changed the title from "Miri: Skip over GlobalAllocs when GC'ing base_addr" to "Miri: Skip over GlobalAllocs when collecting" Nov 20, 2023
@rustbot's comment was marked as off-topic.

@saethlin saethlin added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 20, 2023
@saethlin saethlin changed the title from "Miri: Skip over GlobalAllocs when collecting" to "Miri: Skip over GlobalAllocs when sweeping" Nov 20, 2023
@rustbot's comment was marked as abuse.

@saethlin saethlin removed A-testsuite Area: The testsuite used to check the correctness of rustc T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. labels Nov 20, 2023
bors (Contributor) commented Nov 21, 2023

☔ The latest upstream changes (presumably #118134) made this pull request unmergeable. Please resolve the merge conflicts.

@saethlin saethlin marked this pull request as ready for review November 23, 2023 22:56
rustbot (Collaborator) commented Nov 23, 2023

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

The Miri subtree was changed

cc @rust-lang/miri

saethlin (Member, Author) commented:

Oh 🤦 I just remembered why I was keeping this as a draft. I wanted to post benchmarks. I'll do that in a few hours.

pub fn is_alloc_live(&self, id: AllocId) -> Liveness {
if self.memory.alloc_map.contains_key_ref(&id) {
return Liveness::Live;
}
Review comment (Member):

Note that these might be global. On their first read or write, globals get copied into the alloc_map, which is needed because we need to convert them from the global provenance type to that of Miri.

So what even is your definition of "global"? "AllocId is in tcx" or "AllocId is not in memory"? Those two are not equivalent.

@@ -648,32 +648,27 @@ trait EvalContextPrivExt<'mir: 'ecx, 'tcx: 'mir, 'ecx>: crate::MiriInterpCxExt<'
};

let (_size, _align, alloc_kind) = this.get_alloc_info(alloc_id);
Review comment (Member):

This does not care about size and align so it should call is_alloc_live, not get_alloc_info.
(We probably have more of these across Miri.)

RalfJung (Member) commented:

In the interpreter we have a few HashMaps that map AllocId to something else, so in all those maps I've stuck a bool alongside the value

Hm, my first reaction is that I really don't like such global changes. Most places shouldn't care whether something is global or not, and so this pollutes the actually relevant logic with some irrelevant book-keeping. I would strongly prefer if that can be avoided, or at least factored away so as to be completely invisible inside intptrcast and borrow tracking.

So the point of this is that tcx.try_get_global_alloc(id).is_some() is so expensive that we want to avoid doing it on each GC run?

saethlin (Member, Author) commented:

So the point of this is that tcx.try_get_global_alloc(id).is_some() is so expensive that we want to avoid doing it on each GC run?

For allocations which will never be deallocated but currently are not referenced in the Machine, the current implementation does two lookups (one into the map of AllocIds referenced in the Machine, then try_get_global_alloc) before we conclude that the GC should not clean up that AllocId. That's the kind of allocation I want to identify here: those which we know will never be deallocated, and for which the GC therefore can never clean up entries in these maps.

The perf implications of this PR are insignificant for normal operation, so if you don't think we can get the complexity cost down, I think it might make sense to not do this optimization. Running the GC infrequently is very effective for pushing down overheads like this into the noise.

But just to point out the value, I derived this optimization originally by studying execution of the 0weak_memory_consistency test, which uses the flags -Zmiri-disable-stacked-borrows -Zmiri-provenance-gc=1 which is basically the worst case for the AllocId-collecting parts of the GC. With those flags set in the benchmark suite, we see this before this PR:

Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/backtraces/Cargo.toml
  Time (mean ± σ):     36.691 s ±  0.704 s    [User: 36.507 s, System: 0.080 s]
  Range (min … max):   35.720 s … 37.411 s    5 runs
 
Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/invalidate/Cargo.toml
  Time (mean ± σ):     11.261 s ±  0.109 s    [User: 11.160 s, System: 0.068 s]
  Range (min … max):   11.098 s … 11.392 s    5 runs
 
Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/mse/Cargo.toml
  Time (mean ± σ):      1.046 s ±  0.008 s    [User: 0.980 s, System: 0.061 s]
  Range (min … max):    1.038 s …  1.059 s    5 runs
 
Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/serde1/Cargo.toml
  Time (mean ± σ):      3.947 s ±  0.046 s    [User: 3.862 s, System: 0.071 s]
  Range (min … max):    3.874 s …  3.997 s    5 runs
 
Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/serde2/Cargo.toml
  Time (mean ± σ):     11.978 s ±  0.214 s    [User: 11.867 s, System: 0.076 s]
  Range (min … max):   11.848 s … 12.357 s    5 runs
  
Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/slice-get-unchecked/Cargo.toml
  Time (mean ± σ):     697.6 ms ±   1.6 ms    [User: 626.7 ms, System: 66.9 ms]
  Range (min … max):   694.8 ms … 698.6 ms    5 runs
  
Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/unicode/Cargo.toml
  Time (mean ± σ):      6.600 s ±  0.017 s    [User: 6.513 s, System: 0.065 s]
  Range (min … max):    6.582 s …  6.625 s    5 runs
 
Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/zip-equal/Cargo.toml
  Time (mean ± σ):      2.746 s ±  0.010 s    [User: 2.677 s, System: 0.059 s]
  Range (min … max):    2.736 s …  2.757 s    5 runs

And with this PR, we see:

Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/backtraces/Cargo.toml
  Time (mean ± σ):     25.631 s ±  0.422 s    [User: 25.506 s, System: 0.081 s]
  Range (min … max):   25.232 s … 26.200 s    5 runs
 
Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/invalidate/Cargo.toml
  Time (mean ± σ):     10.580 s ±  0.088 s    [User: 10.444 s, System: 0.067 s]
  Range (min … max):   10.441 s … 10.686 s    5 runs
 
Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/mse/Cargo.toml
  Time (mean ± σ):     981.4 ms ±  12.4 ms    [User: 910.6 ms, System: 66.1 ms]
  Range (min … max):   960.2 ms … 993.0 ms    5 runs
 
Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/serde1/Cargo.toml
  Time (mean ± σ):      3.780 s ±  0.041 s    [User: 3.685 s, System: 0.076 s]
  Range (min … max):    3.751 s …  3.851 s    5 runs
 
Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/serde2/Cargo.toml
  Time (mean ± σ):     11.514 s ±  0.396 s    [User: 11.414 s, System: 0.080 s]
  Range (min … max):   11.266 s … 12.214 s    5 runs
 
Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/slice-get-unchecked/Cargo.toml
  Time (mean ± σ):     649.1 ms ±  10.9 ms    [User: 580.6 ms, System: 63.9 ms]
  Range (min … max):   630.1 ms … 657.4 ms    5 runs
 
Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/unicode/Cargo.toml
  Time (mean ± σ):      2.702 s ±  0.049 s    [User: 2.608 s, System: 0.061 s]
  Range (min … max):    2.657 s …  2.780 s    5 runs
 
Benchmark 1: cargo +stage2 miri run --manifest-path=bench-cargo-miri/zip-equal/Cargo.toml
  Time (mean ± σ):      2.612 s ±  0.040 s    [User: 2.511 s, System: 0.067 s]
  Range (min … max):    2.578 s …  2.677 s    5 runs

The 0weak_memory_consistency test shows the same improvement as the unicode benchmark.

RalfJung (Member) commented:

Those are some pretty impressive wins indeed, albeit for a rather artificial benchmark. How far do you have to increase the GC interval to make the improvement disappear in the noise?

For allocations which will never be deallocated but currently are not referenced in the Machine, the current implementation does two lookups (one into the map of AllocIds referenced in the Machine, then try_get_global_alloc) before we conclude that the GC should not clean up that AllocId.

This could be reduced to one lookup by swapping the order in which they are checked. But I suppose that's worse elsewhere?
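The order swap suggested here could look like the following sketch (names are assumptions, standing in for the machine's referenced-id set and the tcx globals):

```rust
use std::collections::HashSet;

type AllocId = u64;

/// Checking the tcx globals first makes an unreferenced global cost one
/// lookup per GC run, at the price of an extra lookup for every live
/// non-global allocation, which is the trade-off being questioned above.
fn should_keep(
    tcx_globals: &HashSet<AllocId>,
    machine_refs: &HashSet<AllocId>,
    id: AllocId,
) -> bool {
    tcx_globals.contains(&id) || machine_refs.contains(&id)
}
```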

bors (Contributor) commented Nov 25, 2023

☔ The latest upstream changes (presumably #118284) made this pull request unmergeable. Please resolve the merge conflicts.

saethlin (Member, Author) commented:

Obviated by #118336

@saethlin saethlin closed this Nov 27, 2023
@saethlin saethlin deleted the skip-global-allocs branch November 27, 2023 02:09