Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add core migration measuring and anti-bias to drmemtrace scheduler #6938

Closed
derekbruening opened this issue Aug 27, 2024 · 2 comments
Closed
Assignees

Comments

@derekbruening
Copy link
Contributor

Today the drmemtrace scheduler is not preferring to keep an input on the same core it last ran on. This issue covers measuring the migration statistics and probably adding some kind of bias to avoid migrations.

@derekbruening derekbruening self-assigned this Aug 27, 2024
derekbruening added a commit that referenced this issue Aug 27, 2024
Adds schedule statistics to memtrace_stream.h.  Implements these
statistics in the streams returned by scheduler_t.  This initial round
includes total switches, time preempts, and direct switch attempts and
successes.

Adds checks that these match the schedule_stats tool's values.

Adds tests of the values to several key scheduler unit tests.

Issue: #6938
derekbruening added a commit that referenced this issue Aug 28, 2024
Adds schedule statistics to memtrace_stream.h. Implements these
statistics in the streams returned by scheduler_t. This initial round
includes the following values:
```
[scheduler] Stats for output #0
[scheduler]   Switch input->input      :        16
[scheduler]   Switch input->idle       :         4
[scheduler]   Switch idle->input       :         3
[scheduler]   Switch nop               :       119
[scheduler]   Quantum preempts         :       131
[scheduler]   Direct switch attempts   :         0
[scheduler]   Direct switch successes  :         0
```

The switches are split into those 4 categories to make it easier to
compare to other sources of switch counts, such as `perf` where `perf`
limited to a cgroup or process will be missing the `idle->input`
switches, or schedule_stats which is missing the `input->idle` today.

Adds checks that these match the schedule_stats tool's values.

Adds tests of the values to several key scheduler unit tests.

Issue: #6938
derekbruening added a commit that referenced this issue Aug 29, 2024
Adds the migration count to the new scheduler stats.
Adds checks of the count to various unit tests.

Issue: #6938
derekbruening added a commit that referenced this issue Aug 29, 2024
Adds the migration count to the new scheduler stats. Adds checks of the
count to various unit tests.

Issue: #6938
derekbruening added a commit that referenced this issue Aug 30, 2024
Adds the new scheduler-provided statistics to the schedule_stats tool.
Adds a sanity test check.

Issue: #6938
derekbruening added a commit that referenced this issue Aug 30, 2024
Adds the new scheduler-provided statistics to the schedule_stats tool. 
Adds a sanity test check.

Tested on some larger apps where a high count of "nop" switches was
sometimes found; as part of understanding those, I found a code path
where if the runqueue is empty and the current thread is supposed to go
unscheduled it would be run again instead. Fixed that here; unclear it
has ever happened.

Issue: #6938
derekbruening added a commit that referenced this issue Sep 4, 2024
Adds a std::mutex wrapper which stores the owner so we can assert on
lock ownership.

Adds lock ownership asserts to scheduler functions whose comments say
they require a lock on entry.

Fixes several cases of missing locks found by these asserts.

Issue: #6938
derekbruening added a commit that referenced this issue Sep 5, 2024
Adds a std::mutex wrapper which stores the owner so we can assert on
lock ownership.

Adds lock ownership asserts to scheduler functions whose comments say
they require a lock on entry.

Fixes several cases of missing locks found by these asserts.

Issue: #6938
derekbruening added a commit that referenced this issue Sep 6, 2024
Adds lock contention counters for mutex_dbg_owned::lock() in
non-NDEBUG builds.  While it's not there for release builds, the
results in non-release should still be indicative of general lock
behavior.

Prints the stats for sched_lock_ with the other scheduler stats.

Sample results on schedule_stats on a threadsig trace show a lot of
contention even with only 3 cores:

```
$ clients/bin64/drmemtrace_launcher -indir ../build_x64_dbg_tests/drmemtrace.threadsig.5* -core_sharded -cores 3 -tool schedule_stats -verbose 1
[scheduler] Schedule lock acquired     :   2196602
[scheduler] Schedule lock contended    :    257580
```

Issue: #6938
derekbruening added a commit that referenced this issue Sep 9, 2024
Adds lock contention counters for mutex_dbg_owned::lock() in non-NDEBUG builds.
While it's not there for release builds, the results in non-release should still be
indicative of general lock behavior.

When I just changed the mutex_dbg_owned header, the relese build static drmemtrace
tests (e.g., the burst_* ones) were all failing with floating point exceptions. It
turns out it's due to different compilation units having different values for NDEBUG.
I tried to make them all the same: but we have some files that contain `#undef NDEBUG`
and other complexities. So I split the mutex_dbg_owned implementation into a
.cpp file and have the mutex wrapper extra fields always present.

Prints the stats for sched_lock_ with the other scheduler stats.

Sample results on schedule_stats on a threadsig trace show a lot of
contention even with only 3 cores, which the forthcoming separate runqueue change
will address:

```
$ clients/bin64/drmemtrace_launcher -indir ../build_x64_dbg_tests/drmemtrace.threadsig.5* -core_sharded -cores 3 -tool schedule_stats -verbose 1
[scheduler] Schedule lock acquired     :   2196602
[scheduler] Schedule lock contended    :    257580
```

Issue: #6938
derekbruening added a commit that referenced this issue Sep 9, 2024
Direct thread switches were not being counted as migrations in the
original migration stats in PR #6950: we address that here.  A test
with multiple outputs will be added later as part of the move to
separate runqueues, which is where this bug was discovered.

Issue: #6938
derekbruening added a commit that referenced this issue Sep 9, 2024
Fixes a bug (discovered while testing separate runqueues) where direct
switch requests with timeouts do not set blocked_start_time, resulting
in re-using a prior (possibly 0) time and consequently much shorter
timeouts.

Reduces the default block_time_max parameter, as the bug was
artificially reducing many instances and with the accurate timeout the
existing defaults were no longer well-tuned.

Updates unit tests.

Issue: #6938
@derekbruening
Copy link
Contributor Author

The plan is to implement separate runqueues per output. While testing the separate runqueue implementation I have found and fixed several bugs that affect the old global runqueue scheme as well. I'm going to include them as sub-parts of this issue:

  • Direct switch migrations were not being added to the migration stat counter
  • Direct switches were not setting blocked_start_time, resulting in too-short timeouts
  • Quantum time left was underflowing in certain conditions where preempt and voluntary switch triggers coincided

Testing the separate runqueues has led to a number of other changes I'd like to make:

  • Add a single simulator time scale option and have all the other options not be unitless (forcing the simulator to set every single one) but instead in simulated microsconds; apply to the existing quantum duration and block time max and possibly scale
  • Exiting early: xref Add scheduler exit-early feature to avoid long tail with sparse activity #6959 but not just when all are unscheduled
  • Retry using a counter instead of wall-clock time for instr quanta to reduce nondeterminism and make analyzers behavior more like simulators, simplifying testing

derekbruening added a commit that referenced this issue Sep 9, 2024
Fix bug underflowing quantum time when trying to correct overshoot but
the quantum just expired and so is 0; or, we're replaying.
Adds an assert.

A test will come in the forthcoming rebalance test for per-output
runqueues, which is where this bug was discovered.

Issue: #6938
derekbruening added a commit that referenced this issue Sep 9, 2024
Replaces a comment with an official compiler annotation for a switch
case fallthrough, to fix a warning under some compilers as part of
running scheduler_unit_tests internally.

Issue: #6938
derekbruening added a commit that referenced this issue Sep 10, 2024
Replaces a comment with an official compiler annotation for a switch
case fallthrough, to fix a warning under some compilers as part of
running scheduler_unit_tests internally.

Issue: #6938
derekbruening added a commit that referenced this issue Sep 10, 2024
Direct thread switches were not being counted as migrations in the
original migration stats in PR #6950: we address that here. A test with
multiple outputs will be added later as part of the move to separate
runqueues, which is where this bug was discovered.

Issue: #6938
derekbruening added a commit that referenced this issue Oct 3, 2024
Adds a -verbose 1 dump of the scheduler options at startup.  This
helps to record what options were passed in a particular run.

Issue: #6938
derekbruening added a commit that referenced this issue Oct 3, 2024
Adds a -verbose 1 dump of the scheduler options at startup. This helps
to record what options were passed in a particular run.

Issue: #6938
derekbruening added a commit that referenced this issue Oct 3, 2024
Improves diagnostics by augmenting the all-runqueue printing:

+ It now constructs its many-line string in memory and then prints it
all at once, to make it more atomic.

+ It includes the remaining blocked times for blocked inputs.

+ It is moved from pop_from_ready_queue() where the popped input is in
flux to pick_next_input() where the current running input is valid.

+ It is printed more frequently.

Also prints the size of the unscheduled queue when moving it.

Issue: #6938
derekbruening added a commit that referenced this issue Oct 4, 2024
Adds a -verbose 1 dump of the scheduler options at startup. This helps
to record what options were passed in a particular run.

Issue: #6938
derekbruening added a commit that referenced this issue Oct 4, 2024
Improves diagnostics by augmenting the all-runqueue printing:

+ It now constructs its many-line string in memory and then prints it
all at once, to make it more atomic.

+ It includes the remaining blocked times for blocked inputs.

+ It is moved from pop_from_ready_queue() where the popped input is in
flux to pick_next_input() where the current running input is valid.

+ It is printed more frequently.

Also prints the size of the unscheduled queue when moving it.

Issue: #6938
derekbruening added a commit that referenced this issue Oct 11, 2024
Previously, a never-executed input could be moved to another output at
any time, yet was still counted as a "migration".  We change that here
to consider a never-executed input to have executed at the initial
simulation time seen on an output, so it will not be migrated until
that threshold is met.  An exception is the very first rebalance at
init time for the initial allocation of inputs to outputs when inputs
can be freely moved; this does not count as a migration.

Adds a unit test.

Issue: #6938
derekbruening added a commit that referenced this issue Oct 11, 2024
Previously, a never-executed input could be moved to another output at
any time, yet was still counted as a "migration". We change that here to
consider a never-executed input to have executed at the initial
simulation time seen on an output, so it will not be migrated until that
threshold is met. An exception is the very first rebalance at init time
for the initial allocation of inputs to outputs when inputs can be
freely moved; this does not count as a migration.

Adds a unit test.

Issue: #6938
derekbruening added a commit that referenced this issue Oct 15, 2024
Fixes a race in the drmemtrace scheduler by making
output_info_t.initial_cur_time atomic.

Tested on ThreadSanitizer on the internal test where this was first
reported.

Issue: #6938
derekbruening added a commit that referenced this issue Oct 16, 2024
Fixes a race in the drmemtrace scheduler by making
output_info_t.initial_cur_time atomic.

Tested on ThreadSanitizer on the internal test where this was first
reported.

Issue: #6938
@derekbruening
Copy link
Contributor Author

The scheduler was refactored to use per-core runqueues. Closing this.

derekbruening added a commit that referenced this issue Oct 22, 2024
Add a missing check that we're doing a dynamic schedule before trying
to steal work when idle.  There don't seem to be any consequences from
the missing check but it is best to have it for clarity.

Issue: #6938
derekbruening added a commit that referenced this issue Oct 23, 2024
Add a missing check that we're doing a dynamic schedule before trying to
steal work when idle. There don't seem to be any consequences from the
missing check but it is best to have it for clarity.

Issue: #6938
derekbruening added a commit that referenced this issue Oct 25, 2024
Adds two statistics reported by the scheduler to the schedule_stats
report: work steals and rebalances.

Tests that something is printed out, but checks of the actual values
are left to existing scheduler tests.

Issue: #6938
derekbruening added a commit that referenced this issue Oct 25, 2024
While the schedule_stats tool already reports the migration count from
the scheduler, that is only non-zero for a dynamic schedule, resulting
in 0 for core-sharded-on-disk traces.  We add "observed migrations"
where schedule_stats looks for migrations on its own.  This requires a
global lock on each context switch, but experimental results show that
this does not seem to cause noticeable overhead.

Adds some testing.

Issue: #6938
derekbruening added a commit that referenced this issue Oct 28, 2024
Adds two statistics reported by the scheduler to the schedule_stats
report: work steals and rebalances.

Tests that something is printed out, but checks of the actual values are
left to existing scheduler tests.

Issue: #6938
derekbruening added a commit that referenced this issue Oct 28, 2024
While the schedule_stats tool already reports the migration count from
the scheduler, that is only non-zero for a dynamic schedule, resulting
in 0 for core-sharded-on-disk traces. We add "observed migrations" where
schedule_stats looks for migrations on its own. This requires a global
lock on each context switch, but experimental results show that this
does not seem to cause noticeable overhead.

Adds some testing.

Issue: #6938
derekbruening added a commit that referenced this issue Oct 28, 2024
Fixes an assert on the new observed_migrations stat added to
schedule_stats in PR #7057.  These observed_migrations are counted on
the destination core, while the scheduler reports migrations away from
a source core: so they can differ, causing the assert to fire.  Fixed
by moving it to only check the aggregated stats across all cores.

Tested on the internal trace where the assert fired before.

Issue: #6938
derekbruening added a commit that referenced this issue Oct 29, 2024
Fixes an assert on the new observed_migrations stat added to
schedule_stats in PR #7057. These observed_migrations are counted on the
destination core, while the scheduler reports migrations away from a
source core: so they can differ, causing the assert to fire. Fixed by
moving it to only check the aggregated stats across all cores.

Tested on the internal trace where the assert fired before.

Issue: #6938
derekbruening added a commit that referenced this issue Oct 29, 2024
schedule_stats_t::get_total_counts() was not including
scheduler-provided stats, as it was doing its own simple aggregation
instead of calling aggregate_results().  We fix that here.  That then
triggers the newly added assert from PR #7057 which checks for the
scheduler-provided value being exactly equal meaning there is no data
available.  It fires on the schedule_stats_test, which uses a mock
stream which returns -1 for such a stat, so we end up with a negative
value.  We update the assert for that condition.

Issue: #6938
derekbruening added a commit that referenced this issue Oct 29, 2024
…7060)

schedule_stats_t::get_total_counts() was not including
scheduler-provided stats, as it was doing its own simple aggregation
instead of calling aggregate_results(). We fix that here. That then
triggers the newly added assert from PR #7057 which checks for the
scheduler-provided value being exactly equal meaning there is no data
available. It fires on the schedule_stats_test, which uses a mock stream
which returns -1 for such a stat, so we end up with a negative value. We
update the assert for that condition.

Issue: #6938
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant