Better represent heap cost in run policy #7373

mhofman · 2023-04-10T23:02:04Z

What is the Problem Being Solved?

The run policy is a way to limit the work done in a block to an approximate amount of wall clock time. Since we cannot deterministically use time elapsed for a chain node, we instead use proxy measures for the work performed. This currently roughly takes the shape of a computron count per completed delivery, which is a proxy for the program's computational complexity.
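As a rough illustration of the current shape, a computron-only policy might look like the sketch below. The `crankComplete` hook and `computrons` field loosely follow the SwingSet run policy interface as I understand it. The `makeComputronPolicy` name, the budget value, and the use of plain numbers for counts are assumptions made for brevity.

```typescript
// Hypothetical sketch: end the block once a computron budget has been spent.
type CrankDetails = { computrons?: number };

function makeComputronPolicy(budget: number) {
  let used = 0;
  return {
    // Called after each completed delivery.
    // Returns true to keep running deliveries, false to end the block.
    crankComplete(details: CrankDetails = {}): boolean {
      used += details.computrons ?? 0;
      return used < budget;
    },
    // Total computrons accounted for so far in this block.
    totalComputrons: (): number => used,
  };
}
```

Nothing in this accounting reflects how large the vat's heap is, which is the gap described next.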

This however does not represent the impact that the program's heap usage has. In particular:

GC has to traverse the whole heap every time it triggers, including when forced, as for bringOutYourDead
Heap snapshots have to serialize, hash and compress the whole heap

While we attempt to hide the effects of GC (see #1872 and #3830), we know we are currently sensitive to organic GC (see #6784) at the syscall level. We also know we will never be insensitive to minor engine differences during consensus execution, as those would often result in computron/metering differences.

Description of the Design

For each delivery, report:

the occurrence of gc, possibly with some heap size information
the number of slot and chunk allocations

Inform the run policy with:

allocation and gc information
size of heap snapshots when taken (once "snapshot / BOYD interval based on computrons" (#6786) lands, taking a snapshot should become part of the bringOutYourDead delivery)

Add bean cost metrics for the above new policy inputs.
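The design above could be sketched roughly as follows. Everything here is hypothetical: the `DeliveryReport` fields, the `BeanCosts` weights, and the `beansForDelivery` and `makeHeapAwarePolicy` helpers are invented for illustration and do not correspond to an existing SwingSet or xsnap API. The weights used in any real configuration would need to be measured, not guessed.

```typescript
// Hypothetical per-delivery report: computrons plus the heap inputs proposed above.
interface DeliveryReport {
  computrons: number;
  gcCount?: number;          // GC occurrences during the delivery
  slotAllocations?: number;  // number of slot allocations
  chunkAllocations?: number; // number of chunk allocations
  snapshotBytes?: number;    // heap snapshot size, if one was taken
}

// Hypothetical bean cost per unit of each policy input.
interface BeanCosts {
  perComputron: number;
  perGc: number;
  perSlotAllocation: number;
  perChunkAllocation: number;
  perSnapshotByte: number;
}

// Convert one delivery report into beans; missing inputs count as zero.
function beansForDelivery(report: DeliveryReport, costs: BeanCosts): number {
  return (
    report.computrons * costs.perComputron +
    (report.gcCount ?? 0) * costs.perGc +
    (report.slotAllocations ?? 0) * costs.perSlotAllocation +
    (report.chunkAllocations ?? 0) * costs.perChunkAllocation +
    (report.snapshotBytes ?? 0) * costs.perSnapshotByte
  );
}

// Run policy sketch: keep going until the block's bean budget is spent.
function makeHeapAwarePolicy(blockBeans: number, costs: BeanCosts) {
  let spent = 0;
  return {
    // Returns true to continue running deliveries, false to end the block.
    deliveryComplete(report: DeliveryReport): boolean {
      spent += beansForDelivery(report, costs);
      return spent < blockBeans;
    },
    spentBeans: (): number => spent,
  };
}
```

With this shape, a delivery that triggers GC or a snapshot consumes more of the block budget than its computron count alone would suggest.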

Note that we are already reporting computron usage for the bringOutYourDead delivery.

Security Considerations

We need to ensure that the GC behavior is deterministic during consensus execution. In practice it should be, especially once forced reload from snapshot (#6943) is implemented. To avoid downstream divergences due to the run policy making different decisions, the information provided to the run policy should be included in consensus data, as proposed in #6770.

During replay, the run policy is not consulted, so gc determinism is relaxed, and we should still be able to perform minor XS version upgrades that change allocation behavior.

Scaling Considerations

None I can think of

Test Plan

TBD

mhofman · 2024-11-07T22:06:39Z

#10424 would add out of consensus instrumentation. Once this becomes part of consensus, we can consider letting some measurements made by xsnap affect snapshot content.

mhofman added the enhancement, SwingSet, cosmic-swingset, and xsnap labels on Apr 10, 2023

mhofman mentioned this issue Nov 7, 2024

More xsnap instrumentation #10424

Open
