What is the Problem Being Solved?
We discovered late last night that current trunk takes an infeasibly long time to restart (after 12 hours of AMM/vault loadgen at one cycle per minute). Just the first vat (vat-bank) took over 15 minutes to get to the 80% transcript replay point, and had slowed down considerably during that time.
The current trunk is slower because we force a GC sweep at the end of every delivery. The hope was that this would be more than made up for by the speedup gained by having fewer objects around. However, we also discovered some unexpected behavior from the XS garbage collector that appears to inhibit collection for most of our code. We're still trying to characterize that, but our standard `await E(A).foo(B)` seems to prevent both A and B from being GCed, even after `foo()` has finished.
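To make the observation concrete, here is a minimal sketch of the pattern in question (the surrounding function and the `runOnce` name are illustrative, not taken from any particular vat; only `E(A).foo(B)` comes from the issue itself):

```js
import { E } from '@agoric/eventual-send';

// Illustrative vat code: A is a presence for a remote object, B is an
// argument we pass along. Under Node.js/V8, once the result promise
// settles, nothing below keeps A or B alive, and the post-delivery GC
// sweep can reclaim them. Under XS on current trunk, this same pattern
// appears to pin both A and B, so they survive every sweep.
async function runOnce(A, B) {
  await E(A).foo(B);
  // A and B are unreachable from here on; we expect them to be collected.
}
```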
Our immediate workaround is to move the chain back to `local` workers, because the Node.js/V8 garbage collector does appear to drop objects when we expect them to. This won't get us metering (so runaway contracts could take down the entire chain), and all vats will live inside a single shared process. This will magnify the consequences of forcing a GC sweep at the end of every delivery, because the scope of that sweep will be all vats (plus the kernel), not just the objects within a single vat.

In testing this, we discovered another problem, which prevented replay from working roughly 60% of the time. One of the linear transcript files (the "stream store" portion of `swing-store-lmdb`) was spontaneously closed before replay had finished reading from it. We haven't been able to find the culprit. Our workaround is to replace the stream store entirely with a new implementation based on SQLite (#3402).

The task for this ticket is to modify our two config files to make both the chain and the ag-solo use `defaultManagerType: 'local'` instead of `xs-worker`.
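For concreteness, the relevant knob is the top-level `defaultManagerType` property of a SwingSet config file; a minimal sketch of the changed setting follows (the `vats` entry shown is illustrative, and the rest of our two real config files will differ):

```json
{
  "defaultManagerType": "local",
  "vats": {
    "bank": {
      "sourceSpec": "./vat-bank.js"
    }
  }
}
```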
Once we figure out the GC-inhibition problem, we'll switch back to `xs-worker`.

cc @michaelfig