move chain/solo back to local-worker, not xsnap #3403

Closed
warner opened this issue Jun 24, 2021 · 0 comments · Fixed by #3404
Labels: cosmic-swingset, enhancement

Comments


warner commented Jun 24, 2021

What is the Problem Being Solved?

Late last night we discovered that current trunk takes an infeasibly long time to restart after 12 hours of AMM/vault loadgen at one cycle per minute. The first vat alone (vat-bank) took over 15 minutes to reach the 80% point of its transcript replay, and replay had slowed down considerably during that time.

The current trunk is slower because we force a GC sweep at the end of every delivery. The hope was that having fewer objects around would speed things up by more than the sweeps cost. However, we also discovered some unexpected behavior in the XS garbage collector that appears to inhibit collection for most of our code. We're still trying to characterize it, but our standard `await E(A).foo(B)` pattern seems to prevent both A and B from being GCed, even after `foo()` has finished.

Our immediate workaround is to move the chain back to local workers, because the Node.js/V8 garbage collector does appear to drop objects when we expect it to. This has two costs: we won't get metering (so a runaway contract could take down the entire chain), and all vats will live inside a single shared process. Sharing a process magnifies the cost of forcing a GC sweep at the end of every delivery, because each sweep will cover all vats (plus the kernel), not just the objects within a single vat.

In testing this, we discovered another problem, which made replay fail roughly 60% of the time. One of the linear transcript files (the "stream store" portion of swing-store-lmdb) was spontaneously closed before replay had finished reading from it. We haven't been able to find the culprit. Our workaround is to replace the stream store entirely with a new implementation based on SQLite (#3402).

The task for this ticket is to modify our two config files so that both the chain and the ag-solo use `defaultManagerType: 'local'` instead of `'xs-worker'`.
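
As a rough sketch of the change, assuming each config is a plain object with a top-level `defaultManagerType` field: the helper name `withManagerType` and the sample `vats` entry below are hypothetical and are not taken from the actual config files.

```javascript
// Hypothetical helper: return a copy of a SwingSet-style config object with
// the default vat manager type overridden. The input object is not mutated.
function withManagerType(config, managerType = 'local') {
  return { ...config, defaultManagerType: managerType };
}

// Illustrative config only; the real chain and ag-solo configs hold more fields.
const before = { defaultManagerType: 'xs-worker', vats: {} };
const after = withManagerType(before);

console.log(before.defaultManagerType); // still 'xs-worker'
console.log(after.defaultManagerType); // 'local'
```

Keeping the override in one place like this would also make it a one-line change to switch back to `xs-worker` later.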

Once we figure out the GC-inhibition problem, we'll switch back to xs-worker.

cc @michaelfig
