What is the Problem Being Solved?
We discovered late last night that current trunk takes an infeasibly long time to restart (after 12 hours of AMM/vault loadgen at one cycle per minute). Just the first vat (vat-bank) took over 15 minutes to get to the 80% transcript replay point, and had slowed down considerably during that time.
The current trunk is slower because we force a GC sweep at the end of every delivery. The hope was that this would be more than made up for by the speedup gained by having fewer objects around. However, we also discovered some unexpected behavior from the XS garbage collector that appears to inhibit collection for most of our code. We're still trying to characterize that, but our standard `await E(A).foo(B)` seems to prevent both A and B from being GCed, even after `foo()` has finished.
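To make the observation concrete, here is a minimal sketch of the pattern in question (the surrounding function and the `runOnce` name are illustrative, not taken from any particular vat; only `E(A).foo(B)` comes from the issue itself):

```js
import { E } from '@agoric/eventual-send';

// Illustrative vat code: A is a presence for a remote object, B is an
// argument we pass along. Under Node.js/V8, once the result promise
// settles, nothing below keeps A or B alive, and the post-delivery GC
// sweep can reclaim them. Under XS on current trunk, this same pattern
// appears to pin both A and B, so they survive every sweep.
async function runOnce(A, B) {
  await E(A).foo(B);
  // A and B are unreachable from here on; we expect them to be collected.
}
```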
Our immediate workaround is to move the chain back to `local` workers, because the Node.js/V8 garbage collector does appear to drop objects when we expect them to. This won't get us metering (so runaway contracts could take down the entire chain), and all vats will live inside a single shared process. This will magnify the consequences of forcing a GC sweep at the end of every delivery, because the scope of that sweep will be all vats (plus the kernel), not just the objects within a single vat.

In testing this, we discovered another problem, which prevented replay from working roughly 60% of the time. One of the linear transcript files (the "stream store" portion of `swing-store-lmdb`) was spontaneously closed before replay had finished reading from it. We haven't been able to find the culprit. Our workaround is to replace the stream store entirely with a new implementation based on SQLite (#3402).

The task for this ticket is to modify our two config files to make both the chain and the ag-solo use `defaultManagerType: 'local'` instead of `xs-worker`.
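For concreteness, the relevant knob is the top-level `defaultManagerType` property of a SwingSet config file; a minimal sketch of the changed setting follows (the `vats` entry shown is illustrative, and the rest of our two real config files will differ):

```json
{
  "defaultManagerType": "local",
  "vats": {
    "bank": {
      "sourceSpec": "./vat-bank.js"
    }
  }
}
```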
Once we figure out the GC-inhibition problem, we'll switch back to `xs-worker`.

cc @michaelfig