Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stopVat is not flushing the LRU cache #6604

Closed
warner opened this issue Nov 29, 2022 · 0 comments · Fixed by #6627
Closed

stopVat is not flushing the LRU cache #6604

warner opened this issue Nov 29, 2022 · 0 comments · Fixed by #6627
Assignees
Labels
bug Something isn't working SwingSet package: SwingSet

Comments

@warner
Copy link
Member

warner commented Nov 29, 2022

Describe the bug

We traced a test failure in #6502 down to the durable object state not actually getting written to the DB. The data was added to the LRU cache at the right time, but (being a cache) it doesn't call vatstoreSet() right away. If the vat sees a lot of other activity, eventually the data will get evicted from the LRU cache, but at the very least we must ensure it gets written before the vat shuts down (e.g. for upgrade).

When a vat is upgraded, we send the old worker a stopVat(), let it complete, then replace the worker with the new version (and send a startVat to the new one). The stopVat() is supposed to flush everything (in particular, it should call the LRU's flush() function). stopVat() used to call bringOutYourDead(), which does a LRU flush(), but that BOYD call was accidentally commented out in commit 91480de (part of #5342), where we decided that the risk of crashes during upgrade was worse than the storage leak of not entirely deleting the old state data, and we disabled a lot of the deletion code.

await tools.bringOutYourDead();

The BOYD appeared in the middle of a big block of code with 6-ish other calls. We used a block comment to disable the whole section (so the diff I reviewed didn't display the BOYD call), and we didn't realize that we really needed that BOYD to flush the LRU cache. And we didn't have a test that was sensitive to this case (we have lots of upgrade tests, but either they did their best to disable the LRU cache, for simplicity, or they were using enough durable objects that the LRU cache was being cycled naturally, so we weren't completely depending upon the flush()).

The fix is to re-enable that bringOutYourDead() in the stopVat() process (during releaseOldState() in stop-vat.js).

And of course we should add a new test to catch this case. It should use the default LRU cache size (maybe 100 or so), create a vat with a single durable object, not do other activity, then perform a null upgrade. The second incarnation should attempt to access the old object.

We can mostly copy this test from @gibson042 's PR #6502 , just without the PublisherKit stuff: one vivifyDurable.. of some trivial Kind, then provide an instance.

cc @FUDCo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working SwingSet package: SwingSet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants