-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rewrite VOM LRU cache #7138
rewrite VOM LRU cache #7138
Conversation
cc @FUDCo |
Datadog ReportBranch report: ❌ ❌ Failed Tests (384)
|
Yeah, I think I reintroduced the main GC sensitivity (the one on reanimating Representatives). I want to fix this by makine |
Actually, I believe that sensitivity was pre-existing, see #7142. If so, this branch is just as problematic, but at least it's a lot simpler. |
0a29ba3
to
9264d6b
Compare
7d562a0
to
ef539c1
Compare
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #7138 +/- ##
==========================================
- Coverage 71.00% 70.72% -0.28%
==========================================
Files 450 449 -1
Lines 86390 85575 -815
Branches 3 3
==========================================
- Hits 61338 60526 -812
+ Misses 24984 24983 -1
+ Partials 68 66 -2
|
Note: this changes the VOM, but not the collection manager (which still has two sources of GC sensitivity). I'll address those in a separate PR. Also note that for some reason, this change increases the sensitivity of the real-GC tests to Node.js and AVA being weird (#5575), specifically in @erights Please note that I'm changing |
Datadog ReportBranch report: ❌ ❌ Failed Tests (6)
|
ef539c1
to
2ed0d9e
Compare
2ed0d9e
to
0a1c34e
Compare
03c779e
to
b01ac26
Compare
b24d868
to
4a83063
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial review, but the three files still to go are the ones with all the substance:
- packages/swingset-liveslots/test/test-liveslots-mock-gc.js
- packages/swingset-liveslots/test/test-liveslots-real-gc.js
- packages/swingset-liveslots/src/virtualObjectManager.js
packages/swingset-liveslots/test/virtual-objects/test-cross-facet.js
Outdated
Show resolved
Hide resolved
/** | ||
* @typedef { (representative: any) => any } ContextProvider | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about narrowing this?
/** | |
* @typedef { (representative: any) => any } ContextProvider | |
*/ | |
/** | |
* @typedef {{self: any, state: any}} ClassContext | |
* @typedef {{self: any, facets: Record<string, any>}} ClassKitContext | |
* @typedef {(representative: any) => ClassContext} ContextProvider | |
* @typedef {Record<facetName: string, (facet: any) => ClassKitContext>} ContextProviderKit | |
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, also note: the change this comment is referring to has been landed already, as part of #7304 , so further improvements to the type shouldn't affect this PR (unless swingset is doing something wrong)
* This cache is part of the virtual object manager and is not intended to be | ||
* used independently; it is exported only for the benefit of test code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really not already have a suitable preexisting implementation of this? I wonder if it merits a ticket/TODO for e.g. merging with makeReadCachingStorage
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really hesitant to share an implementation with other customers whose needs may diverge, or which might provoke the package that hosts the cache to add new dependencies (I'd want to put the cache in its own package, with no additional dependencies). Also I think the "hold until flushed" behavior might not be useful in as many other contexts as the more traditional LRU.
That said, makeReadCachingStorage
(in cosmic-swingset
) looks like an 80% match. The overhead of a new package feels like it would outweigh the savings made by sharing this with the other cache, but maybe it'd make sense.
packages/swingset-liveslots/test/virtual-objects/test-empty-data.js
Outdated
Show resolved
Hide resolved
packages/swingset-liveslots/test/virtual-objects/test-virtualObjectManager.js
Outdated
Show resolved
Hide resolved
packages/swingset-liveslots/test/virtual-objects/test-virtualObjectManager.js
Outdated
Show resolved
Hide resolved
packages/swingset-liveslots/test/virtual-objects/test-vo-real-gc.js
Outdated
Show resolved
Hide resolved
ae15780
to
9c3b8f5
Compare
4a83063
to
c645b4a
Compare
c645b4a
to
54797ea
Compare
two known tests still failing:
|
Be aware of #7286 , which I think is a complementary optimization. But I'm not sure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The overall structure of virtualObjectManager.js makes sense and seems good, but I would like some more complete commentary. Everything else is just suggested tweaks to code organization and/or messaging.
I haven't yet reviewed test-liveslots-{mock,real}-gc.js, but am pausing this for now to work on liveslots export/representative visibility and #7083.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I got nuthin' but nit picks.
@@ -26,12 +26,9 @@ test('cache', t => { | |||
|
|||
// new reads pass through immediately to backing store | |||
t.is(c.get('key0'), 'value0'); | |||
t.deepEqual(log.splice(0), [['read', 'key0']]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
splice
is a confusing operation that I have to look up the docs for each time I use it. I'm wondering if it would be clearer to have a helper function that consumes the head of the log and returns it (which of course would be written with splice
, but only in one place). Seems like we have something like that in other tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, see #7138 (comment)
I like
splice
for supporting the ability to make assertions about a batch of entries rather than one at a time, and really this should be an external test helper where the implementation is explained inline but never really considered at call sites because it Just Works as expected (above suggestion updated with that in mind)./** * Clear the contents of a log array and assert that they match expectations. * * @param {import('ava').ExecutionContext} t * @param {unknown[]} log * @param {unknown[]} expected * @returns {void} */ const clearAndAssertLog = (t, log, expected) => { // `log.splice(0)` removes elements starting at index 0 (i.e., every element) // and returns them in an array. const entries = log.splice(0); t.deepEqual(entries, expected); }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, two votes for a helper, I'll start using one in the future
4513d7e
to
7ca461c
Compare
608857b
to
14af542
Compare
14af542
to
9beef4a
Compare
This cache accumulates all data until an explicit flush, at which point it writes back all dirty state, and forgets all state. This avoids the somewhat unpredictable behavior of an LRU cache, which caused us problems with GC sensitivity. It suffers from the potential for larger RAM usage between flushes, but is quite simple, and yields easy-to-debug syscall behavior. We have other caches in our platform, but liveslots needs something with these specific properties, and I don't know what the needs in the rest of the platform are, so I'm fine with not sharing this implementation with other packages, at least for now. refs #6693
This overhauls the VOM, to drastically simplify the creation of virtual-object Representatives, and remove the syscall sensitivity to GC. The old code created `state` (with accessors for each state property) and `context` (which holds both `state` and `self` or the `facets` cohort) each time it needed to create a Representative, which is every time a virtual/durable vref is deserialized but a Representative was not present in the slotToVal table (thus sensitive to GC). This was partially a holdover from the time when Representatives were behavior methods closing over `state`. Since then, we moved to a model where Representatives are empty objects, with a special prototype that holds the behavior methods. We use `defendPrototype` from `exo-tools.js` to build this prototype, which wraps each behavior method with code that looks the Representative up in a "context map" to find the `context` object. This reduces memory usage: we share the prototype object among all instances of the Kind, and the Representatives have no own properties. It also enables the key improvement of this commit: we can defer creating `context` until userspace actually invokes a method on the Representative. We replace the context map with a `contextProvider`: a function that takes the Representative and looks up (or creates) a context. Since userspace invocation of methods is unaffected by GC, calls to the contextProvider are too, making it a safe place to perform syscalls (such as the vatstoreGet needed to learn the state properties and create the `state` accessors). The cache ensures that we create the context at most once per delivery. We use two instances of the cache: one for `context` (including `state`), and a second for the actual state data. For each baseref, the state data cache holds the full `capdatas` record (separate capdata for each property name). It also holds a sparsely/lazily-populated `valueMap`, a Map of unserialized values. These are made available to `state` getters. When modifying a property on `state`, we immediately serialize it and hold it in the cache. We also perform refcount delta management at that time. Later, during flush(), we reassemble the capdatas (some modified, some pristine) into the value of a `syscall.vatstoreSet`. We defer unserialization until a getter is called, which reduces the constraints on userspace behavior, and making virtual objects behave more like virtual collections. We remove `virtualObjectCacheSize` from vat options, since the new cache size is unlimited, and thus not configurable. This changes the `slotToVal` mapping to hold the cohort *record*, where previously it held an Array. This is for multi-facet Kinds, whose list of named facets is learned during `defineKindMulti`. Each Kind has a fixed list of facet names, so each facet has both a name and an index. Userspace receives these as a record, with one property per facet. Internally, we need the index to encode into the vref, e.g. `o+11/5:1` means the second (0-indexed) facet of the 5th (1-indexed) instance of the Kind with KindID 11. We also use the index when looking up the export status, which is recorded in one vatstore key per baseref, with one character per facet, so 'rrn' means the first two facets are reachable, and the third is unreachable. Previously, during construction or reanimation, we also created an *Array* of facets, and recorded them in the slotToVal Map (under the baseref as a key). When doing `convertSlotToVal()`, we'd extract the baseref from the vref, use it to fetch the array, then use the integer facet index to extract the specific facet. This complicated the construction process, because we needed to create two separate containers for the set of facets, which could have separate lifetimes. We needed a third object (named `keepAlive`) to help maintain the cyclic GC-reference subgraph containing all facets and the user-visible facet record. With this commit, we discard this extra Array, and store the facet record itself in slotToVal. `convertSlotToVal()` asks the VRM to convert the facet index (extracted from the vref) into a facet name, and then looks it up in the slotToVal record. When registering a cohort, `registerValue` iterates through the facet names to make sure we register each separate facet in the reverse-pointing valToSlot table. This gets rid of `toRegister`, `toExpose`, and `keepAlive`.
* remove virtualObjectCacheSize from options * change expectations about when vatstoreSet calls happen * add some tests * fix test-cross-facet to expect a different error message
for some reason, these recent changes have caused 20% of 'yarn test' runs to fail, in test-virtualObjectGC.js, in varying test functions. This happens even when I run just test-virtualObjectGC.js, with --serial, --no-worker-threads, --concurrency 1. The symptom is consistent with GC not successfully releasing the target object. We never figured out why one engineGC() call seemed to help, but apparently three calls work even better.
* remove cacheSize / virtualObjectCacheSize from options * update convertSlotToVal to expect facets object, not array
This affects both static-vat creation options in `config.vats.NAME.creationOptions.virtualObjectCacheSize`, the dynamic-vat options to `E(vatAdminSvc).createVat()`, and the vatAdmin `changeOptions()` facility.
I saw a test failure here, "historical inaccuracy in replay", which might have been due to GC happening one way in the original, and a different way in the replay (when running under Node.js). This feels like an aspect of #5575, and this test isn't trying to exercise anything about GC, so I'm just going to set defaultReapInterval to 'never' to inhibit BOYD, which should remove the problem.
9beef4a
to
99522ea
Compare
rewrite virtual object manager to use the new cache
This overhauls the VOM, to drastically simplify the creation of
virtual-object Representatives, and remove the syscall sensitivity to
GC.
The old code created
state
(with accessors for each state property)and
context
(which holds bothstate
andself
or thefacets
cohort) each time it needed to create a Representative, which is every
time a virtual/durable vref is deserialized but a Representative was
not present in the slotToVal table (thus sensitive to GC). This was
partially a holdover from the time when Representatives were behavior
methods closing over
state
.Since then, we moved to a model where Representatives are empty
objects, with a special prototype that holds the behavior methods. We
use
defendPrototype
fromexo-tools.js
to build this prototype,which wraps each behavior method with code that looks the
Representative up in a "context map" to find the
context
object. This reduces memory usage: we share the prototype object among
all instances of the Kind, and the Representatives have no own
properties.
It also enables the key improvement of this commit: we can defer
creating
context
until userspace actually invokes a method on theRepresentative. We replace the context map with a
contextProvider
: afunction that takes the Representative and looks up (or creates) a
context. Since userspace invocation of methods is unaffected by GC,
calls to the contextProvider are too, making it a safe place to
perform syscalls (such as the vatstoreGet needed to learn the state
properties and create the
state
accessors).The cache ensures that we create the context at most once per
delivery.
We use two instances of the cache: one for
context
(includingstate
), and a second for the actual state data.We remove
virtualObjectCacheSize
from vat options, since the newcache does not have a configurable size limit.