Store stateShape in VOM metadata #7338

Closed
mhofman opened this issue Apr 5, 2023 · 3 comments · Fixed by #7365

Comments


mhofman commented Apr 5, 2023

What is the Problem Being Solved?

To enforce compatibility of stateShape checks when upgrading (#7337), we need to persist the stateShape across upgrades. Currently the pattern is closed over by the kind definition, and thus lost.

Description of the Design

Add stateShape to the durable metadata already saved about Virtual Objects (keyShape, valueShape).

mhofman added the enhancement (New feature or request), SwingSet (package: SwingSet), and liveslots (requires vat-upgrade to deploy changes) labels Apr 5, 2023
warner self-assigned this Apr 5, 2023
warner added this to the Vaults EVP milestone Apr 5, 2023

warner commented Apr 7, 2023

Lemme review the basics, to understand what's really required here.

Virtual/Durable-object Kinds have stateShape. Virtual/Durable collections have keyShape and valueShape.

stateShape is one-per-Kind, and Kinds are low-cardinality (unlike keyShape/valueShape, which are one-per-collection, where collections are high cardinality). So we don't need to evict stateShape from RAM to meet our memory goals (we do need to evict keyShape/valueShape from RAM to meet our goals).

Virtual Kinds don't outlive the incarnation, nor do their instances, so their stateShape doesn't need to be remembered into the next incarnation.

Durable Kinds are redefined at the start of each incarnation (and if buildRootObject fails to redefine one, the upgrade is flunked). stateShape is an argument to defineDurableKind, not makeKindHandle, so it can change each time, hence the #7337 concern about incompatibility.

I just finished a PR (#7334) to make sure keyShape/valueShape vrefs are refcounted properly. In conjunction with that, we require that durable collections have durable shape patterns, otherwise a future incarnation won't be able to reanimate the shapes.

It would make sense to enforce a similar requirement for stateShape, but:

  • the new incarnation must supply a new shape, it doesn't have the option of continuing to use any saved one
    • so we don't need to reanimate the pattern, at least for ordinary operational purposes
  • if the pattern doesn't use an M.or, any saved state will contain every object that appeared in the pattern
    • so the saved state will keep those objects alive, independent of any refcounts maintained by the pattern itself
    • if it does use M.or, then the state might match a branch that doesn't hold exact objects, so the pattern might be the only thing keeping those objects around

Now, if we want the new incarnation's defineDurableKind({ newStateShape }) to perform some sort of compatibility check, we need to reanimate the old shape and pass both to a checker function. Which means we need refcounts on the objects in the shape, so the reanimation can succeed. If the checker approves and we replace the recorded stateShape, then we want to do our usual vrm.updateReferenceCounts(oldSlots, newSlots) to release the old objects and add new ones. We can imagine a migration of stateShape from:

  • v1: { field: BLDBrand }
  • v2: M.or({ field: BLDBrand }, { field: OtherBrand })
  • v3: { field: OtherBrand }

such that instances created under v1 are still readable in v2. Some migration process rewrites all those instances during v2, so that by the time we upgrade to v3, all the records are compatible with the new constraint. Our refcounts would hold onto both brands during v2, then drop BLDBrand when v3 does its defineDurableKind.
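The v1/v2/v3 refcount bookkeeping could be sketched roughly as follows. This is not the real vrm.updateReferenceCounts: the helper `updateShapeRefcounts`, the plain `Map` of counts, and the vref strings `o-50`/`o-51` are all made up for illustration.

```javascript
// Hypothetical stand-in for liveslots' refcount bookkeeping.
// `refcounts` maps a vref string to the number of references held on it.
function updateShapeRefcounts(refcounts, oldSlots, newSlots) {
  const oldSet = new Set(oldSlots);
  const newSet = new Set(newSlots);
  // Add a reference for each vref that appears only in the new shape.
  for (const vref of newSet) {
    if (!oldSet.has(vref)) {
      refcounts.set(vref, (refcounts.get(vref) || 0) + 1);
    }
  }
  // Release the reference for each vref that appears only in the old shape.
  for (const vref of oldSet) {
    if (!newSet.has(vref)) {
      const remaining = (refcounts.get(vref) || 0) - 1;
      if (remaining > 0) {
        refcounts.set(vref, remaining);
      } else {
        refcounts.delete(vref);
      }
    }
  }
}

// Slots referenced by each version of the shape (vrefs are invented).
const shapeSlots = {
  v1: ['o-50'], // BLDBrand
  v2: ['o-50', 'o-51'], // BLDBrand and OtherBrand
  v3: ['o-51'], // OtherBrand
};

const refcounts = new Map();
updateShapeRefcounts(refcounts, [], shapeSlots.v1);
updateShapeRefcounts(refcounts, shapeSlots.v1, shapeSlots.v2);
const duringV2 = new Map(refcounts); // snapshot: both brands are held
updateShapeRefcounts(refcounts, shapeSlots.v2, shapeSlots.v3);
```

After the last call only the OtherBrand vref is still counted, which is the "drop BLDBrand when v3 does its defineDurableKind" step.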

But if we want compression too, then we need to know which shape was used for each individual instance. We could keep them all, in the DurableKindDescriptor record, indexed by incarnation number. And we'd add a createdByIncarnation: NN field to each virtual object record, somehow.

Or (as @erights suggested) we calculate a slower-moving counter that only increments when we observe the shape changing (which might reduce the rate of data-upgrade calls we must make). Or, defineDurableKind examines the complete history of shapes and deduces the transition points, creating a surjective Map of "if the instance says incarnation X, put it through the following upgrader functions to bring it up to date", where many consecutive values of X get the same treatment.
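One way to picture the "deduce the transition points" idea is the sketch below. It assumes the shape history is available as one serialized string per incarnation and that userspace supplies one upgrader per shape transition; `buildUpgradePlan`, `upgradeState`, and the example upgraders are hypothetical names, not an existing API.

```javascript
// shapeHistory[i] is the serialized shape recorded for incarnation i.
// upgraders[k] converts state from the k-th distinct shape to the next one.
function buildUpgradePlan(shapeHistory, upgraders) {
  // Assign each incarnation an "era" that only advances when the shape changes.
  const eraOf = [];
  let era = 0;
  shapeHistory.forEach((shape, i) => {
    if (i > 0 && shape !== shapeHistory[i - 1]) {
      era += 1;
    }
    eraOf.push(era);
  });
  const lastEra = era;
  // Many incarnations map onto the same chain of upgraders.
  const plan = new Map();
  eraOf.forEach((e, incarnation) => {
    plan.set(incarnation, upgraders.slice(e, lastEra));
  });
  return plan;
}

function upgradeState(plan, incarnation, state) {
  return plan.get(incarnation).reduce((s, upgrade) => upgrade(s), state);
}

// Six incarnations, but only two shape transitions (A -> B -> C).
const shapeHistory = ['A', 'A', 'B', 'B', 'B', 'C'];
const aToB = state => ({ ...state, amount: 0 });
const bToC = state => ({ ...state, memo: '' });
const plan = buildUpgradePlan(shapeHistory, [aToB, bToC]);
const upgraded = upgradeState(plan, 0, { field: 1 });
```

Incarnations 0 and 1 share the full chain, incarnations 2 through 4 only need the second upgrader, and incarnation 5 needs none.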

Since we're thinking about data-migration processes here, we might also want some per-incarnation counters on the Kind, to remember how many objects we have of each incarnation, to schedule background migration work. Or an accomplishment tracker that remembers "all objects with ID below this value have been upgraded".


warner commented Apr 8, 2023

Some more thoughts:

  • without compression, a vref-bearing stateShape (along with key/value shapes) is really a recognizer: you can think of a shape-constrained Kind as a sort of WeakSet, where the predicate "is object X in the Set?" is expressed with:
const initData = obj => ({ obj });
const behavior = {};
function createPredicate(objectX) {
  // the shape constrains the `obj` property of the state record
  const stateShape = harden({ obj: M.or(objectX, 0) });
  // stateShape only takes effect when passed in the options bag
  const makeFoo = defineDurableKind(handle, initData, behavior, { stateShape });
  const isX = harden(specimen => {
    try {
      makeFoo(specimen); // drop any successful result
      return true;
    } catch (e) {
      return false; // assume matches() failed
    }
  });
  return isX;
}

We immediately drop any successfully-created instances, so the object is never serialized long enough to reach the database (modulo BOYD timing), so that data never holds a strong reference to the object. Technically, we could implement this to have the stateShape hold merely a weak reference, but it's really really not worth the effort. I'm content to have the shape hold a strong reference.

Note that there is no way to retrieve the stateShape from a Kind: it is a write-only (and write-once-per-incarnation) property of the Kind.

  • with compression, it's almost a kind of degenerate immutable one-entry container:
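const initData = obj => ({ obj });
const behavior = {
  get: ({ state }) => state.obj,
};
function hold(objectX) {
  // the shape pins the `obj` property to exactly objectX
  const stateShape = harden({ obj: objectX });
  // stateShape only takes effect when passed in the options bag
  const makeFoo = defineDurableKind(handle, initData, behavior, { stateShape });
  const foo = makeFoo(objectX);
  const get = harden(() => foo.get());
  return get;
}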

The serialized instance data, in the DB, does not hold the object: it gets compressed out. Only the stored stateShape holds it.

Again, technically, we could implement this in a way that had stateShape hold a weak reference (what it really needs is a recognizer, a predicate to decide whether the supplied state matches the constraint, and that can be implemented with a WeakSet or WeakRef, without keeping the constraint objects alive strongly). But again it's not worth the effort.

Since Kinds are low-cardinality, it's also not worth the effort to allow any Presences or Representatives in stateShape to be evicted from RAM. We'll continue to have stateShape be held in a strong Map, populated at defineKind time, and held for the rest of the incarnation.

The Plan

For durable Kinds, we'll serialize stateShape and increment refcounts on any objects therein. This means we can deserialize the stateShape in the next incarnation, even though we'll always be supplied with a replacement (because stateShape is an option on defineDurableKind rather than on makeKindHandle, and defineDurableKind is called once-per-incarnation, and must be called for every durable Kind or else buildRootObject fails).

By serializing it, we'll be able to write a comparison/compatibility-checking function to apply in the future. By incrementing refcounts, we'll be able to give that function unserialized Patterns instead of asking it to work on serialized data.

If the shape is unconditional (no M.or), then we'll get one refcount increment for every instance of the Kind, because every instance must include that object. If the shape is conditional we may or may not get a refcount increment, depending upon whether the M.or branch with the object was matched, or the branch without. This is independent of whether compression is implemented or not.

TODO: we might consider not incrementing refcounts on the stateShape-mandated "constant" vrefs for every instance of the Kind, under the theory that the single reference from the serialized stateShape is sufficient to keep those objects alive. This would save some time during state changes. However, we'd still need to increment refcounts of any non-constant vrefs.

For now, we'll store this serialized stateShape in a key named stateShape in the KindDescriptor, next to { kindID, tag, nextInstanceID }. We'll continue to not include any versioning information on each VO record (vatstoreSet('vom.o+d12/1', data), where data is a JSON-encoded record of propname: valueCapdata entries).
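As a rough sketch of that layout, with an in-memory `Map` standing in for the vatstore: the key name `vom.dkind.<kindID>`, the helper names, and the capdata body string below are placeholders chosen for illustration, not the actual liveslots key schema or real marshal output.

```javascript
// In-memory stand-in for the vatstore.
const vatstore = new Map();

// Hypothetical key layout for the per-Kind descriptor.
const descriptorKey = kindID => `vom.dkind.${kindID}`;

// Store the serialized shape next to { kindID, tag, nextInstanceID }.
function saveKindDescriptor(descriptor, stateShapeCapdata) {
  const record = { ...descriptor, stateShape: stateShapeCapdata };
  vatstore.set(descriptorKey(descriptor.kindID), JSON.stringify(record));
}

// A later incarnation can read the old shape back before replacing it.
function loadSavedStateShape(kindID) {
  const raw = vatstore.get(descriptorKey(kindID));
  if (raw === undefined) {
    return undefined;
  }
  return JSON.parse(raw).stateShape;
}

saveKindDescriptor(
  { kindID: '12', tag: 'Foo', nextInstanceID: 1 },
  // Placeholder capdata: a body string plus the vrefs it mentions.
  { body: '{"field":"<slot 0>"}', slots: ['o-50'] },
);
const savedShape = loadSavedStateShape('12');
```

The `slots` array is what the refcount increment would iterate over, and it is also what a future compatibility checker would need in order to reanimate the old pattern.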

Next, when we introduce compression, we can leave the structures alone, and just change the serialization/deserialization code to refer to the stored shape when decompressing.

Around that time, we can introduce versioned records. The KindDescriptor can continue to use .stateShape to refer to "version 0", but we add a new .stateShapes that is either an array or a record, with numerically-versioned shapes. We might also add a flag to each shape, or in a per-version metadata record, to indicate whether compression is in use or not.

The VOM data records must change: our current record-of-capdatas structure has completely consumed the space of possible records (we don't forbid any property names, even an empty string, so there's no room left for metadata). So we'll change them to an array, of [ metadata, propertyCapdata ], and metadata.version can be used to lookup the particular shape, like kindDescriptor.stateShapes[metadata.version], or perhaps { stateShape, compressed } = kindDescriptor.versions[metadata.version].
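A minimal sketch of the proposed `[ metadata, propertyCapdata ]` record, assuming the `kindDescriptor.versions[metadata.version]` variant; `encodeRecord`, `decodeRecord`, and the example descriptor are hypothetical.

```javascript
// Encode a VO record as [ metadata, propertyCapdata ].
function encodeRecord(version, propertyCapdata) {
  return JSON.stringify([{ version }, propertyCapdata]);
}

// Decode a record and look up the shape that was in force when it was written.
function decodeRecord(kindDescriptor, raw) {
  const [metadata, propertyCapdata] = JSON.parse(raw);
  const entry = kindDescriptor.versions[metadata.version];
  if (!entry) {
    throw Error(`unknown record version ${metadata.version}`);
  }
  const { stateShape, compressed } = entry;
  return { stateShape, compressed, propertyCapdata };
}

// Example descriptor with two versions; shapes are shown as opaque strings.
const exampleDescriptor = {
  kindID: '12',
  tag: 'Foo',
  versions: [
    { stateShape: 'shape-v0', compressed: false },
    { stateShape: 'shape-v1', compressed: true },
  ],
};

// An empty-string property name is legal, which is why the metadata cannot
// live alongside the properties in a single record.
const rawRecord = encodeRecord(1, { '': { body: '1', slots: [] } });
const decoded = decodeRecord(exampleDescriptor, rawRecord);
```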

In this period, upgrades must not change the shape, or at least they must not change the order of the stored fields. But knowing the shape of each version, we might be able to accommodate shape changes somehow.

The KindDescriptor also holds the next instance ID, so it's a bit too "hot" to hold large cold data like state shapes. So we might want to split it into separate vatstore keys, a hot counter and a cold metadata one. Given the disruption of that, we might want to do that now, rather than later, even if we don't need the extra metadata for a while yet.
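The hot/cold split might look something like this sketch. The two key names (`.descriptor` and `.nextID`) and the helper names are assumptions made for illustration; the point is only that allocating an instance ID no longer rewrites the large shape data.

```javascript
// In-memory stand-in for the vatstore, counting writes per key.
const store = new Map();
const writeCounts = new Map();
function storeSet(key, value) {
  store.set(key, value);
  writeCounts.set(key, (writeCounts.get(key) || 0) + 1);
}

// Hypothetical key names: one cold key for metadata, one hot key for the counter.
const coldKey = kindID => `vom.dkind.${kindID}.descriptor`;
const hotKey = kindID => `vom.dkind.${kindID}.nextID`;

function initKind(kindID, tag, stateShape) {
  storeSet(coldKey(kindID), JSON.stringify({ kindID, tag, stateShape }));
  storeSet(hotKey(kindID), '1');
}

// Only the small counter key is rewritten on each allocation.
function allocateInstanceID(kindID) {
  const id = Number(store.get(hotKey(kindID)));
  storeSet(hotKey(kindID), String(id + 1));
  return id;
}

initKind('12', 'Foo', { body: '{}', slots: [] });
const firstID = allocateInstanceID('12');
const secondID = allocateInstanceID('12');
```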


warner commented Apr 9, 2023

Some migration process rewrites all those instances during v2, so that by the time we upgrade to v3, all the records are compatible with the new constraint

To be clear, I'm also ok with lazy migration, or best-effort migration that may not complete upgrading v1 to v2 by the time we switch to v3.

It sounds like @erights and @mhofman are thinking that userspace will be obligated to handle state data from any previous version, i.e. super-lazy migration. We'll need to provide userspace with some hint to let it know which era the data was from. This might be a numeric version ID, or some other string. I had been thinking of an integer that is incremented each time we redefine the Kind (once per incarnation, so basically the incarnation number), or maybe once each time the stateShape changes, but that would be awkward to tell userspace about (an extra return value from defineDurableKind? eww). @mhofman had a better idea:

When defining the kind you provide 3 things: state shape, shape id, state migration function. When an object of the old shape is accessed the first time, the migration function is called with old state and corresponding shape id, and must return new state that checks against new shape.

We could include a constraint that if defineDurableKind is called with a "shape ID" that already exists, then the provided stateShape must be identical. Otherwise, userspace is free to use whatever number or string it likes.
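That constraint could be checked along these lines. The sketch assumes shapes can be compared by their serialized form (which presumes a canonical serialization); `recordShape` and `knownShapes` are hypothetical names.

```javascript
// shapeID -> serialized stateShape, as remembered from earlier incarnations.
const knownShapes = new Map();

// Accept a (shapeID, shape) pair only if it does not contradict history.
function recordShape(shapeID, serializedShape) {
  const existing = knownShapes.get(shapeID);
  if (existing !== undefined && existing !== serializedShape) {
    throw Error(
      `shape ID ${shapeID} was already used with a different stateShape`,
    );
  }
  knownShapes.set(shapeID, serializedShape);
}

recordShape('v1', '{"field":"BLDBrand"}');
recordShape('v1', '{"field":"BLDBrand"}'); // same ID, same shape: accepted
recordShape('v2', '{"field":"OtherBrand"}'); // new ID: accepted

let conflictMessage;
try {
  recordShape('v1', '{"field":"OtherBrand"}'); // same ID, different shape
} catch (e) {
  conflictMessage = e.message;
}
```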

When calling makeState, we unpack the state capdata record, create state (with getters and setters from the record's property names), then we examine its recorded shape ID. If that does not match the current incarnation's ID, we call the userspace migration function with that getters/setters object as oldState, and the shape ID / version, and we expect back a newState, which is examined just like the initialData object passed into makeFoo(). We serialize and record the contents (marked as using the current shape ID), then build a new state with the new getters/setters (with possibly different property names), and finally build a context around the new/correct state. So contexts remain immutable, and behavior methods never see an out-of-date state.
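The makeState flow described above can be approximated with the sketch below. It simplifies heavily: state is a plain frozen object rather than getters/setters, the record layout `{ schemaVersion, props }` is invented, and `makeStateLoader`, `checkState`, and `migrate` are hypothetical names for the pieces userspace and liveslots would each supply.

```javascript
function makeStateLoader({ store, currentSchemaVersion, checkState, migrate }) {
  return function loadState(key) {
    let record = JSON.parse(store.get(key));
    if (record.schemaVersion !== currentSchemaVersion) {
      // Hand the old data and its version to the userspace migration function.
      const oldState = Object.freeze({ ...record.props });
      const newProps = migrate(oldState, record.schemaVersion);
      // Treat the result like initialData: it must satisfy the current shape.
      checkState(newProps);
      // Rewrite the record, marked as using the current version.
      record = { schemaVersion: currentSchemaVersion, props: { ...newProps } };
      store.set(key, JSON.stringify(record));
    }
    // Behavior methods only ever see up-to-date state.
    return { ...record.props };
  };
}

// Example: the timestamp changes from milliseconds to seconds.
const exampleStore = new Map([
  [
    'vom.o+d12/1',
    JSON.stringify({ schemaVersion: 'v1', props: { timestampMs: 5000 } }),
  ],
]);
const loadState = makeStateLoader({
  store: exampleStore,
  currentSchemaVersion: 'v2',
  checkState: props => {
    if (typeof props.timestampSec !== 'number') {
      throw Error('state does not match the v2 shape');
    }
  },
  migrate: (oldState, version) => {
    if (version === 'v1') {
      return { timestampSec: oldState.timestampMs / 1000 };
    }
    throw Error(`no migration from ${version}`);
  },
});
const migratedState = loadState('vom.o+d12/1');
```

A second `loadState` call on the same key finds the record already at the current version and skips the migration function.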

My one naming quibble is that the interpretation of the state data might change even though the shape does not (imagine changing a timestamp from milliseconds to seconds, without also changing the property name). So schemaVersion would feel better to me than shapeVersion.

As written, there's no support for letting userspace authors know when it's safe to stop supporting a given old version. It might be nice to give them a histogram of how many instances are using which versions of the data. This could either be done in-line (maintain some durable counters, add an API to retrieve them), or offline (publish a tool that crawls the DB and produces a report). The offline form would have less impact on the code, of course, and it's not clear that the data is needed by vat code, when it's the author of that vat code who needs to act upon it.
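The offline form could be as small as the following sketch, which assumes records use the `[ metadata, propertyCapdata ]` layout with a `metadata.version` field and that the tool can iterate over vatstore key/value pairs; `countInstancesByVersion` and the sample entries are hypothetical.

```javascript
// Count how many instances of one Kind are stored under each version.
function countInstancesByVersion(entries, kindPrefix) {
  const histogram = new Map();
  for (const [key, value] of entries) {
    if (!key.startsWith(kindPrefix)) {
      continue; // belongs to a different Kind
    }
    const [metadata] = JSON.parse(value);
    const count = histogram.get(metadata.version) || 0;
    histogram.set(metadata.version, count + 1);
  }
  return histogram;
}

// Sample data standing in for a crawl of the DB.
const dbEntries = [
  ['vom.o+d12/1', JSON.stringify([{ version: 0 }, {}])],
  ['vom.o+d12/2', JSON.stringify([{ version: 1 }, {}])],
  ['vom.o+d12/3', JSON.stringify([{ version: 1 }, {}])],
  ['vom.o+d13/1', JSON.stringify([{ version: 0 }, {}])],
];
const histogram = countInstancesByVersion(dbEntries, 'vom.o+d12/');
```

Once the count for an old version reaches zero, the vat author would know that the corresponding migration path is no longer exercised by stored data.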

warner added a commit that referenced this issue Apr 9, 2023
Save a serialized copy of the Kind's `stateShape` option, so future
incarnations can compare the old one against newer ones when they
re-define the Kind. Increment refcounts on any objects included in the
shape. Forbid the use of non-durable objects in the shape.

closes #7338
refs #7337
@mergify mergify bot closed this as completed in #7365 Apr 13, 2023