
Proposal: make Store events independent of schema #1222

Closed
alvrs opened this issue Aug 1, 2023 · 5 comments · Fixed by #1354
Comments

@alvrs
Member

alvrs commented Aug 1, 2023

Current situation

Store events:

event StoreSetRecord(bytes32 tableId, bytes32[] key, bytes data);
event StoreSetField(bytes32 tableId, bytes32[] key, uint8 schemaIndex, bytes data);
event StoreDeleteRecord(bytes32 tableId, bytes32[] key);
event StoreEphemeralRecord(bytes32 table, bytes32[] key, bytes data);
  • StoreSetRecord is independent of schema
  • StoreSetField only contains data of a single field and needs context of the schema to know how to update the record
  • StoreDeleteRecord is independent of schema
  • StoreEphemeralRecord is independent of schema. I will ignore this event for the rest of this proposal because it is handled identically to StoreSetRecord.
  • We are missing events for updating partial data of a field (e.g. StoreSetPartialField). This is relevant for updating dynamic fields (e.g. pushing an element to an array). Currently we need to load the entire field from storage onchain to update it and emit it as part of the StoreSetField event. Instead we should have an event that accepts partial field data and sufficient context for indexers/clients to be able to update the dynamic field.
  • Since StoreSetRecord and StoreDeleteRecord are independent of the schema, they can be processed in parallel: a simple event indexer can load these events from all blocks in a block range in parallel and insert them into a database including a reference to the block in which they occurred. In a second step this list of events can be filtered to only include the last event corresponding to each record.
  • With StoreSetField we can still sync all events in parallel, but the filter step becomes much more complex: we only care about the StoreSetField events after the last StoreSetRecord event for a given record, but to apply these events to compute the final record value we need context on the record’s schema (because the event includes a schemaIndex to indicate which field was updated). The table’s schema is stored in the schema table, and is emitted as a StoreSetRecord event before the first event of the new table. In other words, processing the StoreSetField event depends on other events earlier in the event stream.
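The two-step sync described above (parallel fetch, then filter per record) can be sketched roughly as follows. The event shape and `reduceToLatest` helper are illustrative, not actual MUD types:

```typescript
// Hypothetical event shapes for a schema-less indexer (illustrative only).
type StoreEvent =
  | { type: "StoreSetRecord"; tableId: string; key: string; data: string; blockNumber: number }
  | { type: "StoreDeleteRecord"; tableId: string; key: string; blockNumber: number };

// Step 1 (parallelizable): fetch events from all block ranges in any order.
// Step 2: reduce the unordered list to the last event per record.
function reduceToLatest(events: StoreEvent[]): Map<string, StoreEvent> {
  const latest = new Map<string, StoreEvent>();
  for (const event of events) {
    const recordId = `${event.tableId}:${event.key}`;
    const previous = latest.get(recordId);
    if (!previous || event.blockNumber >= previous.blockNumber) {
      latest.set(recordId, event);
    }
  }
  return latest;
}
```

Because StoreSetRecord and StoreDeleteRecord each carry the full record state (or its deletion), this reduction needs no schema context and no ordering guarantees beyond the block number.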

Proposal

Change the StoreSetField event to a more pure StoreSpliceRecord event

  • If we change the StoreSetField event to include sufficient context to update the record without context of the schema, processing StoreSetField events becomes much simpler.
event StoreSpliceRecord(bytes32 tableId, bytes32[] key, uint40 start, uint40 deleteCount, bytes data)
  • Leaning on JavaScript’s Array.splice we can define an event to insert/update/delete data in the record in a single atomic operation.
  • tableId and key behave as before: they uniquely identify a record
  • start is the byte index at which to change the record
    • if start >= record.length no data will be deleted, but the data passed as last argument will be appended at the end of the record
  • deleteCount is the number of bytes to delete starting from start
  • data is the data to be inserted after start
  • With this event it is possible for an indexer to sync each record’s raw bytes value without context of the schema. Without the schema the record is just a blob of bytes and can’t be interpreted, but the interpretation can be a thin layer on top of a highly performant bytes indexer.
  • In the extreme case it would be possible to use StoreSpliceRecord for all operations and remove StoreSetRecord (and potentially even StoreDeleteRecord). However, I think it’s still useful to have an explicit notion of setting a full record and deleting a record: it makes it easier to reduce a list of events into the current record value, because all events before a StoreSetRecord or StoreDeleteRecord for a given record can be ignored.
  • StoreSpliceRecord could also be used for updating a partial field instead of an explicit StoreSetPartialField
  • Downside of this approach: it’s slightly more expensive to update a record’s value on an indexer/client that decodes the records. Previously the StoreSetField event explicitly indicated which field to update, while StoreSpliceRecord would require those clients to first encode the current record, then update the blob of bytes, and finally decode the record again. StoreSpliceRecord has no guarantee to only touch a single field, so it would be much more complex to compute which field the byte index corresponds to in order to update only that field. This also means it’s more complex to know which field was updated (you’d have to compare the decoded previous record with the decoded record after the update), but our current libraries only indicate which record was updated as well.
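The splice semantics above (delete `deleteCount` bytes at `start`, insert `data`, append when `start >= record.length`) can be sketched as a small helper. `applySplice` is a hypothetical function for illustration, not part of the actual Store codebase:

```typescript
// Apply a StoreSpliceRecord-style update to a record's raw bytes,
// mirroring JavaScript's Array.splice (illustrative sketch).
function applySplice(
  record: Uint8Array,
  start: number,
  deleteCount: number,
  data: Uint8Array
): Uint8Array {
  // If start >= record.length, head is the whole record and tail is empty,
  // so nothing is deleted and data is appended at the end.
  const head = record.slice(0, start);
  const tail = record.slice(start + deleteCount);
  const result = new Uint8Array(head.length + data.length + tail.length);
  result.set(head, 0);
  result.set(data, head.length);
  result.set(tail, head.length + data.length);
  return result;
}
```

Insert, update, and delete are all the same operation: insert is `deleteCount = 0`, delete is empty `data`, and update is both nonzero.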

Implementation

  • We currently emit StoreSetField in StoreCore's setField, pushToField, popFromField and updateInField methods.
  • setField
    1. Compute start from schemaIndex by computing the sum of all field lengths before the schema index (_getStaticDataOffset for static fields, full static data length + 32B packed counter + sum of all dynamic fields up to the field to update for dynamic fields)
    2. Compute deleteCount (static byte length for static fields, dynamic length for dynamic fields)
    3. Pass data as it is
  • pushToField
    1. Compute start (full static length + 32B packed counter + lengths of all dynamic fields including the field to push to)
    2. Set deleteCount to 0
    3. Pass data as it is
  • popFromField
    1. Compute start (full static length + 32B packed counter + lengths of all dynamic fields including the field to pop from - byteLengthToPop)
    2. Set deleteCount to byteLengthToPop
    3. Pass new bytes(0) as data
  • updateInField
    1. Compute start (full static length + 32B packed counter + lengths of all dynamic fields up to the field to update + startByteIndex)
    2. Set deleteCount to dataToSet.length
    3. Pass dataToSet as data
  • Main downside: we need one additional sload to get the current length of all dynamic fields to compute start
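The offset math in the steps above can be sketched as follows, assuming the record layout described in this proposal ([static fields][32-byte packed length counter][dynamic fields]). The helpers and the explicit length arrays are illustrative, not actual StoreCore code:

```typescript
// Assumed record layout: [static fields][32B packed counter][dynamic fields].
const PACKED_COUNTER_LENGTH = 32;

// start for a static field: sum of the static field lengths before it
// (the role _getStaticDataOffset plays onchain).
function staticFieldStart(staticFieldLengths: number[], schemaIndex: number): number {
  return staticFieldLengths
    .slice(0, schemaIndex)
    .reduce((sum, length) => sum + length, 0);
}

// start for a dynamic field: full static length + packed counter
// + current lengths of all dynamic fields before it.
function dynamicFieldStart(
  staticFieldLengths: number[],
  dynamicFieldLengths: number[],
  dynamicIndex: number
): number {
  const staticLength = staticFieldLengths.reduce((sum, length) => sum + length, 0);
  const dynamicBefore = dynamicFieldLengths
    .slice(0, dynamicIndex)
    .reduce((sum, length) => sum + length, 0);
  return staticLength + PACKED_COUNTER_LENGTH + dynamicBefore;
}
```

The dynamic-field case is where the extra sload comes from: `dynamicFieldLengths` are current storage values, while the static lengths are known from the schema.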

Additional thoughts

  • Should we remove the concept of schema type from Store and instead just use byte length directly (and have schema as an additional “metadata” table)?
  • Should we change the Store methods to spliceRecord instead of setField, pushToField, popFromField, updateInField?
    • Probably more gas intensive because less specialized
    • Moving complexity outside of StoreCore and inside codegen libraries / consumer
    • → probably not
@dk1a
Contributor

dk1a commented Aug 1, 2023

Should we change the Store methods to spliceRecord instead of setField, pushToField, popFromField, updateInField?
Probably more gas intensive because less specialized

yeah I tried this when making the specialized methods, a naive splice implementation would cost more gas

Moving complexity outside of StoreCore and inside codegen libraries / consumer

we could change StoreCore methods to accommodate libraries doing all the work, at which point there won't be much left of StoreCore (it still has to handle dynamic lengths?) and it would be closer to directly mirroring Storage's store(uint256 storagePointer, uint256 offset, bytes memory data). It can't work with schema events, but it can work with generic byte splice events

With StoreSetField we can still sync all events in parallel, but the filter step becomes much more complex: we only care about the StoreSetField events after the last StoreSetRecord event for a given record, but to apply these events to compute the final record value we need context on the record’s schema (because the event includes a schemaIndex to indicate which field was updated)

if we go with schema, we could instead go further and make everything a field. So remove the notion of record, and setRecord emits a bunch of SetField. Then we can sync all field events in parallel (like we can parallelize SetRecord right now)

(both ideas are dubious:
Getting rid of StoreCore likely puts too much burden on codegen, which is harder to test.
Haven't thought much about removing records, just mentioning it)

@alvrs
Member Author

alvrs commented Aug 1, 2023

we could change StoreCore methods to accommodate libraries doing all the work, at which point there won't be much left of StoreCore (it still has to handle dynamic lengths?) and it would be closer to directly mirroring Storage's store(uint256 storagePointer, uint256 offset, bytes memory data). It can't work with schema events, but it can work with generic byte splice events

So you're saying there are other libraries around store that handle how the low level store methods are used? I guess that would mean the IStore interface and by extension IWorld would also only have those lower level functions and most of the complexity would be in the table libraries (either in codegen or in helper libraries)?

if we go with schema, we could instead go further and make everything a field. So remove the notion of record, and setRecord emits a bunch of SetField. Then we can sync all field events in parallel (like we can parallelize SetRecord right now)

Interesting idea, let's spin it further to see what it would look like. We could have events like

event StoreSetField(bytes32 table, bytes32[] key, uint8 schemaIndex, bytes data) // Exactly how it is currently
event StoreSpliceField(bytes32 table, bytes32[] key, uint8 schemaIndex, uint40 start, uint40 deleteCount, bytes data) // partially update a field

Advantages

  • much less work to decode (because data already has the right length and just needs to be cast into the right type for static fields, or split up and cast for dynamic fields)
  • can be indexed without context of the schema (StoreSpliceField contains enough info to update the field)

Disadvantages

  • More events and more data overhead in the events (approximately +1422 gas per field event, see https://www.evm.codes/?fork=shanghai with 32 bytes for table, 64+ bytes for key, 1 byte for schemaIndex, 32+ bytes for data)
  • I think we'd still have to load the dynamic data length for StoreSpliceField's start
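The per-event overhead can be estimated from the LOG opcode pricing on evm.codes (375 gas static, 375 gas per topic, 8 gas per byte of event data). This is a rough sketch using the approximate byte counts from the comment above, not exact ABI-encoded sizes, so the result lands in the same ballpark as the +1422 figure rather than reproducing it exactly:

```typescript
// Rough LOG opcode gas model (Shanghai pricing from evm.codes),
// ignoring memory expansion and ABI encoding padding.
function logGas(topicCount: number, dataBytes: number): number {
  const LOG_STATIC = 375;
  const TOPIC_COST = 375;
  const DATA_BYTE_COST = 8;
  return LOG_STATIC + topicCount * TOPIC_COST + dataBytes * DATA_BYTE_COST;
}

// One topic for the event signature; approximate payload:
// 32 (table) + 64 (key) + 1 (schemaIndex) + 32 (data) bytes.
const estimate = logGas(1, 32 + 64 + 1 + 32);
```

Multiplying this by the number of fields per record shows why per-field events add up quickly compared to a single per-record event.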

@alvrs
Member Author

alvrs commented Aug 1, 2023

@holic just had a good point: we only need the dynamic data length when setting dynamic fields (if it's a static field we can compute it from the schema), and we already read and write the dynamic data length for setField, pushToField and popFromField, so we could actually compute the byteOffset without additional overhead for storage access. (Only overhead would be the math.) The only function where we currently don't care about the dynamic data length but would need it to compute byteLength is updateInField.

@dk1a
Contributor

dk1a commented Aug 1, 2023

So you're saying there are other libraries around store that handle how the low level store methods are used?

I meant the autogenerated table libraries, separating StoreCore into several libs isn't really related to this

@alvrs
Member Author

alvrs commented Aug 21, 2023

After more consideration it seems like a worthwhile tradeoff to replace StoreSetField with StoreSpliceRecord to make all events independent of the Schema, which enables a global, high performance schema-less indexer. In combination with #1336 schemas then move to the end of the data pipeline: they are stored in the internal Tables table, but are not used onchain or by the global indexer, only by schema-full "re-indexers" and clients.
