
Proposal: make Store events independent of schema #1222

Closed
alvrs opened this issue Aug 1, 2023 · 5 comments · Fixed by #1354
Comments

@alvrs
Member

alvrs commented Aug 1, 2023

Current situation

Store events:

event StoreSetRecord(bytes32 tableId, bytes32[] key, bytes data);
event StoreSetField(bytes32 tableId, bytes32[] key, uint8 schemaIndex, bytes data);
event StoreDeleteRecord(bytes32 tableId, bytes32[] key);
event StoreEphemeralRecord(bytes32 table, bytes32[] key, bytes data);
  • StoreSetRecord is independent of schema
  • StoreSetField only contains data of a single field and needs context of the schema to know how to update the record
  • StoreDeleteRecord is independent of schema
  • StoreEphemeralRecord is independent of schema. I will ignore this event for the rest of this proposal because it is handled identically to StoreSetRecord.
  • We are missing events for updating partial data of a field (e.g. StoreSetPartialField). This is relevant for updating dynamic fields (e.g. pushing an element to an array). Currently we need to load the entire field from storage onchain to update it and emit it as part of the StoreSetField event. Instead we should have an event that accepts partial field data and sufficient context for indexers/clients to be able to update the dynamic field.
  • Since StoreSetRecord and StoreDeleteRecord are independent of the schema, they can be processed in parallel: a simple event indexer can load these events from all blocks in a block range in parallel and insert them into a database including a reference to the block in which they occurred. In a second step this list of events can be filtered to only include the last event corresponding to each record.
  • With StoreSetField we can still sync all events in parallel, but the filter step becomes much more complex: we only care about the StoreSetField events after the last StoreSetRecord event for a given record, but to apply these events to compute the final record value we need context on the record’s schema (because the event includes a schemaIndex to indicate which field was updated). The table’s schema is stored in the schema table, and is emitted as a StoreSetRecord event before the first event of the new table. In other words, processing the StoreSetField event depends on other events earlier in the event stream.
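The two-step sync described above (parallel fetch, then filter per record) can be sketched roughly as follows. The event shape and `reduceToLatest` helper are illustrative, not actual MUD types:

```typescript
// Hypothetical event shapes for a schema-less indexer (illustrative only).
type StoreEvent =
  | { type: "StoreSetRecord"; tableId: string; key: string; data: string; blockNumber: number }
  | { type: "StoreDeleteRecord"; tableId: string; key: string; blockNumber: number };

// Step 1 (parallelizable): fetch events from all block ranges in any order.
// Step 2: reduce the unordered list to the last event per record.
function reduceToLatest(events: StoreEvent[]): Map<string, StoreEvent> {
  const latest = new Map<string, StoreEvent>();
  for (const event of events) {
    const recordId = `${event.tableId}:${event.key}`;
    const previous = latest.get(recordId);
    if (!previous || event.blockNumber >= previous.blockNumber) {
      latest.set(recordId, event);
    }
  }
  return latest;
}
```

Because StoreSetRecord and StoreDeleteRecord each carry the full record state (or its deletion), this reduction needs no schema context and no ordering guarantees beyond the block number.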

Proposal

Change the StoreSetField event to a more pure StoreSpliceRecord event

  • If we change the StoreSetField event to include sufficient context to update the record without context of the schema, processing StoreSetField events becomes much simpler.
event StoreSpliceRecord(bytes32 tableId, bytes32[] key, uint40 start, uint40 deleteCount, bytes data)
  • Leaning on JavaScript’s Array.splice we can define an event to insert/update/delete data in the record in a single atomic operation.
  • tableId and key behave as before: they uniquely identify a record
  • start is the byte index at which to change the record
    • if start >= record.length no data will be deleted, but the data passed as last argument will be appended at the end of the record
  • deleteCount is the number of bytes to delete starting from start
  • data is the data to be inserted after start
  • With this event it is possible for an indexer to sync each record’s raw bytes value without context of the schema. Without the schema the record is just a blob of bytes and can’t be interpreted, but the interpretation can be a thin layer on top of a highly performant bytes indexer.
  • In the extreme case it would be possible to use StoreSpliceRecord for all operations and remove StoreSetRecord (and potentially even StoreDeleteRecord). However, I think it’s still useful to have an explicit notion of setting a full record and deleting a record: it makes it easier to reduce a list of events into the current record value, because all events before a StoreSetRecord or StoreDeleteRecord for a given record can be ignored.
  • StoreSpliceRecord could also be used for updating a partial field instead of an explicit StoreSetPartialField
  • Downside of this approach: it’s slightly more expensive to update a record’s value on an indexer/client that decodes the records. Previously the StoreSetField event explicitly indicated which field to update, while StoreSpliceRecord would require those clients to first encode the current record, then update the blob of bytes, and finally decode the record again. StoreSpliceRecord has no guarantee to only touch a single field, so it would be much more complex to compute which field the byte index corresponds to in order to update only that field. This also means it’s more complex to know which field was updated (you’d have to compare the decoded previous record with the decoded record after the update), but our current libraries only indicate which record was updated as well.
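The splice semantics above (delete `deleteCount` bytes at `start`, insert `data`, append when `start >= record.length`) can be sketched as a small helper. `applySplice` is a hypothetical function for illustration, not part of the actual Store codebase:

```typescript
// Apply a StoreSpliceRecord-style update to a record's raw bytes,
// mirroring JavaScript's Array.splice (illustrative sketch).
function applySplice(
  record: Uint8Array,
  start: number,
  deleteCount: number,
  data: Uint8Array
): Uint8Array {
  // If start >= record.length, head is the whole record and tail is empty,
  // so nothing is deleted and data is appended at the end.
  const head = record.slice(0, start);
  const tail = record.slice(start + deleteCount);
  const result = new Uint8Array(head.length + data.length + tail.length);
  result.set(head, 0);
  result.set(data, head.length);
  result.set(tail, head.length + data.length);
  return result;
}
```

Insert, update, and delete are all the same operation: insert is `deleteCount = 0`, delete is empty `data`, and update is both nonzero.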

Implementation

  • We currently emit StoreSetField in StoreCore's setField, pushToField, popFromField and updateInField methods.
  • setField
    1. Compute start from schemaIndex by computing the sum of all field lengths before the schema index (_getStaticDataOffset for static fields, full static data length + 32B packed counter + sum of all dynamic fields up to the field to update for dynamic fields)
    2. Compute deleteCount (static byte length for static fields, dynamic length for dynamic fields)
    3. Pass data as it is
  • pushToField
    1. Compute start (full static length + 32B packed counter + lengths of all dynamic fields including the field to push to)
    2. Set deleteCount to 0
    3. Pass data as it is
  • popFromField
    1. Compute start (full static length + 32B packed counter + lengths of all dynamic fields including the field to pop from - byteLengthToPop)
    2. Set deleteCount to byteLengthToPop
    3. Pass new bytes(0) as data
  • updateInField
    1. Compute start (full static length + 32B packed counter + lengths of all dynamic fields up to the field to update + startByteIndex)
    2. Set deleteCount to dataToSet.length
    3. Pass dataToSet as data
  • Main downside: we need one additional sload to get the current length of all dynamic fields to compute start
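The offset math in the steps above can be sketched as follows, assuming the record layout described in this proposal ([static fields][32-byte packed length counter][dynamic fields]). The helpers and the explicit length arrays are illustrative, not actual StoreCore code:

```typescript
// Assumed record layout: [static fields][32B packed counter][dynamic fields].
const PACKED_COUNTER_LENGTH = 32;

// start for a static field: sum of the static field lengths before it
// (the role _getStaticDataOffset plays onchain).
function staticFieldStart(staticFieldLengths: number[], schemaIndex: number): number {
  return staticFieldLengths
    .slice(0, schemaIndex)
    .reduce((sum, length) => sum + length, 0);
}

// start for a dynamic field: full static length + packed counter
// + current lengths of all dynamic fields before it.
function dynamicFieldStart(
  staticFieldLengths: number[],
  dynamicFieldLengths: number[],
  dynamicIndex: number
): number {
  const staticLength = staticFieldLengths.reduce((sum, length) => sum + length, 0);
  const dynamicBefore = dynamicFieldLengths
    .slice(0, dynamicIndex)
    .reduce((sum, length) => sum + length, 0);
  return staticLength + PACKED_COUNTER_LENGTH + dynamicBefore;
}
```

The dynamic-field case is where the extra sload comes from: `dynamicFieldLengths` are current storage values, while the static lengths are known from the schema.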

Additional thoughts

  • Should we remove the concept of schema type from Store and instead just use byte length directly (and have schema as an additional “metadata” table)?
  • Should we change the Store methods to spliceRecord instead of setField, pushToField, popFromField, updateInField?
    • Probably more gas intensive because less specialized
    • Moving complexity outside of StoreCore and inside codegen libraries / consumer
    • → probably not
@dk1a
Contributor

dk1a commented Aug 1, 2023

Should we change the Store methods to spliceRecord instead of setField, pushToField, popFromField, updateInField?
Probably more gas intensive because less specialized

yeah I tried this when making the specialized methods, a naive splice implementation would cost more gas

Moving complexity outside of StoreCore and inside codegen libraries / consumer

we could change StoreCore methods to accommodate libraries doing all the work, at which point there won't be much left of StoreCore (it still has to handle dynamic lengths?) and it would be closer to directly mirroring Storage's store(uint256 storagePointer, uint256 offset, bytes memory data). It can't work with schema events, but it can work with generic byte splice events

With StoreSetField we can still sync all events in parallel, but the filter step becomes much more complex: we only care about the StoreSetField events after the last StoreSetRecord event for a given record, but to apply these events to compute the final record value we need context on the record’s schema (because the event includes a schemaIndex to indicate which field was updated)

if we go with schema, we could instead go further and make everything a field. So remove the notion of record, and setRecord emits a bunch of SetField. Then we can sync all field events in parallel (like we can parallelize SetRecord right now)

(both ideas are dubious:
Getting rid of StoreCore likely puts too much burden on codegen, which is harder to test.
Haven't thought much about removing records, just mentioning it)

@alvrs
Member Author

alvrs commented Aug 1, 2023

we could change StoreCore methods to accommodate libraries doing all the work, at which point there won't be much left of StoreCore (it still has to handle dynamic lengths?) and it would be closer to directly mirroring Storage's store(uint256 storagePointer, uint256 offset, bytes memory data). It can't work with schema events, but it can work with generic byte splice events

So you're saying there are other libraries around store that handle how the low level store methods are used? I guess that would mean the IStore interface and by extension IWorld would also only have those lower level functions and most of the complexity would be in the table libraries (either in codegen or in helper libraries)?

if we go with schema, we could instead go further and make everything a field. So remove the notion of record, and setRecord emits a bunch of SetField. Then we can sync all field events in parallel (like we can parallelize SetRecord right now)

Interesting idea, let's spin it further to see what it would look like. We could have events like

event StoreSetField(bytes32 table, bytes32[] key, uint8 schemaIndex, bytes data) // Exactly how it is currently
event StoreSpliceField(bytes32 table, bytes32[] key, uint8 schemaIndex, uint40 start, uint40 deleteCount, bytes data) // partially update a field

Advantages

  • much less work to decode (because data already has the right length and just needs to be cast into the right type for static fields, or split up and cast for dynamic fields)
  • can be indexed without context of the schema (StoreSpliceField contains enough info to update the field)

Disadvantages

  • More events and more data overhead in the events (approximately +1422 gas per field event, see https://www.evm.codes/?fork=shanghai with 32 bytes for table, 64+ bytes for key, 1 byte for schemaIndex, 32+ bytes for data)
  • I think we'd still have to load the dynamic data length for StoreSpliceField's start
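The per-event overhead can be estimated from the LOG opcode pricing on evm.codes (375 gas static, 375 gas per topic, 8 gas per byte of event data). This is a rough sketch using the approximate byte counts from the comment above, not exact ABI-encoded sizes, so the result lands in the same ballpark as the +1422 figure rather than reproducing it exactly:

```typescript
// Rough LOG opcode gas model (Shanghai pricing from evm.codes),
// ignoring memory expansion and ABI encoding padding.
function logGas(topicCount: number, dataBytes: number): number {
  const LOG_STATIC = 375;
  const TOPIC_COST = 375;
  const DATA_BYTE_COST = 8;
  return LOG_STATIC + topicCount * TOPIC_COST + dataBytes * DATA_BYTE_COST;
}

// One topic for the event signature; approximate payload:
// 32 (table) + 64 (key) + 1 (schemaIndex) + 32 (data) bytes.
const estimate = logGas(1, 32 + 64 + 1 + 32);
```

Multiplying this by the number of fields per record shows why per-field events add up quickly compared to a single per-record event.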

@alvrs
Member Author

alvrs commented Aug 1, 2023

@holic just had a good point: we only need the dynamic data length when setting dynamic fields (if it's a static field we can compute it from the schema), and we already read and write the dynamic data length for setField, pushToField and popFromField, so we could actually compute the byteOffset without additional overhead for storage access. (Only overhead would be the math.) The only function where we currently don't care about the dynamic data length but would need it to compute byteLength is updateInField.

@dk1a
Contributor

dk1a commented Aug 1, 2023

So you're saying there are other libraries around store that handle how the low level store methods are used?

I meant the autogenerated table libraries, separating StoreCore into several libs isn't really related to this

@alvrs
Member Author

alvrs commented Aug 21, 2023

After more consideration it seems like a worthwhile tradeoff to replace StoreSetField with StoreSpliceRecord to make all events independent of the Schema, which enables a global, high performance schema-less indexer. In combination with #1336 schemas then move to the end of the data pipeline: they are stored in the internal Tables table, but are not used onchain or by the global indexer, only by schema-full "re-indexers" and clients.
