Epic: in-process off chain indexing #20352

tac0turtle · 2024-05-11T07:08:25Z

Summary

Indexing data from a chain lets teams build complex front ends that are not limited by the node's performance. We have seen data teams spend countless hours building complex systems just to support those front ends.

State streaming is a good step towards allowing teams to build off chain indexes, but it has limitations. It is not a first class citizen, so off chain actors have to decode the data themselves. This leads to complex software being built.

Lastly, the state machine performs many additional writes that exist only to serve queries. This increases the amount of IO the state machine does. To reduce overhead and make the state machine more performant, it should hold only the state needed to move to the next block. Extra information for queries should be handled by an in-process off chain indexer.

This epic proposes changes to the state machine and the creation of an in-process off chain indexer, allowing users to build more complex applications without being held back by having to maintain complex pieces of software.

The feature should have a plugin-based system that lets teams extend the indexing functionality and create a richer schema than the default offered by the Cosmos SDK team.

There are a few things to be aware of. The state machine differentiates between deleted data and pruned data. Deleted data is data removed as the result of an action. Pruned data is data the state machine no longer needs in order to continue and therefore removes, even though it remains useful for users to look up later.
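As a minimal sketch of the deleted versus pruned distinction, the following assumes an in-memory index that hides deleted rows but keeps pruned rows queryable. All names here (`RemovalKind`, `Index`, `Record`) are hypothetical and are not part of the SDK.

```go
package main

import "fmt"

// RemovalKind is a hypothetical tag recording why a record left the state machine.
type RemovalKind int

const (
	Deleted RemovalKind = iota // removed as the result of an on-chain action
	Pruned                     // dropped because the state machine no longer needs it
)

// Record is an illustrative row held by an off chain index.
type Record struct {
	Key     string
	Value   string
	Removed bool
}

// Index is an in-memory stand-in for an indexer's target database.
type Index struct {
	rows map[string]*Record
}

func NewIndex() *Index { return &Index{rows: map[string]*Record{}} }

// Set inserts or overwrites a row.
func (ix *Index) Set(key, value string) {
	ix.rows[key] = &Record{Key: key, Value: value}
}

// Remove applies a removal. Deleted rows are marked removed; pruned rows are
// left untouched so users can still look up the historical value.
func (ix *Index) Remove(key string, kind RemovalKind) {
	r, ok := ix.rows[key]
	if !ok {
		return
	}
	if kind == Deleted {
		r.Removed = true
	}
}

// Get returns the value if the row exists and has not been marked removed.
func (ix *Index) Get(key string) (string, bool) {
	r, ok := ix.rows[key]
	if !ok || r.Removed {
		return "", false
	}
	return r.Value, true
}

func main() {
	ix := NewIndex()
	ix.Set("proposal/1", "passed")
	ix.Set("vote/1/alice", "yes")
	ix.Remove("vote/1/alice", Pruned) // state machine pruned it; the index keeps it
	ix.Remove("proposal/1", Deleted)  // an action deleted it; the index hides it
	_, ok1 := ix.Get("proposal/1")
	v, ok2 := ix.Get("vote/1/alice")
	fmt.Println(ok1, v, ok2) // prints "false yes true"
}
```

Whether deleted rows should be hidden, soft-deleted, or physically removed is a design choice for each indexer; the sketch only shows that the two cases need to be told apart.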

Problem Definition

Indexing of state events and blocks is a complex process with countless steps needed in order to get enough information to build complex applications.

State streaming is not a first class citizen within the software, forcing users to decode the data they receive.

The state machine stores more data than it needs to because of queries. Reducing the amount of data the state machine stores means less IO and therefore better performance.

Work Breakdown

BaseApp/Server integration

Complete main/0.52 indexer BaseApp Integration #21526
[Feature]: Consider using schema/appdata event model in core/app #21312
server/v2 Indexer Integration #21527
[Feature]: Define the appropriate json structure of BlockHeader and Tx in the listener operations of server/v2 #22009
Indexer config and implement the listener constructor in baseapp and server/v2 #22217
integrate with legacy SDK versions as desired - 0.50, 0.47, etc

`collections` Integration

natively implement HasSchemaCodec with each collections KeyCodec and ValueCodec in both collections and the SDK. There is a fallback implementation if HasSchemaCodec isn't implemented but we should avoid relying on that
support schema.ReferenceTypes being returned in collections/codec.SchemaCodec (basically we just need to add a ReferencedTypes []schema.ReferenceType field and then integrate that into Schema.ModuleCodec)
filter out secondary indexes (there is a Collection.isSecondaryIndex method but we're not implementing it anywhere)
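To illustrate the "native implementation with a fallback" idea from the list above, here is a rough sketch using simplified local stand-ins. The `Field`, `SchemaCodec`, and `HasSchemaCodec` types below only mimic the shape of the idea and are not the real `collections/codec` or `schema` definitions; `Uint64Key`, `OpaqueKey`, and `schemaCodecFor` are hypothetical.

```go
package main

import "fmt"

// Field is a simplified stand-in for a schema field description.
type Field struct {
	Name string
	Kind string
}

// SchemaCodec is a simplified stand-in describing how a Go value maps to
// schema fields. It is not the real collections/codec type.
type SchemaCodec[T any] struct {
	Fields       []Field
	ToSchemaType func(T) (any, error)
}

// HasSchemaCodec mimics the optional interface a codec can implement to
// describe itself natively.
type HasSchemaCodec[T any] interface {
	SchemaCodec() (SchemaCodec[T], error)
}

// Uint64Key is a hypothetical key codec with a native schema description.
type Uint64Key struct{}

func (Uint64Key) SchemaCodec() (SchemaCodec[uint64], error) {
	return SchemaCodec[uint64]{
		Fields:       []Field{{Name: "id", Kind: "uint64"}},
		ToSchemaType: func(v uint64) (any, error) { return v, nil },
	}, nil
}

// OpaqueKey is a hypothetical codec that does not implement HasSchemaCodec.
type OpaqueKey struct{}

// schemaCodecFor returns the codec's native schema description when it has
// one. Otherwise it falls back to a lossy, string-typed description, which is
// the kind of path the list above suggests not relying on.
func schemaCodecFor[T any](codec any) (SchemaCodec[T], bool, error) {
	if hc, ok := codec.(HasSchemaCodec[T]); ok {
		sc, err := hc.SchemaCodec()
		return sc, true, err
	}
	return SchemaCodec[T]{
		Fields:       []Field{{Name: "value", Kind: "string"}},
		ToSchemaType: func(v T) (any, error) { return fmt.Sprint(v), nil },
	}, false, nil
}

func main() {
	native, isNative, _ := schemaCodecFor[uint64](Uint64Key{})
	fallback, isFallbackNative, _ := schemaCodecFor[uint64](OpaqueKey{})
	fmt.Println(native.Fields[0].Kind, isNative)           // prints "uint64 true"
	fmt.Println(fallback.Fields[0].Kind, isFallbackNative) // prints "string false"
}
```

The fallback loses type information (everything becomes a string), which is one reason a native implementation per codec would be preferable.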

Module Integration

Every module should have schema.HasModuleCodec implemented starting with:

x/bank
x/staking
x/gov
x/distribution
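As a rough illustration of what a module codec does, the sketch below decodes raw key/value writes into typed object updates. The types are simplified local stand-ins and not the real `schema` package, and the `bal/<address>/<denom>` key layout with an 8-byte big-endian amount is an assumption made up for this example; it is not how x/bank stores balances.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"strings"
)

// KVPairUpdate is a simplified stand-in for a raw store write.
type KVPairUpdate struct {
	Key, Value []byte
	Remove     bool
}

// ObjectUpdate is a simplified stand-in for a decoded, typed update.
type ObjectUpdate struct {
	TypeName string
	Key      string
	Value    uint64
	Delete   bool
}

// ModuleCodec is a simplified stand-in: the object types a module exposes and
// a decoder from raw writes to typed updates.
type ModuleCodec struct {
	ObjectTypes []string
	KVDecoder   func(KVPairUpdate) ([]ObjectUpdate, error)
}

// HasModuleCodec mimics the interface each module would implement.
type HasModuleCodec interface {
	ModuleCodec() (ModuleCodec, error)
}

// toyBank is a hypothetical module using the made-up key layout described above.
type toyBank struct{}

func (toyBank) ModuleCodec() (ModuleCodec, error) {
	return ModuleCodec{ObjectTypes: []string{"balances"}, KVDecoder: decodeBalance}, nil
}

// decodeBalance turns one raw write into zero or one balance updates.
func decodeBalance(kv KVPairUpdate) ([]ObjectUpdate, error) {
	key := string(kv.Key)
	if !strings.HasPrefix(key, "bal/") {
		return nil, nil // not an object this module exposes to indexers
	}
	u := ObjectUpdate{TypeName: "balances", Key: strings.TrimPrefix(key, "bal/"), Delete: kv.Remove}
	if kv.Remove {
		return []ObjectUpdate{u}, nil
	}
	if len(kv.Value) != 8 {
		return nil, errors.New("balance value must be 8 bytes")
	}
	u.Value = binary.BigEndian.Uint64(kv.Value)
	return []ObjectUpdate{u}, nil
}

func main() {
	var m HasModuleCodec = toyBank{}
	codec, _ := m.ModuleCodec()
	val := make([]byte, 8)
	binary.BigEndian.PutUint64(val, 100)
	updates, err := codec.KVDecoder(KVPairUpdate{Key: []byte("bal/alice/stake"), Value: val})
	fmt.Println(updates, err)
}
```

The point of the pattern is that decoding happens once, inside the node, so downstream indexers receive typed updates instead of raw bytes.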

Non-collections Modules

orm integration
x/group custom ORM integration

Events

support event, tx, and block data simulation in cosmossdk.io/schema/testing/appdatasim
support event, tx and block data indexing in Postgres

Indexing Framework Support

The indexer.Start method is missing support for:

catch up sync
filtering

@aaronc has unmerged code for the above in the aaronc/indexer-manager-impl branch (only tests are missing)
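One possible reading of catch-up sync and filtering is sketched below: an indexer that is behind the chain replays the blocks it missed, and a module filter decides which updates it keeps. This is not the code from the branch mentioned above; the `Indexer`, `Filter`, and `CatchUp` names and the block-replay approach are assumptions for illustration only.

```go
package main

import "fmt"

// Update and Block are simplified stand-ins for indexed data.
type Update struct {
	Module string
	Key    string
}

type Block struct {
	Height  uint64
	Updates []Update
}

// Filter selects which modules to index. An empty Include set means all modules.
type Filter struct {
	Include map[string]bool
}

func (f Filter) Allows(module string) bool {
	return len(f.Include) == 0 || f.Include[module]
}

// Indexer is a hypothetical indexer that remembers the last height it indexed.
type Indexer struct {
	LastHeight uint64
	Filter     Filter
	Seen       []Update
}

// Apply indexes one block. Already-indexed blocks are skipped and gaps are errors.
func (ix *Indexer) Apply(b Block) error {
	if b.Height <= ix.LastHeight {
		return nil
	}
	if b.Height != ix.LastHeight+1 {
		return fmt.Errorf("gap: last indexed %d, got %d", ix.LastHeight, b.Height)
	}
	for _, u := range b.Updates {
		if ix.Filter.Allows(u.Module) {
			ix.Seen = append(ix.Seen, u)
		}
	}
	ix.LastHeight = b.Height
	return nil
}

// CatchUp replays blocks LastHeight+1..chainHeight using a caller-supplied loader.
func (ix *Indexer) CatchUp(chainHeight uint64, load func(uint64) (Block, error)) error {
	for h := ix.LastHeight + 1; h <= chainHeight; h++ {
		b, err := load(h)
		if err != nil {
			return err
		}
		if err := ix.Apply(b); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	history := map[uint64]Block{
		1: {Height: 1, Updates: []Update{{Module: "bank", Key: "a"}}},
		2: {Height: 2, Updates: []Update{{Module: "staking", Key: "v"}}},
		3: {Height: 3, Updates: []Update{{Module: "bank", Key: "b"}}},
	}
	load := func(h uint64) (Block, error) {
		b, ok := history[h]
		if !ok {
			return Block{}, fmt.Errorf("block %d not available", h)
		}
		return b, nil
	}
	// The indexer stopped after height 1 and only cares about the bank module.
	ix := &Indexer{LastHeight: 1, Filter: Filter{Include: map[string]bool{"bank": true}}}
	err := ix.CatchUp(3, load)
	fmt.Println(len(ix.Seen), ix.LastHeight, err) // prints "1 3 <nil>"
}
```

A real implementation might catch up from a state snapshot rather than by replaying blocks; the sketch only shows the bookkeeping involved.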

Migration Support

Some of this stuff could be considered phase 2.

cosmossdk.io/schema/diff allows diffing of different versions of schemas. In Postgres we should add support for:

whenever InitializeModuleData is called, we should save the JSON of the schema in a Postgres table for that module if we are seeing the module for the first time; otherwise, retrieve the existing schema and compare it with the new one for changes. Initially we should reject changes
in the second phase, we should do ALTER TABLE statements for all compatible changes in the diff from one schema version to the next
the schema.ModuleCodec type should have some field which allows specifying indexer specific options as either interface{} or string
Postgres should specify some indexer specific options to run custom migrations on startup and to use the migrations to override cases where the schema diff has incompatible changes
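The first phase described above (store the schema JSON on first sight, reject any later change) could look roughly like the following. An in-memory map stands in for the Postgres table, and `SchemaStore`, `ModuleSchema`, and `InitializeModule` are hypothetical names; the byte-for-byte JSON comparison is a simplification of what cosmossdk.io/schema/diff would provide.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// Field and ModuleSchema are simplified stand-ins for a module's schema.
type Field struct {
	Name string `json:"name"`
	Kind string `json:"kind"`
}

type ModuleSchema struct {
	Fields []Field `json:"fields"`
}

// ErrSchemaChanged is returned while migrations are not yet supported.
var ErrSchemaChanged = errors.New("schema changed; migrations are not supported yet")

// SchemaStore stands in for a Postgres table keyed by module name.
type SchemaStore struct {
	saved map[string][]byte
}

func NewSchemaStore() *SchemaStore { return &SchemaStore{saved: map[string][]byte{}} }

// InitializeModule saves the schema JSON the first time a module is seen and
// rejects any difference on later calls.
func (s *SchemaStore) InitializeModule(module string, sch ModuleSchema) error {
	bz, err := json.Marshal(sch)
	if err != nil {
		return err
	}
	prev, ok := s.saved[module]
	if !ok {
		s.saved[module] = bz // first time we see this module
		return nil
	}
	if !bytes.Equal(prev, bz) {
		return fmt.Errorf("module %q: %w", module, ErrSchemaChanged)
	}
	return nil
}

func main() {
	store := NewSchemaStore()
	v1 := ModuleSchema{Fields: []Field{{Name: "address", Kind: "string"}}}
	v2 := ModuleSchema{Fields: []Field{{Name: "address", Kind: "bytes"}}}
	fmt.Println(store.InitializeModule("bank", v1)) // first sight: saved
	fmt.Println(store.InitializeModule("bank", v1)) // unchanged: accepted
	fmt.Println(store.InitializeModule("bank", v2)) // changed: rejected
}
```

In the second phase, the rejection branch would instead compute a diff and apply ALTER TABLE statements for the compatible changes.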

Phase 2

A mirror of cosmossdk.io/schema is being built in Rust in ixc_schema. Some additional types are being added there, and they are proposed on the Go side in #21482. Once that work is further along we will want to make sure the Rust and Go schema packages have parity. We will also need a native schema wire format for indexing cross-language modules (also proposed in #21482). We are also talking about a native proto -> schema mapping on the Rust side, which we may (or may not) want to port over to Go. We have also talked about and specified a native JSON encoding which could be used for genesis or even signing.

parity with Rust ixc_schema (additional Struct, OneOf, etc. types)
binary wire encoding
JSON encoding
proto encoding


github-actions bot added the needs-triage Issue that needs to be triaged label May 11, 2024

tac0turtle added this to Cosmos-SDK May 11, 2024

github-project-automation bot moved this to 📋 Backlog in Cosmos-SDK May 11, 2024

tac0turtle added T: Client UX T:Epic Epics and removed needs-triage Issue that needs to be triaged labels May 11, 2024

tac0turtle mentioned this issue May 13, 2024

[Epic]: State machine needs vs Client needs #18000

Closed

2 tasks

coderabbitai bot mentioned this issue Jun 3, 2024

docs: ADR 073: Built-in Indexer #20532

Merged

12 tasks

aaronc self-assigned this Jun 5, 2024

This was referenced Jun 11, 2024

feat: indexer base types #20629

Merged

feat(indexer/base): schema and value validation #20665

Merged

tac0turtle moved this from 📋 Backlog to 🤸‍♂️ In Progress in Cosmos-SDK Jul 4, 2024

aaronc mentioned this issue Jul 10, 2024

ORM SQL indexing #11000

Closed

github-project-automation bot added this to Interchain Public Works Sep 3, 2024

tac0turtle assigned cool-develope Sep 25, 2024

cool-develope mentioned this issue Oct 23, 2024

feat(indexer): implement schema.HasModuleCodec interface in the bank module #22349

Merged

12 tasks

mergify bot mentioned this issue Oct 30, 2024

feat(indexer): implement schema.HasModuleCodec interface in the bank module (backport #22349) #22398

Merged

12 tasks

julienrbrt assigned facundomedica Nov 13, 2024
