-
Notifications
You must be signed in to change notification settings - Fork 653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(indexer): Move state_changes
from StreamerMessage
root to IndexerShard
instead
#6255
refactor(indexer): Move state_changes
from StreamerMessage
root to IndexerShard
instead
#6255
Conversation
f10e529
to
8c9d7d6
Compare
9e694b2
to
4916e1b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
P.S. Run fmt to fix CI
@@ -0,0 +1,19 @@ | |||
[package] | |||
name = "near-indexer-primitives" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a readme explaining why this crates needs to exist, what should belong to this crate, and what shouldn't belong to this crate? "this crates holds types" doesn't answer those questions for me -- we might as well keep them in indexer, and we also might add some logic here, why not?
It seems to me that what actually happens is that "this crate defines over-the-wire schema for JSON messages used by indexer HTTP API". Ie, it seems that this is not about Rust, but rather about a network protocol/API we expose. That is, if true, a very important bit of information, and it currently is missing from the description.
🤔 wait, I think the above is not true. I've just tried removing Serialize, Deserialize
from all intexer messages, and that seems to work -- nothing in this repository seems to rely on serialization. I guess, I am just confused as to why this particular bunch of types needs special treatment and is worth a separate crate :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matklad I'll add a little README.
We are making NEAR Lake - in a few words it's an indexer that is going to save all the data from the network to AWS S3 in JSON files. We are starting some NEAR Lake Framework which is going to read from AWS S3 and stream all the data. Both of the mentioned projects will use indexer primitives, in nearest future, there are going to be even more projects.
The first goal is to make it possible that projects like NEAR Lake Framework and the ones that use it as a dependency won't require an entire nearcore
in dependencies in order to actually use only those primitives.
And we are going to extend the number of types in indexer primitives and adjust the serde_json parameters to improve the way the types are serialized.
I hope that makes sense. Let me know if you have questions regarding this extraction to a separate crate :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this makes sense! And yeah, I want to stress that the salient point of this thing then is not the Rust types, but rather the JSON schema, and all of the related backwards compatibility concerns.
And yeah, this totally needs to be a separate crate. I wonder though if -types
or -views
would be a better public name than -primitives
. In today's usage primitives
is essentially a kitchen sink, and it defines much more than just data schema.
I also wonder if, long term, this should live in some other repository. It feels odd that this defines a data schema, but the actual data isn't used by the repo (so, it's easy for nearcore developers to break this stuff without realizing this). Contrast this with views
, which also do define the schema, but this schema is than used by the code in this repository (the node's RPC API). It seems nicer if indexer-framework
defines just the Rust API (without Serialize), and let's the consumers define their own data schema.
🤔 yeah, I guess long-term I can see two worlds:
- In one world, "indexer schema" is something defined completely outside of nearcore as a part of the lake framework. In this world, there's a separate
indexer-types
crate, but it lives in the repo which implements the actual indexer. We can also imagine two different lakes by two different providers with different schema - In another world, we say that "indexer schema" is something universal, and is as much a part of nearcore as JSON RPC API. In this world, I would expect this to be a module in the crate that defines JSON RPC API (so,
near_primitives::views::indexer
today). In this world, while there might be alternative implementations of the lake, we'd expect them to use the same schema.
But yeah, this is long-term, starting with moving this to a separate crate might make sense (but also do consider just creating a views submodule)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matklad TL;DR: There are several things at play, and you properly identified the long-term plans we actually have.
The types defined in this crate are Rust types for Indexer Framework, which is now tightly coupled with nearcore. You don't see any use of these APIs in nearcore repo because it is designed to be a library (you can find the usage in indexer-for-* projects out there), and we want to stabilize this API in the long term in terms of Rust types and while we already have our own top-level structure, there are still quite some types used in depth (e.g. Transaction and ExecutionOutcome types). Given there is an initiative to decouple JSON RPC types #5516, we decided to postpone solving all the problems.
Second, indeed, we want to be able to serialize the types for near-lake and deserialize it on the reader side without pulling the whole nearcore (we could have get away with pulling the whole nearcore, but due to #3803 it makes things a bit harder on our side, since we often receive questions about M1 support in our tooling and we use M1 ourselves)
Third, there is a plan to further solidify the APIs and extract near-indexer into a separate repo, but to make that happen, nearcore should be published as a crate, and its APIs also stable.
unrelated, but |
417e4a4
to
f277032
Compare
130ce16
to
abf99e1
Compare
abf99e1
to
204fec7
Compare
@@ -8,7 +8,6 @@ pub use near_primitives::{types, views}; | |||
pub struct StreamerMessage { | |||
pub block: views::BlockView, | |||
pub shards: Vec<IndexerShard>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you have to worry about backwards compatibility ? is this message used in communication between two different binaries ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little bit above in this PR we have talked about it, let me know if that doesn't answer your question
chain/indexer/src/streamer/mod.rs
Outdated
receipt_execution_outcomes: vec![], | ||
state_changes: state_changes | ||
.remove(&shard_id) | ||
.expect("StateChanges for given shard should be present"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But some shards (chunks) might be missing in a block, right ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, really, my bad, thanks!
@@ -42,4 +41,5 @@ pub struct IndexerShard { | |||
pub shard_id: types::ShardId, | |||
pub chunk: Option<IndexerChunkView>, | |||
pub receipt_execution_outcomes: Vec<IndexerExecutionOutcomeWithReceipt>, | |||
pub state_changes: views::StateChangesView, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have any tests for this ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had a conversation re: indexer and tests with @matklad recently
Here's the start #6195 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so - no tests ? :-(
204fec7
to
8670763
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine as is, but I'd like to note the following:
- We probably should change the underlying storage to store state changes by shard.
- @frol @khorolets we plan to stop storing state changes in the database in the future Archival nodes do not need to store trie changes for old blocks #6250. It shouldn't affect how this PR works, but just want to give you a heads up.
Thanks for the heads up @bowenwang1996! In #6250 you're saying it should be possible to retrieve StateChanges on the fly, so it means we would be able to return StateChanges from the NEAR Indexer side. cc @frol |
…o `IndexerShard` instead
Co-authored-by: Vlad Frolov <[email protected]>
Co-authored-by: Vlad Frolov <[email protected]>
bd87154
to
5dbd4ac
Compare
pub fn streamer(&self) -> mpsc::Receiver<streamer::StreamerMessage> { | ||
let (sender, receiver) = mpsc::channel(16); | ||
pub fn streamer(&self) -> mpsc::Receiver<StreamerMessage> { | ||
let (sender, receiver) = mpsc::channel(100); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why was it changed to 100? can you add a comment ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the PR description
Increase the streaming channel size from 16 to 100 (reindexing speed)
@@ -42,4 +41,5 @@ pub struct IndexerShard { | |||
pub shard_id: types::ShardId, | |||
pub chunk: Option<IndexerChunkView>, | |||
pub receipt_execution_outcomes: Vec<IndexerExecutionOutcomeWithReceipt>, | |||
pub state_changes: views::StateChangesView, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so - no tests ? :-(
indexer::streamer::types
to separate crateindexer-primitives
for publishingGetStateChangesWithCauseInBlockForTrackedShards
toclient-primitives
and correspondingview_client
handler to returnStateChangesWithCause
for each shard separatelyStreamerMessage
struct, there is nostate_changes
in the root anymore