feat: NEAR Lake Helper (high-level Lake Framework) #51
Conversation
LGTM!
I'm not approving only because you made this a draft and I want to review the final version as well.
```rust
//!
//! ## Cost estimates
//!
//! **TL;DR** approximately $18.15 per month (for AWS S3 access, paid directly to AWS) for the reading of fresh blocks
```
As far as I know, we have a plan to gate everything behind our keys and charge extra for access.
@pkudinov could you please clarify?
Disregard the docs for now; they are not updated.
I do remember the talks about serving everything with the Pagoda keys. It is not implemented on the infra side yet; meanwhile, we continue to support direct access to AWS S3 through the developers' keys. And I believe we won't disable it anyway.
```rust
    block_heights.len()
);

start_from_block_height = *block_heights.last().unwrap() + 1;
```
nit: we can get rid of the `unwrap()` by using `if let Some(...)` a few lines earlier
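A minimal sketch of the suggestion, assuming `block_heights` is a `Vec<u64>` as in the surrounding code; the fallback to the current height on an empty batch is an assumption for illustration, not the crate's actual behavior:

```rust
// Hypothetical helper illustrating the `if let Some(...)` shape suggested
// above; `current` stands in for the existing start_from_block_height.
fn next_start_from_block_height(block_heights: &[u64], current: u64) -> u64 {
    if let Some(last) = block_heights.last() {
        // Same arithmetic as the original line, but no unwrap() to panic
        // on an empty batch.
        last + 1
    } else {
        current
    }
}
```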
```rust
    }
}

async fn fast_fetch_block_heights(
```
Does it work faster with this? 😅
Jokes aside, do we need such a prefix?
I am going to rethink these pieces @frol has introduced.
The naming is definitely not the best here. The intent was to fetch a list of block heights without getting blocked on the next batch. Maybe this is a better name: `try_fetching_block_heights_without_waiting`.
```rust
async fn fast_fetch_block_heights(
    pending_block_heights: &mut std::pin::Pin<&mut impl tokio_stream::Stream<Item = u64>>,
    limit: usize,
    await_for_at_least_one: bool,
```
`await_for_at_least_one_block`?
Good question, I will check it out and provide doc-strings.
```rust
"`prev_hash` does not match, refetching the data from S3 in 200ms",
);
tokio::time::sleep(std::time::Duration::from_millis(200)).await;
break;
```
Not sure, but maybe it's better to move the inner loop (`while let Some(streamer_message_result) = streamer_messages_futures.next().await`) into its own function.
It would improve readability; it would be easier to tell which loop we are breaking out of here.
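A hedged sketch of the refactor idea: pull the inner loop into a named function so the outer loop's exit points read clearly. The async stream is simplified to a plain iterator of `(prev_hash, hash)` pairs here, and all names are illustrative rather than the crate's actual API:

```rust
// Consume messages while each message's prev_hash matches the hash we saw
// last; return the offending prev_hash on a mismatch, or None when the
// iterator is exhausted cleanly. In the real code, a mismatch is where the
// caller would sleep 200ms and refetch from S3.
fn drain_until_mismatch(
    messages: impl Iterator<Item = (u64, u64)>,
    mut last_hash: u64,
) -> Option<u64> {
    for (prev_hash, hash) in messages {
        if prev_hash != last_hash {
            return Some(prev_hash);
        }
        last_hash = hash;
    }
    None
}
```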
I'll consider it. Thanks
```rust
use near_primitives_core::serialize::{base64_format, dec_format};

#[derive(Debug, Clone)]
pub struct ExecutedReceipt {
```
I am thinking about renaming it to just `Receipt`, as I also want to provide `included_receipts` on the `Block`, and executed receipts will differ from included ones only by the status.
I am open to propositions on how to differentiate `ActionReceipt`s from `DataReceipt`s without following the `ReceiptEnumView` pattern.
I would consider making `Receipt` an enum:

```rust
enum Receipt {
    ActionReceipt(ActionReceipt),
    DataReceipt(DataReceipt),
}
```

Alternatively, I would consider splitting them into separate vectors, `action_receipts: Vec<ActionReceipt>` and `data_receipts: Vec<DataReceipt>` - I believe this is the most straightforward way, as I have not yet seen a case where users need to deal with a mix of action and data receipts.
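The separate-vectors alternative could look like the sketch below; the struct and field names follow the comment thread and are not the published near-lake-primitives API:

```rust
// Hypothetical stand-ins for the two receipt kinds.
#[derive(Debug, Clone)]
pub struct ActionReceipt {
    pub receipt_id: u64,
}

#[derive(Debug, Clone)]
pub struct DataReceipt {
    pub data_id: u64,
}

// The Block keeps the two kinds apart, so users never have to match on a
// mixed collection.
#[derive(Debug, Clone, Default)]
pub struct Block {
    pub action_receipts: Vec<ActionReceipt>,
    pub data_receipts: Vec<DataReceipt>,
}
```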
@frol thanks. The idea is great, though I would need to have something like `.action_receipts_included()`, `.data_receipts_included()`, and `.receipts_executed()` (`DataReceipt` is non-executable).
However, I already have the `.actions()` method that returns a collection of `Action` structures, which is sort of an `ActionReceipt` but concentrated more on the action side than on the receipt side. It might be confusing; I need to think a bit more about this.
Thank you for your inputs!
```rust
SuccessValue(Vec<u8>),
SuccessReceiptId(CryptoHash),
Failure(String),
Unknown,
```
Can someone remind me in which cases `ExecutionStatusView` might be `Unknown`?
cc @frol @telezhnaya
It is an internal nearcore status used when the execution has not yet started. It is impossible to observe an execution outcome with such a status once it is included in a block. Consider removing it. Ideally, nearcore should better separate types instead of leaking impossible states to the outside world.
That's almost how I thought it worked. I think I am going to make it `Pending` in Lake's `ExecutionStatus` for `Receipt`s that are included yet not executed. What do you think?
From a user point of view, I would prefer to get a list of items that I can immediately deal with, and mixing included and executed receipts does not make it simpler as users will need to filter out not-yet-executed receipts. Thus, I am not supporting the idea of artificial execution status, especially given that execution status is a property of an ExecutionOutcome with logs and a result.
@frol I mean I want to give users the same `Receipt` structure with `status = ExecutionStatus::Pending` and `logs: vec![]` when they request something like `block.receipts_included()`. Executed ones would be available via `block.receipts_executed()`.
I find it convenient from the developer's perspective. Most indexers work only with executed receipts and don't care about included ones unless they are executed. Makes sense to you?
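A sketch of the shape being proposed here, with a simplified `Receipt`; the `Pending` variant and the constructor are the proposal from this thread, not shipped code:

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ExecutionStatus {
    SuccessValue(Vec<u8>),
    Failure(String),
    // Proposed variant for receipts that are included but not yet executed.
    Pending,
}

#[derive(Debug, Clone)]
pub struct Receipt {
    pub status: ExecutionStatus,
    pub logs: Vec<String>,
}

impl Receipt {
    // What receipts_included() would hand back for a receipt whose
    // ExecutionOutcome is not in this block yet.
    pub fn included_pending() -> Self {
        Self {
            status: ExecutionStatus::Pending,
            logs: vec![],
        }
    }
}
```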
@khorolets trying to chime in here. The differentiation between `block.receipts_included()` and `block.receipts_executed()` threw me off in the beginning.
I think `block.receipts_pending()` makes more sense to me than `block.receipts_included()`, if the only difference between them is that receipts from `.receipts_included()` have not been executed yet and are thus in the `ExecutionStatus::Pending` state.
@roshaans Sorry, it doesn't work. Receipts are included in a block only once (e.g. block 100), and at that point they might be considered pending, but when you call this method on the next block, you won't get them there. They are still pending, but no longer included, and might not be executed yet.
@roshaans I like the direction of your thinking re "pending".
@khorolets I want to propose considering `postponed_receipts`.
```rust
}

#[derive(Debug, Clone)]
pub struct BlockHeader {
```
Am I missing some useful data from the `BlockHeaderView`? Wdyt?
I see the following important fields:
- `timestamp_nanosec`
- `epoch_id`
- `next_epoch_id`
- `gas_price`
- `total_supply`
- `latest_protocol_version`
- `random_value`
- `chunks_included`?
- `validator_proposals`?
Will there still be access to the "raw" block header?
There is always access to the entire `StreamerMessage`, of course.
I am going to add these fields, I guess.
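A sketch of a `BlockHeader` carrying the fields listed above; the types are simplified stand-ins (hashes as byte arrays, balances as `u128`), not the crate's real definitions:

```rust
// Simplified stand-in for near_primitives' CryptoHash.
type CryptoHash = [u8; 32];

#[derive(Debug, Clone)]
pub struct BlockHeader {
    pub height: u64,
    pub timestamp_nanosec: u64,
    pub epoch_id: CryptoHash,
    pub next_epoch_id: CryptoHash,
    pub gas_price: u128,
    pub total_supply: u128,
    pub latest_protocol_version: u32,
    pub random_value: CryptoHash,
    pub chunks_included: u64,
}
```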
```rust
pub enum ExecutionStatus {
    SuccessValue(Vec<u8>),
    SuccessReceiptId(CryptoHash),
    Failure(String),
```
Should we use `Failure(TxExecutionError)`, from `near_primitives::transaction::ExecutionStatus`, instead of `String`?
Perhaps, but I don't like it too much. I've put `String` here for now; later I'll decide what to do with the `TxExecutionError`.
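A minimal sketch of this interim approach: keep `Failure(String)` and stringify the error at a single conversion point, so swapping in a dedicated error type later only touches one place. The conversion function is hypothetical:

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ExecutionStatus {
    SuccessValue(Vec<u8>),
    Failure(String),
}

// Hypothetical conversion point; in the real crate the source would be
// near_primitives' TxExecutionError, which implements Display.
fn failure_from_error(err: &dyn std::fmt::Display) -> ExecutionStatus {
    ExecutionStatus::Failure(err.to_string())
}
```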
@khorolets It's hard to determine what actually changed here. Are you able to separate moving files and the refactor into separate commits, please?
Since I want to prevent significant updates to the 0.7 version of the Lake Framework, I've decided to merge this PR into a separate branch. My intent is to ensure the introduced Lake Primitives structures are feature-complete and won't require breaking changes right after the release. I am going to create a milestone and tasks to track the work. Stay tuned, and reach out if you have questions.
This abstraction over the core primitives was very much needed! Just left a few questions I had.
```rust
.streamer_message
.shards
.iter()
.filter_map(|shard| shard.chunk.as_ref().map(|chunk| chunk.receipts.iter()))
```
Are all receipts inside `shard.chunk.receipts` considered postponed receipts?
Good catch, thanks! Postponed receipts are those that have no `ExecutionOutcome` in this block. I'll fix it in a separate PR.
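The fix described here could be sketched as: postponed receipts are the chunk receipts whose ids have no matching `ExecutionOutcome` in the same block. Receipt ids are simplified to `u64` for illustration:

```rust
use std::collections::HashSet;

// Keep only the chunk receipts that were not executed in this block.
fn postponed_receipt_ids(chunk_receipt_ids: &[u64], executed_ids: &[u64]) -> Vec<u64> {
    let executed: HashSet<u64> = executed_ids.iter().copied().collect();
    chunk_receipt_ids
        .iter()
        .copied()
        .filter(|id| !executed.contains(id))
        .collect()
}
```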
```rust
}

#[derive(Debug, Clone)]
pub struct Action {
```
It could be useful to also include `gas_price` for every `Action` as well; I think every `Action` has it.
```rust
.events()
.iter()
.filter_map(|event| if let Some(action) = self.action_by_receipt_id(event.related_receipt_id()) {
    if &action.receiver_id() == account_id || &action.signer_id() == account_id {
```
I think there should be a distinction between events created as a result of an account_id's own actions vs. events created as a result of other users' actions on a given account_id.
Perhaps it makes sense to expose two methods: an `events_on_account_id` method showing all events on a particular account created as a result of other users' activities (e.g. if I am indexing Ref Finance's contract, I would be interested in events created by users interacting with the contract, not the contract maintainer's transactions on that account_id), and an `events_by_account_id` method showing all events where the `account_id` was a signer.
Interesting idea! Could you please add a suggestion to the Lake Primitives page: https://pagodaplatform.atlassian.net/wiki/spaces/DAT/pages/226295830/NEAR+Lake+primitives
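A hedged sketch of the two accessors proposed above; the `Event` fields and both function names mirror the suggestion in this thread and are not an existing API:

```rust
#[derive(Debug, Clone)]
pub struct Event {
    pub signer_id: String,
    pub receiver_id: String,
}

// Events on the account caused by other users' activity.
fn events_on_account_id<'a>(events: &'a [Event], account_id: &str) -> Vec<&'a Event> {
    events
        .iter()
        .filter(|e| e.receiver_id == account_id && e.signer_id != account_id)
        .collect()
}

// Events where the account itself was the signer.
fn events_by_account_id<'a>(events: &'a [Event], account_id: &str) -> Vec<&'a Event> {
    events.iter().filter(|e| e.signer_id == account_id).collect()
}
```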
* feat: NEAR Lake Helper (high-level Lake Framework)
* cover all top-level primitives
* Update changelog
* Add getters to fields in primitives, update the required function, change Event structure

* feat: NEAR Lake Helper (high-level Lake Framework) (#51)
* feat: NEAR Lake Helper (high-level Lake Framework)
* cover all top-level primitives
* Update changelog
* Add getters to fields in primitives, update the required function, change Event structure
* chore: Closes #61: Changes made to the lib should be picked to 0.8.0
* fix(tests): Resolved #60: Fix failing doc tests (#66)
* fix(tests): Resolved #60: Fix failing doc tests
* fix: add proper code annotations where possible
* feat(primitives): Add support for DelegateAction (NEP-366) + LEP-002 (Focus on action and Events) (#67)
* feat(primitives): Add support for DelegateAction (NEP-336)
* Refactor the structures completely (LEP-002)
* (examples) Address review suggestions
* remove refactoring commented leftovers
* (primitives) Address review suggestion, optimize the structure, do some renaming
* Return iterators from methods instead of clone, improve return types, handle tx actions
* add todo for later to create dedicated structures for possible errors of Failure ExecutionStatus
* DelegateAction struct, split receipts file
* remove commented code
* expose delegate_actions mod
* feat: Expose concurrency parameter to the Lake structure (and builder) (#69)
* feat: LEP-001 Introduce Lake Context (#68)
* feat: Implement LEP-001 (Context)
* clean up Lake impl to reuse code
* Update docstrings
* refactor: Resolves #42 replace anyhow with LakeError enum (thiserror) (#70)
* fix: typo in README
* fix: CHANGELOG typos and bring back the concurrency usage lost during merging
* feat: ParentTransactionCache context and example (#74)
* feat: ParentTransactionCache context and example
* Update README for ParentTransactionCache about advanced usage
* refactor: Add LakeContext derive and introduce a concept of LakeContext with calls before and after user function
* Update lake-parent-transaction-cache/README.md (Co-authored-by: Morgan McCauley <[email protected]>)
* address review suggestion: rename field accounts_id -> account_ids
* add comments to the example, add READMEs, drop accidental artifact
* Update documentation, add docs about the LakeContext
* Add ParentTransactionCache to the Lake Framework README
* refactor Lake::run method to drop code duplication (Co-authored-by: Morgan McCauley <[email protected]>)
* docs: Lake Primitives documentation added (#75)
* chore: Upgrade near primitives to 0.17.0 version (#77)
* chore: Prepare 0.8.0-beta.1 release (#78)
* Change to use CUSTOM_GITHUB_TOKEN and add a link to the workflow for the reasoning
* Setup versioning and add release-plz.toml
* Drop the release-plz job limitation on the CI job
* Add missing versions to the changelog
* Add 0.7.x branch to the CI (Co-authored-by: Morgan McCauley <[email protected]>)
This PR aims to introduce a higher-level API for the framework.
NOTE! The target branch for this PR is set to `dev-0.7.0`.
Features:
With this new version, I am trying to make the Framework more high-level, while keeping the original data and allowing developers to benefit from the low-level features it has right now.
Working indexer example
This is an NFT indexer clone implemented in Rust with the high-level Lake Framework:
https://github.com/khorolets/l8kit-nft-indexer
To do:
- `StreamerMessage` substructures have their representation in `lake-primitives`