
feat: NEAR Lake Helper (high-level Lake Framework) #51

Merged: 4 commits merged into dev-0.7.0 on Jan 20, 2023

Conversation

khorolets (Member) commented Dec 15, 2022:

This PR introduces a higher-level API for the framework.

NOTE: the target branch for this PR is dev-0.7.0

Features:

  • Simplified start boilerplate
  • New, more straightforward and meaningful structures
  • Make the NEAR Lake Framework ready to accept third-party extensions (e.g. start options, tx-receipt cache, data parsers, etc.)

With this new version, I am trying to make the Framework more high-level while keeping the original data available, so developers can still benefit from the low-level features it has today.
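For a sense of what the simplified start boilerplate could look like, here is a rough sketch; the LakeBuilder, run, and Block getter names are illustrative assumptions, not necessarily the exact API introduced by this PR:

// Hypothetical high-level entry point: configure the lake, then run a
// per-block handler. All names below are assumptions for illustration.
fn main() -> anyhow::Result<()> {
    near_lake_framework::LakeBuilder::default()
        .mainnet()
        .start_block_height(80_000_000)
        .build()?
        .run(handle_block)?;
    Ok(())
}

async fn handle_block(
    block: near_lake_framework::near_lake_primitives::block::Block,
) -> anyhow::Result<()> {
    // Indexer logic goes here: the simplified Block exposes getters
    // instead of the raw StreamerMessage structures.
    eprintln!("Processing block at height {}", block.block_height());
    Ok(())
}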

Working indexer example

This is an NFT Indexer clone implemented in Rust on top of the high-level Lake Framework:

https://github.com/khorolets/l8kit-nft-indexer

To do:

  • All the StreamerMessage substructures have their representation in lake-primitives
  • Every simplified structure is accessible via getters
  • More helpers
  • Docs
  • Examples

telezhnaya (Contributor) left a review comment:

LGTM!
I'm not approving only because this is still a draft, and I want to review the final version as well.

//!
//! ## Cost estimates
//!
//! **TL;DR** approximately $18.15 per month (for AWS S3 access, paid directly to AWS) for the reading of fresh blocks
Contributor commented:

As far as I know, the plan is to serve everything through our own keys and charge extra for the access.
@pkudinov could you please clarify?

khorolets (Member, Author) commented:

Disregard the docs for now; they have not been updated yet.

I do remember the discussions about serving everything with the Pagoda keys. It is not implemented on the infra side yet; meanwhile, we continue to support direct access to AWS S3 through the developers' own keys, and I believe we won't disable it anyway.

block_heights.len()
);

start_from_block_height = *block_heights.last().unwrap() + 1;
Contributor commented:

nit: we can get rid of the unwrap by using if let Some(...) a few lines earlier
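A minimal sketch of that suggestion (the helper name and signature are illustrative, not taken from the PR):

/// Sketch: derive the next start height without `unwrap`, leaving the
/// current value unchanged when no new heights were fetched.
fn next_start_height(block_heights: &[u64], current: u64) -> u64 {
    if let Some(last) = block_heights.last() {
        *last + 1
    } else {
        current
    }
}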

}
}

async fn fast_fetch_block_heights(
Contributor commented:

Does it work faster with this? 😅

Jokes aside, do we need such a prefix?

khorolets (Member, Author) commented:

I am going to rethink these pieces @frol has introduced.

Collaborator commented:

The naming is definitely not the best here. The intent was to fetch a list of block heights without getting blocked waiting for the next batch. Maybe this is a better name: try_fetching_block_heights_without_waiting

async fn fast_fetch_block_heights(
    pending_block_heights: &mut std::pin::Pin<&mut impl tokio_stream::Stream<Item = u64>>,
    limit: usize,
    await_for_at_least_one: bool,
Contributor commented:

await_for_at_least_one_block?

khorolets (Member, Author) commented:

Good question, I will check it out and provide doc-strings

near-lake-framework/src/streamer.rs (outdated discussion, resolved)
near-lake-framework/src/streamer.rs (outdated discussion, resolved)
"`prev_hash` does not match, refetching the data from S3 in 200ms",
);
tokio::time::sleep(std::time::Duration::from_millis(200)).await;
break;
Contributor commented:

Not sure, but maybe it's better to move the inner loop (while let Some(streamer_message_result) = streamer_messages_futures.next().await { ... }) into its own function.
It would improve readability and make it easier to see which loop we are finishing here.

khorolets (Member, Author) commented:

I'll consider it. Thanks

near-lake-primitives/src/lib.rs (outdated discussion, resolved)
near-lake-primitives/src/types/events.rs (discussion resolved)
use near_primitives_core::serialize::{base64_format, dec_format};

#[derive(Debug, Clone)]
pub struct ExecutedReceipt {
khorolets (Member, Author) commented:

I am thinking about renaming it to just Receipt, as I also want to provide included_receipts on the Block, and executed receipts will differ from included ones only by their status.

I am open to suggestions on how to differentiate ActionReceipts from DataReceipts without following the ReceiptEnumView pattern.

Collaborator commented:

I would consider making Receipt an enum:

enum Receipt {
    ActionReceipt(ActionReceipt),
    DataReceipt(DataReceipt),
}

Alternatively, I would consider splitting them into separate vectors: action_receipts: Vec<ActionReceipt> and data_receipts: Vec<DataReceipt>. I believe this is the most straightforward way, as I have not yet seen a case where users need to deal with a mix of action and data receipts.
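A sketch of that alternative shape (the Block fields and receipt structs below are placeholders, not the PR's actual definitions):

// Sketch only: splitting action and data receipts into separate collections.
#[derive(Debug, Clone)]
pub struct ActionReceipt { /* action-specific fields */ }

#[derive(Debug, Clone)]
pub struct DataReceipt { /* data-specific fields */ }

#[derive(Debug, Clone)]
pub struct Block {
    pub action_receipts: Vec<ActionReceipt>,
    pub data_receipts: Vec<DataReceipt>,
    // ...other block fields
}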

khorolets (Member, Author) commented:

@frol thanks.

The idea is great. Though I would need to have something like:

  • .action_receipts_included()
  • .data_receipts_included()
  • .receipts_executed() (DataReceipt is non-executable)

Though I already have the .actions() method that returns a collection of Action structures, which is sort of an ActionReceipt but focused more on the action side than on the receipt side. It might be confusing. I need to think a bit more about this.

Thank you for your input!

SuccessValue(Vec<u8>),
SuccessReceiptId(CryptoHash),
Failure(String),
Unknown,
khorolets (Member, Author) commented:

Can someone remind me in which case ExecutionStatusView might be Unknown?

cc @frol @telezhnaya

Collaborator commented:

It is an internal nearcore status used when execution has not yet started. It is impossible to observe an execution outcome with this status once it is included in a block. Consider removing it. Ideally, nearcore should better separate types instead of leaking impossible states to the outside world.

khorolets (Member, Author) commented:

That's roughly what I thought. I think I am going to make it Pending in Lake's ExecutionStatus for Receipt, for receipts that are included but not yet executed. What do you think?

Collaborator commented:

From a user's point of view, I would prefer to get a list of items that I can immediately deal with, and mixing included and executed receipts does not make things simpler, as users will need to filter out not-yet-executed receipts. Thus, I do not support the idea of an artificial execution status, especially given that execution status is a property of an ExecutionOutcome, alongside logs and a result.

khorolets (Member, Author) commented:

@frol I mean I want to give users the same Receipt structure, with status = ExecutionStatus::Pending and logs: vec![], when they request something like block.receipts_included().

Executed ones would be available via block.receipts_executed(). I find it convenient from the developer's perspective: most indexers work only with executed receipts and don't care about included ones unless they are executed. Does that make sense to you?
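A sketch of what the proposed pair of getters could look like (the Block and Receipt fields used here are placeholders for the discussion, not the merged code):

// Sketch only: both getters return the same Receipt shape, with
// included-but-not-yet-executed receipts reported as Pending.
#[derive(Debug, Clone)]
pub enum ExecutionStatus {
    Pending,
    SuccessValue(Vec<u8>),
    Failure(String),
}

#[derive(Debug, Clone)]
pub struct Receipt {
    pub status: ExecutionStatus,
    pub logs: Vec<String>,
    // ...other receipt fields
}

pub struct Block {
    // hypothetical internal storage for this sketch
    included_receipts: Vec<Receipt>,
    executed_receipts: Vec<Receipt>,
}

impl Block {
    /// Receipts included in this block; not-yet-executed ones carry
    /// ExecutionStatus::Pending and empty logs.
    pub fn receipts_included(&self) -> Vec<Receipt> {
        self.included_receipts.clone()
    }

    /// Receipts that have an ExecutionOutcome in this block.
    pub fn receipts_executed(&self) -> Vec<Receipt> {
        self.executed_receipts.clone()
    }
}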

roshaans commented Dec 16, 2022:

@khorolets trying to chime in here. The differentiation between block.receipts_included() and block.receipts_executed() threw me off at first.

I think block.receipts_pending() makes more sense to me than block.receipts_included(), if the only difference between block.receipts_executed() and block.receipts_included() is that receipts from .receipts_included() have not executed yet and are thus in the ExecutionStatus::Pending state.

khorolets (Member, Author) commented:

@roshaans Sorry, that doesn't work. Receipts are included in a block only once (e.g. block 100), and at that point they might be considered pending, but when you call this method on the next block you won't get them there. They may still be pending, yet they are no longer included and might not be executed yet.

Collaborator commented:

@roshaans I like the direction of your thinking re "pending".

@khorolets I want to propose considering postponed_receipts.

}

#[derive(Debug, Clone)]
pub struct BlockHeader {
khorolets (Member, Author) commented Dec 16, 2022:

Am I missing any useful data from the BlockHeaderView? What do you think?

Collaborator commented:

I see the following important fields:

  • timestamp_nanosec
  • epoch_id
  • next_epoch_id
  • gas_price
  • total_supply
  • latest_protocol_version
  • random_value
  • chunks_included ?
  • validator_proposals ?

Will there still be access to the "raw" block header?

khorolets (Member, Author) commented:

There is always access to the entire StreamerMessage, of course.

I am going to add these fields, I guess.
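For illustration, a sketch of a BlockHeader extended with the fields listed above (field types are assumptions roughly mirroring BlockHeaderView, not the merged definition):

// Sketch only. `CryptoHash` stands in for the real hash type from near-primitives.
type CryptoHash = [u8; 32];

#[derive(Debug, Clone)]
pub struct BlockHeader {
    pub height: u64,
    pub hash: CryptoHash,
    pub prev_hash: CryptoHash,
    pub timestamp_nanosec: u64,
    pub epoch_id: CryptoHash,
    pub next_epoch_id: CryptoHash,
    pub gas_price: u128,            // Balance in near-primitives
    pub total_supply: u128,         // Balance in near-primitives
    pub latest_protocol_version: u32,
    pub random_value: CryptoHash,
    pub chunks_included: u64,
    // validator_proposals is left out here; its view type lives in near-primitives
}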

pub enum ExecutionStatus {
    SuccessValue(Vec<u8>),
    SuccessReceiptId(CryptoHash),
    Failure(String),

A reviewer commented:

Should we use Failure(TxExecutionError) from near_primitives::transaction::ExecutionStatus instead of String?

khorolets (Member, Author) commented:

Perhaps, but I don't like it too much. I've put String here for now, and later I'll decide what to do with TxExecutionError.

morgsmccauley (Contributor) commented Dec 18, 2022:

@khorolets It's hard to determine what actually changed here; could you please separate moving files and the refactor into separate commits?

khorolets changed the base branch from main to dev-0.7.0 on December 23, 2022 at 14:41
khorolets marked this pull request as ready for review on January 20, 2023 at 09:46
khorolets (Member, Author) commented:

Since I want to prevent significant updates to the 0.7 version of the Lake Framework, I've decided to merge this PR into the dev-0.7.0 branch and continue the work towards the 0.7 release in separate PRs.

My intent is to ensure the introduced Lake Primitives structures are feature-complete and won't require breaking changes right after the release.

I am going to create a milestone and tasks to track the work. Stay tuned. Reach out if you have questions.

khorolets merged commit 8bcd5c5 into dev-0.7.0 on Jan 20, 2023
roshaans left a review comment:

This abstraction over the core primitives was very much needed! I just left a few questions I had.

.streamer_message
.shards
.iter()
.filter_map(|shard| shard.chunk.as_ref().map(|chunk| chunk.receipts.iter()))
roshaans commented Jan 20, 2023:

Are all receipts inside shard.chunk's receipts considered postponed receipts?

khorolets (Member, Author) commented:

Good catch, thanks! Postponed receipts are those that have no ExecutionOutcome in this block; I'll fix it in a separate PR.
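A sketch of that filtering idea against the raw StreamerMessage (field names follow near-indexer-primitives as I recall them and should be treated as assumptions):

use std::collections::HashSet;

use near_indexer_primitives::{views::ReceiptView, StreamerMessage};

/// Sketch only: receipts included in this block's chunks that have no
/// ExecutionOutcome in the same block, i.e. "postponed" per the discussion above.
fn postponed_receipts(message: &StreamerMessage) -> Vec<ReceiptView> {
    // Receipt IDs that were executed in this block.
    let executed: HashSet<_> = message
        .shards
        .iter()
        .flat_map(|shard| shard.receipt_execution_outcomes.iter())
        .map(|outcome| outcome.receipt.receipt_id)
        .collect();

    // Receipts included in the chunks but not executed here.
    message
        .shards
        .iter()
        .filter_map(|shard| shard.chunk.as_ref())
        .flat_map(|chunk| chunk.receipts.iter())
        .filter(|receipt| !executed.contains(&receipt.receipt_id))
        .cloned()
        .collect()
}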

}

#[derive(Debug, Clone)]
pub struct Action {

A reviewer commented:

It could be useful to also include gas_price for every Action. I think every Action has it.

.events()
.iter()
.filter_map(|event| if let Some(action) = self.action_by_receipt_id(event.related_receipt_id()) {
if &action.receiver_id() == account_id || &action.signer_id() == account_id {

A reviewer commented:

I think there should be a distinction between events created as a result of an account_id's own actions vs. events created as a result of other users' actions on a given account_id.

Perhaps it makes sense to expose two methods: events_on_account_id, showing all events on a particular account created as a result of other users' activities. E.g., if I am indexing Ref Finance's contract, I would be interested in events created by users interacting with the contract, and not in the contract maintainer's transactions on that account_id.

And then an events_by_account_id method, showing all events where the account_id was the signer.
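A sketch of the two proposed accessors (the Event fields and Block storage here are hypothetical placeholders for the discussion):

// Sketch only: the split suggested above, with placeholder types.
type AccountId = String; // placeholder for the real AccountId type

#[derive(Debug, Clone)]
pub struct Event {
    pub emitted_on: AccountId, // account the event was emitted on (the contract)
    pub signer_id: AccountId,  // account that signed the originating action
    // ...event payload fields
}

pub struct Block {
    events: Vec<Event>, // hypothetical internal storage
}

impl Block {
    /// Events emitted on `account_id` by other users' activity.
    pub fn events_on_account_id(&self, account_id: &AccountId) -> Vec<&Event> {
        self.events
            .iter()
            .filter(|e| &e.emitted_on == account_id && &e.signer_id != account_id)
            .collect()
    }

    /// Events where `account_id` signed the originating action.
    pub fn events_by_account_id(&self, account_id: &AccountId) -> Vec<&Event> {
        self.events
            .iter()
            .filter(|e| &e.signer_id == account_id)
            .collect()
    }
}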

khorolets (Member, Author) commented:

Interesting idea; could you please add a suggestion to the Lake Primitives page: https://pagodaplatform.atlassian.net/wiki/spaces/DAT/pages/226295830/NEAR+Lake+primitives

khorolets added a commit that referenced this pull request Jun 6, 2023
* feat: NEAR Lake Helper (high-level Lake Framework)

* cover all top-level primitives

* Update changelog

* Add getters to fields in primitives, update the required function, change Event structure
khorolets added a commit that referenced this pull request Jun 6, 2023
* feat: NEAR Lake Helper (high-level Lake Framework) (#51)

* feat: NEAR Lake Helper (high-level Lake Framework)

* cover all top-level primitives

* Update changelog

* Add getters to fields in primitives, update the required function, change Event structure

* chore: Closes #61: Changes made to the lib should be picked to 0.8.0

* fix(tests): Resolved #60: Fix failing doc tests (#66)

* fix(tests): Resolved #60: Fix failing doc tests

* fix: add proper code annotations where possible

* feat(primitives): Add support for DelegateAction (NEP-366) + LEP-002 (Focus on action and Events) (#67)

* feat(primitives): Add support for DelegateAction (NEP-336)

* Refactor the structures completely (LEP-002)

* (examples) Address review suggestions

* remove refactoring commented leftovers

* (primitives) Address review suggestion, optimize the structure, do some renaming

* Return iterators from methods instead of clone, improve return types, handle tx actions

* add todo for later to create dedicated structures for possible errors of Failure ExecutionStatus

* DelegateAction struct, split receipts file

* remove commented code

* expose delegate_actions mod

* feat: Expose concurrency parameter to the Lake structure (and builder) (#69)

* feat: LEP-001 Introduce Lake Context (#68)

* feat: Implement LEP-001 (Context)

* clean up Lake impl to reuse code

* Update docstrings

* refactor: Resolves #42 replace anyhow with LakeError enum (thiserror) (#70)

* fix: typo in README

* fix: CHANGELOG typos and bring back the concurrency usage lost during merging

* feat: ParentTransactionCache context and example (#74)

* feat: ParentTransactionCache context and example

* Update README for ParentTransactionCache about advanced usage

* refactor: Add LakeContext derive and introduce a concept of LakeContext with calls before and after user function

* Update lake-parent-transaction-cache/README.md

Co-authored-by: Morgan McCauley <[email protected]>

* address review suggestion rename field accounts_id -> account_ids

* add comments to the example, add READMEs, drop accidental artifact

* Update documentation, add docs about the LakeContext

* Add ParentTransactionCache to the Lake Framework README

* refactor Lake::run method to drop code duplication

---------

Co-authored-by: Morgan McCauley <[email protected]>

* docs: Lake Primitives documentation added (#75)

* feat: ParentTransactionCache context and example

* Update documentation, add docs about the LakeContext

* docs: Lake Primitives documentation added

* chore: Upgrade near primitives to 0.17.0 version (#77)

* chore: Prepare 0.8.0-beta.1 release (#78)

* chore: Prepare 0.8.0-beta.1 release

* Change to use CUSTOM_GITHUB_TOKEN and add a link to the workflow for the reasoning

* Setup versioning and add release-plz.toml

* Drop the release-plz job limitation on the CI job

* Add missing versions to the changelog

* Add 0.7.x branch to the CI

---------

Co-authored-by: Morgan McCauley <[email protected]>