
feat: NEAR Lake Helper (high-level Lake Framework) #51

Merged: 4 commits merged into dev-0.7.0 on Jan 20, 2023

Conversation

khorolets (Member) commented Dec 15, 2022:

This PR introduces a higher-level API for the framework.

NOTE: the target branch for this PR is dev-0.7.0

Features:

  • Simplified start boilerplate
  • New, more straightforward and meaningful structures
  • Make the NEAR Lake Framework ready to accept third-party extensions (e.g. start options, tx-receipt cache, data parsers, etc.)

With this new version, I am trying to make the Framework more high-level while keeping the original data available, so developers can still benefit from the low-level features it has today.
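For a sense of what the simplified start boilerplate could look like, here is a rough sketch; the LakeBuilder, run, and Block getter names are illustrative assumptions, not necessarily the exact API introduced by this PR:

// Hypothetical high-level entry point: configure the lake, then run a
// per-block handler. All names below are assumptions for illustration.
fn main() -> anyhow::Result<()> {
    near_lake_framework::LakeBuilder::default()
        .mainnet()
        .start_block_height(80_000_000)
        .build()?
        .run(handle_block)?;
    Ok(())
}

async fn handle_block(
    block: near_lake_framework::near_lake_primitives::block::Block,
) -> anyhow::Result<()> {
    // Indexer logic goes here: the simplified Block exposes getters
    // instead of the raw StreamerMessage structures.
    eprintln!("Processing block at height {}", block.block_height());
    Ok(())
}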

Working indexer example

This is an NFT Indexer clone implemented in Rust on top of the high-level Lake Framework:

https://github.com/khorolets/l8kit-nft-indexer

To do:

  • All the StreamerMessage substructures have their representation in lake-primitives
  • Every simplified structure is accessible via getters
  • More helpers
  • Docs
  • Examples

telezhnaya (Contributor) left a review comment:

LGTM!
I'm not approving only because this is still a draft, and I want to review the final version as well.

//!
//! ## Cost estimates
//!
//! **TL;DR** approximately $18.15 per month (for AWS S3 access, paid directly to AWS) for the reading of fresh blocks
Contributor commented:

As far as I know, the plan is to serve everything through our own keys and charge extra for the access.
@pkudinov could you please clarify?

khorolets (Member, Author) commented:

Disregard the docs for now; they have not been updated yet.

I do remember the discussions about serving everything with the Pagoda keys. It is not implemented on the infra side yet; meanwhile, we continue to support direct access to AWS S3 through the developers' own keys, and I believe we won't disable it anyway.

block_heights.len()
);

start_from_block_height = *block_heights.last().unwrap() + 1;
Contributor commented:

nit: we can get rid of the unwrap by using if let Some(...) a few lines earlier
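A minimal sketch of that suggestion (the helper name and signature are illustrative, not taken from the PR):

/// Sketch: derive the next start height without `unwrap`, leaving the
/// current value unchanged when no new heights were fetched.
fn next_start_height(block_heights: &[u64], current: u64) -> u64 {
    if let Some(last) = block_heights.last() {
        *last + 1
    } else {
        current
    }
}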

}
}

async fn fast_fetch_block_heights(
Contributor commented:

Does it work faster with this? 😅

Jokes aside, do we need such a prefix?

khorolets (Member, Author) commented:

I am going to rethink these pieces @frol has introduced.

Collaborator commented:

The naming is definitely not the best here. The intent was to fetch a list of block heights without getting blocked waiting for the next batch. Maybe this is a better name: try_fetching_block_heights_without_waiting

async fn fast_fetch_block_heights(
    pending_block_heights: &mut std::pin::Pin<&mut impl tokio_stream::Stream<Item = u64>>,
    limit: usize,
    await_for_at_least_one: bool,
Contributor commented:

await_for_at_least_one_block?

khorolets (Member, Author) commented:

Good question, I will check it out and provide doc-strings

near-lake-framework/src/streamer.rs (outdated discussion, resolved)
near-lake-framework/src/streamer.rs (outdated discussion, resolved)
"`prev_hash` does not match, refetching the data from S3 in 200ms",
);
tokio::time::sleep(std::time::Duration::from_millis(200)).await;
break;
Contributor commented:

Not sure, but maybe it's better to move the inner loop (while let Some(streamer_message_result) = streamer_messages_futures.next().await { ... }) into its own function.
It would improve readability and make it easier to see which loop we are finishing here.

khorolets (Member, Author) commented:

I'll consider it. Thanks

near-lake-primitives/src/lib.rs (outdated discussion, resolved)
near-lake-primitives/src/types/events.rs (discussion resolved)
use near_primitives_core::serialize::{base64_format, dec_format};

#[derive(Debug, Clone)]
pub struct ExecutedReceipt {
khorolets (Member, Author) commented:

I am thinking about renaming it to just Receipt, as I also want to provide included_receipts on the Block, and executed receipts will differ from included ones only by their status.

I am open to suggestions on how to differentiate ActionReceipts from DataReceipts without following the ReceiptEnumView pattern.

Collaborator commented:

I would consider making Receipt an enum:

enum Receipt {
    ActionReceipt(ActionReceipt),
    DataReceipt(DataReceipt),
}

Alternatively, I would consider splitting them into separate vectors: action_receipts: Vec<ActionReceipt> and data_receipts: Vec<DataReceipt>. I believe this is the most straightforward way, as I have not yet seen a case where users need to deal with a mix of action and data receipts.
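A sketch of that alternative shape (the Block fields and receipt structs below are placeholders, not the PR's actual definitions):

// Sketch only: splitting action and data receipts into separate collections.
#[derive(Debug, Clone)]
pub struct ActionReceipt { /* action-specific fields */ }

#[derive(Debug, Clone)]
pub struct DataReceipt { /* data-specific fields */ }

#[derive(Debug, Clone)]
pub struct Block {
    pub action_receipts: Vec<ActionReceipt>,
    pub data_receipts: Vec<DataReceipt>,
    // ...other block fields
}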

khorolets (Member, Author) commented:

@frol thanks.

The idea is great. Though I would need to have something like:

  • .action_receipts_included()
  • .data_receipts_included()
  • .receipts_executed() (DataReceipt is non-executable)

Though I already have the .actions() method that returns a collection of Action structures, which is sort of an ActionReceipt but focused more on the action side than on the receipt side. It might be confusing. I need to think a bit more about this.

Thank you for your input!

SuccessValue(Vec<u8>),
SuccessReceiptId(CryptoHash),
Failure(String),
Unknown,
khorolets (Member, Author) commented:

Can someone remind me in which case ExecutionStatusView might be Unknown?

cc @frol @telezhnaya

Collaborator commented:

It is an internal nearcore status used when execution has not yet started. It is impossible to observe an execution outcome with this status once it is included in a block. Consider removing it. Ideally, nearcore should better separate types instead of leaking impossible states to the outside world.

khorolets (Member, Author) commented:

That's roughly what I thought. I think I am going to make it Pending in Lake's ExecutionStatus for Receipt, for receipts that are included but not yet executed. What do you think?

Collaborator commented:

From a user's point of view, I would prefer to get a list of items that I can immediately deal with, and mixing included and executed receipts does not make things simpler, as users will need to filter out not-yet-executed receipts. Thus, I do not support the idea of an artificial execution status, especially given that execution status is a property of an ExecutionOutcome, alongside logs and a result.

khorolets (Member, Author) commented:

@frol I mean I want to give users the same Receipt structure, with status = ExecutionStatus::Pending and logs: vec![], when they request something like block.receipts_included().

Executed ones would be available via block.receipts_executed(). I find it convenient from the developer's perspective: most indexers work only with executed receipts and don't care about included ones unless they are executed. Does that make sense to you?
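A sketch of what the proposed pair of getters could look like (the Block and Receipt fields used here are placeholders for the discussion, not the merged code):

// Sketch only: both getters return the same Receipt shape, with
// included-but-not-yet-executed receipts reported as Pending.
#[derive(Debug, Clone)]
pub enum ExecutionStatus {
    Pending,
    SuccessValue(Vec<u8>),
    Failure(String),
}

#[derive(Debug, Clone)]
pub struct Receipt {
    pub status: ExecutionStatus,
    pub logs: Vec<String>,
    // ...other receipt fields
}

pub struct Block {
    // hypothetical internal storage for this sketch
    included_receipts: Vec<Receipt>,
    executed_receipts: Vec<Receipt>,
}

impl Block {
    /// Receipts included in this block; not-yet-executed ones carry
    /// ExecutionStatus::Pending and empty logs.
    pub fn receipts_included(&self) -> Vec<Receipt> {
        self.included_receipts.clone()
    }

    /// Receipts that have an ExecutionOutcome in this block.
    pub fn receipts_executed(&self) -> Vec<Receipt> {
        self.executed_receipts.clone()
    }
}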

roshaans commented Dec 16, 2022:

@khorolets trying to chime in here. The differentiation between block.receipts_included() and block.receipts_executed() threw me off at first.

I think block.receipts_pending() makes more sense to me than block.receipts_included(), if the only difference between block.receipts_executed() and block.receipts_included() is that receipts from .receipts_included() have not executed yet and are thus in the ExecutionStatus::Pending state.

khorolets (Member, Author) commented:

@roshaans Sorry, that doesn't work. Receipts are included in a block only once (e.g. block 100), and at that point they might be considered pending, but when you call this method on the next block you won't get them there. They may still be pending, yet they are no longer included and might not be executed yet.

Collaborator commented:

@roshaans I like the direction of your thinking re "pending".

@khorolets I want to propose considering postponed_receipts.

}

#[derive(Debug, Clone)]
pub struct BlockHeader {
khorolets (Member, Author) commented Dec 16, 2022:

Am I missing any useful data from the BlockHeaderView? What do you think?

Collaborator commented:

I see the following important fields:

  • timestamp_nanosec
  • epoch_id
  • next_epoch_id
  • gas_price
  • total_supply
  • latest_protocol_version
  • random_value
  • chunks_included ?
  • validator_proposals ?

Will there still be access to the "raw" block header?

khorolets (Member, Author) commented:

There is always access to the entire StreamerMessage, of course.

I am going to add these fields, I guess.
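For illustration, a sketch of a BlockHeader extended with the fields listed above (field types are assumptions roughly mirroring BlockHeaderView, not the merged definition):

// Sketch only. `CryptoHash` stands in for the real hash type from near-primitives.
type CryptoHash = [u8; 32];

#[derive(Debug, Clone)]
pub struct BlockHeader {
    pub height: u64,
    pub hash: CryptoHash,
    pub prev_hash: CryptoHash,
    pub timestamp_nanosec: u64,
    pub epoch_id: CryptoHash,
    pub next_epoch_id: CryptoHash,
    pub gas_price: u128,            // Balance in near-primitives
    pub total_supply: u128,         // Balance in near-primitives
    pub latest_protocol_version: u32,
    pub random_value: CryptoHash,
    pub chunks_included: u64,
    // validator_proposals is left out here; its view type lives in near-primitives
}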

pub enum ExecutionStatus {
    SuccessValue(Vec<u8>),
    SuccessReceiptId(CryptoHash),
    Failure(String),

A reviewer commented:

Should we use Failure(TxExecutionError) from near_primitives::transaction::ExecutionStatus instead of String?

khorolets (Member, Author) commented:

Perhaps, but I don't like it too much. I've put String here for now, and later I'll decide what to do with TxExecutionError.

morgsmccauley (Contributor) commented Dec 18, 2022:

@khorolets It's hard to determine what actually changed here; could you please separate moving files and the refactor into separate commits?

khorolets changed the base branch from main to dev-0.7.0 on December 23, 2022 at 14:41
khorolets marked this pull request as ready for review on January 20, 2023 at 09:46
khorolets (Member, Author) commented:

Since I want to prevent significant updates to the 0.7 version of the Lake Framework, I've decided to merge this PR into the dev-0.7.0 branch and continue the work towards the 0.7 release in separate PRs.

My intent is to ensure the introduced Lake Primitives structures are feature-complete and won't require breaking changes right after the release.

I am going to create a milestone and tasks to track the work. Stay tuned. Reach out if you have questions.

khorolets merged commit 8bcd5c5 into dev-0.7.0 on Jan 20, 2023
roshaans left a review comment:

This abstraction over the core primitives was very much needed! I just left a few questions I had.

.streamer_message
.shards
.iter()
.filter_map(|shard| shard.chunk.as_ref().map(|chunk| chunk.receipts.iter()))
roshaans commented Jan 20, 2023:

Are all receipts inside shard.chunk's receipts considered postponed receipts?

khorolets (Member, Author) commented:

Good catch, thanks! Postponed receipts are those that have no ExecutionOutcome in this block; I'll fix it in a separate PR.
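A sketch of that filtering idea against the raw StreamerMessage (field names follow near-indexer-primitives as I recall them and should be treated as assumptions):

use std::collections::HashSet;

use near_indexer_primitives::{views::ReceiptView, StreamerMessage};

/// Sketch only: receipts included in this block's chunks that have no
/// ExecutionOutcome in the same block, i.e. "postponed" per the discussion above.
fn postponed_receipts(message: &StreamerMessage) -> Vec<ReceiptView> {
    // Receipt IDs that were executed in this block.
    let executed: HashSet<_> = message
        .shards
        .iter()
        .flat_map(|shard| shard.receipt_execution_outcomes.iter())
        .map(|outcome| outcome.receipt.receipt_id)
        .collect();

    // Receipts included in the chunks but not executed here.
    message
        .shards
        .iter()
        .filter_map(|shard| shard.chunk.as_ref())
        .flat_map(|chunk| chunk.receipts.iter())
        .filter(|receipt| !executed.contains(&receipt.receipt_id))
        .cloned()
        .collect()
}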

}

#[derive(Debug, Clone)]
pub struct Action {

A reviewer commented:

It could be useful to also include gas_price for every Action. I think every Action has it.

.events()
.iter()
.filter_map(|event| if let Some(action) = self.action_by_receipt_id(event.related_receipt_id()) {
if &action.receiver_id() == account_id || &action.signer_id() == account_id {

A reviewer commented:

I think there should be a distinction between events created as a result of an account_id's own actions vs. events created as a result of other users' actions on a given account_id.

Perhaps it makes sense to expose two methods: events_on_account_id, showing all events on a particular account created as a result of other users' activities. E.g., if I am indexing Ref Finance's contract, I would be interested in events created by users interacting with the contract, and not in the contract maintainer's transactions on that account_id.

And then an events_by_account_id method, showing all events where the account_id was the signer.
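A sketch of the two proposed accessors (the Event fields and Block storage here are hypothetical placeholders for the discussion):

// Sketch only: the split suggested above, with placeholder types.
type AccountId = String; // placeholder for the real AccountId type

#[derive(Debug, Clone)]
pub struct Event {
    pub emitted_on: AccountId, // account the event was emitted on (the contract)
    pub signer_id: AccountId,  // account that signed the originating action
    // ...event payload fields
}

pub struct Block {
    events: Vec<Event>, // hypothetical internal storage
}

impl Block {
    /// Events emitted on `account_id` by other users' activity.
    pub fn events_on_account_id(&self, account_id: &AccountId) -> Vec<&Event> {
        self.events
            .iter()
            .filter(|e| &e.emitted_on == account_id && &e.signer_id != account_id)
            .collect()
    }

    /// Events where `account_id` signed the originating action.
    pub fn events_by_account_id(&self, account_id: &AccountId) -> Vec<&Event> {
        self.events
            .iter()
            .filter(|e| &e.signer_id == account_id)
            .collect()
    }
}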

khorolets (Member, Author) commented:

Interesting idea; could you please add a suggestion to the Lake Primitives page: https://pagodaplatform.atlassian.net/wiki/spaces/DAT/pages/226295830/NEAR+Lake+primitives

khorolets added a commit that referenced this pull request Jun 6, 2023
* feat: NEAR Lake Helper (high-level Lake Framework)

* cover all top-level primitives

* Update changelog

* Add getters to fields in primitives, update the required function, change Event structure
khorolets added a commit that referenced this pull request Jun 6, 2023
* feat: NEAR Lake Helper (high-level Lake Framework) (#51)

* feat: NEAR Lake Helper (high-level Lake Framework)

* cover all top-level primitives

* Update changelog

* Add getters to fields in primitives, update the required function, change Event structure

* chore: Closes #61: Changes made to the lib should be picked to 0.8.0

* fix(tests): Resolved #60: Fix failing doc tests (#66)

* fix(tests): Resolved #60: Fix failing doc tests

* fix: add proper code annotations where possible

* feat(primitives): Add support for DelegateAction (NEP-366) + LEP-002 (Focus on action and Events) (#67)

* feat(primitives): Add support for DelegateAction (NEP-336)

* Refactor the structures completely (LEP-002)

* (examples) Address review suggestions

* remove refactoring commented leftovers

* (primitives) Address review suggestion, optimize the structure, do some renaming

* Return iterators from methods instead of clone, improve return types, handle tx actions

* add todo for later to create dedicated structures for possible errors of Failure ExecutionStatus

* DelegateAction struct, split receipts file

* remove commented code

* expose delegate_actions mod

* feat: Expose concurrency parameter to the Lake structure (and builder) (#69)

* feat: LEP-001 Introduce Lake Context (#68)

* feat: Implement LEP-001 (Context)

* clean up Lake impl to reuse code

* Update docstrings

* refactor: Resolves #42 replace anyhow with LakeError enum (thiserror) (#70)

* fix: typo in README

* fix: CHANGELOG typos and bring back the concurrency usage lost during merging

* feat: ParentTransactionCache context and example (#74)

* feat: ParentTransactionCache context and example

* Update README for ParentTransactionCache about advanced usage

* refactor: Add LakeContext derive and introduce a concept of LakeContext with calls before and after user function

* Update lake-parent-transaction-cache/README.md

Co-authored-by: Morgan McCauley <[email protected]>

* address review suggestion rename field accounts_id -> account_ids

* add comments to the example, add READMEs, drop accidental artifact

* Update documentation, add docs about the LakeContext

* Add ParentTransactionCache to the Lake Framework README

* refactor Lake::run method to drop code duplication

---------

Co-authored-by: Morgan McCauley <[email protected]>

* docs: Lake Primitives documentation added (#75)

* feat: ParentTransactionCache context and example

* Update documentation, add docs about the LakeContext

* docs: Lake Primitives documentation added

* chore: Upgrade near primitives to 0.17.0 version (#77)

* chore: Prepare 0.8.0-beta.1 release (#78)

* chore: Prepare 0.8.0-beta.1 release

* Change to use CUSTOM_GITHUB_TOKEN and add a link to the workflow for the reasoning

* Setup versioning and add release-plz.toml

* Drop the release-plz job limitation on the CI job

* Add missing versions to the changelog

* Add 0.7.x branch to the CI

---------

Co-authored-by: Morgan McCauley <[email protected]>