Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DPLT-1020 Fetch indexing metadata file to determine last_indexed_block. #96

Merged
merged 7 commits into from
Jun 6, 2023

Conversation

gabehamilton
Copy link
Collaborator

@gabehamilton gabehamilton commented Jun 5, 2023

  • Retrieves last_indexed_block from indexing metadata file to delineate where direct filtering from S3 Lake blocks should start.
  • Use archival RPC nodes to determine date of start_block_height so historical processing can go further back than five epochs.

Use archival RPC nodes to determine data of start_block_height
@gabehamilton gabehamilton changed the title Fetch indexing metadata file to determine last_indexed_block. DPLT-1020 Fetch indexing metadata file to determine last_indexed_block. Jun 5, 2023
"Last indexed block from latest_block.json: {:?}",
last_indexed_block
);
let last_indexed_block: u64 = from_str(last_indexed_block.as_str().unwrap()).unwrap();
Copy link
Contributor

@roshaans roshaans Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if last_indexed_block happens to null because there is an issue in S3 or any network issues, it will cause this line to panic without any helpful error messages.

Can we change the function signature to return a Result<BlockHeight, Error> and either use ? to propagate the error message to the caller of this method or use an `.except("Unable to convert to a u64") on this line?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All set now. More error handling for the S3 files coming in https://pagodaplatform.atlassian.net/browse/DPLT-1012

@@ -116,21 +135,53 @@ async fn process_historical_messages(
block_difference
}

pub(crate) async fn last_indexed_block_from_metadata(
aws_config: &SdkConfig,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're instantiating a new S3 client in a few different places within this file, probably easier just to create one which we can share across all these functions :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had started with that but ran into errors where I needed to clone it. Maybe I can narrow it down though now that more functions are using it. Also possible that those errors were related to my attempt to use the s3_fetcher module from near-lake-framework; another reason to review.

Comment on lines +146 to +166
let metadata: serde_json::Value = serde_json::from_str(&metadata).unwrap();
let last_indexed_block = metadata["last_indexed_block"].clone();
let last_indexed_block = last_indexed_block.as_str();
if last_indexed_block.is_none() {
return Err(anyhow::anyhow!(
"No last_indexed_block found in latest_block.json"
));
}
let last_indexed_block = last_indexed_block.unwrap();
let last_indexed_block = from_str(last_indexed_block);
if last_indexed_block.is_err() {
return Err(anyhow::anyhow!(
"last_indexed_block couldn't be converted to u64"
));
}
let last_indexed_block = last_indexed_block.unwrap();
tracing::info!(
target: crate::INDEXER,
"Last indexed block from latest_block.json: {:?}",
last_indexed_block
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably remove this entire block of code by parsing metadata as a strong type in serde. That would do all the type & existence checking for us meaning we wouldn't have to do it here.

Having concrete types is also nice from a documentation standpoint, as it's more clear what the JSON actually contains.

if last_indexed_block.is_err() {
tracing::error!(
target: crate::INDEXER,
last_indexed_block = ?last_indexed_block,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I didn't know you could do this

@gabehamilton
Copy link
Collaborator Author

Merging to test longer historical processing on dev. Will address PR comments in subsequent PR.

@gabehamilton gabehamilton merged commit 2a85de4 into main Jun 6, 2023
@gabehamilton gabehamilton deleted the DPLT-1020_use_indexing_metadata branch June 6, 2023 04:50
@roshaans roshaans mentioned this pull request Jun 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants