Add check for GCS file upload exist #280

Merged 1 commit on Aug 29, 2024

@chowbao chowbao commented Aug 29, 2024

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with the jira ticket associated with the PR.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated the README with the added features, breaking changes, and new instructions on how to use the repository. I updated the description of the function with the changes that were made.

Release planning

  • I've decided if this PR requires a new major/minor/patch version according to
    semver, and I've changed the name of the BRANCH to release/*, feature/* or patch/*.

What

Adds a check that the file exported by stellar-etl was actually uploaded, by looking the file up in GCS after the upload step.

Why

There have been times when the stellar-etl task in Airflow failed to upload a file to GCS but the task itself did not fail. This check should prevent that scenario from happening again.

Known limitations

In theory this is a redundant check, but the performance impact is negligible.

@chowbao chowbao requested a review from a team as a code owner August 29, 2024 17:43

chowbao commented Aug 29, 2024

This needs to be tested after the logging issue in Airflow is fixed.

@amishas157 amishas157 left a comment

This looks good.
Nit suggestion: also verify that the file size is > 10 bytes or so, to ensure the file is not empty.

But I defer to you on whether that's really needed.

chowbao commented Aug 29, 2024

> This looks good. nit suggestion: To also verify size of the file is > 10b or something to ensure file is not empty
>
> But defer to you whether that's really needed or not

Actually, the files can be empty when there are no records for a time range 😅
Example:
https://console.cloud.google.com/storage/browser/_details/us-central1-test-hubble-43c3e190-bucket/dag-exported/scheduled__2024-07-25T00:00:00%2B00:00/changes_folder/708319-708382-claimable_balances.txt;tab=live_object?project=test-hubble-319619

@amishas157

> Actually the files can be empty in the case there is no record for a time range 😅

Alright, then I take my nit suggestion back.
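
The outcome of this thread can be summarized in a small Go sketch: only a missing object should fail the task, while a zero-byte file is a valid export for a time range with no records. The names `exportStatus`, `classifyExport`, and `shouldFailTask` are hypothetical and chosen for illustration; they are not taken from stellar-etl.

```go
package main

// exportStatus is a hypothetical classification of an exported file after
// the upload step.
type exportStatus int

const (
	statusMissing exportStatus = iota // object not found in the bucket
	statusEmpty                       // object exists with zero bytes
	statusHasData                     // object exists with at least one byte
)

// classifyExport maps the result of an object lookup to a status.
func classifyExport(found bool, sizeBytes int64) exportStatus {
	switch {
	case !found:
		return statusMissing
	case sizeBytes == 0:
		return statusEmpty
	default:
		return statusHasData
	}
}

// shouldFailTask reports whether the task should fail. Only a missing
// object is treated as an error; a minimum-size threshold would wrongly
// reject valid empty exports.
func shouldFailTask(s exportStatus) bool {
	return s == statusMissing
}
```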

@chowbao chowbao merged commit e72e367 into master Aug 29, 2024
8 checks passed