
Implement sweeper data versioning for repairkit #73

Merged (7 commits) on Sep 12, 2023


@alexdunnjpl (Contributor) commented Sep 12, 2023

🗒️ Summary

Implements a mechanism (in repairkit, though the approach is easily reused elsewhere) to avoid redundant processing. Previously, merely iterating through the results of repairkit's initial query, even when no repairs were needed, took an inordinate amount of time. Now the initial query returns only documents that have not already been processed by a repairkit version greater than or equal to (GTE) the current version.
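A minimal sketch of how such a filtered initial query could be expressed in the OpenSearch query DSL. It assumes the version is stored under `ops:Provenance/ops:registry_sweepers_repairkit_version`; the constant names, the version value, and the helper functions are hypothetical and not taken from repairkit itself. A `bool` query with only a `must_not` clause matches every document except those excluded, and a `range` clause never matches a document that lacks the field, so never-processed documents are still returned.

```python
# Hypothetical names/values for illustration; the real version constant lives in
# repairkit's constants submodule.
REPAIRKIT_VERSION = 2
REPAIRKIT_VERSION_FIELD = "ops:Provenance/ops:registry_sweepers_repairkit_version"


def build_unprocessed_docs_query(version_field: str, current_version: int) -> dict:
    """Build a query matching docs NOT yet processed by a version >= current_version.

    Docs missing the field are matched too: the range clause cannot match them,
    so must_not does not exclude them.
    """
    return {
        "query": {
            "bool": {
                "must_not": [
                    {"range": {version_field: {"gte": current_version}}}
                ]
            }
        }
    }


def needs_processing(doc: dict, version_field: str, current_version: int) -> bool:
    """Client-side mirror of the query's logic, for a flat registry document dict."""
    stored = doc.get(version_field)
    return stored is None or int(stored) < current_version


query = build_unprocessed_docs_query(REPAIRKIT_VERSION_FIELD, REPAIRKIT_VERSION)
```

Pushing the filter into the query, rather than checking each document client-side, means already-processed documents are never transferred at all, which is what removes the cost of the no-op iteration.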

For any sweeper, the sweeper version should be written as an integer to the document property f"ops:Provenance/ops:registry_sweepers_{sweeper_name}_version". This version should be incremented in the sweeper's constants submodule whenever a change to the sweeper invalidates the previous processing of documents.
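A sketch of stamping that property during a sweeper's write-back, assuming updates are sent through the OpenSearch bulk API as partial `update` actions. The helper names are hypothetical. The assumption here is that every document the sweeper visits gets stamped, even when no repair was needed, since otherwise it would be returned by the next run's initial query.

```python
def sweeper_version_field(sweeper_name: str) -> str:
    """Property name under which a sweeper records the version that processed a doc."""
    return f"ops:Provenance/ops:registry_sweepers_{sweeper_name}_version"


def with_version_stamp(updates: dict, sweeper_name: str, version: int) -> dict:
    """Return a copy of a partial-update dict with the sweeper version added as an int."""
    stamped = dict(updates)
    stamped[sweeper_version_field(sweeper_name)] = int(version)
    return stamped


def bulk_update_lines(index: str, doc_id: str, partial_doc: dict) -> tuple:
    """Build the action/body pair for one partial update in an OpenSearch _bulk request."""
    action = {"update": {"_index": index, "_id": doc_id}}
    body = {"doc": partial_doc}
    return action, body


# Hypothetical usage: a doc that needed no repairs still gets the version stamp.
stamped = with_version_stamp({}, "repairkit", 2)
action, body = bulk_update_lines("registry", "urn:example:doc::1.0", stamped)
```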

Because a version change can only result from a code change in the first place, I've elected to hard-code the version in constants rather than use a configuration file, for simplicity.

Timeouts have been increased to improve stability when running against production registries. They are hardcoded for now; if they need to change a second time, I'll move them to a CLI argument.

This PR also includes some additional, unrelated changes and bug fixes:

  • demote a noisy log
  • minor refactoring/de-crufting
  • fix an erroneous error log

⚙️ Test Data and/or Report

Manually tested against a local registry, then en-prod (repairkit took 53 min on the first run and 0 s on subsequent runs).

♻️ Related Issues

related to #61
fixes #70

@nutjob4life (Member) left a comment


👍

@alexdunnjpl alexdunnjpl merged commit ca72e3b into main Sep 12, 2023
1 check passed
@alexdunnjpl alexdunnjpl deleted the 70-data-version branch September 12, 2023 15:53
@tloubrieu-jpl changed the title from "70 implement sweeper data versioning for repairkit" to "Implement sweeper data versioning for repairkit" Oct 6, 2023
Issue closed by merging this pull request: Update repairkit to include repairkit version metadata and check to streamline execution