Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add 'select oldest unaudited' selection strategy #79

Merged
merged 1 commit into from
Mar 16, 2023

Conversation

perama-v
Copy link
Contributor

@perama-v perama-v commented Mar 11, 2023

Issue addressed

Proposed changes

Adds the final unimplemented strategy: SelectionStrategy::SelectOldestUnaudited

Description

The strategy finds the oldest content that has not yet been audited and submits it for auditing. As audits take time to complete, the strategy remembers the timestamp of the last loop of submissions to prevent double-submitting the content for another audit (race condition).

Additional info

  • unit test for strategy

match content::Entity::find()
.order_by_asc(content::Column::FirstAvailableAt)
.find_with_related(entity::content_audit::Entity)
.filter(content_audit::Column::CreatedAt.is_null())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious why this line exists. Am I correct that this is how we get sea-orm to select for rows that don't have an audit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is my understanding. In the test case for this strategy I have added an audited for item 1 of 45, and the test correctly does not select that item for audit.

Copy link
Member

@pipermerriam pipermerriam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading this correctly, we select the N oldest pieces of content that have no audit.

When I first read "oldest missing" my brain inferred it to mean, the oldest content that was found to be missing from the network (failed audit).

Assuming I'm correct in my interpretation of what this strategy is doing, I think it should potentially be renamed to something like "select oldest unaudited".

My next thought here is that the current implementation introduces what I think can be categorized as a race condition.

  • The strategy selects things that need to be audited and puts them in the queue
  • The queue is consumed, albeit slowly.
  • The strategy selects the same things again because they are still in the queue waiting to be audited.
  • The queue gets duplicate items added.

I think one way around this might be to keep note of the FirstAvailableAt timestamp of the last item selected, and then perform subsequent queries with a filter like this filter(content::Column::FirstAvailableAt.gt(previous_latest_first_available_at)).

@perama-v perama-v marked this pull request as draft March 15, 2023 23:31
@perama-v perama-v marked this pull request as ready for review March 16, 2023 11:57
@perama-v perama-v changed the title add 'oldest missing' selection strategy add 'select oldest unaudited' selection strategy Mar 16, 2023
@perama-v
Copy link
Contributor Author

... I think it should potentially be renamed to something like "select oldest unaudited".

Updated

I think one way around this might be to keep note of the FirstAvailableAt timestamp of the last item selected, and then perform subsequent queries with a filter like this filter(content::Column::FirstAvailableAt.gt(previous_latest_first_available_at)).

Good idea - implemented

@pipermerriam pipermerriam merged commit a4e85d0 into ethereum:master Mar 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add additional audit strategies
2 participants