-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add 'select oldest unaudited' selection strategy #79
add 'select oldest unaudited' selection strategy #79
Conversation
match content::Entity::find() | ||
.order_by_asc(content::Column::FirstAvailableAt) | ||
.find_with_related(entity::content_audit::Entity) | ||
.filter(content_audit::Column::CreatedAt.is_null()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious why this line exists. Am I correct that this is how we get sea-orm to select for rows that don't have an audit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is my understanding. In the test case for this strategy I have added an audited for item 1 of 45, and the test correctly does not select that item for audit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm reading this correctly, we select the N
oldest pieces of content that have no audit.
When I first read "oldest missing" my brain inferred it to mean, the oldest content that was found to be missing from the network (failed audit).
Assuming I'm correct in my interpretation of what this strategy is doing, I think it should potentially be renamed to something like "select oldest unaudited".
My next thought here is that the current implementation introduces what I think can be categorized as a race condition.
- The strategy selects things that need to be audited and puts them in the queue
- The queue is consumed, albeit slowly.
- The strategy selects the same things again because they are still in the queue waiting to be audited.
- The queue gets duplicate items added.
I think one way around this might be to keep note of the FirstAvailableAt
timestamp of the last item selected, and then perform subsequent queries with a filter like this filter(content::Column::FirstAvailableAt.gt(previous_latest_first_available_at))
.
b984742
to
026b418
Compare
Updated
Good idea - implemented |
Issue addressed
Proposed changes
Adds the final unimplemented strategy:
SelectionStrategy::SelectOldestUnaudited
Description
The strategy finds the oldest content that has not yet been audited and submits it for auditing. As audits take time to complete, the strategy remembers the timestamp of the last loop of submissions to prevent double-submitting the content for another audit (race condition).
Additional info