Separate MetadataSyncJob from main queue and increase frequency of background syncs #652
Comments
Also, I forgot to mention that this would let us prevent new MetadataSyncJobs from being added by the timer every 5 minutes while the queue is already paused, because we would only create a new MetadataSyncJob when the health check succeeds. |
We discussed this briefly today. @redshiftzero proposed to have this queue handle the background-run MetadataSyncJobs for now, instead of adding a new ping action. User-initiated actions (and download jobs triggered by background syncs) would be handled by the main queue. Since we're also aiming to make MetadataSyncs faster, it's still an open question whether we need a separate network ping action. This is under consideration for the end-of-year kanban period or the next sprint. |
As I understand this issue's original scope:
As I understand Jen's proposal:
Please let me know if I captured that accurately. I would still nominate this to its own issue, because the scope, behavior, acceptance criteria, and possibly dependencies are substantively different. |
Yes, the first proposal assumes the health checker runs in its own thread rather than in a queue (because there's nothing to queue up). It also doesn't have to run every 5 minutes; we could make it every 15 seconds or whatever interval makes sense for checking the network and adding MetadataSyncs. Jen brought up a good reason to avoid a separate ping: with a ping, it costs two round trips to do a sync, which could take longer than 15 seconds in total. Adding a separate queue for MetadataSyncJobs means we will have a queue that only ever contains one job at a time, but we need it because the main queue doesn't support async jobs yet, and it keeps our code uniform and easy to read, since we're used to seeing API tasks as queue jobs (it fits within our architecture design). I think it would make the most sense to focus on tasks in this order:
|
Created a new issue for removing the sync during a reply send: #660. The work that doesn't have its own issue and could be captured by this ticket is:
|
That SGTM @creviera. Would you mind editing the issue description to that effect, to avoid any ambiguity? Also happy to take a stab at that, if you prefer. |
Updated this issue: #652 (comment) |
great writeup, thank you!
👍
some ideas/thoughts. We could:
tbh I think an investigatory spike could be worthwhile here |
Hi, so I've been looking into this issue. I'm painfully aware that I'm likely holding the wrong end of the stick, misunderstood something or don't have all the context needed, so please shoot me down. As I understand it the problem is that the message queue gets clogged up with Why not just limit only one What am I missing here, and/or is this other approach enough to mitigate the clogged queue problem? Like I said, I'm painfully aware of my lack of context and this is probably a good learning exercise for me too. |
Yeah, there's not much point in adding more than one |
With #715 merged, we're partway to the goals spec'd out in this issue. Some work described here has been split off into separate issues:
Additionally, we are already tracking needed follow-up to remove unnecessary sync calls, to fix sync-related UI glitchiness, and to revise the user experience of the "sync" area in the client: #687, #726, #671, #670, #352 Some proposed refactoring is tracked here: #647 Since this issue is not really organized as a tracking epic and we've covered follow-up work in separate issues, closing. |
Description
Separate metadata syncs from the main queue.
Follow-up for #491
Why do this?
Currently, whenever a queue job other than MetadataSyncJob sees an ApiInaccessibleError or RequestTimeoutError, we pause the queue. The queue can be unpaused by the following actions:
Clicking on the "Retry" link from the client. This unpauses the queue and retries the last failed job. If it fails again, the queue will pause once again.
Initiating a sync either by our 5-minute background process or clicking on the refresh icon. A sync adds a MetadataSyncJob to the queue before unpausing it. Since MetadataSyncJob is the highest priority job it cuts to the front of the line and keeps retrying until it's successful.
The downside to this approach is that potentially many MetadataSyncJobs can be added to the queue, and they will always cut in line, making it so other jobs have to wait longer and longer before getting processed. For example, if a user clicks refresh 5 times and the background process kicks off a sync, there will be 6 MetadataSyncJobs at the front of the queue. And the longer the client fails to connect to the server, the more MetadataSyncJobs pile up. Even when there are no network issues and the queue is not paused, other tasks kick off syncs. For instance, the reply job triggers a sync so that there isn't a long period of displaying a conversation out of order (this can happen if a message came in during the time period between syncs). Another time a sync is triggered is when a user tries to open a file that no longer exists on the file system.
Also, if we want to sync with the server more often, to decrease the likelihood of journalists viewing out-of-date information when they send a reply, this means more MetadataSyncJobs cutting in line.
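The pile-up described above can be illustrated with a minimal sketch. It is not the client's actual queue code: the priority values, the `SendReplyJob` name, and the helper functions are assumptions made only for illustration, and Python's standard `queue.PriorityQueue` stands in for the main queue.

```python
import itertools
import queue

# Hypothetical priorities for illustration: a lower number is served first.
SYNC_PRIORITY = 1
REPLY_PRIORITY = 5


def build_queue(jobs):
    """Fill a priority queue; the counter keeps FIFO order within a priority."""
    counter = itertools.count()
    q = queue.PriorityQueue()
    for priority, name in jobs:
        q.put((priority, next(counter), name))
    return q


def drain(q):
    """Return job names in the order the queue would process them."""
    order = []
    while not q.empty():
        order.append(q.get()[2])
    return order


# A reply is queued first, then the user clicks refresh five times and the
# background process kicks off one more sync.
jobs = [(REPLY_PRIORITY, "SendReplyJob")]
jobs += [(SYNC_PRIORITY, "MetadataSyncJob")] * 5
jobs += [(SYNC_PRIORITY, "MetadataSyncJob")]

processing_order = drain(build_queue(jobs))
```

Even though the reply was enqueued first, all six syncs are processed ahead of it, which is the "cutting in line" behavior this issue aims to remove.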
What's not in scope?
Before tackling this issue, we should do the following (when all these Issues are closed, our background sync will be the only place MetadataSyncJobs are created):
What's in scope for this issue?
So the scope of this issue is to run MetadataSyncJob outside of the main queue (until we add async job support) and to sync more frequently to make up for removing syncs from other areas of the code. What still needs to be decided is:
Criteria
Regression Criteria
Given that some operations have repeatedly failed due to network issues
When I do nothing
Then the client should periodically test connectivity
And the error message should disappear when connectivity is restored
And the queue should be resumed when connectivity is restored
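One way to picture the regression criteria together with the in-scope change is the rough sketch below. It is a sketch under stated assumptions, not the client's implementation: `MainQueue`, `BackgroundSync`, `sync_fn`, and the 15-second default are hypothetical, and the built-in `ConnectionError` and `TimeoutError` stand in for `ApiInaccessibleError` and `RequestTimeoutError`.

```python
import threading


class MainQueue:
    """Stand-in for the main job queue; it only tracks paused/error state."""

    def __init__(self):
        self.paused = False
        self.error_visible = False

    def pause(self):
        self.paused = True
        self.error_visible = True

    def resume(self):
        self.paused = False
        self.error_visible = False


class BackgroundSync:
    """Runs at most one sync at a time, outside the main queue."""

    def __init__(self, main_queue, sync_fn, interval_seconds=15.0):
        self.main_queue = main_queue
        self.sync_fn = sync_fn  # hypothetical callable that performs one sync
        self.interval_seconds = interval_seconds
        self._in_progress = threading.Lock()
        self._stop = threading.Event()

    def run_once(self):
        """Return True if a sync ran and succeeded."""
        if not self._in_progress.acquire(blocking=False):
            return False  # a sync is already running; do not stack another
        try:
            self.sync_fn()
        except (ConnectionError, TimeoutError):
            return False  # still offline; leave the main queue paused
        else:
            # Connectivity is back: clear the error and resume the queue.
            if self.main_queue.paused:
                self.main_queue.resume()
            return True
        finally:
            self._in_progress.release()

    def run_forever(self):
        # Wait one interval between attempts until stop() is called.
        while not self._stop.wait(self.interval_seconds):
            self.run_once()

    def stop(self):
        self._stop.set()
```

In this sketch, `run_forever` would be started on its own thread (for example with `threading.Thread(target=background_sync.run_forever, daemon=True).start()`), so periodic syncs double as the connectivity test and never occupy a slot in the main queue.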