Implement network connectivity error handling #391
Comments
To implement this, we should first implement pausing the queue; otherwise we won't be able to test the "Retry" action, which should cause queue operation to resume immediately. It would also be helpful to implement realistic timeouts before the "Retry" action, so that we can test and get a good feel for how long a retry might take for recurring timeouts. In reality, we might end up tweaking timeout values a few times until we have a smarter system with some sort of backoff strategy for request timeouts, so that we slow down our retries until we pause the queue and require user intervention. |
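A rough sketch of the backoff idea mentioned above, in Python. The names `backoff_timeouts` and `run_with_backoff` and all default values are made up for illustration; none of them come from the client code, and the real timeout values would need tuning as described.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple


def backoff_timeouts(
    base: float = 5.0, factor: float = 2.0, max_attempts: int = 5
) -> Iterator[float]:
    """Yield one request timeout (in seconds) per attempt, growing geometrically."""
    for attempt in range(max_attempts):
        yield base * factor ** attempt


def run_with_backoff(
    request: Callable[[float], Any], timeouts: Iterable[float]
) -> Tuple[bool, Any]:
    """Call `request(timeout)` once per timeout until one attempt succeeds.

    Returns (True, result) on success, or (False, None) once every attempt
    has timed out. In the second case the caller would pause the queue and
    ask the user to intervene.
    """
    for timeout in timeouts:
        try:
            return True, request(timeout)
        except TimeoutError:
            continue  # try again with the next, longer timeout
    return False, None
```

With `base=5.0`, `factor=2.0` and `max_attempts=3`, the timeouts tried are 5, 10 and 20 seconds before giving up.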
Breaking this down:
|
For the 7/24-8/7 sprint, we'll aim to resolve #443 and #430 above, and add a first iteration of the UI layer (error message and functioning "Retry" button). If it makes sense to split the UI layer into its own issue/PR, that may be done mid-sprint. #491 is out of scope for the current sprint, so the epic remains in the near-term backlog; once #491 is resolved, we should be able to fully satisfy the acceptance criteria. |
Why is this still open again? |
ah because we still need to implement #491 |
If I understand current state correctly, right now we are only offering "Retry" on metadata syncs, but on no other actions. This issue, as currently scoped, suggests a single global "Retry" action on all network errors, after automatic retries have failed. Is that still the approach we want to choose for beta? I suggest we discuss a bit more what's realistically achievable while alleviating likely pain points for our users. Very high on the priority list is IMO a way for journalists to re-send failed replies without having to resort to copy and paste, whether that's accomplished through a global "retry" action (as proposed in this issue) or otherwise. |
The retry is for any network action that is done via the queue. To improve reliability, we should finish moving all network actions to the queue (that means they'll get automatic retries, and then network actions will pause if a failure keeps occurring). Right now we have an incomplete transition: e.g. deletion is not done via the queue, so if it fails once, it doesn't get retried at all. In addition to that, I think we need to do two things:
|
Discuss in Monday's Client meeting? |
Recap of today's client sync discussion as I understand it, with proposed must/should/could priorities for the beta:
|
Cross-referencing #650 |
After discussion with @creviera, we agreed that this issue can be closed. Most of the original work has been done, and the remaining work has clear follow-up issues (e.g. #534, #359). That said, given the heavy work on the network stack of the client, it's important that we verify and re-verify that the client behaves as intended. I've updated the acceptance criteria and moved them to the client test plan wiki page (https://github.com/freedomofpress/securedrop-client/wiki/Test-plan). We may narrow the number of tests for the final QA plan, but we should IMO verify each of the criteria listed there before then, and file new issues as needed. |
(This is closely related to #291, but not dependent on it.)
If the client repeatedly fails to perform the next operation in the queue after a fixed number of retries and this is at least in part due to network conditions (e.g., network down, Whonix VM down, Tor network issues), we want to report this to the user as an error.
The purpose of this error message is to let the user know that operations are pending, and to give them an opportunity to correct the problem.
A "Retry" action should cause the error message to disappear, and queue operation to be resumed immediately.
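To make the intended behavior concrete, here is a minimal sketch of a queue that pauses after a fixed number of failed attempts and resumes on "Retry". `PausableQueue`, `max_attempts` and `error_visible` are hypothetical names used only for illustration, and `ConnectionError` stands in for whatever network failures the client actually sees.

```python
from collections import deque
from typing import Callable, Deque


class PausableQueue:
    """Hypothetical job queue that pauses after repeated network failures."""

    def __init__(self, max_attempts: int = 3) -> None:
        self.max_attempts = max_attempts
        self.jobs: Deque[Callable[[], None]] = deque()
        self.paused = False
        self.error_visible = False  # stands in for the UI error message

    def add(self, job: Callable[[], None]) -> None:
        self.jobs.append(job)

    def process(self) -> None:
        """Run jobs in order; pause if the job at the head keeps failing."""
        while self.jobs and not self.paused:
            job = self.jobs[0]
            for _ in range(self.max_attempts):
                try:
                    job()
                except ConnectionError:
                    continue  # automatic retry
                self.jobs.popleft()  # success: drop the job and move on
                break
            else:
                # Every attempt failed: keep the job queued and show the error.
                self.paused = True
                self.error_visible = True

    def retry(self) -> None:
        """The "Retry" action: hide the error and resume processing at once."""
        self.error_visible = False
        self.paused = False
        self.process()
```

The failed job stays at the head of the queue, so nothing is lost while the queue is paused. If the network is still down when `retry()` is called, the queue pauses again and the error reappears.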
Design
Behavior via Invision · Zeplin
Related issues
User Story
As a user, I want to know when the client is experiencing connectivity issues, so that I can address them or try again later.
Acceptance Criteria
Given that I have performed some operations on the client
And I have temporarily lost (Tor) network connectivity
When the client has performed a set number of automatic retries that have failed (#384)
Then it should alert me to the error in the manner specified here.
Given that I have performed some operations on the client
And some of those operations fail for reasons unrelated to Tor network connectivity (e.g., attempting to star a source that has since been deleted)
When the client recognizes the failure
Then it should not treat the error in the manner specified here.
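One way to separate the two cases above is to classify failures by exception type. This is only a sketch: `SourceGoneError` and `should_pause_queue` are hypothetical names, and the real client may signal these conditions differently.

```python
class SourceGoneError(Exception):
    """Hypothetical: the server rejected the request, e.g. the source was deleted."""


# Built-in exception types that indicate a connectivity problem.
NETWORK_ERRORS = (ConnectionError, TimeoutError)


def should_pause_queue(exc: Exception) -> bool:
    """Return True only for failures that look like connectivity problems."""
    return isinstance(exc, NETWORK_ERRORS)
```

Under this scheme, a timeout would count toward the retry limit and eventually trigger the network error message, while `SourceGoneError` would be reported (or dropped) on its own without pausing the queue.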
Given that some operations have repeatedly failed due to network issues
When I click "Retry"
Then the error message should disappear, and the client should attempt to resume processing the queue
Given that some operations have repeatedly failed due to network issues
When I do nothing
Then the client should periodically test connectivity
And the error message should disappear when connectivity is restored
And the queue should be resumed when connectivity is restored
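The periodic connectivity test could look roughly like the following. `wait_for_connectivity`, `is_online`, `on_restored` and the 30-second default interval are assumptions for illustration. A GUI client would more likely drive the check from a timer than from a blocking loop, but the logic is the same.

```python
import time
from typing import Callable


def wait_for_connectivity(
    is_online: Callable[[], bool],
    on_restored: Callable[[], None],
    interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll `is_online` until it returns True, then call `on_restored`.

    `on_restored` is where the error message would be hidden and the queue
    resumed. Returns the number of checks performed.
    """
    checks = 0
    while True:
        checks += 1
        if is_online():
            on_restored()
            return checks
        sleep(interval)  # wait before testing connectivity again
```

Passing `sleep` in as a parameter keeps the function testable without real delays.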
Given that I have retried an operation that has previously failed
And the network is still down
When the operation fails again after a limited number of retries
Then the client should again alert me to the error in the manner specified here.
Given that the client is in the error state specified here
When I perform an operation that requires connectivity (e.g., reply)
Then that operation should be processed normally (e.g., added to queue)
And the queue should continue to be paused until network connectivity is restored
Given that the client is in the error state specified here
When I add a reply
Then the reply should be displayed in "sending" state
Given that the client is in the error state specified here
When I star a source
Then the source should be displayed in "starred" state
Given that the client is in the error state specified here
When I delete a source
Then the source should be displayed in "deletion pending" state
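The last four criteria amount to updating the displayed state right away and queueing the network job for later. A minimal sketch, with hypothetical `Source` and `Client` classes that are not taken from the client code:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Source:
    name: str
    starred: bool = False
    deletion_pending: bool = False
    replies: List[Tuple[str, str]] = field(default_factory=list)  # (text, state)


@dataclass
class Client:
    queue_paused: bool = True  # the error state: the queue is paused
    queue: List[Tuple[str, str]] = field(default_factory=list)  # (action, source)

    def reply(self, source: Source, text: str) -> None:
        source.replies.append((text, "sending"))  # shown as "sending"
        self.queue.append(("reply", source.name))

    def star(self, source: Source) -> None:
        source.starred = True  # shown as "starred"
        self.queue.append(("star", source.name))

    def delete(self, source: Source) -> None:
        source.deletion_pending = True  # shown as "deletion pending"
        self.queue.append(("delete", source.name))
```

None of these methods touch `queue_paused`, so the queue stays paused until connectivity is restored or the user clicks "Retry".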