Improve error handling of downloads, speech bubbles #1059
2a2522b
to
eb49eca
Compare
Note that |
I'm seeing the same retry behavior as you're describing in the test plan for Qubes in the Docker env: replies/messages aren't re-downloaded until 10 minutes have elapsed; before then I see log lines like this during each sync:
|
Log in and wait for the sync. The test messages and replies should turn gray with italic text and contain a message about the decryption error.
For me the messages and replies turned red, is this expected?
Try to compose a reply, and check its failure styling.
I see the standard "Failed to send reply" error, which I think looks right.
Close the client, then restart normally, without deleting the data dir or submission key. After sync the messages and replies should look normal.
This works! Here's what the ConversationView looked like before and after this step:
As expected, the replies are permanently failed by logging out and logging back in.
Next up I need to test in Qubes and review the code.
Regarding the logic implemented here: Would it be preferable to just check once, on startup, instead of checking every 10 minutes? One main goal of this PR is to minimize notification noise that journalists see, i.e. the Qubes |
No. That was the initial styling on this branch, but the current error CSS should turn things gray. Check that you've got the latest. |
it seems like the |
This is looking good so far. I see the gray text now that I pulled in the latest changes and did an initial code review. I still want to manually test a few more things in Qubes and review the unit tests.
I agree that increasing the download_failure_retry_interval to 8 hours (the length of a login session) is a good minimal change to make it so we only try a failed File/Message/Reply once per session.
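A minimal sketch of the interval-based gating discussed here, assuming a per-object `last_updated` timestamp and a flag for whether a download error is recorded. The function name `should_retry_download` and the module-level constant are hypothetical and are not taken from the client code; only the 8-hour value comes from the comment above.

```python
import datetime

# Hypothetical constant: 8 hours, the login session length mentioned above.
DOWNLOAD_FAILURE_RETRY_INTERVAL = datetime.timedelta(hours=8)


def should_retry_download(
    has_download_error: bool,
    last_updated: datetime.datetime,
    now: datetime.datetime,
    retry_interval: datetime.timedelta = DOWNLOAD_FAILURE_RETRY_INTERVAL,
) -> bool:
    """Decide whether a sync should attempt to download this item.

    Items without a recorded error are always eligible. Items with an
    error are skipped until retry_interval has elapsed since the local
    object was last updated.
    """
    if not has_download_error:
        return True
    return now - last_updated >= retry_interval
```

With an 8-hour interval, an item that failed at the start of a session is not attempted again until that session would have expired, which is what limits it to one attempt per session.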
DateTime,
nullable=False,
default=datetime.datetime.utcnow,
onupdate=datetime.datetime.utcnow,
This introduces a new concept of last_updated for Files, Messages, and Replies, which we also have for Sources. For Sources it represents when a Source was last updated on the server. For Files, Messages, and Replies, it represents when the local database object was last updated. A little documentation about this might go a long way.
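To illustrate the distinction drawn in this comment, here is a rough sketch using plain dataclasses rather than the project's SQLAlchemy models. The field names other than `last_updated`, and the `mark_download_failed` helper, are hypothetical and only serve the example.

```python
import datetime
from dataclasses import dataclass, field


def _utcnow() -> datetime.datetime:
    return datetime.datetime.now(datetime.timezone.utc)


@dataclass
class Source:
    uuid: str
    # For a Source, last_updated is reported by the server: it records
    # when the source was last updated there.
    last_updated: datetime.datetime


@dataclass
class Message:
    uuid: str
    has_download_error: bool = False
    # For a File, Message or Reply, last_updated is purely local: it
    # records when this database object was last changed by the client.
    last_updated: datetime.datetime = field(default_factory=_utcnow)

    def mark_download_failed(self, now: datetime.datetime) -> None:
        """Hypothetical helper: record a failure and touch last_updated."""
        self.has_download_error = True
        self.last_updated = now
```

The point of the sketch is only that the two timestamps have different sources of truth, so code comparing them across object types would be mixing server time with local time.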
I recommend updating our architecture docs in the wiki to mention how we have additional retry logic when the client starts for download jobs that fail for checksum and crypto reasons. |
Sorry to reiterate, but is there any reason (other than, obviously, wanting to land this change ASAP) to go with a fixed retry time rather than just doing this check on first sync? I'm just worried that a fixed timeout is more difficult to manage in practice ("we just imported the key and now we have to wait 8 hours"). |
I don't like the idea of setting the expectation that the client needs to be restarted to pick up changes. We could clear download errors at startup, so everything is tried at least once, but I'd prefer to also self-correct where possible. How much noise has been reported? Were there actually lots of items affected, or was it just that they were being constantly retried? Because if it's just a handful of popups and it's reduced to happening only every x minutes or hours, maybe it wouldn't be as obnoxious, and we could try to reflect administrative corrections sooner, without requiring a restart.
So, what would the hypothetical admin intervention look like? As I understand it, in a real world newsroom, an admin is going to have to grab the laptop, make some
There's two ways that I see to avoid the "have to restart client" solution:
But having it just retry all the time in ways that cause UI feedback to the user that is not actually relevant to their work seems problematic.
We know that every news organization that a) stores significant amounts of content on the server, b) has ever undertaken a key rotation, would be affected. And so the number of |
Yeah, all good points. I'll eliminate the retry interval, clear errors at start, and if download decryption fails, that item will stay broken until the next restart. |
I've updated the test plan for the latest logic. |
d6ea335
to
a75e8ca
Compare
@rmol - what do you think about addressing #1059 (comment) and #1059 (comment) as followup? you can find our architecture docs in the wiki. |
Sure. |
lgtm
@@ -347,6 +336,8 @@ def setup(self):
self.export.moveToThread(self.export_thread)
self.export_thread.start()

storage.clear_download_errors(self.session)
Confirming: This means the client has to be fully restarted in order to clear the download errors and retry (logging out and logging back in again will not reset the error state).
Right.
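The behavior confirmed in this thread can be sketched roughly as follows. This uses `sqlite3` and a stand-in `Client` class purely for illustration: the table names, the `download_error_id` column, and the `login` method are assumptions, and the real `storage.clear_download_errors` takes a SQLAlchemy session and may work differently.

```python
import sqlite3

# Hypothetical table names standing in for File, Message and Reply.
TABLES = ("files", "messages", "replies")


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with one message in an error state."""
    conn = sqlite3.connect(":memory:")
    for table in TABLES:
        conn.execute(
            f"CREATE TABLE {table} "
            "(id INTEGER PRIMARY KEY, download_error_id INTEGER)"
        )
    conn.execute("INSERT INTO messages (download_error_id) VALUES (1)")
    conn.commit()
    return conn


def clear_download_errors(conn: sqlite3.Connection) -> None:
    """Reset the error state of every downloadable object."""
    for table in TABLES:
        conn.execute(f"UPDATE {table} SET download_error_id = NULL")
    conn.commit()


def failed_count(conn: sqlite3.Connection) -> int:
    """Count objects that still carry a download error."""
    total = 0
    for table in TABLES:
        row = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE download_error_id IS NOT NULL"
        ).fetchone()
        total += row[0]
    return total


class Client:
    """Stand-in for the controller: errors are cleared in setup() only."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def setup(self) -> None:
        # Runs once per process start.
        clear_download_errors(self.conn)

    def login(self) -> None:
        # Deliberately leaves error state alone, so a logout/login
        # cycle does not trigger new decryption attempts.
        pass
```

Because the reset lives in `setup()` and not in `login()`, an item that failed to decrypt stays failed across logins within the same process, and is retried only after a restart.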
once again, lgtm!
Hey @creviera, could I ask you to hold off on merge until I've gone over this with @ninavizz ? Just want to make sure that we're on the same page about the initial wording & UX before we land this one. |
to make sure we don't accidentally merge i'm going to dismiss my review while the language used in this PR is being reviewed.
hmm interesting idea! though wouldn't the error state bubbles only be drawn for the source conversations the user clicks on prior to the first sync? (since the conversation view is drawn when the user clicks on the corresponding source list item). Taking a step back, I feel like what we want here (eventually) is a little "Retry" button/icon or something next to the failed items so that users can attempt to retry single items, or a global "Retry download all failed items" somewhere. |
Adds error state to SpeechBubble, so messages and replies that fail to download will change style and display the error message within. They should automatically recover when errors clear.
Add db.DownloadError with relationships to File, Message and Reply. Use that to determine styling and content of messages and replies in the conversation view, instead of signals, because if the conversation view has not been built, there's nothing listening for the signals, and when the source is clicked, the styling and text won't reflect the current downloadable object state. Note that while File can now have a download error associated, file widgets are not yet using it.
Add last_updated column to File, Message and Reply, and use that in conjunction with download_error to delay retry of downloads that fail due to decryption problems, to avoid sd-gpg access notification spam.
Clear download errors once at start. If downloads fail because of decryption errors, they'll remain in an error state until restart, preventing any GPG notifications.
Can't use database defaults or triggers without test failures because SQLAlchemy's idea of the schema differs, so just explicitly populate the last_updated column.
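As a rough illustration of why the commit messages above prefer stored state over signals: if the style is derived from the object each time a bubble is built, a conversation view created after the failure still shows the right thing. The classes, the `explanation` field, and the `bubble_state` function below are hypothetical stand-ins, not the client's actual models or widgets.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DownloadError:
    # Hypothetical field holding the text shown inside the bubble.
    explanation: str


@dataclass
class Message:
    content: str
    # None means the message downloaded and decrypted normally.
    download_error: Optional[DownloadError] = None


def bubble_state(message: Message) -> Tuple[str, str]:
    """Return (style_name, text) for a speech bubble.

    The result depends only on the stored object state, so it is
    correct whenever the bubble is built, not only when a failure
    signal happened to be received.
    """
    if message.download_error is not None:
        return ("error", message.download_error.explanation)
    return ("normal", message.content)
```

Clearing `download_error` and rebuilding the bubble returns it to the normal style, which corresponds to the automatic recovery the first commit message describes.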
742c53f
to
9d3008f
Compare
Yes, and if we're going to add anything more like retry buttons, let's definitely kick that down the road. I've pushed up changes addressing the message box blowing out of the speech bubble, the overstyling, and the bulky overwhelming error text. I have not addressed the unchanged file decryption error handling (item 3 in this comment) because it's going to take some work to determine the nature of the GPG error and it's been pointed out to me that this PR is already too large. |
I'm not sure that these actions should be exposed to journalist users if they require administrative intervention to resolve. I think Nina's next-gen mockup here (second mockup) does a good job of prioritizing information that is useful for the journalist in a clear way ("this won't work, see your admin").
Cool, I do think that warrants an issue maybe in the n+1 sprint. I think after that we should prioritize the key import. That would give admins two remediation paths (import missing keys or delete old sources), which may be fully sufficient without diving further into the UX for the error cases. |
@eloquence @ninavizz Was talking with @creviera and she suggested I point out that the fix for the padding problem she spotted was simply to give the message box a smaller width than the speech bubble. (There's actually an invisible box within the white box, which actually contains the text.) I went this slightly hacky route because the pending overhaul of speech bubbles to fix wrapping is going to change all of this, and be a better solution. |
Sorry, terminology confusion. To me the terms "message box" and "speech bubble" are synonymous in the context of the Client. Could you clarify the before/after state by way of screenshots? |
Chiming in... "message box" refers to the There are two issues that look alike but aren't. We have this issue #815, which is where long strings without spaces, forward slashes, dashes, or special characters get cut off within the @rmol's solution gets around this by making the speech bubbles wider. I am curious why this is happening, and believe it's more likely to be related to how we set a widget's stylesheet. |
I just double-checked and the issue brought up here: #1059 (comment) seems to be unrelated to #815. There aren't any upcoming PRs that will address this sizing problem that I know of so I'd like to keep this PR open until it's fixed. |
Reiterating @creviera's description: the speech bubble is a container, holding a text box (a QLabel) and the color bar. Applying the same CSS that is applied whole to the speech bubble container widget on master (and expected to cascade to the text box and color bar) in individual chunks specific to the contained widgets (which fixes other Qt bugs where the cascading doesn't work properly when you change the CSS -- as when a message fails or is corrected) results in the contained text box no longer being contained -- it blows out the right side of the container. I've added an orange border to the text box here -- note that the right border is not visible, because the text box is overflowing the container: My fix was to constrain its width to fit within the container, not to widen the container, but the end result is the same: it's now entirely visible and so is its text. The reason I think #1050 (which will fix #815) will also address this is that in that PR, the QLabel text box is being replaced with another widget, and much more work is being put into controlling the size of that widget and wrapping the text properly. Any solution in this PR will end up being replaced or reworked once #1050 is merged. |
Confirming that with @rmol's latest change and one fine adjustment (to ensure the padding size doesn't change in this PR compared to Behavior in this PR: I'll approve but hold off on merge to give others a chance to comment. But otherwise, I think this should be ready to go. |
FYI - I approved just now because John's latest commit updated the speech bubbles so that they are no longer larger (9d3008f), which was the UX change I thought should be reviewed by Erik or Nina. It looks like there was some overlap between his last commit and my advice to get UX review. Also Jen's latest commit looks good to me! |
Description
Adds error state to SpeechBubble, so messages and replies that fail to download will change style and display the error message within. They should automatically recover when errors clear, but there seems to be some glitchiness with draft replies.
Fixes #140.
Test Plan
Dev environment
rm -rf ~/.securedrop_client/ && ./run.sh --sdc-home ~/.securedrop_client
gpg --homedir ~/.securedrop_client/gpg --delete-secret-keys --yes 65A1B5FF195B56353CC63DFFCC40EF1228271441
gpg --homedir ~/.securedrop_client/gpg --delete-keys --yes 65A1B5FF195B56353CC63DFFCC40EF1228271441
Qubes with test server
sd-app
.sd-app
As root, edit /opt/venvs/securedrop-client/lib/python3.7/site-packages/securedrop_client/logic.py to shorten the default value of Controller.download_failure_retry_interval to 30 seconds.
In sd-gpg, run mv /home/user/.gnupg /home/user/.gnupg.save to delete the submission key.
In sd-gpg, run rm -rf /home/user/.gnupg && cp -ax /home/user/.gnupg.save /home/user/.gnupg.
Checklist
If these changes modify code paths involving cryptography, the opening of files in VMs or network (via the RPC service) traffic, Qubes testing in the staging environment is required. For fine tuning of the graphical user interface, testing in any environment in Qubes is required. Please check as applicable:
If these changes add or remove files other than client code, the AppArmor profile may need to be updated. Please check as applicable: