Improve error handling of downloads, speech bubbles #1059

rmol · 2020-04-10T18:05:57Z

Description

Adds error state to SpeechBubble, so messages and replies that fail to download will change style and display the error message within. They should automatically recover when errors clear, but there seems to be some glitchiness with draft replies.

Fixes #140.

Test Plan

Dev environment

Start the dev server
rm -rf ~/.securedrop_client/ && ./run.sh --sdc-home ~/.securedrop_client
Before logging in, delete the submission key:
- gpg --homedir ~/.securedrop_client/gpg --delete-secret-keys --yes 65A1B5FF195B56353CC63DFFCC40EF1228271441
- gpg --homedir ~/.securedrop_client/gpg --delete-keys --yes 65A1B5FF195B56353CC63DFFCC40EF1228271441
Log in and wait for the sync. The test messages and replies should turn gray with italic text and contain a message about the decryption error.
Try to compose a reply, and check its failure styling.
Close the client, then restart normally, without deleting the data dir or submission key. After sync the messages and replies should look normal.

Qubes with test server

Build a deb from this branch, and install it in sd-app.
On sd-app as root, edit /opt/venvs/securedrop-client/lib/python3.7/site-packages/securedrop_client/logic.py to shorten the default value of Controller.download_failure_retry_interval to 30 seconds.
Run the client.
Before logging in, in sd-gpg, run mv /home/user/.gnupg /home/user/.gnupg.save to delete the submission key.
Log in.
Confirm that messages and replies are gray and italic, containing decryption errors.
Wait and watch the client log, confirming that messages and replies are never being retried because of the download failures.
Back in sd-gpg, run rm -rf /home/user/.gnupg && cp -ax /home/user/.gnupg.save /home/user/.gnupg.
Wait for the next sync; the failed downloads should not be retried and should not change their visual state, even though the key is now present.
Restart the client. All the failed downloads should be retried, and the conversation views should update to normal status.

Checklist

If these changes modify code paths involving cryptography, the opening of files in VMs or network (via the RPC service) traffic, Qubes testing in the staging environment is required. For fine tuning of the graphical user interface, testing in any environment in Qubes is required. Please check as applicable:

I have tested these changes in the appropriate Qubes environment
I do not have an appropriate Qubes OS workstation set up (the reviewer will need to test these changes)
These changes should not need testing in Qubes

If these changes add or remove files other than client code, the AppArmor profile may need to be updated. Please check as applicable:

I have updated the AppArmor profile
No update to the AppArmor profile is required for these changes
I don't know and would appreciate guidance

eloquence · 2020-04-20T23:17:27Z

./run.sh --sdc-home ~/.securedrop_client

Note that run.sh will re-import the key, which is fine if you follow the test plan exactly as you wrote it. :-) I made the mistake of exiting the client in between; when testing this with the Docker env, I then had to comment out this line after the initial import.

eloquence · 2020-04-20T23:52:25Z

Close the client, then restart normally, without deleting the data dir or submission key. After sync the messages and replies should look normal.

I'm seeing the same retry behavior as you're describing in the test plan for Qubes in the Docker env: replies/messages aren't re-downloaded until 10 minutes have elapsed; before then I see log lines like this during each sync:

DEBUG: Download of reply 2174e1ef-5209-4930-908f-db821d02b294 failed recently; not retrying yet.

sssoleileraaa

Log in and wait for the sync. The test messages and replies should turn gray with italic text and contain a message about the decryption error.

For me the messages and replies turned red, is this expected?

Try to compose a reply, and check its failure styling.

I see the standard "Failed to send reply" error, which I think looks right.

Close the client, then restart normally, without deleting the data dir or submission key. After sync the messages and replies should look normal.

This works! Here's what the ConversationView looked like before and after this step:

Before
After

As expected, the replies are permanently failed by logging out and logging back in.

Next up I need to test in Qubes and review the code.

eloquence · 2020-04-21T17:03:42Z

Regarding the logic implemented here: Would it be preferable to just check once, on startup, instead of checking every 10 minutes?

One main goal of this PR is to minimize notification noise that journalists see, i.e. the Qubes sd-gpg notifications (not visible in Docker env testing). It may be several weeks until we support adding more than one submission key. It seems preferable to only show these notifications once, on the sync after login, instead of repeatedly showing them during a session that could last an hour or longer.

rmol · 2020-04-21T17:04:14Z

For me the messages and replies turned red, is this expected?

No. That was the initial styling on this branch, but the current error CSS should turn things gray. Check that you've got the latest.

redshiftzero · 2020-04-21T18:14:17Z

it seems like the last_updated field added here generically useful (i.e. so we should leave that in) but we can increase the time for update from 10 minutes to ~8 hours to handle #1059 (comment) - reducing notification frequency until we have a path to resolution - what do y’all think?

sssoleileraaa

This is looking good so far. I see the gray text now that I pulled in the latest changes and did an initial code review. I still want to manually test a few more things in Qubes and review the unit tests.

I agree that increasing the download_failure_retry_interval to 8 hours (the length of a login session) is a good minimal change to make it so we only try a failed File/Message/Reply once per session.

securedrop_client/gui/widgets.py

sssoleileraaa · 2020-04-21T19:02:02Z

securedrop_client/db.py

+        DateTime,
+        nullable=False,
+        default=datetime.datetime.utcnow,
+        onupdate=datetime.datetime.utcnow,


This introduces a new concept of last_updated for Files, Messages, and Replies, which we also have for Sources. For Sources it represents when a Source was last updated on the server. For Files, Messages, and Replies, it represents when the local database object was last updated. A little documentation about this might go a long way.

securedrop_client/db.py

sssoleileraaa · 2020-04-21T19:27:32Z

I agree that increasing the download_failure_retry_interval to 8 hours (the length of a login session) is a good minimal change to make it so we only try a failed File/Message/Reply once per session.

I recommend updating our architecture docs in the wiki to mention how we have additional retry logic when the client starts for download jobs that fail for checksum and crypto reasons.

eloquence · 2020-04-21T20:17:15Z

Sorry to reiterate, but is there any reason (other than, obviously, wanting to land this change ASAP) to go with a fixed retry time rather than just doing this check on first sync? I'm just worried that a fixed timeout is more difficult to manage in practice ("we just imported the key and now we have to wait 8 hours").

rmol · 2020-04-21T20:30:09Z

I don't like the idea of setting the expectation that the client needs to be restarted to pick up changes.

We could clear download errors at startup, so everything is tried at least once, but I'd prefer to also self-correct where possible. How much noise has been reported? Were there actually lots of items affected, or was it just that they were being constantly retried? Because if it's a just handful of popups and it's reduced to happening only every x minutes or hours, maybe it wouldn't be as obnoxious, and we could try to reflect administrative corrections sooner, without requiring a restart.

eloquence · 2020-04-21T20:44:27Z

I don't like the idea of setting the expectation that the client needs to be restarted to pick up changes.

So, what would the hypothetical admin intervention look like? As I understand it, in a real world newsroom, an admin is going to have to grab the laptop, make some config.json changes, add another .sec file, and re-reun securedrop-admin --apply. The concern "does the client have to be restarted to pick up this change" is IMO very secondary in this scenario, unless I'm missing something. I would agree if the required admin fix was server-side rather than workstation-side.

We could clear download errors at startup, so everything is tried at least once, but I'd prefer to also self-correct where possible.

There's two ways that I see to avoid the "have to restart client" solution:

have the hypothetical import tooling send a signal to the client that it needs to recheck sources
have the client offer some kind of "refresh key" feature.

But having it just retry all the time in ways that cause UI feedback to the user that is not actually relevant to their work seems problematic.

How much noise has been reported?

We know that every news organization that a) stores significant amounts of content on the server, b) has ever undertaken a key rotation, would be affected. And so the number of sd-gpg notifications could range from zero to potentially hundreds. In the specific cases we know about, we know it was significant enough to be viewed as mildly disruptive.

rmol · 2020-04-21T20:58:42Z

Yeah, all good points. I'll eliminate the retry interval, clear errors at start, and if download decryption fails, that item will stay broken until the next restart.

rmol · 2020-04-22T17:14:37Z

I've updated the test plan for the latest logic.

sssoleileraaa · 2020-04-22T21:20:54Z

@rmol - what do you think about addressing #1059 (comment) and #1059 (comment) as followup? you can find our architecture docs in the wiki.

rmol · 2020-04-22T21:24:21Z

Sure.

sssoleileraaa

lgtm

sssoleileraaa · 2020-04-22T21:26:26Z

securedrop_client/logic.py

@@ -347,6 +336,8 @@ def setup(self):
        self.export.moveToThread(self.export_thread)
        self.export_thread.start()

+        storage.clear_download_errors(self.session)


Confirming: This makes it so that the client has to be fully restarted in order to clear the download errors and retry ((ogging out and logging back in again will not reset the error state).

(sorry, need to retest on qubes)

sssoleileraaa

once again, lgtm!

eloquence · 2020-04-22T22:48:23Z

Hey @creviera, could I ask you to hold off on merge until I've gone over this with @ninavizz ? Just want to make sure that we're on the same page about the initial wording & UX before we land this one.

to make sure we don't accidentally merge i'm going to dismiss my review while the language used in this PR is being reviewed.

redshiftzero · 2020-04-24T14:55:59Z

hmm interesting idea! though wouldn't the error state bubbles only be drawn for the source conversations the user clicks on prior to the first sync? (since the conversation view is drawn when the user clicks on the corresponding source list item). Taking a step back, I feel like what we want here (eventually) is a little "Retry" button/icon or something next to the failed items so that users can attempt to retry single items, or a global "Retry download all failed items" somewhere.

Adds error state to SpeechBubble, so messages and replies that fail to download will change style and display the error message within. They should automatically recover when errors clear. Add db.DownloadError with relationships to File, Message and Reply. Use that to determine styling and content of messages and replies in the conversation view, instead of signals, because if the conversation view has not been built, there's nothing listening for the signals, and when the source is clicked, the styling and text won't reflect the current downloadable object state. Note that while File can now have a download error associated, file widgets are not yet using it. Add last_updated column to File, Message and Reply, and use that in conjunction with download_error to delay retry of downloads that fail due to decryption problems, to avoid sd-gpg access notification spam.

Clear download errors once at start. If downloads fail because of decryption errors, they'll remain in an error state until restart, preventing any GPG notifications.

Can't use database defaults or triggers without test failures because SQLAlchemy's idea of the schema differs, so just explicitly populate the last_updated column.

rmol · 2020-04-24T16:03:13Z

Yes, and if we're going to add anything more like retry buttons, let's definitely kick that down the road.

I've pushed up changes addressing the message box blowing out of the speech bubble, the overstyling, and the bulky overwhelming error text.

I have not addressed the unchanged file decryption error handling (item 3 in this comment) because it's going to take some work to determine the nature of the GPG error and it's been pointed out to me that this PR is already too large.

eloquence · 2020-04-24T16:08:23Z

Taking a step back, I feel like what we want here (eventually) is a little "Retry" button/icon or something next to the failed items so that users can attempt to retry single items, or a global "Retry download all failed items" somewhere.

I'm not sure that these actions should be exposed to journalist users if they require administrative intervention to resolve. I think Nina's next-gen mockup here (second mockup) does a good job of prioritizing information that is useful for the journalist in a clear way ("this won't work, see your admin").

I have not addressed the unchanged file decryption error handling (item 3 in this comment) because it's going to take some work to determine the nature of the GPG error and it's been pointed out to me that this PR is already too large.

Cool, I do think that warrants an issue maybe in the n+1 sprint.

I think after that we should prioritize the key import. That would give admins two remediation paths (import missing keys or delete old sources), which may be fully sufficient without diving further into the UX for the error cases.

rmol · 2020-04-24T21:53:32Z

@eloquence @ninavizz Was talking with @creviera and she suggested I point out that the fix for the padding problem she spotted was simply to give the message box a smaller width than the speech bubble. (There's actually an invisible box within the white box, which actually contains the text.) I went this slightly hacky route because the pending overhaul of speech bubbles to fix wrapping is going to change all of this, and be a better solution.

eloquence · 2020-04-24T21:55:42Z

she spotted was simply to give the message box a smaller width than the speech bubble.

Sorry, terminology confusion. To me the terms "message box" and "speech bubble" are synonymous in the context of the Client. Could you clarify the before/after state by way of screenshots?

sssoleileraaa · 2020-04-24T22:10:02Z

Chiming in...

"message box" refers to the QLabel that holds the text and "speech bubble" refers to the QWidget that contains the QLabel (and color bar that indicates message/reply status).

There are two issues that look alike but aren't. We have this issue #815, which is where long strings without spaces, forward slashes, dashes, or special characters get cut off within the QLabel itself. Then there is the issue found in this PR, which is where the QLabel is a larger width for some reason so the QLabel gets cut off in the speech bubble that contains the QLabel.

@rmol's solution gets around this by making the speech bubbles wider. I am curious why this is happening, and believe it's more likely to be related to how we set a widget's stylesheet.

sssoleileraaa · 2020-04-25T00:09:28Z

I just double-checked and the issue brought up here: #1059 (comment) seems to be unrelated to #815. There aren't any upcoming PRs that will address this sizing problem that I know of so I'd like to keep this PR open until it's fixed.

rmol · 2020-04-25T12:48:28Z

Reiterating @creviera's description: the speech bubble is a container, holding a text box (a QLabel) and the color bar. Applying the same CSS that is applied whole to the speech bubble container widget on master (and expected to cascade to the text box and color bar) in individual chunks specific to the contained widgets (which fixes other Qt bugs where the cascading doesn't work properly when you change the CSS -- as when a message fails or is corrected) results in the contained text box no longer being contained -- it blows out the right side of the container. I've added an orange border to the text box here -- note that the right border is not visible, because the text box is overflowing the container:

My fix was to constrain its width to fit within the container, not to widen the container, but the end result is the same: it's now entirely visible and so is its text.

The reason I think #1050 (which will fix #815) will also address this is that in that PR, the QLabel text box is being replaced with another widget, and much more work is being put into controlling the size of that widget and wrapping the text properly. Any solution in this PR will end up being replaced or reworked once #1050 is merged.

redshiftzero · 2020-04-27T22:04:56Z

Confirming that with @rmol's latest change and one fine adjustment (to ensure the padding size doesn't change in this PR compared to master) that the problem @creviera notes here is now resolved. Behavior on master:

Behavior in this PR:

I'll approve but hold off on merge to give others a chance to comment. But otherwise, I think this should be ready to go.

sssoleileraaa · 2020-04-27T22:05:52Z

FYI - I approved just now because John's latest commit updated the speech bubbles so that they are no longer larger (9d3008f), which was the UX change I thought should be reviewed by Erik or Nina. It looks like there was some overlap between his last commit and my advice to get UX review. Also Jen's latest commit looks good to me!

rmol force-pushed the 140-handle-missing-keys branch 4 times, most recently from 2a2522b to eb49eca Compare April 16, 2020 19:16

rmol changed the title ~~WIP: add error state to all SpeechBubbles~~ Improve error handling of downloads, speech bubbles Apr 16, 2020

rmol marked this pull request as ready for review April 16, 2020 19:28

rmol requested review from sssoleileraaa, kushaldas and redshiftzero as code owners April 16, 2020 19:28

sssoleileraaa reviewed Apr 21, 2020

View reviewed changes

rmol force-pushed the 140-handle-missing-keys branch from d6ea335 to a75e8ca Compare April 22, 2020 21:18

sssoleileraaa previously approved these changes Apr 22, 2020

View reviewed changes

rmol added 6 commits April 24, 2020 12:01

Revert download retry logic

ebc50c1

Clear download errors once at start. If downloads fail because of decryption errors, they'll remain in an error state until restart, preventing any GPG notifications.

Restore database default of downloadables' last_updated

93aa3cd

Fix table rework in download error migration

f55bb97

Can't use database defaults or triggers without test failures because SQLAlchemy's idea of the schema differs, so just explicitly populate the last_updated column.

Fix longstanding bugs in test_storage.test_update_replies

ed2b1c4

address review feedback

9d3008f

rmol force-pushed the 140-handle-missing-keys branch from 742c53f to 9d3008f Compare April 24, 2020 16:01

Increase space around speech bubble message box

13be3d5

sssoleileraaa previously approved these changes Apr 27, 2020

View reviewed changes

gui: very slight adjustment to message/speech bubble padding

c052e2d

redshiftzero dismissed sssoleileraaa’s stale review via c052e2d April 27, 2020 21:52

redshiftzero approved these changes Apr 27, 2020

View reviewed changes

sssoleileraaa approved these changes Apr 27, 2020

View reviewed changes

sssoleileraaa merged commit 2cce27d into master Apr 27, 2020

sssoleileraaa deleted the 140-handle-missing-keys branch April 27, 2020 22:07

eloquence mentioned this pull request Apr 28, 2020

Show a more specific error message when download failed due to missing GPG key #1075

Open

redshiftzero mentioned this pull request May 7, 2020

securedrop-client 0.2.0 #1083

Merged

29 tasks

sssoleileraaa mentioned this pull request May 9, 2020

Cryptoerror logs #1090

Merged

10 tasks

sssoleileraaa mentioned this pull request May 27, 2020

Expand functional test coverage to prevent regressions #1004

Open

eloquence mentioned this pull request Feb 8, 2021

polish message bubble styling #465

Closed

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve error handling of downloads, speech bubbles #1059

Improve error handling of downloads, speech bubbles #1059

rmol commented Apr 10, 2020 •

edited by redshiftzero

Loading

eloquence commented Apr 20, 2020 •

edited

Loading

eloquence commented Apr 20, 2020

sssoleileraaa left a comment

eloquence commented Apr 21, 2020

rmol commented Apr 21, 2020

redshiftzero commented Apr 21, 2020

sssoleileraaa left a comment

sssoleileraaa Apr 21, 2020

sssoleileraaa commented Apr 21, 2020

eloquence commented Apr 21, 2020 •

edited

Loading

rmol commented Apr 21, 2020

eloquence commented Apr 21, 2020 •

edited

Loading

rmol commented Apr 21, 2020

rmol commented Apr 22, 2020

sssoleileraaa commented Apr 22, 2020

rmol commented Apr 22, 2020

sssoleileraaa left a comment

sssoleileraaa Apr 22, 2020

rmol Apr 22, 2020

sssoleileraaa left a comment

eloquence commented Apr 22, 2020

redshiftzero commented Apr 24, 2020

rmol commented Apr 24, 2020

eloquence commented Apr 24, 2020

rmol commented Apr 24, 2020

eloquence commented Apr 24, 2020

sssoleileraaa commented Apr 24, 2020 •

edited

Loading

sssoleileraaa commented Apr 25, 2020

rmol commented Apr 25, 2020

redshiftzero commented Apr 27, 2020

sssoleileraaa commented Apr 27, 2020 •

edited

Loading

Improve error handling of downloads, speech bubbles #1059

Improve error handling of downloads, speech bubbles #1059

Conversation

rmol commented Apr 10, 2020 • edited by redshiftzero Loading

Description

Test Plan

Dev environment

Qubes with test server

Checklist

eloquence commented Apr 20, 2020 • edited Loading

eloquence commented Apr 20, 2020

sssoleileraaa left a comment

Choose a reason for hiding this comment

eloquence commented Apr 21, 2020

rmol commented Apr 21, 2020

redshiftzero commented Apr 21, 2020

sssoleileraaa left a comment

Choose a reason for hiding this comment

sssoleileraaa Apr 21, 2020

Choose a reason for hiding this comment

sssoleileraaa commented Apr 21, 2020

eloquence commented Apr 21, 2020 • edited Loading

rmol commented Apr 21, 2020

eloquence commented Apr 21, 2020 • edited Loading

rmol commented Apr 21, 2020

rmol commented Apr 22, 2020

sssoleileraaa commented Apr 22, 2020

rmol commented Apr 22, 2020

sssoleileraaa left a comment

Choose a reason for hiding this comment

sssoleileraaa Apr 22, 2020

Choose a reason for hiding this comment

rmol Apr 22, 2020

Choose a reason for hiding this comment

sssoleileraaa left a comment

Choose a reason for hiding this comment

eloquence commented Apr 22, 2020

redshiftzero commented Apr 24, 2020

rmol commented Apr 24, 2020

eloquence commented Apr 24, 2020

rmol commented Apr 24, 2020

eloquence commented Apr 24, 2020

sssoleileraaa commented Apr 24, 2020 • edited Loading

sssoleileraaa commented Apr 25, 2020

rmol commented Apr 25, 2020

redshiftzero commented Apr 27, 2020

sssoleileraaa commented Apr 27, 2020 • edited Loading

rmol commented Apr 10, 2020 •

edited by redshiftzero

Loading

eloquence commented Apr 20, 2020 •

edited

Loading

eloquence commented Apr 21, 2020 •

edited

Loading

eloquence commented Apr 21, 2020 •

edited

Loading

sssoleileraaa commented Apr 24, 2020 •

edited

Loading

sssoleileraaa commented Apr 27, 2020 •

edited

Loading