Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve error handling of downloads, speech bubbles #1059

Merged
merged 8 commits into from
Apr 27, 2020

Conversation

rmol
Copy link
Contributor

@rmol rmol commented Apr 10, 2020

Description

Adds error state to SpeechBubble, so messages and replies that fail to download will change style and display the error message within. They should automatically recover when errors clear, but there seems to be some glitchiness with draft replies.

Fixes #140.

Test Plan

Dev environment

  • Start the dev server
  • rm -rf ~/.securedrop_client/ && ./run.sh --sdc-home ~/.securedrop_client
  • Before logging in, delete the submission key:
    • gpg --homedir ~/.securedrop_client/gpg --delete-secret-keys --yes 65A1B5FF195B56353CC63DFFCC40EF1228271441
    • gpg --homedir ~/.securedrop_client/gpg --delete-keys --yes 65A1B5FF195B56353CC63DFFCC40EF1228271441
  • Log in and wait for the sync. The test messages and replies should turn gray with italic text and contain a message about the decryption error.
  • Try to compose a reply, and check its failure styling.
  • Close the client, then restart normally, without deleting the data dir or submission key. After sync the messages and replies should look normal.

Qubes with test server

  • Build a deb from this branch, and install it in sd-app.
  • On sd-app as root, edit /opt/venvs/securedrop-client/lib/python3.7/site-packages/securedrop_client/logic.py to shorten the default value of Controller.download_failure_retry_interval to 30 seconds.
  • Run the client.
  • Before logging in, in sd-gpg, run mv /home/user/.gnupg /home/user/.gnupg.save to delete the submission key.
  • Log in.
  • Confirm that messages and replies are gray and italic, containing decryption errors.
  • Wait and watch the client log, confirming that messages and replies are never being retried because of the download failures.
  • Back in sd-gpg, run rm -rf /home/user/.gnupg && cp -ax /home/user/.gnupg.save /home/user/.gnupg.
  • Wait for the next sync; the failed downloads should not be retried and should not change their visual state, even though the key is now present.
  • Restart the client. All the failed downloads should be retried, and the conversation views should update to normal status.

Checklist

If these changes modify code paths involving cryptography, the opening of files in VMs or network (via the RPC service) traffic, Qubes testing in the staging environment is required. For fine tuning of the graphical user interface, testing in any environment in Qubes is required. Please check as applicable:

  • I have tested these changes in the appropriate Qubes environment
  • I do not have an appropriate Qubes OS workstation set up (the reviewer will need to test these changes)
  • These changes should not need testing in Qubes

If these changes add or remove files other than client code, the AppArmor profile may need to be updated. Please check as applicable:

  • I have updated the AppArmor profile
  • No update to the AppArmor profile is required for these changes
  • I don't know and would appreciate guidance

@rmol rmol force-pushed the 140-handle-missing-keys branch 4 times, most recently from 2a2522b to eb49eca Compare April 16, 2020 19:16
@rmol rmol changed the title WIP: add error state to all SpeechBubbles Improve error handling of downloads, speech bubbles Apr 16, 2020
@rmol rmol marked this pull request as ready for review April 16, 2020 19:28
@eloquence
Copy link
Member

eloquence commented Apr 20, 2020

./run.sh --sdc-home ~/.securedrop_client

Note that run.sh will re-import the key, which is fine if you follow the test plan exactly as you wrote it. :-) I made the mistake of exiting the client in between; when testing this with the Docker env, I then had to comment out this line after the initial import.

@eloquence
Copy link
Member

Close the client, then restart normally, without deleting the data dir or submission key. After sync the messages and replies should look normal.

I'm seeing the same retry behavior as you're describing in the test plan for Qubes in the Docker env: replies/messages aren't re-downloaded until 10 minutes have elapsed; before then I see log lines like this during each sync:

DEBUG: Download of reply 2174e1ef-5209-4930-908f-db821d02b294 failed recently; not retrying yet.

Copy link
Contributor

@sssoleileraaa sssoleileraaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log in and wait for the sync. The test messages and replies should turn gray with italic text and contain a message about the decryption error.

For me the messages and replies turned red, is this expected?

Try to compose a reply, and check its failure styling.

I see the standard "Failed to send reply" error, which I think looks right.

Close the client, then restart normally, without deleting the data dir or submission key. After sync the messages and replies should look normal.

This works! Here's what the ConversationView looked like before and after this step:

  • Before
    Screenshot from 2020-04-21 09-57-23

  • After
    Screenshot from 2020-04-21 09-52-16

As expected, the replies are permanently failed by logging out and logging back in.


Next up I need to test in Qubes and review the code.

@eloquence
Copy link
Member

Regarding the logic implemented here: Would it be preferable to just check once, on startup, instead of checking every 10 minutes?

One main goal of this PR is to minimize notification noise that journalists see, i.e. the Qubes sd-gpg notifications (not visible in Docker env testing). It may be several weeks until we support adding more than one submission key. It seems preferable to only show these notifications once, on the sync after login, instead of repeatedly showing them during a session that could last an hour or longer.

@rmol
Copy link
Contributor Author

rmol commented Apr 21, 2020

For me the messages and replies turned red, is this expected?

No. That was the initial styling on this branch, but the current error CSS should turn things gray. Check that you've got the latest.

@redshiftzero
Copy link
Contributor

it seems like the last_updated field added here generically useful (i.e. so we should leave that in) but we can increase the time for update from 10 minutes to ~8 hours to handle #1059 (comment) - reducing notification frequency until we have a path to resolution - what do y’all think?

Copy link
Contributor

@sssoleileraaa sssoleileraaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good so far. I see the gray text now that I pulled in the latest changes and did an initial code review. I still want to manually test a few more things in Qubes and review the unit tests.

I agree that increasing the download_failure_retry_interval to 8 hours (the length of a login session) is a good minimal change to make it so we only try a failed File/Message/Reply once per session.

securedrop_client/gui/widgets.py Show resolved Hide resolved
DateTime,
nullable=False,
default=datetime.datetime.utcnow,
onupdate=datetime.datetime.utcnow,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This introduces a new concept of last_updated for Files, Messages, and Replies, which we also have for Sources. For Sources it represents when a Source was last updated on the server. For Files, Messages, and Replies, it represents when the local database object was last updated. A little documentation about this might go a long way.

securedrop_client/db.py Outdated Show resolved Hide resolved
@sssoleileraaa
Copy link
Contributor

I agree that increasing the download_failure_retry_interval to 8 hours (the length of a login session) is a good minimal change to make it so we only try a failed File/Message/Reply once per session.

I recommend updating our architecture docs in the wiki to mention how we have additional retry logic when the client starts for download jobs that fail for checksum and crypto reasons.

@eloquence
Copy link
Member

eloquence commented Apr 21, 2020

Sorry to reiterate, but is there any reason (other than, obviously, wanting to land this change ASAP) to go with a fixed retry time rather than just doing this check on first sync? I'm just worried that a fixed timeout is more difficult to manage in practice ("we just imported the key and now we have to wait 8 hours").

@rmol
Copy link
Contributor Author

rmol commented Apr 21, 2020

I don't like the idea of setting the expectation that the client needs to be restarted to pick up changes.

We could clear download errors at startup, so everything is tried at least once, but I'd prefer to also self-correct where possible. How much noise has been reported? Were there actually lots of items affected, or was it just that they were being constantly retried? Because if it's a just handful of popups and it's reduced to happening only every x minutes or hours, maybe it wouldn't be as obnoxious, and we could try to reflect administrative corrections sooner, without requiring a restart.

@eloquence
Copy link
Member

eloquence commented Apr 21, 2020

I don't like the idea of setting the expectation that the client needs to be restarted to pick up changes.

So, what would the hypothetical admin intervention look like? As I understand it, in a real world newsroom, an admin is going to have to grab the laptop, make some config.json changes, add another .sec file, and re-reun securedrop-admin --apply. The concern "does the client have to be restarted to pick up this change" is IMO very secondary in this scenario, unless I'm missing something. I would agree if the required admin fix was server-side rather than workstation-side.

We could clear download errors at startup, so everything is tried at least once, but I'd prefer to also self-correct where possible.

There's two ways that I see to avoid the "have to restart client" solution:

  • have the hypothetical import tooling send a signal to the client that it needs to recheck sources
  • have the client offer some kind of "refresh key" feature.

But having it just retry all the time in ways that cause UI feedback to the user that is not actually relevant to their work seems problematic.

How much noise has been reported?

We know that every news organization that a) stores significant amounts of content on the server, b) has ever undertaken a key rotation, would be affected. And so the number of sd-gpg notifications could range from zero to potentially hundreds. In the specific cases we know about, we know it was significant enough to be viewed as mildly disruptive.

@rmol
Copy link
Contributor Author

rmol commented Apr 21, 2020

Yeah, all good points. I'll eliminate the retry interval, clear errors at start, and if download decryption fails, that item will stay broken until the next restart.

@rmol
Copy link
Contributor Author

rmol commented Apr 22, 2020

I've updated the test plan for the latest logic.

@rmol rmol force-pushed the 140-handle-missing-keys branch from d6ea335 to a75e8ca Compare April 22, 2020 21:18
@sssoleileraaa
Copy link
Contributor

@rmol - what do you think about addressing #1059 (comment) and #1059 (comment) as followup? you can find our architecture docs in the wiki.

@rmol
Copy link
Contributor Author

rmol commented Apr 22, 2020

Sure.

sssoleileraaa
sssoleileraaa previously approved these changes Apr 22, 2020
Copy link
Contributor

@sssoleileraaa sssoleileraaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@@ -347,6 +336,8 @@ def setup(self):
self.export.moveToThread(self.export_thread)
self.export_thread.start()

storage.clear_download_errors(self.session)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirming: This makes it so that the client has to be fully restarted in order to clear the download errors and retry ((ogging out and logging back in again will not reset the error state).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right.

@sssoleileraaa sssoleileraaa dismissed their stale review April 22, 2020 21:43

(sorry, need to retest on qubes)

sssoleileraaa
sssoleileraaa previously approved these changes Apr 22, 2020
Copy link
Contributor

@sssoleileraaa sssoleileraaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once again, lgtm!

@eloquence
Copy link
Member

Hey @creviera, could I ask you to hold off on merge until I've gone over this with @ninavizz ? Just want to make sure that we're on the same page about the initial wording & UX before we land this one.

@sssoleileraaa sssoleileraaa dismissed their stale review April 22, 2020 22:53

to make sure we don't accidentally merge i'm going to dismiss my review while the language used in this PR is being reviewed.

@redshiftzero
Copy link
Contributor

hmm interesting idea! though wouldn't the error state bubbles only be drawn for the source conversations the user clicks on prior to the first sync? (since the conversation view is drawn when the user clicks on the corresponding source list item). Taking a step back, I feel like what we want here (eventually) is a little "Retry" button/icon or something next to the failed items so that users can attempt to retry single items, or a global "Retry download all failed items" somewhere.

rmol added 6 commits April 24, 2020 12:01
Adds error state to SpeechBubble, so messages and replies that fail
to download will change style and display the error message within.
They should automatically recover when errors clear.

Add db.DownloadError with relationships to File, Message and
Reply. Use that to determine styling and content of messages and
replies in the conversation view, instead of signals, because if the
conversation view has not been built, there's nothing listening for
the signals, and when the source is clicked, the styling and text
won't reflect the current downloadable object state.

Note that while File can now have a download error associated,
file widgets are not yet using it.

Add last_updated column to File, Message and Reply, and use that in
conjunction with download_error to delay retry of downloads that fail
due to decryption problems, to avoid sd-gpg access notification spam.
Clear download errors once at start. If downloads fail because of
decryption errors, they'll remain in an error state until restart,
preventing any GPG notifications.
Can't use database defaults or triggers without test failures because
SQLAlchemy's idea of the schema differs, so just explicitly populate
the last_updated column.
@rmol rmol force-pushed the 140-handle-missing-keys branch from 742c53f to 9d3008f Compare April 24, 2020 16:01
@rmol
Copy link
Contributor Author

rmol commented Apr 24, 2020

Yes, and if we're going to add anything more like retry buttons, let's definitely kick that down the road.

I've pushed up changes addressing the message box blowing out of the speech bubble, the overstyling, and the bulky overwhelming error text.

I have not addressed the unchanged file decryption error handling (item 3 in this comment) because it's going to take some work to determine the nature of the GPG error and it's been pointed out to me that this PR is already too large.

@eloquence
Copy link
Member

Taking a step back, I feel like what we want here (eventually) is a little "Retry" button/icon or something next to the failed items so that users can attempt to retry single items, or a global "Retry download all failed items" somewhere.

I'm not sure that these actions should be exposed to journalist users if they require administrative intervention to resolve. I think Nina's next-gen mockup here (second mockup) does a good job of prioritizing information that is useful for the journalist in a clear way ("this won't work, see your admin").

I have not addressed the unchanged file decryption error handling (item 3 in this comment) because it's going to take some work to determine the nature of the GPG error and it's been pointed out to me that this PR is already too large.

Cool, I do think that warrants an issue maybe in the n+1 sprint.

I think after that we should prioritize the key import. That would give admins two remediation paths (import missing keys or delete old sources), which may be fully sufficient without diving further into the UX for the error cases.

@rmol
Copy link
Contributor Author

rmol commented Apr 24, 2020

@eloquence @ninavizz Was talking with @creviera and she suggested I point out that the fix for the padding problem she spotted was simply to give the message box a smaller width than the speech bubble. (There's actually an invisible box within the white box, which actually contains the text.) I went this slightly hacky route because the pending overhaul of speech bubbles to fix wrapping is going to change all of this, and be a better solution.

@eloquence
Copy link
Member

she spotted was simply to give the message box a smaller width than the speech bubble.

Sorry, terminology confusion. To me the terms "message box" and "speech bubble" are synonymous in the context of the Client. Could you clarify the before/after state by way of screenshots?

@sssoleileraaa
Copy link
Contributor

sssoleileraaa commented Apr 24, 2020

Chiming in...

"message box" refers to the QLabel that holds the text and "speech bubble" refers to the QWidget that contains the QLabel (and color bar that indicates message/reply status).

There are two issues that look alike but aren't. We have this issue #815, which is where long strings without spaces, forward slashes, dashes, or special characters get cut off within the QLabel itself. Then there is the issue found in this PR, which is where the QLabel is a larger width for some reason so the QLabel gets cut off in the speech bubble that contains the QLabel.

@rmol's solution gets around this by making the speech bubbles wider. I am curious why this is happening, and believe it's more likely to be related to how we set a widget's stylesheet.

@sssoleileraaa
Copy link
Contributor

I just double-checked and the issue brought up here: #1059 (comment) seems to be unrelated to #815. There aren't any upcoming PRs that will address this sizing problem that I know of so I'd like to keep this PR open until it's fixed.

@rmol
Copy link
Contributor Author

rmol commented Apr 25, 2020

Reiterating @creviera's description: the speech bubble is a container, holding a text box (a QLabel) and the color bar. Applying the same CSS that is applied whole to the speech bubble container widget on master (and expected to cascade to the text box and color bar) in individual chunks specific to the contained widgets (which fixes other Qt bugs where the cascading doesn't work properly when you change the CSS -- as when a message fails or is corrected) results in the contained text box no longer being contained -- it blows out the right side of the container. I've added an orange border to the text box here -- note that the right border is not visible, because the text box is overflowing the container:

speech-bubble-blowout

My fix was to constrain its width to fit within the container, not to widen the container, but the end result is the same: it's now entirely visible and so is its text.

The reason I think #1050 (which will fix #815) will also address this is that in that PR, the QLabel text box is being replaced with another widget, and much more work is being put into controlling the size of that widget and wrapping the text properly. Any solution in this PR will end up being replaced or reworked once #1050 is merged.

sssoleileraaa
sssoleileraaa previously approved these changes Apr 27, 2020
@redshiftzero
Copy link
Contributor

Confirming that with @rmol's latest change and one fine adjustment (to ensure the padding size doesn't change in this PR compared to master) that the problem @creviera notes here is now resolved. Behavior on master:

masterconv

Behavior in this PR:

this_pr

I'll approve but hold off on merge to give others a chance to comment. But otherwise, I think this should be ready to go.

@sssoleileraaa
Copy link
Contributor

sssoleileraaa commented Apr 27, 2020

FYI - I approved just now because John's latest commit updated the speech bubbles so that they are no longer larger (9d3008f), which was the UX change I thought should be reviewed by Erik or Nina. It looks like there was some overlap between his last commit and my advice to get UX review. Also Jen's latest commit looks good to me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Gracefully handle lack of submission GPG private key
5 participants