app: add pending reply status, persist replies in the database #578

redshiftzero · 2019-10-21T19:53:39Z

Description

Fixes #350.
Fixes #294.

What it looks like now when you send replies (the failed replies will persist between application restarts and clicking between sources):

For a followup:

Add spinner: https://app.zeplin.io/project/5c807ea562f734bd2756b243/screen/5d5b183a152c905223e5fab0 - there is already an issue for this (Add spinner for replies in process of being sent #359). We don't actually have a nice place in the conversation item widget to put this yet, and I don't want this diff to get any larger.

Test Plan

To generate lots of failed replies and introduce delay I'm testing using the following server diff:

diff --git a/securedrop/journalist_app/api.py b/securedrop/journalist_app/api.py
index aa776d520..8eb8f26c8 100644
--- a/securedrop/journalist_app/api.py
+++ b/securedrop/journalist_app/api.py
@@ -230,6 +230,13 @@ def make_blueprint(config):
                 {'replies': [reply.to_json() for
                              reply in source.replies]}), 200
         elif request.method == 'POST':
+            import time
+            time.sleep(2)
+
+            import random
+            fail_early = random.choice([True, False])
+            if fail_early:
+                abort(409, 'That UUID is already in use.')
             source = get_or_404(Source, source_uuid,
                                 column=Source.uuid)
             if request.json is None:

You can use that, or you can just use staging and rely on regular ol' Tor network issues. Either way:

Send a reply (or a few) to a source (source A) until one fails.
Click to another source (source B).
Click back to source A. Confirm that the failed reply is still there.
Close the application.
Restart the application with the same sdc homedir. Confirm the failed reply is still there.

Checklist

If these changes modify code paths involving cryptography, the opening of files in VMs or network (via the RPC service) traffic, Qubes testing in the staging environment is required. For fine tuning of the graphical user interface, testing in any environment in Qubes is required. Please check as applicable:

I have tested these changes in the appropriate Qubes environment
I do not have an appropriate Qubes OS workstation set up (the reviewer will need to test these changes)
These changes should not need testing in Qubes

ninavizz · 2019-10-21T20:30:16Z

Question: is the above video just looping, or does it ever go from purple to urgent-coral to blue in a seconds-long window? The urgent-coral (visible error/stalled state) should not show unless the client has tried and tried and tried, and the user needs to likely intervene to troubleshoot.

redshiftzero · 2019-10-21T20:32:16Z

oh yeah the gif is just looping. i can confirm it is the case that the urgent-coral bar will only show up if there is some problem that requires user intervention. for example, if the reply send fails the first time due to a network error, then the reply gets resent again transparently to the user without turning the color bar urgent-coral.

redshiftzero · 2019-10-23T22:31:32Z

Today I thought a bit about how to handle the server/client reply order state synchronization (the remaining issue with this branch, and the underlying issue behind #489). Here are my thoughts on how to proceed; comment if you think there's an error in my logic or you think I'm overlooking a simpler/more elegant solution.

First here are some test cases that we need to consider and handle. For all cases we only consider a single source. Other requirements/constraints are:

file_counter is used for ordering the conversation view and cannot be duplicated for any conversation item with a single source: i.e. the ordering for a given source’s conversation view is unique.
Pending and failed replies can only be stored locally.
There may be multiple journalists interacting with a given source at a given time.

Case 1: Single journalist, multiple successful replies

Source sends message M with file_counter=1.
Journalist A submits reply X with local file_counter=2 to queue.
Journalist A submits reply Y with local file_counter=3 to queue.
Reply with file_counter=2 (X) is saved on the server.
Reply with file_counter=3 (Y) is saved on the server.

Expected ordering (server and client): M, X, Y

Case 2: Single user, multiple replies, some that fail

Source sends message M with file_counter=1.
Journalist A submits reply X with local file_counter=2 to queue.
Reply with local file_counter=2 (X) fails to send.
Journalist A submits reply Y with local file_counter=3 to queue.
Reply with local file_counter=3 (Y) is saved on the server as file_counter=2.

Expected ordering (server): M, Y
Expected ordering (client): M, X, Y

Case 3: Multiple clients, multiple successful replies

Source sends message M with file_counter=1.
Journalist A submits reply X with local file_counter=2 to their queue.
Journalist B submits reply Y with local file_counter=2 to their queue.
Reply from A (X) with local file_counter=2 is saved on the server.
Reply from B (Y) with local file_counter=2 is saved on the server with file_counter=3.

Expected ordering (server, journalist A, journalist B): M, X, Y

Case 4: Single journalist, multiple successful replies, source messaging at the same time

Source sends message M1 with file_counter=1.
Journalist A submits reply X with local file_counter=2 to their queue.
Source sends message M2 with file_counter=2.
Client downloads message M2 with file_counter=2.
Reply from A (X) with local file_counter=2 is saved on the server with file_counter=3.

Expected ordering (server, journalist A): M1, M2, X

Current behavior

Case 1 is handled in the PR as is, that’s the happy path.
Case 2 triggers exception upon sync due to two replies with file_counter=2 (very similar to replies.file_counter unique constraint fails and crashes app #489)
Case 3 triggers exception for journalist B due to two replies with file_counter=2.
Case 4 is basically the bug in #489 (replies.file_counter unique constraint fails and crashes app), which I’m including here for completeness since it’s closely related.
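
To make the failure in cases 2 and 3 concrete, here is a minimal sketch using an in-memory SQLite table. The schema, the `insert_reply` helper, and the assumption that the unique constraint covers `(source_id, file_counter)` are illustrative only, not the client's actual tables.

```python
import sqlite3


def insert_reply(conn, source_id, file_counter, content):
    """Insert one reply row; the UNIQUE constraint rejects duplicate counters."""
    conn.execute(
        "INSERT INTO replies (source_id, file_counter, content) VALUES (?, ?, ?)",
        (source_id, file_counter, content),
    )


def demo_collision():
    """Reproduce case 3 from journalist B's point of view."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema: file_counter must be unique within a source.
    conn.execute(
        "CREATE TABLE replies ("
        " id INTEGER PRIMARY KEY,"
        " source_id INTEGER NOT NULL,"
        " file_counter INTEGER NOT NULL,"
        " content TEXT,"
        " UNIQUE (source_id, file_counter))"
    )
    # B's reply Y is stored locally with file_counter=2.
    insert_reply(conn, 1, 2, "Y (local)")
    try:
        # The sync then brings in A's reply X, saved on the server as file_counter=2.
        insert_reply(conn, 1, 2, "X (from server)")
    except sqlite3.IntegrityError as exc:
        return str(exc)
    return None
```

Calling `demo_collision()` returns the text of the integrity error instead of letting it propagate, which is the same kind of exception that surfaces during sync if nothing handles it.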

New behavior

Here’s my current thinking on the best way to handle this:

In the response from the POST reply endpoint on the server side, we return the actual file_counter of the saved reply. This way we know where in the conversation this item is (without having to do another round trip which could fail). When we get a successful reply upload, we update the file_counter and filename in the database. This handles case 3 and case 4. (Edited: we currently return the filename so we can extract file_counter already 🎉)
We make a new ORM object called DraftReply:
- it has all the columns the reply table does except it does not have the file_counter field.
- it also will add a timestamp field: this is the local timestamp that the reply was sent and this will be used for ordering local replies when there are multiple attempted replies in between conversation items from the server.
- it also will add a field called prev_file_counter, which points to the file_counter of the message/reply/file after which the draft reply was sent. We need to do this because we only have the timestamp of the most recent conversation item server side (so we can’t order by timestamp for the conversation items from the server).
- when we construct the conversation view, we include all messages, replies, and files as we were doing before ordered by file_counter. Then we interleave in the pending and failed local replies, placed after the corresponding prev_file_counter, ordered by timestamp. This handles case 2.
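
A minimal sketch of the interleaving described above, assuming plain dataclasses rather than the real ORM objects. `Item` and `conversation_order` are hypothetical names. A draft carries the file_counter of the item it was sent after (the prev_file_counter idea) plus a local timestamp; server items have no timestamp.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Item:
    label: str
    file_counter: int
    # Only drafts (pending/failed local replies) have a local timestamp.
    timestamp: Optional[datetime] = None


def conversation_order(items: List[Item]) -> List[Item]:
    """Sort by file_counter first, then by timestamp.

    Server items get datetime.min as a stand-in timestamp, so they sort
    before any draft that shares their file_counter.
    """
    return sorted(items, key=lambda i: (i.file_counter, i.timestamp or datetime.min))


# Case 2: M is on the server, X failed locally after M, Y was saved on the
# server with file_counter=2, and Z is a later draft sent after Y.
items = [
    Item("Y", 2),
    Item("Z", 2, datetime(2019, 10, 23, 12, 5)),
    Item("X", 1, datetime(2019, 10, 23, 12, 0)),
    Item("M", 1),
]
ordered_labels = [item.label for item in conversation_order(items)]
```

Using a tuple as the sort key keeps the ordering in one place, at the cost of needing a placeholder timestamp for items that come from the server.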

sssoleileraaa

Quick feedback while I'm testing this:

The client won't open now after I closed the application when the reply fails to send. I'm seeing:

  File "/home/creviera/workspace/freedomofpress/securedrop-client/securedrop_client/storage.py", line 166, in update_sources
    delete_single_submission_or_reply_on_disk(document, data_dir)
  File "/home/creviera/workspace/freedomofpress/securedrop-client/securedrop_client/storage.py", line 437, in delete_single_submission_or_reply_on_disk
    filename_without_extensions = obj_db.filename.split('.')[0]
AttributeError: 'DraftReply' object has no attribute 'filename'

redshiftzero · 2019-10-25T21:15:05Z

ahh thanks, forgot the source sync deletion iterates through the source.collection (which now includes DraftReplys) - added a fix and regression test coverage in two small commits (not force pushing in case you are mid-review)

sssoleileraaa · 2019-10-29T00:10:37Z

not sure what's happening but all my replies are now saved in draftreplies with send_status_id as 2 (FAILED). when i switch to master the replies send. i'll have to review the code more tomorrow and try to figure out what's wrong with my setup or maybe there's a bug in the code, but this does feel like a setup issue.

Update: i was wrong about replies sending on master, could have sworn they were

redshiftzero · 2019-10-29T19:45:25Z

For the interested observer @creviera's report is correct - replies now fail to send - however this is due to #598 on master (to reproduce on master you must first add a new source since the issue appears to be something related to gpg key imports).

redshiftzero · 2019-10-29T21:02:29Z

I can confirm that after deleting the lock file cited in freedomofpress/securedrop#4909 (which from my testing just now appears to be the cause of #598) then testing this branch in Qubes works without issue for me

sssoleileraaa

STR:

Send a reply that you know will fail (cut connection to server)
Send a couple other replies (you'll see them stay in the pending state)
Click retry (nothing should change)
Fix connection to server and refresh

Expected

The red reply and pending replies that follow are sent

Actual

The red reply shows up once in red as failed, followed by a copy of itself once in green as successful. The pending replies that follow are sent and green.

securedrop_client/gui/widgets.py

redshiftzero · 2019-10-30T14:51:07Z

Interesting! I think what is happening here is that the reply send does time out: the reply status is failed and is stored as a draft on the client side. However, the reply actually did get saved on the server, we just didn't get the response due to network issues. I think the right place to handle this is during the sync action for new replies: we should check if there is a draft locally with the same uuid as the new reply from the server: if yes, then we simply delete the local draft. I'll implement this and then retest your scenario.

during the sync, we don't attempt to delete draft replies in the source.collection: they aren't stored on disk, and will get deleted by the cascade delete when the source is deleted. however we also ensure that duplicate drafts are cleaned up on sync to handle the scenario where a ReplySendJob "fails" but the reply _was_ actually saved properly on the server.
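
As a rough sketch of that cleanup step, assuming drafts are kept in a dict keyed by UUID instead of database rows (the function name and data shapes here are hypothetical):

```python
from typing import Dict, List


def remove_confirmed_drafts(
    local_drafts: Dict[str, str], remote_reply_uuids: List[str]
) -> List[str]:
    """Delete local drafts whose UUID the server already reports as a saved reply.

    Returns the UUIDs that were removed so the caller can refresh the view.
    """
    removed = []
    for uuid in remote_reply_uuids:
        if uuid in local_drafts:
            # The send "failed" locally (e.g. a timeout) but the server has it.
            del local_drafts[uuid]
            removed.append(uuid)
    return removed


drafts = {"uuid-a": "reply that timed out", "uuid-b": "reply still pending"}
removed = remove_confirmed_drafts(drafts, ["uuid-a", "uuid-z"])
```

After this call only `uuid-b` remains as a draft, so the reply that timed out is shown once (as sent) rather than twice.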

redshiftzero · 2019-11-04T17:56:11Z

So - there's a remaining issue with the queue pausing which will complicate testing (basically as is the queue isn't pausing when the server is completely down, not timing out, which requires proxy changes): that's freedomofpress/securedrop-proxy#128 - we'll need to handle this outside of this PR.

sssoleileraaa

Looking good so far, just made a follow-up issue that we can discuss on Monday.

Switching to code review now.

if a user sends multiple replies A, B, C, the order should always be A, B, C, even if:

* A fails
* B sends successfully
* C is pending

once a reply sends successfully - or we find that a reply _did_ send but we have it marked as failed as we never got the response (h/t @creviera for testing this case) - we ensure that C appears after B. this is done by updating the file_counter.

redshiftzero · 2019-11-05T23:17:13Z

added a commit 95db9b4 to handle a case @creviera and I just discussed synchronously:

User sends Replies A and B
Reply A sends
Reply A fails, and the queue pauses. Reply B is still pending.

At this point reply A has been successfully saved on the server in this scenario - we just never got the response. That reply will appear as failed locally until we sync and find out otherwise. In real world use this can happen, and we now have logic in this PR to handle this situation on sync: if we find that a draft exists locally matching a successfully sent reply on the server, we remove the local draft and update the local state to reflect the server.

However, in this case we also need to ensure that pending reply B will appear after the now-successful reply A in the conversation view for the user. To do so, we run the same logic that runs after a reply sends successfully: it updates the ordering of draft replies, moving the drafts to after the file_counter corresponding to the successfully sent reply.
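
A minimal sketch of that reordering step, under stated assumptions: drafts are plain dataclasses, `reposition_drafts` and `file_counter_from_filename` are hypothetical helper names, and the server filename is assumed to begin with the file_counter followed by a hyphen (for example `2-designation-reply.gpg`).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Draft:
    label: str
    file_counter: int  # counter of the item this draft was sent after
    timestamp: datetime  # local time the draft was created


def file_counter_from_filename(filename: str) -> int:
    """Assumes the filename starts with '<file_counter>-'."""
    return int(filename.split("-")[0])


def reposition_drafts(
    drafts: List[Draft], sent_at: datetime, old_file_counter: int, new_file_counter: int
) -> int:
    """Move drafts created after the confirmed reply so they sort after it.

    Only drafts that shared the confirmed reply's old position and were
    created later are moved. Returns the number of drafts moved.
    """
    moved = 0
    for draft in drafts:
        if draft.file_counter == old_file_counter and draft.timestamp > sent_at:
            draft.file_counter = new_file_counter
            moved += 1
    return moved


# Reply A (drafted at 12:01 after item 1) is confirmed by the server as item 2.
earlier = Draft("earlier failed draft", 1, datetime(2019, 11, 5, 12, 0))
reply_b = Draft("B", 1, datetime(2019, 11, 5, 12, 2))
moved = reposition_drafts(
    [earlier, reply_b],
    sent_at=datetime(2019, 11, 5, 12, 1),
    old_file_counter=1,
    new_file_counter=file_counter_from_filename("2-designation-reply.gpg"),
)
```

Here only B moves to file_counter 2, so it sorts after the confirmed reply A, while the draft created before A keeps its original position.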

sssoleileraaa · 2019-11-06T01:00:08Z

That reply will appear as failed locally until we sync and find out otherwise.

Yeah, so what I'm seeing is:

0. User sends Replies A and B
1. Reply A sends
2. Reply A fails, and the queue pauses. Reply B is still pending.
3. When queue starts again, Reply B sends, Reply A still appears as failed in the client
4. ~10 seconds later, Reply A shows up as sent

This is because, as you say, Reply A actually sent and made it to the server in step 2 but the client doesn't know about it yet. It would be nice to be able to see Reply A as successful before Reply B shows up as successful because there's about 10 seconds of it looking like Reply A was never sent.

You might have mentioned the reason to not sync the client with the server before starting the queue again, but I can't remember it so it'll be helpful to have an explanation in writing somewhere.

Update

Eventually we will move Metadata sync to a job that'll be added to a queue at a set interval but could also be added to the main queue when the sync icon is clicked or when we do things like unpausing a queue. I think it might make sense to make what I'm describing above into an issue and mention that we could prioritize a Metadata sync job before any other job on the queue.

sssoleileraaa · 2019-11-06T01:34:39Z

However, in this case we also need to ensure that pending reply B will appear after the now-successful reply A in the conversation view for the user. To do so, we run the same logic that runs after a reply sends successfully: it updates the ordering of draft replies, moving the drafts to after the file_counter corresponding to the successfully sent reply.

Reply A is always showing up before Reply B now, but instead of Reply B showing up as successful it now shows up as failed (in the case where we close the client, see steps to repro below):

User sends Replies A and B
Reply A sends
Reply A fails, and the queue pauses. Reply B is still pending.
User closes client, reauthenticates when logs back in, the queue starts again
After a few or more seconds, Reply A shows up as sent, but Reply B remains failed.

Before the latest commit, I'm pretty sure Reply B would send. I'm super curious to see what's going on here, but might have to wait until tomorrow morning.

sssoleileraaa · 2019-11-06T01:38:14Z

but instead of Reply B showing up as successful it now shows up as failed

Oh wait, actually, this is what we want! We don't want to automatically send pending replies between different client sessions.

Okay, nvm, ignore latest comment.

sssoleileraaa

So the only thing I haven't reviewed yet are tests. I have some comments, a few questions, nothing really blocking the PR, but it would be nice to have another day to let all these changes sink in and chat with you more about design etc.

sssoleileraaa · 2019-11-06T01:42:39Z

securedrop_client/api_jobs/uploads.py

@@ -6,7 +6,8 @@

 from securedrop_client.api_jobs.base import ApiJob
 from securedrop_client.crypto import GpgHelper
-from securedrop_client.db import Reply, Source
+from securedrop_client.db import DraftReply, Reply, ReplySendStatus, ReplySendStatusCodes, Source


so organized!

sssoleileraaa · 2019-11-06T02:00:42Z

securedrop_client/api_jobs/uploads.py

            session.commit()
+
            return reply_db_object.uuid
        except RequestTimeoutError as e:


This is out of the scope of this PR but shouldn't we also be catching AuthError and ApiInaccessibleError and raising a custom exception to include reply_uuid and message like we do SendReplyJobTimeoutError?

:o yes... yes we should
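
One way the suggestion could look, as a sketch only. The three SDK exception classes are re-declared here as stand-ins so the snippet runs on its own, `SendReplyJobError` is a hypothetical name for the new custom exception, and `send_reply` stands in for the body of the job rather than reproducing it.

```python
class AuthError(Exception):
    """Stand-in for the SDK exception of the same name."""


class ApiInaccessibleError(Exception):
    """Stand-in for the SDK exception of the same name."""


class RequestTimeoutError(Exception):
    """Stand-in for the SDK exception of the same name."""


class SendReplyJobError(Exception):
    """Hypothetical base error that carries the UUID of the affected reply."""

    def __init__(self, message: str, reply_uuid: str) -> None:
        super().__init__(message)
        self.reply_uuid = reply_uuid


class SendReplyJobTimeoutError(SendReplyJobError):
    """Raised when the request timed out; the reply may still have been saved."""


def send_reply(make_api_call, reply_uuid: str) -> str:
    """Run the API call and attach reply_uuid to any failure."""
    try:
        make_api_call()
        return reply_uuid
    except RequestTimeoutError as e:
        raise SendReplyJobTimeoutError("Timed out sending reply", reply_uuid) from e
    except (AuthError, ApiInaccessibleError) as e:
        raise SendReplyJobError("Failed to send reply: {}".format(e), reply_uuid) from e
```

With this shape, whatever handles job failures can look up the draft by `reply_uuid` for every failure type, not only timeouts.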

sssoleileraaa · 2019-11-06T02:17:07Z

securedrop_client/db.py

@@ -54,7 +56,11 @@ def collection(self) -> List:
        collection.extend(self.messages)
        collection.extend(self.files)
        collection.extend(self.replies)
-        collection.sort(key=lambda x: x.file_counter)
+        collection.extend(self.draftreplies)
+        # Sort first by the file_counter, then by timestamp (used only for draft replies).


I wonder if there's a more high-level visible place we should mention how we use timestamps for saved drafts. I can't think of one other than in a docstring for source or in our client architecture doc. I was just hoping to find more information about why we use the timestamp somewhere. I did find your PR comment:

this is the local timestamp that the reply was sent and this will be used for ordering local replies when there are multiple attempted replies in between conversation items from the server.

So we could use a local_file_counter instead of timestamp right? I don't feel strongly about this but maybe it makes it clearer that we don't actually care about time, we just care about order in which a reply was drafted so that we can display the drafts in the correct order in the client?

Or perhaps we'll want to show the timestamp next to the draft to help the journalist remember when they drafted it?

hmm yeah good point - how about I add a description of the ordering situation here to the wiki architecture page?

i.e. saying something like

"draft replies store:

a file_counter which points to the file_counter of the previously sent item. this enables us to interleave the drafts with the items from the source conversation fetched from the server, which do not have timestamps associated with them.

a timestamp which contains the timestamp the draft reply was saved locally: this is used to order drafts in the case where there are multiple drafts sent after a given reply (i.e. when file_counter is the same for multiple drafts)"

with an example

I actually did call the DraftReply.file_counter field local_file_counter (😇) but then renamed it back to file_counter to simplify the source.collection.sort key. You're right that we could ditch timestamp and have two fields file_counter and local_file_counter. imho it is slightly more useful to have the actual timestamp locally in case we ever do want to expose the draft timestamp to users (I could imagine that being useful).

sssoleileraaa · 2019-11-06T02:20:27Z

securedrop_client/db.py

+
+
+class ReplySendStatusCodes(Enum):
+    """In progress (sending) replies can currently have the following statuses"""


I see a SAVED status in the future 🔮

haha yeah so at some point in this PR before I created the DraftReply object this status object was stored on the Reply and had three codes: FAILED, SUCCESSFUL, PENDING. but I removed SUCCESSFUL when I created the DraftReply as only drafts can be FAILED or PENDING, all replies are by construction SUCCESSFUL

I could imagine this is a place to store more detailed status codes about the failures in the future like ENCRYPTION_FAILED_NO_SOURCE_KEY, SEND_FAILED_SOURCE_DELETED. We could also imagine some more granular pending statuses like PENDING_ENCRYPTION_IN_PROGRESS, PENDING_AWAITING_SERVER_RESPONSE.
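
For reference, a sketch of the two-status enum as described in this thread. The string values and the `needs_user_attention` helper are assumptions for illustration; only the class name and the PENDING/FAILED codes come from the discussion above.

```python
from enum import Enum


class ReplySendStatusCodes(Enum):
    """In progress (sending) replies can currently have the following statuses"""

    PENDING = "PENDING"
    FAILED = "FAILED"


def needs_user_attention(status: ReplySendStatusCodes) -> bool:
    """Hypothetical helper: only a failed draft should get the urgent color bar."""
    return status is ReplySendStatusCodes.FAILED
```

More granular codes such as the ones floated above could be added as extra members later without changing how existing values are looked up.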

sssoleileraaa · 2019-11-06T02:23:08Z

securedrop_client/gui/widgets.py

-        Send reply and emit a signal so that the gui can be updated immediately, even before the
-        the reply is saved locally.
+        Send reply and emit a signal so that the gui can be updated immediately indicating
+        that it is a pending reply.


sssoleileraaa · 2019-11-06T02:24:48Z

securedrop_client/storage.py

@@ -241,7 +244,7 @@ def update_replies(remote_replies: List[SDKReply], local_replies: List[Reply],
    * Existing replies are updated in the local database.
    * New replies have an entry created in the local database.
    * Local replies not returned in the remote replies are deleted from the
-      local database.
+      local database unless they are pending or failed.


sssoleileraaa · 2019-11-06T02:40:02Z

securedrop_client/storage.py

+    When we confirm a sent reply, if there are drafts that were sent after it,
+    we need to reposition them to ensure that they appear _after_ the confirmed
+    replies.
+    """


I would find it super helpful for you to add a quick description for each parameter. For instance I was confused how the new_file_counter was found until I looked at where this function is used. It looks like new_file_counter is from the reply (that is causing the draft replies to be reordered) that the server has recorded? And old_file_counter is from the reply (that is causing the draft replies to be reordered) that the client has recorded?

Why do we need to set all the draft replies to have the same file_counter as the new_file_counter?

It looks like this works but just not sure why you decided to approach it this way.

your understanding here is correct - I've added some more explanation and an example in 44c4394, let me know if that makes sense!

sssoleileraaa

The docstring change is really helpful. This change is significant and not only fixes a very important issue of a journalist potentially losing what they write in the ReplyBox, it also sets us up nicely for adding a "Save" feature and allowing users to selectively send a draft or failed reply.

This looks ready to be framed 🖼️

Let's shipit!

redshiftzero mentioned this pull request Oct 23, 2019

replies.file_counter unique constraint fails and crashes app #489

Closed

redshiftzero changed the title ~~[wip] app: add pending reply status, persist replies in the database~~ app: add pending reply status, persist replies in the database Oct 24, 2019

redshiftzero force-pushed the persist-replies-db branch from 31c6737 to 0a76bcf Compare October 24, 2019 21:40

redshiftzero marked this pull request as ready for review October 24, 2019 22:06

redshiftzero requested review from sssoleileraaa and kushaldas as code owners October 24, 2019 22:06

sssoleileraaa suggested changes Oct 25, 2019

redshiftzero mentioned this pull request Oct 29, 2019

replies failing to send on master in qubes #598

Closed

sssoleileraaa self-requested a review October 29, 2019 22:14

sssoleileraaa suggested changes Oct 29, 2019

redshiftzero added 3 commits October 31, 2019 19:42

app: persisting replies in the database if pending, failed

196c969

app: add pending reply before adding to send queue

c281524

app: failed/pending replies don't get deleted by the sync action

9958697

redshiftzero force-pushed the persist-replies-db branch from 09482af to 1cb1a2d Compare October 31, 2019 23:43

redshiftzero added 3 commits November 4, 2019 11:18

test: update tests for pending/failed/draft replies

935de8a

test: ensure sync does not delete (non-existent) DraftReply files

73e13b7

redshiftzero added 3 commits November 4, 2019 11:18

app: make pending color bar blue (addresses UX feedback)

dc878cd

test: add coverage for draft cleanup during sync

c3fb666

test: additional coverage for expected ordering of drafts/replies

4880707

redshiftzero force-pushed the persist-replies-db branch from 1cb1a2d to 4880707 Compare November 4, 2019 16:19

redshiftzero mentioned this pull request Nov 4, 2019

[securedrop-proxy] gracefully handle connection failures freedomofpress/securedrop-proxy#128

Closed

login/out should fail any pending replies

03c116b

sssoleileraaa self-requested a review November 5, 2019 21:00

sssoleileraaa reviewed Nov 5, 2019

sssoleileraaa self-requested a review November 6, 2019 01:10

sssoleileraaa reviewed Nov 6, 2019

app: explain the client-side re-ordering in a docstring

44c4394

sssoleileraaa self-requested a review November 7, 2019 21:47

sssoleileraaa approved these changes Nov 7, 2019

sssoleileraaa merged commit 2acfaea into master Nov 7, 2019

sssoleileraaa deleted the persist-replies-db branch November 7, 2019 21:54

redshiftzero mentioned this pull request Nov 14, 2019

Crash when sending reply [Nov 14 nightly / SF instance] #622

Closed
