Make deletion of multiple sources asynchronous #5257

rmol · 2020-05-13T20:49:31Z

Status

Ready for review

Description of Changes

Add Source.deleted_at timestamp. Start generally filtering sources that have it set. When deleting multiple sources with journalist_app.utils.col_delete, just mark them deleted with this and defer the removal of their submissions, keypair and database record to asynchronous processing. Add a new systemd service to periodically purge sources marked for deletion.

Assign explicit names to source GPG keys. Instead of using the python-gnupg default of "Autogenerated Key", label source keys with "Source Key". When asked to delete a reply keypair, verify that its UID contains one of those labels.

Fixes #5233

Testing

Development configuration

Check out this branch (5233-spring-cleaning-support).
Run the dev server with NUM_SOURCES=100 make dev .
In the journalist interface, select multiple sources for deletion.
Delete the selected sources. The operation should finish quickly, probably in a second or two.
In the terminal where you ran make dev, you should see the source_deleter script deleting the individual sources and their data.

Production configuration

Check out this branch (5233-spring-cleaning-support)
Run make build-debs && make staging to provision a staging environment with these changes.
When it completes, use qa_loader.py or another means of populating the staging database with a large number of sources (100 should be sufficient to have triggered the timeout before this change, but 500 or 1000 would be better).
While that runs, check that the source_deleter service is running on the app server with systemctl status securedrop_source_deleter.
Once the database is populated, visit the journalist interface and select all but a few of the sources for deletion.
Delete the sources. The operation should finish quickly.
Watch the source_deleter logging on the app server with journalctl -fu securedrop_source_deleter. Deleting 1000 sources took almost half an hour in my testing.
Verify that the sources you did not select were not deleted.

Deployment

Making the deletion of sources asynchronous makes it possible to delete many at once without having the operation time out, but that sources are not being immediately removed is only indicated by the changed text of the flash message after the sources are marked for deletion: it now says "n collections scheduled for deletion".

As mentioned above, deleting large numbers of sources can take a while, so there's the possibility of system errors or a reboot interrupting them. The source_deleter process should resume removal of sources marked for deletion when the system comes back up.

Errors during deletion will only appear in the system logs, and then in OSSEC alerts. At this point problems are going to require support, but the failure modes should be few and unlikely. Deletion of submissions is handled by the existing (reasonably well-tested by now) shredder mechanism. Deletion of GPG keys could fail if permissions on /var/lib/securedrop/keys were altered, or there were a bug in GnuPG/python-gnupg, or there were some other ephemeral error on the server, but in those cases the deletion should abort before removing the source database record, so the operation would be retried.

Checklist

If you made changes to the server application code:

Linting (make lint) and tests (make test) pass in the development container

If you made changes to the system configuration:

Configuration tests pass

If you made non-trivial code changes:

I have written a test plan and validated it for this PR

lgtm-com · 2020-05-13T20:56:41Z

This pull request introduces 2 alerts when merging 41e8297 into 696b47a - view on LGTM.com

new alerts:

2 for Unreachable code

redshiftzero

this looks great! thanks @rmol. tested both in dev env and with ~150 sources in staging, confirmed keys and files both deleted async after deletion is scheduled.

at least before release, we should add a note to the docs about what scheduled for deletion means, since the user might not realize that this means that the deletions will start processing basically immediately but just async, and instead might wonder/worry "when is it scheduled for?" (cc @ninavizz and @eloquence on any thoughts there).

If another person (@emkll or @zenmonkeykstop) wants to take a look in the next couple days given the scope of the changes, please do, else this is good to merge when CI passes

redshiftzero · 2020-05-20T21:37:20Z

securedrop/journalist_app/forms.py

@@ -38,16 +38,16 @@ def name_length_validation(form, field):


 class NewUserForm(FlaskForm):
-    username = TextField('username', validators=[
+    username = StringField('username', validators=[


(for other reviewers: TextField is deprecated)

Yeah, I was just trying to reduce the noise.

eloquence · 2020-05-20T22:24:10Z

Thanks for flagging the UX implications @redshiftzero. I gave it a quick spin, I agree we may want to tweak the notification language a bit and/or address this in docs. Will discuss w/ Nina & John.

Should we add a "SecureDrop Client" section to the test plan since this does modify the API and has potential implications for sync behavior?

eloquence · 2020-05-20T23:04:17Z

Question for @rmol: What motivates changing the user experience at all? If I'm understanding the scope of changes correctly, we were using a potentially long-running asynchronous process for the deletion of files on-disk before, and are using a potentially long-running asynchronous process for the deletion of files and database records now, to prevent timeouts at any stage of the process. Do I have that right?

If so, is there an argument against keeping the Journalist UI exactly as it was? I do think we should document deletion behavior a bit more so admins and curious journalists understand how it works under the hood, but I'm not sure we need to expose those mechanics to the user.

rmol · 2020-05-21T14:43:12Z

@redshiftzero That's a good point. I'd suggest that we consider changing the wording to "scheduled for immediate deletion", or "deletion of 100 collection(s) has begun".

@eloquence Yes, I'll add to the test plan a check of the implications for the client.

@eloquence My motivation for changing the user experience was that a journalist's perception of what is happening in the system should match reality. You have it right: before, when we said the keypairs and Source records had been deleted, they really had been. Now, that removal will only have been started, and instead of just disconnected files in the shredder, the filesystem will still contain those submissions and replies, plus the corresponding source records and their keys,

It's a better situation than being unable to delete large batches at all when you need to, but I think it's important to be clear about the true status of the source data. I believe that should absolutely be indicated to journalists, at the very least in the confirmation message. I'd like to go further and make it possible to monitor pending deletions somewhere in the journalist interface. If we don't change the UI at all and the only notification of this change is in the docs, few people are going to be aware of it and understand what's actually happening, and I think that's dangerous.

It occurs to me that the client already exposes the asynchronous nature of source deletion, with a visible pending state. It wouldn't be the worst thing for both experiences to be consistent.

eloquence · 2020-05-21T15:53:18Z

You have it right: before, when we said the keypairs and Source records had been deleted, they really had been.

I'm not sure I follow. In the current production experience, when I delete a batch of sources, it immediately says "3 collections deleted", but the shredder may not have deleted the associated files yet, no? In that sense it seems we're merely expanding the asynchronicity of an operation that's already partially asynchronous?

rmol · 2020-05-21T16:02:38Z

In the current production experience, when I delete a batch of sources, it immediately says "3 collections deleted", but the shredder may not have deleted the associated files yet, no?

That is correct. But the keypairs and Source records are gone by the time the response is rendered.

we're merely expanding the asynchronicity of an operation that's already partially asynchronous?

Yes, but I think the fact that all the sources' data will now be available for a short while is significant and should be made clear.

zenmonkeykstop · 2020-05-21T16:06:23Z

Given that this currently only schedules immediate deletions, it looks the same as current behaviour from the journalist perspective, so a UX change doesn't seem necessary at this point IMO. (If an option for disappearing messages/trash messages is added subsequently, it would probably be worth differentiating deletion types at that point.)

ninavizz · 2020-05-21T16:11:18Z

I feel it's very confusing for a user to execute a "Delete" command, and to then be told their data is still available and will actually be deleted later. I appreciate that's how it works, and how transparency to users is critical in security. This degree of both, however, I feel cause confusion and too much friction with existing mental models and expectations of how "Delete" should work.

Also, "Schedule 'x' Task" is a command option in many apps (such as scheduling emails or Tweets to be sent), and when Erik first pinged me about this my first thought was "I didn't schedule a delete, I just wanted it to delete!"

If anything, offering an "Undo" feature while the data is still available, meets existing user mental models—but of course that would be either inaccurate, or just odd, in our context.

SD is a very complicated product, and it's a lot of balance between what of that complexity to expose to users, and what of it to not. If the user cannot meaningfully act upon something, making it clear to a user that the data remains available provides no value, and may even create confusion. My $0.02. :)

rmol · 2020-05-21T18:18:38Z

I've changed it to say "secure deletion of 1 collection has begun", so there's no confusion about whether the journalist "scheduled" the operation.

ninavizz · 2020-05-21T20:29:56Z

Hey John! Thanks for these quick updates. Erik spun them up on his dev-env for me, and I had a chance to poke around a bit.

Primary takeaway for me, is that we're now introducing a partial-concept; vs more concretely informing the user's risk as introduced by a class of actions. The concept of secure deletion is presented against only 1 of 3 deletion actions the user can take. As the user, I'm left to ask: "Why can't I securely delete files or messages? There's probably a reason I can't securely delete a Source while (in the Source Page view)."

I've observed SecureDrop users to assume a lot of odd things about how things are working under the hood, when concepts aren't clear to them.

TL;DR, I think it would be better for now to keep the old deletion user experience, and make consistent changes across all deletion actions in a future UI iteration. Happy to do a call with you to discuss further. I appreciate all the thought for users, so much. :)

rmol · 2020-05-22T19:52:41Z

The flash message text has been reverted.

Add Source.deleted_at timestamp. Start generally filtering sources that have it set. When deleting multiple sources with journalist_app.utils.col_delete, just mark them deleted with this and defer the removal of their submissions, keypair and database record to asynchronous processing. Add a new systemd service to periodically purge sources marked for deletion. It's been decided that the journalist experience will not be changed to make this new mechanism apparent; the messaging remains the same and there is no facility for monitoring the deletion process. Assign explicit names to source GPG keys. Instead of using the python-gnupg default of "Autogenerated Key", label source keys with "Source Key". When asked to delete a reply keypair, verify that its UID contains one of those labels.

redshiftzero

approving, thanks @rmol!

rmol requested review from conorsch, emkll, kushaldas and redshiftzero as code owners May 13, 2020 20:49

rmol force-pushed the 5233-spring-cleaning-support branch 3 times, most recently from 9846e2c to d813aa4 Compare May 19, 2020 20:30

redshiftzero previously approved these changes May 20, 2020

View reviewed changes

rmol dismissed redshiftzero’s stale review via 388f7d7 May 21, 2020 18:17

rmol force-pushed the 5233-spring-cleaning-support branch from 388f7d7 to f3c5dec Compare May 21, 2020 20:29

rmol force-pushed the 5233-spring-cleaning-support branch from f3c5dec to 4fb965f Compare May 22, 2020 19:51

rmol force-pushed the 5233-spring-cleaning-support branch from 4fb965f to ee06369 Compare May 26, 2020 13:44

redshiftzero approved these changes May 26, 2020

View reviewed changes

redshiftzero merged commit e0c5ff0 into develop May 26, 2020

redshiftzero deleted the 5233-spring-cleaning-support branch May 26, 2020 17:10

kushaldas mentioned this pull request Jun 11, 2020

Release SecureDrop 1.4.0 #5289

Closed

20 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make deletion of multiple sources asynchronous #5257

Make deletion of multiple sources asynchronous #5257

rmol commented May 13, 2020

lgtm-com bot commented May 13, 2020

redshiftzero left a comment

redshiftzero May 20, 2020

rmol May 21, 2020

eloquence commented May 20, 2020 •

edited

Loading

eloquence commented May 20, 2020 •

edited

Loading

rmol commented May 21, 2020

eloquence commented May 21, 2020

rmol commented May 21, 2020

zenmonkeykstop commented May 21, 2020

ninavizz commented May 21, 2020

rmol commented May 21, 2020

ninavizz commented May 21, 2020

rmol commented May 22, 2020

redshiftzero left a comment

Make deletion of multiple sources asynchronous #5257

Make deletion of multiple sources asynchronous #5257

Conversation

rmol commented May 13, 2020

Status

Description of Changes

Testing

Development configuration

Production configuration

Deployment

Checklist

If you made changes to the server application code:

If you made changes to the system configuration:

If you made non-trivial code changes:

lgtm-com bot commented May 13, 2020

redshiftzero left a comment

Choose a reason for hiding this comment

redshiftzero May 20, 2020

Choose a reason for hiding this comment

rmol May 21, 2020

Choose a reason for hiding this comment

eloquence commented May 20, 2020 • edited Loading

eloquence commented May 20, 2020 • edited Loading

rmol commented May 21, 2020

eloquence commented May 21, 2020

rmol commented May 21, 2020

zenmonkeykstop commented May 21, 2020

ninavizz commented May 21, 2020

rmol commented May 21, 2020

ninavizz commented May 21, 2020

rmol commented May 22, 2020

redshiftzero left a comment

Choose a reason for hiding this comment

eloquence commented May 20, 2020 •

edited

Loading

eloquence commented May 20, 2020 •

edited

Loading