Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make deletion of multiple sources asynchronous #5257

Merged
merged 1 commit into from
May 26, 2020

Conversation

rmol
Copy link
Contributor

@rmol rmol commented May 13, 2020

Status

Ready for review

Description of Changes

Add Source.deleted_at timestamp. Start generally filtering sources that have it set. When deleting multiple sources with journalist_app.utils.col_delete, just mark them deleted with this and defer the removal of their submissions, keypair and database record to asynchronous processing. Add a new systemd service to periodically purge sources marked for deletion.

Assign explicit names to source GPG keys. Instead of using the python-gnupg default of "Autogenerated Key", label source keys with "Source Key". When asked to delete a reply keypair, verify that its UID contains one of those labels.

Fixes #5233

Testing

Development configuration

  • Check out this branch (5233-spring-cleaning-support).
  • Run the dev server with NUM_SOURCES=100 make dev .
  • In the journalist interface, select multiple sources for deletion.
  • Delete the selected sources. The operation should finish quickly, probably in a second or two.
  • In the terminal where you ran make dev, you should see the source_deleter script deleting the individual sources and their data.

Production configuration

  • Check out this branch (5233-spring-cleaning-support)
  • Run make build-debs && make staging to provision a staging environment with these changes.
  • When it completes, use qa_loader.py or another means of populating the staging database with a large number of sources (100 should be sufficient to have triggered the timeout before this change, but 500 or 1000 would be better).
  • While that runs, check that the source_deleter service is running on the app server with systemctl status securedrop_source_deleter.
  • Once the database is populated, visit the journalist interface and select all but a few of the sources for deletion.
  • Delete the sources. The operation should finish quickly.
  • Watch the source_deleter logging on the app server with journalctl -fu securedrop_source_deleter. Deleting 1000 sources took almost half an hour in my testing.
  • Verify that the sources you did not select were not deleted.

Deployment

Making the deletion of sources asynchronous makes it possible to delete many at once without having the operation time out, but that sources are not being immediately removed is only indicated by the changed text of the flash message after the sources are marked for deletion: it now says "n collections scheduled for deletion".

As mentioned above, deleting large numbers of sources can take a while, so there's the possibility of system errors or a reboot interrupting them. The source_deleter process should resume removal of sources marked for deletion when the system comes back up.

Errors during deletion will only appear in the system logs, and then in OSSEC alerts. At this point problems are going to require support, but the failure modes should be few and unlikely. Deletion of submissions is handled by the existing (reasonably well-tested by now) shredder mechanism. Deletion of GPG keys could fail if permissions on /var/lib/securedrop/keys were altered, or there were a bug in GnuPG/python-gnupg, or there were some other ephemeral error on the server, but in those cases the deletion should abort before removing the source database record, so the operation would be retried.

Checklist

If you made changes to the server application code:

  • Linting (make lint) and tests (make test) pass in the development container

If you made changes to the system configuration:

If you made non-trivial code changes:

  • I have written a test plan and validated it for this PR

@lgtm-com
Copy link

lgtm-com bot commented May 13, 2020

This pull request introduces 2 alerts when merging 41e8297 into 696b47a - view on LGTM.com

new alerts:

  • 2 for Unreachable code

@rmol rmol force-pushed the 5233-spring-cleaning-support branch 3 times, most recently from 9846e2c to d813aa4 Compare May 19, 2020 20:30
redshiftzero
redshiftzero previously approved these changes May 20, 2020
Copy link
Contributor

@redshiftzero redshiftzero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks great! thanks @rmol. tested both in dev env and with ~150 sources in staging, confirmed keys and files both deleted async after deletion is scheduled.

at least before release, we should add a note to the docs about what scheduled for deletion means, since the user might not realize that this means that the deletions will start processing basically immediately but just async, and instead might wonder/worry "when is it scheduled for?" (cc @ninavizz and @eloquence on any thoughts there).

If another person (@emkll or @zenmonkeykstop) wants to take a look in the next couple days given the scope of the changes, please do, else this is good to merge when CI passes

@@ -38,16 +38,16 @@ def name_length_validation(form, field):


class NewUserForm(FlaskForm):
username = TextField('username', validators=[
username = StringField('username', validators=[
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(for other reviewers: TextField is deprecated)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was just trying to reduce the noise.

@eloquence
Copy link
Member

eloquence commented May 20, 2020

Thanks for flagging the UX implications @redshiftzero. I gave it a quick spin, I agree we may want to tweak the notification language a bit and/or address this in docs. Will discuss w/ Nina & John.

Should we add a "SecureDrop Client" section to the test plan since this does modify the API and has potential implications for sync behavior?

@eloquence
Copy link
Member

eloquence commented May 20, 2020

Question for @rmol: What motivates changing the user experience at all? If I'm understanding the scope of changes correctly, we were using a potentially long-running asynchronous process for the deletion of files on-disk before, and are using a potentially long-running asynchronous process for the deletion of files and database records now, to prevent timeouts at any stage of the process. Do I have that right?

If so, is there an argument against keeping the Journalist UI exactly as it was? I do think we should document deletion behavior a bit more so admins and curious journalists understand how it works under the hood, but I'm not sure we need to expose those mechanics to the user.

@rmol
Copy link
Contributor Author

rmol commented May 21, 2020

@redshiftzero That's a good point. I'd suggest that we consider changing the wording to "scheduled for immediate deletion", or "deletion of 100 collection(s) has begun".

@eloquence Yes, I'll add to the test plan a check of the implications for the client.

@eloquence My motivation for changing the user experience was that a journalist's perception of what is happening in the system should match reality. You have it right: before, when we said the keypairs and Source records had been deleted, they really had been. Now, that removal will only have been started, and instead of just disconnected files in the shredder, the filesystem will still contain those submissions and replies, plus the corresponding source records and their keys,

It's a better situation than being unable to delete large batches at all when you need to, but I think it's important to be clear about the true status of the source data. I believe that should absolutely be indicated to journalists, at the very least in the confirmation message. I'd like to go further and make it possible to monitor pending deletions somewhere in the journalist interface. If we don't change the UI at all and the only notification of this change is in the docs, few people are going to be aware of it and understand what's actually happening, and I think that's dangerous.

It occurs to me that the client already exposes the asynchronous nature of source deletion, with a visible pending state. It wouldn't be the worst thing for both experiences to be consistent.

@eloquence
Copy link
Member

You have it right: before, when we said the keypairs and Source records had been deleted, they really had been.

I'm not sure I follow. In the current production experience, when I delete a batch of sources, it immediately says "3 collections deleted", but the shredder may not have deleted the associated files yet, no? In that sense it seems we're merely expanding the asynchronicity of an operation that's already partially asynchronous?

@rmol
Copy link
Contributor Author

rmol commented May 21, 2020

In the current production experience, when I delete a batch of sources, it immediately says "3 collections deleted", but the shredder may not have deleted the associated files yet, no?

That is correct. But the keypairs and Source records are gone by the time the response is rendered.

we're merely expanding the asynchronicity of an operation that's already partially asynchronous?

Yes, but I think the fact that all the sources' data will now be available for a short while is significant and should be made clear.

@zenmonkeykstop
Copy link
Contributor

Given that this currently only schedules immediate deletions, it looks the same as current behaviour from the journalist perspective, so a UX change doesn't seem necessary at this point IMO. (If an option for disappearing messages/trash messages is added subsequently, it would probably be worth differentiating deletion types at that point.)

@ninavizz
Copy link
Member

I feel it's very confusing for a user to execute a "Delete" command, and to then be told their data is still available and will actually be deleted later. I appreciate that's how it works, and how transparency to users is critical in security. This degree of both, however, I feel cause confusion and too much friction with existing mental models and expectations of how "Delete" should work.

Also, "Schedule 'x' Task" is a command option in many apps (such as scheduling emails or Tweets to be sent), and when Erik first pinged me about this my first thought was "I didn't schedule a delete, I just wanted it to delete!"

If anything, offering an "Undo" feature while the data is still available, meets existing user mental models—but of course that would be either inaccurate, or just odd, in our context.

SD is a very complicated product, and it's a lot of balance between what of that complexity to expose to users, and what of it to not. If the user cannot meaningfully act upon something, making it clear to a user that the data remains available provides no value, and may even create confusion. My $0.02. :)

@rmol
Copy link
Contributor Author

rmol commented May 21, 2020

I've changed it to say "secure deletion of 1 collection has begun", so there's no confusion about whether the journalist "scheduled" the operation.

@rmol rmol force-pushed the 5233-spring-cleaning-support branch from 388f7d7 to f3c5dec Compare May 21, 2020 20:29
@ninavizz
Copy link
Member

Hey John! Thanks for these quick updates. Erik spun them up on his dev-env for me, and I had a chance to poke around a bit.

Primary takeaway for me, is that we're now introducing a partial-concept; vs more concretely informing the user's risk as introduced by a class of actions. The concept of secure deletion is presented against only 1 of 3 deletion actions the user can take. As the user, I'm left to ask: "Why can't I securely delete files or messages? There's probably a reason I can't securely delete a Source while (in the Source Page view)."

I've observed SecureDrop users to assume a lot of odd things about how things are working under the hood, when concepts aren't clear to them.

TL;DR, I think it would be better for now to keep the old deletion user experience, and make consistent changes across all deletion actions in a future UI iteration. Happy to do a call with you to discuss further. I appreciate all the thought for users, so much. :)

@rmol rmol force-pushed the 5233-spring-cleaning-support branch from f3c5dec to 4fb965f Compare May 22, 2020 19:51
@rmol
Copy link
Contributor Author

rmol commented May 22, 2020

The flash message text has been reverted.

Add Source.deleted_at timestamp. Start generally filtering sources
that have it set. When deleting multiple sources with
journalist_app.utils.col_delete, just mark them deleted with this and
defer the removal of their submissions, keypair and database record to
asynchronous processing. Add a new systemd service to periodically
purge sources marked for deletion. It's been decided that the
journalist experience will not be changed to make this new mechanism
apparent; the messaging remains the same and there is no facility for
monitoring the deletion process.

Assign explicit names to source GPG keys. Instead of using the
python-gnupg default of "Autogenerated Key", label source keys with
"Source Key". When asked to delete a reply keypair, verify that its
UID contains one of those labels.
@rmol rmol force-pushed the 5233-spring-cleaning-support branch from 4fb965f to ee06369 Compare May 26, 2020 13:44
Copy link
Contributor

@redshiftzero redshiftzero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approving, thanks @rmol!

@redshiftzero redshiftzero merged commit e0c5ff0 into develop May 26, 2020
@redshiftzero redshiftzero deleted the 5233-spring-cleaning-support branch May 26, 2020 17:10
@kushaldas kushaldas mentioned this pull request Jun 11, 2020
20 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deleting large numbers of sources can fail such that is it thereafter impossible to delete the sources
5 participants