Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for removal of "pending" source accounts #6488

Closed
1 of 3 tasks
zenmonkeykstop opened this issue Jul 18, 2022 · 6 comments · Fixed by #6785
Closed
1 of 3 tasks

Allow for removal of "pending" source accounts #6488

zenmonkeykstop opened this issue Jul 18, 2022 · 6 comments · Fixed by #6785
Assignees

Comments

@zenmonkeykstop
Copy link
Contributor

zenmonkeykstop commented Jul 18, 2022

Description

When a source account is created but before a first submission, by default a "pending" flag is set. Pending accounts are not displayed in the journalist interface.

If a user then exits their session and never returns to provide a submission, the account will remain in a pending state indefinitely, and cannot be removed via the Journalist Interface. On long-running high-volume instances this increases the source user count significantly, requiring admin intervention to fix.

One approach to allow for the removal of unused source accounts would be to change the pending boolean flag to a datetime value, and purge unused accounts via a cronjob or similar after a set time (say, a month). This has the disadvantage of increasing metadata (source account creation time) about prospective sources, though the field could be set to null on submission, meaning that said metadata would not be stored for sources that were actually ever active. This could be mitigated somewhat by giving the datetime a resolution on the order of days or weeks.

Alternatively, the fact that active source sessions without submissions would be nuked when the purge ran could just be accepted.

Note that this would be moot if/when inverted flow changes land, as accounts would then only be created on first submission. So it might not be worth the effort if it was just a temporary measure.

To do

  1. Add manage.py command
  2. Add tests, especially for --keep-most-recent's offset logic
  3. Extract standalone script for testing purposes

User Research Evidence

long-term observations of instance behavior...

User Stories

@cfm
Copy link
Member

cfm commented Apr 25, 2023

Today @nathandyer and I mapped out the following approach for implementing this lever:

  1. Add a securedrop.management.sources command that:

    1. Retrieves Sources with pending=True, except for the (parameterized) n most recent—that is, except those with the n highest ids.
      • This constraint adds a margin for recent pending sources without introducing the timestamp suggested above.
    2. For each Source in (1), calls EncryptionManager.delete_source_key_pair().
    3. Deletes all Sources in (1).
      • To my eye, Source should have a deletion method or hook that automatically calls EncryptionManager.delete_source_key_pair() to avoid orphaned keys. But that's out of scope of this ticket, and a bulk operation will be more efficient anyway.
  2. Once we're happy with the implementation of this command via manage.py, we can easily derive a standalone version for testing purposes (even just in a gist, to avoid cluttering up the branch). @nathandyer and I agreed that we'd much rather work backwards from the intended manage.py structure than start with a standalone script and then formalize it in manage.py.

@nathandyer
Copy link
Contributor

@cfm I just pushed the progress I've made here, if you have a moment would you mind reviewing it and letting me know if I'm on the right track? Thanks!

@cfm
Copy link
Member

cfm commented Apr 25, 2023

Reviewed directly on 07a5892, @nathandyer. :-)

@nathandyer
Copy link
Contributor

Perfect, thank you so much for that feedback @cfm! Glad to know I'm on the right track. Will take another swing at this in the morning.

@nathandyer
Copy link
Contributor

nathandyer commented Apr 27, 2023

@cfm I've added some new changes which should address the issues you found in the first commit, please feel free to take a look when you have a moment. I will note that I still haven't tested these changes first-hand, I'm getting an error on two different machines when I try to directly call the script (I suspect I have newer packages than I'm supposed to and something got deprecated - will continue to troubleshoot).

@cfm
Copy link
Member

cfm commented Apr 27, 2023

@nathandyer and I paired on this again today, through which we have a working manage.py remove-pending-sources --keep-most-recent N as of eb31592 and a testable version as of e04db45. On Monday we'll pair again on the remaining tasks in the checklist I've added above:

  1. Add tests, especially for --keep-most-recent's offset logic
  2. Extract standalone script for testing purposes

cfm added a commit that referenced this issue May 1, 2023
Per securedrop.models.Source, "sources are 'pending' and don't get
displayed to journalists until they submit something".  Preserve that
behavior by deferring pending=True to record_source_interaction().

In the context of #6488, the test plan should check something like:

	$ ./loaddata.py
	[...]
	$ ./manage.py remove-pending-sources --keep-most-recent 0
	Found 0 pending sources
	Deleted 0 pending sources
	$ ./loaddata.py --messages-per-source 0 --files-per-source 0 --replies-per-source 0
	[...]
	$ ./manage.py remove-pending-sources --keep-most-recent 0
	Found 3 pending sources
	Deleted 3 pending sources
cfm added a commit that referenced this issue May 2, 2023
Per securedrop.models.Source, "sources are 'pending' and don't get
displayed to journalists until they submit something".  Preserve that
behavior by deferring pending=True to record_source_interaction().

In the context of #6488, the test plan should check something like:

	$ ./loaddata.py
	[...]
	$ ./manage.py remove-pending-sources --keep-most-recent 0
	Found 0 pending sources
	Deleted 0 pending sources
	$ ./loaddata.py --messages-per-source 0 --files-per-source 0 --replies-per-source 0
	[...]
	$ ./manage.py remove-pending-sources --keep-most-recent 0
	Found 3 pending sources
	Deleted 3 pending sources
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants