This repository has been archived by the owner on Nov 30, 2022. It is now read-only.
Pause Erasure Execution for Manual Confirmation [#522] #571
Merged
…nfirming the number of rows erased.
…ses of cache_manual_input and cache_manual_erasure_count.
…asure # Conflicts: # CHANGELOG.md
… the privacy request from the erasure step. Assert both access and erasure are called when we submit the privacy request from the access step.
@ethyca/docs-authors minor docs change added here to explain how to resume a privacy request that is paused in the erasure step.
sanders41 reviewed on May 31, 2022
I think I've responded to all your comments Paul, ready for another pass!
sanders41 approved these changes on Jun 1, 2022
lgtm
I see you have a note to the docs team so I didn't merge in case you still have something that needs looking at there.
Thank you @sanders41, I think the docs team is out this week, so I'd go ahead and merge and I can make a note to him about the followup.
sanders41 pushed a commit that referenced this pull request on Sep 22, 2022
* First draft - add ability to resume an erasure request by manually confirming the number of rows erased.
* Have the access and erasure endpoints share code.
* Update privacy request pause failure test.
* Add tests for erasure caching.
* Update both get_manual_count and get_manual_erasure_count to be inverses of cache_manual_input and cache_manual_erasure_count.
* Fix some formatting.
* Update changelog and add a docs draft.
* Assert that the access portion of execution isn't called if we submit the privacy request from the erasure step. Assert both access and erasure are called when we submit the privacy request from the access step.
* Fix submit mock.
* Respond to CR.
➡️ Follow-up to #554. Follows same pattern, but for an erasure.
Purpose
Some data cannot be automatically masked or destroyed. During execution of a privacy request with configured erasures, pause on collections marked as manual and wait for the user to confirm that data has been removed or manually masked before proceeding.
Resume a paused erasure request by passing in the masked count for the paused collection:
POST {{host}}/privacy-request/{{privacy_request_id}}/erasure_confirm
This mirrors what we return in an automated way when masking collections.
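As a rough illustration of calling this endpoint, the sketch below builds the POST request with Python's standard library. The PR description does not show the request body or the auth scheme, so the `row_count` field name and the bearer-token header are assumptions, and `build_erasure_confirm_request` is a hypothetical helper, not part of the codebase.

```python
import json
import urllib.request


def build_erasure_confirm_request(host, privacy_request_id, row_count, token):
    """Build (but do not send) the POST that resumes a paused erasure."""
    url = f"{host}/privacy-request/{privacy_request_id}/erasure_confirm"
    # Assumption: the masked count is sent as JSON under "row_count".
    body = json.dumps({"row_count": row_count}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the API expects a bearer token.
            "Authorization": f"Bearer {token}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; check the field name and authentication against the deployed API before relying on this.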
Changes
Big picture: as erasures run, we store which nodes have already been erased. Separately, we store which node we're currently paused on, along with any manually confirmed erasure counts. Finally, an endpoint is added to resume the privacy request from the erasure stage. Details of each change follow.
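The bookkeeping can be pictured with a minimal sketch. This is not the PR's implementation: it uses an in-memory dict in place of the real cache and a plain ordered list in place of the Dask-executed graph, and every name in it (`PrivacyRequestPaused`, `run_erasure`, `confirm_erasure`) is hypothetical.

```python
class PrivacyRequestPaused(Exception):
    """Raised to stop execution at a collection awaiting manual confirmation."""

    def __init__(self, collection):
        super().__init__(f"paused on {collection}")
        self.collection = collection


def run_erasure(collections, manual, cache, automated_erase):
    """Erase collections in order, skipping any already recorded in the cache.

    cache is a dict with three keys: "erased" (collection -> row count),
    "manual_counts" (collection -> user-confirmed count), and "paused_on".
    """
    for name in collections:
        if name in cache["erased"]:
            continue  # erased on an earlier run, so do not visit it again
        if name in manual:
            if name not in cache["manual_counts"]:
                # No confirmation yet: record where we stopped and abort.
                cache["paused_on"] = name
                raise PrivacyRequestPaused(name)
            count = cache["manual_counts"][name]
        else:
            count = automated_erase(name)
        cache["erased"][name] = count
    cache["paused_on"] = None
    return cache["erased"]


def confirm_erasure(cache, row_count):
    """Store the manually confirmed count for the paused collection."""
    if cache["paused_on"] is None:
        raise ValueError("request is not paused")
    cache["manual_counts"][cache["paused_on"]] = row_count
```

Calling `run_erasure` a second time after `confirm_erasure` skips the collections already in `cache["erased"]`, which stands in for rebuilding the graph with only the remaining nodes.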
* graph_task.erasure_request now caches the row count erased for each collection (similar to how we currently cache data retrieved for access requests). The primary purpose is to track which collections we've already erased so the new graph doesn't visit them when we resume.
* update_erasure_mapping_from_cache modifies the graph if we need to restart, to avoid running erasures on the collections we've already visited. This is how "pausing" is effectively handled: we don't actually pause, but we force execution to stop when we hit a paused node, and then rebuild a different graph that runs only the remaining nodes on resume.
* Dask Delayed is used to execute the graph; a side effect is that an exception cancels all other tasks in the graph.
* ManualConnector.mask_data checks whether a manual erasure confirmation has been added to the cache. If not, it pauses the graph. When we rerun after the user adds the manual confirmation via the API, it finds the confirmation and proceeds with execution.
Checklist
* CHANGELOG.md file
* CHANGELOG.md file is being appended to
* Unreleased section in an appropriate category. Add a new category from the list at the top of the file if the needed one isn't already there.
* Run Unsafe PR Checks label has been applied, and checks have passed, if this PR touches any external services
Worth noting
Ticket
Fixes #522