This repository has been archived by the owner on Nov 30, 2022. It is now read-only.
[Spike] Improve reprocessing a request with new collections #1036
Labels
enhancement
New feature or request
Is your feature request related to a specific problem?
Investigate how to better reprocess a privacy request so that as much of the relevant data as possible is retrieved and/or masked when the graph has changed.
The original version of reprocessing assumed the graph didn't change between retries, and aimed to mask as much of the originally requested data as possible. If we mask some of the collections and then hit a failure, the current logic lets us mask the remaining original collections using the data we saved from the original retrieval, instead of re-querying the collections to figure out which data to mask. Once data is masked in one collection, it can prevent us from reaching data in downstream collections, so we opt to use our temporarily saved data.
The side effect is that data related to newly added collections can be missed.
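To make the gap concrete, here is a minimal sketch of the behavior described above. It is not the project's actual implementation: the function names (`plan_retry_erasure`, `missed_collections`), the flat list used as the graph, and the dict of cached rows are all hypothetical simplifications. It only assumes that a retry skips collections that were already masked and reuses rows saved from the first access run.

```python
from typing import Dict, List, Set

Row = Dict[str, object]


def plan_retry_erasure(
    graph: List[str],
    cached_access: Dict[str, List[Row]],
    already_masked: Set[str],
) -> Dict[str, List[Row]]:
    """Hypothetical retry planner: build the erasure input for each
    remaining collection from rows cached during the original access run,
    without re-querying anything."""
    plan: Dict[str, List[Row]] = {}
    for name in graph:
        if name in already_masked:
            # Already erased on a previous attempt; do not touch it again.
            continue
        # A collection added after the first run has no cached rows,
        # so it ends up with nothing to mask.
        plan[name] = cached_access.get(name, [])
    return plan


def missed_collections(
    graph: List[str], cached_access: Dict[str, List[Row]]
) -> List[str]:
    """Collections present in the current graph but absent from the cache."""
    return [name for name in graph if name not in cached_access]


# Rows saved during the first (failed) run, when the graph had two collections.
cached = {
    "users": [{"id": 1}],
    "orders": [{"id": 10, "user_id": 1}],
}
# The graph at retry time: "payments" was added in between.
graph_now = ["users", "orders", "payments"]

plan = plan_retry_erasure(graph_now, cached, already_masked={"users"})
missed = missed_collections(graph_now, cached)
```

In this sketch `orders` is still erased from the cached rows, but `payments` receives an empty input, and re-querying it through `users` may no longer work because the identifying data in `users` has already been masked.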
Describe the solution you'd like
Investigate how to better retrieve and mask newly added data and its downstream collections when reprocessing, while still being able to execute the erasure step in full, even when some collections have already had their data destroyed.
Describe alternatives you've considered, if any
Changing run order
Merging multiple access results to use for erasures
Current reprocessing logic