Conversation
Force-pushed from 4ad69cd to 25404fc
- After retries have expired, throw an exception, cancelling remaining tasks, instead of continuing the graph execution.
- On failure, cache the failed step (access or erasure) and the failed collection.
- Add an API endpoint for resuming from failure.
- Refactor the methods used for caching the paused step and collection to share them with new methods that cache the failed step/collection.
- Fix retry tests: we now raise an exception after retries have been exceeded instead of continuing with execution.
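A minimal sketch of the retry behavior these commits describe: after the allotted attempts, record which collection failed (so a later request can resume there) and raise instead of continuing with empty data. `run_with_retries`, `failed_log`, and `CollectionExecutionError` are illustrative names, not the actual fidesops API.

```python
import time

class CollectionExecutionError(Exception):
    """Raised when a collection still fails after all retries."""

def run_with_retries(task, collection_name, failed_log, attempts=3, backoff=0.0):
    """Run `task`; after `attempts` failures, record the failure and re-raise
    so graph execution stops instead of passing empty values downstream."""
    last_exc = None
    for attempt in range(attempts):
        try:
            return task()
        except Exception as exc:  # real code would catch narrower error types
            last_exc = exc
            time.sleep(backoff * attempt)
    # Cache the failed collection so a retry request can resume from here
    failed_log["failed_collection"] = collection_name
    raise CollectionExecutionError(collection_name) from last_exc
```

The key difference from the old behavior is the final `raise`: previously the graph would continue after retries were exhausted, as if the collection had returned no data.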
Force-pushed from 25404fc to 1ffd401
@ethyca/docs-authors minor edit to guide added here
Thanks @pattisdr, I just have that one product-level question re: the webhooks
# Conflicts:
#   CHANGELOG.md
Remove check if status is error, because errored privacy requests will exit before we get to this point.
This reverts commit cfc2b79.
Fix an existing bigquery bug that was revealed after the new failure behavior was added. We should not build a bigquery update query if there is no data to update; this was incorrectly causing a query to be built that looks like: UPDATE `address` SET WHERE address_id = 4;

A failure at the collection level now causes the entire PrivacyRequest to fail, instead of ignoring the failed collection after "x" retries. The above bug was previously being ignored in the test because the collection error was being suppressed.
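The fix described above amounts to a guard before building the statement. A rough sketch (hypothetical helper, not the actual fidesops query builder):

```python
def build_update_stmt(table, update_values, pk_field, pk_value):
    """Build an UPDATE statement, or return None when there is nothing to set.

    Without the empty-value guard this would emit invalid SQL like:
        UPDATE `address` SET  WHERE address_id = 4
    """
    if not update_values:
        return None  # nothing to mask on this row; skip the query entirely
    assignments = ", ".join(f"{col} = %s" for col in update_values)
    return f"UPDATE `{table}` SET {assignments} WHERE {pk_field} = {pk_value}"
```

Callers would then skip execution entirely when the builder returns `None`, rather than sending malformed SQL to BigQuery.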
Update stripe erasure tests to only run with config.execution.MASKING_STRICT = False, so both update and delete actions can be performed. Stripe has some endpoints whose update action is a "delete". Saas configs will error if there is an attempt to mask but we haven't granted permission to use delete actions. This test shouldn't have been running with MASKING_STRICT=True, because this particular config requires False for an erasure to run successfully, as there are mixtures of updates/deletes defined. However, existing behavior that ignored a failed collection was still causing this privacy request to complete.
Remove the primary key off of hubspot's owners dataset, so we don't attempt a masking request on that collection. There's intentionally no update or delete configuration defined for owners right now. This prevents us from trying to run an erasure against that collection for the time being. (We were previously attempting to run an erasure and getting a failure that was ignored, but new execution behavior doesn't ignore failures.)
# run erasure with MASKING_STRICT to execute the update actions
config.execution.MASKING_STRICT = True
# Run erasure with masking_strict = False so both update and delete actions can be used
config.execution.MASKING_STRICT = False
|
Stripe tests were being run with config.execution.MASKING_STRICT = True, which is invalid for Stripe because there are both updates and deletes defined in the config. Nodes with delete-only configs were failing silently, and then Stripe tests were re-run with config.execution.MASKING_STRICT = False below to get delete-specific behavior.

Because a collection no longer fails silently, we were seeing failures here. To address this, we can just run a single erasure request with config.execution.MASKING_STRICT = False, so Stripe can use the update if defined and otherwise fall back to the delete. The counts below have been updated to reflect this.
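The action-selection logic being discussed can be sketched roughly like this; `choose_masking_request` and its shape are assumptions for illustration, not the actual SaaS connector code:

```python
def choose_masking_request(endpoints, masking_strict):
    """Pick which SaaS endpoint action to use for an erasure.

    With masking_strict=True only update (mask-in-place) actions are allowed;
    with masking_strict=False an update is preferred, falling back to delete.
    `endpoints` maps action name -> request definition (or None if undefined).
    """
    update, delete = endpoints.get("update"), endpoints.get("delete")
    if masking_strict:
        if update is None:
            # e.g. a Stripe endpoint whose only "update" is really a delete
            raise ValueError("masking_strict=True but no update action defined")
        return update
    return update if update is not None else delete
```

Under this sketch, running the Stripe suite once with `masking_strict=False` exercises both branches: collections with updates use them, delete-only collections use the delete.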
That's a really good catch @pattisdr — thanks
@@ -91,7 +91,6 @@ dataset:
      - name: id
        data_categories: [user.derived.identifiable.unique_id]
        fidesops_meta:
-         primary_key: True
We currently shouldn't attempt to run an erasure against hubspot's owners endpoint (#361). Our tests were attempting to run this and failing silently.
This PR no longer allows it to fail silently, so this adjustment prevents us from running an erasure against hubspot's owners endpoint until we can sort out how to connect to it.
config.execution.MASKING_STRICT = True
# Run erasure with masking_strict = False so both update and delete actions can be used
config.execution.MASKING_STRICT = False
Ideally we wouldn't set this in the test, since if execution halts mid-test the value won't be reset and could cause cascading failures in other tests. It looks like you're only updating the value here, so let's change these as part of a subsequent ticket.
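One way to avoid the leak described above is to wrap the override so it is always restored. In pytest this would naturally live in a yield fixture, but the core idea is a try/finally, sketched here with a stand-in config object (`ExecutionSettings` and `config_execution` are illustrative, not the fidesops config):

```python
from contextlib import contextmanager

class ExecutionSettings:
    """Stand-in for the application's execution config object."""
    MASKING_STRICT = True

config_execution = ExecutionSettings()

@contextmanager
def masking_strict(value):
    """Temporarily override MASKING_STRICT, restoring the original value even
    if the body raises, so a failing test can't poison later tests."""
    original = config_execution.MASKING_STRICT
    config_execution.MASKING_STRICT = value
    try:
        yield config_execution
    finally:
        config_execution.MASKING_STRICT = original
```

A test would then use `with masking_strict(False): ...` instead of assigning to the config directly.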
Thanks @pattisdr
* WIP Allow restart graph from failure.
  - After retries have expired, throw an exception, cancelling remaining tasks, instead of continuing the graph execution.
  - On failure, cache the failed step (access or erasure), and the failed collection.
  - Add an API endpoint for resuming from failure.
  - Refactor the methods used for caching the paused step and collection to share them with new methods to cache the failed step/collection.
* Add API endpoint tests for restarting from failed node. No request body is required.
* Add test that restarting from failure doesn't re-run already-executed nodes.
* Add tests for caching the failed step and collection.
* Fix imports.
* Add minor docs to guides.
* Fix retry tests. We now raise an exception after retries have been exceeded instead of continuing with execution.
* Fix items from rebase with erasure branch.
* Fix items from merge.
* Remove check if status is error because errored privacy requests will exit before we get to this point.
* Sqlalchemy bigquery upgrade experiment.
* Revert "Sqlalchemy bigquery upgrade experiment." This reverts commit cfc2b79.
* Fix an existing bigquery bug that was revealed after the new failure behavior was added. We should not build a bigquery update query if there is no data to update; this was incorrectly causing a query to be built that looks like: UPDATE `address` SET WHERE address_id = 4. A failure at the collection level now causes the entire PrivacyRequest to fail, instead of ignoring the failed collection after "x" retries. The above bug was previously being ignored in the test because the collection error was being suppressed.
* Update stripe erasure tests to only run with config.execution.MASKING_STRICT = False, so both update and delete actions can be performed. Stripe has some endpoints whose update action is a "delete". Saas configs will error if there is an attempt to mask but we haven't granted permission to use delete actions. This test shouldn't have been running with MASKING_STRICT=True, because this particular config requires False for an erasure to run successfully, as there are mixtures of updates/deletes defined. However, existing behavior that ignored a failed collection was still causing this privacy request to complete.
* Remove the primary key off of hubspot's owners dataset, so we don't attempt a masking request on that collection. There's intentionally no update or delete configuration defined for owners right now. This prevents us from trying to run an erasure against that collection for the time being. (We were previously attempting to run an erasure and getting a failure that was ignored, but new execution behavior doesn't ignore failures.)
Purpose
We currently can't rerun a graph from a failed node; our only option is to attempt to run a completely new privacy request.
If a node in the graph fails a certain number of times, we continue with graph execution, just passing empty values downstream to dependent nodes. This is problematic for two reasons. First, we may not have properly retrieved or masked data on the failed collection, or on downstream collections. Second, if we want to run another privacy request to rectify this, we may no longer be able to execute the graph because data has been destroyed.
Changes
A new endpoint, /privacy-request/{privacy_request_id}/retry, will restart the graph, running only the remaining graph tasks.

Note
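For illustration, a call to the new retry endpoint might look like the following. Only the path comes from this PR; the host, API prefix, auth header, and POST method are assumptions, and the request id is hypothetical:

```shell
# Resume a failed privacy request from the failed collection onward.
PRIVACY_REQUEST_ID="pri_123"   # hypothetical id of the failed request
RETRY_URL="http://localhost:8080/api/v1/privacy-request/${PRIVACY_REQUEST_ID}/retry"
# Per the PR, no request body is required:
# curl -X POST -H "Authorization: Bearer $FIDESOPS_TOKEN" "$RETRY_URL"
echo "$RETRY_URL"
```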
Checklist
- Update CHANGELOG.md file
- Merge in main so the most recent CHANGELOG.md file is being appended to
- Change is described within the Unreleased section in an appropriate category. Add a new category from the list at the top of the file if the needed one isn't already there.
- Run Unsafe PR Checks label has been applied, and checks have passed, if this PR touches any external services

Ticket
Fixes #574