-
Notifications
You must be signed in to change notification settings - Fork 16
Run Policy Webhooks as part of Privacy Request Execution [#101] #124
Conversation
…ivacy requests to be resumed from a specific pre-execution policy. Don't instantiate PrivacyRequestRunner with a session.
… connect to a webhook - otherwise this will just hang for a very long time.
…heduler to add a job that runs when that privacy request's identity data expires to mark the privacy request as "errored". If we don't have identity data, we can't run the privacy request anyway. Settings need to be configured so webhooks can complete processing before identity data expires. Moved get_scheduler to request_runner_service for import reasons - one trust scheduler imports privacy request runner.
…o the diff is more apparent.
data={ | ||
"status": PrivacyRequestStatus.error, | ||
"finished_processing_at": datetime.utcnow(), | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we're doing this a couple of different ways in the codebase; we're updating the status, but also the timestamp, and writing to the db. For example, https://github.com/ethyca/fidesops/blob/6715756452cf076c35c8d64c45abbce0da590658/src/fidesops/service/privacy_request/request_runner_service.py#L73 we're doing this step manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well hopefully this will help! this change specifically was trying to consolidate the logic that you linked to. It seems like not every status should update the timestamp, we only have timestamps on started and finished processing, and sometimes the "saving" is postponed until some other attributes are set.
def run_webhooks( | ||
self, webhook_cls: WebhookTypes, after_webhook: Optional[WebhookTypes] = None | ||
@staticmethod | ||
def run_webhooks_and_report_status( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general comment, would it make more sense to run the webhooks as a step in the task runner as some kind of pre-traversal task? That way the BackgroundScheduler would not be necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand how this alleviates the needs for the BackgroundScheduler. The webhooks are currently just being run as part of PrivacyRequestRunner.submit(). The BackgroundScheduler doesn't run the wehbooks, its purpose is to update the status of paused PrivacyRequests that never completes.
Webhooks are capable of pausing PrivacyRequest processing while they handle processing on the user's end, but they need to resume that privacy request before identity data expires. All the BackgroundScheduler does is mark expired privacy requests are "error" instead of paused, to let the user know it needs to be resubmitted.
Definitely welcome suggestions on how to do this better, just wanted to clarify that the BackgroundScheduler is not actually running the webhooks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks. I do think that if we're going to have a background scheduler it would be a nice refactor to define it in its own file. Right now we're also using this in tasks.initiate_scheduled_request_intake
which is being called by main
, so it's not limited to the privacy request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was in its own file originally! I moved here for circular import reasons, it's tricky because initiate_scheduled_request_intake
calls the privacy request runner, i'll see if i can figure out something better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok I've moved into its own file, plus some adjustments to not instantiate on demand!
if _scheduler is None: | ||
_scheduler = BackgroundScheduler() | ||
_scheduler.start() | ||
return _scheduler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is instantiating in this way vulnerable to a race condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, can you elaborate more on your concerns here @stevenbenjamin ? This is the same implementation we're using for the onetrust scheduler, I just relocated for circular import reasons.
Here's the PR that implemented this originally: https://github.com/ethyca/solon/pull/126
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a broken singleton pattern, e.g. https://stackoverflow.com/a/33000332/19479 .We're instantiating on demand here. Granted the worst that could happen is an additional scheduler can be created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah thank you for clarifying! this link is super helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stevenbenjamin when you get a chance, i've tried to fix this in the latest commit -
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for your comments @stevenbenjamin - I have some followup questions:
data={ | ||
"status": PrivacyRequestStatus.error, | ||
"finished_processing_at": datetime.utcnow(), | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well hopefully this will help! this change specifically was trying to consolidate the logic that you linked to. It seems like not every status should update the timestamp, we only have timestamps on started and finished processing, and sometimes the "saving" is postponed until some other attributes are set.
def run_webhooks( | ||
self, webhook_cls: WebhookTypes, after_webhook: Optional[WebhookTypes] = None | ||
@staticmethod | ||
def run_webhooks_and_report_status( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand how this alleviates the needs for the BackgroundScheduler. The webhooks are currently just being run as part of PrivacyRequestRunner.submit(). The BackgroundScheduler doesn't run the wehbooks, its purpose is to update the status of paused PrivacyRequests that never completes.
Webhooks are capable of pausing PrivacyRequest processing while they handle processing on the user's end, but they need to resume that privacy request before identity data expires. All the BackgroundScheduler does is mark expired privacy requests are "error" instead of paused, to let the user know it needs to be resubmitted.
Definitely welcome suggestions on how to do this better, just wanted to clarify that the BackgroundScheduler is not actually running the webhooks.
if _scheduler is None: | ||
_scheduler = BackgroundScheduler() | ||
_scheduler.start() | ||
return _scheduler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, can you elaborate more on your concerns here @stevenbenjamin ? This is the same implementation we're using for the onetrust scheduler, I just relocated for circular import reasons.
Here's the PR that implemented this originally: https://github.com/ethyca/solon/pull/126
…y_webhooks # Conflicts: # docs/fidesops/docs/guides/policy_webhooks.md # tests/models/test_privacy_request.py
* Run policy webhooks as part of privacy request execution and allow privacy requests to be resumed from a specific pre-execution policy. Don't instantiate PrivacyRequestRunner with a session. * Test pre- and post-execution webhooks are triggered. * Exit privacy request execution if we get a connection error trying to connect to a webhook - otherwise this will just hang for a very long time. * If a webhook instructs privacy request execution to pause, use app scheduler to add a job that runs when that privacy request's identity data expires to mark the privacy request as "errored". If we don't have identity data, we can't run the privacy request anyway. Settings need to be configured so webhooks can complete processing before identity data expires. Moved get_scheduler to request_runner_service for import reasons - one trust scheduler imports privacy request runner. * Update docs. * Relocate run_webhooks_and_report_status method to original location so the diff is more apparent. * Move the scheduler to a new file. * Move scheduler to scheduler file. * Try starting the scheduler in start_webserver,
Purpose
Finally connect running Pre-Execution and Post-Execution Policy Webhooks as part of a Privacy Request. Any configured PolicyPreWebhooks will run before the privacy request is executed, which can potentially halt execution or add to the identity graph. After the query traversal has completed, run configured
PolicyPostWebhooks
. PolicyPostWebhooks can'thalt
execution, but an error will put the PrivacyRequest in an errored state.Also connect resuming privacy request execution from a given webhook. For example, if you have three webhooks, and the first webhook needs more processing time (and replied with
halt: True
), your app should send a request to/privacy_request/<privacy_request_id>/resume
with thereply-to-token
the webhook was originally given. Enable the privacy request to continue executing from the second webhook.Changes
paused
requests to be resumedPrivacyRequestRunner.submit
flowApScheduler
to schedule a job when that redis cache expires to clean up the privacy request's status toerror
since they will no longer be able to resume that privacy request.Checklist
Ticket
Fixes #101