Run Policy Webhooks as part of Privacy Request Execution [#101] #124

pattisdr · 2021-12-10T22:09:18Z

Purpose

Connects Pre-Execution and Post-Execution Policy Webhooks to Privacy Request execution. Any configured PolicyPreWebhooks run before the privacy request is executed and can halt execution or add to the identity graph. After the query traversal completes, any configured PolicyPostWebhooks run. PolicyPostWebhooks can't halt execution, but an error in one puts the PrivacyRequest in an errored state.

Also connects resuming privacy request execution from a given webhook. For example, if you have three webhooks and the first needs more processing time (and replied with halt: True), your app should send a request to /privacy_request/<privacy_request_id>/resume with the reply-to-token the webhook was originally given. The privacy request then continues executing from the second webhook.
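To make the halt-and-resume ordering concrete, here is a minimal sketch of the three-webhook example above. It is not the code in this PR: `Webhook`, `run_pre_webhooks`, and the `make` helper are hypothetical names, and each webhook is modeled as a plain callable that returns True when it replies with halt: True.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Webhook:
    """Hypothetical stand-in for a configured PolicyPreWebhook."""

    key: str
    call: Callable[[], bool]  # returns True when the webhook replies halt: True


def run_pre_webhooks(
    webhooks: List[Webhook], after_webhook: Optional[str] = None
) -> Optional[str]:
    """Run webhooks in order, optionally resuming after a given webhook key.

    Returns the key of the webhook that asked to halt, or None if all ran.
    """
    remaining = webhooks
    if after_webhook is not None:
        keys = [w.key for w in webhooks]
        # Skip everything up to and including the webhook we paused on.
        remaining = webhooks[keys.index(after_webhook) + 1:]

    for webhook in remaining:
        if webhook.call():
            return webhook.key  # the caller would mark the request as paused
    return None


calls: List[str] = []


def make(key: str, halt: bool) -> Webhook:
    """Build a webhook that records its invocation and returns a fixed reply."""

    def call() -> bool:
        calls.append(key)
        return halt

    return Webhook(key, call)


hooks = [make("first", True), make("second", False), make("third", False)]
paused_on = run_pre_webhooks(hooks)  # the first webhook halts
resumed = run_pre_webhooks(hooks, after_webhook=paused_on)  # runs second, third
```

In this sketch the first webhook is never called a second time on resume, which matches the behavior described above: execution continues from the second webhook.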

Changes

Removes instantiating the PrivacyRequestRunner with a session, as one is generated internally.
- pass in the new session when running webhooks
Fix: Update the resume_privacy_request endpoint to only allow paused requests to be resumed
Fix: Exit privacy request execution if we are unable to make a connection to a webhook instead of letting it hang
Add running Pre- and Post-Execution webhooks as part of the PrivacyRequestRunner.submit flow
When a privacy request is paused, use APScheduler to schedule a job that runs when the request's Redis cache expires and sets the privacy request's status to error, since the request can no longer be resumed at that point.
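The expiry cleanup in the last item could be scheduled roughly as follows. This is a sketch under assumptions, not the PR's implementation: `schedule_expiry_cleanup`, `mark_request_errored`, and `RecordingScheduler` are hypothetical names, and the TTL is passed in rather than read from Redis. The `add_job` keywords used (`trigger="date"`, `run_date`, `args`, `id`, `replace_existing`) are the ones APScheduler's `BackgroundScheduler.add_job` accepts; a recording test double stands in for the real scheduler so the block runs without the library installed.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Callable, List, Tuple


def schedule_expiry_cleanup(
    scheduler: Any,
    privacy_request_id: str,
    ttl_seconds: int,
    mark_errored: Callable[[str], None],
    now: datetime,
) -> datetime:
    """Schedule a one-off job for the moment the cached identity data expires."""
    run_at = now + timedelta(seconds=ttl_seconds)
    scheduler.add_job(
        mark_errored,
        trigger="date",  # run once, at run_date
        run_date=run_at,
        args=[privacy_request_id],
        id=f"expire-{privacy_request_id}",
        replace_existing=True,  # pausing the same request again reschedules
    )
    return run_at


def mark_request_errored(privacy_request_id: str) -> None:
    """Hypothetical callback; a real one would update the request's status."""
    print(f"marking {privacy_request_id} as error")


class RecordingScheduler:
    """Test double that accepts the same add_job keywords used above."""

    def __init__(self) -> None:
        self.jobs: List[Tuple[Callable[..., None], dict]] = []

    def add_job(self, func: Callable[..., None], **kwargs: Any) -> None:
        self.jobs.append((func, kwargs))


fake = RecordingScheduler()
start = datetime(2021, 12, 10, tzinfo=timezone.utc)
run_at = schedule_expiry_cleanup(fake, "pri_123", 3600, mark_request_errored, start)
```

With a real `BackgroundScheduler` passed in place of `RecordingScheduler`, the job would fire once at `run_at`.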

Checklist

Applicable documentation updated (guides, quickstart, postman collections, tutorial, fidesdemo, database diagram)
Good unit test/integration test coverage

Ticket

Fixes #101

stevenbenjamin · 2021-12-13T14:44:41Z

src/fidesops/models/privacy_request.py

+            data={
+                "status": PrivacyRequestStatus.error,
+                "finished_processing_at": datetime.utcnow(),
+            },


It seems like we're doing this a couple of different ways in the codebase; we're updating the status, but also the timestamp, and writing to the db. For example, at https://github.com/ethyca/fidesops/blob/6715756452cf076c35c8d64c45abbce0da590658/src/fidesops/service/privacy_request/request_runner_service.py#L73 we're doing this step manually.

well hopefully this will help! this change specifically was trying to consolidate the logic that you linked to. It seems like not every status should update the timestamp, we only have timestamps on started and finished processing, and sometimes the "saving" is postponed until some other attributes are set.
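As an illustration of the consolidation being discussed, the status change and its timestamp can live behind one helper so callers stop repeating the two-field update. This is only a sketch: the in-memory `PrivacyRequest` dataclass, its `update` method, and the `error_processing` name are assumptions for the example, not the model in this PR, and it uses a timezone-aware `datetime.now(timezone.utc)` where the diff uses `datetime.utcnow()`.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Dict, Optional


class PrivacyRequestStatus(Enum):
    in_processing = "in_processing"
    paused = "paused"
    complete = "complete"
    error = "error"


@dataclass
class PrivacyRequest:
    """Simplified in-memory stand-in for the ORM model."""

    status: PrivacyRequestStatus = PrivacyRequestStatus.in_processing
    finished_processing_at: Optional[datetime] = None

    def update(self, data: Dict[str, Any]) -> None:
        # A real model would also persist these changes to the database.
        for field_name, value in data.items():
            setattr(self, field_name, value)

    def error_processing(self) -> None:
        """Set the error status and the finished timestamp in one place."""
        self.update(
            data={
                "status": PrivacyRequestStatus.error,
                "finished_processing_at": datetime.now(timezone.utc),
            }
        )


request = PrivacyRequest()
request.error_processing()
```

Statuses that should not touch a timestamp, such as paused, would get their own helper or go through `update` directly.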

src/fidesops/service/privacy_request/onetrust_service.py

stevenbenjamin · 2021-12-13T14:53:39Z

src/fidesops/service/privacy_request/request_runner_service.py

-    def run_webhooks(
-        self, webhook_cls: WebhookTypes, after_webhook: Optional[WebhookTypes] = None
+    @staticmethod
+    def run_webhooks_and_report_status(


As a general comment, would it make more sense to run the webhooks as a step in the task runner as some kind of pre-traversal task? That way the BackgroundScheduler would not be necessary.

I don't understand how this alleviates the need for the BackgroundScheduler. The webhooks are currently just being run as part of PrivacyRequestRunner.submit(). The BackgroundScheduler doesn't run the webhooks; its purpose is to update the status of paused PrivacyRequests that never complete.

Webhooks are capable of pausing PrivacyRequest processing while they handle processing on the user's end, but they need to resume that privacy request before identity data expires. All the BackgroundScheduler does is mark expired privacy requests as "error" instead of paused, to let the user know they need to be resubmitted.

Definitely welcome suggestions on how to do this better, just wanted to clarify that the BackgroundScheduler is not actually running the webhooks.

Ah, thanks. I do think that if we're going to have a background scheduler it would be a nice refactor to define it in its own file. Right now we're also using this in tasks.initiate_scheduled_request_intake which is being called by main, so it's not limited to the privacy request.

it was in its own file originally! I moved here for circular import reasons, it's tricky because initiate_scheduled_request_intake calls the privacy request runner, i'll see if i can figure out something better.

ok I've moved into its own file, plus some adjustments to not instantiate on demand!

stevenbenjamin · 2021-12-13T14:58:01Z

src/fidesops/service/privacy_request/request_runner_service.py

+    if _scheduler is None:
+        _scheduler = BackgroundScheduler()
+        _scheduler.start()
+    return _scheduler


Is instantiating in this way vulnerable to a race condition?

I'm not sure, can you elaborate more on your concerns here @stevenbenjamin ? This is the same implementation we're using for the onetrust scheduler, I just relocated for circular import reasons.

Here's the PR that implemented this originally: https://github.com/ethyca/solon/pull/126

This is a broken singleton pattern, e.g. https://stackoverflow.com/a/33000332/19479. We're instantiating on demand here. Granted, the worst that could happen is that an additional scheduler gets created.

ah thank you for clarifying! this link is super helpful.

@stevenbenjamin when you get a chance, i've tried to fix this in the latest commit -
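The fix commit is not reproduced in this thread. For reference, one common way to make on-demand instantiation safe is a lock with a second check inside it, sketched below. `Scheduler` is a hypothetical stand-in for `BackgroundScheduler` so the example runs without APScheduler, and `_scheduler_lock` is an assumed name.

```python
import threading
from typing import List, Optional


class Scheduler:
    """Hypothetical stand-in for apscheduler's BackgroundScheduler."""

    def __init__(self) -> None:
        self.running = False

    def start(self) -> None:
        self.running = True


_scheduler: Optional[Scheduler] = None
_scheduler_lock = threading.Lock()


def get_scheduler() -> Scheduler:
    """Return the single shared scheduler, creating it at most once."""
    global _scheduler
    if _scheduler is None:  # fast path: no lock once initialized
        with _scheduler_lock:
            if _scheduler is None:  # re-check: another thread may have won
                candidate = Scheduler()
                candidate.start()
                # Publish only after start() so callers never see an
                # unstarted scheduler.
                _scheduler = candidate
    return _scheduler


results: List[Scheduler] = []
threads = [
    threading.Thread(target=lambda: results.append(get_scheduler()))
    for _ in range(8)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

An alternative that avoids the lock entirely is to create and start the scheduler once at application startup, so nothing is instantiated on demand.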

pattisdr

thanks for your comments @stevenbenjamin - I have some followup questions:

* Run policy webhooks as part of privacy request execution and allow privacy requests to be resumed from a specific pre-execution policy. Don't instantiate PrivacyRequestRunner with a session.
* Test pre- and post-execution webhooks are triggered.
* Exit privacy request execution if we get a connection error trying to connect to a webhook - otherwise this will just hang for a very long time.
* If a webhook instructs privacy request execution to pause, use app scheduler to add a job that runs when that privacy request's identity data expires to mark the privacy request as "errored". If we don't have identity data, we can't run the privacy request anyway. Settings need to be configured so webhooks can complete processing before identity data expires. Moved get_scheduler to request_runner_service for import reasons - one trust scheduler imports privacy request runner.
* Update docs.
* Relocate run_webhooks_and_report_status method to original location so the diff is more apparent.
* Move the scheduler to a new file.
* Move scheduler to scheduler file.
* Try starting the scheduler in start_webserver,

pattisdr added 6 commits December 9, 2021 17:10

Run policy webhooks as part of privacy request execution and allow privacy requests to be resumed from a specific pre-execution policy. Don't instantiate PrivacyRequestRunner with a session.

d5e3fba

Test pre- and post-execution webhooks are triggered.

25d5446

Exit privacy request execution if we get a connection error trying to connect to a webhook - otherwise this will just hang for a very long time.

f3dc5f5

Update docs.

80a3371

Relocate run_webhooks_and_report_status method to original location so the diff is more apparent.

f380322

stevenbenjamin reviewed Dec 13, 2021

seanpreston assigned stevenbenjamin Dec 13, 2021

pattisdr commented Dec 13, 2021

pattisdr added 4 commits December 14, 2021 10:03

Move the scheduler to a new file.

00dadaf

Merge remote-tracking branch 'ethyca/main' into fidesops101_run_policy_webhooks # Conflicts: # docs/fidesops/docs/guides/policy_webhooks.md # tests/models/test_privacy_request.py

8b23f4d

Move scheduler to scheduler file.

526b528

Try starting the scheduler in start_webserver,

ba1e42e

stevenbenjamin merged commit 3de8980 into main Dec 14, 2021

pattisdr deleted the fidesops101_run_policy_webhooks branch February 7, 2022 03:17

pattisdr mentioned this pull request Jun 8, 2022

Cache/Surface Resume/Restart Privacy Request Details [#574] #591

Merged
