Skip to content
This repository has been archived by the owner on Nov 30, 2022. It is now read-only.

Run Policy Webhooks as part of Privacy Request Execution [#101] #124

Merged
merged 10 commits into from
Dec 14, 2021

Conversation

pattisdr
Copy link
Contributor

@pattisdr pattisdr commented Dec 10, 2021

Purpose

Finally connect running Pre-Execution and Post-Execution Policy Webhooks as part of a Privacy Request. Any configured PolicyPreWebhooks will run before the privacy request is executed, which can potentially halt execution or add to the identity graph. After the query traversal has completed, run configured PolicyPostWebhooks. PolicyPostWebhooks can't halt execution, but an error will put the PrivacyRequest in an errored state.

Also connect resuming privacy request execution from a given webhook. For example, if you have three webhooks, and the first webhook needs more processing time (and replied with halt: True), your app should send a request to /privacy_request/<privacy_request_id>/resume with the reply-to-token the webhook was originally given. Enable the privacy request to continue executing from the second webhook.

Changes

  • Removes instantiating the PrivacyRequestRunner with a session, as one is generated internally.
    • pass in the new session when running webhooks
  • Fix: Update the resume_privacy_request endpoint to only allow paused requests to be resumed
  • Fix: Exit privacy request execution if we are unable to make a connection to a webhook instead of letting it hang
  • Add running Pre- and Post-Execution webhooks as part of the PrivacyRequestRunner.submit flow
  • When a privacy request is paused, use the ApScheduler to schedule a job when that redis cache expires to clean up the privacy request's status to error since they will no longer be able to resume that privacy request.

Checklist

  • Applicable documentation updated (guides, quickstart, postman collections, tutorial, fidesdemo, database diagram
  • Good unit test/integration test coverage

Ticket

Fixes #101

…ivacy requests to be resumed from a specific pre-execution policy.

Don't instantiate PrivacyRequestRunner with a session.
… connect to a webhook - otherwise this will just hang for a very long time.
…heduler to add a job that runs when that privacy request's identity data expires to mark the privacy request as "errored".

If we don't have identity data, we can't run the privacy request anyway. Settings need to be configured so webhooks can complete processing before identity data expires.

Moved get_scheduler to request_runner_service for import reasons - one trust scheduler imports privacy request runner.
data={
"status": PrivacyRequestStatus.error,
"finished_processing_at": datetime.utcnow(),
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we're doing this a couple of different ways in the codebase; we're updating the status, but also the timestamp, and writing to the db. For example, https://github.com/ethyca/fidesops/blob/6715756452cf076c35c8d64c45abbce0da590658/src/fidesops/service/privacy_request/request_runner_service.py#L73 we're doing this step manually.

Copy link
Contributor Author

@pattisdr pattisdr Dec 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well hopefully this will help! this change specifically was trying to consolidate the logic that you linked to. It seems like not every status should update the timestamp, we only have timestamps on started and finished processing, and sometimes the "saving" is postponed until some other attributes are set.

def run_webhooks(
self, webhook_cls: WebhookTypes, after_webhook: Optional[WebhookTypes] = None
@staticmethod
def run_webhooks_and_report_status(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general comment, would it make more sense to run the webhooks as a step in the task runner as some kind of pre-traversal task? That way the BackgroundScheduler would not be necessary.

Copy link
Contributor Author

@pattisdr pattisdr Dec 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand how this alleviates the needs for the BackgroundScheduler. The webhooks are currently just being run as part of PrivacyRequestRunner.submit(). The BackgroundScheduler doesn't run the wehbooks, its purpose is to update the status of paused PrivacyRequests that never completes.

Webhooks are capable of pausing PrivacyRequest processing while they handle processing on the user's end, but they need to resume that privacy request before identity data expires. All the BackgroundScheduler does is mark expired privacy requests are "error" instead of paused, to let the user know it needs to be resubmitted.

Definitely welcome suggestions on how to do this better, just wanted to clarify that the BackgroundScheduler is not actually running the webhooks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks. I do think that if we're going to have a background scheduler it would be a nice refactor to define it in its own file. Right now we're also using this in tasks.initiate_scheduled_request_intake which is being called by main, so it's not limited to the privacy request.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was in its own file originally! I moved here for circular import reasons, it's tricky because initiate_scheduled_request_intake calls the privacy request runner, i'll see if i can figure out something better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I've moved into its own file, plus some adjustments to not instantiate on demand!

if _scheduler is None:
_scheduler = BackgroundScheduler()
_scheduler.start()
return _scheduler
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is instantiating in this way vulnerable to a race condition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, can you elaborate more on your concerns here @stevenbenjamin ? This is the same implementation we're using for the onetrust scheduler, I just relocated for circular import reasons.

Here's the PR that implemented this originally: https://github.com/ethyca/solon/pull/126

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a broken singleton pattern, e.g. https://stackoverflow.com/a/33000332/19479 .We're instantiating on demand here. Granted the worst that could happen is an additional scheduler can be created.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah thank you for clarifying! this link is super helpful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stevenbenjamin when you get a chance, i've tried to fix this in the latest commit -

Copy link
Contributor Author

@pattisdr pattisdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for your comments @stevenbenjamin - I have some followup questions:

data={
"status": PrivacyRequestStatus.error,
"finished_processing_at": datetime.utcnow(),
},
Copy link
Contributor Author

@pattisdr pattisdr Dec 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well hopefully this will help! this change specifically was trying to consolidate the logic that you linked to. It seems like not every status should update the timestamp, we only have timestamps on started and finished processing, and sometimes the "saving" is postponed until some other attributes are set.

def run_webhooks(
self, webhook_cls: WebhookTypes, after_webhook: Optional[WebhookTypes] = None
@staticmethod
def run_webhooks_and_report_status(
Copy link
Contributor Author

@pattisdr pattisdr Dec 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand how this alleviates the needs for the BackgroundScheduler. The webhooks are currently just being run as part of PrivacyRequestRunner.submit(). The BackgroundScheduler doesn't run the wehbooks, its purpose is to update the status of paused PrivacyRequests that never completes.

Webhooks are capable of pausing PrivacyRequest processing while they handle processing on the user's end, but they need to resume that privacy request before identity data expires. All the BackgroundScheduler does is mark expired privacy requests are "error" instead of paused, to let the user know it needs to be resubmitted.

Definitely welcome suggestions on how to do this better, just wanted to clarify that the BackgroundScheduler is not actually running the webhooks.

if _scheduler is None:
_scheduler = BackgroundScheduler()
_scheduler.start()
return _scheduler
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, can you elaborate more on your concerns here @stevenbenjamin ? This is the same implementation we're using for the onetrust scheduler, I just relocated for circular import reasons.

Here's the PR that implemented this originally: https://github.com/ethyca/solon/pull/126

@stevenbenjamin stevenbenjamin merged commit 3de8980 into main Dec 14, 2021
@pattisdr pattisdr deleted the fidesops101_run_policy_webhooks branch February 7, 2022 03:17
sanders41 pushed a commit that referenced this pull request Sep 22, 2022
* Run policy webhooks as part of privacy request execution and allow privacy requests to be resumed from a specific pre-execution policy.

Don't instantiate PrivacyRequestRunner with a session.

* Test pre- and post-execution webhooks are triggered.

* Exit privacy request execution if we get a connection error trying to connect to a webhook - otherwise this will just hang for a very long time.

* If a webhook instructs privacy request execution to pause, use app scheduler to add a job that runs when that privacy request's identity data expires to mark the privacy request as "errored".

If we don't have identity data, we can't run the privacy request anyway. Settings need to be configured so webhooks can complete processing before identity data expires.

Moved get_scheduler to request_runner_service for import reasons - one trust scheduler imports privacy request runner.

* Update docs.

* Relocate run_webhooks_and_report_status method to original location so the diff is more apparent.

* Move the scheduler to a new file.

* Move scheduler to scheduler file.

* Try starting the scheduler in start_webserver,
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[HTTPS Connectors] Run Pre-Execution and Post-Execution Policy Webhooks
2 participants