-
Notifications
You must be signed in to change notification settings - Fork 16
Email Connector: Build Masking Instructions #1168
Email Connector: Build Masking Instructions #1168
Conversation
… so it can be used for the email connector, which won't have any rows returned from an access request. - Add an EmailConnector.build_masking_instructions method with a draft of data needed to instruct the user how to query/mask/what fields to mask on their end.
… to be masked in Redis. We'll use this to send one email at the end for each "email"-based dataset at end, instead of sending one email for each collection. Reuse some of the caching code created for manual connectors / failed privacy requests where similar to the EmailConnectors, we have some separate action that is required on a given collection. Rename to make more generic.
…he manual action could just be locating data for another collection downstream. Cache email template details, even if there are no actions needed on that specific collection,
for edge in node.incoming_edges_from_same_dataset(): | ||
append(locators, edge.f2.field_path.string_path, str(edge.f1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If an upstream dependency for the email connector is in the same email-based dataset, we won't have data to give them to help them query for it, so instead I'm caching the upstream field so they can find it themselves.
mask_map[rule_field_path.string_path] = ( | ||
rule.masking_strategy.get("strategy") | ||
if rule.masking_strategy | ||
else None | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm caching the masking strategy here, but we will probably just send a list of fields they should mask, not the strategy. I didn't think it hurt to cache the strategy still in case that is useful later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good — I could also imagine us precomputing values for the third party to update PII to also
locators={ | ||
"parent_id": ["email_dataset:daycare_customer:id"] | ||
}, # The only locator is on a separate collection on their end. We don't have data for it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is tricky if one email collection is dependent on another collection in the same email connector, we won't have any data to give them to query from the dependent collection, they need to locate on their own. Here, we don't have any parent_ids
to give them, they need to go to the daycare_customer collection, use the customer_id of 1, and then find the parent_id, and use that here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting edge case! @mfbrown do you have any insight on whether this is likely to happen?
"payment": CollectionActionRequired( | ||
step=CurrentStep.erasure, | ||
collection=CollectionAddress("email_dataset", "payment"), | ||
action_needed=[ | ||
ManualAction( | ||
locators={"payer_email": ["[email protected]"]}, | ||
get=None, | ||
update=None, # Nothing to mask on this collection | ||
) | ||
], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though there's nothing to mask on the payment collection, I am still caching details for all the "email" collections because its locator may be useful in helping them find data on other downstream collections.
class ManualAction(BaseSchema): | ||
"""Surface how to manually retrieve or mask data in a database-agnostic way | ||
"""Surface how to retrieve or mask data in a database-agnostic way | ||
|
||
"locators" are similar to the SQL "WHERE" information. | ||
"get" contains a list of fields that should be retrieved from the source | ||
"update" is a dictionary of fields and the value they should be replaced with. | ||
"update" is a dictionary of fields and the replacement value/masking strategy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reusing a structure created for the "manual" connector to cache details regarding further actions needed by some other party to continue processing of this request.
"""Cache instructions for how to mask data in this collection. | ||
One email will be sent for all collections in this dataset at the end of the privacy request execution. | ||
""" | ||
|
||
manual_action: ManualAction = self.build_masking_instructions( | ||
node, policy, input_data | ||
) | ||
|
||
logger.info("Caching action needed for collection: '%s", node.address.value) | ||
privacy_request.cache_email_connector_template_contents( | ||
step=CurrentStep.erasure, | ||
collection=node.address, | ||
action_needed=[manual_action], | ||
) | ||
|
||
return 0 # Fidesops itself does not mask this collection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a primary change of this PR - EmailConnector.mask_data caches some instructions on how to find and mask data for this particular collection.
Looking at test failure ^ |
* Pass in input_data to erasure requests, and not just access requests, so it can be used for the email connector, which won't have any rows returned from an access request. - Add an EmailConnector.build_masking_instructions method with a draft of data needed to instruct the user how to query/mask/what fields to mask on their end. * Have the EmailConnector.mask_data cache the raw details of what needs to be masked in Redis. We'll use this to send one email at the end for each "email"-based dataset at end, instead of sending one email for each collection. Reuse some of the caching code created for manual connectors / failed privacy requests where similar to the EmailConnectors, we have some separate action that is required on a given collection. Rename to make more generic. * Remove restriction that a ManualAction needs a get or update value. The manual action could just be locating data for another collection downstream. Cache email template details, even if there are no actions needed on that specific collection, * Update the expected number of collections in the email dataset. * build_masking_instructions is not required to return a ManualAction. * Reconcile this test with the work to make log send asynchronous. Co-authored-by: Sean Preston <[email protected]>
👉 The EmailConnector doesn't yet "send" an email. Documentation will be added for this feature in followup #1158
Purpose
Update
EmailConnector.mask_data
to cache details about actions needed to mask that collection (how to locate the records, which fields to mask, how to mask).The email is not yet sent, but in #1158 we'll look into pulling these details out of the cache to send one email per "email-based dataset" at the end of the privacy request, not per collection.
Changes
PrivacyRequest.cache_email_connector_template_contents
to cache raw details about action needed that we will email to a third party to complete.PrivacyRequest.get_email_connector_template_contents_by_dataset
to retrieve the cached email template details.TraversalNode.incoming_edges_from_same_dataset
so we know any immediate upstream dependencies of the current collection that are also in the same email-related dataset. We won't have data from those upstream edges.Checklist
CHANGELOG.md
fileCHANGELOG.md
file is being appended toUnreleased
section in an appropriate category. Add a new category from the list at the top of the file if the needed one isn't already there.Run Unsafe PR Checks
label has been applied, and checks have passed, if this PR touches any external servicesTicket
Fixes #1135