This repository has been archived by the owner on Nov 30, 2022. It is now read-only.

[#743] Store provided identity data in application database #834

Merged
merged 23 commits into from
Jul 11, 2022

Conversation

seanpreston
Contributor

@seanpreston seanpreston commented Jul 8, 2022

🚨 This PR contains a migration — please check the downrev before merging

Purpose

Store provided identity data in the application DB alongside the privacy request to enable:

  • exact match searching based on provided identities and blind indexing
  • data retention beyond Redis limits for historical request tracking

One concern I have is that we're not removing this data upon request completion (we do remove it upon request deletion). This is something we'll need to build into the Policy layer, since GDPR, CCPA, et al. all specify different constraints around accountability and data retention.

Changes

  • Adds ProvidedIdentity table with:
    • encrypted_value: an encrypted column in the database which houses the value of the identity
    • hashed_value: a one-way hashed version of the field to be used for exact match searches later on
  • Store provided identity data within those fields on request creation by:
    • privacy request creation API view
    • DRP privacy request creation API view
    • OneTrust privacy request intake API view
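The sketch below is a minimal illustration of how a table like ProvidedIdentity could support exact-match search without decrypting anything. It uses the standard-library sqlite3 module rather than the project's SQLAlchemy models. Only encrypted_value and hashed_value come from the description above. The other columns (privacy_request_id, field_name), STATIC_SALT, hash_identity and placeholder_encrypt are hypothetical. placeholder_encrypt is explicitly not encryption.

```python
import hashlib
import sqlite3
from typing import Callable, Dict, List, Optional

# Hypothetical constant for illustration; a real deployment would load it from secure config.
STATIC_SALT = b"example-static-salt"


def hash_identity(value: str) -> str:
    """One-way, deterministic digest: equal inputs always produce equal outputs."""
    return hashlib.sha256(STATIC_SALT + value.encode("utf-8")).hexdigest()


def create_schema(conn: sqlite3.Connection) -> None:
    # Assumed layout: only encrypted_value and hashed_value come from the PR description.
    conn.execute(
        """
        CREATE TABLE providedidentity (
            id INTEGER PRIMARY KEY,
            privacy_request_id TEXT NOT NULL,
            field_name TEXT NOT NULL,
            encrypted_value BLOB,
            hashed_value TEXT
        )
        """
    )
    # Index the digest so exact-match lookups never need to decrypt anything.
    conn.execute(
        "CREATE INDEX ix_providedidentity_hashed_value ON providedidentity (hashed_value)"
    )


def persist_identity(
    conn: sqlite3.Connection,
    privacy_request_id: str,
    identity: Dict[str, Optional[str]],
    encrypt: Callable[[str], bytes],
) -> None:
    """Store one row per provided identity field, skipping fields left empty."""
    with conn:  # commits on success, rolls back on error
        for field_name, value in identity.items():
            if value is None:
                continue
            conn.execute(
                "INSERT INTO providedidentity "
                "(privacy_request_id, field_name, encrypted_value, hashed_value) "
                "VALUES (?, ?, ?, ?)",
                (privacy_request_id, field_name, encrypt(value), hash_identity(value)),
            )


def find_request_ids(conn: sqlite3.Connection, field_name: str, value: str) -> List[str]:
    """Exact-match search: hash the probe value and compare digests."""
    rows = conn.execute(
        "SELECT privacy_request_id FROM providedidentity "
        "WHERE field_name = ? AND hashed_value = ?",
        (field_name, hash_identity(value)),
    ).fetchall()
    return [row[0] for row in rows]


def placeholder_encrypt(value: str) -> bytes:
    # NOT encryption: a stand-in so the sketch runs without a crypto library.
    return value.encode("utf-8")[::-1]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    persist_identity(
        conn,
        "pri_123",
        {"email": "customer@example.com", "phone_number": None},
        placeholder_encrypt,
    )
    print(find_request_ids(conn, "email", "customer@example.com"))  # ['pri_123']
```

The digest never has to be reversed. encrypted_value is what would be decrypted when the identity itself is needed, while hashed_value exists only to be compared.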

Checklist

  • Update CHANGELOG.md file
    • Merge in main so the most recent CHANGELOG.md file is being appended to
    • Add description within the Unreleased section in an appropriate category. Add a new category from the list at the top of the file if the needed one isn't already there.
    • Add a link to this PR at the end of the description with the PR number as the text. example: #1
  • Applicable documentation updated (guides, quickstart, postman collections, tutorial, fidesdemo, database diagram).
  • If docs updated (select one):
    • documentation complete, or draft/outline provided (tag docs-team to complete/review on this branch)
    • documentation issue created (tag docs-team to complete issue separately)
  • Good unit test/integration test coverage
  • This PR contains a DB migration. If checked, the reviewer should confirm with the author that the down_revision correctly references the previous migration before merging
  • The Run Unsafe PR Checks label has been applied, and checks have passed, if this PR touches any external services

Ticket

Fixes #743

@seanpreston seanpreston added the run unsafe ci checks Triggers running of unsafe CI checks label Jul 8, 2022
@sanders41
Contributor

General question about this: does this information get stored for every request, including erasures? It seems wrong to store personal information in the database if the request was to remove personal information.

@seanpreston
Contributor Author

seanpreston commented Jul 8, 2022


Yes, including erasures. I agree it's counter-intuitive, however it's required for audit reasons. Fidesops operators must be able to show which identities were processed for any type of privacy request in the event of an audit. The individual regulation will specify the timeline that the operator must be able to show history for (e.g. GDPR and CCPA will specify different timelines) so we'll need to build that into the Policy layer.

Notably this PII is different to any type of data collected in the traversal, as it was provided by the user up front. Traversal data is still stored in the cache.

@seanpreston
Contributor Author

seanpreston commented Jul 8, 2022

Closed in favour of the encrypted field approach here

@seanpreston seanpreston closed this Jul 8, 2022
@seanpreston seanpreston reopened this Jul 8, 2022
Contributor

@eastandwestwind eastandwestwind left a comment


Nice work on this @seanpreston ! Couple small things

@@ -222,6 +231,40 @@ def cache_identity(self, identity: PrivacyRequestIdentity) -> None:
value,
)

def persist_identity(self, db: Session, identity: PrivacyRequestIdentity) -> None:
Contributor


Since we need to persist the identity every time we create a privacy request, can we call this method from create() in this same file?

This is also more in line with our pattern of deleting the identities within the def delete() method

Contributor Author


This is possible, but we currently don't handle identity data within that method at all, in favour of caching it separately from the model. I left it that way because we don't need the identity data in the PrivacyRequest table, and those ORM overrides should mainly focus on what that table needs. In the event of deletion, we do need to clear the foreign keys in order to process deletion of PrivacyRequests.
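The deletion-ordering point can be illustrated with a sketch. It uses the standard-library sqlite3 module, not the project's SQLAlchemy delete() override. When foreign keys are enforced, rows that reference a privacy request must be removed before the request itself. The table names and make_db / delete_privacy_request are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE privacyrequest (id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE providedidentity ("
        " id INTEGER PRIMARY KEY,"
        " privacy_request_id TEXT NOT NULL REFERENCES privacyrequest (id),"
        " hashed_value TEXT)"
    )
    return conn


def delete_privacy_request(conn: sqlite3.Connection, request_id: str) -> None:
    """Delete child identity rows first, then the request, in one transaction."""
    with conn:
        conn.execute(
            "DELETE FROM providedidentity WHERE privacy_request_id = ?", (request_id,)
        )
        conn.execute("DELETE FROM privacyrequest WHERE id = ?", (request_id,))


if __name__ == "__main__":
    conn = make_db()
    with conn:
        conn.execute("INSERT INTO privacyrequest (id) VALUES ('pri_1')")
        conn.execute(
            "INSERT INTO providedidentity (privacy_request_id, hashed_value) "
            "VALUES ('pri_1', 'abc123')"
        )
    try:
        # Deleting the parent while identity rows still point at it is rejected.
        conn.execute("DELETE FROM privacyrequest WHERE id = 'pri_1'")
    except sqlite3.IntegrityError as exc:
        print("blocked:", exc)
        conn.rollback()
    delete_privacy_request(conn, "pri_1")
    print(conn.execute("SELECT COUNT(*) FROM privacyrequest").fetchone()[0])  # 0
```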

Contributor


Got it, OK we can leave as is for now, thanks!

class ProvidedIdentityType(EnumType):
"""Enum for privacy request identity types"""

email = "email"
Contributor


These are the enums for identity types supported by fidesops, so I'm thinking we should use the enums here, too: https://github.com/ethyca/fidesops/blob/main/src/fidesops/service/drp/drp_fidesops_mapper.py#L26

Something like:

from typing import Dict

DRP_TO_FIDESOPS_SUPPORTED_IDENTITY_PROPS_MAP: Dict[str, str] = {
    "email": ProvidedIdentityType.email.value,
    "phone_number": ProvidedIdentityType.phone_number.value,
}
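The suggested mapping can be shown in use with a self-contained sketch. Here ProvidedIdentityType is a plain enum.Enum stand-in that has only the two members mentioned in this thread. map_drp_identity is a hypothetical helper, not a function from drp_fidesops_mapper.py.

```python
from enum import Enum
from typing import Dict


class ProvidedIdentityType(Enum):
    """Stand-in for the PR's enum, limited to the members discussed here."""

    email = "email"
    phone_number = "phone_number"


DRP_TO_FIDESOPS_SUPPORTED_IDENTITY_PROPS_MAP: Dict[str, str] = {
    "email": ProvidedIdentityType.email.value,
    "phone_number": ProvidedIdentityType.phone_number.value,
}


def map_drp_identity(drp_identity: Dict[str, str]) -> Dict[str, str]:
    """Keep only supported DRP claims, renamed via the enum-backed map."""
    return {
        DRP_TO_FIDESOPS_SUPPORTED_IDENTITY_PROPS_MAP[key]: value
        for key, value in drp_identity.items()
        if key in DRP_TO_FIDESOPS_SUPPORTED_IDENTITY_PROPS_MAP
    }


if __name__ == "__main__":
    # Unsupported claims such as "name" are dropped.
    print(map_drp_identity({"email": "customer@example.com", "name": "Jane"}))
```

Routing the values through the enum means that renaming an identity type in one place changes both the persisted field names and the DRP mapping.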

},
)

def get_persisted_identity(self) -> PrivacyRequestIdentity:
Contributor


Do we have a way to get by hashed value yet?

Contributor Author


Great spot! I had intended to use this method but you're right — it wouldn't be useful for that because it generates a new salt each time without the option to refer to ProvidedIdentity.salt. Will fix 👍

Contributor Author

@seanpreston seanpreston Jul 10, 2022


The method now uses a static SALT. This way we can consistently hash values we're searching for to see if they exist in a hashed form in the table.
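The sketch below shows why a per-call random salt rules out lookups and why a static salt restores them. hash_value, matches_stored and STATIC_SALT are hypothetical. Their optional salt parameter is only loosely modeled on the change described here, so this is not the project's actual implementation.

```python
import hashlib
import secrets
from typing import Optional, Tuple

# Hypothetical constant for illustration only.
STATIC_SALT = "$example-static-salt$"


def hash_value(value: str, salt: Optional[str] = None) -> Tuple[str, str]:
    """Return (digest, salt_used); a fresh random salt is drawn if none is given."""
    if salt is None:
        salt = secrets.token_hex(16)
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest, salt


def matches_stored(stored_digest: str, candidate: str) -> bool:
    """Exact-match check against a digest that was stored with the static salt."""
    digest, _ = hash_value(candidate, salt=STATIC_SALT)
    return secrets.compare_digest(digest, stored_digest)


if __name__ == "__main__":
    email = "customer@example.com"

    # Per-call random salts: the same input yields a different digest each time,
    # so a search value can't be compared against what was stored.
    first, _ = hash_value(email)
    second, _ = hash_value(email)
    print(first == second)  # False

    # Static salt: the digest is reproducible, so lookups by hash work.
    stored, _ = hash_value(email, salt=STATIC_SALT)
    print(matches_stored(stored, email))  # True
    print(matches_stored(stored, "someone.else@example.com"))  # False
```

A static salt has a cost. Equal values share a digest across rows, and if the salt leaks, low-entropy identifiers such as emails and phone numbers can be brute-forced. A keyed HMAC with a secret key is a common alternative for this kind of blind index.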

Contributor


Yay, this looks good to me now.

@eastandwestwind
Contributor

Approved and confirming that I've checked down revision of migrations. Thanks for these changes @seanpreston !

@eastandwestwind eastandwestwind merged commit 2996bfc into main Jul 11, 2022
@eastandwestwind eastandwestwind deleted the 743-store-identities branch July 11, 2022 15:40
sanders41 pushed a commit that referenced this pull request Sep 22, 2022
* adds identity fields to PrivacyRequest model

* store identity data inside database

* update changelog

* add identities in test data command

* store identities provided via the DRP creation endpoint

* black + isort

* store provided identity data in request creation from onetrust

* remove deprecated migration

* adds new provided identity table

* use new provided identity table

* add docstring, remove comment

* update DRP privacy request creation to use ProvidedIdentity model

* update identity creation in test data command

* use persisted identity in OneTrust

* update test to use persisted identity

* isort update

* use enums

* optionally receive a salt in hash_value cmd

* use a constant salt for provided identity hashing

* remove import

* use typehints

* update typedef

* use enum in dict
Labels
run unsafe ci checks Triggers running of unsafe CI checks
Development

Successfully merging this pull request may close these issues.

Store provided identities in app database
3 participants