Skip to content
This repository has been archived by the owner on Nov 30, 2022. It is now read-only.

Return persisted identities in get_request_status view #860

Merged
merged 9 commits into from
Jul 13, 2022

Conversation

seanpreston
Copy link
Contributor

@seanpreston seanpreston commented Jul 12, 2022

Purpose

This PR updates the way we return identities to use the data stored in the database, rather than the cache.

Changes

  • Return identity date from ProvidedIdentity table
  • Persist identity data for privacy requests created by test fixtures

Checklist

  • Update CHANGELOG.md file
    • Merge in main so the most recent CHANGELOG.md file is being appended to
    • Add description within the Unreleased section in an appropriate category. Add a new category from the list at the top of the file if the needed one isn't already there.
    • Add a link to this PR at the end of the description with the PR number as the text. example: #1
  • Applicable documentation updated (guides, quickstart, postman collections, tutorial, fidesdemo, database diagram.
  • If docs updated (select one):
    • documentation complete, or draft/outline provided (tag docs-team to complete/review on this branch)
    • documentation issue created (tag docs-team to complete issue separately)
  • Good unit test/integration test coverage
  • This PR contains a DB migration. If checked, the reviewer should confirm with the author that the down_revision correctly references the previous migration before merging
  • The Run Unsafe PR Checks label has been applied, and checks have passed, if this PR touches any external services

Ticket

Fixes NA

@seanpreston seanpreston changed the title Return persist identities in get_request_status view Return persisted identities in get_request_status view Jul 12, 2022
Copy link
Contributor

@pattisdr pattisdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some more cleanup needed @seanpreston

@@ -491,7 +491,7 @@ def get_request_status(
# Conditionally include the cached identity data in the response if
# it is explicitly requested
for item in paginated.items: # type: ignore
item.identity = item.get_cached_identity_data()
item.identity = item.get_persisted_identity().dict()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also downloading privacy requests as a CSV above is still using cached identity there, these should both pull from the same source, since they are supposed to be the same data in different formats. Otherwise, I can see the UI showing the identities, and then they go to download a CSV and the identity rows are blank.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating the request body for a webhook, creating the requests for saas configs retrieve/update statements, and feeding the initial seed data into the traversal all still use the cache, not the database.

Do we do this because it's easier to access the cache sometimes, we don't always have a readily available session? especially in the traversal? I'm a little worried about having different locations storing what the identity is, some pull from one, others pull from another.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we do this because it's easier to access the cache sometimes, we don't always have a readily available session?

For now, yes. Ideally I'd like everything to use the same source of truth for identity data, but that's a larger refactor for exactly this reason, the DB connection isn't piped into everywhere that would need it yet. I've made this ticket to be actioned as a follow-up.

Comment on lines +852 to +853
pr.cache_identity(identity_kwargs)
pr.persist_identity(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Policy webhooks can have derived_identities returned. Neither PrivacyRequest.trigger_policy_webhook nor privacy_request_endpoints > resume_privacy_request which both update the identity graph, persist the data to the database, they only add it to the redis cache.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be useful to have a method that both persists the identity in the cache and in the database at the same time? I'd like to avoid these mismatches we have now where they're both being updated in some places and not others.

Copy link
Contributor Author

@seanpreston seanpreston Jul 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be useful to have a method that both persists the identity in the cache and in the database at the same time?

I'm torn here. On the one hand it's nice to have consistency, on the other, as you rightly suggest above, it means we'll need to be plumbing the DB connection in more places. I'm not sure if it's better to have the execution update the cache with identity data at the very start before the traversal, such that we can guarantee the traversal will always use what was provided by the user on privacy request creation. That way the internals can still use the cache and benefit from the speed, and less refactoring is required. What do you think?

Copy link
Contributor Author

@seanpreston seanpreston Jul 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Policy webhooks can have derived_identities returned. Neither PrivacyRequest.trigger_policy_webhook nor privacy_request_endpoints > resume_privacy_request which both update the identity graph, persist the data to the database, they only add it to the redis cache.

We should separate these concerns for now. The ProvidedIdentity is useful for facilitating request search based on the exact identity provided, we don't currently want to search based on derived identities, or return them into the UI, so should be careful when we update the ProvidedIdentity table as that's what will get displayed in the UI (and doesn't currently support anything beyond email and phone_number).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That way the internals can still use the cache and benefit from the speed, and less refactoring is required. What do you think?

Thinking about this more, I agree, it's in line with our original design, in that we query everything up front, build the graph, and execute it. We're not regularly querying the database as we execute the traversal which I think is good for performance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should separate these concerns for now. The ProvidedIdentity is useful for facilitating request search based on the exact identity provided, we don't currently want to search based on derived identities

OK, that makes sense

@pattisdr
Copy link
Contributor

@seanpreston thanks for your response to my comments. This all makes sense, the one thing I would add then, are code comments in the places where we're writing to just the cache and not the database. I'd want to note why we're doing this and make it clear it's intentional.

@pattisdr
Copy link
Contributor

Just waiting on the changelog!

CHANGELOG.md Outdated
Comment on lines 62 to 67
## Changed
* Changed wording on Admin UI login page [#774](https://github.com/ethyca/fidesops/pull/774)
* Fixed typos in Admin UI [#774](https://github.com/ethyca/fidesops/pull/774)
* Update clipboard icon in Admin UI [#838](https://github.com/ethyca/fidesops/pull/838)
* Return identity data from application DB, instead of cache [#860](https://github.com/ethyca/fidesops/pull/860)
* Update admin ui to be served from the root route `/` [#720](https://github.com/ethyca/fidesops/pull/720)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad merge @seanpreston

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicates most of "changed" section above

@pattisdr pattisdr merged commit 6e6011b into main Jul 13, 2022
@pattisdr pattisdr deleted the return-provided-identity branch July 13, 2022 15:48
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants