Return persisted identities in get_request_status view #860
Just some more cleanup needed @seanpreston
@@ -491,7 +491,7 @@ def get_request_status(
     # Conditionally include the cached identity data in the response if
     # it is explicitly requested
     for item in paginated.items:  # type: ignore
-        item.identity = item.get_cached_identity_data()
+        item.identity = item.get_persisted_identity().dict()
Also, downloading privacy requests as a CSV (above) still uses the cached identity. Both should pull from the same source, since they are supposed to be the same data in different formats. Otherwise, I can see the UI showing the identities, and then when someone downloads a CSV the identity rows are blank.
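A minimal sketch of what "pull from the same source" could look like, assuming both the JSON view and the CSV download go through one helper. `FakePrivacyRequest`, `identity_for_output`, `to_json_rows`, and `to_csv` are hypothetical names for illustration only, not fidesops code, and the "database" here is just a dict on the object.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FakePrivacyRequest:
    """Hypothetical stand-in for a privacy request row; not the real model."""

    id: str
    persisted_identity: Dict[str, Optional[str]]

    def get_persisted_identity(self) -> Dict[str, Optional[str]]:
        # In this sketch the persisted identity is a plain dict copy.
        return dict(self.persisted_identity)


def identity_for_output(request: FakePrivacyRequest) -> Dict[str, Optional[str]]:
    """The single place that decides where identity data comes from."""
    return request.get_persisted_identity()


def to_json_rows(requests: List[FakePrivacyRequest]) -> List[dict]:
    """Build the rows a JSON view would return."""
    return [{"id": r.id, "identity": identity_for_output(r)} for r in requests]


def to_csv(requests: List[FakePrivacyRequest]) -> str:
    """Build the CSV download from the same helper as the JSON view."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["id", "email", "phone_number"])
    for r in requests:
        identity = identity_for_output(r)
        writer.writerow(
            [r.id, identity.get("email") or "", identity.get("phone_number") or ""]
        )
    return buffer.getvalue()
```

Because both outputs call `identity_for_output`, switching the source later would only need a change in one place.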
Creating the request body for a webhook, creating the requests for SaaS config retrieve/update statements, and feeding the initial seed data into the traversal all still use the cache, not the database.
Do we do this because it's sometimes easier to access the cache, since we don't always have a readily available session, especially in the traversal? I'm a little worried about having different locations storing what the identity is, where some code pulls from one and other code pulls from another.
> Do we do this because it's sometimes easier to access the cache, since we don't always have a readily available session?

For now, yes. Ideally I'd like everything to use the same source of truth for identity data, but that's a larger refactor for exactly this reason: the DB connection isn't piped into everywhere that would need it yet. I've made a ticket for this to be actioned as a follow-up.
pr.cache_identity(identity_kwargs)
pr.persist_identity(
Bug: Policy webhooks can have derived_identities returned. Neither `PrivacyRequest.trigger_policy_webhook` nor `privacy_request_endpoints` > `resume_privacy_request`, which both update the identity graph, persists the data to the database; they only add it to the Redis cache.
Would it be useful to have a method that both persists the identity in the cache and in the database at the same time? I'd like to avoid these mismatches we have now where they're both being updated in some places and not others.
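For illustration, one shape such a method could take. This is only a sketch: `IdentityStore` and `save_identity` are hypothetical names, and plain dicts stand in for the Redis cache and the application database, so no real session or connection handling is shown.

```python
from typing import Dict, Optional


class IdentityStore:
    """Hypothetical wrapper; dicts stand in for the cache and the database."""

    def __init__(self) -> None:
        self.cache: Dict[str, Dict[str, str]] = {}
        self.db: Dict[str, Dict[str, str]] = {}

    def save_identity(
        self, request_id: str, identity: Dict[str, Optional[str]]
    ) -> None:
        """Write the same identity data to both stores in one call."""
        # Drop empty values once so both stores see identical data.
        cleaned = {key: value for key, value in identity.items() if value is not None}
        # Write the database first: if that step raised, the cache would
        # be left untouched rather than holding data the database lacks.
        self.db[request_id] = dict(cleaned)
        self.cache[request_id] = dict(cleaned)
```

The trade-off is that every caller of such a method would need access to a database session, which is the plumbing concern raised elsewhere in this thread.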
> Would it be useful to have a method that both persists the identity in the cache and in the database at the same time?

I'm torn here. On the one hand, it's nice to have consistency. On the other, as you rightly suggest above, it means we'll need to plumb the DB connection into more places. I wonder whether it would be better to have the execution update the cache with identity data at the very start, before the traversal, so that we can guarantee the traversal will always use what the user provided on privacy request creation. That way the internals can still use the cache and benefit from the speed, and less refactoring is required. What do you think?
> Bug: Policy webhooks can have derived_identities returned. Neither `PrivacyRequest.trigger_policy_webhook` nor `privacy_request_endpoints` > `resume_privacy_request`, which both update the identity graph, persists the data to the database; they only add it to the Redis cache.

We should separate these concerns for now. The `ProvidedIdentity` table is useful for facilitating request search based on the exact identity provided. We don't currently want to search based on derived identities or return them to the UI, so we should be careful when we update the `ProvidedIdentity` table, as that's what will get displayed in the UI (which doesn't currently support anything beyond `email` and `phone_number`).
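The separation described here could be sketched roughly as follows. `SketchPrivacyRequest`, `create`, and `apply_derived_identity` are hypothetical names, and two dicts stand in for the `ProvidedIdentity` table and the Redis cache; the point is only that derived identities are written to the cache and deliberately not to the database.

```python
from typing import Dict


class SketchPrivacyRequest:
    """Hypothetical stand-in; dicts play the roles of the table and the cache."""

    def __init__(self) -> None:
        self.provided_identity: Dict[str, str] = {}  # stand-in for the database table
        self.cached_identity: Dict[str, str] = {}  # stand-in for the cache

    def create(self, identity: Dict[str, str]) -> None:
        # Identity supplied at request creation goes to both stores.
        self.provided_identity = dict(identity)
        self.cached_identity = dict(identity)

    def apply_derived_identity(self, derived: Dict[str, str]) -> None:
        # Intentionally cache-only: derived identities should not become
        # searchable or appear in the UI, so the database copy is not touched.
        self.cached_identity.update(derived)
```

A comment like the one in `apply_derived_identity` is also a cheap way to make it clear to later readers that the cache-only write is deliberate.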
> That way the internals can still use the cache and benefit from the speed, and less refactoring is required. What do you think?

Thinking about this more, I agree. It's in line with our original design, in that we query everything up front, build the graph, and execute it. We're not regularly querying the database as we execute the traversal, which I think is good for performance.
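A toy sketch of the "query up front, then execute from the cache" idea, assuming the cache is seeded once from the persisted identity before the traversal starts. `seed_cache` and `run_traversal` are hypothetical helpers, and dicts stand in for the database and the cache.

```python
from typing import Dict, List

Store = Dict[str, Dict[str, str]]


def seed_cache(db: Store, cache: Store, request_id: str) -> None:
    """Copy the persisted identity into the cache once, before traversal."""
    cache[request_id] = dict(db[request_id])


def run_traversal(cache: Store, request_id: str, nodes: List[str]) -> List[str]:
    """Toy traversal: every node reads identity from the cache only."""
    identity = cache[request_id]
    return [f"{node}:{identity['email']}" for node in nodes]


db: Store = {"pri_1": {"email": "a@example.com"}}
cache: Store = {}
seed_cache(db, cache, "pri_1")
results = run_traversal(cache, "pri_1", ["users", "orders"])
```

After seeding, the traversal never needs a database session, which matches the design described in this comment.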
> We should separate these concerns for now. The `ProvidedIdentity` table is useful for facilitating request search based on the exact identity provided. We don't currently want to search based on derived identities

OK, that makes sense.
@seanpreston thanks for your response to my comments. This all makes sense. The one thing I would add, then, is code comments in the places where we're writing to just the cache and not the database. I'd want to note why we're doing this and make it clear it's intentional.
Just waiting on the changelog!
CHANGELOG.md
## Changed
* Changed wording on Admin UI login page [#774](https://github.com/ethyca/fidesops/pull/774)
* Fixed typos in Admin UI [#774](https://github.com/ethyca/fidesops/pull/774)
* Update clipboard icon in Admin UI [#838](https://github.com/ethyca/fidesops/pull/838)
* Return identity data from application DB, instead of cache [#860](https://github.com/ethyca/fidesops/pull/860)
* Update admin ui to be served from the root route `/` [#720](https://github.com/ethyca/fidesops/pull/720)
Bad merge @seanpreston
Duplicates most of "changed" section above
Purpose
This PR updates the way we return identities to use the data stored in the database, rather than the cache.
Changes
* `ProvidedIdentity` table
Checklist
* `CHANGELOG.md` file
* `CHANGELOG.md` file is being appended to
* `Unreleased` section in an appropriate category. Add a new category from the list at the top of the file if the needed one isn't already there.
* `Run Unsafe PR Checks` label has been applied, and checks have passed, if this PR touches any external services
Ticket
Fixes NA