Skip to content
This repository has been archived by the owner on Nov 30, 2022. It is now read-only.

Return persisted identities in get_request_status view #860

Merged
merged 9 commits into from
Jul 13, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 4 additions & 7 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,11 +29,11 @@ The types of changes are:
* Adds exact match identity search to the privacy request status endpoint [#765](https://github.com/ethyca/fidesops/pull/847/)

### Changed
* Changed wording on Admin UI login page [#774](https://github.com/ethyca/fidesops/pull/774)
* Fixed typos in Admin UI [#774](https://github.com/ethyca/fidesops/pull/774)
* Update clipboard icon in Admin UI [#838](https://github.com/ethyca/fidesops/pull/838)
* Stop masking uvicorn logs by default [#831](https://github.com/ethyca/fidesops/pull/831)
* Bump fideslib to handle base64 encoded password [#820](https://github.com/ethyca/fidesops/pull/820)
* Return identity data from application DB, instead of cache [#860](https://github.com/ethyca/fidesops/pull/860)
* Update admin ui to be served from the root route `/` [#720](https://github.com/ethyca/fidesops/pull/720)

### Developer Experience
* Replace user authentication routes with fideslib routes [#811](https://github.com/ethyca/fidesops/pull/811)
Expand All @@ -43,6 +43,8 @@ The types of changes are:
* Backend UI deployment [#827](https://github.com/ethyca/fidesops/pull/827)
* Fix publish_docs CI action [#818](https://github.com/ethyca/fidesops/pull/818)
* Reorganize docs and standardize formatting [#858](https://github.com/ethyca/fidesops/pull/858)
* Changed wording on Admin UI login page [#774](https://github.com/ethyca/fidesops/pull/774)
* Fixed typos in Admin UI [#774](https://github.com/ethyca/fidesops/pull/774)

### Fixed
* Resolve issue with MyPy seeing files in fidesops as missing imports [#719](https://github.com/ethyca/fidesops/pull/719)
Expand All @@ -53,11 +55,6 @@ The types of changes are:
* [User Management] Refactored New and Edit user pages to reduce duplicate code [#839]https://github.com/ethyca/fidesops/pull/839
* Fix error when there are no scopes in `ClientDetail` [#830](https://github.com/ethyca/fidesops/pull/830)

### Docs
* Backend UI deployment [#827](https://github.com/ethyca/fidesops/pull/827)
* Fix publish_docs CI action [#818](https://github.com/ethyca/fidesops/pull/818)
* Bump fideslib to handle base64 encoded password [#820](https://github.com/ethyca/fidesops/pull/820)
* Stop masking uvicorn logs by default [#831](https://github.com/ethyca/fidesops/pull/831)

## [1.6.1](https://github.com/ethyca/fidesops/compare/1.6.0...1.6.1)

Expand Down
6 changes: 4 additions & 2 deletions src/fidesops/api/v1/endpoints/privacy_request_endpoints.py
Original file line number Diff line number Diff line change
Expand Up @@ -255,7 +255,7 @@ def privacy_request_csv_download(
csv_file.writerow(
[
pr.created_at,
pr.get_cached_identity_data(),
pr.get_persisted_identity().dict(),
pr.policy.key if pr.policy else None,
pr.status.value if pr.status else None,
pr.reviewed_by,
Expand Down Expand Up @@ -509,7 +509,7 @@ def get_request_status(
# Conditionally include the cached identity data in the response if
# it is explicitly requested
for item in paginated.items: # type: ignore
item.identity = item.get_cached_identity_data()
item.identity = item.get_persisted_identity().dict()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also downloading privacy requests as a CSV above is still using cached identity there, these should both pull from the same source, since they are supposed to be the same data in different formats. Otherwise, I can see the UI showing the identities, and then they go to download a CSV and the identity rows are blank.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating the request body for a webhook, creating the requests for saas configs retrieve/update statements, and feeding the initial seed data into the traversal all still use the cache, not the database.

Do we do this because it's easier to access the cache sometimes, we don't always have a readily available session? especially in the traversal? I'm a little worried about having different locations storing what the identity is, some pull from one, others pull from another.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we do this because it's easier to access the cache sometimes, we don't always have a readily available session?

For now, yes. Ideally I'd like everything to use the same source of truth for identity data, but that's a larger refactor for exactly this reason, the DB connection isn't piped into everywhere that would need it yet. I've made this ticket to be actioned as a follow-up.

attach_resume_instructions(item)
else:
for item in paginated.items: # type: ignore
Expand Down Expand Up @@ -635,6 +635,8 @@ def resume_privacy_request(
) -> PrivacyRequestResponse:
"""Resume running a privacy request after it was paused by a Pre-Execution webhook"""
privacy_request = get_privacy_request_or_error(db, privacy_request_id)
# We don't want to persist derived identities because they have not been provided
# by the end user
privacy_request.cache_identity(webhook_callback.derived_identity) # type: ignore

if privacy_request.status != PrivacyRequestStatus.paused:
Expand Down
2 changes: 2 additions & 0 deletions src/fidesops/models/privacy_request.py
Original file line number Diff line number Diff line change
Expand Up @@ -482,6 +482,8 @@ def trigger_policy_webhook(self, webhook: WebhookTypes) -> None:
logger.info(
f"Updating known identities on privacy request {self.id} from webhook {webhook.key}."
)
# Don't persist derived identities because they aren't provided directly
# by the end user
self.cache_identity(response_body.derived_identity)

# Pause execution if instructed
Expand Down
2 changes: 1 addition & 1 deletion src/fidesops/schemas/privacy_request.py
Original file line number Diff line number Diff line change
Expand Up @@ -135,7 +135,7 @@ class PrivacyRequestResponse(BaseSchema):
# as it is an API response field, and we don't want to reveal any more
# about our PII structure than is explicitly stored in the cache on request
# creation.
identity: Optional[Dict[str, str]]
identity: Optional[Dict[str, Optional[str]]]
policy: PolicySchema
stopped_collection_details: Optional[StoppedCollectionDetails] = None
resume_endpoint: Optional[str]
Expand Down
42 changes: 25 additions & 17 deletions tests/api/v1/endpoints/test_privacy_request_endpoints.py
Original file line number Diff line number Diff line change
Expand Up @@ -640,7 +640,7 @@ def test_get_privacy_requests_with_identity(
assert resp["items"][0]["id"] == succeeded_privacy_request.id
assert (
resp["items"][0]["identity"]
== succeeded_privacy_request.get_cached_identity_data()
== succeeded_privacy_request.get_persisted_identity()
)

assert resp["items"][0]["policy"]["key"] == privacy_request.policy.key
Expand Down Expand Up @@ -710,22 +710,25 @@ def test_filter_privacy_requests_by_internal_id(
api_client,
url,
generate_auth_header,
privacy_request,
policy,
):
data = [
{
"requested_at": "2021-08-30T16:09:37.359Z",
"policy_key": policy.key,
"identity": {"email": "[email protected]"},
"id": "test_internal_id_1",
}
]
resp = api_client.post(url, json=data)
assert resp.status_code == 200
response_data = resp.json()["succeeded"]
assert len(response_data) == 1
privacy_request = PrivacyRequest.get(db=db, object_id=response_data[0]["id"])
auth_header = generate_auth_header(scopes=[PRIVACY_REQUEST_READ])
new_request_id = "test_internal_id_1"
response = api_client.get(
url + f"?request_id={new_request_id}", headers=auth_header
)
assert response.status_code == status.HTTP_200_OK
resp = response.json()
assert len(resp["items"]) == 0

privacy_request.id = new_request_id
privacy_request.save(db)

response = api_client.get(
url + f"?request_id={new_request_id}", headers=auth_header
url + f"?request_id={privacy_request.id}",
headers=auth_header,
)
assert response.status_code == status.HTTP_200_OK
resp = response.json()
Expand Down Expand Up @@ -1080,8 +1083,13 @@ def test_get_privacy_requests_csv_format(
privacy_request.status = PrivacyRequestStatus.approved
privacy_request.reviewed_by = user.id
privacy_request.reviewed_at = reviewed_at
TEST_EMAIL = "[email protected]"
TEST_PHONE = "+1 234 567 8910"
privacy_request.cache_identity(
{"email": "[email protected]", "phone_number": "111-111-1111"}
{
"email": TEST_EMAIL,
"phone_number": TEST_PHONE,
}
)
privacy_request.save(db)

Expand All @@ -1102,8 +1110,8 @@ def test_get_privacy_requests_csv_format(
first_row = next(csv_file)
assert parse(first_row["Time received"], ignoretz=True) == created_at
assert ast.literal_eval(first_row["Subject identity"]) == {
"email": "[email protected]",
"phone_number": "111-111-1111",
"email": TEST_EMAIL,
"phone_number": TEST_PHONE,
}
assert first_row["Policy key"] == "example_access_request_policy"
assert first_row["Request status"] == "approved"
Expand Down
18 changes: 16 additions & 2 deletions tests/fixtures/application_fixtures.py
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@
)
from fidesops.models.privacy_request import PrivacyRequest, PrivacyRequestStatus
from fidesops.models.storage import ResponseFormat, StorageConfig
from fidesops.schemas.redis_cache import PrivacyRequestIdentity
from fidesops.schemas.storage.storage import (
FileNaming,
StorageDetails,
Expand Down Expand Up @@ -772,10 +773,18 @@ def _create_privacy_request_for_policy(
microsecond=393185,
tzinfo=timezone.utc,
)
return PrivacyRequest.create(
pr = PrivacyRequest.create(
db=db,
data=data,
)
pr.persist_identity(
db=db,
identity=PrivacyRequestIdentity(
email="[email protected]",
phone_number="+1 234 567 8910",
),
)
return pr


@pytest.fixture(scope="function")
Expand Down Expand Up @@ -839,7 +848,12 @@ def succeeded_privacy_request(cache, db: Session, policy: Policy) -> PrivacyRequ
"client_id": policy.client_id,
},
)
pr.cache_identity({"email": "[email protected]"})
identity_kwargs = {"email": "[email protected]"}
pr.cache_identity(identity_kwargs)
pr.persist_identity(
Comment on lines +852 to +853
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Policy webhooks can have derived_identities returned. Neither PrivacyRequest.trigger_policy_webhook nor privacy_request_endpoints > resume_privacy_request which both update the identity graph, persist the data to the database, they only add it to the redis cache.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be useful to have a method that both persists the identity in the cache and in the database at the same time? I'd like to avoid these mismatches we have now where they're both being updated in some places and not others.

Copy link
Contributor Author

@seanpreston seanpreston Jul 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be useful to have a method that both persists the identity in the cache and in the database at the same time?

I'm torn here. On the one hand it's nice to have consistency, on the other, as you rightly suggest above, it means we'll need to be plumbing the DB connection in more places. I'm not sure if it's better to have the execution update the cache with identity data at the very start before the traversal, such that we can guarantee the traversal will always use what was provided by the user on privacy request creation. That way the internals can still use the cache and benefit from the speed, and less refactoring is required. What do you think?

Copy link
Contributor Author

@seanpreston seanpreston Jul 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Policy webhooks can have derived_identities returned. Neither PrivacyRequest.trigger_policy_webhook nor privacy_request_endpoints > resume_privacy_request which both update the identity graph, persist the data to the database, they only add it to the redis cache.

We should separate these concerns for now. The ProvidedIdentity is useful for facilitating request search based on the exact identity provided, we don't currently want to search based on derived identities, or return them into the UI, so should be careful when we update the ProvidedIdentity table as that's what will get displayed in the UI (and doesn't currently support anything beyond email and phone_number).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That way the internals can still use the cache and benefit from the speed, and less refactoring is required. What do you think?

Thinking about this more, I agree, it's in line with our original design, in that we query everything up front, build the graph, and execute it. We're not regularly querying the database as we execute the traversal which I think is good for performance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should separate these concerns for now. The ProvidedIdentity is useful for facilitating request search based on the exact identity provided, we don't currently want to search based on derived identities

OK, that makes sense

db=db,
identity=PrivacyRequestIdentity(**identity_kwargs),
)
yield pr
pr.delete(db)

Expand Down