This repository has been archived by the owner on Nov 30, 2022. It is now read-only.

Fix Fields Affected on Execution Logs [#144] (#236)
* Log just potentially affected fields on the execution logs, rather than all of the fields.

* Add test asserting rules with overlapping data categories don't produce duplicate fields in log.

* Update reporting docs to correct an inaccuracy: queries are not logged in the execution logs; we just have the status of the request and the potentially affected fields.

- Also have the examples show the in_processing/complete execution logs instead of pending execution logs, because we don't create pending execution logs for an individual collection anymore.
- Privacy requests can also have a "paused" status due to pre-execution policy webhooks

* Add missing backticks around collection names.
pattisdr authored Mar 2, 2022
1 parent ff43509 commit 4202d24
Showing 4 changed files with 341 additions and 47 deletions.
79 changes: 42 additions & 37 deletions docs/fidesops/docs/guides/reporting.md
@@ -3,16 +3,16 @@
In this section we'll cover:

- How to check the high-level status of your privacy requests
- How to get more detailed execution logs of queries that were run as part of your privacy requests.
- How to get more detailed execution logs of collections and fields that were potentially affected as part of your privacy request.


Take me directly to [API docs](/fidesops/api#operations-Privacy_Requests-get_request_status_api_v1_privacy_request_get).


## Overview

The reporting feature allows you to fetch information about privacy requests. You can opt for high-level or more detailed
information about the individual queries executed internally.
The reporting feature allows you to fetch information about privacy requests. You can opt for high-level status
information, or get more detailed information about the status of the requests on each of your collections.


## High-level Status
@@ -58,7 +58,7 @@ Use the following query params to further filter your privacy requests. Filters
`GET api/v1/privacy-request?created_gt=2021-10-01&created_lt=2021-10-05&status=pending`

- id
- status (one of `in_processing`, `pending`, `complete`, or `error`)
- status (one of `in_processing`, `pending`, `paused`, `complete`, or `error`)
- created_lt
- created_gt
- started_lt
@@ -68,6 +68,7 @@ Use the following query params to further filter your privacy requests. Filters
- errored_lt
- errored_gt
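
A minimal sketch of how the filtered URL above could be assembled from Python. The helper name `build_privacy_request_url` and the `http://localhost:8080` host are hypothetical and not part of fidesops; only the endpoint path, the filter names, and the status values come from the docs above.

```python
from urllib.parse import urlencode

# Status values listed in the docs above.
VALID_STATUSES = {"in_processing", "pending", "paused", "complete", "error"}


def build_privacy_request_url(base_url: str, **filters: str) -> str:
    """Return a GET URL for api/v1/privacy-request with the given filters.

    Hypothetical helper for illustration; it only validates `status`.
    """
    status = filters.get("status")
    if status is not None and status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    query = urlencode(filters)  # keeps the filters in the order given
    url = f"{base_url.rstrip('/')}/api/v1/privacy-request"
    return f"{url}?{query}" if query else url


if __name__ == "__main__":
    print(
        build_privacy_request_url(
            "http://localhost:8080",  # hypothetical host
            created_gt="2021-10-01",
            created_lt="2021-10-05",
            status="pending",
        )
    )
```

Validating `status` on the client side is a design choice of this sketch; the server remains the source of truth for which filters it accepts.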


## View All Privacy Request Logs

To view all the execution logs for a Privacy Request, visit `/api/v1/privacy-request/{privacy_request_id}/logs`.
@@ -77,60 +78,64 @@ Check out the [API docs here](/fidesops/api#operations-Privacy_Requests-get_requ

## View Individual Privacy Request Log Details

Use the `verbose` query param to see more details about individual queries run as part of the Privacy Request along
with individual statuses.
Use the `verbose` query param to see more details about individual collections visited as part of the Privacy Request along
with individual statuses. Individual collection statuses include `in_processing`, `retrying`, `complete` or `error`.
You may see multiple logs for each collection as they reach different steps in the lifecycle.

`verbose` will embed a “results” key in the response, with execution logs grouped by dataset name. In the example below,
we have two datasets: `my-mongo-db` and `my-postgres-db`. There is one execution log for my-mongo-db and two execution
logs for my-postgres-db. The embedded execution logs are automatically truncated at 50 logs, so to view the entire
list of logs, visit the execution logs endpoint separately.
we have two datasets: `my-mongo-db` and `my-postgres-db`. There are two execution logs for `my-mongo-db` (when the `flights`
collection is starting execution and when the `flights` collection has finished) and two execution
logs for `my-postgres-db` (when the `order` collection is starting and finishing execution). `fields_affected` are the fields
that were potentially returned or masked based on the Rules you've specified on the Policy. The embedded execution logs
are automatically truncated at 50 logs, so to view the entire list of logs, visit the execution logs endpoint separately.
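
Because a collection can appear several times in `results` as it moves through its lifecycle, a client usually wants the most recent log per collection. The following is a rough sketch of that reduction. The function `latest_status_by_collection` and the trimmed sample payload are hypothetical, and the sketch assumes `updated_at` is always an ISO 8601 timestamp as in the documented response.

```python
from datetime import datetime
from typing import Any, Dict, Tuple

# Trimmed, hypothetical privacy request in the documented "verbose" shape.
SAMPLE_REQUEST: Dict[str, Any] = {
    "id": "pri_example",
    "status": "complete",
    "results": {
        "my-mongo-db": [
            {
                "collection_name": "flights",
                "fields_affected": [],
                "status": "in_processing",
                "updated_at": "2022-02-28T16:38:04.668513+00:00",
            },
            {
                "collection_name": "flights",
                "fields_affected": [],
                "status": "complete",
                "updated_at": "2022-02-28T16:38:04.727094+00:00",
            },
        ]
    },
}


def latest_status_by_collection(
    privacy_request: Dict[str, Any]
) -> Dict[Tuple[str, str], str]:
    """Map (dataset, collection) to the status of its newest execution log."""
    latest: Dict[Tuple[str, str], Dict[str, Any]] = {}
    for dataset, logs in privacy_request.get("results", {}).items():
        for log in logs:
            key = (dataset, log["collection_name"])
            seen = latest.get(key)
            # Compare parsed timestamps rather than raw strings.
            if seen is None or datetime.fromisoformat(
                log["updated_at"]
            ) > datetime.fromisoformat(seen["updated_at"]):
                latest[key] = log
    return {key: log["status"] for key, log in latest.items()}


if __name__ == "__main__":
    print(latest_status_by_collection(SAMPLE_REQUEST))
```

Since the embedded logs are truncated at 50, a summary built this way may be incomplete for large requests; the separate execution logs endpoint is the place to get the full list.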

`GET api/v1/privacy-request?verbose=True`
`GET api/v1/privacy-request?id={privacy_request_id}&verbose=True`

```json
{
"items": [
{
"id": "pri_5f4feff5-fb60-4286-82bd-7e0748ce90ac",
"created_at": "2021-10-04T17:36:32.223287+00:00",
"started_processing_at": "2021-10-04T17:36:37.248880+00:00",
"finished_processing_at": "2021-10-04T17:36:37.263121+00:00",
"status": "pending",
"id": "pri_2e0655c3-7a76-425e-8c4c-52fee32ce14b",
"created_at": "2022-02-28T16:38:03.878898+00:00",
"started_processing_at": "2022-02-28T16:38:04.021763+00:00",
"finished_processing_at": "2022-02-28T16:38:06.211547+00:00",
"status": "complete",
"external_id": null,
"results": {
"my-mongo-db": [
{
"collection_name": "order",
"collection_name": "flights",
"fields_affected": [],
"message": "starting",
"action_type": "access",
"status": "in_processing",
"updated_at": "2022-02-28T16:38:04.668513+00:00"
},
{
"collection_name": "flights",
"fields_affected": [
{
"path": "order.customer_name",
"field_name": "name",
"path": "mongo_test:flights:passenger_information.full_name",
"field_name": "passenger_information.full_name",
"data_categories": [
"user.provided.identifiable.name"
]
}
],
"message": null,
"message": "success",
"action_type": "access",
"status": "pending",
"updated_at": "2021-10-05T18:24:55.570430+00:00"
"status": "complete",
"updated_at": "2022-02-28T16:38:04.727094+00:00"
}
],
"my-postgres-db": [
{
"collection_name": "order",
"fields_affected": [
{
"path": "order.customer_name",
"field_name": "name",
"data_categories": [
"user.provided.identifiable.name"
]
}
],
"message": null,
"fields_affected": [],
"message": "starting",
"action_type": "access",
"status": "pending",
"updated_at": "2021-10-05T18:24:39.953914+00:00"
"status": "in_processing",
"updated_at": "2022-02-28T16:38:04.668513+00:00"
},
{
"collection_name": "order",
@@ -142,11 +147,11 @@ list of logs, visit the execution logs endpoint separately.
"user.provided.identifiable.name"
]
}
],
"message": null,
],
"message": "success",
"action_type": "access",
"status": "pending",
"updated_at": "2021-10-05T18:24:45.240612+00:00"
"status": "complete",
"updated_at": "2022-02-28T16:39:04.668513+00:00"
}
]
}
63 changes: 54 additions & 9 deletions src/fidesops/task/graph_task.py
@@ -18,8 +18,9 @@
    TERMINATOR_ADDRESS,
    FieldPath,
    Field,
    FieldAddress,
)
from fidesops.graph.graph import Edge, DatasetGraph
from fidesops.graph.graph import Edge, DatasetGraph, Node
from fidesops.graph.traversal import TraversalNode, Traversal
from fidesops.models.connectionconfig import ConnectionConfig, AccessLevel
from fidesops.models.policy import ActionType, Policy
@@ -219,14 +220,9 @@ def log_end(
logger.info(f"Ending {self.resources.request.id}, {self.key}")
self.update_status(
"success",
[
{
"field_name": field.name,
"path": f"{self.traversal_node.node.address}:{field.name}",
"data_categories": field.data_categories,
}
for field in self.traversal_node.node.collection.field_dict.values()
],
build_affected_field_logs(
self.traversal_node.node, self.resources.policy, action_type
),
action_type,
ExecutionLogStatus.complete,
)
@@ -487,3 +483,52 @@ def termination_fn(*dependent_values: int) -> Tuple[int, ...]:
)

return erasure_update_map


def build_affected_field_logs(
    node: Node, policy: Policy, action_type: ActionType
) -> List[Dict[str, Any]]:
    """For a given node (collection), policy, and action_type (access or erasure) format all of the fields that
    were potentially touched to be stored in the ExecutionLogs for troubleshooting.
    :Example:
    [{
        "path": "dataset_name:collection_name:field_name",
        "field_name": "field_name",
        "data_categories": ["data_category_1", "data_category_2"]
    }]
    """

    targeted_field_paths: Dict[FieldAddress, str] = {}

    for rule in policy.rules:
        if rule.action_type != action_type:
            continue
        rule_categories: List[str] = rule.get_target_data_categories()
        if not rule_categories:
            continue

        collection_categories: Dict[
            str, List[FieldPath]
        ] = node.collection.field_paths_by_category
        for rule_cat in rule_categories:
            for collection_cat, field_paths in collection_categories.items():
                if collection_cat.startswith(rule_cat):
                    targeted_field_paths.update(
                        {
                            node.address.field_address(field_path): collection_cat
                            for field_path in field_paths
                        }
                    )

    ret: List[Dict[str, Any]] = []
    for field_address, data_categories in targeted_field_paths.items():
        ret.append(
            {
                "path": field_address.value,
                "field_name": field_address.field_path.string_path,
                "data_categories": [data_categories],
            }
        )

    return ret
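
The matching logic in `build_affected_field_logs` can be illustrated without the fidesops classes. The sketch below is a simplified stand-in, not the library's code: plain strings replace `FieldAddress` and `FieldPath`, the function `affected_fields` is hypothetical, and the category and field names are made-up sample data. It shows the two behaviours the commit message describes: only fields whose data category falls under a rule's target category are kept, and overlapping rule categories do not produce duplicate fields because results are keyed by field.

```python
from typing import Dict, List


def affected_fields(
    rule_categories: List[str], fields_by_category: Dict[str, List[str]]
) -> Dict[str, str]:
    """Return {field_path: data_category} for fields targeted by any rule.

    Simplified stand-in: a collection category matches when it starts with
    a rule category, mirroring the `startswith` check in the diff above.
    """
    targeted: Dict[str, str] = {}
    for rule_cat in rule_categories:
        for collection_cat, field_paths in fields_by_category.items():
            if collection_cat.startswith(rule_cat):
                for field_path in field_paths:
                    # Keyed by field, so a second matching rule overwrites
                    # the entry instead of adding a duplicate.
                    targeted[field_path] = collection_cat
    return targeted


if __name__ == "__main__":
    # Illustrative sample data only.
    fields = {
        "user.provided.identifiable.name": ["customer.name"],
        "user.provided.identifiable.contact.email": ["customer.email"],
        "system.operations": ["customer.id"],
    }
    # Two overlapping rule categories: the second is a child of the first.
    rules = ["user.provided.identifiable", "user.provided.identifiable.name"]
    for path, category in affected_fields(rules, fields).items():
        print(path, category)
```

In this sample, `customer.id` is left out because its category matches no rule, and `customer.name` appears once even though both rule categories match it.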