This repository has been archived by the owner on Nov 30, 2022. It is now read-only.

Snowflake Query Execution [#73] #104

Merged
merged 51 commits into from
Dec 9, 2021
07bed10
fixtures for Snowflake datasets
Nov 22, 2021
567a1ce
update key
Nov 23, 2021
18620e2
use snowflake connector for snowflake dbs
Nov 23, 2021
6da4210
adds fixture for integration config
Nov 24, 2021
aeeb0a9
set secrets on the Snowflake connection config if available
Nov 24, 2021
716153d
add initial test for Snowflake privacy request
Nov 24, 2021
3c1fdfe
extend SQLQueryConfig to cater to Snowflake dialect
Nov 24, 2021
fe91118
Merge branch 'main' into seanpreston-73-snowflake-execution
Nov 24, 2021
a9267cb
adds docstrings
Nov 24, 2021
27c4084
add snowflake test dataset
Nov 24, 2021
1334cbd
fix typechecks
Nov 24, 2021
95d32df
fix expected return counts
Nov 24, 2021
50ba838
update statement formatting for snowflake
Dec 4, 2021
40231c2
fix key, add secrets
Dec 4, 2021
0dc3094
update customer name
Dec 4, 2021
527f81f
conflict resolution
Dec 6, 2021
ef26eb8
Merge branch 'main' into seanpreston-73-snowflake-execution
Dec 6, 2021
9190fc9
adds docstring
Dec 6, 2021
e11338b
ensure deleting connection configs removes related datasets
Dec 6, 2021
2e10b37
handle snowflake
Dec 6, 2021
da53475
update methodology for testing datasets, delete privacy request once …
Dec 6, 2021
1b0b4c9
fix return types
Dec 6, 2021
6452245
use correct mark
Dec 6, 2021
793d39a
only give snowflake secrets in external tests
Dec 7, 2021
23a5b89
remove expunge_all in favour of checking session membership
Dec 7, 2021
8fd5872
fix collection name
Dec 7, 2021
353053a
adds snowflake update statement generation and IN handling
Dec 7, 2021
869df1d
add hide_parameters option thats set by env var
Dec 8, 2021
2fe7a91
listen to self.hide_parameters
Dec 8, 2021
a6140b4
add another change in the logger to check for env var
Dec 8, 2021
2496783
add test for snowflake erasure on a specific category
Dec 8, 2021
689a44f
delete data from snowflake after we're finished with it
Dec 8, 2021
00b99ba
group snowflake resources into a fixture
Dec 8, 2021
43cd8c2
use correct formatting in queries and identity submissions
Dec 8, 2021
cace39d
Merge branch 'main' into seanpreston-73-snowflake-execution
Dec 8, 2021
7511980
format input data correctly
Dec 8, 2021
ce09591
add datatypes
Dec 8, 2021
5c20fff
fix refs to os.getenv
Dec 8, 2021
cc64399
adds docstrings
Dec 8, 2021
92981ff
remove parens
Dec 8, 2021
2b2efe2
toggle env var to check PII masking
Dec 8, 2021
272d733
remove extra whitespace
Dec 8, 2021
b3224d2
add scope to fixture
Dec 8, 2021
4acfea3
variant support
Dec 8, 2021
9285cd3
remove expunge_all in new file
Dec 8, 2021
043b032
remove copypasta
Dec 9, 2021
def2484
add docstring
Dec 9, 2021
8624fb3
add config.is_test_mode, remove superfluous name
Dec 9, 2021
1625ed1
remove circular dependency
Dec 9, 2021
c023f63
remove unused imports and fixtures
Dec 9, 2021
3241ed9
correct var names, allow fixture to proceed without secrets
Dec 9, 2021
229 changes: 229 additions & 0 deletions data/dataset/snowflake_example_test_dataset.yml
@@ -0,0 +1,229 @@
dataset:
  - fides_key: snowflake_example_test_dataset
    name: Snowflake Example Test Dataset
Contributor

As we think about test fixtures for these different datasources, it would be nice to create some interdependency between them, like how the mongo example test dataset relies on a field in the postgres example test dataset. I don't think our datasets should all be completely independent datastores.

Contributor

It would be nice to have the postman collection be able to make requests against our fidesops snowflake db, but the way I understand it, we wipe and repopulate it from these unit tests?

Contributor Author

we wipe and repopulate it from these unit tests?

Not quite. We're only wiping and repopulating the data we use in erasures. Everything else is part of a copy of the dataset Atlas uses for Snowflake. There are some frustrating minor differences here: for example, in Snowflake the collection is `order`, and in Postgres it's `orders` (plural).

I'll factor this into the testing design for sure! Thanks

    description: Example of a Snowflake dataset containing a variety of related tables like customers, products, addresses, etc.
    collections:
      - name: address
        fields:
          - name: city
            data_categories: [user.provided.identifiable.contact.city]
          - name: house
            data_categories: [user.provided.identifiable.contact.street]
          - name: id
            data_categories: [system.operations]
            fidesops_meta:
              primary_key: True
          - name: state
            data_categories: [user.provided.identifiable.contact.state]
          - name: street
            data_categories: [user.provided.identifiable.contact.street]
          - name: zip
            data_categories: [user.provided.identifiable.contact.postal_code]

      - name: customer
        fields:
          - name: address_id
            data_categories: [system.operations]
            fidesops_meta:
              references:
                - dataset: snowflake_example_test_dataset
                  field: address.id
                  direction: to
          - name: created
            data_categories: [system.operations]
          - name: email
            data_categories: [user.provided.identifiable.contact.email]
            fidesops_meta:
              identity: email
              data_type: string
          - name: id
            data_categories: [user.derived.identifiable.unique_id]
            fidesops_meta:
              primary_key: True
          - name: name
            data_categories: [user.provided.identifiable.name]
            fidesops_meta:
              data_type: string
              length: 40
          - name: variant_eg
            # We use this data category so we can target this column from
            # our Snowflake tests
            data_categories: [user.provided.identifiable.name]

      - name: employee
        fields:
          - name: address_id
            data_categories: [system.operations]
            fidesops_meta:
              references:
                - dataset: snowflake_example_test_dataset
                  field: address.id
                  direction: to
          - name: email
            data_categories: [user.provided.identifiable.contact.email]
            fidesops_meta:
              identity: email
              data_type: string
          - name: id
            data_categories: [user.derived.identifiable.unique_id]
            fidesops_meta:
              primary_key: True
          - name: name
            data_categories: [user.provided.identifiable.name]
            fidesops_meta:
              data_type: string

      - name: login
        fields:
          - name: customer_id
            data_categories: [user.derived.identifiable.unique_id]
            fidesops_meta:
              references:
                - dataset: snowflake_example_test_dataset
                  field: customer.id
                  direction: from
          - name: id
            data_categories: [system.operations]
            fidesops_meta:
              primary_key: True
          - name: time
            data_categories: [user.derived.nonidentifiable.sensor]

      - name: order
        fields:
          - name: customer_id
            data_categories: [user.derived.identifiable.unique_id]
            fidesops_meta:
              references:
                - dataset: snowflake_example_test_dataset
                  field: customer.id
                  direction: from
          - name: id
            data_categories: [system.operations]
            fidesops_meta:
              primary_key: True
          - name: shipping_address_id
            data_categories: [system.operations]
            fidesops_meta:
              references:
                - dataset: snowflake_example_test_dataset
                  field: address.id
                  direction: to

      # order_item
      - name: order_item
        fields:
          - name: order_id
            data_categories: [system.operations]
            fidesops_meta:
              references:
                - dataset: snowflake_example_test_dataset
                  field: order.id
                  direction: from
          - name: product_id
            data_categories: [system.operations]
            fidesops_meta:
              references:
                - dataset: snowflake_example_test_dataset
                  field: product.id
                  direction: to
          - name: quantity
            data_categories: [system.operations]

      - name: payment_card
        fields:
          - name: billing_address_id
            data_categories: [system.operations]
            fidesops_meta:
              references:
                - dataset: snowflake_example_test_dataset
                  field: address.id
                  direction: to
          - name: ccn
            data_categories: [user.provided.identifiable.financial.account_number]
          - name: code
            data_categories: [user.provided.identifiable.financial]
          - name: customer_id
            data_categories: [user.derived.identifiable.unique_id]
            fidesops_meta:
              references:
                - dataset: snowflake_example_test_dataset
                  field: customer.id
                  direction: from
          - name: id
            data_categories: [system.operations]
            fidesops_meta:
              primary_key: True
          - name: name
            data_categories: [user.provided.identifiable.financial]
          - name: preferred
            data_categories: [user.provided.nonidentifiable]

      - name: product
        fields:
          - name: id
            data_categories: [system.operations]
            fidesops_meta:
              primary_key: True
          - name: name
            data_categories: [system.operations]
          - name: price
            data_categories: [system.operations]

      - name: report
        fields:
          - name: email
            data_categories: [user.provided.identifiable.contact.email]
            fidesops_meta:
              identity: email
              data_type: string
          - name: id
            data_categories: [system.operations]
            fidesops_meta:
              primary_key: True
          - name: month
            data_categories: [system.operations]
          - name: name
            data_categories: [system.operations]
          - name: total_visits
            data_categories: [system.operations]
          - name: year
            data_categories: [system.operations]

      - name: service_request
        fields:
          - name: alt_email
            data_categories: [user.provided.identifiable.contact.email]
            fidesops_meta:
              identity: email
              data_type: string
          - name: closed
            data_categories: [system.operations]
          - name: email
            data_categories: [system.operations]
            fidesops_meta:
              identity: email
              data_type: string
          - name: employee_id
            data_categories: [user.derived.identifiable.unique_id]
            fidesops_meta:
              references:
                - dataset: snowflake_example_test_dataset
                  field: employee.id
                  direction: from
          - name: id
            data_categories: [system.operations]
            fidesops_meta:
              primary_key: True
          - name: opened
            data_categories: [system.operations]

      - name: visit
        fields:
          - name: email
            data_categories: [user.provided.identifiable.contact.email]
            fidesops_meta:
              identity: email
              data_type: string
          - name: last_visit
            data_categories: [system.operations]
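
The `references` entries in the dataset above link fields across collections. As a rough illustration only, the sketch below walks a hand-copied subset of the dataset, written as a plain Python dict, and lists each reference. The `DATASET` literal and the `reference_edges` helper are hypothetical and are not part of fidesops.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical, hand-copied subset of snowflake_example_test_dataset.
DATASET: Dict[str, Any] = {
    "fides_key": "snowflake_example_test_dataset",
    "collections": [
        {
            "name": "customer",
            "fields": [
                {
                    "name": "address_id",
                    "fidesops_meta": {
                        "references": [
                            {
                                "dataset": "snowflake_example_test_dataset",
                                "field": "address.id",
                                "direction": "to",
                            }
                        ]
                    },
                },
                {"name": "id", "fidesops_meta": {"primary_key": True}},
            ],
        },
        {
            "name": "login",
            "fields": [
                {
                    "name": "customer_id",
                    "fidesops_meta": {
                        "references": [
                            {
                                "dataset": "snowflake_example_test_dataset",
                                "field": "customer.id",
                                "direction": "from",
                            }
                        ]
                    },
                },
            ],
        },
    ],
}


def reference_edges(dataset: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return (source field, direction, referenced field) for every reference."""
    edges = []
    for collection in dataset["collections"]:
        for field in collection["fields"]:
            # Fields without fidesops_meta or references contribute no edges.
            meta = field.get("fidesops_meta", {})
            for ref in meta.get("references", []):
                source = f"{collection['name']}.{field['name']}"
                edges.append((source, ref["direction"], ref["field"]))
    return edges


for edge in reference_edges(DATASET):
    print(edge)
```

Listing the edges this way makes it easier to see which collections are reachable from an identity field such as `customer.email`.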
2 changes: 2 additions & 0 deletions src/fidesops/core/config.py
@@ -174,6 +174,8 @@ class FidesopsConfig(FidesSettings):
     security: SecuritySettings
     execution: ExecutionSettings

+    is_test_mode: bool = os.getenv("TESTING") == "True"
+
     class Config:  # pylint: disable=C0115
         case_sensitive = True

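
The `is_test_mode` default compares the `TESTING` environment variable against the exact string `"True"`. The sketch below reproduces only that comparison in a standalone function, to show which values enable test mode. The function name `parse_test_mode` and its `environ` parameter are assumptions made for illustration.

```python
import os
from typing import Mapping


def parse_test_mode(environ: Mapping[str, str] = os.environ) -> bool:
    """Same comparison as in the diff: only the exact string "True" counts."""
    return environ.get("TESTING") == "True"


# The comparison is case-sensitive and does not accept "1" or "yes".
for value in ("True", "true", "1", ""):
    print(repr(value), parse_test_mode({"TESTING": value}))

# An unset variable also leaves test mode off.
print("unset", parse_test_mode({}))
```

A consequence worth keeping in mind is that `TESTING=true` or `TESTING=1` would leave test mode disabled.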
3 changes: 1 addition & 2 deletions src/fidesops/db/session.py
@@ -1,5 +1,4 @@
 import logging
-import os
 from typing import Optional

 from sqlalchemy import create_engine
@@ -20,7 +19,7 @@ def get_db_engine(database_uri: Optional[str] = None) -> Engine:
     """
     if database_uri is None:
         # Don't override any database_uri explicitly passed in
-        if os.getenv("TESTING"):
+        if config.is_test_mode:
             database_uri = config.database.SQLALCHEMY_TEST_DATABASE_URI
         else:
             database_uri = config.database.SQLALCHEMY_DATABASE_URI
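
This hunk swaps a truthiness check on `os.getenv("TESTING")` for the `config.is_test_mode` flag. A minimal sketch of the URI fallback, and of how the two checks differ, follows. The URIs and the names `old_check`, `new_check`, and `resolve_database_uri` are hypothetical, and the sketch returns a string rather than building an engine.

```python
from typing import Optional

# Hypothetical URIs, used only for illustration.
TEST_URI = "postgresql://localhost/test_db"
DEFAULT_URI = "postgresql://localhost/app_db"


def old_check(testing: Optional[str]) -> bool:
    """Removed line: any non-empty TESTING value is truthy."""
    return bool(testing)


def new_check(testing: Optional[str]) -> bool:
    """Added line, via config.is_test_mode: only "True" counts."""
    return testing == "True"


def resolve_database_uri(database_uri: Optional[str], is_test_mode: bool) -> str:
    """Fall back to a configured URI only when none was passed in."""
    if database_uri is None:
        return TEST_URI if is_test_mode else DEFAULT_URI
    return database_uri


# TESTING="False" selected the test database under the old check only.
print(resolve_database_uri(None, old_check("False")))
print(resolve_database_uri(None, new_check("False")))
# An explicit URI is never overridden.
print(resolve_database_uri("sqlite://", new_check("True")))
```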
13 changes: 12 additions & 1 deletion src/fidesops/models/connectionconfig.py
@@ -1,5 +1,6 @@
 import enum
 from datetime import datetime
+from typing import Optional

 from sqlalchemy import (
     Column,
@@ -18,7 +19,10 @@


 from fidesops.core.config import config
-from fidesops.db.base_class import Base, JSONTypeOverride
+from fidesops.db.base_class import (
+    Base,
+    JSONTypeOverride,
+)


 class TestStatus(enum.Enum):
@@ -88,3 +92,10 @@ def update_test_status(self, test_status: TestStatus, db: Session) -> None:
         self.last_test_timestamp = datetime.now()
         self.last_test_succeeded = test_status == TestStatus.succeeded
         self.save(db)
+
+    def delete(self, db: Session) -> Optional[Base]:
+        """Hard deletes datastores that map this ConnectionConfig."""
+        for dataset in self.datasets:
+            dataset.delete(db=db)
+
+        return super().delete(db=db)
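
The `delete` override removes each related dataset before deleting the connection config itself. The sketch below shows the same ordering with in-memory stand-ins. The `Record`, `Dataset`, and `Connection` classes, and the set used as a "database", are all hypothetical and do not reflect the SQLAlchemy models in fidesops.

```python
from typing import List, Optional, Set


class Record:
    """Hypothetical stand-in for a persisted model with a hard delete."""

    def __init__(self, key: str) -> None:
        self.key = key

    def delete(self, db: Set[str]) -> Optional["Record"]:
        # Remove this record's key from the in-memory "database".
        db.discard(self.key)
        return self


class Dataset(Record):
    """Child record owned by a Connection."""


class Connection(Record):
    def __init__(self, key: str, datasets: List[Dataset]) -> None:
        super().__init__(key)
        self.datasets = datasets

    def delete(self, db: Set[str]) -> Optional[Record]:
        # Delete the dependent datasets first, then the connection itself.
        for dataset in self.datasets:
            dataset.delete(db=db)
        return super().delete(db=db)


db = {"conn", "ds_a", "ds_b", "unrelated"}
conn = Connection("conn", [Dataset("ds_a"), Dataset("ds_b")])
conn.delete(db=db)
print(db)
```

Only the connection and its own datasets are removed; the unrelated key stays in the set.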
6 changes: 6 additions & 0 deletions src/fidesops/service/connectors/base_connector.py
@@ -2,6 +2,7 @@
 from abc import abstractmethod, ABC
 from typing import Any, Dict, List, Optional, TypeVar, Generic

+from fidesops.core.config import config
 from fidesops.graph.traversal import Row, TraversalNode
 from fidesops.models.connectionconfig import ConnectionConfig, TestStatus
 from fidesops.models.policy import Policy
@@ -29,6 +30,11 @@ class BaseConnector(Generic[DB_CONNECTOR_TYPE], ABC):

     def __init__(self, configuration: ConnectionConfig):
         self.configuration = configuration
+        # If Fidesops is running in test mode, it's OK to show
+        # parameters inside queries for debugging purposes. By
+        # default we assume that Fidesops is not running in test
+        # mode.
+        self.hide_parameters = not config.is_test_mode
         self.db_client: Optional[DB_CONNECTOR_TYPE] = None

     @abstractmethod
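
The `hide_parameters` attribute is the inverse of the test-mode flag, so bound query parameters are hidden unless Fidesops runs in test mode. How the connectors consume the attribute is not shown in this excerpt. SQLAlchemy's `create_engine` does accept a `hide_parameters` flag, but the sketch below avoids any database and only illustrates the intent with a hypothetical `describe_query` helper.

```python
from typing import Any, Dict


def describe_query(statement: str, params: Dict[str, Any], hide_parameters: bool) -> str:
    """Build a log line, omitting bound parameters when asked to (hypothetical helper)."""
    if hide_parameters:
        return f"{statement} [parameters hidden]"
    return f"{statement} [parameters: {params}]"


statement = "SELECT * FROM customer WHERE email = :email"
params = {"email": "jane@example.com"}

# Outside test mode, hide_parameters is True and the email never reaches the log.
is_test_mode = False
print(describe_query(statement, params, hide_parameters=not is_test_mode))

# In test mode, the parameters are shown for debugging.
is_test_mode = True
print(describe_query(statement, params, hide_parameters=not is_test_mode))
```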