This repository has been archived by the owner on Nov 30, 2022. It is now read-only.

Snowflake Query Execution [#73] #104

Merged: 51 commits, Dec 9, 2021

Conversation

@seanpreston (Contributor) commented Nov 24, 2021

Purpose

This PR contains query execution for the Snowflake external datastore.

Changes

  • Overrides the SQLQueryConfig class with a SnowflakeQueryConfig
  • Adds methods in the base SQLQueryConfig class to format variables that eventually end up in the query string
  • Implements those methods in the SnowflakeQueryConfig
  • Adds a dataset for our test Snowflake schema with data_type and length meta annotations
  • Adds an API test that runs an access request through Snowflake
  • Adds an API test that runs an erasure request through Snowflake

Left to do

  • Make similar query formatting changes for erasure handling in Snowflake
  • Add a separate test schema (Atlas uses the current one) on Snowflake for Fidesops
  • Configure test data programmatically for the new Snowflake test schema in the test fixtures
  • More thorough testing, e.g. specifically for the Variant type and erasure handling
  • Add type attribute into Snowflake dataset

Ticket

Closes #73

@seanpreston added the "run unsafe ci checks" label (Triggers running of unsafe CI checks) Nov 24, 2021
@seanpreston changed the title from "Seanpreston 73 snowflake execution" to "Snowflake Query Execution [#73]" Nov 25, 2021
@pytest.fixture
def example_datasets() -> List[Dict]:
    example_datasets = []
    example_filenames = [
        "data/dataset/postgres_example_test_dataset.yml",
        "data/dataset/mongo_example_test_dataset.yml",
        "data/dataset/snowflake_example_test_dataset.yml",
Contributor Author

My plan from here out is to avoid adding any subsequent datasets to this fixture, preferring to have them loaded in by a dedicated fixture for each one.

@@ -129,7 +131,7 @@ def test_create_and_process_access_request(

    policy.delete(db=db)
    pr.delete(db=db)
    db.expunge_all()
Contributor Author

This line was causing downstream failures in the test teardown where FKs also needed to be cleaned up (specifically the ConnectionConfig -> DatasetConfig relation). After talking to @pattisdr I discovered this line was here as a forcing function to ensure that pr is no longer in the database session when the execution logs are checked for their .privacy_request_id attribute, a step to ensure that value is not cleared when pr is deleted.

I've opted to manually check whether pr is in the session in the line below, as this performs the same function.

from fidesops.util import logger
from fidesops.util.logger import NotPii, MASKED


def test_logger_masks_pii() -> None:
@pytest.fixture(scope="function")
def toggle_testing_envvar() -> None:
Contributor Author

This is useful for explicitly testing the places where production code reads os.environ["TESTING"].
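The body of toggle_testing_envvar is not shown in this hunk, so the following is only a rough sketch of the idea, written as a plain context manager rather than a pytest fixture. The name toggled_testing_envvar and its value parameter are hypothetical; the point is simply that the original TESTING value is restored afterwards, even if the code under test raises.

```python
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def toggled_testing_envvar(value: str = "False") -> Iterator[None]:
    """Temporarily set TESTING to `value`, then restore the previous state.

    Hypothetical helper for illustration; not the fixture from the PR.
    """
    original = os.environ.get("TESTING")
    os.environ["TESTING"] = value
    try:
        yield
    finally:
        # Restore the prior state, including the "unset" case.
        if original is None:
            os.environ.pop("TESTING", None)
        else:
            os.environ["TESTING"] = original


# Usage: exercise a production-mode code path without leaking state.
os.environ["TESTING"] = "True"
with toggled_testing_envvar("False"):
    inside = os.environ["TESTING"]
after = os.environ["TESTING"]
```

A pytest fixture could wrap the same set, yield, restore sequence.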

@@ -38,8 +41,7 @@ def migrate_test_db() -> None:
 def db() -> Generator:
     """Return a connection to the test DB"""
     # Create the test DB engine
     ## This asserts that TESTING==True
-    assert os.getenv("TESTING", False)
+    assert os.getenv("TESTING") == "True"
Contributor Author

cc @stevenbenjamin: I've updated these checks to look like this, as that's the correct convention according to pylint. Environment variables are always strings, so passing a bool as the default arg is dangerous: any non-empty value, including "False", passes the old check. Hopefully this syntax is clearer too.
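To make the difference concrete, here is a minimal sketch comparing the two styles of check. The function names is_testing_loose and is_testing_strict are hypothetical and exist only for this illustration.

```python
import os


def is_testing_loose() -> bool:
    # Old-style check: any non-empty string is truthy, including "False".
    return bool(os.getenv("TESTING", False))


def is_testing_strict() -> bool:
    # New-style check: only the exact string "True" passes.
    return os.getenv("TESTING") == "True"


os.environ["TESTING"] = "False"
print(is_testing_loose())   # True, because "False" is a non-empty string
print(is_testing_strict())  # False
```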

    operator: str,
) -> str:
    """Returns field names in clauses surrounded by quotation marks as required by Snowflake syntax."""
    return f'"{field_name}" {operator} (:{field_name})'
@seanpreston (Contributor Author), Dec 8, 2021

Adding the () around :{field_name} is an interesting difference from how the more vanilla SQL connectors work. It could be that our Postgres and MySQL connectors currently accept IN clauses without parentheses, and if so we should add them, because the parenthesized form is the standard SQL syntax.

Contributor

I don't think this is a Snowflake thing, but an idiosyncrasy of the way the dialect is realized in SQLAlchemy. The other dialects we've used seem to add the '()' around tuple IN values automatically.
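As a rough illustration of the clause format discussed in this thread, the sketch below reuses the f-string from the diff. The function names format_snowflake_clause and build_where are hypothetical (the real method name is cut off in the hunk), and this only builds the SQL text; it does not show how SQLAlchemy binds tuple values.

```python
from typing import Dict


def format_snowflake_clause(field_name: str, operator: str) -> str:
    """Quote the field name and wrap the bind parameter in parentheses."""
    return f'"{field_name}" {operator} (:{field_name})'


def build_where(filters: Dict[str, str]) -> str:
    """Join one clause per field with AND (hypothetical helper)."""
    return " AND ".join(
        format_snowflake_clause(field, op) for field, op in filters.items()
    )


print(format_snowflake_clause("email", "IN"))
# "email" IN (:email)
print(build_where({"email": "IN", "id": "="}))
# "email" IN (:email) AND "id" = (:id)
```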

@pattisdr (Contributor) left a comment

This looks good @seanpreston. I'm not merging yet in case there's anything lingering you want to address.

The code was organized and readable, and it was clear why certain decisions were made. I also ran access requests locally, outside of our regular unit tests; this behaved as expected, and it was nice to see the queries being executed in testing mode.

src/fidesops/models/connectionconfig.py (resolved)
tests/api/v1/endpoints/test_dataset_endpoints.py (outdated, resolved)
tests/fixtures.py (outdated, resolved)
tests/fixtures.py (outdated, resolved)
@@ -256,6 +262,45 @@ def snowflake_connection_config(db: Session) -> Generator:
    connection_config.delete(db)


@pytest.fixture(scope="function")
def snowflake_connection_config(
Contributor

The naming convention is confusing now: snowflake_connection_config is based off of safe_snowflake_connection_config but has write permissions, and the separate snowflake_read_connection_config has read. I was confused at first because I incorrectly assumed safe_snowflake_connection_config meant read.

@seanpreston (Contributor Author), Dec 9, 2021

Good spot, I'll tidy this up in some way. Maybe unsafe_snowflake_connection_config or just snowflake_connection_config and it always has secrets if available. The read only config isn't being used anywhere since that property isn't enforced in the traversal.

Contributor

@seanpreston I added PR #49 so that erasures are skipped if a node's connection config doesn't have write access.

@@ -0,0 +1,229 @@
dataset:
  - fides_key: snowflake_example_test_dataset
    name: Snowflake Example Test Dataset
Contributor

As we're thinking about test fixtures for these different datasources, it would be nice to create some interdependency between them, like how the mongo example test dataset relies on a field in the postgres example test dataset. I don't think our datasets should all be completely independent datastores.

Contributor

It would be nice to have the Postman collection be able to make requests against our fidesops Snowflake db, but the way I understand it, we wipe and repopulate it from these unit tests?

Contributor Author

> we wipe and repopulate it from these unit tests?

Not quite — we're only wiping and repopulating the data we use in erasures. Everything else is part of a copy of the dataset Atlas uses for Snowflake. There are some frustrating minor differences here, for example, in Snowflake the collection is order and in Postgres it's orders plural.

I'll factor this into the testing design for sure! Thanks

@pattisdr (Contributor) left a comment

looks ready to me @seanpreston

@pattisdr pattisdr merged commit ede6f93 into main Dec 9, 2021
@pattisdr pattisdr deleted the seanpreston-73-snowflake-execution branch February 9, 2022 19:08
sanders41 pushed a commit that referenced this pull request Sep 22, 2022
Labels: run unsafe ci checks (Triggers running of unsafe CI checks)
Linked issue: Query execution for Snowflake
4 participants