70 feat adds policy driven masking #108

eastandwestwind · 2021-11-30T16:12:01Z

Purpose

Adds handling for new length and data_type params on Field
Restricts supported masking strategies to the following:
[ ] HMAC
[ ] Hash
[ ] AES Encryption
[ x ] Null
[ x ] Random string rewrite
[ x ] Default string rewrite

Changes

Adds supported data_types to each masking strategy
Requires a data_type on Field for masking strategies that are not null_rewrite
Adds methods to each DataTypeConverter to truncate val to specified length if provided (we only truncate integer and string data types)

Checklist

Applicable documentation updated (guides, quickstart, postman collections, tutorial, fidesdemo)
Good unit test/integration test coverage

Ticket

Fixes #70

docs/fidesops/docs/tutorial/annotate_datasets.md

stevenbenjamin · 2021-12-01T19:14:41Z

src/fidesops/graph/config.py

@@ -219,6 +220,14 @@ def cast(self, value: Any) -> Optional[Any]:
        return value


+@dataclass
+class MaskingOverride:


As far as I can tell, this is only being used in iteration in the query config as a temporary storage of (datatype, length). Maybe this class is not needed?

With a class, we have the benefit of type hints (vs using a namedtuple), see https://stackoverflow.com/a/50038614/4957420
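For reference, a minimal sketch of what such a typed holder could look like. The `data_type` attribute appears elsewhere in this PR; the `length` field name and both field types are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MaskingOverride:
    """Per-field overrides consulted while building a masking query (sketch)."""

    data_type: Optional[str]  # assumed to be a type name such as "string"
    length: Optional[int]  # assumed maximum length; None means no limit


override = MaskingOverride(data_type="string", length=10)
print(override.data_type, override.length)
```

The annotations are visible to type checkers and editors, which is the benefit being described.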

stevenbenjamin · 2021-12-01T19:51:19Z

src/fidesops/graph/data_type.py

+        """Truncates value to given length"""
+        logger.warning(
+            "Length truncation not supported for ObjectId data_type. Using original masked value instead for update query."
+        )


Since all of these truncate methods, with the exception of string, are identical, they should probably be implemented in the base class and just get overridden for string.

good point!
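A rough sketch of that suggestion, assuming a simplified converter hierarchy. The class names mirror the ones discussed in this PR, but the bodies are illustrative and not the actual fidesops code.

```python
import logging
from abc import ABC
from typing import Any

logger = logging.getLogger(__name__)


class DataTypeConverter(ABC):
    """Hypothetical base converter; the real class has more methods."""

    def truncate(self, length: int, val: Any) -> Any:
        """Default behaviour: truncation is unsupported, return val unchanged."""
        logger.warning(
            "Length truncation not supported. Using original masked value instead."
        )
        return val


class StringTypeConverter(DataTypeConverter):
    def truncate(self, length: int, val: str) -> str:
        """Strings override the default and are cut to the given length."""
        return val[:length]


class FloatTypeConverter(DataTypeConverter):
    """Inherits the no-op truncate from the base class."""
```

With this shape, only the string converter carries its own `truncate`; every other converter picks up the warning-and-return default.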

stevenbenjamin · 2021-12-01T19:59:55Z

src/fidesops/service/connectors/query_config.py

+                        logger.warning(
+                            f"Unable to generate a query for field {field_name} due to: data_type of {masking_override.data_type} is not supported for the {strategy_config['strategy']} masking strategy"
+                        )
+                        continue


There are 4 separate "if not" conditions in this method, which makes it a bit hard to follow. Is there a way to simplify this?

for sure. Refactoring out some separate static methods to make this easier to read

stevenbenjamin · 2021-12-01T20:01:40Z

src/fidesops/service/connectors/query_config.py


        value_map: Dict[str, Any] = {}
        for rule, field_names in rule_to_collection_fields.items():
            strategy_config = rule.masking_strategy
-            strategy = get_strategy(


What about on #139: for rule, field_names in rule_to_collection_fields.items() if rule.masking_strategy: to get rid of the extra conditional on #141?

Hmm, not sure this works in Python. I would do this: https://stackoverflow.com/a/6981771/4957420 but then I'd still have to iterate over the new filtered variable.
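To illustrate the point: a bare `for ... in ... if ...:` statement is not valid Python; the filter has to live in a comprehension or generator expression, which is then iterated. The `Rule` class and sample data below are hypothetical stand-ins, not the fidesops types.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)  # frozen makes instances hashable, so they can be dict keys
class Rule:
    name: str
    masking_strategy: Optional[str] = None


rule_to_collection_fields: Dict[Rule, List[str]] = {
    Rule("erase_name", "null_rewrite"): ["name"],
    Rule("access_only"): ["email"],
}

# The generator expression filters lazily; the loop body no longer needs
# its own "if not rule.masking_strategy: continue" check.
with_strategy = (
    (rule, field_names)
    for rule, field_names in rule_to_collection_fields.items()
    if rule.masking_strategy
)

masked_fields: List[str] = []
for rule, field_names in with_strategy:
    masked_fields.extend(field_names)

print(masked_fields)
```

This removes the conditional from the loop body, at the cost of the extra filtered variable mentioned above.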

src/fidesops/service/connectors/query_config.py

stevenbenjamin · 2021-12-01T20:09:09Z

src/fidesops/service/masking/strategy/masking_strategy_hash.py

@@ -15,6 +15,7 @@


 HASH = "hash"
+SUPPORTED_DATA_TYPES = ["string"]


If we're going to have Supported Types, it should probably be an abstract method or maybe a default implementation on the parent MaskingStrategy class. I don't think this should be a module level variable.

Also, since we're repeatedly calling "x in SUPPORTED_DATA_TYPES", making SUPPORTED_DATA_TYPES a set (since we're only creating it once) will be faster.

I'm using the abstract method data_type_supported already, it's just the SUPPORTED_DATA_TYPES const that's specific to each module. I think a data_type_supported abstract method works better than a supported_data_types abstract method bc it leaves the specific details up to the module. i.e. for null masking, we always support the data type, so the caller of a supported_data_types abstract method would need to have a special case for null masking

That all makes sense, I do think, though, that SUPPORTED_DATA_TYPES should be inside the class and not accessible in any other way.

Makes sense, I'll update!
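A sketch of where this thread lands, under the assumption that `data_type_supported` is an instance method and that the supported types live in a private class attribute. The attribute name and the `NullMaskingStrategy` class name are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Optional


class MaskingStrategy(ABC):
    @abstractmethod
    def data_type_supported(self, data_type: Optional[str]) -> bool:
        """Whether this strategy can mask values of the given data type."""


class HashMaskingStrategy(MaskingStrategy):
    # Kept inside the class and built once; a frozenset gives constant-time
    # membership checks and cannot be mutated by callers.
    _SUPPORTED_DATA_TYPES = frozenset({"string"})

    def data_type_supported(self, data_type: Optional[str]) -> bool:
        return data_type in self._SUPPORTED_DATA_TYPES


class NullMaskingStrategy(MaskingStrategy):
    def data_type_supported(self, data_type: Optional[str]) -> bool:
        # Null masking applies to every data type, so no lookup is needed.
        return True
```

Callers only ask `data_type_supported(...)`, so the null strategy needs no special case on the calling side.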

stevenbenjamin · 2021-12-02T16:11:02Z

src/fidesops/graph/data_type.py

+        logger.warning(
+            f"Length truncation not supported for {T} data_type. Using original masked value instead for update query."
+        )
+        return val



When I test this, I get e.g.

DataType.float.value.truncate(1, 1) >> Length truncation not supported for ~T data_type. Using original masked value instead for update query.

The only way I've been able to access that in the superclass is via:

    def truncate(self, length: int, val: T) -> T:
        # this will be "float", "int", ...
        typename = self.__class__.__orig_bases__[0].__args__[0].__name__
        logger.warning(f"not supported ... {typename}....")

But from what I've read, this is not terribly stable and is version dependent. Maybe just a static log method that only takes the string type name as a parameter would be cleaner?

Ah right, thinking how I'd obtain type in the abstract base class here...

looks like it'd be a bit convoluted https://stackoverflow.com/questions/57706180/generict-base-class-how-to-get-type-of-t-from-within-instance so I'm going with diff wording for now in this message

FYI @pattisdr suggested instead of typename the much simpler

    def truncate(self, length: int, val: T) -> T:
        typename = self.__class__.__name__

Which will give you e.g. "IntTypeConverter"

ooo awesome idea thanks!
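A small sketch of why the original message printed `~T` and what the class-name approach yields instead. The `type_label` helper is hypothetical and exists only to keep the example short.

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class DataTypeConverter(Generic[T]):
    def type_label(self) -> str:
        """Name of the concrete converter class, usable in log messages."""
        return self.__class__.__name__


class IntTypeConverter(DataTypeConverter[int]):
    pass


# Formatting the TypeVar itself shows its repr, not the bound type.
print(f"{T}")
# The concrete class name is available on any instance.
print(IntTypeConverter().type_label())
```

The first line prints the type variable's own representation, which is what showed up in the log; the second prints `IntTypeConverter`.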

stevenbenjamin · 2021-12-02T16:47:14Z

src/fidesops/graph/data_type.py

@@ -49,6 +62,10 @@ def empty_value(self) -> int:
        """Empty int value"""
        return 0

+    def truncate(self, length: int, val: int) -> int:
+        """Truncates value to given length"""
+        return int(str(val)[:length])


Probably not the right way to truncate an int.
The reason for our truncation is so that we don't generate values that will fail on insert when we mask data because they won't fit in the target database. So, e.g. here are mysql types: https://dev.mysql.com/doc/refman/8.0/en/integer-types.html
These are truncated by bytes, so, e.g. if you had a mysql column defined as an INT and you tried to insert a value greater than 2147483647 it would fail.

I think this can be deferred (for the moment at least; it was never supported in Atlas AFAIK), but we shouldn't be truncating alphabetically in any case.

got it. Would byte truncation be diff for nosql vs sql?

I don't really know. The thing I posted was MySQL specific. All of the NoSQL databases I've worked with sidestep this issue by basically allowing you to write anything, e.g. Dynamo only uses BigDecimals in the Java implementation.
I think it should only matter with defined schema types. I don't think it's hugely common to define things like TINYINT columns (it probably mattered more when people were very concerned with field sizes), but it could happen. I don't know if this has ever been an issue with Atlas.
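To make the concern concrete, here is a small sketch contrasting the string-slice truncation from the diff with a byte-width range check. Both helper names are hypothetical, and the range check assumes two's-complement signed integers such as MySQL's 4-byte INT.

```python
def truncate_digits(val: int, length: int) -> int:
    """The string-slice approach from the diff above, kept for comparison."""
    return int(str(val)[:length])


def fits_signed(val: int, num_bytes: int) -> bool:
    """True if val fits in a two's-complement signed integer of num_bytes bytes."""
    bits = num_bytes * 8
    return -(1 << (bits - 1)) <= val <= (1 << (bits - 1)) - 1


print(truncate_digits(2147483648, 3))  # 214: keeps leading digits only
print(truncate_digits(-12345, 3))  # -12: the sign consumes one character
print(fits_signed(2147483647, 4))  # True: the largest signed 4-byte value
print(fits_signed(2147483648, 4))  # False: one past that limit
```

Slicing digits changes the value without any relation to the column's byte width, whereas a range check answers the question that actually matters on insert.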

* adding some types and questions for tech discussion
* adding some types and questions for tech discussion
* removes hmac, aes, and hash from supported strategies for now
* adds supported data types to masking strategies
* adds logger warning if query cannot be generated to do data type configs
* adds support for truncating specific data types
* gets current tests passing
* adds test for length truncation
* adds docs
* cr
* moving supported_data_types inside class
* remove type var
* remove int truncation
* use class name to identify type

Co-authored-by: catherinesmith <[email protected]>

catherinesmith added 10 commits November 22, 2021 11:27

adding some types and questions for tech discussion

6318a61

adding some types and questions for tech discussion

26054c6

removes hmac, aes, and hash from supported strategies for now

9efefde

adds supported data types to masking strategies

fc1dbe0

adds logger warning if query cannot be generated to do data type configs

b49fbb4

adds support for truncating specific data types

f9fb664

gets current tests passing

37878ac

adds test for length truncation

a54d437

fix conflicts

8cc2c85

adds docs

9aa7687

iamkelllly requested a review from stevenbenjamin December 1, 2021 17:04

stevenbenjamin reviewed Dec 1, 2021

docs/fidesops/docs/tutorial/annotate_datasets.md (outdated, resolved)

stevenbenjamin reviewed Dec 1, 2021

src/fidesops/service/connectors/query_config.py (outdated, resolved)

stevenbenjamin reviewed Dec 1, 2021

cr

7c54a11

eastandwestwind mentioned this pull request Dec 2, 2021

Snowflake Query Execution [#73] #104

Merged

5 tasks

pattisdr assigned stevenbenjamin Dec 2, 2021

moving supported_data_types inside class

6757329

stevenbenjamin reviewed Dec 2, 2021

remove type var

c31f04c

stevenbenjamin reviewed Dec 2, 2021

catherinesmith added 2 commits December 2, 2021 16:54

remove int truncation

f071713

use class name to identify type

6f5d447

stevenbenjamin merged commit bef54bc into main Dec 3, 2021

NevilleS mentioned this pull request Jan 12, 2022

Update masking strategies guide with latest supported options #155

Closed

pattisdr deleted the 70-feat-adds-policy-driven-masking branch February 9, 2022 19:08
