Currently, if we specify that the export should be depseudonymized, depseudonymization is applied for all of the provided pseudo rules. The rules are provided via the `pseudoRules` parameter, which accepts a list of pseudo rules (`name`, `pattern`, `func`), each of which may match multiple fields.
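As an illustration of why a single rule can cover several fields, here is a minimal Python sketch. It assumes that a rule's `pattern` is a glob matched against slash-separated field paths and that `func` is an opaque identifier of the pseudo function. The `PseudoRule` class, the `matching_fields` helper, the field paths and the `hypothetical-func` value are all hypothetical and not taken from the service itself.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class PseudoRule:
    """Hypothetical model of a pseudo rule: (name, pattern, func)."""
    name: str
    pattern: str  # assumption: a glob over slash-separated field paths
    func: str     # assumption: an opaque identifier of the pseudo function


def matching_fields(rule: PseudoRule, field_paths: list[str]) -> list[str]:
    """Return every field path that the rule's glob pattern matches."""
    # fnmatchcase is case-sensitive, and '*' also matches '/', so the
    # pattern '**/fnr' matches an 'fnr' field at any depth.
    return [path for path in field_paths if fnmatchcase(path, rule.pattern)]


fields = ["/person/fnr", "/person/name", "/spouse/fnr", "/address/zip"]
rule = PseudoRule(name="id-numbers", pattern="**/fnr", func="hypothetical-func")

print(matching_fields(rule, fields))  # ['/person/fnr', '/spouse/fnr']
```

Here the one rule `id-numbers` matches both `/person/fnr` and `/spouse/fnr`, so applying it depseudonymizes both fields.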
In the export endpoint, if we don't explicitly specify which pseudo rules to use, we try to retrieve these rules from the dataset metadata. Deducing pseudo rules from the dataset metadata will presumably be the main use case. However, in these cases:
- there is no mechanism for specifying that only a subset of the rules should be applied
- there is no mechanism for specifying that only a subset of the fields should be depseudonymized
The suggestion is therefore to introduce two new parameters: `pseudoRulesFilter` and `pseudoFieldsFilter`.
To summarize, depseudonymization during export would be specified by the following parameters:
- `pseudoRules`: if not present, the rules are deduced from the dataset path
- `pseudoRulesPath`: optional explicit path to deduce pseudo rules from (Export: support retrieving pseudo rules from another dataset path #2)
- `pseudoRulesFilter`: a list of named pseudo rules that should be considered
- `pseudoFieldsFilter`: a list of globs addressing the fields that should be considered. This gives the user more control over which fields get depseudonymized, since a pseudo rule might match multiple fields
- `depseudo`: whether or not the export should depseudonymize. Only required if the pseudo rules should be deduced from the dataset path and no pseudo filters have been specified. If any of the above parameters are present, the export should assume this property to be true.
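One way the proposed parameters could combine is sketched below in Python. It is a non-authoritative illustration under several assumptions: explicit `pseudoRules` take precedence over any path, `pseudoRulesPath` (falling back to the dataset path) is only used to deduce rules, both filters are applied as intersections, and when several rules match the same field the first one wins. `ExportRequest`, `should_depseudonymize`, `resolve_depseudo_targets` and the `rules_from_metadata` callback are hypothetical names, and the snake_case fields stand in for the camelCase request parameters.

```python
from __future__ import annotations

from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable, Optional


@dataclass(frozen=True)
class PseudoRule:
    name: str
    pattern: str  # assumption: glob over slash-separated field paths
    func: str


@dataclass
class ExportRequest:
    """Hypothetical export request mirroring the proposed parameters."""
    dataset_path: str
    pseudo_rules: Optional[list[PseudoRule]] = None  # pseudoRules
    pseudo_rules_path: Optional[str] = None          # pseudoRulesPath
    pseudo_rules_filter: Optional[list[str]] = None  # pseudoRulesFilter
    pseudo_fields_filter: Optional[list[str]] = None # pseudoFieldsFilter
    depseudo: bool = False                           # depseudo


def should_depseudonymize(req: ExportRequest) -> bool:
    # depseudo is implied as soon as any of the other pseudo parameters is set.
    return req.depseudo or any(
        param is not None
        for param in (req.pseudo_rules, req.pseudo_rules_path,
                      req.pseudo_rules_filter, req.pseudo_fields_filter)
    )


def resolve_depseudo_targets(
    req: ExportRequest,
    field_paths: list[str],
    rules_from_metadata: Callable[[str], list[PseudoRule]],
) -> dict[str, PseudoRule]:
    """Return {field_path: rule} for the fields that should be depseudonymized."""
    if not should_depseudonymize(req):
        return {}
    # 1. Explicit rules win; otherwise deduce them from metadata at
    #    pseudoRulesPath, or at the dataset path if no such path is given.
    rules = req.pseudo_rules
    if rules is None:
        rules = rules_from_metadata(req.pseudo_rules_path or req.dataset_path)
    # 2. pseudoRulesFilter: keep only the rules with the listed names.
    if req.pseudo_rules_filter is not None:
        wanted = set(req.pseudo_rules_filter)
        rules = [r for r in rules if r.name in wanted]
    targets: dict[str, PseudoRule] = {}
    for rule in rules:
        for path in field_paths:
            if not fnmatchcase(path, rule.pattern):
                continue
            # 3. pseudoFieldsFilter: keep only fields matched by at least one glob.
            if req.pseudo_fields_filter is not None and not any(
                fnmatchcase(path, glob) for glob in req.pseudo_fields_filter
            ):
                continue
            targets.setdefault(path, rule)  # assumption: first matching rule wins
    return targets


# Usage: rules deduced from (fake, in-memory) metadata, narrowed by both filters.
metadata = {
    "/datasets/persons": [
        PseudoRule("id-numbers", "**/fnr", "hypothetical-func"),
        PseudoRule("names", "**/name", "hypothetical-func"),
    ]
}
fields = ["/person/fnr", "/person/name", "/spouse/fnr"]
req = ExportRequest(
    dataset_path="/datasets/persons",
    pseudo_rules_filter=["id-numbers"],
    pseudo_fields_filter=["/person/*"],
)
targets = resolve_depseudo_targets(req, fields, lambda p: metadata.get(p, []))
print(sorted(targets))  # ['/person/fnr']
```

In the usage example, the rules filter drops the `names` rule and the fields filter drops `/spouse/fnr`, leaving only `/person/fnr`. No explicit `depseudo` flag is needed because the filters imply it.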
kschulst changed the title from "Export: Support partial depseudnoymization when deducing pseudo rules from dataset metadata" to "Export: support partial depseudnoymization when deducing pseudo rules from dataset metadata" on Apr 24, 2021.