
Some questions about discovery-service mode #178

Open
sergiospa opened this issue Sep 17, 2024 · 7 comments

Comments

@sergiospa

Hello!

First of all, thank you for this amazing project! We are considering integrating it into our GCP organization and have been exploring the best ways to do so. We have a few questions and would appreciate your insights if you have a moment.

We plan to use the discovery-service mode, as we already perform automatic DLP inspections across numerous projects. Here are our questions:

  1. From my understanding, we need to manually list each project in the terraform.tfvars file and create a corresponding domain-project mapping, even when using just one domain. Is there any way to automatically include all projects within the organization without listing them individually? Alternatively, is it possible to map folders instead of specific projects?

  2. Is there a way to map many projects to a single domain, with something like this?

domain_mapping = [
  {
    project = "%folder1% / %list of projects 1%",
    domain = "domain1"
  },
  {
    project = "%folder2% / %list of projects 2%",
    domain = "domain2"
  }
]

  3. If we must list projects in the .tfvars file, does this mean we’ll need to manually add new projects each time one is created? Using folders would really avoid this issue, so can we use them?

I hope these questions make sense. We're really excited about the potential of your project and look forward to your response!

Best regards,
Sergio

@sergiospa
Author

Hi! I've got one more question, if I'm allowed to ask...

At the moment we are only interested in the automatic tagging functionality, not the access control feature, which means we are not interested in the domain and IAM mapping settings. If we set data_catalog_taxonomy_activated_policy_types to [] in the tfvars file, does that completely disable the access control feature? What should we do with the domain and IAM mapping settings? Is there any "blank" value we can set for these variables that wouldn't break the solution?

Thanks!!
Regards,
Sergio.

@kwadie
Collaborator

kwadie commented Sep 19, 2024 via email

@sergiospa
Author

Hey! Great, thank you!! Enjoy the weekend :)

@kwadie
Collaborator

kwadie commented Sep 23, 2024

Hey @sergiospa, we're glad that you find this project useful for your organization. Here are my answers:

> From my understanding, we need to manually list each project in the terraform.tfvars file and create a corresponding domain-project mapping, even when using just one domain. Is there any way to automatically include all projects within the organization without listing them individually? Alternatively, is it possible to map folders instead of specific projects?

Unfortunately, there is no folder-level support at the moment. The coarsest level of granularity in the scan scope (i.e. the include and exclude lists) is a project, and, in turn, the same applies to the domain mapping config.

> Is there a way to map many projects to a single domain, with something like this?

In the current implementation, no. However, you can change the structure of this variable and use a list of projects as long as you propagate your changes to the creation of the v_config_projects_domains_map view in BigQuery.
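If you'd rather not touch the v_config_projects_domains_map view, another option is to keep a grouped structure in your own tooling and expand it into the per-project format before deployment. The sketch below assumes a hypothetical grouped input of the form {domain: [projects]} and produces the project/domain/datasets entries used in this thread; the function name and input shape are illustrative, not part of the project, and it assumes each project belongs to exactly one domain.

```python
from typing import Dict, List


def expand_domain_groups(groups: Dict[str, List[str]]) -> List[dict]:
    """Expand a hypothetical {domain: [projects]} grouping into
    per-project domain_mapping entries."""
    mapping: List[dict] = []
    seen: Dict[str, str] = {}
    for domain, projects in groups.items():
        for project in projects:
            # Assumption: a project maps to one domain only, so reject conflicts.
            if project in seen and seen[project] != domain:
                raise ValueError(
                    f"{project} is mapped to both {seen[project]} and {domain}"
                )
            if project in seen:
                continue  # skip exact duplicates
            seen[project] = domain
            # Empty datasets list, mirroring the tfvars example in this thread.
            mapping.append({"project": project, "domain": domain, "datasets": []})
    return mapping


if __name__ == "__main__":
    groups = {
        "domain1": ["project_a", "project_b"],
        "domain2": ["project_c"],
    }
    for entry in expand_domain_groups(groups):
        print(entry)
```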

> If we must list projects in the .tfvars file, does this mean we’ll need to manually add new projects each time one is created? Using folders would really avoid this issue, so can we use them?

Yes, you will need to add each new project both to the projects_include_list (so it is scanned by DLP) and to the domain mapping (so it is assigned the correct policy tag based on DLP findings). As I mentioned earlier, folder-level support is not there yet, as it's not as straightforward to implement as the other levels.
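Since every new project has to be added in two places, a small pre-deployment check can catch a project that made it into one list but not the other. A minimal sketch, assuming both variables have already been loaded into Python (e.g. from a JSON variables file) as a list of project IDs and a list of mapping entries with a "project" key; the function name is hypothetical.

```python
from typing import Iterable, Mapping, Set, Tuple


def find_mapping_gaps(
    projects_include_list: Iterable[str],
    domain_mapping: Iterable[Mapping[str, object]],
) -> Tuple[Set[str], Set[str]]:
    """Return (scanned_but_unmapped, mapped_but_not_scanned) project sets."""
    scanned = set(projects_include_list)
    mapped = {str(entry["project"]) for entry in domain_mapping}
    # Scanned projects missing from the domain mapping can't be tagged correctly.
    unmapped = scanned - mapped
    # Mapped projects outside the scan scope won't be scanned (possibly stale).
    not_scanned = mapped - scanned
    return unmapped, not_scanned
```

Failing the deployment pipeline when the first set is non-empty is one way to make sure a newly created project isn't silently left untagged.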

> If we set data_catalog_taxonomy_activated_policy_types to [] in the tfvars file, does that completely disable the access control feature? What should we do with the domain and iam-mapping settings? Is there any "blank" value we can set for these variables that wouldn't break the solution?

The IAM and domain mapping configurations are required for a successful deployment and for assigning policy tags to columns. Setting data_catalog_taxonomy_activated_policy_types = [] only stops the access control feature of policy tags from being enforced; it's equivalent to the "enforce access policies" toggle button you see in the console when you open a taxonomy. In your case, if you only have one domain (let's call it default_domain), you could use something like this:

domain_mapping = [
  {
    project  = "project_1",
    domain   = "default_domain",
    datasets = [] 
  },
  {
    project  = "project_2",
    domain   = "default_domain",
    datasets = [] 
  },
  # ... one entry per project
]

iam_mapping = {
  default_domain = {
    PII_LEVEL1 = [],
    PII_LEVEL2 = [],
    # ... one entry per classification
  }
}

Here, PII_LEVEL1 and PII_LEVEL2 are the values used in the classification_taxonomy.classification field. In this case, you leave the lists of IAM principals empty since you're not enforcing access policies.
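With a single domain, both variables can be generated from the project list and the classification names instead of being maintained by hand. A minimal sketch, assuming you emit them as JSON (Terraform accepts JSON variable files and auto-loads files ending in .auto.tfvars.json); the function name and the default domain name are illustrative, and each variable should be defined in only one file so you don't depend on variable precedence.

```python
import json
from typing import Dict, List


def single_domain_tfvars(
    projects: List[str],
    classifications: List[str],
    domain: str = "default_domain",
) -> Dict[str, object]:
    """Build domain_mapping and iam_mapping for a one-domain setup."""
    domain_mapping = [
        # Every project goes to the same domain; datasets left empty.
        {"project": p, "domain": domain, "datasets": []}
        for p in projects
    ]
    iam_mapping = {
        # Empty principal lists, since access policies are not enforced.
        domain: {c: [] for c in classifications}
    }
    return {"domain_mapping": domain_mapping, "iam_mapping": iam_mapping}


if __name__ == "__main__":
    tfvars = single_domain_tfvars(
        projects=["project_1", "project_2"],
        classifications=["PII_LEVEL1", "PII_LEVEL2"],
    )
    print(json.dumps(tfvars, indent=2))
```

Redirecting the output to a file such as mapping.auto.tfvars.json (a hypothetical name) would let Terraform pick the values up automatically.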

@sergiospa
Author

Hi @kwadie ,

Thank you, this answers my questions.

I will continue reviewing the solution before proceeding with the deployment. We already have a DLP deployment and are only looking to incorporate the auto-tagging feature based on its findings, so we need to adapt several parts of the code. I may therefore come back with more questions in the future; I hope I won’t be too much of a bother!

Regards,
Sergio.

@sergiospa
Author

Hello again @kwadie !

I would like to ask another question, if possible.

We currently have a separate deployment of DLP in our organization, and we want your automatic tagging solution to work alongside our existing DLP setup. In other words, we don't want the automatic tagging tool to make a separate deployment of DLP; we want to continue using our own deployment. This is possible, right?

That said, what does the automatic tagging solution take into account to determine whether or not to tag a DLP finding? I’m referring to the fact that DLP results indicate the likelihood (e.g., LIKELY, VERY_LIKELY) that the information may contain personal data. Is there a way to configure the automatic tagging solution to tag only when a certain likelihood threshold is met?

Thanks as always, and best regards,
Sergio.

@kwadie
Collaborator

kwadie commented Sep 26, 2024

I am assuming you use the DLP discovery service (i.e. Automatic DLP). In that case, you can deploy this solution on top of it in Discovery Service mode.

You're correct: DLP can find multiple InfoTypes with different levels of likelihood. However, only one policy tag (representing an InfoType) can be attached to a column, so the solution runs a heuristic to "promote" a single InfoType per column based on signals such as likelihood and number of findings.
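To illustrate the idea (this is not the solution's actual query), the sketch below drops findings below a minimum likelihood, which is one way to get the threshold behavior asked about, and then promotes the InfoType with the highest likelihood, breaking ties by number of findings. The ordering follows DLP's Likelihood enum (VERY_UNLIKELY = 1 through VERY_LIKELY = 5); the Finding type, the function name, the default threshold, and the tie-breaking rule are all assumptions.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Likelihood(IntEnum):
    # Same ordering and values as DLP's Likelihood enum.
    VERY_UNLIKELY = 1
    UNLIKELY = 2
    POSSIBLE = 3
    LIKELY = 4
    VERY_LIKELY = 5


@dataclass
class Finding:
    # Hypothetical per-column summary of DLP results for one InfoType.
    info_type: str
    likelihood: Likelihood
    count: int


def promote_info_type(
    findings: List[Finding], min_likelihood: Likelihood = Likelihood.LIKELY
) -> Optional[str]:
    """Pick one InfoType for a column, or None if nothing meets the threshold."""
    eligible = [f for f in findings if f.likelihood >= min_likelihood]
    if not eligible:
        return None  # leave the column untagged
    # Highest likelihood wins; more findings breaks ties (assumed rule).
    best = max(eligible, key=lambda f: (f.likelihood, f.count))
    return best.info_type
```

In the real solution, the equivalent filter would live in the SQL that computes the promoted InfoType, so the threshold applies before any policy tag is chosen.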

In "Discovery Service" mode, the heuristic is simpler (due to the limited number of signals) and is defined in this SQL query, which you can modify as long as you keep the same result schema and granularity. The current logic is as follows:

1. If Auto DLP promotes only one PII type, use that PII type.
2. If Auto DLP doesn't promote a PII type but finds only one "Other PII" type, use that other PII type.
3. If Auto DLP doesn't promote a PII type but finds more than one "Other PII" type, use MIXED.

MIXED is a special policy tag the solution uses when it can't promote a single InfoType over the others (e.g. a chat log column). It's configured during deployment as explained here.
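The three rules above can be sketched as a per-column decision. This is a minimal illustration, not the solution's SQL: it assumes each column profile exposes the InfoType Auto DLP promoted (or None) and a list of the "other" InfoTypes it found, and the function and parameter names are hypothetical. The rules don't cover a column with no findings at all, so the sketch returns None in that case.

```python
from typing import List, Optional

MIXED = "MIXED"  # special policy tag for columns with no single dominant InfoType


def pick_policy_info_type(
    promoted_info_type: Optional[str], other_info_types: List[str]
) -> Optional[str]:
    """Apply the Discovery Service mode rules to one column profile."""
    # Rule 1: Auto DLP promoted a PII type, so use it.
    if promoted_info_type:
        return promoted_info_type
    # Count each "other" InfoType once (assumption: duplicates don't matter).
    distinct_others = sorted(set(other_info_types))
    # Rule 2: nothing promoted, exactly one other PII type found.
    if len(distinct_others) == 1:
        return distinct_others[0]
    # Rule 3: nothing promoted, several other PII types found.
    if len(distinct_others) > 1:
        return MIXED
    # Not covered by the rules: no findings at all (assumed untagged).
    return None
```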
