Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for BigQuery authorized views #2517

Merged
merged 9 commits into from
Jun 11, 2020
Merged

Added support for BigQuery authorized views #2517

merged 9 commits into from
Jun 11, 2020

Conversation

azhard
Copy link
Contributor

@azhard azhard commented Jun 8, 2020

resolves #1718

Description

This PR enables BigQuery dbt users to create authorized views and or as well as add read, write, and owner permissions for datasets in BigQuery.

https://cloud.google.com/bigquery/docs/share-access-views
https://cloud.google.com/bigquery/docs/dataset-access-controls

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot cla-bot bot added the cla:yes label Jun 8, 2020
@azhard
Copy link
Contributor Author

azhard commented Jun 9, 2020

Realized I probably should be passing in a relation into the macro instead of a dataset_id however I'm running into an error (Object of type dict is not JSON serializable) when attempting to pass in a relation in my config.

Copy link
Contributor

@beckjake beckjake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Realized I probably should be passing in a relation into the macro instead of a dataset_id however I'm running into an error (Object of type dict is not JSON serializable) when attempting to pass in a relation in my config.

Should dataset_id support specifying the project, as well? In that case you'll want to create a new hologram dataclass like this:

@dataclass
class GrantTarget(JsonSchemaMixin):
    dataset_id: str
    project_id: str

Then you'd make your grant_access_to definition look like:


@dataclass
class BigqueryConfig(AdapterConfig):
    ...
    grant_access_to:  Optional[List[GrantTarget]]

And then users can supply the field like:

models:
  this_project:
    this_model:
      +grant_access_to:
        project_id: share-project-id
        dataset_id: share-dataset-id

I think that would work, at least

Comment on lines 127 to 129
{% macro bigquery__grant_access_to(entity, entity_type, role, dataset_id) -%}
{% do adapter.grant_access_to(entity, entity_type, role, dataset_id) %}
{% endmacro %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to provide a macro interface for this!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DataClass
class BigqueryConfig(AdapterConfig):
...
grant_access_to: Optional[List[GrantTarget]]

Managed to incorporate all your suggested changes @beckjake except for this one, I wasn't able to figure out how to create a GrantTarget object from a model config or when passing in --args when using run-operation

Copy link
Contributor

@beckjake beckjake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think this all looks good to me! Is it feasible to write a test for this? I wouldn't want to break this in the future, though I admit I am not sure how to write a useful test of this behavior. Would we need a second user with different permissions that we grant access to, and then make sure it can run another dbt job using the model in question as a source?

@azhard
Copy link
Contributor Author

azhard commented Jun 11, 2020

@beckjake conceptually that makes sense to me. Is that something that is possible right now with the testing framework?

An alternative would be to to run the grant_access and then inside the test retrieve the access_entries for the dataset and verify that the access entry that you just added exists for that dataset

@beckjake
Copy link
Contributor

@azhard It's possible, but I think we'd have to set up a second bigquery user on our end. We're making some changes to our test env in the immediate future and maybe we can sneak this in (cc @jtcohen6)

I think your suggestion about having the test retrieve the access_entries is a great alternative for this PR.

@jtcohen6
Copy link
Contributor

We could definitely make a second user within our new BigQuery test env. From skimming the BQ docs linked above, it sounds like that second user needs to be a User on the project and a DataViewer on the dataset containing the secure view, but should not have any permissions on the source dataset. Is that right?

We haven't yet switched over to the new test env, but we could very soon for BQ, since all the tests are running.

@azhard
Copy link
Contributor Author

azhard commented Jun 11, 2020

@jtcohen6 that sounds right to me!

@beckjake added the integration test, let me know if you had a slightly different way in mind

Copy link
Contributor

@beckjake beckjake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. Thanks for adding the test and getting this across the line @azhard! I'm going to kick off tests and then merge this into 0.18.0 once they pass.

@beckjake
Copy link
Contributor

That azure test failure is from the test issue I fixed in the 0.17.1 branch but haven't merged up into dev/marian-anderson yet, we can ignore it for now (if I'm wrong and this is somehow broken on Windows after that, I'll fix it!)

@beckjake beckjake merged commit eea51be into dbt-labs:dev/marian-anderson Jun 11, 2020
@drewbanin drewbanin requested review from drewbanin and removed request for drewbanin June 12, 2020 00:41
@preston-hf
Copy link

Any docs for this? It seems like I can call the macro directly when needed or provide it as a config option? It's not clear from the test how to use this in a model but maybe I'm just a n00b :)

@azhard
Copy link
Contributor Author

azhard commented Jul 6, 2020

Hey @preston-hf, I'd imagine docs will get added with the release of 1.18.0 but nothing as of yet. What you'll want to do to use this in your models directly is add grant_access_to as a param to your model config with the value for that being a list of dicts with a dataset and project key. It uses a list of dicts to accommodate for granting access to multiple datasets.
Eg:

{{
  config(
    grant_access_to=[
      {
        "dataset": "my_dataset",
        "project": "my_project"
      }
    ]
  )
}}

@preston-hf
Copy link

Awesome! I appreciate the contribution and the prompt reply.

@hamzaalikhan
Copy link

hamzaalikhan commented Aug 4, 2020

Hi @azhard

I am wondering how I can use this to implement dataset access to a particular user. I am using dbt version 0.17.2

For example I have a user called "[email protected]" and I want to give them access to a particular dataset in bq, can I use dbt_project.yml file and grant_access_to to do this?

I tried using the following without any luck 😢

models:
my_project:
model_dir:
+grant_access_to:
role: READER
entity_type: userByEmail
entity_id: [email protected]

Thank you 😄

@beckjake
Copy link
Contributor

beckjake commented Aug 4, 2020

@hamzaalikhan This is not in 0.17.2. It's going to be released in 0.18.0 and should be available in 0.18.0b2 if you want to try it out.

@hamzaalikhan
Copy link

Thanks @beckjake for you prompt reply 😃 I upgraded dbt to 0.18.0b2 and tried this out, it works nicely for auth view! Great work 😀 👍

But what if I wanna use it to grant access to a dataset for a particular user? If we don't use auth views?

For granting access to a dataset (no auth views), I tried the following syntax in the dbt_project.yml but it didn't work 😭

Any leads on this please?

models:
my_project:
my_model:
+schema: staging
+labels:
domain: finance
+materialized: table
+grant_access_to:
- { dataset: my_dataset, project: my_project, entity_type: userByEmail, role: READER, entity: [email protected]}

@azhard
Copy link
Contributor Author

azhard commented Aug 5, 2020

Hey @hamzaalikhan, try giving it a shot using this syntax:

+grant_access_to:
    entity: [email protected]
    entity_type: userByEmail
    role: READER
    grant_target_dict:
        project: my_project
        dataset: my_dataset

@hamzaalikhan
Copy link

hamzaalikhan commented Aug 5, 2020

hey @azhard thanks for your prompt reply! :)

I tried the syntax you recommended and unfortunately it did not work for me, I don't see any logs also when I run dbt run, should I be expecting to see logs that say something about granting access?

models:
my_dbt_project:
my_dateset:
+grant_access_to:
entity: [email protected]
entity_type: userByEmail
role: READER
grant_target_dict:
project: bq_project_id
dataset: my_dataset
+labels:
domain: finance
+materialized: table

^ This is the full syntax I am using, I am the owner on the project so there are no permissions issue on the bq side

@hamzaalikhan
Copy link

I will probably just wait until 0.18.0 comes out, cheers for your help guys! 😄

@azhard
Copy link
Contributor Author

azhard commented Aug 5, 2020

Sorry for the confusion @hamzaalikhan, if you're still interested the actual way to run it for this case is to run it as a macro from the CLI like this:

dbt run-operation grant_access_to --args '{entity: [email protected], entity_type: userByEmail, role: READER, grant_target_dict: {project: my_project, dataset: my_dataset}}'

In the case where you want to authorize views to a dataset you can update your dbt_project config like this:

models:
  my_dbt_project:
      my_dbt_folder:
        +grant_access_to:
          - project: my_project1
            dataset: my_dataset1
          - project: my_project2
            dataset: my_dataset2

@hamzaalikhan
Copy link

Hey @azhard, using run-operation worked like a charm! Thank you so much!!

When using the dbt run-operation grant_access_to, is it supposed to give any logs like granting access to foo? It defs worked but silently 😄

@azhard
Copy link
Contributor Author

azhard commented Aug 5, 2020

I don't believe I added any logs to the macro / function execution, but glad to hear that it worked!

@glqstrauss
Copy link

Hello! When we implemented this feature in the dbt_project.yml, authorizing a dozen or so views against the same dataset, we ran into a number of errors that looked like:

412 PATCH https://bigquery.googleapis.com/bigquery/v2/projects/[omitted]/datasets/stitch_zuora?prettyPrint=false: Resource [omitted]:stitch_zuora did not meet condition IF_MATCH

412 PATCH https://bigquery.googleapis.com/bigquery/v2/projects/[omitted]/datasets/stitch_zuora?prettyPrint=false: Precondition check failed.

After a little digging, I'm pretty sure this error was due to one dataset PATCH call not have the most recent etag of another dataset PATCH call being executed against the same dataset. In other words, that this functionality isn't thread-safe.

We worked around the issue by rerunning the affected models in a single threaded run.

I'm not familiar enough with the dbt implementation to understand whether fixing this is minor, major, worth it, etc. Any thoughts as to whether this is worth opening as an issue?

@jtcohen6
Copy link
Contributor

Hey @glqstrauss, I think that's right, authorized view operations are not thread-safe. That's something other community members have run into.

Really, I view the current implementation of authorized views as necessarily limited, until such time as BigQuery releases the DDL to manage them, at which point we can switch to a more robust implementation. (They've been releasing a lot of SQL syntax parity of late, and I understand that grant statements may be on the horizon.)

In the meantime, I'd be supportive of adding a note to the docs site that mentions the single-threaded limitation. Is that something you'd be up to add?

@glqstrauss
Copy link

glqstrauss commented Apr 14, 2021

Hey @jtcohen6 I cut a change here: dbt-labs/docs.getdbt.com#634

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for BigQuery "Authorized Views"
6 participants