Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failing test(s): TestAccBigQueryDatasetAccess_authorizedRoutine #13023

Comments

@megan07
Copy link
Contributor

megan07 commented Nov 11, 2022

Affected Resource(s)

  • google_bigquery_dataset_access

Failure rate: 100%

Impacted tests:

  • TestAccBigQueryDatasetAccess_authorizedRoutine

Nightly builds:

Message:

googleapi: Error 400: Routine ci-test-project-188019:tf_test_public_dataset_wujsmcqmfr.tf_test_public_routine_0xpknelg3m to authorize not found., invalid
with google_bigquery_dataset.private,

Reverted this for 4.44.0

@melinath
Copy link
Collaborator

Looking into this - it seems to be an ordering issue of some kind. In CI, the test does actions in the following order:

  1. PATCH dataset to remove references to routine (this is from dataset access deletion)
  2. DELETE routine
  3. PUT dataset to update it (unclear to me why this request is being issued.) This uses access state (which is now stale after the PATCH) that includes a reference to the routine, which was just deleted.

When I run this locally, these requests happen in the following order:

  1. PUT dataset to update it (unclear to me why this request is being issued.) This uses access state that includes a reference to the routine, which still exists.
  2. PATCH dataset to remove references to routine (this is from dataset access deletion)
  3. DELETE routine

@AarshDhokai
Copy link
Contributor

TestAccBigQueryDatasetAccess_authorizedRoutine

Google Cloud - 100% failure

Google Cloud Beta - 100% failure

@AarshDhokai
Copy link
Contributor

b/261390516

@rileykarson rileykarson added this to the Near-Term Goals milestone Jan 30, 2023
@roaks3 roaks3 self-assigned this Feb 17, 2023
@roaks3
Copy link
Collaborator

roaks3 commented Feb 17, 2023

It looks like the PUT is happening because the dataset is created with a description, but the "destroy" config does not include a description, so it is trying to update the dataset to remove it. So I think Stephen's thought is exactly right that this is a timing issue, where we do not know if the PATCH or PUT will happen first, and they actually probably are requested at almost the same exact time.

I've verified that we can fix the test by removing the description, so that the PUT is not needed, but this doesn't offer any help for users that may legitimately want to update a dataset and remove an access in the same config change. I'm thinking that a mutex may be the best solution, to ensure that PUT/PATCH calls are made serially on the dataset.

@roaks3
Copy link
Collaborator

roaks3 commented Feb 17, 2023

After digging a bit more, this is more complicated than a simple mutex, because:

  1. The access field on dataset is Optional+Computed, to account for the 4 server-created values, so it is included in the dataset's state even if it is not set by the user
  2. The dataset's state is not immediately updated when the dataset_access resource is removed, because it is a different resource
  3. The dataset resource is using a PUT for updates, so even if no access field is set by the user, the access values that are in its state will be applied to the server
    Together, this means that if a dataset_access resource is updated in the same config where its dataset resource is updated, the dataset PUT will attempt to overwrite whatever the dataset_access PATCH did.

I attempted to find precedence for this, but didn't come across anything. I think the best option would be to change the dataset resource to use a PATCH instead of PUT, and then allow subsequent reads to pick up that the access block was removed. However, this seems like a bigger change to make, and may come with other implications.

Lastly, we do have this note in our docs which kinda covers this:

If this resource is used alongside a google_bigquery_dataset resource, the dataset resource must either have no defined access blocks or a lifecycle block with ignore_changes = [access] so they don't fight over which accesses should be on the dataset
However, neither of the options proposed fix this issue, so here's my plan:

  • Fix the test failure by removing the description
  • Update the documentation to mention that you can't edit the dataset and dataset_access in the same change
  • Create a bug to support edits of dataset and dataset_access in the same change

@github-actions
Copy link

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.