google_dataflow_job - when updating, wait for new job to start #3591
Conversation
Hello! I am a robot who works on Magic Modules PRs. I have detected that you are a community contributor, so your PR will be assigned to someone with a commit-bit on this repo for initial review. Thanks for your contribution! A human will be with you soon. @emilymye, please review this PR or find an appropriate assignee.
I'm not really sure how to add reviewers, but @c2thorn would have context regarding this change.
Hi @jcanseco! Thank you so much for contributing to MM! We really appreciate all the contributions you've been making. Just a couple of comments, but otherwise LGTM. I'd also note (for future PRs; this one can stay as is) that we set up some polling utils in common_polling.go. No action is needed right now to use these utils, though: I see some hardcoded timeouts in the existing code, so I'll probably do a follow-up PR where I add the polling utils.
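For context on the polling-utils idea, here is a minimal sketch of that kind of helper, assuming the plugin SDK's helper/resource package. This is only an illustration, not the actual common_polling.go API; the function name and the job states used are assumptions.

package google

import (
	"fmt"
	"time"

	"github.com/hashicorp/terraform-plugin-sdk/helper/resource"
)

// pollUntilJobReady is a hypothetical helper showing the general shape of
// such polling utilities: the caller passes in a function that fetches the
// current job state, and the helper retries it until the job reaches a
// usable state, fails, or the timeout elapses.
func pollUntilJobReady(getState func() (string, error), timeout time.Duration) error {
	return resource.Retry(timeout, func() *resource.RetryError {
		state, err := getState()
		if err != nil {
			return resource.NonRetryableError(err)
		}
		switch state {
		case "JOB_STATE_RUNNING", "JOB_STATE_DONE":
			return nil
		case "JOB_STATE_FAILED":
			return resource.NonRetryableError(fmt.Errorf("job failed with state %q", state))
		default:
			return resource.RetryableError(fmt.Errorf("job still in state %q", state))
		}
	})
}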
case "JOB_STATE_FAILED": | ||
return resource.NonRetryableError(fmt.Errorf("the replacement job with ID %q failed with state %q.", replacementJobID, state)) | ||
default: | ||
log.Printf("the replacement job with ID %q has state %q.", replacementJobID, state) |
log.Printf("the replacement job with ID %q has state %q.", replacementJobID, state) | |
log.Printf("[DEBUG] replacement job with ID %q has successful terminal state %q.", replacementJobID, state) |
(needs [DEBUG]
or else TF won't print it)
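To illustrate the logging convention behind that note (variable names follow the snippet above; as the reviewer says, Terraform does not surface the line unless it carries a level tag):

package google

import "log"

// logReplacementJobState illustrates the convention: only the line with a
// level tag like "[DEBUG]" shows up in Terraform's output (under
// TF_LOG=DEBUG); the untagged line is dropped.
func logReplacementJobState(replacementJobID, state string) {
	log.Printf("replacement job with ID %q has state %q", replacementJobID, state)         // not printed by Terraform
	log.Printf("[DEBUG] replacement job with ID %q has state %q", replacementJobID, state) // printed under TF_LOG=DEBUG
}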
Done.
	return resource.RetryableError(fmt.Errorf("the replacement job with ID %q has not yet started and has state %q.", replacementJobID, state))
case "JOB_STATE_FAILED":
	return resource.NonRetryableError(fmt.Errorf("the replacement job with ID %q failed with state %q.", replacementJobID, state))
default:
Suggested change:
-default:
+case "":
+	return resource.RetryableError(fmt.Errorf("the replacement job with ID %q does not have a defined state. Retrying.", replacementJobID))
+default:
Found a case where the state just returns empty before eventually getting to JOB_STATE_FAILED. Adding a retry here gets to the failed state.
we can also combine this with "JOB_STATE_PENDING" and change the message to "has pending state %q"
Great catch! Fixed.
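Putting the two suggestions together, the resulting state handling presumably ends up looking roughly like the sketch below. The helper name is hypothetical; the cases and messages follow the snippets above, not the final merged code.

package google

import (
	"fmt"
	"log"

	"github.com/hashicorp/terraform-plugin-sdk/helper/resource"
)

// checkReplacementJobState mirrors the combined review feedback: an empty
// state is treated the same as JOB_STATE_PENDING and retried, a failed
// state aborts, and anything else is logged at DEBUG and treated as the
// job having started.
func checkReplacementJobState(replacementJobID, state string) *resource.RetryError {
	switch state {
	case "", "JOB_STATE_PENDING":
		return resource.RetryableError(fmt.Errorf("the replacement job with ID %q has pending state %q", replacementJobID, state))
	case "JOB_STATE_FAILED":
		return resource.NonRetryableError(fmt.Errorf("the replacement job with ID %q failed with state %q", replacementJobID, state))
	default:
		log.Printf("[DEBUG] replacement job with ID %q has state %q", replacementJobID, state)
		return nil
	}
}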
Force-pushed from 3b31ee5 to 8b8a5d2
My pleasure!
Hello! I am a robot who works on Magic Modules PRs. I have detected that you are a community contributor, so your PR will be assigned to someone with a commit-bit on this repo for initial review. Thanks for your contribution! A human will be with you soon. @rambleraptor, please review this PR or find an appropriate assignee.
This patch modifies the update-by-replacement logic to wait for the new job to start before updating the google_dataflow_job resource's ID to point to the new job's ID. This ensures that the google_dataflow_job resource continues to point to the original job if the update operation fails.
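A rough sketch of that flow, reusing the checkReplacementJobState sketch from earlier in this thread. The helper and its signature are assumptions for illustration, not the actual implementation in the PR.

package google

import (
	"github.com/hashicorp/terraform-plugin-sdk/helper/resource"
	"github.com/hashicorp/terraform-plugin-sdk/helper/schema"
)

// waitForReplacementJob polls the replacement job via the supplied getter
// until it has started (or definitively failed), and only then points the
// resource ID at it, so a failed update leaves d.Id() on the original job.
func waitForReplacementJob(d *schema.ResourceData, replacementJobID string, getState func() (string, error)) error {
	err := resource.Retry(d.Timeout(schema.TimeoutUpdate), func() *resource.RetryError {
		state, err := getState()
		if err != nil {
			return resource.NonRetryableError(err)
		}
		return checkReplacementJobState(replacementJobID, state)
	})
	if err != nil {
		return err
	}
	// The resource only starts tracking the replacement job once it is up.
	d.SetId(replacementJobID)
	return nil
}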
Force-pushed from 8b8a5d2 to 540663a
LGTM - running a test just as a sanity check, and if it passes I'll merge.
region, err := getRegion(d, config)
if err != nil {
	return resource.NonRetryableError(err)
@spew brought up the following about this line:
This seems like a place where we would want to retry and thus not return NonRetryableError?
Example: transient errors such as service 500s, TLS handshake failures, etc., will, I believe, be returned by resourceDataflowJobGetJob(...), since that function simply uses the GCP APIs directly.
Thoughts?
Sounds plausible, but I haven't seen any such errors in practice. It doesn't hurt to retry if we know specifically which errors we want to retry for.
It has occurred for many resources for us in the past. Most of it was fixed by running things through the retry functions in retry_utils.go using the defaultRetryPredicates in error_retry_predicates.
suggest using the default retry predicates
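For concreteness, a retry predicate along those lines is just a function that classifies an error as transient or not. The sketch below is hypothetical: it is not the provider's actual defaultRetryPredicates list, and the return shape is an assumption.

package google

import "google.golang.org/api/googleapi"

// isTransientGoogleAPIError is a hypothetical predicate in the spirit of
// the default retry predicates mentioned above: HTTP 429 and 5xx
// responses from the Google API are treated as retryable.
func isTransientGoogleAPIError(err error) (bool, string) {
	if gerr, ok := err.(*googleapi.Error); ok {
		if gerr.Code == 429 || gerr.Code >= 500 {
			return true, "transient Google API error"
		}
	}
	return false, ""
}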
Ok, I'll put out a patch for Magic Modules, then if it looks good to the Terraform team, I can bring that patch to KCC's copy of Terraform to ensure we're in sync.
Sounds good @jcanseco!
Also, apologies: I just realized I commented on the "NonRetryableError" for getRegion(). I meant to do so for the one for resourceDataflowJobGetJob(). Might've been obvious, but I thought I should clarify it.