
google_dataflow_job - when updating, wait for new job to start #3591

Merged

Conversation

@jcanseco (Member) commented Jun 2, 2020:

Release Note Template for Downstream PRs (will be copied)

dataflow: changed the update logic for `google_dataflow_job` to wait for the replacement job to start successfully before modifying the resource ID to point to the replacement job

@modular-magician (Collaborator):

Hello! I am a robot who works on Magic Modules PRs.

I have detected that you are a community contributor, so your PR will be assigned to someone with a commit-bit on this repo for initial review.

Thanks for your contribution! A human will be with you soon.

@emilymye, please review this PR or find an appropriate assignee.

@modular-magician requested a review from @emilymye on June 2, 2020 02:27
@jcanseco (Member Author) commented Jun 2, 2020:

I'm not really sure how to add reviewers, but @c2thorn would have context regarding this change.

@modular-magician (Collaborator):

Hi! I'm the modular magician. Your PR generated some diffs in downstreams - here they are.

Diff report:

Terraform GA: Diff ( 1 file changed, 41 insertions(+))
Terraform Beta: Diff ( 1 file changed, 41 insertions(+))

@emilymye (Contributor) commented Jun 2, 2020:

Hi @jcanseco! Thank you so much for contributing to MM! We really appreciate all the contributions you've been making. Just a couple comments but otherwise LGTM.

I'd also note (for future PRs; this one can stay as is) that we set up some polling utils in common_polling.go, where you pass PollingWaitTime(...) a read function and a status-checking function specific to the resource.

No action needed to use those utils right now, though. I see some hardcoded timeouts in the existing code, so I'll probably follow up with a PR that adds the polling utils.

third_party/terraform/resources/resource_dataflow_job.go (outdated)
case "JOB_STATE_FAILED":
return resource.NonRetryableError(fmt.Errorf("the replacement job with ID %q failed with state %q.", replacementJobID, state))
default:
log.Printf("the replacement job with ID %q has state %q.", replacementJobID, state)
Contributor:

Suggested change
log.Printf("the replacement job with ID %q has state %q.", replacementJobID, state)
log.Printf("[DEBUG] replacement job with ID %q has successful terminal state %q.", replacementJobID, state)

(needs [DEBUG] or else TF won't print it)

Member Author:

Done.

return resource.RetryableError(fmt.Errorf("the replacement job with ID %q has not yet started and has state %q.", replacementJobID, state))
case "JOB_STATE_FAILED":
return resource.NonRetryableError(fmt.Errorf("the replacement job with ID %q failed with state %q.", replacementJobID, state))
default:
Member:

Suggested change
default:
case "":
return resource.RetryableError(fmt.Errorf("the replacement job with ID %q does not have a defined state. Retrying.", replacementJobID))
default:

Found a case where the state comes back empty before eventually reaching JOB_STATE_FAILED. Adding a retry here lets polling reach the failed state.

Contributor:

we can also combine this with "JOB_STATE_PENDING" and change the message to "has pending state %q"

Member Author:

Great catch! Fixed.
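Putting the suggestions in this thread together, the resulting state handling can be modeled as a small standalone function. This is illustrative only; the real code wraps these outcomes in resource.RetryableError / resource.NonRetryableError from the Terraform SDK:

```go
package main

import "fmt"

// Decision models the three outcomes of polling the replacement job.
type Decision int

const (
	Retry Decision = iota // keep polling (empty or pending state)
	Fail                  // terminal failure, stop with an error
	Done                  // the replacement job has started
)

// classifyReplacementJobState applies the logic discussed above:
// an empty or pending state is retried, JOB_STATE_FAILED is a
// non-retryable failure, and any other state means the job started.
func classifyReplacementJobState(state string) (Decision, error) {
	switch state {
	case "", "JOB_STATE_PENDING":
		return Retry, fmt.Errorf("replacement job has pending state %q", state)
	case "JOB_STATE_FAILED":
		return Fail, fmt.Errorf("replacement job failed with state %q", state)
	default:
		return Done, nil
	}
}
```

Folding the empty string into the pending case, as suggested, means an API response without a state yet is simply treated as "not started" and retried.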

@jcanseco force-pushed the dataflow_wait_for_update branch from 3b31ee5 to 8b8a5d2 on June 2, 2020 19:33
@jcanseco (Member Author) commented Jun 2, 2020:

Hi @jcanseco! Thank you so much for contributing to MM! We really appreciate all the contributions you've been making.

My pleasure!

@modular-magician (Collaborator):

@rambleraptor, please review this PR or find an appropriate assignee.

@modular-magician (Collaborator):

Diff report:

Terraform GA: Diff ( 1 file changed, 41 insertions(+))
Terraform Beta: Diff ( 1 file changed, 41 insertions(+))

@emilymye emilymye removed the request for review from rambleraptor June 2, 2020 19:43
This patch modifies the update-by-replacement logic to wait for the new
job to start before updating the google_dataflow_job's resource ID to
point to the new job's ID. This ensures that the google_dataflow_job
resource continues to point to the original job if the update operation fails.
@jcanseco force-pushed the dataflow_wait_for_update branch from 8b8a5d2 to 540663a on June 2, 2020 20:16
@modular-magician (Collaborator):

@rambleraptor, please review this PR or find an appropriate assignee.

@c2thorn removed the request for review from rambleraptor on June 2, 2020 20:18
@modular-magician (Collaborator):

Diff report:

Terraform GA: Diff ( 1 file changed, 38 insertions(+))
Terraform Beta: Diff ( 1 file changed, 38 insertions(+))

@emilymye self-requested a review on June 3, 2020 21:07
@emilymye (Contributor) left a comment:

LGTM - running a test just as a sanity check; if it passes, I'll merge.


region, err := getRegion(d, config)
if err != nil {
return resource.NonRetryableError(err)
Member Author:

@emilymye @c2thorn,

@spew brought up the following about this line:

This seems like a place where we would want to retry and thus not return NonRetryableError?
For example, transient errors such as service 500s, TLS handshake failures, etc., will, I believe, be returned by resourceDataflowJobGetJob(...), since that function uses the GCP APIs directly.

Thoughts?

Member:

Sounds plausible, but I haven't seen any such errors in practice. It doesn't hurt to retry if we know specifically which errors we want to retry on.

Contributor:

This has occurred for many of our resources in the past. Most of it was fixed by running calls through the retry functions in retry_utils.go using the defaultRetryPredicates in error_retry_predicates.

Contributor:

I suggest using the default retry predicates.

Member Author:

Ok, I'll put out a patch for Magic Modules, then if it looks good to the Terraform team, I can bring that patch to KCC's copy of Terraform to ensure we're in sync.

Member:

Sounds good @jcanseco!

Member Author:

Also, apologies: I just realized I commented on the "NonRetryableError" for getRegion() when I meant to do so for the one for resourceDataflowJobGetJob(). Might've been obvious, but I thought I should clarify.
