-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New Resource: Dataflow Job #855
Conversation
A dataflow job exists when it is in a nonterminal state, and does not exist if it is in a terminal state (or a non-running state which can only transition into terminal states).
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woo thanks for reviving this @ndmckinley! Overall structure looks good, just a few comments.
google/resource_dataflow_job.go
Outdated
return err | ||
} | ||
|
||
jobName := d.Get("name").(string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eh may as well just inline these in env and request
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
google/resource_dataflow_job.go
Outdated
jobName := d.Get("name").(string) | ||
gcsPath := d.Get("template_gcs_path").(string) | ||
tempLocation := d.Get("temp_gcs_location").(string) | ||
zone := d.Get("zone").(string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should probably be getZone now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
google/resource_dataflow_job.go
Outdated
return err | ||
} | ||
|
||
d.SetId(job.Id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For compute resources that are asynchronous, we usually follow this order:
- Send the request
- Set the Id
- Wait on the request
- If there's an error, set the id to ""
Since dataflow seems to use a synchronous API, we can't follow that exact pattern, but I still think setting the id before sending the request is probably the way to go:
- If the Terraform process quits for whatever reason when the request is running, the request will have already been sent, so it means that the resource does exist in GCP. With the id in the state file already, we can read the information back when we call the read() function on plan, and we don't lose out on anything. If the id didn't get saved, then Terraform would think it needed to recreate it.
- If the Terraform process quits (for whatever reason) exactly between when we set the id and when we make the request, then when it does the refresh on plan it would just see that it doesn't actually exist and set the id back to empty.
- If the Terraform process doesn't quit then it doesn't really matter when the id gets set.
Anyway that's a long explanation for a tiny change that doesn't make a difference 99% of the time, but now you know :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately the ID here is created on the server side and not delivered back to us until the create has already succeeded. Think it's worth trying to use an async api instead?
google/resource_dataflow_job.go
Outdated
"google.golang.org/api/googleapi" | ||
) | ||
|
||
var dataflowTerminalStatesMap = map[string]bool{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since bool's zero-value is false, there's no difference between a key that exists in the map but has a value of false and one not in the map at all, so if you wanted to you could remove all the falses from here. (according to the internet, the convention of how to do sets in go is to use a map[string]struct{}, although I'm pretty sure the reason for an empty struct instead of a bool is because it takes up less memory, and for a set this small the memory savings aren't going to make any difference at all)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stack Overflow tells me that map[string]bool is the right way to go if you can afford the memory overhead for syntax convenience reasons.
https://softwareengineering.stackexchange.com/questions/177428/sets-data-structure-in-golang
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I prefer struct{}
to bool
when creating a set, as I feel it's clearer--the value can't have any meaning, which prevents future maintainers from accidentally thinking the bool
means something--but it's such a minor preference I'm not super interested in enforcing it on anyone. I just find the semantic reasoning to be more persuasive than the memory savings.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case I think the code gets uglier if I need to check existence of a key rather than the value of the key. Since go only provides comma-ok notation for that, I can't really see how to use it in the second condition of a loop without some ugly repeating-myself. I don't mind changing it - and if you know a trick, I'd love to learn it - but my mild preference is to leave it as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, the for does complicate this. I, personally, would move the Read out of the for loop's definition, something like https://play.golang.org/p/BIxXBPs--vM
But now I'm really in the weeds, and honestly this is such a slight preference thing, I'm happy to defer to what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, done.
// If we have an error and it's not a google-specific error, we should go ahead and return. | ||
return err | ||
} else if ok && strings.Contains(gerr.Message, "not yet ready for canceling") { | ||
time.Sleep(5 * time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So usually for something that requires checking a bunch of times if something is in a certain state, we would use a StateChangeConf (see all the _operation.go files for examples), which handles things like timeouts and backoff and whatnot. Is there a way to know just based on read whether the job is ready for canceling, or is it something you can only know by trying it and seeing if it works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can only know by trying it, it seems to me.
google/resource_dataflow_job.go
Outdated
tempLocation := d.Get("temp_gcs_location").(string) | ||
zone := d.Get("zone").(string) | ||
maxWorkers := d.Get("max_workers").(int) | ||
params := expandDataflowParamsStringMap(d.Get("parameters").(map[string]interface{})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use expandStringMap from utils.go for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
vendor/vendor.json
Outdated
@@ -1,6 +1,6 @@ | |||
{ | |||
"comment": "", | |||
"ignore": "appengine test github.com/hashicorp/nomad/ github.com/hashicorp/terraform/backend", | |||
"ignore": "test", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea - govendor did it for me. I'll put it back.
google/resource_dataflow_job.go
Outdated
} else { | ||
d.Set("state", job.CurrentState) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally it's a good idea to set as many fields as you have access to on read, not just computed ones. This helps detect any drift between what Terraform thinks is true and what GCP thinks is true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -0,0 +1,210 @@ | |||
package google |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll also want to add documentation in website/docs/r/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this last night but forgot to comment.
@paddycarver For the tests to pass on CI server you'll need to enable the Dataflow API for the test project: If you prefer I can stick a "google_project_service" resource on there to have the test re-enable it automatically. |
I've got a note now to enable the service for CI tomorrow. I'd rather the tests not try to set up their own environment, honestly, though... I think we just need a longer talk about testing and CI, and the ergonomics and developer experience around them. It's on my shortlist :) |
API enabled for CI :) |
Thanks! I'm running it now. It passes locally, so it ought to work. :) edit: Aha, great, it does pass. |
google/resource_dataflow_job.go
Outdated
log.Printf("[DEBUG] Removing resource '%s' because it is in state %s.\n", job.Name, job.CurrentState) | ||
d.SetId("") | ||
return nil | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: no need for this "else" (https://github.com/golang/go/wiki/CodeReviewComments#indent-error-flow)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good point, thanks.
google/resource_dataflow_job_test.go
Outdated
) | ||
|
||
func TestAccDataflowJobCreate(t *testing.T) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add t.Parallel()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
```hcl | ||
resource "google_dataflow_job" "big_data_job" { | ||
name = "dataflow-job" | ||
template_gcs_path = "gs://my-bucket/templates/template_file" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: alignment (tabs vs spaces)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh no! Thanks, switched to spaces.
The following arguments are supported: | ||
|
||
* `name` - (Required) A unique name for the resource, required by Dataflow. | ||
Changing this forces a new resource to be created. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would remove this line since changing any of the fields forces a new resource to be created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
## Attributes Reference | ||
|
||
In addition to the above, the resource provides `state` - the current state of the resource, selected from the [JobState enum](https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.jobs#Job.JobState) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eh even though there's just one let's still make a list here so it's easy to scan the left side of the page for attributes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Add support for Google Dataflow jobs Note: A dataflow job exists when it is in a nonterminal state, and does not exist if it is in a terminal state (or a non-running state which can only transition into terminal states). See doc for more detail.
Allow resource_*_iam_bindings with no members
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks! |
A long time ago, in a galaxy far, far away (prior to the provider split), @dasch created a PR hashicorp/terraform#14174 to close hashicorp/terraform#13537. @danawillow reviewed it, and it was nearly done, but the provider split happened, the PR was closed, and the issue was abandoned. This PR revives it, updates it in a minor way, and addresses one of @danawillow's remaining comments about the notion of existence.
I'm pretty new myself, so I'm not sure this is the right final state for this resource ("existence" is a weird notion for a dataflow job), but I think it's the closest to the original intent of the PR and I think it's a valid and useful way to do this.
@dasch and @k4y3ff are two people who have said that they have a use case in mind for this resource - what do you folks think?
If merged, this PR closes #81.