Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Resource: Dataflow Job #855

Merged
merged 11 commits into from
Jan 10, 2018
Merged

Conversation

nat-henderson
Copy link
Contributor

A long time ago, in a galaxy far, far away (prior to the provider split), @dasch created a PR hashicorp/terraform#14174 to close hashicorp/terraform#13537. @danawillow reviewed it, and it was nearly done, but the provider split happened, the PR was closed, and the issue was abandoned. This PR revives it, updates it in a minor way, and addresses one of @danawillow's remaining comments about the notion of existence.

I'm pretty new myself, so I'm not sure this is the right final state for this resource ("existence" is a weird notion for a dataflow job), but I think it's the closest to the original intent of the PR and I think it's a valid and useful way to do this.

@dasch and @k4y3ff are two people who have said that they have a use case in mind for this resource - what do you folks think?

If merged, this PR closes #81.

dasch and others added 4 commits December 13, 2017 22:08
A dataflow job exists when it is in a nonterminal state, and does not exist if it
is in a terminal state (or a non-running state which can only transition into terminal
states).
@nat-henderson
Copy link
Contributor Author

00:36 $ make testacc TEST=./google TESTARGS='-run=TestAccDataflow*'
==> Checking that code complies with gofmt requirements...
TF_ACC=1 go test ./google -v -run=TestAccDataflow* -timeout 120m
=== RUN   TestAccDataflowJobCreate
--- PASS: TestAccDataflowJobCreate (8.43s)
PASS
ok      github.com/terraform-providers/terraform-provider-google/google 8.446s

Copy link
Contributor

@danawillow danawillow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woo thanks for reviving this @ndmckinley! Overall structure looks good, just a few comments.

return err
}

jobName := d.Get("name").(string)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eh may as well just inline these in env and request

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

jobName := d.Get("name").(string)
gcsPath := d.Get("template_gcs_path").(string)
tempLocation := d.Get("temp_gcs_location").(string)
zone := d.Get("zone").(string)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should probably be getZone now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return err
}

d.SetId(job.Id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For compute resources that are asynchronous, we usually follow this order:

  • Send the request
  • Set the Id
  • Wait on the request
  • If there's an error, set the id to ""

Since dataflow seems to use a synchronous API, we can't follow that exact pattern, but I still think setting the id before sending the request is probably the way to go:

  • If the Terraform process quits for whatever reason when the request is running, the request will have already been sent, so it means that the resource does exist in GCP. With the id in the state file already, we can read the information back when we call the read() function on plan, and we don't lose out on anything. If the id didn't get saved, then Terraform would think it needed to recreate it.
  • If the Terraform process quits (for whatever reason) exactly between when we set the id and when we make the request, then when it does the refresh on plan it would just see that it doesn't actually exist and set the id back to empty.
  • If the Terraform process doesn't quit then it doesn't really matter when the id gets set.

Anyway that's a long explanation for a tiny change that doesn't make a difference 99% of the time, but now you know :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately the ID here is created on the server side and not delivered back to us until the create has already succeeded. Think it's worth trying to use an async api instead?

"google.golang.org/api/googleapi"
)

var dataflowTerminalStatesMap = map[string]bool{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since bool's zero-value is false, there's no difference between a key that exists in the map but has a value of false and one not in the map at all, so if you wanted to you could remove all the falses from here. (according to the internet, the convention of how to do sets in go is to use a map[string]struct{}, although I'm pretty sure the reason for an empty struct instead of a bool is because it takes up less memory, and for a set this small the memory savings aren't going to make any difference at all)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stack Overflow tells me that map[string]bool is the right way to go if you can afford the memory overhead for syntax convenience reasons.

https://softwareengineering.stackexchange.com/questions/177428/sets-data-structure-in-golang

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I prefer struct{} to bool when creating a set, as I feel it's clearer--the value can't have any meaning, which prevents future maintainers from accidentally thinking the bool means something--but it's such a minor preference I'm not super interested in enforcing it on anyone. I just find the semantic reasoning to be more persuasive than the memory savings.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case I think the code gets uglier if I need to check existence of a key rather than the value of the key. Since go only provides comma-ok notation for that, I can't really see how to use it in the second condition of a loop without some ugly repeating-myself. I don't mind changing it - and if you know a trick, I'd love to learn it - but my mild preference is to leave it as is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, the for does complicate this. I, personally, would move the Read out of the for loop's definition, something like https://play.golang.org/p/BIxXBPs--vM

But now I'm really in the weeds, and honestly this is such a slight preference thing, I'm happy to defer to what you think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, done.

// If we have an error and it's not a google-specific error, we should go ahead and return.
return err
} else if ok && strings.Contains(gerr.Message, "not yet ready for canceling") {
time.Sleep(5 * time.Second)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So usually for something that requires checking a bunch of times if something is in a certain state, we would use a StateChangeConf (see all the _operation.go files for examples), which handles things like timeouts and backoff and whatnot. Is there a way to know just based on read whether the job is ready for canceling, or is it something you can only know by trying it and seeing if it works?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can only know by trying it, it seems to me.

tempLocation := d.Get("temp_gcs_location").(string)
zone := d.Get("zone").(string)
maxWorkers := d.Get("max_workers").(int)
params := expandDataflowParamsStringMap(d.Get("parameters").(map[string]interface{}))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use expandStringMap from utils.go for this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -1,6 +1,6 @@
{
"comment": "",
"ignore": "appengine test github.com/hashicorp/nomad/ github.com/hashicorp/terraform/backend",
"ignore": "test",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea - govendor did it for me. I'll put it back.

} else {
d.Set("state", job.CurrentState)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally it's a good idea to set as many fields as you have access to on read, not just computed ones. This helps detect any drift between what Terraform thinks is true and what GCP thinks is true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -0,0 +1,210 @@
package google
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll also want to add documentation in website/docs/r/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this last night but forgot to comment.

@nat-henderson
Copy link
Contributor Author

@paddycarver For the tests to pass on CI server you'll need to enable the Dataflow API for the test project:

https://console.developers.google.com/apis/api/dataflow.googleapis.com/overview?project=hc-terraform-testing

If you prefer I can stick a "google_project_service" resource on there to have the test re-enable it automatically.

@paddycarver
Copy link
Contributor

I've got a note now to enable the service for CI tomorrow. I'd rather the tests not try to set up their own environment, honestly, though... I think we just need a longer talk about testing and CI, and the ergonomics and developer experience around them. It's on my shortlist :)

@paddycarver
Copy link
Contributor

API enabled for CI :)

@nat-henderson
Copy link
Contributor Author

nat-henderson commented Jan 10, 2018

Thanks! I'm running it now. It passes locally, so it ought to work. :)

edit: Aha, great, it does pass.

log.Printf("[DEBUG] Removing resource '%s' because it is in state %s.\n", job.Name, job.CurrentState)
d.SetId("")
return nil
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point, thanks.

)

func TestAccDataflowJobCreate(t *testing.T) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add t.Parallel()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

```hcl
resource "google_dataflow_job" "big_data_job" {
name = "dataflow-job"
template_gcs_path = "gs://my-bucket/templates/template_file"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: alignment (tabs vs spaces)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh no! Thanks, switched to spaces.

The following arguments are supported:

* `name` - (Required) A unique name for the resource, required by Dataflow.
Changing this forces a new resource to be created.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this line since changing any of the fields forces a new resource to be created.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


## Attributes Reference

In addition to the above, the resource provides `state` - the current state of the resource, selected from the [JobState enum](https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.jobs#Job.JobState)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh even though there's just one let's still make a list here so it's easy to scan the left side of the page for attributes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@nat-henderson nat-henderson merged commit f50d4a1 into hashicorp:master Jan 10, 2018
@nat-henderson nat-henderson deleted the dataflow branch January 10, 2018 22:38
chrisst pushed a commit to chrisst/terraform-provider-google that referenced this pull request Nov 9, 2018
Add support for Google Dataflow jobs

Note: A dataflow job exists when it is in a nonterminal state, and does not exist if it
is in a terminal state (or a non-running state which can only transition into terminal
states).  See doc for more detail.
modular-magician pushed a commit to modular-magician/terraform-provider-google that referenced this pull request Sep 27, 2019
Allow resource_*_iam_bindings with no members
@ghost
Copy link

ghost commented Mar 30, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

@ghost ghost locked and limited conversation to collaborators Mar 30, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
4 participants