Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable TPUs to use Shared VPC #3939

Merged
merged 10 commits into from
Sep 17, 2020

Conversation

upodroid
Copy link
Contributor

@upodroid upodroid commented Sep 1, 2020

Fixes: hashicorp/terraform-provider-google#7117

If this PR is for Terraform, I acknowledge that I have:

  • Searched through the issue tracker for an open issue that this either resolves or contributes to, commented on it to claim it, and written "fixes {url}" or "part of {url}" in this PR description. If there were no relevant open issues, I opened one and commented that I would like to work on it (not necessary for very small changes).
  • Generated Terraform, and ran make test and make lint to ensure it passes unit and linter tests.
  • Ensured that all new fields I added that can be set by a user appear in at least one example (for generated resources) or third_party test (for handwritten resources or update tests).
  • Ran relevant acceptance tests (If the acceptance tests do not yet pass or you are unable to run them, please let your reviewer know).
  • Read the Release Notes Guide before writing my release note below.

Release Note Template for Downstream PRs (will be copied)

tpu: added `use_service_networking` to `google_tpu_node` which enables Shared VPC Support.

@google-cla google-cla bot added the cla: yes label Sep 1, 2020
@modular-magician
Copy link
Collaborator

Hello! I am a robot who works on Magic Modules PRs.

I have detected that you are a community contributor, so your PR will be assigned to someone with a commit-bit on this repo for initial review.

Thanks for your contribution! A human will be with you soon.

@megan07, please review this PR or find an appropriate assignee.

@modular-magician
Copy link
Collaborator

Hi! I'm the modular magician. Your PR generated some diffs in downstreams - here they are.

Diff report:

Terraform GA: Diff ( 3 files changed, 104 insertions(+), 30 deletions(-))
Terraform Beta: Diff ( 3 files changed, 104 insertions(+), 30 deletions(-))
Ansible: Diff ( 2 files changed, 36 insertions(+), 4 deletions(-))
TF Conversion: Diff ( 1 file changed, 10 insertions(+))
TF OiCS: Diff ( 1 file changed, 21 insertions(+), 2 deletions(-))

@modular-magician
Copy link
Collaborator

Hi! I'm the modular magician. Your PR generated some diffs in downstreams - here they are.

Diff report:

Terraform GA: Diff ( 3 files changed, 105 insertions(+), 30 deletions(-))
Terraform Beta: Diff ( 3 files changed, 105 insertions(+), 30 deletions(-))
Ansible: Diff ( 2 files changed, 36 insertions(+), 4 deletions(-))
TF Conversion: Diff ( 1 file changed, 10 insertions(+))
TF OiCS: Diff ( 1 file changed, 21 insertions(+), 2 deletions(-))

@upodroid
Copy link
Contributor Author

upodroid commented Sep 1, 2020

Urgh, the google API takes the project name in the network field and holds it as a project number.

    TestAccTPUNode_tpuNodeFullExample: testing.go:674: Step 0 error: After applying this step, the plan was not empty:
        
        DIFF:
        
        DESTROY/CREATE: google_tpu_node.tpu
          accelerator_type:                "v3-8" => "v3-8"
          cidr_block:                      "10.96.0.0/29" => "<computed>"
          description:                     "Terraform Google Provider test TPU" => "Terraform Google Provider test TPU"
          id:                              "projects/REDACTED/locations/us-central1-b/nodes/tf-test-test-tpuv048js4jr2" => "<computed>"
          labels.foo:                      "bar" => "bar"
          name:                            "tf-test-test-tpuv048js4jr2" => "tf-test-test-tpuv048js4jr2"
          network:                         "projects/550924169191/global/networks/default" => "projects/REDACTED/global/networks/default" (forces new resource)
          network_endpoints:               "" => "<computed>" (forces new resource)
          network_endpoints.#:             "1" => "" (forces new resource)
          network_endpoints.0.ip_address:  "10.96.0.2" => "" (forces new resource)
          network_endpoints.0.port:        "8470" => "" (forces new resource)
          project:                         "REDACTED" => "<computed>"
          scheduling_config.#:             "1" => "1"
          scheduling_config.0.preemptible: "true" => "true"
          service_account:                 "[email protected]" => "<computed>"
          tensorflow_version:              "1.11" => "1.11"
          use_service_networking:          "true" => "true"
          zone:                            "us-central1-b" => "us-central1-b"
        
        
        
        STATE:
        
        data.google_compute_network.ne

@upodroid
Copy link
Contributor Author

upodroid commented Sep 4, 2020

Tests are passing

@modular-magician
Copy link
Collaborator

Hi! I'm the modular magician. Your PR generated some diffs in downstreams - here they are.

Diff report:

Terraform GA: Diff ( 3 files changed, 112 insertions(+), 32 deletions(-))
Terraform Beta: Diff ( 3 files changed, 112 insertions(+), 32 deletions(-))
Ansible: Diff ( 2 files changed, 36 insertions(+), 4 deletions(-))
TF Conversion: Diff ( 1 file changed, 16 insertions(+))
TF OiCS: Diff ( 1 file changed, 21 insertions(+), 2 deletions(-))

schedulingConfig: !ruby/object:Overrides::Terraform::PropertyOverride
diff_suppress_func: 'compareTpuNodeSchedulingConfig'
schedulingConfig.preemptible: !ruby/object:Overrides::Terraform::PropertyOverride
diff_suppress_func: 'compareTpuNodeSchedulingConfig'
network: !ruby/object:Overrides::Terraform::PropertyOverride
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be overwriting the network block from line 36, and also, I don't think we want to always suppress this diff. What if the intent is to change the network, I don't think it'd recognize the change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it was suppressing the values the api returns from get call in the Read function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworked it to check for a change in network.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this will make a huge difference, because if the user really wants to change the network, this will prevent them from doing so. I wonder if we could maybe do a diffsuppress that would parse the network and see if the only difference is the project and then check that the project number is the correct project number? Does that make sense? It's a little hacky, but would be the workaround until the upstream issue was solved.
Also, I think we want to use the diff_suppress_func from line 38 above rather than creating a new network block here. Then, in the new diff suppress func that you're implementing, make sure you include the same logic that is in compareSelfLinkOrResourceName so we don't lose that as well. Does that make sense?
Let me know if you have questions.

Copy link
Contributor Author

@upodroid upodroid Sep 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a look my last commit. How do I get the *Config in to the diffsupress function? I'm using other functions that expect the config block as arguments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @upodroid! What a finicky issue! I spent my morning looking into how we can do this. Although diff suppress would be ideal, it looks like we can't use it here. If we want access to config we'll have to use a CustomizeDiff.
Basically what I did in the CustomizeDiff was get the old (value using project number) and new (value using project name) values, used regex to find the submatch for project. Then called out to the projects API using the project name and if the returned project.Number was the same as the project number, then i did a diff.SetNew(old). Does that make sense? So basically if they're the same project, we're just going to set it to use the project number so there is no diff.
Let me know if you have questions! Thanks again!!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I amended it to use CustomizeDiff now but i'm still getting diffs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm. It is ready to go.

 REDACTED  MCW0CDP3YY  ~  go  …  github.com  hashicorp  terraform-provider-google-beta   master  66⬇  5✎  USAGE  $    make testacc TEST=./google-beta TESTARGS='-run=TestAccTPUNode_tpuNodeFullExample'
==> Checking source code against gofmt...
==> Checking that code complies with gofmt requirements...
TF_ACC=1 TF_SCHEMA_PANIC_ON_ERROR=1 go test ./google-beta -v -run=TestAccTPUNode_tpuNodeFullExample -timeout 240m -ldflags="-X=github.com/hashicorp/terraform-provider-google-beta/version.ProviderVersion=acc"
=== RUN   TestAccTPUNode_tpuNodeFullExample
=== PAUSE TestAccTPUNode_tpuNodeFullExample
=== CONT  TestAccTPUNode_tpuNodeFullExample
--- PASS: TestAccTPUNode_tpuNodeFullExample (364.17s)
PASS
ok      github.com/hashicorp/terraform-provider-google-beta/google-beta 366.543s

templates/terraform/constants/tpu_node.erb Outdated Show resolved Hide resolved
@upodroid
Copy link
Contributor Author

upodroid commented Sep 4, 2020

It is strange behaviour in the Google API.

POST API Call. Missing the network in the API call though.

020/09/04 21:12:57 [DEBUG] Retry Transport: request attempt 0
2020/09/04 21:12:57 [DEBUG] Google API Request Details:
---[ REQUEST ]---------------------------------------
POST /v1/projects/REDACTED/locations/us-central1-b/nodes?alt=json&nodeId=tf-test-test-tpuph6bmmfqw8 HTTP/1.1
Host: tpu.googleapis.com
User-Agent: HashiCorp Terraform/0.12.7-sdk (+https://www.terraform.io) Terraform Plugin SDK/1.11.0 terraform-provider-google-beta/acc
Content-Length: 232
Content-Type: application/json
Accept-Encoding: gzip

{
 "acceleratorType": "v3-8",
 "description": "Terraform Google Provider test TPU",
 "labels": {
  "foo": "bar"
 },
 "name": "tf-test-test-tpuph6bmmfqw8",
 "schedulingConfig": {
  "preemptible": true
 },
 "tensorflowVersion": "1.11",
 "useServiceNetworking": true
}

GET Call after the op is complete.

2020/09/04 21:15:13 [DEBUG] Google API Request Details:
---[ REQUEST ]---------------------------------------
GET /v1/projects/REDACTED/locations/us-central1-b/nodes/tf-test-test-tpuph6bmmfqw8?alt=json HTTP/1.1
Host: tpu.googleapis.com
User-Agent: HashiCorp Terraform/0.12.7-sdk (+https://www.terraform.io) Terraform Plugin SDK/1.11.0 terraform-provider-google-beta/acc
Content-Type: application/json
Accept-Encoding: gzip


-----------------------------------------------------
2020/09/04 21:15:14 [DEBUG] Google API Response Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 200 OK
Connection: close
Transfer-Encoding: chunked
Alt-Svc: h3-29=":443"; ma=2592000,h3-27=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-T050=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
Cache-Control: private
Content-Type: application/json; charset=UTF-8
Date: Fri, 04 Sep 2020 20:15:14 GMT
Server: ESF
Vary: Origin
Vary: X-Origin
Vary: Referer
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Xss-Protection: 0

2c9
{
  "name": "projects/REDACTED/locations/us-central1-b/nodes/tf-test-test-tpuph6bmmfqw8",
  "description": "Terraform Google Provider test TPU",
  "acceleratorType": "v3-8",
  "ipAddress": "10.71.0.2",
  "state": "READY",
  "tensorflowVersion": "1.11",
  "network": "projects/550924169191/global/networks/default",
  "cidrBlock": "10.71.0.0/29",
  "port": "8470",
  "serviceAccount": "[email protected]",
  "createTime": "2020-09-04T20:13:00.287428315Z",
  "schedulingConfig": {
    "preemptible": true
  },
  "networkEndpoints": [
    {
      "ipAddress": "10.71.0.2",
      "port": 8470
    }
  ],
  "labels": {
    "foo": "bar"
  },
  "useServiceNetworking": true
}

@modular-magician
Copy link
Collaborator

Hi! I'm the modular magician. Your PR generated some diffs in downstreams - here they are.

Diff report:

Terraform GA: Diff ( 3 files changed, 115 insertions(+), 32 deletions(-))
Terraform Beta: Diff ( 3 files changed, 115 insertions(+), 32 deletions(-))
Ansible: Diff ( 2 files changed, 36 insertions(+), 4 deletions(-))
TF Conversion: Diff ( 1 file changed, 19 insertions(+))
TF OiCS: Diff ( 1 file changed, 21 insertions(+), 2 deletions(-))

@modular-magician
Copy link
Collaborator

Hi! I'm the modular magician. Your PR generated some diffs in downstreams - here they are.

Diff report:

Terraform GA: Diff ( 3 files changed, 139 insertions(+), 31 deletions(-))
Terraform Beta: Diff ( 3 files changed, 139 insertions(+), 31 deletions(-))
Ansible: Diff ( 2 files changed, 37 insertions(+), 4 deletions(-))
TF Conversion: Diff ( 1 file changed, 43 insertions(+))
TF OiCS: Diff ( 1 file changed, 21 insertions(+), 2 deletions(-))

@modular-magician
Copy link
Collaborator

Hi! I'm the modular magician. Your PR generated some diffs in downstreams - here they are.

Diff report:

Terraform GA: Diff ( 3 files changed, 136 insertions(+), 30 deletions(-))
Terraform Beta: Diff ( 3 files changed, 136 insertions(+), 30 deletions(-))
Ansible: Diff ( 2 files changed, 37 insertions(+), 4 deletions(-))
TF Conversion: Diff ( 1 file changed, 42 insertions(+))
TF OiCS: Diff ( 1 file changed, 20 insertions(+), 2 deletions(-))

@modular-magician
Copy link
Collaborator

Hi! I'm the modular magician. Your PR generated some diffs in downstreams - here they are.

Diff report:

Terraform GA: Diff ( 3 files changed, 135 insertions(+), 30 deletions(-))
Terraform Beta: Diff ( 3 files changed, 135 insertions(+), 30 deletions(-))
Ansible: Diff ( 2 files changed, 37 insertions(+), 4 deletions(-))
TF Conversion: Diff ( 1 file changed, 41 insertions(+))
TF OiCS: Diff ( 1 file changed, 20 insertions(+), 2 deletions(-))

@megan07
Copy link
Contributor

megan07 commented Sep 16, 2020

/gcbrun

@modular-magician
Copy link
Collaborator

Hi! I'm the modular magician. Your PR generated some diffs in downstreams - here they are.

Diff report:

Terraform GA: Diff ( 3 files changed, 135 insertions(+), 30 deletions(-))
Terraform Beta: Diff ( 3 files changed, 135 insertions(+), 30 deletions(-))
Ansible: Diff ( 2 files changed, 37 insertions(+), 4 deletions(-))
TF Conversion: Diff ( 1 file changed, 41 insertions(+))
TF OiCS: Diff ( 1 file changed, 20 insertions(+), 2 deletions(-))

Copy link
Contributor

@megan07 megan07 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you!

@megan07 megan07 merged commit b5b2ceb into GoogleCloudPlatform:master Sep 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TPU shared VPC
3 participants