Skip to content
This repository has been archived by the owner on Feb 5, 2020. It is now read-only.

gcp basis #1918

Merged
merged 25 commits into from
Sep 28, 2017
Merged

gcp basis #1918

merged 25 commits into from
Sep 28, 2017

Conversation

enxebre
Copy link
Member

@enxebre enxebre commented Sep 15, 2017

This is a follow up for the work started here #531

  • It allows to consistently recreate and destroy a functional Tectonic cluster on Google Cloud Compute with three masters, three etcd nodes and TLS support.
  • It allows to consistently deploy and access a simple app in the form of pod + service + ingress.
  • It adds basic smoke testing for GCP
  • It runs with Terraform v0.10.4 and use all the common/cloud abstracted modules in line with the other providers.

@coreosbot
Copy link

Can one of the admins verify this patch?

2 similar comments
@coreosbot
Copy link

Can one of the admins verify this patch?

@coreosbot
Copy link

Can one of the admins verify this patch?

Copy link
Contributor

@s-urbaniak s-urbaniak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great so far, thanks! Just a few comments/refactoring suggestions. 👍

@@ -0,0 +1,45 @@
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mind to move all dns related bootstrapping to modules/dns? That way we make the DNS story swappable, as this is requested by some users.

Copy link
Member Author

@enxebre enxebre Sep 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @s-urbaniak thanks for the comments! I'm looking at the refactoring stuff but may be better leave it for a follow up PR? This is something functional now and refactoring can lead to difficult to anticipate dependency issues which can affect to other clouds which can lead to other problems out of the scope of having an "alpha" gcp version in place which should be the target here. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to avoid platform skew as much as possible. We loosened constraints in the past for other platforms and it turned out to be a big mistake, especially in the light of alpha platforms. Being an alpha platform we can loosen the constraint on test coverage but I would like to avoid to loosen the constraint on general architectural compliance.

More specifically we are working on enabling optional modules. If we leave GCP behind (as we already did with modules/aws which still has its internal DNS implementation) re-priorization of tasks could slip tech-debt.

tldr: I'd really advocate for having dns logic in modules/dns/gcp. Sorry if this complicates things.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

--initial-advertise-peer-urls=${var.tls_enabled ? "https" : "http"}://${var.cluster_name}-etcd-${count.index}.${var.base_domain}:2380 \
--listen-client-urls=${var.tls_enabled ? "https" : "http"}://0.0.0.0:2379 \
--listen-peer-urls=${var.tls_enabled ? "https" : "http"}://0.0.0.0:2380
EOF
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to myself: we really need to unify this. We already have done this for the kubelet, but etcd is a complex service and needs one place to be configured. Leave it here for now, we will tackle this as a follow-up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS: this is already planned for the next sprint.

value = ["${split(",", length(var.external_endpoints) == 0 ? join(",", google_dns_record_set.etc_a_node.*.name) : join(",", var.external_endpoints))}"]
}

# vim: ts=2:sw=2:sts=2:et:ai
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no strong feelings, but I'd like to keep editor-specific metadata out of the files.

base_instance_name = "mstr"
}

# vim: ts=2:sw=2:sts=2:et:ai
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see nit above

value = ["${google_compute_instance_group_manager.tectonic-master-igm.*.instance_group}"]
}

# vim: ts=2:sw=2:sts=2:et:ai
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...


[Install]
WantedBy=multi-user.target
RequiredBy=bootkube.service kubelet-env.service
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a carbon-copy of modules/aws/master-asg/resources/services/init-assets.service. As we use the very same thing here, this becomes a candidate for unification.

Can you please move this to modules/ignition/resources/services and dependency inject it as the other ignition artifacts?

port_range = "443"
}

resource "google_dns_record_set" "api-external" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above for etcd. let's bootstrap all dns records in modules/dns/google.

@@ -0,0 +1,42 @@
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a naming nit/question: in platforms we call it platforms/gcp, in modules we call this modules/google. seems impedance mismatch'y a little bit, no?

@coreosbot
Copy link

Can one of the admins verify this patch?

2 similar comments
@coreosbot
Copy link

Can one of the admins verify this patch?

@coreosbot
Copy link

Can one of the admins verify this patch?

s-urbaniak
s-urbaniak previously approved these changes Sep 19, 2017
Copy link
Contributor

@s-urbaniak s-urbaniak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM on first eye-sight, but referring to @alexsomesan for a final opinion.

Copy link
Contributor

@alexsomesan alexsomesan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a partial look and am putting down some of the comment while I go through the rest of it, so we can speed up the process.

|------|-------------|:----:|:-----:|
| google_managedzone_name | GCP resource name of Cloud DNS ManagedZone - created outside of Tectonic | string | `my-managed-zone` |
| tectonic_dns_prefix_name | DNS prefix used to construct the console and API server endpoints. | string | `` |
| tectonic_etcd_nodes_max | Upper limit for auto-scaling, per zone. | string | `2` |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm missing something this is not being used anywhere in the current change set.

Also, is this intended to configure autoscaling for etcd nodes? Can't really make that out from the description.

| Name | Description | Type | Default |
|------|-------------|:----:|:-----:|
| google_managedzone_name | GCP resource name of Cloud DNS ManagedZone - created outside of Tectonic | string | `my-managed-zone` |
| tectonic_dns_prefix_name | DNS prefix used to construct the console and API server endpoints. | string | `` |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a GCP specific setting. It should be infixed with _gcp_ just like the other ones introduced in this file.


| Name | Description | Type | Default |
|------|-------------|:----:|:-----:|
| google_managedzone_name | GCP resource name of Cloud DNS ManagedZone - created outside of Tectonic | string | `my-managed-zone` |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why doesn't this one have a tectonic_gcp_ prefix?

All installer inputs have to be namespaced (at least) under tectonic_. If it refers to an externally provisioned resource we usually denote that by including something like _external_ or _ext_.

See here, for example: https://github.com/coreos/tectonic-installer/blob/master/platforms/aws/variables.tf#L80

| tectonic_gcp_worker_disktype | The type of disk (pd-standard or pd-ssd) for the worker nodes. | string | `pd-standard` |
| tectonic_gcp_worker_gce_type | Instance size for the worker node(s). Example: `n1-standard-2`. | string | `n1-standard-2` |
| tectonic_gcp_zones | List of two or more zones to use from specified GCP region. | list | `<list>` |
| tectonic_masters_max | Upper limit for auto-scaling, per zone. | string | `2` |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_gcp_ platform infix? Same for the one below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @alexsomesan thanks for pointing that out, just removed the stale variables and fixed namespacing

Copy link
Contributor

@alexsomesan alexsomesan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I went through the rest of it and left some more comments.

One general question on top: how does SSH-ing into the nodes work? How would the user specify the key? We need that option for debugging the cluster.

description = "The CIDR range to use for the subnetwork for etcd."
}

variable "tectonic_gcp_network_etcd_loadbalancer_ip" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to be a user-facing input?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stale variable, thanks!

description = "The CIDR range to use for the subnetwork for Workers."
}

variable "tectonic_gcp_network_etcd_cidr_range" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to be a user-facing input?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

description = "List of two or more zones to use from specified GCP region."
}

variable "tectonic_gcp_network_masters_cidr_range" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to be a user-facing input?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, don't really need to. This also might be taken into consideration on how we handle the zones. I removed the user facing var and hardcoded the value on the module input. In line with my comment above, keeping things simple and explicit for now

description = "The GCP region to use. Some regions only have 2 zones."
}

variable "tectonic_gcp_zones" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you rather use this here and not have this as an input?
Also we'd rather not have default values here, for the same reason as the region parameter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @alexsomesan just removed default values. This is related to how we handle the number of instances, the instance group managers here https://github.com/coreos/tectonic-installer/pull/1918/files#diff-00091bc50a0d600c32caba6d18409955R53, how we'll want to handle them once this lands in terraform hashicorp/terraform-provider-google#394 and also to adding autoscaling. I'd rather leave it explicit for now as it makes things more obvious, simple and easy, we can complement it later in conjunction with those other factors. what do you think?


variable "tectonic_gcp_region" {
type = "string"
default = "us-central1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're deliberately not setting any default for regions in any of the other platforms. We want this to be a deliberate input from the user. This is to raise the user's awareness about where their resources are going to be created.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @alexsomesan make sense, just removed the default. Raising a warning here though as it seems we are actually currently setting the default for aws https://github.com/coreos/tectonic-installer/blob/master/platforms/aws/variables.tf#L262

"tectonic_gcp_project_id": "project-id",
"tectonic_gcp_region": "us-central1",
"google_managedzone_name": "managedzone-name",
"tectonic_base_domain": "tectonic.dev.coreos.systems",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This domain is already registered in Route53 for the AWS installer.
Does this actually also work here? Does it not need to be registered in Google Cloud NS?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither "project-id" or "managedzone-name" exist yet. I set those values as a sample, proper values will need to be set later when hooking into CI system

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks for making that clear. Makes sense now.

@enxebre
Copy link
Member Author

enxebre commented Sep 22, 2017

Hey @alexsomesan I'm relying on gcloud for ssh into the machines for now which handle the ssh keys on demand. https://cloud.google.com/compute/docs/instances/connecting-to-instance#gcetools
i.e gcloud compute ssh [INSTANCE_NAME]

alexsomesan
alexsomesan previously approved these changes Sep 25, 2017
Copy link
Contributor

@alexsomesan alexsomesan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this now looks in shape to get in.
We can address eventual minor issues that come out of testing as they arise.

@alexsomesan
Copy link
Contributor

ok to test

limitations under the License.
*/

provider "google" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To keep things aligned with the other platforms and allow the CI tool to work consistently, we should only inject the region via config variables.
credentials and project should come from the corresponding Terraform / SDK environment variables.
(See here and here)

@alexsomesan
Copy link
Contributor

one test timed out - looks unrelated
we can merge

@alexsomesan alexsomesan merged commit 28c95c5 into coreos:master Sep 28, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants