intra-egress firewall rule depends on google_container_cluster.primary #1124
Comments
Honestly, this seems like a bit of an edge case. Since you already have a way to grant the correct network tag, shouldn't your existing firewall rule be able to allow the traffic?
My problem is that the firewall rule is never getting created -- there's a Terraform dependency causing it to wait for the `google_container_cluster.primary` resource. That said -- I can certainly copy these firewall rules and create them myself (without the implicit dependency on the cluster resource).
Hmm, the firewall rules were originally meant as "shadow" rules to enable logging. #741 has more context. @rux616 Since you added the TPU support in #810, can you confirm this current setup is actually working for you? I'm actually unsure why it would.
We could probably split this into a rule for the cluster overall, then a rule specific to just TPUs.
I'm currently on holiday, but will dig into this when I get back next week. Just off the top of my head though, I believe we haven't had to touch the code on our end where the cluster is defined in a while, as it has been working for us.
Since #1126 is merged (thanks to @tomasgareau!), this issue can be closed as fixed.
@apeabody ^ can we close this one?
Completed in #1126 |
Summary
The intra-cluster egress firewall rule appears to depend on `google_container_cluster.primary`. This means it will not be created until after the default node pool is created and deleted. If this node pool requires the firewall rule in order to report back to the control plane (as is the case in a private cluster in a VPC network with a default-deny-all rule), the default node pool can never get created.
Context
I'm deploying a `safer-cluster` in a VPC network that enforces a default deny-all rule. Exceptions are granted through the use of network tags.
The `safer-cluster` module includes an `add_cluster_firewall_rules` variable that adds firewall rules for this use case (in particular, one allowing intra-cluster egress, which is required for node pools to communicate back to the control plane).
Through #1118 & #1123 I added the `gke-${var.name}` network tag to the default pool so that this firewall rule would apply to it.
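For reference, a minimal sketch of how that flag can be set on the module (the surrounding variable names and range names here are illustrative placeholders, not my actual configuration):

```hcl
module "gke" {
  source = "terraform-google-modules/kubernetes-engine/google//modules/safer-cluster"

  project_id = var.project_id
  name       = var.cluster_name
  region     = var.region

  network           = var.network
  subnetwork        = var.subnetwork
  ip_range_pods     = "pods"
  ip_range_services = "services"

  # Creates the cluster firewall rules, including the intra-cluster
  # egress rule discussed in this issue.
  add_cluster_firewall_rules = true
}
```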
Expected behaviour
I would now expect the following sequence of actions:
1. The intra-cluster egress firewall rule is created.
2. The `google_container_cluster.primary` resource is created.
3. The default node pool is able to report back to the control plane (thanks to #1118 & #1123 applying the `gke-${var.name}` tag to the default pool).
Actual behaviour
1. The `google_container_cluster.primary` resource starts being created.
2. The intra-cluster egress firewall rule waits for the `google_container_cluster.primary` resource to be created.
3. The default node pool cannot report back to the control plane, so the cluster creation never completes.
Suspected root cause
fff0078 added TPU support to beta clusters. Part of this change included adding `google_container_cluster.primary.tpu_ipv4_cidr_block` to the `destination_ranges` of the `gke-intra-egress` firewall rule:
terraform-google-kubernetes-engine/autogen/main/firewall.tf.tmpl, lines 37 to 43 in fff0078
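For illustration, this is roughly the shape of that rule; the attributes and locals below are simplified placeholders rather than the template's exact contents:

```hcl
resource "google_compute_firewall" "intra_egress" {
  name        = "gke-${var.name}-intra-cluster-egress"
  project     = local.network_project_id
  network     = var.network
  direction   = "EGRESS"
  target_tags = ["gke-${var.name}"]

  # Referencing an attribute of google_container_cluster.primary gives this
  # rule an implicit dependency on the cluster, so Terraform will not create
  # the rule until the cluster (and its default node pool) already exists.
  destination_ranges = compact([
    local.cluster_subnet_cidr,
    local.cluster_alias_ranges_cidr,
    google_container_cluster.primary.tpu_ipv4_cidr_block,
  ])

  allow {
    protocol = "all"
  }
}
```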
I suspect this creates a Terraform dependency on the `google_container_cluster.primary` resource. In my case, I have not enabled TPU (and have no need for it). I reverted fff0078 on my test branch and was able to bring up my `safer-cluster` instance successfully (i.e., the intra-cluster egress firewall rule was created before the default pool).
Potential fix
This could maybe be addressed by creating two firewall rules -- one that can be created before the cluster (without the TPU IPv4 block) and one that is created after the cluster (for the TPU IPv4 block). I don't know if this is zooming in too narrowly on my use case, however (-:
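As a rough sketch of that split, assuming the TPU range is only needed when TPUs are enabled (the names, locals, and `enable_tpu` variable below are illustrative, not the module's actual code):

```hcl
# Rule 1: only ranges that are known before the cluster exists, so Terraform
# can create it first and the default node pool can reach the control plane.
resource "google_compute_firewall" "intra_egress" {
  name        = "gke-${var.name}-intra-cluster-egress"
  network     = var.network
  direction   = "EGRESS"
  target_tags = ["gke-${var.name}"]

  destination_ranges = compact([
    local.cluster_subnet_cidr,
    local.cluster_alias_ranges_cidr,
  ])

  allow {
    protocol = "all"
  }
}

# Rule 2: only the TPU range, which is not known until the cluster exists,
# so the implicit dependency on google_container_cluster.primary is confined
# to this optional rule.
resource "google_compute_firewall" "intra_egress_tpu" {
  count       = var.enable_tpu ? 1 : 0
  name        = "gke-${var.name}-intra-cluster-egress-tpu"
  network     = var.network
  direction   = "EGRESS"
  target_tags = ["gke-${var.name}"]

  destination_ranges = [google_container_cluster.primary.tpu_ipv4_cidr_block]

  allow {
    protocol = "all"
  }
}
```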