Add support for K8s pod #13571
Conversation
Hi @radeksimko,
So when the user specifies some volumes to be mounted in the "tf" file, what comes back from the API read is a Pod that has 2 volumes: the one the user specified and the one k8s adds by default. So I was thinking of looping over the user's volumes in the tf configuration and just checking that each one is present in the volumes returned by the API during the Read phase, without setting them back into the schema (as the schema already has the user input). Is there any other way to deal with cases where some resources get attached/created over and above the user input?
Hi @ashishth09
thank you very much for the PR. It was nearly on the top of my todo list. 😄
As you might know this is the biggest resource (in terms of LOC) and possibly the most critical one in K8S. I'll try to keep reviewing changes you make in a timely manner, but expect a few more feedback cycles with this PR as we get closer to merge time 😃
Overall the PR is looking like a pretty good start 👍 Out of interest - did you do this all by hand or did you use https://github.com/radeksimko/terraform-gen ?
So, I was thinking to loop over the user input of volumes in the tf configuration and just check it is present in the volumes obtained from the API in the Read operation phase without setting back the schema ( as schema already has the user input).
That is pretty much what we'll have to do. I did something similar for internal K8S annotations already. The filtering shouldn't be too hard; the tricky part is to reliably check whether a given volume or container is "internal" while avoiding ignoring any user-defined volumes and containers.
What about first filtering all volumes with a name prefixed with default-token- and Type=Secret, and then double-checking via the Secrets API that the secret has this annotation:
kubernetes.io/service-account.name: default
The user shouldn't ever be defining such an annotation - at least we don't allow internal k8s annotations to be defined by the user in kubernetes_secret, and users generally shouldn't be touching annotations unless it's for experiments with alpha features.
Also once we're happy with the implementation we'll also need some docs. Feel free to use the mentioned generator for bootstrapping it.
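A minimal sketch of that filtering idea, assuming the vendored v1 API types (import path assumed) and a hypothetical getSecret helper standing in for the Secrets API lookup - illustrative only, not the PR's actual code:

import (
    "strings"

    api "k8s.io/kubernetes/pkg/api/v1" // assumed vendored import path
)

// isInternalVolume reports whether a volume looks like the default service-account
// token volume that the kubelet attaches automatically. getSecret is a hypothetical
// helper wrapping a Secrets API read.
func isInternalVolume(v api.Volume, namespace string, getSecret func(ns, name string) (*api.Secret, error)) bool {
    if !strings.HasPrefix(v.Name, "default-token-") || v.Secret == nil {
        return false
    }
    s, err := getSecret(namespace, v.Secret.SecretName)
    if err != nil {
        return false
    }
    // Only treat it as internal if the secret belongs to the default service account.
    return s.ObjectMeta.Annotations["kubernetes.io/service-account.name"] == "default"
}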
}
in := p[0].(map[string]interface{})

if v, ok := in["active_deadline_seconds"].(*int64); ok {
I think the expected type here is int (without the pointer).
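For illustration, a hedged sketch of that expander with the plain int assertion, converting to the pointer the API field expects (the helper name is hypothetical):

// expandActiveDeadlineSeconds shows the plain int assertion; the K8S API field
// ActiveDeadlineSeconds is *int64, so we convert and take a pointer.
func expandActiveDeadlineSeconds(in map[string]interface{}) *int64 {
    if v, ok := in["active_deadline_seconds"].(int); ok && v > 0 {
        ads := int64(v)
        return &ads
    }
    return nil
}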
    Elem: &schema.Resource{
        Schema: containerSchema(),
    },
},
It looks like the pod spec schema is missing a few fields, specifically init_containers, host_pid and security_context. Is there any specific reason you left these out?
att["active_deadline_seconds"] = *in.ActiveDeadlineSeconds | ||
} | ||
if in.DNSPolicy != "" { | ||
att["dns_policy"] = in.DNSPolicy |
I think we'll need to cast to string here.
Looks like Terraform core is somehow taking care of serializing the right type? It seems to be working magically. I will check this again and add the cast.
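For reference, the explicit cast being suggested would look roughly like this (DNSPolicy is a string-based type in the K8S API, so the conversion is trivial and makes the stored type unambiguous):

if in.DNSPolicy != "" {
    att["dns_policy"] = string(in.DNSPolicy)
}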
att["node_selector"] = in.NodeSelector | ||
} | ||
if in.RestartPolicy != "" { | ||
att["restart_policy"] = in.RestartPolicy |
As above - we'll need to cast to string.
https://github.com/kubernetes/kubernetes/blob/v1.5.3/pkg/api/v1/types.go#L2035
if len(in.NodeSelector) > 0 {
    att["node_selector"] = in.NodeSelector
}
I think this is duplicate code - we handle NodeSelector already above.
        }
    }
}
`, name)
This is a good start, but we'll certainly need better test coverage - especially because the pod is such a critical resource in the context of K8S.
A few examples on my mind:
TestAccKubernetesPod_withVolumes
TestAccKubernetesPod_withVolumeMounts
TestAccKubernetesPod_withInitContainers
We'll both likely come up with more as we test this and observe some failure scenarios.
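A rough skeleton of what one of those acceptance tests could look like; testAccKubernetesPodConfig_withVolumes, testAccCheckKubernetesPodExists and the other helpers are assumed to live in the provider's test package rather than shown here:

import (
    "fmt"
    "testing"

    "github.com/hashicorp/terraform/helper/acctest"
    "github.com/hashicorp/terraform/helper/resource"
)

func TestAccKubernetesPod_withVolumes(t *testing.T) {
    name := fmt.Sprintf("tf-acc-test-%s", acctest.RandString(10))

    resource.Test(t, resource.TestCase{
        PreCheck:     func() { testAccPreCheck(t) },
        Providers:    testAccProviders,
        CheckDestroy: testAccCheckKubernetesPodDestroy,
        Steps: []resource.TestStep{
            {
                Config: testAccKubernetesPodConfig_withVolumes(name),
                Check: resource.ComposeTestCheckFunc(
                    testAccCheckKubernetesPodExists("kubernetes_pod.test"),
                    resource.TestCheckResourceAttr("kubernetes_pod.test", "metadata.0.name", name),
                ),
            },
        },
    })
}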
if v.PersistentVolumeClaim != nil {
    obj["persistent_volume_claim"] = flattenPersistentVolumeClaimVolumeSource(v.PersistentVolumeClaim)
}
// More values needed here
Have a look at flattenPersistentVolumeSource - I think we'll need something similar here, e.g.

if v.GCEPersistentDisk != nil {
    obj["gce_persistent_disk"] = flattenGCEPersistentDiskVolumeSource(v.GCEPersistentDisk)
}
// Is this the right way to do this?
// p := expandPersistentVolumeSource([]interface{}{v})
// vl[i].AWSElasticBlockStore = p.AWSElasticBlockStore
// vl[i].AzureDisk = p.AzureDisk
I think we won't be able to leverage expandPersistentVolumeSource here because it's a different type (Volume vs PersistentVolumeSource) and the VolumeSource is embedded directly within Volume.
The best way to do this is probably to steal some code from there and copy it here. Fortunately all of the source-specific expanders should work just fine, i.e. I'd expect something like this:

if v, ok := in["gce_persistent_disk"].([]interface{}); ok && len(v) > 0 {
    vl[i].GCEPersistentDisk = expandGCEPersistentDiskVolumeSource(v)
}
}
vl := make([]api.Volume, len(volumes))
for i, c := range volumes {
    v := c.(map[string]interface{})
Just for clarity it would be nice to call it m (as in "map") rather than v. That frees up the variable name below, where you can use it as "value". While "value" is not a great variable name either, the map here is more significant because it is used on more lines - as opposed to 3-line conditionals where it matters less.
},
"containers": {
    Type:     schema.TypeList,
    Optional: true,
It looks like this field cannot be updated, so I think we'll need to mark it as such via ForceNew: true.
Deployments, replication controllers etc. may reuse the pod's schema, and they do allow containers to be updated - sort of: internally they destroy and create a new set of containers. So I thought it would be better to leave it as it is, because adding ForceNew and reusing the same schema in a deployment would lead to the deployment being destroyed. I am not sure how you would typically handle this - perhaps by not making the deployment depend on the pod schema? That would lead to a lot of code duplication though.
🤔 I'm not sure I understand entirely, but either we need to perform a real update/patch API operation for this field or mark it as ForceNew - there is no other expectation from the user: either they want the field updated or the resource recreated, certainly not ignored.
I will bump the kubernetes_replication_controller resource up in priority to understand better how it is affected by the pod in this context, but I won't get to it until Friday as I'm off to Google Next on Wed+Thu.
Some food for thought:
I wonder if it's possible to reference a previously created PodTemplate via ObjectMeta:
https://github.com/kubernetes/kubernetes/blob/v1.5.3/pkg/api/types.go#L1968
Not that we could avoid the coupling between a potential kubernetes_pod_template and kubernetes_pod, but at least it would be more obvious for these resources, and more importantly recreation wouldn't hurt if we could then just update the reference in kubernetes_replication_controller rather than the whole spec.
Recreating the whole RC just to update the PodSpec is indeed overkill - I agree with you here.
Hi @radeksimko, I was waiting for your reply 😄 In the meantime I fixed most of the issues you mentioned. Right, testing is going to be key here - the schema has tons of fields. In some cases I skipped fields, as you mentioned; I was just curious to see this up and running with the very basic things working first, no other reason to leave them out 😉 I will update the PR soon with the new changes. Thanks for the inputs and pointers. I didn't know about https://github.com/radeksimko/terraform-gen - it is going to be a great tool to use 👍
Hi @radeksimko,
Hi @ashishth09, I was following the source code of the vendored K8S code and that does contain it. My Google-fu allowed me to discover https://kubernetes.io/docs/concepts/workloads/pods/init-containers/, which indicates the feature is not stable yet, so feel free to leave this one out as we only aim to cover stable fields/resources - sorry for the confusion.
@ashishth09 I just spent some time reading more about Pods & PodTemplates & RCs and you're right in the sense that containers cannot be added to nor removed from an existing pod, whereas RCs can update the attached pod with pretty much no limitations. In addition to that, the schema library doesn't have a way to express this behaviour cleanly (i.e. we have not implemented "ForceNewWhen" yet). To keep things simple I'd just fall back to the API validation and let invalid updates (ones which add or remove whole containers) pass through. I will go through the PR again and see if there's anything left. Based on your recent updates I'm assuming it's ready for another round of review.
This is starting to look better 🙂
Besides what I mentioned, I see that both acceptance tests are currently failing.
Let me know if you have any questions or need any help.
Thank you for all the work so far.
    Default:     false,
    Description: `Run container in privileged mode. Processes in privileged containers are essentially equivalent to root on the host.`,
},
"read_only_root_filesystem ": {
I think there's a forgotten trailing space here.
}
log.Printf("[INFO] Pod %s created", out.Name)

d.SetId(buildId(out.ObjectMeta))
It would be safer to keep this line above the retry logic so that if the pod fails to reach "Running" for any reason we can keep it in the state and let the user remove it.
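A hedged sketch of the ordering being suggested here; podPhaseRefreshFunc is a hypothetical refresh helper and the surrounding create function is elided:

log.Printf("[INFO] Pod %s created", out.Name)

// Record the ID before waiting: if the pod never reaches "Running",
// it still ends up in state and the user can inspect or destroy it.
d.SetId(buildId(out.ObjectMeta))

stateConf := &resource.StateChangeConf{
    Pending: []string{"Pending"},
    Target:  []string{"Running"},
    Timeout: 5 * time.Minute,
    Refresh: podPhaseRefreshFunc(conn, out.Namespace, out.Name), // hypothetical helper
}
if _, err := stateConf.WaitForState(); err != nil {
    return err
}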
namespace, name := idParts(d.Id())
ops := patchMetadata("metadata.0.", "/metadata/", d)
if d.HasChange("spec") {
    specOps, err := patchPodSpec("/spec", "spec", d)
Would you mind including the 0th index in the prefix too, so that we don't hide it in the implementation? It may sound like a nitpick, but it's also to keep patchPodSpec reusable just in case we'd change the parent schema in any way in the future.
    return fmt.Errorf("Failed to marshal update operations: %s", err)
}

log.Printf("[INFO] Updating pod%s: %s", d.Id(), ops)
Missing space here. 👓
    return err
}

err = d.Set("spec", flattenPodSpec(pod.Spec, userSpec))
I'm not sure I understand this - why exactly do you need the config (userSpec) here? We should be able to filter out the internal volumes without it, I think - unless I'm missing something 🤔
obj := map[string]interface{}{}

if !userVolumeNames[v.Name] {
    // Found a volume which was not provided by the user in the tf configuration file and must be internal
This is a bit of a tricky situation. It is a convention and an expectation from the user that Terraform will detect any drift of the config from reality - i.e. if someone adds a volume outside of Terraform (e.g. by calling the K8S API directly or just by using kubectl), we need to be able to show that to the user.
With the current implementation we'd hide it, as we're treating the config as the single point of truth.
We would also be wiping those internal volumes during each update, which makes me think that DiffSuppressFunc would be a good candidate for this - instead of having the filtering logic in the flattener and expander.
Is there any reason you want to avoid the proposed solution I mentioned earlier?
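For context, a very rough sketch of the DiffSuppressFunc idea. The function and key checks are illustrative - the exact keys depend on how the schema ends up being nested, so treat this as a starting point rather than a drop-in implementation:

// suppressInternalVolume hides diffs caused by the kubelet-managed default-token
// volume instead of filtering it out in the flattener/expander.
func suppressInternalVolume(k, old, new string, d *schema.ResourceData) bool {
    // Suppress the "volume is being removed" diff for default token volumes
    // that the user never declared in the config.
    if strings.HasSuffix(k, ".name") && strings.HasPrefix(old, "default-token-") && new == "" {
        return true
    }
    return false
}

It would then be attached to the relevant schema attribute via DiffSuppressFunc.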
I had this working locally on my laptop before I asked for help 😄, so I am yet to implement the logic you suggested. That would also take care of the userSpec question in the review comment above.
    return vl, nil
}

func expandPodTemplateSpec(p []interface{}) (v1.PodTemplateSpec, error) {
As far as I can see this function is currently unused? 👀 I'm happy for you to help with the implementation of RC 😃, but I'd prefer if we kept this PR focused only on the Pod.
        Value: v.Image,
    })
}
I'm assuming there's still some work left to do in terms of missing fields?
I think we can only update active_deadline_seconds and a container's image in a Pod.
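If those really are the only updatable fields, the update path can stay small. A hedged sketch of the kind of JSON patch operations involved, assuming the provider's PatchOperations/ReplaceOperation helpers (names assumed) and the attribute names used at this point in the PR:

ops := make(PatchOperations, 0)

if d.HasChange("spec.0.active_deadline_seconds") {
    ops = append(ops, &ReplaceOperation{
        Path:  "/spec/activeDeadlineSeconds",
        Value: int64(d.Get("spec.0.active_deadline_seconds").(int)),
    })
}
if d.HasChange("spec.0.containers.0.image") {
    ops = append(ops, &ReplaceOperation{
        Path:  "/spec/containers/0/image",
        Value: d.Get("spec.0.containers.0.image").(string),
    })
}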
    es = append(es, fmt.Errorf("%s must be greater than 0", key))
}
return
}
Since they're all the same, how would you feel about merging all the above validation functions into one and calling it e.g. validatePositiveInteger?
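A minimal sketch of what that merged validator could look like, matching the error message in the snippet above (exact naming up to you):

func validatePositiveInteger(value interface{}, key string) (ws []string, es []error) {
    v, ok := value.(int)
    if !ok {
        es = append(es, fmt.Errorf("%s must be an integer", key))
        return
    }
    if v <= 0 {
        es = append(es, fmt.Errorf("%s must be greater than 0", key))
    }
    return
}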
Hi @radeksimko, thanks for the review comments. Right, there is still some work to do in terms of missing fields - basically the container ones now; I think I have the Pod-level ones covered, but I will recheck. I think I have those tests passing though 😄 - I am running them against minikube on my system.
The tests are certainly & consistently failing on GKE using the test setup from https://github.com/hashicorp/terraform/blob/master/builtin/providers/kubernetes/test-infra/main.tf - see https://gist.github.com/radeksimko/ee386a97e5cd809fe44b03fe373c1265
@radeksimko, thanks for putting that together. I will take a look.
I see the following lines in the gist. The test case doesn't set those values; I need to see if there are some default values getting set internally by k8s 😃 based on some rules at the cluster level itself. For requests I am unsure though - I don't think k8s will set any default value for requests, as those should come from the user.
Hi @radeksimko, so I created a limit range at the cluster level on some namespace and ran the test in the same namespace, and I was able to re-create the issue. Now the problem is how to solve it 😃 Resources can have limits and requests fields which can get default values if someone has created a limits resource on the cluster. Container resources can't be updated for a Pod, so no one can add/remove resources for a container outside Terraform either - hence no drift can occur from outside. So should I just check whether the input value for resources in the tf file is the same as what we get when reading the pod back, and skip persisting it if it differs?
Hi @radeksimko,
If image_pull_secrets is not defined in the user's tf file but someone has defined the secret at the cluster level, then when we read back we get those results, which we should ignore. So at this point we need to handle that case.
My first thought is to just persist them if they are there in the tf file and ignore them otherwise.
Hi @ashishth09 👋
The fact that a part of a resource is not updatable doesn't mean that a drift cannot occur. Someone can still recreate the resource under the same name with different attributes. More importantly, we usually stick to this convention (of always detecting any possible drift) because it's how the most basic import mechanism (passthrough) works - it just calls the resource's Read function. See how we verify this behaviour in other resources.
For all of the cases where a drift may occur because there are some internal "subresources", the preferred approach is to detect such internal subresources and filter those out without checking any user config. In this specific case I'd suggest we look up some details about the volume, as mentioned earlier. I hope it makes sense 😃
Hi @radeksimko, sure, I get your drift (pun intentional 😃). Before proceeding any further to fine-tune things, I will need your review of image_pull_secrets and the resource_requirements of the containers. From what I see, image_pull_secrets gets the value you define, or, if you don't define one, whatever the service account has defined - so I made it optional + computed. For the container resource requirements (https://kubernetes.io/docs/api-reference/v1.5/#resourcerequirements-v1) it's similar: if there is any limit range defined on the cluster then the default values from that can be assigned as the resource requirements of the containers, else whatever the user provides is picked. So I made the individual fields optional + computed as well.
Hah, drift 😄 I will have a look tomorrow - I was working on a resource with a similar problem and I really struggled to reliably identify what's default (machine-managed) and what may come from the user, so I totally feel your pain.
Hi @ashishth09, the cleanest approach which I think we should take is to enforce a new flag when creating the pod. Unfortunately it was added in 1.6, so I'll need to bump the vendored k8s repo to support this - I originally thought that upgrade could be avoided for a while longer. I will hold off on submitting Service Account because of the same problem, and RC because it will reuse part of the schema from your PR here. I will see if I can finish that upgrade soon. Thank you for the effort so far & for the patience. I hope it all makes sense? 😅
^ That is to address the secrets drift - I haven't looked into the other problem you mentioned about limit ranges yet.
I think by enforcing that flag we are letting something go, though - enforcing it would only work for "Mountable secrets", as seen in the output below.
So on a pod, I need to know whether an image_pull_secret is internal or not, and I don't think we can find that out without taking the user input into consideration.
Case 1) The user specifies ["myregistrykey"]. We persist it even though it matches the service account's "Image pull secrets".
Case 2) The user specifies ["abcde"]. That secret exists in the same namespace but it is not part of the current service account. We need to persist it.
Hi @radeksimko, I think I will take this opportunity to learn the correct usage of optional and computed both set to true on a schema parameter. My understanding is that if something could be assigned by the server side of the API even when the user doesn't provide a value, then it should be optional + computed in the schema. Is the image pull secrets field above not really one? I mean, if I don't provide one in the tf file then one gets assigned if one is set at the service account level; otherwise it takes what the user provides. So I don't have to worry if something like "myregistrykey" shows up in the Terraform state even though the tf configuration file doesn't have it, because I made it optional + computed. Thanks
@ashishth09 you're right, but this rule only applies if the optional user input replaces the default value completely, not when the default value is something you add on top of. There's no such thing as "partially optional" - like half of the map/list being computed and half defined by the user. If a field is Optional + Computed, the whole value either comes from the user or from the server, never a mix of both.
In other words - even if you specify your own image pull secrets, AFAIK the kubelet will assign the default secrets anyway, and you'd then be fighting with the kubelet, wiping that default secret in every subsequent apply as it keeps appearing in the diff. Correct me if I'm wrong, but I remember looking at that specific behaviour a few weeks back and it behaved that way.
Re my current state - I'm dealing with vendoring issues of k8s 1.6.4, but overall the resource coverage (after adding Service + HPA) is now as far as we can get without this upgrade, so this upgrade is my main focus.
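To make that concrete, a hedged sketch of an Optional + Computed field in the schema (the field layout here is illustrative, not necessarily the PR's exact shape): if the user omits it, the server-assigned value is stored as-is; if the user sets it, their value is expected to fully replace whatever the server returns.

// Either the user supplies the whole list, or the whole list is read back
// from the API - never a mix of user-defined and server-assigned entries.
"image_pull_secrets": {
    Type:     schema.TypeList,
    Optional: true,
    Computed: true,
    Elem: &schema.Resource{
        Schema: map[string]*schema.Schema{
            "name": {
                Type:     schema.TypeString,
                Required: true,
            },
        },
    },
},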
Sure, I also got very little time in the last week to make any progress. It picked up that secret only, rather than the default secret defined on the service account - there was only one secret, the one given by the user. So it looks like it really is optional + computed, holding only one of them at a time. Thanks for confirming the definition 😃 While testing I also found a bug in my code; I will push that today.
While we are at it 😃 (and I am being selfish to learn things from you), I was wondering how optional + computed fields behave. Since the field is computed, if someone changes it outside Terraform we will not know about the change when that field is not present in the tf file. Is that fine? I sense some drift occurring here.
If the field is not defined in the config then yes - it will behave exactly that way. That's why it's best to avoid Computed if it's not necessary and/or use DiffSuppressFunc where appropriate.
FYI - just opened #14923 and also verified that it actually fixes our problem with the pod here. Feel free to subscribe to that PR to get notified when it's merged - then you can rebase and avoid all the complex logic for mounted secrets, and then we can have the discussion & debugging around the limit ranges. Also, you were right that this doesn't solve the similar problem with the service account. I have just raised a question about that on SO: https://stackoverflow.com/questions/44262350/how-to-avoid-default-secret-being-attached-to-serviceaccount but I have a feeling it may not be possible and we'll probably end up filing an issue upstream (K8S).
@ashishth09 The PR upgrading to 1.6.1 was merged and I'd like to get this PR to the finish line - and eventually help you take it there next week, if possible. Let me know how you see it. 😃 If you don't have any time next week, that's 👌 as well, just let me know and I can finish it for you (and keep the authorship of existing commits, obvs).
Hi @radeksimko, two things (apart from some new fields in the new API) still feel pending to me. I would be interested to know how we handle the limit range issues though - I had made those fields computed.
I will take care of the extra updatable fields, but I'm struggling to reproduce the issue with limit ranges 🤔 It looks like the behaviour has changed in 1.6, or I forgot what exact configuration is needed to trigger the problem. I tried this:

resource "kubernetes_namespace" "test" {
  metadata {
    name = "blablah"
  }
}

resource "kubernetes_pod" "test" {
  metadata {
    name      = "radek-pod-test"
    namespace = "${kubernetes_namespace.test.metadata.0.name}"
  }

  spec {
    containers {
      image = "nginx:1.7.9"
      name  = "containername"
    }
  }

  depends_on = ["kubernetes_limit_range.test"]
}

resource "kubernetes_limit_range" "test" {
  metadata {
    name      = "radek-test-lr"
    namespace = "${kubernetes_namespace.test.metadata.0.name}"
  }

  spec {
    limit {
      type = "Pod"
      min {
        cpu    = "1m"
        memory = "1M"
      }
    }
    limit {
      type = "Container"
      default_request {
        cpu    = "50m"
        memory = "24M"
      }
    }
  }
}

and saw no drift. Would you mind sharing any more details about your setup and/or your configs? Thanks
If there is a default request/limit set at the cluster level then K8s applies that to the Pod's resource requirements; if the Pod defines its own requirements then K8s applies those instead. Following this, I set limits/requests to be computed and the problem was gone. The change I made is in terraform/builtin/providers/kubernetes/schema_container.go, lines 92 to 143 in 16ef37d.
I wanted your view on this change - whether you think that's the right way to do it or not 😃 My apologies for not being clear on the topic 😞
@ashishth09 As far as I can tell limit ranges are namespace-specific. What do you mean by cluster-level limits and how does one set such limits?
Yes, you are right. The points I mentioned above are applicable to the namespace - any new pods created in the namespace go through this rule. If default limits are set and you have not defined any in the Pod, then those are what get attached to the pod; otherwise the Pod's own limits are honored. This is what I observed. I am not sure what happens when more than one default limit is defined in a namespace, or how K8s resolves that.
I'd just like to understand how to reproduce the problem and possibly add a regression test - that's all 😄 - hence I'm wondering how you set such default cluster-wide limits. Is there any example you could share?
No, there are no cluster-wide limits - as the link you pasted in the previous comment says, it is at the namespace level 😃 When I first wrote the code I didn't know that the system was applying default limits to the Pod, so to have a similar environment at my end I applied https://k8s.io/docs/tasks/configure-pod-container/limits.yaml to my local cluster. So you can't reproduce the same problem now 😃 Setting the value as computed took care of it.
I think we would need to add some tests around this as well, and also around how things behave when multiple limits are defined on the same namespace, as pointed out before. In general I think we might not need to worry about this, but some test cases covering the internal unit conversion should be there.
Just to add to that: I had no test specifically for limits initially, and still that limits issue came up, which you had shown in your gist. I was not setting any values in my test configuration (there was no limits block), but k8s set the default value and hence Terraform reported the diff.
This was certainly the right decision and I saw a test covering this, so that's 👌 I cannot reproduce the issue with drifting limits anymore, but I will believe you as I saw that in the past too 😄, so I'll keep it as is. TL;DR: if the field doesn't need to be computed, it should not be.
I have pushed a few final touches to your branch. A pod with more than one container can now be written like this:

resource "kubernetes_pod" "test" {
  metadata {
    name = "terraform-example"
  }

  spec {
    container {
      image = "nginx:1.9.7"
      name  = "example"
    }
    container {
      image = "nginx:1.9.6"
      name  = "example-two"
    }
  }
}

I know there's another way to structure it with a list.
Right, with the earlier code I think I used to do:

resource "kubernetes_pod" "test" {
  metadata {
    name = "terraform-example"
  }

  spec {
    containers = [
      {
        image = "nginx:1.9.7"
        name  = "example"
      },
      {
        image = "nginx:1.9.6"
        name  = "example-two"
      },
    ]
  }
}
I was just testing it and it looks like default volumes are created as part of Pod creation, which is leading to the below error.