Add instructions for deploying with ConfigSync/ACM #4
Comments
Here are some issues I'm running into as I try this out.
* #14 - ACM and the iap-enabler pod will conflict trying to set the ingress policy
* We will need to create namespace directories in the ACM repo for
It looks like we need a tool to chop up the K8s configs and lay them out in the format ACM expects.
I'm hitting various problems with Kubeflow manifests violating ACM validation requirements.
I'm also seeing errors like the following.
This appears to be due to kustomize applying the namespace transform even to instances of cluster-scoped custom resources. I will need to investigate whether there is a way to customize the namespace transform so that it is not applied to cluster-scoped resources.
Another alternative would be a kpt transform to fix the resources.
I didn't see a way to do this with kustomize. I filed
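A minimal sketch of what such a transform could do, assuming the resources have already been parsed into Python dicts. It is plain Python, not the kpt function API. The function name and the list of cluster-scoped kinds are hypothetical and incomplete; cluster-scoped custom resource kinds would have to be added by the caller.

```python
import copy

# Assumption: an illustrative, incomplete set of cluster-scoped kinds.
# Cluster-scoped custom resources must be added explicitly.
CLUSTER_SCOPED_KINDS = frozenset({
    "Namespace",
    "ClusterRole",
    "ClusterRoleBinding",
    "CustomResourceDefinition",
})


def strip_namespace_from_cluster_scoped(resources,
                                        cluster_scoped_kinds=CLUSTER_SCOPED_KINDS):
    """Return copies of the resources with metadata.namespace removed
    from every resource whose kind is cluster-scoped."""
    cleaned = []
    for resource in resources:
        resource = copy.deepcopy(resource)  # leave the caller's input untouched
        if resource.get("kind") in cluster_scoped_kinds:
            metadata = resource.get("metadata") or {}
            metadata.pop("namespace", None)
        cleaned.append(resource)
    return cleaned
```

Namespaced resources pass through unchanged, so the function could run over the full kustomize output.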
* This is a first pass at coming up with a recipe for deploying Kubeflow using ACM and GitOps.
* There's lots of friction but I was able to get it to work.
* Related to GoogleCloudPlatform#4
* GCP endpoint ready test needs to set the name of the cluster
* Modify get_kf_testing_cluster.py so we can output to a YAML file information about the cluster that we matched against.
  * This is necessary to allow getting information such as the name of the deployment in subsequent steps/tasks.
* Refactor get_kf_testing_cluster.py so we can start using the python Fire module to create CLI entrypoints as opposed to using argparse.
  * Provide backwards compatibility with argparse
Related to: GoogleCloudPlatform/kubeflow-distribution#4 endpoint ready test is failing
* Fix lint.
Running
Shows the files that have errors. Looks like we have a bunch of empty files corresponding to Istio components that aren't being installed. Apparently this causes ACM to choke.
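A small sketch for locating such empty files before committing them to the ACM repo. The function name is hypothetical, and "empty" is assumed here to mean zero-length or whitespace-only. Whether empty files were really the cause is uncertain, since a later comment reports that upgrading nomos and Config Sync appeared to fix the problem.

```python
from pathlib import Path


def find_empty_yaml_files(root):
    """Return the .yaml/.yml files under root that are empty or whitespace-only."""
    empty = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in {".yaml", ".yml"}:
            if not path.read_text(encoding="utf-8").strip():
                empty.append(path)
    return sorted(empty)
```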
It looks like the problem might have been that I was using an older version of nomos and Config Sync. When I upgraded to
That appeared to fix the problem.
* Add a kustomize function to remove namespace
  * This is needed to produce YAMLs that are compatible with ACM
Related to:
* GoogleCloudPlatform/kubeflow-distribution#27 kustomize function to remove namespace
* GoogleCloudPlatform/kubeflow-distribution#4 instructions for ACM
* Fix spec.
* Add the servicemanagement API because this is what cloudendpoints uses.
* Use a kpt function to remove namespace from non namespace scoped objects
* Use yq to attach backend config to the ingress.
* Remove the iap enabler pod; this is a partial workaround for GoogleCloudPlatform#14
  * The IAP enabler pod will try to update the Istio security policy, which will conflict with ACM. So we disable it for now even though that means we have to manually update the health check.
* Switch to using a structured repo with ACM (GoogleCloudPlatform#29)
  * Add a script to rewrite the YAML files in the appropriate structure
  * If we don't use a structured repository we end up with problems because resources in different namespaces but with the same name will be written to the same file.
* Add a hack to create the kube-system namespace as part of the ACM deployment.
  * Now that we are using structured repositories we need to have a namespace directory with a namespace.yaml for kube-system in order to install resources in that namespace.
Related to GoogleCloudPlatform#4 - use ACM to deploy Kubeflow
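The file-collision problem described above can be illustrated with a sketch of how a rewrite script might pick a destination path per resource. This is not the script from the PR. The function name and the `<kind>_<name>.yaml` file naming are assumptions; only the `cluster/` and `namespaces/<namespace>/` directories and the per-namespace `namespace.yaml` follow the structured repo layout.

```python
from pathlib import PurePosixPath


def structured_repo_path(resource):
    """Pick a destination path for a parsed resource in a structured repo.

    Assumption: files are named <kind>_<name>.yaml (hypothetical scheme).
    """
    kind = resource["kind"]
    metadata = resource.get("metadata") or {}
    name = metadata["name"]
    namespace = metadata.get("namespace")

    # Each namespace directory carries its own namespace.yaml.
    if kind == "Namespace":
        return PurePosixPath("namespaces", name, "namespace.yaml")

    filename = f"{kind.lower()}_{name}.yaml"
    if namespace:
        # Including the namespace in the path keeps same-named resources apart.
        return PurePosixPath("namespaces", namespace, filename)
    # Resources without a namespace are treated as cluster-scoped.
    return PurePosixPath("cluster", filename)
```

Two ConfigMaps named `config` in `kubeflow` and `istio-system` now map to different files. Resources that share a kind and name but come from different API groups could still collide under this naming scheme.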
I have several PRs pending that fix problems with using ACM to deploy Kubeflow
We are still having an issue with the reverse proxy routes as described in #22. Our virtual services are configured to use gateway "kubeflow-gateway". We need to change that to "istio-system/ingressgateway". A kpt function is probably a good way to do that.
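A rough sketch of the rewrite such a function would perform, again on already-parsed resources rather than through the kpt function API. The function name is hypothetical; the two gateway names are the ones mentioned above.

```python
import copy


def rewrite_gateways(resource,
                     old="kubeflow-gateway",
                     new="istio-system/ingressgateway"):
    """Return a copy of the resource with the old gateway replaced by the new
    one in spec.gateways. Only VirtualService resources are modified."""
    resource = copy.deepcopy(resource)
    if resource.get("kind") != "VirtualService":
        return resource
    spec = resource.get("spec") or {}
    gateways = spec.get("gateways")
    if gateways:
        spec["gateways"] = [new if g == old else g for g in gateways]
    return resource
```

Gateways other than `kubeflow-gateway` are left as they are, so the function can be applied to every resource in the repo.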
…use workload identity #109: Update instructions for using ACM. #113: ACM: notebook controller needs to use istio ingress
Cherry pick of #105 #109 #113 on v1.1-branch.
#105: Management blueprint; add kptfile and use workload identity
#109: Update instructions for using ACM.
#113: ACM: notebook controller needs to use istio ingress (#122)
* Management blueprint; add kptfile and use workload identity mode for CNRM
  * management/instance needs a Kptfile to work with the latest versions of kpt
  * Per #13 we don't want to run CNRM in namespace mode because this is burdensome; instead we use workload identity mode, i.e. the same GCP sa to administer multiple projects.
  Related to #13 - Use workload identity mode
  Related to #102 Fix blueprint
* Remove cluster and nodepool patches from instance; we aren't actually patching anything.
* Update instructions for using ACM.
  * Use a kpt function to remove namespace from non namespace scoped objects
  * Use yq to attach backend config to the ingress.
  * Remove the iap enabler pod; this is a partial workaround for #14
    * The IAP enabler pod will try to update the Istio security policy, which will conflict with ACM. So we disable it for now even though that means we have to manually update the health check.
  * Switch to using a structured repo with ACM (#29)
    * Add a script to rewrite the YAML files in the appropriate structure
    * If we don't use a structured repository we end up with problems because resources in different namespaces but with the same name will be written to the same file.
  * Add a hack to create the kube-system namespace as part of the ACM deployment.
    * Now that we are using structured repositories we need to have a namespace directory with a namespace.yaml for kube-system in order to install resources in that namespace.
  Related to #4 - use ACM to deploy Kubeflow
* ACM: notebook controller needs to use istio ingress istio-system/ingressgateway
  * Related to #111
Here are a couple of issues I ran into trying to follow the latest instructions
There is a bug in the acm-gcp rule
The first command
We should add instructions for deploying with ConfigSync/ACM.
ConfigSync can install KCC so we don't have to do that piece.
However, the current version of KCC is too old and incompatible with some of our specs. So we need to wait for the next release of ACM.