Add instructions for deploying with ConfigSync/ACM #4

Open

jlewi opened this issue May 1, 2020 · 10 comments


jlewi commented May 1, 2020

We should add instructions for deploying with ConfigSync/ACM.

ConfigSync can install KCC so we don't have to do that piece.

However, the current version of KCC is too old and incompatible with some of our specs. So we need to wait for the next release of ACM.
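For when the next ACM release lands, here is a minimal sketch of the ConfigSync side of this. The repo URL, branch, and policyDir are placeholders, and the commented-out configConnector block is an assumption about how ACM might install KCC for us rather than a confirmed field:

# Minimal ConfigManagement sketch; placeholder values marked below.
apiVersion: configmanagement.gke.io/v1
kind: ConfigManagement
metadata:
  name: config-management
spec:
  git:
    syncRepo: git@github.com:example-org/kubeflow-acm-repo.git   # placeholder
    syncBranch: master                                           # placeholder
    secretType: ssh
    policyDir: "acm-repo"                                        # placeholder
  # Assumption: a toggle along these lines is how ACM would install KCC for us.
  # configConnector:
  #   enabled: true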

@issue-label-bot

Issue-Label Bot is automatically applying the label: feature (probability 0.95).

jlewi removed the feature label May 5, 2020

jlewi commented May 15, 2020

Here are some issues I'm running into as I try this out.

#14 - ACM and the iap-enabler pod will conflict trying to set the ingress policy

We will need to create namespace directories in the ACM repo for the following (a sketch of the layout follows the list):

  • kube-system (some cert-manager resources get installed there)
  • istio-system
  • kubeflow
  • cert-manager
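A sketch of the resulting repo layout, assuming ACM's hierarchical repo format (directory and repo names are placeholders):

acm-repo/
  system/                  # repo config (e.g. repo.yaml)
  clusterregistry/
  cluster/                 # cluster-scoped resources (CRDs, ClusterRoles, ...)
  namespaces/
    kube-system/
      namespace.yaml
    istio-system/
      namespace.yaml
    kubeflow/
      namespace.yaml
    cert-manager/
      namespace.yaml

# Each namespace directory needs a namespace.yaml along these lines:
apiVersion: v1
kind: Namespace
metadata:
  name: kube-system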


jlewi commented May 15, 2020

It looks like we need a tool to chop up the K8s configs and lay them out in the format ACM expects.


jlewi commented May 19, 2020

I'm hitting various problems with Kubeflow manifests violating ACM validation requirements.

[2] KNV9999: metadata.annotations must be a map
  • This appears to be coming from the manifests generated by ASM; the istio-ingressgateway service in particular appears to be in violation

I'm also seeing errors like the following.

code-intelligence   KNV1052: cluster-scoped resources MUST NOT declare metadata.namespace

source: kubeflow.org_v1beta1_profile_kubeflow-jlewi.yaml
namespace: kubeflow
metadata.name: kubeflow-jlewi
group: kubeflow.org
version: v1beta1
kind: Profile

This appears to be due to kustomize applying the namespace transform even to instances of cluster-scoped custom resources. I will need to investigate whether there is a way to configure the namespace transform so that it is not applied to cluster-scoped resources.
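A minimal repro sketch of what seems to be happening (names are placeholders): with a kustomization that sets a namespace, kustomize also stamps metadata.namespace onto custom resources it doesn't know are cluster-scoped, such as Profile, and ACM then rejects the result with KNV1052.

# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: kubeflow        # the namespace transform is applied to every resource
resources:
- profile.yaml

# profile.yaml (Profile is a cluster-scoped custom resource)
apiVersion: kubeflow.org/v1beta1
kind: Profile
metadata:
  name: kubeflow-jlewi
spec:
  owner:
    kind: User
    name: jlewi@example.com   # placeholder

# kustomize build output: the Profile comes out with metadata.namespace: kubeflow,
# which is exactly what KNV1052 complains about.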


jlewi commented May 19, 2020

An alternative would be a kpt transform that fixes the resources (sketched below).
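A sketch of how that could be wired up using the config-functions convention that kpt supports. The function image and its clusterScopedKinds knob are hypothetical; only the config.kubernetes.io/function annotation and kpt fn run are real mechanisms:

# remove-namespace-fn.yaml (sketch)
apiVersion: v1
kind: ConfigMap
metadata:
  name: remove-namespace
  annotations:
    config.kubernetes.io/function: |
      container:
        image: gcr.io/example-project/remove-namespace-fn:latest   # hypothetical image
data:
  # hypothetical knob: kinds the function should treat as cluster-scoped
  clusterScopedKinds: "Profile,ClusterRbacConfig"

It would then be run over the hydrated manifests with something like: kpt fn run acm-repo/ --fn-path remove-namespace-fn.yaml (flags as of the kpt releases current in 2020).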


jlewi commented May 19, 2020

I didn't see a way to do this with kustomize. I filed
kubernetes-sigs/kustomize#2498

jlewi pushed a commit to jlewi/gcp-blueprints that referenced this issue May 21, 2020
* This is a first pass at coming up with a recipe for deploying
  Kubeflow using ACM and GitOps.

* There's lots of friction but I was able to get it to work.

* Related to GoogleCloudPlatform#4
k8s-ci-robot pushed a commit that referenced this issue May 21, 2020
* This is a first pass at coming up with a recipe for deploying
  Kubeflow using ACM and GitOps.

* There's lots of friction but I was able to get it to work.

* Related to #4
jlewi pushed a commit to jlewi/testing that referenced this issue Jun 27, 2020
* Modify get_kf_testing_cluster.py so we can output to a YAML file
  information about the cluster that we matched against.

  * This is necessary to allow getting information such as the name of
    the deployment in subsequent steps/tasks.

* Refactor get_kf_testing_cluster.py so we can start using the python Fire
  module to create CLI entrypoints as opposed to using argparse.

  * Provide backwards compatibility with argparse

Related to: GoogleCloudPlatform/kubeflow-distribution#4 endpoint ready test is failing
k8s-ci-robot pushed a commit to kubeflow/testing that referenced this issue Jun 29, 2020
* GCP endpoint ready test needs to set the name of the cluster

* Modify get_kf_testing_cluster.py so we can output to a YAML file
  information about the cluster that we matched against.

  * This is necessary to allow getting information such as the name of
    the deployment in subsequent steps/tasks.

* Refactor get_kf_testing_cluster.py so we can start using the python Fire
  module to create CLI entrypoints as opposed to using argparse.

  * Provide backwards compatibility with argparse

Related to: GoogleCloudPlatform/kubeflow-distribution#4 endpoint ready test is failing

* Fix lint.

* Fix lint.

jlewi commented Jul 31, 2020

Running

nomos vet --disable-hierarchy --no-api-server-check

shows the files that have errors. It looks like we have a bunch of empty files corresponding to Istio components that aren't being installed, and apparently this causes ACM to choke.
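A quick way to spot those files before handing the repo to ACM, assuming the hydrated repo lives in acm-repo/ (placeholder path) and GNU find/grep are available:

# Zero-byte YAML files:
find acm-repo -type f -name '*.yaml' -empty

# YAML files containing nothing but '---' separators, blank lines, or comments:
grep -rL -v -e '^---$' -e '^[[:space:]]*$' -e '^[[:space:]]*#' --include='*.yaml' acm-repo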


jlewi commented Jul 31, 2020

It looks like the problem might have been that I was using an older version of nomos and Config Sync.

When I upgraded to

gcr.io/config-management-release/config-sync-operator:20200723192352-op-cs

That appeared to fix the problem.
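For anyone hitting the same thing, a quick way to check which config-management images a cluster is actually running. This just lists every Deployment's first container image and filters for config-management, so it does not assume a particular operator Deployment name or namespace:

kubectl get deployments --all-namespaces \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.spec.template.spec.containers[0].image}{"\n"}{end}' \
  | grep config-management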

jlewi pushed a commit to jlewi/manifests that referenced this issue Jul 31, 2020
* This is needed to produce YAMLs that are compatible with ACM

related to

* GoogleCloudPlatform/kubeflow-distribution#27 kustomize function to remove namespace
* GoogleCloudPlatform/kubeflow-distribution#4 instructions for ACM
k8s-ci-robot pushed a commit to kubeflow/manifests that referenced this issue Aug 2, 2020
* Add a kustomize function to remove namespace

* This is needed to produce YAMLs that are compatible with ACM

related to

* GoogleCloudPlatform/kubeflow-distribution#27 kustomize function to remove namespace
* GoogleCloudPlatform/kubeflow-distribution#4 instructions for ACM

* Fix spec.

* Add the servicemanagement API because this is what cloudendpoints uses.
jlewi pushed a commit to jlewi/gcp-blueprints that referenced this issue Aug 2, 2020
* Use a kpt function to remove namespace from non namespace scoped
  objects

* Use yq to attach backend config to the ingress.

* Remove the iap enabler pod; this is a partial work around for GoogleCloudPlatform#14

  * The IAP enabler pod will try to update the ISTIO security policy
    which will conflict with ACM. So we disable it for now even though
    that means we have to manually update the health check.

* Switch to using a structured repo with ACM (GoogleCloudPlatform#29)

  * Add a script to rewrite the YAML files in the appropriate structure

  * If we don't use a structured repository we end up with problems because
    resources in different namespaces but with the same name will be written
    to the same file.

* Add a hack to create the kube-system namespace as part of the ACM deployment.

  * Now that we are using structured repositories we need to have
    a namespace directory with a namespace.yaml for kube-system
    in order to install resources in that namespace.

Related to GoogleCloudPlatform#4 - use ACM to deploy Kubeflow

jlewi commented Aug 2, 2020

I have a number of PRs pending that make various fixes to using ACM to deploy Kubeflow.

We are still having an issue with the reverse proxy routes as described in #22. Our virtual services are configured to use the gateway "kubeflow-gateway"; we need to change that to "istio-system/ingressgateway". A kpt function is probably a good way to do that (example below).
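For concreteness, here is what that change looks like on a single VirtualService. The service name and routing rule are placeholders; the gateway reference uses Istio's <namespace>/<gateway> form:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: centraldashboard               # placeholder example
  namespace: kubeflow
spec:
  hosts:
  - "*"
  gateways:
  # - kubeflow-gateway                 # what the manifests declare today
  - istio-system/ingressgateway        # what ACM should sync instead
  http:
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        host: centraldashboard.kubeflow.svc.cluster.local
        port:
          number: 80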

k8s-ci-robot pushed a commit that referenced this issue Aug 3, 2020
* Use a kpt function to remove namespace from non namespace scoped
  objects

* Use yq to attach backend config to the ingress.

* Remove the iap enabler pod; this is a partial work around for #14

  * The IAP enabler pod will try to update the ISTIO security policy
    which will conflict with ACM. So we disable it for now even though
    that means we have to manually update the health check.

* Switch to using a structured repo with ACM (#29)

  * Add a script to rewrite the YAML files in the appropriate structure

  * If we don't use a structured repository we end up with problems because
    resources in different namespaces but with the same name will be written
    to the same file.

* Add a hack to create the kube-system namespace as part of the ACM deployment.

  * Now that we are using structured repositories we need to have
    a namespace directory with a namespace.yaml for kube-system
    in order to install resources in that namespace.

Related to #4 - use ACM to deploy Kubeflow
jlewi pushed a commit to jlewi/gcp-blueprints that referenced this issue Aug 12, 2020
* Use a kpt function to remove namespace from non namespace scoped
  objects

* Use yq to attach backend config to the ingress.

* Remove the iap enabler pod; this is a partial work around for GoogleCloudPlatform#14

  * The IAP enabler pod will try to update the ISTIO security policy
    which will conflict with ACM. So we disable it for now even though
    that means we have to manually update the health check.

* Switch to using a structured repo with ACM (GoogleCloudPlatform#29)

  * Add a script to rewrite the YAML files in the appropriate structure

  * If we don't use a structured repository we end up with problems because
    resources in different namespaces but with the same name will be written
    to the same file.

* Add a hack to create the kube-system namespace as part of the ACM deployment.

  * Now that we are using structured repositories we need to have
    a namespace directory with a namespace.yaml for kube-system
    in order to install resources in that namespace.

Related to GoogleCloudPlatform#4 - use ACM to deploy Kubeflow
k8s-ci-robot pushed a commit that referenced this issue Aug 13, 2020
Cherry pick of #105 #109 #113 on v1.1-branch. #105: Management blueprint; add kptfile and use workload identity #109: Update instructions for using ACM. #113: ACM: notebook controller needs to use istio ingress (#122)

* Management blueprint; add kptfile and use workload identity mode for CNRM

* management/instance needs a Kptfile to work with the latest versions of kpt

* Per #13 we don't want to run CNRM in namespace mode because it is burdensome;
  instead we use workload identity mode, i.e. the same GCP service account
  administers multiple projects.

Related to #13 - Use workload identity mode
Related to #102 Fix blueprint

* Remove cluster and nodepool patches from instance; we aren't actually patching anything.

* Update instructions for using ACM.

* Use a kpt function to remove namespace from non namespace scoped
  objects

* Use yq to attach backend config to the ingress.

* Remove the iap enabler pod; this is a partial work around for #14

  * The IAP enabler pod will try to update the ISTIO security policy
    which will conflict with ACM. So we disable it for now even though
    that means we have to manually update the health check.

* Switch to using a structured repo with ACM (#29)

  * Add a script to rewrite the YAML files in the appropriate structure

  * If we don't use a structured repository we end up with problems because
    resources in different namespaces but with the same name will be written
    to the same file.

* Add a hack to create the kube-system namespace as part of the ACM deployment.

  * Now that we are using structured repositories we need to have
    a namespace directory with a namespace.yaml for kube-system
    in order to install resources in that namespace.

Related to #4 - use ACM to deploy Kubeflow

* ACM: notebook controller needs to use istio ingress istio-system/ingressgateway

* Related to #111

jlewi commented Aug 23, 2020

Here are a couple of issues I ran into while trying to follow the latest instructions:

  • ACM_MGMT_REPO isn't defined in the Makefile

There is a bug in the acm-gcp rule:

acm-gcp: hydrate-gcp
	acm-gcp: hydrate-gcp
	mkdir -p $(ACM_MGMT_REPO)/namespaces/$(PROJECT)
	cp -r $(BUILD_DIR)/gcp_config/* $(ACM_MGMT_REPO)/namespaces/$(PROJECT)
	rm -rf $(BUILD_DIR)/gcp_config

The first command, acm-gcp: hydrate-gcp, is a bug; it looks like a bad merge conflict. (A corrected version of the rule is sketched after this list.)

  • The acm-gcp rule should run nomos vet to check for validation errors.
  • kustomize-fns aren't available on the 1.1 branch, so you need to use the master branch.
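For reference, a sketch of what the corrected acm-gcp rule could look like once ACM_MGMT_REPO is defined: the duplicated recipe line from the bad merge is dropped and nomos vet runs as a final validation step (recipe lines must be tab-indented; add --disable-hierarchy if this repo uses the unstructured format):

acm-gcp: hydrate-gcp
	mkdir -p $(ACM_MGMT_REPO)/namespaces/$(PROJECT)
	cp -r $(BUILD_DIR)/gcp_config/* $(ACM_MGMT_REPO)/namespaces/$(PROJECT)
	rm -rf $(BUILD_DIR)/gcp_config
	nomos vet --no-api-server-check --path=$(ACM_MGMT_REPO)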
