Debugging external-apps-ingress-controller in Prod cluster #29

dystewart · 2022-10-31T23:06:16Z

I've done some looking around, and I'm seeing what may be a typo in the ingressController YAML file on this line:

I'm assuming it's meant to be:

spec:
  domain: apps.openshift.nerc.mghpcc.org

As opposed to:

spec:
  domain: apps.shift.nerc.mghpcc.org

This may not be the only thing that needs changing, since there is also something going on with node scheduling (as seen in the conditions here), but it's a start.


larsks · 2022-11-01T13:11:05Z

See my comment on the PR: the original value was correct (and you can verify that with a simple DNS query; compare the result of looking up foo.apps.shift.nerc.mghpcc.org with foo.apps.openshift.mghpcc.org).
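A minimal sketch of that DNS check in Python, assuming the apps domain is served by a wildcard record (`*.apps...`), so that any arbitrary label such as `foo` should resolve if the domain is live. The helper names `wildcard_probe_name` and `resolve` are hypothetical, and the lookup uses only the standard-library `socket.getaddrinfo`.

```python
import socket


def wildcard_probe_name(apps_domain: str, label: str = "foo") -> str:
    """Build an arbitrary hostname under an apps domain.

    Assumption: the domain has a wildcard DNS record, so any label
    (here "foo" by default) should resolve when the domain is in use.
    """
    return f"{label}.{apps_domain}"


def resolve(hostname: str) -> list:
    """Return the sorted, de-duplicated addresses for hostname, or [] if the lookup fails."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # No DNS record (or resolver failure): treat as "does not resolve".
        return []
    # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address string.
    return sorted({info[4][0] for info in infos})
```

For example, comparing `resolve(wildcard_probe_name("apps.shift.nerc.mghpcc.org"))` with `resolve(wildcard_probe_name("apps.openshift.nerc.mghpcc.org"))` should show which of the two candidate domains from the top of this issue actually has a record; an empty list means the name did not resolve.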

larsks · 2022-11-01T13:14:10Z

And to avoid some additional confusion: there is some overlap between this issue and #16. This issue is supposed to be "Why isn't the external ingress controller running?"

dystewart · 2022-11-08T17:21:40Z

It looks like the external ingressController is not being created because its pods are not schedulable (see the PodsScheduled status here).
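A small sketch of reading that condition programmatically, assuming the IngressController has been dumped as JSON (for example with `oc get ingresscontroller external-apps-ingress-controller -n openshift-ingress-operator -o json`) and parsed into a dict. It relies only on the standard Kubernetes shape of `status.conditions` (a list of objects with `type` and `status`); the function names are hypothetical.

```python
from typing import Optional


def get_condition(ingress_controller: dict, cond_type: str) -> Optional[dict]:
    """Return the status condition with the given type, or None if it is absent."""
    for cond in ingress_controller.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type:
            return cond
    return None


def pods_scheduled(ingress_controller: dict) -> bool:
    """True only if the PodsScheduled condition exists and its status is "True"."""
    cond = get_condition(ingress_controller, "PodsScheduled")
    return cond is not None and cond.get("status") == "True"
```

Looking up a single condition by type, rather than scanning for anything not "True", avoids misreading conditions such as Degraded, which are healthy when "False".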

I quickly looked through the nodes available to the prod cluster, and I don't see any with the label zone: external, which, as you can see in the ingressController YAML, is what it's looking for:

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: external-apps-ingress-controller
  namespace: openshift-ingress-operator
spec:
  domain: apps.shift.nerc.mghpcc.org
  defaultCertificate:
    name: external-apps-ingress-certificate
  endpointPublishingStrategy:
    type: LoadBalancerService
    loadBalancer:
      scope: External
  nodePlacement:
    nodeSelector:
      matchLabels:
        zone: external
  namespaceSelector:
    matchLabels:
      type: external

On several of the prod cluster's worker nodes, however, there is a label (nerc.mghpcc.org/external-ingress: 'true'), which appears to be the label we are actually looking for. The namespace label may become an issue as well, but the main error is related to the nodeSelector, so I'm going to open a PR to make this change and will link it below.
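The scheduling failure follows from how `matchLabels` works: every key/value pair in the selector must be present on a node, so a selector for `zone: external` matches nothing if no node carries that label. A minimal sketch of that rule; the worker's label set below is hypothetical, modeled on the label described above. Label values are strings, which is why `'true'` is quoted in the YAML.

```python
# Hypothetical labels on a prod worker node (modeled on the comment above).
WORKER_LABELS = {
    "node-role.kubernetes.io/worker": "",
    "nerc.mghpcc.org/external-ingress": "true",  # label values are strings
}

ORIGINAL_SELECTOR = {"zone": "external"}
PATCHED_SELECTOR = {"nerc.mghpcc.org/external-ingress": "true"}


def selector_matches(match_labels: dict, node_labels: dict) -> bool:
    """matchLabels semantics: every key must exist on the node with an equal value.

    An empty selector places no constraint, so it matches any node.
    """
    return all(node_labels.get(key) == value for key, value in match_labels.items())


for name, selector in (("original", ORIGINAL_SELECTOR), ("patched", PATCHED_SELECTOR)):
    print(name, selector_matches(selector, WORKER_LABELS))
```

Under these assumed labels, only the patched selector matches, which is consistent with the pods being unschedulable until the nodeSelector was changed.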

dystewart · 2022-11-10T22:23:53Z

The patch in OCP-on-NERC/nerc-ocp-config#42 did not result in any change to the error status of the ingressController at: https://console-openshift-console.apps.nerc-ocp-prod.rc.fas.harvard.edu/k8s/ns/openshift-ingress-operator/operator.openshift.iov1IngressController/external-apps-ingress-controller/.

I have also determined that the namespaceSelector field shouldn't be the root of the problem. In an IngressController, namespaceSelector filters the set of namespaces whose Routes the controller serves; it is not a node placement setting, so it should have no impact on the ingressController pods being scheduled.
Continuing to look into the issue...
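To make the distinction concrete, here is a sketch of what `namespaceSelector` filters: it is evaluated against namespace labels to decide which namespaces' Routes the controller serves, and node labels are never an input. The namespace names and labels below are hypothetical; only the `type: external` selector comes from the YAML above.

```python
# Hypothetical namespaces and their labels.
NAMESPACES = {
    "demo-public": {"type": "external"},
    "demo-internal": {},
    "demo-other": {"type": "internal"},
}


def served_namespaces(namespace_selector: dict, namespaces: dict) -> list:
    """Return namespaces whose labels satisfy the matchLabels selector.

    Only namespace labels are consulted; nothing here depends on which
    nodes exist, so this selector cannot make router pods unschedulable.
    """
    return sorted(
        name
        for name, labels in namespaces.items()
        if all(labels.get(k) == v for k, v in namespace_selector.items())
    )


print(served_namespaces({"type": "external"}, NAMESPACES))
```

If no namespace carries `type: external`, the controller would simply serve no Routes, which is a separate symptom from pods failing to schedule.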

dystewart · 2022-11-11T19:25:32Z

After deleting the external-apps-ingress-controller ingressController in the prod cluster and recreating it via an Argo CD sync, the PR updating the nodeSelector fields does appear to have worked: the ingressController now reports that its pods are scheduled. There is still something going on here, though, because we're still seeing 0/2 replicas available.
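"Scheduled" and "available" are separate states: pods can be placed on nodes and still never become ready. A small sketch of computing the available/desired count from the IngressController JSON, assuming `spec.replicas` and `status.availableReplicas` fields; when `spec.replicas` is omitted the operator chooses a default, and the `default_desired=2` here is an assumption that matches the 0/2 seen in this thread. Function names are hypothetical.

```python
def replica_counts(ingress_controller: dict, default_desired: int = 2) -> tuple:
    """Return (available, desired) replica counts.

    default_desired is an assumed fallback for when spec.replicas is omitted.
    """
    desired = ingress_controller.get("spec", {}).get("replicas", default_desired)
    available = ingress_controller.get("status", {}).get("availableReplicas", 0)
    return available, desired


def replica_summary(ingress_controller: dict, default_desired: int = 2) -> str:
    """Format the counts like "0/2 replicas available"."""
    available, desired = replica_counts(ingress_controller, default_desired)
    return f"{available}/{desired} replicas available"


def fully_available(ingress_controller: dict, default_desired: int = 2) -> bool:
    """True when every desired replica is reported as available."""
    available, desired = replica_counts(ingress_controller, default_desired)
    return available >= desired
```

A controller that reports PodsScheduled as "True" while `fully_available` is False points at the pods themselves (readiness, image pulls, crashes) rather than at node placement.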

dystewart · 2022-11-18T16:20:30Z

OCP-on-NERC/nerc-ocp-config#157

dystewart · 2022-11-21T19:19:43Z

larsks commented Nov 30, 2022