Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix defaults for node_selector and tolerations #205

Merged

Conversation

dmarav
Copy link
Contributor

@dmarav dmarav commented Jun 2, 2024

When setting node_selector and tolerations values for all deployments/staefulsets, the playbook in the operator fails with the following error:

unsupported operand type(s) for +=: 'dict' and 'str'

The CRD defines the tolerations as an array, and the node_selector as an object.

The defaults for the roles are empty strings, so when trying to combine the <component> and _<component> dictionaries to render the final configuration, a string is added to an array/dict and I get the aforementioned error.

It can be reproduced with this manifest using operator version 1.02:

apiVersion: eda.ansible.com/v1alpha1
kind: EDA
metadata:
  name: eda-demo
  namespace: eda
spec:
  no_log: false
  automation_server_url: http://awx-service.awx
  image: quay.io/ansible/eda-server
  image_version: main
  image_web: quay.io/ansible/eda-ui
  image_web_version: main
  ipv6_disabled: true
  ingress_type: ingress
  ingress_path: "/"
  ingress_path_type: Prefix
  hostname: my-eda.example.com
  api:
    node_selector:
      system/dedicated: my-node-group
    tolerations:
      - key: "system/dedicated"
        operator: "Equal"
        value: "my-node-group"
        effect: "NoSchedule"
  ui:
    node_selector:
      system/dedicated: my-node-group
    tolerations:
      - key: "system/dedicated"
        operator: "Equal"
        value: "my-node-group"
        effect: "NoSchedule"
  scheduler:
    node_selector:
      system/dedicated: my-node-group
    tolerations:
      - key: "system/dedicated"
        operator: "Equal"
        value: "my-node-group"
        effect: "NoSchedule"
  default_worker:
    node_selector:
      system/dedicated: my-node-group
    tolerations:
      - key: "system/dedicated"
        operator: "Equal"
        value: "my-node-group"
        effect: "NoSchedule"
  activation_worker:
    node_selector:
      system/dedicated: my-node-group
    tolerations:
      - key: "system/dedicated"
        operator: "Equal"
        value: "my-node-group"
        effect: "NoSchedule"
  redis:
    node_selector:
      system/dedicated: my-node-group
    tolerations:
      - key: "system/dedicated"
        operator: "Equal"
        value: "my-node-group"
        effect: "NoSchedule"
  database:
    node_selector:
      system/dedicated: my-node-group
    tolerations:
      - key: "system/dedicated"
        operator: "Equal"
        value: "my-node-group"
        effect: "NoSchedule"

@dmarav dmarav marked this pull request as draft June 2, 2024 09:34
@dmarav dmarav marked this pull request as ready for review June 2, 2024 11:50
@rooftopcellist
Copy link
Member

@dmarav early on, in the awx-operator, we naively used a string value for this and documented specifying the values like this:

Which should also work for the eda-server-operator.

We plan to cut a new apiVersion for the awx-operator with improvements like these in the coming year. I think it we change it now without backwards compat, it may break users who have been defining it as a string on the CR spec...

I think we should probably leave this PR open until we do the new apiVersion for both AWX and EDA...

cc @TheRealHaoLiu @rcarrillocruz re: new apiVersion

@dmarav
Copy link
Contributor Author

dmarav commented Jun 4, 2024

Hi @rooftopcellist! Thanks for the quick response!

@dmarav early on, in the awx-operator, we naively used a string value for this and documented specifying the values like this:

This was my first approach, to set the tolerations and node_selector as string, since that's how it was defined in the awx-operator, but in the CRD node_selector is defined as a dict and tolerations is defined as a list of dicts.

Which should also work for the eda-server-operator.

It doesn't because the CRD expects one type, and the defaults set in the roles is string, so when you combine the defaults with the user provided values, ansible fails with the error in my first comment. It cannot combine a dictionary/list with a string.

With this manifest:

---
apiVersion: eda.ansible.com/v1alpha1
kind: EDA
metadata:
  name: eda-dbops
  namespace: eda
spec:
  no_log: false
  automation_server_url: http://awx-service.awx
  image: quay.io/ansible/eda-server
  image_version: main
  image_web: quay.io/ansible/eda-ui
  image_web_version: main
  ipv6_disabled: true
  ingress_type: ingress
  ingress_path: "/"
  ingress_path_type: Prefix
  hostname: my-eda.example.com
  ui:
    node_selector: |
      system/dedicated: my-node-group
    tolerations: |
      - key: "system/dedicated"
        operator: "Equal"
        value: "my-node-group"
        effect: "NoSchedule"

I get this response from kubernetes

# kubectl -n eda apply -f  eda-test-node-selector.yml
The EDA "eda-dbops" is invalid:
* spec.ui.tolerations: Invalid value: "string": spec.ui.tolerations in body must be of type array: "string"
* spec.ui.node_selector: Invalid value: "string": spec.ui.node_selector in body must be of type object: "string"

@rooftopcellist
Copy link
Member

@dmarav you are absolutely right. The CRD schema expects object/dict for these settings, which does not line up with how that string is later templated into the deployment yaml.

I agree with your approach here and believe that we should standardize all of the ansible related operator around having node_selector etc. be a dict rather than a string (which necessitated the confusing | syntax).

I plan to test this out this week and get it merged. For awx-operator, we'll need to wait until we cut a new apiVersion, which is something we are planning to tackle in the coming months. Thanks for the contribution!

@rooftopcellist
Copy link
Member

It would be great to have some docs for how to use these in this repo, particularly because it will be different from the awx-operator for a time. Help appreciated there, but I won't let that block this PR from merging.

@dmarav
Copy link
Contributor Author

dmarav commented Jun 30, 2024

Hi @rooftopcellist! Should I create a separate PR for updating the docs and link to this one, or just update the docs in this one?

@rooftopcellist rooftopcellist merged commit 12998d2 into ansible:main Jul 2, 2024
4 checks passed
@rooftopcellist
Copy link
Member

@dmarav I just merged, sorry for the delay. If you can open a separate PR for docs on how to use this, that would be wonderful!

@dmarav dmarav deleted the node-selector-tolerations-default-values branch July 4, 2024 06:22
@dmarav
Copy link
Contributor Author

dmarav commented Jul 4, 2024

@dmarav I just merged, sorry for the delay. If you can open a separate PR for docs on how to use this, that would be wonderful!

@rooftopcellist I just created PR #214 with the documentation. I copied the same format/structure as in the AWX operator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants