Add DNS entry for ovsdbserver-sb- services #154

Merged

Conversation

averdagu
Contributor

@averdagu averdagu commented Nov 10, 2023

When the ovsdbserver-sb pod gets deleted we can't ensure that it will be recreated with the same internalapi IP, since that IP is populated to the EDPM nodes during ansibleee-deployment_phase. If the IP changes during a reboot of the pod, EDPM won't know until the user retriggers ansibleee-deployment_phase, and in the meantime ovn_controller and neutron-ovn-metadata won't have connectivity to the SB DB.

In order to fix this, instead of using a string of IPs for ovn-remote, a single DNS entry will be used. Every service will add two entries to the openstack-dnsmasq:

  • ovsdbserver-sb-x.openstack.svc (where x = replica/service number)
  • ovsdbserver-sb.openstack.svc

The last one will be the one populated to the EDPM node; querying it will return one IP from all the SB pods initialized at that moment (DNS will use sequential round-robin to fulfill the request).
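
For illustration, a minimal sketch of what querying the shared record looks like, assuming it runs somewhere that resolves the openstack-dnsmasq / cluster DNS records (an EDPM node or an in-cluster pod); with round-robin, the address ordering rotates between queries:

package main

import (
	"fmt"
	"net"
)

func main() {
	// Shared record added for every SB pod; namespace "openstack" as in this PR.
	const host = "ovsdbserver-sb.openstack.svc"
	for i := 0; i < 3; i++ {
		ips, err := net.LookupIP(host)
		if err != nil {
			fmt.Println("lookup failed:", err)
			return
		}
		// With sequential round-robin the ordering rotates between queries,
		// so clients spread across the SB pod IPs; ovn-remote then becomes
		// tcp:<resolved-ip>:6642.
		fmt.Printf("attempt %d: %v\n", i+1, ips)
	}
}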

TODO:

  • Fix "should delete the external config map when networkAttachment is detached from SB DB". It is failing since when NAD is detached from pod, GetExternalEndpoint still returns dns_hostname
  • Fix "OVNController - should create an external ConfigMap with expected key-value pairs and OwnerReferences set" test
  • Rebase
  • Improve CreateOVNDBCluster nad to string
  • Get rid of commented code/useless comments
  • Ensure all checks are good to go
  • Adapt new tests (added during rebase)
  • Fix kuttl tests
  • Modify where internalDbAddress is set (needs to be set once service is created)

Depends-On: openstack-k8s-operators/install_yamls#686

Contributor

openshift-ci bot commented Nov 10, 2023

Hi @averdagu. Thanks for your PR.

I'm waiting for an openstack-k8s-operators member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Contributor

@booxter booxter left a comment


This is just a drive-by, incomplete review since it's a draft PR for now. I am happy to hear this seems to work. I hope the initial comments may be useful in your polishing work. Kudos!

@olliewalsh olliewalsh self-requested a review November 10, 2023 22:49
@averdagu averdagu force-pushed the feat/sbdb-dns branch 13 times, most recently from 3f4ac62 to bcb7341 on November 15, 2023 09:51
@averdagu averdagu force-pushed the feat/sbdb-dns branch 5 times, most recently from 09ceab7 to cc683aa on November 21, 2023 11:55
@averdagu averdagu force-pushed the feat/sbdb-dns branch 4 times, most recently from cc9a5a0 to f7ce6d7 on November 24, 2023 10:54
return instance.Status.DBAddress, nil
//return instance.Status.DBAddress, nil
//return "tcp:ovsdbserver-sb.openstack.svc:6642", nil
dns_hostname := "tcp:ovsdbserver-sb." + instance.Namespace + ".svc:6642"
Contributor

So with this, since all the places are going to be using the DNS name, would it be good to update Status.DBAddress and Status.InternalDBAddress themselves for both NB/SB DBs?

I recall from some discussion .cluster.local suffix was required for connecting to the db endpoints, /me don't know why though

Contributor Author

Yeah, I think it could be good to update DBAddress with "ovsdbserver-sb." + instance.Namespace + ".svc" and InternalDBAddress with "ovsdbserver-sb." + instance.Namespace + ".svc.cluster.local" (/me needs to look at exactly how CoreDNS sets up DNS)

Contributor

Yes, please update the fields in db.

Contributor

I recall from some discussion .cluster.local suffix was required for connecting to the db endpoints, /me don't know why though

OVS DNS support (openvswitch/ovs@4d55a36, though I expect it's unbound that doesn't support this) doesn't look at search_domains in /etc/resolv.conf, so it must use fully-qualified DNS names. It also looks like it only queries A/AAAA records, so CNAME wouldn't work either.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default, even when pulling in resolv.conf, it doesn't appear to append search domains.

>>> import unbound
>>> ctx = unbound.ub_ctx()
>>> ctx.resolvconf("/etc/resolv.conf")
0
>>> status, result = ctx.resolve("source")
>>> result.havedata
0
>>> status, result = ctx.resolve("source.redhat.com", unbound.RR_TYPE_A, unbound.RR_CLASS_IN)
>>> result.havedata
1
>>> result.data.address_list
['52.247.118.198']

As far as CNAMEs, that should be fine. www.redhat.com is a CNAME and:

>>> status, result = ctx.resolve("www.redhat.com", unbound.RR_TYPE_A)
>>> result.data.address_list
['96.16.244.143']
>>> status, result = ctx.resolve("www.redhat.com", unbound.RR_TYPE_AAAA)
>>> result.data.address_list
['38.0.20.4.100.0.22.150.0.0.0.0.0.0.13.68', '38.0.20.4.100.0.22.137.0.0.0.0.0.0.13.68']

It may be configurable via an unbound.conf to set a search domain, and we can pass those via an environment variable. But it seems like a good practice to use the FQDN.
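
To tie the FQDN point together, a minimal sketch of building an endpoint that OVS/unbound can resolve without search domains; the helper name is hypothetical, while the tcp: prefix, port 6642 and the .svc.cluster.local suffix come from the snippets and comments above:

package ovndbcluster

import "fmt"

// buildSBEndpoint is a hypothetical helper: OVS/unbound does not apply
// resolv.conf search domains, so the endpoint must use the fully-qualified
// cluster-internal name rather than the short "ovsdbserver-sb.<ns>.svc" form.
func buildSBEndpoint(namespace string, port int32) string {
	fqdn := fmt.Sprintf("ovsdbserver-sb.%s.svc.cluster.local", namespace)
	return fmt.Sprintf("tcp:%s:%d", fqdn, port)
}

// buildSBEndpoint("openstack", 6642) == "tcp:ovsdbserver-sb.openstack.svc.cluster.local:6642"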

var dnsName string
var dnsIP string
var hostnames []string
dnsName = "dns-" + ovnPod.Name
Contributor

may be better to create 1 dnsdata per DBCluster instead of per pod?

Contributor Author

It may complicate logic when scaling up/down. I can't see any benefit of having only 1 DNSData per cluster instead of per service.

Contributor

With one per pod, will scale down need explicit logic to delete the DNSData?

Contributor Author

Well, we would need new logic to retrieve the DNSData, iterate over all hosts, delete those which had been scaled down, and update the resource.
Currently I just delete the DNSData CR associated with the scaled-down pod/service.

Contributor

For this case, i.e. keeping 1 dnsdata per DBCluster, I think it's not required to fetch/delete; we can just update the dnsdata with the current set of pods in case of scale up/down.

@booxter
Contributor

booxter commented Jan 11, 2024

Oops, it was not my intention to push anything into your repo (why do I even have permissions there?). I will revert to the original.

Contributor

@booxter booxter left a comment


I think this is very close, but some code cleanup is still due.

One major point of contention, as Yatin pointed out, is that it's unclear why we even maintain -%d DNS records for each pod of the cluster. Seems redundant (and hence belongs on the chopping block).

There were some comments of mine from the previous iterations that are not addressed. They are not marked Completed so it should be easy to find them. Let me know if you can't.

headlessDnsHostname = ServiceNameNB + "." + instance.Namespace + ".svc"
}
if dbAddress == "" {
// Since DNS is used for DBAddress value will be the same for each iteration
Contributor

"iteration"? I don't think it makes sense in the context of the helper. Let the caller avoid busy work by calling to the helper once; the helper is a pure function returning the result based on input, it doesn't care how you call it.

Contributor

what I think would be helpful to document is why you think Ports[0] is the right thing to do. (There is some assumption here, that only one port is set per service - which is probably true, but I'd like to know why it's true and why it's guaranteed to be true.)

Contributor Author

I moved this comment outside of dnsdata.go, and it makes sense again (why we only set dbAddress once in the loop).
Regarding Ports[0], it's something that was done before; I should talk to @karelyatin about that.

@booxter
Contributor

booxter commented Jan 11, 2024

kuttl failures still seem to suggest that the kind for DNSData is not known to the operator controller. This will have to be explored.

@karelyatin
Contributor

kuttl failures still seem to suggest that the kind for DNSData is not known to the operator controller. This will have to be explored.

openstack-k8s-operators/install_yamls#686 will handle that. I posted some comments there; once it's ready I think we can do Depends-On and that should work.

averdagu added a commit to averdagu/install_yamls that referenced this pull request Jan 12, 2024
With the OVN DNS PR
(openstack-k8s-operators/ovn-operator#154)
OVN consumes the infranetworkv1.DNSData CRD so the infra operator
is needed to run the OVN kuttl tests.
@booxter
Contributor

booxter commented Jan 13, 2024

/test ovn-operator-build-deploy-kuttl

Contributor

@booxter booxter left a comment


Just some nits, nothing major. I am ready to merge it once CI is happy.

if nad != "" {
ovndbcluster.Status.DBAddress = dbaddr
endpoint := ""
// If cluster NAD is empty, ExtenralEndpoint will never be
Contributor

Extenral -> External

ovndbcluster.Status.DBAddress = dbaddr
endpoint := ""
// If cluster NAD is empty, ExtenralEndpoint will never be
// set, instead look at internalEndpoint
Contributor

I don't think we should check "instead". internal can always be checked; external when nad is set.
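
A small sketch of the check order being suggested, with hypothetical variable and function names; the idea is simply that the internal endpoint is always required, while the external one is only required when a NAD is set:

package ovndbcluster

import "fmt"

// checkEndpoints is a hypothetical illustration of the review suggestion:
// always validate the internal endpoint, and validate the external one
// only when a networkAttachment (nad) is configured.
func checkEndpoints(internalEndpoint, externalEndpoint, nad string) error {
	if internalEndpoint == "" {
		return fmt.Errorf("internal endpoint is not set")
	}
	if nad != "" && externalEndpoint == "" {
		return fmt.Errorf("external endpoint is not set for nad %q", nad)
	}
	return nil
}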

func GetDNSDataHostnameIP(dnsDataName string, namespace string, dnsHostname string) string {
dnsEntry := GetDNSData(types.NamespacedName{Name: dnsDataName, Namespace: namespace})
// DNSData contains Hosts (an array), this array will contain
// ovsdbserver-sb-0.ns.svc and ovsdbserver-sb.ns.svc, need to get
Contributor

this is no longer correct, right? you removed -%d records?

Contributor Author

True, forgot to remove comment, good catch

@karelyatin
Contributor

kuttl failures still seem to suggest that the kind for DNSData is not known to the operator controller. This will have to be explored.

openstack-k8s-operators/install_yamls#686 will handle that. I posted some comments there, once it's ready i think can do Depends-On and that should work

Depends-On is not working with the kuttl job; I validated this locally and approved the other patch.

}
// Delete DNS records for deleted services/pods
namespace := helper.GetBeforeObject().GetNamespace()
err = deleteDNSData(ctx, helper, fullServiceName, namespace)
Contributor

The dnsdata no longer exists for fullServiceName, so this call and the definition of the deleteDNSData function are not needed.

Contributor Author

No, it does exist.
For each service a DNSData will be created. Inside this DNSData entry there will be the resolution ovsdbserver-sb.openstack.svc -> SB_POD_INTERNALAPI_IP.
So at the end you'll have the same number of DNSData entries as services. What was deleted was the resolution ovsdbserver-sb-X.openstack.svc -> SB_POD_INTERNALAPI_IP.

Contributor

Ok, I expected it to be a single dnsdata per service from the past discussion; keeping a dnsdata per service just looks unnecessary at this point.

Contributor Author

It's something that can be done, but it would miss dp3. I can create a Jira to do a follow-up if necessary.

Contributor

Ok can be checked/done later

} else {
headlessDnsHostname = ServiceNameSB + "." + instance.Namespace + ".svc"
}
return fmt.Sprintf("tcp:%s:%d", headlessDnsHostname, svc.Spec.Ports[0].Port)
Contributor

Same seems like it can be done for the port, instead of passing the full service object.
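
As a sketch of that suggestion, with a hypothetical helper name, the caller would extract svc.Spec.Ports[0].Port and pass just the number; only the hostname pattern and the tcp:%s:%d format come from the code above:

package ovndbcluster

import "fmt"

// getDBEndpoint is a hypothetical refactor of the endpoint helper: it takes
// only the port, so callers no longer need to pass the full Service object.
func getDBEndpoint(serviceName, namespace string, port int32) string {
	headlessDnsHostname := serviceName + "." + namespace + ".svc"
	return fmt.Sprintf("tcp:%s:%d", headlessDnsHostname, port)
}

// Example call: getDBEndpoint(ServiceNameSB, instance.Namespace, svc.Spec.Ports[0].Port)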

ctx,
helper,
svc,
dnsIP,
Contributor

This needs to be fixed as it's not handling IPs from multiple replicas; just the IP of the last pod will persist.

Contributor Author

It shouldn't. This will create a DNSData object and call CreateOrPatch. Since this is called on every loop iteration, there will be N DNSData objects, each one resolving ovsdbserver-(nb|sb).namespace.svc to that pod's IP.
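
To make the per-pod reconcile behaviour described above concrete, a minimal sketch of one loop iteration follows. The DNSData kind, the "dns-" + pod name naming and the shared hostname come from this PR; the infra-operator import path, the Hosts/IP/Hostnames field names and the helper name are assumptions for illustration, not the PR's actual code.

package ovndbcluster

import (
	"context"

	infranetworkv1 "github.com/openstack-k8s-operators/infra-operator/apis/network/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// ensurePodDNSData (hypothetical helper) creates or patches one DNSData per
// pod, mapping the shared SB hostname to that pod's internalapi IP.
func ensurePodDNSData(ctx context.Context, c client.Client, namespace, podName, podIP string) error {
	hostname := "ovsdbserver-sb." + namespace + ".svc"
	dnsData := &infranetworkv1.DNSData{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "dns-" + podName, // one DNSData per pod, as in the PR
			Namespace: namespace,
		},
	}
	// CreateOrPatch either creates the CR or patches it to the desired state,
	// so running this for N pods yields N DNSData objects, each resolving the
	// shared hostname to its own pod IP.
	_, err := controllerutil.CreateOrPatch(ctx, c, dnsData, func() error {
		dnsData.Spec.Hosts = []infranetworkv1.DNSHost{
			{IP: podIP, Hostnames: []string{hostname}},
		}
		return nil
	})
	return err
}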

Contributor

Thanks, reading it again I got it: those are created with the pod name. But it looks unnecessary to maintain those extra dnsdata per pod. If it can be avoided, good; otherwise we can also clean it up later if others are ok.

Contributor

Ok can be checked/done later

@averdagu averdagu force-pushed the feat/sbdb-dns branch 2 times, most recently from 10ff92f to e754613 on January 15, 2024 11:46
When the ovsdbserver-sb/nb pod gets deleted we can't ensure that it will
be recreated with the same internalapi IP, since that IP is populated
to the EDPM nodes during ansibleee-deployment_phase. If the IP changes
during a reboot of the pod, EDPM won't know until the user retriggers
ansibleee-deployment_phase. This means that meanwhile ovn_controller
and neutron-ovn-metadata won't have connectivity to the SB DB.

In order to fix this, instead of using a string of IPs for ovn-remote,
a single DNS entry will be used. Every service will add an entry to
the openstack-dnsmasq:
 - ovsdbserver-xb.openstack.svc

This entry will be the one populated to the EDPM node; querying
it will return one IP from all the SB pods initialized at that moment
(DNS will use sequential round-robin to fulfill the request).

Depends-on: openstack-k8s-operators/install_yamls#686
Resolves: OSPRH-660
@booxter
Contributor

booxter commented Jan 16, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Jan 16, 2024
Contributor

@karelyatin karelyatin left a comment


/lgtm

Contributor

openshift-ci bot commented Jan 16, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: averdagu, karelyatin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit bd7f402 into openstack-k8s-operators:main Jan 16, 2024
5 of 6 checks passed
cschwede added a commit to cschwede/swift-operator that referenced this pull request Feb 9, 2024
Storage service hostnames need to be resolvable on the dataplane nodes
as well. They are used within the Swift rings to be able to allow node
IP changes without updating and redistributing Swift rings.

This patch creates DNSData entries for every storage service pod,
similar to [1]. It only creates DNS for the storage network, as this is
the only one that should be used within the storage backend services.

There is no delete function to scale down, as we don't support scaling
down for SwiftStorage instances. However, the created DNSData CRs are
owned by the SwiftStorage instance, thus being deleted properly if the
instance is deleted.

[1] openstack-k8s-operators/ovn-operator#154