Configure VCH DNS server fails to remove non-persistent assigned.dns entries #7775

Closed
mhagen-vmware opened this issue Apr 18, 2018 · 8 comments
Assignees
Labels
kind/defect Behavior that is inconsistent with what's intended priority/p1 team/foundation

Comments

@mhagen-vmware
Contributor

mhagen-vmware commented Apr 18, 2018

Seen here:
https://ci-vic.vmware.com/vmware/vic/18515
https://ci-vic.vmware.com/vmware/vic/18484
https://ci-vic.vmware.com/vmware/vic/18487

Configure VCH DNS server                                              | FAIL |
Keyword 'Wait For DNS Update' failed after retrying 10 times. The last error was: '    
guestinfo.vice./init/networks|client/network/dns:                               1
    guestinfo.vice./init/networks|client/network/dns|0:                             AAAAAAAAAAAAAP//CnZRAQ==
    guestinfo.vice./init/networks|client/network/dns|1:                             AAAAAAAAAAAAAP//CnZRAg==
    guestinfo.vice./init/networks|management/network/dns:                           1
    guestinfo.vice./init/networks|management/network/dns|0:                         AAAAAAAAAAAAAP//CnZRAQ==
    guestinfo.vice./init/networks|management/network/dns|1:                         AAAAAAAAAAAAAP//CnZRAg==
    guestinfo.vice./init/networks|public/network/dns:                               1
    guestinfo.vice./init/networks|public/network/dns|0:                             AAAAAAAAAAAAAP//CnZRAQ==
    guestinfo.vice./init/networks|public/network/dns|1:                             AAAAAAAAAAAAAP//CnZRAg==
    guestinfo.vice..init.networks|client.network.assigned.dns:                      2
    guestinfo.vice..init.networks|client.network.assigned.dns|0:                    Co4HFQ==
    guestinfo.vice..init.networks|client.network.assigned.dns|1:                    Co4HFg==
    guestinfo.vice..init.networks|client.network.assigned.dns|2:                    CqYRWg==
    guestinfo.vice..init.networks|management.network.assigned.dns:                  2
    guestinfo.vice..init.networks|management.network.assigned.dns|0:                Co4HFQ==
    guestinfo.vice..init.networks|management.network.assigned.dns|1:                Co4HFg==
    guestinfo.vice..init.networks|management.network.assigned.dns|2:                CqYRWg==
    guestinfo.vice..init.networks|public.network.assigned.dns:                      2
    guestinfo.vice..init.networks|public.network.assigned.dns|0:                    Co4HFQ==
    guestinfo.vice..init.networks|public.network.assigned.dns|1:                    Co4HFg==
    guestinfo.vice..init.networks|public.network.assigned.dns|2:                    CqYRWg==' contains 'assigned.dns'
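
The values in the dump above look like base64-encoded raw IP addresses. A minimal Go sketch for decoding them, assuming each value is simply the 4-byte or 16-byte address (`decodeDNS` is a hypothetical helper, not part of VIC):

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net"
)

// decodeDNS turns one base64-encoded guestinfo value into an IP string.
// Assumption: the value holds the raw 4-byte or 16-byte address.
func decodeDNS(value string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(value)
	if err != nil {
		return "", err
	}
	if len(raw) != net.IPv4len && len(raw) != net.IPv6len {
		return "", fmt.Errorf("unexpected address length %d", len(raw))
	}
	// IPv4-mapped 16-byte addresses are printed in dotted-decimal form.
	return net.IP(raw).String(), nil
}

func main() {
	// Sample values copied from the guestinfo dump in this issue.
	for _, v := range []string{"AAAAAAAAAAAAAP//CnZRAQ==", "Co4HFQ==", "CqYRWg=="} {
		ip, err := decodeDNS(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(v, "=>", ip)
	}
}
```

Under that assumption, the `network/dns` entries decode to 10.118.81.1 and 10.118.81.2, while the `assigned.dns` entries decode to 10.142.7.21, 10.142.7.22 and 10.166.17.90.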

This is easily reproducible on our 6.5 HaaS test systems, seems to pass on the 6.7 Nimbus systems.

On further experimentation, power cycling the VCH one more time resolves the issue. The assigned.dns field appears to be a non-persistent VM guestinfo field, so presumably it is not written to the vmx file and is instead populated by the portlayer on power on?

The test did change recently here:
5e13301

But all I did was try to cut down on a race condition by waiting longer, which apparently didn't fix it, as the original failure is actually the same. This appears to be the first instance of this failure:
https://ci-vic.vmware.com/vmware/vic/18416/7

However, this error only shows up in HaaS runs, and most of the merges to master prior to that ran on Nimbus.

@mhagen-vmware mhagen-vmware added kind/defect Behavior that is inconsistent with what's intended priority/p0 team/foundation labels Apr 18, 2018
@mhagen-vmware
Contributor Author

This commit changed the power on behavior, maybe a candidate for the regression:
16653ff

@mhagen-vmware
Contributor Author

I have narrowed it down to what I believe is an environmental change. I went back to the same code, test and VC that passed here:
https://ci-vic.vmware.com/vmware/vic/18306/7

TEST_URL=10.158.214.103
commit e6e9ada
build: https://storage.googleapis.com/vic-engine-builds/vic_18306.tar.gz

And it is now failing, so the only thing I can conclude is something changed with the env between then and now.

@mhagen-vmware
Contributor Author

I do not know what could have changed in the environment that would cause this kind of failure though.

@mhagen-vmware
Contributor Author

mhagen-vmware commented Apr 18, 2018

I can easily reproduce this manually:

  1. install VCH
  2. vic-machine-configure --dns-server 10.118.81.1
  3. govc vm.info -e virtual-container-host | grep dns <--- still shows assigned.dns, which is why the test is failing
  4. reboot the VCH
  5. govc vm.info -e virtual-container-host | grep dns <--- now shows the expected dns settings

@mdubya66 mdubya66 added component/test Tests not covered by a more specific component label and removed kind/defect Behavior that is inconsistent with what's intended labels Apr 18, 2018
@hickeng
Member

hickeng commented Apr 19, 2018

The updateEndpoint function sets these assigned.dns fields (Assigned.Nameservers in the structure):

func updateEndpoint(newIP *net.IPNet, endpoint *NetworkEndpoint) {

As written, the only time there will not be a match for assigned.dns is if both:
a. --dns-server was not specified for vic-machine
b. none of the interfaces are DHCP or the DHCP server is not supplying DNS servers.

I think the test is supposed to be checking for network/dns and shouldn't be checking assigned/dns at all. I suspect that the only reason this has been working is because we weren't actually waiting for the VCH to initialize so there is a period of time where we can see the static portion of the config but without the runtime portion (the assigned fields).

This speculation would require that vic-machine configure not be blocking until the VCH has re-initialized after power on. If we're only failing on HaaS that would imply the VCH initialization is much faster in that environment vs nimbus.
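
A check that looks only at the static `network/dns` entries and ignores the runtime `assigned.dns` entries could be sketched in Go as follows. This is an illustration under assumptions, not the actual test code: `staticDNSValues` is a hypothetical helper, and the key layout is taken from the dump in this issue.

```go
package main

import (
	"fmt"
	"strings"
)

// staticDNSValues collects the values of indexed static DNS keys
// (".../network/dns|N") from a guestinfo dump. Runtime keys such as
// "...network.assigned.dns|N" and the count keys are skipped.
func staticDNSValues(dump string) []string {
	var values []string
	for _, line := range strings.Split(dump, "\n") {
		// Split "key: value" at the first colon.
		parts := strings.SplitN(line, ":", 2)
		if len(parts) != 2 {
			continue
		}
		key := strings.TrimSpace(parts[0])
		if !strings.Contains(key, "/network/dns|") {
			continue
		}
		values = append(values, strings.TrimSpace(parts[1]))
	}
	return values
}

func main() {
	// Excerpt of the dump reported in this issue.
	dump := `guestinfo.vice./init/networks|client/network/dns: 1
guestinfo.vice./init/networks|client/network/dns|0: AAAAAAAAAAAAAP//CnZRAQ==
guestinfo.vice./init/networks|client/network/dns|1: AAAAAAAAAAAAAP//CnZRAg==
guestinfo.vice..init.networks|client.network.assigned.dns|0: Co4HFQ==`
	fmt.Println(staticDNSValues(dump))
}
```

With this filter, the presence or absence of `assigned.dns` keys no longer affects the result, so the check does not depend on how far VCH initialization has progressed.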

@mhagen-vmware mhagen-vmware added kind/defect Behavior that is inconsistent with what's intended and removed component/test Tests not covered by a more specific component label labels Apr 23, 2018
@mdubya66 mdubya66 added impact/test/integration Requires creation of or changes to an integration test and removed impact/test/integration Requires creation of or changes to an integration test labels Apr 23, 2018
@cgtexmex cgtexmex self-assigned this Apr 23, 2018
@cgtexmex cgtexmex added this to the Sprint 30 Foundation milestone Apr 23, 2018
@mdubya66 mdubya66 assigned vburenin and unassigned cgtexmex Apr 24, 2018
@hickeng
Member

hickeng commented Apr 24, 2018

Checking the endpointVM tether.debug we have the following eventually:

Apr 18 2018 06:06:27.817Z INFO  Added nameservers: [10.118.81.1 10.118.81.2 10.142.7.21 10.142.7.22 10.166.17.90]
Apr 18 2018 06:06:27.818Z DEBUG &{Mutex:{state:1 sema:0} EntryConsumer:<nil> dirty:true path:/.tether/etc/resolv.conf nameservers:[[0 0 0 0 0 0 0 0 0 0 255 255 10 118 81 1] [0 0 0 0 0 0 0 0 0 0 255 255 10 118 81 2] [10 142 7 21] [10 142 7 22] [10 166 17 90]] timeout:15000000000 attempts:5}
Apr 18 2018 06:06:27.821Z DEBUG &{lines:[nameserver 10.118.81.1 nameserver 10.118.81.2 nameserver 10.142.7.21 nameserver 10.142.7.22 nameserver 10.166.17.90 options timeout:15 options attempts:5] i:0}
Apr 18 2018 06:06:27.823Z DEBUG writing "nameserver 10.118.81.1\n" to /.tether/etc/resolv.conf
Apr 18 2018 06:06:27.824Z DEBUG writing "nameserver 10.118.81.2\n" to /.tether/etc/resolv.conf
Apr 18 2018 06:06:27.825Z DEBUG writing "nameserver 10.142.7.21\n" to /.tether/etc/resolv.conf
Apr 18 2018 06:06:27.826Z DEBUG writing "nameserver 10.142.7.22\n" to /.tether/etc/resolv.conf
Apr 18 2018 06:06:27.827Z DEBUG writing "nameserver 10.166.17.90\n" to /.tether/etc/resolv.conf
Apr 18 2018 06:06:27.828Z DEBUG writing "options timeout:15\n" to /.tether/etc/resolv.conf
Apr 18 2018 06:06:27.829Z DEBUG writing "options attempts:5\n" to /.tether/etc/resolv.conf
Apr 18 2018 06:06:27.830Z INFO  unmounting /etc/resolv.conf

This happens after DHCP address acquisition which is where the additional nameservers come from

@vburenin
Contributor

According to the VIC code comments and logic, having DHCP-provided DNS servers in the list is legitimate behavior. I think we should keep it that way, even though I disagree with it, since I may want to use only specific DNS servers and not those that might also be provided via DHCP.
I have updated the test to look at the /etc/resolv.conf file and check that the --dns-server IPs end up in that file, instead of looking at govc vm.info.
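
For illustration, a resolv.conf-based check of this kind might look like the Go sketch below. The real test lives in the integration suite and is not shown in this thread; `parseNameservers` and `missingServers` are hypothetical names.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseNameservers returns the addresses listed on "nameserver" lines.
func parseNameservers(resolvConf string) []string {
	var servers []string
	scanner := bufio.NewScanner(strings.NewReader(resolvConf))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) >= 2 && fields[0] == "nameserver" {
			servers = append(servers, fields[1])
		}
	}
	return servers
}

// missingServers reports which expected servers are absent from resolv.conf.
func missingServers(resolvConf string, want []string) []string {
	have := make(map[string]bool)
	for _, s := range parseNameservers(resolvConf) {
		have[s] = true
	}
	var missing []string
	for _, w := range want {
		if !have[w] {
			missing = append(missing, w)
		}
	}
	return missing
}

func main() {
	// Content modeled on the tether.debug log earlier in this thread.
	conf := "nameserver 10.118.81.1\nnameserver 10.118.81.2\noptions timeout:15\n"
	fmt.Println(missingServers(conf, []string{"10.118.81.1", "10.118.81.2"}))
}
```

An empty result means every address passed via --dns-server is present in the file, regardless of any additional DHCP-provided entries.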

@vburenin
Contributor

I have changed the behavior of some tether code. We no longer combine manually set DNS servers with DHCP-provided ones. If DNS is set manually, we use only the manual settings, even if DHCP supplies DNS configuration.


6 participants