Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vsphere ipi: set hostname using vmtoolsd and VM extra config #1579

Merged

Conversation

jcpowermac
Copy link
Contributor

This commit adds two new files:

  • A systemd unit that runs a shell script
  • A shell script that runs vmtoolsd to retrieve guestinfo.hostname
    from vm guest. Then uses hostnamectl to set hostname.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 24, 2020
@jcpowermac jcpowermac changed the title [wip] vsphere ipi: set hostname using vmtoolsd and VM extra config vsphere ipi: set hostname using vmtoolsd and VM extra config Mar 25, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 25, 2020
#!/usr/bin/env bash
set -e

if hostname=$(/bin/vmtoolsd --cmd 'info-get guestinfo.hostname'); then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this binary present in the RHCOS system today? should we inform RHCOS people that please don't remove this binary in later versions as we depend on it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah its in RHCOS today. I tested out all the commands prior to scripting and using systemd to execute.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not in FCOS though and we do want to eventually containerize it (once vmware stops fighting that).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cgwalters I have an example of running open-vm-tools in a container on fcos. Using @dav1x blog post as an example: https://developers.redhat.com/blog/2017/03/23/containerizing-open-vm-tools-part-1-the-dockerfile-and-constructing-a-systemd-unit-file/

It would be interesting if I could query guestinfo by using an exec but I have yet to try.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@LorbusChris LorbusChris Mar 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc we layer vmtoolsd into OKD's machine-os-content, so it'll be on master/worker nodes after bootstrapping the cluster.
Is this script and service also present on bootstrap hosts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LorbusChris nope I didn't add it to bootstrap. bootstrap is fine as localhost.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to do something similar to this when I converted the container to runc. The exec should work fine:

I changed the systemd unit file to call this shell script instead.

#!/bin/sh
if [ -f /tmp/open-vm-tools-bash-env ]
  then
    source /tmp/open-vm-tools-bash-env
fi

COMMAND=/usr/local/bin/vmware-toolbox-cmd
if [ ! -e $COMMAND ]
  then
    echo 'runc exec -t open-vm-tools vmware-toolbox-cmd "$@"' > /usr/local/bin/vmware-toolbox-cmd
    chmod +x /usr/local/bin/vmware-toolbox-cmd
fi
exec /usr/bin/VGAuthService -s &
exec /usr/bin/vmtoolsd

set -e

if hostname=$(/bin/vmtoolsd --cmd 'info-get guestinfo.hostname'); then
/usr/bin/hostnamectl --transient --static set-hostname ${hostname}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

 hostnamectl set-hostname --help
hostnamectl [OPTIONS...] COMMAND ...

Query or change system hostname.

  -h --help              Show this help
     --version           Show package version
     --no-ask-password   Do not prompt for password
  -H --host=[USER@]HOST  Operate on remote host
  -M --machine=CONTAINER Operate on local container
     --transient         Only set transient hostname
     --static            Only set static hostname
     --pretty            Only set pretty hostname

is providing both transient and static valid option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep...tested and reviewed man page.

@abhinavdahiya
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 25, 2020
@runcom
Copy link
Member

runcom commented Mar 26, 2020

/approve
/retest

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 26, 2020
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@cgwalters
Copy link
Member

OK but...we've had a whole lot of extensive discussions about vmware and networking (which includes hostname settings for static IP cases).

See for example coreos/fedora-coreos-tracker#427 and coreos/ignition-dracut#156 as well as coreos/afterburn#379

A core problem with the hostname is that it's messy who owns it and sets it.

One option is to ensure we only run this if a hostname was not otherwise set (i.e. only set if it's localhost).

Does setting this value require the admin to do something specifically, or the installer will be setting it?

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@jcpowermac
Copy link
Contributor Author

jcpowermac commented Mar 26, 2020

Does setting this value require the admin to do something specifically, or the installer will be setting it?

For masters its the installer via terraform for workers its MAO

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

This commit adds two new files:
- A systemd unit that runs a shell script
- A shell script that runs vmtoolsd to retrieve `guestinfo.hostname`
from vm guest. Then uses hostnamectl to set hostname.
@jcpowermac jcpowermac force-pushed the vsphere_ipi_hostname branch from f2c656d to 41d45f0 Compare March 26, 2020 20:42
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Mar 26, 2020
@jcpowermac
Copy link
Contributor Author

@abhinavdahiya I added a check for localhost when you have a moment can you please /lgtm.

@cgwalters
Copy link
Member

Since this is IPI, the installer today can just update the Ignition config for each host to include /etc/hostname - this is a supported cross-platform way to set the hostname. I'd strongly prefer this unless there's a good reason the installer can't do this.

@cgwalters
Copy link
Member

/hold
per above

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 27, 2020
@jcpowermac
Copy link
Contributor Author

Since this is IPI, the installer today can just update the Ignition config for each host to include /etc/hostname - this is a supported cross-platform way to set the hostname. I'd strongly prefer this unless there's a good reason the installer can't do this.

MAO creates workers the installer does not.

@cgwalters
Copy link
Member

MAO creates workers the installer does not.

Right, though the installer generates the user data. But...it's true today we don't support having dynamic userdata from the MAO (which would really then directly boil down to machine-specific machineconfigs since the userdata just points to the MCS, which we do want for the MCO, but that problem then goes into identifying machines...)

OK so I think where we really want this code then is as part of
https://github.com/coreos/afterburn/
which already today supports hostname across various providers but it looks like we don't have VMWare yet. I think @lucab is working on it?

One continual problem is people add random "grab bag" things to the MCO that might otherwise go into other CoreOS components I think because it's a repository conveniently integrated with OpenShift CI whereas patching other things goes into the world of RPM pain.

OK though short term,
/hold cancel
because we can at least drop this code once the afterburn code lands.

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 27, 2020
@enxebre
Copy link
Member

enxebre commented Mar 27, 2020

cc @alexander-demichev

@jcpowermac
Copy link
Contributor Author

OK so I think where we really want this code then is as part of
https://github.com/coreos/afterburn/
which already today supports hostname across various providers but it looks like we don't have VMWare yet. I think @lucab is working on it?

Completely agree. When that is ready please let me know and we can switch it out.

@ashcrow
Copy link
Member

ashcrow commented Mar 27, 2020

OK so I think where we really want this code then is as part of
https://github.com/coreos/afterburn/
which already today supports hostname across various providers but it looks like we don't have VMWare yet. I think @lucab is working on it?

I agree 👍 I think afterburn makes sense or maybe nm-cloud-setup per previous conversations if this is a more general need.

@cgwalters
Copy link
Member

@abhinavdahiya
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 27, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, jcpowermac, runcom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@darkmuggle
Copy link
Contributor

To echo what @cgwalters said -- the way hostnames are set in Linux is really messing and its a landmine. If we have NM, Ignition (via a files stanza for /etc/hostname), Afterburn and the MCO, the debug burden starts to grow. Reducing the number of things that can act on the hostname is to our advantage, IMHO.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 68bf465 into openshift:master Mar 27, 2020
@openshift-ci-robot
Copy link
Contributor

@jcpowermac: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-aws-scaleup-rhel7 41d45f0 link /test e2e-aws-scaleup-rhel7

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@lucab
Copy link

lucab commented Mar 31, 2020

(Sorry for the delay, I was on PTO)

I'm unfortunately missing most context about this and there are no references in PR/commit, can anybody provide that?

In particular, before giving any meaningful directions, these are my doubts:

  • is this meant for either nodes with DHCP, or with static networking, or both?
  • if this is for nodes with DHCP, why does hostname Option 12 not work? If both are provided, which one is intended to be the valid one?
  • if this is for nodes with static networking, why does hostname kargs setting not work? If both are provided, which one is intended to be the valid one?
  • if the Ignition config specifies some /etc/hostname value, which setting is intended to be the valid one?
  • where does guestinfo.hostname come from? Is it something we just invented for OCP?
  • is the guestinfo allowed to change while a VM is running?
  • why is MCO a relevant place for this?

In general I agree with all the feedbacks above, this is troublesome because 1) it bypasses the (already too many) possible owners of hostname logic with a random script in an unrelated OCP repo 2) it also reverts the "Ignition as a single source of truth" for file config.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.