Add support for using unique identifiers to select a network connection in environments where names can be ambiguous. #237

Closed
taylor-madeak opened this issue Nov 16, 2022 · 27 comments

@taylor-madeak

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request.
If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Description

NSX allows for creating port groups with the same name, even on the same virtual distributed switch. This plugin has a long history of issues with trying to create VMs on a vSphere cluster with such virtual distributed port groups managed by NSX overlays. VMware resolved this in govmomi 0.27 by allowing finder to use other unique identifiers to select a network:

// Network finds a NetworkReference using a Name, Inventory Path, ManagedObject ID, Logical Switch UUID or Segment ID.
// With standard vSphere networking, Portgroups cannot have the same name within the same network folder.
// With NSX, Portgroups can have the same name, even within the same Switch. In this case, using an inventory path
// results in a MultipleFoundError. A MOID, switch UUID or segment ID can be used instead, as both are unique.
// See also: https://kb.vmware.com/s/article/79872#Duplicate_names
// Examples:
// - Name:                "dvpg-1"
// - Inventory Path:      "vds-1/dvpg-1"
// - ManagedObject ID:    "DistributedVirtualPortgroup:dvportgroup-53"
// - Logical Switch UUID: "da2a59b8-2450-4cb2-b5cc-79c4c1d2144c"
// - Segment ID:          "/infra/segments/vnet_ce50e69b-1784-4a14-9206-ffd7f1f146f7"

To leverage this, I request the following:

  1. Update the minimum version of govmomi used by this plugin to at least 0.27.
  2. Possibly write additional network finder logic to allow users to use one or more of the other unique identifiers for the network, in addition to (or in place of) the network name or inventory path. Segment ID and UUID seem like reasonable choices here, as they are both fairly readily available from the vCenter UI.

I'm not a Go developer, so I'm probably not a great judge of how heavy a lift this would be, but it appears that this may be as simple as changing the version of govmomi this plugin is built with. The plugin itself looks like it just passes the context and argument straight through to govmomi.Finder.
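To make the identifier forms listed in the govmomi comment concrete, here is a minimal sketch that guesses which form a given `network` string takes. It is only an illustration: `ClassifyNetworkIdentifier`, the `IdentifierKind` constants, and the regular expressions are hypothetical names and heuristics, not part of govmomi or of this plugin, and govmomi's own resolution logic may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// IdentifierKind is a hypothetical label for the identifier forms
// described in the govmomi Finder.Network comment.
type IdentifierKind string

const (
	KindName          IdentifierKind = "name"
	KindInventoryPath IdentifierKind = "inventory path"
	KindMOID          IdentifierKind = "managed object ID"
	KindSwitchUUID    IdentifierKind = "logical switch UUID"
	KindSegmentID     IdentifierKind = "segment ID"
)

var (
	// Assumed shape of a UUID: 8-4-4-4-12 hexadecimal digits.
	uuidRe = regexp.MustCompile(`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)
	// Assumed shape of a MOID: "Type:prefix-number".
	moidRe = regexp.MustCompile(`^[A-Za-z]+:[A-Za-z]+-\d+$`)
)

// ClassifyNetworkIdentifier applies simple heuristics, most specific first.
func ClassifyNetworkIdentifier(s string) IdentifierKind {
	switch {
	case strings.HasPrefix(s, "/infra/segments/"):
		return KindSegmentID
	case uuidRe.MatchString(s):
		return KindSwitchUUID
	case moidRe.MatchString(s):
		return KindMOID
	case strings.Contains(s, "/"):
		return KindInventoryPath
	default:
		return KindName
	}
}

func main() {
	// Example values taken from the govmomi comment quoted above.
	examples := []string{
		"dvpg-1",
		"vds-1/dvpg-1",
		"DistributedVirtualPortgroup:dvportgroup-53",
		"da2a59b8-2450-4cb2-b5cc-79c4c1d2144c",
		"/infra/segments/vnet_ce50e69b-1784-4a14-9206-ffd7f1f146f7",
	}
	for _, e := range examples {
		fmt.Printf("%s => %s\n", e, ClassifyNetworkIdentifier(e))
	}
}
```

A check like this could be used to produce clearer error messages, but it does not guarantee that the identifier resolves to exactly one network.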

Use Case(s)

Allow builder to be used in large vSphere environments that provide networking with NSX for scalability and mobility between clusters.

@tenthirtyam
Collaborator

Note
The latest release of vmware/govmomi is 0.29.0.

@tenthirtyam
Collaborator

tenthirtyam commented Nov 17, 2022

Note
The aforementioned enhancement was released in vmware/govmomi v0.27.1 which included vmware/govmomi@6209be5.

@taylor-madeak
Author

FWIW: I can't seem to hit on the right set of arguments to make any of these alternate unique identifiers work with govc either. My attempts to test by building this plugin with the latest version of govmomi have also failed (though, admittedly, I don't really know what I'm doing when it comes to Go).

@tenthirtyam
Collaborator

At a minimum, you'll need to download the source, and from the source tree run:

go get github.com/vmware/govmomi@latest
go mod tidy
go build

Then copy the binary to your packer.d/plugins directory and run your tests.

I've spoken with the maintainer about us updating to v0.29.0. Ideally, this dependency update should be done as an isolated chore(deps) pull request.

@tenthirtyam
Collaborator

tenthirtyam commented Nov 28, 2022

PR #240 for vmware/govmomi@v0.29.0.

@tenthirtyam
Collaborator

Note
hashicorp/packer-plugin-vsphere@v1.1.1 is now released and includes vmware/govmomi@v0.29.0.

@taylor-madeak
Author

@tenthirtyam I saw that earlier today. Unfortunately, this plugin still doesn't seem to be able to find a network by a unique identifier other than its name.

Using the Segment ID:

2022/12/07 02:24:03 [INFO] (telemetry) Starting builder vsphere-iso.linux
2022/12/07 02:24:03 packer-plugin-vsphere_v1.1.1_x5.0_linux_amd64 plugin: 2022/12/07 02:24:03 No URLs were provided to Step Download. Continuing...
2022/12/07 02:24:03 packer-plugin-vsphere_v1.1.1_x5.0_linux_amd64 plugin: 2022/12/07 02:24:03 No CD files specified. CD disk will not be made.
2022/12/07 02:24:03 packer-plugin-vsphere_v1.1.1_x5.0_linux_amd64 plugin: 2022/12/07 02:24:03 No URLs were provided to Step Download. Continuing...
2022/12/07 02:24:03 packer-plugin-vsphere_v1.1.1_x5.0_linux_amd64 plugin: 2022/12/07 02:24:03 No CD files specified. CD disk will not be made.
2022/12/07 02:24:03 ui: ==> vsphere-iso.linux: Creating VM...
2022/12/07 02:24:03 [INFO] (telemetry) ending vsphere-iso.linux
2022/12/07 02:24:03 ui error: Build 'vsphere-iso.linux' errored after 432 milliseconds 468 microseconds: error creating vm: network '/infra/segments/977eab1d-1670-4b4e-9072-f71038385359' not found
2022/12/07 02:24:03 ui:
==> Wait completed after 435 milliseconds 270 microseconds

Using the MOID:

2022/12/07 02:38:58 [INFO] (telemetry) ending vsphere-iso.linux
2022/12/07 02:38:58 ui error: Build 'vsphere-iso.linux' errored after 350 milliseconds 544 microseconds: error creating vm: network 'DistributedVirtualPortgroup:dvportgroup-16495' not found

I imagine the issue is somewhere in here:

func (d *VCenterDriver) FindNetwork(name string) (*Network, error) {
    n, err := d.finder.Network(d.ctx, name)
    if err != nil {
        return nil, err
    }
    return &Network{
        network: n,
        driver:  d,
    }, nil
}

func (d *VCenterDriver) FindNetworks(name string) ([]*Network, error) {
    ns, err := d.finder.NetworkList(d.ctx, name)
    if err != nil {
        return nil, err
    }
    var networks []*Network
    for _, n := range ns {
        networks = append(networks, &Network{
            network: n,
            driver:  d,
        })
    }
    return networks, nil
}

Does this plugin need to pass some additional information to govmomi Finder.Network for this to work correctly?

What additional information can I provide to help run this down?

@tenthirtyam
Collaborator

tenthirtyam commented Dec 7, 2022

Based on a quick review, it looks like the plugin should call finder.networkByID when the network input is an ID rather than a name.

https://github.com/vmware/govmomi/blob/d99e99542ffe1e054b2da68fac48ee5ce2bd4987/find/finder.go#L823-L856

@taylor-madeak
Author

It looks to me like the finder.Network method already falls back to calling the finder.networkByID method:

https://github.com/vmware/govmomi/blob/17e669d84193839acdbebe6aed5aea26b1c65d48/find/finder.go#L804-L821

This raises some additional questions:

  • Why isn't this working in my case?
  • How can this project test it?
  • Is this an issue with this plugin, or the underlying govmomi library?

That last question comes up because I can't get the search to work with govc either.

@tenthirtyam
Collaborator

tenthirtyam commented Dec 7, 2022

It may be a good idea to open a GitHub Discussion item on vmware/govmomi if it appears to also be an upstream concern. It can be converted to an issue if it is a bug.

Note

I pinged one of the vmware/govmomi maintainers, who has kindly commented below. 👇

@dougm

dougm commented Dec 8, 2022

Are you able to find the network with govc using:

% govc find / -type g -config.segmentId /infra/segments/seg_6e9bdde0-f9bf-4ee6-ac36-493627b6db32_0
/folder-WCP_DC/WCP_DC/network/seg-domain-c9:a97676f3-cf6d-42d7-875b-ae0bd0016e32-test-gc-e2e-demo-ns-0

If so and you add the -i flag, it will print the ManagedObject ID:

% govc find -i / -type g -config.segmentId /infra/segments/seg_6e9bdde0-f9bf-4ee6-ac36-493627b6db32_0
DistributedVirtualPortgroup:dvportgroup-71

Does using the MOID work with the plugin?

@taylor-madeak
Author

@dougm this query has the same issue as searching by name, that is to say it returns multiple results.

govc find / -type g -config.segmentId /infra/segments/b8f015a1-c281-4dfd-abbc-df0c88c5b2a4
/dsc1-w1-dc/network/dsc1-w1-a1-gcib-ix-10.109.248.24_29
/dsc1-w1-dc/network/dsc1-w1-a1-gcib-ix-10.109.248.24_29
/dsc1-w1-dc/network/dsc1-w1-a1-gcib-ix-10.109.248.24_29

With the -i flag, we can see that these each have different MOID values:

govc find -i / -type g -config.segmentId /infra/segments/b8f015a1-c281-4dfd-abbc-df0c88c5b2a4
DistributedVirtualPortgroup:dvportgroup-16348
DistributedVirtualPortgroup:dvportgroup-8278
DistributedVirtualPortgroup:dvportgroup-16476
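As a rough illustration of how a script could detect this kind of ambiguity, the sketch below parses `govc find -i` style output (one `Type:Value` reference per line) and reports when a single segment ID maps to several port groups. `Ref` and `ParseRefs` are hypothetical helpers written for this example; they are not part of govc, govmomi, or the plugin.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Ref is a hypothetical holder for the "Type:Value" form printed by `govc find -i`.
type Ref struct {
	Type  string
	Value string
}

// ParseRefs splits each non-empty line at the first colon into a Ref.
func ParseRefs(output string) ([]Ref, error) {
	var refs []Ref
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		typ, val, ok := strings.Cut(line, ":")
		if !ok || typ == "" || val == "" {
			return nil, fmt.Errorf("unexpected line %q", line)
		}
		refs = append(refs, Ref{Type: typ, Value: val})
	}
	return refs, sc.Err()
}

func main() {
	// Output copied from the govc command above.
	out := `DistributedVirtualPortgroup:dvportgroup-16348
DistributedVirtualPortgroup:dvportgroup-8278
DistributedVirtualPortgroup:dvportgroup-16476`

	refs, err := ParseRefs(out)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	if len(refs) > 1 {
		fmt.Printf("ambiguous: %d port groups share this segment ID\n", len(refs))
	}
	for _, r := range refs {
		fmt.Println(r.Type, r.Value)
	}
}
```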

@dougm

dougm commented Dec 8, 2022

My understanding based on the KB was that segmentId is unique; this is the first case I've seen where it isn't. I wonder what is unique (other than the MOID). I can take a look if you can share the output of:

% govc find -i / -type g -config.segmentId /infra/segments/b8f015a1-c281-4dfd-abbc-df0c88c5b2a4 | xargs -n1 govc object.collect -o -json

The error message in this comment is "not found":

network '/infra/segments/977eab1d-1670-4b4e-9072-f71038385359' not found

Based on your govc output, I'd expect the error to be "multiple" found. So I also wonder whether the plugin here has a govmomi version with the networkByID fallback. You should be able to confirm by using one of the MOIDs (e.g. DistributedVirtualPortgroup:dvportgroup-16348).

@tenthirtyam
Collaborator

The error message observed in the previous comment when using MOID DistributedVirtualPortgroup:dvportgroup-16348 was also "not found":

2022/12/07 02:38:58 [INFO] (telemetry) ending vsphere-iso.linux
2022/12/07 02:38:58 ui error: Build 'vsphere-iso.linux' errored after 350 milliseconds 544 microseconds: error creating vm: network 'DistributedVirtualPortgroup:dvportgroup-16495' not found

@tenthirtyam
Collaborator

tenthirtyam commented Dec 8, 2022

I may be incorrect, but it might be because addNetwork is using findNetwork - which in turn calls FindNetworks that uses NetworkList

func addNetwork(d *VCenterDriver, devices object.VirtualDeviceList, config *CreateConfig) (object.VirtualDeviceList, error) {
    for _, nic := range config.NICs {
        network, err := findNetwork(nic.Network, config.Host, d)
        if err != nil {
            return nil, err
        }
        // ...

func findNetwork(network string, host string, d *VCenterDriver) (object.NetworkReference, error) {
    if network != "" {
        var err error
        networks, err := d.FindNetworks(network)
        if err != nil {
            return nil, err
        }
        if len(networks) == 1 {
            return networks[0].network, nil
        }
        // ...

func (d *VCenterDriver) FindNetworks(name string) ([]*Network, error) {
    ns, err := d.finder.NetworkList(d.ctx, name)
    if err != nil {
        return nil, err
    }
    var networks []*Network
    for _, n := range ns {
        networks = append(networks, &Network{
            network: n,
            driver:  d,
        })
    }
    return networks, nil
}

@dougm

dougm commented Dec 8, 2022

I may be incorrect, but it might be because addNetwork is using findNetwork - which in turn calls FindNetworks that uses NetworkList

Yes, it looks like that is the issue. We can change govmomi's NetworkList to do the networkByID fallback. Or the plugin could fall back to calling Network if the list lookup fails.

@tenthirtyam
Collaborator

Thanks Doug - appreciate the assist here. I'll work with the maintainer and get a fix in for this in the plugin to use the networkByID fallback.

@taylor-madeak
Author

I'm set up to test new plugin builds, if you guys can get me some PoC code.

@StephenDunne-CAL

I take it this is still backlogged?

@tenthirtyam tenthirtyam self-assigned this Mar 5, 2024
@tenthirtyam tenthirtyam modified the milestones: Backlog, v1.2.8 Apr 17, 2024
@tenthirtyam
Collaborator

tenthirtyam commented Apr 26, 2024

I revisited this one this evening and ran some tests on the latest release (v1.2.7). I had no issues using the MOIDs for port groups (e.g. "Network:network-18085") or distributed port groups (e.g. "DistributedVirtualPortgroup:dvportgroup-22077"), both of which had the same name and would error if just the name was used:

==> vsphere-iso.linux-photon: error creating virtual machine: path 'DHCP' resolves to multiple networks. please provide a host to match or the network full path

When using the MOIDs, the build is placed on the correct port group or distributed port groups without issue. I've not verified this with an NSX segment yet, but it should have the same results.

I was going to add the fallback, as seen below, but it appears not to be needed...

func (d *VCenterDriver) FindNetworks(name string) ([]*Network, error) {
    ns, err := d.finder.NetworkList(d.ctx, name)
    if err != nil || len(ns) == 0 {
        n, err := d.finder.Network(d.ctx, name)
        if err != nil {
            return nil, err
        }
        return []*Network{
            {
                network: n,
                driver:  d,
            },
        }, nil
    }
    var networks []*Network
    for _, n := range ns {
        networks = append(networks, &Network{
            network: n,
            driver:  d,
        })
    }
    return networks, nil
}

Why? Because vmware/govmomi#2626 (@dougm is awesome! 🎉) added the fallback (see vmware/govmomi@bb4f739), which was included in v0.31.0 of vmware/govmomi and picked up in v1.2.3 of the plugin.

I'm going to close this issue. However, I will add a PR to update the duplicate networks error message to suggest using the ID or path of the network, rather than only "a host to match or full path".

Ryan

tenthirtyam added a commit that referenced this issue Apr 26, 2024
- Updates the error messages when more than one network with the same name resolves to more than one network.
- Updated the documentation for `network` in `vsphere-iso` and `vsphere-clone` builders.

Ref: #237

Signed-off-by: Ryan Johnson <[email protected]>
@rtaylor-gci

@tenthirtyam I'd feel a lot better about this if it were tested with an NSX segment before closing. I'll see if I can get a test in later today or on Monday.

To clarify: I'm @taylor-madeak, just created a separate GitHub account for work stuff (which this issue relates to).

@tenthirtyam
Collaborator

I've successfully tested this with both the segment ID and the logical switch UUID using release v1.2.7 on VMware Cloud Foundation 5.1.1 BOM.

Ryan Johnson
Distinguished Engineer, VMware by Broadcom

@tenthirtyam tenthirtyam added builder/vsphere-iso Builder: vsphere-iso builder/vsphere-clone Builder: vsphere-clone labels Apr 28, 2024
@rtaylor-gci

@tenthirtyam I'm still having some trouble getting a successful test for this in our VCF environment, where I'm not guaranteed to land on any one specific VM host in the cluster. Can you share which vsphere-iso source properties you're specifying when you test this feature? I'd like to verify that it's not just a template configuration issue on my part.

@tenthirtyam
Collaborator

Is your use case to always use the same host and a specific network on that host?

@rtaylor-gci

The opposite, actually. My current template specifies server, datacenter, and cluster. I'd like to continue not caring which host I end up on and still be able to get a network. I'm not an expert with NSX, but it appears that the overlays end up being associated with VM hosts in vCenter. So, by not specifying a host to build on, the distributed portgroup MOID or segment ID I specify isn't found by Packer.

lbajolet-hashicorp pushed a commit that referenced this issue May 9, 2024
- Updates the error messages when more than one network with the same name resolves to more than one network.
- Updated the documentation for `network` in `vsphere-iso` and `vsphere-clone` builders.

Ref: #237

Signed-off-by: Ryan Johnson <[email protected]>
@tenthirtyam tenthirtyam modified the milestones: v1.3.0, v1.3.1 May 13, 2024
@tenthirtyam
Collaborator

Hey! If you'd like to take a look at this live, let me know. You can email me at ryan.johnson [at] broadcom [dot] com and we can schedule some time to look at this.

Ryan Johnson
VMware by Broadcom

@tenthirtyam
Collaborator

@taylor-madeak - wanted to check in and see if you've had an opportunity to test with the latest. Please feel free to reach out at ryan.johnson [at] broadcom [dot] com if you would like to look at this live.

Ryan
