
Incorrect FloatingIP workflow #1985

Closed
serge-name opened this issue Mar 29, 2024 · 7 comments · Fixed by #1996
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@serge-name

/kind bug

What steps did you take and what happened:
I tried a CAPO build at commit 1d5d2d5e45462dab056e37a6c948361e81875ea9. Some key details follow:

  1. Created an OpenStackFloatingIPPool (non-relevant fields removed):
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: OpenStackFloatingIPPool
metadata:
  name: osfipp
spec:
  floatingIPNetwork:
    id: c7c8509d-7083-41c9-b799-e30e855e9bc0
  reclaimPolicy: Delete
  2. Created a MachineDeployment and an OpenStackMachineTemplate:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: OpenStackMachineTemplate
metadata:
  name: some
spec:
  template:
    spec:
      ports:
        - network:
            id: f16855bf-8ba1-4f75-ad8c-763e80134571
      floatingIPPoolRef:
        apiGroup: infrastructure.cluster.x-k8s.io/v1beta1
        kind: OpenStackFloatingIPPool
        name: osfipp

✅ The floating IP was successfully created. Here we get the correct data, fip.FloatingIP == "185.***.**.**" and fip.FloatingNetworkID == "c7c8509d-7083-41c9-b799-e30e855e9bc0":

fip, err := networkingService.GetFloatingIP(address.Spec.Address)
if err != nil {
    return err
}

❌ Here we get port == nil and an error "Failed while associating ip from pool: port for floating IP "185...*" on network c7c8509d-7083-41c9-b799-e30e855e9bc0 does not exist":

port, err := networkingService.GetPortForExternalNetwork(instanceStatus.ID(), fip.FloatingNetworkID)
if err != nil {
    return fmt.Errorf("get port for floating IP %q: %w", fip.FloatingIP, err)
}
if port == nil {
    conditions.MarkFalse(openStackMachine, infrav1.FloatingAddressFromPoolReadyCondition, infrav1.FloatingAddressFromPoolErrorReason, clusterv1.ConditionSeverityError, "Can't find port for floating IP %q on external network %s", fip.FloatingIP, fip.FloatingNetworkID)
    return fmt.Errorf("port for floating IP %q on network %s does not exist", fip.FloatingIP, fip.FloatingNetworkID)
}

More details follow.

Here:

instancePorts, err := s.client.ListPort(instancePortsOpts)

The OpenStack API returns the following (non-relevant fields skipped):

{
  "ports": [
    {
      "device_id": "d1b99e45-991c-4143-93a3-9a8d3eddb416",
      "device_owner": "compute:nova",
      "fixed_ips": [
        {
          "ip_address": "10.21.10.29",
          "subnet_id": "616388c0-519f-418e-80b4-3687a546a65e"
        }
      ],
      "id": "0d1fe3bd-55f6-41d0-b879-a4071a15b5c0",
      "network_id": "f16855bf-8ba1-4f75-ad8c-763e80134571"
// …
    }
  ]
}

Please notice that we don't have a port associated with the FIP network c7c8509d-7083-41c9-b799-e30e855e9bc0. Neither the FIP network ID nor the FIP itself will appear in the ports info, because in our OpenStack cloud floating IPs are not added to ports directly. Instead, a NAT mapping 185.***.**.** → 10.21.10.29 is set up.

If the new k8s node got a FIP, it can be found here:
https://compute-api:8774/v2.1/TENANT_ID/servers/d1b99e45-991c-4143-93a3-9a8d3eddb416

And the reply looks like this (non-relevant fields skipped):

{ "server": {
    "id": "d1b99e45-991c-4143-93a3-9a8d3eddb416",
    "hci_info": {
      "network": [
        {
          "ips": [
            "10.21.10.29"
          ],
          "network": {
            "id": "f16855bf-8ba1-4f75-ad8c-763e80134571",
            "subnets": [
              {
                "ips": [
                  {
                    "address": "10.21.10.29",
                    "type": "fixed",
                    "version": 4,
                    "floating_ips": [
                      {
                        "address": "185.***.**.**",
                        "type": "floating",
                        "version": 4,
                      }
                    ]
                  } ] } ] } } ] } } }
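For illustration, the same check can be done with gophercloud's compute API rather than raw HTTP. This is only a sketch under assumptions: hci_info is Virtuozzo-specific, so the sketch relies on the standard addresses field (whose entries are tagged with "OS-EXT-IPS:type": "floating"), and the client variable and helper name are made up, not CAPO code:

// Sketch only: check whether a server already has the expected floating IP,
// using the compute API's standard "addresses" field instead of Neutron ports.
package sketch

import (
    "github.com/gophercloud/gophercloud"
    "github.com/gophercloud/gophercloud/openstack/compute/v2/servers"
)

func serverHasFloatingIP(computeClient *gophercloud.ServiceClient, serverID, fip string) (bool, error) {
    server, err := servers.Get(computeClient, serverID).Extract()
    if err != nil {
        return false, err
    }
    // server.Addresses maps network name -> list of address entries;
    // floating IPs are tagged with "OS-EXT-IPS:type": "floating".
    for _, addrs := range server.Addresses {
        entries, ok := addrs.([]interface{})
        if !ok {
            continue
        }
        for _, a := range entries {
            entry, ok := a.(map[string]interface{})
            if !ok {
                continue
            }
            if entry["OS-EXT-IPS:type"] == "floating" && entry["addr"] == fip {
                return true, nil
            }
        }
    }
    return false, nil
}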

Here it tries to find a fixed IP in the FIP network, but in our OpenStack cloud all FIPs have device_owner == "network:floatingip", so it just gets an empty list:

networkPortsOpts := ports.ListOpts{
    NetworkID:   instancePort.NetworkID,
    DeviceOwner: "network:router_interface",
}
networkPorts, err := s.client.ListPort(networkPortsOpts)
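For comparison, one way to make this query tolerant of clouds like ours would be to drop the exact DeviceOwner match and accept any router interface variant. This is only a hedged sketch continuing the fragment above (it reuses s.client and instancePort from that snippet, needs the strings and gophercloud ports packages, and is not necessarily what the eventual fix in #1996 does):

// Sketch: list all ports on the instance's network, then keep only router
// interface ports, covering both the legacy and the DVR device owners.
networkPortsOpts := ports.ListOpts{
    NetworkID: instancePort.NetworkID,
}
allPorts, err := s.client.ListPort(networkPortsOpts)
if err != nil {
    return err
}
routerPorts := make([]ports.Port, 0, len(allPorts))
for _, p := range allPorts {
    // "network:router_interface" and "network:router_interface_distributed" both match here.
    if strings.HasPrefix(p.DeviceOwner, "network:router_interface") {
        routerPorts = append(routerPorts, p)
    }
}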

What did you expect to happen:
A successfully deployed k8s node with a FIP attached.

Anything else you would like to add:
None so far, but please ask me for any details. The issue is reproducible and I can add even more details if you want.

Environment:

  • Cluster API Provider OpenStack version (Or git rev-parse HEAD if manually built): 1d5d2d5e45462dab056e37a6c948361e81875ea9

  • Cluster-API version: 1.6.3

  • OpenStack version: Virtuozzo (https://virtuozzo.com), based on OpenStack Xena

  • Minikube/KIND version: N/A

  • Kubernetes version (use kubectl version): 1.29.3

  • OS (e.g. from /etc/os-release): Talos (https://talos.dev) 1.6.7

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Mar 29, 2024
@mdbooth
Contributor

mdbooth commented Apr 1, 2024

/cc @huxcrux @bilbobrovall

@bilbobrovall
Contributor

What does f16855bf-8ba1-4f75-ad8c-763e80134571 look like? Does it have a router?

It's not really documented, but we don't create any new ports for the FIPs; we just look for an existing port that the FIP can be attached to, by checking whether there is a port on a subnet that has a router attached to the floating IP network.

I've mostly tested it with spec.ports omitted and the default setup, but I can test something closer to your setup if I know more about how that network is set up.
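Roughly, that check looks like this (a hedged sketch with gophercloud; the function name and parameters are placeholders, not the literal CAPO code):

// Sketch of the lookup: find a router interface port on the instance's network,
// then check whether that router's external gateway is the floating IP network.
package sketch

import (
    "github.com/gophercloud/gophercloud"
    "github.com/gophercloud/gophercloud/openstack/networking/v2/extensions/layer3/routers"
    "github.com/gophercloud/gophercloud/openstack/networking/v2/ports"
)

func subnetHasRouterToFIPNetwork(nc *gophercloud.ServiceClient, instanceNetworkID, fipNetworkID string) (bool, error) {
    pages, err := ports.List(nc, ports.ListOpts{
        NetworkID:   instanceNetworkID,
        DeviceOwner: "network:router_interface", // the exact filter that misses DVR clouds
    }).AllPages()
    if err != nil {
        return false, err
    }
    routerPorts, err := ports.ExtractPorts(pages)
    if err != nil {
        return false, err
    }
    for _, p := range routerPorts {
        // The DeviceID of a router interface port is the router's ID.
        router, err := routers.Get(nc, p.DeviceID).Extract()
        if err != nil {
            return false, err
        }
        if router.GatewayInfo.NetworkID == fipNetworkID {
            return true, nil
        }
    }
    return false, nil
}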

@serge-name
Author

serge-name commented Apr 2, 2024

Yes, I meant that the new port is created by OpenStack, but that doesn't happen in our cloud. I'm not that familiar with OpenStack internals and don't have access to any configurations other than our particular cloud.

GET https://compute-api:9696/v2.0/networks/f16855bf-8ba1-4f75-ad8c-763e80134571
{
  "network": {
    "id": "f16855bf-8ba1-4f75-ad8c-763e80134571",
    "name": "internal",
    "tenant_id": "278fda03174b4fee9358559baffca010",
    "admin_state_up": true,
    "mtu": 8913,
    "default_vnic_type": null,
    "status": "ACTIVE",
    "subnets": [
      "616388c0-519f-418e-80b4-3687a546a65e"
    ],
    "shared": false,
    "availability_zone_hints": [],
    "availability_zones": [
      "nova"
    ],
    "ipv4_address_scope": null,
    "ipv6_address_scope": null,
    "router:external": false,
    "description": "",
    "port_security_enabled": true,
    "rbac_policies": [
      {
        "id": "c869c7ef-3c51-4fb6-88f5-c591989fe3ef",
        "action": "access_as_shared",
        "target_tenant": "d278dea8631e47ffba5a908265968fbb"
      }
    ],
    "qos_policy_id": null,
    "tags": [],
    "created_at": "2024-02-06T12:43:10Z",
    "updated_at": "2024-03-20T20:39:09Z",
    "revision_number": 5,
    "project_id": "278fda03174b4fee9358559baffca010",
    "provider:network_type": "vxlan"
  }
}
GET https://compute-api:9696/v2.0/routers/7142d8f1-2b11-4ae2-a343-eacd77a2ceee
{
  "router": {
    "id": "7142d8f1-2b11-4ae2-a343-eacd77a2ceee",
    "name": "DefaultRouter",
    "tenant_id": "278fda03174b4fee9358559baffca010",
    "admin_state_up": true,
    "status": "ACTIVE",
    "external_gateway_info": {
      "network_id": "c7c8509d-7083-41c9-b799-e30e855e9bc0",
      "external_fixed_ips": [
        {
          "subnet_id": "aa2bc8f7-fa02-4851-ba13-93e57d4c69e1",
          "ip_address": "69.**.**.**"
        }
      ],
      "enable_snat": true
    },
    "description": "",
    "availability_zones": [
      "nova"
    ],
    "availability_zone_hints": [],
    "routes": [
    ],
    "flavor_id": null,
    "tags": [],
    "created_at": "2024-02-06T11:49:58Z",
    "updated_at": "2024-03-29T14:41:39Z",
    "revision_number": 17,
    "project_id": "278fda03174b4fee9358559baffca010"
  }
}

That router's external_fixed_ips entry is automatically pre-created by OpenStack.

If a VM has a FIP attached, outgoing connections are SNAT'ed from that FIP.
If a VM has no FIP, connections are SNAT'ed from the router's external IP.

GET https://compute-api:9696/v2.0/ports?device_id=7142d8f1-2b11-4ae2-a343-eacd77a2ceee
{
  "ports": [
    {
      "id": "0411af2f-d447-4f3c-88a7-1e8a57e70015",
      "name": "",
      "network_id": "f16855bf-8ba1-4f75-ad8c-763e80134571",
      "tenant_id": "",
      "mac_address": "fa:16:3e:44:38:7e",
      "admin_state_up": true,
      "status": "ACTIVE",
      "device_id": "7142d8f1-2b11-4ae2-a343-eacd77a2ceee",
      "device_owner": "network:router_centralized_snat",
      "fixed_ips": [
        {
          "subnet_id": "616388c0-519f-418e-80b4-3687a546a65e",
          "ip_address": "10.21.11.1"
        }
      ],
      "allowed_address_pairs": [],
      "extra_dhcp_opts": [],
      "security_groups": [],
      "description": "",
      "binding:vnic_type": "normal",
      "port_security_enabled": false,
      "qos_policy_id": null,
      "qos_network_policy_id": null,
      "tags": [],
      "created_at": "2024-02-06T14:02:02Z",
      "updated_at": "2024-03-23T18:11:57Z",
      "revision_number": 40,
      "project_id": ""
    },
    {
      "id": "ded9eafe-3ee0-4f29-9f7f-953470f3a3ae",
      "name": "",
      "network_id": "f16855bf-8ba1-4f75-ad8c-763e80134571",
      "tenant_id": "278fda03174b4fee9358559baffca010",
      "mac_address": "fa:16:3e:48:d2:da",
      "admin_state_up": true,
      "status": "ACTIVE",
      "device_id": "7142d8f1-2b11-4ae2-a343-eacd77a2ceee",
      "device_owner": "network:router_interface_distributed",
      "fixed_ips": [
        {
          "subnet_id": "616388c0-519f-418e-80b4-3687a546a65e",
          "ip_address": "10.21.10.1"
        }
      ],
      "allowed_address_pairs": [],
      "extra_dhcp_opts": [],
      "security_groups": [],
      "description": "",
      "binding:vnic_type": "normal",
      "port_security_enabled": false,
      "qos_policy_id": null,
      "qos_network_policy_id": null,
      "tags": [],
      "created_at": "2024-02-06T14:02:02Z",
      "updated_at": "2024-04-02T10:33:28Z",
      "revision_number": 68,
      "project_id": "278fda03174b4fee9358559baffca010"
    }
  ]
}

I've already come up with a quick fix: https://github.com/serge-name/cluster-api-provider-openstack/commit/bb19917957b82959f8406ed9778eebf82ebd7855 works fine so far. Right now I'm too short on time to create a proper PR.

@bilbobrovall
Contributor

DeviceOwner: "network:router_interface",

Does it work for you if you replace network:router_interface with network:router_interface_distributed?
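That is, something like this in the snippet quoted above (just the suggested experiment, not the final fix):

networkPortsOpts := ports.ListOpts{
    NetworkID:   instancePort.NetworkID,
    DeviceOwner: "network:router_interface_distributed", // DVR clouds use this device owner
}
networkPorts, err := s.client.ListPort(networkPortsOpts)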

@serge-name
Author

Yes, network:router_interface_distributed works absolutely fine, as in commit https://github.com/serge-name/cluster-api-provider-openstack/commit/a1bf5b88e40b9bc6c5d5f5208628a3e0193e70fe.

bilbobrovall added a commit to elastx/cluster-api-provider-openstack that referenced this issue Apr 3, 2024
bilbobrovall added a commit to elastx/cluster-api-provider-openstack that referenced this issue Apr 3, 2024
@serge-name
Author

@bilbobrovall thanks a lot! Your commit elastx@ce38e8b works fine for me and fixes the issue.

There are several minor errors due to premature and frequent (8 API requests in 2 seconds) checks for the FIP. Not a problem for me, just a thing that can be improved later. Logs follow:

minor_errors.txt

@bilbobrovall
Contributor


👍 It's probably just Neutron taking some time, and I think the retries should be fine for now since there's an exponential backoff when a reconciler returns the same error, but the initial retries feel a bit tight in this case.
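For context, a minimal sketch of why the first retries come so close together, assuming the default client-go/controller-runtime rate limiter (this reproduces the backoff shape only, not CAPO code; the 5ms/1000s values are the client-go defaults):

// Sketch: the default per-item exponential failure backoff starts at 5ms and
// doubles on every failure, which explains a burst of early retries.
package main

import (
    "fmt"
    "time"

    "k8s.io/client-go/util/workqueue"
)

func main() {
    rl := workqueue.NewItemExponentialFailureRateLimiter(5*time.Millisecond, 1000*time.Second)
    for i := 0; i < 8; i++ {
        fmt.Println(rl.When("fip-reconcile")) // 5ms, 10ms, 20ms, 40ms, ...
    }
}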

bilbobrovall added a commit to elastx/cluster-api-provider-openstack that referenced this issue Apr 3, 2024
@github-project-automation github-project-automation bot moved this from Inbox to Done in CAPO Roadmap Apr 9, 2024
pierreprinetti pushed a commit to shiftstack/cluster-api-provider-openstack that referenced this issue Apr 19, 2024
pierreprinetti pushed a commit to shiftstack/cluster-api-provider-openstack that referenced this issue Apr 19, 2024
pierreprinetti pushed a commit to shiftstack/cluster-api-provider-openstack that referenced this issue Apr 19, 2024
MaysaMacedo pushed a commit to shiftstack/cluster-api-provider-openstack that referenced this issue Apr 23, 2024