Nomad CSI volume status UI page never shows allocations depending on the volume #9215

Closed
RickyGrassmuck opened this issue Oct 28, 2020 · 8 comments · Fixed by #9377
Labels
stage/accepted Confirmed, and intend to work on. No timeline committment though. theme/api HTTP API and SDK issues theme/storage type/bug

Comments

@RickyGrassmuck
Contributor

Nomad version

Output from nomad version

nomad version

Nomad v1.0.0-beta2 (3acb12b)

Operating system and Environment details

[root@devbox examples]# cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"

Issue

The Nomad UI does not display active allocations for the volume, even though the API returns information about the allocation currently using it.

Below is a partial response from the /v1/volume/csi API call that the browser makes when loading the /ui/csi/volumes/influx_data page.

{
  "Allocations": [
    {
      "ID": "7ac17416-e09b-d4f9-814d-817836e72ab3",
      "JobID": "testInfluxdb",
      "JobType": "service",
      "Name": "testInfluxdb.database[0]",
      "Namespace": "default",
      "NodeID": "a8fa77a1-bcb1-b528-b150-8ceaeb722514",
      "NodeName": "nomad-client-1",
      ....
    }
  ],
  "AttachmentMode": "file-system",
  "Context": {},
  "ControllerRequired": true,
  "ControllersExpected": 3,
  "ControllersHealthy": 3,
  "CreateIndex": 4714,
  "ExternalID": "<external-volume-id>",
  "ID": "influx_data",
  "ModifyIndex": 4720,
  "MountOptions": null,
  "Name": "influx_data",
  "Namespace": "default",
  "NodesExpected": 3,
  "NodesHealthy": 3,
  "Parameters": {},
  "PluginID": "cinder-csi",
  "Provider": "cinder.csi.openstack.org",
  "ProviderVersion": "1.2.1@latest",
  "ReadAllocs": null,
  "ResourceExhausted": null,
  "Schedulable": true,
  "Secrets": null,
  "Topologies": [],
  "WriteAllocs": null
}

I'm guessing the root cause is that the allocation currently using the volume is not registered in WriteAllocs or ReadAllocs, and the UI uses only those fields to populate the page.
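
The mismatch described above can be checked directly against the raw API output. The following is a minimal Go sketch, not part of Nomad: the `csiVolume` struct models only the fields visible in the partial response, `claimsMissing` and `countEntries` are hypothetical helpers, and ReadAllocs/WriteAllocs are decoded generically because the issue only shows them as null, not their populated shape.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// csiVolume models only the fields of the volume response relevant here.
// ReadAllocs and WriteAllocs are decoded generically (as any) because the
// issue only shows them as null, not their populated shape.
type csiVolume struct {
	Allocations []json.RawMessage `json:"Allocations"`
	ReadAllocs  any               `json:"ReadAllocs"`
	WriteAllocs any               `json:"WriteAllocs"`
}

// countEntries returns the number of entries in a decoded JSON object or
// array; null (a nil interface) and scalars count as zero.
func countEntries(v any) int {
	switch t := v.(type) {
	case map[string]any:
		return len(t)
	case []any:
		return len(t)
	default:
		return 0
	}
}

// claimsMissing reports whether the volume lists at least one allocation
// while ReadAllocs and WriteAllocs are both empty, i.e. the mismatch
// described in this issue.
func claimsMissing(body []byte) (bool, error) {
	var vol csiVolume
	if err := json.Unmarshal(body, &vol); err != nil {
		return false, err
	}
	claimed := countEntries(vol.ReadAllocs) + countEntries(vol.WriteAllocs)
	return len(vol.Allocations) > 0 && claimed == 0, nil
}

func main() {
	// Trimmed-down version of the partial response quoted above.
	body := []byte(`{
		"Allocations": [{"ID": "7ac17416-e09b-d4f9-814d-817836e72ab3", "JobID": "testInfluxdb"}],
		"ReadAllocs": null,
		"WriteAllocs": null
	}`)
	missing, err := claimsMissing(body)
	if err != nil {
		panic(err)
	}
	fmt.Println("allocations present but ReadAllocs/WriteAllocs empty:", missing)
	// prints "allocations present but ReadAllocs/WriteAllocs empty: true"
}
```

A `true` result for a volume with a live allocation reproduces the symptom at the API level, independent of the UI.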

Reproduction steps

  1. Spin up a dev server/client and register a CSI plugin on it (we are using the OpenStack Cinder CSI driver).
  2. Register a volume.
  3. Run a Nomad job that claims the registered volume.
  4. Open the /ui/csi/volumes/:volume-id page in the UI and notice that no active allocations are shown.

Job file (if appropriate)

Volume Registration

type            = "csi"
id              = "influx_data"
name            = "influx_data"
external_id     = "<external-volume-id>"
access_mode     = "single-node-writer"
attachment_mode = "file-system"
plugin_id       = "cinder-csi"
mount_options   = {
  fs_type     = "ext4"
  mount_flags = ["rw", "noatime"]
}

Job Specification

job "testInfluxdb" {
  datacenters = ["dc1"]
  group "database" {
    volume "influx_data" {
      type      = "csi"
      source    = "influx_data"
      read_only = false
    }

    network {
      port "influx" { to = 8086 }
    }

    task "influxdb" {
      driver = "docker"
      volume_mount {
        volume = "influx_data"
        destination = "/var/lib/influxdb"
      }
      config {
        image = "influxdb:latest"
        ports = ["influx"]
      }
    }
  }
}

Nomad Client/Server logs (if appropriate)

Debug logs sent via email.

Screenshots of the allocation page and the volume details page are attached: the allocation page shows the volume in use by an allocation, while the volume details page shows no active allocations.

[Screenshot: allocation page showing volume usage]

[Screenshot: volume status page]

@DingoEatingFuzz
Contributor

I'm guessing the root cause is that the allocation currently using the volume is not registered in WriteAllocs or ReadAllocs, and the UI uses only those fields to populate the page.

This is exactly the issue. My question (for @tgross) is why that happened.

I suspect this has something to do with the CSI driver, but that's just me speculating.

@RickyGrassmuck
Contributor Author

RickyGrassmuck commented Oct 29, 2020

I haven't tested with any other drivers yet, so unfortunately I can't offer any insight there.

If you need any additional information about the Cinder driver, just let me know and I'll round it up.

@tgross
Member

tgross commented Oct 29, 2020

What's interesting here is that the allocation is live, so we're not running into the usual set of tricky conditions where we have an allocation that's terminal but hasn't had its volume claim reaped yet.

I know we changed how that API response is populated in #8590, to fix the very similar-looking bug #8362. There may be a regression here... I'll dig in and see what I can come up with.

@tgross tgross self-assigned this Oct 29, 2020
@apollo13
Contributor

apollo13 commented Nov 2, 2020

I am also seeing this with a 120-line custom-made CSI driver that implements the bare minimum of the API. I'm not sure whether this is in Nomad or whether my plugin needs to provide some info for this view. I am on Nomad 0.12.5, and my CSI plugin implements all of the identity service and just NodeGetCapabilities/NodeGetInfo/NodePublishVolume/NodeUnpublishVolume for the node interface.

@tgross tgross added stage/accepted Confirmed, and intend to work on. No timeline committment though. theme/api HTTP API and SDK issues and removed stage/needs-investigation labels Nov 13, 2020
@tgross
Member

tgross commented Nov 13, 2020

I dug into this a bit more, and this is definitely a regression of #8362. This was supposed to have been fixed in #8590, but the ReadAllocs/WriteAllocs fields identified as the root cause in #8362 (comment) still aren't being filled. This API issue is also related to one of the remaining pieces of #9230, so this should be next on my plate.
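
To illustrate what "being filled" means here: the API view of a volume has to translate its per-allocation claims into the ReadAllocs/WriteAllocs maps that the UI reads. The Go sketch below shows that general idea under stated assumptions: it assumes claims are stored by allocation ID separately from the allocation records, and every type and function in it (`claimMode`, `allocStub`, `volumeView`, `fillAllocs`) is hypothetical. It is not Nomad's code and not the change made in #9377.

```go
package main

import "fmt"

// claimMode is a hypothetical stand-in for whether an allocation claimed
// the volume for reading or for writing.
type claimMode int

const (
	claimRead claimMode = iota
	claimWrite
)

// allocStub is a hypothetical, minimal allocation record.
type allocStub struct {
	ID   string
	Name string
}

// volumeView is a hypothetical API view of a volume. Claims are stored by
// allocation ID; ReadAllocs and WriteAllocs are what clients such as the UI
// read, so they must be filled from the claims before the view is returned.
type volumeView struct {
	Claims      map[string]claimMode
	ReadAllocs  map[string]*allocStub
	WriteAllocs map[string]*allocStub
}

// fillAllocs populates ReadAllocs and WriteAllocs from the claims by looking
// up each claimed allocation by ID. Claims whose allocation cannot be found
// are skipped in this sketch.
func fillAllocs(v *volumeView, allocsByID map[string]*allocStub) {
	v.ReadAllocs = make(map[string]*allocStub)
	v.WriteAllocs = make(map[string]*allocStub)
	for id, mode := range v.Claims {
		alloc, ok := allocsByID[id]
		if !ok {
			continue
		}
		switch mode {
		case claimRead:
			v.ReadAllocs[id] = alloc
		case claimWrite:
			v.WriteAllocs[id] = alloc
		}
	}
}

func main() {
	const id = "7ac17416-e09b-d4f9-814d-817836e72ab3"
	allocs := map[string]*allocStub{
		id: {ID: id, Name: "testInfluxdb.database[0]"},
	}
	// A single-node-writer volume claimed for writing by one live allocation.
	vol := &volumeView{Claims: map[string]claimMode{id: claimWrite}}
	fillAllocs(vol, allocs)
	fmt.Printf("read=%d write=%d\n", len(vol.ReadAllocs), len(vol.WriteAllocs))
	// prints "read=0 write=1"
}
```

If this step is skipped, the claims exist but both maps stay empty (null in the JSON), which matches the response quoted in the original report.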

@tsarna

tsarna commented Nov 14, 2020

I'm also seeing this with the AWS EBS and EFS drivers (on Nomad 0.12.7).

@tgross
Member

tgross commented Nov 25, 2020

Fixed in #9377, which will ship in Nomad 1.0 GA.

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 28, 2022
5 participants