[cinder-csi-plugin] Volume list pagination problem #2295
Comments
@vhurtevent can you run controller with |
Doesn't look like your issue is caused by bad pagination, unless you have a custom environment that uses |
csc is a tool, but it isn't used in cinder-csi-plugin now |
As I understand it, the csc tool is not used; it's gophercloud, and I can't find a way to specify max_entries, which would help reproduce the issue in a smaller setup. I prepared debug logs (attached) with confidential information removed: cinder-csi-plugin.log. In the cinder-csi-plugin logs from its startup:
The ListVolumes request ends at line 44355 with the gRPC response to csi-attacher, without retrieving the remaining volumes from the marker returned by the Cinder API. Looking at csi-attacher at the same time, when it receives the gRPC response to the ListVolumes call, there is also this message:
I tried to understand how pagination works, but it's unclear to me how it is split between gophercloud, cinder-csi-plugin and csi-attacher. I'm wondering whether cinder-csi-plugin should return the next token (marker) to trigger a new gRPC ListVolumes call starting from this marker, or whether cinder-csi-plugin itself doesn't paginate correctly to get the complete volume list.
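The token hand-off between csi-attacher and the plugin can be sketched as follows. This is a minimal sketch with simplified stand-in types: only the field names MaxEntries, StartingToken, Entries and NextToken follow the CSI ListVolumes messages, while listAll, fakeServer and the integer-offset token are hypothetical illustrations, not the plugin's real marker format. The caller feeds each NextToken back as StartingToken and stops at the first empty NextToken, so a server that returns an empty NextToken after the first page leaves the caller with only 1000 volumes, which matches the symptom described here.

```go
package main

import (
	"fmt"
	"strconv"
)

// Simplified stand-ins for the CSI ListVolumes messages; only the field names
// follow the CSI spec. Entries holds bare volume IDs for brevity.
type listVolumesRequest struct {
	MaxEntries    int32 // ignored by the fake server below
	StartingToken string
}

type listVolumesResponse struct {
	Entries   []string
	NextToken string
}

type listFunc func(req listVolumesRequest) listVolumesResponse

// listAll feeds each NextToken back as StartingToken until the server
// returns an empty NextToken, i.e. until there are no more pages.
func listAll(list listFunc) []string {
	var all []string
	tok := ""
	for {
		rsp := list(listVolumesRequest{StartingToken: tok})
		all = append(all, rsp.Entries...)
		if rsp.NextToken == "" {
			return all
		}
		tok = rsp.NextToken
	}
}

// fakeServer serves n volumes in pages of pageSize, using the offset of the
// next page as its token. With dropToken set it never returns a NextToken,
// which reproduces the "first page only" symptom.
func fakeServer(n, pageSize int, dropToken bool) listFunc {
	return func(req listVolumesRequest) listVolumesResponse {
		start, err := strconv.Atoi(req.StartingToken)
		if err != nil {
			start = 0 // empty (or invalid) token: start from the beginning
		}
		end := start + pageSize
		if end > n {
			end = n
		}
		var rsp listVolumesResponse
		for i := start; i < end; i++ {
			rsp.Entries = append(rsp.Entries, fmt.Sprintf("vol-%04d", i))
		}
		if end < n && !dropToken {
			rsp.NextToken = strconv.Itoa(end)
		}
		return rsp
	}
}

func main() {
	fmt.Println(len(listAll(fakeServer(1258, 1000, false)))) // 1258
	fmt.Println(len(listAll(fakeServer(1258, 1000, true))))  // 1000
}
```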
|
There has to be a bug in either gophercloud or our implementation of pagination here. First of all, I'm pretty sure |
Hi @dulek, if I understand correctly, pagination is implemented at several levels, and only the first page of results reaches csi-attacher? |
As |
No, I'm seeing the first 1000 volumes being correctly returned to csi-attacher |
Ah, we're returning |
I nailed the source of the I made a custom build of the v3.5.0 tag with the hardcoded
UPD: pagination works fine...
|
@vhurtevent which csi-attacher tag are you using? UPD: according to your log, the |
Hello @kayrus, my setup uses the manifests from the latest tag: https://github.com/kubernetes/cloud-provider-openstack/blob/v1.27.1/manifests/cinder-csi-plugin/cinder-csi-controllerplugin.yaml
When I query volumes/detail with curl:
Using the marker from the logs, at the time of capture:
These queries are made without the maxEntries parameter (limit, in Cinder API terms), so it's Cinder, not the client, that paginates the results 1000 at a time. Maybe that's a lead? |
Could a custom build of external-attacher with maxEntries hardcoded to 500 or 1000 help debug this last point? |
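To reproduce paging at the Cinder level with pages smaller than 1000, the limit and marker query parameters of GET /v3/{project_id}/volumes/detail can be set explicitly. A minimal sketch that builds such URLs, assuming placeholder endpoint, project ID and marker values (volumesDetailURL is a hypothetical helper); the printed URL can be passed to curl together with an X-Auth-Token header.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// volumesDetailURL builds a Cinder "list volumes with details" URL. A limit <= 0
// omits the parameter so Cinder applies its own maximum page size; an empty
// marker starts from the first volume.
func volumesDetailURL(endpoint, projectID string, limit int, marker string) (string, error) {
	u, err := url.Parse(endpoint + "/" + projectID + "/volumes/detail")
	if err != nil {
		return "", err
	}
	q := url.Values{}
	if limit > 0 {
		q.Set("limit", strconv.Itoa(limit))
	}
	if marker != "" {
		q.Set("marker", marker)
	}
	u.RawQuery = q.Encode() // Encode sorts keys: limit before marker
	return u.String(), nil
}

func main() {
	endpoint := "https://cinder.example.com/v3" // placeholder endpoint
	first, _ := volumesDetailURL(endpoint, "PROJECT_ID", 500, "")
	next, _ := volumesDetailURL(endpoint, "PROJECT_ID", 500, "LAST_VOLUME_ID")
	fmt.Println(first) // https://cinder.example.com/v3/PROJECT_ID/volumes/detail?limit=500
	fmt.Println(next)  // https://cinder.example.com/v3/PROJECT_ID/volumes/detail?limit=500&marker=LAST_VOLUME_ID
}
```

Because url.Values.Encode sorts keys, limit always precedes marker in the generated query.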
diff --git a/pkg/attacher/lister.go b/pkg/attacher/lister.go
index 004c92e4..4b9e866f 100644
--- a/pkg/attacher/lister.go
+++ b/pkg/attacher/lister.go
@@ -42,6 +42,7 @@ func (a *CSIVolumeLister) ListVolumes(ctx context.Context) (map[string]([]string
tok := ""
for {
rsp, err := a.client.ListVolumes(ctx, &csi.ListVolumesRequest{
+ MaxEntries: 2,
StartingToken: tok,
})
if err != nil {
Then build an image using:
I'd also suggest adding this logging to the csi-controller:
diff --git a/pkg/csi/cinder/controllerserver.go b/pkg/csi/cinder/controllerserver.go
index 21a72706..4e3e17fa 100644
--- a/pkg/csi/cinder/controllerserver.go
+++ b/pkg/csi/cinder/controllerserver.go
@@ -287,6 +287,8 @@ func (cs *controllerServer) ListVolumes(ctx context.Context, req *csi.ListVolume
}
maxEntries := int(req.MaxEntries)
+ klog.V(4).Infof("ListVolumes: called with %d entries and %q token", maxEntries, req.StartingToken)
+
vlist, nextPageToken, err := cs.Cloud.ListVolumes(maxEntries, req.StartingToken)
if err != nil {
klog.Errorf("Failed to ListVolumes: %v", err)
@@ -313,6 +315,8 @@ func (cs *controllerServer) ListVolumes(ctx context.Context, req *csi.ListVolume
ventries = append(ventries, &ventry)
}
+
+ klog.V(4).Infof("ListVolumes: %d entries with next token: %s", len(ventries), nextPageToken)
return &csi.ListVolumesResponse{
Entries: ventries,
NextToken: nextPageToken, |
Looks like it's true that Cinder limits the size of a list to 1000 by default [1]. Moreover, some drivers [2][3] limit the page size on their own too. So this is a good lead! I think the solution would be to use that info and request pages of at most 1000 volumes in the cinder-csi-plugin. If CSI wants more, we need to make multiple requests there (controlling when we're actually stopping iterations of
|
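The suggestion to never ask Cinder for more than 1000 volumes per request, and to loop when the CSI caller wants more, can be sketched like this. A minimal sketch, assuming a hypothetical pageFunc standing in for a single Cinder request and integer volume IDs that double as markers; it is not the plugin's code. collect returns the last marker so that it could be handed back to the CSI caller as NextToken.

```go
package main

import (
	"fmt"
	"strconv"
)

// cinderMaxLimit is the per-request cap discussed in this thread (1000 by default).
const cinderMaxLimit = 1000

// pageFunc is a hypothetical single Cinder request: up to limit volume IDs after
// marker, plus the marker for the next page ("" when there are no more pages).
type pageFunc func(limit int, marker string) (ids []string, next string)

// collect gathers up to maxEntries volumes (0 means all of them) using requests of
// at most cinderMaxLimit entries, and returns the marker to resume from.
func collect(fetch pageFunc, maxEntries int, marker string) ([]string, string) {
	var out []string
	for {
		limit := cinderMaxLimit
		if remaining := maxEntries - len(out); maxEntries > 0 && remaining < limit {
			limit = remaining
		}
		ids, next := fetch(limit, marker)
		out = append(out, ids...)
		marker = next
		if marker == "" || (maxEntries > 0 && len(out) >= maxEntries) {
			return out, marker
		}
	}
}

// fakeCinder serves n volumes whose IDs are "0".."n-1"; the marker is the last
// ID already returned, and page sizes are capped at cinderMaxLimit.
func fakeCinder(n int) pageFunc {
	return func(limit int, marker string) ([]string, string) {
		start := 0
		if marker != "" {
			last, _ := strconv.Atoi(marker)
			start = last + 1
		}
		if limit <= 0 || limit > cinderMaxLimit {
			limit = cinderMaxLimit // server-side cap
		}
		end := start + limit
		if end > n {
			end = n
		}
		var ids []string
		for i := start; i < end; i++ {
			ids = append(ids, strconv.Itoa(i))
		}
		next := ""
		if end < n {
			next = ids[len(ids)-1]
		}
		return ids, next
	}
}

func main() {
	fetch := fakeCinder(1258)
	all, next := collect(fetch, 0, "")
	fmt.Println(len(all), next == "") // 1258 true
	page, next := collect(fetch, 1200, "")
	fmt.Println(len(page), next) // 1200 1199
}
```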
@dulek, from what I see in the k8s source code, there should be no issue even when Cinder artificially limits the output to 1000 entries. Let's wait for @vhurtevent's results with the extra logs to understand the root cause. |
Thank you @kayrus, I hope I can test it this afternoon. |
I think I have good news! I followed your tips, @kayrus, and hardcoded maxEntries=500; logs attached: csi-attacher_with-hardcoded-maxentries-500.log. The volume list is correctly paginated in 500-item pages, and csi-attacher gets back all 1258 volumes. |
@vhurtevent, I'd also like to get the extra logs (e.g. from PR #2296) without the 500 limit. Can you provide them? |
Here are the logs with the extra logging from @kayrus:
|
o_O |
Nailed it! :)
diff --git a/pkg/csi/cinder/openstack/openstack_volumes.go b/pkg/csi/cinder/openstack/openstack_volumes.go
index 79372196..083908e6 100644
--- a/pkg/csi/cinder/openstack/openstack_volumes.go
+++ b/pkg/csi/cinder/openstack/openstack_volumes.go
@@ -96,11 +96,11 @@ func (os *OpenStack) ListVolumes(limit int, startingToken string) ([]volumes.Vol
}
if nextPageURL != "" {
- queryParams, err := url.ParseQuery(nextPageURL)
+ pageURL, err := url.Parse(nextPageURL)
if err != nil {
return false, err
}
- nextPageToken = queryParams.Get("marker")
+ nextPageToken = pageURL.Query().Get("marker")
}
return false, nil
I need to check how to add this to the unit tests... |
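Why the original code loses the marker: url.ParseQuery expects a bare query string, so when it receives the whole next-page URL, the scheme, host, path and `?` are glued onto the first parameter's key. If marker is the first parameter, its key becomes `...detail?marker` and `Get("marker")` returns an empty string; if another parameter such as limit comes first, marker survives. That would be consistent with pagination working once maxEntries=500 was hardcoded, assuming the next link then starts with limit. A minimal sketch comparing both approaches; the URLs are illustrative, not captured from Cinder, and markerBuggy/markerFixed are hypothetical names.

```go
package main

import (
	"fmt"
	"net/url"
)

// markerBuggy mirrors the pre-fix logic: the full next-page URL is passed to
// url.ParseQuery, which only expects the part after '?'.
func markerBuggy(nextPageURL string) (string, error) {
	queryParams, err := url.ParseQuery(nextPageURL)
	if err != nil {
		return "", err
	}
	return queryParams.Get("marker"), nil
}

// markerFixed mirrors the fix: parse the whole URL, then read its query string.
func markerFixed(nextPageURL string) (string, error) {
	pageURL, err := url.Parse(nextPageURL)
	if err != nil {
		return "", err
	}
	return pageURL.Query().Get("marker"), nil
}

func main() {
	links := []string{
		// marker is the first parameter: the buggy parser loses it
		"https://cinder.example.com/v3/PROJECT_ID/volumes/detail?marker=abc",
		// another parameter precedes marker: both parsers find it
		"https://cinder.example.com/v3/PROJECT_ID/volumes/detail?limit=500&marker=abc",
	}
	for _, link := range links {
		buggy, _ := markerBuggy(link)
		fixed, _ := markerFixed(link)
		fmt.Printf("buggy=%q fixed=%q\n", buggy, fixed)
	}
}
```

The same two links could serve as table-driven cases in a unit test for the marker extraction.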
Hi @kayrus! It's great that you found the root cause :) Could you work on a fix and a PR in the next few days? How can I help? |
@vhurtevent the fix was submitted last week in #2296. It's waiting to be merged. |
It's merged. @vhurtevent, can you help revisit this issue with the fix?
Hello @jichenjc, I applied the fix through a build from the master branch on a cluster where I had the VolumeAttachment mismatch, and everything is OK now. Thank you for your work! Since I tested with my own build: is there already an automatic build from master? |
I think we can backport this to the 1.27 branch and then cut a 1.27.x release. Thoughts, @kayrus?
I'm not against the backport. However, releasing old versions is a manual task, according to @zetaab |
We should release older versions from time to time, so backporting a fix like this seems valid to me |
Let me try the cherry-pick bot... |
@jichenjc, for some reason all backports fail the CI tests |
/close |
@kayrus: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/kind bug
What happened:
In an OpenStack project with more than 1000 volumes provisioned and managed by cinder-csi-plugin, csi-attacher can't get a complete list of volumes. Volumes beyond the 1000th item (1000 appears to be Cinder's limit on returned entries) are not retrieved through pagination.
Because volume attachments are missing, csi-attacher logs VolumeAttachment mismatches between the VAs in the Kubernetes cluster and the volume attachments returned by the ListVolumes call (Cinder API /v3/<project_id>/volumes/detail).
The log message looks like this:
In projects with several thousand volumes, there are many VA mismatches, the logs are very heavy, and we suspect that reprocessing the VA queue sends too many requests to the Kubernetes API and the Cinder API.
The Cinder API ends its JSON response with the marker of the item to start from in the next query. Although cinder-csi-plugin seems to implement pagination (https://github.com/kubernetes/cloud-provider-openstack/blob/master/pkg/csi/cinder/openstack/openstack_volumes.go#L79), only the first 1000 items are retrieved from the Cinder API.
What you expected to happen:
cinder-csi-plugin correctly uses Cinder API pagination, and csi-attacher gets the complete list of volume attachments when calling ListVolumes.
How to reproduce it:
I searched in:
how to set max_entries below 1000 to test pagination and reproduce the issue in a small setup, without success.
Environment: