WalkDir of gcs only iterate at most 1000 keys #30377

Closed
july2993 opened this issue Dec 3, 2021 · 4 comments · Fixed by #30393
Labels
affects-5.2 This bug affects 5.2.x versions. affects-5.3 This bug affects 5.3.x versions. component/lightning This issue is related to Lightning of TiDB. severity/critical type/bug The issue is confirmed as a bug.

Comments

@july2993
Contributor

july2993 commented Dec 3, 2021

Bug Report

ref

tidb/br/pkg/storage/gcs.go

Lines 183 to 197 in 891517f

maxKeys := int64(1000)
if opt.ListCount > 0 {
	maxKeys = opt.ListCount
}
prefix := path.Join(s.gcs.Prefix, opt.SubDir)
if len(prefix) > 0 && !strings.HasSuffix(prefix, "/") {
	prefix += "/"
}
query := &storage.Query{Prefix: prefix}
// only need each object's name and size
query.SetAttrSelection([]string{"Name", "Size"})
iter := s.bucket.Objects(ctx, query)
for i := int64(0); i != maxKeys; i++ {

Note that maxKeys defaults to 1000, and the loop ends after iterating over maxKeys keys, even if more objects remain.
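To make the effect concrete, here is a minimal, self-contained sketch of that loop shape. It is not TiDB code: `fakeIter` is a hypothetical in-memory iterator standing in for the GCS object iterator, and `errDone` is a hypothetical sentinel standing in for the library's end-of-iteration error. `walkCapped` mirrors the counter-bounded loop quoted above, while `walkAll` shows one possible alternative that stops only when the iterator is exhausted; it is not necessarily how the linked fix was implemented.

```go
package main

import (
	"errors"
	"fmt"
)

// errDone is a hypothetical sentinel standing in for the storage
// library's end-of-iteration error.
var errDone = errors.New("no more items")

// fakeIter is a hypothetical in-memory iterator over `total` object names.
type fakeIter struct {
	next, total int
}

// Next returns the next object name, or errDone once all have been returned.
func (it *fakeIter) Next() (string, error) {
	if it.next >= it.total {
		return "", errDone
	}
	name := fmt.Sprintf("obj-%04d", it.next)
	it.next++
	return name, nil
}

// walkCapped mirrors the loop in the quoted snippet: it stops after maxKeys
// iterations even when the iterator still has objects left.
func walkCapped(it *fakeIter, maxKeys int64) (visited int) {
	for i := int64(0); i != maxKeys; i++ {
		if _, err := it.Next(); err != nil {
			break
		}
		visited++
	}
	return visited
}

// walkAll keeps going until the iterator reports an error, which for
// fakeIter can only be errDone (exhaustion).
func walkAll(it *fakeIter) (visited int) {
	for {
		if _, err := it.Next(); err != nil {
			return visited
		}
		visited++
	}
}

func main() {
	fmt.Println(walkCapped(&fakeIter{total: 2500}, 1000)) // prints 1000
	fmt.Println(walkAll(&fakeIter{total: 2500}))          // prints 2500
}
```

With 2500 simulated objects, the capped loop reports 1000 visited and the exhaustive loop reports 2500, which matches the behavior described in this report.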

1. Minimal reproduce step (Required)

Import data consisting of more than 1000 keys/files.

2. What did you expect to see? (Required)

All data under the directory is imported.

3. What did you see instead (Required)

At most 1000 files are scanned.

4. What is your TiDB version? (Required)

@july2993 july2993 added the type/bug The issue is confirmed as a bug. label Dec 3, 2021
@kennytm
Contributor

kennytm commented Dec 3, 2021

maxKeys is 1000 only if opt.ListCount is left at its default (0).

@glorv
Contributor

glorv commented Dec 3, 2021

I think this is indeed a bug. For S3, maxKeys limits the number of keys fetched per request, but all keys are still visited. GCS, however, visits only maxKeys keys and then stops.
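The S3-style semantics described here, where maxKeys is only a page size, can be sketched as follows. This is an illustration under stated assumptions, not the BR storage code: `listPage` is a hypothetical stand-in for a single list request against an in-memory key slice, and the integer continuation token is likewise made up for the example.

```go
package main

import "fmt"

// listPage is a hypothetical stand-in for one list request: it returns at
// most maxKeys keys starting at offset `token`, plus the token for the next
// page (-1 when there are no more keys). It assumes maxKeys > 0.
func listPage(keys []string, token int, maxKeys int) (page []string, next int) {
	end := token + maxKeys
	if end >= len(keys) {
		return keys[token:], -1
	}
	return keys[token:end], end
}

// walkPaged visits every key, using maxKeys only as the per-request page
// size. It returns the number of keys visited and the requests issued.
func walkPaged(keys []string, maxKeys int, visit func(string)) (visited, requests int) {
	for token := 0; token >= 0; {
		var page []string
		page, token = listPage(keys, token, maxKeys)
		requests++
		for _, k := range page {
			visit(k)
			visited++
		}
	}
	return visited, requests
}

func main() {
	keys := make([]string, 2500)
	for i := range keys {
		keys[i] = fmt.Sprintf("obj-%04d", i)
	}
	visited, requests := walkPaged(keys, 1000, func(string) {})
	fmt.Println(visited, requests) // prints 2500 3
}
```

Here maxKeys bounds the size of each request, so 2500 keys take three requests, but the walk itself ends only when no continuation token is returned.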

@glorv
Contributor

glorv commented Dec 3, 2021

/cc @3pointer

@Leavrth Leavrth self-assigned this Dec 3, 2021
@jebter jebter added the component/lightning This issue is related to Lightning of TiDB. label Dec 3, 2021
@jebter jebter added affects-4.0 This bug affects 4.0.x versions. affects-5.0 This bug affects 5.0.x versions. affects-5.1 This bug affects 5.1.x versions. affects-5.2 This bug affects 5.2.x versions. affects-5.3 This bug affects 5.3.x versions. labels Dec 3, 2021
@github-actions

github-actions bot commented Dec 6, 2021

Please check whether the issue should be labeled with 'affects-x.y' or 'fixes-x.y.z', and then remove 'needs-more-info' label.

@glorv glorv removed affects-5.1 This bug affects 5.1.x versions. affects-4.0 This bug affects 4.0.x versions. affects-5.0 This bug affects 5.0.x versions. labels Feb 7, 2022

6 participants