-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
store: add streamed postings reading #6340
Conversation
fb0895f
to
1ee9043
Compare
`readIndexRange` dominates the profiles here so let's stream reading postings into `index.Postings` instead of allocating everything at once. Work in progress. Signed-off-by: Giedrius Statkevičius <[email protected]>
1ee9043
to
2d10d0f
Compare
Hi @GiedriusS, we are interested in this feature as we saw same high heap usage issue recently of this part of code. |
@@ -2416,142 +2415,42 @@ func (r *bucketIndexReader) fetchPostings(ctx context.Context, keys []labels.Lab | |||
|
|||
// Fetch from object storage concurrently and update stats and posting list. | |||
g.Go(func() error { | |||
begin := time.Now() | |||
for _, p := range ptrs[i:j] { | |||
ir, err := r.block.bkt.GetRange(ctx, r.block.indexFilename(), p.ptr.Start, p.ptr.End-p.ptr.Start) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That means we have to send multiple requests to objstore while current logic is sending once per part?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we able to still send 1 request per part and create posting reader from the get range reader?
If this is not doable, I feel it is better to maybe just download postings to disk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe more get range requests won't impact performance, hopefully we can have some datapoints to understand the impact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could add this under a feature flag? if it is enabled then we would send multiple requests which would mean bigger costs if using some SaaS that charges per-request but we would get constant RAM usage
statsMtx: statsMtx, | ||
} | ||
|
||
postingsCount, err := getInt32(bktReader, r.readBuf[:0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be r.readBuf
, not r.readBuf[:0]
?
Tested r.readBuf[:0]
cannot read data cause the buffer size is 0.
} | ||
|
||
func getInt32(r io.Reader, buf []byte) (uint32, error) { | ||
read, err := r.Read(buf[:0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here buf[:0]
needs to be buf
@GiedriusS IIUC #6442 will supercede this pr? Or will we do both? |
What about the roaring bitmap approach? I feel like that one is superior since we can aggregate postings in a streaming manner. I don't know if making one or more requests per posting is sustainable at scale. @yeya24 also had an idea to calculate the intersection by merging one (or a controlled number) of postings at a time, instead of maxing out the fanout. |
Yeah I am thinking about the same. We know the postings length so we can just sort and start from the two smallest size of postings. Need some benchmarks though.
#6442 seems not doing this anymore. It is still one request per part. |
Changes
readIndexRange
dominates the profiles here so let's stream reading postings intoindex.Postings
instead of allocating everything at once.Work in progress.
Verification
Existing + ad-hoc tests.