Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Value log rewrites load the entire value log into memory #1292

Closed
Stebalien opened this issue Apr 7, 2020 · 5 comments · Fixed by #1357
Closed

Value log rewrites load the entire value log into memory #1292

Stebalien opened this issue Apr 7, 2020 · 5 comments · Fixed by #1357
Labels
kind/bug Something is broken. priority/P1 Serious issue that requires eventual attention (can wait a bit) status/accepted We accept to investigate or work on it.

Comments

@Stebalien
Copy link
Contributor

Stebalien commented Apr 7, 2020

What version of Go are you using (go version)?

$ go version
go version go1.13.8 linux/amd64

What version of Badger are you using?

1.6.1

Does this issue reproduce with the latest master?

I haven't tried but the code looks the same.

Description

When rewriting a value log (e.g., on garbage collection), badger will end up loading the entire value log into memory unless there are enough small values to exceed the batch count (ipfs/go-ds-badger#86).

When estimating the size of an entry:

badger/structs.go

Lines 153 to 158 in 8a93a41

func (e *Entry) estimateSize(threshold int) int {
if len(e.Value) < threshold {
return len(e.Key) + len(e.Value) + 2 // Meta, UserMeta
}
return len(e.Key) + 12 + 2 // 12 for ValuePointer, 2 for metas.
}

Badger only estimates the size of the entry as it exists in the LSM.

When deciding whether or not to flush, badger only looks at this estimated size, not the size of the value itself:

badger/value.go

Lines 566 to 576 in 617ed7c

es := int64(ne.estimateSize(vlog.opt.ValueThreshold))
// Ensure length and size of wb is within transaction limits.
if int64(len(wb)+1) >= vlog.opt.maxBatchCount ||
size+es >= vlog.opt.maxBatchSize {
tr.LazyPrintf("request has %d entries, size %d", len(wb), size)
if err := vlog.db.batchSet(wb); err != nil {
return err
}
size = 0
wb = wb[:0]
}

I believe the fix here is to look at the size of the value plus the LSM overhead, is that correct?

@jarifibrahim jarifibrahim added kind/bug Something is broken. priority/P2 Somehow important but would not block a release. status/accepted We accept to investigate or work on it. labels Apr 8, 2020
@jarifibrahim
Copy link
Contributor

Yes @Stebalien. That is correct. We should consider the size of the value when considering the size of the batch.

The fix would be similar to the one in #1278

@KlotzAndrew
Copy link

Running into OOM crashes during GC, is this related? I'm trying to use badger in a low memory environment with a large disk

pprof heap screenshot, this rapidly consumes all available RAM and crashes. These are the seeing I am using:

WithLoadBloomsOnOpen(false).
WithKeepL0InMemory(false).
WithValueLogLoadingMode(options.FileIO).
WithTableLoadingMode(options.FileIO)

Screenshot from 2020-05-18 14-55-18

@jarifibrahim jarifibrahim added priority/P1 Serious issue that requires eventual attention (can wait a bit) and removed priority/P2 Somehow important but would not block a release. labels Jun 2, 2020
@jarifibrahim
Copy link
Contributor

@KlotzAndrew Yes, it is the same issue. I've bumped up the priority since multiple people are seeing the same failure.

@jarifibrahim
Copy link
Contributor

Hey @Stebalien @KlotzAndrew, I have a PR that should reduce the memory consumed by value log GC. Do you think you can test it #1357? I'll try to generate some data with large values and see if I can reproduce the high memory consumption.

@Stebalien
Copy link
Contributor Author

PR looks good to me. I'll need to complete the work to support badger 2 in go-ipfs to actually test it.

jarifibrahim pushed a commit that referenced this issue Jun 9, 2020
The existing code doesn't consider the size of the value while
calculating the size of the entry batch in rewrite method.
This PR fixes it.

Fixes #1292
NamanJain8 pushed a commit that referenced this issue Sep 8, 2020
The existing code doesn't consider the size of the value while
calculating the size of the entry batch in rewrite method.
This PR fixes it.

Fixes #1292

(cherry picked from commit 14386ac)
NamanJain8 added a commit that referenced this issue Sep 9, 2020
The existing code doesn't consider the size of the value while
calculating the size of the entry batch in rewrite method.
This PR fixes it.

Fixes #1292

(cherry picked from commit 14386ac)

Co-authored-by: Ibrahim Jarif <[email protected]>
jarifibrahim pushed a commit that referenced this issue Oct 2, 2020
The existing code doesn't consider the size of the value while
calculating the size of the entry batch in rewrite method.
This PR fixes it.

Fixes #1292
mYmNeo pushed a commit to mYmNeo/badger that referenced this issue Jan 16, 2023
…o#1505)

The existing code doesn't consider the size of the value while
calculating the size of the entry batch in rewrite method.
This PR fixes it.

Fixes dgraph-io#1292

(cherry picked from commit 14386ac)

Co-authored-by: Ibrahim Jarif <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something is broken. priority/P1 Serious issue that requires eventual attention (can wait a bit) status/accepted We accept to investigate or work on it.
Development

Successfully merging a pull request may close this issue.

3 participants