large database size #718
Comments
Hi @mschoch , Looking at the summary from
This indicates that so far there are not enough keys in the LSM tree, to run compactions. Badger supports version access for the keys, so it would only reclaim space from value logs once the key versions are discarded. When SSTable compactions happen, then the So, my recommendation would be to either set the If needed, I could make a change to flatten command, so it would also (optionally) run value log GC, so you can see the space reclaim instantly. This also indicates to me that the average value size per key is quite large. In that case, you might consider having smaller sized levels in the LSM tree, so more compactions are being done in general. That would allow value log GC to reclaim more space back. Look at |
Thanks for the suggestions. I should clarify that the only place we set However, the other command-line utilities I've run on the same db (after the original main application created it) all leave CompactL0OnClose set to the default (true). And they all properly call Close(), so I would have expected that L0 has been compacted by now. Here is what I get when I run the
Running compaction again after the flatten, I still see the same thing:
DB still the same size:
I will try the other suggestions for level size and multipliers, but can you help me understand what you mean when you say, 'not enough keys in the LSM tree, to run compactions'? When I iterate all keys, here are the counts I get: |
Can you confirm that after you do The value log GC picks a file at random to see if it matches the threshold for reclamation. When Badger is running live, the LSM compaction keeps track of which keys were thrown out, and matches them against the value log files. But, it does not persist that information, so in offline mode, the random strategy is effective. You could modify your badger-compact loop to run it a few times, so it has a chance to pick the right log file.
A typical Badger SSTable can fit a couple of million keys in the default 64MB size. Only when a table exceeds that size, a new SSTable is created. In this case, there are not enough keys to build even a single SSTable (the one in |
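The suggestion above to run the offline GC several times, so the random file selection has a chance to land on a file worth rewriting, could be sketched roughly as follows. This is a minimal sketch, not the badger-compact tool itself: `valueLogGCer`, `runGCAttempts`, `fakeDB`, and `errNoRewrite` are hypothetical names introduced for illustration, and the only assumption about Badger is that `*badger.DB` exposes `RunValueLogGC(discardRatio float64) error`.

```go
package main

import (
	"errors"
	"fmt"
)

// valueLogGCer is the single method this sketch needs. Badger's *badger.DB
// has a RunValueLogGC method with this signature.
type valueLogGCer interface {
	RunValueLogGC(discardRatio float64) error
}

// runGCAttempts calls RunValueLogGC up to maxAttempts times and returns how
// many calls reported a rewrite (nil error). It does not stop at the first
// error, because each offline call may sample a different log file.
func runGCAttempts(db valueLogGCer, discardRatio float64, maxAttempts int) int {
	rewrites := 0
	for i := 0; i < maxAttempts; i++ {
		if err := db.RunValueLogGC(discardRatio); err == nil {
			rewrites++
		}
	}
	return rewrites
}

// errNoRewrite is a hypothetical stand-in for badger.ErrNoRewrite.
var errNoRewrite = errors.New("no rewrite")

// fakeDB is a test double that reports a rewrite only on the listed calls.
type fakeDB struct {
	calls     int
	rewriteOn map[int]bool
}

func (f *fakeDB) RunValueLogGC(discardRatio float64) error {
	f.calls++
	if f.rewriteOn[f.calls] {
		return nil
	}
	return errNoRewrite
}

func main() {
	db := &fakeDB{rewriteOn: map[int]bool{3: true, 4: true}}
	fmt.Println(runGCAttempts(db, 0.5, 5)) // prints 2
}
```

With a real database the same helper would take the opened `*badger.DB` in place of `fakeDB`; the number of attempts is a guess to tune, not a value from this thread.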
No, those counts I reported were after running the 466577 with all versions |
In live mode, the txn oracle maintains a read watermark, which indicates to the compactor a version below which keys are OK to drop on the floor. I realized in the offline mode ( I've pushed a fix here: ac321a8 . If you run If you're working on the same dir, you could do a |
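To make the read watermark idea concrete, here is a simplified model of which versions of a single key a compaction could drop. It is only a sketch under stated assumptions: `droppable` is a hypothetical function, not Badger's compaction code, and it assumes that versions above the mark are always kept while only the newest `numVersionsToKeep` versions at or below the mark survive.

```go
package main

import "fmt"

// droppable returns the versions that this simplified model would discard.
// versions must be sorted newest first. Versions above readMark may still be
// visible to a reader, so they are always kept; at or below readMark only
// the newest numVersionsToKeep versions are kept.
func droppable(versions []uint64, readMark uint64, numVersionsToKeep int) []uint64 {
	var drop []uint64
	kept := 0
	for _, v := range versions {
		if v > readMark {
			continue // possibly still readable, keep
		}
		if kept < numVersionsToKeep {
			kept++
			continue
		}
		drop = append(drop, v)
	}
	return drop
}

func main() {
	versions := []uint64{90, 70, 50, 30, 10}
	fmt.Println(droppable(versions, 60, 1)) // prints [30 10]
	fmt.Println(droppable(versions, 0, 1))  // prints []
}
```

The second call shows the degenerate case of the model: if the mark never advances, nothing is eligible to be dropped no matter how many old versions exist.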
I rebuilt all the binaries, but kept working on the same db. Even the backup/restore/flatten steps did not result in less disk space (still ~66GB). I do see somewhat fewer keys when visiting all versions now: 357880 (was 466577 previously). I think I'm going to put my effort into getting the main application to call |
OK, we have now updated the main application to perform value log GC in a manner similar to what is suggested in the README:
By default this will wake up every 30 seconds, try RunValueLogGC, and keep running it as long as some progress was made. Second, we've configured the db with much smaller values for some of the key options you mentioned:
Also, we're currently using a discardRatio of After the test case is performed, we saw the data size was still quite high:
Monitoring the logs, we see that ValueLogGC was run 46 times, and only 3 times did it report that it did work (return nil); in all those cases the subsequent call returned no rewrite. Next, we stopped the application (which will not perform a clean close). We then ran
Out of curiosity, I then ran
Then
Out of curiosity I checked the number of keys I see with and without the all keys option:
It seems like with these options we do see fewer versions of keys, but overall size still leads us to believe there is significant space used for old versions of values. Are there any other diagnostics that could be run to gain insight into the space used in the value log? |
I've modified the utility I was using to count keys and it now also counts up the key and value sizes:
This seems to support our belief that the live value data is closer to 1GB (680MB counted here). We of course understand the actual size will be larger, but 62-66GB is pretty far from that. |
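The accounting described above, comparing the bytes held by the newest version of each key against the bytes held by all versions, could look roughly like this. It is a sketch over an in-memory list rather than the actual utility: the `entry` type and `sizeByLiveness` are hypothetical, and a real tool would fill the list by iterating the database with all versions enabled.

```go
package main

import "fmt"

// entry is a hypothetical record of one stored version of a key.
type entry struct {
	key       string
	version   uint64
	valueSize int64
}

// sizeByLiveness returns the value bytes held by the newest version of each
// key (live) and the value bytes held by every version (total).
func sizeByLiveness(entries []entry) (live, total int64) {
	newest := make(map[string]entry)
	for _, e := range entries {
		total += e.valueSize
		if cur, ok := newest[e.key]; !ok || e.version > cur.version {
			newest[e.key] = e
		}
	}
	for _, e := range newest {
		live += e.valueSize
	}
	return live, total
}

func main() {
	entries := []entry{
		{"a", 1, 100}, {"a", 2, 120}, {"a", 3, 110},
		{"b", 1, 50},
	}
	live, total := sizeByLiveness(entries)
	fmt.Printf("live=%d total=%d stale=%d\n", live, total, total-live)
	// prints: live=160 total=380 stale=220
}
```

Note that this only counts versions the iterator can still see. Values left in the value log for keys that are no longer visible would not show up in either number.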
I've added some logging to try and understand why badger chooses not to rewrite the files, even though we suspect they are mostly garbage. These runs again use the offline compaction tool, so selection of a file is random, but that's OK for now. I believe the decision not to rewrite happens here: https://github.com/dgraph-io/badger/blob/master/value.go#L1253-L1255 The print from the line just before shows (formatting of message slightly different than the trace):
My read of this is that discard is almost the entire total, but this if condition is triggered because r.total is 679.something and yet sizeWindow*0.75 is 80583902.25. The comments say:
It seems to me that in our case, these preconditions of 1000 kv pairs, or 75% of the window size aren't met. Is it possible these are just arbitrary and happen to not work in our use case? |
That's right. The idea was to have a decent sample size. Considering a typical value log would have a million key-value pairs ( I think we could make the sampling sizes configurable via |
Thanks @manishrjain I can take a look at making it configurable. However, I have one more concern from reading the code. The comparison made here: https://github.com/dgraph-io/badger/blob/master/value.go#L1253
It seems to me this may be incorrectly comparing different units. The And entry sizes are converted to and tracked in MB: https://github.com/dgraph-io/badger/blob/master/value.go#L1177 But the comparison is made against a percentage of sizeWindow, and sizeWindow is in bytes (10% of the value log file size): https://github.com/dgraph-io/badger/blob/master/value.go#L1159 Am I misreading this? |
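The suspected unit mismatch can be illustrated with the numbers quoted earlier in the thread. This is a simplified model, not the code in value.go: both function names are hypothetical, the sampled total is rounded to 679 MB, 1 MB is assumed to be 1<<20 bytes, and the separate key-count condition is left out.

```go
package main

import "fmt"

const mb = 1 << 20 // assumed bytes per MB

// sampleTooSmallMixed compares a total tracked in MB against a window in
// bytes, which is the comparison suspected above.
func sampleTooSmallMixed(totalMB, sizeWindowBytes float64) bool {
	return totalMB < sizeWindowBytes*0.75
}

// sampleTooSmallConsistent converts the window to MB before comparing.
func sampleTooSmallConsistent(totalMB, sizeWindowBytes float64) bool {
	return totalMB < (sizeWindowBytes/mb)*0.75
}

func main() {
	totalMB := 679.0          // "679.something" from the log output, rounded
	sizeWindow := 107445203.0 // bytes; 0.75 of this is 80583902.25

	fmt.Println(sampleTooSmallMixed(totalMB, sizeWindow))      // prints true
	fmt.Println(sampleTooSmallConsistent(totalMB, sizeWindow)) // prints false
}
```

With mixed units the sample always looks too small, since a window of roughly 100 MB expressed in bytes dwarfs any total expressed in MB. With consistent units, 679 MB is well above 75% of the window, so this part of the check would no longer block the rewrite.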
I've proposed a fix for what I think is the problem here: #724 At least in offline compaction, it works much better for us with this applied. Testing online behavior now. |
So here is our update attempting to perform online value log GC in our application: We're running with this patch applied #725 Our BadgerDB options are:
We invoke What we observe is that while our workload is running, which we know creates/updates large values, the database size still grows as high as 42GB. That is the peak; even as the workload is nearly finished, we still see it staying in the 15-20GB range. Only after all mutations have stopped does it eventually come back to the ~1.4GB range. It feels like we're aggressively trying to do GC, and it can't keep up. We could try calling it even more frequently, but it seems this will ultimately be too expensive to run all the time without any more signal to go on. I have been looking to see if there is some way we can try to invoke
But, how does knowing the size help without knowing the discard stats? Do you have any other suggestions for things to tune? |
The idea for We also could persist discard stats, so offline mechanism can also be better informed instead of shooting in the dark. I don't see much benefit in exposing discard stats to the user unless we have a way by which the user tells Badger which log file to attempt to GC -- I don't see obvious benefits there. |
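One way size information alone could drive GC, without discard stats, is to trigger a GC pass only after the value log has grown by some amount since the last pass. The following is a sketch of that idea and an assumption about what is being discussed here, not a description of Badger's behavior: `sizedGCer`, `growthTrigger`, `maybeGC`, and `fakeSizedDB` are hypothetical, and the only assumptions about Badger are that `*badger.DB` has `Size() (lsm, vlog int64)` and `RunValueLogGC(discardRatio float64) error`.

```go
package main

import (
	"errors"
	"fmt"
)

// sizedGCer lists the two methods this sketch relies on.
type sizedGCer interface {
	Size() (lsm, vlog int64)
	RunValueLogGC(discardRatio float64) error
}

// growthTrigger remembers the value log size after the last GC pass.
type growthTrigger struct {
	lastVlog  int64
	minGrowth int64 // bytes of growth required before trying GC again
}

// maybeGC runs GC until it stops making progress, but only if the value log
// grew by at least minGrowth since the last pass. It returns the number of
// rewrites performed.
func (g *growthTrigger) maybeGC(db sizedGCer, discardRatio float64) int {
	_, vlog := db.Size()
	if vlog-g.lastVlog < g.minGrowth {
		return 0
	}
	rewrites := 0
	for db.RunValueLogGC(discardRatio) == nil {
		rewrites++
	}
	_, g.lastVlog = db.Size()
	return rewrites
}

// fakeSizedDB is a test double: each successful GC reclaims one file.
type fakeSizedDB struct {
	vlog         int64
	garbageFiles int
	fileSize     int64
}

func (f *fakeSizedDB) Size() (int64, int64) { return 0, f.vlog }

func (f *fakeSizedDB) RunValueLogGC(discardRatio float64) error {
	if f.garbageFiles == 0 {
		return errors.New("no rewrite")
	}
	f.garbageFiles--
	f.vlog -= f.fileSize
	return nil
}

func main() {
	db := &fakeSizedDB{vlog: 500, garbageFiles: 2, fileSize: 1000}
	trig := &growthTrigger{minGrowth: 1000}
	fmt.Println(trig.maybeGC(db, 0.5)) // prints 0
	db.vlog = 3000
	fmt.Println(trig.maybeGC(db, 0.5)) // prints 2
	fmt.Println(trig.lastVlog)         // prints 1000
}
```

Growth is only a proxy for garbage, so the threshold would need tuning against the workload. It does avoid paying for GC attempts while nothing has been written.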
Thanks again, setting And I can now see how using Have you had a chance to review #725? |
Great. Didn't realize you addressed the comments. |
Is there anything else needed here? |
I think we're OK for now. Thanks for your help. |
@manishrjain @mschoch Hello, I'm running into the same problem as mschoch.
// badgerGc wakes up once a minute and runs value log GC, repeating
// immediately for as long as a call reports a rewrite (nil error).
func (db *BadgerDB) badgerGc() {
	ticker := time.NewTicker(1 * time.Minute)
	defer ticker.Stop()
	for range ticker.C {
	again:
		err := db.db.RunValueLogGC(0.5)
		if err == nil {
			goto again
		}
	}
}

// NewBadgerDB opens a database under dir/name.db with the default options
// and starts the background GC goroutine.
func NewBadgerDB(name string, dir string) (*BadgerDB, error) {
	dbPath := filepath.Join(dir, name+".db")
	badgerOpts := badger.DefaultOptions
	badgerOpts.Dir = dbPath
	badgerOpts.ValueDir = dbPath
	db, err := badger.Open(badgerOpts)
	if err != nil {
		return nil, err
	}
	database := &BadgerDB{
		db: db,
	}
	go database.badgerGc()
	return database, nil
}
Based on commit deee8c7 |
We've been attempting to migrate an application from BoltDB to BadgerDB. One of the problems we're seeing is that the Badger database is considerably larger (66GB) than the corresponding Bolt database (1GB).
We think the size discrepancy comes from keys being updated repeatedly while the older values stay around.
The application that creates the database is using the default options, with the following changes:
Our understanding is that the default options set NumVersionsToKeep to 1 and that is the appropriate setting to indicate we don't want to keep older versions.
At the moment, our application is not calling RunValueLogGC because we had assumed the database was going to do this for us.
Here is what badger info reports:
Next, I'm using the following command-line utility to attempt to perform value log GC, to reclaim space:
When I run this program (with discard ratio 0.01) I get:
It took about 5 minutes to open, because it wasn't closed properly. That is unfortunate, but not the topic I wish to examine in this issue. Badger has reported that no rewrite was needed, so unsurprisingly, no space was reclaimed.
I understand there is statistical sampling, and possibly random selection of files, so I've tried running it again:
This time I see an error message about compaction unable to fill tables for level 0. I don't know if that is significant or not.
I tried running it 3 more times:
Still no space was reclaimed:
I thought perhaps I've misunderstood the discardRatio, so I've tried another extreme (0.99):
Still, no space was reclaimed:
At this point, it's fair to ask, are we sure there is any space that can be reclaimed? I wrote another program to try and iterate all versions, and see if I could confirm that Badger still sees older versions. Here is the program:
Here is some sampled output, which I interpret to mean multiple versions are still around, even after we've attempted to get rid of them:
This key is an extreme case but almost all the keys show multiple versions. It's not obvious how much space this is taking, but surely it is some.
So at this point we're not really sure what to do. We think our program probably should be calling RunValueLogGC, but our attempts to understand how it works in the command-line utility haven't shown anything useful.
Are we using RunValueLogGC incorrectly?
Are we using incorrect options for our use-case?
Are there any additional diagnostics we can run to gain more insight?
cc @aviadl