Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

large database size #718

Closed
mschoch opened this issue Feb 21, 2019 · 19 comments
Closed

large database size #718

mschoch opened this issue Feb 21, 2019 · 19 comments
Labels
kind/question Something requiring a response

Comments

@mschoch
Copy link
Contributor

mschoch commented Feb 21, 2019

We've been attempting to migrate an application from BoltDB to BadgerDB. One of the problems we're seeing is that the Badger database is considerably larger (66GB) than the corresponding Bolt database (1GB).

$ du -sh marty.db/
66G	marty.db/

Our thinking is that reason for the size discrepancy is related to keys being updated repeatedly and the older values staying around.

The application that creates the database is using the default options, with the following changes:

opt.TableLoadingMode = options.MemoryMap
opt.CompactL0OnClose = false

Our understanding is that the default options set NumVersionsToKeep to 1 and that is the appropriate setting to indicate we don't want to keep older versions.

At the moment, our application is not calling RunValueLogGC because we had assumed the database was going to do this for us.

Here is what badger info reports:

$ badger info --dir marty.db/
Listening for /debug HTTP requests at port: 8080

[     2019-02-20T19:18:21Z] MANIFEST       16 B MA
[          4 seconds later] 000000.vlog  1.1 GB VL
[          9 seconds later] 000001.vlog  1.1 GB VL
[         17 seconds later] 000002.vlog  1.1 GB VL
[         24 seconds later] 000003.vlog  1.1 GB VL
[         29 seconds later] 000004.vlog  1.1 GB VL
[         39 seconds later] 000005.vlog  1.1 GB VL
[         47 seconds later] 000006.vlog  1.1 GB VL
[         55 seconds later] 000007.vlog  1.1 GB VL
[           1 minute later] 000008.vlog  1.1 GB VL
[           1 minute later] 000009.vlog  1.1 GB VL
[           1 minute later] 000010.vlog  1.1 GB VL
[           1 minute later] 000011.vlog  1.1 GB VL
[           1 minute later] 000012.vlog  1.1 GB VL
[           1 minute later] 000013.vlog  1.1 GB VL
[           1 minute later] 000014.vlog  1.1 GB VL
[          2 minutes later] 000015.vlog  1.1 GB VL
[          2 minutes later] 000016.vlog  1.1 GB VL
[          2 minutes later] 000017.vlog  1.1 GB VL
[          2 minutes later] 000018.vlog  1.1 GB VL
[          2 minutes later] 000019.vlog  1.1 GB VL
[          2 minutes later] 000020.vlog  1.1 GB VL
[          2 minutes later] 000021.vlog  1.1 GB VL
[          3 minutes later] 000022.vlog  1.1 GB VL
[          3 minutes later] 000023.vlog  1.1 GB VL
[          3 minutes later] 000024.vlog  1.1 GB VL
[          3 minutes later] 000025.vlog  1.1 GB VL
[          3 minutes later] 000026.vlog  1.1 GB VL
[          3 minutes later] 000027.vlog  1.1 GB VL
[          3 minutes later] 000028.vlog  1.1 GB VL
[          3 minutes later] 000029.vlog  1.1 GB VL
[          4 minutes later] 000030.vlog  1.1 GB VL
[          4 minutes later] 000031.vlog  1.1 GB VL
[          4 minutes later] 000032.vlog  1.1 GB VL
[          4 minutes later] 000033.vlog  1.1 GB VL
[          4 minutes later] 000034.vlog  1.1 GB VL
[          4 minutes later] 000035.vlog  1.1 GB VL
[          4 minutes later] 000036.vlog  1.1 GB VL
[          5 minutes later] 000037.vlog  1.1 GB VL
[          5 minutes later] 000038.vlog  1.1 GB VL
[          5 minutes later] 000039.vlog  1.1 GB VL
[          5 minutes later] 000040.vlog  1.1 GB VL
[          5 minutes later] 000041.vlog  1.1 GB VL
[          5 minutes later] 000042.vlog  1.1 GB VL
[          5 minutes later] 000043.vlog  1.1 GB VL
[          5 minutes later] 000044.vlog  1.1 GB VL
[          6 minutes later] 000045.vlog  1.1 GB VL
[          6 minutes later] 000046.vlog  1.1 GB VL
[          6 minutes later] 000047.vlog  1.1 GB VL
[          6 minutes later] 000048.vlog  1.1 GB VL
[          6 minutes later] 000049.vlog  1.1 GB VL
[          6 minutes later] 000050.vlog  1.1 GB VL
[          6 minutes later] 000051.vlog  1.1 GB VL
[          7 minutes later] 000052.vlog  1.1 GB VL
[          7 minutes later] 000053.vlog  1.1 GB VL
[          7 minutes later] 000054.vlog  1.1 GB VL
[          7 minutes later] 000055.vlog  1.1 GB VL
[          7 minutes later] 000056.vlog  1.1 GB VL
[          7 minutes later] 000057.vlog  1.1 GB VL
[          7 minutes later] 000058.vlog  1.1 GB VL
[          7 minutes later] 000059.vlog  1.1 GB VL
[          8 minutes later] 000060.vlog  1.1 GB VL
[          8 minutes later] 000061.vlog  1.1 GB VL
[          8 minutes later] 000062.vlog  1.1 GB VL
[          8 minutes later] 000063.vlog  1.1 GB VL
[          8 minutes later] 000064.vlog  1.1 GB VL
[          8 minutes later] 000065.vlog  396 MB VL

[EXTRA]
[2019-02-20T19:18:21Z] LOCK            6 B

[Summary]
Total index size:      0 B
Value log size:      70 GB

Abnormalities:
1 extra file.
0 missing files.
0 empty files.
0 truncated manifests.

Next, I'm using the following command-line utility to attempt to perform value log GC, to reclaim space:

package main

import (
	"flag"
	"log"

	"github.com/dgraph-io/badger"
)

var percent = flag.Float64("percent", 0.5, "rewrite value log if it will reclaim this percent of space")

func main() {

	flag.Parse()

	if flag.NArg() < 1 {
		log.Fatalf("must specify path to badger")
	}

	opt := badger.DefaultOptions
	opt.Dir = flag.Arg(0)
	opt.ValueDir = flag.Arg(0)

	db, err := badger.Open(opt)
	if err != nil {
		log.Fatalf("error opening badger: %v", err)
	}
	defer db.Close()

	count := 0
	for err == nil {
		log.Printf("starting value log gc")
		count++
		err = db.RunValueLogGC(*percent)
	}
	if err == badger.ErrNoRewrite {
		log.Printf("no rewrite needed")
	} else if err != nil {
		log.Fatalf("error running value log gc: %v", err)
	}
	log.Printf("completed gc, ran %d times", count)
}

When I run this program (with discard ratio 0.01) I get:

$ badger-compact --percent 0.01 marty.db/
badger 2019/02/20 19:39:07 INFO: Replaying file id: 0 at offset: 0
badger 2019/02/20 19:39:11 INFO: Replay took: 4.811759095s
badger 2019/02/20 19:39:11 INFO: Replaying file id: 1 at offset: 0
badger 2019/02/20 19:39:16 INFO: Replay took: 4.751968237s
badger 2019/02/20 19:39:16 INFO: Replaying file id: 2 at offset: 0
badger 2019/02/20 19:39:21 INFO: Replay took: 4.971052698s
badger 2019/02/20 19:39:21 INFO: Replaying file id: 3 at offset: 0
badger 2019/02/20 19:39:26 INFO: Replay took: 4.738904681s
badger 2019/02/20 19:39:26 INFO: Replaying file id: 4 at offset: 0
badger 2019/02/20 19:39:31 INFO: Replay took: 4.724747329s
badger 2019/02/20 19:39:31 INFO: Replaying file id: 5 at offset: 0
badger 2019/02/20 19:39:35 INFO: Replay took: 4.709767101s
badger 2019/02/20 19:39:35 INFO: Replaying file id: 6 at offset: 0
badger 2019/02/20 19:39:40 INFO: Replay took: 4.611179924s
badger 2019/02/20 19:39:40 INFO: Replaying file id: 7 at offset: 0
badger 2019/02/20 19:39:45 INFO: Replay took: 4.713674705s
badger 2019/02/20 19:39:45 INFO: Replaying file id: 8 at offset: 0
badger 2019/02/20 19:39:49 INFO: Replay took: 4.62619549s
badger 2019/02/20 19:39:49 INFO: Replaying file id: 9 at offset: 0
badger 2019/02/20 19:39:54 INFO: Replay took: 4.531565502s
badger 2019/02/20 19:39:54 INFO: Replaying file id: 10 at offset: 0
badger 2019/02/20 19:39:58 INFO: Replay took: 4.561584566s
badger 2019/02/20 19:39:58 INFO: Replaying file id: 11 at offset: 0
badger 2019/02/20 19:40:03 INFO: Replay took: 4.760297334s
badger 2019/02/20 19:40:03 INFO: Replaying file id: 12 at offset: 0
badger 2019/02/20 19:40:08 INFO: Replay took: 4.824707787s
badger 2019/02/20 19:40:08 INFO: Replaying file id: 13 at offset: 0
badger 2019/02/20 19:40:13 INFO: Replay took: 4.681027089s
badger 2019/02/20 19:40:13 INFO: Replaying file id: 14 at offset: 0
badger 2019/02/20 19:40:17 INFO: Replay took: 4.658125077s
badger 2019/02/20 19:40:17 INFO: Replaying file id: 15 at offset: 0
badger 2019/02/20 19:40:22 INFO: Replay took: 4.693305358s
badger 2019/02/20 19:40:22 INFO: Replaying file id: 16 at offset: 0
badger 2019/02/20 19:40:27 INFO: Replay took: 4.740523799s
badger 2019/02/20 19:40:27 INFO: Replaying file id: 17 at offset: 0
badger 2019/02/20 19:40:31 INFO: Replay took: 4.82007802s
badger 2019/02/20 19:40:31 INFO: Replaying file id: 18 at offset: 0
badger 2019/02/20 19:40:36 INFO: Replay took: 4.638329175s
badger 2019/02/20 19:40:36 INFO: Replaying file id: 19 at offset: 0
badger 2019/02/20 19:40:41 INFO: Replay took: 4.61783827s
badger 2019/02/20 19:40:41 INFO: Replaying file id: 20 at offset: 0
badger 2019/02/20 19:40:45 INFO: Replay took: 4.544613945s
badger 2019/02/20 19:40:45 INFO: Replaying file id: 21 at offset: 0
badger 2019/02/20 19:40:50 INFO: Replay took: 4.81666902s
badger 2019/02/20 19:40:50 INFO: Replaying file id: 22 at offset: 0
badger 2019/02/20 19:40:55 INFO: Replay took: 4.724866998s
badger 2019/02/20 19:40:55 INFO: Replaying file id: 23 at offset: 0
badger 2019/02/20 19:41:00 INFO: Replay took: 4.746947874s
badger 2019/02/20 19:41:00 INFO: Replaying file id: 24 at offset: 0
badger 2019/02/20 19:41:04 INFO: Replay took: 4.744179446s
badger 2019/02/20 19:41:04 INFO: Replaying file id: 25 at offset: 0
badger 2019/02/20 19:41:09 INFO: Replay took: 4.678761441s
badger 2019/02/20 19:41:09 INFO: Replaying file id: 26 at offset: 0
badger 2019/02/20 19:41:14 INFO: Replay took: 4.71652828s
badger 2019/02/20 19:41:14 INFO: Replaying file id: 27 at offset: 0
badger 2019/02/20 19:41:18 INFO: Replay took: 4.553530187s
badger 2019/02/20 19:41:18 INFO: Replaying file id: 28 at offset: 0
badger 2019/02/20 19:41:23 INFO: Replay took: 4.791599527s
badger 2019/02/20 19:41:23 INFO: Replaying file id: 29 at offset: 0
badger 2019/02/20 19:41:28 INFO: Replay took: 4.752528408s
badger 2019/02/20 19:41:28 INFO: Replaying file id: 30 at offset: 0
badger 2019/02/20 19:41:33 INFO: Replay took: 4.866835742s
badger 2019/02/20 19:41:33 INFO: Replaying file id: 31 at offset: 0
badger 2019/02/20 19:41:38 INFO: Replay took: 4.880718253s
badger 2019/02/20 19:41:38 INFO: Replaying file id: 32 at offset: 0
badger 2019/02/20 19:41:42 INFO: Replay took: 4.713830371s
badger 2019/02/20 19:41:42 INFO: Replaying file id: 33 at offset: 0
badger 2019/02/20 19:41:47 INFO: Replay took: 4.782097099s
badger 2019/02/20 19:41:47 INFO: Replaying file id: 34 at offset: 0
badger 2019/02/20 19:41:52 INFO: Replay took: 4.748420075s
badger 2019/02/20 19:41:52 INFO: Replaying file id: 35 at offset: 0
badger 2019/02/20 19:41:57 INFO: Replay took: 4.847798785s
badger 2019/02/20 19:41:57 INFO: Replaying file id: 36 at offset: 0
badger 2019/02/20 19:42:02 INFO: Replay took: 4.905576136s
badger 2019/02/20 19:42:02 INFO: Replaying file id: 37 at offset: 0
badger 2019/02/20 19:42:06 INFO: Replay took: 4.721499828s
badger 2019/02/20 19:42:06 INFO: Replaying file id: 38 at offset: 0
badger 2019/02/20 19:42:11 INFO: Replay took: 4.613040236s
badger 2019/02/20 19:42:11 INFO: Replaying file id: 39 at offset: 0
badger 2019/02/20 19:42:16 INFO: Replay took: 4.662545983s
badger 2019/02/20 19:42:16 INFO: Replaying file id: 40 at offset: 0
badger 2019/02/20 19:42:20 INFO: Replay took: 4.79309651s
badger 2019/02/20 19:42:20 INFO: Replaying file id: 41 at offset: 0
badger 2019/02/20 19:42:25 INFO: Replay took: 4.75567823s
badger 2019/02/20 19:42:25 INFO: Replaying file id: 42 at offset: 0
badger 2019/02/20 19:42:30 INFO: Replay took: 4.742469083s
badger 2019/02/20 19:42:30 INFO: Replaying file id: 43 at offset: 0
badger 2019/02/20 19:42:35 INFO: Replay took: 4.869502605s
badger 2019/02/20 19:42:35 INFO: Replaying file id: 44 at offset: 0
badger 2019/02/20 19:42:40 INFO: Replay took: 4.847217883s
badger 2019/02/20 19:42:40 INFO: Replaying file id: 45 at offset: 0
badger 2019/02/20 19:42:44 INFO: Replay took: 4.755110941s
badger 2019/02/20 19:42:44 INFO: Replaying file id: 46 at offset: 0
badger 2019/02/20 19:42:49 INFO: Replay took: 4.656211342s
badger 2019/02/20 19:42:49 INFO: Replaying file id: 47 at offset: 0
badger 2019/02/20 19:42:54 INFO: Replay took: 4.78607578s
badger 2019/02/20 19:42:54 INFO: Replaying file id: 48 at offset: 0
badger 2019/02/20 19:42:59 INFO: Replay took: 4.801395748s
badger 2019/02/20 19:42:59 INFO: Replaying file id: 49 at offset: 0
badger 2019/02/20 19:43:03 INFO: Replay took: 4.798918687s
badger 2019/02/20 19:43:03 INFO: Replaying file id: 50 at offset: 0
badger 2019/02/20 19:43:08 INFO: Replay took: 4.918489831s
badger 2019/02/20 19:43:08 INFO: Replaying file id: 51 at offset: 0
badger 2019/02/20 19:43:13 INFO: Replay took: 4.829245575s
badger 2019/02/20 19:43:13 INFO: Replaying file id: 52 at offset: 0
badger 2019/02/20 19:43:18 INFO: Replay took: 4.78955741s
badger 2019/02/20 19:43:18 INFO: Replaying file id: 53 at offset: 0
badger 2019/02/20 19:43:23 INFO: Replay took: 4.764601116s
badger 2019/02/20 19:43:23 INFO: Replaying file id: 54 at offset: 0
badger 2019/02/20 19:43:28 INFO: Replay took: 4.911577468s
badger 2019/02/20 19:43:28 INFO: Replaying file id: 55 at offset: 0
badger 2019/02/20 19:43:32 INFO: Replay took: 4.884888662s
badger 2019/02/20 19:43:32 INFO: Replaying file id: 56 at offset: 0
badger 2019/02/20 19:43:37 INFO: Replay took: 4.681730289s
badger 2019/02/20 19:43:37 INFO: Replaying file id: 57 at offset: 0
badger 2019/02/20 19:43:42 INFO: Replay took: 4.806671029s
badger 2019/02/20 19:43:42 INFO: Replaying file id: 58 at offset: 0
badger 2019/02/20 19:43:47 INFO: Replay took: 4.756638293s
badger 2019/02/20 19:43:47 INFO: Replaying file id: 59 at offset: 0
badger 2019/02/20 19:43:52 INFO: Replay took: 5.050781364s
badger 2019/02/20 19:43:52 INFO: Replaying file id: 60 at offset: 0
badger 2019/02/20 19:43:56 INFO: Replay took: 4.628812071s
badger 2019/02/20 19:43:56 INFO: Replaying file id: 61 at offset: 0
badger 2019/02/20 19:44:01 INFO: Replay took: 4.858133595s
badger 2019/02/20 19:44:01 INFO: Replaying file id: 62 at offset: 0
badger 2019/02/20 19:44:06 INFO: Replay took: 4.796919555s
badger 2019/02/20 19:44:06 INFO: Replaying file id: 63 at offset: 0
badger 2019/02/20 19:44:11 INFO: Replay took: 4.774836434s
badger 2019/02/20 19:44:11 INFO: Replaying file id: 64 at offset: 0
badger 2019/02/20 19:44:16 INFO: Replay took: 4.776516108s
badger 2019/02/20 19:44:16 INFO: Replaying file id: 65 at offset: 0
badger 2019/02/20 19:44:17 INFO: Replay took: 1.839273458s
2019/02/20 19:44:17 starting value log gc
2019/02/20 19:44:17 no rewrite needed
2019/02/20 19:44:17 completed gc, ran 1 times
badger 2019/02/20 19:44:17 INFO: Storing value log head: {Fid:65 Len:0 Offset:395888291}
badger 2019/02/20 19:44:18 INFO: Force compaction on level 0 done

It took about 5 minutes to open, because it wasn't closed properly. That is unfortunate, but not the topic I wish to examine in this issue. Badger has reported that no rewrite was needed, so unsurprisingly, no space was reclaimed.

$ du -sh marty.db/
66G	marty.db/

I understand there is statistical sampling, and possibly random selection of files, so I've tried running it again:

$ badger-compact --percent 0.01 marty.db/
badger 2019/02/20 19:47:41 INFO: Replaying file id: 65 at offset: 395888291
badger 2019/02/20 19:47:41 INFO: Replay took: 6.162µs
2019/02/20 19:47:41 starting value log gc
2019/02/20 19:47:46 no rewrite needed
2019/02/20 19:47:46 completed gc, ran 1 times
badger 2019/02/20 19:47:46 WARNING: Error while forcing compaction on level 0: Unable to fill tables for level: 0

This time I see an error message about compaction unable to fill tables for level 0. I don't know if that is significant or not.

I tried running it 3 more times:

$ badger-compact --percent 0.01 marty.db/
badger 2019/02/20 19:49:14 INFO: Replaying file id: 65 at offset: 395888291
badger 2019/02/20 19:49:14 INFO: Replay took: 6.702µs
2019/02/20 19:49:14 starting value log gc
2019/02/20 19:49:18 no rewrite needed
2019/02/20 19:49:18 completed gc, ran 1 times
badger 2019/02/20 19:49:18 WARNING: Error while forcing compaction on level 0: Unable to fill tables for level: 0
$ badger-compact --percent 0.01 marty.db/
badger 2019/02/20 19:49:26 INFO: Replaying file id: 65 at offset: 395888291
badger 2019/02/20 19:49:26 INFO: Replay took: 6.283µs
2019/02/20 19:49:26 starting value log gc
2019/02/20 19:49:31 no rewrite needed
2019/02/20 19:49:31 completed gc, ran 1 times
badger 2019/02/20 19:49:31 WARNING: Error while forcing compaction on level 0: Unable to fill tables for level: 0
$ badger-compact --percent 0.01 marty.db/
badger 2019/02/20 19:49:34 INFO: Replaying file id: 65 at offset: 395888291
badger 2019/02/20 19:49:34 INFO: Replay took: 9.307µs
2019/02/20 19:49:34 starting value log gc
2019/02/20 19:49:39 no rewrite needed
2019/02/20 19:49:39 completed gc, ran 1 times
badger 2019/02/20 19:49:39 WARNING: Error while forcing compaction on level 0: Unable to fill tables for level: 0

Still no space was reclaimed:

$ du -sh marty.db/
66G	marty.db/

I thought, perhaps I've misunderstood the discardRatio, so I've tried another extreme (0.99):

$ badger-compact --percent 0.99 marty.db/
badger 2019/02/20 19:51:26 INFO: Replaying file id: 65 at offset: 395888291
badger 2019/02/20 19:51:26 INFO: Replay took: 6.266µs
2019/02/20 19:51:26 starting value log gc
2019/02/20 19:51:31 no rewrite needed
2019/02/20 19:51:31 completed gc, ran 1 times
badger 2019/02/20 19:51:31 WARNING: Error while forcing compaction on level 0: Unable to fill tables for level: 0
$ badger-compact --percent 0.99 marty.db/
badger 2019/02/20 19:51:34 INFO: Replaying file id: 65 at offset: 395888291
badger 2019/02/20 19:51:34 INFO: Replay took: 5.9µs
2019/02/20 19:51:34 starting value log gc
2019/02/20 19:51:39 no rewrite needed
2019/02/20 19:51:39 completed gc, ran 1 times
badger 2019/02/20 19:51:39 WARNING: Error while forcing compaction on level 0: Unable to fill tables for level: 0
$ badger-compact --percent 0.99 marty.db/
badger 2019/02/20 19:51:40 INFO: Replaying file id: 65 at offset: 395888291
badger 2019/02/20 19:51:40 INFO: Replay took: 8.781µs
2019/02/20 19:51:40 starting value log gc
2019/02/20 19:51:40 no rewrite needed
2019/02/20 19:51:40 completed gc, ran 1 times
badger 2019/02/20 19:51:40 WARNING: Error while forcing compaction on level 0: Unable to fill tables for level: 0

Still, no space was reclaimed:

# du -sh marty.db/
66G	marty.db/

At this point, it's fair to ask, are we sure there is any space that can be reclaimed? I wrote another program to try and iterate all versions, and see if I could confirm that Badger still sees older versions. Here is the program:

package main

import (
	"flag"
	"fmt"
	"log"

	"github.com/dgraph-io/badger"
)

var rw = flag.Bool("rw", false, "force open the database read-write")

func main() {

	flag.Parse()

	if flag.NArg() < 1 {
		log.Fatalf("must specify path")
	}

	opt := badger.DefaultOptions
	opt.ReadOnly = !*rw
	opt.Dir = flag.Arg(0)
	opt.ValueDir = flag.Arg(0)

	db, err := badger.Open(opt)
	if err != nil {
		log.Fatalf("error opening db: %v", err)
	}

	db.View(func(txn *badger.Txn) error {
		iopt := badger.DefaultIteratorOptions
		iopt.AllVersions = true
		iopt.PrefetchValues = false
		itr := txn.NewIterator(iopt)
		defer itr.Close()
		for itr.Rewind(); itr.Valid(); itr.Next() {
			item := itr.Item()
			k := item.Key()

			fmt.Printf("key=%s, ver=%d\n", k, item.Version())

			if err != nil {
				return err
			}
		}
		return nil
	})
}

Here is some sampled output, which I interpret to mean multiple versions are still around, even after we've attempted to get rid of it:

key=^@domain-159^@, ver=177598
key=^@domain-159^@, ver=177488
key=^@domain-159^@, ver=177178
key=^@domain-159^@, ver=176958
key=^@domain-159^@, ver=176844
key=^@domain-159^@, ver=176662
key=^@domain-159^@, ver=176358
key=^@domain-159^@, ver=176294
key=^@domain-159^@, ver=176050
key=^@domain-159^@, ver=175786
key=^@domain-159^@, ver=175524
key=^@domain-159^@, ver=175386
key=^@domain-159^@, ver=175118
key=^@domain-159^@, ver=175018
key=^@domain-159^@, ver=174854
key=^@domain-159^@, ver=174600
key=^@domain-159^@, ver=174492
key=^@domain-159^@, ver=174192
key=^@domain-159^@, ver=174032
key=^@domain-159^@, ver=173824
key=^@domain-159^@, ver=173566
key=^@domain-159^@, ver=173444
key=^@domain-159^@, ver=173114
key=^@domain-159^@, ver=173004
key=^@domain-159^@, ver=172894
key=^@domain-159^@, ver=172626
key=^@domain-159^@, ver=172328
key=^@domain-159^@, ver=172222
key=^@domain-159^@, ver=172078
key=^@domain-159^@, ver=171744
key=^@domain-159^@, ver=171588
key=^@domain-159^@, ver=171336
key=^@domain-159^@, ver=171288
key=^@domain-159^@, ver=171036
key=^@domain-159^@, ver=170884
key=^@domain-159^@, ver=170504

This key is an extreme case but almost all the keys show multiple versions. It's not obvious how much space this is taking, but surely it is some.

So at this point we're not really sure what to do. We think our program probably should be calling RunValueLogGC, but our attempts to understand how it works in the command-line utility haven't shown anything useful.

Are we using RunValueLogGC incorrectly?
Are we using incorrect options for our use-case?
Are there any additional diagnostics we can run to gain more insight?

cc @aviadl

@manishrjain
Copy link
Contributor

manishrjain commented Feb 21, 2019

Hi @mschoch ,

Looking at the summary from badger info:

[Summary]
Total index size:      0 B

This indicates that so far there are not enough keys in the LSM tree, to run compactions. Badger supports version access for the keys, so it would only reclaim space from value logs once the key versions are discarded. When SSTable compactions happen, then the NumVersionsPerKey setting is considered to discard keys which have multiple versions, value log GC does not consider that.

So, my recommendation would be to either set the CompactL0OnClose=true, or run badger flatten to run the LSM compactions, so then value log GC can reclaim space.

If needed, I could make a change to flatten command, so it would also (optionally) run value log GC, so you can see the space reclaim instantly.

This also indicates to me that the average value size per key is quite large. In that case, you might consider having smaller sized levels in the LSM tree, so more compactions are being done in general. That would allow value log GC to reclaim more space back. Look at LevelOneSize and LevelSizeMultiplier, you can decrease one or both of them.

@manishrjain manishrjain added the kind/question Something requiring a response label Feb 21, 2019
@mschoch
Copy link
Contributor Author

mschoch commented Feb 21, 2019

Thanks for the suggestions.

I should clarify that the only place we set CompactL0OnClose = false is in the original application. And I believe this has no effect, as the main application never calls Close anyway. This causes other problems, and is something we're working on.

However, the other command-line utilities I've run on the same db (after the original main application created it) all leave CompactL0OnClose set to the default (true). And they all properly call Close(), so I would have expected that L0 has been compacted by now..

Here is what I get when I run the flatten utility:

$ badger flatten --dir marty.db
Listening for /debug HTTP requests at port: 8080
badger 2019/02/21 18:03:20 INFO: Replaying file id: 65 at offset: 395888291
badger 2019/02/21 18:03:20 INFO: Replay took: 6.358µs
badger 2019/02/21 18:03:20 INFO: 
badger 2019/02/21 18:03:20 INFO: Level: 0.      0 B Size.      0 B Max.
badger 2019/02/21 18:03:20 INFO: Level: 1.    20 MB Size.   268 MB Max.
badger 2019/02/21 18:03:20 INFO: Level: 2.      0 B Size.   2.7 GB Max.
badger 2019/02/21 18:03:20 INFO: Level: 3.      0 B Size.    27 GB Max.
badger 2019/02/21 18:03:20 INFO: Level: 4.      0 B Size.   268 GB Max.
badger 2019/02/21 18:03:20 INFO: Level: 5.      0 B Size.   2.7 TB Max.
badger 2019/02/21 18:03:20 INFO: Level: 6.      0 B Size.    27 TB Max.
badger 2019/02/21 18:03:20 INFO: All tables consolidated into one level. Flattening done.
badger 2019/02/21 18:03:20 WARNING: Error while forcing compaction on level 0: Unable to fill tables for level: 0

Running compaction again after the flatten, I still see the same thing:

$ badger-compact -percent 0.01 marty.db/
badger 2019/02/21 18:06:51 INFO: Replaying file id: 65 at offset: 395888291
badger 2019/02/21 18:06:51 INFO: Replay took: 6.094µs
2019/02/21 18:06:51 starting value log gc
2019/02/21 18:06:54 no rewrite needed
2019/02/21 18:06:54 completed gc, ran 1 times
badger 2019/02/21 18:06:54 WARNING: Error while forcing compaction on level 0: Unable to fill tables for level: 0

DB still the same size:

$ du -sh marty.db
66G	marty.db

I will try the other suggestions for level size and multipliers, but can you help me understand what you mean when you say, 'not enough keys in the LSM tree, to run compactions'?

When I iterate all keys, here are the counts I get:
466577 with all versions
74665 only most recent version

@manishrjain
Copy link
Contributor

Can you confirm that after you do flatten, you only see one version per key (for most keys)?

The value log GC picks a file at random to see if it matches the threshold for reclamation. When Badger is running live, the LSM compaction keeps track of which keys were thrown out, and matches them against the value log files. But, it does not persist that information, so in offline mode, the random strategy is effective. You could modify your badger-compact loop to run it a few times, so it has a chance to pick the right log file.

I will try the other suggestions for level size and multipliers, but can you help me understand what you mean when you say, 'not enough keys in the LSM tree, to run compactions'?

A typical Badger SSTable can fit a couple of million keys in the default 64MB size. Only when a table exceeds that size, a new SSTable is created. In this case, there are not enough keys to build even a single SSTable (the one in badger info shows 20MB). So, the live path wouldn't have a need to run LSM tree compaction (one can still force it via db.Flatten).

@mschoch
Copy link
Contributor Author

mschoch commented Feb 21, 2019

Can you confirm that after you do flatten, you only see one version per key (for most keys)?

No, those counts I reported were after running the the badger flatten command I showed the output of.

466577 with all versions
74665 only most recent version

@manishrjain
Copy link
Contributor

In live mode, the txn oracle maintains a read watermark, which indicates to the compactor a version below which keys are OK to drop on the floor. I realized in the offline mode (badger flatten), this read mark was not being updated, causing older versions of keys to not be dropped during compaction.

I've pushed a fix here: ac321a8 . If you run badger flatten in a way where it triggers compaction (it won't on an already flattened out LSM tree) with this change, you'd only see one version per key -- which would then allow value log GC to do its job better.

If you're working on the same dir, you could do a badger backup, followed by a badger restore, so you can then run badger flatten. Alternatively, you could modify the flatten tool options to have a 4MB table size, and an 8MB LevelOneSize, so it would force a compaction of the existing table.

@mschoch
Copy link
Contributor Author

mschoch commented Feb 23, 2019

I rebuilt all the binaries, but kept working on the same db. Even the backup/restore/flatten steps did not result in less disk space (still ~66GB). I do see somewhat fewer keys when visiting all versions now 357880 (was 466577 previously)

I think I'm going to put my effort into getting the main application to call RunValueLogGC and tuning the level sizes to hopefully see values getting GC'd.

@mschoch
Copy link
Contributor Author

mschoch commented Feb 25, 2019

OK, we have now updated the main application to perform value log GC in a manner similar to what is suggested in the README:

func (b *badgerWrapper) runValueLogGC(pollInterval int, discardRatio float64) {
	ticker := time.NewTicker(time.Duration(pollInterval) * time.Second)
	defer ticker.Stop()

	for {
		select {
		case <-b.closing:
			return
		case <-ticker.C:
			logrus.Infof("staring value log GC on %s with discardRatio: %f", b.path, discardRatio)
			var err error
			for err == nil {
				err = b.db.RunValueLogGC(discardRatio)
				if err == nil {
					logrus.Infof("value log GC on %s made progress, continuing", b.path)
				}
			}
			if err == badger.ErrNoRewrite {
				logrus.Infof("finished value log GC on %s got no rewrite", b.path)
			} else {
				logrus.WithError(err).Errorf("error performing value log gc on %s", b.path)
			}
		}
	}
}

By default this will wake up ever 30 seconds, try RunValueLogGC, and keep running it if some progress was made.

Second, we've configured the db with much smaller values for some of the key options you mentioned:

MaxTableSize = 4194304
LevelOneSize = 8388608
LevelSizeMultiplier = 2

Also, we're currently using a discardRatio of 0.01 even though that is likely inappropriate, we're trying to see evidence of the value log GC making any improvement.

After the test case is performed, we saw the data size was still quite high:

$ du -sh marty.db
62G	marty.db

Monitoring the logs, we see that ValueLogGC was run 46 times, and only 3 times did it report that it did work (return nil) and in all those cases the subsequent call, returned no rewrite.

Next, we stopped the application (which will not perform a clean close). We then run badger info:

$ badger info --dir marty.db
Listening for /debug HTTP requests at port: 8080

[     2019-02-25T13:06:42Z] MANIFEST      156 B MA
[        2 minutes earlier] 000007.sst   2.3 MB L0 
[         1 minute earlier] 000008.sst   2.3 MB L0 
[                      now] 000009.sst   2.2 MB L0 
[        4 minutes earlier] 000006.sst   1.4 MB L1 
[       11 minutes earlier] 000000.vlog  1.1 GB VL
[       10 minutes earlier] 000001.vlog  1.1 GB VL
[       10 minutes earlier] 000002.vlog  1.1 GB VL
[       10 minutes earlier] 000003.vlog  1.1 GB VL
[       10 minutes earlier] 000004.vlog  1.1 GB VL
[       10 minutes earlier] 000005.vlog  1.1 GB VL
[        9 minutes earlier] 000006.vlog  1.1 GB VL
[        9 minutes earlier] 000008.vlog  1.1 GB VL
[        9 minutes earlier] 000009.vlog  1.1 GB VL
[        8 minutes earlier] 000011.vlog  1.1 GB VL
[        8 minutes earlier] 000012.vlog  1.1 GB VL
[        8 minutes earlier] 000013.vlog  1.1 GB VL
[        8 minutes earlier] 000014.vlog  1.1 GB VL
[        8 minutes earlier] 000015.vlog  1.1 GB VL
[        8 minutes earlier] 000016.vlog  1.1 GB VL
[        7 minutes earlier] 000018.vlog  1.1 GB VL
[        7 minutes earlier] 000019.vlog  1.1 GB VL
[        7 minutes earlier] 000020.vlog  1.1 GB VL
[        7 minutes earlier] 000021.vlog  1.1 GB VL
[        7 minutes earlier] 000022.vlog  1.1 GB VL
[        6 minutes earlier] 000023.vlog  1.1 GB VL
[        6 minutes earlier] 000024.vlog  1.1 GB VL
[        6 minutes earlier] 000025.vlog  1.1 GB VL
[        6 minutes earlier] 000026.vlog  1.1 GB VL
[        6 minutes earlier] 000027.vlog  1.1 GB VL
[        6 minutes earlier] 000028.vlog  1.1 GB VL
[        5 minutes earlier] 000029.vlog  1.1 GB VL
[        5 minutes earlier] 000030.vlog  1.1 GB VL
[        5 minutes earlier] 000031.vlog  1.1 GB VL
[        5 minutes earlier] 000032.vlog  1.1 GB VL
[        5 minutes earlier] 000033.vlog  1.1 GB VL
[        5 minutes earlier] 000034.vlog  1.1 GB VL
[        4 minutes earlier] 000035.vlog  1.1 GB VL
[        4 minutes earlier] 000036.vlog  1.1 GB VL
[        4 minutes earlier] 000037.vlog  1.1 GB VL
[        4 minutes earlier] 000038.vlog  1.1 GB VL
[        4 minutes earlier] 000039.vlog  1.1 GB VL
[        4 minutes earlier] 000040.vlog  1.1 GB VL
[        3 minutes earlier] 000041.vlog  1.1 GB VL
[        3 minutes earlier] 000042.vlog  1.1 GB VL
[        3 minutes earlier] 000043.vlog  1.1 GB VL
[        3 minutes earlier] 000044.vlog  1.1 GB VL
[        3 minutes earlier] 000045.vlog  1.1 GB VL
[        3 minutes earlier] 000046.vlog  1.1 GB VL
[        2 minutes earlier] 000047.vlog  1.1 GB VL
[        2 minutes earlier] 000048.vlog  1.1 GB VL
[        2 minutes earlier] 000049.vlog  1.1 GB VL
[        2 minutes earlier] 000050.vlog  1.1 GB VL
[        2 minutes earlier] 000051.vlog  1.1 GB VL
[         1 minute earlier] 000052.vlog  1.1 GB VL
[         1 minute earlier] 000053.vlog  1.1 GB VL
[         1 minute earlier] 000054.vlog  1.1 GB VL
[         1 minute earlier] 000055.vlog  1.1 GB VL
[         1 minute earlier] 000056.vlog  1.1 GB VL
[         1 minute earlier] 000057.vlog  1.1 GB VL
[       55 seconds earlier] 000058.vlog  1.1 GB VL
[       46 seconds earlier] 000059.vlog  1.1 GB VL
[       37 seconds earlier] 000060.vlog  1.1 GB VL
[       26 seconds earlier] 000061.vlog  1.1 GB VL
[       15 seconds earlier] 000062.vlog  1.1 GB VL
[        5 seconds earlier] 000063.vlog  1.1 GB VL
[          8 minutes later] 000064.vlog  824 MB VL

[EXTRA]
[2019-02-25T12:55:20Z] LOCK            6 B

[Summary]
Level 0 size:       6.8 MB
Level 1 size:       1.4 MB
Total index size:   8.2 MB
Value log size:      66 GB

Abnormalities:
1 extra file.
0 missing files.
0 empty files.
0 truncated manifests.

Out of curiosity, I then ran badger flatten:

$ badger flatten --dir marty.db
Listening for /debug HTTP requests at port: 8080
badger 2019/02/25 13:24:43 INFO: All 4 tables opened in 15ms
badger 2019/02/25 13:24:43 INFO: Replaying file id: 64 at offset: 571402549
badger 2019/02/25 13:24:44 INFO: Replay took: 159.711188ms
badger 2019/02/25 13:24:44 INFO: 
badger 2019/02/25 13:24:44 INFO: Level: 0.   6.8 MB Size.      0 B Max.
badger 2019/02/25 13:24:44 INFO: Level: 1.   1.4 MB Size.   268 MB Max.
badger 2019/02/25 13:24:44 INFO: Level: 2.      0 B Size.   2.7 GB Max.
badger 2019/02/25 13:24:44 INFO: Level: 3.      0 B Size.    27 GB Max.
badger 2019/02/25 13:24:44 INFO: Level: 4.      0 B Size.   268 GB Max.
badger 2019/02/25 13:24:44 INFO: Level: 5.      0 B Size.   2.7 TB Max.
badger 2019/02/25 13:24:44 INFO: Level: 6.      0 B Size.    27 TB Max.
badger 2019/02/25 13:24:44 INFO: Attempting to compact with {level:0 score:1.71}
badger 2019/02/25 13:24:44 INFO: 1 compactor(s) succeeded. One or more tables from level 0 compacted.
badger 2019/02/25 13:24:44 INFO: 
badger 2019/02/25 13:24:44 INFO: Level: 0.      0 B Size.      0 B Max.
badger 2019/02/25 13:24:44 INFO: Level: 1.   5.4 MB Size.   268 MB Max.
badger 2019/02/25 13:24:44 INFO: Level: 2.      0 B Size.   2.7 GB Max.
badger 2019/02/25 13:24:44 INFO: Level: 3.      0 B Size.    27 GB Max.
badger 2019/02/25 13:24:44 INFO: Level: 4.      0 B Size.   268 GB Max.
badger 2019/02/25 13:24:44 INFO: Level: 5.      0 B Size.   2.7 TB Max.
badger 2019/02/25 13:24:44 INFO: Level: 6.      0 B Size.    27 TB Max.
badger 2019/02/25 13:24:44 INFO: All tables consolidated into one level. Flattening done.
badger 2019/02/25 13:24:44 INFO: Storing value log head: {Fid:64 Len:0 Offset:823572315}
badger 2019/02/25 13:24:44 INFO: Force compaction on level 0 done

Then badger info again:

$ badger info --dir marty.db
Listening for /debug HTTP requests at port: 8080

[     2019-02-25T13:24:44Z] MANIFEST      334 B MA
[                      now] 000012.sst   6.4 MB L1 2855da2f3f97d00a10818403f558143a2db4edff5b458e9172f7736cce13d98b
[       29 minutes earlier] 000000.vlog  1.1 GB VL
[       28 minutes earlier] 000001.vlog  1.1 GB VL
[       28 minutes earlier] 000002.vlog  1.1 GB VL
[       28 minutes earlier] 000003.vlog  1.1 GB VL
[       28 minutes earlier] 000004.vlog  1.1 GB VL
[       28 minutes earlier] 000005.vlog  1.1 GB VL
[       27 minutes earlier] 000006.vlog  1.1 GB VL
[       27 minutes earlier] 000008.vlog  1.1 GB VL
[       27 minutes earlier] 000009.vlog  1.1 GB VL
[       26 minutes earlier] 000011.vlog  1.1 GB VL
[       26 minutes earlier] 000012.vlog  1.1 GB VL
[       26 minutes earlier] 000013.vlog  1.1 GB VL
[       26 minutes earlier] 000014.vlog  1.1 GB VL
[       26 minutes earlier] 000015.vlog  1.1 GB VL
[       26 minutes earlier] 000016.vlog  1.1 GB VL
[       25 minutes earlier] 000018.vlog  1.1 GB VL
[       25 minutes earlier] 000019.vlog  1.1 GB VL
[       25 minutes earlier] 000020.vlog  1.1 GB VL
[       25 minutes earlier] 000021.vlog  1.1 GB VL
[       25 minutes earlier] 000022.vlog  1.1 GB VL
[       24 minutes earlier] 000023.vlog  1.1 GB VL
[       24 minutes earlier] 000024.vlog  1.1 GB VL
[       24 minutes earlier] 000025.vlog  1.1 GB VL
[       24 minutes earlier] 000026.vlog  1.1 GB VL
[       24 minutes earlier] 000027.vlog  1.1 GB VL
[       24 minutes earlier] 000028.vlog  1.1 GB VL
[       23 minutes earlier] 000029.vlog  1.1 GB VL
[       23 minutes earlier] 000030.vlog  1.1 GB VL
[       23 minutes earlier] 000031.vlog  1.1 GB VL
[       23 minutes earlier] 000032.vlog  1.1 GB VL
[       23 minutes earlier] 000033.vlog  1.1 GB VL
[       23 minutes earlier] 000034.vlog  1.1 GB VL
[       22 minutes earlier] 000035.vlog  1.1 GB VL
[       22 minutes earlier] 000036.vlog  1.1 GB VL
[       22 minutes earlier] 000037.vlog  1.1 GB VL
[       22 minutes earlier] 000038.vlog  1.1 GB VL
[       22 minutes earlier] 000039.vlog  1.1 GB VL
[       22 minutes earlier] 000040.vlog  1.1 GB VL
[       21 minutes earlier] 000041.vlog  1.1 GB VL
[       21 minutes earlier] 000042.vlog  1.1 GB VL
[       21 minutes earlier] 000043.vlog  1.1 GB VL
[       21 minutes earlier] 000044.vlog  1.1 GB VL
[       21 minutes earlier] 000045.vlog  1.1 GB VL
[       21 minutes earlier] 000046.vlog  1.1 GB VL
[       20 minutes earlier] 000047.vlog  1.1 GB VL
[       20 minutes earlier] 000048.vlog  1.1 GB VL
[       20 minutes earlier] 000049.vlog  1.1 GB VL
[       20 minutes earlier] 000050.vlog  1.1 GB VL
[       20 minutes earlier] 000051.vlog  1.1 GB VL
[       20 minutes earlier] 000052.vlog  1.1 GB VL
[       19 minutes earlier] 000053.vlog  1.1 GB VL
[       19 minutes earlier] 000054.vlog  1.1 GB VL
[       19 minutes earlier] 000055.vlog  1.1 GB VL
[       19 minutes earlier] 000056.vlog  1.1 GB VL
[       19 minutes earlier] 000057.vlog  1.1 GB VL
[       18 minutes earlier] 000058.vlog  1.1 GB VL
[       18 minutes earlier] 000059.vlog  1.1 GB VL
[       18 minutes earlier] 000060.vlog  1.1 GB VL
[       18 minutes earlier] 000061.vlog  1.1 GB VL
[       18 minutes earlier] 000062.vlog  1.1 GB VL
[       18 minutes earlier] 000063.vlog  1.1 GB VL
[                      now] 000064.vlog  824 MB VL

[Summary]
Level 0 size:          0 B
Level 1 size:       6.4 MB
Total index size:   6.4 MB
Value log size:      66 GB

Abnormalities: None.
0 extra files.
0 missing files.
0 empty files.
0 truncated manifests.

Out of curiosity I checked the number of keys I see with and without the all keys option:

all: 74665
normal: 140835

It seems like with these options we do see fewer versions of keys, but overall size still leads us to believe there is significant space used for old versions of values.

Are there any other diagnostics that could be run to gain insight into the space used in the value log?

@mschoch
Copy link
Contributor Author

mschoch commented Feb 25, 2019

I've modified the utility I was using to count keys and it now also counts up the key and value sizes:

$ badger-help -summary marty.db
badger 2019/02/25 15:21:50 INFO: All 1 tables opened in 25ms
badger 2019/02/25 15:21:50 INFO: Replaying file id: 64 at offset: 823572315
badger 2019/02/25 15:21:50 INFO: Replay took: 7.626µs
keys: 74336, keysSize: 2189597, valsSize: 680561299
$ badger-help -all -summary marty.db
badger 2019/02/25 15:22:00 INFO: All 1 tables opened in 25ms
badger 2019/02/25 15:22:00 INFO: Replaying file id: 64 at offset: 823572315
badger 2019/02/25 15:22:00 INFO: Replay took: 6.676µs
keys: 140502, keysSize: 5023011, valsSize: 995889909

This seems to support our belief that the live value data is closer to 1GB (680MB counted here). We of course understand the actual size will be larger, but 62-66GB is pretty far from that.

@mschoch
Copy link
Contributor Author

mschoch commented Feb 26, 2019

I've added some logging to try and understand why badger chooses not to rewrite the files, even though we suspect they are significantly garbage. These are again done with the offline compaction tool, so selection of a file is random, but that's OK for now.

I believe the decision not to rewrite happens here:

https://github.com/dgraph-io/badger/blob/master/value.go#L1253-L1255

The print from the line just before shows (formatting of message slightly different than the trace):

2019/02/26 03:47:27 fid: 4. skipped 345.27MB num iterations: 4654. data status: {total:679.4092302322388 discard:679.0312881469727 count:3118}

My read of this is that discard is almost the entire total, but this if condition is triggered because r.total is 679.something and yet sizeWindow*0.75 is 80583902.25. The comments say:

	// If we couldn't sample at least a 1000 KV pairs or at least 75% of the window size,
	// and what we can discard is below the threshold, we should skip the rewrite.

It seems to me that in our case, these preconditions of 1000 kv pairs, or 75% of the window size aren't met. Is it possible these are just arbitrary and happen to not work in our use case?

@manishrjain
Copy link
Contributor

It seems to me that in our case, these preconditions of 1000 kv pairs, or 75% of the window size aren't met. Is it possible these are just arbitrary and happen to not work in our use case?

That's right. The idea was to have a decent sample size. Considering a typical value log would have a million key-value pairs (ValueLogMaxEntries), 1000 is 0.1%. Clearly, this doesn't work in your case.

I think we could make the sampling sizes configurable via DefaultOptions. Happy to accept a PR to that extent.

@mschoch
Copy link
Contributor Author

mschoch commented Feb 26, 2019

Thanks @manishrjain I can take a look at making it configurable. However, I have one more concern from reading the code. The comparison made here:

https://github.com/dgraph-io/badger/blob/master/value.go#L1253

if (r.count < countWindow && r.total < sizeWindow*0.75) || r.discard < discardRatio*r.total {

It seems to me this may be incorrectly comparing different units. The r.total field is a sum of entry sizes: https://github.com/dgraph-io/badger/blob/master/value.go#L1196

And entry sizes are converted to and tracked in MB: https://github.com/dgraph-io/badger/blob/master/value.go#L1177

But the comparison is made against a percentage of sizeWindow, and sizeWindow is in bytes (10% of the value log file size): https://github.com/dgraph-io/badger/blob/master/value.go#L1159

Am I misreading this?

@mschoch
Copy link
Contributor Author

mschoch commented Feb 26, 2019

I've proposed a fix for what I think is the problem here: #724

At least in offline compaction, it works much better for us with this applied. Testing online behavior now.

@mschoch
Copy link
Contributor Author

mschoch commented Feb 27, 2019

So here is our update attempting to perform online value log GC in our application:

We're running with this patch applied #725

Our BadgerDB options are:

TableLoadingMode: options.MemoryMap
MaxTableSize: 4194304
LevelOneSize: 8388608
LevelSizeMultiplier: 2

We invoke RunValueLogGC every second with discard ratio of 0.01

What we observe is that while our workload is running, which we know creates/updates large values, the database size still grows as high as 42GB. That is the peak, even as the workload is nearly finished, we still see it staying in the 15-20GB range. And finally, only after all mutations have stopped does it eventually come back to the ~1.4GB range.

It feels like we're aggressively trying to do GC, and it can't keep up. We could try calling it even more frequently, but it seems like this ultimately is going to be too expensive to run this frequently all the time without any more signal to go on.

I have been looking to see is there is some way we can try to invoke RunValueLogGC more intelligently and I came across this documentation for the Size() method

Size returns the size of lsm and value log files in bytes. It can be used to decide how often to call RunValueLogGC.

But, how does knowing the size help without knowing the discard stats?

Do you have any other suggestions for things to tune?

@manishrjain
Copy link
Contributor

The idea for Size() is that every time it grows by a GB (i.e. added a new log file), you could run value log GC. You can also reduce the number of tables in L0, so L0 would compact as soon as it gets the first one. Try NumLevelZeroTables to 0 or 1, and ...Stall to say 1 or 2. That would help aggressively drop previous keys.

We also could persist discard stats, so offline mechanism can also be better informed instead of shooting in the dark.

I don't see much benefit in exposing discard stats to the user unless we have a way by which the user tells Badger which log file to attempt to GC -- I don't see obvious benefits there.

@mschoch
Copy link
Contributor Author

mschoch commented Feb 28, 2019

Thanks again, setting NumLevelZeroTables and ...Stall to 1 and 2 respectively worked well. While running the workload we stayed under 10GB (peaks) and closer to 2-3GB much of the time.

And I can now see how using Size() will be helpful. At a minimum it can let us detect when nothing is changing and back off a bit. I'll need to do some more experimenting to understand how to interpret and react to specific value size changes...

Have you had a chance to review #725?

@manishrjain
Copy link
Contributor

Great. Didn't realize you addressed the comments.
Will review #725 today.

@manishrjain
Copy link
Contributor

Is there anything else needed here?

@mschoch
Copy link
Contributor Author

mschoch commented Mar 9, 2019

I think we're OK for now. Thanks for your help.

@mschoch mschoch closed this as completed Mar 9, 2019
@pancakeliu
Copy link

@manishrjain @mschoch Hello, I meet the problem same as mschoch‘s.
I have been attempting to migrate an application from LevelDB to Badger. But Badger databases(400G) much better than LevelDB(146G).
I tried the method you discussed above. But didn't solve it.
Here are my codes, please review! Thanks a lot!

func (db *BadgerDB) badgerGc() {
	ticker := time.NewTicker(1 * time.Minute)
	defer ticker.Stop()
	for range ticker.C {
	again:
		err := db.db.RunValueLogGC(0.5)
		if err == nil {
			goto again
		}
	}
}

func NewBadgerDB(name string, dir string) (*BadgerDB, error) {
	dbPath := filepath.Join(dir, name + ".db")
	badgerOpts := badger.DefaultOptions
	badgerOpts.Dir      = dbPath
	badgerOpts.ValueDir = dbPath

	db, err := badger.Open(badgerOpts)
	if err != nil {
		return nil, err
	}

	database := &BadgerDB{
		db: db,
	}
	go database.badgerGc()

	return database, nil
}

Base on commit deee8c7
Most operations is insert, so I think the matter is not GC's.
Witch options should tuning? Thanks for your help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/question Something requiring a response
Development

No branches or pull requests

3 participants