Introduce Pruning to IAVL #158

Merged
merged 69 commits into from
Jan 16, 2020
Merged
Show file tree
Hide file tree
Changes from 56 commits
Commits
Show all changes
69 commits
Select commit Hold shift + click to select a range
- bed4822: Memtree (#11) [mattkanwisher, Jul 2, 2019]
- 23f0b99: Add version saving to in memory, and ability to flush to disk periodi… [mattkanwisher, Jul 2, 2019]
- aef79a7: Add version saving to in memory, and ability to flush to disk periodi… [mattkanwisher, Jul 2, 2019]
- 43d2409: Add version saving to in memory, and ability to flush to disk periodi… [mattkanwisher, Jul 2, 2019]
- c6f7eee: initial changes [AdityaSripal, Jul 8, 2019]
- 099ec05: more changes to nodedb [AdityaSripal, Jul 10, 2019]
- 6dc8e62: fmt and some docs [AdityaSripal, Jul 10, 2019]
- 6f51bea: move some pruning logic into nodedb [AdityaSripal, Jul 11, 2019]
- 0e2a954: completed changes to nodedb [AdityaSripal, Jul 11, 2019]
- a840877: initial updates to mutable_tree [AdityaSripal, Jul 12, 2019]
- 86e8371: update tree.versions after pruning [AdityaSripal, Jul 12, 2019]
- 6b3597a: fix most build errs and allow memDB to be disabled [AdityaSripal, Jul 12, 2019]
- 8ea53a0: merge master [AdityaSripal, Jul 12, 2019]
- bb45bf1: fix build errors and maintain backward compatibility [AdityaSripal, Jul 12, 2019]
- bb129a9: only write to db if pruning opts allow it [AdityaSripal, Jul 12, 2019]
- f01c63a: debugging... [AdityaSripal, Jul 12, 2019]
- 78060fe: remove print statement [AdityaSripal, Jul 12, 2019]
- 2f3502a: optimize deleteOrphans and fix final test errors [AdityaSripal, Jul 12, 2019]
- bce37e3: lint [AdityaSripal, Jul 12, 2019]
- 46d16a0: add brief design doc and small fix [AdityaSripal, Jul 15, 2019]
- d244ed3: add basic pruning tests [AdityaSripal, Jul 15, 2019]
- c5af5b9: change tests to use trees with random pruning parameters [AdityaSripal, Jul 16, 2019]
- 2ab4609: quick fix [AdityaSripal, Jul 16, 2019]
- e50b297: add edge case tests [AdityaSripal, Jul 17, 2019]
- 0dc0769: current progress in testing [AdityaSripal, Jul 19, 2019]
- dacbd97: fix bugs [AdityaSripal, Jul 22, 2019]
- 5cf4558: fix merge [AdityaSripal, Jul 23, 2019]
- 67f9e19: address initial comments [AdityaSripal, Jul 23, 2019]
- b5d09a7: add replace/remove pruning tests. improve remove efficiency [AdityaSripal, Jul 24, 2019]
- f4b9941: update bench_test [AdityaSripal, Jul 25, 2019]
- 0a1f25e: add some more pruning options [AdityaSripal, Jul 25, 2019]
- 097f2c5: add commas [AdityaSripal, Jul 25, 2019]
- 7bf79c0: add pruning benchmark tests [AdityaSripal, Jul 26, 2019]
- 48e714f: slightly more reasonable pruning benchmarks [AdityaSripal, Jul 27, 2019]
- 67a4131: options refactor [AdityaSripal, Jul 29, 2019]
- 6ed75ff: use benching options in benchmark tests [AdityaSripal, Jul 29, 2019]
- 1a5311f: Apply suggestions from fede code review [AdityaSripal, Jul 30, 2019]
- 54b2dac: tidy and remove unnecessary orphan [AdityaSripal, Jul 30, 2019]
- 6383f8d: Merge branch 'aditya/pruning' of github.com:tendermint/iavl into adit… [AdityaSripal, Jul 30, 2019]
- f1e117d: complete fede doc comments [AdityaSripal, Jul 30, 2019]
- 9df04a4: fix merge [AdityaSripal, Jul 30, 2019]
- 1e9bd2c: fix linter [AdityaSripal, Jul 31, 2019]
- 46785c6: fix merge [AdityaSripal, Jul 31, 2019]
- 9e8fde7: replace keys pruning tests [AdityaSripal, Aug 5, 2019]
- fcc92e4: fix imports [AdityaSripal, Aug 5, 2019]
- 7e10d5d: Update pruning_test.go [AdityaSripal, Aug 5, 2019]
- b15162c: add DO results [AdityaSripal, Aug 6, 2019]
- cb05738: Merge branch 'aditya/pruning' of github.com:tendermint/iavl into adit… [AdityaSripal, Aug 6, 2019]
- 6367c2f: more pruning strategies [AdityaSripal, Aug 7, 2019]
- 1772b7e: fix deleteVersionsFrom [AdityaSripal, Aug 9, 2019]
- 4d67ccf: merge and version benchmarks [AdityaSripal, Aug 12, 2019]
- 0fe8a80: add sdk benchmark results [AdityaSripal, Aug 13, 2019]
- f661c70: fix tiny bug [AdityaSripal, Oct 8, 2019]
- e833cf4: Merge branch 'master' into aditya/pruning [tac0turtle, Dec 4, 2019]
- 1b74253: minor linting fixes [tac0turtle, Dec 4, 2019]
- 529695f: Merge branch 'master' into aditya/pruning [tac0turtle, Dec 5, 2019]
- fbdde5d: fix linting issues [tac0turtle, Dec 6, 2019]
- 126ac73: Update PRUNING.md [tac0turtle, Dec 6, 2019]
- 59317ba: fix test [tac0turtle, Dec 6, 2019]
- 8b2dc94: Merge branch 'aditya/pruning' of https://github.com/tendermint/iavl i… [tac0turtle, Dec 6, 2019]
- 84deb83: Add options validation [tnachen, Dec 19, 2019]
- 533652c: Fix build [tnachen, Dec 20, 2019]
- c529855: Fix linter [tnachen, Dec 20, 2019]
- bbb6aa3: move pruning docs [tac0turtle, Jan 8, 2020]
- 72d9cdd: Update pruning docs and tests [tnachen, Jan 10, 2020]
- 9b3e944: Merge branch 'master' into aditya/pruning [tac0turtle, Jan 14, 2020]
- a382f71: Fix build [tnachen, Jan 14, 2020]
- 3da2290: Merge branch 'master' into aditya/pruning [tac0turtle, Jan 15, 2020]
- 0b86aa0: errors: add some error handling (#199) [tac0turtle, Jan 16, 2020]
2 changes: 1 addition & 1 deletion .golangci.yml
@@ -61,4 +61,4 @@ linters-settings:
 # enabled-tags:
 # - performance
 # - style
-# - experimental
+# - experimental
2 changes: 2 additions & 0 deletions CHANGELOG_PENDING.md
@@ -7,6 +7,8 @@ Special thanks to external contributors on this release:

 ### BREAKING CHANGES

+- [/#158] The NodeDB constructor must provide `keepRecent` and `keepEvery` fields to define the PruningStrategy. All Save functions must specify whether they should also flush to disk, using the `flushToDisk` boolean argument. All Delete functions must specify whether the object should be deleted from memory only, using the `memOnly` boolean argument.
+
 ### IMPROVEMENTS

 ### Bug Fix
3 changes: 3 additions & 0 deletions Makefile
@@ -51,6 +51,9 @@ fullbench:
 		go test $(LDFLAGS) -bench=Mem . && \
 		go test $(LDFLAGS) -timeout=60m -bench=LevelDB .

+benchprune:
+	cd benchmarks && \
+		go test -bench=PruningStrategies -timeout=24h

 # note that this just profiles the in-memory version, not persistence
 profile:

Review comment on the `benchprune` target:
- Contributor: Maybe add a line somewhere explaining how to reproduce the benchmarks?
38 changes: 38 additions & 0 deletions PRUNING.md
@@ -0,0 +1,38 @@
# Pruning

Setting pruning fields on the IAVL tree can improve performance by writing versions to disk only if they are meant to be persisted indefinitely. Versions that are known to be deleted eventually are held temporarily in memory until they are ready to be pruned. This greatly reduces IAVL's I/O load.

We can set custom pruning fields in IAVL using: `NewMutableTreePruningOpts`


## Current design

### NodeDB
NodeDB has the following extra fields:

```go
recentDB    dbm.DB    // Memory node storage.
recentBatch dbm.Batch // Batched writing buffer for memDB.

// Pruning fields
keepEvery  int64 // Saves version to disk periodically
keepRecent int64 // Saves recent versions in memory
```

If a version is not going to be persisted to disk, it is saved only in `recentDB` (typically a `memDB`).
If a version is persisted to disk, it is written to both `recentDB` **and** `snapshotDB` (typically `levelDB`).
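A minimal sketch of this save routing follows. It assumes, which the design above does not spell out, that a version goes to `snapshotDB` exactly when `keepEvery` is non-zero and divides the version number; `flushesToDisk` and `countFlushed` are hypothetical helpers, not IAVL functions. It counts how many of 5000 versions (the workload used in the pruning benchmarks) would be written to disk for a few `keepEvery` values.

```go
package main

import "fmt"

// flushesToDisk reports whether a version would be written to snapshotDB
// in addition to recentDB. Assumption: keepEvery == 0 means "never
// persist"; otherwise every keepEvery-th version is persisted.
func flushesToDisk(version, keepEvery int64) bool {
	return keepEvery != 0 && version%keepEvery == 0
}

// countFlushed counts how many of versions 1..n would reach snapshotDB.
func countFlushed(n, keepEvery int64) int64 {
	var count int64
	for v := int64(1); v <= n; v++ {
		if flushesToDisk(v, keepEvery) {
			count++
		}
	}
	return count
}

func main() {
	// keepEvery values taken from the benchmark strategies; keepRecent
	// does not affect disk writes in this sketch.
	for _, keepEvery := range []int64{1, 0, 100} {
		fmt.Printf("keepEvery=%d: %d of 5000 versions written to disk\n",
			keepEvery, countFlushed(5000, keepEvery))
	}
}
```

Under this assumption, `keepEvery = 1` persists every version (the default strategy), while `keepEvery = 0` never writes versions to disk.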

#### Orphans

Each orphan is saved to `memDB` under the key `o|toVersion|fromVersion`.

If there exists a snapshot version `snapVersion` such that `fromVersion < snapVersion < toVersion`, the orphan is also saved to disk under `o|snapVersion|fromVersion`.
NOTE: in the unlikely event that two snapshot versions exist between `fromVersion` and `toVersion`, we use the closest snapshot version that is less than `toVersion`.

We can then use the old delete algorithm, with some minor simplifications and optimizations.
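The orphan-key rule can be sketched as below. Assumptions: snapshot versions are exactly the multiples of `keepEvery` (with `keepEvery > 0`), keys are rendered as readable strings rather than IAVL's binary key encoding, and `orphanKeys` is a hypothetical helper. Taking the largest multiple of `keepEvery` below `toVersion` also covers the NOTE about choosing the closest snapshot version.

```go
package main

import "fmt"

// orphanKeys returns the in-memory orphan key and, when a snapshot version
// lies strictly between fromVersion and toVersion, the on-disk key.
// Assumption: snapshot versions are the multiples of keepEvery (> 0).
func orphanKeys(fromVersion, toVersion, keepEvery int64) (memKey, diskKey string, onDisk bool) {
	memKey = fmt.Sprintf("o|%d|%d", toVersion, fromVersion)
	// Closest snapshot version strictly less than toVersion.
	snapVersion := ((toVersion - 1) / keepEvery) * keepEvery
	if snapVersion > fromVersion {
		diskKey = fmt.Sprintf("o|%d|%d", snapVersion, fromVersion)
		onDisk = true
	}
	return memKey, diskKey, onDisk
}

func main() {
	// Snapshots 100 and 200 both lie in (3, 250); the closer one, 200, is used.
	mem, disk, ok := orphanKeys(3, 250, 100)
	fmt.Println(mem, disk, ok) // o|250|3 o|200|3 true

	// No snapshot lies in (205, 250), so the orphan stays in memory only.
	mem, disk, ok = orphanKeys(205, 250, 100)
	fmt.Println(mem, disk, ok) // o|250|205  false
}
```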

### MutableTree

MutableTree can be instantiated with a pruning-aware NodeDB.

When `MutableTree` saves a new version, it also calls `PruneRecentVersions` on the nodeDB, which prunes the oldest version in `recentDB` (`latestVersion - keepRecent`).
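A toy model of this pruning step, assuming `recentDB` can be represented as a set of version numbers; `recentStore` and its methods are hypothetical and only illustrate which versions stay in memory. After each save, version `latestVersion - keepRecent` is dropped, so at most `keepRecent` versions remain.

```go
package main

import (
	"fmt"
	"sort"
)

// recentStore is a hypothetical stand-in for recentDB that only tracks
// which versions are still held in memory.
type recentStore struct {
	keepRecent int64
	versions   map[int64]bool
}

func newRecentStore(keepRecent int64) *recentStore {
	return &recentStore{keepRecent: keepRecent, versions: make(map[int64]bool)}
}

// saveVersion records latestVersion, then prunes latestVersion-keepRecent.
// Deleting a version that was never stored is a no-op.
func (s *recentStore) saveVersion(latestVersion int64) {
	s.versions[latestVersion] = true
	delete(s.versions, latestVersion-s.keepRecent)
}

// held returns the in-memory versions in ascending order.
func (s *recentStore) held() []int64 {
	out := make([]int64, 0, len(s.versions))
	for v := range s.versions {
		out = append(out, v)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	s := newRecentStore(5)
	for v := int64(1); v <= 10; v++ {
		s.saveVersion(v)
	}
	fmt.Println(s.held()) // [6 7 8 9 10]
}
```

In this model `keepRecent = 1` keeps only the latest version, and `keepRecent = 0` keeps nothing in memory.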
15 changes: 6 additions & 9 deletions basic_test.go
@@ -13,7 +13,7 @@ import (
 )

 func TestBasic(t *testing.T) {
-	tree := NewMutableTree(db.NewMemDB(), 0)
+	tree := getTestTree(0)
 	up := tree.Set([]byte("1"), []byte("one"))
 	if up {
 		t.Error("Did not expect an update (should have been create)")
@@ -186,12 +186,10 @@ func TestUnit(t *testing.T) {
 }

 func TestRemove(t *testing.T) {
-	size := 10000
 	keyLen, dataLen := 16, 40

-	d := db.NewDB("test", "memdb", "")
-	defer d.Close()
-	t1 := NewMutableTree(d, size)
+	size := 10000
+	t1 := getTestTree(size)

 	// insert a bunch of random nodes
 	keys := make([][]byte, size)
@@ -221,7 +219,7 @@ func TestIntegration(t *testing.T) {
 	}

 	records := make([]*record, 400)
-	tree := NewMutableTree(db.NewMemDB(), 0)
+	tree := getTestTree(0)

 	randomRecord := func() *record {
 		return &record{randstr(20), randstr(20)}
@@ -303,7 +301,7 @@ func TestIterateRange(t *testing.T) {
 	}
 	sort.Strings(keys)

-	tree := NewMutableTree(db.NewMemDB(), 0)
+	tree := getTestTree(0)

 	// insert all the data
 	for _, r := range records {
@@ -393,8 +391,7 @@ func TestPersistence(t *testing.T) {
 func TestProof(t *testing.T) {

 	// Construct some random tree
-	db := db.NewMemDB()
-	tree := NewMutableTree(db, 100)
+	tree := getTestTree(100)
 	for i := 0; i < 10; i++ {
 		key, value := randstr(20), randstr(20)
 		tree.Set([]byte(key), []byte(value))
72 changes: 38 additions & 34 deletions benchmarks/bench_test.go
@@ -12,8 +12,6 @@ import (
 	db "github.com/tendermint/tm-db"
 )

-const historySize = 20
-
 func randBytes(length int) []byte {
 	key := make([]byte, length)
 	// math.rand.Read always returns err=nil
@@ -23,8 +21,8 @@ func randBytes(length int) []byte {
 	return key
 }

-func prepareTree(b *testing.B, db db.DB, size, keyLen, dataLen int) (*iavl.MutableTree, [][]byte) {
-	t := iavl.NewMutableTree(db, size)
+func prepareTree(b *testing.B, snapdb db.DB, memdb db.DB, keepEvery int64, keepRecent int64, size, keyLen, dataLen int) (*iavl.MutableTree, [][]byte) {
+	t := iavl.NewMutableTreeWithOpts(snapdb, memdb, size, iavl.PruningOptions(keepEvery, keepRecent))
 	keys := make([][]byte, size)

 	for i := 0; i < size; i++ {
@@ -37,19 +35,15 @@ func prepareTree(b *testing.B, db db.DB, size, keyLen, dataLen int) (*iavl.Mutab
 	return t, keys
 }

-// commit tree saves a new version and deletes and old one...
+// commit tree saves a new version according to pruning strategy passed into IAVL
 func commitTree(b *testing.B, t *iavl.MutableTree) {
 	t.Hash()
-	_, version, err := t.SaveVersion()
+
+	_, _, err := t.SaveVersion() //this will flush for us every so often
+
 	if err != nil {
 		b.Errorf("Can't save: %v", err)
 	}
-	if version > historySize {
-		err = t.DeleteVersion(version - historySize)
-		if err != nil {
-			b.Errorf("Can't delete: %v", err)
-		}
-	}
 }

 func runQueries(b *testing.B, t *iavl.MutableTree, keyLen int) {
@@ -220,28 +214,38 @@ func BenchmarkLevelDBLargeData(b *testing.B) {

 func runBenchmarks(b *testing.B, benchmarks []benchmark) {
 	fmt.Printf("%s\n", iavl.GetVersionInfo())
-	for _, bb := range benchmarks {
-		prefix := fmt.Sprintf("%s-%d-%d-%d-%d", bb.dbType, bb.initSize,
-			bb.blockSize, bb.keyLen, bb.dataLen)
-
-		// prepare a dir for the db and cleanup afterwards
-		dirName := fmt.Sprintf("./%s-db", prefix)
-		defer func() {
-			err := os.RemoveAll(dirName)
-			if err != nil {
-				b.Errorf("%+v\n", err)
+	pruningStrategies := []pruningstrat{
+		{1, 0}, // default pruning strategy
+		{0, 1}, // keep single recent version
+		{100, 5}, // simple pruning
+		{1000, 10}, // average pruning
+		{1000, 1}, // extreme pruning
+		{10000, 100}, // SDK pruning
 	}
+	for _, ps := range pruningStrategies {
+		for _, bb := range benchmarks {
+			prefix := fmt.Sprintf("%s-%d-%d-%d-%d-%d-%d", bb.dbType, ps.keepEvery, ps.keepRecent,
+				bb.initSize, bb.blockSize, bb.keyLen, bb.dataLen)
+
+			// prepare a dir for the db and cleanup afterwards
+			dirName := fmt.Sprintf("./%s-db", prefix)
+			defer func() {
+				err := os.RemoveAll(dirName)
+				if err != nil {
+					b.Errorf("%+v\n", err)
+				}
+			}()
+
+			// note that "" leads to nil backing db!
+			var d db.DB
+			if bb.dbType != "nodb" {
+				d = db.NewDB("test", bb.dbType, dirName)
+				defer d.Close()
+			}
-		}()
-
-		// note that "" leads to nil backing db!
-		var d db.DB
-		if bb.dbType != "nodb" {
-			d = db.NewDB("test", bb.dbType, dirName)
-			defer d.Close()
+			b.Run(prefix, func(sub *testing.B) {
+				runSuite(sub, d, ps.keepEvery, ps.keepRecent, bb.initSize, bb.blockSize, bb.keyLen, bb.dataLen)
+			})
 		}
-		b.Run(prefix, func(sub *testing.B) {
-			runSuite(sub, d, bb.initSize, bb.blockSize, bb.keyLen, bb.dataLen)
-		})
 	}
 }

@@ -254,12 +258,12 @@ func memUseMB() float64 {
 	return mb
 }

-func runSuite(b *testing.B, d db.DB, initSize, blockSize, keyLen, dataLen int) {
+func runSuite(b *testing.B, d db.DB, keepEvery int64, keepRecent int64, initSize, blockSize, keyLen, dataLen int) {
 	// measure mem usage
 	runtime.GC()
 	init := memUseMB()

-	t, keys := prepareTree(b, d, initSize, keyLen, dataLen)
+	t, keys := prepareTree(b, d, db.NewMemDB(), keepEvery, keepRecent, initSize, keyLen, dataLen)
 	used := memUseMB() - init
 	fmt.Printf("Init Tree took %0.2f MB\n", used)

99 changes: 99 additions & 0 deletions benchmarks/prune_test.go
@@ -0,0 +1,99 @@
package benchmarks

import (
	"fmt"
	"math/rand"
	"os"
	"runtime"
	"testing"

	db "github.com/tendermint/tm-db"
)

type pruningstrat struct {
	keepEvery, keepRecent int64
}

// To test effect of pruning strategy, we must measure time to execute many blocks
// Execute 5000 blocks with the given IAVL tree's pruning strategy
func runBlockChain(b *testing.B, prefix string, keepEvery int64, keepRecent int64, keyLen, dataLen int) {
	// prepare a dir for the db and cleanup afterwards
	dirName := fmt.Sprintf("./%s-db", prefix)
	defer func() {
		err := os.RemoveAll(dirName)
		if err != nil {
			b.Errorf("%+v\n", err)
		}
	}()

	runtime.GC()

	// always initialize tree with goleveldb as snapshotDB and memDB as recentDB
	snapDB := db.NewDB("test", "goleveldb", dirName)
	defer snapDB.Close()

	// var mem runtime.MemStats
	// runtime.ReadMemStats(&mem)
	// memSize := mem.Alloc
	// maxVersion := 0
	var keys [][]byte
	for i := 0; i < 100; i++ {
		keys = append(keys, randBytes(keyLen))
	}

	// reset timer after initialization logic
	b.ResetTimer()
	t, _ := prepareTree(b, snapDB, db.NewMemDB(), keepEvery, keepRecent, 5, keyLen, dataLen)

	// create 5000 versions
	for i := 0; i < 5000; i++ {
		// set 5 keys per version
		for j := 0; j < 5; j++ {
			index := rand.Int63n(100)
			t.Set(keys[index], randBytes(dataLen))
		}
		_, _, err := t.SaveVersion()
		if err != nil {
			b.Errorf("Can't save version %d: %v", i, err)
		}
		// // Pause timer to garbage-collect and remeasure memory usage
		// b.StopTimer()
		// runtime.GC()
		// runtime.ReadMemStats(&mem)
		// // update memSize if it has increased after saveVersion
		// if memSize < mem.Alloc {
		// 	memSize = mem.Alloc
		// 	maxVersion = i
		// }
		// b.StartTimer()
		b.StopTimer()
		runtime.GC()
		b.StartTimer()
	}
	//fmt.Printf("Maximum Memory usage was %0.2f MB at height %d\n", float64(memSize)/1000000, maxVersion)
	b.StopTimer()
}

func BenchmarkPruningStrategies(b *testing.B) {
	ps := []pruningstrat{
		{1, 0}, // default pruning strategy
		//{1, 1},
		{0, 1}, // keep single recent version
		{100, 1},
		{100, 5}, // simple pruning
		{5, 1},
		{5, 2},
		{10, 2},
		// {1000, 10}, // average pruning
		// {1000, 1}, // extreme pruning
		// {10000, 100}, // SDK pruning
	}
	for _, ps := range ps {
		ps := ps
		prefix := fmt.Sprintf("PruningStrategy{%d-%d}-KeyLen:%d-DataLen:%d", ps.keepEvery, ps.keepRecent, 16, 40)

		b.Run(prefix, func(sub *testing.B) {
			runBlockChain(sub, prefix, ps.keepEvery, ps.keepRecent, 16, 40)
		})
	}
}

Review comments on benchmarks/prune_test.go:
- Contributor, on `// var mem runtime.MemStats`: leftover comments
  - Member Author: Currently in the process of getting this to work correctly. Will remove once I've fixed this.
- Contributor, on `// b.StopTimer()` inside the version loop: ditto
- Contributor, on `// {1000, 10}, // average pruning`: ditto
- Contributor, on `// {10000, 100}, // SDK pruning`: Are these commented out on purpose? SDK pruning sounds important to benchmark.
@@ -0,0 +1,9 @@
goos: darwin
goarch: amd64
pkg: github.com/tendermint/iavl/benchmarks
BenchmarkPruningStrategies/PruningStrategy{1-0}-KeyLen:16-DataLen:40-8 1 2837806322 ns/op
BenchmarkPruningStrategies/PruningStrategy{0-1}-KeyLen:16-DataLen:40-8 1 1124373981 ns/op
BenchmarkPruningStrategies/PruningStrategy{100-1}-KeyLen:16-DataLen:40-8 1 1255040658 ns/op
BenchmarkPruningStrategies/PruningStrategy{100-5}-KeyLen:16-DataLen:40-8 1 1459752743 ns/op
PASS
ok github.com/tendermint/iavl/benchmarks 12.375s
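The results can be compared against the default `{1-0}` strategy with a little arithmetic. This sketch hardcodes the `ns/op` figures above; each sub-benchmark ran once (the `1` column), so each figure is the time for one full 5000-version run. The `nsPerOp` map and `speedup` helper are illustrative names, not part of the benchmark code.

```go
package main

import "fmt"

// nsPerOp holds the ns/op figures from the benchmark results above.
var nsPerOp = map[string]float64{
	"{1-0}":   2837806322,
	"{0-1}":   1124373981,
	"{100-1}": 1255040658,
	"{100-5}": 1459752743,
}

// speedup returns how many times faster a strategy ran than the default {1-0}.
func speedup(strategy string) float64 {
	return nsPerOp["{1-0}"] / nsPerOp[strategy]
}

func main() {
	for _, s := range []string{"{0-1}", "{100-1}", "{100-5}"} {
		fmt.Printf("%s: %.2fx faster than {1-0}\n", s, speedup(s))
	}
}
```

On these numbers, `{0-1}` ran about 2.5 times faster than `{1-0}`, `{100-1}` about 2.3 times, and `{100-5}` about 1.9 times.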