executor: reduce memory usage and GC overhead for hash join. #2957

coocood · 2017-03-29T13:32:46Z

Implement and use MVMap to reduce GC overhead and memory usage for hash join.

coocood · 2017-03-29T13:33:01Z

Benchmark results will be added later.

hanfei1991 · 2017-03-30T06:00:01Z

executor/join.go

-			e.hashTable[string(hashcode)] = []*Row{row}
-		} else {
-			e.hashTable[string(hashcode)] = append(rows, row)
+		buffer = buffer[:0]


Why do this?

Reuse the buffer memory.
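To illustrate the reply above: re-slicing to length zero keeps the backing array, so each iteration can append into memory that is already allocated. This is a minimal sketch, not the code from the PR; `encodeKey` is a hypothetical stand-in for the real key encoder.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeKey is a hypothetical stand-in for the join-key encoder.
// It appends the varint encoding of each value to buf and returns
// the extended slice.
func encodeKey(buf []byte, vals ...int64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	for _, v := range vals {
		n := binary.PutVarint(tmp[:], v)
		buf = append(buf, tmp[:n]...)
	}
	return buf
}

func main() {
	buffer := make([]byte, 0, 64)
	for _, row := range [][]int64{{1, 2}, {3, 4}, {5, 6}} {
		// Length becomes 0 but capacity and backing array are kept,
		// so the appends below do not allocate while the key fits.
		buffer = buffer[:0]
		buffer = encodeKey(buffer, row...)
		fmt.Println(len(buffer), cap(buffer))
	}
}
```

The reuse is only safe when whoever receives the encoded bytes copies them before the next iteration, since the next `encodeKey` call overwrites the same memory.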

hanfei1991 · 2017-03-30T06:01:48Z

executor/join.go

+	if err != nil {
+		return nil, errors.Trace(err)
+	}
+	return b, nil


return b, errors.Trace(err)

hanfei1991 · 2017-03-30T06:03:57Z

executor/join.go

+
+func (e *HashJoinExec) encodeRowKey(b []byte, rowKey *RowKeyEntry) []byte {
+	b = codec.EncodeVarint(b, rowKey.Handle)
+	for i, tn := range e.tableNames {


Every row's table name is the same. Do we need to encode it?

hanfei1991 · 2017-03-30T06:13:00Z

util/mvmap/mvmap.go

+	m.hashFunc = fnv.New64()
+	m.entryStore.slices = [][]entry{make([]entry, 0, 64)}
+	// append first empty entry so zero entry pointer can represent null.
+	m.entryStore.put(entry{})


put nullEntryAddr

hanfei1991 · 2017-03-30T06:22:09Z

util/mvmap/mvmap.go

+func (ds *dataStore) put(key, value []byte) dataAddr {
+	dataLen := uint32(len(key) + len(value))
+	if ds.sliceLen != 0 && ds.sliceLen+dataLen > maxDataSliceLen {
+		ds.slices = append(ds.slices, make([]byte, 0, maxDataSliceLen))


Should the cap be max(maxDataSliceLen, dataLen)?
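A sketch of what this suggestion could look like, assuming a simplified `dataStore`. Only `put`, `slices`, `sliceLen`, `dataAddr` and `maxDataSliceLen` appear in the diff; the `sliceIdx` and `offset` fields, the `newDataStore` helper and the tiny constant value are assumptions made for the demo.

```go
package main

import "fmt"

// Deliberately tiny so the demo triggers a new slice; the real value
// is not shown in the diff.
const maxDataSliceLen = 16

// dataAddr locates a key/value pair (field names are assumptions).
type dataAddr struct {
	sliceIdx uint32
	offset   uint32
}

type dataStore struct {
	slices   [][]byte
	sliceIdx uint32 // index of the slice currently being filled
	sliceLen uint32 // bytes used in the current slice
}

func newDataStore() *dataStore {
	return &dataStore{slices: [][]byte{make([]byte, 0, maxDataSliceLen)}}
}

func (ds *dataStore) put(key, value []byte) dataAddr {
	dataLen := uint32(len(key) + len(value))
	if ds.sliceLen != 0 && ds.sliceLen+dataLen > maxDataSliceLen {
		// Size the new slice for the larger of the two, so an
		// oversized pair does not force append to reallocate.
		capacity := uint32(maxDataSliceLen)
		if dataLen > capacity {
			capacity = dataLen
		}
		ds.slices = append(ds.slices, make([]byte, 0, capacity))
		ds.sliceIdx++
		ds.sliceLen = 0
	}
	addr := dataAddr{sliceIdx: ds.sliceIdx, offset: ds.sliceLen}
	s := append(ds.slices[ds.sliceIdx], key...)
	s = append(s, value...)
	ds.slices[ds.sliceIdx] = s
	ds.sliceLen += dataLen
	return addr
}

func main() {
	ds := newDataStore()
	fmt.Println(ds.put([]byte("k1"), []byte("v1")))
	fmt.Println(ds.put([]byte("a-long-key-"), []byte("a-long-value")))
	fmt.Println(len(ds.slices), cap(ds.slices[1]))
}
```

Without the larger capacity, `append` would still grow the slice correctly, so the change saves a reallocation rather than fixing a correctness bug.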

hanfei1991 · 2017-03-30T06:30:33Z

util/mvmap/mvmap.go

+	hashKey := m.hash(key)
+	entryAddr := m.hashTable[hashKey]
+	for entryAddr != nullEntryAddr {
+		he := m.entryStore.get(entryAddr)


Why is it named he? It confuses me.

hanfei1991 · 2017-03-30T06:33:38Z

util/mvmap/mvmap.go

+	for entryAddr != nullEntryAddr {
+		he := m.entryStore.get(entryAddr)
+		entryAddr = he.next
+		k, v := m.dataStore.get(he)


Can we implement compare-and-get? If key differs from k, don't fetch v.
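One possible shape for such a compare-and-get, as a sketch under assumptions: the entry layout (`offset`, `keyLen`, `valLen`) and the single flat `data` slice are hypothetical and only illustrate the idea of checking the key before touching the value.

```go
package main

import (
	"bytes"
	"fmt"
)

// entry locates one key/value pair inside a flat byte slice.
// The layout is an assumption made for this sketch.
type entry struct {
	offset uint32
	keyLen uint32
	valLen uint32
}

// getIfKeyMatches returns the stored value only when the stored key
// equals key. On a mismatch the value bytes are never sliced out.
func getIfKeyMatches(data []byte, e entry, key []byte) ([]byte, bool) {
	if uint32(len(key)) != e.keyLen {
		return nil, false // cheap length check first
	}
	if !bytes.Equal(data[e.offset:e.offset+e.keyLen], key) {
		return nil, false
	}
	valStart := e.offset + e.keyLen
	return data[valStart : valStart+e.valLen], true
}

func main() {
	// Two pairs stored back to back: apple/pie and kiwi/jam.
	data := []byte("applepiekiwijam")
	entries := []entry{
		{offset: 0, keyLen: 5, valLen: 3},
		{offset: 8, keyLen: 4, valLen: 3},
	}
	for _, e := range entries {
		if v, ok := getIfKeyMatches(data, e, []byte("kiwi")); ok {
			fmt.Printf("%s\n", v)
		}
	}
}
```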

tiancaiamao · 2017-03-30T07:14:08Z

util/mvmap/mvmap.go

+// MVMap stores multiple value for a given key with minimum GC overhead.
+// A given key can store multiple values.
+// It is not thread-safe, should only be used in one goroutine.
+type MVMap struct {


This struct is complex. What's the benefit over this?

type KeyType []byte
type ValueType [][]byte
map[KeyType]ValueType

If the key type and value type of a map contain no pointers, the Go garbage collector simply skips the map.
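For context on this question, note that a slice type cannot be used as a Go map key, so the alias form above would not compile as written, and both `[]byte` and `[][]byte` contain pointers. The sketch below shows one way to get a map whose key and value types are genuinely pointer-free: map a 64-bit hash to an index into an entry slice, with index 0 reserved as "null". It is a simplified illustration of the idea, not the MVMap from this PR; all names are hypothetical and it uses one flat data slice instead of chunked storage.

```go
package main

import (
	"bytes"
	"fmt"
	"hash/fnv"
)

// entry holds offsets only, so the entries slice contains no pointers.
type entry struct {
	keyOff, keyLen uint32
	valOff, valLen uint32
	next           uint32 // next entry with the same hash; 0 means end
}

type multiMap struct {
	heads   map[uint64]uint32 // pointer-free key and value types
	entries []entry
	data    []byte
}

func newMultiMap() *multiMap {
	return &multiMap{
		heads:   make(map[uint64]uint32),
		entries: []entry{{}}, // index 0 is reserved as the null entry
	}
}

func hashKey(key []byte) uint64 {
	h := fnv.New64()
	h.Write(key)
	return h.Sum64()
}

// Put copies key and value into data and links a new entry at the
// head of the chain for the key's hash.
func (m *multiMap) Put(key, value []byte) {
	h := hashKey(key)
	e := entry{keyOff: uint32(len(m.data)), keyLen: uint32(len(key)), next: m.heads[h]}
	m.data = append(m.data, key...)
	e.valOff, e.valLen = uint32(len(m.data)), uint32(len(value))
	m.data = append(m.data, value...)
	m.entries = append(m.entries, e)
	m.heads[h] = uint32(len(m.entries) - 1)
}

// Get returns every value stored under key, most recent first.
func (m *multiMap) Get(key []byte) [][]byte {
	var values [][]byte
	for idx := m.heads[hashKey(key)]; idx != 0; idx = m.entries[idx].next {
		e := m.entries[idx]
		if bytes.Equal(m.data[e.keyOff:e.keyOff+e.keyLen], key) {
			values = append(values, m.data[e.valOff:e.valOff+e.valLen])
		}
	}
	return values
}

func main() {
	m := newMultiMap()
	m.Put([]byte("a"), []byte("1"))
	m.Put([]byte("b"), []byte("2"))
	m.Put([]byte("a"), []byte("3"))
	for _, v := range m.Get([]byte("a")) {
		fmt.Printf("%s\n", v)
	}
}
```

The cost of this layout is the extra bookkeeping the comment calls complex. Whether it pays off is an empirical question, which the benchmark numbers later in the thread address.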

tiancaiamao · 2017-03-30T07:16:35Z

executor/join.go

+	if err != nil {
+		return nil, errors.Trace(err)
+	}
+	row.Data = values


I'm confused by the Row struct: it has both a Data field and a RowKeys field. What's the difference?

tiancaiamao · 2017-03-30T07:20:16Z

After changing Row to []byte, does the reduced GC cost outweigh the encode/decode overhead?

coocood · 2017-03-30T10:50:02Z

Tested joining two tables with 1 million rows each. The TiDB process memory grew to 4GB, slightly less than on master. This is probably because GC is slower than memory allocation.

./benchdb -run="create|insert:0_1000000"
./benchdb -table="benchdb2" -run="create|insert:0_1000000"

select * from benchdb, benchdb2 where benchdb.id = benchdb2.id and benchdb.exp + benchdb2.exp < 0;

Performance is about 10% faster than master when the hash table size is 1 million, but slower for smaller hash table sizes.

I'll try to implement lazy decode on hash join, and see the difference.

hanfei1991 · 2017-03-30T12:01:51Z

LGTM

coocood · 2017-03-30T12:22:43Z

After removing the extra memory allocation, the benchmark result is:

memory:

master: 3.3GB
encode-row: 3.0GB

time:

master: 7.6s
encode-row: 6.8s

tiancaiamao · 2017-03-30T14:30:52Z

LGTM

executor: reduce memory usage and GC overhead for hash join.

7ac57c7

Implemented and use MVMap to reduce GC overhead and memory usage for hash join.

hanfei1991 reviewed Mar 30, 2017

tiancaiamao reviewed Mar 30, 2017

*: address comment

ab03536

hanfei1991 added the status/LGT1 label Mar 30, 2017

Merge branch 'master' into coocood/hashjoin-encode-row

1cdf6d9

tiancaiamao approved these changes Mar 30, 2017

coocood merged commit a168f41 into master Mar 30, 2017

coocood deleted the coocood/hashjoin-encode-row branch March 30, 2017 15:04
