executor: reduce memory usage and GC overhead for hash join. #2957
Implement and use MVMap to reduce GC overhead and memory usage for hash join.
Benchmark results will be added later.
e.hashTable[string(hashcode)] = []*Row{row}
} else {
e.hashTable[string(hashcode)] = append(rows, row)
buffer = buffer[:0]
Why do this?
Reuse the buffer memory.
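A minimal sketch of the buffer-reuse idiom behind `buffer = buffer[:0]`. The `encodeKey` helper and the key format are hypothetical stand-ins, not the PR's actual encoder; the point is only that truncating the slice to zero length keeps its capacity, so later iterations append without allocating a new backing array.

```go
package main

import "fmt"

// encodeKey is a hypothetical stand-in for the join-key encoder: it appends
// an encoded form of id to buf and returns the extended slice.
func encodeKey(buf []byte, id int) []byte {
	return append(buf, fmt.Sprintf("key-%d", id)...)
}

// buildKeys encodes n keys while reusing a single scratch buffer.
func buildKeys(n int) []string {
	keys := make([]string, 0, n)
	buffer := make([]byte, 0, 64)
	for i := 0; i < n; i++ {
		buffer = encodeKey(buffer, i)
		// string(buffer) copies the bytes, so reusing the buffer afterwards is safe.
		keys = append(keys, string(buffer))
		// Reset the length to zero but keep the capacity for the next iteration.
		buffer = buffer[:0]
	}
	return keys
}

func main() {
	fmt.Println(buildKeys(3)) // prints [key-0 key-1 key-2]
}
```

The reuse is only safe when the consumer copies the bytes (as the string conversion does here); a consumer that retains the slice itself would see it overwritten.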
executor/join.go
Outdated
if err != nil {
return nil, errors.Trace(err)
}
return b, nil
return b, errors.Trace(err)
executor/join.go
Outdated
func (e *HashJoinExec) encodeRowKey(b []byte, rowKey *RowKeyEntry) []byte {
b = codec.EncodeVarint(b, rowKey.Handle)
for i, tn := range e.tableNames {
Every row's table name is the same. Do we need to encode it?
m.hashFunc = fnv.New64()
m.entryStore.slices = [][]entry{make([]entry, 0, 64)}
// append first empty entry so zero entry pointer can represent null.
m.entryStore.put(entry{})
put nullEntryAddr
util/mvmap/mvmap.go
Outdated
func (ds *dataStore) put(key, value []byte) dataAddr {
dataLen := uint32(len(key) + len(value))
if ds.sliceLen != 0 && ds.sliceLen+dataLen > maxDataSliceLen {
ds.slices = append(ds.slices, make([]byte, 0, maxDataSliceLen))
Should the cap be max(maxDataSliceLen, dataLen)?
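A rough sketch of the suggestion above: when a new chunk is started, its capacity is the larger of the default chunk size and the record size, so an oversized record still fits in one chunk without a reallocation. The names `chunkStore` and `maxChunkLen` are illustrative assumptions and the logic is simplified from the PR's `dataStore`.

```go
package main

import "fmt"

// maxChunkLen is a hypothetical default chunk size, kept tiny for demonstration.
const maxChunkLen = 8

// chunkStore appends byte records into a list of fixed-capacity chunks.
type chunkStore struct {
	chunks [][]byte
}

// put appends data to the last chunk, starting a new chunk when the record
// would not fit. It returns the chunk index and the offset inside that chunk.
func (s *chunkStore) put(data []byte) (chunk, offset int) {
	last := len(s.chunks) - 1
	if last < 0 || len(s.chunks[last])+len(data) > cap(s.chunks[last]) {
		// Capacity is max(maxChunkLen, len(data)) so a large record fits.
		c := maxChunkLen
		if len(data) > c {
			c = len(data)
		}
		s.chunks = append(s.chunks, make([]byte, 0, c))
		last++
	}
	offset = len(s.chunks[last])
	s.chunks[last] = append(s.chunks[last], data...)
	return last, offset
}

func main() {
	s := &chunkStore{}
	fmt.Println(s.put([]byte("abcdef")))       // fits in the first chunk
	fmt.Println(s.put([]byte("gh")))           // still fits in the first chunk
	fmt.Println(s.put([]byte("0123456789ab"))) // oversized: gets its own larger chunk
}
```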
util/mvmap/mvmap.go
Outdated
hashKey := m.hash(key)
entryAddr := m.hashTable[hashKey]
for entryAddr != nullEntryAddr {
he := m.entryStore.get(entryAddr)
Why is it named he? The name is confusing.
util/mvmap/mvmap.go
Outdated
for entryAddr != nullEntryAddr {
he := m.entryStore.get(entryAddr)
entryAddr = he.next
k, v := m.dataStore.get(he)
Can we implement a compare-and-get? If key is different from k, don't fetch v.
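One way the compare-and-get idea could look, as a sketch only: the stored key is compared first, and the value is sliced out of the backing bytes only when the key matches. The `entry` layout and the function names here are assumptions for illustration and do not mirror the PR's `dataStore` API.

```go
package main

import (
	"bytes"
	"fmt"
)

// entry locates one key/value record inside a flat byte slice.
// This layout is an assumption made for the example.
type entry struct {
	offset uint32
	keyLen uint32
	valLen uint32
}

// appendRecord stores key followed by value and returns the grown slice
// together with the entry describing where the record lives.
func appendRecord(data, key, val []byte) ([]byte, entry) {
	e := entry{
		offset: uint32(len(data)),
		keyLen: uint32(len(key)),
		valLen: uint32(len(val)),
	}
	data = append(data, key...)
	data = append(data, val...)
	return data, e
}

// valueIfKeyEquals compares the stored key with key and returns the value
// only on a match, so mismatching entries never touch the value bytes.
func valueIfKeyEquals(data []byte, e entry, key []byte) ([]byte, bool) {
	if uint32(len(key)) != e.keyLen {
		return nil, false // a length mismatch is the cheapest rejection
	}
	keyEnd := e.offset + e.keyLen
	if !bytes.Equal(data[e.offset:keyEnd], key) {
		return nil, false
	}
	return data[keyEnd : keyEnd+e.valLen], true
}

func main() {
	data, e := appendRecord(nil, []byte("id42"), []byte("row-bytes"))
	v, ok := valueIfKeyEquals(data, e, []byte("id42"))
	fmt.Println(string(v), ok)
	_, ok = valueIfKeyEquals(data, e, []byte("id43"))
	fmt.Println(ok)
}
```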
// MVMap stores multiple values for a given key with minimum GC overhead.
// A given key can store multiple values.
// It is not thread-safe, should only be used in one goroutine.
type MVMap struct {
This struct is complex. What's the benefit compared with this?
type KeyType []byte
type ValueType [][]byte
map[KeyType]ValueType
If the key type and value type of a map contain no pointers, the Go garbage collector simply skips scanning the map.
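To illustrate the point about pointer-free maps, here is a simplified sketch of a multi-value map whose hash table has integer keys and values, with all key and value bytes packed into one flat slice. The names (`multiMap`, `entry`, `hashKey`) and the single-slice layout are assumptions for this example; the PR's MVMap splits its stores into multiple slices and differs in detail.

```go
package main

import (
	"bytes"
	"fmt"
	"hash/fnv"
)

// entry describes one record in data and links to the previous entry
// that hashed to the same bucket. It contains no pointers.
type entry struct {
	offset, keyLen, valLen uint32
	next                   uint32 // index into entries; 0 means "no next"
}

// multiMap stores several values per key without pointer-typed map contents.
type multiMap struct {
	table   map[uint64]uint32 // hash -> index of newest entry
	entries []entry           // index 0 is reserved as the null entry
	data    []byte            // key bytes followed by value bytes, per record
}

func newMultiMap() *multiMap {
	return &multiMap{
		table:   make(map[uint64]uint32),
		entries: make([]entry, 1),
	}
}

func hashKey(key []byte) uint64 {
	h := fnv.New64()
	h.Write(key)
	return h.Sum64()
}

// Put appends a record and makes it the head of its hash chain.
func (m *multiMap) Put(key, value []byte) {
	h := hashKey(key)
	e := entry{
		offset: uint32(len(m.data)),
		keyLen: uint32(len(key)),
		valLen: uint32(len(value)),
		next:   m.table[h], // 0 when the hash is not present yet
	}
	m.data = append(m.data, key...)
	m.data = append(m.data, value...)
	m.entries = append(m.entries, e)
	m.table[h] = uint32(len(m.entries) - 1)
}

// Get walks the hash chain and returns every value stored under key,
// newest first. Keys are compared to filter out hash collisions.
func (m *multiMap) Get(key []byte) [][]byte {
	var values [][]byte
	for idx := m.table[hashKey(key)]; idx != 0; idx = m.entries[idx].next {
		e := m.entries[idx]
		keyEnd := e.offset + e.keyLen
		if bytes.Equal(m.data[e.offset:keyEnd], key) {
			values = append(values, m.data[keyEnd:keyEnd+e.valLen])
		}
	}
	return values
}

func main() {
	m := newMultiMap()
	m.Put([]byte("a"), []byte("1"))
	m.Put([]byte("b"), []byte("2"))
	m.Put([]byte("a"), []byte("3"))
	for _, v := range m.Get([]byte("a")) {
		fmt.Println(string(v))
	}
}
```

In this layout the collector sees one map without pointers plus two pointer-free slices, rather than one heap object per row as with a map of row pointers.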
if err != nil {
return nil, errors.Trace(err)
}
row.Data = values
I'm confused by the Row struct: it has both a Data field and a RowKeys field. What's the difference?
Does the reduced GC cost make up for the encode/decode overhead after changing Row to []byte?
I tested joining two tables with 1 million rows. The TiDB process memory grew to 4GB, slightly less than on master, probably because GC is slower than memory allocation.
The performance is about 10% faster than master when the hash table size is 1 million, but slower for smaller hash table sizes. I'll try to implement lazy decoding in hash join and see the difference.
LGTM
After removing the extra memory allocation:
memory:
time:
LGTM