statistics: add lru cache implement for statsCacheInner #34145

Closed · wants to merge 6 commits
2 changes: 2 additions & 0 deletions metrics/metrics.go
@@ -165,6 +165,8 @@ func RegisterMetrics() {
	prometheus.MustRegister(CPUProfileCounter)
	prometheus.MustRegister(ReadFromTableCacheCounter)
	prometheus.MustRegister(LoadTableCacheDurationHistogram)
	prometheus.MustRegister(StatsCacheLRUCounter)
	prometheus.MustRegister(StatsCacheLRUMemUsage)

	tikvmetrics.InitMetrics(TiDB, TiKVClient)
	tikvmetrics.RegisterMetrics()
14 changes: 14 additions & 0 deletions metrics/stats.go
@@ -128,4 +128,18 @@ var (
Help: "Bucketed histogram of latency time (ms) of stats read during sync-load.",
Buckets: prometheus.ExponentialBuckets(1, 2, 22), // 1ms ~ 1h
})

	StatsCacheLRUCounter = prometheus.NewCounterVec(prometheus.CounterOpts{
		Namespace: "tidb",
		Subsystem: "statistics",
		Name:      "cache_lru",
		Help:      "The counter of stats cache LRU operations",
	}, []string{LblType})

	StatsCacheLRUMemUsage = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Namespace: "tidb",
		Subsystem: "statistics",
		Name:      "cache_mem_usage",
		Help:      "The gauge of stats cache memory usage",
	}, []string{LblType})
Contributor:
Are you going to make changes to tidb.json in another PR?

)
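
For reference, a minimal sketch of how these two collectors could be exercised in a test, using client_golang's testutil package (the label values mirror the typHit/typTrack constants defined in lru_cache.go below; the test itself is not part of this PR):

package metrics_test

import (
	"testing"

	"github.com/pingcap/tidb/metrics"
	"github.com/prometheus/client_golang/prometheus/testutil"
	"github.com/stretchr/testify/require"
)

func TestStatsCacheLRUMetrics(t *testing.T) {
	// Increment the hit counter and set the tracked-memory gauge,
	// just like internalLRUCache does after each operation.
	metrics.StatsCacheLRUCounter.WithLabelValues("hit").Inc()
	metrics.StatsCacheLRUMemUsage.WithLabelValues("track").Set(1024)
	require.Equal(t, 1.0, testutil.ToFloat64(metrics.StatsCacheLRUCounter.WithLabelValues("hit")))
	require.Equal(t, 1024.0, testutil.ToFloat64(metrics.StatsCacheLRUMemUsage.WithLabelValues("track")))
}
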
336 changes: 336 additions & 0 deletions statistics/handle/lru_cache.go
@@ -0,0 +1,336 @@
// Copyright 2022 PingCAP, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package handle

import (
"container/list"
"sync"

"github.com/pingcap/tidb/metrics"
"github.com/pingcap/tidb/statistics"
)

const (
	typHit    = "hit"
	typMiss   = "miss"
	typUpdate = "update"
	typDel    = "del"
	typEvict  = "evict"
	typCopy   = "copy"
	typTrack  = "track"
	typTotal  = "total"
)

// cacheItem wraps Key and Value. It's the value of list.Element.
type cacheItem struct {
	key         int64
	value       *statistics.Table
	tblMemUsage *statistics.TableMemoryUsage
}

// internalLRUCache is a simple least recently used cache
type internalLRUCache struct {
	sync.RWMutex
	capacity int64
	// trackingCost records the tracking memory usage of the elements stored in the internalLRUCache.
	// The eviction policy keeps trackingCost under capacity.
	trackingCost int64
	// totalCost records the total memory usage of the elements stored in the internalLRUCache.
	totalCost int64
	elements  map[int64]*list.Element
	// cache maintains the elements in a list.
	// Note that an element whose tracking memory usage is 0 is removed from the list, to keep the list from growing too long.
	cache *list.List
}

// newInternalLRUCache returns internalLRUCache
func newInternalLRUCache(capacity int64) *internalLRUCache {
	if capacity < 1 {
		panic("capacity of LRU Cache should be at least 1.")
	}
	return &internalLRUCache{
		capacity: capacity,
		elements: make(map[int64]*list.Element),
		cache:    list.New(),
	}
}

// Get tries to find the corresponding value according to the given key.
func (l *internalLRUCache) Get(key int64) (*statistics.Table, bool) {
	l.Lock()
	r, hit := l.get(key)
	l.Unlock()
	if hit {
		metrics.StatsCacheLRUCounter.WithLabelValues(typHit).Inc()
	} else {
		metrics.StatsCacheLRUCounter.WithLabelValues(typMiss).Inc()
	}
	return r, hit
}

func (l *internalLRUCache) get(key int64) (*statistics.Table, bool) {
	element, exists := l.elements[key]
	if !exists {
		return nil, false
	}
	l.cache.MoveToFront(element)
Contributor:
Should we maintain the lru list for Column instead of Table?

Contributor Author (@Yisaer, Apr 24, 2022):
I think we can't, because with the current code logic the basic recently-used element is the table.

E.g. for now we only know the updated table IDs in the following code, so it's hard to maintain columns in the LRU instead of tables.

// update updates the statistics table cache using copy on write.
func (sc statsCache) update(tables []*statistics.Table, deletedIDs []int64, newVersion uint64) statsCache {
	newCache := sc.copy()
	if newVersion == newCache.version {
		newCache.minorVersion += uint64(1)
	} else {
		newCache.version = newVersion
		newCache.minorVersion = uint64(0)
	}
	for _, tbl := range tables {
		id := tbl.PhysicalID
		newCache.Put(id, tbl)
	}
	for _, id := range deletedIDs {
		newCache.Del(id)
	}
	return newCache
}

Contributor:
Agree, the information about which columns are used after getTableStats is everywhere. We can leverage CollectColumnStatsUsage later to maintain an LRU of columns.

Contributor:
And we should only MoveToFront for GetByQuery.

	return element.Value.(*cacheItem).value, true
}
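
Following up on the comment above, a minimal sketch of a promotion-only lookup, if MoveToFront were reserved for query-driven reads (GetByQuery is the reviewer's name for it; it is not part of this diff):

// GetByQuery looks the table up and promotes it, so that only real query
// usage affects the eviction order; a plain Get would then skip MoveToFront.
func (l *internalLRUCache) GetByQuery(key int64) (*statistics.Table, bool) {
	l.Lock()
	defer l.Unlock()
	element, exists := l.elements[key]
	if !exists {
		return nil, false
	}
	l.cache.MoveToFront(element)
	return element.Value.(*cacheItem).value, true
}
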

// Put puts the (key, value) pair into the LRU Cache.
func (l *internalLRUCache) Put(key int64, value *statistics.Table) {
	l.Lock()
	l.put(key, value, value.MemoryUsage(), true)
	trackingCost := l.trackingCost
	totalCost := l.totalCost
	l.Unlock()
	metrics.StatsCacheLRUCounter.WithLabelValues(typUpdate).Inc()
	metrics.StatsCacheLRUMemUsage.WithLabelValues(typTrack).Set(float64(trackingCost))
	metrics.StatsCacheLRUMemUsage.WithLabelValues(typTotal).Set(float64(totalCost))
}
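
Taken together, a minimal usage sketch of the cache as implemented in this file (tbl is assumed to be a populated *statistics.Table; the capacity is arbitrary):

cache := newInternalLRUCache(1 << 30) // capacity for tracked column memory, in bytes
cache.Put(tbl.PhysicalID, tbl)        // counted as typUpdate; may trigger eviction
if cached, ok := cache.Get(tbl.PhysicalID); ok {
	_ = cached // hit: the entry was moved to the front of the LRU list
}
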

func (l *internalLRUCache) put(key int64, value *statistics.Table, tblMemUsage *statistics.TableMemoryUsage, tryEvict bool) {
	// If the item's TotalColTrackingMemUsage alone is larger than the capacity, drop some of its structures so it can fit in the cache.
	for l.capacity < tblMemUsage.TotalColTrackingMemUsage() {
Contributor:
Why do we need this for loop?

		for _, col := range value.Columns {
			col.DropEvicted()
			tblMemUsage = value.MemoryUsage()
			if l.capacity >= tblMemUsage.TotalColTrackingMemUsage() {
				break
			}
		}
	}
	defer func() {
		if tryEvict {
			l.evictIfNeeded()
		}
	}()
	element, exists := l.elements[key]
	if exists {
		oldMemUsage := element.Value.(*cacheItem).tblMemUsage
		element.Value.(*cacheItem).value = value
		element.Value.(*cacheItem).tblMemUsage = tblMemUsage
		l.calculateCost(tblMemUsage, oldMemUsage)
		l.maintainList(element, nil, tblMemUsage.TotalColTrackingMemUsage(), oldMemUsage.TotalColTrackingMemUsage())
Contributor:
Is it possible that oldMemUsage.TotalColTrackingMemUsage() == 0 and newMemUsage.TotalColTrackingMemUsage() > 0? If so, we will add nil into the list.

		return
	}
	newCacheEntry := &cacheItem{
		key:         key,
		value:       value,
		tblMemUsage: tblMemUsage,
	}
	l.calculateCost(tblMemUsage, &statistics.TableMemoryUsage{})
	// We push it to the front of the cache list first; if the element's TotalColTrackingMemUsage is 0, it will be
	// removed from the cache list in l.maintainList.
	element = l.cache.PushFront(newCacheEntry)
Contributor:
Put means nothing in terms of "least recently used"; it could be an update from reading the latest stats from kv.

	l.maintainList(element, newCacheEntry, tblMemUsage.TotalColTrackingMemUsage(), 1)
Contributor:
It seems that newCacheEntry would not be used in maintainList? Is it better to pass nil in?

Contributor:
Emmm, I'm not sure about it.

	l.elements[key] = element
}
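
One possible shape for the guard the reviewer asks about above (an old tracking usage of 0 means the element was dropped from the list, so the update path would push a nil item back in). This is only a sketch of a fix, not part of the diff; the key point is keeping the elements map in sync with whatever maintainList returns:

// hypothetical replacement for the `exists` branch of put:
item := element.Value.(*cacheItem)
oldMemUsage := item.tblMemUsage
item.value = value
item.tblMemUsage = tblMemUsage
l.calculateCost(tblMemUsage, oldMemUsage)
// pass the item itself so that a re-insert pushes real data, never nil
if newElem := l.maintainList(element, item, tblMemUsage.TotalColTrackingMemUsage(), oldMemUsage.TotalColTrackingMemUsage()); newElem != nil {
	l.elements[key] = newElem
}
return
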

// Del deletes the key-value pair from the LRU Cache.
func (l *internalLRUCache) Del(key int64) {
	l.Lock()
	del := l.del(key)
	trackingCost := l.trackingCost
	totalCost := l.totalCost
	l.Unlock()
	if del {
		metrics.StatsCacheLRUCounter.WithLabelValues(typDel).Inc()
		metrics.StatsCacheLRUMemUsage.WithLabelValues(typTrack).Set(float64(trackingCost))
		metrics.StatsCacheLRUMemUsage.WithLabelValues(typTotal).Set(float64(totalCost))
	}
}

func (l *internalLRUCache) del(key int64) bool {
	element, exists := l.elements[key]
	if !exists {
		return false
	}
	delete(l.elements, key)
	memUsage := element.Value.(*cacheItem).tblMemUsage
	l.calculateCost(&statistics.TableMemoryUsage{}, memUsage)
	l.maintainList(element, nil, 0, 1)
	return true
}

// Cost returns the current cost
func (l *internalLRUCache) Cost() int64 {
	l.RLock()
	defer l.RUnlock()
	return l.totalCost
}

// Keys returns the current Keys
func (l *internalLRUCache) Keys() []int64 {
	l.RLock()
	defer l.RUnlock()
	r := make([]int64, 0, len(l.elements))
	for _, v := range l.elements {
		r = append(r, v.Value.(*cacheItem).key)
	}
	return r
}

// Values returns the current Values
func (l *internalLRUCache) Values() []*statistics.Table {
	l.RLock()
	defer l.RUnlock()
	r := make([]*statistics.Table, 0, len(l.elements))
	for _, v := range l.elements {
		r = append(r, v.Value.(*cacheItem).value)
	}
	return r
}

// Map returns the map of table statistics
func (l *internalLRUCache) Map() map[int64]*statistics.Table {
	l.RLock()
	defer l.RUnlock()
	r := make(map[int64]*statistics.Table, len(l.elements))
	for k, v := range l.elements {
		r[k] = v.Value.(*cacheItem).value
	}
	return r
}

// Len returns the current length
func (l *internalLRUCache) Len() int {
	l.RLock()
	defer l.RUnlock()
	return len(l.elements)
}

// FreshMemUsage re-calculates the memory usage of every element in the cache.
func (l *internalLRUCache) FreshMemUsage() {
	l.Lock()
	for _, v := range l.elements {
		item := v.Value.(*cacheItem)
		oldMemUsage := item.tblMemUsage
		newMemUsage := item.value.MemoryUsage()
		item.tblMemUsage = newMemUsage
		l.calculateCost(newMemUsage, oldMemUsage)
		l.maintainList(v, nil, newMemUsage.TotalColTrackingMemUsage(), oldMemUsage.TotalColTrackingMemUsage())
	}
	l.evictIfNeeded()
	totalCost := l.totalCost
	trackingCost := l.trackingCost
	l.Unlock()
Contributor:
Better to start with defer unless your logic between Lock and Unlock is quite simple.

	metrics.StatsCacheLRUMemUsage.WithLabelValues(typTrack).Set(float64(trackingCost))
	metrics.StatsCacheLRUMemUsage.WithLabelValues(typTotal).Set(float64(totalCost))
}
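
A sketch of the defer-based shape the reviewer suggests above (note the original deliberately unlocks before touching metrics; with defer, the metric updates move under the lock):

func (l *internalLRUCache) FreshMemUsage() {
	l.Lock()
	defer l.Unlock()
	for _, v := range l.elements {
		item := v.Value.(*cacheItem)
		oldMemUsage := item.tblMemUsage
		newMemUsage := item.value.MemoryUsage()
		item.tblMemUsage = newMemUsage
		l.calculateCost(newMemUsage, oldMemUsage)
		l.maintainList(v, nil, newMemUsage.TotalColTrackingMemUsage(), oldMemUsage.TotalColTrackingMemUsage())
	}
	l.evictIfNeeded()
	metrics.StatsCacheLRUMemUsage.WithLabelValues(typTrack).Set(float64(l.trackingCost))
	metrics.StatsCacheLRUMemUsage.WithLabelValues(typTotal).Set(float64(l.totalCost))
}
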

// FreshTableCost re-calculates the memory usage for the given key.
func (l *internalLRUCache) FreshTableCost(key int64) {
	l.Lock()
	calculated := l.calculateTableCost(key)
	totalCost := l.totalCost
	trackingCost := l.trackingCost
	l.Unlock()
	if calculated {
		metrics.StatsCacheLRUMemUsage.WithLabelValues(typTrack).Set(float64(trackingCost))
		metrics.StatsCacheLRUMemUsage.WithLabelValues(typTotal).Set(float64(totalCost))
	}
}

func (l *internalLRUCache) calculateTableCost(key int64) bool {
	element, exists := l.elements[key]
	if !exists {
		return false
	}
	item := element.Value.(*cacheItem)
	l.put(item.key, item.value, item.value.MemoryUsage(), true)
	return true
}

// Copy returns a replica of the LRU cache.
func (l *internalLRUCache) Copy() statsCacheInner {
	var newCache *internalLRUCache
	l.RLock()
	newCache = newInternalLRUCache(l.capacity)
	// Walk from the back so that re-putting preserves the LRU order.
	node := l.cache.Back()
	for node != nil {
		key := node.Value.(*cacheItem).key
		value := node.Value.(*cacheItem).value
		tblMemUsage := node.Value.(*cacheItem).tblMemUsage
		newCache.put(key, value, tblMemUsage, false)
		node = node.Prev()
	}
	l.RUnlock()
	metrics.StatsCacheLRUCounter.WithLabelValues(typCopy).Inc()
	return newCache
}

// evictIfNeeded evicts tables' column structures in order to keep the tracking cost under the capacity.
// If an element has no structure left that can be evicted, it is removed from the list.
func (l *internalLRUCache) evictIfNeeded() {
	curr := l.cache.Back()
	evicted := false
	for !l.underCapacity() {
		evicted = true
		item := curr.Value.(*cacheItem)
		tbl := item.value
		oldMemUsage := item.tblMemUsage
		prev := curr.Prev()
		for _, col := range tbl.Columns {
			if col.IsEvicted() {
				continue
			}
			col.DropEvicted()
			newMemUsage := tbl.MemoryUsage()
			item.tblMemUsage = newMemUsage
			l.calculateCost(newMemUsage, oldMemUsage)
			if l.underCapacity() {
				break
Contributor:
Since we are maintaining the LRU at the table level, why not evict the CMSketch for all columns of this table directly, to make it simple?

			}
		}
		newMemUsage := tbl.MemoryUsage()
		if newMemUsage.TotalColTrackingMemUsage() < 1 {
			l.maintainList(curr, nil, newMemUsage.TotalColTrackingMemUsage(), oldMemUsage.TotalColTrackingMemUsage())
		}
		curr = prev
	}
	if evicted {
		metrics.StatsCacheLRUCounter.WithLabelValues(typEvict).Inc()
	}
}
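
For comparison, a sketch of the simpler per-table eviction the reviewer proposes above: drop every column's evictable structures at once instead of probing one column at a time. Only a suggestion, not what this diff does:

// hypothetical simplification of the inner loop of evictIfNeeded:
for _, col := range tbl.Columns {
	col.DropEvicted() // drops the evictable structures (e.g. CMSketch) of each column
}
item.tblMemUsage = tbl.MemoryUsage()
l.calculateCost(item.tblMemUsage, oldMemUsage)
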

func (l *internalLRUCache) calculateCost(newUsage, oldUsage *statistics.TableMemoryUsage) {
	l.totalCost += newUsage.TotalMemUsage - oldUsage.TotalMemUsage
	l.trackingCost += newUsage.TotalColTrackingMemUsage() - oldUsage.TotalColTrackingMemUsage()
}

// maintainList maintains the elements in the list cache.
// For oldTotalColTrackingMemUsage>0 && newTotalColTrackingMemUsage>0, the element is updated.
// For oldTotalColTrackingMemUsage>0 && newTotalColTrackingMemUsage=0, the element is removed.
// For oldTotalColTrackingMemUsage=0 && newTotalColTrackingMemUsage>0, a new element is inserted.
// For oldTotalColTrackingMemUsage=0 && newTotalColTrackingMemUsage=0, we do nothing.
func (l *internalLRUCache) maintainList(element *list.Element, item *cacheItem, newTotalColTrackingMemUsage, oldTotalColTrackingMemUsage int64) *list.Element {
	if oldTotalColTrackingMemUsage > 0 {
		if newTotalColTrackingMemUsage > 0 {
			l.cache.MoveToFront(element)
Contributor:
I think we don't need to MoveToFront on update: the table stats could simply have been updated by auto-analyze, which means nothing to the LRU order.
And I think oldTotalColTrackingMemUsage=0 doesn't mean the element does not exist; it could be that the CMSketches of this table are all evicted.
I think we don't need to maintainList with memUsage; just do PushFront for a newly inserted table, Remove for a deleted table, and MoveToFront for a table hit by GetByQuery.

Contributor:
Just keep the LRU list consistent with the map; the only use of the LRU list is to provide the eviction order.

			return element
		}
		l.cache.Remove(element)
		return nil
	}
	if newTotalColTrackingMemUsage > 0 {
		return l.cache.PushFront(item)
	}
	return nil
}
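
For comparison, a sketch of the reduced list maintenance the reviewers describe above (PushFront on insert, Remove on delete, MoveToFront on query hit; the helper names are hypothetical):

func (l *internalLRUCache) onInsert(item *cacheItem) *list.Element { return l.cache.PushFront(item) }
func (l *internalLRUCache) onDelete(element *list.Element)         { l.cache.Remove(element) }
func (l *internalLRUCache) onQueryHit(element *list.Element)       { l.cache.MoveToFront(element) }
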

func (l *internalLRUCache) underCapacity() bool {
	return l.trackingCost <= l.capacity
}