docs: Spec on current cachekv implementation #13977

Merged · 10 commits · Dec 28, 2022

store/cachekv/README.md · 151 additions, 0 deletions

# CacheKVStore specification

A `CacheKVStore` is a cache wrapper for a `KVStore`. It extends the operations of the `KVStore` to work with a cache, allowing for reduced I/O operations and more efficient disposal of changes (e.g. after processing a failed transaction).

The core goals the CacheKVStore seeks to achieve are:
* Buffer all writes to the parent store, so they can be dropped if they need to be reverted
* Allow iteration over contiguous spans of keys
* Act as a cache, so we don't repeat I/O to disk for reads we've already done
* Note: We actually fail to achieve this for iteration right now
* Note: Need to consider this getting too large and dropping some cached reads
> **Reviewer:** Could you explain what you mean here? In contrast to the inter-block cache, there is no upper bound on the cache size in a CacheKV.
>
> **@dangush (Contributor Author, Dec 1, 2022):** I believe @ValarDragon (who wrote this part) is referring to considering runtime issues that can be mitigated by bounding the cache. For example, the complexity of running iteration on any size range of keys right now is tied to the overall size of the cache. The best case here would be to design iteration to run relative to the size of the range, but bounding the cache size may be needed / a consideration also.
>
> **Reviewer:** Hmmm, but the current use of CacheKV to scope transactions in memory until writing back to the underlying IAVL doesn't really allow for bounding cache size, right?
>
> **Contributor:** Not that I'm aware of, no.

* Make subsequent reads account for prior buffered writes
> **Reviewer:** Not sure how this is different from the previous point, maybe they can be merged?

* Write all changes to the parent store

We should revisit these goals over time (for instance, it's unclear that all disk writes need to be buffered until the end of the block), but this is the current status.

## Types and Structs

```go
type Store struct {
	mtx           sync.Mutex
	cache         map[string]*cValue
	deleted       map[string]struct{}
	unsortedCache map[string]struct{}
	sortedCache   *dbm.MemDB // always ascending sorted
	parent        types.KVStore
}
```

> **Reviewer (on `sortedCache`):** Just pointing out that #13881 replaces this with a BTree.

The Store struct wraps the underlying `KVStore` (`parent`) with additional data structures for implementing the cache. A mutex is used because IAVL trees (the `KVStore` used in the application) are not safe for concurrent use.

> **@thpani (Nov 24, 2022):** Re "Mutex is used as IAVL trees (the KVStore in application) are not safe for concurrent use": This is very uncertain. Even with the mutex, CacheKV is not fully thread-safe.
>
> **Contributor:** It is not thread safe.
>
> **Contributor:** What are the thread-safety concerns with the implementation apart from the fact that Has() isn't guarded?
>
> **Contributor:** The bigger question is whether cacheKV needs to be thread-safe or not. I am not sure the reason to have a Mutex is that IAVL trees are not safe for concurrent use.
>
> **Contributor:** At the moment, I do not see any current usage patterns that would facilitate the need for concurrent usage.


### `cache`

The main mapping of key-value pairs stored in the cache. This map contains both keys cached from read operations and ‘dirty’ keys whose values differ from what is stored in the underlying `KVStore`.

Values that are mapped to in `cache` are wrapped in a `cValue` struct, which contains the value and a boolean flag (`dirty`) representing whether the value differs from what's in `parent`.

```go
type cValue struct {
value []byte
dirty bool
}
```

### `deleted`

Keys that are to be deleted from `parent` are stored in the `deleted` map. Keys are mapped to an empty struct to implement a set.

### `unsortedCache`

Similar to `deleted`, this is a set of keys that are dirty and will need to be updated in the `KVStore` upon a write. Keys are mapped to an empty struct to implement a set.

### `sortedCache`

A database that will be populated by the keys in `unsortedCache` during iteration over the cache. Keys are always inserted in sorted order.

## CRUD Operations and Writing

The `Set`, `Get`, and `Delete` functions all reference `setCacheValue()`, which is the main entrypoint for mutating `cache` (`Write()` and `Iterator` also mutate it, as the discussion below notes).

> **Reviewer:** Re "which is the only entrypoint to mutating cache": Well, Write() also mutates cache.
>
> **@dangush (Contributor Author):** Ah, a technicality :) I'll make note of it though.
>
> **Collaborator:** Actually Iterator mutates the cache as well, merging the unsorted cache into the sorted one.


`setCacheValue()` is defined as follows:

```go
func (store *Store) setCacheValue(key, value []byte, deleted bool, dirty bool) {
keyStr := conv.UnsafeBytesToStr(key)
store.cache[keyStr] = &cValue{
value: value,
dirty: dirty,
}
if deleted {
store.deleted[keyStr] = struct{}{}
} else {
delete(store.deleted, keyStr)
}
if dirty {
store.unsortedCache[keyStr] = struct{}{}
}
}
```

`setCacheValue()` inserts a key-value pair into `cache`. Two boolean parameters, `deleted` and `dirty`, flag whether the inserted key should also be added to the `deleted` and `unsortedCache` sets, respectively.

### `Get`

`Get` first attempts to return the value from `cache`. If the key does not exist in `cache`, `parent.Get()` is called instead. This value from the parent is passed into `setCacheValue()` with `deleted=false` and `dirty=false`.
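
Sketched against the `Store` fields above, the read path looks roughly like this (the actual implementation also validates the key, but the flow is the same):

```go
func (store *Store) Get(key []byte) []byte {
	store.mtx.Lock()
	defer store.mtx.Unlock()

	// Serve the value from the cache if this key has been seen before.
	if cacheValue, ok := store.cache[conv.UnsafeBytesToStr(key)]; ok {
		return cacheValue.value
	}

	// Cache miss: read through to the parent store and remember the result
	// as a clean (non-dirty) entry.
	value := store.parent.Get(key)
	store.setCacheValue(key, value, false, false)
	return value
}
```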

### `Set`

Inserts or updates the key-value pair in `cache` and marks the key as dirty, so that it is flushed to `parent` on `Write`. Calls `setCacheValue()` with `deleted=false` and `dirty=true`.

> **Reviewer:** Maybe add a short description, similar to the one for Delete below?
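
A corresponding sketch of `Set` (key and value validation omitted):

```go
func (store *Store) Set(key, value []byte) {
	store.mtx.Lock()
	defer store.mtx.Unlock()

	// Record the new value and mark the key dirty so that Write flushes it.
	store.setCacheValue(key, value, false, true)
}
```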


### `Delete`

A value being deleted from the `KVStore` is represented with a `nil` value in `cache`, and an insertion of the key into the `deleted` set.

Calls `setCacheValue()` with `deleted=true` and `dirty=true`.
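
Sketched the same way:

```go
func (store *Store) Delete(key []byte) {
	store.mtx.Lock()
	defer store.mtx.Unlock()

	// A nil value marks the key as deleted in the cache; deleted=true also
	// adds the key to the deleted set inside setCacheValue.
	store.setCacheValue(key, nil, true, true)
}
```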

### `Write`

Values in the cache are written to `parent` in ascending order of their keys.

A slice of all dirty keys in `cache` is made, then sorted in increasing order. These keys are iterated over to update `parent`.

If a key is marked for deletion (checked with `isDeleted()`), then `parent.Delete()` is called. Otherwise, `parent.Set()` is called to update the underlying `KVStore` with the value in cache.
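
A condensed sketch of `Write` following this description (the actual implementation also resets the internal caches after flushing):

```go
func (store *Store) Write() {
	store.mtx.Lock()
	defer store.mtx.Unlock()

	// Collect every dirty key, then sort so writes reach the parent store
	// in ascending key order.
	keys := make([]string, 0, len(store.cache))
	for key, cv := range store.cache {
		if cv.dirty {
			keys = append(keys, key)
		}
	}
	sort.Strings(keys)

	for _, key := range keys {
		if store.isDeleted(key) {
			store.parent.Delete([]byte(key))
			continue
		}
		if cacheValue := store.cache[key]; cacheValue.value != nil {
			store.parent.Set([]byte(key), cacheValue.value)
		}
	}
}
```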

## Iteration

Efficient iteration over keys in the `KVStore` is important for generating Merkle range proofs. Iteration over a `CacheKVStore` requires producing all key-value pairs from the underlying `KVStore` while taking into account updated values from the cache. Iterators are created over a key interval `[start, end)`, which becomes important in what follows.

> **Reviewer:** Re "Efficient iteration over keys in KVStore is important for generating Merkle range proofs": Not sure I'd call it efficient, the iterators are also backed by a B-tree. You actually say below that iteration does not benefit from caching.
>
> **Collaborator:** Does it have anything to do with "Merkle range proofs" here?
>
> **@dangush (Contributor Author):** I meant to describe the motivation for a fast time complexity for the iterator function. Is this any better? "Generating Merkle range proofs requires iteration over the KVStore, making the efficiency of the iterator a potential bottleneck."
>
> **Collaborator:** I think the Merkle proofs can only be generated at the iavl level, the KVStore interfaces don't provide those functionalities.
>
> **@dangush (Contributor Author):** Hm, I'm actually not too sure how this works then. #12885 reduced the time for invariant checks for JUNO by nearly 100%. The invariants being checked involved generating/checking Merkle proofs to verify state, no? That's what I assumed is a big use of the iterator.
>
> **Collaborator:** I guess the invariant checks were to check the invariants within the state, not checking proofs I think.
>
> **Reviewer:** We should say here that iterators range over a key interval [start, end), as it becomes important below.


In the current implementation, there is no guarantee that all values in `parent` have been cached. As a result, iteration is achieved by iterating through both `parent` and the cache (failing to actually benefit from caching).

[cacheMergeIterator](https://github.com/cosmos/cosmos-sdk/blob/d8391cb6796d770b02448bee70b865d824e43449/store/cachekv/mergeiterator.go) combines an iterator over `parent` and an iterator over the cache into a single iterator. This merged iterator walks the keys from both iterators in a shared lexicographic order, and overrides the value provided by the parent iterator whenever the same key is dirty or deleted in the cache.
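
A minimal sketch of the merge rule described above, assuming two positioned iterators (`parentIter` over `parent`, `cacheIter` over the cache) exposing `Key()`/`Value()`. The real `cacheMergeIterator` also advances the iterators, supports descending order, and skips cache entries that represent deletions:

```go
// pick returns the key-value pair the merged iterator should surface next:
// the smaller key wins, and on a tie the cache entry overrides the parent.
func pick(parentIter, cacheIter types.Iterator) (key, value []byte) {
	switch bytes.Compare(parentIter.Key(), cacheIter.Key()) {
	case -1: // parent key is smaller: surface the parent entry
		return parentIter.Key(), parentIter.Value()
	case 0: // same key: the cached (dirty or deleted) value shadows the parent
		return cacheIter.Key(), cacheIter.Value()
	default: // cache key is smaller: surface the cache entry
		return cacheIter.Key(), cacheIter.Value()
	}
}
```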

### Implementation Overview

Iterators over `parent` and the cache are generated and passed into `cacheMergeIterator`, which returns a single iterator. Implementation of the `parent` iterator is up to the underlying `KVStore`. The rest of the implementation details here will cover the generation of the cache iterator.

Generating the cache iterator can be decomposed into four parts:

1. Finding all keys that exist in the range we are iterating over
2. Sorting this list of keys
3. Inserting these keys into `sortedCache` and removing them from `unsortedCache`
4. Returning an iterator over `sortedCache` with the desired range

Currently, the implementation for the first two parts is split into two cases, depending on the size of the unsorted cache. The two cases are as follows.

If the size of `unsortedCache` is less than `minSortSize` (currently 1024), a linear time approach is taken to search over keys.

```go
n := len(store.unsortedCache)
unsorted := make([]*kv.Pair, 0)

if n < minSortSize {
for key := range store.unsortedCache {
if dbm.IsKeyInDomain(conv.UnsafeStrToBytes(key), start, end) {
cacheValue := store.cache[key]
unsorted = append(unsorted, &kv.Pair{Key: []byte(key), Value: cacheValue.value})
}
}
store.clearUnsortedCacheSubset(unsorted, stateUnsorted)
return
}
```

Here, we iterate through all the keys in `unsortedCache`, collecting those that fall within the `[start, end)` domain into a slice called `unsorted`.

At this point, part 3 is achieved by `clearUnsortedCacheSubset()`. This function iterates through `unsorted`, removing each key from `unsortedCache`. Afterwards, `unsorted` is sorted. Lastly, it iterates through the now-sorted slice, inserting each key-value pair into `sortedCache`; any key marked for deletion is inserted with an arbitrary placeholder value (`[]byte{}`).
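
A sketch of `clearUnsortedCacheSubset()` consistent with that description, assuming a `sortState` type behind the `stateUnsorted` constant used in the snippet above:

```go
func (store *Store) clearUnsortedCacheSubset(unsorted []*kv.Pair, state sortState) {
	// Remove every processed key from the unsorted set.
	for _, item := range unsorted {
		delete(store.unsortedCache, conv.UnsafeBytesToStr(item.Key))
	}

	// Sort, unless the caller already handed over a sorted slice.
	if state == stateUnsorted {
		sort.Slice(unsorted, func(i, j int) bool {
			return bytes.Compare(unsorted[i].Key, unsorted[j].Key) < 0
		})
	}

	// Move the pairs into sortedCache. Deleted keys carry a nil value, so
	// they are stored with an arbitrary placeholder value instead.
	for _, item := range unsorted {
		value := item.Value
		if value == nil {
			value = []byte{}
		}
		if err := store.sortedCache.Set(item.Key, value); err != nil {
			panic(err)
		}
	}
}
```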

In the case that the size of `unsortedCache` is larger than `minSortSize`, a linear time approach to finding keys within the desired range is too slow to use. Instead, a slice of all keys in `unsortedCache` is sorted, and binary search is used to find the beginning and ending indices of the desired range. This produces an already-sorted slice that is passed into the same `clearUnsortedCacheSubset()` function. An iota identifier (`sortedState`) is used to skip the sorting step in the function.
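
A sketch of this larger-cache path, with `sort.SearchStrings` standing in for the binary search and `sortedState` as the already-sorted marker mentioned above (nil `start`/`end` bounds and other edge cases omitted):

```go
// Sort every key in unsortedCache once, then binary-search for the window
// covered by [start, end).
strL := make([]string, 0, len(store.unsortedCache))
for key := range store.unsortedCache {
	strL = append(strL, key)
}
sort.Strings(strL)

startIndex := sort.SearchStrings(strL, string(start)) // first key >= start
endIndex := sort.SearchStrings(strL, string(end))     // first key >= end (exclusive)

// The window is already sorted, so the helper can skip its sorting step.
kvL := make([]*kv.Pair, 0, endIndex-startIndex)
for _, key := range strL[startIndex:endIndex] {
	cacheValue := store.cache[key]
	kvL = append(kvL, &kv.Pair{Key: []byte(key), Value: cacheValue.value})
}
store.clearUnsortedCacheSubset(kvL, sortedState)
```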

Finally, part 4. is achieved with `memIterator`, which implements an iterator over the items in `sortedCache`.

As of [PR #12885](https://github.com/cosmos/cosmos-sdk/pull/12885), an optimization to the binary search case mitigates the overhead of sorting the entirety of the key set in `unsortedCache`. To avoid wasting the compute spent sorting, we should ensure that a reasonable amount of values are removed from `unsortedCache`. If the length of the range for iteration is less than `minSortedCache`, we widen the range of values for removal from `unsortedCache` to be up to `minSortedCache` in length. This amortizes the cost of processing elements across multiple calls.
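
A hypothetical illustration of the widening step (variable and constant names are illustrative, not the exact SDK code):

```go
// If the requested [startIndex, endIndex) window is narrower than
// minSortedCache, extend it so that at least that many entries are moved
// out of unsortedCache on this call, amortizing the cost of the sort.
if endIndex-startIndex < minSortedCache {
	endIndex = startIndex + minSortedCache
	if endIndex > len(strL) {
		endIndex = len(strL)
	}
}
```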