Skip to content

gholt/store

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ValueStore

Development Repository

Package store provides a disk-backed data structure for use in storing []byte values referenced by 128 bit keys with options for replication.

It can handle billions of keys (as memory allows) and full concurrent access across many cores. All location information about each key is stored in memory for speed, but values are stored on disk with the exception of recently written data being buffered first and batched to disk later.

This has been written with SSDs in mind, but spinning drives should work also; though storing toc files (Table Of Contents, key location information) on a separate disk from values files is recommended in that case.

Each key is two 64bit values, known as keyA and keyB uint64 values. These are usually created by a hashing function of the key name, but that duty is left outside this package.

Each modification is recorded with an int64 timestamp that is the number of microseconds since the Unix epoch (see github.com/gholt/brimtime.TimeToUnixMicro). With a write and delete for the exact same timestamp, the delete wins. This allows a delete to be issued for a specific write without fear of deleting any newer write.

Internally, each modification is stored with a uint64 timestamp that is equivalent to (brimtime.TimeToUnixMicro(time.Now())<<8) with the lowest 8 bits used to indicate deletions and other bookkeeping items. This means that the allowable time range is 1970-01-01 00:00:00 +0000 UTC (+1 microsecond because all zeroes indicates a missing item) to 4253-05-31 22:20:37.927935 +0000 UTC. There are constants TIMESTAMPMICRO_MIN and TIMESTAMPMICRO_MAX available for bounding usage.

There are background tasks for:

  • TombstoneDiscard: This will discard older tombstones (deletion markers). Tombstones are kept for Config.TombstoneAge seconds and are used to ensure a replicated older value doesn't resurrect a deleted value. But, keeping all tombstones for all time is a waste of resources, so they are discarded over time. Config.TombstoneAge controls how long they should be kept and should be set to an amount greater than several replication passes.

  • PullReplication: This will continually send out pull replication requests for all the partitions the ValueStore is responsible for, as determined by the Config.MsgRing. The other responsible parties will respond to these requests with data they have that was missing from the pull replication request. Bloom filters are used to reduce bandwidth which has the downside that a very small percentage of items may be missed each pass. A moving salt is used with each bloom filter so that after a few passes there is an exceptionally high probability that all items will be accounted for.

  • PushReplication: This will continually send out any data for any partitions the ValueStore is not responsible for, as determined by the Config.MsgRing. The responsible parties will respond to these requests with acknowledgements of the data they received, allowing the requester to discard the out of place data.

  • Compaction: TODO description.

  • Audit: This will verify the data on disk has not been corrupted. It will slowly read data over time and validate checksums. If it finds issues, it will try to remove affected entries the in-memory location map so that replication from other stores will send the information they have and the values will get re-stored locally. In cases where the affected entries cannot be determined, it will make a callback requesting the store be shutdown and restarted; this restart will result in the affected keys being missing and therefore replicated in by other stores.

Note that if the disk gets filled past a configurable threshold, any external writes other than deletes will result in error. Internal writes such as compaction and removing successfully push-replicated data will continue.

There is also a modified form of ValueStore called GroupStore that expands the primary key to two 128 bit keys and offers a Lookup method which retrieves all matching items for the first key.

API Documentation

This is the latest development area for the package.
Eventually a stable version of the package will be established but, for now, all things about this package are subject to change.

Copyright See AUTHORS. All rights reserved.
Use of this source code is governed by a BSD-style
license that can be found in the LICENSE file.

About

Experimental keyhash/value store.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages