Add InternalMerge Command #196
Conversation
@@ -493,6 +493,22 @@ func internalSnapshotCopyArgs(key []byte, endKey []byte, maxResults int64, snaps
return args, reply
}

// internalMergeArgs returns a InternalMergeRequest and InternalMergeResponse
// pair addressed to the default repilca for the specified key.  The request
replica
also, no double space after dot
Done.
Could you clean up the following outdated comments in our codebase while you're at it?
@@ -47,6 +47,11 @@ const (
// end key up to some maximum number of results from the given snapshot_id.
// It will create a snapshot if snapshot_id is empty.
InternalSnapshotCopy = "InternalSnapshotCopy"
// InternalMerge merges a given value into the specified key. Merge is a
// high-performance operation provided by underlying data storage for values
// which are accumulated over several writes. Because it cannot be made
s/cannot be made/is not/
LGTM
// InternalMerge is used to merge a value into an existing key. Merge is an
// efficient accumulation operation which is exposed by RocksDB, used by
// Cockroach for the efficient accumulation of certain values. Due to the
// difficulty of making these operations transactional, merges are not currently
Is it difficult to make a merge transactional just because it's hard to make it efficient? Because a merge is really just a read followed by a write.
RocksDB merges are way more involved than just a read followed by a write; they are really going for high performance here.
A Merge usually goes to memory and doesn't touch the actual key. As more merge requests pile up, RocksDB will pre-merge them in memory (still not touching the actual key), assuming associativity. Only when you actually want to know what the value is (or probably automatically when there's a convenient moment to do so) will RocksDB merge into the stored key and write the update.
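For illustration, here's a minimal Go sketch of that pre-merge idea (hypothetical toy code, not RocksDB's actual merge operator): pending operands get folded together without ever reading the stored value, which only works because the operation is associative.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// mergeOperands folds a queue of pending merge operands into one,
// relying on associativity: (a+b)+c == a+(b+c). RocksDB does an
// analogous in-memory pre-merge before ever touching the stored key.
// Each operand here is a hypothetical little-endian int64 counter delta.
func mergeOperands(operands [][]byte) []byte {
	var sum int64
	for _, op := range operands {
		sum += int64(binary.LittleEndian.Uint64(op))
	}
	out := make([]byte, 8)
	binary.LittleEndian.PutUint64(out, uint64(sum))
	return out
}

func main() {
	enc := func(v int64) []byte {
		b := make([]byte, 8)
		binary.LittleEndian.PutUint64(b, uint64(v))
		return b
	}
	// Three merge requests queue up; none of them reads the stored value.
	pending := [][]byte{enc(5), enc(-2), enc(10)}
	fmt.Println(int64(binary.LittleEndian.Uint64(mergeOperands(pending)))) // 13
}
```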
Ah, interesting. And yet it seems difficult to have properly ordered time series data without transactions -- assuming these values are even replicated without transactions.
Every command is replicated, including non-transactional changes.
Time series data is always ordered because every piece of data is paired with a timestamp. The individual samples can be written out of order, and the ordering can easily be corrected by the client when the data is read. There are concerns regarding dropped writes, but monitoring systems are designed to account for that possibility.
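A minimal Go sketch of that client-side reordering (the `sample` type and its field names are made up for illustration, not CockroachDB's actual schema):

```go
package main

import (
	"fmt"
	"sort"
)

// sample is a hypothetical datapoint as a client might receive it.
type sample struct {
	timestampNanos int64
	value          float64
}

func main() {
	// Samples arrive out of order (e.g., written by concurrent merges).
	samples := []sample{
		{timestampNanos: 30, value: 0.3},
		{timestampNanos: 10, value: 0.1},
		{timestampNanos: 20, value: 0.2},
	}
	// The client restores ordering at read time using the paired timestamps.
	sort.Slice(samples, func(i, j int) bool {
		return samples[i].timestampNanos < samples[j].timestampNanos
	})
	fmt.Println(samples) // [{10 0.1} {20 0.2} {30 0.3}]
}
```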
Force-pushed from a96360a to 122f55d
// InternalMerge merges a given value into the specified key. Merge is a
// high-performance operation provided by underlying data storage for values
// which are accumulated over several writes. Because it is not
// transactional, Merge is currently not made available to external clients.
s/not transactional/requires context-specific implementation and is not transactional/
LGTM
Add the InternalMerge command, which replaces the anticipated (but never
implemented) AccumulateTS command. InternalMerge is similar to a put in that it
writes a single value to a key; however, the supplied value is merged into
the existing value instead of replacing it.

Merging is not transactional, and thus it is only exposed internally for use
in certain operations which require high write performance (such as the
accumulation of time series data).
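A toy Go sketch of the put-versus-merge distinction the commit message describes (the byte-concatenation merge rule here is purely illustrative, not what the actual storage engine does):

```go
package main

import "fmt"

// toyStore contrasts Put (replace) with Merge (accumulate).
type toyStore struct{ m map[string][]byte }

// put overwrites whatever value the key held before.
func (s *toyStore) put(key string, value []byte) {
	s.m[key] = value
}

// merge folds the new value into the existing one instead of replacing it.
func (s *toyStore) merge(key string, value []byte) {
	s.m[key] = append(s.m[key], value...)
}

func main() {
	s := &toyStore{m: map[string][]byte{}}
	s.put("k", []byte("a"))
	s.put("k", []byte("b"))
	fmt.Printf("%s\n", s.m["k"]) // "b": the second put replaced the first

	s.merge("k", []byte("c"))
	s.merge("k", []byte("d"))
	fmt.Printf("%s\n", s.m["k"]) // "bcd": the merges accumulated
}
```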