Rebase per-key actor epoch (kv679) on 2.0 [JIRA: RIAK-1331] #1070
Use the counter to create a per-epoch actor for the key when the local object is not found.
When the counter hits its limit, create a new vnodeid.
Start a new epoch if the local object has a lower epoch for the vnode id than the incoming object (having no epoch counts as lower; no epoch is treated as epoch zero).
Bad match for the re-refactored put_merge return. Correct return type for highest_actor.
Use the bare vnodeid as long as possible, and only start a new epoch when needed. In a way the bare vnodeid is just epoch zero, so use it. Also, consider the incoming counter when deciding about new epochs: an incoming clock with the same actor+epoch but a greater counter is a hint that a byzantine failure occurred and a new epoch is needed.
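A minimal sketch of the epoch decision described in these commits, assuming a hypothetical actor encoding in which the bare vnode id is epoch zero and an epoch actor is the vnode id with a 32-bit epoch appended. The module name `kv679_epoch_sketch` and the encoding are illustrative, not riak_kv's code, and clocks are plain `{Actor, Counter}` lists rather than riak_object vclocks.

```erlang
-module(kv679_epoch_sketch).
-export([needs_new_epoch/3]).

%% Hypothetical encoding (an assumption, not riak_kv's exact format):
%% bare vnode id = epoch 0; <<VId/binary, Epoch:32>> = epoch Epoch.
%% Assumes fixed-length vnode ids so the suffix cannot be ambiguous.
actor_epoch(VId, VId) ->
    {ok, 0};
actor_epoch(VId, Actor) when is_binary(Actor) ->
    Size = byte_size(VId),
    case Actor of
        <<VId:Size/binary, Epoch:32/integer>> -> {ok, Epoch};
        _ -> nomatch
    end;
actor_epoch(_VId, _Actor) ->
    nomatch.

%% Highest {Epoch, Counter} this vnode has in Clock, or 'none'.
%% Tuples compare element-wise, so lists:max/1 picks the highest epoch
%% and, within that epoch, the highest counter.
highest_actor(VId, Clock) ->
    case [{E, C} || {A, C} <- Clock, {ok, E} <- [actor_epoch(VId, A)]] of
        [] -> none;
        Entries -> lists:max(Entries)
    end.

%% true when the vnode should start a new epoch for this key:
%%  - local has no entry for the vnode but incoming does (no epoch is lower);
%%  - incoming has a higher epoch for this vnode than local;
%%  - same epoch but a greater incoming counter: a hint of byzantine
%%    failure (e.g. local data lost or rolled back).
needs_new_epoch(VId, LocalClock, IncomingClock) ->
    case {highest_actor(VId, LocalClock), highest_actor(VId, IncomingClock)} of
        {_, none} -> false;
        {none, _} -> true;
        {{LocalE, _}, {InE, _}} when InE > LocalE -> true;
        {{E, LocalC}, {E, InC}} when InC > LocalC -> true;
        _ -> false
    end.
```

While `needs_new_epoch/3` returns `false`, the vnode keeps writing with its current actor, which matches the "use the bare vnodeid as long as possible" intent.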
NOTE: make dialyzer exits with an error, but gives no information. Could do with some help on that.
including such hits as "What is fold (baby don't recurse me)?" and "it's an IF not a CASE"
Refactor riak_kv_backend basic test. Why? It depended on order: it didn't use the mutated mod state from one test to the next (so it implicitly expected side effects).
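The fold-based refactor can be illustrated with a small, hypothetical helper (`backend_test_fold_sketch` is not part of riak_kv). Each test step receives the backend state produced by the previous step and returns `{Result, NewState}`, so the dependency between steps is explicit in the data rather than an implicit side effect.

```erlang
-module(backend_test_fold_sketch).
-export([run_steps/2]).

%% Run Steps in order, threading the (mutated) backend state from one
%% step to the next with lists:foldl/3. Each step is fun(State) -> {Result, NewState}.
%% Returns the per-step results in order plus the final state.
run_steps(Steps, State0) ->
    {RevResults, FinalState} =
        lists:foldl(fun(Step, {Acc, S}) ->
                            {R, S1} = Step(S),
                            {[R | Acc], S1}
                    end, {[], State0}, Steps),
    {lists:reverse(RevResults), FinalState}.
```

For example, with a map standing in for the backend's mod state, a `put` step followed by a `get` step sees the value written by the first step only because the state is passed along.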
Mr @slfritchie gave me some review over email. Reproduced below are his comments and my replies. As usual he was right and insightful, and I made the changes he suggested. On 19 Dec 2014, at 07:22, Scott Lystig Fritchie wrote:
I have set it to 20 seconds; do you think it needs more?
This is all new code; where in 1.4 do we have such a timeout?
These are things that cannot be done without an object format change as … They are nice-to-have, as they solve the byzantine case of data loss.
Made it 10k.
The overflow protection is in the status manager; you can't hand out …
OK, we discussed this, and the answer is a smaller Max Lease size, and …
Will remove.
I hope it doesn't matter, but I'll give it a poke.
Will do, thanks.
Ooops.
First draft of a test to exercise riak_kv_vnode_status_mgr in a manner similar to the way it is used by riak_kv_vnode. The eqc property exercises the vnode_status_mgr_driver API, which in turn interacts with the riak_kv_vnode_status_mgr module.
In `src/riak_kv_vnode.erl`:

```diff
@@ -110,6 +110,21 @@
                 reqid :: term(),
                 target :: pid()}).
 
+-record(counter_state, {
```
Minor nit: suggest renaming this record to epoch_counter (and giving the state record field that same name) to be more explicit
Not sure I agree. It's just a counter: the keys get epochs.
Ok, but it's not a purposeless counter, it has a specific role. Is there a good name for its role?
Whatever man. Rename it if you like. Thanks for the contribution.
Naming is hard. I think the inline comments make up for any possible confusion about its purpose.
Hey @macintux, I'm sorry, that wasn't cool on my part. This is at least the 3rd review, and I appreciate more eyes. It's late and I shouldn't have responded until morning. What I mean is, if you want to rename it, please be my guest, maybe it would help a future maintainer understand the purpose of the counter.
Bike and sleep deprived. No sweat. I'll keep reading and thinking.
I'm seeing the following failure when running
@andrewjstone Those are fixed by #1071.
Thanks @seancribbs
```erlang
    #counter_state{lease_size=LeaseSize} = CounterState,
    Start = os:timestamp(),
    receive
        {'EXIT', Pid, Reason} ->
```
We would need to call `process_flag(trap_exit, true)` in `init/1` to get an `'EXIT'` message. See "Receiving Exit Signals" here.
Are we sure we want to actually trap exits in the vnode though? @jonmeredith @jtuple
We already trap exits in `riak_core_vnode`: https://github.com/basho/riak_core/blob/develop/src/riak_core_vnode.erl#L186

Ahhh, thank you @seancribbs.

In `src/riak_kv_vnode.erl` (#1070 diff):

```erlang
    blocking_lease_counter(State, {0, MaxErrs, MaxTime}).

-spec blocking_lease_counter(#state{}, {Errors :: non_neg_integer(),
                                        MaxErrors :: non_neg_integer(),
                                        TimeRemainingMillis :: non_neg_integer()}
                            ) ->
                                    {ok, #state{}} |
                                    counter_lease_error().
blocking_lease_counter(_State, {MaxErrs, MaxErrs, _MaxTime}) ->
    {error, counter_lease_max_errors};
blocking_lease_counter(State, {ErrCnt, MaxErrors, MaxTime}) ->
    #state{idx=Index, vnodeid=VId, status_mgr_pid=Pid, counter=CounterState} = State,
    #counter_state{lease_size=LeaseSize} = CounterState,
    Start = os:timestamp(),
    receive
        {'EXIT', Pid, Reason} ->
```
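A hedged sketch of how a blocking lease request with an error budget and a deadline might look, modelled on the `blocking_lease_counter` spec above. Only the `{Errors, MaxErrors, TimeRemainingMillis}` argument and `counter_lease_max_errors` come from the diff; the module name, the message shapes (`{lease_request, From}`, `{lease, Pid, Lease}`, `{lease_error, Pid, Why}`), the `counter_lease_timeout` atom and the fake status manager are hypothetical. As discussed in the thread, the `{'EXIT', Pid, Reason}` clause only fires when the caller traps exits, which the vnode gets from `riak_core_vnode`.

```erlang
-module(lease_wait_sketch).
-export([blocking_lease/2, fake_status_mgr/1]).

%% Error budget exhausted: give up (same result as the PR's first clause).
blocking_lease(_Pid, {MaxErrs, MaxErrs, _TimeLeft}) ->
    {error, counter_lease_max_errors};
%% Deadline passed between retries (hypothetical result atom).
blocking_lease(_Pid, {_ErrCnt, _MaxErrs, TimeLeft}) when TimeLeft =< 0 ->
    {error, counter_lease_timeout};
blocking_lease(Pid, {ErrCnt, MaxErrs, TimeLeft}) ->
    Start = os:timestamp(),
    Pid ! {lease_request, self()},
    receive
        %% Only delivered as a message if this process traps exits
        %% and is linked to Pid.
        {'EXIT', Pid, Reason} ->
            {error, {status_mgr_exit, Reason}};
        {lease, Pid, Lease} ->
            {ok, Lease};
        {lease_error, Pid, _Why} ->
            %% Charge the elapsed time against the deadline, count the
            %% error and retry.
            ElapsedMs = timer:now_diff(os:timestamp(), Start) div 1000,
            blocking_lease(Pid, {ErrCnt + 1, MaxErrs, TimeLeft - ElapsedMs})
    after TimeLeft ->
        {error, counter_lease_timeout}
    end.

%% Test double: fails the first Failures requests, then grants lease 500.
fake_status_mgr(Failures) ->
    receive
        {lease_request, From} when Failures > 0 ->
            From ! {lease_error, self(), simulated_failure},
            fake_status_mgr(Failures - 1);
        {lease_request, From} ->
            From ! {lease, self(), 500},
            fake_status_mgr(0)
    end.
```

Carrying the remaining time in the argument, rather than a fixed per-attempt timeout, bounds the total wait no matter how many retries happen.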
@russelldb This PR looks very good. I've gone over all the code and ran the eqc test a few times. Tomorrow I will go over the corresponding riak_test PR and run that against this code.
Arrggh. I messed up. Need to redo that change; hold off on re-review for now.
Urgh, I went and rebased again. And I don't know how to "unrebase" and I don't want to force push. Give me a bit of time to wrestle with the tools here.
There we go.
👍 d77f65f
…ase2.0 Per-key actor epochs for vnode vclock entries Reviewed-by: andrewjstone
I was checking this over to see if you had already inserted a metric for when the epoch increments (or leases are granted, etc.), and found you have a stray file in the PR,
See #679 and the associated platform_task RFC and summary.
This PR addresses the "doomstone", backup-restore, and some byzantine flavours of the kv679 bug.
The RFC explains the mechanism in detail, but briefly:

- add a vnode counter that is persisted to disk (persisted with leases, async)
- whenever a key is written to for the "first time" by a vnode, create an epoch actor for the key by concatenating vnodeid+counter (and increment the counter)
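A minimal sketch of the two points above, under stated assumptions: the counter is only used up to a persisted lease, so after a crash the vnode can resume from the lease value without ever reusing a counter value; and an epoch actor is the vnode id with the counter appended as a 32-bit integer. The module, record fields, 32-bit width and the `new_vnodeid_needed` result (mirroring the "When it hits that limit create a new vnodeid" commit) are hypothetical, and persisting the lease is left to the caller.

```erlang
-module(epoch_actor_sketch).
-export([new/2, grant_lease/2, restart/2, next_epoch_actor/1]).

-define(MAX_CNT, 16#FFFFFFFF). %% assumption: counter fits in 32 bits

-record(ctr, {vnodeid :: binary(),
              cnt = 0 :: non_neg_integer(),     %% last counter handed out
              lease = 0 :: non_neg_integer(),   %% highest persisted lease
              lease_size :: pos_integer()}).

new(VnodeId, LeaseSize) ->
    #ctr{vnodeid = VnodeId, lease_size = LeaseSize}.

%% Call once NewLease has been durably persisted.
grant_lease(C, NewLease) ->
    C#ctr{lease = NewLease}.

%% After a crash: skip straight to the persisted lease, so counter values
%% that may have been used before the crash are never handed out again.
restart(C, PersistedLease) ->
    C#ctr{cnt = PersistedLease, lease = PersistedLease}.

next_epoch_actor(#ctr{cnt = Cnt}) when Cnt >= ?MAX_CNT ->
    {error, new_vnodeid_needed};
next_epoch_actor(#ctr{cnt = Cnt, lease = Lease, lease_size = Size})
  when Cnt >= Lease ->
    %% Lease used up: caller must persist this value, then grant_lease/2.
    {lease_needed, min(Cnt + Size, ?MAX_CNT)};
next_epoch_actor(C = #ctr{vnodeid = VId, cnt = Cnt}) ->
    Next = Cnt + 1,
    {ok, <<VId/binary, Next:32/integer>>, C#ctr{cnt = Next}}.
```

Persisting only once per lease, rather than once per counter increment, is what keeps the disk cost low; the price is that up to one lease's worth of counter values is skipped after each restart.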
This ensures that a first-time write for a key gets a new actor; this is the epoch for the key. It means we don't mix up the {a, 1} event of a deleted and re-created key with the original {a, 1} event for that key: there is an actor per epoch, without causing a keyspace-wide actor explosion.
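To make the {a, 1} confusion concrete, here is a toy sketch (plain `{Actor, Counter}` lists, hypothetical module name) of why a replica drops a re-created key's write when the actor is reused, and keeps it when the write carries a fresh epoch actor. In this simplified model a replica discards an incoming write whose clock its own clock already descends.

```erlang
-module(doomstone_sketch).
-export([descends/2, new_write_survives/2]).

%% Clock A descends clock B if every actor in B appears in A with a
%% counter at least as large.
descends(A, B) ->
    lists:all(fun({Actor, CntB}) ->
                      case lists:keyfind(Actor, 1, A) of
                          {Actor, CntA} -> CntA >= CntB;
                          false -> false
                      end
              end, B).

%% The incoming write is silently dropped if the existing replica's
%% clock already descends it (the replica believes it has seen it).
new_write_survives(ExistingClock, IncomingClock) ->
    not descends(ExistingClock, IncomingClock).
```

With a reused actor the re-created key's `{a, 1}` looks like an event the replica (still holding the old tombstone) has already seen; with an epoch actor such as `<<"a", 1:32>>` it is a new event and survives.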
There are riak_tests at https://github.com/basho/riak_test/tree/rdb/gh-kv679
This has been long-running work; there are PRs #1040 and #1053 that are closed in favour of this.