Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rebase per-key actor epoch (kv679) on 2.0 [JIRA: RIAK-1331] #1070

Closed
wants to merge 33 commits into from

Conversation

russelldb
Copy link
Member

See #679 and the associated platform_task RFC and summary.

This PR addresses the "doomstone", backup-restore, and some byzantine flavours of the kv679 bug.

The RFC explains the mechanism in detail but briefly:

add a persisted to disk vnode counter (persisted with leases, aysnc)
when ever a key is written to for the "first time" by a vnode, create an epoch actor for the key by concatenating vnodeid+counter (and increment the counter)
This ensures that a first time write for a key gets a new actor, this is the epoch for the key. It means we don't mix up deleted+re-created keys {a,1} event with the original {a, 1} event for some key, by ensuring an actor per epoch, without causing a keyspace wide actor explosion.

There are riak_tests at https://github.com/basho/riak_test/tree/rdb/gh-kv679


This has been long running work, there is a PR here #1040 and here /pull/1053 that are closed in favour of this.


  • review
  • benchmarking
  • riak_test review
  • add Kelly's EQC test

use the counter to create a per epoch key when local not found.
When it hits that limit create a new vnodeid.
Start a new epoch if the local object has a lower epoch for the
vnode id than the incoming object (no epoch is lower, no epich is zero)
Bad match for re-refactord put_merge return.
Correct return type for highest_actor
use the bare vnodeid as long as possible, only start a new epoch
when needed. In a way the bare vnodeid is just epoch zero, so use it.

Also, consider the incoming counter when deciding about new epochs.
An incoming clock with the same actor+epoch but greater counter is
a hint that a byzantine failure occurred and a new epoch is needed
NOTE: make dialyzer exits with an error, but no information. Could
do with some help on that.
including such hits as "What is fold (baby don't recurse me)?" and
"it's an IF not a CASE"
Refactor riak_kv_backend basic test, why?
It depended on order. It didn't use the mutated mod state from
one test to the next (so it implicitly expected side-effects)
@russelldb russelldb changed the title Bug/rdb/gh679 actor epochs re rebase2.0 Per actor epochs for vnode vclock entries Dec 30, 2014
@russelldb russelldb changed the title Per actor epochs for vnode vclock entries Per-key actor epochs for vnode vclock entries Dec 30, 2014
@russelldb
Copy link
Member Author

Mr @slfritchie gave me some review over email. Reproduced below are his comments and my replies. As usual he was right and insightful and I made the changes he suggested.

On 19 Dec 2014, at 07:22, Scott Lystig Fritchie [email protected] wrote:

Howdy!

I can put these comments into the PR, but there are enough
comments/conversation fragments there that I don't want things to get
lost right now.

Question: blocking_lease_counter()

I don't believe that 5 seconds is long enough. I've seen Linux
systems block all writes for 15+ seconds when disk has been
overwhelmed by random dirty pages and/or a disk device is stuttering
(i.e., in an early hardware failure state).

I have set to 20seconds, you think more?

Also, see that the 5 seconds is what's in the 1.4 branch. We might
want to patch that, depending on results of the discussion here.

This is all new code, where in 1.4 do we have such a timeout?

Question: put_merge() and prepare_new_put()

I'm not knowlegeable to know if the new @todo things are must-do or
would-be-nice-to-do-in-unicorn-land or somewhere in between.

These are things that cannot be done without object format change as =
they require some persistent per-key-metadat about the clock. Leaving as =
TODOs, if that is OK?

They are nice-to-have as they solve the byzantine case of dataloss.

Question: -define(DEFAULT_CNTR_LEASE, 1000).

Increase by a factor of 5 or 10?

Made it 10k

Question: update_counter() and key_epoch_actor()

key_epoch_actor() says that the counter is 32 bits. update_counter()
doesn't appear to have any overflow protection?

The overflow protection is in the status manager, you can't hand out
above the Lease, and the lease is protected from overflowing. Makes
sense?

riak_kv_vnode_status_manager has checks for ?MAX_CNTR for
get_vnodeid_and_counter() and lease_counter(), but those are checking
the lease size instead of checking the lease's upper bound.

HRM, ok, so there is an enforcement, but it feels weird.
get_counter_lease() has this:

  •        LeaseTo =3D min(PrevLease + LeaseSize, ?MAX_CNTR),
    

So, if we ask for a lease range of 100 and a PrevLease of ?MAX_CNTR-2,
then LeaseTo will only be ?MAX_COUNTER, which means a lease size of 2;
the vnode will (nearly) immediately ask for a new lease,
get_counter_lease() will see PrevLease =3D=3D ?MAX_CNTR, and then=
assign a
new vnodeID and a new lease range starting at zero.

OK, we discussed this, and the answer is smaller Max Lease size, and =
always give out a lease of the size requested, even if it means a new =
vnodeid =93prematurely"

I'm still puzzled by the "LeaseSize =3D< ?MAX_CNTR" checks, though, =
they
feel very distracting.

Will remove.

Question: assign_vnodeid()

Looks like you just moved this function and didn't edit it, but ... it
seems possible for VnodeEpoch to overflow 32 bits. I don't think that
matters in The Big Scheme of Things(tm), but perhaps I'm wrong?

I hope it doesn=92t matter, but I=92ll give it a poke.

Suggestion: riak_kv_vnode_status_mgr_eqc:lease_counter()

Who knows how flakey the CI system might be. I'd change it to 30 or
60 seconds and then wait to be exasperated about CI when something
that ludicrous fails. I've seen enough eunit tests with CI fail at 5
second when they should take 2 milliseconds, so =85

Will do, thanks.

Hrm, there isn't an EUnit wrapper for the QuickCheck property in this
module, is that intentional?

Ooops.

First draft of test to exercise riak_kv_vnode_status_mgr in a manner
similar to the way it is used by riak_kv_vnode. The eqc property
exercises the vnode_status_mgr_driver api which in turn interacts with
the riak_kv_vnode_status_mgr module.
@@ -110,6 +110,21 @@
reqid :: term(),
target :: pid()}).

-record(counter_state, {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit: suggest renaming this record to epoch_counter (and giving the state record field that same name) to be more explicit

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I agree. It's just a counter: the keys get epochs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, but it's not a purposeless counter, it has a specific role. Is there a good name for its role?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whatever man. Rename it if you like. Thanks for the contribution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming is hard. I think the inline comments make up for any possible confusion about its purpose.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @macintux, I'm sorry, that wasn't cool on my part. This is at least the 3rd review, and I appreciate more eyes. It's late and I shouldn't have responded until morning. What I mean is, if you want to rename it, please be my guest, maybe it would help a future maintainer understand the purpose of the counter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bike and sleep deprived. No sweat. I'll keep reading and thinking.

Sent from my iPhone

On Jan 8, 2015, at 5:02 PM, Russell Brown [email protected] wrote:

In src/riak_kv_vnode.erl:

@@ -110,6 +110,21 @@
reqid :: term(),
target :: pid()}).

+-record(counter_state, {
Hey @macintux, I'm sorry, that wasn't cool on my part. This is at least the 3rd review, and I appreciate more eyes. It's late and I shouldn't have responded until morning. What I mean is, if you want to rename it, please be my guest, maybe it would help a future maintainer understand the purpose of the counter.


Reply to this email directly or view it on GitHub.

@andrewjstone
Copy link
Contributor

I'm seeing the following failure when running make test for riak_kv on this branch.

Failures:

  1) riak_kv_schema_tests:commit_hooks_test/0
     Failure/Error: ?assertEqual({error,apply_translations,{error,[{error,[84,114,97,110,115,108,97,116,105,111,110,32,102,111,114,32,39,114,105,97,107,95,99,111,114,101,46,100,101,102,97,117,108,116,95,98,117,99,107,101,116,95,112,114,111,112,115,46,112,111,115,116,99,111,109,109,105,116,39,32,102,111,117,110,100,32,105,110,118,97,108,105,100,32,99,111,110,102,105,103,117,114,97,116,105,111,110,58,32,105,110,99,111,114,114,101,99,116,32,104,111,111,107,32,102,111,114,109,97,116,32,39,106,115,76,79,76,39]},{error,[84,114,97,110,115,108,97,116,105,111,110,32,102,111,114,32,39,114,105,97,107,95,99,111,114,101,46,100,101,102,97,117,108,116,95,98,117,99,107,101,116,95,112,114,111,112,115,46,112,114,101,99,111,109,109,105,116,39,32,102,111,117,110,100,32,105,110,118,97,108,105,100,32,99,111,110,102,105,103,117,114,97,116,105,111,110,58,32,105,110,99,111,114,114,101,99,116,32,104,111,111,107,32,102,111,114,109,97,116,32,39,98,97,100,58,109,111,100,58,102,117,110,39]}]}}, Config)
       expected: {error,apply_translations,
                     {error,
                         [{error,
                              "Translation for 'riak_core.default_bucket_props.postcommit' found invalid configuration: incorrect hook format 'jsLOL'"},
                          {error,
                              "Translation for 'riak_core.default_bucket_props.precommit' found invalid configuration: incorrect hook format 'bad:mod:fun'"}]}}
            got: {error,apply_translations,
                     {errorlist,
                         [{error,
                              {translation_invalid_configuration,
                                  {"riak_core.default_bucket_props.postcommit",
                                   "incorrect hook format 'jsLOL'"}}},
                          {error,
                              {translation_invalid_configuration,
                                  {"riak_core.default_bucket_props.precommit",
                                   "incorrect hook format 'bad:mod:fun'"}}}]}}
     %% test/riak_kv_schema_tests.erl:278:in `riak_kv_schema_tests:-commit_hooks_test/0-fun-0-/2`


  2) riak_kv_schema_tests:correct_error_handling_by_multibackend_test/0
     Failure/Error: ?assertMatch({ error , apply_translations , { error , _ } }, Config)
       expected: = { error , apply_translations , { error , _ } }
            got: {error,apply_translations,
                     {errorlist,
                         [{error,
                              {translation_invalid_configuration,
                                  {"riak_kv.multi_backend",
                                   "Error processing multi_backend configuration for backend default"}}}]}}
     %% test/riak_kv_schema_tests.erl:305:in `riak_kv_schema_tests:correct_error_handling_by_multibackend_test/0`

@seancribbs
Copy link
Contributor

@andrewjstone Those are fixed by #1071.

@andrewjstone
Copy link
Contributor

Thanks @seancribbs

On Thu, Jan 8, 2015 at 6:34 PM, Sean Cribbs [email protected]
wrote:

@andrewjstone https://github.com/andrewjstone Those are fixed by #1071
#1071.


Reply to this email directly or view it on GitHub
#1070 (comment).

#counter_state{lease_size=LeaseSize} = CounterState,
Start = os:timestamp(),
receive
{'EXIT', Pid, Reason} ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would need to call process_flag(trap_exit, true) in init/1 to get an EXIT message.
See "Receiving Exit Signals" here

Are we sure we want to actually trap exits in the vnode though? @jonmeredith @jtuple

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already trap exits in riak_core_vnode.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh, Thank you @seancribbs

On Tue, Jan 13, 2015 at 3:38 PM, Sean Cribbs [email protected]
wrote:

In src/riak_kv_vnode.erl
#1070 (diff):

  • blocking_lease_counter(State, {0, MaxErrs, MaxTime}).

+-spec blocking_lease_counter(#state{}, {Errors :: non_neg_integer(),

  •                                    MaxErrors :: non_neg_integer(),
    
  •                                    TimeRemainingMillis :: non_neg_integer()}
    
  •                        ) ->
    
  •                                {ok, #state{}} |
    
  •                                counter_lease_error().
    
    +blocking_lease_counter(_State, {MaxErrs, MaxErrs, _MaxTime}) ->
  • {error, counter_lease_max_errors};
    +blocking_lease_counter(State, {ErrCnt, MaxErrors, MaxTime}) ->
  • #state{idx=Index, vnodeid=VId, status_mgr_pid=Pid, counter=CounterState} = State,
  • #counter_state{lease_size=LeaseSize} = CounterState,
  • Start = os:timestamp(),
  • receive
  •    {'EXIT', Pid, Reason} ->
    

We already trap exits in riak_core_vnode
https://github.com/basho/riak_core/blob/develop/src/riak_core_vnode.erl#L186
.


Reply to this email directly or view it on GitHub
https://github.com/basho/riak_kv/pull/1070/files#r22892621.

@andrewjstone
Copy link
Contributor

@russelldb This PR looks very good. I've gone over all the code and ran the eqc test a few times. Tomorrow I will go over the corresponding riak_test PR and run that against this code.

@russelldb
Copy link
Member Author

Arrggh. I messed up. Need to re-do that change, hold off re-review for now

@russelldb
Copy link
Member Author

Urgh, I went and rebased again. And I don't know how to "unrebase" and I don't want to force push. Give me a bit of time to wrestle with the tools here.

@russelldb
Copy link
Member Author

there we go

@andrewjstone
Copy link
Contributor

👍 d77f65f
riak_tests specific to this pass. So do dt tests and sibling explosion tests. Code looks good as well.

borshop added a commit that referenced this pull request Jan 15, 2015
…ase2.0

Per-key actor epochs for vnode vclock entries

Reviewed-by: andrewjstone
@seancribbs
Copy link
Contributor

I was checking this over to see if you had already inserted a metric for when the epoch increments (or leases granted, etc), and found you have a stray file in the PR, test/vnode_driver.erl~. I'll make a separate task for the metrics but removing that stray file would be good, then we can re-approve and merge.

@Basho-JIRA Basho-JIRA changed the title Per-key actor epochs for vnode vclock entries Add metrics for epoch increments, reaps, etc [JIRA: RIAK-1620] Mar 16, 2015
@seancribbs seancribbs changed the title Add metrics for epoch increments, reaps, etc [JIRA: RIAK-1620] Rebase per-key actor epoch (kv679) on 2.0 [JIRA: RIAK-1331] Mar 17, 2015
@Basho-JIRA Basho-JIRA closed this Mar 30, 2015
@martincox martincox deleted the bug/rdb/gh679-actor-epochs-re-rebase2.0 branch June 14, 2019 09:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants