Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raft/rafttest: introduce datadriven testing #11005

Merged
merged 2 commits into from
Aug 12, 2019
Merged

Conversation

tbg
Copy link
Contributor

@tbg tbg commented Aug 7, 2019

It has often been tedious to test the interactions between multi-member
Raft groups, especially when many steps were required to reach a certain
scenario. Often, this boilerplate was as boring as it is hard to write and
hard to maintain, making it attractive to resort to shortcuts whenever
possible, which in turn tended to undercut how meaningful and maintainable
the tests ended up being - that is, if the tests were even written, which
sometimes they weren't.

This change introduces a datadriven framework specifically for testing
deterministically the interaction between multiple members of a raft group
with the goal of reducing the friction for writing these tests to near
zero.

In the near term, this will be used to add thorough testing for joint
consensus (which is already available today, but wildly undertested), but
just converting an existing test into this framework has shown that the
concise representation and built-in inspection of log messages highlights
unexpected behavior much more readily than the previous unit tests did (the
test in question is snapshot_succeed_via_app_resp; the reader is invited
to compare the old and new version of it).

The main building block is InteractionEnv, which holds on to the state of
the whole system and exposes various relevant methods for manipulating it,
including but not limited to adding nodes, delivering and dropping
messages, and proposing configuration changes. All of this is extensible so
that in the future I hope to use it to explore the phenomena discussed in

#7625 (comment)

which requires injecting appropriate "crash points" in the Ready handling
loop. Discussions of the "what if X happened in state Y" can quickly be
made concrete by "scripting up an interaction test".

Additionally, this framework is intentionally not kept internal to the raft
package.. Though this is in its infancy, a goal is that it should be
possible for a suite of interaction tests to allow applications to validate
that their Storage implementation behaves accordingly, simply by running a
raft-provided interaction suite against their Storage.

raft/rafttest/interaction_env_handler.go Show resolved Hide resolved
raft/rafttest/interaction_env_handler_campaign.go Outdated Show resolved Hide resolved
raft/testdata/confchange_v1.txt Show resolved Hide resolved
raft/testdata/confchange_v1.txt Outdated Show resolved Hide resolved
raft/rafttest/interaction_env_logger.go Outdated Show resolved Hide resolved
raft/rafttest/interaction_env_handler_campaign.go Outdated Show resolved Hide resolved
raft/rafttest/interaction_env_handler_handle_ready.go Outdated Show resolved Hide resolved
raft/rafttest/interaction_env_handler.go Outdated Show resolved Hide resolved
raft/rafttest/interaction_env.go Outdated Show resolved Hide resolved
raft/testdata/campaign.txt Show resolved Hide resolved
Picks up some fixes for papercuts.
@tbg tbg force-pushed the interactiontest branch from 5f1e2f7 to 09788c9 Compare August 9, 2019 22:57
@tbg
Copy link
Contributor Author

tbg commented Aug 9, 2019

Thanks for the reviews so far. I need to address some more additional comments and clean up some more, will ping when it's ready.

@codecov-io
Copy link

Codecov Report

Merging #11005 into master will increase coverage by 0.2%.
The diff coverage is 83.33%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #11005     +/-   ##
=========================================
+ Coverage   63.98%   64.18%   +0.2%     
=========================================
  Files         402      402             
  Lines       37646    37712     +66     
=========================================
+ Hits        24087    24205    +118     
+ Misses      11955    11897     -58     
- Partials     1604     1610      +6
Impacted Files Coverage Δ
raft/tracker/progress.go 97.72% <ø> (ø) ⬆️
raft/raft.go 90.51% <100%> (+0.34%) ⬆️
raft/util.go 82.17% <81.53%> (+9.7%) ⬆️
pkg/netutil/netutil.go 63.11% <0%> (-7.38%) ⬇️
pkg/fileutil/purge.go 65.9% <0%> (-6.82%) ⬇️
etcdserver/api/v3rpc/lease.go 69.31% <0%> (-5.69%) ⬇️
clientv3/concurrency/election.go 79.68% <0%> (-2.35%) ⬇️
pkg/mock/mockserver/mockserver.go 72.72% <0%> (-2.28%) ⬇️
clientv3/balancer/balancer.go 86.36% <0%> (-2.28%) ⬇️
clientv3/maintenance.go 40.81% <0%> (-2.05%) ⬇️
... and 16 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4a4629f...09788c9. Read the comment docs.

It has often been tedious to test the interactions between multi-member
Raft groups, especially when many steps were required to reach a certain
scenario. Often, this boilerplate was as boring as it is hard to write
and hard to maintain, making it attractive to resort to shortcuts
whenever possible, which in turn tended to undercut how meaningful and
maintainable the tests ended up being - that is, if the tests were even
written, which sometimes they weren't.

This change introduces a datadriven framework specifically for testing
deterministically the interaction between multiple members of a raft group
with the goal of reducing the friction for writing these tests to near
zero.

In the near term, this will be used to add thorough testing for joint
consensus (which is already available today, but wildly undertested),
but just converting an existing test into this framework has shown that
the concise representation and built-in inspection of log messages
highlights unexpected behavior much more readily than the previous unit
tests did (the test in question is `snapshot_succeed_via_app_resp`; the
reader is invited to compare the old and new version of it).

The main building block is `InteractionEnv`, which holds on to the state
of the whole system and exposes various relevant methods for
manipulating it, including but not limited to adding nodes, delivering
and dropping messages, and proposing configuration changes. All of this
is extensible so that in the future I hope to use it to explore the
phenomena discussed in

etcd-io#7625 (comment)

which requires injecting appropriate "crash points" in the Ready
handling loop. Discussions of the "what if X happened in state Y"
can quickly be made concrete by "scripting up an interaction test".

Additionally, this framework is intentionally not kept internal to the
raft package.. Though this is in its infancy, a goal is that it should
be possible for a suite of interaction tests to allow applications to
validate that their Storage implementation behaves accordingly, simply
by running a raft-provided interaction suite against their Storage.
@tbg tbg force-pushed the interactiontest branch from 09788c9 to e8090e5 Compare August 12, 2019 09:13
@tbg tbg changed the title [wip] raft/rafttest: introduce datadriven testing raft/rafttest: introduce datadriven testing Aug 12, 2019
@tbg
Copy link
Contributor Author

tbg commented Aug 12, 2019

Going to merge this now, but happy to address any follow-up comments.

@tbg tbg merged commit 029401a into etcd-io:master Aug 12, 2019
@tbg tbg deleted the interactiontest branch August 12, 2019 09:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants