tests/e2e: add test cases related to HashKV #18369

Merged
merged 1 commit into from
Jul 31, 2024


@fuweid fuweid commented Jul 26, 2024

}
}

func TestVerifyHashKVAfterTwoCompactions_MixVersions(t *testing.T) {
Member Author

Side note: if #18274 is accepted, the main branch will always keep the compacted revision. Existing releases, however, delete the compacted revision if it is a tombstone. #18274 should therefore skip a previous compacted revision that is a tombstone, so that the HashKV result stays the same across mixed versions. In particular, it must not raise a data corruption alert while the cluster is being upgraded to a new release.

@fuweid fuweid changed the title tests/e2e: add new test cases related to HashKV tests/e2e: add test cases related to HashKV Jul 26, 2024
@fuweid fuweid force-pushed the add-hashkv-test branch from a48b08b to f2636ee Compare July 26, 2024 11:54
hashKVOnRev int64
}{
{
compactedOnRev: 33, // tombstone
Member

Please don't use comments to imply a relation between the numbers here and the consts in newTestKeySetInCluster. Passing an explicit variable is always better.

Member Author

Rolled back to the original one. Please take a look. If it's accepted, I will squash the commits. Thanks.


fuweid commented Jul 26, 2024

ping @ahrtr

@ahrtr ahrtr self-requested a review July 26, 2024 13:08

jmhbnz commented Jul 28, 2024

/retest


scenarios := []struct {
ClusterVersion fe2e.ClusterVersion
OnlyOneKey bool
Member

Can you provide more context on what OnlyOneKey means?

Member Author

When there is only one key and the new compaction revision is on a tombstone, all MVCC keys will be included in the hash value, because the set of revisions to keep (keep) is empty.

func (h *kvHasher) WriteKeyValue(k, v []byte) {
	kr := BytesToRev(k)
	upper := Revision{Main: h.revision + 1}
	if !upper.GreaterThan(kr) {
		return
	}
	lower := Revision{Main: h.compactRevision + 1}
	// skip revisions that are scheduled for deletion
	// due to compacting; don't skip if there isn't one.
	if lower.GreaterThan(kr) && len(h.keep) > 0 {
		if _, ok := h.keep[kr]; !ok {
			return
		}
	}
	h.hash.Write(k)
	h.hash.Write(v)
}

In #18274 I want to keep the previous compaction revision no matter what type of key it is, a normal revision or a tombstone. Both the v3.5.x and v3.4.x releases delete the compaction revision if it is a tombstone. So, when keep is empty, the #18274 patch could include the tombstone in the hash value while the existing releases don't. This scenario can be used to verify that #18274 won't break compatibility.

For example, say one cluster has two v3.5.x etcd members and one main-branch (with the #18274 patch) etcd member, and the cluster holds only one key, foo.

// key: "foo"
// modified: 9
// generations:
//    {{9, 0}[1]}
//    {{5, 0}, {6, 0}, {7, 0}, {8, 0}(t)[4]}
//    {{2, 0}, {3, 0}, {4, 0}(t)[3]}

The first compaction is on revision {Main: 4}.

On the v3.5.x member:

// key: "foo"
// modified: 9
// generations:
//    {{9, 0}[1]}
//    {{5, 0}, {6, 0}, {7, 0}, {8, 0}(t)[4]}


On the main-branch (with the #18274 patch) member:

// key: "foo"
// modified: 9
// generations:
//    {{9, 0}[1]}
//    {{5, 0}, {6, 0}, {7, 0}, {8, 0}(t)[4]}
//    {{4, 0}(t)[3]}

If HashKV computes the hash value at Revision{Main: 8}, then on v3.5.x the hash result includes the following revisions:

  • Revision{Main: 5}
  • Revision{Main: 6}
  • Revision{Main: 7}
  • Revision{Main: 8}

But the main-branch member (with the #18274 patch) could also include Revision{Main: 4} in the HashKV result. That's unexpected and could trigger a data corruption alert.

So I use OnlyOneKey to represent this case. Sorry, it's confusing.
However, I don't want to go into too much detail here, because it seems reasonable to test with different key sets.
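
The mismatch described above can be reproduced without a cluster. The following is a minimal sketch, not the etcd implementation: the revision type and the hashedRevisions function are hypothetical names invented here, and the function only mirrors the two filter conditions quoted from WriteKeyValue. It lists which revisions each member would feed into the hash when keep is empty.

```go
package main

import "fmt"

// revision is a simplified, hypothetical stand-in for etcd's mvcc Revision.
type revision struct{ main, sub int64 }

// hashedRevisions returns the main revisions that would be written into the hash
// for a HashKV request at hashRev after a compaction at compactRev. It mirrors
// the two checks in the quoted WriteKeyValue snippet.
func hashedRevisions(stored []revision, compactRev, hashRev int64, keep map[revision]struct{}) []int64 {
	var out []int64
	for _, kr := range stored {
		if kr.main > hashRev {
			continue // newer than the requested revision
		}
		// Only skip compacted revisions when the keep set is non-empty.
		if kr.main <= compactRev && len(keep) > 0 {
			if _, ok := keep[kr]; !ok {
				continue
			}
		}
		out = append(out, kr.main)
	}
	return out
}

func main() {
	// Single key compacted on its tombstone: the keep set is empty.
	keep := map[revision]struct{}{}

	// v3.5.x member: the tombstone {4, 0} was deleted by compaction.
	oldMember := []revision{{5, 0}, {6, 0}, {7, 0}, {8, 0}, {9, 0}}
	// Patched member: the tombstone {4, 0} is still stored.
	newMember := []revision{{4, 0}, {5, 0}, {6, 0}, {7, 0}, {8, 0}, {9, 0}}

	fmt.Println(hashedRevisions(oldMember, 4, 8, keep)) // prints "[5 6 7 8]"
	fmt.Println(hashedRevisions(newMember, 4, 8, keep)) // prints "[4 5 6 7 8]"
}
```

Because the two members hash different sets of revisions, their HashKV values would differ, which is the situation the OnlyOneKey scenario is meant to catch.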


compactedOnRev := dataset.tombstones[0]
if !tt.compactedOnTombstoneRev {
compactedOnRev += 3
Member

What's the significance of the number 3 here? Can we create a local const, like numberOfRevisions or emptyWrites?

Member Author

3 is an arbitrary number I use to pick a non-tombstone revision for compaction.
As you suggested, I use an afterWrites variable for that. Please take a look.

Comment on lines 339 to 347
if clusVersion != fe2e.CurrentVersion {
if !fileutil.Exist(fe2e.BinPath.EtcdLastRelease) {
t.Skipf("%q does not exist", fe2e.BinPath.EtcdLastRelease)
}
}

ctx := context.Background()

fe2e.BeforeTest(t)
Member

These should be in the test case directly instead of a helper function.

I don't think we need newClusterForHashKV at all.

Member Author

OK. I removed it.

}

type datasetInfo struct {
keys map[string]struct{}
Member

You populate the data for keys, but it isn't used at all.

We can remove datasetInfo and just return tombstones and latestRev from the functions that populate data.

Member Author

Done.

Comment on lines 108 to 117
t.Logf("HashKV on rev=%d", hashKVOnRev)
resp, err := cli.HashKV(ctx, hashKVOnRev)
require.NoError(t, err)

require.Len(t, resp, 3)
require.True(t, resp[0].Hash != 0)
t.Logf("One Hash value is %d", resp[0].Hash)

require.Equal(t, resp[0].Hash, resp[1].Hash)
require.Equal(t, resp[1].Hash, resp[2].Hash)
Member

Minor comment: we can add a helper function, something like verifyConsistentHashAcrossAllMembers.
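
One possible shape for such a helper is sketched below. It is an illustration only: verifyConsistentHash is a hypothetical name, it takes plain uint32 hash values rather than the client responses used in the PR, and it returns an error instead of calling require so that it runs without the test framework.

```go
package main

import (
	"errors"
	"fmt"
)

// verifyConsistentHash returns an error unless every member reported
// the same non-zero hash. The input holds one hash per cluster member.
func verifyConsistentHash(hashes []uint32) error {
	if len(hashes) < 2 {
		return errors.New("need hashes from at least two members")
	}
	if hashes[0] == 0 {
		return errors.New("hash of the first member is zero")
	}
	for i := 1; i < len(hashes); i++ {
		if hashes[i] != hashes[0] {
			return fmt.Errorf("member %d hash %d differs from member 0 hash %d", i, hashes[i], hashes[0])
		}
	}
	return nil
}

func main() {
	fmt.Println(verifyConsistentHash([]uint32{7, 7, 7}) == nil) // prints "true"
	fmt.Println(verifyConsistentHash([]uint32{7, 7, 9}) == nil) // prints "false"
}
```

Comparing every member against the first one avoids hardcoding the cluster size, which is the same idea as the loop suggested later in this review.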

Member Author

Done.

"go.etcd.io/etcd/client/pkg/v3/fileutil"
clientv3 "go.etcd.io/etcd/client/v3"
"go.etcd.io/etcd/tests/v3/framework/config"
fe2e "go.etcd.io/etcd/tests/v3/framework/e2e"
Member

Why not just e2e? All other tests import this as e2e.

Member Author

Updated.

lastestRevision int64
}

// key: "foo"
Member

Thanks for crafting such a meticulous testing scenario. It seems very thorough; however, for a reviewer or any future contributor it might be hard to understand why we picked this exact data shape and no other.

My question would be: how hard would it be to generalize? We know one exact case that could break HashKV, but how sure are we that we aren't missing another? Hardcoding an exact scenario protects us against this exact mistake, but can we be sure there won't be more such cases in the future? Can we be sure that HashKV is resilient enough?

Have you heard about property testing? https://www.mayhem.security/blog/what-is-property-based-testing. For me, etcd should always hold the property that no matter the type of write, the compaction revision, the etcd version, or the revision, HashKV should always be the same on all cluster members.

I think it would be OK to merge these exact scenarios, but I also think we should consider how to generalize them. For example, instead of checking HashKV on one revision, test all of them in a for loop.
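
The "test all of them in a for loop" idea can be sketched as follows. This is an assumption-laden illustration: hashAt and firstDivergence are hypothetical names, and the cluster is replaced by a fake function so the example runs on its own. In a real test, the query would call HashKV on every member at the given revision.

```go
package main

import "fmt"

// hashAt is a hypothetical stand-in for asking every member for its
// HashKV value at one revision; it returns one hash per member.
type hashAt func(rev int64) []uint32

// firstDivergence checks every revision in [from, to] and returns the first
// revision where members disagree. The bool is false if all revisions agree.
func firstDivergence(query hashAt, from, to int64) (int64, bool) {
	for rev := from; rev <= to; rev++ {
		hashes := query(rev)
		for i := 1; i < len(hashes); i++ {
			if hashes[i] != hashes[0] {
				return rev, true
			}
		}
	}
	return 0, false
}

func main() {
	// Fake three-member cluster: the third member disagrees only at revision 8.
	fake := func(rev int64) []uint32 {
		h := uint32(rev * 31)
		if rev == 8 {
			return []uint32{h, h, h + 1}
		}
		return []uint32{h, h, h}
	}

	rev, diverged := firstDivergence(fake, 4, 9)
	fmt.Println(rev, diverged) // prints "8 true"
}
```

Looping from the compaction revision to the latest revision turns the single hardcoded check into the property stated above: members must agree at every revision, not just at one chosen point.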

Member Author

instead of checking hashKV on one revision, test all of them in for loop.

Good point. Will simplify data generator.

Member Author

I followed @ahrtr's suggestion and used populateData. Please take a look.

// generations:
//
// {34, 1}
func newTestDatasetInCluster(t *testing.T, clus *fe2e.EtcdProcessCluster, cliCfg fe2e.ClientConfig, onlyOneKey bool) *datasetInfo {
Member

@ahrtr ahrtr Jul 29, 2024

It's unnecessarily complicated.

Please consider a function like the one below:

// populateData populates some sample data and returns a slice of tombstone revisions and the latest revision.
func populateData(t *testing.T, clus *fe2e.EtcdProcessCluster, clientCfg fe2e.ClientConfig, keys []string) ([]int64, int64) {
	c := newClient(t, clus.EndpointsGRPC(), clientCfg)

	ctx := context.Background()
	totalOperations := 40

	var (
		tombStoneRevs []int64
		latestRev     int64
	)

	deleteStep := 10 // submit a delete operation on every 10 operations
	for i := 1; i <= totalOperations; i++ {
		if i%deleteStep == 0 {
			t.Logf("Deleting key=%s", keys[0]) // Only delete the first key for simplicity
			resp, derr := c.Delete(ctx, keys[0])
			require.NoError(t, derr)
			latestRev = resp.Header.Revision
			tombStoneRevs = append(tombStoneRevs, resp.Header.Revision)
			continue
		}

		value := fmt.Sprintf("%d", i)
		var ops []clientv3.Op
		for _, key := range keys {
			ops = append(ops, clientv3.OpPut(key, value))
		}

		t.Logf("Writing keys: %v, value: %s", keys, value)
		resp, terr := c.Txn(ctx).Then(ops...).Commit()
		require.NoError(t, terr)
		require.True(t, resp.Succeeded)
		require.Len(t, resp.Responses, len(ops))
		latestRev = resp.Header.Revision
	}

	return tombStoneRevs, latestRev
}
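
To see which revisions a loop of this shape would produce, the sketch below replays it without a cluster. It rests on assumptions that are not stated in the thread: the store starts at revision 1, and every successful transaction or delete advances the revision by exactly one. simulateRevisions is a hypothetical name used only for this illustration.

```go
package main

import "fmt"

// simulateRevisions replays the loop structure of populateData in memory.
// Assumption: the store starts at revision 1 and each operation bumps it by 1.
// Every deleteStep-th operation is a delete and therefore leaves a tombstone.
func simulateRevisions(totalOperations, deleteStep int) (tombstoneRevs []int64, latestRev int64) {
	latestRev = 1
	for i := 1; i <= totalOperations; i++ {
		latestRev++
		if i%deleteStep == 0 {
			tombstoneRevs = append(tombstoneRevs, latestRev)
		}
	}
	return tombstoneRevs, latestRev
}

func main() {
	tombstones, latest := simulateRevisions(40, 10)
	fmt.Println(tombstones, latest) // prints "[11 21 31 41] 41"
}
```

Under those assumptions the tombstones are spaced ten revisions apart, and the last operation is itself a delete, so the latest revision is also a tombstone.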

Member Author

Thanks. I used your code in the latest commit. Please take a look.

@fuweid fuweid force-pushed the add-hashkv-test branch from bcbe3ca to 2345b17 Compare July 29, 2024 12:49
Comment on lines 189 to 194
require.Len(t, resp, 3)
require.True(t, resp[0].Hash != 0)
t.Logf("One Hash value is %d", resp[0].Hash)

require.Equal(t, resp[0].Hash, resp[1].Hash)
require.Equal(t, resp[1].Hash, resp[2].Hash)
Member

Suggested change
- require.Len(t, resp, 3)
- require.True(t, resp[0].Hash != 0)
- t.Logf("One Hash value is %d", resp[0].Hash)
- require.Equal(t, resp[0].Hash, resp[1].Hash)
- require.Equal(t, resp[1].Hash, resp[2].Hash)
+ require.Greater(t, len(resp), 1)
+ require.True(t, resp[0].Hash != 0)
+ t.Logf("One Hash value is %d", resp[0].Hash)
+ for i := 1; i < len(resp); i++ {
+ 	require.Equal(t, resp[0].Hash, resp[i].Hash)
+ }

Member Author

thanks for the suggestion. updated

Comment on lines 81 to 85
// If compaction revision is not tombstone, select revision after 3 writes from first tombstone.
// And ensure it's not the following tombstone.
const afterWriters = int64(3)
if !compactedOnTombstoneRev {
compactedOnRev += afterWriters
require.True(t, tombstoneRevs[1] > compactedOnRev)
}
Member

The 3 may cause unnecessary confusion. We don't care about the number at all; we just need to ensure the compaction revision isn't a tombstone.

Suggested change
- // If compaction revision is not tombstone, select revision after 3 writes from first tombstone.
- // And ensure it's not the following tombstone.
- const afterWriters = int64(3)
- if !compactedOnTombstoneRev {
- 	compactedOnRev += afterWriters
- 	require.True(t, tombstoneRevs[1] > compactedOnRev)
- }
+ // If compaction revision isn't a tombstone, select a revision in the middle of two tombstones.
+ if !compactedOnTombstoneRev {
+ 	compactedOnRev = (tombstoneRevs[0] + tombstoneRevs[1]) / 2
+ 	require.True(t, tombstoneRevs[0] < compactedOnRev && compactedOnRev < tombstoneRevs[1])
+ }
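
The midpoint only lands strictly between the two tombstones when they are at least two revisions apart, which is why the suggestion checks both bounds. A small standalone sketch of that selection follows; revisionBetween is a hypothetical helper name, not something from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// revisionBetween picks a revision strictly between two tombstone revisions.
// It fails when the tombstones are adjacent, because no such revision exists.
func revisionBetween(lo, hi int64) (int64, error) {
	if hi-lo < 2 {
		return 0, errors.New("no revision strictly between the two tombstones")
	}
	// Same result as (lo + hi) / 2 for positive revisions with lo < hi.
	return lo + (hi-lo)/2, nil
}

func main() {
	rev, err := revisionBetween(11, 21)
	fmt.Println(rev, err) // prints "16 <nil>"

	_, err = revisionBetween(11, 12)
	fmt.Println(err != nil) // prints "true"
}
```

With adjacent tombstones the plain midpoint would equal the first tombstone, so checking only the upper bound would let a tombstone revision slip through.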

Member Author

Updated.


tombstoneRevs, lastestRev := populateDataForHashKV(t, clus, cfg.Client, scenario.keys)

compactedOnRev := tombstoneRevs[0]
Member

Can we also add a scenario that compacts on the last tombstone, tombstoneRevs[len(tombstoneRevs)-1]?

Member Author

Added TestVerifyHashKVAfterCompactionOnLastTombstone_MixVersions

@fuweid fuweid force-pushed the add-hashkv-test branch 2 times, most recently from 34f81e7 to c118987 Compare July 30, 2024 11:06
Comment on lines 81 to 82
// If compaction revision is not tombstone, select revision after 3 writes from first tombstone.
// And ensure it's not the following tombstone.
Member

minor comment

Suggested change
- // If compaction revision is not tombstone, select revision after 3 writes from first tombstone.
- // And ensure it's not the following tombstone.
+ // If compaction revision isn't a tombstone, select a revision in the middle of two tombstones.

Member Author

my mistake. Updated

@fuweid fuweid force-pushed the add-hashkv-test branch from c118987 to 6f93af8 Compare July 30, 2024 13:02

ahrtr commented Jul 30, 2024

Ideally, we should consolidate the two cases, TestVerifyHashKVAfterCompact and TestVerifyHashKVAfterCompactionOnLastTombstone_MixVersions, into one. But I won't insist on that because the PR is already good enough.

Member

@ahrtr ahrtr left a comment

LGTM

Nice work, thanks @fuweid

Member

@serathius serathius left a comment

Looks good; this gives us confidence in fixing the tombstone compaction bug. Thanks @fuweid, awesome work.

I think we should also consider generalizing it by adding hashKV checking to robustness tests.

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahrtr, fuweid, serathius


@serathius serathius merged commit 739a9b6 into etcd-io:main Jul 31, 2024
43 checks passed
// If compaction revision isn't a tombstone, select a revision in the middle of two tombstones.
if !compactedOnTombstoneRev {
compactedOnRev = (tombstoneRevs[0] + tombstoneRevs[1]) / 2
require.Greater(t, tombstoneRevs[1], compactedOnRev)
Member

nit: It seems we partially missed the comment #18369 (comment)

require.True(t, tombstoneRevs[0] < compactedOnRev && compactedOnRev < tombstoneRevs[1])

fuweid added a commit to fuweid/etcd that referenced this pull request Aug 1, 2024

fuweid commented Aug 1, 2024

Thanks @ahrtr @serathius for the review. I filed PR #18387 to address the comment #18369 (comment).


fuweid commented Aug 21, 2024

ping @ahrtr could we remove label backport/v3.4? The #18476 covers mixed cluster versions [3.4 and 3.5].


ahrtr commented Aug 21, 2024

ping @ahrtr could we remove label backport/v3.4? The #18476 covers mixed cluster versions [3.4 and 3.5].

Removed.

I was thinking the e2e test cases should also support different patch versions, e.g. 3.4.33 vs 3.4.34+, or 3.5.15 vs 3.5.16+. But I do not see a simple/maintainable way to do it.

5 participants