-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tests: add robustness test for issue 17780 #18099
Conversation
Skipping CI for Draft Pull Request. |
tests/robustness/traffic/etcd.go
Outdated
@@ -61,6 +61,18 @@ var ( | |||
{choice: LargePut, weight: 5}, | |||
}, | |||
} | |||
// Issue17780EtcdPutDelete is to create high chance to have more delete | |||
// requests so that it's likely to compact that revision which is tombstone. | |||
Issue17780EtcdPutDelete = etcdTraffic{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be inline if it is not used by any other tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean inline is about anonymous value during assignment?
etcdTraffic
is unexported type and choiceWeight
is alsog unexported type...
I think it's hard to make it inline if I understand it correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. How about just name it to EtcdPutDeleteHeavy
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 merging it with other traffic and incorporating it with exploratory testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @siyuanfoundation @serathius , I want to keep it with special name here because it's hard to reproduce it without special input or setting. If we put it into exploratory testing, existing case might have chance to reproduce it. But it's unknown for us. I was trying many combinations of traffic but it runs one day without luck. And then we introduce compact traffic, it also makes it hard to trigger issue 17780.
If you have better idea, please let me know. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to avoid having a issue specific traffic, to prevent regressions in robustness we don't need 100% reproducability, about 2-5% is ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to use existing traffic to reproduce it. Unfortunately, I run it for 3 days. No luck :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me take a look at it
I would like to better understand why this is not aligned with exploratory tests. The goal is to incorporate regressions into exploratory testing, not just create a new scenario. |
tests/robustness/failpoint/gofail.go
Outdated
|
||
// AllowBatchCompactBeforeSetFinishedCompactPanic is used to trigger | ||
// that compactBeforeSetFinishedCompact failpoint only if the current | ||
// revision number is higher than that batch limit. | ||
AllowBatchCompactBeforeSetFinishedCompactPanic Failpoint = goPanicFailpoint{ | ||
failpoint: "compactBeforeSetFinishedCompact", | ||
trigger: triggerCompact{multiBatchCompaction: true}, | ||
target: AnyMember, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// AllowBatchCompactBeforeSetFinishedCompactPanic is used to trigger | |
// that compactBeforeSetFinishedCompact failpoint only if the current | |
// revision number is higher than that batch limit. | |
AllowBatchCompactBeforeSetFinishedCompactPanic Failpoint = goPanicFailpoint{ | |
failpoint: "compactBeforeSetFinishedCompact", | |
trigger: triggerCompact{multiBatchCompaction: true}, | |
target: AnyMember, | |
} | |
CompactBeforeSetFinishedCompactPanic Failpoint = goPanicFailpoint{"compactBeforeSetFinishedCompact", triggerCompact{multiBatchCompaction: true}, AnyMember} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @serathius , CompactBeforeSetFinishedCompactPanic
is already existing at
etcd/tests/robustness/failpoint/gofail.go
Line 47 in 5790774
CompactBeforeSetFinishedCompactPanic Failpoint = goPanicFailpoint{"compactBeforeSetFinishedCompact", triggerCompact{}, AnyMember} |
I add new one here because this issue has a requirement that the compactor has to delete tombstone in one batch and update UnsafeSetFinishedCompact
value in next round. IMO, it's better to have two go-failpoints here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can have both, best to even duplicate all failpoints with triggerCompact to have option with and without multiBatchCompaction.
See #17680 for an example of how to add new repro. I adjusted couple of parameters. |
ping any progress? |
Hi @serathius I switched to use
|
Codecov ReportAttention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files
... and 22 files with indirect coverage changes @@ Coverage Diff @@
## main #18099 +/- ##
==========================================
+ Coverage 68.78% 68.83% +0.05%
==========================================
Files 420 420
Lines 35523 35532 +9
==========================================
+ Hits 24433 24459 +26
+ Misses 9665 9652 -13
+ Partials 1425 1421 -4 Continue to review full report in Codecov by Sentry.
|
A quick question, did you reproduce the issue on the latest code of main branch or did you just reproduce it on the code before the fix? |
@@ -72,7 +72,7 @@ func (t triggerCompact) Trigger(ctx context.Context, _ *testing.T, member e2e.Et | |||
return nil, fmt.Errorf("failed to get revision: %w", err) | |||
} | |||
|
|||
if !t.multiBatchCompaction || rev > int64(clus.Cfg.ServerConfig.ExperimentalCompactionBatchLimit) { | |||
if !t.multiBatchCompaction || rev >= int64(clus.Cfg.ServerConfig.ExperimentalCompactionBatchLimit) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How this change is related? Does it help with repro? I assumed ">" was intentional as we want have additional revision to ensure 2 batches execute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. Reverted this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Managed to reproduce the issue within ~100 runs.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: fuweid, serathius The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Closes etcd-io#17780 Signed-off-by: Wei Fu <[email protected]>
Hi @ahrtr , I can't reproduce this issue on main branch. It's used to reproduce it on the code before the fix. |
/retest |
Closes #17780
Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.