-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: chaos test on route could still works when etcd is down #3404
Conversation
Signed-off-by: yiyiyimu <[email protected]>
Signed-off-by: yiyiyimu <[email protected]>
Signed-off-by: yiyiyimu <[email protected]>
Signed-off-by: yiyiyimu <[email protected]>
Signed-off-by: yiyiyimu <[email protected]>
Signed-off-by: yiyiyimu <[email protected]>
Signed-off-by: yiyiyimu <[email protected]>
Signed-off-by: yiyiyimu <[email protected]>
t/chaos-test/kill-etcd_test.go
Outdated
setRoute(e, http.StatusOK) | ||
getRoute(e, http.StatusOK) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add check such as nginx error log and Prometheus
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Added
Signed-off-by: yiyiyimu <[email protected]>
Signed-off-by: yiyiyimu <[email protected]>
Signed-off-by: yiyiyimu <[email protected]>
Signed-off-by: yiyiyimu <[email protected]>
Signed-off-by: yiyiyimu <[email protected]>
Signed-off-by: yiyiyimu <[email protected]>
…t working Signed-off-by: yiyiyimu <[email protected]>
t/chaos/kill-etcd_test.go
Outdated
// run in background | ||
go func() { | ||
for i := 1; ; i++ { | ||
go getWithoutTest(g, "/hello", map[string]string{"Host": "foo.com"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HELP NEEDED!!
Here I try to visit the route in the same frequency, so I could check if it keep the same before and after etcd shutdown.
For some reason, using bare net/http
pkg to access the route would always return "404 Route not found", although the route is created, and accessing the same URL with httpexpect
could work(in line 207). So the next check would fail, getIngressBandwidthPerSecond
would return 0 since no request succeeds (commit) (see ci error).
While the reason I don't want to use httpexpect
this place, is due to it would print lots of log each time a request sent and it would make test log into a mess (commit). So a bare HTTP get would be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @juzhiyuan could you offer some help on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since using httpexpect
would print log each time a request sent, I turn the route visit frequency from 10 times/s to 1 times/s to reduce the log printing. The reason not to reduce visit duration is that, when frequency is larger than 1 time/s, some visit would return "503 Service Unavailable"
, although I don't think that's reasonable since the frequency is still very low compared to the real usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question is raised in apache/apisix-dashboard#1388
Currently when etcd is killed, apisix would keep print out |
Signed-off-by: yiyiyimu <[email protected]>
Signed-off-by: yiyiyimu <[email protected]>
Signed-off-by: yiyiyimu <[email protected]>
Signed-off-by: yiyiyimu <[email protected]>
Signed-off-by: yiyiyimu <[email protected]>
…ics, so need to calculate the duration Signed-off-by: yiyiyimu <[email protected]>
@Yiyiyimu It seems in this case you kill all etcd members. What about just killing the leader (will recover overtime), killing the follower (no issue), kill majority members (cluster will not recover)? |
Hi @tokers, I think in that case, that's more about test etcd itself, but not apisix. Besides, I don't think we should test etcd in this way, since that's what we make sure how raft and etcd works. If we test how apisix performs, like when we kill the leader, in a rare case (because the sync is so fast), we could find maybe one new route is missing. But the test case is so unstable, so it's maybe not so reasonable Update: |
Signed-off-by: yiyiyimu <[email protected]>
Signed-off-by: yiyiyimu <[email protected]>
What this PR does / why we need it:
Related to #2757
Use chaos mesh to generate the situation, when etcd is all killed, route could still works.
TODO
Pre-submission checklist: