Refactor degrade hierarchy with new circuit breaker mechanism and improve strategy #1490
df75e22 to c017b12
Codecov Report
@@             Coverage Diff              @@
##             master    #1490      +/-   ##
============================================
+ Coverage     43.96%   45.08%   +1.11%
- Complexity     1719     1786      +67
============================================
  Files           376      382       +6
  Lines         10660    10854     +194
  Branches       1418     1443      +25
============================================
+ Hits           4687     4893     +206
+ Misses         5408     5373      -35
- Partials        565      588      +23
for (DegradeRule rule : list) {
    if (!isValidRule(rule)) {
        RecordLog.warn(
            "[DegradeRuleManager] Ignoring invalid degrade rule when loading new rules: " + rule);
        RecordLog.warn("[DegradeRuleManager] Ignoring invalid rule when loading new rules: " + rule);
        continue;
    }

    if (StringUtil.isBlank(rule.getLimitApp())) {
        rule.setLimitApp(RuleConstant.LIMIT_APP_DEFAULT);
    }

    String identity = rule.getResource();
    Set<DegradeRule> ruleSet = newRuleMap.get(identity);
    if (ruleSet == null) {
        ruleSet = new HashSet<>();
        newRuleMap.put(identity, ruleSet);
    CircuitBreaker cb = getExistingSameCbOrNew(rule);
    if (cb == null) {
        RecordLog.warn("[DegradeRuleManager] Unknown circuit breaking strategy, ignoring: " + rule);
        continue;
    }
    ruleSet.add(rule);
Inside the `getExistingSameCbOrNew` method, reusing the circuit breaker when the rule remains unchanged is a smart design. I tried calling `DegradeRuleManager.loadRules` from different places and it works. One small question: if `DegradeRuleManager.loadRules(rules)` is called for the first time and `rules` contains two identical `DegradeRule` entries, two `CircuitBreaker` instances will be created, because on the first call the static `circuitBreakers` variable is null.
Yes, it can be improved.
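One possible way to handle the duplicate-rule case raised above is to look for an existing breaker in the batch being built before falling back to the previously loaded ones. The following is only a minimal sketch under that assumption: `SketchRule`, `SketchBreaker` and `SketchRuleLoader` are hypothetical stand-ins, not the Sentinel classes, and the real `getExistingSameCbOrNew` may work differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for DegradeRule: equality is based on the rule's fields.
final class SketchRule {
    final String resource;
    final int strategy;
    final double threshold;

    SketchRule(String resource, int strategy, double threshold) {
        this.resource = resource;
        this.strategy = strategy;
        this.threshold = threshold;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof SketchRule)) {
            return false;
        }
        SketchRule other = (SketchRule) o;
        return strategy == other.strategy
            && Double.compare(threshold, other.threshold) == 0
            && resource.equals(other.resource);
    }

    @Override
    public int hashCode() {
        return Objects.hash(resource, strategy, threshold);
    }
}

// Hypothetical stand-in for CircuitBreaker: it only remembers its rule.
final class SketchBreaker {
    final SketchRule rule;

    SketchBreaker(SketchRule rule) {
        this.rule = rule;
    }
}

final class SketchRuleLoader {
    // Breakers created by the previous load, keyed by rule.
    private Map<SketchRule, SketchBreaker> current = new HashMap<>();

    List<SketchBreaker> loadRules(List<SketchRule> rules) {
        Map<SketchRule, SketchBreaker> next = new HashMap<>();
        for (SketchRule rule : rules) {
            // 1. Duplicates inside this batch share one breaker.
            SketchBreaker cb = next.get(rule);
            // 2. Unchanged rules reuse the breaker from the previous load.
            if (cb == null) {
                cb = current.get(rule);
            }
            // 3. Otherwise create a new breaker.
            if (cb == null) {
                cb = new SketchBreaker(rule);
            }
            next.put(rule, cb);
        }
        current = next;
        return new ArrayList<>(next.values());
    }
}
```

With this shape, the first `loadRules` call with two equal rules yields a single breaker, and a later call with the same rule returns that same instance.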
    return;
}

if (curEntry.getBlockError() == null) {
This check seems removable, given the one at line 63.
It's here in case the blockError was modified.
Is it really concurrent? And the time lapse is really tiny here.
Where is the proposal for this PR?
I'll link the proposal here soon.
public boolean tryPass() {
    // Template implementation.
    if (currentState.get() == State.CLOSED) {
        return true;
    }
    if (currentState.get() == State.OPEN) {
        // For half-open state we allow a request for trial.
        return retryTimeoutArrived() && fromOpenToHalfOpen();
    }
    return false;
}
If there are two breakers and both are open, one breaker's `tryPass` may succeed and move it to half-open while the other breaker's `tryPass` returns false. It is correct that this request is not allowed, but subsequent requests will not be allowed either, because the two breakers' `tryPass` calls can never both return true.
@sczyh30 Any thoughts on this scenario? Am I wrong?
I think it's ok for this scenario.
B1(open) & B2(open) -> no request at all
B1(open -> half) & B2(open) -> no request at all, but B1 may transition to closed, leaving B2 alone, that is:
B1(closed) & B2(open)
Sorry, I cannot find how half-open changes to closed. The method `fromHalfOpenToClose` is invoked in `DegradeSlot#exit` when there is no BlockException. But with B1(open -> half) & B2(open), the following requests are checked by `CircuitBreaker#tryPass`, which must return false and then throw DegradeException.
Requests are blocked by `tryPass()` in the abstract parent, and the state of each breaker is maintained by the different implementations when requests complete (post-request).
So they do not collide.
public boolean tryPass() {
    // Template implementation.
    if (currentState.get() == State.CLOSED) {
        return true;
    }
    if (currentState.get() == State.OPEN) {
        // For half-open state we allow a request for trial.
        return retryTimeoutArrived() && fromOpenToHalfOpen();
    }
    return false;
}
I'm starting to doubt myself 🤨😢 Did I misunderstand?
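Hmm... after thinking it over carefully, the problem you raised does exist, and there also seems to be a problem with other flow control rules, because there is no check here for whether the exception is a DegradeException. I'll verify it with a test this afternoon.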
@sczyh30 Hi, Eric
Here are some boundary cases that lead to new issues; we should discuss further how to improve the recovery logic:
- Multiple degrade rules for the same resource (especially after introducing prefix/postfix matching)
- Another rule placed after a degrade rule in half-open state (there is no such rule today, but it is possible in the future)
These lead to an incorrect workflow:
entry: R0(open -> half) -> R1(block)
exit: XXXBlockException -> skip the detection, leaving the breaker stalled in half-open state
As a straightforward patch, I think the `onComplete` method is intended to be called after a successful transition into half-open. We should make the exit logic simpler and clearer.
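The stalled half-open workflow described above can be reproduced with a small standalone sketch. Everything here is hypothetical and simplified: `DemoBreaker` keeps only the `tryPass()` template from the snippet under review, the retry timeout is a fixed flag, and `onRequestComplete` is assumed to run only for requests that were not blocked.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified breaker; only tryPass() mirrors the reviewed snippet.
class DemoBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    final AtomicReference<State> currentState = new AtomicReference<>(State.OPEN);
    // Assumption: a fixed flag replaces the real retry-timeout clock check.
    private final boolean retryTimeoutArrived;

    DemoBreaker(boolean retryTimeoutArrived) {
        this.retryTimeoutArrived = retryTimeoutArrived;
    }

    boolean tryPass() {
        if (currentState.get() == State.CLOSED) {
            return true;
        }
        if (currentState.get() == State.OPEN) {
            // Allow one trial request by moving OPEN -> HALF_OPEN.
            return retryTimeoutArrived
                && currentState.compareAndSet(State.OPEN, State.HALF_OPEN);
        }
        return false; // HALF_OPEN: a trial is assumed to be in flight
    }

    // Assumption: invoked on exit only when the request was not blocked.
    void onRequestComplete(boolean success) {
        if (currentState.get() == State.HALF_OPEN) {
            currentState.set(success ? State.CLOSED : State.OPEN);
        }
    }
}

class HalfOpenStallDemo {
    // Returns true if the request passed every breaker and completed.
    static boolean entry(List<DemoBreaker> breakers) {
        for (DemoBreaker cb : breakers) {
            if (!cb.tryPass()) {
                // Blocked: the exit path skips onRequestComplete for all breakers.
                return false;
            }
        }
        for (DemoBreaker cb : breakers) {
            cb.onRequestComplete(true);
        }
        return true;
    }

    public static void main(String[] args) {
        DemoBreaker b1 = new DemoBreaker(true);  // retry timeout has arrived
        DemoBreaker b2 = new DemoBreaker(false); // still inside its open window
        List<DemoBreaker> chain = Arrays.asList(b1, b2);
        System.out.println(entry(chain) + " " + b1.currentState.get());
        System.out.println(entry(chain) + " " + b1.currentState.get());
    }
}
```

In this sketch the first request moves B1 to half-open and is then blocked by B2, so B1 never sees a completion. Every later request is rejected by B1 itself, which is the stall being discussed.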
@jasonjoo2010 @wavesZh Yes, it's a fatal bug indeed and should be resolved immediately. However, we need a better temporary workaround for this (for the half-open case). I'll merge this PR first and we may discuss it in a new issue. It's a common problem for all rules that have their own metrics and rely on `onComplete`.
See #1638
 *
 * @since 1.7.0
 */
private int rtSlowRequestAmount = RuleConstant.DEGRADE_DEFAULT_SLOW_REQUEST_AMOUNT;
This property is missing in the new implementation. Is that acceptable for compatibility?
It could indeed bring breaking changes (and break the SemVer spec). By convention, we should mark it deprecated and not remove it until 2.x. But it's actually rarely used and often regarded as a "hidden" attribute (it does not even appear in the dashboard)...
Yeah, you're right. I'm just afraid that users who integrate manually will suffer from broken compatibility. 😂 We can decide here whether to handle it carefully this time. As long as we decide and record it, it would be OK.
IMHO we could drop it (the legacy bad design) and add notes in relevant documents :)
@Override
public void exit(Context context, ResourceWrapper resourceWrapper, int count, Object... args) {
    fireExit(context, resourceWrapper, count, args);
public void exit(Context context, ResourceWrapper r, int count, Object... args) {
And here's a problem: if something goes wrong that causes RT to rise above 10s (100ms is normal), our breaker will cut the traffic only after 10s. Maybe we could improve it (or open a new issue and do it in the future) so that unfinished entries are recorded as `statIntervalMs` when slidingCounter is reset.
Yes, it's necessary to improve the scenarios regarding slow in-flight requests. See #1405
Oh, I think they are not the same issue.
How to calculate a more reasonable RT is one thing, while how to make degradation more reasonable is quite another. The primary reason is that the calculation window is separate from the general RT, isn't it?
Sure, we can improve it later, but it's more important than making the RT calculation more reasonable. Maybe another time window is required to keep it tracked.
We could open a new issue to discuss it.
…rove strategy
* Add `CircuitBreaker` abstraction (with half-open state) and add circuit breaker state change event observer support.
* Improve circuit breaking strategy (avg RT → slow request ratio) and make statistics of each rule dependent (to support arbitrary statistic interval).
* Add simple "trial" mechanism (aka. half-open).
* Refactor mechanism of metric recording and state change handling for circuit breakers: record RT and error when requests have completed (i.e. `onExit`, based on alibaba#1420).
Signed-off-by: Eric Zhao <[email protected]>
Signed-off-by: Eric Zhao <[email protected]>
Signed-off-by: Eric Zhao <[email protected]>
4c043ed to 17395c0
@jasonjoo2010 I've updated the PR, please check. Note: please use "rebase and merge" for this PR instead of "squash and merge".
Describe what this PR does / why we need it
Refactor legacy degrade hierarchy with new circuit breaker mechanism and improve strategy.
Does this pull request fix one issue?
Resolves #1421, #1032, #951, #154, #308, #56
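Describe how you did it
- Add `CircuitBreaker` abstraction (with half-open state) and add circuit breaker state change event observer support. Now Sentinel follows the canonical circuit breaker pattern (with some improvements).
- Improve circuit breaking strategy (avg RT → slow request ratio) and make statistics of each rule dependent (to support arbitrary statistic interval).
- Add simple "trial" mechanism (aka. half-open).
- Refactor mechanism of metric recording and state change handling for circuit breakers: record RT and error when requests have completed (i.e. `onExit`, based on Refactor the mechanism of recording error (on completed) #1420).
Describe how to verify it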
Run the test cases and demo.
Special notes for reviews
This PR contains internal breaking changes (and some behavioral changes for RT-based circuit breaking).
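
To illustrate the behavioral change for RT-based circuit breaking (avg RT → slow request ratio), here is a minimal sketch of a slow-request-ratio check. It is an assumption-laden illustration, not the code in this PR: the class `SlowRatioSketch`, its fields, the single non-sliding counting window and the strict `>` comparison are all hypothetical.

```java
// Hypothetical sketch of a slow-request-ratio check; names are not Sentinel's.
class SlowRatioSketch {
    private final long maxAllowedRtMs;       // requests slower than this count as "slow"
    private final double slowRatioThreshold; // open when slow / total exceeds this
    private final int minRequestAmount;      // do not judge on too few samples

    // Assumption: one plain counting window instead of a sliding window.
    private long total;
    private long slow;

    SlowRatioSketch(long maxAllowedRtMs, double slowRatioThreshold, int minRequestAmount) {
        this.maxAllowedRtMs = maxAllowedRtMs;
        this.slowRatioThreshold = slowRatioThreshold;
        this.minRequestAmount = minRequestAmount;
    }

    // Record the response time once the request has completed.
    void onRequestComplete(long rtMs) {
        total++;
        if (rtMs > maxAllowedRtMs) {
            slow++;
        }
    }

    // True when enough requests were seen and the slow ratio exceeds the threshold.
    boolean shouldOpen() {
        if (total < minRequestAmount) {
            return false;
        }
        return (double) slow / total > slowRatioThreshold;
    }
}
```

Unlike an average, a ratio is not skewed by a few extremely slow or extremely fast requests, which is one plausible motivation for the change.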