Fix arithmetic underflow in the StabilityPool #438
Conversation
In the original implementation, we track an absolute error (`lastBoldLossError_Offset`) based on some amount of total Bold deposits, but afterwards we add it to a fraction with a smaller amount of total Bold deposits, amplifying the effect of this error recovery too much. As a result, over a sequence of liquidations, the ratio between the initial P and the final P becomes larger than the ratio between the initial total amount of deposits and its final value. This is fine in one step if it’s compensating for a previous error, but not in the long run. If we compare to an initial state where there was no error, the ratio between P’s should always be less than or equal to the ratio between total deposits. In theory they should be exactly equal, but in practice we must handle the rounding errors that way to prevent the sum of yield rewards (which use P ratios) from growing bigger than the total amount of owed rewards (which would cause an underflow in the subtraction if all users tried to claim).
Even with the fix in the previous commit, the error correction itself has rounding errors. So if a given P_0 is smaller than it should be due to rounding, it may happen that over a sequence of liquidations the ratio P_n / P_0 is again too big.
As an extra safety measure to the previous commits, we make sure we never try to claim more than we have tracked as yields owed, so that the subtraction doesn’t underflow.
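As a rough numeric sketch of the amplification described above (plain Python with made-up numbers, not the contract code): an absolute error recorded against a large deposit total represents a small per-unit error, but dividing that same absolute error by a smaller, later total inflates it proportionally:

```python
from fractions import Fraction

# Hypothetical numbers, only to illustrate the scaling mismatch.
e = 900            # absolute error recorded at some liquidation
D_old = 10_000     # total BOLD deposits when the error was recorded
D_new = 1_000      # smaller total deposits at the next liquidation

per_unit_true = Fraction(e, D_old)   # the error really represents e / D_old per unit
per_unit_buggy = Fraction(e, D_new)  # dividing by the smaller total amplifies it

print(per_unit_true)    # 9/100
print(per_unit_buggy)   # 9/10, i.e. 10x too large
assert per_unit_buggy == per_unit_true * Fraction(D_old, D_new)
```

This is why the fix scales the stored error by the deposit total it was recorded against, rather than applying it verbatim against a later, smaller total.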
... for better readability.
The `P - 1` fix plus the balance cap to prevent underflows.
contracts/src/StabilityPool.sol
Outdated
```diff
@@ -632,52 +664,6 @@ contract StabilityPool is LiquityBase, IStabilityPool, IStabilityPoolEvents {
     return initialDeposit * (firstPortion + secondPortion) / snapshots.P / DECIMAL_PRECISION;
```
We should also truncate this to `yieldGainsOwed`, if we want to be consistent. Plus, the invariant tests rely on `getDepositorYieldGainWithPending()` to predict the precise amount of yield that's going to be received by the depositor, so there are currently some failures due to the inconsistency between `getDepositorYieldGain()` & `getDepositorYieldGainWithPending()`.
Truncate also with `yieldGainsOwed`.
contracts/src/StabilityPool.sol
Outdated
```diff
-    return yieldGain;
+    return LiquityMath._min(yieldGain, yieldGainsOwed);
```
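A minimal sketch of why the cap matters (plain Python with hypothetical numbers; in the contract each depositor claims separately against `yieldGainsOwed`): if per-depositor gains are rounded in a way that lets their sum exceed the tracked total, the last claimer would underflow the subtraction, and capping each claim with a min against the remaining owed amount prevents that:

```python
# Hypothetical numbers: rounding pushed the sum of gains to 102 > 100 owed.
yield_gains_owed = 100
computed_gains = [34, 34, 34]

claimed = []
for gain in computed_gains:
    paid = min(gain, yield_gains_owed)  # the cap this PR introduces
    yield_gains_owed -= paid            # would underflow without the cap
    claimed.append(paid)

print(claimed)           # [34, 34, 32]
print(yield_gains_owed)  # 0
```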
Shouldn't the ceiling be `yieldGainsOwed + pendingSPYield`?
Actually, it's more complicated. I think we should only add `pendingSPYield` if `totalBoldDeposits >= DECIMAL_PRECISION`.
Probably explains why all the tests are failing.
Fixed: 120792d
Found this case during a CI run; it belongs in:

```solidity
function test_Issue_ShareOfCollMismatch() external {
    testLiquidationsWithLowPAllowFurtherRewardsForAllFreshDepositors_Cheat_Fuzz(0, 504242351683211589370490);
}
```

This passes on main but fails on this PR branch. Thought it was worth looking at.
Thanks! Left some questions.
```diff
@@ -408,7 +409,7 @@ contract StabilityPool is LiquityBase, IStabilityPool, IStabilityPoolEvents {
     uint256 yieldPerUnitStaked = yieldNumerator / totalBoldDepositsCached;
     lastYieldError = yieldNumerator - yieldPerUnitStaked * totalBoldDepositsCached;

-    uint256 marginalYieldGain = yieldPerUnitStaked * P;
+    uint256 marginalYieldGain = yieldPerUnitStaked * (P - 1);
```
I don't fully get the use of `P - 1` rather than `P`. As far as I can see:

- The `P` error correction increases `P`, but
- the `P` error correction itself has rounding error (so, it's smaller than it should be)

Why do we then use `P - 1`? Doesn't that make it even smaller than it should be?
The rationale is that we end up using `x * P_n / P_m`, with `n > m` (`P_n` being the one in the line above). The fraction `P_n / P_m` represents the ratio of total deposits between those two snapshots (`m` and `n`).

We want the ratio between `P`'s to always be <= the ratio between deposits, so that the rewards shared are less than or equal to what they should be, but never greater.

As both `P_n` and `P_m` are inaccurate, it may happen that the error (from truncating) in `P_m` is bigger than in `P_n`. That would make the fraction increase.

Let's take an example, assuming `DECIMAL_PRECISION = 100`. Imagine that the total deposits are 4999 and 2001 for the `m` and `n` snapshots, and that the theoretical values of `P` would be 49.99 (truncated to 49) and 20.01 (truncated to 20). Then we have `P_n / P_m = 20 / 49 > 2001 / 4999 = deposit_n / deposit_m`. That's what we want to avoid, so we use `P_n = 19` instead of 20. Of course, in this example this makes the precision of `P_n / P_m` very bad, but it's an edge case, mostly due to the low precision (only 2 decimals). With 18 decimals, subtracting 1 isn't such a big deal.
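The 2-decimal example above can be checked with exact rational arithmetic (a quick sanity script, not part of the contract):

```python
from fractions import Fraction

# Numbers from the example: DECIMAL_PRECISION = 100, deposits 4999 -> 2001,
# theoretical P values 49.99 and 20.01, truncated to 49 and 20.
deposits_m, deposits_n = 4999, 2001
P_m, P_n = 49, 20

deposit_ratio = Fraction(deposits_n, deposits_m)

# With the truncated values, the P ratio overshoots the deposit ratio:
assert Fraction(P_n, P_m) > deposit_ratio

# Using P_n - 1 = 19 restores the invariant P_n / P_m <= deposit ratio:
assert Fraction(P_n - 1, P_m) <= deposit_ratio
```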
OK, yes I see. So this correction should always make the error in `P_n` bigger than the error in `P_m`. Cool, it makes sense.
```solidity
} else {
    uint256 boldLossNumerator = _debtToOffset * DECIMAL_PRECISION - lastBoldLossError_Offset;
```
I'm wondering whether this was right in the first place (or maybe I need to refresh). Prior to this PR we were subtracting the error from the bold loss (L461). But shouldn't we add it, as we do for the coll?

Let's consider what we were trying to fix, if there was no error correction at all:

- `boldLossPerUnitStaked` (b) is too small, due to floor division
- `newProductFactor` (f) is too big since `f = 1e18 - b`
- Therefore `P` is not reduced enough at the liquidation
- Therefore the deposit calc, `P/P_0 * initialDeposit`, is too large
- So collateral gains and BOLD gains will also be too high, as they use `P` values in the numerator at each reward
- This could lead to underflows withdrawing deposits, since

So with no error correction it's like that.

But by subtracting the last BOLD error, don't we make it worse? `b` would already be too small from the floor division, and we then make it smaller.

In contrast, with the coll (line 452), we add the last coll error, which makes sense: coll rewards are too small from floor div, so we're storing the error and adding it.

I wonder if simply changing the sign here in L461 would fix the underflow (or if it's not so simple)?
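The no-error-correction chain above can be sketched numerically (plain Python; toy `DECIMAL_PRECISION = 100` for readability, numbers are made up):

```python
DP = 100               # toy DECIMAL_PRECISION
total_deposits = 300
debt_to_offset = 100
P = DP

b = debt_to_offset * DP // total_deposits  # floor: 33, true value 33.33...
f = DP - b                                 # 67, true value 66.66... (too big)
P = P * f // DP                            # P is not reduced enough

# The sum of compounded deposits now exceeds what is actually left in the pool:
compounded_total = total_deposits * P // DP
remaining = total_deposits - debt_to_offset
print(compounded_total, remaining)         # 201 200
assert compounded_total > remaining        # -> underflow risk on withdrawals
```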
I don’t think that’s what we were trying to fix with the error. We do that with the `+ 1` in the `boldLossPerUnitStaked` assignment below. We are doing a (pseudo-) ceil division.

Then, with `lastBoldLossError_Offset`, what we are trying to ensure is that the difference between the real theoretical value and the value resulting from the ceil division doesn’t grow too big over time.

So:

- With the `+ 1` we try to avoid underflows
- With `lastBoldLossError_Offset` we try to keep the error from the previous measure from accumulating

The problem is that the second step can revert the effect of the first one to some extent, and make the underflows appear again (if I’m not mistaken, because the treatment of that error was wrong). That’s what we are trying to fix in this PR.
> In contrast, with the coll (line 452), we add the last coll error, which makes sense: coll rewards are too small from floor div, so we're storing the error and adding it.

Yes, coll rewards have the opposite direction because it’s a floor division (as opposed to the ceil division of bold deposits). I.e., we don’t do the `+ 1` trick.
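The direction of the two roundings can be seen side by side (plain Python with toy numbers; the `+ 1` below stands for the pseudo-ceil trick discussed above, and the coll error term is a simplified stand-in for the contract's error tracking):

```python
DP = 100      # toy DECIMAL_PRECISION
total = 300
debt = 100

# BOLD loss: floor division would under-count the loss (unsafe direction),
# so it is rounded up with the "+ 1" pseudo-ceil trick:
b_floor = debt * DP // total       # 33 -- too small
b_ceil = debt * DP // total + 1    # 34 -- errs on the safe side

# Coll gain: floor division already errs in the safe direction (rewards
# slightly too small), so no "+ 1" is needed; the truncated remainder is
# carried forward in an error term instead:
coll = 91
g = coll * DP // total             # 30
coll_error = coll * DP - g * total # 100 -- remainder carried to the next liq

assert b_floor * total < debt * DP <= b_ceil * total
```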
> Yes, coll rewards have the opposite direction because it’s a floor division (as opposed to the ceil division of bold deposits). I.e., we don’t do the +1 trick.

Ahhh right. Thanks, yes that makes it clear!
```diff
-    newP = currentP * newProductFactor / DECIMAL_PRECISION;
+    uint256 errorFactor;
+    if (lastBoldLossError_Offset > 0) {
+        errorFactor = lastBoldLossError_Offset * newProductFactor / lastBoldLossError_TotalDeposits;
```
Trying to understand the rationale for the denominator here. `lastBoldLossError_TotalDeposits` is only assigned a value in L458 (for a pool-emptying liq) and in L538, after it's been used in the division in L533.

So that means: 1) if the current liq is pool-emptying, we use the current `_totalBoldDeposits` in the division; but 2) if it's not pool-emptying, we use the stored value `lastBoldLossError_TotalDeposits` from the previous liq.

Is that intended? If so, why?

In case 1) we use the most recent totalBoldDeposits, which could have changed since the last liq as deposits were added/removed. In 2) we use a potentially outdated stored value.

I'd have assumed we should always use the current value, since we're scaling a past error (calculated using past total deposits) by current total deposits.
> Is that intended? If so, why?

Yes, that’s intended. In case 1 we are kind of resetting everything, `P` to 1 too. So we need to reset the error tracking too.

Anyway, there’s a TODO in this PR to review the case for scale updates; there may be something missing (although as far as I know, it hasn’t surfaced during fuzzing yet). So it’s intended, but I’m not 100% sure it’s correct. I want to dig into that case once we make sure the basic logic (for the generic case where `P` is not reset) is sound.
```solidity
if (lastBoldLossError_Offset > 0) {
    errorFactor = lastBoldLossError_Offset * newProductFactor / lastBoldLossError_TotalDeposits;
}
newP = (currentP * newProductFactor + errorFactor) / DECIMAL_PRECISION;
```
This seems consistent with the previous implementation, i.e. making `P` larger (than without error correction). But see my comment on L461 questioning whether that was correct in the first place.
Answered above.
This PR implements 3 fixes:

1. The most important one is the proper handling of the error offset. In the original implementation, we track an absolute error (`lastBoldLossError_Offset`) based on some amount of total Bold deposits, but afterwards we add it to a fraction with a smaller amount of total Bold deposits, amplifying the effect of this error recovery too much. As a result, over a sequence of liquidations, the ratio between the initial `P` and the final `P` becomes larger than the ratio between the initial total amount of deposits and its final value. This is fine in one step if it’s compensating for a previous error, but not in the long run. If we compare to an initial state where there was no error, the ratio between `P`’s should always be less than or equal to the ratio between total deposits. In theory they should be exactly equal, but in practice we must handle the rounding errors that way to prevent the sum of yield rewards (which use `P` ratios) from growing bigger than the total amount of owed rewards (which would cause an underflow in the subtraction if all users tried to claim).

2. Use `(P - 1)` instead of `P` for total yield rewards. Even with the previous fix, the error correction itself has rounding errors. So if a given `P_0` is smaller than it should be due to rounding, it may happen that over a sequence of liquidations the ratio `P_n / P_0` is again too big. Even with this fix, it still happens sometimes that the total amount of deposits is less than the sum of compounded deposits of all depositors. (It’s not a problem, as we are already capping it.) Unless we go very conservative, and therefore less precise, it’s hard to completely avoid this, especially when there are different `P` snapshots involved (for different users) and when `P` becomes small. But essentially, in any step what we do is `deposits * new P / old P`, so the error is approximately bounded by `deposits / old P`. Assuming that `P >= 1e9`, that means that for a 1 BOLD error the SP would have to be around 1 billion BOLD in size. And that’s quite an extreme case: after three and a half years, in v1 `P` is still `0.119129209443316949` (i.e. `~1e17`).

3. Prevent yields underflow. As an extra safety measure on top of the previous fixes, we make sure we never try to claim more than we have tracked as yields owed, so that the subtraction doesn’t underflow.
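The back-of-the-envelope bound in point 2 can be checked directly (18-decimal fixed point; a 1-unit rounding error in `new P` shifts `deposits * new P / old P` by `deposits / old P`):

```python
DP = 10**18              # DECIMAL_PRECISION
old_P = 10**9            # the assumed lower bound on P
deposits = 10**9 * DP    # 1 billion BOLD, in 18-decimal units

# A 1-unit error in new P changes deposits * newP // oldP by deposits // oldP:
error = deposits * 1 // old_P
print(error == 1 * DP)   # True: about 1 BOLD of error at 1e9 BOLD pool size
```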
TODO: