
Fix arithmetic underflow in the StabilityPool #438

Merged: 24 commits, Oct 15, 2024
Conversation

@bingen (Collaborator) commented Sep 27, 2024

This PR implements 3 fixes:

  1. The most important one is proper handling of the error offset. In the original implementation, we track an absolute error (lastBoldLossError_Offset) relative to some amount of total Bold deposits, but afterwards we add it to a fraction computed with a smaller amount of total Bold deposits, amplifying the effect of this error correction too much.
    As a result, over a sequence of liquidations the ratio between the initial P and the final P becomes bigger than the ratio between the initial total amount of deposits and its final value. That is fine in a single step, if it’s compensating for a previous error, but not in the long run. Compared to an initial state with no error, the ratio between P’s should always be less than or equal to the ratio between total deposits. In theory the two should be exactly equal, but in practice we must handle the rounding errors in that direction, to prevent the sum of yield rewards (which use P ratios) from growing bigger than the total amount of owed rewards (which would cause an underflow in the subtraction if all users tried to claim).

  2. Use (P - 1) instead of P for total yield rewards. Even with the previous fix, the error correction itself has rounding errors. So if a given P_0 is smaller than it should be due to rounding, it may still happen that over a sequence of liquidations the ratio P_n / P_0 becomes too big. Even with this fix, it occasionally happens that the total amount of deposits is less than the sum of the compounded deposits of all depositors (not a problem, as we are already capping it). Unless we go very conservative, and therefore less precise, this is hard to avoid completely, especially when different P snapshots are involved (for different users) and when P becomes small. But essentially, every step computes deposits * new P / old P, so the error is approximately bounded by deposits / old P (see the worked bound after the TODO list below). Assuming P >= 1e9, a 1 BOLD error would require an SP of around 1 billion BOLD. And that’s quite an extreme case: after three and a half years, P in v1 is still 0.119129209443316949 (i.e. ~1e17).

  3. Prevent yields underflow. As an extra safety measure on top of the previous fixes, we make sure we never try to claim more than we have tracked as yields owed, so that the subtraction doesn’t underflow (see the sketch right after this list).
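
A minimal sketch of the shape of fix 3, with simplified and partly hypothetical names (the actual PR does this by capping the depositor's gain with LiquityMath._min, as shown in the diff further down):

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Sketch only, not the PR's actual code: cap what a depositor can claim at
// the globally tracked total still owed, so the bookkeeping subtraction
// can never underflow.
contract YieldCapSketch {
    uint256 public yieldGainsOwed; // total yield tracked as owed to all depositors

    function _payOutYield(uint256 yieldGain) internal returns (uint256 paid) {
        // Equivalent in spirit to LiquityMath._min(yieldGain, yieldGainsOwed).
        paid = yieldGain < yieldGainsOwed ? yieldGain : yieldGainsOwed;
        yieldGainsOwed -= paid; // safe: paid <= yieldGainsOwed by construction
    }
}
```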

TODO:

  • Should we apply the fix to the case where we update the scale?
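
To make the bound in fix 2 concrete, this is the back-of-the-envelope arithmetic behind the "1 billion BOLD" figure above (a sketch using the PR description's own numbers):

```latex
% Each step computes a floor division; an error of one unit in P_new costs
% about d / P_old in the result:
\mathrm{deposit}_{\mathrm{new}}
  = \left\lfloor \frac{d \cdot P_{\mathrm{new}}}{P_{\mathrm{old}}} \right\rfloor,
\qquad
\mathrm{error} \approx \frac{d}{P_{\mathrm{old}}}.
% With P_old >= 10^9 and d = 10^9 BOLD = 10^9 \cdot 10^{18} = 10^{27} (18-decimal units):
\mathrm{error} \approx \frac{10^{27}}{10^{9}} = 10^{18}\ \text{units} = 1\ \mathrm{BOLD}.
```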

@bingen self-assigned this Sep 27, 2024
@@ -632,52 +664,6 @@ contract StabilityPool is LiquityBase, IStabilityPool, IStabilityPoolEvents {
return initialDeposit * (firstPortion + secondPortion) / snapshots.P / DECIMAL_PRECISION;
Collaborator:

We should also truncate this to yieldGainsOwed, if we want to be consistent. Plus, the invariant tests rely on getDepositorYieldGainWithPending() to predict the precise amount of yield that's going to be received by the depositor, so there are currently some failures due to the inconsistency between getDepositorYieldGain() & getDepositorYieldGainWithPending().

Truncate also with `yieldGainsOwed`:

-    return yieldGain;
+    return LiquityMath._min(yieldGain, yieldGainsOwed);
Collaborator:

Shouldn't the ceiling be yieldGainsOwed + pendingSPYield?

Collaborator:

Actually, it's more complicated. I think we should only add pendingSPYield if totalBoldDeposits >= DECIMAL_PRECISION.

Collaborator:

Probably explains why all the tests are failing.

Collaborator:

Fixed: 120792d
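
Piecing the thread together, the agreed ceiling looks roughly like this (a sketch with a hypothetical helper name, not the actual code of 120792d):

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Sketch, hypothetical helper: pending SP yield only counts toward the cap
// once it can actually be distributed, which (per the thread above) requires
// totalBoldDeposits >= DECIMAL_PRECISION.
function _yieldCeiling(
    uint256 yieldGainsOwed,
    uint256 pendingSPYield,
    uint256 totalBoldDeposits,
    uint256 decimalPrecision
) pure returns (uint256) {
    if (totalBoldDeposits >= decimalPrecision) {
        return yieldGainsOwed + pendingSPYield;
    }
    return yieldGainsOwed; // pending yield can't be triggered yet, so exclude it
}
```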

@danielattilasimon (Collaborator):

Found this case during a CI run (belongs in stabilityPool.t.sol):

function test_Issue_ShareOfCollMismatch() external {
    testLiquidationsWithLowPAllowFurtherRewardsForAllFreshDepositors_Cheat_Fuzz(0, 504242351683211589370490);
}

This passes on main but fails on this PR branch. Thought it was worth looking at.

@RickGriff (Collaborator) left a comment:

Thanks! Left some questions.

@@ -408,7 +409,7 @@ contract StabilityPool is LiquityBase, IStabilityPool, IStabilityPoolEvents {
uint256 yieldPerUnitStaked = yieldNumerator / totalBoldDepositsCached;
lastYieldError = yieldNumerator - yieldPerUnitStaked * totalBoldDepositsCached;

-uint256 marginalYieldGain = yieldPerUnitStaked * P;
+uint256 marginalYieldGain = yieldPerUnitStaked * (P - 1);
Collaborator:

I don't fully get the use of P - 1 rather than P. As far as I can see:

  • The P error correction increases P, but
  • the P error correction itself has rounding error (so it ends up smaller than it should be)

Why do we then use P - 1? Doesn't that make it even smaller than it should be?

Collaborator (Author):

The rationale is that we end up computing x * P_n / P_m, with n > m (P_n being the one in the line above).
The fraction P_n / P_m represents the ratio of total deposits between those two snapshots (m and n).
We want the ratio between P’s to always be <= the ratio between deposits, so that the rewards shared are less than or equal to what they should be, but never greater.
As both P_n and P_m are inaccurate, it may happen that the (truncation) error in P_m is bigger than in P_n. That would make the fraction too big.
Let’s take an example, assuming DECIMAL_PRECISION = 100. Imagine the total deposits are 4999 and 2001 at snapshots m and n, so the theoretical values of P would be 49.99 (truncated to 49) and 20.01 (truncated to 20). Then we have P_n / P_m = 20 / 49 > 2001 / 4999 = deposit_n / deposit_m. That’s what we want to avoid, so we use P_n = 19 instead of 20. Of course in this example that makes the precision of P_n / P_m very bad, but it’s an edge case, mostly due to the low precision (only 2 decimals). With 18 decimals, subtracting 1 is not such a big deal.
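
The same example in symbols (D_m and D_n are the total deposits at the two snapshots):

```latex
% Truncation makes the P ratio overshoot the deposit ratio:
\frac{P_n}{P_m} = \frac{20}{49} \approx 0.408
\;>\;
\frac{D_n}{D_m} = \frac{2001}{4999} \approx 0.400
% Using P_n - 1 restores the invariant P_n / P_m \le D_n / D_m:
\frac{P_n - 1}{P_m} = \frac{19}{49} \approx 0.388 \;\le\; 0.400
```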

Collaborator:

OK, yes I see. So this correction should always make the error in P_n bigger than the error in P_m. Cool, it makes sense.

} else {
uint256 boldLossNumerator = _debtToOffset * DECIMAL_PRECISION - lastBoldLossError_Offset;
Collaborator:

I'm wondering whether this was right in the first place (or maybe I need a refresher). Prior to this PR we were subtracting the error from the bold loss (L461). But shouldn't we add it, as we do for the coll?

Let's consider what we were trying to fix, if there were no error correction at all:

  • boldLossPerUnitStaked (b) is too small, due to floor division
  • newProductFactor (f) is too big, since f = 1e18 - b
  • Therefore P is not reduced enough at the liquidation
  • Therefore the deposit calc, P/P_0 * initialDeposit, is too large
  • So collateral gains and BOLD gains will also be too high, as they use P values in the numerator at each reward
  • This could lead to underflows when withdrawing deposits, since the sum of compounded deposits could exceed the total

So with no error correction it's like that.

But by subtracting the last BOLD error, don't we make it worse? b would already be too small from the floor division, and we then make it even smaller.

In contrast, with the coll (line 452), we add the last coll error, which makes sense: coll rewards are too small from floor div, so we're storing the error and adding it.

I wonder if simply changing the sign here in L461 would fix the underflow (or if it's not so simple)?

Collaborator (Author):

I don’t think that’s what we were trying to fix with the error; we do that with the + 1 in the boldLossPerUnitStaked assignment below, i.e. a (pseudo-)ceil division.
With lastBoldLossError_Offset, what we are trying to ensure is that the difference between the real theoretical value and the value resulting from the ceil division doesn’t grow too big over time.
So:

  • With the + 1 we try to avoid underflows
  • With lastBoldLossError_Offset we try to keep the error introduced by the previous measure from accumulating

The problem is that the second step can revert the effect of the first one to some extent, and make the underflows appear again (if I’m not mistaken, because the treatment of that error was wrong). That’s what we are trying to fix in this PR.
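
For reference, the two divisions being contrasted have roughly this shape (a sketch modeled on the code paths discussed in this thread, not a verbatim excerpt from StabilityPool.sol):

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Sketch modeled on the code paths discussed in this thread.
contract ErrorOffsetSketch {
    uint256 constant DECIMAL_PRECISION = 1e18;
    uint256 lastBoldLossError_Offset;
    uint256 lastCollError_Offset;

    function _computePerUnitStaked(uint256 _collToAdd, uint256 _debtToOffset, uint256 _totalBoldDeposits)
        internal
        returns (uint256 collGainPerUnitStaked, uint256 boldLossPerUnitStaked)
    {
        // BOLD loss: pseudo-ceil division (the + 1), so depositors are debited
        // slightly more, never less, than the exact loss; the over-debit is
        // stored and netted off the next numerator (hence the subtraction).
        uint256 boldLossNumerator = _debtToOffset * DECIMAL_PRECISION - lastBoldLossError_Offset;
        boldLossPerUnitStaked = boldLossNumerator / _totalBoldDeposits + 1;
        lastBoldLossError_Offset = boldLossPerUnitStaked * _totalBoldDeposits - boldLossNumerator;

        // Coll gain: floor division, so depositors are credited slightly less,
        // never more; here the error is added back into the next numerator.
        uint256 collNumerator = _collToAdd * DECIMAL_PRECISION + lastCollError_Offset;
        collGainPerUnitStaked = collNumerator / _totalBoldDeposits;
        lastCollError_Offset = collNumerator - collGainPerUnitStaked * _totalBoldDeposits;
    }
}
```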

Collaborator (Author):

> In contrast, with the coll (line 452), we add the last coll error, which makes sense: coll rewards are too small from floor div, so we're storing the error and adding it.

Yes, coll rewards go in the opposite direction because it’s a floor division (as opposed to the ceil division for bold deposits). I.e., we don’t do the +1 trick.

Collaborator:

> Yes, coll rewards go in the opposite direction because it’s a floor division (as opposed to the ceil division for bold deposits). I.e., we don’t do the +1 trick.

Ahhh right. Thanks, yes that makes it clear!

newP = currentP * newProductFactor / DECIMAL_PRECISION;
uint256 errorFactor;
if (lastBoldLossError_Offset > 0) {
errorFactor = lastBoldLossError_Offset * newProductFactor / lastBoldLossError_TotalDeposits;
Collaborator:

Trying to understand the rationale for the denominator here.

lastBoldLossError_TotalDeposits is only assigned a value in L458 (for a pool-emptying liq) and in L538 after it's been used in the division in L533.

So that means: 1) if the current liq is pool-emptying, we use the current _totalBoldDeposits in the division; but 2) if it's not pool-emptying, we use the stored value lastBoldLossError_TotalDeposits, which was saved at the previous liq.

Is that intended? If so, why?

In case 1) we use the most recent totalBoldDeposits, which could have changed since the last liq as deposits were added/removed. In case 2) we use a potentially outdated stored value.

I'd have assumed we should always use the current value, since we're scaling a past error (calculated using past total deposits) by current total deposits.

Collaborator (Author):

> Is that intended? If so, why?

Yes, that’s intended. In case 1 we are kind of resetting everything (P to 1 as well), so we need to reset the error tracking too.

Anyway, there’s a TODO in this PR to review the case for scale updates; there may be something missing there (although as far as I know, it hasn’t surfaced during fuzzing yet). So it’s intended, but I’m not 100% sure it’s correct. I want to dig into that case once we make sure the basic logic (for the generic case where P is not reset) is sound.

if (lastBoldLossError_Offset > 0) {
errorFactor = lastBoldLossError_Offset * newProductFactor / lastBoldLossError_TotalDeposits;
}
newP = (currentP * newProductFactor + errorFactor) / DECIMAL_PRECISION;
Collaborator:

This seems consistent with the previous implementation, i.e. making P larger (than without error correction). But see my comment on L461 questioning whether that was correct in the first place.

Collaborator (Author):

Answered above.
