mrhudson890 - VoteDelegate's reserveHatch allows multi-block MEV to grief LSE users during time-sensitive voting periods #95
The reserve hatch mechanism includes both a period for allowing locks (15 blocks) and a period for allowing frees (5 blocks). Thus, if an action is critical enough, its issuer can perform it in the respective sequence of blocks where it should succeed regardless of MMEV griefing.
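The alternating windows described above can be sketched as a small model. The HATCH_SIZE and HATCH_COOLDOWN values below are illustrative assumptions chosen only to show how the lock-blocking and guaranteed-lock windows alternate; the authoritative values are constants in the VoteDelegate contract.

```python
# Illustrative sketch of the reserveHatch windowing described above.
# HATCH_SIZE / HATCH_COOLDOWN are assumptions, not the deployed values.
HATCH_SIZE = 5       # blocks after reserveHatch() during which lock() reverts
HATCH_COOLDOWN = 10  # blocks after the hatch during which it cannot be re-armed

def lock_allowed(current_block: int, hatch_block: int) -> bool:
    """lock() succeeds except while the hatch is open."""
    return not (hatch_block < current_block <= hatch_block + HATCH_SIZE)

def hatch_can_rearm(current_block: int, hatch_block: int) -> bool:
    """reserveHatch() can only be called again after hatch + cooldown."""
    return current_block > hatch_block + HATCH_SIZE + HATCH_COOLDOWN

# If the hatch was armed at block 100, locks revert in blocks 101-105,
# and are guaranteed to succeed in blocks 106-115 (the cooldown window),
# since the hatch cannot be re-armed before block 116.
assert not lock_allowed(103, 100)
assert lock_allowed(110, 100)
assert not hatch_can_rearm(110, 100)
assert hatch_can_rearm(116, 100)
```

The sponsor's point is that a critical lock or free can be scheduled inside the window where it is guaranteed to succeed under this model.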
I'd like to escalate. Both direct financial loss and broader risk to protocol governance are likely, and neither is addressed by the sponsor's response.

Impact unchallenged

The griefing impact was not disputed by the sponsor; instead they pointed out a workaround for the user to apply after being griefed, which is unlikely to be used prior to griefing, and which was already mentioned in the submission:
However, this workaround is unlikely to prevent the griefing attack if the attacks are not already prevalent, because it relies on otherwise wasteful and implausible user behavior: always submitting extra transactions, and waiting between sequential steps for a long period of time for no good reason (most of the time).

Unreasonable user behavior assumptions
The sponsor assumes one of two things:
So that assumption is invalid for the scenarios covered in the report. The other alternative:
This assumption is highly unreasonable, as it implies wasteful user behavior: instead of simply submitting a transaction, submitting a "pre-transaction" transaction, waiting for more than a minute, and only then submitting the actual needed transaction. Specifically, the only circumstance in which users will behave this way is if a substantial MMEV griefing attack is already taking place. However, if we must assume the attack has happened for the workaround to be plausible, the attack is validated.

Coordinated occasional attacks

The most destructive and motivated attack would be if these attacks were coordinated around a specific time, as detailed in Scenario 3. The attacks would happen in the time preceding specific votes (especially controversial or emergency actions) in a selective way, and because of the usual user behavior they will grief a large number of unprepared users as well as influence the vote. These attacks will be especially effective, since rare (but critically timed) attacks ensure most users are not engaging in wasteful "pre-transaction evasive maneuvers", and so are vulnerable in large numbers during the attacks.

Lack of upgrade path
There isn't a straightforward upgrade path for the LSE mechanism. The only two directly upgradeable contracts in the contest scope are NST and SNST, due to their use of the UUPS proxy pattern. The LSE system, with its per-user LSUrn that holds vat debt and vat collateral, does not have an easy upgrade path. If the LSE logic needs to be upgraded, a painful and complex migration will need to take place. Not only will MKR and NGT balances need to be moved, but also all the debt for each single urn, along with its collateral, among other things. Thus, in an extreme case, an extremely painful and expensive migration is possible, but it's unlikely to be done to resolve griefing attacks. Instead, users will suffer from griefing, and Maker will suffer due to reduced voter participation and reduced trust.
Escalate

Escalating on behalf of @mrhudson890; refer to their comment.
You've created a valid escalation! To remove the escalation from consideration: Delete your comment. You may delete or edit your escalation comment anytime before the 48-hour escalation window closes. After that, the escalation becomes final.
Could you elaborate on the multi-block part of your submission? I don't see multi-block MEV in the attack paths, only a one-time block. I see that you're saying it may have a very large impact on emergency voting, but for this issue to actually have an effect, the voting has to be several minutes or even less (again, it depends on how many blocks is meant by

Moreover, is there a material loss besides paying the gas for a failed transaction?
@WangSecurity thanks for your comment and questions.
Multi-block here refers to the MEV type. It's the ability to extract MEV - to control the order of transactions - in two or more consecutive Ethereum blocks.
Every attack blocks one user transaction (and wastes their gas) for at least 1 minute (6 blocks until cooldown), but the total impact on voting accumulates over a longer period, because different users (with different VDs) vote and are griefed at different times during the time-sensitive voting period. Please see the more detailed scenario below.

Detailed scenario for selective governance attack (Scenario 3 in the report)

I'll expand the scenario using the recent attack on Compound DAO, and the previous attacks on Balancer DAO and SushiSwap DAO. Humpy's attack on Maker DAO:
The two impacts are demonstrated here:
Rational users

Expanding on why the sponsor response ("there's a workaround") does not address the issue. Because users are rational, they will not use the workaround proactively: it is more expensive, more complex, and takes more time. If the attacks are not frequent enough, rational users will just default to submitting their intended transaction. This is because when they submit, their wallet's simulation will pass. Only after they submit will it be MEVed to revert.

Griefing is in scope
The griefing loss is material and substantial in itself (separately from the governance attack) - possibly more than 5% of the users' value (up to 630 USD in the PoC). Furthermore, the griefing attack is explicitly in scope by both Sherlock and MakerDAO-specific scope rules:
Because voting is involved, the transaction is clearly time-sensitive for contentious, emergency, and even regular votes.
As shown in the report, the damage ratio for this attack exceeds this low threshold, at up to 3200% in the PoC. This is because calling RH is much cheaper for the attacker than the damage the attack does to the user. To conclude, the losses are material and likely for both impacts. The sponsor's response does not address or mitigate the attack. The griefing loss by itself explicitly qualifies under the scope rules.
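The cost-to-damage ratio argument can be made concrete with a back-of-the-envelope calculation. All gas and price figures below are illustrative assumptions, not measurements from the PoC; only the shape of the calculation (attacker's cheap reserveHatch call vs. the victim's reverted transaction) follows the report.

```python
# Back-of-the-envelope cost-to-damage ratio for the griefing attack.
# All gas figures and prices are illustrative assumptions, not PoC data.
GAS_PRICE_GWEI = 100     # assumed congested-mainnet gas price
ETH_USD = 3000           # assumed ETH price

attacker_gas = 50_000    # assumed gas for a reserveHatch() call
victim_gas = 400_000     # assumed gas consumed by the reverting lock() tx

def usd(gas: int) -> float:
    """Convert a gas amount to USD under the assumed prices."""
    return gas * GAS_PRICE_GWEI * 1e-9 * ETH_USD

attacker_cost = usd(attacker_gas)   # ~$15 under these assumptions
victim_loss = usd(victim_gas)       # ~$120 under these assumptions
damage_ratio = victim_loss / attacker_cost

print(f"attacker pays ~${attacker_cost:.2f}, victim loses ~${victim_loss:.2f}")
print(f"damage ratio: {damage_ratio:.0%}")  # 800% under these assumptions
```

The PoC's 3200% figure corresponds to different (contest-specific) gas and price inputs; the point here is only that the ratio is a multiple of 100% whenever the reverting transaction burns far more gas than the reserveHatch call.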
Firstly, the loss of gas is not a "loss of funds". Secondly, about the multi-block MEV: it allows DOSing the transaction for 2-3 blocks. To cause a larger impact or a constant DOS, the attack has to be repeated. Hence, it falls under this rule:
Thirdly, about this function being time-sensitive: as far as I know, emergency voting lasts at least a few hours. Hence, DOSing users for 2-3 blocks is not enough to cause a discrepancy in voting leading to a loss of funds. I believe this issue is low severity, since it's about blocking the voting process for 2-3 blocks. Planning to reject the escalation and leave the issue as it is. Note: I see there's a Humpy scenario that leads to a "loss of funds" via influencing the voting process. But it should've been in the report. The report only shows a loss of gas and how users can be DOSed for 2-3 blocks.
@WangSecurity thanks for your comments, and I apologize for the long responses, but I have 4 comments on the above and would be grateful for your opinion on all 4.

1. Humpy attack in original report
It is in the original report, please see "Attack path > Scenario 3", it describes this exact scenario.
And then also in "Impacts":
2. DOS duration is 6 blocks
The blockage (DOS) is for at least 6 blocks, or 1 minute and 12 seconds. This is because once

3. Time-sensitivity constraint
As the blockage is for at least 1 minute and 12 seconds (6 blocks), a "last minute vote" can be blocked. While last-minute votes are the strictest constraint, they are nevertheless very plausible. As support, I'd like to cite your opinion in #107
The constraints in that issue (rare adjustment of dust + rare liquidations + liquidation of a just-right-dusty amount) are more extreme than the most extreme constraint in this issue: voting in the last minute. Last-minute voting is almost certain to occur for some users during any voting period. These users can be blocked with a 30% chance using this attack. Additionally, time-sensitivity on a longer time scale (e.g., the last hour of voting) is also demonstrated for scenarios involving repeated blocking of a large number of users (Scenario 3), due to the aggregated effect on a large number of users.

4. Gas fees as loss of funds
Can you please explain why a user losing a substantial amount of ETH due to malicious MEV is excluded from loss of funds? I see it explicitly included in both Sherlock and MakerDAO contest rules, as cited above in the "Griefing is in scope" section of my previous message.
Note that the rule says the issue is only valid if the function is time-sensitive, not because the user pays too much gas for a failed transaction. Here, I believe a DOS of 6 blocks for voting is not medium severity. Even if it's the last minute of voting, I believe 6 blocks is low severity. The decision remains: planning to reject the escalation and leave the issue as it is.
@WangSecurity I will respond within a day or so, thanks for your patience.
1. Time-sensitivity
This belief contradicts the rules, and is unsupported by any argument. The rule says: "only if the function is clearly time-sensitive, it can be a Medium severity issue". "Clearly time-sensitive" applies to "1 minute 12 seconds" just as to any other duration. Time-sensitivity is not a duration; it is a difference in outcome because of time passing:
While both short- and long-duration attacks are presented, both are time-sensitive, since both lead to loss of funds due to blocking functionality in a time-sensitive way.

2. Loss of gas fees
This claim also contradicts both sources of truth for the contest - the Sherlock rules and the MakerDAO contest docs (as detailed above). Neither source of truth contains an exclusion of this type of loss, and both explicitly cite specific constraints to satisfy for validity: time-sensitivity, and cost-to-damage ratios. Both constraints are met. Additionally, while not official sources of truth, these past findings further show the claim is wrong not only in principle, but also in practice. In all these issues the only impact was loss of gas fees; some were even resolved by you personally as loss of funds:
3. Scenario 3 - still unaddressed
This scenario was overlooked, and is still not addressed, despite the apology. Scenario 3, "Humpy's" attack, was discussed in the report, and then in further detail in a comment. It shows time-sensitivity and two impacts: losses to MakerDAO and to large numbers of users. The scenario is highly likely, as shown by the attacks on Compound, Balancer, and SushiSwap. Additionally, this scenario's time scale is significantly longer, and its impact is larger: both on more individual users and on the protocol itself (loss to a governance attack). Given the overlooked Scenario 3, the clear time-sensitivity of the issue as defined by the rules, and the fact that gas fee losses are considered a valid loss of funds by the rules, I respectfully request a reconsideration of your opinion. The evidence presented demonstrates that this vulnerability is, by all rules, a medium severity issue, posing significant risks to both individual users and the protocol as a whole.
Excuse me for the confusion, let me elaborate on what I mean. The duration of the DOS is only 1 minute 12 seconds. Emergency votings last at least a few hours. In that context, I believe that blocking a several-hour voting for 6 blocks is not time-sensitive. Even if it's last-minute voting, I believe that in the context of a vote lasting several hours it's not time-sensitive.
This doesn't contradict any sources of truth; no source of truth claims that griefing for gas, i.e. making the victim pay for the failed transaction, is a valid loss of funds. Gas is an essential part of making transactions, and each user has to pay gas even if the transaction fails. I believe paying gas is not a medium-severity loss of funds, especially when all other funds of the user are safe. About the Sherlock rules and MakerDAO contest docs: I may not have elaborated well enough last time. Griefing a transaction to fail can be valid not based on the gas paid, but on the larger impact it has, e.g. trading: trading DOSed for even 1 block can be significant. In this case, it's not; even though we consider Ethereum mainnet, the loss of gas fees is not considered a valid loss of funds.
Most of the findings you shared were not judged by me, but from what I see, the problem there is not the gas paid, but the fact that it should've been repaid to the keeper (e.g. in the Elfi protocol) but wasn't. In this case, the gas fees shouldn't be repaid to anyone; hence, it's not a loss of funds but how the blockchain works. About the 2 findings that were judged by me (2 escalations from Elfi): they do not talk about loss of gas fees, but about lost keeper fees that should've been paid. Hence, they're irrelevant to this discussion.
Indeed, reflecting on it now: The decision remains as it is, planning to reject the escalation and leave the issue as it is.
@WangSecurity Thank you for your detailed explanation. Your position is much clearer now. Some crucial points given your clarifications:
While the voting period is longer, users do not expect to be blocked, so they will vote at different times during the period - this is the assumption. Users cannot be assumed to vote ONLY before the last minute.
While each instance is short, the attack is repeated throughout the voting period, leading to cumulative impact. This could significantly sway voting results, especially in close decisions, as shown in the attacks on Compound, Balancer, and SushiSwap.
You mentioned griefing can be valid "based on the larger impact it has". Influencing governance decisions, controlling the entire protocol, is one of the largest impacts possible, potentially affecting millions in protocol funds. This aligns with that criterion for validity. In this respect, Scenario 3 is not griefing, but a direct and profitable loss of protocol funds.
The rules don't explicitly exclude gas fees as a valid loss. It is arbitrary to decide this isn't a "loss of funds" when users are losing real ETH due to a vulnerability. From a user's perspective, this is a financial loss due to a protocol vulnerability allowing an attacker to block their transactions. This is not just "how blockchain works" if a specific vulnerability enables the loss. For example, in the Elfi findings you mentioned, there is no difference between a keeper losing gas fees due to a revert (as in this report) and them not being sufficiently refunded for paying gas fees. It is an arbitrary distinction - like saying that "-10" is a loss, but "0-10" is not a loss:
It is the exact same loss of funds; the only difference is whether the gas is lost with a revert or without one. Your interpretation of this loss of funds being excluded is not consistent with all of:
Again, I cite previous accepted findings for this loss not as sources of truth, but as evidence that your current interpretation of the rules is inconsistent with the rules and with how they are interpreted by other watsons and judges (including yourself). This distinction between gas fees lost to a revert due to a bug and gas fees "not refunded" due to a bug is arbitrary, and is not even semantic - it's 100% the same in every way. I respectfully request a reconsideration based on these arguments. Thank you for your time.
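The cumulative-impact argument from the comments above (repeated short blocks aggregating across many voters during one voting period) can be sketched as a toy simulation. Every parameter below is an illustrative assumption - voting length, attack budget, and voter behavior are not measured values - and the model ignores the hatch cooldown, so it only illustrates the aggregation effect.

```python
import random

# Toy model of repeated griefing across a voting period.
# All parameters are illustrative assumptions, not measured values.
BLOCK_TIME = 12                          # seconds per mainnet block
VOTING_BLOCKS = 4 * 3600 // BLOCK_TIME   # a 4-hour voting window = 1200 blocks
DOS_BLOCKS = 6                           # one griefing instance blocks ~6 blocks (1m12s)
ATTACK_INSTANCES = 50                    # griefing instances the attacker funds

def blocked_fraction(num_voters: int, seed: int = 0) -> float:
    """Fraction of voters whose first vote attempt lands in a griefed block,
    assuming voters submit at uniformly random blocks and the attacker
    concentrates griefing at the end of the voting period."""
    rng = random.Random(seed)
    attack_start = VOTING_BLOCKS - ATTACK_INSTANCES * DOS_BLOCKS  # last 300 blocks
    hits = sum(rng.randrange(VOTING_BLOCKS) >= attack_start
               for _ in range(num_voters))
    return hits / num_voters

# Under these assumptions the griefed window covers the last 25% of the
# voting period, so roughly a quarter of first attempts revert and those
# voters must notice the failure and retry in time.
print(f"{blocked_fraction(1000):.1%} of first attempts griefed")
```

The individual 1m12s DOS and the aggregate effect are different quantities in this model: the first is what any single voter experiences, the second is the share of the electorate that must retry (or gives up) during a contested vote.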
Still, my decision remains. If the voting is DOSed for 1 minute 12 seconds out of several hours of voting, it's not time-sensitive; even if it's the last minute, it's not time-sensitive.
The impact of this attack is a DOS of voting; for this to have a large impact, the attack has to be repeated perpetually. Hence, we should consider one instance of the attack, based on the rules I've already quoted above, i.e. we've already discussed repeatability above.
The above paragraph about repeatability also applies here. With one instance we have only a 1 minute 12 second DOS of voting, which I believe doesn't influence governance decisions or control the entire protocol, because even emergency voting lasts at least several hours. Moreover, I don't see where Scenario 3 in the report mentions a loss of funds, and DOSing the voting doesn't directly lead to a loss of funds, especially when the DOS is 1m12s long. Yes, it's not directly said that the loss of gas fees is excluded, but the decision is still that the loss of gas fees is low severity, because all the funds of the user remain untouched. Paying gas for a failed transaction is how the blockchain works, even if the user is forced into it. Also, making the user pay for 2 transactions instead of 1 is not a medium-severity loss of funds. About the comparison with Elfi: this is the last time I respond to these arguments, because historical decisions are not sources of truth, as stated in our rules, and have no effect on the judging outcome.
That's the reason why historical decisions are not sources of truth. Elfi and Maker are completely different protocols, with completely different designs, contracts, functionalities, etc. That's why "similar" issues may be judged differently in different contests. Elfi's design was to refund the gas paid by the keeper for completing the transaction, i.e. the keeper made a transaction and must receive back the gas fee they paid, but they didn't. That's why it was valid. I don't see how it's similar to the issue here; they are completely different. Hence, the following statement is incorrect:
Hope the above answers how this line is irrelevant here and why the issues on Elfi were judged the way they were.
Note that based on the rules not every loss of funds warrants M/H severity.
These are not sources of truth. Above, I explained why on Elfi they were valid. Also, I think you missed the part of my previous comment. The only issues from Elfi that you mentioned and I judged didn't mention loss of gas fees in any place and the impact wasn't related to gas fees in any way.
Again, the protocols have different designs. The current interpretation is consistent with the rules and how they were always interpreted. The issues on Elfi are valid NOT because it was a loss of gas for a failed transaction as in this case.
It's not even close to being "100% the same in every way". The Elfi protocol had broken functionality which resulted in users not getting funds they should have received. Here, it's nothing close to that. Again, the decisions in any other contest are not a source of truth and don't affect judging in other contests. Refrain from using them as an argument. But even if you do, please check the designs of the protocols you want to reference, and check how they should and shouldn't work. The rule "Design decisions are not a source of truth" exists for this exact reason: different codebases have different designs; hence, in one contest it's a valid issue, in another it's invalid, and in a third it's a design decision. The decision remains the same: reject the escalation and leave the issue as it is.
Thank you for your response. However, I find that it largely repeats earlier claims without addressing the core counter-arguments I presented or adding supporting evidence for your claims. Specifically:
These points align with the criteria for three distinct vulnerabilities of medium severity:
I respectfully request that you engage with my arguments and provide support for yours, instead of repeating the disputed claims. In my opinion, if no such arguments can be found, it is highly likely that your conviction is misplaced, and the issue should be accepted based on any or all of the three distinct categories.
However, you also repeat the very same arguments several times. Specifically:
Again, I disagree it's medium severity:
Lost gas fees due to a griefed transaction are not a valid loss of funds and never were.
I disagree this DOS of 1m12s is sufficient to have a governance manipulation.
It's not a time-sensitive DOS. To finally settle the discussion: that lost gas fees due to griefed transactions are not a valid loss, and that this situation is not time-sensitive, is the decision, not a debatable question. Also, before accusing me of repeating the same arguments, please refrain from repeating your arguments as well. The decision remains the same: reject the escalation and leave the issue as it is.
I apologise for causing you this frustration. The problem I was trying to resolve was the repetition of claims (as opposed to arguments), without addressing or countering my arguments, which is why I felt I had to repeat them: to try to get to why you disagree and how it aligns with the rules. This is the pattern I was seeing:
However, I'd still like to clarify one part, that appears to be an important misunderstanding.
I am not assuming only high voting-power users are blocked. This is a misunderstanding of the cumulative effect - the source of the "extra voting power":
For the protocol's loss of funds, the amount of "extra voting power" the attacker gains is not time-dependent in this case. It depends on the fraction of users deterred from re-voting by blocking, plus the fraction of voters who vote in the last minute and so cannot physically re-vote. If 1000 users vote, and 300 are deterred or blocked, and together they control 10% of the relevant voting power, this is a highly significant fraction for the many close-call governance attacks I cited. Looking at the criteria for Medium again (not High!), I think it's fully justified:
Oh I see, excuse me. Then yep, highlight what hasn't been answered or clarified.
I understand it's not exactly what I meant by targeting specific voters. I was just trying to show, even if one user is targeted, why they would give up their vote after a 1m12s DOS. And I mentioned high-voting-power users because they're better to DOS, i.e. it's better to DOS someone with 100 voting power than someone with 1.
Yep, it's fair to say that someone won't vote in that case, but it would be speculation to say who exactly and how many people. Hence, we cannot quantify the effect here, even though I agree it's fair to assume someone won't vote.
Yep, I agree that there will be voting at the very end. Excuse me if my previous messages sounded like there won't be such a case. The problem is that even in that case I don't consider "voting" to be time-sensitive. It sounds like I repeat myself, and I'll try to elaborate on why I do it. It's my decision that "it's not time-sensitive", and you bring new arguments, but my decision remains as it is; hence, I repeat myself, because this is still my decision. Hope that makes sense.
Yes, I understand, but I still stand by my initial decision that the DOS of 1m12s for several-hour voting, even for the last-minute, is not time-sensitive and low-severity.
Excuse me if my messages seemed to claim its design decision, I don’t.
Still, I believe the DOS of several-hour voting for a little longer than a minute doesn’t pose medium-severity outcome and it’s my decision.
Agree
Excuse me, then it was the misunderstanding from my side and I thought you claim it to be time-sensitive.
Again, my thought was that you claimed the loss of gas fees as a medium impact. Still, my decision remains as it is: I believe DOSing an at-least-several-hour voting for 1m12s doesn't pose a Medium severity impact on the protocol, and the issue should remain as it is. Planning to reject the escalation. Note: I didn't quote and respond to a couple of your paragraphs, not because I overlooked them or don't want to address them; I don't have anything to add, and they look like a scenario explaining a possible outcome. It's viable and I don't say "it won't happen".
@WangSecurity thank you, I think we have a much better shared understanding now. However, I think there was confusion about my claim and which impact I was arguing for. Previously I was claiming all three impacts:
In my previous comment I stopped arguing for 1 and 2 due to our disagreement about time-sensitivity and gas fee loss. I focused only on impact 3 - a medium severity loss of funds for the protocol - for which we don't need to agree on time-sensitivity or gas fee loss. This is because for impact 3, time-sensitivity is not a validity criterion (it's not DoS or griefing): it's a profitable attack.
I think that quote shows it became clear only at the end of my comment. However, the misunderstanding made most of your responses before that irrelevant. You responded to most of my arguments in terms of the validity of DOS and time-sensitivity, instead of loss of funds:
Finally, in the context of this mismatch, your last paragraph, reiterating your decision, seems incorrect:
I'll now address the comments that are relevant to that impact, add data, and restate the case for clarity. You agree with the following scenario arguments:
The only disagreement I see:
To address this, I'll both quantify the effect, and quote some relevant available numbers. The gained "extra voting power":
How large do N + M need to be for flipping the result?
The validity criteria for medium loss of funds:
For these reasons, I think the validity criteria for this impact, at medium severity, are thoroughly satisfied.
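The "extra voting power" quantification argued for above can be written out explicitly. The voter counts, power shares, and margin below are the illustrative figures used in this discussion (1000 voters, 300 affected, 10% of relevant power), not measured MakerDAO governance data.

```python
# Quantifying the "extra voting power" from suppressing opposing votes.
# All numbers are illustrative scenario parameters, not MakerDAO data.
def suppressed_power(n_blocked: int, m_deterred: int, power_per_voter: float) -> float:
    """Total share of voting power removed from the opposing side, given
    N physically blocked (last-minute) voters and M deterred voters."""
    return (n_blocked + m_deterred) * power_per_voter

def flips_outcome(margin: float, suppressed: float) -> bool:
    """A vote decided by `margin` of total power flips if at least that
    much opposing power is suppressed."""
    return suppressed >= margin

# Example from the discussion: 300 of 1000 voters blocked or deterred,
# together holding 10% of the relevant voting power.
share = suppressed_power(n_blocked=100, m_deterred=200, power_per_voter=0.10 / 300)
print(f"suppressed power: {share:.1%}")
print(flips_outcome(margin=0.04, suppressed=share))  # a 4%-margin vote flips
```

The point of the calculation is that the flip condition depends only on the suppressed power share versus the vote margin, not on the duration of any single DOS instance.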
Thank you for this thorough analysis, and excuse me that I didn't reflect on Scenario 3 clearly enough.
That's fair, but we still have to remember that
But in the case of Humpy, the page you linked several times doesn't mention the scenario from this report. In the case of Humpy, it is (simply put) a whale that can acquire a large share of the governance token and influence the voting. This by itself is not an issue, but a trade-off of the governance system as a whole. I think we both understand this part; I just wanted to make it clear. As I understand, you use this "attack" to amplify yours: that this whale can even DOS other voters and gain even more advantage. Also, some votes have a very small number of holders competing, and for these small votes you could just be a whale (like Humpy) and be free to influence the final decision. As for the points about it being a Medium severity issue, I see that getting control of the voting can easily cause a large loss of funds. The problem is that, from my point of view, the DOS of 1m12s of even a several-hour voting, i.e. the way the attacker gains the additional voting power, is not sufficient to gain control of that vote and influence the voting outcome. Hence, I believe it's low severity, because it only allows DOSing voting for 1m12s, which is basically no impact if it's not the last minute (the user can freely vote after it), and even if it is last-minute voting, unfortunately, I still don't believe it's sufficient to influence or gain any control of the governance. Hence, the decision remains the same: reject the escalation and leave the issue as it is.
Sorry for repeating, but the DOS duration itself is not important for this impact. The fact that users have the ability to re-vote, but will rationally not all use it, doesn't help the protocol's loss. We already agreed on that, I think:
I think you misunderstand what I'm establishing with citing Humpy's attacks. I'm establishing the likelihood of the preconditions of my scenario:
Crucially, buying voting power is not an attack. Voting power gained by buying tokens is 100% legitimate. However, voting power gained by blocking opposing votes, is an attack.
Thank you. I think this is the most important point.
This contradicts the evidence I cited above in this and previous comments:
In my understanding, the bar for medium in this case is:
@WangSecurity since both criteria are satisfied, I think your reluctance to accept this as medium severity is not a correct application of the rules. The attack allows otherwise impossible and illegitimate voting outcome manipulation that, under constraints shown to repeatedly occur, will result in protocol losses above the 0.5% threshold. I do not understand how this can be judged below medium according to the rules of the contest.
Impact 3 is the same as 3.1.1 from https://github.com/sherlock-audit/2024-06-makerdao-endgame/blob/main/vote-delegate/audit/20240703-cantina-report-maker-vote-delegate.pdf, which is a Low, where it says Note:
@IllIllI000 thank you very much. As I understand, not only is the impact the same, the issue is the same. Both this and 3.1.1 from the Cantina report talk about the attacker calling

Hence, it's not only low severity, but also a known issue from the previous audit. The decision remains the same: reject the escalation and leave the issue as it is. @mrhudson890 if you want to further discuss this issue and have me elaborate on your last comment, contact me on Discord, since it's a known issue and cannot be accepted here.
@IllIllI000 That issue describes a completely different scenario and impact - statistically spamming It is indeed low severity:
The scenario I'm describing allows targeted blocking of only the specific transactions, via MMEV, that users have already signed and sent:
Not only are impacts different, the root cause and mitigations are different:
@WangSecurity So while the issues sound related, they don't share a root cause, impact, or mitigation; therefore that issue cannot invalidate this one. Another way to see it: if my issue is mitigated, their issue remains unchanged. I hope you will also respond to my previous comment, which addresses your latest arguments and decision on severity.
I believe in both cases the root cause is the same, which is calling
@WangSecurity I'm sad to see that you have ignored the arguments entirely, and only reiterated the claim these arguments refuted. Do you find my arguments incorrect or irrelevant? If so, why?
This is not the root cause in 3.1.1. These two findings describe different fixable flaws (root causes) and recommend different fixes. 3.1.1 does not care about the timing of the hatch window within the cycle. Instead it recommends: "A reasonable HATCH_SIZE should be chosen that gives liquidators enough time to perform the liquidation, as well as a reasonable HATCH_COOLDOWN such that the lock-blocking ratio remains low while still allowing frequent liquidations". Their root cause is the HATCH_SIZE to HATCH_COOLDOWN ratio; they only point out that the spam attack blocks a certain ratio of blocks (20%), and their "flaw" is that this ratio should not be too high. Their scenario is statistical spam, which is not timed in any way relative to the users' actions ("Assuming locks are randomly distributed over time"), which means the timing of the hatch within the cycle makes no difference in that issue. In contrast, my finding's recommendation is: "introduce a longer delay before the hatch takes effect. Instead of starting from the next block, it should start after M blocks, where M is a governance-controlled parameter." This finding's root cause is that the hatch window starts immediately; I do not recommend any changes to HATCH_SIZE or HATCH_COOLDOWN. My finding's flaw is the timing, not the ratio. Another way to see that the findings do not share a root cause: if mitigated, the two findings are entirely independent from one another:
If mitigation of one finding does not mitigate the other - they do not have the same root cause. The additional arguments for why these findings are independent (not duplicates), as explained in the comment above that are also important:
This report is not a known issue: different root cause, mitigation, scenarios, impacts. |
I still believe they have the same root cause. If we look at the description of Cantina's finding:
I believe the issue has the exact same root cause as I've said in the previous message:
And:
The latter part of Cantina's finding description is the impact and how many locks would be blocked. The root cause of this issue:
It's not as short as my description of the root cause or Cantina's but I believe these 3 sentences from the report can be rephrased in my definition of the root cause:
Hence, the two issues have the same root cause. Yes, they show different scenarios and thus different impacts, but the root cause is still the same. Therefore, it's a known issue; planning to reject the escalation and leave the issue as it is. |
Also note that the different fixable flaws that you've pointed to, according to Sherlock's definitions, are attack paths and not root causes. There can be multiple attack paths for the same root cause. |
@WangSecurity Both findings must describe the original

While Sherlock's definition is a bit vague, it is clear that a root cause should be uniquely identifiable (there should be one "root"), and therefore "primary" or "fundamental" must mean the most specific of all.
Your version of the root cause is neither the most fundamental, nor does it uniquely match the unwanted outcome in either of the findings. In particular, 3.1.1 discusses the primary factor of the HATCH_SIZE to HATCH_COOLDOWN ratio, which imposes the unwanted outcome of being able to statistically block 20% of blocks. They propose to choose that ratio carefully, without altering anything else (like hatch delay). The fundamental factor and the outcome are clearly matching. My finding discusses the primary factor of the hatch delay being too small (=1) after the call, leading to the unwanted outcome of being able to front-run transactions via MMEV and cause them to fail after being submitted. I propose to fix it by moving the hatch start, without altering anything else (like size, cooldown, and total cycle length). The fundamental factor and the outcome are clearly matching. From broad to specific properties of each described system:
This understanding of what can be primary or fundamental is equivalent to the "fixable flaw" definition I used, which is more easily operationalized (used). Clearly, considering the VoteDelegate contract as the flaw is too broad, and hatch_delay=1 is too narrow. These two definitions result in the same conclusion: the two findings' root causes are separate.
The flaws I point out are not attack paths, because you can't fix attack paths: In my finding the hatch delay is the flaw, and the fix. It is not the attack path - MMEV is the attack path - I do not propose to "fix" MMEV. In 3.1.1 the size/cooldown ratio is the flaw, and the recommendation. It is not an attack path - spamming RH on all VDs is the attack path - they do not propose to fix the "ability to spam". |
Thank you for this extensive description. But I still believe what you describe is the attack paths, and the actual root cause of both findings is what I've described in my previous #95 (comment). It's the primary factor of both issues; what you describe is different attack paths that lead to different outcomes. I see you've said I didn't address each of your points separately; that's because I don't see how it adds value to the discussion if I just quote your comment and say "it's incorrect, this is not the root cause, these are different scenarios triggered by the same root cause". The decision remains the same, reject the escalation and leave the issue as it is. |
You seem to have missed my whole response to the "the attack path" argument above. Here it is again, in more detail. Sherlock's definitions:
Now please see what I say the root causes are: 3.1.1:
My finding:
The unwanted outcome is the attack or related to it, but the root cause is the reason the outcome is possible. It's the "fundamental reason that imposes an unwanted outcome". How can one possibly confuse the "bug" - part of logic or code, with the attack path - what the attacker does? How can one possibly interpret "hatch delay is too short" to be "the attacker's actions"? It's in the code. And it's the only thing that allows my issue's attack path. Or "hatch size / cooldown ratio" to be "the attacker's actions"? It's in the code. And it's the only thing that allows 3.1.1's attack path. |
There's nothing that says 20% of blocks not allowing locking is not wanted. That in and of itself is not a vulnerability. The unwanted outcome is being able to block other people's votes
Based on the above, you've written:

In both cases, given the code as written, isn't the fact that calling reserveHatch() prevents calling lock() for a certain number of blocks the reason both attacks are possible? Would either attack be possible if it didn't block calls to lock()?
|
Here's how I see these 2 issues. Root cause: The attack path:
The impact:
The mitigation:
Indeed, the reports suggest 2 different fixes, but they fix the attack scenarios, not the root cause. Your report fixes the front-running calls to

In other words, both fixes don't mitigate the block calls to

That's why I believe it's a known issue: even though the two reports address 2 different scenarios, the issue is still the same. Planning to reject the escalation and leave the issue as it is. |
@WangSecurity This is NOT the vulnerability, it is the mechanism. Blocking one or the other direction from the chief is the purpose of reserveHatch. Blocking "in general" is not a root cause - there is no way to avoid "some" locking due to chief's limitations. It is the same as saying that Blockchains, Ethereum, Maker, the VoteDelegate contract - are all root causes because without them there is no bug. Each of these, despite being A cause, is NOT the ROOT cause. There is no way around blocking locks (and therefore votes) to chief for some of the time - this IS the reserveHatch mechanism. The root cause is:
This is the minimal (fundamental) reason that allows the outcome, and this is what needs to be mitigated to prevent it - otherwise it didn't fundamentally impose the unwanted outcome:
First, "novelty" doesn't even matter, because the absence of some code can be a root cause. The absence of whitelisting, access control, input validation, slippage protection - are all valid root causes - regardless of being present originally. Second, this claim is factually incorrect. In the current code there is a hatch delay - it is 1 block. If there was no delay, the hatch would start from the hatch trigger, but it applies from the next block. The reserveHatch block itself is a cooldown block (reserveHatch cannot be called in the same block again). For clarity, the reason it is done so originally - 1 block delay - is that 0 delay would be vulnerable to regular MEV (frontrunning). The delay is already there to prevent MEV, but it fails for MMEV. Note also in my mitigation diff:
I think this is an illogical understanding of what a root cause is. You cannot "fix a scenario but not the root cause" - it makes no sense. If the finding is fixable, its fix removes the root cause; otherwise it was not the "fundamental / primary" property, and it did not impose the unwanted behavior.
Again, this misunderstands the design. Blocking calls to locks is a necessary feature of the design. What you suggest - "

To further drive home the difference between the two issues, both root causes and unwanted outcomes, please see these different hatch blocking cycles ("B" is blocked, "C" is open and in cooldown), each starting from the RH call. As 3.1.1 notes a <20% ratio as too little, we'll use less than 20% as "bad" in either extreme.

Current (delay 1, size 5, cooldown 20) - note that it starts with cooldown (cannot call RH, and B starts on the next block):
Delay 0:
Delay 3:
Size too short, delay ok:
Cooldown too short, delay ok:
Ratio ok, delay too short:
Cooldown too short, delay ok, mixed close and open:
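The cycle strings above can be regenerated with a minimal sketch. It assumes the "B"/"C" notation defined above, plus my own "o" for pre-hatch delay blocks in which locking is still open; the RH-call block itself is not rendered:

```python
def cycle(delay: int, size: int, cooldown: int) -> str:
    """Render one reserveHatch cycle, starting at the block after the RH call.

    "o" = pre-hatch delay blocks, locks still allowed (my own notation)
    "B" = hatch open, locks Blocked
    "C" = hatch closed / Cooldown, locks allowed, RH cannot be re-triggered
    """
    return "o" * (delay - 1) + "B" * size + "C" * cooldown

# Current parameters (delay 1, size 5, cooldown 20): a 25-block cycle
print(cycle(1, 5, 20))  # 5 "B" blocks followed by 20 "C" blocks
# With a delay of 3, two open blocks precede the blocked window
print(cycle(3, 5, 20))
```

This is only a visualization aid for the comparison above, not code from either report.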
As you can see, the vulnerabilities are completely unrelated beyond the fact that both are concerned with the reserveHatch mechanism.

@IllIllI000 Kindly support and explain your arguments. You've previously insisted on confusing attack path and root cause - one being implemented in code and the other being actions done by the attacker. Now you're making other confused but confident claims like "You yourself have that in your root cause in the submission", quoting out of context a single sentence from a whole section. These are troll tactics and unbecoming of the 2nd place on the leaderboard. You also now "mistake" the mechanism for its vulnerabilities:
Again, there is no way around blocking locks (and therefore votes) to chief for some of the time - this IS the reserveHatch mechanism. Maker understands it, Cantina understands it, even I understand that. You are ignoring this. The locking is the design, without it there is no reserveHatch - it's its only purpose. The vulnerabilities are in the details of the blocking cycle. Cantina focuses only on the ratio of size/cooldown - to allow sufficient total time for each one-way traffic, and I speak to the timing of hatch start - to prevent MMEV.
It is not said that 20% is unwanted; it is said that the ratio can be too low, and that currently it's slightly below 20%. It says the two sides need to be considered, with one side (liquidations) not being too dominant (over voting):
They are NOT saying - "do not block locks" - because it is the core functionality of that mechanism. |
WangSecurity has been very patient with you, and has been trying to show you why this is a duplicate and invalid. Where I see that the rules are applicable and have not been mentioned, I try to mention them, since I contributed to writing them along with WangSecurity, and think I can help with that. Despite what you may think, I wasn't trolling you. It's an exact quote from your report, from the section I mentioned, and my intention was to show you that despite your assertion that it's not a root cause under your definition, it's the one Sherlock uses, and makes sense in the context of the report submission template.
I'm not confusing anything. Take a look here. I believe your claim is that only specific lines of code, or the minimal point fix of an issue, may be root causes (e.g. "These two findings have different fixable flaws (root cause) they describe and recommend to fix" above), but that's not the case. As I've highlighted, conceptual issues, as occur in this bug, can be root causes as well. Here, the
Given the above summary, both your submission and 3.1.1 are the same issue, just with different attack paths. You seem to be most focused on the MMEV aspect for the recent discussion, but you also list the same attack path as 3.1.1, in your
Just because the DOS protection exists, that doesn't mean that feature can't be modified, or some other new voting mechanism without the limitation couldn't be designed; they just made a design decision not to.
They added a mechanism that caused a bug (blocking votes). It doesn't matter that that's its sole purpose. |
Thanks to both of your comments. @mrhudson890 excuse me for misunderstanding what you meant by hatch delay; I see the 1 block hatch delay now. But still, these issues are duplicates. As @IllIllI000 correctly said above, this is the conceptual issue of blocking the locks after calling reserveHatch. Both scenarios are still possible due to this functionality. And, I see you disagree that the fixes in the two reports fix the scenario. But I still stand by my words. The fix in Cantina's report fixes the scenario of spamming RH to block locks, while your report fixes the scenario of front-running. |
@WangSecurity Your position is that the reserveHatch mechanism's blocking of locks is the bug / problem / mistake. However, this is a misunderstanding of the system:
It's like a traffic light for regulating traffic. Saying that blocking locks is a "conceptual mistake", "problem", "mistake", "bug" is like saying that a traffic light's blocking of traffic during a red light is a bug / mistake / problem. But it is ALL it's there to do. And it must be done, just like blocking conflicting traffic. Otherwise the system is vulnerable in another way. If you remove the chief's flashloan protection - you break voting; if you remove the reserveHatch - you break liquidations. This is the mechanism that balances these constraints, and it can only do it by blocking locks. You suggest that blocking is avoidable. This is incorrect. Maker's devs, Cantina's and ChainSecurity's auditors, and all the participants in this contest haven't been able to suggest an alternative mechanism. Given that the need for blocking locks is a necessary goal and purpose, with no safer alternatives, it cannot be a "mistake" or "bug", conceptual or otherwise. Therefore, I don't think it's intellectually honest to say that the single unavoidable goal and property of the mechanism - blocking locks - can be the root cause of two different avoidable issues with it. This claim could be considered if both issues were unavoidable. However, both are avoidable as shown, by addressing their specific root causes: hatch delay too short in this report, and potential size/cooldown imbalance for 3.1.1. In the examples (previous comment) I also show they are independently avoidable (each can be fixed without the other). Additionally, tying this back to the definition: the "reason" must impose the unwanted behavior. This factor cannot impose it, since both issues can be resolved with it in place - the locks are still blocked some of the time as needed. In contrast, the factors I outlined DO impose them, because they are necessary and sufficient to avoid / fix those issues. Therefore, the root causes, according to Sherlock's definition, are the ones I outlined.
Specifically, they are different for the two issues. |
The current on-chain vote delegation code does not have a |
@mrhudson890 I understand that it's how the protocol works and it's intended; still, it's the block of locks that allows for both attacks. Moreover, the team is aware of the risk that locking can be blocked (for 5 blocks). Hence, my decision remains, I still believe these 2 reports are the same issue and only different attack paths/scenarios. Additionally, it's a common case that there are 2+ scenarios from the same root cause and they all can be fixed individually, but they still come from the same underlying reason, which I believe is the case. |
Your first method breaks liquidations. Your second method breaks delegation because for a vote change an unlimited number of per-delegatee contracts will need to be called to update their vote in chief.
|
The decision remains, planning to reject the escalation and leave the issue as it is. I'll apply this decision in several hours. |
@WangSecurity after careful consideration of your position, I believe there's a crucial distinction being missed.
While it's true that multiple scenarios can stem from the same root cause (there are 3 scenarios in this report), this is not the case between these two findings. The core issue isn't that blocking occurs, but how it's incorrectly applied in each case. The reserveHatch mechanism must block locks, but the "how" is what these issues address - and they highlight completely different, unrelated, and independent "hows". Your current position - that the 'block of locks' is the root cause for both attacks - conflates the mechanism's intended function with its specific vulnerabilities. This is like saying that a car's ability to move is the root cause of both speeding and wrong-way driving accidents. While movement enables these incidents, it's not their root cause.
Furthermore, this interpretation contradicts Sherlock's definition: blocking itself does not "impose an unwanted outcome" as it remains in all cases (so does not impose), and is therefore not fundamental or primary. The specific, incorrect implementations of blocking are what impose the unwanted outcomes and are the fundamental distinct reasons.
The team's awareness of some risks due to locking does not include the voting manipulation via MMEV to gain extra voting power, as detailed in this report. This risk is not acknowledged in any of the extensive available sources of truth.
As shown in my previous comments (1, 2, 3), these issues share almost nothing in common beyond involving blocking locks (which is the unavoidable core of the mechanism). As an impartial arbiter of Sherlock's rules, I urge you to reconsider your position regarding the issue being a duplicate of 3.1.1, and instead judge it on its own merits. |
Thank you for another comment and expressing your position again. But my decision remains. I still believe that blocking of locks is what allows for both attacks, and these are only different scenarios/attack paths, as I explained in my several previous comments. It is blocking that leads to an unwanted outcome, i.e. the attacker blocks users from voting against them. Hence, my decision remains that these 2 reports are duplicates. When I said that this risk is known, I meant that the team is aware that voting can be blocked for a certain period of time, which is also the case in this finding. Hence, my decision remains as in several previous comments, planning to reject the escalation and leave the issue as it is. I will apply this decision in several hours. Note: let's refrain from using analogies (e.g. with the car above, with the traffic lights in the previous comment, or similar); they're irrelevant to the discussion. |
Result: |
Escalations have been resolved successfully! Escalation status:
|
mrhudson890

Medium

`VoteDelegate`'s `reserveHatch` allows multi-block MEV to grief LSE users during time-sensitive voting periods

Summary
LockstakeEngine (LSE) and VoteDelegate (VD) interaction allows malicious actors to exploit multi-block MEV (MMEV), which is possible for roughly 30% of blocks. This allows griefing users of significant funds and disrupting time-sensitive governance functions. By manipulating VD's `reserveHatch()` (RH) in the first block of MMEV, attackers can cause users' LSE transactions (`selectVoteDelegate`, `lock`, `lockNgt`) to fail during crucial voting periods, potentially leading to losses exceeding 0.5%, or even 5%, of user value in gas fees. This exploit not only results in financial losses but also prevents time-sensitive voting delegations, manipulating governance outcomes, especially in emergency situations. The non-upgradeable nature of the system amplifies the long-term impact given MMEV.

The cost to the attacker is minimal: 28K gas, while the cost to the user is much larger, e.g., 900K gas for a large multicall batch. At 200 gwei and 3500 USD per ETH, this translates to an attacker cost (3500×28000×200÷10^9) of 19 USD, and a user cost of 630 USD (more than 5% of 10K user value) in the case of a large multicall batch.

Root Cause
The VoteDelegate contract includes a `reserveHatch` (RH) mechanism that, when called, prevents locking (entering) for a set number of blocks (5), starting from the next block. This prevents single-block MEV griefing.

However, this does not prevent MMEV: MEV over consecutive blocks controlled by a single builder or proposer. This is exacerbated by the use of Multicall, which will cause multiple batched user actions to all fail, wasting a very high amount of gas fees.
The `LockstakeEngine` uses this lock function in its `_selectVoteDelegate` internal method, which is used throughout the `selectVoteDelegate`, `lock`, and `lockNgt` user-facing functions:

This design allows an attacker to exploit the `reserveHatch` mechanism to prevent users from executing these functions, particularly during critical voting periods.

This griefing vector is exacerbated by the fact that a Multicall batch, carrying potentially several gas-expensive user actions, will fail when one of the vulnerable actions is blocked. This is because calling RH can be as cheap as 28K gas, but the loss for the user can be much higher, potentially millions of gas units (depending on the batched operations). This creates a very high griefing factor (damage-to-cost ratio) of, e.g., 3200% for a 911K-gas batch (from the PoC).
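As a rough sanity check of these economics (assuming the report's figures: 28K gas for the attacker's RH call, the 911K-gas reverted 8-op batch from the PoC, 200 gwei gas price, 3500 USD/ETH), the numbers can be reproduced:

```python
GWEI = 10**9  # wei per gwei; also gwei per ETH when dividing gas * gwei-price

def tx_cost_usd(gas_used: int, gas_price_gwei: int = 200, eth_usd: float = 3500.0) -> float:
    """USD cost of a transaction: gas used, times gas price in ETH, times ETH price."""
    return gas_used * gas_price_gwei / GWEI * eth_usd

attacker = tx_cost_usd(28_000)   # reserveHatch() call (report figure)
victim = tx_cost_usd(911_424)    # 8-op multicall batch revert (PoC figure)

print(round(attacker))                 # ~20 USD for the attacker
print(round(victim))                   # ~638 USD lost by the user
print(round(victim / attacker * 100))  # griefing ratio ~3255%, i.e. the ~3200% above
```

The small differences from the report's 19 USD / 630 USD come only from rounding of the same inputs.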
The vulnerability is enabled by the current state of Ethereum's block production, where a small number of builders control a large portion of block production (54% for beaverbuild alone), enabling multi-block MEV attacks via builders (controlling many consecutive blocks) as well as individual proposers (controlling significant stake and occasional consecutive blocks). For example, Coinbase-controlled proposers control 15% of the stake right now.

If a builder controls 54% of the blocks, the chance that the next block is also controlled by them is 54%, which translates to a 54% × 54% ≈ 29% chance for any given block to start two consecutive blocks under their control. Thus, if supported by beaverbuild, MMEV would enable this vector for 29% of all blocks. If MMEV becomes widely adopted by MEV infra software, this could rise to as much as 60%, since MEV infra is used by around 80% of stake.
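The builder-share arithmetic above can be sketched as follows (assuming, as the report does, that block assignments are treated as independent with a fixed builder share):

```python
def mmev_share(p: float, blocks: int = 2) -> float:
    """Probability that `blocks` consecutive blocks are all built by a builder
    with market share p, under an independence assumption."""
    return p ** blocks

# beaverbuild's ~54% share -> ~29% of blocks start a 2-block MMEV window
print(round(mmev_share(0.54), 2))
# if MMEV spreads to the ~80% of stake using MEV infra, the share grows further
print(round(mmev_share(0.80), 2))
```

Real block sequences are not strictly independent draws, so this is an approximation of the 29% figure, not a precise model.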
While this finding focuses on `selectVoteDelegate`, `lock`, and `lockNgt`, it's worth noting that `free` and `freeNGT` can also be blocked by front-running with a deposit into the `VoteDelegate`, due to `chief`'s flashloan protection. This opens even more MMEV options, due to the ability to block multicalls selectively, by either targeting the `selectVD / lock` actions or targeting the `free` actions for different batches. Both attack paths are linked via the tradeoff imposed by the `reserveHatch` and `chief` functionality between only being able to either free or lock during a single block.

Internal pre-conditions
External pre-conditions
Attack Path
Scenario 1: Builder-based MMEV Attack
reserveHatch()
call to the target VoteDelegate.lock()
in the VoteDelegate.Scenario 2: Proposer-based MMEV Attack
reserveHatch()
call.Scenario 3: Selective Governance Attack
reserveHatch
attack during the voting period when possible (outside of cooldown periods) against users trying to vote against the attacker's proposal.

Impact
Loss of funds: Users lose significant gas fees when their time-sensitive transactions fail. Depending on gas prices and the complexity of multicall transactions, losses could amount to hundreds of dollars per failed transaction. For example, at 3500 USD per ETH and a 200 gwei gas price:

- `selectVoteDelegate` revert: 101K gas, 70 USD

Governance Interference: The attack can prevent users from updating their vote delegations during critical periods, potentially swaying governance decisions. This is particularly impactful for emergency proposals that require quick action.
Long-term Participation Decline: Consistent attacks could discourage users from participating in governance, leading to more vulnerable governance due to lower honest user participation.
Systemic Risk: Given that the LockstakeEngine is designed to be non-upgradeable and used for multiple years, this vulnerability poses a long-term risk to the system's integrity and decentralization due to the likelihood of MMEV prevalence.
PoC
The PoC measures the gas costs for:

- `reserveHatch` (attacker): 28K gas
- `selectVoteDelegate` revert: 101K gas

The PoC adds tests to the `LockstakeEngineBenchmarks` class in the `Benchmarks.t.sol` file in this measurement PR https://github.com/makerdao/lockstake/pull/38/files of branch https://github.com/makerdao/lockstake/tree/benchmarks. I've also merged latest `dev` into it to ensure it's using the latest contracts.

The output of the PoC:
```
>>> forge test --mc LockstakeEngineBenchmarks --mt testGas -vvv
Logs:
  reserveHatch cost: 28248
  multicall (5 ops) revert cost: 709731
  multicall (8 ops) revert cost: 911424
Logs:
  reserveHatch cost: 23594
  selectVoteDelegate revert cost: 101133
```
The added tests:
Mitigation
To mitigate this issue, the VoteDelegate contract's `reserveHatch` mechanism should be modified to introduce a longer delay before the hatch takes effect. Instead of starting from the next block, it should start after M blocks, where M is a governance-controlled parameter. The parameters can be stored on the VD factory, and the VDs can query them there. This makes the attack more expensive and less likely to succeed, as controlling M consecutive blocks is more expensive and less likely.

The total time the hatch is open and the duration of the cooldown cycle are not changed. The only modification is the timing of the hatch within the cycle. Instead of the cycle being 25 blocks starting with the 5 hatch blocks, the 5 hatch blocks are moved later in the cycle.
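The intended behavior of the delayed window can be sketched as follows. This is illustrative pseudologic only, not the actual Solidity diff; the names `lock_allowed`, `rh_block`, and `hatch_delay` are my own, and the 5-block window size is taken from the report:

```python
HATCH_SIZE = 5  # blocks during which the hatch is open and lock() is blocked

def lock_allowed(block: int, rh_block: int, hatch_delay: int) -> bool:
    """After reserveHatch() at `rh_block`, locking is blocked only in
    [rh_block + hatch_delay, rh_block + hatch_delay + HATCH_SIZE)."""
    start = rh_block + hatch_delay
    return not (start <= block < start + HATCH_SIZE)

# Current behavior (delay 1): the very next block is already blocked, so a
# builder controlling 2 consecutive blocks can grief a pending lock().
print(lock_allowed(101, rh_block=100, hatch_delay=1))   # False - blocked
# With a governance-set delay M = 10, the hatch opens only 10 blocks later,
# so a pending transaction lands before the window unless the attacker
# controls M+1 consecutive blocks.
print(lock_allowed(101, rh_block=100, hatch_delay=10))  # True - still allowed
print(lock_allowed(110, rh_block=100, hatch_delay=10))  # False - window start
```

The hatch window's size is unchanged; only its start moves, which is why liquidators still get the same total blocked time.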
Proposed change:
Governance can adjust `hatchDelay` based on the likelihood of MMEV issues.