-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
htlcswitch: relax failure message length check #6913
htlcswitch: relax failure message length check #6913
Conversation
e925b9d
to
05d0267
Compare
52e2860
to
47a5852
Compare
Is there any chance that this can be part of v0.15.2? Strictly speaking lnd is in a vulnerable position. If other implementations would decide to deviate from the recommended failure message size, senders using lnd will see their mission control messed up because of all the penalties applied for unreadable failures. |
Need to decide on how to signal sender capabilities as per lightning/bolts#1021 (comment) |
This PR could be merged regardless of the signaling mechanism though. On its own, I don't think it does any harm and just makes lnd spec compliant. The same fix in Eclair: ACINQ/eclair#2438 |
@@ -1235,10 +1235,6 @@ func DecodeFailure(r io.Reader, pver uint32) (FailureMessage, error) { | |||
if err := ReadElement(r, &failureLength); err != nil { | |||
return nil, fmt.Errorf("unable to read error len: %v", err) | |||
} | |||
if failureLength > FailureMessageLength { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this constant in use anywhere else? If so, we should check whether their use must be changed. If not, maybe it can be deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oof, there is a lot. Need to take a look at that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All fixed except for the length check here. See #6913 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think perhaps a less disruptive way to become closer to spec compliant is instead of removing this check just require the length to be at least FailureMessageLength
.
We are already enforcing exactly FailureMessageLength
in lightning_onion
and it seems like we cannot easily just remove the check altogether without opening up a can of works wrt those converted errors.
If we instead start allowing larger errors than before, we can support the larger errors as we want (and that the spec says we should support), and still we can distinguish the malformed errors by their length, as no valid error message could be FailureMessageLength +4
including the HMAC.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely a nice intermediate step. One thing to consider is that if we increase the recommended failure length to 1024 bytes eventually, we need to do something extra to also extend converted malformed failures to that length.
Perhaps when we detect a malformed failure, clip off the padding again, and re-add a longer padding block? Or store the unencrypted failures without padding at all, so that they can be padded right before returning them to the switch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we stick to lengths >= 256, I don't think we need to make any further changes to this PR. Every failure reason != 260 is interpreted as a regular failure already and this would also work for longer reasons.
47a5852
to
b0a2792
Compare
Short cut coming back at us: #3027 (comment) Lightning Labs (@Crypt-iQ perhaps?) - how do we get rid of this? |
I've created two draft PRs that show different ways of getting variable length failure messages though the channel state machine:
The easy alternative is to just allow longer failure message and no shorter ones: #6913 (comment). PR is implementing this currently. |
b0a2792
to
0cfd9d1
Compare
15c4888
to
a834040
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
a834040
to
7691697
Compare
3fbb6f3
to
1828262
Compare
0db93ae
to
09751ea
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm once last test-related comment addressed
09751ea
to
92068db
Compare
@joostjager, remember to re-request review from reviewers when ready |
// Because the reason is unreadable for the payer | ||
// anyway, we just replace it by a compliant-length | ||
// series of random bytes. | ||
msg.Reason = make([]byte, minimumFailReasonLength) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For debugging/visibility purposes, could it be an idea to pass back a static "I'm sorry the original error is unreadable"+pad error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suppose we'd actually handle this outcome differently (so going beyond debugging/visibility), this might motivate an attacker to corrupt failure messages in this exact way.
Also if the origin node receives this message, they won't know which node set it because this is part of the intermediate transform logic. They do know that it must have been an lnd node 😄
I like the suggestion, but not sure if it is actually good or beneficial. It should never happen with unmodified nodes, because all is set to 256 bytes currently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true, not very likely to happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one minor comment then lgtm!
@joostjager the dependent PR has been merged, so we can bump the dep here to unblock the CI. |
This mock is used in the switch test TestUpdateFailMalformedHTLCErrorConversion. But because the mock isn't very realistic, it doesn't detect problems in the handling of malformed failures in the link.
Adds extra checks to make sure the failure message is well-formed.
92068db
to
46bcc61
Compare
Removed go.mod replace and bumped lightning-onion dep. |
This commit modifies the link behavior so that every failure reason that we pass back is length-compliant (>=256 bytes).
46bcc61
to
fc73ee2
Compare
This fixes an incompatibility where lnd enforces a strict 256 byte failure message, where as the spec sets this only as the recommended length.
fc73ee2
to
4c8ea29
Compare
The spec does not dictate but only recommends a length of 256 bytes. Future tlv extensions may push the failure message length over this limit.
With this change, receivers can ignore the lengthier extensions without handling it as an unreadable failure. Before long tlv extensions can be used, a sufficiently large number of senders needs to be upgraded. Therefore the sooner this PR is released, the better.
Depends lightningnetwork/lightning-onion#59