-
Notifications
You must be signed in to change notification settings - Fork 902
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve route randomization #1012
Conversation
8411066
to
5099930
Compare
Ready for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi! I like this patch, but I have some suggestions:
- I think the API should rename inefficiency to "fuzz", and make it a percentage.
- I think the seed should be optional: it's only really useful for testing reproducibility.
- I think the default should be 5% or so, not 0.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent PR, just the scaling up and down that feels a bit dodgy.
gossipd/gossip.c
Outdated
&source, &destination, | ||
&msatoshi, &riskfactor, &final_cltv, | ||
&inefficiency_scaled, &seed); | ||
inefficiency = ((double) inefficiency_scaled) / 4294967296.0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like @rustyrussell I'd prefer having an integer percentage that doesn't require scaling up and down.
24a9bbf
to
474c015
Compare
Rebased and modified as per @rustyrussell 's initial review. However, @cdecker:
Is this what is wanted? Internally I would drop to a Or do you mean, I should accept integer percentage, serialize to |
@ZmnSCPxj yeah that was what I was thinking of, upconverting from double to int for transfer and then back again seems strange to me. Having a percentage and downconverting it to double once will reduce any strange imprecision effect. |
4294967296.0 is exactly a power of two, so it should just be manipulation of the exponent part (modulo weird numbers like NAN and infinity and zero). Then conversion to and from Another concern is that our other "percent" argument is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor changes in normalization, and defaults...
lightningd/payalgo.c
Outdated
pay->tries = 0; | ||
pay->getroute_tries = 0; | ||
pay->sendpay_tries = 0; | ||
pay->fuzz = 0.75; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We hand pay->fuzz
into towire_gossip_getroute_request() which takes a u32, so this turns into 0 fuzz. That's probably not what we want; should we use the same default value as getroute(), or something else?
gossipd/gossip.c
Outdated
&source, &destination, | ||
&msatoshi, &riskfactor, &final_cltv, | ||
&fuzz_scaled, &seed); | ||
fuzz = ((double) fuzz_scaled) / 4294967296.0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't reach 100% here. Perhaps that's OK, but I think it's unintended.
lightningd/gossip_control.c
Outdated
@@ -362,16 +366,16 @@ static void json_getroute(struct command *cmd, const char *buffer, const jsmntok | |||
} | |||
/* Convert from percentage */ | |||
fuzz = fuzz / 100.0; | |||
fuzz_scaled = (u32) (fuzz * 4294967296.0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you specify 100%, this may evaluate to 0.
lightningd/gossip_control.c
Outdated
} | ||
/* Convert from percentage */ | ||
fuzz = fuzz / 100.0; | ||
fuzz_scaled = (u32) (fuzz * 4294967296.0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still means 100% == 0!
Let's fix this properly and do towire_double/fromwire_double. It can just memcpy, since our inter-daemon comms is all same-architecture, and I can't find a portable way to encode a double other than printf/scanf :(
lightningd/payalgo.c
Outdated
pay->sendpay_parent = NULL; | ||
pay->getroute_tries = 0; | ||
pay->sendpay_tries = 0; | ||
pay->fuzz = 0.75; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is default here 75%, vs 5% for getroute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a default, but an initial value. pay
respects maxfeepercent
, and if it gets a route that has too high fee, it will reduce its fuzz
value in the hope that the next returned route with a lower fuzz will have a lower fee. This means that to take advantage of higher-than-usual maxfeepercent
for improved privacy we should start with a large fuzz
(which has a chance of getting much higher fee than would be typical for lower or 0 fuzz).
The point of route randomization is to avoid low-fee routes, which might be honeypots used by three-letter monitoring agencies, by occassionally jumping to higher-than-usual fee routes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If 75% is a good starting point for pay
, why it it not a good default for getroute
? It it anticipated that we will see significant failures at 75%? I'm not sure that's true. These numbers seem made up, which is fine, but they do need a comment on what we'd need to know to set decent defaults in future...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, my initial idea was to have getroute
default to 0%, for compatibility with before this patch, where getroute
never fuzzed and (if there were no changes in network topology) would always return the same route again and again. A mere 5% for fuzz is unlikely to change anything, either.
If that behavior is not necessary for getroute
, then 75% would be a reasonable default for getroute
, too.
bfe76f0
to
a557fa8
Compare
Added serialization of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, let's use a fixed 16-byte seed, ready for siphash, and use siphash internally.
We could also use the node pubkey (directly, feed it into siphash) instead of scid to make the iteration more stable (since fuzz would depend on the node itself rather than what edge we came in on).
We can get faster in future by using isaac with a single initialization, but that loses per-node determinism.
lightningd/payalgo.c
Outdated
pay->try_parent = tal_free(pay->try_parent); | ||
pay->try_parent = tal(pay, char); | ||
|
||
++pay->tries; | ||
/* Generate random seed */ | ||
seed = tal_arr(pay->try_parent, u8, ISAAC64_SEED_SZ_MAX); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2048 bytes of randonmess is extreme!
8 bytes of non-cryptographic randomness is more than sufficient; we should just add a psuedorand64() which wraps isaac64_next_uint64() and use that here I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, will do.
lightningd/payalgo.c
Outdated
pay->sendpay_parent = NULL; | ||
pay->getroute_tries = 0; | ||
pay->sendpay_tries = 0; | ||
pay->fuzz = 0.75; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If 75% is a good starting point for pay
, why it it not a good default for getroute
? It it anticipated that we will see significant failures at 75%? I'm not sure that's true. These numbers seem made up, which is fine, but they do need a comment on what we'd need to know to set decent defaults in future...
gossipd/routing.c
Outdated
towire_short_channel_id(&scid, &c->short_channel_id); | ||
isaac64_reseed(&myrng, | ||
(const unsigned char *) scid, tal_len(scid)); | ||
tal_free(scid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's pretty heavy. It's nice because it makes the per-edge decision deterministic, though.
run-bench-find_route 10000 10
with 0 fuzz:
10 (10 succeeded) routes in 10000 nodes in 3990 msec (399066864 nanoseconds per route)
With 0.75 fuzz and a fixed rng:
10 (10 succeeded) routes in 10000 nodes in 73423 msec (7342349728 nanoseconds per route)
Ouch!
I switched it to hand in a 64-bit base_seed, and use short_channel_id_to_uint() instead of wire allocation, and initialize the rng with [base_seed, short_channel_id_to_uint()], but it's still v. slow:
10 (10 succeeded) routes in 10000 nodes in 67433 msec (6743351288 nanoseconds per route)
This is partially because it's expensive to set up an isaac64 context just to pull one number out; we're better using siphash for this:
10 (10 succeeded) routes in 10000 nodes in 5668 msec (566855365 nanoseconds per route)
I've put a rough patch (enough to compile and run the bench): https://gist.github.com/rustyrussell/c3334112c096d97e0adb0d3a8fc3fc32
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The patch is actually pretty good and I would ACK that.
My understanding of the algorithm is that each edge is what is assigned a cost, and if we encounter a node multiple times then if the cumulative cost from an edge is lower than the current cost of the node, then the current cost of the node is updated to be coming from the edge. If we use per-node, then all outgoing channels from that node will have the same fuzz cost. At least, that is my understanding of the algorithm.... I will defer to you if you think my understanding is wrong, as you wrote it in the first place anyway. |
In preparation for improved route randomization.
Now try_parent is used as the parent for all allocations needed for a try.
a557fa8
to
dbdf68d
Compare
f0c4e46
to
d3493aa
Compare
Rebased, used siphash as per @rustyrussell patch, change getroute default fuzz to 75%, explain fuzz. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK d3493aa
to provide some randomization to the route generated. | ||
0.0 means the exact fee of that channel is used, | ||
while 100.0 means the fee used might be from 0 to twice the actual fee. | ||
The default is 5.0, or up to 5% fee distortion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default is 75.0, or up to 75% fee distortion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops!
* 2*fuzz*rand random number between 0.0 -> 2*fuzz | ||
* 2*fuzz*rand - fuzz random number between -fuzz -> +fuzz | ||
*/ | ||
fee_scale = 1.0 + (2.0 * fuzz * h / UINT64_MAX) - fuzz; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty sure we can just make h an s64 and divide by INT64_MAX: fee_scale = 1.0 + fuzz * h / INT64_MAX
But it's trivial and unlikely to be any faster, so I'll just leave this as a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Negative number absolute range is one higher than positive range, so getting INT64_MIN by chance would break that. Of course very unlikely but well.
lightningd/payalgo.c
Outdated
@@ -243,7 +243,7 @@ static bool json_pay_try(struct pay *pay) | |||
pay->try_parent = tal(pay, char); | |||
|
|||
/* Generate random seed */ | |||
seed = tal_arr(pay->try_parent, u8, ISAAC64_SEED_SZ_MAX); | |||
seed = tal_arr(pay->try_parent, u8, sizeof(struct siphash_seed)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've produced a patch to actually hand a siphash_seed across the wire, which IMHO makes this neater, but I'll create a separate PR after this.
ACK d3493aa |
You are right on the bfg_one_edge logic; per-edge makes more sense than per-node. |
Fixes: #928
I want to integrate
pay
command also to use route randomization if themaxfeepercent
is high enough, so, this will wait for #993 first so I can add it topayalgo.c
.