Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve route randomization #1012

Merged
merged 10 commits into from
Feb 26, 2018
Merged

Conversation

ZmnSCPxj
Copy link
Collaborator

Fixes: #928

I want to integrate pay command also to use route randomization if the maxfeepercent is high enough, so, this will wait for #993 first so I can add it to payalgo.c.

@ZmnSCPxj ZmnSCPxj force-pushed the route-random branch 2 times, most recently from 8411066 to 5099930 Compare February 16, 2018 14:16
@ZmnSCPxj ZmnSCPxj changed the title [WIP] Improve route randomization Improve route randomization Feb 16, 2018
@ZmnSCPxj
Copy link
Collaborator Author

Ready for review

Copy link
Contributor

@rustyrussell rustyrussell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! I like this patch, but I have some suggestions:

  1. I think the API should rename inefficiency to "fuzz", and make it a percentage.
  2. I think the seed should be optional: it's only really useful for testing reproducibility.
  3. I think the default should be 5% or so, not 0.

@ZmnSCPxj
Copy link
Collaborator Author

  1. Okay, will do later.
  2. seed is already optional, but perhaps you mean it should be a randomly-generated seed if not specified, rather than fixed to 8x 0 bytes?
  3. Presumably inefficiency/fuzz? Currently, multiple getroute with the same arguments will return the same route, modulo any channel updates that make the route shorter. I wanted to retain this property, so default inefficiency/fuzz to 0. However, it is also true that channel updates made by other nodes opening new channels and closing old ones will cause getroute to change its return value, so perhaps such a default of 0.05/5% is reasonable.

Copy link
Member

@cdecker cdecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent PR, just the scaling up and down that feels a bit dodgy.

gossipd/gossip.c Outdated
&source, &destination,
&msatoshi, &riskfactor, &final_cltv,
&inefficiency_scaled, &seed);
inefficiency = ((double) inefficiency_scaled) / 4294967296.0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like @rustyrussell I'd prefer having an integer percentage that doesn't require scaling up and down.

@ZmnSCPxj
Copy link
Collaborator Author

Rebased and modified as per @rustyrussell 's initial review. However, @cdecker:

Like @rustyrussell I'd prefer having an integer percentage that doesn't require scaling up and down.

Is this what is wanted? Internally I would drop to a doublefor random generation anyway. Indeed I think I would want some way to serialize a double (towire_double, fromwire_double) if I can find some easy way to do so.

Or do you mean, I should accept integer percentage, serialize to u8, then perform the conversion from percentage to a double 0.0 -> 1.0 inside gossipd?

@cdecker
Copy link
Member

cdecker commented Feb 19, 2018

@ZmnSCPxj yeah that was what I was thinking of, upconverting from double to int for transfer and then back again seems strange to me. Having a percentage and downconverting it to double once will reduce any strange imprecision effect.

@ZmnSCPxj
Copy link
Collaborator Author

4294967296.0 is exactly a power of two, so it should just be manipulation of the exponent part (modulo weird numbers like NAN and infinity and zero). Then conversion to and from u32 should just be cutting off some bits of mantissa.

Another concern is that our other "percent" argument is pay maxfeepercent, which accepts fractionals, and indeed defaults to 0.5 (0.5%). Do we accept fractionals for fuzzpercent? If not, this would be unexpected since maxfeepercent accepts fractions but fuzzpercent does not. If accept, then we would have roundoff/conversion issue regardless, unless we make a towire_double or fromwire_double.

Copy link
Contributor

@rustyrussell rustyrussell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor changes in normalization, and defaults...

pay->tries = 0;
pay->getroute_tries = 0;
pay->sendpay_tries = 0;
pay->fuzz = 0.75;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We hand pay->fuzz into towire_gossip_getroute_request() which takes a u32, so this turns into 0 fuzz. That's probably not what we want; should we use the same default value as getroute(), or something else?

gossipd/gossip.c Outdated
&source, &destination,
&msatoshi, &riskfactor, &final_cltv,
&fuzz_scaled, &seed);
fuzz = ((double) fuzz_scaled) / 4294967296.0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't reach 100% here. Perhaps that's OK, but I think it's unintended.

@@ -362,16 +366,16 @@ static void json_getroute(struct command *cmd, const char *buffer, const jsmntok
}
/* Convert from percentage */
fuzz = fuzz / 100.0;
fuzz_scaled = (u32) (fuzz * 4294967296.0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you specify 100%, this may evaluate to 0.

}
/* Convert from percentage */
fuzz = fuzz / 100.0;
fuzz_scaled = (u32) (fuzz * 4294967296.0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still means 100% == 0!

Let's fix this properly and do towire_double/fromwire_double. It can just memcpy, since our inter-daemon comms is all same-architecture, and I can't find a portable way to encode a double other than printf/scanf :(

pay->sendpay_parent = NULL;
pay->getroute_tries = 0;
pay->sendpay_tries = 0;
pay->fuzz = 0.75;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is default here 75%, vs 5% for getroute?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a default, but an initial value. pay respects maxfeepercent, and if it gets a route that has too high fee, it will reduce its fuzz value in the hope that the next returned route with a lower fuzz will have a lower fee. This means that to take advantage of higher-than-usual maxfeepercent for improved privacy we should start with a large fuzz (which has a chance of getting much higher fee than would be typical for lower or 0 fuzz).

The point of route randomization is to avoid low-fee routes, which might be honeypots used by three-letter monitoring agencies, by occassionally jumping to higher-than-usual fee routes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If 75% is a good starting point for pay, why it it not a good default for getroute? It it anticipated that we will see significant failures at 75%? I'm not sure that's true. These numbers seem made up, which is fine, but they do need a comment on what we'd need to know to set decent defaults in future...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, my initial idea was to have getroute default to 0%, for compatibility with before this patch, where getroute never fuzzed and (if there were no changes in network topology) would always return the same route again and again. A mere 5% for fuzz is unlikely to change anything, either.

If that behavior is not necessary for getroute, then 75% would be a reasonable default for getroute, too.

@ZmnSCPxj ZmnSCPxj force-pushed the route-random branch 2 times, most recently from bfe76f0 to a557fa8 Compare February 21, 2018 12:25
@ZmnSCPxj
Copy link
Collaborator Author

Added serialization of double instead of converting to u32 back and forth. Also rebased. Also answered @rustyrussell question in review item.

Copy link
Contributor

@rustyrussell rustyrussell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let's use a fixed 16-byte seed, ready for siphash, and use siphash internally.

We could also use the node pubkey (directly, feed it into siphash) instead of scid to make the iteration more stable (since fuzz would depend on the node itself rather than what edge we came in on).

We can get faster in future by using isaac with a single initialization, but that loses per-node determinism.

pay->try_parent = tal_free(pay->try_parent);
pay->try_parent = tal(pay, char);

++pay->tries;
/* Generate random seed */
seed = tal_arr(pay->try_parent, u8, ISAAC64_SEED_SZ_MAX);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2048 bytes of randonmess is extreme!

8 bytes of non-cryptographic randomness is more than sufficient; we should just add a psuedorand64() which wraps isaac64_next_uint64() and use that here I think.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, will do.

pay->sendpay_parent = NULL;
pay->getroute_tries = 0;
pay->sendpay_tries = 0;
pay->fuzz = 0.75;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If 75% is a good starting point for pay, why it it not a good default for getroute? It it anticipated that we will see significant failures at 75%? I'm not sure that's true. These numbers seem made up, which is fine, but they do need a comment on what we'd need to know to set decent defaults in future...

towire_short_channel_id(&scid, &c->short_channel_id);
isaac64_reseed(&myrng,
(const unsigned char *) scid, tal_len(scid));
tal_free(scid);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's pretty heavy. It's nice because it makes the per-edge decision deterministic, though.

run-bench-find_route 10000 10 with 0 fuzz:

10 (10 succeeded) routes in 10000 nodes in 3990 msec (399066864 nanoseconds per route)

With 0.75 fuzz and a fixed rng:

10 (10 succeeded) routes in 10000 nodes in 73423 msec (7342349728 nanoseconds per route)

Ouch!

I switched it to hand in a 64-bit base_seed, and use short_channel_id_to_uint() instead of wire allocation, and initialize the rng with [base_seed, short_channel_id_to_uint()], but it's still v. slow:

10 (10 succeeded) routes in 10000 nodes in 67433 msec (6743351288 nanoseconds per route)

This is partially because it's expensive to set up an isaac64 context just to pull one number out; we're better using siphash for this:

10 (10 succeeded) routes in 10000 nodes in 5668 msec (566855365 nanoseconds per route)

I've put a rough patch (enough to compile and run the bench): https://gist.github.com/rustyrussell/c3334112c096d97e0adb0d3a8fc3fc32

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The patch is actually pretty good and I would ACK that.

@ZmnSCPxj
Copy link
Collaborator Author

We could also use the node pubkey (directly, feed it into siphash) instead of scid to make the iteration more stable (since fuzz would depend on the node itself rather than what edge we came in on).

My understanding of the algorithm is that each edge is what is assigned a cost, and if we encounter a node multiple times then if the cumulative cost from an edge is lower than the current cost of the node, then the current cost of the node is updated to be coming from the edge. If we use per-node, then all outgoing channels from that node will have the same fuzz cost. At least, that is my understanding of the algorithm.... I will defer to you if you think my understanding is wrong, as you wrote it in the first place anyway.

@ZmnSCPxj
Copy link
Collaborator Author

Rebased, used siphash as per @rustyrussell patch, change getroute default fuzz to 75%, explain fuzz.

Copy link
Contributor

@rustyrussell rustyrussell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK d3493aa

to provide some randomization to the route generated.
0.0 means the exact fee of that channel is used,
while 100.0 means the fee used might be from 0 to twice the actual fee.
The default is 5.0, or up to 5% fee distortion.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default is 75.0, or up to 75% fee distortion.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops!

* 2*fuzz*rand random number between 0.0 -> 2*fuzz
* 2*fuzz*rand - fuzz random number between -fuzz -> +fuzz
*/
fee_scale = 1.0 + (2.0 * fuzz * h / UINT64_MAX) - fuzz;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure we can just make h an s64 and divide by INT64_MAX: fee_scale = 1.0 + fuzz * h / INT64_MAX

But it's trivial and unlikely to be any faster, so I'll just leave this as a comment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Negative number absolute range is one higher than positive range, so getting INT64_MIN by chance would break that. Of course very unlikely but well.

@@ -243,7 +243,7 @@ static bool json_pay_try(struct pay *pay)
pay->try_parent = tal(pay, char);

/* Generate random seed */
seed = tal_arr(pay->try_parent, u8, ISAAC64_SEED_SZ_MAX);
seed = tal_arr(pay->try_parent, u8, sizeof(struct siphash_seed));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've produced a patch to actually hand a siphash_seed across the wire, which IMHO makes this neater, but I'll create a separate PR after this.

@rustyrussell
Copy link
Contributor

ACK d3493aa

@rustyrussell
Copy link
Contributor

You are right on the bfg_one_edge logic; per-edge makes more sense than per-node.

@rustyrussell rustyrussell merged commit d23650d into ElementsProject:master Feb 26, 2018
@ZmnSCPxj ZmnSCPxj deleted the route-random branch February 26, 2018 05:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants