Simpler and faster implementation of Floyd's F2 #1277

ciphergoth · 2022-12-29T16:56:11Z

The previous implementation used either Vec::insert or a second F-Y shuffling phase to achieve fair random order. Instead, use the random numbers already drawn to achieve a fair shuffle.

ciphergoth · 2022-12-29T17:14:01Z

On my machine - an M1 MacBook I get the following benchmarks:

before:

test misc_sample_indices_100_of_1G           ... bench:       2,672 ns/iter (+/- 37)
test misc_sample_indices_100_of_1M           ... bench:       2,649 ns/iter (+/- 64)
test misc_sample_indices_100_of_1k           ... bench:         454 ns/iter (+/- 10)
test misc_sample_indices_10_of_1k            ... bench:          50 ns/iter (+/- 1)
test misc_sample_indices_1_of_1k             ... bench:          20 ns/iter (+/- 0)
test misc_sample_indices_200_of_1G           ... bench:       4,078 ns/iter (+/- 160)
test misc_sample_indices_400_of_1G           ... bench:       8,120 ns/iter (+/- 139)
test misc_sample_indices_600_of_1G           ... bench:      11,577 ns/iter (+/- 186)

After:

test misc_sample_indices_100_of_1G           ... bench:       2,252 ns/iter (+/- 16)
test misc_sample_indices_100_of_1M           ... bench:       2,233 ns/iter (+/- 49)
test misc_sample_indices_100_of_1k           ... bench:         453 ns/iter (+/- 3)
test misc_sample_indices_10_of_1k            ... bench:          54 ns/iter (+/- 0)
test misc_sample_indices_1_of_1k             ... bench:          19 ns/iter (+/- 0)
test misc_sample_indices_200_of_1G           ... bench:       4,060 ns/iter (+/- 581)
test misc_sample_indices_400_of_1G           ... bench:       8,080 ns/iter (+/- 618)
test misc_sample_indices_600_of_1G           ... bench:      11,537 ns/iter (+/- 3,903)```

ciphergoth · 2022-12-29T20:30:49Z

The 8% slower performance on misc_sample_indices_10_of_1k seemed bad, so I have pushed a new version with no performance regressions.

test misc_sample_indices_100_of_1G           ... bench:       2,160 ns/iter (+/- 52)
test misc_sample_indices_100_of_1M           ... bench:       2,154 ns/iter (+/- 57)
test misc_sample_indices_100_of_1k           ... bench:         453 ns/iter (+/- 6)
test misc_sample_indices_10_of_1k            ... bench:          49 ns/iter (+/- 3)
test misc_sample_indices_1_of_1k             ... bench:          19 ns/iter (+/- 0)
test misc_sample_indices_200_of_1G           ... bench:       4,068 ns/iter (+/- 29)
test misc_sample_indices_400_of_1G           ... bench:       8,081 ns/iter (+/- 1,526)
test misc_sample_indices_600_of_1G           ... bench:      11,534 ns/iter (+/- 310)

dhardy

Good simplification. The standard proof-by-induction of Floyd's combination algorithm can be extended to prove that the output is fully shuffled. It's a little tedious. A copy of my notes for those interested.

Proof by induction


for j in (len - amount) .. len {
    let t = rng.gen_range(0..=j);
    if X(t) {
        choose j
    } else {
        choose t
    }
}

Where:
  m = amount
  X(i) = event that i is chosen
Expect:
  P(X(i)) = m / len

Standard proof by induction, using notation:
  Xm(i) = event that i is chosen by (no later than) step m
  Xm1(i) = event i chosen at step m - 1
  Rlen = a random value from 0..=len

Case m=1 is trivial.

Assume m > 1 and correct for previous step:
Assumption: for i in 0..(len-1) { P(Xm1(i)) = (amount-1) / (len-1) }

Case: i in 0..(len-1)
Xm(i) = Xm1(i) ⊻ {Rlen=i}
P(Xm(i)) = P(Xm1(i)) + P(!Xm(i))*(1/len)
P(Xm(i)) = (m-1)/(len-1) + (len-1-m+1)/(len-1)*(1/len)
P(Xm(i)) = len*(m-1)/(len*(len-1)) + (len-1-m+1)/(len*(len-1))
P(Xm(i)) = (len*m-m)/(len*(len-1))
P(Xm(i)) = m/len

Case: i=len-1
Xm(i) = {Rlen=i} ⊻ {Rlen=k where Xm1(k)}
P(Xm(i)) = 1/len + Sum(k in 0..len){1/len * P(Xm1(k))}
P(Xm(i)) = 1/len + (len-1)/len * (m-1)/(len-1)
P(Xm(i)) = 1/len + 1/len * (m-1)
P(Xm(i)) = m/len

Ok.


Fully shuffled variant:
for j in (len - amount) .. len {
    let t = rng.gen_range(0..=j);
    if X(t) {
        choose j
        swap elts x, j
    } else {
        choose t
    }
}

Proof by induction:
  X(i,j) = event that i is chosen and appears in output at position j
Expect:
  P(X(i,j)) = P(X(i)) / m = 1 / len

Assume for step m-1:
Assumption: for i in 0..(len-1), j in 0..(m-1) { P(Xm1(i,j)) = 1/(len-1) }

Case: i in 0..(len-1), j in 0..(m-1)
Xm(i,j) = if Xm1(Rlen,j) then 0 else Xm1(i,j)
Xm(i,j) = !Xm1(Rlen,j) AND Xm1(i,j)
P(Xm(i,j)) = (1 - Sum(k in 0..(len-1)){1/len * 1/(len-1)}) / (len-1)
P(Xm(i,j)) = (1 - 1/len) / (len-1)
P(Xm(i,j)) = (len - 1) / len / (len-1)
P(Xm(i,j)) = 1/len

Case: i in 0..(len-1), j=m-1: P(Xm1(i,j)) = 0
Xm(i,j) = Rlen=i
P(Xm(i,j)) = 1/len

Case: i=len-1, j=m-1:
Xm(i,j) = Rlen=i
P(Xm(i,j)) = 1/len

Case: i=len-1, j in 0..(m-1):
Xm(i,j) = Xm1(Rlen, j)
P(Xm(i,j)) = Sum(k in 0..(len-1)){1/len / (len-1) = 1/len

Ok.

The one thing that needs changing is the CHANGELOG: this is a value-breaking change. Affected API must be noted.

The previous implementation used either `Vec::insert` or a second F-Y shuffling phase to achieve fair random order. Instead, use the random numbers already drawn to achieve a fair shuffle.

ciphergoth · 2022-12-31T20:12:30Z

Have added some notes on this, let me know if this is worded the way you need - thanks!

dhardy

Thanks

ciphergoth force-pushed the fasterf2 branch from bcb472a to 3fc3b5b Compare December 29, 2022 20:29

dhardy requested changes Dec 31, 2022

View reviewed changes

Simpler and faster implementation of Floyd's F2

0488810

The previous implementation used either `Vec::insert` or a second F-Y shuffling phase to achieve fair random order. Instead, use the random numbers already drawn to achieve a fair shuffle.

ciphergoth force-pushed the fasterf2 branch from 3fc3b5b to 0488810 Compare December 31, 2022 20:09

dhardy approved these changes Jan 1, 2023

View reviewed changes

dhardy merged commit e97b5b6 into rust-random:master Jan 1, 2023

ciphergoth deleted the fasterf2 branch January 1, 2023 17:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simpler and faster implementation of Floyd's F2 #1277

Simpler and faster implementation of Floyd's F2 #1277

ciphergoth commented Dec 29, 2022

ciphergoth commented Dec 29, 2022

ciphergoth commented Dec 29, 2022

dhardy left a comment •

edited

Loading

ciphergoth commented Dec 31, 2022

dhardy left a comment

Simpler and faster implementation of Floyd's F2 #1277

Simpler and faster implementation of Floyd's F2 #1277

Conversation

ciphergoth commented Dec 29, 2022

ciphergoth commented Dec 29, 2022

ciphergoth commented Dec 29, 2022

dhardy left a comment • edited Loading

Choose a reason for hiding this comment

ciphergoth commented Dec 31, 2022

dhardy left a comment

Choose a reason for hiding this comment

dhardy left a comment •

edited

Loading