Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tweak Span encoding. #58458

Merged
merged 1 commit into from
Apr 3, 2019
Merged

Conversation

nnethercote
Copy link
Contributor

Failing to fit base is more common than failing to fit len.

@rust-highfive
Copy link
Collaborator

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 14, 2019
@nnethercote
Copy link
Contributor Author

Local measurements indicate that this is a 3-5% instruction win for style-servo and didn't make much difference for anything else. I'll double check if CI agrees with those results.

@bors try

@bors
Copy link
Contributor

bors commented Feb 14, 2019

⌛ Trying commit b67352a with merge 46701e6...

bors added a commit that referenced this pull request Feb 14, 2019
Tweak `Span` encoding.

Failing to fit `base` is more common than failing to fit `len`.
@petrochenkov
Copy link
Contributor

cc #44646
(The PR that introduced the current encoding, it contains some further references and benchmarks.)

@bors
Copy link
Contributor

bors commented Feb 14, 2019

☀️ Test successful - checks-travis
State: approved= try=True

@petrochenkov petrochenkov added S-waiting-on-perf Status: Waiting on a perf run to be completed. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 14, 2019
@nnethercote
Copy link
Contributor Author

@rust-timer build 46701e6

@rust-timer
Copy link
Collaborator

Success: Queued 46701e6 with parent c67d474, comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit 46701e6

@nnethercote
Copy link
Contributor Author

The good results for style-servo were replicated:

style-servo-check
        avg: -3.9%      min: -5.7%      max: -2.8%
style-servo-opt
        avg: -1.3%      min: -5.4%      max: -0.2%
style-servo-debug
        avg: -2.0%      min: -4.7%      max: -0.9%

Other changes were minor and hard to distinguish from noise.

@nnethercote
Copy link
Contributor Author

r? @petrochenkov

@petrochenkov
Copy link
Contributor

This is certainly a tradeoff.
Giving one bit from len to base benefits large crates at the expense of small crates.
Apparently style-servo is huge so it gets the benefits.

@petrochenkov
Copy link
Contributor

To put things into perspective.

24 bit base is enough to cover about 15MB of code.
I suspect that this includes all the dependent crates used in one compilation session with the primary crate, since spans from other crates can also be reported.
So, 25 bit base now covers about 31MB of code.
(libstd + libcore is ~5.8MB, for example.)

7 bit len is 127 bytes of code

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

this is a small function of a modest if condition if a larger function.
Most of items probably don't fit, but most of expressions and patterns do.

6 bit len is 63 bytes of code

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

this is a non-block expression or a pattern, items and larger expressions probably don't fit.
Most notable, identifier spans still most certainly fit.

@petrochenkov
Copy link
Contributor

I'd be really interested to look at distribution of bases/lengths/ctxts during both span decoding and encoding since they should be quite different.
We looked only at encoding when introducing the span compression, since this is what determines memory consumption, but not at decoding that's probably more common and determines instruction counts.

@petrochenkov
Copy link
Contributor

@nnethercote
Could you collect the base/len/ctxt distribution on some code you think is interesting, e.g. style-servo?
This should be as simple as dumping the input numbers into a file in span_encoding::encode and span_encoding::decode.

(The most convenient way to do it is static mut *mut libc::FILE and libc::printf, sigh.)

@nnethercote
Copy link
Contributor Author

Could you collect the base/len/ctxt distribution on some code you think is interesting, e.g. style-servo?
This should be as simple as dumping the input numbers into a file in span_encoding::encode and span_encoding::decode.

I did that (with eprintln! and counts) before making this change. The machine holding the numbers is turned off for the weekend, but I found that too-long bases were more common than too-long lengths, hence the change. Lengths up to 63 are quite common and tend to drop off a lot after that; reducing the length to 5 bits would make things quite a bit worse. I can paste full numbers on Monday if you like.

And you are right that this trade-off favours large crates. I view this as a good thing... compile times of small crates are already low :) Meanwhile large crates, which are slow, pay an extra price for their size; this change mitigates that a little.

It's a shame that script-servo is broken at the moment, I suspect it would benefit too, because it's even larger than style-servo.

@petrochenkov
Copy link
Contributor

I can paste full numbers on Monday if you like.

Yes, please.
Looking at the decode statistics will make me more comfortable at approving this change.

@petrochenkov petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Feb 15, 2019
@Centril
Copy link
Contributor

Centril commented Feb 23, 2019

Ping from triage @nnethercote :)

@Dylan-DPC-zz
Copy link

ping from triage @nnethercote Unfortunately we haven't heard from you on this in a while, so I'm closing the PR to keep things tidy. Don't worry though, if you'll have time again in the future please reopen this PR, we'll be happy to review it again!

@Dylan-DPC-zz Dylan-DPC-zz added S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 11, 2019
@petrochenkov petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. S-waiting-on-perf Status: Waiting on a perf run to be completed. and removed I-nominated S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Apr 2, 2019
@nnethercote
Copy link
Contributor Author

helloworld results are less good:

old
20600 counts:
(  1)    18555 (90.1%, 90.1%): fast encode
(  2)     1416 ( 6.9%, 96.9%): fast decode
(  3)      324 ( 1.6%, 98.5%): slow decode
(  4)      303 ( 1.5%,100.0%): slow encode

new
20600 counts:
(  1)    14695 (71.3%, 71.3%): fast encode
(  2)     4163 (20.2%, 91.5%): slow encode
(  3)     1414 ( 6.9%, 98.4%): fast decode
(  4)      326 ( 1.6%,100.0%): slow decode

Fast paths drop from 96.9% to 78.2%.

Failing to fit `base` is more common than failing to fit `len`.
@nnethercote nnethercote force-pushed the tweak-Span-encoding branch from b67352a to ff94fea Compare April 2, 2019 23:04
@nnethercote
Copy link
Contributor Author

@bors try

@bors
Copy link
Contributor

bors commented Apr 2, 2019

⌛ Trying commit ff94fea with merge b56778a3f620ca1c5270ab9d9973da37a07b9f62...

@bors
Copy link
Contributor

bors commented Apr 3, 2019

☀️ Try build successful - checks-travis
Build commit: b56778a3f620ca1c5270ab9d9973da37a07b9f62

@nnethercote
Copy link
Contributor Author

@rust-timer build b56778a3f620ca1c5270ab9d9973da37a07b9f62

@rust-timer
Copy link
Collaborator

Success: Queued b56778a3f620ca1c5270ab9d9973da37a07b9f62 with parent 428943c, comparison URL.

@nnethercote
Copy link
Contributor Author

Here are some len distributions (both encoding and decoding) for spans with a zero SyntaxContext. The number after each "L" indicates how many bits were required to store each length.

helloworld
20214 counts:
(  1)     4205 (20.8%, 20.8%): L6
(  2)     3862 (19.1%, 39.9%): L7
(  3)     3004 (14.9%, 54.8%): L2
(  4)     2385 (11.8%, 66.6%): L3
(  5)     1995 ( 9.9%, 76.4%): L5
(  6)     1817 ( 9.0%, 85.4%): L1
(  7)     1477 ( 7.3%, 92.7%): L0
(  8)     1226 ( 6.1%, 98.8%): L4
(  9)       88 ( 0.4%, 99.2%): L9
( 10)       84 ( 0.4%, 99.6%): L8
( 11)       42 ( 0.2%, 99.9%): L10
( 12)       16 ( 0.1%, 99.9%): L11
( 13)        9 ( 0.0%,100.0%): L12
( 14)        3 ( 0.0%,100.0%): L14
( 15)        1 ( 0.0%,100.0%): L13

ripgrep
4513315 counts:
(  1)  1331930 (29.5%, 29.5%): L3
(  2)   745711 (16.5%, 46.0%): L4
(  3)   669557 (14.8%, 60.9%): L0
(  4)   594187 (13.2%, 74.0%): L1
(  5)   488818 (10.8%, 84.9%): L2
(  6)   315629 ( 7.0%, 91.9%): L5
(  7)   202941 ( 4.5%, 96.4%): L6
(  8)   143324 ( 3.2%, 99.5%): L7
(  9)    10364 ( 0.2%, 99.8%): L8
( 10)     5862 ( 0.1%, 99.9%): L9
( 11)     2840 ( 0.1%,100.0%): L10
( 12)     1289 ( 0.0%,100.0%): L11
( 13)      540 ( 0.0%,100.0%): L12
( 14)      150 ( 0.0%,100.0%): L13
( 15)       96 ( 0.0%,100.0%): L14
( 16)       37 ( 0.0%,100.0%): L15
( 17)       18 ( 0.0%,100.0%): L17
( 18)       14 ( 0.0%,100.0%): L24
( 19)        8 ( 0.0%,100.0%): L16

style-servo
52961408 counts:
(  1) 13960255 (26.4%, 26.4%): L3
(  2)  9857544 (18.6%, 45.0%): L4
(  3)  8476040 (16.0%, 61.0%): L5
(  4)  8100731 (15.3%, 76.3%): L0
(  5)  6131785 (11.6%, 87.8%): L1
(  6)  3616192 ( 6.8%, 94.7%): L2
(  7)  1993894 ( 3.8%, 98.4%): L6
(  8)   676393 ( 1.3%, 99.7%): L7
(  9)    70194 ( 0.1%, 99.9%): L8
( 10)    40349 ( 0.1%, 99.9%): L9
( 11)    18673 ( 0.0%,100.0%): L10
( 12)     9843 ( 0.0%,100.0%): L11
( 13)     3749 ( 0.0%,100.0%): L12
( 14)     2253 ( 0.0%,100.0%): L13
( 15)     1632 ( 0.0%,100.0%): L17
( 16)      778 ( 0.0%,100.0%): L14
( 17)      318 ( 0.0%,100.0%): L15
( 18)      248 ( 0.0%,100.0%): L16
( 19)      209 ( 0.0%,100.0%): L24
( 20)      152 ( 0.0%,100.0%): L22
( 21)       94 ( 0.0%,100.0%): L23
( 22)       25 ( 0.0%,100.0%): L18
( 23)       25 ( 0.0%,100.0%): L21
( 24)       23 ( 0.0%,100.0%): L19
( 25)        7 ( 0.0%,100.0%): L20
( 26)        2 ( 0.0%,100.0%): L25

rustc
370386914 counts:
(  1) 71681717 (19.4%, 19.4%): L0
(  2) 69652930 (18.8%, 38.2%): L4
(  3) 69366057 (18.7%, 56.9%): L3
(  4) 62368764 (16.8%, 73.7%): L1
(  5) 37265248 (10.1%, 83.8%): L5
(  6) 29597104 ( 8.0%, 91.8%): L2
(  7) 19322268 ( 5.2%, 97.0%): L6
(  8) 10155160 ( 2.7%, 99.7%): L7
(  9)   417426 ( 0.1%, 99.8%): L8
( 10)   260701 ( 0.1%, 99.9%): L9
( 11)   152713 ( 0.0%,100.0%): L10
( 12)    72886 ( 0.0%,100.0%): L11
( 13)    37076 ( 0.0%,100.0%): L12
( 14)    21746 ( 0.0%,100.0%): L13
( 15)     8930 ( 0.0%,100.0%): L14
( 16)     2142 ( 0.0%,100.0%): L22
( 17)     1386 ( 0.0%,100.0%): L15
( 18)     1013 ( 0.0%,100.0%): L16
( 19)      560 ( 0.0%,100.0%): L23
( 20)      350 ( 0.0%,100.0%): L17
( 21)      293 ( 0.0%,100.0%): L24
( 22)      150 ( 0.0%,100.0%): L20
( 23)      108 ( 0.0%,100.0%): L25
( 24)       65 ( 0.0%,100.0%): L21
( 25)       64 ( 0.0%,100.0%): L19
( 26)       57 ( 0.0%,100.0%): L18

Here are the same results for base:

helloworld
20214 counts:
(  1)     8353 (41.3%, 41.3%): B23
(  2)     4133 (20.4%, 61.8%): B21
(  3)     3455 (17.1%, 78.9%): B22
(  4)     1929 ( 9.5%, 88.4%): B20
(  5)     1495 ( 7.4%, 95.8%): B0
(  6)      193 ( 1.0%, 96.8%): B15
(  7)      182 ( 0.9%, 97.7%): B19
(  8)      107 ( 0.5%, 98.2%): B24
(  9)      105 ( 0.5%, 98.7%): B18
( 10)      103 ( 0.5%, 99.2%): B5
( 11)       45 ( 0.2%, 99.4%): B14
( 12)       42 ( 0.2%, 99.6%): B16
( 13)       32 ( 0.2%, 99.8%): B6
( 14)       25 ( 0.1%, 99.9%): B4
( 15)        8 ( 0.0%,100.0%): B3
( 16)        7 ( 0.0%,100.0%): B2

ripgrep
4513315 counts:
(  1)   748159 (16.6%, 16.6%): B16
(  2)   721945 (16.0%, 32.6%): B18
(  3)   711070 (15.8%, 48.3%): B17
(  4)   644334 (14.3%, 62.6%): B0
(  5)   523121 (11.6%, 74.2%): B15
(  6)   280247 ( 6.2%, 80.4%): B24
(  7)   236609 ( 5.2%, 85.6%): B14
(  8)   219443 ( 4.9%, 90.5%): B23
(  9)   142768 ( 3.2%, 93.7%): B13
( 10)    64378 ( 1.4%, 95.1%): B21
( 11)    61704 ( 1.4%, 96.5%): B12
( 12)    50680 ( 1.1%, 97.6%): B22
( 13)    44914 ( 1.0%, 98.6%): B20
( 14)    30196 ( 0.7%, 99.3%): B11
( 15)    17740 ( 0.4%, 99.6%): B19
( 16)     8809 ( 0.2%, 99.8%): B10
( 17)     3173 ( 0.1%, 99.9%): B8
( 18)     2588 ( 0.1%,100.0%): B9
( 19)     1020 ( 0.0%,100.0%): B7
( 20)      164 ( 0.0%,100.0%): B6
( 21)      109 ( 0.0%,100.0%): B5
( 22)       71 ( 0.0%,100.0%): B4
( 23)       40 ( 0.0%,100.0%): B2
( 24)       23 ( 0.0%,100.0%): B3
( 25)       10 ( 0.0%,100.0%): B1

style-servo
52961408 counts:
(  1) 14743079 (27.8%, 27.8%): B24
(  2) 10758107 (20.3%, 48.2%): B25
(  3)  7817248 (14.8%, 62.9%): B0
(  4)  5507943 (10.4%, 73.3%): B21
(  5)  2837391 ( 5.4%, 78.7%): B20
(  6)  2065167 ( 3.9%, 82.6%): B19
(  7)  1957924 ( 3.7%, 86.3%): B17
(  8)  1917016 ( 3.6%, 89.9%): B18
(  9)  1551566 ( 2.9%, 92.8%): B15
( 10)  1166388 ( 2.2%, 95.0%): B16
( 11)   973654 ( 1.8%, 96.9%): B23
( 12)   613856 ( 1.2%, 98.0%): B14
( 13)   511654 ( 1.0%, 99.0%): B13
( 14)   216395 ( 0.4%, 99.4%): B12
( 15)   198358 ( 0.4%, 99.8%): B22
( 16)    86184 ( 0.2%, 99.9%): B11
( 17)    27818 ( 0.1%,100.0%): B10
( 18)     9600 ( 0.0%,100.0%): B9
( 19)     1434 ( 0.0%,100.0%): B8
( 20)      222 ( 0.0%,100.0%): B4
( 21)      215 ( 0.0%,100.0%): B7
( 22)       87 ( 0.0%,100.0%): B5
( 23)       70 ( 0.0%,100.0%): B3
( 24)       28 ( 0.0%,100.0%): B2
( 25)        3 ( 0.0%,100.0%): B1
( 26)        1 ( 0.0%,100.0%): B6

rustc
370386883 counts:
(  1) 67350517 (18.2%, 18.2%): B0
(  2) 46673993 (12.6%, 30.8%): B24
(  3) 46242997 (12.5%, 43.3%): B22
(  4) 40759067 (11.0%, 54.3%): B23
(  5) 37617157 (10.2%, 64.4%): B21
(  6) 30556919 ( 8.3%, 72.7%): B20
(  7) 30546917 ( 8.2%, 80.9%): B19
(  8) 19659648 ( 5.3%, 86.2%): B18
(  9) 19027989 ( 5.1%, 91.4%): B17
( 10) 12393685 ( 3.3%, 94.7%): B16
( 11)  7367651 ( 2.0%, 96.7%): B15
( 12)  4140107 ( 1.1%, 97.8%): B14
( 13)  3308420 ( 0.9%, 98.7%): B13
( 14)  2539587 ( 0.7%, 99.4%): B25
( 15)  1319804 ( 0.4%, 99.8%): B12
( 16)   515782 ( 0.1%, 99.9%): B11
( 17)   229554 ( 0.1%,100.0%): B10
( 18)    75830 ( 0.0%,100.0%): B9
( 19)    39175 ( 0.0%,100.0%): B8
( 20)    10555 ( 0.0%,100.0%): B7
( 21)     5048 ( 0.0%,100.0%): B6
( 22)     2138 ( 0.0%,100.0%): B4
( 23)     2074 ( 0.0%,100.0%): B5
( 24)     1428 ( 0.0%,100.0%): B3
( 25)      728 ( 0.0%,100.0%): B2
( 26)      113 ( 0.0%,100.0%): B1

Most lengths are short, with a hump at 3 and 4. Lengths drop off quite a bit going from 6 bits to 7, and a lot more going from 7 bits to 8.

Bases are more evenly spread out between zero and the maximum base. Because the above tables shows the number of bits, their entries are biased towards the larger sizes, because each additional bit doubles the span covered.

So, the effects of too few bits are quite different for length vs. base. For length, all programs will be affected roughly equally. E.g. dropping from 7 to 6 bits makes things a bit worse, dropping from 6 to 5 bits would be a lot worse, etc. For base, it depends on the crate size; any program that is big enough to greatly exceed the base maximum is going to face a performance cliff.

So this PR makes things universally slightly worse for lengths for all programs, but then makes things a lot better for bases for large crates.

@nnethercote
Copy link
Contributor Author

I thought about using another tag bit and then having two compression regimes, perhaps 23base/7len and 26base/4len... but it doesn't seem worth it.

@nnethercote
Copy link
Contributor Author

script-servo is even bigger than style-servo. Here is the important part of its base distribution:

130151231 counts:
(  1) 26758797 (20.6%, 20.6%): B26
(  2) 18808123 (14.5%, 35.0%): B0
(  3) 13406089 (10.3%, 45.3%): B21
(  4) 12769437 ( 9.8%, 55.1%): B22
(  5)  8856470 ( 6.8%, 61.9%): B20
(  6)  8474469 ( 6.5%, 68.4%): B25
(  7)  8405568 ( 6.5%, 74.9%): B24
(  8)  5744849 ( 4.4%, 79.3%): B18
(  9)  5051649 ( 3.9%, 83.2%): B16
( 10)  4888619 ( 3.8%, 86.9%): B17
( 11)  4404563 ( 3.4%, 90.3%): B19
( 12)  4011167 ( 3.1%, 93.4%): B15
( 13)  2885705 ( 2.2%, 95.6%): B14
( 14)  2079133 ( 1.6%, 97.2%): B23
( 15)  1572516 ( 1.2%, 98.4%): B13
( 16)   916387 ( 0.7%, 99.1%): B12
( 17)   873029 ( 0.7%, 99.8%): B11
( 18)   189718 ( 0.1%,100.0%): B10

20.6% require 26 bits...

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit b56778a3f620ca1c5270ab9d9973da37a07b9f62

@petrochenkov
Copy link
Contributor

I thought about using another tag bit and then having two compression regimes

We tried that in the original PR (with a perf run), it was slightly slower, apparently due to more complex encoding/decoding.

@bors r+

@bors
Copy link
Contributor

bors commented Apr 3, 2019

📌 Commit ff94fea has been approved by petrochenkov

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 3, 2019
@petrochenkov
Copy link
Contributor

petrochenkov commented Apr 3, 2019

Future directions, mostly to satisfy curiosity I guess, it's hard to expect significant perf gains:

@petrochenkov
Copy link
Contributor

Actually, a good layout for 64-bit spans (if the span size is bumped) would probably be simply

struct Span {
    base: u32,
    len: u16,
    ctxt_and_tag: u16,
}

which is both machine and compiler friendly, and cover the statistics found in this thread and corner case like #36799 (comment).

@bors
Copy link
Contributor

bors commented Apr 3, 2019

⌛ Testing commit ff94fea with merge 0ba7d41...

bors added a commit that referenced this pull request Apr 3, 2019
Tweak `Span` encoding.

Failing to fit `base` is more common than failing to fit `len`.
@bors
Copy link
Contributor

bors commented Apr 3, 2019

☀️ Test successful - checks-travis, status-appveyor
Approved by: petrochenkov
Pushing 0ba7d41 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Apr 3, 2019
@bors bors merged commit ff94fea into rust-lang:master Apr 3, 2019
@nnethercote nnethercote deleted the tweak-Span-encoding branch April 4, 2019 01:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. S-waiting-on-perf Status: Waiting on a perf run to be completed. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants