
Feat/c32 optimizations #3075
Merged: 17 commits, Apr 20, 2022
Conversation

@zone117x (Member) commented Mar 15, 2022

Various optimizations in the C32 address decoding paths: use a decoding table (the inverse of the encoding table/alphabet) in the decoding hot path, where previously an O(n) utf8 char scan was used, and reduce some allocs & copies.
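As a sketch of the decoding-table idea (illustrative code assuming the Crockford-style C32 alphabet, not the PR's exact implementation):

```rust
// Illustrative sketch of the decoding table: the inverse of the C32
// encoding alphabet (which excludes I, L, O, and U). Indexing the table
// by byte value replaces an O(n) scan of the alphabet string with an
// O(1) array lookup.
const C32_ALPHABET: &[u8] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

fn build_decode_table() -> [i8; 128] {
    let mut table = [-1i8; 128]; // -1 marks bytes outside the alphabet
    for (i, &c) in C32_ALPHABET.iter().enumerate() {
        table[c as usize] = i as i8;
    }
    table
}

fn main() {
    let table = build_decode_table();
    assert_eq!(table[b'0' as usize], 0);
    assert_eq!(table[b'Z' as usize], 31);
    assert_eq!(table[b'U' as usize], -1); // 'U' is not a C32 digit
    println!("ok");
}
```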

Comment on lines -73 to -77
let norm_str: String = input_str
.to_uppercase()
.replace("O", "0")
.replace("L", "1")
.replace("I", "1");
@zone117x (Member, Author):

Reducing the char scans and allocs here is one of the primary optimizations in this PR.

Member:

Where did the normalization code go?

@zone117x (Member, Author):

This normalization fn was no longer called once the encoding logic was changed to use a faster C32 alphabet lookup using the C32_CHARACTERS_MAP table. If you take a look at the code there, you'll see where the lowercase and special chars are included. I'm happy to add it back (will be dead code though).
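A sketch of how the old normalization rules can be folded into the decode table (hypothetical names; in the PR the finished entries live in the inlined C32_CHARACTERS_MAP):

```rust
// Sketch: folding the old normalization rules (uppercase, O->0, L->1,
// I->1) directly into the decode table, so no intermediate normalized
// String needs to be allocated. Illustrative code, not the PR's exact
// implementation.
const C32_ALPHABET: &[u8] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

fn build_decode_table_with_aliases() -> [i8; 128] {
    let mut table = [-1i8; 128];
    for (i, &c) in C32_ALPHABET.iter().enumerate() {
        table[c as usize] = i as i8;
        table[c.to_ascii_lowercase() as usize] = i as i8; // accept lowercase
    }
    // The replace("O", "0"), replace("L", "1"), replace("I", "1") rules
    // become extra table entries instead of string rewrites:
    for &(alias, canonical) in &[(b'O', b'0'), (b'L', b'1'), (b'I', b'1')] {
        table[alias as usize] = table[canonical as usize];
        table[alias.to_ascii_lowercase() as usize] = table[canonical as usize];
    }
    table
}

fn main() {
    let t = build_decode_table_with_aliases();
    assert_eq!(t[b'o' as usize], t[b'0' as usize]); // 'o' decodes as 0
    assert_eq!(t[b'i' as usize], t[b'1' as usize]); // 'i' decodes as 1
    println!("ok");
}
```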

let iter_c32_digits_opts: Vec<Option<usize>> = c32_normalize(input_str)
.chars()
.rev()
.map(|x| C32_CHARACTERS.find(x))
@zone117x (Member, Author):

Refactoring this find from O(n) to O(1) in the hot path is one of the primary optimizations in this PR.
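The change can be sketched as two versions of the per-digit lookup (illustrative code; in the PR the table is the inlined C32_CHARACTERS_MAP):

```rust
// Old path: C32_CHARACTERS.find(c) scans the alphabet string per
// character, O(n). New path: index a precomputed table, O(1).
const C32_ALPHABET: &str = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

fn digit_old(c: char) -> Option<usize> {
    C32_ALPHABET.find(c) // linear scan of up to 32 characters
}

fn digit_new(table: &[i8; 128], c: u8) -> Option<usize> {
    match table.get(c as usize) {
        Some(&v) if v >= 0 => Some(v as usize),
        _ => None,
    }
}

fn main() {
    let mut table = [-1i8; 128];
    for (i, c) in C32_ALPHABET.bytes().enumerate() {
        table[c as usize] = i as i8;
    }
    // Both lookups agree on every ASCII byte.
    for b in 0u8..128 {
        assert_eq!(digit_old(b as char), digit_new(&table, b));
    }
    println!("ok");
}
```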

/// table[pair.0.to_ascii_lowercase() as usize] = i;
/// }
/// ```
const C32_CHARACTERS_MAP: [i8; 128] = [
@zone117x (Member, Author):

I experimented with both lazy_static and const fn for this; neither seemed right, so I took the table result and inlined it. Rust doc comments have compilation validation support (at least in the tools I'm using), so this seems like the best approach. But I'm open to alternatives!

codecov bot commented Mar 15, 2022

Codecov Report

Merging #3075 (b1061ef) into develop (1e0b80b) will increase coverage by 0.03%.
The diff coverage is 91.59%.

@@             Coverage Diff             @@
##           develop    #3075      +/-   ##
===========================================
+ Coverage    83.72%   83.76%   +0.03%     
===========================================
  Files          259      260       +1     
  Lines       200777   200932     +155     
===========================================
+ Hits        168093   168302     +209     
+ Misses       32684    32630      -54     
Impacted Files Coverage Δ
stacks-common/src/address/mod.rs 81.17% <ø> (ø)
stacks-common/src/address/c32_old.rs 89.77% <89.77%> (ø)
stacks-common/src/address/c32.rs 97.55% <97.82%> (-0.34%) ⬇️
stacks-common/src/util/hash.rs 82.47% <100.00%> (-1.11%) ⬇️
clarity/src/vm/ast/traits_resolver/mod.rs 94.44% <0.00%> (-3.97%) ⬇️
src/net/neighbors.rs 36.84% <0.00%> (-1.59%) ⬇️
src/net/poll.rs 60.91% <0.00%> (-1.53%) ⬇️
src/net/dns.rs 90.25% <0.00%> (-0.86%) ⬇️
clarity/src/vm/database/key_value_wrapper.rs 96.51% <0.00%> (-0.32%) ⬇️
... and 21 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jcnelson (Member) left a comment:

So, this is all consensus critical code you're touching. We use the bare addresses in the chainstate MARF, which affects the block state root hashes. At a minimum, I need to see unit tests demonstrating that this new code behaves exactly the same way as the old code.

@zone117x (Member, Author):

The existing tests cover these paths pretty extensively, e.g. one for the normalization code:
https://github.com/stacks-network/stacks-blockchain/blob/03107740595e5359f2381dfb97b5ee153125c44a/stacks-common/src/address/c32.rs#L383-L404

I'm happy to add more, but not really seeing any gaps in testing, would appreciate some pointers here.

@jcnelson (Member):

A couple of things:

  • Because this PR touches consensus-critical code to implement better performance, I'd like to know what exactly the performance impact is. If it's not very big in practice, then I'd rather not merge this PR. It's better to have slightly-slower but known-good code in the consensus-critical paths than fast code that we're less than 100% certain of. Like, does this PR improve the node bootup speed? Does it improve p2p or RPC responsiveness? Does it improve the miner speed?

  • If this PR yields a significant improvement in performance, then the test I'd like to see added is to do the following:

    • Keep the old code around, but decorated as #[cfg(test)].
    • Generate lots and lots of addresses with the old code, and verify that they decode via the new code exactly the same way as they would in the old code.
    • Generate lots and lots of addresses with the new code, and verify that they decode via the old code exactly the same way as they would in the new code.

    We do something like this with the MARF PR I have open, for example -- it keeps all the old code paths around, and verifies that the new code paths generate exactly the same trie root hashes as the old code paths given the same key/value insert schedule.
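A miniature version of that test shape, at toy scale (this models the structure only; the PR's real test round-trips full c32 addresses through the old and new implementations):

```rust
// Feed many pseudo-random digit sequences through an encoder, then check
// that a legacy (string-scan) decoder and a new (table) decoder agree
// exactly and both round-trip. Illustrative stand-ins for the old and
// new code paths.
const C32_ALPHABET: &[u8] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

fn encode(digits: &[u8]) -> String {
    digits.iter().map(|&d| C32_ALPHABET[d as usize] as char).collect()
}

fn decode_old(s: &str) -> Option<Vec<u8>> {
    s.bytes()
        .map(|b| C32_ALPHABET.iter().position(|&a| a == b).map(|i| i as u8))
        .collect()
}

fn decode_new(table: &[i8; 128], s: &str) -> Option<Vec<u8>> {
    s.bytes()
        .map(|b| match table.get(b as usize) {
            Some(&v) if v >= 0 => Some(v as u8),
            _ => None,
        })
        .collect()
}

fn main() {
    let mut table = [-1i8; 128];
    for (i, &c) in C32_ALPHABET.iter().enumerate() {
        table[c as usize] = i as i8;
    }
    // Simple LCG as a stand-in for a random input source.
    let mut state: u64 = 0x2545F4914F6CDD1D;
    for _ in 0..1000 {
        let digits: Vec<u8> = (0..20)
            .map(|_| {
                state = state.wrapping_mul(6364136223846793005).wrapping_add(1);
                ((state >> 33) % 32) as u8
            })
            .collect();
        let s = encode(&digits);
        assert_eq!(decode_old(&s), decode_new(&table, &s)); // paths agree
        assert_eq!(decode_new(&table, &s), Some(digits));   // round-trips
    }
    println!("ok");
}
```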

@zone117x (Member, Author):

Sounds reasonable; keeping the old code around would also help with generating benchmarks. Btw, this code is in the "stacks-common" library, which is used in projects outside the stacks-node, so I wouldn't be surprised if there's a perf gain that doesn't help with the stacks-node operations you've listed but could be significant for other use cases. Seems like these would still be good things to upstream (assuming the consensus-critical change requirements you've outlined are met).

@zone117x (Member, Author):

@jcnelson how does this look? Added tests to generate addrs with new and old code, decode both with new and old code, verifying they all match. I ran it with 500,000 iterations of random input, and tests pass (takes a few minutes). I committed a version of the test that performs 5000 iterations.

Then added a c32_bench which shows around a 4x perf increase:

C32 Decoding/Legacy/S01JQF9J50WT0EZSN30CH1KQSGR9H0PPHNCHPH0AG    time: [1.9297 us 1.9448 us 1.9707 us]
C32 Decoding/Updated/S01JQF9J50WT0EZSN30CH1KQSGR9H0PPHNCHPH0AG   time: [499.92 ns 502.18 ns 505.40 ns]
C32 Decoding/Legacy/S62TSHKFJ7BRQFTFJ6BN0645E85NPYZ43G0AD7D71    time: [1.9495 us 1.9565 us 1.9633 us]
C32 Decoding/Updated/S62TSHKFJ7BRQFTFJ6BN0645E85NPYZ43G0AD7D71   time: [497.96 ns 499.04 ns 500.14 ns]
C32 Decoding/Legacy/S83AHFYJ817AYTZKKVSMR1XTEG31QQDF6BQP5PN70    time: [1.9602 us 1.9659 us 1.9712 us]
C32 Decoding/Updated/S83AHFYJ817AYTZKKVSMR1XTEG31QQDF6BQP5PN70   time: [498.07 ns 499.26 ns 500.42 ns]
C32 Decoding/Legacy/S1TXJX2Z31R3XQW7VT15XR01JYHAW2P4056GTGSG     time: [1.9368 us 1.9417 us 1.9466 us]
C32 Decoding/Updated/S1TXJX2Z31R3XQW7VT15XR01JYHAW2P4056GTGSG    time: [496.47 ns 497.27 ns 498.07 ns]
C32 Decoding/Legacy/SG3KAEP0GX797TPWTRX3BB1DKSQ0NPPWVDHY3TW09    time: [1.9559 us 1.9619 us 1.9679 us]
C32 Decoding/Updated/SG3KAEP0GX797TPWTRX3BB1DKSQ0NPPWVDHY3TW09   time: [498.71 ns 499.79 ns 500.87 ns]


.map(|ref_x| *ref_x)
.rev()
.filter_map(|x| C32_CHARACTERS_MAP.get(*x as usize))
.filter_map(|v| u8::try_from(*v).ok())
Member:

This code seems to depend on u8::try_from(*v) returning None if *v is -1. Can this code be more explicit about this? Such as:

.filter_map(|v| {
   if *v < 0 {
       None
   }
   else {
       // infallible due to the above check
       Some(*v as u8)
    }
})

Member:
u8::try_from is very explicit about conversions: negative inputs will always error, as will any other input outside the range of u8. x as u8 is a much more convoluted conversion: 1000 as u8 returns 232 (it is neither a saturating conversion nor a panic).

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=c687a7113c8758f439bfd06a96a4957c
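The behaviors discussed above can be checked directly (a minimal demonstration, not from the PR itself):

```rust
// u8::try_from rejects any out-of-range input with an Err, while an `as`
// cast silently truncates or wraps.
fn main() {
    assert!(u8::try_from(-1i8).is_err());    // negative: Err
    assert!(u8::try_from(1000i32).is_err()); // out of range: Err
    assert_eq!(u8::try_from(200i32), Ok(200u8));
    assert_eq!(1000i32 as u8, 232);          // `as` truncates: 1000 mod 256
    assert_eq!(-1i8 as u8, 255);             // `as` wraps negatives
    println!("ok");
}
```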

@zone117x (Member, Author):

It may actually be better to use an array of Option<u8> rather than -1i8 for the "null" sentinel values (this is what the b58 table does). I'll give that a try when I address the other comments and see if there are any unexpected perf issues with that approach.

@zone117x (Member, Author):

I changed the table from [i8; 128] to [Option<u8>; 128] which gets rid of this integer conversion, and the benchmark shows a slight perf increase.
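A sketch of the [Option<u8>; 128] variant described above (illustrative names; the PR inlines the finished table):

```rust
// None replaces the -1 sentinel, so the hot path needs no i8-to-u8
// conversion at all.
const C32_ALPHABET: &[u8] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

fn build_opt_table() -> [Option<u8>; 128] {
    let mut table = [None; 128];
    for (i, &c) in C32_ALPHABET.iter().enumerate() {
        table[c as usize] = Some(i as u8);
    }
    table
}

fn decode_digit(table: &[Option<u8>; 128], c: u8) -> Option<u8> {
    *table.get(c as usize)? // None for non-ASCII or non-alphabet bytes
}

fn main() {
    let table = build_opt_table();
    assert_eq!(decode_digit(&table, b'Z'), Some(31));
    assert_eq!(decode_digit(&table, b'U'), None);
    println!("ok");
}
```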

@jcnelson (Member) left a comment:

This LGTM, but please address my comments before merging. Thanks!

@kantai (Member) left a comment:

This looks good to me; the only comments I think need to be addressed are the cfg(test) decoration for the old code and including the normalization rules in the docstring.

@zone117x zone117x merged commit d2f31ca into develop Apr 20, 2022
@zone117x zone117x deleted the feat/c32-optimizations branch April 20, 2022 12:41