Add RLE dump code #7691

Kubuxu · 2021-11-25T20:40:44Z

No description provided.

Signed-off-by: Jakub Sztandera <[email protected]>

codecov · 2021-11-25T20:49:51Z

Codecov Report

Merging #7691 (4d8be81) into master (ab55a6f) will decrease coverage by 0.15%.
The diff coverage is 0.00%.

```diff
@@            Coverage Diff             @@
##           master    #7691      +/-   ##
==========================================
- Coverage   39.56%   39.41%   -0.16%     
==========================================
  Files         637      637              
  Lines       67924    67979      +55     
==========================================
- Hits        26877    26796      -81     
- Misses      36445    36556     +111     
- Partials     4602     4627      +25
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| cmd/lotus-shed/sectors.go | `0.00% <0.00%> (ø)` | |
| node/modules/dtypes/mpool.go | `87.50% <0.00%> (-12.50%)` | ⬇️ |
| chain/events/message_cache.go | `87.50% <0.00%> (-12.50%)` | ⬇️ |
| blockstore/api.go | `24.00% <0.00%> (-8.00%)` | ⬇️ |
| blockstore/blockstore.go | `62.96% <0.00%> (-7.41%)` | ⬇️ |
| chain/events/observer.go | `71.64% <0.00%> (-6.72%)` | ⬇️ |
| miner/miner.go | `52.31% <0.00%> (-4.64%)` | ⬇️ |
| chain/stmgr/execute.go | `86.95% <0.00%> (-4.35%)` | ⬇️ |
| markets/storageadapter/ondealsectorcommitted.go | `77.33% <0.00%> (-4.00%)` | ⬇️ |
| chain/stmgr/call.go | `71.51% <0.00%> (-3.64%)` | ⬇️ |

... and 17 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update ab55a6f...4d8be81.

arajasek

Curious what motivated this

cmd/lotus-shed/sectors.go

Co-authored-by: Aayush Rajasekaran <[email protected]>

Kubuxu · 2021-11-26T13:57:48Z

I was analyzing how good or bad our RLE+ encoding is in practice. Rough results (a symbol in our case is a run; ΔH is the overhead above the entropy):

- Shannon entropy of the runs in AllocatedSectors: H = 3.468 bits/symbol
- RLE+ encoding: ΔH = 0.7687 bits/symbol
- RLE+ with a small modification: ΔH = 0.203 bits/symbol
- RLE+ with Huffman coding: ΔH ≈ 0.0178 bits/symbol
- RLE+ with Asymmetric Numeral Systems: ΔH ≈ 0.001 bits/symbol

We could save about 18% of RLE+ storage, but RLEs are such a small fraction of the state that it isn't worth the complexity right now: after collecting data, I know that RLEs are only ~10 MB of chain state, although the churn is frequent.
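One reading of where the ~18% comes from, assuming an ideal coder reaches the entropy H: the RLE+ overhead is divided by the average bits per run that RLE+ spends today, H + ΔH.

```latex
\frac{\Delta H_{\text{RLE+}}}{H + \Delta H_{\text{RLE+}}}
  = \frac{0.7687}{3.468 + 0.7687}
  = \frac{0.7687}{4.2367}
  \approx 0.181
```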
The main reason for the high ΔH of our RLE+ coding is the 6-bit encoding of symbols 2 and 3.
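To see why runs of 2 and 3 are expensive, here is a sketch of the per-run bit cost, based on my reading of the RLE+ block layout in the Filecoin spec (as implemented by go-bitfield): a run of length 1 is a single `1` bit, lengths 2 to 15 are `01` plus a 4-bit length, and longer runs are `00` plus an unsigned LEB128 varint. `rlePlusRunBits` is a hypothetical helper, not a go-bitfield API, and it ignores the stream header (version and first-bit value).

```go
package main

import "fmt"

// rlePlusRunBits returns the number of bits RLE+ spends on one run of the
// given length (length >= 1), assuming the RLE+ block layout:
//
//	length 1      -> "1"                    (1 bit)
//	length 2..15  -> "01" + 4-bit length    (6 bits)
//	length >= 16  -> "00" + LEB128 varint   (2 + 8*bytes bits)
func rlePlusRunBits(length uint64) int {
	switch {
	case length == 1:
		return 1
	case length < 16:
		return 6
	default:
		// Count LEB128 bytes: 7 payload bits per byte.
		bytes := 0
		for v := length; ; v >>= 7 {
			bytes++
			if v < 0x80 {
				break
			}
		}
		return 2 + 8*bytes
	}
}

func main() {
	for _, n := range []uint64{1, 2, 3, 15, 16, 200} {
		fmt.Printf("run of %d: %d bits\n", n, rlePlusRunBits(n))
	}
}
```

Runs of 2 and 3 cost 6 bits, the same as a run of 15. If they are common, as the entropy figures suggest, that fixed cost dominates the overhead.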

If I were to design RLE+ today, I would have gone with plain RLE plus Huffman coding of small runs and of additional data lengths.
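The comment does not spell out that design further, so the sketch below only illustrates the Huffman part: deriving a code length for each run length from a histogram, so that frequent short runs get short codes instead of a fixed 6 bits. The histogram and `huffmanLengths` are hypothetical, and how long runs or the additional data lengths would be escaped is left out.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// node is a subtree in the Huffman construction: its total weight and the
// symbols (run lengths) it contains.
type node struct {
	weight  uint64
	symbols []uint64
}

// nodeHeap is a min-heap of nodes ordered by weight.
type nodeHeap []*node

func (h nodeHeap) Len() int            { return len(h) }
func (h nodeHeap) Less(i, j int) bool  { return h[i].weight < h[j].weight }
func (h nodeHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *nodeHeap) Push(x interface{}) { *h = append(*h, x.(*node)) }
func (h *nodeHeap) Pop() interface{} {
	old := *h
	n := old[len(old)-1]
	*h = old[:len(old)-1]
	return n
}

// huffmanLengths returns the Huffman code length, in bits, for each symbol
// of the histogram. Each merge of two subtrees adds one bit to every symbol
// inside them.
func huffmanLengths(hist map[uint64]uint64) map[uint64]int {
	lengths := make(map[uint64]int, len(hist))
	h := &nodeHeap{}
	for sym, w := range hist {
		lengths[sym] = 0
		*h = append(*h, &node{weight: w, symbols: []uint64{sym}})
	}
	if h.Len() == 1 {
		// A lone symbol still needs one bit to be written.
		for sym := range lengths {
			lengths[sym] = 1
		}
		return lengths
	}
	heap.Init(h)
	for h.Len() > 1 {
		a := heap.Pop(h).(*node)
		b := heap.Pop(h).(*node)
		merged := make([]uint64, 0, len(a.symbols)+len(b.symbols))
		merged = append(merged, a.symbols...)
		merged = append(merged, b.symbols...)
		for _, s := range merged {
			lengths[s]++
		}
		heap.Push(h, &node{weight: a.weight + b.weight, symbols: merged})
	}
	return lengths
}

func main() {
	// Hypothetical run-length histogram, for illustration only.
	hist := map[uint64]uint64{1: 50, 2: 25, 3: 25}
	lengths := huffmanLengths(hist)
	syms := make([]uint64, 0, len(lengths))
	for s := range lengths {
		syms = append(syms, s)
	}
	sort.Slice(syms, func(i, j int) bool { return syms[i] < syms[j] })
	for _, s := range syms {
		fmt.Printf("run %d: %d bits\n", s, lengths[s])
	}
}
```

With this histogram, runs of 2 and 3 get 2-bit codes. A real encoder would also need canonical code assignment and a way to ship or fix the code table.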

Signed-off-by: Jakub Sztandera <[email protected]>

Add dump code

af113f8

Signed-off-by: Jakub Sztandera <[email protected]>

Kubuxu requested a review from a team as a code owner November 25, 2021 20:40

arajasek approved these changes Nov 25, 2021

cmd/lotus-shed/sectors.go (outdated, resolved)

cmd/lotus-shed/sectors.go (outdated, resolved)

Fix typo

e3c7b8d

Co-authored-by: Aayush Rajasekaran <[email protected]>

Add usage

4d8be81

Signed-off-by: Jakub Sztandera <[email protected]>

jennijuju merged commit a4c2a20 into master Nov 26, 2021

jennijuju deleted the misc/rle-dump branch November 26, 2021 22:15
