calculate plot size given the parameter k #471

hackerzhuli · 2021-05-15T13:17:28Z

Computing the plot size from a formula rather than a hard-coded constant lets plotman work with plots of different k.

The formula is from the "Space Required" section of the Chia Proof of Space Construction document.

hackerzhuli · 2021-05-15T13:26:42Z

The test failed because the 100GB size of the test file is too close to the calculated k32 size to rule out a valid k32 plot.

altendky

I think perhaps that document is out of date. All my plots are in the 108.8GB range rather than 104.7GB. Do you have time to dig into this discrepancy?

$ ls -l /farm/sites/007 | tail -n 5
-rw-r--r-- 1 chia chia 108861233015 May 15 05:30 plot-k32-2021-05-14-13-46-8b913363152db776f47dd84a73b28c213c479ad9eb6cf9c5b62f5e53b260f945.plot
-rw-r--r-- 1 chia chia 108833740950 May 15 07:50 plot-k32-2021-05-14-16-07-8f281deb8a61e7d8240a3c023b40fdc4ae37d30221af1e18a36a1248da629740.plot
-rw-r--r-- 1 chia chia 108824815844 May 15 09:30 plot-k32-2021-05-14-17-48-ad72c2d71419d10b4ae4ecc13d8e453ab154f77c7a07d254a47e781fb3519b7c.plot
-rw-r--r-- 1 chia chia 108790088102 May 15 11:18 plot-k32-2021-05-14-19-31-b981549c5e8bd3f26b1c47b10c400b42e79ec79dc3d7c40ea41ef223297923f3.plot
-rw-r--r-- 1 chia chia 108884412258 May 15 13:09 plot-k32-2021-05-14-21-15-8fec2777aa2cc7097fd52ea4ce7d70c3f6f494e3b55688ef914fa8574477c700.plot

$ python -c 'print(int(0.762 * 32 * 2**32))'
104728482545
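
For scale, the gap between the observed files and the documented formula can be quantified with a short check. The sizes are copied from the listing above; the variable names are only illustrative.

```python
# Observed k32 plot sizes in bytes, copied from the `ls -l` listing above.
observed = [
    108861233015,
    108833740950,
    108824815844,
    108790088102,
    108884412258,
]

# Estimate from the documented formula: 0.762 * k * 2**k with k = 32.
formula = int(0.762 * 32 * 2**32)

# Relative amount by which the average observed plot exceeds the estimate.
mean_observed = sum(observed) / len(observed)
excess = mean_observed / formula - 1
print(f"formula = {formula}, mean observed = {mean_observed:.0f}, excess = {excess:.2%}")
```

The excess comes out at roughly 4%, which is too large to be explained by plot-to-plot variation alone, since the five files differ from each other by less than 0.1%.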

hackerzhuli · 2021-05-15T14:34:29Z

I'm not sure where the difference comes from. I'll look into it.

hackerzhuli · 2021-05-15T17:36:03Z

The formula in the document assumes that each entry takes k bits on average. That is close but inaccurate. Entries are "parked": every 2048 entries are stored together as a group. Although the actual size of each park (in bytes) varies, chiapos uses a constant park size for each table to improve performance. This leaves empty space in the parks.

By looking into the chiapos source code, I found exactly how the park size is calculated, which accounts for some of the size difference. I wrote code to compute the plot size taking park sizes into account. plot_size.py:

# the following constants are from chiapos pos_constants.hpp

# EPP for the final file, the higher this is, the less variability, and lower delta
# Note: if this is increased, ParkVector size must increase
kEntriesPerPark = 2048

# To store deltas for EPP entries, the average delta must be less than this number of bits
kMaxAverageDeltaTable1 = 5.6
kMaxAverageDelta = 3.5

# The number of bits in the stub is k minus this value
kStubMinusBits = 3

def calc_average_size_of_entry(k, table_index):
    '''
    calculate the average size of entries in bytes
    the average size of entry is approximately k/8 bytes
    but due to wasted space of parks, it is larger than that
    '''
    return calc_park_size(k, table_index) / kEntriesPerPark

def byte_align(num_bits):
    return (num_bits + (8 - ((num_bits) % 8)) % 8)

def calc_park_size(k, table_index):
    '''
    park size in bytes for storing kEntriesPerPark = 2048 entries
    it is approximately kEntriesPerPark * k / 8, but it is actually larger than that
    derived from chiapos EntrySizes::CalculateParkSize
    '''
    line_point_size_bits = byte_align(2 * k)
    stub_size_bits = byte_align((kEntriesPerPark - 1) * (k - kStubMinusBits))
    # table 1 allows a larger average delta than the other tables
    max_average_delta = kMaxAverageDeltaTable1 if table_index == 1 else kMaxAverageDelta
    max_delta_size_bits = byte_align((kEntriesPerPark - 1) * max_average_delta)
    return (line_point_size_bits + stub_size_bits + max_delta_size_bits) / 8

def get_probability_of_entries_kept(k, table_index):
    '''
    get the probability that an entry in the table at table_index is not dropped
    the formula is derived from https://www.chia.net/assets/proof_of_space.pdf, section Space Required, p5 and pt
    '''

    if table_index > 5:
        return 1

    power = 2**k
    
    if table_index == 5:
        return 1 - (1 - 2 / power) ** power    # derived from Space Required p5
    else:
        return 1 - (1 - 2 / power) ** (get_probability_of_entries_kept(k, table_index + 1) * power) # derived from Space Required pt

def get_plotsize_scaler(k):
    '''
    get scaler for plot size so that the plot size can be calculated by scaler * k * 2 ** k
    '''

    result = 0
    for i in range(1, 8):
        probability = get_probability_of_entries_kept(k, i)
        average_size_of_entry = calc_average_size_of_entry(k, i)
        scaler_for_table = probability * average_size_of_entry / k
        #print(f"probability = {probability}, size = {average_size_of_entry}, scaler = {scaler_for_table}")
        result += scaler_for_table

    return result    

if __name__ == "__main__":
    for k in range(1, 41):
        scaler = get_plotsize_scaler(k)
        size = int(scaler * k * 2**k) 
        print(f"k = {k}, scaler = {scaler:.3f}, size = {size}")

console output:

k = 1, scaler = 1.582, size = 3
k = 2, scaler = 1.160, size = 9
k = 3, scaler = 1.005, size = 24
k = 4, scaler = 0.931, size = 59
k = 5, scaler = 0.891, size = 142
k = 6, scaler = 0.866, size = 332
k = 7, scaler = 0.849, size = 760
k = 8, scaler = 0.837, size = 1715
k = 9, scaler = 0.829, size = 3819
k = 10, scaler = 0.822, size = 8416
k = 11, scaler = 0.816, size = 18384
k = 12, scaler = 0.811, size = 39885
k = 13, scaler = 0.808, size = 86031
k = 14, scaler = 0.805, size = 184538
k = 15, scaler = 0.802, size = 394032
k = 16, scaler = 0.799, size = 837981
k = 17, scaler = 0.797, size = 1776187
k = 18, scaler = 0.795, size = 3752047
k = 19, scaler = 0.793, size = 7901885
k = 20, scaler = 0.792, size = 16602476
k = 21, scaler = 0.790, size = 34808608
k = 22, scaler = 0.789, size = 72812055
k = 23, scaler = 0.788, size = 152013792
k = 24, scaler = 0.787, size = 316806954
k = 25, scaler = 0.786, size = 659272492
k = 26, scaler = 0.785, size = 1369662479
k = 27, scaler = 0.784, size = 2841160598
k = 28, scaler = 0.783, size = 5886791196
k = 29, scaler = 0.783, size = 12184119819
k = 30, scaler = 0.782, size = 25186119655
k = 31, scaler = 0.781, size = 52007999351
k = 32, scaler = 0.781, size = 107287518791
k = 33, scaler = 0.780, size = 221143636517
k = 34, scaler = 0.780, size = 455373353413
k = 35, scaler = 0.779, size = 936816632588
k = 36, scaler = 0.779, size = 1925977586715
k = 37, scaler = 0.778, size = 3957052756528
k = 38, scaler = 0.778, size = 8123482799240
k = 39, scaler = 0.777, size = 16665720170855
k = 40, scaler = 0.777, size = 34168949486458

The size for k = 32 is 107.29 GB, which is much more accurate than the result of the formula in the document. It turns out the scaler differs with k, probably because the amount of wasted space differs.

However, plot_size.py is too complicated, and I don't think we should include it in plotman, so I will keep it here as a reference. I guess we should just hard-code the scalers for the different values of k.
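
One plausible contributor to the k-dependent scaler can be seen by comparing the park size with the k bits per entry that the document assumes. The sketch below reuses the park layout from the script above for tables other than table 1. The upper-case names are illustrative, and truncating to an integer before aligning is an assumption of this sketch; it gives the same result as the script for these constants.

```python
K_ENTRIES_PER_PARK = 2048
K_STUB_MINUS_BITS = 3
K_MAX_AVERAGE_DELTA = 3.5  # tables other than table 1


def byte_align(num_bits):
    """Round a bit count up to the next multiple of 8."""
    return -(-int(num_bits) // 8) * 8


def park_bits(k):
    """Bits in one park for tables other than table 1."""
    line_point = byte_align(2 * k)
    stubs = byte_align((K_ENTRIES_PER_PARK - 1) * (k - K_STUB_MINUS_BITS))
    deltas = byte_align((K_ENTRIES_PER_PARK - 1) * K_MAX_AVERAGE_DELTA)
    return line_point + stubs + deltas


def overhead_ratio(k):
    """Park bits divided by the k bits per entry that the document assumes."""
    return park_bits(k) / (K_ENTRIES_PER_PARK * k)


for k in (20, 25, 32, 40):
    print(f"k = {k}, park bits = {park_bits(k)}, overhead ratio = {overhead_ratio(k):.4f}")
```

Each entry costs about k - 3 stub bits plus 3.5 delta bits, so the relative overhead is roughly 0.5 / k and shrinks as k grows. That matches the direction of the scaler in the output above, although at small k the drop probabilities also change.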

altendky · 2021-05-15T17:53:10Z

Is there any of the Chia code that would be useful to copy and paste? We already do that for the command line parsing of plotting processes, so we could do it for other bits as needed. Also, there is a mechanism to track the different versions of chia, though they haven't changed that code since we started copying it.

https://github.com/ericaltendorf/plotman/blob/1876d4cbda73fb38584f95113a00f01af1e21704/src/plotman/chia.py

altendky · 2021-05-15T17:53:21Z

Oh, and thanks for the hard work here. :]

hackerzhuli · 2021-05-15T18:00:23Z

We cannot copy the code in this case because chiapos is implemented in C++; I changed it a bit to run in Python. Also, I think the code is too complex for this simple use. I guess we should just hard-code the scalers.

altendky · 2021-05-15T20:16:14Z

I would default to having the actual calculations present in the code. If they are kind of slow then we can run them as needed, or at start up, and cache results. If they are super slow then we can consider pre-generation.
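
One way to run the calculation on demand and cache the result is `functools.lru_cache`. The sketch below is only illustrative: `estimate_plot_size` is a hypothetical name, and its body uses the document's simple 0.762 * k * 2**k formula as a stand-in for the slower park-aware calculation.

```python
import functools


@functools.lru_cache(maxsize=None)
def estimate_plot_size(k):
    """Hypothetical stand-in for the real calculation; cached per value of k."""
    # Placeholder formula from the Proof of Space document, not the park-aware one.
    return int(0.762 * k * 2**k)


estimate_plot_size(32)  # computed on the first call
estimate_plot_size(32)  # served from the cache on the second call
print(estimate_plot_size.cache_info())
```

Because k takes only a handful of values in practice, an unbounded cache stays tiny, and pre-generating a table would only be needed if a single call were too slow at start-up.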

I think it is reasonable to continue these here and view this PR as a general improvement on the assumptions and simplifications that plotman has at present. It would be good to provide pinned links in the code to the C++ code it 'mirrors'. This allows for easier maintenance and verification. Let's pin the links to release versions as in the link I shared for the copied code.

I'm thinking I may restructure that to have a separate module per version like plotman/chia/v1_1_5.py or such and an interface to get at each by just latest version or a specific version. But, I can follow up with that when I have time. In case I hadn't mentioned it to you, I forget where I wrote it, I imagine that we could identify the version of chia that each plotting process is using and use the proper cli parser etc for it. Someday...

hackerzhuli · 2021-05-16T06:57:59Z

I moved the code that comes from chiapos into chiapos.py, since chiapos is a separate project from chia-blockchain.

I think code ported from chiapos should not be duplicated every time there is a new version, since the code from chiapos is much less likely to change.

hackerzhuli · 2021-05-17T14:11:26Z

I just realized that the size calculated here is the minimum size of the plot file. So we expect an actual file to be about 1-5% larger than that (a guess on my part; there is no proof that the difference stays below 5%). This can be used to tell whether a plot file is of a given k.
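
A check along these lines might look like the following sketch. The minimum sizes are copied from the console output earlier in this thread; the function name `guess_k` and the 5% upper tolerance are assumptions for illustration, since no bound on the excess has been proven.

```python
# Minimum plot sizes in bytes for a few k values, copied from the output above.
MIN_PLOT_SIZE = {
    30: 25186119655,
    31: 52007999351,
    32: 107287518791,
    33: 221143636517,
    34: 455373353413,
}

# Assumed upper bound on how much larger a real plot is than the minimum.
MAX_EXCESS = 0.05


def guess_k(file_size):
    """Return the k whose size window contains file_size, or None."""
    for k, minimum in MIN_PLOT_SIZE.items():
        if minimum <= file_size <= minimum * (1 + MAX_EXCESS):
            return k
    return None


print(guess_k(108861233015))  # one of the observed k32 plots
print(guess_k(100 * 10**9))   # a 100 GB test file, below the k32 minimum
```

Since the minimum roughly doubles with each step in k, the windows for neighbouring k values do not overlap at this tolerance, so a file size falls in at most one window.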

hackerzhuli added 2 commits May 15, 2021 21:09

calculate plot size given the parameter k

1f2276b

converts size from float to int

71feada

altendky requested changes May 15, 2021

hackerzhuli added 4 commits May 16, 2021 14:43

calculates plotsize given parameter k more accurately

65f23ea

changes position of variable to avoid conflict

55e8905

add back line that was accidently removed

c822b67

Merge branch 'development' into get_plotsize

0ebec09

altendky mentioned this pull request May 16, 2021

k32 plot sizes approach 109GB #486

Closed

Merge branch 'development' of github.com:ericaltendorf/plotman into g…

0f250fe

…et_plotsize

hackerzhuli requested a review from altendky May 17, 2021 14:11

altendky added 3 commits May 29, 2021 21:54

Merge branch 'development' into get_plotsize

a0cf3bf

add test for get_plotsize()

cc27431

Merge branch 'development' into get_plotsize

f9af424

altendky approved these changes May 30, 2021

altendky merged commit 500ac12 into ericaltendorf:development May 30, 2021
