calculate plot size given the parameter k #471

Merged (10 commits) on May 30, 2021


hackerzhuli
Contributor

@hackerzhuli hackerzhuli commented May 15, 2021

Using a formula instead of a constant makes it easier to extend plotman to plots with a different k.

Formula from Chia Proof of Space Construction "Space Required" section.


@hackerzhuli
Contributor Author

hackerzhuli commented May 15, 2021

The test failed because the 100GB test file is too close in size to a k32 plot to be ruled out as a valid one.

Collaborator

@altendky altendky left a comment


I think perhaps that document is out of date. All my plots are in the 108.8GB range rather than 104.7GB. Do you have time to dig into this discrepancy?

$ ls -l /farm/sites/007 | tail -n 5
-rw-r--r-- 1 chia chia 108861233015 May 15 05:30 plot-k32-2021-05-14-13-46-8b913363152db776f47dd84a73b28c213c479ad9eb6cf9c5b62f5e53b260f945.plot
-rw-r--r-- 1 chia chia 108833740950 May 15 07:50 plot-k32-2021-05-14-16-07-8f281deb8a61e7d8240a3c023b40fdc4ae37d30221af1e18a36a1248da629740.plot
-rw-r--r-- 1 chia chia 108824815844 May 15 09:30 plot-k32-2021-05-14-17-48-ad72c2d71419d10b4ae4ecc13d8e453ab154f77c7a07d254a47e781fb3519b7c.plot
-rw-r--r-- 1 chia chia 108790088102 May 15 11:18 plot-k32-2021-05-14-19-31-b981549c5e8bd3f26b1c47b10c400b42e79ec79dc3d7c40ea41ef223297923f3.plot
-rw-r--r-- 1 chia chia 108884412258 May 15 13:09 plot-k32-2021-05-14-21-15-8fec2777aa2cc7097fd52ea4ce7d70c3f6f494e3b55688ef914fa8574477c700.plot
$ python -c 'print(int(0.762 * 32 * 2**32))'
104728482545
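
The gap between the listing and the documented estimate can be put into numbers with a short sketch. The sizes are copied from the `ls -l` output above; the variable names are illustrative only and are not part of plotman.

```python
# Sizes in bytes, copied from the `ls -l` listing above.
actual_sizes = [
    108861233015,
    108833740950,
    108824815844,
    108790088102,
    108884412258,
]

k = 32
# Estimate from the document's formula, computed the same way as above.
documented_estimate = int(0.762 * k * 2**k)

mean_actual = sum(actual_sizes) / len(actual_sizes)
relative_gap = mean_actual / documented_estimate - 1

print(f"documented estimate: {documented_estimate}")
print(f"mean actual size:    {mean_actual:.0f}")
print(f"actual plots are {relative_gap:.1%} larger than the estimate")
```

A gap of this size is consistent across all five files, which suggests a systematic term missing from the formula rather than per-plot variation.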

@hackerzhuli
Contributor Author

I'm not sure where the difference comes from. I'll look into it.

@hackerzhuli
Contributor Author

hackerzhuli commented May 15, 2021

The formula in the document assumes that each entry takes k bits on average. That is close but inaccurate. Entries are "parked": every 2048 entries are stored together as a group. Even though the actual size of each park (in bytes) varies, chiapos uses a constant park size for each table to improve performance, which leaves empty space in the parks.

By looking into the chiapos source code, I was able to find exactly how the park size is calculated and thus account for some of the size difference. I wrote code that computes the plot size with park sizes taken into account. plot_size.py:

# the following constants are from chiapos pos_constants.hpp

# EPP for the final file, the higher this is, the less variability, and lower delta
# Note: if this is increased, ParkVector size must increase
kEntriesPerPark = 2048

# To store deltas for EPP entries, the average delta must be less than this number of bits
kMaxAverageDeltaTable1 = 5.6
kMaxAverageDelta = 3.5

# The number of bits in the stub is k minus this value
kStubMinusBits = 3

def calc_average_size_of_entry(k, table_index):
    '''
    calculate the average size of entries in bytes
    the average size of entry is approximately k/8 bytes
    but due to wasted space of parks, it is larger than that
    '''
    return calc_park_size(k, table_index) / kEntriesPerPark

def byte_align(num_bits):
    return (num_bits + (8 - ((num_bits) % 8)) % 8)

def calc_park_size(k, table_index):
    '''
    park size in bytes for storing kEntriesPerPark = 2048 entries
    it is approximately kEntriesPerPark * k / 8, but it is actually larger than that
    derived from chiapos EntrySizes::CalculateParkSize
    '''
    line_point_size_bits = byte_align(2 * k)
    stub_size_bits = byte_align((kEntriesPerPark - 1) * (k - kStubMinusBits))
    # table 1 allows a larger average delta than the other tables
    max_average_delta = kMaxAverageDeltaTable1 if table_index == 1 else kMaxAverageDelta
    max_delta_size_bits = byte_align((kEntriesPerPark - 1) * max_average_delta)
    return (line_point_size_bits + stub_size_bits + max_delta_size_bits) / 8

def get_probability_of_entries_kept(k, table_index):
    '''
    get the probability that an entry in the table at table_index is not dropped
    the formula is derived from https://www.chia.net/assets/proof_of_space.pdf, section Space Required, p5 and pt
    '''

    if table_index > 5:
        return 1

    power = 2**k
    
    if table_index == 5:
        return 1 - (1 - 2 / power) ** power    # derived from Space Required p5
    else:
        return 1 - (1 - 2 / power) ** (get_probability_of_entries_kept(k, table_index + 1) * power) # derived from Space Required pt

def get_plotsize_scaler(k):
    '''
    get scaler for plot size so that the plot size can be calculated by scaler * k * 2 ** k
    '''

    result = 0
    for i in range(1, 8):
        probability = get_probability_of_entries_kept(k, i)
        average_size_of_entry = calc_average_size_of_entry(k, i)
        scaler_for_table = probability * average_size_of_entry / k
        #print(f"probability = {probability}, size = {average_size_of_entry}, scaler = {scaler_for_table}")
        result += scaler_for_table

    return result    

if __name__ == "__main__":
    for k in range(1, 41):
        scaler = get_plotsize_scaler(k)
        size = int(scaler * k * 2**k) 
        print(f"k = {k}, scaler = {scaler:.3f}, size = {size}")

console output:

k = 1, scaler = 1.582, size = 3
k = 2, scaler = 1.160, size = 9
k = 3, scaler = 1.005, size = 24
k = 4, scaler = 0.931, size = 59
k = 5, scaler = 0.891, size = 142
k = 6, scaler = 0.866, size = 332
k = 7, scaler = 0.849, size = 760
k = 8, scaler = 0.837, size = 1715
k = 9, scaler = 0.829, size = 3819
k = 10, scaler = 0.822, size = 8416
k = 11, scaler = 0.816, size = 18384
k = 12, scaler = 0.811, size = 39885
k = 13, scaler = 0.808, size = 86031
k = 14, scaler = 0.805, size = 184538
k = 15, scaler = 0.802, size = 394032
k = 16, scaler = 0.799, size = 837981
k = 17, scaler = 0.797, size = 1776187
k = 18, scaler = 0.795, size = 3752047
k = 19, scaler = 0.793, size = 7901885
k = 20, scaler = 0.792, size = 16602476
k = 21, scaler = 0.790, size = 34808608
k = 22, scaler = 0.789, size = 72812055
k = 23, scaler = 0.788, size = 152013792
k = 24, scaler = 0.787, size = 316806954
k = 25, scaler = 0.786, size = 659272492
k = 26, scaler = 0.785, size = 1369662479
k = 27, scaler = 0.784, size = 2841160598
k = 28, scaler = 0.783, size = 5886791196
k = 29, scaler = 0.783, size = 12184119819
k = 30, scaler = 0.782, size = 25186119655
k = 31, scaler = 0.781, size = 52007999351
k = 32, scaler = 0.781, size = 107287518791
k = 33, scaler = 0.780, size = 221143636517
k = 34, scaler = 0.780, size = 455373353413
k = 35, scaler = 0.779, size = 936816632588
k = 36, scaler = 0.779, size = 1925977586715
k = 37, scaler = 0.778, size = 3957052756528
k = 38, scaler = 0.778, size = 8123482799240
k = 39, scaler = 0.777, size = 16665720170855
k = 40, scaler = 0.777, size = 34168949486458

The size for k = 32 is 107.29 GB, which is much closer to the actual plot sizes than the result of the formula in the document. It turns out the scaler differs with k, probably because the amount of wasted space differs.
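
To make the park overhead concrete, here is a minimal sketch that recomputes the park size from plot_size.py for a couple of values of k and compares the resulting bytes per entry with the idealized k/8. It is a condensed restatement, not the chiapos implementation: the constant and function names are renamed for illustration, and it truncates the fractional delta bit count to an integer before aligning, whereas plot_size.py keeps floats.

```python
# Constants as quoted from chiapos pos_constants.hpp in plot_size.py above.
ENTRIES_PER_PARK = 2048
MAX_AVERAGE_DELTA_TABLE1 = 5.6
MAX_AVERAGE_DELTA = 3.5
STUB_MINUS_BITS = 3


def byte_align(num_bits):
    """Round a bit count up to the next multiple of 8."""
    return num_bits + (8 - num_bits % 8) % 8


def park_size_bytes(k, table_index):
    """Fixed park size in bytes, following the calculation in plot_size.py."""
    max_average_delta = (
        MAX_AVERAGE_DELTA_TABLE1 if table_index == 1 else MAX_AVERAGE_DELTA
    )
    line_point_bits = byte_align(2 * k)
    stub_bits = byte_align((ENTRIES_PER_PARK - 1) * (k - STUB_MINUS_BITS))
    # Assumption of this sketch: truncate the fractional bit count first.
    delta_bits = byte_align(int((ENTRIES_PER_PARK - 1) * max_average_delta))
    return (line_point_bits + stub_bits + delta_bits) // 8


for k in (25, 32):
    for table_index in (1, 2):
        park = park_size_bytes(k, table_index)
        per_entry = park / ENTRIES_PER_PARK
        overhead = per_entry / (k / 8) - 1
        print(f"k={k} table={table_index}: park={park} bytes, "
              f"{per_entry:.3f} bytes/entry, {overhead:.1%} above k/8")
```

For k = 32 this gives a park of 8325 bytes for tables other than table 1, compared with 2048 * 4 = 8192 bytes if every entry took exactly k bits. The line point and delta fields have a size that does not shrink in proportion to k, which is one way to see why the relative overhead, and therefore the scaler, changes with k.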

However, plot_size.py is too complicated, and I don't think we should include it in plotman, so I will keep it here as a reference. I guess we should just hard-code the scalers for the different values of k.

@altendky
Collaborator

Is there any of the Chia code that would be useful to copy and paste? We already do that for the command line parsing of plotting processes, so we could do it for other bits as needed. Also, there is a mechanism to track the different versions of chia, though they haven't changed that code since we started copying it.

https://github.com/ericaltendorf/plotman/blob/1876d4cbda73fb38584f95113a00f01af1e21704/src/plotman/chia.py

@altendky
Collaborator

Oh, and thanks for the hard work here. :]

@hackerzhuli
Contributor Author

hackerzhuli commented May 15, 2021

We cannot copy the code in this case, because chiapos is implemented in C++. I had to change it a bit to run in Python. Also, I think the code is too complex for this simple usage. I guess we should just hard-code the scalers.

@altendky
Collaborator

I would default to having the actual calculations present in the code. If they are kind of slow then we can run them as needed, or at start up, and cache results. If they are super slow then we can consider pre-generation.
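
One way the "run as needed and cache" option could look is sketched below with `functools.lru_cache` from the standard library. The function name `plot_size_estimate` is hypothetical, and its body uses the document's simple formula purely as a stand-in for the real calculation.

```python
import functools


@functools.lru_cache(maxsize=None)
def plot_size_estimate(k):
    """Hypothetical stand-in for the real plot size calculation.

    Uses the document's simple formula only to keep the sketch short.
    """
    return int(0.762 * k * 2**k)


first = plot_size_estimate(32)   # computed on the first call
second = plot_size_estimate(32)  # served from the cache
print(first == second, plot_size_estimate.cache_info())
```

Since k only takes a handful of values in practice, an unbounded cache stays tiny, and nothing needs to be pre-generated or stored on disk.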

I think it is reasonable to continue these here and view this PR as a general improvement on the assumptions and simplifications that plotman has at present. It would be good to provide pinned links in the code to the C++ code it 'mirrors'. This allows for easier maintenance and verification. Let's pin the links to release versions as in the link I shared for the copied code.

I'm thinking I may restructure that to have a separate module per version, like plotman/chia/v1_1_5.py or such, and an interface to get at each by either the latest version or a specific version. But I can follow up with that when I have time. In case I hadn't mentioned it to you (I forget where I wrote it), I imagine that we could identify the version of chia that each plotting process is using and use the proper CLI parser etc. for it. Someday...
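
A rough sketch of what such a lookup interface might look like follows. Everything here is hypothetical: the parser functions are placeholders, the registry layout is only one possible design, and version 1.1.6 is included purely to illustrate the "latest" case.

```python
def parse_v1_1_5(args):
    """Hypothetical stand-in for a version-specific CLI parser."""
    return {"version": "1.1.5", "args": list(args)}


def parse_v1_1_6(args):
    """Hypothetical stand-in; this version number is only for illustration."""
    return {"version": "1.1.6", "args": list(args)}


# Maps a version tuple to the parser written for that release.
PARSERS = {
    (1, 1, 5): parse_v1_1_5,
    (1, 1, 6): parse_v1_1_6,
}


def get_parser(version=None):
    """Return the parser for `version`, or the latest one when it is None."""
    if version is None:
        version = max(PARSERS)
    return PARSERS[version]


print(get_parser()(["-k", "32"]))
print(get_parser((1, 1, 5))(["-k", "32"]))
```

Keying the registry by integer tuples rather than strings keeps "latest" a plain `max()` call and avoids string comparison pitfalls such as "1.1.10" sorting before "1.1.5".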

@hackerzhuli
Contributor Author

I moved the code that comes from chiapos to chiapos.py, since chiapos is separate from chia-blockchain.

I think code ported from chiapos should not be duplicated every time there is a new version, since the code from chiapos is much less likely to change.

@hackerzhuli
Contributor Author

hackerzhuli commented May 17, 2021

I just realized that the size calculated here is the minimum size of the plot file. So we expect an actual file to be about 1-5% larger than that (this is a guess; there is no proof that the difference stays below 5%). This can be used to tell whether a plot file has a given k.
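
As a sketch of how the minimum size could be used for that check, the snippet below tests whether a file size falls in the window between the computed minimum and an assumed upper margin. The minimum sizes are copied from the console output above; the 5% margin is the unproven guess from this comment, and `guess_k` is a hypothetical helper, not plotman's API.

```python
# Minimum sizes in bytes, copied from the console output above.
MIN_PLOT_SIZE = {
    31: 52007999351,
    32: 107287518791,
    33: 221143636517,
}

# Assumed upper margin; the comment above guesses 1-5% without proof.
MAX_OVERHEAD = 0.05


def guess_k(file_size):
    """Return the k whose size window contains file_size, or None."""
    for k, min_size in MIN_PLOT_SIZE.items():
        if min_size <= file_size <= min_size * (1 + MAX_OVERHEAD):
            return k
    return None


# Actual k32 plot sizes quoted earlier in the conversation.
for size in (108861233015, 108790088102):
    overhead = size / MIN_PLOT_SIZE[32] - 1
    print(f"{size}: k={guess_k(size)}, {overhead:.2%} above the minimum")
```

Because plot sizes roughly double with each step in k, the windows for neighbouring values of k are far apart, so even a generous margin does not make the classification ambiguous.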

@hackerzhuli hackerzhuli requested a review from altendky May 17, 2021 14:11
@altendky altendky merged commit 500ac12 into ericaltendorf:development May 30, 2021