calculate plot size given the parameter k #471
The test failed because the 100 GB test file size is too close to rule out that it is a valid k32 plot.
I think perhaps that document is out of date. All my plots are in the 108.8GB range rather than 104.7GB. Do you have time to dig into this discrepancy?
$ ls -l /farm/sites/007 | tail -n 5
-rw-r--r-- 1 chia chia 108861233015 May 15 05:30 plot-k32-2021-05-14-13-46-8b913363152db776f47dd84a73b28c213c479ad9eb6cf9c5b62f5e53b260f945.plot
-rw-r--r-- 1 chia chia 108833740950 May 15 07:50 plot-k32-2021-05-14-16-07-8f281deb8a61e7d8240a3c023b40fdc4ae37d30221af1e18a36a1248da629740.plot
-rw-r--r-- 1 chia chia 108824815844 May 15 09:30 plot-k32-2021-05-14-17-48-ad72c2d71419d10b4ae4ecc13d8e453ab154f77c7a07d254a47e781fb3519b7c.plot
-rw-r--r-- 1 chia chia 108790088102 May 15 11:18 plot-k32-2021-05-14-19-31-b981549c5e8bd3f26b1c47b10c400b42e79ec79dc3d7c40ea41ef223297923f3.plot
-rw-r--r-- 1 chia chia 108884412258 May 15 13:09 plot-k32-2021-05-14-21-15-8fec2777aa2cc7097fd52ea4ce7d70c3f6f494e3b55688ef914fa8574477c700.plot
$ python -c 'print(int(0.762 * 32 * 2**32))'
104728482545
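To put a number on the discrepancy, here is a minimal sketch that compares the documented formula against the five file sizes from the listing above. The function name `formula_size` and the variable names are made up for illustration; only the 0.762 factor and the byte counts come from this thread.

```python
# Sizes in bytes, copied from the `ls -l` listing above.
observed_sizes = [
    108861233015,
    108833740950,
    108824815844,
    108790088102,
    108884412258,
]


def formula_size(k: int, factor: float = 0.762) -> int:
    """Estimate from the documented formula: factor * k * 2**k bytes."""
    return int(factor * k * 2**k)


estimate = formula_size(32)
mean_observed = sum(observed_sizes) / len(observed_sizes)
# Relative amount by which real plots exceed the formula's estimate.
excess = (mean_observed - estimate) / estimate

print(f"formula estimate: {estimate}")
print(f"mean observed:    {mean_observed:.0f}")
print(f"observed excess:  {excess:.2%}")
```

For these five plots the observed sizes come out a little under 4% above the formula's estimate.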
I'm not sure where the difference comes from. I'll take a look into it.
The formula in the document assumes that each entry takes k bits on average. That is close but inaccurate. Entries are "parked": every 2048 entries are stored together as a group. Even though the actual size of each park (in bytes) differs, chiapos uses a constant park size for each table to improve performance, which leaves empty space in the parks. By looking into the chiapos source code, I was able to find exactly how the park size is calculated, and that accounts for some of the size difference. I wrote code to compute the plot size taking park sizes into account. plot_size.py:
console output:
The size for k = 32 is 107.29 GB, which is much more accurate than the result from the formula in the document. It turns out the scaling factor is different for different k, probably due to differences in wasted space. However, plot_size.py is too complicated, and I don't think we should include it in plotman, so I will keep it here as a reference. I guess we should just hard-code the scaling factors for the different values of k.
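The padding effect of fixed-size parks can be illustrated with a toy sketch. This is not the chiapos calculation: the park sizes below are made-up numbers, the helper names are hypothetical, and the sketch assumes the fixed slot is at least as large as the largest park. Only the 2048 entries per park comes from the comment above.

```python
import math

ENTRIES_PER_PARK = 2048  # group size stated in the comment above


def padded_table_size(num_entries: int, park_size: int) -> int:
    """Bytes used when every park occupies the same fixed `park_size`."""
    num_parks = math.ceil(num_entries / ENTRIES_PER_PARK)
    return num_parks * park_size


def wasted_bytes(actual_park_sizes: list[int], park_size: int) -> int:
    """Padding left over when variable-sized parks sit in fixed-size slots."""
    if max(actual_park_sizes) > park_size:
        raise ValueError("park_size must fit the largest park")
    return sum(park_size - size for size in actual_park_sizes)


# Hypothetical sizes in bytes, chosen only to show the effect.
actual = [8100, 8150, 8060, 8190]
slot = 8200

print(wasted_bytes(actual, slot))                      # total padding
print(padded_table_size(4 * ENTRIES_PER_PARK, slot))   # on-disk size
```

Because the padding depends on how the fixed slot size is rounded for each k, it is plausible that the effective factor differs between k values, as observed above.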
Is there any of the Chia code that would be useful to copy and paste? We already do that for the command-line parsing of plotting processes, so we could do it for other bits as needed. Also, there is a mechanism to track the different versions of chia, though they haven't changed that code since we started copying it.
Oh, and thanks for the hard work here.
We cannot copy the code in this case, because chiapos is implemented in C++. I changed it a bit to run in Python. Also, I think the code is too complex for this simple usage. I guess we should just hard-code the scaling factors.
I would default to having the actual calculations present in the code. If they are kind of slow then we can run them as needed, or at start up, and cache results. If they are super slow then we can consider pre-generation. I think it is reasonable to continue these here and view this PR as a general improvement on the assumptions and simplifications that plotman has at present. It would be good to provide pinned links in the code to the C++ code it 'mirrors'. This allows for easier maintenance and verification. Let's pin the links to release versions as in the link I shared for the copied code. I'm thinking I may restructure that to have a separate module per version like
I moved the code that comes from chiapos into chiapos.py, since it is separate from chia-blockchain. I think code ported from chiapos should not be duplicated every time there is a new version, since the chiapos code is much less likely to change.
I just realized that the size calculated here is the minimal size of the plot file. So we expect an actual file to be about 1-5% larger than that (I guess; there is no proof that the difference is less than 5%, though). This can be used to tell whether a plot file is of a given size k.
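A check along those lines could look like the sketch below. The function name, the `max_overhead` parameter, and the constant are hypothetical and not plotman's API. The 5% upper bound is the unproven guess from the comment above, and the k32 minimum assumes the 107.29 GB figure is in decimal gigabytes.

```python
def is_plausible_plot_size(
    file_size: int, min_size: int, max_overhead: float = 0.05
) -> bool:
    """True if file_size lies between min_size and min_size plus the overhead."""
    return min_size <= file_size <= min_size * (1 + max_overhead)


# Assumption: 107.29 GB from the earlier comment, read as decimal GB.
K32_MIN_SIZE = 107_290_000_000

# One real k32 plot size from the listing earlier in this thread.
print(is_plausible_plot_size(108861233015, K32_MIN_SIZE))
# A 100 GB file falls below the computed minimum.
print(is_plausible_plot_size(100_000_000_000, K32_MIN_SIZE))
```

Passing the minimum size in as a parameter keeps the check independent of how that minimum is obtained, whether computed at start-up or hard-coded per k.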
Using a formula instead of a constant is better for improving plotman to work with plots with a different k.
Formula from Chia Proof of Space Construction "Space Required" section.