-
Notifications
You must be signed in to change notification settings - Fork 139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better errors when (Sub)SmallExtension is filled. #513
Conversation
It looks like |
Hmm...For some reason |
I have no way to test if it crashes for some other reason. |
This may be irrelevant, but on MacOS ( ./build/sources/vorm -D TEST=Issue512_2 check/fixes.frm
The corresponding Valgrind run on Linux is: valgrind ./build/sources/vorm -D TEST=Issue512_2 check/fixes.frm
Something wrong seems to be happening before calling |
I'll have a look at it. It might be related to #468 (comment) |
The valgrind error here is due to Lines 3581 to 3611 in 0112aa6
The function certainly knows things have gone wrong; if you define That we get a valgrind error here means that the content of the old small buffer + SplitScratch has blown through the small extension, and also through the |
This is actually not the case here; it is the first loop which causes the problem. I don't understand why, though. The general idea is,
I would expect that the original terms pointed to by sPointer should be guaranteed to fit in the SmallBuffer+Extension? |
I added some prints to try to work out what is going on here. For a "working" call of
so as soon as sFill plus the size of the new term is >= sTop2, GarbHand is called which clears out the gaps in the small buffer (this is all during a SplitMerge so terms are being merged a lot). At the problematic call:
Here "total" is larger than sTop2 (I changed the print compared to the master branch, which prints 2*total for some reason). But sFill was 960006 before the call, and it was only trying to add a term of 48 words. So I am not sure how a "total" of 982988 is possible, unless the I am testing on top of #529 so writes over sTop2 are an immediate crash, unlike in the master branch which needs a write beyond PObuffer also (the test rebased onto master actually doesn't cause a valgrind error anymore, due to the increased value of |
The issue is that "NEWSPLITMERGE" is not strict about setting pointers to zero, for terms which have merged (or cancel, but in this test nothing cancels). If we use the old code, Maybe this was done for speed, but what is means is that the "new" code fails in some cases where the old succeeds because GarbHand cannot work properly. If you run |
e679123
to
710d340
Compare
@tueda can you test this on your mac? In case it is needed again, I used the following print code (and diff) in the two versions of
|
Now it works on my MacBook Air. |
OK, that is good. I have run a few mincer benchmarks and did not see any performance impact. I think in FORM the term comparison is so expensive that no amount of pointer operations during the sort is noticable. I have a "re-written" version of |
Previously, FORM warns only about SmallExtension, even if it is a sub-buffer sort. This prompts the user to adjust the wrong buffer in the setup.
710d340
to
96b9485
Compare
I missed some pointer zeroing which was causing GarbHand to still double-count terms. The crash was due to the merge of the split sort allocation, so now errors here are overruns of the small extension buffer (this passed the CI before, where the overruns were just spilling into the PO buffer). The "Timsort" part needed to be modified slightly, when there are not two copies of the terms present, but it is functionally the same. I will do some performance regression checks later. |
I don't see any performance impact due to this compared to master, and indeed I can see a ~1% improvement due to the Timsort code (I could not measure a difference there in the past). I'll merge the fix commit into the previous so that there isn't a buggy commit on the master branch. |
Commit f1b83ae removed various zeroed pointers from SplitMerge, which does not produce wrong results, since they are "out of range" of the sorted and merged region of the pointer array, but it leads to GarbHand computing the total size of the terms incorrectly. This causes valgrind errors when we write over the end of the SmallExtension, and causes termination due to a "too small SmallExtension" which might not actually be the case.
7de7e57
to
c9d5abd
Compare
This should be good to merge now I think. |
Before merging, can you remove warnings in the test cases added by this PR? You can see them by running:
These warning messages are not caught by the test suite for now (#581) but should be resolved. |
Yes, this is just a case of using the buffer sizes that form uses anyway. I could also add a "debug mode" for splitmerge which I used to test the changes (hidden behind a preproc variable) which could be useful in the future? |
You mean debug code disabled by a C preprocessor #ifdef directive? Yes, that would be useful. |
To do this the test needs to be ~4x larger. Otherwise if I use buffers like
the The test for Issue468 also gives the warnings. |
So, the problem is that the buffer size should be adjusted for the number of workers, right? One possible (and tedious) solution is to make several test cases for, e.g., A more sophisticated solution would be to introduce a kind of template
but it requires some work in |
Yes and no. With sizes per-test-config, the single-threaded valgrind runs could be made smaller and faster, but the |
Optionally print the state of the pointer array on entry to SplitMerge, for ease of potential future debugging here.
Maybe we have to allow such slow test cases. To allow tests that take more time than usual (10 seconds for non-Valgrind and 300 seconds for Valgrind), we can use, e.g., |
OK, then I will get the proper tests in and see how they run in the CI. It could run a bit faster in general, by splitting the tests into a larger number of groups -- usually the CI is waiting for just a few (<20, the limit) tests to finish for a while. |
Make sure the tests run properly for tform -w4 also. (This makes the tests larger and more time consuming, also for form). Also fix the buffers for the Issue468 test.
Now if you define
This produces huge output for larger problems, but it allows you do diff the logs and make sure things are going OK. You should get the same output, regardless of your In the master branch if you put the same printing code, you will see for the Issue512 test cases (best not run with |
@tueda if you are happy with this I would like to merge it, a collaborator has smallextension problems with some jobs and this should fix it for them. |
Yes, I think this can be merged. I'll merge it. |
Thanks. This one should go into 4.3 as well, if you are not already on it. |
Yes, I'm trying to resolve conflicts. |
Previously, FORM warns only about SmallExtension, even if it is a sub-buffer sort.
This prompts the user to adjust the wrong buffer in the setup.
Fixes #512