valgrind errors when creating tablebases #318
Comments
This looks like a bug in the boilerplate code for a kind of … (see lines 965 to 988 in 558b01f).
Yes, it looks like this is the problem.
After the first …, I guess that the …
I'm not sure this is the right place, because the comment in lines 989 to 991 (in 558b01f) suggests that the pointer to the old buffer is still needed until the end of the function, perhaps for some checks and for setting up the new blocks.
Well, only … At least, the example above no longer crashes or creates enormous output, and I can load and use the table in another script.
OK, I'm also not familiar with this minos code, and perhaps you are right. Indeed, the fact that your example works is an improvement. Do you want to make a commit and pull request for this? Then I will merge it.
It does indeed look like those two lines are missing.
When it continues with the next loop iteration, it assumes that the space is there, and hence nnew should have been installed.
Thanks for finding this.
Jos
The newly created larger array was not swapped with the old array. This caused lots of invalid array accesses and, sometimes, an fseek to an undefined location, resulting in incorrectly sized output files. See Issue vermaseren#318. Also, when comparing contents, the block under consideration was not correctly updated. This caused valgrind to report an Illegal write. I have not found any actual runtime issues caused by this.
Could this be the cause of an old issue from the forum, where I described how FORM writes a really enormous amount of data to disk when creating relatively small tablebase files? For this reason it is a very bad idea to create a tablebase directly in a network directory.
My fork is a real mess, it seems. Are you able to cherry-pick a single commit, or just re-implement the change in this repo?
OK. Then I will cherry-pick it from your fork.
Thanks! This is done. (And hopefully, the CI will finish without any errors...)
Perhaps this warrants a separate issue, but here is some more detail on the commentary of that commit.
If you take the same test file from the first post here, and put
#system cat /proc/`PID_'/io
before .end, you can inspect the difference between wchar and write_bytes. In this example, running from a local disk, wchar is about 18GB and write_bytes is 19.4MB (the size of test.tbl).
This means, I think, that FORM has written 18GB to disk in total but has overwritten the same data many times, and only 19.4MB was actually committed to disk in the end by the OS.
If you run the same example on a network storage location, these numbers roughly coincide, and if you look at a network throughput monitor while FORM is running, you will see that your machine really is transferring all of that 18GB to the file server.
I am not sure why this is, but for this reason I always create tablebases in a tmpfs and transfer them to the proper place afterwards.
The problem with tablebases (and databases in general) is that you have to keep all information synchronized.
Hence, if you write a new element of the table, you have to update the index.
I know there must be better algorithms to do this, but I wanted to play it as safe as possible.
This does indeed generate more traffic than absolutely needed.
Jos
The CI successfully finished, so I would like to close this issue for now.
The examples in this issue all relate to tablebases containing hundreds of tables. Just for information, 0b3ab5d also fixes problems with tablebases containing a single table with millions of entries. In that case the tablebase was created without apparent errors, but when applying it to an expression in another FORM script, we received the errors …
The following program produces some invalid reads and writes in minos.c: PutTableNames. I can't always get it to run successfully; it sometimes crashes. I can't remember why I made this test file (some months ago; I think someone asked me for help with crashes related to tablebases containing many different tables), but it seems that it sometimes (not always) creates something like "sparse" files, which claim to be much bigger than they really are.
The size seems to depend on the filesystem, but when this program misbehaves it seems to produce a 1.9TB file on ext4 and a 5.6EB file (!!!) on NFS. Needless to say, this caused some problems when rsync attempted to back this file up to a remote location. The true size of the file is just 20MB.
Perhaps the errors reported by valgrind are responsible for the occasional crazy files?
I attach a file with the output:
vg.log
Thanks,
Josh.