-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate should throw record length error when record delimiter does not occur in correct location #519
Comments
Cannot find the file ECM000XXX_2022173T165156_HK_LR_SOLARWIND_CAL_ASC_RAW010.xml and associated data mentioned in #518. Please attach to this ticket. |
@tloubrieu-jpl hopefully adding this to the top of your stack. can you please provide the test data @al-niessner has requested? |
Seems much has changed. I do not get the errors in #518 nor does the bad table looks the same:
Seems clear enough that something in the table definition and file are not right. I am not sure that we can make sense of it from there either. What I find super interesting and think is the real bug, if I change both 944 references to 750 it runs with 10s of thousands of errors because types are not right. However if we take the file size and divide by the number of rows 513353/944 = 543.8061440677966. That is not rounding error. If I do it for the expected 750, 513353/750 = 684.4706666666667 which is also not rounding error. However, the file size check thinks it is right with XML defined record length of 684 and 750 rows which is 513000 but really is off by 353 bytes. Really? Cannot even blame it on the header because there is no header. Using python3, loaded the table and then found that CR-NL were used to delimit the file. There are 944 rows with that delimiter. They start off with a length of 541(139), drop to 540(134), climb to 542(447), then plateau at 543(224). The number in () is the number of those rows of that value out of the 944 rows. This brings me to the second problem, why should the table be processed if the XML is not even close to the actual file? The fact that it blew the size check twice (file then again table) allows it to grab 684 characters. Of course it is going to be wrong. I can put a check before it checks every cell to see if the line is properly terminated (making it a row) before doing cell processing. Is that the desirement of this ticket? |
@al-niessner what happens if we don't trigger this file size error? in other words, what if we change the label to say the number of records is 750 (like the error message reads), and keep on trying to validate the file? do we then throw an error that the record delimiter is invalid? |
@jordanpadams |
Here is the first set of offending code: I take it the first check is to allow for extra bytes at the end of the file. The second allows allows for junk at the end of the file too. Is this desired? The second error message is far less informative about why it is short on byte. Should its error message be expanded like other check? Lastly, I have changed the first error set to be this:
That bit of code says that if there is an even multiplier for too many rows or different integer length that works those messages give you clean integers all the way around (help with missing row or miscount of length but not both). If there are no integer resolutions it gives this message:
Helps the user figure out what is wrong. I think this same block should be used for the not enough bytes short message as well. Let me know what you want. |
issue #519: tweak table processing
🐛 Describe the bug
Per #518, validate appears to be missing checking that the record delimiter occurs at the end of the expected record length. When reading a record, if the record length is read, and it does not end with the expected record delimiter, validate should throw an error.
📜 To Reproduce
See #518
🕵️ Expected behavior
Validate should throw a record length error
📚 Version of Software Used
2.2.3
🩺 Test Data / Additional context
https://github.com/NASA-PDS/validate/files/8959903/Archive.zip
🦄 Related requirements
⚙️ Engineering Details
Medium severity because it will most likely fail elsewhere in most cases, but it is not at all intuitive what the issue is, and it does not appear that we are properly checking for record lengths.
The text was updated successfully, but these errors were encountered: