issue #519: tweak table processing #589
When the length of a record as given in the XML does not match the length read by delimiter, toss an error. Used RECORD_LENGTH_MISMATCH as that seemed appropriate rather than creating a new one. Cleaned up the end-of-line check too. No unit test pushed with this one, as the results from the given data are murky at best. Lastly, requires changes to pds4-jparser as well.
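For readers following along, here is a rough sketch of the kind of check described above. It is not the code from this PR: the class and method names are hypothetical, CRLF is assumed to be the record delimiter, and the message wording only loosely mirrors the RECORD_LENGTH_MISMATCH text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the TableValidator implementation.
final class RecordLengthCheck {

    /**
     * Walks the raw bytes, splits them on CRLF, and reports every record whose
     * delimited length (including the CRLF) differs from the length the label defines.
     * Trailing bytes that are not terminated by CRLF are ignored in this sketch.
     */
    static List<String> findMismatches(byte[] data, int definedRecordLength) {
        List<String> messages = new ArrayList<>();
        int start = 0;   // first byte of the current record
        int record = 0;  // 1-based record counter
        for (int i = 0; i + 1 < data.length; i++) {
            if (data[i] == '\r' && data[i + 1] == '\n') {
                record++;
                int delimitedLength = i + 2 - start;
                if (delimitedLength != definedRecordLength) {
                    messages.add("record " + record
                            + ": Record read using delimiter is " + delimitedLength
                            + " bytes long while record is defined to be "
                            + definedRecordLength + " bytes.");
                }
                start = i + 2;
                i++; // step over the LF that was just consumed
            }
        }
        return messages;
    }
}
```

With a label that defines longer records than the delimiter produces, every delimited record is flagged, which is why a single bad label can yield thousands of messages.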
Do you want to ignore the gross file mismatch and do content processing anyway? It is very unclear to me from #519 and its predecessors whether processing despite such a gross error is desired. DataDefinitionAndContentValidationRule.validate() catches the exception, bypassing all other checks. The exception is generated in TableValidator.validateDataObjectContents() when it calls this.tableObject.getRawTableReader(). Removing the exception or trying to bypass it would require significant rework of pds4-jparser and parts of validate, as most of the code is written to expect enough data. It seems to me the error message clearly tells the user they need to change something in the XML to make it work.
Also, the file check allows for slop in the file (more bytes than required). Is this desired?
@al-niessner sorry for the lack of clarity here.
As you said, I think we want to just throw an error like we already do.
Unfortunately, the standard allows for undefined data in the file. Maybe we can throw a warning in this case?
@jordanpadams @nutjob4life @tloubrieu-jpl Ready for review. 1: abort processing if file too small
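To make the outcome of this exchange concrete, a minimal sketch of the size rule as discussed: too few bytes is an error that stops processing, while extra bytes are tolerated and could at most produce a warning. The names below are hypothetical, a single table per file is assumed, and the warning branch reflects only the suggestion made above, not confirmed behavior.

```java
// Hypothetical illustration only; not the pds4-jparser or validate implementation.
final class FileSizeCheck {

    enum Outcome { OK, EXTRA_BYTES, TOO_SMALL }

    /**
     * Compares the actual file size with the bytes the label requires:
     * the table offset plus (number of records * record length).
     */
    static Outcome classify(long fileSize, long offset, long records, long recordLength) {
        // The *Exact methods throw ArithmeticException on overflow instead of wrapping silently.
        long required = Math.addExact(offset, Math.multiplyExact(records, recordLength));
        if (fileSize < required) {
            return Outcome.TOO_SMALL;   // not enough data: abort with an error
        }
        if (fileSize > required) {
            return Outcome.EXTRA_BYTES; // undefined trailing data: candidate for a warning
        }
        return Outcome.OK;
    }
}
```

For example, a label that defines 2 records of 246 bytes at offset 0 requires 492 bytes, so a 400-byte file would be classified as TOO_SMALL and a 500-byte file as EXTRA_BYTES.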
👍
@al-niessner a couple issues identified:
- NullPointerException when running
mvn test
$ validate -t src/test/resources/github469/201401031400_rdr_min_FAIL.xml
java.lang.NullPointerException: Cannot invoke "String.length()" because "line" is null
at gov.nasa.pds.tools.validate.rule.pds4.TableValidator.validateTableDelimitedContent(TableValidator.java:243)
at gov.nasa.pds.tools.validate.rule.pds4.TableValidator.validateDataObjectContents(TableValidator.java:160)
at gov.nasa.pds.tools.validate.rule.pds4.TableValidator.validate(TableValidator.java:98)
at gov.nasa.pds.tools.validate.rule.pds4.DataDefinitionAndContentValidationRule.validate(DataDefinitionAndContentValidationRule.java:86)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at gov.nasa.pds.tools.validate.rule.AbstractValidationRule.execute(AbstractValidationRule.java:64)
at org.apache.commons.chain.impl.ChainBase.execute(ChainBase.java:191)
at gov.nasa.pds.tools.validate.task.ValidationTask.execute(ValidationTask.java:130)
at gov.nasa.pds.tools.validate.task.BlockingTaskManager.submit(BlockingTaskManager.java:26)
at gov.nasa.pds.tools.label.LocationValidator.validate(LocationValidator.java:284)
at gov.nasa.pds.validate.ValidateLauncher.doValidation(ValidateLauncher.java:1402)
at gov.nasa.pds.validate.ValidateLauncher.processMain(ValidateLauncher.java:1688)
at gov.nasa.pds.validate.ValidateLauncher.main(ValidateLauncher.java:1737)
PDS Validate Tool Report
Configuration:
Version 3.2.0-SNAPSHOT
Date 2023-01-31T19:20:58Z
Parameters:
Targets [file:/Users/jpadams/proj/pds/pdsen/workspace/validate/src/test/resources/github469/201401031400_rdr_min_FAIL.xml]
Severity Level WARNING
Recurse Directories true
File Filters Used [*.xml, *.XML]
Data Content Validation on
Product Level Validation on
Max Errors 100000
Registered Contexts File /Users/jpadams/proj/pds/pdsen/workspace/validate/validate-3.2.0-SNAPSHOT/resources/registered_context_products.json
Product Level Validation Results
FAIL: file:/Users/jpadams/proj/pds/pdsen/workspace/validate/src/test/resources/github469/201401031400_rdr_min_FAIL.xml
Begin Content Validation: file:/Users/jpadams/proj/pds/pdsen/workspace/validate/src/test/resources/github469/201401031400_rdr_min_FAIL.tab
ERROR [error.table.field_value_out_of_min_max_range] data object 2, record 9, field 32: Field has a value '11.0' that is less than the defined minimum value '12.0'.
ERROR [error.table.field_value_out_of_min_max_range] data object 2, record 10, field 32: Field has a value '11.0' that is less than the defined minimum value '12.0'.
ERROR [error.table.field_value_out_of_min_max_range] data object 2, record 10, field 33: Field has a value '-1' that is less than the defined minimum value '0.0'.
End Content Validation: file:/Users/jpadams/proj/pds/pdsen/workspace/validate/src/test/resources/github469/201401031400_rdr_min_FAIL.tab
1 product validation(s) completed
Summary:
3 error(s)
0 warning(s)
Product Validation Summary:
0 product(s) passed
1 product(s) failed
0 product(s) skipped
Referential Integrity Check Summary:
0 check(s) passed
0 check(s) failed
0 check(s) skipped
Message Types:
3 error.table.field_value_out_of_min_max_range
End of Report
Completed execution in 5799 ms
- test failure:
[ERROR] Execute validate command for tests below. #87 Time elapsed: 0.755 s <<< FAILURE!
org.opentest4j.AssertionFailedError: 0 error messages expected. {"title":"PDS Validation Tool Report","configuration":{"version":"3.2.0-SNAPSHOT","date":"2023-01-30T22:02:41Z"},"parameters":{"targets":"[file:/Users/jpadams/proj/pds/pdsen/workspace/validate/src/test/resources/github325/crs009x.xml]","ruleType":"pds4.label","severityLevel":"WARNING","recurseDirectories":"true","fileFiltersUsed":"[*.xml, *.XML]","dataContentValidation":"on","productLevelValidation":"on","maxErrors":"100000","registeredContextsFile":"/Users/jpadams/proj/pds/pdsen/workspace/validate/src/main/resources/util/registered_context_products.json"},"productLevelValidationResults":[{"status":"FAIL","label":"file:/Users/jpadams/proj/pds/pdsen/workspace/validate/src/test/resources/github325/crs009x.xml","messages":[],"fragments":[],"dataContents":[{"dataFile":"file:/Users/jpadams/proj/pds/pdsen/workspace/validate/src/test/resources/github325/crs009x.tab","messages":[{"severity":"ERROR","type":"error.table.record_length_mismatch","table":1,"record":1,"message":"Delimiter is not at the end of the record. Record read using delimiter is 82 bytes long while record is defined to be 246 bytes."},{"severity":"ERROR","type":"error.table.record_length_mismatch","table":1,"record":2,"message":"Delimiter is not at the end of the record. Record read using delimiter is 82 bytes long while record is defined to be 246 bytes."}]}]}],"summary":{"totalProducts":1,"totalErrors":2,"totalWarnings":0,"messageTypes":[{"messageType":"error.table.record_length_mismatch","total":2}]}} ==> expected: <2> but was: <0>
Cannot reproduce 1. Do you have the latest jparser when running the Maven tests? I put a hook in anyway that will prevent the exception, but it will probably blow up somewhere else if you do not have the latest jparser. I can reproduce 2, but that means a delimiter is not a delimiter. The only way to embed the delimiter in a record should be to escape it: quotes, backslash, something. No other language in the world says CRLF is my delimiter if and only if it occurs at bytes 22 and 23. That must mean you have a fixed-length record of 23 bytes.
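The actual hook is not shown in this thread, so the following is only a guess at its general shape. `BufferedReader.readLine()` returns null once the stream is exhausted, which is how `line` can be null when the label promises more records than the file holds. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the code in TableValidator.
final class TruncatedTableGuard {

    /**
     * Reads up to expectedRecords lines and reports a problem, instead of
     * throwing a NullPointerException, when the content ends early.
     */
    static List<String> readExpectedRecords(String content, int expectedRecords) {
        List<String> problems = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(content))) {
            for (int record = 1; record <= expectedRecords; record++) {
                String line = reader.readLine(); // null at end of stream
                if (line == null) {
                    problems.add("record " + record + ": no data (file ended early)");
                    break; // nothing more to read
                }
                // From here on, line.length() is safe to call.
                if (line.length() == 0) {
                    problems.add("record " + record + ": empty record");
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return problems;
    }
}
```

A guard like this only converts the crash into a message; as noted above, code further downstream may still assume the data is present.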
temporarily turn off 325
Approved with new #593 opened to handle regression
⚙️ Test Data and/or Report
There are over 23000 errors and messages so just giving the highlight:
♻️ Related Issues
resolves #519
NASA-PDS/pds4-jparser#84 needed to build