issue #519: tweak table processing #589

Merged
merged 2 commits into from
Jan 31, 2023

Conversation

al-niessner
Contributor

@al-niessner al-niessner commented Jan 26, 2023

🗒️ Summary

When the length of a record as defined in the XML does not match the length found by reading to the delimiter, throw an error. Used RECORD_LENGTH_MISMATCH since that seemed more appropriate than creating a new error type. Cleaned up the end-of-line check too. No unit test is pushed with this one, as the results from the given data are murky at best.

⚙️ Test Data and/or Report

There are over 23000 errors and messages so just giving the highlight:

>< snip ><
      ERROR  [error.table.record_length_mismatch]   data object 1, record 750: Delimiter is not at the end of the record. Record read using delimiter is 545 bytes long while record is defined to be 684 bytes.
>< snip ><
    750          error.table.record_length_mismatch
>< snip ><
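
The check described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the class and method names are hypothetical, and it assumes the defined record length includes the delimiter bytes. Only the message wording is taken from the report above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of validate or pds4-jparser.
class RecordLengthCheck {
    /** Index of the first occurrence of pattern in data at or after from, or -1. */
    static int indexOf(byte[] data, byte[] pattern, int from) {
        for (int i = from; i + pattern.length <= data.length; i++) {
            boolean match = true;
            for (int j = 0; j < pattern.length && match; j++) {
                match = data[i + j] == pattern[j];
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Compares the record length found by reading to the delimiter with the
     * length defined in the label. Returns null when they agree, otherwise a
     * message modeled on the one quoted in this PR.
     */
    static String check(byte[] data, int offset, int definedLength, byte[] delimiter) {
        int at = indexOf(data, delimiter, offset);
        // Assumption: the length includes the delimiter; with no delimiter, use the rest of the data.
        int readLength = (at < 0) ? data.length - offset : at - offset + delimiter.length;
        if (readLength == definedLength) {
            return null;
        }
        return "Delimiter is not at the end of the record. Record read using delimiter is "
            + readLength + " bytes long while record is defined to be " + definedLength + " bytes.";
    }

    public static void main(String[] args) {
        byte[] crlf = {'\r', '\n'};
        byte[] data = "abc\r\ndefgh\r\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(check(data, 0, 5, crlf));  // first record is 5 bytes, label agrees
        System.out.println(check(data, 0, 12, crlf)); // label claims 12 bytes, so a message is returned
    }
}
```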

♻️ Related Issues

resolves #519
NASA-PDS/pds4-jparser#84 needed to build

When the length of a record as defined in the XML does not match the length found by reading to the delimiter, throw an error. Used RECORD_LENGTH_MISMATCH since that seemed more appropriate than creating a new error type. Cleaned up the end-of-line check too. No unit test is pushed with this one, as the results from the given data are murky at best.

Lastly, requires changes to pds4-jparser as well.
@al-niessner
Contributor Author

@jordanpadams

Do you want to ignore the gross file mismatch and do content processing? It is very unclear to me from #519 and its predecessors whether processing despite such a gross error is desired. DataDefinitionAndContentValidationRule.validate() catches the exception, which bypasses all other checks. The exception is generated in TableValidator.validateDataObjectContents() when it calls this.tableObject.getRawTableReader(). Removing the exception or trying to bypass it would require significant rework of pds4-jparser and parts of validation, as most of the code is written to expect enough data. It seems to me the error message clearly tells the user they need to change something in the XML to make it work.

@al-niessner
Contributor Author

@jordanpadams

Also, the file checking allows for slop in the file (more bytes than required). Is this desired?

@jordanpadams
Member

@al-niessner sorry for the lack of clarity here.

> Do you want to ignore the gross file mismatch and do content processing?

As you said, I think we want to just throw an error like we already do.

> Also, the file checking allows for slop in the file (more bytes than required). Is this desired?

Unfortunately, the standard allows for undefined data in the file. We can maybe throw a warning in this case?

@jordanpadams jordanpadams changed the title from "issue 519: tweak table processing" to "issue #519: tweak table processing" on Jan 26, 2023
@al-niessner
Contributor Author

@jordanpadams @nutjob4life @tloubrieu-jpl

Ready for review.

1: abort processing if the file is too small
2: files are allowed to be too big because the XML may specify only part of the file
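
The two rules above could look something like the sketch below. It is only an illustration under stated assumptions: the names are hypothetical, a single table is assumed, and the required size is taken to be the table offset plus records times record length, which this thread does not spell out.

```java
// Hypothetical helper, not part of validate or pds4-jparser.
class FileSizePolicy {
    enum Result { OK, TOO_SMALL, EXTRA_BYTES }

    /**
     * Classifies a data file against the size implied by the label.
     * Assumption: one table starting at offset, with fixed-size records.
     */
    static Result classify(long fileSize, long offset, long records, long recordLength) {
        // addExact/multiplyExact throw ArithmeticException on overflow instead of wrapping.
        long required = Math.addExact(offset, Math.multiplyExact(records, recordLength));
        if (fileSize < required) {
            return Result.TOO_SMALL;   // rule 1: abort content processing
        }
        if (fileSize > required) {
            return Result.EXTRA_BYTES; // rule 2: allowed, the label may describe only part of the file
        }
        return Result.OK;
    }

    public static void main(String[] args) {
        System.out.println(classify(1000, 0, 10, 100));   // exactly the required 1000 bytes
        System.out.println(classify(999, 0, 10, 100));    // one byte short
        System.out.println(classify(1200, 100, 10, 100)); // 100 bytes beyond the required 1100
    }
}
```

Whether EXTRA_BYTES should surface as a warning, as floated earlier in the conversation, is left open here.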

Member

@nutjob4life nutjob4life left a comment

👍

Member

@jordanpadams jordanpadams left a comment

@al-niessner a couple issues identified:

  1. NullPointerException when running mvn test
$ validate -t src/test/resources/github469/201401031400_rdr_min_FAIL.xml

java.lang.NullPointerException: Cannot invoke "String.length()" because "line" is null
	at gov.nasa.pds.tools.validate.rule.pds4.TableValidator.validateTableDelimitedContent(TableValidator.java:243)
	at gov.nasa.pds.tools.validate.rule.pds4.TableValidator.validateDataObjectContents(TableValidator.java:160)
	at gov.nasa.pds.tools.validate.rule.pds4.TableValidator.validate(TableValidator.java:98)
	at gov.nasa.pds.tools.validate.rule.pds4.DataDefinitionAndContentValidationRule.validate(DataDefinitionAndContentValidationRule.java:86)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
	at gov.nasa.pds.tools.validate.rule.AbstractValidationRule.execute(AbstractValidationRule.java:64)
	at org.apache.commons.chain.impl.ChainBase.execute(ChainBase.java:191)
	at gov.nasa.pds.tools.validate.task.ValidationTask.execute(ValidationTask.java:130)
	at gov.nasa.pds.tools.validate.task.BlockingTaskManager.submit(BlockingTaskManager.java:26)
	at gov.nasa.pds.tools.label.LocationValidator.validate(LocationValidator.java:284)
	at gov.nasa.pds.validate.ValidateLauncher.doValidation(ValidateLauncher.java:1402)
	at gov.nasa.pds.validate.ValidateLauncher.processMain(ValidateLauncher.java:1688)
	at gov.nasa.pds.validate.ValidateLauncher.main(ValidateLauncher.java:1737)

PDS Validate Tool Report

Configuration:
   Version                       3.2.0-SNAPSHOT
   Date                          2023-01-31T19:20:58Z

Parameters:
   Targets                       [file:/Users/jpadams/proj/pds/pdsen/workspace/validate/src/test/resources/github469/201401031400_rdr_min_FAIL.xml]
   Severity Level                WARNING
   Recurse Directories           true
   File Filters Used             [*.xml, *.XML]
   Data Content Validation       on
   Product Level Validation      on
   Max Errors                    100000
   Registered Contexts File      /Users/jpadams/proj/pds/pdsen/workspace/validate/validate-3.2.0-SNAPSHOT/resources/registered_context_products.json



Product Level Validation Results

  FAIL: file:/Users/jpadams/proj/pds/pdsen/workspace/validate/src/test/resources/github469/201401031400_rdr_min_FAIL.xml
    Begin Content Validation: file:/Users/jpadams/proj/pds/pdsen/workspace/validate/src/test/resources/github469/201401031400_rdr_min_FAIL.tab
      ERROR  [error.table.field_value_out_of_min_max_range]   data object 2, record 9, field 32: Field has a value '11.0' that is less than the defined minimum value '12.0'.
      ERROR  [error.table.field_value_out_of_min_max_range]   data object 2, record 10, field 32: Field has a value '11.0' that is less than the defined minimum value '12.0'.
      ERROR  [error.table.field_value_out_of_min_max_range]   data object 2, record 10, field 33: Field has a value '-1' that is less than the defined minimum value '0.0'.
    End Content Validation: file:/Users/jpadams/proj/pds/pdsen/workspace/validate/src/test/resources/github469/201401031400_rdr_min_FAIL.tab
        1 product validation(s) completed

Summary:

  3 error(s)
  0 warning(s)

  Product Validation Summary:
    0          product(s) passed
    1          product(s) failed
    0          product(s) skipped

  Referential Integrity Check Summary:
    0          check(s) passed
    0          check(s) failed
    0          check(s) skipped

  Message Types:
    3            error.table.field_value_out_of_min_max_range

End of Report
Completed execution in 5799 ms
  2. test failure:
[ERROR] Execute validate command for tests below. #87  Time elapsed: 0.755 s  <<< FAILURE!
org.opentest4j.AssertionFailedError: 0 error messages expected. {"title":"PDS Validation Tool Report","configuration":{"version":"3.2.0-SNAPSHOT","date":"2023-01-30T22:02:41Z"},"parameters":{"targets":"[file:/Users/jpadams/proj/pds/pdsen/workspace/validate/src/test/resources/github325/crs009x.xml]","ruleType":"pds4.label","severityLevel":"WARNING","recurseDirectories":"true","fileFiltersUsed":"[*.xml, *.XML]","dataContentValidation":"on","productLevelValidation":"on","maxErrors":"100000","registeredContextsFile":"/Users/jpadams/proj/pds/pdsen/workspace/validate/src/main/resources/util/registered_context_products.json"},"productLevelValidationResults":[{"status":"FAIL","label":"file:/Users/jpadams/proj/pds/pdsen/workspace/validate/src/test/resources/github325/crs009x.xml","messages":[],"fragments":[],"dataContents":[{"dataFile":"file:/Users/jpadams/proj/pds/pdsen/workspace/validate/src/test/resources/github325/crs009x.tab","messages":[{"severity":"ERROR","type":"error.table.record_length_mismatch","table":1,"record":1,"message":"Delimiter is not at the end of the record. Record read using delimiter is 82 bytes long while record is defined to be 246 bytes."},{"severity":"ERROR","type":"error.table.record_length_mismatch","table":1,"record":2,"message":"Delimiter is not at the end of the record. Record read using delimiter is 82 bytes long while record is defined to be 246 bytes."}]}]}],"summary":{"totalProducts":1,"totalErrors":2,"totalWarnings":0,"messageTypes":[{"messageType":"error.table.record_length_mismatch","total":2}]}} ==> expected: <2> but was: <0>

@al-niessner
Contributor Author

@jordanpadams

Cannot reproduce 1. Do you have the latest jparser when running the Maven tests? I put a hook in anyway that will prevent the exception, but it will probably blow up somewhere else if you do not have the latest jparser.
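
The form of that hook is not shown in this thread. As a guess at its general shape, the sketch below guards against a null line before calling length(), using java.io.BufferedReader as a stand-in for the jparser reader (readLine() returns null once the input is exhausted). The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not the TableValidator code.
class NullLineGuard {
    /** Returns the length of each line, stopping cleanly when the reader runs out. */
    static List<Integer> recordLengths(String content) {
        List<Integer> lengths = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(content))) {
            String line;
            // The null check here is the guard: without it, line.length() throws
            // a NullPointerException at end of input.
            while ((line = reader.readLine()) != null) {
                lengths.add(line.length());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lengths;
    }

    public static void main(String[] args) {
        System.out.println(recordLengths("ab\r\ncde\r\n"));
        System.out.println(recordLengths(""));
    }
}
```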

I can reproduce 2, but that means a delimiter is not a delimiter. The only way to embed the delimiter in a record should be to escape it: quotes, backslash, something. No other language in the world says CRLF is my delimiter if and only if it occurs at bytes 22 and 23. That must mean you have a fixed-length record of 23 bytes.
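
To make the disagreement concrete, the sketch below splits the same bytes purely by delimiter and reports each record's length. The data is synthetic and the names are hypothetical. If a label declared one 18-byte record for this input, a delimiter-driven reader would instead see three 6-byte records. The 82-versus-246 pair in the test failure above fits the same pattern (246 = 3 × 82), though the thread does not confirm how that file is laid out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration with synthetic data, not validate code.
class DelimiterVsFixed {
    /** Lengths of records found by splitting on the delimiter, delimiter included. */
    static List<Integer> delimitedLengths(String data, String delimiter) {
        List<Integer> lengths = new ArrayList<>();
        int start = 0;
        int at;
        while ((at = data.indexOf(delimiter, start)) >= 0) {
            lengths.add(at - start + delimiter.length());
            start = at + delimiter.length();
        }
        if (start < data.length()) {
            lengths.add(data.length() - start); // trailing bytes with no delimiter
        }
        return lengths;
    }

    public static void main(String[] args) {
        // ASCII only, so character count equals byte count.
        String data = "aaaa\r\nbbbb\r\ncccc\r\n";
        int definedLength = 18; // what a fixed-length label might claim for one record
        List<Integer> lengths = delimitedLengths(data, "\r\n");
        System.out.println("defined: " + definedLength + ", read by delimiter: " + lengths);
    }
}
```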

temporarily turn off 325
Member

@jordanpadams jordanpadams left a comment

Approved with new #593 opened to handle regression

@jordanpadams jordanpadams merged commit 8e83040 into main Jan 31, 2023
@jordanpadams jordanpadams deleted the issue_519 branch January 31, 2023 22:58
Successfully merging this pull request may close these issues.

Validate should throw record length error when record delimiter does not occur in correct location
3 participants