Skip to content

04. Bug Fixes Milestone 3

MustafaWasif edited this page Apr 19, 2022 · 13 revisions

Potential bug: skipEmptyLines doesn't skip empty lines

Originally opened here https://github.com/uniVocity/univocity-parsers/issues/469

Assigned to: Graham Stewart

As stated in the original issue, even with the skipEmptyLines setting set to true, the parser will read records that contain only empty values in as records containing only null entries. It is then further elaborated upon that the library does not actually consider rows of all empty values empty lines, it seems this is only referring to lines of all whitespace. Examining some of the other test cases (in AbstractRoutinesTest and SettingsExamples for example) does seem to indicate that this behavior is intentional. However, it does seem like being able to skip rows that consist only of empty values is a desirable setting.

Since this behavior is actually separate from the skipEmptyLines setting, it seemed appropriate to implement this as a new, separate setting to avoid upsetting established behavior and failing tests.

image

As shown above, this behavior can be implemented by checking for all null values in a record before it is added to the final returned array of the internalParseAll method. Because this is done in AbstractParser.java, which is extended by the various other parsers, it should work for all of them.

image

Additionally, a new field and relevant getters and setters were added to CommonSettings. This field is skipEmptyRecords, which defaults to false. The getters and setters are also given relevant docstrings.

image

A test case was created to ensure that both the desired new functionality and the old functionality are preserved. This test passes, demonstrating how the third record will not be included should skipEmptyRecords be toggled on. All other passing tests from before this test also still pass, meaning no other functionality was impacted.

Unit Tests Needed for Null and Blank Input Validation

Assigned to: Kester Yang

The original program implemented "validate" settings that could allow/disallow certain types of input values from input files by exception handling. If a disallowed input type is present in an input file, an exception would be thrown, thus triggering the corresponding exception handling. However, unit tests for the exception triggering were not included in the original repository. I felt adding unit tests for them were necessary because parser applications often run into problems when dealing with null inputs. In my earlier attempt of fixing a bug where csv files could not be properly read when first record being empty (details here: https://github.com/uniVocity/univocity-parsers/issues/475), I later found that it was a setting error instead of an actual bug, which could be fixed by setting nullable to true. Furthermore, in Graham's part, the program sometimes behaved inconsistently when parsing empty lines and nulls unless configured meticulously (for example, empty line differs from a line of nulls). He had to look up multiple unit tests to determine if the inconsistent behavior of the program was caused by bug or incorrect setting.

Below are snapshots of the program part which we are mainly testing: (To see the complete class:

  1. https://github.com/SolidSnackDrive/univocity-parsers/blob/master/src/main/java/com/univocity/parsers/conversions/ValidatedConversion.java
  2. https://github.com/SolidSnackDrive/univocity-parsers/blob/master/src/main/java/com/univocity/parsers/common/DataValidationException.java) image image

There are four variables in the settings that we are interested in testing:

  1. nullable: This boolean variable indicates whether an input is allowed to contain nulls. This is set to false by default.
  2. allowBlanks: This boolean variable indicates whether an input is allowed to contain blanks. This is set to false by default.
  3. oneOf: This string array ensures that the value of the input is one of a given set of alternatives.
  4. noneOf: This string array ensures that the value of the input is not an unwanted value.

The unit tests cover test cases for null and blank inputs under different setting configurations, including:

  • Input is null and nullable = true , noneof.contains(null) = true
  • Input is null and nullable is true , noneof == null is true
  • Input is null and nullable is false, oneOf has null

Additionally, the unit tests also cover test cases for non-null inputs under different setting configurations, including:

  • noneof has the input string
  • matcher is not null and input matches, noneOf is null
  • input is blank and allowblanks is true and noneof contains input

Below are snapshots of the unit tests and their results: image image image

As it can be seen in the snapshots, the unit tests compares the actual caught exceptions and their corresponding exception handling with the expected results. In fact, a lot of the settings in this parser are implemented by exception handling. So, this unit test approach can be further expanded to increase the test coverage. Skimming through the issue tracker, lots of the reported inconsistent behavior of the program are potentially caused by setting errors instead of bugs. The program's tutorial page does not sufficiently cover all program settings, and the unit tests do not cover all settings neither.

Feature: Filter out rows that has all commas

Reference: https://github.com/uniVocity/univocity-parsers/issues/457

Assigned to: Mustafa Wasif

The purpose of this feature is to eliminate unnecessary output data from the csv files. Note that this feature is earlier implemented in regards to fixing a different bug, although extensive testing for having the entire row filtered out was not carried out. Below is the test to justify functionality of the feature: image

The above test is run successfully and therefore reflects correct implementation of the feature.

Bug: Terminate parsing if some condition exists

Reference: https://github.com/uniVocity/univocity-parsers/issues/467

Assigned to: Mustafa Wasif

To show that the parser could be stopped before completing parsing 'all' the records, a boolean based variable, i.e., hasCondition, is considered for simplicity. What type of conditions or any details in regards to it isn't mentioned in this issue.

image

Inside the method parse, we check if there exists a 'condition', upon which parsing stops if condition is met. This piece of implementation is shown below: image