Potential Bug: skipEmptyLines doesn't skip empty lines #469

randy · 2021-06-18T13:38:12Z

I'm using univocity-parsers 2.9.1. Our project language is Groovy and we're using Spock as our testing framework, so my apologies for the syntactical differences; my Java/JUnit is very rusty...

This is a very simple test case of the issue. My actual data is based on 70+ columns, use of a BeanListProcessor, and an actual file, but the result is the same.

def "skip empty lines test"() {
    given:
        String csv = "HEADER1, HEADER2\n" +
            "value1, value2\n" +
            ",\n" +
            "value5, value6"
        CsvParserSettings csvSettings = new CsvParserSettings()
        csvSettings.skipEmptyLines = true
        csvSettings.headerExtractionEnabled = true
        CsvParser parser = new CsvParser(csvSettings)

    when:
        def result = parser.parseAllRecords(new ByteArrayInputStream(csv.getBytes()))
        
    then:
        result.size() == 2
}

This test fails as result.size() == 3. Since the second data row is all empty, it is my understanding that it should not be included in the output, therefore not part of the results. This is especially an issue when creating beans that end up with all-null properties... Even with a RowProcessorErrorHandler in place, no errors are recorded for this (which would be an acceptable result as it would allow manual skipping later on).

I took a gander at the unit tests for CsvParser and even searched the file for setSkipEmptyLines and it returned no results. I have a feel this was just not implemented, but I have not checked the actual source to verify. Any clarification of this setting's behavior would be appreciated.

The text was updated successfully, but these errors were encountered:

randy · 2022-01-21T15:41:11Z

Following up on this, it seems that the definition of "empty" is not clear.

Here's an example of what an "empty" row is according to the library:

this, is a, valid, csv, row
the, next, row, is, empty

and, this, row, is, not

And this is what I expect to be recognized as an empty row:

this, is a, valid, csv, row
the, next, row, is, empty
,,,,
and, this, row, is, not

Why this is not considered empty is a huge question. Literally every value is "empty," the resulting bean is "empty," so why isn't the record skipped as being empty? Instead, the library returns null for this which means any subsequent processing will throw an NPE if you're trying to touch any properties of the records.

To prevent these sort of "records" from appearing in the results seems to require a custom implementation of beanProcessed() or some other method. But why?

SolidSnackDrive mentioned this issue Apr 16, 2022

Bug: skipEmptyLines not skipping empty lines SolidSnackDrive/univocity-parsers#11

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Potential Bug: skipEmptyLines doesn't skip empty lines #469

Potential Bug: skipEmptyLines doesn't skip empty lines #469

randy commented Jun 18, 2021

randy commented Jan 21, 2022

Potential Bug: skipEmptyLines doesn't skip empty lines #469

Potential Bug: skipEmptyLines doesn't skip empty lines #469

Comments

randy commented Jun 18, 2021

randy commented Jan 21, 2022