Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential Bug: skipEmptyLines doesn't skip empty lines #469

Open
randy opened this issue Jun 18, 2021 · 1 comment
Open

Potential Bug: skipEmptyLines doesn't skip empty lines #469

randy opened this issue Jun 18, 2021 · 1 comment

Comments

@randy
Copy link

randy commented Jun 18, 2021

I'm using univocity-parsers 2.9.1. Our project language is Groovy and we're using Spock as our testing framework, so my apologies for the syntactical differences; my Java/JUnit is very rusty...

This is a very simple test case of the issue. My actual data is based on 70+ columns, use of a BeanListProcessor, and an actual file, but the result is the same.

def "skip empty lines test"() {
    given:
        String csv = "HEADER1, HEADER2\n" +
            "value1, value2\n" +
            ",\n" +
            "value5, value6"
        CsvParserSettings csvSettings = new CsvParserSettings()
        csvSettings.skipEmptyLines = true
        csvSettings.headerExtractionEnabled = true
        CsvParser parser = new CsvParser(csvSettings)

    when:
        def result = parser.parseAllRecords(new ByteArrayInputStream(csv.getBytes()))
        
    then:
        result.size() == 2
}

This test fails as result.size() == 3. Since the second data row is all empty, it is my understanding that it should not be included in the output, therefore not part of the results. This is especially an issue when creating beans that end up with all-null properties... Even with a RowProcessorErrorHandler in place, no errors are recorded for this (which would be an acceptable result as it would allow manual skipping later on).

I took a gander at the unit tests for CsvParser and even searched the file for setSkipEmptyLines and it returned no results. I have a feel this was just not implemented, but I have not checked the actual source to verify. Any clarification of this setting's behavior would be appreciated.

@randy
Copy link
Author

randy commented Jan 21, 2022

Following up on this, it seems that the definition of "empty" is not clear.

Here's an example of what an "empty" row is according to the library:

this, is a, valid, csv, row
the, next, row, is, empty

and, this, row, is, not

And this is what I expect to be recognized as an empty row:

this, is a, valid, csv, row
the, next, row, is, empty
,,,,
and, this, row, is, not

Why this is not considered empty is a huge question. Literally every value is "empty," the resulting bean is "empty," so why isn't the record skipped as being empty? Instead, the library returns null for this which means any subsequent processing will throw an NPE if you're trying to touch any properties of the records.

To prevent these sort of "records" from appearing in the results seems to require a custom implementation of beanProcessed() or some other method. But why?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant