Rows starting with comment char should be automatically quoted #362

Closed
adutra opened this issue Nov 19, 2019 · 5 comments

adutra commented Nov 19, 2019

I ran into a corner case when exporting/importing CSV data. Consider the following format and parse operations:

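import com.univocity.parsers.csv.*;
import java.io.*;
import java.util.*;

// Write a single row whose first field starts with '#', using default settings.
StringWriter sw = new StringWriter();
{
  CsvWriterSettings writerSettings = new CsvWriterSettings();
  CsvWriter writer = new CsvWriter(sw, writerSettings);
  writer.writeRow(new String[] {"#field1", "field2", "field3"});
  writer.close();
}
// Read the written text back, again with default settings.
StringReader sr = new StringReader(sw.toString());
{
  CsvParserSettings parserSettings = new CsvParserSettings();
  CsvParser parser = new CsvParser(parserSettings);
  List<String[]> rows = parser.parseAll(sr);
  // The row just written is expected here.
  System.out.println(Arrays.toString(rows.get(0)));
}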

The above fails because the written row happens to start with the configured comment character #: the parser treats the line as a comment and skips it, so rows is empty and rows.get(0) throws.

I am currently using the following workaround when exporting data:

writerSettings.setQuotationTriggers(writerSettings.getFormat().getComment());

This makes the code above work as expected.

But I think it would be a good idea to make CsvWriter automatically quote the first field of a row if its value starts with the configured comment character, because otherwise the written row is likely to be ignored when read with the same settings.
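To make the proposal concrete, here is a minimal standalone sketch of the quoting rule in plain Java. It does not use univocity-parsers and is not how CsvWriter is implemented; the class CommentAwareRowWriter and its method formatRow are hypothetical names made up for illustration, and the delimiter and quote characters are assumed to be , and ".

```java
import java.util.StringJoiner;

// Hypothetical helper, NOT part of univocity-parsers: it only illustrates the proposed rule.
final class CommentAwareRowWriter {
    private static final char QUOTE = '"';      // assumed quote character
    private static final char DELIMITER = ',';  // assumed field delimiter

    /** Formats one row, quoting the first field when it starts with the comment character. */
    static String formatRow(char comment, String... fields) {
        StringJoiner row = new StringJoiner(String.valueOf(DELIMITER));
        for (int i = 0; i < fields.length; i++) {
            String field = fields[i] == null ? "" : fields[i];
            // Only the first field can make the whole line look like a comment.
            boolean startsWithComment = i == 0 && !field.isEmpty() && field.charAt(0) == comment;
            row.add(startsWithComment || needsQuoting(field) ? quote(field) : field);
        }
        return row.toString();
    }

    /** Usual CSV quoting triggers: delimiter, quote, or line break inside the value. */
    private static boolean needsQuoting(String field) {
        return field.indexOf(DELIMITER) >= 0 || field.indexOf(QUOTE) >= 0
                || field.indexOf('\n') >= 0 || field.indexOf('\r') >= 0;
    }

    /** Wraps the value in quotes, doubling any embedded quote characters. */
    private static String quote(String field) {
        return QUOTE + field.replace("\"", "\"\"") + QUOTE;
    }
}
```

With this sketch, CommentAwareRowWriter.formatRow('#', "#field1", "field2", "field3") quotes only the first field, while a # at the start of a later field is left alone because it cannot begin the line.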

jbax (Member) commented Nov 22, 2019

Agreed 100%. I'm not even sure how we got this far while forcing people to do that.
In the meantime, a less painful workaround is to do this for writing:

writerSettings.getFormat().setComment('\0');

and for parsing:

parserSettings.getFormat().setComment('\0');

But I'll adjust the writer so that the first value can be written without having to set the comment character to \0.

Thanks for the suggestion

jbax self-assigned this Nov 22, 2019
jbax added this to the 2.8.4 milestone Nov 22, 2019
adutra (Author) commented Nov 22, 2019

parserSettings.getFormat().setComment('\0');

This won't always work. Believe it or not, the initial issue I ran into was precisely with a row starting with \0, as in "\0foo,bar,qix", and my parser was configured with \0 as the comment char (or rather, was configured to consider no comment char). That row was written fine, but couldn't be read back.

If it's feasible for you, I would avoid mixing these two notions: \0 as the comment char, and no comment char at all. But that's another issue.
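One way to keep the two notions apart, sketched in plain Java under my own assumptions (this is not the library's API; CommentDetector, disabled and of are hypothetical names), is to represent "no comment char" as an absent value instead of overloading \0:

```java
// Hypothetical sketch, NOT part of univocity-parsers.
final class CommentDetector {
    // null means "comments disabled"; any char, including '\0', is a real comment marker.
    private final Character comment;

    private CommentDetector(Character comment) {
        this.comment = comment;
    }

    /** No line is ever treated as a comment. */
    static CommentDetector disabled() {
        return new CommentDetector(null);
    }

    /** Lines starting with the given character are treated as comments. */
    static CommentDetector of(char comment) {
        return new CommentDetector(comment);
    }

    boolean isComment(String line) {
        return comment != null && !line.isEmpty() && line.charAt(0) == comment;
    }
}
```

With comments disabled, a row such as "\0foo,bar,qix" is treated as data, whereas CommentDetector.of('\0') would skip it as a comment.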

jbax (Member) commented Nov 22, 2019 via email

adutra (Author) commented Nov 22, 2019

That's an option too, but I sometimes deal with data encrypted with weird algorithms, and 0xFF is unfortunately likely to appear as well. That's why I came up with quoting whichever character is set as the comment character. At least it's bullet-proof :-)

jbax (Member) commented Nov 26, 2019

Implemented in the latest 2.8.4-SNAPSHOT. Thanks!
