Allow CSV dialect to specify the meaning of blank lines #150
Comments
@hubgit this is a great suggestion. Do you want to make a formal suggestion for a mod to the spec? |
@hubgit ping re above comment ^^^ |
@hubgit ( @rgrp @danfowler ) I'm interested to have this as part of CSV dialect. In GoodTables there are config options for this as part of a validation run over a CSV file. To declare the behaviour as part of the CSV spec is appealing. |
+1 on this as a nice addition. I think list of options is:
@pwalsh what are the options to GoodTables? |
@rgrp GoodTables can ignore blank rows and ignore ragged rows (in both cases, instead of raising an exception). |
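For illustration only, here is a rough sketch of that kind of behaviour using Python's standard `csv` module. It is not the GoodTables API: the function name `read_lenient` and the flags `skip_blank` and `skip_ragged` are hypothetical, and "blank" is assumed to mean a line the reader returns as an empty list.

```python
import csv
import io


def read_lenient(text, skip_blank=True, skip_ragged=True):
    """Return (header, rows), optionally dropping blank and ragged rows.

    Hypothetical helper: the names and flags are illustrative only.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = []
    for row in reader:
        # Python's csv.reader yields an empty list for a blank line.
        if not row:
            if skip_blank:
                continue
            raise ValueError(f"blank row at line {reader.line_num}")
        # A ragged row has a different number of fields than the header.
        if len(row) != len(header):
            if skip_ragged:
                continue
            raise ValueError(f"ragged row at line {reader.line_num}")
        rows.append(row)
    return header, rows


header, rows = read_lenient("a,b\n1,2\n\n3\n4,5\n")
```

With both flags left on, the blank line and the one-field row are dropped silently; turning either flag off raises instead.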
This is what PHP's CSV parser does - a blank line is parsed as
This is an alternative to the above, avoiding the I suppose they're both describing the parser behaviour more than the data, so debatable whether they should be included in a CSV dialect. |
Maybe there should also be:
|
@hubgit thanks for the clarification. I guess my sense is that "oneEmptyField" and "zeroFields" are a bit odd in that if you have e.g. headers then I'd just say an empty row means that all fields are empty rather than zero or one. I wonder if we could just have: an |
@hubgit any final thoughts before this goes in (including on my last comment). |
@rgrp I think |
I am just recording that I am hesitating a bit on this one. Looking at various parsers, these do not seem to be very common options, and they add a fair amount of complexity for anything implementing CSV DDF. It also seems to extend beyond a pure dialect description into something about how the data is formatted. My thought is that we might write this up as a pattern rather than as something in the primary spec. |
That's fair enough - I imagine it could be handled fairly easily by a client, outside of the CSV parser, by simply ignoring everything before or after the first empty row. |
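A minimal sketch of that client-side approach, assuming Python and its standard `csv` module: cut the text at the first empty line, parse only the part before it as CSV, and keep the rest as plain text. The function name `split_at_first_blank` is hypothetical, and the sketch assumes no quoted field contains an embedded blank line.

```python
import csv


def split_at_first_blank(text):
    """Split text at the first empty line.

    Returns (rows, trailing): rows parsed as CSV from the lines before
    the first empty line, and the untouched lines after it.
    Assumes no quoted field spans a blank line.
    """
    lines = text.splitlines()
    # Index of the first empty (or whitespace-only) line, if any.
    cut = next(
        (i for i, line in enumerate(lines) if line.strip() == ""),
        len(lines),
    )
    rows = list(csv.reader(lines[:cut]))
    trailing = lines[cut + 1:]
    return rows, trailing


sample = "id,name\n1,Rosa\n\nFree text metadata\nnot CSV, just notes\n"
rows, trailing = split_at_first_blank(sample)
```

Here `rows` holds only the tabular part and `trailing` holds the free-text lines, so the CSV parser itself never needs to know what a blank line means.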
I'm suggesting we close this as a nice idea and WONTFIX: implementors can do it, but let's keep it out of the spec. @rgrp are you ok with that? |
WONTFIX. As per above discussion. |
A blank/empty line in a CSV file can have several meanings.
A StackOverflow discussion lists these possibilities for how a CSV parser could handle blank lines:
A CSV parser needs to know which of those meanings applies to a particular CSV file.
To see a CSV file that ends the data with a blank line then continues with free text metadata, run a search in the Kew Herbarium Catalogue, then choose one of the options under "Download specimen records".