Better quote rule decision for tied options. #2436
Merged
Closes #2404
Closes #2196
Thanks @franknarf1 for the great file!
It wasn't the jumps per se. The first 100 lines all had quoted fields which all contained a comma. The `quoteRule` was then bumped to 3 (meaning ignore quoting) because it found 100 consistent lines of 17 columns with that `quoteRule` rather than 16 columns (17 beat 16). I've changed it to only resolve ties (i.e. different ways to parse the same number of consistent lines) with a different `sep`, and no longer with the same `sep` but a different `quoteRule`. The lower `quoteRule` values (0=double and 1=escape) are the standard ones, so those should (and now do) take preference in these tied cases.

That wrong quote rule was then causing the jump problem later. I've lessened the impact of that (it now skips sample jumps where there's a format error even after nextGoodLine) but there's more work to do on that.
The test file `grr.csv` is added to the test suite with 4 large columns removed to reduce the size from 860KB down to 283KB. It generated CRAN's >5MB warning at the original size. The columns removed are the ones with long strings (>100 chars) without any of the quoting features being tested.

First 16 lines from the test file with quoted fields highlighted: