Failing to parse quotes #25
So, it does parse the first field correctly? I'm going to bet one of the flags isn't getting cleared properly somewhere. I'll poke around with it a little more later today.
Ah, sorry I didn't mention it. But yes, the first field is parsed correctly. I played around a bit and got it working by making the following changes (although I'm not sure I've understood everything, so any input from you is highly appreciated!). In
That, however, caused issues when starting with many quotes in a field. For example:
I was able to get around this by adding a counter in the while loop inside
and in the next condition:
All my tests so far have been passing, but I may have missed something. I'm open to alternative solutions. It's very likely that I missed something obvious that would allow a simpler solution. Thanks!
I think we'll have to fix the non-luajit version of the closing quote finder. I stripped this down to the bare bones, and tried switching between the two:

```lua
local sbyte = string.byte

-- vanilla lua closing quote finder
local function vanillaFindClosingQuote(i, inputLength, inputString, quote, doubleQuoteEscape)
  local j, difference
  i, j = inputString:find('"+', i)
  if j == nil then return end
  if i == nil then
    return inputLength-1, doubleQuoteEscape
  end
  difference = j - i
  -- print("difference", difference, "I", i, "J", j)
  if difference >= 1 then doubleQuoteEscape = true end
  if difference == 1 then
    return vanillaFindClosingQuote(j+1, inputLength, inputString, quote, doubleQuoteEscape)
  end
  return j-1, doubleQuoteEscape
end

local function luajitFindClosingQuote(i, inputLength, inputString, quote, doubleQuoteEscape)
  local currentChar, nextChar = sbyte(inputString, i), nil
  while i <= inputLength do
    -- print(i)
    nextChar = sbyte(inputString, i+1)

    -- this one deals with " double quotes that are escaped "" within single quotes "
    -- these should be turned into a single quote at the end of the field
    if currentChar == quote and nextChar == quote then
      doubleQuoteEscape = true
      i = i + 2
      currentChar = sbyte(inputString, i)

    -- identifies the escape toggle
    elseif currentChar == quote and nextChar ~= quote then
      -- print("exiting", i-1)
      return i-1, doubleQuoteEscape
    else
      i = i + 1
      currentChar = nextChar
    end
  end
end

local a = '"A""B""""C"'
local b = '"A""""B""C"'

print("vanilla a:", vanillaFindClosingQuote(2, #a, a, sbyte('"'), false))
print("luajit a:", luajitFindClosingQuote(2, #a, a, sbyte('"'), false))
print("vanilla b:", vanillaFindClosingQuote(2, #b, b, sbyte('"'), false))
print("luajit b:", luajitFindClosingQuote(2, #b, b, sbyte('"'), false))
```

results in:

and those should all be 10... Feel free to keep working on it, the help is appreciated! I'll poke around some more tomorrow.
Thanks for taking the time to look into this. To be honest, I forgot about the luajit function since I was only using the other one. But as previously mentioned, I got a better result (same as the luajit version in your tests) from

But only changing that did not work when a field started with quotes. For example

But honestly, this was just something I found by trying out a few things without actually knowing what I was doing :)
So, if we change the vanilla quote finder to this:

```lua
local function vanillaFindClosingQuote(i, inputLength, inputString, quote, doubleQuoteEscape)
  local j, difference
  i, j = inputString:find('"+', i)
  if j == nil then return end
  if i == nil then
    return inputLength-1, doubleQuoteEscape
  end
  difference = j - i
  -- print("difference", difference, "I", i, "J", j)
  if difference >= 1 then doubleQuoteEscape = true end
  -- j - i is the run length minus one, so an odd difference means an
  -- even-length run made up only of escaped pairs: keep searching
  if difference % 2 == 1 then
    return vanillaFindClosingQuote(j+1, inputLength, inputString, quote, doubleQuoteEscape)
  end
  return j-1, doubleQuoteEscape
end
```

everything works as expected. It handles the use cases for any number of escaped double quotes and all the unit tests pass. I tried doing a with two

I can roll these changes into 1.2.0 (in the parseLineIterator branch) and try to get it out in the next few days.
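To illustrate why the parity check matters, here is a minimal sketch (not the library's code) that lists each run of quotes in the two test strings. `quoteRuns` is a hypothetical helper introduced only for this example. Because `find` returns inclusive indices, `j - i` is the run length minus one, so an even difference means an odd-length run whose last quote closes the field.

```lua
-- Hypothetical helper for illustration only: lists every run of quotes
-- found after the opening quote and classifies it by the parity rule.
local function quoteRuns(field)
  local runs = {}
  local pos = 2 -- skip the opening quote at index 1
  while true do
    local i, j = field:find('"+', pos)
    if i == nil then break end
    local difference = j - i -- run length minus one
    runs[#runs + 1] = {
      first = i,
      last = j,
      -- even difference = odd-length run: its last quote closes the field
      closes = difference % 2 == 0,
    }
    pos = j + 1
  end
  return runs
end

for _, field in ipairs({ '"A""B""""C"', '"A""""B""C"' }) do
  for _, run in ipairs(quoteRuns(field)) do
    print(field, run.first, run.last, run.closes and "closing" or "escaped pairs only")
  end
end
```

In both strings the only closing run is the single quote at index 11, which is why the finder should return 10. The four-quote run in each string has a difference of 3, which the earlier `difference == 1` check treated as the end of the field.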
Great! I will try this next week, thanks for your quick support.
I've now got the changes in the parseLineIterator branch if you want to give it a try!
I tried updating only the

I haven't tried the parseLineIterator branch since I made some local changes in my file, which was taken from the last release. Would it be possible to merge parseLineIterator into that? What else is new compared to the last release (1.1.5)? Unless there's anything major, I might as well stick with this and just patch it. Thanks!
It's the addition of fixed-buffer reading, so you don't have to load the entire file into memory first, a lot of refactoring, and some speed improvements. Regardless, thanks for letting me know that it seems to be working for you!
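As a rough sketch of what fixed-buffer reading means in general (this is not the library's implementation, and `linesFromChunks` is a hypothetical name), the iterator below reads a file in chunks of `bufferSize` bytes and yields complete lines, keeping only the current buffer and any unfinished line in memory. It assumes `\n` line endings and does not handle newlines inside quoted fields.

```lua
-- Hypothetical helper: iterate over the lines of a file while holding at
-- most one buffer's worth of new data (plus an unfinished line) in memory.
local function linesFromChunks(path, bufferSize)
  local file = assert(io.open(path, "rb"))
  local pending = ""     -- data read so far that has not been returned yet
  local finished = false -- true once the end of the file has been reached
  return function()
    while true do
      local newline = pending:find("\n", 1, true)
      if newline then
        local line = pending:sub(1, newline - 1)
        pending = pending:sub(newline + 1)
        return line
      end
      if finished then
        if pending ~= "" then
          -- last line without a trailing newline
          local line = pending
          pending = ""
          return line
        end
        return nil
      end
      local chunk = file:read(bufferSize)
      if chunk then
        pending = pending .. chunk
      else
        finished = true
        file:close()
      end
    end
  end
end
```

It can be used like `for line in linesFromChunks("data.csv", 4096) do ... end`, where the file name and buffer size are placeholders.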
## Features
* Can now parse files line by line in a fixed-size reading mode
* Now has an option to ignore quotes when parsing

## Improvements
* Speed increases in vanilla Lua and LuaJIT (benchmarks updated!)
* Refactored code for easier maintenance

## Bugfixes
* Better handling of multiple escaped quotes in vanilla Lua (thanks @fredrikj83 #25)
I tried the following CSV:
```
"A""B""""C";"A""""B""C";Test
```
But it seems to fail to parse the second field (which should result in `A""B"C` but ends up with `C`). I haven't been able to fully understand why, but it seems to be because of the first condition in the while loop. Any ideas?
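The expected values can be checked by hand: strip the outer quotes, then collapse each doubled quote into a single one. A minimal sketch, where `unescapeField` is a hypothetical helper written for this example and not part of the library:

```lua
-- Hypothetical helper for illustration: strips the outer quotes of a
-- quoted CSV field and collapses each "" pair into a single ".
local function unescapeField(raw)
  local inner = raw:sub(2, -2)
  -- parentheses drop gsub's second return value (the replacement count)
  return (inner:gsub('""', '"'))
end

print(unescapeField('"A""B""""C"')) --> A"B""C
print(unescapeField('"A""""B""C"')) --> A""B"C
```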