Intermittently fails for large csv file #102

Closed
lethaldose opened this issue Jun 1, 2015 · 6 comments

@lethaldose

Hi,

I have a CSV file with 15,000 entries (1.3 MB). Parsing fails randomly on an AWS Amazon Linux box, but on other runs it succeeds and parses every row correctly. Memory is not the problem: the instance has 4 GB. The error received is:

Parse Error: expected: '\"' got: 'undefined'

Sometimes the parser does not get the complete line, so it fails to find the matching closing quote in the parsedEscapedItem function call at parser.js:76.
The strange thing is that the issue is intermittent: the same file is read correctly on some runs.

The only option I pass to the fromStream method is {ignoreEmpty:true}; everything else is left at its default.

Any pointers would be helpful.
Thanks.
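
For anyone trying to picture the symptom: the thread does not establish a root cause, so this is only a minimal sketch of what "cannot read the complete line" looks like from the parser's side. If a parser is handed text that stops partway through a quoted field, there is no closing quote left to find. The helper name and the sample row are hypothetical, and the parity check assumes well-formed CSV where quotes only open or close a field or appear doubled as an escape.

```javascript
// Hypothetical helper (not part of fast-csv): reports whether a piece of CSV
// text ends inside a quoted field. Escaped quotes ("") come in pairs, so an
// odd number of quote characters means a quoted field is still open.
function endsInsideQuotedField(text) {
  let quotes = 0;
  for (const ch of text) {
    if (ch === '"') quotes += 1;
  }
  return quotes % 2 === 1;
}

// Hypothetical row, for illustration only.
const row = '1,"hello, ""world""",2\n';

// Simulate a stream delivering only the first 10 characters of the row,
// which cuts it inside the quoted field.
const firstChunk = row.slice(0, 10);

console.log(endsInsideQuotedField(row));        // false: all quotes are paired
console.log(endsInsideQuotedField(firstChunk)); // true: the field never closes
```

A parser that treats `firstChunk` as a finished line would reach the end of its input while still looking for the closing quote, which is consistent with an "expected quote, got undefined" style of error.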

@weagle08

weagle08 commented Jan 9, 2017

I'm having the same issue on an 8k-line CSV.

@lethaldose
Author

I'm not sure whether the latest version of the fast-csv library still has this issue. In our case, we ended up switching to another CSV parsing library at the time.

@weagle08

weagle08 commented Jan 9, 2017

Apparently it still does: I just tried it and the parse simply never returned. I went to a different library as well, and it worked like a charm.

@mrksbnch

mrksbnch commented Feb 2, 2018

We are seeing the same errors at random on Google Compute Engine (Linux) for files of around 30 MB.

@weagle08 What library did you end up using to resolve that issue?

@mrksbnch

mrksbnch commented Feb 2, 2018

In case it helps resolve the issue, we are seeing these errors on lines such as:

123456,123456,2018-02-01T16:05:16Z,7,80428,65756,Unquoted_String_With_Underscores_Hypens And_Spaces And-Numbers 1,"{""JSON"":""DATA""}"

In that case, the data-invalid function returns the following array:

[
  "123456",
  "123456",
  "2018-02-01T16:05:16Z",
  "7",
  "80428",
  "65756",
  "Unquoted_String_With_Underscores_Hypens"
]
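
For comparison, here is a rough sketch of what a complete parse of that line should give. It is a hand-written single-line splitter for illustration only, not fast-csv's parser, and `parseCsvLine` is a made-up name. It assumes a comma delimiter, double-quote quoting, and `""` as the escape for a literal quote.

```javascript
// Minimal single-line CSV splitter, for illustration only (not fast-csv).
function parseCsvLine(line) {
  const fields = [];
  let field = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i += 1) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        field += '"'; // doubled quote is an escaped literal quote
        i += 1;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch;
      }
    } else if (ch === '"' && field === '') {
      inQuotes = true; // opening quote at the start of a field
    } else if (ch === ',') {
      fields.push(field);
      field = '';
    } else {
      field += ch;
    }
  }
  if (inQuotes) throw new Error('unterminated quoted field');
  fields.push(field);
  return fields;
}

const sample =
  '123456,123456,2018-02-01T16:05:16Z,7,80428,65756,' +
  'Unquoted_String_With_Underscores_Hypens And_Spaces And-Numbers 1,' +
  '"{""JSON"":""DATA""}"';

const fields = parseCsvLine(sample);
console.log(fields.length); // 8
console.log(fields[6]);     // Unquoted_String_With_Underscores_Hypens And_Spaces And-Numbers 1
console.log(fields[7]);     // {"JSON":"DATA"}
```

So the array above is missing the rest of the seventh field (everything from the first space on) and the whole quoted JSON field.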

@doug-martin
Contributor

I just retested this with the latest version, v3.2.0, and was unable to reproduce the error.

However, I did add a test to help catch this in the future. The test I added parses a 100K-line file containing the following line:

123456,123456,2018-02-01T16:05:16Z,7,80428,65756,Unquoted_String_With_Underscores_Hypens And_Spaces And-Numbers 1,"{""JSON"":""DATA""}"
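
If anyone wants to experiment with chunk boundaries on their own, a standalone harness along these lines may be a starting point. It is not the test added to the library: every name here is hypothetical, it uses no fast-csv calls, and it assumes rows contain no embedded newlines (true for the line above). It builds 100K copies of the line in memory, feeds them through in fixed-size chunks, and holds any trailing partial row in a buffer until the rest arrives.

```javascript
// Hypothetical harness, not taken from fast-csv or its test suite.
const LINE =
  '123456,123456,2018-02-01T16:05:16Z,7,80428,65756,' +
  'Unquoted_String_With_Underscores_Hypens And_Spaces And-Numbers 1,' +
  '"{""JSON"":""DATA""}"';
const ROWS = 100000;
const input = (LINE + '\n').repeat(ROWS);

// Feed `text` through in chunks of `chunkSize` characters and count the rows
// that come out byte-for-byte intact. A row cut by a chunk boundary stays in
// `pending` and is completed by the next chunk.
function countIntactRows(text, chunkSize) {
  let pending = '';
  let intact = 0;
  for (let offset = 0; offset < text.length; offset += chunkSize) {
    pending += text.slice(offset, offset + chunkSize);
    const rows = pending.split('\n');
    pending = rows.pop(); // last piece may be an incomplete row
    for (const r of rows) {
      if (r === LINE) intact += 1;
    }
  }
  return { intact, leftover: pending };
}

// 65536 mirrors the 64 KiB default chunk size of Node's file read streams.
const result = countIntactRows(input, 65536);
console.log(result.intact, result.leftover.length);
```

Varying the chunk size (for example to a small prime) moves the cut points around, which is a cheap way to exercise rows that straddle a boundary.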

doug-martin added a commit that referenced this issue Jul 29, 2019
doug-martin added a commit that referenced this issue Jul 29, 2019
doug-martin added a commit that referenced this issue Jul 30, 2019
doug-martin added a commit that referenced this issue Jul 30, 2019