Trying to parse a 1.7GB text file throws ArgumentOutOfRangeException #48

Open
atlemann opened this issue Oct 31, 2019 · 3 comments

@atlemann

Is there a limit on how large a file FParsec supports? From the code I can see that it reads the file in chunks, but I cannot find which StringBuilder.Append call is failing.

System.ArgumentOutOfRangeException: The length cannot be greater than the capacity. (Parameter 'valueCount')
   at System.Text.StringBuilder.Append(Char* value, Int32 valueCount)
   at System.Text.StringBuilder.Append(Char[] value, Int32 startIndex, Int32 charCount)
   at FParsec.CharStream.StreamConstructorContinue(Stream stream, Boolean leaveOpen, Encoding encoding, Boolean detectEncodingFromByteOrderMarks, Int32 byteBufferLength)
   at FParsec.CharStream..ctor(String path, Encoding encoding, Boolean detectEncodingFromByteOrderMarks, Int32 byteBufferLength)
   at FParsec.CharStream..ctor(String path, Encoding encoding)
   at FParsec.CharStream`1..ctor(String path, Encoding encoding)
   at FParsec.CharParsers.runParserOnFile[a,u](FSharpFunc`2 parser, u ustate, String path, Encoding encoding)

File.ReadAllText on the same file throws System.OutOfMemoryException ("Insufficient memory to continue the execution of the program."), so I have to parse it in chunks.
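
For reference, a minimal sketch of what parsing in chunks could look like. It assumes the file is line-oriented, so each line can be parsed on its own. The record format (comma-separated integers) and the names `record` and `parseFileByLine` are hypothetical and only serve as illustration; the actual grammar in this issue is not known.

```fsharp
open System.IO
open FParsec

// Hypothetical record format: comma-separated integers on one line.
let record : Parser<int list, unit> =
    sepBy1 (pint32 .>> spaces) (pchar ',' >>. spaces) .>> eof

// Parse one line at a time. File.ReadLines enumerates the file lazily,
// so only the current line needs to be held in memory as a string.
let parseFileByLine (path: string) : seq<Result<int list, string>> =
    File.ReadLines path
    |> Seq.mapi (fun i line ->
        match run record line with
        | Success (value, _, _) -> Result.Ok value
        | Failure (message, _, _) ->
            Result.Error (sprintf "line %d: %s" (i + 1) message))
```

This only helps when no construct of the grammar spans more than one chunk. A format whose records cross line boundaries would need a different way of splitting the input.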

@stephan-tolksdorf
Owner

The version of FParsec that is shipped in the FParsec NuGet package can't parse arbitrarily long streams; see http://www.quanttec.com/fparsec/download-and-installation.html#nuget-packages
The FParsec-Big-Data-Edition package can, but unfortunately it hasn't been ported to .NET Core yet.

@atlemann
Author

Ok, thanks! Will it require a lot of code changes to make it target netstandard2.0, or is it more or less a matter of updating the project files? I could probably contribute that, although it seems Enrico may have done that work already?

@stephan-tolksdorf
Owner

AFAIK, the biggest issue is that the encoding decoders in .NET Core are not serializable, which breaks the non-low-trust implementation of CharStream.
