fread crashes on files with mixed Windows and Unix line endings #1183
Comments
Thanks for the great reproducible report, for trying v1.9.5, and for the version information. Marked high priority.
Reproduced. The first line contains the column names and ends with \r\n, the Windows standard. However, the rest of the lines end with the Unix standard, \n. Here are the relevant lines from the output:
So fread thinks the file is a two-line file. Notice the single ^M if you scroll the following window all the way to the right:
I can't think of a quick fix for this in fread. Leaving open and postponing to v1.9.8. In the meantime:
Works fine now, although the bumping messages are a separate item to tidy up.
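For readers who want to check whether a file has the same condition described above (a header line ending in \r\n while the remaining lines end in \n), here is a minimal sketch. It is not from the original thread and is written in Python purely for illustration; the function names are hypothetical.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, bare LF, and bare CR terminators in raw bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,  # LF not preceded by CR
        "CR": data.count(b"\r") - crlf,  # CR not followed by LF
    }


def has_mixed_line_endings(data: bytes) -> bool:
    """True when more than one kind of line terminator is present."""
    counts = count_line_endings(data)
    return sum(1 for n in counts.values() if n > 0) > 1


# Tiny in-memory stand-in for a CSV whose header ends in CRLF
# while the data rows end in LF, as in the report.
sample = b"id,name\r\n1,a\n2,b\n"
print(count_line_endings(sample))
print(has_mixed_line_endings(sample))
```

To run this on a real file, read it in binary mode (for example `open(path, "rb").read()`) so that Python does not translate the line endings before they are counted.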
A subtle problem; thanks for the help! Everything below is notes on how I finished solving my problem, included here for anyone who stumbles across this thread while dealing with a similar problem. apt-get isn't available for Mac OS (as far as I can tell). A command-line alternative to
Following this Stack Overflow answer, "you can only do this safely if CR appears in your file only as the first byte of a CRLF byte pair." This is the case with my two files. However,
Finally, using
You can use Homebrew: install it as shown in the Dos2unix formula.
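The safety condition quoted earlier from the Stack Overflow answer (CR may appear only as the first byte of a CRLF pair) can be checked before converting. The sketch below is an illustration in Python, not the command used in this thread and not a replacement for dos2unix; the function name is hypothetical.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Convert CRLF to LF, refusing if any CR is not part of a CRLF pair."""
    # Every CR must be immediately followed by LF; otherwise removing or
    # rewriting CR bytes could silently change the data.
    if data.count(b"\r") != data.count(b"\r\n"):
        raise ValueError("file contains CR bytes that are not part of a CRLF pair")
    return data.replace(b"\r\n", b"\n")


# Header ends in CRLF, data row ends in LF (the mixed case from the report).
mixed = b"a,b\r\n1,2\n"
print(crlf_to_lf(mixed))
```

Working on bytes rather than decoded text avoids any dependence on the file's character encoding. For the file sizes mentioned in this thread (a few hundred MB) reading the whole file into memory is workable, but a streaming version would be needed for much larger inputs.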
Closed in e79d63b
I'm brand new to fread and data.table. I'm trying out fread as (hopefully) a faster alternative to read.csv for two large sets of data from the US Department of Education (about 320 and 270 MB each). My dataset can be downloaded as a zip file here: http://nces.ed.gov/ipeds/deltacostproject/download/IPEDS_Analytics_DCP_87_12_CSV.zip (110 MB). The zip file contains two csv files. For this MRE, I'm working with delta_public_87_99.csv.
Given the csv in the working directory, this MRE reliably causes R to crash on my machine:
Here's the output from fread:
At this point, memory use starts to grow dramatically. Around 6-8 GB the R session is aborted: "R encountered a fatal error. The session was terminated."
Output from sessionInfo():
Looking for similar problems, I found issue #1035, "fread fails if whitespace before first character." However, using readLines, it doesn't look like there is any leading whitespace in my data file.
Since I'm new to fread and data.table, I'm not sure if I might be missing something basic, so for now I am posing this as a [Support] rather than a bug report.