fread crashes on files with mixed Windows and Unix line endings #1183

Closed
dhicks opened this issue Jun 18, 2015 · 5 comments

dhicks commented Jun 18, 2015

I'm brand new to fread and data.table. I'm trying out fread as (hopefully) a faster alternative to read.csv for two large sets of data from the US Department of Education (about 320 and 270 MB each). My dataset can be downloaded as a zip file here: http://nces.ed.gov/ipeds/deltacostproject/download/IPEDS_Analytics_DCP_87_12_CSV.zip (110 MB). The zip file contains two csv files. For this MRE, I'm working with delta_public_87_99.csv.

Given the csv in the working directory, this MRE reliably causes R to crash on my machine:

library(data.table)

sessionInfo()

ipeds1 <- 'delta_public_87_99.csv'

colclasses <- c(
    rep('numeric', 5), 
    rep('character', 5), 
    rep('numeric', 964))

#thing <- read.csv(ipeds1, colClasses = colclasses)
thing <- fread(ipeds1, colClasses = colclasses, verbose=TRUE)

Here's the output from fread:

# Input contains no \n. Taking this to be a filename to open
# File opened, filesize is 0.294936 GB.
# Memory mapping ... ok
# Detected eol as \r\n (CRLF) in that order, the Windows standard.
# Positioned on line 1 after skip or autostart
# This line is the autostart and not blank so searching up for the last non-blank ... line 1
# Detecting sep ... ','
# Detected 124595570 columns. Longest stretch was from line 2 to line 2
# Starting data input on line 2 (either column names or first row of data). First 10 characters: -434973,19
# Some fields on line 2 are not type character (or are empty). Treating as a data row and using default column names.

At this point, memory use starts to grow dramatically. Around 6-8 GB the R session is aborted: "R encountered a fatal error. The session was terminated."

Output from sessionInfo():

# R version 3.2.0 (2015-04-16)
# Platform: x86_64-apple-darwin13.4.0 (64-bit)
# Running under: OS X 10.10.3 (Yosemite)
# 
# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
# 
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
# [1] data.table_1.9.5
# 
# loaded via a namespace (and not attached):
# [1] tools_3.2.0  chron_2.3-45

Looking for similar problems, I found issue #1035, "fread fails if whitespace before first character." However, checking with readLines, there doesn't appear to be any leading whitespace in my data file.

Since I'm new to fread and data.table, I'm not sure whether I'm missing something basic, so for now I am filing this as a [Support] request rather than a bug report.

@mattdowle mattdowle added this to the v1.9.6 milestone Jun 18, 2015
@mattdowle
Member

Thanks for the great reproducible report, for trying v1.9.5, and for including the version information. Marked high priority.

@mattdowle
Member

Reproduced. The first line contains the column names and ends with \r\n, the Windows standard. However, the rest of the lines end with \n, the Unix standard. Here are the relevant lines from the output:

# Detected eol as \r\n (CRLF) in that order, the Windows standard.
# Detected 124595570 columns. Longest stretch was from line 2 to line 2

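So fread thinks this is a two-line file: the header, plus a single data line holding the rest of the file, which is why it reports 124595570 columns. Notice the single ^M at the very end of the header line in the following output: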

$ cat -v delta_public_87_99.csv | head -n 3
"groupid","academicyear","unitid_linchpin","unitid","isgrouped","instname","TCSName","city","state","zip","ansi_code","sector","sector_revised","iclevel","control","oberegion","census_division","census_region","region_compact","carnegie2000","carnegiegrp_2000","carnegie_sector_2000","carnegie2005","carnegiegrp_2005","carnegie_sector_2005","carnegie2010","carnegiegrp_2010","carnegie_sector_2010","flagship","landgrnt","hbcu","hsi","medical","hospital","cpi_index","cpi_scalar_2012","hepi_index","hepi_scalar_2012","heca_index","heca_scalar_2012","has_instruction","has_fte","has_completions","has_all","matched_n_87_12_26","matched_n_02_12_11","matched_n_07_12_6","fte_count","fte12mn","tuition01","tuition02","tuition03","nettuition01","net_student_tuition","federal03","state03","local03","state_local_app","federal07","federal07_net_pell","state06","local06","state_local_grant_contract","federal10","federal10_net_pell","state09","fed_state_loc_grants_con","private03","affiliate01","investment01","endowment03","priv_invest_endow","edactivity03","auxiliary03","hospital03","other03","other04","independent03","other05","auxother_rev","stable_operating_rev","total03_revenue","tot_rev_wo_auxother_sum","tot_rev_w_auxother_sum","unrestricted_revenue","restricted_revenue","tuition_reliance_a1","tuition_reliance_b1","tuition_reliance_c1","tuition_reliance_a2","tuition_reliance_b2","tuition_reliance_c2","govt_reliance_a","govt_reliance_b","govt_reliance_c","appliedaid01","appliedaid02","grant01","grant02","grant03","grant04","grant05","grant06","grant07","institutional_grant_aid","institutional_grant_aid_share","tuition_discount","any_aid_num","any_aid_pct","fed_grant_num","fed_grant_pct","fed_grant_avg_amount","state_grant_num","state_grant_pct","state_grant_avg_amount","inst_grant_num","inst_grant_pct","inst_grant_avg_amount","loan_num","loan_pct","loan_avg_amount","tuition01_tf","fee01_tf","tuitionfee01_tf","tuition02_tf","fee02_tf","tuitionfee02_tf","tuition03_tf","fee03_tf","tui
tionfee03_tf","tuition05_tf","fee05_tf","tuitionfee05_tf","tuition06_tf","fee06_tf","tuitionfee06_tf","tuition07_tf","fee07_tf","tuitionfee07_tf","instruction01","instruction01_fasb","instruction02","research01","research01_fasb","research02","pubserv01","pubserv01_fasb","pubserv02","acadsupp01","acadsupp01_fasb","acadsupp02","studserv01","studserv01_fasb","studserv02","instsupp01","instsupp01_fasb","instsupp02","opermain01","opermain01_fasb","opermain02","depreciation01","grants01u","grants01r","grants01","grants01_fasb","auxiliary01","auxiliary01_fasb","auxiliary02","hospital01","hospital01_fasb","hospital02","independ01","independ01_fasb","independ02","otheroper01","otheroper02","totaloper01","totaloper02","totaloper03","totaloper04","totaloper05","interest01","othernon01","othernon02","other01","other01_fasb","other02","totalnon01","totalnon02","total01","total02","total03_expenses","total04","total05","total06","total07","eandg01","eandg01_sum","eandg01_w_auxother_sum","eandg02","eandg03","eandg03a","eandg03b","eandg04","eandg05","eandg06","eandg07","eandg08","rschpub01","acadinststud01","acadinstsupp01","education_share","noneducation_share","other_ed_related_cost","instruction_share","studserv_share","admin_share","eandr","eandr_degree","eandr_completion","research_share","research_related_cost","pubserv_share","pubserv_related_cost","research_pubserv_grants","auxother_cost","sticker_subsidy","average_subsidy","sticker_price_share","nettuition_share","average_subsidy_share","gross_auxiliary_margin","gross_auxiliary_margin_percent","gross_operating_margin","fringe_benefit_play","fringe_benefit_play_imp","instr_sal_as_pct_instrtot","labor_share_of_instructcost","research_sal_as_pct_restot","labor_share_of_rescost","acadsupp_sal_as_pct_acadsupptot","labor_share_of_acadsuppcost","studserv_sal_as_pct_studservtot","labor_share_of_studservcost","instsupp_sal_as_pct_instsupptot","labor_share_of_instsuppcost","pubserv_sal_as_pct_pubservtot","labor_share_of_pubservcost
","assets06","liabilities07","assets11","land04","buildings05","equipment05","assets15","endow02m","assets16","associatedegrees","bachelordegrees","masterdegrees","doctordegrees","firstprofdegrees","awardslessthan1yr","awards1yrto2yr","awards2yrto4yr","postbacccertificates","postmastcertificates","firstprofcertificates","postmastFPcert","totaldegrees","totaldegrees_100fte","totalawards","totalcertificates","certificates_awards_100fte","totalcompletions","totalcompletions_100fte","assoc_deg_share_of_tot_deg","bach_deg_share_of_tot_deg","grad_deg_share_of_tot_deg","doc_deg_share_of_tot_deg","prof_deg_share_of_tot_deg","grad_rate_150_n","grad_rate_150_p","grad_rate_adj_cohort_n","grad_rate_150_n4yr","grad_rate_150_p4yr","grad_rate_adj_cohort_n4yr","grad_rate_150_n2yr","grad_rate_150_p2yr","grad_rate_adj_cohort_n2yr","ugentering","grscohort","pt_ugentering","grscohortpct","ftretention_rate","ptretention_rate","fall_cohort_num","fall_cohort_pct","fall_cohort_num_indistrict","fall_cohort_pct_indistrict","fall_cohort_num_instate","fall_cohort_pct_instate","fall_cohort_num_outofstate","fall_cohort_pct_outofstate","fall_cohort_num_resunknown","fall_cohort_pct_resunknown","fall_total_undergrad","year_cohort_num","year_cohort_pct","year_total_undergrad","ft_first_time_first_yr_deg_seek","other_full_time","total_full_time_undergraduates","returning_to_total_undergraduate","total_full_time_first_prof","total_full_time_graduates","total_full_time_postbacc","total_full_time","pt_first_time_first_yr_deg_seek","other_part_time","total_part_time_undergraduates","total_part_time_first_prof","total_part_time_graduates","total_part_time_postbacc","total_part_time","total_undergraduates","total_graduates","total_first_prof","total_postbacc","total_enrollment","total_enrollment_amin_tot","total_enrollment_asian_tot","total_enrollment_black_tot","total_enrollment_hisp_tot","total_enrollment_white_tot","total_enrollment_multi_tot","total_enrollment_unkn_tot","total_enrollment_nonres_tot","f
tug_share_of_total_ft_enrl","ptug_share_of_total_pt_enrl","ftall03ug","ftall04ug","ftall05ug","ftall06ug","ftall08ug","ftall09ug","ftall10ug","ftall11ug","ftall12ug","ftall13ug","ftall14ug","ftall03pr","ftall04pr","ftall05pr","ftall06pr","ftall08pr","ftall09pr","ftall10pr","ftall11pr","ftall12pr","ftall13pr","ftall14pr","ftall03gr","ftall04gr","ftall05gr","ftall06gr","ftall08gr","ftall09gr","ftall10gr","ftall11gr","ftall12gr","ftall13gr","ftall14gr","ftall03pb","ftall04pb","ftall05pb","ftall06pb","ftall08pb","ftall09pb","ftall10pb","ftall11pb","ftall12pb","ftall13pb","ftall14pb","ftall03","ftall04","ftall05","ftall06","ftall08","ftall09","ftall10","ftall11","ftall12","ftall13","ftall14","ptall03ug","ptall04ug","ptall05ug","ptall06ug","ptall08ug","ptall09ug","ptall10ug","ptall11ug","ptall12ug","ptall13ug","ptall14ug","ptall03pr","ptall04pr","ptall05pr","ptall06pr","ptall08pr","ptall09pr","ptall10pr","ptall11pr","ptall12pr","ptall13pr","ptall14pr","ptall03gr","ptall04gr","ptall05gr","ptall06gr","ptall08gr","ptall09gr","ptall10gr","ptall11gr","ptall12gr","ptall13gr","ptall14gr","ptall03pb","ptall04pb","ptall05pb","ptall06pb","ptall08pb","ptall09pb","ptall10pb","ptall11pb","ptall12pb","ptall13pb","ptall14pb","ptall03","ptall04","ptall05","ptall06","ptall08","ptall09","ptall10","ptall11","ptall12","ptall13","ptall14","ftallgrp1ug","ftallgrp2ug","ftallgrp3ug","ftallgrp4ug","ptallgrp1ug","ptallgrp2ug","ptallgrp3ug","ptallgrp4ug","ftallgrp1pr","ftallgrp2pr","ftallgrp3pr","ftallgrp4pr","ptallgrp1pr","ptallgrp2pr","ptallgrp3pr","ptallgrp4pr","ftallgrp1gr","ftallgrp2gr","ftallgrp3gr","ftallgrp4gr","ptallgrp1gr","ptallgrp2gr","ptallgrp3gr","ptallgrp4gr","ftallgrp1pb","ftallgrp2pb","ftallgrp3pb","ftallgrp4pb","ptallgrp1pb","ptallgrp2pb","ptallgrp3pb","ptallgrp4pb","ftallgrp1","ftallgrp2","ftallgrp3","ftallgrp4","ptallgrp1","ptallgrp2","ptallgrp3","ptallgrp4","dependent1","dependent2","dependent3","dependent4","dependent5","fisap_dependent_total","independent1","independent2","in
dependent3","independent4","independent5","fisap_independent_total","fisap_0_14999k_share","fisap_15_29999k_share","applcn","applcnm","applcnw","admssn","admssnm","admssnw","enrlt","enrlm","enrlw","applicantcount","admitcount","enrollftcount","enrollptcount","actnum","actpct","actcm25","actcm75","acten25","acten75","actmt25","actmt75","satnum","satpct","satmt25","satmt75","satvr25","satvr75","conthoursug","credhoursgr","credhoursug","instacttype","ftall1","ftall3","ftall4","ftall5","ftall6","ftall7","ftall8","ptall1","ptall2","ptall3","ptall4","ptall5","ptall6","ptall7","ptall8","ft_faculty_per_100fte","pt_faculty_per_100fte","total_executive_admin_managerial","ft_executive_per_100fte","pt_executive_per_100fte","total_other_professionals","ft_other_professional_per_100fte","pt_other_professional_per_100fte","total_technical_and_paraprof","ft_technical_per_100fte","pt_technical_per_100fte","total_clerical_secretarial","ft_clerical_per_100fte","pt_clerical_per_100fte","total_skilled_craft","ft_skilled_per_100fte","pt_skilled_per_100fte","total_service_maintenance","ft_service_per_100fte","pt_service_per_100fte","ft_exec_admin_man_share","ft_other_professional_share","ft_technical_paraprof_share","ft_clerical_secretarial_share","ft_skilled_craft_share","ft_service_maintenance_share","total_faculty_all","full_time_employees","full_time_employee_share","all_employees","ft_faculty_salary","full_time_employee_100fte","full_time_faculty_share","faculty_instr_headcount","salarytotal","Ifte_count","ifte12mn","Ituition01","Ituition02","Ituition03","Inettuition01","Ifederal03","Istate03","Ilocal03","Ifederal07","Istate06","Ilocal06","Ifederal10","Istate09","Iprivate03","Iinvestment01","Iaffiliate01","Iendowment03","Iedactivity03","Iauxiliary03","Ihospital03","Iother03","Iother04","Iindependent03","Iother05","Itotal03_revenue","Iappliedaid01","Iappliedaid02","Igrant01","Igrant02","Igrant03","Igrant04","Igrant05","Igrant06","Igrant07","Iany_aid_num","Iany_aid_pct","Ifed_grant_num
","Ifed_grant_pct","Ifed_grant_avg_amount","Istate_grant_num","Istate_grant_pct","Istate_grant_avg_amount","Iinst_grant_num","Iinst_grant_pct","Iinst_grant_avg_amount","Iloan_num","Iloan_pct","Iloan_avg_amount","Ituition01_tf","Ifee01_tf","Ituitionfee01_tf","Ituition02_tf","Ifee02_tf","Ituitionfee02_tf","Ituition03_tf","Ifee03_tf","Ituitionfee03_tf","Ituition05_tf","Ifee05_tf","Ituitionfee05_tf","Ituition06_tf","Ifee06_tf","Ituitionfee06_tf","Ituition07_tf","Ifee07_tf","Ituitionfee07_tf","Iinstruction01","Iinstruction01_fasb","Iinstruction02","Iresearch01","Iresearch01_fasb","Iresearch02","Ipubserv01","Ipubserv01_fasb","Ipubserv02","Iacadsupp01","Iacadsupp01_fasb","Iacadsupp02","Istudserv01","Istudserv01_fasb","Istudserv02","Iinstsupp01","Iinstsupp01_fasb","Iinstsupp02","Iopermain01","Iopermain01_fasb","Iopermain02","Idepreciation01","Igrants01u","Igrants01r","Igrants01","Igrants01_fasb","Iauxiliary01","Iauxiliary01_fasb","Iauxiliary02","Ihospital01","Ihospital01_fasb","Ihospital02","Iindepend01","Iindepend01_fasb","Iindepend02","Iotheroper01","Iotheroper02","Itotaloper01","Itotaloper02","Itotaloper03","Itotaloper04","Itotaloper05","Iinterest01","Iothernon01","Iothernon02","Iother01","Iother01_fasb","Iother02","Itotalnon01","Itotalnon02","Itotal01","Itotal02","Itotal03_expenses","Itotal04","Itotal05","Itotal07","Ieandg01","Ieandg02","Ieandg03","Ieandg03a","Ieandg03b","Ieandg04","Ieandg05","Ieandg07","Ieandg08","Iassets06","Iliabilities07","Iassets11","Iland04","Ibuildings05","Iequipment05","Iassets15","Iendow02m","Iassets16","Iassociatedegrees","Ibachelordegrees","Imasterdegrees","Idoctordegrees","Ifirstprofdegrees","Iawardslessthan1yr","Iawards1yrto2yr","Iawards2yrto4yr","Ipostbacccertificates","Ipostmastcertificates","Ifirstprofcertificates","Ipostmastfpcert","Itotaldegrees","Itotalawards","Itotalcertificates","Itotalcompletions","Igrad_rate_150_n","Igrad_rate_150_p","Igrad_rate_adj_cohort_n","Igrad_rate_150_n4yr","Igrad_rate_150_p4yr","Igrad_rate_adj_cohort_n4yr"
,"Igrad_rate_150_n2yr","Igrad_rate_150_p2yr","Igrad_rate_adj_cohort_n2yr","Iugentering","Igrscohort","Ipt_ugentering","Igrscohortpct","Iftretention_rate","Iptretention_rate","Ifall_cohort_num","Ifall_cohort_pct","Ifall_cohort_num_indistrict","Ifall_cohort_pct_indistrict","Ifall_cohort_num_instate","Ifall_cohort_pct_instate","Ifall_cohort_num_outofstate","Ifall_cohort_pct_outofstate","Ifall_cohort_num_resunknown","Ifall_cohort_pct_resunknown","Ifall_total_undergrad","Iyear_cohort_num","Iyear_cohort_pct","Iyear_total_undergrad","Ift_first_time_first_yr_deg_seek","Iother_full_time","Itotal_full_time_undergraduates","Itotal_full_time_first_prof","Itotal_full_time_graduates","Itotal_full_time_postbacc","Itotal_full_time","Ipt_first_time_first_yr_deg_seek","Iother_part_time","Itotal_part_time_undergraduates","Itotal_part_time_first_prof","Itotal_part_time_graduates","Itotal_part_time_postbacc","Itotal_part_time","itotal_undergraduates","itotal_graduates","itotal_first_prof","itotal_postbacc","Itotal_enrollment","Itotal_enrollment_amin_tot","Itotal_enrollment_asian_tot","Itotal_enrollment_black_tot","Itotal_enrollment_hisp_tot","Itotal_enrollment_white_tot","Itotal_enrollment_unkn_tot","Itotal_enrollment_nonres_tot","Iftall03ug","Iftall04ug","Iftall05ug","Iftall06ug","Iftall08ug","Iftall09ug","Iftall10ug","Iftall11ug","Iftall12ug","Iftall13ug","Iftall14ug","Iftall03pr","Iftall04pr","Iftall05pr","Iftall06pr","Iftall08pr","Iftall09pr","Iftall10pr","Iftall11pr","Iftall12pr","Iftall13pr","Iftall14pr","Iftall03gr","Iftall04gr","Iftall05gr","Iftall06gr","Iftall08gr","Iftall09gr","Iftall10gr","Iftall11gr","Iftall12gr","Iftall13gr","Iftall14gr","Iftall03","Iftall04","Iftall05","Iftall06","Iftall08","Iftall09","Iftall10","Iftall11","Iftall12","Iftall13","Iftall14","Iptall03ug","Iptall04ug","Iptall05ug","Iptall06ug","Iptall08ug","Iptall09ug","Iptall10ug","Iptall11ug","Iptall12ug","Iptall13ug","Iptall14ug","Iptall03pr","Iptall04pr","Iptall05pr","Iptall06pr","Iptall08pr","Iptall09pr",
"Iptall10pr","Iptall11pr","Iptall12pr","Iptall13pr","Iptall14pr","Iptall03gr","Iptall04gr","Iptall05gr","Iptall06gr","Iptall08gr","Iptall09gr","Iptall10gr","Iptall11gr","Iptall12gr","Iptall13gr","Iptall14gr","Iptall03","Iptall04","Iptall05","Iptall06","Iptall08","Iptall09","Iptall10","Iptall11","Iptall12","Iptall13","Iptall14","Iftallgrp1ug","Iftallgrp2ug","Iftallgrp3ug","Iftallgrp4ug","Iptallgrp1ug","Iptallgrp2ug","Iptallgrp3ug","Iptallgrp4ug","Iftallgrp1pr","Iftallgrp2pr","Iftallgrp3pr","Iftallgrp4pr","Iptallgrp1pr","Iptallgrp2pr","Iptallgrp3pr","Iptallgrp4pr","Iftallgrp1gr","Iftallgrp2gr","Iftallgrp3gr","Iftallgrp4gr","Iptallgrp1gr","Iptallgrp2gr","Iptallgrp3gr","Iptallgrp4gr","Iftallgrp1","Iftallgrp2","Iftallgrp3","Iftallgrp4","Iptallgrp1","Iptallgrp2","Iptallgrp3","Iptallgrp4","Idependent1","Idependent2","Idependent3","Idependent4","Idependent5","Iindependent1","Iindependent2","Iindependent3","Iindependent4","Iindependent5","Iapplcn","Iapplcnm","Iapplcnw","Iadmssn","Iadmssnm","Iadmssnw","Ienrlt","Ienrlm","Ienrlw","Iapplicantcount","Iadmitcount","Ienrollftcount","Ienrollptcount","Iactnum","Iactpct","Iactcm25","Iactcm75","Iacten25","Iacten75","Iactmt25","Iactmt75","Isatnum","Isatpct","Isatmt25","Isatmt75","Isatvr25","Isatvr75","Icredhoursug","Iconthoursug","Icredhoursgr","Iftall1","Iftall3","Iftall4","Iftall5","Iftall6","Iftall7","Iftall8","Iptall1","Iptall2","Iptall3","Iptall4","Iptall5","Iptall6","Iptall7","Iptall8","Ifaculty_instr_headcount","Isalarytotal"^M
-434973,1999,434973,434973,0,"University of Phoenix-Maryland Campus",,"Columbia","MD","21045-542",24,3,3,1,3,2,5,3,1,-3,,,29,5,,29,5,,0,2,2,0,-1,-1,164.5,0.722870388680157,189.1,0.644952251023192,68.89797744,0.6889797744,1,0,0,1,0,0,0,,,,,143115,143115,143115,,,,,,,,,,,,,,,,,,,1346,,,26554,,,26554,27900,171015,171015,143115,171015,143115,27900,1,1,1,0.990682606378192,0.990682606378192,0.990682606378192,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,6360,,,6360,,,6360,,,7440,,,7440,,,7440,106931,,,,,,,,,73726,,,8513,,,622727,,,,,,9443,,,,,,,,,,,,,,,,,,,,,,,,,,,,,811897,251739,51984,9443,498731,,,811897,811897,811897,,,,,,,,,,,704966,696453,1,,696453,0.131705130084235,0.0104853201822399,0.857809549733525,811897,,,,,,,,,668782,668782,0.176272359671239,0.176272359671239,0.823727640328761,,,-640882,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,120,2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,-1,-2,-1,-1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,-1,-1,0,-2,-1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,-1,-1,0,-1,-1,0,-1,-1,0,-1,-1,0,-1,-1,0,0,-1,-1,-1,-1,-1,-1,-1,-1,0,-1,-1,0,-1,-1,0,-1,-1,-1,-1,-1,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0,-1,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-2,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-2,-1,-1,-1,-1,-1,-1,-2,-1,-2,-2,-2,-2,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,-1,-1,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-1,-1
-434937,1999,434937,434937,0,"Yeshiva College of the Nations Capital",,"Silver  Spring","MD","20902",24,2,2,1,2,2,5,3,1,-3,,,24,5,10,24,5,10,0,2,2,0,-1,-1,164.5,0.722870388680157,189.1,0.644952251023192,68.89797744,0.6889797744,0,0,1,1,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,4300,,,4300,,,4300,,,4300,,,4300,,,4300,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,24,0,0,0,0,0,0,0,0,0,0,24,,0,0,,24,,0,1,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,90,888,2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,-1,-2,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-2,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,-1,-1,0,-1,-1,0,-1,-1,0,-1,-1,0,-1,-1,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,0,0,0,0,0,0,0,0,0,0,-2,0,0,0,0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-2,-1,-1,-1,-1,-1,-1,-2,-1,-2,-2,-2,-2,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,-1,0,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-2,-1,-1

I can't think of a quick fix for this in fread. Leaving open and postponing to v1.9.8.
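
A quick way to check whether a file has this kind of mixed line endings before handing it to fread is to count CRLF-terminated lines against LF-only lines. The following is a minimal sketch in POSIX sh, not part of fread or dos2unix. The helper name count_eols and the tiny sample file are made up for illustration, and standard grep, wc, and tr are assumed to be available.

```shell
# count_eols FILE: print how many lines end in \r\n and how many in a bare \n.
# (count_eols is a hypothetical helper name used only for this illustration.)
count_eols() {
  cr=$(printf '\r')                                  # a literal carriage return
  total=$(LC_ALL=C wc -l < "$1" | tr -d ' ')         # number of \n bytes
  crlf=$(LC_ALL=C grep -c "${cr}\$" "$1" || true)    # lines whose last byte is \r
  echo "crlf=$crlf lf_only=$((total - crlf))"
}

# Small stand-in for the real file: CRLF header followed by LF data rows.
sample=$(mktemp)
printf 'a,b\r\n1,2\n3,4\n' > "$sample"
count_eols "$sample"    # prints: crlf=1 lf_only=2
rm -f "$sample"
```

If both counts are non-zero, the file mixes the two conventions, which is the situation described above.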

In the meantime:

$ sudo apt-get install dos2unix
$ dos2unix delta_public_87_99.csv  # unusually needed as file has *mixed* line endings!
$ R
> require(data.table)
> fread("delta_public_87_99.csv", verbose=TRUE)

Works fine now, although the bumping messages in the verbose output are a separate matter to tidy up.

@mattdowle mattdowle changed the title from "[Support] fread causes R session crash" to "fread crashes on files with mixed Windows and Unix line endings" Jun 19, 2015
@mattdowle mattdowle modified the milestones: v1.9.8, v1.9.6 Jun 19, 2015
@mattdowle mattdowle added Medium and removed High labels Jun 19, 2015
@dhicks
Author

dhicks commented Jun 20, 2015

A subtle problem — thanks for the help!

The notes below describe how I finished solving my problem. I'm including them for anyone who stumbles across this thread while dealing with a similar problem.

apt-get isn't available for Mac OS (as far as I can tell). A command-line alternative to dos2unix goes something like this:

tr -d '\r' <delta_public_87_99.csv >87_99.csv

According to this Stack Overflow answer, "you can only do this safely if CR appears in your file only as the first byte of a CRLF byte pair." This is the case with my two files.
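
One way to test that condition before deleting anything is to compare the total number of CR bytes with the number of lines that end in CR. This is only a sketch in POSIX sh: the helper name cr_only_in_crlf and the sample data are hypothetical, and a CR that is the very last byte of a file with no final newline is not distinguished from a line-ending CR.

```shell
# cr_only_in_crlf FILE: succeed if every \r in FILE sits directly before a \n.
# (cr_only_in_crlf is a hypothetical helper name used only for this illustration.)
cr_only_in_crlf() {
  cr=$(printf '\r')
  # Total number of CR bytes anywhere in the file.
  total_cr=$(LC_ALL=C tr -cd '\r' < "$1" | wc -c | tr -d ' ')
  # Number of lines whose last byte before the newline is CR.
  crlf=$(LC_ALL=C grep -c "${cr}\$" "$1" || true)
  [ "$total_cr" -eq "$crlf" ]
}

sample=$(mktemp)
printf 'a,b\r\n1,2\n' > "$sample"          # CR appears only inside a CRLF pair
if cr_only_in_crlf "$sample"; then echo "safe to delete every CR"; fi
printf 'a,"x\ry"\r\n1,2\n' > "$sample"     # one CR embedded inside a field
if ! cr_only_in_crlf "$sample"; then echo "stray CR present"; fi
rm -f "$sample"
```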

However, tr fails with an "illegal byte sequence" error on the other data file. According to another Stack Overflow answer, this appears to be caused by an encoding issue. Workaround:

LC_ALL=C tr -d '\r' <delta_public_00_12.csv >00_12.csv
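
If a file might also contain CR characters inside fields, a more conservative variant removes only the CR that sits immediately before a line break. This is a sketch under stated assumptions rather than a tested recipe for these particular files: the helper name strip_eol_cr and the sample data are hypothetical, and the literal CR is built with printf because not every sed understands \r in a pattern.

```shell
# strip_eol_cr FILE: write FILE to stdout with a trailing \r removed from each line.
# (strip_eol_cr is a hypothetical helper name used only for this illustration.)
strip_eol_cr() {
  cr=$(printf '\r')
  # LC_ALL=C makes sed treat the input as raw bytes, avoiding encoding errors.
  LC_ALL=C sed "s/${cr}\$//" "$1"
}

sample=$(mktemp)
printf 'a,b\r\n1,"x\ry"\n' > "$sample"     # CRLF header, one CR inside a field
# Count the CR bytes left after stripping: only the embedded one remains.
strip_eol_cr "$sample" | LC_ALL=C tr -cd '\r' | wc -c
rm -f "$sample"
```

Unlike tr -d '\r', this leaves any CR in the middle of a line untouched, at the cost of being a little slower on large files.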

Finally, using rbind to combine the two datasets required defining column classes in advance (using colclasses as in my first comment).

@dhicks dhicks closed this as completed Jun 20, 2015
@dhicks dhicks reopened this Jun 20, 2015
@arunsrinivasan
Member

> apt-get isn't available for Mac OS (as far as I can tell).

You can use Homebrew: install dos2unix as shown in the dos2unix formula.

@st-pasha
Contributor

Closed in e79d63b
