fread fails with SIGABRT when printing "Expected sep (',') but new line or EOF ends field 14 on line 33 when reading data" error #802

Closed
vlsi opened this issue Sep 9, 2014 · 2 comments

vlsi commented Sep 9, 2014

It looks like fread cannot handle long lines when printing error messages.
When the offending line is long, fread crashes with SIGABRT instead of reporting the error.

The sample data can be found in this gist: https://gist.github.com/vlsi/3b9e9e986bf952360397

The input CSV is not well formed; however, I would expect fread to pinpoint the faulty part.
Judging by comma_sequence_per_line.csv, there is an unterminated quoted field at line 9.
Ultimately I would like fread to report exactly that: "possible missing quote for the field started at line 9".
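
To illustrate the kind of check requested here, below is a minimal sketch in C. It is not fread's parser; `first_unbalanced_quote_line` is a hypothetical helper that only counts double quotes per line and reports the first line with an odd count. It assumes quoted fields never legitimately contain embedded newlines, so it is a rough heuristic rather than a CSV validator.

```c
#include <stddef.h>

/* Hypothetical heuristic, not part of data.table: returns the 1-based number
   of the first line containing an odd number of '"' characters, or 0 if every
   line is balanced. Doubled quotes ("") count as two, so they stay balanced. */
static size_t first_unbalanced_quote_line(const char *text)
{
    size_t line = 1;
    size_t quotes = 0;

    for (const char *p = text; ; ++p) {
        if (*p == '"') {
            ++quotes;
        } else if (*p == '\n' || *p == '\0') {
            if (quotes % 2 != 0) {
                return line;      /* quote opened on this line but never closed */
            }
            if (*p == '\0') {
                return 0;         /* reached the end with no suspect line */
            }
            ++line;
            quotes = 0;
        }
    }
}
```

Run against a whole file loaded into memory, a non-zero result would point at the line where a quoted field probably lost its closing quote.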

Here's the proper error message (at least it does not crash R). I shortened all the words, and that made fread work:

R version 3.1.0 (2014-04-10) -- "Spring Dance"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin13.2.0 (64-bit)

> library(data.table)
data.table 1.9.3  For help type: help("data.table")
> fread('fails_with_proper_error.csv')
Error in fread("fails_with_proper_error.csv") : 
  Expected sep (',') but new line or EOF ends field 14 on line 33 when reading data: 6,3,6,,3,2,7,W,J,5,2,6,"X  #, ,D,B,A,P,,,,,2,,.
0,8,,,,,,F,Z,6,2,1,,,,,,,,,,5,,.
8,2,,,,,,I,M,0,1,2,,,,,,,,,,5,,.
8,2,,,,,,A,W,6,8,3,,,,,,,,,,8,,#,I,N,L,C,D,K,L,Q,R,J,L,V,E,F,O,N,E,B,Q,Z,S,Y,J
8,3,3,8,2,1,3,Y,S,2,5,4,H,,K,,L,,,,,4,,.
8,7,7,,6,7,0,L,B,1,0,8,K,Q,A,L,Q,,,,,7,,.
8,8,3,7,4,2,5,M,N,3,1,6,I,K,S,L,Q,,,,,5,,.
7,7,0,,6,1,4,V,K,7,6,2,W,S,S,J,P,,,,,1,Y,.
2,3,6,5,8,7,1,Q,H,8,1,4,F,X,V,O,M,,,,,8,A,.
6,8,5,8,4,6,7,S,J,8,7,4,R,B,Y,X,I,,,,,3,Y,.
2,2,0,8,6,4,2,Q,O,6,8,2,I,N,S,M,C,,,,,3,Z,.
6,8,1,3,4,0,1,P,V,6,7,4,J,F,Q,L,E,,,,,1,K,.
6,3,7,0,3,4,7,E,B,5,4,3,D,V,N,L,O,,,,,8,"P",.

Here's the abort case:

> fread('fails_with_abort.csv')
Abort trap: 6
bash-3.2$

Here's the lldb backtrace. Sorry, I don't know how to enable debug support to make local variables visible to lldb.

(lldb) bt
* thread #1: tid = 0x40ea86, 0x00007fff82b8d866 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x00007fff82b8d866 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff8aa7935c libsystem_pthread.dylib`pthread_kill + 92
    frame #2: 0x00007fff8b71db1a libsystem_c.dylib`abort + 125
    frame #3: 0x00007fff8b71dc91 libsystem_c.dylib`abort_report_np + 181
    frame #4: 0x00007fff8b741860 libsystem_c.dylib`__chk_fail + 48
    frame #5: 0x00007fff8b741830 libsystem_c.dylib`__chk_fail_overflow + 16
    frame #6: 0x00007fff8b741d5a libsystem_c.dylib`__sprintf_chk + 205
    frame #7: 0x000000011177953b datatable.so`readfile + 12465
    frame #8: 0x000000010ea3256e libR.dylib`do_dotcall + 1146
    frame #9: 0x000000010ea623dc libR.dylib`bcEval + 10059
    frame #10: 0x000000010ea5f637 libR.dylib`Rf_eval + 358
    frame #11: 0x000000010ea6a9b4 libR.dylib`Rf_applyClosure + 1482
    frame #12: 0x000000010ea5fa74 libR.dylib`Rf_eval + 1443
    frame #13: 0x000000010ea6d37f libR.dylib`do_set + 245
    frame #14: 0x000000010ea5fac3 libR.dylib`Rf_eval + 1522
    frame #15: 0x000000010ea6cf35 libR.dylib`do_begin + 465
    frame #16: 0x000000010ea5fac3 libR.dylib`Rf_eval + 1522
    frame #17: 0x000000010ea6a9b4 libR.dylib`Rf_applyClosure + 1482
    frame #18: 0x000000010ea5fa74 libR.dylib`Rf_eval + 1443
    frame #19: 0x000000010ea6a152 libR.dylib`Rf_evalList + 326
    frame #20: 0x000000010ea5f821 libR.dylib`Rf_eval + 848
    frame #21: 0x000000010ea6d37f libR.dylib`do_set + 245
    frame #22: 0x000000010ea5fac3 libR.dylib`Rf_eval + 1522
    frame #23: 0x000000010ea6cf35 libR.dylib`do_begin + 465
    frame #24: 0x000000010ea5fac3 libR.dylib`Rf_eval + 1522
    frame #25: 0x000000010ea6a9b4 libR.dylib`Rf_applyClosure + 1482
    frame #26: 0x000000010ea5fa74 libR.dylib`Rf_eval + 1443
    frame #27: 0x000000010ea6d37f libR.dylib`do_set + 245
    frame #28: 0x000000010ea5fac3 libR.dylib`Rf_eval + 1522
    frame #29: 0x000000010ea6cf35 libR.dylib`do_begin + 465
    frame #30: 0x000000010ea5fac3 libR.dylib`Rf_eval + 1522
    frame #31: 0x000000010ea6a9b4 libR.dylib`Rf_applyClosure + 1482
    frame #32: 0x000000010ea5fa74 libR.dylib`Rf_eval + 1443
    frame #33: 0x000000010ea6d37f libR.dylib`do_set + 245
    frame #34: 0x000000010ea5fac3 libR.dylib`Rf_eval + 1522
    frame #35: 0x000000010ea8ff92 libR.dylib`Rf_ReplIteration + 1082
    frame #36: 0x000000010ea911c4 libR.dylib`R_ReplConsole + 147
    frame #37: 0x000000010ea91102 libR.dylib`run_Rmainloop + 73
    frame #38: 0x000000010e9c2f54 R`main + 27
(lldb) frame select 7
frame #7: 0x000000011177953b datatable.so`readfile + 12465
datatable.so`readfile + 12465:
-> 0x11177953b:  addq   $0x20, %rsp
   0x11177953f:  callq  0x11177643c               ; EXIT
   0x111779544:  movq   %r14, %rsi
   0x111779547:  cmpl   $0x5, %r13d
(lldb) register read
General Purpose Registers:
       rbx = 0x000000000000000a
       rbp = 0x00007fff5123a5b0
       rsp = 0x00007fff5123a2d0
       r12 = 0x0000000000000012
       r13 = 0x0000000000000004
       r14 = 0x00007fff5123a350
       r15 = 0x0000000000000012
       rip = 0x000000011177953b  datatable.so`readfile + 12465
13 registers were unavailable.
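
Frames #4 to #6 of the backtrace (`__sprintf_chk` calling `__chk_fail_overflow`) are what a fortified `sprintf` produces when the formatted output would not fit in the destination buffer. The sketch below shows one common way to avoid that class of crash: format with `snprintf` into a bounded buffer and mark truncation. `format_field_error` and its parameters are hypothetical names for illustration; this is not the code in data.table's `readfile`, nor necessarily how the fix was made.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not data.table's actual code: builds the error text
   in a caller-supplied buffer without ever writing past bufsize bytes.
   Returns the length the full message would have had, or -1 on error. */
static int format_field_error(char *buf, size_t bufsize,
                              int field, int line, const char *excerpt)
{
    /* snprintf writes at most bufsize bytes, including the terminating NUL. */
    int needed = snprintf(buf, bufsize,
        "Expected sep (',') but new line or EOF ends field %d on line %d "
        "when reading data: %s", field, line, excerpt);
    if (needed < 0) {
        return -1;
    }
    /* If the message was cut short, end it with "..." to make that visible. */
    if ((size_t)needed >= bufsize && bufsize >= 4) {
        memcpy(buf + bufsize - 4, "...", 4);
    }
    return needed;
}
```

With this shape, an arbitrarily long input line can only shorten the excerpt shown in the message; it cannot overflow the buffer.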
mattdowle modified the milestones: v1.9.8, v1.9.6 Oct 24, 2014

mattdowle (Member) commented

The rework in e15facd fixed this.

It now returns (test added):

> fread("fread_line_error.csv")
Error in fread("fread_line_error.csv") : 
  Field 25 on line 9 starts with quote (") but then has a problem. It can contain
balanced unescaped quoted subregions but if it does it can't contain embedded
\n as well. Check for unbalanced unescaped quotes: "D   #7 - OK K-V N Y#3...

mattdowle added a commit that referenced this issue Nov 15, 2014
+ fread long error message segfault found and fixed, root cause of #802

mattdowle (Member) commented

You were right about there being a general problem with long error messages. Fixed in that last commit.
Thanks!
