fread buffer overflow with version 1.9.4 #956
On 16-11-14 at 17:13, Arun wrote:
I just read README.md and noticed that there are bug fixes for fread. I tried to install 1.9.5, following the instructions (using devtools Maybe you can give it a shot? Thanks,
|
|
Well, I tried again by following the instructions on library(devtools) remove.packages("data.table") # revert back to CRAN While doing that I did not see any error messages. But the version Any suggestions? On 16-11-14 at 21:39, Jan Gorecki wrote:
|
Try only the first two lines:
library(devtools)
install_github("Rdatatable/data.table", build_vignettes = FALSE) |
Yes, in that case I get version 1.9.5. So should the instructions be corrected? With this version of 1.9.5 I tried again, and now the buffer overflow is gone. However, fread fails in another way: Error in fread(input = file_data, header = FALSE) : It strikes me that the code seems to test the format of the header row, although there is no header row (header = FALSE). FWIW, the file is space-delimited (with one or more spaces), has 561 numeric values per line (e.g. "-6.6768331e-001"), and each line starts with one or more spaces. The function read.table() has no problem with it. The workaround I use now is "data <- as.data.table(read.table(....))" |
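For readers who want to try the read.table() workaround mentioned above, here is a minimal sketch. The temporary file and its two rows are made up for illustration (they only mimic the leading spaces and variable spacing described), and the conversion step assumes the data.table package is installed; if it is not, the sketch simply keeps the data.frame.

```r
# Sketch of the read.table() workaround; the file contents are invented.
fn <- tempfile(fileext = ".txt")
writeLines(c("  2.5717778e-001 -2.3285230e-002",
             " -6.6768331e-001  1.0000000e+000"), fn)

# With the default sep = "", read.table() splits on runs of whitespace
# and ignores leading whitespace on each line.
df <- read.table(fn, header = FALSE)

# Convert to a data.table only if the package is available.
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::as.data.table(df)
} else {
  dt <- df
}
print(dim(dt))
```

This is likely slower than fread() on large files, so it is probably only useful as a stopgap on affected versions.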
Thanks, I've updated installation instructions to make it clearer. |
@heraldb If you could please go through the "What should the report contain?" section and update your post, it would be great. The purpose of such instructions is to avoid this kind of back and forth.
In any case, I've managed to reproduce the error and have marked it as a bug.
require(data.table) ## 1.9.5
DT = fread("X_test.txt", verbose=TRUE)
# Input contains no \n. Taking this to be a filename to open
# File opened, filesize is 0.024641 GB.
# Memory mapping ... ok
# Detected eol as \r\n (CRLF) in that order, the Windows standard.
# Positioned on line 1 after skip or autostart
# This line is the autostart and not blank so searching up for the last non-blank ... line 1
# Detecting sep ... ' '
# Detected 561 columns. Longest stretch was from line 1 to line 30
# Starting data input on line 1 (either column names or first row of data). First 10 characters: 2.571777
# Error in fread("~/Downloads/X_test.txt", verbose = TRUE) :
# Not positioned correctly after testing format of header row. ch=' '
sessionInfo()
# R version 3.1.2 (2014-10-31)
# Platform: x86_64-apple-darwin13.4.0 (64-bit)
# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
# attached base packages:
# [1] stats graphics grDevices utils datasets methods base
# other attached packages:
# [1] data.table_1.9.5
# loaded via a namespace (and not attached):
# [1] chron_2.3-45 |
Yes the instructions to install a development version are clear now! Thanks a lot! |
👍 |
A minimal dataset that reproduces this type of error:
library(data.table)
packageVersion('data.table')
#[1] ‘1.9.5’
#create minimal data set
fn = 'data-956.txt'
write(file = fn, " 2 3")
# now read it with fread()
dt = fread(fn, header = FALSE, verbose = TRUE)
#Input contains no \n. Taking this to be a filename to open
#File opened, filesize is 0.000000 GB.
#Memory mapping ... ok
#Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
#Positioned on line 1 after skip or autostart
#This line is the autostart and not blank so searching up for the last non-blank ... line 1
#Detecting sep ... ' '
#Detected 2 columns. Longest stretch was from line 1 to line 1
#Starting data input on line 1 (either column names or first row of data). First 10 characters: 2 3
#Error in fread(fn, header = FALSE, verbose = TRUE) :
# Not positioned correctly after testing format of header row. ch=' '
sessionInfo()
#R version 3.1.1 (2014-07-10)
#Platform: x86_64-redhat-linux-gnu (64-bit)
#
#locale:
# [1] LC_CTYPE=nl_NL.utf8 LC_NUMERIC=C
# [3] LC_TIME=nl_NL.utf8 LC_COLLATE=nl_NL.utf8
# [5] LC_MONETARY=nl_NL.utf8 LC_MESSAGES=nl_NL.utf8
# [7] LC_PAPER=nl_NL.utf8 LC_NAME=C
# [9] LC_ADDRESS=C LC_TELEPHONE=C
#[11] LC_MEASUREMENT=nl_NL.utf8 LC_IDENTIFICATION=C
#
#attached base packages:
#[1] stats graphics grDevices utils datasets base
#
#other attached packages:
#[1] data.table_1.9.5
#
#loaded via a namespace (and not attached):
#[1] chron_2.3-45 methods_3.1.1
The results are the same when removing "header = FALSE" from the fread() call.
By changing the write() statement above we can try other file formats. fread() gives the same error when a header is added like this:
write(file = fn, ncolumns = 1, c(" a b", " 2 3"))
When the leading spaces in the first line are removed, the error goes away, e.g.
write(file = fn, ncolumns = 1, c("a b", " 2 3"))
Removing the header line and the space before the first column also makes the error go away:
write(file = fn, "2 3")
So leading spaces seem to play a role, but they are not the whole story. fread() also gets confused when multiple spaces are used as a separator. Some formats do this to align the columns by reserving space for the minus sign (so one space between "2" and "-2", but two spaces between "2" and "2"). Using a double space between the columns, like this:
write(file = fn, "2  3")
makes fread() produce the same error again.
In short, the problem seems to be with leading spaces and multiple spaces between columns. |
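Since the report above finds that files without leading spaces and with single-space separators are read correctly, one possible stopgap on affected versions is to normalise the whitespace before calling fread(). The sketch below uses base R only; `normalize_spaces` is a hypothetical helper name, not part of data.table, and nobody in this thread proposed it.

```r
# Hypothetical helper (not part of data.table): rewrite a space-delimited
# file so that each line has no leading or trailing spaces and columns
# are separated by exactly one space.
normalize_spaces <- function(infile, outfile = tempfile(fileext = ".txt")) {
  lines <- readLines(infile)
  lines <- gsub("^ +| +$", "", lines)  # drop leading and trailing spaces
  lines <- gsub(" +", " ", lines)      # collapse runs of spaces
  writeLines(lines, outfile)
  outfile
}

# Made-up input showing both problem patterns from the report.
fn <- tempfile(fileext = ".txt")
writeLines(c(" 2  3", " 4 -5"), fn)

clean <- normalize_spaces(fn)
print(readLines(clean))
```

The path returned in `clean` could then be passed to fread() in place of the original file. This reads the whole file into memory as text, so it is presumably only practical for files of moderate size.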
Awesome report! Appreciate it very much. Thanks. |
I can confirm the bug, also using version 1.9.4 and the same file as in the original post. |
Fixed with commit 0e7a835. Please upgrade and test. |
Hi!
I just ran into a bug with fread in version 1.9.4 when reading the file "UCI HAR Dataset/test/X_test.txt" from
https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip
*** buffer overflow detected ***: /usr/lib64/R/bin/exec/R terminated
======= Backtrace: =========
/lib64/libc.so.6[0x3b92675a4f]
/lib64/libc.so.6(__fortify_fail+0x37)[0x3b92706947]
/lib64/libc.so.6[0x3b92704b20]
/lib64/libc.so.6[0x3b92704029]
/lib64/libc.so.6(_IO_default_xsputn+0xbc)[0x3b9267907c]
/lib64/libc.so.6(_IO_vfprintf+0x3190)[0x3b9264ab70]
/lib64/libc.so.6(__vsprintf_chk+0x88)[0x3b927040b8]
/lib64/libc.so.6(__sprintf_chk+0x7d)[0x3b9270400d]
/home/herald/R/x86_64-redhat-linux-gnu-library/3.1/data.table/libs/datatable.so(readfile+0x20bb)[0x7f08a2228cab]
/usr/lib64/R/lib/libR.so[0x32e3698386]
/usr/lib64/R/lib/libR.so[0x32e36d0469]
/usr/lib64/R/lib/libR.so(Rf_eval+0x260)[0x32e36d8030]
/usr/lib64/R/lib/libR.so(Rf_applyClosure+0x42c)[0x32e36d969c]
/usr/lib64/R/lib/libR.so(Rf_eval+0x336)[0x32e36d8106]
/usr/lib64/R/lib/libR.so[0x32e36db9ee]
/usr/lib64/R/lib/libR.so(Rf_eval+0x558)[0x32e36d8328]
/usr/lib64/R/lib/libR.so[0x32e36da6d3]
/usr/lib64/R/lib/libR.so(Rf_eval+0x558)[0x32e36d8328]
/usr/lib64/R/lib/libR.so(Rf_applyClosure+0x42c)[0x32e36d969c]
/usr/lib64/R/lib/libR.so(Rf_eval+0x336)[0x32e36d8106]
/usr/lib64/R/lib/libR.so(Rf_ReplIteration+0x252)[0x32e3700cd2]
/usr/lib64/R/lib/libR.so[0x32e3701021]
/usr/lib64/R/lib/libR.so(run_Rmainloop+0x44)[0x32e37010b4]
/usr/lib64/R/bin/exec/R(main+0x1b)[0x4007fb]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x3b92621d65]
/usr/lib64/R/bin/exec/R[0x40082d]
======= Memory map: ========
Let me know if you need any extra information.
Thanks,
Herald