I've been trying to use fread() to import a large (12GB) tab-delimited text file, which is too large for my machine to import in its entirety. I thought that I would be able to use the nrows parameter to import a cut-down version of the file to draft my code on, but this results in the following error:
System errno 22 unmapping file: Invalid argument
Error in fread("data.tab", header = T, sep = "\t", nrows = 10L) :
Opened 11.63GB (12491321418 bytes) file ok but could not memory map it. This is a 64bit process. There is probably not enough contiguous virtual memory available.
However, if I use head to create a new file containing only the first 11 rows of the original file, I am able to use fread() to import the new file without issue.
Below is a sample of the code I am running to replicate the issue:
library(data.table)

### File Size (Original File) ###
file.size("data.tab") / 1e9  # Roughly 12.5 GB

### Import First 10 Rows (Original File) ###
dt <- fread("data.tab",
            header = TRUE, sep = "\t",
            nrows = 10L)  # Fails

### Create New File Using the First 11 Rows of the Existing One ###
system("head -n 11 data.tab > data_head.tab")

### File Size (New File) ###
file.size("data_head.tab") / 1e3  # Roughly 330 KB

### Import (New File) ###
dt <- fread("data_head.tab",
            header = TRUE, sep = "\t")  # Succeeds
I have searched the current issues log for data.table, and the only problem I've found resembling mine is issue #2321, which was closed on 3rd March 2018. The closing comments state that the issue was fixed in data.table version 1.10.5 by switching to lazy memory mapping. However, I'm using data.table version 1.12.8 and appear to be running into the same issue. Once imported, the 10-row data table occupies only 1.9 MB, nowhere near the 4 GB of physical memory on my machine.
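For anyone needing a stopgap while drafting code, one possible workaround (a sketch on my part, not an official fix) is to let a shell command emit only the rows needed and read its output through fread()'s cmd argument, so the 12 GB file itself is never memory-mapped:

library(data.table)

# Workaround sketch: have the shell extract the first 11 lines
# (header plus 10 data rows) and let fread() read only that output.
# "data.tab" is the file name from the report above.
dt <- fread(cmd = "head -n 11 data.tab",
            header = TRUE, sep = "\t")

This sidesteps the mapping problem because fread() only ever sees the small command output rather than the full file.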
My output for sessionInfo() is below:
R version 3.6.3 (2020-02-29)
Platform: aarch64-unknown-linux-gnu (64-bit)
Running under: Manjaro ARM
Matrix products: default
BLAS: /usr/lib/libopenblasp-r0.3.9.so
LAPACK: /usr/lib/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=en_GB.UTF-8
[9] LC_ADDRESS=en_GB.UTF-8 LC_TELEPHONE=en_GB.UTF-8
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.12.8 rkward_0.7.1
loaded via a namespace (and not attached):
[1] compiler_3.6.3 tools_3.6.3
Thank you in advance,
Ben
@jangorecki what I tried was reading one of the bz2 files from https://database.lichess.org/. The data in the bz2 file is essentially a single column.
The command I used was: fread("lichess_db_standard_rated_2020-03.pgn.bz2", colClasses = "character", nrows = 1000)
But this fails. I apologize if this is not 100% reproducible, but I cannot think of another way to share a massive file... Please let me know in case I can help more somehow.
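In the meantime, here is a sketch of the same pipe-based idea for the compressed file, assuming bzcat is available on the PATH. The row limit moves into the shell command, since fread() seems to decompress the whole archive before nrows applies:

library(data.table)

# Workaround sketch, not a fix: stream only the first 1000 lines out of
# the archive so fread() never handles the full decompressed file.
dt <- fread(cmd = "bzcat lichess_db_standard_rated_2020-03.pgn.bz2 | head -n 1000",
            colClasses = "character")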