fpaste: fwrite output as a character vector #4572
For the record, I tried using |
@MichaelChirico I had forgotten to mention in my original post that I tried with |
This sounds impressive. I do not understand how writing to a file is faster than manipulating it in RAM. What is happening? Why is |
You can still manipulate it in RAM with fwrite and fread if you use tempfile() with tempdir() set to a ramdisk (search NEWS.md for "ramdisk"). I assume that |
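A minimal sketch of the tempfile() route, assuming tempdir() already sits on a ramdisk. R picks tempdir() from the TMPDIR environment variable when the session starts, so it has to be set before R launches. fpaste_tmp is a hypothetical name, not a data.table function:

```r
library(data.table)

# Hypothetical helper, not part of data.table: flatten each row into one string via a temp file
fpaste_tmp <- function(dt, sep = ",") {
  # tempfile() returns a path under tempdir(); if that directory is a ramdisk, nothing hits a physical disk
  f <- tempfile(fileext = ".csv")
  on.exit(unlink(f), add = TRUE)
  fwrite(dt, f, sep = sep, col.names = FALSE)
  # readLines() accepts LF, CRLF or CR line endings and returns one string per row
  readLines(f)
}

dt <- data.table(a = 1:3, b = c("x", "y", "z"))
fpaste_tmp(dt)
```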
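@ColeMiller1 My initial thought was to just use |
Using the relevant parts of |:

fpaste2 <- function(dt, sep = ",", envir = parent.frame()) {
  eval({
    # Collect the output in an in-memory raw connection rather than a file on disk
    file <- rawConnection(raw(0L), open = "w")
    on.exit({
      if (!is.null(file)) close(file)
    })
    # With no file argument fwrite() prints to the console; capture.output() diverts that into the connection
    capture.output(fwrite(dt, sep = sep, col.names = FALSE), file = file)
    # Parse the collected text back as a single column, one row per line (the result is a data.table)
    fread(rawToChar(rawConnectionValue(file)), sep = "\n", header = FALSE)
  }, envir = envir, enclos = envir)
}

This performs well. It's at least as fast if not faster than |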
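If the goal is a character vector rather than a one-column data.table, capture.output() can also return fwrite()'s console output directly, without the raw connection or the fread() step. A sketch; fpaste3 is a hypothetical name, and eol = "\n" is set explicitly because fwrite() defaults to "\r\n" on Windows:

```r
library(data.table)

# Hypothetical helper, not part of data.table: return the rows as a character vector
fpaste3 <- function(dt, sep = ",") {
  # With file = "" (the default), fwrite() prints to the console; capture.output()
  # collects that text and splits it into one string per line.
  # eol = "\n" avoids a trailing "\r" on each string on Windows.
  capture.output(fwrite(dt, sep = sep, col.names = FALSE, eol = "\n"))
}

dt <- data.table(a = 1:3, b = c("x", "y", "z"))
fpaste3(dt, sep = ";")
```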
This will not work for a |
My use case for |
Just now seeing this today, but I think there certainly is an opportunity to improve vectorized string concatenation performance with a |
Back in 2018, I had a use case where this was the bottleneck in a data pipeline. I posted to Stack Overflow (https://stackoverflow.com/questions/48233309/fast-concatenation-of-data-table-columns-into-one-string-column), and in the course of investigating, I was surprised to find the same thing others described here: it was faster to |
One of the answers, by Martin Modrák, proposed repurposing some of the code from /src/fwrite.c; it ran 8x faster than the previous best, an optimized |
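For context, a common data.table idiom for the task that question describes (concatenating several columns into one string column) is do.call(paste, ...) over .SD. A minimal sketch; the table, column names and "_" separator are made up for illustration:

```r
library(data.table)

dt <- data.table(id = 1:3, x = c("a", "b", "c"), y = c(10L, 20L, 30L))
cols <- c("x", "y")  # columns to concatenate (illustrative)

# paste() the selected columns element-wise and add the result as a new column by reference
dt[, combined := do.call(paste, c(.SD, sep = "_")), .SDcols = cols]
dt$combined
```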
Given the speed of fwrite, it can be used in conjunction with fread as an alternative to do.call(paste, ...) to flatten multiple columns into a character vector. It would be nice to be able to capture the output of fwrite directly as a character vector. The fwrite/fread route is much faster than some of the other idiomatic approaches that are often considered.
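One difference to check before swapping fwrite in for do.call(paste, ...): fwrite's default quote = "auto" wraps fields that contain the separator in double quotes, while paste() never quotes. A small sketch with a made-up table, using capture.output() to collect fwrite()'s console output as a character vector:

```r
library(data.table)

dt <- data.table(name = c("plain", "has,comma"), n = 1:2)

# Baseline: paste() never quotes, so the embedded comma looks like a separator
via_paste <- do.call(paste, c(dt, sep = ","))

# fwrite() with quote = "auto" (the default) quotes the field that contains sep
via_fwrite <- capture.output(fwrite(dt, col.names = FALSE, eol = "\n"))

# quote = FALSE turns quoting off and matches paste() for this table
via_unquoted <- capture.output(fwrite(dt, col.names = FALSE, quote = FALSE, eol = "\n"))

via_paste   # "plain,1" "has,comma,2"
via_fwrite  # "plain,1" "\"has,comma\",2"
```

Whether the quoting is a bug or a feature depends on whether the flattened strings are meant to be parsed again later.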
Here's the behavior I'm hoping to be able to replicate:
Here's a comparison with a straightforward paste in a data.table: