
Feature request "BufferedOutputStream()" #31

Closed
apalacio9502 opened this issue May 31, 2024 · 3 comments · Fixed by #53
Labels: feature (a feature request or enhancement)
Milestone: 0.3.0

Comments

@apalacio9502

apalacio9502 commented May 31, 2024

Hi,

I think this library is a promising replacement for Arrow in packages that read, write, and serialize data in Parquet format. However, it would be useful to have an equivalent of Arrow's BufferOutputStream, to avoid writing to disk when the goal is only to obtain the raw bytes of the Parquet file.

Regards,

@gaborcsardi gaborcsardi added the feature a feature request or enhancement label Jun 1, 2024
@gaborcsardi gaborcsardi added this to the 0.3.0 milestone Jun 3, 2024
@gaborcsardi
Member

Do you mean that you want the output in a memory buffer, i.e. in a raw vector? Or do you actually want to stream the output over HTTP?

@apalacio9502
Author

Hi @gaborcsardi,

Thanks for your prompt response.

That's correct: I want the output in a memory buffer, as is done with arrow, for example:

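```r
export_parquet <- function(values) {
  # Fail early with an informative message if arrow is not installed
  rlang::check_installed("arrow", "for source_format = `PARQUET`")

  # In-memory output stream, closed automatically when the function exits
  con <- arrow::BufferOutputStream$create()
  withr::defer(con$close())
  arrow::write_parquet(values, con)

  # Convert the accumulated buffer to an R raw vector
  as.raw(arrow::buffer(con))
}
```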

Regards,

gaborcsardi added a commit that referenced this issue Jun 4, 2024
Which is returned as a raw vector. Closes #31.
@gaborcsardi
Member

gaborcsardi commented Jun 4, 2024

Now you can do write_parquet(..., ":raw:") to write to a memory buffer, and write_parquet() will return the raw vector of the Parquet file:

```r
pq <- nanoparquet::write_parquet(mtcars, ":raw:")
```
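A minimal sketch of the arrow-based helper rewritten on top of this, assuming a nanoparquet version that includes this change (the 0.3.0 milestone); `export_parquet()` is only the hypothetical name carried over from the arrow snippet. It also sanity-checks the result: every Parquet file begins and ends with the 4-byte magic string `PAR1`, and writing the bytes to a temporary file lets `read_parquet()` read them back.

```r
# Hypothetical helper: nanoparquet equivalent of the arrow-based export_parquet().
# Writing to the special file name ":raw:" keeps the Parquet file in memory,
# and write_parquet() returns it as a raw vector.
export_parquet <- function(values) {
  nanoparquet::write_parquet(values, ":raw:")
}

pq <- export_parquet(mtcars)

# Every Parquet file starts and ends with the 4-byte magic string "PAR1".
magic <- charToRaw("PAR1")
n <- length(pq)
stopifnot(identical(pq[1:4], magic), identical(pq[(n - 3):n], magic))

# Round trip: write the bytes to a temporary file and read them back.
tmp <- tempfile(fileext = ".parquet")
writeBin(pq, tmp)
df <- nanoparquet::read_parquet(tmp)
unlink(tmp)
```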
