Support raw #30

Closed
yutannihilation opened this issue Sep 24, 2023 · 10 comments · Fixed by #270

Comments

@yutannihilation
Owner

No description provided.

@yutannihilation
Owner Author

Probably not worth it. Closing for now.

@eitsupi
Contributor

eitsupi commented Jun 14, 2024

I would appreciate it if you could reconsider this, as the Apache Arrow world relies on the ability to send and receive the Arrow IPC format as a raw vector for inter-session communication.

arrow::as_arrow_table(mtcars) |>
  arrow::write_to_raw(format = "stream") |>
  nanoarrow::read_nanoarrow() |>
  as.data.frame()
#>     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> 1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#> 2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#> 3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#> 4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#> 5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
#> 6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#> 7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
#> 8  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
#> 9  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
#> 10 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
#> 11 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
#> 12 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
#> 13 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
#> 14 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
#> 15 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
#> 16 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
#> 17 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
#> 18 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
#> 19 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
#> 20 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
#> 21 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
#> 22 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
#> 23 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
#> 24 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
#> 25 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
#> 26 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
#> 27 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
#> 28 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
#> 29 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
#> 30 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
#> 31 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
#> 32 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

Created on 2024-06-14 with reprex v2.1.0

Use case examples:

@yutannihilation
Owner Author

Hmm, at the moment, it doesn't sound convincing to add support just for a not-so-major usage. But thanks for sharing.

@eitsupi
Contributor

eitsupi commented Jun 15, 2024

it doesn't sound convincing to add a support just for a not-so-major usage.

Fair point.
In fact, if nanoarrow supports Arrow IPC writing (apache/arrow-nanoarrow#252), we may just do that via nanoarrow.

@yutannihilation
Owner Author

In my understanding, copying data to an SEXP is not the best option. In the "inter-session" case, I understand there's a limitation that an external pointer can't be passed around, but I'm wondering if there's some alternative that fits better.

@eitsupi
Contributor

eitsupi commented Jun 15, 2024

In my understanding, copying data to an SEXP is not the best option.

FYI, I believe the background of r-polars use of raw vectors is mentioned in this report.
https://sicheng-pan.github.io/GSoC-2023/

@yutannihilation
Owner Author

I totally don't understand why the data needs to be serialized and deserialized via R's memory.

@eitsupi
Contributor

eitsupi commented Jun 15, 2024

I don't have enough knowledge to clear up your doubts, but it seems common to transfer data like this between sessions when the C ABI is not available,
for example when transferring Arrow data between a JS front end and a Python back end.
https://stackoverflow.com/questions/74999055/what-is-the-best-way-to-send-arrow-data-to-the-browser
...or
https://github.com/apache/arrow-experiments/blob/188c4e5ff4bda08319d4520e380d736c36b9ee48/http/get_simple/r/client/client.R#L26-L40

@Sicheng-Pan

In my understanding, copying data to an SEXP is not the best option.

FYI, I believe the background of r-polars use of raw vectors is mentioned in this report. https://sicheng-pan.github.io/GSoC-2023/

I used R's built-in serialize and unserialize to pass R objects (these should always be functions in my case) across different processes. I used this approach because I could not think of a way to serialize R objects in Rust (and I had no clue about how memory is managed by R). Hopefully there is a better approach for this. The polars series data were not (de)serialized by R anyways.
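
For readers unfamiliar with the approach described above, here is a minimal sketch of turning an R function into a raw vector and back with base R. It assumes nothing about the r-polars implementation. `make_adder` and `add_two` are hypothetical names used only for illustration, and the round trip happens within a single session rather than across processes.

```r
# Hypothetical helper: returns a closure that captures `n`.
make_adder <- function(n) {
  force(n)  # evaluate `n` now so the closure holds a plain value
  function(x) x + n
}
add_two <- make_adder(2)

# With connection = NULL, serialize() returns the bytes as a raw vector.
payload <- serialize(add_two, connection = NULL)

# In a real setup the raw vector would be sent to another process;
# here it is simply restored in the same session.
restored <- unserialize(payload)
result <- restored(40)
```

The raw vector here carries only the function and its captured environment, which is consistent with the remark that the series data themselves did not go through R's serialization.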

@yutannihilation
Owner Author

The polars series data were not (de)serialized by R anyways.

Yeah, that's the point.

Anyway, while I appreciate you sharing the information, I don't want to discuss the specific implementation details of Polars here.
