feat(c/driver/postgresql): Accept bulk ingest of dictionary-encoded strings/binary #1275

Merged
merged 3 commits into from
Nov 9, 2023

@paleolimbot (Member) commented Nov 8, 2023

This PR adds the ability for the Postgres driver to ingest dictionary-encoded arrays. This comes up in R because factors are relatively common and, for performance reasons, are represented as dictionary-encoded strings by default when converted to Arrow.

Reprex in R:

``` r
library(adbcdrivermanager)

con <- adbcpostgresql::adbcpostgresql() |>
  adbc_database_init(uri = Sys.getenv("ADBC_POSTGRESQL_TEST_URI")) |>
  adbc_connection_init()

df <- data.frame(x = letters, y = factor(letters))
write_adbc(df, con, "some_table")
#> Error in adbc_statement_execute_query(stmt): [libpq] Failed to create table: ERROR:  relation "some_table" already exists
#> 
#> Query was: CREATE TABLE "public" . "some_table" ("x" TEXT, "y" TEXT)
read_adbc(con, "SELECT * from some_table") |>
  as.data.frame() |>
  str()
#> 'data.frame':    26 obs. of  2 variables:
#>  $ x: chr  "a" "b" "c" "d" ...
#>  $ y: chr  "a" "b" "c" "d" ...
```

Created on 2023-11-09 with [reprex v2.0.2](https://reprex.tidyverse.org)

There is probably some opportunity to consolidate some of the code that currently lives in `BindStream` into `PostgresType` and/or `PostgresTypeResolver`. I'm happy to poke away at that at some point, but in the meantime it didn't seem too onerous to tack on dictionary support here.

@paleolimbot paleolimbot marked this pull request as ready for review November 9, 2023 15:01
@paleolimbot paleolimbot requested a review from lidavidm as a code owner November 9, 2023 15:01
```diff
@@ -1283,6 +1279,26 @@ class PostgresCopyBinaryFieldWriter : public PostgresCopyFieldWriter {
}
};

class PostgresCopyBinaryDictFieldWriter : public PostgresCopyFieldWriter {
```
Member commented:

It seems like eventually we'd want to be able to parameterize this writer on the sub-writer, but we can defer that for now.

@lidavidm lidavidm added this to the ADBC Libraries 0.9.0 milestone Nov 9, 2023
@lidavidm lidavidm merged commit 9d6a245 into apache:main Nov 9, 2023
56 checks passed
@paleolimbot paleolimbot deleted the c-postgres-dict-encode branch November 9, 2023 18:14
vleslief-ms pushed a commit to vleslief-ms/arrow-adbc that referenced this pull request Nov 9, 2023

2 participants