
let dbreadtable use copy #9

Open
parisni opened this issue Jun 22, 2021 · 4 comments

Comments

parisni commented Jun 22, 2021

dbReadTable() could use COPY to export the table to CSV and then read the CSV into a data frame.

This needs a temporary folder, but it would improve performance in many cases.

Also, several R readers could be passed, such as data.table::fread(), to get different return objects: a data.frame, a data.table, or whatever else. A rough sketch of the idea follows.
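
A minimal sketch of what this could look like, assuming a psql client on the PATH (connection details taken from the usual PG* environment variables) and a hypothetical read_table_via_copy() helper; the reader is pluggable:

```r
# Sketch only: export with client-side \copy via psql, then read the CSV.
# Assumes a Unix-like shell and that `psql` can connect via PG* env vars.
read_table_via_copy <- function(table, reader = data.table::fread, ...) {
  csv <- tempfile(fileext = ".csv")
  on.exit(unlink(csv), add = TRUE)

  copy_cmd <- sprintf(
    "\\copy (SELECT * FROM %s) TO '%s' WITH (FORMAT csv, HEADER)",
    table, csv
  )
  status <- system2("psql", c("--no-psqlrc", "-c", shQuote(copy_cmd)))
  if (status != 0) stop("psql \\copy failed")

  # any CSV reader can be plugged in: data.table::fread, readr::read_csv, ...
  reader(csv, ...)
}

# Usage (hypothetical helper, placeholder table name):
# df <- read_table_via_copy("my_schema.my_table")
```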

krlmlr (Member) commented Sep 11, 2021

Thanks. In what situation would COPY be faster? Would you like to share an example and timings?

parisni (Author) commented Sep 11, 2021

Well, in my experience COPY is more effective in almost every case. You can notice the performance improvement on large enough tables (a million rows or more). Another advantage is that it has a lower impact on the Postgres database than the classic method, since it needs no cursor.

A simple way to prove this is to benchmark several table sizes, comparing (see the sketch below):

  • a simple bash script with psql + \copy (stmt) to 'path/to/csv', followed by an R script with a CSV reader
  • the way your package currently does it
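
For instance, a rough timing comparison along those lines (connection details and table name are placeholders):

```r
library(DBI)

con <- dbConnect(RPostgres::Postgres(), dbname = "testdb")  # placeholder connection

# 1) the package's current path
t_dbi <- system.time(df1 <- dbReadTable(con, "big_table"))

# 2) client-side \copy to a temporary CSV via psql, then fread
t_copy <- system.time({
  csv <- tempfile(fileext = ".csv")
  system2("psql", c("-X", "-d", "testdb", "-c",
    shQuote(sprintf("\\copy big_table TO '%s' WITH (FORMAT csv, HEADER)", csv))))
  df2 <- data.table::fread(csv)
  unlink(csv)
})

rbind(dbReadTable = t_dbi, copy_fread = t_copy)
```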

COPY also has a drawback: it loses column types in the process, so you have to fetch this information from the database and inject it into the CSV reader afterwards. Moreover, you cannot deal with bytea fields, nor with arrays (depending on the CSV reader's capabilities).
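
As an illustration of the type issue, the declared types could be fetched from the catalog and mapped to reader column classes before parsing the CSV. This is a rough, incomplete mapping (reusing the con and csv objects from the sketches above), not the package's actual behaviour:

```r
library(DBI)

# declared column types for the table being read (placeholder table name)
pg_types <- dbGetQuery(con, "
  SELECT column_name, data_type
  FROM information_schema.columns
  WHERE table_name = 'big_table'
  ORDER BY ordinal_position
")

# very rough Postgres -> R mapping; bytea and array types are left out,
# which is exactly the limitation mentioned above
to_r_class <- function(pg) {
  switch(pg,
    "smallint" = , "integer" = "integer",
    "bigint" = , "numeric" = , "real" = , "double precision" = "numeric",
    "boolean" = "logical",
    "date" = "Date",
    "character"  # default: read everything else as text
  )
}

col_classes <- setNames(
  vapply(pg_types$data_type, to_r_class, character(1)),
  pg_types$column_name
)

df <- data.table::fread(csv, colClasses = col_classes)
```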

For information, the psycopg3 Python library is a full rewrite of psycopg2 focused on COPY for any interaction with Postgres: https://www.psycopg.org/psycopg3/

krlmlr (Member) commented Sep 13, 2021

Good point about column types. I think we should stick with the current approach for dbReadTable(). Unfortunately, COPY table_name TO STDOUT currently hangs R; we could mitigate this with a new postgresCopyTo() function.
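
Purely to illustrate that proposal, such a helper might be used along these lines; postgresCopyTo() does not exist, and the signature shown here is made up:

```r
library(DBI)
con <- dbConnect(RPostgres::Postgres(), dbname = "testdb")  # placeholder connection

# hypothetical: stream the result of COPY ... TO STDOUT into a local file,
# bypassing dbReadTable()'s row-by-row fetch
postgresCopyTo(con, "COPY big_table TO STDOUT WITH (FORMAT csv, HEADER)",
               path = "/tmp/big_table.csv")

df <- data.table::fread("/tmp/big_table.csv")
```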

krlmlr (Member) commented Oct 29, 2021

An integration with Arrow might bring us much more bang for the buck, including type safety.

krlmlr closed this as completed Oct 29, 2021
krlmlr transferred this issue from r-dbi/RPostgres Oct 31, 2021
krlmlr reopened this Oct 31, 2021