let dbreadtable use copy #9
dbreadtable could use copy to export the table to csv and then read the csv into a dataframe. This needs a temporary folder, but it would improve performance in many cases. Also, several R readers could be passed, such as fread, to get different objects such as a dataframe, a data.table, or whatever; a sketch of this idea follows.
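A minimal sketch of the proposal, not the package's actual API; it assumes psql is on the PATH, and "testdb" and "big_table" are placeholder names:

```r
# Hedged sketch: dump the table with COPY into a temporary csv, then hand
# the file to a pluggable reader chosen by the caller.
read_table_via_copy <- function(dbname, table, reader = data.table::fread, ...) {
  tmp <- tempfile(fileext = ".csv")   # assumes a space-free temp path
  on.exit(unlink(tmp))
  # \copy runs client-side, so no server file permissions are needed
  meta <- sprintf("\\copy %s TO %s WITH (FORMAT csv, HEADER)", table, tmp)
  system2("psql", c("-d", dbname, "-c", shQuote(meta)))
  reader(tmp, ...)
}

# Swap the reader to choose the output type:
dt <- read_table_via_copy("testdb", "big_table")                     # data.table
df <- read_table_via_copy("testdb", "big_table", reader = read.csv)  # data.frame
```

Comments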
Thanks. In what situation would …
Well, from my experience copy is more effective in almost any case. You can notice the performance improvement on large enough tables (a million rows or more). Another advantage is that it has a lower impact on the Postgres database compared with the classic method, since no cursor is needed. A simple way to prove this is to benchmark on several table sizes:
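For example, a minimal benchmark along those lines (a sketch, reusing the placeholder names "testdb" and "big_table" from the sketch above):

```r
# Sketch: compare DBI's classic retrieval against COPY piped through
# psql into fread. Connection details are placeholders.
library(DBI)
library(data.table)
library(microbenchmark)

con <- dbConnect(RPostgres::Postgres(), dbname = "testdb")

copy_cmd <- paste("psql -d testdb -c",
                  shQuote("\\copy big_table TO STDOUT WITH (FORMAT csv, HEADER)"))

microbenchmark(
  classic = dbReadTable(con, "big_table"),
  copy    = fread(cmd = copy_cmd),
  times = 10
)
```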
Also, copy has a drawback: it loses column types in the process, so you have to get this information from the database and inject it into the csv reader afterwards. Moreover you cannot deal with bytea fields, nor with arrays (depending on the csv reader's capabilities). For information, the psycopg3 Python library is a full rewrite of psycopg2 focused on …
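A hedged illustration of that type-injection step, reusing the connection and placeholder names from the benchmark above; the type map is deliberately incomplete:

```r
# Recover column types from the catalog and translate them into a
# colClasses vector for fread. The mapping shown is illustrative only.
types <- dbGetQuery(con, "
  SELECT column_name, data_type
  FROM information_schema.columns
  WHERE table_name = 'big_table'
  ORDER BY ordinal_position")

map_type <- function(pg_type) {
  switch(pg_type,
    integer            = "integer",
    bigint             = "integer64",   # needs the bit64 package
    numeric            = ,
    `double precision` = "numeric",
    boolean            = "logical",
    "character")                        # fall back to plain text
}

col_classes <- setNames(vapply(types$data_type, map_type, character(1)),
                        types$column_name)

dt <- fread(cmd = copy_cmd, colClasses = col_classes)
```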
Good point about column types. I think we should stick with the current approach for …
An integration with Arrow might bring us much more bang for the buck, including type safety.
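One possible shape of that idea (a sketch assuming the arrow package and the placeholder names above; the schema columns id, value, and flag are made up for illustration):

```r
# Read the COPY output through Arrow with an explicit schema, so column
# types survive the csv round trip.
library(arrow)

tmp <- tempfile(fileext = ".csv")   # assumes a space-free temp path
system2("psql", c("-d", "testdb", "-c",
        shQuote(sprintf("\\copy big_table TO %s WITH (FORMAT csv, HEADER)", tmp))))

tbl <- read_csv_arrow(
  tmp,
  col_types = schema(id = int64(), value = float64(), flag = bool())
)
```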