refactor(duckdb): use `.sql` instead of `.execute` in performance-sensitive locations #8669

jcrist · 2024-03-15T21:03:52Z

In #6875 we switched to using `con.sql(sql)` instead of `con.execute(sql)` when retrieving an Arrow table of results. The former can be more efficient than the latter: the `.sql` API lets the query planner take advantage of the known output type, which may result in more efficient execution.

During the `the-epic-split` refactor this optimization was dropped (note: it seems less necessary since DuckDB 0.9.0, although a performance difference is reportedly still expected). We add the optimization back here and apply it to every location where we return in-memory data.

We deliberately keep `duckdb_con.execute` rather than `duckdb_con.sql` in the public-facing `raw_sql` API, since:

- The returned object (a relation) isn't strictly compatible with the DB-API.
- For operations that don't return results (e.g. DDL), `con.sql` returns `None`.

Fixes #8631.


jcrist · 2024-03-15T21:26:02Z

Oh joy - segfaults on CI, but I can't reproduce them locally: https://github.com/ibis-project/ibis/actions/runs/8302276853/job/22724239870?pr=8669 👀 👀.

Debugging on CI it is then.

cpcloud · 2024-03-15T21:43:52Z

It's geospatial-related, coming from duckdb/duckdb-spatial#236.

https://github.com/ibis-project/ibis/actions/runs/8302276853/job/22724239870?pr=8669#step:15:125

cpcloud

LGTM, thanks!

refactor(duckdb): use .sql instead of .execute in performance-sensitive locations

318259d

refactor(duckdb): revert changes to to_pyarrow_batches

d6c456e

cpcloud approved these changes Mar 17, 2024


cpcloud added this to the 9.0 milestone Mar 17, 2024

cpcloud added the `performance` (Issues related to ibis's performance) and `duckdb` (The DuckDB backend) labels Mar 17, 2024

cpcloud enabled auto-merge (squash) March 17, 2024 11:08

cpcloud merged commit aa6aa0c into ibis-project:main Mar 17, 2024
80 of 82 checks passed

jcrist deleted the duckdb-use-con-sql branch March 17, 2024 13:11
