Basic SQLAlchemy driver for DuckDB
- duckdb_engine
$ pip install duckdb-engine
DuckDB Engine also has a conda feedstock available, the instructions for the use of which are available in it's repository.
Once you've installed this package, you should be able to just use it, as SQLAlchemy does a python path search
from sqlalchemy import Column, Integer, Sequence, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm.session import Session
Base = declarative_base()
class FakeModel(Base): # type: ignore
__tablename__ = "fake"
id = Column(Integer, Sequence("fakemodel_id_sequence"), primary_key=True)
name = Column(String)
eng = create_engine("duckdb:///:memory:")
Base.metadata.create_all(eng)
session = Session(bind=eng)
session.add(FakeModel(name="Frank"))
session.commit()
frank = session.query(FakeModel).one()
assert frank.name == "Frank"
With IPython-SQL and DuckDB-Engine you can query DuckDB natively in your notebook! Check out DuckDB's documentation or Alex Monahan's great demo of this on his blog.
You can configure DuckDB by passing connect_args
to the create_engine function
create_engine(
'duckdb:///:memory:',
connect_args={
'read_only': True,
'config': {
'memory_limit': '500mb'
}
}
)
The supported configuration parameters are listed in the DuckDB docs
conn = create_engine("duckdb:///:memory:").connect()
# with SQLAlchemy 1.3
conn.execute("register", ("dataframe_name", pd.DataFrame(...)))
# with SQLAlchemy 1.4+
conn.execute(text("register(:name, :df)"), {"name": "test_df", "df": df})
conn.execute("select * from dataframe_name")
Duckdb's SQL parser is based on the PostgreSQL parser, but not all features in PostgreSQL are supported in duckdb. Because the duckdb_engine
dialect is derived from the postgresql
dialect, SQLAlchemy
may try to use PostgreSQL-only features. Below are some caveats to look out for.
When defining an Integer column as a primary key, SQLAlchemy
uses the SERIAL
datatype for PostgreSQL. Duckdb does not yet support this datatype because it's a non-standard PostgreSQL legacy type, so a workaround is to use the SQLAlchemy.Sequence()
object to auto-increment the key. For more information on sequences, you can find the SQLAlchemy Sequence
documentation here.
The following example demonstrates how to create an auto-incrementing ID column for a simple table:
>>> import sqlalchemy
>>> engine = sqlalchemy.create_engine('duckdb:////path/to/duck.db')
>>> metadata = sqlalchemy.MetaData(engine)
>>> user_id_seq = sqlalchemy.Sequence('user_id_seq')
>>> users_table = sqlalchemy.Table(
... 'users',
... metadata,
... sqlalchemy.Column(
... 'id',
... sqlalchemy.Integer,
... user_id_seq,
... server_default=user_id_seq.next_value(),
... primary_key=True,
... ),
... )
>>> metadata.create_all(bind=engine)
NOTE: this is no longer an issue in versions >=0.5.0
of duckdb
The pandas.read_sql()
method can read tables from duckdb_engine
into DataFrames, but the sqlalchemy.engine.result.ResultProxy
trips up when fetchmany()
is called. Therefore, for now chunksize=None
(default) is necessary when reading duckdb tables into DataFrames. For example:
>>> import pandas as pd
>>> import sqlalchemy
>>> engine = sqlalchemy.create_engine('duckdb:////path/to/duck.db')
>>> df = pd.read_sql('users', engine) ### Works as expected
>>> df = pd.read_sql('users', engine, chunksize=25) ### Throws an exception
Unsigned integers are supported by DuckDB, and are available in duckdb_engine.datatypes
.
SQLAlchemy's companion library alembic
can optionally be used to manage database migrations.
This support can be enabling by adding an Alembic implementation class for the duckdb
dialect.
from alembic.ddl.impl import DefaultImpl
class AlembicDuckDBImpl(DefaultImpl):
"""Alembic implementation for DuckDB."""
__dialect__ = "duckdb"
After loading this class with your program, Alembic will no longer raise an error when generating or applying migrations.
DuckDB 0.9.0+ includes builtin support for autoinstalling and autoloading of extensions, see the extension documentation for more information.
Until the DuckDB python client allows you to natively preload extensions, I've added experimental support via a connect_args
parameter
from sqlalchemy import create_engine
create_engine(
'duckdb:///:memory:',
connect_args={
'preload_extensions': ['https'],
'config': {
's3_region': 'ap-southeast-1'
}
}
)
DuckDB 0.8.1+ includes builtin support for user-defined function (UDF) , see the extension documentation for more information.
Until the DuckDB python client allows you to natively preload functions, I've added experimental support via a connect_args
parameter
from sqlalchemy import create_engine
create_engine(
'duckdb:///:memory:',
connect_args={
"preload_functions": [
"/data/path/to/udf.py"
]
}
)
and here is a very simple udf.py
,
def add_built_in_type(x:int) -> int:
return x + 1
def create_functions(conn):
conn.create_function('add_built_in_type', add_built_in_type, ['BIGINT'], 'BIGINT', type='native')
Yes, I'm aware this package should be named duckdb-driver
or something, I wasn't thinking when I named it and it's too hard to change the name now