Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Superset Clickhouse driver #13285

Closed
suedschwede opened this issue Feb 22, 2021 · 8 comments
Closed

Superset Clickhouse driver #13285

suedschwede opened this issue Feb 22, 2021 · 8 comments
Assignees
Labels
data:connect:clickhouse Related to Clickhouse doc:user User / Superset documentation

Comments

@suedschwede
Copy link

At the moment there are two superset drivers "clickhouse-sqlalchemy","sqlalchemy-clickhouse"

There are problems with both drivers and that leads to a lot of questions

sqlalchemy-clickhouse

  • Table with 128 Columns - Datasets sync of column does not work correctly - it seems there is a maximum columns you can use
  • If the default user has a password you must use "infi.clickhouse_orm==1.0.4"
  • using a dictionary you get "Python int too large to convert to C long"

clickhouse-sqlalchemy

  • Sync of columns you get wrong data types e.g.: "NULLABLE(STRING)" instead of "STRING"

For me the only stable driver is "clickhouse-sqlalchemy" with the native interface

The documentation should recommend "clickhouse-sqlalchemy" instead of "sqlalchemy-clickhouse"
https://superset.apache.org/docs/databases/installing-database-drivers

It would be nice if the columns sync works correctly with "clickhouse-sqlalchemy"

@junlincc junlincc added the data:connect:clickhouse Related to Clickhouse label Feb 23, 2021
@junlincc
Copy link
Member

@srinify ^ 🙏

@junlincc junlincc added the doc:user User / Superset documentation label Feb 23, 2021
@hodgesrm
Copy link

Hello @suedschwede, I'm testing this case and need some advice on the problem you are seeing. To check datatypes I made a small table and added a row to enable queries.

CREATE TABLE superset_test (
    `basic_string` String,
    `nullable_string` Nullable(String),
    `lowcard_string` LowCardinality(String),
    `basic_datetime` DateTime,
    `nullable_datetime` Nullable(DateTime)
) ENGINE = TinyLog;

INSERT INTO superset_test VALUES('basic', 'nullable', 'lowcard', now(), now());

I can see the effect of syncing columns in the Edit Dataset view. The data types that return are STRING, NULLABLE(STRING), LOWCARDINALITY(STRING), DATETIME, NULLABLE(DATETIME). It looks as if the SQLAlchemy driver is getting these out of system.columns. A couple of questions:

  1. Is the expected behavior to see just STRING and DATETIME? Or would you expect SQL types like VARCHAR?
  2. Are there cases where these vendor-specific datatypes cause failures?
  3. Are there other places where these types appear and cause problems?

It looks as Sqllite returns standard SQL types by comparison.

Thanks! I appended a copy of the Edit Dataset view mentioned above.

Cheers, Robert
ClickHouse_Superset_Column_Sync

@hodgesrm
Copy link

hodgesrm commented Feb 25, 2021

Adding to the original request by @suedschwede, I strongly recommend switching to clickhouse-sqlalchemy as the standard driver for ClickHouse. The reasons for this switch are as follows.

  1. clickhouse-sqlalchemy is actively maintained (last commit 18 Feb 2021, under active development).
  2. Releases are pushed regularly to pypi.org (last push 14 Dec 2020).
  3. It has what appears to be a fairly complete SQLAlchemy implementation.
  4. It supports both HTTP and native TCP wire protocols (vs. HTTP only)
  5. It has a good security including support for TLS, SNI, and configurable certificate verification (vs. no encryption support)
  6. It has a good suite of tests that check SQLAlchemy interfaces.

To the extent that there are compatibility issues like the one discussed here, we (Altinity) can get them fixed fairly quickly. We currently recommend clickhouse-sqlalchemy and clickhouse-driver to our users and have made fixes to them in the past, most recently in December.

What are the next steps to making this switch?

@suedschwede
Copy link
Author

If you try to make a date/time filter to NULLABLE(DATE) you get an error in superset
image

The expected behavior is to see just STRING and DATETIME

NULLABLE(STRING) -> STRING
NULLABLE(DATE) -> DATE
NULLABLE(DATETIME) -> DATETIME

@villebro
Copy link
Member

We're currently working on improving type inference (#13294). We'll make it top priority to enable this for Clickhouse as soon as possible.

@betodealmeida betodealmeida self-assigned this Mar 17, 2021
@HeinzMayer
Copy link

I have created a temporary fix on my environment
clickhouse.py

It seems that the master branch does not work with clickhouse-sqlalchemy==0.1.6
Discussion

@betodealmeida
Copy link
Member

Superset now uses clickhouse-sqlalchemy for Clickhouse.

@royxact
Copy link

royxact commented Nov 25, 2021

Hi @villebro,

I see the issue is still happening in the current superset version, is there a fix planned, or HeinzMayer temporary fix the current solution?

Thanks!
Roy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
data:connect:clickhouse Related to Clickhouse doc:user User / Superset documentation
Projects
None yet
Development

No branches or pull requests

7 participants