Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add pyexasol datasource, ensure that integer dont overflow in javascript #4378

Merged
merged 2 commits into from
Nov 27, 2019

Conversation

stefan-mees
Copy link
Contributor

What type of PR is this? (check all applicable)

  • New Query Runner (Data Source)

Description

Adds Exasol as datasource to Redash, we are running our own fork with this connector since a while and would like to contribute it back to the mainline.

Copy link
Member

@arikfr arikfr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! It's great we can finally add Exasol to the list of supported databases without using the more compelx setup of the ODBC driver.

I've added some comments ⬇

requirements.txt Outdated Show resolved Hide resolved
redash/query_runner/exasol.py Outdated Show resolved Hide resolved
redash/query_runner/exasol.py Outdated Show resolved Hide resolved
redash/query_runner/exasol.py Outdated Show resolved Hide resolved
redash/query_runner/exasol.py Outdated Show resolved Hide resolved
redash/query_runner/exasol.py Outdated Show resolved Hide resolved
@@ -32,3 +32,5 @@ phoenixdb==0.7
certifi>=2019.9.11
pydgraph==2.0.2
azure-kusto-data==0.0.35
pyexasol==0.9.1
python-rapidjson==0.8.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you notice any noticeable performance difference when using this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its recommended in the best practices of pyexasol, i have not done any benchmarking on this by myself.

https://github.com/exasol/pyexasol/blob/master/docs/BEST_PRACTICES.md#consider-faster-json-parsing-libraries

Copy link
Member

@arikfr arikfr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍 Will merge once CI passes.

@arikfr arikfr merged commit e82373a into getredash:master Nov 27, 2019
@littleK0i
Copy link

littleK0i commented Dec 13, 2019

@stefan-mees , @arikfr , Hi guys,

PyEXASOL creator is here. Thank you for the pull request!

Let's see if we can improve this code a little bit:

  • Regarding to rapidjson library. It improves performance of json decoding by ~10-30% in ideal case scenario. But in this query runner the response is encoded in JSON format again by redash.utils.json_dumps, so serialisation overhead is doubled or even tripled. In this case rapidjson won't really help much. Maybe it's not worth to add it as a mandatory dependency.

  • Regarding to _exasol_type_mapper. Please note that creation of datetime.date and datetime.datetime objects is quite expensive. If you have a lot of DATE / TIMESTAMP columns in result set, it may take more CPU resources than all other actions. But we do not really need such objects. They are serialised into strings again before returning the result set. Maybe it's possible to simplify mapper or even avoid using it altogether.

  • It might be worth adding more configurable parameters (full reference). At very least we need compression (it should be False in the cloud and in the same data center, it should be True for remote data center or WiFi connection), encryption (security), socket_timeout, schema, http_proxy.

  • Please note that having separate host and port parameters might be a bad idea in the long run. Currently it works fine, but starting from Exasol 7.0+ we'll have an ability to create multiple read-only clusters, and each of them might have a different port. DSN might look like this: cluster1exa1..10:8563,cluster2exa1..10:8564. It might be a good idea to keep this parameter as pure dsn and pass it directly to pyexasol.connect() without concatenation.

  • It is advised to add /*snapshot execution*/ prefix to this SQL query:

        SELECT
            COLUMN_SCHEMA,
            COLUMN_TABLE,
            COLUMN_NAME
        FROM EXA_ALL_COLUMNS

This is a new feature described in https://www.exasol.com/support/browse/IDEA-476. It will help to prevent locks while retrieiving large amounts of meta data.

  • Variable error does not seem to be populated by exceptions. It might be worth adding a basic except block catching exceptions and returning text message to ReDash. Please note that PyEXASOL exceptions are instances of pyexasol.ExaException, but sometimes we may get OSException from underlying WebSocket / socket networking layer. If default error messages are too verbose for ReDash, you may set verbose_error=False to get only text without extra information.

I would patch it myself, but we do not currently use ReDash in Badoo, so I cannot test those changes on prod long term.

Thank you! Have a nice day.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants