feat: Add Apache Doris support #24714

Merged on Nov 21, 2023 (12 commits)

1 change: 1 addition & 0 deletions README.md
@@ -130,6 +130,7 @@ Here are some of the major database solutions that are supported:
<img src="superset-frontend/src/assets/images/yugabyte.png" alt="yugabyte" border="0" width="200" height="80"/>
<img src="superset-frontend/src/assets/images/databend.png" alt="databend" border="0" width="200" height="80"/>
<img src="superset-frontend/src/assets/images/starrocks.png" alt="starrocks" border="0" width="200" height="80"/>
<img src="superset-frontend/src/assets/images/doris.png" alt="doris" border="0" width="200" height="80"/>
</p>

**A more comprehensive list of supported databases** along with the configuration instructions can be found [here](https://superset.apache.org/docs/databases/installing-database-drivers).
26 changes: 26 additions & 0 deletions docs/docs/databases/doris.mdx
@@ -0,0 +1,26 @@
---
title: Apache Doris
hide_title: true
sidebar_position: 5
version: 1
---

## Doris

The [pydoris](https://pypi.org/project/pydoris/) library is the recommended way to connect to Apache Doris through SQLAlchemy.

You'll need the following setting values to form the connection string:

- **User**: User Name
- **Password**: Password
- **Host**: Doris FE Host
- **Port**: Doris FE port
- **Catalog**: Catalog Name
- **Database**: Database Name


Here's what the connection string looks like:

```
doris://<User>:<Password>@<Host>:<Port>/<Catalog>.<Database>
```
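
The placeholders above can be filled in programmatically. The following is a minimal sketch using only the standard library; the helper name `build_doris_uri` and all sample values (host, credentials, catalog, database) are hypothetical and chosen for illustration. It percent-encodes the user name and password so that characters such as `@` or `/` do not break the URI.

```python
from urllib.parse import quote


def build_doris_uri(user, password, host, port, catalog, database):
    """Assemble a doris:// URI, percent-encoding the credentials."""
    return (
        f"doris://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{catalog}.{database}"
    )


# Sample values are made up; replace them with your own FE host and port.
uri = build_doris_uri(
    "admin", "p@ss/word", "doris-fe.example.com", 9030, "internal", "sales"
)
print(uri)
```

The resulting string is what you would paste into the SQLAlchemy URI field in Superset.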
1 change: 1 addition & 0 deletions docs/docs/databases/installing-database-drivers.mdx
@@ -25,6 +25,7 @@ Some of the recommended packages are shown below. Please refer to [setup.py](htt
| Database | PyPI package | Connection String |
| --------------------------------------------------------- | ---------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ |
| [Amazon Athena](/docs/databases/athena) | `pip install pyathena[pandas]` , `pip install PyAthenaJDBC` | `awsathena+rest://{aws_access_key_id}:{aws_secret_access_key}@athena.{region_name}.amazonaws.com/{schema_name}?s3_staging_dir={s3_staging_dir}&... ` |
| [Apache Doris](/docs/databases/doris) | `pip install pydoris` | `doris://<User>:<Password>@<Host>:<Port>/<Catalog>.<Database>` |
| [Amazon DynamoDB](/docs/databases/dynamodb) | `pip install pydynamodb` | `dynamodb://{access_key_id}:{secret_access_key}@dynamodb.{region_name}.amazonaws.com?connector=superset` |
| [Amazon Redshift](/docs/databases/redshift) | `pip install sqlalchemy-redshift` | ` redshift+psycopg2://<userName>:<DBPassword>@<AWS End Point>:5439/<Database Name>` |
| [Apache Drill](/docs/databases/drill) | `pip install sqlalchemy-drill` | `drill+sadrill:// For JDBC drill+jdbc://` |
5 changes: 5 additions & 0 deletions docs/src/resources/data.js
@@ -117,4 +117,9 @@ export const Databases = [
    href: 'https://www.microsoft.com/en-us/sql-server',
    imgName: 'msql.png',
  },
  {
    title: 'Apache Doris',
    href: 'https://doris.apache.org/',
    imgName: 'doris.png',
  },
];
Binary file added docs/static/img/databases/doris.png
1 change: 1 addition & 0 deletions setup.py
@@ -205,6 +205,7 @@ def get_git_sha() -> str:
        "vertica": ["sqlalchemy-vertica-python>=0.5.9, < 0.6"],
        "netezza": ["nzalchemy>=11.0.2"],
        "starrocks": ["starrocks>=1.0.0"],
        "doris": ["pydoris>=1.0.0, <2.0.0"],
    },
    python_requires="~=3.9",
    author="Apache Software Foundation",
Binary file added superset-frontend/src/assets/images/doris.png
278 changes: 278 additions & 0 deletions superset/db_engine_specs/doris.py
@@ -0,0 +1,278 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
import logging
import re
from re import Pattern
from typing import Any, Optional
from urllib import parse

from flask_babel import gettext as __
from sqlalchemy import Float, Integer, Numeric, String, TEXT, types
from sqlalchemy.engine.url import URL
from sqlalchemy.sql.type_api import TypeEngine

from superset.db_engine_specs.mysql import MySQLEngineSpec
from superset.errors import SupersetErrorType
from superset.utils.core import GenericDataType

# Regular expressions to catch custom errors
CONNECTION_ACCESS_DENIED_REGEX = re.compile(
    "Access denied for user '(?P<username>.*?)'"
)
CONNECTION_INVALID_HOSTNAME_REGEX = re.compile(
    "Unknown Doris server host '(?P<hostname>.*?)'"
)
CONNECTION_UNKNOWN_DATABASE_REGEX = re.compile("Unknown database '(?P<database>.*?)'")
CONNECTION_HOST_DOWN_REGEX = re.compile(
    "Can't connect to Doris server on '(?P<hostname>.*?)'"
)
SYNTAX_ERROR_REGEX = re.compile(
    "check the manual that corresponds to your MySQL server "
    "version for the right syntax to use near '(?P<server_error>.*)"
)
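
To show how the named groups in these patterns feed the error templates further down, here is a small standalone sketch. Two of the patterns are copied from the module; the error messages and the `extract` helper are hypothetical, invented for illustration, and are not taken from real Doris output.

```python
import re

# Patterns copied from the module above.
CONNECTION_ACCESS_DENIED_REGEX = re.compile(
    "Access denied for user '(?P<username>.*?)'"
)
CONNECTION_UNKNOWN_DATABASE_REGEX = re.compile("Unknown database '(?P<database>.*?)'")


def extract(pattern, message):
    """Return the named groups captured from message, or None if no match."""
    match = pattern.search(message)
    return match.groupdict() if match else None


# Hypothetical driver messages, made up for this example.
denied = "(1045, \"Access denied for user 'alice'@'10.0.0.5' (using password: YES)\")"
unknown_db = "(1049, \"Unknown database 'sales'\")"

print(extract(CONNECTION_ACCESS_DENIED_REGEX, denied))
print(extract(CONNECTION_UNKNOWN_DATABASE_REGEX, unknown_db))
```

The captured dictionary keys (`username`, `database`) line up with the `%(username)s` and `%(database)s` placeholders used in the `custom_errors` messages.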

logger = logging.getLogger(__name__)


class TINYINT(Integer):
    __visit_name__ = "TINYINT"


class LARGEINT(Integer):
    __visit_name__ = "LARGEINT"


class DOUBLE(Float):
    __visit_name__ = "DOUBLE"


class HLL(Numeric):
    __visit_name__ = "HLL"


class BITMAP(Numeric):
    __visit_name__ = "BITMAP"


class QuantileState(Numeric):
    __visit_name__ = "QUANTILE_STATE"


class AggState(Numeric):
    __visit_name__ = "AGG_STATE"


class ARRAY(TypeEngine):
    __visit_name__ = "ARRAY"

    @property
    def python_type(self) -> Optional[type[list[Any]]]:
        return list


class MAP(TypeEngine):
    __visit_name__ = "MAP"

    @property
    def python_type(self) -> Optional[type[dict[Any, Any]]]:
        return dict


class STRUCT(TypeEngine):
    __visit_name__ = "STRUCT"

    @property
    def python_type(self) -> Optional[type[Any]]:
        return None


class DorisEngineSpec(MySQLEngineSpec):
    engine = "pydoris"
    engine_aliases = {"doris"}
    engine_name = "Apache Doris"
    max_column_name_length = 64
    default_driver = "pydoris"
    sqlalchemy_uri_placeholder = (
        "doris://user:password@host:port/catalog.db[?key=value&key=value...]"
    )
    encryption_parameters = {"ssl": "0"}
    supports_dynamic_schema = True

    column_type_mappings = (  # type: ignore
        (
            re.compile(r"^tinyint", re.IGNORECASE),
            TINYINT(),
            GenericDataType.NUMERIC,
        ),
        (
            re.compile(r"^largeint", re.IGNORECASE),
            LARGEINT(),
            GenericDataType.NUMERIC,
        ),
        (
            re.compile(r"^decimal.*", re.IGNORECASE),
            types.DECIMAL(),
            GenericDataType.NUMERIC,
        ),
        (
            re.compile(r"^double", re.IGNORECASE),
            DOUBLE(),
            GenericDataType.NUMERIC,
        ),
        (
            re.compile(r"^varchar(\((\d+)\))*$", re.IGNORECASE),
            types.VARCHAR(),
            GenericDataType.STRING,
        ),
        (
            re.compile(r"^char(\((\d+)\))*$", re.IGNORECASE),
            types.CHAR(),
            GenericDataType.STRING,
        ),
        (
            re.compile(r"^json.*", re.IGNORECASE),
            types.JSON(),
            GenericDataType.STRING,
        ),
        (
            re.compile(r"^binary.*", re.IGNORECASE),
            types.BINARY(),
            GenericDataType.STRING,
        ),
        (
            re.compile(r"^quantile_state", re.IGNORECASE),
            QuantileState(),
            GenericDataType.STRING,
        ),
        (
            re.compile(r"^agg_state.*", re.IGNORECASE),
            AggState(),
            GenericDataType.STRING,
        ),
        (re.compile(r"^hll", re.IGNORECASE), HLL(), GenericDataType.STRING),
        (
            re.compile(r"^bitmap", re.IGNORECASE),
            BITMAP(),
            GenericDataType.STRING,
        ),
        (
            re.compile(r"^array.*", re.IGNORECASE),
            ARRAY(),
            GenericDataType.STRING,
        ),
        (
            re.compile(r"^map.*", re.IGNORECASE),
            MAP(),
            GenericDataType.STRING,
        ),
        (
            re.compile(r"^struct.*", re.IGNORECASE),
            STRUCT(),
            GenericDataType.STRING,
        ),
        (
            re.compile(r"^datetime.*", re.IGNORECASE),
            types.DATETIME(),
            GenericDataType.STRING,
        ),
        (
            re.compile(r"^date.*", re.IGNORECASE),
            types.DATE(),
            GenericDataType.STRING,
        ),
        (
            re.compile(r"^text.*", re.IGNORECASE),
            TEXT(),
            GenericDataType.STRING,
        ),
        (
            re.compile(r"^string.*", re.IGNORECASE),
            String(),
            GenericDataType.STRING,
        ),
    )

    custom_errors: dict[Pattern[str], tuple[str, SupersetErrorType, dict[str, Any]]] = {
        CONNECTION_ACCESS_DENIED_REGEX: (
            __('Either the username "%(username)s" or the password is incorrect.'),
            SupersetErrorType.CONNECTION_ACCESS_DENIED_ERROR,
            {"invalid": ["username", "password"]},
        ),
        CONNECTION_INVALID_HOSTNAME_REGEX: (
            __('Unknown Doris server host "%(hostname)s".'),
            SupersetErrorType.CONNECTION_INVALID_HOSTNAME_ERROR,
            {"invalid": ["host"]},
        ),
        CONNECTION_HOST_DOWN_REGEX: (
            __('The host "%(hostname)s" might be down and can\'t be reached.'),
            SupersetErrorType.CONNECTION_HOST_DOWN_ERROR,
            {"invalid": ["host", "port"]},
        ),
        CONNECTION_UNKNOWN_DATABASE_REGEX: (
            __('Unable to connect to database "%(database)s".'),
            SupersetErrorType.CONNECTION_UNKNOWN_DATABASE_ERROR,
            {"invalid": ["database"]},
        ),
        SYNTAX_ERROR_REGEX: (
            __(
                'Please check your query for syntax errors near "%(server_error)s". '
                "Then, try running your query again."
            ),
            SupersetErrorType.SYNTAX_ERROR,
            {},
        ),
    }
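
The order of entries in a mapping like this can matter. The sketch below assumes the lookup walks the tuple top to bottom and returns the first pattern that matches (an assumption about how the mapping is consumed, not something shown in this diff). Under that assumption, `^datetime.*` has to come before `^date.*`, because the shorter pattern also matches every `datetime` type. The `first_match` helper and the string labels are hypothetical stand-ins for the SQLAlchemy types.

```python
import re

# Two of the patterns from the mapping above, in the same order.
ORDERED_PATTERNS = (
    (re.compile(r"^datetime.*", re.IGNORECASE), "DATETIME"),
    (re.compile(r"^date.*", re.IGNORECASE), "DATE"),
)


def first_match(native_type, patterns=ORDERED_PATTERNS):
    """Return the label of the first pattern matching native_type."""
    for pattern, label in patterns:
        if pattern.match(native_type):
            return label
    return None


print(first_match("datetime(3)"))  # matched by the datetime pattern
print(first_match("DATE"))  # falls through to the date pattern

# Reversing the order shows the shadowing problem.
reversed_patterns = tuple(reversed(ORDERED_PATTERNS))
print(first_match("datetime(3)", reversed_patterns))
```

With the reversed order, `datetime(3)` is labelled as a plain date, which is why the more specific pattern is listed first.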

    @classmethod
    def adjust_engine_params(
        cls,
        uri: URL,
        connect_args: dict[str, Any],
        catalog: Optional[str] = None,
        schema: Optional[str] = None,
    ) -> tuple[URL, dict[str, Any]]:
        database = uri.database
        if schema and database:
            schema = parse.quote(schema, safe="")
            if "." in database:
                database = database.split(".")[0] + "." + schema
            else:
                database = "internal." + schema
            uri = uri.set(database=database)

        return uri, connect_args
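
The rewrite rule in `adjust_engine_params` can be exercised without SQLAlchemy by applying the same logic to plain strings. This is a sketch only: `rewrite_database` is a hypothetical helper that mirrors the branch structure above, and the catalog and schema names are made up.

```python
from typing import Optional
from urllib import parse


def rewrite_database(database: Optional[str], schema: Optional[str]) -> Optional[str]:
    """Mirror the catalog.schema rewrite on plain strings."""
    if schema and database:
        schema = parse.quote(schema, safe="")
        if "." in database:
            # Keep the catalog from the URI, swap in the requested schema.
            return database.split(".")[0] + "." + schema
        # No catalog in the URI: fall back to the "internal" catalog.
        return "internal." + schema
    # Nothing to adjust when either part is missing.
    return database


print(rewrite_database("hive.default", "sales"))
print(rewrite_database("default", "sales"))
print(rewrite_database("hive.default", None))
```

The last two branches are the ones the coverage report flagged as untested in the PR, so cases like these may be useful starting points for unit tests.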

    @classmethod
    def get_schema_from_engine_params(
        cls,
        sqlalchemy_uri: URL,
        connect_args: dict[str, Any],
    ) -> Optional[str]:
        """
        Return the configured schema.

        For doris the SQLAlchemy URI looks like this:

            doris://localhost:9030/catalog.database

        """
        database = sqlalchemy_uri.database.strip("/")

        if "." not in database:
            return None

        return parse.unquote(database.split(".")[1])
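
The schema extraction can likewise be tried on plain strings. The helper name `schema_from_database` and the sample values below are hypothetical; the body follows the method above line for line.

```python
from typing import Optional
from urllib import parse


def schema_from_database(database: str) -> Optional[str]:
    """Return the schema part of a 'catalog.schema' database string."""
    database = database.strip("/")
    if "." not in database:
        return None
    return parse.unquote(database.split(".")[1])


print(schema_from_database("internal.sales"))
print(schema_from_database("sales"))
print(schema_from_database("hive.my%20schema"))
```

One behaviour worth noting: `parse.quote` never escapes a dot, so a schema whose name itself contains a dot would be truncated at that dot by `split(".")[1]`. Whether such names occur in practice depends on the deployment.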