Skip to content

Commit

Permalink
feat: Add validate_sql_json endpoint for checking that a given sql …
Browse files Browse the repository at this point in the history
…query is valid for the chosen database (#7422) (#7462)

merge from lyft-release-sp8 to master
  • Loading branch information
bearcage authored and betodealmeida committed May 6, 2019
1 parent 5cf454b commit 2497048
Show file tree
Hide file tree
Showing 8 changed files with 605 additions and 15 deletions.
40 changes: 26 additions & 14 deletions docs/installation.rst
Original file line number Diff line number Diff line change
Expand Up @@ -439,8 +439,8 @@ The connection string for Teradata looks like this ::

Required environment variables: ::

export ODBCINI=/.../teradata/client/ODBC_64/odbc.ini
export ODBCINST=/.../teradata/client/ODBC_64/odbcinst.ini
export ODBCINI=/.../teradata/client/ODBC_64/odbc.ini
export ODBCINST=/.../teradata/client/ODBC_64/odbcinst.ini

See `Teradata SQLAlchemy <https://github.com/Teradata/sqlalchemy-teradata>`_.

Expand Down Expand Up @@ -811,6 +811,19 @@ in this dictionary are made available for users to use in their SQL.
'my_crazy_macro': lambda x: x*2,
}
SQL Lab also includes a live query validation feature with pluggable backends.
You can configure which validation implementation is used with which database
engine by adding a block like the following to your config.py:

.. code-block:: python
FEATURE_FLAGS = {
'SQL_VALIDATORS_BY_ENGINE': {
'presto': 'PrestoDBSQLValidator',
}
}
The available validators and names can be found in `sql_validators/`.

**Scheduling queries**

You can optionally allow your users to schedule queries directly in SQL Lab.
Expand Down Expand Up @@ -967,7 +980,7 @@ Note that the above command will install Superset into ``default`` namespace of
Custom OAuth2 configuration
---------------------------

Beyond FAB supported providers (github, twitter, linkedin, google, azure), its easy to connect Superset with other OAuth2 Authorization Server implementations that support "code" authorization.
Beyond FAB supported providers (github, twitter, linkedin, google, azure), its easy to connect Superset with other OAuth2 Authorization Server implementations that support "code" authorization.

The first step: Configure authorization in Superset ``superset_config.py``.

Expand All @@ -986,36 +999,36 @@ The first step: Configure authorization in Superset ``superset_config.py``.
},
'access_token_method':'POST', # HTTP Method to call access_token_url
'access_token_params':{ # Additional parameters for calls to access_token_url
'client_id':'myClientId'
'client_id':'myClientId'
},
'access_token_headers':{ # Additional headers for calls to access_token_url
'Authorization': 'Basic Base64EncodedClientIdAndSecret'
'access_token_headers':{ # Additional headers for calls to access_token_url
'Authorization': 'Basic Base64EncodedClientIdAndSecret'
},
'base_url':'https://myAuthorizationServer/oauth2AuthorizationServer/',
'access_token_url':'https://myAuthorizationServer/oauth2AuthorizationServer/token',
'authorize_url':'https://myAuthorizationServer/oauth2AuthorizationServer/authorize'
}
}
]
# Will allow user self registration, allowing to create Flask users from Authorized User
AUTH_USER_REGISTRATION = True
# The default user self registration role
AUTH_USER_REGISTRATION_ROLE = "Public"
Second step: Create a `CustomSsoSecurityManager` that extends `SupersetSecurityManager` and overrides `oauth_user_info`:

.. code-block:: python
from superset.security import SupersetSecurityManager
class CustomSsoSecurityManager(SupersetSecurityManager):
def oauth_user_info(self, provider, response=None):
logging.debug("Oauth2 provider: {0}.".format(provider))
if provider == 'egaSSO':
# As example, this line request a GET to base_url + '/' + userDetails with Bearer Authentication,
# As example, this line request a GET to base_url + '/' + userDetails with Bearer Authentication,
# and expects that authorization server checks the token, and response with user details
me = self.appbuilder.sm.oauth_remotes[provider].get('userDetails').data
logging.debug("user_data: {0}".format(me))
Expand All @@ -1027,7 +1040,6 @@ This file must be located at the same directory than ``superset_config.py`` with
Then we can add this two lines to ``superset_config.py``:

.. code-block:: python
from custom_sso_security_manager import CustomSsoSecurityManager
CUSTOM_SECURITY_MANAGER = CustomSsoSecurityManager
7 changes: 7 additions & 0 deletions superset/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -420,6 +420,9 @@ class CeleryConfig(object):
# Timeout duration for SQL Lab synchronous queries
SQLLAB_TIMEOUT = 30

# Timeout duration for SQL Lab query validation
SQLLAB_VALIDATION_TIMEOUT = 10

# SQLLAB_DEFAULT_DBID
SQLLAB_DEFAULT_DBID = None

Expand Down Expand Up @@ -608,6 +611,10 @@ class CeleryConfig(object):
# localtime (in the tz where the superset webserver is running)
IS_EPOCH_S_TRULY_UTC = False

# Configure which SQL validator to use for each engine
SQL_VALIDATORS_BY_ENGINE = {
'presto': 'PrestoDBSQLValidator',
}

try:
if CONFIG_PATH_ENV_VAR in os.environ:
Expand Down
27 changes: 27 additions & 0 deletions superset/sql_validators/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
from typing import Optional

from . import base # noqa
from . import presto_db # noqa
from .base import SQLValidationAnnotation # noqa


def get_validator_by_name(name: str) -> Optional[base.BaseSQLValidator]:
return {
'PrestoDBSQLValidator': presto_db.PrestoDBSQLValidator,
}.get(name)
66 changes: 66 additions & 0 deletions superset/sql_validators/base.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# pylint: disable=too-few-public-methods

from typing import (
Any,
Dict,
List,
Optional,
)


class SQLValidationAnnotation:
"""Represents a single annotation (error/warning) in an SQL querytext"""
def __init__(
self,
message: str,
line_number: Optional[int],
start_column: Optional[int],
end_column: Optional[int],
):
self.message = message
self.line_number = line_number
self.start_column = start_column
self.end_column = end_column

def to_dict(self) -> Dict:
"""Return a dictionary representation of this annotation"""
return {
'line_number': self.line_number,
'start_column': self.start_column,
'end_column': self.end_column,
'message': self.message,
}


class BaseSQLValidator:
"""BaseSQLValidator defines the interface for checking that a given sql
query is valid for a given database engine."""

name = 'BaseSQLValidator'

@classmethod
def validate(
cls,
sql: str,
schema: str,
database: Any,
) -> List[SQLValidationAnnotation]:
"""Check that the given SQL querystring is valid for the given engine"""
raise NotImplementedError
186 changes: 186 additions & 0 deletions superset/sql_validators/presto_db.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,186 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

from contextlib import closing
import logging
import time
from typing import (
Any,
Dict,
List,
Optional,
)

from flask import g
from pyhive.exc import DatabaseError

from superset import app, security_manager
from superset.sql_parse import ParsedQuery
from superset.sql_validators.base import (
BaseSQLValidator,
SQLValidationAnnotation,
)
from superset.utils.core import sources

MAX_ERROR_ROWS = 10

config = app.config


class PrestoSQLValidationError(Exception):
"""Error in the process of asking Presto to validate SQL querytext"""


class PrestoDBSQLValidator(BaseSQLValidator):
"""Validate SQL queries using Presto's built-in EXPLAIN subtype"""

name = 'PrestoDBSQLValidator'

@classmethod
def validate_statement(
cls,
statement,
database,
cursor,
user_name,
) -> Optional[SQLValidationAnnotation]:
# pylint: disable=too-many-locals
db_engine_spec = database.db_engine_spec
parsed_query = ParsedQuery(statement)
sql = parsed_query.stripped()

# Hook to allow environment-specific mutation (usually comments) to the SQL
# pylint: disable=invalid-name
SQL_QUERY_MUTATOR = config.get('SQL_QUERY_MUTATOR')
if SQL_QUERY_MUTATOR:
sql = SQL_QUERY_MUTATOR(sql, user_name, security_manager, database)

# Transform the final statement to an explain call before sending it on
# to presto to validate
sql = f'EXPLAIN (TYPE VALIDATE) {sql}'

# Invoke the query against presto. NB this deliberately doesn't use the
# engine spec's handle_cursor implementation since we don't record
# these EXPLAIN queries done in validation as proper Query objects
# in the superset ORM.
try:
db_engine_spec.execute(cursor, sql)
polled = cursor.poll()
while polled:
logging.info('polling presto for validation progress')
stats = polled.get('stats', {})
if stats:
state = stats.get('state')
if state == 'FINISHED':
break
time.sleep(0.2)
polled = cursor.poll()
db_engine_spec.fetch_data(cursor, MAX_ERROR_ROWS)
return None
except DatabaseError as db_error:
# The pyhive presto client yields EXPLAIN (TYPE VALIDATE) responses
# as though they were normal queries. In other words, it doesn't
# know that errors here are not exceptional. To map this back to
# ordinary control flow, we have to trap the category of exception
# raised by the underlying client, match the exception arguments
# pyhive provides against the shape of dictionary for a presto query
# invalid error, and restructure that error as an annotation we can
# return up.

# Confirm the first element in the DatabaseError constructor is a
# dictionary with error information. This is currently provided by
# the pyhive client, but may break if their interface changes when
# we update at some point in the future.
if not db_error.args or not isinstance(db_error.args[0], dict):
raise PrestoSQLValidationError(
'The pyhive presto client returned an unhandled '
'database error.',
) from db_error
error_args: Dict[str, Any] = db_error.args[0]

# Confirm the two fields we need to be able to present an annotation
# are present in the error response -- a message, and a location.
if 'message' not in error_args:
raise PrestoSQLValidationError(
'The pyhive presto client did not report an error message',
) from db_error
if 'errorLocation' not in error_args:
raise PrestoSQLValidationError(
'The pyhive presto client did not report an error location',
) from db_error

# Pylint is confused about the type of error_args, despite the hints
# and checks above.
# pylint: disable=invalid-sequence-index
message = error_args['message']
err_loc = error_args['errorLocation']
line_number = err_loc.get('lineNumber', None)
start_column = err_loc.get('columnNumber', None)
end_column = err_loc.get('columnNumber', None)

return SQLValidationAnnotation(
message=message,
line_number=line_number,
start_column=start_column,
end_column=end_column,
)
except Exception as e:
logging.exception(f'Unexpected error running validation query: {e}')
raise e

@classmethod
def validate(
cls,
sql: str,
schema: str,
database: Any,
) -> List[SQLValidationAnnotation]:
"""
Presto supports query-validation queries by running them with a
prepended explain.
For example, "SELECT 1 FROM default.mytable" becomes "EXPLAIN (TYPE
VALIDATE) SELECT 1 FROM default.mytable.
"""
user_name = g.user.username if g.user else None
parsed_query = ParsedQuery(sql)
statements = parsed_query.get_statements()

logging.info(f'Validating {len(statements)} statement(s)')
engine = database.get_sqla_engine(
schema=schema,
nullpool=True,
user_name=user_name,
source=sources.get('sql_lab', None),
)
# Sharing a single connection and cursor across the
# execution of all statements (if many)
annotations: List[SQLValidationAnnotation] = []
with closing(engine.raw_connection()) as conn:
with closing(conn.cursor()) as cursor:
for statement in parsed_query.get_statements():
annotation = cls.validate_statement(
statement,
database,
cursor,
user_name,
)
if annotation:
annotations.append(annotation)
logging.debug(f'Validation found {len(annotations)} error(s)')

return annotations
Loading

0 comments on commit 2497048

Please sign in to comment.