fix(db_engine_specs): improve Presto column type matching #10658
        sqla_type = PrestoEngineSpec.get_sqla_column_type("varchar(255)")
        assert isinstance(sqla_type, types.VARCHAR)
        assert sqla_type.length == 255

        sqla_type = PrestoEngineSpec.get_sqla_column_type("varchar")
        assert isinstance(sqla_type, types.String)
Here we can see that `varchar(255)` creates a 255 long `VARCHAR` object, whereas just `varchar` leaves the length undefined.
        (
            re.compile(r"^varchar(\((\d+)\))*$", re.IGNORECASE),
            lambda match: types.VARCHAR(int(match[2])) if match[2] else types.String(),
        ),
Here we're instantiating a `types.VARCHAR` object with a set length based on the regex match. This is usually unnecessary, but I thought I'd throw it in here to demonstrate how the matches from the regex can be used to customize the SQLA object.
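To make the role of the capture groups concrete, here is a minimal sketch using only Python's `re` module. The pattern is copied from the diff above; the helper name `varchar_length` is hypothetical and returns the parsed length rather than building a SQLAlchemy type.

```python
import re
from typing import Optional

# Pattern copied from the diff; group 2 holds the digits inside the parentheses.
VARCHAR_RE = re.compile(r"^varchar(\((\d+)\))*$", re.IGNORECASE)


def varchar_length(type_: str) -> Optional[int]:
    """Return the declared length, or None if it is absent or not a varchar."""
    match = VARCHAR_RE.match(type_)
    if match is None or match[2] is None:
        return None
    return int(match[2])


print(varchar_length("varchar(255)"))  # length taken from group 2
print(varchar_length("VARCHAR"))       # matches, but group 2 is empty
print(varchar_length("bigint"))        # no match at all
```

When the parenthesised part is missing, group 2 is `None`, which is why the lambda in the diff falls back to `types.String()`.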
superset/db_engine_specs/base.py
Outdated
    _column_type_mappings: Tuple[
        Tuple[Pattern[str], Union[TypeEngine, Callable[[Match[str]], TypeEngine]]], ...,
    ] = ()
Here the type can either be a SQLA type instance, or a callback that receives the match object and in turn returns the SQLA type instance.
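As a rough illustration of that "instance or callback" contract, the sketch below resolves a type string against a tuple of mappings. It is not the code from this PR: `FakeType`, `MAPPINGS` and `resolve` are hypothetical names, and `FakeType` stands in for SQLAlchemy types so the example runs with the standard library alone.

```python
import re
from dataclasses import dataclass
from typing import Callable, Match, Optional, Pattern, Tuple, Union


# Stand-in for SQLAlchemy types (assumption: used only to keep this runnable).
@dataclass(frozen=True)
class FakeType:
    name: str
    length: Optional[int] = None


TypeOrFactory = Union[FakeType, Callable[[Match[str]], FakeType]]

# Hypothetical mappings: one plain instance, one callback that reads the match.
MAPPINGS: Tuple[Tuple[Pattern[str], TypeOrFactory], ...] = (
    (re.compile(r"^boolean$", re.IGNORECASE), FakeType("BOOLEAN")),
    (
        re.compile(r"^varchar(\((\d+)\))*$", re.IGNORECASE),
        lambda m: FakeType("VARCHAR", int(m[2])) if m[2] else FakeType("STRING"),
    ),
)


def resolve(type_: str) -> Optional[FakeType]:
    """Return the type for the first matching pattern, or None."""
    for regex, entry in MAPPINGS:
        match = regex.match(type_)
        if match:
            # Callbacks receive the match object; instances are returned as-is.
            return entry(match) if callable(entry) else entry
    return None
```

Checking `callable(entry)` works here because the stand-in instances define no `__call__`; whether the real implementation distinguishes the two cases the same way is an assumption.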
LGTM, just a comment
@@ -885,12 +890,18 @@ def get_sqla_column_type(cls, type_: str) -> Optional[TypeEngine]:
         """
         Return a sqlalchemy native column type that corresponds to the column type
         defined in the data source (return None to use default type inferred by
-        SQLAlchemy). Needs to be overridden if column requires special handling
+        SQLAlchemy). Override `_column_type_mappings` for specific needs
If we expect subclasses to override `_column_type_mappings`, should we make it public?
Agreed, the thought crossed my mind while writing this but there were other similar properties that were private so I went with the flow. I changed this one to public, and I'll update the other ones in a later PR to keep this PR as light as possible.
Codecov Report
@@ Coverage Diff @@
## master #10658 +/- ##
==========================================
- Coverage 64.33% 60.01% -4.32%
==========================================
Files 784 784
Lines 36952 36952
Branches 3529 3529
==========================================
- Hits 23772 22178 -1594
- Misses 13071 14587 +1516
- Partials 109 187 +78
Nice improvement, thank you!
    @staticmethod
    def test_get_sqla_column_type():
        sqla_type = PrestoEngineSpec.get_sqla_column_type("varchar(255)")
It would be nice to add some more test cases here, especially for the types that are not supported by Presto.
Since this is blocking another fairly critical feature I'll defer the more comprehensive tests to a forthcoming PR and just cover the discovered bug here. I'm hoping I can carve out some time to build a template for testing this kind of functionality, and don't mind doing the initial work on the Presto spec.
@@ -490,3 +491,24 @@ def test_presto_expand_data_array(self):
         self.assertEqual(actual_cols, expected_cols)
         self.assertEqual(actual_data, expected_data)
         self.assertEqual(actual_expanded_cols, expected_expanded_cols)
+
+    @staticmethod
+    def test_get_sqla_column_type():
No need to make it a class method; in pytest you can just do:
    def test_get_sqla_column_type():
        assert True
Oh, not sure what I was doing there! I'll just tack it on as a regular class method like the other tests to limit confusion. Do we already have a test module that follows pytest best practices that I can mimic going forward?
…ache#10658)" This reverts commit 9461f9c.
        (see MSSQL for example of NCHAR/NVARCHAR handling).

        :param type_: Column type returned by inspector
        :return: SqlAlchemy column type
        """
        for regex, sqla_type in cls.column_type_mappings:
            match = regex.match(type_)
@villebro we tested this during a deploy and got errors here because sometimes `type_` is None. `get_sqla_column_type` is used in connectors/sqla/models.py, where sometimes the type on a column can be null.
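For context on why a null type breaks the loop: `Pattern.match(None)` raises `TypeError`, so any lookup that may receive a missing type needs a guard before matching. The sketch below shows one possible guard; the function name `matches_varchar` is hypothetical, and the thread does not show which fix was actually adopted.

```python
import re
from typing import Optional

VARCHAR_RE = re.compile(r"^varchar(\((\d+)\))*$", re.IGNORECASE)


def matches_varchar(type_: Optional[str]) -> bool:
    """Guarded lookup: a missing type is treated as 'no match'."""
    if type_ is None:
        # Pattern.match(None) would raise TypeError, so bail out early.
        return False
    return VARCHAR_RE.match(type_) is not None
```

Applied to the loop quoted above, the same early return would let the method fall back to returning `None`, which the docstring describes as "use default type inferred by SQLAlchemy".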
* fix: improve Presto column type matching
* add optional callback to type map and add tests
* lint
* change private to public
SUMMARY
Currently some column types aren't correctly identified on Presto, as the matching pattern doesn't allow for field length (for `varchar` it fails to correctly identify `varchar(255)`). Since similar but more robust regex matching is already implemented on MSSQL, this aligns that handling on Presto and removes duplicated code. As of this PR, unidentified column types on Presto will now raise an exception to avoid these types of bugs going unnoticed, as the type coverage seems very comprehensive and is easy to expand as needed.
TEST PLAN
CI + new tests
ADDITIONAL INFORMATION