Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(formats): turn TypeParser into a TypeMapper implementation for sqlglot #6876

Merged
merged 11 commits into from
Sep 4, 2023

Conversation

kszucs
Copy link
Member

@kszucs kszucs commented Aug 16, 2023

Extend the TypeMapper interface with to_string() and from_string() classmethods, then implement the TypeParser to rather be an implementation of TypeMapper for sqlglot. This means that the mappers are able to convert between ibis types and sqlglot datatype expressions back and forth. Hopefully this will turn useful once we are going to depend on sqlglot more heavily.

Additionally support for parsing typestrings and compiling to typestrings (this was the original functionality of the previous type parser), the latter functionality is offloaded to sqlglot (with a couple of exceptions already handled in the original type parse).

from collections.abc import Mapping


class PostgresTypeParser(TypeParser):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to base/sql/glot/datatypes.py for centralization, probably I should move it back.

Copy link
Member

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small naming nit, otherwise LGTM!

ibis/backends/base/sql/glot/datatypes.py Outdated Show resolved Hide resolved
@kszucs
Copy link
Member Author

kszucs commented Aug 17, 2023

This PR is not complete yet, I'd like to add more roundtrip tests for certain backends using hypothesis.

@kszucs kszucs force-pushed the sqlglot-type-mapper branch 3 times, most recently from ee66638 to f74e8fd Compare August 23, 2023 07:21
@kszucs kszucs marked this pull request as ready for review August 23, 2023 07:21
@kszucs kszucs force-pushed the sqlglot-type-mapper branch 10 times, most recently from 81e2f6e to 28349ff Compare August 24, 2023 14:34
@kszucs
Copy link
Member Author

kszucs commented Aug 24, 2023

@cpcloud could you please give it a preliminary review?

I still need to update the cloud backends and somewhat improve the testing.

@kszucs kszucs force-pushed the sqlglot-type-mapper branch from 4874821 to a712d34 Compare August 25, 2023 06:59
@kszucs
Copy link
Member Author

kszucs commented Aug 25, 2023

@cpcloud the cloud backends are passing too, please take a look! Feel free to update the PR as needed, I won't be touching it for a couple of days.

@@ -278,6 +277,7 @@ def test_rename_table(con, temp_table, temp_table_orig, new_schema):
@mark.notyet(
["trino"], reason="trino doesn't support NOT NULL in its in-memory catalog"
)
@mark.broken(["snowflake"], reason="snowflake shows not nullable column as nullable")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this broken upstream or is there a bug in the implementation of type parsing for snowflake?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is due to a weird behaviour of snowflake

>create temporary table _ibis_test_nullability (a number not null);
+----------------------------------------------------+
| status                                             |
|----------------------------------------------------|
| Table _IBIS_TEST_NULLABILITY successfully created. |
+----------------------------------------------------+
1 Row(s) produced. Time Elapsed: 0.470s

>desc _ibis_test_nullability;
+------+--------------+--------+-------+---------+-------------+------------+-------+------------+---------+-------------+
| name | type         | kind   | null? | default | primary key | unique key | check | expression | comment | policy name |
|------+--------------+--------+-------+---------+-------------+------------+-------+------------+---------+-------------|
| A    | NUMBER(38,0) | COLUMN | N     | NULL    | N           | N          | NULL  | NULL       | NULL    | NULL        |
+------+--------------+--------+-------+---------+-------------+------------+-------+------------+---------+-------------+
1 Row(s) produced. Time Elapsed: 0.212s

>select * from _ibis_test_nullability;
+---+
| A |
|---|
+---+
0 Row(s) produced. Time Elapsed: 0.224s

>DESC RESULT last_query_id();
+------+--------------+--------+-------+---------+-------------+------------+-------+------------+---------+-------------+
| name | type         | kind   | null? | default | primary key | unique key | check | expression | comment | policy name |
|------+--------------+--------+-------+---------+-------------+------------+-------+------------+---------+-------------|
| A    | NUMBER(38,0) | COLUMN | Y     | NULL    | N           | N          | NULL  | NULL       | NULL    | NULL        |
+------+--------------+--------+-------+---------+-------------+------------+-------+------------+---------+-------------+
1 Row(s) produced. Time Elapsed: 0.222s

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the null? values.

f"CREATE PRIVATE TEMPORARY TABLE {table} AS {query.strip(';')}"
)
result = con.exec_driver_sql(f"DESCRIBE {table}").mappings().all()
con.exec_driver_sql(f"DROP TABLE {table}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why there are a bunch of uncovered lines here.

"ipv6": dt.INET(nullable=default_nullable),
"object('json')": dt.JSON(nullable=default_nullable),
"array(null)": dt.Array(dt.null, nullable=default_nullable),
"array(nothing)": dt.Array(dt.null, nullable=default_nullable),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These types are case sensitive, why did you alter their case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From_string converts the input string to lowercase before looking up the types unknown to sqlglot

@cpcloud cpcloud added this to the 7.0 milestone Aug 25, 2023
@kszucs kszucs force-pushed the sqlglot-type-mapper branch from 6a81a51 to 98f0f6a Compare September 1, 2023 06:29
@kszucs
Copy link
Member Author

kszucs commented Sep 1, 2023

@cpcloud could you please take another look?

@kszucs kszucs force-pushed the sqlglot-type-mapper branch 2 times, most recently from 0dc48e0 to e17ac89 Compare September 3, 2023 07:52
@kszucs kszucs force-pushed the sqlglot-type-mapper branch from 4694739 to 76d4a84 Compare September 4, 2023 07:51
@kszucs kszucs requested a review from cpcloud September 4, 2023 11:53
@cpcloud
Copy link
Member

cpcloud commented Sep 4, 2023

Cloud backends look good:

cloud in 🌐 falcon in …/ibis on  sqlglot-type-mapper is 📦 v6.1.0 via 🐍 v3.10.12 via ❄️  impure (ibis-3.10.12-env) took 4s
❯ pytest -m 'snowflake or bigquery' -n 16 --dist loadgroup --snapshot-update --maxfail=10 -q
bringing up nodes...
.x...x...............x.x....x...x.x.x..xx..xx...x...x..x.......x.x..................x.x......x...x..........x....xx...............x.......x..............xx.....xx.x...x.......x..........x.x. [  6%]
xx.x.....x...x.xx.xx.x....s.xx..xx.xx.x..x.x..x...x.x.......s......sx.....x.........x.......x.xx.x...x..x.x.x.x....x...x.xxx...xxxxx.x......x.x..xxx.x.........x.......x........xx...x....x... [ 13%]
.........x....x.x.x..xx..x..xx..x.........x.x........x..........x...xx........x....x..x...x.....x..x...x...x.........x.x............x.........x...x..x...xx........x.x.x.x...xx.xx...x.x...... [ 20%]
.......x....x....x..xxxxxx.x..........xx.....x.....x.x...........xx.xx.......xx...xxx...........x.xx..xxxx.....x...x.xx....xx......x..............x.x....................x.........x...xxxxxxx [ 27%]
x.xxxx.xxx..xxx.x.xx.xxxx.xx.x.xxx.x.....xxx..x....x..x..x....xx.x..x.x.....x.xxxxxxxx.xxxxx..x.xx.xsssxs..xxx..x.x...xx...x..x.x.xx..x..................x...x.................x.............. [ 34%]
.....................................x.x............x......x.....x....x........x.xx.x..xx.xxx.x.x...x..x..x..........xs...........xx..x........s.........x..........x.x...x....x.x....x....... [ 41%]
.......x...x...........x..........................x..................x.x..x..x...xx.x........x.........x...x..........x...........x....xx.x..x....x......................x...............x.... [ 48%]
.x..........................x..x.........x..............xx....x.................x...x...x.......xx..xxx..xx.................................x..x..xx..xx.x..........x.........x..x..x.......x. [ 55%]
......x.x...........................x....................................................x................x.....x...x......xxx.........x....x...................x...x..x..x..............x.x.. [ 62%]
.........x.........x..........x.x............x.x......................................................................................................x..................x.x......xxx.xx...... [ 69%]
..xxx.x...xxxxssssxx.xx.x......xxx..xxx...xx.x.....................x.x.x.x.x..............................x.....x.......x.........x..............x....................................x......x [ 76%]
.....xxx..x...x...x.......x.x..s.....s..x........x...............x......x.......................................................x............................................................. [ 83%]
..................................................................................x........................x.s...........................................x.................................... [ 90%]
........................................................x...............................................................................xx......xx.......xx.........................x..x....xx [ 97%]
x...xx.x.x........x..................x.........x.........................                                                                                                                      [100%]
2278 passed, 16 skipped, 439 xfailed in 361.07s (0:06:01)

Copy link
Member

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweeeeeeet!

@cpcloud cpcloud added refactor Issues or PRs related to refactoring the codebase datatypes Issues relating to ibis's datatypes (under `ibis.expr.datatypes`) sql Backends that generate SQL labels Sep 4, 2023
@cpcloud cpcloud force-pushed the sqlglot-type-mapper branch from 76d4a84 to 7abda66 Compare September 4, 2023 12:53
@cpcloud cpcloud enabled auto-merge (rebase) September 4, 2023 12:54
@cpcloud cpcloud merged commit e0aa4da into ibis-project:master Sep 4, 2023
@kszucs kszucs deleted the sqlglot-type-mapper branch September 4, 2023 21:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datatypes Issues relating to ibis's datatypes (under `ibis.expr.datatypes`) refactor Issues or PRs related to refactoring the codebase sql Backends that generate SQL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants