Parser trips up on common snowflake query history #78
Yes, this is a known issue, and I'm happy to discuss how to approach parsing Snowflake queries better. Right now, this package uses pglast for parsing, so it parses Postgres syntax. It also does reasonably well with Redshift, since that dialect is very similar.
Both of these require substantial effort, and I am looking for opportunities to fund it.
OK. Is it fair to say that this package doesn't really support Snowflake yet? I see it in the marketing materials (https://tokern.io/data-lineage/), but in practice, when I try the examples, I never get any result in the graph because it trips up on so many queries. If there is an implementation that does use Snowflake effectively, or at least filters out the syntax that pglast doesn't support, I'd be happy to use it. Right now, though, I think I'm going to have to try a different package: I have accumulated so many query syntax exclusions that it's becoming infeasible, and even then, queries are failing without telling me why (I'm just getting a bunch of these now):
Yes. The open source package, as it currently stands, has poor Snowflake coverage. This is because there is no good open source Snowflake SQL parser; all other OSS packages have a similar problem.
Currently, the parser trips up on many common Snowflake query history entries, such as:

`select query_text from table(information_schema.query_history());`

It also fails on queries with the `@SNOWFLAKE_...` syntax, and on queries with the keyword `recluster`; in the latter case, the error is `syntax error at or near "recluster", at index 35`. I am systematically removing these from analysis prior to sending them to the analyzer, but just FYI that without doing this, the analyzer throws an exception.
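For what it's worth, the pre-filtering workaround described above can be sketched roughly as follows. This is a hypothetical illustration, not code from the package: the pattern list and the `postgres_parseable_only` helper are assumptions, seeded with the three failing constructs mentioned in this issue (`query_history()` table functions, `@SNOWFLAKE_` references, and `recluster`).

```python
import re

# Hypothetical exclusion list: Snowflake-only constructs that a Postgres
# parser such as pglast cannot handle. Patterns are illustrative only.
SNOWFLAKE_ONLY = [
    re.compile(r"\btable\s*\(\s*information_schema\.query_history", re.I),
    re.compile(r"@SNOWFLAKE_", re.I),
    re.compile(r"\brecluster\b", re.I),
]

def postgres_parseable_only(queries):
    """Yield only the queries that match none of the exclusion patterns."""
    for q in queries:
        if not any(p.search(q) for p in SNOWFLAKE_ONLY):
            yield q

queries = [
    "select query_text from table(information_schema.query_history());",
    "alter table t recluster",
    "select a from t",
]
print(list(postgres_parseable_only(queries)))  # only the last query survives
```

Of course, as noted above, a regex exclusion list doesn't scale: queries keep failing for new, unreported reasons, which is why a real Snowflake-aware parser is the only durable fix.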