
Add three new experimental engines, including SQLite #331

Merged · 6 commits · Feb 24, 2020
Conversation

@kyleconroy (Collaborator) commented Feb 11, 2020

This will be a long-running pull request implementing SQLite support. SQLite support is being implemented using a different strategy than MySQL and PostgreSQL. Right now type checking, type inference, and code generation can't be shared among database engines.

For SQLite, the plan is to use the parser (generated by ANTLR) to output a generic SQL AST. Think of this as a form of compiler IR. The goal would then be to port both the MySQL and PostgreSQL engines to the same IR, and then do all (or most) of the work on that IR.

This is all a WIP and may fail horribly. I won't be changing any existing packages with this pull request.

#161

@mightyguava (Contributor)

The goal would then be to port both the MySQL and PostgreSQL engines to the same IR, and then do all (or most) of the work on that IR.

+1 This seems much more maintainable than trying to use 3 different parsers with 3 different AST representations.

@kyleconroy (Collaborator, Author)

This is in a state where it's ready for a first set of eyes. The architecture of this change is more important than the specifics.

The best place to start is internal/endtoend/parsers/parser_test.go. Each database engine has a new package that exposes a Parser struct implementing this interface:

type Parser interface {
    Parse(io.Reader) ([]ast.Statement, error)
}

An ast.Statement is the entry point into the new internal/sql/ast package. This ast package contains a generic set of SQL AST nodes. Each database parser is responsible for turning a query string into a slice of statements.

The new internal/sql/catalog package takes those statements and builds a catalog. It's very similar to the existing internal/catalog / internal/pg packages, but not specific to a single engine.

The internal/sql/info package is just a proof of concept; feel free to ignore it.

Again, all of this is subject to change, but I'm happy with the initial progress.

cc @mightyguava @cmoog

@kyleconroy (Collaborator, Author)

I'm happy enough with this progress that I'm going to merge this in. The experimental engines, prefixed with an underscore, are not enabled in releases. Once they get to a better state, we'll enable them for testing purposes.

@kyleconroy (Collaborator, Author)

I'm also going to rebase and split this change into concrete commits before merging.

@kyleconroy kyleconroy marked this pull request as ready for review February 21, 2020 22:45
@kyleconroy changed the title from "Add SQLite support" to "Add three new experimental engines, including SQLite" on Feb 21, 2020
@mightyguava (Contributor) left a comment

Just read through the code. If I understand this correctly, you've added the start of a SQLite engine, containing a complete parser using ANTLR. For this you created an ast package that will serve as the intermediate representation that all language generators will consume. You've also added a MySQL engine backed by TiDB's parser, and a PostgreSQL engine backed by the existing postgres parser. They also output the intermediate ast, so generators will only need to be written once for all engines, albeit with some switches for engine-specific features. At the moment, though, each engine only supports very basic create/delete table and select ops.

The next steps are probably to fully port the PostgreSQL implementation to this format, and then extend the MySQL and SQLite ones? This seems pretty great. I think it's a good evolution of sqlc given the addition of the new engines, and it will make things a lot easier when writing the generators.

I'm a bit concerned about the maintainability of the engines though. It seems like it'll be quite a lot of effort to adapt each parser's output to a single fully featured IR. For example, traversing tidb's AST looks like it works really differently from traversing the sqlite antlr AST. Is this really scalable, especially if the end goal is to support all features of all the databases?

I wonder if there's a potential alternative (potentially insane) of using a single grammar that's a superset of all languages. Maybe to start with, use ANTLR to generate a parser for SQL-92, which has published BNFs. From there, trim out the parts that no supported engine implements, and extend it to add features specific to postgres and mysql. Converting the ANTLR-parsed AST for any of the engines to the IR would use the same code, but with specific code paths disabled or enabled to match engine support. This is probably a larger project up-front, but could pay off in the future in not having to maintain a bunch of parser <=> IR adapters?

)

// Provide a read-only view into the catalog
func Newo(c *catalog.Catalog) InformationSchema {
Newo?

@kyleconroy (Collaborator, Author)

Oops, as I said, that package is experimental and not used.

@mightyguava (Contributor)

@kyleconroy any thoughts on my long comment above?

@kyleconroy (Collaborator, Author)

I'm a bit concerned about the maintainability of the engines though. It seems like it'll be quite a lot of effort to adapt each parser's output to a single fully featured IR.

So far this has not been the case. I've re-implemented the catalog using the new strategy and it's worked well.

I wonder if there's a potential alternative (potentially insane) of using a single grammar that's a superset of all languages.

The issue with a single grammar is that you'd have to write a filter per engine that knows which features each engine allows. Right now this is handled by the individual parsers. For example, I have a generic CreateTypeStmt node. Since SQLite does not support CREATE TYPE, I don't need to do anything: the SQLite parser will simply fail to parse CREATE TYPE statements.

As I've done more work (#402, #401, #400, #399, #398, #397, #391) I'm confident in the current approach.

@mightyguava (Contributor)

Great to hear! I’d like to port the Kotlin code over for basic MySQL support as I have no use for it with PostgreSQL. Are the new parsers in a state for me to code against?
