Rewriter infrastructure revamp #5611

Merged: 16 commits, Jan 21, 2020

systay (Collaborator) commented Dec 22, 2019

Before this change, we had two different AST visitors: one for rewriting expressions and one for visiting SQLNodes. Both had to be updated by hand whenever the AST changed.

This change introduces a new visitor that can do what the older visitors could, but is generated from the AST instead of being hand-coded.

To enable this, I had to restructure the AST a little. The AST structs and interface implementations now live in ast.go, which is the file the visitor generator reads to produce the visitor. The rest of the AST functionality (all methods that belong to structs but do not implement any interface) now lives in ast_funcs.go.

I'm now using this new visitor to simplify the AST rewriting we do today: auto-parameterization (replacing values with bindvars), last_insert_id() and database() should all happen close to each other and not be done while planning.
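
To make the idea concrete, here is a minimal sketch of a cursor-based rewriter doing auto-parameterization on a toy AST. Everything in it (Node, Literal, BindVar, Add, Cursor, rewrite, Parameterize) is a hypothetical stand-in and not the generated Vitess code; in particular, the small type switch inside rewrite is the part a generator would emit from ast.go.

```go
package main

import "fmt"

// Node is a stand-in for sqlparser.SQLNode in this sketch.
type Node interface{ String() string }

// Literal and BindVar are toy leaf nodes; Add is a toy binary expression.
type Literal struct{ Val string }
type BindVar struct{ Name string }
type Add struct{ Left, Right Node }

func (l *Literal) String() string { return l.Val }
func (b *BindVar) String() string { return ":" + b.Name }
func (a *Add) String() string     { return a.Left.String() + " + " + a.Right.String() }

// Cursor points at the node being visited and knows how to replace it
// in its parent. A generator would emit one replacer per struct field.
type Cursor struct {
	node     Node
	replacer func(Node)
}

func (c *Cursor) Node() Node { return c.node }

func (c *Cursor) Replace(n Node) {
	c.replacer(n)
	c.node = n
}

// rewrite calls visit on node and then on each of its children.
func rewrite(node Node, replacer func(Node), visit func(*Cursor)) {
	c := &Cursor{node: node, replacer: replacer}
	visit(c)
	// This type switch is the part that would be generated from the AST.
	if a, ok := c.node.(*Add); ok {
		rewrite(a.Left, func(n Node) { a.Left = n }, visit)
		rewrite(a.Right, func(n Node) { a.Right = n }, visit)
	}
}

// Parameterize replaces every literal with a numbered bind variable.
func Parameterize(root Node) (Node, map[string]string) {
	binds := map[string]string{}
	visit := func(c *Cursor) {
		if lit, ok := c.Node().(*Literal); ok {
			name := fmt.Sprintf("v%d", len(binds)+1)
			binds[name] = lit.Val
			c.Replace(&BindVar{Name: name})
		}
	}
	rewrite(root, func(n Node) { root = n }, visit)
	return root, binds
}

func main() {
	expr := &Add{Left: &Literal{Val: "1"}, Right: &Literal{Val: "2"}}
	out, binds := Parameterize(expr)
	fmt.Println(out, binds)
}
```

The visitor callback never needs to know which field of which parent it is replacing; that knowledge sits in the per-field replacer closures, which is what makes generating them from the struct definitions attractive.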

deepthi (Member) left a comment

This is a major refactor/rewrite. Can you write a description for the PR that lays out how things are changing and why?

@@ -72,6 +72,11 @@ install: build
parser:
make -C go/vt/sqlparser

visitor:

Member

shouldn't we be calling this whenever there is a change to sql.y (and sql.go)?

Collaborator Author

not really. only when there is a change to ast.go

Member

will people need to know to call this manually? is there a test that will catch it if someone forgets to do so after changing ast.go?

Collaborator Author

added pre-commit hook

@@ -0,0 +1,161 @@
package visitorgen

Member

missing license header

@@ -0,0 +1,161 @@
package visitorgen

// simplified ast

Member

There should be some explanatory docs here.

@@ -0,0 +1,79 @@
package visitorgen

Member

missing license header.

sougou (Contributor) left a comment

I believe this codegen is not as type-safe as I had expected. It can still panic if you pass the wrong type to cursor.Replace. In this particular case, the codegen should have generated two visitor functions: one for *AliasedExpr, and one for Expr. The cursor.Replace for *AliasedExpr would only accept an *AliasedExpr instead of any SQLNode.

And the pattern I described for handling the substitutions in the rewriter would allow you to compose the two.

However, this is good enough for now. Maybe we can improve this later. Let's move forward with this. I can merge once the minor nits are taken care of.

PS: Note that I am nitpicking. This is otherwise an awesome piece of work!
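
The type-safety concern can be illustrated with a small sketch. The types below are simplified stand-ins, and untypedReplace, ExprCursor and panics are hypothetical names chosen for this example; the sketch only shows why a replacer that accepts any SQLNode can fail at run time while a typed one fails at compile time.

```go
package main

import "fmt"

// SQLNode and Expr are simplified stand-ins for the sqlparser interfaces.
type SQLNode interface{ isNode() }
type Expr interface {
	SQLNode
	isExpr()
}

type ColName struct{ Name string }
type AliasedExpr struct {
	Expr Expr
	As   string
}

func (*ColName) isNode()     {}
func (*ColName) isExpr()     {}
func (*AliasedExpr) isNode() {}

// untypedReplace mirrors a replacer that accepts any SQLNode: the type
// assertion panics at run time if the caller passes a non-Expr.
func untypedReplace(parent *AliasedExpr, newNode SQLNode) {
	parent.Expr = newNode.(Expr)
}

// ExprCursor is the typed alternative: Replace only accepts an Expr, so
// passing an *AliasedExpr is rejected by the compiler.
type ExprCursor struct{ parent *AliasedExpr }

func (c ExprCursor) Replace(e Expr) { c.parent.Expr = e }

// panics reports whether f panicked.
func panics(f func()) (p bool) {
	defer func() { p = recover() != nil }()
	f()
	return false
}

func main() {
	ae := &AliasedExpr{Expr: &ColName{Name: "a"}}
	// Wrong node type: only detected when the code runs.
	fmt.Println(panics(func() { untypedReplace(ae, &AliasedExpr{}) }))
	// Typed cursor: the same mistake would not compile.
	ExprCursor{parent: ae}.Replace(&ColName{Name: "b"})
	fmt.Println(ae.Expr.(*ColName).Name)
}
```

The cost of the typed variant is that the generator has to emit one cursor (or one visitor function) per node type rather than a single generic one.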

@@ -87,6 +88,11 @@ install: build
parser:
make -C go/vt/sqlparser

visitor:
go build -o visitorgen go/visitorgen/main/main.go
./visitorgen -input=go/vt/sqlparser/ast.go -output=$(REWRITER)

Contributor

It's better to just go run go/visitorgen/main/main.go -input go/vt/sqlparser/ast.go -output=go/vt/sqlparser/rewriter.go. Then you don't have to worry about cleanup of the binary. I just tried, and it works as intended. For some reason, I had to remove the = after input.

Collaborator Author

I can't get this to run on my Mac. All I get is:

> go run -o visitorgen go/visitorgen/main/main.go -input go/vt/sqlparser/ast.go -output=go/vt/sqlparser/rewriter.go
flag provided but not defined: -o
usage: go run [build flags] [-exec xprog] package [arguments...]
Run 'go help run' for details.
> go run -o visitorgen go/visitorgen/main/main.go -input=go/vt/sqlparser/ast.go -output=go/vt/sqlparser/rewriter.go
flag provided but not defined: -o
usage: go run [build flags] [-exec xprog] package [arguments...]
Run 'go help run' for details.
> go run -o visitorgen go/visitorgen/main/main.go -input=go/vt/sqlparser/ast.go -output go/vt/sqlparser/rewriter.go
flag provided but not defined: -o
usage: go run [build flags] [-exec xprog] package [arguments...]
Run 'go help run' for details.
> go run -o visitorgen go/visitorgen/main/main.go -input go/vt/sqlparser/ast.go -output go/vt/sqlparser/rewriter.go
flag provided but not defined: -o
usage: go run [build flags] [-exec xprog] package [arguments...]
Run 'go help run' for details.

dweitzman (Member), Jan 21, 2020

I agree with Sugu that go run would be much nicer here. In the error output you saw, it looks like you ran go run -o visitorgen and got an error about the -o. If you drop that flag and its argument, go run should work.

return sqltypes.Int24
case keywordStrings[INT]:
fallthrough
case keywordStrings[INTEGER]:

Contributor

better to use case keywordStrings[INT], keywordStrings[INTEGER]:
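
As a small illustration of the suggestion, a Go case clause can list several values, which removes the need for fallthrough. The function name, keyword strings and return values below are illustrative only and are not taken from the sqlparser code.

```go
package main

import "fmt"

// columnType maps a SQL type keyword to a toy type name.
func columnType(keyword string) string {
	switch keyword {
	case "mediumint":
		return "Int24"
	case "int", "integer": // one case, two values; no fallthrough needed
		return "Int32"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(columnType("int"), columnType("integer"))
}
```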

func ReplaceExpr(root, from, to Expr) Expr {
expr, success := Rewrite(root, replaceExpr(from, to), nil).(Expr)
if !success {
panic("expression rewriting ended up with a non-expression")

Contributor

Better not to panic. This is impossible in the new framework, right? Then it's better not to check at all.

if right, ok = node.Right.(*SQLVal); !ok {
return false
}
if node.Operator == NotEqualStr && left.Type == right.Type {

Contributor

use bytes.Equal

Collaborator Author

Not sure what you mean, and this is not something I changed: https://github.com/vitessio/vitess/blob/master/go/vt/sqlparser/ast.go#L2431
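
For context, and only as one possible reading of the bytes.Equal suggestion: Go slices cannot be compared with ==, so comparing the raw bytes of two values is normally done with bytes.Equal. The sketch below assumes a simplified SQLVal with a type tag and a []byte payload; sameLiteral is a hypothetical helper, and nothing here claims to be what the reviewer had in mind for this exact line.

```go
package main

import (
	"bytes"
	"fmt"
)

// SQLVal is a simplified stand-in with a type tag and raw bytes.
type SQLVal struct {
	Type int
	Val  []byte
}

// sameLiteral reports whether two values have the same type and bytes.
func sameLiteral(left, right *SQLVal) bool {
	return left.Type == right.Type && bytes.Equal(left.Val, right.Val)
}

func main() {
	a := &SQLVal{Type: 1, Val: []byte("42")}
	b := &SQLVal{Type: 1, Val: []byte("42")}
	c := &SQLVal{Type: 1, Val: []byte("43")}
	fmt.Println(sameLiteral(a, b), sameLiteral(a, c))
}
```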

type expressionRewriter struct {
lastInsertID, database bool
err error
aliases []*AliasedExpr

Contributor

I now see why you were trying to add aliases for every expression :).

If you were already willing to pay the price of generating an alias for every select expression, there is a simpler way: as soon as you encounter an unaliased *AliasedExpr, generate an alias for it locally. Then create a new RewriteASTResult and invoke a Rewrite on it. If it returns saying that something has changed, substitute the new alias after it returns. And always return false.

This avoids the need to maintain a separate aliases list, and also there is no need for a comingUp function.
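
A rough sketch of that flow, under heavy simplification: expressions are plain strings here, rewriteExpr is a hypothetical inner rewrite that reports whether it changed anything, and the bind variable name is made up for the example. The point is only that the alias is decided locally, after the nested rewrite returns.

```go
package main

import "fmt"

// AliasedExpr is a simplified stand-in: Expr is kept as a plain string here.
type AliasedExpr struct {
	Expr string
	As   string
}

// rewriteExpr is a hypothetical inner rewrite: it replaces database() calls
// with a made-up bind variable and reports whether anything changed.
func rewriteExpr(expr string) (string, bool) {
	if expr == "database()" {
		return ":db_name", true
	}
	return expr, false
}

// rewriteAliased keeps the original text as the alias only when the inner
// rewrite changed the expression, so no shared alias list is needed.
func rewriteAliased(node *AliasedExpr) {
	original := node.Expr
	rewritten, changed := rewriteExpr(node.Expr)
	node.Expr = rewritten
	if changed && node.As == "" {
		node.As = original
	}
}

func main() {
	changed := &AliasedExpr{Expr: "database()"}
	untouched := &AliasedExpr{Expr: "col"}
	rewriteAliased(changed)
	rewriteAliased(untouched)
	fmt.Printf("%+v %+v\n", *changed, *untouched)
}
```

Because the alias is attached only when the nested rewrite reports a change, expressions that were not rewritten keep their empty alias.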

Collaborator Author

Nice idea. Much cleaner.

cursor Cursor
}

func replaceAliasedExprAs(new SQLNode, parent SQLNode) {

Contributor

Could you use something other than new? It shadows Go's built-in new and could confuse readers.

Signed-off-by: Andres Taylor
dweitzman (Member) commented Jan 21, 2020

I'm wondering if parameterizing the parser could avoid a whole bunch of code and complexity in supporting database() and last_insert_id().

Suppose we could change this part of sql.y:

/*
  Regular function calls without special token or syntax, guaranteed to not
  introduce side effects due to being a simple identifier
*/
function_call_generic:
  sql_id openb select_expression_list_opt closeb
  {
    $$ = &FuncExpr{Name: $1, Exprs: $3}
  }

To something more in the spirit of this:


sql.go:
// vtgate can set up a custom SQL parser that replaces certain
// functions during parse time. Avoids the need to adjust the AST
// later.
func yyNewParser(f FuncCreator) yyParser {
	return &yyParserImpl{funcCreator: f}
}

func (yyrcvr *yyParserImpl) CreateSimpleFunc(name ColIdent, exprs SelectExprs) *FuncExpr {
	if len(exprs) == 0 && yyrcvr.funcCreator != nil {
		if res := yyrcvr.funcCreator.Create(name); res != nil {
			return res
		}
	}
	return &FuncExpr{Name: name, Exprs: exprs}
}

sql.y:
/*
  Regular function calls without special token or syntax, guaranteed to not
  introduce side effects due to being a simple identifier
*/
function_call_generic:
  sql_id openb select_expression_list_opt closeb
  {
    $$ = yyrcvr.CreateSimpleFunc($1, $3)
  }

If a parameterized parser could be set up, is there any functionality in the rewriter today that a parameterized parser wouldn't be able to handle?

sougou merged commit 2cd6a1b into vitessio:master Jan 21, 2020

dweitzman (Member) left a comment

I have the larger question of whether the rewriter can be avoided entirely, but since the code is committed now I'll send some of the pending comments I had for the small fraction of the changes I've read so far


const usage = `Usage of visitorgen:

go run go/visitorgen/main/main.go -input=/path/to/ast.go -output=/path/to/rewriter.go

Member

"go run ./go/visitorgen/main" is a little shorter and avoids a "missing dot in first path element" error

@@ -0,0 +1,130 @@
/*

Member

What do you think about putting this package at go/ast/visitorgen so it's more clear that it's related to ast?

Collaborator Author

Could do. I was thinking I wanted to use visitorgen for our Primitive plans as well. We don't rewrite them today, but we do visit them using the Find method, and I foresee rewriting of plans in our future as well.

Never mind - it will be easy to move it somewhere else when we do decide to start using it for more than the ast. For now, I'll move it into the sqlparser package

if *inputFile == "" || *outputFile == "" {
fmt.Println("> " + *inputFile)
fmt.Println("> " + *outputFile)
panic("need input and output file")

Member

panicking with a printed stack trace feels a bit aggressive. Maybe just log an error and exit? Same comment elsewhere.

exit.Return from vitess.io/vitess/go/exit seems to be popular for most of the vitess commands, although it's not super clear to me what value it provides over directly calling os.Exit()

replacementMethods := visitorgen.EmitReplacementMethods(vd)
typeSwitch := visitorgen.EmitTypeSwitches(vd)

fw := newFileWriter(*outputFile)

Member

Seems like the fileWriter struct could be avoided here. One way:

var b bytes.Buffer
fmt.Fprintln(&b, fileHeader)
fmt.Fprintln(&b, ...)
err := ioutil.WriteFile(*outputFile, b.Bytes(), 0644)

visitor:
go build -o visitorgen go/visitorgen/main/main.go
./visitorgen -input=go/vt/sqlparser/ast.go -output=$(REWRITER)
rm ./visitorgen

Member

How do you feel about having the Makefile invoke go generate, with a //go:generate annotation holding the go run command?

Being able to use pure go build tools seems nice. Anyone who wants to use make to invoke go would have the option, although I don't think it would do much for them.
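
A sketch of what such an annotation might look like. The directive is run by go generate from the directory of the file that contains it, so the paths below are relative to that package; both the generator location (./visitorgen/main) and the file it would live in are assumptions for illustration.

```go
// Package sqlparser is shown here only as the place where the directive
// might live; the paths in the directive are assumptions.
package sqlparser

//go:generate go run ./visitorgen/main -input=ast.go -output=rewriter.go
```

The Makefile target could then reduce to a single go generate invocation on that package, and contributors who do not use make could run the same command directly.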

switch node := cursor.Node().(type) {
case *AliasedExpr:
if node.As.IsEmpty() {
buf := NewTrackedBuffer(nil)

Member

Seems like this serializes, allocates memory, and copies even when nothing needs to change.

systay (Collaborator Author) commented Jan 27, 2020

Thanks for the feedback, @dweitzman. Addressed most of it with #5770

systay (Collaborator Author) commented Jan 27, 2020

I'm wondering if parameterizing the parser could avoid a whole bunch of code and complexity in supporting database() and last_insert_id().

...

If a parameterized parser could be set up, is there any functionality in the rewriter today that a parameterized parser wouldn't be able to handle?

You are probably right, at least for this case. I'm thinking ahead, being guided by experience and gut feeling and not much hard data. Maybe I'm overestimating my pretty singular experience and not seeing aspects that are new in this situation.

I'll try to be clear about the assumptions that lead to thinking this would be a good idea:

  • I think we'll soon do even more rewriting of the AST
  • We'll want to use the same generator to produce rewriting support for the vtgate engine Primitive trees
  • I value separation of concerns a lot around these things. It makes it easier to test, to understand and to reason about.
  • If performance is the issue we're trying to address with the parameterized parser, I think we can use the visitorgen tool to produce pretty fast code.
