Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add plugin interface to allow frontend replacement #35149

Closed
wants to merge 1 commit into from

Conversation

c42f
Copy link
Member

@c42f c42f commented Mar 18, 2020

This is a revival of #32063, but with a lot less code movement. (Moving code can be done as separate NFC later if desired.)

The idea here is to allow the flisp frontend to be incrementally replaced with a frontend written in julia. Parts of the frontend can be replaced in a piecemeal fashion by starting with the result of @ccall jl_flisp_frontend()::NTuple{12,Ptr{Cvoid}} and only updating some of the pointers.

It would be nice if jl_frontend_t could be slimmed down a lot in the future (perhaps along the lines of #31954 (comment)), but for now this patch reflects the frontend as it exists, avoiding any other code movement or refactoring.

This patch introduces the new interface and forwards all frontend
functions to it, generally avoiding any other code movement or
refactoring.
@c42f c42f added the parser Language parsing and surface syntax label Mar 18, 2020
@c42f c42f requested a review from Keno March 18, 2020 14:31
@vtjnash
Copy link
Member

vtjnash commented Mar 18, 2020

I still feel a rather inexplicably intense dislike towards this design (I think mostly because ccall/cfunction seems like a poor choice of interface for it). A lot of these hooks appear to exist so that we can call Julia code inside Base from inside Base—but this seems like excess complexity just to call a Julia function from Julia—so the precise goal here seems unclear. For the rest, I'd rather we copied the existing jl_type_infer design (e.g. jl_call_in_typeinf_world), as I think that'd be a lot cleaner, more reliable, and faster.

Thus, this PR seems to make sense to me only for rapid prototyping (e.g. short term work), but not merging (e.g. long term support).

@c42f
Copy link
Member Author

c42f commented Mar 19, 2020

The aim here is certainly to merge something which enables iteration on replacing the frontend by incrementally replacing flisp components, and supporting a transition period where the new parser can be made opt-in.

This is not about having a perfect solution right now; it's about enabling iteration as a risk reduction strategy: Rewriting the entire frontend is going to take some time and it's not something I'd like to do in a single massive PR - I think there's a nonzero risk that such a large project will stall again (for whatever reason we cannot yet predict).

So yes, I would like to merge this ugly but internal interface in the support of the above goals, with the aim of removing it as soon as possible.

However, I'd be happy with an alternative solution provided it supports the goals:

  • Make it possible to iterate on this big project from master
  • Support a transition period while we figure out bootstrapping, iron out the inevitable bugs etc.

@c42f
Copy link
Member Author

c42f commented Mar 19, 2020

Ok, so having a look at jl_type_infer, etc and trying to understand your objection, it seems to me that you don't like the fact that to enter a julia version of Meta.parse the call chain would go through C code:

For flisp, in this PR we have the call chain

Meta.parse -> jl_parse_string -> jl_fl_parse_string

And in a potential julia replacement it would be

Meta.parse -> jl_parse_string -> cfunction(NewParser.parse_string)

Am I correct in thinking it's this second thing you find particularly objectionable?

@c42f
Copy link
Member Author

c42f commented Mar 19, 2020

Assuming I'm on the right track, would you find it better if we inverted this whole thing to make the old code path more ugly so that the new code path can stay in Julia?

For example, suppose we had in Core some functionality like:

# Call into flisp parser
function flisp_parse_string(...)
    ccall(:jl_parse_string ...)
end

const _parse_string = Ref{Any}(flisp_parse_string)

function parse_string(args...)
    _parse_string[](args...)
end

flisp call chain from Julia is then

Meta.parse -> Core.parse_string -> flisp_parse_string --ccall--> jl_parse_string

Whereas calling a pure julia parser would look like

Meta.parse -> Core.parse_string -> NewParser.parse_string

A big downside of this approach is that we must deal with the difficulties of parser bootstrapping right away before we can reap any benefits.

@Keno
Copy link
Member

Keno commented Mar 19, 2020

so the precise goal here seems unclear

The precise goal is to make it easy to switch back and forth between the flisp parser, which is written in C and a potential pure julia parser. We'll likely be needing to that for a while, if only for bootstrapping purposes. In particular, I'd like to be able to have the parser be able to be a standalone .so, whether that so be written in flisp or Julia. Having the interface have a C ABI rather than a Julia ABI makes a lot of sense to me for that use case, esp since we call these functions both from C and Julia. @vtjnash I think it would help if you could elaborate on the precise nature of your objections, so we can improve this PR towards a useful state.

@c42f
Copy link
Member Author

c42f commented Mar 19, 2020

Oh, I've left out what the call chain looks like from the C side. During the transition period we'd have something like

for the flisp code,

jl_parse_string -> jl_parse_string_func --jl_apply-->
    Core.parse_string -> flisp_parse_string --ccall--> jl_fl_parse_string

for the julia replacement,

jl_parse_string -> jl_parse_string_func --jl_apply-->
    Core.parse_string -> NewParser.parse_string

(just using parse_string as an example. All other frontend-related functions would need similar treatment.)

@vtjnash
Copy link
Member

vtjnash commented Mar 19, 2020

This just looks like an arbitrary set of functions smashed together. There's 12 things in here, but only 2 are related to the parser (parse_all and parse_string). Then there's stuff relating to printing (the data goes the wrong direction), some stuff related to lowering, and then you're renaming jl_macroexpand to jl_fl_macroexpand, even though this pass is written in C, and putting that in here too.

If the goal is to do incremental development of the parser, I'd think the first step would be to export the actual guts of the parser, so that you can replace the specific internal components. You don't actually need much for that though, since you can just directly edit the scheme code on the fly (e.g. poke the jl_call_scm_on_ast_and_loc function, and perhaps also take a look at #19321). Though it's not clear to me that we'd want to make an exact copy of the internal design, especially given that we already have mostly functional replacement packages, so I just don't see the point to this intermediate state.

@vtjnash
Copy link
Member

vtjnash commented Mar 19, 2020

esp since we call these functions both from C and Julia

Once we finish bootstrapping, we only call the parser from Julia (e.g. Meta.parse or Core.include or print_expr)

@c42f
Copy link
Member Author

c42f commented Mar 20, 2020

Then there's stuff relating to printing (the data goes the wrong direction

Do you refer to the warn in expand_with_loc_warn? Indeed that makes me uncomfortable. Though I'd point out that macroexpand and parse_eval_all are even more deeply entangled with the runtime's global state.

I think it's fair to say that these chosen entry points are still coupled to the runtime state in an awkward way. In particular, there are many data flows which are implicit in the interface:

  • Warnings propagate from lowering to the runtime
  • Macro expansion calls into julia functions
  • Incremental evaluation in parse_eval_all calls back into the runtime

I feel like there's been a bunch of surface level reasons offered for why this approach is bad, but the implicit data flow is actually the bigger problem. @vtjnash would you agree this is the real problem?

(@JeffBezanson — this is more what I was complaining about a few days ago on a call when I said the number of functions was large. It would have been more accurate to say "I feel bad about this interface" and to have a deeper think about the reasons.)

@c42f
Copy link
Member Author

c42f commented Mar 20, 2020

To address some other assorted questions and quibbles:

If the goal is to do incremental development of the parser

Our current plan is to do a wholesale replacement by adapting CSTParser for the needs of Base. But to smooth the transition by supporting both the flisp parser and some CSTParser-derived parser for a short time (perhaps one release cycle).

The goal for lowering is to continue with #32201 in some form. Currently that's a fairly direct translation of lowering so it could be done more incrementally, in principle. Bootstrap would be "interesting".

renaming jl_macroexpand to jl_fl_macroexpand, even though this pass is written in C

There's 500 lines of flisp code in macroexpand.scm and maybe 200 lines of expansion-related C code in ast.c. So it seems the bulk of this is still in flisp.

Once we finish bootstrapping, we only call the parser from Julia (e.g. Meta.parse or Core.include or print_expr)

I would have thought we'd continue to support the embedding API jl_eval_string() at least.


"I feel bad about this interface"

So having said that myself, we've still got to make progress here in some way. I'd be kind of ok with just abandoning this approach for now and trying to more specifically attack the problem of replacing the parser. That may better inform exactly what interface we need. (But I'd also point out that Keno has already done such prototyping in #31954.)

Another idea would be to make all the data flow explicit by passing callbacks which functions like macroexpand can use to access runtime state. I'd be happy to pursue that as long as it's a worthwhile refactoring in the longer term.

Overall I would say I'm not wedded to the design here. I simply want a way forward and this is one possibility (essentially a modernization and cleanup of the approach taken in #31954).

@JeffBezanson
Copy link
Member

I think the reason parse_eval_all is like that is to try to reduce unnecessary julia <-> scheme conversions. If we didn't care about that, parse_all could just return an expression, which could then be separately evaluated. That, of course, involves macro expansion, which necessarily involves calling julia functions. The overall macro expansion process is driven from C, with a large "helper routine" written in scheme. That code should eventually just be part of lowering. Lowering is totally separate from parsing though, so it probably shouldn't be part of the same interface.

So, I don't think there is any need for callbacks; anything involving evaluation can be done fully outside the front-end. The parser can just spit out expressions, and lowering can just convert expressions to other expressions.

I think the jl_is_operator etc. functions are indeed part of the parser since they are queries against the grammar.

@vtjnash
Copy link
Member

vtjnash commented Mar 20, 2020

Then there's stuff relating to printing (the data goes the wrong direction)

I mean the queries like is_operator, which are for implementing a de-parser

Warnings propagate from lowering to the runtime

Yeah, this is a new feature and hard to do well. The promise here is that you can only get warnings if you're doing top-level eval, so we know the caller is special and is expecting to handle warnings.

Macro expansion calls into julia functions

Correct, but that's largely why I moved this pass entirely into C now. Makes it much easier to deal with GC roots in particular.

Incremental evaluation in parse_eval_all calls back into the runtime

We may want to revisit this now. In theory it reduces round-trips, but since docs are a macro, I'm guessing we almost always do the round trip anyways. So possibly might as well eliminate this now and just parse_all to an Expr(:toplevel) and eval that.

There's 500 lines of flisp code in macroexpand.scm and maybe 200 lines of expansion-related C code in ast.c. So it seems the bulk of this is still in flisp.

That's actually lowering code, but it just usually is only needed if there was a macro. It's a legacy name.

jl_eval_string()

True, though wouldn't be unusual for this to call into Julia directly and just assume it's there.

@c42f
Copy link
Member Author

c42f commented Mar 20, 2020

Good to chat about this on a call, thanks guys.

I think the consensus was:

First step is to refactor parsing into a single entry point which takes a string and parsing flags (greedy etc), and returns a Julia Expr.

Actually I think we forgot to cover the desired ABI. For now I'll go with C ABI.


Another interesting snippet from the call — we discussed bootstrapping and the idea that we could distribute a simple text form of the parsed code to enable bootstrapping from C. Naturally SExpressions were mentioned :-)

@c42f
Copy link
Member Author

c42f commented Mar 20, 2020

There's 500 lines of flisp code in macroexpand.scm and maybe 200 lines of expansion-related C code in ast.c. So it seems the bulk of this is still in flisp.

That's actually lowering code, but it just usually is only needed if there was a macro. It's a legacy name.

Yep. I know there's a plan/desire for this code not to be there... but in the current state of things it is.

Macro expansion calls into julia functions

Correct, but that's largely why I moved this pass entirely into C now. Makes it much easier to deal with GC roots in particular.

Agreed, I remember what it was like before. Then one day (a year or two ago?) I was reading the source and I realized you'd cleaned it all up. Certainly it's a lot nicer now.

@c42f
Copy link
Member Author

c42f commented Mar 26, 2020

Will be replaced by #35243

@c42f c42f closed this Mar 26, 2020
@c42f c42f deleted the cjf/compiler-frontend-api branch March 26, 2020 04:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parser Language parsing and surface syntax
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants