Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replaceable single entry point for parser #35243

Merged
merged 4 commits into from
May 12, 2020
Merged

Replaceable single entry point for parser #35243

merged 4 commits into from
May 12, 2020

Conversation

c42f
Copy link
Member

@c42f c42f commented Mar 24, 2020

This is a refactoring to allow the flisp parser to be replaced at runtime with some other parser as I've started to do in JuliaLang/FancyDiagnostics.jl#4.

I've largely separated this into logically distinct commits so it may be worth looking at those in isolation while reviewing. (The main place where the separation isn't quite clean is in some fluctuation of the parsing function API.) To summarize the implementation:

  • All parsing goes through a jl_parse interface which lives in a new file frontend.c. The idea of this new file is to eventually contain all frontend-related C code, some of which should move from ast.c over time.
  • jl_parse currently has a C ABI and the implementation can be swapped out with jl_set_parser. I'm somewhat unsure about whether this is best compared to maintaining a global parser object in julia, but it seems potentially easier for solving future bootstrap issues.
  • The flisp parser now has a single entry point jl_fl_parse which is compatible with jl_parse. This and other flisp entry points still live in ast.c.
  • I have moved the implementation of include into Julia, which allowed the mapexpr handling in jl_load_rewrite_file_string and jl_load_rewrite to be removed and implemented in Julia instead, and also allows jl_parse_eval_all to be only used within bootstrap and for implementing the older jl_load and jl_load_file_string C API (actually these could be changed to call Base.include in future).
  • As part of moving include, I've introduced two small internal functions Meta.parseall and Meta.parseatom to accompany Meta.parse. I've used this naming as I think this is what @JeffBezanson preferred; these can be further elaborated in Add Meta.parsefile and Meta.parseall functions as a wrapper around Base.parse_input_line #34715 and made public there.
  • To preserve nicer stack traces from Duplicate include(::String) for nicer stacktraces #33087, I've added _simplify_include_frames in order to not show users all the include implementation details which were previously hidden in the C code jl_parse_eval_all. This only applies when process_backtrace is called to simplify the backtrace for printing.

Future TODOs

Clearly this isn't the end-game for the parser API, but I think it moves in a useful direction and I'd like to get something merged here to avoid the work going stale.

A big unresolved issue is how the parser can communicate fine-grained location information to macro expansion, lowering, etc, and how that will be preserved through those passes. Macro expansion is particularly interesting as any naive approach will break user code! Eventually we want to preserve location info in a CSTParser-like data structure, but for now jl_parse returns a normal Expr.

I felt like jl_parse should accept rich options in principle (for example, to support communicating parser state when reparsing an arbitrary piece of text, and other such things). This is partly why it accepts a jl_value_t for the options rather than an integer I had in the earlier patches here. It remains to be seen exactly what options we might need, however.

Also I should say that bootstrap and the full removal of the flisp frontend is completely unexplored so far.

Original description:

This is the start of a replacement for #35149, for discussion. The goals are

I may leave this PR as a WIP (while I make a mess learning how everything slots together) and pull smaller logical pieces out as separate PRs for detailed review.

@JeffBezanson JeffBezanson added the parser Language parsing and surface syntax label Mar 24, 2020
@c42f c42f force-pushed the cjf/parser-api branch 2 times, most recently from 49626e7 to 524b3e5 Compare April 1, 2020 08:11
@c42f c42f force-pushed the cjf/parser-api branch from 524b3e5 to d224963 Compare April 8, 2020 21:11
@c42f c42f force-pushed the cjf/parser-api branch 4 times, most recently from beba008 to 8fee17d Compare April 26, 2020 01:29
@c42f c42f changed the title WIP: Replaceable single entry point for parser Replaceable single entry point for parser Apr 27, 2020
@c42f
Copy link
Member Author

c42f commented Apr 27, 2020

Ok, after some more cleanup and chasing down of a few broken tests I think this is ready for a closer look @Keno.

@vtjnash as you can see this has a single entry point for the parser in jl_parse. I've kept C ABI for now as it seemed this might be easier for bootstrap when we try to remove the flisp parser one day. If it's realistic to bootstrap with a Julia ABI and an approach more similar to jl_set_typeinf_func that alternative would also fine by me. At this stage I haven't experimented with how bootstrap could work with a Julia parser so I don't know what's best there.

BTW it's certain that this isn't the end game for the frontend, but I've tried to get to a reasonable intermediate state and document where it's at.

Copy link
Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonably straightforward now—just some small changes.

base/loading.jl Outdated Show resolved Hide resolved
base/loading.jl Outdated Show resolved Hide resolved
base/meta.jl Outdated Show resolved Hide resolved
base/meta.jl Outdated Show resolved Hide resolved
base/stacktraces.jl Outdated Show resolved Hide resolved
src/frontend.c Outdated
const char* filename, size_t filename_len,
size_t offset, jl_value_t* options)
{
return (*jl_current_parser)(text, text_len, filename, filename_len, offset, options);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't invalid to do, but I worry it'll reintroduce #27894, but much worse (as that's part of the definition of what cfunction means). Since we already have many APIs that call back into Julia without requiring a C shim (e.g. Base.require, Core.Compiler.typeinf_ext, Base.loaded_modules_array(), etc.) and none that use this design, I think it's just better that we stick with what we know (using jl_apply here, like jl_type_infer for example) and is slightly more powerful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok then, let's do it that way, it's simpler anyway. We can worry about bootstrap later (if indeed it makes any difference to this).

src/support/htable.h Show resolved Hide resolved
src/ast.c Outdated Show resolved Hide resolved
src/ast.c Outdated Show resolved Hide resolved
src/toplevel.c Outdated Show resolved Hide resolved
Copy link
Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonably straightforward now—just some small changes.

base/loading.jl Outdated Show resolved Hide resolved
result = nothing
line_and_ex = Expr(:toplevel, loc, nothing)
for ex in ast.args
if ex isa LineNumberNode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this removes our hacky globals

      jl_lineno = 0;
      jl_filename = jl_string_data(filename);

I guess that's a good thing, it's just the end of an era, like discarding of a well-worn shoe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha! Yes it's going in that direction, though it doesn't quite remove them: the internal machinery will still set them, and their use in extreme circumstances (jl_critical_error) will still work the same as it currently does (and be exactly as unthreadsafe as it currently is).

@c42f c42f force-pushed the cjf/parser-api branch 5 times, most recently from f30dc33 to e8567db Compare May 2, 2020 22:15
@c42f
Copy link
Member Author

c42f commented May 3, 2020

This has been updated to move almost the whole parser replacement interface into Juila, with a few tiny shims in Core.Compiler to make this possible. I think the resulting interface is a lot more satisfying and hopefully somewhat more future proof.

One thing I find particularly satisfying is that the ownership of the text buffer can now be safely communicated through the interface. So if the parser wants to return an error message which references the text buffer, and that buffer is a String (or other Julia object), there's no need to make an entire copy of the buffer. With the buffer communicated as a Ptr{UInt8} in the previous setup, the only real option was to copy the entire buffer if you wanted to preserve it.

Regarding the ability to freeze the parser in a certain world, I added a _apply_in_world builtin so that this could be done from Julia code. This generalizes _apply_latest and could in fact replace it, though I didn't go that far here. After adding _apply_in_world, I realized that it's awkward to make this part of the interface through which the parser is called. Rather, the builtin allows the parser itself to make the decision about whether it runs in a given world, and what world that needs to be.

@c42f c42f force-pushed the cjf/parser-api branch from e8567db to a050e00 Compare May 3, 2020 04:03
base/stacktraces.jl Outdated Show resolved Hide resolved
base/stacktraces.jl Outdated Show resolved Hide resolved
Copy link
Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good to me thus far. Slightly unfortunate to be causing Base now to require the existence of Core.Compiler, so perhaps we should put the bare minimum of those symbols into boot.jl, but not a show-stopper towards merging this.

base/stacktraces.jl Outdated Show resolved Hide resolved
src/ast.c Outdated Show resolved Hide resolved
@c42f
Copy link
Member Author

c42f commented May 4, 2020

Slightly unfortunate to be causing Base now to require the existence of Core.Compiler, so perhaps we should put the bare minimum of those symbols into boot.jl

How about I move the entire content of compiler/parsing.jl into boot.jl? It's only four new symbols.

@c42f c42f force-pushed the cjf/parser-api branch from f34a8f3 to f005a8c Compare May 4, 2020 02:29
@c42f
Copy link
Member Author

c42f commented May 4, 2020

Ok, I've addressed all suggested cleanups, and moved the Core.Compiler.parse shim to Core.parse, to break an explicit dependency from Base to Core.Compiler. Also I just deleted Core.Compiler.set_parser, as a simple Core.eval(:(_parser = $whatever)) will suffice during internal development and we can decide whether we actually want any API for this later.

@c42f c42f force-pushed the cjf/parser-api branch 3 times, most recently from a43b3ad to 73f857a Compare May 11, 2020 13:16
@c42f
Copy link
Member Author

c42f commented May 11, 2020

Rebased to fix the mess of commits and clean up a few things. Notable cleanup:

  • There's now only a single binding Core._parse which is added to Core.
  • I removed the _apply_in_world builtin from here - will move to a separate PR.

I think this is in a good state now so I plan to merge it later today unless there's anything further.

@c42f c42f mentioned this pull request May 12, 2020
4 tasks
base/boot.jl Outdated Show resolved Hide resolved
@JeffBezanson
Copy link
Member

Looking good!

c42f added 2 commits May 12, 2020 13:44
Add Core._parse binding as the internal entry point for parsing of Julia
code. This is called by both C and Julia code which wishes to parse
Julia. For now, it points at the flisp parser by default.

Refactor include() implementation to be done in Julia code, and parsing
rearrangement:

* Implement include() and include_string() in Julia. This also reunifies
  the versions of include() in MainInclude and Base which had started
  diverging.
* Add internal Core.Compiler.fl_parse and set this as the default value
  of the Core._parse binding.
* Use Core._parse from Meta
* Add `Meta.parseall()` for top-level parsing. For now as an internal
  function, but this will likely be made public shortly.

Refactoring in C runtime code:

* Remove flisp code from jl_parse_eval_all. This makes the top level
  parse-lower-eval loop independent of flisp internals.
* Remove jl_parse_eval_all from use as the include implementation (still
  used during bootstrap). This also allowed the removal of mapexpr
  handling from the C code.
* Keep jl_load and jl_load_file_string from julia.h unchanged for
  compatibility with the existing C API. (No julia code `ccall`s
  these anymore)
* Unify flisp parsing to always parse from string rather than directly
  from file.
This simplifies stack traces for include() which were made much more
complex by moving jl_parse_eval_all to the Julia layer.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parser Language parsing and surface syntax
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants