Start macro expansion chapter #26

Merged: 6 commits, Jan 31, 2018


mark-i-m
Member

I just went through this code to implement ? macro repetition, so I thought I could take a stab at the chapter 😄

@mark-i-m mark-i-m changed the title [WIP] Start macro expansion chapter Start macro expansion chapter Jan 25, 2018
@mark-i-m
Member Author

@nikomatsakis I don't really know anything about hygiene, proc macros, or custom derive, but I added a bit about macros-by-example, and left TODOs for the rest...

Contributor

@nikomatsakis nikomatsakis left a comment

This is great! Thanks =) I left some small suggestions


Macro expansion happens during parsing. `rustc` has two parsers, in fact: the
normal Rust parser, and the macro parser. During the parsing phase, the normal
Rust parser will call into the macro parser when it encounters a macro. The
Contributor

Can you be more precise about what a reference to a macro is? E.g., do you mean a macro invocation, like `foo!(...)`?

Contributor

Also, is it really called from the parser? I thought there was a second phase that came after parsing, but maybe I'm going to learn something here =)

Member Author

Let me verify that :)

Member Author

@nikomatsakis

Ok, so it looks like

normal Rust parser, and the macro parser. During the parsing phase, the normal
Rust parser will call into the macro parser when it encounters a macro. The
macro parser, in turn, may call back out to the Rust parser when it needs to
bind a metavariable (e.g. `$my_expr`). There are a few aspects of this system to
Contributor

here, you mean when the macro is trying to parse the contents of the macro invocation against one of the macro arms?

Basically, the macro parser is like an NFA-based regex parser. It uses an
algorithm similar in spirit to the [Earley parsing
algorithm](https://en.wikipedia.org/wiki/Earley_parser). The macro parser is
defined in `src/libsyntax/ext/tt/macro_parser.rs`.
Contributor

can we make links into GH here (master branch)? this at least allows us to detect if those links rot

Rust parser will call into the macro parser when it encounters a macro. The
macro parser, in turn, may call back out to the Rust parser when it needs to
bind a metavariable (e.g. `$my_expr`). There are a few aspects of this system to
be explained. The code for macro expansion is in `src/libsyntax/ext/tt/`.
Contributor

can we make links into GH here (master branch)? this at least allows us to detect if those links rot

bind a metavariable (e.g. `$my_expr`). There are a few aspects of this system to
be explained. The code for macro expansion is in `src/libsyntax/ext/tt/`.

### The macro parser
Contributor

as a meta-comment, I think it's a good idea to start out with some kind of concrete example and walk it through. For example:

Imagine we have a macro

macro_rules! foo {
    ($metavariable:tt) => { ... }
}

now you can reference this example from the text below
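
To make the suggested running example concrete, here is a minimal sketch. The macro body is an assumption (the review comment elides it with `...`); `stringify!` is used only so the binding of `$metavariable` is visible, and the helper `foo_examples` is a hypothetical name.

```rust
// Hypothetical running example: the body is an assumption, since the
// review comment leaves it as `...`.
macro_rules! foo {
    // `$metavariable` binds to exactly one token tree from the invocation.
    ($metavariable:tt) => {
        stringify!($metavariable)
    };
}

/// Returns the text that `$metavariable` was bound to for three
/// sample invocations.
pub fn foo_examples() -> [&'static str; 3] {
    [foo!(x), foo!(42), foo!("hello")]
}
```

Each invocation passes a single token, so each one matches the only arm of `foo!`.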

parse different types of metavariables, such as `ident`, `block`, `expr`, etc.,
the macro parser must sometimes call back to the normal Rust parser.

Interestingly, both definitions and invokations of macros are parsed using the
Contributor

Spelling: invocations

_using the macro parser itself_.

When the compiler comes to a macro invokation, it needs to parse that
invokation. This is also known as _macro expansion_. The same NFA-based macro
Contributor

Spelling: invocation

When the compiler comes to a macro invokation, it needs to parse that
invokation. This is also known as _macro expansion_. The same NFA-based macro
parser is used that is described above. Notably, the "pattern" (or _matcher_)
used is the first token tree extracted from the rules of the macro _definition_.
Contributor

this is where the running example would be really handy

parser is used that is described above. Notably, the "pattern" (or _matcher_)
used is the first token tree extracted from the rules of the macro _definition_.
In other words, given some pattern described by the _definition_ of the macro,
we want to match the contents of the _invokation_ of the macro.
Contributor

Spelling: invocation
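
As an illustration of the matcher idea discussed in this thread, the sketch below shows a macro whose first arm uses the matcher `print $mvar:ident`. The macro name `printer`, the second arm, and the helper `printer_examples` are assumptions made for this example. It uses `format!` instead of printing so the result can be inspected.

```rust
// Hypothetical macro: each arm is `(matcher) => { transcriber }`.
macro_rules! printer {
    // Matcher: the literal token `print`, then one identifier bound to `$mvar`.
    (print $mvar:ident) => {
        format!("{}", $mvar)
    };
    // A second arm, tried only if the first matcher does not match.
    (print twice $mvar:ident) => {
        format!("{} {}", $mvar, $mvar)
    };
}

/// Expands both arms against a local variable named `foo`.
pub fn printer_examples() -> (String, String) {
    let foo = 3;
    (printer!(print foo), printer!(print twice foo))
}
```

For `printer!(print twice foo)`, the first matcher binds `twice` as the identifier but then finds a leftover token, so that arm is rejected and the second arm is tried.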

that non-terminal. Then, the macro parser proceeds in parsing as normal.

For more information about the macro parser's implementation, see the comments
in `src/libsyntax/ext/tt/macro_parser.rs`.
Contributor

link to repo

@federicomenaquintero

BTW, there's a very interesting discussion about hygiene and proc-macro in rust-lang/rust#45934

@mark-i-m
Member Author

@nikomatsakis I updated the chapter (a lot). I think I have addressed your comments. Let me know. Thanks!


Also, copying this here, because the comment above is "outdated":

Also, is it really called from the parser? I thought there was a second phase that came after parsing, but maybe I'm going to learn something here =)

Ok, so it looks like

@mark-i-m
Member Author

@federicomenaquintero Would you be interested in filling in some of the TODOs? I want to learn how they all work, but I don't have the bandwidth in the near future...

@nikomatsakis
Contributor

Does this run after the parser?

Yes, it does


`$mvar` is called a _metavariable_. Unlike normal variables, rather than binding
to a value in a computation, a metavariable binds _at compile time_ to a tree of
_tokens_. A _token_ zero or more symbols that together have some meaning. For
Contributor

A token zero or more symbols that together have some meaning.

This sentence is not grammatical and I'm not quite sure how to fix it. =) In particular, I don't think of a token as "zero or more symbols" (and it's sort of unclear to me what you mean by symbol, which in parsing terminology is often used to mean the union of token and nonterminal).

I think I would maybe say something like this:

"A token is a single "unit" of the grammar, such as an identifier (e.g., `print`) or punctuation (e.g., `=>`). Token trees result from paired parentheses-like characters (`(...)`, `[...]`, and `{...}`): they include the open and close delimiters and all the tokens in between (we do require that parentheses-like characters be balanced)."

but it doesn't seem like the best either :)
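
One way to see the token versus token tree distinction in practice is a small counting macro. This is only a sketch; `count_tts` and `token_tree_counts` are hypothetical names, not anything from the chapter or from `rustc`.

```rust
// Counts the token trees in its input. Each `tt` fragment matches either a
// single token or one whole delimited group.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

/// Counts token trees for three sample inputs.
pub fn token_tree_counts() -> (usize, usize, usize) {
    (
        count_tts!(print foo),      // two identifier tokens
        count_tts!(a => b),         // `=>` is a single punctuation token
        count_tts!((a b c) [d] {}), // each delimited group is one token tree
    )
}
```

The third input contains several tokens inside the delimiters, but the macro only sees three token trees because each balanced group counts as one.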

Contributor

@nikomatsakis nikomatsakis left a comment

This is really nice. I left a few nits.


The process of expanding the macro invocation into the syntax tree
`println!("{}", foo)` and then expanding that into a call to `Display::fmt` is
called _macro expansion_, it is the topic of this chapter.
Contributor

Nit: the word it is not needed here

In the analogy of a regex parser, `tts` is the input and we are matching it
against the pattern `ms`. Using our examples, `tts` could be the stream of
tokens containing the inside of the example invocation `print foo`, while `ms`
might be the sequence of token (trees) `print $mvar:ident`.
Contributor

tying back to the example is 💯
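
The `tts` against `ms` analogy can be sketched with a toy matcher. This is a deliberately simplified, sequential matcher over string tokens with no repetition, so it does not track the set of simultaneous positions that an NFA-style parser would. The `Matcher` type, `is_ident`, and `match_tts` are hypothetical names for illustration and are not the types used in `macro_parser.rs`.

```rust
use std::collections::HashMap;

/// One element of a toy matcher (`ms`): a literal token or a metavariable.
#[derive(Debug, Clone, PartialEq)]
pub enum Matcher {
    Literal(&'static str),
    /// Stands in for `$name:ident`; binds one identifier-like token.
    Ident(&'static str),
}

/// Rough identifier check (an assumption; the real lexer is stricter).
fn is_ident(tok: &str) -> bool {
    let mut chars = tok.chars();
    matches!(chars.next(), Some(c) if c.is_alphabetic() || c == '_')
        && chars.all(|c| c.is_alphanumeric() || c == '_')
}

/// Matches the input `tts` against the pattern `ms`, returning the
/// metavariable bindings on success.
pub fn match_tts<'a>(
    ms: &[Matcher],
    tts: &[&'a str],
) -> Option<HashMap<&'static str, &'a str>> {
    if ms.len() != tts.len() {
        return None;
    }
    let mut bindings = HashMap::new();
    for (m, tt) in ms.iter().zip(tts) {
        match m {
            // A literal must match the input token exactly.
            Matcher::Literal(lit) if lit == tt => {}
            // A metavariable binds the token if it looks like an identifier.
            Matcher::Ident(name) if is_ident(tt) => {
                bindings.insert(*name, *tt);
            }
            _ => return None,
        }
    }
    Some(bindings)
}
```

With `ms` set to `[Literal("print"), Ident("mvar")]` and `tts` set to `["print", "foo"]`, the match succeeds and binds `mvar` to `foo`.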

@mark-i-m
Member Author

@nikomatsakis I updated the paragraph on tokens, as you suggested... I am wondering

  1. Is this a term we should add to the glossary?
  2. Should the parsing chapter go into more detail about lexing and parsing or rely on external sources for the basics? If it went into more detail, maybe we can just assume that the reader knows about tokens from that chapter and omit the paragraph from this chapter altogether?

@Michael-F-Bryan
Contributor

Should the parsing chapter go into more detail about lexing and parsing or rely on external sources for the basics?

I was asking myself this exact question when I wrote the start of the parser chapter. Should we add a small note at the top saying we assume people know how a basic recursive descent parser works and what tokenizing/lexical analysis is? The idea being this is a book about rustc internals, not an introduction to parsing.

There is already loads of good quality material on basic parsers on the internet, a couple paragraphs at the top of the chapter probably wouldn't be able to do it justice.

@mark-i-m
Member Author

mark-i-m commented Jan 30, 2018

I agree that we shouldn't try to teach parsing here, but given that I don't expect most people to know basic parsing, I worry that it would discourage contributions... Perhaps we can

  • Add a high level overview of the algorithm and point to a few solid resources for learning in detail
  • Give some key term definitions
  • Tie them all back to the code

What do you think?

@mark-i-m mark-i-m closed this Jan 30, 2018
@mark-i-m mark-i-m reopened this Jan 30, 2018
@mark-i-m
Member Author

Erg... Sorry, I fat-fingered the "close and comment" button.... Updated my post above

@federicomenaquintero

@federicomenaquintero Would you be interested in filling in some of the TODOs? I want to learn how they all work, but I don't have the bandwidth in the near future...

Yes, I'll see what I can do.

@Michael-F-Bryan
Contributor

Michael-F-Bryan commented Jan 31, 2018

What do you think?

Sounds like a good idea. We could say something like this:

Rust syntax is specified by a grammar (link) which is essentially a list of rules where each rule specifies how a piece of the language is written (e.g. a crate contains multiple items, an item may be a function declaration, a function has a bunch of statements, a statement is a ...), with each rule being written in terms of other rules or terminals (the base case, typically tokens).

Generally speaking, for each grammar rule there will be one parser method. In this way we can translate a token stream into an AST by recursively calling the appropriate method.

It's essentially recursive descent 101, but you could tie all of this back to rustc by inspecting a sample code snippet (e.g. an if statement) and then showing what would be called when parsing it.

EDIT: @mark-i-m this conversation probably belongs in #13, so I'm moving it over there.

@Michael-F-Bryan Michael-F-Bryan mentioned this pull request Jan 31, 2018
@nikomatsakis
Contributor

I agree that we shouldn't try to teach parsing here, but given that I don't expect most people to know basic parsing, I worry that it would discourage contributions... Perhaps we can

I think I agree with both of you. I don't think we want a lot of introductory material; a few links don't hurt, but not too much. But I think there's a third way, though it may take some iteration to get there: To some extent, I think you can serve both audiences by doing a kind of "walk through" of the code.

In other words, e.g. to explain tokenizing, we might point to the token data structure and give some source showing how it would be divided into tokens (we can always link to wikipedia or something too). This way, if you know what a token is, you learn about the Rust-specific parts of it. If you don't know what a token is, you can just understand it as this Rust data structure and later learn about the more general form.

Similarly I imagine we can say something like "Rust has a recursive-descent parser" (where we link to wikipedia) and then walk through how it would parse some small example, showing a few key functions (eg., the one that parses a type). If you're not familiar with recursive descent, this will basically give you the idea, but if you are, then you'll learn about the names of key concepts in the code.
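
For readers who have not seen recursive descent, the "one parser method per grammar rule" idea mentioned in this thread can be sketched on a toy grammar. Everything here is hypothetical (the grammar, `Parser`, `parse_expr`, `parse_term`, `evaluate`) and is unrelated to the actual `rustc` parser; it also evaluates the input directly instead of building an AST, to keep the sketch short.

```rust
/// Toy grammar (an assumption for illustration, not Rust's grammar):
///   expr := term ('+' term)*
///   term := NUMBER | '(' expr ')'
pub struct Parser<'a> {
    tokens: &'a [&'a str],
    pos: usize,
}

impl<'a> Parser<'a> {
    pub fn new(tokens: &'a [&'a str]) -> Self {
        Parser { tokens, pos: 0 }
    }

    /// Looks at the next token without consuming it.
    fn peek(&self) -> Option<&'a str> {
        self.tokens.get(self.pos).copied()
    }

    /// Consumes and returns the next token, if any.
    fn bump(&mut self) -> Option<&'a str> {
        let tok = self.peek();
        if tok.is_some() {
            self.pos += 1;
        }
        tok
    }

    /// One method for the rule `expr := term ('+' term)*`.
    pub fn parse_expr(&mut self) -> Option<i64> {
        let mut value = self.parse_term()?;
        while self.peek() == Some("+") {
            self.bump();
            value += self.parse_term()?;
        }
        Some(value)
    }

    /// One method for the rule `term := NUMBER | '(' expr ')'`.
    fn parse_term(&mut self) -> Option<i64> {
        match self.bump()? {
            "(" => {
                // Recurse into the `expr` rule, then require the closing paren.
                let value = self.parse_expr()?;
                if self.bump()? == ")" { Some(value) } else { None }
            }
            number => number.parse().ok(),
        }
    }
}

/// Parses the whole token slice; returns `None` on any syntax error
/// or on leftover tokens.
pub fn evaluate(tokens: &[&str]) -> Option<i64> {
    let mut parser = Parser::new(tokens);
    let value = parser.parse_expr()?;
    if parser.pos == tokens.len() { Some(value) } else { None }
}
```

The recursion from `parse_term` back into `parse_expr` is what handles nesting, which is the same shape a walk-through of a real parser method would show.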

@nikomatsakis nikomatsakis merged commit b4b2b0d into rust-lang:master Jan 31, 2018
@mark-i-m mark-i-m deleted the macros branch May 23, 2018 16:42