AST infra rewrite #1462
Main changes:
Some preliminary benchmarks (all with release builds): Code generation (from .spicy to C++):
Build times for the toolchain:
Max RSS while building the toolchain:
Binary sizes:
I made a pass, but likely missed a lot. Overall this all looks much more natural and also much simpler to me, so I'm really looking forward to using it. I am really a fan of the more accessible global information. Thanks for all the work.
A couple of global points:
- I believe the new factory functions could all take their args by value.
- Unless we have a good reason not to, we should use `std::make_shared` in factory functions.
- I suspect there might be opportunities to have getters on nodes return raw ptrs instead of `*Ptr` types, since most often callers only want to inspect, not share ownership (e.g., in the base classes for `Expression::type`, but also in many derived classes which call `child`, which would need another `child` function returning a non-owning ptr). I haven't checked in detail though.
- We didn't heavily use `final` before and I am not sure we care about sealing interfaces. I don't have a strong opinion though.
- This PR leaves the copyright headers in an inconsistent state; could you make a pass updating them to, e.g., `2020-2024`?
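To make the first three points concrete, here is a minimal sketch of what they could look like together. All names (`Node`, `Name`, `Key`, `childPtr`) are hypothetical stand-ins rather than the PR's actual classes, and the private "passkey" token is just one possible way to keep `std::make_shared` usable while construction still has to go through `create()`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; not the PR's actual classes.
class Node;
using NodePtr = std::shared_ptr<Node>;

class Node {
public:
    virtual ~Node() = default;

    // Non-owning getter: enough for callers that only inspect the child.
    Node* child(std::size_t i) const {
        return i < _children.size() ? _children[i].get() : nullptr;
    }

    // Owning getter for callers that really need to share ownership.
    NodePtr childPtr(std::size_t i) const {
        if ( i < _children.size() )
            return _children[i];
        return nullptr;
    }

protected:
    // Argument taken by value and moved into place.
    explicit Node(std::vector<NodePtr> children) : _children(std::move(children)) {}

private:
    std::vector<NodePtr> _children;
};

class Name final : public Node {
    // Private token ("passkey"): only code inside Name can construct one, so
    // the public constructor below is effectively reachable only via create().
    struct Key {
        explicit Key() = default;
    };

public:
    Name(Key /*unused*/, std::string id, std::vector<NodePtr> children)
        : Node(std::move(children)), _id(std::move(id)) {}

    // Factory taking its arguments by value; std::make_shared allocates the
    // object and its control block in one go.
    static std::shared_ptr<Name> create(std::string id, std::vector<NodePtr> children = {}) {
        return std::make_shared<Name>(Key{}, std::move(id), std::move(children));
    }

    const std::string& id() const { return _id; }

private:
    std::string _id;
};
```

With this shape, `parent->child(0)` hands back a plain `Node*` for inspection and leaves the reference count alone, while `childPtr(0)` remains available for the rarer case where shared ownership is wanted.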
I often followed clang-tidy recommendations, though honestly not always understanding them. Either way, I'm still planning to move away from
Probably, though I'd hope the compiler can optimize much of that away. But once again same argument: we'll soon have only raw pointers anyways.
Yeah, I was considering that and ended up thinking it doesn't really matter. If external parties started depending on this, it would be different, but I don't see that right now, and would rather take the optimization potential.
Will do.
@bbannier ready for another review (except that it seems we might still have GCC problems on Debian 10).
Added one more commit to fix that Debian 10 issue.
Squashed and rebased to main.
This is a large revamp of compiler internals, cleaning up and speeding up lots of the AST pipeline.

From a user perspective, nothing changes, except that the new compiler is a tiny bit more strict: it turns out that in rare cases the old compiler ended up accepting some ill-formed Spicy code that is now (rightfully) rejected. Specifically, two instances of this are known where existing Spicy code may need tweaking now:

- Identifiers from the (internal) `hilti::` namespace are no longer accessible. Usually you can just scope them with `spicy::` instead, and it'll work.

- The old compiler didn't always enforce constness as it should have. In particular, function parameters could end up being mutable even when they weren't declared as `inout`. Now `inout` is required for supporting any mutable operations on a parameter, so make sure to add it where needed.

For the record, the following is a summary of the main internal changes between the new and old architectures:

- Switched to a traditional class hierarchy for AST nodes, with `Node` at the top.

- Modeling type constness with a new node `QualifiedType`, instead of coding it into all `Type` instances. `Type` has been renamed to `UnqualifiedType` for easier distinguishing and so that the compiler will catch existing usage of the old `Type` that hasn't been updated. We also use `QualifiedType` to track LHS/RHS semantics.

- IDs are no longer nodes themselves, just node attributes like other atomic values.

- We now instantiate all nodes through static factory methods called `create()` that take an additional global `ASTContext` instance, which maintains AST-wide state. To facilitate this approach, all node constructors are private now; the caller must go through the `create()` methods.

- The AST context holds all modules inside a single, global AST (previously, we had one AST per module). The root of this single AST is a node `ASTRoot`, and the immediate children of that node are the modules (`declaration::Module`). This structure significantly simplifies cross-module processing. Visitors now traverse the full global tree, seeing all modules at once.

- To make usage of the `create()` methods easier, there's a new `NodeFactory` class that provides forwarding methods. The factory stores the relevant `ASTContext` internally and automatically adds it during forwarding. This factory also replaces the old `builder()->*` functions for creating AST nodes. The forwarding methods are automatically generated through `scripts/autogen-builder-api` (which, in turn, uses a `libclang`-based tool `bin/autogen-builder-api`).

- Nodes now keep pointers to their parents. That replaces the old `position_t` stuff.

- No singleton nodes anymore; they don't play well with that parent pointering.

- No `node::None` anymore, using `nullptr` instead. Likewise, no `optional<SomeNode>` anymore, using `nullptr` as well.

- For visitors, we're using traditional double-dispatch with virtual methods. No more complex template magic, and no more result values from visitors either. There are just two visitor variants now, for pre-order and post-order walking (and those two are still implemented through a joint template). Plus, there are HILTI and Spicy versions of each visitor. The Spicy version adds additional virtual methods for Spicy-specific nodes.

- Unresolved types: Use `Auto` if the type is presumably still going to be resolved, and `Unknown` if the type has been determined to be finally unresolvable.

- To render AST nodes in a way that's user-presentable, use `print()` methods. In contrast, `dump()` is limited to dumping out debug information, in particular the AST.

- Resolver/coercer/normalizer have all merged into a new, single pass called "resolver". The previous separation was confusing and hard to maintain.

- The passes that were formerly in `compiler/visitors/*.cc` have moved up a level and gained their own header files to declare their entry points. No more `global.h` either.

- No cycles are allowed inside the AST. That means a child cannot point back to a node that's already elsewhere in the AST. That also means each node has a unique parent. When a node that already has a parent is added to an AST, the node is deep-copied automatically (see next bullet). In case something goes wrong, debug builds run a cycle detector to catch problems.

- When adding a node to an AST (i.e., as the child of another Node), there are two cases:

    1. It's a new node instance that does not have a parent yet. In that case, we directly add a pointer to that Node as the child.

    2. It's a node instance that already has a different parent. In that case, we deep-copy the node first and add a pointer to the copy as the child.

  Note that when wanting to *move* a node from one location inside the AST to another, you can avoid the deep-copy by first removing it from its old parent (which will clear the parent pointer) and then adding it to the new parent.

  Conceptually, there are two main places where this logic happens:

    1. when manipulating the child of a Node through the corresponding Node methods; and

    2. when instantiating new Nodes and giving them their initial children.

  Generally, this all happens automatically when going through the corresponding methods, which deep-copy on-demand as needed.

- If we *semantically* do need a cycle, the solution is to store a reference (see next bullet) to the target inside an explicit node attribute that's not part of the normal parent/child pointering. Examples of nodes doing this are `expression::Name`, `type::Name`, and `QualifiedType`. The latter has the notion of "external" types where the wrapped `UnqualifiedType` is not a child, but stored somewhere else inside the AST; the `QualifiedType` just stores a reference to it, which it transparently unwraps when the `UnqualifiedType` is requested (through `type()`).

- The "references" mentioned above are implemented through mappings inside the ASTContext. Currently, this is supported for declarations and types. First, one registers the declaration/type with the AST context. That assigns it a unique, stable index ID. That index acts as the reference that nodes can store as attributes. To dereference, one can later ask the context to provide the declaration/type that corresponds to the stored index.

- Non-abstract classes derived from Node:

    - Public `create()` factory
    - Protected constructors, called from `create()`
    - Protected copy/move constructors/assignment that perform shallow copies.

- We do type comparison by computing a serialized version of each type, called "type unification". We then compare those unifications as strings. An empty unification always compares false against anything else.

- When comparing types, we distinguish between types that are compared structurally (most) and by name ("name types"; e.g., struct and enums). A virtual type method `isNameType()` indicates which kind a type is.
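The add/move rules for children described above can be illustrated with a small, self-contained sketch. This is a hypothetical miniature, not the actual `Node` class: the method names `addChild`, `removeChild`, and `deepCopy` are assumptions made for the example, there is no `ASTContext`, and the debug-build cycle detector is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature of the rules above; not the actual Node class.
class Node {
public:
    static std::shared_ptr<Node> create(std::string label) {
        return std::shared_ptr<Node>(new Node(std::move(label)));
    }

    // Children may outlive their parent, so drop their back pointers.
    ~Node() {
        for ( auto& c : _children )
            c->_parent = nullptr;
    }

    // Case 1: no parent yet, so link the node itself.
    // Case 2: already has a parent, so link a deep copy instead.
    // Returns the node that actually ended up in the tree.
    std::shared_ptr<Node> addChild(std::shared_ptr<Node> n) {
        assert(n);
        if ( n->_parent )
            n = n->deepCopy();

        n->_parent = this;
        _children.push_back(n);
        return n;
    }

    // Unlinks a child and clears its parent pointer, so that re-adding it
    // elsewhere is a move rather than a copy.
    void removeChild(std::shared_ptr<Node> n) {
        auto it = std::find(_children.begin(), _children.end(), n);
        if ( it == _children.end() )
            return;

        n->_parent = nullptr;
        _children.erase(it);
    }

    // Recursively copies this node and its subtree; the copy has no parent.
    std::shared_ptr<Node> deepCopy() const {
        auto copy = create(_label);
        for ( const auto& c : _children )
            copy->addChild(c->deepCopy());
        return copy;
    }

    Node* parent() const { return _parent; }
    const std::vector<std::shared_ptr<Node>>& children() const { return _children; }
    const std::string& label() const { return _label; }

private:
    explicit Node(std::string label) : _label(std::move(label)) {}

    std::string _label;
    Node* _parent = nullptr; // non-owning back pointer
    std::vector<std::shared_ptr<Node>> _children;
};
```

In this sketch, adding `x` under `a` and then under `b` leaves the original `x` in `a` and puts a copy into `b`; calling `a->removeChild(x)` first makes the second `addChild` link `x` itself.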
We previously would evaluate whether a unit field or method was required by an optional feature before we had finished collecting all feature requirements. This behavior was fine when we were visiting individual modules' ASTs one by one, but breaks with #1462, where we changed to using a single AST to hold all modules. This patch defers transformations until all feature requirements have been collected.
We did this previously but stopped doing it with #1462.
I'm marking this as ready for review. I'll profile and polish a bit more, but generally the code should be stable at this point. The main missing piece is porting the Zeek plugin over to the new API, which I'll work on next.
This reworks the AST infrastructure to become much simpler in terms of
implementation and usage. At a high level, we get rid of (1) the
value-based storage of nodes through type erasure, and (2) all the
template magic implementing type erasure and visitors. We replace that
with a traditional "old-style" class hierarchy representing AST node
relationships, with memory management through smart pointers. ASTs are
now mutable as well.
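As a rough illustration of the "traditional" style this moves to, the sketch below shows double dispatch through virtual methods with a pre-order walk. The class names (`Declaration`, `Expression`, `Visitor`, `CountDeclarations`) and the `dispatch()`/`walkPreOrder()` helpers are assumptions for the example and are not taken from the PR's code.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical node and visitor names, for illustration only.
class Declaration;
class Expression;

class Visitor {
public:
    virtual ~Visitor() = default;

    // One overload per concrete node class; the defaults do nothing.
    virtual void operator()(Declaration* /*n*/) {}
    virtual void operator()(Expression* /*n*/) {}
};

class Node {
public:
    virtual ~Node() = default;

    // First dispatch: the virtual call selects the concrete node class.
    virtual void dispatch(Visitor& v) = 0;

    void addChild(std::shared_ptr<Node> n) { _children.push_back(std::move(n)); }
    const std::vector<std::shared_ptr<Node>>& children() const { return _children; }

private:
    std::vector<std::shared_ptr<Node>> _children;
};

class Declaration : public Node {
public:
    // Second dispatch: the static type of `this` picks the visitor overload.
    void dispatch(Visitor& v) override { v(this); }
};

class Expression : public Node {
public:
    void dispatch(Visitor& v) override { v(this); }
};

// Pre-order walk: visit the node first, then its children.
inline void walkPreOrder(Node* n, Visitor& v) {
    n->dispatch(v);
    for ( const auto& c : n->children() )
        walkPreOrder(c.get(), v);
}

// The callbacks return nothing, so a visitor keeps its result as state.
struct CountDeclarations : Visitor {
    using Visitor::operator(); // keep the other overloads visible

    int count = 0;
    void operator()(Declaration* /*n*/) override { ++count; }
};
```

A post-order variant would only swap the two steps inside the walk function; the node and visitor classes stay the same.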
See 1st comment below for some more detailed notes on changes
to the AST code.
See 2nd comment below for some preliminary benchmarks.
Note that when compiling existing analyzers, the new compiler is a bit more strict in what it accepts. While there aren't any language changes, it turns out that the old compiler ended up accepting some things that it shouldn't have. Specific notes for porting analyzers:
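- Identifiers from the `hilti::` namespace are no longer available; usually you can just scope them with `spicy::` instead and it'll work.
- Function parameters could previously end up being mutable even when they weren't declared as `inout`. The `inout` is now required for all mutable operations on parameters.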