Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mir: remove the tree delimiter nodes #1334

Merged
merged 16 commits into from
Jun 11, 2024

Conversation

zerbina
Copy link
Collaborator

@zerbina zerbina commented Jun 6, 2024

Summary

Instead of marking the end of a subtree with a dedicated node, the
number of child nodes in a subtree is now stored in the subtree's root
node. The idea is to:

  • reduce the memory footprint of MIR trees
  • make the storage format more streamlined
  • simplify some traversal

Details

The idea of using dedicated end nodes originated at a time where:

  • fast back-/up-wards MIR traversal was required
  • statements could be nested

Neither is the case anymore, and replacing the delimiter mnkEnd nodes
with storing the length has some benefits:

  • significantly less nodes per MirTree (reduced memory usage, faster
    scanning, faster copying, etc.)
  • O(1) length lookup
  • lower cost of changing a subtree root node, or replacing it with its
    single child node
  • simpler recursive traversal, as the end nodes don't have to be
    ignored/considered anymore

The downside of removing the delimiters nodes are that:

  • slightly higher tree construction cost: the number of child nodes
    needs to be kept track of
  • walking "upwards" (i.e., towards the root) in the tree has a higher
    cost

Ultimately, the upsides outweigh the downsides, especially since
walking the tree upwards is seldomly done.

Adjustments

For memory efficiency, some subtree root nodes (like mnkPathPos)
stored extra information in the root node. Since all subtree root nodes
now need to store the length, this is no longer possible, and the
information has to be moved into separate nodes:

  • for call trees, whether the call has side-effects is stored as an
    immediate value (using the new mnkImmediate node) in the tree's
    first slot
  • for field/positional access, the field/position is stored in the
    tree's second slot

For easier processing of mnkObjConstr/mnkRefConstr trees, the
mnkBinding is introduced, which groups the mnkField and associated
expression together in a subtree.

The rest of the changes are about:

  • updating the MirBuilder routines to track the child node count and
    patch the root nodes
  • updating the construction of MIR trees in various places
  • updating the various tree traversal logic that expected the old
    layout

zerbina added 11 commits June 6, 2024 00:51
It has the same role as the `cnkBinding` node in the CGIR, that is, to
group a field name and value expression in an object construction
expression together.

Previously, the `mnkField` and expression nodes loosely followed each
other, making `mnkObjConstr` subtrees heterogenous, which complicates
some traversal (or at least it will, once more traversal is tree-
length-based).
`mnkPathNamed`, `mnkPathVariant`, `mnkPathPosition`, and `mnkName` are
all going to change shape. For better forward compatibility, dedicated
construction template for them are added and used.

The `ekNone` effect kind is introduced for being able to signal "no
effect" when constructing a `mnkName` tree.
Instead of requiring delimiter nodes, subtree root nodes now store a
their number of subnodes. Multiple subtree root nodes previously stored
extra information, which is now no longer possible.

Most tree traversal routines are updated accordingly, and those
specific to `mnkEnd` nodes are removed.

Since it's obsolete now, the `GeneralEffect` enum type is removed.
* skipping end nodes is unnecessary and wrong now
* instead of iterating until an `mnkEnd` node is found, a for loop
  using the node count suffices
A `MirBuffer` instance keeps the track of the number of the subnodes in
the currently constructed tree, updating the `len` field in the root
node on subtree completion.

The value is saved on the stack when starting a nested subtree. For
manual subtree construction (i.e., not using the `subTree` template),
the `start` and `finish` procedures are provided.

Since the builder manages updating the length field, manually providing
the number of nodes is unnecessary (`mirgen` is updated accordingly).
Subtree nodes no longer being able to store non-length information
requires the information to be stored elsewhere. The `mnkImmediate`
node is added, and used as an ad-hoc way of storing information in
the tree.

* call trees store whether the call modifies some global state in
  an immediate value in the first slot
* `mnkName` trees store the tag as an immediate value in the first
  subnode -- `mnkTag` nodes are removed
* the named-field-access trees store the field position in an
  `mnkField` node in the second slot
* `mnkPathPos` stores the position as an immediate value in the
  second slot

The tree construction throught the MIR processing is updated
accordingly.
It replaces the now-unavailable `findEnd` + `previous` combination.
* inspection of call, path, and `mnkName` is adjusted to work with the
  new tree shapes
* dedicated query procedures are used instead of manual indexing where
  sensible
There's no more end node that needs to be patched.
@zerbina zerbina added refactor Implementation refactor compiler/backend Related to backend system of the compiler labels Jun 6, 2024
@zerbina zerbina added this to the MIR phase milestone Jun 6, 2024
Manually constructing a `mnkCall`/`mnkCheckedCall` sub-tree is prone to
mistakes, and it also complicates future changes to the syntax (should
any be made). `mirconstr` now provides the `rawBuildCall` template for
building call trees where full control over the makeup is needed.
@zerbina
Copy link
Collaborator Author

zerbina commented Jun 9, 2024

Using a -d:release compiler and compiling the current devel (96cefc4) compiler, I get the following benchmarks results:

hyperfine --warmpup 1 nim c --compileOnly --verbosity:0 --hints:off --warnings:off compiler/nim.nim

before  15.218 s ±  0.181 s
now     14.989 s ±  0.169 s

@zerbina zerbina marked this pull request as ready for review June 9, 2024 19:16
@saem
Copy link
Collaborator

saem commented Jun 9, 2024

FYI: minor updates to PR description (typo/grammar)

Copy link
Collaborator

@saem saem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor comments, but I think it's good to go.

compiler/backend/cgirgen.nim Outdated Show resolved Hide resolved
compiler/backend/cgirgen.nim Outdated Show resolved Hide resolved
compiler/mir/mirconstr.nim Outdated Show resolved Hide resolved
compiler/mir/mirconstr.nim Outdated Show resolved Hide resolved
@zerbina
Copy link
Collaborator Author

zerbina commented Jun 10, 2024

Thank you for the review, @saem.

@zerbina
Copy link
Collaborator Author

zerbina commented Jun 10, 2024

/merge

Copy link

Merge requested by: @zerbina

Contents after the first section break of the PR description has been removed and preserved below:


Notes for Reviewers

@chore-runner chore-runner bot added this pull request to the merge queue Jun 10, 2024
Merged via the queue into nim-works:devel with commit 33c8184 Jun 11, 2024
31 checks passed
@zerbina zerbina deleted the mir-no-end-nodes branch June 13, 2024 20:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/backend Related to backend system of the compiler refactor Implementation refactor
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants