Parse all modules in a project before beginning the type-checking stage #1578
I'm not sure that the logic for hunting down files for sub-modules is something that belongs in the parser. The parser is, in my mind at least, just something that converts a block of text to an AST (where that AST represents the structure of the original text as literally as possible). The old parser already had far too much logic built into it, which is why the old AST still exists and we now have another AST layer on top of it. So I'm wary of heading in that direction again. I think it's important to keep the stages of the compilation pipeline as simple and well-defined as possible. I think the general idea of what you're proposing is right though, just that it belongs in sway-core. Consider, for instance, that if we ever decide to add Rust-like macros then it won't be possible to know about sub-modules until after macro expansion.
Thanks for the input @canndrew, that makes a lot of sense. Revising the "Proposed Solution" above with this in mind, I'm thinking we:
Yep 👍
I 100% agree with this. But, at a high level, I think that this proposal is missing a key step that will create a powerful mechanism in the compiler, one that we can use to solve a bunch of different issues. At a high level I propose this design:

Step 1) The parsing phase. All modules are parsed, separately. The goal of this step is to transform simple tokens into an internal AST representation. This could be the …

Step 2) The "collection" phase. This is a new phase that we would need to implement, and what it would do is this: it would gather high-level information about the declarations inside of each module, without type checking the contents of these declarations. The result of this phase would be some sort of "collection context" that contains key information about struct definitions, enum definitions, function definitions, trait definitions, impl definitions, and more.

Step 3) The type checking phase. During this phase, the "collection context" would be passed around during type checking, such that all declarations (enum/struct/etc.) have the ability to have knowledge about every other declaration. Importantly, we would need to create a new data structure to allow us to do this without dummy function bodies. Similar to the type engine, I propose a "declaration engine" that allows declarations of the type …

The benefits of this design are:
To be clear @mitchmindtree, the "recent idea w.r.t. introducing a new "collection" stage prior to type-checking" idea that I mentioned previously would encapsulate the changes that you are hoping to see. By introducing the idea of a "collection" stage as a mechanism, we would be able to apply that mechanism to a variety of issues.
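To make that mechanism concrete, here is a minimal Rust sketch of a "collection context" backed by a "declaration engine". All of the names here (`DeclarationEngine`, `DeclarationId`, `CollectionContext`) are hypothetical illustrations of the idea, not actual compiler types:

```rust
use std::collections::HashMap;

// A parsed, but not yet type checked, declaration (heavily simplified).
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum Declaration {
    Function { name: String, params: Vec<String>, return_type: String },
    Struct { name: String, fields: Vec<(String, String)> },
    TraitImpl { trait_name: String, type_name: String },
}

// An opaque handle to a declaration, much like a type engine's type ids.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct DeclarationId(usize);

// The "declaration engine": stores each declaration once and hands out ids,
// so later stages can refer to declarations without dummy bodies.
#[derive(Default)]
struct DeclarationEngine {
    slab: Vec<Declaration>,
}

impl DeclarationEngine {
    fn insert(&mut self, decl: Declaration) -> DeclarationId {
        self.slab.push(decl);
        DeclarationId(self.slab.len() - 1)
    }

    fn get(&self, id: DeclarationId) -> &Declaration {
        &self.slab[id.0]
    }
}

// The "collection context": the output of the collection phase, mapping names
// to declaration ids across *all* modules, before type checking begins.
#[derive(Default)]
struct CollectionContext {
    engine: DeclarationEngine,
    by_name: HashMap<String, DeclarationId>,
}

impl CollectionContext {
    // Collect a declaration's signature without touching its body.
    fn collect(&mut self, name: &str, decl: Declaration) -> DeclarationId {
        let id = self.engine.insert(decl);
        self.by_name.insert(name.to_string(), id);
        id
    }

    // During type checking, any module can look up any other declaration.
    fn lookup(&self, name: &str) -> Option<&Declaration> {
        self.by_name.get(name).map(|id| self.engine.get(*id))
    }
}

fn main() {
    let mut ctx = CollectionContext::default();
    ctx.collect("double", Declaration::Function {
        name: "double".to_string(),
        params: vec!["x: u64".to_string()],
        return_type: "u64".to_string(),
    });
    // Type checking of some *other* module can now see `double`'s signature
    // even though its body has not been type checked yet.
    assert!(ctx.lookup("double").is_some());
}
```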
This sounds interesting and could be a really valuable addition! A few questions for my own clarity:
We already have this via the …

I think there may be some issues here that are misrepresented, or that I'm understanding incorrectly. I went through them briefly and maybe you can point out where I'm missing something, as I'm not 100% seeing the link yet:
I'm reading these issues and thinking that #1557 is the core issue that you're suggesting this collection phase would solve, but actually that's a codegen/type checking thing and has no relation to the fact that modules are parsed in a project after type checking begins. Could you elaborate on how this could impact #1557? With regards to the titular issue, parsing all modules brought in by …
@emilyaherbert I'm very excited about your proposal for a collection phase between parsing and type-checking! I think it makes a lot of sense, and will not only assist with monomorphizing, but also help to resolve the order in which declarations should get type-checked across modules more generally, and probably a bunch of other useful pre-type-check preparation stuff as you mention :) Would you mind opening a dedicated issue for the new collection phase with all of these motivations and related issues? While I understand there's some overlap with this issue, my original intent for this issue was to be a small, actionable, easily reviewable first step toward these larger refactorings that we want to make. I still think it might be worth addressing the separation of parsing and type-checking as described in this issue first, and that doing so would make the addition of a new collection phase in between just a little easier/smaller.
I am proposing that we perform a fundamental shift in how the compiler thinks of 1) AST node ordering, and 2) how the compiler performs its internal tasks.
Yes. This proposal suggests that we parse all the Sway files involved in a project first. Then perform the "collection" phase on all files, creating a "collection context." The "collection context" would be used to perform type checking, IR stuff, code gen, etc, such that it would make the concept of evaluating the AST in the "right order" obsolete.
Yes and no.
Sure, but this is a difficult task when the …
At a high level, there are several issues that I believe to be intractable without this proposal. There are also issues that could be patched or avoided given the current system. But they must all be addressed individually, each requiring its own level of involvement in creating a new mechanism within the compiler (additional checks, dummy functions, etc). In addition to tackling the intractable issues, this proposal seeks to eliminate a wide swath of other issues at the fundamental level, without the need to address them individually. Additionally, I believe that "special-casing" particular fixes and checks is an anti-pattern, and this proposal eliminates much of the need for "special-casing." Let me go through the issues individually:
This proposal would allow us to perform type checking of recursive functions without the need for dummy functions, and would make the idea of a "function engine" (discussed on Slack) significantly simpler. i.e. imagine that the "collection phase" collects: 1) function signatures for all functions, 2) information about all traits, 3) signatures for methods on traits, 4) information about which traits are implemented for which types, across all Sway files, and 5) the signatures of all of those methods from all of those traits for all of those types. All before type checking. This is information that we fundamentally do not have in the compiler during type checking as it stands currently.
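As a minimal Rust illustration (not Sway, and not compiler code) of why up-front signatures remove the need for dummy bodies:

```rust
// Two mutually recursive functions. To type check the body of `is_even`, a
// checker only needs the *signature* of `is_odd` (which a collection phase
// gathers up front); it never needs an already-checked body or a dummy one.
fn is_even(n: u64) -> bool {
    if n == 0 { true } else { is_odd(n - 1) }
}

fn is_odd(n: u64) -> bool {
    if n == 0 { false } else { is_even(n - 1) }
}

fn main() {
    assert!(is_even(10));
    assert!(is_odd(7));
}
```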
This proposal shifts this mindset. There is no need for a check for recursive dependencies. Recursive dependencies are fully supported, out of the box.
There is no need for a manual check. Recursive …
This proposal encompasses this change. The "collection context" could contain information about the dependency graph, making the concept of "explicit ordering" obsolete.
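A sketch of the dependency information the "collection context" could carry: an explicit module graph, letting the compiler compute a valid processing order itself rather than relying on source-level ordering. All names here are hypothetical, not actual compiler types:

```rust
use std::collections::{HashMap, HashSet};

#[derive(Default)]
struct ModuleGraph {
    deps: HashMap<String, Vec<String>>, // module -> modules it depends on
}

impl ModuleGraph {
    fn add_dep(&mut self, module: &str, depends_on: &str) {
        self.deps
            .entry(module.to_string())
            .or_default()
            .push(depends_on.to_string());
    }

    // Depth-first post-order: dependencies are scheduled before dependents,
    // no matter how the dep statements were ordered in the source.
    fn process_order(&self, root: &str) -> Vec<String> {
        let mut seen = HashSet::new();
        let mut order = Vec::new();
        self.visit(root, &mut seen, &mut order);
        order
    }

    fn visit(&self, module: &str, seen: &mut HashSet<String>, order: &mut Vec<String>) {
        // Marking a module as seen on entry also stops on recursive
        // dependency cycles, which this design supports rather than rejects.
        if !seen.insert(module.to_string()) {
            return;
        }
        for dep in self.deps.get(module).into_iter().flatten() {
            self.visit(dep, seen, order);
        }
        order.push(module.to_string());
    }
}

fn main() {
    let mut graph = ModuleGraph::default();
    graph.add_dep("main", "utils");
    graph.add_dep("utils", "core");
    assert_eq!(graph.process_order("main"), ["core", "utils", "main"]);
}
```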
Yes, we could stop inlining functions as a result of this proposal. As I mentioned above, this proposal would help us construct a "function engine", meaning we could stop inlining functions without the need for introducing dummy functions. Also, I will note that this is not just a codegen issue #1557. Before codegen even occurs, type checking is currently unequipped to handle non-inlined functions.
The current method of monomorphization creates one copy per call to each function or method, leading to wasted compute and unneeded code size. This PR tracks changing that to one copy per type-usage pattern, and will need to go in after we have stopped inlining functions.
This proposal removes the need for a manual check. Out of order …
Fundamentally, this proposal is not just a parsing change. It is a complete shift in how the compiler thinks of dep/decl/impl/etc ordering, and it eliminates the need for specific ordering at all. During implementation of this shift, parsing will be affected. Given this example:

```sway
struct Foo<T> {
    x: T
}

impl<T> Foo<T> where T: Double<T> + Triple<T> {
    fn math(self) -> Foo<T> {
        let a = self.x.double();
        let b = self.x.triple();
        Foo {
            x: a + b
        }
    }
}

trait Double<T> {
    fn double(self) -> T;
}

trait Triple<T> {
    fn triple(self) -> T;
}

impl Double<u64> for u64 {
    fn double(self) -> u64 {
        self + self
    }
}

impl Triple<u64> for u64 {
    fn triple(self) -> u64 {
        self + self + self
    }
}
```

During type checking of the … The change might not be so apparent with this example. But importantly, mutually recursive impls would be inherently supported.
I don't know what specific additional information you have in mind, but the "collection context" could help us gather that information.
This proposal would make the idea of a "function engine" and delayed monomorphization significantly easier. i.e. imagine that the "collection context" collects information about which types use each generic function. If this were the case, we could perform monomorphization separately, and could save compute by avoiding non-unique monomorphization. Rust does something similar: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_monomorphize/collector/index.html
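A toy sketch of that "collect type-usage patterns" idea: one monomorphized copy per unique (function, type arguments) pair rather than one per call site. The names are hypothetical, my own illustration:

```rust
use std::collections::HashSet;

// Records which concrete type arguments each generic function is used with,
// as gathered during collection. Monomorphization then emits one copy per
// *unique* instantiation instead of one per call site.
#[derive(Default)]
struct MonoCollector {
    instantiations: HashSet<(String, Vec<String>)>, // (function, type args)
}

impl MonoCollector {
    // Returns true only the first time a (function, type args) pair is seen,
    // i.e. when a fresh monomorphized copy actually needs to be generated.
    fn record(&mut self, function: &str, type_args: &[&str]) -> bool {
        self.instantiations.insert((
            function.to_string(),
            type_args.iter().map(|s| s.to_string()).collect(),
        ))
    }
}

fn main() {
    let mut mono = MonoCollector::default();
    assert!(mono.record("math", &["u64"])); // first use with u64: generate a copy
    assert!(!mono.record("math", &["u64"])); // second call site: reuse the copy
    assert!(mono.record("math", &["u32"])); // new type pattern: generate again
}
```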
This definitely could be patched with the current system. However, it would still be affected by this proposal.
This proposal means that this check could happen before type checking if we wanted, potentially saving compute.
Currently, type checking is conflated with dependency resolution, meaning that ripping out type inference will be a significant amount of work. This proposal lays the groundwork for adding this feature in a simpler way.
This proposal would make this simpler.
Inlining functions is currently limited by type checking #1557. Type checking does not currently have any mechanism for handling non-inlined functions. We chatted about this on Slack, but I believe the best solution to this is the concept of a "function engine", and the concept of a function engine is made significantly easier with this proposal. Without this proposal, the function engine would rely heavily on a mechanism that accounts for functions or methods that may or may not exist. But with this proposal, during type checking when the function engine is being populated and used, it would have knowledge of the existence of all functions and methods in the project.
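A rough Rust sketch of the "function engine" shape being described here (my illustration; `FunctionId` and friends are hypothetical names):

```rust
use std::collections::HashMap;

// Function bodies are type checked once and stored in the engine; call sites
// then hold a FunctionId instead of an inlined copy of the body.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct FunctionId(usize);

struct TypedFunction {
    name: String,
    // ... typed parameters, return type, and typed body would live here ...
}

#[derive(Default)]
struct FunctionEngine {
    functions: Vec<TypedFunction>,
    by_name: HashMap<String, FunctionId>,
}

impl FunctionEngine {
    // Because the collection phase registered every function in the project
    // up front, a call site can always resolve its callee to an id; the
    // engine never has to model functions that "may or may not exist".
    fn insert(&mut self, function: TypedFunction) -> FunctionId {
        let id = FunctionId(self.functions.len());
        self.by_name.insert(function.name.clone(), id);
        self.functions.push(function);
        id
    }

    fn resolve(&self, name: &str) -> Option<FunctionId> {
        self.by_name.get(name).copied()
    }

    fn get(&self, id: FunctionId) -> &TypedFunction {
        &self.functions[id.0]
    }
}

fn main() {
    let mut engine = FunctionEngine::default();
    let id = engine.insert(TypedFunction { name: "double".to_string() });
    assert_eq!(engine.resolve("double"), Some(id));
    assert_eq!(engine.get(id).name, "double");
}
```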
Yes, but it would be made easier with this proposal.
This would be made easier with this proposal.
Inlining would be significantly easier with this proposal.
This proposal makes this issue solvable, without the need for special-casing. Currently, it would be impossible to preemptively know that an enum declaration contains a recursive element without doing it inside type checking.
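For intuition, a small Rust sketch (hypothetical, my own) of how recursion in an enum declaration could be detected from collected declarations alone, before type checking:

```rust
use std::collections::{HashMap, HashSet};

// With every enum declaration collected up front, recursion can be found by
// walking "variant mentions type" edges; no type checking of bodies needed.
fn is_recursive(
    decls: &HashMap<&str, Vec<&str>>, // enum name -> type names its variants mention
    root: &str,
) -> bool {
    let mut seen = HashSet::new();
    let mut stack = vec![root];
    while let Some(name) = stack.pop() {
        if !seen.insert(name) {
            continue; // already visited
        }
        for &mentioned in decls.get(name).into_iter().flatten() {
            if mentioned == root {
                return true; // a cycle leads back to the root enum
            }
            stack.push(mentioned);
        }
    }
    false
}

fn main() {
    let mut decls = HashMap::new();
    // e.g. enum Expr { Lit(u64), Neg(Expr) } mentions itself
    decls.insert("Expr", vec!["u64", "Expr"]);
    assert!(is_recursive(&decls, "Expr"));
}
```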
This is not the core issue that this proposal solves. This proposal introduces a "collection context" mechanism that has many applications. One application is that it would eliminate the concept of "dep/impl/decl ordering" from the compiler. Another application is that it would ease efforts towards function inlining and towards reducing non-unique monomorphization. Another application could be improving type inference.
SGTM @mitchmindtree!
Currently, we parse and type-check each module in a Sway project one at a time in a rough depth-first order.
We should separate the concerns of parsing and type-checking into separate stages, both for code clarity and to unlock solutions to many issues related to inter-module definitions and the order in which we type-check items.
From a quick scan, I believe this is a blocker (or at least a first stage) for addressing the following:

- `use` fails. #201
- `dep` statements is required to resolve dependencies. #409
- `type_check`? #1267

Potential Solution
1. The existing `sway_parse::Program` type more closely matches what we currently conceive of as a parsed "module" in Sway. We should rename what is currently referred to as `program` to `module`. I.e. `sway_parse::program` becomes `sway_parse::module`, and `sway_parse::Program` becomes `sway_parse::Module`.
2. Re-introduce a new `sway_parse::Program` type that is a small wrapper around a tree of `sway_parse::Module`s.
3. Add a high-level function to the `sway_parse` lib.rs called `parse_project` that begins by calling `parse_file` for the root module, then loads and parses all submodules based on the parent module's parsed `dependencies`.
4. Refactor the `type_check` step in terms of the new `sway_parse::Program` type. More specifically, refactor the `import_new_file` function to `import_new_dependency`, where rather than loading and parsing the file, we look up the pre-parsed module in the module tree.

Marking as `P: Critical` as this is a blocker for #409 (critical) as well as a number of other high-priority issues.
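A rough sketch of step 3, with hypothetical types and signatures rather than the actual `sway_parse` API:

```rust
use std::path::{Path, PathBuf};

// Hypothetical stand-ins for the parsed-module types described above.
#[allow(dead_code)]
struct Module {
    dependencies: Vec<PathBuf>, // paths gathered from the module's dep statements
}

#[allow(dead_code)]
struct Program {
    root: Module,
    submodules: Vec<(PathBuf, Program)>, // a tree of parsed modules
}

// Parse a single file into a Module (tokenizing/parsing elided here).
fn parse_file(path: &Path) -> Result<Module, String> {
    let _src = std::fs::read_to_string(path).map_err(|e| e.to_string())?;
    Ok(Module { dependencies: Vec::new() })
}

// Parse the root module, then load and parse every submodule it names, so the
// whole module tree exists before type checking ever starts.
fn parse_project(root_path: &Path) -> Result<Program, String> {
    let root = parse_file(root_path)?;
    let dir = root_path.parent().unwrap_or_else(|| Path::new("."));
    let mut submodules = Vec::new();
    for dep in &root.dependencies {
        submodules.push((dep.clone(), parse_project(&dir.join(dep))?));
    }
    Ok(Program { root, submodules })
}

fn main() {
    // With a real project on disk this would parse the whole module tree:
    // let program = parse_project(Path::new("src/main.sw"));
    let _ = parse_project; // referenced only to keep the sketch self-contained
}
```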