Open question: Calling functions defined later in the same file #472
Comments
A couple thoughts:

An advantage of requiring a forward declaration is that readers only need to look up when reading a file.

An advantage of using name lookup to solve this is that readers and writers only need to deal with the definition. That is, they may need to look in both directions to find it, but they won't first find a forward declaration and then need to keep reading. Similarly, a code maintainer doesn't need to keep a forward declaration in sync.

I think using name lookup to solve this is more consistent with other languages (C/C++ is exceptional for using forward declarations). |
On #438, @josh11b said regarding keeping a forward declaration in sync:
I believe this accurately reflects the ability of a compiler to detect issues, but requiring forward declarations is still adding developer toil. |
To briefly state my preference, I would like to take the restrictive approach here of requiring a forward declaration. I understand that we can relax this with name lookup rules, but currently I'm in favor of avoiding it. I think it makes name lookup and other things simpler. I also think it is nice to have the "only look up" rule. The other important factor is that I think there will be a strong need to have forward declarations even without relaxing this. So in the cost/benefit analysis of allowing calling functions later in a file, I don't think the benefits include "no more forward declarations in the language". Without that benefit, I think the costs aren't justified. |
Do you think that the complexity will be to the implementation of Carbon, or complexity as perceived by developers using Carbon? In Code and Name Organization, it's called out that small libraries could be in a single |
My guess is that @chandlerc meant "requiring forward declarations to call a function defined later in the file", but we may have to wait for him to get back from his vacation to get clarification. |
@chandlerc just confirmed my interpretation over chat. |
FWIW, I also think Jon's question is reasonable. But I think there will be libraries that want to use a single file (even if they have some cyclic reference requiring forward declarations) because that simplifies their build / distribution / whatever. I hope we have package management sufficient to obviate many if not all of these concerns, but it still seems good to allow a single file when desired. |
I think this is a bit of an aside, but I would anticipate the top reason for a single file to be simplifying development. That is, in multi-file development, it's necessary to edit both files in tandem, sometimes searching in each for definitions.

From @josh11b's comment, I think I'm being misunderstood, so let me try to clarify my comments: I'm trying to suggest how developers would be affected, in ways that bear on the understandability goal. I believe most developers would prefer to have the API at the top of the file, and internal implementation calls towards the bottom. I'm following a chain of thought that functions will call other functions, or access data. Typically the functions that call others are higher-level, and would more naturally float to the top of an API description. Forcing "only look up" inverts that ordering unless forward declarations are used for the API functions. C++ effectively does that with In the case of a single However, I perceive it as a strong push towards

In chat, I also brought up structs and whether consistency with the "only look up" rule would also apply there. I think this has stronger understandability implications, as it breaks consistency with C++. In particular, I assume we'll provide something like "private" and "public" access controls in C++. C++ Core Guidelines and Google C++ Style are examples that place "public" before "private", and functions before data. When implementing a class, I would typically expect that public functions call private, and private functions may use both public and private variables. As a consequence of an "only look up" rule, Carbon style would need to invert the C++ ordering to follow "only look up": place "private" before "public", and data before functions. There would further need to be exceptions where, for example, a private function may reuse a public function.
Forward declaring classes would only be a partial solution, as trivial functions (particularly accessors and mutators) would start to dictate order:
I would tend to expect APIs to be at the top of files, which is the opposite of what an "only look up" rule leads to. So while I may not grasp the amount of implementation complexity this adds, I do see this as an ongoing writability and understandability issue for developers. Lacking examples of this limitation in other languages, I also see an "only look up" rule as an undesirable innovation of Carbon. |
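For reference, the C++ ordering this comment describes can be sketched as follows. This is only an illustration of existing C++ behavior, not of any proposed Carbon rule; the class `Counter` and its members are hypothetical names made up for the example. Inside a C++ class, inline member function bodies are compiled as if the class were already complete, so the public API can come first and still use private helpers and data declared later.

```cpp
#include <cassert>

class Counter {
 public:
  // API first: this body calls a private helper and touches a data member,
  // both of which are declared further down in the class body.
  int Next() { return Clamp(++count_); }

 private:
  // Helper declared after its caller; it in turn uses kMax, declared below.
  int Clamp(int v) const {
    if (v > kMax) return kMax;
    return v;
  }

  static constexpr int kMax = 3;
  int count_ = 0;
};

// Small self-check of the behavior described above.
inline void CheckCounter() {
  Counter c;
  assert(c.Next() == 1);
}
```

Under a strict "only look up" rule applied to class bodies, the same class would have to list `count_`, `kMax`, and `Clamp` before `Next`, which is the inversion of the usual C++ style that the comment is concerned about.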
I have a strong desire and a weak desire here. Setting aside, for the moment, scopes in which declaration order has an inherent meaning (such as in a function body where declaration order implies ordinary execution order): Strong desire: reordering declarations within a scope cannot change one valid program into another valid program with a different meaning. We can accommodate my strong desire alone, by restricting name shadowing and being careful about making negative properties ("has this type not been defined yet?") observable, without making the processing actually be order-independent. I think these are the costs associated with making declarations within a scope be order-irrelevant:
Regarding implementation complexity, something we need to think hard about is phase ordering. If we allow:
... then we have a problem that can easily lead to circularities:
(though in practice the circularities might be a lot less obvious). Here, we can't type-check Prior art, in languages with some amount of order-independence and also some amount of metaprogramming (more examples would be useful):
|
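The "strong desire" stated in this comment (reordering declarations should not turn one valid program into another valid program with a different meaning) is easiest to see by looking at a language that does not guarantee it. The following is a small C++ sketch, with hypothetical function names, in which the position of a declaration silently changes which overload a call selects.

```cpp
#include <cassert>

int Pick(long) { return 1; }

// Only Pick(long) has been declared at this point, so the call converts
// the int argument to long and selects it.
int CallEarly() { return Pick(0); }

int Pick(int) { return 2; }

// Both overloads are visible here; Pick(int) is an exact match and wins.
int CallLate() { return Pick(0); }

// Small self-check of the behavior described above.
inline void CheckPick() {
  assert(CallEarly() == 1);
  assert(CallLate() == 2);
}
```

Moving the definition of `Pick(int)` above `CallEarly` keeps the program valid but changes what `CallEarly` returns. Restricting shadowing and overload visibility, as the comment suggests, is one way to rule this kind of change out without making processing fully order-independent.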
If we decide to require function declarations to precede calls in all cases, we will need to revisit #438, which says:
... but this would prevent (for example) mutually recursive functions that are not part of the API of a library from being defined in an |
I'd like to suggest we start with a restrictive rule that declarations must precede unqualified names in all cases. Reasons:
I think the biggest issue is the ergonomic cost to member functions defined lexically inside the class body. That is the one place where C++ currently works to provide order-independence and regressing that I think would be very costly for users. However, the design we have for member functions has already solved this problem because we aren't using unqualified name lookup to find instance members, and instead providing an explicit object parameter

    class X {
      fn Method[me: Self]() {
        // This would be an error:
        var y: NestedType = {};
        // But this would be fine:
        var z: i32 = me.OtherMethod().GetNumber();
        assert(z == 42);
      }
      class NestedType {
        fn GetNumber[me: Self]() -> i32 { return 42; }
      }
      fn OtherMethod[me: Self]() -> NestedType {
        return {};
      }
    }

But requiring nested types to be forward declared (or the member function defined out-of-line) seems much less burdensome than requiring it for all instance members. The use of What do folks think? (edited to provide a more complete example) |
To help me respond, are there options that don't require forward declarations? Considering the particular line, would |
Whether And I'd expect We have to defer type checking the body of lexically nested member function definitions -- otherwise they'd be incapable of calling any member functions on So the only really interesting question is what to do with unqualified name lookup. Having that not depend on type checking seems valuable -- it lets us determine the relationship between top level declarations without doing (potentially expensive, and potentially impossible for mid-edit files) type checking or loading imports.
With |
To clarify the extent of this rule, within a file would you be able to refer to functions defined later as long as you didn't use an unqualified name for them? |
I think this is a somewhat separable set of questions. For qualified names derived from a type, there is a compelling argument that they should be part of type checking, and we're expecting the set of names to be complete when the type is complete. And then, where you can reference a name within a type before it is complete (lexically nested method definitions for example), that needs to wait until complete and be consistent with name lookup when complete. For qualified names from a namespace, I think we could try to make them work in either way if we want. Personally, I would prefer for namespaces to work very similarly to unqualified name lookup -- we've been trying to articulate them as just a useful way to build structured names. They're never "complete", etc. And so for me, that indicates the model I would prefer. While unqualified is "only look up", I'd suggest the same rule for namespaces. |
Chatted again with @zygoloid and we're both pretty happy with the direction I suggested above. I think @zygoloid is going to work on a proposal to actually document this design. To more fully answer @josh11b's question around whether qualifying the name would allow use further down in the file: my prior answer was more complicated than it needs to be. The idea here is to look up each name in a dotted sequence as early as possible. We should keep doing that lookup until we find a name that we cannot look up until some form of deferred type checking. We handle the type of a class inside its definition the same way we would handle type template parameters (if we have templates) -- they're "dependent" until the end of the class definition, and then we finish the type checking. The results of this rule are:
But the difference between when a name qualified by a namespace and by a type is looked up above doesn't result in anything surprising within class definitions -- the set of namespace names doesn't change between inside the class body and it being completed. The difference is only observable with templates, and even then the original consistent rule is followed in both cases. We always walk as far down the dotted sequence of names as we can, and stop when we find the type of an enclosing class body or a dependent type. We can even imagine namespace names that are dependent:
One observation of this example that I didn't realize when discussing with Richard: it means we have the possibility of ODR violations of templates, because we have allowed dependent lookup into an extensible space -- namespaces. Two different calls to It is tempting to ask if doing name lookup globally would remove the potential ODR issue, but globally within the file doesn't help at all. It would need to do lookup across all the files that could define There is a name lookup strategy that would help with the ODR issue: we could insist that even a dependent namespace lookup only looks "up" lexically, regardless of when it is instantiated. This would define away the ODR issue I think. However, we should be careful that this more restrictive model doesn't break real use cases for aliasing a namespace. Another change we could make to define away this problem is to disallow aliasing a namespace as a member of a class. This is what I suggest, as it would make things more uniform IMO. Using an alias to create a nesting structure that cannot be created without an alias seems bad to me. But this is an orthogonal change to name lookup rules IMO. |
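As a point of comparison only, C++ already has a form of dependent lookup into an extensible space, and it shows the kind of hazard discussed here. The sketch below uses hypothetical names (`CallHelper`, `lib::Widget`, `lib::Helper`) and illustrates C++ two-phase lookup, not Carbon's rules: an unqualified call with a dependent argument is resolved at the point of instantiation using argument-dependent lookup, so it can find a function that is declared after the template itself.

```cpp
#include <cassert>

// Helper is not declared yet. Because the argument is type-dependent, the
// call is resolved when the template is instantiated, using
// argument-dependent lookup in the namespaces associated with T.
template <typename T>
int CallHelper(T value) {
  return Helper(value);
}

namespace lib {
struct Widget {};

// Declared after the template, but found through ADL at instantiation.
int Helper(Widget) { return 7; }
}  // namespace lib

int UseWidget() { return CallHelper(lib::Widget{}); }

// Small self-check of the behavior described above.
inline void CheckHelper() { assert(UseWidget() == 7); }
```

Because a namespace can be reopened, two files that see different sets of `Helper` overloads before instantiating `CallHelper<lib::Widget>` could give the same instantiation different meanings, which is the ODR concern raised in this comment. A lookup that only ever looks "up" from the template would not find `lib::Helper` here at all.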
I'd like to note a consequence of #665 and #752 if forward declarations in the same file are required, and visibility markers on the separate definition are disallowed. Consider an
In the above example, someone reading the definition of A few approaches that I think would address this:
|
Noting a comment @chandlerc made in a meeting about this applying to forward declarations of member functions, I think that would turn the definition into |
We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please comment or remove the |
@zygoloid and I have had a series of discussions about this, trying to make progress. To be clear, we see two primary directions that we could pursue here:
The approach needed for (1) seems clear and well understood. Similarly, the consequences of that approach. I won't try to spell out that in more detail. There are some concerns with approach (1) that motivate looking at (2) at all. This isn't intended to be all of the concerns or even a ranked list, but just examples:
There is also a very specific ergonomic use case that motivates not strictly following the direction of (2): inline nested function definitions. We both see a very strong use case for allowing the definition of functions nested within classes, where the class type is complete within the function body. This is pervasively used in C++, and Carbon further relies on it for easy definition of factory functions.

Unfortunately, the original suggested rules for (2) don't work well when considering both inline and out-of-line method definitions in classes with inheritance or parameterized classes. The rules would cause inline and out-of-line definitions to have surprisingly different rules around name lookup. These would either hurt the ergonomics of inline definitions or create significant confusion (or both).

In our discussions, I think we came up with a simpler alternative semantic model for (2) that doesn't have these problems. It focuses very specifically on nested function definitions as an ergonomic affordance, with rules to make it effective at being ergonomic.

We start from an extremely simple set of rules: names must be declared before they are used. It applies consistently to all unqualified names. Further, the information being used should be introduced before it is used. If you want to write Without function bodies nested within classes, this works well. It largely forces a topological order to the source code, while allowing explicit forward declarations to deviate from that or break cycles when needed. It matches the simple and well understood parts of C++'s rules. It also causes the source to show in an obvious way the inherent acyclic order that must exist. Wherever approach (1) would detect a cycle and reject the code, there would also be no viable source order.

Then we handle nested function definitions specially. For example:
The '{}'-ed body of
Here, all the nested function definitions are parsed as-if defined out-of-line immediately after we return to the top level of the file, exactly in the order they are written. So with two levels of nesting and some other interesting cases:
The result would be as if:
Basically, nesting function definitions is just an ergonomic affordance, nothing more. An advantage of this formulation of (2) is that the rules seem very simple to teach and reason about, and it ensures nested definitions don't behave differently from out-of-line definitions. There are a number of corner cases that make C++'s version of this much more complex, but I think we can pick simple answers for Carbon by focusing on inline definitions being a convenience ergonomic feature.
The first point is the really big simplifying thing, and it makes inline and out-of-line definitions much more consistent. Without this, inline/nested definitions can't consistently have the type be complete or declare variables of that type. Whether that would be allowed depends on whether the function is called as part of defining the type (and thus creating a cycle). All of these together fit with the consistent model of nested definitions being exactly the same as out-of-line definitions at the top level. With this (much improved) idea of how (2) would work, I think both @zygoloid and I were at least convinced that both (1) and (2) would be workable and not have serious problems. They seem like reasonable directions, and the question is now much more -- what direction is best? It's worth noting that we could treat unqualified name lookup independently from qualified name lookup here if we wanted, but so far that hasn't significantly helped us get closer to consensus. Another thing worth noting is that as (2) is phrased here, there would be a smooth evolution path towards (1) if desired, and there again qualified and unqualified name lookup could evolve independently towards (1) if desired. Valid programs under (2) would also be valid and have the same meaning as (1). This feels related to the aesthetics of the language, how critical it is to have code organization follow similar structures to C++, and how worried we are about the interactions with metaprogramming. We didn't reach a conclusion between (1) and (2) here, I just wanted to write up a description of the much more workable model for (2) so we were all comparing more realistic formulations of these directions. |
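The "nested definitions behave exactly like out-of-line definitions" model described in this comment has a close C++ analogue, sketched below with hypothetical class names. It is only meant to illustrate the equivalence being aimed for, including the factory-function case mentioned earlier, and says nothing about the final Carbon syntax or rules.

```cpp
#include <cassert>

// Inline form: the factory body treats the class as complete, so it can
// declare a variable of the enclosing type and use a member declared later.
class InlineForm {
 public:
  static InlineForm Make() {
    InlineForm v;
    v.value_ = 42;
    return v;
  }
  int value() const { return value_; }

 private:
  int value_ = 0;
};

// Out-of-line form: the class body only declares the functions...
class OutOfLineForm {
 public:
  static OutOfLineForm Make();
  int value() const;

 private:
  int value_ = 0;
};

// ...and the definitions follow the class, in the order they were declared.
OutOfLineForm OutOfLineForm::Make() {
  OutOfLineForm v;
  v.value_ = 42;
  return v;
}
int OutOfLineForm::value() const { return value_; }

// Small self-check that both forms behave the same.
inline void CheckForms() {
  assert(InlineForm::Make().value() == OutOfLineForm::Make().value());
}
```

Both classes behave identically; the inline form is a convenience for writing the second one. The model in this comment makes that equivalence the definition of what an inline body means, rather than a consequence of separate lookup rules.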
Phrasing this in terms of parsing surprises me. Do you anticipate any situations where performing this transformation after parsing but before name resolution would produce a different result? |
The leads met today and decided to choose the "top down" approach over the "global within file" approach. |
@KateGregory would it be possible to say a few words about what part of the trade offs weighed most heavily in their decision? |
I think the more full details / rationale and such should be up-coming in #875 which aims to capture this decision in a principle. That said, here is a summary from my memory of the discussion that I ran past the other leads and so I think it is roughly accurate if somewhat brief: The leads felt like the tradeoffs ended up borderline. We specifically walked through the Carbon priorities to look for tradeoffs on each one. Many cases would have a different thing be made easier, but without any clear indication that one was more important than another. For example readability is a priority for Carbon. We know that some readers will prefer a "top-down" structure, but others will prefer to not have to scroll past helper code to find interesting code. It isn't clear that one or the other is significantly more important to optimize for, and neither seemed to be severe problems. Two somewhat minor points were language evolution and migration from C++. The top-down model is expected to be easier to relax into the other if desired compared to the other direction. And the top-down model may minorly reduce the initial surprise of C++ programmers encountering Carbon code as it will follow a fairly similar structure. Or the leads could decide these minor differences aren't big enough to really make a decision, and just let the painter pick a bikeshed color. The painter confirmed the color would be "top down" because it would match the paint of C++. Either way, we end up in the same place. |
(To be clear, I'm posting mostly because I think Kate went offline after we discussed this.) |
In C++, calling functions defined later in the same file requires a forward declaration. Do we need to support the same for Carbon, or can this instead be addressed by name lookup?
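As a concrete picture of the C++ behavior referred to here, the sketch below uses two hypothetical mutually recursive functions. Without the forward declaration of `IsOdd`, the call inside `IsEven` would not compile, because the name has not been declared at that point in the file.

```cpp
#include <cassert>

// Forward declaration: IsEven calls IsOdd before IsOdd is defined.
bool IsOdd(unsigned n);

bool IsEven(unsigned n) { return n == 0 ? true : IsOdd(n - 1); }

// The definition has to stay in sync with the forward declaration above.
bool IsOdd(unsigned n) { return n == 0 ? false : IsEven(n - 1); }

// Small self-check of the two functions.
inline void CheckParity() {
  assert(IsEven(10));
  assert(IsOdd(7));
}
```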
#438 has this as an open question.
For example, can this be legal code:
I briefly asked this to @zygoloid and we're okay not ruling this out for now. He did bring up wanting to have a single consistent name lookup rule for things both inside and outside functions. For example, the following lookup of `x` might work, but could error with "refers to `x` outside its lifetime":
This does touch on #424 (@jsiek).