Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revised name generation #2455

Closed
wants to merge 3 commits into from
Closed

Revised name generation #2455

wants to merge 3 commits into from

Conversation

vladima
Copy link
Contributor

@vladima vladima commented Mar 22, 2015

This PR is a proposal to consolidate different ways to generate temp names that we have in the compiler.
It moves name generation for modules from the checker into the emitter (so pre-pass is not needed) and names can be generated on-demand using two functions:

  • makeUniqueName - creates a name that does not exist in global scope and unique to current file. This name will never be generated in the future nor by makeUniqueName neither by makeTempVariable
  • makeTempVariable - creates a name that does not exists anywhere higher in the scope (in user code) and unique among generated names in current scope. The same name however can be generated by calls to makeTempVariable above or below in the scope and it is up to the user to protect yourself from this fact.

Perf-wise this approach is preferable since to determine if the name is truly unique we need to do just 3 map lookups: globals, identifiers in current file, generated names in current file. However it comes with a cost of degrading debugging experience: i.e. in compiler codebase some names that were used quite frequently, i.e. result were transformed into something like _result_101.

@ahejlsberg, @JsonFreeman: feedback is welcomed

@CyrusNajmabadi
Copy link
Contributor

I'm concerned about the debugging impact. Also about this bit: "Perf-wise this approach is preferable since to determine if the name is truly unique we need to do just 3 map lookups". I'd like to know more accurately what the perf impact is of making better names that work well in most current debuggers. While i agree it will be more expensive, i have no sense of if that expense will be unacceptable.

Or, in other words, i think we should be optimizing for the end user experience with our language (including hte debugging experience), not just the raw performance of our compiler. A fast compiler isn't very helpful if it generates code that you don't want.

Another option is to use the --sourceMap flag as an indicator to us as to what sort of names to generate. If you use --sourceMap, then you're telling us you want to be able to debug this usign the original TS. In that case, we generate names that are as good as possible. If you don't have --sourceMap, then you'd only be debugging with the JS. And, in that case, the names just have to be ok. So we can be faster in the case where you don't use --sourceMap, and slower if you want great debugging.

@vladima
Copy link
Contributor Author

vladima commented Mar 22, 2015

In order to test the impact of name generation to the emit time we need codebase that heavily uses lexical declarations - I've used TypeScript compiler as a guinea pig.
Average emit time for 5 runs:

  • current PR ~0.93 s
  • master with extra name check - 2.74 s

Extra name check was done by threading isUniqueLocalName from the checker into the emitter and using this function in isExistingName. Also since isUniqueLocalName uses chain of containers to determine if name exists in the nested scope I've augmented the binder to add block-scoped containers into the chain.

@CyrusNajmabadi
Copy link
Contributor

@vladima Great! Thanks for the data. I'd def still like to know the cost of:

  1. jason's proposed approach.
  2. an approach where we do dive deep, but don't use the continer chain (and thus don't need to walk down and up at every point).

I can work with you on this early next week to help collect more data, so that this isn't all on you to do the investigation.

And, actually, if you make a branch for the perf tests you've been doing (i.e. the area where you got the 2.74s time), then i can try out the second approach above soon. Or, you can just try merging in the containerMadness branch into your own branch to try it out. Either way works for me. Thanks!

@ahejlsberg
Copy link
Member

@vladima I assume we only rename let variables when we actually need to. In other words, if let is used to declare a variable named result, that variable is renamed only if there is a conflicting variable in the containing function body. Is that correct? So, you'll end up with _result_101 only in cases where you have multiple variables named result in the same function body? If that is true, I don't think it is all that bad. The only issue would be that you'll get _result_101 instead of a potentially shorter name like _result_1, but either way the identifier will have to be renamed.

@vladima
Copy link
Contributor Author

vladima commented Mar 22, 2015

@ahejlsberg: Existing implementation renames let only in case if name conflicts with some name in current scope\higher in the scope. However it does not take into account cases when generated name collides with something in nested scope.

declare function use(a: any);
var x;
{
    let x;
    use(x);
    function foo() {
        var _x; // name that will collide with generated name
        use(x); // will use '_x' from nested scope
    }
}

Current ideas how to deal with this issue:

  1. generate unique identifiers using approach similar to checker - make name that never appear in identifiers in current file (implemented in current PR)
    • pros: simple, fast
    • cons: non-debugger friendly
  2. keep current approach but also rename conflicting values in nested scopes - in case above this will cause var _x to become var _x_1 to resolve conflict. This in turn will mean that if some inner scope will contain var _x_1 it will become _x_2 and so on. This approach is the second in my list to try and get perf implications
    • pros: names still look similar to what user have written
    • cons: might lead to potentially unexpected renames
  3. keep current approach, use isUniqueLocalName from the checker to determine if name is unique.
    • pros: good looking names, no unexpected name changes
    • cons: perf will definitely suffer since for every let we need to examine all nested scopes to make decision

if (node.name.kind === SyntaxKind.Identifier) {
let name = node.name.text;
// Use module/enum name itself if it is unique, otherwise make a unique variation
return assignGeneratedName(node, isUniqueLocalName(name, node) ? name : makeUniqueName(name));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does module or enum do isUniqueLocalName, but everybody else doesn't?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code was moved from the checker without modifications

@ahejlsberg
Copy link
Member

@vladima Regarding which approach to take, as long as we only rename when we absolutely have to, I think it is fine to just pick globally unique names (i.e. approach #1). There's really not a big qualitative difference between a locally unique _result_1 and a globally unique _result_73. It's a rename either way. The key is to limit the renames and do the best we can for debugging with renames.

@CyrusNajmabadi
Copy link
Contributor

I agree. We should avoid renames when possible. But, when renames are absolutely necessary, the only thing i care about is that the final name have some indication about where it came from.

@JsonFreeman
Copy link
Contributor

Okay, so it sounds like we really only care about how many / which declarations get renamed, not how big the suffix number is at the end. That is fair, but if we compare the different approaches @vladima outlined, will the current PR actually rename more things than the other two approaches? That is the impression I got, but maybe I am wrong about that.

If the conditions for uniqueness of the variable's original name are the same, then approach 1 doesn't seem much worse for debugging than approach 2. The same names will be unfaithful to the source in both approaches.

There is one exception. Consider the user who is debugging and tries to type name into their watch window. They might see something that doesn't look correct, and then they might try _name (because they are aware of our renaming scheme). That doesn't work, so then they try _name_0, etc. With approach 1, they will have to cycle through a lot of incorrect names before they get to the correct one, whereas with approach 2, they will hit the correct name fairly soon. I admit this scenario may be too obscure, because it requires knowledge of our renaming scheme to realize what to do. But it's a thought.

@vladima
Copy link
Contributor Author

vladima commented Mar 25, 2015

closing since this is covered by #2471

@vladima vladima closed this Mar 25, 2015
@mhegazy mhegazy deleted the revisedNameGen branch March 25, 2015 19:29
@microsoft microsoft locked and limited conversation to collaborators Jun 18, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants