Revised name generation #2455

vladima · 2015-03-22T00:56:18Z

This PR is a proposal to consolidate different ways to generate temp names that we have in the compiler.
It moves name generation for modules from the checker into the emitter (so pre-pass is not needed) and names can be generated on-demand using two functions:

makeUniqueName - creates a name that does not exist in global scope and unique to current file. This name will never be generated in the future nor by makeUniqueName neither by makeTempVariable
makeTempVariable - creates a name that does not exists anywhere higher in the scope (in user code) and unique among generated names in current scope. The same name however can be generated by calls to makeTempVariable above or below in the scope and it is up to the user to protect yourself from this fact.

Perf-wise this approach is preferable since to determine if the name is truly unique we need to do just 3 map lookups: globals, identifiers in current file, generated names in current file. However it comes with a cost of degrading debugging experience: i.e. in compiler codebase some names that were used quite frequently, i.e. result were transformed into something like _result_101.

@ahejlsberg, @JsonFreeman: feedback is welcomed

CyrusNajmabadi · 2015-03-22T01:01:39Z

I'm concerned about the debugging impact. Also about this bit: "Perf-wise this approach is preferable since to determine if the name is truly unique we need to do just 3 map lookups". I'd like to know more accurately what the perf impact is of making better names that work well in most current debuggers. While i agree it will be more expensive, i have no sense of if that expense will be unacceptable.

Or, in other words, i think we should be optimizing for the end user experience with our language (including hte debugging experience), not just the raw performance of our compiler. A fast compiler isn't very helpful if it generates code that you don't want.

Another option is to use the --sourceMap flag as an indicator to us as to what sort of names to generate. If you use --sourceMap, then you're telling us you want to be able to debug this usign the original TS. In that case, we generate names that are as good as possible. If you don't have --sourceMap, then you'd only be debugging with the JS. And, in that case, the names just have to be ok. So we can be faster in the case where you don't use --sourceMap, and slower if you want great debugging.

vladima · 2015-03-22T02:42:45Z

In order to test the impact of name generation to the emit time we need codebase that heavily uses lexical declarations - I've used TypeScript compiler as a guinea pig.
Average emit time for 5 runs:

current PR ~0.93 s
master with extra name check - 2.74 s

Extra name check was done by threading isUniqueLocalName from the checker into the emitter and using this function in isExistingName. Also since isUniqueLocalName uses chain of containers to determine if name exists in the nested scope I've augmented the binder to add block-scoped containers into the chain.

CyrusNajmabadi · 2015-03-22T05:32:49Z

@vladima Great! Thanks for the data. I'd def still like to know the cost of:

jason's proposed approach.
an approach where we do dive deep, but don't use the continer chain (and thus don't need to walk down and up at every point).

I can work with you on this early next week to help collect more data, so that this isn't all on you to do the investigation.

And, actually, if you make a branch for the perf tests you've been doing (i.e. the area where you got the 2.74s time), then i can try out the second approach above soon. Or, you can just try merging in the containerMadness branch into your own branch to try it out. Either way works for me. Thanks!

ahejlsberg · 2015-03-22T06:24:29Z

@vladima I assume we only rename let variables when we actually need to. In other words, if let is used to declare a variable named result, that variable is renamed only if there is a conflicting variable in the containing function body. Is that correct? So, you'll end up with _result_101 only in cases where you have multiple variables named result in the same function body? If that is true, I don't think it is all that bad. The only issue would be that you'll get _result_101 instead of a potentially shorter name like _result_1, but either way the identifier will have to be renamed.

vladima · 2015-03-22T06:45:34Z

@ahejlsberg: Existing implementation renames let only in case if name conflicts with some name in current scope\higher in the scope. However it does not take into account cases when generated name collides with something in nested scope.

declare function use(a: any);
var x;
{
    let x;
    use(x);
    function foo() {
        var _x; // name that will collide with generated name
        use(x); // will use '_x' from nested scope
    }
}

Current ideas how to deal with this issue:

generate unique identifiers using approach similar to checker - make name that never appear in identifiers in current file (implemented in current PR)
- pros: simple, fast
- cons: non-debugger friendly
keep current approach but also rename conflicting values in nested scopes - in case above this will cause var _x to become var _x_1 to resolve conflict. This in turn will mean that if some inner scope will contain var _x_1 it will become _x_2 and so on. This approach is the second in my list to try and get perf implications
- pros: names still look similar to what user have written
- cons: might lead to potentially unexpected renames
keep current approach, use isUniqueLocalName from the checker to determine if name is unique.
- pros: good looking names, no unexpected name changes
- cons: perf will definitely suffer since for every let we need to examine all nested scopes to make decision

JsonFreeman · 2015-03-22T07:00:46Z

src/compiler/emitter.ts

+                if (node.name.kind === SyntaxKind.Identifier) {
+                    let name = node.name.text;
+                    // Use module/enum name itself if it is unique, otherwise make a unique variation
+                    return assignGeneratedName(node, isUniqueLocalName(name, node) ? name : makeUniqueName(name));


Why does module or enum do isUniqueLocalName, but everybody else doesn't?

this code was moved from the checker without modifications

ahejlsberg · 2015-03-22T15:28:55Z

@vladima Regarding which approach to take, as long as we only rename when we absolutely have to, I think it is fine to just pick globally unique names (i.e. approach #1). There's really not a big qualitative difference between a locally unique _result_1 and a globally unique _result_73. It's a rename either way. The key is to limit the renames and do the best we can for debugging with renames.

CyrusNajmabadi · 2015-03-22T18:42:49Z

I agree. We should avoid renames when possible. But, when renames are absolutely necessary, the only thing i care about is that the final name have some indication about where it came from.

JsonFreeman · 2015-03-23T07:40:56Z

Okay, so it sounds like we really only care about how many / which declarations get renamed, not how big the suffix number is at the end. That is fair, but if we compare the different approaches @vladima outlined, will the current PR actually rename more things than the other two approaches? That is the impression I got, but maybe I am wrong about that.

If the conditions for uniqueness of the variable's original name are the same, then approach 1 doesn't seem much worse for debugging than approach 2. The same names will be unfaithful to the source in both approaches.

There is one exception. Consider the user who is debugging and tries to type name into their watch window. They might see something that doesn't look correct, and then they might try _name (because they are aware of our renaming scheme). That doesn't work, so then they try _name_0, etc. With approach 1, they will have to cycle through a lot of incorrect names before they get to the correct one, whereas with approach 2, they will hit the correct name fairly soon. I admit this scenario may be too obscure, because it requires knowledge of our renaming scheme to realize what to do. But it's a thought.

vladima · 2015-03-25T19:29:09Z

closing since this is covered by #2471

vladima added 3 commits March 21, 2015 17:05

moved name generation to emitter

4046a00

accepted baselines

dbb651a

dropped folder that was checked in by mistake

0504f51

msftclas added the cla-not-required label Mar 22, 2015

vladima mentioned this pull request Mar 22, 2015

Downlevel for 'let' doesn't account for nested declarations #2449

Closed

JsonFreeman reviewed Mar 22, 2015
View reviewed changes

vladima closed this Mar 25, 2015

mhegazy deleted the revisedNameGen branch March 25, 2015 19:29

microsoft locked and limited conversation to collaborators Jun 18, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revised name generation #2455

Revised name generation #2455

vladima commented Mar 22, 2015

CyrusNajmabadi commented Mar 22, 2015

vladima commented Mar 22, 2015

CyrusNajmabadi commented Mar 22, 2015

ahejlsberg commented Mar 22, 2015

vladima commented Mar 22, 2015

JsonFreeman Mar 22, 2015

vladima Mar 22, 2015

ahejlsberg commented Mar 22, 2015

CyrusNajmabadi commented Mar 22, 2015

JsonFreeman commented Mar 23, 2015

vladima commented Mar 25, 2015

Revised name generation #2455

Revised name generation #2455

Conversation

vladima commented Mar 22, 2015

CyrusNajmabadi commented Mar 22, 2015

vladima commented Mar 22, 2015

CyrusNajmabadi commented Mar 22, 2015

ahejlsberg commented Mar 22, 2015

vladima commented Mar 22, 2015

JsonFreeman Mar 22, 2015

Choose a reason for hiding this comment

vladima Mar 22, 2015

Choose a reason for hiding this comment

ahejlsberg commented Mar 22, 2015

CyrusNajmabadi commented Mar 22, 2015

JsonFreeman commented Mar 23, 2015

vladima commented Mar 25, 2015