Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not materialize optional undefined properties inside actual types #28727

Conversation

weswigham
Copy link
Member

@weswigham weswigham commented Nov 29, 2018

This is what I suggested when I had heard about the problem #28707 was opened to fix.

Fixes #28540 without making the input code into an error 😉

This is done by tracking the freshness state of the unions which contain and used to contain fresh object literals, and generating the missing properties on the fly as needed in any given function. This greatly reduces our memory usage in pathological examples like the given test, to the point where we can check it pretty easily - the test actually takes slightly less time to execute than the largeControlFlowGraph.ts test, which is which is actually only a third of the length of the new test.

Also in this PR:

  • Disabling subtype reduction for huge unions. (> 1000 members was my arbitrary cutoff - that's still 1,000,000 comparisons just on union construction (which is double the limit in Error on complex array literals #28707), so it might even be a good idea to lower it a lot more)

Subtype reduction is supposedly an optimization; but the gamble is that the powerset comparison within the union is beaten by the potential savings of eliminating union members for future comparisons. #28707 simply disables subtype reduction and reports an error to prevent followup issues with larger types. Instead, this lets the non-reduced type through and handles larger types better. For truly large unions in practice, subtype reduction is simply rarely a benefit - as normally these things get assigned once or twice and then disappear, rather than compared and narrowed extensively (like smaller unions do). In effect, if the number of followup comparisons exceeds the square of the number of union members not eliminated, then subtype reduction is a plus - however if it regularly eliminates few or no members from the union, then for large unions, it represents a huge cost for little potential gain.

It's possible there's an efficient middleground in subtype reduction here for large types where, rather than outright disabling subtype reduction, we divide the union into object and non-object type buckets, perform normal reduction on the non-object types (since there are usually a limited number of them, unless someone has 10000 different string literal variants in a single union for some reason), scan the object types of the union for the "least specific type" and then attempt to assign each union element to only the least specific type - that grows only linearly with union size, rather than exponentially, and still would handle cases where, eg, there's a lot of very specific types and a single base type in the array to unify them. In abstract, the idea is to spend one pass over the list grouping things into similarity classes which are likely a subtype of a single easily identifiable thing in the class, then doing a single pass on each group to check if it can be reduced to the candidate in that class, then combining the candidates (if a group succeeds in being entirely a subtype of its candidate) or classes back together. Just spitballing here, though - mostly a sketch of trying to the use sorta-transitivity of the subtype relationship to remodel subtype reduction into a divide and conquer style algorithm.

Anyways, back to smaller things in this PR:

  • On top of this, one of those assignments involving the widened and unwidened forms of large unions are now trivial - that relationship is tracked (for the purpose of fetching properties lazily rather than storing them in a ton of maps and consuming space), so that comparison can be effectively skipped, thus with this PR now these large unions of types can often have no actual costly comparisons performed on them.

  • A hard limit on the length of a type string we generate before applying truncation rules, even if noTruncation is set - this is done to avoid an OOM crash for huge output types, as the materialized AST nodes for the types we can now create and reason about are, well, huge. I've capped this at 100,000 characters right now, which is incredibly generous compared to the normal truncation limit of 160 characters. If you hit the 100,000 character limit, you're probably not reading the type anyway, and we'd probably crash on trying to parse the type back in anyway. This isn't strictly necessary for this PR to pass at this point, but I added it so we wouldn't OOM on the type baselines for the new test before I added skipTypeAndSymbol (see next bullet). I'd understand if we'd rather not see this in this PR.

  • The ability to skipTypeAndSymbol baselines for a given test (because type and symbol baselines for a 30k line long file are never going to be reviewed anyway, and they take like 30 seconds to generate besides because of how big the types are, even with the 100,000 character limit).

  • Minor optimizations to union property generation - rather than using appendIfUnique, we push symbols into a map, to avoid the cost of checking uniqueness repeatedly in unions with a huge number of members.

  • undefined property symbols are now globally cached and don't include a declaration - meaning we produce just one undefined symbol for each property name, rather than one for each name maybe with a declaration from the first one of that name we saw. This has the nice side effect of removing duplicated symbol declaration entries from union symbols, with the minor downside that if a property is narrowed to just an implied undefined symbol, it will no longer have an associated declaration at that site.

And one smaller change that might be a regression/bug but also might not depending on how flexible we are:

  • An object type containing a spread of a non-fresh type is no longer a fresh type. This makes sense for declarations, property types, and the like, but disables a somewhat desirable excess property check on the properties actually in the literal after such a spread (the props within the spread were always excluded via a value declaration comparison) - it might be worth tagging the desire for a property to be excess-checked on the property itself, rather than checking the containing type for freshness to fix this. So if this is a big issue (I know we've swapped our behavior on excess property checks in spreads like 4 times), it can be fixed, if need be.

@ahejlsberg
Copy link
Member

There's a lot happening in this PR and I haven't formed opinions about all of it yet. However, one thing I noticed is the 1000 type cutoff in union subtype reduction. I think this is way too blunt of a hammer. Consider an array literal with >1000 elements that have structurally identical types (e.g. object literals with a uniform set of properties). This is very common and doesn't present a problem for subtype reduction at all. In fact, subtype reduction scales linearly in this scenario because it immediately succeeds on the first subtype check for each element, quickly eliminating all but the first element in the type set.

This is why #28707 uses a heuristic of counting the number of failed subtype checks. Those provide a much more accurate measure of complexity and impose no hard limit on the number of structurally identical types you can union together.

@weswigham
Copy link
Member Author

Hm. Yeah, I can understand changing it to reduction failures (I just added it as a quick heuristic) - although rather than a hard limit, how do you feel about a % of the input union length? Ie, for unions > some small minimum length (10?) up to 30% of the possible comparisons are allowed to fail before we bail? The performance penalty of running/not running reduction scales with input size, so it seems to make sense to scale the heuristic with it.

@weswigham weswigham force-pushed the remove-explicit-object-type-widening branch from e7a4b10 to 564da4f Compare November 29, 2018 18:21
@weswigham
Copy link
Member Author

Hmmm, on inspection, I don't think basing the heuristic on failures alone is good enough - pass or fail, we still perform a number of comparisons equal to the square of the number of distinct types provided - and they are all distinct, as they come from differing object literals and have differing identities, even if their underlying structure is identical. Because of this, the success case is actually the worst case - there's absolutely no early bailouts and every object must be fully structurally compared with every other object. Especially since, barring subtype reduction, there's a real possibility we never need to perform any comparisons on the type besides these.

@weswigham
Copy link
Member Author

If anything, we should be bailing if we don't see some number of successful comparisons.

@sandersn
Copy link
Member

sandersn commented Feb 3, 2020

@weswigham I think this was superceded by #29435. Feel free to re-open if it's not.

@weswigham
Copy link
Member Author

We added an error on subtype checks (rather than just skipping it), and the skipTypeAndSymbol test flag has been added by another PR. We could still do the whole "only lazily create optional undefined properties thing, but we don't strictly have any pressing asks/crashes which require it (since we made the original scenario into an error...).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2.5gb on small project
4 participants