Statistics 12 fix unknown estimates #614

Open: wants to merge 105 commits into base: master from

sopel39 commented Jul 11, 2017

No description provided.

losipiuk and others added 30 commits June 29, 2017 17:11
Remove statistics tests which just display statistics for the
whole tree for manual analysis. Those proved not to be very useful,
and the mechanism they use (getting stats from the EXPLAIN plan) would
not work for more detailed column statistics anyway.

Test queries cover the entire current implementation of
CoefficientCostCalculator.

The test compares estimated statistics with actual numbers obtained
from real query execution.
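A minimal sketch of how such an estimated-versus-actual comparison might be expressed, assuming a relative-error tolerance and assuming unknown estimates are encoded as NaN. The class and method names (`StatsAccuracy`, `isWithinTolerance`) are hypothetical and are not taken from the Presto test code.

```java
// Hypothetical helper: checks whether an estimated statistic is close enough to
// the value observed by actually executing the query.
final class StatsAccuracy {
    private StatsAccuracy() {}

    /**
     * Returns true when the estimate lies within the given relative tolerance of
     * the actual value. An unknown estimate (assumed to be NaN) never counts as accurate.
     */
    static boolean isWithinTolerance(double estimated, double actual, double relativeTolerance) {
        if (Double.isNaN(estimated)) {
            return false;
        }
        if (actual == 0) {
            // Relative error is undefined for zero; require an exact match instead.
            return estimated == 0;
        }
        return Math.abs(estimated - actual) / Math.abs(actual) <= relativeTolerance;
    }
}
```

A relative tolerance keeps a single threshold meaningful for both small and large tables, which an absolute row-count difference would not.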

Instead of recursing in the StatsCalculator API, pass the Lookup instance into
the stats computation. The method computing stats for a plan node can
then use the Lookup to compute stats (and possibly other traits) for its
source nodes.

The StatelessLookup is a temporary measure for places that don't have
access to the lookup for individual plans. It can be used across
multiple queries because it doesn't keep state, but it won't resolve
GroupReferences.
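The difference between direct recursion and going through a Lookup can be sketched with a toy plan model. Everything here except the names `Lookup`, `StatelessLookup` and `GroupReference` (which appear in the commit message) is hypothetical: `PlanNode`, `TableScanNode`, `FilterNode`, `MemoLookup`, `resolve` and `getRowCount` are simplified stand-ins, not Presto's actual classes or signatures.

```java
import java.util.Map;

// Hypothetical, simplified plan model (not Presto's PlanNode hierarchy).
interface PlanNode {}

// Placeholder for a memo group; only a memo-backed Lookup can resolve it.
final class GroupReference implements PlanNode {
    final int groupId;
    GroupReference(int groupId) { this.groupId = groupId; }
}

final class TableScanNode implements PlanNode {
    final double rowCount;
    TableScanNode(double rowCount) { this.rowCount = rowCount; }
}

final class FilterNode implements PlanNode {
    final PlanNode source;
    final double selectivity;
    FilterNode(PlanNode source, double selectivity) {
        this.source = source;
        this.selectivity = selectivity;
    }
}

interface Lookup {
    PlanNode resolve(PlanNode node);

    // Stats for any node, including a GroupReference, are obtained through the lookup.
    default double getRowCount(PlanNode node) {
        return StatsCalculator.calculateRowCount(resolve(node), this);
    }
}

final class StatsCalculator {
    private StatsCalculator() {}

    static double calculateRowCount(PlanNode node, Lookup lookup) {
        if (node instanceof TableScanNode) {
            return ((TableScanNode) node).rowCount;
        }
        if (node instanceof FilterNode) {
            FilterNode filter = (FilterNode) node;
            // No direct recursion: the source may be a GroupReference,
            // so its stats are requested from the lookup.
            return lookup.getRowCount(filter.source) * filter.selectivity;
        }
        throw new IllegalArgumentException("Unsupported node: " + node);
    }
}

// Hypothetical lookup backed by memo groups; resolves GroupReferences.
final class MemoLookup implements Lookup {
    private final Map<Integer, PlanNode> groups;
    MemoLookup(Map<Integer, PlanNode> groups) { this.groups = groups; }

    @Override
    public PlanNode resolve(PlanNode node) {
        if (node instanceof GroupReference) {
            return groups.get(((GroupReference) node).groupId);
        }
        return node;
    }
}

// Keeps no state, so it can be shared across queries, but cannot resolve GroupReferences.
final class StatelessLookup implements Lookup {
    @Override
    public PlanNode resolve(PlanNode node) {
        if (node instanceof GroupReference) {
            throw new UnsupportedOperationException("StatelessLookup cannot resolve GroupReferences");
        }
        return node;
    }
}
```

With a `MemoLookup`, a filter over group 1 resolves the group and multiplies its row count by the selectivity; the same plan fails under `StatelessLookup`, which only works for fully materialized plans.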

Using Estimates for statistics computation in StatsCalculators is
problematic because method-call based arithmetic must be used.
This makes the code hard to read and maintain. Since Estimate was
currently nothing more than a wrapper around a double value, we decided
to just use doubles for computation and represent unknown values as NaN.

Estimate is still used in the SPI to make the contract between presto-main
and connectors clearer.
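The readability argument can be illustrated by computing a filtered row count both ways. `WrappedEstimate` is a hypothetical stand-in for a method-call based wrapper, not the SPI `Estimate` class, and `RowCountMath` is likewise illustrative.

```java
// Hypothetical stand-in for a method-call based wrapper; NOT the SPI Estimate class.
final class WrappedEstimate {
    private final double value; // NaN encodes "unknown"

    private WrappedEstimate(double value) { this.value = value; }

    static WrappedEstimate of(double value) { return new WrappedEstimate(value); }

    static WrappedEstimate unknown() { return new WrappedEstimate(Double.NaN); }

    boolean isUnknown() { return Double.isNaN(value); }

    double getValue() { return value; }

    WrappedEstimate multiply(WrappedEstimate other) {
        return (isUnknown() || other.isUnknown()) ? unknown() : of(value * other.value);
    }
}

final class RowCountMath {
    private RowCountMath() {}

    // Wrapper style: every arithmetic operator becomes a method call.
    static WrappedEstimate filteredRowCount(WrappedEstimate rowCount, WrappedEstimate selectivity) {
        return rowCount.multiply(selectivity);
    }

    // Plain-double style: NaN propagates through arithmetic on its own.
    static double filteredRowCount(double rowCount, double selectivity) {
        return rowCount * selectivity;
    }
}
```

One caveat of the NaN encoding is that `NaN != NaN`, so "is this unknown?" checks must use `Double.isNaN` rather than `==`.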

Capping of the limit set by the session manager was moved from
ClusterMemoryManager.java to SystemSessionProperties.java.

It makes this field more independent. It only describes the data
distribution characteristics of a column and does not tie them
to table-wide statistics such as the total number of rows in the table.

Just one range is supported for now.
kokosing and others added 29 commits July 3, 2017 18:35
Move pattern matching to a separate package and make it independent of
the optimizer.
This makes it possible to use pattern matching not only in the
optimizer, but in other components as well.
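What "independent from the optimizer" buys can be sketched with a minimal pattern type that matches arbitrary objects. `SimplePattern`, `typeOf`, `matching` and `match` are hypothetical names chosen for this sketch; they do not describe the API of the package introduced by the commit.

```java
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical minimal pattern: it knows nothing about plan nodes or rules,
// so any component can reuse it to match objects of a given type.
final class SimplePattern<T> {
    private final Class<T> type;
    private final Predicate<? super T> predicate;

    private SimplePattern(Class<T> type, Predicate<? super T> predicate) {
        this.type = type;
        this.predicate = predicate;
    }

    // Matches any instance of the given type.
    static <T> SimplePattern<T> typeOf(Class<T> type) {
        return new SimplePattern<>(type, candidate -> true);
    }

    // Returns a new pattern that additionally requires the extra condition.
    SimplePattern<T> matching(Predicate<? super T> extra) {
        return new SimplePattern<>(type, candidate -> predicate.test(candidate) && extra.test(candidate));
    }

    // Returns the object cast to T if it matches, otherwise empty.
    Optional<T> match(Object object) {
        if (!type.isInstance(object)) {
            return Optional.empty();
        }
        T candidate = type.cast(object);
        return predicate.test(candidate) ? Optional.of(candidate) : Optional.empty();
    }
}
```

Because the pattern is generic over `T`, the same code can match plan nodes in the optimizer or, for example, expressions or statistics objects elsewhere.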
Fix a copy-paste error in PlanNodeCostEstimate.memoryCost()
Return NaN from PNStatsEstimate#getOutputSizeInBytes when rowCount is NaN
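A sketch of why an explicit NaN check can matter even though NaN propagates through arithmetic: if output size is summed per column, a node with no columns would otherwise report 0 bytes. `OutputStatsSketch` and its per-column size model are hypothetical; this is one plausible reason for the check, not a description of the actual `getOutputSizeInBytes` implementation.

```java
// Hypothetical stand-in for the stats object named in the commit; only the
// output row count and per-column average value sizes are modeled.
final class OutputStatsSketch {
    private final double outputRowCount;       // NaN when unknown
    private final double[] averageColumnSizes; // average bytes per value, one per output column

    OutputStatsSketch(double outputRowCount, double... averageColumnSizes) {
        this.outputRowCount = outputRowCount;
        this.averageColumnSizes = averageColumnSizes.clone();
    }

    double getOutputSizeInBytes() {
        if (Double.isNaN(outputRowCount)) {
            // Without this check a node with no output columns would report
            // 0 bytes even though its row count, and so its size, is unknown.
            return Double.NaN;
        }
        double totalSize = 0;
        for (double averageSize : averageColumnSizes) {
            totalSize += outputRowCount * averageSize;
        }
        return totalSize;
    }
}
```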
Not all CostCalculators are thread safe.
…ying

Previously there was a lot of map copying (through ImmutableMap.copyOf(...)
and new HashMap(...)), which significantly impacted stats code performance.
HashTreePMap is much better suited to cases where individual entries of a base
map are modified, which is a common case in stats code.
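The benefit of a persistent map can be illustrated with a deliberately simplified one: updating a key copies only the nodes on the path to that key and shares everything else with the previous version, instead of copying the whole map. `PersistentTreeMap` below is a hypothetical unbalanced binary search tree with path copying, not the hash array mapped trie that PCollections' `HashTreePMap` uses; it only mirrors the idea behind `PMap.plus`.

```java
// Simplified persistent (immutable) map: an unbalanced BST with path copying.
// Illustrative only; NOT how PCollections' HashTreePMap is implemented.
final class PersistentTreeMap<K extends Comparable<K>, V> {
    private static final class Node<K, V> {
        final K key;
        final V value;
        final Node<K, V> left;
        final Node<K, V> right;

        Node(K key, V value, Node<K, V> left, Node<K, V> right) {
            this.key = key;
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    private final Node<K, V> root;

    private PersistentTreeMap(Node<K, V> root) { this.root = root; }

    static <K extends Comparable<K>, V> PersistentTreeMap<K, V> empty() {
        return new PersistentTreeMap<>(null);
    }

    V get(K key) {
        Node<K, V> node = root;
        while (node != null) {
            int cmp = key.compareTo(node.key);
            if (cmp == 0) {
                return node.value;
            }
            node = cmp < 0 ? node.left : node.right;
        }
        return null;
    }

    // Returns a new map; this map is left unchanged and shares all
    // untouched subtrees with the result.
    PersistentTreeMap<K, V> plus(K key, V value) {
        return new PersistentTreeMap<>(insert(root, key, value));
    }

    private static <K extends Comparable<K>, V> Node<K, V> insert(Node<K, V> node, K key, V value) {
        if (node == null) {
            return new Node<>(key, value, null, null);
        }
        int cmp = key.compareTo(node.key);
        if (cmp < 0) {
            return new Node<>(node.key, node.value, insert(node.left, key, value), node.right);
        }
        if (cmp > 0) {
            return new Node<>(node.key, node.value, node.left, insert(node.right, key, value));
        }
        return new Node<>(key, value, node.left, node.right);
    }
}
```

Each update costs time proportional to the path length rather than to the map size, which is what makes per-entry modification of a shared base map cheap compared with `new HashMap<>(base)` followed by `put`.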

TODO: This should be split into fixups. Keeping it as a single commit for now
since it is one integral change.
sopel39 force-pushed the statistics-12-fix_unknown_estimates branch from 75f2098 to 0379997 on July 12, 2017 11:15
Labels: None yet
Projects: None yet
8 participants