Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include cost estimates in IO EXPLAIN plan #806

Merged
merged 2 commits into from
Aug 29, 2019

Conversation

JamesRTaylor
Copy link

Implementation of #677

@cla-bot cla-bot bot added the cla-signed label May 22, 2019
@ebyhr
Copy link
Member

ebyhr commented May 23, 2019

While I already sent PR #709 (please ignore mine, I'll close it), why only estimated byte scanned? I think other information (e.g. outputRowCount) is also useful.

Of course we can limit the field first. However, if you add estBytesScanned to TableColumnInfo directly, the output format will be like below.

{
  "inputTableColumnInfos": [
    {
      "table": {..
      "columns": [ ..
      "estimatedBytesScanned": 1230000
    }
  ],

While it's easy to parse, when we add fields later about estimated costs, it looks little redundant to me.

{
  "inputTableColumnInfos": [
    {
      "table": {
      "columns": [ ..
      "estimatedBytesScanned": 1230000
      "estimatedxxx": 999
      "estimatedyyy": 999
    }
  ],

I hope you update presto-docs/src/main/sphinx/sql/explain.rst so that other developers can use this enhancement easily.

@findepi
Copy link
Member

findepi commented May 24, 2019

While I already sent PR #709 ...

@ebyhr apologies it hasn't been reviewed yet!

@ebyhr @JamesRTaylor can you look at each other's PR and determine how much overlap is there?

@ebyhr
Copy link
Member

ebyhr commented May 24, 2019

@findepi There's no need to apologize. Actually, I had completely forgot my PR until today and I hope @JamesRTaylor (the issue ticket author) implements this. Also, I'm little busy in this month.

The main difference between my PR and this commit is I added a separated class and 5 metrics including estimated byte scanned. If there's no special reason to limit to the bytes, adding other 4 metrics looks good to me. @JamesRTaylor What do you think about it?

@JamesRTaylor
Copy link
Author

Thanks so much for your comments, @kokosing, @ebyhr, and @findepi. I've pushed an update to address the feedback. Please review when you get a chance.

@JamesRTaylor
Copy link
Author

Ping @kokosing, @dain, @martint - please review when you have a few spare cycles. Thanks!

@kokosing
Copy link
Member

kokosing commented Jul 9, 2019

Can you please rebase and get rid of merge commits (we do not use them).

@JamesRTaylor JamesRTaylor force-pushed the est-scanned-bytes-in-explain-io branch from 516394d to ea8a8aa Compare July 9, 2019 17:21
@JamesRTaylor
Copy link
Author

@kokosing - I rebased and squashed the commits. I looked at the test failure, but it doesn't appear related.

@findepi
Copy link
Member

findepi commented Jul 9, 2019

@JamesRTaylor from Travis:

[ERROR] TestTpchDistributedQueries.testIOExplain:70 expected [IoPlan{inputTableColumnInfos=[TableColumnInfo{table=tpch.sf0.01.orders, columnConstraints=[ColumnConstraint{columnName=orderstatus, typeSignature=varchar(1), domain=FormattedDomain{nullsAllowed=false, ranges=[FormattedRange{low=FormattedMarker{value=Optional[F], bound=EXACTLY}, high=FormattedMarker{value=Optional[F], bound=EXACTLY}}, FormattedRange{low=FormattedMarker{value=Optional[O], bound=EXACTLY}, high=FormattedMarker{value=Optional[O], bound=EXACTLY}}, FormattedRange{low=FormattedMarker{value=Optional[P], bound=EXACTLY}, high=FormattedMarker{value=Optional[P], bound=EXACTLY}}]}}], estimate=EstimatedStatsAndCost{outputRowCount=15000.0, outputSizeInBytes=1597294.0, cpuCost=1597294.0, maxMemory=0.0, networkCost=0.0, bytesScanned=0.0}}], outputTable=Optional.empty, estimate=EstimatedStatsAndCost{outputRowCount=15000.0, outputSizeInBytes=1597294.0, cpuCost=1597294.0, maxMemory=0.0, networkCost=0.0, bytesScanned=0.0}}] but found [IoPlan{inputTableColumnInfos=[TableColumnInfo{table=tpch.sf0.01.orders, columnConstraints=[ColumnConstraint{columnName=orderstatus, typeSignature=varchar(1), domain=FormattedDomain{nullsAllowed=false, ranges=[FormattedRange{low=FormattedMarker{value=Optional[F], bound=EXACTLY}, high=FormattedMarker{value=Optional[F], bound=EXACTLY}}, FormattedRange{low=FormattedMarker{value=Optional[O], bound=EXACTLY}, high=FormattedMarker{value=Optional[O], bound=EXACTLY}}, FormattedRange{low=FormattedMarker{value=Optional[P], bound=EXACTLY}, high=FormattedMarker{value=Optional[P], bound=EXACTLY}}]}}], estimate=EstimatedStatsAndCost{outputRowCount=15000.0, outputSizeInBytes=1597294.0, cpuCost=1597294.0, maxMemory=0.0, networkCost=0.0, bytesScanned=742294.0}}], outputTable=Optional.empty, estimate=EstimatedStatsAndCost{outputRowCount=15000.0, outputSizeInBytes=1597294.0, cpuCost=1597294.0, maxMemory=0.0, networkCost=1597294.0, bytesScanned=742294.0}}]

@JamesRTaylor
Copy link
Author

Oops, missed that - sorry about that. Was seeing other errors in the console that I thought was leading to the test failure. I'll get that fixed.

@JamesRTaylor
Copy link
Author

All fixed now, @findepi. Sorry again for the noise.

presto-docs/src/main/sphinx/sql/explain.rst Show resolved Hide resolved
presto-docs/src/main/sphinx/sql/explain.rst Outdated Show resolved Hide resolved
EstimatedStatsAndCost estimate16 = new EstimatedStatsAndCost(1.0, 16.0, 16.0, 0.0, 0.0, 6.0);
EstimatedStatsAndCost estimate17 = new EstimatedStatsAndCost(1.0, 17.0, 17.0, 0.0, 0.0, 3.0);
EstimatedStatsAndCost estimate25 = new EstimatedStatsAndCost(1.0, 25.0, 25.0, 0.0, 0.0, 3.0);
Map<Object, TypeAndEstimate> data = new LinkedHashMap<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why linked hash map?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a linked hash map because the assert below prints the index of the entry upon failure to make the entry easier to find.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please add a comment about this, it is not obvious

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually use ImmutableMap.Builder also when we want to preserve insertion order.
ImmutableMap guarantees order.

BTW here List<Entry<..> would do. Or restructuring the test so that data is not needed at all.

@dain dain assigned JamesRTaylor and unassigned kokosing Aug 1, 2019
@JamesRTaylor JamesRTaylor force-pushed the est-scanned-bytes-in-explain-io branch from d73e657 to ec78d26 Compare August 15, 2019 23:53
@JamesRTaylor JamesRTaylor changed the title Add estBytesScanned to output of EXPLAIN (TYPE IO) based on column statistics Include cost estimates in IO EXPLAIN plan Aug 15, 2019
@JamesRTaylor
Copy link
Author

JamesRTaylor commented Aug 16, 2019

@kokosing - thanks for the prior review. I've updated the PR based on your feedback:

  • removed estimatedBytesScanned as it wouldn't be accurate anyway.
  • squashed the commits and shortened the commit message

Copy link
Member

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also post how now EXPLAIN (TYPE IO) looks now, and how it was before? So it is easy to compare how it changed.

EstimatedStatsAndCost estimate16 = new EstimatedStatsAndCost(1.0, 16.0, 16.0, 0.0, 0.0, 6.0);
EstimatedStatsAndCost estimate17 = new EstimatedStatsAndCost(1.0, 17.0, 17.0, 0.0, 0.0, 3.0);
EstimatedStatsAndCost estimate25 = new EstimatedStatsAndCost(1.0, 25.0, 25.0, 0.0, 0.0, 3.0);
Map<Object, TypeAndEstimate> data = new LinkedHashMap<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please add a comment about this, it is not obvious

@JamesRTaylor JamesRTaylor force-pushed the est-scanned-bytes-in-explain-io branch 3 times, most recently from 05d911c to ca9e68c Compare August 22, 2019 16:55
@JamesRTaylor
Copy link
Author

@kokosing - thanks for your previous reviews. I've pushed another commit based on your feedback.

@JamesRTaylor JamesRTaylor force-pushed the est-scanned-bytes-in-explain-io branch from 2ed895f to 26c773d Compare August 23, 2019 18:22
@JamesRTaylor
Copy link
Author

Thanks for the feedback, @kokosing. I've pushed a new commit to address them.

Copy link
Member

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rebase and squash the commits. Then please extract formatting changes as separate commit before your actual change. Then I will merge this. Thanks!

new FormattedMarker(Optional.of("false"), EXACTLY),
new FormattedMarker(Optional.of("false"), EXACTLY)))))))),
Optional.of(new CatalogSchemaTableName(catalog, "tpch", "test_orders"))));
ImmutableSet.of(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now when you change the formatting of existing code that is not strictly related to your change, please extract that as separate commit. Before your change.

sorry about such comments

@JamesRTaylor JamesRTaylor force-pushed the est-scanned-bytes-in-explain-io branch from 7aff6a5 to 7b45f97 Compare August 27, 2019 18:05
@JamesRTaylor
Copy link
Author

@kokosing - thanks for the reviews. I've pushed the changes.

tableMetadata.getCatalogName().getCatalogName(),
tableMetadata.getTable().getSchemaName(),
tableMetadata.getTable().getTableName()),
parseConstraints(node.getTable(), predicate)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️ Thank you!

@kokosing
Copy link
Member

It looks there some issue with travis, some external service was unavailable. Restarted.

@JamesRTaylor
Copy link
Author

JamesRTaylor commented Aug 28, 2019

Looks like an OOM error:

[INFO] Compiling 807 source files to /home/travis/build/prestosql/presto/presto-main/target/test-classes
Terminating due to java.lang.OutOfMemoryError: GC overhead limit exceeded

Please let me know if any action is required on my side.

@kokosing
Copy link
Member

Can you please raise Xmx in MAVEN_OPTS in .travis.yml to 1G.

FYI: @electrum @findepi

EstimatedStatsAndCost estimate16 = new EstimatedStatsAndCost(1.0, 16.0, 16.0, 0.0, 0.0, 6.0);
EstimatedStatsAndCost estimate17 = new EstimatedStatsAndCost(1.0, 17.0, 17.0, 0.0, 0.0, 3.0);
EstimatedStatsAndCost estimate25 = new EstimatedStatsAndCost(1.0, 25.0, 25.0, 0.0, 0.0, 3.0);
Map<Object, TypeAndEstimate> data = new LinkedHashMap<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually use ImmutableMap.Builder also when we want to preserve insertion order.
ImmutableMap guarantees order.

BTW here List<Entry<..> would do. Or restructuring the test so that data is not needed at all.

estimate)),
Optional.empty(),
estimate),
format("%d) Type %s ", index, type));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing ( ?

@findepi
Copy link
Member

findepi commented Aug 29, 2019

Can you please raise Xmx in MAVEN_OPTS in .travis.yml to 1G.

So far we've been seeing OOM on full maven build (with verification), and rather rarely.
This PR OOMs during compilation several times (https://travis-ci.com/prestosql/presto/builds/124792147 (re-run)). I don't see any reason why it would be so, but that's strange.

@JamesRTaylor could you please trigger a new build (rather than retriggering the failed one) so that we can see how reproducible it is. You can also trigger a build with changed MAVEN_OPTS.

eg

  • push an empty commit
  • push a commit changing MAVEN_OPTS
  • go to Travis and make sure the empty commit's build is run (Travis auto-cancels builds but you can still manually force a build to run)

@kokosing We can probably increase -Xmx for Maven, but need to think whether we don't go over limit somewhere else, when maven is running concurrently.
Like tests execution in a forked JVM (but tests are limited at 2g). Anything else?

@kokosing
Copy link
Member

I don't see any reason why it would be so, but that's strange.

It is strange to me as well. Let me retrigger the build with flushed cache.

@findepi
Copy link
Member

findepi commented Aug 29, 2019

It is strange to me as well. Let me retrigger the build with flushed cache.

i see it green, so ... @JamesRTaylor @kokosing let's do nothing and wait until problem resurfaces.

@kokosing kokosing merged commit 067e865 into trinodb:master Aug 29, 2019
@kokosing
Copy link
Member

Merged, thanks!

@kokosing kokosing added this to the 319 milestone Aug 29, 2019
@kokosing kokosing mentioned this pull request Aug 29, 2019
6 tasks
@JamesRTaylor JamesRTaylor deleted the est-scanned-bytes-in-explain-io branch August 29, 2019 18:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants