Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ui: Statements Page timing for CREATE INDEX wrong #30381

Closed
arctica opened this issue Sep 18, 2018 · 20 comments
Closed

ui: Statements Page timing for CREATE INDEX wrong #30381

arctica opened this issue Sep 18, 2018 · 20 comments
Assignees
Labels
A-webui-queryperf C-question A question rather than an issue. No code/spec/doc change needed. docs-done docs-known-limitation O-community Originated from the community S-3-ux-surprise Issue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption.

Comments

@arctica
Copy link

arctica commented Sep 18, 2018

Running 2.1.0-beta.20180917 we've seen the fantastic new "Statements" page which comes in really handy when looking for slow queries. Great job there!

But I've spotted a bug regarding CREATE INDEX queries. The timings on the page for this query are very low (35ms), yet in the commandline the query took over 1 minute. This made me question the overall accuracy of the statistics in this page.

CLI output:

CREATE INDEX

Time: 1m3.87831916s

screen shot 2018-09-18 at 20 26 26

@a-robinson
Copy link
Contributor

Thanks @arctica! I suspect this is related to the fact that creating an index gets done in a background job that gets triggered by the SQL statement, but isn't technically part of the SQL statement itself. I agree that this is misleading.

cc @vivekmenezes @couchand

@couchand couchand added O-community Originated from the community C-question A question rather than an issue. No code/spec/doc change needed. A-webui-queryperf labels Sep 18, 2018
@vilterp vilterp changed the title Statements Page timing for CREATE INDEX wrong ui: Statements Page timing for CREATE INDEX wrong Sep 18, 2018
@couchand
Copy link
Contributor

Indeed @a-robinson is correct, the phase timings we have for any statement that kicks off a background job only reflects the creation of the job. This is a known limitation with the data source we're using for this page (which was originally intended for a different purpose). We'll be revisiting this in the next release cycle, but at least for the 2.1 release, any statement that creates a job should be watched on the jobs page rather than here.

cc @piyush-singh

@couchand couchand added S-3-ux-surprise Issue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption. docs-known-limitation labels Sep 18, 2018
@couchand couchand added this to the 2.2 milestone Sep 18, 2018
@arctica
Copy link
Author

arctica commented Sep 19, 2018

That makes sense and restores the faith in the timing stats of other queries. Maybe a dirty hack until a full fix comes along could be to add a little note on this page for all queries that are known to execute background jobs?

@vivekmenezes
Copy link
Contributor

I think this can be resolved by changing the statement timing stats to reflect time window: [stmt start-time, stmt commit-time] rather than: [stmt start-time, stmt end-time], where commit-time is the time recorded after the schema changers have run post commit.

@vivekmenezes
Copy link
Contributor

Another choice is to eliminate statements that queue up schema changers from the stat, or compute the stats differently for those statements.

@couchand
Copy link
Contributor

I generally think it would be irresponsible at this point in the release cycle to make significant changes to the data being recorded. But the basic issue here (as well as with other known limitations to the statements page) is that we're using data collected for a different purpose. Changing the backing data for this feature would change it everywhere, something we need to be very careful about.

A more complete solution would be to treat the statements page in this release as a proof-of-concept and make a more comprehensive review of the needs of that feature for the next release cycle. If we determine that the needs are sufficiently aligned, one usage will need to follow the CONFORMIST pattern. On the other hand, if we recognize that they are similar but different, we need to go SEPARATE WAYS. (both patterns detailed in Part IV of Domain-Driven Design by Eric Evans)

@couchand
Copy link
Contributor

Minor changes that don't modify the underlying data, such as hiding certain statement types or adding a warning to certain statement types, seem like a reasonable change to make for this release. Anything more needs to be treated with care.

@piyush-singh
Copy link

cc @josueeee, we should discuss a way to visually indicate that the timings for background jobs might not be accurate (or just hiding the statements entirely).

@vivekmenezes
Copy link
Contributor

I don't think there is much we can do for 2.1 on the UI side. It will need backend support. I think the choices for 2.1 are

  1. leave it as it is
  2. remove schema change statements from crdb_internal query.

I'm inclined to say that we fix this for 2.2.

@knz
Copy link
Contributor

knz commented Sep 20, 2018

@couchand @piyush-singh

I think the reasonable step forward here (both for 2.1 and later versions) is to simply add a visual indicator on the statements stats page for DDL statements and other statements that use jobs (e.g backup/restore, import/export) that the timing info is not representative of the full execution time.

I would recommend against all of the following:

  • hiding the things from the UI (will raise more questions than it answers)
  • hiding the things from the crdb_internal table (this is totally unacceptable for metric collection in the reg server)
  • splitting the data collection logic (this would REALLY hurt maintainability so only to be considered in case of "force majeure")
  • trying to capture the job execution time into the stat statistics (that would be a hornet's nest of complexity)
  • trying to extend the phase definition for DDL statements to extend to the end of the current txn (up to commit time) -- this would be another hornet's nest of complexity.

@vivekmenezes
Copy link
Contributor

vivekmenezes commented Sep 20, 2018 via email

@knz
Copy link
Contributor

knz commented Sep 20, 2018

We have a predicate already to recognize DDL statements: tree.CanModifySchema().

@couchand
Copy link
Contributor

add a visual indicator on the statements stats page

I agree that's a pragmatic and helpful improvement. What we need it a way to clearly identify these statement types. Note that it's not just DDL, we want explicit job creation statements to have this warning: BACKUP, RESTORE, etc. And does every type of DDL statement create a job?

I would recommend against all of the following:

  • hiding the things...
  • trying to capture the job execution time
  • trying to extend the phase definition

agreed

  • splitting the data collection logic

I don't think we have enough information right now to know if this is to be recommended, but that's a question for another day anyway.

@sploiselle
Copy link
Contributor

@couchand Can I get a quick blurb describing this known limitation w/r/t the impact to user experience? Ideally, we need it by Friday 10/26 for the 2.1 Known Limitations page. Posting it on this issue and/or pinging me would be great.

@piyush-singh
Copy link

Quick strawman (feel free to suggest any edits @couchand ):

"The Statements page does not correctly report mean latency or latency by phase for statements that result in schema changes or other background jobs."

Not 100% sure what other background jobs can be kicked off by a statement off of the top of my head. Checked docs and I see "Whenever you initiate a schema change, CockroachDB registers it as a job, which you can view with SHOW JOBS" for schema change statements, but that note doesn't appear for CREATE INDEX.

@knz
Copy link
Contributor

knz commented Oct 24, 2018

the fact there is no job for create index is a bug and I think a known limitation. @awoods187 do we already have a roadmap item for that? If not let's make one.

@knz
Copy link
Contributor

knz commented Oct 24, 2018

sorry let us do make one

@vivekmenezes
Copy link
Contributor

CREATE INDEX does create a job for 2.1

@vivekmenezes
Copy link
Contributor

DROP TABLE/DATABASE/TRUNCATE are the schema changes that we do not have jobs for in 2.1

@awoods187
Copy link
Contributor

I'm going to close this with the various JOBs page improvements conducted to date. Feel free to re-open or comment if the needs discussed here have not been met.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-webui-queryperf C-question A question rather than an issue. No code/spec/doc change needed. docs-done docs-known-limitation O-community Originated from the community S-3-ux-surprise Issue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption.
Projects
None yet
Development

No branches or pull requests