ui: use step charts for timeseries to clarify measurement interval #5980

Closed
tbg opened this issue Apr 11, 2016 · 9 comments
Labels
A-webui-timeseries C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. X-stale

Comments

@tbg
Member

tbg commented Apr 11, 2016

When I look at the UI and pick (for example) the reads graph, I'm not really sure what I'm reading. For example, if I send a single SELECT on an otherwise dormant cluster in a 10 second interval, I think I see a spike on that graph at 0.1. Knowing that time series are collected every 10 seconds, I deduce that the unit on the y-axis is probably .1/s. But that seems random - you expect 1/s.

Btw, I've repeatedly tried to hover over the little info symbol at the top right of each graph - we should start populating those or hide them.

And something else - an insert of the form INSERT ... VALUES a, b, c, d, e, f ... counts only as one (not as the number of rows inserted) (saw this during #5981). Maybe that's not what people expect? In any case, should be clear from the tool tip.

@cuongdo
Contributor

cuongdo commented Apr 11, 2016

I'm confused. If there was 1 SELECT in 10 seconds, isn't the actual rate of SELECTs 0.1/second? If there was 1 SELECT per second, that'd mean you had 10 SELECTs in that 10-second interval.
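
To make the arithmetic concrete, here is a minimal sketch of how a per-second rate falls out of a count and a sampling interval. The function name `rate_per_second` is hypothetical and is not taken from the CockroachDB code; the 10-second interval is the one mentioned in this thread.

```python
def rate_per_second(event_count: int, interval_seconds: float) -> float:
    """Average event rate over one sampling interval (illustrative helper)."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return event_count / interval_seconds


# One SELECT in a 10-second interval averages to 0.1 per second.
single_select = rate_per_second(1, 10)
# Ten SELECTs in the same interval average to 1 per second.
ten_selects = rate_per_second(10, 10)
print(single_select, ten_selects)
```

The value on the y-axis is only meaningful together with the interval it was averaged over, which is the information the graph does not show.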


@tbg
Member Author

tbg commented Apr 11, 2016

Yes, but looking at that graph, how am I supposed to know that 10 seconds is the base interval there? It doesn't make sense without that information. And who measures operations in .1s in practice? We always talk ops/sec. I would expect that single spike to go all the way up to 1, telling me that at that point in time there was 1 op/sec. Or, I would expect the whole 10 second interval in which this operation happened to be at a constant .1 (we had .1 ops/sec over this 10 second interval).

@maxlang
Contributor

maxlang commented Apr 21, 2016

@tschottdorf just wanted to check if you think this is still an issue after #6006

The graph titles are now "READS PER SECOND" and "WRITES PER SECOND" and the tooltip explains it's averaged over 10s.

The base interval shouldn't matter, especially once you get high enough traffic.

@maxlang maxlang assigned mrtracy and maxlang and unassigned mrtracy Apr 21, 2016
@tbg
Member Author

tbg commented Apr 22, 2016

I still think there's an issue. We still show the rate at a point and not over the interval. Really what it should be showing when hovering over the datapoint is the time interval over which we had the events/sec (and the graph should be a bar graph, i.e. a constant value over these 10s slices, and the datapoint centered).

However, if that's only me OCDing about things, I don't want to push the
point too much since there's a lot of other stuff to do. Maybe just do what
@mberhault thinks - care to chime in?


@mberhault
Contributor

Showing bars is tricky. The common way of doing rates in monitoring graphs is to pick some point in the interval (beginning, end, middle, it depends on the system), but you must make sure the rate interval is well explained, as that makes a big difference, especially for spikes (it smooths everything).

E.g. rpc_rate[1m] would be a clear name for rpc with a rate over 1m. Where the actual datapoints are graphed is not a big deal; high resolution at that level is not necessarily useful.
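
A rough sketch of why the rate interval matters for spikes: the same burst reads very differently depending on the window it is averaged over. The helper `windowed_rates` and the sample data are made up for illustration and do not reflect any particular monitoring system.

```python
def windowed_rates(per_second_counts, window_seconds):
    """Average per-second counts over consecutive, non-overlapping windows."""
    rates = []
    for start in range(0, len(per_second_counts), window_seconds):
        window = per_second_counts[start:start + window_seconds]
        # Divide by the actual window length so a short trailing window
        # is still averaged correctly.
        rates.append(sum(window) / len(window))
    return rates


# One minute of traffic: idle except for a burst of 120 requests in second 25.
counts = [0] * 60
counts[25] = 120

rates_10s = windowed_rates(counts, 10)  # six 10-second windows
rates_60s = windowed_rates(counts, 60)  # one 1-minute window
print(rates_10s)
print(rates_60s)
```

With a 10-second window the burst shows up as 12/s in one slot; with a 1-minute window it flattens to 2/s, so the window length has to be visible somewhere for the number to be interpretable.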

@tbg
Member Author

tbg commented Apr 22, 2016

Ok, sounds like we can close this then (?).


@maxlang maxlang added this to the Later milestone Apr 25, 2016
@maxlang
Contributor

maxlang commented Apr 25, 2016

It would be confusing to look at, but it seems like a scatter plot might get this point across better (that way it's clearer that each point is a sample). We should also make the sampling period clearer - either visible somewhere on the page or part of the graph title somehow. Since the sampling period is the same for every graph on the page, it might make sense to include it on the page level. For tables, however, you might want to look across a few different sampling periods.

I'm moving this to the "Later" milestone and will look back into it when I'm revisiting our visualizations and sampling rates.

@maxlang maxlang removed their assignment Jul 26, 2016
@couchand couchand added A-webui-timeseries C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. labels Apr 24, 2018
@couchand
Contributor

After reading the discussion on this issue, I think it perfectly represents a concern I've had for a while: our timeseries line charts are misleading. There are several aspects to this (for instance, #17552), but in this case, the problem is that we show an interval as a single point.

I think @tschottdorf's suggestion is generally right, though I would suggest that we want to show a step chart instead of a bar chart, since we're showing a continuous measure.
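
As a sketch of what a step chart would draw, each sample can be expanded into a horizontal segment spanning its interval. This assumes each sample's timestamp marks the start of its interval, which the thread does not specify; `step_vertices` is a hypothetical helper, not part of the UI code.

```python
def step_vertices(samples, interval_seconds):
    """Expand (interval_start, rate) samples into step-chart vertices.

    Each sample becomes a flat segment covering its whole interval, so the
    chart shows the rate as constant over the interval rather than as a point.
    """
    vertices = []
    for start, rate in samples:
        vertices.append((start, rate))
        vertices.append((start + interval_seconds, rate))
    return vertices


# A single SELECT during the second 10-second interval of an idle cluster.
samples = [(0, 0.0), (10, 0.1), (20, 0.0)]
vertices = step_vertices(samples, 10)
print(vertices)
```

Connecting these vertices in order gives a flat 0.1 across the 10 to 20 second slice instead of a lone spike, which makes the measurement interval visible in the shape of the line.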

@couchand couchand changed the title from "ui: confusing units on rates (read, write, ...)" to "ui: use step charts for timeseries to clarify measurement interval" Apr 24, 2018
@petermattis petermattis removed this from the Later milestone Oct 5, 2018
@github-actions

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!


8 participants