Feature request: support full-precision, machine-readable numbers in server status output #34

Open
timvaillancourt opened this issue Aug 22, 2016 · 3 comments

Comments

@timvaillancourt

timvaillancourt commented Aug 22, 2016

Hey guys,

In a separate Prometheus MongoDB exporter project (https://github.com/Percona-Lab/prometheus_mongodb_exporter) we use some Go code to present the MongoRocks 'db.serverStatus().rocksdb' output as Prometheus metrics (for monitoring/graphing/etc). It is mostly working, but I am running into two major issues with the format of the stats.

First, because the RocksDB/MongoRocks output is in a "human-readable" format, the string-parsing code I wrote to parse the stats output works today but could very easily break if the output changes, which makes it a very brittle, hacky solution. I would like to request that these metrics be presented in a regular, nested data structure that is machine-readable. This output format could be optional, although making it the default would seem to align better with the rest of serverStatus.

Secondly, the human-readable format causes numbers to be heavily rounded, which makes it difficult or impossible to build time series on most RocksDB counters because the precision is lost. For example:

A counter of 10,000,000 becomes the string "10M" in the output (no decimal precision), so a time series shows no change in that counter until there are 1 million more changes, i.e. until the number goes from "10M" to "11M".

As a result, the time series shows no activity and then one massive spike of 1 million, which is essentially unusable. This is tracked in an issue on our GitHub: percona/mongodb_exporter#24. As with my first issue, a machine-readable format with full-precision ints or floats would resolve this for me, even if the number is an "estimation".
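To make the precision problem concrete, here is a minimal Go sketch. Both `humanize` and `parseHuman` are hypothetical helpers written for this illustration; they are not the RocksDB formatter or the exporter's actual parser, and the K/M/G suffixes and whole-unit truncation are assumptions based on the "10M" example above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// humanize is a hypothetical stand-in for a human-readable formatter.
// Assumption: it keeps only the whole number of thousands, millions or
// billions, as in the "10M" example.
func humanize(n uint64) string {
	switch {
	case n >= 1_000_000_000:
		return strconv.FormatUint(n/1_000_000_000, 10) + "G"
	case n >= 1_000_000:
		return strconv.FormatUint(n/1_000_000, 10) + "M"
	case n >= 1_000:
		return strconv.FormatUint(n/1_000, 10) + "K"
	default:
		return strconv.FormatUint(n, 10)
	}
}

// parseHuman converts a string such as "10M" back into an integer.
// It can only recover what the string still contains.
func parseHuman(s string) (uint64, error) {
	s = strings.TrimSpace(s)
	if s == "" {
		return 0, fmt.Errorf("empty value")
	}
	mult := uint64(1)
	switch s[len(s)-1] {
	case 'K':
		mult = 1_000
	case 'M':
		mult = 1_000_000
	case 'G':
		mult = 1_000_000_000
	}
	if mult != 1 {
		s = s[:len(s)-1] // drop the suffix before parsing the digits
	}
	n, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0, err
	}
	return n * mult, nil
}

func main() {
	// Every value below 11,000,000 collapses to the same string, so the
	// parsed counter does not move until a full million has accumulated.
	for _, n := range []uint64{10_000_000, 10_500_000, 10_999_999, 11_000_000} {
		h := humanize(n)
		back, _ := parseHuman(h)
		fmt.Printf("actual=%d shown=%s parsed=%d\n", n, h, back)
	}
}
```

However careful the parser is, it cannot restore digits the formatter has already discarded, which is why the request is for full-precision values at the source.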

Cheers,

Tim

@igorcanadi
Contributor

I definitely agree with this. Currently we have three different statistics in the output:

  1. Human-readable stats that we get from RocksDB here: https://github.com/mongodb-partners/mongo-rocks/blob/master/src/rocks_server_status.cpp#L79
  2. A couple of numeric properties that reflect the current moment (number of entries in the memtable, etc.): https://github.com/mongodb-partners/mongo-rocks/blob/master/src/rocks_server_status.cpp#L80-L91
  3. Finally, aggregated stats, which are not turned on by default (you can turn them on just by restarting the database) and are exported here: https://github.com/mongodb-partners/mongo-rocks/blob/master/src/rocks_server_status.cpp#L183-L196

Ideally I'd like to keep (1) as is. However, I think we should duplicate the most useful stats from (1) into category (2) or (3), depending on whether each one is current or aggregate. Both are machine-readable since they're based on key-value pairs.

Now, which of the stats in (1) would you like to see upgraded to the machine-readable format?
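As a rough illustration of why key-value output is easier for an exporter to consume, the Go sketch below flattens a nested document into metric names without any string parsing. The document shape and every key name in it are made up for the example; they are not the actual fields of 'db.serverStatus().rocksdb', and `flatten` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

// flatten walks a nested document and records every numeric leaf under a
// name built from its path, e.g. {"a": {"b": 1}} with prefix "p" gives
// "p_a_b" = 1.
func flatten(prefix string, doc map[string]interface{}, out map[string]float64) {
	for key, val := range doc {
		name := key
		if prefix != "" {
			name = prefix + "_" + key
		}
		switch v := val.(type) {
		case map[string]interface{}:
			flatten(name, v, out)
		case int:
			out[name] = float64(v)
		case int64:
			out[name] = float64(v)
		case float64:
			out[name] = v
			// Strings and any other types are skipped: only numeric
			// leaves become metrics in this sketch.
		}
	}
}

func main() {
	// Hypothetical document; the key names are illustrative only.
	status := map[string]interface{}{
		"counters": map[string]interface{}{
			"example-counter-a": int64(10000123),
			"example-counter-b": int64(42),
		},
		"example-current-value": 7,
		"stats":                 "human-readable text is ignored here",
	}

	metrics := map[string]float64{}
	flatten("rocksdb", status, metrics)

	// Sort the names so the printed order is stable.
	names := make([]string, 0, len(metrics))
	for n := range metrics {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		fmt.Printf("%s %v\n", n, metrics[n])
	}
}
```

With this shape, a new field simply shows up as a new metric name, and the full integer value survives instead of being reduced to a rounded string.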

@timvaillancourt
Author

timvaillancourt commented Aug 23, 2016

Thanks Igor!

I think the human-readable stats (Item 1) should remain the default, and any additional counters this PR hypothetically creates could become part of 'db.serverStatus().rocksdb.counters' (Item 3).

I've enabled the rocksdb counters on my dev environment and I think much of what is there will be very helpful, thanks! I will start converting my exporter to use these metrics instead and report back.

Ideally these are the additional counters I would like (a lot - sorry :D):

  • Batched writes (total batches)
  • Stall info (total seconds, total stalls)
  • Compaction time (total seconds, total compactions)
  • Per-level bytes (total read, total write, total Wnew, total read N, total read N+1)
  • Per-level compactions (total seconds, total compactions)
  • Per-level keys (total in, total drop)
  • Tombstones (total tombstones, any other suggestions appreciated)

Some items that would be useful but aren't necessarily incremented counters, as they're "current" values:

  • Per-level score
  • Per-level WAmp
  • Per-level number of files
  • Per-level number of file threads
  • Per-level total byte size
  • Level read latency microseconds (min, max, avg, 99%, 99.9%)
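The split between the two lists matters on the monitoring side: "current" values can be graphed as they are, while incremented counters are usually turned into a rate between two scrapes. Here is a minimal Go sketch of that rate step; the `sample` type and `ratePerSecond` function are hypothetical names for this illustration and are not taken from the exporter or from Prometheus.

```go
package main

import "fmt"

// sample is one scrape of a cumulative counter.
type sample struct {
	unixSeconds float64 // time of the scrape, in seconds
	value       uint64  // full-precision counter value
}

// ratePerSecond returns the average increase per second between two
// scrapes. The second result is false when no rate can be computed:
// time did not advance, or the counter went backwards (for example
// after a restart reset it to zero).
func ratePerSecond(prev, cur sample) (float64, bool) {
	elapsed := cur.unixSeconds - prev.unixSeconds
	if elapsed <= 0 || cur.value < prev.value {
		return 0, false
	}
	return float64(cur.value-prev.value) / elapsed, true
}

func main() {
	prev := sample{unixSeconds: 0, value: 10_000_000}
	cur := sample{unixSeconds: 10, value: 10_000_500}
	if r, ok := ratePerSecond(prev, cur); ok {
		fmt.Printf("%.1f per second\n", r)
	}
}
```

This only gives a useful graph when the counter is exported at full precision; with values rounded to whole millions, the difference between two scrapes is almost always zero.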

Thanks again

@timvaillancourt
Author

Added tombstone info to the list and made two tweaks.

igorsol added a commit to igorsol/mongo-rocks that referenced this issue Dec 9, 2016
…p-output

Fix startup ouptut messages in 3.4