-
Notifications
You must be signed in to change notification settings - Fork 674
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SOLR-10654: Introduce output of Prometheus metrics directly from Solr #2405
Conversation
public SolrCoreMetric parseLabels() { | ||
String[] parsedMetric = metricName.split("\\."); | ||
if (dropwizardMetric instanceof Gauge) { | ||
String type = parsedMetric[2]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for metricName
to be size 1 at this point? If so, we should surround this in a try/throw or a check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not seen a metric with delimited with a size of 1 yet but in cases that does, I did surround with try/catch at a higher level here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are enough additional dependencies that I feel strongly this should be another module.
It's a shame to see so much code to do the mapping but... shrug, ah well.
solr/core/src/java/org/apache/solr/metrics/prometheus/core/SolrCoreMetric.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/metrics/prometheus/core/package-info.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/metrics/prometheus/SolrPrometheusRegistry.java
Outdated
Show resolved
Hide resolved
Moving this out of draft for review |
WDYT? |
Sorry, I somehow completely missed this comment. When you say be another module, do you mean move it to |
No, there are 4 dependencies. Only the last one concerns me. Why is protobuf needed at all? Can it be excluded? Can any of the others be excluded as well, while you're at it? |
It doesn't feel right to merge this PR right now. Needs docs. Also the choice of what metrics are done so far is inverted from the value that would make most sense for this whole approach -- OS/JVM/node issues are highest value. |
@dsmiley Just finished up exporting |
Added ref-guide documentation on Prometheus endpoint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to suggest an integration test showing all the output (say to a file) stripped of numbers (to remove brittle/changing stuff) to show that we export what we mean to export. I could imagine someone renaming some metric and then forgetting to update the code here.
solr/core/src/java/org/apache/solr/metrics/prometheus/jvm/SolrJvmBuffersMetric.java
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/metrics/prometheus/node/SolrNodeContainerMetric.java
Show resolved
Hide resolved
public static final String NODE_REQUESTS = "solr_metrics_node_requests"; | ||
public static final String NODE_SECONDS_TOTAL = "solr_metrics_node_requests_time"; | ||
public static final String NODE_CONNECTIONS = "solr_metrics_node_connections"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay but FWIW IMO in situations like this class (parsing/exporting) constants like this only diffuse readability (add a needless indirection). Take it or leave it; your choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally prefer constants so I'd leave it unless someone feels strongly about it.
} | ||
|
||
SolrMetric solrNodeMetric = categorizeMetric(dropwizardMetric, metricName); | ||
solrNodeMetric.parseLabels().toPrometheus(this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we forget to call parseLabels, will we recognize a mistake? not a big deal but just a thought
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't say its a mistake. I split these into 2 abstract functions to split out the logic for getting the labels, then converting the correct dropwizard type to the correct prometheus type instead of 1 big function.
I think it's fine if not called. If down the line additional metrics want to be exported from dropwizard, it's possible to just have it exported without any labels and just it's value to which parselabels isn't needed.
solr/core/src/java/org/apache/solr/metrics/prometheus/node/SolrPrometheusNodeExporter.java
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/response/PrometheusResponseWriter.java
Outdated
Show resolved
Hide resolved
solr/core/src/java/org/apache/solr/response/PrometheusResponseWriter.java
Outdated
Show resolved
Hide resolved
solr/solr-ref-guide/modules/deployment-guide/pages/monitoring-with-prometheus-and-grafana.adoc
Outdated
Show resolved
Hide resolved
Thanks for reviewing again. Added in an integration test with a file stripped of values. Also the unit tests I wrote originally should catch if any changes that happen to the naming of the prometheus metric or labels. Also thanks for catching missing javadocs. Made sure to add them in for each class and added sample values for what is being exported to prometheus. |
solr/solr-ref-guide/modules/deployment-guide/pages/monitoring-with-prometheus-and-grafana.adoc
Outdated
Show resolved
Hide resolved
…with-prometheus-and-grafana.adoc Co-authored-by: Christine Poerschke <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this PR locally using the "techproducts" example. I thing I noticed is that the metrics filtering appears to not work, which is a big problem. A test should be added for this, and it should work.
http://localhost:8983/solr/admin/metrics?category=solr.jetty&wt=prometheus
return this; | ||
} | ||
|
||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(this feedback applies to most classes where you did this).
You've put this where javadocs go yet it's it's technically not javadocs since you didn't start with a leading double-asterisk. If you were to do that, you might need to add markup :-/ I suppose leaving this as-is is fine. Even moving it above e parseLabels and add a blank line to further show this is not (trying to be) javadoc. FWIW when I suggested adding input examples for parsing, I thought you would add some trivial "//" in-line at the point of parsing. But this is fine, as you wish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for this. Went with your other suggestion of putting it above parseLabels
and add an empty line below it.
If you still prefer just normal inline comments, I can make the appropriate changes.
*/ | ||
@Override | ||
public void toPrometheus(SolrPrometheusExporter exporter) { | ||
if (dropwizardMetric instanceof Meter) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I re-read all this code in the PR, esp. for these toPrometheus methods, the instanceof
checks seem more awkward with questionable value. If instead we see that the name matches a pattern, let's just cast. If we get a ClassCastException, a test will catch it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went ahead and removed most of the instanceof
checks but kept in some places. In the places I kept, it seemed cleaner to export that way as not all my checks are just straight names checks and we don't export every metric. So we can just export a large amount by just checking its type. I can try refactoring further of removing instanceof
checks if you still disagree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't go in the "solr" folder; that's a "solr home dir" and thus only solr config files go there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created it's own prometheus directory to place the file in a place we can place future test files relating to prometheus
solr/core/src/test/org/apache/solr/response/TestPrometheusResponseWriter.java
Outdated
Show resolved
Hide resolved
solr/core/src/test/org/apache/solr/response/TestPrometheusResponseWriter.java
Outdated
Show resolved
Hide resolved
Files.readString( | ||
new File(TEST_PATH().toString(), "solr-prometheus-output.txt").toPath(), | ||
StandardCharsets.UTF_8); | ||
assertEquals(expectedOutput, actual.replaceAll("(?<=}).*", "")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can expect there will be more cleansing/normalization to do. For example the JVM used will result in different "item" labels on solr_metrics_jvm_gc. I got "G1-Old-Generation" vs "PS-MarkSweep". People will use different JVMs for tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parsed out the JVM item tag from the test with additional regex
Files.readString( | ||
new File(TEST_PATH().toString(), "solr-prometheus-output.txt").toPath(), | ||
StandardCharsets.UTF_8); | ||
assertEquals(expectedOutput, actual.replaceAll("(?<=}).*", "")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add a comment explaining that regexp please? It's all symbols 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in a comment with explanation.
Filtering does work I think the parameter you are looking for is But still you are correct, I should have a test for this. Will add one in. The only other thing I remember now about this is that Even then if you need extensive filtering or trying to get some specific metric, maybe the Prometheus Exporter process with a custom config might be the better solution. |
Oops; maybe I was hurried. I'm glad the category works at least! The other filtering options could be deferred to a future JIRA by matching the prometheus name. It's a very useful capability; some metrics implemented as gauges are expensive to compute, and others are just excessive output that aren't needed for O(N) for N cores if N is for some users quite high. |
Added in a test for the filtering. I'd be happy expand on this and add further filtering feature specific to prometheus after this is ready to be merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is super close now.
Can you propose a brief CHANGES.txt under 9.7 "New Features" (I think it's closer to this than an improvement).
|
||
actualSnapshot = getMetricSnapshot(actualSnapshots, "solr_metrics_jvm_gc_seconds"); | ||
actualGaugeDataPoint = | ||
getGaugeDatapointSnapshot(actualSnapshot, Labels.of("item", "G1-Old-Generation")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assertion will fail for varying JVMs; skip this one. Some others too above... it's okay to make the test looser / less specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed a few more tests from the JVM registry
@@ -879,7 +879,7 @@ The response will look like this: | |||
]} | |||
---- | |||
|
|||
Like other request handlers, the Metrics API can also take the `wt` parameter to define the output format. | |||
Like other request handlers, the Metrics API can also take the `wt` parameter to define the output format. The Metrics API also provides an additional special `wt` parameter `prometheus` that will output metrics in Prometheus format. This can be used for xref:monitoring-with-prometheus-and-grafana.adoc#metrics-api-with-prometheus-format[Monitoring with Prometheus and Grafana] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After each sentence, put a newline (doesn't change the final render but it to keep editing & diffs more manageable).
Should document what options do not work with prometheus. The regex one doesn't, you said.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in a small note about the filter query parameters. Let me know if you prefer a different way to write it on the ref-guide
Added entry into CHANGES.txt |
An alternative to the "Prometheus Exporter" in which each Solr node can return the Prometheus format natively from the MetricsHandler using `wt=prometheus` param. It's much faster and architecturally simpler, albeit less flexible. Co-authored-by: mbiscocho <[email protected]> Co-authored-by: Christine Poerschke <[email protected]> Co-authored-by: David Smiley <[email protected]> (cherry picked from commit fd7d447)
https://issues.apache.org/jira/browse/SOLR-10654
Description
Introduce a new Prometheus response writer
/solr/admin/metrics?wt=prometheus
to output Dropwizards Solr Core registry metrics directly from Solr. Output looks similar to the prometheus exporter. This exportssolr.core.*
solr.node
solr.jvm
solr.jetty
metric registriesSolution
Currently, Solr metrics are collected by Dropwizard and differentiated by metric names. This is an anti-pattern to Prometheus. Manually export Dropwizard Solr core registry metrics to a new
SolrPrometheusExporter
base class to parse out the labels from the metric name and output in a Prometheus readable format for the Prometheus Response writer. The exporter is just temporarily created to export a snapshot of what the current Dropwizard registry is at. These are not held in memory.The exporter will categorize these metrics and export to prometheus format with labels depending on the category and metric names.
In standalone, all labels for the
solr.core
registry includescore
. In cloud mode, all metrics includecore
,collection
,shard
,replica
NOTE: Not all metrics are possible to convert to Prometheus such as Dropwizard string metrics. This is because Prometheus does not support non-numeric types
Tests
Added
MetricsHandlerTest
,SolrPrometheusExporterTest
,TestPrometheusReponseWriter
Checklist
Please review the following and check all that apply:
main
branch../gradlew check
.