receiver/prometheus: propagate Prometheus.Debug error values into .Warn for easy display #2906
Kindly cc-ing @bogdandrutu @Aneurysm9 @rakyll @alolita
Codecov Report
@@ Coverage Diff @@
## main #2906 +/- ##
==========================================
+ Coverage 91.67% 91.69% +0.01%
==========================================
Files 287 287
Lines 15250 15254 +4
==========================================
+ Hits 13981 13987 +6
+ Misses 866 863 -3
- Partials 403 404 +1
I'm not convinced that it is appropriate to rewrite the level from debug to warn. I share the concerns expressed upstream with regard to log volume. I think this approach also results in all debug level log entries that carry an error becoming warn, which may be more than is intended.
…rn for easy display
This change transforms Prometheus-created .Debug level errors, such as failed scrape message reasons, into a level that can be displayed to collector users without them having to use --log-level=DEBUG.
In 2017, Prometheus PR prometheus/prometheus#3135 added the failure reason displays at the .Debug level.
This change ensures that a Prometheus log routed from, say, a scrape failure that Prometheus originally logged as:
    2021-04-09T22:58:51.732-0700 debug scrape/scrape.go:1127 Scrape failed {"kind": "receiver", "name": "prometheus", "scrape_pool": "otel-collector", "target": "http://0.0.0.0:9999/metrics", "err": "Get \"http://0.0.0.0:9999/metrics\": dial tcp 0.0.0.0:9999: connect: connection refused"}
will now get transformed to:
    2021-04-09T23:24:41.733-0700 warn internal/metricsbuilder.go:104 Failed to scrape Prometheus endpoint {"kind": "receiver", "name": "prometheus", "scrape_timestamp": 1618035881732, "target_labels": "map[instance:0.0.0.0:9999 job:otel-collector]"}
which will now be surfaced to users.
Fixes open-telemetry#2364
Thank you for the response @Aneurysm9!
I don’t see any other way to inform users of instance-related failures without opening the floodgates with --log-level=DEBUG. Shouldn’t debug-level entries that carry an error turn into a WARN instead? How would you propose this gets fixed? Should we just tell users that there is nothing we can do? I’d be more inclined to ask Prometheus to log that as a WARN, but that’s an uphill task of its own. Please let me know what you think.
This PR was marked stale due to lack of activity. It will be closed in 7 days.
@anuraaga @open-telemetry/collector-approvers can you please review and unblock this PR?
Agree with @Aneurysm9: we can't really convert debug logs like this, since we don't control what upstream logs. A more targeted approach of identifying specific messages (e.g., auth failure) could be OK. In general, though, the collector doesn't log much at the default level; when something isn't working, we generally expect users to temporarily enable log-level=debug, fix the issue, and restore the log level.
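For illustration, the "more targeted approach" could look roughly like the Go sketch below. It is only a sketch under stated assumptions: the `promotable` allowlist and the `levelFor` helper are hypothetical names, not collector or Prometheus APIs, and the only upstream message taken from this thread is "Scrape failed". A debug entry is raised to warn only when it carries an error and its message is on the allowlist.

```go
package main

import "fmt"

// promotable is a hypothetical allowlist of upstream messages that are
// considered worth surfacing at the default log level.
var promotable = map[string]bool{
	"Scrape failed": true, // message text taken from the log quoted in this PR
}

// levelFor returns the level at which an upstream entry should be emitted.
// Only debug entries that carry an error AND match the allowlist are raised
// to warn; everything else keeps its original level.
func levelFor(level, msg string, hasErr bool) string {
	if level == "debug" && hasErr && promotable[msg] {
		return "warn"
	}
	return level
}

func main() {
	fmt.Println(levelFor("debug", "Scrape failed", true))
	// A made-up message that is not on the allowlist stays at debug.
	fmt.Println(levelFor("debug", "Some other upstream message", true))
}
```

Matching on message text is brittle if upstream rewords its logs, so any such list would need to be kept in sync with the Prometheus version in use.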
Marking as waiting for author so that @alolita sees this is not blocked by reviewers. @Aneurysm9 @rakyll @anuraaga please review and make a decision.
Closed as inactive. Feel free to reopen if this PR is still being worked on.
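This change transforms Prometheus-created .Debug level errors, such as
failed scrape message reasons, into a level that can be displayed to
collector users without them having to use --log-level=DEBUG.
In 2017, Prometheus PR prometheus/prometheus#3135
added the failure reason displays at the .Debug level.
This change ensures that a Prometheus log routed from, say,
a scrape failure that Prometheus originally logged as:
    2021-04-09T22:58:51.732-0700 debug scrape/scrape.go:1127 Scrape failed {"kind": "receiver", "name": "prometheus", "scrape_pool": "otel-collector", "target": "http://0.0.0.0:9999/metrics", "err": "Get \"http://0.0.0.0:9999/metrics\": dial tcp 0.0.0.0:9999: connect: connection refused"}
will now get transformed to:
    2021-04-09T23:24:41.733-0700 warn internal/metricsbuilder.go:104 Failed to scrape Prometheus endpoint {"kind": "receiver", "name": "prometheus", "scrape_timestamp": 1618035881732, "target_labels": "map[instance:0.0.0.0:9999 job:otel-collector]"}
which will now be surfaced to users.
Fixes #2364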
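The level rewrite described above can be sketched in Go roughly as follows. This is a simplified illustration, not the code in this PR: the `Entry` type and `promote` function are hypothetical, the collector's real logging goes through zap, and the "Scrape succeeded" message is made up for the example. The sketch raises any debug entry that carries a non-nil "err" field to warn.

```go
package main

import "fmt"

// Entry is a hypothetical, simplified log record used only for this sketch.
type Entry struct {
	Level   string
	Message string
	Fields  map[string]interface{}
}

// promote returns a copy of e with its level raised from "debug" to "warn"
// when the entry carries a non-nil "err" field. Other entries are unchanged.
func promote(e Entry) Entry {
	if e.Level != "debug" {
		return e
	}
	if err, ok := e.Fields["err"]; ok && err != nil {
		e.Level = "warn"
	}
	return e
}

func main() {
	failed := Entry{
		Level:   "debug",
		Message: "Scrape failed",
		Fields: map[string]interface{}{
			"target": "http://0.0.0.0:9999/metrics",
			"err":    "connection refused",
		},
	}
	// Made-up routine entry without an error field.
	routine := Entry{Level: "debug", Message: "Scrape succeeded", Fields: map[string]interface{}{}}

	fmt.Println(promote(failed).Level)
	fmt.Println(promote(routine).Level)
}
```

Note that a rule like this applies to every debug entry with an error, not only scrape failures, which is the breadth concern raised in review.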