receiver/prometheus: propagate Prometheus.Debug error values into .Warn for easy display #2906

Conversation

odeke-em
Member

This change transforms Prometheus-created .Debug level errors, such as
failed scrape reasons, into a level that can be displayed to
collector users without them having to use --log-level=DEBUG.

In 2017, Prometheus PR prometheus/prometheus#3135
added these failure-reason messages at .Debug level.

This change ensures that a Prometheus log routed from, say, a scrape
failure that Prometheus originally logged as:

2021-04-09T22:58:51.732-0700	debug	scrape/scrape.go:1127
Scrape failed	{"kind": "receiver", "name": "prometheus",
"scrape_pool": "otel-collector", "target": "http://0.0.0.0:9999/metrics",
"err": "Get \"http://0.0.0.0:9999/metrics\": dial tcp 0.0.0.0:9999: connect: connection refused"}

will now get transformed to:

2021-04-09T23:24:41.733-0700	warn	internal/metricsbuilder.go:104
Failed to scrape Prometheus endpoint    {"kind": "receiver", "name": "prometheus",
"scrape_timestamp": 1618035881732, "target_labels": "map[instance:0.0.0.0:9999 job:otel-collector]"}

which is then surfaced to users.

Fixes #2364
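
For illustration, here is a minimal, self-contained sketch of the promotion mechanism. It is not the PR's actual diff: the adapter name is hypothetical, and it assumes Prometheus's go-kit style Log(keyvals ...interface{}) logging interface and the collector's zap logger.

```go
package main

import (
	"fmt"

	"go.uber.org/zap"
)

// gokitStyleLogger (hypothetical name) adapts go-kit style Log(keyvals...)
// calls, the interface Prometheus components log through, onto a zap logger.
type gokitStyleLogger struct {
	zl *zap.SugaredLogger
}

// Log promotes entries that arrive at debug level but carry an "err" key to
// warn, so scrape failures surface without --log-level=DEBUG. Everything
// else stays at debug.
func (g *gokitStyleLogger) Log(keyvals ...interface{}) error {
	isDebug, hasErr := false, false
	msg := ""
	for i := 0; i+1 < len(keyvals); i += 2 {
		switch fmt.Sprint(keyvals[i]) {
		case "level":
			isDebug = fmt.Sprint(keyvals[i+1]) == "debug"
		case "err":
			hasErr = true
		case "msg":
			msg = fmt.Sprint(keyvals[i+1])
		}
	}
	if isDebug && hasErr {
		g.zl.Warnw(msg, keyvals...) // error-carrying debug entry: promote
		return nil
	}
	g.zl.Debugw(msg, keyvals...)
	return nil
}

func main() {
	z, _ := zap.NewProduction() // info level by default: warn shows, debug does not
	defer z.Sync()
	adapter := &gokitStyleLogger{zl: z.Sugar()}
	// Roughly what Prometheus emits for the scrape failure shown above:
	adapter.Log("level", "debug", "msg", "Scrape failed",
		"target", "http://0.0.0.0:9999/metrics",
		"err", "dial tcp 0.0.0.0:9999: connect: connection refused")
}
```

Note that, as discussed in the review below, this promotes every error-carrying debug entry, not only scrape failures.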

@odeke-em odeke-em requested a review from a team April 10, 2021 06:33
@odeke-em odeke-em force-pushed the receiver-prometheus-transform-Prometheus-log.Debug-with-errors-into-level.Warn branch from 0462141 to 85585ce on April 10, 2021 06:33
@odeke-em
Member Author

Kindly cc-ing @bogdandrutu @Aneurysm9 @rakyll @alolita

@codecov

codecov bot commented Apr 10, 2021

Codecov Report

Merging #2906 (b9818ee) into main (a31ad5b) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #2906      +/-   ##
==========================================
+ Coverage   91.67%   91.69%   +0.01%     
==========================================
  Files         287      287              
  Lines       15250    15254       +4     
==========================================
+ Hits        13981    13987       +6     
+ Misses        866      863       -3     
- Partials      403      404       +1     
Impacted Files                                   Coverage Δ
receiver/prometheusreceiver/internal/logger.go   86.79% <100.00%> (+5.15%) ⬆️
testutil/testutil.go                             81.60% <0.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@Aneurysm9
Member

I'm not convinced that it is appropriate to rewrite the level from debug to warn. I share the concerns expressed upstream with regard to log volume. I think this approach also results in all debug-level log entries that carry an error becoming warn, which may be more than is intended.

@odeke-em odeke-em force-pushed the receiver-prometheus-transform-Prometheus-log.Debug-with-errors-into-level.Warn branch from 85585ce to b9818ee on April 12, 2021 18:35
@odeke-em
Member Author

Thank you for the response @Aneurysm9!

I'm not convinced that it is appropriate to rewrite the level from debug to warn. I share the concerns expressed upstream with regard to log volume. I think this approach also results in all debug-level log entries that carry an error becoming warn, which may be more than is intended.

I don’t see any other way to inform users of instance-related failures without opening the floodgates with --log-level=DEBUG. Shouldn’t debug-level entries with an error instead become WARN? How might you propose this gets fixed? Should we just tell users that there is nothing we can do? I’d be more inclined to ask Prometheus to make that a WARN, but that’s an uphill task of its own. Please let me know what you think.

@github-actions
Contributor

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions
Contributor

github-actions bot commented May 5, 2021

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@alolita
Member

alolita commented May 7, 2021

@anuraaga @open-telemetry/collector-approvers can you please review and unblock this PR?

@github-actions
Contributor

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label May 15, 2021
@anuraaga
Contributor

Agree with @Aneurysm9 that we can't really convert debug logs like this, since we don't control what upstream logging does. A more targeted approach of identifying specific messages (e.g., auth failures) could be OK. In general, though, the collector doesn't log much at its default level; when something isn't working, we generally expect users to temporarily enable log-level=debug, fix the issue, and restore the log level.
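
For concreteness, a sketch of that more targeted alternative: promote only an allowlist of known messages. The message strings and function names below are illustrative assumptions, not an agreed-upon list.

```go
package main

import "fmt"

// promotable lists the specific Prometheus debug messages considered worth
// surfacing at warn; entries are illustrative, not exhaustive.
var promotable = map[string]bool{
	"Scrape failed": true,
}

// shouldPromote promotes only known, error-carrying messages instead of
// every debug entry that happens to carry an error.
func shouldPromote(msg string, hasErr bool) bool {
	return hasErr && promotable[msg]
}

func main() {
	fmt.Println(shouldPromote("Scrape failed", true))          // true: promoted to warn
	fmt.Println(shouldPromote("Unrelated debug detail", true)) // false: stays at debug
}
```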

@bogdandrutu
Member

Marking as waiting-for-author so @alolita can see that this is not blocked by reviewers.

@Aneurysm9 @rakyll @anuraaga please review and make a decision.

@github-actions
Contributor

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label May 27, 2021
@github-actions
Contributor

github-actions bot commented Jun 3, 2021

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Jun 3, 2021
@odeke-em odeke-em deleted the receiver-prometheus-transform-Prometheus-log.Debug-with-errors-into-level.Warn branch July 29, 2021 23:26
Linked issue: Improve error message for prometheus scrape failure (#2364)