Consistency in Elastic stack metricsets' code #8308
Conversation
@@ -57,25 +56,35 @@ func New(base mb.BaseMetricSet) (mb.MetricSet, error) {
func (m *MetricSet) Fetch(r mb.ReporterV2) {
	isMaster, err := elasticsearch.IsMaster(m.HTTP, m.HostData().SanitizedURI+clusterStatsPath)
	if err != nil {
		r.Error(fmt.Errorf("Error fetching master info: %s", err))
		msg := errors.Wrap(err, "error determining if connected Elasticsearch node is master.")
Errors strings should not end with punctuation. https://github.com/golang/go/wiki/CodeReviewComments#error-strings.
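To make the suggestion concrete — just a sketch, using the github.com/pkg/errors package that the diff already relies on:

	// Error strings start lowercase and carry no trailing punctuation,
	// so they read well when wrapped into longer messages.
	msg := errors.Wrap(err, "error determining if connected Elasticsearch node is master")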
	return &MetricSet{
		base,
		http,
		config.XPack,
		log,
You might as well inline the logp.NewLogger statement.
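A sketch of what inlining could look like, assuming the constructor shape from the diff (the selector string and the trailing `}, nil` are illustrative, not taken from the PR):

	return &MetricSet{
		base,
		http,
		config.XPack,
		logp.NewLogger(ModuleName), // inlined; the selector shown here is only an example
	}, nil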
@@ -92,11 +92,15 @@ func (m *MetricSet) Fetch(r mb.ReporterV2) {
	}

	if m.MetricSet.XPack {
		eventsMappingXPack(r, m, content)
		err = eventsMappingXPack(r, m, content)
This error is never reported. Is that intentional?
Yeah, we don't want to report X-Pack errors because they would end up in the metricbeat-* index, not the .monitoring-* index, which is where all the "success" documents go.
Obviously this is something we want to change at some point, but for now we're just trying to achieve parity with existing documents in .monitoring-*.
	} else {
		err = eventsMapping(r, content)
		if err != nil {
			r.Error(err)
		}
	}

	if err != nil {
		m.Log.Error(err)
Why are the reporting and logging handled separately?
Mainly because we want to log x-pack and non-xpack errors but want to report only non-xpack errors. I agree, it's not very pleasant.
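A rough sketch of that split, assuming the eventsMapping/eventsMappingXPack signatures shown in the diff above:

	if m.MetricSet.XPack {
		// X-Pack path: log only; reporting here would land the error in the
		// metricbeat-* index instead of .monitoring-*.
		if err := eventsMappingXPack(r, m, content); err != nil {
			m.Log.Error(err)
		}
	} else {
		// Non-X-Pack path: report the error so it ships with the metricset
		// output, and log it as well for easier debugging.
		if err := eventsMapping(r, content); err != nil {
			r.Error(err)
			m.Log.Error(err)
		}
	}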
@@ -20,6 +20,11 @@ package index_recovery
import (
	"encoding/json"

	"github.com/elastic/beats/metricbeat/helper/elastic"
This import should be grouped with the other github.com/elastic imports.
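For reference, the grouping being asked for looks roughly like this (standard library first, then one block for the github.com/elastic imports; the second elastic import is only a plausible example of what else the file imports):

	import (
		"encoding/json"

		"github.com/elastic/beats/metricbeat/helper/elastic"
		"github.com/elastic/beats/metricbeat/mb"
	)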
	err = eventsMapping(r, *info, content)
	if err != nil {
		r.Error(err)
		m.Log.Error(err)
This seems like a common pattern. I wonder if it would make sense to have the Metricbeat framework log any error that is reported by a metricset.
It would definitely be nice to make this more generic. Best in a follow-up PR.
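One possible shape for that follow-up, purely as a sketch (the loggingReporter type is hypothetical and not part of this PR): the framework could wrap the real reporter so every reported error is logged centrally and metricsets can drop the separate m.Log.Error(err) calls.

	// loggingReporter is a hypothetical wrapper the framework could install
	// around the real mb.ReporterV2 passed to Fetch.
	type loggingReporter struct {
		mb.ReporterV2
		log *logp.Logger
	}

	// Error logs the error centrally, then forwards it to the wrapped reporter.
	func (r loggingReporter) Error(err error) bool {
		r.log.Error(err)
		return r.ReporterV2.Error(err)
	}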
		return
	}

	if m.MetricSet.XPack {
		eventMappingXPack(r, m, content)
		err = eventMappingXPack(r, m, content)
This error is never reported. Is that intentional?
Yes, xpack data structure cannot make sense of the errors.
@@ -32,6 +32,9 @@ import (
// Global clusterIdCache. Assumption is that the same node id never can belong to a different cluster id
var clusterIDCache = map[string]string{}

// ModuleName is the ame of this module
s/ame/name/
And godocs should have proper punctuation. Wouldn't it be nicer if the descriptions here ended in punctuation? 😄
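i.e. something along these lines (the variable's value is not shown in the diff, so it is assumed here):

	// ModuleName is the name of this module.
	var ModuleName = "elasticsearch"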
@@ -58,16 +59,14 @@ func eventsMapping(r mb.ReporterV2, info elasticsearch.Info, content []byte) err

	err := json.Unmarshal(content, &indicesStruct)
	if err != nil {
		r.Error(err)
		return err
		return errors.Wrap(err, "failure parsing Elasticsearch Stats API response")
	}

	var errors multierror.Errors
	for name, index := range indicesStruct.Indices {
		event := mb.Event{}
		event.MetricSetFields, err = schema.Apply(index)
The event can also contain an error. I think it would make sense to store/report the error alongside the data that it's associated with.
I think this is now solved with the more recent commits?
Good stuff, @andrewkroh. Appreciate the early feedback so I don't have to make the same fixes in a number of places later 😄!
CI is failing because of an unrelated flaky test: #8208
@ruflin @andrewkroh The CI intake job is failing with this error: …
I checked the imports in …
Can you try to run …?
…#8497) This PR makes the Metricbeat `logstash` module code consistent with other Elastic stack modules by:
* using `FetchContent()` instead of `FetchJSON()` to fetch data from LS APIs before passing it to the event mapping function.
* adding unit tests.
Further consistency around event reporting and error handling will be achieved in #8308.
@ruflin I did get a diff, but it was in a file completely unrelated to this PR or the error shown by the intake job in CI. Still, I tried to commit that diff in da30d94 but it didn't help either.
Which go version do you have locally?
1.11 😬 Let me try downgrading to 1.10.
Yes, 1.10 will do different indentations, I learned today.
Glad to see much more user-friendly error messages. Very nice to have everything on the v2 reporter now. Thanks for doing all this cleanup. Left some minor comments.
The discussion around error reporting is more general for modules / Metricbeat, not only the stack modules. We can also delay it.
	for name, index := range indicesStruct.Indices {
		event := mb.Event{}
		event.MetricSetFields, err = schema.Apply(index)
		if err != nil {
			r.Error(err)
			errors = append(errors, err)
			errs = append(errs, errors.Wrap(err, "failure applying index schema"))
I think we should keep the old behaviour where we have 1 error per index. +1 on what Andrew suggested above.
+1 to reporting 1 error per event, but only in the non-xpack code paths (since reporting errors into xpack indices doesn't make sense, at least today). I will fix all non-xpack code paths in this PR to go back to reporting 1 error per event.
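A sketch of what that per-event reporting could look like for the loop in the diff above (where the index name ends up in the event is illustrative, not taken from the PR):

	for name, index := range indicesStruct.Indices {
		event := mb.Event{}
		event.MetricSetFields, err = schema.Apply(index)
		if err != nil {
			// one error per event: attach it to the event it belongs to
			// instead of collecting everything into a single multierror
			event.Error = errors.Wrap(err, "failure applying index schema")
		}
		if event.MetricSetFields != nil {
			event.MetricSetFields["name"] = name
		}
		r.Event(event)
	}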
	eventsMappingXPack(r, m, *info, content)
	err = eventsMappingXPack(r, m, *info, content)
	if err != nil {
		m.Log.Error(err)
This should make debugging easier. I wonder if we should really do it at the Error level, or only at Info or Debug. The reason is that, I assume, when Elasticsearch is down an Error is also logged, but I would argue it's not an error in the sense of the Error log.
I definitely don't think Info, because then it would get logged all the time? I can see the argument both ways for Error vs. Debug. I'd rather keep it at Error because we only log it if something has gone wrong. Whereas Debug logging tends to happen regardless, just to provide additional detail.
Why would info get logged all the time?
I think I might've misunderstood. When you suggested using Info, I thought you meant something like this:
	err = eventsMappingXPack(r, m, *info, content)
	m.Log.Info(err)
Notice, no if check.
But I guess we could still log at Info level while keeping the if check. Still, it feels a bit odd to me that we'd log errors at Info level. err could contain "fake" errors like ES being down, but err could also contain a real error like a parsing error that happened inside eventsMappingXPack.
@@ -38,23 +41,23 @@ func eventsMappingXPack(r mb.ReporterV2, m *MetricSet, content []byte) error {
	for indexName, indexData := range data {
		indexData, ok := indexData.(map[string]interface{})
		if !ok {
			continue
			return fmt.Errorf("%v is not a map", indexName)
This is a change in behaviour. Should we really skip all metrics if one is wrong? Same for the things below.
It is probably unlikely that one is wrong and the others are not?
My thinking with the new approach is:
- It's unlikely that one is wrong and others are not, and also
- In the case when some data might be erroneous, is it better to make a best effort or is it better to fail fast? I decided to go with the latter because I think it makes it easier to spot an issue.
The other way of looking at it: what if the data inside one index object is weird and all the others are ok? It would mean we don't ship all the valid index data. Perhaps the solution is somewhere in between: we send just 1 error doc but still send the valid ones (I think you did that further down).
Let's move forward with the way it is at the moment and fix it if it breaks.
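For the record, the in-between option mentioned above could look roughly like this (sketch only, not what the PR does at the moment): keep processing the remaining indices, remember the failures, and return them as one combined error at the end. It assumes the multierror package (github.com/joeshaw/multierror) already used elsewhere in the diff.

	var errs multierror.Errors
	for indexName, indexData := range data {
		indexData, ok := indexData.(map[string]interface{})
		if !ok {
			// remember the problem but keep shipping the valid indices
			errs = append(errs, fmt.Errorf("%v is not a map", indexName))
			continue
		}
		// ... build and report the event for this index as before ...
		_ = indexData
	}
	return errs.Err()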
	}

	nodeInfo, err := elasticsearch.GetNodeInfo(m.HTTP, m.HostData().SanitizedURI+statsPath, "")
Was this code obsolete, or is that the one that will be fixed in another PR?
It's obsolete because the monitoring UI does not need source_node for indices_stats documents.
@@ -58,11 +60,12 @@ func eventsMapping(r mb.ReporterV2, content []byte) error {

	event.MetricSetFields, err = schema.Apply(job)
	if err != nil {
		errs = append(errs, err)
		errs = append(errs, errors.Wrap(err, "failure applying ml job schema"))
This is a change in behaviour, as previously an incomplete event was still sent in case of an error. We should discuss what our general expectation in modules is here.
I think this is the same as https://github.com/elastic/beats/pull/8308/files#r222085637.
@@ -120,7 +121,8 @@ func eventsMapping(r mb.ReporterV2, content []byte) error {

	event.MetricSetFields, err = schema.Apply(node)
	if err != nil {
		r.Error(err)
		errs = append(errs, errors.Wrap(err, "failure to apply node schema"))
Change in behaviour. Instead of 5 errors now only 1 overall error is sent. We should discuss what our expectation here is.
	}

	dataFields, err := schema.Apply(data)
	if err != nil {
		r.Error(err)
		return errors.Wrap(err, "failure to apply stats schema")
Ties into the discussion above on what our expectation on errors is.
Yep, same as https://github.com/elastic/beats/pull/8308/files#r222085637. For non-xpack code paths (like this one here) I will update the PR to report one error per event.
)

func TestFetch(t *testing.T) {
I assume this is removed because of the reporter change, but I was not sure where this is covered now?
Yes, I removed this because Fetch no longer returns an event. And now this file is consistent with other Elastic stack integration test files; they also only have a TestData function that ultimately exercises Fetch via the mbtest.WriteEventsReporterV2 call in the test code.
TestData is not run by default in our tests. What I did for the Elasticsearch module you can find here: https://github.com/elastic/beats/blob/master/metricbeat/module/elasticsearch/elasticsearch_integration_test.go It basically imports all the metricsets with a fetcher and then runs the tests automatically. We should do this here too for all the ones where TestFetch was removed.
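Something along the lines of that pattern, sketched for a single metricset (the metricset name, config helper, and compose service name here are assumptions, not taken from the PR):

	// +build integration

	package index_recovery_test

	import (
		"testing"

		"github.com/elastic/beats/libbeat/tests/compose"
		mbtest "github.com/elastic/beats/metricbeat/mb/testing"
	)

	func TestFetch(t *testing.T) {
		// make sure an Elasticsearch container is running before fetching
		compose.EnsureUp(t, "elasticsearch")

		ms := mbtest.NewReportingMetricSetV2(t, getConfig())
		events, errs := mbtest.ReportingFetchV2(ms)
		if len(errs) > 0 {
			t.Fatalf("fetch returned errors: %v", errs)
		}
		if len(events) == 0 {
			t.Fatal("expected at least one event")
		}
	}

	func getConfig() map[string]interface{} {
		return map[string]interface{}{
			"module":     "elasticsearch",
			"metricsets": []string{"index_recovery"},
			"hosts":      []string{"localhost:9200"},
		}
	}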
@ruflin @andrewkroh I've attempted to address your review feedback around error reporting with events by making changes just to the … Specifically, I've tried to establish the following pattern in this sample commit: …
Normally I would ask you to split up this PR into multiple PRs. I think there are lots of parts that need no discussion and could be directly merged, like the additional log messages you added or introducing the logger. Like this, a big part of the changes could already be merged and the discussion would only be holding back a small part of the PR. At the same time, I see it would be quite a bit of work to extract the non-controversial parts now.
…#8497) (#8525) This PR makes the Metricbeat `logstash` module code consistent with other Elastic stack modules by:
* using `FetchContent()` instead of `FetchJSON()` to fetch data from LS APIs before passing it to the event mapping function.
* adding unit tests.
Further consistency around event reporting and error handling will be achieved in #8308. (cherry picked from commit 909d789)
@ruflin Initially this PR started out small, but then it grew because at the time of development I wasn't sure how far these changes would cascade. But now that all metricsets are done and we've identified the parts we like (e.g. additional log messages + logger), I'm happy to extract those parts into a separate PR to keep things moving. That will also make the controversial parts easier to discuss. So I'm going to keep this PR open for the controversial parts, since there is discussion history building up here. But I will put up separate PR(s) that will only handle better log messages and the logger. Let's get those "easy" PR(s) merged first, then come back to this PR (after rebasing it, etc.) for further discussion?
Per my previous comment, I've teased out just the error message consistency improvements into their own PR: #8551. |
I like the error logging / reporting model we have now in the Elasticsearch and Kibana Module. We should probably make that our standard when changing other modules.
	event.MetricSetFields, err = schema.Apply(shard)
	if err != nil {
		errors = append(errors, err)
		event.Error = errors.Wrap(err, "failure applying shard schema")
I like that we report partial events even on failure. We should probably document this somewhere.
…de (#8755) Cherry-pick of PR #8308 to 6.x branch. Original message: This PR makes error messages, reporting, and logging consistent across all metricsets for Elastic stack products, including the xpack monitoring code paths:
* [x] `elasticsearch/ccr`
* [x] `elasticsearch/cluster_stats`
* [x] `elasticsearch/index`
* [x] `elasticsearch/index_recovery`
* [x] `elasticsearch/index_summary`
* [x] `elasticsearch/ml_job`
* [x] `elasticsearch/node`
* [x] `elasticsearch/node_stats`
* [x] `elasticsearch/pending_tasks`
* [x] `elasticsearch/shard`
* [x] `kibana/status`
* [x] `kibana/stats`
* [x] `logstash/node`
* [x] `logstash/node_stats`
Cherry-pick of PR #9642 to 6.x branch. Original message: The cleanup in #8308 introduced a regression which caused the `xpack.enabled` flag in the `kibana` module to stop having any effect. This PR fixes that issue by reverting the troublesome bits from #8308, ensuring that the struct field for the parsed configuration setting is exported.
This PR makes error messages, reporting, and logging consistent across all metricsets for Elastic stack products, including the xpack monitoring code paths:
elasticsearch/ccr
elasticsearch/cluster_stats
elasticsearch/index
elasticsearch/index_recovery
elasticsearch/index_summary
elasticsearch/ml_job
elasticsearch/node
elasticsearch/node_stats
elasticsearch/pending_tasks
elasticsearch/shard
kibana/status
kibana/stats
logstash/node
logstash/node_stats