-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Fleet] Task to publish Agent metrics #168435
Conversation
🤖 GitHub commentsExpand to view the GitHub comments
Just comment with:
|
x-pack/plugins/fleet/server/services/metrics/fleet_metrics_task.ts
Outdated
Show resolved
Hide resolved
x-pack/plugins/fleet/server/services/metrics/fetch_agent_metrics.ts
Outdated
Show resolved
Hide resolved
x-pack/plugins/fleet/server/services/metrics/fetch_agent_metrics.ts
Outdated
Show resolved
Hide resolved
x-pack/plugins/fleet/server/services/metrics/fetch_agent_metrics.ts
Outdated
Show resolved
Hide resolved
Some FTR errors happening due to
Raised a pr to add delete privilege (and read if we want to add tests to read back the metrics): elastic/elasticsearch#100684 EDIT: even after adding the delete privilege it doesn't seem to work, will have to debug why It seems I left out |
Pinging @elastic/fleet (Team:Fleet) |
This looks good to me for now. Thanks for wiring that up! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me - thanks for addressing previous review comments 🚀
})); | ||
} catch (error) { | ||
if (error.statusCode === 404) { | ||
appContextService.getLogger().debug('Index .fleet-agents does not exist yet.'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this worth logging at another level so we can see it in serverless dashboards, etc? Not sure how common this is or if it's helpful in production debugging to know when we're swallowing these errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to avoid logging too much, as this task runs every minute. I think we can leave as debug for now and change later if needed. Probably we would realize anyway if there are no agents.
@juliaElastic I think even if we allow to install fleet_server we should probably not to create an agent policy with fleet server inside in serverless, edit: actually not sure it will really cause a problem as it's not possible to add fleet server hosts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Config changes LGTM
@elastic/response-ops Hey team, sorry for the direct ping, could you review this pr? |
// disable fleet task that writes to metrics.fleet_server.* data streams, impacting functional tests | ||
`--xpack.task_manager.unsafe.exclude_task_types=${JSON.stringify(['Fleet-Metrics-Task'])}`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please explain the impact on functional tests? What happens without this setting?
Asking because we don't want to add this kind of configuration to serverless tests outside the feature flag testing. The reason for that is, that when we create a real serverless project in MKI, this setting would not be applied but the tests would still need to pass there (the config files are only controlling the local setup).
So if serverless tests are failing without this setting, then they would most probably still fail when run as part of our release gates on an MKI project and thus we'd need a different solution here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that the same test is still failing even when disabling this task, so it's not the root cause of the issue, I can revert it.
It seems that my changes are not related to the test failing, should I skip it to pass the build? It already has an open issue: #166592
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted the config change and skipped the failing test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a different failure this time than reported in #166592.
Also, I don't see this test failing on the main branch recently, so there's a good chance the failure is really related this the changes in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found this label though saying it fails in MKI:
kibana/x-pack/test_serverless/functional/test_suites/observability/cases/attachment_framework.ts
Lines 23 to 25 in 7a6826b
describe('Cases persistable attachments', function () { | |
// security_exception: action [indices:data/write/delete/byquery] is unauthorized for user [elastic] with effective roles [superuser] on restricted indices [.kibana_alerting_cases], this action is granted by the index privileges [delete,write,all] | |
this.tags(['failsOnMKI']); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also found the same (?) test skipped under security
:
kibana/x-pack/test_serverless/functional/test_suites/security/ftr/cases/attachment_framework.ts
Lines 21 to 23 in 6d88fb5
// Failing | |
// Issue: https://github.com/elastic/kibana/issues/165135 | |
describe.skip('Cases persistable attachments', () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is skipped for different reasons. I am not in favor of skipping tests outside of the standard process but given that I am working on them I am okay with this being skipped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, in that case we can move on here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test config changes LGTM
💛 Build succeeded, but was flaky
Failed CI StepsMetrics [docs]
History
To update your PR or re-run it, just comment with: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed skipping the test is fine! Let's wait for the @elastic/response-ops-execution to take a look on the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New task LGTM from @elastic/response-ops-execution's perspective 👍
Summary
Closes https://github.com/elastic/ingest-dev/issues/2396
Added a new kibana task that publishes Agent metrics every minute to data streams installed by fleet_server package.
Opened the pr for review, there are a few things to finalize, but the core logic won't change much.
To test locally:
metrics-*
index pattern, filter ondata_stream.dataset: fleet_server.*
fleet_server.agent_status
andfleet_server.agent_versions
datasets.Checklist