[Fleet] Task to publish Agent metrics #168435

juliaElastic · 2023-10-10T08:49:55Z

Summary

Closes https://github.com/elastic/ingest-dev/issues/2396

Added a new Kibana task that publishes Agent metrics every minute to the data streams installed by the fleet_server package.
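As a rough illustration of what such a task might compute on each run, here is a minimal sketch that counts agents per status and per version and wraps the counts in documents addressed to the two datasets named in this PR. The `AgentSummary` shape, the `counts` field, and the `buildMetricDocs` helper are hypothetical and are not taken from the PR's code or the package mappings; scheduling through the Kibana task manager is left out.

```typescript
// Hypothetical input shape: one entry per enrolled agent.
interface AgentSummary {
  status: string; // e.g. 'online', 'offline', 'error'
  version: string; // e.g. '8.11.0'
}

// Hypothetical document shape; the real mappings come from the fleet_server package.
interface MetricDoc {
  '@timestamp': string;
  data_stream: { type: 'metrics'; dataset: string; namespace: string };
  counts: Record<string, number>;
}

// Count how many items share each key.
function countBy<T>(items: T[], key: (item: T) => string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const item of items) {
    const k = key(item);
    counts[k] = (counts[k] ?? 0) + 1;
  }
  return counts;
}

// Build one document per dataset for a single task run.
function buildMetricDocs(agents: AgentSummary[], now: Date, namespace = 'default'): MetricDoc[] {
  const timestamp = now.toISOString();
  return [
    {
      '@timestamp': timestamp,
      data_stream: { type: 'metrics', dataset: 'fleet_server.agent_status', namespace },
      counts: countBy(agents, (a) => a.status),
    },
    {
      '@timestamp': timestamp,
      data_stream: { type: 'metrics', dataset: 'fleet_server.agent_versions', namespace },
      counts: countBy(agents, (a) => a.version),
    },
  ];
}
```

Keeping the aggregation a pure function of the agent list and the current time makes it easy to unit test independently of the scheduler.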

Opened the PR for review. A few things remain to be finalized, but the core logic won't change much.

To test locally:

1. Install fleet_server package 1.4.0 from this PR to get the mappings.
2. Start Kibana locally and wait a few minutes for the metrics task to run (it runs every minute).
3. Go to Discover, select the metrics-* index pattern, and filter on data_stream.dataset: fleet_server.*
4. Expect data to be populated in the fleet_server.agent_status and fleet_server.agent_versions datasets.
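The Discover check above could also be approximated with a search request. The sketch below only builds the request body; `buildFleetMetricsSearch` is a hypothetical helper, and using a `wildcard` query on `data_stream.dataset` is an assumption about how to mirror the Discover filter, not something taken from this PR.

```typescript
// Hypothetical helper: builds a search request for the newest Fleet Server metric documents.
interface FleetMetricsSearch {
  index: string;
  size: number;
  sort: Array<Record<string, { order: 'asc' | 'desc' }>>;
  query: { wildcard: Record<string, { value: string }> };
}

function buildFleetMetricsSearch(size = 10): FleetMetricsSearch {
  return {
    index: 'metrics-*',
    size,
    // Newest documents first, so a fresh task run shows up at the top.
    sort: [{ '@timestamp': { order: 'desc' } }],
    // Matches fleet_server.agent_status, fleet_server.agent_versions, and so on.
    query: { wildcard: { 'data_stream.dataset': { value: 'fleet_server.*' } } },
  };
}
```

The returned object can be passed to whichever Elasticsearch client is at hand; an empty result a few minutes after startup would suggest the task has not run or the package mappings are missing.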

Checklist

Unit or functional tests were updated or added to match the most common scenarios

apmmachine · 2023-10-10T08:50:11Z

🤖 GitHub comments


Just comment with:

/oblt-deploy : Deploy a Kibana instance using the Observability test environments.
/oblt-deploy-serverless : Deploy a serverless Kibana instance using the Observability test environments.
run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)


juliaElastic · 2023-10-11T12:45:05Z

Some FTR errors are happening because kibana_system does not have access to delete the new fleet_server data streams. I might have to add the delete privilege as well.


└- ✖ fail: Dev Tools Search Profiler Editor No indices "before all" hook for "returns error if profile is executed with no valid indices"
--
  | │      ResponseError: illegal_argument_exception
  | │ 	Root causes:
  | │ 		illegal_argument_exception: index [.ds-metrics-fleet_server.agent_status-default-2023.10.11-000001] is the write index for data stream [metrics-fleet_server.agent_status-default] and cannot be deleted
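The error above names both a backing index and the data stream it belongs to. Backing indices follow the `.ds-<data-stream>-<yyyy.MM.dd>-<generation>` naming convention, so the stream name can be recovered from the index name. The sketch below shows one way cleanup code could recognize such indices; `parseBackingIndex` is a hypothetical helper and is not part of FTR or this PR.

```typescript
interface BackingIndexParts {
  dataStream: string;
  date: string;
  generation: number;
}

// Split a data stream backing index name into its parts.
// Returns undefined when the name does not follow the .ds-<stream>-<date>-<generation> pattern.
function parseBackingIndex(indexName: string): BackingIndexParts | undefined {
  const match = /^\.ds-(.+)-(\d{4}\.\d{2}\.\d{2})-(\d{6})$/.exec(indexName);
  if (match === null) {
    return undefined;
  }
  return { dataStream: match[1], date: match[2], generation: Number(match[3]) };
}

const parts = parseBackingIndex('.ds-metrics-fleet_server.agent_status-default-2023.10.11-000001');
console.log(parts?.dataStream); // metrics-fleet_server.agent_status-default
```

Elasticsearch refuses to delete the write index of a data stream directly, which is why the failure surfaces as illegal_argument_exception rather than a plain authorization error.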

Raised a PR to add the delete privilege (and read, in case we want to add tests that read back the metrics): elastic/elasticsearch#100684

EDIT: even after adding the delete privilege it doesn't seem to work; I will have to debug why.

It seems I left out the delete_index privilege, which is needed because the FTR tests try to delete the index. Opened a PR to fix it and grant the all privilege: elastic/elasticsearch#100764

elasticmachine · 2023-10-11T14:04:27Z

Pinging @elastic/fleet (Team:Fleet)

kpollich · 2023-10-12T13:10:20Z

Let me know if this is good enough for now until we have more general support for "asset only" packages.

This looks good to me for now. Thanks for wiring that up!

kpollich

Looks great to me - thanks for addressing previous review comments 🚀

kpollich · 2023-10-12T13:13:03Z

x-pack/plugins/fleet/server/services/metrics/fetch_agent_metrics.ts

+    }));
+  } catch (error) {
+    if (error.statusCode === 404) {
+      appContextService.getLogger().debug('Index .fleet-agents does not exist yet.');


Is this worth logging at another level so we can see it in serverless dashboards, etc? Not sure how common this is or if it's helpful in production debugging to know when we're swallowing these errors.

I want to avoid logging too much, as this task runs every minute. I think we can leave it at debug for now and change it later if needed. We would probably notice anyway if there are no agents.
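The pattern under discussion, treating a 404 as "no data yet" while letting every other error propagate, can be sketched in isolation as follows. The `Logger` interface and the `fetchIgnoringMissingIndex` wrapper are hypothetical stand-ins and do not reproduce the code in fetch_agent_metrics.ts; only the debug message is taken from the diff above.

```typescript
// Minimal logger shape, standing in for the real Kibana logger.
interface Logger {
  debug(message: string): void;
}

// True when the thrown value carries an HTTP 404 status code.
function isNotFound(error: unknown): boolean {
  return (
    typeof error === 'object' &&
    error !== null &&
    (error as { statusCode?: unknown }).statusCode === 404
  );
}

// Run `fetch`; a missing index yields undefined, any other failure is rethrown.
async function fetchIgnoringMissingIndex<T>(
  fetch: () => Promise<T>,
  logger: Logger
): Promise<T | undefined> {
  try {
    return await fetch();
  } catch (error) {
    if (isNotFound(error)) {
      // Logged at debug because the task runs every minute.
      logger.debug('Index .fleet-agents does not exist yet.');
      return undefined;
    }
    throw error;
  }
}
```

Rethrowing everything except the 404 keeps genuine failures visible even though the expected "index not created yet" case stays quiet.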

nchaulet · 2023-10-12T14:19:12Z

@juliaElastic I think that even if we allow installing fleet_server, we should probably not create an agent policy containing Fleet Server in serverless. Edit: actually, I'm not sure it would really cause a problem, as it's not possible to add Fleet Server hosts.

nchaulet

LGTM 🚀

jloleysens

Config changes LGTM

juliaElastic · 2023-10-17T08:07:17Z

@elastic/response-ops Hey team, sorry for the direct ping, could you review this pr?

pheyos · 2023-10-18T09:07:41Z

x-pack/test_serverless/shared/config.base.ts

+        // disable fleet task that writes to metrics.fleet_server.* data streams, impacting functional tests
+        `--xpack.task_manager.unsafe.exclude_task_types=${JSON.stringify(['Fleet-Metrics-Task'])}`,


Can you please explain the impact on functional tests? What happens without this setting?

I'm asking because we don't want to add this kind of configuration to serverless tests outside of feature flag testing. When we create a real serverless project in MKI, this setting is not applied, but the tests still need to pass there (the config files only control the local setup).
So if serverless tests fail without this setting, they would most likely also fail when run as part of our release gates on an MKI project, and we'd need a different solution here.
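For reference, the template literal in the diff above expands to a single command-line flag whose value is a JSON array. A minimal sketch, where `excludeTaskTypesFlag` is a hypothetical helper name:

```typescript
// Build the Kibana CLI flag that excludes the given task types from the task manager.
function excludeTaskTypesFlag(taskTypes: string[]): string {
  return `--xpack.task_manager.unsafe.exclude_task_types=${JSON.stringify(taskTypes)}`;
}

console.log(excludeTaskTypesFlag(['Fleet-Metrics-Task']));
// --xpack.task_manager.unsafe.exclude_task_types=["Fleet-Metrics-Task"]
```

Because the flag is passed only to the locally started Kibana server, it has no effect on a project created in MKI, which is the concern raised in this thread.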

It seems that the same test is still failing even when disabling this task, so it's not the root cause of the issue, I can revert it.

It seems that my changes are not related to the test failing, should I skip it to pass the build? It already has an open issue: #166592

reverted the config change and skipped the failing test

It's a different failure this time than reported in #166592.
Also, I don't see this test failing on the main branch recently, so there's a good chance the failure is really related to the changes in this PR.

I found this label though saying it fails in MKI:

kibana/x-pack/test_serverless/functional/test_suites/observability/cases/attachment_framework.ts

Lines 23 to 25 in 7a6826b

describe('Cases persistable attachments', function () {

// security_exception: action [indices:data/write/delete/byquery] is unauthorized for user [elastic] with effective roles [superuser] on restricted indices [.kibana_alerting_cases], this action is granted by the index privileges [delete,write,all]

this.tags(['failsOnMKI']);

@pheyos The failures on this PR seem unrelated to the issue. FWIW, I am working on this PR #168924 to fix some issues around the Cases Serverless tests.

Also found the same (?) test skipped under security:

kibana/x-pack/test_serverless/functional/test_suites/security/ftr/cases/attachment_framework.ts

Lines 21 to 23 in 6d88fb5

// Failing

// Issue: https://github.com/elastic/kibana/issues/165135

describe.skip('Cases persistable attachments', () => {

Yes, this is skipped for different reasons. I am not in favor of skipping tests outside of the standard process but given that I am working on them I am okay with this being skipped.

Alright, in that case we can move on here.

pheyos

Test config changes LGTM

kibana-ci · 2023-10-18T10:57:30Z

💛 Build succeeded, but was flaky

Buildkite Build
Commit: b0add0c

Failed CI Steps

Defend Workflows Cypress Tests on Serverless #2

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @juliaElastic

cnasikas

As we discussed, skipping the test is fine! Let's wait for @elastic/response-ops-execution to take a look at the PR.

mikecote

New task LGTM from @elastic/response-ops-execution's perspective 👍

fleet metrics task

aa8f2a6

juliaElastic added the release_note:feature label Oct 10, 2023

juliaElastic self-assigned this Oct 10, 2023

juliaElastic mentioned this pull request Oct 10, 2023

[Fleet] added privileges to write metrics-fleet_server* elastic/elasticsearch#100574

Merged

added upgrading_step aggregation

0627b36

juliaElastic commented Oct 10, 2023

x-pack/plugins/fleet/server/services/metrics/fleet_metrics_task.ts (outdated, resolved)

kibanamachine and others added 4 commits October 10, 2023 13:22

[CI] Auto-commit changed files from 'node scripts/lint_ts_projects --fix'

bf746e9

fixed test

c302b2f

Merge branch 'main' into fleet-metrics

82f89b9

calculating unhealthy_reason

6ac0546

juliaElastic commented Oct 10, 2023

x-pack/plugins/fleet/server/services/metrics/fetch_agent_metrics.ts (outdated, resolved)

juliaElastic commented Oct 10, 2023

x-pack/plugins/fleet/server/services/metrics/fetch_agent_metrics.ts (outdated, resolved)

juliaElastic commented Oct 10, 2023

x-pack/plugins/fleet/server/services/metrics/fetch_agent_metrics.ts (outdated, resolved)

juliaElastic added 2 commits October 11, 2023 10:11

enabled write to es, populating generic fields

a6345e9

fix import

0389572

juliaElastic mentioned this pull request Oct 11, 2023

added metrics data streams to fleet_server elastic/integrations#8145

Merged

4 tasks

juliaElastic and others added 2 commits October 11, 2023 11:07

fixed mock

3fe4073

Merge branch 'main' into fleet-metrics

7e3ce02

juliaElastic commented Oct 11, 2023

x-pack/plugins/fleet/server/services/metrics/fleet_metrics_task.ts (resolved)

added unit tests

df49e79

juliaElastic marked this pull request as ready for review October 11, 2023 12:41

juliaElastic requested review from a team as code owners October 11, 2023 12:41

kpollich self-requested a review October 11, 2023 12:43

juliaElastic mentioned this pull request Oct 11, 2023

[Fleet] added read and delete privilege elastic/elasticsearch#100684

Merged

removed unhealthy_reason for now

29e8671

botelastic bot added the Team:Fleet label Oct 11, 2023

kpollich approved these changes Oct 12, 2023


updated task version to 1.0.0

b1c3240

nchaulet approved these changes Oct 12, 2023


Merge branch 'main' into fleet-metrics

c4ba9d2

jloleysens approved these changes Oct 16, 2023


Merge branch 'main' into fleet-metrics

93a6e7e

juliaElastic and others added 5 commits October 17, 2023 10:15

fixed test

eeae806

Merge branch 'main' into fleet-metrics

e866a71

adding fleet_server package in oblt and security projects only

b038d5d

disable fleet task in functional tests

f1de97b

disable fleet task

8d439cc

juliaElastic requested a review from a team as a code owner October 17, 2023 12:41

Merge branch 'main' into fleet-metrics

2af5949

juliaElastic requested a review from kobelb October 18, 2023 08:53

pheyos reviewed Oct 18, 2023


revert serverless disable task, skip failing test

b0add0c

juliaElastic requested a review from pheyos October 18, 2023 09:23

pheyos approved these changes Oct 18, 2023


cnasikas approved these changes Oct 18, 2023


cnasikas self-requested a review October 18, 2023 11:15

mikecote approved these changes Oct 18, 2023


juliaElastic merged commit 0350f17 into elastic:main Oct 18, 2023

kibanamachine added the v8.12.0 and backport:skip labels Oct 18, 2023
