[8.6] Fleet Usage telemetry extension (#145353) #146105

kibanamachine · 2022-11-23T09:27:46Z

Backport

This will backport the following commits from main to 8.6:

Fleet Usage telemetry extension (#145353)

Questions ?

Please refer to the Backport tool documentation

@jlind23

## Summary

Closes elastic/ingest-dev#1261

Added a snippet to the telemetry that I added for each requirement. Please review and let me know if any changes are needed. Also asked a few questions below. @jlind23 @kpollich

Item 6 is blocked by an [elasticsearch change](elastic/elasticsearch#91701) to give kibana_system the missing privilege to read logs-elastic_agent* indices.

Took inspiration for task versioning from https://github.com/elastic/kibana/pull/144494/files#diff-0c7c49bf5c55c45c19e9c42d5428e99e52c3a39dd6703633f427724d36108186

- [x] 1. Elastic Agent versions

Versions of all the Elastic Agents running: `agent.version` field on `.fleet-agents` documents

```
"agent_versions": [
  "8.6.0"
],
```

- [x] 2. Fleet server configuration

Think we can query for `.fleet-policies` where some `input` has `type: 'fleet-server'` for this, as well as use the `Fleet Server Hosts` settings that we define via saved objects in Fleet

```
"fleet_server_config": {
  "policies": [
    {
      "input_config": {
        "server": {
          "limits.max_agents": 10000
        },
        "server.runtime": "gc_percent:20"
      }
    }
  ]
}
```

- [x] 3. Number of policies

Count of `.fleet-policies` index

To confirm, did we mean agent policies here?

```
"agent_policies": {
  "count": 7,
```

- [x] 4. Output type contained in those policies

Collecting this from ts logic, querying from `.fleet-policies` index. The alternative would be to write a painless script (because the `outputs` are an object with dynamic keys, we can't do an aggregation directly).

```
"agent_policies": {
  "output_types": [
    "elasticsearch"
  ]
}
```

Did we mean to just collect the types here, or any other info? e.g. output urls

- [x] 5. Average number of checkin failures

We only have the most recent checkin status and timestamp on `.fleet-agents`.

Do we mean here to publish the total last checkin failure count? E.g. 3 if 3 agents are in failure checkin status currently.
Or do we mean to publish specific info for all agents (`last_checkin_status`, `last_checkin` time, `last_checkin_message`)?
Are `error` and `degraded` the only statuses that we want to send?

```
"agent_last_checkin_status": {
  "error": 0,
  "degraded": 0
},
```

- [ ] 6. Top 3 most common errors in the Elastic Agent logs

Do we mean here elastic-agent logs only, or fleet-server logs as well (maybe separately)?

I found an alternative way to query the message field using sampler and categorize text aggregation:

```
GET logs-elastic_agent*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "log.level": "error"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "now-1h"
            }
          }
        }
      ]
    }
  },
  "aggregations": {
    "message_sample": {
      "sampler": {
        "shard_size": 200
      },
      "aggs": {
        "categories": {
          "categorize_text": {
            "field": "message",
            "size": 10
          }
        }
      }
    }
  }
}
```

Example response:

```
"aggregations": {
  "message_sample": {
    "doc_count": 112,
    "categories": {
      "buckets": [
        {
          "doc_count": 73,
          "key": "failed to unenroll offline agents",
          "regex": ".*?failed.+?to.+?unenroll.+?offline.+?agents.*?",
          "max_matching_length": 36
        },
        {
          "doc_count": 7,
          "key": """stderr panic close of closed channel n ngoroutine running Stop ngithub.com/elastic/beats/v7/libbeat/cmd/instance Beat launch.func5 \n\t/go/src/github.com/elastic/beats/libbeat/cmd/instance/beat.go n
```

- [x] 7. Number of checkin failures over the past period of time

I think this is almost the same as item 5. The difference would be whether to report only new failures that happened in the last hour, or to report all agents in failure state (which would be an increasing number if the agent stays in failed state).
Do we want these 2 separate telemetry fields?

EDIT: removed the last1hr query, instead added a new field to report agents enrolled per policy (top 10). See comments below.

```
"agent_checkin_status": {
  "error": 3,
  "degraded": 0
},
"agents_per_policy": [2, 1000],
```

- [x] 8. Number of Elastic Agents and number of Fleet Servers

This is already there in the existing telemetry:

```
"agents": {
  "total_enrolled": 0,
  "healthy": 0,
  "unhealthy": 0,
  "offline": 0,
  "total_all_statuses": 1,
  "updating": 0
},
"fleet_server": {
  "total_enrolled": 0,
  "healthy": 0,
  "unhealthy": 0,
  "offline": 0,
  "updating": 0,
  "total_all_statuses": 0,
  "num_host_urls": 1
},
```

### Checklist

- [ ] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

Co-authored-by: Kibana Machine <[email protected]>
(cherry picked from commit e00e26e)
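For reviewers, here is a minimal sketch of how items 4 and 5 could be computed in TypeScript once the documents have been fetched. It is an illustration only, not the code in this PR: the `PolicyDoc` and `AgentDoc` interfaces and both function names are hypothetical, and fetching from `.fleet-policies` and `.fleet-agents` is left out.

```typescript
// Hypothetical document shapes: only the fields this sketch reads.
interface PolicyDoc {
  // `outputs` is keyed by output id, so its keys are dynamic.
  outputs?: Record<string, { type?: string }>;
}

interface AgentDoc {
  last_checkin_status?: string;
}

// Item 4: collect the distinct output types across policy documents.
// Walking the dynamic keys in code sidesteps the need for a painless script.
function collectOutputTypes(policies: PolicyDoc[]): string[] {
  const seen: Record<string, true> = {};
  for (const policy of policies) {
    const outputs = policy.outputs || {};
    for (const outputId of Object.keys(outputs)) {
      const output = outputs[outputId];
      const type = output ? output.type : undefined;
      if (type) {
        seen[type] = true;
      }
    }
  }
  // Sorted so the reported list is stable between runs.
  return Object.keys(seen).sort();
}

// Item 5: count agents whose most recent checkin is `error` or `degraded`.
// Any other status (or a missing one) is ignored.
function countLastCheckinStatus(agents: AgentDoc[]): { error: number; degraded: number } {
  const counts = { error: 0, degraded: 0 };
  for (const agent of agents) {
    if (agent.last_checkin_status === 'error') {
      counts.error++;
    } else if (agent.last_checkin_status === 'degraded') {
      counts.degraded++;
    }
  }
  return counts;
}
```

For example, `collectOutputTypes([{ outputs: { default: { type: 'elasticsearch' } } }])` returns `['elasticsearch']`, which matches the shape of the `output_types` snippet in item 4.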

elasticmachine · 2022-11-23T09:27:53Z

Pinging @elastic/fleet (Team:Fleet)

kibana-ci · 2022-11-23T10:31:27Z

💚 Build Succeeded

Buildkite Build
Commit: 6a97c71

Metrics [docs]

Unknown metric groups

ESLint disabled in files

| id | before | after | diff |
| --- | --- | --- | --- |
| `osquery` | 1 | 2 | +1 |

ESLint disabled line counts

| id | before | after | diff |
| --- | --- | --- | --- |
| `enterpriseSearch` | 19 | 21 | +2 |
| `fleet` | 59 | 65 | +6 |
| `osquery` | 108 | 113 | +5 |
| `securitySolution` | 442 | 448 | +6 |
| total | | | +19 |

Total ESLint disabled count

| id | before | after | diff |
| --- | --- | --- | --- |
| `enterpriseSearch` | 20 | 22 | +2 |
| `fleet` | 68 | 74 | +6 |
| `osquery` | 109 | 115 | +6 |
| `securitySolution` | 519 | 525 | +6 |
| total | | | +20 |

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @juliaElastic

kibanamachine assigned juliaElastic Nov 23, 2022

kibanamachine added the backport label Nov 23, 2022

kibanamachine enabled auto-merge (squash) November 23, 2022 09:27

kibanamachine mentioned this pull request Nov 23, 2022

Fleet Usage telemetry extension #145353

Merged

9 tasks

botelastic bot added the Team:Fleet label Nov 23, 2022

kibanamachine merged commit 7b99f4c into elastic:8.6 Nov 23, 2022
