
[Default Policy Specific]: Error in agent installed with Default policy having System and Endpoint Integration. #93910

Closed
amolnater-qasource opened this issue Mar 8, 2021 · 15 comments
Labels
bug Fixes for quality problems that affect the customer experience needs-team Issues missing a team label v7.13.0

Comments

@amolnater-qasource

amolnater-qasource commented Mar 8, 2021

Kibana version: 7.12.0 Snapshot Cloud environment

Host OS and Browser version: Windows 10, All

Build Details:

    Artifact link used: https://snapshots.elastic.co/7.12.0-07994102/downloads/beats/elastic-agent/elastic-agent-7.12.0-SNAPSHOT-windows-x86_64.zip
    Build: 39220
    Commit: 10e035085b6ab636765b76500803c9db19492c8b

Preconditions:

  1. 7.12.0 Snapshot Cloud environment should be available.
  2. RDP Test signing should be "ON".
  3. Agent must be installed with default policy having System(v 0.10.9) and Endpoint Security(v 0.18.0) Integration.

Actual Result:

  • Agent status is "Unhealthy".
  • No data under agent Logs.
  • No data under the Data Streams tab other than System metrics (no data for elastic_agent and endpoint).
  • Agent is not available under the Administration tab.
  • No filebeat folder is present in the C:\Program Files\Elastic\Agent\data\elastic-agent-7fa781\ install location.

Screenshots Kibana UI:
Logs and Data

Screenshots Installation Directory:
Installation folders

Logs:

{"log.level":"error","@timestamp":"2021-03-08T04:03:20.875-0500","log.origin":{"file.name":"application/fleet_gateway.go","file.line":185},"message":"failed to dispatch actions, error: operator: failed to execute step sc-run, error: operation 'Exec' failed: : operation 'Exec' failed: ","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-03-08T04:03:22.556-0500","log.origin":{"file.name":"stateresolver/stateresolver.go","file.line":48},"message":"New State ID is gbyT8Ylc","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-03-08T04:03:22.556-0500","log.origin":{"file.name":"stateresolver/stateresolver.go","file.line":49},"message":"Converging state requires execution of 4 step(s)","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-03-08T04:03:22.557-0500","log.origin":{"file.name":"operation/operation_fetch.go","file.line":61},"message":"metricbeat.7.12.0-SNAPSHOT already exists in C:\\Program Files\\Elastic\\Agent\\data\\elastic-agent-7fa781\\downloads\\metricbeat-7.12.0-SNAPSHOT-windows-x86_64.zip. Skipping operation operation-fetch","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-03-08T04:03:22.575-0500","log.origin":{"file.name":"operation/operation_fetch.go","file.line":61},"message":"metricbeat.7.12.0-SNAPSHOT already exists in C:\\Program Files\\Elastic\\Agent\\data\\elastic-agent-7fa781\\downloads\\metricbeat-7.12.0-SNAPSHOT-windows-x86_64.zip. Skipping operation operation-fetch","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-03-08T04:03:23.630-0500","log.origin":{"file.name":"operation/operator.go","file.line":259},"message":"operation 'operation-install' skipped for metricbeat.7.12.0-SNAPSHOT","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-03-08T04:03:23.630-0500","log.origin":{"file.name":"operation/operator.go","file.line":259},"message":"operation 'operation-start' skipped for metricbeat.7.12.0-SNAPSHOT","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-03-08T04:03:23.633-0500","log.origin":{"file.name":"operation/operation_fetch.go","file.line":61},"message":"endpoint-security.7.12.0-SNAPSHOT already exists in C:\\Program Files\\Elastic\\Agent\\data\\elastic-agent-7fa781\\downloads\\endpoint-security-7.12.0-SNAPSHOT-windows-x86_64.zip. Skipping operation operation-fetch","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2021-03-08T04:03:23.638-0500","log.origin":{"file.name":"operation/operation_fetch.go","file.line":61},"message":"endpoint-security.7.12.0-SNAPSHOT already exists in C:\\Program Files\\Elastic\\Agent\\data\\elastic-agent-7fa781\\downloads\\endpoint-security-7.12.0-SNAPSHOT-windows-x86_64.zip. Skipping operation operation-fetch","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2021-03-08T04:03:26.113-0500","log.origin":{"file.name":"application/fleet_gateway.go","file.line":185},"message":"failed to dispatch actions, error: operator: failed to execute step sc-run, error: operation 'Exec' failed: : operation 'Exec' failed: ","ecs.version":"1.6.0"}

The Default policy is attached below:
elastic-agent.zip

The Agent logs are attached below:
elastic-agent-json.log

@amolnater-qasource amolnater-qasource added the bug Fixes for quality problems that affect the customer experience label Mar 8, 2021
@amolnater-qasource
Author

@manishgupta-qasource Please review.

@amolnater-qasource amolnater-qasource changed the title [Default Policy Specific]: Repeated error logs are generated for agent installed with Default policy having System Integration. [Default Policy Specific]: Repeated error logs are generated for agent installed with Default policy having System and Endpoint Integration. Mar 8, 2021
@manishgupta-qasource

Reviewed & assigned to @EricDavisX

@manishgupta-qasource manishgupta-qasource added the impact:medium Addressing this issue will have a medium level of impact on the quality/strength of our product. label Mar 8, 2021
@manishgupta-qasource manishgupta-qasource changed the title [Default Policy Specific]: Repeated error logs are generated for agent installed with Default policy having System and Endpoint Integration. [Default Policy Specific]: Error in agent installed with Default policy having System and Endpoint Integration. Mar 8, 2021
@manishgupta-qasource manishgupta-qasource added impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. and removed impact:medium Addressing this issue will have a medium level of impact on the quality/strength of our product. labels Mar 8, 2021
@EricDavisX
Contributor

I'm checking whether the 7.12 snapshot build succeeded over the weekend.

@EricDavisX
Contributor

EricDavisX commented Mar 8, 2021

Looks like we had passing builds on March 6, 7, and 8, so the fix for the Windows Agent should have been picked up. So this would appear to be a new issue? It is still urgent for 7.12: BC4 is being built shortly (in the next day or so?).

Update: I also confirmed that the commit hash cited for the tested artifact corresponds to the March 7 build:
https://internal-ci.elastic.co/view/Unified%20Release%20-%20snapshot/job/elastic+unified-release+master+distributed-snapshot-multijob-7.12/42/

@blakerouse

We need to improve that log message to include the output of the executed command so we can see the true error. At the moment this is too vague for us to really know why Endpoint is failing to run.

@EricDavisX
Contributor

@amolnater-qasource @dikshachauhan-qasource we can re-test this on BC4 to see if it is any better, and report more logs.

@blakerouse do you intend to merge the logging enhancement to 7.12? If not, we can test on 7.13 (where it is already merged) if we need more data.

@blakerouse

@EricDavisX It will go to 7.12. Waiting on the PR to pass testing to merge.

@amolnater-qasource
Author

Hi @EricDavisX
We have revalidated this issue on the 7.12.0 Kibana Cloud environment and found it fixed.

Observations:

  • Agent status is "Healthy".
  • Data is generating for both System and Endpoint Security under Data Streams tab.

Build details:
Artifact link: https://staging.elastic.co/7.12.0-336ff10d/downloads/beats/elastic-agent/elastic-agent-7.12.0-windows-x86_64.zip
Build: 39242
Commit: 99ac38d

Logs:
elastic-agent-json.log


Hence closing this out.
Thanks!!

@EricDavisX
Contributor

EricDavisX commented Mar 10, 2021

@amolnater-qasource @dikshachauhan-qasource I think the PR that was going in only improved the logging. We don't yet know what was actually going on in this case, so we don't know how to fix it. I will re-open this, but since it does not seem consistently reproducible, we will have to wait and see if it shows up again. Removing it from the 7.12 concerns list.

@EricDavisX EricDavisX reopened this Mar 10, 2021
@EricDavisX EricDavisX removed the impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. label Mar 10, 2021
@EricDavisX EricDavisX assigned blakerouse and unassigned EricDavisX Mar 10, 2021
@dikshachauhan-qasource

Hi @EricDavisX

Thanks for the update on this.

We will keep following it and will update with our observations if we find it reproducible.

Thanks
QAS

@amolnater-qasource
Author

Hi @EricDavisX
Today, while testing on the 7.13.0 Snapshot Kibana Cloud environment, we made further observations of Agent installation errors with Endpoint.

Steps to reproduce:

  1. Install Agent with "New Policy" having only System integration.
  2. Observe agent in "Healthy" state.
  3. Add "Endpoint Security" integration to "New Policy".
  4. Observe agent going to "Unhealthy" state.

We then restarted the endpoint, and after the restart the agent returned to the "Healthy" state.

Build details:

Artifact Link: https://snapshots.elastic.co/7.13.0-b51da292/downloads/beats/elastic-agent/elastic-agent-7.13.0-SNAPSHOT-windows-x86_64.zip
Build: 40169
Commit: 17fafeffdea652380ddfb4cb2701613cfad46d5e

Agent Logs:
Agent Logs.zip

Endpoint Logs:
endpoint-000000.log

Screenshots:
Agent Status

Thanks
QAS

@EricDavisX
Contributor

If it required a restart of the Endpoint, that is interesting. Let's re-test this in the context of Fleet-Server and see if it remains an issue. Tomorrow's build will hopefully allow this easily, and we can move forward.

@amolnater-qasource
Author

Hi @EricDavisX

As requested, please find the debug-level logs attached below:
Win64 debug logs.zip

Build details:

BUILD: 40237
COMMIT: 8b60488a58eeae6be1b994bf6be0110d39e8924c

Further, we have observed an "Unhealthy" agent issue specific to Windows 10 32-bit on the 8.0 Kibana Cloud environment. This issue was not resolved by restarting the agent and looks like a different issue, so we have logged a separate ticket at #25108.

Thanks
QAS

@botelastic botelastic bot added the needs-team Issues missing a team label label Apr 20, 2021
@ph
Contributor

ph commented Apr 28, 2021

@amolnater-qasource is this still an issue?

@ph ph added v7.13.0 and removed v7.12.0 labels Apr 28, 2021
@dikshachauhan-qasource

Hi @ph

We are unable to validate it on the latest 7.13 BC3, as the default policy is currently not available on cloud-created builds. However, we have validated this issue with user-created policies and found them working fine.

As discussed with @EricDavisX, we are closing it for now and will re-open it if we find it reproducible on 7.13.

Just for note: earlier, we found this issue reproducible only with default policies.

Thanks
QAS


7 participants