No activity logs and Agent remains in Enrolling state on installing Linux rpm #22296

Closed

ghost opened this issue Oct 30, 2020 · 15 comments

@ghost commented Oct 30, 2020

Kibana version:
Kibana: 7.10 BC4 cloud environment

Elasticsearch version:
Elasticsearch: 7.10 BC4 cloud environment

Host OS and Browser version:
Linux, all browsers

Original install method (e.g. download page, yum, from source, etc.):
7.10 BC4 cloud environment

Description
No activity logs are displayed and the Agent remains in the Enrolling state after installing the Linux rpm.

Preconditions

  1. A 7.10 BC4 Kibana cloud environment should be available.
  2. Log in to the above Kibana environment (commit 3f7bbe02a2f08f0ad1c7f5a9b35235a4aa066a32) as the 'elastic' superuser.
  3. The Endpoint integration should be added to the Default Policy.
  4. The 7.10 BC4 Agent should be downloaded from the location below:
    https://staging.elastic.co/7.10.0-c650b297/downloads/beats/elastic-agent/elastic-agent-7.10.0-x86_64.rpm

Steps to Reproduce

  1. SSH to the endpoint.
  2. Download the .rpm file to /home/zeus (steps 2–6 are consolidated in the shell sketch after this list).
  3. Run: sudo rpm -vi elastic-agent-7.10.0-x86_64.rpm
  4. Run: sudo elastic-agent enroll https://40bd2f19a46c45928e42e550581aee76.europe-west1.gcp.cloud.es.io:9243 cVFRYWVIVUJsQmQxUnpZTDBJWU06Uy1ibmZpTjhRLXE3dWdIUjdON1ViUQ==
  5. Run: sudo systemctl enable elastic-agent
  6. Run: sudo systemctl start elastic-agent
  7. Navigate to the Agents tab in Kibana.
  8. Click on the prerequisite host.
  9. Observe that the agent remains in Enrolling status and no activity logs are displayed.
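
For reference, steps 2 through 6 amount to the following shell session (a minimal sketch assuming curl is available; the download URL, enrollment URL, and token are the ones quoted in this report):

  cd /home/zeus
  curl -O https://staging.elastic.co/7.10.0-c650b297/downloads/beats/elastic-agent/elastic-agent-7.10.0-x86_64.rpm
  sudo rpm -vi elastic-agent-7.10.0-x86_64.rpm    # install the package
  sudo elastic-agent enroll https://40bd2f19a46c45928e42e550581aee76.europe-west1.gcp.cloud.es.io:9243 cVFRYWVIVUJsQmQxUnpZTDBJWU06Uy1ibmZpTjhRLXE3dWdIUjdON1ViUQ==    # enroll against the Kibana URL with the enrollment token
  sudo systemctl enable elastic-agent             # start the service on boot
  sudo systemctl start elastic-agent              # start the service now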

Test data
N/A

Impacted Test case id
https://elastic.testrail.io/index.php?/cases/view/33960

Actual Result
No activity logs are displayed and the Agent remains in the Enrolling state after installing the Linux rpm.

Expected Result
The Agent should be in Online status, and activity logs should be displayed for the Linux rpm agent installation on the 7.10 BC4 Kibana cloud environment.

What's working

  • The issue does not occur with the Linux .deb installation.

What's not working

  • N/A

Screenshot
rpm_test_putty

rpm_tests

Logs
N/A

@botelastic bot added the needs_team label Oct 30, 2020
@ghost added Team:Ingest Management and removed needs_team labels Oct 30, 2020
@elasticmachine (Collaborator)

Pinging @elastic/ingest-management (Team:Ingest Management)

@ghost self-assigned this Oct 30, 2020
@ghost (Author) commented Oct 30, 2020

@vipulsrivastava-qasource Please review the bug.

@ghost commented Oct 30, 2020

Reviewed and assigned to @EricDavisX.

@ghost added high and removed impact:medium labels Oct 30, 2020
@EricDavisX (Contributor)

@ph @blakerouse did we somehow miss something on the .rpm side that differs from .deb, or can we evaluate in more detail what is different on the VMs, as below?

@rahulgupta-qasource @vipulsrivastava-qasource can you update the bug-logging process to make sure we cite which OS a Beat / Agent issue is reported on? For the near term, you could also mention the explicit vSphere template the VM is based on, if it is indeed an Endgame vSphere host. With this info we can assess which settings are set and what is different on this (presumably CentOS) VM versus others. Thanks!

@ph (Contributor) commented Oct 30, 2020

@EricDavisX that might be related to a specific distribution.

@ghost (Author) commented Nov 2, 2020

Hi @EricDavisX

Thank you for sharing the feedback.

The above issue is reported on centos-release-7-3.1611.el7.centos.x86_64 (VM IP: 10.6.227.3, VM name: rahul_centos_7_3).
Screenshot:
centOS_6_3

Explicit vSphere template for the above VM [IP: 10.6.227.3, name: rahul_centos_7_3]: qa-centos7.3-x64-endpoint-clone-base
RPM

Moving forward, we will include these details while logging the agent/beat bugs.

@blakerouse (Contributor)

I explicitly tested and confirmed the RPM to be fixed on Fedora 32. I did not test CentOS 7; something different could be causing it to break on CentOS.

I will install CentOS and give it a test.

@EricDavisX (Contributor)

To add to the config / repro info, SELinux is enabled on that system.

Last login: Mon Nov  2 04:39:32 2020 from vpneng-192-168-118-67.endgames.local
[zeus@atlcolo-10-6-227-3 ~]$ cat /etc/selinux/config

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=enforcing
# SELINUXTYPE= can take one of three two values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected. 
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted 
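
A quick way to confirm the runtime SELinux mode and rule it in or out (a sketch; setenforce changes the mode only until reboot, and ausearch assumes the audit package is installed):

  getenforce                        # prints Enforcing, Permissive, or Disabled
  sestatus                          # fuller report: current mode, policy type
  sudo setenforce 0                 # temporarily switch to Permissive for a re-test
  sudo ausearch -m avc -ts recent   # list recent SELinux (AVC) denials, if any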

@blakerouse (Contributor)

I just tested the RPM installation on a fresh VM running CentOS 7 x86_64. I hit no issues: it installed, started, enrolled, and reported correctly to Fleet.

@EricDavisX (Contributor)

Did you have SELinux enabled? @blakerouse

@EricDavisX (Contributor)

@blakerouse please let me know what other settings you think we could compare between the VMs, and I can introspect and report back. I don't recall if you got access to the Endgame vSphere system, but I know Michal does, so he could help look at a given bug if that is easier for the team. @michalpristas

@blakerouse (Contributor)

@EricDavisX SELinux is enabled by default on CentOS 7. I verified just in case, and yes, it is enabled. Still no issues.

@EricDavisX (Contributor)

Thanks Blake.

I see the below from systemctl, which indicates it isn't a clean system. That is a huge difference between the tests done by Blake and Rahul.

@rahulgupta-qasource did you know you had the 7.9.3 Agent installed previously on that machine? I know we were doing 7.9.3-to-7.10 RPM Agent upgrade tests; we might have gotten environments mixed up or something. Can you try again on that same VM, but revert it to clean (or clone it again), and let us know? Also, if we were still doing work on the .rpm-to-.rpm upgrade (per the last comment) in #21200, we can log that separately if needed so it is clearer which use case is being tested.

[zeus@atlcolo-10-6-227-3 ~]$ systemctl status elastic-agent -l
● elastic-agent.service - Agent manages other beats based on configuration provided.
   Loaded: loaded (/usr/lib/systemd/system/elastic-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2020-10-30 06:33:02 EDT; 3 days ago
     Docs: https://www.elastic.co/products/beats/elastic-agent
 Main PID: 8669 (elastic-agent)
   CGroup: /system.slice/elastic-agent.service
           ├─8669 /usr/share/elastic-agent/bin/elastic-agent --environment systemd -c /etc/elastic-agent/elastic-agent.yml --path.home /usr/share/elastic-agent --path.config /etc/elastic-agent --path.data /var/lib/elastic-agent --path.logs /var/log/elastic-agent
           ├─8676 /var/lib/elastic-agent/install/filebeat-7.9.3-linux-x86_64/filebeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.mode=x-pack-fleet -E management.enabled=true -E logging.level=debug -E logging.level=debug -E http.enabled=true -E http.host=unix:///tmp/elastic-agent/default/filebeat/filebeat.sock -E logging.json=true -E logging.ecs=true -E logging.files.path=/var/lib/elastic-agent/logs/default -E logging.files.name=filebeat-json.log -E logging.files.keepfiles=7 -E logging.files.permission=0640 -E logging.files.interval=1h -E path.data=/var/lib/elastic-agent/run/default/filebeat--7.9.3
           ├─8683 /var/lib/elastic-agent/install/metricbeat-7.9.3-linux-x86_64/metricbeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.mode=x-pack-fleet -E management.enabled=true -E logging.level=debug -E logging.level=debug -E http.enabled=true -E http.host=unix:///tmp/elastic-agent/default/metricbeat/metricbeat.sock -E logging.json=true -E logging.ecs=true -E logging.files.path=/var/lib/elastic-agent/logs/default -E logging.files.name=metricbeat-json.log -E logging.files.keepfiles=7 -E logging.files.permission=0640 -E logging.files.interval=1h -E path.data=/var/lib/elastic-agent/run/default/metricbeat--7.9.3
           ├─8688 /var/lib/elastic-agent/install/filebeat-7.9.3-linux-x86_64/filebeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.mode=x-pack-fleet -E management.enabled=true -E logging.level=debug -E logging.level=debug -E http.enabled=true -E http.host=unix:///tmp/elastic-agent/default/filebeat/filebeat.sock_monitor -E logging.json=true -E logging.ecs=true -E logging.files.path=/var/lib/elastic-agent/logs/default -E logging.files.name=filebeat_monitor-json.log -E logging.files.keepfiles=7 -E logging.files.permission=0640 -E logging.files.interval=1h -E path.data=/var/lib/elastic-agent/run/default/filebeat--7.9.3--36643631373035623733363936343635
           └─8697 /var/lib/elastic-agent/install/metricbeat-7.9.3-linux-x86_64/metricbeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.mode=x-pack-fleet -E management.enabled=true -E logging.level=debug -E logging.level=debug -E http.enabled=true -E http.host=unix:///tmp/elastic-agent/default/metricbeat/metricbeat.sock_monitor -E logging.json=true -E logging.ecs=true -E logging.files.path=/var/lib/elastic-agent/logs/default -E logging.files.name=metricbeat_monitor-json.log -E logging.files.keepfiles=7 -E logging.files.permission=0640 -E logging.files.interval=1h -E path.data=/var/lib/elastic-agent/run/default/metricbeat--7.9.3--36643631373035623733363936343635

Nov 02 11:02:30 atlcolo-10-6-227-3.qascale.eginternal.com elastic-agent[8669]: 2020-11-02T11:02:30.529-0500        DEBUG        kibana/client.go:170        Request method: POST, path: /api/ingest_manager/fleet/agents/806e038e-5848-4b92-9ab4-09c4aef08adc/checkin
Nov 02 11:02:30 atlcolo-10-6-227-3.qascale.eginternal.com elastic-agent[8669]: 2020-11-02T11:02:30.676-0500        ERROR        application/fleet_gateway.go:176        Could not communicate with Checking API will retry, error: Status code: 0, Kibana returned an error: , message: Deleted resource.
Nov 02 11:11:55 atlcolo-10-6-227-3.qascale.eginternal.com elastic-agent[8669]: 2020-11-02T11:11:55.602-0500        DEBUG        kibana/client.go:170        Request method: POST, path: /api/ingest_manager/fleet/agents/806e038e-5848-4b92-9ab4-09c4aef08adc/checkin
Nov 02 11:11:55 atlcolo-10-6-227-3.qascale.eginternal.com elastic-agent[8669]: 2020-11-02T11:11:55.809-0500        ERROR        application/fleet_gateway.go:176        Could not communicate with Checking API will retry, error: Status code: 0, Kibana returned an error: , message: Deleted resource.
Nov 02 11:19:42 atlcolo-10-6-227-3.qascale.eginternal.com elastic-agent[8669]: 2020-11-02T11:19:42.283-0500        DEBUG        kibana/client.go:170        Request method: POST, path: /api/ingest_manager/fleet/agents/806e038e-5848-4b92-9ab4-09c4aef08adc/checkin
Nov 02 11:19:42 atlcolo-10-6-227-3.qascale.eginternal.com elastic-agent[8669]: 2020-11-02T11:19:42.398-0500        ERROR        application/fleet_gateway.go:176        Could not communicate with Checking API will retry, error: Status code: 0, Kibana returned an error: , message: Deleted resource.
Nov 02 11:26:04 atlcolo-10-6-227-3.qascale.eginternal.com elastic-agent[8669]: 2020-11-02T11:26:04.242-0500        DEBUG        kibana/client.go:170        Request method: POST, path: /api/ingest_manager/fleet/agents/806e038e-5848-4b92-9ab4-09c4aef08adc/checkin
Nov 02 11:26:04 atlcolo-10-6-227-3.qascale.eginternal.com elastic-agent[8669]: 2020-11-02T11:26:04.388-0500        ERROR        application/fleet_gateway.go:176        Could not communicate with Checking API will retry, error: Status code: 0, Kibana returned an error: , message: Deleted resource.
Nov 02 11:33:32 atlcolo-10-6-227-3.qascale.eginternal.com elastic-agent[8669]: 2020-11-02T11:33:32.092-0500        DEBUG        kibana/client.go:170        Request method: POST, path: /api/ingest_manager/fleet/agents/806e038e-5848-4b92-9ab4-09c4aef08adc/checkin
Nov 02 11:33:32 atlcolo-10-6-227-3.qascale.eginternal.com elastic-agent[8669]: 2020-11-02T11:33:32.237-0500        ERROR        application/fleet_gateway.go:176        Could not communicate with Checking API will retry, error: Status code: 0, Kibana returned an error: , message: Deleted resource.
[zeus@atlcolo-10-6-227-3 ~]$ 
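
For anyone re-testing on that VM, a sketch of checking for and removing a stale prior install first (the state paths below are the ones visible in the unit's command line above; rpm -e removes the package but can leave data behind):

  rpm -q elastic-agent              # is an older agent package still installed?
  sudo systemctl stop elastic-agent
  sudo rpm -e elastic-agent         # uninstall the package
  sudo rm -rf /var/lib/elastic-agent /etc/elastic-agent /var/log/elastic-agent   # clear leftover state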

@ghost (Author) commented Nov 3, 2020

Hi Eric

Thank you for sharing the feedback.

We have validated the above scenario ('Linux .rpm enroll/deploy of the 7.9.3 BC1 Agent, then upgrade to the 7.10.0 BC4 .rpm deploy with Endpoint') on a CentOS 7 VM (10.0.7.240) and found it fixed.

Observations:
The user is successfully able to enroll the 7.10 BC4 rpm agent with Endpoint after upgrading 7.9.3 BC1 Kibana to 7.10.0 BC4 Kibana.

We executed the test case under test run https://elastic.testrail.io/index.php?/runs/view/791.

Screenshot:
after_kibana_upgrade

Hence, we are closing this bug.

@ghost closed this as completed Nov 3, 2020
@ghost (Author) commented Nov 25, 2020

Bug Conversion:

A test case already exists for this ticket.
https://elastic.testrail.io/index.php?/cases/view/33960

This issue was closed.