
[Elastic Agent] Enroll with Fleet Server #23865

Merged: 7 commits, Feb 16, 2021

blakerouse
Contributor

@blakerouse blakerouse commented Feb 4, 2021

What does this PR do?

This adds the ability to enroll the Elastic Agent with a Fleet Server running locally on the same machine. To get this working, a few things needed to be added to Elastic Agent:

  • Wire in the status.Controller to the socket control protocol.
  • Add new bootstrap operating mode that just runs a Fleet Server.
  • Add a new bootstrap of Fleet Server before the start of the Fleet Gateway (in managed mode with Fleet Server).

Note: This includes a breaking change to the enroll parameters. kibana_url and enrollment_token change from positional arguments to flags. This makes install and enroll take the same parameters, and closes #21897.
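As a rough illustration of the new calling convention (flags instead of positional arguments), here is a sketch using Go's standard `flag` package. Elastic Agent's real CLI is not built this way; the `enrollOptions` type and `parseEnrollFlags` function are hypothetical, and only the flag names are taken from the test command in this description.

```go
package main

import (
	"flag"
	"fmt"
)

// enrollOptions is a hypothetical container for the enroll parameters.
type enrollOptions struct {
	URL             string
	EnrollmentToken string
	FleetServer     string
	Insecure        bool
}

// parseEnrollFlags reads the enroll parameters as named flags. When install
// and enroll accept the same flags, install can forward them to enroll.
func parseEnrollFlags(args []string) (enrollOptions, error) {
	var opts enrollOptions
	fs := flag.NewFlagSet("enroll", flag.ContinueOnError)
	fs.StringVar(&opts.URL, "url", "", "URL to enroll the agent into")
	fs.StringVar(&opts.EnrollmentToken, "enrollment-token", "", "enrollment token")
	fs.StringVar(&opts.FleetServer, "fleet-server", "", "Elasticsearch connection string for a local Fleet Server")
	fs.BoolVar(&opts.Insecure, "insecure", false, "allow insecure connections")
	if err := fs.Parse(args); err != nil {
		return enrollOptions{}, err
	}
	// Leftover positional arguments indicate the old calling convention.
	if fs.NArg() > 0 {
		return enrollOptions{}, fmt.Errorf("unexpected positional arguments %v; use --url and --enrollment-token", fs.Args())
	}
	return opts, nil
}

func main() {
	opts, err := parseEnrollFlags([]string{
		"--insecure", "--url", "http://localhost:8000", "--enrollment-token", "abc",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("url=%s token=%s insecure=%t\n", opts.URL, opts.EnrollmentToken, opts.Insecure)
}
```

Go's `flag` package accepts both `-url` and `--url`, and it stops parsing at the first non-flag argument, which is what makes the old positional form detectable.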

Why is it important?

So that the Elastic Agent can bootstrap Fleet Server and then enroll through that Fleet Server, with both running on the same machine.

How does it work?

The enroll command coordinates control of the running Elastic Agent daemon. The install command proxies to the enroll command, so this can be run from install or from a DEB/RPM installation.

Breakdown of the steps that are completed to handle the bootstrap:

  1. Enroll must be executed with the --fleet-server parameter. This parameter is the connection string Fleet Server uses to communicate with Elasticsearch. (Example: --fleet-server http://elastic:changeme@localhost:9200)
  2. Enroll ensures that it can communicate with a running Elastic Agent. (This requires the Elastic Agent to be running.)
  3. Enroll writes the fleet.yml with the fleet.server configuration and fleet.server.bootstrap: true.
  4. Enroll triggers the Elastic Agent to restart (causing re-execution).
  5. Elastic Agent is re-executed into the Fleet Server bootstrap mode.
  6. Elastic Agent starts the Fleet Server, passing it the configuration.
  7. Enroll polls the status gRPC API of the Elastic Agent until Fleet Server is started and in a degraded state (it should be degraded, because the Elastic Agent is not enrolled yet).
  8. Enroll performs the enrollment against the newly running Fleet Server.
  9. Enroll writes a new fleet.yml with the enrollment information and the fleet.server information. fleet.server.bootstrap is removed (i.e. treated as false).
  10. Enroll triggers the Elastic Agent to restart (causing re-execution).
  11. Elastic Agent is re-executed into Fleet mode.
  12. Elastic Agent starts the Fleet Server before starting the Fleet Gateway communication (because fleet.server is set in the fleet.yml).
  13. Elastic Agent then starts the Fleet Gateway communication.
  14. Elastic Agent is now running and controlled through the locally running Fleet Server.

Checklist

  • [x] My code follows the style guidelines of this project
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding changes to the default configuration files
  • [x] I have added tests that prove my fix is effective or that my feature works
  • [x] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • Normal enrollment without Fleet Server works as expected.
  • Enrollment with --fleet-server works.

How to test this PR locally

Run the latest 8.0.0-SNAPSHOT of Elasticsearch and Kibana. Start Kibana with xpack.fleet.agents.fleetServerEnabled: true.

Add the Fleet Server integration to a policy.

Look up the policy ID (as this is currently needed until a default policy for Fleet Server is added to Kibana).

Start Elastic Agent.

Run the following command to bootstrap and enroll the Elastic Agent.

./elastic-agent enroll --insecure --url http://localhost:8000 --enrollment-token {token} --fleet-server http://elastic:changeme@localhost:9200 --fleet-server-policy {policy_id}
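The --fleet-server value embeds the Elasticsearch credentials in the URL. A sketch of how such a string decomposes with Go's `net/url`; the `esConnection` type and `parseFleetServerConn` function are hypothetical, and this is not Elastic Agent's actual parsing code.

```go
package main

import (
	"fmt"
	"net/url"
)

// esConnection holds the pieces of a --fleet-server connection string
// (hypothetical type, for illustration only).
type esConnection struct {
	Host     string // Elasticsearch URL without credentials
	Username string
	Password string
}

// parseFleetServerConn splits a connection string such as
// http://elastic:changeme@localhost:9200 into host and credentials.
func parseFleetServerConn(raw string) (esConnection, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return esConnection{}, err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return esConnection{}, fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	conn := esConnection{}
	if u.User != nil {
		conn.Username = u.User.Username()
		conn.Password, _ = u.User.Password()
	}
	// Strip the credentials so the host can be logged safely.
	u.User = nil
	conn.Host = u.String()
	return conn, nil
}

func main() {
	conn, err := parseFleetServerConn("http://elastic:changeme@localhost:9200")
	if err != nil {
		panic(err)
	}
	fmt.Println(conn.Host, conn.Username)
}
```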

Related issues

@blakerouse blakerouse added the Team:Elastic-Agent Label for the Agent team label Feb 4, 2021
@blakerouse blakerouse self-assigned this Feb 4, 2021
@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Feb 4, 2021
@blakerouse blakerouse marked this pull request as ready for review February 4, 2021 21:00
@elasticmachine
Collaborator

Pinging @elastic/agent (Team:Agent)

@elasticmachine
Collaborator

elasticmachine commented Feb 4, 2021

💚 Build Succeeded


Build stats

  • Build Cause: Pull request #23865 updated

  • Start Time: 2021-02-16T13:54:09.912+0000

  • Duration: 51 min 56 sec

  • Commit: b78c551

Test stats 🧪

Test Results
Failed 0
Passed 6534
Skipped 16
Total 6550


💚 Flaky test report

Tests succeeded.



// enroll should use localhost as fleet-server is now running
// it must also restart
c.options.URL = "http://localhost:8000"
Member

hardcoded?

Contributor Author

Yes, as it will communicate with the Fleet Server locally. There is currently no way to set up SSL or run it on a different port through the enroll command.

Definitely things we need to look into, but not in this PR. This is just enough to get it up and running.

}

if c.options.NoRestart {
return err
Member

nit: return nil
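To illustrate the nit, assuming (as it implies) that err has already been checked before this branch: at that point err is nil, so `return err` and `return nil` behave identically, but `return nil` states the intent. A minimal sketch; `writeConfig` and `enroll` are hypothetical stand-ins, since the surrounding enroll code is not shown here.

```go
package main

import (
	"errors"
	"fmt"
)

// writeConfig is a hypothetical step that may fail.
func writeConfig(fail bool) error {
	if fail {
		return errors.New("failed to write fleet.yml")
	}
	return nil
}

// enroll is a hypothetical reduction of the code under review.
func enroll(failWrite, noRestart bool) error {
	err := writeConfig(failWrite)
	if err != nil {
		return err
	}
	if noRestart {
		// err is always nil here; `return nil` says so explicitly.
		return nil
	}
	// A real implementation would trigger the daemon restart here.
	return nil
}

func main() {
	fmt.Println(enroll(false, true))
	fmt.Println(enroll(true, true))
}
```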

if c.daemonReload(ctx) != nil {
c.log.Info("Elastic Agent might not be running; unable to trigger restart")
}
c.log.Info("Successfully triggered restart on running Elastic Agent.")
Member

nit: I thought we agreed on lower-case log messages. Either way is fine, as long as it is consistent.

Contributor Author

That was in the Fleet Server repository. Capitalized messages are more consistent with the rest of Elastic Agent, even though I am not a fan of them.

c.log.Info("Elastic Agent might not be running; unable to trigger restart")
}
c.log.Info("Successfully triggered restart on running Elastic Agent.")
return err
Member

nit: return nil

}

func (c *EnrollCmd) fleetServerBootstrap(ctx context.Context) error {
c.log.Debug("verifying communication with running elastic-agent daemon")
Member

nit: consistency

running Elastic Agent

and

running elastic-agent

in the same file

// Degraded status means something minor is preventing agent to work properly.
Degraded
// Failed status means agent is unable to work properly.
Failed
)

var (
-humanReadableStatuses = map[AgentStatus]string{
+humanReadableStatuses = map[AgentStatusCode]string{
Member

Remove this map and use:

func (s AgentStatusCode) String() string {
    return []string{"online", "degraded", "error"}[s]
}

it's more idiomatic

notifyChangeFunc: r.updateStatus,
}

r.lock.Lock()
Member

nit: r.mx.Lock() maybe

}
rep.lock.Unlock()
i++
}
Member

nit: this can be shorter, with no i++:

```go
apps := make([]AgentApplicationStatus, 0, len(r.appReporters))
for key, rep := range r.appReporters {
	rep.lock.Lock()
	apps = append(apps, AgentApplicationStatus{
		ID:      key,
		Name:    rep.name,
		Status:  rep.status,
		Message: rep.message,
	})
	rep.lock.Unlock()
}
```

@@ -125,6 +195,19 @@ func (r *controller) updateStatus() {
break
}
}
if status != Failed {
for id, rep := range r.appReporters {
s := statusToAgentStatus(rep.status)
Member

You were using locks above (rep.lock.Lock()), but not here.
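One way to address this comment: copy each reporter's status while holding its lock, then compute the worst status across reporters. A simplified sketch; the `reporter` and `controller` types are hypothetical reductions of the ones in the diff (the controller lock is named `mx`, following the earlier nit), and `Healthy` is an assumed constant name.

```go
package main

import (
	"fmt"
	"sync"
)

// AgentStatusCode is ordered from best to worst, so "worst wins" is a comparison.
type AgentStatusCode int

const (
	Healthy AgentStatusCode = iota // assumed name for the first constant
	Degraded
	Failed
)

// reporter is a hypothetical app status reporter; status is guarded by lock.
type reporter struct {
	lock   sync.Mutex
	status AgentStatusCode
}

// Update sets the reporter's status under its lock.
func (r *reporter) Update(s AgentStatusCode) {
	r.lock.Lock()
	defer r.lock.Unlock()
	r.status = s
}

type controller struct {
	mx        sync.Mutex
	reporters map[string]*reporter
}

// aggregate returns the worst status across all reporters, reading each
// reporter's status only while holding that reporter's lock.
func (c *controller) aggregate() AgentStatusCode {
	c.mx.Lock()
	defer c.mx.Unlock()
	status := Healthy
	for _, rep := range c.reporters {
		rep.lock.Lock()
		s := rep.status
		rep.lock.Unlock()
		if s > status {
			status = s
		}
		if status == Failed {
			break // nothing is worse than Failed
		}
	}
	return status
}

func main() {
	c := &controller{reporters: map[string]*reporter{
		"filebeat":     {},
		"fleet-server": {},
	}}
	c.reporters["fleet-server"].Update(Degraded)
	fmt.Println("overall status code:", int(c.aggregate()))
}
```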

status = s
}

r.log.Debugf("'%s' has status '%s'", id, humanReadableStatuses[s])
Member

Once you change the enum code above, you could just do:

r.log.Debugf("'%s' has status '%s'", id, s)

@elasticmachine
Collaborator

Pinging @elastic/ingest-management (Team:Ingest Management)

@blakerouse blakerouse force-pushed the enroll-with-fleet-server branch from 7d57806 to 50966c6 Compare February 11, 2021 17:45
@blakerouse
Contributor Author

/package

@nchaulet
Member

@blakerouse Do we have to specify the policy ID if we use an enrollment token? (The token should contain a policy ID.)
Also, the enrollment token is optional if we use the default Fleet Server policy, right?

@blakerouse
Contributor Author

@nchaulet At the moment you need both; if we could simplify it to only one, that would be better.

@nchaulet
Member

> @nchaulet At the moment you need both; if we could simplify it to only one, that would be better.

Yes, I think we can simplify it: an enrollment key is always linked to a policy, so it could work without the policy ID.

@blakerouse blakerouse merged commit ae0f29e into elastic:master Feb 16, 2021
@blakerouse blakerouse deleted the enroll-with-fleet-server branch February 16, 2021 14:56
blakerouse added a commit to blakerouse/beats that referenced this pull request Feb 16, 2021
* Add test and changelog.

* Add ability to enroll through a local Fleet Server started by the running Elastic Agent daemon.

* Fix tests.

* Fix changelog.

* Fixes from code review.

* Cleanup from merge into master.

(cherry picked from commit ae0f29e)
blakerouse added a commit that referenced this pull request Feb 16, 2021
* Add test and changelog.

* Add ability to enroll through a local Fleet Server started by the running Elastic Agent daemon.

* Fix tests.

* Fix changelog.

* Fixes from code review.

* Cleanup from merge into master.

(cherry picked from commit ae0f29e)
v1v added a commit to v1v/beats that referenced this pull request Feb 17, 2021
…-arm

* upstream/master:
  [CI] install docker-compose with retry (elastic#24069)
  Add nodes to filebeat-kubernetes.yaml ClusterRole - fixes elastic#24051 (elastic#24052)
  updating manifest files for filebeat threatintel module (elastic#24074)
  Add Zeek Signatures (elastic#23772)
  Update Beats to ECS 1.8.0 (elastic#23465)
  Support running Docker logging plugin on ARM64 (elastic#24034)
  Fix ec2 metricset fields.yml and add integration test (elastic#23726)
  Only build targz and zip versions of Beats if PACKAGES is set in agent (elastic#24060)
  [Filebeat] Add field definitions for known Netflow/IPFIX vendor fields (elastic#23773)
  [Elastic Agent] Enroll with Fleet Server (elastic#23865)
  [Filebeat] Convert logstash logEvent.action objects to strings (elastic#23944)
  [Ingest Management] Fix reloading of log level for services (elastic#24055)
  Add Agent standalone k8s manifest (elastic#23679)
v1v added a commit to v1v/beats that referenced this pull request Feb 17, 2021
…dows-7

* upstream/master: (332 commits)
  Use ECS v1.8.0 (elastic#24086)
  Add support for postgresql csv logs (elastic#23334)
  [Heartbeat] Refactor config system (elastic#23467)
  [CI] install docker-compose with retry (elastic#24069)
  Add nodes to filebeat-kubernetes.yaml ClusterRole - fixes elastic#24051 (elastic#24052)
  updating manifest files for filebeat threatintel module (elastic#24074)
  Add Zeek Signatures (elastic#23772)
  Update Beats to ECS 1.8.0 (elastic#23465)
  Support running Docker logging plugin on ARM64 (elastic#24034)
  Fix ec2 metricset fields.yml and add integration test (elastic#23726)
  Only build targz and zip versions of Beats if PACKAGES is set in agent (elastic#24060)
  [Filebeat] Add field definitions for known Netflow/IPFIX vendor fields (elastic#23773)
  [Elastic Agent] Enroll with Fleet Server (elastic#23865)
  [Filebeat] Convert logstash logEvent.action objects to strings (elastic#23944)
  [Ingest Management] Fix reloading of log level for services (elastic#24055)
  Add Agent standalone k8s manifest (elastic#23679)
  [Metricbeat][Kubernetes] Extend state_node with more conditions (elastic#23905)
  [CI] googleStorageUploadExt step (elastic#24048)
  Check fields are documented for aws metricsets (elastic#23887)
  Update go-concert to 0.1.0 (elastic#23770)
  ...
Labels
Team:Elastic-Agent Label for the Agent team v7.12.0
Development

Successfully merging this pull request may close these issues.

[Elastic Agent] Inconsistency for enroll and install command
4 participants