feat: add first scenario for Fleet Server #900
Conversation
This type will hold information about the Fleet config, supporting building the proper flags during enrollment.
The flags used for installing/enrolling an agent will be generated from the new FleetConfig struct. Because of that, we are moving a pointer to that fleet config into the install command.
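A minimal sketch of what such a struct could look like, with a `flags()` method that builds the enrollment arguments. All field names and defaults here are illustrative assumptions, not the PR's actual code; the `--fleet-server-es` flag spelling comes up later in this thread:

```go
package main

import "fmt"

// FleetConfig holds the Fleet settings used to build the install/enroll
// command flags. Field names and defaults are assumptions for illustration.
type FleetConfig struct {
	EnrollmentToken  string
	FleetServerMode  bool
	KibanaURL        string
	ElasticsearchURL string
}

// flags returns the arguments appended to the agent install/enroll command.
func (cfg *FleetConfig) flags() []string {
	args := []string{"--enrollment-token", cfg.EnrollmentToken}
	if cfg.FleetServerMode {
		// Flag name taken from later comments in this thread (--fleet-server-es).
		args = append(args, "--fleet-server-es", cfg.ElasticsearchURL)
	} else {
		args = append(args, "--kibana-url", cfg.KibanaURL)
	}
	return args
}

func main() {
	cfg := &FleetConfig{EnrollmentToken: "<token>", FleetServerMode: true, ElasticsearchURL: "http://elasticsearch:9200"}
	fmt.Println(cfg.flags())
}
```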
💔 Tests Failed

Test stats 🧪
| Test    | Results |
|---------|---------|
| Failed  | 5       |
| Passed  | 118     |
| Skipped | 0       |
| Total   | 123     |
Genuine test errors
💔 There are test failures but no known flaky tests; most likely a genuine test failure.
- Name: Initializing / End-To-End Tests / ubuntu-18.04_fleet_fleet_mode_agent / [empty] – TEST-fleet.xml
- Name: Initializing / End-To-End Tests / ubuntu-18.04_fleet_backend_processes / [empty] – TEST-fleet.xml
- Name: Initializing / End-To-End Tests / ubuntu-18.04_fleet_fleet_server / [empty] – TEST-fleet.xml
- Name: Initializing / End-To-End Tests / ubuntu-18.04_fleet_agent_endpoint_integration / [empty] – TEST-fleet.xml
- Name: Initializing / End-To-End Tests / ubuntu-18.04_fleet_stand_alone_agent / [empty] – TEST-fleet.xml
```gherkin
@start-fleet-server
Scenario Outline: Deploying the <os> fleet-server agent
  When a "<os>" agent is deployed to Fleet with "tar" installer in fleet-server mode
  Then Fleet server is enabled
```
@EricDavisX @blakerouse I wrote this `Then` clause, but I'd like to know if there is another assertion that should be done here. Something like: the elastic-agent process is started, or the Fleet app in Kibana shows "FooBar" in the Fleet page, or Elasticsearch contains THIS doc in THAT index. Preferably queried via an API call.
Could you help me here in writing the right expected behaviour?
I am not too familiar with the structure of these files, so I might not be fully understanding the context.
I believe you're asking if it's possible to know whether the Fleet Server is running correctly. The simplest way is to check that the Agent is reported healthy in Kibana. That might seem too simple, but the only way for the Agent running a Fleet Server to show as healthy in Kibana is if it can communicate with its local Fleet Server and that Fleet Server can write to Elasticsearch.
Never mind the code structure yet; I'm still interested in the behavior of the product without considering internal details/implementations.
With that in mind:
- what ES query should we write to verify that?
- is it enough to check the healthy status for the promoted-to-server agent?
The check can be the same as for all Agents: that it is listed as 'healthy' in the Agents API list call. A secondary check could be to assess if the Fleet Server process is running on the host, as noted in this example:

```
edavis-mbp:elastic-agent-8.0.0-SNAPSHOT-darwin-x86_64-infra-build edavis$ ps ax | grep elastic
SNAPSHOT-darwin-x86_64/fleet-server --agent-mode -E logging.level=info -E http.enabled=true -E http.host=unix:///Library/Elastic/Agent/data/tmp/default/fleet-server/fleet-server.sock -E logging.json=true -E logging.ecs=true -E logging.files.path=/Library/Elastic/Agent/data/elastic-agent-53d75c/logs/default -E logging.files.name=fleet-server-json.log -E logging.files.keepfiles=7 -E logging.files.permission=0640 -E logging.files.interval=1h -E path.data=/Library/Elastic/Agent/data/elastic-agent-53d75c/run/default/fleet-server--8.0.0-SNAPSHOT
```

This example is from macOS, but the process name is the same.
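For the primary check, a sketch of querying the Kibana Fleet agents list could look like the following. The `/api/fleet/agents` path, response shape, credentials, and the "online" status value are assumptions about this stack version, not code from this PR:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// agentsResponse models only the fields we need from Kibana's Fleet agents
// list API. The endpoint path and response shape are assumptions.
type agentsResponse struct {
	List []struct {
		Status        string `json:"status"`
		LocalMetadata struct {
			Host struct {
				Hostname string `json:"hostname"`
			} `json:"host"`
		} `json:"local_metadata"`
	} `json:"list"`
}

// isAgentHealthy asks Kibana whether the agent enrolled from the given host
// is reported healthy ("online").
func isAgentHealthy(kibanaURL, hostname string) (bool, error) {
	req, err := http.NewRequest("GET", kibanaURL+"/api/fleet/agents", nil)
	if err != nil {
		return false, err
	}
	req.Header.Set("kbn-xsrf", "e2e")       // header required by Kibana APIs
	req.SetBasicAuth("elastic", "changeme") // default stack credentials (assumption)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	var agents agentsResponse
	if err := json.NewDecoder(resp.Body).Decode(&agents); err != nil {
		return false, err
	}
	for _, a := range agents.List {
		if a.LocalMetadata.Host.Hostname == hostname {
			return a.Status == "online", nil
		}
	}
	return false, fmt.Errorf("agent %q is not enrolled", hostname)
}

func main() {
	healthy, err := isAgentHealthy("http://localhost:5601", "fleet-server")
	fmt.Println(healthy, err)
}
```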
```diff
@@ -374,8 +375,8 @@ func (i *TARPackage) Preinstall() error {
 	// simplify layout
 	cmds := [][]string{
-		[]string{"rm", "-fr", "/elastic-agent"},
-		[]string{"mv", fmt.Sprintf("/%s-%s-%s-%s", i.artifact, i.version, i.OS, i.arch), "/elastic-agent"},
+		{"rm", "-fr", "/elastic-agent"},
+		{"mv", fmt.Sprintf("/%s-%s-%s-%s", i.artifact, i.version, i.OS, i.arch), "/elastic-agent"},
```
Remove some leftovers: the type is automatically inferred by the Go compiler
This is wonderful, Manu. Thank you. I will review expectations with the team if we don't hear feedback.
This is looking good!
What is needed to land it?
```gherkin
    so that the Agent is able to communicate with them

@start-fleet-server
Scenario Outline: Deploying the <os> fleet-server agent enables Fleet Server
```
This is surely because of my lack of understanding of how fleet-server works, but I don't find this phrasing very clear: there is no such thing as a "fleet-server agent", right? There is an Elastic Agent that can start a Fleet Server process: shouldn't this be in the scenario definition?
Since "Fleet" is such a ubiquitous term, it might be worth being more explicit every time it is used: is it the Fleet API, the Fleet UI, etc.?
Related: since this spec is meant to be semi-formal, and we put a lot of effort into standardisation elsewhere (linting, etc.), it would be good to define components canonically. E.g., always "Elastic Agent", and not a combination of "Elastic Agent", "Agent", "the agent", etc.
I strongly agree with this. 💯
What about sending the refactors in a follow-up? Let's create an issue to standardise the names across the Fleet test suite. Maybe @EricDavisX can help with the wording.
Related to the name, yes. It could be rephrased. Let me add a suggestion.
Another note about this PR vs. a separate one... we have a new feature file, which is helpful, but to me it is mostly helpful in confirming we have the code right here, with a straightforward usage. We'll need to adapt (now or in a separate PR) all of the:

```gherkin
Given a "<os>" agent is deployed to Fleet with "tar" installer
```

to

```gherkin
When a "<os>" agent is deployed to Fleet with "tar" installer in fleet-server mode
```

...because any non-Fleet-Server usage (in Fleet) will not be supported. The only non-Fleet-Server usage will be the stand-alone Agent mode.
So, all of the following feature files will need review and update, at least in some way (listing them out explicitly):

- e2e/_suites/fleet/features/agent_endpoint_integration.feature
- e2e/_suites/fleet/features/backend_processes.feature
- e2e/_suites/fleet/features/linux_integration.feature
- e2e/_suites/fleet/features/fleet_mode_agent.feature

The stand_alone_agent file is the exception:

- e2e/_suites/fleet/features/stand_alone_agent.feature

We have an opportunity there to improve the stand-alone test to include Docker usage with Fleet Server. And it may be a great and easy way to spin up a 2nd Agent to connect to the first (the first Agent also running the Fleet Server, and the 2nd being a 'normal' Agent).
So, knowing this now, I'm not sure if we want to keep this first PR small and merge it so we have some passing scenario, knowing the others will fail, or if we'll want to push forward with a slightly larger impact across more files after we see it working.
e2e/_suites/fleet/fleet.go (Outdated)

```go
	}

	err = installer.InstallFn(containerName, token)
	var fleetConfig *FleetConfig
```
I didn't actually try it, but I think it would be a bit simpler to have just one kind of config and one way of creating it, e.g.: `func NewAgentConfig(token string, fleetServerMode bool) (*AgentConfig, error)`
Credentials, URL, and port can also be hardcoded in the only place they are used.
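A minimal sketch of that single-constructor idea, assuming the stack's usual test values for the hardcoded defaults (all names here are illustrative, not the refactor that actually landed):

```go
package main

import (
	"errors"
	"fmt"
)

// AgentConfig is the single config type suggested above. Field names and the
// hardcoded values are assumptions for illustration.
type AgentConfig struct {
	EnrollmentToken string
	FleetServerMode bool
	User, Password  string // hardcoded in the only place they are used
	Host            string
	Port            int
}

// NewAgentConfig is the one way of creating a config, covering both regular
// agents and the agent that bootstraps the Fleet Server.
func NewAgentConfig(token string, fleetServerMode bool) (*AgentConfig, error) {
	if token == "" && !fleetServerMode {
		return nil, errors.New("an enrollment token is required to enroll an agent")
	}
	return &AgentConfig{
		EnrollmentToken: token,
		FleetServerMode: fleetServerMode,
		User:            "elastic",
		Password:        "changeme",
		Host:            "kibana",
		Port:            5601,
	}, nil
}

func main() {
	cfg, err := NewAgentConfig("<token>", false)
	fmt.Println(cfg, err)
}
```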
Sure, let me send a follow-up commit with that, thanks!
Implemented in f5a9f46
@EricDavisX @adam-stokes @jalvz This PR is failing because the way we are bootstrapping the initial elastic-agent is wrong. We would need an example of the canonical command that does that:

```gherkin
When I want to bootstrap the fleet server in a host
Then I run the FOO command with FLAGS and VALUES
```

As a follow-up, once this is merged and passing, we'd like to add scenarios for another agent using the bootstrapped fleet-server. Something like:

```gherkin
Given a fleet-server already bootstrapped
When I enroll a second agent in Fleet using the fleet-server
Then the agent is "online" in Fleet
```
According to the documentation, we should be starting up the environment, copying over the https://www.elastic.co/guide/en/fleet/7.12/run-elastic-agent-standalone.html
That should do the bootstrapping and allow us to continue. We will also have to work around elastic/beats#24950 until that is fixed as well.
@adam-stokes You should not need a custom
There is a lot to unpack here; I wanted to finally chime in with some thoughts. On this comment from you, @mdelapenya, to @jalvz, I wanted to clarify some bits:

I think the libbeat code is not the same as the Agent, and we can't test all or much of Agent by hitting libbeat itself or any libbeat/beat (only) installer.

Indeed, there are many different artifacts, all of which we want to test. Docker is just ONE way to run the stand-alone Agent, but any artifact can be used with a config that does NOT include Fleet usage, and it will be a 'stand-alone' mode Agent. We are using the Docker container as our stand-alone test because it was easy to do so (and we just happened to code it up first, as such). We could expand the Docker container tests to NOT be stand-alone mode, and this would be a nice enhancement, but we have so many priorities that it has not gotten high on the list yet.

I also wanted to call out my prior comment that all Agents (except stand-alone mode Agents) will need to connect to a Fleet Server to send data. Coming ASAP, that will be the requirement, so we'll have to consider making each test (except for stand-alone mode, if it's easier to leave it alone for now) a Fleet-Server-using test. The implication there, too, is that the 'real' test for Fleet Server is to have a second container or Agent connect to a separate Fleet Server (Agent), to include that communication part of the architecture (which was formerly in Kibana but is now on the edge, on the host, in Fleet Server, running as part of Agent).
I have not seen it working end to end yet, but we're close, and I think we're just waiting on a new build from Infra, so by the time this is read on Wednesday, Apr 14, it should work! Unless there are more bugs.

So, the command I know to bootstrap the Agent, including the Fleet Server process, using the 'install' command is the first one sketched below; then, with a healthy Fleet Server (Agent) running, the command to install non-Fleet-Server Agents to use it is the second.

NOTE: if we are using insecure communications between the Agent(s) and Fleet Server, we can remove the 's' in the https and use the extra parameter at the end of the command. Also, there is a new Fleet-level param to be set in a Kibana API. We should already be using this, but it can be extended to include the Fleet Server URL (planning ahead for whichever host we want to use - if we don't do this, no other host can connect).
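A sketch of those two invocations, written as Go argument lists in the style this suite uses to run commands in containers. `--fleet-server-es` is confirmed later in this thread, while `--url`, `--enrollment-token`, and `--insecure` are assumptions about the elastic-agent CLI for these snapshot builds:

```go
package main

import (
	"fmt"
	"strings"
)

// bootstrapArgs builds the install command for the first Agent, which also
// starts the Fleet Server process. The flag name follows the rename noted
// below in this thread (--fleet-server became --fleet-server-es).
func bootstrapArgs(esURL string) []string {
	return []string{
		"elastic-agent", "install", "--force",
		"--fleet-server-es", esURL, // e.g. http://elasticsearch:9200
	}
}

// enrollArgs builds the install command for a regular Agent that connects to
// the already-bootstrapped Fleet Server. --insecure is the assumed extra
// parameter for plain-http communications mentioned in the NOTE above.
func enrollArgs(fleetServerURL, enrollmentToken string) []string {
	return []string{
		"elastic-agent", "install", "--force",
		"--url", fleetServerURL, // e.g. http://fleet-server:8220
		"--enrollment-token", enrollmentToken,
		"--insecure",
	}
}

func main() {
	fmt.Println(strings.Join(bootstrapArgs("http://elasticsearch:9200"), " "))
	fmt.Println(strings.Join(enrollArgs("http://fleet-server:8220", "<token>"), " "))
}
```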
The enrollment token for the Elastic Agent not running Fleet Server should be for the Default policy, not the Fleet Server default policy.
Ah, thank you, Blake! I updated the note in the line above.
* master:
  chore: add debug info for the payload (elastic#1044)
  chore: add debug traces for the webhook payload (elastic#1043)
  fix: wrong interpolation (elastic#1042)
  Update Elastic Agent to not use Kibana (elastic#1036)
  fix: apply X version for non-master branches (elastic#1037)
  fix: add NodeJS to PATH (elastic#1035)
  fix: use an agent when building kibana (elastic#1030)
  fix(jjb): use a branch that exists (elastic#1029)
  remove uninstall step (elastic#1017)
  fix: delay checking stale agent version until it's used (elastic#1016)
  fix: use same JJB than in custom kibana (elastic#1010)
  chore: simplify PR template (elastic#1011)
  feat: support passing KIBANA_VERSION (elastic#905)
  [mergify] assign the original author (elastic#1009)
  Remove the agent config file parameters for stand alone (elastic#983)
  Uniquify the stand-alone step for checking agent status (elastic#993)
I've seen this error when running the following command:
I think you mean
Also, running this command starts the agent, but not bootstrapped:

```shell
/elastic-agent/elastic-agent install \
  --force --fleet-server-insecure-http --fleet-server-host elasticsearch --fleet-server-port 9200
```
The last build had a check-in to change the parameter from fleet-server to fleet-server-es; that much I know. I am trying to gather communication about pending/coming changes, but it is a challenging effort, as we didn't have Infra builds to validate with until today.
@mdelapenya @EricDavisX is correct, it is now `--fleet-server-es`.
```diff
@@ -15,5 +15,6 @@ xpack.fleet.enabled: true
 xpack.fleet.registryUrl: http://package-registry:8080
 xpack.fleet.agents.enabled: true
 xpack.fleet.agents.elasticsearch.host: http://elasticsearch:9200
+xpack.fleet.agents.fleetServerEnabled: true
 xpack.fleet.agents.kibana.host: http://kibana:5601
```
You might need `xpack.fleet.agents.fleet_server.hosts: [http://?:5601]` instead here with the most recent builds. Probably worth rerunning.
* chore: capture Fleet's default policy in a stronger manner
* chore: support passing the field for is_default policy
* chore: remove inferred type for array
* chore: enable fleet server in kibana config
* chore: create fleet config struct

  This type will hold information about Fleet config, supporting building the proper flags during enrollment

* chore: refactor enroll command logic to use the new struct
* chore: check if the fleet-server field exists when retrieving the policy
* chore: refactor install to support fleet-server

  The flags used for installing/enrolling an agent will be generated from the new FleetConfig struct. Because of that, we are moving a pointer to that fleet config to the install command

* feat: add first scenario for fleet server
* chore: add fleet server branch to the CI
* chore: set Then clause for the scenario
* chore: remove step
* fix: define fallback when checking agent status
* chore: simplify creation of Fleet configs
* fix: forgot to rename variable
* WIP
* chore: rename scenario
* fix: wrong merge conflicts resolution
* chore: support passing environment when running a command in a container
* chore: run elastic agent commands passing an env
* WIP
* chore: separate bootstrapping an agent from connecting to a fleet server agent
* fix: use proper fleet-server flags

Co-authored-by: Adam Stokes <[email protected]>
Co-authored-by: Manuel de la Peña <[email protected]>
(cherry picked from commit a150734)
* master:
  v2 refactor (elastic#1008)
  fix: use a version of kibana with the fix for fleet-server validations (elastic#1055)
  feat: add first scenario for Fleet Server (elastic#900)
  fix: do not use GT_REPO variable, use fixed repo name instead (elastic#1049)
What does this PR do?
This PR adds a feature file for Fleet Server, adding one scenario checking that when an agent is deployed with Fleet Server mode, then Fleet Server is enabled.
To allow that, we had to do some refactors, explained below:

- Enabled Fleet Server in the Kibana config: `xpack.fleet.agents.fleetServerEnabled: true`.
- Supported the `is_default` and `is_default_fleet_server` attributes in the policy.
- Created a FleetConfig struct with a `flags()` method returning an array of flags to use in the install/enroll commands. Finally, this structure will be passed to the different commands to be reused, especially the `EnrollmentToken`.
- Refactored the `deployAgentWithInstaller` method, so that it's reusable for existing scenarios and the new one for Fleet Server.

It's important to notice that the final step in the new scenario is not finished: it's returning a `Pending` error. This is because, at the moment of sending this PR, it's not clear to me how to check that Fleet Server has been enabled/deployed successfully.
Why is it important?
It will bring the first scenario for fleet server, also improving code base health after refactors.
Checklist

- (run `make notice` in the proper directory)

Author's Checklist
How to test this PR locally
Related issues