[Fleet] Define onboarding flow for fleet-server #89396

Closed
ruflin opened this issue Jan 27, 2021 · 18 comments
Labels: Feature:Fleet, Feature:fleet-server, Team:Fleet

Comments

@ruflin
Contributor

ruflin commented Jan 27, 2021

Enrolling an Elastic Agent with fleet-server requires a slightly different enrollment/setup command than enrolling a normal Elastic Agent. Especially on-prem, the first thing a user needs to do to get started with Fleet is to enroll an Elastic Agent with fleet-server, so that further Agents can be enrolled through it.

@ruflin ruflin added Feature:Fleet Fleet team's agent central management project Team:Fleet Team label for Observability Data Collection Fleet team Feature:fleet-server labels Jan 27, 2021
@elasticmachine
Contributor

Pinging @elastic/fleet (Feature:Fleet)

@elasticmachine
Contributor

Pinging @elastic/ingest-management (Team:Ingest Management)

@ruflin
Contributor Author

ruflin commented Jan 27, 2021

@ph @mostlyjason @blakerouse ^

@mostlyjason
Contributor

mostlyjason commented Jan 28, 2021

@blakerouse @ph can you help me understand a few things:

  1. What steps does the user need to execute to add his first agent, and the second one? Would be great to include command line examples and any configuration snippets needed, with constraints. It'd be great to have it in a format that we can hand to Dede for a docs page.
  2. How do these steps need to synchronize with the UI? For example, should the user click the "enable central management" button in kibana before adding his first agent? For self-managed clusters, do they also need to set the elasticsearch output URL first? I imagine they need to add a fleet server before they can enroll any other agents? Can we add smart defaults for these [Fleet] Smarter defaults for global settings on self-managed clusters #86992?
  3. What steps does the user need to configure the elastic stack, such as setting up username/password, setting permissions, configuring the fleet server URL in kibana, adding the fleet server package, etc?
  4. Do we have a Github issue for a centralized agent policy type to prevent users from accidentally adding fleet server to all their endpoints? What change do we need for the package spec to mark integrations as centralized type?
  5. For self-managed clusters, should we add a default centralized agent policy or generate one automatically?
  6. What steps does a user of 7.11 fleet need to complete to migrate his agents to fleet server? (We can treat this as a separate issue if needed)
  7. How does this work with TLS? How do we bootstrap TLS? How do we generate certificates?
  8. How does it work for apm-server and stack monitoring on self-managed agents?
  9. Any other differences I'm missing?

@ph
Contributor

ph commented Feb 10, 2021

@blakerouse and @nchaulet could you help fill in the blanks in #89396 (comment)? Maybe we already have that information somewhere else.

@blakerouse

@blakerouse @ph can you help me understand a few things:

  1. What steps does the user need to execute to add his first agent, and the second one? Would be great to include command line examples and any configuration snippets needed, with constraints. It'd be great to have it in a format that we can hand to Dede for a docs page.

In the simplest case the command-line becomes this:

./elastic-agent install -f --enrollment-token {token} --fleet-server http://elastic:changeme@localhost:9200

The http://elastic:changeme@localhost:9200 value is just an example; in practice this is the actual connection string for Elasticsearch. It must include a username and password because Fleet Server needs user-based authentication to create API keys when an Elastic Agent enrolls.

You will notice that the --kibana-url flag goes away. That is only the case when --fleet-server is present, because --fleet-server forces the Elastic Agent to connect to the locally spawned Fleet Server over localhost.
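
As a rough sketch of the Fleet Server case above, the shell helpers below check that the Elasticsearch connection string carries a username and password, then print (rather than run) the install command. The function names `has_credentials` and `fleet_server_install_cmd` are hypothetical and exist only for illustration; only the flags come from the comment above, and the credential check is a loose pattern match, not a full URL parser.

```shell
#!/bin/sh
# Hypothetical helpers; only the elastic-agent flags come from the discussion above.

# Succeeds if the connection string looks like scheme://user:password@host...
# This is a loose pattern check, not a real URL parser.
has_credentials() {
  case "$1" in
    *://?*:?*@?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints the install command for the agent that runs Fleet Server.
# $1 = enrollment token, $2 = Elasticsearch connection string
fleet_server_install_cmd() {
  if ! has_credentials "$2"; then
    echo "connection string must include a username and password" >&2
    return 1
  fi
  printf './elastic-agent install -f --enrollment-token %s --fleet-server %s\n' "$1" "$2"
}

# Example: prints the command for the example connection string used above.
fleet_server_install_cmd 'TOKEN' 'http://elastic:changeme@localhost:9200'
```

Printing the command instead of executing it keeps the sketch safe to run on a machine without Elastic Agent installed.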

To enroll an Elastic Agent without Fleet Server, it becomes:

./elastic-agent install -f --url http://{ip-address-of-fleet-server}:8000 --enrollment-token {token}

You will notice here that --kibana-url has become --url, because Elastic Agent is no longer talking to Kibana. --kibana-url still exists, but it is deprecated; --url does the same thing and takes precedence over --kibana-url if both are provided.
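
The flag behavior described above can be sketched in shell as follows. Both function names are hypothetical, the host `10.0.0.5` is a placeholder, and port 8000 is taken from the example command in this comment rather than from any documented default.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the flag behavior described above.

# Prints the URL the agent would use: --url wins over the deprecated --kibana-url.
# $1 = value of --url (may be empty), $2 = value of --kibana-url (may be empty)
effective_url() {
  if [ -n "$1" ]; then
    printf '%s\n' "$1"
  else
    printf '%s\n' "$2"
  fi
}

# Prints the install command for an agent that enrolls through an existing Fleet Server.
# $1 = Fleet Server host or IP, $2 = enrollment token
agent_install_cmd() {
  printf './elastic-agent install -f --url http://%s:8000 --enrollment-token %s\n' "$1" "$2"
}

# Example: prints the command for a placeholder Fleet Server address.
agent_install_cmd '10.0.0.5' 'TOKEN'
```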

  2. How do these steps need to synchronize with the UI? For example, should the user click the "enable central management" button in kibana before adding his first agent? For self-managed clusters, do they also need to set the elasticsearch output URL first? I imagine they need to add a fleet server before they can enroll any other agents? Can we add smart defaults for these [Fleet] Smarter defaults for global settings on self-managed clusters #86992?

Fleet Server needs a user account to authenticate with Elasticsearch. This account needs to have the correct permissions to create the required API keys.

I think the flow would be that the user first clicks the "enable central management" button, which will create this user; then, on enrollment of the first Agent, --fleet-server {connection-str} will be included in the enrollment flyout.

  3. What steps does the user need to configure the elastic stack, such as setting up username/password, setting permissions, configuring the fleet server URL in kibana, adding the fleet server package, etc?

I would hope we could get Kibana to perform these operations for the user. They could obviously be done manually, but I think it would be best to automate them with the "enable central management" button.

  4. Do we have a Github issue for a centralized agent policy type to prevent users from accidentally adding fleet server to all their endpoints? What change do we need for the package spec to mark integrations as centralized type?

I do not have the answer to this. But adding the Fleet Server integration to a policy will not break Elastic Agent. Elastic Agent will not start Fleet Server, even if that integration is in the policy, unless that Elastic Agent was installed with the --fleet-server flag.

  5. For self-managed clusters, should we add a default centralized agent policy or generate one automatically?

I think for both self-managed and cloud, we should create a default centralized agent policy. It should just include the Fleet Server integration.

  6. What steps does a user of 7.11 fleet need to complete to migrate his agents to fleet server? (We can treat this as a separate issue if needed)

I think we should discuss that in a separate issue.

  7. How does this work with TLS? How do we bootstrap TLS? How do we generate certificates?

See elastic/fleet-server#90

  9. Any other differences I'm missing?

Not that I see at the moment.

@mostlyjason
Contributor

Thanks @blakerouse!

Do we have a Github issue for a centralized agent policy type to prevent users from accidentally adding fleet server to all their endpoints? What change do we need for the package spec to mark integrations as centralized type?

Ruflin wrote up a policy types document. I think we added a managed policy type to restrict what users could change on cloud. It also describes central and regular types that would be nice to include. @ph is that already done or scheduled?

Elastic Agent will not start Fleet Server even if that integration is in the policy unless that Elastic Agent was installed with --fleet-server flag.

It'd be good to mark the agent as unhealthy and throw an error if the agent has a fleet server integration without the required flag.
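
To make the two statements above concrete, here is a minimal sketch of the decision for an agent whose policy already contains the Fleet Server integration. It models the behavior as described in this thread, not Elastic Agent's actual code; the function name is hypothetical, and the "unhealthy" outcome is only the proposal from this comment.

```shell
#!/bin/sh
# Sketch only: models the behavior discussed above, not Elastic Agent's implementation.
# Assumes the agent's policy already contains the Fleet Server integration.

# $1 = "yes" if the agent was installed with --fleet-server, anything else otherwise
fleet_server_integration_outcome() {
  if [ "$1" = "yes" ]; then
    echo "start fleet-server"
  else
    # Described behavior: Fleet Server is not started.
    # Proposed addition: also surface the mismatch as an unhealthy agent.
    echo "do not start fleet-server; mark agent unhealthy (proposed)"
  fi
}

fleet_server_integration_outcome "no"
```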

@ph
Contributor

ph commented Feb 17, 2021

Ruflin wrote up a policy types document. I think we added a managed policy type to restrict what users could change on cloud. It also describes central and regular types that would be nice to include. @ph is that already done or scheduled?

This is not scheduled or done.

But would that be a new type, different from the one described in the doc?

@mostlyjason
Contributor

@ph thanks I added the central and regular policy types to our roadmap for later. I think they are defined in Ruflin's doc.

@ph
Contributor

ph commented Feb 22, 2021

Discussion:

Do we need the button for enabling Fleet for self-managed? Yes, we keep it for 7.13.
Would Fleet Server be on by default on cloud? Yes. (APM is on by default.)
We will need to test that Fleet and Fleet Server can be turned off.

Questions to resolve:

Do we need to support adding an additional Fleet-Server? @mostlyjason

@hbharding hbharding self-assigned this Mar 1, 2021
@hbharding
Contributor

@mostlyjason can you update this issue to include a problem statement and the user stories we want to solve for? I'd like to use this as my design issue for tracking purposes. Alternatively, you can create a separate issue and link to this one. I can fill in details for deliverables.

@ph
Contributor

ph commented Mar 1, 2021

@hbharding @mostlyjason can we have only a single owner for an issue, who should be the person responsible for driving it to a conclusion?

@hbharding
Contributor

I'll take this one @ph.

A quick update: I've been meeting with Jason and Mukesh and we've explored user journeys + wireframes for providing Fleet Server setup instructions within Fleet. Due to the complexity of the steps and various use cases to consider, we're leaning towards pointing the user to documentation for instruction. This will help lessen the required engineering effort and improve our ability to provide accurate, curated information to the user.

@hbharding
Contributor

hbharding commented Mar 22, 2021

I emailed the team on March 10th requesting review of a Google document and Whimsical user flow / wireframes (screenshot below) that walk through the process of adding a Fleet Server for both cloud and self-managed deployments. There wasn't significant feedback or concern about the direction we intend to build in 7.13.

Two items came up that I'll mention:

  1. For cloud deployments, we learned that we'll be able to link directly to the cloud console's "Edit Deployment" screen, where users can enable Fleet Server if for some reason it is not enabled (https://github.com/elastic/cloud/issues/57690). I updated the user flows and mockups to account for this. Previously we were going to rely solely on documentation for this use case.
  2. We learned that the ES team is working on building Service Accounts in 7.13 which will be used to add a Fleet Server and authenticate with Elasticsearch. The Service Account for Fleet will be created automatically by ES, but users will need a way to retrieve the token/password in order to run the Fleet Server setup command. @ruflin confirmed that this retrieval process can be explained in our documentation.

image

As an overview, I'll describe the general flow for adding a Fleet Server on cloud and self-managed deployments below. Please refer to the wireframes and google doc I sent for complete details.

Cloud deployments

The process for cloud will be relatively straightforward and automated. For new 7.13 cloud deployments, Fleet Server will be enabled by default and users will be able to enroll agents. If a user chooses to have a deployment without Fleet Server, they will see a prompt in the Kibana UI to add one when they navigate to Fleet, before they can enroll agents. This prompt will link them to the "Edit Deployment" screen in their cloud console, where they can enable Fleet Server. Once enabled, Fleet will update inside Kibana and allow users to enroll agents. The Fleet Server URL for cloud will be added to the Fleet Settings flyout automatically.

Self-managed

Self-managed users will see a similar prompt to "Add a Fleet Server" when they navigate to the "Agents" section of Fleet. The prompt will link them to instructions in our documentation that walk through the steps of adding a Fleet Server. After Fleet Server connects to Elasticsearch, the agent running Fleet Server will appear in Fleet. The user will need to manually add a Fleet Server URL #89442 in the Fleet Settings flyout before they can enroll agents (this part will be explained in the documentation). Once a Fleet Server URL exists, users are able to enroll agents.

If users try to enroll an agent before Fleet Server exists, they will see the same "Add a Fleet Server" prompt in the "Add Agent" flyout. Users are still able to access instructions to run an agent in standalone mode.
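
The cloud and self-managed flows described above can be summarized as a small decision sketch. The function name, its arguments, and the output strings are all hypothetical and only paraphrase the wireframes as described in this comment; this is not Kibana's implementation.

```shell
#!/bin/sh
# Sketch only: paraphrases the onboarding flow described above, not Kibana's code.

# $1 = deployment type: "cloud" or "self-managed"
# $2 = "yes" if a Fleet Server is connected
# $3 = "yes" if a Fleet Server URL is set in Fleet Settings
fleet_agents_view() {
  if [ "$2" != "yes" ]; then
    case "$1" in
      cloud) echo "prompt: enable Fleet Server via Edit Deployment in the cloud console" ;;
      *)     echo "prompt: Add a Fleet Server (link to documentation)" ;;
    esac
  elif [ "$3" != "yes" ]; then
    # Self-managed users add the URL manually; on cloud it is added automatically.
    echo "blocked: add a Fleet Server URL in Fleet Settings"
  else
    echo "ready: agents can be enrolled"
  fi
}

fleet_agents_view "self-managed" "no" "no"
```

The sketch checks for a connected Fleet Server before the URL, which follows the order of the self-managed steps as written in this comment.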

@hbharding
Contributor

We learned recently that the Fleet Server URL needs to be set up before Fleet Server can be set up (elastic/fleet-server#145). I will update the user flow and doc.

@mostlyjason
Contributor

@hbharding we just did a test and it still works if the user sets the Fleet Server URL after installing Fleet Server

@mostlyjason
Contributor

I think the onboarding UX is now fully defined so we can close this one. I believe the team is already working on implementation.

@hbharding
Contributor

@nchaulet is working on it. I posted designs for the fleet server migration message here #95445 which has some overlap with this issue. Still waiting on feedback, but it shouldn't have much of an impact.
