[Fleet] Add fleet server URL #89442
Comments
Pinging @elastic/ingest-management (Team:Ingest Management)
Pinging @elastic/fleet (Feature:Fleet)
Thanks for creating this one @mostlyjason. Should we support two fields in the settings, Kibana URL and Fleet Server URL, in 7.13 so users get a seamless transition? A user whose agents are enrolled in Kibana could then set the Fleet Server URL to migrate them to Fleet Server.
@nchaulet that sounds reasonable. I want to understand the process for migrating from kibana to fleet server better, but we can treat that as a separate issue. |
There is also the case where multiple fleet-servers potentially exist with different URLs. For example, one could be in Cloud and the other running on prem.
@ruflin that is a good point! Are you thinking it could be per agent policy? Either way, it'd be nice to have a global default so new agent policies can be initialized automatically. We could start with that and come back to per-agent policy URLs later. |
I'm not sure I follow the above. The policy does not really matter here. An enrollment key can be used with any fleet server and all policies can be retrieved through all fleet-servers. Let's take the current enrollment screen. Currently you can select the policy to enroll an Agent into. As an advanced option, there could also be an option to select the fleet-server you want to enroll into so we can show you the correct enrollment command. Thinking of how the Kibana URL is set today, this would become more like a dropdown to select the default and maybe configure the correct host for each in case it is reported with the wrong value by fleet-server:
Interesting, the dropdown method is simple and could work as an MVP. I can think of a few use cases where it'd be advantageous to add it to the agent policy. What if the user wants to take a fleet server out for maintenance or replacement? One option is to use a load balancer to fail over to another instance. If the user doesn't have a load balancer, they'd have to reenroll all the agents. The nice thing about storing the URLs in the agent policy is that we could centrally manage them without reenrolling the agents. To take a fleet server out for maintenance, just update the policy with a new URL. This is the same way the elasticsearch URL is updated. If we had a global setting for a fleet server (or a list of them), we could add it to agent policy along with the elasticsearch URL. For a longer term use case, I imagine users wouldn't want to always default to the same fleet server in the add agent dialog because it could get overloaded. One solution is to randomize the default fleet server. Alternatively, if the operator set up agent policies for each geographic region, it could default to the matching fleet server for each region. The person adding agents doesn't need to understand geographic assignments, because there is a smart default. We may not even need a dropdown box, which would simplify the add agent dialog. Choosing a non-default fleet server could be an advanced use case. This one is a stretch, but an even cooler long term use case is that the fleet server could be assigned via a variable in the agent policy. That means the fleet server shown in the add agent dialog is just for enrollment and retrieving the initial agent policy. Then it can be reassigned dynamically. This would be great if you want to have a single script to deploy agents across multiple regions. I don't think we need this near term, just thinking of another advantage of adding it to the agent policy. Can you think of any downsides?
For the maintenance part, I think it should be the same as with Elasticsearch where you can have an array of hosts (assuming no proxy exists). This is more for the on prem use case. So the Elastic-Agent itself will switch over to an alternative url if one is not available. But this makes the drop-down trickier. We should probably discuss the use cases where users have fleet-servers in different regions and not all agents have access to all fleet-servers. I would assume at first, we can get away with just making it an array and the Elastic Agent will pick the "best" fleet server? |
@ruflin how would the elastic agent get an array of fleet server URLs? Would you pass an array in via command line parameter, or would you seed one value and deliver the rest via agent policy or an ES document? Also to pick the "best" fleet server from an array, would it start with a random one and switch to the others if it can't connect? |
I would assume that the policy would contain a list, but I have not fully thought it through yet. I missed the detail around the command line. I'm thinking of the command line as the initial connection setup, but that fleet-server could disappear over time. So as soon as the initial setup is done, the agent should rely on the content of the policy. @blakerouse probably has more / better opinions here. What "best" means is a tricky question, but round robin / random selection sounds like a good start.
Elastic Agent already supports multiple Kibana URLs to connect to Fleet. The Kibana URLs are added to the policy, so if you add a new Kibana it updates the policy and the Elastic Agent gets that new URL. Elastic Agent will round-robin connect to the Kibana instances in the policy. To the Elastic Agent there is no difference between talking to Kibana or a Fleet Server. So the same code path (round-robin, policy updating) just works for the Fleet Server in Elastic Agent. We probably need to add new configuration keys/values to the policy so we can migrate to the Fleet Server from an existing Kibana. With that new configuration the Elastic Agents can transition over to the newly deployed Fleet Servers. For enrollment Elastic Agent only connects to the Kibana/Fleet Server that is provided in the
Thanks Blake! I'm glad it already works that way. Sorry for the noise, I should have seen it in the agent policy when I looked. I just tested multiple URLs with an existing cluster. It looks like only 1 URL is shown in the add agent dialog. That's probably fine because it's just used for the initial connection setup. A dropdown could work for the use case where a user wants to choose a specific fleet server. Do we really need that for GA or can we add it later? I'd lean towards waiting for customer feedback because the user can edit the value on the command line. I assume if the first fleet server in the list is not accessible it will automatically iterate through the list until it can successfully connect to one? That will handle the use cases of taking a fleet server out for maintenance, too much load on one server, and some fleet servers not being reachable from all agents. It doesn't account for the use case of wanting to prefer specific fleet servers based on geography, but I don't think we need to handle that use case for GA.
Does the fleet server report its own URL? I assumed that cloud would pass in the proxy endpoint through |
Agree to keep it as simple / minimal as possible for the first version. As long as a user can edit these in Kibana, as they can the Kibana endpoint today, I assume things should work? I think we should keep it in Kibana like today and not move it to kibana.yml, as otherwise changes require a restart, which is not nice.
Assuming we remove Kibana URL, we can replace it with "Fleet Server URL" and make it a combobox to support multiple URLs, like we do for the ES URL. For on prem users who have yet to set up Fleet Server, I imagine this field would appear empty. When Fleet Server(s) are added, I think this field should update to include the Fleet Server URLs automatically, otherwise the user would have to know to go here to add the URL manually. |
How much extra effort is this? The simple solution is that the Fleet server would attempt to print its own URL on the installer stdout, which the user could verify and copy into the Fleet settings screen. The disadvantage is that the user has to leave the onboarding flow for an integration to go to the Fleet settings page, paste in the URL, then come back to the integration onboarding flow again. If the server could notify Kibana of the new URL, we could detect it and populate the URL in the add agent dialog automatically. The user could verify that it's correct here when adding their agent.
At the moment an enrolled Fleet Server places its IP addresses into the This might need to be extended more to add which IP address is attached to the interface that is used as the gateway for the machine. Giving that IP address precedence over the other IP addresses in the system. |
The way I read @blakerouse's comment is that Kibana already has access to the information, so it should be possible to populate this information into the "Fleet Server Url(s)" setting? From issue description:
@mostlyjason can you expand on why users should be able to specify the Fleet server URL? I'm probably missing something, but I think it makes sense for this field to be auto-populated (like it is today for Kibana URL) when Fleet Servers are added, both for cloud and self-managed. As an aside, we should also provide a better description of what these fields are used for and maybe even warn the user about making changes, since these could have a big impact on their agents. Pending the outcome (and if necessary), can you make a design issue that summarizes what needs to change? The idea of being able to specify a "default" fleet server sounded interesting.
I'm thinking of the use case where the fleet server is running behind a proxy, and the agent isn't able to determine its external URL. In this case, the auto-populated information would be incorrect. I don't see a parameter in Blake's install example that contains the external URL of Fleet server. That could be one way to populate it, the other being the UI. The advantage of exposing this in the UI is that it gives users visibility into which URLs are present in the list. @blakerouse is there already a concept of a default fleet server in the index? That might be nice because the cluster may have fleet servers in private networks that are not accessible externally, so users wouldn't necessarily want to use one of those as the default server.
Good point! Sorry I didn't understand your comment initially, but I edited my comment and this is a good point. I think we can populate more than one interface if there are multiple. The other agents can iterate through the list to find one that routes. |
I'm thinking about the concept of auto-populating the fleet server URL/IP and I wonder if there are security concerns. If we populate a DHCP IP or URL, could it be assigned to a compromised fleet server later? Ideally, it would be a static IP or URL that is accessible from the endpoints, accounts for proxies, etc. @blakerouse do we ask for user confirmation before saving it to ES and provide a way for users to edit it if needed? That could be via a parameter or an interactive prompt.
@mostlyjason At the moment we only populate the index with the list of IP addresses. I was leaving it up to Kibana to use that information to create the URL. I could see it being an administrator's job to create the initial URL presented to the users, by selecting from the list of IPs or by providing a DNS address.
I've been working on the Fleet Server onboarding UX #89396 and one of the last steps (for self-managed) is for the user to confirm the Fleet Server URL which will be used to enroll agents. I made a mockup for the Fleet Settings flyout and shared with @blakerouse and @nchaulet today. The idea is that after Fleet Server connects to ES, it will add its IP addresses to an editable list in Kibana, shown in the Fleet Settings flyout. The user will have to select and confirm a default URL to use. They can also add a DNS address to the list and choose that as the default URL. A few questions and ideas came up that we need to resolve. More detail below:
@blakerouse @nchaulet please correct me if anything I said was wrong or add more as you see fit. Also, any input on the form field descriptions would be much appreciated. I tried my best based on my own understanding. |
Few thoughts:
Yes for Cloud we want to show the publicly available, single Fleet Server URL. |
I would like to move the discussion toward simplicity. If we promote the proxy deployment scenario, this would remove a lot of complexity on our end: there is a single URL, and if you have more than one Fleet Server, you use a proxy. If your scale requires multiple fleet-servers, you probably have a proxy available to you. By doing so, the Cloud and on-premise deployments become more similar. I don't think we are in a position, or have a requirement, to auto-detect IPs or do anything magic.
In this case, configuring the Fleet Server URL will be very similar to what we currently have for the Kibana URL and ES URL, and it would be automatically configured in cloud.
Yes, after thinking about it for a day and speaking with @ph this morning, I think we should just go with exactly what we have with Kibana today, just renaming it from Kibana URL to Fleet Server URL. This would greatly simplify the implementation. With the current approach we could never be 100% sure that we were providing the correct addresses. There is the whole issue of when to update the policy based on new Fleet Servers, etc. Users already have to get the URL of elasticsearch correct and set it in the Kibana configuration on start or by updating it in the Settings flyout. Why not just make it the same for Fleet Server and remove all the magic? The magic is just going to be wrong, open up more issues, and introduce more complexity.
++
Sounds good to me! We already offer multiple kibana URLs today. This seems like a powerful feature allowing management across multiple networks and automatic failover with a relatively simple solution, no proxy required. We already pick the first URL off the list in the add agent dialog, and the agent already tries the next server until it finds one that successfully connects. What's the downside of keeping the current behavior? |
What we lose by keeping the current behavior is that the new behavior would, in some cases, have been more streamlined and required less configuration from an administrator. But I think the fact that it will not always be correct and will still require manual configuration, plus the complexity of adding it, greatly outweighs the benefits.
Here's my take at a summary of what's been discussed. I'd like to close out the items which are not clear to me and come to a decision. What's clear to me
Updated:
I'd say start simple and disable the whole input; in case we identify use cases, we could loosen the restrictions later.
I think it's best to keep the current behavior and have it be an array.
They don't really matter, because if you have a Fleet Server URL in the array, then all Agents really should be able to communicate with that Fleet Server; otherwise it does not belong in the list. Picking the first in the array keeps this simple, and it is easy for a user to understand that the dialog just shows the first URL for enrollment.
Keep the current behavior - After the user clicks "save" in "Fleet Settings", apply to all policies. |
++ on Blake's response
I just had a chat with Ruflin on Slack. We came up with an alternative idea that would explain to users the impact/risks of changing this field and ask if they want to continue in a confirmation dialog. We can also add a section to our troubleshooting guide explaining how to recover from an incorrect or missing URL by updating this field and then reenrolling agents. The advantages are that it still allows cloud users to add their own fleet servers, it would also benefit self-managed users, and it reduces the amount of special code we need for cloud. We can apply this same behavior to the ES output field as well. |
For the cloud part, how will we get the URL? Currently we get the ES and Kibana URLs from the cloudId; are we going to have this too? (@simitt do you know how it's done for APM Server currently?)
This thread is getting long so I added a summary at the top based on the latest conversation so we can move towards a clear definition. Please let me know if you have concerns with that approach. |
Thanks Jason - I just updated my previous comment / summary.
Thanks everyone, I like where this is going! ++++ to @mostlyjason for updating the description.
Currently, our global output settings in Fleet list a Kibana URL. With the new Fleet server, we need a way for users to specify the fleet server URL.
Requirements
Updated 2020-03-10
Match current behavior for populating the URL
On ESS/ECE, the fleet server URL will be automatically populated by cloud to make it easy to get started. For self-managed clusters, users must manually populate this URL in Kibana's Fleet settings when they are setting up Fleet server. We decided not to attempt to magically fill in the IP/DNS name at this time because this is error-prone and adds complexity. Manual entry allows the user to confirm the URL is correct, assign a static URL, account for proxy servers, etc. This is the current behavior, so no change.
Match current behavior for multiple URLs
The user can set multiple fleet server URLs, which is also the current behavior. The first URL in the list is used to populate the add agent dialog. The Elastic Agent will connect to this URL when installing, then download the rest of the URLs in the agent policy. When a fleet server is added or removed, the agent policies are updated automatically. The Elastic Agent will iterate through URLs until it connects to one successfully. This allows for automatic failover and subnets.
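The list behavior described above can be illustrated with a minimal TypeScript sketch. All names here (`enrollmentUrl`, `firstReachable`, `ConnectProbe`) and the example hosts are hypothetical and are not taken from Kibana or Elastic Agent; the connectivity check is passed in as a plain function so the selection logic stays separate from any real networking.

```typescript
// Hypothetical sketch: these names are illustrative, not Kibana or Elastic Agent APIs.

/** Reports whether a connection to the given URL succeeds (supplied by the caller). */
type ConnectProbe = (url: string) => boolean;

/** The add agent dialog shows the first URL in the list, if there is one. */
function enrollmentUrl(fleetServerUrls: string[]): string | undefined {
  return fleetServerUrls.length > 0 ? fleetServerUrls[0] : undefined;
}

/** Walk the list in order and return the first URL that accepts a connection. */
function firstReachable(
  fleetServerUrls: string[],
  canConnect: ConnectProbe
): string | undefined {
  for (const url of fleetServerUrls) {
    if (canConnect(url)) {
      return url;
    }
  }
  return undefined; // no Fleet Server in the list was reachable
}

// Example: the first server is down for maintenance, so the second is used.
const exampleUrls = [
  "https://fleet-a.example.com:8220",
  "https://fleet-b.example.com:8220",
];
const offline = ["https://fleet-a.example.com:8220"];
const shownInDialog = enrollmentUrl(exampleUrls);
const connectedTo = firstReachable(exampleUrls, (url) => offline.indexOf(url) === -1);
```

In this example the dialog would still show the first URL, while the connection attempt falls through to the second one. The thread also mentions round-robin and random selection as possible strategies; this sketch only shows in-order iteration.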
Add a confirmation dialog for changes
One new feature we'd like to add is a confirmation dialog when the user changes the Fleet server URL. If this field is incorrect, there is a risk that agents will lose connection to Fleet and will need to be manually reenrolled. A confirmation dialog will explain these risks and allow the user to proceed if desired. We can use this dialog in both ESS/ECE and self-managed use cases. We can also add a section to our troubleshooting guide explaining how to reset the URL if needed, for both cloud and self-managed use cases. We will not completely lock the field or build cloud-specific logic at this time. If we see many users running into a problem, we can add more guardrails later.
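As a rough illustration of the confirmation step, the sketch below only prompts when the URL list actually changed. The function names, the dialog wording, and the `confirm`/`save` callbacks are all assumptions made for this example; they do not describe Kibana's implementation. A reorder is treated as a change here because the first URL in the list is the one used for enrollment.

```typescript
// Hypothetical sketch: function and parameter names are illustrative only.

/**
 * Returns true when the edited list differs from the saved one.
 * Order counts as a change because the first URL is used for enrollment.
 */
function urlsChanged(saved: string[], edited: string[]): boolean {
  if (saved.length !== edited.length) {
    return true;
  }
  return saved.some((url, index) => url !== edited[index]);
}

/**
 * Saves the edited list, asking for confirmation first when it changed.
 * `confirm` stands in for the dialog; `save` stands in for persisting settings.
 * Returns true if the settings were saved.
 */
function saveWithConfirmation(
  saved: string[],
  edited: string[],
  confirm: (message: string) => boolean,
  save: (urls: string[]) => void
): boolean {
  if (urlsChanged(saved, edited)) {
    const message =
      "Changing the Fleet Server URL may disconnect agents, " +
      "which would then need to be reenrolled manually. Continue?";
    if (!confirm(message)) {
      return false; // user cancelled, keep the saved URLs
    }
  }
  save(edited);
  return true;
}
```

Passing the dialog and the persistence step in as callbacks keeps the decision logic easy to exercise on its own; the same helper could in principle be reused for the ES output URL mentioned as a stretch goal.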
Stretch goal: add the same confirmation dialog for the ES output URL