Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ILM policy PUT and GET for remote_monitoring_agent built-in role #57963

Conversation

ycombinator
Copy link
Contributor

@ycombinator ycombinator commented Jun 11, 2020

What does this PR do?

This PR adds the cluster:admin/ilm/get and cluster:admin/ilm/put privilege to the remote_monitoring_agent built-in role.

Why is this change necessary?

The remote_monitoring_agent built-in role is intended for users who wish to monitor their Elastic Stack with Metricbeat. One of the checks that Metricbeat does upon start up is for the existence of an ILM policy. As such, this role should allow this check to happen.

Without this fix, users who try to use Metricbeat for Stack Monitoring today see the following error repeatedly in their Metricbeat log. Due to this error Metricbeat is unwilling to proceed further and, thus, no Stack Monitoring data is indexed into the Elasticsearch cluster.

2020-06-10T17:10:37.707-0700    ERROR   [publisher_pipeline_output]     pipeline/output.go:154  Failed to connect to backoff(elasticsearch(http://localhost:9200)): Connection marked as failed because the onConnect callback failed: failed to check for policy name 'metricbeat': (status=403) {"error":{"root_cause":[{"type":"security_exception","reason":"action [cluster:admin/ilm/get] is unauthorized for user [remote_monitoring_user]"}],"type":"security_exception","reason":"action [cluster:admin/ilm/get] is unauthorized for user [remote_monitoring_user]"},"status":403}: 403 Forbidden: {"error":{"root_cause":[{"type":"security_exception","reason":"action [cluster:admin/ilm/get] is unauthorized for user [remote_monitoring_user]"}],"type":"security_exception","reason":"action [cluster:admin/ilm/get] is unauthorized for user [remote_monitoring_user]"},"status":403}

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-security (:Security/Authorization)

@elasticmachine elasticmachine added the Team:Security Meta label for security team label Jun 11, 2020
@@ -73,6 +73,7 @@
"cluster:monitor/xpack/watcher/watch/get",
"cluster:admin/xpack/watcher/watch/put",
"cluster:admin/xpack/watcher/watch/delete",
GetLifecycleAction.NAME,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's preferable, if possible, to use privilege names, such as read_ilm in this case, instead of explicit action names. This offers us the maximum of flexibility, as we don't have to worry about BWC when changing action names.

@albertzaharovits
Copy link
Contributor

Is it a specific ILM policy that Metricbeat requires access to (I think so, but want to make sure)?
In this case, in the future (see #50130), we should consider limiting access only to the specific policy, not to all of them.

@ycombinator
Copy link
Contributor Author

Is it a specific ILM policy that Metricbeat requires access to (I think so, but want to make sure)?
In this case, in the future (see #50130), we should consider limiting access only to the specific policy, not to all of them.

Yes, it is a specific policy, governed by this setting: https://www.elastic.co/guide/en/beats/metricbeat/current/ilm.html#setup-ilm-policy_name-option. By default the name of the policy is metricbeat.

@albertzaharovits albertzaharovits self-requested a review June 11, 2020 13:05
Copy link
Contributor

@albertzaharovits albertzaharovits left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ycombinator
LGTM, but I have two questions please:

@albertzaharovits albertzaharovits self-assigned this Jun 11, 2020
@lcawl
Copy link
Contributor

lcawl commented Jun 11, 2020

@ycombinator I think more than just read_ilm cluster privileges are required since when I tested this on 7.6.2, I subsequently got this message:

2020-06-11T15:49:30.389-0700 ERROR pipeline/output.go:100 Failed to connect to backoff(elasticsearch(http://localhost:9202)): Connection marked as failed because the onConnect callback failed: 403 Forbidden: {"error":{"root_cause":[{"type":"security_exception","reason":"action [cluster:admin/ilm/put] is unauthorized for user [my-tester]"}],"type":"security_exception","reason":"action [cluster:admin/ilm/put] is unauthorized for user [my-tester]"},"status":403}
2020-06-11T15:49:30.389-0700 INFO pipeline/output.go:93 Attempting to reconnect to backoff(elasticsearch(http://localhost:9202)) with 4 reconnect attempt(s)

I had to either give my user the manage_ilm cluster privilege or else put setup.ilm.enabled: false in my metricbeat.yml file to make those errors go away.

@ycombinator
Copy link
Contributor Author

ycombinator commented Jun 11, 2020

Thanks @albertzaharovits and @lcawl for your comments.

is this breaking Metricbeat in 7.7 ?

Yes, and in prior versions as well.

can you clarify what/who PUTs the policy in the first place (and maintains/updates)? Is it the kibana_system user, or something that ships with ES from #57629 .

I had to either give my user the manage_ilm cluster privilege or else put setup.ilm.enabled: false in my metricbeat.yml file to make those errors go away.

Users who use Metricbeat for Stack Monitoring can be divided into two classes:

  • those who are using Metricbeat just for Stack Monitoring. These users don't need ILM at all, because the Stack Monitoring modules in Metricbeat write to .monitoring-* indices, which don't use ILM today. So such users can use the remote_monitoring_user built-in user (which uses the remote_monitoring_agent role), set setup.ilm.enabled: false in their metricbeat.yml and everything should work. Metricbeat will simply skip the ILM policy check and setup as well.
  • those who are using Metricbeat for Stack Monitoring but also for monitoring other services. These users do need ILM because non-Stack-Monitoring modules in Metricbeat write to metricbeat-* indices, and Metricbeat manages the index template and ILM policy for these indices. If such users are using the remote_monitoring_user built-in user which is intended for Stack Monitoring, they will see errors about ILM setup because the role tied to that user, remote_monitoring_agent does not have ILM-related privileges. That's what this PR is trying to address.

So I think the options become:

  1. Leave the built-in remote_monitoring_user and remote_monitoring_agent roles as-is. Don't give them additional privileges. Recommend to users in Stack Monitoring documentation that users can use the built-in user IF they are using Metricbeat just for Stack Monitoring. Otherwise, they must create a new user that uses the built-in remote_monitoring_agent role but also another role that grants the manage_ilm cluster privilege.
  2. Add the manage_ilm cluster privilege to the remote_monitoring_agent built-in role, thereby granting it to the remote_monitoring_user built-in user as well.

Option 1 makes the "getting started" user experience a bit messier but option 2 makes the built-in remote_monitoring_agent role and remote_monitoring_user user less secure.

Any opinions on one over the other, @albertzaharovits or @lcawl?

@albertzaharovits
Copy link
Contributor

Thanks for the explanation @ycombinator !

I think we should go with option 2

Add the manage_ilm cluster privilege to the remote_monitoring_agent built-in role, thereby granting it to the remote_monitoring_user built-in user as well.

My thinking is that the remote_monitoring_agent built-in role already has quite extensive privileges with manage_index_templates and manage_ingest_pipelines . The issue is that this role (and by extension the builtin remote_monitoring_user) can wreak havoc on all of ingestion in the cluster, it is not confined to monitoring ingestion. manage_ilm adds the added risk of data deletion because ILM has a delete index action. We don't have fine grained privileges for ILM , although we sorely need it, but the privs we currently have are a rough approximation that should only detract from the effective Security and not from the user experience.


But I don't think we should ship such a change in a patch release. From ES perspective this is an "enhancement" not a "bug". It's a bug from a use case perspective. Not even all ES bug fixes are ported to patch releases, if they change behaviour.

I don't feel strongly about it, but I feel it's not my decision to make. If you feel otherwise can you please rope in someone that's more involved in the release process, maybe a tech lead.

@albertzaharovits
Copy link
Contributor

@ycombinator We discussed this inside the team this morning, and they've changed my mind.
We will backport this in a patch release (most likely 7.8.1 , but I'll backport to the 7.7 branch as well, just to be safe).

I feel discomfort whenever we have to grant such a liberal privilege for built-in roles, especially since it's required only during the initial setup, but this is simply our only option atm. I was reluctant to force this upon users upgrading the patch version, but @jkakavas convinced me that this sounds like a bug for users and we must find reasons NOT to backport bugs to patches, not the other way around.

@albertzaharovits albertzaharovits changed the title Add cluster:admin/ilm/get privilege for remote_monitoring_agent built-in role Add ILM policy PUT and GET for remote_monitoring_agent built-in role Jun 15, 2020
@albertzaharovits
Copy link
Contributor

@ycombinator I have taken the liberty to push to your PR branch in order to make up for the lost time with these discussions. I have modified the role to explicitly grant privs for PUT and GET ILM policy actions (instead of the manage_ilm); I went against my own recommendation to use privilege names because narrowing the privs in this way makes me feel just a tiny bit better.

@albertzaharovits
Copy link
Contributor

@elasticmachine update branch

@albertzaharovits albertzaharovits merged commit 4e4b831 into elastic:master Jun 15, 2020
albertzaharovits added a commit that referenced this pull request Jun 15, 2020
…57963)

Without this fix, users who try to use Metricbeat for Stack Monitoring today
see the following error repeatedly in their Metricbeat log. Due to this error
Metricbeat is unwilling to proceed further and, thus, no Stack Monitoring
data is indexed into the Elasticsearch cluster.

Co-authored-by: Albert Zaharovits <[email protected]>
albertzaharovits added a commit to albertzaharovits/elasticsearch that referenced this pull request Jun 15, 2020
…lastic#57963)

Without this fix, users who try to use Metricbeat for Stack Monitoring today
see the following error repeatedly in their Metricbeat log. Due to this error
Metricbeat is unwilling to proceed further and, thus, no Stack Monitoring
data is indexed into the Elasticsearch cluster.

Co-authored-by: Albert Zaharovits <[email protected]>
albertzaharovits added a commit that referenced this pull request Jun 15, 2020
…57963)

Without this fix, users who try to use Metricbeat for Stack Monitoring today
see the following error repeatedly in their Metricbeat log. Due to this error
Metricbeat is unwilling to proceed further and, thus, no Stack Monitoring
data is indexed into the Elasticsearch cluster.

Co-authored-by: Shaunak Kashyap <[email protected]>
albertzaharovits added a commit that referenced this pull request Jun 15, 2020
…57963)

Without this fix, users who try to use Metricbeat for Stack Monitoring today
see the following error repeatedly in their Metricbeat log. Due to this error
Metricbeat is unwilling to proceed further and, thus, no Stack Monitoring
data is indexed into the Elasticsearch cluster.

Co-authored-by: Shaunak Kashyap <[email protected]>
@ycombinator
Copy link
Contributor Author

Sorry @albertzaharovits, got caught up in other things on Friday and then came the weekend. Thanks for making updates to this PR and merging it <3.

@ycombinator ycombinator deleted the remote-monitoring-agent-privs-add-ilm-read branch June 15, 2020 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC Team:Security Meta label for security team v7.7.2 v7.8.1 v7.9.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants