[APM] Allow kibana to collect APM telemetry in background task #52917
2e3e83f
to
a959378
Compare
Pinging @elastic/es-security (:Security/Authorization)
@ogupte For posterity, I'm doing the following calls:
The latter is to check for the presence of APM ML jobs. That could also be done via a Kibana API IIRC, but I guess it would face the same restrictions.
Does this PR need to add privileges to |
a959378
to
c8f2608
Compare
@elasticmachine test this please
You're right @legrego, I amended this change by adding those privileges as well.
I left a question about the index names. Otherwise, LGTM
RoleDescriptor.IndicesPrivileges.builder()
    .indices(".apm-custom-link").privileges("all").build(),
// APM telemetry queries APM & ML anomalies indices in kibana task runner
RoleDescriptor.IndicesPrivileges.builder()
    .indices("apm-*")
Most of the indices referenced in this file are prefixed with a dot. So just wanna make sure this one is indeed without it.
Yes, it's intentionally referring to the APM data indices.
@ogupte In a previous discussion that you were part of (see #50051 (comment)), Brandon laid out the design of security privileges.
According to it, system roles such as `kibana_system`, which are used by Kibana system tasks/jobs for internal housekeeping, must not be granted privileges against end-user data. The administrator must at all times be in control of which users and services have access to the end user's data.
If a system task needs to deal with user data, it must get consent from the user and afterwards obtain API keys on the user's behalf (see #52886). End users should be able to revoke the system user's access to their data at any time.
IIUC this PR grants the `kibana_system` role privileges on user data, going against the restrictions described above.
I believe this PR grants system users access to user data, which goes against best practices.
RoleDescriptor.IndicesPrivileges.builder()
    .indices(".apm-custom-link").privileges("all").build(),
// APM telemetry queries APM & ML anomalies indices in kibana task runner
RoleDescriptor.IndicesPrivileges.builder()
    .indices("apm-*")
    .privileges("read", "read_cross_cluster", "view_index_metadata").build(),
RoleDescriptor.IndicesPrivileges.builder()
    .indices(".ml-anomalies-*")
    .privileges("read", "read_cross_cluster", "view_index_metadata").build(),
I also don't think the `kibana_system` user should access ML data directly, but I need more context.
Instead I propose, similarly to the above, that the APM Kibana system user call the ML APIs using the credentials of the user (via an API key). That's because ML users do get access to the `.ml-anomalies*` indices, which indicates that the `.ml-anomalies*` indices contain user data.
Maybe @droberts195 has a better suggestion for how APM should access ML data?
@albertzaharovits As far as I understand @kobelb was OK with this exception, see e.g. elastic/kibana#50757 (comment). I can't find anything more formal or explicit than this, so maybe Brandon can confirm here just to be sure.
After quite a few discussions regarding this topic, I came to the conclusion that allowing the `kibana_system` role to read from the `apm-*` indices only for collecting telemetry was tolerable from the security perspective. The telemetry data is aggregate in nature, and unauthorized end-users will be unable to see these aggregates. Since `apm-*` is a "data index" which users have complete control over, there's the risk that these indices contain documents that aren't created by the APM server for use within the APM application in Kibana, which could potentially lead to other bugs. I can only speculate on the likelihood and impact of these other bugs.
I don't believe I was part of a prior discussion regarding `.ml-anomalies*`. From the security perspective, as long as we're only using this for telemetry it also seems tolerable for the aforementioned reasons. I'll defer to @droberts195 on the risk of other bugs which might occur from Kibana reading from these indices directly.
From elastic/kibana#50757 (comment) it looks like the requirement is to find out:
- whether the cluster has APM-specific ML indices
But actually, given the way the check is being done (see https://github.com/elastic/kibana/pull/51612/files#diff-876010cd4a64003f0f8c1ded433c1c72R371), it is finding out whether indices whose names match `.ml-anomalies-*-high_mean_response_time` contain at least one document.
The output in the telemetry is this:
integrations: {
  ml: {
    has_anomalies_indices: boolean
  }
}
Is the real requirement to find out if the ML jobs created by APM have done any work? It seems wrong to be doing this via some internal implementation detail that could change (and that could return an incorrect answer with the current implementation if somebody has created some other job, not related to APM, whose job ID ends with `-high_mean_response_time`).
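The false-positive risk described above can be illustrated with a minimal sketch. It uses a simplified stand-in for index-name wildcard resolution (not Elasticsearch's own matcher), and both index names are hypothetical, chosen only to show that the pattern cannot tell an APM-created job from an unrelated one.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class AnomalyIndexPatternSketch {
    // Index pattern quoted in the discussion.
    static final String APM_ANOMALIES_PATTERN = ".ml-anomalies-*-high_mean_response_time";

    // Simplified wildcard match where '*' stands for any run of characters.
    // This is an illustration, not the matcher Elasticsearch uses internally.
    static boolean matches(String wildcard, String name) {
        String regex = Arrays.stream(wildcard.split("\\*", -1))
            .map(Pattern::quote)
            .collect(Collectors.joining(".*"));
        return Pattern.matches(regex, name);
    }

    public static void main(String[] args) {
        // Hypothetical index names, for illustration only.
        String apmIndex = ".ml-anomalies-custom-opbeans-node-high_mean_response_time";
        String unrelatedIndex = ".ml-anomalies-custom-billing-high_mean_response_time";
        System.out.println(matches(APM_ANOMALIES_PATTERN, apmIndex));
        System.out.println(matches(APM_ANOMALIES_PATTERN, unrelatedIndex));
    }
}
```

Both names match, so a check based on this pattern alone cannot distinguish the two cases.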
Since the APM telemetry code seems to know the job IDs of ML jobs APM might have created, it would be better to just use the get job stats API. That would mean giving the `kibana_system` role the `manage_ml` cluster privilege, but that's what we plan to do anyway for the "ML in Spaces" project, so it wouldn't hurt if you did it now. You could look for `data_counts.processed_record_count > 0` in the job stats for the APM jobs to see if they've been used.
Even if you really do want to report on ML indices (the internal implementation detail) rather than ML jobs (the public interface), I am not convinced it's appropriate to grant the `read_cross_cluster` privilege. Would APM ever really need to access ML indices in a different cluster?
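As a rough sketch of the `data_counts.processed_record_count > 0` check suggested above: the map below is an assumed stand-in for an already-parsed get job stats response (job ID to processed record count), and the class and method names are hypothetical. No HTTP call or Elasticsearch client is involved.

```java
import java.util.Map;

class ApmMlJobUsageSketch {
    // Returns true when at least one job has processed a record, i.e. when
    // data_counts.processed_record_count > 0 for any of the given jobs.
    // The map is a hypothetical stand-in for the parsed job stats response.
    static boolean anyJobUsed(Map<String, Long> processedRecordCountByJobId) {
        return processedRecordCountByJobId.values().stream()
            .anyMatch(count -> count > 0);
    }

    public static void main(String[] args) {
        // Hypothetical job IDs and counts, for illustration only.
        Map<String, Long> stats = Map.of(
            "opbeans-node-high_mean_response_time", 0L,
            "opbeans-java-high_mean_response_time", 1200L);
        System.out.println(anyJobUsed(stats));
    }
}
```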
I also added elastic/kibana#50757 (comment) on the Kibana issue.
@albertzaharovits I think we want to avoid requiring our users to initiate background tasks like this (IIUC this is necessary even with the linked PR) because we want a solution that works out of the box. We previously discussed API keys but decided against it for this reason.
I do agree it should be a temporary solution for the reasons you've mentioned, but I'm not sure which solution that would be. For now we would accept data loss if the user has reconfigured their indices.
> @albertzaharovits I think we want to avoid requiring our users to initiate background tasks like this (IIUC this is necessary even with the linked PR) because we want a solution that works out of the box. We previously discussed API keys but decided against it for this reason.
@dgieselaar This is not entirely accurate. For example, using secondary authentication (see #52093; I now realize I made a mistake in the above "#52886 and #52886"), the query against the `apm-*` indices can be made by the end user currently interacting with the APM UI. Also, if changing the `apm-*` index pattern is at all possible, it should also be feasible to obtain an ES API key for the configured index pattern for the user doing the configuration (although I don't have any idea about the multi-tenancy aspect of the data in the apm indices).
Another point is that there is an `apm_user` system user already, and it's not clear to me why the same privileges have to be assigned to the kibana system user as well.
Ultimately it seems I lack the context, and the PR description does not provide it, so I feel I'm not the person to review the changes. Although I'm open to learning about it and contributing options, that's not necessary, and we can rely on the reviews from @kobelb and @legrego since Kibana is doing most of the authorization around APM anyway.
I think further thought needs to be given to how we enable solutions like APM to report their telemetry data. Per elastic/kibana#50757 (comment), @TinaHeiligers and the Pulse team will be taking this into consideration when fleshing out the improvements which they're going to be making to our existing telemetry infrastructure.
For the time being, using Kibana to access the hard-coded `apm-*` indices is tolerable, but far from ideal.
> indeed we only want to know whether any APM jobs have been created
If you don't care whether they've been used or not, just literally whether they've been created, then the most appropriate API would be get jobs. You can choose one of these two ways to use it:
- Call it with your wildcard ID pattern and check whether you get an empty array back. Empty array => no APM jobs, non-empty array => APM jobs exist.
- Call it with your wildcard and `?allow_no_match=false`. Then 404 status => no APM jobs and 200 status => APM jobs exist.
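The two options above could be interpreted roughly as follows. This is a sketch under stated assumptions: the wildcard `*-high_mean_response_time` is a guess based on the job ID suffix mentioned earlier in the thread, the class and method names are hypothetical, and the request itself is left out so that only the decision logic is shown.

```java
class ApmMlJobsExistSketch {
    // Assumed wildcard for APM-created job IDs; the thread only states the suffix.
    static final String APM_JOB_ID_WILDCARD = "*-high_mean_response_time";

    // Builds the get jobs request path for the wildcard, with the
    // allow_no_match query parameter mentioned in the second option.
    static String getJobsPath(boolean allowNoMatch) {
        return "_ml/anomaly_detectors/" + APM_JOB_ID_WILDCARD
            + "?allow_no_match=" + allowNoMatch;
    }

    // Option 1: plain wildcard request; an empty result array means no APM jobs.
    static boolean jobsExistFromCount(int returnedJobCount) {
        return returnedJobCount > 0;
    }

    // Option 2: request with allow_no_match=false; the HTTP status decides.
    static boolean jobsExistFromStatus(int httpStatus) {
        if (httpStatus == 200) {
            return true;
        }
        if (httpStatus == 404) {
            return false;
        }
        // Any other status is neither "exists" nor "missing"; surface it.
        throw new IllegalStateException("Unexpected status: " + httpStatus);
    }
}
```

The second option avoids parsing the response body at all, at the cost of treating a 404 as a normal outcome rather than an error.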
I've addressed your suggestions in the PR @droberts195, thanks!
@ogupte @legrego @ywangd @droberts195 I'm now using Without any index privileges, it is not able to read anything from the If I add a |
Allows the kibana user to collect APM telemetry in a background task.
…system` reserved role
c8f2608
to
efa0963
Compare
Oh, of course, the |
… (#54106) * Required for elastic/kibana#50757. Allows the kibana user to collect APM telemetry in a background task. * removed unnecessary priviledges on `.ml-anomalies-*` for the `kibana_system` reserved role
Required for elastic/kibana#50757. Allows the kibana user to collect APM telemetry in a background task by giving the `kibana_system` reserved role the `read`, `read_cross_cluster`, `view_index_metadata` privileges on `apm-*`