
[Change Proposal] Ability to add privileges for all data streams of a specific type #315

Closed
Tracked by #151898
felixbarny opened this issue Apr 7, 2022 · 19 comments · Fixed by #327
Assignees
Labels
discuss Issue needs discussion Team:Ecosystem Label for the Packages Ecosystem team

Comments

@felixbarny
Member

felixbarny commented Apr 7, 2022

Currently, the data stream permissions are specific to the dataset that an integration defines.
For example, when adding a custom log integration, you'll have to specify the data_stream.dataset, for example foo, and an API key will be generated with the permissions to send data to logs-foo-default.

This is limiting for integrations that define a data stream that routes events to other data streams.

Example: a log line like this gets sent to the logs-ecs_router-default data stream:

```json
{
  "message": "{\"@timestamp\":\"2022-04-01T12:09:12.375Z\", \"log.level\": \"INFO\", \"message\":\"With event.dataset\", \"data_stream.dataset\": \"foo\"}"
}
```

The default ingest pipeline for logs-ecs_router-default parses the JSON within the message and uses the data_stream.dataset from the log message to route the message to the logs-foo-default data stream (by overriding the _index field).
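That routing step can be sketched as an ingest pipeline. This is only an illustration of the mechanism, not the integration's actual pipeline; the exact processor list and field handling are assumptions:

```json
{
  "description": "Sketch: route ECS-JSON log lines by their embedded data_stream.dataset",
  "processors": [
    { "json": { "field": "message", "add_to_root": true } },
    { "dot_expander": { "field": "data_stream.dataset" } },
    { "set": { "field": "_index", "value": "logs-{{data_stream.dataset}}-default" } }
  ]
}
```

With this pipeline attached to `logs-ecs_router-default`, the example document above would be indexed into `logs-foo-default`.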

The issue is that this will lead to a security exception as the API key used to ingest the data only has permissions to ingest data to logs-ecs_router-default.

This is also an issue for the Azure springcloudlogs integration: all logs are always sent to the springcloudlogs data stream, even if the logs come from different applications and thus should ideally be routed to their own data streams. Other examples are CloudWatch, k8s logs, PCF logs, and httpjson.

This relates to the discussions about input-only packages but is an independent and decoupled task.

What I'm proposing is to add a flag to the package spec that behaves similarly to dataset_is_prefix:

```yaml
dataset_is_prefix:
  description: if true, the index pattern in the ES template will contain the dataset as a prefix only
  type: boolean
  default: false
```

But instead of just adding .* to the index permissions, the flag will allow access to all data streams of a given type, such as logs-*-*.
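As a package-spec sketch in the same style as the dataset_is_prefix entry above (the flag name here is hypothetical, chosen only to illustrate the shape of the proposal):

```yaml
# Hypothetical flag name; only the behavior is being proposed
dataset_is_dynamic:
  description: if true, the API key is granted write access to all data streams of the package's type, e.g. logs-*-*
  type: boolean
  default: false
```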

@ruflin @mtojek @joshdover

@ruflin
Member

ruflin commented Apr 11, 2022

An alternative approach is using default_permissions: true. There is a default set of permissions that gives access to logs-*-*, metrics-*-*, traces-*-*. It gives broader permissions than needed but is a catch-all solution for all input packages that might also contain metrics or traces. One of the basic assumptions I make here is that if someone deploys this, the Elastic Agent is deployed in a trusted environment.

@felixbarny
Member Author

An alternative approach is using default_permissions: true. There is a default set of permissions that gives access to logs-*-*, metrics-*-*, traces-*-*.

Is this an existing flag? Where can I learn more about it? Ideally, a logging-related input integration would only have access to logs-*-*.

@ruflin
Member

ruflin commented Apr 11, 2022

Is this an existing flag?

No, I just made it up :-) Alternatively, it could be default_permissions: ["logs", "metrics"] to make it an array that is empty by default.
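Sketched side by side (both variants are made up, per the comment above, and neither exists in the spec):

```yaml
# Variant 1: boolean catch-all
default_permissions: true                 # would grant logs-*-*, metrics-*-*, traces-*-*

# Variant 2: scoped array, empty by default
default_permissions: ["logs", "metrics"]  # would grant logs-*-* and metrics-*-* only
```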

@joshdover
Contributor

Thanks for highlighting this use case. We have a few discussions this week around ingest node routing (e.g. elastic/elasticsearch#63798), and I think I can provide a more helpful response here after we've had those.

@felixbarny
Member Author

@joshdover Did you have a chance to think about this again? This issue is a blocker for full event routing and also the lightweight event routing required for an ECS logging integration (elastic/integrations#2972).

@ruflin
Member

ruflin commented Apr 29, 2022

There is currently work ongoing around input packages: #325. I expect the packages that need this feature are normally input packages, as for integration packages the data streams are mostly predefined. Now I'm torn on whether there should be a config param in the package spec itself for this or whether Fleet should do magic around it. I'm leaning towards a package spec flag, as it would allow Fleet to know in advance that a certain package requires much broader permissions and also warn the user about it.

@joshdover
Contributor

joshdover commented Apr 29, 2022

The default ingest pipeline for logs-ecs_router-default parses the JSON within the message and uses the data_stream.dataset from the log message to route the message to the logs-foo-default data stream (by overriding the _index field).

The issue is that this will lead to a security exception as the API key used to ingest the data only has permissions to ingest data to logs-ecs_router-default.

This is also an issue for the Azure springcloudlogs integration: all logs are always sent to the springcloudlogs data stream, even if the logs come from different applications and thus should ideally be routed to their own data streams. Other examples are CloudWatch, k8s logs, PCF logs, and httpjson.

I'm wondering if instead, this should actually be solved with an index / data stream setting rather than with granting Agents explicit access to every data stream that an ingest pipeline may route to. This would allow us to further decouple the data collection from the ingest processing.

Otherwise, every time a routing ingest pipeline adds or removes a new destination data stream, we'll need to update the API keys for every agent sending data to the routing data stream. Generating API keys has non-trivial performance overhead and when we're talking about 100k+ agents, rolling out an API key change could take quite a bit of time and put a large load on the cluster. Due to this delay, we'd need to also change the order of operations to update all the Agent API keys first, then update the routing ingest pipeline. Otherwise, data could be dropped.

I'm imagining a setting like index.pipeline_routing_allowlist: ["logs-*"] that could be specified on the data stream's index template, which would modify the security behavior during ingest pipeline processing to avoid the security exception.
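As an index-template sketch (the index.pipeline_routing_allowlist setting does not exist today; it is exactly the setting being imagined here, and the template name and pattern are illustrative):

```json
{
  "index_patterns": ["logs-ecs_router-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.pipeline_routing_allowlist": ["logs-*"]
    }
  }
}
```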

@felixbarny
Member Author

Otherwise, every time a routing ingest pipeline adds or removes a new destination data stream, we'll need to update the API keys

I don't think updates are required when the API keys have permissions for logs-*, as that's all it would ever need. Note that it wouldn't be possible to make the list of index names more restrictive so that it only contains patterns used by installed integrations, such as logs-nginx* when the Nginx integration is installed. That's because integrations such as the ECS logging integration, and possibly the Azure springcloudlogs integration, can route to any dataset, depending on the values in the log message.
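To illustrate why a single type-wide grant never needs updating, here is a simplified sketch of wildcard index-privilege matching. Elasticsearch's real privilege evaluation is more involved; `fnmatchcase` merely stands in for its pattern matching:

```python
from fnmatch import fnmatchcase

def api_key_allows(index_privileges, target_index):
    """Return True if any granted index pattern matches the target index."""
    return any(fnmatchcase(target_index, pattern) for pattern in index_privileges)

# An API key scoped only to the routing data stream fails for routed targets:
narrow = ["logs-ecs_router-default"]
assert not api_key_allows(narrow, "logs-foo-default")

# A type-wide grant covers any dataset/namespace the pipeline routes to,
# so the key never needs regenerating when routing destinations change:
wide = ["logs-*-*"]
assert api_key_allows(wide, "logs-foo-default")
assert api_key_allows(wide, "logs-nginx.access-production")
```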

I'm imagining a setting like index.pipeline_routing_allowlist: ["logs-*"] that could be specified on the data stream's index template which would modify the security behavior during the ingest pipeline processing to avoid the security exception.

I think that could work, too. However, it would conflate index-level settings with API key permissions, which makes it harder to reason about what you can do with a given API key. Not sure if the ES team would be happy with that.

Also, I don't see why that would be preferable over an integration being able to request permissions for logs-*. Having access to logs-router-default which has a setting like index.pipeline_routing_allowlist: ["logs-*"] would be synonymous to having a permission for logs-*.

@joshdover
Contributor

I don't think updates are required when the API keys have permissions for logs-* as that's all it would ever need.

Got it. If we don't expect to ever have finer-grained privileges for these types of packages, I guess going with a purely package-based approach is fine. An option like default_permissions would probably work, though I'd want to make the name a bit more explicit. I also don't think it should live next to dataset_is_prefix, since that flag changes more than just the permissions and this is a permissions-only feature.

What if we put this next to data_stream.elasticsearch.privileges.indices? I'm thinking a name like data_stream.elasticsearch.privileges.allow_routing, which would grant create privileges on logs-*-*, metrics-*-*, traces-*-*, synthetics-*-*.
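Sketched in a data stream manifest, next to the existing privileges block (allow_routing is only the name being proposed here, not part of the spec):

```yaml
elasticsearch:
  privileges:
    indices: [auto_configure, create_doc]  # existing per-data-stream privileges
    allow_routing: true                    # proposed: adds create on logs-*-*, metrics-*-*, traces-*-*, synthetics-*-*
```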

Felix are you comfortable opening a PR for the package-spec change? Once we have agreement on the spec change we can create the implementation tasks.

@felixbarny
Member Author

Here's the PR: #327

@ruflin ruflin assigned felixbarny and unassigned ruflin Jun 24, 2022
@felixbarny
Member Author

Hey @joshdover what's the progress on this? Is there any blocker?

@joshdover
Contributor

@felixbarny We still haven't made progress on elastic/kibana#115032 yet which is blocking this. It was a stretch goal for 8.4. @jen-huang @kpollich any progress made here?

@felixbarny
Member Author

Thanks for the update. Seems like it won't make it into 8.4 then. Could we add this as a non-stretch goal for 8.5?

@kpollich
Member

@felixbarny We still haven't made progress on elastic/kibana#115032 yet which is blocking this. It was a stretch goal for 8.4. @jen-huang @kpollich any progress made here?

We weren't able to get to this in 8.4. I'll add it to our 8.5 planning doc.

@axw
Member

axw commented Aug 16, 2022

APM Server would like to make use of this, or a variation on it, to satisfy elastic/kibana#134971

We wouldn't need to write to arbitrary datasets, only to arbitrary namespaces. Should I open a new issue for this?

@ruflin
Member

ruflin commented Sep 26, 2022

I would like to see us moving forward on this. My suggestion is to start with a simple solution that should work for apm-server but also input packages. The basic idea is that a package can opt in to the default permissions, meaning permissions for logs-*-*, metrics-*-*, traces-*-*. The assumption is that anyone running Elastic Agent with apm-server runs it in a trusted environment and we are comfortable with giving these permissions.

If this becomes a problem, we can still follow up with more fine-grained permissions. We should keep in mind that, AFAIK, standalone apm-server had more permissions in the past (including things like setting up templates).

There is a concern around the upgrade. Can we warn users if they add an integration with these permissions? Should we?

@joshdover
Contributor

+1 on a more generically useful name than allow_routing to allow for the other use cases, but I'd still prefer integrations declare which types they need access to.

There is a concern around the upgrade. Can we warn users if they add an integration with these permissions? Should we?

After our last discussions, I'm a little less concerned about expanding privileges for Agents as it hasn't historically been an issue for Beats users. We have advertised that agents will receive an API key with the minimum privileges required, but I haven't seen many user questions about this. We could consider some sort of explanation text in the integration policy upgrade flow which shows the privilege change that will happen. One problem with the APM package specifically is that this is upgraded during stack upgrade without user interaction, so there's not a chance to show this to the user. I'd argue we just defer on this problem.

On an execution note, we are starting work on the Kibana blocker to enable any variation of this feature: elastic/kibana#115032

@TheRiffRafi

I don't know if this will add to and help the discussion, but here it goes.

I just dealt with a user that is pulling Kubernetes metrics and log data through Elastic Agent. The agent monitors a cluster, but that cluster has different pods doing different work for different applications, and apparently different apps within the same pod. The user has no access to change how the agent is deployed, so they can only segment the data based on the fields they receive from the already installed agents.
One of those applications requires a longer data retention cycle than the others, but since the data comes from a single policy, they are unable to redirect some of the data to another data stream with a longer retention policy using an ingest pipeline (by setting the _index field).

Hacky workaround
In order to get around this, I had them add a custom log integration to the policy with the dataset and namespace used for the longer retention, pointing to a dummy log path.
Now the API key assigned to the policy has that data stream as part of its permissions, and the ingest pipeline redirect works.
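The redirect half of that workaround looks roughly like this set processor. The condition, field names, and target index are illustrative; the target must match the dataset/namespace declared in the dummy custom log integration:

```json
{
  "set": {
    "if": "ctx.kubernetes?.labels?.app == 'long-retention-app'",
    "field": "_index",
    "value": "logs-myapp.longretention-default"
  }
}
```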

Dropping here for considerations.

@felixbarny
Member Author

I've updated the package spec PR.

Now there are two dedicated flags: data_stream.elasticsearch.dynamic_dataset and data_stream.elasticsearch.dynamic_namespace. I think this is both more self-explanatory and more flexible than having a single flag. I agree that allow_routing is not a great name; separating the functionality into two different flags gets around that naming issue, too.
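In data stream manifest terms, the updated proposal reads roughly as follows (a sketch based on this thread; the exact spec wording is in the PR):

```yaml
elasticsearch:
  dynamic_dataset: true     # grant write access across datasets, e.g. logs-*-default
  dynamic_namespace: true   # grant write access across namespaces, e.g. logs-foo-*
```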
