
Initial data stream commit #53666

Merged (15 commits, Mar 20, 2020)

Conversation

@martijnvg (Member) commented Mar 17, 2020

This commit adds a data stream feature flag, an initial definition of a data stream, and
stubs for the data stream create, delete, and get APIs. Simple serialization tests and a
REST test exercising the data stream API stubs are also added.

This is a large amount of code and mainly mechanical, but it should be
straightforward to review, because there isn't any real logic.

The data stream transport and REST actions are behind the data stream feature flag and
are only initialized if the feature flag is enabled. The feature flag is enabled if
Elasticsearch is built as a snapshot, or in a release build when the
'es.datastreams_feature_flag_registered' system property is set.

The integ-test-zip sets the feature flag when building a release build; otherwise the
REST tests would fail.

Relates to #53100
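
A minimal sketch of the feature-flag gate described above (the class and method names are illustrative assumptions; only the system property name comes from this description):

```java
/**
 * Sketch of the feature-flag logic described in the PR description.
 * Only the system property name is taken from the description; the class
 * and method names here are assumptions for illustration.
 */
public final class DataStreamsFeatureFlagSketch {

    /** Set via -Des.datastreams_feature_flag_registered=true for release builds. */
    private static final boolean FLAG_REGISTERED =
        Boolean.parseBoolean(System.getProperty("es.datastreams_feature_flag_registered", "false"));

    private DataStreamsFeatureFlagSketch() {}

    /**
     * Snapshot builds enable the data stream APIs automatically; release builds
     * additionally require the system property to be set.
     */
    public static boolean isEnabled(boolean snapshotBuild) {
        return snapshotBuild || FLAG_REGISTERED;
    }
}
```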

@martijnvg martijnvg added the :Data Management/Data streams Data streams and their lifecycles label Mar 17, 2020
@elasticmachine (Collaborator)

Pinging @elastic/es-core-features (:Core/Features/Data streams)

Before this commit the data stream APIs were indices based actions,
but data streams aren't indices: data streams encapsulate indices
without being indices themselves. A data stream is a cluster level attribute, and
therefore a cluster based action fits best for now.

Perhaps in the future we will have data stream based actions, and
then those would be the right fit for the data stream CRUD APIs.
@danhermann (Contributor) left a comment:

LGTM!

super(in);
this.name = in.readString();
this.timestampFieldName = in.readString();
}
Contributor:

Do we need to read the optional indices field here for creating a data stream with existing indices?

@martijnvg (Member Author):

For now this API will only create new, empty indices.
When we add the ability to add existing indices to a data stream, we can add an optional indices field here. Does that make sense?
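
If that optional indices field is added later, a sketch of how the request serialization could carry it (not code from this PR; the field name and the boolean-guard pattern are assumptions):

```java
import java.io.IOException;

import org.elasticsearch.common.io.stream.StreamInput;
import org.elasticsearch.common.io.stream.StreamOutput;

/**
 * Sketch only: how an optional "indices" field could be serialized alongside
 * the name and timestamp field once existing indices can be added to a data
 * stream. The field name and guard pattern are assumptions, not part of this PR.
 */
public class CreateDataStreamRequestSketch {

    private final String name;
    private final String timestampFieldName;
    private final String[] indices; // optional, may be null

    public CreateDataStreamRequestSketch(StreamInput in) throws IOException {
        this.name = in.readString();
        this.timestampFieldName = in.readString();
        // A boolean guard keeps the field optional on the wire.
        this.indices = in.readBoolean() ? in.readStringArray() : null;
    }

    public void writeTo(StreamOutput out) throws IOException {
        out.writeString(name);
        out.writeString(timestampFieldName);
        out.writeBoolean(indices != null);
        if (indices != null) {
            out.writeStringArray(indices);
        }
    }
}
```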

@henningandersen (Contributor) left a comment:

I added a number of comments, mostly on naming and locations, otherwise looking good.

@@ -0,0 +1,31 @@
{
"cluster.create_data_stream":{
Contributor:

I wonder if this name should be data_streams.create, to be consistent with indices.create?

Other cluster operations seem to be mostly about the cluster state and cluster settings.

@martijnvg (Member Author):

After chatting with Henning, I also came to the conclusion that the data stream APIs should be considered indices based operations. (Ideally we should have a data_stream notion, but we need to think more about this and about how it would work with security.) So I will revert the commit in this PR, changing the data stream CRUD APIs back to indices based APIs.
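
As a sketch of the resulting naming, the action names would use an indices: prefix rather than the cluster: prefix shown earlier; the indices:admin/data_stream/ prefix appears in a later diff in this PR, while the exact delete and get suffixes below are assumptions by analogy with create:

```java
/**
 * Sketch of indices based action names after the revert described above.
 * The indices:admin/data_stream/ prefix appears in a later diff in this PR;
 * the delete/get suffixes are assumptions by analogy.
 */
public final class DataStreamActionNamesSketch {
    public static final String CREATE = "indices:admin/data_stream/create";
    public static final String DELETE = "indices:admin/data_stream/delete";
    public static final String GET = "indices:admin/data_stream/get";

    private DataStreamActionNamesSketch() {}
}
```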

Member:

The component template APIs are cluster operations; I suspect those may need to be changed also. Can you explain the reasoning and how it will relate to security?

@martijnvg (Member Author):

Just like indices based APIs, data stream APIs target a namespace. So someone may be allowed to create a data stream for logs-* but not for events-*.

Ideally there should be a data stream privilege that handles this correctly, because data streams aren't indices and a data stream may have indices that don't share the data stream name as a common prefix. But for now let's stick with indices: based action names and group the APIs in the indices client, until there is a better understanding of how security and data streams should integrate.

I think component templates should remain cluster based APIs, because those resources don't apply to a namespace. The index templates (based on the specified pattern) that use component templates should be index based actions/operations, since they apply to an index namespace and soon also to a data stream namespace.

However, index templates are currently treated as a cluster privilege (see ClusterPrivilegeResolver line 196 in master), even though the action names start with the indices: prefix. With data streams we should properly fix this.

Member:

Okay, I originally had the component template APIs using indices: to match the existing template infrastructure, but I ended up changing it because that requires the request to provide the indices it is going to apply to, and component templates don't apply to indices. +1 to properly fixing the template exception with data streams.

@@ -0,0 +1,26 @@
{
"cluster.delete_data_stream":{
Contributor:

data_streams.delete?

@@ -0,0 +1,33 @@
{
"cluster.get_data_streams":{
Contributor:

data_streams.get?

  cluster.create_data_stream:
    name: data-stream2
    body:
      timestamp_field_name: "@timestamp"
Contributor:

I think in most places where we ask for a field we use just "field" and not "field_name". I did not search all usages, but I think this should be just timestamp_field.

* specific language governing permissions and limitations
* under the License.
*/
package org.elasticsearch.action.admin.cluster.datastream;
Contributor:

I think we should move this outside the cluster package? So datastream is a sibling to indices.

We might also want to follow the same substructure by adding a create package now? Probably easiest to split into subpackages early.

public class CreateDataStreamAction extends ActionType<AcknowledgedResponse> {

public static final CreateDataStreamAction INSTANCE = new CreateDataStreamAction();
public static final String NAME = "cluster:admin/data_stream/create";
Contributor:

I wonder whether we should add a specific prefix here? No need to look at this now; we can pick that up later.

@@ -1185,6 +1188,36 @@ public DeleteStoredScriptRequestBuilder prepareDeleteStoredScript(){
public DeleteStoredScriptRequestBuilder prepareDeleteStoredScript(String id){
return prepareDeleteStoredScript().setId(id);
}

Contributor:

Same story as for the action names. We should consider whether to include this under indices or under a new "client" interface. I will add it to our weekly sync; this should not block merging.

import java.util.List;
import java.util.Objects;

public final class DataStream extends AbstractDiffable<DataStream> implements ToXContentObject {
Contributor:

Rename to DataStreamMetaData (like IndexMetaData)?

@martijnvg (Member Author):

Data stream definitions will be stored in a custom metadata (like ComponentTemplateMetadata). Those classes tend to have the Metadata suffix, and therefore I think adding a MetaData suffix here would be confusing.

Maybe we can rename this to DataStreamSource? Similar to how stored scripts are modeled (those are also stored inside a custom metadata).

Contributor:

OK. I think I imagined a more native integration into the MetaData (outside Custom). I think the primary purpose of Custom is extensibility (x-pack/plugins/modules)? In particular, I think MetaData.getAliasAndIndexLookup will need some support for data streams? It could still look at the Custom metadata pieces, but that feels second-class.

Let us leave this as is for now and tackle the above when we get to adding the metadata and doing the lookups.

@martijnvg (Member Author):

> I think MetaData.getAliasAndIndexLookup will need some support for data streams?

We can add a getter on the metadata that reads data streams from a custom, and then it is as if they are stored as a primary field. The reason I think we should do this is that we get serialization, diffability and versioning for free, without changing the Metadata class.

> It could still look at the Custom metadata pieces, but that feels second-class.

Perhaps we should rename this in the future. Scripts, pipelines and other primary concepts are also stored as custom metadata, and those concepts aren't second class either.
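
A rough sketch of the getter idea, written as it might sit on the metadata class; DataStreamMetadata and the "data_stream" custom key below are hypothetical names used only for illustration, not code from this PR:

```java
// Sketch only, not code from this PR. A getter like this on the metadata class
// would read the data streams out of a custom section, so callers can treat
// them as if they were a first-class field, while the custom keeps providing
// serialization, diffing and versioning. DataStreamMetadata and the
// "data_stream" key are hypothetical names.
public Map<String, DataStream> dataStreams() {
    DataStreamMetadata dataStreamMetadata = custom("data_stream");
    return dataStreamMetadata == null
        ? Collections.emptyMap()
        : dataStreamMetadata.dataStreams();
}
```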

@martijnvg martijnvg changed the base branch from feature/data-streams to master March 20, 2020 07:35
@martijnvg (Member Author):

I've changed the base branch to be the master branch instead of the data stream feature branch.

@henningandersen (Contributor) left a comment:

LGTM.

Left a few minor optional comments to consider.

-    return actionName.startsWith("cluster:") || actionName.startsWith("indices:admin/template/");
+    return actionName.startsWith("cluster:") ||
+        actionName.startsWith("indices:admin/template/") ||
+        actionName.startsWith("indices:admin/data_stream/");
Contributor:

Maybe add a todo to help the next reader until resolved? Something like:

// todo: hack until we implement security of data_streams
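
Applied to the check in the diff above, the suggestion would look roughly like this (a sketch of the suggestion, not the merged code):

```java
return actionName.startsWith("cluster:") ||
    actionName.startsWith("indices:admin/template/") ||
    // TODO: hack until we implement security of data_streams
    actionName.startsWith("indices:admin/data_stream/");
```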


* specific language governing permissions and limitations
* under the License.
*/
package org.elasticsearch.rest.action.admin.cluster;
Contributor:

I think the rest classes should move to org.elasticsearch.rest.action.admin.datastreams? Or org.elasticsearch.rest.action.admin.indices.datastream?

@martijnvg (Member Author):

Yes, I missed that.

@martijnvg martijnvg merged commit 1204608 into elastic:master Mar 20, 2020
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Mar 23, 2020