
Initial data stream commit #53666

Merged (15 commits, Mar 20, 2020)

Conversation

@martijnvg (Member) commented Mar 17, 2020

This commit adds a data stream feature flag, an initial definition of a data stream, and
stubs for the data stream create, delete, and get APIs. Simple serialization tests and a
REST test exercising the data stream API stubs are also added.

This is a large amount of code and mainly mechanical, but it should be
straightforward to review, because there isn't any real logic.

The data stream transport and REST actions are behind the data stream feature flag and
are only initialized if the feature flag is enabled. The feature flag is enabled if
Elasticsearch is built as a snapshot, or in a release build when the
'es.datastreams_feature_flag_registered' system property is set.

The integ-test-zip sets the feature flag when building a release build; otherwise the
REST tests would fail.

Relates to #53100
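
A minimal sketch of the feature-flag gate described above (the class and method names are illustrative assumptions; only the system property name comes from this description):

```java
/**
 * Sketch of the feature-flag logic described in the PR description.
 * Only the system property name is taken from the description; the class
 * and method names here are assumptions for illustration.
 */
public final class DataStreamsFeatureFlagSketch {

    /** Set via -Des.datastreams_feature_flag_registered=true for release builds. */
    private static final boolean FLAG_REGISTERED =
        Boolean.parseBoolean(System.getProperty("es.datastreams_feature_flag_registered", "false"));

    private DataStreamsFeatureFlagSketch() {}

    /**
     * Snapshot builds enable the data stream APIs automatically; release builds
     * additionally require the system property to be set.
     */
    public static boolean isEnabled(boolean snapshotBuild) {
        return snapshotBuild || FLAG_REGISTERED;
    }
}
```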

@martijnvg martijnvg added the :Data Management/Data streams Data streams and their lifecycles label Mar 17, 2020
@elasticmachine (Collaborator)

Pinging @elastic/es-core-features (:Core/Features/Data streams)

Before this commit the data stream APIs were indices based actions,
but data streams aren't indices: data streams encapsulate indices
without being indices themselves. A data stream is a cluster level attribute, and
therefore a cluster based action fits best for now.

Perhaps in the future we will have data stream based actions, and
then those would be the right fit for the data stream CRUD APIs.
@danhermann (Contributor) left a comment:

LGTM!

super(in);
this.name = in.readString();
this.timestampFieldName = in.readString();
}
Contributor:

Do we need to read the optional indices field here for creating a data stream with existing indices?

@martijnvg (Member Author):

For now this API will only create new, empty indices.
When we add the ability to add existing indices to a data stream, we can add an optional indices field here. Does that make sense?
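
If that optional indices field is added later, a sketch of how the request serialization could carry it (not code from this PR; the field name and the boolean-guard pattern are assumptions):

```java
import java.io.IOException;

import org.elasticsearch.common.io.stream.StreamInput;
import org.elasticsearch.common.io.stream.StreamOutput;

/**
 * Sketch only: how an optional "indices" field could be serialized alongside
 * the name and timestamp field once existing indices can be added to a data
 * stream. The field name and guard pattern are assumptions, not part of this PR.
 */
public class CreateDataStreamRequestSketch {

    private final String name;
    private final String timestampFieldName;
    private final String[] indices; // optional, may be null

    public CreateDataStreamRequestSketch(StreamInput in) throws IOException {
        this.name = in.readString();
        this.timestampFieldName = in.readString();
        // A boolean guard keeps the field optional on the wire.
        this.indices = in.readBoolean() ? in.readStringArray() : null;
    }

    public void writeTo(StreamOutput out) throws IOException {
        out.writeString(name);
        out.writeString(timestampFieldName);
        out.writeBoolean(indices != null);
        if (indices != null) {
            out.writeStringArray(indices);
        }
    }
}
```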

@henningandersen (Contributor) left a comment:

I added a number of comments, mostly on naming and locations, otherwise looking good.

@@ -0,0 +1,31 @@
{
"cluster.create_data_stream":{
Contributor:

I wonder if this name should be data_streams.create, to be consistent with indices.create?

Other cluster operations seem to be mostly about the cluster state and cluster settings.

@martijnvg (Member Author):

After chatting with Henning, I also came to the conclusion that the data stream APIs should be considered indices based operations. (Ideally we should have a data_stream notion, but we need to think more about this and about how it would work with security.) So I will revert the commit in this PR, changing the data stream CRUD APIs back to indices based APIs.
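
As a sketch of the resulting naming, the action names would use an indices: prefix rather than the cluster: prefix shown earlier; the indices:admin/data_stream/ prefix appears in a later diff in this PR, while the exact delete and get suffixes below are assumptions by analogy with create:

```java
/**
 * Sketch of indices based action names after the revert described above.
 * The indices:admin/data_stream/ prefix appears in a later diff in this PR;
 * the delete/get suffixes are assumptions by analogy.
 */
public final class DataStreamActionNamesSketch {
    public static final String CREATE = "indices:admin/data_stream/create";
    public static final String DELETE = "indices:admin/data_stream/delete";
    public static final String GET = "indices:admin/data_stream/get";

    private DataStreamActionNamesSketch() {}
}
```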

Member:

The component template APIs are cluster operations; I suspect those may need to be changed also. Can you explain the reasoning and how it will relate to security?

@martijnvg (Member Author):

Just like indices based APIs, data stream APIs target a namespace. So someone may be allowed to create a data stream for logs-* but not for events-*.

Ideally there should be a data stream privilege that handles this correctly, because data streams aren't indices and a data stream may have indices that don't share the data stream name as a common prefix. But for now let's stick with indices: based action names and group the APIs in the indices client, until there is a better understanding of how security and data streams should integrate.

I think component templates should remain cluster based APIs, because those resources don't apply to a namespace. The index templates (based on the specified pattern) that use component templates should be index based actions/operations, since they apply to an index namespace and soon also to a data stream namespace.

However, index templates are currently treated as a cluster privilege (see ClusterPrivilegeResolver line 196 in master), even though the action names start with the indices: prefix. With data streams we should properly fix this.

Member:

Okay, I originally had the component template APIs using indices: to match the existing template infrastructure, but I ended up changing it because that requires the request to provide the indices it is going to apply to, and component templates don't apply to indices. +1 to properly fixing the template exception with data streams.

@@ -0,0 +1,26 @@
{
"cluster.delete_data_stream":{
Contributor:

data_streams.delete?

@@ -0,0 +1,33 @@
{
"cluster.get_data_streams":{
Contributor:

data_streams.get?

  cluster.create_data_stream:
    name: data-stream2
    body:
      timestamp_field_name: "@timestamp"
Contributor:

I think in most places where we ask for a field we use just "field" and not "field_name". I did not search all usages, but I think this should be just timestamp_field.

* specific language governing permissions and limitations
* under the License.
*/
package org.elasticsearch.action.admin.cluster.datastream;
Contributor:

I think we should move this outside the cluster package? So datastream is a sibling to indices.

We might also want to follow the same substructure by adding a create package now? Probably easiest to split into subpackages early.

public class CreateDataStreamAction extends ActionType<AcknowledgedResponse> {

public static final CreateDataStreamAction INSTANCE = new CreateDataStreamAction();
public static final String NAME = "cluster:admin/data_stream/create";
Contributor:

I wonder whether we should add a specific prefix here? No need to look at this now; we can pick that up later.

@@ -1185,6 +1188,36 @@ public DeleteStoredScriptRequestBuilder prepareDeleteStoredScript(){
public DeleteStoredScriptRequestBuilder prepareDeleteStoredScript(String id){
return prepareDeleteStoredScript().setId(id);
}

Contributor:

Same story as for the action names. We should consider whether to include this under indices or under a new "client" interface. I will add it to our weekly sync; this should not block merging.

import java.util.List;
import java.util.Objects;

public final class DataStream extends AbstractDiffable<DataStream> implements ToXContentObject {
Contributor:

Rename to DataStreamMetaData (like IndexMetaData)?

@martijnvg (Member Author):

Data stream definitions will be stored in a custom metadata (like ComponentTemplateMetadata). Those classes tend to have the Metadata suffix, and therefore I think adding a MetaData suffix here would be confusing.

Maybe we can rename this to DataStreamSource? Similar to how stored scripts are modeled (those are also stored inside a custom metadata).

Contributor:

OK. I think I imagined a more native integration into the MetaData (outside Custom). I think the primary purpose of Custom is extensibility (x-pack/plugins/modules)? In particular, I think MetaData.getAliasAndIndexLookup will need some support for data streams? It could still look at the Custom metadata pieces, but that feels second-class.

Let us leave this as is for now and tackle the above when we get to adding the metadata and doing the lookups.

@martijnvg (Member Author):

> I think MetaData.getAliasAndIndexLookup will need some support for data streams?

We can add a getter on the metadata that reads data streams from a custom, and then it is as if they are stored as a primary field. The reason I think we should do this is that we get serialization, diffability and versioning for free, without changing the Metadata class.

> It could still look at the Custom metadata pieces, but that feels second-class.

Perhaps we should rename this in the future. Scripts, pipelines and other primary concepts are also stored as custom metadata, and those concepts aren't second class either.
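
A rough sketch of the getter idea, written as it might sit on the metadata class; DataStreamMetadata and the "data_stream" custom key below are hypothetical names used only for illustration, not code from this PR:

```java
// Sketch only, not code from this PR. A getter like this on the metadata class
// would read the data streams out of a custom section, so callers can treat
// them as if they were a first-class field, while the custom keeps providing
// serialization, diffing and versioning. DataStreamMetadata and the
// "data_stream" key are hypothetical names.
public Map<String, DataStream> dataStreams() {
    DataStreamMetadata dataStreamMetadata = custom("data_stream");
    return dataStreamMetadata == null
        ? Collections.emptyMap()
        : dataStreamMetadata.dataStreams();
}
```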

@martijnvg martijnvg changed the base branch from feature/data-streams to master March 20, 2020 07:35
@martijnvg (Member Author):

I've changed the base branch to be the master branch instead of the data stream feature branch.

@henningandersen (Contributor) left a comment:

LGTM.

Left a few minor optional comments to consider.

-    return actionName.startsWith("cluster:") || actionName.startsWith("indices:admin/template/");
+    return actionName.startsWith("cluster:") ||
+        actionName.startsWith("indices:admin/template/") ||
+        actionName.startsWith("indices:admin/data_stream/");
Contributor:

Maybe add a todo to help the next reader until resolved? Something like:

// todo: hack until we implement security of data_streams
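
Applied to the check in the diff above, the suggestion would look roughly like this (a sketch of the suggestion, not the merged code):

```java
return actionName.startsWith("cluster:") ||
    actionName.startsWith("indices:admin/template/") ||
    // TODO: hack until we implement security of data_streams
    actionName.startsWith("indices:admin/data_stream/");
```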


* specific language governing permissions and limitations
* under the License.
*/
package org.elasticsearch.rest.action.admin.cluster;
Contributor:

I think the rest classes should move to org.elasticsearch.rest.action.admin.datastreams? Or org.elasticsearch.rest.action.admin.indices.datastream?

@martijnvg (Member Author):

Yes, I missed that.

@martijnvg martijnvg merged commit 1204608 into elastic:master Mar 20, 2020
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Mar 23, 2020