Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ingest ingest then create index #39607

Merged

Conversation

jakelandis
Copy link
Contributor

@jakelandis jakelandis commented Mar 2, 2019

Execute ingest node pipeline before creating the index

Prior to this commit (and after 6.5.0), if an ingest node changes
the _index in a pipeline, the original target index would be created.
For daily indexes this could create an extra, empty index per day.

This commit changes the TransportBulkAction to execute the ingest node
pipeline before attempting to create the index. This ensures that the
only index created is the original or one set by the ingest node pipeline.
This was the execution order prior to 6.5.0 (#32786).

The execution order was changed in 6.5 to better support default pipelines.
Specifically the execution order was changed to be able to read the settings
from the index meta data. This commit also includes a change in logic such
that if the target index does not exist when ingest node pipeline runs, it
will now pull the default pipeline (if one exists) from the settings of the
best matched of the index template.

Relates #32786
Relates #32758
Closes #36545


There is some history here...

Originally ingest node executed before any creation of indices. We introduced default pipelines and the discovered #32758 (the first doc in an index failed to get run through the pipeline). So we fixed that via #32758 (changes to create the index first). However, that fix introduced #36545 (comment) (note - the title is misleading) where now if the pipeline re-directs the _index it still creates the original _index resulting in an empty index. This is particularly problematic for daily indices that always re-directed, since it would result in an empty index per day. This (re-directing daily indices) is also how upgrading monitoring templates work, and thus this PR is a blocker to #39336.

The fix here is to go back to the original order where ingest node executes before creation of indexes, but solve #32758 differently. Instead of creating the index and always reading the index meta data, if the index does not exist we can now do the index template matching to look for the default pipeline.

@jakelandis jakelandis added WIP :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP v7.0.0 v8.0.0 v7.2.0 labels Mar 2, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-features

@@ -91,4 +91,4 @@ teardown:
get:
index: test
id: 3
- match: { found: false }
Copy link
Contributor Author

@jakelandis jakelandis Mar 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is removed because with this change the test index is no longer created since the index creation only happens after ingest (and this ingest drops the index request, thus no index ).

if (IngestService.NOOP_PIPELINE_NAME.equals(defaultPipeline) == false) {
hasIndexRequestsWithPipelines = true;
}
} else if (indexRequest.index() != null) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving the ingest node execution before the create index is the primary change of this PR.

However, to allow the prevent re-introducing #32758, we need to look up the default pipeline from the matched templates which is what this code block does.

@@ -460,7 +462,7 @@ public void testUseDefaultPipelineWithBulkUpsert() throws Exception {
verifyZeroInteractions(transportService);
}

public void testCreateIndexBeforeRunPipeline() throws Exception {
public void testDoExecuteCalledTwiceCorrectly() throws Exception {
Copy link
Contributor Author

@jakelandis jakelandis Mar 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is valid, but it's name was slightly misleading (it still passed with my change to run pipeline before the create index)

--------------------------------------------------
// TESTRESPONSE
////

Copy link
Contributor Author

@jakelandis jakelandis Mar 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these were removed because now that the ingest happens before index creation, if the index doesn't exist yet, and we only drop a document, the index never exists, and thus the response is different.

@martijnvg
Copy link
Member

@jakelandis I checked this out and the approach in the pr looks good to me.

@jakelandis jakelandis marked this pull request as ready for review March 5, 2019 15:45
@jakelandis
Copy link
Contributor Author

@martijnvg @original-brownbear thanks for taking a look.

I forgot to pull the PR out of draft mode, could I get a review approval or LGTM (assuming this is good to go).

@original-brownbear
Copy link
Member

@jakelandis giving it a thorough review now then, won't take long :)

Copy link
Member

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one problem, otherwise LGTM

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jakelandis jakelandis merged commit 66ec358 into elastic:master Mar 6, 2019
jakelandis added a commit to jakelandis/elasticsearch that referenced this pull request Mar 7, 2019
Prior to this commit (and after 6.5.0), if an ingest node changes
the _index in a pipeline, the original target index would be created.
For daily indexes this could create an extra, empty index per day.

This commit changes the TransportBulkAction to execute the ingest node
pipeline before attempting to create the index. This ensures that the 
only index created is the original or one set by the ingest node pipeline. 
This was the execution order prior to 6.5.0 (elastic#32786). 

The execution order was changed in 6.5 to better support default pipelines. 
Specifically the execution order was changed to be able to read the settings
from the index meta data. This commit also includes a change in logic such 
that if the target index does not exist when ingest node pipeline runs, it 
will now pull the default pipeline (if one exists) from the settings of the 
best matched of the index template. 

Relates elastic#32786
Relates elastic#32758 
Closes elastic#36545
jakelandis added a commit to jakelandis/elasticsearch that referenced this pull request Mar 7, 2019
Prior to this commit (and after 6.5.0), if an ingest node changes
the _index in a pipeline, the original target index would be created.
For daily indexes this could create an extra, empty index per day.

This commit changes the TransportBulkAction to execute the ingest node
pipeline before attempting to create the index. This ensures that the 
only index created is the original or one set by the ingest node pipeline. 
This was the execution order prior to 6.5.0 (elastic#32786). 

The execution order was changed in 6.5 to better support default pipelines. 
Specifically the execution order was changed to be able to read the settings
from the index meta data. This commit also includes a change in logic such 
that if the target index does not exist when ingest node pipeline runs, it 
will now pull the default pipeline (if one exists) from the settings of the 
best matched of the index template. 

Relates elastic#32786
Relates elastic#32758 
Closes elastic#36545
jakelandis added a commit to jakelandis/elasticsearch that referenced this pull request Mar 7, 2019
Prior to this commit (and after 6.5.0), if an ingest node changes
the _index in a pipeline, the original target index would be created.
For daily indexes this could create an extra, empty index per day.

This commit changes the TransportBulkAction to execute the ingest node
pipeline before attempting to create the index. This ensures that the 
only index created is the original or one set by the ingest node pipeline. 
This was the execution order prior to 6.5.0 (elastic#32786). 

The execution order was changed in 6.5 to better support default pipelines. 
Specifically the execution order was changed to be able to read the settings
from the index meta data. This commit also includes a change in logic such 
that if the target index does not exist when ingest node pipeline runs, it 
will now pull the default pipeline (if one exists) from the settings of the 
best matched of the index template. 

Relates elastic#32786
Relates elastic#32758 
Closes elastic#36545
jakelandis added a commit that referenced this pull request Mar 7, 2019
Prior to this commit (and after 6.5.0), if an ingest node changes
the _index in a pipeline, the original target index would be created.
For daily indexes this could create an extra, empty index per day.

This commit changes the TransportBulkAction to execute the ingest node
pipeline before attempting to create the index. This ensures that the 
only index created is the original or one set by the ingest node pipeline. 
This was the execution order prior to 6.5.0 (#32786). 

The execution order was changed in 6.5 to better support default pipelines. 
Specifically the execution order was changed to be able to read the settings
from the index meta data. This commit also includes a change in logic such 
that if the target index does not exist when ingest node pipeline runs, it 
will now pull the default pipeline (if one exists) from the settings of the 
best matched of the index template. 

Relates #32786
Relates #32758 
Closes #36545
jakelandis added a commit that referenced this pull request Mar 7, 2019
Prior to this commit (and after 6.5.0), if an ingest node changes
the _index in a pipeline, the original target index would be created.
For daily indexes this could create an extra, empty index per day.

This commit changes the TransportBulkAction to execute the ingest node
pipeline before attempting to create the index. This ensures that the 
only index created is the original or one set by the ingest node pipeline. 
This was the execution order prior to 6.5.0 (#32786). 

The execution order was changed in 6.5 to better support default pipelines. 
Specifically the execution order was changed to be able to read the settings
from the index meta data. This commit also includes a change in logic such 
that if the target index does not exist when ingest node pipeline runs, it 
will now pull the default pipeline (if one exists) from the settings of the 
best matched of the index template. 

Relates #32786
Relates #32758 
Closes #36545
jakelandis added a commit that referenced this pull request Mar 8, 2019
Prior to this commit (and after 6.5.0), if an ingest node changes
the _index in a pipeline, the original target index would be created.
For daily indexes this could create an extra, empty index per day.

This commit changes the TransportBulkAction to execute the ingest node
pipeline before attempting to create the index. This ensures that the 
only index created is the original or one set by the ingest node pipeline. 
This was the execution order prior to 6.5.0 (#32786). 

The execution order was changed in 6.5 to better support default pipelines. 
Specifically the execution order was changed to be able to read the settings
from the index meta data. This commit also includes a change in logic such 
that if the target index does not exist when ingest node pipeline runs, it 
will now pull the default pipeline (if one exists) from the settings of the 
best matched of the index template. 

Relates #32786
Relates #32758 
Closes #36545
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP v6.7.0 v7.0.0-rc2 v7.2.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

If an ingest node changes the target index, the original index is still created.
5 participants