Indices with "_source.enabled: false" same size as indices with "_source.enabled: true" #41628
Comments
Pinging @elastic/es-search
I think the increase in size is due to the fact that we now add a _recovery_source field when _source is disabled, so that operations-based recoveries still have access to the source. In 6.7, this only happened when soft deletes were explicitly enabled, whereas in 7.0 soft deletes are enabled by default. The _recovery_source field is only pruned once it is no longer needed for recovery and the segments are merged. For this reason, I'm not sure the behavior indicates a bug. I'll tag @elastic/es-distributed to see if they think a follow-up is in order or would like to add any information.
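To make this concrete, here is a minimal sketch (not from the original report; the index name and field are illustrative) of creating a 7.0 index that disables _source and also opts out of soft deletes at index creation, which is what makes the extra _recovery_source field unnecessary:

# illustrative sketch only, not from the original report
PUT logs-nosource-nosoftdeletes
{
  "settings": {
    "index.soft_deletes.enabled": false
  },
  "mappings": {
    "_source": { "enabled": false },
    "properties": {
      "message": { "type": "text" }
    }
  }
}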
Thanks @davemoore- for reporting this. @jtibshirani your explanation is correct. This test creates a single segment (the dataset is quite small), and TieredMergePolicy thinks that segment is merged already. Hence, RecoverySourcePruneMergePolicy is never triggered to prune away _recovery_source. In this scenario, this behaviour is probably ok as the store size is pretty small (25MB). However, this behaviour can be problematic with a larger dataset because it won't prune away _recovery_source when the retention lease advances if segments are already merged. I have a test that demonstrates this behavior.

public void testPruneRecoverySource() throws Exception {
Settings.Builder settings = Settings.builder()
.put(defaultSettings.getSettings())
.put(IndexSettings.INDEX_SOFT_DELETES_SETTING.getKey(), true)
.put(IndexSettings.INDEX_SOFT_DELETES_RETENTION_OPERATIONS_SETTING.getKey(), 0);
final IndexMetaData indexMetaData = IndexMetaData.builder(defaultSettings.getIndexMetaData()).settings(settings).build();
final IndexSettings indexSettings = IndexSettingsModule.newIndexSettings(indexMetaData);
final AtomicLong globalCheckpoint = new AtomicLong(SequenceNumbers.NO_OPS_PERFORMED);
final MapperService mapperService = createMapperService("test");
final MergePolicy mp = new TieredMergePolicy(); // works with LogDocMergePolicy
try (Store store = createStore();
InternalEngine engine = createEngine(config(indexSettings, store, createTempDir(), mp, null, null, globalCheckpoint::get))) {
int numDocs = 10;
for (int i = 0; i < numDocs; i++) {
ParsedDocument doc = testParsedDocument(Integer.toString(i), null, testDocument(), new BytesArray("{}"), null, true);
engine.index(indexForDoc(doc));
}
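            // advance the global checkpoint so that none of the indexed operations are required for
            // recovery anymore and their _recovery_source becomes eligible for pruning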
globalCheckpoint.set(engine.getLocalCheckpoint());
engine.syncTranslog();
engine.flush(true, true);
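            // force merge down to a single segment; RecoverySourcePruneMergePolicy is expected to prune
            // _recovery_source here, but TieredMergePolicy considers the lone segment merged already and
            // turns this force merge into a no-op (see the merge policy comment above)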
engine.forceMerge(true, 1, false, false, false);
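            // expectation: _recovery_source has been pruned, so reading the changes snapshot fails with
            // "source not found" (per the comment above, this holds with LogDocMergePolicy but not with
            // TieredMergePolicy)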
try (Translog.Snapshot snapshot = engine.newChangesSnapshot("test", mapperService, 0, Long.MAX_VALUE, true)) {
IllegalStateException sourceNotFound = expectThrows(IllegalStateException.class, snapshot::next);
assertThat(sourceNotFound.getMessage(), startsWith("source not found"));
}
}
}
I think we can look into triggering a merge where we would drop all sources. Yet, is this really only something that is relevant or should be done in a force_merge context? @dnhatn WDYT? I mean, this might look confusing but it is what it is. We can't magically make it go away. I think if you run force merge too quickly you would still have the same issue if we need to retain the sources.
Maybe we should fail
This test failure manifests the limitation of the recovery source merge policy explained in #41628. If we already merge down to a single segment then subsequent force merges will be no-ops although they could prune recovery source. We need to adjust this test until we have a fix for the merge policy. Relates #41628 Closes #48735
Pinging @elastic/es-storage-engine (Team:StorageEngine)
Will be fixed in #114618
Elasticsearch version: 7.0.0
Plugins installed: []
JVM version: OpenJDK 1.8.0_191
OS version: Ubuntu 16.04 (or Elastic Cloud)
Description of the problem including expected versus actual behavior:
When setting _source.enabled: false in the index mapping, the _source should not be stored. In 7.0.0, when two indices have identical data and mappings (except for one having _source.enabled: false), the indices are almost exactly the same size. This isn't the expected behavior. In 6.7.1, when two indices have identical data and mappings (except for one having _source.enabled: false), the index with _source.enabled: false is roughly half the size of the one with _source enabled. This is the expected behavior.
Steps to reproduce:
Overview:
Create two Elasticsearch clusters: version 6.7.1 and version 7.0.0.
Create two index templates with identical mappings, but let the second template use _source.enabled: false. Put these two index templates in both clusters.
Load data into the two indices on both clusters.
Force merge the indices to a single segment.
Compare the "Storage Size" of the two indices in Kibana for each cluster: /app/kibana#/management/elasticsearch/index_management/indices
More detailed:
Create the following templates and pipelines in the 7.0.0 cluster:
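The original templates are not reproduced here, but as a rough sketch (the index pattern and fields are assumptions), the "no source" template for 7.0.0 might look something like:

# rough illustrative sketch, not the original template
PUT _template/logs-nosource
{
  "index_patterns": ["logs-nosource*"],
  "mappings": {
    "_source": { "enabled": false },
    "properties": {
      "message": { "type": "text" }
    }
  }
}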
Create the following indices and templates in the 6.7.1 cluster:
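Again only a sketch rather than the original; the main difference in 6.7.1 is that the mapping still needs a type name (e.g. _doc):

# rough illustrative sketch for 6.7.1, not the original template
PUT _template/logs-nosource
{
  "index_patterns": ["logs-nosource*"],
  "mappings": {
    "_doc": {
      "_source": { "enabled": false },
      "properties": {
        "message": { "type": "text" }
      }
    }
  }
}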
Download and unzip the data from https://storage.googleapis.com/elasticsearch-sizing-workshop/data/nginx.zip and then load the nginx.log file into the "logs" and "logs-nosource" indices on both clusters.
Force merge the indices to a single segment.
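For reference, the force merge step can be done with a request along these lines (index names as used in the report):

POST logs,logs-nosource/_forcemerge?max_num_segments=1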
Compare the size of the indices in Kibana. Elasticsearch 7.0.0 shows both indices as being roughly the same size, whereas Elasticsearch 6.7.1 shows the "logs-nosource" index being roughly half the size of the "logs" index.
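As an alternative to the Kibana index management UI, the store sizes can also be compared directly with the cat indices API, for example:

GET _cat/indices/logs*?v&h=index,docs.count,pri.store.size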