-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrate data generator in LogsDB mode challenge test #111303
Conversation
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The integration of the data generator looks great.
// Nested fields don't work with subobjects: false. | ||
.withNestedFieldsLimit(0) | ||
// TODO increase depth of objects | ||
// Currently matching fails because in synthetic source all fields are flat (given that we have subobjects: false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this set? I think we don't want to proceed with this..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a consequence of how synthetic source works. We'll get a synthetic field loader for every mapper and let them populate the synthetic source. With subobjects: false
we only have leaf mappers - e.g. a keyword field mapper for field a.b.c
which will write a.b.c: "foo"
in the source. It is object mappers who usually add structure to synthetic source and in this case there are no object mappers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would have to write a special implementation of top-level synthetic field loader if we wanted to fix this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was referring to using something like #110524, instead of subobjects: false
.. We know that the latter has shortcomings, shouldn't be used in logsdb mode in its current form..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, i definitely think that we should use auto
once it is available. Just to clarify - do you propose to remove subobjects: false
until then? I am okay with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, we should probably do so if the matching logic can tolerate it. Once auto
is in, it should apply to both standard and logsdb as it's orthogonal.
.field("ignore_malformed", randomBoolean()) | ||
.endObject() | ||
|
||
.endObject(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd keep the hard-coded version as it's easier to debug, and add the randomized one as a separate test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that, will do.
@@ -133,11 +124,13 @@ private static void settings(final Settings.Builder settings) { | |||
@Override | |||
public void contenderSettings(Settings.Builder builder) { | |||
builder.put("index.mode", "logsdb"); | |||
builder.put("index.mapping.total_fields.limit", 5000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add a commonSettings()
function, in a follow-up.
public class TopLevelObjectFieldDataGenerator { | ||
private final Context context; | ||
private final List<GenericSubObjectFieldDataGenerator.ChildField> predefinedFields; | ||
private final List<GenericSubObjectFieldDataGenerator.ChildField> generatedChildFields; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding comments to explain the difference between these two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late response. Added a few minor comments, feel free to address them in a follow-up.
It's so amazing to see this happening in such a short time, well done.
* upstream/main: (105 commits) Removing the use of watcher stats from WatchAcTests (elastic#111435) Mute org.elasticsearch.xpack.restart.FullClusterRestartIT testSingleDoc {cluster=UPGRADED} elastic#111434 Make `EnrichPolicyRunner` more properly async (elastic#111321) Mute org.elasticsearch.xpack.restart.FullClusterRestartIT testSingleDoc {cluster=OLD} elastic#111430 Mute org.elasticsearch.xpack.esql.expression.function.aggregate.ValuesTests testGroupingAggregate {TestCase=<long unicode KEYWORDs>} elastic#111428 Mute org.elasticsearch.xpack.esql.expression.function.aggregate.ValuesTests testGroupingAggregate {TestCase=<long unicode TEXTs>} elastic#111429 Mute org.elasticsearch.xpack.repositories.metering.azure.AzureRepositoriesMeteringIT org.elasticsearch.xpack.repositories.metering.azure.AzureRepositoriesMeteringIT elastic#111307 Update semantic_text field to support indexing numeric and boolean data types (elastic#111284) Mute org.elasticsearch.repositories.blobstore.testkit.AzureSnapshotRepoTestKitIT testRepositoryAnalysis elastic#111280 Ensure vector similarity correctly limits inner_hits returned for nested kNN (elastic#111363) Fix LogsIndexModeFullClusterRestartIT (elastic#111362) Remove 4096 bool query max limit from docs (elastic#111421) Fix score count validation in reranker response (elastic#111212) Integrate data generator in LogsDB mode challenge test (elastic#111303) ESQL: Add COUNT and COUNT_DISTINCT aggregation tests (elastic#111409) [Service Account] Add AutoOps account (elastic#111316) [ML] Fix failing test DetectionRulesTests.testEqualsAndHashcode (elastic#111351) [ML] Create and inject APM Inference Metrics (elastic#111293) [DOCS] Additional reranking docs updates (elastic#111350) Mute org.elasticsearch.repositories.azure.RepositoryAzureClientYamlTestSuiteIT org.elasticsearch.repositories.azure.RepositoryAzureClientYamlTestSuiteIT elastic#111345 ... # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
This PR integrates data generator into the standard vs logsdb test. I had to disable some features of data generation due to matcher not being ready to work with the differences they cause in source.
#109800