Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate data generator in LogsDB mode challenge test #111303

Merged
merged 3 commits into from
Jul 29, 2024

Conversation

lkts
Copy link
Contributor

@lkts lkts commented Jul 25, 2024

This PR integrates data generator into the standard vs logsdb test. I had to disable some features of data generation due to matcher not being ready to work with the differences they cause in source.

#109800

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The integration of the data generator looks great.

@lkts lkts merged commit a4e6cf9 into elastic:main Jul 29, 2024
15 checks passed
@lkts lkts deleted the integrate_data_generator branch July 29, 2024 18:35
// Nested fields don't work with subobjects: false.
.withNestedFieldsLimit(0)
// TODO increase depth of objects
// Currently matching fails because in synthetic source all fields are flat (given that we have subobjects: false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this set? I think we don't want to proceed with this..

Copy link
Contributor Author

@lkts lkts Jul 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a consequence of how synthetic source works. We'll get a synthetic field loader for every mapper and let them populate the synthetic source. With subobjects: false we only have leaf mappers - e.g. a keyword field mapper for field a.b.c which will write a.b.c: "foo" in the source. It is object mappers who usually add structure to synthetic source and in this case there are no object mappers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would have to write a special implementation of top-level synthetic field loader if we wanted to fix this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to using something like #110524, instead of subobjects: false.. We know that the latter has shortcomings, shouldn't be used in logsdb mode in its current form..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, i definitely think that we should use auto once it is available. Just to clarify - do you propose to remove subobjects: false until then? I am okay with that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, we should probably do so if the matching logic can tolerate it. Once auto is in, it should apply to both standard and logsdb as it's orthogonal.

.field("ignore_malformed", randomBoolean())
.endObject()

.endObject();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd keep the hard-coded version as it's easier to debug, and add the randomized one as a separate test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that, will do.

@@ -133,11 +124,13 @@ private static void settings(final Settings.Builder settings) {
@Override
public void contenderSettings(Settings.Builder builder) {
builder.put("index.mode", "logsdb");
builder.put("index.mapping.total_fields.limit", 5000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a commonSettings() function, in a follow-up.

public class TopLevelObjectFieldDataGenerator {
private final Context context;
private final List<GenericSubObjectFieldDataGenerator.ChildField> predefinedFields;
private final List<GenericSubObjectFieldDataGenerator.ChildField> generatedChildFields;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding comments to explain the difference between these two.

Copy link
Contributor

@kkrik-es kkrik-es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late response. Added a few minor comments, feel free to address them in a follow-up.

It's so amazing to see this happening in such a short time, well done.

weizijun added a commit to weizijun/elasticsearch that referenced this pull request Jul 30, 2024
* upstream/main: (105 commits)
  Removing the use of watcher stats from WatchAcTests (elastic#111435)
  Mute org.elasticsearch.xpack.restart.FullClusterRestartIT testSingleDoc {cluster=UPGRADED} elastic#111434
  Make `EnrichPolicyRunner` more properly async (elastic#111321)
  Mute org.elasticsearch.xpack.restart.FullClusterRestartIT testSingleDoc {cluster=OLD} elastic#111430
  Mute org.elasticsearch.xpack.esql.expression.function.aggregate.ValuesTests testGroupingAggregate {TestCase=<long unicode KEYWORDs>} elastic#111428
  Mute org.elasticsearch.xpack.esql.expression.function.aggregate.ValuesTests testGroupingAggregate {TestCase=<long unicode TEXTs>} elastic#111429
  Mute org.elasticsearch.xpack.repositories.metering.azure.AzureRepositoriesMeteringIT org.elasticsearch.xpack.repositories.metering.azure.AzureRepositoriesMeteringIT elastic#111307
  Update semantic_text field to support indexing numeric and boolean data types (elastic#111284)
  Mute org.elasticsearch.repositories.blobstore.testkit.AzureSnapshotRepoTestKitIT testRepositoryAnalysis elastic#111280
  Ensure vector similarity correctly limits inner_hits returned for nested kNN (elastic#111363)
  Fix LogsIndexModeFullClusterRestartIT (elastic#111362)
  Remove 4096 bool query max limit from docs (elastic#111421)
  Fix score count validation in reranker response (elastic#111212)
  Integrate data generator in LogsDB mode challenge test (elastic#111303)
  ESQL: Add COUNT and COUNT_DISTINCT aggregation tests (elastic#111409)
  [Service Account] Add AutoOps account (elastic#111316)
  [ML] Fix failing test DetectionRulesTests.testEqualsAndHashcode (elastic#111351)
  [ML] Create and inject APM Inference Metrics (elastic#111293)
  [DOCS] Additional reranking docs updates (elastic#111350)
  Mute org.elasticsearch.repositories.azure.RepositoryAzureClientYamlTestSuiteIT org.elasticsearch.repositories.azure.RepositoryAzureClientYamlTestSuiteIT elastic#111345
  ...

# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants