Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] MixedClusterClientYamlTestSuiteIT test {p0=search.vectors/60_dense_vector_dynamic_mapping/Fields with float arrays within the threshold map as dense_vector} failing #100502

Closed
edsavage opened this issue Oct 9, 2023 · 18 comments · Fixed by #101404
Assignees
Labels
low-risk An open issue or test failure that is a low risk to future releases :Search Relevance/Vectors Vector search Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch >test-failure Triaged test failures from CI

Comments

@edsavage
Copy link
Contributor

edsavage commented Oct 9, 2023

Test has failed just the once.

Build scan:
https://gradle-enterprise.elastic.co/s/bdc4arhkiazly/tests/:qa:mixed-cluster:v8.11.0%23mixedClusterTest/org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT/test%20%7Bp0=search.vectors%2F60_dense_vector_dynamic_mapping%2FFields%20with%20float%20arrays%20within%20the%20threshold%20map%20as%20dense_vector%7D
Reproduction line:

./gradlew ':qa:mixed-cluster:v8.11.0#mixedClusterTest' -Dtests.class="org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT" -Dtests.method="test {p0=search.vectors/60_dense_vector_dynamic_mapping/Fields with float arrays within the threshold map as dense_vector}" -Dtests.seed=D699A81BE55AB321 -Dtests.bwc=true -Dtests.locale=sr-Latn-ME -Dtests.timezone=America/Indiana/Vevay -Druntime.java=18

Applicable branches:
main

Reproduces locally?:
No

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?tests.container=org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT&tests.test=test%20%7Bp0%3Dsearch.vectors/60_dense_vector_dynamic_mapping/Fields%20with%20float%20arrays%20within%20the%20threshold%20map%20as%20dense_vector%7D
Failure excerpt:

java.lang.AssertionError: Failure at [search.vectors/60_dense_vector_dynamic_mapping:61]: field [test-map-dense-vector.mappings.properties.my_field.dims] is null

  at __randomizedtesting.SeedInfo.seed([D699A81BE55AB321:5ECD97C14BA6DED9]:0)
  at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.executeSection(ESClientYamlSuiteTestCase.java:582)
  at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.test(ESClientYamlSuiteTestCase.java:534)
  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
  at java.lang.reflect.Method.invoke(Method.java:577)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:833)

  Caused by: java.lang.AssertionError: field [test-map-dense-vector.mappings.properties.my_field.dims] is null

    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.assertTrue(Assert.java:41)
    at org.junit.Assert.assertNotNull(Assert.java:712)
    at org.elasticsearch.test.rest.yaml.section.MatchAssertion.doAssert(MatchAssertion.java:78)
    at org.elasticsearch.test.rest.yaml.section.Assertion.execute(Assertion.java:65)
    at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.executeSection(ESClientYamlSuiteTestCase.java:562)
    at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.test(ESClientYamlSuiteTestCase.java:534)
    at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
    at java.lang.reflect.Method.invoke(Method.java:577)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
    at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
    at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
    at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
    at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
    at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
    at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
    at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
    at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
    at java.lang.Thread.run(Thread.java:833)

@edsavage edsavage added :Search/Search Search-related issues that do not fall into other categories >test-failure Triaged test failures from CI labels Oct 9, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Oct 9, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@benwtrent benwtrent self-assigned this Oct 9, 2023
@benwtrent benwtrent added :Search Relevance/Vectors Vector search and removed :Search/Search Search-related issues that do not fall into other categories labels Oct 9, 2023
@benwtrent
Copy link
Member

I cannot get this to replicate locally :(.

I even tried going back to the particular versions of the upgraded and old servers.

From what I can tell, every time we create th test-map-dense-vector index, it updated (which is expected) to include the dims.

For this particular failure, I did notice that the shard becomes GREEN between us applying the mapping and then updating it.

[2023-10-09T09:26:29,968][INFO ][o.e.c.m.MetadataMappingService] [v8.11.0-3] [test-map-dense-vector/a7R-dIeAT4KxmFyYNW4Kzw] create_mapping
[2023-10-09T09:26:29,981][INFO ][o.e.c.r.a.AllocationService] [v8.11.0-3] current.health="GREEN" message="Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[test-map-dense-vector][0]]])." previous.health="YELLOW" reason="shards started [[test-map-dense-vector][0]]"
[2023-10-09T09:26:29,998][INFO ][o.e.c.m.MetadataMappingService] [v8.11.0-3] [test-map-dense-vector/a7R-dIeAT4KxmFyYNW4Kzw] update_mapping [_doc]
[2023-10-09T09:26:30,064][INFO ][o.e.x.m.MachineLearning  ] [v8.11.0-3] Starting machine learning feature reset
[2023-10-09T09:26:30,141][INFO ][o.e.x.m.MachineLearning  ] [v8.11.0-3] Finished machine learning feature reset
[2023-10-09T09:26:30,152][INFO ][o.e.c.m.MetadataDeleteIndexService] [v8.11.0-3] [test-map-dense-vector/a7R-dIeAT4KxmFyYNW4Kzw] deleting index

But that shouldn't be a problem as we already call with refresh: true...

@benwtrent benwtrent added the low-risk An open issue or test failure that is a low risk to future releases label Oct 9, 2023
@benwtrent
Copy link
Member

Adding some logging via: #100546

I see the mapping gets updated, but then the API complains that the dims is actually null on getting the mapping.

Just wanting to be 100% sure the mapping is being updated correctly in this mix cluster scenario via logging.

@benwtrent
Copy link
Member

In my PR to add more logging, a similar test failed & had my higher logging fidelity:



1> [2023-10-09T16:08:21,276][DEBUG][o.e.i.m.MapperService    ] [remote_cluster-1] [test-nested-index] [[test-nested-index/WmVZa5SuQYqzsx0KXAB74w]] added mapping, source [{"_doc":{"properties":{"my_nested_field":{"type":"nested"}}}}] |  
-- | --
  | 1> [2023-10-09T16:08:21,279][INFO ][o.e.i.m.MapperService    ] [remote_cluster-1] [test-nested-index] reloading search analyzers |  
  | 1> [2023-10-09T16:08:21,291][INFO ][o.e.c.r.a.AllocationService] [remote_cluster-0] current.health="GREEN" message="Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[test-nested-index][0]]])." previous.health="YELLOW" reason="shards started [[test-nested-index][0]]" |  
  | 1> [2023-10-09T16:08:21,315][INFO ][o.e.c.m.MetadataMappingService] [remote_cluster-0] [test-nested-index/WmVZa5SuQYqzsx0KXAB74w] update_mapping [_doc] |  
  | 1> [2023-10-09T16:08:21,325][DEBUG][o.e.i.m.MapperService    ] [remote_cluster-1] [test-nested-index] [[test-nested-index/WmVZa5SuQYqzsx0KXAB74w]] updated mapping, source [{"_doc":{"properties":{"my_nested_field":{"type":"nested","properties":{"my_field":{"type":"dense_vector","index":true,"similarity":"cosine"}}}}}}] |  
  | 1> [2023-10-09T16:08:21,332][INFO ][o.e.c.m.MetadataMappingService] [remote_cluster-0] [test-nested-index/WmVZa5SuQYqzsx0KXAB74w] update_mapping [_doc] |  
  | 1> [2023-10-09T16:08:21,342][DEBUG][o.e.i.m.MapperService    ] [remote_cluster-1] [test-nested-index] [[test-nested-index/WmVZa5SuQYqzsx0KXAB74w]] updated mapping, source [{"_doc":{"properties":{"my_nested_field":{"type":"nested","properties":{"my_field":{"type":"dense_vector","dims":128,"index":true,"similarity":"cosine"}}}}}}]

We can see that the MapperService is updating the mapping. But then on GET, I have this:

[2023-10-09T16:08:21,385][INFO ][o.e.t.r.y.CcsCommonYamlTestSuiteIT] [test] Stash dump on test failure [{ |  
-- | --
  | 1>   "stash" : { |  
  | 1>     "body" : { |  
  | 1>       "test-nested-index" : { |  
  | 1>         "mappings" : { |  
  | 1>           "properties" : { |  
  | 1>             "my_nested_field" : { |  
  | 1>               "type" : "nested", |  
  | 1>               "properties" : { |  
  | 1>                 "my_field" : { |  
  | 1>                   "type" : "dense_vector", |  
  | 1>                   "index" : true, |  
  | 1>                   "similarity" : "cosine" |  
  | 1>                 } |  
  | 1>               } |  
  | 1>             } |  
  | 1>           } |  
  | 1>         } |  
  | 1>       } |  
  | 1>     } |  
  | 1>   } |  
  | 1> }]

@benwtrent
Copy link
Member

I have updated the logging for this test via: #100546

if it fails again after that commit, we should be able to make a better diagnosis.

@iverase
Copy link
Contributor

iverase commented Oct 11, 2023

@benwtrent
Copy link
Member

@iverase my PR isn't merged yet :( hopefully it will fail after. Or better yet, during the CI build for the PR!

elasticsearchmachine pushed a commit that referenced this issue Oct 11, 2023
This adds a test for dynamic dims update mapping merges.

Also, this adds logging to help investigate a periodically failing test.

related to: #100502
@benwtrent
Copy link
Member

FYI person who sees this test fail on their PR. Please let me know the build so I can verify my logging is there. Once I have the logging, I can see about muting the test myself if it is a nuisance.

benwtrent added a commit to benwtrent/elasticsearch that referenced this issue Oct 13, 2023
…00546)

This adds a test for dynamic dims update mapping merges.

Also, this adds logging to help investigate a periodically failing test.

related to: elastic#100502
elasticsearchmachine pushed a commit that referenced this issue Oct 13, 2023
…100833)

This adds a test for dynamic dims update mapping merges.

Also, this adds logging to help investigate a periodically failing test.

related to: #100502
@benwtrent
Copy link
Member

Sub-objects mapped as dense_vector without dims dynamically map the dim size to the first indexed document

Failed in build: https://gradle-enterprise.elastic.co/s/db27w6u4cwpck

Here are the logs, they contain my trace logging.

testclusters.zip

@benwtrent
Copy link
Member

2023-10-13T15:11:41,633 is when the test failed.

On 2023-10-13T21:11:41,709 on node v8.11.0-3 (time zone hour makes sense), I see:

[2023-10-13T21:11:41,709][DEBUG][o.e.i.m.MapperService    ] [v8.11.0-3] [test-mapped-index] [[test-mapped-index/Vyr55lFOSXuCiHJJX4wxdQ]] updated mapping, source [{"_doc":{"properties":{"my_parent_object":{"properties":{"my_child_dense_vector_field":{"type":"dense_vector","dims":3,"index":true,"similarity":"cosine"}}}}}}]

So, we are obviously updating the mapping.

I wonder if the GET mapping code is just seeing an older mapping. Which stinks, because we wait for refresh=true before returning on the index request.

@benwtrent
Copy link
Member

With the additional logging: https://gradle-enterprise.elastic.co/s/4cgo6n7lqhbki

Test failed around 14:10:09, the index in question is my-dynamic-template-index/370HcJXNSiCCPGkx5_M9wA

[2023-10-17T14:10:09,889][INFO ][o.e.c.m.MetadataMappingService] [v8.11.0-2] [my-dynamic-template-index/370HcJXNSiCCPGkx5_M9wA] update_mapping [_doc]
[2023-10-17T14:10:09,943][DEBUG][o.e.c.s.ClusterApplierService] [v8.11.0-2] processing [Publication{term=4, version=1254}]: execute
[2023-10-17T14:10:09,951][TRACE][o.e.a.a.i.m.g.TransportGetMappingsAction] [v8.11.0-2] serving getMapping request based on version 1253
[2023-10-17T14:10:09,747][TRACE][o.e.c.s.ClusterApplierService] [v8.11.0-1] calling [org.elasticsearch.repositories.RepositoriesService@6e34a96a] with change to version [1252]
[2023-10-17T14:10:09,747][TRACE][o.e.c.s.ClusterApplierService] [v8.11.0-1] calling [org.elasticsearch.indices.cluster.IndicesClusterStateService@2c3158af] with change to version [1252]
[2023-10-17T14:10:09,750][DEBUG][o.e.i.m.MapperService    ] [v8.11.0-1] [my-dynamic-template-index] [[my-dynamic-template-index/370HcJXNSiCCPGkx5_M9wA]] updated mapping, source [{"_doc":{"dynamic_templates":[{"my_vector_template":{"match":"*dense_vector*","mapping":{"dims":5,"index":true,"similarity":"cosine","type":"dense_vector"}}},{"my_float_template":{"match":"*float*","mapping":{"type":"float"}}}],"properties":{"my_dense_vector_field":{"type":"dense_vector","dims":5,"index":true,"similarity":"cosine"},"my_float_field":{"type":"float"},"my_small_float_field":{"type":"float"}}}}]
[2023-10-17T14:10:09,750][TRACE][o.e.i.s.IndexShard       ] [v8.11.0-1] [my-dynamic-template-index][0] [my-dynamic-template-index][0] skip writing shard state, has been written before

It SEEMS like the new mapping is in cluster state version: 1252. I then noticed when we call TransportGetMappingsAction its version 1253 which indicates to me that the mapping SHOULD have the dimension update.

Is this a correct reading of
testclusters 2.zip
@DaveCTurner ?

@DaveCTurner
Copy link
Contributor

Not quite I think. The exception in that failure is java.lang.AssertionError: Failure at [search.vectors/60_dense_vector_dynamic_mapping:316]: field [my-dynamic-template-index.mappings.properties.my_dynamic_field.dims] is null and that field is indeed not in the mapping mentioned at 2023-10-17T14:10:09,750 in cluster state version 1252. This field appears at 2023-10-17T14:10:09,909 as part of the update to cluster state version 1254.

@DaveCTurner
Copy link
Contributor

Oh I think I see:

$ find testclusters -name 'v*.log' | xargs cat | sort | grep 'ClusterApplierService.*processing.*version=1254.*execute\|index \[4\]'
[2023-10-17T14:10:09,898][DEBUG][o.e.c.s.ClusterApplierService] [v8.11.0-0] processing [ApplyCommitRequest{term=4, version=1254, sourceNode={v8.11.0-2}{EDrn0Q4VRgaP3iQMfGae2A}{39WoirGLRwGvKvYCQrn_hw}{v8.11.0-2}{127.0.0.1}{127.0.0.1:39653}{cdfhilmrstw}{8.11.0}{7000099-8500003}{testattr=test, ml.allocated_processors_double=32.0, ml.machine_memory=101237211136, ml.config_version=11.0.0, ml.max_jvm_size=536870912, ml.allocated_processors=32, xpack.installed=true, transform.config_version=10.0.0}}]: execute
[2023-10-17T14:10:09,898][DEBUG][o.e.c.s.ClusterApplierService] [v8.11.0-1] processing [ApplyCommitRequest{term=4, version=1254, sourceNode={v8.11.0-2}{EDrn0Q4VRgaP3iQMfGae2A}{39WoirGLRwGvKvYCQrn_hw}{v8.11.0-2}{127.0.0.1}{127.0.0.1:39653}{cdfhilmrstw}{8.11.0}{7000099-8500003}{testattr=test, ml.allocated_processors_double=32.0, ml.machine_memory=101237211136, ml.config_version=11.0.0, ml.max_jvm_size=536870912, ml.allocated_processors=32, xpack.installed=true, transform.config_version=10.0.0}}]: execute
[2023-10-17T14:10:09,915][TRACE][o.e.i.s.IndexShard       ] [v8.11.0-0] [my-dynamic-template-index][0] index [4] seq# [-2] allocation-id [[id=KYY3YmwgRmeSyLeYRGssKA]] primaryTerm [1] operationPrimaryTerm [1] origin [PRIMARY]
[2023-10-17T14:10:09,919][TRACE][o.e.i.s.IndexShard       ] [v8.11.0-1] [my-dynamic-template-index][0] index [4] seq# [3] allocation-id [[id=GcrCavp5SKS6mjdDmSuXTw]] primaryTerm [1] operationPrimaryTerm [1] origin [REPLICA]
[2023-10-17T14:10:09,920][DEBUG][o.e.c.s.ClusterApplierService] [v8.11.0-3] processing [ApplyCommitRequest{term=4, version=1254, sourceNode={v8.11.0-2}{EDrn0Q4VRgaP3iQMfGae2A}{39WoirGLRwGvKvYCQrn_hw}{v8.11.0-2}{127.0.0.1}{127.0.0.1:39653}{cdfhilmrstw}{8.11.0}{7000099-8500003}{transform.config_version=10.0.0, xpack.installed=true, ml.allocated_processors=32, ml.max_jvm_size=536870912, ml.config_version=11.0.0, ml.machine_memory=101237211136, ml.allocated_processors_double=32.0, testattr=test}}]: execute
[2023-10-17T14:10:09,943][DEBUG][o.e.c.s.ClusterApplierService] [v8.11.0-2] processing [Publication{term=4, version=1254}]: execute

We apply the mapping update on v8.11.0-0 and v8.11.0-1 before indexing completes, but that doesn't guarantee it's applied on all nodes, and in particular v8.11.0-2 is a little behind. You need to do a GET _cluster/health?wait_for_events=languid to flush the master queue before checking the mappings.

@benwtrent
Copy link
Member

GET _cluster/health?wait_for_events=languid

I can try to add these to our yaml tests.

@DaveCTurner
Copy link
Contributor

I think we have a handle on the problem and its solution here now so I'll mute the tests.

DaveCTurner added a commit that referenced this issue Oct 18, 2023
elasticsearchmachine pushed a commit that referenced this issue Oct 18, 2023
…101006)

I noticed that sometimes we read from the `fieldType()` and other times
we read from the mapper directly. It seems to me that we should only
ever read and update values from one of those, for sanity's sake. 

So, I removed all values that were part of the mapper directly and used
`fieldType().<value>` everywhere.

Additionally, David Turner suggested that we wait for cluster health
before verifying mappings in the yaml tests, so I added that as well.

related to: #100502
benwtrent added a commit to benwtrent/elasticsearch that referenced this issue Oct 18, 2023
…lastic#101006)

I noticed that sometimes we read from the `fieldType()` and other times
we read from the mapper directly. It seems to me that we should only
ever read and update values from one of those, for sanity's sake. 

So, I removed all values that were part of the mapper directly and used
`fieldType().<value>` everywhere.

Additionally, David Turner suggested that we wait for cluster health
before verifying mappings in the yaml tests, so I added that as well.

related to: elastic#100502
elasticsearchmachine pushed a commit that referenced this issue Oct 18, 2023
…101006) (#101076)

I noticed that sometimes we read from the `fieldType()` and other times
we read from the mapper directly. It seems to me that we should only
ever read and update values from one of those, for sanity's sake. 

So, I removed all values that were part of the mapper directly and used
`fieldType().<value>` everywhere.

Additionally, David Turner suggested that we wait for cluster health
before verifying mappings in the yaml tests, so I added that as well.

related to: #100502
@benwtrent
Copy link
Member

These were failing pretty much at least once a day for weeks, I have verified since merging the fix, these having failed. I am going to revert all the logging changes and close this issue.

benwtrent added a commit to benwtrent/elasticsearch that referenced this issue Oct 26, 2023
elasticsearchmachine pushed a commit that referenced this issue Oct 26, 2023
Test hasn't failed for over a week, the original issue seems resolved as
the at least one of the tests were failing once a day.

closes #100502
benwtrent added a commit to benwtrent/elasticsearch that referenced this issue Oct 26, 2023
Test hasn't failed for over a week, the original issue seems resolved as
the at least one of the tests were failing once a day.

closes elastic#100502

(cherry picked from commit 3b0b484)
elasticsearchmachine pushed a commit that referenced this issue Oct 26, 2023
Test hasn't failed for over a week, the original issue seems resolved as
the at least one of the tests were failing once a day.

closes #100502

(cherry picked from commit 3b0b484)
@javanna javanna added Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
low-risk An open issue or test failure that is a low risk to future releases :Search Relevance/Vectors Vector search Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch >test-failure Triaged test failures from CI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants