[BWC] Ensure 2.x compatibility with Legacy 7.10.x #1902

Merged: 4 commits, Jan 17, 2022

nknize
Collaborator

nknize commented Jan 14, 2022

This PR fixes TransportHandshaker to send a spoofed Legacy 7.10.2 mincompat
version so that OpenSearch 2.x nodes can join a Legacy 7.10.x cluster during a
rolling upgrade. Without this change, the 7.10.x and OpenSearch 2.x mixed-cluster
bwc tests were failing.
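
To illustrate the idea, here is a minimal sketch, not the code changed in this PR. The class name, the method names, and the version id encoding are all assumptions made for the example. It only shows the shape of the fix: a 2.x node advertises a fixed Legacy 7.10.2 id as its minimum compatible version in the handshake request, presumably so that a legacy node sees a version it recognizes.

```java
// Hypothetical sketch only; names and id encoding are assumptions, not the OpenSearch API.
final class HandshakeMinCompatSketch {

    // Assumed id encoding for this sketch:
    // major * 1_000_000 + minor * 10_000 + revision * 100 + 99 (release build).
    static int versionId(int major, int minor, int revision) {
        return major * 1_000_000 + minor * 10_000 + revision * 100 + 99;
    }

    // The spoofed minimum compatible version named in the PR description.
    static final int LEGACY_7_10_2 = versionId(7, 10, 2);

    // Picks the version id to advertise in the handshake request.
    // A 2.x node sends the spoofed Legacy 7.10.2 id; any other node
    // sends its natural minimum compatible version unchanged.
    static int minCompatForHandshake(int localMajor, int naturalMinCompatId) {
        if (localMajor == 2) {
            return LEGACY_7_10_2;
        }
        return naturalMinCompatId;
    }

    public static void main(String[] args) {
        // A 2.x node whose natural mincompat would be 1.0.0 in this sketch.
        System.out.println(minCompatForHandshake(2, versionId(1, 0, 0))); // prints 7100299
        // A 1.x node keeps whatever mincompat it already had.
        System.out.println(minCompatForHandshake(1, versionId(6, 8, 0))); // prints 6080099
    }
}
```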

nknize added the :test, v2.0.0, and backwards-compatibility labels Jan 14, 2022
nknize requested a review from a team as a code owner January 14, 2022 04:31
@opensearch-ci-bot
Collaborator

Can one of the admins verify this patch?

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure db4542631a5781ababb07ef25f737cdaf398c4d6
Log 1914

Reports 1914

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure ad1d9029d41a8ffb381281bc59559e977d8843e2
Log 1916

Reports 1916

nknize force-pushed the bwc/fix2xTransportHandshake branch from ad1d902 to 48b8bf7 Compare January 14, 2022 05:47
@opensearch-ci-bot
Collaborator

❌   Gradle Check failure 48b8bf739a04111d8cead96ddb31aefed079f492
Log 1918

Reports 1918

@nknize
Collaborator Author

nknize commented Jan 14, 2022

Last check failed with a non-reproducible test; documenting for posterity:

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.index.shard.IndexShardIT.testExpectedShardSizeIsPresent" -Dtests.seed=9C6FF4BF9EE79C60 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=ar-LY -Dtests.timezone=Asia/Novokuznetsk -Druntime.java=17
2> java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([9C6FF4BF9EE79C60:D0C2765DABB1D7ED]:0)
        at org.junit.Assert.fail(Assert.java:87)
        at org.junit.Assert.assertTrue(Assert.java:42)
        at org.junit.Assert.assertTrue(Assert.java:53)
        at org.opensearch.index.shard.IndexShardIT.testExpectedShardSizeIsPresent(IndexShardIT.java:307)
  1> [2022-01-14T12:58:28,900][INFO ][o.o.i.s.IndexShardIT     ] [testLimitNumberOfRetainedTranslogFiles] before test

Also looks like an unexpected circuit breaker was tripped; likely unrelated to the test failure:

Caused by: org.opensearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<agg [foo_terms]>] would be [6948/6.7kb], which is larger than the limit of [1024/1kb], usages [request=5120/5kb, fielddata=0/0b, in_flight_requests=0/0b, accounting=1828/1.7kb]
  1> 	at org.opensearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:484) ~[main/:?]
  1> 	at org.opensearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:133) ~[main/:?]
  1> 	at org.opensearch.search.aggregations.AggregatorBase.addRequestCircuitBreakerBytes(AggregatorBase.java:167) ~[main/:?]
  1> 	at org.opensearch.search.aggregations.AggregatorBase.<init>(AggregatorBase.java:129) ~[main/:?]
  1> 	at org.opensearch.search.aggregations.bucket.BucketsAggregator.<init>(BucketsAggregator.java:78) ~[main/:?]
  1> 	at org.opensearch.search.aggregations.bucket.DeferableBucketAggregator.<init>(DeferableBucketAggregator.java:64) ~[main/:?]
  1> 	at org.opensearch.search.aggregations.bucket.terms.TermsAggregator.<init>(TermsAggregator.java:206) ~[main/:?]
  1> 	at org.opensearch.search.aggregations.bucket.terms.AbstractStringTermsAggregator.<init>(AbstractStringTermsAggregator.java:65) ~[main/:?]
  1> 	at org.opensearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.<init>(GlobalOrdinalsStringTermsAggregator.java:112) ~[main/:?]
  1> 	at org.opensearch.search.aggregations.bucket.terms.TermsAggregatorFactory$ExecutionMode$2.create(TermsAggregatorFactory.java:487) ~[main/:?]
  1> 	at org.opensearch.search.aggregations.bucket.terms.TermsAggregatorFactory$1.build(TermsAggregatorFactory.java:135) ~[main/:?]
  1> 	at org.opensearch.search.aggregations.bucket.terms.TermsAggregatorFactory.doCreateInternal(TermsAggregatorFactory.java:306) ~[main/:?]
  1> 	at org.opensearch.search.aggregations.support.ValuesSourceAggregatorFactory.createInternal(ValuesSourceAggregatorFactory.java:71) ~[main/:?]
  1> 	at org.opensearch.search.aggregations.AggregatorFactory.create(AggregatorFactory.java:96) ~[main/:?]
  1> 	at org.opensearch.search.aggregations.AggregatorFactories.createTopLevelAggregators(AggregatorFactories.java:276) ~[main/:?]
  1> 	at org.opensearch.search.aggregations.AggregationPhase.preProcess(AggregationPhase.java:63) ~[main/:?]
  1> 	at org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:161) ~[main/:?]
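
The figures in the breaker message are at least self-consistent. As a minimal sketch, assuming the parent breaker simply sums the child breaker usages it reports (the class and method names below are hypothetical, not the OpenSearch API), the four usages add up to the reported 6948 bytes, which exceeds the 1024-byte limit:

```java
// Hypothetical sketch; assumes the parent total is the plain sum of the child usages.
final class ParentBreakerSketch {

    // Sums the per-breaker usages, in bytes.
    static long parentUsed(long... childUsages) {
        long total = 0;
        for (long usage : childUsages) {
            total += usage;
        }
        return total;
    }

    // True when the summed usage is larger than the parent limit.
    static boolean wouldTrip(long limitBytes, long... childUsages) {
        return parentUsed(childUsages) > limitBytes;
    }

    public static void main(String[] args) {
        // Values copied from the log: request, fielddata, in_flight_requests, accounting.
        long[] usages = {5120, 0, 0, 1828};
        System.out.println(parentUsed(usages));      // prints 6948
        System.out.println(wouldTrip(1024, usages)); // prints true
    }
}
```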

@nknize
Collaborator Author

nknize commented Jan 14, 2022

Pushed commits to the development branch do not seem to be updating this PR. :trollface:

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure bc945e4d9cd8648305c374a7ec89c33d9ef67caa
Log 1922

Reports 1922

@nknize
Collaborator Author

nknize commented Jan 14, 2022

Another failure that can't be reproduced! (╯°□°)╯︵ ┻━┻

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.cluster.routing.allocation.decider.MockDiskUsagesIT.testRerouteOccursOnDiskPassingHighWatermark" -Dtests.seed=994DB46D3A71E388 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=pt -Dtests.timezone=Asia/Tel_Aviv -Druntime.java=17

Looks like a node timeout issue at MockDiskUsagesIT.java#L166

1> [2022-01-14T19:03:26,620][WARN ][o.o.c.NodeConnectionsService] [node_t1] failed to connect to {node_t0}{C3fT4Fp9SjmjuepSij5_0Q}{IqnPkDZ7TZu52WOm_HWKOA}{127.0.0.1}{127.0.0.1:43583}{dimr}{shard_indexing_pressure_enabled=true} (tried [1] times)
  1> org.opensearch.transport.ConnectTransportException: [node_t0][127.0.0.1:43583] connect_exception

Gave up after one try... valiant effort (。々°)

@dblock
Member

dblock commented Jan 14, 2022

testRerouteOccursOnDiskPassingHighWatermark

This is a new one. Open an issue and link back to #1715.

@reta
Collaborator

reta commented Jan 14, 2022

Uh ... Never seen MockDiskUsagesIT failing ... yet ...

@nknize
Collaborator Author

nknize commented Jan 14, 2022

I've never seen a PR not update after pushing to the upstream branch... yet. (⊙_◎)

@nknize
Collaborator Author

nknize commented Jan 14, 2022

Opened an issue regarding the node timeout.

I'll give the internet some time to reboot, then re-fire the gradle check.

This commit fixes TransportHandshaker to send a spoofed Legacy 7.10.2 mincompat
version to ensure OpenSearch 2.x nodes can join a Legacy 7.10.x cluster for
rolling upgrade support. Without this change 7.10.x and OpenSearch 2.x mixed
cluster bwc tests would fail.

Signed-off-by: Nicholas Walter Knize <[email protected]>
Signed-off-by: Nicholas Walter Knize <[email protected]>
nknize force-pushed the bwc/fix2xTransportHandshake branch from 78625ee to 2601e64 Compare January 14, 2022 19:03
@nknize
Collaborator Author

nknize commented Jan 14, 2022

The internet rebooted successfully and the latest commits synced. Welcome back from the coma, GitHub.

@opensearch-ci-bot
Collaborator

✅   Gradle Check success 2601e64
Log 1927

Reports 1927

dblock merged commit 81d998d into opensearch-project:main Jan 17, 2022
Labels: backport 1.x, backwards-compatibility, pending backport, :test, v2.0.0
4 participants