[Segment Replication] - Create a POC proving out mixed clusters with SR enabled. #6211

Closed
mch2 opened this issue Feb 6, 2023 · 4 comments
Labels: distributed framework, enhancement

Comments

@mch2
Member

mch2 commented Feb 6, 2023

This issue is part of breaking #3881 into smaller chunks.

The goal is a POC that supports mixed clusters and rolling upgrades with segment replication (SR) enabled, by writing segments using backward-compatible (bwc) codecs.

@Poojita-Raj
Contributor

Poojita-Raj commented Feb 16, 2023

The first step in creating this POC is to write a test that fails when a mixed cluster with segment replication enabled does not work as expected.

Starting from that test, we can work backwards to build the POC. Once the test passes with the POC, we can revisit each component and strengthen coverage with more unit and integration tests.
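
One check such a test might make is that every shard copy is started and that all copies of a shard report the same doc count. The sketch below is only an illustration, not the test from the POC branch: `check_shard_copies` is a hypothetical helper that reads the plain-text output of `_cat/shards`, and the sample rows are copied from output posted later in this thread. A real test would also need to wait for replicas to catch up before comparing counts.

```python
from collections import defaultdict


def check_shard_copies(cat_shards_text):
    """Return a list of problems found in `_cat/shards` text (empty list = healthy)."""
    problems = []
    docs_by_shard = defaultdict(set)
    for line in cat_shards_text.splitlines():
        if not line.strip():
            continue  # skip blank lines in pasted output
        fields = line.split()
        index, shard, prirep, state = fields[:4]
        if state != "STARTED":
            problems.append(f"{index}[{shard}] {prirep} is {state}")
            continue
        # For started shards the fifth column is the doc count.
        docs_by_shard[(index, shard)].add(int(fields[4]))
    for (index, shard), counts in sorted(docs_by_shard.items()):
        if len(counts) > 1:
            problems.append(
                f"{index}[{shard}] copies disagree on doc count: {sorted(counts)}"
            )
    return problems


healthy = """
my-index-000001 0 p STARTED 2  9.7kb 172.31.39.233 node2
my-index-000001 0 r STARTED 2  9.7kb 172.31.36.82  node3
my-index-000001 0 r STARTED 2  9.7kb 172.31.41.190 node1
"""
print(check_shard_copies(healthy))
```

An empty list means every copy is started and the doc counts agree; any other result names the shard that is out of line.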

@Poojita-Raj
Contributor

Current work for the POC is on this branch: https://github.com/Poojita-Raj/OpenSearch/tree/rollingUpgrade

@Poojita-Raj
Contributor

Poojita-Raj commented Mar 9, 2023

I ran the config below without the rolling upgrade POC changes to see how segment replication currently fails. The error to note is on node3, which cannot load the Lucene95 codec once the other two nodes are upgraded to 2.6.

I then retried the same config with the rolling upgrade POC change. No such error shows up, and in a mixed cluster state (2 nodes on 2.6 and 1 on 2.5, with differing Lucene codec versions) we can still index and search docs correctly. This POC fix works for the default Lucene codecs only.
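
The idea of writing segments with a bwc codec can be sketched as choosing the codec of the oldest node in the cluster, so that every replica can read what the primary writes. This is a minimal illustration and not the code on the POC branch. The names `CODEC_BY_OPENSEARCH_VERSION`, `parse_version`, and `codec_for_cluster` are hypothetical, and the mapping covers only the two versions mentioned in this thread.

```python
# Hypothetical mapping, built only from the versions named in this thread.
CODEC_BY_OPENSEARCH_VERSION = {
    (2, 5): "Lucene94",  # OpenSearch 2.5 runs on Lucene 9.4
    (2, 6): "Lucene95",  # OpenSearch 2.6 runs on Lucene 9.5
}


def parse_version(text):
    """Turn '2.5.0' into (2, 5); the patch level is ignored in this sketch."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def codec_for_cluster(node_versions):
    """Pick the codec of the oldest node so all replicas can read new segments."""
    if not node_versions:
        raise ValueError("need at least one node version")
    oldest = min(parse_version(v) for v in node_versions)
    if oldest not in CODEC_BY_OPENSEARCH_VERSION:
        raise ValueError(f"no codec known for OpenSearch {oldest[0]}.{oldest[1]}")
    return CODEC_BY_OPENSEARCH_VERSION[oldest]


mixed = ["2.6.1", "2.6.1", "2.5.0"]     # the mixed cluster state described above
upgraded = ["2.6.1", "2.6.1", "2.6.1"]  # after the rolling upgrade completes
print(codec_for_cluster(mixed))     # Lucene94
print(codec_for_cluster(upgraded))  # Lucene95
```

Once the last node is upgraded, the minimum version moves forward and new segments can use the newer codec again.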

3 nodes (OS 2.5 using lucene 9.4)  
upgrade 2 nodes to OS 2.6 
2 replicas
 
 
$ curl localhost:9200/_cat/nodes?v
ip            heap.percent ram.percent cpu load_1m load_5m load_15m node.role node.roles                                        cluster_manager name
172.31.39.233           24          24   0    0.00    0.01     0.00 dimr      cluster_manager,data,ingest,remote_cluster_client *               node2
172.31.36.82            16          21   0    0.00    0.00     0.00 dimr      cluster_manager,data,ingest,remote_cluster_client -               node3
172.31.41.190           16          56   0    0.00    0.00     0.00 dimr      cluster_manager,data,ingest,remote_cluster_client -               node1
 
$ curl localhost:9200/_cat/allocation?v
shards disk.indices disk.used disk.avail disk.total disk.percent host          ip            node
     2         416b     3.2gb    146.6gb    149.9gb            2 172.31.36.82  172.31.36.82  node3
     2         416b    10.7gb    139.2gb    149.9gb            7 172.31.41.190 172.31.41.190 node1
     2         416b    12.9gb    136.9gb    149.9gb            8 172.31.39.233 172.31.39.233 node2
 
$ curl localhost:9200/_cat/shards
my-index-000001 0 p STARTED 0 208b 172.31.39.233 node2
my-index-000001 0 r STARTED 0 208b 172.31.36.82  node3
my-index-000001 0 r STARTED 0 208b 172.31.41.190 node1
my-index-000003 0 p STARTED 0 208b 172.31.39.233 node2
my-index-000003 0 r STARTED 0 208b 172.31.36.82  node3
my-index-000003 0 r STARTED 0 208b 172.31.41.190 node1
my-index-000002 0 p STARTED 0 208b 172.31.39.233 node2
my-index-000002 0 r STARTED 0 208b 172.31.36.82  node3
my-index-000002 0 r STARTED 0 208b 172.31.41.190 node1
 
$ curl localhost:9200/_cat/indices?v
health status index           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   my-index-000003 r659AOcLQDK7Fxb27STC1A   1   2          0            0       624b           208b
green  open   my-index-000002 bU_S1_VxQyeuq7RxAF-bqA   1   2          0            0       624b           208b
green  open   my-index-000001 1axQbVvRSl65aEUKR9Vsxg   1   2          0            0       624b           208b 
 
$ curl localhost:9200/_cat/shards
my-index-000001 0 p STARTED 2  9.7kb 172.31.39.233 node2
my-index-000001 0 r STARTED 2  9.7kb 172.31.36.82  node3
my-index-000001 0 r STARTED 2  9.7kb 172.31.41.190 node1
my-index-000003 0 p STARTED 3 14.5kb 172.31.39.233 node2
my-index-000003 0 r STARTED 3 14.5kb 172.31.36.82  node3
my-index-000003 0 r STARTED 3 14.5kb 172.31.41.190 node1
my-index-000002 0 p STARTED 2  9.7kb 172.31.39.233 node2
my-index-000002 0 r STARTED 2  9.7kb 172.31.36.82  node3
my-index-000002 0 r STARTED 2  9.7kb 172.31.41.190 node1 
 
 
$ curl localhost:9200/_cat/shards 
my-index-000003 0 r STARTED 3 14.8kb 172.31.39.233 node1
my-index-000003 0 r STARTED 3 14.8kb 172.31.36.82  node3
my-index-000003 0 p STARTED 3 14.8kb 172.31.41.190 node1
my-index-000001 0 r STARTED 2  9.9kb 172.31.39.233 node1
my-index-000001 0 r STARTED 2  9.9kb 172.31.36.82  node3
my-index-000001 0 p STARTED 2  9.9kb 172.31.41.190 node1
my-index-000002 0 r STARTED 2  9.9kb 172.31.39.233 node1
my-index-000002 0 r STARTED 2  9.9kb 172.31.36.82  node3
my-index-000002 0 p STARTED 2  9.9kb 172.31.41.190 node1 
 
node3 is still on OpenSearch 2.5.0 (Lucene 9.4.2).
 
After indexing another doc, node3 logged:
 
[2023-03-08T22:03:30,951][WARN ][o.o.i.c.IndicesClusterStateService] [node3] [my-index-000003][0] marking and sending shard failed due to [shard failure, reason [replication failure]]
org.opensearch.indices.replication.common.ReplicationFailedException: [my-index-000003][0]: Replication failed on  (failed to clean after replication)
	at org.opensearch.indices.replication.SegmentReplicationTarget.lambda$finalizeReplication$4(SegmentReplicationTarget.java:254) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.action.ActionListener.completeWith(ActionListener.java:342) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.indices.replication.SegmentReplicationTarget.finalizeReplication(SegmentReplicationTarget.java:209) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.indices.replication.SegmentReplicationTarget.lambda$startReplication$2(SegmentReplicationTarget.java:170) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.action.ActionListener$1.onResponse(ActionListener.java:80) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.common.util.concurrent.ListenableFuture$1.doRun(ListenableFuture.java:126) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.common.util.concurrent.OpenSearchExecutors$DirectExecutorService.execute(OpenSearchExecutors.java:343) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.common.util.concurrent.ListenableFuture.notifyListener(ListenableFuture.java:120) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.common.util.concurrent.ListenableFuture.lambda$done$0(ListenableFuture.java:112) ~[opensearch-2.5.0.jar:2.5.0]
	at java.util.ArrayList.forEach(ArrayList.java:1511) ~[?:?]
	at org.opensearch.common.util.concurrent.ListenableFuture.done(ListenableFuture.java:112) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.common.util.concurrent.BaseFuture.set(BaseFuture.java:160) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.common.util.concurrent.ListenableFuture.onResponse(ListenableFuture.java:141) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.action.StepListener.innerOnResponse(StepListener.java:77) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.action.NotifyOnceListener.onResponse(NotifyOnceListener.java:55) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.action.ActionListener$4.onResponse(ActionListener.java:180) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.action.ActionListener$6.onResponse(ActionListener.java:299) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.action.support.RetryableAction$RetryingListener.onResponse(RetryableAction.java:181) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:69) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1404) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.transport.InboundHandler.doHandleResponse(InboundHandler.java:393) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.transport.InboundHandler.lambda$handleResponse$1(InboundHandler.java:387) ~[opensearch-2.5.0.jar:2.5.0]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:747) [opensearch-2.5.0.jar:2.5.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.lang.IllegalArgumentException: Could not load codec 'Lucene95'. Did you forget to add lucene-backward-codecs.jar?
	at org.apache.lucene.index.SegmentInfos.readCodec(SegmentInfos.java:515) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.SegmentInfos.parseSegmentInfos(SegmentInfos.java:404) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:363) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:310) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.opensearch.indices.replication.SegmentReplicationTarget.lambda$finalizeReplication$4(SegmentReplicationTarget.java:218) ~[opensearch-2.5.0.jar:2.5.0]
	... 26 more
	Suppressed: org.apache.lucene.index.CorruptIndexException: checksum passed (a859c5b). possibly transient resource issue, or a Lucene or JVM bug (resource=BufferedChecksumIndexInput(SegmentInfos))
		at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:500) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
		at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:370) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
		at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:310) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
		at org.opensearch.indices.replication.SegmentReplicationTarget.lambda$finalizeReplication$4(SegmentReplicationTarget.java:218) ~[opensearch-2.5.0.jar:2.5.0]
		at org.opensearch.action.ActionListener.completeWith(ActionListener.java:342) ~[opensearch-2.5.0.jar:2.5.0]
		at org.opensearch.indices.replication.SegmentReplicationTarget.finalizeReplication(SegmentReplicationTarget.java:209) ~[opensearch-2.5.0.jar:2.5.0]
		at org.opensearch.indices.replication.SegmentReplicationTarget.lambda$startReplication$2(SegmentReplicationTarget.java:170) ~[opensearch-2.5.0.jar:2.5.0]
		at org.opensearch.action.ActionListener$1.onResponse(ActionListener.java:80) ~[opensearch-2.5.0.jar:2.5.0]
		at org.opensearch.common.util.concurrent.ListenableFuture$1.doRun(ListenableFuture.java:126) ~[opensearch-2.5.0.jar:2.5.0]
		at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.5.0.jar:2.5.0]
		at org.opensearch.common.util.concurrent.OpenSearchExecutors$DirectExecutorService.execute(OpenSearchExecutors.java:343) ~[opensearch-2.5.0.jar:2.5.0]
		at org.opensearch.common.util.concurrent.ListenableFuture.notifyListener(ListenableFuture.java:120) ~[opensearch-2.5.0.jar:2.5.0]
		at org.opensearch.common.util.concurrent.ListenableFuture.lambda$done$0(ListenableFuture.java:112) ~[opensearch-2.5.0.jar:2.5.0]
		at java.util.ArrayList.forEach(ArrayList.java:1511) ~[?:?]
		at org.opensearch.common.util.concurrent.ListenableFuture.done(ListenableFuture.java:112) ~[opensearch-2.5.0.jar:2.5.0]
		at org.opensearch.common.util.concurrent.BaseFuture.set(BaseFuture.java:160) ~[opensearch-2.5.0.jar:2.5.0]
		at org.opensearch.common.util.concurrent.ListenableFuture.onResponse(ListenableFuture.java:141) ~[opensearch-2.5.0.jar:2.5.0]
		at org.opensearch.action.StepListener.innerOnResponse(StepListener.java:77) ~[opensearch-2.5.0.jar:2.5.0]
		at org.opensearch.action.NotifyOnceListener.onResponse(NotifyOnceListener.java:55) ~[opensearch-2.5.0.jar:2.5.0]
		at org.opensearch.action.ActionListener$4.onResponse(ActionListener.java:180) ~[opensearch-2.5.0.jar:2.5.0]
		at org.opensearch.action.ActionListener$6.onResponse(ActionListener.java:299) ~[opensearch-2.5.0.jar:2.5.0]
		at org.opensearch.action.support.RetryableAction$RetryingListener.onResponse(RetryableAction.java:181) ~[opensearch-2.5.0.jar:2.5.0]
		at org.opensearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:69) ~[opensearch-2.5.0.jar:2.5.0]
		at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1404) ~[opensearch-2.5.0.jar:2.5.0]
		at org.opensearch.transport.InboundHandler.doHandleResponse(InboundHandler.java:393) ~[opensearch-2.5.0.jar:2.5.0]
		at org.opensearch.transport.InboundHandler.lambda$handleResponse$1(InboundHandler.java:387) ~[opensearch-2.5.0.jar:2.5.0]
		at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:747) [opensearch-2.5.0.jar:2.5.0]
		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
		at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.lang.IllegalArgumentException: An SPI class of type org.apache.lucene.codecs.Codec with name 'Lucene95' does not exist.  You need to add the corresponding JAR file supporting this SPI to your classpath.  The current classpath supports the following names: [Lucene94, Lucene80, Lucene84, Lucene86, Lucene87, Lucene70, Lucene90, Lucene91, Lucene92]
	at org.apache.lucene.util.NamedSPILoader.lookup(NamedSPILoader.java:113) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.codecs.Codec.forName(Codec.java:118) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.SegmentInfos.readCodec(SegmentInfos.java:511) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.SegmentInfos.parseSegmentInfos(SegmentInfos.java:404) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:363) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:310) ~[lucene-core-9.4.2.jar:9.4.2 858d9b437047a577fa9457089afff43eefa461db - jpountz - 2022-11-17 12:56:39]
	at org.opensearch.indices.replication.SegmentReplicationTarget.lambda$finalizeReplication$4(SegmentReplicationTarget.java:218) ~[opensearch-2.5.0.jar:2.5.0]
	... 26 more
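
When triaging a trace like the one above, it can help to pull the missing codec and the supported codec names out of the final `Caused by` message. The helper below is a hypothetical log-parsing sketch, not part of OpenSearch or Lucene. It relies only on the message wording shown in this log, which may differ in other Lucene versions.

```python
import re

# Matches the wording of the NamedSPILoader message shown in the log above.
SPI_ERROR = re.compile(
    r"with name '(?P<missing>[^']+)' does not exist\."
    r".*?supports the following names: \[(?P<supported>[^\]]*)\]"
)


def parse_spi_error(message):
    """Return (missing_codec, supported_codecs), or None if the text does not match."""
    match = SPI_ERROR.search(message)
    if match is None:
        return None
    supported = [
        name.strip() for name in match.group("supported").split(",") if name.strip()
    ]
    return match.group("missing"), supported


message = (
    "An SPI class of type org.apache.lucene.codecs.Codec with name 'Lucene95' "
    "does not exist.  You need to add the corresponding JAR file supporting this "
    "SPI to your classpath.  The current classpath supports the following names: "
    "[Lucene94, Lucene80, Lucene84, Lucene86, Lucene87, Lucene70, Lucene90, "
    "Lucene91, Lucene92]"
)
missing, supported = parse_spi_error(message)
# Newest codec the node can read, ordered by the numeric suffix of the name.
newest = max(supported, key=lambda name: int(name[len("Lucene"):]))
print(missing, newest)
```

For the message above this reports that node3 was asked for Lucene95 while the newest codec it can read is Lucene94, which is the mismatch the POC is meant to avoid.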
 
 
Node1 = cluster manager
 
[2023-03-08T22:08:35,296][WARN ][o.o.t.InboundHandler     ] [node1] Failed to deserialize response from [172.31.36.82/172.31.36.82:9300]
org.opensearch.transport.TransportSerializationException: Failed to deserialize response from handler [org.opensearch.transport.TransportService$ContextRestoreResponseHandler/org.opensearch.transport.TransportService$6/[cluster:monitor/nodes/stats[n]]:org.opensearch.action.support.nodes.TransportNodesAction$AsyncAction$1@326eefe9]
	at org.opensearch.transport.InboundHandler.handleResponse(InboundHandler.java:375) [opensearch-2.6.1.jar:2.6.1]
	at org.opensearch.transport.InboundHandler.messageReceived(InboundHandler.java:160) [opensearch-2.6.1.jar:2.6.1]
	at org.opensearch.transport.InboundHandler.inboundMessage(InboundHandler.java:114) [opensearch-2.6.1.jar:2.6.1]
	at org.opensearch.transport.TcpTransport.inboundMessage(TcpTransport.java:769) [opensearch-2.6.1.jar:2.6.1]
	at org.opensearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:175) [opensearch-2.6.1.jar:2.6.1]
	at org.opensearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:150) [opensearch-2.6.1.jar:2.6.1]
	at org.opensearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:115) [opensearch-2.6.1.jar:2.6.1]
	at org.opensearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:94) [transport-netty4-client-2.6.1.jar:2.6.1]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) [netty-transport-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:280) [netty-handler-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) [netty-transport-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) [netty-transport-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) [netty-transport-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) [netty-transport-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689) [netty-transport-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652) [netty-transport-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) [netty-transport-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) [netty-common-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.87.Final.jar:4.1.87.Final]
	at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.lang.IllegalStateException: Message not fully read (response) for requestId [3866], handler [org.opensearch.transport.TransportService$ContextRestoreResponseHandler/org.opensearch.transport.TransportService$6/[cluster:monitor/nodes/stats[n]]:org.opensearch.action.support.nodes.TransportNodesAction$AsyncAction$1@326eefe9], error [false]; resetting
	at org.opensearch.transport.InboundHandler.checkStreamIsFullyConsumed(InboundHandler.java:341) ~[opensearch-2.6.1.jar:2.6.1]
	at org.opensearch.transport.InboundHandler.handleResponse(InboundHandler.java:373) ~[opensearch-2.6.1.jar:2.6.1]
	... 26 more
 
 
Node2
[2023-03-08T22:18:03,759][INFO ][o.o.p.PluginsService     ] [node2] PluginService:onIndexModule index:[my-index-000001/1axQbVvRSl65aEUKR9Vsxg]
[2023-03-08T22:18:03,779][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] getMinSegLucVer = 9.4.2
[2023-03-08T22:18:03,780][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] getCommitLucVer = 9.5.0
[2023-03-08T22:18:03,780][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] version=9.4.2
[2023-03-08T22:18:03,781][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] version=9.4.2
[2023-03-08T22:18:03,781][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] version=9.5.0
[2023-03-08T22:18:03,821][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] getMinSegLucVer = 9.4.2
[2023-03-08T22:18:03,821][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] getCommitLucVer = 9.5.0
[2023-03-08T22:18:03,822][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] version=9.4.2
[2023-03-08T22:18:03,822][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] version=9.4.2
[2023-03-08T22:18:03,823][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] version=9.5.0
[2023-03-08T22:18:03,826][INFO ][o.o.i.r.SegmentReplicationTarget] [node2] [my-index-000001][0] Codec Lucene94
[2023-03-08T22:18:03,827][INFO ][o.o.i.r.SegmentReplicationTarget] [node2] [my-index-000001][0] Codec Lucene94
[2023-03-08T22:18:03,827][INFO ][o.o.i.r.SegmentReplicationTarget] [node2] [my-index-000001][0] Codec Lucene95
[2023-03-08T22:18:03,832][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] getMinSegLucVer = 9.4.2
[2023-03-08T22:18:03,832][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] getCommitLucVer = 9.5.0
[2023-03-08T22:18:03,833][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] version=9.4.2
[2023-03-08T22:18:03,835][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] version=9.4.2
[2023-03-08T22:18:03,835][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] version=9.5.0
[2023-03-08T22:18:03,839][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] getMinSegLucVer = 9.4.2
[2023-03-08T22:18:03,839][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] getCommitLucVer = 9.5.0
[2023-03-08T22:18:03,839][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] version=9.4.2
[2023-03-08T22:18:03,840][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] version=9.4.2
[2023-03-08T22:18:03,841][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] version=9.5.0
[2023-03-08T22:20:35,561][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] getMinSegLucVer = 9.4.2
[2023-03-08T22:20:35,562][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] getCommitLucVer = 9.5.0
[2023-03-08T22:20:35,562][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] version=9.4.2
[2023-03-08T22:20:35,563][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] version=9.4.2
[2023-03-08T22:20:35,564][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] version=9.5.0
[2023-03-08T22:20:35,575][INFO ][o.o.i.r.SegmentReplicationTarget] [node2] [my-index-000001][0] Codec Lucene94
[2023-03-08T22:20:35,575][INFO ][o.o.i.r.SegmentReplicationTarget] [node2] [my-index-000001][0] Codec Lucene94
[2023-03-08T22:20:35,575][INFO ][o.o.i.r.SegmentReplicationTarget] [node2] [my-index-000001][0] Codec Lucene95
[2023-03-08T22:20:35,576][INFO ][o.o.i.r.SegmentReplicationTarget] [node2] [my-index-000001][0] Codec Lucene95
[2023-03-08T22:20:35,578][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] getMinSegLucVer = 9.4.2
[2023-03-08T22:20:35,578][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] getCommitLucVer = 9.5.0
[2023-03-08T22:20:35,578][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] version=9.4.2
[2023-03-08T22:20:35,579][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] version=9.4.2
[2023-03-08T22:20:35,580][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] version=9.5.0
[2023-03-08T22:20:54,103][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] getMinSegLucVer = 9.4.2
[2023-03-08T22:20:54,103][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] getCommitLucVer = 9.5.0
[2023-03-08T22:20:54,104][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] version=9.4.2
[2023-03-08T22:20:54,104][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] version=9.4.2
[2023-03-08T22:20:54,105][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] version=9.4.2
[2023-03-08T22:20:54,105][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] version=9.5.0
[2023-03-08T22:20:54,113][INFO ][o.o.i.r.SegmentReplicationTarget] [node2] [my-index-000003][0] Codec Lucene94
[2023-03-08T22:20:54,113][INFO ][o.o.i.r.SegmentReplicationTarget] [node2] [my-index-000003][0] Codec Lucene94
[2023-03-08T22:20:54,113][INFO ][o.o.i.r.SegmentReplicationTarget] [node2] [my-index-000003][0] Codec Lucene94
[2023-03-08T22:20:54,114][INFO ][o.o.i.r.SegmentReplicationTarget] [node2] [my-index-000003][0] Codec Lucene95
[2023-03-08T22:20:54,114][INFO ][o.o.i.r.SegmentReplicationTarget] [node2] [my-index-000003][0] Codec Lucene95
[2023-03-08T22:20:54,116][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] getMinSegLucVer = 9.4.2
[2023-03-08T22:20:54,117][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] getCommitLucVer = 9.5.0
[2023-03-08T22:20:54,117][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] version=9.4.2
[2023-03-08T22:20:54,117][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] version=9.4.2
[2023-03-08T22:20:54,118][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] version=9.4.2
[2023-03-08T22:20:54,118][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] version=9.5.0
[2023-03-08T22:25:03,909][WARN ][o.o.t.InboundHandler     ] [node2] Failed to deserialize response from [172.31.36.82/172.31.36.82:9300]
org.opensearch.transport.TransportSerializationException: Failed to deserialize response from handler [org.opensearch.transport.TransportService$ContextRestoreResponseHandler/org.opensearch.transport.TransportService$6/[cluster:monitor/nodes/stats[n]]:org.opensearch.action.support.nodes.TransportNodesAction$AsyncAction$1@7dd5cc97]
	at org.opensearch.transport.InboundHandler.handleResponse(InboundHandler.java:375) [opensearch-2.6.1.jar:2.6.1]
	at org.opensearch.transport.InboundHandler.messageReceived(InboundHandler.java:160) [opensearch-2.6.1.jar:2.6.1]
	at org.opensearch.transport.InboundHandler.inboundMessage(InboundHandler.java:114) [opensearch-2.6.1.jar:2.6.1]
	at org.opensearch.transport.TcpTransport.inboundMessage(TcpTransport.java:769) [opensearch-2.6.1.jar:2.6.1]
	at org.opensearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:175) [opensearch-2.6.1.jar:2.6.1]
	at org.opensearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:150) [opensearch-2.6.1.jar:2.6.1]
	at org.opensearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:115) [opensearch-2.6.1.jar:2.6.1]
	at org.opensearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:94) [transport-netty4-client-2.6.1.jar:2.6.1]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) [netty-transport-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:280) [netty-handler-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) [netty-transport-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) [netty-transport-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) [netty-transport-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) [netty-transport-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689) [netty-transport-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652) [netty-transport-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) [netty-transport-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) [netty-common-4.1.87.Final.jar:4.1.87.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.87.Final.jar:4.1.87.Final]
	at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.lang.IllegalStateException: Message not fully read (response) for requestId [459], handler [org.opensearch.transport.TransportService$ContextRestoreResponseHandler/org.opensearch.transport.TransportService$6/[cluster:monitor/nodes/stats[n]]:org.opensearch.action.support.nodes.TransportNodesAction$AsyncAction$1@7dd5cc97], error [false]; resetting
	at org.opensearch.transport.InboundHandler.checkStreamIsFullyConsumed(InboundHandler.java:341) ~[opensearch-2.6.1.jar:2.6.1]
	at org.opensearch.transport.InboundHandler.handleResponse(InboundHandler.java:373) ~[opensearch-2.6.1.jar:2.6.1]
	... 26 more
[2023-03-08T22:25:40,183][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] getMinSegLucVer = 9.4.2
[2023-03-08T22:25:40,184][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] getCommitLucVer = 9.5.0
[2023-03-08T22:25:40,184][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] version=9.4.2
[2023-03-08T22:25:40,185][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] version=9.4.2
[2023-03-08T22:25:40,185][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] version=9.5.0
[2023-03-08T22:25:40,186][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] version=9.5.0
[2023-03-08T22:25:40,188][INFO ][o.o.i.r.SegmentReplicationTarget] [node2] [my-index-000001][0] Codec Lucene94
[2023-03-08T22:25:40,189][INFO ][o.o.i.r.SegmentReplicationTarget] [node2] [my-index-000001][0] Codec Lucene94
[2023-03-08T22:25:40,189][INFO ][o.o.i.r.SegmentReplicationTarget] [node2] [my-index-000001][0] Codec Lucene95
[2023-03-08T22:25:40,189][INFO ][o.o.i.r.SegmentReplicationTarget] [node2] [my-index-000001][0] Codec Lucene95
[2023-03-08T22:25:40,194][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] getMinSegLucVer = 9.4.2
[2023-03-08T22:25:40,194][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] getCommitLucVer = 9.5.0
[2023-03-08T22:25:40,195][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] version=9.4.2
[2023-03-08T22:25:40,196][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] version=9.4.2
[2023-03-08T22:25:40,197][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] version=9.5.0
[2023-03-08T22:25:40,197][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] version=9.5.0
[2023-03-08T22:25:40,207][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] getMinSegLucVer = 9.4.2
[2023-03-08T22:25:40,207][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] getCommitLucVer = 9.5.0
[2023-03-08T22:25:40,208][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] version=9.4.2
[2023-03-08T22:25:40,208][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] version=9.4.2
[2023-03-08T22:25:40,209][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] version=9.5.0
[2023-03-08T22:25:40,210][INFO ][o.o.i.s.Store            ] [node2] [my-index-000001][0] version=9.5.0
[2023-03-08T22:25:55,184][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] getMinSegLucVer = 9.4.2
[2023-03-08T22:25:55,185][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] getCommitLucVer = 9.5.0
[2023-03-08T22:25:55,185][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] version=9.4.2
[2023-03-08T22:25:55,185][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] version=9.4.2
[2023-03-08T22:25:55,186][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] version=9.4.2
[2023-03-08T22:25:55,186][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] version=9.5.0
[2023-03-08T22:25:55,187][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] version=9.5.0
[2023-03-08T22:25:55,190][INFO ][o.o.i.r.SegmentReplicationTarget] [node2] [my-index-000003][0] Codec Lucene94
[2023-03-08T22:25:55,190][INFO ][o.o.i.r.SegmentReplicationTarget] [node2] [my-index-000003][0] Codec Lucene94
[2023-03-08T22:25:55,190][INFO ][o.o.i.r.SegmentReplicationTarget] [node2] [my-index-000003][0] Codec Lucene94
[2023-03-08T22:25:55,191][INFO ][o.o.i.r.SegmentReplicationTarget] [node2] [my-index-000003][0] Codec Lucene95
[2023-03-08T22:25:55,191][INFO ][o.o.i.r.SegmentReplicationTarget] [node2] [my-index-000003][0] Codec Lucene95
[2023-03-08T22:25:55,197][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] getMinSegLucVer = 9.4.2
[2023-03-08T22:25:55,198][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] getCommitLucVer = 9.5.0
[2023-03-08T22:25:55,198][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] version=9.4.2
[2023-03-08T22:25:55,199][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] version=9.4.2
[2023-03-08T22:25:55,199][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] version=9.4.2
[2023-03-08T22:25:55,200][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] version=9.5.0
[2023-03-08T22:25:55,201][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] version=9.5.0
[2023-03-08T22:25:55,211][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] getMinSegLucVer = 9.4.2
[2023-03-08T22:25:55,211][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] getCommitLucVer = 9.5.0
[2023-03-08T22:25:55,212][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] version=9.4.2
[2023-03-08T22:25:55,212][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] version=9.4.2
[2023-03-08T22:25:55,213][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] version=9.4.2
[2023-03-08T22:25:55,214][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] version=9.5.0
[2023-03-08T22:25:55,214][INFO ][o.o.i.s.Store            ] [node2] [my-index-000003][0] version=9.5.0
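
To make the log lines above easier to read, here is a minimal sketch of the bookkeeping they seem to reflect. It is not the POC code. `MixedCodecSketch` and all of its methods are hypothetical names, it uses only the JDK (no Lucene or OpenSearch classes), and it assumes that each `version=` line is the Lucene version of one segment. The version-to-codec naming (9.4.x to `Lucene94`, 9.5.x to `Lucene95`) matches what these logs show but is not guaranteed for other Lucene releases.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of OpenSearch or Lucene.
final class MixedCodecSketch {

    // Parse a dotted version such as "9.4.2" into {9, 4, 2}.
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        int[] out = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    // Numeric ordering, so that "9.10.0" sorts after "9.4.2".
    static final Comparator<String> BY_VERSION = (a, b) -> {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < Math.min(x.length, y.length); i++) {
            if (x[i] != y[i]) {
                return Integer.compare(x[i], y[i]);
            }
        }
        return Integer.compare(x.length, y.length);
    };

    // Oldest Lucene version among a shard's segments (cf. getMinSegLucVer in the logs).
    static String minSegmentVersion(List<String> segmentVersions) {
        return segmentVersions.stream().min(BY_VERSION).orElseThrow();
    }

    // ASSUMPTION: codec name is "Lucene" + major + minor, as seen in these logs only.
    static String assumedCodecName(String version) {
        int[] v = parse(version);
        return "Lucene" + v[0] + v[1];
    }

    // A node is treated as unable to read a segment written by a newer major.minor,
    // which is the situation described for the node that cannot load Lucene95.
    static boolean canReadAll(String nodeVersion, List<String> segmentVersions) {
        int[] node = parse(nodeVersion);
        for (String s : segmentVersions) {
            int[] seg = parse(s);
            boolean newer = seg[0] > node[0] || (seg[0] == node[0] && seg[1] > node[1]);
            if (newer) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Segment versions as logged for my-index-000001 on node2.
        List<String> segments = List.of("9.4.2", "9.4.2", "9.5.0", "9.5.0");
        System.out.println("min segment version: " + minSegmentVersion(segments));
        System.out.println("older node can read all: " + canReadAll("9.4.2", segments));
        System.out.println("newer node can read all: " + canReadAll("9.5.0", segments));
    }
}
```

With the versions logged for `my-index-000001`, the check fails for a node on Lucene 9.4.2 and passes for a node on 9.5.0. That is the gap the POC is meant to close by writing segments with a bwc codec while the cluster is mixed.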

@github-project-automation github-project-automation bot moved this from In Progress to Done in Segment Replication Mar 10, 2023
@Poojita-Raj Poojita-Raj added the POC (Label for Proof of Concept) label Jun 13, 2023
@github-actions
Contributor

POC Checklist:

Please go through the following checklist to ensure these items are taken into account while designing the POC.

  • Supports safe upgrade paths from all supported BWC versions to the current version
  • Supports compatibility with all plugins
    • opensearch-alerting
    • opensearch-anomaly-detection
    • opensearch-asynchronous-search
    • opensearch-cross-cluster-replication
    • opensearch-geospatial
    • opensearch-index-management
    • opensearch-job-scheduler
    • opensearch-knn
    • opensearch-ml
    • opensearch-notifications
    • opensearch-notifications-core
    • opensearch-observability
    • opensearch-performance-analyzer
    • opensearch-reports-scheduler
    • opensearch-security
    • opensearch-sql
  • Supports Lucene upgrades across minor Lucene versions
  • Supports Lucene upgrades across major Lucene versions
  • Supports Lucene upgrades across underlying Lucene codec bumps (e.g., Lucene95Codec -> Lucene96Codec)
  • Supports wire compatibility of OpenSearch
  • Plan to measure performance degradation/improvement (if any)
  • Plan to document any user-facing changes introduced by this feature
  • Ensure working and passing CI

Thank you for your contribution!

@Poojita-Raj Poojita-Raj removed the POC (Label for Proof of Concept) label Jun 13, 2023