
[BUG] java.io.OptionalDataException error in OpenSearch 2.0 rc #1961

Closed

HyunSeaHoon opened this issue May 19, 2022 · 51 comments
Labels: bug (Something isn't working)

Comments

@HyunSeaHoon commented May 19, 2022

Describe the bug
I am collecting data into OpenSearch with Logstash.
OpenSearch throws a java.io.OptionalDataException error during normal operation.

This did not happen in our OpenSearch 1.3 environment.

The errors occur irregularly and sometimes stop on their own after some time.

Data is collected into different indexes depending on the data type, and some indexes continue to receive data while the error occurs.
The failing data is not limited to a specific index; a variety of indexes are affected.

To Reproduce

Expected behavior

Plugins

Screenshots

Host/Environment (please complete the following information):

  • OS: CentOS Linux release 7.9.2009
  • Logstash: 7.16.2
  • Version: OpenSearch runs in Docker; the images used are opensearchproject/opensearch:1.3.0 and opensearchproject/opensearch:2.0.0-rc1

Additional context
[OpenSearch Log]

2022-05-18T23:44:42.965700981Z [2022-05-18T23:44:42,965][INFO ][o.o.j.s.JobSweeper       ] [91616f4bd951] Running full sweep
2022-05-18T23:49:42.966011507Z [2022-05-18T23:49:42,965][INFO ][o.o.j.s.JobSweeper       ] [91616f4bd951] Running full sweep
2022-05-18T23:49:43.072676353Z [2022-05-18T23:49:43,072][INFO ][o.o.a.t.CronTransportAction] [91616f4bd951] Start running AD hourly cron.
2022-05-18T23:49:43.072820561Z [2022-05-18T23:49:43,072][INFO ][o.o.a.t.ADTaskManager    ] [91616f4bd951] Start to maintain running historical tasks
2022-05-18T23:49:43.073109887Z [2022-05-18T23:49:43,073][INFO ][o.o.a.c.HourlyCron       ] [91616f4bd951] Hourly maintenance succeeds
2022-05-18T23:50:30.021645243Z [2022-05-18T23:50:30,012][WARN ][r.suppressed             ] [91616f4bd951] path: /indexname-*/_search, params: {index=indexname-*}
2022-05-18T23:50:30.021674167Z org.opensearch.action.search.SearchPhaseExecutionException: all shards failed
2022-05-18T23:50:30.021678359Z 	at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:642) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021681538Z 	at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:360) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021684355Z 	at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:677) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021687051Z 	at org.opensearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:457) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021689946Z 	at org.opensearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:291) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021692560Z 	at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:72) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021695600Z 	at org.opensearch.transport.TransportService$6.handleException(TransportService.java:735) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021707976Z 	at org.opensearch.security.transport.SecurityInterceptor$RestoringTransportResponseHandler.handleException(SecurityInterceptor.java:318) [opensearch-security-2.0.0.0-rc1.jar:2.0.0.0-rc1]
2022-05-18T23:50:30.021711627Z 	at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1350) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021714318Z 	at org.opensearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1459) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021717064Z 	at org.opensearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1433) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021719769Z 	at org.opensearch.transport.TransportService.sendLocalRequest(TransportService.java:967) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021723529Z 	at org.opensearch.transport.TransportService$3.sendRequest(TransportService.java:147) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021726036Z 	at org.opensearch.transport.TransportService.sendRequestInternal(TransportService.java:869) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021728354Z 	at org.opensearch.security.transport.SecurityInterceptor.sendRequestDecorate(SecurityInterceptor.java:212) [opensearch-security-2.0.0.0-rc1.jar:2.0.0.0-rc1]
2022-05-18T23:50:30.021731148Z 	at org.opensearch.security.OpenSearchSecurityPlugin$7$2.sendRequest(OpenSearchSecurityPlugin.java:665) [opensearch-security-2.0.0.0-rc1.jar:2.0.0.0-rc1]
2022-05-18T23:50:30.021733902Z 	at org.opensearch.transport.TransportService.sendRequest(TransportService.java:756) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021736648Z 	at org.opensearch.transport.TransportService.sendChildRequest(TransportService.java:831) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021739309Z 	at org.opensearch.action.search.SearchTransportService.sendCanMatch(SearchTransportService.java:149) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021742052Z 	at org.opensearch.action.search.CanMatchPreFilterSearchPhase.executePhaseOnShard(CanMatchPreFilterSearchPhase.java:128) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021744771Z 	at org.opensearch.action.search.AbstractSearchAsyncAction.lambda$performPhaseOnShard$3(AbstractSearchAsyncAction.java:278) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021747524Z 	at org.opensearch.action.search.AbstractSearchAsyncAction.performPhaseOnShard(AbstractSearchAsyncAction.java:312) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021750314Z 	at org.opensearch.action.search.AbstractSearchAsyncAction.run(AbstractSearchAsyncAction.java:249) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021753108Z 	at org.opensearch.action.search.AbstractSearchAsyncAction.executePhase(AbstractSearchAsyncAction.java:415) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021755926Z 	at org.opensearch.action.search.AbstractSearchAsyncAction.start(AbstractSearchAsyncAction.java:215) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021758723Z 	at org.opensearch.action.search.TransportSearchAction.executeSearch(TransportSearchAction.java:994) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021764118Z 	at org.opensearch.action.search.TransportSearchAction.executeLocalSearch(TransportSearchAction.java:757) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021767219Z 	at org.opensearch.action.search.TransportSearchAction.lambda$executeRequest$3(TransportSearchAction.java:398) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021770070Z 	at org.opensearch.action.ActionListener$1.onResponse(ActionListener.java:78) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021772833Z 	at org.opensearch.index.query.Rewriteable.rewriteAndFetch(Rewriteable.java:136) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021775554Z 	at org.opensearch.index.query.Rewriteable.rewriteAndFetch(Rewriteable.java:101) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021778319Z 	at org.opensearch.action.search.TransportSearchAction.executeRequest(TransportSearchAction.java:487) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021781263Z 	at org.opensearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:277) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021784064Z 	at org.opensearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:120) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021786868Z 	at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:194) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021789576Z 	at org.opensearch.indexmanagement.rollup.actionfilter.FieldCapsFilter.apply(FieldCapsFilter.kt:118) [opensearch-index-management-2.0.0.0-rc1.jar:2.0.0.0-rc1]
2022-05-18T23:50:30.021792191Z 	at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:192) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021794631Z 	at org.opensearch.performanceanalyzer.action.PerformanceAnalyzerActionFilter.apply(PerformanceAnalyzerActionFilter.java:78) [opensearch-performance-analyzer-2.0.0.0-rc1.jar:2.0.0.0-rc1]
2022-05-18T23:50:30.021797236Z 	at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:192) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021800054Z 	at org.opensearch.security.filter.SecurityFilter.apply0(SecurityFilter.java:325) [opensearch-security-2.0.0.0-rc1.jar:2.0.0.0-rc1]
2022-05-18T23:50:30.021802755Z 	at org.opensearch.security.filter.SecurityFilter.apply(SecurityFilter.java:157) [opensearch-security-2.0.0.0-rc1.jar:2.0.0.0-rc1]
2022-05-18T23:50:30.021806267Z 	at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:192) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021809178Z 	at org.opensearch.action.support.TransportAction.execute(TransportAction.java:169) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021811919Z 	at org.opensearch.action.support.TransportAction.execute(TransportAction.java:97) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021814537Z 	at org.opensearch.client.node.NodeClient.executeLocally(NodeClient.java:108) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021819342Z 	at org.opensearch.rest.action.RestCancellableNodeClient.doExecute(RestCancellableNodeClient.java:104) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021822194Z 	at org.opensearch.client.support.AbstractClient.execute(AbstractClient.java:425) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021824698Z 	at org.opensearch.rest.action.search.RestSearchAction.lambda$prepareRequest$2(RestSearchAction.java:130) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021827375Z 	at org.opensearch.rest.BaseRestHandler.handleRequest(BaseRestHandler.java:123) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021830341Z 	at org.opensearch.security.filter.SecurityRestFilter$1.handleRequest(SecurityRestFilter.java:128) [opensearch-security-2.0.0.0-rc1.jar:2.0.0.0-rc1]
2022-05-18T23:50:30.021832662Z 	at org.opensearch.rest.RestController.dispatchRequest(RestController.java:306) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021835200Z 	at org.opensearch.rest.RestController.tryAllHandlers(RestController.java:392) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021837529Z 	at org.opensearch.rest.RestController.dispatchRequest(RestController.java:235) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021840258Z 	at org.opensearch.security.ssl.http.netty.ValidatingDispatcher.dispatchRequest(ValidatingDispatcher.java:63) [opensearch-security-2.0.0.0-rc1.jar:2.0.0.0-rc1]
2022-05-18T23:50:30.021843001Z 	at org.opensearch.http.AbstractHttpServerTransport.dispatchRequest(AbstractHttpServerTransport.java:361) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021845584Z 	at org.opensearch.http.AbstractHttpServerTransport.handleIncomingRequest(AbstractHttpServerTransport.java:440) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021848392Z 	at org.opensearch.http.AbstractHttpServerTransport.incomingRequest(AbstractHttpServerTransport.java:351) [opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021851677Z 	at org.opensearch.http.netty4.Netty4HttpRequestHandler.channelRead0(Netty4HttpRequestHandler.java:55) [transport-netty4-client-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021854379Z 	at org.opensearch.http.netty4.Netty4HttpRequestHandler.channelRead0(Netty4HttpRequestHandler.java:41) [transport-netty4-client-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021857188Z 	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021859832Z 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021862547Z 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021865145Z 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021868012Z 	at org.opensearch.http.netty4.Netty4HttpPipeliningHandler.channelRead(Netty4HttpPipeliningHandler.java:71) [transport-netty4-client-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.021873155Z 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021876009Z 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021878647Z 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021881288Z 	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) [netty-codec-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021884253Z 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021887006Z 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021890227Z 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021893150Z 	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) [netty-codec-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021895772Z 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021898247Z 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021900791Z 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021903611Z 	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) [netty-codec-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021906357Z 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021909121Z 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021911671Z 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021914332Z 	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:327) [netty-codec-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021917065Z 	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:299) [netty-codec-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021919797Z 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021924273Z 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021927154Z 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021930364Z 	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) [netty-handler-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021933214Z 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021935825Z 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021938494Z 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021941185Z 	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) [netty-codec-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021943772Z 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021946507Z 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021949096Z 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021951894Z 	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1371) [netty-handler-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021954600Z 	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1234) [netty-handler-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021957283Z 	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1283) [netty-handler-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021959941Z 	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:510) [netty-codec-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021962822Z 	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:449) [netty-codec-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021965687Z 	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:279) [netty-codec-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021968924Z 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021971771Z 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021974631Z 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021979112Z 	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021982181Z 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021984750Z 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021987493Z 	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021990170Z 	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021992886Z 	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021995570Z 	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:623) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.021998068Z 	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:586) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.022005132Z 	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) [netty-transport-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.022007714Z 	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [netty-common-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.022010101Z 	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.73.Final.jar:4.1.73.Final]
2022-05-18T23:50:30.022013231Z 	at java.lang.Thread.run(Thread.java:833) [?:?]
2022-05-18T23:50:30.022015580Z Caused by: org.opensearch.OpenSearchException: java.io.OptionalDataException
2022-05-18T23:50:30.022020352Z 	at org.opensearch.security.support.Base64Helper.deserializeObject(Base64Helper.java:185) ~[?:?]
2022-05-18T23:50:30.022023058Z 	at org.opensearch.security.transport.SecurityRequestHandler.messageReceivedDecorate(SecurityRequestHandler.java:155) ~[?:?]
2022-05-18T23:50:30.022025907Z 	at org.opensearch.security.ssl.transport.SecuritySSLRequestHandler.messageReceived(SecuritySSLRequestHandler.java:97) ~[?:?]
2022-05-18T23:50:30.022028556Z 	at org.opensearch.security.OpenSearchSecurityPlugin$7$1.messageReceived(OpenSearchSecurityPlugin.java:651) ~[?:?]
2022-05-18T23:50:30.022031217Z 	at org.opensearch.indexmanagement.rollup.interceptor.RollupInterceptor$interceptHandler$1.messageReceived(RollupInterceptor.kt:118) ~[?:?]
2022-05-18T23:50:30.022033939Z 	at org.opensearch.performanceanalyzer.transport.PerformanceAnalyzerTransportRequestHandler.messageReceived(PerformanceAnalyzerTransportRequestHandler.java:43) ~[?:?]
2022-05-18T23:50:30.022036620Z 	at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:98) ~[opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.022042345Z 	at org.opensearch.transport.TransportService.sendLocalRequest(TransportService.java:931) ~[opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.022045459Z 	... 101 more
2022-05-18T23:50:30.022047895Z Caused by: java.io.OptionalDataException
2022-05-18T23:50:30.022050506Z 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1767) ~[?:?]
2022-05-18T23:50:30.022053172Z 	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:514) ~[?:?]
2022-05-18T23:50:30.022055439Z 	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:472) ~[?:?]
2022-05-18T23:50:30.022057905Z 	at java.util.HashSet.readObject(HashSet.java:345) ~[?:?]
2022-05-18T23:50:30.022060408Z 	at jdk.internal.reflect.GeneratedMethodAccessor36.invoke(Unknown Source) ~[?:?]
2022-05-18T23:50:30.022062944Z 	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
2022-05-18T23:50:30.022065516Z 	at java.lang.reflect.Method.invoke(Method.java:568) ~[?:?]
2022-05-18T23:50:30.022068083Z 	at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1231) ~[?:?]
2022-05-18T23:50:30.022070808Z 	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2434) ~[?:?]
2022-05-18T23:50:30.022073271Z 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2268) ~[?:?]
2022-05-18T23:50:30.022075787Z 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1744) ~[?:?]
2022-05-18T23:50:30.022078237Z 	at java.io.ObjectInputStream$FieldValues.<init>(ObjectInputStream.java:2617) ~[?:?]
2022-05-18T23:50:30.022081688Z 	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2468) ~[?:?]
2022-05-18T23:50:30.022087904Z 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2268) ~[?:?]
2022-05-18T23:50:30.022090864Z 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1744) ~[?:?]
2022-05-18T23:50:30.022093895Z 	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:514) ~[?:?]
2022-05-18T23:50:30.022096278Z 	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:472) ~[?:?]
2022-05-18T23:50:30.022098911Z 	at org.opensearch.security.support.Base64Helper.deserializeObject(Base64Helper.java:183) ~[?:?]
2022-05-18T23:50:30.022101744Z 	at org.opensearch.security.transport.SecurityRequestHandler.messageReceivedDecorate(SecurityRequestHandler.java:155) ~[?:?]
2022-05-18T23:50:30.022104165Z 	at org.opensearch.security.ssl.transport.SecuritySSLRequestHandler.messageReceived(SecuritySSLRequestHandler.java:97) ~[?:?]
2022-05-18T23:50:30.022106691Z 	at org.opensearch.security.OpenSearchSecurityPlugin$7$1.messageReceived(OpenSearchSecurityPlugin.java:651) ~[?:?]
2022-05-18T23:50:30.022109481Z 	at org.opensearch.indexmanagement.rollup.interceptor.RollupInterceptor$interceptHandler$1.messageReceived(RollupInterceptor.kt:118) ~[?:?]
2022-05-18T23:50:30.022112185Z 	at org.opensearch.performanceanalyzer.transport.PerformanceAnalyzerTransportRequestHandler.messageReceived(PerformanceAnalyzerTransportRequestHandler.java:43) ~[?:?]
2022-05-18T23:50:30.022117725Z 	at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:98) ~[opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.022119699Z 	at org.opensearch.transport.TransportService.sendLocalRequest(TransportService.java:931) ~[opensearch-2.0.0-rc1.jar:2.0.0-rc1]
2022-05-18T23:50:30.022121470Z 	... 101 more
@HyunSeaHoon added the bug (Something isn't working) and untriaged (Require the attention of the repository maintainers and may need to be prioritized) labels May 19, 2022
@adnapibar

@HyunSeaHoon Thanks for reporting the issue. Can you please provide steps to reproduce the issue?

@dblock (Member) commented May 23, 2022

@adnapibar ^ says "The errors occur irregularly and sometimes stop on their own after some time." I am afraid we have to work backwards from the error to understand how it could happen.

@adnapibar

@dblock Yes, agreed! But I think we need more details to help us reproduce this error. Also, does this error happen with other API calls during that time? I see one discussion on ES 7.9 reporting a similar issue (not with Logstash), https://discuss.elastic.co/t/temp-downtimes-optionaldataexception-500-on-all-apis-aws/257571, which may be completely unrelated.

@HyunSeaHoon (Author)

Thank you for your interest.

Our service collects logs with Logstash, stores them in Elasticsearch, and retrieves logs with a Node.js-based web server.

We are migrating from Elasticsearch (with or without the Open Distro plugin) to OpenSearch.

Logstash is the OSS version 7.16.2, and the output plugin is logstash-output-opensearch (1.2.0).

The web server uses the JavaScript client elastic/elasticsearch.

(Screenshot attached: Screen Shot 2022-05-24 at 12 55 31 PM)

As far as I know, the OptionalDataException error has so far only been confirmed in the OpenSearch 2.0 RC environment.

Based on the symptoms in our test environment, it seems to occur as follows:

  1. OptionalDataException seems to be thrown first in Logstash during log collection.

  2. It also happens through the OpenSearch search API (it occurs when a request is made from the web server to the OpenSearch server).

  3. Logs are collected in several types, and the log types in which OptionalDataException occurs vary. It seemed at first that the affected logs had an object type in their mappings, but, for example, in the monthly logindex, logindex-April had an object type mapping while logindex-May did not, and the error also occurred while inserting data without an object type. (The important point is that OptionalDataException is thrown randomly.)

  4. In the existing OpenSearch/Elasticsearch setup, vm.max_map_count was not set explicitly (the default of 65,530 seems to have applied).
    After OptionalDataException was confirmed in the OpenSearch 2.0 RC1 environment, changing this value to the recommended 262,144 seemed to make OptionalDataException occur less often (see the sketch after this list).

  5. After changing vm.max_map_count, I no longer saw OptionalDataException errors when searching from the web server.

  6. Before vm.max_map_count was changed, there were cases where, while an OptionalDataException was occurring, login was unavailable even in OpenSearch Dashboards
    (500 status code, Internal Server Error).

  7. Before vm.max_map_count was changed, once an OptionalDataException occurred, the symptom seemed to persist until the OpenSearch service was restarted.

  8. After changing vm.max_map_count, I monitored the occurrence of OptionalDataException (only in Logstash, not when querying from the web server). It occurs about 5 times per minute for a while, then stops for a while, and then recurs irregularly.

  9. Normally, when _cat/indices is queried, the response looks like this:
    health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
    green open indexname JermuzKkTAa0dL7AxZ6l3A 1 0 449683 0 418.6mb 418.6mb
    When an OptionalDataException is occurring in Logstash and _cat/indices is queried via curl, the document and size information may be missing for all indexes, as follows:
    health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
    green open indexname JermuzKkTAa0dL7AxZ6l3A
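For reference, the kernel setting mentioned in item 4 is typically changed on the Docker host along these lines (a sketch using the values quoted above):

# Check the current value on the Docker host
sysctl vm.max_map_count
# Apply the recommended value until the next reboot
sudo sysctl -w vm.max_map_count=262144
# Persist the setting across reboots
echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf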

I wonder how common this symptom is, and whether it is an OpenSearch bug, a problem with the library versions we use, or a problem with our data structure.

So far it seems difficult to predict what triggers it.

To be clear, I have not experienced it with any version other than OpenSearch 2.0 RC1.

@adnapibar added the distributed framework label and removed the untriaged (Require the attention of the repository maintainers and may need to be prioritized) label May 24, 2022
@louzadod commented May 30, 2022

It happened with the 2.0.0 official release too. I migrated my cluster at 8:30 and the problem started at 9:50 and took 1 hour to stop. It recovered on its own (no intervention), and I don't know if it is going to come back.
This is critical for us, and it is now difficult to revert the upgrade.

@HyunSeaHoon (Author) commented May 31, 2022

The same symptoms appear in the final release of OpenSearch 2.0.

The environment file used for testing is attached.

ostest.tar.gz

I re-checked the following observation, and the error did occur in that case after all:

> After changing vm.max_map_count, I no longer saw OptionalDataException errors when searching from the web server.

After all the services are up (docker-compose up -d), an OptionalDataException is thrown at some point after a certain amount of time has passed (sometimes within 1 hour, sometimes only after 10 hours, etc.).

The peculiar thing is that, when monitoring with only the services running, the OptionalDataException appears in Logstash, but no corresponding log is visible in OpenSearch.

At that time, accessing the OpenSearch Dashboards UI returns a 500 error, and an OptionalDataException appears in both the OpenSearch Dashboards log and the OpenSearch log.

@tlfeng commented May 31, 2022

Hi @HyunSeaHoon, thank you so much for providing such detailed steps to reproduce; I appreciate your time.
And thanks @louzadod for reporting the issue as well.
I will discuss the issue with the other maintainers and treat it as a high priority.

@louzadod commented Jun 1, 2022

Hello @tlfeng . The architecture we adopted is as follows:
[Filebeat] --> [Logstash] --> [OS Client Nodes] --> [OS Data Nodes]

When the problem happens, the logs from Logstash to the OpenSearch client nodes stop flowing. I assumed the problem was on the Logstash side and tried to restart it. After the Logstash restart, logs were still not flowing.

Only when I restarted the OS client nodes did the logs start to flow again. It looks like a problem on the OpenSearch side.

@amalgamm commented Jun 7, 2022

Hi! Any update on this one?
I've got the same problem after upgrading from 1.2.4 to 2.0.0. I've got 2 data nodes and 1 master node. Several times a day this happens on a random node, and I can't read from or write to that node until it is restarted. There are also no related warnings or errors within a ±5-minute window.
We write data directly to the data nodes with Fluent Bit (opensearch input) and read with Grafana and opensearch-dashboards 2.0.0. No problems occurred before the upgrade.

@JustinasKO

Same happens here. Multiple times a day Logstash fails to send logs to the OpenSearch cluster. This started happening after the upgrade to v2.
Is downgrading back to v1.3.2 an option? Or is there some other workaround?

@amalgamm commented Jun 7, 2022

I tried to downgrade, but due to the Lucene upgrade to v9 the cluster failed to read the cluster state and indices metadata.

> Same happens here. Multiple times a day Logstash fails to send logs to the OpenSearch cluster. This started happening after the upgrade to v2. Is downgrading back to v1.3.2 an option? Or is there some other workaround?

@louzadod commented Jun 7, 2022

The only workaround that helped me a lot was to schedule an hourly restart of the client nodes (see the sketch below). This reduced the problem significantly, but of course it is an ugly workaround. The performance improvements in v2 were important to me, which is why I performed the upgrade.
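A minimal sketch of such a scheduled restart, assuming the client nodes run as Docker containers with the hypothetical names opensearch-client-1 and opensearch-client-2:

# /etc/cron.d/opensearch-client-restart  (hypothetical container names)
# Restart the coordinating/client nodes at the top of every hour.
0 * * * * root docker restart opensearch-client-1 opensearch-client-2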

@rafaelvargas

Some information, based on what we've experienced:

We've recently updated OpenSearch to version 2.0.0. The problem seems to happen periodically (at least in our case), every 60 minutes. After some further investigation, we found that it seems to be associated with the OpenSearch Security plugin cache, and the frequency of the occurrence seems to be directly tied to the TTL defined for the cache (plugins.security.cache.ttl_minutes, with 60 minutes as the default value). Another piece of evidence is that, once we flush/purge the cache, the problem seems to be solved temporarily, until the cache expires again.

I don't have any more practical evidence/logs about it, but, as a temporary mitigation, we've increased the cache's TTL and started to flush the cache once a day.

Additional info: we use OpenSearch's JavaScript client (v2.0.0) to save the documents in OpenSearch.
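For reference, the two mitigations described in this thread look roughly like this (a sketch assuming a default local setup with the admin user; adjust host and credentials):

# opensearch.yml: raise the security plugin cache TTL, e.g. to 24 hours
#   plugins.security.cache.ttl_minutes: 1440

# Flush the security plugin cache on demand
curl -k -u admin:admin -X DELETE https://localhost:9200/_plugins/_security/api/cache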

@JustinasKO commented Jun 9, 2022

Yes, as a temporary workaround we run 'DELETE _plugins/_security/api/cache'; this solves the problem until the cache next expires. Alternatively, use the "Purge cache" button in Dashboards > Security.

@amitgalitz (Member) commented Jun 13, 2022

Experienced the exact same error on an OS 2.0 cluster without Logstash. It occurred after indexing close to 1 billion documents with the Elasticsearch Python client (7.10.0). The exact same Python code didn't show the error on OS 1.3, and the error doesn't happen consistently on OS 2.0 either. I experienced the bug for the first time after calling _cat/indices a few times.

@aabukhalil

I will be looking into this.

@JustinasKO commented Jun 20, 2022

Upgraded to version 2.0.1 on Friday and turned off all scheduled cache purges. There were no issues for some time, but on Sunday we got the same error again. So the issue still exists, but it appears less frequently.
Previously we had these errors at least once every couple of hours.

@louzadod

Hi @JustinasKO. I also tried 2.0.1 and confirmed that it does not fix this issue.

@galion96

Same here; the latest fix didn't seem to have any impact. I still get a node randomly failing, which causes this error, and in that case I have to restart the node as a temporary fix.

I installed it via the Helm chart, if that matters.

@louzadod

While making changes to the security configuration (users, roles, and so on) via OS Dashboards, the problem appeared again.
I had to restart my client nodes again to get back to normal promptly. One possible cause may be the security plugin.

@msoler8785

I am also using the Bulk API, with the elasticsearch-net client. I followed the advice above and changed the
TTL in my opensearch.yml config to plugins.security.cache.ttl_minutes: 1440.

This decreased the frequency of errors drastically. I then implemented logic to detect the error and call the flush-cache API in my app before retrying. It's super hacky, but it's getting me by for now.

if (item.Error.Reason == "java.io.OptionalDataException")
{
    try
    {
        var flushCacheResponse = await _simpleCacheClient.FlushAuthCacheAsync();
    }
    catch (Exception ex)
    {
        _logger.LogError(ex, "Error flushing auth cache.");
    }
}
   public class SimpleCacheClient : ISimpleCacheClient
   {
       private readonly HttpClient _http;
       private readonly ILogger<SimpleCacheClient> _logger;
       private readonly AppOptions _options;

       public SimpleCacheClient(
           HttpClient httpClient,
           IOptions<AppOptions> options,
           ILogger<SimpleCacheClient> logger
           )
       {
           _http = httpClient;
           _logger = logger;
           _options = options.Value;
       }

       public async Task<FlushCacheResponse> FlushAuthCacheAsync()
       {
           _logger.LogDebug("Begin {method}", nameof(FlushAuthCacheAsync));

           List<Exception> innerExceptions = null;
           foreach (var item in _options.ElasticClusterNodes)
           {
               try
               {
                   var url = $"{item}/_opendistro/_security/api/cache";
                   var response = await _http.DeleteAsync(url);
                   _logger.LogDebug("Attempting to flush cache at url {url}", url);

                   if (response.IsSuccessStatusCode)
                   {
                       using var stream = await response.Content.ReadAsStreamAsync();
                       return JsonSerializer.Deserialize<FlushCacheResponse>(stream);
                   }
                   else
                   {
                       throw new HttpRequestException($"Error sending request: {response.ReasonPhrase}");
                   }
               }
               catch (Exception ex)
               {
                   if (innerExceptions == null)
                       innerExceptions = new List<Exception>();

                   innerExceptions.Add(ex);
               }
               await Task.Delay(1000);
           }

           var finalException = new AggregateException("Unable to flush the cache from any of the available OpenSearch hosts.", innerExceptions);
           throw finalException;
       }
   }

@wkruse commented Jul 1, 2022

Right now, everyone who is using OpenSearch 2.x (we are running OpenSearch 2.0.1) with the Security plugin and Logstash 7.16.2 with the OpenSearch output plugin is probably losing data. And because it happens sporadically, it could go completely unnoticed.

@ggt commented Jul 1, 2022

I can confirm that I can reproduce the issue; a few requests can either fix it or trigger it again:

curl -X DELETE --key /usr/share/opensearch/config/admin-key.pem -ks --cert /usr/share/opensearch/config/admin.pem https://localhost:9200/_plugins/_security/api/cache

This is 2.0.1 with Logstash OSS 7.16.2.

My problem has appeared since upgrading 1.3.2 -> 2.0.0.

PS: if I back up the config, it is detected as version 1 and not 2; I don't know why...

@msoler8785

So I'm beginning to think that this may be caused by an incompatibility with the Elasticsearch version of the clients. I implemented my own bulk index client and haven't experienced these issues.

Originally I was using elasticsearch-net client version 7.12. I'm not using Logstash but I imagine it is using the Elasticsearch versions of the client libraries internally and is running into the same issues.

I think we have just hit a point where the API has diverged enough that the Elasticsearch clients are no longer compatible with > 2.x OpenSearch.

@ggt commented Jul 4, 2022

I can confirm that setting plugins.security.cache.ttl_minutes: 1440 in opensearch.yml solved my issue!

@daxxog commented Jul 4, 2022

> So I'm beginning to think that this may be caused by an incompatibility with the Elasticsearch version of the clients. I implemented my own bulk index client and haven't experienced these issues.
>
> Originally I was using elasticsearch-net client version 7.12. I'm not using Logstash but I imagine it is using the Elasticsearch versions of the client libraries internally and is running into the same issues.
>
> I think we have just hit a point where the API has diverged enough that the Elasticsearch clients are no longer compatible with > 2.x OpenSearch.

@msoler8785 I do remember seeing this change in OpenSearch 2.

@amitgalitz (Member)

I ran the exact same Python script (repeatedly calling the bulk API), once using elasticsearch-py (7.10.0) and once using the opensearch-py client (2.0.0). On the first attempt (Elasticsearch client) to index 1B docs (~111 GB), data went missing and I got the same exceptions mentioned above, with only ~105 GB of it indexed. Using the opensearch-py client, the exception didn't occur. Since this exception seems somewhat random so far, I can't say that with more data the OpenSearch client won't experience the same issue, but it seems likely to be related to an incompatibility with the Elasticsearch client.

@w39hh commented Jul 6, 2022

We are using 2.0.0 with the Java client 2.0.0 (opensearch-rest-client and opensearch-rest-high-level-client) and have been experiencing the same issues as others on this thread, so I'm not sure it is a client incompatibility. As with others, it happens intermittently. We are running a 3-node cluster, and the problem may occur only once in a few days or within hours.

Caused by: org.opensearch.OpenSearchException: java.io.OptionalDataException
        at org.opensearch.security.support.Base64Helper.deserializeObject(Base64Helper.java:185) ~[?:?]
       ...
Caused by: java.io.IOException
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1767) ~[?:?]

@gswdkks commented Jul 7, 2022

> I can confirm that setting plugins.security.cache.ttl_minutes: 1440 in opensearch.yml solved my issue!

Indeed, I solved the issue as well when I set this configuration!

@LucasCiocari

Has anyone with this problem updated to 2.1.0 and verified whether it still happens?

Adding ttl_minutes does not solve the problem; it just alleviates it.

@andilem commented Jul 11, 2022

We are still having this problem with 2.1.0.
Looks like it happens after each TTL timeout as mentioned here: #1927.

@tlfeng commented Jul 12, 2022

I'm still trying to reproduce the bug locally, hoping that will help me locate the problem.
From the above discussion, I think the key to reproducing the bug is sending bulk API requests with a client.
Based on the feedback from @w39hh (#1961), I wrote sample code using the Java high-level REST client 2.0.0 to send bulk API requests.
But I haven't seen the OptionalDataException yet. 🙁

public static void main(String[] args) throws IOException, InterruptedException, NoSuchAlgorithmException, KeyStoreException, KeyManagementException {
        String timeStamp = new SimpleDateFormat("yyyyMMdd_HHmmss").format(Calendar.getInstance().getTime());

        final CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
        credentialsProvider.setCredentials(AuthScope.ANY,
                new UsernamePasswordCredentials("admin", "admin"));

        SSLContext sslContext = new SSLContextBuilder()
                .loadTrustMaterial(null, (certificate, authType) -> true).build();

        RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "https"))
                .setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
                    @Override
                    public HttpAsyncClientBuilder customizeHttpClient(HttpAsyncClientBuilder httpClientBuilder) {
                        return httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider)
                                .setSSLContext(sslContext)
                                .setSSLHostnameVerifier(new NoopHostnameVerifier());
                    }
                });

        RestHighLevelClient client = new RestHighLevelClient(builder);
        
        for(int i=1110; i<19000; i++) {
            BulkRequest request = new BulkRequest();
            request.add(new IndexRequest(String.valueOf(i)).id("1")
                    .source(XContentType.JSON,"date", timeStamp));
            request.add(new DeleteRequest(String.valueOf(i), "1"));
            request.add(new IndexRequest(String.valueOf(i)).id("2")
                    .source(XContentType.JSON,"date", timeStamp));
            request.add(new UpdateRequest(String.valueOf(i), "2")
                    .doc(XContentType.JSON,"field2", "value2"));
            BulkResponse bulkResponse = client.bulk(request, RequestOptions.DEFAULT);
            System.out.println(i + " " + timeStamp);
            System.out.println(bulkResponse.status());
            TimeUnit.SECONDS.sleep(1);
        }
        client.close();
    }

In pom.xml

    <dependencies>
        <dependency>
            <groupId>org.opensearch.client</groupId>
            <artifactId>opensearch-rest-high-level-client</artifactId>
            <version>2.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
            <version>2.18.0</version>
        </dependency>
    </dependencies>

@andilem commented Jul 12, 2022

I can describe our very simple setup, which triggered the bug yesterday but has not triggered it again (for now ~11 hours since I purged the security plugin cache):

  • Docker single node (discovery.type=single-node), TLS disabled (plugins.security.ssl.http.enabled=false), but that shouldn't matter
  • Data stream test-status (with ISM policy, but that shouldn't matter) as index template and manually created (PUT _data_stream/test-status)
  • User with index permissions index on test-* as well as indices:admin/mapping/put on .ds-test-*
    (the latter was required at least in 2.0.1 because it was not properly propagated from the index permission on the data stream; perhaps it's fixed with 2.1.0, didn't check again)
  • recreated Docker container
  • Java 11 "raw" HTTP client, performing one bulk create request (POST test-status/_bulk) with one document every 30 seconds (roughly as in the shell sketch below)

This raised the mentioned OptionalDataException exactly 6 hours after recreating the Docker container, and it lasted for exactly 1 hour. About 2 hours later I purged the security plugin cache, and the problem has not happened again for ~11 hours now.
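A rough shell equivalent of that write pattern, for anyone trying to reproduce it (a sketch: the index name test-status comes from the setup above, while the host, credentials, and document body are placeholders):

#!/bin/sh
# Send one bulk "create" with a single placeholder document to the test-status data stream every 30 seconds.
while true; do
  printf '{"create":{}}\n{"@timestamp":"%s","status":"ok"}\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" |
    curl -sk -u user:password -X POST "https://localhost:9200/test-status/_bulk" \
      -H "Content-Type: application/x-ndjson" --data-binary @-
  sleep 30
done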

@w39hh commented Jul 14, 2022

@tlfeng When the issue occurs for us, it is not related to a Bulk API request. We have applied the mitigations

  • plugins.security.cache.ttl_minutes: 1440
  • restart every 3 hours

But we still get the issue, although much less frequently (typically about once a day). I have included a much larger portion of the stack trace in case it helps. (**** = redacted text)

[2022-07-14T17:53:20,923][WARN ][r.suppressed             ] [172.27.5.13] path: /****-assign/_search, params: {typed_keys=true, max_concurrent_shard_requests=5, ignore_unavailable=false, expand_wildcards=open, allow_no_indices=true, ignore_throttled=true, index=dawn-assign, search_type=query_then_fetch, batched_reduce_size=512, ccs_minimize_roundtrips=true}
org.opensearch.action.search.SearchPhaseExecutionException: all shards failed
        at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:644) [opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:362) [opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:679) [opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:459) [opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:293) [opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:74) [opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.transport.TransportService$6.handleException(TransportService.java:750) [opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.security.transport.SecurityInterceptor$RestoringTransportResponseHandler.handleException(SecurityInterceptor.java:318) [opensearch-security-2.0.0.0.jar:2.0.0.0]
        at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1370) [opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.transport.InboundHandler.lambda$handleException$3(InboundHandler.java:420) [opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.common.util.concurrent.OpenSearchExecutors$DirectExecutorService.execute(OpenSearchExecutors.java:301) [opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.transport.InboundHandler.handleException(InboundHandler.java:418) [opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.transport.InboundHandler.handlerResponseError(InboundHandler.java:410) [opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.transport.InboundHandler.messageReceived(InboundHandler.java:158) [opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.transport.InboundHandler.inboundMessage(InboundHandler.java:114) [opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.transport.TcpTransport.inboundMessage(TcpTransport.java:769) [opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:175) [opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:150) [opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:115) [opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:94) [transport-netty4-client-2.0.0.jar:2.0.0]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) 
Caused by: org.opensearch.OpenSearchException: java.io.OptionalDataException
        at org.opensearch.security.support.Base64Helper.deserializeObject(Base64Helper.java:185) ~[?:?]
        at org.opensearch.security.auditlog.impl.AbstractAuditLog.getUser(AbstractAuditLog.java:597) ~[?:?]
        at org.opensearch.security.auditlog.impl.AbstractAuditLog.logMissingPrivileges(AbstractAuditLog.java:202) ~[?:?]
        at org.opensearch.security.auditlog.impl.AuditLogImpl.logMissingPrivileges(AuditLogImpl.java:162) ~[?:?]
        at org.opensearch.security.auditlog.AuditLogSslExceptionHandler.logError(AuditLogSslExceptionHandler.java:77) ~[?:?]
        at org.opensearch.security.ssl.transport.SecuritySSLRequestHandler.messageReceived(SecuritySSLRequestHandler.java:169) ~[?:?]
        at org.opensearch.security.OpenSearchSecurityPlugin$7$1.messageReceived(OpenSearchSecurityPlugin.java:651) ~[?:?]
        at org.opensearch.indexmanagement.rollup.interceptor.RollupInterceptor$interceptHandler$1.messageReceived(RollupInterceptor.kt:118) ~[?:?]
        at org.opensearch.performanceanalyzer.transport.PerformanceAnalyzerTransportRequestHandler.messageReceived(PerformanceAnalyzerTransportRequestHandler.java:43) ~[?:?]
        at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:103) ~[opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.transport.InboundHandler.handleRequest(InboundHandler.java:249) ~[opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.transport.InboundHandler.messageReceived(InboundHandler.java:132) ~[opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.transport.InboundHandler.inboundMessage(InboundHandler.java:114) ~[opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.transport.TcpTransport.inboundMessage(TcpTransport.java:769) ~[opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:175) ~[opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:150) ~[opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:115) ~[opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:94) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[?:?]
       
Caused by: java.io.IOException
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1767) ~[?:?]
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:514) ~[?:?]
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:472) ~[?:?]
        at java.util.HashSet.readObject(HashSet.java:345) ~[?:?]
        at jdk.internal.reflect.GeneratedMethodAccessor47.invoke(Unknown Source) ~[?:?]
        at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
        at java.lang.reflect.Method.invoke(Method.java:568) ~[?:?]
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1231) ~[?:?]
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2434) ~[?:?]
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2268) ~[?:?]
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1744) ~[?:?]
        at java.io.ObjectInputStream$FieldValues.<init>(ObjectInputStream.java:2617) ~[?:?]
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2468) ~[?:?]
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2268) ~[?:?]
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1744) ~[?:?]
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:514) ~[?:?]
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:472) ~[?:?]
        at org.opensearch.security.support.Base64Helper.deserializeObject(Base64Helper.java:183) ~[?:?]
        at org.opensearch.security.auditlog.impl.AbstractAuditLog.getUser(AbstractAuditLog.java:597) ~[?:?]
        at org.opensearch.security.auditlog.impl.AbstractAuditLog.logMissingPrivileges(AbstractAuditLog.java:202) ~[?:?]
        at org.opensearch.security.auditlog.impl.AuditLogImpl.logMissingPrivileges(AuditLogImpl.java:162) ~[?:?]
        at org.opensearch.security.auditlog.AuditLogSslExceptionHandler.logError(AuditLogSslExceptionHandler.java:77) ~[?:?]
        at org.opensearch.security.ssl.transport.SecuritySSLRequestHandler.messageReceived(SecuritySSLRequestHandler.java:169) ~[?:?]
        at org.opensearch.security.OpenSearchSecurityPlugin$7$1.messageReceived(OpenSearchSecurityPlugin.java:651) ~[?:?]
        at org.opensearch.indexmanagement.rollup.interceptor.RollupInterceptor$interceptHandler$1.messageReceived(RollupInterceptor.kt:118) ~[?:?]
        at org.opensearch.performanceanalyzer.transport.PerformanceAnalyzerTransportRequestHandler.messageReceived(PerformanceAnalyzerTransportRequestHandler.java:43) ~[?:?]
        at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:103) ~[opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.transport.InboundHandler.handleRequest(InboundHandler.java:249) ~[opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.transport.InboundHandler.messageReceived(InboundHandler.java:132) ~[opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.transport.InboundHandler.inboundMessage(InboundHandler.java:114) ~[opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.transport.TcpTransport.inboundMessage(TcpTransport.java:769) ~[opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:175) ~[opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:150) ~[opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:115) ~[opensearch-2.0.0.jar:2.0.0]
        at org.opensearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:94) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[?:?]

@denisvll

Faced the same issue after upgrading from 1.3.

So, I've found how to reproduce this bug locally.

I used a VM (4 CPU, HDD); the cluster was created from the official docker-compose file.

  1. Create the cluster:
curl https://opensearch.org/samples/docker-compose.yml -O
docker-compose up
  2. Run the fill.sh script to send sample data to the index:
#!/bin/sh
while true
  do
    curl -XPOST -k -u admin:admin https://localhost:9200/test-index/_doc \
      -H "Content-Type: application/json" -d @test_data.json
  done
  3. Run the purge-cache URL in an infinite loop:
while true; do curl -k -u admin:admin -X DELETE https://localhost:9200/_plugins/_security/api/cache; done
  4. Run _nodes stats in an infinite loop:
while true; do curl -s -k -u admin:admin https://localhost:9200/_nodes -o /dev/null; done

java.io.OptionalDataException will appear within minutes (on my env :)

The sh script and JSON payload can be found in my repo: https://github.com/denisvll/os_test
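
For anyone who does not want to clone the repo, any small JSON document works as the payload for the fill loop above; the fields below are an assumption, not the contents of the actual test_data.json:

# Hypothetical stand-in for test_data.json (the real payload lives in the repo linked above).
cat > test_data.json <<'EOF'
{
  "timestamp": "2022-07-01T00:00:00Z",
  "level": "INFO",
  "message": "sample log line used only to generate indexing traffic"
}
EOF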

docker ps
CONTAINER ID   IMAGE                                            COMMAND                  CREATED         STATUS         PORTS                                                                                                      NAMES
f9f8f58d95bb   opensearchproject/opensearch:latest              "./opensearch-docker…"   3 minutes ago   Up 7 seconds   0.0.0.0:9200->9200/tcp, :::9200->9200/tcp, 9300/tcp, 0.0.0.0:9600->9600/tcp, :::9600->9600/tcp, 9650/tcp   opensearch-node1
5fd8e3c80116   opensearchproject/opensearch-dashboards:latest   "./opensearch-dashbo…"   3 minutes ago   Up 7 seconds   0.0.0.0:5601->5601/tcp, :::5601->5601/tcp                                                                  opensearch-dashboards
523714cde06c   opensearchproject/opensearch:latest              "./opensearch-docker…"   3 minutes ago   Up 7 seconds   9200/tcp, 9300/tcp, 9600/tcp, 9650/tcp                                                                     opensearch-node2
docker images
REPOSITORY                                TAG       IMAGE ID       CREATED       SIZE
opensearchproject/opensearch              latest    9a3e759bec77   2 weeks ago   854MB
opensearchproject/opensearch-dashboards   latest    7da32241e476   2 weeks ago   998MB
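
While the loops run, the exception can be watched for in the node logs; a minimal check, assuming the container names from the sample compose file shown above:

# Follow the logs of one node and filter for the exception.
docker logs -f opensearch-node1 2>&1 | grep -i OptionalDataException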

@carlos-neto-trustly

carlos-neto-trustly commented Jul 19, 2022

The bug described in this issue is also happening to me.

My OpenSearch infra context is:

Component      Version
OpenSearch     2.0.1
Amazon Linux   2
Logstash       7.16.2 (logstash-oss-with-opensearch-output-plugin)

My workaround was to create a script that checks the log-ingestion status: if the ingestion log shows no activity over the last minute, the script executes DELETE _plugins/_security/api/cache against the OpenSearch API. The script runs from Linux cron every 5 minutes (a minimal sketch of such a script is shown after the query below).

The following query checks whether log ingestion has stopped:

GET my-index*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "ingest_time": {
              "from": "now-1m",
              "to": "now"
            }
          }
        }
      ]
    }
  }
}
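
For reference, a minimal sketch of such a check script, assuming the index pattern my-index*, the ingest_time field above, and the default admin:admin credentials; this is an illustration of the approach, not the exact script:

#!/bin/sh
# Count documents ingested in the last minute (same query as above).
HITS=$(curl -s -k -u admin:admin -X POST \
  "https://localhost:9200/my-index*/_search" \
  -H "Content-Type: application/json" \
  -d '{"size":0,"query":{"bool":{"filter":[{"range":{"ingest_time":{"from":"now-1m","to":"now"}}}]}}}' \
  | grep -o '"value":[0-9]*' | head -1 | cut -d: -f2)

# Nothing was ingested in the last minute: flush the security cache as a recovery step.
if [ "${HITS:-0}" -eq 0 ]; then
  curl -s -k -u admin:admin -X DELETE "https://localhost:9200/_plugins/_security/api/cache"
fi

Scheduled from cron with something like */5 * * * * /path/to/check_ingest.sh (the path is hypothetical).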

Important: Sometimes, the fix command doesn't solve the log ingestion problem. In these cases, it is necessary to restart Logstash instances.

@hauck-jvsh

This is also happening to me.

@CEHENKLE CEHENKLE transferred this issue from opensearch-project/OpenSearch Jul 25, 2022
@aabukhalil

According to the log traces, it all ends up in Base64Helper in the security plugin:

Caused by: org.opensearch.OpenSearchException: java.io.OptionalDataException
at org.opensearch.security.support.Base64Helper.deserializeObject(Base64Helper.java:182) ~[?:?]
at org.opensearch.security.transport.SecurityRequestHandler.messageReceivedDecorate(SecurityRequestHandler.java:150) ~[?:?]
at org.opensearch.security.ssl.transport.SecuritySSLRequestHandler.messageReceived(SecuritySSLRequestHandler.java:96) ~[?:?]
at org.opensearch.security.OpenSearchSecurityPlugin$7$1.messageReceived(OpenSearchSecurityPlugin.java:658) ~[?:?]
at org.opensearch.indexmanagement.rollup.interceptor.RollupInterceptor$interceptHandler$1.messageReceived(RollupInterceptor.kt:118) ~[?:?]
at org.opensearch.performanceanalyzer.transport.PerformanceAnalyzerTransportRequestHandler.messageReceived(PerformanceAnalyzerTransportRequestHandler.java:43) ~[?:?]
at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:103) ~[opensearch-2.1.0.jar:2.1.0]
at org.opensearch.transport.TransportService$8.doRun(TransportService.java:952) ~[opensearch-2.1.0.jar:2.1.0]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:798) ~[opensearch-2.1.0.jar:2.1.0]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.1.0.jar:2.1.0]

All of the workarounds and steps to reproduce are security-plugin related. I tried to reproduce the same issue without the security plugin, and it didn't happen.
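
As a side note, to confirm whether the security plugin is installed on the nodes being tested, something along these lines works (default credentials assumed):

# List installed plugins per node and filter for the security plugin.
curl -s -k -u admin:admin "https://localhost:9200/_cat/plugins?v" | grep -i security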

@cliu123
Member

cliu123 commented Jul 25, 2022

@cwperks Would you mind taking a first look at this issue?

@amalgamm

Any update on this one?

@peternied
Member

@amalgamm thanks for asking. This should be closed; the fix will be released in the upcoming OpenSearch 2.2.0 release, which you can track via opensearch-build#2271. Release candidate builds are available and linked on that issue.
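
For anyone verifying the upgrade, the cluster's root endpoint reports the running version; a quick check, assuming default credentials and HTTPS on port 9200:

# Print the OpenSearch version number; the fix ships in 2.2.0 and later.
curl -s -k -u admin:admin "https://localhost:9200/?pretty" | grep '"number"'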
