Fix Nacos reconnection failed continuously issue with message "com.alibaba.nacos.common.remote.exception.RemoteException: errCode: 500, errMsg: Unknown payload type:ServerCheckResponse" but actually the connection is already established. #1676

Contributor

@rztao rztao commented Nov 17, 2024

Fix the issue where Nacos reconnection fails continuously with the message "com.alibaba.nacos.common.remote.exception.RemoteException: errCode: 500, errMsg: Unknown payload type:ServerCheckResponse" even though the connection is already established. The cause is that PayloadRegistry is not initialized, because an incorrect ClassLoader is used when the Nacos client is created.

What type of PR is this?

Bug

What this PR does / why we need it?

Nacos reconnection always fails because PayloadRegistry is not initialized, which happens when an incorrect ClassLoader is used while creating the Nacos client. This PR fixes it.
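For readers unfamiliar with this class of bug, a common remedy is to switch the thread context ClassLoader for the duration of the client creation and restore it afterwards. The following is only a minimal sketch of that general technique, not the code changed by this PR; the class name `ContextClassLoaderSwitch` and the method `callWith` are hypothetical and do not exist in Sermant or Nacos.

```java
import java.util.function.Supplier;

// Hypothetical helper (not part of Sermant or nacos-client): runs an action
// with a chosen thread context ClassLoader and always restores the old one.
public final class ContextClassLoaderSwitch {
    private ContextClassLoaderSwitch() {
    }

    public static <T> T callWith(ClassLoader loader, Supplier<T> action) {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        // Code that looks up classes or services through the context
        // ClassLoader will see `loader` while the action runs.
        thread.setContextClassLoader(loader);
        try {
            return action.get();
        } finally {
            // Restore even if the action throws, so the calling thread
            // is left in the state it started in.
            thread.setContextClassLoader(previous);
        }
    }
}
```

Under this assumption, the client factory call would be passed as the `action`, with the ClassLoader that can actually see the Nacos payload classes passed as `loader`.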

Which issue(s) this PR fixes?

Fixes #

The Nacos client reconnects to the Nacos server when the connection is lost. Reconnection depends on PayloadRegistry being initialized correctly, so that RpcClient can find the response class needed to deserialize the response data.

Here is the error stack trace captured with Arthas.

com.alibaba.nacos.common.remote.exception.RemoteException: errCode: 500, errMsg: Unknown payload type:ServerCheckResponse
    at com.alibaba.nacos.common.remote.client.grpc.GrpcUtils.parse(GrpcUtils.java:133)
    at com.alibaba.nacos.common.remote.client.grpc.GrpcClient.serverCheck(GrpcClient.java:198)
    at com.alibaba.nacos.common.remote.client.grpc.GrpcClient.connectToServer(GrpcClient.java:307)
    at com.alibaba.nacos.common.remote.client.RpcClient.reconnect(RpcClient.java:498)
    at com.alibaba.nacos.common.remote.client.RpcClient.lambda$start$2(RpcClient.java:339)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

However, the reconnect response from the Nacos server is actually successful, with result code 200:
body_=@Any[ serialVersionUID=@Long[0], cachedUnpackValue=null, TYPE_URL_FIELD_NUMBER=@Integer[1], typeUrl_=@String[], VALUE_FIELD_NUMBER=@Integer[2], value_=@LiteralByteString[<ByteString@478c0716 size=98 contents="{\"resultCode\":200,\"errorCode\":0,\"connectionId\":...">], memoizedIsInitialized=@Byte[-1], DEFAULT_INSTANCE=@Any[], PARSER=@[com.alibaba.nacos.shaded.com.google.protobuf.Any$1@4e975aba], serialVersionUID=@Long[1], alwaysUseFieldBuilders=@Boolean[false], unknownFields=@UnknownFieldSet[], memoizedSize=@Integer[-1], memoizedHashCode=@Integer[0], ],

Here is the code of the GrpcUtils.parse method.

image

The REGISTRY_REQUEST map in PayloadRegistry is empty, even though PayloadRegistry has already been marked as initialized:

[arthas@1]$ ognl -classLoaderClass com.huaweicloud.sermant.core.classloader.FrameworkClassLoader "@com.alibaba.nacos.common.remote.PayloadRegistry@REGISTRY_REQUEST"
@HashMap[isEmpty=true;size=0]
[arthas@1]$
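The combination of "initialized" and "empty" can be reproduced with a toy registry. The sketch below is not the Nacos source; `ToyPayloadRegistry` and `ToyPayload` are hypothetical names. It assumes the registry discovers payload classes through a ClassLoader-dependent scan, here `java.util.ServiceLoader`, whose one-argument `load` resolves providers through the thread context ClassLoader. If that loader cannot see any provider, the scan registers nothing, the one-shot flag is still set, and every later lookup fails.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.ServiceLoader;

// Toy model only: names are hypothetical and this is not nacos-client code.
public class ToyPayloadRegistry {
    /** Stand-in for a payload marker type. */
    public interface ToyPayload {
    }

    private static final Map<String, Class<?>> REGISTRY = new HashMap<>();
    private static boolean initialized = false;

    /** Scans once; a scan that finds nothing is never repeated. */
    public static synchronized void init() {
        if (initialized) {
            return;
        }
        // ServiceLoader.load(Class) uses the thread context ClassLoader.
        // With the wrong loader, no providers are visible and the loop
        // body never runs.
        for (ToyPayload payload : ServiceLoader.load(ToyPayload.class)) {
            REGISTRY.put(payload.getClass().getSimpleName(), payload.getClass());
        }
        initialized = true;
    }

    /** Looks up a payload class by its simple name. */
    public static synchronized Class<?> parse(String type) {
        Class<?> clazz = REGISTRY.get(type);
        if (clazz == null) {
            throw new IllegalStateException("Unknown payload type:" + type);
        }
        return clazz;
    }

    public static synchronized boolean isInitialized() {
        return initialized;
    }

    public static synchronized int size() {
        return REGISTRY.size();
    }
}
```

In this model, once `init()` has run under a loader that sees no providers, calling it again under the correct loader changes nothing, which matches the symptom that reconnection keeps failing until the client is created with the right ClassLoader.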

Does this PR introduce a user-facing change?

No

Checklist

  • Make sure there is a GitHub_issue related with this PR before you start working on it.
  • Make sure you have squashed your change to one single commit.
  • GitHub Actions works fine in this PR.

codecov bot commented Nov 17, 2024

Codecov Report

Attention: Patch coverage is 0% with 11 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...registry/service/register/NacosServiceManager.java 0.00% 8 Missing ⚠️
...rmant/dubbo/registry/utils/NamingServiceUtils.java 0.00% 3 Missing ⚠️

Flag Coverage Δ Complexity Δ
unittests 43.95% <0.00%> (-0.02%) 181.00 <0.00> (ø)

Files with missing lines Coverage Δ Complexity Δ
...rmant/dubbo/registry/utils/NamingServiceUtils.java 48.38% <0.00%> (-5.19%) 0.00 <0.00> (ø)
...registry/service/register/NacosServiceManager.java 40.47% <0.00%> (-6.75%) 0.00 <0.00> (ø)

@rztao rztao changed the title Nov 17, 2024
  from: Fix Nacos reconnection failed issue because PayloadRegistry without initializing due to incorrect ClassLoader when creating Nacos client
  to: Fix Nacos reconnection failed issue with message "com.alibaba.nacos.common.remote.exception.RemoteException: errCode: 500, errMsg: Unknown payload type:ServerCheckResponse" but actually the connection is already established because PayloadRegistry without initializing due to incorrect ClassLoader when creating Nacos client
@rztao rztao changed the title Nov 17, 2024
  from: Fix Nacos reconnection failed issue with message "com.alibaba.nacos.common.remote.exception.RemoteException: errCode: 500, errMsg: Unknown payload type:ServerCheckResponse" but actually the connection is already established because PayloadRegistry without initializing due to incorrect ClassLoader when creating Nacos client
  to: Fix Nacos reconnection failed issue with message "com.alibaba.nacos.common.remote.exception.RemoteException: errCode: 500, errMsg: Unknown payload type:ServerCheckResponse" but actually the connection is already established.
@rztao rztao changed the title Nov 17, 2024
  from: Fix Nacos reconnection failed issue with message "com.alibaba.nacos.common.remote.exception.RemoteException: errCode: 500, errMsg: Unknown payload type:ServerCheckResponse" but actually the connection is already established.
  to: Fix Nacos reconnection failed continuously issue with message "com.alibaba.nacos.common.remote.exception.RemoteException: errCode: 500, errMsg: Unknown payload type:ServerCheckResponse" but actually the connection is already established.
…nitializing due to incorrect classloader when creating nacos client.

Signed-off-by: rztao <[email protected]>
@rztao rztao force-pushed the fix-nacos-reconnection-failed-issue-due-to-incorrect-init-classloader branch from ab4e5ab to 904b8a1 November 17, 2024 14:17
@provenceee
Collaborator

This is a bug of nacos-client. For details, see nacos issue 11139.

@rztao
Contributor Author

rztao commented Nov 18, 2024

This is a bug of nacos-client. For details, see nacos issue 11139.

The exception is different. Nacos issue 11139 is a ClassCastException, whereas this issue is com.alibaba.nacos.common.remote.exception.RemoteException: errCode: 500, errMsg: Unknown payload type:ServerCheckResponse. It occurs because PayloadRegistry is initialized using the BootstrapClassLoader instead of the FrameworkClassLoader, so it is not initialized correctly.

The code of PayloadRegistry.scan is also different from the code pasted in 11139.

We tested the fix in our online environment, and it works without changing the nacos-client version.

image

@Sherlockhan Sherlockhan merged commit 76b68a4 into sermant-io:develop Nov 20, 2024
87 checks passed