[Bug]requests lost #4630

Closed
zscomehuyue opened this issue Dec 8, 2023 · 3 comments · Fixed by #4679
Labels
bug Something isn't working

Comments

@zscomehuyue
Search before asking

  • I had searched in the issues and found no similar issues.

Environment

Linux

EventMesh version

master

What happened

The Mesh server's core thread pool uses the default rejection (drop) policy. If task processing blocks and the pool's queue fills up, newly submitted requests are rejected, so important requests can be lost.
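To illustrate the mechanism, here is a minimal sketch using only the JDK. It assumes that "default" here means the JDK's default rejection handler (`ThreadPoolExecutor.AbortPolicy`), which has not been checked against the EventMesh source. The class name `RejectionDemo` and the pool and queue sizes are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of EventMesh.
public class RejectionDemo {

    /**
     * Submits `total` blocking tasks to a single-thread pool whose queue holds
     * `queueSize` entries, and returns how many submissions were rejected.
     */
    static int countRejected(int queueSize, int total) {
        CountDownLatch release = new CountDownLatch(1);
        // No handler is passed, so the JDK default (AbortPolicy) applies:
        // execute() throws RejectedExecutionException once the queue is full.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(queueSize));
        Runnable blocked = () -> {
            try {
                release.await(); // stands in for a blocked downstream call
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        int rejected = 0;
        for (int i = 0; i < total; i++) {
            try {
                pool.execute(blocked);
            } catch (RejectedExecutionException e) {
                rejected++; // this request is lost unless the caller retries
            }
        }
        release.countDown(); // unblock the worker so the pool can drain
        pool.shutdown();
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("rejected = " + countRejected(2, 10));
    }
}
```

With these numbers, one task occupies the worker thread, two wait in the queue, and the remaining seven are rejected.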

How to reproduce

Disconnect the connection between Mesh and RocketMQ, then have the client send a large number of messages.

Debug logs

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

zscomehuyue added the bug label Dec 8, 2023
@pandaapo
Member

pandaapo commented Dec 9, 2023

> The server core thread pool of Mesh adopts a default dropout strategy; Assuming there is blocking, important requests will be lost;

I think this can be optimized by changing the strategy or other ways to reduce the amount of data loss. Would you be willing to submit a PR to fix this?

> The connection between MESH and Rocketmq is disconnected, and the client sends a large number of messages;

I don't think a disconnected connection should be counted as the "blocking" mentioned above.

@xwm1992
Contributor

xwm1992 commented Dec 12, 2023

I think the situation you gave is a message sending failure, not a message loss, because the connection between the mesh and the broker is disconnected, which has nothing to do with the dropout strategy of the thread pool.


@zscomehuyue
Author

zscomehuyue commented Dec 14, 2023

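When I send messages too quickly, the server-side queue (capacity 10,000) overflows and the discard policy is triggered. The exception handling in EventMeshTcpExceptionHandler then closes the session, and the client reconnects.
The session that gets closed may be a subscribe session or a send session.
I think it is reasonable for the server to discard: when there are too many messages for the server to process in time, discarding is the only option (running the tasks on the calling thread would hurt overall performance).
If subscribing and sending used different thread pools, they would no longer affect each other.
Changes:

  1. MessageTransferTask uses a dedicated thread pool with a discard-and-close-session policy, so that the client reconnects;
  2. Other tasks use a separate dedicated thread pool that runs rejected tasks on the calling thread; the tasks that pile up are the MessageTransferTask send tasks.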
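The two proposed changes can be sketched with plain JDK executors. This is only an illustration of the idea, not the code merged in #4679. The names `SplitPools`, `SessionCloser`, `newSendPool`, and `newOtherTaskPool` are hypothetical, and the pool and queue sizes are placeholders.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not EventMesh code.
public class SplitPools {

    /** Hypothetical stand-in for whatever closes the client session. */
    interface SessionCloser {
        void close(String reason);
    }

    /** Pool for send tasks: a rejected task is dropped and the session is closed. */
    static ThreadPoolExecutor newSendPool(int threads, int queueSize, SessionCloser closer) {
        RejectedExecutionHandler dropAndClose =
                (task, executor) -> closer.close("send queue full");
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize), dropAndClose);
    }

    /** Pool for all other tasks: a rejected task runs on the submitting thread. */
    static ThreadPoolExecutor newOtherTaskPool(int threads, int queueSize) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize), new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Returns {sessions closed by the send pool, tasks the other pool ran on the caller}. */
    static int[] demo() {
        AtomicInteger closed = new AtomicInteger();
        AtomicInteger ranOnCaller = new AtomicInteger();
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocked = () -> {
            try {
                release.await(); // simulate a slow or blocked task
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        ThreadPoolExecutor sendPool = newSendPool(1, 1, reason -> closed.incrementAndGet());
        ThreadPoolExecutor otherPool = newOtherTaskPool(1, 1);
        Thread caller = Thread.currentThread();
        try {
            // Send pool: 1 task running, 1 queued, 2 rejected (session closed twice).
            for (int i = 0; i < 4; i++) {
                sendPool.execute(blocked);
            }
            otherPool.execute(blocked);   // occupies the only worker
            otherPool.execute(() -> { }); // fills the queue
            // Rejected by the full pool, so CallerRunsPolicy runs it right here.
            otherPool.execute(() -> {
                if (Thread.currentThread() == caller) {
                    ranOnCaller.incrementAndGet();
                }
            });
        } finally {
            release.countDown();
            sendPool.shutdown();
            otherPool.shutdown();
        }
        return new int[] {closed.get(), ranOnCaller.get()};
    }

    public static void main(String[] args) {
        int[] result = demo();
        System.out.println("sessions closed = " + result[0] + ", ran on caller = " + result[1]);
    }
}
```

Because the two pools have separate queues, a flood of send tasks can only fill the send pool's queue; the other pool keeps accepting work, and its caller-runs fallback slows the submitter down instead of dropping the task.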

pandaapo pushed a commit that referenced this issue Dec 19, 2023
#4679)

* fix concurrency problem

* split task handle threadpool

* fix checkstyle problem