tests: internal: timeout: added coroutine hang test case #4605

Merged 4 commits on Jan 13, 2022

leonardo-albertovich
Collaborator

This unit test should be used to verify the validity of the proposed fixes for the issue that arises when a connection attempt times out. The fixes are:

1. PR #4140 - Resumes the coroutines in the timeout handler
2. https://github.com/fluent/fluent-bit/tree/leonardo-event-injection-poc - Injects a synthesized event into the event loop to naturally cause the process to be awoken
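
The second approach can be illustrated with a generic POSIX self-pipe pattern. This is only a minimal sketch under stated assumptions, not the code from the branch above: `wait_for_event` and `inject_event` are hypothetical names, and the pipe stands in for whatever channel the real event loop uses. The idea is that a loop blocked in `poll()` wakes up on its own once something writes a synthetic event to a descriptor it is watching.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: block until a byte arrives on wake_fd or the
 * timeout expires. Returns 1 if an injected event was consumed, 0 on
 * timeout and -1 on error.
 */
int wait_for_event(int wake_fd, int timeout_ms)
{
    struct pollfd pfd;
    char byte;
    int ret;

    pfd.fd = wake_fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    ret = poll(&pfd, 1, timeout_ms);
    if (ret < 0) {
        return -1;
    }
    if (ret == 0) {
        return 0; /* nothing woke the loop up */
    }

    /* Consume the synthetic event so the next wait blocks again */
    if (read(wake_fd, &byte, 1) != 1) {
        return -1;
    }
    return 1;
}

/*
 * Hypothetical helper: inject a synthetic event by writing one byte to
 * the write end of the pipe. Returns 0 on success and -1 on error.
 */
int inject_event(int inject_fd)
{
    char byte = 1;

    assert(inject_fd >= 0);
    return write(inject_fd, &byte, 1) == 1 ? 0 : -1;
}
```

With descriptors obtained from `pipe()`, a call to `wait_for_event(fds[0], 50)` times out until `inject_event(fds[1])` has been called, after which the next wait returns immediately.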

Signed-off-by: Leonardo Alminana [email protected]

@leonardo-albertovich
Collaborator Author

There's a strange error on some Ubuntu 18.04 virtual machines that appears to be unrelated to the patch. This is a dump of the output:

[2022/01/12 16:49:18] [ info] [storage] version=1.1.5, initializing...
[2022/01/12 16:49:18] [ info] [storage] in-memory
[2022/01/12 16:49:18] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2022/01/12 16:49:18] [ info] [cmetrics] version=0.2.2
[2022/01/12 16:49:18] [ info] [sp] stream processor started
[2022/01/12 16:49:19] [error] [net] TCP connection failed: 128.1.1.1:65534 (Connection refused)
[2022/01/12 16:49:19] [error] [output:tcp:tcp.0] no upstream connections available to 128.1.1.1:65534
[2022/01/12 16:49:19] [ warn] [engine] failed to flush chunk '20196-1642006159.5863193.flb', retry in 9 seconds: task_id=0, input=dummy.0 > output=tcp.0 (out_id=0)
[2022/01/12 16:49:28] [ warn] [engine] service will shutdown in max 30 seconds
[2022/01/12 16:49:28] [error] [net] TCP connection failed: 128.1.1.1:65534 (Connection refused)
[2022/01/12 16:49:28] [error] [output:tcp:tcp.0] no upstream connections available to 128.1.1.1:65534
[2022/01/12 16:49:29] [ info] [task] dummy/dummy.0 has 1 pending task(s):
[2022/01/12 16:49:29] [ info] [task]   task_id=0 still running on route(s): tcp/tcp.0 
[2022/01/12 16:49:58] [ info] [task] dummy/dummy.0 has 1 pending task(s):
[2022/01/12 16:49:58] [ info] [task]   task_id=0 still running on route(s): tcp/tcp.0 
[2022/01/12 16:49:58] [ info] [engine] service has stopped (1 pending tasks)
[ FAILED ]
  timeout.c:92: Check for hung coroutines... failed
FAILED: 1 of 1 unit tests has failed.

What I see there is that:

  1. The connection attempt to 128.1.1.1 is refused, which is odd considering this worked on other machines. That suggests the problem might lie with the virtual machine rather than the endpoint.
  2. A second connection seems to be attempted, probably because Retry_limit is set to 1 instead of no_retries.

I will change the retry_limit setting to no_retries, but we will probably have to set up our own blackholed endpoint for this test to work reliably.
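
To check whether a given machine sees the endpoint as blackholed or as actively refusing connections, a small probe along these lines could help. It is a rough sketch using plain POSIX sockets, not part of the test or of Fluent Bit: `probe_connect`, `classify_connect_error` and the `probe_result` enum are hypothetical names introduced here for illustration.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

enum probe_result {
    PROBE_CONNECTED,
    PROBE_REFUSED,
    PROBE_TIMED_OUT,
    PROBE_ERROR
};

/* Map a socket error code to a probe result */
enum probe_result classify_connect_error(int err)
{
    if (err == 0) {
        return PROBE_CONNECTED;
    }
    if (err == ECONNREFUSED) {
        return PROBE_REFUSED;
    }
    if (err == ETIMEDOUT) {
        return PROBE_TIMED_OUT;
    }
    return PROBE_ERROR;
}

/* Attempt a non-blocking TCP connect and wait up to timeout_ms for it */
enum probe_result probe_connect(const char *ip, int port, int timeout_ms)
{
    struct sockaddr_in addr;
    struct pollfd pfd;
    enum probe_result result;
    socklen_t len;
    int err = 0;
    int flags;
    int ret;
    int fd;

    assert(ip != NULL);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons((unsigned short) port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        return PROBE_ERROR; /* not a valid IPv4 literal */
    }

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return PROBE_ERROR;
    }

    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return PROBE_ERROR;
    }

    if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) == 0) {
        result = PROBE_CONNECTED;
    }
    else if (errno != EINPROGRESS) {
        /* Immediate failure, e.g. a refusal reported synchronously */
        result = classify_connect_error(errno);
    }
    else {
        pfd.fd = fd;
        pfd.events = POLLOUT;
        pfd.revents = 0;
        len = sizeof(err);

        ret = poll(&pfd, 1, timeout_ms);
        if (ret == 0) {
            /* No answer at all: what a blackholed endpoint looks like */
            result = PROBE_TIMED_OUT;
        }
        else if (ret < 0 ||
                 getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0) {
            result = PROBE_ERROR;
        }
        else {
            result = classify_connect_error(err);
        }
    }

    close(fd);
    return result;
}
```

For example, `probe_connect("128.1.1.1", 65534, 5000)` would be expected to return `PROBE_TIMED_OUT` on a machine where the endpoint behaves as a blackhole, whereas the log above corresponds to `PROBE_REFUSED`.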

@edsiper
Member

edsiper commented Jan 12, 2022

thanks.

Please fix the typo in the commit subjects so we can merge it:

tests: interal:...

@leonardo-albertovich leonardo-albertovich changed the title tests: interal: timeout: added coroutine hang test case tests: internal: timeout: added coroutine hang test case Jan 12, 2022
@leonardo-albertovich
Collaborator Author

Fixed the typo, but please don't merge it yet. We need to change the endpoint information; I already sent a request to have one hosted by us, as this public one is not reliable.

@leonardo-albertovich
Collaborator Author

This is ready to be merged @edsiper

@edsiper edsiper merged commit e0afffe into master Jan 13, 2022
@edsiper
Member

edsiper commented Jan 13, 2022

thanks

@lecaros lecaros added this to the Fluent Bit v1.8.12 milestone Jan 21, 2022
edsiper pushed a commit that referenced this pull request Jan 22, 2022
edsiper pushed a commit that referenced this pull request Jan 22, 2022
edsiper pushed a commit that referenced this pull request Jan 22, 2022