
tls: improve TLS handshake error message #4561

Merged · 6 commits merged into fluent:master on Mar 11, 2022

Conversation

pwhelan
Contributor

@pwhelan pwhelan commented Jan 3, 2022

Previously, errors during the TLS handshake were either not reported at all or were obscure and hard to understand. With this change, the error is reported explicitly:

[2022/01/03 14:21:24] [error] [tls] error: unexpected EOF

The error-handling code for the unexpected EOF case has to be written this way due to a legacy bug in OpenSSL 1.x, as documented at https://www.openssl.org/docs/man1.1.1/man3/SSL_get_error.html.


Testing
Before we can approve your change, please submit the following in a comment:

  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

@pwhelan
Contributor Author

pwhelan commented Jan 3, 2022

valgrind ./bin/fluent-bit -vvv -i dummy -o http -p tls=on -p host=127.0.0.1 -p port=7777 -p match='*' -f 1
==46955== Memcheck, a memory error detector
==46955== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==46955== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==46955== Command: ./bin/fluent-bit -vvv -i dummy -o http -p tls=on -p host=127.0.0.1 -p port=7777 -p match=* -f 1
==46955== 
Fluent Bit v1.9.0
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/01/03 14:21:11] [ info] Configuration:
[2022/01/03 14:21:11] [ info]  flush time     | 1.000000 seconds
[2022/01/03 14:21:11] [ info]  grace          | 5 seconds
[2022/01/03 14:21:11] [ info]  daemon         | 0
[2022/01/03 14:21:11] [ info] ___________
[2022/01/03 14:21:11] [ info]  inputs:
[2022/01/03 14:21:11] [ info]      dummy
[2022/01/03 14:21:11] [ info] ___________
[2022/01/03 14:21:11] [ info]  filters:
[2022/01/03 14:21:11] [ info] ___________
[2022/01/03 14:21:11] [ info]  outputs:
[2022/01/03 14:21:11] [ info]      http.0
[2022/01/03 14:21:11] [ info] ___________
[2022/01/03 14:21:11] [ info]  collectors:
[2022/01/03 14:21:11] [ info] [engine] started (pid=46955)
[2022/01/03 14:21:11] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2022/01/03 14:21:11] [debug] [storage] [cio stream] new stream registered: dummy.0
[2022/01/03 14:21:11] [ info] [storage] version=1.1.5, initializing...
[2022/01/03 14:21:11] [ info] [storage] in-memory
[2022/01/03 14:21:11] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2022/01/03 14:21:11] [ info] [cmetrics] version=0.2.2
[2022/01/03 14:21:11] [debug] [http:http.0] created event channels: read=19 write=20
[2022/01/03 14:21:11] [debug] [router] match rule dummy.0:http.0
[2022/01/03 14:21:11] [ info] [sp] stream processor started
[2022/01/03 14:21:12] [debug] [input chunk] update output instances with new chunk size diff=26
[2022/01/03 14:21:13] [debug] [task] created task=0x53e6b00 id=0 OK
[2022/01/03 14:21:13] [debug] [input chunk] update output instances with new chunk size diff=26
[2022/01/03 14:21:14] [debug] [task] created task=0x5469400 id=1 OK
[2022/01/03 14:21:14] [debug] [input chunk] update output instances with new chunk size diff=26
[2022/01/03 14:21:15] [debug] [task] created task=0x54e4100 id=2 OK
[2022/01/03 14:21:15] [debug] [input chunk] update output instances with new chunk size diff=26
[2022/01/03 14:21:16] [debug] [task] created task=0x555ee00 id=3 OK
[2022/01/03 14:21:16] [debug] [input chunk] update output instances with new chunk size diff=26
[2022/01/03 14:21:17] [debug] [task] created task=0x55d9b00 id=4 OK
[2022/01/03 14:21:17] [debug] [input chunk] update output instances with new chunk size diff=26
[2022/01/03 14:21:18] [debug] [task] created task=0x5654800 id=5 OK
[2022/01/03 14:21:18] [debug] [input chunk] update output instances with new chunk size diff=26
[2022/01/03 14:21:19] [debug] [task] created task=0x56cf500 id=6 OK
[2022/01/03 14:21:19] [debug] [input chunk] update output instances with new chunk size diff=26
[2022/01/03 14:21:20] [debug] [task] created task=0x7066220 id=7 OK
[2022/01/03 14:21:20] [debug] [input chunk] update output instances with new chunk size diff=26
[2022/01/03 14:21:21] [debug] [task] created task=0x70e0f20 id=8 OK
[2022/01/03 14:21:21] [debug] [input chunk] update output instances with new chunk size diff=26
[2022/01/03 14:21:22] [debug] [task] created task=0x715bc20 id=9 OK
[2022/01/03 14:21:22] [debug] [input chunk] update output instances with new chunk size diff=26
[2022/01/03 14:21:23] [debug] [task] created task=0x71d6920 id=10 OK
[2022/01/03 14:21:23] [debug] [input chunk] update output instances with new chunk size diff=26
[2022/01/03 14:21:24] [error] [upstream] connection #25 to 127.0.0.1:7777 timed out after 10 seconds
[2022/01/03 14:21:24] [error] [tls] error: unexpected EOF
[2022/01/03 14:21:24] [debug] [socket] could not validate socket status for #25 (don't worry)
[2022/01/03 14:21:24] [debug] [upstream] connection #-1 failed to 127.0.0.1:7777
[2022/01/03 14:21:24] [error] [output:http:http.0] no upstream connections available to 127.0.0.1:7777
[2022/01/03 14:21:24] [error] [upstream] connection #26 to 127.0.0.1:7777 timed out after 10 seconds
[2022/01/03 14:21:24] [error] [tls] error: unexpected EOF
[2022/01/03 14:21:24] [debug] [socket] could not validate socket status for #26 (don't worry)
[2022/01/03 14:21:24] [debug] [upstream] connection #-1 failed to 127.0.0.1:7777
[2022/01/03 14:21:24] [error] [output:http:http.0] no upstream connections available to 127.0.0.1:7777
[2022/01/03 14:21:24] [debug] [task] created task=0x72516c0 id=11 OK
[2022/01/03 14:21:24] [debug] [input chunk] update output instances with new chunk size diff=26
[2022/01/03 14:21:24] [debug] [out flush] cb_destroy coro_id=0
[2022/01/03 14:21:24] [debug] [out flush] cb_destroy coro_id=1
[2022/01/03 14:21:24] [debug] [retry] new retry created for task_id=0 attempts=1
[2022/01/03 14:21:24] [ warn] [engine] failed to flush chunk '46955-1641230472.243811716.flb', retry in 10 seconds: task_id=0, input=dummy.0 > output=http.0 (out_id=0)
[2022/01/03 14:21:24] [debug] [retry] new retry created for task_id=1 attempts=1
[2022/01/03 14:21:24] [ warn] [engine] failed to flush chunk '46955-1641230473.324440044.flb', retry in 10 seconds: task_id=1, input=dummy.0 > output=http.0 (out_id=0)
^C[2022/01/03 14:21:24] [engine] caught signal (SIGINT)
[2022/01/03 14:21:24] [debug] [task] created task=0x72cc9d0 id=12 OK
[2022/01/03 14:21:25] [ warn] [engine] service will shutdown in max 5 seconds
[2022/01/03 14:21:25] [debug] [input chunk] dummy.0 is paused, cannot append records
[2022/01/03 14:21:25] [ info] [task] dummy/dummy.0 has 13 pending task(s):
[2022/01/03 14:21:25] [ info] [task]   task_id=0 still running on route(s): http/http.0 
[2022/01/03 14:21:25] [ info] [task]   task_id=1 still running on route(s): http/http.0 
[2022/01/03 14:21:25] [ info] [task]   task_id=2 still running on route(s): http/http.0 
[2022/01/03 14:21:25] [ info] [task]   task_id=3 still running on route(s): http/http.0 
[2022/01/03 14:21:25] [ info] [task]   task_id=4 still running on route(s): http/http.0 
[2022/01/03 14:21:25] [ info] [task]   task_id=5 still running on route(s): http/http.0 
[2022/01/03 14:21:25] [ info] [task]   task_id=6 still running on route(s): http/http.0 
[2022/01/03 14:21:25] [ info] [task]   task_id=7 still running on route(s): http/http.0 
[2022/01/03 14:21:25] [ info] [task]   task_id=8 still running on route(s): http/http.0 
[2022/01/03 14:21:25] [ info] [task]   task_id=9 still running on route(s): http/http.0 
[2022/01/03 14:21:25] [ info] [task]   task_id=10 still running on route(s): http/http.0 
[2022/01/03 14:21:25] [ info] [task]   task_id=11 still running on route(s): http/http.0 
[2022/01/03 14:21:25] [ info] [task]   task_id=12 still running on route(s): http/http.0 
[2022/01/03 14:21:25] [error] [tls] error: unexpected EOF
[2022/01/03 14:21:25] [debug] [socket] could not validate socket status for #27 (don't worry)
[2022/01/03 14:21:25] [debug] [upstream] connection #-1 failed to 127.0.0.1:7777
[2022/01/03 14:21:25] [error] [output:http:http.0] no upstream connections available to 127.0.0.1:7777
[2022/01/03 14:21:25] [debug] [out flush] cb_destroy coro_id=2
[2022/01/03 14:21:26] [debug] [input chunk] dummy.0 is paused, cannot append records
[2022/01/03 14:21:27] [error] [tls] error: unexpected EOF
[2022/01/03 14:21:27] [debug] [socket] could not validate socket status for #28 (don't worry)
[2022/01/03 14:21:27] [debug] [upstream] connection #-1 failed to 127.0.0.1:7777
[2022/01/03 14:21:27] [error] [output:http:http.0] no upstream connections available to 127.0.0.1:7777
[2022/01/03 14:21:27] [error] [tls] error: unexpected EOF
[2022/01/03 14:21:27] [debug] [socket] could not validate socket status for #29 (don't worry)
[2022/01/03 14:21:27] [debug] [upstream] connection #-1 failed to 127.0.0.1:7777
[2022/01/03 14:21:27] [error] [output:http:http.0] no upstream connections available to 127.0.0.1:7777
[2022/01/03 14:21:27] [debug] [input chunk] dummy.0 is paused, cannot append records
[2022/01/03 14:21:27] [debug] [out flush] cb_destroy coro_id=3
[2022/01/03 14:21:27] [debug] [out flush] cb_destroy coro_id=4
[2022/01/03 14:21:28] [debug] [input chunk] dummy.0 is paused, cannot append records
[2022/01/03 14:21:28] [error] [tls] error: unexpected EOF
[2022/01/03 14:21:28] [debug] [socket] could not validate socket status for #30 (don't worry)
[2022/01/03 14:21:28] [debug] [upstream] connection #-1 failed to 127.0.0.1:7777
[2022/01/03 14:21:28] [error] [output:http:http.0] no upstream connections available to 127.0.0.1:7777
[2022/01/03 14:21:28] [debug] [out flush] cb_destroy coro_id=5
[2022/01/03 14:21:29] [debug] [input chunk] dummy.0 is paused, cannot append records
[2022/01/03 14:21:29] [ info] [task] dummy/dummy.0 has 13 pending task(s):
[2022/01/03 14:21:29] [ info] [task]   task_id=0 still running on route(s): http/http.0 
[2022/01/03 14:21:29] [ info] [task]   task_id=1 still running on route(s): http/http.0 
[2022/01/03 14:21:29] [ info] [task]   task_id=2 still running on route(s): http/http.0 
[2022/01/03 14:21:29] [ info] [task]   task_id=3 still running on route(s): http/http.0 
[2022/01/03 14:21:29] [ info] [task]   task_id=4 still running on route(s): http/http.0 
[2022/01/03 14:21:29] [ info] [task]   task_id=5 still running on route(s): http/http.0 
[2022/01/03 14:21:29] [ info] [task]   task_id=6 still running on route(s): http/http.0 
[2022/01/03 14:21:29] [ info] [task]   task_id=7 still running on route(s): http/http.0 
[2022/01/03 14:21:29] [ info] [task]   task_id=8 still running on route(s): http/http.0 
[2022/01/03 14:21:29] [ info] [task]   task_id=9 still running on route(s): http/http.0 
[2022/01/03 14:21:29] [ info] [task]   task_id=10 still running on route(s): http/http.0 
[2022/01/03 14:21:29] [ info] [task]   task_id=11 still running on route(s): http/http.0 
[2022/01/03 14:21:29] [ info] [task]   task_id=12 still running on route(s): http/http.0 
[2022/01/03 14:21:29] [ info] [engine] service has stopped (11 pending tasks)
[2022/01/03 14:21:29] [debug] [task] destroy task=0x53e6b00 (task_id=0)
[2022/01/03 14:21:29] [debug] [retry] task retry=0x72b83d0, invalidated from the scheduler
[2022/01/03 14:21:29] [debug] [task] destroy task=0x5469400 (task_id=1)
[2022/01/03 14:21:29] [debug] [retry] task retry=0x72cc7e0, invalidated from the scheduler
[2022/01/03 14:21:29] [debug] [task] destroy task=0x54e4100 (task_id=2)
[2022/01/03 14:21:29] [debug] [task] destroy task=0x555ee00 (task_id=3)
==46955== 
==46955== HEAP SUMMARY:
==46955==     in use at exit: 805,220 bytes in 3,981 blocks
==46955==   total heap usage: 7,423 allocs, 3,442 frees, 7,002,452 bytes allocated
==46955== 
==46955== LEAK SUMMARY:
==46955==    definitely lost: 0 bytes in 0 blocks
==46955==    indirectly lost: 0 bytes in 0 blocks
==46955==      possibly lost: 696,178 bytes in 292 blocks
==46955==    still reachable: 109,042 bytes in 3,689 blocks
==46955==         suppressed: 0 bytes in 0 blocks
==46955== Rerun with --leak-check=full to see details of leaked memory
==46955== 
==46955== For lists of detected and suppressed errors, rerun with: -s
==46955== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

@edsiper edsiper changed the title TLS handshake error tls: improve TLS handshake error message Jan 5, 2022
@edsiper
Member

edsiper commented Jan 6, 2022

@pwhelan can you please rebase on top of master? There are some recent OpenSSL changes.

Collaborator

@nokute78 nokute78 left a comment


I added some review comments.

  • How about using the thread-safe function ERR_error_string_n?
  • Multiline comment style

src/tls/openssl.c — 5 review comments (outdated, resolved)
@niedbalski niedbalski temporarily deployed to integration March 9, 2022 22:01 Inactive
@niedbalski niedbalski temporarily deployed to integration March 9, 2022 22:05 Inactive
@pwhelan
Copy link
Contributor Author

pwhelan commented Mar 9, 2022

@nokute78 or @edsiper any chance of getting this merged?

@niedbalski niedbalski merged commit da871ff into fluent:master Mar 11, 2022
mgeriesa pushed a commit to mgeriesa/fluent-bit that referenced this pull request Oct 25, 2022
* openssl/tls: log handshake error.

Signed-off-by: Phillip Whelan <[email protected]>

* tls: use ERR_error_string_n over ERR_error_string.

Signed-off-by: Phillip Whelan <[email protected]>

* tls: reformat multiline comment to align to 80 columns.

Signed-off-by: Phillip Whelan <[email protected]>
Signed-off-by: Manal Geries <[email protected]>
4 participants