Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[engine] caught signal (SIGSEGV) when consecutive tcp packages are arrived via in_tcp plugin. #701

Closed
Gabriel1688 opened this issue Aug 1, 2018 · 1 comment
Assignees

Comments

@Gabriel1688
Copy link

Bug Report

Fluentbit crashed if remote peer sends more than one Tcp package when tcp socket is open. It works well if only one Tcp package is sent for each connection.

Describe the bug

To Reproduce

  • Rubular link if applicable:
  • Example log message if applicable:
    [2018/08/01 18:16:04] [ info] [engine] started (pid=26228)
    [2018/08/01 18:16:04] [debug] [in_tcp] Listen='0.0.0.0' TCP_Port=5170
    [2018/08/01 18:16:04] [ info] [in_tcp] binding 0.0.0.0:5170
    [2018/08/01 18:16:04] [trace] [router] input=tcp.0 tag=tcp.0
    [2018/08/01 18:16:04] [debug] [router] match rule tcp.0:stdout.0
    [2018/08/01 18:16:21] [trace] [in_tcp] new TCP connection arrived FD=20
    [2018/08/01 18:16:21] [trace] [in_tcp] read()=41 pre_len=0 now_len=41
    [2018/08/01 18:16:21] [debug] [input tcp.0] [mem buf] size = 37
    [0] tcp.0: [1533161781.720389480, {"key 1"=>123456789, "key 2"=>"abcdefg"}]
    [2018/08/01 18:16:22] [trace] [task 0x1fd5860] created (id=0)
    [2018/08/01 18:16:22] [debug] [task] created task=0x1fd5860 id=0 OK
    [2018/08/01 18:16:22] [trace] [engine dispatch] task #0 created 0x1fd5860
    [2018/08/01 18:16:22] [trace] [thread 0x1fd3750] created (custom data at 0x1fd3770, size=64
    [2018/08/01 18:16:22] [trace] [engine] [task event] task_id=0 thread_id=0 return=OK
    [2018/08/01 18:16:22] [trace] [thread] destroy thread=0x1fd3750 data=0x1fd3770
    [2018/08/01 18:16:22] [debug] [task] destroy task=0x1fd5860 (task_id=0)
    [2018/08/01 18:16:22] [trace] [in_tcp] fd=20 closed connection

[2018/08/01 18:17:48] [trace] [in_tcp] new TCP connection arrived FD=20
[2018/08/01 18:17:48] [trace] [in_tcp] read()=41 pre_len=0 now_len=41
[2018/08/01 18:17:48] [debug] [input tcp.0] [mem buf] size = 37
[2018/08/01 18:17:49] [trace] [in_tcp] fd=20 closed connection
[0] tcp.0: [1533161868.660901880, {"key 1"=>123456789, "key 2"=>"abcdefg"}]
[2018/08/01 18:17:49] [trace] [task 0x1fd5860] created (id=0)
[2018/08/01 18:17:49] [debug] [task] created task=0x1fd5860 id=0 OK
[2018/08/01 18:17:49] [trace] [engine dispatch] task #0 created 0x1fd5860
[2018/08/01 18:17:49] [trace] [thread 0x1fde310] created (custom data at 0x1fde330, size=64
[2018/08/01 18:17:49] [trace] [engine] [task event] task_id=0 thread_id=0 return=OK
[2018/08/01 18:17:49] [trace] [thread] destroy thread=0x1fde310 data=0x1fde330
[2018/08/01 18:17:49] [debug] [task] destroy task=0x1fd5860 (task_id=0)

[Crashed occurred when 2nd tcp package arrived before tcp connection is closed ]

[2018/08/01 18:19:05] [trace] [in_tcp] new TCP connection arrived FD=20
[2018/08/01 18:19:05] [trace] [in_tcp] read()=41 pre_len=0 now_len=41
[2018/08/01 18:19:05] [debug] [input tcp.0] [mem buf] size = 37
[0] tcp.0: [1533161945.628831412, {"key 1"=>123456789, "key 2"=>"abcdefg"}]
[2018/08/01 18:19:05] [trace] [task 0x1fd5860] created (id=0)
[2018/08/01 18:19:05] [debug] [task] created task=0x1fd5860 id=0 OK
[2018/08/01 18:19:05] [trace] [engine dispatch] task #0 created 0x1fd5860
[2018/08/01 18:19:05] [trace] [thread 0x1fde310] created (custom data at 0x1fde330, size=64
[2018/08/01 18:19:05] [trace] [engine] [task event] task_id=0 thread_id=0 return=OK
[2018/08/01 18:19:05] [trace] [thread] destroy thread=0x1fde310 data=0x1fde330
[2018/08/01 18:19:05] [debug] [task] destroy task=0x1fd5860 (task_id=0)
[engine] caught signal (SIGSEGV)
[2018/08/01 18:19:06] [trace] [in_tcp] read()=41 pre_len=1 now_len=42
#0 0x7fb9384abc3c in ???() at ???:0
#1 0x463a3d in flb_free() at include/fluent-bit/flb_mem.h:91
#2 0x464821 in tcp_conn_event() at plugins/in_tcp/tcp_conn.c:161
#3 0x43508c in flb_engine_start() at src/flb_engine.c:542
#4 0x4211eb in main() at src/fluent-bit.c:824
#5 0x7fb938448444 in ???() at ???:0
#6 0x41f6d6 in ???() at ???:0
#7 0xffffffffffffffff in ???() at ???:0
Aborted (core dumped)

  • Steps to reproduce the problem:

------source code of tcp client --------
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <netdb.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#define PORT 5170 /* the port client will be connecting to */

#define MAXDATASIZE 100 /* max number of bytes we can get at once */

int main(int argc, char *argv[])
{
int sockfd, numbytes;
char buf[MAXDATASIZE];
struct hostent he;
struct sockaddr_in their_addr; /
connector's address information */
sockfd = socket(AF_INET, SOCK_STREAM, 0);

their_addr.sin_family = AF_INET;      /* host byte order */
their_addr.sin_port = htons(PORT);    /* short, network byte order */
their_addr.sin_addr.s_addr = inet_addr("127.0.0.1");
bzero(&(their_addr.sin_zero), 8);     /* zero the rest of the struct */
char  content[] ="{\"key 1\": 123456789, \"key 2\": \"abcdefg\"}"; 
if (connect(sockfd, (struct sockaddr *)&their_addr, sizeof(struct sockaddr)) == -1)
while (1)
{
    send(sockfd, content, sizeof(content), 0);
    sleep(1);
}
close(sockfd);
return 0;

}

Expected behavior

Once application open a Tcp connection, the connection will exist all the time, and application could send tcp package all the time with same connection until it is closed by remote peer.

Screenshots

Your Environment

  • Version used:
    Version: Fluent Bit v0.13.4
    Internal
    Event Loop = epoll
    Build Flags = JSMN_PARENT_LINKS JSMN_STRICT FLB_HAVE_TLS FLB_HAVE_METRICS FLB_HAVE_SQLDB FLB_HAVE_BUFFERING FLB_HAVE_TRACE FLB_HAVE_METRICS FLB_HAVE_HTTP_SERVER FLB_HAVE_FLUSH_LIBCO FLB_HAVE_FORK FLB_HAVE_PROXY_GO FLB_HAVE_LIBBACKTRACE FLB_HAVE_REGEX FLB_HAVE_C_TLS FLB_HAVE_ACCEPT4 FLB_HAVE_INOTIFY

OS: CentOS 7

  • Configuration:
  • Environment name and version (e.g. Kubernetes? What version?):
  • Server type and version:
  • Operating System and version:
  • Filters and plugins:

[SERVICE]
Flush 2
Daemon Off
Log_Level trace
[INPUT]
Name tcp
Listen 0.0.0.0
Port 5170
Chunk_Size 32
Buffer_Size 64

[OUTPUT]
Name stdout
Match *
Additional context

@Gabriel1688 Gabriel1688 changed the title [engine] caught signal (SIGSEGV) when consecutive tcp packages are arrived from via in_tcp plugin. [engine] caught signal (SIGSEGV) when consecutive tcp packages are arrived via in_tcp plugin. Aug 1, 2018
edsiper added a commit that referenced this issue Aug 6, 2018
edsiper added a commit that referenced this issue Aug 6, 2018
When parsing JSON string which contains a NULL byte, the number of tokens
found returned is zero.

This patch make sure that packaging calls raise an error if no tokens
are returned.

Signed-off-by: Eduardo Silva <[email protected]>
@edsiper edsiper self-assigned this Aug 6, 2018
@edsiper edsiper added the bug label Aug 6, 2018
@edsiper
Copy link
Member

edsiper commented Aug 6, 2018

hi @Gabriel1688

thanks for reporting this issue and a test case, it was very useful.

I've found that the root cause of the problem is not the number of messages in the connection, but it is the implicit NULL byte inserted between each message. In the following code of the test case the real string length is including the NULL byte:

char  content[] ="{\"key 1\": 123456789, \"key 2\": \"abcdefg\"}"; 
...
while (1)
{
    send(sockfd, content, sizeof(content), 0);
    sleep(1);
}

when doing sizeof() of a static string it always must be sizeof(content) - 1, otherwise you are sending a NULL byte over the wire after each JSON message and it's invalid.

The problem in Fluent Bit (in_tcp) was that it was processing an invalid number of tokens returned by the JSON parser; the following commits fixes the issue:

Now the default behavior is that if the parser do not found any tokens, the whole message is skipped.

@edsiper edsiper added the fixed label Aug 6, 2018
edsiper added a commit that referenced this issue Aug 6, 2018
edsiper added a commit that referenced this issue Aug 6, 2018
When parsing JSON string which contains a NULL byte, the number of tokens
found returned is zero.

This patch make sure that packaging calls raise an error if no tokens
are returned.

Signed-off-by: Eduardo Silva <[email protected]>
@edsiper edsiper closed this as completed Aug 21, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants