
out_http: Fix keepalive #6784

Merged
merged 2 commits into fluent:master from bchess:http_keepalive on Nov 7, 2023
Conversation

bchess
Contributor

@bchess bchess commented Feb 3, 2023

Fixes #6783


Enter [N/A] in the box if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • [N/A] Run local packaging test showing all targets (including any new ones) build.
  • [N/A] Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • [N/A] Documentation required for this feature

Backporting

  • [N/A] Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

@bchess
Contributor Author

bchess commented Feb 3, 2023

test.conf:

[Service]
  Flush 1
  Daemon Off
  log_level trace

[Input]
  Name tail
  Path test_data.log
  Read_From_Head on

[Output]
  Name http
  Match *
  Host localhost
  Port 8080
  Uri /
  Format json_lines

In addition to the above file, to reproduce:

% cat > nginx.conf << EOF
daemon off;
error_log /dev/stderr warn;
pid nginx.pid;

events {
  worker_connections 768;
}

http {
  log_format logfmt '$remote_addr - $remote_user [$time_local] "$request" $status $connection $connection_requests';

  server {
    listen 8080;
    access_log /dev/stdout logfmt;

    location / {
      return 200 "ok";
    }

  }
}
EOF
% seq 1 3 > test_data.log
% nginx -c $PWD/nginx.conf &
% fluent-bit -c test.conf
Fluent Bit v2.0.9
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2023/02/03 22:46:54] [ info] Configuration:
[2023/02/03 22:46:54] [ info]  flush time     | 1.000000 seconds
[2023/02/03 22:46:54] [ info]  grace          | 5 seconds
[2023/02/03 22:46:54] [ info]  daemon         | 0
[2023/02/03 22:46:54] [ info] ___________
[2023/02/03 22:46:54] [ info]  inputs:
[2023/02/03 22:46:54] [ info]      tail
[2023/02/03 22:46:54] [ info] ___________
[2023/02/03 22:46:54] [ info]  filters:
[2023/02/03 22:46:54] [ info] ___________
[2023/02/03 22:46:54] [ info]  outputs:
[2023/02/03 22:46:54] [ info]      http.0
[2023/02/03 22:46:54] [ info] ___________
[2023/02/03 22:46:54] [ info]  collectors:
[2023/02/03 22:46:54] [ info] [fluent bit] version=2.0.9, commit=5fae90359f, pid=805024
[2023/02/03 22:46:54] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2023/02/03 22:46:54] [ info] [storage] ver=1.4.0, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2023/02/03 22:46:54] [ info] [cmetrics] version=0.5.8
[2023/02/03 22:46:54] [ info] [ctraces ] version=0.2.7
[2023/02/03 22:46:54] [ info] [input:tail:tail.0] initializing
[2023/02/03 22:46:54] [ info] [input:tail:tail.0] storage_strategy='memory' (memory only)
[2023/02/03 22:46:54] [debug] [tail:tail.0] created event channels: read=21 write=22
[2023/02/03 22:46:54] [debug] [input:tail:tail.0] flb_tail_fs_inotify_init() initializing inotify tail input
[2023/02/03 22:46:54] [debug] [input:tail:tail.0] inotify watch fd=27
[2023/02/03 22:46:54] [debug] [input:tail:tail.0] scanning path test_data.log
[2023/02/03 22:46:54] [debug] [input:tail:tail.0] inode=783319 with offset=0 appended as test_data.log
[2023/02/03 22:46:54] [debug] [input:tail:tail.0] scan_glob add(): test_data.log, inode 783319
[2023/02/03 22:46:54] [debug] [input:tail:tail.0] 1 new files found on path 'test_data.log'
[2023/02/03 22:46:54] [debug] [http:http.0] created event channels: read=29 write=30
[2023/02/03 22:46:54] [ info] [sp] stream processor started
[2023/02/03 22:46:54] [debug] [input chunk] update output instances with new chunk size diff=54
[2023/02/03 22:46:54] [debug] [input:tail:tail.0] [static files] processed 6b
[2023/02/03 22:46:54] [debug] [input:tail:tail.0] inode=783319 file=test_data.log promote to TAIL_EVENT
[2023/02/03 22:46:54] [ info] [input:tail:tail.0] inotify_fs_add(): inode=783319 watch_fd=1 name=test_data.log
[2023/02/03 22:46:54] [debug] [input:tail:tail.0] [static files] processed 0b, done
[2023/02/03 22:46:54] [ info] [output:http:http.0] worker #1 started
[2023/02/03 22:46:54] [ info] [output:http:http.0] worker #0 started
[2023/02/03 22:46:54] [debug] [task] created task=0x7f7c4c034e20 id=0 OK
[2023/02/03 22:46:54] [debug] [output:http:http.0] task_id=0 assigned to thread #0
[2023/02/03 22:46:54] [debug] [upstream] KA connection #58 to localhost:8080 is connected
[2023/02/03 22:46:54] [debug] [http_client] not using http_proxy for header
127.0.0.1 - - [03/Feb/2023:22:46:54 +0000] "POST / HTTP/1.1" 200 1 1

Then write several more lines to test_data.log:

127.0.0.1 - - [03/Feb/2023:22:47:13 +0000] "POST / HTTP/1.1" 200 2 1
127.0.0.1 - - [03/Feb/2023:22:47:13 +0000] "POST / HTTP/1.1" 200 1 2
127.0.0.1 - - [03/Feb/2023:22:47:13 +0000] "POST / HTTP/1.1" 200 2 2

The last number ($connection_requests in the log_format above) indicates that each connection is being reused for subsequent requests.
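The same reuse behavior can be demonstrated outside Fluent Bit. Below is a minimal Python sketch (illustrative only, not Fluent Bit code; the server and variable names are made up for this example) that sends three HTTP/1.1 requests over a single client connection and counts how many TCP connections the server observes:

```python
# Minimal keep-alive demonstration (illustrative, not Fluent Bit code).
# An HTTP/1.1 server records the (ip, port) pair of every TCP connection;
# a client that reuses its connection shows up exactly once.
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

connections = set()

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 implies keep-alive by default

    def do_GET(self):
        connections.add(self.client_address)  # unique per TCP connection
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
for _ in range(3):
    conn.request("GET", "/")
    conn.getresponse().read()  # drain the body so the socket can be reused
conn.close()
server.shutdown()

print(len(connections))  # 1: all three requests shared one TCP connection
```

This mirrors what the nginx $connection_requests counter shows above: one TCP connection carrying several requests.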

@bchess
Contributor Author

bchess commented Feb 3, 2023

Valgrind output

$ valgrind ./fluent-bit -c test-http.conf
==805284== Memcheck, a memory error detector
==805284== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==805284== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==805284== Command: ./fluent-bit -c test-http.conf
==805284==
Fluent Bit v2.0.9
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2023/02/03 22:50:56] [ info] Configuration:
[2023/02/03 22:50:56] [ info]  flush time     | 1.000000 seconds
[2023/02/03 22:50:56] [ info]  grace          | 5 seconds
[2023/02/03 22:50:56] [ info]  daemon         | 0
[2023/02/03 22:50:56] [ info] ___________
[2023/02/03 22:50:56] [ info]  inputs:
[2023/02/03 22:50:56] [ info]      tail
[2023/02/03 22:50:56] [ info] ___________
[2023/02/03 22:50:56] [ info]  filters:
[2023/02/03 22:50:56] [ info] ___________
[2023/02/03 22:50:56] [ info]  outputs:
[2023/02/03 22:50:56] [ info]      http.0
[2023/02/03 22:50:56] [ info] ___________
[2023/02/03 22:50:56] [ info]  collectors:
[2023/02/03 22:50:56] [ info] [fluent bit] version=2.0.9, commit=5fae90359f, pid=805284
[2023/02/03 22:50:56] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2023/02/03 22:50:56] [ info] [storage] ver=1.4.0, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2023/02/03 22:50:56] [ info] [cmetrics] version=0.5.8
[2023/02/03 22:50:56] [ info] [ctraces ] version=0.2.7
[2023/02/03 22:50:56] [ info] [input:tail:tail.0] initializing
[2023/02/03 22:50:56] [ info] [input:tail:tail.0] storage_strategy='memory' (memory only)
[2023/02/03 22:50:56] [debug] [tail:tail.0] created event channels: read=21 write=22
[2023/02/03 22:50:56] [debug] [input:tail:tail.0] flb_tail_fs_inotify_init() initializing inotify tail input
[2023/02/03 22:50:56] [debug] [input:tail:tail.0] inotify watch fd=27
[2023/02/03 22:50:56] [debug] [input:tail:tail.0] scanning path test_data.log
[2023/02/03 22:50:56] [debug] [input:tail:tail.0] inode=783319 with offset=0 appended as test_data.log
[2023/02/03 22:50:56] [debug] [input:tail:tail.0] scan_glob add(): test_data.log, inode 783319
[2023/02/03 22:50:56] [debug] [input:tail:tail.0] 1 new files found on path 'test_data.log'
[2023/02/03 22:50:56] [debug] [http:http.0] created event channels: read=29 write=30
[2023/02/03 22:50:56] [ info] [output:http:http.0] worker #0 started
[2023/02/03 22:50:56] [ info] [output:http:http.0] worker #1 started
[2023/02/03 22:50:56] [ info] [sp] stream processor started
[2023/02/03 22:50:56] [debug] [input chunk] update output instances with new chunk size diff=486
[2023/02/03 22:50:56] [debug] [input:tail:tail.0] [static files] processed 54b
[2023/02/03 22:50:56] [debug] [input:tail:tail.0] inode=783319 file=test_data.log promote to TAIL_EVENT
[2023/02/03 22:50:56] [ info] [input:tail:tail.0] inotify_fs_add(): inode=783319 watch_fd=1 name=test_data.log
[2023/02/03 22:50:56] [debug] [input:tail:tail.0] [static files] processed 0b, done
[2023/02/03 22:50:56] [debug] [task] created task=0x515a100 id=0 OK
[2023/02/03 22:50:56] [debug] [output:http:http.0] task_id=0 assigned to thread #0
==805284== Warning: client switching stacks?  SP change: 0x6c9d7f8 --> 0x51605b0
==805284==          to suppress, use: --max-stackframe=28561992 or greater
==805284== Warning: client switching stacks?  SP change: 0x5160508 --> 0x6c9d7f8
==805284==          to suppress, use: --max-stackframe=28562160 or greater
==805284== Warning: client switching stacks?  SP change: 0x6c9d7f8 --> 0x5160508
==805284==          to suppress, use: --max-stackframe=28562160 or greater
==805284==          further instances of this message will not be shown.
[2023/02/03 22:50:56] [debug] [upstream] KA connection #58 to localhost:8080 is connected
[2023/02/03 22:50:56] [debug] [http_client] not using http_proxy for header
127.0.0.1 - - [03/Feb/2023:22:50:56 +0000] "POST / HTTP/1.1" 200 5 1
[2023/02/03 22:50:56] [ info] [output:http:http.0] localhost:8080, HTTP status=200
ok
[2023/02/03 22:50:56] [debug] [upstream] KA connection #58 to localhost:8080 is now available
[2023/02/03 22:50:56] [debug] [out flush] cb_destroy coro_id=0
[2023/02/03 22:50:56] [debug] [task] destroy task=0x515a100 (task_id=0)
[2023/02/03 22:51:10] [debug] [input:tail:tail.0] IN_MODIFY
[2023/02/03 22:51:10] [debug] [input chunk] update output instances with new chunk size diff=54
[2023/02/03 22:51:10] [debug] [task] created task=0x528ddd0 id=0 OK
[2023/02/03 22:51:10] [debug] [output:http:http.0] task_id=0 assigned to thread #1
[2023/02/03 22:51:10] [debug] [upstream] KA connection #59 to localhost:8080 is connected
[2023/02/03 22:51:10] [debug] [http_client] not using http_proxy for header
127.0.0.1 - - [03/Feb/2023:22:51:10 +0000] "POST / HTTP/1.1" 200 6 1
[2023/02/03 22:51:10] [ info] [output:http:http.0] localhost:8080, HTTP status=200
ok
[2023/02/03 22:51:10] [debug] [upstream] KA connection #59 to localhost:8080 is now available
[2023/02/03 22:51:10] [debug] [out flush] cb_destroy coro_id=0
[2023/02/03 22:51:10] [debug] [task] destroy task=0x528ddd0 (task_id=0)
[2023/02/03 22:51:13] [debug] [input:tail:tail.0] IN_MODIFY
[2023/02/03 22:51:13] [debug] [input chunk] update output instances with new chunk size diff=54
[2023/02/03 22:51:13] [debug] [task] created task=0x5310eb0 id=0 OK
[2023/02/03 22:51:13] [debug] [output:http:http.0] task_id=0 assigned to thread #0
[2023/02/03 22:51:13] [debug] [upstream] KA connection #58 to localhost:8080 has been assigned (recycled)
[2023/02/03 22:51:13] [debug] [http_client] not using http_proxy for header
127.0.0.1 - - [03/Feb/2023:22:51:13 +0000] "POST / HTTP/1.1" 200 5 2
[2023/02/03 22:51:13] [ info] [output:http:http.0] localhost:8080, HTTP status=200
ok
[2023/02/03 22:51:13] [debug] [upstream] KA connection #58 to localhost:8080 is now available
[2023/02/03 22:51:13] [debug] [out flush] cb_destroy coro_id=1
[2023/02/03 22:51:13] [debug] [task] destroy task=0x5310eb0 (task_id=0)
[2023/02/03 22:51:14] [debug] [input:tail:tail.0] IN_MODIFY
[2023/02/03 22:51:14] [debug] [input chunk] update output instances with new chunk size diff=54
[2023/02/03 22:51:15] [debug] [task] created task=0x53785a0 id=0 OK
[2023/02/03 22:51:15] [debug] [output:http:http.0] task_id=0 assigned to thread #1
[2023/02/03 22:51:15] [debug] [upstream] KA connection #59 to localhost:8080 has been assigned (recycled)
[2023/02/03 22:51:15] [debug] [http_client] not using http_proxy for header
127.0.0.1 - - [03/Feb/2023:22:51:15 +0000] "POST / HTTP/1.1" 200 6 2
[2023/02/03 22:51:15] [ info] [output:http:http.0] localhost:8080, HTTP status=200
ok
[2023/02/03 22:51:15] [debug] [upstream] KA connection #59 to localhost:8080 is now available
[2023/02/03 22:51:15] [debug] [out flush] cb_destroy coro_id=1
[2023/02/03 22:51:15] [debug] [task] destroy task=0x53785a0 (task_id=0)
[2023/02/03 22:51:16] [debug] [input:tail:tail.0] IN_MODIFY
[2023/02/03 22:51:16] [debug] [input chunk] update output instances with new chunk size diff=54
[2023/02/03 22:51:17] [debug] [task] created task=0x53ddc50 id=0 OK
[2023/02/03 22:51:17] [debug] [output:http:http.0] task_id=0 assigned to thread #0
[2023/02/03 22:51:17] [debug] [upstream] KA connection #58 to localhost:8080 has been assigned (recycled)
[2023/02/03 22:51:17] [debug] [http_client] not using http_proxy for header
127.0.0.1 - - [03/Feb/2023:22:51:17 +0000] "POST / HTTP/1.1" 200 5 3
[2023/02/03 22:51:17] [ info] [output:http:http.0] localhost:8080, HTTP status=200
ok
[2023/02/03 22:51:17] [debug] [upstream] KA connection #58 to localhost:8080 is now available
[2023/02/03 22:51:17] [debug] [out flush] cb_destroy coro_id=2
[2023/02/03 22:51:17] [debug] [task] destroy task=0x53ddc50 (task_id=0)
[2023/02/03 22:51:18] [debug] [input:tail:tail.0] IN_MODIFY
[2023/02/03 22:51:18] [debug] [input chunk] update output instances with new chunk size diff=54
[2023/02/03 22:51:19] [debug] [task] created task=0x5443300 id=0 OK
[2023/02/03 22:51:19] [debug] [output:http:http.0] task_id=0 assigned to thread #1
[2023/02/03 22:51:19] [debug] [upstream] KA connection #59 to localhost:8080 has been assigned (recycled)
[2023/02/03 22:51:19] [debug] [http_client] not using http_proxy for header
127.0.0.1 - - [03/Feb/2023:22:51:19 +0000] "POST / HTTP/1.1" 200 6 3
[2023/02/03 22:51:19] [ info] [output:http:http.0] localhost:8080, HTTP status=200
ok
[2023/02/03 22:51:19] [debug] [upstream] KA connection #59 to localhost:8080 is now available
[2023/02/03 22:51:19] [debug] [out flush] cb_destroy coro_id=2
[2023/02/03 22:51:19] [debug] [task] destroy task=0x5443300 (task_id=0)
^C[2023/02/03 22:51:21] [engine] caught signal (SIGINT)
[2023/02/03 22:51:21] [ warn] [engine] service will shutdown in max 5 seconds
[2023/02/03 22:51:21] [ info] [input] pausing tail.0
[2023/02/03 22:51:22] [ info] [engine] service has stopped (0 pending tasks)
[2023/02/03 22:51:22] [ info] [input] pausing tail.0
[2023/02/03 22:51:22] [debug] [input:tail:tail.0] inode=783319 removing file name test_data.log
[2023/02/03 22:51:22] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=783319 watch_fd=1
[2023/02/03 22:51:22] [ info] [output:http:http.0] thread worker #0 stopping...
[2023/02/03 22:51:22] [ info] [output:http:http.0] thread worker #0 stopped
[2023/02/03 22:51:22] [ info] [output:http:http.0] thread worker #1 stopping...
[2023/02/03 22:51:22] [ info] [output:http:http.0] thread worker #1 stopped
==805284==
==805284== HEAP SUMMARY:
==805284==     in use at exit: 0 bytes in 0 blocks
==805284==   total heap usage: 2,428 allocs, 2,428 frees, 3,840,074 bytes allocated
==805284==
==805284== All heap blocks were freed -- no leaks are possible
==805284==
==805284== For lists of detected and suppressed errors, rerun with: -s
==805284== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

@bchess bchess temporarily deployed to pr February 4, 2023 15:49 — with GitHub Actions Inactive
@bchess bchess temporarily deployed to pr February 4, 2023 16:06 — with GitHub Actions Inactive
@bchess bchess changed the title Http keepalive out_http: Fix keepalive Feb 6, 2023
When HTTP keepalive is desired, specify the Connection: keep-alive header. This is a regression from e045d65.

Signed-off-by: Ben Chess <[email protected]>
Update the stream keepalive flag to match the net keepalive setting. This is required for the upstream connection to get recycled.

Signed-off-by: Ben Chess <[email protected]>
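Taken together, the two commit messages describe a small piece of logic, sketched here in Python as a hypothetical model (the real change is in Fluent Bit's C source; the names `connection_header` and `net_keepalive` are illustrative, not actual Fluent Bit identifiers):

```python
# Hypothetical model of what the two commits change (not the actual C code).
def connection_header(net_keepalive: bool, stream: dict) -> str:
    # Second commit: mirror the net keepalive setting onto the stream flag,
    # so the upstream connection can be recycled after the request completes.
    stream["keepalive"] = net_keepalive
    # First commit: when keepalive is desired, say so explicitly in the
    # Connection header rather than relying on the HTTP/1.1 default.
    return "keep-alive" if stream["keepalive"] else "close"

print(connection_header(True, {}))   # keep-alive
print(connection_header(False, {}))  # close
```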
@github-actions
Contributor

github-actions bot commented May 8, 2023

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the Stale label May 8, 2023
@bchess
Contributor Author

bchess commented May 8, 2023

Bump

@leonardo-albertovich
Collaborator

Hi Ben, thanks for sticking around, I reviewed the fix and it makes sense to me.

The only "issue" with it is that HTTP keepalive is bound to upstream keepalive, which in theory need not be the case; in practice, though, that's how it works, and overall the pros outweigh the cons, so I have no complaint.

It seems like CI is not having a good time at the moment but that's unrelated to your PR, once we can verify everything it'll be ready to be merged.

Once again, thanks for the patience!
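For reference, the upstream keepalive binding discussed above is exposed to users through Fluent Bit's per-output networking options. A hedged configuration sketch (option names per the Fluent Bit networking documentation; defaults may differ between versions):

```
[Output]
  Name http
  Match *
  Host localhost
  Port 8080
  Format json_lines
  # Networking options; keepalive defaults to on in recent versions
  net.keepalive on
  net.keepalive_idle_timeout 30
```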

@leonardo-albertovich leonardo-albertovich added ok-package-test Run PR packaging tests and removed ok-to-test labels May 8, 2023
@leonardo-albertovich
Collaborator

@bchess, it seems like this fix could be transparently applied to the 2.0 branch, do you want to backport it yourself once it's merged or would you rather have someone from the team do it?

@leonardo-albertovich
Collaborator

@edsiper This is ready to be merged from my point of view, let's wrap it up.

@bchess
Contributor Author

bchess commented May 9, 2023

@bchess, it seems like this fix could be transparently applied to the 2.0 branch, do you want to backport it yourself once it's merged or would you rather have someone from the team do it?

Thanks for approving. I'm happy to take a stab at backporting to 2.0 after this is merged

@bchess
Contributor Author

bchess commented May 25, 2023

hi, anything else I need to do at this point?

@bchess
Contributor Author

bchess commented Aug 8, 2023

@leonardo-albertovich @edsiper can you merge?

@pwhelan
Contributor

pwhelan commented Nov 6, 2023

@edsiper I checked out the PR and it seems to do what it says. It works as well as it should. I vote to merge it.

Edit:

The CI failure is a recurring one we are having with the internal log tests; it is due to the tests being too strict with regard to timings.

@edsiper edsiper merged commit 5be576e into fluent:master Nov 7, 2023
@edsiper
Member

edsiper commented Nov 7, 2023

@bchess thanks for triaging this issue and providing a fix. It seems keepalive itself was functional, but if there is a proxy that requires the keep-alive header, the issue will appear.

@pwhelan thank you!

@bchess bchess deleted the http_keepalive branch November 8, 2023 15:54
@bchess bchess restored the http_keepalive branch November 8, 2023 15:54
Successfully merging this pull request may close these issues.

fluent-bit does not use HTTP keepalive
4 participants