Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial Sending Delay #855

Closed
ffatghub opened this issue Sep 16, 2022 · 6 comments · Fixed by #860
Closed

Initial Sending Delay #855

ffatghub opened this issue Sep 16, 2022 · 6 comments · Fixed by #860
Milestone

Comments

@ffatghub
Copy link

Hi all,

I am integrating logstash-logback-encoder in an application that is executing in a Kubernetes cluster. I use LogstashTcpSocketAppender to send logs to Logstash with success. Moreover, I have to use Envoy (https://github.com/envoyproxy/envoy) as injected proxy container between the application and the Logstash server. Envoy is intercepting the LogstashTcpSocketAppender TCP connection and is terminating it, then it opens another TCP connection towards the Logstash server (it might also be another proxy but this is a detail).

first picture here is worth a thousand words, the services are the application and Logstash
https://www.solo.io/blog/istios-networking-in-depth/

The Logback TCP basic protocol assumes that a log has been delivered to the destination as soon as the write() on the socket returns OK, no application level ACK is required. This is an acceptable compromise in almost all circumstances but in the context described above it might lead to loose many log events. In case all Logstash destinations go down (for any reason) the LogstashTcpSocketAppender opens a new connection - that is successfully terminated by the Envoy proxy - and immediately write all available LogEvents found in the RingBuffer into the new socket. Unfortunately the TCP connection terminated by the proxy is established but the second TCP connection between the proxy and the real target is not. After few seconds (typically 5/6) the proxy declares the attempt failed and RST the incoming TCP connection established by LogstashTcpSocketAppender. Then a new LogEvent arrives… LogstashTcpSocketAppender opens a new connection terminated by the proxy… same result, we loose logs. This goes on and on until a Logstash destination is reachable.

I propose to add a configurable small delay, called “Initial Sending Delay” (better name?) just after the TCP socket establishment. During this period of time the socket would be established but not available for sending events.

thank you
ciao

@brenuart
Copy link
Collaborator

Hi,
Thanks for reporting this.

As you described, LLE has indeed no application level ACK which may result in the loss of events in the transmission, especially when you have additional proxies on the path that are likely to add additional buffering. LLE does not "guarantee" the delivery: even without intermediates between LLE and the sink it may loose events already buffered in the TCP window...

As far as I understand LLE establishes a connection with Envoy which in turn tries to connect to the actual destination. This second attempts fails if the destination is not available and Envoy immediately terminates the connection with LLE. However, LLE initially thought every is ok and already started to send events to Envoy (they are now waiting in Envoy buffers or are in-flight in the TCP window). These events are lost when Envoy ultimately closes the connection with LLE. Adding a small delay before sending data in the newly established connection may be a "workaround" in your case - as long as you can find a value that works in the majority of cases. Did I get it right?

A better solution would be to add application-level ACK to the transport mechanism. We already thought about adding support for Lumberjack/Beats support to connect directly to a Logstash instance... This would be very handy to those using the ELK stack. However, it seems that most users are logging into a file and rely on a separate "scrapper" to send the content to the remote destination. Some like Logstash have stronger delivery guarantee than LLE... maybe you should give them a try ;-)

@brenuart brenuart added the status/waiting-for-feedback Waiting for feedback from the original author. Issue will be closed if feedback is not received soon label Sep 16, 2022
@brenuart
Copy link
Collaborator

I forgot to mention that you can maybe reduce the amount of lost events by disabling the TCP write buffer - see https://github.com/logfellow/logstash-logback-encoder#write-buffer-size.

@ffatghub
Copy link
Author

Hi @brenuart

Did I get it right?

perfectly, we control our environment so the timeout would provide a reasonable workaround for corner cases (hopefully the actual destination is available most of the time)

We already thought about adding support for Lumberjack/Beats

that would be nice

maybe you should give them a try ;-)

but direct streaming is better :-)

@github-actions github-actions bot removed the status/waiting-for-feedback Waiting for feedback from the original author. Issue will be closed if feedback is not received soon label Sep 16, 2022
@ffatghub
Copy link
Author

I forgot to mention that you can maybe reduce the amount of lost events by disabling the TCP write buffer - see https://github.com/logfellow/logstash-logback-encoder#write-buffer-size.

thanks it helps, but still a delay is necessary right after the connection is established (because reopenSocket is called by onEvent, so that event will always be lost)

@brenuart
Copy link
Collaborator

Yeah... as we said earlier, unfortunately a few events are always likely to be lost after reconnecting - there is nothing much we can do without application level ACKs :(
I'll have a look at this "initial send delay" feature and will come back asap with my findings. Stay tuned.

brenuart added a commit that referenced this issue Sep 22, 2022
The initialSendDelay feature adds an optional wait time before a newly established connection can be used to write events.

See #855
@brenuart brenuart linked a pull request Sep 22, 2022 that will close this issue
brenuart added a commit that referenced this issue Sep 22, 2022
The initialSendDelay feature adds an optional wait time before a newly established connection can be used to write events. If the connection is closed before the delay expires, nothing is written and no events are lost.

See #855
@brenuart brenuart added this to the 7.3 milestone Sep 22, 2022
@ffatghub
Copy link
Author

Hi @brenuart, thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants