Skip to content
This repository has been archived by the owner on May 25, 2022. It is now read-only.

Fix bug where splitting could be skipped #303

Merged
merged 1 commit into from
Nov 15, 2021

Conversation

djaglowski
Copy link
Member

@djaglowski djaglowski commented Nov 9, 2021

Resolves #292

When force_flush_period is set to a non-zero value,
a forced flushing operation would disregard the possibility
that the data being flushed may contain multiple entries.

For example, the file_input operator could flush the remainder
of a file that it was consuming. If multiple lines were written
to that file, they would be emitted as a single log entry.

The fix is basically that when a forced flush is required, we
will continue reading in lines until reaching the end of file.
Only then will we mark the flush complete.

When force_flush_period is set to a non-zero value,
a forced flushing operation would disregard the possibility
that the data being flushed may contain multiple entries.

For example, the file_input operator could flush the remainder
of a file that it was consuming. If multiple lines were written
to that file, they would be emitted as a single log entry.
@codecov
Copy link

codecov bot commented Nov 9, 2021

Codecov Report

Merging #303 (a4d8694) into main (12dd2d3) will decrease coverage by 0.0%.
The diff coverage is 100.0%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main    #303     +/-   ##
=======================================
- Coverage   77.0%   77.0%   -0.1%     
=======================================
  Files         94      94             
  Lines       4410    4408      -2     
=======================================
- Hits        3397    3395      -2     
  Misses       697     697             
  Partials     316     316             
Impacted Files Coverage Δ
operator/helper/multiline.go 90.2% <100.0%> (-0.2%) ⬇️

@djaglowski djaglowski marked this pull request as ready for review November 9, 2021 21:18
@djaglowski djaglowski requested review from a team and jsirianni November 9, 2021 21:18
Copy link
Member

@jsirianni jsirianni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial testing looks good, I believe the issue is resolved. My process looks like this:

  1. Deploy GKE cluster and replicate the issue using the OTEL example and the Google Micro Service Demo
  2. Observed the issue pretty quickly. Sometimes, logs would be appended to each other during a forced flush.
  3. Verified that setting force_flush_period to 0 prevents the issue from happening
  4. Reverted step 3, observed the issue again
  5. Deployed this fix by building Collector Contrib against this branch
  6. Verified that the issue is no longer happening

Copy link
Member

@dmitryax dmitryax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for fixing it!

@djaglowski djaglowski merged commit 9c01d75 into open-telemetry:main Nov 15, 2021
@djaglowski djaglowski deleted the force-split branch November 15, 2021 14:03
djaglowski added a commit to djaglowski/opentelemetry-log-collection that referenced this pull request Nov 30, 2021
- `combine_with` setting to `recombine` operator, to allow for joining on custom delimiter ([PR315](open-telemetry#315))
- Issue where `force_flush_period` could cause line splitting to be skipped ([PR303](open-telemetry#303))
- Issue where `tcp_input` and `udp_input` could panic when stopping ([PR273](open-telemetry#273))
- Syslog severity mapping is now aligned with log specification ([PR300](open-telemetry#300))

- Improve error message when timezone database is not found ([PR289](open-telemetry#289))
djaglowski added a commit that referenced this pull request Nov 30, 2021
- `combine_with` setting to `recombine` operator, to allow for joining on custom delimiter ([PR315](#315))
- Issue where `force_flush_period` could cause line splitting to be skipped ([PR303](#303))
- Issue where `tcp_input` and `udp_input` could panic when stopping ([PR273](#273))
- Syslog severity mapping is now aligned with log specification ([PR300](#300))

- Improve error message when timezone database is not found ([PR289](#289))
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Multiline splitter doesn't work all the time with default configuration
3 participants