[receiver/filelog] record numbers are incorrect when header is configured #35869

Closed
andrzej-stencel opened this issue Oct 18, 2024 · 1 comment · Fixed by #35870
Labels: bug, receiver/filelog

Comments

andrzej-stencel (Member) commented Oct 18, 2024

Component(s)

receiver/filelog

What happened?

Description

Configuring the header option for the File Log receiver together with include_file_record_number causes the reported record numbers to be higher than they should be. This happens even when there are no header lines in the file.

Steps to Reproduce (input file with header)

  1. Prepare input file

    cat > logs.txt <<EOF
    #xyz
    my-log
    EOF
  2. Run the collector with the config below:

    otelcol-contrib --config config.yaml

Expected Result

I would expect the "my-log" log to have log.file.record_number equal to 1.

Actual Result

The log has log.file.record_number equal to 3.

The output from Debug exporter shows the following log:

my-log log.file.name=logs.txt log.file.record_number=3 headerattr=xyz

Steps to Reproduce (input file without header)

  1. Prepare input file

    cat > logs.txt <<EOF
    my-log
    EOF
  2. Run the collector with the config below:

    otelcol-contrib --config config.yaml

Expected Result

I would expect the "my-log" log to have log.file.record_number equal to 1.

Actual Result

The log has log.file.record_number equal to 2.

The output from Debug exporter shows the following log:

my-log log.file.name=logs.txt log.file.record_number=2

Collector version

v0.111.0

Environment information

No response

OpenTelemetry Collector configuration

exporters:
  debug:
    verbosity: normal
receivers:
  filelog:
    header:
      metadata_operators:
        - type: regex_parser
          regex: ^#(?P<headerattr>\S+)$
      pattern: ^#
    include:
      - logs.txt
    include_file_record_number: true
    start_at: beginning
service:
  pipelines:
    logs:
      exporters:
        - debug
      receivers:
        - filelog
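
To illustrate how the `regex_parser` in the header configuration turns the header line `#xyz` into the `headerattr=xyz` attribute seen in the output, here is a minimal Go sketch. It only applies the same regular expression with Go's standard `regexp` package. The helper name `parseHeaderLine` is hypothetical and is not part of the collector.

```go
package main

import (
	"fmt"
	"regexp"
)

// headerPattern mirrors the regex of the regex_parser in the configuration above.
var headerPattern = regexp.MustCompile(`^#(?P<headerattr>\S+)$`)

// parseHeaderLine is a hypothetical helper (not collector code). It returns
// the named capture groups of a header line, or false if the line does not match.
func parseHeaderLine(line string) (map[string]string, bool) {
	m := headerPattern.FindStringSubmatch(line)
	if m == nil {
		return nil, false
	}
	attrs := make(map[string]string)
	for i, name := range headerPattern.SubexpNames() {
		if name != "" { // skip the unnamed whole-match group
			attrs[name] = m[i]
		}
	}
	return attrs, true
}

func main() {
	attrs, ok := parseHeaderLine("#xyz")
	fmt.Println(attrs, ok) // map[headerattr:xyz] true

	_, ok = parseHeaderLine("my-log")
	fmt.Println(ok) // false
}
```

A line that does not start with `#` yields no attributes, which corresponds to the second reproduction, where the output carries no `headerattr`.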

Log output

2024-10-18T15:42:50.737+0200    info    service@v0.111.0/service.go:136 Setting up own telemetry...
2024-10-18T15:42:50.737+0200    info    telemetry/metrics.go:70 Serving metrics {"address": "localhost:8888", "metrics level": "Normal"}
2024-10-18T15:42:50.737+0200    info    builders/builders.go:26 Development component. May change in the future.        {"kind": "exporter", "data_type": "logs", "name": "debug"}
2024-10-18T15:42:50.739+0200    info    service@v0.111.0/service.go:208 Starting otelcol-contrib...     {"Version": "0.111.0", "NumCPU": 20}
2024-10-18T15:42:50.739+0200    info    extensions/extensions.go:39     Starting extensions...
2024-10-18T15:42:50.739+0200    info    adapter/receiver.go:47  Starting stanza receiver        {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2024-10-18T15:42:50.740+0200    info    service@v0.111.0/service.go:234 Everything is ready. Begin running and processing data.
2024-10-18T15:42:50.940+0200    info    fileconsumer/file.go:256        Started watching file   {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "logs.txt"}
2024-10-18T15:42:51.041+0200    info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 1}
2024-10-18T15:42:51.041+0200    info    my-log log.file.name=logs.txt log.file.record_number=3 headerattr=xyz
        {"kind": "exporter", "data_type": "logs", "name": "debug"}

Additional context

No response

@andrzej-stencel andrzej-stencel added bug Something isn't working needs triage New item requiring triage labels Oct 18, 2024

@andrzej-stencel andrzej-stencel self-assigned this Oct 18, 2024
@andrzej-stencel andrzej-stencel removed the needs triage New item requiring triage label Oct 28, 2024
djaglowski pushed a commit that referenced this issue Oct 28, 2024
#### Description

Fixes
#35869
by refactoring the `Reader::ReadToEnd` method.

This refactors the `Reader::ReadToEnd` method by separating reading the
file's header from reading the file's contents.
This results in very similar code in `readHeader` and `readContents`
methods, which was previously deduplicated at the cost of slightly
higher complexity.
The bug could be fixed without separating header reading from contents
reading, but I hope this separation will make it easier to implement
content batching in the Reader
(#35455).
Content batching was my original motivation for these code changes.
I only discovered the problem with record counting when reading the
code.
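
The idea of separating header reading from contents reading can be sketched as follows. This is a simplified illustration, not the actual `Reader` code: the `record` type, the function signatures, and the use of `bufio.Scanner` are all assumptions made for the example. It shows the numbering expected in the issue, where header lines never advance the record counter.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// headerLine mirrors the header `pattern` from the issue's configuration.
var headerLine = regexp.MustCompile(`^#`)

// record is a hypothetical stand-in for an emitted log entry.
type record struct {
	Body   string
	Number int64 // corresponds to log.file.record_number
}

// readHeader consumes the leading lines that match headerLine. It also returns
// the first non-header line, if any, because that line belongs to the contents.
func readHeader(s *bufio.Scanner) (header []string, first string, ok bool) {
	for s.Scan() {
		line := s.Text()
		if !headerLine.MatchString(line) {
			return header, line, true
		}
		header = append(header, line)
	}
	return header, "", false
}

// readContents numbers records starting at 1; header lines never reach it.
func readContents(s *bufio.Scanner, first string) []record {
	records := []record{{Body: first, Number: 1}}
	for s.Scan() {
		records = append(records, record{Body: s.Text(), Number: int64(len(records) + 1)})
	}
	return records
}

// readToEnd reads the header first and the contents second.
func readToEnd(input string) ([]string, []record) {
	s := bufio.NewScanner(strings.NewReader(input))
	header, first, ok := readHeader(s)
	if !ok {
		return header, nil // the input held only header lines, or nothing
	}
	return header, readContents(s, first)
}

func main() {
	// The two input files from the issue, with and without a header line.
	for _, input := range []string{"#xyz\nmy-log\n", "my-log\n"} {
		header, records := readToEnd(input)
		fmt.Println(header, records)
	}
	// Prints:
	// [#xyz] [{my-log 1}]
	// [] [{my-log 1}]
}
```

Because the record counter lives only in the contents path of this sketch, the header path cannot inflate it.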

#### Link to tracking issue

Fixes
#35869

#### Testing

In the first commit I have added tests that document the erroneous
behavior. In the second commit I have fixed the bug and corrected the
tests.

#### Documentation

Added changelog entry.
jpbarto pushed a commit to jpbarto/opentelemetry-collector-contrib that referenced this issue Oct 29, 2024