
[filereceiver] support logs, traces, and compressed reads #22080

Conversation

@moh-osman3 (Contributor) commented May 18, 2023

Description:
[PART 2] Breaking #22021 into 2 PRs
This PR completes the following:

  • supports traces and logs in the filereceiver
  • uses compressed reader to read any compressed data

For more context see part 1: #22079

Link to tracking Issue:
Related to #13626
Supports ongoing effort to record and replay telemetry data for experimental/research purposes

Testing:
Ran otelcontribcol locally, writing telemetry to a file in all supported formats (json, json+compression, proto, proto+compression) and recovering the original metrics using the filereceiver.

Documentation:

// readLogLine reads the next line in the file, converting it into logs and
// passing it to the consumer member.
func (fr fileReader) readLogLine(ctx context.Context) error {
	line, err := fr.stringReader.ReadString('\n')
Contributor:
can you use a scanner here?

Contributor Author:

Thanks for the review! I believe a scanner would work here, but I'm curious why it would be preferred over the string reader. I've never used a scanner in Go, but glancing at bufio's Scanner type I saw:

Scanning stops unrecoverably at EOF, the first I/O error, or a token too large to fit in the buffer. When a scan stops, the reader may have advanced arbitrarily far past the last token. Programs that need more control over error handling or large tokens, or must run sequential scans on a reader, should use bufio.Reader instead.

I'm wondering if this places a limitation on the size of file being read?

cr, _ := zstd.NewReader(file)
fr.stringReader = bufio.NewReader(cr)
} else { // no compression
fr.stringReader = bufio.NewReader(file)
Contributor:

Is there a reason to use a string reader?

Contributor Author:

No particular reason other than that this is what the receiver was already using, so I am just extending this functionality to also read chunks of data (instead of just lines). Is there a major drawback or concern with the string reader that you have in mind?

return err
}

dataBuffer := make([]byte, sz)
Contributor:

That's weird, what is this? Are we encoding each entry separately along with their size in the stream? Can we not use EOLs?

Contributor Author:

This is to accurately decode the format the fileexporter uses when writing proto data. The fileexporter currently supports two distinct writing strategies: (1) writing line by line, or (2) writing chunks prepended with the size of the chunk (for proto format).

This change adds support for reading the second type of file. We can't use EOLs here because in this case the writer does not emit any newlines, so line-based decoding would not be accurate.

Contributor:

@atoulme Yes, the old existing code was weird!

@djaglowski (Member):

I've suggested in the past that this receiver should use pkg/stanza/fileconsumer to track and read files. (See comment)

Given that this PR would duplicate additional functionality that is supported in pkg/stanza/fileconsumer shouldn't we first rebase onto that?

@jmacd (Contributor) commented Jun 7, 2023

@djaglowski The package that you referred to (a) sounds vendor specific -- are you willing to rename it? (b) does not have a README explaining why we should want it and what tradeoffs we might have to use it. Can you file an issue explaining why you'd like this package to be used in more detail? It looks like it is a useful package for production log reading, which is not what this receiver is aimed at doing.

As it stands, the current filereceiver is just a simple receiver designed to read a single file, which @moh-osman3 is not changing in this PR. I suggest we leave improvements in this package until there is a documented reason to do so. We are trying to resolve #13626, which is addressed by @moh-osman3's change. The OTel-Arrow project needs a way to record and replay files to be able to reproduce bugs, and our use-case is not helped by the feature you are requesting. Thank you!

@djaglowski (Member):

> @djaglowski The package that you referred to (a) sounds vendor specific -- are you willing to rename it? (b) does not have a README explaining why we should want it and what tradeoffs we might have to use it. Can you file an issue explaining why you'd like this package to be used in more detail? It looks like it is a useful package for production log reading, which is not what this receiver is aimed at doing.
>
> As it stands, the current filereceiver is just a simple receiver designed to read a single file, which @moh-osman3 is not changing in this PR. I suggest we leave improvements in this package until there is a documented reason to do so. We are trying to resolve #13626, which is addressed by @moh-osman3's change. The OTel-Arrow project needs a way to record and replay files to be able to reproduce bugs, and our use-case is not helped by the feature you are requesting. Thank you!

The package exists in this repo and is used in generic components. What would you rename it to?

I think I must have misunderstood the proposal when I sponsored it. I understood that this would be a complement to the fileexporter, that it would be an accessible (if not ideal) way to transmit OTLP via file.

With that in mind, the main reason to use the fileconsumer package is that the common requirements for components that tail files are likely to be very similar. We already have robust file tracking and tailing capabilities in the collector, which were specifically isolated so that we would not need to reimplement them in each file-tailing component. It's not difficult to use the functionality; here is an example of another component doing so in a few lines.

In any case, the reason I have brought this up in each consecutive PR is because I don't think we should continue down the road of redesigning configuration and reimplementing functionality that is likely to converge in the future, nor do I want to be responsible for maintaining a separate implementation.

Although I would still disagree with the approach, I'm happy to hand the decision off to any other @open-telemetry/collector-contrib-approvers who would like to take over sponsorship of the component.

@moh-osman3 moh-osman3 force-pushed the mohosman/file-receiver-read-compressed-data branch from c584ece to 8cf6342 Compare June 11, 2023 18:59
@moh-osman3 (Contributor Author):

Refactored this change to use the fileconsumer.PositionalScanner, and would appreciate some tips on using this package properly.

@djaglowski @atoulme would you mind taking another look at this change?

// Format will specify the format of the file to be read.
// Currently supports json and proto options.
FormatType string `mapstructure:"format"`
// type of compression algorithm used. Currently supports zstd.
Contributor:

type -> Type

@@ -15,12 +15,19 @@ import (
"go.uber.org/zap"
)

type consumerType struct {
Contributor:

The name consumerType seems a little confusing.

Do you think it would be better to directly use metricsConsumer, tracesConsumer, and logsConsumer in the fileReceiver struct?

timer: timer,
}

if compression == compressionTypeZSTD {
Contributor:

This part seems redundant; maybe we can change it as follows?

fr.baseReader = file
if compression == compressionTypeZSTD {
	cr, _ := zstd.NewReader(file)
	fr.baseReader = cr
}

if format == formatTypeProto {
	fr.scanner = fileconsumer.NewPositionalScanner(fr.baseReader, 10*1024, int64(0), ScanChunks)
} else {
	fr.scanner = fileconsumer.NewPositionalScanner(fr.baseReader, 10*1024, int64(0), bufio.ScanLines)
}

Contributor Author:

Thanks for the review! I may be changing how the fileconsumer package is used to address @djaglowski 's request, but will incorporate your feedback in my next change

@djaglowski (Member):

@moh-osman3, I don't recommend using the PositionalScanner directly.

The Config and Manager should be used. Example here.

@moh-osman3 (Contributor Author):

> @moh-osman3, I don't recommend using the PositionalScanner directly.
>
> The Config and Manager should be used. Example here.

@djaglowski Hmm, I'm wondering how I could customize the split function if I only use the Config and Manager. I see an option for SplitterConfig but am struggling to get it to work. The only thing I can control is the MultilineConfig, but I don't think this will work for reading one of the fileexporter's formats. I want a split function like https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/22080/files#diff-95d05abde2282d49c10dddc2bd412b09b2a59fedd7f081249123e8fbc4f7759fR41, which doesn't have a clear string delimiter (i.e. the delimiter is a uint32 indicating the number of bytes to read).

@djaglowski (Member):

@moh-osman3, can you use Config.BuildWithSplitFunc?


github-actions bot commented Jul 5, 2023

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Jul 5, 2023
@github-actions (bot):

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Jul 20, 2023