-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[filereceiver] support logs, traces, and compressed reads #22080
[filereceiver] support logs, traces, and compressed reads #22080
Conversation
receiver/filereceiver/file_reader.go
Outdated
// readLogLine reads the next line in the file, converting it into logs and | ||
// passing it to the the consumer member. | ||
func (fr fileReader) readLogLine(ctx context.Context) error { | ||
line, err := fr.stringReader.ReadString('\n') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use a scanner here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review! I believe a scanner would work here, but curious why that would be preferred over the string reader? Have never used a scanner in golang but glancing at bufio's scanner type and saw
Scanning stops unrecoverably at EOF, the first I/O error, or a token too large to fit in the buffer. When a scan stops, the reader may have advanced arbitrarily far past the last token. Programs that need more control over error handling or large tokens, or must run sequential scans on a reader, should use bufio.Reader instead
I'm wondering if this places a limitation on the size of file being read?
receiver/filereceiver/file_reader.go
Outdated
cr, _ := zstd.NewReader(file) | ||
fr.stringReader = bufio.NewReader(cr) | ||
} else { // no compression | ||
fr.stringReader = bufio.NewReader(file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to use a string reader?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No particular reason other than that was already what was being used by the receiver, so I am just extending this functionality to also be able to read chunks of data (instead of just lines). Is there a major drawback/concern with using the string reader you have in mind?
receiver/filereceiver/file_reader.go
Outdated
return err | ||
} | ||
|
||
dataBuffer := make([]byte, sz) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's weird, what is this? Are we encoding each entry separately along with their size in the stream? Can we not use EOLs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to accurately decode the method of writing the fileexporter uses for proto format. The filexporter currently already supports 2 distinct writing strategies. (1) Writing line by line or (2) writing chunks that are prepended with the size of the chunk (for proto format)
This added functionality now supports the second type of file writing in the exporter. We can't use EOL here because in this case the writer does not include any newlines so decoding would not be accurate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@atoulme Yes, the old existing code was weird!
I've suggested in the past that this receiver should use Given that this PR would duplicate additional functionality that is supported in |
@djaglowski The package that you referred to (a) sounds vendor specific -- are you willing to rename it? (b) does not have a README explaining why we should want it and what tradeoffs we might have to use it. Can you file an issue explaining why you'd like this package to be used in more detail? It looks like it is a useful package for production log reading, which is not what this receiver is aimed at doing. As it stands, the current filereceiver is just a simple receiver designed to read a single file, which @moh-osman3 is not changing in this PR. I suggest we leave improvements in this package until there is a documented reason to do so. We are trying to resolve #13626, which is addressed by @moh-osman3's change. The OTel-Arrow project needs a way to record and replay files to be able to reproduce bugs, and our use-case is not helped by the feature you are requesting. Thank you! |
The package exists in this repo and is used in generic components. What would you rename it to? I think I must have misunderstood the proposal when I sponsored it. I understood that this would be a complement to the With that in mind, the main reason to use the In any case, the reason I have brought this up in each consecutive PR is because I don't think we should continue down the road of redesigning configuration and reimplementing functionality that is likely to converge in the future, nor do I want to be responsible for maintaining a separate implementation. Although I would still disagree with the approach, I'm happy to hand the decision off to any other @open-telemetry/collector-contrib-approvers who would like to take over sponsorship of the component. |
c584ece
to
8cf6342
Compare
Refactored this change to use the @djaglowski @atoulme would you mind taking another look at this change? |
// Format will specify the format of the file to be read. | ||
// Currently support json and proto options. | ||
FormatType string `mapstructure:"format"` | ||
// type of compression algorithm used. Currently supports zstd. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type -> Type
@@ -15,12 +15,19 @@ import ( | |||
"go.uber.org/zap" | |||
) | |||
|
|||
type consumerType struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name consumerType
seemed a little confused.
Do you think is it better that we just directly use metricsConsumer
, tracesConsumer
, logsConsumer
in the fileReceiver
struct ?
timer: timer, | ||
} | ||
|
||
if compression == compressionTypeZSTD { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part seems redundant, maybe we can change as following?
fr.baseReader = file
if compression == compressionTypeZSTD {
cr, _ := zstd.NewReader(file)
fr.baseReader = cr
}
if format == formatTypeProto {
fr.scanner = fileconsumer.NewPositionalScanner(cr, 10 * 1024, int64(0), ScanChunks)
} else {
fr.scanner = fileconsumer.NewPositionalScanner(cr, 10 * 1024, int64(0), bufio.ScanLines)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review! I may be changing how the fileconsumer package is used to address @djaglowski 's request, but will incorporate your feedback in my next change
@moh-osman3, I don't recommend using the PositionalScanner directly. The |
@djaglowski Hmm I'm wondering how I could customize the split function used if I only use the Config and Manager? I see an option for SplitterConfig but struggling to get this to work. The only thing I can control is the MultilineConfig but don't think this will work for reading one of the formats of the fileexporter. I want a split function like https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/22080/files#diff-95d05abde2282d49c10dddc2bd412b09b2a59fedd7f081249123e8fbc4f7759fR41, which doesn't have a clear string delimiter (i.e. the delimiter is a uint32 indicating the number of bytes to read) |
@moh-osman3, can you use Config.BuildWithSplitFunc? |
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
Description:
[PART 2] Breaking #22021 into 2 PRs
This PR completes the following:
For more context see part 1: #22079
Link to tracking Issue:
Related to #13626
Supports ongoing effort to record and replay telemetry data for experimental/research purposes
Testing:
Locally running otelcontribcol and writing to a file using all supported formats (json, json+compression, proto, proto+compression) and recovering original metrics using filereceiver
Documentation: