[pkg/stanza/fileconsumer] Add ability to read files asynchronously #23056

VihasMakwana
Contributor

Description: Added a new feature gate that enables a thread pool mechanism to respect the poll_interval parameter

Current Scenario:

  • If a file takes longer than poll_interval to consume, the current implementation blocks until the file is fully consumed. In other words, it doesn't respect poll_interval.

Improvement using thread pooling:

  • In a thread pool model, the backend queues files as it finds them and doesn't wait for them to be consumed; all reading is asynchronous.
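
A rough sketch of the thread pool model described above. This is an illustration only, not the PR's implementation: `startWorkers`, `pollOnce`, and the `readFile` callback are hypothetical names, and the real per-file read logic is replaced by a callback. Note that with a bounded queue the poll step can still block once the queue is full.

```go
package main

import (
	"fmt"
	"sync"
)

// startWorkers launches n goroutines that consume paths from queue.
// readFile is a hypothetical stand-in for the real per-file read logic.
func startWorkers(n int, queue <-chan string, readFile func(path string)) *sync.WaitGroup {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for path := range queue {
				readFile(path)
			}
		}()
	}
	return &wg
}

// pollOnce enqueues every matched path and returns without waiting for
// the reads to finish. It blocks only if the queue is full.
func pollOnce(queue chan<- string, matches []string) {
	for _, path := range matches {
		queue <- path
	}
}

func main() {
	queue := make(chan string, 16)
	done := make(chan string, 16) // collects paths that were read

	wg := startWorkers(4, queue, func(path string) { done <- path })
	pollOnce(queue, []string{"a.log", "b.log", "c.log"})

	// Shut down: no more work, then wait for in-flight reads.
	close(queue)
	wg.Wait()
	fmt.Println("files read:", len(done))
}
```

The poll loop only pays the cost of a channel send per file, so a slow file delays one worker rather than the next poll.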

Link to tracking Issue: #18908

Testing: No new tests added; existing tests are modified to account for the feature gate

I will provide benchmarks in the comments.

@VihasMakwana VihasMakwana requested a review from a team June 5, 2023 05:44
@VihasMakwana VihasMakwana marked this pull request as draft June 5, 2023 05:44
@djaglowski djaglowski changed the title feature: Add a new feature gate [pkg/stanza/fileconsumer] Add ability to read files asynchronously Jun 6, 2023
@VihasMakwana VihasMakwana marked this pull request as ready for review June 9, 2023 08:50
@VihasMakwana
Contributor Author

I will add a changelog entry. @djaglowski, please review it!

Member

@djaglowski djaglowski left a comment

@VihasMakwana, thanks for continuing on this. I still want to be very careful here but this is looking like a much simpler PR than we had before.

Comment on lines 422 to 424
if useThreadPool.IsEnabled() {
	operator, emitCalls = buildTestManagerWithOptions(t, cfg, withReaderChan())
}
Member

We could probably make better use of the options pattern here, since we are doing this everywhere.

Maybe the options can be set in TestMain and then we can always just call buildTestManager(t, cfg, options...)
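
One way the suggestion above could look, as a self-contained sketch. Only `withReaderChan` and `buildTestManager` are names taken from this thread; their bodies, the `testManager` type, and `defaultOptions` are assumptions, and `t` and `cfg` are omitted to keep the example runnable on its own.

```go
package main

import "fmt"

// testManager is a hypothetical stand-in for the Manager built in tests.
type testManager struct {
	readerChan chan string // nil unless the reader channel option is applied
}

// testOption mutates a testManager after construction.
type testOption func(*testManager)

// withReaderChan mirrors the option name used in the PR; the body is assumed.
func withReaderChan() testOption {
	return func(m *testManager) { m.readerChan = make(chan string, 1) }
}

// defaultOptions would be populated once, for example in TestMain,
// depending on whether the feature gate is enabled.
var defaultOptions []testOption

// buildTestManager applies the given options in order.
func buildTestManager(opts ...testOption) *testManager {
	m := &testManager{}
	for _, opt := range opts {
		opt(m)
	}
	return m
}

func main() {
	gateEnabled := true // stand-in for useThreadPool.IsEnabled()
	if gateEnabled {
		defaultOptions = append(defaultOptions, withReaderChan())
	}
	m := buildTestManager(defaultOptions...)
	fmt.Println("reader channel configured:", m.readerChan != nil)
}
```

With this shape, individual tests no longer need their own `if useThreadPool.IsEnabled()` branch.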

pkg/stanza/fileconsumer/trie.go Outdated Show resolved Hide resolved

package fileconsumer

type Trie struct {
Member

I think we need thorough tests for the trie itself. What do you think about adding the trie and dedicated tests to fileconsumer/internal in a separate PR?

Contributor Author

Yeah, we can do that. It will also reduce the complexity of this PR.
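
For readers unfamiliar with the data structure under discussion, here is a minimal byte-keyed trie sketch. The thread only shows `type Trie struct {`, so everything else here (`NewTrie`, `Put`, `Has`, the `node` layout, exact-match semantics) is an assumption for illustration and may differ from what the PR implements.

```go
package main

import "fmt"

// node is one level of the trie; children are keyed by the next byte.
type node struct {
	children map[byte]*node
	terminal bool // true if a stored key ends at this node
}

// Trie stores byte-slice keys. Method names are illustrative assumptions.
type Trie struct {
	root *node
}

// NewTrie returns an empty trie.
func NewTrie() *Trie {
	return &Trie{root: &node{children: map[byte]*node{}}}
}

// Put inserts key, creating intermediate nodes as needed.
func (t *Trie) Put(key []byte) {
	cur := t.root
	for _, b := range key {
		next, ok := cur.children[b]
		if !ok {
			next = &node{children: map[byte]*node{}}
			cur.children[b] = next
		}
		cur = next
	}
	cur.terminal = true
}

// Has reports whether key was stored exactly (prefixes do not count).
func (t *Trie) Has(key []byte) bool {
	cur := t.root
	for _, b := range key {
		next, ok := cur.children[b]
		if !ok {
			return false
		}
		cur = next
	}
	return cur.terminal
}

func main() {
	tr := NewTrie()
	tr.Put([]byte("ABCD"))
	fmt.Println(tr.Has([]byte("ABCD")), tr.Has([]byte("AB"))) // prints "true false"
}
```

Dedicated tests for a structure like this would typically cover exact matches, prefixes of stored keys, and keys that extend past a stored key.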

Comment on lines +62 to +63
f.rwLock.Lock()
defer f.rwLock.Unlock()
Member

We'll have to ensure this doesn't impact performance when the gate is not enabled. If benchmarks can show it's not an issue, that's fine but otherwise can we just check the gate?

Contributor Author

Sure
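
A sketch of what "just check the gate" could look like: the lock is taken only on the concurrent path. The `store` type and its `concurrent` field are hypothetical; `concurrent` stands in for `useThreadPool.IsEnabled()`. Whether skipping the lock is measurably faster is exactly what the requested benchmarks would need to show.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical stand-in for the struct guarded by rwLock in the PR.
type store struct {
	rwLock     sync.RWMutex
	concurrent bool // stand-in for useThreadPool.IsEnabled()
	items      []string
}

// add appends item, locking only when multiple goroutines may call it.
func (s *store) add(item string) {
	if s.concurrent {
		s.rwLock.Lock()
		defer s.rwLock.Unlock()
	}
	s.items = append(s.items, item)
}

func main() {
	s := &store{concurrent: true}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.add("entry")
		}()
	}
	wg.Wait()
	fmt.Println("items stored:", len(s.items))
}
```

This is only safe if the non-concurrent path is truly single-goroutine, which is an assumption about the existing code.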

Comment on lines +211 to +216
if rt.enableThreadPool {
	t.Cleanup(func() {
		require.NoError(t, featuregate.GlobalRegistry().Set("filelog.useThreadPool", false))
	})
	require.NoError(t, featuregate.GlobalRegistry().Set("filelog.useThreadPool", true))
}
Member

Does this test require its own management of the gate? Isn't it covered in TestMain?

Contributor Author

Yes, because they are separate packages. TestMain only covers the fileconsumer package.

@@ -77,6 +97,13 @@ func (m *Manager) Start(persister operator.Persister) error {
func (m *Manager) Stop() error {
	m.cancel()
	m.wg.Wait()
	if useThreadPool.IsEnabled() {
		close(m.readerChan)
Member

In Start, it's possible that we return before creating the channel, so we need to check if the channel is nil. This can crash the collector from an otherwise recoverable situation.
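
To illustrate the concern: in Go, calling `close` on a nil channel panics. A guarded version might look like the sketch below. The `manager` type, its fields, and the `failEarly` flag are simplified, hypothetical stand-ins for the PR's `Manager`, `Start`, and `Stop`.

```go
package main

import "fmt"

// manager is a hypothetical, trimmed-down stand-in for the PR's Manager.
type manager struct {
	threadPool bool        // stand-in for useThreadPool.IsEnabled()
	readerChan chan string // stays nil if start returns early
}

// start simulates Start returning before the channel is created.
func (m *manager) start(failEarly bool) error {
	if failEarly {
		return fmt.Errorf("simulated early failure")
	}
	m.readerChan = make(chan string)
	return nil
}

// stop closes the channel only if it was actually created.
func (m *manager) stop() {
	if m.threadPool && m.readerChan != nil {
		close(m.readerChan)
		m.readerChan = nil // makes a repeated stop call a no-op
	}
}

func main() {
	m := &manager{threadPool: true}
	if err := m.start(true); err != nil {
		fmt.Println("start failed:", err)
	}
	m.stop() // no panic even though the channel was never created
	fmt.Println("stopped cleanly")
}
```

Setting the field back to nil after closing also guards against a double close, which would panic as well.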

pkg/stanza/fileconsumer/file_threadpool.go Outdated Show resolved Hide resolved
// Get the list of paths on disk
matches := m.finder.FindFiles()
m.consumeConcurrent(ctx, matches)
m.clearCurrentFingerprints()
Member

I think I asked you this before but I can't recall. Why can we do this in an asynchronous situation?

Comment on lines 46 to 56
r.ReadToEnd(ctx)
// Delete a file if deleteAfterRead is enabled and we reached the end of the file
if m.deleteAfterRead && r.eof {
	r.Close()
	if err := os.Remove(r.file.Name()); err != nil {
		m.Errorf("could not delete %s", r.file.Name())
	}
} else {
	// Save off any files that were not fully read or if deleteAfterRead is disabled
	m.saveCurrentConcurrent(r)
}
Member

Is it possible to deduplicate this code?

Comment on lines 62 to 87
if _, ok := m.seenPaths[filePath]; !ok {
	if m.readerFactory.fromBeginning {
		m.Infow("Started watching file", "path", filePath)
	} else {
		m.Infow("Started watching file from end. To read preexisting logs, configure the argument 'start_at' to 'beginning'", "path", filePath)
	}
	m.seenPaths[filePath] = struct{}{}
}
file, err := os.Open(filePath) // #nosec - operator must read in files defined by user
if err != nil {
	m.Debugf("Failed to open file", zap.Error(err))
	return nil, nil
}
fp, err := m.readerFactory.newFingerprint(file)
if err != nil {
	m.Errorw("Failed creating fingerprint", zap.Error(err))
	return nil, nil
}
// Exclude any empty fingerprints or duplicate fingerprints to avoid doubling up on copy-truncate files

if len(fp.FirstBytes) == 0 {
	if err = file.Close(); err != nil {
		m.Errorf("problem closing file", "file", file.Name())
	}
	return nil, nil
}
Member

I think all of this is duplicated as well. Can we extract it somehow?

@djaglowski
Member

@VihasMakwana, #23415 is merged, please rebase.

@VihasMakwana VihasMakwana force-pushed the filelogreceiver_featuregate branch 2 times, most recently from 9696336 to a989864 Compare June 25, 2023 16:18
@github-actions
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Jul 10, 2023
@github-actions
Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Jul 25, 2023
@VihasMakwana
Contributor Author

@djaglowski let's keep this one closed. I will open a fresh PR after we merge the trie PR.

djaglowski added a commit that referenced this pull request Aug 8, 2023
Description: Add Trie data structure and keep it separate from PR #23056

Testing: Relevant test cases added

---------

Co-authored-by: Dan Jaglowski <[email protected]>
@h0cheung h0cheung mentioned this pull request Aug 20, 2023
2 tasks