Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prevent race condition decrementing watch count to negative #6

Closed

Conversation

sgtsquiggs
Copy link

@sgtsquiggs sgtsquiggs commented Jul 3, 2019

Prevents negative watch counts when removeWatch is called when there is no such watch. A negative watch count prevents future watches from being created for that path.

Should resolve influxdata/telegraf#3573

Change-Id: I5d4ac569d516b1ee7b606ef2ad144399b07c4fc6
@danielnelson
Copy link

Feels a little bit like we are treating the symptom and not the cause. I think the Cleanup() function as a whole is problematic, and I don't see how it can be used safely. Reading over the history of this function, I don't think it works as intended anymore. The original idea seems to be to close all watches without having a handle to them, but today it doesn't operate like that and requires a handle to the Tail. It doesn't guarantee that all watches are closed, just decremented, and since you need the Tail why not just call Stop()?

I'm feeling more aggressive here, I think we should panic if the refcount goes negative. This would indicate a bug in the tail library. Since Cleanup() is now, as far I can tell, worthless I think we should remove it.

@sgtsquiggs
Copy link
Author

sgtsquiggs commented Jul 5, 2019

Feels a little bit like we are treating the symptom and not the cause.

Agreed! This was a stopgap to "solve" the problem quickly.

After reading through this library I think I will continue work on the tail replacement library extracted from google/mtail, while making it more similar to hpcloud/tail for compatibility and ease of use.

@sgtsquiggs
Copy link
Author

Removing the tail.Cleanup call from the tail plugin in telegraf also resolves this. I think this PR can be closed.

@danielnelson
Copy link

Yeah, let's close this issue, and I'll remove the Cleanup calls and open an issue upstream describing what we found here.

@sgtsquiggs
Copy link
Author

Attempted to fix the issue here: influxdata/telegraf#6074

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Logparser plugin don't process new lines after telegraf configuration reload.
2 participants