Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix hang for --discover-log-location flag #592

Merged
merged 1 commit into from
Aug 30, 2024

Conversation

msakrejda
Copy link
Contributor

This was missed in #384.

@msakrejda msakrejda requested a review from a team August 29, 2024 22:44
@msakrejda msakrejda mentioned this pull request Aug 29, 2024
@@ -225,6 +225,8 @@ func run(ctx context.Context, wg *sync.WaitGroup, globalCollectionOpts state.Col

if globalCollectionOpts.DiscoverLogLocation {
selfhosted.DiscoverLogLocation(ctx, servers, globalCollectionOpts, logger)
testRunSuccess = make(chan bool, 1)
testRunSuccess <- true
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this is a good fix (this isn't really a "testRun"), but it works, and I don't have any better ideas. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is happening because the main function selects from this channel, and selecting from a nil channel will block forever. It seems Go's recommended solution is to add a timeout to that select, but we wouldn't want to confuse users with a timeout error when the function is returning normally.

So since select statements apparently can't check if the channel is nil first, this is the only way for the main function to handle it:

	DoneOrSignal:
		for {
			if testRunSuccess != nil {
				select {
				case success := <-testRunSuccess:
					if reloadRun {
						if success {
							Reload(logger)
						} else {
							logger.PrintError("Error: Reload requested, but ignoring since configuration errors are present")
							exitCode = 1
						}
					} else if !success {
						exitCode = 1
					}
					break DoneOrSignal
				case s := <-sigs:
					if s == syscall.SIGINT || s == syscall.SIGTERM {
						logger.PrintError("Interrupt")
						break DoneOrSignal
					}
				}
			} else {
				select {
				case s := <-sigs:
					if s == syscall.SIGINT || s == syscall.SIGTERM {
						logger.PrintError("Interrupt")
						break DoneOrSignal
					}
				}
			}
		}

I don't think either option is easy to understand, so your approach is probably better since it results in fewer lines of code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I think the underlying bug here is that we expect anything that exits quickly, instead of continuing to run, to be a test run and depend on reading something from the testRun channel (or a signal). I think a proper fix involves rethinking that, but the run method is kind of daunting, and I'd rather fix this first and refactor that some other time.

@@ -225,6 +225,8 @@ func run(ctx context.Context, wg *sync.WaitGroup, globalCollectionOpts state.Col

if globalCollectionOpts.DiscoverLogLocation {
selfhosted.DiscoverLogLocation(ctx, servers, globalCollectionOpts, logger)
testRunSuccess = make(chan bool, 1)
testRunSuccess <- true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is happening because the main function selects from this channel, and selecting from a nil channel will block forever. It seems Go's recommended solution is to add a timeout to that select, but we wouldn't want to confuse users with a timeout error when the function is returning normally.

So since select statements apparently can't check if the channel is nil first, this is the only way for the main function to handle it:

	DoneOrSignal:
		for {
			if testRunSuccess != nil {
				select {
				case success := <-testRunSuccess:
					if reloadRun {
						if success {
							Reload(logger)
						} else {
							logger.PrintError("Error: Reload requested, but ignoring since configuration errors are present")
							exitCode = 1
						}
					} else if !success {
						exitCode = 1
					}
					break DoneOrSignal
				case s := <-sigs:
					if s == syscall.SIGINT || s == syscall.SIGTERM {
						logger.PrintError("Interrupt")
						break DoneOrSignal
					}
				}
			} else {
				select {
				case s := <-sigs:
					if s == syscall.SIGINT || s == syscall.SIGTERM {
						logger.PrintError("Interrupt")
						break DoneOrSignal
					}
				}
			}
		}

I don't think either option is easy to understand, so your approach is probably better since it results in fewer lines of code.

@msakrejda msakrejda merged commit 677eac2 into main Aug 30, 2024
9 checks passed
@msakrejda msakrejda deleted the fix-discover-log-location-hang branch August 30, 2024 15:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants