-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
another approach to blocking reader #12
Conversation
Also I noticed that the In 5a4f8cb I added a new netrasp option that is used to set up the dialing timeout |
Yeah, I've thought of that too just hadn't created an issue for it. I found this comment a while back and was going to do something like this: golang/go#20288 (comment) But it's probably a good idea to allow users to set the timeout regardless. |
In pursuit for understand this use case better I added a few print statements to track which goroutine do what: func (c *contextReader) Read(p []byte) (n int, err error) {
ctx, cancel := context.WithCancel(c.ctx)
defer cancel()
rrCh := make(chan *readResult)
go func() {
id := rand.Intn(300)
dead, _ := ctx.Deadline()
fmt.Printf("goroutine %v started, context expires in %v\n", id, dead.Sub(time.Now()))
select {
case rrCh <- c.read(p, id):
fmt.Printf("goroutine %v ended on received result from ssh channel\n", id)
case <-ctx.Done():
fmt.Printf("goroutine %v ended on ctx timeout\n", id)
}
}()
select {
case <-ctx.Done():
return 0, ctx.Err()
case rr := <-rrCh:
return rr.n, rr.err
}
}
func (c *contextReader) read(p []byte, id int) *readResult {
fmt.Printf("read %v is called\n", id)
n, err := c.r.Read(p)
fmt.Printf("read %v returned result\n", id)
return &readResult{n, err}
} And this produced interesting results, I wonder how would you read it @ogenstad @karimra
|
In particular, the interesting stuff happens when we send # ping command is called, many goroutines start and finish normally until the last two
goroutine 2 started, context expires in 2.953256809s
read 2 is called
goroutine 218 started, context expires in 2.953201021s
read 218 is called # <- this goroutine 218 entered blocking state, since now no data is coming from the channel
read 2 returned result # <- goroutine 2 returned result, but it seems the rrCh is not empty, so it can't write to it
goroutine 2 ended on ctx timeout # <- it ends on timeout instead
unable to run command: error reading output from device reader error: context deadline exceeded
Number of go routines 10
################################################
Sending command:
Comment: First enter
Number of go routines 10
Output:
# upon new command we a new pair of goroutines is created
goroutine 147 started, context expires in 2.999854586s
read 147 is called
goroutine 294 started, context expires in 2.999804606s
read 294 is called
read 218 returned result # <- this goroutine was hanging since previous command, but now its able to read from channel as data appears
goroutine 218 ended on ctx timeout # <- but it seems that the channel is destroyed already? since it was part of previous call to readUntilPrompt? |
Perhaps also print the error if it's not nil here?
The As you indicate I'd also suspect that there's an issue with rrCh no longer being active. A small issue from the example program I posted, the output in the end indicated that there were still 9 goroutines running, in reality that's just a timing issue. If I add a At this point I'm also curious as to what's happening under the hood, but mostly from an academic point of view. |
so that basically means that in #11 we have a leaking goroutine that doesn't get discarded even when the connection is closed. That is weird... |
Yes, that seems surprising and I don't understand why it would be the case. It could be some other timing issue but I also waited for a few seconds and I think it should have been resolved by then if it wasn't anything permanent. I just want to run some additional tests and verify that everything looks good if I try to add a larger number of config lines to some test devices ensure that it's behaving as desired in that scenario. Then I think we can merge this. |
I tried even longer waiting timeouts after closing the device. 35sec. And it was still 2 goroutines reported. |
@ogenstad this is an implementation of another the approach discussed in #11
I wonder if you want to test this one side-to-side with #11 with injecting
runtime.NumGoroutine()
to check the differences.