Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

another approach to blocking reader #12

Merged
merged 8 commits into from
Apr 17, 2021
Merged

Conversation

hellt
Copy link
Contributor

@hellt hellt commented Apr 16, 2021

@ogenstad this is an implementation of another the approach discussed in #11

I wonder if you want to test this one side-to-side with #11 with injecting runtime.NumGoroutine() to check the differences.

@hellt
Copy link
Contributor Author

hellt commented Apr 16, 2021

Also I noticed that the ctx that you pass to device.Dial() does not have any effect on ssh dial timeout, since ssh.Dial() doesn't take in contexts

In 5a4f8cb I added a new netrasp option that is used to set up the dialing timeout

@ogenstad
Copy link
Member

Yeah, I've thought of that too just hadn't created an issue for it. I found this comment a while back and was going to do something like this: golang/go#20288 (comment)

But it's probably a good idea to allow users to set the timeout regardless.

@hellt
Copy link
Contributor Author

hellt commented Apr 16, 2021

In pursuit for understand this use case better I added a few print statements to track which goroutine do what:

func (c *contextReader) Read(p []byte) (n int, err error) {
	ctx, cancel := context.WithCancel(c.ctx)
	defer cancel()
	rrCh := make(chan *readResult)

	go func() {
		id := rand.Intn(300)
		dead, _ := ctx.Deadline()
		fmt.Printf("goroutine %v started, context expires in %v\n", id, dead.Sub(time.Now()))
		select {
		case rrCh <- c.read(p, id):
			fmt.Printf("goroutine %v ended on received result from ssh channel\n", id)
		case <-ctx.Done():
			fmt.Printf("goroutine %v ended on ctx timeout\n", id)
		}
	}()

	select {
	case <-ctx.Done():
		return 0, ctx.Err()
	case rr := <-rrCh:
		return rr.n, rr.err
	}
}

func (c *contextReader) read(p []byte, id int) *readResult {
	fmt.Printf("read %v is called\n", id)
	n, err := c.r.Read(p)
	fmt.Printf("read %v returned result\n", id)
	return &readResult{n, err}
}

And this produced interesting results, I wonder how would you read it @ogenstad @karimra

❯ go run main.go
Number of go routines 1
goroutine 281 started, context expires in 4.90112636s
<SNIP as this is not interesting>
read 285 returned result
goroutine 285 ended on received result from ssh channel
################################################
### Connected
################################################
Sending command: show arp
Comment:  hello
Number of go routines 9
Output:
goroutine 13 started, context expires in 2.999888565s
read 13 is called
goroutine 190 started, context expires in 2.999835848s
read 190 is called
read 190 returned result
goroutine 190 ended on received result from ssh channel
read 13 returned result
goroutine 13 ended on received result from ssh channel

<SNIP>

goroutine 103 started, context expires in 2.904135708s
read 103 is called
goroutine 155 started, context expires in 2.904081919s
read 155 is called
read 155 returned result
goroutine 155 ended on received result from ssh channel
read 103 returned result
goroutine 103 ended on received result from ssh channel
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  10.0.0.2              167   5255.0a00.0002  ARPA   GigabitEthernet1
Internet  10.0.0.15               -   5254.0023.4700  ARPA   GigabitEthernet1

Number of go routines 9
################################################
Sending command: ping
Comment:  Starting ping for the first blocking command
Number of go routines 9
Output:
goroutine 251 started, context expires in 2.999941035s
read 251 is called
goroutine 210 started, context expires in 2.999909833s
read 210 is called
read 210 returned result
goroutine 210 ended on received result from ssh channel
read 251 returned result
goroutine 251 ended on received result from ssh channel
goroutine 105 started, context expires in 2.974634878s
read 105 is called
goroutine 156 started, context expires in 2.974601483s
read 156 is called
read 156 returned result
goroutine 156 ended on received result from ssh channel
read 105 returned result
goroutine 105 ended on received result from ssh channel
goroutine 166 started, context expires in 2.97432273s
read 166 is called
goroutine 128 started, context expires in 2.974262897s
read 128 is called
read 128 returned result
goroutine 128 ended on received result from ssh channel
read 166 returned result
goroutine 166 ended on received result from ssh channel
goroutine 161 started, context expires in 2.972261274s
read 161 is called
goroutine 2 started, context expires in 2.97224511s
read 2 is called
read 2 returned result
goroutine 2 ended on received result from ssh channel
read 161 returned result
goroutine 161 ended on received result from ssh channel
goroutine 83 started, context expires in 2.969358146s
read 83 is called
goroutine 146 started, context expires in 2.969334017s
read 146 is called
read 146 returned result
goroutine 146 ended on received result from ssh channel
read 83 returned result
goroutine 83 ended on received result from ssh channel
goroutine 163 started, context expires in 2.968877329s
read 163 is called
goroutine 276 started, context expires in 2.968859734s
read 276 is called
read 276 returned result
goroutine 276 ended on received result from ssh channel
read 163 returned result
goroutine 163 ended on received result from ssh channel
goroutine 2 started, context expires in 2.953256809s
read 2 is called
goroutine 218 started, context expires in 2.953201021s
read 218 is called
read 2 returned result
goroutine 2 ended on ctx timeout
unable to run command: error reading output from device reader error: context deadline exceeded

Number of go routines 10
################################################
Sending command: 
Comment:  First enter
Number of go routines 10
Output:
goroutine 147 started, context expires in 2.999854586s
read 147 is called
goroutine 294 started, context expires in 2.999804606s
read 294 is called
read 218 returned result
goroutine 218 ended on ctx timeout
read 147 returned result
goroutine 147 ended on ctx timeout
unable to run command: error reading output from device reader error: context deadline exceeded

Number of go routines 10
################################################
Sending command: 10.0.0.15
Comment:  Ping target
Number of go routines 10
Output:
goroutine 277 started, context expires in 2.999825812s
read 277 is called
goroutine 163 started, context expires in 2.999792074s
read 163 is called
read 294 returned result
goroutine 294 ended on ctx timeout
read 163 returned result
goroutine 163 ended on received result from ssh channel
read 277 returned result
goroutine 277 ended on received result from ssh channel
goroutine 196 started, context expires in 2.988924168s
read 196 is called
goroutine 20 started, context expires in 2.988888814s
read 20 is called
read 20 returned result
goroutine 20 ended on received result from ssh channel
read 196 returned result
goroutine 196 ended on received result from ssh channel
goroutine 123 started, context expires in 2.987283508s
read 123 is called
goroutine 153 started, context expires in 2.987215824s
read 153 is called
read 153 returned result
goroutine 153 ended on received result from ssh channel
read 123 returned result
goroutine 123 ended on received result from ssh channel
goroutine 37 started, context expires in 2.985541389s
read 37 is called
goroutine 133 started, context expires in 2.985520568s
read 133 is called
read 133 returned result
goroutine 133 ended on received result from ssh channel
read 37 returned result
goroutine 37 ended on received result from ssh channel
goroutine 41 started, context expires in 2.984512854s
read 41 is called
goroutine 259 started, context expires in 2.984476066s
read 259 is called
read 259 returned result
goroutine 259 ended on received result from ssh channel
read 41 returned result
goroutine 41 ended on received result from ssh channel
goroutine 233 started, context expires in 2.982470437s
read 233 is called
goroutine 143 started, context expires in 2.982448038s
read 143 is called
read 143 returned result
goroutine 143 ended on received result from ssh channel
read 233 returned result
goroutine 233 ended on received result from ssh channel
goroutine 191 started, context expires in 2.981253914s
read 191 is called
goroutine 202 started, context expires in 2.981203906s
read 202 is called
read 202 returned result
goroutine 202 ended on received result from ssh channel
read 191 returned result
goroutine 191 ended on received result from ssh channel
goroutine 278 started, context expires in 2.979498386s
read 278 is called
goroutine 136 started, context expires in 2.979465069s
read 136 is called
read 136 returned result
goroutine 136 ended on received result from ssh channel
read 278 returned result
goroutine 278 ended on received result from ssh channel
goroutine 246 started, context expires in 2.978192411s
read 246 is called
goroutine 7 started, context expires in 2.978159326s
read 7 is called
read 7 returned result
goroutine 7 ended on received result from ssh channel
read 246 returned result
goroutine 246 ended on received result from ssh channel
goroutine 240 started, context expires in 2.977094409s
read 240 is called
goroutine 103 started, context expires in 2.97706947s
read 103 is called
read 240 returned result
goroutine 240 ended on ctx timeout
unable to run command: error reading output from device reader error: context deadline exceeded

@hellt
Copy link
Contributor Author

hellt commented Apr 16, 2021

In particular, the interesting stuff happens when we send ping command that doesn't have a prompt returned. The following happens:

# ping command is called, many goroutines start and finish normally until the last two

goroutine 2 started, context expires in 2.953256809s
read 2 is called
goroutine 218 started, context expires in 2.953201021s
read 218 is called # <- this goroutine 218 entered blocking state, since now no data is coming from the channel
read 2 returned result # <- goroutine 2 returned result, but it seems the rrCh is not empty, so it can't write to it
goroutine 2 ended on ctx timeout # <- it ends on timeout instead
unable to run command: error reading output from device reader error: context deadline exceeded

Number of go routines 10
################################################
Sending command: 
Comment:  First enter
Number of go routines 10
Output:
# upon new command we a new pair of goroutines is created
goroutine 147 started, context expires in 2.999854586s
read 147 is called
goroutine 294 started, context expires in 2.999804606s
read 294 is called
read 218 returned result # <- this goroutine was hanging since previous command, but now its able to read from channel as data appears
goroutine 218 ended on ctx timeout # <- but it seems that the channel is destroyed already? since it was part of previous call to readUntilPrompt?

@ogenstad
Copy link
Member

Perhaps also print the error if it's not nil here?

func (c *contextReader) read(p []byte, id int) *readResult {
	fmt.Printf("read %v is called\n", id)
	n, err := c.r.Read(p)
	fmt.Printf("read %v returned result\n", id)
	return &readResult{n, err}
}

The read xyz returned result doesn't have to mean that anything worked.

As you indicate I'd also suspect that there's an issue with rrCh no longer being active.

A small issue from the example program I posted, the output in the end indicated that there were still 9 goroutines running, in reality that's just a timing issue. If I add a time.Sleep(500 * time.Millisecond) on the line after disconnecting from the device it goes back to 1 goroutine when printing. I also tried this when disconnecting before going through the entire ping command where we have 10 goroutines and after closing the connection we're back at 1 goroutine (after waiting half a second at least). However with #11 we have 2 goroutines at that point. If we go with this PR it should be max 1 leaked goroutine per active connection (which won't actually happen that often) I think that's perfectly fine.

At this point I'm also curious as to what's happening under the hood, but mostly from an academic point of view.

@hellt
Copy link
Contributor Author

hellt commented Apr 16, 2021

so that basically means that in #11 we have a leaking goroutine that doesn't get discarded even when the connection is closed. That is weird...

@ogenstad
Copy link
Member

Yes, that seems surprising and I don't understand why it would be the case. It could be some other timing issue but I also waited for a few seconds and I think it should have been resolved by then if it wasn't anything permanent.

I just want to run some additional tests and verify that everything looks good if I try to add a larger number of config lines to some test devices ensure that it's behaving as desired in that scenario. Then I think we can merge this.

@hellt
Copy link
Contributor Author

hellt commented Apr 17, 2021

I tried even longer waiting timeouts after closing the device. 35sec. And it was still 2 goroutines reported.
I suppose you want to merge this PR after doing some testing with bigger configs?

@ogenstad
Copy link
Member

Nice work @hellt, and thank you for your assistance @karimra! Merging this in favor of #11.

@ogenstad ogenstad merged commit a3f157b into networklore:main Apr 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants