initial implementation of runUntilPrompt goroutine #11
Conversation
@ogenstad I wonder what your tests will indicate about the proposed changes.
Thanks! I've done a few tests and it seems to work fine. As I understand it, the started goroutine should end when the readUntilPrompt method returns, regardless of whether it's stuck or not. Is that correct according to you? Unfortunately I noticed that I'd forgotten to add the linting job to the CI, would you mind running it locally? At the same time, you can perhaps delete the section in the README that mentions the previous ssh fork?
Hm, it may be that the goroutine may actually hang... @karimra, does it seem to you as well that in this part of the code the goroutine can block on the read? The main readUntilPrompt will not suffer from it, and the execution will end because ctx.Done will kick in, but the goroutine will not be cancelled. I am not sure that this is too bad, because it seems the goroutine will end once the transport SSH session is closed, so likely memory leaking will not happen?

ADD1: I haven't read it yet, but maybe there is a workaround to that: https://benjamincongdon.me/blog/2020/04/23/Cancelable-Reads-in-Go/
That article doesn't solve the problem, it seems. I wonder if changing the reader interface would help. Essentially, what we pass as a reader comes from the SSH session. So I wonder if, by changing the interface to ReadCloser, we can just call Close to unblock a stuck Read.
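To make that idea concrete, here is a minimal sketch (my illustration, not code from this PR; the type name closableContextReader is made up, and whether Close actually interrupts the read depends on the underlying ReadCloser):

```go
package sketch

import (
	"context"
	"io"
)

// closableContextReader is a hypothetical variant of contextReader that
// requires an io.ReadCloser so a blocked Read can be interrupted.
type closableContextReader struct {
	ctx context.Context
	r   io.ReadCloser
}

func (c *closableContextReader) Read(p []byte) (int, error) {
	done := make(chan struct{})
	defer close(done)
	go func() {
		select {
		case <-c.ctx.Done():
			// Closing the underlying reader forces the blocked
			// Read below to return with an error.
			c.r.Close()
		case <-done:
			// Read finished normally; nothing to do.
		}
	}()
	return c.r.Read(p)
}
```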
@hellt, @ogenstad is right that the goroutine will hang, since contextReader.Read() will already be blocked on the underlying read. I tried to come up with an alternative solution by modifying the contextReader.Read implementation to avoid the hanging goroutine:

```go
type readResult struct {
	n   int
	err error
}

func (c *contextReader) read(p []byte) *readResult {
	n, err := c.r.Read(p)
	return &readResult{n, err}
}

func (c *contextReader) Read(p []byte) (int, error) {
	ctx, cancel := context.WithCancel(c.ctx) // this can also be a timeout context to create a local read timeout
	defer cancel()
	rrCh := make(chan *readResult)
	go func() {
		select {
		case rrCh <- c.read(p):
		case <-ctx.Done():
		}
	}()
	select {
	case <-ctx.Done():
		return 0, ctx.Err()
	case rr := <-rrCh:
		return rr.n, rr.err
	}
}
```

This creates a cancellable context local to the Read method; the started goroutine selects on sending a readResult or on the context being done, so it can exit either way.
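One subtlety of this pattern: in `case rrCh <- c.read(p):` the call c.read(p) is evaluated before the select can choose a case, so a stuck underlying Read still blocks the helper goroutine. The ctx.Done() case only guarantees the goroutine exits, instead of leaking on the send, once that Read eventually returns, e.g. when the SSH session is closed.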
Thanks @karimra! I would assume that the block would occur any time the final io.Reader that's used is blocking. As you indicate @hellt, I'd also assume that any leaking goroutines would end once the ssh session is terminated. However, I did some tests where I ran some commands in a way that would block the reader, since the expected prompt wouldn't be found.
At this stage the assumption would be that we have a number of started goroutines that are in a blocking state. However, if they remain in that blocking state, I'd also assume that they would come to life again once there's more data to be read from the buffer. Once the buffer is read by some function, that data isn't available any more, so I think it's reasonable that multiple goroutines trying to read from the same buffer would get in each other's way; this doesn't seem to be happening. I modified the anonymous goroutine in the PR and changed it so that it only read a few bytes at a time and then slept for a while, i.e.:

```go
buffer := make([]byte, 5)
time.Sleep(1 * time.Millisecond)
```

At this stage, when running additional commands, I'd expect the goroutines to get in each other's way as they'd each read a few bytes, causing the output to be garbled. This doesn't happen, though. The output from the main program still looks good after sending that ping command followed by a few others.
I couldn't say whether this means that the goroutines are actually ended, or if they get stuck in some other zombie state. Still, I'm inclined to go with this solution regardless.

I'd originally, and incorrectly, assumed that the replace statement in this project's go.mod file would get used automatically by others using this project. Since that's not the case, it would be weird to use the package, coupled with the fact that I also don't like using a modified fork of the crypto/ssh package, even if it's just to "fix" one line in the code. It would be nice if it was possible to just have a setting in crypto/ssh to change this behaviour; perhaps it will be an option in the future.

Under normal circumstances the Read call should return the expected prompt, or fail because the connection has ended. With this in mind, even if there was a leak here, I don't think it's something that would occur often in normal usage. The above example with ping could also have been avoided. What are your thoughts?
The proper way to know what is happening internally is to use a pprof server. The not-so-proper way is to abuse Prometheus by instrumenting your program: add this goroutine to your main (it needs the net/http and github.com/prometheus/client_golang/prometheus/promhttp imports):

```go
go func() {
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8888", nil)
}()
```

Run your example and continuously get the number of running goroutines from the exposed metrics.
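For the pprof route, a minimal sketch (my illustration, not from this thread) using the standard net/http/pprof package:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/ handlers on the default mux
)

func main() {
	// Live goroutine stacks are then available at
	// http://localhost:6060/debug/pprof/goroutine?debug=2
	log.Println(http.ListenAndServe("localhost:6060", nil))
}
```

For the Prometheus route, the default Go collector exposes a go_goroutines gauge, so something like `curl -s localhost:8888/metrics | grep go_goroutines` in a loop would do (the exact command used in the thread isn't shown).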
@ogenstad I want to create another branch with @karimra's proposal for comparison.
Thanks for the tips! Here's a snippet that connects to Cisco's sandbox machines; as they are open to the internet with full access, they aren't always accessible.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"runtime"
	"time"

	"github.com/networklore/netrasp/pkg/netrasp"
)

func run(d netrasp.Platform, command string, comment string) {
	fmt.Println("################################################")
	fmt.Println("Sending command:", command)
	if comment != "" {
		fmt.Println("Comment: ", comment)
	}
	fmt.Println("Number of go routines", runtime.NumGoroutine())
	ctx, cancelRun := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancelRun()
	fmt.Println("Output:")
	output, err := d.Run(ctx, command)
	if err != nil {
		fmt.Printf("unable to run command: %v\n", err)
	}
	fmt.Println(output)
	fmt.Println("Number of go routines", runtime.NumGoroutine())
}

func main() {
	device, err := netrasp.New("ios-xe-mgmt.cisco.com",
		netrasp.WithUsernamePassword("developer", "C1sco12345"),
		netrasp.WithDriver("ios"),
		netrasp.WithInsecureIgnoreHostKey(),
		netrasp.WithSSHPort(8181),
	)
	if err != nil {
		log.Fatalf("unable to create client: %v", err)
	}
	fmt.Println("Number of go routines", runtime.NumGoroutine())
	ctx, cancelOpen := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancelOpen()
	err = device.Dial(ctx)
	if err != nil {
		fmt.Printf("unable to connect: %v\n", err)
		return
	}
	fmt.Println("################################################")
	fmt.Println("### Connected")
	run(device, "show arp", "hello")
	run(device, "ping", "Starting ping for the first blocking command")
	run(device, "", "First enter")
	run(device, "8.8.8.8", "Ping target")
	run(device, "", "Next")
	run(device, "", "Next")
	run(device, "", "Next")
	run(device, "", "Next")
	run(device, "", "Final command that will execute ping then return to the prompt")
	run(device, "show arp", "")
	run(device, "show inventory", "")
	run(device, "show arp", "")
	run(device, "show inventory", "")
	fmt.Println("################################################")
	fmt.Println("### Closing connection")
	_ = device.Close(context.Background())
	fmt.Println("Number of go routines", runtime.NumGoroutine())
}
```

The output when running the program:
When the Read call gets blocked after the first ping command, we can see that the number of goroutines goes from 9 to 10, but it doesn't climb any more. And once the issue is resolved, it goes back to 9. So, worst case scenario, it might be that there's one leaked goroutine per device for some time. Would you agree?
Thanks! I checked with #12 and the number of goroutines looks exactly the same, so there's no difference. Perhaps the code in this PR is a bit easier to read, though.
For the Read in #12 not to block, you have to give a context.WithTimeout with a shorter duration than the main context.
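A small sketch of that nesting (my illustration; the durations are arbitrary):

```go
package main

import (
	"context"
	"fmt"
	"time"
)

func main() {
	// Main context bounds the whole command exchange.
	mainCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Per-read context derived from it, with a shorter deadline, so a
	// blocked Read gives up before the main context expires.
	readCtx, cancelRead := context.WithTimeout(mainCtx, 2*time.Second)
	defer cancelRead()

	<-readCtx.Done()
	fmt.Println(readCtx.Err()) // context deadline exceeded, after ~2s
}
```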
Similarly to what I did here, I added debug print statements to see what is happening. This approach spawns fewer goroutines, with the same pattern of a single read blocking:

```go
// readUntilPrompt reads until the specified prompt is found and returns the read data.
func readUntilPrompt(ctx context.Context, r io.Reader, prompt *regexp.Regexp) (string, error) {
	var output string
	bufCh := make(chan string)
	errCh := make(chan error)
	go func() {
		id := rand.Intn(300)
		dead, _ := ctx.Deadline()
		fmt.Printf("goroutine %v started, context expires in %v\n", id, time.Until(dead))
		i := 0
		for {
			i++
			select {
			case <-ctx.Done():
				fmt.Printf("goroutine %v ended on ctx timeout\n", id)
				return
			default:
				buffer := make([]byte, 10000)
				fmt.Printf("read %v is called %d time\n", id, i)
				bytes, err := r.Read(buffer)
				if err != nil {
					errCh <- fmt.Errorf("error reading output from device %w: %v", errRead, err)
					return // without this the loop would continue after a failed read
				}
				fmt.Printf("read %v returned result\n", id)
				latestOutput := string(buffer[:bytes])
				output += latestOutput
				workingOutput := output
				workingOutput = strings.ReplaceAll(workingOutput, "\r\n", "\n")
				workingOutput = strings.ReplaceAll(workingOutput, "\r", "\n")
				lines := strings.Split(workingOutput, "\n")
				matches := prompt.FindStringSubmatch(lines[len(lines)-1])
				if len(matches) != 0 {
					bufCh <- output
					fmt.Printf("goroutine %v ended on matched prompt\n", id)
					return
				}
			}
		}
	}()
```
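The pasted snippet ends at the goroutine; presumably (a hedged reconstruction on my part, not the actual diff) readUntilPrompt then finishes with a select over the channels and the context, along these lines:

```go
	// Wait for the goroutine above to deliver a result, an error,
	// or for the context to expire.
	select {
	case <-ctx.Done():
		return "", ctx.Err()
	case err := <-errCh:
		return "", err
	case result := <-bufCh:
		return result, nil
	}
}
```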
superseded by #12 |
@ogenstad this PR addresses the issue of having a custom crypto/ssh package. Wanted to let you have a look at a draft version of a goroutine-based approach.