Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PingHandler will not time time out #137

Closed
vishnureddy17 opened this issue Jun 8, 2023 · 7 comments · Fixed by #222
Closed

PingHandler will not time time out #137

vishnureddy17 opened this issue Jun 8, 2023 · 7 comments · Fixed by #222

Comments

@vishnureddy17
Copy link
Contributor

Describe the bug
The logic of PingHandler means that there will never be a timeout, and pingFailHandler will never be called. I'm working on getting out a PR for this.

To reproduce
Wrote a test which should pass if PingHandler times out, but the test fails:

package paho

import (
	"net"
	"testing"
	"time"

	"github.com/stretchr/testify/assert"
)

func TestPingerPingHandlerTimeout(t *testing.T) {
	fakeServerConn, fakeClientConn := net.Pipe()
	defer fakeServerConn.Close()
	defer fakeClientConn.Close()

	serverConnBuffer := make([]byte, 1024)

	go func() {
		// keep reading from fakeServerConn and discard the values. exit when fakeServerConn is closed.
		for {
			_, err := fakeServerConn.Read(serverConnBuffer)
			if err != nil {
				return
			}
		}
	}()

	pfhValues := make(chan error)
	pfh := func(e error) {
		pfhValues <- e
	}

	pingHandler := DefaultPingerWithCustomFailHandler(pfh)

	go func() {
		pingHandler.Start(fakeClientConn, time.Second)
	}()
	defer pingHandler.Stop()

	select {
	case <-time.After(time.Minute):
		t.Error("pingHandler did not timeout after a minute of waiting")
	case err := <-pfhValues:
		assert.EqualError(t, err, "ping resp timed out")
	}
}

Expected behaviour
PingHandler should eventually call pingFailHandler if PingResp() is never called.

Additional context
Looks like this happens because p.lastPing is reset every time a PINGREQ is sent (approximately every pt duration), so time.Since(p.lastPing) doesn't get the chance to get to 1.5x pt for a timeout to be detected.

@MattBrittan
Copy link
Contributor

I'm not sure why this uses 1.5 x the ping timeout as the measure of success. The spec says:

If the Keep Alive value is non-zero and the Server does not receive an MQTT Control Packet from the Client within one and a half times the Keep Alive time period, it MUST close the Network Connection to the Client as if the network had failed [MQTT-3.1.2-22].

If a Client does not receive a PINGRESP packet within a reasonable amount of time after it has sent a PINGREQ, it SHOULD close the Network Connection to the Server.

So the one and a half times is a requirement on the server side (assuming the client sends a ping every pt there is some transit time so allowing pt * 1.5 makes sense). However, for the client, the timeout is a "reasonable time"; on Mosquitto the default keepalive_interval is 60s and waiting 1.5 minutes for a response packet seems a touch excessive. Perhaps this should be configurable, as in the v3 client? (I would also note that the C client has extra checks in case a large packet is being received and this delays the receipt of the ping).

I suspect this was modelled on the C library which uses the same basic methodology but will never send concurrent ping requests in flight (so changing if time.Since(p.lastPing) >= pt to else if time.Since(p.lastPing) >= pt would replicate this behaviour).

@vishnureddy17
Copy link
Contributor Author

I suspect this was modelled on the C library which uses the same basic methodology but will never send concurrent ping requests in flight (so changing if time.Since(p.lastPing) >= pt to else if time.Since(p.lastPing) >= pt would replicate this behaviour).

I instead proposed a different design in #139, but I'm happy to open a different PR with the fix you describe.

@ankitrgadiya
Copy link

ankitrgadiya commented Aug 9, 2023

One more tangential issue with PingHandler is that, in theory the Client can accept a custom PingHandler but to initialise a PingHandler a FailureHandler is required which inherently requires internal field access. For the default one an anonymous function is used.

https://github.com/eclipse/paho.golang/blob/master/paho/client.go#L150

This essentially limits the user to only use the default PingHandler. Maybe a better way would be to accept NewPingHandler func and then generate it and pass the same FailureHandler.

@MattBrittan
Copy link
Contributor

I instead proposed a different design in #139, but I'm happy to open a different PR with the fix you describe.

Sorry for not responding on this; I've been leaving paho changes to @alsm; will have a chat with him and see if we can get these reviewed (the current pinger is not great).

@vishnureddy17
Copy link
Contributor Author

No worries! I'm also curious what the use case is for custom PingHandlers.

@MattBrittan
Copy link
Contributor

Really sorry for the lack of response on this - I am now beginning to accept PR's on the paho part of the project (had previously focused mainly on the auto bit) so have taken a look at this. I'm posting this here rather than against your PR because there are more general comments (I'm fine with the PR - it just needs some tweaks, due to changes after it was raised, so it can be merged).

I think (as discussed in #207) we should also add something like PacketSent() to the interface (even if this is not implemented in paho this release, it gives us the flexibility to add it). The longer term idea is that this would be called every time a packet is sent which would mean Pinger could avoid sending unnecessary ping requests (not a major issue but could create problems on systems under high load).

If we are making a change then I also wonder if we should have Start return an error because this would remove the need for pingFailHandler (which is problematic as we would need to expose client.error to make a custom ping handler workable as per comment from @ankitrgadiya); instead the connection would be dropped if Start returns an error (nil would be returned if the duration is 0 or the pinger is stopped gracefully). Given that Start is blocking I think it might be better renamed to Run (or similar) because I would assume that Start would start something and then return.

so something like:

type Pinger interface {
	Run(net.Conn, time.Duration) error
	Stop()
        PacketSent() // Called when a packet (other than a PING) has been sent to the server
	PingResp() // Called when client receives a ping response
	SetDebug(log.Logger)
}

@vishnureddy17
Copy link
Contributor Author

I'm posting this here rather than against your PR because there are more general comments (I'm fine with the PR - it just needs some tweaks, due to changes after it was raised, so it can be merged).

Sounds good, I went ahead and closed that PR (still kept the branch in my fork for reference). I'll open a brand new PR.

What you propose sounds reasonable, will get a PR out soon.

MattBrittan added a commit that referenced this issue Jan 13, 2024
Resolves a range of issues with the old pinger.

Closes #77
Closes #137
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants