Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple epoll events received for large message size, reads getting blocked after first event #143

Closed
codefetcher opened this issue Jun 30, 2021 · 26 comments

Comments

@codefetcher
Copy link

Hi we are implementing websocket server and using netpoll (epoll) for watching for read events on connected clients file descriptor. When the client sends large message size we are getting multiple read events for the same message. We are able to read the entire message after receiving the first event from epoll. For the subsequent events the read gets blocked on nextframe() call as there is no message on underlying connection. Any help is much appreciated.

@cristaloleg
Copy link
Collaborator

Hi, what is the problem in the end? Title gives too small contrxt.

@codefetcher codefetcher changed the title Multiple epoll events for large message size Multiple epoll events for large message size hence reads getting blocked after first event that is received Jun 30, 2021
@codefetcher
Copy link
Author

Updated the title. Basically the issue is that as multiple events are received for a websocket message. At the first event we are able to get the message but for subsequent events as there is no data to read. The Read gets blocked, i want to understand if there is a way to stop the blocking read if there is no data to read on the underlying connection

@codefetcher codefetcher changed the title Multiple epoll events for large message size hence reads getting blocked after first event that is received Multiple epoll events received for large message size, reads getting blocked after first event Jun 30, 2021
@KushagraVasal
Copy link

May be Head of line blocking is your issue
head-of-line-blocking-l

@lesismal
Copy link

lesismal commented Jul 6, 2021

@lesismal
Copy link

lesismal commented Jul 6, 2021

I know it is impolite to mention my own project in other people's projects, but I have pointed out problems similar to the gobwas/ws+netpoll in many projects, but few people are aware of the shortcomings of that.
So, I still have to mention my own project:
https://github.com/lesismal/nbio

I've implemented poller + non-blocking-parser in nbio, which is the real way to solve 1000k connections problem.

Here is the example for websocket:
https://github.com/lesismal/nbio/tree/master/examples/websocket

Still very sorry, but I just want to give back to the open source community

@gobwas
Copy link
Owner

gobwas commented Jul 7, 2021

@codefetcher how does the difficulty of async networking and partial reads relate to this library? If you want to use truly async you have to parse WebSocket frames in async way e.g., storing the state of the read, unparsed (due to incompleteness) bytes buffer and so on. If you are going to adopt the mailru/netpoll library please keep in mind that that library suggests to block read when first readable event happens -- yeah, it's not the ideal async i/o, but it may work in most of the cases -- the only penalty here is that you will not reuse read buffers so efficiently for cases, when not the whole message is in kernel socket buffer. There are another ways to solve the issue still using gobwas/ws, e.g., using non-blocking sockets and retries of parsing frames (in case on partial arrive) and so on.

@gobwas
Copy link
Owner

gobwas commented Jul 7, 2021

@lesismal so what exactly you are going to give back to the community besides mentioning your project?

@lesismal
Copy link

lesismal commented Jul 7, 2021

@lesismal so what exactly you are going to give back to the community besides mentioning your project?

the project is the give back.

@lesismal
Copy link

lesismal commented Jul 7, 2021

@lesismal so what exactly you are going to give back to the community besides mentioning your project?

and I have pointed out the problem of gobwas/ws+netpoll, and in gorilla's issue about poller supportting, and int eranyanay/1m-go-websockets's issue.
you have not implemented the streaming-parser, that's one of the key points for a poller framework.

@lesismal
Copy link

lesismal commented Jul 7, 2021

@lesismal so what exactly you are going to give back to the community besides mentioning your project?

and I have pointed out the problem of gobwas/ws, and in gorilla's issue about poller supportting, and int eranyanay/1m-go-websockets's issue.
you have not implemented the streaming-parser, that's the key point for a poller framework.

pointed out the problem, but seems few of you have understood that problem which would make service slower even not available.

@lesismal
Copy link

lesismal commented Jul 7, 2021

see this: gobwas/ws-examples#18, I am trying to tell people about that problem but there's nobody response to my issue.

if more guys use gobwas/ws+netpoll, should all come to the problem as this issue mentioned.

@lesismal
Copy link

lesismal commented Jul 7, 2021

you have provide gobwas/ws+netpoll for users to solve 1000k problem with golang, but gobwas/ws+netpoll is the problem now.
not only my repo has implemented the streaming-parser, gev websocket implemented that too.
if you really understand the half-packet/tcp-sticky problem with gobwas/ws+netpoll, you should not ask the users of gobwas/ws+netpoll to implement the parser themselves.

@lesismal
Copy link

lesismal commented Jul 7, 2021

@lesismal so what exactly you are going to give back to the community besides mentioning your project?

the project is the give back.

if any interested, to check the code of my repo, welcome communicating or issue/pr.

@lesismal
Copy link

lesismal commented Jul 7, 2021

the only penalty here is that you will not reuse read buffers so efficiently for cases, when not the whole message is in kernel socket buffer.

that's not the only penalty, the real problem is the risk of service slow even not available.

There are another ways to solve the issue still using gobwas/ws, e.g., using non-blocking sockets and retries of parsing frames (in case on partial arrive) and so on.

it seems that gobwas/ws has not provided the way of "non-blocking sockets and retries of parsing frames".
if the users implement that themselves, why they still need to use gobwas/ws+netpoll? they will only need to use netpoll+their own parser implementation.

@lesismal
Copy link

lesismal commented Jul 9, 2021

use this simple proxy tunnel to simulate tcp sticky packets:

import "github.com/lesismal/nbio/examples/sticky/proxy"

go proxy.Run(addrForAcceptClient, addrOfYourWsServer, time.Nanosecond, func(max int) int {
	n :=  max / 2
	time.Sleep(time.Millisecond*200)
	return n
})

then you would easily reproduce this problem about tcp streaming sticky packets.

@lesismal
Copy link

lesismal commented Jul 9, 2021

@cristaloleg I have explained here: gobwas/ws-examples#18

@lesismal
Copy link

lesismal commented Jul 9, 2021

@codefetcher @DNAofTech

there is no Head of line blocking problem in the example below, would anyone like to try it?

package main

import (
	"flag"
	"fmt"
	"net/http"
	"os"
	"os/signal"

	"github.com/lesismal/nbio/nbhttp"
	"github.com/lesismal/nbio/nbhttp/websocket"
)

var (
	addr = flag.String("addr", ":8888", "listening addr")
	path = flag.String("path", "/ws", "url path")
)

func onWebsocket(w http.ResponseWriter, r *http.Request) {
	upgrader := &websocket.Upgrader{EnableCompression: true}
	conn, err := upgrader.Upgrade(w, r, nil)
	if err != nil {
		fmt.Println("upgrade failed:", err)
	}
	wsConn := conn.(*websocket.Conn)
	wsConn.OnMessage(func(c *websocket.Conn, messageType websocket.MessageType, data []byte) {
		// echo
		c.WriteMessage(messageType, data)
	})
	wsConn.OnClose(func(c *websocket.Conn, err error) {
		fmt.Println("OnClose:", c.RemoteAddr().String(), err)
	})
	fmt.Println("OnOpen:", wsConn.RemoteAddr().String())
}

func main() {
	flag.Parse()
	mux := &http.ServeMux{}
	mux.HandleFunc(*path, onWebsocket)

	svr := nbhttp.NewServer(nbhttp.Config{
		Network: "tcp",
		Addrs:   []string{*addr},
	}, mux, nil)

	err := svr.Start()
	if err != nil {
		fmt.Printf("nbio.Start failed: %v\n", err)
		return
	}
	defer svr.Stop()

	interrupt := make(chan os.Signal, 1)
	signal.Notify(interrupt, os.Interrupt)
	<-interrupt
	fmt.Println("exit")
}

@gobwas
Copy link
Owner

gobwas commented Jul 9, 2021

@lesismal that's great that you solved the problem you mentioned in your library.

Keeping in mind that this (gobwas/ws) library was designed to work with network as it usually done in most of the Go code -- in blocking mode, letting Go's internall poller to do his job -- I'm not sure what problem you are asking to solve in this library?

I mean, yeah, it wasn't designed to support the streaming parsing, and that's okay for most of the use cases. Problems with peers sending byte by byte theoretically exist, but you can get rid of them by having "rubber" goroutine pools and pretty short timeouts for reading the whole message. That's what exactly was done for some projects which are using netpoll and this library. If you have some bad guys specifically doing such things you probably have to deal with them more strictly and probably on quite different layers of your applications such as DDoS protection and so on.

@lesismal
Copy link

lesismal commented Jul 9, 2021

pretty short timeouts for reading the whole message

do you think the pretty short timeouts could solve the problem?
for example:

  1. there are 10k connections, we use the goroutine pool with size 100
  2. there are 100 slow connections, each payload of websocket message sent by these slow connections arrives half at the first wsConn.ReadMessage, and another half payload arrive after 90% of your timeout time. and these slow connections send messages like this again and again.

your goroutines in the pool will block by wsConn.ReadMessage of these slow connections most of the time, even you have used the timeouts. then, your service becomes very slow, even not available.

your timeouts solution to promise your service's healthy is something about luck but not a rigorous engineering solution.

@lesismal
Copy link

lesismal commented Jul 9, 2021

use this simple proxy tunnel to simulate tcp sticky packets:

import "github.com/lesismal/nbio/examples/sticky/proxy"

go proxy.Run(addrForAcceptClient, addrOfYourWsServer, time.Nanosecond, func(max int) int {
	n :=  max / 2
	time.Sleep(time.Millisecond*200)
	return n
})

then you would easily reproduce this problem about tcp streaming sticky packets.

@gobwas try some test with this proxy to simulate slow connections, you will find that your service would be very slow and the other normal connections will be affected.

@gobwas
Copy link
Owner

gobwas commented Jul 9, 2021

@lesismal as I said previously, you can use "rubber" pools for this.

Anyway, I repeat my question -- what do you ask to be changed in this library? If nothing, would you mind if I close this issue and conversation?

Edit:
By "rubber" I mean those with some hard & soft limits.

@lesismal
Copy link

lesismal commented Jul 9, 2021

o simulate slow connections, you will find that your service would b

or you can provide an echo server code writen by gobwas/ws+netpoll, and I provide some clients code with not too many connections to attack your server, let's check whether your service could still keep healthy.

@lesismal
Copy link

lesismal commented Jul 9, 2021

@lesismal as I said previously, you can use "rubber" pools for this.

Anyway, I repeat my question -- what do you ask to be changed in this library? If nothing, would you mind if I close this issue and conversation?

I do not need anything, I have my own repo which has solve these problems.

but you need to solve this HOLB problem for your users of gobwas/ws+netpoll, otherwise, the service of your users will always be at risk of being slow or not available.

the way to solve this problem is poller + async streaming parser, you have had easygo, but still need the async streaming parser.

@gobwas
Copy link
Owner

gobwas commented Jul 9, 2021

@lesismal great, thank you for sharing your repo.

@lesismal
Copy link

lesismal commented Jul 9, 2021

If you guys think I am nosy, sorry to everyone, please ignore all my replies, I will go away.
But if anyone wants to solve these problems about go-1m-websocket-connections, welcome to comunicate.

@buffge
Copy link

buffge commented Sep 21, 2021

If you guys think I am nosy, sorry to everyone, please ignore all my replies, I will go away.
But if anyone wants to solve these problems about go-1m-websocket-connections, welcome to comunicate.

you're real great coder.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants