Multiple servers working as a cluster #42

Open
wandenberg opened this issue Aug 6, 2012 · 16 comments

Comments

@wandenberg
Owner

Add functionality that lets multiple servers work as a cluster, without using external tools like Redis or memcached.

@kixorz

kixorz commented Aug 10, 2012

This would be a fantastic feature for load-balanced environments! Are you planning to implement it soon?

@wandenberg
Owner Author

This has been on the road map for a long time.
There are some difficult issues to solve, like the message id and the message order, which should be the same on all servers, without creating any deadlock or contention on the server, so that efficiency does not drop.

If you have any suggestions about these topics, let me know.

Regards

@kixorz

kixorz commented Aug 17, 2012

I'd say that both of these things should be implemented by the app, not by the module. Messages may arrive out of order and the app should deal with this gracefully. I think that the module should just re-publish to other publisher endpoints specified in its configuration. I wouldn't try to solve all the problems at once.
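
A minimal sketch of that re-publish idea, written in Python only for illustration (the module itself is an nginx module, and nothing like this exists in it as far as this thread shows). The peer URLs, the `republish` function and the injected `send` transport are all hypothetical names.

```python
from typing import Callable, Iterable, List

# Hypothetical peer publisher endpoints; in a real setup these would come
# from the module's configuration.
PEERS = ["http://push-2.example/pub", "http://push-3.example/pub"]


def republish(channel: str, message: str, peers: Iterable[str],
              send: Callable[[str, str, str], None]) -> List[str]:
    """Forward one locally published message to every peer endpoint.

    `send(url, channel, message)` is an injected transport, for example an
    HTTP POST. Failed peers are collected instead of raising, so one
    unreachable server does not block delivery to the others.
    """
    failed = []
    for url in peers:
        try:
            send(url, channel, message)
        except OSError:
            failed.append(url)
    return failed
```

No ordering or id guarantees are attempted here; as suggested above, those would be left to the application.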

Thanks for your response!

Adam

@Dunaeth

Dunaeth commented Nov 6, 2012

Maybe I am off topic, but wouldn't it be easier to distribute channels among servers instead of trying to distribute messages?

@kixorz

kixorz commented Nov 6, 2012

You don't always have control over where a user gets connected; the load balancer decides.

@Dunaeth

Dunaeth commented Nov 6, 2012

Sure, but it's the same thing with memcache or other distributed services. Distributing channels should deal with the message id trouble; then one has to find a way to distribute data among servers, with a circular hash for example (which would also deal with data redundancy). Is that a bad way?
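
To make the circular hash idea concrete, here is a rough sketch of a consistent hash ring in Python. The class name, the number of virtual nodes and the choice of SHA-256 are assumptions for illustration, not anything the module provides.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring mapping channel names to servers (a sketch)."""

    def __init__(self, servers, replicas=100):
        # Each server is placed on the ring `replicas` times (virtual nodes)
        # to smooth out the distribution.
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(replicas)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def servers_for(self, channel, copies=1):
        """Return up to `copies` distinct servers, walking the ring clockwise.

        Asking for more than one copy gives the redundancy mentioned above.
        """
        start = bisect.bisect(self._keys, self._hash(channel)) % len(self._ring)
        found = []
        for offset in range(len(self._ring)):
            server = self._ring[(start + offset) % len(self._ring)][1]
            if server not in found:
                found.append(server)
                if len(found) == copies:
                    break
        return found
```

With something like `HashRing(["a", "b", "c"]).servers_for("user-42", copies=2)`, every node computes the same two owners for a channel without any coordination.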

@wandenberg
Owner Author

It isn't a bad way at all, but imagine that subscribers are not uniformly distributed between channels.
For example, imagine that you have two servers and two channels, one with 10 subscribers and the other with 10 thousand.
If you balance by channel, one server will have much more load than the other.
As @kixorz said, the message id and the message order should be ensured by the application, by putting some information in the "message text" when using the module in a cluster.
Maybe I will choose to accept the message id as an optional header, so that the application sets the id instead of the module generating an incremental one, since the module does not use this number.
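
As a sketch of what "ensured by the application" could look like on the subscriber side, the Python snippet below reorders and de-duplicates messages using an application-assigned id. It assumes ids are consecutive integers starting at 1, which is an assumption of this example and not something decided in this thread; `OrderedInbox` is a hypothetical name.

```python
class OrderedInbox:
    """Reorder and de-duplicate messages by application-assigned id."""

    def __init__(self, next_id=1):
        self.next_id = next_id   # next id that may be handed to the app
        self._pending = {}       # id -> text, for messages that arrived early

    def receive(self, msg_id, text):
        """Accept one message; return the texts now deliverable in order."""
        if msg_id < self.next_id or msg_id in self._pending:
            return []            # already seen, e.g. replicated twice
        self._pending[msg_id] = text
        ready = []
        while self.next_id in self._pending:
            ready.append(self._pending.pop(self.next_id))
            self.next_id += 1
        return ready
```

A message that arrives early is held back until the gap before it is filled, so the order is the same no matter which server delivered each message.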

@Dunaeth

Dunaeth commented Nov 9, 2012

Actually, I think our points of view differ because of the use case each of us has in mind. Mine is that many users each subscribe to only one channel (their own), but the subscribe request can be made on many servers. Your use case is really different, but I'm not sure your hypothetical solution would do the trick. If we look at (TCP?) connections, the described solution would lead to:

  • 10 connections for the 1st channel
  • 10000 connections for the 2nd
  • 10 connections from the 1st channel responding server to the 2nd server (the one that's not responding to the request)
  • 10000 connections from the 2nd channel to the 2nd server

So with this solution you would have twice as many connections, and I'm not sure you'd lower the load on any server. Though I must admit you would have a better memory distribution (even if it would then depend on the message sizes).

I may be wrong in my analysis too.

Regards

@wandenberg
Owner Author

Yes, your analysis is wrong :)
We will not make all subscribers connect to both servers; we will make one server connect to the other.
When a message is published to one server it will be replicated to the others, and each server will deliver the message to all subscribers connected to it, no matter whether there is one subscriber per channel or many.
With this solution I can distribute the subscribers, with 5005 on each server, and no matter where the message is published all subscribers will receive it.
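
The replication scheme described above can be simulated with a small in-memory Python model. This is only a sketch of the idea: `Server`, `connect` and the callback-based subscribers are hypothetical, and real servers would talk over the network.

```python
class Server:
    """One push server in a hypothetical cluster."""

    def __init__(self, name):
        self.name = name
        self.peers = []          # other Server objects to replicate to
        self.subscribers = {}    # channel -> list of callbacks
        self.delivered = 0       # messages delivered to local subscribers

    def subscribe(self, channel, callback):
        self.subscribers.setdefault(channel, []).append(callback)

    def publish(self, channel, message):
        # Deliver locally, then replicate once to every peer. Peers only
        # deliver; they do not forward again, so there is no loop.
        self._deliver(channel, message)
        for peer in self.peers:
            peer._deliver(channel, message)

    def _deliver(self, channel, message):
        for callback in self.subscribers.get(channel, []):
            callback(message)
            self.delivered += 1


def connect(a, b):
    """Link two servers so that each replicates to the other."""
    a.peers.append(b)
    b.peers.append(a)
```

With 5 + 5000 subscribers on each of two connected servers (5005 per server), publishing to either one reaches every subscriber of the channel, and each server only delivers to its own connections.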

Regards

@Dunaeth

Dunaeth commented Nov 9, 2012

So you would be limited to a memory space equal to that of the server with the least shared memory available? That's why I said the use cases differ between our points of view; we probably have different scalable architectures in mind :)

@wandenberg wandenberg modified the milestone: 0.4.x Jan 2, 2015
@bartebor

bartebor commented Jan 9, 2015

+1 for letting the application set the message id. It makes custom versioning possible and ensures consistency between multiple modules in load-balanced environments.

I would also really like to have an upstream mode, which would help build a clustered and load-balanced farm of servers where the backend application does not have to know the locations of all servers (some form of inversion of control). We could use one instance of the module as an upstream server or make the backend be one.

It would also be nice to have a conditional GET based on version: when you already have the current version you will not get it again, even if you hit another server.
Why do we need this? Well, a few years ago we developed a custom live distribution mechanism which is still in production.
It consists of a backend (HA clustered) and a farm of frontends, which connect to the backend.
The whole concept is based on events. Clients create events (a sort of channel; think of a sports event) and then start to publish data. Publishing starts with sending data to the backend. The backend compares it with the previous version and, if it is different, increments the version number, creates a differential version (currently we have only a diff-like method) and stores both the diff and the full data in persistent storage. Note that the application always sends full data to the backend, and this data is often quite big.
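
The backend step described here could look roughly like the Python sketch below. It is not the production system: `EventStore` is a hypothetical name, storage is in memory instead of persistent, and `difflib.unified_diff` stands in for the "diff-like method".

```python
import difflib


class EventStore:
    """Keeps the latest full data, its version, and a diff from the previous one."""

    def __init__(self):
        self.version = 0
        self.full = ""
        self.diff = ""

    def publish(self, data):
        """Store `data`; return True only if a new version was created."""
        if self.version > 0 and data == self.full:
            return False                      # unchanged, nothing to announce
        old_lines = self.full.splitlines(keepends=True)
        new_lines = data.splitlines(keepends=True)
        self.diff = "".join(difflib.unified_diff(old_lines, new_lines))
        self.full = data
        self.version += 1
        return True
```

Subscribed frontends would be notified only when `publish` returns True.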

Now, if there are any frontends subscribed to this event, they are informed of the changes.
Each frontend maintains a persistent connection to the backend; it subscribes to an event when the first client connects and unsubscribes when all clients are gone. The frontend is also responsible for sending data to clients in an optimized way. We have implemented a simple protocol in which the (JavaScript) client provides the frontend with its current data version. The frontend can then decide how to send data: if the client is behind by more than one version it sends the full current data, and otherwise a differential (potentially much smaller) version. If there is no new version, the client waits on the connection.

As you can see, our system is a sort of state replicator, as opposed to a bare publish-subscribe system.
It also has the advantage of self-healing after a server restart, and it supports events that are no longer live (so there will be no more updates). Such events have a special flag which makes the client disconnect after it reads the data, since no updates are expected. This substantially lowers frontend load in terms of memory used and the number of persistent network connections.

Now we are looking for alternatives, since new technologies such as websockets have arrived over the years (we have only long polling). The nginx push module would be an ideal replacement for our frontends, but in its current shape it just does not fit... Having custom version support, an upstream mode and (possibly) support for the client specifying its current version would be great :)

Since this issue has been idle for quite a long time, is there any chance of these features being implemented?

Regards

@amonaco

amonaco commented Mar 12, 2015

How about using a "scalability protocol" library approach like http://zeromq.org/ or https://github.com/nanomsg/nanomsg?

Thanks!

@dennismartensson

Hi, is this still on the road map?

@wandenberg
Owner Author

Yes, but unfortunately I'm currently "out of time".

@dennismartensson

Okay, I have been looking at http://nats.io/ as a messaging system between websocket servers and have had good results in terms of speed and scaling. There is an official nginx client: https://github.com/nats-io/nginx-nats

I don't know how you have been thinking about building the cluster solution. Just wanted to mention it.

@chudzini

May be I choose put the message id as an optional header, with that the application set the id, instead of an incremental, once the module does not use this number

Is this option available? Can I set message id on the application side?
