This repository has been archived by the owner on Mar 17, 2023. It is now read-only.

High Availability deployment #2037

Closed
engelgabriel opened this issue Aug 9, 2016 · 28 comments
Comments

@engelgabriel
Member

See

RocketChat/Rocket.Chat#520
RocketChat/Rocket.Chat#847
RocketChat/Rocket.Chat#1867
RocketChat/Rocket.Chat#2964

@engelgabriel
Member Author

@engelgabriel changed the title from "Create documentation about High Availability deployment" to "High Availability deployment" on Aug 18, 2016
@geekgonecrazy
Contributor

RocketChat/Rocket.Chat#3540 (comment). We also need to make sure we mention keeping time in sync between instances.

@srihas619

In order to run two instances of Rocket.Chat on two VMs (web1 and web2) with a load balancer in front of them (haproxy1), do I need to set up shared storage for the session info? I have a separate MongoDB replica set on another three VMs, so if the session info is stored in Mongo, I suppose I don't need any shared storage. Please suggest how I should proceed from this point.

@geekgonecrazy
Contributor

If you have Mongo set up in replica set mode, you do not need anything else for shared session storage.
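
For example, each Rocket.Chat instance just gets the same connection strings pointing at the replica set, and session/state coordination happens through Mongo. The hostnames and replica set name below are placeholders; substitute your own:

MONGO_URL=mongodb://mongo1:27017,mongo2:27017,mongo3:27017/rocketchat?replicaSet=rs0
MONGO_OPLOG_URL=mongodb://mongo1:27017,mongo2:27017,mongo3:27017/local?replicaSet=rs0
ROOT_URL=https://chat.example.com
PORT=3000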

@srihas619

@geekgonecrazy thanks for your response.
So, do I need to enable the oplog URL? What exactly does the oplog do? Sorry, I am quite new to MongoDB; I just want to set up a clustered Rocket.Chat.

@richardwlu

richardwlu commented Aug 10, 2017

@geekgonecrazy And to add on to @srihas619's question, does the MONGO_OPLOG_URL need to point to the location of the Primary?

I currently have 4 instances of RC running on one server and 3 Mongo replicas and my
MONGO_OPLOG_URL is mongodb://localhost:27017/local
and
MONGO_URL is mongodb://localhost:27017,mongochat02:27017,mongochat03:27017/rocketchat?replicaSet=001-rs&readPreference=primaryPreferred&w=majority

However, I will have 8 instances of RC running across 2 servers while adding 2 additional replica set members on the new server(s) (to be mongochat04 and mongochat05), and I am trying to determine if I need to change MONGO_OPLOG_URL and MONGO_URL on either server.

@srihas619

srihas619 commented Aug 10, 2017

@richardwlu I configured MONGO_OPLOG_URL as mongodb://mongo1:27017,mongo2:27017,mongo3:27017/local?rs0 (rs0 is my replica set) and it worked, but I still don't know its exact purpose. When the oplog is enabled, though, the Rocket.Chat web instances recognize on their own that they are clustered (identified by observation).

@richardwlu

@srihas619 Thanks for the tip. What is your value of MONGO_URL?

And the OpLog, according to the docs:

The oplog (operations log) is a special capped collection that keeps a rolling record of all operations that modify the data stored in your databases. MongoDB applies database operations on the primary and then records the operations on the primary’s oplog. The secondary members then copy and apply these operations in an asynchronous process. All replica set members contain a copy of the oplog, in the local.oplog.rs collection, which allows them to maintain the current state of the database.

https://docs.mongodb.com/manual/core/replica-set-oplog/
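
If you want to poke at it yourself, any replica set member will show the oplog from the mongo shell. The host below is just one of the members from our MONGO_URL; use one of yours:

mongo --host mongochat02:27017
> rs.printReplicationInfo()                                      // oplog size and the time window it covers
> use local
> db.oplog.rs.find().sort({ $natural: -1 }).limit(1).pretty()    // most recent recorded operation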

@srihas619

srihas619 commented Aug 10, 2017

@richardwlu thanks for the link to the docs. It explains it clearly. My MONGO_URL is configured as mongodb://mongo1:27017,mongo2:27017,mongo3:27017/rocketchat?rs0. Have I configured things correctly? I would be glad if you could review :)

@richardwlu

@srihas619 I think we will need to bring in @geekgonecrazy to help verify the questions above for us :)

@geekgonecrazy
Contributor

Just like with the MONGO_URL, you will want to add all nodes to the MONGO_OPLOG_URL in case a primary election happens. But it will only actually be tailing the oplog on the primary node.
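
If you ever want to confirm which member is currently primary after an election, you can ask any member from the mongo shell (the hostname here is just an example); the driver works this out on its own from the seed list:

mongo --host mongo1:27017 --eval 'print(rs.isMaster().primary)'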

@geekgonecrazy
Contributor

geekgonecrazy commented Aug 10, 2017

Also, as far as the naming used in the connection string: you will want to make sure to use the same names that the nodes identify themselves by in the replica set.

localhost, for example, is something I would avoid using when you have a multi-node Mongo replica set.

If you attempt to connect and the primary is listed as localhost, the driver will attempt to connect to localhost and address it as such. If internally it's referencing itself as something else, you will have issues.

Also, the other Mongo nodes will try to look up localhost when trying to talk to this peer, which will of course always resolve to themselves, so it will cause all kinds of issues.

tl;dr It's good practice to always use a hostname that's reachable by all other nodes in the replica set.
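
For example, when initiating the set, something along these lines (hostnames are examples; use names that every Mongo node and every Rocket.Chat server can resolve):

mongo --host mongo1:27017
> rs.initiate({
    _id: "rs0",
    members: [
      { _id: 0, host: "mongo1:27017" },
      { _id: 1, host: "mongo2:27017" },
      { _id: 2, host: "mongo3:27017" }
    ]
  })

If a set was already initiated with localhost, rs.reconfig() with corrected member hostnames is the usual way to fix it.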

@richardwlu

@geekgonecrazy Thank you for clearing this up!

@richardwlu

@geekgonecrazy Just to clarify, is it necessary to specify the replica set name by adding replicaSet=001-rs to the end of the MONGO_URL and MONGO_OPLOG_URL?

What I intend to have is:
MONGO_URL = "mongodb://den02vmchat01:27017,mongochat02:27017,mongochat03:27017,ash01vmchat01:27017,ash01vmchat02:27017/rocketchat?replicaSet=001-rs&readPreference=primaryPreferred&w=majority"

MONGO_OPLOG_URL = "mongodb://den02vmchat01:27017,mongochat02:27017,mongochat03:27017,ash01vmchat01:27017,ash01vmchat02:27017/local?replicaSet=001-rs"

@geekgonecrazy
Contributor

geekgonecrazy commented Aug 15, 2017

@richardwlu I don't know that it's absolutely necessary, but I typically always do this.

Looks like what I would use 👍

@richardwlu

@geekgonecrazy After editing the following values for each instance (I have 8 total instances running on 2 servers, 4 on each), we have noticed an issue where users are not receiving desktop notifications and alerts consistently (more failures than successes). Would you happen to know why this would occur and whether it is related to the MongoDB config?

We are on version 0.55.0 (older, I know), but everything was running fine before I added the 4 instances and edited the MongoDB config. The version has stayed the same.

MONGO_URL = "mongodb://den02vmchat01:27017,mongochat02:27017,mongochat03:27017,ash01vmchat01:27017,ash01vmchat02:27017/rocketchat?replicaSet=001-rs&readPreference=primaryPreferred&w=majority"

MONGO_OPLOG_URL = "mongodb://den02vmchat01:27017,mongochat02:27017,mongochat03:27017,ash01vmchat01:27017,ash01vmchat02:27017/local?replicaSet=001-rs"

@geekgonecrazy
Contributor

@richardwlu sorry, got a bit behind on GitHub notifications 😁

If you are seeing issues like that, it's typically because the instances cannot talk to each other. Usually it's something with the instance IP or firewall rules. They need to be able to talk; otherwise, when an event fires, only users on the same instance as the one that fired the event will receive it.
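
For example, two instances on the same box would each be started with their own port and an advertised address that every other instance can reach (the IP, ports, and URL below are only examples, and the ports need to be open between the app servers):

# instance 1
PORT=3000 INSTANCE_IP=10.0.1.21 ROOT_URL=https://chat.example.com node main.js
# instance 2
PORT=3001 INSTANCE_IP=10.0.1.21 ROOT_URL=https://chat.example.com node main.js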

@richardwlu

@geekgonecrazy No worries, found out it was the INSTANCE_IP

@AFrangopoulos

@geekgonecrazy @Sing-Li @georgios Are you aware of any RC use cases with very high concurrency (say, 100k+ concurrent users)? Most things I have read were much smaller in scale than this.
I have read the doc page on scaling by adding RC instances (using a Mongo replica set and a reverse proxy). In your research, have you found scaling like this to be linear, or is there a large drop-off at a certain point? Most of what I have read was formula based, i.e. "after doing X, we were able to handle 4X the number of users". I have not been able to find a ballpark estimate of the number of users per instance given some generic setup. Thank you for your time.

@geekgonecrazy
Contributor

With ulimit increased, the Node heap increased, and good hardware, you can do some amazing stuff. :) At that size of deployment I'd say reach out and talk to us.
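
Roughly what I mean by that, per instance (the numbers are only illustrative and need tuning for your hardware):

ulimit -n 65535                          # raise the open file / socket limit for the Rocket.Chat process
node --max-old-space-size=4096 main.js   # allow the Node heap to grow to ~4 GB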

@AFrangopoulos

AFrangopoulos commented Mar 30, 2018

@geekgonecrazy I assume you mean to contact you via support? I think I shot an email out to sales and support a day or so ago. If there is a more fluid way of contacting RC that would be great -- maybe meet on your chat server for a brief discussion?

Load Testing: We started doing baseline load testing with no subscriptions, just sending messages, and we hit 100% usage at around 60 rocketchat_messages per second. Writes/CPU usage seems to be the largest bottleneck -- the number of users doesn't seem to have much effect. We tested the same throughput with varying numbers of users and it had a negligible effect on the system. Our setup involves oplog tailing, and the hardware as of now is greater than what RC documented as "minimal specs". If we can have a brief discussion as stated above, we can get more specific if you have other questions.

Oplog Tailing vs RedisOplog: From what I have read, oplog tailing is a very expensive task when there are a lot of writes, and the redis-oplog solution should relieve a lot of this CPU stress. Has Rocket.Chat looked into this solution, and if so, do you have any data on the results? I am currently trying to set up redis-oplog with oplog tailing disabled on another machine but running into snags. Hopefully I get that running soon with Theodor's help.
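
For reference, what I'm attempting is roughly the setup from the cultofcoders:redis-oplog README: add the package to the Rocket.Chat build, point it at Redis through METEOR_SETTINGS, and leave MONGO_OPLOG_URL unset so classic oplog tailing stays off. The Redis host and port below are placeholders for our environment:

export METEOR_SETTINGS='{ "redisOplog": { "redis": { "host": "redis1.internal", "port": 6379 } } }'
# no MONGO_OPLOG_URL exported
node main.js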

Lastly, when you mentioned "node heap increased", did you mean the process memory limit, or perhaps giving more RAM to Mongo?

Update: The tests described above were just for a single instance. We are now testing our server (12 cores) with 10 instances. Very early testing showed 150 messages (writes) per second taking 65% CPU. It peaked at over 75% from bursts. Is it normal for the application to make 6-7x more disk reads/writes than Mongo does? Or is this a possible misconfiguration on my end?

Thanks,
Aleko

@sr258

sr258 commented May 2, 2018

Have you had any conclusive results on how RC scales at your level of users @AFrangopoulos ? I'm evaluating RC for a use case with more than 1 million registered users and hundreds of thousands of concurrent users. Is there any documentation or experience on scaling RC to this level?

@AFrangopoulos

Without getting redis-oplog integrated, it can't scale to large numbers of concurrent users (we did not have the time to spend on adding this solution to RC, so we abandoned it). The more you scale horizontally, the more your gains diminish due to oplog tailing (and I imagine some other things). Also, from what I could tell, the sheer number of packages is starving the app of resources.

I suspect that if RC were to integrate the redis-oplog feature successfully, it could scale to a very large number of concurrent users (unless there are other bottlenecks that appear after oplog tailing is removed from the equation). Lastly, if you do choose to use RC and try to scale it, it will not be cost-effective, and eventually you will hit a point where scaling stops and you will likely need to find a different solution.

This is all based on our load/performance testing. It is not fact per se, but I haven't seen anything out there that contradicts our conclusions. Hope this helps! I urge the RC devs to try to integrate the redis-oplog package to see this application's full potential. GL

@geekgonecrazy
Contributor

I know we talked a bit in the support channel, but I'm curious what specifically points to the oplog as the limiting factor here?

We are definitely experimenting with redis-oplog and a few other approaches to try to increase overall performance.

@sr258

sr258 commented May 3, 2018

@AFrangopoulos Thanks for your answer, even if it is not what I'd hoped for. Are you willing / allowed to share the load tests?
@geekgonecrazy Do you think we could talk about your assessment on how RC scales and what would need to be done for RC to work at the number of users we have?

@sr258

sr258 commented Jun 14, 2018

@AFrangopoulos Can you provide some more detail on the load testing you've done (what techniques you've used etc.)? My institution is really interested in setting up their own load testing.

@AFrangopoulos

@sr258 I used both meteor-down and meteor-load-testing by allaning. I found the latter to be more consistent in our tests. Our main concern was writes and having 20k+ concurrent users. Scaling horizontally just wasn't cutting it -- diminishing returns the more you scaled. It was not cost-effective. Also, it seems RC has a LOT of packages and there is a continuous fight for CPU.

Our goal was to reach 1k writes/second through scaling, and it couldn't get there within reason ($ / number of boxes needed). I think somewhere on these forums pertaining to RC/Meteor I have a detailed writeup of the several setups we tried to get the best performance.

In the end, we concluded this product couldn't scale to what we needed. It seems redis-oplog would be a great candidate to fix the scaling issues. GL to you.

@Faria1212 transferred this issue from RocketChat/docs-old on Apr 20, 2021
@Faria1212 transferred this issue from RocketChat/developer-docs on Apr 26, 2022
@Rodriq closed this as completed on Jun 24, 2022