Support 4k active users with Rocket.Chat

Description

Hi, we’re trying to support 4k active users with Rocket.Chat, but we can’t get above 1k for now.

Our setup:

  • Rocket.Chat v3.6.3
  • 10 app instances (2 vCPU & 2 GB RAM each) on AWS Fargate
  • 3-node MongoDB v4.2 cluster (8 vCPU, 32 GB RAM, 16,000 max connections) on Atlas; the connection string uses retryWrites=true&w=majority&poolSize=75 (sketched below)
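For reference, the app instances get those options through the MongoDB connection string in the environment; it looks roughly like this (hostnames and credentials here are placeholders, not our real values):

```
# Placeholder hosts/credentials; MONGO_OPLOG_URL points at the local database for oplog tailing.
MONGO_URL=mongodb+srv://rocketchat:<password>@cluster0.example.mongodb.net/rocketchat?retryWrites=true&w=majority&poolSize=75
MONGO_OPLOG_URL=mongodb+srv://oplog-user:<password>@cluster0.example.mongodb.net/local
```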

We are using Selenium with headless Chrome in the cloud to perform the load test.
All users are connected to the same public channel and wait a random amount of time before sending a text message.
We tried two delay windows (a simplified sketch of one simulated user follows below):
up to 10 min: const time = Math.floor(Math.random() * 10 * 60) * 1000;
and up to 30 min: const time = Math.floor(Math.random() * 30 * 60) * 1000;
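Each simulated user then does roughly the following (simplified sketch, not our exact script; the CSS selector for the message box and the channel URL are placeholders):

```js
// Simplified sketch of one simulated load-test user.
// The selector and URL handling are placeholders, not our exact script.
const { Builder, By, Key } = require('selenium-webdriver');
const chrome = require('selenium-webdriver/chrome');

async function simulateUser(channelUrl) {
  const options = new chrome.Options().addArguments('--headless');
  const driver = await new Builder().forBrowser('chrome').setChromeOptions(options).build();
  try {
    // Open the public channel (the test user is already logged in).
    await driver.get(channelUrl);

    // Wait a random amount of time within the 10-minute window (same formula as above).
    const time = Math.floor(Math.random() * 10 * 60) * 1000;
    await driver.sleep(time);

    // Type a short text message into the message box and send it.
    const input = await driver.findElement(By.css('.js-input-message'));
    await input.sendKeys('hello from the load test', Key.ENTER);
  } finally {
    await driver.quit();
  }
}
```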

In our last test we ran 2,370 users and the chat was unusable: I could not send messages (they stayed grey and no sendMessage REST request was sent), and if I reloaded the page I could still access the channel, but the message loader spun forever.

The problem is that our monitoring does not show any big CPU load: the app instances peak at ~50% CPU and the DB sits at ~40% CPU, so we’re at a loss here.

We first discovered that setting Unread_Count to all_messages is a big no for large channels: it generated a lot of oplog updates on the subscription collection and slowed the app down. Changing it helped a little.
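In case it’s useful to others, that setting can also be changed programmatically through the admin REST API; here is a rough, untested sketch (the server URL, token, user id, and option value are placeholders to double-check against your own admin panel):

```js
// Untested sketch: change the Unread_Count setting via the admin REST API.
// SERVER_URL, ADMIN_TOKEN, ADMIN_USER_ID and the option value are placeholders;
// check the exact option id against the choices listed in the admin panel.
// Assumes a runtime with fetch available (Node 18+ or a browser).
const SERVER_URL = 'https://chat.example.com';
const ADMIN_TOKEN = '<personal-access-token>';
const ADMIN_USER_ID = '<admin-user-id>';

async function setUnreadCount(value) {
  const res = await fetch(`${SERVER_URL}/api/v1/settings/Unread_Count`, {
    method: 'POST',
    headers: {
      'X-Auth-Token': ADMIN_TOKEN,
      'X-User-Id': ADMIN_USER_ID,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ value }),
  });
  if (!res.ok) throw new Error(`Failed to update setting: ${res.status}`);
  return res.json();
}

// Example: stop counting every message in large channels.
setUnreadCount('user_and_group_mentions_only').then(console.log).catch(console.error);
```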

We also see a lot of this in our instance logs:
Mongodb Exception in setInterval callback: SwitchedToQuery TIMEOUT QUERY OPERATION

We would appreciate any additional hints from the experts in this forum.

Server Setup Information

  • Version of Rocket.Chat Server: v3.6.3
  • Operating System: Amazon Linux
  • Deployment Method: Containers on AWS Fargate
  • Number of Running Instances: 10
  • DB Replicaset Oplog: enabled
  • NodeJS Version: 12.16.1
  • MongoDB Version: 4.2
  • Proxy: AWS ALB
  • Firewalls involved: none

Thanks

Did you find a solution?

Same here, we are experiencing the same problem.

The servers look fine.

However, the browser collapses. We are wondering whether the number of messages it receives could be the issue here, either the volume or the rate.

Have the same problem.
Did you solve it?

My cluster:

  • 2.5k active users
  • 25 containers on Docker Swarm behind a reverse proxy
  • 3 MongoDB servers in a replica set