Problems connecting to the server. 100% CPU usage

During the working day, I had to restart the RocketChat server. After the restart, problems appeared: users try to connect but cannot get in; they only see a gray screen with the animated loading dots.
I looked at the server activity and saw this picture (screenshot):

Watching the CPU activity, I noticed a consistent pattern: one core is loaded at 100% while the others are less busy, then the next core goes to 100% and the load on the previous one drops. This is what the screenshot shows.
While searching for the cause, I found that this activity appears when many users try to connect to the server at the same time; in other words, the server cannot keep up. Those who are already connected can still work in the chat, but when they open a new conversation, the messages load with some delay.
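
The per-core pattern described above can be checked numerically on the Ubuntu host instead of reading it off a graph. This is only a minimal sketch: it assumes a Linux `/proc/stat` with the usual field order (user, nice, system, idle, iowait, ...), and the function names are made up for this example. One core at 100% that keeps moving from core to core would be consistent with a single-threaded process (Rocket.Chat runs on Node.js) being moved between cores by the scheduler, but that is an assumption and not something measured here.

```python
import time


def parse_proc_stat(text):
    """Return {core_name: (busy_ticks, total_ticks)} from /proc/stat content."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep only per-core lines such as "cpu0 ..."; skip the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu") or not fields[0][3:].isdigit():
            continue
        ticks = [int(value) for value in fields[1:]]
        total = sum(ticks[:8])  # user, nice, system, idle, iowait, irq, softirq, steal
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)  # idle + iowait
        cores[fields[0]] = (total - idle, total)
    return cores


def core_load_percent(before, after):
    """Busy percentage per core between two parse_proc_stat() snapshots."""
    loads = {}
    for core, (busy_after, total_after) in after.items():
        busy_before, total_before = before.get(core, (0, 0))
        elapsed = total_after - total_before
        loads[core] = 100.0 * (busy_after - busy_before) / elapsed if elapsed > 0 else 0.0
    return loads


def sample(path="/proc/stat"):
    """Read and parse the current counters."""
    with open(path) as stat_file:
        return parse_proc_stat(stat_file.read())


def watch(samples=5, interval=1.0):
    """Print one line of per-core load for each sampling interval."""
    previous = sample()
    for _ in range(samples):
        time.sleep(interval)
        current = sample()
        loads = core_load_percent(previous, current)
        ordered = sorted(loads.items(), key=lambda item: int(item[0][3:]))
        print("  ".join(f"{core}:{load:5.1f}%" for core, load in ordered))
        previous = current
```

Calling `watch()` while users are reconnecting prints one line per second; if exactly one column sits near 100% and the busy column changes over time, that matches what the screenshot shows.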
Previously, this problem did not arise.
What can be done about this?

The Rocket.Chat server runs on Ubuntu 18.04 in a Hyper-V virtual machine. The VM is allocated 8 cores and 16 GB of RAM (dynamic).
Installed via snap. Current version: 0.72.1.
Total number of users: 478. Active during the working day: 350-400.

How I tested user connections.
Users connect from different addresses. Approximately half of them connect from 2 large offices with static addresses. The server is behind NAT, which lets me experiment with the gateway rules. On the gateway, I created rules for the static addresses:

  1. I allowed connections from the first office: they connected, everything was OK.
  2. I allowed connections from the second office: they connected, everything was OK.
  3. I allowed connections to the server from arbitrary addresses: the load on one of the cores shot up to 100% and then behaved as described above.

By active users do you mean connected users?

If so, I’d definitely recommend looking at our scaling docs: https://rocket.chat/docs/installation/manual-installation/multiple-instances-to-improve-performance/

Yes, active users are those who are connected and can work with chat.

Can this be configured when Rocket.Chat is installed via snap?
I read the documentation at your link but did not find that information.

Currently the snap install is unable to scale as recommended in this doc.

You would need to migrate to another one of the installation methods.

Thanks. I will consider this for the future. For now everything works normally: when users connect gradually rather than all at once, the server copes.
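
To illustrate why gradual connection helps, here is a small toy model. It is not a Rocket.Chat feature and measures nothing on the real server; the function names and the 120-second window are assumptions made for the example, and 400 is simply the upper end of the active-user count mentioned earlier in the thread. It only compares how many connection attempts land in the same second when everyone connects at once versus when the attempts are spread over a window.

```python
import random


def staggered_delays(client_count, window_seconds, seed=None):
    """Return one start delay per client, spread uniformly over the window."""
    rng = random.Random(seed)
    return sorted(rng.uniform(0.0, window_seconds) for _ in range(client_count))


def peak_connections_per_second(delays):
    """Largest number of connection attempts that fall into any one-second bucket."""
    buckets = {}
    for delay in delays:
        second = int(delay)
        buckets[second] = buckets.get(second, 0) + 1
    return max(buckets.values()) if buckets else 0


# Everyone reconnects in the same instant, as after the restart.
all_at_once = [0.0] * 400

# The same clients admitted over a hypothetical two-minute window.
spread_out = staggered_delays(400, 120.0, seed=1)

print("peak, all at once:", peak_connections_per_second(all_at_once))
print("peak, spread out: ", peak_connections_per_second(spread_out))
```

In this model the spread-out peak is a small fraction of the all-at-once peak, which is the same effect as opening the gateway rules one office at a time.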

I apologize for my English - I use Google Translate.
