Server crashes possibly because of memory leak

Description

Hello, our company is hosting a single instance of Rocket.Chat server for approximately 50 active users, on a DigitalOcean droplet (2 vCPUs, 4GB memory, 50GB SSD). According to the minimum requirements page in the documentation, this is more than enough for our needs. Nevertheless, we often (approximately every two days) experience what seems to be memory saturation leading to a server freeze. We then need to hard-reboot the server to get the instance running again. We assume that this is due to image uploads, or to too many simultaneous connections, but we don’t know for sure.

Here are some metrics from after the latest crash:

We have several questions:

  • Is this “normal” behaviour? Aren’t there safeguards meant to prevent such an extreme reaction?
  • How can we precisely troubleshoot the issue? Which logs should we look at?
  • Is there a way to “flush” memory/cache regularly in order to avoid this memory saturation?

We’d like to know if we can optimize our settings or set some limits, as we can’t yet afford to simply upgrade our server, and again, those specs seem sufficient according to the minimum requirements page.

Server Setup Information

  • Version of Rocket.Chat Server: 3.2.2
  • Operating System: Ubuntu 18.04.3 (LTS) x64
  • Deployment Method: snap
  • Number of Running Instances: 1
  • DB Replicaset Oplog: Enabled
  • NodeJS Version: 12.16.1
  • MongoDB Version: 3.6.14
  • Proxy: Caddy

Hello Corentin,
Have you managed to find a solution to the problem? I am experiencing OOM errors as well with the DigitalOcean deployment. My workspace is smaller at the moment, but we see that the cached memory does not get freed up, and that the server crashes on a regular basis.

Maurizio,

That install is over a year old and is a snap install, so comparing it with yours is in all likelihood like comparing chalk and cheese, as Rocket.Chat has changed massively in that time.

Note that the minimum requirements in the docs are old and need to be revised. We are doing some testing at the minute so we can update that. Most operating systems these days require 1 GB just to run.

Please just stick to one topic rather than hunting around for something that may be vaguely connected unless you are absolutely sure it is, in which case add a link to your topic for reference. Otherwise it just gets too confusing to follow, and then you will just get ignored!!

On Ubuntu 18.04, after 5 days of full-time searching, I was able to fix it with:

sudo sync && sudo sysctl -w vm.drop_caches=3
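Before scheduling that command, it may be worth checking whether the memory really is held by the page cache. Dropping caches only releases clean page cache and reclaimable kernel objects, which the kernel normally gives back on its own under pressure, so it will not help if the memory is held by the Rocket.Chat or MongoDB processes themselves. Below is a minimal Python sketch, not an official tool, that reads /proc/meminfo and reports how much memory is available versus cached. The function names parse_meminfo and summarize are made up for this example.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for most keys)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def summarize(values):
    """Return available and cache memory as percentages of MemTotal."""
    total = values["MemTotal"]
    # MemAvailable is the kernel's estimate of memory usable without swapping;
    # fall back to MemFree on old kernels that do not report it.
    available = values.get("MemAvailable", values.get("MemFree", 0))
    cache = values.get("Cached", 0) + values.get("Buffers", 0)
    return {
        "available_pct": 100.0 * available / total,
        "cache_pct": 100.0 * cache / total,
    }


def main(path="/proc/meminfo"):
    # Only meaningful on Linux; do nothing if the file is missing.
    if not os.path.exists(path):
        return
    with open(path) as handle:
        summary = summarize(parse_meminfo(handle.read()))
    print("available: %.1f%% of RAM" % summary["available_pct"])
    print("page cache + buffers: %.1f%% of RAM" % summary["cache_pct"])


if __name__ == "__main__":
    main()
```

If the available percentage stays high while the cache percentage is large, the cache is probably not the cause of the freezes. If the available percentage keeps shrinking between reboots, per-process memory usage is the more likely place to look.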