My instance of RC is becoming unusable because it consumes so much CPU. Mongo consumes about 400% CPU on a large server with only a few users online. This started a few weeks ago and is gradually getting worse. Currently anonymous login isn’t even working, since it can’t load the username suggestions.
I considered that it may be due to an attack, but I’ve enabled DDoS protection from Cloudflare and the result is the same. There are, however, no problems on my dev server; the problems only start on production when there are users.
It would be much appreciated if anyone could help sort this out. I have previously restored an old backup of the server since I initially thought it was due to updating to 3.5.0, but a couple of days after restoring the backup the problems arose again.
I’m also running a Wildfly server on it but it’s barely used by any users and doesn’t consume a lot of resources.
I thought of starting a completely new instance, in case something has become corrupted in my current one. But I suppose migrating all the data will be troublesome since dumping the whole DB will probably result in the same issues on the new server.
It’s deployed with docker-compose and I have now tried to limit memory usage with mem_limit: 2048m. I also tried using the cpu_shares limit on the mongo container, but mongod still consumes 200-400% CPU.
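For reference, here is a minimal sketch of the kind of limits described above, assuming compose file format 2.x. The service name, image tag, and the cpus line are assumptions for illustration, not a copy of my actual file. As far as I understand, cpu_shares is only a relative weight that matters when containers compete for CPU, so on its own it would not cap mongod on an otherwise idle host. A hard cap would need something like cpus, which requires file format 2.2 or later.

```yaml
version: "2.2"             # assumption: cpus needs file format 2.2 or later

services:
  mongo:                   # service name assumed
    image: mongo:4.0       # image tag is a placeholder
    mem_limit: 2048m       # hard memory cap for the container
    cpu_shares: 512        # relative weight, only applied under CPU contention
    cpus: 2.0              # hypothetical hard cap at roughly two cores
```

A hard cap like this would only stop mongod from starving the rest of the host; it would not explain why the queries are so expensive in the first place.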
I’ll continue to monitor and see if there are any improvements, but I’ll try to isolate mongo on its own instance as well. Are there any specific settings that have to be set to run mongo on a separate server, or will it be enough to run mongo and mongo-init-replica on their own server in docker-compose, and point the current rocketchat service to that server?
Yes, I’m using GridFS. There are about 80GB of files uploaded and approximately 1000 files are uploaded per day, mainly images.
I have set up an isolated server for mongo, but uploads aren’t working with GridFS now; it keeps looking for the images on the app server. I’ve changed to FileSystem, which does upload new files, but the progress meter gets stuck on 0% despite the file being uploaded. And none of the already uploaded files are available, since it still uses the app server URL for those as well.
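What I have in mind is roughly the following sketch, written as two separate compose files shown in one block. The hostnames db.example.internal and chat.example.com, the image tags, and the replica set name rs0 are placeholders or assumptions. One thing I am unsure about: if the replica set is initiated with localhost as the member host, my understanding is that remote clients may be told to connect to localhost, so the sketch uses the DB server's reachable hostname instead.

```yaml
# --- File 1: docker-compose.yml on the DB server (hypothetical host db.example.internal) ---
version: "2"
services:
  mongo:
    image: mongo:4.0                       # placeholder tag
    restart: unless-stopped
    command: mongod --oplogSize 128 --replSet rs0
    ports:
      - "27017:27017"                      # restrict to the app server with a firewall
    volumes:
      - ./data/db:/data/db

  # One-shot service that initiates the replica set. No retry loop here,
  # so it may need to be re-run if mongod is not accepting connections yet.
  mongo-init-replica:
    image: mongo:4.0
    command: >
      mongo mongo/rocketchat --eval
      "rs.initiate({ _id: 'rs0', members: [ { _id: 0, host: 'db.example.internal:27017' } ]})"
    depends_on:
      - mongo
---
# --- File 2: docker-compose.yml on the app server ---
version: "2"
services:
  rocketchat:
    image: rocketchat/rocket.chat:latest   # placeholder tag
    restart: unless-stopped
    environment:
      - ROOT_URL=https://chat.example.com
      - MONGO_URL=mongodb://db.example.internal:27017/rocketchat
      - MONGO_OPLOG_URL=mongodb://db.example.internal:27017/local
    ports:
      - "3000:3000"
```

This sketch has no authentication on mongo, so port 27017 would have to be reachable only from the app server.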
It’s also very unpredictable right now. A lot of the time it won’t even load a channel; it just gets stuck at loading.
New issues are popping up each day with this software. When I activate FileSystem uploads it doesn’t store them in the correct folder, but in a folder such as /var/lib/docker/overlay2/68a9f50e946d6f986b0cf19662a036eb1b1be74f196aa99fbb3aaafc25c96c6c/merged/app/bundle/programs/server/uploads. And as soon as I restart the container it breaks and all images return 404, even the avatars, probably because the container id changes on restart. Even if I run all containers as root and set the path for uploads to /uploads, it won’t save anything to that folder on the host, only to the folder relative to docker.
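My current guess is that the upload path has no volume mounted on it, so the files end up in the container's own writable overlay2 layer and disappear when the container is recreated. Below is a minimal sketch of a bind mount that should keep them on the host, assuming the upload path in the File Upload settings is then set to /app/uploads inside the container. The host directory ./uploads and the image tag are placeholders.

```yaml
version: "2"
services:
  rocketchat:
    image: rocketchat/rocket.chat:latest   # placeholder tag
    volumes:
      # host directory (left) mapped to the path inside the container (right);
      # the upload path configured in Rocket.Chat must match the right-hand side
      - ./uploads:/app/uploads
```

With a mount like this, a path such as /uploads inside the container would still not reach the host unless /uploads itself is the mounted path.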
Usually about 70-100 users are connecting simultaneously, so you’d think that 6 vCPUs and 16 GB memory would be enough, but more users are leaving each day since it’s so unstable.