"Upload Chunks" error messages after recovery from backup and other problems

Description

We had to restore our snap deployment of RC from a full backup about a month ago because of a server problem. The restore completed without any error messages, and RC was up and running again as expected. Since then, however, file uploads and downloads have been extremely slow. Sometimes clicking the download button in the RC client takes up to 20 seconds just to show the download dialog. RC also spikes to full CPU usage with around 10 people in an audio-only Jitsi conference and eventually crashes. The only way to recover is to restart the server. There are also some other occasional minor problems that we didn’t have before.

I’m not a developer or IT specialist at all, so please be patient with me. As I had no idea what was going on, I read through many GitHub issues and searched the existing topics here for a solution. Although nothing seems to address our specific problem, I got some clues, which led me to inspect the RC logs.

In there I found hundreds of messages like this one:

Mar 23 11:13:05 rocketchat rocketchat-server.rocketchat-mongo[1163]: 2020-03-23T11:13:05.609+0100 I COMMAND [conn10] command parties.rocketchat_uploads.chunks command: find { find: "rocketchat_uploads.chunks", filter: { files_id: "urEhR8YpSrcnaJaLS" }, sort: { n: 1 }, returnKey: false, showRecordId: false, $clusterTime: { clusterTime: Timestamp(1584958370, 1), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, $db: "parties" } planSummary: COLLSCAN keysExamined:0 docsExamined:13970 hasSortStage:1 cursorExhausted:1 numYields:686 nreturned:37 reslen:9633422 locks:{ Global: { acquireCount: { r: 1374 } }, Database: { acquireCount: { r: 687 } }, Collection: { acquireCount: { r: 687 } } } protocol:op_msg 13719ms
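To make that line easier to read, here is a minimal Python sketch that pulls out the fields that seem most relevant: the query plan, how many documents were scanned versus returned, and the duration. The function name `summarize` and the shortened `SAMPLE` string (the `$clusterTime` and `locks` parts are left out) are my own for illustration and are not part of Rocket.Chat or MongoDB.

```python
import re

# Shortened copy of the slow-query line above ($clusterTime and locks omitted).
SAMPLE = (
    'Mar 23 11:13:05 rocketchat rocketchat-server.rocketchat-mongo[1163]: '
    '2020-03-23T11:13:05.609+0100 I COMMAND [conn10] command '
    'parties.rocketchat_uploads.chunks command: find { find: '
    '"rocketchat_uploads.chunks", filter: { files_id: "urEhR8YpSrcnaJaLS" }, '
    'sort: { n: 1 }, $db: "parties" } planSummary: COLLSCAN keysExamined:0 '
    'docsExamined:13970 hasSortStage:1 cursorExhausted:1 numYields:686 '
    'nreturned:37 reslen:9633422 protocol:op_msg 13719ms'
)


def summarize(line):
    """Extract a few key fields from one MongoDB slow-operation log line."""

    def grab(pattern, cast=str):
        # Return the first capture group, or None if the field is absent.
        match = re.search(pattern, line)
        return cast(match.group(1)) if match else None

    return {
        "namespace": grab(r"\] (?:command|update) (\S+)"),
        "plan": grab(r"planSummary: (\w+)"),
        "docs_examined": grab(r"docsExamined:(\d+)", int),
        "returned": grab(r"nreturned:(\d+)", int),
        "millis": grab(r"(\d+)ms\s*$", int),
    }


if __name__ == "__main__":
    for field, value in summarize(SAMPLE).items():
        print(field, "=", value)
```

For the line above this reports a `COLLSCAN` plan that examined 13970 documents to return 37 and took 13719 ms, so the whole collection appears to be scanned for a single file download.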

And also some like this:

Mar 23 11:07:18 rocketchat rocketchat-server.rocketchat-mongo[1163]: 2020-03-23T11:07:18.974+0100 I WRITE [conn10] update parties.usersSessions command: { q: { connections.id: "XLAn5uyhAnwNqkNa4" }, u: { $pull: { connections: { id: "XLAn5uyhAnwNqkNa4" } } }, multi: false, upsert: false } planSummary: IXSCAN { connections.id: 1 } keysExamined:1 docsExamined:1 nMatched:1 nModified:1 keysDeleted:1 writeConflicts:1 numYields:1 locks:{ Global: { acquireCount: { r: 3, w: 3 } }, Database: { acquireCount: { w: 3 } }, Collection: { acquireCount: { w: 2 } }, oplog: { acquireCount: { w: 1 } } } 197ms

I’ve read in other threads that this might be due to missing MongoDB indexes. I have no idea what that really means or how I would fix it, or whether it is even part of the problem; that is just an assumption on my side.
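As far as I understand it, `rocketchat_uploads.chunks` is a GridFS collection, and GridFS normally keeps a compound index on `files_id` and `n` there, which is exactly what the slow query filters and sorts on. Below is a rough Python sketch of how one might check whether that index exists. It is untested against our server: the connection URI, the use of `pymongo`, and the helper names `has_chunks_index` and `check_live` are assumptions of mine, and the database name `parties` is taken from the log line above.

```python
def has_chunks_index(index_info):
    """Return True if a { files_id: 1, n: 1 } index is listed.

    index_info is expected to have the shape returned by pymongo's
    Collection.index_information(): {index_name: {"key": [(field, dir), ...]}}.
    """
    wanted = [("files_id", 1), ("n", 1)]
    return any(
        [(field, direction) for field, direction in spec["key"]] == wanted
        for spec in index_info.values()
    )


def check_live(uri="mongodb://localhost:27017", db_name="parties", create=False):
    """Check the live collection; uri and db_name are assumptions, adjust them."""
    from pymongo import MongoClient  # imported here so the helper above works without it

    chunks = MongoClient(uri)[db_name]["rocketchat_uploads.chunks"]
    if has_chunks_index(chunks.index_information()):
        print("files_id/n index is present")
    elif create:
        # unique=True follows the GridFS convention; creation fails if the
        # restored data contains duplicate (files_id, n) pairs.
        print("created:", chunks.create_index([("files_id", 1), ("n", 1)], unique=True))
    else:
        print("files_id/n index is MISSING")


if __name__ == "__main__":
    # Offline demonstration with a collection that only has the default _id index.
    print(has_chunks_index({"_id_": {"key": [("_id", 1)]}}))
```

`check_live()` only reports by default; passing `create=True` would try to build the index, which I would only do on a production system after taking a fresh backup.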

All I know is that RC has lately been acting erratically and buggy, to the point of being hardly usable anymore. As this is the production environment for our company and everybody is restricted to working from home due to the Covid-19 crisis, I need a fix for RC rather urgently.

If anyone could look into this and give me some (easy to follow) advice I’d be more than grateful!

Let me know if I can provide any more information to make the troubleshooting process easier.

Server Setup Information

  • Version of Rocket.Chat Server: 2.4.11
  • Operating System: Ubuntu
  • Deployment Method: Snap
  • Number of Running Instances: 1
  • DB Replicaset Oplog: ?
  • NodeJS Version: 8.17.0
  • MongoDB Version: 3.6.14
  • Proxy: Nginx
  • Firewalls involved: ?

Any additional Information

RC is deployed on an Ubuntu cloud server with 2 vCPUs, 4 GB RAM and a 40 GB disk.
Total users: approx. 20-25