Initiating live chat is very slow, possibly because of a slow query

Description

Whenever a new live chat guest wants to initiate a chat, it takes a very long time for the message to be sent.

Traced it back to the /api/v1/livechat/room?token=redacted_token_here: it keeps “pending” for while, until it times out with a 504. The live chat guest can’t initiate a new conversation.

In the MongoDB logs I see a slow query, taking over 70 seconds. The navigation.token is the one of the API call above.

rocketchat_1_29a615826450 | server.js:207 API ➔ debug GET: /api/v1/livechat/config?token=redacted_token_here
rocketchat_1_29a615826450 | server.js:207 API ➔ debug Success { statusCode: 200, [...]

rocketchat_1_29a615826450 | server.js:207 API ➔ debug POST: /api/v1/livechat/visitor
rocketchat_1_29a615826450 | server.js:207 API ➔ debug Success { statusCode: 200, [...]

rocketchat_1_29a615826450 | server.js:207 API ➔ debug GET: /api/v1/livechat/room?token=redacted_token_here

mongo_1_4205249569f6  | 2019-06-12T07:43:17.633+0000 I COMMAND  [conn18] command rocketchat.rocketchat_message command: find { find: "rocketchat_message", filter: { navigation.token: "redacted_token_here", rid: null }, returnKey: false, showRecordId: false, $clusterTime: { clusterTime: Timestamp(1560325324, 1), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, lsid: { id: UUID("redacted_uuid_here") }, $db: "rocketchat" } planSummary: IXSCAN { rid: 1, t: 1, u._id: 1 } keysExamined:1496523 docsExamined:1496523 cursorExhausted:1 numYields:40818 nreturned:0 reslen:258 locks:{ Global: { acquireCount: { r: 81638 } }, MMAPV1Journal: { acquireCount: { r: 40819 }, acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 502 } }, Database: { acquireCount: { r: 40819 } }, Collection: { acquireCount: { R: 40819 } }, Mutex: { acquireCount: { r: 35662 } } } protocol:op_query 73628ms
mongo_1_4205249569f6  | 2019-06-12T07:44:34.476+0000 I WRITE    [conn21] update rocketchat.rocketchat_message command: { q: { navigation.token: "redacted_token_here", rid: null }, u: { $set: { rid: "redacted_rid_here", _updatedAt: new Date(1560325324004) } }, multi: true, upsert: false } planSummary: IXSCAN { rid: 1, t: 1, u._id: 1 } keysExamined:1496523 docsExamined:1496523 nMatched:0 nModified:0 numYields:14469 locks:{ Global: { acquireCount: { r: 14470, w: 14470 } }, MMAPV1Journal: { acquireCount: { w: 14490 }, acquireWaitCount: { w: 604 }, timeAcquiringMicros: { w: 15137 } }, Database: { acquireCount: { w: 14470 } }, Collection: { acquireCount: { W: 14470 }, acquireWaitCount: { W: 20 }, timeAcquiringMicros: { W: 815 } }, Mutex: { acquireCount: { r: 2484 } } } 76842ms

Should I add an index to navigation.token? Or is something seriously wrong?

Server Setup Information

  • Version of Rocket.Chat Server: 1.1.1
  • Operating System: Linux
  • Deployment Method: docker-compose
  • Number of Running Instances: 1
  • DB Replicaset Oplog: Enabled
  • NodeJS Version: v8.11.4
  • MongoDB Version: 4.0.10
  • Proxy: nginx -> traefik -> Rocket.Chat
  • Firewalls involved: All accessible for Rocket.Chat

Any additional Information

  • I upgraded to 1.1.1 from 0.74.3. However, I had this issue already back then. Upgrading didn’t solve it.
  • We have a lot of messages.

docker-compose.yml:

version: '2'

services:
  rocketchat:
    image: rocketchat/rocket.chat:1.1.1
    restart: unless-stopped
    command: /bin/sh -c 'INSTANCE_IP=$$(hostname -i) node main.js'
    mem_limit: 4G
    volumes:
     - ./data/uploads:/app/uploads
     - /tmp:/tmp
    environment:
     - PORT=3000
     - ROOT_URL=http://${ROCKETCHAT_HOST}
     - MONGO_URL=mongodb://mongo:27017/rocketchat
     - MONGO_OPLOG_URL=mongodb://mongo:27017/local
    labels:
      - "traefik.backend=rocketchat"
      - "traefik.frontend.rule=Host: ${ROCKETCHAT_HOST}"
    depends_on:
      - mongo

  mongo:
    image: mongo:4.0
    restart: unless-stopped
    command: mongod --smallfiles --oplogSize 128 --replSet rs0
    volumes:
      - ./data/db:/data/db
      - ./data/backups:/dump

  mongo-init-replica:
    image: mongo:4.0
    command: 'mongo mongo/rocketchat --eval "rs.initiate({ _id: ''rs0'', members: [ { _id: 0, host: ''localhost:27017'' } ]})"'
    depends_on:
      - mongo

  traefik:
    image: traefik:1.6.4
    restart: unless-stopped
    command: traefik --web --docker --defaultentrypoints=http --entryPoints='Name:http Address::3000'
    ports:
      - 3000:3000
      - 8080:8080
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

nginx.conf:

server {
   listen 443 ssl;
   server_name redacted_server_name;

   ssl_certificate /etc/letsencrypt/live/redacted_server_name/fullchain.pem;
   ssl_certificate_key /etc/letsencrypt/live/redacted_server_name/privkey.pem;
   ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
   ssl_prefer_server_ciphers on;
   ssl_ciphers 'EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH';

   location / {
      proxy_pass http://localhost:3000/;
      proxy_http_version 1.1;
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection "upgrade";
      proxy_set_header Host $http_host;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forward-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forward-Proto http;
      proxy_set_header X-Nginx-Proxy true;
      proxy_redirect off;
   }
}

server {
   listen 80;
   server_name redacted_server_name;

   return 301 https://$host$request_uri;
}

Websockets are working:

$ curl -i -N -H "Connection: Upgrade" -H "Upgrade: websocket" -H "Host: redacted_server_name" -H "Origin: redacted_server_name" -H "Sec-WebSocket-Key: key==" -H "Sec-WebSocket-Version: 13" https://redacted_server_name/websocket
HTTP/1.1 101 Switching Protocols
Server: nginx/1.12.2
Date: Wed, 12 Jun 2019 09:33:02 GMT
Connection: upgrade
Upgrade: websocket
Sec-WebSocket-Accept: key=

{"server_id":"0"}^C

Any idea?

I created a backup, downloaded it to my dev machine, and started Rocket.Chat locally. I got the same issue. A live chat would time out on /api/v1/livechat/room.

So I took down Rocket.Chat, and I created the index I mentioned above, following the MongoDB docs for creating an index on replica sets. The command:

use rocketchat
db.rocketchat_message.createIndex({"navigation.token": 1})

It took a while to build, let it get in sync, then started Rocket.Chat, and tried a new live chat. It all worked out, the live chat started working again.

However, I did this on a local copy of the production setup, and I think my “solution” might be not the way to go. So I’ll let this open for the devs to check out.

After a chat with the devs, I created an issue on Github: