Messages not displaying and greyed out

Hi everybody,

I’ve already opened an issue on github and have seen similiar issues related to mine but still found no solution for this problem. I hope somebody knows whats going on on my server.

Sometimes our users get an auto disconnect in web browser. After relogin they get a DM notification, but they can’t see any new message when they open the chat window. Also they can see that someone is typing but the new sent message from the other user doesn’t appear. When the user sends a message to a specific other user or channel the message is greyed out and not delivered to the reciever. The message got something like a “Temp” status in the web browser console.

crtl-r revives the missing messages.

This happens since RC v0.61.1.

Any advice?

Version of Rocket.Chat Server: 0.66.1 & 0.68.3
Operating System: Debian 9
Deployment Method(snap/docker/tar/etc): tar
Number of Running Instances: 1
DB Replicaset Oplog: No
Node Version: 8.9.4 & 8.11.3

Github report: https://github.com/RocketChat/Rocket.Chat/issues/11516

This one is kind of elusive. I think I’ve seen a few times but i dont think anyone has been able to reproduce consistently. :frowning:

Ye, thats right…

here the steps to reproduce once again. This occurs several times a week but we dont know whats the reason for the session timeout :confused:

Steps to reproduce:

  1. User A gets disconnected and logs in again.
  2. User A sends a message to any User B.
  3. The sent message is greyed out for user A/User B receives message ordinarily.
  4. User B sends a message to user A.
  5. User A can see “User B is typing” and he gets a new message notification, but there are no new messages visible for User A in chat window.
  6. Refreshing browser cache or relogin solves this bug.

Any advice? Any optimization? Or maybe downgrade? :frowning:

Do you still get this? Have you tried newer versions?

currently experiencing this, no understanding of why. We’re running about 10 different instances, all on the same atlas mongo db cluster. Never happened before, but it is currently happening across them all.

@amcgrath it would really help us and others if you can summarize what happened in this case. Thanks.

Messages not update until the users refresh the browser…
“User A” see in the chat: “User B” is typing but never get the message send it by the “User B”.
“User B” see his message in gray (like never was sended).

We have the last version (3.7.0) and the issue persists.

I’m also having the same issue on 3.7.1. It works fine on my local dev machine, but when I deploy to production, I can reproduce the issue as @jsantana describes it.

At first I thought it might be an issue with my production Caddy config around websockets but since we can see the “UserA is typing…” message, that would seem to indicate that the Realtime API subscriptions are receiving fine over websockets. I also turned on Caddy logs to debug level and don’t see anything useful there (no relevant errors).

This seems like possibly the same issue described here (except they’re running on Heroku): https://github.com/RocketChat/Rocket.Chat/issues/18505

Differences between my local machine and production environment:

  • local is running Mac OS X 10.14.6

  • production is running Ubuntu 18.04

  • local is running in usual dev mode (meteor npm start)

  • production is using a custom Meteor bundle that I build on a Linux box, then upload to the production server

  • local has no reverse proxy

  • production is using Caddy 2.2.1 as reverse proxy

  • local is using HTTP

  • production is using HTTPS with custom domain name (via Caddy Auto-SSL / Lets Encrypt)

  • local is using local MongoDB server with single-node replica set (NOT the built-in one that ships with Meteor, I have a separate mongo server running locally)

  • production is using MongoDB Atlas cluster with 3-node replica set

I’m still actively working on this issue and looking at the RC code so I’ll post more updates as I learn more.

Websocket messages for the sender after a successful post on my local machine:

Websocket messages for the sender from a post on my production site (where the problem is occurring):

So after rebooting my production RC server, the problem went away. Hmmm. This occurred after my production server had not had any traffic for a couple days. I wonder if there is an issue related to periods of inactivity where something gets disconnected and doesn’t properly reconnect?

I was curious why the “UserA is typing…” notification appeared to UserB but the actual message never arrived. After looking at the code a bit it seems that the “typing…” notification is handled by the rocketchat:streamer package and doesn’t involve the oplog since it’s just UserA emitting a DDP notification to the server on the “notify-room” stream and UserB (who is subscribed to that stream) receiving the notification.

However, when the actual message is sent by UserA, the oplog is involved. UserA has a message added to their Minimongo store with “temp: true” which makes the message appear gray in the UI. Then after the message is saved to the DB, the server (who is listening for changes on the oplog) emits a “changed” message to the sender (this is done in /server/stream/messages/emitter.js). The sender then receives the “changed” message and removes the “temp” field which turns the message from gray to black.

Since we know that the message is saving correctly to the DB I’m wondering if maybe there is a bug in RC’s custom oplog processing logic where it’s somehow not being notified that a new message was inserted into the DB?

In general, the problem is most likely in one of the following places:

  • RC server isn’t correctly processing the oplog message “insert” operation

  • RC server isn’t sending the “changed” message to the client

  • client isn’t receiving the “changed” message from the server (for example, if it has stopped listening to the “room-messages” stream)

  • client isn’t correctly processing the “changed” message and updating the message in Minimongo (for example, if the message is incorrectly formatted or there is a bug in the client code)

Adding some logging to each of the above 4 areas should help isolate the root cause. If I’m able to reproduce the problem again, I’ll add the logging and let you all know what I find.

Hi, did you figure out what the problem was? This happened to us today and a restart solved the problem. However I have a feeling that it might happen again. We upgraded to 3.7.1 (from v 1.x something) a week ago and never had the problem before. We are on MongoDB 3.6 in Atlas.

Weird thing was that the mobile client worked, but not the Desktop app or webapp.

Hi, no admin here, just a regular user of my uni’s chat, but the same error occured to me with just one other person in a personal chat. I couldn’t view any messages sent by me or that person in the chat, but I could see that they were typing something.
This happened on desktop app version 3.0.6 and 2.17.x (i don’t remember clearly), somehow I can’t update onto 3.0.7 either, if that was fixed there. An uninstall without removing user data didn’t solve the issue, but another with removing it did solve it. On the webapp and on Android I didnt encounter this problem.
Hope it helps.

Hey guys, I think I have some good news!

This problem started happening more frequently lately (daily for the past few days) and it looks like I’ve got a solution.

Last night I added the environment variable USE_NATIVE_OPLOG=true. This disables the newer oplog tailing code added in RC 3.7.0 that is using change streams and instead reverts back to Meteor’s oplog tailing (https://github.com/RocketChat/Rocket.Chat/pull/18892).

I checked the RC server logs this morning and saw that Atlas had what looks like 2 Mongo primary elections in the space of 7 hours overnight and for the first time, my production RC instance survived them both and there is no trace of this error anymore in the UI. I can post messages and they appear instantly (instead of being greyed out), the recipient sees them right away if they are online, user presence is working and email push notifications are being delivered.

Here is the redacted environment variable entry for my Rocket.Chat service file in case that helps anyone:

Environment=USE_NATIVE_OPLOG=true MONGO_URL=mongodb+srv://USERNAME:PASSWORD@CLUSTER_NAME.mongodb.net/DB_NAME?retryWrites=true&readPreference=primaryPreferred&w=majority MONGO_OPLOG_URL=mongodb+srv://USERNAME:PASSWORD@CLUSTER_NAME.mongodb.net/local ROOT_URL=https://mydomain.com/ PORT=3000

Note that the “mongodb+srv” URI syntax no longer requires replicaSet or authSource to be set on MONGO_URL or MONGO_OPLOG_URL.

====================================

Here is some previous analysis that I did which may be useful…

Some different scenarios I tested to help isolate the issue:

  1. Custom RC 3.7.1 codebase on my laptop (OS X) + local MongoDB replica set without USE_NATIVE_OPLOG=true -> NO problem
  2. Custom RC 3.7.1 codebase on my laptop (OS X) + Atlas MongoDB replica set without USE_NATIVE_OPLOG=true -> Problem reproduced
  3. Custom RC 3.7.1 codebase on DigitalOcean droplet (Ubuntu + Caddy 2) + Atlas MongoDB replica set without USE_NATIVE_OPLOG=true -> Problem reproduced
  4. Custom RC 3.7.1 codebase on DigitalOcean droplet (Ubuntu + Caddy 2) + Atlas MongoDB replica set with USE_NATIVE_OPLOG=true -> NO Problem

Since I was able to reproduce the problem on both my laptop and my production environment, that meant it was probably not related to my production environment.

Since I was not able to reproduce it locally when using a local MongoDB replica set, that meant it was probably something specific that’s happening on Atlas - which makes sense since Atlas frequently does primary elections, which I never do on my local Mongo replica set.

====================================

Also, I’m now seeing some new errors in my production RC logs apparently related to Atlas Mongo primary election (but they don’t seem to cause any problems or if they do, RC can now recover from them successfully):

ERROR 1: MongoError: not master

Dec 17 11:31:07 **************** rocketchat[25979]: Exception in setInterval callback: MongoError: not master
Dec 17 11:31:07 **************** rocketchat[25979]: at Connection. (/opt/Rocket.Chat/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/connection/pool.js:451:61)
Dec 17 11:31:07 **************** rocketchat[25979]: at Connection.emit (events.js:210:5)
Dec 17 11:31:07 **************** rocketchat[25979]: at Connection.EventEmitter.emit (domain.js:475:20)
Dec 17 11:31:07 **************** rocketchat[25979]: at processMessage (/opt/Rocket.Chat/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/connection/connection.js:452:10)
Dec 17 11:31:07 **************** rocketchat[25979]: at TLSSocket. (/opt/Rocket.Chat/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/connection/connection.js:621:15)
Dec 17 11:31:07 **************** rocketchat[25979]: at TLSSocket.emit (events.js:210:5)
Dec 17 11:31:07 **************** rocketchat[25979]: at TLSSocket.EventEmitter.emit (domain.js:475:20)
Dec 17 11:31:07 **************** rocketchat[25979]: at addChunk (_stream_readable.js:309:12)
Dec 17 11:31:07 **************** rocketchat[25979]: at readableAddChunk (stream_readable.js:290:11)
Dec 17 11:31:07 **************** rocketchat[25979]: at TLSSocket.Readable.push (stream_readable.js:224:10)
Dec 17 11:31:07 **************** rocketchat[25979]: at TLSWrap.onStreamRead (internal/stream_base_commons.js:182:23) {
Dec 17 11:31:07 **************** rocketchat[25979]: operationTime: Timestamp { bsontype: ‘Timestamp’, low: 2, high
: 1608204667 },
Dec 17 11:31:07 **************** rocketchat[25979]: ok: 0,
Dec 17 11:31:07 **************** rocketchat[25979]: code: 10107,
Dec 17 11:31:07 **************** rocketchat[25979]: codeName: ‘NotWritablePrimary’,
Dec 17 11:31:07 **************** rocketchat[25979]: ‘$clusterTime’: {
Dec 17 11:31:07 **************** rocketchat[25979]: clusterTime: Timestamp { bsontype: ‘Timestamp’, low: 3, high
: 1608204667 },
Dec 17 11:31:07 **************** rocketchat[25979]: signature: { hash: [Binary], keyId: [Long] }
Dec 17 11:31:07 **************** rocketchat[25979]: },
Dec 17 11:31:07 **************** rocketchat[25979]: name: ‘MongoError’,
Dec 17 11:31:07 **************** rocketchat[25979]: [Symbol(errorLabels)]: Set { ‘RetryableWriteError’ }
Dec 17 11:31:07 **************** rocketchat[25979]: }

ERROR 2: UnHandledPromiseRejection - MongoError: not master

Dec 17 16:05:20 **************** rocketchat[25979]: === UnHandledPromiseRejection ===
Dec 17 16:05:20 **************** rocketchat[25979]: MongoError: not master
Dec 17 16:05:20 **************** rocketchat[25979]: at Connection. (/opt/Rocket.Chat/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/connection/pool.js:451:61)
Dec 17 16:05:20 **************** rocketchat[25979]: at Connection.emit (events.js:210:5)
Dec 17 16:05:20 **************** rocketchat[25979]: at Connection.EventEmitter.emit (domain.js:475:20)
Dec 17 16:05:20 **************** rocketchat[25979]: at processMessage (/opt/Rocket.Chat/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/connection/connection.js:452:10)
Dec 17 16:05:20 **************** rocketchat[25979]: at TLSSocket. (/opt/Rocket.Chat/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/connection/connection.js:621:15)
Dec 17 16:05:20 **************** rocketchat[25979]: at TLSSocket.emit (events.js:210:5)
Dec 17 16:05:20 **************** rocketchat[25979]: at TLSSocket.EventEmitter.emit (domain.js:475:20)
Dec 17 16:05:20 **************** rocketchat[25979]: at addChunk (_stream_readable.js:309:12)
Dec 17 16:05:20 **************** rocketchat[25979]: at readableAddChunk (stream_readable.js:290:11)
Dec 17 16:05:20 **************** rocketchat[25979]: at TLSSocket.Readable.push (stream_readable.js:224:10)
Dec 17 16:05:20 **************** rocketchat[25979]: at TLSWrap.onStreamRead (internal/stream_base_commons.js:182:23)
Dec 17 16:05:20 **************** rocketchat[25979]: => awaited here:
Dec 17 16:05:20 **************** rocketchat[25979]: at Function.Promise.await (/opt/Rocket.Chat/programs/server/npm/node_modules/meteor/promise/node_modules/meteor-promise/promise_server.js:56:12)
Dec 17 16:05:20 **************** rocketchat[25979]: at app/models/server/raw/NotificationQueue.ts:60:15
Dec 17 16:05:20 **************** rocketchat[25979]: at /opt/Rocket.Chat/programs/server/npm/node_modules/meteor/promise/node_modules/meteor-promise/fiber_pool.js:43:40
Dec 17 16:05:20 **************** rocketchat[25979]: => awaited here:
Dec 17 16:05:20 **************** rocketchat[25979]: at Function.Promise.await (/opt/Rocket.Chat/programs/server/npm/node_modules/meteor/promise/node_modules/meteor-promise/promise_server.js:56:12)
Dec 17 16:05:20 **************** rocketchat[25979]: at app/notification-queue/server/NotificationQueue.ts:48:21
Dec 17 16:05:20 **************** rocketchat[25979]: at /opt/Rocket.Chat/programs/server/npm/node_modules/meteor/promise/node_modules/meteor-promise/fiber_pool.js:43:40 {
Dec 17 16:05:20 **************** rocketchat[25979]: operationTime: Timestamp { bsontype: ‘Timestamp’, low: 6, high
: 1608221119 },
Dec 17 16:05:20 **************** rocketchat[25979]: ok: 0,
Dec 17 16:05:20 **************** rocketchat[25979]: code: 10107,
Dec 17 16:05:20 **************** rocketchat[25979]: codeName: ‘NotWritablePrimary’,
Dec 17 16:05:20 **************** rocketchat[25979]: ‘$clusterTime’: {
Dec 17 16:05:20 **************** rocketchat[25979]: clusterTime: Timestamp { bsontype: ‘Timestamp’, low: 8, high
: 1608221119 },
Dec 17 16:05:20 **************** rocketchat[25979]: signature: { hash: [Binary], keyId: [Long] }
Dec 17 16:05:20 **************** rocketchat[25979]: },
Dec 17 16:05:20 **************** rocketchat[25979]: name: ‘MongoError’,
Dec 17 16:05:20 **************** rocketchat[25979]: [Symbol(errorLabels)]: Set { ‘RetryableWriteError’ }
Dec 17 16:05:20 **************** rocketchat[25979]: }
Dec 17 16:05:20 **************** rocketchat[25979]: ---------------------------------
Dec 17 16:05:20 **************** rocketchat[25979]: Errors like this can cause oplog processing errors.
Dec 17 16:05:20 **************** rocketchat[25979]: Setting EXIT_UNHANDLEDPROMISEREJECTION will cause the process to exit allowing your service to automatically restart the process
Dec 17 16:05:20 **************** rocketchat[25979]: Future node.js versions will automatically exit the process
Dec 17 16:05:20 **************** rocketchat[25979]: =================================

Also just realized that it’s possible the MongoError: not master errors I’m getting are because I need to add “&readPreference=primaryPreferred” to the MONGO_OPLOG_URL. I’ll try that and see if that fixes it.

Any update on the same?
I’m facing the same issue consistently with the chat server. I’m using mongo DB atlas

i have same issues with rocket.chat v3.0.7
sometimes the message not send :frowning:

UPDATE: Hey guys, so the good news is I haven’t had any problems with messages being grayed out and not displaying after adding the USE_NATIVE_OPLOG=true environment variable.

However, I have noticed that after an UnHandledPromiseRejection, my offline email notifications are no longer getting sent until I restart the RC server. After looking at the code, I see that the offline email notification code makes uses of recursive promises so I could see how the 2 might be related. Therefore I would suggest setting the environment variable EXIT_UNHANDLEDPROMISEREJECTION=1 so that your RC server shuts down / exits after an UnHandledPromiseRejection…and if you do this then make SURE that your server is set up to automatically restart on failure/crash.

How would one set these environment variables when using a Snap so as to ensure the environment variable remains on restarts, updates etc.?

@phideaux See: “How do I change other environmental variables in my snap?” over here - Snaps - Rocket.Chat Docs

i tried to set USE_NATIVE_OPLOG = false, but sometimes it still occur :frowning: