Notifications not being sent to clients - desktop / mobile

Description

We migrated our multi-node / replica set RC solution to a new Kubernetes cluster. Since the migration, chat rooms no longer reliably illuminate or notify when new messages arrive. Multiple client types (iOS, Electron, Web, Android) are affected, but not all rooms/conversations.

Server Setup Information

  • Version of Rocket.Chat Server: 1.3.2
  • Operating System: k8s
  • Deployment Method: Azure AKS
  • Number of Running Instances: 3
  • DB Replicaset Oplog: yes
  • NodeJS Version: v8.11.4
  • MongoDB Version: 4.0.12
  • Proxy: Nginx
  • Firewalls involved: n/m

Any additional Information

The deployment uses Helm for both the database and Rocket.Chat: https://github.com/helm/charts/tree/master/stable/rocketchat

The migration process was: dump the database, reload the database on the new site, and bring up the RC replicas.
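
For anyone trying to reproduce this, the dump-and-reload flow can be sketched roughly as below. This is only an illustration: it assumes the standard mongodump and mongorestore tools, and the host names, the database name "rocketchat", and the archive file name are placeholders rather than the values from this deployment. The helper only builds and prints the commands; it does not run them.

```python
import shlex


def migration_commands(source_uri, target_uri, db="rocketchat",
                       archive="rocketchat.archive.gz"):
    """Build the dump and restore commands as argument lists; nothing is executed."""
    # Dump a single database from the old site into one gzipped archive file.
    dump = [
        "mongodump",
        f"--uri={source_uri}",
        f"--db={db}",
        f"--archive={archive}",
        "--gzip",
    ]
    # Restore only that database's namespaces on the new site.
    restore = [
        "mongorestore",
        f"--uri={target_uri}",
        f"--nsInclude={db}.*",
        f"--archive={archive}",
        "--gzip",
    ]
    return [dump, restore]


# Placeholder URIs (assumptions), not the real hosts.
commands = migration_commands(
    "mongodb://old-mongo:27017",
    "mongodb://new-mongo:27017",
)
for cmd in commands:
    print(shlex.join(cmd))
```

The Rocket.Chat replicas would only be brought up after the restore has finished.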

Post migration the only odd errors I saw were
StreamBroadcast ➔ Stream.error Stream broadcast from '10.61.25.47:3000' to '10.61.17.198:3000/' with name notify-room not authorized

and these have subsided.

In addition, I also found:

Error in oplog callback TypeError: Cannot read property 'u' of undefined
at BaseDb.Subscriptions.on (server/publications/subscription/emitter.js:19:39)
at emitOne (events.js:116:13)
at BaseDb.emit (events.js:211:7)
at BaseDb.processOplogRecord (app/models/server/models/_BaseDb.js:157:9)
at packages/mongo/oplog_tailing.js:105:7
at runWithEnvironment (packages/meteor.js:1356:24)
at Object.callback (packages/meteor.js:1369:14)
at packages/ddp-server/crossbar.js:114:36
at Array.forEach (<anonymous>)
at Function._.each._.forEach (packages/underscore.js:139:11)
at DDPServer._Crossbar.fire (packages/ddp-server/crossbar.js:112:7)
at handleDoc (packages/mongo/oplog_tailing.js:311:24)
at packages/mongo/oplog_tailing.js:337:11
at Meteor.EnvironmentVariable.EVp.withValue (packages/meteor.js:1304:12)
at packages/meteor.js:620:25
at runWithEnvironment (packages/meteor.js:1356:24)

Further troubleshooting tonight involved scaling the rocketchat app down to 0 replicas and then bringing it back up. 2 out of the 3 pods logged the following error:

{ MongoError: ns not found
    at /app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/lib/connection/pool.js:581:63
    at authenticateStragglers (/app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/lib/connection/pool.js:504:16)
    at Connection.messageHandler (/app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/lib/connection/pool.js:540:5)
    at emitMessageHandler (/app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/lib/connection/connection.js:310:10)
    at Socket.<anonymous> (/app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/lib/connection/connection.js:453:17)
    at emitOne (events.js:116:13)
    at Socket.emit (events.js:211:7)
    at addChunk (_stream_readable.js:263:12)
    at readableAddChunk (_stream_readable.js:250:11)
    at Socket.Readable.push (_stream_readable.js:208:10)
    at TCP.onread (net.js:597:20)
  operationTime: Timestamp { _bsontype: 'Timestamp', low_: 15, high_: 1569381919 },
  ok: 0,
  errmsg: 'ns not found',
  code: 26,
  codeName: 'NamespaceNotFound',
  '$clusterTime':
   { clusterTime: Timestamp { _bsontype: 'Timestamp', low_: 17, high_: 1569381919 },
     signature: { hash: [Object], keyId: [Object] } },
  name: 'MongoError',
  [Symbol(mongoErrorContextSymbol)]: {} }
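
The scale-to-zero restart mentioned above can be sketched roughly as follows. This is an illustration only: it assumes Rocket.Chat runs as a Kubernetes Deployment, and the deployment name "rocketchat-rocketchat" and namespace "chat" are hypothetical placeholders, not the names used in this cluster. The helper builds and prints the kubectl commands without running them.

```python
import shlex


def scale_command(deployment, replicas, namespace):
    """Build a `kubectl scale` command as an argument list; nothing is executed."""
    return [
        "kubectl", "scale", f"deployment/{deployment}",
        f"--replicas={replicas}",
        "--namespace", namespace,
    ]


def restart_commands(deployment, replicas, namespace):
    """Scale down to 0 first, then back up to the desired replica count."""
    return [
        scale_command(deployment, 0, namespace),
        scale_command(deployment, replicas, namespace),
    ]


# Hypothetical deployment name and namespace (assumptions).
steps = restart_commands("rocketchat-rocketchat", 3, "chat")
for step in steps:
    print(shlex.join(step))
```

Waiting for all pods to terminate between the two commands makes sure every instance starts fresh.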