Server Setup Information:
The version of Rocket.Chat Server: 0.63.3
Operating System: Ubuntu 16.04
Deployment Method: PM2
Number of Running Instances: 14
DB Replicaset Oplog: Yes, 5 members, one of which is an arbiter.
MongoDB Version: 3.2.X
RocketChat statistics:
Total Users 7116
Active Users 6174
Inactive Users 942
Online Users 342
Away Users 344
Offline Users 6430
Total Rooms 49149
Total Channels 44
Total Private Groups 1917
Total Direct Message Rooms 47188
Total Livechat Rooms 47188
Total Messages 2399787
Total Messages in Channels 5345
Total Messages in Private Groups 416826
Total Messages in Direct Messages 1975032
With 7000+ total users, moderate chat and file upload activity by 300-900 users brings the Rocket.Chat installation to an unusable state, and many 502 errors are returned. On the infrastructure side, this is how requests flow:
Users <--> AWS ALB (SSL termination) <--> RC nodes
Here’s the 5XX report from ALB
In the logs we get:
[Node processes]
Warning: connect.session() MemoryStore is not
designed for a production environment, as it will leak
memory, and will not scale past a single process.
Wed, 17 Oct 2018 02:35:13 GMT connect deprecated multipart: use parser (multiparty, busboy, formidable) npm module instead at npm/node_modules/connect/lib/middleware/bodyParser.js:56:20
Wed, 17 Oct 2018 02:35:13 GMT connect deprecated limit: Restrict request size at location of read at npm/node_modules/connect/lib/middleware/multipart.js:86:15
You have triggered an unhandledRejection, you may have forgotten to catch a Promise rejection:
MongoError: ns not found
at Function.MongoError.create (/home/deploy/channels/releases/20180719040156/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/lib/error.js:31:11)
at /home/deploy/channels/releases/20180719040156/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/lib/connection/pool.js:497:72
at authenticateStragglers (/home/deploy/channels/releases/20180719040156/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/lib/connection/pool.js:443:16)
at Connection.messageHandler (/home/deploy/channels/releases/20180719040156/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/lib/connection/pool.js:477:5)
at Socket.<anonymous> (/home/deploy/channels/releases/20180719040156/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/lib/connection/connection.js:333:22)
at emitOne (events.js:116:13)
at Socket.emit (events.js:211:7)
at addChunk (_stream_readable.js:263:12)
at readableAddChunk (_stream_readable.js:250:11)
at Socket.Readable.push (_stream_readable.js:208:10)
at TCP.onread (net.js:597:20)
Error: socket hang up
at createHangUpError (_http_client.js:331:15)
at Socket.socketOnEnd (_http_client.js:423:23)
at emitNone (events.js:111:20)
at Socket.emit (events.js:208:7)
at endReadableNT (_stream_readable.js:1064:12)
at _combinedTickCallback (internal/process/next_tick.js:138:11)
at process._tickDomainCallback (internal/process/next_tick.js:218:9)
Error: socket hang up
at createHangUpError (_http_client.js:331:15)
at Socket.socketCloseListener (_http_client.js:363:23)
at emitOne (events.js:121:20)
at Socket.emit (events.js:211:7)
at TCP._handle.close [as _onclose] (net.js:557:12)
[Nginx]
2018-10-17T12:58:26.239590Z XXX-production-app10 nginx - - - 2018/10/17 12:58:26 [error] 15374#15374: *34349 recv() failed (104: Connection reset by peer) while proxying upgraded connection, client: 172.17.37.220, server: XXXXXX, request: "GET /sockjs/008/vcqc9hke/websocket HTTP/1.1", upstream: "x ",
xxxx
<14>1 2018-10-17T12:46:26.302082Z xxxx-production-app5 nginx - - - 2018/10/17 12:46:26 [error] 26789#26789: *1447807 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 172.17.108.56, server: xxxxxxxxx ; /AmazonS3:Uploads/e9D2Z4NzGWdJavrfG?token=d97AB9993b&progress=0.6322451184687814 HTTP/1.1", upstream: "AmazonS3:Uploads/e9D2Z4NzGWdJavrfG?token=d97AB9993b&progress=0.6322451184687814", host: "xxxxx", referrer: "
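In case it helps to see which of these nginx errors dominates, here is a minimal Python sketch for tallying error lines by their errno part (for example 104 vs. 110). The function name `tally_nginx_errors` and the sample lines are only illustrative assumptions based on the two log lines above; this is not something Rocket.Chat or nginx ships.

```python
import re
from collections import Counter

# Matches the errno-style part of an nginx error line, e.g.
# "(104: Connection reset by peer)" or "(110: Connection timed out)".
ERRNO_RE = re.compile(r"\((\d+): ([^)]+)\)")


def tally_nginx_errors(lines):
    """Count nginx [error] lines by (errno, message). Hypothetical helper."""
    counts = Counter()
    for line in lines:
        if "[error]" not in line:
            continue  # skip non-error lines
        match = ERRNO_RE.search(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Shortened versions of the two log lines above, used as sample input.
    sample = [
        "2018/10/17 12:58:26 [error] 15374#15374: *34349 recv() failed "
        "(104: Connection reset by peer) while proxying upgraded connection",
        "2018/10/17 12:46:26 [error] 26789#26789: *1447807 upstream timed out "
        "(110: Connection timed out) while reading response header from upstream",
    ]
    for (errno, message), count in tally_nginx_errors(sample).most_common():
        print(errno, message, count)
```

Running it over the full error log of each node should show whether connection resets on websocket connections or upstream timeouts on uploads are the larger share.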
I hope a Rocket.Chat expert can help.