Docker deployment not starting after upgrade from 4.2.0 to 4.2.1/4.2.2/4.3.0

Description

I’m using the official Docker Hub images for MongoDB and Rocket.Chat. I’m not using docker-compose (I should, I know, but it’s not related to the issue); the setup is similar, only done with basic docker commands. It works on 4.2.0 (and on many earlier versions; the setup is old), but updating the Rocket.Chat image (tried 4.2.1, 4.2.2 and 4.2.3) breaks it.

When running the service, it writes nothing to “docker logs” and, after a moment of silence, just shuts down. In the MongoDB logs, I can see that it makes the normal number of connections and authenticates to the DB successfully, but does nothing after that. Reverting to the 4.2.0 image fixes the issue.

Any tips on where to look for the cause?

Server Setup Information

  • Version of Rocket.Chat Server: 4.2.0 (works), 4.2.1+ (fails)
  • Operating System: CentOS 7
  • Deployment Method: Docker
  • Number of Running Instances: 1
  • DB Replicaset Oplog:
  • NodeJS Version: 12.22.1 - x64
  • MongoDB Version: 4.2.17 (wiredTiger)
  • Proxy: nginx
  • Firewalls involved: no

Any additional Information

Updated from mmap to wiredTiger a while ago, but the setup has been working since then, through a couple of rocket.chat versions.

Containers are in a docker network, communicating via docker hostnames. Mongo has auth enabled on a separate user (rocket).

Mongo launched with args: --storageEngine wiredTiger --auth --replSet rs0

Rocket env-variables set via docker:
-e "ROOT_URL=https://-domain-here-"
-e "MONGO_URL=mongodb://rocket:-password-here-@mongo-db:27017/meteor"
-e "MONGO_OPLOG_URL=mongodb://rocket:-password-here-@mongo-db:27017/local?authSource=meteor&replSet=rs0"
-e "USE_NATIVE_OPLOG=true"
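
For readers reproducing this setup, here is a rough sketch of how the variables above might be passed with a plain `docker run`. The container name, the network name `rocket-net`, the `ROCKET_DB_PASSWORD` variable, the example domain and the `rocket.chat:<tag>` image reference are assumptions for illustration; they are not taken from the original post. Note the straight double quotes: the `&` in `MONGO_OPLOG_URL` must be quoted, or the shell treats it as a background operator.

```shell
#!/bin/sh
# Sketch only. Hypothetical names: rocketchat (container), rocket-net
# (docker network), ROCKET_DB_PASSWORD (env var holding the DB password),
# chat.example.com (domain). Adjust all of them to the real setup.
start_rocketchat() {
    image_tag="$1"                      # e.g. 4.2.0
    db_pass="$ROCKET_DB_PASSWORD"

    docker run -d \
        --name rocketchat \
        --network rocket-net \
        -e "ROOT_URL=https://chat.example.com" \
        -e "MONGO_URL=mongodb://rocket:${db_pass}@mongo-db:27017/meteor" \
        -e "MONGO_OPLOG_URL=mongodb://rocket:${db_pass}@mongo-db:27017/local?authSource=meteor&replSet=rs0" \
        -e "USE_NATIVE_OPLOG=true" \
        "rocket.chat:${image_tag}"
}
```

Calling `start_rocketchat 4.2.0` and then `start_rocketchat 4.2.1` (after removing the first container) would be one way to switch between the working and the failing image with otherwise identical settings.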

Hi! Welcome to our forums!

Have you tried the latest 4.3.0? (The title says one thing, the Server Setup info says another.)

It’s really strange that it doesn’t spill any logs, even if this were an auth problem.

I would suggest moving to docker-compose, just to make sure, and creating a staging environment to test with the latest version.

Thanks for the reply. I did try 4.3.0 as well and the result was the same.

I guess I’ll move to compose soon anyhow, but I don’t see it affecting this. It would be more maintainable at any rate. Maybe I should tinker with it a bit and launch the app manually inside the container to see if it is more verbose.

Tried running the container with bash as entrypoint (now testing on 4.3.0).
The environment variables within look fine to me:

I have no name!@rocket_test:/app/bundle$ printenv
HOSTNAME=rocket_test
PWD=/app/bundle
PORT=3000
NODE_ENV=production
Accounts_AvatarStorePath=/app/uploads
DEPLOY_METHOD=docker-official
RC_VERSION=4.3.0
HOME=/tmp
TERM=xterm
SHLVL=1
ROOT_URL=https://my.domain.here
MONGO_OPLOG_URL=mongodb://rocket:password_censored@mongo-db:27017/local?authSource=meteor&replSet=rs0
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
MONGO_URL=mongodb://rocket:password_censored@mongo-db:27017/meteor
NODE_VERSION=12.22.8
USE_NATIVE_OPLOG=true
_=/usr/bin/printenv

When run manually, the app also dies after a silent while. There is an error, but it is not very verbose:

I have no name!@rocket_test:/app/bundle$ node main.js
Illegal instruction
I have no name!@rocket_test:/app/bundle$

Edit:
I tried a few different tweaks to the config, emulating the official compose file (docker-compose.yml in RocketChat/Docker.Official.Image on GitHub). There were some differences, but changing them didn’t change the result. It just dies with “Illegal instruction”. Is there a way to get more logging?
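
One small thing that can help when a process dies this quietly is its exit status. By shell convention, a status above 128 means the process was killed by a signal, and the signal number is the status minus 128, so an “Illegal instruction” crash (SIGILL, signal 4) normally shows up as 132. The helper below is a minimal sketch of that decoding; the function name is made up for illustration.

```shell
#!/bin/sh
# describe_exit_status STATUS
# Turns a numeric exit status into a short human-readable description.
describe_exit_status() {
    status="$1"
    if [ "$status" -eq 0 ]; then
        echo "exited normally"
    elif [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        # kill -l maps a signal number to its name (without the SIG prefix)
        name="$(kill -l "$sig" 2>/dev/null)" || name="unknown"
        echo "killed by signal ${sig} (SIG${name})"
    else
        echo "exited with error code ${status}"
    fi
}
```

Inside the container this could be used as `node main.js; describe_exit_status $?`. From the host, `docker inspect --format '{{.State.ExitCode}}' rocket_test` prints the exit code of a stopped container, which can be passed to the same helper.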

There were some segmentation fault issues with CentOS recently.

Not sure if it’s related. The no-log situation is already strange, so it’s worth looking into.
Docker should isolate this kind of stuff.

For reference:

I think you’re on to something. The fix is merged on 4.2.1 - exactly where my troubles started and the CentOS 7 matches as well. The fix really just updates the sharp library (and anything behind it of course). A bit of googling indicates that sharp has given Illegal Instruction errors in the past.

Digging through their GitHub, I found this recent issue: “Getting "Illegal instruction" error when running sharp.js in Docker / Node container” (lovell/sharp issue #3030).

My server CPU is almost exactly the same as the one there (AMD Phenom II X6 1100T). The new sharp lib just doesn’t support it on Linux x64 apparently. Performance-wise it’s still more than enough, but I guess 10 years is considered a long time these days. Well, it’s been a loyal workhorse for a good while…

Edit: in case others come to you wondering about similar issues with a CPU that is 10+ years old, you can refer them to that issue, as well as this one: “version 0.28.0 and later is not compatible with Intel Wolfdale E5xxx CPU” (lovell/sharp issue #2723). There are workarounds there as well, but I don’t know how much support you want to maintain for old hardware.
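
For anyone who wants to check a host before upgrading, a quick look at the CPU flags in `/proc/cpuinfo` can show whether a given instruction set is present. The sketch below is only that, a sketch: the function names are made up, and the default flag list (`sse4_2`, `avx`) is an assumption about which extensions matter, so check it against the install notes of the software in question.

```shell
#!/bin/sh
# has_cpu_flag FLAG [CPUINFO_FILE]
# Succeeds if FLAG appears as a whole word in the first "flags" line.
has_cpu_flag() {
    flag="$1"
    cpuinfo="${2:-/proc/cpuinfo}"
    grep '^flags' "$cpuinfo" | head -n 1 \
        | tr -s '[:space:]' '\n' | grep -Fqx "$flag"
}

# report_cpu_flags [FLAG...]
# Prints one line per flag; the default list is an assumption, see above.
report_cpu_flags() {
    [ "$#" -gt 0 ] || set -- sse4_2 avx
    for f in "$@"; do
        if has_cpu_flag "$f"; then
            echo "$f: present"
        else
            echo "$f: missing"
        fi
    done
}
```

Running `report_cpu_flags` on the Docker host (containers share the host CPU) lists the default flags; other flag names can be passed as arguments.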

As a follow-up: I got some new hardware and, indeed, the exact same setup now works (with the same db, etc.). So this was a case of CPU incompatibility.

I’ve just decided to finally upgrade my rchat deployment to the latest version, a few months after my last attempt. Back then I did not have time to dig deep into the issue, so I decided to stick with 4.2.0 for the time being. Now that a warning popped up about a recent CVE, I thought it was high time I updated. Firing up a new container with 4.2.1/4.2.2/4.3.0 was unsuccessful. Similar to OP, I was getting no error messages while starting the container; bash’ing into it revealed ‘illegal instruction (core dumped)’.

My CPU is a Phenom II X4 965, which does not support AVX. My question is: do I really have to upgrade to a newer CPU to continue using rocket.chat?

This is harsh… :(

As far as I can tell, there doesn’t seem to be much else we can do as users. The Linux x64 binaries for sharp won’t work with CPUs as old as the Phenom II. It doesn’t seem like they’re interested in providing that support (it was intentionally removed), so there aren’t a lot of options. I guess you could try manually replacing the sharp lib with an older one to buy more time, but it’s likely quite a hassle and not very maintainable either.

Unfortunately that’s something beyond our control.

Also consider that we’ll have more situations like this. For instance, Mongo 5.X, like sharp, also seems to have this constraint: