Sudden 502 error

Description

Today suddenly rocket.chat stopped working with 502 server error. I presume it was caused by snap auto-refresh.
I tried solving it on my own: mongo showed errors similar to described here so I tried prescribed solution. I had to use mongo 3.6 but repair worked and those errors are now gone.
However server doesn’t work nonetheless and still shows me error 502.

mongo and caddy status seem okay:

sudo service snap.rocketchat-server.rocketchat-mongo status

snap.rocketchat-server.rocketchat-mongo.service - Service for snap application rocketchat-server.rocketchat-mongo
   Loaded: loaded (/etc/systemd/system/snap.rocketchat-server.rocketchat-mongo.service; enabled; vendor preset: enabled)
   Active: active (running) since Вт 2020-05-12 23:35:23 MSK; 15min ago
 Main PID: 1098 (sh)
    Tasks: 70
   Memory: 261.3M
      CPU: 7.759s
   CGroup: /system.slice/snap.rocketchat-server.rocketchat-mongo.service
           ├─1098 /bin/sh /snap/rocketchat-server/1427/bin/startmongo
           └─1297 mongod --bind_ip 127.0.0.1 --pidfilepath /var/snap/rocketchat-server/common/mongod.pid --smallfiles --journal --dbpath=/var/snap/rocketchat-server/common --replSet rs0

май 12 23:35:34 rocket rocketchat-server.rocketchat-mongo[1098]: 2020-05-12T23:35:34.745+0300 I REPL     [replexec-0] Starting replication reporter thread
май 12 23:35:34 rocket rocketchat-server.rocketchat-mongo[1098]: 2020-05-12T23:35:34.747+0300 I REPL     [rsSync] transition to SECONDARY from RECOVERING
май 12 23:35:34 rocket rocketchat-server.rocketchat-mongo[1098]: 2020-05-12T23:35:34.747+0300 I REPL     [rsSync] conducting a dry run election to see if we could be elected. current term: 41
май 12 23:35:34 rocket rocketchat-server.rocketchat-mongo[1098]: 2020-05-12T23:35:34.747+0300 I REPL     [replexec-0] dry election run succeeded, running for election in term 42
май 12 23:35:34 rocket rocketchat-server.rocketchat-mongo[1098]: 2020-05-12T23:35:34.748+0300 I REPL     [replexec-0] election succeeded, assuming primary role in term 42
май 12 23:35:34 rocket rocketchat-server.rocketchat-mongo[1098]: 2020-05-12T23:35:34.749+0300 I REPL     [replexec-0] transition to PRIMARY from SECONDARY
май 12 23:35:34 rocket rocketchat-server.rocketchat-mongo[1098]: 2020-05-12T23:35:34.749+0300 I REPL     [replexec-0] Resetting sync source to empty, which was :27017
май 12 23:35:34 rocket rocketchat-server.rocketchat-mongo[1098]: 2020-05-12T23:35:34.749+0300 I REPL     [replexec-0] Entering primary catch-up mode.
май 12 23:35:34 rocket rocketchat-server.rocketchat-mongo[1098]: 2020-05-12T23:35:34.750+0300 I REPL     [replexec-0] Exited primary catch-up mode.
май 12 23:35:36 rocket rocketchat-server.rocketchat-mongo[1098]: 2020-05-12T23:35:36.749+0300 I REPL     [rsSync] transition to primary complete; database writes are now permitted

sudo service snap.rocketchat-server.rocketchat-caddy status

● snap.rocketchat-server.rocketchat-caddy.service - Service for snap application rocketchat-server.rocketchat-caddy
   Loaded: loaded (/etc/systemd/system/snap.rocketchat-server.rocketchat-caddy.service; enabled; vendor preset: enabled)
   Active: active (running) since Вт 2020-05-12 23:35:23 MSK; 16min ago
 Main PID: 1105 (caddy)
    Tasks: 10
   Memory: 29.4M
      CPU: 1.632s
   CGroup: /system.slice/snap.rocketchat-server.rocketchat-caddy.service
           └─1105 caddy -conf=/var/snap/rocketchat-server/1427/Caddyfile

май 12 23:35:23 rocket systemd[1]: Started Service for snap application rocketchat-server.rocketchat-caddy.
май 12 23:35:25 rocket rocketchat-server.rocketchat-caddy[1105]: Activating privacy features... done.
май 12 23:35:25 rocket rocketchat-server.rocketchat-caddy[1105]: Serving HTTPS on port 443
май 12 23:35:25 rocket rocketchat-server.rocketchat-caddy[1105]: https://rocket.rtc.ru
май 12 23:35:25 rocket rocketchat-server.rocketchat-caddy[1105]: Serving HTTP on port 80
май 12 23:35:25 rocket rocketchat-server.rocketchat-caddy[1105]: http://rocket.rtc.ru
май 12 23:35:25 rocket rocketchat-server.rocketchat-caddy[1105]: WARNING: File descriptor limit 1024 is too low for production servers. At least 8192 is recommended. Fix with `ulimit -n 8192`.

However rocketchat-server is failed:

sudo service snap.rocketchat-server.rocketchat-server status

● snap.rocketchat-server.rocketchat-server.service - Service for snap application rocketchat-server.rocketchat-server
   Loaded: loaded (/etc/systemd/system/snap.rocketchat-server.rocketchat-server.service; enabled; vendor preset: enabled)
   Active: failed (Result: start-limit-hit) since Ср 2020-05-13 00:03:51 MSK; 2s ago
  Process: 7051 ExecStart=/usr/bin/snap run rocketchat-server (code=exited, status=1/FAILURE)
 Main PID: 7051 (code=exited, status=1/FAILURE)

май 13 00:03:51 rocket systemd[1]: snap.rocketchat-server.rocketchat-server.service: Unit entered failed state.
май 13 00:03:51 rocket systemd[1]: snap.rocketchat-server.rocketchat-server.service: Failed with result 'exit-code'.
май 13 00:03:51 rocket systemd[1]: snap.rocketchat-server.rocketchat-server.service: Service hold-off time over, scheduling restart.
май 13 00:03:51 rocket systemd[1]: Stopped Service for snap application rocketchat-server.rocketchat-server.
май 13 00:03:51 rocket systemd[1]: snap.rocketchat-server.rocketchat-server.service: Start request repeated too quickly.
май 13 00:03:51 rocket systemd[1]: Failed to start Service for snap application rocketchat-server.rocketchat-server.
май 13 00:03:51 rocket systemd[1]: snap.rocketchat-server.rocketchat-server.service: Unit entered failed state.
май 13 00:03:51 rocket systemd[1]: snap.rocketchat-server.rocketchat-server.service: Failed with result 'start-limit-hit'.

Trying to look in the logs shows almost nothing:

sudo journalctl -f -u snap.rocketchat-server.rocketchat-server

-- Logs begin at Вт 2020-05-12 23:35:21 MSK. --
май 13 00:03:51 rocket rocketchat-server.rocketchat-server[7051]:     at tryModuleLoad (module.js:506:12)
май 13 00:03:51 rocket systemd[1]: snap.rocketchat-server.rocketchat-server.service: Main process exited, code=exited, status=1/FAILURE
май 13 00:03:51 rocket systemd[1]: snap.rocketchat-server.rocketchat-server.service: Unit entered failed state.
май 13 00:03:51 rocket systemd[1]: snap.rocketchat-server.rocketchat-server.service: Failed with result 'exit-code'.
май 13 00:03:51 rocket systemd[1]: snap.rocketchat-server.rocketchat-server.service: Service hold-off time over, scheduling restart.
май 13 00:03:51 rocket systemd[1]: Stopped Service for snap application rocketchat-server.rocketchat-server.
май 13 00:03:51 rocket systemd[1]: snap.rocketchat-server.rocketchat-server.service: Start request repeated too quickly.
май 13 00:03:51 rocket systemd[1]: Failed to start Service for snap application rocketchat-server.rocketchat-server.
май 13 00:03:51 rocket systemd[1]: snap.rocketchat-server.rocketchat-server.service: Unit entered failed state.
май 13 00:03:51 rocket systemd[1]: snap.rocketchat-server.rocketchat-server.service: Failed with result 'start-limit-hit'.

I tried doing snap refresh to 2.4.12 and snap revert to 2.4.11 a couple of times and on the 2.4.12 version stumbled upon this error (but wasn’t able to see it fully since than, looks like logs were cut… somehow)

Still, one line at tryModuleLoad (module.js:506:12) matches so I guess that’s the cause:

-- Logs begin at Вт 2020-05-12 22:49:56 MSK, end at Вт 2020-05-12 22:53:21 MSK. --
май 12 22:49:59 rocket systemd[1]: Started Service for snap application rocketchat-server.rocketchat-server.
май 12 22:50:00 rocket rocketchat-server.rocketchat-server[1066]: Mongo is not available, can't start. Waiting 10 seconds and trying again
май 12 22:50:10 rocket rocketchat-server.rocketchat-server[1066]: Checking if oplog has been enabled, and enabling if not
май 12 22:50:11 rocket rocketchat-server.rocketchat-server[1066]: MongoDB shell version v3.4.20
май 12 22:50:11 rocket rocketchat-server.rocketchat-server[1066]: connecting to: mongodb://127.0.0.1:27017
май 12 22:50:11 rocket rocketchat-server.rocketchat-server[1066]: MongoDB server version: 3.4.20
май 12 22:50:13 rocket rocketchat-server.rocketchat-server[1066]: fs.js:646
май 12 22:50:13 rocket rocketchat-server.rocketchat-server[1066]:   return binding.open(pathModule._makeLong(path), stringToFlags(flags), mode);
май 12 22:50:13 rocket rocketchat-server.rocketchat-server[1066]:                  ^
май 12 22:50:13 rocket rocketchat-server.rocketchat-server[1066]: Error: ENOENT: no such file or directory, open '/snap/rocketchat-server/1436/star.json'
май 12 22:50:13 rocket rocketchat-server.rocketchat-server[1066]:     at Object.fs.openSync (fs.js:646:18)
май 12 22:50:13 rocket rocketchat-server.rocketchat-server[1066]:     at Object.fs.readFileSync (fs.js:551:33)
май 12 22:50:13 rocket rocketchat-server.rocketchat-server[1066]:     at Object.<anonymous> (/snap/rocketchat-server/1436/programs/server/boot.js:38:32)
май 12 22:50:13 rocket rocketchat-server.rocketchat-server[1066]:     at Object.<anonymous> (/snap/rocketchat-server/1436/programs/server/boot.js:466:3)
май 12 22:50:13 rocket rocketchat-server.rocketchat-server[1066]:     at Module._compile (module.js:653:30)
май 12 22:50:13 rocket rocketchat-server.rocketchat-server[1066]:     at Module.Mp._compile (/snap/rocketchat-server/1436/programs/server/runtime.js:50:23)
май 12 22:50:13 rocket rocketchat-server.rocketchat-server[1066]:     at Object.Module._extensions..js (module.js:664:10)
май 12 22:50:13 rocket rocketchat-server.rocketchat-server[1066]:     at Module.load (module.js:566:32)
май 12 22:50:13 rocket rocketchat-server.rocketchat-server[1066]:     at Module.Mp.load (/snap/rocketchat-server/1436/programs/server/runtime.js:15:31)
май 12 22:50:13 rocket rocketchat-server.rocketchat-server[1066]:     at tryModuleLoad (module.js:506:12)
май 12 22:50:13 rocket systemd[1]: e[0;1;39msnap.rocketchat-server.rocketchat-server.service: Main process exited, code=exited, status=1/FAILURE
май 12 22:50:13 rocket systemd[1]: e[0;1;39msnap.rocketchat-server.rocketchat-server.service: Unit entered failed state.
май 12 22:50:13 rocket systemd[1]: e[0;1;39msnap.rocketchat-server.rocketchat-server.service: Failed with result 'exit-code'.
май 12 22:50:13 rocket systemd[1]: snap.rocketchat-server.rocketchat-server.service: Service hold-off time over, scheduling restart.
май 12 22:50:13 rocket systemd[1]: Stopped Service for snap application rocketchat-server.rocketchat-server.
май 12 22:50:13 rocket systemd[1]: Started Service for snap application rocketchat-server.rocketchat-server.
май 12 22:50:13 rocket rocketchat-server.rocketchat-server[1650]: Checking if oplog has been enabled, and enabling if not
май 12 22:50:13 rocket rocketchat-server.rocketchat-server[1650]: MongoDB shell version v3.4.20
май 12 22:50:13 rocket rocketchat-server.rocketchat-server[1650]: connecting to: mongodb://127.0.0.1:27017
май 12 22:50:13 rocket rocketchat-server.rocketchat-server[1650]: MongoDB server version: 3.4.20
май 12 22:50:14 rocket rocketchat-server.rocketchat-server[1650]: fs.js:646

When I’m on 2.4.11, server status is inactive with no obvious reason:

sudo service snap.rocketchat-server.rocketchat-server status

● snap.rocketchat-server.rocketchat-server.service - Service for snap application rocketchat-server.rocketchat-server
   Loaded: loaded (/etc/systemd/system/snap.rocketchat-server.rocketchat-server.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Вт 2020-05-12 23:04:52 MSK; 3min 48s ago
  Process: 1104 ExecStart=/usr/bin/snap run rocketchat-server (code=exited, status=0/SUCCESS)
 Main PID: 1104 (code=exited, status=0/SUCCESS)

май 12 23:04:50 rocket systemd[1]: Started Service for snap application rocketchat-server.rocketchat-server.

logs are almost empty

sudo journalctl -f -u snap.rocketchat-server.rocketchat-server

-- Logs begin at Вт 2020-05-12 23:35:21 MSK. --
май 12 23:35:23 rocket systemd[1]: Started Service for snap application rocketchat-server.rocketchat-server.

And that’s it.

Server Setup Information

  • Version of Rocket.Chat Server: 2.4.11 or 2.4.12
  • Operating System: Ubuntu 16.04 LTS
  • Deployment Method: snap
  • Number of Running Instances: 1
  • DB Replicaset Oplog: ?
  • NodeJS Version: ?
  • MongoDB Version: 3.4.2
  • Proxy: not sure
  • Firewalls involved: not sure

We have the same issue. I have tried

sudo snap revert rocketchat-server

but it is still not working. The Mongo service will now run but the server will not.

sudo service snap.rocketchat-server.rocketchat-server status

   Loaded: loaded (/etc/systemd/system/snap.rocketchat-server.rocketchat-server.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Tue 2020-05-12 21:19:23 UTC; 3s ago
  Process: 10381 ExecStart=/usr/bin/snap run rocketchat-server (code=exited, status=0/SUCCESS)
 Main PID: 10381 (code=exited, status=0/SUCCESS)

sudo service snap.rocketchat-server.rocketchat-mongo status

   Loaded: loaded (/etc/systemd/system/snap.rocketchat-server.rocketchat-mongo.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2020-05-12 20:59:40 UTC; 24min ago

sudo service snap.rocketchat-server.rocketchat-caddy status

   Loaded: loaded (/etc/systemd/system/snap.rocketchat-server.rocketchat-caddy.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2020-05-12 21:24:20 UTC; 2s ago

I see now that sudo snap refresh rocketchat-server is coming up with snap "rocketchat-server" has no updates available. Is someone making changes to the latest snap that was released?

same here but our mongo dont work

Same here. Ubuntu 18, snap rocketchat-server 2.4.12 , error in mongodb startup

txn-recover: unsupported WiredTiger file version: this build only supports major/minor versions up to 1/0, and the file is version 2/0: WT_ERROR: non-specific WiredTiger error

Did not try to revert to previous version. All services down :frowning:

I was able to repair mongo errors as suggested here but apparently that’s not enough, so I’m not advising it.

OK this means I am not crazy. Glad to see others are having the same problem and its not just me. Also looks like 2.4.12 has been pulled from snap as of minutes ago.

1 Like

“not enough”… What happened after you did the repair? Mongo didn’t start?

if you are not 100% sure what to do , you can spend 1000 of hours in that …
i just hope for the team

… my Company just start to love this Server :sob:

Mongo did start, but rocketchat-server is now failing to open some file or something like that.

thats what i mean … its impossible

propably its an “Upgrade” and our server got that

You need to revert to 2.4.11. You can find the fix here: https://github.com/RocketChat/Rocket.Chat/issues/17628

These are the steps to revert to previous version (From ubuntu-snap channel):

snap revert rocketchat-server # Revert to 2.4.11
sudo snap run --shell rocketchat-server
snapctl get snap-refreshing #which returned "true"
snapctl set snap-refreshing=false
exit

Then I restarted the service:

sudo systemctl restart snap.rocketchat-server.rocketchat-server

I freaked out for 3 hours trying to fix my server. All good now!

3 Likes

Hhmmm… but other users above did try a revert and didn’t succeed, right?

Correct, reverted to 2.4.11 and logs show everything started up, however rocketchat.server will not go “active” shown in “snap info rocketchat-server”. Nothing else shows up in the logs that would mention a problem.

Hmm - I did just this and it came right. Weird, I have been fighting this issue for hours and thought I had done this or something similar but obviously not. Anyway happy to have my users back on and everyone off my back for now.

Worked for me as well! Thank you!

I’m glad to see there is a fix that doesn’t require a restore, however since we couldn’t find anyone else having the issue and I had a snap rocket.chat backup that occurred prior to the upgrade from 2.4.11 to 2.4.12, I did a revert and then a restore to fix our issue. However, one other thing I did prior to starting the rocketchat-server service was to change the channel to 3.x/stable. Then once I started the service, I did a refresh to bring the server up to 3.2.2 to keep it from trying to update to 2.4.12 again.

Set backups to occur before refresh:
sudo snap set rocketchat-server backup-on-refresh=enable

1 Like

A refresh should now also solve. We have reverted the revision at the store level too. I’d recommend looking at Introducing SNAP Tracks! also to lock into a track. Cause when we goto 3.x we want to prevent this… but could happen. We want to make sure everyone has a chance to backup prior to a big upgrade

Reverting for me did not work, but switching channel to 3.x/stable and refreshing worked perfectly. Thank you!

I was in the exact same boat. Thanks a ton!

Today when trying to fix the 502 on our server, I saved my snap:

image

After some failed attemps to fix it, I want to restore it the saved snap, I ran “sudo snap restore 1”. It takes around 15 minutes processing, but when it finishes, nothing happens.

If I ran sudo snap list after running restore, I only have core:

Name Version Rev Tracking Publisher Notes
core 16-2.44.3 9066 latest/stable canonical✓ core

What should be the process to restore the saved snapshot? So I can revert to 2.4.11