Migrating 3.18.7 from snap to Docker loses all uploads (GridFS)

Description

We have a snap installation of Rocket.Chat that has been running nicely for the past few years (a few hundred users), and I need to move it to our Docker swarm. The snap installation is on 3.18.7, and I started with the documented procedure: stopping the snap, creating a MongoDB backup, launching a Docker container with the very same version, stopping the Rocket.Chat service, restoring the database, etc., and then followed this upgrade path:

3.18.7 -> 4.0.0 -> 4.8.6 -> 5.0.0 -> 5.4.3

With this Docker setup:

services:
  rocketchat:
    <<: *default-opts
    image: registry.rocket.chat/rocketchat/rocket.chat:3.18.7
    environment:
      MONGO_URL: "mongodb://mongodb:27017/rocketchat?replicaSet=rs0"
      MONGO_OPLOG_URL: "mongodb://mongodb:27017/local?replicaSet=rs0"

It worked nicely (or so I thought), but now I realise that all the uploads are missing. They show a “Retry” button and that’s it.

I restarted the process, and the uploads are already lost when moving from 3.18.7 (snap) to 3.18.7 (Docker).

By the way, the original snap installation used MongoDB 3.6.14 / wiredTiger (oplog enabled), while in the new Docker setup I have to use MongoDB 4.4.15 / wiredTiger (oplog enabled), since I was unable to get v3.6 running.

Is that maybe the problem?
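In case it helps to rule that out: what the restored database actually reports can be checked directly. A minimal sketch with pymongo (the connection string mirrors the MONGO_URL from the compose snippet above; nothing else here is specific to my setup):

# check_fcv.py - print the server version and featureCompatibilityVersion
# of the MongoDB instance Rocket.Chat is pointed at
from pymongo import MongoClient

client = MongoClient("mongodb://mongodb:27017/?replicaSet=rs0")

# server binary version (expected to report 4.4.x in the new Docker setup)
print("server version:", client.server_info()["version"])

# featureCompatibilityVersion of the running server
fcv = client.admin.command({"getParameter": 1, "featureCompatibilityVersion": 1})
print("featureCompatibilityVersion:", fcv["featureCompatibilityVersion"])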

Should I try changing the file upload system to “FileSystem” first and then migrate? I only found this “workaround”: Question: Howto Migrate the files from Gridfs to Filesystem? · Issue #9349 · RocketChat/Rocket.Chat · GitHub, and I’m not sure if it’s recommended.

This is the mongodb Docker service setup:

  mongodb:
    <<: *default-opts
    image: docker.io/bitnami/mongodb:4.4
    environment:
      MONGODB_REPLICA_SET_MODE: primary
      MONGODB_REPLICA_SET_NAME: rs0
      MONGODB_PORT_NUMBER: 27017
      MONGODB_INITIAL_PRIMARY_HOST: mongodb
      MONGODB_INITIAL_PRIMARY_PORT_NUMBER: 27017
      MONGODB_ADVERTISED_HOSTNAME: mongodb
      MONGODB_ENABLE_JOURNAL: "true"
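As a quick sanity check that the replica set and oplog are actually reachable before pointing Rocket.Chat at this container, something along these lines can be run (a sketch with pymongo; hostname and port match the compose file above):

# check_replset.py - confirm the rs0 replica set is initialized and the oplog exists
from pymongo import MongoClient

client = MongoClient("mongodb://mongodb:27017/?replicaSet=rs0")

# replSetGetStatus fails if the node is not part of an initialized replica set
status = client.admin.command("replSetGetStatus")
print("replica set:", status["set"], "- my state:", status["myState"])  # 1 = PRIMARY

# MONGO_OPLOG_URL points at the local database; the oplog collection must exist there
print("oplog present:", "oplog.rs" in client["local"].list_collection_names())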

I thought my problem was similar to Migrate self hosted server from 3.18 snap to 5.0 docker based installation, but I am using “GridFS” on the old system and have not switched it to “FileSystem” yet.

Also, all database migrations were reported as successful during the update process.

I also found another issue regarding a changed URL redirect rule, as reported in After update to 3.11.0, embedded images appear with "Retry" - Rocket.Chat - EGroupware Help Forum, but I am using the “original” Docker image, so I don’t think it’s related.

All in all, I rolled back the test migration and started another one. This time I only went from 3.18.7 (snap) to 3.18.7 (Docker): everything else is fine, but the uploads are still missing.

I also don’t see any errors, neither in the browser nor in the Rocket.Chat logs. Here are the last few lines:

I20230313-12:32:17.619(0) Setting default file store to GridFS 
I20230313-12:32:17.701(0) LocalStore: store created at  
I20230313-12:32:17.702(0) LocalStore: store created at  
I20230313-12:32:17.702(0) LocalStore: store created at  
I20230313-12:32:22.564(0) (migrations.js:120) Migrations: Not migrating, already at version 232
I20230313-12:32:22.719(0) ufs: temp directory created at "/tmp/ufs" 
I20230313-12:32:25.769(0) Loaded the Apps Framework and loaded a total of 0 Apps! 
I20230313-12:32:25.877(0) Using GridFS for custom sounds storage 
I20230313-12:32:25.880(0) Using GridFS for custom emoji storage 
I20230313-12:32:26.211(0) Updating process.env.MAIL_URL 
I20230313-12:32:26.811(0) ➔ server.js:204 System ➔ startup 
I20230313-12:32:26.811(0) ➔ +-----------------------------------------------+ 
I20230313-12:32:26.812(0) ➔ |                 SERVER RUNNING                | 
I20230313-12:32:26.812(0) ➔ +-----------------------------------------------+ 
I20230313-12:32:26.812(0) ➔ |                                               | 
I20230313-12:32:26.812(0) ➔ |  Rocket.Chat Version: 3.18.7                  | 
I20230313-12:32:26.812(0) ➔ |       NodeJS Version: 12.22.1 - x64           | 
I20230313-12:32:26.812(0) ➔ |      MongoDB Version: 4.4.15                  | 
I20230313-12:32:26.813(0) ➔ |       MongoDB Engine: wiredTiger              | 
I20230313-12:32:26.813(0) ➔ |             Platform: linux                   | 
I20230313-12:32:26.813(0) ➔ |         Process Port: 3000                    | 
I20230313-12:32:26.813(0) ➔ |             Site URL: https://chat.ecap.work  | 
I20230313-12:32:26.813(0) ➔ |     ReplicaSet OpLog: Enabled                 | 
I20230313-12:32:26.813(0) ➔ |          Commit Hash: 660c9f5e89              | 
I20230313-12:32:26.813(0) ➔ |        Commit Branch: HEAD                    | 
I20230313-12:32:26.814(0) ➔ |                                               | 
I20230313-12:32:26.814(0) ➔ +-----------------------------------------------+ 

Server Setup Information

  • Version of Rocket.Chat Server: 3.18.7 → 3.18.7
  • Operating System: Debian Buster
  • Deployment Method: snap → Docker
  • Number of Running Instances: 1
  • DB Replicaset Oplog: 1
  • NodeJS Version: v12.22.1
  • MongoDB Version: 3.6.14 wiredTiger (oplog Enabled) → 4.4.15 wiredTiger (oplog Enabled)
  • Proxy: apache
  • Firewalls involved: yes, but unrelated. HAProxy is in front of the service with SSL termination and the following config:
backend be_chat.ECAP
    balance roundrobin
    cookie SERVERID insert indirect
    option httpchk GET /
    default-server check maxconn 20
    server-template ecap-chat- 1 ecap-chat_rocketchat:3000 check resolvers docker init-addr libc,none

I tried the migrate.py script from GitHub - arminfelder/gridfsmigrate: RocketChat GridFS to filesytem migration script on the migrated database (so as not to mess with the existing installation), and I had to modify it since it crashed after around 2,500 files (we have a total of 18,000 uploads).
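For context, the core of such a GridFS-to-filesystem export boils down to something like the following (a rough sketch, not my actual modification of that script; the bucket name assumes Rocket.Chat's default rocketchat_uploads layout and the output directory is made up):

# export_uploads.py - dump every GridFS upload to a local directory,
# skipping corrupt entries instead of crashing on them
import os
import gridfs
from gridfs.errors import CorruptGridFile, NoFile
from pymongo import MongoClient

db = MongoClient("mongodb://mongodb:27017/?replicaSet=rs0")["rocketchat"]
fs = gridfs.GridFS(db, collection="rocketchat_uploads")
out_dir = "./dumps"
os.makedirs(out_dir, exist_ok=True)

for i, grid_file in enumerate(fs.find()):
    try:
        data = grid_file.read()  # raises CorruptGridFile if chunks are missing
    except (CorruptGridFile, NoFile) as err:
        print(f"CORRUPT FILE: {i} {grid_file._id} {grid_file.filename} ({err})")
        continue
    with open(os.path.join(out_dir, str(grid_file._id)), "wb") as f:
        f.write(data)
    print(f"{i}. Dumping {grid_file._id} {grid_file.filename}")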

Here you can see the output of my modified script, which starts spitting out errors caused by gridfs.errors.CorruptGridFile: no chunk #0. Maybe that’s the reason why the uploads are missing :wink:

2430. Dumping SCdbqpWAvqwm3FQjw lecture_3_iact.pdf
2431. Dumping JGsJHxBM4Md5EDTgw lecture_2_intro-gamma.pdf
2432. Dumping Pf9LeqxEg6frMyCdS lecture_1_intro.pdf
CORRUPT FILE: 2432 7yy7wX9xTXjkTDmqJ Tuerschilde.pdf
CORRUPT FILE: 2433 7uApj2p96CypTi58B Tuerschilde.pdf
CORRUPT FILE: 2434 w2aNEqEqxiuExhPjG Bachelorarbeit_FAU_Physik_Final.pdf
CORRUPT FILE: 2435 QNev8xWqECKQWCnSi Präsentation_Wohlleben.pdf
...
...
...
CORRUPT FILE: 18364 anYWp7rTtWy3BXyNj Screenshot_20230224-225042_Maps.jpg
CORRUPT FILE: 18365 RuXAobQLEudp8J3ww thumb-Screenshot_20230224-225042_Maps.jpg
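To quantify the damage without extracting anything, one can also just check which GridFS file documents have no chunk #0 at all (again a sketch, assuming the default rocketchat_uploads.files / rocketchat_uploads.chunks collections):

# count_corrupt.py - count uploads whose first GridFS chunk is missing
from pymongo import MongoClient

db = MongoClient("mongodb://mongodb:27017/?replicaSet=rs0")["rocketchat"]
files = db["rocketchat_uploads.files"]
chunks = db["rocketchat_uploads.chunks"]

total = corrupt = 0
for f in files.find({}, {"_id": 1}):
    total += 1
    # a healthy GridFS entry always has a chunk with sequence number 0
    if chunks.count_documents({"files_id": f["_id"], "n": 0}, limit=1) == 0:
        corrupt += 1
print(f"{corrupt} of {total} uploads have no chunk #0")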

Hi!

Can you make sure the request behind the “Retry” image goes to the URL you are expecting?

Maybe you are running on a new server that still has localhost configured, so it will look for that image at localhost instead of the SITE_URL (Workspace > Settings > General > SITE_URL).
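If you have direct database access, a quick way to see which Site URL the workspace has stored is to read the setting straight from MongoDB (a sketch; Rocket.Chat keeps its settings in the rocketchat_settings collection):

# check_site_url.py - print the Site_Url setting straight from MongoDB
from pymongo import MongoClient

db = MongoClient("mongodb://mongodb:27017/?replicaSet=rs0")["rocketchat"]
setting = db["rocketchat_settings"].find_one({"_id": "Site_Url"})
print("Site_Url:", setting["value"] if setting else "<not found>")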

Let me know if this helps :slight_smile:

Yes sure, thanks for the quick response!

It definitely looks like file corruption. I replaced our URL with CORRECT_DOMAIN_URL.de in the browser logs below; that’s what I get when I hit the “Retry” button. I get an empty file…

Summary
URL: https://CORRECT_DOMAIN_URL.de/file-upload/cL4rretDypy7Jffip/Clipboard%20-%2019.%20Februar%202023%2019:52
Status: 200 OK
Source: Network
Address: SOME_IP:443
Initiator: 36d53d95895b64e18e0f537a585d957d3432874d.js:568:3574

Request
GET /file-upload/cL4rretDypy7Jffip/Clipboard%20-%2019.%20Februar%202023%2019:52 HTTP/1.1
Cookie: rc_token=ZiueVz-AqKXvq1_ZXUSwqZ2zXIYirUNQMFSdg6wJKv7; rc_uid=spdavPaBmLN8T3KJg
Accept: image/webp,image/avif,video/*;q=0.8,image/png,image/svg+xml,image/*;q=0.8,*/*;q=0.5
Accept-Encoding: gzip, deflate, br
Host: chat.test.ecap.work
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.3 Safari/605.1.15
Accept-Language: en-GB,en;q=0.9
Referer: https://CORRECT_DOMAIN_URL.de/group/admins
Connection: keep-alive

Response
HTTP/1.1 200 OK
Content-Type: image/png
Content-Security-Policy: default-src 'none'
Content-Disposition: attachment; filename*=UTF-8''thumb-Clipboard%20-%2019.%20Februar%202023%2019%3A52.png
X-XSS-Protection: 1
Content-Encoding: gzip
Transfer-Encoding: Identity
Set-Cookie: SERVERID=; Expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/
Cache-Control: max-age=31536000
Date: Mon, 13 Mar 2023 20:07:46 GMT
Connection: close
X-Content-Type-Options: nosniff
Last-Modified: Sun, 19 Feb 2023 18:52:29 GMT
x-instance-id: 4CBkZNJ4P6zg9dGqo
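For the record, the empty body can also be confirmed outside the browser; a minimal sketch using the cookie names from the request above (the URL is the one from this example, the token values are placeholders):

# probe_upload.py - fetch one upload URL and report how many bytes come back
import requests

url = "https://CORRECT_DOMAIN_URL.de/file-upload/cL4rretDypy7Jffip/Clipboard%20-%2019.%20Februar%202023%2019:52"
cookies = {"rc_uid": "<your rc_uid>", "rc_token": "<your rc_token>"}

resp = requests.get(url, cookies=cookies, timeout=10)
print(resp.status_code, resp.headers.get("Content-Type"), len(resp.content), "bytes")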

I have now managed to extract all the uploads on the old server with a modified version of the Python script. I had some issues with Unicode characters in the filenames; not sure if that’s related. Anyway, now I am trying to import those into the Docker instance, let’s see how that works out. Tomorrow morning is the maintenance window I announced, so the clock is ticking :laughing:

One thing you could do is dump only the rocketchat_uploads* collections and import them again.
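For instance with mongodump/mongorestore limited to those collections, or with a small script along these lines (a rough sketch with pymongo; the source/target URIs are placeholders, and the collection names assume the default rocketchat_uploads GridFS bucket):

# copy_uploads.py - copy all rocketchat_uploads* collections (upload metadata,
# GridFS .files and .chunks) from the old database to the new one
from pymongo import MongoClient

src = MongoClient("mongodb://old-host:27017")["rocketchat"]            # placeholder URI
dst = MongoClient("mongodb://mongodb:27017/?replicaSet=rs0")["rocketchat"]

for name in src.list_collection_names():
    if not name.startswith("rocketchat_uploads"):
        continue
    dst[name].delete_many({})                  # start from a clean target collection
    batch, copied = [], 0
    for doc in src[name].find():
        batch.append(doc)
        if len(batch) >= 500:                  # keep memory bounded; chunks are ~255 KB each
            dst[name].insert_many(batch, ordered=False)
            copied += len(batch)
            batch = []
    if batch:
        dst[name].insert_many(batch, ordered=False)
        copied += len(batch)
    print(f"copied {copied} documents from {name}")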

:thinking:

Got it! Migrated the dumped files with the migrate.py script on the new instance.

Screenshot 2023-03-13 at 21.44.09

and voila, the files appear :slight_smile:

Screenshot 2023-03-13 at 21.43.08

Now I will go through the upgrade path and see if everything works.

Ah well, I wanted to switch to FileSystem anyway, so that’s fine. The next person who runs into this problem can try the other solution if they want :slight_smile:

Anyways, thanks for the feedback!


Awesome.

Thanks for sharing your findings :slight_smile: