Description:
We are encountering a critical issue with our Rocket.Chat instance running in a Docker Compose setup. The service becomes unresponsive on port 3000 shortly after startup, typically within 30-40 seconds. CPU usage is high, and the troubleshooting steps listed below have not revealed the root cause.
Steps to Reproduce:
- Start the Rocket.Chat service using Docker Compose.
- Wait for approximately 30-40 seconds after the service becomes available on port 3000.
- Observe that Rocket.Chat stops responding while CPU usage spikes to 100%.
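To put a number on the time to failure, a small probe along the following lines could be run right after starting the stack. This is only a sketch under stated assumptions: the function names `probe` and `seconds_until_unresponsive` are made up for illustration, and the URL in the usage comment (host, port mapping, and the `/api/info` path) is an assumption that may need adjusting.

```python
import http.client
import time
import urllib.error
import urllib.request


def probe(url, timeout=2.0):
    """Return True if the URL answers with any HTTP response within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status; it is still responsive.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, reset, malformed reply, or timed out.
        return False


def seconds_until_unresponsive(url, interval=1.0, max_wait=120.0, timeout=2.0):
    """Poll the URL and time the gap between the first success and the first failure.

    Returns the elapsed seconds, or None if the service never answered or
    never stopped answering within max_wait seconds.
    """
    start = time.monotonic()
    first_ok = None
    while time.monotonic() - start < max_wait:
        now = time.monotonic()
        if probe(url, timeout):
            if first_ok is None:
                first_ok = now
        elif first_ok is not None:
            return now - first_ok
        time.sleep(interval)
    return None


# Example call (URL is an assumption, adjust to the actual deployment):
# print(seconds_until_unresponsive("http://localhost:3000/api/info"))
```

Because the failing request itself can take up to `timeout` seconds, the result is accurate to roughly `interval + timeout` seconds, which should be enough to confirm the 30-40 second window.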
Observed Behavior:
- Rocket.Chat is functional for a brief period (30-40 seconds) after startup and then stops responding.
- The container consistently shows 100% CPU usage on one core, with normal memory usage.
- No significant errors are logged in Rocket.Chat or Docker logs. The logs primarily show routine startup messages.
- The MongoDB instance connected to Rocket.Chat shows occasional slow queries, but nothing conclusively related to the issue.
Versions Affected:
- Rocket.Chat version 6.10.1 (initial issue observed)
- Rocket.Chat version 6.11.1 (upgraded, issue persisted)
- Rocket.Chat version 6.10.4 (downgraded, issue persisted)
Troubleshooting Steps Taken:
- Version Upgrades/Downgrades:
  - Initially observed on version 6.10.1.
  - Upgraded to version 6.11.1; issue persisted.
  - Downgraded to version 6.10.4; issue persisted.
- Node.js Memory Allocation:
  - Adjusted Docker Compose to include the following Node.js flags:
    --max-old-space-size=4096
    --optimize-for-size
    --gc-interval=100
  - Issue persisted with no significant change in behavior.
- Database Monitoring:
  - Executed db.currentOp() and db.serverStatus() in MongoDB.
  - Monitored for slow queries or locking issues that could cause the unresponsiveness.
  - MongoDB logs indicated slow queries, but nothing definitive.
- Debug Logging:
  - Enabled full debug logging in Rocket.Chat (DEBUG=*), but no actionable information was produced.
  - Removed node_trace.1.log, which previously contained repetitive V8 GC operations like V8.GCIncrementalMarkingLayoutChange.
- Resource Monitoring:
  - Monitored CPU and memory usage via docker stats.
  - Observed persistent 100% CPU usage by Rocket.Chat, with no significant memory spikes.
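For anyone wanting to capture the resource readings over time rather than watching them live, the `docker stats` output can be sampled from a script. The following is a rough sketch, not what was actually run here: the function names `parse_stats_line` and `sample_docker_stats` are hypothetical, and the tab-separated layout is simply one choice of `--format` template using the `.Name`, `.CPUPerc`, and `.MemUsage` placeholders.

```python
import subprocess

# Tab-separated template: container name, CPU percentage, memory usage.
STATS_FORMAT = "{{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"


def parse_stats_line(line):
    """Parse one 'name<TAB>cpu%<TAB>mem usage' line into a dict."""
    name, cpu, mem = line.rstrip("\n").split("\t")
    return {
        "name": name,
        "cpu_percent": float(cpu.rstrip("%")),
        "mem_usage": mem,
    }


def sample_docker_stats():
    """Take a single snapshot of all running containers (requires the docker CLI)."""
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", STATS_FORMAT],
        check=True,
        capture_output=True,
        text=True,
    )
    return [parse_stats_line(line) for line in result.stdout.splitlines() if line.strip()]


# Example: list containers using close to a full core in this snapshot.
# pegged = [s for s in sample_docker_stats() if s["cpu_percent"] >= 95.0]
```

Calling `sample_docker_stats()` in a loop and logging the results with timestamps would show whether the CPU reaches 100% at the same moment the service stops answering on port 3000.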
Logs of Interest:
- Rocket.Chat Logs:
    LocalStore: store created at
    {"level":40,"time":"...","pid":1,"hostname":"...","name":"VoIPAsteriskService","msg":"Voip is not enabled. Cant start the service"}
    ...
    +-----------------------------------------------+
    | SERVER RUNNING |
    | Rocket.Chat Version: 6.10.4 |
    | NodeJS Version: 14.21.3 - x64 |
    | MongoDB Version: 6.0.13 |
    | Platform: linux |
    +-----------------------------------------------+
    {"level":30,"time":"...","pid":1,"hostname":"...","name":"VersionCheck","msg":"Checking for version updates"}
- MongoDB Logs:
    {"t":{"$date":"..."},"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn84","msg":"Slow query","attr":{"type":"command","ns":"rocketchat.$cmd"}}
    ...
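Since the MongoDB log is structured JSON, the "Slow query" entries can be grouped by namespace to check whether one collection or command dominates. The sketch below is illustrative only: the function name `summarize_slow_queries` is hypothetical, and it assumes each slow-query entry carries a `durationMillis` value inside `attr` (that field is elided in the excerpt above, so entries without it are counted with a duration of 0).

```python
import json
from collections import defaultdict


def summarize_slow_queries(lines):
    """Group 'Slow query' log entries by namespace.

    Returns {namespace: {"count": int, "total_ms": int, "max_ms": int}}.
    """
    summary = defaultdict(lambda: {"count": 0, "total_ms": 0, "max_ms": 0})
    for raw in lines:
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip blank or non-JSON lines
        if not isinstance(entry, dict) or entry.get("msg") != "Slow query":
            continue
        attr = entry.get("attr", {})
        ns = attr.get("ns", "<unknown>")
        ms = attr.get("durationMillis", 0)  # assumed field; 0 if absent
        bucket = summary[ns]
        bucket["count"] += 1
        bucket["total_ms"] += ms
        bucket["max_ms"] = max(bucket["max_ms"], ms)
    return dict(summary)


# Example (the log path is a placeholder):
# with open("mongod.log") as log_file:
#     for ns, stats in summarize_slow_queries(log_file).items():
#         print(ns, stats)
```

If most of the slow entries share a single namespace and cluster around the moment Rocket.Chat stops responding, that would make the database a more likely suspect than the logs alone currently suggest.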
Request for Help:
We are seeking guidance on further troubleshooting steps or solutions for this issue, in which Rocket.Chat becomes unresponsive shortly after startup. The problem persists across all three versions tested (6.10.1, 6.11.1, and 6.10.4) despite extensive troubleshooting, and no clear error messages are produced. Any insights or suggestions on how to fix this would be greatly appreciated.
Server Setup Information:
- Version of Rocket.Chat Server: 6.10.4 (also tested 6.10.1 and 6.11.1)
- License Type: Community
- Number of Users: Around 50
- Operating System: Ubuntu 22.04
- Deployment Method: Docker
- Number of Running Instances: 1
- DB Replicaset Oplog: Unknown
- NodeJS Version: 14.21.3
- MongoDB Version: 6.0.13