Sudden 100% CPU usage causing non-responsive chat until service restart


#22

More information, in case it's helpful.

I have changed the storage engine back to MMAPv1, as I saw some posts saying it is preferred.

Still having the same issues. Even with just a single user logged in and slackbridge disabled, CPU still goes to 100%, and eventually it fails and the containers restart.


#23

I think there must be bad data stored in the users table that's triggering this issue. Is there a cleanup routine or a data validation I can run to clear bad data? As far as I can tell, it's only that table that's causing issues.


#24

Wow, that’s some serious wait time on a query. Can you grab the user object it’s delaying on? I know it might be a pain to redact. Is it the same ID each time or is it random?


#25

Well, I think I have found the issue, so here’s something we can script a fix for if someone else has it.

In each user document there is a service column, and for a single user it held 28MB of data (as text). The login sessions (all without dates) were huge. I deleted these for 2 users (and am cleaning up others) and it's sooo much faster now.
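
For anyone who wants to check whether they have the same problem, here is a rough sketch of how the oversized users could be spotted. It assumes the user documents have already been exported or loaded as plain Python dicts, and that the column is named `services` (an assumption on my part, check your own data). The function names are made up for illustration, and the size is only the JSON text length, not the real on-disk size.

```python
import json


def services_size_bytes(user_doc):
    """Approximate size of a user's `services` field as JSON text, in bytes."""
    # `services` is an assumed field name; adjust it to match your documents.
    services = user_doc.get("services", {})
    # default=str lets dates and other non-JSON values serialize as strings.
    return len(json.dumps(services, default=str).encode("utf-8"))


def largest_users(user_docs, top=5):
    """Return (username or _id, size) pairs, largest `services` field first."""
    sized = [
        (doc.get("username", doc.get("_id")), services_size_bytes(doc))
        for doc in user_docs
    ]
    return sorted(sized, key=lambda pair: pair[1], reverse=True)[:top]
```

Anything that comes back in the megabytes range is probably worth a closer look.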

I presume there's a cleanup routine running to delete old sessions, but sessions without dates will be ignored by it.

I am not a mongo person but will look at writing a script to delete all sessions where there is no date.
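
A minimal sketch of what such a script might do, under some loud assumptions: that the sessions live in an array at `services.resume.loginTokens` and that the date field is called `when` (that is how Meteor-style accounts usually store them, but I have not confirmed it for every version). The function names are hypothetical. The first function works on a plain dict so it can be tried on an exported copy first; the second only builds the MongoDB update document and does not touch any database.

```python
import copy

# Assumed field names; verify against your own user documents before use.
TOKEN_PATH = ("services", "resume", "loginTokens")
DATE_FIELD = "when"


def strip_undated_tokens(user_doc):
    """Return (cleaned copy of the document, number of tokens removed)."""
    cleaned = copy.deepcopy(user_doc)  # never modify the caller's document
    parent = cleaned
    for key in TOKEN_PATH[:-1]:
        parent = parent.get(key)
        if not isinstance(parent, dict):
            return cleaned, 0  # path is missing, nothing to clean
    tokens = parent.get(TOKEN_PATH[-1])
    if not isinstance(tokens, list):
        return cleaned, 0
    # Keep only tokens that actually carry the date field.
    kept = [t for t in tokens if isinstance(t, dict) and DATE_FIELD in t]
    parent[TOKEN_PATH[-1]] = kept
    return cleaned, len(tokens) - len(kept)


def pull_undated_tokens_update():
    """Build a $pull update that removes tokens lacking the date field."""
    return {"$pull": {".".join(TOKEN_PATH): {DATE_FIELD: {"$exists": False}}}}
```

The update document could then be passed to something like PyMongo's `update_many` on the users collection. Take a backup first, and expect the affected users to have to log in again.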

I am also assuming that this has come from an old version and there probably should be a published routine or something to clean some of this up.

I think the logs I had were just pointing at the table causing the issues and locks, not the actual row, so the logs weren’t entirely helpful.

Gotta say that I now know far more about MongoDB than I originally thought I needed :slight_smile: