Server Memory Taking Off

Anybody have any idea what might cause this behavior?

App has been in production for years, sitting on Wappler 6.8.0 and Node 16 on a 4 CPU x 16 GB droplet, with the DB hosted separately on Aiven. Droplet CPU utilization sits below 8% during the heaviest user loads and memory has hovered stable at around 10% for years. Then suddenly on November 3-4, with no recent code deployments, DB changes, etc., the memory heap took off and the CPU started sawtoothing as the GC worked overtime.

I attempted a fresh redeploy, which dropped memory back down, but it went right back to climbing at the same rate.

The cause looks like the file descriptors (FDs) in the container are climbing. So far, GPT is just guessing that connection management has gone awry (due to a dependency updating somewhere). Before I start randomly building custom routes to stub out functionality, I was wondering if anyone has experienced this behavior before, what they did to work around it, and/or any other recommendations. Thanks all!

Maybe the issue is not coming from the app?

I use btop/htop to check all the processes running on the droplet. I had an issue with snapd once, but it was easy to get rid of.

So my recommendation: since you have Linux running on the droplet, use a process monitor and see which process is the problematic one.

Also, if you are using the resource manager in Wappler, you can install Netdata.

Thanks @franse! I do indeed have htop. All of the utilization is going to index.js.

That is what made me dig deeper into the container. Inside the container, node is literally the only thing running.

If I observe the FDs (via ls -l /proc/1/fd), I can watch them slowly grow over time, which is likely a culprit.
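
In case it's useful to anyone else, something like this rough sketch inside the app would log the FD count to docker logs periodically; it assumes Linux, that node runs as PID 1 in the container, and the one-minute interval is arbitrary:

    // Hypothetical fd-monitor: count the entries in /proc/self/fd once a
    // minute so the leak rate shows up in docker logs.
    const fs = require('fs');

    setInterval(() => {
      fs.readdir('/proc/self/fd', (err, fds) => {
        if (err) return console.error('[fd-monitor] failed:', err);
        console.log(`[fd-monitor] open file descriptors: ${fds.length}`);
      });
    }, 60 * 1000);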

Thanks for the tip on Netdata, I’m giving that a try now…

If you need more information about what is going on inside the app, I know there are some other tools like Sentry Performance or Datadog APM, but I'm not familiar with them.
Maybe someone else can join here and share some info; sorry for not being more helpful.


Thanks again. Unfortunately Netdata only revealed to me what was already apparent: huge GC heap sweeps every few minutes from Node.

The root question remains: somehow on the server, Node suddenly started leaking per-request/per-connection memory and sockets, causing the heap and RSS to grow until GC runs hard every few minutes (but memory utilization still grows out of control). If anyone’s got experience with this, I’m all ears!
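
For anyone hitting the same thing and wanting to watch the curve from inside the app rather than from Netdata, here's a minimal sketch of the kind of logging that makes the growth (and each GC sweep) visible in docker logs; the interval and format are just placeholders:

    // Log RSS and V8 heap usage once a minute. Comparing rss to heapUsed
    // over time helps tell a JS heap leak from native/socket growth.
    setInterval(() => {
      const { rss, heapTotal, heapUsed, external } = process.memoryUsage();
      const mb = (n) => (n / 1024 / 1024).toFixed(1);
      console.log(
        `[mem] rss=${mb(rss)}MB heapTotal=${mb(heapTotal)}MB ` +
        `heapUsed=${mb(heapUsed)}MB external=${mb(external)}MB`
      );
    }, 60 * 1000);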

It is strange that it just suddenly happens; if it were a memory leak in the code, you probably would have noticed it sooner, and it wouldn't start on a single day.

Check if you have any server logs from November 4 to see if you notice something strange in them. Perhaps a suspicious upload or strange requests to your server.

Debugging can be difficult. You can run node with --inspect and connect with DevTools to the remote server. It is recommended not to open the debugging port publicly, but instead to connect using ssh.

Node.js — Debugging Node.js
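
A minimal sketch of that setup, assuming you can add a line to the app's entry point (the host/user in the ssh command are placeholders):

    // Open the V8 inspector bound to localhost only, so the debug port is
    // never exposed publicly. Node can also enable this on demand when the
    // process receives SIGUSR1.
    const inspector = require('inspector');

    inspector.open(9229, '127.0.0.1');
    console.log('Inspector listening at', inspector.url());

    // If node runs inside Docker, the container port also needs to be mapped
    // to the host's loopback (e.g. '127.0.0.1:9229:9229' under ports).
    // Then, from your local machine, forward the port over ssh and attach
    // Chrome DevTools via chrome://inspect:
    //   ssh -L 9229:127.0.0.1:9229 user@your-droplet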

Thanks @patrick! Inspired by this vid: https://www.youtube.com/watch?v=_OnTUIYxGRs, I edited the following lines within the prod target docker-compose yaml:

environment:
  - 'NODE_ENV=production'
  - 'NODE_OPTIONS=--max-old-space-size=12288'
mem_limit: '13g'

(Both set to about 75% of the droplet memory)

And I also updated Node from 16 to v20. After redeploy, the CPU GC-thrashed and memory climbed as before, but upon reaching that 75% limit everything calmed down: memory stabilized and the CPU throttled down to half of its earlier thrash level. So I'm now stable. Hence it was not a memory leak, but rather GC suddenly going nuts plus not having a clear memory max defined.
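
For anyone repeating this, a quick sanity check (just a sketch) to confirm the --max-old-space-size value actually reached the Node process:

    // Log V8's configured heap ceiling at startup; with the setting above it
    // should be roughly the 12 GB configured rather than the much smaller default.
    const v8 = require('v8');

    const { heap_size_limit, used_heap_size } = v8.getHeapStatistics();
    const mb = (n) => (n / 1024 / 1024).toFixed(0);
    console.log(`[heap] limit=${mb(heap_size_limit)}MB used=${mb(used_heap_size)}MB`);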

The mystery remains as to what suddenly caused the old/young-gen space to become insufficient. I added a custom route to log all API response times to docker logs, but everything looked normal. According to GPT, it could be something as benign as an external Cloudflare or DB adjustment that holds onto connections for an extra 50ms, which is enough to make the young-gen insufficient under the stock Node memory settings.
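
For reference, the timing log amounted to something along these lines; this is a rough sketch as Express middleware rather than my exact custom route, and it assumes an Express app object is available to hook into:

    // Hypothetical response-time logger: one line per request to stdout,
    // which ends up in docker logs.
    module.exports = function (app) {
      app.use((req, res, next) => {
        const start = process.hrtime.bigint();
        res.on('finish', () => {
          const ms = Number(process.hrtime.bigint() - start) / 1e6;
          console.log(
            `[timing] ${req.method} ${req.originalUrl} ${res.statusCode} ${ms.toFixed(1)}ms`
          );
        });
        next();
      });
    };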

This behavior is now considered a new normal.