When node.js is the wrong tool for the job

JavaScript is my first programming language, and I haven't felt the need to learn another, mostly because you can do almost everything with node.js and JavaScript. However, after recently building a high-volume service for a client, I found myself, for the first time ever, wishing I hadn't written the service in node.js, even though my client doesn't mind.

When is node.js right for the job?

node.js is perfect for simple CRUD apps that also have a frontend, i.e. a React or Angular app. Having the entire stack written in JavaScript is great for the team: frontend developers can quickly investigate and, more importantly, understand the backend if needed. More people know JavaScript than any other language, so almost any available engineer can help when more hands are needed.

Another factor is speed. If your entire team has to learn Go or Rust to write a Go or Rust service, it will obviously take longer to build than a node.js app (assuming everyone already knows JavaScript). Sure, the Go or Rust version might run 2x as fast, but it might also take 2x as long to develop, and that extra development time may cost the company more than 2x as many servers would.

I would really only recommend modern node.js. Years ago, before ES6 Promises were adopted by the community, I contributed heavily to node.js modules because a lot of them were simply broken; node.js was the wild west. Nowadays, the community and the modules have matured immensely, and I am very grateful for that.

When is node.js wrong for the job?

Unlike the services I usually work on, this app has no front end, so JavaScript as a common language with front end developers has no benefit here. The previous version of the service was an nginx server with gigantic configuration files, but it was not as maintainable or dynamic as the client would like.

In-memory Caching

On each server, rules are retrieved from Redis and cached in memory using an LRU cache. Because node.js is not multi-threaded, we spin up 4 instances of node.js per server, 1 instance per CPU core, which means we cache the same data in memory 4 times per server. This is a waste of memory!
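
A minimal sketch of that setup, using the built-in cluster module and the lru-cache module (the options here are illustrative, not our exact configuration):

    const cluster = require('cluster');
    const os = require('os');

    if (cluster.isMaster) {
      // Fork one node.js process per CPU core (4 on our servers).
      os.cpus().forEach(() => cluster.fork());
    } else {
      const LRU = require('lru-cache');
      // Each worker builds its own cache of rules pulled from Redis,
      // so the same rule set ends up cached 4 times per server.
      const ruleCache = new LRU({ max: 500, maxAge: 30 * 1000 });
      // ...on a cache miss, fetch the rule set from Redis and store it here...
    }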

We cache a geoip database in memory. The file itself is only about 60mb, but because we have to cache the database 4 times, we end up using closer to 240mb. In my experience, node.js processes without in-memory caching have never exceeded about 120mb; this service uses upwards of 400mb per process.

The end result is that node.js uses a lot more memory than required. This never actually became an issue for us, both because we rearchitected before it could and because most servers have more RAM than node.js needs anyway.

Reaching Bandwidth Limits

Operations started adding rules with hundreds of thousands of domains, which made a single rule set about 10mb large. Because each process cached these rule sets in memory for only about 30 seconds, each Redis cluster saw roughly 10 servers * 4 processes * (10mb / 30 seconds) * (8 bits per byte) ≈ 106 megabits per second of bandwidth usage per rule set. This was above the 100-megabit-per-second bandwidth limit of a standard Azure Redis cluster, so Azure started throttling the server, causing latencies to spike.

If we weren't using node.js, we could cut this bandwidth by a factor of 4, since there would be only one connection per server retrieving rule sets from the Redis cluster instead of 4 (one per node.js process). We probably would have hit this limit eventually in any language, but we hit it a lot faster with node.js.

So we broke up our rules so that certain rules were stored as Redis sets. This lowered bandwidth usage, but increased latency, since each request may now require Redis calls, and the average latency from an Azure Web App to an Azure Redis server is about 7ms. Previously, no Redis calls were necessary on a per-request basis.
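
Roughly what the per-request check looks like with Redis sets (the client setup and key names below are illustrative, not our exact code):

    const redis = require('redis');
    const client = redis.createClient(6379, 'example.redis.cache.windows.net');

    // Instead of holding a 10mb rule set in every process, large domain
    // lists live in Redis sets and each request checks membership directly.
    function domainMatchesRule(ruleId, domain, callback) {
      // SISMEMBER is O(1) inside Redis, but it adds a ~7ms round trip
      // from the Azure Web App to the Azure Redis server on every request.
      client.sismember('rules:' + ruleId + ':domains', domain, callback);
    }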

Processing medium-sized datasets

Parsing a ~10mb rule set with JSON.parse() blocks the event loop. What makes this worse in node.js is that this JSON.parse() happens 4 times per server, once every 30 seconds per event loop. This is not an issue in multi-threaded languages, where the processing can be done on a completely separate thread, and only once per server instead of once per process.

We've moved this logic into a worker that breaks up the rules into Redis sets. We still see the event loop blocking, but by making it a worker and running it in a separate process via child_process.fork(), we no longer run into this issue.
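
A sketch of that worker setup; the file name and message shape are placeholders:

    const { fork } = require('child_process');

    // The heavy JSON.parse() and rule splitting run in a separate forked
    // process, so they can block that process's event loop without
    // affecting the processes serving traffic.
    const worker = fork('./rules-worker.js');

    worker.on('message', (msg) => {
      // e.g. the worker tells us the Redis sets have been refreshed
      console.log('rule sets refreshed at', msg.refreshedAt);
    });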

Excessive socket usage

We have a lot of external requests set up, but the ones that fired an external request on every HTTP request we received were:

There are a few that we batch, but still call more than once a second:

  • AWS Kinesis Firehose
  • InfluxDB

For every request we receive, you can expect a few outbound HTTP calls. Once we started proxying HTTP requests, which are in addition to the requests above, we quickly ran into the server's socket limit. The reason was that node.js, by default, does not pool connections: every HTTP request was creating a new socket. We fixed it by setting keepAlive=true on the globalAgent:
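
Something along these lines (a sketch; libraries that construct their own agents need the same option passed explicitly):

    const http = require('http');
    const https = require('https');

    // Replace the default global agents with keep-alive agents so outbound
    // requests reuse sockets instead of opening a new one per request.
    http.globalAgent = new http.Agent({ keepAlive: true });
    https.globalAgent = new https.Agent({ keepAlive: true });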

However, we also needed to set how many sockets we pooled for each host. How do you figure that out? Well, there are 8,192 sockets per server, which means 2,048 sockets per node.js process. How many hosts do we have? At least 8: 4 for all our analytics and another 4 for the current approximate number of proxied domains. 2,048 / 8 = 256 max sockets per host per node.js process.

But we were still getting errors, so I guess I was wrong! We lowered the number by trial and error until we didn't receive any more errors. That number happened to be 128 sockets, and we expect to lower it once we ramp up.
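
The agents we ended up with look roughly like this, with the per-host cap set to the value found by trial and error:

    const http = require('http');
    const https = require('https');

    // keepAlive for socket reuse; maxSockets caps the pool per host at the
    // number we arrived at by trial and error.
    http.globalAgent = new http.Agent({ keepAlive: true, maxSockets: 128 });
    https.globalAgent = new https.Agent({ keepAlive: true, maxSockets: 128 });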

The issue with node.js here is that we had to divide 8,192 by 4, since we spin up 4 node.js processes per server. If node.js were a multi-threaded process, we wouldn't have had to do such complicated math and trial and error; 256 sockets per host would probably have worked.

Development Speed

This churn would have made development a lot slower in a different language, especially since the only other language this adtech team had experience with was ActionScript.

Conclusion

Above all, what matters most to an engineer is delivering. There's no point building a highly optimized product if it never ships. Hitting the market early and getting feedback from real customers is more important than fast code.
