How old our servers are (as of 2026)
Back in 2022, I wrote about how old our servers were at the time, partly because they're older than you might expect, and today I want to update that with our current situation. My group handles the general departmental infrastructure for the research side of the department (the teaching side is a different group), and we've tended to keep servers for quite a while. Research groups are a different matter; they often have much more modern servers and turn them over much faster.
As in past installments, our normal servers remain Dell 1U servers. What we consider our current generation are Dell R350s, which it looks like we got about two years ago in 2024 (and are now out of production). We still have plenty of Dell R340s and R240s in production, which were our most recent generation in 2022. We still have some Dell R230s and even R210 IIs in production in less important server roles. We also have a fair number of Supermicro servers in production, of assorted ages and in assorted roles (including our fileservers and our giant login server, which is now somewhat old).
(On a casual look, the Dell R210 IIs are all for machines that we consider decidedly unimportant; they're still in service because we haven't had to touch them. Our current view is that R350s are for important servers, and R340s and R240s are acceptable for less important ones.)
In a change from 2022, we turned over the hardware for our fileservers somewhat recently, 'modernizing' all of our ZFS filesystems in the process. The current fileservers have 512 GBytes of RAM in each, so I expect that we'll run this hardware for more than five years unless prices drop drastically back to what they were when we could afford to get a half-dozen machines with a combined multiple terabytes of (DDR5) RAM.
(Today, a single machine with 128 GBytes of DDR5 RAM and some U.2 NVMe drives came out far more expensive than we hoped (and the prices forced us to lower the amount of RAM we were targeting).)
Our SLURM cluster is quite a mix of machines. We have both CPU-focused and GPU-focused machines, and on both sides there's a lot of hand-built machines stuffed into rack cases. On the GPU side, the vendor servers are mostly Dell 3930s; on the CPU side, they're mostly Supermicro servers. A significant number of these servers are relatively old by now; the 3930s appear to date from 2019, for example. We have updated the GPUs somewhat but we mostly haven't bothered to update the servers otherwise, as we assume people mostly want GPU computation in GPU SLURM nodes. Even the CPU nodes are not necessarily the most modern; half of them (still) have Threadripper 2990WX CPUs (launched in 2018, and hand built into the same systems as in 2022). With RAM prices being the way they are, it's unlikely that we'll replace these CPU nodes with anything more recent in the near future.
With current hardware prices being what they are (and current and future likely funding levels), I don't think we're likely to get a new generation of 1U servers in the moderate future. We have one particular important server getting a hardware refresh soon, but apart from that we'll run servers on the hardware we have available today. This may mean we have to accept more hardware failures than usual (our usual amount of server hardware failures is roughly zero), but hopefully we'll have a big enough pool of old spare servers to deal with this.
(I expect us to reuse a lot more old servers than we traditionally have. For instance, our first generation of Linux ZFS fileservers date from 2018 but they've been completely reliable and they have a lot of disk bays and decent amounts of RAM. Surely we can find uses for that.)
PS: If I'm doing the math correctly, we have roughly 10 TBytes of DDR4 RAM of various sizes in machines that report DMI information to our metrics system, compared to roughly 6 TBytes of DDR5 RAM. That DDR5 RAM number is unlikely to go up by much any time soon; the DDR4 number probably will, for various reasons beyond the scope of this entry. This doesn't include our old fileserver hardware, which is currently turned off and not in service (and so not reporting DMI information about their decent amount of DDR4 RAM).