An introduction to budget “high” performance computing

So, you want to get into HPC?

Whatever your reason may be, whether it’s for an overkill homelab or a CFD simulation for your college research, you’re finding yourself venturing down the rabbit hole of modern high performance computing.

Regardless of how much money you have, everyone wants to spend as little as possible, and if you’re like me, your budget isn’t much. However, don’t let this discourage you! While you and I may not be able to afford a brand new Dell PowerEdge with dual Xeon Platinum CPUs and more RAM than your computer has in storage, that doesn’t mean you can’t get some powerful hardware for an acceptable cost, albeit with possibly high power consumption… (more on that later)

 

Now, first things first, here is what hardware to avoid:

  1. Xeon Phi x100 coprocessor cards
    Although these may be very tempting because of their high core count, I would not recommend purchasing these. A while ago I fell into this trap thinking that they would be easy to configure and work seamlessly with my system. I was wrong. The driver software works only on Windows 10 or Server unless you have some ancient Linux kernel, even if you compile it. Also, if you do get into the SSH terminal for the Xeon Phi, be prepared to pay for Intel Parallel Studio XE and the Intel compiler, as GCC does not support the slightly modified x86 architecture of these CPUs.

  2. NVIDIA Tesla K-series and older GPUs (except the K80 and K40)
    That is, if you’re into GPU HPC. Unfortunately, aside from their high power consumption and poor FLOPS/Watt, NVIDIA’s CUDA will no longer support these older GPUs after CUDA 11. So while CUDA 11 and older will work, looking forward, these cards are at the end of the road. However, the Tesla K40 and especially K80 have such high FP64 performance per dollar that they should be perfectly usable for modest workloads in HPC.

  3. Anything with DDR2 or older
    Most HPC applications need not only fast CPUs and GPUs, but also high memory bandwidth and capacity. DDR2 is unable to provide this. In addition, any server that contains DDR2 very likely has poor performance per watt. So, unless your electricity is free, I would not recommend this hardware.

  4. Anything with a CPU made before 2010 and any 32-bit CPU
    The 32-bit CPU part should be self-explanatory. Also, any CPU prior to 2010 is in the same boat as the DDR2 situation: it likely has very poor performance per watt. Last but definitely not least, these CPUs may lack modern vector processing extensions, so they will be much less effective with highly vectorized code.

  5. Brand new equipment
    While it may be tempting to spend your paycheck on the latest and greatest, don’t! Brand new servers are generally very expensive and you can put that same money into multiple servers that are a little bit older. The older ones may not have the highest energy efficiency and performance compared to the latest, but if you don’t go too far back in time with your search, these metrics are still very good.
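
To put the performance-per-watt warnings above into numbers, here is a rough Python sketch that estimates what a machine costs to run for a year. The function name, the wattages and the electricity rate are all made-up placeholders for illustration, not measurements, so plug in your own numbers:

```python
HOURS_PER_YEAR = 24 * 365


def yearly_power_cost(avg_watts, usd_per_kwh, duty_cycle=1.0):
    """Estimate the yearly electricity cost (USD) of one machine.

    avg_watts   -- average draw at the wall, in watts (measure your own gear)
    usd_per_kwh -- your electricity rate in USD per kWh
    duty_cycle  -- fraction of the year the machine is powered on (0.0 to 1.0)
    """
    kwh_per_year = avg_watts / 1000 * HOURS_PER_YEAR * duty_cycle
    return kwh_per_year * usd_per_kwh


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    rate = 0.20  # USD per kWh
    for name, watts in [("old DDR2-era server", 450), ("newer used server", 250)]:
        print(f"{name}: ${yearly_power_cost(watts, rate):,.0f} per year")
```

If the yearly difference between two candidates is larger than the difference in their purchase price, the "cheap" old server is not really the cheaper one.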

Next, here’s what to look for:

  1. Deals on new equipment
    While new equipment is often very expensive from vendors such as Dell, HP and Lenovo, there are sometimes deals on new servers. If a newer generation is scheduled to come out, the current generation servers may have deals, so if you hear of a vendor making a new server generation, check out their site!

  2. Intel Xeon E3/E5/E7 CPUs (V3 and newer) and AMD Opteron 6300 series
    These CPUs can often be found for great deals whether they are in an existing system or just by themselves. While the Opterons can go in 4-socket motherboards and have a lot of cores, set your priority on the Intel CPUs. They may have fewer cores, but their per-core performance makes up for it. However, don’t strike the Opterons off your list yet, as they are still good performers, especially with highly parallel workloads. Just note that every two cores share an FPU, so floating point arithmetic won’t be great. On the plus side, they are very cheap!

  3. DDR3-1600 ECC or DDR4 ECC Memory
    Although DDR3 may be old now, its low cost even at high frequencies makes it a great deal, especially in large quantities. Aim for higher frequencies with ECC though, as memory bandwidth is critical for HPC applications and ECC will make sure you get reliable results. Better yet, if you can get your hands on cheap DDR4 with ECC (and your system supports it), get it instead.

  4. InfiniBand ConnectX-3 or newer
    If you’re into HPC, chances are you have heard of InfiniBand. Not only are the speeds in the tens of gigabits per second, the latency is superb. That may sound expensive, but it doesn’t have to be: InfiniBand is perfect for transferring a lot of data with low latency, and many PCI-e cards can be found for less than $75 apiece. Keep in mind, though, that cables can make up a significant percentage of the overall cost of using InfiniBand. Cables come in 2 main types: passive copper and optical. Passive copper cables are good for lengths less than 3 meters. After that, optical cables are cheaper per meter. But before you buy, make sure that your cables and cards match in speed. Also, be sure your workload even needs this technology, as it’s not always needed.

  5. Almost any GPU with FP64 performance exceeding 2.5 TFLOPS or an FP32/FP64 performance ratio of 3:1 or less
    That is, if you need GPUs for your HPC cluster. Many GPUs feature high single precision floating point performance but lack a proportional double precision floating point performance. If all you need is FP32 performance, you can ignore this tip. But, if you require high FP64 performance, look into NVIDIA Tesla GPUs or Radeon Instinct accelerators.
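
The GPU rule of thumb above can be written down as a quick check to run against a spec sheet while shopping. This is only a sketch: the function name and the three example cards are hypothetical placeholders, not real product specs, so look up the actual FP32 and FP64 numbers for any card you are considering.

```python
def is_fp64_worthy(fp32_tflops, fp64_tflops, min_fp64_tflops=2.5, max_ratio=3.0):
    """Return True if a GPU meets either FP64 criterion from the tip above.

    A card qualifies if its FP64 throughput exceeds min_fp64_tflops, or if
    its FP32:FP64 ratio is max_ratio:1 or less.
    """
    if fp64_tflops <= 0:
        return False  # no usable FP64 at all
    ratio = fp32_tflops / fp64_tflops
    return fp64_tflops > min_fp64_tflops or ratio <= max_ratio


if __name__ == "__main__":
    # Placeholder cards: (name, FP32 TFLOPS, FP64 TFLOPS). Not real specs.
    candidates = [
        ("gaming card", 10.0, 0.3),
        ("older compute card", 4.0, 1.4),
        ("newer compute card", 14.0, 7.0),
    ]
    for name, fp32, fp64 in candidates:
        verdict = "worth a look" if is_fp64_worthy(fp32, fp64) else "skip for FP64 work"
        print(f"{name}: {verdict}")
```

The two thresholds are left as parameters so you can tighten or loosen them to suit your own workload.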


Here are two websites to aid your shopping…

  • eBay
    This one’s pretty obvious. I have found that eBay is the most abundant source of good deals on server hardware. It just takes a little time and looking around! I generally avoid the auctions though, as I think they’re more trouble than they’re worth.

  • Labgopher
    Although this site uses eBay for its listings, it is exclusively for used servers. It scrapes eBay for server listings and sorts everything according to user-defined parameters, which makes the search much more convenient.


Last minute tips…

Many people get so caught up in their shiny new equipment that they forget all the other essentials that go into an HPC homelab. Here are some other things to consider:

  1. Make sure you have access to at least one 20 amp breaker (the more current the better). While a 15 amp circuit may work too, if you’re buying a lot of gear, you may find yourself frequently running to the breaker panel to reset a breaker like I have to do. If you own your place, you might want to look into getting more circuits installed in your server room (and please, have it done professionally!). If you rent, you might find yourself running extension cables from other breakers into your server room to split the electrical load, like I have to do as well.

  2. Make sure you can afford an increased power bill. Many people completely disregard that an HPC homelab uses a lot of power. And by a lot, I mean A LOT. Here in the US, power rates vary dramatically, and some states even employ a tiered system, where the more you use the higher the rate (I’m looking at you, California). If you live in California like I do, power bills can be a significant burden, but in other regions this may not be a problem. Some mitigation techniques I use are to turn off anything that is not in use, turn off air conditioning when it is not absolutely necessary (that may mean having to deal with some heat!), and even ration power when I see I have used too much in a month.

  3. Invest in battery backup units (UPSes). While you may be able to get adequate power and funds to run your HPC cluster, the occasional brown-out or even power outage will cause all of your precious HPC applications to halt and possibly lose all progress. A UPS allows your HPC cluster to run on battery power long enough for you to shut everything down safely. Just make sure the battery units you buy can handle the load of your servers.

  4. This one ties to #3: make sure your workloads can be paused and saved if necessary. Most likely you and I are in the same boat in that our HPC cluster is at home. That means that our power supply is not always stable or we may find ourselves needing to shut things down, such as for rationing power or high temperatures. Therefore, if one of these scenarios occurs and you can’t pause an application, you may lose all progress!
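
If your application has no built-in way to pause and resume, a simple checkpoint file is often enough. The Python sketch below is one possible approach, not the way any particular HPC package does it: the function names, the JSON format and the toy summing loop are all assumptions made for illustration. The idea is to write progress to disk every so often and pick up from the last saved step on restart.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write `state` (a JSON-serializable dict) to `path` via a temp file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # push the data to disk before renaming
    # Swap the new file into place in one step, so an interruption should
    # leave either the old checkpoint or the new one, not a half-written file.
    os.replace(tmp_path, path)


def load_checkpoint(path, default):
    """Return the saved state, or `default` if no checkpoint exists yet."""
    if not os.path.exists(path):
        return default
    with open(path) as f:
        return json.load(f)


def run(path, total_steps, checkpoint_every=100):
    """Toy workload: sum the integers below total_steps, resuming if possible."""
    state = load_checkpoint(path, {"step": 0, "total": 0})
    while state["step"] < total_steps:
        state["total"] += state["step"]  # stand-in for real work
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)  # final state, so a rerun does no extra work
    return state["total"]
```

Calling run("progress.json", 1000000) a second time after an interruption continues from the last saved step instead of starting over. How often to checkpoint is a trade-off: more frequent saves lose less work on a power cut but spend more time writing to disk.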


Conclusion

So, now that you have read that, you should have an idea of what to buy and not to buy for your upcoming HPC homelab! Essentially, avoid machines with bad performance and/or high power consumption but don’t necessarily buy new either. Equipment that is a little old should do the trick and remember: the internet is your friend! Stay effective!

 

Information as of Q4 2020