Science and engineering both rely on the continuous increase in supercomputing performance. Back in 2009, it was believed that exascale machines would become available by 2018; nine years ahead seemed like a lot of time. No one knew how much power exascale systems would require, but the seminal 2011 report by 65 authors (Jack Dongarra et al., “The International Exascale Software Project roadmap”, PDF) stated: “A politico-economic pain threshold of 25 megawatts has been suggested (by the DARPA) as a working boundary”.
Somehow people came to perceive 25 megawatts as a feasible goal, while in fact it was only “a pain threshold”: a figure that is still acceptable, but beyond which stakeholders would feel “mental pain”.
Now in 2014 it seems that both goals, the time frame (2018) and the expected power consumption (25 MW), will be missed. Analysts now cautiously speak of the year 2020, and discussions of power consumption are “omitted for clarity”, so to say, because any serious discussion would quickly reveal how badly the power goal is going to be missed. Of course, power consumption makes a significant contribution to the operating cost of a supercomputer, but the biggest obstacle on the way to exascale is the capital (procurement) cost, and it is not being discussed at all.
Our community is like the Titanic: it is swiftly moving in the dark, allegedly towards its exascale goal. The passengers on the saloon deck, the scientists, are chatting about how nice it would be to have an exascale machine to solve many of the world’s pressing problems, and how they will ask for funds to keep their countries competitive. Only some of them look out of the windows and notice that there is an obstacle ahead, and even they don’t speak out, so as not to spoil the mood.
Let’s try to calculate how much an exascale machine could cost to buy and operate if we built it with today’s components. We will make the following assumptions:
1. Compute nodes are the same as used in today’s fastest supercomputer, Tianhe-2: two Intel Xeon processors plus three Intel Xeon Phi boards. For the CPUs, let’s further assume the Intel Xeon E5-2690v2 model, with a TDP of 130 W and a cost of $2057. For the Xeon Phi boards, let’s assume the 3120P model, with 1 TFLOPS of peak performance, a TDP of 300 W, and a cost of $1695. Adding the cost of the server chassis, memory modules and the InfiniBand adaptor brings the cost of a compute node to roughly $12,500. That will be our building block, with 3 TFLOPS of peak performance and roughly 1250 W of power.
2. Electricity price is 0.0942 €/kW·hour, which is the average price for industrial consumers in the European Union in the first half of 2013. This equals roughly 0.13 $/kW·hour.
3. System lifetime is 4 years.
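As a sanity check on the building block in assumption 1, here is a minimal Python sketch. All variable names are made up for illustration. It assumes the 3 TFLOPS figure counts only the three Xeon Phi boards (the text does not say whether the CPUs contribute) and treats whatever is left after the processors as the share of the chassis, memory and InfiniBand adaptor.

```python
# Component figures quoted in assumption 1.
CPU_COUNT, CPU_PRICE_USD, CPU_TDP_W = 2, 2057, 130
PHI_COUNT, PHI_PRICE_USD, PHI_TDP_W, PHI_PEAK_TFLOPS = 3, 1695, 300, 1.0

# Whole-node figures quoted in assumption 1.
NODE_PRICE_USD = 12_500
NODE_POWER_W = 1250

# Processors and coprocessors only.
processor_cost_usd = CPU_COUNT * CPU_PRICE_USD + PHI_COUNT * PHI_PRICE_USD
processor_tdp_w = CPU_COUNT * CPU_TDP_W + PHI_COUNT * PHI_TDP_W

# Assumption: peak performance is taken from the Xeon Phi boards alone.
node_peak_tflops = PHI_COUNT * PHI_PEAK_TFLOPS

# Remainder implied for chassis, memory, InfiniBand adaptor, etc.
other_cost_usd = NODE_PRICE_USD - processor_cost_usd
other_power_w = NODE_POWER_W - processor_tdp_w

print(f"Processors: ${processor_cost_usd}, {processor_tdp_w} W")
print(f"Everything else: ${other_cost_usd}, {other_power_w} W")
print(f"Node peak: {node_peak_tflops} TFLOPS")
```

Under these figures the processors account for roughly three quarters of the node price and nearly all of its power budget.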
Under these conditions, to get a peak floating-point performance of 1 ExaFLOPS (EFLOPS), we would need about 333,000 compute nodes. Each costs $12,500, so the total cost of the nodes will be $4.16 billion. Their total power draw is 416 MW.
We can build a 4:1 blocking fat-tree network with 648-port InfiniBand switches; calculations using our tool yield a cost of $500M, which is not too much compared to the server cost. Power is a mere 7 MW (see, fat-tree networks are not that power-hungry!)
The total capital cost of equipment is 4.66 billion dollars. Let’s now calculate electricity costs. Each year has about 8,760 hours. With the electricity price of $0.13 per kW·hour, the cost of 1 MW·year is $1.14M. The lifetime of our system is 4 years, and its total power is 416+7=423 MW. This leads us to an electricity cost of $1.14M/(MW·year) * 4 years * 423 MW = 1.93 billion dollars.
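The scaling step above can be reproduced with a few lines of Python. This is only a sketch with illustrative variable names; it assumes the node count is rounded down to the nearest thousand, which is what “about 333,000” suggests.

```python
# Figures from the assumptions above.
TARGET_TFLOPS = 1_000_000      # 1 EFLOPS expressed in TFLOPS
NODE_PEAK_TFLOPS = 3
NODE_PRICE_USD = 12_500
NODE_POWER_W = 1250

# Assumption: round the node count down to whole thousands ("about 333,000").
exact_nodes = TARGET_TFLOPS // NODE_PEAK_TFLOPS
nodes = exact_nodes // 1000 * 1000

node_capex_usd = nodes * NODE_PRICE_USD
node_power_mw = nodes * NODE_POWER_W / 1e6

print(f"Nodes: {nodes:,}")
print(f"Node cost: ${node_capex_usd / 1e9:.2f} billion")
print(f"Node power: {node_power_mw:.0f} MW")
```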
So the total cost of ownership is 4.66+1.93=6.59 billion dollars, of which electricity accounts for only 29%. The energy costs discussed in the community are large, and yet they make up less than one third of the total. The capital costs, more than two thirds, are “hidden”, just like the bulk of an iceberg under the water, and don’t get enough media attention.
Of course, by 2020 individual compute nodes will be more capable, and we will need fewer of them to build an exascale machine; one day we can try this analysis, too. But with the calculations above, please tell me: how many countries will be able to afford to allocate more than 6 billion dollars for even a single exascale machine? And how many countries will have several such machines?
It seems that exascale computers will be the privilege of rich and powerful countries. Exascale machines are “weapons of innovation”, and countries that possess these “weapons” will have an interest in keeping other countries from getting access to this technology. The high total cost calculated above will only make this barrier higher.
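For readers who want to vary the inputs, the electricity and total-cost arithmetic can be sketched in Python as follows. The variable names are illustrative, and the capital costs and power figures are simply the rounded values quoted in the text. Because the sketch does not round the cost of a MW·year to $1.14M before multiplying, intermediate digits differ slightly from the hand calculation, but the rounded results agree.

```python
# Inputs quoted in the text (capital costs in billions of dollars).
HOURS_PER_YEAR = 8760
PRICE_USD_PER_KWH = 0.13
LIFETIME_YEARS = 4
NODE_CAPEX_BUSD = 4.16
NETWORK_CAPEX_BUSD = 0.50
TOTAL_POWER_MW = 416 + 7       # compute nodes + network

# Cost of running 1 MW for one year: 1 MW = 1000 kW.
usd_per_mw_year = PRICE_USD_PER_KWH * 1000 * HOURS_PER_YEAR

electricity_busd = usd_per_mw_year * LIFETIME_YEARS * TOTAL_POWER_MW / 1e9
capex_busd = NODE_CAPEX_BUSD + NETWORK_CAPEX_BUSD
tco_busd = capex_busd + electricity_busd
electricity_share = electricity_busd / tco_busd

print(f"1 MW-year: ${usd_per_mw_year / 1e6:.2f}M")
print(f"Electricity over lifetime: ${electricity_busd:.2f} billion")
print(f"Total cost of ownership: ${tco_busd:.2f} billion")
print(f"Electricity share: {electricity_share:.0%}")
```

Changing `PRICE_USD_PER_KWH` or `LIFETIME_YEARS` shows how sensitive the electricity share is to local prices and to how long the machine stays in service.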
UPDATE: What can a country buy for $6.6bn?
- Lay more than 200 km of high-speed railway (like the lines being built in China, suitable for 250 km/hour trains)
- Or lay 30 km of fully automatic underground metro lines (like the Circle line built in Singapore)
- Or buy 30 Eurofighter Typhoon fighter jets for greater military security
- Or buy 50-100 oil tankers for greater energy security (only vessels, without oil)
- Or provide 6 years of funding for RWTH Aachen University or TU München, for greater confidence in the future