The Journey Ends Here

Hi all!

It’s been several exciting years, but things change, life goes on, and I now work in a different field. This website will remain online in the hope that it will be useful, but comments will be disabled, and I will no longer be able to offer advice on HPC: it is a fast-evolving field where you need to “constantly run just to stay in the same spot”, so to speak.

I enjoyed the journey, though, and if you work in HPC, I hope you are enjoying yours as well.

My Ph.D. thesis defence presentation is available, and the Ph.D. thesis itself is published online at TUM’s library.


Cluster Design Tools ver. 0.8.5 — Final!

The final version of the Cluster Design Tools, ver. 0.8.5, was released in June 2015 and is available for download. The changes are mostly about letting the software suite run with newer Python versions (3.4.2 and higher).

This version is “final” in the sense that I no longer plan to work on this software. But since it is available under the GNU GPL license, you can take it, use it, and adapt it to your needs, should you wish to.


Cluster Design Tools Updated (ver. 0.8.4), SADDLE Included

The recent release of Cluster Design Tools, version 0.8.4, is different from previous releases. First, it now incorporates SADDLE, a Python-based scripting language for the automated design of cluster supercomputers and data centres, which was announced earlier.

Second, it now includes the source code for each tool. The source code is released under the GNU General Public License (GPL) or, for the web services, the GNU Affero GPL. Yes, even the fat-tree design tool is included, and you might be surprised to find it’s written in Object Pascal!

All in all, version 0.8.4 is a solid new release, and you can try it out by downloading and installing it. Just make sure you follow the guidance about the correct version of Python (as of today, version 3.3.0 works fine).
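
If you are unsure which interpreter your system will pick up, a quick check along these lines can save some head-scratching. The 3.3.0 threshold below is simply the version mentioned above as known to work; adjust it for other releases of the tools.

```python
import sys

# Version 3.3.0 is the one mentioned above as known to work with ver. 0.8.4;
# adjust the tuple if a different release of the tools requires a newer Python.
if sys.version_info < (3, 3, 0):
    sys.exit("Cluster Design Tools ver. 0.8.4 expects Python 3.3.0 or newer; "
             "found %s" % sys.version.split()[0])
print("Python version OK:", sys.version.split()[0])
```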

As usual, if something is not working as expected, you know what to do.


Will We Ever See InfiniBand in Desktop Computers?

[Update: the post is not actually about InfiniBand as such; it is more about the convergence that the new Intel Omni Scale Fabric can bring :-) Please see the comment below the post, and thanks for your time!]

When I was a system administrator (and that was long ago!), I dreamed that all desktop computers in the world would use a single type of network technology. Of course, I wanted to minimise the amount of maintenance for myself, but it would have benefited our users, too.

The workstations of our engineers were connected to a Gigabit Ethernet switch, while the compute nodes of our clusters were connected with thick, inflexible black cables to an InfiniBand switch. The head nodes of the clusters were connected to both switches and acted as “routers” between the two network technologies. Wouldn’t it be better if all computers resided on one type of network, for example InfiniBand, which is so much faster than Gigabit Ethernet?

We even thought of switching our workstations to InfiniBand, but those expensive black cables with large, unwieldy connectors killed all the elegance of the idea. The simple unshielded twisted pair (UTP) cable used for our Gigabit Ethernet network seemed a much better alternative: it was thin, easy to handle, and we could terminate the ends of the cable with simple tools. One day I read a press release (from Mellanox, I believe) that speculated about the possibility of running InfiniBand over twisted pair, albeit at a much lower speed. Unfortunately, this never happened. To use InfiniBand in your desktop computer or workstation, you still need to buy an expensive InfiniBand network adapter, such as the one below. Luckily, the newer adapters from Mellanox can speak both 40 Gbps Ethernet and 56 Gbps FDR InfiniBand:

Single-port and dual-port InfiniBand adapters from Mellanox. New models can speak either FDR InfiniBand at 56 Gbps or 40 Gigabit Ethernet (auto-sensing or user-selectable). Note that in real life the beautiful chip is hidden under the heatsink. Image source: Mellanox.



SADDLE Presented at the ISC’14 Conference in Leipzig

SADDLE, the CAD tool for cluster supercomputer and data centre design, has just been presented at the ISC’14 conference in Leipzig, Germany. SADDLE can help you choose the best hardware by analysing its price/performance ratio; it will then design an interconnection network and a power supply system, place the equipment into racks and lay the racks out on the floor.

With SADDLE, you can design almost anything: HPC cluster, Hadoop cluster, web server farm, etc.

SADDLE stands for “Supercomputer And Data-centre Design LanguagE”. Below is an example design for 100 compute nodes, obtained with a simple script. Give SADDLE a try by supplying your own parameters, such as the number of compute nodes or the type of CPUs, and re-running the example script.
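
To give a flavour of that workflow, here is a minimal plain-Python sketch of the kind of design loop described above. It is not the SADDLE API: the candidate hardware list, the prices and the 36-nodes-per-rack figure are invented placeholders; only the roughly 0.52 TFLOPS per node is chosen to match the 100-node, 52 TFLOPS example shown below.

```python
# Illustrative sketch only: NOT the SADDLE API. A plain-Python mock-up of
# the workflow: pick hardware by price/performance, then size the machine.
# Candidate specs, prices and the 36-nodes-per-rack figure are placeholders.
candidates = [
    {"name": "node-A", "tflops": 0.52, "price": 7000},
    {"name": "node-B", "tflops": 0.45, "price": 6800},
]

nodes = 100  # number of compute nodes, as in the example design below

# Choose the configuration with the best (lowest) price per TFLOPS
best = min(candidates, key=lambda n: n["price"] / n["tflops"])

peak_tflops = best["tflops"] * nodes
node_cost = best["price"] * nodes
racks = -(-nodes // 36)  # ceiling division: assume 36 nodes fit in a rack

print(f"Chosen node:      {best['name']}")
print(f"Peak performance: {peak_tflops:.0f} TFLOPS")
print(f"Node subtotal:    ${node_cost:,}")
print(f"Racks needed:     {racks}")
```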

And did you know that economic characteristics (capital/operating costs) are automatically calculated, too?

Computer cluster designed by SADDLE, for 100 compute nodes (about 52 TFLOPS of peak performance)


Cost: The Biggest Pothole on the Exascale Road

Recently, industry analyst John Barr wrote at the ISC blog about “Potholes on the Road to Exascale”. He talks about a unified programming environment that should be able to support all sorts of future computing devices. That’s right: we need that. But only if we can actually build those future computing devices, because the cost of exascale is going to be prohibitively high.

In my previous blog post, “Exascale Supercomputers: Anything but Cheap”, I calculated that exascale computers, if built with today’s components, would cost around 6 billion dollars to build and operate. That is perhaps the biggest “pothole”, “obstacle” or “roadblock” on the road to exascale, whichever metaphor you prefer.

But the 6 billion dollar figure was based on the assumption that we take a compute node from Tianhe-2, which has two server-grade CPUs and three accelerator cards. The accelerators do the main floating-point-intensive work, the CPUs are mainly used to feed data into the accelerators and collect the results, and the motherboard physically connects the network adapter to the rest of the system.

You can realise significant savings if you integrate all the required hardware right onto the chip. Calxeda went as far as to integrate a memory controller, a SATA controller and even a 10 Gbit Ethernet network switch onto their chips; a tightly packed board then connects four such chips together. Image source: The Register.

What would be the benefit of getting rid of the CPUs and the network adapter and shifting this functionality directly into the accelerators? Significant savings!


Exascale Supercomputers: Anything but Cheap

Science and engineering both rely on a continuous increase in supercomputing performance. Back in 2009, it was believed that exascale machines would become available by 2018; nine years ahead seemed like a lot of time. No one knew how much power exascale systems would require, but the seminal 2011 report by 65 authors (Jack Dongarra et al., “The International Exascale Software Project roadmap”, PDF) put it this way: “A politico-economic pain threshold of 25 megawatts has been suggested (by the DARPA) as a working boundary”.

Somehow, people came to perceive 25 megawatts as a goal that is feasible to achieve, whereas in fact it was only “a pain threshold”: a figure that is still acceptable, but anything bigger would cause “mental pain” to the stakeholders.

Now, in 2014, it seems that both goals, the time frame (2018) and the expected power consumption (25 MW), will be missed. Analysts are cautiously talking about the year 2020, and discussions of power consumption are “omitted for clarity”, so to speak, because any serious discussion would quickly reveal how badly the power goal is going to be missed. Power consumption certainly makes a significant contribution to the operating cost of a supercomputer, but the biggest obstacle on the way to exascale is the capital (procurement) cost, and it is not being discussed at all.

When discussing the economics of the exascale era, people often talk about power consumption and the associated costs, but overlook that these represent less than one third of the total cost of ownership: merely “the tip of the iceberg”. Image by Uwe Kils (iceberg) and Wiska Bodo (sky). Source: Wikimedia Commons.
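
A back-of-the-envelope calculation shows why electricity alone cannot be the whole story. The 25 MW figure is the DARPA pain threshold quoted above; the electricity price of $0.10 per kWh and the five-year lifetime are my own rough assumptions:

```python
# Back-of-the-envelope: electricity cost of a 25 MW machine.
# The 25 MW figure is the DARPA "pain threshold" quoted above; the
# $0.10/kWh price and the 5-year lifetime are illustrative assumptions.
power_mw = 25
price_per_kwh = 0.10          # US dollars
hours_per_year = 365 * 24     # 8760
years = 5

annual_cost = power_mw * 1000 * hours_per_year * price_per_kwh
lifetime_cost = annual_cost * years

print(f"Annual electricity bill: ${annual_cost / 1e6:.0f} million")
print(f"Over {years} years:          ${lifetime_cost / 1e6:.0f} million")
# Roughly $22M per year, or about $110M over five years: small next to the
# multi-billion-dollar procurement cost discussed in these posts.
```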

Our community is like the Titanic: it is swiftly moving in the dark, allegedly towards its exascale goal. The passengers on the saloon deck, the scientists, are chatting about how nice it would be to have an exascale machine to solve many of the world’s pressing problems, and about how they will ask for funds to keep their countries competitive. Only some of them look out of the windows and notice the obstacle ahead, and then keep quiet so as not to ruin the delight.


Hungarian Goulash and Parallel Cooking

Most cooking recipes are formulated improperly: they list actions required to prepare food, but fail to explicitly mention which of those actions can be run in parallel. This leaves you guessing how much time you could save if you added a pair or two of spare hands to the cooking process, thereby reducing your “time-to-dinner”.

Today we will explore the cooking process for Hungarian Goulash, using a mix of these two recipes: 1, 2.
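The question of how much time spare hands would save can be made precise with a small dependency graph: with unlimited cooks, the shortest possible time-to-dinner equals the length of the longest chain of dependent steps (the critical path). The steps and durations below are invented for illustration; they are not taken from the recipes linked above.

```python
# Toy model of a recipe as a dependency graph. Steps and durations are
# invented for illustration; they are not taken from the linked recipes.
# With unlimited cooks, the minimum time-to-dinner equals the critical path.
steps = {          # step: (duration in minutes, prerequisites)
    "chop onions":   (10, []),
    "cube beef":     (15, []),
    "brown beef":    (20, ["cube beef", "chop onions"]),
    "peel potatoes": (15, []),
    "simmer":        (60, ["brown beef"]),
    "add potatoes":  (30, ["simmer", "peel potatoes"]),
}

finish = {}
def finish_time(step):
    """Earliest time a step can be finished, given unlimited cooks."""
    if step not in finish:
        duration, prereqs = steps[step]
        finish[step] = duration + max((finish_time(p) for p in prereqs), default=0)
    return finish[step]

serial = sum(d for d, _ in steps.values())     # one cook, no overlap
parallel = max(finish_time(s) for s in steps)  # critical path length
print(f"One cook: {serial} min, many cooks: {parallel} min, "
      f"saving {serial - parallel} min")
```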



Building Fat-Tree Networks with Ethernet Hardware

Fat-tree networks work very well with InfiniBand hardware. A fat-tree built with Ethernet switches, however, may not always work: the switches must be able to discover multiple paths in the topology and balance traffic among them.

For topology discovery (detecting multiple paths), the switches must support the OSPF protocol, or alternatively IS-IS or BGP. For balancing traffic among those paths, a standard called ECMP (equal-cost multi-path) must also be supported.

If your Ethernet switches are outdated and support the spanning tree protocol (STP) rather than OSPF and friends, they cannot detect multiple redundant paths. If they don’t support the ECMP standard, they cannot utilise those paths. Only one path will be selected and used; the remaining ones will be disabled, bandwidth and resilience will be limited, and the fat-tree magic will just not happen.
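
To make the ECMP part less abstract: a switch that supports ECMP typically hashes the fields that identify a flow and uses the result to pick one of the equal-cost next hops, so packets of one flow stay on a single path while different flows spread across all of them. The sketch below is a generic illustration of that idea in Python, not the hashing scheme of any particular switch.

```python
# Generic illustration of ECMP path selection: hash the flow's identifying
# fields and use the result to pick one of several equal-cost next hops.
# This mirrors the idea, not the exact hash of any particular switch ASIC.
import hashlib

def ecmp_next_hop(src_ip, dst_ip, src_port, dst_port, proto, next_hops):
    flow = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.md5(flow).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]

uplinks = ["spine-1", "spine-2", "spine-3", "spine-4"]  # equal-cost paths
# Packets of the same flow always take the same uplink...
print(ecmp_next_hop("10.0.0.1", "10.0.1.7", 49152, 5201, "tcp", uplinks))
# ...while different flows are spread across all uplinks.
print(ecmp_next_hop("10.0.0.2", "10.0.1.9", 49153, 5201, "tcp", uplinks))
```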

Dell Networking Z9500 Ethernet switch with 132 QSFP+ ports of 40 Gigabit Ethernet. It supports all the standards required to build fat-tree networks. Source: Dell.

This alphabet soup of protocols is not easy to navigate (and why would one want to navigate a soup?), so there is now a page that simply lists Ethernet hardware that is known to support fat-tree topologies out of the box: Fat-Trees with Ethernet Switches. You can also get there through the site menu.

Currently it lists switches made by Mellanox and Dell. If you have comments or updates, please leave them on that page.


648-Port InfiniBand FDR Switches Added to the Database

It’s Christmas time… Just a few hours, and the year 2013 will become a thing of the past. Gone with it will be the outdated InfiniBand QDR hardware that was, until today, used in the fat-tree design tool.

I have updated the database to use the current Fourteen Data Rate (FDR) switches. These include the 36-port, 108-port and 648-port switch models from Mellanox Technologies.

Prices for all parts (chassis, leaf modules, spine modules and management modules) were taken from Colfax Direct, the web shop of Colfax International. Disclaimer: I don’t work for Colfax, but I like what they do.

This is the first time the tool includes 648-port switches, and this opens up the possibility of designing huge networks. And when I say huge, I mean it: theoretically, with switches that have P = 648 ports, you can design fat-tree networks that support up to P²/2 = 209,952 nodes, while the largest supercomputers on Earth as of now are “Tianhe-2” with its 16,000 compute nodes and “Titan” with 18,688 nodes. Now, what do you think: will 648 ports in a switch be “enough for anybody”?
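
The P²/2 figure follows from the standard two-level fat-tree construction: each edge switch dedicates half of its P ports to nodes and half to the spine, and P/2 spine switches can interconnect up to P edge switches, giving P · P/2 nodes in total. A few lines of Python reproduce the numbers quoted above (the 36-, 108- and 648-port sizes are the switch models mentioned in this post):

```python
# Maximum number of nodes in a two-level fat-tree built from P-port switches.
# Each edge switch dedicates P/2 ports to nodes and P/2 to the spine, and
# P/2 spine switches can interconnect up to P edge switches: P * P/2 nodes.
def max_nodes_two_level(ports):
    return ports * ports // 2

for p in (36, 108, 648):
    print(f"{p}-port switches: up to {max_nodes_two_level(p):,} nodes")
# 648-port switches allow 209,952 nodes, far beyond Tianhe-2 (16,000 nodes)
# or Titan (18,688 nodes).
```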

Mellanox SX6536 InfiniBand Director Switch with 648 ports, tall and beautiful. Ought to be enough for anybody. Source: Mellanox.

But wait, there is more. Large modular switches may have a higher price per port than simple 36-port switches (in fact, more than 3 times higher), but using them can still prove more cost-effective. How can that be?
