648-Port InfiniBand FDR Switches Added to the Database

It’s Christmas time… In just a few hours, the year 2013 will become a thing of the past. Gone with it will be the outdated InfiniBand QDR hardware that was, until today, used in the fat-tree design tool.

I updated the database to use current Fourteen Data Rate (FDR) switches. These include 36-port, 108-port and 648-port switch models from Mellanox Technologies.

Prices for all parts (chassis and leaf, spine and management modules) were taken from Colfax Direct, a web shop of Colfax International. Disclaimer: I don’t work for Colfax, but I like what they do.

This is the first time I have included 648-port switches in the tool, and it opens up the opportunity to design huge networks. And when I say huge, I mean it. Theoretically, with switches that have P=648 ports, you can design fat-tree networks that support up to P²/2=209,952 nodes, while the largest supercomputers on earth as of now are “Tianhe-2” with its 16,000 compute nodes and “Titan” with 18,688 nodes. Now, what do you think: 648 ports in a switch “ought to be enough for anybody”?
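To see where the P²/2 figure comes from, here is a minimal Python sketch. It assumes a non-blocking two-level fat tree in which every edge switch gives half of its ports to nodes and half to uplinks, and every core switch connects once to each edge switch. The function name `max_nodes_two_level` is made up for this illustration and is not part of the design tool.

```python
def max_nodes_two_level(ports: int) -> int:
    """Upper bound on nodes in a non-blocking two-level fat tree.

    Assumption: each edge switch uses half its ports for nodes and half
    for uplinks, and each core switch links to every edge switch once,
    so there can be at most `ports` edge switches.
    """
    if ports <= 0 or ports % 2 != 0:
        raise ValueError("expected a positive, even port count")
    nodes_per_edge_switch = ports // 2
    max_edge_switches = ports
    return nodes_per_edge_switch * max_edge_switches


if __name__ == "__main__":
    for p in (36, 648):
        print(f"P={p}: up to {max_nodes_two_level(p):,} nodes")
```

With P=648 this gives the 209,952 nodes quoted above. With P=36 it gives 648, the port count of the big director switch.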

Mellanox SX6536 InfiniBand Director Switch with 648 ports, tall and beautiful. Ought to be enough for anybody. Source: Mellanox.


But wait, there is more. Large modular switches may have a higher price per port than simple 36-port switches (in fact, more than 3 times higher), but using them can still prove more cost-effective. How come? Suppose you want to design a fat-tree network for N=600 nodes. With 36-port switches, you need 34 switches on the edge level plus 18 more on the core level, for a total of 52, plus lots of messy (and costly) cables between the two layers. The total cost of this network is $640,880 (you can see costs for all possible configurations of switches by using the “Show database” button in the tool).
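The switch counts for the 36-port design can be reproduced with a few lines of Python. This is only a sketch under a simple sizing rule, not the tool's actual algorithm: each edge switch serves P/2 nodes, and each of its P/2 uplinks goes to a different core switch, so there are P/2 core switches. The function name and the returned field names are invented for this example.

```python
import math


def two_level_fat_tree(nodes: int, ports: int = 36) -> dict:
    """Size a non-blocking two-level fat tree (illustrative rule only)."""
    down_ports = ports // 2           # edge ports facing the nodes
    up_ports = ports - down_ports     # edge ports facing the core
    edge = math.ceil(nodes / down_ports)
    if edge > ports:
        raise ValueError("too many nodes for two levels of these switches")
    core = up_ports                   # assumption: one uplink per core switch
    return {
        "edge": edge,
        "core": core,
        "total": edge + core,
        "inter_switch_cables": edge * up_ports,
    }


if __name__ == "__main__":
    print(two_level_fat_tree(600, 36))
```

For N=600 and P=36 this yields 34 edge switches, 18 core switches and 52 in total, matching the numbers above. The cable count it reports (612 links between the layers) is this sketch's own arithmetic under the stated assumption.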

With large switches, you would have only one 612-port switch (a partially-populated 648-port model), and no cables between the layers — those connections are implemented inside the switch. The cost would be $632,590 — which is lower. The costs of the two solutions are similar since 648-port switches actually implement the same two-layer fat-tree network, but within one chassis rather than distributed across your whole machine room.

Technical characteristics are also in favour of large switches. Here is a complete comparison:

Connecting N=600 nodes

                     With many 36-port switches   With one 612-port switch
Cost                 $640,880                     $632,590
Power                10,764 W                     8,822 W
Weight               416 kg                       322 kg
Size of equipment    52U                          31U
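To put the comparison in relative terms, the following short Python sketch computes how much lower each figure is for the single large switch. The numbers are copied from the comparison above; the dictionary keys and the function name `relative_savings` are chosen just for this illustration.

```python
# Figures from the N=600 comparison above.
many_small = {"cost_usd": 640_880, "power_w": 10_764, "weight_kg": 416, "size_u": 52}
one_large = {"cost_usd": 632_590, "power_w": 8_822, "weight_kg": 322, "size_u": 31}


def relative_savings(baseline: dict, alternative: dict) -> dict:
    """Percentage by which `alternative` is lower than `baseline`, per metric."""
    return {
        key: 100.0 * (baseline[key] - alternative[key]) / baseline[key]
        for key in baseline
    }


savings = relative_savings(many_small, one_large)

if __name__ == "__main__":
    for metric, pct in savings.items():
        print(f"{metric}: {pct:.1f}% lower")
```

The large switch comes out roughly 1% cheaper, while drawing about 18% less power, weighing about 23% less and taking about 40% less rack space.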

You can try the fat-tree design tool online or download a copy for your own use. Networks built with 648-port switches do not always cost less to buy, but they are definitely easier to build and maintain, which reduces the long-term total cost of ownership (fewer faults, less downtime, less disappointment). You can play with the switch database in the tool, for example by commenting out the 36-port switches, to force the tool to use larger switches even if this results in a more expensive network.

If there are any questions or anything not working as expected, please leave your feedback.
