Good news from the Supercomputing-2012 (SC12) conference: ten collaborators (including a talented team led by Dr. Dhabaleswar K. Panda) presented a paper on a new approach for assigning processes to compute nodes in InfiniBand networks.
Roughly, it works as follows: a plugin for the OpenSM subnet manager retrieves the network topology by querying switches, and the topology information is passed to the MPI library (MVAPICH2). The MPI library then ensures that MPI processes that need to exchange large volumes of data are placed on physically close compute nodes (that is, nodes separated by the minimum number of switch hops) or, ideally, on a single compute node.
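To make the placement idea concrete, here is a minimal Python sketch. It is not the algorithm from the paper or anything taken from MVAPICH2: the greedy heuristic, the two-level tree with hop costs of 0, 1 and 3, and all names (`hop_count`, `place`, `total_cost`, `switch_of`) are assumptions made up for illustration. It simply puts the heaviest-communicating process pairs as close together as free slots allow.

```python
def hop_count(node_a, node_b, switch_of):
    """Hypothetical cost model for a two-level tree:
    0 for the same node, 1 for the same leaf switch, 3 otherwise
    (leaf switch -> spine switch -> leaf switch)."""
    if node_a == node_b:
        return 0
    if switch_of[node_a] == switch_of[node_b]:
        return 1
    return 3


def place(num_procs, traffic, nodes, slots_per_node, switch_of):
    """Greedy sketch: visit process pairs from heaviest to lightest traffic
    and put each unplaced process on the free node closest to its partner."""
    free = {n: slots_per_node for n in nodes}
    placement = {}

    def assign(proc, partner=None):
        candidates = [n for n in nodes if free[n] > 0]
        if not candidates:
            raise ValueError("more processes than available slots")
        if partner in placement:
            # Partner already has a node: minimise hops to it.
            target = placement[partner]
            best = min(candidates, key=lambda n: hop_count(n, target, switch_of))
        else:
            # No anchor yet: pick the emptiest node so the partner can join.
            best = max(candidates, key=lambda n: free[n])
        placement[proc] = best
        free[best] -= 1

    for (a, b), _volume in sorted(traffic.items(), key=lambda kv: -kv[1]):
        for proc, partner in ((a, b), (b, a)):
            if proc not in placement:
                assign(proc, partner)

    # Processes that appear in no traffic entry still need a slot.
    for proc in range(num_procs):
        if proc not in placement:
            assign(proc)
    return placement


def total_cost(placement, traffic, switch_of):
    """Sum of (traffic volume x hop count) over all communicating pairs."""
    return sum(volume * hop_count(placement[a], placement[b], switch_of)
               for (a, b), volume in traffic.items())


# Made-up example: four nodes on two leaf switches, two slots per node.
nodes = ["n0", "n1", "n2", "n3"]
switch_of = {"n0": "s0", "n1": "s0", "n2": "s1", "n3": "s1"}
traffic = {(0, 1): 100, (2, 3): 90, (0, 2): 1}

placement = place(4, traffic, nodes, 2, switch_of)
round_robin = {rank: nodes[rank] for rank in range(4)}
print(placement)
print(total_cost(placement, traffic, switch_of),
      total_cost(round_robin, traffic, switch_of))
```

In this toy setup the two heavily communicating pairs each end up sharing a node, so only the light traffic between ranks 0 and 2 crosses a switch. A real implementation would work from the topology discovered by the subnet manager and from the application's measured communication pattern.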
The method reduces the execution time of parallel applications by 6% to 15%, depending on the application and the number of MPI processes in a job; large-scale jobs appear to benefit more from topology-aware placement.
Because the new topology discovery scheme can convert both fat-tree and torus topologies into tree-based representations, the technique should also work for multi-dimensional torus networks, including the 3D and 5D tori in IBM BlueGene systems. Initial results from the presentation seem to indicate this. However, BlueGene machines use a proprietary network and have no InfiniBand-compatible subnet manager, so another mechanism will be required to retrieve topology information there. Further exploration of torus-based networks is an area of future work for the team.
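For readers unfamiliar with tori, the sketch below shows what "physically close" means in such a network. It is only the textbook shortest-path hop count between two coordinates on a torus with wrap-around links, not the tree conversion described in the paper; the function name `torus_hops` and the example dimensions are assumptions chosen for illustration.

```python
def torus_hops(coord_a, coord_b, dims):
    """Minimum number of hops between two nodes of a torus.

    Each dimension wraps around, so along every axis the shorter of the
    direct path and the wrap-around path is taken."""
    if not (len(coord_a) == len(coord_b) == len(dims)):
        raise ValueError("coordinates and dims must have the same length")
    hops = 0
    for a, b, size in zip(coord_a, coord_b, dims):
        direct = abs(a - b)
        hops += min(direct, size - direct)
    return hops


# Made-up 4 x 4 x 4 torus: the wrap-around link makes (0,0,0) and (3,0,0) neighbours.
print(torus_hops((0, 0, 0), (3, 0, 0), (4, 4, 4)))
print(torus_hops((0, 0, 0), (2, 2, 2), (4, 4, 4)))
```

The same function works for five dimensions by passing five-element tuples. A placement routine could use such a distance in place of switch hop counts, although how the authors actually handle tori is described in the paper itself.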
Read the full paper here:
“Design of a Scalable InfiniBand Topology Service to Enable Network-Topology-Aware Placement of Processes”, by Hari Subramoni, Sreeram Potluri, Krishna Kandalla, Bill Barth, Jerome Vienne, Jeff Keasler, Karen Tomko, Karl Schulz, Adam Moody, Dhabaleswar Panda (PDF).