Storage Vendors, You Would Upset Henry Ford

Certain storage vendors are allegedly selling replacement hard disk drives for their storage systems at inflated prices. I saw people complaining that their storage system will not accept a hard disk drive entirely identical to that found in their servers. The only difference was the drive’s firmware: if it is not recognised by the storage system as a field-replaceable unit (FRU) approved by the storage vendor, the drive will not work.

So, why do they do this? Nothing personal, just business: this way they can lower the procurement costs of a storage system to make it look more attractive to a customer than competitors’ systems, and the “lost revenue” is recouped during the inevitable maintenance and upgrades.

Henry Ford in 1919. Image from Wikimedia Commons, source: US Library of Congress

This is what Henry Ford had to say about a similar situation in the automotive market in his book “My Life and Work” in 1922 (Chapter 2):

[if the automobile] broke down and had to have parts replaced, then that was just hard luck for the owner. It was considered good business to sell parts at the highest possible price on the theory that, since the man had already bought the car, he simply had to have the part and would be willing to pay for it.

Seems unfair, both for automobiles and hard disk drives. But I have also heard that many companies do not engage in such practices. I wonder if there are any statistics on this topic.


    Parallel NFS (pNFS): the War Is Close

    Every Unix or Linux system administrator knows NFS, the Network File System, invented back in 1984. Its notable feature is a very simple syntax:

    mount server.example.com:/exported/tree /example/mountpoint

    This will mount the server’s directory “/exported/tree” on a local machine, making it accessible by the path “/example/mountpoint”. The server — server.example.com — also has to be configured, but it is not difficult, either. For many years, this setup was enough for many smaller cluster supercomputers, and it is still in use now (some parameters can be tweaked to make it more stable and faster). However, for the bigger machines, this presents a bottleneck, because there is only one server, and the whole system is limited by its performance.
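The single-server bottleneck can be illustrated with simple arithmetic. The sketch below assumes the server's total bandwidth is shared evenly among all clients that are active at the same time; the function name and the 1000 MB/s figure are hypothetical and chosen only for illustration.

```python
def per_client_bandwidth(server_mb_s: float, clients: int) -> float:
    """Fair share of one server's bandwidth when all clients are active."""
    if clients < 1:
        raise ValueError("need at least one client")
    return server_mb_s / clients


# Hypothetical figure: a server that can deliver 1000 MB/s in total.
for n in (1, 10, 100, 1000):
    share = per_client_bandwidth(1000.0, n)
    print(f"{n:5d} clients -> {share:8.1f} MB/s each")
```

Under these assumptions, each client's share shrinks in proportion to the number of compute nodes, however fast the nodes themselves are.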

    Several workarounds were adopted over the years, finally resulting in truly parallel file systems such as Lustre and Panasas (which are not so different, by the way). However, Lustre is difficult to set up and maintain, while Panasas is easier to use and has more functionality, but is a proprietary solution resulting in vendor lock-in.

    On the other hand, both solutions can provide tremendous speed: for example, Lustre was demonstrated to provide a bandwidth of 1.3 terabytes per second for the IBM Sequoia supercomputer. The question is: can we have high speed, easy syntax and open source all together? The answer is parallel NFS (pNFS).


      Memory Bandwidth for Intel Xeon Phi (And Friends)

      John D. McCalpin, Ph.D., informally known as “Dr. Bandwidth” for his invention of the STREAM memory bandwidth benchmark, posted STREAM results for Intel Xeon Phi and two Xeon-based servers made by Dell (see the end of his blog entry). All three devices are used in the “Stampede” computer at TACC.

      Intel Xeon Phi achieved memory bandwidth of roughly 161829 MBytes/second — not bad, to say the least! It is this product that I called “monstrous” 6 months ago. I decided to turn the numbers into a graphical form. Also present on the graphs are results of the aforementioned 2xCPU and 4xCPU Intel Xeon-based servers used as compute nodes in “Stampede”. First, total memory bandwidth per device: per Xeon Phi board, or per Xeon-based server:

      Memory bandwidth per device. MBytes/sec

      In this test, the number of OpenMP threads used by STREAM was equal to the number of cores on the computing device. As can be seen, the Intel Xeon Phi board supplies a lot of bandwidth to its cores. The high-end dual-socket and quad-socket servers are lagging behind in total bandwidth, and the quad-socket server is only marginally better than its dual-socket peer.


        Fat-tree and Torus Articles Now Available at arXiv.org

        In case you wanted something more than an informal introduction to the world of fat-tree and torus networks, here you are. A formal problem statement (and solution!) in an academic form is given in these two articles hosted at arXiv:

        K.S. Solnushkin, “Automated Design of Two-Layer Fat-Tree Networks”
        (arXiv:1301.6179)

        K.S. Solnushkin, “Automated Design of Torus Networks”
        (arXiv:1301.6180)

        arXiv is the leading open access repository of scientific papers.


          Cluster Design Tools Updated (ver. 0.8.1a)

          A new version of Cluster Design Tools was made available for download today. The change concerns the UPS sizing algorithm, improving it further.


            Real Cost Comparison of Fat-tree and Torus Networks

            Thanks to Mellanox Technologies, our tool that designs fat-tree and torus networks now operates with real-life prices for InfiniBand hardware. Mellanox kindly provided list prices for the previous generation of switches, InfiniBand QDR: these figures are not likely to change. The three switches are: Grid Director™ 4036 (36 ports), IS5100 Chassis Switch (18 to 108 ports) and IS5200 Chassis Switch (18 to 216 ports).

            Remember two things: (a) prices are for QDR InfiniBand hardware; for the most recent prices and the current InfiniBand FDR hardware, please contact Mellanox; (b) you can always download the tool and supply it with your own prices.

            The main advantage of our fat-tree design tool is that it tries all possible configurations of modular (chassis) switches, including those where only some line cards are installed, and hence recommends the most cost-efficient designs. Additionally, you can tell the tool whether your network must be expandable in the future, and to how many hosts.

            I decided to use the newly available prices to objectively compare costs of the following networks: non-blocking and 2:1 blocking fat-trees and torus networks.

            Cost comparison of fat-tree and torus networks with up to 3,888 nodes
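
As a rough illustration of where a limit such as 3,888 nodes can come from, here is a minimal sketch. It assumes a two-layer fat-tree built from 36-port edge switches and 216-port core switches (the port counts quoted above), with exactly one link between every edge switch and every core switch. The function name and the blocking parameter are hypothetical and are not taken from the design tool itself.

```python
def max_nodes_two_layer(edge_ports: int, core_ports: int, blocking: int = 1) -> int:
    """Upper bound on hosts in a two-layer fat-tree (illustrative assumption).

    blocking=1 means non-blocking (as many uplinks as downlinks);
    blocking=2 means 2:1 blocking (twice as many downlinks as uplinks).
    """
    if edge_ports % (blocking + 1) != 0:
        raise ValueError("edge ports do not split evenly for this blocking factor")
    uplinks = edge_ports // (blocking + 1)
    downlinks = edge_ports - uplinks
    # With one link per edge/core pair, every port of a core switch goes to a
    # different edge switch, so a core switch's port count caps the edge layer.
    max_edge_switches = core_ports
    return max_edge_switches * downlinks


print(max_nodes_two_layer(36, 216))     # 3888 (non-blocking)
print(max_nodes_two_layer(36, 216, 2))  # 5184 (2:1 blocking)
```

Under these assumptions the non-blocking case gives 216 edge switches with 18 hosts each, which matches the 3,888-node upper end of the comparison.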


              Latency Everywhere

              People from the high-performance computing field clearly understand that the performance of technical systems of all kinds is characterized by two metrics: throughput and latency. People in other fields sometimes focus on either throughput or latency alone.

              MV Colombo Express, one of the largest container ships in the world. Image source: Wikipedia

              For many years it has been a well-accepted truth in the HPC field that latency in the computer network matters. But with the proliferation of Gigabit Ethernet on desktops, people started to expect their network connections to deliver high bandwidth. It quickly turned out that high network latency can easily limit the achievable bandwidth. That’s how ordinary people learned about bandwidth, latency, and their interdependence.
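
One common way in which latency caps bandwidth is window-based flow control, as used by TCP: a sender may have only a limited amount of unacknowledged data in flight, so it cannot send more than one window per round trip. The sketch below is a back-of-the-envelope estimate under that assumption; the function name and the 64 KiB window are illustrative choices, not measurements.

```python
def max_throughput_mbit_s(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on throughput when at most window_bytes may be in flight."""
    if rtt_ms <= 0:
        raise ValueError("round-trip time must be positive")
    bits_per_second = window_bytes * 8 / (rtt_ms / 1000.0)
    return bits_per_second / 1e6


# Hypothetical 64 KiB window at several round-trip times.
for rtt in (0.1, 1.0, 10.0, 100.0):
    limit = max_throughput_mbit_s(65536, rtt)
    print(f"RTT {rtt:6.1f} ms -> at most {limit:9.1f} Mbit/s")
```

With these numbers, a 10 ms round trip already limits a single connection to roughly 52 Mbit/s, far below what a Gigabit Ethernet link can carry.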


                “I’ll Start My Own Supercomputer Conference”

                Remember that moment from Futurama when Robot Bender promises to set up his own theme park? Forget about the theme park, because it seems that we now have several supercomputer conferences. Why? And does it do us any good?

                I am a member of the ACM, and in their recent mailing I received news about upcoming events in the field of high-performance computing. What caught my attention was a not-so-major event that nevertheless proudly called itself the “International Conference on Supercomputing”.


                  Finally, A Topology-Aware MPI Implementation

                  Good news from the Supercomputing-2012 (SC12) conference: ten collaborators (including a talented team led by Dr. Dhabaleswar K. Panda) presented a paper on a new approach for assigning processes to compute nodes in InfiniBand networks.

                  Roughly, it works as follows: a plugin for OpenSM subnet manager retrieves a network topology by querying switches, topology information is passed to the MPI library (MVAPICH2), and finally the MPI library ensures that MPI processes that need to exchange large volumes of data are placed onto physically close compute nodes — i.e., within the minimum number of switch hops — or, ideally, within a single compute node.
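
The benefit of such placement can be illustrated with a toy cost model. This is not the algorithm from the paper: it is a minimal sketch that assumes a two-level tree (leaf switches under one spine layer) and scores a placement as the sum of data volume times switch hops. All names, node labels and traffic figures are hypothetical.

```python
def switch_hops(node_a: str, node_b: str, leaf_of: dict) -> int:
    """Switch hops between two nodes in a toy two-level tree."""
    if node_a == node_b:
        return 0  # same compute node: no network traffic at all
    if leaf_of[node_a] == leaf_of[node_b]:
        return 1  # both nodes hang off the same leaf switch
    return 3      # leaf -> spine -> leaf


def placement_cost(traffic: dict, placement: dict, leaf_of: dict) -> int:
    """Sum of (data volume x switch hops) over all communicating rank pairs."""
    return sum(
        volume * switch_hops(placement[p], placement[q], leaf_of)
        for (p, q), volume in traffic.items()
    )


# Hypothetical cluster: four nodes on two leaf switches.
leaf_of = {"n0": "leafA", "n1": "leafA", "n2": "leafB", "n3": "leafB"}
# Hypothetical traffic between MPI ranks: pairs (0,1) and (2,3) talk a lot.
traffic = {(0, 1): 100, (2, 3): 100, (0, 2): 1}

naive = {0: "n0", 1: "n2", 2: "n1", 3: "n3"}  # heavy pairs split across leaves
aware = {0: "n0", 1: "n1", 2: "n2", 3: "n3"}  # heavy pairs share a leaf switch

print(placement_cost(traffic, naive, leaf_of))  # 601
print(placement_cost(traffic, aware, leaf_of))  # 203
```

In this toy model, keeping the heavily communicating ranks under the same leaf switch cuts the weighted hop count to about a third.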

                  The method reduces the execution time of parallel applications by 6% to 15%, depending on the application and the number of MPI processes in a job: large-scale jobs appear to benefit more from the topology-aware placement.

                  As the new topology discovery scheme is capable of converting both fat-tree and torus topologies into tree-based representations, this technique should work for multi-dimensional torus networks as well, including 3D and 5D tori in IBM BlueGene. Initial results from the presentation seem to indicate this. As BlueGenes feature a proprietary network and don’t have an InfiniBand-compatible subnet manager, another mechanism will be required to retrieve topology information there. Further exploration of torus-based networks is an area of future work for the team.

                  Read the full paper here:

                  “Design of a Scalable InfiniBand Topology Service to Enable Network-Topology-Aware Placement of Processes”, by Hari Subramoni, Sreeram Potluri, Krishna Kandalla, Bill Barth, Jerome Vienne, Jeff Keasler, Karen Tomko, Karl Schulz, Adam Moody, Dhabaleswar Panda (PDF).


                    How To Size Your UPS System: More Power, Igor!

                    We now have a tool to size UPS systems! Find it in the menu above, or use this link.

                    Battery shelves for the biggest (65 MW) UPS system in the world: Battery Electric Storage System (BESS) in Fairbanks, Alaska. Image source: GVEA.

                    (As for the headline, Igor is a character in Tom Holt’s novel, “Igor”)
