There are a lot of good research projects going on in Europe: if you haven’t heard about them, it is simply because they are not receiving the media attention they deserve. One such project is the Fraunhofer parallel file system (FhGFS), which has been developed by the Institute for Industrial Mathematics, part of the Fraunhofer Society, since 2005.
There have been several major FhGFS installations over the last few years (and more are coming), but this year, 2013, appears to be a defining moment for the project, marked by a substantial increase in download counts. The latest presentation, at SC13 in the USA, also attracted attention (you can fetch the slides from the project’s website, FhGFS.com).
To learn what makes FhGFS so special, I decided to contact Sven Breuner, the team lead for the FhGFS project. The conversation quickly turned into “an interview by e-mail”, which I present to you below.
If you are designing a supercomputer or a data centre and know the number of racks that you need to place on the floor, you can easily calculate the required floor space using the new floor planning module, which has been available in the cluster design tool set since version 0.8.2. The tool takes into account the rack dimensions and clearances that you specify. Read more about floor planning here and download your copy of the software.
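I don’t know the exact formula the floor planning module uses, so the following is only a hypothetical sketch of the kind of calculation involved: racks stand side by side in rows, and each row needs room for the racks plus the clearances you specify. The function name, the row layout and the rule that clearances are not shared between rows are all assumptions.

```python
import math

def floor_space_m2(n_racks, racks_per_row, rack_width_m, rack_depth_m,
                   front_clearance_m, rear_clearance_m, side_clearance_m):
    """Rough floor area (m^2) for racks placed in straight rows.

    Hypothetical model: racks stand side by side in rows; every row gets its
    own front, rear and end clearances (aisles are not shared between rows,
    so hot/cold-aisle layouts will need less space than this estimate).
    """
    if n_racks <= 0 or racks_per_row <= 0:
        raise ValueError("n_racks and racks_per_row must be positive")
    n_rows = math.ceil(n_racks / racks_per_row)
    # Total length of all rows: rack widths plus clearance at both ends of each row.
    total_row_length = n_racks * rack_width_m + n_rows * 2 * side_clearance_m
    # Depth of a row: the rack itself plus front and rear service clearances.
    row_depth = rack_depth_m + front_clearance_m + rear_clearance_m
    return total_row_length * row_depth
```

For example, 10 racks of 0.6 × 1.2 m arranged five per row, with 1.2 m in front, 1.0 m behind and 0.5 m at each row end (illustrative figures), come to about 27.2 m² with this model.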
Colfax International is a US-based IT equipment and solution provider. What makes them special is that their online shop, Colfax Direct, lists thousands of items ready to ship, and all prices are published online, with no dumb “request a free quote” buttons.
Another nice feature of Colfax International is that they have a real, customer-facing research division, which is also uncommon among solution providers. In January 2013, Andrey Vladimirov of Stanford University and Vadim Karpusenko of Colfax International tested the then-new Intel Xeon Phi coprocessors on the N-body problem and summarised their results in an article.
The N-body problem concerns the motion of a large number of objects that interact through gravitational forces, and it is widely used in astronomy. The above-mentioned article describes a couple of code optimisations that significantly speed up the computation, but you had better take a look at the animation below, which says it all. The idea is that with more Intel Xeon Phi coprocessors, you can calculate a longer sequence of particle interactions in a given time. And you do want a fast machine to calculate such things for you if an asteroid is on a collision course with the Earth and you intend to break it up into pieces.
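To make the computational load concrete, here is a minimal direct-summation sketch in plain Python: each body’s acceleration is the sum of the pulls of all the others, so every time step costs O(N²) pair interactions, which is the kind of work that benefits from more coprocessors. It is a textbook illustration, not the optimised code from the Colfax article; the softening length, the function names and the simple semi-implicit Euler step are my assumptions.

```python
import math

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2

def accelerations(pos, mass, softening=1e-3):
    """Direct O(N^2) summation of gravitational accelerations.

    pos: list of (x, y, z) tuples in metres; mass: list of masses in kg.
    The softening length (an assumption) avoids division by zero.
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    eps2 = softening * softening
    for i in range(n):
        xi, yi, zi = pos[i]
        for j in range(n):
            if i == j:
                continue
            dx = pos[j][0] - xi
            dy = pos[j][1] - yi
            dz = pos[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            s = G * mass[j] * inv_r3  # body j pulls body i towards itself
            acc[i][0] += s * dx
            acc[i][1] += s * dy
            acc[i][2] += s * dz
    return acc

def step(pos, vel, mass, dt):
    """Advance all bodies by one time step (semi-implicit Euler)."""
    acc = accelerations(pos, mass)
    new_vel = [tuple(v[k] + a[k] * dt for k in range(3)) for v, a in zip(vel, acc)]
    new_pos = [tuple(p[k] + v[k] * dt for k in range(3)) for p, v in zip(pos, new_vel)]
    return new_pos, new_vel
```

The double loop over i and j is what makes the problem so demanding: doubling the number of bodies roughly quadruples the work per step.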
What interested me in the video is the graph of N-body simulation performance on a server equipped with up to eight Intel Xeon Phi coprocessors (see the screenshot below, taken 1:00 into the video).
This performance data is valuable, but we can make it even more valuable if we combine it with equipment costs.
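One way to combine performance with equipment costs is to divide the measured throughput of each configuration by its price. The sketch below assumes a simple linear cost model (a fixed price for the host server plus a fixed price per card); the class and function names are hypothetical, and you would plug in the performance figures from the graph and real prices yourself.

```python
from dataclasses import dataclass

@dataclass
class Configuration:
    coprocessors: int     # number of Xeon Phi cards in the server
    performance: float    # measured throughput, e.g. read off the graph

def server_cost(base_usd, card_usd, coprocessors):
    """Hypothetical cost model: bare server price plus a fixed price per card."""
    return base_usd + card_usd * coprocessors

def rank_by_value(configs, base_usd, card_usd):
    """Return (coprocessors, performance per dollar) pairs, best value first."""
    ranked = [
        (c.coprocessors,
         c.performance / server_cost(base_usd, card_usd, c.coprocessors))
        for c in configs
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

The fixed cost of the host server is what makes this interesting: it is spread over more cards as you add them, so a fully populated server may offer better value than one with a single card even if performance scales slightly less than linearly. With a zero base price, the ranking depends only on how well performance scales.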
The headline may sound obvious: of course, if you can make your own CPUs for your projects, you don’t have to rely on CPU manufacturers. “But wait”, you might ask, “aren’t CPU design and manufacturing very expensive?”
Yes, they are. Wikipedia explains that a project to design a high-end CPU can easily cost US $100 million and require 400 man-years. $100M is a huge, mind-boggling sum. But everything should be put into context:
Now, the $100M figure doesn’t sound that big, does it? And just for more context: Intel’s revenue in 2012 was US $53,341 million, which equals the cost of 533 “typical” CPU design projects.
Tianhe-2, the fastest supercomputer in the world as of today, contains 32,000 Intel Ivy Bridge chips and 48,000 Intel Xeon Phi chips. Photo by Xinhua/Long Hongtao, via China Radio International’s English Service.
So it is not really that expensive, but it is difficult to organise: you need to gather one or two hundred top-notch engineers from many different fields, put them together, and keep them motivated through work that can take many years before the first “palpable” results are available.
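The arithmetic behind these figures is easy to check. The sketch below uses the numbers from the text (US $53,341 million revenue, US $100 million and 400 man-years per project) and optimistically assumes the work splits perfectly across a team of one or two hundred engineers; the function names are mine.

```python
def cpu_projects_funded(revenue_musd, project_cost_musd=100):
    """How many 'typical' CPU design projects a given revenue would pay for."""
    return revenue_musd // project_cost_musd

def project_duration_years(man_years=400, engineers=200):
    """Calendar time if the work splits perfectly across the team (optimistic)."""
    return man_years / engineers

print(cpu_projects_funded(53_341))       # 533 (Intel's 2012 revenue, US$ millions)
print(project_duration_years(400, 100))  # 4.0
print(project_duration_years(400, 200))  # 2.0
```

Real projects rarely split this cleanly, so these durations are lower bounds, which is consistent with the “many years” mentioned above.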
The idea of using graphics hardware to perform computations dates back to 1978. AMD, however, claims that it was they who “kicked off the GPGPU revolution” in November 2006. What really matters is that standardisation is what allowed mass participation in this movement. NVIDIA’s API for GPGPU, called CUDA, was made available in February 2007 and is currently the de facto standard.
The practical use of GPU computing poses two problems. First, you have to learn the corresponding API (CUDA or OpenCL), or at least its bindings for your programming language: see bindings for CUDA and for OpenCL. Second, you have to deal with memory transfers: every piece of data that you want to process must be copied to GPU memory, and after processing, the result must be copied back to your regular (CPU) memory:
Processing flow on CUDA, which may soon become obsolete. Image by Wikipedia user Tosaka. Source: Wikimedia Commons.
These transfers are tedious to program and distract software developers from their main task: making their software parallel. But transfers may soon become obsolete, thanks to the technology proposed by AMD.
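The four-step flow in the figure (copy the input to GPU memory, instruct the GPU, let it process the data, copy the result back) can be simulated without a GPU to show where the overhead comes from: every element crosses the bus twice. This is a hypothetical illustration; SimulatedDevice and scale_on_device are invented names, and in real CUDA code the copies would be cudaMemcpy calls with cudaMemcpyHostToDevice and cudaMemcpyDeviceToHost.

```python
BYTES_PER_ELEMENT = 8  # assume double-precision values

class SimulatedDevice:
    """Hypothetical stand-in for GPU memory; no real GPU is involved.

    It only counts the bytes that would cross the host/device bus.
    """
    def __init__(self):
        self.bytes_transferred = 0

    def copy_to_device(self, host_data):
        self.bytes_transferred += BYTES_PER_ELEMENT * len(host_data)
        return list(host_data)  # separate copy, as if in device memory

    def copy_to_host(self, device_data):
        self.bytes_transferred += BYTES_PER_ELEMENT * len(device_data)
        return list(device_data)

def scale_on_device(device, host_input, factor):
    d_in = device.copy_to_device(host_input)   # 1. copy input to GPU memory
    d_out = [factor * x for x in d_in]         # 2-3. "kernel" runs on device data
    return device.copy_to_host(d_out)          # 4. copy result back to CPU memory
```

Scaling three values moves 48 bytes over the simulated bus for very little arithmetic, which is why explicit transfers can eat up much of the speed-up for simple kernels.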
The term project for the course was to write an essay on the topic of disruptive innovations. A disruptive innovation is one that disrupts the market: in simple terms, users switch to the new technology, and companies that fail to adapt their business models gradually go out of business.
My essay was about cluster computers and how they disrupted the supercomputing market in the 1990s. It explains the field to non-specialists without resorting to jargon. My grade for this course was 94.2%, assigned by fellow students, so I assume they found the essay interesting.
The essay is below, in the form of questions and answers. Remember that it was written as a popular explanation. As always, if you have something to say, drop a comment; I will also be glad to update the essay with new or refined facts.
1. How did cluster computing emerge as a disruptive technology for the supercomputing market?
Certain storage vendors are allegedly selling replacement hard disk drives for their storage systems at inflated prices. I have seen people complain that their storage system would not accept a hard disk drive identical to those found in their servers. The only difference was the drive’s firmware: if the storage system does not recognise the drive as a field-replaceable unit (FRU) approved by the storage vendor, the drive will not work.
So, why do they do this? Nothing personal, just business: this way they can lower the purchase price of a storage system, making it look more attractive to a customer than competitors’ systems, and recoup the “lost revenue” during the inevitable maintenance and upgrades.
[if the automobile] broke down and had to have parts replaced, then that was just hard luck for the owner. It was considered good business to sell parts at the highest possible price on the theory that, since the man had already bought the car, he simply had to have the part and would be willing to pay for it.
It seems unfair, both for automobiles and for hard disk drives. But I have also heard that many companies do not engage in such practices. I wonder if there are any statistics on this topic.
Every Unix or Linux system administrator knows NFS, the Network File System, invented back in 1984. One of its notable features is its very simple syntax:
mount server.example.com:/exported/tree /example/mountpoint
This mounts the server’s directory “/exported/tree” on the local machine, making it accessible at the path “/example/mountpoint”. The server, server.example.com, also has to be configured, but that is not difficult either. For many years, this setup was enough for many smaller cluster supercomputers, and it is still in use today (some parameters can be tweaked to make it more stable and faster). For bigger machines, however, it becomes a bottleneck: there is only one server, and the whole system is limited by its performance.
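The bottleneck argument can be made concrete with an idealised model: if all clients read at once, each gets an equal share of the servers’ combined bandwidth, but never more than its own network link allows. The model assumes perfect, even sharing and linear scaling with the number of servers, which real systems only approximate; the function name and the figures in the usage note are illustrative.

```python
def per_client_bandwidth(n_clients, n_servers, server_bw, client_link_bw):
    """Idealised bandwidth each client gets when all clients read at once.

    Assumes the servers' combined bandwidth is shared evenly and scales
    linearly with the number of servers; any unit works (e.g. MB/s).
    """
    aggregate = n_servers * server_bw
    return min(client_link_bw, aggregate / n_clients)
```

With one server delivering 1000 MB/s and 100 clients on 100 MB/s links, per_client_bandwidth(100, 1, 1000.0, 100.0) gives 10.0 MB/s per client. Spreading the load over 20 servers, per_client_bandwidth(100, 20, 1000.0, 100.0) gives 100.0 MB/s, now limited by the clients’ own links. Removing the single-server limit is the point of a parallel file system.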
Several workarounds were adopted over the years, finally resulting in truly parallel file systems such as Lustre and Panasas (which are not so different, by the way). However, Lustre is difficult to set up and maintain, while Panasas is easier to use and offers more functionality, but it is a proprietary solution that leads to vendor lock-in.
On the other hand, both solutions can deliver tremendous speed: for example, Lustre has been shown to deliver 1.3 terabytes per second of bandwidth on the IBM Sequoia supercomputer. The question is: can we have high speed, simple syntax and open source all at once? The answer is parallel NFS (pNFS).
Intel Xeon Phi achieved a memory bandwidth of roughly 161,829 MBytes/second: not bad, to say the least! This is the product that I called “monstrous” six months ago. I decided to present the numbers graphically. The graphs also include results for the aforementioned dual-CPU and quad-CPU Intel Xeon-based servers used as compute nodes in “Stampede”. First, the total memory bandwidth per device, that is, per Xeon Phi board or per Xeon-based server:
Memory bandwidth per device, MBytes/sec
In this test, the number of OpenMP threads used by STREAM was equal to the number of cores on the computing device. As can be seen, the Intel Xeon Phi board supplies a lot of bandwidth to its cores. The high-end dual-socket and quad-socket servers lag behind in total bandwidth, and the quad-socket server is only marginally better than its dual-socket counterpart.
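For reference, this is how STREAM turns timings into bandwidth figures, sketched for its Triad kernel (a[i] = b[i] + q*c[i]): it credits two reads and one write of an 8-byte word per element and reports MB/s with 1 MB = 10^6 bytes. The per-core helper simply divides the device total by the number of cores, matching the one-thread-per-core setup of this test. The function names are mine, and the 61-core figure in the usage note is an assumption about the card, not part of the measured data.

```python
BYTES_PER_WORD = 8  # STREAM uses double-precision arrays by default

def triad_bandwidth_mb_s(n_elements, seconds):
    """Bandwidth credited to one Triad pass, a[i] = b[i] + q*c[i].

    STREAM counts two reads and one write per element and reports
    MB/s with 1 MB = 10**6 bytes.
    """
    bytes_moved = 3 * BYTES_PER_WORD * n_elements
    return bytes_moved / seconds / 1e6

def bandwidth_per_core(total_mb_s, cores):
    """Average share of the device bandwidth per core (one thread per core)."""
    return total_mb_s / cores
```

Assuming a 61-core card, bandwidth_per_core(161829, 61) gives roughly 2,653 MB/s per core.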