Clusters: A Disruptive Technology in the Supercomputing World

I recently completed my first Coursera course, “Surviving Disruptive Technologies” by Hank C. Lucas, Jr., Ph.D. and his staff. The course was an eye-opener: it gave plenty of examples showing that companies sometimes do odd things, and even go out of business, simply because of bad management.

The term project for the course was to write an essay on the topic of disruptive innovations. A disruptive innovation is one that disrupts the market: in simple terms, users switch to the new technology, and companies that fail to morph their business models gradually go out of business.

A simple home-built Beowulf cluster. Machines like this one disrupted the market. Photo by Alex Schenck. Image source: Wikimedia Commons.

My essay was about cluster computers and how they disrupted the supercomputing market in the 1990s. It explains the professional field to non-professionals without resorting to jargon. My grade for this course was 94.2%, assigned by fellow students, so I assume the essay was interesting to them.

The essay is below, in the form of questions and answers. As always, if you have something to say, drop a comment. Remember, it was created to serve as a popular explanation. I will also be glad to update the essay with new or refined facts.

1. How did cluster computing emerge as a disruptive technology for the supercomputing market?

Supercomputing refers to the use of really big and powerful machines to solve real-world problems where experiments are impossible or not economically feasible. Examples of such problems are weather prediction, simulation of nuclear explosions and their consequences, new drug discovery, design of better cars and airplanes, etc. In all these cases, supercomputers make it possible to obtain results more cheaply and/or more quickly. Recently, supercomputers have also been used for data analysis, such as Facebook’s “graph search”, a field known as “Big Data”.

Supercomputers have been around since the computing industry’s inception. However, the late 1980s and early 1990s saw a significant increase in the variety of computing devices. Personal computers (PCs) and servers emerged, eating into the market of bigger machines that had previously been used to provide general computing services (accounting, text processing, running simple programs) in large companies and research institutions, as all these tasks could now be done with personal computers.

However, although the overall demand for bigger machines dwindled, supercomputers still remained vital. They were procured by government agencies, research institutions and some big corporations, because only these types of organisations could afford such expensive machines. In 1993, the players in this field included companies such as Thinking Machines, Intel, Cray, Hewlett-Packard, NEC (Japan) and Fujitsu (Japan). Receiving revenues from generous government contracts, these companies didn’t anticipate the coming change.

The main problem with supercomputers was their price. The high price came from two factors. The first was the cost of designing the machine: as technology evolved, customers expected ever-faster machines, so engineers had to do their best to match expectations, and this required lots of research and development. The second was low-volume fabrication, which meant that engineering costs could not be amortised over larger batches, so each individual supercomputer remained expensive. If there had been more demand from customers, vendors could have set lower prices, possibly boosting demand further. But buying supercomputers remained the privilege of well-funded institutions, and prices were not decreasing much.
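The amortisation argument can be made concrete with a little arithmetic. The sketch below assumes the simplest possible cost model: a one-off design cost split evenly across all machines built, plus a fixed cost for building each machine. The function name and all the figures are hypothetical, chosen only to show the trend, not actual industry numbers.

```python
def unit_price(design_cost: float, unit_cost: float, units: int) -> float:
    """Price per machine when a one-off design cost is spread over `units` machines.

    Simplified, hypothetical cost model: no profit margin, no volume discounts.
    """
    if units <= 0:
        raise ValueError("units must be positive")
    return design_cost / units + unit_cost


# Hypothetical figures, for illustration only.
DESIGN_COST = 100_000_000  # one-off research and development cost
UNIT_COST = 2_000_000      # cost of building one more machine

for units in (5, 50, 500):
    print(units, unit_price(DESIGN_COST, UNIT_COST, units))
```

With these made-up figures, the price per machine falls from 22 million to 2.2 million as production grows from 5 to 500 units, approaching the cost of building a single machine. Mass-produced servers sit at the far end of this curve.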

At the same time, the industry was producing lots of servers. They were found in many companies, even smaller ones, and the growth of the Internet drove the manufacture of vast numbers of servers, because all Internet content must somehow be served (on the provider’s side) or handled (on the company’s side) before it finally reaches personal computers. As a result, servers were manufactured in such large quantities that they became commodity, off-the-shelf hardware. That is, they were not unique, they were easily replaceable and upgradeable, and, most importantly, they were very cheap compared to supercomputers.

Around 1994, two bright minds from NASA, Thomas Sterling and Donald Becker, came up with the idea of using servers to build a new type of supercomputer that they called the Beowulf cluster. Clustering as such was not new, but this time cheap, mass-produced servers were used. Although servers were not well suited to running the supercomputing applications of those days, programming techniques were developed to utilise them efficiently. Practice quickly showed that building a cluster computer was significantly less expensive than buying a traditional supercomputer of comparable performance.

The rest of the story is short. Cluster computers, built out of commodity hardware, were a disruptive technology for the traditional supercomputer companies. The incumbents (the thriving companies listed above) had to do something. As it turned out, they responded in different ways: some companies disappeared, some morphed their business models, and some new players (including even IBM) entered the market of “affordable supercomputers” (previous IBM machines were for the élite). Currently, the vast majority of supercomputers are cluster computers made with commodity components that anyone can freely buy on the market. Only the top machines still feature unique, custom-designed and custom-made components.

2. How did the industry respond?

“The Innovator’s Dilemma”, the book by Clayton M. Christensen, states that incumbents often disregard disruptive technologies when they see them, because “investing aggressively in disruptive technologies is not a rational financial decision” (page xvii) and that they have “well-developed systems for killing ideas that their customers don’t want” (page xix). However, this was not quite the case with the supercomputing industry.

Embracing the disruptive technology, the commodity servers, might have slightly shrunk the incumbents’ traditional business, but on the other hand, it would have uncovered much bigger opportunities in new market segments. It turns out that high-profile customers still continue to buy the best and fastest supercomputers with unique hardware. However, most systems currently shipped are made of commodity hardware and go to medium-sized or even small customers, a market that was inaccessible to traditional supercomputing vendors before the disruptive technology arrived.

Funnily enough, when the technology of using commodity servers to build supercomputers became available, at first there were no companies that provided corresponding products or services. Instead, each research institution could assemble its own cluster computer, with help and advice from a community of similar users.

Gradually, the market changed, and companies emerged that did just that: assembling cluster computers out of commodity servers, loading them with the necessary software, and providing this as a turn-key solution. This further broadened the market, since customers no longer needed the expertise to make all the pieces work together. The incumbent companies also started to provide these products and services (at least, those that remained in the business, but more on this later).

3. What factors inhibited the incumbents’ response to the disruptive technology?

Let us now discuss, using the survival model introduced in the course, what factors inhibited the incumbents’ response to cluster computing as a disruptive technology. We will not go into detail about individual market players, but will rather describe the industry as a whole. The possible grades (how much of a challenge each factor was) are low, moderate, and high.

1. Denial: High. All the incumbent companies were doing a lot of research and development in order to provide newer and faster supercomputers to their customers every year. These research and development activities were very costly, so the incumbents just didn’t believe someone could come in and propose using a technology (cheap commodity servers) that was not top-notch, unique hardware and was not, in fact, designed for supercomputing tasks. No one believed that using cheap servers would just work. But it did. It was a classic example of denial on the part of the incumbents.

2. History: High. Many of the players in the supercomputing market had been there since at least the 1960s or 1970s. Therefore, by 1995 these companies had accumulated quite a lot of history. It was not easy to say “good bye” to all that.

3. Resistance to change: Low. Making supercomputers is at the forefront of innovation and requires introducing new ideas with every new product, so generally the incumbents were not afraid of making changes.

4. Mind set: Moderate. Incumbents were sure they were proceeding in the right direction. They were always trying something new in order to make their supercomputers faster, so they didn’t have a constrained mind set. They would have tried commodity servers if someone had suggested it, but they just couldn’t believe the approach would work in the long term.

5. Brand: Moderate. There were only a handful of market players in supercomputing, and brand and reputation were important. If a company had switched to using commodity servers for its supercomputers, the public reaction would likely have been negative. The funding agencies, such as DARPA, would also have frowned upon such a move: they always wanted top-notch technology, and in fact they were right here, as simply using commodity servers would not lead to much progress in new hardware. So the incumbents were worried about the loss of reputation that would result if they accepted the commodity technology, which was inferior to what they were using. However, it turned out that using commodity technology led to an enormous expansion of the supercomputing market, attracting myriads of small customers and driving revenues, so these worries proved unfounded.

6. Sunk costs: High. All the research, development and manufacturing infrastructure cost money, and these were sunk costs. If commodity servers, manufactured by someone else, were adopted for building supercomputers, then the incumbents’ own manufacturing facilities would become useless assets.

7. Profitability: Moderate. The incumbents were not sitting on huge heaps of cash and revenue streams that would have discouraged them from changing their business models. Large contracts were coming in from the government and high-profile research institutions, but competition was noticeable. Therefore, profitability did not prevent the companies from trying something new; it was denial that stopped them.

8. Lack of imagination: Low. Just as with “Resistance to change” above, the incumbents were ready to try new approaches to making supercomputers. It was mainly denial that prevented them from actually trying commodity servers and gaining access to a vast new market. They didn’t believe commodity servers would work in this role. And when T. Sterling and D. Becker demonstrated that they did, it was too late for the incumbents to catch up, because anybody could build cluster supercomputers out of freely available hardware.

4. What was the outcome of the disruption? Hint: think of three types of outcomes.

Let us describe the outcomes of disruption — that is, which companies could morph their existing business model, which abandoned it, and which just went out of business. We will take a look at several companies.

1. Thinking Machines Corporation: Failed before the disruption came in. “The company filed for bankruptcy in 1994, with its hardware and parallel computing software divisions eventually acquired by Sun Microsystems” (see the Wikipedia entry for Thinking Machines Corporation). However, this happened before cluster supercomputing was introduced by T. Sterling and D. Becker, and the reason for the failure was a lack of orders, not the disruption.

2. Intel: Morphed its business model. The company no longer assembles ready-to-use supercomputers, as it did back in 1993, but instead designs the processor chips (Central Processing Units, or CPUs) used in cluster supercomputers. Currently, Intel CPUs intended for commodity servers are equally suitable for supercomputing purposes. Recently the company announced a product called Intel Xeon Phi, designed specifically for supercomputing.

3. Cray: Abandoned its business model, moved to commodity solutions. Cray went through a series of mergers and acquisitions. It was sold to Silicon Graphics (SGI) in 1996; then, in 2000, SGI sold it to Tera Computer Company, which renamed itself “Cray Inc.” Currently, Cray Inc. still produces very powerful supercomputers, although most of the hardware is commodity. The key proprietary know-how in these computers was a custom interconnection network, but Cray Inc. sold these assets to Intel in 2012. Subsequently, Cray Inc. acquired Appro International, a company that builds cluster computers out of commodity hardware, thereby signalling its final move to commodity-only solutions.

4. Hewlett-Packard: Abandoned its business model, moved to commodity solutions. All the supercomputers it currently sells are based on commodity servers. (Note that supercomputers are a tiny fraction of HP’s diverse business.)

5. NEC (Japan): Apparently left the supercomputing business. In 2011 it announced plans to develop a traditional (non-cluster) supercomputer, but there has been no news since.

6. Fujitsu (Japan): Morphed its business model. In June 2011 it presented the “K Computer”, the fastest in the world at that time, which is not based on commodity hardware.

7. NVIDIA: Entered the market. NVIDIA was known as a company that produced graphics adaptors, the pieces of hardware responsible for drawing text and images on our computer monitors. When NVIDIA realised that graphics adaptors could be used as specialised computation engines, it capitalised on this opportunity. The modified variants of those adaptors are now called “GPGPU accelerators”. They are produced in very large quantities and are available from at least two manufacturers (AMD probably ranks second in GPGPU market share), so we can call them “commodity hardware”. They are relatively cheap and are used in many current cluster supercomputers as just another commodity part.

8. IBM: Entered the market of “affordable supercomputing”. Traditionally, IBM did all things related to computing, and also made some supercomputing innovations (see, for example, IBM Stretch in 1961, as well as the ACS-1 project, which ran from 1961 to 1969 and, although it never delivered a production machine, produced lots of fruitful ideas). There then seems to have been a quiet period: the June 1993 TOP500 list (the list of the 500 best-performing computers in the world) lists IBM as a customer rather than a manufacturer (see entry 39 in the list). But since then the company has been increasing its market share, and the November 2012 TOP500 list includes 193 IBM supercomputers out of 500, that is, a 38.6% share (see the statistics generator). Currently it has two product lines: cluster computers based on commodity servers (such as SuperMUC, the 2nd fastest supercomputer in Europe as of today), and the proprietary Blue Gene technology (used, for example, in the world’s 2nd fastest supercomputer, Sequoia). On the whole, IBM has lots of expertise in the supercomputing business, and because the business of “Big Blue” is very diversified, it stands on very firm ground.

5. What could the traditional supercomputing industry have done differently in the past?

The answer is simple. Unlike some other industries that experienced disruptions, the supercomputing industry shouldn’t have been afraid to embrace the disruptive technology, namely cluster supercomputers based on cheap commodity servers. This is because its primary business, shipping very powerful machines to high-profile customers, would have remained largely unaffected by the advent of cluster computers. Instead, the companies would have gained access to a vast market of medium-sized and small customers who also needed supercomputers but couldn’t afford their price. That would have been a win-win situation for everyone.

The industry tried to “jump on the bandwagon”, but this move was largely ineffective, because the characteristic feature of cluster supercomputers is that they are easy to assemble without the help of a large company. In fact, this was how the technology proliferated in its first years: each research institution could build its own cluster supercomputer.

In conclusion, the situation has now settled, but in the fast-moving supercomputing industry, new disruptions are always on the horizon.
