"A Conservative News Forum"
[ Browse | Search | Topics ]

University Of Kentucky Supercomputer Breaks The $100 Per GFLOPS Barrier
University of Kentucky | 8/22/2003

Posted on 08/23/2003 7:55 AM PDT by justlurking

For immediate release, August 22, 2003, Lexington, Kentucky: Researchers at the University of Kentucky have constructed and demonstrated an innovative new, scalable, parallel supercomputer that achieves application performance of more than 1 billion floating point operations per second (GFLOPS) for every $100 spent on building the machine. The approach used to design and build this machine makes it cost-effective for solving a wide range of problems, from drug design using computational chemistry to design of quieter printers using computational fluid dynamics (CFD). Thus, this breakthrough is not only a milestone, but also will enable many more scientists and engineers to use computational models.

A decade ago, supercomputers cost about $1,000,000 per GFLOPS of performance. By using standard PC parts, "Beowulf" cluster supercomputers dramatically reduce the cost, but as processors and other components have become faster and cheaper, the network needed to coordinate them has become relatively expensive. The University of Kentucky researchers made their first breakthrough in reducing network cost in May 2000, when KLAT2 (Kentucky Linux Athlon Testbed 2) used standard 100Mb/s Fast Ethernet hardware in the world's first machine-designed asymmetric cluster network -- and achieved $640 per GFLOPS, breaking the $1,000 per GFLOPS barrier. Their newest machine, KASY0 (Kentucky Asymmetric Zero), uses a more advanced type of asymmetric network design to break the $100 per GFLOPS barrier.

A well-known reference for supercomputer performance is the Top500 list, which ranks the 500 supercomputers that obtain the highest GFLOPS speed executing a Linpack benchmark program. Performance on that program depends partly on the theoretical peak GFLOPS of the processors, but also on the parallel implementation and efficiency of the network that allows the processors to work together. In the current (June 2003) list, most systems use expensive, specialized network hardware. The machines explicitly listed as using standard 100Mb/s Fast Ethernet achieve an average of less than 8.5% of peak. The average for the systems listed as using Gigabit Ethernet is somewhat better, at about 30% of peak. In contrast, KASY0's 100Mb/s Fast Ethernet network allows it to achieve 187.3 GFLOPS, over 35% of peak, using a double-precision version of the benchmark (HPL). Using a single-precision version, the $39,454.31 KASY0 obtains over 471.5 GFLOPS, more than 44% of its theoretical peak and less than $84 per GFLOPS.
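The price/performance figures quoted above can be checked with a few lines of arithmetic. The following is only an illustrative Python sketch: the function and constant names are made up for this example, and the single-precision peak of 1062.4 GFLOPS is taken from the comparison with the HP Superdome later in the thread, not from this paragraph.

```python
# Figures quoted in the press release and the cost breakdown.
SYSTEM_COST_USD = 39454.31
LINPACK_SINGLE_GFLOPS = 471.5
LINPACK_DOUBLE_GFLOPS = 187.3
# Assumption: single-precision peak as quoted in the Superdome comparison.
PEAK_SINGLE_GFLOPS = 1062.4


def dollars_per_gflops(cost_usd, gflops):
    """Hardware cost divided by measured benchmark speed."""
    return cost_usd / gflops


def percent_of_peak(measured_gflops, peak_gflops):
    """Measured speed as a percentage of theoretical peak."""
    return 100.0 * measured_gflops / peak_gflops


if __name__ == "__main__":
    single = dollars_per_gflops(SYSTEM_COST_USD, LINPACK_SINGLE_GFLOPS)
    double = dollars_per_gflops(SYSTEM_COST_USD, LINPACK_DOUBLE_GFLOPS)
    efficiency = percent_of_peak(LINPACK_SINGLE_GFLOPS, PEAK_SINGLE_GFLOPS)
    print(f"single precision: ${single:.2f} per GFLOPS")
    print(f"double precision: ${double:.2f} per GFLOPS")
    print(f"single precision efficiency: {efficiency:.1f}% of peak")
```

Note that the sub-$100 figure applies to the single-precision run; by the same arithmetic the double-precision result works out to a little over $200 per GFLOPS.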

The remarkable thing about KASY0's price/performance is that, while network hardware is often the dominant cost for a system of its size (128 plus 4 spare nodes), less than 11% of the system cost went for the network hardware. The AMD Athlon XP 2600+ processors were more than 35% of the total system cost; memory was 21%. Even more significantly, the network design technology that made this possible can be applied with similar benefit to cluster supercomputers with thousands of nodes. KLAT2's network was the world's first Flat Neighborhood Network; the enhanced version used for KASY0 is the world's first Sparse Flat Neighborhood Network (SFNN). KASY0 also is the first supercomputer to have its physical node and switch placement optimized by a computer program. FNN design technology and tools have been freely available and used by various other groups; so too will the new SFNN technology be freely available.

KASY0 is not a toy or a "hack" -- it is a serious demonstration of a fundamental new advance in network design. The only other supercomputer we have seen claim close to the price/performance measured for KASY0 is a $50,000+ system built by the National Center for Supercomputing Applications (NCSA) using 70 PlayStation2 units. Not only does KASY0 have a vastly superior network and significantly higher peak floating point performance per node, but KASY0's lower price yields many more nodes and real application performance, not just high peak numbers.

For example, KASY0 also has set a new world record for rendering a complex image using the Persistence of Vision Raytracer (POV-Ray). Executing pvmpovray 3.5 on KASY0 to render the standard benchmark.pov scene yielded a time of 72 seconds. According to this site, the previous record was 107 seconds set on August 1, 2003 by a cluster costing $79,000.

The primary architect of KASY0 is Tim Mattox, a research assistant who has been developing the Sparse Flat Neighborhood Network concept for his Ph.D. thesis. As an educational experience available to anyone, the physical construction of KASY0 was done entirely by volunteers at the University of Kentucky.

From the creation of the first Linux PC cluster in February 1994 to the construction of KASY0, Hank Dietz and his students have continued to improve cluster performance by making compilers, hardware architecture, and operating system work together more efficiently. At the University of Kentucky, as Professor of Electrical and Computer Engineering and James F. Hardymon Chair in Networking, Dietz's goal is to develop and freely disseminate the new technologies that will allow scientists and engineers to solve their most important computational problems.

TOPICS: Technical; US: Kentucky

I thought this was pretty cool: a supercomputer for less than $40,000!

1 posted on 08/23/2003 7:55 AM PDT by justlurking
[ Post Reply | Private Reply | View Replies ]

To: All

The Cost Of KASY0

The following is the complete (or nearly so) parts list. It may be useful to compare this to our previous big system, KLAT2, whose pricing summary is given here. Notice that costs for all items must be tracked in order for us to justify our claims of setting a new price/performance record.

Subsystem | Description | Model/Part Number | Vendor/Source | Quantity | Delivered Price
Node | Athlon XP 2600+ Processor | Athlon XP 2600+ Retail (333MHz FSB) | | 128 | $13690.00
Node | Athlon XP 2600+ Processor | Athlon XP 2600+ Retail (333MHz FSB) | | 4 | $400.00
Node | 512MB PC2700 DDR SDRAM | Crucial CT6464Z335 | MWave.Com | 132 | $8316.00
Node | Athlon XP Motherboard | BioStar M7VIT Pro | MWave.Com | 132 | $6996.00
Node | Node case + 400W power supply | 6042L Codegen | 4GoldenBridge.Com | 64 | $2462.00
Node | Node case + 400W power supply | 6042L Codegen | PineComputer.Com | 68 | $2380.00
Nodes Subtotal | | | | | $34244.00
Network | Fast Ethernet NIC | Linksys LNE100TX | AlanComputech.Com | 280 | $2082.00
Network | 24-port Fast Ethernet Switch | BenQ SE0024 | NewEgg.Com | 6 | $432.00
Network | 24-port Fast Ethernet Switch | BenQ SE0024 | AudioExchange.Com | 10 | $772.03
Network | 24-port Fast Ethernet Switch | BenQ SE0024 | existing equipment | 2 | $152.00
Network | Cat5e 15-foot Cable (9 colors) | CBLC515 | LanAdapters.Com | 450 | $807.95
Network Subtotal | | | | | $4245.98
Support | 6-Shelf Commercial Chrome Rack | Sku# 831725 | SamsClub.Com | 6 | $548.60
Support | 20" Box Fan | Lakewood Model 202 | WalMart.Com | 2 | $22.09
Support | Surge Protector Power Strip | Statitec 6-outlet Sku# 808571 | SamsClub.Com | 1 (32-pack) | $173.64
Support | Materials for power mounts | BC Plywood, 2x4s, 7/8" dowels, glue, screws, paint | local places/stock | | $20.00
Assembly | Food for student helpers | 4 dozen Panera Bagels, 11 large Papa John's Pizzas, 5 cases assorted soft drinks, 1 case party mix, 1 case Grandma's cookies, 2 cases miniature cheesecakes | local places | | $200.00
Misc. Subtotal | | | | | $964.33
Total | | | | | $39454.31
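Since the price/performance claim depends on every item being counted, the parts list above can be re-added as a sanity check. This is just a sketch in Python: the data structure and function names are invented for the example, and the prices are copied from the list. It also reproduces the shares quoted in the press release (network under 11%, processors over 35%, memory about 21%).

```python
from decimal import Decimal

# (subsystem, item, delivered price) copied from the parts list.
PARTS = [
    ("Node", "processors (128)", "13690.00"),
    ("Node", "processors (4)", "400.00"),
    ("Node", "memory", "8316.00"),
    ("Node", "motherboards", "6996.00"),
    ("Node", "cases (64)", "2462.00"),
    ("Node", "cases (68)", "2380.00"),
    ("Network", "NICs", "2082.00"),
    ("Network", "switches (6)", "432.00"),
    ("Network", "switches (10)", "772.03"),
    ("Network", "switches (2)", "152.00"),
    ("Network", "cables", "807.95"),
    ("Support", "racks", "548.60"),
    ("Support", "box fans", "22.09"),
    ("Support", "power strips", "173.64"),
    ("Support", "power mount materials", "20.00"),
    ("Assembly", "food for helpers", "200.00"),
]


def subtotal(parts, subsystems):
    """Sum the prices of all parts belonging to the given subsystems."""
    return sum(
        (Decimal(price) for sub, _, price in parts if sub in subsystems),
        Decimal("0"),
    )


def share(amount, total):
    """Return amount as a percentage of total."""
    return float(amount / total * 100)


nodes = subtotal(PARTS, {"Node"})
network = subtotal(PARTS, {"Network"})
misc = subtotal(PARTS, {"Support", "Assembly"})
total = nodes + network + misc

processors = Decimal("13690.00") + Decimal("400.00")
memory = Decimal("8316.00")

if __name__ == "__main__":
    print(f"nodes {nodes}, network {network}, misc {misc}, total {total}")
    print(f"network share:   {share(network, total):.1f}%")
    print(f"processor share: {share(processors, total):.1f}%")
    print(f"memory share:    {share(memory, total):.1f}%")
```

Using Decimal rather than floats keeps the cents exact, so the subtotals can be compared directly against the figures in the list.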

We do not include the Nikon 950 camera which we have mounted in the cluster because it is completely unrelated to the cluster's operation, serving as a webcam and security monitor for the entire lab. Neither does the cost above include a firewall or a "head node," because the entire lab is behind an old PC used as a firewall and KASY0's configuration allows any of a number of existing machines to serve as "head nodes" for different purposes (i.e., KASY0 is a cluster of peer nodes). If we were to use included spare hardware for the firewall and head node, the only additional cost would be less than $100 for a disk drive.

The assembly cost might seem low, but we easily could have obtained comparable-performance assembled systems for similar pricing. In a university setting, it is simply more appropriate to give students the experience of building the systems and, as a side benefit, we get better control over the precise choice of components used and how they are assembled. For example, each case came with two side fans, which we converted into a redundant stack venting out the back. We also took a cost hit on doing our own assembly in a variety of ways; for example, shipping is higher for parts than for assembled systems. Another cost hit came because we bought just 4 processors for test-assembling systems and only then ordered the other 128 -- had we ordered everything at once (as we would have for assembled systems), all the processors would have been purchased before a 7% price increase hit. We didn't cut corners on anything; note that we counted spares in the cost, the cases have 400W power supplies, the processors have full warranty retail packages, and even the power strips came with surge protection and full insurance for the protected equipment.

Thus, KASY0's "street cost" is under $40,000 by any accounting. In comparison, KLAT2's "street cost" was $41,205 for just 64+2 nodes, each of which was about 1/3 the speed of a KASY0 node. The memory size is also 4x per node, 8x total. Network latency typically will be identical, with total bandwidth about 1.5x that of KLAT2. The accounting of the network cost is somewhat debatable in that each node motherboard contains one built-in NIC; we counted that NIC as part of the node cost, not network cost, because the board isn't available without the NIC. Even if we had ignored the built-in NIC and purchased more NICs, the network on KASY0 is close to half the cost of that on KLAT2 -- after all, it even uses narrower switches than KLAT2 did: 24-port vs. 32-port switches.
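The KLAT2 comparison is easy to reproduce. The sketch below is illustrative Python only: the Cluster class is hypothetical, the node counts include spares, and KLAT2's 128MB per node is taken from its parts list quoted later in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    nodes: int                # including spare nodes
    memory_mb_per_node: int
    street_cost_usd: float

    @property
    def total_memory_mb(self):
        return self.nodes * self.memory_mb_per_node

    @property
    def cost_per_node_usd(self):
        return self.street_cost_usd / self.nodes


KLAT2 = Cluster("KLAT2", nodes=66, memory_mb_per_node=128, street_cost_usd=41205.00)
KASY0 = Cluster("KASY0", nodes=132, memory_mb_per_node=512, street_cost_usd=39454.31)

if __name__ == "__main__":
    per_node = KASY0.memory_mb_per_node / KLAT2.memory_mb_per_node
    overall = KASY0.total_memory_mb / KLAT2.total_memory_mb
    print(f"memory per node: {per_node:.0f}x, total memory: {overall:.0f}x")
    for cluster in (KLAT2, KASY0):
        print(f"{cluster.name}: ${cluster.cost_per_node_usd:.2f} per node")
```

On these numbers a KASY0 node costs less than half as much as a KLAT2 node did, while carrying four times the memory.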

An even cuter comparison is with a $50,000+ system built using 70 PlayStation2 units. Not only does KASY0 have a vastly superior network and significantly higher floating point performance per node (8 GFLOPS vs. 6.5 GFLOPS for the PS2), but we get LOTS more nodes!

For those interested in how this compares to UK's SDX HP Superdome, the quick answer is that the two machines are very different, but have roughly comparable performance. The HP is a vendor-packaged system with more processors (224 total: 3x64 + 1x32 750MHz HP PA-RISC 8700), more memory (448GB), and a higher double-precision speed (672 GFLOPS peak, 431.7 GFLOPS Linpack). On the other hand, KASY0's integer and single-precision speeds are faster (e.g., 1062.4 GFLOPS peak, 471.5 GFLOPS Linpack), it is a homogeneous system (not a cluster of different-sized shared memory systems), and power consumption is several times lower. Oh yeah: KASY0 also is much cheaper!

2 posted on 08/23/2003 8:01 AM PDT by justlurking
[ Post Reply | Private Reply | To 1 | View Replies ]

To: justlurking

Redneck computing!

3 posted on 08/23/2003 8:06 AM PDT by HiTech RedNeck
[ Post Reply | Private Reply | To 1 | View Replies ]

To: justlurking

Shadetree geeks

4 posted on 08/23/2003 8:10 AM PDT by CHICAGOFARMER (Citizen Carry)
[ Post Reply | Private Reply | To 2 | View Replies ]

To: HiTech RedNeck

Go Cats!

5 posted on 08/23/2003 8:11 AM PDT by JusPasenThru (We're through being cool (you can say that again, Dad))
[ Post Reply | Private Reply | To 3 | View Replies ]

To: HiTech RedNeck

I wonder if any of the steel computer cases have holes from shotgun shot in the side.

6 posted on 08/23/2003 8:12 AM PDT by July 4th
[ Post Reply | Private Reply | To 3 | View Replies ]

To: justlurking

For a few bucks more they could have added a dandy case:

7 posted on 08/23/2003 8:45 AM PDT by Leroy S. Mort
[ Post Reply | Private Reply | To 1 | View Replies ]

To: Leroy S. Mort; July 4th

I was wondering when the anti-Linux crowd would descend on this thread, but I never suspected that it would attract redneck and Kentucky jokes. :-)

Check out the last item in the cost breakdown. Do you think someone brought something beyond the typical techgeek diet, but didn't include it?

8 posted on 08/23/2003 8:57 AM PDT by justlurking
[ Post Reply | Private Reply | To 7 | View Replies ]

To: Ernest_at_the_Beach; sourcery


9 posted on 08/23/2003 9:05 AM PDT by Libertarianize the GOP (Ideas have consequences)
[ Post Reply | Private Reply | To 8 | View Replies ]

To: justlurking

I'm posting this for comparison: it's the cost breakdown for their previous cluster built back in 2000. Check out how much prices have declined (and speed/capacity has increased) in the past 3 years:

The Cost Of KLAT2

KLAT2's cost is somewhat difficult to specify precisely because the most expensive components, the Athlon processors, were donated by their manufacturer, AMD (Advanced Micro Devices). Here, we quote the retail price for these processors as found on Multiwave's WWW site on May 3, 2000. Similarly, although most applications use only 64 nodes, KLAT2 also has 2 "hot spare" nodes and an additional switch layer that are used for fault tolerance and system-level I/O; because we consider these components to be an integral part of KLAT2's design, we include their cost. We also included 16 spare NICs and several spare surge protectors. Due to University of Kentucky purchasing guidelines and part stocking issues, purchases from the same vendor were sometimes split in odd ways and there were various inconsistencies about how shipping was charged; although the vendor totals are correct, we have had to approximate the component cost breakdown in these cases.

The following table details the cost of KLAT2. Although specific vendors are listed, note that being listed here should not be taken as an implicit endorsement by the authors or by the University of Kentucky. Aside from donation of the Athlons, there were no exceptional discounts or other arrangements with any of the vendors.

Vendor and Part Descriptions Cost
  66 Donated 700MHz Athlon OEM processor modules @ ~$200
  66 128MB PC100 CAS2 SDRAMs @ $93
Technology Partners
  66 Polaris II ATX Mid Towers 300W @ $58
  66 Sony 1.44MB Floppy Drives (for net boot) @ $11
Multiwave Technology
  66 FIC SD11 Motherboards @ $104
  10 Smartlink 32-port wire-speed 100Mb/s switches @ $527
  28 Smartlink 100Mb/s NIC 10-packs @ $80
  32 Hawking 15' color-coded Cat.5 cable 5-packs @ $9
  32 Hawking 15' transparent color-coded Cat.5e cable 5-packs @ $12
  66 AMD K7/PII Dual Fans (CPU heat sinks & fans) @ $5
  66 DC Fans 80mm (extra case fans) @ $4
  4 48"x18"x72" black wire-frame shelves @ $64
  2 WindDance fans (to direct airflow between shelves) @ $15
  20 Surgestrip model 201 surge protectors @ $4
Various local stores
  16 3" diameter threaded-mount wheels for shelves @ $9
  16 Pizzas for student helpers @ $10
  4 Cases of soda for student helpers @ $7
  4 2" diameter threaded-mount wheels for rack @ $7
Available at no cost/indirectly used items
  1 10-year-old rack & mounting hardware
  1 Surplus 17" monitor used for cluster status
  1 Old PCI video card used for cluster status
  1 18GB EIDE disk drive
  66 Set of inkjet-printed labels for each node
  2 Other clusters for KLAT2's HW design and SW development
Total $41,205

In summary, KLAT2's total value is about $41,200, with the primary costs being roughly $13,200 in processors, $8,100 in the network, $6,900 in motherboards, and $6,200 in memory.

10 posted on 08/23/2003 9:08 AM PDT by justlurking
[ Post Reply | Private Reply | To 2 | View Replies ]

To: July 4th

In California the cases would all have Glory Holes in the side.

Kentucky is better by far.

11 posted on 08/23/2003 9:33 AM PDT by Reactionary
[ Post Reply | Private Reply | To 6 | View Replies ]

To: justlurking

These things are great! I built two 16 node clusters at a previous job, took one on tour to our booth at Linux World or Linux Expo, whichever is in San Jose, and got to meet Donald Becker who is the originator of the concept.
BTW there is nothing "new" about Beowulf architecture, people have been building them for 6-7 years.

Notice there is not a disk in each node? The nodes boot the kernel either from floppy or NVRAM, use ARP/RARP to get an IP address, NFS mount their file system, and then boot the rest of the OS including multiprocessing bits like PVM and MPI over NFS (Network File System, a UNIX RPC protocol).

They are in use all over the place. US DOE has a few, NASA invented it, oil companies build them for processing geological data, and animation studios use them as rendering farms. They are useful for any algorithm that can be sped up by parallel processing. The PVM and MPI libraries handle the parallel tasking. Interestingly, at night, you could even include unused Windows 2000 servers in the parallel cluster. The software exists to do so!

12 posted on 08/23/2003 9:39 AM PDT by adam_az (.)
[ Post Reply | Private Reply | To 1 | View Replies ]

To: justlurking

I was wondering when the anti-Linux crowd would descend on this thread

Whaaaat? They're not running XP Home Edition?

For chuckles, consider the cost if they'd had to include the cost of MS software licences for each CPU. The humanity!

13 posted on 08/23/2003 9:46 AM PDT by LTCJ
[ Post Reply | Private Reply | To 8 | View Replies ]

To: adam_az

Notice there is not a disk in each node? The nodes boot the kernel either from floppy or NVRAM, use ARP/RARP to get an IP address, NFS mount their file system, and then boot the rest of the OS including multiprocessing bits like PVM and MPI over NFS (Network File System, a UNIX RPC protocol).

No, there is no floppy. I believe the MSI motherboard BIOS can be configured to boot from a network address: I built a system with a different MSI motherboard earlier this year for someone else and noticed the functionality.

14 posted on 08/23/2003 9:58 AM PDT by justlurking
[ Post Reply | Private Reply | To 12 | View Replies ]


Actually, the running joke elsewhere is that they already owe $699 for each of 128 CPUs for SCO's extortion scheme. That's $89,472: over twice the cost of the hardware.

15 posted on 08/23/2003 10:01 AM PDT by justlurking
[ Post Reply | Private Reply | To 13 | View Replies ]

To: justlurking

Yup, then they flashed the kernel into NVRAM. To boot from a network address, you have to have enough of a driver loaded already for the Ethernet card and network file system mount functionality to work. For a Beowulf, you have to compile this stuff directly into the kernel, and not use loadable modules.

16 posted on 08/23/2003 10:35 AM PDT by adam_az (.)
[ Post Reply | Private Reply | To 14 | View Replies ]

To: adam_az

No, this was something separate in the MSI BIOS. It wasn't even documented in the manual. I also remember that it was configured that way by default and I had to change it to boot from the CD. That particular BIOS can apparently support the network interface.

From what I've been able to find, the subsequent steps are to get an IP address from a DHCP server, then download the kernel from a TFTP server. From there, it's clear sailing.

17 posted on 08/23/2003 10:57 AM PDT by justlurking
[ Post Reply | Private Reply | To 16 | View Replies ]

To: Libertarianize the GOP; *tech_index; Salo; MizSterious; shadowman99; Sparta; freedom9; ...

Thanks, I like the 20 inch box fans from Walmart!

I may try that to keep my main box cool!


18 posted on 08/23/2003 1:58 PM PDT by Ernest_at_the_Beach (All we need from a Governor is a VETO PEN!!!)
[ Post Reply | Private Reply | To 9 | View Replies ]

To: justlurking

I hope this breakthrough will help meteorologists predict the weather accurately.


19 posted on 08/23/2003 2:05 PM PDT by M Kehoe
[ Post Reply | Private Reply | To 1 | View Replies ]

To: justlurking

The researchers loaded WinXP on the thing and described its performance as "peppier."

20 posted on 08/23/2003 2:10 PM PDT by Petronski (I'm not always cranky.)
[ Post Reply | Private Reply | To 1 | View Replies ]

Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.


FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
Powered by Focus Forum (working name), Copyright 2000-2002 Robinson-DeFehr Consulting, LLC