Personalized Turnkey Superclusters (PeTS)

In comparison to traditional supercomputers, systems built by clustering PC hardware are cheap and offer many configuration options. We will take full advantage of these properties by customizing cluster hardware and software so that a PeTS system will appear to its scientific users as a dedicated piece of "laboratory equipment" that directly solves their most important computational problems. Thus, a PeTS system is a turnkey application engine that yields the performance of a dedicated supercomputer for the one problem it was designed to solve.

The Proposal

We have submitted a PeTS proposal to NSF's ITR program. The full text of the project description is here as a PDF file.

Our Current Facilities

In addition to the traditional supercomputer facilities hosted by the University of Kentucky Center for Computational Sciences and various workstation and software development labs, Dietz has established a new laboratory at the University of Kentucky for the integration of Compilers, Hardware Architectures, and Operating Systems -- the KAOS Lab. The KAOS Lab currently hosts four Linux PC clusters: KLAT2 (Left, 64+2 Athlon), Opus (Center, 16+1 K6-2 with a 6,400x4,800-pixel video wall), Odie (Right, 4+1 Athlon), and Galatica (not shown, a cluster being assembled from surplus PCs). The lab has sufficient space to house more than twice as many clusters; a power upgrade and new air conditioning system were installed in May 2000. This lab would house the development cluster described in the proposal; each completed PeTS system would, of course, be housed in the lab of the corresponding scientist or engineer.

Reference Materials

The proposal cites a variety of reference materials; those, plus several other relevant sites, are linked here in roughly the order in which they are discussed in the proposal. Unless otherwise noted, all links are to HTML documents.

An Introduction to Static Scheduling for MIMD Architectures
This paper describes one of our key technologies: using barrier synchronization, timing analysis, and compile-time code scheduling to achieve higher performance from parallel applications whose performance-critical code includes interprocessor communication.
KLAT2's Flat Neighborhood Network
Another key technology that we have been developing involves the design of customized cluster message-passing networks by genetic search. The first such system is KLAT2, Kentucky Linux Athlon Testbed 2, described here. Note that the 66 Athlon processors in KLAT2 were donated to us by AMD as part of their continuing support for our research in PC-based supercomputing.
SIMD Within A Register
We also have been developing compiler technology to make use of multimedia instruction set extensions (e.g., MMX and 3DNow!) for scientific computation. In fact, using just a little 3DNow! acceleration, KLAT2's performance on the ScaLAPACK benchmark (the one used for the Top500 Supercomputers list) is 64.459 GFLOPS -- without SWAR, it is around 28 GFLOPS.
The UTMC eCard Attached Parallel Processor
This is a technical overview of the content addressable memory (CAM) attached parallel processor PCI card discussed in the proposal. We are working with Aeroflex, UTMC's parent company, on the development of the eCard for cluster parallel supercomputing.
University of Kentucky Center for Computational Sciences
The Center for Computational Sciences will coordinate the PeTS project's application selection, interactions with scientists and engineers, and distribution and support of the completed PeTS designs.
Current Applications Groups Working With Us
A small sampling of the application groups with which the Center for Computational Sciences has been working. In addition, Dietz's cluster group has been discussing specific applications with Carol Post at Purdue University, Trevor Creamer at the University of Kentucky College of Medicine, Stephen Gedney at the University of Kentucky, and several other researchers.
The OVERSET Tools For CFD Analysis
The particular application we have selected for the first PeTS system is the OVERSET tool set for CFD analysis. We have been working with George Huang and his students on three separate CFD codes. George is one of the developers of the widely accepted OVERSET tools, and we will target the first PeTS system at this CFD application set. George also is working on a version using improved algorithms that may later be incorporated into OVERSET, and a third version is being used for our Gordon Bell price/performance award submission for KLAT2. This third version, written entirely in C, is easier to tune for our cluster technology, and the experience gained will make us more effective in porting the OVERSET tools.
PAPERS
PAPERS, Purdue's Adapter for Parallel Execution and Rapid Synchronization, is cheap, public-domain network hardware and software that implements a wide range of operations on global state for Linux PC clusters. The best overview is an article that appeared in the Purdue Extrapolations magazine. The WWW site is currently somewhat fragmented because we are in the process of moving to http://aggregate.org/, a site created to better reflect the fact that Aggregate Function Network research is no longer based at Purdue, but involves many institutions, with the University of Kentucky now taking the lead role.
PCCTS, Antlr, etc.
Because Dietz and his students frequently build specialized compilers and other translation systems to exploit particular features of particular applications (e.g., the Fortran-P compilers we built for Paul Woodward's CFD codes) and architectures (e.g., the MasPar MP1; the Thinking Machines CM2, CM200, and CM5; the Cray T3D; various SWAR targets; and many one-of-a-kind research machines), his group developed PCCTS, the Purdue Compiler Construction Tool Set. PCCTS, combined with our other tools, makes writing specialized compilers relatively easy. There is also a network newsgroup for PCCTS: comp.compilers.tools.pccts
The LDP Parallel Processing HOWTO
Dietz is the author of the Linux Documentation Project's Parallel Processing HOWTO, which, through several versions, has been the primary guide to all forms of parallel processing using Linux PCs. Note: this is the complete guide to all forms of Linux PC parallel processing, not the "Beowulf HOWTO" that was produced later by other authors to aid people in "cookbook" configuration of clusters.

A Few Overview Slides...

The following postscript/PDF slides are available:

Overview (.ps or .pdf)
Flat Neighborhood Network (.ps or .pdf)
Aggregate Function Network (.ps or .pdf)
SIMD Within A Register (.ps or .pdf)
KLAT2 & Friends (.ps or .pdf)

PI Contact Info

Professor Hank Dietz, James F. Hardymon Chair in Networking
College of Engineering
Electrical Engineering Department
453 Anderson Hall
(Office 307 EE Annex, Lab 672 Anderson Hall)
Lexington, KY 40506-0046

Office Phone: (606) 257-4701
Lab Phone:    (606) 257-9695
Fax:          (606) 257-3092
Email: hankd@engr.uky.edu
Home URL: http://www.cs.uky.edu/~hankd/

The Aggregate. The only thing set in stone is our name.