Personalized Turnkey Superclusters (PeTS)
In comparison to traditional supercomputers, systems built by
clustering PC hardware are cheap and offer many configuration
options. We will take full advantage of these properties by
customizing cluster hardware and software so that a PeTS system
will appear to its scientific users as a dedicated piece of
"laboratory equipment" that directly solves their most
important computational problems. Thus, a PeTS system is a
turnkey application engine that yields the performance of a
dedicated supercomputer for the one problem it was designed to
solve.
The Proposal
We have submitted a PeTS proposal to NSF's ITR program. The
full text of the project description is available here as a
PDF file.
Our Current Facilities
In addition to the traditional supercomputer facilities hosted
by the University of Kentucky Center for Computational Sciences
and various workstation and software development labs, Dietz
has established a new laboratory at the University of Kentucky
for the integration of Compilers, Hardware Architectures, and
Operating Systems -- the KAOS Lab. The KAOS Lab currently
hosts four Linux PC clusters:
KLAT2 (Left, 64+2 Athlon), Opus (Center, 16+1 K6-2 with
6,400x4,800 pixel video wall), Odie (Right, 4+1 Athlon), and
Galatica (not shown, a cluster being assembled from surplus
PCs). The lab has sufficient space to house more than twice as
many clusters, and a new power upgrade and air conditioning
system were installed in May 2000. This lab would house the
development cluster mentioned in the proposal; the actual PeTS
systems, once complete, would of course be housed in the lab of
the corresponding scientist or engineer.
Reference Materials
The proposal cites a variety of reference materials; those, plus
several other relevant sites, are linked here in roughly the
order in which they are discussed in the proposal. Unless
otherwise noted, all links are to HTML documents.
- An Introduction to Static Scheduling for MIMD Architectures
-
This paper discusses one of the key technologies we have
developed: using barrier synchronization, timing analysis, and
static code scheduling to achieve higher performance from
parallel applications whose performance-critical code includes
interprocessor communication.
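The compile-time analysis is covered in the paper; as a minimal runtime sketch of the barrier idea (plain Python threads standing in for cluster nodes; all names here are illustrative, not from the paper), a barrier separates a write phase from a read phase so that all interprocessor communication completes before dependent computation begins:

```python
import threading

def run_phases(nprocs, local_inputs):
    """Simulate barrier-synchronized SPMD execution: each 'processor'
    writes its partial result, waits at a barrier, then safely reads
    every other processor's value to form a global sum."""
    barrier = threading.Barrier(nprocs)
    shared = [None] * nprocs
    totals = [None] * nprocs

    def worker(rank):
        shared[rank] = local_inputs[rank] * 2  # phase 1: local compute
        barrier.wait()                         # all writes now complete
        totals[rank] = sum(shared)             # phase 2: uses others' data

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(nprocs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals
```

Because the barrier guarantees every rank's write has happened before any rank reads, phase 2 needs no per-message handshaking -- which is what makes static scheduling of the communication possible.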
- KLAT2's Flat Neighborhood Network
-
Another key technology that we have been developing involves the
design of customized cluster message-passing networks by genetic
search. The first such system is KLAT2, Kentucky Linux Athlon
Testbed 2, described here. Note that the 66 Athlon processors
in KLAT2 were donated to us by AMD as part of their continuing support for our research
in PC-based supercomputing.
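The "flat neighborhood" property itself is simple to state: every pair of PCs must share at least one switch, so any message crosses exactly one switch. A sketch of the feasibility check that a genetic search over NIC-to-switch assignments could use as a hard constraint (illustrative code, not the actual KLAT2 search):

```python
def is_flat_neighborhood(assignments):
    """assignments[i] is the set of switches that PC i's NICs plug into.
    The network is 'flat' if every pair of PCs shares at least one
    switch, giving single-switch latency between any two PCs."""
    for i in range(len(assignments)):
        for j in range(i + 1, len(assignments)):
            if not (assignments[i] & assignments[j]):
                return False  # PCs i and j would need a multi-hop route
    return True

# 4 PCs with 2 NICs each spread over 3 switches (A, B, C):
# every pair of PCs overlaps on some switch, so the design is flat.
good = [{"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B"}]
```

A genetic search scores candidate assignments like `good` above, rejecting (or heavily penalizing) any that fail this check while optimizing secondary goals such as bandwidth for specific communication patterns.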
- SIMD Within A Register
-
We have also been developing compiler technology to make use of
multimedia instruction set extensions (e.g., MMX and 3DNow!) for
scientific computation. In fact, using just a little 3DNow!
acceleration, KLAT2's performance on the ScaLAPACK benchmark
(the one used for the Top500
Supercomputers list) is 64.459 GFLOPS -- without SWAR, it
is around 28 GFLOPS.
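The compiler emits MMX/3DNow! instructions, but the core SWAR idea can be shown in plain integer arithmetic: treat one 32-bit word as four independent 8-bit lanes and keep carries from crossing lane boundaries. This is a standard SWAR trick, shown here as an illustrative sketch rather than the compiler's actual output:

```python
def swar_add4(a, b):
    """Add four unsigned 8-bit lanes packed into a 32-bit word,
    without letting carries spill between lanes (each lane wraps
    mod 256). Add the low 7 bits of every lane normally, then
    patch in the top bit of each lane with XOR."""
    low = (a & 0x7F7F7F7F) + (b & 0x7F7F7F7F)
    return (low ^ (a & 0x80808080) ^ (b & 0x80808080)) & 0xFFFFFFFF
```

One ordinary 32-bit add thus does four element additions at once; the multimedia extensions provide such partitioned operations directly in hardware, on wider data.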
- The UTMC eCard Attached Parallel Processor
-
This is a technical overview of the content addressable memory
(CAM) attached parallel processor PCI card discussed in the
proposal. We are working with Aeroflex, UTMC's parent
company, on the development of the eCard for cluster parallel
supercomputing.
- University of Kentucky Center for Computational Sciences
-
The Center for Computational Sciences will coordinate the PeTS
project application selection, interactions with scientists and
engineers, and distribution and support of the completed PeTS
designs.
- Current Applications Groups Working With Us
-
A small sampling of the applications groups that the Center for
Computational Sciences has been working with. In addition,
Dietz's cluster group has been discussing specific applications with
Carol Post at Purdue University,
Trevor Creamer at the University of Kentucky College of Medicine,
Stephen Gedney at the University
of Kentucky, and several other researchers.
- The OVERSET Tools For CFD Analysis
-
The particular application that we have selected for the first
PeTS system is the OVERSET tools for CFD analysis. We have been
working with George Huang and his students on three separate CFD codes.
George is one of the developers of the widely accepted OVERSET
tools, and we will target the PeTS system to this CFD
application set. However, George is also working on a second
version using improved algorithms that may later be incorporated
into OVERSET, and a third version is being used for our
Gordon Bell price/performance award submission for KLAT2. This
third version, written entirely in C, is easier to tune for our
cluster technology, and the experience will make us more
effective in porting the OVERSET tools.
- PAPERS
-
PAPERS, Purdue's Adapter for Parallel Execution and Rapid
Synchronization, is cheap public domain network hardware and
software that implements a wide range of operations on global
state for Linux PC clusters. The best overview is an article that
appeared in the Purdue Extrapolations magazine. Currently,
the WWW site is somewhat fragmented because we are in the
process of moving to http://aggregate.org/, a site created to better
reflect the fact that Aggregate Function Network research is no
longer based at Purdue, but involves many institutions, with the
University of Kentucky now taking the lead role.
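An aggregate function computes one result from values contributed by all processors and returns that result to every processor at once. A sequential model of that semantics (illustrative only; the real PAPERS hardware performs the combining in the network itself):

```python
from functools import reduce

def aggregate(op, contributions):
    """Model of an aggregate function: every processor contributes
    one value, the network combines them with 'op', and ALL
    processors receive the same global result."""
    result = reduce(op, contributions)
    return [result] * len(contributions)

# e.g., a global OR over per-processor flag words:
#   aggregate(lambda a, b: a | b, [0b001, 0b010, 0b100])
```

Operations like global OR, AND, min, and sum all fit this pattern, which is why a single cheap mechanism covers such a wide range of global-state operations.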
- PCCTS, Antlr, etc.
-
Because Dietz and his students are frequently building
specialized compilers and other translation systems to take
advantage of particular features of particular applications
(e.g., the Fortran-P compilers we built for Paul Woodward's CFD
codes) and architectures (e.g., MasPar MP1; Thinking Machines
CM2, CM200, CM5; Cray T3D; various SWAR targets; and many
one-of-a-kind research machines), his group developed PCCTS:
the Purdue Compiler Construction Tool Set. PCCTS, combined with
our other tools, makes writing specialized compilers relatively
easy. There is also a Usenet newsgroup for PCCTS:
comp.compilers.tools.pccts
- The LDP Parallel Processing HOWTO
-
Dietz is the author of the Linux Documentation Project's
Parallel Processing HOWTO, which, through several
versions, has been the primary guide to all forms of parallel
processing using Linux PCs. Note: this is the complete guide
to all forms of Linux PC parallel processing, not the "Beowulf
HOWTO" that was produced later by other authors to aid people in
"cookbook" configuration of clusters.
A Few Overview Slides...
The following PostScript/PDF slides are available:
Overview
(.ps or .pdf)
Flat Neighborhood Network
(.ps or .pdf)
Aggregate Function Network
(.ps or .pdf)
SIMD Within A Register
(.ps or .pdf)
KLAT2 & Friends
(.ps or .pdf)
PI Contact Info
Professor Hank Dietz, James F. Hardymon Chair in Networking
College of Engineering
Electrical Engineering Department
453 Anderson Hall
(Office 307 EE Annex, Lab 672 Anderson Hall)
Lexington, KY 40506-0046
Office Phone: (606) 257 4701
Lab Phone: (606) 257 9695
Fax: (606) 257 3092
Email: hankd@engr.uky.edu
Home URL: http://www.cs.uky.edu/~hankd/
The only thing set in stone is our name.