IEEE/ACM SC08 Aggregate.Org

This is the home page for our 15th major research exhibit at the IEEE/ACM Supercomputing conference. The exhibit is again under the title Aggregate.Org / University of Kentucky, the informal research consortium led by our KAOS (Compilers, Hardware Architectures, and Operating Systems) group here at the University of Kentucky's Department of Electrical and Computer Engineering.

A Maze Of Twisty Little Passages

The big thing in our exhibit this year is a technical demonstration consisting of a large wooden maze with four balls in it. Each of the colored balls has a different path to take (MIMD), yet it is perfectly feasible to get all the balls to their respective destinations by a series of tilts of the table (SIMD). Yes, you really can execute MIMD code on SIMD hardware with good efficiency... and that's what our latest software does for GPUs (Graphics Processing Units). Specifically, it can take shared-memory MIMD code written in C and efficiently execute it on an NVIDIA CUDA GPU.

Why do this? GPUs thus far have not had stable, portable programming support for general-purpose use, so there is virtually no code base for supercomputing applications. Our technology allows codes written for popular cluster and SMP target models to be used directly. We plan to support both C and Fortran with both shared memory and MPI message passing. The current shared-memory model uses "parallel subscripting," in which a[||b] means a in processor b's memory; we initially assumed OpenMP would be the preferred model, but have had requests for POSIX Threads. It is surprisingly easy to support dynamic thread creation, although there are performance issues involving memory bank conflicts when sharing the complete memory map.
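To make the parallel-subscripting notation concrete, here is a short sketch in MOG's extended C dialect (this is not standard C, and the identifiers IPROC and NPROC are assumed names for the current processor number and processor count, invented here for illustration):

```c
/* Sketch in MOG's extended C dialect -- not standard C.
   a[||b] denotes the copy of variable a in processor b's memory.
   IPROC and NPROC are assumed names, not confirmed MOG identifiers. */
int a;                             /* each virtual processor has its own a */
int right = (IPROC + 1) % NPROC;   /* this PE's right-hand neighbor */

a[||right] = 42;                   /* write into neighbor's copy of a */
int x = a[||0];                    /* read processor 0's copy of a */
```

The subscript selects which processor's memory the access lands in, so ordinary shared-memory idioms such as neighbor exchange can be written without explicit message passing.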

How well does MIMD code perform? It is too early to give a definitive answer, but there are two different ways to run, and they have very different performance. The MIMD On GPU Simulator, mogsim, achieves performance of the same order of magnitude as the host running optimized native code: macho GPUs run around 8X faster than the host, while wimpy ones run about 2X slower. The MIMD On GPU Meta-State Converter, mogmsc, generates pure native code for the target GPU -- with no interpreter overhead -- and can be as much as 100X faster than the simulator.

The one-page technical overview PDF is A Maze Of Twisty Little Passages. We have prepared a MOG homepage which contains more technical details. The following are the two key publications from when we invented the basic MIMD-on-SIMD technology more than a decade and a half ago -- the MOG environment is heavily based on this work.

Other Work Presented

For us, the past year has been largely about GPUs. Although we have been working on GPU programming environments as a major topic for about 5 years, only in the past year have the base GPU hardware and vendor software interfaces become mature enough for our work to move quickly. Thus, most of the new work we have been doing is related to GPUs, and last year's handouts (see our SC07 exhibit page) are still a pretty good overview of the other work we are doing. In addition to MOG, new handouts were created for:

There also has been significant work done (an MS project completed) on dynamic underclocking of GPUs to maintain a desired energy use/temperature profile, but this was not incorporated into any of the handouts.

Slow-Update Live View of our Exhibit

The camera is mounted on top of the sign in the front right corner of our exhibit, facing toward the Indiana University exhibit. Later, we'll post a time-lapse movie.

The Aggregate. The only thing set in stone is our name.