SC 08 in Austin, TX
Went to the Ohio State 10 GbE and InfiniBand tutorials. Learned some, but not as much as I had hoped. The "advanced" session repeated much of the basic session. The best part was the discussion of OFED and the OSU IB software.
Went to the reconfigurable computing workshop on Monday. Lots of talk about FPGA-based systems. Many off-the-shelf systems are now available, mostly for defense applications, but more are becoming available for financial modeling, medical modeling, and computational chemistry. Saw a talk on Convey Computer from Steve Wallach, the founder of Convey and Convex (who also won this year's Seymour Cray Award).
Some information from SiXis on silicon circuit boards and very high density packaging. Think of a BEE2 in a deck of cards.
Some interesting stuff on implementing floating point of exactly the precision you need to save hardware.
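To get a feel for what "exactly the precision you need" means numerically, here is a minimal software sketch that rounds a value to a chosen number of significand bits. It is only an emulation for experimenting with accuracy, not how any of the FPGA designs at the workshop were built; the function name `round_to_mantissa_bits` is made up for this note, and it ignores any limit on exponent range.

```python
import math


def round_to_mantissa_bits(x: float, bits: int) -> float:
    """Round x to `bits` significant binary digits (software emulation only)."""
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)              # x = m * 2**e with 0.5 <= |m| < 1
    scaled = round(m * (1 << bits))   # keep `bits` bits of the significand
    return math.ldexp(scaled, e - bits)


if __name__ == "__main__":
    # Compare a few hypothetical significand widths against full double precision.
    for bits in (8, 11, 24, 53):
        approx = round_to_mantissa_bits(math.pi, bits)
        rel_err = abs(approx - math.pi) / math.pi
        print(f"{bits:2d} bits: {approx!r}  relative error {rel_err:.2e}")
```

Running a model through something like this at several widths gives a rough idea of how few bits it can tolerate, which is the number that determines how much hardware a custom floating point unit would need.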
Some information on managing memory bandwidth issues. Obviously important for multicore stuff.
This is definitely worth attending if you're interested in reconfigurable or multicore or accelerated computing.
GPU programming was one of the better sessions. A Microsoft labs paper on FFTs for GPUs showed impressive speedups by organizing the FFTs to take maximum advantage of the GPU multiprocessors.
Compiler writers Portland Group had a session on their upcoming GPU accelerator compiler, which uses pragmas and similar directives to compile your code into accelerated code.
Tech-X showed their open source GPULib, which works in IDL.
Went to a breakfast hosted by Panasas where the LANL Roadrunner computer (#1 on the TOP500 list) was described. Interesting talk, given by the LANL system manager rather than by IBM or Panasas.
I spent most of my time in the exhibits each day. It was a huge exhibit hall.
- Microway, and _ showed a 32-core machine based on the Tyan board we have in beef, using eight 4-core AMD Shanghai processors.
- AMD rules for memory and bandwidth intensive apps.
- Intel rules if your app and data fit in the cache.
- Intel's new architecture, which essentially uses a HyperTransport-like system, will be better than the current Intel offerings.
- Multicore processors are going to hit a memory bandwidth wall very soon.
- Convey Computer (FPGA coprocessor tightly coupled to the Intel FSB)
- Tech-X - GPULib and consulting
- GPUTech - French company with consulting and a prebuilt library
- Nvidia and AMD both showed their latest stuff. AMD's stuff is faster, but has no or limited programming support so far. Lots of third-party vendors for both, with Nvidia in the lead by a large margin.
- Compiler technology seems to be coming along. The PG GPU compiler is about to be released. RapidMind, Mitrionics, and a different French company (CAPS, I think) all have GPU-accelerated compilers.
- Open Source Cluster file system (gluster) + commodity data servers + support contract = ~$1K/terabyte for fast (enough) quality storage
- Hardware based storage systems ($$$$$$$$$) Insane speeds and cost. 5-20 GB/sec, >$10K/terabyte
- Most use IB or 10 GbE for NAS, and 8 Gb/s Fibre Channel for SANs. FC is expensive.
- Use either SAS or SATA for drives. Use "enterprise" class drives.
- Red Barn Computers, Penguin, TeamHPC, Silicon Mechanics, and other smaller vendors are reasonably priced and knowledgeable about our price point.
- IB is very low latency and scales to 40 Gb/s. QLogic and Mellanox are the silicon vendors to watch.
- Optical CX-4 cables are getting cheaper. About $500 each for a few tens of meters.
- IB is built into most HP computing platforms due to its very low latency. Most software and hardware use the OFED software stack and drivers.
- Cluster file systems (Lustre, gluster, ...)
- Cluster management systems (Rocks, Scyld, etc.)
- Compilers for native and accelerators (Portland Group, RapidMind, Mitrionics, ...)
- Not really much else. Mathworks, Wolfram, and a couple of other software vendors were there showing multi-core ready products.
Useful show for seeing what is out there, and I went with an eye for:
- How can we rebuild our backend infrastructure for the future?
- How can we use OTS computing for GUPPI and CICADA?
- Is there anything of interest to PTCS for simulation or modeling?
- Should we be looking at InfiniBand?
The answers to these broad questions are:
- How can we rebuild our backend infrastructure for the future? I think the answer is that we need to create a storage architecture that can hold our data and has enough bandwidth that we can stream data onto it while serving up data to the observers. This can be done by buying a storage system off the shelf for $1M or so, or by integrating some systems ourselves using disks and CPUs we procure and combine into a system. We need to do some analysis and design on our probable future needs, assuming for instance that our FPAs get funded and built. I got enough information from enough disparate sources to have a good feel for where we should be headed with our needs and our budgets.
- How can we use OTS computing for GUPPI and CICADA? GPUs, Convey Computer, multicore AMD multiprocessor machines. Possibly use PG compilers and other vendors to help bootstrap our way forward.
- Is there anything of interest to PTCS? Obviously, all the HPC hardware is interesting for complex models. It seems like using GPU power for running models in simulation mode would be fine, but I don't think the reliability is there for control purposes yet. The multicore multiprocessor hardware, though, would be interesting to use for running models.
- Should we be looking at InfiniBand? Yes. The ability to transfer data at 20 or 40 Gb/s is looking very attractive, and the microsecond latencies are as well. It is cheaper than 10 GbE on a per-bit-per-second basis.
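The storage and interconnect answers above mostly come down to simple arithmetic, so here is a back-of-the-envelope sketch. The cost per terabyte and link speeds are the rough show-floor figures from these notes, not vendor quotes; the 100 TB capacity is a made-up placeholder, the function names are mine, and the transfer times use nominal line rates with no encoding or protocol overhead.

```python
# Rough figures taken from the notes above (show-floor estimates only).
COST_PER_TB_USD = {
    "gluster + commodity servers": 1_000,  # "~$1K/terabyte"
    "hardware-based storage": 10_000,      # ">$10K/terabyte", treated as a lower bound
}
LINK_GBITS_PER_S = {"10 GbE": 10, "IB 20 Gb/s": 20, "IB 40 Gb/s": 40}


def storage_cost_usd(capacity_tb: float, cost_per_tb: float) -> float:
    """Purchase cost for a given capacity at a flat cost per terabyte."""
    return capacity_tb * cost_per_tb


def transfer_seconds(data_tb: float, link_gbits_per_s: float) -> float:
    """Time to move data_tb terabytes at the nominal line rate (no overhead)."""
    bits = data_tb * 1e12 * 8
    return bits / (link_gbits_per_s * 1e9)


if __name__ == "__main__":
    capacity_tb = 100.0  # hypothetical capacity, for illustration only
    for name, per_tb in COST_PER_TB_USD.items():
        print(f"{name}: ${storage_cost_usd(capacity_tb, per_tb):,.0f} for {capacity_tb:.0f} TB")
    for name, rate in LINK_GBITS_PER_S.items():
        print(f"{name}: {transfer_seconds(1.0, rate):.0f} s to move 1 TB")
```

Plugging in real numbers for our expected data rates and retention would be the first step of the analysis and design mentioned above.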
Last year I went out there without much idea of what to expect, and I got a lot of info, but at a very shallow level. This year, I went out with some very specific questions and spent time talking to people who had implemented systems much like the ones I see us needing to implement. I got some very good answers, including honest ones from Panasas and other high-end vendors telling me that we were not playing in the same league as them.