SC 09 in Portland, OR
Summary by John Ford (JF), Pam Ford (PF), and Patrick Prandt (PB). Unattributed sections are by random combinations of the authors.
The themes of this year's conference were:
- 3D Internet
- Sustainability (energy efficiency)
Next year's themes (SC10, New Orleans, LA, Nov 13-19) are:
- Heterogeneous Computing
- Data Intensive Computing
- Climate Modeling
Two of the three topics for next year are directly applicable to our work.
I think it was very useful to have our exhibit booth. The display was well done, as was the looping slide show. We had one of our FPGA processors there, operating with the artificial pulsar. Paul Demorest gets the credit for getting the software up and running with the prototype spectrometer personality in the FPGA, and Jason Ray gets credit for debugging the personality. Many people stopped by and asked questions about us, and I made several contacts with people who are interested in our work. Having demo hardware there was good: most of the people at the show are really interested in hardware and off-the-beaten-path stuff, and it attracted attention. We should have a more organized schedule for manning the booth, though, as I think we left Mark stranded there sometimes.
The Rise of the 3D Internet - Advancements in Collaborative and Immersive Sciences
Intel Senior Fellow and CTO, Justin Rattner
Abstract: Forty Exabytes of unique data will be generated worldwide in 2009. This data can help us understand scientific and engineering phenomena as well as operational trends in business and finance. The best way to understand, navigate and communicate these phenomena is through visualization. In his opening address, Intel CTO Justin Rattner will talk about today’s data deluge and how high performance computing is being used to deliver cutting edge, 3D collaborative visualizations. He will also discuss how the 2D Internet started and draw parallels to the rise of the 3D Internet today. With the help of demonstrations, he will show how rich visualization of scientific data is being used for discovery, collaboration and education.
The CTO from Intel gave a presentation on what could be done in the future with simulations (in the manner of "Second Life"). It was interesting, but to be honest I think we should save the electricity by going outside into our "First Life". He demonstrated a mockup of the Larrabee processor that was, I think, a bit contrived. I suspect that Nvidia could have done the same with a couple of GPUs in a box. Update: the Larrabee processor has been cancelled. It indeed was a sham...
Systems Medicine, Transformational Technologies and the Emergence of Predictive, Personalized, Preventive and Participatory (P4) Medicine
President and co-founder of the Institute for Systems Biology, Leroy Hood, M.D., Ph.D.
Abstract: The challenge for biology in the 21st century is the need to deal with its incredible complexity. One powerful way to think of biology is to view it as an informational science. This view leads to the conclusion that biological information is captured, mined, integrated by biological networks and finally passed off to molecular machines for execution. Hence the challenge in understanding biological complexity is that of deciphering the operation of dynamic biological networks across the three time scales of life: evolution, development and physiological responses. Systems approaches to biology are focused on delineating and deciphering dynamic biological networks and their interactions with simple and complex molecular machines. It appears that systems medicine, together with pioneering changes such as next-generation DNA sequencing and blood protein measurements (nanotechnology), as well as the development of powerful new computational and mathematical tools, will transform medicine over the next 5-20 years from its currently reactive state to a mode that is predictive, personalized, preventive and participatory (P4).
An excellent talk. From my perspective, the coolest thing is how large-scale computing is being used for drug development and for understanding disease processes, along with the power of large-scale databases for sorting out the causes and effects of disease.
Keynote Address: Building Solutions: Energy, Climate and Computing for a Changing World
Former U.S. Vice President, Al Gore
Vice President Gore's wit and charm showed why he has been such a successful politician, although he is (in his own words) "no longer the next President." Gore pointed out that human beings are wired to respond to immediate dangers, not gradual ones like global climate change. It is up to the supercomputing community not only to use modeling to get people to respond to this crisis, but also to find innovative solutions and alternatives to the worst greenhouse-gas-emitting processes. As an example, Gore related how, when it was discovered that a large hole in the ozone layer had been caused by the use of CFCs, the Nortel CEO promised those at a conference that his company would no longer use CFCs to clean their electronics. His engineers back home were dismayed at this promise, but they then came up with clean manufacturing processes that eliminated the need for CFCs.
Principles and Practice of Application Performance Measurement and Analysis on Parallel Systems (PB)
The talk discussed many aspects of performance analysis:
- How to define performance: FLOPS vs OPS, wall vs CPU time
- How to measure performance: profiling, tracing, ...
- Tools for gathering metrics: PAPI, TAU, mpiP, ...
- Analyzing metrics: Vampir, TAU
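The wall-vs-CPU-time distinction in the first bullet is easy to see in a few lines of Python. This is a minimal sketch, not material from the tutorial. The workload function `busy_then_wait` is hypothetical, and `time.sleep` stands in for time spent waiting on I/O or on other processes.

```python
import time


def measure(fn, *args):
    """Return (result, wall_seconds, cpu_seconds) for a single call of fn."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    result = fn(*args)
    return result, time.perf_counter() - wall0, time.process_time() - cpu0


def busy_then_wait(n, wait_s):
    """Hypothetical workload: burn CPU in a loop, then block without using CPU."""
    total = sum(i * i for i in range(n))
    time.sleep(wait_s)  # stand-in for waiting on I/O; adds wall time, not CPU time
    return total


if __name__ == "__main__":
    _, wall, cpu = measure(busy_then_wait, 1_000_000, 0.5)
    # Wall time should exceed CPU time by roughly the 0.5 s spent sleeping.
    print(f"wall {wall:.2f} s, cpu {cpu:.2f} s")
```

`time.process_time()` sums CPU time over all threads of the process, so for multithreaded code the CPU figure can exceed the wall figure, which is one reason both numbers are worth recording.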
Parallel Computing 101 (PF)
This tutorial has been offered before with the same presenters, and excellent notes by Bob Garwood can be found here
Expanding Your Debugging Options (PF)
Abstract (shortened): Techniques introduced will include: interactive debugging of MPI programs; graphical interactive remote debugging; scripted unattended debugging; and reverse debugging for the hardest bugs. Participants should bring an x86 or x86-64 laptop with CD or DVD drive to participate in the hands-on portion of the tutorial.
Sounds good, but this was really a tutorial on "how to use our TotalView software." Patrick also attended this session and said most of the features are available with gdb, which is free.
Python for High Performance and Scientific Computing (PF)
Abstract (shortened): This tutorial provides an introduction to Python focused on HPC and scientific computing. Throughout, we provide concrete examples, hands-on examples, and links to additional sources of information. The result will be a clear sense of possibilities and best practices using Python in HPC environments. We will cover several key concepts: language basics, NumPy, parallel programming, performance issues, integrating C and Fortran, basic visualization, large production codes, and finding resources. While it is impossible to address all libraries and application domains, at the end participants should be able to write a simple application making use of parallel programming techniques, visualize the output, and know how to confidently proceed with future projects with Python.
Unfortunately, having >120 participants rather than the expected ~40 killed the interactive part of the sessions. We couldn't even download an updated PDF of the tutorial notes, much less log into their server. Instead, we were shown a real-world implementation using massively parallel Python and C (https://wiki.fysik.dtu.dk/gpaw), and a demonstration of web2py (www.web2py.com), a web framework that was used to upload and analyze DNA sequences. Note: Python 2.6 has a new multiprocessing module.
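As an illustration of the multiprocessing module, here is a minimal sketch (not from the tutorial) that spreads a per-channel computation across worker processes with `Pool.map`. The `channel_power` function and its input data are hypothetical stand-ins. The code assumes Python 3.4+ (for `get_context` and `Pool` as a context manager), although `Pool.map` itself dates back to 2.6.

```python
import multiprocessing


def channel_power(samples):
    """Hypothetical per-channel work: sum of squared sample values."""
    return sum(s * s for s in samples)


def parallel_channel_power(channels, processes=4):
    """Compute channel_power for each channel using a pool of worker processes.

    Prefers the 'fork' start method where it exists (POSIX) so that functions
    defined in the main script are visible to the workers; otherwise it falls
    back to the platform default.
    """
    methods = multiprocessing.get_all_start_methods()
    ctx = multiprocessing.get_context("fork" if "fork" in methods else None)
    with ctx.Pool(processes=processes) as pool:
        # map() splits the list across workers and returns results in input order.
        return pool.map(channel_power, channels)


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5], [6]]
    print(parallel_channel_power(data))  # [14, 41, 36]
```

For work this small, the cost of starting processes and pickling arguments dominates. The pattern pays off only when each task does substantial computation.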
Reconfigurable Computing (JF)
The talks were dominated by reconfigurable computing folks from CHREC, the NSF's Center for High Performance Reconfigurable Computing. We should write a paper for this workshop next year if they have it again, as our applications are nearly orthogonal to those presented. Most of the talks were on how to speed up large parallelizable problems, and there were none on real-time applications. I suspect that is partly because most real-time applications are military.
There was a lot of talk about how the GPU is eating the FPGA's lunch in high performance computing. Several people opined that one reason is Nvidia's huge marketing budget, along with the amount of money poured into the tools. One talk showed a chart of "Computational Density per Watt", which showed that an FPGA is clearly superior to a GPU or CPU for any compute-bound problem.
In the end, the consensus is that the FPGA is at the point now where tools are what is holding back the applications, and that clearly the watts/op favor the FPGA, especially where integer arithmetic will suffice.
Several technical talks presented a common theme, and that is that we all need to go back and figure out what we really are trying to do. We have spent the last 50 years trying to serialize our algorithms, and now we are working from those serial algorithms and trying to parallelize them. We should go back to the problem we want to solve and design a parallel algorithm instead of trying to parallelize a serial algorithm.
One presenter likened the current state of affairs to the development of the jet engine. Back in the day, it was clearly easier to increase power by adding cylinders or displacement to the existing piston engine. The jet engine did not have many takers, only those with a long-term vision and the willingness to try an unproven technology. Obviously, the jet won out in the end for high-power engines.
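A textbook illustration of "design a parallel algorithm instead of parallelizing a serial one" is the prefix sum. This example was not presented at the workshop; it is a sketch chosen here for illustration. The serial running total carries a dependency from each step to the next. The Hillis-Steele scan restructures the problem into ceil(log2(n)) rounds, and within each round every element depends only on the previous round's values. The code runs serially, and the independence is what would let each round run in parallel.

```python
def serial_prefix_sum(xs):
    """Running total: each output depends on the one before it (loop-carried)."""
    out, total = [], 0
    for x in xs:
        total += x
        out.append(total)
    return out


def hillis_steele_scan(xs):
    """Inclusive prefix sum in ceil(log2(n)) rounds.

    Each round builds a new list purely from the previous round's list, so
    all n updates in a round are independent and could be done in parallel.
    """
    cur = list(xs)
    step = 1
    while step < len(cur):
        prev = cur
        cur = [prev[i] + prev[i - step] if i >= step else prev[i]
               for i in range(len(prev))]
        step *= 2
    return cur


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5]
    print(serial_prefix_sum(data))   # [3, 4, 8, 9, 14]
    print(hillis_steele_scan(data))  # [3, 4, 8, 9, 14]
```

The scan does O(n log n) additions compared with O(n) for the serial loop. It trades extra total work for a much shorter chain of dependent steps, which is the kind of trade-off that suits hardware with many parallel units.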
Ultrascale Visualisation Workshop (JF)
I went to the first few talks on this, but I bailed out and went to talks that were more my speed. I did manage to get contact information for all of these guys, and I got leads on whom to approach as partners in seeking support for our viz projects. The workshop was sponsored/organized by UC Davis folks at http://vis.cs.ucdavis.edu/Ultravis09/
Semantic Designs: Vector C++ (PB)
The talk mainly focused on a rule-based transformation engine written by Semantic Designs. In the past it was used to convert legacy code for the B-2 bomber from JOVIAL to C, using only transforms from JOVIAL language constructs to C constructs (supposedly without ever seeing the actual JOVIAL code). The newest supercomputing application built on this technology is a C++ vectorization engine: code is written in a custom vector syntax that simplifies coding, and the engine converts it to STL vector code before compilation.
Debuggers: Allinea Distributed Debugging Tool (DDT) and Optimization and Profiling Tool (OPT); TotalView
Optimization Tools: Allinea OPT, Nema Labs FASThread, Acumem SlowSpotter
from Tech-X Corporation (free plugin for VisIt, uses HDF5 data format; website also mentions a set of Python scripts using matplotlib)
GPU: NVIDIA Nexus (brings GPU Computing into Visual Studio 2008; debug, profile, and analyze GPU code; supports CUDA C, OpenCL, and OpenGL)
Tutorials: Dauger Research (http://daugerresearch.com/vault/tutorials.shtml)
Technical Talks and Panels
The Chapel programming language (JF)
A very cool talk by someone from Cray. The language will run on anything. It allows the programmer to specify at compile and run time the organization of the computer (the number of CPUs, tasks, etc.) so that the code can be optimized for the platform it's running on. It has directives for mapping tasks onto particular types of memory, CPUs, GPUs, etc.
Co-Arrays and Multi-core Computing (JF)
The gist of this talk was that current computer architecture is similar to the Cray X-MP, Y-MP, Cray-2, Cray-3... They were all shared-memory multiprocessor machines, just like today's multicore machines. Everyone who knew how to program these is long gone, and nobody remembers how to do it. GPUs look a lot like the Cyber 205, Connection Machine, MasPar, etc. The speaker, one of the people who wrote the Fortran Global Arrays extension, was himself an old-timer.
All that is old is new again...
Panel Discussion on Multicore Programming (JF)
Benedict Gaster (AMD), Kevin Goldsmith (Adobe Systems Inc.), Tom Murphy (Contra Costa College), Steven Parker (NVIDIA), Michael Wrinn (Intel Corporation), Evan Smyth (DreamWorks)
Abstract: Multicore platforms are transforming the nature of computation. Increasingly, FLOPs are free, but the people who know how to program these petascale, manycore platforms are not. This panel, composed of a diverse set of industry and academic representatives, is focused on presenting and discussing the abstractions, models, and (re-)training necessary to move parallel programming into a broad audience. The panelists, often vigorous competitors, will present the situation from their respective viewpoints, likely fueling a lively discussion of the entire room as has been seen recently in other venues.
One interesting theme was that people learning to program have the ability to program parallel systems beaten out of them as they learn serial programming. Experiments showed that kids pick up and use parallel programming easily if they are not frightened off; universities do a poor job of teaching it. Another interesting theme is that people are already using this stuff in production. The guys from Adobe and DreamWorks both said that their programs are all parallelized to some extent.
Increase I/O bandwidth with flash technology from Intel, Fusion-io, Sun, and Spansion. Flash is expensive compared to HDDs, but it needs less hardware per IOPS, which means less floor space and less cooling (energy savings). MySpace converted to Fusion-io products and now has its storage in 1/4 the space.
I spent much of Tuesday through Thursday in the exhibit hall, a lot of it looking for potential partners for visualization projects. That info is in a separate note to Amy. I also found some potential collaborators among the reconfigurable computing folks. Alan George, Director of CHREC, would like the opportunity to speak at an SKA meeting to address large-scale systems and reconfigurable computing.
- Cluster computing is very big. Many vendors of clusters and cluster management software were in attendance. These don't hold much excitement for our data acquisition purposes, but they can be incredibly cost-effective for our scientific computing workloads.
- Intel's Nehalem architecture is faster than the current HyperTransport machines from AMD. Memory bandwidth is higher for small numbers of processors.
- AMD is coming out with a faster HyperTransport bus in the spring, and should regain the memory bandwidth lead for the time being.
- (Note from last year repeated...) Multicore processors are going to hit a memory bandwidth wall very soon.
- Convey Computer (FPGA coprocessor tightly coupled to the Intel FSB): this year they have a much nicer, more tightly integrated system. Still no external interface for real-time data.
- Nvidia and AMD both showed their latest stuff.
- (Note from last year repeated...) Compiler technology seems to be coming along. The Portland Group (PGI) GPU compiler has been released. RapidMind, Mitrionics, and a different French company (CAPS, I think) all have GPU-accelerated compilers. Several MATLAB accelerator companies exhibited. One in particular, Jacket, allows you to compile a MATLAB application into an executable.
- Open-source cluster file system (Gluster) + commodity data servers + support contract = ~$1K/terabyte for fast (2.5 GB/s) quality storage
- Hardware-based storage systems ($$$$$$$$$): insane speeds and cost. 5-20 GB/s, >$10K/terabyte
- Most use IB or 10 GbE for NAS, and 8 Gb/s Fibre Channel for SANs. FC is expensive. 40 Gb/s IB is now commonplace, and about the same cost as 10 GbE.
- Use either SAS or SATA for drives. Use "enterprise" class drives.
- Red Barn Computers, Penguin, TeamHPC, Silicon Mechanics, and other smaller vendors are reasonably priced and knowledgeable about our price point. RAID, Inc. is a team member with Gluster, and has quoted a system for my ATI proposal that comes in at about $0.5K per terabyte. These smaller vendors are eating the larger companies' lunch, as the market for big expensive systems has collapsed due to the economic conditions.
- IB is very low latency and scales to 40 Gb/s. QLogic and Mellanox are the silicon vendors to watch.
- Optical CX-4 cables are getting cheaper: about $500 each for a few tens of meters.
- IB is built into most HPC platforms due to its very low latency. Most software and hardware uses the OFED software stack and drivers.
- 40 Gb/s IB is now common: about $1K per drop, including NIC and cables. Optical connects for long runs are available off the shelf.
- Mellanox seems to be the vendor to beat in this market.
- The Ethernet Alliance is trying to force convergence to Ethernet. It's not working, as 40 Gb/s InfiniBand is way cheaper than four aggregated 10 Gb Ethernet ports.
- Cluster file systems (Lustre, gluster, ...)
- Cluster management systems (Rocks, Scyld, etc.)
- Compilers for native CPUs and accelerators (Portland Group, RapidMind, Mitrionics, ...)
- Many MATLAB add-ins for harnessing GPUs for MATLAB acceleration.
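To put the two per-terabyte storage figures from the list above side by side, here is a back-of-the-envelope calculation. The 100 TB capacity is a hypothetical number chosen only for illustration, not a requirement from these notes, and the >$10K/terabyte figure is treated as a lower bound.

```python
# Per-terabyte prices as quoted in the exhibit-hall notes (2009 figures).
PRICE_PER_TB = {
    "gluster + commodity servers": 1_000,  # ~$1K/terabyte
    "hardware storage system": 10_000,     # >$10K/terabyte, so a lower bound
}


def storage_cost(capacity_tb, price_per_tb):
    """Rough purchase cost in dollars for a given usable capacity."""
    return capacity_tb * price_per_tb


if __name__ == "__main__":
    capacity_tb = 100  # hypothetical capacity, for illustration only
    for option, price in PRICE_PER_TB.items():
        print(f"{option}: ${storage_cost(capacity_tb, price):,}")
```

At that capacity the commodity route comes to about $100,000, and the high-end hardware route comes to at least $1,000,000, before any bandwidth differences are considered.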
Last year's summary. It still applies!
Useful show for seeing what is out there, and I went with an eye for:
- How can we rebuild our backend infrastructure for the future?
- How can we use OTS computing for GUPPI and CICADA?
- Is there anything of interest to PTCS for simulation or modeling?
- Should we be looking at InfiniBand?
The answers to these broad questions are, IMO:
- How can we rebuild our backend infrastructure for the future? I think the answer is that we need to create a storage architecture that can hold our data, with enough bandwidth that we can stream data onto it while serving up data to the observers. This can be done by buying a storage system off the shelf for $1M or so, or by integrating a system ourselves from disks and CPUs that we procure. We need to do some analysis and design on our probable future needs, assuming, for instance, that our FPAs get funded and built. I got enough information from enough disparate sources to have a good feel for where we should be headed with our needs and our budgets.
- How can we use OTS computing for GUPPI and CICADA? GPUs, Convey Computer, and multicore AMD multiprocessor machines. Possibly use PG compilers and other vendors to help bootstrap our way forward.
- Is there anything of interest to PTCS? Obviously, all of the HPC work is interesting for complex models. Using GPU power for running models in simulation mode seems fine, but I don't think the reliability is there for control purposes yet. The multicore multiprocessor hardware would be interesting to use for running models, though.
- Should we be looking at InfiniBand? Yes. The ability to transfer data at 20 or 40 Gb/s is looking very attractive, and so are the microsecond latencies. It is cheaper than 10 GbE on a bits-per-second basis.
Last year I went out there without much idea of what to expect, and I got a lot of info, but at a very shallow level. This year, I went out with some very specific questions, and spent time talking to people who had implemented systems much like the ones I see us needing to implement. I got some very good answers, including honest ones from Panasas and other high-end vendors telling me that we were not playing in the same league as them.
This year's summary
Each year there are new vendors and repeat vendors. All the big names are there in some form. This year there were a lot of vendors pushing low-end storage, and I think that low-end storage is fast enough and reliable enough for our purposes. The other big thing this year is the explosion of GPU stuff. A third is the number of cluster management system vendors. All the big universities and labs had booths showing off their specialties, which was valuable to me for seeing what people with lots of money can achieve and who we might want to partner with.
Portland itself was a very pleasant city, with excellent transportation. We stayed in a hotel right near the convention center, but we ventured into town several times.
I think it would be worth spending some time on a booth for next year that shows off how our work fits into two of the three themes for next year.
Browse around on the linux-mag site for more SC09 coverage.