The show was held in New Orleans. The convention center is downtown by the river, about 3/4 mile from the French Quarter. Very convenient!
The themes of this year's conference were:
- Heterogeneous Computing
- Data Intensive Computing
- Climate Modeling
Next year's themes (SC11, Seattle, WA, Nov 12-18) are:
- Data-intensive Science
- Sustained Performance
Clayton M. Christensen
Clayton M. Christensen is the Robert and Jane Cizik Professor of Business Administration at the Harvard Business School, and is widely regarded as one of the world's foremost experts on innovation and growth.
The talk was on innovation in a disruptive environment.
He talked about how companies innovate, or fail to. His idea is that large companies are overcome by smaller, more agile companies because their bulk and bureaucracy keep them from moving fast. Spin-offs of successful companies are more likely to survive and thrive because they throw off the shackles of that bureaucracy. One example he gave was how the US steel mills slowly lost market share by repeatedly giving up the least profitable segments of the market to go after the more lucrative segments; the end result was that the smaller competitors eventually ate their lunch.
Another example was how Dell started out making most of their parts, designing their computers, and handling manufacture, sales, and support, and how over the years, they got out of the less profitable parts of the market, until today they are just a marketing company.
Here's the abstract:
How to Create New Growth Businesses in a Risk-Minimizing Environment
Disruption is the mechanism by which great companies continue to succeed and new entrants displace the market leaders. Disruptive innovations either create new markets or reshape existing markets by delivering relatively simple, convenient, low cost innovations to a set of customers who are ignored by industry leaders. One of the bedrock principles of Christensen's disruptive innovation theory is that companies innovate faster than customers' lives change. Because of this, most organizations end up producing products that are too good, too expensive, and too inconvenient for many customers. By only pursuing these "sustaining" innovations, companies unwittingly open the door to "disruptive" innovations, be it "low-end disruption" targeting overshot, less-demanding customers or "new-market disruption" targeting non-consumers.
- Many of today's markets that appear to have little growth remaining actually have great growth potential through disruptive innovations that transform complicated, expensive products into simple, affordable ones.
- Successful innovation seems unpredictable because innovators rely excessively on data, which is only available about the past. They have not been equipped with sound theories that would allow them to see the future perceptively. This problem has been solved.
- Understanding the customer is the wrong unit of analysis for successful innovation. Understanding the job that the customer is trying to do is the key.
- Many innovations that have extraordinary growth potential fail, not because of the product or service itself, but because the company forced it into an inappropriate business model instead of creating a new optimal one.
- Companies with disruptive products and business models are the ones whose share prices increase faster than the market over sustained periods.
Dally is Chief Scientist for Nvidia. He talked about how the GPU could be used for an exascale machine; in the future the GPU is the computer, not just an accelerator. (400 cabinets worth of stuff...) He talked about changes to GPU architecture to make GPUs more tuned for this application. It sounds like they are throwing in whole-hog. For now, they need to stick with electrical (vs. optical) interconnects for lower power.
Performance per Watt is the new performance. In today's power-limited regime, GPU computing offers significant advantages in performance and energy efficiency. In this regime, performance derives from parallelism and efficiency derives from locality. Current GPUs provide both, with up to 512 cores per chip and an explicitly-managed memory hierarchy. This talk will review the current state of GPU computing and discuss how we plan to address the challenges of ExaScale computing. Achieving ExaFLOPS of sustained performance in a 20MW power envelope requires significant power reduction beyond what will be provided by technology scaling. Efficient processor design along with aggressive exploitation of locality is expected to address this power challenge. A focus on vertical rather than horizontal locality simplifies many issues including load balance, placement, and dynamic workloads. Efficient mechanisms for communication, synchronization, and thread management will be required to achieve the strong scaling needed for the 10^10-thread parallelism required to sustain an ExaFLOPS on reasonable-sized problems. Resilience will be achieved through a combination of hardware mechanisms and an API that allows programs to specify when and where protection is required. Programming systems will evolve to improve programmer productivity with a global address space and global data abstractions while improving efficiency via machine-independent abstractions for locality.
Davies is from the UK Met office. He talked about the progression of computer power and model complexity used for weather and climate modeling.
During the 1980s climate modeling moved from being an academic problem to become the main tool in predicting climate change. Since then, with better science and access to more computer power, models have improved significantly. Until now, because of the cost of running them for centuries of simulated time, climate models have been run at relatively low resolution, and much of the recent increase in computer power has been used to increase the number of processes modeled, e.g. aerosols and the carbon cycle. Predicting regional impacts of climate change will require higher resolution, but harnessing the power of the next generation of computers will require more scalable model code. It is likely that many of the existing climate models will need major reformulation, involving both algorithms and grids, to make best use of massively parallel architectures. Whether this can be done without significant compromises in the science remains an open question.
Parallel Programming in Chapel: The Cascade High-Productivity Language
(John) I attended this tutorial on Chapel, a language designed to be expressive enough for parallel problems while hiding much of the parallel machinery from the common user. It has facilities behind the scenes to tune applications to the hardware available. It's one of two HPC languages being funded by DARPA. It is available for download and is mostly complete. (There's a copy in ~jford/chapel.)
Here's the official abstract:
Chapel is a new parallel language being developed by Cray Inc. to improve the productivity of parallel programmers on large-scale supercomputers, commodity clusters, and multicore workstations. Chapel aims to vastly improve programmability over current parallel programming models while supporting performance and portability at least as good as today's technologies. Though developed by Cray, Chapel is portable, open-source software that supports Linux, Mac, Cray, IBM, SGI, and most other UNIX-based platforms. This tutorial will provide an in-depth introduction to Chapel, from context and motivation to a detailed description of Chapel concepts via lecture and sample computations. A hands-on segment will let participants write, compile, and execute Chapel programs, either using provided accounts or by installing Chapel on their own machine. We'll conclude by giving an overview of ongoing Chapel activities and collaborations, and by soliciting participants for their feedback to help improve Chapel's applicability to their parallel computing needs.
Workshops, Education Programs
4th International Workshop on High-Performance Reconfigurable Computing Technology & Applications (HPRCTA'10)
(John) I gave a talk on the use of heterogeneous computing in radio astronomy (another GUPPI architecture talk). I got several questions about our application and why we didn't just use GPUs for all of it. I suppose I should put a copy on arXiv... Anyway, this workshop continued the reconfigurable computing theme from years past, with papers on fault isolation, computing on the NOVO-g FPGA machine, and a panel session with Xilinx, Altera, and tool vendors. It was interesting to see how it has progressed over the several workshops I've attended, becoming more concrete, with fewer vendors around to tell the tale.
Verification, Validation and Uncertainty Analysis in High-Performance Computing
- Applying Software Engineering Principles to the Development of Scientific and Engineering Software: Lessons Learned from a Series of Case Studies & Workshops - Jeffrey Carver (University of Alabama, cs.ua.edu/~carver)
- Validation: what is the correct output? Sometimes in science you don't know. Verification: mathematical model > algorithm > code; could be error in any of these. Hard to come up with good test cases. Testing should include inspection (reading requirements and code to see if it implements requirements instead of just running code).
- Developers on HPC: Goal = portability and maintainability over performance (platforms change over time). Prefer stable, lower-level languages; in higher-level languages, compiler didn't always do what they thought it would do. Agile over traditional methodologies - requirements change
- Computational Science & Engineering vs. Software Engineering: Most software development by CSE. CSE main focus is on science/engineering (paper), deep knowledge of domain, shallow knowledge of software engineering techniques. Most commonly, CSE develops algorithms and software, SE optimizes code
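Carver's verification point - that errors can hide in the model, the algorithm, or the code - is easiest to act on where an analytic answer exists. A minimal sketch of that style of verification test (the integrand, tolerances, and function names are my own illustration, not from the talk):

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n panels."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

# Verification against a known analytic answer: the integral of sin(x)
# over [0, pi] is exactly 2, and the trapezoid rule converges at O(h^2),
# so doubling n should shrink the error by about 4x.
err_1000 = abs(trapezoid(math.sin, 0.0, math.pi, 1000) - 2.0)
err_2000 = abs(trapezoid(math.sin, 0.0, math.pi, 2000) - 2.0)
print(err_1000, err_2000)   # both tiny; ratio close to 4
```

Checking the convergence *rate*, not just the answer, is what distinguishes verification from merely running the code.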
- Hardware and Software Considerations for VV&UQ - David Bernholdt (Oak Ridge National Laboratory)
- The problem today is silent data corruption (SDC): errors not caught by checkpoint/restart methods. Errors occur in the CPU (disabled ECC), memory (parity errors), storage (sector errors), and network (checksums); each is rare, but with lots of nodes they become likely. Today's devices are smaller but less reliable, and voltage is scaled down to the minimum to save power.
- Errors must be handled with defensive programming - don't trust the computer! Use Algorithm-Based Fault Tolerance (ABFT) to catch SDC: add code to detect and correct errors (checking invariants such as equality or zero sums within a tolerance), or run a reduced (simplified) model in a backup process, compare results, and if the main process fails, use the backup process instead ("just good enough"). This is expensive; it is sometimes used in testing and development but not in production, yet production usually stresses the code more. Use selective enforcement: apply these algorithms only in critical parts of the code.
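The ABFT idea of redundant checks that must hold within a tolerance can be sketched with the classic checksum-matrix scheme for matrix multiply; this is my illustration of the technique, not code from the talk, and the function names are made up:

```python
import random

def matmul(A, B):
    """Plain triple-loop matrix multiply on lists of lists."""
    m = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(len(B[0]))]
            for i in range(len(A))]

def checksum_extend(A):
    """Append a checksum row (the column sums of A) before multiplying."""
    return A + [[sum(col) for col in zip(*A)]]

def verify(C_ext, tol=1e-9):
    """ABFT invariant: the last row of C_ext must equal the column sums
    of the rows above it, to within a floating-point tolerance."""
    C, check = C_ext[:-1], C_ext[-1]
    return all(abs(sum(row[j] for row in C) - check[j]) <= tol
               for j in range(len(check)))

random.seed(1)
A = [[random.random() for _ in range(4)] for _ in range(3)]
B = [[random.random() for _ in range(5)] for _ in range(4)]

C_ext = matmul(checksum_extend(A), B)
assert verify(C_ext)       # healthy run: the invariant holds
C_ext[1][2] += 1.0         # simulate a silent bit flip in one result element
assert not verify(C_ext)   # the checksum catches the corruption
```

The extra checksum row costs one additional row of the multiply, which is the kind of overhead that motivates applying it only to critical parts of a code.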
- SKA talk by person from Oxford e-Research Centre was all about power. Data centers take 0.5% of the world's electricity (more than some countries), 1.5% in the U.S. Need to optimize energy usage vs. performance, i.e. GFLOPS/Watt instead of just the FLOPS themselves. Need energy profilers, not just performance profilers.
- Exploit mixed precision: single precision 2x faster than double precision, saves power; decide where you really need the extra precision.
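The mixed-precision tradeoff can be seen without a GPU. The sketch below (my own illustration) emulates single precision with Python's struct module to show the accuracy given up in exchange for the ~2x speed/power win:

```python
import struct

def to_single(x):
    """Round an IEEE double to the nearest IEEE single-precision value."""
    return struct.unpack('f', struct.pack('f', x))[0]

# Accumulate 0.1 a million times in both precisions.
acc32, acc64 = 0.0, 0.0
for _ in range(1_000_000):
    acc32 = to_single(acc32 + to_single(0.1))
    acc64 += 0.1

# Double stays within ~1e-5 of 100000; single drifts by hundreds.
# That is why you keep double precision in accumulation-heavy,
# error-sensitive parts of a code and use single elsewhere.
print(acc64, acc32)
```

The decision of "where you really need the extra precision" is exactly the question of which variables accumulate over many operations like this one.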
Education Program: GPGPU, Part I
- Overview of GPUs and CUDA. Need to look at performance/Watt and performance/sq.ft. CUDA threads have low overhead vs. OpenMP; in OpenMP you need more code per thread to make the parallelism worth it. The same chip goes into various video cards, but with different performance binning coming off the assembly line.
Vendor talks, Technical talks
This was, in my opinion, the best talk I went to all week. Steve Wallach, designer of the Convex computers from years back, has formed a company to build computers with FPGAs attached to the QPI bus on the Intel processor. Xilinx and Intel are investors in the company. They provide math libraries, and you can make custom personalities for your applications. His talk was irreverent and insightful, not a marketing pitch.
Additional comments: Developers today are willing to trade performance for productivity, using tools like Matlab. He thought the #1 supercomputer, Tianhe-1A, which runs at 53% of peak (they kept throwing hardware at it until it was fast enough to be #1), was less impressive than the #6 French Tera-100 Bull system (83% of peak, no GPUs) (http://www.top500.org/). Optical interconnects are needed for bandwidth (compare to the Nvidia statement to stay electrical). "The easiest to program will be the correct technology." Need new algorithms, not new languages or more hardware.
Various CUDA-based libraries
There were vendors selling CUDA-based libraries that let you accelerate your code by simply substituting their LAPACK, BLAS, etc. library for the standard non-GPU library. Speedups of a few tens.
Coolest exhibit (sorry!)
Hardcore Computer's Liquid Blade. http://www.hardcorecomputer.com/index.html
They sell blade servers (based on Supermicro motherboards) that are liquid-cooled. No rotating disks inside, but they can do solid-state disks.
The cool thing is that they can use the return water to the chiller (any water less than 70 degrees) to remove heat from their cooling fluid with a liquid-liquid heat exchanger. They have no fans.
Pico Computing and Accelize
They have FPGA boards that plug into the PCIe bus, with connectors on them that can be used to hook up an ADC. It could be the equivalent of a ROACH-II on a PCI plug-in board. Accelize also has multi-FPGA boards.
A compact but powerful data acquisition system with gigabit Ethernet, ADCs, DACs, and digital I/O; it would be great for fast sampling of accelerometers, inclinometers, etc. They designed the boards and have them built by an outside manufacturer. It's called CAPTAN (Compact And Programmable daTa Acquisition module): http://lss.fnal.gov/archive/2008/pub/fermilab-pub-08-527-cd.pdf
40 Gb/s Ethernet and IB adapters, switches, and cables. Good stuff. Pretty cheap, especially a 40 Gb QDR IB/10 GbE card for < $1K.
A cache-coherent 2 or 3-d toroidal reflective-memory system that hooks to the HyperTransport bus. 48 bit physical address space, up to 4K CPU nodes (not cores, CPUs).
These guys have multi-port 10 Gb Ethernet cards; you can get up to 6 ports in one card. The PCIe 2.0 bus runs at 5 GT/s per lane, so an x16 slot gives 80 Gb/s of raw signalling (about 64 Gb/s usable after 8b/10b encoding). We could potentially use this to get data from the new GBT spectrometer FPGA boards to the compute hardware.
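As a sanity check on those bandwidth numbers (my arithmetic, not the vendor's):

```python
# Back-of-the-envelope check of PCIe 2.0 bandwidth for an x16 slot.
# PCIe 2.0 signals at 5 GT/s per lane (one bit per transfer) and uses
# 8b/10b encoding, so only 8 of every 10 wire bits are payload.
gt_per_lane = 5.0                    # GT/s per lane
lanes = 16
raw_gbps = gt_per_lane * lanes       # raw signalling rate
usable_gbps = raw_gbps * 8 / 10      # after 8b/10b encoding overhead
print(raw_gbps, usable_gbps)         # 80.0 64.0
```

Even the usable 64 Gb/s comfortably exceeds six 10 GbE ports, so the card rather than the bus would be the bottleneck.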
Chips and other small Hardware
- Luxtera -- Optics on a chip. Direct electrical to fiber interface on a small chip. 40 Gb/s over 4 fibers
- Mindspeed -- Equalizer chips for PCIe and 100 GbE networks
- Inphi -- Fast amplifiers and Track/hold circuits, comparators (18 GHz analog bandwidth)
- Reflex Photonics -- another vendor of high-density fiber drivers.
- RAID, Inc. -- Talked with the CEO and others about new offerings. We can get ~2 PB in 2 racks, including servers and networks. They suggested that we should just buy our spectrometer hardware with enough capacity to add disks to make up the full 2 PB we're talking about.
- Blue Arc -- NAS vendor. Says that in ~ 6 months Parallel NFS will be out on their products and we just plug and play, forget about Lustre, etc.
- Data Direct Networks -- Supplies Blue Arc with hardware, also sells non-integrated stuff. They also say our requirements are nothing out of the ordinary.
In general, folks are pushing Lustre for parallel file systems, but Blue Arc says we don't need it for our data rates and file storage.
Nvidia
- New processors out next year: more of the same, but possibly looser coupling between multiprocessors.
- CUDA is their main thrust; they will continue to sort-of support OpenCL.
- They are putting bigger memories on board the Tesla line.
- Thrust, a C++ template library, hides all the CUDA ugliness from users.
AMD
- OpenCL is their language of choice.
- Their newest offerings compete well on power and speed with Nvidia's stuff.
- Their Fusion parts (CPU + GPU combined) will also use OpenCL.
- I spoke with one of their engineers who thought we could do much better with their processors and GPUs, given the I/O-intensive nature of our problems (GUPPI). I will follow up with him and try to get more concrete info.
- Intel has thrown in with OpenCL for its many-core processors and on-chip GPU, if they ever get it done.
Pittsburgh Supercomputer Center
They have a new machine they are looking to fill with jobs. It's an SGI Altix shared memory machine.
Center for High-performance Reconfigurable Computing (CHREC)
CHREC has an FPGA-based reconfigurable computing system, the NOVO-g, which is available for researchers to use on science problems. They ask researchers to buy a node of hardware and develop their code and algorithms on it, then come and use the full-up system of 128 FPGAs in the cluster.