What I read on my Christmas vacation, or, A summary of "Reconfigurable Computing"
At 875 pages, not including the index, this is a formidable volume. It was written by many different authors, and the editors have assembled their chapters into a coherent book. It is fairly easy to read, considering the subject matter and the wide variety of topics covered. Everything from FPGA internal architecture, layout, and device physics to high-level cryptanalysis algorithms is included.
The book is divided into six parts. Each part takes a shot at a different facet of the topic. The parts are:
- Reconfigurable computing hardware
- Programming RC systems
- Mapping designs to reconfigurable platforms
- Applications development
- Case Studies
- Theoretical underpinnings and future directions
I found items of interest in all the parts, but particularly parts one, two, and four. A good approach to reading this book is to read the introduction to each part, and then go back for more details in those parts of interest.
And now for a whirlwind tour!
Part One, Reconfigurable Computing Hardware
Part one consists of chapters on devices, system architectures, and implemented systems. I found the first three chapters all worth a quick read, with Chapter 2, Reconfigurable Computing Architectures, worth a close study. It covers many of the factors that drive different architectural choices when designing an RC system, e.g., a fine-grained reconfigurable processor fabric versus a reconfigurable co-processor architecture. Chapter 3 is a history of various projects that used RC as part of a system, up to today's Cray XD1 supercomputer.
Part Two, Programming Reconfigurable Systems
This part consists of chapters on system architecture and on programming with different types of tools: C-like tools, VHDL/Verilog, MATLAB/Simulink (this chapter was written by the BWRC/SSL/RAL folks), a SIMD/vector organization, and finally a Java-based tool for generating systems.
I would submit that except for academic curiosity, one should just skim the chapters that I don't call out specifically below.
Compute Models and System Architectures (Chapter 5)
This chapter, by André DeHon, goes into great detail on a taxonomy of the compute models and system architectures possible within the reconfigurable computing paradigm.
The taxonomy of compute models is rooted in Hoare's Communicating Sequential Processes. The tree forks there, with sequential control processors on one branch and dataflow models on the other. Each branch is further refined into more specific versions of its model. This section is meant to help the reader understand how different applications map onto different compute models. It succeeds!
The system architecture section deals with how to implement each compute model, and there are many things to consider when making that choice. It begins with a discussion of mapping a dataflow model onto an RC platform and ends with a discussion of cellular automata and multithreaded systems.
Programming Streaming FPGA Applications Using Block Diagrams in Simulink (Chapter 8)
This chapter, written by the Berkeley folks, walks through a generic video processor design using the Xilinx System Generator along with Simulink. The design is then mapped to a BEE2 reconfigurable computing system using the BEE2 "yellow blocks". One interesting section covers building control tasks, i.e., state machines. It shows several ways to build the control flow: using MATLAB to generate the system control, using VHDL, and using the embedded PowerPCs. A section on building libraries describes the Astronomy Library.
Part Three, Mapping designs to Reconfigurable Platforms
I suggest skipping this part on a first read, as it deals with developing tools that automatically place and route FPGAs and that modify algorithms for additional pipelining (Chapter 18). Curious readers may still want to look at Chapter 18 for some possible enhancements that make logic run faster.
Part Four, Application Development
This part contains much good information on deciding if and how to deploy an application on a reconfigurable computing platform, and what platforms might be suitable.
Implementing Applications with FPGAs (Chapter 21)
This chapter covers the strengths and weaknesses of FPGAs and how to decide when to use them. It also discusses general implementation strategies and implementing arithmetic on FPGAs, including floating point.
Precision Analysis for Fixed-Point Computation (Chapter 23)
This chapter provides a basis for choosing word lengths in fixed-point calculations. Both analytic and simulation-based techniques are covered. The effects of quantization and of dynamic range are discussed, and word-length optimization is covered in great detail.
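To make the word-length question concrete, here is a minimal Python sketch of the two flavors of analysis (my own illustration, not code from the book; all function names are hypothetical). It assumes signed two's complement values made of a sign bit, some integer bits, and `frac_bits` fractional bits, with round-to-nearest quantization. The analytic side uses the usual model of rounding error as uniform noise; the simulation side just measures the error on random inputs.

```python
import math
import random


def quantize(x: float, frac_bits: int) -> float:
    """Round x to the nearest multiple of 2**-frac_bits (round-to-nearest)."""
    scale = 2 ** frac_bits
    return round(x * scale) / scale


def integer_bits(max_abs: float) -> int:
    """Integer bits (not counting the sign bit) needed to cover |x| <= max_abs."""
    if max_abs <= 0:
        return 0
    # With i integer bits the signed range is [-2**i, 2**i - LSB], so we want the
    # smallest non-negative i with 2**i > max_abs (ignoring the final LSB).
    return max(0, math.floor(math.log2(max_abs)) + 1)


def analytic_error(frac_bits: int) -> tuple[float, float]:
    """Worst-case |error| and error variance under the uniform-noise model."""
    lsb = 2.0 ** -frac_bits
    return lsb / 2, lsb ** 2 / 12


def simulated_error(frac_bits: int, samples: int, seed: int = 0) -> tuple[float, float]:
    """Monte Carlo estimate of max |error| and mean squared error for inputs in [-1, 1)."""
    rng = random.Random(seed)
    errors = [quantize(x, frac_bits) - x
              for x in (rng.uniform(-1.0, 1.0) for _ in range(samples))]
    max_err = max(abs(e) for e in errors)
    mse = sum(e * e for e in errors) / samples
    return max_err, mse


if __name__ == "__main__":
    frac = 8
    # Total word length = sign bit + integer bits + fractional bits.
    print("word length:", 1 + integer_bits(3.0) + frac, "bits")
    print("analytic:", analytic_error(frac))
    print("simulated:", simulated_error(frac, 20000))
```

The simulated mean squared error should land close to the analytic variance, which is the kind of cross-check between the two techniques the chapter describes in far more depth.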
Distributed Arithmetic (Chapter 24) and the CORDIC Algorithm (Chapter 25)
These chapters describe FPGA-optimized implementations of widely known arithmetic algorithms. They are worth reading for anyone who will be designing such low-level parts of a system.
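For readers who have not run into these two before, here is a rough Python model of each. It follows the textbook formulations, not the book's implementations, and all names are hypothetical. Distributed arithmetic computes a fixed-coefficient inner product by precomputing every partial sum of the coefficients in a lookup table, then processing the inputs one bit position at a time with shift-and-add. CORDIC computes sine and cosine using only shifts, adds, and a small table of arctangents. Python floats stand in for the fixed-point registers a real FPGA design would use.

```python
import math


def da_lut(coeffs: list[int]) -> list[int]:
    """Entry m holds the sum of the coefficients whose bit is set in address m."""
    return [sum(c for j, c in enumerate(coeffs) if (m >> j) & 1)
            for m in range(1 << len(coeffs))]


def da_inner_product(coeffs: list[int], xs: list[int], bits: int) -> int:
    """Bit-serial distributed-arithmetic model of sum(c * x) for two's-complement xs."""
    if len(coeffs) != len(xs):
        raise ValueError("need one input per coefficient")
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if any(not lo <= x <= hi for x in xs):
        raise ValueError(f"inputs must fit in {bits}-bit two's complement")
    lut = da_lut(coeffs)  # in an FPGA this would be a small LUT/ROM
    acc = 0
    for b in range(bits):  # one input bit position per step in a bit-serial design
        # Gather bit b of every input into a LUT address.
        addr = 0
        for j, x in enumerate(xs):
            addr |= ((x >> b) & 1) << j
        term = lut[addr] * (1 << b)  # weight by 2**b (a shift in hardware)
        # The MSB of a two's-complement number carries weight -2**(bits-1).
        acc = acc - term if b == bits - 1 else acc + term
    return acc


CORDIC_ITERS = 32
CORDIC_ANGLES = [math.atan(2.0 ** -i) for i in range(CORDIC_ITERS)]
# Each micro-rotation also scales the vector by sqrt(1 + 2**-2i); start from 1/gain.
CORDIC_K = math.prod(1.0 / math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(CORDIC_ITERS))


def cordic_sin_cos(theta: float) -> tuple[float, float]:
    """Rotation-mode CORDIC; valid for |theta| up to about 1.74 rad."""
    if abs(theta) > sum(CORDIC_ANGLES):
        raise ValueError("theta outside CORDIC convergence range")
    x, y, z = CORDIC_K, 0.0, theta
    for i in range(CORDIC_ITERS):
        d = 1.0 if z >= 0 else -1.0
        # Multiplying by 2**-i stands in for a right shift by i bits.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * CORDIC_ANGLES[i]
    return y, x  # (sin(theta), cos(theta))
```

For example, `da_inner_product([3, -5, 7, 2], [10, -20, 127, -128], 8)` returns 763, the same as the direct sum of products, and `cordic_sin_cos(0.5)` agrees with `math.sin`/`math.cos` to roughly nine decimal places.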
Hardware/Software Partitioning (Chapter 26)
This chapter devotes quite a bit of space to helping the reader decide which parts of the system should be partitioned to run on an RC platform and which should stay in software. (It also points out that, in effect, all of this stuff is software...)
It reminds us of Amdahl's law (the speedup of a parallel system is limited by its non-parallelizable part). For example, if 75% of the program is parallelizable, the speedup with n processors is 1/(0.25 + 0.75/n). As n grows arbitrarily large, the 0.75/n term goes to zero and the speedup approaches 1/0.25 = 4.
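The arithmetic is easy to check with a few lines of Python (a sketch; the function names are my own). Here `n` can be read either as the number of processors or as the factor by which an RC accelerator speeds up the offloaded portion of the program; the formula is the same either way.

```python
import math


def amdahl_speedup(parallel_fraction: float, n: float) -> float:
    """Overall speedup when parallel_fraction of the run time is sped up n times."""
    if not 0.0 <= parallel_fraction <= 1.0 or n < 1:
        raise ValueError("need 0 <= parallel_fraction <= 1 and n >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as n grows without bound."""
    serial_fraction = 1.0 - parallel_fraction
    return math.inf if serial_fraction == 0 else 1.0 / serial_fraction


for n in (1, 4, 16, 1000):
    print(n, round(amdahl_speedup(0.75, n), 3))
print("limit:", amdahl_limit(0.75))
```

With 75% of the work parallelizable, 16 processors give only about a 3.37x speedup and 1000 give about 3.99x, creeping toward the limit of 4.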
Part Five, Case Studies of FPGA Applications
This part presents specific applications that have been implemented. A wide variety of systems are discussed; Chapters 27 (image processing), 31 (floating point on FPGAs), and 32 (FDTD modeling) are of particular interest.
SPIHT Image Compression (Chapter 27)
This chapter describes a signal processing system that uses the discrete wavelet transform (DWT) to compress images. The required precision, filter order, and similar parameters are analyzed, along with implementation details.
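As a generic illustration of what one DWT level does, here is a sketch using the integer Haar (S-transform) lifting scheme. It is chosen only because it is the simplest wavelet; it is not necessarily the filter the chapter uses, and the function names are hypothetical. One level splits a signal into a low-pass (approximation) half and a high-pass (detail) half using only adds, subtracts, and shifts, and the inverse reconstructs the input exactly. For an image, the same step is applied to the rows and then to the columns, and repeated on the low-pass quadrant.

```python
def haar_forward(signal: list[int]) -> tuple[list[int], list[int]]:
    """One level of the integer Haar (S-transform) DWT via lifting."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    evens, odds = signal[0::2], signal[1::2]
    # Predict step: the detail coefficient is the difference of each sample pair.
    detail = [o - e for e, o in zip(evens, odds)]
    # Update step: approx = floor((even + odd) / 2), using a shift instead of a divide.
    approx = [e + (d >> 1) for e, d in zip(evens, detail)]
    return approx, detail


def haar_inverse(approx: list[int], detail: list[int]) -> list[int]:
    """Undo the lifting steps in reverse order; reconstruction is exact."""
    evens = [s - (d >> 1) for s, d in zip(approx, detail)]
    odds = [e + d for e, d in zip(evens, detail)]
    out: list[int] = []
    for e, o in zip(evens, odds):
        out.extend((e, o))
    return out
```

For example, `haar_forward([5, 7, 3, -2, 10, 10, 0, 255])` gives the approximation `[6, 0, 10, 127]` and the detail `[2, -5, 0, 255]`, and `haar_inverse` returns the original list. Because everything stays in integers, the precision question reduces to how many bits each band needs, which is the sort of analysis the chapter walks through.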
The Implications of Floating Point for FPGAs (Chapter 31)
This chapter discusses the ramifications of using floating point in FPGAs and provides equations for estimating FLOPS based on memory bandwidth, pipeline depth, and clock speed. Plots are provided for the Virtex-II family for different memory bandwidths, pipeline depths, and clock speeds. A similar plot is developed for FFTs, including a comparison with the FLOPS of the Pentium 4 processor. The FPGA overtakes the Pentium at FFT sizes of about 8K points. This is probably due to cache effects; more modern CPUs with bigger caches would likely do better.
In summary, this is a very good book for the person jumping into the RC waters. There is a lot of good information condensed into the chapter introductions, and lots of footnotes for each chapter. I'll have this book in my office for a while for reference, but I will put it in the library in GB soon. If you want it before I do get it to the library, just ask!
_N.B. Rich Lacasse in CV borrowed the book, and the library didn't want to buy another, so I bought one for the project. It lives in my office on the shelf. You CV guys can pass the library's copy around over there if you want... John_