HPC in imaging implies parallelization.
For large data sets in continuum mode, single-processor imaging is dominated by the gridding and degridding of the data (more than 80% of the run time); for large data sets in spectral cube mode, the deconvolution of the images also becomes significant.
We have parallelized the gridding and degridding at two levels.
In the multithreaded approach, the grid is shared among the threads, with each thread having exclusive access to only one section of the grid (each thread decides which part of the visibilities is useful to it).
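The ownership scheme described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the function names (`grid_band`, `grid_multithreaded`), the split of the grid into row bands, and the flat "boxcar" kernel standing in for the spheroidal convolution function are all assumptions made for the example. Each thread scans every visibility but only writes to the rows it owns, so no locking is needed.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def grid_band(grid, u_pix, v_pix, vis, row_start, row_stop, support=3):
    """Add visibilities to rows [row_start, row_stop) only (hypothetical helper).

    Rows outside that band belong to other threads and are never written here.
    """
    n_cols = grid.shape[1]
    for u, v, value in zip(u_pix, v_pix, vis):
        # Rows of this visibility's (2*support+1)-wide footprint that fall
        # inside the band owned by this thread.
        lo = max(v - support, row_start)
        hi = min(v + support + 1, row_stop)
        if lo >= hi:
            continue  # footprint does not reach this thread's band
        c_lo = max(u - support, 0)
        c_hi = min(u + support + 1, n_cols)
        # Assumption: unit-weight boxcar kernel in place of the spheroidal one.
        grid[lo:hi, c_lo:c_hi] += value


def grid_multithreaded(shape, u_pix, v_pix, vis, n_threads=4, support=3):
    """Grid onto one shared array, one exclusive row band per thread."""
    grid = np.zeros(shape, dtype=np.complex128)
    edges = np.linspace(0, shape[0], n_threads + 1).astype(int)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [
            pool.submit(grid_band, grid, u_pix, v_pix, vis,
                        edges[i], edges[i + 1], support)
            for i in range(n_threads)
        ]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return grid
```

With the default `support=3` the footprint is 7 x 7 pixels. A visibility near a band boundary is handled by both neighbouring threads, each writing only its own rows. Because of Python's global interpreter lock this sketch shows the access pattern rather than a real speed-up.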
In the multiprocess approach, we use a one master, multiple children design (which can be deployed on a Beowulf cluster). In the continuum case, each child process has its own copy of the grid but is given only a section of the visibility data to work on; at the end, the master process reconciles all the grids to make the final image, and deconvolution is done. For the spectral cube case, the data is partitioned along the spectral axis and each child is assigned the gridding/degridding and deconvolution of its partition, with the master reconciling the results into the final cube.
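A minimal sketch of the master and children pattern, using Python's standard `multiprocessing.Pool` on a single machine. All function names are hypothetical, nearest-cell gridding is used in place of a convolution function to keep the example short, and summing the grids is assumed as the reconciliation step for the continuum case. `split_channels` shows the corresponding partition along the spectral axis for the cube case.

```python
from multiprocessing import Pool

import numpy as np


def grid_chunk(task):
    """Child: grid one slice of the visibilities onto a private grid."""
    shape, u_pix, v_pix, vis = task
    grid = np.zeros(shape, dtype=np.complex128)
    # np.add.at accumulates correctly when several visibilities hit one cell.
    np.add.at(grid, (v_pix, u_pix), vis)
    return grid


def make_tasks(shape, u_pix, v_pix, vis, n_children):
    """Master: split the visibility arrays into one slice per child."""
    return [
        (shape, u, v, d)
        for u, v, d in zip(np.array_split(u_pix, n_children),
                           np.array_split(v_pix, n_children),
                           np.array_split(vis, n_children))
    ]


def reconcile(grids):
    """Master: sum the children's grids into the final grid (assumed rule)."""
    return np.sum(grids, axis=0)


def grid_multiprocess(shape, u_pix, v_pix, vis, n_children=4):
    """Continuum case: every child grids its share, the master reconciles."""
    tasks = make_tasks(shape, u_pix, v_pix, vis, n_children)
    with Pool(processes=n_children) as pool:
        grids = pool.map(grid_chunk, tasks)
    return reconcile(grids)


def split_channels(n_chan, n_children):
    """Cube case: assign a contiguous block of channels to each child."""
    return np.array_split(np.arange(n_chan), n_children)
```

On platforms that start child processes with "spawn", `grid_multiprocess` should be called from under an `if __name__ == "__main__":` guard. The memory cost is visible in the sketch: every child allocates a full-size grid, which is what limits the number of processes per node.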
With this design, depending on the memory and CPUs available on each node, we may use a combination of multiprocess and multithreaded gridding on each node.
Note that simple gridding/degridding(*) consumes data at around 80 MB/s per process or thread on a 2012 Intel Xeon E5 chip. Thus, if the visibility data on disk cannot be fed to each core of a node at that rate, the parallelization speed-up is limited. The cluster and/or desktop therefore needs access to an I/O system that can sustain roughly 80 MB/s for each core in use. This I/O requirement can be relaxed for more complicated gridding/degridding (WProjection, Mosaic/AProjection), as the number of CPU cycles per I/O byte is much larger than for simple gridding/degridding.
(*) Simple gridding/degridding is gridding with a 7 x 7 pixel spheroidal convolution function.
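The I/O requirement quoted above is simple arithmetic, sketched here for sizing a node. The function names are hypothetical, and the 80 MB/s default is the figure quoted for simple gridding/degridding on that particular chip; other hardware or gridders will differ.

```python
def required_io_mb_per_s(n_cores, per_core_mb_per_s=80.0):
    """Aggregate read rate needed to keep n_cores busy with simple gridding."""
    return n_cores * per_core_mb_per_s


def cores_io_can_feed(io_mb_per_s, n_cores, per_core_mb_per_s=80.0):
    """Number of cores a given sustained I/O rate can keep fully fed."""
    return min(n_cores, int(io_mb_per_s // per_core_mb_per_s))
```

For example, `required_io_mb_per_s(16)` gives 1280.0 MB/s for a 16-core node, while `cores_io_can_feed(400, 16)` gives 5: with a 400 MB/s file system, only about 5 of the 16 cores can be kept busy by simple gridding.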