Efficient Data Restructuring and Aggregation for I/O Acceleration in PIDX
This paper presented a method for storing data with cache efficiency in mind.
A Framework for Low-communication 1-D FFT
This paper presented a method for computing 1-D FFTs with only one all-to-all data exchange, versus the three that conventional distributed algorithms need. An implementation of the algorithm is due out in the second quarter of next year in Intel's Math Kernel Library.
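For context on where the three exchanges come from, here is a minimal single-process sketch of the conventional transpose-based ("six-step") decomposition of a 1-D FFT. It is not the paper's algorithm and not MKL's API; it only illustrates the baseline being improved on. The function name `six_step_fft` and the factor sizes `n1` and `n2` are assumptions made for illustration. In a distributed setting where the matrix rows are spread across nodes, each of the three transposes marked below would become an all-to-all exchange.

```python
import numpy as np

def six_step_fft(x, n1, n2):
    """Baseline 1-D FFT of length n1*n2 via the transpose-based decomposition.

    Hypothetical helper for illustration only. Each transpose below stands in
    for an all-to-all exchange when rows live on different nodes.
    """
    n = n1 * n2
    # View the input as an n1 x n2 matrix: a[j1, j2] = x[n2*j1 + j2].
    a = np.asarray(x, dtype=complex).reshape(n1, n2)

    # Transpose 1: make the length-n1 transforms row-local.
    a = a.T.copy()                      # a[j2, j1]
    a = np.fft.fft(a, axis=1)           # a[j2, k1]

    # Twiddle factors exp(-2*pi*i*j2*k1/n), applied elementwise (no communication).
    j2 = np.arange(n2).reshape(n2, 1)
    k1 = np.arange(n1).reshape(1, n1)
    a *= np.exp(-2j * np.pi * j2 * k1 / n)

    # Transpose 2: make the length-n2 transforms row-local.
    a = a.T.copy()                      # a[k1, j2]
    a = np.fft.fft(a, axis=1)           # a[k1, k2]

    # Transpose 3: put the output in natural order, X[k1 + n1*k2].
    a = a.T.copy()                      # a[k2, k1]
    return a.reshape(n)

# Usage: compare against NumPy's direct FFT on a random length-32 signal.
rng = np.random.default_rng(0)
signal = rng.standard_normal(32) + 1j * rng.standard_normal(32)
matches = np.allclose(six_step_fft(signal, 4, 8), np.fft.fft(signal))
```

The local row FFTs and the twiddle multiplication need no communication, so the three transposes are the only data movement in this baseline.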