Graphics Processing Unit (GPU) Acceleration
Workstations have traditionally performed computations on central processing units (CPUs). FEKO additionally integrates GPU and mixed CPU/GPU processing for the run-time-critical phases of its solvers.
Acceleration of the finite difference time domain (FDTD) solver on a single GPU is supported and can achieve a significant speedup.
For moderate to large Method of Moments (MoM) problems, the most run-time-critical phase is the solution of the system of linear equations. Although FEKO already employs highly optimised libraries for this phase on various CPUs, GPUs can accelerate it by more than an order of magnitude.
FEKO's implementation is unique in that it can also solve problems whose matrix exceeds the memory of the graphics card; in that case the matrix is processed in blocks. Both single and double precision computations are supported. FEKO can also use multiple GPUs in a single solution run, with the solution speed increasing with the number of GPUs used.
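To give a feel for when a problem exceeds graphics card memory, the following minimal sketch estimates the storage of a dense complex MoM matrix. It assumes the full N x N matrix is stored, with 8 bytes per entry in single precision and 16 bytes in double precision. The helper `blocks_needed` and the 4 GiB card size are hypothetical and purely illustrative; FEKO's actual blocking scheme is not described here.

```python
import math

# Bytes per complex matrix entry (real part + imaginary part).
BYTES_PER_ENTRY = {"single": 8, "double": 16}


def matrix_bytes(num_unknowns: int, precision: str = "double") -> int:
    """Storage for a dense N x N complex matrix, in bytes."""
    return num_unknowns * num_unknowns * BYTES_PER_ENTRY[precision]


def blocks_needed(num_unknowns: int, gpu_bytes: int, precision: str = "double") -> int:
    """Smallest number of equal pieces such that one piece fits in GPU memory.

    Hypothetical helper: it ignores workspace buffers and any extra data a
    real out-of-core solver keeps on the card.
    """
    return math.ceil(matrix_bytes(num_unknowns, precision) / gpu_bytes)


if __name__ == "__main__":
    gpu_bytes = 4 * 1024**3  # assumed 4 GiB card, for illustration only
    for n in (2000, 20000):
        for precision in ("single", "double"):
            size_gib = matrix_bytes(n, precision) / 1024**3
            pieces = blocks_needed(n, gpu_bytes, precision)
            print(f"N={n:6d} {precision:6s}: {size_gib:5.2f} GiB, {pieces} block(s)")
```

Under these assumptions, a 20000-unknown problem in double precision needs about 6.4 GB for the matrix alone, so it would not fit on the assumed card in one piece, whereas the single precision matrix (about 3.2 GB) would.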
As an example, consider the radar scattering from a metallic object at a single frequency. The following figures give the performance in GFLOPS (billions of floating point operations per second) of the matrix solution phase (i.e. excluding the time for matrix setup, near- and far-field calculations, etc.) when this class of problem is solved with the MoM. Measured performance is given for problem sizes ranging from 2000 to 20000 unknowns; a higher value means that a problem of a given size is solved more quickly.
Figure 1: GFLOPS performance (double precision) in FEKO for the MoM solution phase of solving the system of linear equations using different Intel CPUs and NVIDIA graphics cards (GPUs).
Figure 2: GFLOPS performance (single precision) in FEKO for the MoM solution phase of solving the system of linear equations using different Intel CPUs and NVIDIA graphics cards (GPUs).
The figures clearly show the advantage of GPU processing for the matrix solution phase. The results also show that the single precision solution is about twice as fast as the double precision solution, and that adding extra GPUs improves performance significantly.
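A GFLOPS value of the kind plotted above can be related to wall-clock time with a rough operation count. The sketch below assumes the linear system is solved by LU factorisation of a dense complex matrix and uses the common estimate of (8/3) N^3 real floating point operations. Both the operation-count convention and the 100 GFLOPS rate in the usage example are assumptions for illustration, not values taken from the figures.

```python
def lu_flops(num_unknowns: int) -> float:
    """Approximate real flop count for LU-factorising a dense complex
    N x N matrix: (8/3) * N^3 (assumed convention)."""
    return 8.0 / 3.0 * num_unknowns**3


def gflops(num_unknowns: int, seconds: float) -> float:
    """Sustained rate in GFLOPS for a factorisation that took `seconds`."""
    return lu_flops(num_unknowns) / seconds / 1e9


def solve_seconds(num_unknowns: int, rate_gflops: float) -> float:
    """Estimated factorisation time at a sustained rate of `rate_gflops`."""
    return lu_flops(num_unknowns) / (rate_gflops * 1e9)


if __name__ == "__main__":
    assumed_rate = 100.0  # GFLOPS, illustrative only
    for n in (2000, 20000):
        print(f"N={n:6d}: about {solve_seconds(n, assumed_rate):8.2f} s "
              f"at {assumed_rate:.0f} GFLOPS")
```

Because the operation count grows with the cube of the number of unknowns, a tenfold increase in problem size costs roughly a thousand times more operations, which is why the sustained GFLOPS rate matters most for the larger problems.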