Preconditioning is a simple but very powerful technique for accelerating the convergence of iterative methods for solving large, sparse linear systems. Traditionally, a good preconditioner implemented on a personal computer or a CPU cluster should meet the following three conditions: (a) the preconditioner is a good approximation, in some sense, of the inverse of the coefficient matrix of the original linear system; (b) setting up the components needed for the preconditioning operator is easy; and (c) the cost of applying the preconditioner to a given vector is low. However, as new computer architectures emerge rapidly, such design strategies need to be re-examined. Motivated by the random walk technique, which is a computationally intensive task, we propose a different preconditioner based on a multi-elimination algorithm, in which the inverse operator is approximated explicitly by a partial product of the Neumann series of the iteration matrix for the original system. The computation of such an approximation takes advantage of the strengths of the GPU. We report numerical results from an implementation using CUDA PETSc. In our implementation, we adopt a hybrid GPU–CPU approach in which the preconditioner setup and application are performed on the GPU, while the Krylov subspace iteration, which requires a certain degree of communication, is done on the CPU. This approach minimizes data communication between the GPU and the CPU, which is a bottleneck for many existing algorithms.
- Markov chain
- Multi-elimination preconditioner
- Neumann series
- Parallel processing
- Random walk
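
The idea of approximating the inverse by a truncated Neumann series can be illustrated with a small sketch. It is not the multi-elimination preconditioner described above, and it does not use the GPU or PETSc; it only shows the underlying identity under the assumption of a Jacobi splitting A = D(I - G), with G = I - D^{-1}A, so that A^{-1} = (I + G + G^2 + ...) D^{-1} whenever the spectral radius of G is below one. The function name `neumann_apply` and the parameter `terms` are hypothetical and chosen for illustration.

```python
def neumann_apply(A, v, terms=8):
    """Approximate A^{-1} v by a truncated Neumann series.

    Assumption (not from the paper): Jacobi splitting A = D (I - G),
    G = I - D^{-1} A, so A^{-1} v ~ sum_{k=0}^{terms-1} G^k (D^{-1} v).
    A is a dense list of lists; v is a list of floats.
    """
    n = len(A)
    diag = [A[i][i] for i in range(n)]

    # k = 0 term: D^{-1} v
    term = [v[i] / diag[i] for i in range(n)]
    result = term[:]

    for _ in range(terms - 1):
        # Next term: G * term = term - D^{-1} (A * term)
        a_term = [sum(A[i][j] * term[j] for j in range(n)) for i in range(n)]
        term = [term[i] - a_term[i] / diag[i] for i in range(n)]
        result = [result[i] + term[i] for i in range(n)]

    return result


# Small diagonally dominant example, so the series converges.
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
x_true = [1.0, 2.0, 3.0]
b = [sum(A[i][j] * x_true[j] for j in range(3)) for i in range(3)]

x_approx = neumann_apply(A, b, terms=40)
```

Each term needs only matrix-vector products and elementwise scaling, which is the kind of work that parallelizes well. Keeping more terms improves the approximation of the inverse but raises the cost of each application, which is the trade-off expressed by conditions (a) and (c).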