Parallel multi-dimensional interpolation in the complex Fourier domain (also known as the non-uniform fast Fourier transform) encounters major challenges, due to computation issues such as increasing computation complexity and space complexity. For instance, the contemporary graphics processing unit (GPU) is limited by the relatively small memory size and the increasing size of the interpolator. This issue makes multidimensional Fourier domain interpolation problematic, while finding an optimized configuration remains an unsolved challenge in industrial applications, e.g. magnetic resonance imaging (MRI) or computerized tomography. To enhance the performance of multi-dimensional interpolation on GPU, a new parallel hierarchical tensor products tree approach is proposed. The method combines the composite 1D interpolators under the limitation imposed by the memory size of the device. The resultant run-time performance on the GPU varies with different configurations. The best-tuned method is 2.52-4.98× faster than the compressed sparse row (CSR) on the discrete GPU and 4.16-9.59× faster than CSR on the integrated GPU. The hierarchical tensor product interpolation is used to compute the multi-dimensional nonuniform fast Fourier transform. An acceleration of 30× was achieved in 3D MRI reconstruction.