Cross Data Matrix-Based PCA: Theory and Applications(2/3)

Project Details

Description

Principal component analysis (PCA) is a useful and important tool for dimension reduction. Yata and Aoshima (2010) proposed a cross data matrix (CDM)-based PCA for the high-dimension, low-sample-size (HDLSS) setting. CDM-based PCA has been shown to have a broader consistency region than conventional PCA for the leading eigenvalues (Yata and Aoshima, 2010; Aoshima et al., 2018) and the same consistency region for the eigenvectors (Wang et al., 2020). In numerical studies, CDM-PCA outperforms PCA on some high-dimensional, highly correlated data (Yata and Aoshima, 2020; Wang et al., 2020). These results suggest that CDM-PCA has great potential for improving PCA on high-dimensional data. However, theoretical evidence explaining this performance is still lacking.

The project aims to provide a theoretical explanation for the better performance of CDM-PCA on high-dimensional, highly correlated data. We will also develop guidelines for choosing between CDM-PCA and PCA. In addition, we will investigate other theoretical properties of CDM-PCA. For example, it is well known that, in the random matrix setting, the empirical distribution of the eigenvalues of a sample covariance matrix with standard Gaussian entries converges weakly to the Marcenko-Pastur distribution. An interesting question is what the corresponding asymptotic behavior of CDM-PCA is.

The project also incorporates the CDM design into other dimension reduction methods such as MPCA and 2SDR. Further, this design can be used in machine learning to improve algorithms in artificial intelligence.
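The CDM construction referred to above can be sketched in a few lines. The following is a minimal illustration, assuming the formulation of Yata and Aoshima (2010) as we understand it: the sample of size n is split at random into two halves of sizes n1 = ceil(n/2) and n2 = n - n1, each half is centered by its own mean, and the singular values of the n1 x n2 cross data matrix serve as estimates of the leading eigenvalues. The function names, the random split, and the simulated one-spike covariance are hypothetical choices for illustration, not the project's code.

```python
import numpy as np


def pca_eigenvalues(X, k):
    """Leading k eigenvalues of the sample covariance matrix (standard PCA).

    X has shape (n, p) with observations in rows. The n x n dual (Gram)
    matrix is used because it has the same nonzero eigenvalues as the
    p x p sample covariance and is much cheaper when p >> n.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    evals = np.linalg.eigvalsh(Xc @ Xc.T / (n - 1))  # ascending order
    return evals[::-1][:k]


def cdm_eigenvalues(X, k, rng=None):
    """Leading k eigenvalue estimates from a cross data matrix (CDM).

    Assumed formulation: split the n rows at random into halves of sizes
    n1 = ceil(n/2) and n2 = n - n1, center each half by its own mean, and
    take the singular values of the n1 x n2 cross data matrix.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    idx = rng.permutation(n)
    n1 = (n + 1) // 2  # ceil(n / 2)
    X1, X2 = X[idx[:n1]], X[idx[n1:]]
    n2 = X2.shape[0]
    X1c = X1 - X1.mean(axis=0)
    X2c = X2 - X2.mean(axis=0)
    # Cross data matrix: rows of one half against rows of the other half.
    S_D = X1c @ X2c.T / np.sqrt((n1 - 1) * (n2 - 1))
    sv = np.linalg.svd(S_D, compute_uv=False)  # descending order
    return sv[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 40, 2000
    # Hypothetical one-spike covariance: lambda_1 = p^(2/3), all others 1.
    lam = np.ones(p)
    lam[0] = p ** (2 / 3)
    X = rng.standard_normal((n, p)) * np.sqrt(lam)
    print(f"true lambda_1: {lam[0]:.1f}")
    print(f"PCA estimate : {pca_eigenvalues(X, 1)[0]:.1f}")
    print(f"CDM estimate : {cdm_eigenvalues(X, 1, rng=1)[0]:.1f}")
```

Roughly, the intuition is that the diagonal of the Gram matrix used by standard PCA contains squared norms that accumulate noise from all p coordinates, whereas each entry of the cross data matrix pairs observations from two independent halves, so the noise terms have mean zero and do not pile up in the same way.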
Status: Finished
Effective start/end date: 1/10/21 to 30/09/22

Keywords

  • asymptotic normality
  • cross data matrix
  • high dimension
  • low sample size
  • principal component analysis
  • random matrix
  • spiked covariance model
  • perturbation