Perturbation theory for cross data matrix-based PCA

Shao Hsuan Wang, Su Yun Huang

Research output: Contribution to journal › Article › peer-review

5 Scopus citations


Principal component analysis (PCA) has long been a useful and important tool for dimension reduction. However, it must be used with care in certain settings, such as high dimension with small sample size. In general, low dimension with large sample size, or a large signal-to-noise ratio, is vital to guarantee the consistency of the leading eigenvalues and eigenvectors obtained by PCA. Cross data matrix (CDM)-based PCA is another way to estimate PCA components: the data are split into two subsets, and a singular value decomposition is computed for the cross product of the corresponding covariance matrices. It has been shown that CDM-based PCA has a broader region of consistency than ordinary PCA for the leading eigenvalues and eigenvectors. Although the difference in regions of consistency is well studied, an interesting practical as well as theoretical question is how the two methods differ in eigenvalue and eigenvector estimation, especially when both fall in a common region of consistency. In this article, we derive finite sample approximation results as well as the asymptotic behavior for CDM-based PCA via matrix perturbation. Furthermore, we derive a comparison measure for CDM-based PCA vs. ordinary PCA. This measure depends only on the data dimension, the noise correlations and the noise-to-signal ratio (NSR). Using this measure, we develop an algorithm that selects good partitions and integrates the results from these partitions to form a final estimate for CDM-based PCA. Numerical and real data examples are presented for illustration.
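The split-and-decompose step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `cdm_pca`, the random half/half split, and the use of square roots of the singular values of the covariance product as eigenvalue estimates are all assumptions made for illustration. The partition-selection algorithm mentioned in the abstract is not reproduced here.

```python
import numpy as np


def cdm_pca(X, k, seed=None):
    """Illustrative CDM-style PCA sketch (hypothetical helper, not the paper's code).

    X    : (n, p) array with samples in rows.
    k    : number of leading components to return.
    seed : seed for the random partition into two subsets.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    perm = rng.permutation(n)
    idx1, idx2 = perm[: n // 2], perm[n // 2:]

    # Sample covariance matrix of each subset (each is p x p).
    S1 = np.cov(X[idx1], rowvar=False)
    S2 = np.cov(X[idx2], rowvar=False)

    # SVD of the cross product of the two covariance matrices.
    U, s, _ = np.linalg.svd(S1 @ S2)

    # Assumption: if S1 and S2 both approximate the same covariance matrix,
    # their product approximates its square, so sqrt(s) estimates eigenvalues
    # and the left singular vectors estimate eigenvectors.
    return np.sqrt(s[:k]), U[:, :k]


# Toy single-spike example: covariance is I + 25 * v v^T.
rng = np.random.default_rng(0)
p, n = 20, 4000
v = np.zeros(p)
v[0] = 1.0
X = rng.standard_normal((n, p)) + 5.0 * rng.standard_normal((n, 1)) * v

vals, vecs = cdm_pca(X, k=1, seed=1)
```

In this toy setting the population covariance has a single leading eigenvalue with eigenvector `v`, so `vals[0]` and `vecs[:, 0]` can be compared against those population quantities. Different seeds give different partitions, which is the source of variability that the partition-selection idea in the abstract is meant to address.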

Original language: English
Article number: 104960
Journal: Journal of Multivariate Analysis
State: Published - Jul 2022


Keywords

  • Cross data matrix
  • Finite sample approximation
  • High dimension and low sample size
  • Matrix perturbation
  • Principal component analysis
  • Spiked covariance model

