TY - JOUR
T1 - Statistical mechanics of learning multiple orthogonal signals: Asymptotic theory and fluctuation effects
AU - Hoyle, David C.
AU - Rattray, M.
PY - 2007
Y1 - 2007
AB - The learning of signal directions in high-dimensional data through orthogonal decomposition or principal component analysis (PCA) has many important applications in physics and engineering disciplines, e.g., wireless communication, information theory, and econophysics. The accuracy of the orthogonal decomposition can be studied using mean-field theory. Previous analysis of data produced from a model with a single signal direction has predicted a retarded learning phase transition below which learning is not possible, i.e., if the signal is too weak or the data set is too small then it is impossible to learn anything about the signal direction or magnitude. In this contribution we show that the result can be generalized to the case where there are multiple signal directions. Each nondegenerate signal is associated with a retarded learning transition. However, fluctuations around the mean-field solution lead to large finite-size effects unless the signal strengths are very well separated. We evaluate the one-loop contribution to the mean-field theory, which shows that signal directions are indistinguishable from one another if their corresponding population eigenvalues are separated by $O(N^{-\tau})$ with exponent $\tau > 1/3$, where $N$ is the data dimension. Numerical simulations are consistent with the analysis and show that finite-size effects can persist even for very large data sets. © 2007 The American Physical Society.
KW - PRINCIPAL COMPONENT ANALYSIS
DO - 10.1103/PhysRevE.75.016101
M3 - Article
C2 - 17358218
SN - 1539-3755
VL - 75
JO - Physical Review E - Statistical, Nonlinear, and Soft Matter Physics
JF - Physical Review E - Statistical, Nonlinear, and Soft Matter Physics
IS - 1
M1 - 016101
ER -