The main idea of PCA is to seek the most accurate representation of the data in a lower-dimensional space. For a formal definition, according to Wikipedia: principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
A nice application of PCA is plane fitting (3D -> 2D dimension reduction). Another example is projecting data from a 2D space onto a 1D subspace.
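The plane-fitting application can be sketched with numpy's SVD; the data, plane equation, and variable names below are hypothetical, chosen only to illustrate the 3D -> 2D reduction:

```python
import numpy as np

# Hypothetical noisy samples near the plane z = 0.5x - 0.2y.
rng = np.random.default_rng(2)
xy = rng.normal(size=(300, 2))
z = xy @ np.array([0.5, -0.2]) + 0.01 * rng.normal(size=300)
pts = np.column_stack([xy, z])

# Center the points, then take the SVD: the first two right-singular
# vectors span the best-fitting plane; the third is its normal.
centered = pts - pts.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
plane_basis, normal = vt[:2], vt[2]

# 3D -> 2D: coordinates of each point within the fitted plane.
coords_2d = centered @ plane_basis.T
```

The residuals of the points along `normal` are exactly the projection error that PCA minimizes.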
The goal of PCA is to minimize the projection error incurred when the data are represented in the lower-dimensional subspace.
The optimal value for each coefficient turns out to be just the dot product between x_i and the corresponding basis vector e_j; substituting these coefficients back in, the total error simplifies to a form that depends only on the scatter matrix S of the centered data.
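A sketch of the standard derivation behind these claims (assuming centered samples x_1, ..., x_n and an orthonormal basis e_1, ..., e_k; the notation here is the usual one for PCA, not taken from equations in the original notes):

```latex
% Approximate each centered sample in the orthonormal basis:
% x_i \approx \sum_j a_{ij} e_j.  The projection error is
J = \sum_{i=1}^{n} \Big\| x_i - \sum_{j=1}^{k} a_{ij} e_j \Big\|^2 .
% Setting \partial J / \partial a_{ij} = 0 gives the optimal coefficients
a_{ij} = x_i^\top e_j ,
% and substituting them back simplifies the total error to
J = \sum_{i=1}^{n} \| x_i \|^2 \;-\; \sum_{j=1}^{k} e_j^\top S \, e_j ,
\qquad S = \sum_{i=1}^{n} x_i x_i^\top .
% Minimizing J thus means maximizing each e_j^T S e_j, which is achieved
% by choosing e_1, ..., e_k as the top-k eigenvectors of S.
```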
The larger the eigenvalue of S, the larger the variance of the data in the direction of the corresponding eigenvector.
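The eigenvalue-variance correspondence can be verified numerically; the toy data below is a made-up example, not from the original notes:

```python
import numpy as np

# Toy correlated 2-D data (hypothetical example).
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])

# Center the data, then form the scatter matrix S.
x_centered = x - x.mean(axis=0)
S = x_centered.T @ x_centered

# Eigen-decomposition of the symmetric matrix S; sort so the largest
# eigenvalue (largest variance direction) comes first.
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project onto the top principal component (2D -> 1D).
z = x_centered @ eigvecs[:, :1]
```

Projecting onto the top eigenvector gives a strictly smaller reconstruction error than projecting onto any other eigenvector, which is exactly the PCA optimality property above.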
FLD / LDA
PCA finds the most accurate data representation in a lower-dimensional space. However, the directions of maximum variance may be useless for classification. The Fisher Linear Discriminant (FLD) instead projects onto a line that preserves the direction useful for classification. Linear discriminant analysis (LDA) is a generalization of Fisher's linear discriminant, a method used in statistics, pattern recognition and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier, or, more commonly, for dimensionality reduction before later classification.
The main idea of FLD is to find a projection onto a line such that samples from different classes are well separated: maximize the distance between the projected class means while minimizing the within-class variance.
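For two classes this criterion has a closed-form solution, w ∝ S_W⁻¹(m₁ − m₀), where S_W is the within-class scatter matrix. A minimal sketch with hypothetical toy data (the class distributions and names are assumptions for illustration):

```python
import numpy as np

# Two toy Gaussian classes: large variance along x, but the classes
# are separated along y, so max-variance (PCA) direction is useless.
rng = np.random.default_rng(1)
x0 = rng.normal(loc=[0.0, 0.0], scale=[3.0, 0.3], size=(100, 2))
x1 = rng.normal(loc=[0.0, 2.0], scale=[3.0, 0.3], size=(100, 2))

# Class means and within-class scatter S_W = S_0 + S_1.
m0, m1 = x0.mean(axis=0), x1.mean(axis=0)
S_W = (x0 - m0).T @ (x0 - m0) + (x1 - m1).T @ (x1 - m1)

# Fisher's optimal projection direction: w proportional to S_W^{-1}(m1 - m0).
w = np.linalg.solve(S_W, m1 - m0)
w /= np.linalg.norm(w)

# Project both classes onto the line spanned by w (2D -> 1D).
z0, z1 = x0 @ w, x1 @ w
```

On this data the Fisher direction ends up nearly parallel to the y-axis, the direction that actually separates the classes, even though the variance is largest along x.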
FLD projects the data onto a single line chosen to keep the classes easy to separate (for C classes, LDA generalizes this to at most C−1 dimensions), whereas PCA can reduce N-D data to any lower-dimensional subspace without using class labels.